Submission 21397 - Judge Duck Online

User	Problem	Status	Score	Time	Memory	Language	Code length
jcer114514	1001a. Test Your Sorting 2	Accepted	100	96.54 us	116 KB	C++	1.14 KB

Submitted at	Judged at
2024-03-18 16:55:02	2024-03-18 16:55:05

Code

#pragma GCC optimize("Ofast,inline,unroll-loops,fast-math,no-stack-protector")
#include <bits/stdc++.h>
#include <immintrin.h>

using namespace std;

const int n = 1e4;

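// One stable scatter pass of the radix sort: every element of a is copied into b
// at the next free slot of its bucket. lambda extracts the 8-bit digit for this pass.
template<class T>
void F(uint* __restrict__ buc, uint* __restrict__ a, uint* __restrict__ b, T lambda) {
    // n (10000) is a multiple of 16, so the blocked loop covers every element.
    for (int i = 0; i < n; i += 16) {
        // Prefetch hint 256 elements ahead; near the end the address lies past the
        // array, which is tolerated because x86 prefetch instructions do not fault.
        _mm_prefetch(&a[i + 256], _MM_HINT_NTA);
        #pragma GCC unroll 16
        for (int j = 0; j < 16; j++)
            b[buc[lambda(a[i + j])]++] = a[i + j];
    }
}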
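// Entry point: LSD radix sort on 32-bit keys, 8 bits per pass.
// The length argument is ignored; this code assumes exactly n = 10000 elements.
void sort(uint* a, int /* length, unused */) {
    uint buc[4][256] = {};
    uint* b = (uint*)malloc(n * sizeof(uint));  // scratch buffer for the ping-pong passes
    // Histogram all four byte positions in a single pass over the input.
    for (int i = 0; i < n; i++) {
        buc[0][a[i] & 255]++;
        buc[1][a[i] >> 8 & 255]++;
        buc[2][a[i] >> 16 & 255]++;
        buc[3][a[i] >> 24]++;
    }
    // Exclusive prefix sum: buc[k][i] becomes the first output index of bucket i.
    for (int k = 0; k < 4; k++) {
        uint32_t offset = 0;
        for (int i = 0; i < 256; i++)
            swap(buc[k][i], offset), offset += buc[k][i];
    }
    // Four passes (an even number), so the sorted data ends up back in a.
    F(buc[0], a, b, [](uint x) { return x & 255; });
    F(buc[1], b, a, [](uint x) { return x >> 8 & 255; });
    F(buc[2], a, b, [](uint x) { return x >> 16 & 255; });
    F(buc[3], b, a, [](uint x) { return x >> 24; });
    free(b);
}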
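
The submission above is a least-significant-digit radix sort on 32-bit keys with 8-bit digits. A minimal sketch of the same idea without the tuning may be easier to follow. This is not the submitted code: the names `scatter_pass` and `radix_sort_u32` are hypothetical, it takes a `std::vector` instead of the judge's raw-pointer interface, it works for any length rather than a fixed n = 10000, and it omits the prefetching and unrolling.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One stable scatter pass: place every key of `src` into `dst` at the next
// free slot of its bucket. `offset[v]` must hold the first output index for
// digit value v, and it is advanced as keys are placed.
inline void scatter_pass(const std::vector<std::uint32_t>& src,
                         std::vector<std::uint32_t>& dst,
                         std::array<std::size_t, 256>& offset,
                         unsigned shift) {
    assert(src.size() == dst.size());
    for (std::uint32_t key : src) {
        dst[offset[(key >> shift) & 255u]++] = key;
    }
}

// Hypothetical helper: sorts `keys` in ascending order, for any length.
inline void radix_sort_u32(std::vector<std::uint32_t>& keys) {
    // Histogram all four byte positions in a single read of the input.
    std::array<std::array<std::size_t, 256>, 4> count{};
    for (std::uint32_t key : keys) {
        for (unsigned d = 0; d < 4; ++d) {
            ++count[d][(key >> (8 * d)) & 255u];
        }
    }
    // Exclusive prefix sum: count[d][v] becomes the number of keys whose
    // digit d is smaller than v, which is where bucket v starts.
    for (auto& digit : count) {
        std::size_t running = 0;
        for (std::size_t& slot : digit) {
            const std::size_t c = slot;
            slot = running;
            running += c;
        }
    }
    // Four passes, alternating between two buffers. The pass count is even,
    // so the sorted result ends up back in `keys`.
    std::vector<std::uint32_t> scratch(keys.size());
    scatter_pass(keys, scratch, count[0], 0);
    scatter_pass(scratch, keys, count[1], 8);
    scatter_pass(keys, scratch, count[2], 16);
    scatter_pass(scratch, keys, count[3], 24);
}
```

Calling `radix_sort_u32(v)` on a `std::vector<std::uint32_t>` sorts it in place. Counting all four histograms up front, as the submission does, means the input is read five times in total instead of eight (one counting read plus four scatter reads).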

Judging result

Compilation

N/A

Compile OK

Score: N/A

Testcase #1

96.54 us

116 KB

Accepted

Score: 100

Time (ms): 0.096538
Memory (KiB): 116
Status: Run Finished

ok Accepted