sfmt19937 slower than mt19937 on arm?
Hi there,
Thanks for making this library public. I was trying it out on an Arm server and ran into a performance difference between mt19937 and sfmt19937. Specifically, I modified the pi example to use either mt19937 or sfmt19937, and removed the pi calculation so that only the raw RNG performance is compared. It seems that sfmt19937 achieves only about 2/3 of the throughput of mt19937. Did I miss anything here? Thanks!
Key code below:
VSLStreamStatePtr stream;
int errcode = vslNewStream(&stream, VSL_BRNG_MT19937, 42); // or VSL_BRNG_SFMT19937
float *randomNumbers = malloc(1000 * 1000 * 1000 * sizeof(float));
errcode = vsRngUniform(VSL_RNG_METHOD_UNIFORM_STD, stream, 1000 * 1000 * 1000, randomNumbers, 0, 1);
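For reference, here is a minimal sketch of how the throughput comparison could be timed. It is not the exact harness I ran: `fill_fn`, `placeholder_fill` and `measure_mfloats_per_sec` are hypothetical names, and the placeholder generator is a plain xorshift32 used only so the sketch compiles without the library. It is neither MT19937 nor SFMT19937.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L /* for clock_gettime */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical callback type: fills out[0..n) with floats in [0, 1). */
typedef void (*fill_fn)(float *out, size_t n, void *state);

/* Placeholder generator (xorshift32), NOT MT19937 or SFMT19937.
 * In the real comparison this would be a thin wrapper around
 * vsRngUniform on a stream created with the BRNG under test.
 * The state must be a non-zero uint32_t. */
static void placeholder_fill(float *out, size_t n, void *state)
{
    uint32_t x = *(uint32_t *)state;
    for (size_t i = 0; i < n; i++) {
        x ^= x << 13;
        x ^= x >> 17;
        x ^= x << 5;
        /* Keep the top 24 bits and scale by 2^-24 to land in [0, 1). */
        out[i] = (float)(x >> 8) * (1.0f / 16777216.0f);
    }
    *(uint32_t *)state = x;
}

/* Returns the best throughput over `reps` runs, in millions of floats
 * per second. Taking the best run reduces noise from the first pass,
 * which also pays for touching freshly allocated pages. */
static double measure_mfloats_per_sec(fill_fn fill, void *state,
                                      float *buf, size_t n, int reps)
{
    assert(fill != NULL && buf != NULL && n > 0);
    double best = 0.0;
    for (int r = 0; r < reps; r++) {
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        fill(buf, n, state);
        clock_gettime(CLOCK_MONOTONIC, &t1);
        double secs = (double)(t1.tv_sec - t0.tv_sec)
                    + (double)(t1.tv_nsec - t0.tv_nsec) * 1e-9;
        if (secs > 0.0) {
            double rate = (double)n / secs / 1e6;
            if (rate > best)
                best = rate;
        }
    }
    return best;
}
```

In the actual comparison, `fill` would wrap the `vsRngUniform` call above for a stream created with either `VSL_BRNG_MT19937` or `VSL_BRNG_SFMT19937`. Reusing a smaller buffer over several repetitions may also help keep the first-touch cost of the 4 GB allocation out of the measured time.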