Precision Issue in matmul_clamp_f16_f16_f16p Example
I'm trying to use matmul_clamp_f16_f16_f16p in an LLM application, but the matmul_clamp_f16_f16_f16p example appears to have precision issues. I modified the existing example, replacing the fill_matrix function with the following:
void fill_uniform_random(size_t num_rows, size_t num_cols, float16_t* dst, size_t seed) {
    std::srand(seed);
    // Populate the array with random values ranging from -1 to 1
    for (size_t i = 0; i < num_rows * num_cols; i++) {
        dst[i] = (float16_t)((double)std::rand() / RAND_MAX) * 2 - 1;
    }
}
I also set M, N, and K to 1024. With this configuration, matmul_clamp_f16_f16_f16p does not produce the correct output.
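To make "correct output" measurable, here is a minimal sketch of one way to check the result: compute a reference that accumulates in float, then count the elements that fall outside a mixed absolute/relative tolerance. The helper names ref_matmul_f32_acc and count_mismatches are hypothetical and not part of the example or the library. The sketch assumes row-major LHS (M x K) and RHS (K x N), and the tolerance values are left to the caller. For scale, FP16 stores 10 fraction bits, so adjacent representable values in [16, 32) are 2^-6 apart, and some rounding error is expected in the outputs even with an ideal kernel.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical helper (not part of the example): reference matmul that
// accumulates in float. Computes dst[m x n] = lhs[m x k] * rhs[k x n],
// assuming all matrices are stored row-major.
template <typename T>
void ref_matmul_f32_acc(std::size_t m, std::size_t n, std::size_t k,
                        const T* lhs, const T* rhs, float* dst) {
    for (std::size_t i = 0; i < m; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            float acc = 0.0f;
            for (std::size_t p = 0; p < k; ++p) {
                acc += static_cast<float>(lhs[i * k + p]) *
                       static_cast<float>(rhs[p * n + j]);
            }
            dst[i * n + j] = acc;
        }
    }
}

// Hypothetical helper: counts elements where
// |out - ref| > abs_tol + rel_tol * |ref|, and reports the largest
// absolute error through max_abs_err (may be nullptr).
template <typename T>
std::size_t count_mismatches(std::size_t count, const T* out, const float* ref,
                             float abs_tol, float rel_tol, float* max_abs_err) {
    std::size_t mismatches = 0;
    float worst = 0.0f;
    for (std::size_t i = 0; i < count; ++i) {
        const float err = std::fabs(static_cast<float>(out[i]) - ref[i]);
        if (err > worst) {
            worst = err;
        }
        if (err > abs_tol + rel_tol * std::fabs(ref[i])) {
            ++mismatches;
        }
    }
    if (max_abs_err != nullptr) {
        *max_abs_err = worst;
    }
    return mismatches;
}
```

With T = float16_t, the kernel's M x N output buffer can be passed as out and compared against the float reference. Reporting the mismatch count together with the largest absolute error helps distinguish ordinary FP16 rounding from a genuinely wrong result.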