Retire the ggml_mul_mat() branch for transposed src0 (#500)

* Retire the ggml_mul_mat() branch for transposed src0

- It can always be made contiguous with ggml_cpy()
- The code is now simplified
- The results are deterministic with respect to the number of threads
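
The idea in the list above can be sketched in plain C without the ggml API. This is only an illustration under stated assumptions: `make_contiguous` and `mul_mat_rows` are hypothetical helpers, not ggml functions, and the strided copy stands in for what ggml_cpy() would do on a transposed tensor. Each output element is a dot product accumulated in one fixed order, so the value does not depend on how rows are later split across threads, which is one plausible reading of the determinism point.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical helper (not part of ggml): copy a rows x cols matrix stored
// with arbitrary element strides into a dense row-major buffer.
// A transposed view of a row-major matrix is just a swapped pair of strides.
static void make_contiguous(const float *src, size_t rows, size_t cols,
                            size_t row_stride, size_t col_stride, float *dst) {
    assert(src != NULL && dst != NULL);
    for (size_t r = 0; r < rows; r++) {
        for (size_t c = 0; c < cols; c++) {
            dst[r * cols + c] = src[r * row_stride + c * col_stride];
        }
    }
}

// Hypothetical helper: out[i * m + j] = dot(row i of a, row j of b).
// Both inputs are contiguous with k elements per row. Every element is
// summed in the same left-to-right order regardless of the caller's threading.
static void mul_mat_rows(const float *a, size_t n,
                         const float *b, size_t m,
                         size_t k, float *out) {
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < m; j++) {
            float sum = 0.0f;
            for (size_t l = 0; l < k; l++) {
                sum += a[i * k + l] * b[j * k + l];
            }
            out[i * m + j] = sum;
        }
    }
}
```

With the contiguous copy in place, only the contiguous multiplication path is needed, which matches the simplification described above.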

* SIMD-ify dequantize_row_q4_0() for ARM_NEON (#502)

* Attempt to SIMD-ify dequantize_row_q4_0() for ARM_NEON

* Fix dequantization - forgot to interleave the quants
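
For reference, a scalar sketch of the dequantization shows why interleaving matters. It assumes a block layout of one float scale followed by 16 bytes, each holding two 4-bit quants with the low nibble first and an offset of 8; the names `block_q4_0_sketch` and `dequantize_row_q4_0_sketch` are hypothetical and the real definitions in ggml.c may differ. A SIMD version that extracts low and high nibbles into separate vectors would have to interleave them again to reproduce this output order.

```c
#include <assert.h>
#include <stdint.h>

#define QK 32  // assumed number of values per block

// Assumed layout: one scale plus QK/2 bytes of packed 4-bit quants.
typedef struct {
    float   d;           // scale
    uint8_t qs[QK / 2];  // two quants per byte, low nibble first
} block_q4_0_sketch;

// Scalar reference: expand k quantized values from x into floats in y.
static void dequantize_row_q4_0_sketch(const block_q4_0_sketch *x, float *y, int k) {
    assert(k % QK == 0);
    const int nb = k / QK;

    for (int i = 0; i < nb; i++) {
        const float d = x[i].d;

        for (int l = 0; l < QK; l += 2) {
            const uint8_t vi = x[i].qs[l / 2];

            // The low nibble is the even element, the high nibble the odd one.
            const int q0 = (vi & 0x0F) - 8;
            const int q1 = (vi >> 4) - 8;

            y[i * QK + l + 0] = q0 * d;
            y[i * QK + l + 1] = q1 * d;
        }
    }
}
```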
pull/505/head master-ecbe466
Georgi Gerganov 1 year ago committed by GitHub
parent 502a400192
commit ecbe466a36
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23

953
ggml.c

File diff suppressed because it is too large