227 Commits (436e56193199a1625f8c561069f702e8840a9e08)
 

Author SHA1 Message Date
Stephan Walter 436e561931
all : be more strict about converting float to double (#458)
* Be more strict about converting float to double

* Test equivalence of round, SILU implementations

Test module is commented out in CMakeLists.txt because the tests may
take a long time, depending on how much the compiler optimizes.

* Fix softmax in perplexity.cpp

* all : prefer float over double where appropriate

* perplexity : add <cmath>

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
1 year ago
Jed Fox 20e1e84884
deploy : add a Package.swift for SwiftPM support (#393)
* Add a Package.swift for SwiftPM support

* Swap from exclusions to allowlist
1 year ago
Stephan Walter c1f885067c
ggml : introduce structs for the q4 data blocks (#356)
* Introduce structs for the q4 data blocks

* ggml : rename quant struct variables + fix ARM_NEON

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
1 year ago
Georgi Gerganov e0670260fb
gitignore : add "embedding" 1 year ago
dotpy314 28ba975aea
Check the existence of f16_model_path_base in quantize.py (#574)
Co-authored-by: Jincheng Miao <jincheng.miao@gmail.com>
1 year ago
slaren a6bdc47cba
Fix usage of F16C intrinsics in AVX code (#563)
* Fix usage of F16C intrinsics in AVX code when F16C is not defined
1 year ago
anzz1 7b8dbcb78b
main.cpp fixes, refactoring (#571)
- main: entering empty line passes back control without new input in interactive/instruct modes
- instruct mode: keep prompt fix
- instruct mode: duplicate instruct prompt fix
- refactor: move common console code from main->common
1 year ago
RJ Adriaansen 4b8efff0e3
Add embedding example to Makefile (#540) 1 year ago
Marco Matthies 7e5395575a
Fix missing ggml link in cmake for examples/* on w64-mingw32 (#542) 1 year ago
Erik Scholz 34c1072e49
ci: add debug build to sanitizer build matrix (#527) 1 year ago
Stephan Walter 939ad2d3a5
Fix undefined variables in debug build, remove unused variables (#531) 1 year ago
Juan Calderon-Perez 8c2ec5e21d
Add support for linux/arm64 platform during Docker Builds (#514)
* Add support for linux/arm64 platform

* Add platform to versioned builds
1 year ago
Stephan Walter b391579db9
Update README and comments for standalone perplexity tool (#525) 1 year ago
anzz1 7a87d31f4f
[main] fix infinite generation (-n == -1) (#523) 1 year ago
Georgi Gerganov 348d6926ee
Add logo to README.md 1 year ago
Harald Fernengel 33e35b8fe8
Exit from interactive mode if input stream is bad (#491)
Allow exiting the interactive prompt also with CTRL-D on Unix and CTRL-Z
on Windows.
1 year ago
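
The exit behaviour described in this commit can be sketched roughly as follows. This is a minimal Python illustration, not the project's C++ code: `read_inputs` is a hypothetical helper, and an in-memory stream stands in for the terminal. The point it shows is that an empty line is still valid input, whereas end of input (what CTRL-D on Unix or CTRL-Z on Windows produces) ends the loop.

```python
import io


def read_inputs(stream):
    """Collect lines until the stream reports end of input (hypothetical helper)."""
    lines = []
    while True:
        line = stream.readline()
        if line == "":
            # End of input: readline() returns "" only at EOF.
            # A blank line typed by the user arrives as "\n" instead.
            break
        lines.append(line.rstrip("\n"))
    return lines


# An in-memory stream stands in for interactive input here.
print(read_inputs(io.StringIO("hello\n\nworld\n")))
```

Passing `sys.stdin` instead of the `StringIO` object would give the same loop over real console input.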
anzz1 19726169b3
CI: Run other sanitizer builds even if one fails (#511)
Applies only to sanitizer builds, so they won't be cancelled
1 year ago
jp-x-g f732695cd5
Clarify console output in convert-pth-to-ggml.py (#512)
"Processing part 1 of 3" instead of "Processing part 0"
1 year ago
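
The message change in this commit amounts to printing a one-based part number together with the total. A minimal sketch, assuming a zero-based loop index; `part_message` is a hypothetical name and is not taken from convert-pth-to-ggml.py:

```python
def part_message(index: int, total: int) -> str:
    """Format a one-based progress line from a zero-based part index."""
    return f"Processing part {index + 1} of {total}"


# Assumed example: a model split into three parts.
for i in range(3):
    print(part_message(i, 3))
```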
anzz1 2f7bf7dd7c
CMake / CI additions (#497)
* CMake: Add AVX512 option

* CI: Add AVX/AVX512 builds (Windows)
(AVX512 tests can only be run when the worker happens to support it, building works anyway)

* CMake: Fix sanitizer linkage ( merged #468 )

* CI: Add sanitizer builds (Ubuntu)

* CI: Fix release tagging
(change @zendesk/action-create-release to @anzz1/action-create-release until upstream PR "Added commitish as input" (zendesk/action-create-release#32) is merged)
1 year ago
anzz1 34ab526843
(Windows) Set console to UTF-8 on init (#420)
Sets console codepage to 65001 (CP_UTF8) on start for both input and output, should fix problems with UTF-8 characters.
1 year ago
Georgi Gerganov c2b25b6912
Fix colors enabling on WIN32 1 year ago
Georgi Gerganov 79b2b266db
If n_predict == -1, generate forever 1 year ago
Georgi Gerganov e2d490dafd
Infinite generation via context swapping (#71) 1 year ago
Georgi Gerganov 03f7e33560
Cleanup STL headers + fix embedding examples + minor stuff 1 year ago
Georgi Gerganov 55ad42af84
Move chat scripts into "./examples" 1 year ago
slaren 459e93cce0
Add AVX2 implementation of dequantize_row_q4_1 (#505) 1 year ago
Georgi Gerganov a316a425d0
Overhaul the examples structure
- main -> examples
- utils -> examples (renamed to "common")
- quantize -> examples
- separate tools for "perplexity" and "embedding"

Hope I didn't break something !
1 year ago
Georgi Gerganov ecbe466a36
Retire the ggml_mul_mat() branch for transposed src0 (#500)
* Retire the ggml_mul_mat() for transposed src0

- It can always be made contiguous with ggml_cpy()
- The code is now simplified
- The results are deterministic with respect to the number of threads

* SIMD-ify dequantize_row_q4_0() for ARM_NEON (#502)

* Attempt to SIMD-ify dequantize_row_q4_0() for ARM_NEON

* Fix dequantization - forgot to interleave the quants
1 year ago
Georgi Gerganov 502a400192
Disable prompt verbosity by default and add option to enable (#480) 1 year ago
slaren 09aecbf628
Add AVX2 implementation of dequantize_row_q4_0 (#467) 1 year ago
Georgi Gerganov 4640eff23d
Don't interfere with BLAS for large prompts by running only 1 thread 1 year ago
Georgi Gerganov ab77d76312
Add longer DAN prompt for testing big batch numbers 1 year ago
slaren 29b7baab67
Add timings for the prompt evaluation (#478) 1 year ago
Georgi Gerganov 4a7129acd2
Remove obsolete information from README 1 year ago
Georgi Gerganov 6b6dbc8910
Remove obsolete assert and fix compiler warning 1 year ago
Georgi Gerganov 2a2e63ce05
Fix nasty bug in ggml_compute_forward_mul_mat_f32() and reenable BLAS 1 year ago
anzz1 e899bf54b2
bounds checking for input prefix (#492) 1 year ago
anzz1 fbd4d38c64
feat: '--in-prefix STRING' option (#426)
Prefix user inputs with a string
1 year ago
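
As a rough illustration of what an input-prefix option does, the sketch below parses a `--in-prefix` flag and prepends its value to a user input. It is written in Python with `argparse` purely for illustration; the project's own argument handling is in C++ and may differ. `build_parser` and `apply_prefix` are hypothetical names, and the empty-string default is an assumption.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with a single --in-prefix option (illustrative only)."""
    parser = argparse.ArgumentParser()
    # Assumed default: no prefix unless the user asks for one.
    parser.add_argument("--in-prefix", default="",
                        help="string to prefix user inputs with")
    return parser


def apply_prefix(user_input: str, prefix: str) -> str:
    """Prepend the configured prefix to one line of user input."""
    return prefix + user_input


args = build_parser().parse_args(["--in-prefix", "User: "])
print(apply_prefix("hello", args.in_prefix))
```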
Jed Fox 58e6c9f36f
Add support for file load progress reporting callbacks (#434)
* File load progress reporting

* Move llama_progress_handler into llama_context_params

* Renames

* Use seekg to find file size instead

* More correct load progress

* Call progress callback more frequently

* Fix typo
1 year ago
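
The idea behind the steps listed in this commit (find the file size by seeking, then report progress through a callback as data is read) can be sketched as below. This is a Python analogue under stated assumptions, not the C API added by the commit: `load_with_progress`, its `callback` signature (a single float between 0 and 1), and the chunk size are all hypothetical.

```python
import io


def load_with_progress(stream, callback, chunk_size=4):
    """Read a seekable binary stream in chunks, reporting the fraction loaded."""
    # Find the total size by seeking to the end, then rewind
    # (the same idea as using seekg on a C++ file stream).
    stream.seek(0, io.SEEK_END)
    total = stream.tell()
    stream.seek(0)

    data = bytearray()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        data.extend(chunk)
        # Report after every chunk so the caller sees frequent updates.
        callback(len(data) / total)
    return bytes(data)


# An in-memory buffer stands in for a model file here.
reports = []
payload = load_with_progress(io.BytesIO(b"0123456789"), reports.append)
print(reports)
```

A smaller `chunk_size` gives more frequent callbacks at the cost of more read calls.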
Doomsdayrs 36d07532ef
Add missing struct annotation (#483)
`llama_sample_top_p_top_k` was missing the struct annotation on line 126.

This causes a compiler issue when being parsed by the Kotlin C interop generator.

This commit fixes the above issue by adding the struct annotation.
1 year ago
Chris Kuehl 6f1ee4b640
Fix crash for 65B model with pre-allocated memory (#485) 1 year ago
Georgi Gerganov 8520fc310e
Disable BLAS altogether - the bug is not just for quantized mat mul 1 year ago
Georgi Gerganov b3f460e941
Disable BLAS branch in mul_mat - seems there is a bug 1 year ago
Georgi Gerganov 04c6f5ed6f
Immediately start processing the prompt before user input has been provided (#476) 1 year ago
Georgi Gerganov 7a9b6c3a8b
Reduce memory usage and allocate enough memory for largest context (#473)
* Reduce memory usage and allocate enough memory for large contexts

* Simpler scratch buffer usage

* Reenable BLAS for quantized mul_mat

* Fix number of layers in 30B and 65B

* Fix KV cache size for F32
1 year ago
Georgi Gerganov 31572d9665
Temporarily bump the memory buffer size - hopefully fix issues from 483bab2e 1 year ago
Gary Mulder f4f5362edb
Update README.md (#444)
Added explicit **bolded** instructions clarifying that people need to request access to models from Facebook and never through this repo.
1 year ago
rabidcopy 863f65e2e3
fix instruct mode (#445)
changes to EOS behavior in interactive and reverse prompt handling broke instruct mode by erroneously injecting instruct mode's reverse prompt and an extra newline.
1 year ago
Georgi Gerganov afd220d9c6
Properly free llama_context on failure 1 year ago
Cameron Kaiser 481044d50c
additional optimizations for POWER9 (#454) 1 year ago