Commit Graph

14 Commits (4f546091102a418ffdc6230f872ac56e5cedb835)

Author SHA1 Message Date
Stephan Walter 367946c668
Don't tell users to use a bad number of threads (#243)
The readme tells people to use the command line option "-t 8", causing 8
threads to be started. On systems with fewer than 8 cores, this causes a
significant slowdown. Remove the option from the example command lines
and use /proc/cpuinfo on Linux to determine a sensible default.
1 year ago
Matvey Soloviev 904d2a8d6a
Q4_1 quantization (#193)
* Add AVX2 version of ggml_vec_dot_q4_1

* Small optimisations to q4_1 dot product (@Const-me)

* Rearrange Q4_1 quantization to work for multipart models. (Fix #152)

* Fix ggml_vec_mad_q4_1 too

* Fix non-vectorised q4_1 vec mul
1 year ago
Justin Suess 2d64715ad4
added ctx_size parameter (#148)
* added ctx_size parameter

* added it in more places

* Apply suggestions from code review

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
1 year ago
Thomas Klausner 41be0a3b3d
Add NetBSD support. (#90) 1 year ago
Matvey Soloviev 96ea727f47
Add interactive mode (#61)
* Initial work on interactive mode.

* Improve interactive mode. Make rev. prompt optional.

* Update README to explain interactive mode.

* Fix OS X build
1 year ago
Ben Garney f385f8dee8
Allow using prompt files (#59) 1 year ago
beiller 02f0c6fe7f
Add back top_k (#56)
* Add back top_k

* Update utils.cpp

* Update utils.h

---------

Co-authored-by: Bill Hamilton <bill.hamilton@shopify.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
1 year ago
Sebastián A eb062bb012
Windows fixes (#31)
* Apply fixes suggested to build on windows

Issue: https://github.com/ggerganov/llama.cpp/issues/22

* Remove unsupported VLAs

* MSVC: Remove features that are only available on MSVC C++20.

* Fix zero initialization of the other fields.

* Change the use of vector for stack allocations.
1 year ago
beiller 129c7d1ea8
Add repetition penalty (#20)
* Adding repeat penalization

* Update utils.h

* Update utils.cpp

* Numeric fix

Should probably still scale by temp even if penalized

* Update comments, more proper application

I see that numbers can go negative so a fix from a referenced commit

* Minor formatting

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
1 year ago
Georgi Gerganov 007a8f6f45
Support all LLaMA models + change Q4_0 quantization storage 1 year ago
Jean-Michaël Celerier 9dcf4dba45
Add missing headers for memcpy and assert (#3) 1 year ago
Georgi Gerganov 70bc0b8b15
Fix a bug in the rope calculation 1 year ago
Georgi Gerganov 319cdb3e1f
Final touches 1 year ago
Georgi Gerganov 26c0846629
Initial release 1 year ago