22 Commits (5cb63e2493c49bc2c3b9b355696e8dc26cdd0380)

Author SHA1 Message Date
tjohnman 24568371ae
Support for multiple reverse prompts. (#299)
Co-authored-by: Johnman <>
Co-authored-by: Johnman <tjohnman@github>
1 year ago
tjohnman ad5fd5b60c
Make prompt randomization optional. (#300)
Co-authored-by: Johnman <>
1 year ago
slaren 50fae10d03
Add --ignore-eos parameter (#181)
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
1 year ago
Erik Scholz 0b366e7357
Command line switch to use F16 for memory_k and memory_v (refactor of #154) (#294)
* Use F16 for memory_k and memory_v

* add command line switch to use f16 instead of f32 for memory k+v

---------

Co-authored-by: Ty Everett <ty@tyweb.us>
1 year ago
Georgi Gerganov 70f01cb863
Drop trailing new line from file prompts (#80)
1 year ago
Georgi Gerganov 9e1707218a
Add "--instruct" argument for usage with Alpaca (#240)
Also start adding prompts in "./prompts"
1 year ago
Gary Linscott a81d0c2a17
Fix n^2 loop in tokenization (#254)
This causes long prompts to parse very slowly.
1 year ago
thement c9f670a177
Implement non-greedy tokenizer that tries to maximize token lengths (#242)
* Implement non-greedy tokenizer that tries to maximize token lengths

* Insert single space in front of the prompt

- this is to match original llama tokenizer behavior

---------

Co-authored-by: Jakub Horak <jakub.horak@ibawizard.net>
1 year ago
Stephan Walter 367946c668
Don't tell users to use a bad number of threads (#243)
The readme tells people to use the command line option "-t 8", causing 8
threads to be started. On systems with fewer than 8 cores, this causes a
significant slowdown. Remove the option from the example command lines
and use /proc/cpuinfo on Linux to determine a sensible default.
1 year ago
Matvey Soloviev 904d2a8d6a
Q4_1 quantization (#193)
* Add AVX2 version of ggml_vec_dot_q4_1

* Small optimisations to q4_1 dot product (@Const-me)

* Rearrange Q4_1 quantization to work for multipart models. (Fix #152)

* Fix ggml_vec_mad_q4_1 too

* Fix non-vectorised q4_1 vec mul
1 year ago
Justin Suess 2d64715ad4
added ctx_size parameter (#148)
* added ctx_size parameter

* added it in more places

* Apply suggestions from code review

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
1 year ago
Thomas Klausner 41be0a3b3d
Add NetBSD support. (#90)
1 year ago
Matvey Soloviev 96ea727f47
Add interactive mode (#61)
* Initial work on interactive mode.

* Improve interactive mode. Make rev. prompt optional.

* Update README to explain interactive mode.

* Fix OS X build
1 year ago
Ben Garney f385f8dee8
Allow using prompt files (#59)
1 year ago
beiller 02f0c6fe7f
Add back top_k (#56)
* Add back top_k

* Update utils.cpp

* Update utils.h

---------

Co-authored-by: Bill Hamilton <bill.hamilton@shopify.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
1 year ago
Sebastián A eb062bb012
Windows fixes (#31)
* Apply fixes suggested to build on windows

Issue: https://github.com/ggerganov/llama.cpp/issues/22

* Remove unsupported VLAs

* MSVC: Remove features that are only available on MSVC C++20.

* Fix zero initialization of the other fields.

* Change the use of vector for stack allocations.
1 year ago
beiller 129c7d1ea8
Add repetition penalty (#20)
* Adding repeat penalization

* Update utils.h

* Update utils.cpp

* Numeric fix

Should probably still scale by temp even if penalized

* Update comments, more proper application

I see that numbers can go negative so a fix from a referenced commit

* Minor formatting

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
1 year ago
Georgi Gerganov 007a8f6f45
Support all LLaMA models + change Q4_0 quantization storage
1 year ago
Jean-Michaël Celerier 9dcf4dba45
Add missing headers for memcpy and assert (#3)
1 year ago
Georgi Gerganov 70bc0b8b15
Fix a bug in the rope calculation
1 year ago
Georgi Gerganov 319cdb3e1f
Final touches
1 year ago
Georgi Gerganov 26c0846629
Initial release
1 year ago