llama.cpp

Commit Graph

Author	SHA1	Message	Date
tjohnman	24568371ae	Support for multiple reverse prompts. (#299 ) Co-authored-by: Johnman <> Co-authored-by: Johnman <tjohnman@github>	1 year ago
tjohnman	ad5fd5b60c	Make prompt randomization optional. (#300 ) Co-authored-by: Johnman <>	1 year ago
slaren	50fae10d03	Add --ignore-eos parameter (#181 ) Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	1 year ago
Erik Scholz	0b366e7357	Command line switch to use F16 for memory_k and memory_v (refactor of #154 ) (#294 ) * Use F16 for memory_k and memory_v * add command line switch to use f16 instead of f32 for memory k+v --------- Co-authored-by: Ty Everett <ty@tyweb.us>	1 year ago
Georgi Gerganov	70f01cb863	Drop trailing new line from file prompts (#80 )	1 year ago
Georgi Gerganov	9e1707218a	Add "--instruct" argument for usage with Alpaca (#240 ) Also start adding prompts in "./prompts"	1 year ago
Gary Linscott	a81d0c2a17	Fix n^2 loop in tokenization (#254 ) This causes long prompts to parse very slowly.	1 year ago
thement	c9f670a177	Implement non-greedy tokenizer that tries to maximize token lengths (#242 ) * Implement non-greedy tokenizer that tries to maximize token lengths * Insert single space in front of the prompt - this is to match original llama tokenizer behavior --------- Co-authored-by: Jakub Horak <jakub.horak@ibawizard.net>	1 year ago
Stephan Walter	367946c668	Don't tell users to use a bad number of threads (#243 ) The readme tells people to use the command line option "-t 8", causing 8 threads to be started. On systems with fewer than 8 cores, this causes a significant slowdown. Remove the option from the example command lines and use /proc/cpuinfo on Linux to determine a sensible default.	1 year ago
Matvey Soloviev	904d2a8d6a	Q4_1 quantization (#193 ) * Add AVX2 version of ggml_vec_dot_q4_1 * Small optimisations to q4_1 dot product (@Const-me) * Rearrange Q4_1 quantization to work for multipart models. (Fix #152) * Fix ggml_vec_mad_q4_1 too * Fix non-vectorised q4_1 vec mul	1 year ago
Justin Suess	2d64715ad4	added ctx_size parameter (#148 ) * added ctx_size parameter * added it in more places * Apply suggestions from code review --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	1 year ago
Thomas Klausner	41be0a3b3d	Add NetBSD support. (#90 )	1 year ago
Matvey Soloviev	96ea727f47	Add interactive mode (#61 ) * Initial work on interactive mode. * Improve interactive mode. Make rev. prompt optional. * Update README to explain interactive mode. * Fix OS X build	1 year ago
Ben Garney	f385f8dee8	Allow using prompt files (#59 )	1 year ago
beiller	02f0c6fe7f	Add back top_k (#56 ) * Add back top_k * Update utils.cpp * Update utils.h --------- Co-authored-by: Bill Hamilton <bill.hamilton@shopify.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	1 year ago
Sebastián A	eb062bb012	Windows fixes (#31 ) * Apply fixes suggested to build on windows Issue: https://github.com/ggerganov/llama.cpp/issues/22 * Remove unsupported VLAs * MSVC: Remove features that are only available on MSVC C++20. * Fix zero initialization of the other fields. * Change the use of vector for stack allocations.	1 year ago
beiller	129c7d1ea8	Add repetition penalty (#20 ) * Adding repeat penalization * Update utils.h * Update utils.cpp * Numeric fix Should probably still scale by temp even if penalized * Update comments, more proper application I see that numbers can go negative so a fix from a referenced commit * Minor formatting --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	1 year ago
Georgi Gerganov	007a8f6f45	Support all LLaMA models + change Q4_0 quantization storage	1 year ago
Jean-Michaël Celerier	9dcf4dba45	Add missing headers for memcpy and assert (#3 )	1 year ago
Georgi Gerganov	70bc0b8b15	Fix a bug in the rope calculation	1 year ago
Georgi Gerganov	319cdb3e1f	Final touches	1 year ago
Georgi Gerganov	26c0846629	Initial release	1 year ago

22 Commits (5cb63e2493c49bc2c3b9b355696e8dc26cdd0380)