Commit Graph

494 Commits (e216aa04633892b972d013719e38b59fd4917341)
 

Author SHA1 Message Date
Evan Jones e216aa0463
llama : only copy used KV cache in get / set state (#1272)
* llama : only copy used KV cache in get / set state

* switch to ggml for copying k, v

* avoid designated initializers
1 year ago
DannyDaemonic 2485d7a4d3
Process escape sequences given in prompts (#1173) 1 year ago
DannyDaemonic 13b0c68ed7
Handle signals properly on Windows (#1123) 1 year ago
DannyDaemonic 55bc5f0900
Call sh on build-info.sh (#1294) 1 year ago
kuvaus 9daff419f6
fix build-info.h for git submodules (#1289)
* make git build info work with submodules

---------

Co-authored-by: Green Sky <green@g-s.xyz>
1 year ago
slaren bf4b22ffe4
fix missing parameters in `llama_init_from_gpt_params` (#1293) 1 year ago
Ron Evans 67c77799e0
examples : add llama_init_from_gpt_params() common function (#1290)
Signed-off-by: deadprogram <ron@hybridgroup.com>
1 year ago
Georgi Gerganov 0e6cbff1b7
llama : fix compile warnings 1 year ago
Georgi Gerganov 5d5817ca60
ggml : fix 32-bit ARM 1 year ago
Ron Evans 8c9be35ff9
examples : improve vertical alignment of a few variables (#1286)
Signed-off-by: deadprogram <ron@hybridgroup.com>
1 year ago
Marvin Gießing cc0bb7235c
ggml : fix ppc64le build error and make cmake detect Power processors (#1284)
* Fix ppc64le build issue

* Added support to detect ppc64* processors
1 year ago
Robert Brisita 2bb992f034
llama : allow 0 as a seed number. (#1275) 1 year ago
Ron Evans e2cd506999
main : switch input_noecho to input_echo to remove negation (#979)
Signed-off-by: deadprogram <ron@hybridgroup.com>
1 year ago
slaren 2d099e5193
ggml: add names to tensors (#1268)
* ggml: add names to tensors

* minor improvements to dot file formatting
1 year ago
DannyDaemonic f4cef87edf
Add git-based build information for better issue tracking (#1232)
* Add git-based build information for better issue tracking

* macOS fix

* "build (hash)" and "CMAKE_SOURCE_DIR" changes

* Redo "CMAKE_CURRENT_SOURCE_DIR" and clearer build messages

* Fix conditional dependency on missing target

* Broke out build-info.cmake, added find_package fallback, and added build info to all examples, added dependencies to Makefile

* 4 space indenting for cmake, attempt to clean up my mess in Makefile

* Short hash, less fancy Makefile, and don't modify build-info.h if it wouldn't change it
1 year ago
slaren 58b367c2d7
cuBLAS: refactor and optimize f16 mat mul performance (#1259)
* cuBLAS: refactor, convert fp16 to fp32 on device

* cuBLAS: use multiple streams, choose smartly between mul_mat_q and mul_mat_f16

* fix build

* cuBLAS: update block_q5_1
1 year ago
xloem ea3a0ad6b6
llama : update stubs for systems without mmap and mlock (#1266)
Co-authored-by: John Doe <john.doe@example.com>
1 year ago
Kerfuffle 2bdc09646d
ggml : fix ggml_used_mem() (#1264) 1 year ago
Georgi Gerganov 70269cae37
llama : fix session load / save (#1263) 1 year ago
slaren b925f1f1b0
cuBLAS: fall back to pageable memory if pinned alloc fails (#1233)
* cuBLAS: fall back to pageable memory if pinned alloc fails

* cuBLAS: do not use pinned memory if env variable GGML_CUDA_NO_PINNED is set
1 year ago
Alex Klinkhamer 90b19bd6ee
llama : let context be const when accessing const data (#1261) 1 year ago
Georgi Gerganov 7ff0dcd320
ggml : fix UB (int << 31) 1 year ago
Pavol Rusnak 6f79699286
build: add armv{6,7,8} support to cmake (#1251)
- flags copied from Makefile
- updated comments in both CMakeLists.txt and Makefile to match reality
1 year ago
jon-chuang a5d30b1f53
common : better default number of threads (#934)
* commit

* fix

* try-catch

* apply code review

* improve

* improve

* add macos headers

* done

* remove color

* fix windows

* minor

* fix

* Apply suggestions from code review

Co-authored-by: DannyDaemonic <DannyDaemonic@gmail.com>

* remove

* minor

* minor

---------

Co-authored-by: jon-chuang <jon-chuang@users.noreply.github.com>
Co-authored-by: DannyDaemonic <DannyDaemonic@gmail.com>
1 year ago
0cc4m 76a884920a
ggml : add CLBlast q5_0, q5_1, q8_0 dequant kernels (#1225)
* Implement q5_0, q5_1 and q8_0

* Work around q5_0 OpenCL issue

* Fix q8_0 dequant kernel

* Move cl kernels into ggml-opencl.c

* Use two memcpy calls for q5_0 buffer transfer
1 year ago
Georgi Gerganov 6bc4400e67
ggml : add Q5 WASM SIMD + GGML_FTYPE 1 year ago
Stephan Walter f0d70f147d
Various fixes to mat_mul benchmark (#1253) 1 year ago
Georgi Gerganov 3e5aa8a1c4
ggml : fix labels for GGML_OP_ALIBI 1 year ago
Georgi Gerganov c3ca7a5f05
ggml : fix 32-bit ARM NEON 1 year ago
Georgi Gerganov e8c051611a
ggml : use vzip instead of vuzp for consistency 1 year ago
Georgi Gerganov 0b5a935099
ggml : fix visibility and unused warnings 1 year ago
Georgi Gerganov ec728e44d7
ggml : fix #if for f32_f32 mul_mat (CLBlast) (#1229) 1 year ago
Georgi Gerganov 214b6a3570
ggml : adjust mul_mat_f16 work memory (#1226)
* llama : minor - remove explicit int64_t cast

* ggml : reduce memory buffer for F16 mul_mat when not using cuBLAS

* ggml : add asserts to guard for incorrect wsize
1 year ago
Georgi Gerganov 305eb5afd5
build : fix reference to old llama_util.h 1 year ago
Georgi Gerganov 84ca9c2ecf
examples : fix save-load-state + rename llama-util.h 1 year ago
Georgi Gerganov 334637e43e
common : change default parameters to pre-#1126 (#1223) 1 year ago
Ivan Stepanov dd7eff57d8
llama : new sampling algorithms (#1126)
* Sample interface, new samplers.

New samplers:
- locally typical sampling
- tail free sampling
- frequency and presence penalty
- mirostat

Ignore EOS fix: -inf should be used.

* mirostat

* Added --logit-bias and --no-penalize-nl, removed std::span

* Use C++11, clarify llama API documentation, rename Mirostat parameters to --mirostat_lr and --mirostat_ent, add temperature sampling for Mirostat, simplify Mirostat sampling API parameters (removed N and *k)

* Save and load example adjust

* Tests

* Windows build fix

* Windows test fix
1 year ago
slaren 7fc50c051a
cuBLAS: use host pinned memory and dequantize while copying (#1207)
* cuBLAS: dequantize simultaneously while copying memory

* cuBLAS: use host pinned memory

* cuBLAS: improve ggml_compute_forward_mul_mat_f16_f32 with pinned memory

* cuBLAS: also pin kv cache

* fix rebase
1 year ago
Henri Vasserman b1ee8f59b4
cuBLAS: non-contiguous tensor support (#1215)
* Cuda: non-contiguous tensor support

* remove extra stuff

* rename

* fix error

* more fixes, now OpenBLAS and CLBlast build too

* now then?
1 year ago
Stephan Walter 36d19a603b
Remove Q4_3 which is no better than Q5 (#1218) 1 year ago
Georgi Gerganov 7f15c5c477
readme : update hot topics 1 year ago
Georgi Gerganov 55390bcaf2
ggml : sync ggml (ggml_alibi) 1 year ago
CRD716 5fba3c016b
examples : add Jeopardy example (#1168)
* Basic Setup

* Prevent Results.txt from coming up

* Prefixes, Line separators, etc

* editorcheck

* introduction to give more consistent results

* Basic graph thing

* Grading, ready for testing!

* Y'all ready to get funky?

* fix column removal stuff

* missed a few
1 year ago
Evan Jones 1481a9cf25
llama : add session file format and saved sessions in main (#1169) 1 year ago
Georgi Gerganov 11d902364b
ggml : add helper debug printf in soft_max 1 year ago
0cc4m 7296c961d9
ggml : add CLBlast support (#1164)
* Allow use of OpenCL GPU-based BLAS using CLBlast instead of OpenBLAS for context processing

* Improve CLBlast implementation, avoid recreating buffers, remove redundant transfers

* Finish merge of CLBlast support

* Move CLBlast implementation to separate file

Add buffer reuse code (adapted from slaren's cuda implementation)

* Add q4_2 and q4_3 CLBlast support, improve code

* Double CLBlast speed by disabling OpenBLAS thread workaround

Co-authored-by: Concedo <39025047+LostRuins@users.noreply.github.com>
Co-authored-by: slaren <2141330+slaren@users.noreply.github.com>

* Fix device selection env variable names

* Fix cast in opencl kernels

* Add CLBlast to CMakeLists.txt

* Replace buffer pool with static buffers a, b, qb, c

Fix compile warnings

* Fix typos, use GGML_TYPE defines, improve code

* Improve btype dequant kernel selection code, add error if type is unsupported

* Improve code quality

* Move internal stuff out of header
* Use internal enums instead of CLBlast enums
* Remove leftover C++ includes and defines
* Make event use easier to read

Co-authored-by: Henri Vasserman <henv@hot.ee>

* Use c compiler for opencl files

* Simplify code, fix include

* First check error, then release event

* Make globals static, fix indentation

* Rename dequant kernels file to conform with other file names

* Fix import cl file name

---------

Co-authored-by: Concedo <39025047+LostRuins@users.noreply.github.com>
Co-authored-by: slaren <2141330+slaren@users.noreply.github.com>
Co-authored-by: Henri Vasserman <henv@hot.ee>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
1 year ago
Folko-Ven 78ec543733
Correcting link to w64devkit (#1214)
Correcting link to w64devkit (change seeto to skeeto).
1 year ago
Johannes Gäßler 92a6e13a31
Add Manjaro CUDA include and lib dirs to Makefile (#1212) 1 year ago
Yann Follet 04aaae1d79
add avx2 for dot_q8_0_q8_0, 2x faster than scalar (#1211) 1 year ago
Stephan Walter 0b2da20538
ggml : slightly faster AVX2 implementation for Q5 (#1197) 1 year ago