Commit Graph

68 Commits (cbddf4661be736c299598ee37ceb662cdf7bdd7c)
 

Author SHA1 Message Date
Justine Tunney cbddf4661b
Get mmap() working with WIN32 MSVC
- We have pretty high quality POSIX polyfills now
- We no longer need to override malloc()

Tracked by issue #91
Improves upon #341
1 year ago
oKatanaaa e4881686b4
Make WIN32 mmap() improvements (#341)
Still not fully working yet.

Closes #341
1 year ago
Justine Tunney 0b5448a3a4
Implement system polyfill for win32 / posix.1
I don't have access to Microsoft Visual Studio right now (aside from
the GitHub Actions CI system), but I think this code should come close to
what we want in terms of polyfilling UNIX functionality.
1 year ago
Justine Tunney 5b8023d935
Implement prototype for instant mmap() loading
This change uses a custom malloc() implementation to transactionally
capture the dynamic memory created during the loading process to a file.
That includes (1) the malloc() allocation for mem_buffer and (2) all
the C++ STL objects. On my $1000 personal computer, this change lets
me run ./main to generate a single token (-n 1) using the float16 7B
model (~12 GB) in one second. In order to do that, there's a one-time
cost where a 13 GB file needs to be generated. This change rocks,
but it shouldn't be necessary to do something this heroic. We should
instead change the file format, so that tensors don't need reshaping
and realignment in order to be loaded.
1 year ago
Justine Tunney 2788f373be
Get the build working 1 year ago
Ronsor 47857e564c
Don't use vdotq_s32 if it's not available (#139)
* Don't use vdotq_s32 if it's not available

`dotprod` extensions aren't available on some ARM CPUs (e.g. Raspberry Pi 4), so check for them and only use them if they're available.

Reintroduces the code removed in 84d9015 if `__ARM_FEATURE_DOTPROD` isn't defined.

* Update ggml.c

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
1 year ago
Radoslav Gerganov 60f819a2b1
Add section to README on how to run the project on Android (#130) 1 year ago
Georgi Gerganov 97ab2b2578
Add Misc section + update hot topics + minor fixes 1 year ago
Sebastián A 2f700a2738
Add windows to the CI (#98) 1 year ago
Georgi Gerganov c09a9cfb06
CMake build in Release by default (#75) 1 year ago
Georgi Gerganov 7ec903d3c1
Update contribution section, hot topics, limitations, etc. 1 year ago
Georgi Gerganov 4497ad819c
Print system information 1 year ago
Sebastián A ed6849cc07
Initial support for CMake (#75) 1 year ago
Thomas Klausner 41be0a3b3d
Add NetBSD support. (#90) 1 year ago
Pavol Rusnak 671d5cac15
Use fprintf for diagnostic output (#48)
keep printf only for printing model output

one can now use ./main ... 2>/dev/null to suppress any diagnostic output
1 year ago
Georgi Gerganov 84d9015c4a
Use vdotq_s32 to improve performance (#67)
* 10% performance boost on ARM

* Back to original change
1 year ago
uint256_t 63fd76fbb0
Reduce model loading time (#43)
* Use buffering

* Use vector

* Minor

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
1 year ago
Val Kharitonov 2a20f48efa
Fix UTF-8 handling (including colors) (#79) 1 year ago
Pavol Rusnak d1f224712d
Add quantize script for batch quantization (#92)
* Add quantize script for batch quantization

* Indentation

* README for new quantize.sh

* Fix script name

* Fix file list on Mac OS

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
1 year ago
Georgi Gerganov 1808ee0500
Add initial contribution guidelines 1 year ago
Matvey Soloviev a169bb889c
Gate signal support on being on a unixoid system. (#74) 1 year ago
Matvey Soloviev 460c482540
Fix token count accounting 1 year ago
Georgi Gerganov c80e2a8f2a
Revert "10% performance boost on ARM"
This reverts commit 113a9e83eb.

There are some reports of illegal instructions.
Moved this stuff to the vdotq_s32 branch until resolved.
1 year ago
Georgi Gerganov 54a0e66ea0
Check for vdotq_s32 availability 1 year ago
Georgi Gerganov 543c57e991
Amend to previous commit - forgot to update non-QRDMX branch 1 year ago
Georgi Gerganov 113a9e83eb
10% performance boost on ARM 1 year ago
Matvey Soloviev 404fac0d62
Fix color getting reset before prompt output done (#65)
(cherry picked from commit 7eb2987619feee04c40eff69b604017d09919cb6)
1 year ago
Georgi Gerganov 1a0a74300f
Update README.md 1 year ago
Matvey Soloviev 96ea727f47
Add interactive mode (#61)
* Initial work on interactive mode.

* Improve interactive mode. Make rev. prompt optional.

* Update README to explain interactive mode.

* Fix OS X build
1 year ago
Marc Köhlbrugge 9661954835
Fix typo in README (#45) 1 year ago
Ben Garney f385f8dee8
Allow using prompt files (#59) 1 year ago
beiller 02f0c6fe7f
Add back top_k (#56)
* Add back top_k

* Update utils.cpp

* Update utils.h

---------

Co-authored-by: Bill Hamilton <bill.hamilton@shopify.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
1 year ago
Sebastián A eb062bb012
Windows fixes (#31)
* Apply fixes suggested to build on windows

Issue: https://github.com/ggerganov/llama.cpp/issues/22

* Remove unsupported VLAs

* MSVC: Remove features that are only available on MSVC C++20.

* Fix zero initialization of the other fields.

* Change the use of vector for stack allocations.
1 year ago
Georgi Gerganov 7027a97837
Update README.md 1 year ago
Georgi Gerganov 2d555e5b42
Add CI (#60) 1 year ago
Georgi Gerganov 7c9e54e55e
Revert "weights_only" arg - this is causing more trouble than it helps 1 year ago
Oleksandr Nikitin b9bd1d0141
python/pytorch compat notes (#44) 1 year ago
beiller 129c7d1ea8
Add repetition penalty (#20)
* Adding repeat penalization

* Update utils.h

* Update utils.cpp

* Numeric fix

Should probably still scale by temp even if penalized

* Update comments, more proper application

I see that numbers can go negative, so this applies a fix from a referenced commit

* Minor formatting

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
1 year ago
Georgi Gerganov 702fddf5c5
Clarify meaning of hacking 1 year ago
Georgi Gerganov 7d86e25bf6
README: add "Supported platforms" + update hot topics 1 year ago
deepdiffuser a93120236f
use weights_only in conversion script (#32)
This prevents malicious weights from executing arbitrary code by restricting the unpickler to loading only tensors, primitive types, and dictionaries.
1 year ago
Pavol Rusnak 6a9a67f0be
Add LICENSE (#21) 1 year ago
Georgi Gerganov da1a4ff01f
Update README.md 1 year ago
Juraj Bednar 6b2cb6302f
Fix a typo in model name (#16) 1 year ago
Georgi Gerganov 4235e3d5b3
Update README.md 1 year ago
Georgi Gerganov f1eaff4721
Add AVX2 support for x86 architectures thanks to @Const-me! 1 year ago
Georgi Gerganov a9e58529ea
Fix uninitialized FP16 tables on x86 (#15, #2) 1 year ago
Georgi Gerganov 7d9ed7b25f
Bump memory buffer 1 year ago
Georgi Gerganov 0c6803321c
Update README.md 1 year ago
Georgi Gerganov f60fa9e50a
.gitignore models/ 1 year ago