From 97ab2b257897bfe7e2ae72876a3e50ed41b8c7ce Mon Sep 17 00:00:00 2001
From: Georgi Gerganov
Date: Tue, 14 Mar 2023 09:43:52 +0200
Subject: [PATCH] Add Misc section + update hot topics + minor fixes

---
 README.md | 21 +++++++++++++++------
 1 file changed, 15 insertions(+), 6 deletions(-)

diff --git a/README.md b/README.md
index e936282..085f19e 100644
--- a/README.md
+++ b/README.md
@@ -5,12 +5,17 @@
 
 Inference of [Facebook's LLaMA](https://github.com/facebookresearch/llama) model in pure C/C++
 
+**Hot topics:**
+
+- Cache input prompts for faster initialization: https://github.com/ggerganov/llama.cpp/issues/64
+- Create a `llama.cpp` logo: https://github.com/ggerganov/llama.cpp/issues/105
+
 ## Description
 
 The main goal is to run the model using 4-bit quantization on a MacBook
 
 - Plain C/C++ implementation without dependencies
-- Apple silicon first-class citizen - optimized via Arm Neon and Accelerate framework
+- Apple silicon first-class citizen - optimized via ARM NEON
 - AVX2 support for x86 architectures
 - Mixed F16 / F32 precision
 - 4-bit quantization support
@@ -174,7 +179,7 @@ Note the use of `--color` to distinguish between user input and generated text.
 
 ## Limitations
 
-- I don't know yet how much the quantization affects the quality of the generated text
+- We don't know yet how much the quantization affects the quality of the generated text
 - Probably the token sampling can be improved
 - The Accelerate framework is actually currently unused since I found that for tensor shapes typical for the Decoder,
   there is no benefit compared to the ARM_NEON intrinsics implementation. Of course, it's possible that I simlpy don't
@@ -187,11 +192,15 @@ Note the use of `--color` to distinguish between user input and generated text.
 - Collaborators can push to branches in the `llama.cpp` repo
 - Collaborators will be invited based on contributions
 
-### Coding guide-lines
+### Coding guidelines
 
 - Avoid adding third-party dependencies, extra files, extra headers, etc.
 - Always consider cross-compatibility with other operating systems and architectures
-- Avoid fancy looking modern STL constructs, use basic for loops, avoid templates, keep it simple
+- Avoid fancy looking modern STL constructs, use basic `for` loops, avoid templates, keep it simple
 - There are no strict rules for the code style, but try to follow the patterns in the code (indentation, spaces, etc.). Vertical alignment makes things more readable and easier to batch edit
-- Clean-up any tailing whitespaces, use 4 spaces indentation, brackets on same line, `int * var`
-- Look at the [good first issues](https://github.com/ggerganov/llama.cpp/issues?q=is%3Aissue+is%3Aopen+label%3A%22good+first+issue%22) for tasks
+- Clean-up any trailing whitespaces, use 4 spaces indentation, brackets on same line, `void * ptr`, `int & a`
+- See [good first issues](https://github.com/ggerganov/llama.cpp/issues?q=is%3Aissue+is%3Aopen+label%3A%22good+first+issue%22) for tasks suitable for first contributions
+
+### Misc
+
+- Practice your C++ typing skills: https://typing-battles.ggerganov.com