Reference:
https://zhuanlan.zhihu.com/p/638427280
Model download:
https://huggingface.co/nyanko7/LLaMA-7B/tree/main
After downloading, create a 7B directory under llama.cpp-master\models\ and place the model files there (the path must match the one passed to convert.py below).
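A minimal sketch of the expected layout, run from the llama.cpp-master root. The filenames are the standard LLaMA-7B checkpoint set from the linked Hugging Face repo; adjust if your download differs:

```shell
# run from the llama.cpp-master root
mkdir -p models/7B
# place consolidated.00.pth and params.json in models/7B/,
# and tokenizer.model in models/ (convert.py looks for it one level up)
```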
1. Convert the 7B model to ggml FP16 format
convert.py is in the llama.cpp-master root directory:
python3 convert.py models/7B/
2. Quantize the model to 4 bits (using the q4_0 method)
quantize.exe is under llama.cpp-master\build\bin\Release; after quantization the model shrinks from about 13 GB to under 4 GB.
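Step 2 names the binary but not the invocation. A minimal sketch of the quantize call, assuming the FP16 file produced by step 1 is named ggml-model-f16.bin (the default in llama.cpp builds of this era; newer builds use .gguf and rename the tool llama-quantize):

```shell
# Windows, from the llama.cpp-master root; output filename is an assumption
build\bin\Release\quantize.exe models\7B\ggml-model-f16.bin models\7B\ggml-model-q4_0.bin q4_0
```

On Linux/macOS the equivalent is the ./quantize binary under build/bin with the same three arguments: input file, output file, quantization method.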