Ollama - 定制化私有大模型

拥有自己的私有大模型

在本教程中，我们将探索如何使用 Ollama 提供的 Modelfile 机制来定制自己的大型语言模型 (LLM)。我们将以 llama3:8b 为例，展示自定义过程。

引言

今天，我将分享如何利用 Modelfile 创建新的模型或调整已有模型，以适应特定的应用场景。这包括内嵌自定义提示、修改上下文长度、温度、随机种子、减少废话程度以及增加或减少输出多样性等（这不是微调，只是调整模型原参数）。

准备工作

在开始定制之前，您需要准备以下几点：

安装 Ollama
下载并初始化一个新的模型版本：ollama pull llama3:8b
确保能成功运行已下载的大语言模型：ollama run llama3:8b

创建 Modelfile

使用 ollama show 命令生成 Modelfile：ollama show llama3:8b --modelfile > myllama3.modelfile

Modelfile 的内容如下所示：

# Modelfile generated by "ollama show"
# To build a new Modelfile based on this, replace FROM with:
# FROM llama3:latest

FROM /Users/yourname/.ollama/models/blobs/sha256-00e1317cbf74d901080d7100f57580ba8dd8de57203072dc6f668324ba545f29
TEMPLATE "{{ if .System }}{{ .System}}{{ end }}"{{ if .Prompt }}{{ .Prompt}}{{ end }}"{{ .Response }}"

PARAMETER stop "reserved_special_token"
LICENSE "META LLAMA 3 COMMUNITY LICENSE AGREEMENT"

例举一些其他模型的modelfile模板:

mixtral

FROM /Users/user/data/ollama-file/models/MiniCPM-Llama3-V-2_5/model/model-Q4_K_M.gguf
TEMPLATE [INST] {{ if .System }}{{ .System }} {{ end }}{{ .Prompt }} [/INST]
PARAMETER stop [INST]
PARAMETER stop [/INST]

LICENSE """ Apache License  Version 2.0, January 2004 """

llava


FROM /Users/user/data/ollama-file/models/MiniCPM-Llama3-V-2_5/model/model-8B-F16.gguf
TEMPLATE "<|im_start|>system
{{ .System }}<|im_end|>
<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
"
PARAMETER stop <|im_start|>
PARAMETER stop <|im_end|>

LICENSE """ Apache License  Version 2.0, January 2004 """

other

FROM /Users/user/.ollama/models/blobs/sha256-eb569aba7d65cf3da1d0369610eb6869f4a53ee369992a804d5810a80e9fa035
TEMPLATE "{{ if .System }}<|start_header_id|>system<|end_header_id|>

{{ .System }}<|eot_id|>{{ end }}{{ if .Prompt }}<|start_header_id|>user<|end_header_id|>

{{ .Prompt }}<|eot_id|>{{ end }}<|start_header_id|>assistant<|end_header_id|>

{{ .Response }}<|eot_id|>"
PARAMETER stop <|start_header_id|>
PARAMETER stop <|end_header_id|>
PARAMETER stop <|eot_id|>
PARAMETER num_ctx 4096
PARAMETER num_keep 4
LICENSE """ Apache License  Version 2.0, January 2004 """

添加系统提示

准备好您的系统提示（System Prompt），例如：


-----英文 Prompt Start------

## Role and Goal:

You are a scientific research paper reviewer, skilled in writing high-quality English scientific research papers. Your main task is to accurately and academically translate Chinese text into English, maintaining the style consistent with English scientific research papers. Users are instructed to input Chinese text directly, which will automatically initiate the translation process into English.

## Constraints:

Input is provided in Markdown format, and the output must also retain the original Markdown format.
Familiarity with specific terminology translations is essential.

## Guidelines:
The translation process involves three steps, with each step's results being printed:
1. Translate the content directly from Chinese to English, maintaining the original format and not omitting any information.
2. Identify specific issues in the direct translation, such as non-native English expressions, awkward phrasing, and ambiguous or difficult-to-understand parts. Provide explanations but do not add content or format not present in the original.
3. Reinterpret the translation based on the direct translation and identified issues, ensuring the content remains true to the original while being more comprehensible and in line with English scientific research paper conventions.

## Clarification:

If necessary, ask for clarification on specific parts of the text to ensure accuracy in translation.

## Personalization:

Engage in a scholarly and formal tone, mirroring the style of academic papers, and provide translations that are academically rigorous.

## Output format:

Please output strictly in the following format

### Direct Translation
{Placeholder}

***

### Identified Issues
{Placeholder}

***

### Reinterpreted Translation
{Placeholder}

Please translate the following content into English:

-----英文 Prompt End------

创建新模型版本

执行命令：

1	ollama create myllama3 -f myllama3.modelfile

完成后将显示 “success”。

检查新建的模型版本

运行 ollama ls，查看是否已成功创建。

运行 ollama run myllama3，查看效果。

如何引入新模型（HF转换成guff文件）

Import a model
llama.cpp
llama定制化(minicpmv)

ModelFile模型文件描述

指令参数

指令	描述
FROM (必填)	定义要使用的基本模型。
PARAMETER	设置 Ollama 如何运行模型的参数。
TEMPLATE	要发送到模型的完整提示模板。
SYSTEM	指定将在模板中设置的系统消息。
ADAPTER	定义要应用于模型的（Q）LoRA 适配器。
LICENSE	指定合法许可证。
MESSAGE	指定消息历史记录。

详细参数设置

参数	描述	值类型	用法示例
mirostat	启用 Mirostat 采样以控制困惑度。（默认值：0、0 = 禁用、1 = 开启）	int	`mirostat=1`
mirostat_eta	影响算法对生成文本反馈的响应速度。较低的学习率将导致调整速度较慢，较高的学习率使算法更具响应性。（默认值：0.1）	float	`mirostat_eta=0.2`
mirostat_tau	控制输出的一致性和多样性的平衡。较低的值将导致文本更加集中连贯。（默认值：5.0）	float	`mirostat_tau=4.0`
num_ctx	设置用于生成下一个标记的上下文窗口大小。（默认值：2048）	int	`num_ctx=4096`
num_gqa	Transformer 层中 GQA 组数。某些型号需要，例如 llama2:70b 为 8	int	`num_gqa=2`
num_gpu	要发送到 GPU 的层数。在 macOS 上，默认为 1 开启金属支持，0 禁用。	int	`num_gpu=1`
num_thread	设置计算期间使用的线程数。默认情况下，Ollama 将检测到这一点以获得最佳性能。建议将此值设置为系统具有的物理 CPU 核心数（而不是逻辑核心数）。	int	`num_thread=4`
repeat_last_n	设置模型回溯多远以防止重复。（默认值：64，0 = 禁用，-1 = num_ctx）	int	`repeat_last_n=32`
repeat_penalty	设置惩罚重复的强度。较高的值（例如，1.5）将更强烈地惩罚重复，较低的值（例如，0.9）将更宽松。（默认值：1.1）	float	`repeat_penalty=1.2`
temperature	模型温度。提高温度将使模型答案更有创意。（默认值：0.8）这个值越低准确率越高。	float	`temperature=0.9`
seed	设置用于生成的随机数种子。将其设置为特定数字将使模型为相同的提示生成相同的文本。（默认值：0）	int	`seed=42`
stop	设置要使用的停止序列。当遇到这种模式时，LLM 将停止生成文本并返回。可以通过 stop 在模型文件中指定多个单独参数来设置多个停止模式。	string	`stop="AI assistant:"`
tfs_z	无尾采样用于减少输出中不太可能的标记影响。较高的值（例如，2.0）将更多地减少影响，1.0 将禁用此设置。（默认值：1）	float	`tfs_z=1.5`
num_predict	生成文本时要预测的最大标记数。（默认值：128，-1 = 无限生成，-2 = 填充上下文）	int	`num_predict=64`
top_k	减少产生废话的可能性。较高的值（例如 100）将给出更多样化答案，较低的值（例如 10）将更加保守。（默认值：40）	int	`top_k=30`
top_p	与 top-k 一起工作。较高的值（例如，0.95）将导致更多样化文本，较低的值（例如，0.5）将生成更集中和保守文本。（默认值：0.9）	float	`top_p=0.8`