Llama3-Tutorial之手把手带你评测Llama3能力(OpenCompass 版)

Llama3-Tutorial之手把手带你评测Llama3能力(OpenCompass 版)

Llama 3 近期重磅发布,发布了 8B 和 70B 参数量的模型,opencompass团队对 Llama 3 进行了评测!

书生·浦语和机智流社区同学投稿了 OpenCompass 评测 Llama 3,欢迎 Star。

https://github.com/open-compass/OpenCompass/

https://github.com/SmartFlowAI/Llama3-Tutorial

1. oepncompass评测实战

本小节将带大家手把手用opencompass评测 Llama3。

1.1 环境配置

conda create -n llama3 python=3.10
conda activate llama3

conda install git
apt install git-lfs

1.2 下载 Llama3 模型

首先通过 OpenXLab 下载 Llama-3-8B-Instruct 这个模型。

mkdir -p ~/model
cd ~/model
git clone https://code.openxlab.org.cn/MrCat/Llama-3-8B-Instruct.git Meta-Llama-3-8B-Instruct

或者软链接 InternStudio 中的模型

ln -s /root/share/new_models/meta-llama/Meta-Llama-3-8B-Instruct \
    ~/model

1.3 安装OpenCompass

cd ~
git clone https://github.com/open-compass/opencompass opencompass
cd opencompass
pip install -e .

1.4 数据准备

# 下载数据集到 data/ 处
wget https://github.com/open-compass/opencompass/releases/download/0.2.2.rc1/OpenCompassData-core-20240207.zip
unzip OpenCompassData-core-20240207.zip

# 此时环境当前目录文件如下
(llama3) root@intern-studio-50014188:~/opencompass# ls
LICENSE                            README.md        configs  docs         opencompass.egg-info  requirements.txt  setup.py  tools
OpenCompassData-core-20240207.zip  README_zh-CN.md  data     opencompass  requirements          run.py            tests

1.5 命令行评测

1.5.1 查看配置文件和支持的数据集名称

OpenCompass预定义了许多模型和数据集的配置,你可以通过工具列出所有可用的模型和数据集配置。

# 列出所有配置
python tools/list_configs.py

# 列出所有跟 llama (模型)及 ceval(数据集)相关的配置
python tools/list_configs.py llama ceval

+----------------------------+-------------------------------------------------------+
| Model                      | Config Path                                           |
|----------------------------+-------------------------------------------------------|
| accessory_llama2_7b        | configs/models/accessory/accessory_llama2_7b.py       |
| hf_codellama_13b           | configs/models/codellama/hf_codellama_13b.py          |
| hf_codellama_13b_instruct  | configs/models/codellama/hf_codellama_13b_instruct.py |
| hf_codellama_13b_python    | configs/models/codellama/hf_codellama_13b_python.py   |
| hf_codellama_34b           | configs/models/codellama/hf_codellama_34b.py          |
| hf_codellama_34b_instruct  | configs/models/codellama/hf_codellama_34b_instruct.py |
| hf_codellama_34b_python    | configs/models/codellama/hf_codellama_34b_python.py   |
| hf_codellama_7b            | configs/models/codellama/hf_codellama_7b.py           |
| hf_codellama_7b_instruct   | configs/models/codellama/hf_codellama_7b_instruct.py  |
| hf_codellama_7b_python     | configs/models/codellama/hf_codellama_7b_python.py    |
| hf_gsm8k_rft_llama7b2_u13b | configs/models/others/hf_gsm8k_rft_llama7b2_u13b.py   |
| hf_llama2_13b              | configs/models/hf_llama/hf_llama2_13b.py              |
| hf_llama2_13b_chat         | configs/models/hf_llama/hf_llama2_13b_chat.py         |
| hf_llama2_70b              | configs/models/hf_llama/hf_llama2_70b.py              |
| hf_llama2_70b_chat         | configs/models/hf_llama/hf_llama2_70b_chat.py         |
| hf_llama2_7b               | configs/models/hf_llama/hf_llama2_7b.py               |
| hf_llama2_7b_chat          | configs/models/hf_llama/hf_llama2_7b_chat.py          |
| hf_llama3_70b              | configs/models/hf_llama/hf_llama3_70b.py              |
| hf_llama3_70b_instruct     | configs/models/hf_llama/hf_llama3_70b_instruct.py     |
| hf_llama3_8b               | configs/models/hf_llama/hf_llama3_8b.py               |
| hf_llama3_8b_instruct      | configs/models/hf_llama/hf_llama3_8b_instruct.py      |
| hf_llama_13b               | configs/models/hf_llama/hf_llama_13b.py               |
| hf_llama_30b               | configs/models/hf_llama/hf_llama_30b.py               |
| hf_llama_65b               | configs/models/hf_llama/hf_llama_65b.py               |
| hf_llama_7b                | configs/models/hf_llama/hf_llama_7b.py                |
| llama2_13b                 | configs/models/llama/llama2_13b.py                    |
| llama2_13b_chat            | configs/models/llama/llama2_13b_chat.py               |
| llama2_70b                 | configs/models/llama/llama2_70b.py                    |
| llama2_70b_chat            | configs/models/llama/llama2_70b_chat.py               |
| llama2_7b                  | configs/models/llama/llama2_7b.py                     |
| llama2_7b_chat             | configs/models/llama/llama2_7b_chat.py                |
| llama_13b                  | configs/models/llama/llama_13b.py                     |
| llama_30b                  | configs/models/llama/llama_30b.py                     |
| llama_65b                  | configs/models/llama/llama_65b.py                     |
| llama_7b                   | configs/models/llama/llama_7b.py                      |
+----------------------------+-------------------------------------------------------+
+--------------------------------+------------------------------------------------------------------+
| Dataset                        | Config Path                                                      |
|--------------------------------+------------------------------------------------------------------|
| base_medium_llama              | configs/datasets/collections/base_medium_llama.py                |
| ceval_clean_ppl                | configs/datasets/ceval/ceval_clean_ppl.py                        |
| ceval_contamination_ppl_810ec6 | configs/datasets/contamination/ceval_contamination_ppl_810ec6.py |
| ceval_gen                      | configs/datasets/ceval/ceval_gen.py                              |
| ceval_gen_2daf24               | configs/datasets/ceval/ceval_gen_2daf24.py                       |
| ceval_gen_5f30c7               | configs/datasets/ceval/ceval_gen_5f30c7.py                       |
| ceval_internal_ppl_1cd8bf      | configs/datasets/ceval/ceval_internal_ppl_1cd8bf.py              |
| ceval_ppl                      | configs/datasets/ceval/ceval_ppl.py                              |
| ceval_ppl_1cd8bf               | configs/datasets/ceval/ceval_ppl_1cd8bf.py                       |
| ceval_ppl_578f8d               | configs/datasets/ceval/ceval_ppl_578f8d.py                       |
| ceval_ppl_93e5ce               | configs/datasets/ceval/ceval_ppl_93e5ce.py                       |
| ceval_zero_shot_gen_bd40ef     | configs/datasets/ceval/ceval_zero_shot_gen_bd40ef.py             |
+--------------------------------+------------------------------------------------------------------+
1.5.2 以 C-Eval_gen 为例
python run.py --datasets ceval_gen --hf-path /root/model/Meta-Llama-3-8B-Instruct --tokenizer-path /root/model/Meta-Llama-3-8B-Instruct --tokenizer-kwargs padding_side='left' truncation='left' trust_remote_code=True --model-kwargs trust_remote_code=True device_map='auto' --max-seq-len 2048 --max-out-len 16 --batch-size 4 --num-gpus 1 --debug

运行错误:

alt

遇到错误执行如下安装和设置:

pip install -r requirements.txt
pip install protobuf
export MKL_SERVICE_FORCE_INTEL=1
export MKL_THREADING_LAYER=GNU

遇到 ModuleNotFoundError: No module named 'rouge' 错误请运行:

git clone https://github.com/pltrdy/rouge
cd rouge
python setup.py install

命令解析

python run.py \
--datasets ceval_gen \    # 使用 ceval数据集进行评测
--hf-path /root/model/Meta-Llama-3-8B-Instruct \  # HuggingFace 模型路径
--tokenizer-path /root/model/Meta-Llama-3-8B-Instruct \  # HuggingFace tokenizer 路径(如果与模型路径相同,可以省略)
--tokenizer-kwargs padding_side='left' truncation='left' trust_remote_code=True \  # 构建 tokenizer 的参数
--model-kwargs device_map='auto' trust_remote_code=True \  # 构建模型的参数
--max-seq-len 2048 \  # 模型可以接受的最大序列长度
--max-out-len 16 \  # 生成的最大 token 数
--batch-size 4  \  # 批量大小
--num-gpus 1 \ # 运行模型所需的 GPU 数量
--debug

评测时间约为15min左右。评测完成后输出结果如下:

dataset                                         version    metric         mode      opencompass.models.huggingface.HuggingFace_meta-llama_Meta-Llama-3-8B-Instruct
----------------------------------------------  ---------  -------------  ------  --------------------------------------------------------------------------------
ceval-computer_network                          db9ce2     accuracy       gen                                                                                63.16
ceval-operating_system                          1c2571     accuracy       gen                                                                                63.16
ceval-computer_architecture                     a74dad     accuracy       gen                                                                                52.38
ceval-college_programming                       4ca32a     accuracy       gen                                                                                62.16
ceval-college_physics                           963fa8     accuracy       gen                                                                                42.11
ceval-college_chemistry                         e78857     accuracy       gen                                                                                29.17
ceval-advanced_mathematics                      ce03e2     accuracy       gen                                                                                42.11
ceval-probability_and_statistics                65e812     accuracy       gen                                                                                27.78
ceval-discrete_mathematics                      e894ae     accuracy       gen                                                                                25
ceval-electrical_engineer                       ae42b9     accuracy       gen                                                                                32.43
ceval-metrology_engineer                        ee34ea     accuracy       gen                                                                                62.5
ceval-high_school_mathematics                   1dc5bf     accuracy       gen                                                                                 5.56
ceval-high_school_physics                       adf25f     accuracy       gen                                                                                26.32
ceval-high_school_chemistry                     2ed27f     accuracy       gen                                                                                63.16
ceval-high_school_biology                       8e2b9a     accuracy       gen                                                                                36.84
ceval-middle_school_mathematics                 bee8d5     accuracy       gen                                                                                31.58
ceval-middle_school_biology                     86817c     accuracy       gen                                                                                71.43
ceval-middle_school_physics                     8accf6     accuracy       gen                                                                                57.89
ceval-middle_school_chemistry                   167a15     accuracy       gen                                                                                80
ceval-veterinary_medicine                       b4e08d     accuracy       gen                                                                                52.17
ceval-college_economics                         f3f4e6     accuracy       gen                                                                                45.45
ceval-business_administration                   c1614e     accuracy       gen                                                                                30.3
ceval-marxism                                   cf874c     accuracy       gen                                                                                47.37
ceval-mao_zedong_thought                        51c7a4     accuracy       gen                                                                                50
ceval-education_science                         591fee     accuracy       gen                                                                                51.72
ceval-teacher_qualification                     4e4ced     accuracy       gen                                                                                72.73
ceval-high_school_politics                      5c0de2     accuracy       gen                                                                                68.42
ceval-high_school_geography                     865461     accuracy       gen                                                                                42.11
ceval-middle_school_politics                    5be3e7     accuracy       gen                                                                                57.14
ceval-middle_school_geography                   8a63be     accuracy       gen                                                                                50
ceval-modern_chinese_history                    fc01af     accuracy       gen                                                                                52.17
ceval-ideological_and_moral_cultivation         a2aa4a     accuracy       gen                                                                                78.95
ceval-logic                                     f5b022     accuracy       gen                                                                                40.91
ceval-law                                       a110a1     accuracy       gen                                                                                33.33
ceval-chinese_language_and_literature           0f8b68     accuracy       gen                                                                                34.78
ceval-art_studies                               2a1300     accuracy       gen                                                                                54.55
ceval-professional_tour_guide                   4e673e     accuracy       gen                                                                                55.17
ceval-legal_professional                        ce8787     accuracy       gen                                                                                30.43
ceval-high_school_chinese                       315705     accuracy       gen                                                                                31.58
ceval-high_school_history                       7eb30a     accuracy       gen                                                                                65
ceval-middle_school_history                     48ab4a     accuracy       gen                                                                                59.09
ceval-civil_servant                             87d061     accuracy       gen                                                                                34.04
ceval-sports_science                            70f27b     accuracy       gen                                                                                63.16
ceval-plant_protection                          8941f9     accuracy       gen                                                                                68.18
ceval-basic_medicine                            c409d6     accuracy       gen                                                                                57.89
ceval-clinical_medicine                         49e82d     accuracy       gen                                                                                54.55
ceval-urban_and_rural_planner                   95b885     accuracy       gen                                                                                52.17
ceval-accountant                                002837     accuracy       gen                                                                                44.9
ceval-fire_engineer                             bc23f5     accuracy       gen                                                                                38.71
ceval-environmental_impact_assessment_engineer  c64e2d     accuracy       gen                                                                                45.16
ceval-tax_accountant                            3a5e3c     accuracy       gen                                                                                34.69
ceval-physician                                 6e277d     accuracy       gen                                                                                57.14
ceval-stem                                      -          naive_average  gen                                                                                46.34
ceval-social-science                            -          naive_average  gen                                                                                51.52
ceval-humanities                                -          naive_average  gen                                                                                48.72
ceval-other                                     -          naive_average  gen                                                                                50.05
ceval-hard                                      -          naive_average  gen                                                                                32.65
ceval                                           -          naive_average  gen                                                                                48.63
05/06 16:34:08 - OpenCompass - INFO - write summary to /root/opencompass/outputs/default/20240506_162314/summary/summary_20240506_162314.txt
05/06 16:34:08 - OpenCompass - INFO - write csv to /root/opencompass/outputs/default/20240506_162314/summary/summary_20240506_162314.csv

路径/root/opencompass/outputs/default下存放了评测结果的汇总。

1.6 使用python脚本评测

configs文件夹下添加测试脚本 eval_llama3_8b_demo.py,内容如下:

from mmengine.config import read_base

with read_base():
    from .datasets.mmlu.mmlu_gen_4d595a import mmlu_datasets

datasets = [*mmlu_datasets]

from opencompass.models import HuggingFaceCausalLM

models = [
dict(
type=HuggingFaceCausalLM,
abbr='Llama3_8b'# 运行完结果展示的名称
path='/root/model/Meta-Llama-3-8B-Instruct'# 模型路径
tokenizer_path='/root/model/Meta-Llama-3-8B-Instruct'# 分词器路径
model_kwargs=dict(
device_map='auto',
trust_remote_code=True
),
tokenizer_kwargs=dict(
padding_side='left',
truncation_side='left',
trust_remote_code=True,
use_fast=False
),
generation_kwargs={"eos_token_id": [128001128009]},
batch_padding=True,
max_out_len=100,
max_seq_len=2048,
batch_size=16,
run_cfg=dict(num_gpus=1),
)
]

运行python run.py configs/eval_llama3_8b_demo.py --debug

遇到如下报错,测试结果无数据。待排查原因:

alt

2. opencompass官方仓库及评测结果

opencompass官方已经支持 Llama3。

仓库:

https://github.com/open-compass/opencompass/

评测结果:

alt

本文由 mdnice 多平台发布

本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处:http://xiahunao.cn/news/3016194.html

如若内容造成侵权/违法违规/事实不符,请联系瞎胡闹网进行投诉反馈,一经查实,立即删除!

相关文章

本机MySQL数据库服务启动了,但是cmd登录不上10061

注意:不建议安装MySQL8,建议直接使用phpstudy中自带的MySQL5.7 错误信息 ERROR 2003 (HY000): Cant connect to MySQL server on x.x.x.x (10061) 原因 可能是端口号错误。比如修改了my.ini中,或者phpstudy中数据库端口的配置,…

Star-CCM+分配零部件至区域1-将所有零部件分配至区域

前言 Star-CCM中,在划分网格之前需要将零部件分配至区域,然后才可以划分网格。如下图1所示,分配零部件至区域需要选择创建区域的方式、创建边界的方式以及交界面的类型。 图1 将零部件分配至区域 1 创建区域的方式选择 如下图2所示&#x…

Qt :信号与槽

信号与槽 信号介绍connect 函数使用connect 函数传参问题 定义槽(solt)函数方法一方法二 定义信号关键字 signals、emit 定义带参数的信号和槽参数个数不一致问题断开信号和槽的连接 disconnect lambda 表达式 信号介绍 Qt 中,信号会涉及三个…

Redis之Linux下的安装配置

Redis之Linux下的安装配置 Redis下载 Linux下下载源码安装配置 方式一 官网下载:https://redis.io/download ​ 其他版本下载:https://download.redis.io/releases/ 方式二(推荐) GitHub下载:https://github.com/r…

【GDPU】数据结构实验十 哈夫曼编码

【实验内容】 1、假设用于通信的电文仅由8个字母 {a, b, c, d, e, f, g, h} 构成,它们在电文中出现的概率分别为{ 0.07, 0.19, 0.02, 0.06, 0.32, 0.03, 0.21, 0.10 },试为这8个字母设计哈夫曼编码。 提示:包含两个过程:(1)构建…

python菜鸟级安装教程 -下篇(安装编辑器)

来来~接着上篇的来~ 安装好python.exe之后,我们可以根据cmd命令窗口,码代码。 这算最简单入门了~ 如果我们在安装个编辑器。是什么效果,一起体验一下吧 第一步,下载编辑器,选择官网,下载免费版本入门足…

详细分析Mybatis与MybatisPlus中分页查询的差异(附Demo)

目录 前言1. Mybatis2. MybatisPlus3. 实战 前言 更多的知识点推荐阅读: 【Java项目】实战CRUD的功能整理(持续更新)java框架 零基础从入门到精通的学习路线 附开源项目面经等(超全) 本章节主要以Demo为例&#xff…

Vulnhub项目:ICA: 1

1、靶机介绍 靶机地址:ICA: 1 ~ VulnHub 2、渗透过程 首先,部署好靶机后,进行探测,发现靶机ip和本机ip,靶机ip156,本机ip146。 然后查看靶机ip有哪些端口,nmap一下。 出现22、80、3306端口&a…

书生·浦语大模型实战营之手把手带你评测 Llama 3 能力(OpenCompass 版)

书生浦语大模型实战营之手把手带你评测 Llama 3 能力(OpenCompass 版) 环境配置 conda create -n llama3 python3.10 pytorch torchvision pytorch-cuda -c nvidia -c pytorch -y conda activate llama3conda install git git-lfs install✨下载 Llama3…

贪心问题 难度[普及-]一赏

目录 #小A的糖果 删数问题 陶陶摘苹果(升级版) P5019 NOIP2018 提高组 铺设道路 小A的糖果 原文链接: P3817 小A的糖果 - 洛谷 | 计算机科学教育新生态 (luogu.com.cn) 题目描述 小 A 有 n 个糖果盒,第 i 个盒中有 a_i 颗糖果。 小 A 每…

用PowerPoint创建毛笔字书写动画

先看看下面这个毛笔字书写动画: 这个动画是用PowerPoint创建的。下面介绍创建过程。 1、在任何一款矢量图片编辑软件中创建一个图片,用文字工具输入文字内容。我用的是InkScape。排好版后将图片保存为.svg格式的矢量图片文件。 2、打开PowerPoint&…

openEuler 22.03 GPT分区表模式下磁盘分区管理

目录 GPT分区表模式下磁盘分区管理parted交互式创建分区步骤 1 执行如下步骤对/dev/sdc磁盘分区 非交互式创建分区步骤 1 输入如下命令直接创建分区。 删除分区步骤 1 执行如下命令删除/dev/sdc1分区。 GPT分区表模式下磁盘分区管理 parted交互式创建分区 步骤 1 执行如下步骤…

【刷题篇】双指针(一)

文章目录 1、移动零2、复写零3、快乐数4、盛最多水的容器 1、移动零 给定一个数组 nums,编写一个函数将所有 0 移动到数组的末尾,同时保持非零元素的相对顺序。 请注意 ,必须在不复制数组的情况下原地对数组进行操作。 class Solution { pub…

SRC公益漏洞挖掘思路分享

0x00 前言 第一次尝试挖SRC的小伙伴可能会觉得挖掘漏洞非常困难,没有思路,不知道从何下手,在这里我分享一下我的思路 0x01 挖掘思路 确定自己要挖的漏洞,以及该漏洞可能存在的功能点,然后针对性的进行信息收集 inurl…

[1726]java试飞任务规划管理系统Myeclipse开发mysql数据库web结构java编程计算机网页项目

一、源码特点 java试飞任务规划管理系统是一套完善的java web信息管理系统,对理解JSP java编程开发语言有帮助,系统具有完整的源代码和数据库,系统主要采用B/S模式开发。开发环境为 TOMCAT7.0,Myeclipse8.5开发,数据库为Mysql…

延时任务通知服务的设计及实现(三)-- JDK的延迟队列DelayQueue

一、接着上文 上文我们讲述了使用redisson的RDelayedQueue实现分布式延迟队列,本文我们将自己JDK的延迟队列DelayQueue实现。 相比前者的实现,作为进程内的延迟队列,它会遇到许多技术难点: 如何支持分布式的多个节点部署场景应…

matplotlib和pandas与numpy

1.matplotlib介绍 一个2D绘图库; 2.Pandas介绍: Pandas一个分析结构化数据的工具; 3.NumPy 一个处理n纬数组的包; 4.实践:绘图matplotlip figure()生成一个图像实例 %matplotlib inline:图形直接在…

就业班 第三阶段(redis) 2401--5.7 day2 redis2 哨兵(前提是做好了主从)+redis集群

1、设置密码(redis) 先在redis.conf里面找到这个 后面写上要设置的密码即可 2、哨兵模式 监控redis集群中master状态的的工具 在做了主从的前提下 主 从1 从2 作用 1):Master状态检测 2):如果Master异常,则会进行…

Linux--基础IO(文件描述符fd)

目录 1.回顾一下文件 2.理解文件 下面就是系统调用的文件操作 文件描述符fd,fd的本质是什么? 读写文件与内核级缓存区的关系 据上理论我们就可以知道:open在干什么 3.理解Linux一切皆文件 4.C语言中的FILE* 1.回顾一下文件 先来段代码…

数据结构——链表(精简易懂版)

文章目录 链表概述链表的实现链表的节点(单个积木)链表的构建直接构建尾插法构建头插法构建 链表的插入 总结 链表概述 1,链表(Linked List)是一种常见的数据结构,用于存储一系列元素。它由一系列节点&…