CHiME | MMCSG (Multi-Modal Conversations in Smart Glasses)

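The CHiME Challenge has officially opened. This post introduces one of its sub-tasks, MMCSG (Multi-Modal Conversations in Smart Glasses). Everyone is welcome to register and submit!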

Official challenge website: https://www.chimechallenge.org/current/task3/index

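The CHiME (Computational Hearing in Multisource Environments) Challenge is a major international competition launched in 2011 by well-known research institutions, including the French Institute for Research in Computer Science and Automation, the University of Sheffield in the UK, and Mitsubishi Electric Research Laboratories in the US. It focuses on far-field speech processing, one of the most challenging areas of speech research, and this year marks its eighth edition.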

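The MMCSG (Multi-Modal Conversations in Smart Glasses) dataset contains two-sided conversations recorded with Aria glasses, including multi-channel audio, video, accelerometer, and gyroscope data. It is suited to research on automatic speech recognition, activity detection, and speaker diarization. It also supports multi-modal analysis, for example combining audio with inertial sensor data, to address problems such as speaker identification, activity detection, and speech recognition. The dataset is human-annotated, protects participants' privacy, and is intended to support legitimate research purposes.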

The CHiME-8 MMCSG task involves natural conversations between two people recorded with smart glasses. The goal is to obtain speaker-attributed transcriptions in a streaming fashion, using audio, video, and IMU input modalities. See the dedicated pages on the challenge website to learn more about the data, the rules and evaluation, and the baseline system.

This challenge focuses on transcribing both sides of a conversation in which one participant wears smart glasses equipped with a microphone array and other sensors. The conversations consist of natural, spontaneous speech between two participants, and some of them include noise. Given the use case of real-time captioning, both transcription and diarization need to happen in a streaming fashion with as little latency as possible.

Rules

Summary of the rules for systems participating in the challenge:

  • To build the system, you may use the training subset of the MMCSG dataset and the external data listed in the subsection “Data and pre-trained models”. If you believe a public dataset is missing, you can propose adding it until the deadline specified in the schedule.

  • The development subset of MMCSG can be used for evaluating the system throughout the challenge period, but not for training or automatic tuning of the systems.

  • Only the pre-trained models listed in the “Data and pre-trained models” subsection may be used. If you believe a model is missing, you can propose adding it until the deadline specified in the schedule.

  • The submitted systems must be streaming, i.e. they must process their inputs sequentially in time and specify the latency of each emitted word, as described in detail in the subsection “Evaluation”. They must not use any global information from a recording before processing it in temporal order. Such global information could include global normalizations, non-streaming speaker identification or diarization, etc. This streaming requirement applies to all modalities (audio, visual, accelerometer, gyroscope, etc.).

  • The details of the streaming nature of the system, including any lookahead, chunk-based processing, other details that would impact latency, and an explicit estimate of the average algorithmic and emission latency itself should be clearly described in a section of the submitted system description with the heading “Latency”.

  • For evaluation, each recording must be considered separately. The system should not be in any way fine-tuned on the entire evaluation set (e.g. by computing global statistics, gathering speaker information across multiple recordings). If your system does not comply with these rules (e.g. by using a private dataset or having only a partially streaming method), you may still submit your system, but we will not include it in the final rankings.
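As an illustration of the streaming requirement above, the following minimal sketch contrasts a global mean-variance normalization, which needs the whole recording and would therefore count as global information, with a running variant that only uses frames seen so far. It is not taken from the challenge materials: the function names, the use of scalar features for brevity, and the choice of Welford's running update are all assumptions.

```python
import math
from typing import Iterable, Iterator, List


def global_normalize(frames: List[float]) -> List[float]:
    """NOT streaming: the mean and std are computed over the whole recording first."""
    mean = sum(frames) / len(frames)
    var = sum((x - mean) ** 2 for x in frames) / len(frames)
    std = math.sqrt(var) or 1.0  # avoid division by zero for constant input
    return [(x - mean) / std for x in frames]


def streaming_normalize(frames: Iterable[float], eps: float = 1e-8) -> Iterator[float]:
    """Streaming: each frame is normalized with statistics of past and current frames only."""
    count, mean, m2 = 0, 0.0, 0.0
    for x in frames:
        # Welford's update of the running mean and sum of squared deviations.
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)
        std = math.sqrt(m2 / count)
        yield (x - mean) / (std + eps)
```

With the running variant, the output for the first frames does not change when later frames change, which is the property the rule asks for; the global variant does not have it.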

Baseline System

The baseline system is provided on GitHub. Please refer to the README there for information about how to install and run the system.

The baseline system roughly follows the scheme in (Lin et al, 2023). It comprises:

  • A fixed NLCMV beamformer (Feng et al, 2023) with 13 beams: 12 pointing in directions uniformly spaced around the wearer, plus 1 pointing at the wearer's mouth. The beamformer coefficients are derived from acoustic transfer functions (ATFs) recorded in anechoic rooms with the Aria glasses. We release both the beamforming coefficients and the original ATFs.

  • Extraction of log-mel features from each of the 13 beams.

  • ASR model processing the multi-channel features and estimating serialized-output-training (SOT) (Kanda et al, 2022) transcriptions.
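The beamforming step in the list above can be sketched as follows. This is a hedged illustration, not the baseline code: it assumes the fixed coefficients are applied per frequency bin as a filter-and-sum (`conj(w) * x` summed over microphones), and the function names and the array layout are hypothetical. The convention used by the released coefficients may differ, and the mel filterbank of the feature-extraction step is omitted.

```python
from typing import List

NUM_HORIZONTAL_BEAMS = 12  # from the text: 12 directions around the wearer


def horizontal_beam_angles(num_beams: int = NUM_HORIZONTAL_BEAMS) -> List[float]:
    """Azimuths in degrees for beams uniformly spaced around the wearer."""
    return [i * 360.0 / num_beams for i in range(num_beams)]


def apply_fixed_beamformer(
    weights: List[List[List[complex]]],
    frame: List[List[complex]],
) -> List[List[complex]]:
    """Apply fixed beamforming coefficients to one STFT time frame.

    weights[b][f][m]: coefficient for beam b, frequency bin f, microphone m
                      (assumed layout).
    frame[m][f]:      STFT value of microphone m at frequency bin f.
    Returns out[b][f] = sum over m of conj(weights[b][f][m]) * frame[m][f].
    """
    out = []
    for beam in weights:
        out.append([
            sum(w.conjugate() * frame[m][f] for m, w in enumerate(bin_weights))
            for f, bin_weights in enumerate(beam)
        ])
    return out
```

Because the coefficients are fixed, each time frame can be processed independently as it arrives, which keeps this stage compatible with streaming operation.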

The ASR model is based on a publicly available pre-trained streaming model, the FastConformer Hybrid Transducer-CTC model. By default, this is a single-speaker, single-channel model. We modify it by prepending the beamformer, extending its input to multiple channels, extending the tokenizer with the speaker tokens »0 and »1 (for SELF and OTHER, respectively), and fine-tuning it to produce the SOT transcriptions. The fine-tuning is done on the training subset of the MMCSG dataset.
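To show how speaker tokens in a serialized output can be turned into speaker-attributed words, here is a small sketch. It assumes the model output is already a list of word-level tokens in which »0 or »1 switches the active speaker; the function name `split_sot` and the `UNKNOWN` label for words seen before any speaker token are hypothetical and not part of the baseline.

```python
from typing import List, Optional, Tuple

# Speaker tokens as described in the text: »0 for SELF, »1 for OTHER.
SPEAKER_TOKENS = {"»0": "SELF", "»1": "OTHER"}


def split_sot(tokens: List[str]) -> List[Tuple[str, str]]:
    """Attribute each word to the speaker named by the most recent speaker token."""
    attributed: List[Tuple[str, str]] = []
    speaker: Optional[str] = None
    for token in tokens:
        if token in SPEAKER_TOKENS:
            # A speaker token changes the active speaker and emits no word.
            speaker = SPEAKER_TOKENS[token]
        else:
            attributed.append((speaker or "UNKNOWN", token))
    return attributed


if __name__ == "__main__":
    for speaker, word in split_sot("»0 hello there »1 hi »0 how are you".split()):
        print(speaker, word)
```

The loop consumes one token at a time and never looks ahead, so the same logic can run on a token stream as the model emits it.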

The results achieved by the baseline system on the dev subset of MMCSG with several different latency settings are summarized in the following table:

Submission

The details of the submission will be defined soon.

During the submission, we ask the participants to submit:

  • the word error rates (including the break-down into substitutions, insertions, deletions and speaker attributions) for SELF and OTHER on dev and eval subsets

  • the computed mean latency on dev and eval subsets

  • the hypotheses files for each recording of dev and eval subsets

  • the hypotheses files from the timestamp test on perturbed and unperturbed files
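For readers unfamiliar with the word-error-rate breakdown requested above, the sketch below computes substitutions, insertions, and deletions with a standard edit-distance alignment. It is only an illustration: speaker attribution errors and latency are defined by the challenge's evaluation procedure and are not modeled here, the function name `wer_breakdown` is hypothetical, and the official scoring tools should be used for actual submissions.

```python
from typing import Dict, List


def wer_breakdown(reference: List[str], hypothesis: List[str]) -> Dict[str, float]:
    """Word error rate with counts of substitutions, insertions and deletions."""
    # Each cell holds (total_errors, substitutions, insertions, deletions)
    # for aligning a prefix of the reference with a prefix of the hypothesis.
    prev = [(j, 0, j, 0) for j in range(len(hypothesis) + 1)]
    for i, ref_word in enumerate(reference, start=1):
        curr = [(i, 0, 0, i)]
        for j, hyp_word in enumerate(hypothesis, start=1):
            if ref_word == hyp_word:
                diagonal = prev[j - 1]  # correct word, no new error
            else:
                e, s, n, d = prev[j - 1]
                diagonal = (e + 1, s + 1, n, d)  # substitution
            e, s, n, d = curr[j - 1]
            insertion = (e + 1, s, n + 1, d)
            e, s, n, d = prev[j]
            deletion = (e + 1, s, n, d + 1)
            # Tuples compare on total_errors first, so this keeps a minimum-cost path.
            curr.append(min(diagonal, insertion, deletion))
        prev = curr
    errors, subs, ins, dels = prev[-1]
    return {
        "wer": errors / max(len(reference), 1),
        "substitutions": subs,
        "insertions": ins,
        "deletions": dels,
    }
```

For the SELF and OTHER results, one would call this separately on the words attributed to each speaker.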

Important dates

2024.2.15: Challenge begins; release of train and dev datasets and baseline system

2024.3.20: Deadline for proposing additional public datasets and pre-trained models

2024.6.15: Evaluation set released

2024.6.28: Deadline for submitting results

2024.6.28: Deadline for submitting technical reports

2024.9.6: CHiME-8 Workshop
