PTQ4SAM、Mamba-Attention、AniTalker、IceFormer、U-DiTs、CogDPM

This article was first published on the WeChat official account 机器感知 (Machine Perception).

PTQ4SAM: Post-Training Quantization for Segment Anything

Segment Anything Model (SAM) has achieved impressive performance in many computer vision tasks. However, as a large-scale model, its immense memory and computation costs hinder practical deployment. In this paper, we propose a post-training quantization (PTQ) framework for the Segment Anything Model, namely PTQ4SAM. First, we investigate the inherent bottleneck of SAM quantization attributed to the bimodal distribution in post-Key-Linear activations. We analyze its characteristics from both per-tensor and per-channel perspectives, and propose a Bimodal Integration strategy, which utilizes a mathematically equivalent sign operation to transform the bimodal distribution into a relatively easy-to-quantize normal distribution offline. Second, SAM encompasses diverse attention mechanisms (i.e., self-attention and two-way cross-attention), resulting in substantial variations in the post-Softmax distributions. Therefore, we introduce an Adaptive Granularity Quantization for Softmax th......
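To make the Bimodal Integration idea concrete, here is a minimal, self-contained sketch, not the paper's implementation (`fold_bimodal`, `uniform_quantize`, and the toy data are ours), showing why folding a sign-symmetric bimodal distribution into a unimodal one reduces uniform-quantization error. In PTQ4SAM the sign is absorbed offline into adjacent linear layers; this sketch only illustrates the distributional effect.

```python
import torch

def fold_bimodal(x: torch.Tensor):
    """Fold a sign-symmetric bimodal tensor into a unimodal one.

    The sign is recorded so the transform stays mathematically
    equivalent: x == sign * x.abs().
    """
    return x.abs(), torch.sign(x)

def uniform_quantize(x: torch.Tensor, n_bits: int = 8) -> torch.Tensor:
    """Asymmetric uniform quantization with round-to-nearest."""
    lo, hi = x.min(), x.max()
    scale = (hi - lo) / (2 ** n_bits - 1)
    return torch.round((x - lo) / scale) * scale + lo

# Toy bimodal activations: two modes centered at -3 and +3.
x = torch.cat([torch.randn(5000) - 3, torch.randn(5000) + 3])

# Quantizing directly wastes levels on the empty gap between the modes;
# folding first roughly halves the dynamic range and the rounding error.
err_direct = (uniform_quantize(x) - x).abs().mean()
folded, sign = fold_bimodal(x)
err_folded = (sign * uniform_quantize(folded) - x).abs().mean()
print(f"direct: {err_direct:.4f}  folded: {err_folded:.4f}")
```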

AniTalker: Animate Vivid and Diverse Talking Faces through Identity-Decoupled Facial Motion Encoding

The paper introduces AniTalker, an innovative framework designed to generate lifelike talking faces from a single portrait. Unlike existing models that primarily focus on verbal cues such as lip synchronization and fail to capture the complex dynamics of facial expressions and nonverbal cues, AniTalker employs a universal motion representation. This representation effectively captures a wide range of facial dynamics, including subtle expressions and head movements. AniTalker enhances motion depiction through two self-supervised learning strategies: the first reconstructs target video frames from source frames of the same identity to learn subtle motion representations, and the second develops an identity encoder using metric learning while actively minimizing mutual information between the identity and motion encoders. This approach ensures that the motion representation is dynamic and devoid of identity-specific details, significantly reducing the n......
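A full mutual-information estimator is beyond an abstract-level note, so the sketch below uses a simple cross-covariance penalty as a stand-in for the paper's mutual-information minimization between the identity and motion encoders; the function name and shapes are our assumptions.

```python
import torch

def decorrelation_penalty(id_emb: torch.Tensor, motion_emb: torch.Tensor) -> torch.Tensor:
    """Penalize statistical dependence between identity and motion embeddings.

    Zero cross-covariance is only a weak proxy for zero mutual information,
    but it is cheap and differentiable.  id_emb: (B, D1), motion_emb: (B, D2).
    """
    id_c = id_emb - id_emb.mean(dim=0)
    mo_c = motion_emb - motion_emb.mean(dim=0)
    cov = id_c.T @ mo_c / (id_emb.shape[0] - 1)   # (D1, D2) cross-covariance
    return (cov ** 2).sum()

# Added to the reconstruction and metric-learning losses, a term like this
# pushes identity-specific detail out of the motion representation.
loss = decorrelation_penalty(torch.randn(32, 128), torch.randn(32, 64))
print(loss)
```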

Matten: Video Generation with Mamba-Attention

In this paper, we introduce Matten, a cutting-edge latent diffusion model with a Mamba-Attention architecture for video generation. With minimal computational cost, Matten employs spatial-temporal attention for local video content modeling and bidirectional Mamba for global video content modeling. Our comprehensive experimental evaluation demonstrates that Matten is competitive with current Transformer-based and GAN-based models on standard benchmarks, achieving superior FVD scores and efficiency. Additionally, we observe a direct positive correlation between the complexity of our designed model and the improvement in video quality, indicating the excellent scalability of Matten. ......
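The abstract does not spell out how the bidirectional Mamba is realized, but the standard recipe for making a causal sequence module bidirectional is to run it forward and on the reversed sequence, then merge the two passes. Below is a minimal sketch of that wrapper under our own assumptions, with a trivial pointwise stand-in where a real Mamba block would go.

```python
import torch
import torch.nn as nn

class Bidirectional(nn.Module):
    """Run a causal sequence module in both directions and sum the results."""

    def __init__(self, make_module):
        super().__init__()
        self.fwd = make_module()
        self.bwd = make_module()

    def forward(self, x):                        # x: (batch, seq_len, dim)
        return self.fwd(x) + self.bwd(x.flip(1)).flip(1)

# Trivial pointwise stand-in; a real setup would drop a Mamba block in here.
make_block = lambda: nn.Sequential(nn.Linear(64, 64), nn.GELU(), nn.Linear(64, 64))
mixer = Bidirectional(make_block)
tokens = torch.randn(2, 16 * 8 * 8, 64)          # 16 frames of 8x8 latent tokens
print(mixer(tokens).shape)                       # torch.Size([2, 1024, 64])
```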

SMCD: High Realism Motion Style Transfer via Mamba-based Diffusion

Motion style transfer is a significant research direction in multimedia applications. It enables the rapid switching of different styles of the same motion for virtual digital humans, thus vastly increasing the diversity and realism of movements. It is widely applied in multimedia scenarios such as movies, games, and the Metaverse. However, most current work in this field adopts GANs, which may lead to instability and convergence issues, making the final generated motion sequence somewhat chaotic and unable to reflect a highly realistic and natural style. To address these problems, we treat style motion as a condition and propose the Style Motion Conditioned Diffusion (SMCD) framework for the first time, which can more comprehensively learn the style features of motion. Moreover, we apply the Mamba model for the first time in the motion style transfer field, introducing the Motion Style Mamba (MSM) module to handle longer motion sequences. Thirdly, aiming at the SMCD......
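"Style motion as a condition" can be realized in several ways; one common pattern is to inject the encoded style sequence into the denoiser through cross-attention. The toy sketch below shows only that generic pattern (class name and shapes are ours); the paper's MSM module is Mamba-based and beyond this sketch.

```python
import torch
import torch.nn as nn

class StyleConditionedDenoiser(nn.Module):
    """Toy denoiser block conditioned on a style motion sequence."""

    def __init__(self, dim: int = 64, heads: int = 4):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, noisy_motion, style_motion):
        h, _ = self.self_attn(noisy_motion, noisy_motion, noisy_motion)
        h = h + noisy_motion
        c, _ = self.cross_attn(h, style_motion, style_motion)  # attend to style
        h = h + c
        return h + self.ff(h)

den = StyleConditionedDenoiser()
noisy = torch.randn(2, 120, 64)   # 120-frame noisy motion latent
style = torch.randn(2, 120, 64)   # style reference motion
print(den(noisy, style).shape)    # torch.Size([2, 120, 64])
```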

IceFormer: Accelerated Inference with Long-Sequence Transformers on CPUs

One limitation of existing Transformer-based models is that they cannot handle very long sequences as input, since their self-attention operations exhibit quadratic time and space complexity. This problem becomes especially acute when Transformers are deployed on hardware platforms equipped only with CPUs. To address this issue, we propose a novel method for accelerating self-attention at inference time that works with pretrained Transformer models out of the box, without requiring retraining. We experiment with our method to accelerate various long-sequence Transformers, including a leading LLaMA 2-based LLM, on various benchmarks, and demonstrate speedups of 2.73x-7.63x while retaining 98.6%-99.6% of the accuracy of the original pretrained models. The code is available on our project website at https://yuzhenmao.github.io/IceFormer/. ......
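The mechanism can be sketched in a few lines: approximate softmax attention by letting each query attend only to its k most relevant keys. IceFormer locates those keys with approximate k-nearest-neighbor search; for clarity this sketch scores all keys and keeps the exact top-k, so it shows the approximation but not the CPU speedup.

```python
import numpy as np

def topk_attention(q, k, v, k_neighbors: int = 32):
    """Softmax attention restricted to the k highest-scoring keys per query.

    q: (n_q, d), k: (n_k, d), v: (n_k, d).  A real accelerated version
    would find the neighbors with an ANN index instead of full scoring.
    """
    scores = q @ k.T / np.sqrt(q.shape[-1])                     # (n_q, n_k)
    idx = np.argpartition(-scores, k_neighbors, axis=-1)[:, :k_neighbors]
    top = np.take_along_axis(scores, idx, axis=-1)              # (n_q, k)
    w = np.exp(top - top.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return np.einsum('qk,qkd->qd', w, v[idx])                   # (n_q, d)

q = np.random.randn(128, 64)
k = np.random.randn(4096, 64)
v = np.random.randn(4096, 64)
print(topk_attention(q, k, v).shape)  # (128, 64)
```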

Efficient Text-driven Motion Generation via Latent Consistency Training

Motion diffusion models have recently proven successful for text-driven human motion generation. Despite their excellent generation performance, they are challenging to run in real time because the multi-step sampling mechanism involves tens or hundreds of repeated function evaluations. To this end, we investigate motion latent consistency training (MLCT) for motion generation, to reduce the computation and time cost of iterative inference. It applies the diffusion pipeline to a low-dimensional motion latent space to mitigate the computational burden of each function evaluation. Explaining the diffusion process with probability-flow ordinary differential equation (PF-ODE) theory, MLCT enables inference in extremely few steps from the prior distribution to the motion latent representation distribution by maintaining consistency of the outputs over the trajectory of the PF-ODE. In particular, we introduce a quantization constraint to optimize motion latent r......
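Below is a minimal sketch of the consistency-training objective in a latent space, assuming a simple variance-exploding schedule sigma(t) = t and a toy network (`TinyDenoiser` and `consistency_loss` are our names, not the paper's): two noisy latents on the same PF-ODE trajectory must map to the same output, with the earlier time routed through a frozen EMA copy, the standard consistency-model recipe.

```python
import torch
import torch.nn as nn

class TinyDenoiser(nn.Module):
    """Stand-in for the motion latent network; MLCT would use a larger net."""
    def __init__(self, dim: int = 32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim + 1, 64), nn.SiLU(), nn.Linear(64, dim))

    def forward(self, z, t):
        return self.net(torch.cat([z, t[:, None]], dim=-1))

def consistency_loss(model, ema_model, z0, t, s, sigma=lambda t: t):
    """Match outputs at two times (t > s) on the same noising trajectory."""
    eps = torch.randn_like(z0)
    z_t = z0 + sigma(t)[:, None] * eps
    z_s = z0 + sigma(s)[:, None] * eps
    with torch.no_grad():                      # EMA target is not trained
        target = ema_model(z_s, s)
    return nn.functional.mse_loss(model(z_t, t), target)

model = TinyDenoiser()
ema = TinyDenoiser(); ema.load_state_dict(model.state_dict())
z0 = torch.randn(8, 32)                        # clean motion latents
t = torch.rand(8) * 0.9 + 0.1
s = t - 0.05
print(consistency_loss(model, ema, z0, t, s))
```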

U-DiTs: Downsample Tokens in U-Shaped Diffusion Transformers

Diffusion Transformers (DiTs) introduce the transformer architecture to diffusion tasks for latent-space image generation. With an isotropic architecture that chains a series of transformer blocks, DiTs demonstrate competitive performance and good scalability; but meanwhile, the abandonment of the U-Net by DiTs and their subsequent improvements is worth rethinking. To this end, we conduct a simple toy experiment comparing a U-Net-architectured DiT with an isotropic one. It turns out that the U-Net architecture gains only a slight advantage from the U-Net inductive bias, indicating potential redundancies within the U-Net-style DiT. Inspired by the discovery that U-Net backbone features are low-frequency-dominated, we perform token downsampling on the query-key-value tuple for self-attention, which brings further improvements despite a considerable reduction in computation. Based on self-attention with downsampled tokens, we propose a series of U-shaped DiTs (U-DiTs) in the ......
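A minimal sketch of the core idea, attention over downsampled tokens, under our own simplifying assumptions: here tokens are simply average-pooled before attention and upsampled after, whereas the actual U-DiT downsampler is more elaborate (it keeps all tokens in shifted subsets rather than discarding them).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DownsampledSelfAttention(nn.Module):
    """Self-attention over 2x-downsampled tokens (illustrative only).

    A 2x2 pool shrinks the token count 4x, cutting the quadratic
    attention cost ~16x; the output is upsampled back to full size.
    """
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):                                    # x: (B, C, H, W)
        B, C, H, W = x.shape
        t = F.avg_pool2d(x, 2).flatten(2).transpose(1, 2)    # (B, HW/4, C)
        out, _ = self.attn(t, t, t)
        out = out.transpose(1, 2).reshape(B, C, H // 2, W // 2)
        return F.interpolate(out, scale_factor=2.0, mode="nearest")

x = torch.randn(2, 64, 16, 16)
print(DownsampledSelfAttention(64)(x).shape)  # torch.Size([2, 64, 16, 16])
```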

From Generalization Analysis to Optimization Designs for State Space Models

A State Space Model (SSM) is a foundation model in time series analysis, which has recently been shown to be an alternative to transformers in sequence modeling. In this paper, we theoretically study the generalization of SSMs and propose improvements to training algorithms based on the generalization results. Specifically, we give a data-dependent generalization bound for SSMs, showing an interplay between the SSM parameters and the temporal dependencies of the training sequences. Leveraging the generalization bound, we (1) set up a scaling rule for model initialization based on the proposed generalization measure, which significantly improves the robustness of the output value scales of SSMs to different temporal patterns in the sequence data; and (2) introduce a new regularization method for training SSMs to enhance generalization performance. Numerical experiments are conducted to validate our results. ......
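For readers coming from the transformer side, the object the bound is about is just the discrete linear state-space recurrence x_t = A x_{t-1} + B u_t, y_t = C x_t. A textbook sketch of that recurrence follows (this is context, not the paper's training algorithm or scaling rule):

```python
import torch

def ssm_forward(A, B, C, u):
    """Minimal discrete linear SSM.

    A: (n, n) state matrix, B: (n, 1) input map, C: (1, n) readout,
    u: (T,) scalar input sequence; returns y: (T,) outputs.
    """
    x = torch.zeros(A.shape[0])
    ys = []
    for u_t in u:
        x = A @ x + (B * u_t).squeeze(-1)   # state update
        ys.append(C @ x)                    # readout
    return torch.stack(ys).squeeze(-1)

n, T = 4, 100
A = 0.9 * torch.eye(n)                      # stable dynamics keep outputs bounded
B, C = torch.randn(n, 1), torch.randn(1, n)
y = ssm_forward(A, B, C, torch.randn(T))
print(y.shape)  # torch.Size([100])
```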

CogDPM: Diffusion Probabilistic Models via Cognitive Predictive Coding

Predictive Coding (PC) is a theoretical framework in cognitive science suggesting that the human brain processes cognition through spatiotemporal prediction of the visual world. Existing studies have developed spatiotemporal prediction neural networks based on PC theory, emulating its two core mechanisms: correcting predictions from residuals and hierarchical learning. However, these models do not show improved prediction skill on real-world forecasting tasks, and they ignore the precision weighting mechanism of PC theory. The precision weighting mechanism posits that the brain allocates more attention to signals with lower precision, contributing to the cognitive ability of human brains. This work introduces Cognitive Diffusion Probabilistic Models (CogDPM), which demonstrate the connection between diffusion probabilistic models and PC theory. CogDPM features a precision estimation method based on the hierarchical sampling capabilities of diffusion models and we......
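The abstract is truncated before the estimator is defined, so the sketch below shows only the generic ensemble idea we can state with confidence: draw several diffusion samples for the same input and read a per-pixel precision map off their disagreement. How CogDPM actually weights fields with such a map follows the paper, not this sketch; the function name is ours.

```python
import torch

def precision_map(samples: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Per-pixel precision (inverse variance) from an ensemble of samples.

    samples: (n_samples, *field_shape) predictions drawn for the same
    input.  Regions where the diffusion samples disagree get low
    precision; the resulting map can then modulate a forecasting loss
    or the model's attention across regions.
    """
    var = samples.var(dim=0, unbiased=False)
    precision = 1.0 / (var + eps)
    return precision / precision.mean()      # normalized for stable scaling

ensemble = torch.randn(8, 64, 64)            # 8 sampled 64x64 forecast fields
print(precision_map(ensemble).shape)         # torch.Size([64, 64])
```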
