Trending Papers - Hugging Face

by AK and the research community

AutoDev: Automated AI-Driven Development

AutoDev is an AI-driven software development framework that automates complex engineering tasks within a secure Docker environment, achieving high performance in code and test generation.

  • 5 authors

· Published on Mar 13, 2024

Submitted by evanking

Flavors of Moonshine: Tiny Specialized ASR Models for Edge Devices

Monolingual ASR models trained on a balanced mix of high-quality, pseudo-labeled, and synthetic data outperform multilingual models for small model sizes, achieving superior error rates and enabling on-device ASR for underrepresented languages.

· Published on Sep 2, 2025

Submitted by parachas

Arch-Router: Aligning LLM Routing with Human Preferences

A preference-aligned routing framework using a compact 1.5B model effectively matches queries to user-defined domains and action types, outperforming proprietary models in subjective evaluation criteria.

· Published on Jun 19, 2025

Submitted by hao-li

Agent READMEs: An Empirical Study of Context Files for Agentic Coding

Agentic coding tools receive goals written in natural language as input, break them down into specific tasks, and write or execute the actual code with minimal human intervention. Central to this process are agent context files ("READMEs for agents") that provide persistent, project-level instructions. In this paper, we conduct the first large-scale empirical study of 2,303 agent context files from 1,925 repositories to characterize their structure, maintenance, and content. We find that these files are not static documentation but complex, difficult-to-read artifacts that evolve like configuration code, maintained through frequent, small additions. Our content analysis of 16 instruction types shows that developers prioritize functional context, such as build and run commands (62.3%), implementation details (69.9%), and architecture (67.7%). We also identify a significant gap: non-functional requirements like security (14.5%) and performance (14.5%) are rarely specified. These findings indicate that while developers use context files to make agents functional, they provide few guardrails to ensure that agent-written code is secure or performant, highlighting the need for improved tooling and practices.

  • 11 authors

· Published on Nov 17, 2025

Submitted by taesiri

Qwen3-TTS Technical Report

The Qwen3-TTS series presents advanced multilingual text-to-speech models with voice cloning and controllable speech generation capabilities, utilizing dual-track LM architecture and specialized speech tokenizers for efficient streaming synthesis.

Qwen · Published on Jan 22, 2026

Submitted by akhaliq

Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory

Mem0, a memory-centric architecture with graph-based memory, enhances long-term conversational coherence in LLMs by efficiently extracting, consolidating, and retrieving information, outperforming existing memory systems in terms of accuracy and computational efficiency.

· Published on Apr 28, 2025

Submitted by taesiri

GLM-5: from Vibe Coding to Agentic Engineering

GLM-5 advances foundation models with DSA for cost reduction, asynchronous reinforcement learning for improved alignment, and enhanced coding capabilities for real-world software engineering.

· Published on Feb 17, 2026

LightRAG: Simple and Fast Retrieval-Augmented Generation

LightRAG improves Retrieval-Augmented Generation by integrating graph structures for enhanced contextual awareness and efficient information retrieval, achieving better accuracy and response times.

  • 5 authors

· Published on Oct 8, 2024

Submitted by Dongchao

HeartMuLa: A Family of Open Sourced Music Foundation Models

A suite of open-source music foundation models is introduced, featuring components for audio-text alignment, lyric recognition, music coding, and large language model-based song generation with controllable attributes and scalable parameterization.

· Published on Jan 15, 2026

Submitted by UglyToilet

MemOS: A Memory OS for AI System

MemOS, a memory operating system for Large Language Models, addresses memory management challenges by unifying plaintext, activation-based, and parameter-level memories, enabling efficient storage, retrieval, and continual learning.

· Published on Jul 4, 2025

Submitted by Rbin

RAG-Anything: All-in-One RAG Framework

RAG-Anything is a unified framework that enhances multimodal knowledge retrieval by integrating cross-modal relationships and semantic matching, outperforming existing methods on complex benchmarks.

Self-Supervised Prompt Optimization

A self-supervised framework optimizes prompts for both closed and open-ended tasks by evaluating LLM outputs without external references, reducing costs and required data.

· Published on Feb 7, 2025

Kronos: A Foundation Model for the Language of Financial Markets

Kronos, a specialized pre-training framework for financial K-line data, outperforms existing models in forecasting and synthetic data generation through a unique tokenizer and autoregressive pre-training on a large dataset.

  • 7 authors

· Published on Aug 2, 2025

Multi-Agent Collaboration via Evolving Orchestration

A centralized orchestrator dynamically directs LLM agents via reinforcement learning, achieving superior multi-agent collaboration in varying tasks with reduced computational costs.

  • 14 authors

· Published on May 26, 2025

Submitted by taesiri

LTX-2: Efficient Joint Audio-Visual Foundation Model

LTX-2 is an open-source audiovisual diffusion model that generates synchronized video and audio content using a dual-stream transformer architecture with cross-modal attention and classifier-free guidance.

· Published on Jan 6, 2026

Submitted by taesiri

FireRed-Image-Edit-1.0 Technical Report

FireRed-Image-Edit uses a diffusion transformer with optimized data curation and training methods to achieve state-of-the-art performance in instruction-based image editing, supported by a comprehensive benchmark and novel techniques for data efficiency and optimization stability.

  • 19 authors

· Published on Feb 12, 2026

Submitted by ChilleD

Adapting Web Agents with Synthetic Supervision

SynthAgent is a synthetic supervision framework that refines both tasks and trajectories to improve data quality and enhance web agent adaptation to new websites.

  • 12 authors

· Published on Nov 8, 2025

Submitted by CSJianYang

Evaluating and Aligning CodeLLMs on Human Preference

A human-curated benchmark (CodeArena) and a large synthetic instruction corpus (SynCode-Instruct) are introduced to evaluate code LLMs based on human preference alignment, revealing performance differences between open-source and proprietary models.

· Published on Dec 6, 2024

Submitted by rajkumarrawal

Recursive Language Models

We study allowing large language models (LLMs) to process arbitrarily long prompts through the lens of inference-time scaling. We propose Recursive Language Models (RLMs), a general inference strategy that treats long prompts as part of an external environment and allows the LLM to programmatically examine, decompose, and recursively call itself over snippets of the prompt. We find that RLMs successfully handle inputs up to two orders of magnitude beyond model context windows and, even for shorter prompts, dramatically outperform the quality of base LLMs and common long-context scaffolds across four diverse long-context tasks, while having comparable (or cheaper) cost per query.

Submitted by unilm

VibeVoice Technical Report

VibeVoice synthesizes long-form multi-speaker speech using next-token diffusion and a highly efficient continuous speech tokenizer, achieving superior performance and fidelity.

Submitted by akhaliq

Very Large-Scale Multi-Agent Simulation in AgentScope

Enhancements to the AgentScope platform improve scalability, efficiency, and ease of use for large-scale multi-agent simulations through distributed mechanisms, flexible environments, and user-friendly tools.

· Published on Jul 25, 2024

Submitted by taesiri

World Action Models are Zero-shot Policies

DreamZero is a World Action Model that leverages video diffusion to enable better generalization of physical motions across novel environments and embodiments compared to vision-language-action models.

Submitted by zhongwenxu

Single-stream Policy Optimization

Single-stream Policy Optimization (SPO) improves policy-gradient training for Large Language Models by eliminating group-based issues and providing a stable, low-variance learning signal, leading to better performance and efficiency.

Tencent · Published on Sep 16, 2025

Submitted by xhyandwyy

Mobile-Agent-v3: Fundamental Agents for GUI Automation

GUI-Owl and Mobile-Agent-v3 are open-source GUI agent models and frameworks that achieve state-of-the-art performance across various benchmarks using innovations in environment infrastructure, agent capabilities, and scalable reinforcement learning.

· Published on Aug 21, 2025

DeepSeek-V3 Technical Report

DeepSeek-V3 is a parameter-efficient Mixture-of-Experts language model using MLA and DeepSeekMoE architectures, achieving high performance with efficient training and minimal computational cost.

DeepSeek · Published on Dec 27, 2024
