Trending Papers - Hugging Face
By AK and the research community
Flavors of Moonshine: Tiny Specialized ASR Models for Edge Devices
Monolingual ASR models trained on a balanced mix of high-quality, pseudo-labeled, and synthetic data outperform multilingual models for small model sizes, achieving superior error rates and enabling on-device ASR for underrepresented languages.
· Published on Sep 2, 2025
Very Large-Scale Multi-Agent Simulation in AgentScope
Enhancements to the AgentScope platform improve scalability, efficiency, and ease of use for large-scale multi-agent simulations through distributed mechanisms, flexible environments, and user-friendly tools.
· Published on Jul 25, 2024
Utonia: Toward One Encoder for All Point Clouds
Utonia enables cross-domain point cloud representation learning through a unified self-supervised transformer encoder, enhancing perception and supporting embodied and multimodal reasoning tasks.
Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory
Mem0, a memory-centric architecture with graph-based memory, enhances long-term conversational coherence in LLMs by efficiently extracting, consolidating, and retrieving information, outperforming existing memory systems in terms of accuracy and computational efficiency.
· Published on Apr 28, 2025
AutoDev: Automated AI-Driven Development
AutoDev is an AI-driven software development framework that automates complex engineering tasks within a secure Docker environment, achieving high performance in code and test generation.
- 5 authors
· Published on Mar 13, 2024
dLLM: Simple Diffusion Language Modeling
dLLM is a unified open-source framework that standardizes the core components of diffusion language modeling, supporting reproduction, customization, and accessible development of both large and small models.
Mobile-Agent-v3: Fundamental Agents for GUI Automation
GUI-Owl and Mobile-Agent-v3 are open-source GUI agent models and frameworks that achieve state-of-the-art performance across various benchmarks using innovations in environment infrastructure, agent capabilities, and scalable reinforcement learning.
· Published on Aug 21, 2025
Qwen3-TTS Technical Report
The Qwen3-TTS series comprises advanced multilingual text-to-speech models with voice cloning and controllable speech generation. The models use a dual-track LM architecture and specialized speech tokenizers for efficient streaming synthesis.
Qwen · Published on Jan 22, 2026
Agent READMEs: An Empirical Study of Context Files for Agentic Coding
Agentic coding tools receive goals written in natural language as input, break them down into specific tasks, and write or execute the actual code with minimal human intervention. Central to this process are agent context files ("READMEs for agents") that provide persistent, project-level instructions. In this paper, we conduct the first large-scale empirical study of 2,303 agent context files from 1,925 repositories to characterize their structure, maintenance, and content. We find that these files are not static documentation but complex, difficult-to-read artifacts that evolve like configuration code, maintained through frequent, small additions. Our content analysis of 16 instruction types shows that developers prioritize functional context, such as build and run commands (62.3%), implementation details (69.9%), and architecture (67.7%). We also identify a significant gap: non-functional requirements like security (14.5%) and performance (14.5%) are rarely specified. These findings indicate that while developers use context files to make agents functional, they provide few guardrails to ensure that agent-written code is secure or performant, highlighting the need for improved tooling and practices.
- 11 authors
· Published on Nov 17, 2025
LightRAG: Simple and Fast Retrieval-Augmented Generation
LightRAG improves Retrieval-Augmented Generation by integrating graph structures for enhanced contextual awareness and efficient information retrieval, achieving better accuracy and response times.
- 5 authors
· Published on Oct 8, 2024
MemOS: A Memory OS for AI System
MemOS, a memory operating system for Large Language Models, addresses memory management challenges by unifying plaintext, activation-based, and parameter-level memories, enabling efficient storage, retrieval, and continual learning.
· Published on Jul 4, 2025
Self-Supervised Prompt Optimization
A self-supervised framework optimizes prompts for both closed and open-ended tasks by evaluating LLM outputs without external references, reducing both cost and the amount of data required.
· Published on Feb 7, 2025
FireRed-OCR Technical Report
FireRed-OCR transforms general vision-language models into specialized OCR systems through structured data synthesis and progressive training strategies.
- 22 authors
· Published on Mar 2, 2026
Single-stream Policy Optimization
Single-stream Policy Optimization (SPO) improves policy-gradient training for Large Language Models by eliminating group-based issues and providing a stable, low-variance learning signal, leading to better performance and efficiency.
Tencent · Published on Sep 16, 2025
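The summary contrasts SPO with group-based policy-gradient training. The sketch below illustrates that contrast in generic form only. It is not SPO's published algorithm.

Its first part shows why a group baseline can stall. When every sample drawn for a prompt earns the same reward, the group-standardized advantage is zero, so that group produces no learning signal.

Its second part shows a single-stream alternative in the spirit of the name. Each prompt gets one sample, which is scored against a persistent per-prompt baseline, and the advantages are then normalized across the whole batch.

`PerPromptBaseline`, its exponential-moving-average update, `decay`, and the 0.5 starting value are all hypothetical choices made for this example.

```python
from statistics import mean, pstdev
from typing import Dict, List


def group_relative_advantages(rewards: List[float]) -> List[float]:
    """Group-based advantages: standardize rewards within one prompt's group of samples."""
    mu = mean(rewards)
    sd = pstdev(rewards)
    if sd == 0.0:
        # Every sample got the same reward: the whole group yields zero learning signal.
        return [0.0] * len(rewards)
    return [(r - mu) / sd for r in rewards]


class PerPromptBaseline:
    """Hypothetical persistent baseline: an exponential moving average of each prompt's reward."""

    def __init__(self, decay: float = 0.9, init: float = 0.5) -> None:
        self.decay = decay
        self.init = init  # assumed starting estimate, e.g. for 0/1 rewards
        self.values: Dict[str, float] = {}

    def advantage(self, prompt_id: str, reward: float) -> float:
        baseline = self.values.get(prompt_id, self.init)
        # Update the running estimate after scoring against the old value.
        self.values[prompt_id] = self.decay * baseline + (1.0 - self.decay) * reward
        return reward - baseline


def normalize_batch(advantages: List[float], eps: float = 1e-8) -> List[float]:
    """Standardize advantages across the whole batch rather than within per-prompt groups."""
    mu = mean(advantages)
    sd = pstdev(advantages)
    return [(a - mu) / (sd + eps) for a in advantages]


if __name__ == "__main__":
    # A group where all four samples succeeded gives no gradient signal.
    print(group_relative_advantages([1.0, 1.0, 1.0, 1.0]))
    # One sample per prompt, scored against a persistent baseline.
    tracker = PerPromptBaseline()
    batch = [("p1", 1.0), ("p2", 0.0), ("p3", 1.0)]
    raw = [tracker.advantage(pid, r) for pid, r in batch]
    print(raw, normalize_batch(raw))
```

Normalizing across the batch, instead of within a group, means one prompt whose samples all succeed or all fail still contributes a nonzero advantage, as long as its reward differs from its baseline.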
RAG-Anything: All-in-One RAG Framework
RAG-Anything is a unified framework that enhances multimodal knowledge retrieval by integrating cross-modal relationships and semantic matching, outperforming existing methods on complex benchmarks.
World Action Models are Zero-shot Policies
DreamZero is a World Action Model that leverages video diffusion to enable better generalization of physical motions across novel environments and embodiments compared to vision-language-action models.
MNN: A Universal and Efficient Inference Engine
MNN, a universal and efficient deep learning inference engine for mobile devices, addresses model compatibility, device diversity, and resource limitations through pre-inference, kernel optimization, and backend abstraction.
- 12 authors
· Published on Feb 27, 2020
Text-to-LoRA: Instant Transformer Adaption
Text-to-LoRA (T2L) is a hypernetwork that dynamically adapts large language models using natural language descriptions, enabling efficient and zero-shot task-specific fine-tuning with minimal computational resources.
- 4 authors
· Published on Jun 6, 2025
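Text-to-LoRA is described as a hypernetwork that produces LoRA adapters from a natural-language task description. The plain-Python sketch below illustrates only the standard LoRA update that such an adapter applies to a frozen weight W: h = Wx + (alpha/r)·B(Ax), where A is r × d_in and B is d_out × r.

`toy_hypernetwork` is a hypothetical placeholder with no learned weights. It only shows the shapes a hypernetwork would need to emit and does not reflect T2L's actual architecture.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_forward(W: Matrix, A: Matrix, B: Matrix, x: Vector, alpha: float) -> Vector:
    """Compute (W + (alpha / r) * B A) x without materializing the product B A.

    W is d_out x d_in (frozen), A is r x d_in, B is d_out x r.
    """
    rank = len(A)
    base = matvec(W, x)                # frozen base projection
    delta = matvec(B, matvec(A, x))    # low-rank correction, rank r
    scale = alpha / rank
    return [b + scale * d for b, d in zip(base, delta)]


def toy_hypernetwork(task_embedding: Vector, d_in: int, d_out: int,
                     rank: int) -> Tuple[Matrix, Matrix]:
    """Hypothetical stand-in for a learned hypernetwork.

    Deterministically maps a task-description embedding to LoRA factors
    A (rank x d_in) and B (d_out x rank). A real hypernetwork would be a trained network.
    """
    n = len(task_embedding)
    A = [[0.1 * task_embedding[(i + j) % n] for j in range(d_in)] for i in range(rank)]
    B = [[0.1 * task_embedding[(i * rank + j) % n] for j in range(rank)] for i in range(d_out)]
    return A, B


if __name__ == "__main__":
    W = [[1.0, 0.0], [0.0, 1.0]]
    A, B = toy_hypernetwork([0.3, -1.2, 0.8], d_in=2, d_out=2, rank=1)
    print(lora_forward(W, A, B, [1.0, 2.0], alpha=1.0))
```

Computing B(Ax) instead of forming BA keeps the extra cost proportional to r·(d_in + d_out) per input. This is what makes emitting a new adapter per task cheap compared with producing a full d_out × d_in weight update.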
Qwen3-Coder-Next Technical Report
Qwen3-Coder-Next is an 80-billion-parameter language model that activates only 3 billion parameters during inference, achieving strong coding capabilities through agentic training with verifiable task synthesis and reinforcement learning.
Qwen · Published on Feb 28, 2026
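To put "80 billion parameters, 3 billion active" in proportion, the worked arithmetic below computes the active fraction. It also applies the common rule of thumb that a forward pass costs roughly 2 FLOPs per active parameter per token. That rule is an approximation that ignores attention and routing overhead, and it is not a figure from the report.

$$
\frac{N_{\text{active}}}{N_{\text{total}}} = \frac{3 \times 10^{9}}{80 \times 10^{9}} = 0.0375 = 3.75\%
$$

$$
\text{FLOPs/token} \approx 2N_{\text{active}} = 6 \times 10^{9}
\quad\text{vs.}\quad
2N_{\text{total}} = 1.6 \times 10^{11} \text{ for a dense 80B model}
$$

Under this approximation, per-token compute is about 1/27 of a dense model of the same total size. All 80B parameters must still be held in memory.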