AI Agent - AI News - AI News Hub

Agent March 4, 2026

Multimodal Multi-Agent Ransomware Analysis Using AutoGen

A novel multimodal, multi-agent AI framework demonstrates superior ransomware classification with a Macro-F1 score of up...

arXiv cs.LG Read more →

Agent March 4, 2026

Multimodal Multi-Agent Ransomware Analysis Using AutoGen

A novel multimodal multi-agent AI framework enhances ransomware detection by integrating static, dynamic, and network an...

arXiv cs.LG Read more →

Agent March 4, 2026

Multimodal Multi-Agent Ransomware Analysis Using AutoGen

A novel multimodal multi-agent AI framework using AutoGen achieves superior ransomware classification with a Macro-F1 sc...

arXiv cs.LG Read more →

Agent March 4, 2026

Personalized Collaborative Learning with Affinity-Based Variance Reduction

Affinity-based Personalized Collaborative Learning (AffPCL) is a novel AI framework for heterogeneous multi-agent system...

arXiv cs.LG Read more →

Agent March 4, 2026

Personalized Collaborative Learning with Affinity-Based Variance Reduction

Personalized Collaborative Learning (PCL) is a novel framework that resolves the conflict between collaboration and pers...

arXiv cs.LG Read more →

Agent March 4, 2026

Personalized Collaborative Learning with Affinity-Based Variance Reduction

Personalized Collaborative Learning (PCL) is a novel multi-agent AI framework that resolves the tension between collabor...

arXiv cs.LG Read more →

Agent March 4, 2026

Personalized Collaborative Learning with Affinity-Based Variance Reduction

Affinity-based Personalized Collaborative Learning (AffPCL) is a novel framework that enables heterogeneous AI agents to...

arXiv cs.LG Read more →

Agent March 4, 2026

Learning Acrobatic Flight from Preferences

Researchers developed the Reward Ensemble under Confidence (REC) framework that enables AI to learn complex acrobatic dr...

arXiv cs.LG Read more →

Agent March 4, 2026

Learning Acrobatic Flight from Preferences

A new probabilistic framework called Reward Ensemble under Confidence (REC) enables autonomous drones to learn complex a...

arXiv cs.LG Read more →

Agent March 4, 2026

Learning Acrobatic Flight from Preferences

The Reward Ensemble under Confidence (REC) framework enables AI agents to master complex acrobatic drone flight by learn...

arXiv cs.LG Read more →

Agent March 4, 2026

Learning Acrobatic Flight from Preferences

The Reward Ensemble under Confidence (REC) framework enables autonomous drones to master acrobatic maneuvers by learning...

arXiv cs.LG Read more →

Agent March 4, 2026

Network Topology Optimization via Deep Reinforcement Learning

A novel deep reinforcement learning algorithm called DRL-GS automates network topology optimization by efficiently searc...

arXiv cs.LG Read more →

Agent March 4, 2026

Tell Me What To Learn: Generalizing Neural Memory to be Controllable in Natural Language

Researchers have developed a generalized neural memory system that enables AI models to be instructed on what to learn a...

arXiv cs.LG Read more →

Agent March 4, 2026

Tell Me What To Learn: Generalizing Neural Memory to be Controllable in Natural Language

Researchers have developed a generalized neural memory system that allows AI models to be instructed via natural languag...

arXiv cs.LG Read more →

Agent March 4, 2026

Near-Constant Strong Violation and Last-Iterate Convergence for Online CMDPs via Decaying Safety Margins

The FlexDOME algorithm is the first to provably achieve near-constant strong constraint violation alongside sublinear st...

arXiv cs.LG Read more →

Agent March 4, 2026

Distributional value gradients for stochastic environments

Distributional Sobolev Training is a reinforcement learning framework that models distributions over both value function...

arXiv cs.LG Read more →

Agent March 4, 2026

Distributional value gradients for stochastic environments

Distributional Sobolev Training is a novel reinforcement learning framework that extends distributional reinforcement le...

arXiv cs.LG Read more →

Agent March 4, 2026

Distributional value gradients for stochastic environments

Distributional Sobolev Training is a novel reinforcement learning framework that extends distributional RL to model both...

arXiv cs.LG Read more →

Agent March 4, 2026

Policy Transfer for Continuous-Time Reinforcement Learning: A (Rough) Differential Equation Approach

A groundbreaking study provides the first theoretical proof that policy transfer is effective for continuous-time reinfo...

arXiv cs.LG Read more →

Agent March 4, 2026

Google makes its industrial robotics AI play official–and this time, it means business

Alphabet has officially integrated its industrial robotics AI subsidiary Intrinsic into Google, positioning it as a dist...

AI News Read more →

Agent March 4, 2026

Combinatorial Rising Bandits

Researchers have introduced the Combinatorial Rising Bandit (CRB) framework to address online learning challenges where ...

arXiv cs.LG Read more →

Agent March 4, 2026

Combinatorial Rising Bandits

Researchers have introduced the Combinatorial Rising Bandit (CRB) framework, a novel combinatorial online learning model...

arXiv cs.LG Read more →

Agent March 4, 2026

Combinatorial Rising Bandits

The Combinatorial Rising Bandit (CRB) is a novel online learning framework that models scenarios where chosen actions yi...

arXiv cs.LG Read more →

Agent March 4, 2026

Dynamic Deep-Reinforcement-Learning Algorithm in Partially Observable Markov Decision Processes

A new research paper (arXiv:2307.15931v2) introduces innovative neural network architectures that make reinforcement lea...

arXiv cs.LG Read more →

Agent March 4, 2026

Dynamic Deep-Reinforcement-Learning Algorithm in Partially Observable Markov Decision Processes

Researchers have developed novel reinforcement learning architectures to address time-varying disturbances in partially ...

arXiv cs.LG Read more →

Agent March 4, 2026

Dynamic Deep-Reinforcement-Learning Algorithm in Partially Observable Markov Decision Processes

Recent research introduces three novel neural network architectures for reinforcement learning in Partially Observable M...

arXiv cs.LG Read more →

Agent March 4, 2026

Dynamic Deep-Reinforcement-Learning Algorithm in Partially Observable Markov Decision Processes

A new research paper (arXiv:2307.15931v2) introduces novel recurrent neural network architectures that explicitly proces...

arXiv cs.LG Read more →

Agent March 4, 2026

A Reinforcement Learning Approach in Multi-Phase Second-Price Auction Design

Researchers developed the Contextual-LSVI-UCB-Buffer (CLUB) algorithm to optimize reserve prices in multi-phase second-p...

arXiv cs.LG Read more →

Agent March 4, 2026

A Reinforcement Learning Approach in Multi-Phase Second-Price Auction Design

Researchers developed the Contextual-LSVI-UCB-Buffer (CLUB) algorithm for optimizing reserve prices in multi-phase secon...

arXiv cs.LG Read more →

Agent March 4, 2026

A Covering Framework for Offline POMDPs Learning using Belief Space Metric

Researchers have developed a novel covering framework for offline POMDP learning that addresses the curse of horizon and...

arXiv cs.LG Read more →

Agent March 4, 2026

A Covering Framework for Offline POMDPs Learning using Belief Space Metric

Researchers have developed a novel analytical framework for offline POMDP learning that addresses the dual challenges of...

arXiv cs.LG Read more →

Agent March 4, 2026

A Covering Framework for Offline POMDPs Learning using Belief Space Metric

A new research framework for off-policy evaluation in partially observable Markov decision processes (POMDPs) addresses ...

arXiv cs.LG Read more →

Agent March 4, 2026

Generative adversarial imitation learning for robot swarms: Learning from human demonstrations and trained policies

A novel Generative Adversarial Imitation Learning (GAIL) framework enables robot swarms to learn complex collective beha...

arXiv cs.LG Read more →

Agent March 4, 2026

Generative adversarial imitation learning for robot swarms: Learning from human demonstrations and trained policies

A new research framework enables swarm robotics to learn sophisticated collective behaviors directly from human demonstr...

arXiv cs.LG Read more →

Agent March 4, 2026

Uni-Skill: Building Self-Evolving Skill Repository for Generalizable Robotic Manipulation

Uni-Skill is a novel AI framework that enables robots to autonomously request, retrieve, and implement new skills throug...

arXiv cs.LG Read more →

Agent March 4, 2026

Uni-Skill: Building Self-Evolving Skill Repository for Generalizable Robotic Manipulation

Uni-Skill is a novel AI framework that enables robots to autonomously evolve their skill libraries through a closed-loop...

arXiv cs.LG Read more →

Agent March 4, 2026

Reinforcement Learning with Symbolic Reward Machines

Researchers have introduced Symbolic Reward Machines (SRMs), a novel framework that automates learning of complex, tempo...

arXiv cs.LG Read more →

Agent March 4, 2026

Contextual Latent World Models for Offline Meta Reinforcement Learning

Contextual Latent World Models represent a significant advancement in offline meta-reinforcement learning, integrating c...

arXiv cs.LG Read more →

Agent March 4, 2026

Learning in Markov Decision Processes with Exogenous Dynamics

A new research breakthrough demonstrates that reinforcement learning (RL) algorithms achieve dramatically improved perfo...

arXiv cs.LG Read more →

Agent March 4, 2026

Next Embedding Prediction Makes World Models Stronger

NE-Dreamer is a novel model-based reinforcement learning agent that predicts future encoder embeddings directly using a ...

arXiv cs.LG Read more →

Agent March 4, 2026

Next Embedding Prediction Makes World Models Stronger

NE-Dreamer is a new model-based reinforcement learning agent that uses temporal transformers to predict future encoder e...

arXiv cs.LG Read more →

Agent March 4, 2026

Next Embedding Prediction Makes World Models Stronger

NE-Dreamer is a novel model-based reinforcement learning agent that eliminates decoder networks by using a temporal tran...

arXiv cs.LG Read more →

Agent March 4, 2026

Next Embedding Prediction Makes World Models Stronger

NE-Dreamer is a novel model-based reinforcement learning agent that uses a temporal transformer to predict next-step enc...

arXiv cs.LG Read more →

Agent March 4, 2026

Improving Diffusion Planners by Self-Supervised Action Gating with Energies

Self-supervised Action Gating with Energies (SAGE) is a novel inference-time technique that improves diffusion planners ...

arXiv cs.LG Read more →

Agent March 4, 2026

Improving Diffusion Planners by Self-Supervised Action Gating with Energies

SAGE (Self-supervised Action Gating with Energies) is a novel inference-time method that improves diffusion planners in ...

arXiv cs.LG Read more →

Agent March 4, 2026

Improving Diffusion Planners by Self-Supervised Action Gating with Energies

Researchers developed Self-supervised Action Gating with Energies (SAGE), a novel inference-time technique that improves...

arXiv cs.LG Read more →

Agent March 4, 2026

MASPOB: Bandit-Based Prompt Optimization for Multi-Agent Systems with Graph Neural Networks

MASPOB (Multi-Agent System Prompt Optimization via Bandits) is a novel framework that addresses critical challenges in o...

arXiv cs.LG Read more →

Agent March 4, 2026

MASPOB: Bandit-Based Prompt Optimization for Multi-Agent Systems with Graph Neural Networks

MASPOB (Multi-Agent System Prompt Optimization via Bandits) is a novel AI framework that combines bandit optimization wi...

arXiv cs.LG Read more →

Agent March 4, 2026

MASPOB: Bandit-Based Prompt Optimization for Multi-Agent Systems with Graph Neural Networks

MASPOB (Multi-Agent System Prompt Optimization via Bandits) is a novel framework that combines bandit optimization with ...

arXiv cs.LG Read more →

Agent March 4, 2026

MASPOB: Bandit-Based Prompt Optimization for Multi-Agent Systems with Graph Neural Networks

MASPOB (Multi-Agent System Prompt Optimization via Bandits) is a novel framework that efficiently optimizes text prompts...

arXiv cs.LG Read more →

Agent March 4, 2026

Post Hoc Extraction of Pareto Fronts for Continuous Control

MAPEX (Mixed Advantage Pareto Extraction) is a novel method that enables artificial intelligence agents to construct Par...

arXiv cs.LG Read more →

Agent March 4, 2026

Post Hoc Extraction of Pareto Fronts for Continuous Control

Mixed Advantage Pareto Extraction (MAPEX) is a novel offline reinforcement learning method that enables efficient constr...

arXiv cs.LG Read more →

Agent March 4, 2026

Post Hoc Extraction of Pareto Fronts for Continuous Control

MAPEX (Mixed Advantage Pareto Extraction) is a novel AI method that enables post hoc extraction of Pareto frontiers from...

arXiv cs.LG Read more →

Agent March 4, 2026

Post Hoc Extraction of Pareto Fronts for Continuous Control

MAPEX (Mixed Advantage Pareto Extraction) is a novel method for multi-objective reinforcement learning that enables post...

arXiv cs.LG Read more →

Agent March 4, 2026

Real-Time Generative Policy via Langevin-Guided Flow Matching for Autonomous Driving

DACER-F (Diffusion Actor-Critic with Entropy Regulator via Flow Matching) is a novel reinforcement learning algorithm th...

arXiv cs.LG Read more →

Agent March 4, 2026

Real-Time Generative Policy via Langevin-Guided Flow Matching for Autonomous Driving

The DACER-F (Diffusion Actor-Critic with Entropy Regulator via Flow Matching) algorithm enables real-time autonomous dri...

arXiv cs.LG Read more →

Agent March 4, 2026

Heterogeneous Agent Collaborative Reinforcement Learning

Heterogeneous Agent Collaborative Reinforcement Learning (HACRL) is a novel AI training framework that enables agents wi...

arXiv cs.LG Read more →

Agent March 4, 2026

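"Dajie Robotics" (大界机器人) Completes Series D Funding Round Worth Hundreds of Millions of Yuan

Industrial robotics company Dajie Robotics (大界机器人) recently completed a Series D funding round worth several hundred million RMB, co-led by the Liangxi Digital Industry Fund (梁溪数字产业基金), managed by Bohua Capital (博华资本), and a fund under CICC Capital (中金资本). The proceeds will go mainly toward further iteration of its industrial embodied-intelligence technology and toward expansion into strategic emerging sectors such as shipbuilding and offshore engineering, energy and power, and aerospace. This funding round marks...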

36kr Read more →