AI Agent

Frontier developments in the AI Agent field: autonomous agents, AI assistants, tool calling, planning and reasoning, and more.

Multimodal Multi-Agent Ransomware Analysis Using AutoGen
Agent

A novel multimodal, multi-agent AI framework demonstrates superior ransomware classification with a Macro-F1 score of up...

Multimodal Multi-Agent Ransomware Analysis Using AutoGen
Agent

A novel multimodal multi-agent AI framework enhances ransomware detection by integrating static, dynamic, and network an...

Multimodal Multi-Agent Ransomware Analysis Using AutoGen
Agent

A novel multimodal multi-agent AI framework using AutoGen achieves superior ransomware classification with a Macro-F1 sc...

Personalized Collaborative Learning with Affinity-Based Variance Reduction
Agent

Affinity-based Personalized Collaborative Learning (AffPCL) is a novel AI framework for heterogeneous multi-agent system...

Personalized Collaborative Learning with Affinity-Based Variance Reduction
Agent

Personalized Collaborative Learning (PCL) is a novel framework that resolves the conflict between collaboration and pers...

Personalized Collaborative Learning with Affinity-Based Variance Reduction
Agent

Personalized Collaborative Learning (PCL) is a novel multi-agent AI framework that resolves the tension between collabor...

Personalized Collaborative Learning with Affinity-Based Variance Reduction
Agent

Affinity-based Personalized Collaborative Learning (AffPCL) is a novel framework that enables heterogeneous AI agents to...

Learning Acrobatic Flight from Preferences
Agent

Researchers developed the Reward Ensemble under Confidence (REC) framework that enables AI to learn complex acrobatic dr...

Learning Acrobatic Flight from Preferences
Agent

A new probabilistic framework called Reward Ensemble under Confidence (REC) enables autonomous drones to learn complex a...

Learning Acrobatic Flight from Preferences
Agent

The Reward Ensemble under Confidence (REC) framework enables AI agents to master complex acrobatic drone flight by learn...

Learning Acrobatic Flight from Preferences
Agent

The Reward Ensemble under Confidence (REC) framework enables autonomous drones to master acrobatic maneuvers by learning...

Network Topology Optimization via Deep Reinforcement Learning
Agent

A novel deep reinforcement learning algorithm called DRL-GS automates network topology optimization by efficiently searc...

Tell Me What To Learn: Generalizing Neural Memory to be Controllable in Natural Language
Agent

Researchers have developed a generalized neural memory system that enables AI models to be instructed on what to learn a...

Tell Me What To Learn: Generalizing Neural Memory to be Controllable in Natural Language
Agent

Researchers have developed a generalized neural memory system that allows AI models to be instructed via natural languag...

Near-Constant Strong Violation and Last-Iterate Convergence for Online CMDPs via Decaying Safety Margins
Agent

The FlexDOME algorithm is the first to provably achieve near-constant strong constraint violation alongside sublinear st...

Distributional value gradients for stochastic environments
Agent

Distributional Sobolev Training is a reinforcement learning framework that models distributions over both value function...

Distributional value gradients for stochastic environments
Agent

Distributional Sobolev Training is a novel reinforcement learning framework that extends distributional reinforcement le...

Distributional value gradients for stochastic environments
Agent

Distributional Sobolev Training is a novel reinforcement learning framework that extends distributional RL to model both...

Policy Transfer for Continuous-Time Reinforcement Learning: A (Rough) Differential Equation Approach
Agent

A groundbreaking study provides the first theoretical proof that policy transfer is effective for continuous-time reinfo...

Google makes its industrial robotics AI play official–and this time, it means business
Agent

Alphabet has officially integrated its industrial robotics AI subsidiary Intrinsic into Google, positioning it as a dist...

Combinatorial Rising Bandits
Agent

Researchers have introduced the Combinatorial Rising Bandit (CRB) framework to address online learning challenges where ...

Combinatorial Rising Bandits
Agent

Researchers have introduced the Combinatorial Rising Bandit (CRB) framework, a novel combinatorial online learning model...

Combinatorial Rising Bandits
Agent

The Combinatorial Rising Bandit (CRB) is a novel online learning framework that models scenarios where chosen actions yi...

Dynamic Deep-Reinforcement-Learning Algorithm in Partially Observable Markov Decision Processes
Agent

A new research paper (arXiv:2307.15931v2) introduces innovative neural network architectures that make reinforcement lea...

Dynamic Deep-Reinforcement-Learning Algorithm in Partially Observable Markov Decision Processes
Agent

Researchers have developed novel reinforcement learning architectures to address time-varying disturbances in partially ...

Dynamic Deep-Reinforcement-Learning Algorithm in Partially Observable Markov Decision Processes
Agent

Recent research introduces three novel neural network architectures for reinforcement learning in Partially Observable M...

Dynamic Deep-Reinforcement-Learning Algorithm in Partially Observable Markov Decision Processes
Agent

A new research paper (arXiv:2307.15931v2) introduces novel recurrent neural network architectures that explicitly proces...

A Reinforcement Learning Approach in Multi-Phase Second-Price Auction Design
Agent

Researchers developed the Contextual-LSVI-UCB-Buffer (CLUB) algorithm to optimize reserve prices in multi-phase second-p...

A Reinforcement Learning Approach in Multi-Phase Second-Price Auction Design
Agent

Researchers developed the Contextual-LSVI-UCB-Buffer (CLUB) algorithm for optimizing reserve prices in multi-phase secon...

A Covering Framework for Offline POMDPs Learning using Belief Space Metric
Agent

Researchers have developed a novel covering framework for offline POMDP learning that addresses the curse of horizon and...

A Covering Framework for Offline POMDPs Learning using Belief Space Metric
Agent

Researchers have developed a novel analytical framework for offline POMDP learning that addresses the dual challenges of...

A Covering Framework for Offline POMDPs Learning using Belief Space Metric
Agent

A new research framework for off-policy evaluation in partially observable Markov decision processes (POMDPs) addresses ...

Generative adversarial imitation learning for robot swarms: Learning from human demonstrations and trained policies
Agent

A novel Generative Adversarial Imitation Learning (GAIL) framework enables robot swarms to learn complex collective beha...

Generative adversarial imitation learning for robot swarms: Learning from human demonstrations and trained policies
Agent

A new research framework enables swarm robotics to learn sophisticated collective behaviors directly from human demonstr...

Uni-Skill: Building Self-Evolving Skill Repository for Generalizable Robotic Manipulation
Agent

Uni-Skill is a novel AI framework that enables robots to autonomously request, retrieve, and implement new skills throug...

Uni-Skill: Building Self-Evolving Skill Repository for Generalizable Robotic Manipulation
Agent

Uni-Skill is a novel AI framework that enables robots to autonomously evolve their skill libraries through a closed-loop...

Reinforcement Learning with Symbolic Reward Machines
Agent

Researchers have introduced Symbolic Reward Machines (SRMs), a novel framework that automates learning of complex, tempo...

Contextual Latent World Models for Offline Meta Reinforcement Learning
Agent

Contextual Latent World Models represent a significant advancement in offline meta-reinforcement learning, integrating c...

Learning in Markov Decision Processes with Exogenous Dynamics
Agent

A new research breakthrough demonstrates that reinforcement learning (RL) algorithms achieve dramatically improved perfo...

Next Embedding Prediction Makes World Models Stronger
Agent

NE-Dreamer is a novel model-based reinforcement learning agent that predicts future encoder embeddings directly using a ...

Next Embedding Prediction Makes World Models Stronger
Agent

NE-Dreamer is a new model-based reinforcement learning agent that uses temporal transformers to predict future encoder e...

Next Embedding Prediction Makes World Models Stronger
Agent

NE-Dreamer is a novel model-based reinforcement learning agent that eliminates decoder networks by using a temporal tran...

Next Embedding Prediction Makes World Models Stronger
Agent

NE-Dreamer is a novel model-based reinforcement learning agent that uses a temporal transformer to predict next-step enc...

Improving Diffusion Planners by Self-Supervised Action Gating with Energies
Agent

Self-supervised Action Gating with Energies (SAGE) is a novel inference-time technique that improves diffusion planners ...

Improving Diffusion Planners by Self-Supervised Action Gating with Energies
Agent

SAGE (Self-supervised Action Gating with Energies) is a novel inference-time method that improves diffusion planners in ...

Improving Diffusion Planners by Self-Supervised Action Gating with Energies
Agent

Researchers developed Self-supervised Action Gating with Energies (SAGE), a novel inference-time technique that improves...

MASPOB: Bandit-Based Prompt Optimization for Multi-Agent Systems with Graph Neural Networks
Agent

MASPOB (Multi-Agent System Prompt Optimization via Bandits) is a novel framework that addresses critical challenges in o...

MASPOB: Bandit-Based Prompt Optimization for Multi-Agent Systems with Graph Neural Networks
Agent

MASPOB (Multi-Agent System Prompt Optimization via Bandits) is a novel AI framework that combines bandit optimization wi...

MASPOB: Bandit-Based Prompt Optimization for Multi-Agent Systems with Graph Neural Networks
Agent

MASPOB (Multi-Agent System Prompt Optimization via Bandits) is a novel framework that combines bandit optimization with ...

MASPOB: Bandit-Based Prompt Optimization for Multi-Agent Systems with Graph Neural Networks
Agent

MASPOB (Multi-Agent System Prompt Optimization via Bandits) is a novel framework that efficiently optimizes text prompts...

Post Hoc Extraction of Pareto Fronts for Continuous Control
Agent

MAPEX (Mixed Advantage Pareto Extraction) is a novel method that enables artificial intelligence agents to construct Par...

Post Hoc Extraction of Pareto Fronts for Continuous Control
Agent

Mixed Advantage Pareto Extraction (MAPEX) is a novel offline reinforcement learning method that enables efficient constr...

Post Hoc Extraction of Pareto Fronts for Continuous Control
Agent

MAPEX (Mixed Advantage Pareto Extraction) is a novel AI method that enables post hoc extraction of Pareto frontiers from...

Post Hoc Extraction of Pareto Fronts for Continuous Control
Agent

MAPEX (Mixed Advantage Pareto Extraction) is a novel method for multi-objective reinforcement learning that enables post...

Real-Time Generative Policy via Langevin-Guided Flow Matching for Autonomous Driving
Agent

DACER-F (Diffusion Actor-Critic with Entropy Regulator via Flow Matching) is a novel reinforcement learning algorithm th...

Real-Time Generative Policy via Langevin-Guided Flow Matching for Autonomous Driving
Agent

The DACER-F (Diffusion Actor-Critic with Entropy Regulator via Flow Matching) algorithm enables real-time autonomous dri...

Heterogeneous Agent Collaborative Reinforcement Learning
Agent

Heterogeneous Agent Collaborative Reinforcement Learning (HACRL) is a novel AI training framework that enables agents wi...

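"大界机器人" Completes Series D Financing Worth Hundreds of Millions of Yuan
Agent

Industrial robotics company 大界机器人 recently completed a Series D financing round worth several hundred million RMB, co-led by the 梁溪数字产业基金 (Liangxi Digital Industry Fund), which is managed by 博华资本 (Bohua Capital), and a fund under 中金资本 (CICC Capital). The proceeds will go mainly toward further iteration of its industrial embodied-intelligence technology and toward market expansion in strategic emerging sectors such as shipbuilding and offshore engineering, energy and power, and aerospace. This round of financing marks...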