AffordanceVLA: A Vision-Language-Action Model Empowering Action Generation through Affordance-Aware Understanding

Published: 2026-06-04 13:28:51

Authors: Qize Yu, Jiadi You, Yuran Wang, Jiaqi Liang, Bowen Ping, Yang Tian, Yue Chen, Minghong Cai, Zeying Gong, Ruihai Wu, Yinchuan Li, Junwei Liang, Yingcong Chen

Categories: cs.RO, cs.CV, cs.MM

Abstract:
Vision-Language-Action (VLA) models leverage the rich world knowledge of pretrained vision-language models (VLMs) to enable instruction-following robotic manipulation. However, the structural mismatch between VLM semantic spaces and embodied control policies often hinders the learning of precise perception-action mappings. To address this challenge, we propose AffordanceVLA, a unified framework that introduces structured affordance forecasting as a task-oriented intermediate representation to establish a more precise and robust perception-action mapping. Specifically, we progressively model manipulation priors through three complementary components: 1) Which2Act for object-centric grounding via visual latent prediction to suppress distractions; 2) Where2Act for 2D interaction localization via affordance map estimation; and 3) How2Act for 3D geometric reasoning to guide manipulation policies. These affordance cues provide spatially grounded, semantically conditioned, and action-coupled intermediate representations, thereby naturally bridging vision, language, and action. We integrate these modules into a Mixture-of-Transformer (MoT) architecture with specialized experts and train the model using a three-stage training strategy with a progressive data curriculum. To overcome the scarcity of dense affordance labels in robotic datasets, we also develop a robust automated data augmentation pipeline. Extensive experiments in simulation and in the real world demonstrate that AffordanceVLA achieves strong performance across diverse manipulation scenarios.

arXiv Page | PDF

Score: 0

Post-processed frozen-flow methods for the long time sampling of ergodic dynamics on Riemannian manifolds

Published: 2026-06-04 13:26:33

Authors: Adrien Busnot Laurent, Sébastien Macé

Categories: math.NA, math.CO, math.DG, math.PR

Abstract:
In this work, we propose a novel intrinsic approach to the approximation of ergodic SDEs on Riemannian manifolds, which include Riemannian Langevin dynamics. In contrast to standard extrinsic approaches such as penalization methods and projection methods, our methodology does not use embeddings or coordinates and relies only on natural geometric operations such as geodesics and parallel transport. We give a criterion for high order of accuracy for the invariant measure, develop new intrinsic numerical methods designed solely for sampling the invariant measure, and derive high-order conditions using a new algebraic operation on exotic Lie-Butcher series. In the spirit of the Leimkuhler-Matthews method, our approach prioritizes long-time sampling efficiency over finite-time accuracy, and outperforms the previous extrinsic and intrinsic approaches in terms of cost for a given accuracy, which we illustrate with several numerical experiments.

arXiv Page | PDF

Score: 0

WorldFly: A World-Model-Based Vision-Language-Action Model for UAV Navigation

Published: 2026-06-04 13:23:05

Authors: Shengtao Zheng, Kai Li, Weichen Zhang, Yu Meng, Chen Gao, Xinlei Chen, Yong Li, Xiao-Ping Zhang

Categories: cs.AI

Abstract:
End-to-end Vision-Language-Action (VLA) models have shown promise in UAV navigation. However, existing approaches typically rely on historical observations to directly predict actions, often struggling in dense urban environments where severe occlusions and sharp turns result in drastic viewpoint transitions. We argue that the ability to "imagine" future states -- inherent in World Models -- is critical for robust decision-making under such partial observability. To address this, we construct a challenging Urban Canyon Traversal Benchmark, specifically designed to evaluate spatial understanding in scenarios characterized by severe occlusions and drastic viewpoint transitions. To this end, we propose WorldFly, a novel world-model-based VLA framework that employs a dual-branch coupled flow matching mechanism to jointly generate future video predictions and navigation actions, thereby explicitly guiding the agent's policy via spatial imagination. Extensive evaluations on our benchmark demonstrate that WorldFly outperforms other baselines, particularly in unseen environments, validating the effectiveness of integrating world models into embodied aerial agents.

arXiv Page | PDF

Score: 0

RedEdit: Agentic Red-Teaming of Image Safety Classifiers via MCTS-Guided Photo-Editing

Published: 2026-06-04 13:19:53

Authors: Weilin Lin, Ziqi Lin, Zhenxing Zhou, Jianze Li, Tong Zhang, Hui Xiong, Li Liu

Categories: cs.CR

Abstract:
Image safety classifiers serve as a critical component of contemporary content moderation systems on the internet. However, their resilience against user-style malicious image editing remains underexplored. Such behaviors are highly prevalent in daily scenarios but difficult to fully reproduce. To explore this vulnerability, we introduce RedEdit, a novel black-box red-teaming agent that formulates photo-editing evasion as a combinatorial search problem over edit-tool sequences. It adopts a Vision-Language-Model (VLM)-based proposer to generate semantically targeted candidate edits and a Monte Carlo Tree Search (MCTS) planner to prioritize promising edit paths while backtracking from ineffective ones. Together, the proposer and planner instantiate two key capabilities of human attackers, i.e., domain knowledge and iterative backtracking, respectively, to reproduce this practical threat. Our extensive experiments on UnsafeBench reveal profound systemic vulnerabilities: fewer than two edits on average enable 76.2% of unsafe images to evade detectors, while retaining 93.0% malicious semantics, meaning that such manipulated content remains perceptually malicious to humans while easily bypassing automated moderation. We therefore appeal to the community for more attention to this overlooked practical threat.

arXiv Page | PDF

Score: 0

MotionDisco: Motion Discovery for Extreme Humanoid Loco-Manipulation

Published: 2026-06-04 13:19:49

Authors: Ilyass Taouil, Michal Ciebelski, Shafeef Omar, Haizhou Zhao, Angela Dai, Aaron M. Johnson, Majid Khadiv

Categories: cs.RO

Abstract:
We present MotionDisco, a framework that discovers contact-rich, long-horizon humanoid loco-manipulation motions from scratch, without relying on teleoperation or motion retargeting from human demonstrations. This is challenging because the space of possible contact interactions grows combinatorially with the task horizon and the number of objects in the scene. MotionDisco couples a large language model (LLM) guided evolutionary search over sequences of interactions with an efficient sequential kinodynamic trajectory optimizer and pruning strategy, enabling the rapid discovery of novel skills. Through extensive ablation studies, we show that our LLM-guided search discovers successful whole-body trajectories across several challenging long-horizon tasks. Finally, by training reinforcement learning tracking policies on the discovered trajectories, we transfer the motions to a real humanoid robot. This is the first work to discover and deploy long-horizon humanoid loco-manipulation skills entirely through automated evolutionary search. Supplementary videos of the experiments are available at: https://youtu.be/DHiVz34QYlw.

arXiv Page | PDF

Score: 0

TLA-Prover: Verifiable TLA+ Specification Synthesis via Preference-Optimized Low-Rank Adaptation

Published: 2026-06-04 13:17:06

Authors: Eric Spencer, Arslan Bisharat, Brian Ortiz, Khushboo Bhadauria, TaiNing Wang, George K. Thiruvathukal, Konstantin Laufer, Mohammed Abuhamad

Categories: cs.SE, cs.AI, cs.LG, cs.LO

Abstract:
TLA+ is a formal specification language for verifying distributed systems and safety-critical protocols. Large language models (LLMs) frequently produce TLA+ specifications that fail the TLC model checker for semantic reasons. Across 25 LLMs, the best public baseline is 26.6% syntactic parse and 8.6% semantic model-check. We present TLA-Prover, a 20-billion-parameter model for TLA+ specification synthesis. Training combines supervised fine-tuning (SFT) on verified examples with repair-based group-relative policy optimization (GRPO). In the GRPO stage, the model learns to fix its own rejected specifications. We also train a direct preference optimization (DPO) variant from the same SFT checkpoint as an ablation. TLC provides the reward signal directly, with no learned reward model. Four tiers grade each output: Bronze (parses), Silver (no warnings), Gold (passes TLC), and Diamond. To reach Diamond, the model's correctness property is automatically altered in a small way; TLC must then detect a violation. If TLC still passes, the property was always-true and contributes nothing; the output fails Diamond. TLA-Prover reaches 9/30 (i.e. pass@1 = 30%) at both Gold and Diamond on a held-out 30-problem benchmark. This is roughly 3.5x the 8.6% untuned baseline. The DPO variant reaches 20% at Diamond. Gold and Diamond coincide at every checkpoint; this prevents the trivial-property failure mode.
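The four-tier grading described above can be sketched as a small function. This is only an illustration: the function name, the boolean inputs, and the assumption that tiers are cumulative (each tier requires all earlier checks to pass) are hypothetical and not taken from the paper.

```python
# Hypothetical sketch of the tier ladder; inputs stand in for real tool results.
TIERS = ("Unranked", "Bronze", "Silver", "Gold", "Diamond")


def grade(parses: bool, warning_free: bool, passes_tlc: bool, mutant_caught: bool) -> str:
    """Return the highest tier reached, assuming tiers are cumulative.

    mutant_caught means: after the correctness property was slightly altered,
    the model checker reported a violation (so the property was not trivially true).
    """
    checks = (parses, warning_free, passes_tlc, mutant_caught)
    tier = 0
    for ok in checks:
        if not ok:
            break  # the first failed check caps the tier
        tier += 1
    return TIERS[tier]
```

Under this reading, an output that passes TLC but whose altered property still passes would stop at Gold, which is the trivial-property case the Diamond tier is meant to expose.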

arXiv Page | PDF

Score: 0

Aging Time dependent Static Friction between Soft and Hard Solid Interfaces

Published: 2026-06-04 13:17:03

Authors: Vinay A. Juvekar, Arun K. Singh

Categories: cond-mat.soft

Abstract:
Understanding friction between sliding surfaces is critical for a variety of applications. We present a friction model for interfaces between soft and hard solids to study aging-time-dependent static friction. The model is based on the strengthening of dangling chains with the substrate during the aging period. The model is then validated against experimental data from the literature. Friction properties are also estimated as a function of gelatin concentration to justify the results.

arXiv Page | PDF

Score: 0

Rigidity of complete non-compact generalized $m$-quasi-Einstein manifolds

Published: 2026-06-04 13:16:13

Authors: M. Ahmad Mirshafeazadeh

Categories: math.DG

Abstract:
We study complete non-compact gradient generalized $m$-quasi-Einstein manifolds with constant scalar curvature $R \le 0$, soliton function $λ > 0$, and $m > 1$, where the coefficient $μ = 1/m$ is constant. We introduce the weighted function $v = e^{-f/m}λ$ and prove it is subharmonic. This leads to five rigidity results, each forcing the manifold to be Euclidean. We first show by a concrete example that if $μ$ is allowed to be nonconstant, the rigidity conclusions fail even when all other hypotheses are satisfied. Therefore the condition that $μ$ be constant is essential.

arXiv Page | PDF

Score: 0

Deterring Searches for Child Sexual Abuse Material on Google Search and Promoting Help-Seeking

Published: 2026-06-04 13:13:30

Authors: Rebecca Umbach, Griffin Hunt, John Buckley, Joel Scanlan, Caoilte Ó Ciardha, Ethel Quayle, Ainslie Heasman, Maximlian von Heyden, Elizabeth Letourneau, Donald Findlater, Tegan Insoll, Richard Wortley, Chad Steel, Abhishek Roy

Categories: cs.HC, cs.CY

Abstract:
Google Search deploys a "Onebox" feature at the top of the results page when users conduct searches for Child Sexual Abuse Material (CSAM). This study evaluates the impact of a strategic shift in this feature, comparing a revised intervention, focused on repercussions and therapeutic resources, to a previous iteration that focused on reporting. Using a difference-in-differences analysis of internal Google Search log data, we found the new messaging resulted in a 3.8 percentage point reduction, compared to the status quo, in subsequent CSAM-related queries within the same Search session. We found an average click-through rate of 0.73% on any of the hyperlinked buttons to help-providing resources. Together, this research presents convergent evidence that a subset of individuals can be deterred from ongoing CSAM-seeking and redirected to therapeutic services.

arXiv Page | PDF

Score: 0

Ensemble Kalman Inversion as an Inertial Interacting Particle System

Published: 2026-06-04 13:07:40

Authors: Michael Herty, Pierpaolo Porretta, Giuseppe Visconti

Categories: math.NA, math.DS, math.OC

Abstract:
Ensemble Kalman Inversion (EKI) is a derivative-free, ensemble-based method for inverse and optimization problems. Its continuous-time formulation can be interpreted as an interacting particle system driven by a Kalman-type preconditioned descent direction. A well-known limitation of this dynamics is the possible premature collapse of the covariance of the ensemble, which makes the method sensitive to the initial ensemble. We introduce a second-order particle system in which the particles evolve according to an inertial dynamics. The model combines a Kalman-type relaxation force with damping, attraction towards the ensemble mean, and a short-range repulsive interaction designed to counteract ensemble collapse. The resulting dynamics can be interpreted as a heavy-ball reformulation of continuous-time EKI enriched by competing attractive and repulsive mechanisms. For linear inverse problems, we analyze the induced mean and fluctuation dynamics and identify a parameter regime in which fully collapsed configurations are linearly unstable. We further characterize asymptotic equilibria through a constrained optimality condition on the subspace retained by the limiting ensemble covariance and derive an exponential decay estimate. Numerical experiments illustrate the effect of inertia and repulsion on the ensemble dynamics and compare the proposed second-order method with first-order EKI-type

arXiv Page | PDF

Score: 0

Towards Healthy Evolution: Exploring the Role and Mechanisms of Human-Agent Interaction in Self-Evolving Systems

Published: 2026-06-04 13:03:16

Authors: Dianxing Shi, Junqi He, Junhao Chen, Bowen Wang, Yuta Nakashima

Categories: cs.AI

Abstract:
Self-evolving agents improve through continual self-play and self-generated learning signals, but autonomous evolution can also cause capability degradation and safety drift. Although human feedback has proven effective for static and post-trained agents, its role in self-evolving systems remains underexplored. We introduce Agent Norm Correction through Human-like Oversight and Review (ANCHOR), an LLM-based framework that simulates human supervision and delivers feedback at various phases of self-evolution. With ANCHOR, we evaluate two representative open-source self-evolving agent systems across coding, mathematical reasoning, and safety. Our results show that even limited supervision substantially mitigates safety degradation while preserving stable performance on core evolutionary objectives. Further analysis shows that supervision over the output verification phase is the most effective for intervention, whereas increasing supervision frequency yields diminishing returns. These findings provide empirical evidence and practical guidance for designing more stable, controllable, and human-aligned self-evolving agent systems.

arXiv Page | PDF

Score: 0

A Sliced-Wasserstein Framework on Correlation Matrices for EEG Decoding

Published: 2026-06-04 12:47:49

Authors: Chen Hu, Rui Wang, Jiale Zhou, Jingjun Yi, Shaocheng Jin, Yidong Song, Yefeng Zheng

Categories: cs.LG

Abstract:
Electroencephalography (EEG) offers noninvasive, millisecond-resolution recordings of neuronal activity and is widely used in neuroscience and healthcare. Many EEG decoding pipelines rely on covariance descriptors for their robustness to noise, but such representations are sensitive to channel-wise scaling. Recent studies have therefore advocated full-rank correlation matrices as a scale-invariant alternative for EEG decoding. In this paper, we propose a general framework for Sliced Wasserstein (SW) discrepancies on manifolds endowed with Pullback Euclidean Metrics (PEMs), termed Pullback Euclidean Metric Sliced Wasserstein (PEMSW). Within this framework, we instantiate two Correlation Sliced-Wasserstein (CorSW) discrepancies on the manifold of full-rank correlation matrices under two recently introduced correlation geometries, i.e., the Off-Log Metric (OLM) and Log-Scaled Metric (LSM). Building on CorSW, we further develop a domain generalization (DG) framework for EEG decoding. Experiments on three EEG datasets demonstrate improved generalization under distribution shifts, with low training overhead and no additional inference cost. The source code is available at https://github.com/ChenHu-ML/CorSW.
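For readers unfamiliar with sliced Wasserstein discrepancies, the following is a minimal sketch of the standard Euclidean version: project both point sets onto random unit directions, sort the projections, and average the squared differences. It is not the paper's PEMSW or CorSW. Under a pullback Euclidean metric one would presumably first map each correlation matrix to a Euclidean vector through the metric's diffeomorphism; that map is not shown here, and the inputs are assumed to be equal-sized lists of already vectorized points. All function names are hypothetical.

```python
import math
import random


def random_unit_vector(dim, rng):
    """Draw a direction uniformly on the unit sphere by normalizing a Gaussian sample."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(c * c for c in v))
        if norm > 0.0:
            return [c / norm for c in v]


def sliced_wasserstein_sq(xs, ys, n_projections=64, seed=0):
    """Monte Carlo estimate of the squared sliced 2-Wasserstein distance.

    xs and ys are equal-length lists of points (lists of floats) in R^dim.
    For each random direction, the 1D optimal transport cost between the two
    empirical measures is obtained by matching sorted projections.
    """
    if not xs or len(xs) != len(ys):
        raise ValueError("xs and ys must be non-empty and of equal size")
    dim = len(xs[0])
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_projections):
        theta = random_unit_vector(dim, rng)
        px = sorted(sum(t * c for t, c in zip(theta, x)) for x in xs)
        py = sorted(sum(t * c for t, c in zip(theta, y)) for y in ys)
        total += sum((a - b) ** 2 for a, b in zip(px, py)) / len(px)
    return total / n_projections
```

Sorting makes each slice cost O(n log n), which is the usual reason sliced variants are cheap compared with solving a full optimal transport problem.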

arXiv Page | PDF

Score: 0

HyperVis: Continuous Latent Visual Relational Graphs on the Lorentz Hyperboloid for Compositional Reasoning

Published: 2026-06-04 12:40:15

Authors: Moshiur Farazi, Sameera Ramasinghe, Mahbub Ahmed Turza, Shafin Rahman

Categories: cs.CV

Abstract:
Vision-Language Models (VLMs) struggle with compositional reasoning that requires understanding inter-object relationships. A natural remedy is to inject explicit scene graph triplets $\langle s, p, o \rangle$ from an off-the-shelf scene graph generator (SGG), but we show this backfires: discrete text labels collide with the continuous visual modality, degrading GQA accuracy from 60.38% to 58.86%. We propose HyperVis, which bypasses the SGG semantic bottleneck entirely. From $N$ class-agnostic region proposals, we compute a dense $O(N^2)$ visual relation tensor via spatially-biased cross-attention, project it onto a Lorentz hyperboloid, and enforce hierarchy through spatial physics, namely IoA-driven entailment cones and exterior-angle repulsion. We discover that HyperVis contributes in two complementary ways: (1) as a training-time regularizer, the hyperbolic relational losses shape LoRA representations that improve generative VQA (GQA 61.03% vs. 57.21% for LoRA fine-tuning without relational losses, recovering and surpassing the baseline); and (2) as an inference-time relational encoder, hyperbolic prefix tokens boost discriminative compositional scoring (SugarCrepe 79.94%, +6.25pp over baseline). The learned curvature stabilises at $κ = 4.0$, an order of magnitude above prior hyperbolic VLMs where $κ$ typically collapses toward zero, indicating that continuous visual features genuinely require the exponential volume of strongly curved space. A controlled Euclidean ablation confirms this decomposition: the relational pipeline regularises LoRA comparably in flat space (GQA 60.81%), but the compositionality gain is specifically hyperbolic (SugarCrepe +4.58pp over Euclidean), with entailment loss $\sim 6\times$ higher in Euclidean training. Code is available at TBA.

arXiv Page | PDF

Score: 0

OrderGrad: Optimizing Beyond the Mean with Order-Statistic Policy Gradient Estimation

Published: 2026-06-04 12:34:15

Authors: Paavo Parmas, Yongmin Kim, Kohsei Matsutani, Shota Takashiro, Soichiro Nishimori, Takeshi Kojima, Yusuke Iwasawa, Yutaka Matsuo

Categories: cs.LG, cs.AI, cs.CL

Abstract:
Policy-gradient methods usually optimize expected return, but many real-world applications care about distributional properties of returns: tail risk, outlier robustness, or best-of-K discovery. We introduce OrderGrad, a family of likelihood-ratio and reparameterization gradient estimators for order-statistic objectives. OrderGrad optimizes finite-sample L-statistics, i.e., weighted averages of sorted rewards or costs, recovering objectives such as VaR, CVaR, trimmed means, medians, and top-m/best-of-K criteria by changing only the rank weights. For any fixed sample size and rank-weight vector, OrderGrad provides an unbiased gradient estimator for the corresponding order-statistic objective. The method is implemented as a simple reward transformation that can then be used in an otherwise standard policy-gradient or reparameterized update. We study the resulting estimator's variance behavior and evaluate it on tasks where mean optimization is mismatched to the deployment objective, including LLM math post-training and other tasks. OrderGrad provides a unified, plug-and-play route to risk-averse, robust, and exploratory learning. Code: https://github.com/paavo5/ordergrad
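To make the notion of a finite-sample L-statistic concrete, here is a minimal sketch of the objective only: a weighted average of sorted rewards, where swapping the rank-weight vector switches between a mean, a median, a lower-tail average (a CVaR-style criterion), and a best-of-n criterion. It does not implement OrderGrad's gradient estimator or reward transformation, and all function names are hypothetical.

```python
from typing import List, Sequence


def l_statistic(rewards: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted average of sorted rewards; weights[i] applies to the i-th smallest."""
    if len(rewards) != len(weights):
        raise ValueError("need one rank weight per sample")
    return sum(w * r for w, r in zip(weights, sorted(rewards)))


def mean_weights(n: int) -> List[float]:
    """Uniform rank weights recover the ordinary sample mean."""
    return [1.0 / n] * n


def median_weights(n: int) -> List[float]:
    """All weight on the middle rank (or split over the two middle ranks)."""
    w = [0.0] * n
    if n % 2:
        w[n // 2] = 1.0
    else:
        w[n // 2 - 1] = w[n // 2] = 0.5
    return w


def lower_tail_weights(n: int, m: int) -> List[float]:
    """Uniform weight on the m smallest rewards: a finite-sample CVaR-style objective."""
    return [1.0 / m] * m + [0.0] * (n - m)


def best_of_n_weights(n: int) -> List[float]:
    """All weight on the largest reward."""
    return [0.0] * (n - 1) + [1.0]
```

For example, with rewards `[3.0, -1.0, 4.0, 1.0, 5.0]` the same call `l_statistic(rewards, w)` evaluates each objective simply by passing a different `w`.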

arXiv Page | PDF

Score: 0

The Dignity-Centric Stack: A Commons-Governed, Horizontally Federated Architecture for Human-Dignity AI

Published: 2026-06-04 12:21:07

Authors: Eduardo C. Garrido-Merchán

Categories: cs.CY

Abstract:
The human-dignity-centric digital social contract grounds personal data in human dignity, data personalism, and data sovereignty, and articulates six dimensions of data governance: technological oversight, automation limits, economic justice, political legitimacy, social cohesion, and legal guarantees. It presupposes, however, that enforcement falls to State regulators, licensed fiduciaries, and multi-stakeholder bodies embedded in existing legal systems. This paper asks whether its normative content can instead be realized not as rules imposed on the owners of the AI stack from without, but as a commons-governed infrastructure that any person, firm, or State may use and fund while its governance stays horizontal, polycentric, and subsidiary. We construct the Dignity Stack, a six-layer architecture mapping each dimension onto a layer of commons-governed AI infrastructure, with protocols drawn from the Liberation Stack framework and from the cooperative, mutualist, and libertarian-municipalist traditions. The commons is State-agnostic rather than anti-State, anarchist in its horizontal means but not in the abolition of the State. Its central device is a decoupling of capital from control, by which the stack functions as a shared civic battery, charged by many contributors yet steered by none in proportion to its charge. We prove that this defeats formal capture through votes or surplus, and show that structural capture, the leverage of a dominant supplier free to withdraw what it provides, is resisted only insofar as operational supply is polycentric and substitutable, a condition demanding at the lower layers and perhaps presently unattainable at chip fabrication. We conclude, with explicit attention to its limits, that commons-governed AI realizes the values the contract proclaims more faithfully than the regulation it presupposes.

arXiv Page | PDF

Score: 0

On Advantage Estimates for Max@K Policy Gradients

Published: 2026-06-04 12:16:39

Authors: Shota Takashiro, Soichiro Nishimori, Paavo Parmas, Yongmin Kim, Kohsei Matsutani, Gouki Minegishi, Yusuke Iwasawa, Takeshi Kojima, Yutaka Matsuo

Categories: cs.LG, cs.AI, cs.CL

Abstract:
Reinforcement learning with verifiable rewards is widely used for post-training reasoning models, but sparse outcome rewards make exploration difficult. A complementary approach is to optimize inference-time objectives such as pass@K and max@K directly, yet existing policy-gradient estimators for these objectives use different signals, baselines, and normalizations, making their relationships unclear. We study this issue through baseline design and advantage centering. Starting from the advantage estimator of a leading method in the field, we show that it is policy-gradient unbiased but yields a non-centered advantage. We then introduce a Leave-Two-Out baseline that preserves policy-gradient unbiasedness while making realized batch advantages exactly centered. The resulting method, MaxPO, has an efficient quadratic-time implementation and integrates naturally into group-based RL for LLM post-training. We further derive the canonical finite-batch advantage for max@K, providing a unified view of existing advantage estimators. Empirically, we verify that the L2O baseline reduces gradient variance and outperforms non-centered alternatives.
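As background for the max@K objective, the sketch below computes the average of the maximum reward over all size-K subsets of a batch of n samples, using the combinatorial fact that the i-th smallest reward (0-based) is the maximum of exactly C(i, K-1) subsets. This illustrates the objective being estimated; it is not MaxPO, the Leave-Two-Out baseline, or any advantage estimator from the paper, and the function names are hypothetical.

```python
from itertools import combinations
from math import comb


def max_at_k(rewards, k):
    """Average of max(S) over all size-k subsets S of the batch, in O(n log n)."""
    n = len(rewards)
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= len(rewards)")
    ordered = sorted(rewards)
    # ordered[i] is the maximum of every subset that contains it together
    # with k - 1 elements drawn from the i positions before it.
    weighted = sum(comb(i, k - 1) * r for i, r in enumerate(ordered))
    return weighted / comb(n, k)


def max_at_k_bruteforce(rewards, k):
    """Reference implementation that enumerates every subset (exponential cost)."""
    subsets = list(combinations(rewards, k))
    return sum(max(s) for s in subsets) / len(subsets)
```

With binary rewards this reduces to the familiar pass@K estimate, and with K = 1 it reduces to the batch mean.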

arXiv Page | PDF

Score: 0

FontFusion: Enhancing Generative Text in Diffusion Models with Typographic Conditioning

Published: 2026-06-04 12:07:12

Authors: Marian Lupascu, Nipun Jindal, Ionut Mironica, Zhaowen Wang

Categories: cs.CV, cs.GR

Abstract:
Typography generation in diffusion models faces a persistent trade-off: enabling precise font control typically degrades text legibility, while maintaining readability often sacrifices typographic fidelity. We present FontFusion, a plug-and-play conditioning framework for Diffusion Transformer (DiT) architectures that resolves this dilemma through three core innovations: (1) a hierarchical token representation establishing explicit text-font relationships at multiple granularities, (2) position-aware embeddings creating spatial bindings between typography and image content, and (3) a multi-level token dropping strategy improving both computational efficiency and generalization to unseen fonts. Our systematic evaluation of font embedding spaces reveals that a dual encoder combining DeepFont and DINOv2 outperforms any single encoder for typography tasks. FontFusion demonstrates a 76% relative improvement on challenging decorative fonts over single-encoder baselines and font consistency gains of approximately 68-76% over unconditioned models, while integrating into existing DiT architectures without retraining.

arXiv Page | PDF

Score: 0

Multi-task Learning is Not Enough: Representational Entanglement in Dual-output Second Language Speech Recognition

Published: 2026-06-04 12:07:07

Authors: Seung Hwan Cho, Young-Min Kim

Categories: cs.CL, cs.SD, eess.AS

Abstract:
Second-language (L2) speech recognition often requires transcriptions of pronunciations and intended meanings. Multi-task learning (MTL) is a natural approach because it assumes that shared representations benefit both outputs. However, this paper shows that this assumption does not hold across Korean and English. MTL improves meaning but degrades surface transcription, especially in English, where the degradation scales with surface-meaning divergence measured by Levenshtein edit distance. Encoder analysis links these patterns to encoder-level entanglement, with Korean preserving distinct task representations while English produces nearly identical ones. Cross-task decoder analysis shows that the meaning dual-output decoder adapts with a unique representation, while the surface dual-output decoder remains constrained by the encoder. These findings motivate the design of MTL frameworks that mitigate encoder-level entanglement to reduce surface degradation in dual-output L2 automatic speech recognition.

arXiv Page | PDF

Score: 0

A Two-Graph Refinement of Paulsen's Lollipop Bounds

Published: 2026-06-04 12:05:17

Authors: Siddhartha Mahajan, Paras Chopra

Categories: math.CO, math.MG

Abstract:
Let $a_L(n)$ be the maximum number of regions into which $n$ lollipops divide the plane. Paulsen introduced a second obstruction for this problem, based on pairs of circles meeting at an obtuse angle, in addition to the stem-direction obstruction of Cutler-Karlsson-Sloane. We recast Paulsen's argument as a weighted problem for two graphs: a $K_4$-free graph $D$ of non-close stem pairs and a $K_5$-free graph $E$ of non-intriguing circle pairs. For the total number $C$ of pairwise crossings, $$ C\le 4\binom n2+|D|+|E|+|D\cap E|. $$ Paulsen bounds the final term by $|D|$. We keep the overlap term and analyze near-extremal configurations of $D$ and $E$. This closes all of Paulsen's remaining gaps up to $n=17$, and also closes $n=19$: $$ \begin{array}{c} a_L(0),a_L(1),\ldots,a_L(17)\\ =1,2,10,25,45,71,104,142,186,237,294,356,425,500,580,667,761,859, \end{array} $$ and $$ a_L(19)=1076. $$ The same method gives the one-region gaps $$ 964\le a_L(18)\le965,\qquad 1193\le a_L(20)\le1194. $$

arXiv Page | PDF

Score: 0

Barbell Codes: qLDPC Codes for Superconducting Quantum Hardware

Published: 2026-06-04 12:01:24

Authors: Shin Ho Choe, Vincent Steffan, Florian Vigneau, Pedro Parrado-Rodríguez, Hsiang-Sheng Ku, Martin Leib, Francisco Revson Fernandes Pereira, Fedor Šimkovic

Categories: quant-ph

Abstract:
The major challenge on the way to fault-tolerant quantum computing comes from the insufficient quality of hardware components and the difficulty of scaling their number without further compromising fidelity. Quantum Low-Density Parity-Check (qLDPC) codes offer a promising solution by encoding logical qubits with low overhead and at a comparatively high code distance. However, it remains an open question how to scalably implement efficient qLDPC codes on fixed-connectivity quantum chips without increasing hardware complexity to enable the non-local interactions in their underlying QEC cycles. We resolve this challenge for the first time by introducing a family of qLDPC "barbell" codes accompanied by a realistic chip layout that natively supports all required two-qubit interactions. Crucially, the hardware complexity required to implement barbell codes remains constant as code distance increases. We provide a detailed investigation into the feasibility of all required hardware components and simulate a specific family of barbell codes against circuit-level noise. We find that, with a modest overhead of $<30$ data qubits per logical qubit, barbell codes can preserve information at a physical noise strength of $10^{-4}$ for several trillion QEC cycles. Simulations of logical multi-Pauli measurements, performed with circuits tailored to the chip, yield similar logical performance per QEC round, indicating that entangling gates between logical qubits in barbell codes can be realized fault-tolerantly.

arXiv Page | PDF

Score: 0

Residual-based Kaczmarz methods for tensor linear equations with t-product

Published: 2026-06-04 11:57:39

Authors: Li-Lin Ji, Juanjuan Sun, Jun-Feng Yin

Categories: math.NA

Abstract:
Tensor linear systems arise widely in high-dimensional data mining and computing, for instance in natural language processing and machine learning. A class of residual-based tensor Kaczmarz methods is proposed for tensor linear equations with t-product. Theoretical analysis proves convergence and gives an upper bound on the convergence rate of the proposed methods. Furthermore, an accelerated residual-based Kaczmarz method with heavy ball momentum is developed. Numerical experiments verify the efficiency of the proposed methods and demonstrate that they are faster than the existing tensor Kaczmarz methods.
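The idea of a residual-based Kaczmarz step with heavy ball momentum can be illustrated on an ordinary matrix system Ax = b. The sketch below is the classical matrix analogue, not the paper's tensor t-product method: at each step it picks the row with the largest absolute residual, projects onto that row's hyperplane, and adds a momentum term beta * (x_k - x_{k-1}). The selection rule, the parameter names, and the default values are assumptions chosen for illustration.

```python
def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def kaczmarz_max_residual(A, b, beta=0.0, iterations=500):
    """Greedy (max-residual) Kaczmarz with optional heavy ball momentum.

    A is a list of nonzero rows, b the right-hand side of a consistent system.
    beta = 0.0 gives the plain greedy Kaczmarz iteration.
    """
    x = [0.0] * len(A[0])
    x_prev = x[:]
    for _ in range(iterations):
        residual = [bi - dot(row, x) for row, bi in zip(A, b)]
        i = max(range(len(A)), key=lambda k: abs(residual[k]))
        if residual[i] == 0.0:
            break  # every equation is satisfied exactly
        step = residual[i] / dot(A[i], A[i])
        # Projection onto the hyperplane of row i, plus the momentum term.
        x_next = [xj + step * aj + beta * (xj - pj)
                  for xj, aj, pj in zip(x, A[i], x_prev)]
        x_prev, x = x, x_next
    return x
```

A small usage example: for `A = [[2.0, 0.0], [0.0, 3.0], [1.0, 1.0]]` and `b = [2.0, -6.0, -1.0]`, the iterates approach the solution `[1.0, -2.0]`. Whether momentum helps depends on the system and on `beta`; large values can slow or destabilize the iteration.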

arXiv Page | PDF

Score: 0

IA-RAG: Interval-Algebra-Driven Temporal Reasoning for Dynamic Knowledge Retrieval

Published: 2026-06-04 11:39:50

Authors: Xiaoman Wang, Yaoze Zhang, Wenzhuo Fan, Hongwei Zhang, Ding Wang, Guohang Yan, Song Mao, Botian Shi, Yunshi Lan, Pinlong Cai

Categories: cs.CL

Abstract:
Retrieval-Augmented Generation (RAG) has shown strong effectiveness in grounding Large Language Models (LLMs) with external knowledge. However, existing RAG and Graph RAG frameworks largely treat knowledge as static or associate time with coarse-grained timestamps or metadata, failing to capture rich temporal structures such as duration, overlap, and containment. We propose IA-RAG, a hierarchical temporal RAG framework that models knowledge as time intervals and performs retrieval under formal temporal constraints. IA-RAG represents facts as Interval Event Units (IEUs) and organizes them into a hierarchical Thematic Forest, where temporal dependencies are governed by Allen's Interval Algebra. To handle incomplete or uncertain temporal boundaries, IA-RAG further introduces a Sub-graph Time Tightening mechanism that refines fuzzy intervals through logical constraints within connected event subgraphs. In addition, IA-RAG supports implicit temporal semantic retrieval through interval-algebra-guided traversal. Experiments on multiple temporal question answering benchmarks, including TimeQA, TempReason, and ComplexTR, demonstrate that IA-RAG achieves strong temporal retrieval and reasoning performance, particularly on complex compositional temporal reasoning tasks. Our code is released at https://github.com/xiaoAugenstern/LogicalRAG_TemporalQA.
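Allen's Interval Algebra distinguishes thirteen qualitative relations between two time intervals, which is what lets a system reason about overlap and containment rather than bare timestamps. The sketch below classifies a pair of intervals into one of those relations. It illustrates the algebra itself, not IA-RAG's Interval Event Units, Thematic Forest, or tightening mechanism; the tuple representation and the function name are assumptions.

```python
def allen_relation(a, b):
    """Return the Allen relation of interval a with respect to interval b.

    Intervals are (start, end) pairs with start < end, on any comparable scale
    (years, timestamps, and so on).
    """
    (a0, a1), (b0, b1) = a, b
    if not (a0 < a1 and b0 < b1):
        raise ValueError("intervals must have start < end")
    if a1 < b0:
        return "before"
    if b1 < a0:
        return "after"
    if a1 == b0:
        return "meets"
    if b1 == a0:
        return "met-by"
    if a0 == b0 and a1 == b1:
        return "equals"
    if a0 == b0:
        return "starts" if a1 < b1 else "started-by"
    if a1 == b1:
        return "finishes" if a0 > b0 else "finished-by"
    if b0 < a0 and a1 < b1:
        return "during"
    if a0 < b0 and b1 < a1:
        return "contains"
    # Only proper partial overlap remains.
    return "overlaps" if a0 < b0 else "overlapped-by"
```

A temporal constraint such as "events that happened while X held office" then becomes a filter on relations like `during`, `starts`, `finishes`, or `equals` against X's interval.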

arXiv Page | PDF

Score: 0

Adaptive Learning Rates with Surrogate Probability for Follow-the-Perturbed-Leader

Published: 2026-06-04 11:36:08

Authors: Jongyeong Lee, Junya Honda, Shinji Ito, Chansoo Kim

Categories: stat.ML, cs.LG

Abstract:
The follow-the-regularized-leader framework has shown effectiveness and flexibility in online learning problems, where the choice of learning rates is known to be crucial. Recently, adaptive learning rates defined in terms of the arm-selection probabilities, obtained by solving convex optimization, have achieved improved best-of-both-worlds (BOBW) guarantees in various bandit problems. In contrast, BOBW guarantees for its computationally efficient alternative, follow-the-perturbed-leader (FTPL), remain relatively limited since its optimization-free nature ironically makes the design of adaptive, probability-dependent learning rates non-trivial. To address this challenge, we propose an adaptive learning rate for FTPL by introducing surrogate probability functions that can be computed only from the available quantities, without requiring the exact probabilities. Based on these learning rates with surrogate functions, we provide the BOBW guarantee for FTPL with Pareto perturbations for any shape parameter $α>1$, generalizing prior results restricted to specific choices of $α=2$. We further show BOBW guarantees for FTPL with adaptive learning rates in the bandit problem with expert advice. Our approach preserves the computational simplicity of FTPL while enabling probability-dependent adaptivity, and the surrogate-based methodology may be of independent interest in other algorithmic frameworks beyond FTPL and learning rate designs.

arXiv Page | PDF

Score: 0

LoomVideo: Unifying Multimodal Inputs into Video Generation and Editing

Published: 2026-06-04 11:35:26

Authors: Jianzong Wu, Hao Lian, Jiongfan Yang, Dachao Hao, Ye Tian, Yunhai Tong, Jingyuan Zhu, Biaolong Chen, Qiaosong Qi, Aixi Zhang, Wanggui He, Mushui Liu, Jinlong Liu, Hao Jiang

Categories: cs.CV

Abstract:
Developing unified video generation and editing models capable of interpreting interleaved multimodal inputs is a promising yet challenging frontier. Existing unified frameworks predominantly rely on massive models (typically 13B parameters or more) and incorporate source video conditions for editing by concatenating sequence tokens. This concatenation inevitably doubles the sequence length, quadrupling the computational complexity of the self-attention mechanism and introducing prohibitive overhead. To address these bottlenecks, we present LoomVideo, a highly efficient 5B-parameter unified architecture for both video generation and editing. LoomVideo replaces the standard text encoder with a Multimodal Large Language Model (MLLM) and employs a Deepstack injection mechanism to align multi-layer MLLM features with the Diffusion Transformer (DiT). Crucially, we introduce a zero-overhead Scale-and-Add conditioning approach for video editing. By scaling and directly adding the clean source video latent to the noised target latent, this elegant design eliminates the need for token concatenation, drastically reducing computational cost while maintaining robust capabilities for complex, non-rigid edits. Furthermore, a Negative Temporal RoPE strategy is seamlessly integrated to handle multiple reference images. Extensive experiments demonstrate that our compact 5B model achieves state-of-the-art or highly competitive performance across comprehensive benchmarks, exhibiting exceptional superiority in e-commerce and fashion generation scenarios. Benefiting from the zero-overhead conditioning mechanism, LoomVideo achieves at least a 5.41x acceleration in inference speed compared to models of similar capabilities, paving the way for highly practical and efficient video foundation models.
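The cost argument in this abstract can be checked with a small sketch: self-attention compares every token with every other token, so doubling the sequence length by concatenation quadruples the number of pairwise scores, whereas adding a scaled source latent leaves the sequence length unchanged. The function names, the latent shape, and the way the scale is applied (to the source latent) are assumptions for illustration, not LoomVideo's actual implementation.

```python
import numpy as np

def attention_pair_count(seq_len):
    """Number of query-key score entries in full self-attention."""
    return seq_len * seq_len

def concat_condition(noised_target, clean_source):
    """Token concatenation along the sequence axis: length doubles."""
    return np.concatenate([noised_target, clean_source], axis=0)

def scale_and_add(noised_target, clean_source, scale=0.5):
    """Add a scaled source latent to the target latent: length is unchanged.

    The scale value and where it is applied are assumptions for this sketch.
    """
    return noised_target + scale * clean_source

rng = np.random.default_rng(0)
tokens, dim = 1024, 16                      # hypothetical latent sequence shape
target = rng.standard_normal((tokens, dim))
source = rng.standard_normal((tokens, dim))

concat_cost = attention_pair_count(concat_condition(target, source).shape[0])
add_cost = attention_pair_count(scale_and_add(target, source).shape[0])
cost_ratio = concat_cost / add_cost
```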

arXiv Page | PDF

Score: 0

An Erdős-Ko-Rado Theorem for Tilings

Published: 2026-06-04 11:22:38

Authors: Casey Tompkins

Categories: math.CO

Abstract:
We prove an Erdős-Ko-Rado type extremal result for tilings of a $1 \times n$ chessboard by tiles whose lengths belong to a set $Λ$. Two tilings are said to intersect if they contain a tile spanning the same set of squares. We prove that if $1\inΛ$, then the maximum size of an intersecting family of tilings is attained by the set of all tilings containing a fixed singleton tile at one of its ends. This result generalizes a theorem of Butler, Horn and Tressler, which is equivalent to the case $Λ=\{1,2\}$.
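The statement can be sanity-checked by brute force on very small boards. The sketch below enumerates all tilings for $Λ=\{1,2\}$, represents each tiling as a set of (start, length) tiles, and compares the largest intersecting family with the family of tilings that begin with a singleton tile. The helper names are assumptions for this example, and the exhaustive search is only feasible for small $n$.

```python
from itertools import combinations

def tilings(n, lengths):
    """All tilings of a 1 x n board, each as a frozenset of (start, length) tiles."""
    result = []

    def extend(pos, tiles):
        if pos == n:
            result.append(frozenset(tiles))
            return
        for length in sorted(lengths):
            if pos + length <= n:
                extend(pos + length, tiles + [(pos, length)])

    extend(0, [])
    return result

def max_intersecting_family(all_tilings):
    """Size of the largest family in which every two tilings share a tile."""
    for size in range(len(all_tilings), 0, -1):
        for family in combinations(all_tilings, size):
            if all(s & t for s, t in combinations(family, 2)):
                return size
    return 0

def star_size(all_tilings):
    """Number of tilings containing the singleton tile on the first square."""
    return sum(1 for t in all_tilings if (0, 1) in t)

board = tilings(5, {1, 2})
largest = max_intersecting_family(board)
star = star_size(board)
```

For these small cases the two quantities agree, which is what the theorem predicts whenever $1\inΛ$.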

arXiv Page | PDF

Score: 0

Recent Progress around Cohen-Lenstra Heuristics

Published: 2026-06-04 11:15:16

Authors: Jordan S. Ellenberg

Categories: math.NT

Abstract:
In 1983, Henri Cohen and Hendrik Lenstra proposed a conjecture about the distribution of the N-torsion of the class group of a random quadratic field, supported by what was at the time a large amount of computational evidence. The Cohen-Lenstra heuristics, which are still almost entirely unproven, have become one of the central foundational problems in arithmetic statistics. Recent years have seen a rapidly accelerated pace of development in Cohen-Lenstra problems. I will give a tour of these developments, including the work of Wood and her collaborators developing a fully fleshed out roster of generalized Cohen-Lenstra conjectures, with support from topology; Smith's theorems proving the Cohen-Lenstra conjectures for the 2-primary part of the class group, as part of more general theorems about Selmer groups in quadratic twists, leading to a resolution of the minimalist conjecture for elliptic curves; and recent work by Koymans and Pagano in the ell-primary case, expanding on Smith's work and proving Stevenhagen's conjecture on the negative Pell equation.

arXiv Page | PDF

Score: 0

ReSAGE-PAR: Representational Similarity Assessment for Generative Expansion in Pedestrian Attribute Recognition

Published: 2026-06-04 11:10:55

Authors: Pablo Ayuso-Albizu, Pablo Carballeira, Juan C. SanMiguel, Paula Moral

Categories: cs.CV

Abstract:
To address the limited diversity and data scarcity in Pedestrian Attribute Recognition (PAR), we explore image synthesis using diffusion models guided by attribute-based prompts. While this enables the controlled generation of pedestrian images, it faces two critical challenges: (i) the domain gap between high-quality pre-training data and low-resolution, non-standard surveillance crops, and (ii) the need for reliable attribute verification to prevent generative hallucinations. In this paper, we introduce a robust generate-score-autolabel pipeline called ReSAGE-PAR (REpresentational Similarity Assessment for Generative Expansion in PAR) that bridges this domain gap and enables scalable, high-fidelity dataset expansion. First, we adapt pre-trained diffusion models to native PAR resolutions using a tailored LoRA-based Image-to-Image approach. Second, we extract vision-language alignment scores between the generated images and their conditioning prompts, utilizing a comprehensive prompting strategy that includes label-consistent and inconsistent complements. Finally, we formulate a Bayesian classifier that converts these continuous scores into reliable binary pseudo-labels. Extensive evaluations demonstrate the effectiveness of ReSAGE-PAR in preserving spatial priors and verifying attributes. When integrated into PAR training, ReSAGE-PAR consistently yields significant improvements, achieving gains of up to 8.7% on standard backbones and pushing state-of-the-art frameworks to new performance levels. This proves its value as an architecture-agnostic solution for scalable PAR enhancement. The complete codebase for ReSAGE-PAR is publicly available at http://www-vpu.eps.uam.es/publications/ReSAGE-PAR.

arXiv Page | PDF

Score: 0

Leveraging MTG-FCI fire observations for event-based fire behavior monitoring from near-real-time operation to seasonal analysis

Published: 2026-06-04 11:05:18

Authors: Ronan Paugam, Jean-Baptiste Filippi, Akli Benali, Jorge Gomes, Weidong Xu, Emanuel Dutra, Francois Andre, Damien Boulanger, Vianney Retornard, Andrea Meraner, Julia Harvie, Victor Penot, Cyrielle Denjean

Categories: physics.ao-ph

Abstract:
Wildfire monitoring and suppression require timely information on fire behavior, including fire energy release and rate of spread, to support operational decision-making and resource allocation. Active fire products from the Flexible Combined Imager (FCI) aboard the geostationary Meteosat Third Generation (MTG) satellites provide 10-min observations over Europe and Africa. Deriving fire behavior information from these observations requires associating individual hotspot detections into coherent fire events. We present a Fire Event Tracker (FET) algorithm that performs spatio-temporal clustering of hotspot detections from the LSA-SAF FCI active fire product. The algorithm assigns persistent identifiers to fire events and updates their geometry, fire radiative power, and rate of spread at each 10-min interval. The same parameterization is used for both near-real-time and retrospective processing. FET was applied retrospectively to the Mediterranean FCI hotspot archive of 2025 and operationally in two near-real-time contexts: wildfire monitoring in Portugal and support of the 2025 SILEX airborne campaign within the EUBURN project, where besides fire monitoring, FET products were also used to initialize coupled FOREFIRE-MesoNH simulations for plume forecasting. Results show that event-based clustering of FCI active fire detections provides a consistent description of fire evolution, enabling both tactical wildfire management and high-frequency seasonal fire analyses.

arXiv Page | PDF

Score: 0

Weighted topological entropy and intersecting random translates of Bedford--McMullen carpets

Published: 2026-06-04 11:01:43

Authors: Nima Alibabaei, Masaki Tsukamoto

Categories: math.DS

Abstract:
We establish a relativised variational principle for the Feng--Huang weighted topological entropy associated with a factor map between dynamical systems. Combined with a recent theorem of Yin, this yields an almost-everywhere equivalence between the Feng--Huang entropy and its combinatorial version on fibers. As an application, we compute the Hausdorff dimension of the intersection of random translates of two Bedford--McMullen carpets. The resulting formula extends the Kenyon--Peres formula from the self-similar to the self-affine setting, and also points to a new problem concerning random matrix products.

arXiv Page | PDF

Score: 0

Adaptive Oscillatory-State Alignment for Time Series Forecasting

Published: 2026-06-04 10:59:59

Authors: Zhangyao Song, Ziqiong Li, Xiangfei Qiu, Chao Zha, Yinfei Xu, Tao Guo

Categories: cs.LG, cs.DB

Abstract:
Long-term time series forecasting benefits from inductive biases that expose recurring temporal structure. Existing periodic forecasting methods typically model recurrence through predefined periods, global spectral components, or fixed learnable templates. However, real-world temporal dynamics are rarely rigidly periodic: oscillatory behavior often evolves through amplitude modulation, phase drift, and local frequency variation. Under these conditions, fixed-template periodic modeling can become fundamentally mismatched to the underlying temporal states. We propose AOSNET, a Hilbert-guided forecasting framework that reformulates periodic forecasting from fixed template matching to adaptive oscillatory-state alignment. AOSNET extracts analytic-signal descriptors from both the observed sequence and a learnable global oscillatory prior, then adaptively aligns local states through a descriptor-conditioned gate that selectively preserves reliable observations while softly correcting mismatched regions. The learned prior serves not as a rigid repeated template but as a flexible oscillatory reference interpreted through local state dynamics. Experiments on eight benchmarks demonstrate state-of-the-art or highly competitive accuracy with fast inference speed. Controlled synthetic studies isolating amplitude modulation, phase drift, and local frequency variation confirm that the advantage of oscillatory-state alignment consistently increases as non-stationarity intensifies.
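To make the phrase "analytic-signal descriptors" concrete, the sketch below builds the analytic signal of a real sequence with an FFT-based Hilbert transform and derives instantaneous amplitude, phase, and frequency from it. Which descriptors AOSNET actually uses, and how they feed its gate, is not stated in the abstract, so the choice of these three quantities and the function names are assumptions.

```python
import numpy as np

def analytic_signal(x):
    """Analytic signal of a real 1-D sequence via the FFT (discrete Hilbert transform)."""
    n = len(x)
    spectrum = np.fft.fft(x)
    weights = np.zeros(n)
    weights[0] = 1.0                       # keep the DC component
    if n % 2 == 0:
        weights[n // 2] = 1.0              # keep the Nyquist component
        weights[1:n // 2] = 2.0            # double the positive frequencies
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * weights)  # negative frequencies are zeroed

def oscillatory_descriptors(x):
    """Instantaneous amplitude, unwrapped phase, and frequency in cycles per sample."""
    z = analytic_signal(x)
    amplitude = np.abs(z)
    phase = np.unwrap(np.angle(z))
    inst_freq = np.diff(phase) / (2.0 * np.pi)
    return amplitude, phase, inst_freq

# A pure cosine with exactly 5 cycles over 200 samples.
t = np.arange(200)
x = np.cos(2.0 * np.pi * 5.0 * t / 200.0)
amplitude, phase, inst_freq = oscillatory_descriptors(x)
```

For a signal with amplitude modulation or phase drift, these descriptors vary over time, which is the kind of local oscillatory state the abstract describes aligning.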

arXiv Page | PDF

Score: 0

Preventing $L^p$ blow-up by local anisotropy of signal production in the Keller-Segel system with strongly differing diffusion rates

Published: 2026-06-04 10:59:51

Authors: Youshan Tao, Michael Winkler

Categories: math.AP

Abstract:
In a smoothly bounded domain $Ω\subset R^n$, $n\le 5$, the manuscript considers the variant of the Keller-Segel system given by \[ \left\{ \begin{array}{l} u_t = D Δu - \nabla \cdot (u\nabla v), \\[1mm] v_t = d Δv + \nabla \cdot (u\nabla v) - v + u, \end{array} \right. \] which involves an additional contribution $\nabla \cdot (u\nabla v)$ to the chemoattractant evolution, in line with refined modeling literature reflecting an anisotropic correction to the isotropic signal production term $+u$ in the classical Keller-Segel model. It is shown that for arbitrary $D>0$ and $d>0$ and any nonnegative initial data from $W^{1,\infty}(Ω)\times W^{1, \infty}(Ω)$, an associated Neumann problem admits a global weak solution $(u,v)$ which, inter alia, satisfies \[ \sup_{t \in (0,\infty)\setminus N} \int_Ωe^{u^α(\cdot,t)} < \infty \] with some $α>0$ and some null set $N\subset (0,\infty)$.

arXiv Page | PDF

Score: 0

Beyond Vector Similarity: A Structural Analysis of Graph-Augmented Retrieval for Industrial Knowledge Graphs

Published: 2026-06-04 10:56:57

Authors: Grama Chethan

Categories: cs.AI

Abstract:
Retrieval-Augmented Generation (RAG) fails systematically on queries requiring structural reasoning over interconnected entities. We compare eight retrieval architectures for aerospace supply chain intelligence, progressing from text retrieval through graph traversal to graph computation. Using a 46-node knowledge graph with 64 typed edges, we evaluate 23 queries across 10 intent categories and demonstrate that five query classes are structurally unreachable for vector retrieval. Our central finding is the operator vocabulary thesis: the barrier to LLM-based graph reasoning is not model intelligence but the computational operators available as tools. An LLM Query Planner with 9 typed traversal primitives outperforms bespoke handlers (F1 = 0.632 vs. 0.472) while generalizing to unseen queries. Adding 6 graph computation tools, the LLM selectively adopts them for exactly the query categories where traversal fails. We also identify a measurement gap: entity-level F1 systematically underscores structural queries where comprehensive answers are correct.
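One way to read the measurement gap mentioned at the end of this abstract: entity-level F1 penalizes an answer that returns more correct entities than the gold annotation lists. The sketch below computes set-based F1 and shows that effect on made-up entity names; the scoring function, the example entities, and this reading of the gap are assumptions, not the paper's evaluation code.

```python
def entity_f1(predicted, gold):
    """Set-based F1 between predicted and gold entity collections."""
    predicted, gold = set(predicted), set(gold)
    if not predicted and not gold:
        return 1.0
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2.0 * precision * recall / (precision + recall)

# Hypothetical structural query: the gold annotation lists two suppliers,
# while the system returns all five suppliers that are actually affected.
gold = {"supplier_a", "supplier_b"}
comprehensive = {"supplier_a", "supplier_b", "supplier_c", "supplier_d", "supplier_e"}
score = entity_f1(comprehensive, gold)
```

Here recall is 1.0 but precision is 0.4, so the comprehensive answer scores well below a perfect match even if every returned entity is correct.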

arXiv Page | PDF

Score: 0

Resolving room temperature microscale fracture and plasticity of iron oxides along the cascade of iron ore reduction via nanoindentation and microcantilever bending

Published: 2026-06-04 10:54:59

Authors: Shreehard Sahu, James P. Best, Gerhard Dehm, Anwesha Kanjilal

Categories: cond-mat.mtrl-sci

Abstract:
Understanding the fundamental mechanical behaviour of iron oxide phases is essential for controlling attrition and fracture during the iron ore reduction process, particularly in hydrogen-based direct reduction systems. This study investigates the room temperature plasticity and fracture behaviour of single-crystal hematite, magnetite, and Wustite using nanoindentation and micro-cantilever fracture testing. Hematite exhibited the highest hardness H and elastic modulus E (H=18.5 GPa, E=281 GPa), followed by magnetite (H=8.7 GPa, E=165 GPa) and Wustite (H=7.5 GPa, E=145 GPa), reflecting differences in slip activity along the iron oxide reduction sequence. Furthermore, fracture toughness was measured using notched microcantilevers for all three iron oxide phases, aligned along low index and high index crystallographic planes, respectively. For the low index-oriented case, hematite showed increased fracture toughness owing to crack deviation and faceting, while magnetite and Wustite exhibited single plane cleavage fracture. Distinct changes in the deformation behavior in terms of plasticity and cracking of the three iron oxides were evident from both methods. Further investigation of a magnetite-gangue interface, particularly relevant to low-concentration ores, revealed significantly reduced fracture toughness compared to the magnetite phase. Overall, these results provide a comprehensive set of mechanical properties of iron oxides with potential application in material models for predicting fracture and attrition during hydrogen-based direct reduction.

arXiv Page | PDF

Score: 0

ATT-CR: Adaptive Triangular Transformer for Cloud Removal

Published: 2026-06-04 10:47:41

Authors: Yang Wu, Ye Deng, Pengna Li, Wenli Huang, Kangyi Wu, Xiaomeng Xin, Jinjun Wang

Categories: cs.CV, cs.AI

Abstract:
Cloud removal aims to accurately reconstruct the ground objects obscured by clouds in remote sensing images. Existing Transformer-based methods utilizing self-attention have shown impressive results by effectively modeling long-range dependencies in cloudy images. However, they suffer from the following issues: 1) the high computational complexity of self-attention limits scalability; 2) treating both cloudy and clean pixels as valid within the attention computation brings disturbances in subsequent layers, leading to suboptimal performance. To address these challenges, we propose the Adaptive Triangular Transformer for Cloud Removal (ATT-CR), a model that effectively reduces computational costs and mitigates interference from cloudy pixels. Specifically, it consists of two core components: Triangular Attention (TAN) and Feature Selected Gating Module (FSGM). TAN employs lower and upper triangular matrices to approximate Softmax attention with O(N) computational complexity, significantly reducing the computational costs. The FSGM, on the other hand, integrates with TAN to adaptively distinguish between cloudy and clean features, which minimizes the introduction of invalid information into subsequent layers. Extensive experiments on cloud removal benchmarks demonstrate that ATT-CR delivers superior performance compared to existing methods.

arXiv Page | PDF

Score: 0

Quantifying Uncertainty In Wide Two-Layer Neural Networks: On The Law Of The Limiting Fluctuation Process

Published: 2026-06-04 10:25:23

Authors: Arnaud Descours, Arnaud Guillin, Geoffrey Lacour, Manon Michel, Boris Nectoux, Paul Stos

Categories: cs.NE, math.AP, math.PR

Abstract:
Uncertainty quantification in neural network predictions is a major issue in common applications. Our approach aims to reduce computational cost by directly evaluating uncertainty using PDE information on the asymptotic variance, rather than using the deep ensemble method, which may be seen as a Monte Carlo estimation of the prediction and requires training multiple networks. We thus study the law of the limiting process describing the random fluctuations around the mean-field limit of wide two-layer neural networks trained by stochastic gradient descent in a weak-noise regime. Building on a recent trajectorial central limit theorem, in which this limit is characterized as the weak solution of a linear stochastic evolution equation, we identify its law explicitly. More precisely, we show that it is a centered Gaussian process in the dual of a weighted Sobolev space, and we derive a closed covariance representation for the finite-dimensional distributions obtained by testing it against smooth functions. This covariance is expressed through the solution of a backward transport equation with a nonlocal source term, whose coefficients are driven by the mean-field trajectory. As a consequence, by testing against the activation function at a fixed input, we obtain an expression for the limiting variance of the corresponding network-output fluctuations. We illustrate this result numerically on a one-dimensional regression example.

arXiv Page | PDF

Score: 0

The Self-Correction Illusion: LLMs Correct Others but Not Themselves

Published: 2026-06-04 10:17:00

Authors: Kuan-Yen Chen, Fang-Yi Su, Jung-Hsien Chiang

Categories: cs.AI, cs.CL

Abstract:
Recent work shows that LLM agents struggle to correct errors in their own reasoning traces yet show markedly higher correction rates when identical claims appear under external sources. We ask whether this asymmetry reflects a capability deficit or a role-label artifact: does an agent's willingness to correct a wrong claim depend causally on the chat-template role that carries it, rather than on the claim's content? Our setup keeps the erroneous claim byte-identical across all conditions (SHA-256 verified) and varies only its wrapping role: the agent's own \role{}, a \role{user} message, a \role{tool} response, or a \role{system} block. Across 13 model-domain cells covering seven model families and three domains ($n{=}30$ paired tasks per cell), relabeling the claim from \role{} to an external role lifts the explicit-correction rate by 23 to 93 percentage points, with 10 of 13 cells reaching $p{<}0.001$. Further experiments confirm that the effect is asymmetric, mechanistically decomposable, and robust across domains. The failure to self-correct is not a cognitive deficit; it is a chat-template artifact. We exploit this artifact by designing a prompt-structure-only intervention that requires no training and no model modification, with its strongest role label being domain-dependent: \role{} dominates on math, while a plain \role{user} message dominates on logical deduction.
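The byte-identity control described here can be sketched as follows: the same claim string is wrapped under different role labels, and a SHA-256 digest of the claim bytes confirms that only the wrapper changes. The message dictionary layout, the example claim, and the label `own_turn` (standing in for the agent's own role, which the abstract leaves unnamed) are hypothetical choices for this example.

```python
import hashlib

# Hypothetical erroneous claim used in every condition.
CLAIM = "The sum of 17 and 25 is 43."

# Placeholder role labels; "own_turn" stands in for the agent's own role.
ROLES = ("own_turn", "user", "tool", "system")

def wrap(role, content):
    """Wrap a claim in a chat-style message under the given role label."""
    return {"role": role, "content": content}

def claim_digest(message):
    """SHA-256 hex digest of the UTF-8 bytes of the message content."""
    return hashlib.sha256(message["content"].encode("utf-8")).hexdigest()

conditions = [wrap(role, CLAIM) for role in ROLES]
digests = {claim_digest(message) for message in conditions}
claims_identical = len(digests) == 1
```

A single distinct digest across all conditions indicates that any behavioural difference cannot come from the claim text itself.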

arXiv Page | PDF

Score: 0

The quenching time and timescale distribution of z~2 quiescent galaxies from precise colour distribution analysis

Published: 2026-06-04 10:14:47

Authors: Vivienne Wild, Ho-Hin Leung, Adam Carnall, Maya Skarbinski

Categories: astro-ph.GA

Abstract:
Understanding when and how galaxies quench their star formation is crucial for understanding the dominant physical processes at play. The spectral energy distribution (SED) of galaxies encodes significant information on their past histories: the relative importance of different physical processes influences the observed distribution of SED shapes in the galaxy population. We use a simulation based inference (SBI) approach to directly constrain the distribution of formation times, quenching times and quenching timescales within the massive galaxy population at z >~ 2 from their broad band photometric colour distribution at 1.710.3. We measure a quenched galaxy fraction of 0.24+/-0.02, with the number density of quenched galaxies rising rapidly 2.5Gyr after the Big Bang (z<~2.6). Galaxies must quench rapidly to achieve the precise bimodal colour distribution: defining the quenching timescale as the time from peak star formation rate (SFR_peak) -> 0.5xSFR_peak, the quenching timescale distribution has a mode at 97_{-25}^{+31}Myr, a median of 182+/-16Myr and a tail to ~700Myr. To achieve full quiescence takes a median time of ~400Myr. Comparing to direct number density measurements of quenched galaxies at z>2 the combination of recent and rapid quenching inferred from the fossil record suggests a substantial rejuvenation and/or merger rate for quenched galaxies observed directly at z>3.5.

arXiv Page | PDF

Score: 0

Combining diffuse and sharp interface methods in shape optimisation

Published: 2026-06-04 10:03:01

Authors: Philip J. Herbert, Michael Hinze, Christian Kahle

Categories: math.OC

Abstract:
We develop a concept for the numerical treatment of shape optimization problems based on the combination of phase field and sharp interface methods. On the one hand, phase field methods are very well suited to numerically determine the shape, size and topology of a sought domain, but on the other hand they struggle to sharpen domains where, for example, corners should develop. However, this is the strength of a sharp-interface approach developed in our group, which provides shape updates in the Lipschitz topology. This leads to a two-stage process that first determines an optimized shape using the phase field method. The resulting domain is the starting solution for the sharp interface shape optimization method. Both methods are discretized with the finite element method. The starting mesh for the sharp method is constructed from the finite element mesh of the optimal phase field solution using its properly post-processed zero-level set. We describe this construction process in detail and investigate the performance of our method on a selection of test problems from the literature and from applications.

arXiv Page | PDF

Score: 0

Edit-R2: Context-Aware Reinforcement Learning for Multi-Turn Image Editing

Published: 2026-06-04 09:49:47

Authors: Yuxiao Ye, Haoran He, Fangyuan Kong, Xintao Wang, Pengfei Wan, Kun Gai, Ling Pan

Categories: cs.AI

Abstract:
Text-guided image editing has advanced rapidly with diffusion models and unified multimodal foundation models. However, most existing methods remain confined to single-turn settings, overlooking the more realistic scenario of multi-turn in-context editing, where users iteratively refine an image through a sequence of instructions. In this setting, a model must follow each new instruction while preserving accumulated session-level constraints, challenged by two coupled failure modes: long-context dilution, where sparse textual constraints become difficult to recover from growing interleaved image-text histories, and state contamination, where earlier editing mistakes degrade subsequent generations. We introduce Edit-R2, a novel reinforcement learning post-training framework for unified multimodal models. Edit-R2 reconstructs the operative session intent, which effectively consolidates scattered historical constraints into an explicit reasoning trace before each editing turn. It further enables multi-turn RL over both reasoning and generation through a unified objective that jointly optimizes intent reconstruction generation in discrete text space and flow-matching image generation in continuous latent space, while a trajectory filtering mechanism suppresses corrupted rollouts to stabilize training under state contamination. To support systematic evaluation, we introduce MICE-Bench, a large-scale benchmark for multi-turn in-context editing with automated metrics for instruction following (IF), content consistency (CC), and global awareness (GA) over accumulated session constraints. Experiments show that Edit-R2 substantially improves multi-turn in-context editing and achieves competitive performance compared against strong baselines.

arXiv Page | PDF

Score: 0

Exploring the connection between coding habits and cognitive styles in malware developers

Published: 2026-06-04 09:46:25

Authors: Vasilis Vouvoutsis, Constantinos Patsakis, Fran Casino

Categories: cs.CR

Abstract:
Malware research primarily studies the results, the methods, and the impact. Even from an offensive security perspective, what is examined is the method, not the development strategy of the offender. This study investigates the behavioral signatures and coding patterns embedded in the malware source code. By analyzing a large corpus of leaked malware code and comparing it with carefully selected benign open-source software, we apply static application security testing and compute multiple software metrics. Based on cognitive psychology and criminological theories, our work interprets differences in code structure and quality as behavioral indicators, reflecting distinct motivational structures, risk tolerances, and development strategies of malware authors compared to benign software developers. Our findings reveal that malware code is generally smaller, less documented, and exhibits higher cyclomatic complexity per function, with reduced use of abstraction mechanisms such as classes and closures. Vulnerability analysis further reveals that malware exhibits more issues of the types that benign code typically avoids, suggesting a minimal investment in secure development practices. These patterns imply a development style optimized for expedience, operational secrecy, and evasion rather than long-term maintainability. Nonetheless, the code quality metrics indicate that malware does not deviate from benign software enough to be distinctive. By framing code metrics as proxies for behavioral signals and strategic choices, we demonstrate how quantitative software analysis can enrich behavioral cybersecurity research, offering new insights into the practices and priorities of malware developers. Our results pave the way for further research in the behavioral profiling of cyber offenders.

arXiv Page | PDF

Score: 0

Simulations of interaction between outflow and surrounding broken power-law circumnuclear medium: implications for different radio light curves of TDEs

Published: 2026-06-04 09:46:14

Authors: Xiangli Lei, Qingwen Wu, Chang Zhou, Wei-Hua Lei, Ya-Ping Li, Jiancheng Wu, Weibo Yang

Categories: astro-ph.HE

Abstract:
The complex radio light curves of tidal disruption events (TDEs) challenge our understanding of the properties of both the outflows and the circumnuclear medium (CNM) surrounding supermassive black holes. In this work, we explore outflow-CNM interactions across a broad parameter space using three-dimensional hydrodynamic simulations, adopting a broken power-law CNM density profile with a transition near the Bondi radius. The outflow-CNM interaction inside the Bondi radius produces an early radio flare (\(\lesssim 2\) yr) once the emitting region becomes optically thin. A second radio rebrightening can appear a few years later if the outflow decelerates beyond the Bondi radius. We also find that either a very dense inner CNM, which causes rapid deceleration, or a rarefied outer CNM suppresses the late rebrightening, which produces a single early-peaked flare. In contrast, a rarefied CNM inside the Bondi radius suppresses the early flare and yields a single late-peaked event. For the case of very dense CNM at large radii, the interaction will trigger a sharp late-time rise as observed in some TDEs. We further explore the interaction of a relativistic jet with a broken power-law CNM, which can reproduce the characteristic light curves as observed in jetted TDEs without invoking complex jet structure.

arXiv Page | PDF

Score: 0

Epistemic Injustice in Language Models: An Audit of Pretraining Filters and Guardrails

Published: 2026-06-04 09:38:55

Authors: Marco Antonio Stranisci, A Pranav, Rossana Damiano, Christian Hardmeier, Anne Lauscher

Categories: cs.CL

Abstract:
Modern language models rely on pretraining filters to remove undesirable content from training corpora and inference-time guardrails to suppress undesirable outputs during deployment. In this paper, we examine how these filtering and moderation decisions produce forms of epistemic erasure and reveal tensions both across automated systems and between these systems and human judgment. We audit four pretraining filters and three inference-time guardrails on Common Crawl sentences containing gender and regional-origin mentions, together with a manually annotated subset of 500 sentences. Our analysis shows that filtering and guardrail decisions are strongly associated with blocklist-based lexical cues, while frequently failing to flag content containing private information or explicit hate speech. At the same time, marginalized groups, particularly transgender people, women, and Central Americans, are significantly over-flagged across systems. Human annotators, by contrast, would retain 88.5\% of filter-flagged and 91.3\% of guardrail-flagged content, often recognizing representational harms arising from tensions of content removal that current systems fail to capture. Taken together, our findings document a form of epistemic erasure in which mentions of marginalized groups are disproportionately removed before pretraining and additionally suppressed again at inference time.

arXiv Page | PDF

Score: 0

Towards World Models in Biomedical Research

Published: 2026-06-04 09:28:54

Authors: Guangyu Wang, Jingkun Yue, Siqi Zhang, Yu Liu, Xiaoyu Wang, Mingyuan Meng, Changwei Ji, Zongbo Han, Yulin Wang, Yang Yue, Frank Fu, Ting Chen, Song Wu, Ziwei Liu, Jiangning Song, Ming Li, Gao Huang, Xiaohong Liu, Athanasios Vasilakos, Xingcai Zhang, Ping Zhang, Yong Li

Categories: cs.AI

Abstract:
A central goal of biomedicine is to understand, predict and ultimately control the dynamic mechanisms by which biological systems respond to perturbations, disease progression and therapeutic intervention. Although foundation models and large language models have accelerated biomedical data interpretation, most current systems remain focused on static pattern recognition rather than prospective simulation of biological futures. Here we propose biomedical world models as a paradigm for AI-driven discovery. These models learn latent representations of molecular, cellular, tissue and clinical states, together with intervention-conditioned dynamics that allow future trajectories to be simulated before actions are taken. We discuss how biomedical world models could function as data engines, environment simulators and scientific planning substrates across applications including virtual cells, organoids, virtual patients and surgical simulation. We outline the data infrastructure, evaluation benchmarks, safety constraints and governance frameworks required. Biomedical world models may provide a foundation for simulation-guided, closed-loop and experimentally actionable biomedical discovery.

arXiv Page | PDF

Score: 0

DBHN-Net: Dual-Branch Hybrid Neural Network For Low-Complexity Monaural Speech Enhancement

Published: 2026-06-04 09:16:26

Authors: Cunhang Fan, Enrui Liu, Jing Zhou, Jian Kang, Jie Li, Andong Li, Jian Zhou, Zhao Lv, Xuelong Li

Categories: cs.SD, cs.LG, eess.AS

Abstract:
Although artificial neural network (ANN) based speech enhancement (SE) methods demonstrate excellent performance, their high computational complexity and high energy consumption hinder their deployment in practical front-end processing tasks. Spiking neural networks (SNNs) have shown potential in reducing power consumption. However, the discrete binary activation and complex spatio-temporal dynamics of SNNs often result in information loss. The current challenge is therefore how to maintain performance while reducing computational complexity. To address this issue, this work proposes a Dual-Branch Hybrid Neural (DBHN) Network. 1) In terms of network architecture: a dual-branch network integrating ANN and SNN is designed, where the SNN branch reduces power consumption while the ANN branch addresses information loss; the BandSplit and Time-Frequency (TF)-Mamba modules are developed to simultaneously reduce energy consumption and enhance model performance; Spiking Feature Extraction Group (SFEG) and Information Transformation Block (ITB) components are implemented with residual connections to mitigate information loss while further refining feature representations. 2) To facilitate inter-branch information fusion: an Interaction module is designed to promote information exchange at various stages of the dual-branch network; a TF-Cross Attention-Fusion module is designed to perform time-frequency domain fusion of dual-branch information while data-adaptively guiding the SNN branch to retain more critical information. Results show that the proposed model maintains superior performance across three public datasets while achieving an average 7.5-fold reduction in computational complexity compared to baseline models.

arXiv Page | PDF

Score: 0

ACE-SQL: Adaptive Co-Optimization via Empirical Credit Assignment for Text-to-SQL

Published: 2026-06-04 09:11:04

Authors: Xiaobing Chen, Ai Jian, Eryu Guo, Zhiqi Pang

Categories: cs.CL

Abstract:
Text-to-SQL maps natural language questions to executable SQL queries. Modern databases often contain large and complex schemas, making schema linking a critical step for accurate SQL generation. Existing methods either rely on full-schema generation, which leaves schema linking implicit within a large search space, or use a separate retriever trained with static gold-column supervision, whose targets may be suboptimal for the current generator policy. To address this issue, we propose Adaptive Co-optimization via Empirical Credit Assignment for Text-to-SQL (ACE-SQL), a reinforcement learning (RL) framework that jointly optimizes schema retrieval and SQL generation under execution feedback. ACE-SQL constructs an online column-set pool from generator rollouts and derives adaptive on-policy retrieval targets from the column set most frequently associated with execution-correct rollouts. This induces bidirectional adaptation, where the retriever adapts toward column sets that the generator can execute correctly, while the generator adapts to the retriever's evolving schema selections under execution feedback. With approximately 3k synthetic Text-to-SQL question-database pairs for RL training, ACE-SQL achieves 65.3% greedy execution accuracy on BIRD Dev while using 0.93k output tokens per query. The repository is available at https://github.com/xbchen1/ACE-SQL.
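
The target-selection step described in the abstract (picking the column set most frequently associated with execution-correct rollouts) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Rollout` type, the function name `select_retrieval_target`, and the tie-breaking rule (first column set seen wins) are assumptions made purely for illustration.

```python
from collections import Counter
from typing import FrozenSet, Iterable, Optional, Tuple

# Hypothetical rollout record: (columns referenced by the generated SQL,
# whether executing that SQL returned the correct result).
Rollout = Tuple[FrozenSet[str], bool]


def select_retrieval_target(rollouts: Iterable[Rollout]) -> Optional[FrozenSet[str]]:
    """Return the column set seen most often among execution-correct rollouts.

    Returns None when no rollout executed correctly, in which case a real
    system would need some fallback (not specified in the abstract).
    """
    counts = Counter(cols for cols, is_correct in rollouts if is_correct)
    if not counts:
        return None
    # most_common(1) keeps first-seen order on ties (assumed tie-breaking rule).
    return counts.most_common(1)[0][0]


if __name__ == "__main__":
    pool = [
        (frozenset({"orders.id", "orders.total"}), True),
        (frozenset({"orders.id"}), False),
        (frozenset({"orders.id", "orders.total"}), True),
        (frozenset({"customers.name"}), True),
    ]
    print(sorted(select_retrieval_target(pool)))
```

Because the target is recomputed from fresh rollouts, it can shift as the generator changes, which is the sense in which the abstract calls the retrieval targets adaptive and on-policy.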

arXiv Page | PDF

Score: 0

Non-equilibrium quantum thermodynamics of a memory-bearing open-system process

Published: 2026-06-04 09:08:31

Authors: Biagio G. Banigi, Eric Lutz, Mauro Paternostro

Categories: quant-ph

Abstract:
We show the emergence of memory effects in the dynamics of a driven two-level system interacting with a composite environment, and analyze their influence on work, heat and entropy production. We further investigate how the interplay between driving, dissipation and memory effects, stemming from the finiteness of the environment, shapes the thermodynamic response of the system, thus providing insight into quantum thermodynamics beyond the Markovian approximation.

arXiv Page | PDF

Score: 0

PriSrv+: Privacy and Usability-Enhanced Wireless Service Discovery with Fast and Expressive Matchmaking Encryption

Published: 2026-06-04 09:07:19

Authors: Yang Yang, Guomin Yang, Yingjiu Li, Pengfei Wu, Rui Shi, Minming Huang, Jian Weng, HweeHwa Pang, Robert H. Deng

Categories: cs.CR

Abstract:
Service discovery is a fundamental process in wireless networks, enabling devices to find and communicate with services dynamically, and is critical for the seamless operation of modern systems like 5G and IoT. This paper introduces PriSrv+, an advanced privacy and usability-enhanced service discovery protocol for modern wireless networks and resource-constrained environments. PriSrv+ builds upon PriSrv (NDSS'24) by addressing critical limitations in expressiveness, privacy, scalability, and efficiency, while maintaining compatibility with widely used wireless protocols such as mDNS, BLE, and Wi-Fi. A key innovation in PriSrv+ is the development of Fast and Expressive Matchmaking Encryption (FEME), the first matchmaking encryption scheme capable of supporting expressive access control policies with an unbounded attribute universe, allowing any arbitrary string to be used as an attribute. FEME significantly enhances the flexibility of service discovery while ensuring robust message and attribute privacy. Compared to PriSrv, PriSrv+ optimizes cryptographic operations, achieving 7.62× faster encryption and 6.23× faster decryption, and reduces ciphertext sizes by 87.33%. In addition, PriSrv+ reduces communication costs by 87.33% for service broadcast and 86.64% for anonymous mutual authentication compared with PriSrv. Formal security proofs confirm the security of FEME and PriSrv+. Extensive evaluations on multiple platforms demonstrate that PriSrv+ achieves superior performance, scalability, and efficiency compared to existing state-of-the-art protocols.

arXiv Page | PDF

Score: 0

The Analysis of the Influence of Coordinate Error of Observation Station On the Construction Accuracy of Pulsar Time

Published: 2026-06-04 08:59:23

Authors: Zurong Zhou, Chengshi Zhao, Yuping Gao, Jianping Yuan, Wei Han, Shougang Zhang, Yue Hu, Shijun Dang, Na Wang, Jingbo Wang, Minglei Tong, De Wu

Categories: astro-ph.IM, astro-ph.HE

Abstract:
Errors in observatory coordinates directly impact the precision of pulsar time-scale construction. Using the pulsar timing software TEMPO2, this study simulates various station position errors within the three-dimensional terrestrial reference frame for three different types of millisecond pulsars, over periods of 13 days and 5 years, and analyzes their effects on pulsar timing results. The findings demonstrate that, for both 13-day and 5-year observation spans, station coordinate errors substantially reduce the accuracy of pulsar timescale construction when the zenith angle exhibits long-term variations. This effect is independent of pulsar type and the daily observable time of the station antenna for the pulsar. A linear relationship is found between station coordinate errors and the Root-Mean-Square (RMS) of pulsar timing residuals, with fitted linear coefficients ranging from $1.36 \times 10^{-11}$ to $1.61 \times 10^{-9}$ for the three pulsars. The Roemer delay error caused by coordinate inaccuracies is notably larger than other delay and correction terms. Errors along the x- and y-axes have comparable influences on timing precision, whereas errors along the z-axis have a relatively smaller effect. Kendall correlation analysis between station error-induced Roemer delay and RMS yields a correlation coefficient $r = 1.67\%$ and $p = 100\%$ in all cases, indicating that, at current timing precision levels, coordinate errors primarily affect the Roemer delay term and thus the pulse arrival times, which is highly consistent with theoretical models. While these findings offer valuable insights into the key factors influencing pulsar timescale accuracy and related applications, they may not hold under conditions of a constant zenith angle or limited elevation angles, such as those at FAST.

arXiv Page | PDF

Score: 0

Retry Policy Gradients in Continuous Action Spaces

Published: 2026-06-04 08:57:45

Authors: Soichiro Nishimori, Paavo Parmas

Categories: cs.AI

Abstract:
Retry-based objectives such as pass@K and max@K optimize the best return obtained from multiple sampled trajectories, and recent work has shown that they can promote exploration without explicit exploration bonuses. In discrete action spaces, ReMax was shown to do so by adapting to return uncertainty. In this work, we introduce pathwise derivative estimators for retry objectives and use them to extend ReMax to continuous action spaces. We study the resulting learning dynamics and show that, even with deterministic rewards, ReMax can encourage stochastic exploration by reshaping the policy-gradient landscape. In particular, it alters gradients both in direction, biasing updates toward higher policy entropy, and in magnitude, damping gradients and slowing convergence. We further show that Adam's adaptive normalization can mitigate this damping, depending on its numerical stabilization parameter. Empirically, we instantiate this objective as ReMax Actor-Critic (ReMAC), an off-policy actor--critic algorithm that optimizes the ReMax objective using a pathwise derivative estimator. Our experiments show that ReMAC can promote higher policy entropy without entropy regularization and achieves performance comparable to SAC.
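
To make the max@K objective concrete, the sketch below estimates the expected best return over K retries from N sampled returns. It is not the paper's pathwise gradient estimator; it only evaluates the objective, using the standard order-statistics identity that the i-th smallest of N values is the maximum of a uniformly random K-subset with probability C(i-1, K-1) / C(N, K). The function name `max_at_k_estimate` is hypothetical.

```python
from math import comb
from typing import Sequence


def max_at_k_estimate(returns: Sequence[float], k: int) -> float:
    """Estimate E[max of k i.i.d. returns] from n >= k sampled returns.

    Equivalent to averaging the maximum over every size-k subset of the
    samples, but computed in O(n log n) via sorting instead of enumeration.
    """
    n = len(returns)
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= len(returns)")
    ordered = sorted(returns)
    # The sample at 0-based sorted position i is the subset maximum in
    # comb(i, k - 1) of the comb(n, k) subsets (zero when i < k - 1).
    weighted = sum(comb(i, k - 1) * x for i, x in enumerate(ordered))
    return weighted / comb(n, k)


if __name__ == "__main__":
    sampled_returns = [3.0, 1.0, 2.0]
    for k in (1, 2, 3):
        print(k, max_at_k_estimate(sampled_returns, k))
```

With K = 1 the estimate reduces to the ordinary mean return, and with K = N it is the sample maximum, so K interpolates between a standard expected-return objective and a best-case one.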

arXiv Page | PDF

Score: 0

Polylogarithmic Structure of Bragg Diffraction in Finite-Coherence Lattices

Published: 2026-06-04 08:56:24

Authors: Evangelos G. Filothodoros

Categories: cond-mat.stat-mech

Abstract:
We develop a polylogarithmic structure for Bragg diffraction based on a weighted multi-plane interference model. Within this construction, the scattering amplitude is expressed as a polylogarithmic generating function. By introducing extra contributions with power-law and the usual exponential decay, it takes the form $F(θ) = \mathrm{Li}_m\left(e^{iθ_{\mathrm{eff}} - ε}\right)$, where $ε$ is a finite coherence length. In the limit where $ε\rightarrow 0$, the argument of the polylogarithm approaches the unit circle, and the classical Bragg condition corresponds to the approach of the polylogarithm argument toward its branch point $z=1$. This formulation provides a compact analytical framework for describing diffraction line shapes within a generalized correlation model in which peak positions, widths, and line shapes arise from a single analytic structure. Although we recover the standard Bragg law for ideal crystals, the polylogarithm model also captures deviations due to finite correlation length, disorder and non-uniform lattice coherence. We show that if Bragg peaks correspond to boundary singularities of the polylogarithm, a connection between diffraction theory and complex analysis arises. The proposed theoretical model may be particularly relevant for disordered or partially coherent materials, where conventional diffraction models often require additional phenomenological broadening assumptions.
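
The amplitude $F(θ)$ above can be evaluated numerically from the defining series $\mathrm{Li}_m(z) = \sum_{n \ge 1} z^n / n^m$, which converges for $|z| < 1$ and therefore for any $ε > 0$. The Python sketch below is a rough illustration, not the author's code: the function name `bragg_amplitude`, the truncation length `n_terms`, and the reading of each series term as one plane's contribution with weight $n^{-m} e^{-εn}$ are assumptions.

```python
import cmath


def bragg_amplitude(theta_eff: float, eps: float, m: float, n_terms: int = 5000) -> complex:
    """Truncated series for Li_m(exp(i*theta_eff - eps)).

    Requires eps > 0 so that |z| = exp(-eps) < 1 and the series converges;
    the truncation error is of order exp(-eps * n_terms).
    """
    if eps <= 0:
        raise ValueError("eps must be positive for the series to converge")
    z = cmath.exp(complex(-eps, theta_eff))
    total = 0j
    power = 1 + 0j
    for n in range(1, n_terms + 1):
        power *= z  # power now holds z**n
        total += power / n**m
    return total


if __name__ == "__main__":
    # The modulus is largest where the argument is closest to z = 1.
    for theta in (0.0, 0.5, 1.0):
        print(theta, abs(bragg_amplitude(theta, eps=0.1, m=1)))
```

For $m = 1$ the result can be checked against the closed form $\mathrm{Li}_1(z) = -\ln(1 - z)$, and shrinking `eps` sharpens the peak at $θ_{\mathrm{eff}} = 0$, which mirrors the abstract's statement that the Bragg condition corresponds to the argument approaching $z = 1$.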

arXiv Page | PDF

Score: 0