Open Access
ACMAN: Adaptive Cross-Modal Anomaly Network

DOI: 10.23977/cpcs.2025.090114

Author(s)

Junwei Wang 1, Junting Liu 1, Yutian Jiao 1

Affiliation(s)

1 Shandong Jiaotong University, Jinan, Shandong, China

Corresponding Author

Junwei Wang

ABSTRACT

Anomaly detection underpins quality inspection, medical diagnosis, and safety monitoring, yet progress remains hindered by the scarcity of anomaly samples, limited semantic alignment, and unreliable uncertainty estimates. Here we present ACMAN-AD (Adaptive Cross-Modal Anomaly Network for Anomaly Detection), a unified framework that leverages vision-language pre-training to overcome these bottlenecks. ACMAN-AD integrates four complementary modules: a Cross-Modal Dynamic Adapter (CMDA) for image-guided prompt generation and adaptive alignment; a Self-Supervised Multi-Scale Feature Fusion (SSMFF) strategy for hierarchical representation learning; a Generative Adversarial Anomaly Synthesis (GAAS) module to enrich anomaly diversity; and a Knowledge Distillation and Uncertainty Quantification (KDUQ) scheme for lightweight inference with calibrated confidence. On MVTec AD and VisA, ACMAN-AD surpasses state-of-the-art methods in both detection and segmentation, improving AUROC and AUPRC by 3.2.
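
For readers unfamiliar with the setting, the following is a minimal illustrative sketch, not the ACMAN-AD implementation. It shows the generic idea behind vision-language anomaly scoring (comparing an image embedding against a "normal" and an "anomalous" text-prompt embedding) and how an AUROC figure like the one reported above is computed from scores and labels. The function names, the toy embeddings, and the temperature of 0.07 are all assumptions made for illustration; none of them come from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def anomaly_score(image_emb, normal_emb, anomalous_emb, temperature=0.07):
    """Softmax probability assigned to the 'anomalous' prompt.

    Hypothetical helper: the embeddings would normally come from a
    pre-trained vision-language encoder; here they are plain lists.
    The temperature value is an assumption, not taken from the paper.
    """
    s_norm = cosine(image_emb, normal_emb) / temperature
    s_anom = cosine(image_emb, anomalous_emb) / temperature
    m = max(s_norm, s_anom)  # subtract the max for numerical stability
    e_norm = math.exp(s_norm - m)
    e_anom = math.exp(s_anom - m)
    return e_anom / (e_norm + e_anom)


def auroc(scores, labels):
    """AUROC as the probability that a random anomalous sample (label 1)
    scores higher than a random normal sample (label 0); ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample of each class")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))
```

In use, one would call anomaly_score once per test image and pass the resulting scores, together with ground-truth labels, to auroc. The pairwise AUROC loop is quadratic in the number of samples, which is acceptable for a sketch but not for large benchmarks.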

KEYWORDS

Anomaly detection, Contrastive Language-Image Pre-training, Vision-language pre-training

CITE THIS PAPER

Junwei Wang, Junting Liu, Yutian Jiao, ACMAN: Adaptive Cross-Modal Anomaly Network. Computing, Performance and Communication Systems (2025) Vol. 9: 106-114. DOI: http://dx.doi.org/10.23977/cpcs.2025.090114.

All published work is licensed under a Creative Commons Attribution 4.0 International License.

Copyright © 2016 - 2031 Clausius Scientific Press Inc. All Rights Reserved.