Open Access

Research on Identification of Taxpayers' Fraudulent Invoicing Behavior Based on Feature Engineering


DOI: 10.23977/ferm.2024.070524

Author(s)

Wei Liu 1, Jiyuan Chen 1, Lingjun Xiao 1, Yin Si 1, Jun Tang 1

Affiliation(s)

1 School of Information and Intelligent Engineering, Guangzhou Xinhua University, Guangzhou, China

Corresponding Author

Jun Tang

ABSTRACT

Identifying fraudulent invoicing is a key part of tax risk management: the central task is to determine accurately, from massive volumes of tax data, whether a taxpayer has issued fraudulent invoices, so that tax losses can be reduced. Existing tax data is large in volume and its features are indistinct, and traditional machine learning models generalize poorly, so they perform badly at identifying fraudulent invoicing. To address these problems, this paper builds a high-quality sample dataset through feature engineering on tax data and proposes a learning model based on Stacking ensemble learning to identify taxpayers' fraudulent invoicing behavior. On the tax sample dataset, the proposed Stacking-based recognition model is compared with single models, and it outperforms them in AUC, accuracy, and F1 score. The experimental results support the advantage of the proposed model.
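To make the Stacking idea concrete, the following is a minimal, standard-library-only sketch and not the authors' implementation. The abstract does not name the base learners or the tax features, so everything here is an assumption chosen for illustration: the synthetic three-feature data, the one-feature threshold rules used as base learners (`Stump`), the logistic-regression meta-learner (`Logistic`), and all function names. It shows the core mechanism only: base learners produce out-of-fold scores, and a meta-learner is trained on those scores. Only accuracy is computed; AUC and F1 are omitted for brevity.

```python
import math
import random

def make_data(n, seed):
    """Synthetic stand-in for tax features (hypothetical); label 1 = flagged taxpayer."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        y = rng.randint(0, 1)
        # Three noisy features, each only moderately informative on its own.
        x = [1.5 * y + rng.gauss(0, 1.0) for _ in range(3)]
        rows.append((x, y))
    return rows

class Stump:
    """Base learner: thresholds one feature at the midpoint of the class means."""
    def __init__(self, feature):
        self.feature = feature
        self.threshold = 0.0

    def fit(self, rows):
        pos = [x[self.feature] for x, y in rows if y == 1]
        neg = [x[self.feature] for x, y in rows if y == 0]
        self.threshold = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
        return self

    def score(self, x):
        # Signed margin; a positive value votes for class 1.
        return x[self.feature] - self.threshold

class Logistic:
    """Meta-learner: logistic regression fitted by batch gradient descent."""
    def fit(self, X, y, lr=0.1, epochs=300):
        self.w = [0.0] * len(X[0])
        self.b = 0.0
        for _ in range(epochs):
            gw = [0.0] * len(self.w)
            gb = 0.0
            for xi, yi in zip(X, y):
                err = self.prob(xi) - yi
                gw = [g + err * v for g, v in zip(gw, xi)]
                gb += err
            self.w = [w - lr * g / len(X) for w, g in zip(self.w, gw)]
            self.b -= lr * gb / len(X)
        return self

    def prob(self, x):
        z = self.b + sum(w * v for w, v in zip(self.w, x))
        z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
        return 1.0 / (1.0 + math.exp(-z))

def out_of_fold_scores(rows, k=5):
    """Level-1 features: each row is scored by stumps trained on the other folds."""
    meta = [None] * len(rows)
    for fold in range(k):
        train = [r for i, r in enumerate(rows) if i % k != fold]
        stumps = [Stump(f).fit(train) for f in range(3)]
        for i, (x, _) in enumerate(rows):
            if i % k == fold:
                meta[i] = [s.score(x) for s in stumps]
    return meta

def accuracy(preds, labels):
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

train_rows = make_data(400, seed=0)
test_rows = make_data(200, seed=1)
y_train = [y for _, y in train_rows]
y_test = [y for _, y in test_rows]

# Train the meta-learner on out-of-fold base-learner scores.
meta_train = out_of_fold_scores(train_rows)
meta_model = Logistic().fit(meta_train, y_train)

# Refit the base learners on all training rows for use at prediction time.
stumps = [Stump(f).fit(train_rows) for f in range(3)]
meta_test = [[s.score(x) for s in stumps] for x, _ in test_rows]

stacked_acc = accuracy([int(meta_model.prob(m) >= 0.5) for m in meta_test], y_test)
single_acc = [accuracy([int(s.score(x) > 0) for x, _ in test_rows], y_test)
              for s in stumps]
print("single base learners:", single_acc)
print("stacked model:", stacked_acc)
```

Training the meta-learner on out-of-fold scores, rather than on scores from base learners that have already seen the same rows, is what keeps the second level from simply memorizing the first level's training fit. In practice a library implementation with stronger base learners would likely replace these toy components.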

KEYWORDS

False invoicing; Feature engineering; Integrated learning; Stacking

CITE THIS PAPER

Wei Liu, Jiyuan Chen, Lingjun Xiao, Yin Si, Jun Tang, Research on Identification of Taxpayers' Fraudulent Invoicing Behavior Based on Feature Engineering. Financial Engineering and Risk Management (2024) Vol. 7: 186-193. DOI: http://dx.doi.org/10.23977/ferm.2024.070524.



All published work is licensed under a Creative Commons Attribution 4.0 International License.
