Density-Based Clustering and Latent Dirichlet Allocation Framework for Consumer Preference Mining in E-Commerce Reviews
DOI: 10.23977/autml.2026.070120
Author(s)
Weiyi Zhu 1, Ming Yin 1, Yan Zhang 1, Yanan Peng 1
Affiliation(s)
1 School of Big Data and Statistics, Sichuan Tourism University, Chengdu, Sichuan, China
Corresponding Author
Yanan Peng
ABSTRACT
Inferring consumer preferences from large-scale e-commerce reviews remains challenging because user-generated text is high-dimensional, sparse, and noisy. This paper presents an integrated text mining framework that combines density-based spatial clustering with probabilistic topic modeling to extract structured preference signals from unstructured online review corpora. The proposed architecture employs the DBSCAN algorithm to partition product entries into coherent price segments without requiring the number of clusters to be specified in advance, applies a Jieba-based tokenization pipeline with custom stopword filtering to normalize the Chinese text, and trains a Latent Dirichlet Allocation (LDA) model whose optimal topic count is selected by minimizing inter-topic cosine similarity. A web crawler built on Requests and BeautifulSoup collected 9,234 consumer reviews together with the associated product metadata; the product entries were partitioned into twelve density-coherent price clusters, revealing two dominant preference intervals near 56 and 70 currency units. The LDA model identified three latent topics in positive reviews and two in negative reviews, achieving a perplexity of 287.4 and a topic coherence of 0.524, representing an 18.7% improvement over comparable LSA and NMF baselines. Sentiment-aware classification reached 92.6% accuracy and an F1 score of 91.0%. Together, these results provide actionable insights for product design optimization and personalized recommendation on e-commerce platforms.
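The abstract states that DBSCAN groups products into price segments without fixing the number of clusters beforehand. The following is a minimal pure-Python sketch of that idea, assuming each product is represented only by its price (a one-dimensional feature). The function name `dbscan_1d` and the example values of `eps` and `min_pts` are hypothetical and chosen only for illustration; the paper's actual feature set and parameter settings are not given here.

```python
from typing import List, Optional

NOISE = -1  # label for points that belong to no dense region


def dbscan_1d(prices: List[float], eps: float, min_pts: int) -> List[int]:
    """Label each price with a cluster id (0, 1, ...) or NOISE (-1).

    Classic DBSCAN: a point is a core point if at least `min_pts` points
    (itself included) lie within distance `eps`; clusters grow by expanding
    from core points, and non-core points reached this way become border points.
    """
    n = len(prices)
    labels: List[Optional[int]] = [None] * n  # None means "not yet visited"

    def neighbors(i: int) -> List[int]:
        # Brute-force eps-neighbourhood (O(n) per query); includes i itself.
        return [j for j in range(n) if abs(prices[j] - prices[i]) <= eps]

    cluster_id = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        # i is a core point: start a new cluster and expand it.
        labels[i] = cluster_id
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: reachable, not expanded
            if labels[j] is not None:
                continue  # already assigned (border or earlier visit)
            labels[j] = cluster_id
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also core: keep expanding
        cluster_id += 1
    return [label for label in labels if label is not None]


if __name__ == "__main__":
    # Two dense price bands and one isolated outlier.
    print(dbscan_1d([10, 11, 12, 50, 51, 52, 53, 100], eps=2, min_pts=3))
```

With the sample prices above, the two dense bands receive labels 0 and 1 and the isolated price of 100 is marked as noise (-1). The brute-force neighbourhood query is quadratic in the number of products; for real catalogues, a library implementation such as scikit-learn's `sklearn.cluster.DBSCAN(eps=..., min_samples=...)` is the more practical choice.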
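The abstract says the LDA topic count is chosen by minimizing inter-topic cosine similarity. One common reading of that criterion, assumed here rather than confirmed by the source, is to train a model for each candidate K, compute the mean pairwise cosine similarity between its topic-word distributions, and keep the K with the lowest value. The sketch below takes those topic-word matrices as input (for example, the `num_topics × vocabulary` array returned by gensim's `LdaModel.get_topics()`); the helper names `mean_pairwise_similarity` and `select_topic_count` are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def mean_pairwise_similarity(topic_word: List[List[float]]) -> float:
    """Average cosine similarity over all K*(K-1)/2 pairs of topics."""
    pairs = list(combinations(topic_word, 2))
    if not pairs:
        raise ValueError("need at least two topics to compare")
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)


def select_topic_count(candidates: Dict[int, List[List[float]]]) -> int:
    """Return the K whose topics are, on average, least similar to each other.

    `candidates` maps each candidate topic count K to that model's
    topic-word matrix (K rows, one probability distribution per topic).
    """
    return min(candidates, key=lambda k: mean_pairwise_similarity(candidates[k]))


if __name__ == "__main__":
    # Toy topic-word matrices over a 4-word vocabulary.
    toy = {
        2: [[0.5, 0.5, 0.0, 0.0], [0.4, 0.6, 0.0, 0.0]],  # nearly identical topics
        3: [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]],
    }
    print(select_topic_count(toy))
```

Averaging over pairs, rather than summing, keeps scores comparable across different K, since the number of topic pairs grows quadratically with K. When two candidates tie, `min` returns whichever K appears first in the dictionary.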
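For context on the reported perplexity of 287.4, perplexity for LDA is conventionally defined over a set of $M$ documents, where $\mathbf{w}_d$ denotes the words of document $d$ and $N_d$ its length. Lower values indicate that the model assigns higher likelihood to the text. The abstract does not state which document set or normalization the authors used, so the following is the textbook form rather than a confirmed description of their evaluation:

```latex
\operatorname{perplexity}(D) = \exp\!\left( - \frac{\sum_{d=1}^{M} \log p(\mathbf{w}_d)}{\sum_{d=1}^{M} N_d} \right)
```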
KEYWORDS
Latent Dirichlet Allocation, Density-Based Spatial Clustering, Chinese Text Tokenization, Consumer Preference Mining, Probabilistic Topic Modeling, E-Commerce Review Analysis
CITE THIS PAPER
Weiyi Zhu, Ming Yin, Yan Zhang, Yanan Peng. Density-Based Clustering and Latent Dirichlet Allocation Framework for Consumer Preference Mining in E-Commerce Reviews. Automation and Machine Learning (2026). Vol. 7, No. 1, 162-172. DOI: http://dx.doi.org/10.23977/autml.2026.070120.