USING MLOPS FOR DEPLOYMENT OF OPINION MINING MODEL AS A SERVICE FOR SMART CITY APPLICATIONS

Aigerim Mussina; Didar Yedilkhan; Yermek Alimzhanov; Aliya Nugumanova; Sanzhar Aubakirov; Aigerim Mansurova

doi:10.37943/21CPQX5616

Authors

Aigerim Mussina, Al-Farabi Kazakh National University, Kazakhstan, https://orcid.org/0000-0002-7043-0810
Didar Yedilkhan, Astana IT University, Kazakhstan, https://orcid.org/0000-0002-6343-5277
Yermek Alimzhanov, Astana IT University, Kazakhstan, https://orcid.org/0000-0002-8758-2220
Aliya Nugumanova, Astana IT University, Kazakhstan, https://orcid.org/0000-0001-5522-4421
Sanzhar Aubakirov, Al-Farabi Kazakh National University, Kazakhstan, https://orcid.org/0000-0002-8416-527X
Aigerim Mansurova, Astana IT University, Kazakhstan, https://orcid.org/0009-0003-1978-9574

DOI:

https://doi.org/10.37943/21CPQX5616

Keywords:

Sentiment Analysis, Smart City, Urban Environment, Opinion Mining, Microservice Architecture

Abstract

This paper presents an MLOps strategy that adapts the automation principles of DevOps to the deployment and lifecycle management of artificial intelligence (AI) models. By leveraging high-performance automation, MLOps integrates AI development with operations, enabling efficient and reliable model deployment. The study demonstrates this approach by implementing the Astana Opinion Mining macro-service, which is customized for sentiment analysis. This macro-service evaluates public opinions against a taxonomy of criteria for assessing the sustainable development of the urban environment. As a smart city application, the system facilitates the collection and analysis of citizen feedback to assess the performance of city services and inform urban planning decisions. Technologically, the MLOps strategy employs containers and microservices to construct robust data and process pipelines. Four core pipelines were developed in this research: data collection, feature engineering, experimentation, and deployment and maintenance. The data collection pipeline relies on automated crawling of diverse sources such as social media and other internet platforms. The feature engineering pipeline preprocesses the data by removing noise, identifying message languages, categorizing topics, and preparing data for further analysis. The experimentation pipeline incorporates services for data labeling, model training, and performance evaluation tailored to sentiment analysis tasks. Finally, the deployment and maintenance pipeline delivers trained models to end users and supports their continual improvement and adaptation. Using this MLOps framework, four sentiment analysis models were tested on Russian-language text: "Blanchefort," "Sismetanin," "MonoHime," and "Dostoevsky." The "Blanchefort" model showed an accuracy of 71.43%. The resulting MLOps framework is fault-tolerant and scalable, and it enables real-time urban environment assessments. By automating workflows, the architecture enhances operational efficiency and offers practical applications for smart city initiatives and sustainable urban development, contributing to better decision-making.
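
To make the feature engineering pipeline described in the abstract more concrete, the following is a minimal Python sketch of how its stages (noise removal, language identification, topic categorization) might be chained. It is not the authors' implementation. The `Message` class, the stage functions, the `TOPIC_KEYWORDS` map, and the Cyrillic-character language check are all hypothetical placeholders. In particular, a real system would use a trained language classifier, since a Cyrillic check alone cannot separate Russian from Kazakh.

```python
import re
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass
class Message:
    """A crawled message moving through the pipeline (hypothetical structure)."""
    text: str
    language: Optional[str] = None
    topic: Optional[str] = None


URL_RE = re.compile(r"https?://\S+")
CYRILLIC_RE = re.compile(r"[а-яё]", re.IGNORECASE)

# Hypothetical keyword map standing in for the paper's criteria taxonomy.
TOPIC_KEYWORDS = {
    "transport": ("автобус", "дорог"),
    "ecology": ("воздух", "парк"),
}


def remove_noise(msg: Message) -> Message:
    """Strip URLs and collapse repeated whitespace."""
    msg.text = " ".join(URL_RE.sub(" ", msg.text).split())
    return msg


def identify_language(msg: Message) -> Message:
    """Crude placeholder: any Cyrillic letter is labeled 'ru'."""
    msg.language = "ru" if CYRILLIC_RE.search(msg.text) else "other"
    return msg


def categorize_topic(msg: Message) -> Message:
    """Assign the first topic whose keyword occurs in the text."""
    lowered = msg.text.lower()
    msg.topic = next(
        (topic for topic, words in TOPIC_KEYWORDS.items()
         if any(word in lowered for word in words)),
        "other",
    )
    return msg


Stage = Callable[[Message], Message]
STAGES: List[Stage] = [remove_noise, identify_language, categorize_topic]


def run_feature_engineering(messages: Iterable[Message]) -> List[Message]:
    """Apply every stage, in order, to every message."""
    processed = []
    for msg in messages:
        for stage in STAGES:
            msg = stage(msg)
        processed.append(msg)
    return processed
```

Keeping each stage as an independent function with the same signature mirrors the microservice idea in the abstract: a stage can be replaced (for example, by a call to a containerized classifier) without changing the rest of the chain.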

Author Biographies

Aigerim Mussina, Al-Farabi Kazakh National University, Kazakhstan

PhD student of Computer Science, Department of Computer Science

Didar Yedilkhan, Astana IT University, Kazakhstan

PhD, Head of the Scientific and Innovation Center “Smart City”

Yermek Alimzhanov, Astana IT University, Kazakhstan

Master of Mathematics, Director of Digital Institute of Lifelong Education

Aliya Nugumanova, Astana IT University, Kazakhstan

PhD, Head of the Scientific and Innovation Center “Big Data and Blockchain Technologies”

Sanzhar Aubakirov, Al-Farabi Kazakh National University, Kazakhstan

PhD, Department of Computer Science

Aigerim Mansurova, Astana IT University, Kazakhstan

Master of Technical Sciences
