Modelling Procedure while Assessing the Impact of News Articles on Cryptocurrency (Bitcoin) Market Movement

Document Type: Original article

Authors

1 Department of Statistics, Federal University, Otuoke, Nigeria

2 Department of Statistics, Nasarawa State University, Keffi, Nigeria

10.22059/jcss.2025.393177.1139

Abstract

Background: Cryptocurrencies have a variety of unique qualities, from cutting-edge technology to highly secure architecture. Additionally, the ability to invest in cryptocurrency as an asset, with returns that depend on its prosperity, has made cryptocurrencies attractive to venture capitalists, computer scientists, and statisticians.
Aims: This study concentrates on a collection of documents web-scraped from the market section of CNBC, where each document is associated with a response variable.
Methodology: The documents contain preprocessed words/terms from day-to-day reportage on cryptocurrency (Bitcoin). The corresponding response variables are the daily opening and closing prices of Bitcoin. Supervised Latent Dirichlet Allocation (sLDA), a statistical model of labeled documents, was used to analyze the textual data alongside the corresponding response variables, since the study aims to predict the response variable for new, unlabeled documents.
Results: Hidden topics and their unique terms were extracted from the preprocessed articles through natural language processing. Mean absolute error (MAE), mean absolute percentage error (MAPE), and root mean square error (RMSE) graphs were constructed for sLDA models with k = 3, 10, 20, 30, 50, 75, 100, and 200 topics, and the model with the best evaluation metrics was selected for prediction.
Conclusion: The sLDA model with k = 20 topics was found to give the best evaluation metrics. A posterior covariance matrix was obtained, showing the proportion of terms from the documents that make up each topic. Coefficient values were generated in order to visualize graphically how important the discovered topics are and how they affect the market trend. Finally, new labels (numeric-decoded closing prices) were predicted for the unlabeled documents and compared with the observed prices; the predicted labels follow a pattern similar to that of the closing-price time series.
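
The model-selection step described in the Results can be illustrated with a minimal Python sketch. It computes MAE, MAPE, and RMSE for each candidate number of topics and keeps the k with the lowest error. The function names (`mae`, `mape`, `rmse`, `select_k`), the choice of RMSE as the default selection criterion, and all numbers below are assumptions made for illustration only; they are not the study's code, data, or results.

```python
import math


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error (assumes no actual value is zero)."""
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


def rmse(actual, predicted):
    """Root mean square error."""
    total = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(total / len(actual))


def select_k(actual, predictions_by_k, metric=rmse):
    """Score every candidate k and return the one with the lowest error."""
    scores = {k: metric(actual, preds) for k, preds in predictions_by_k.items()}
    best_k = min(scores, key=scores.get)
    return best_k, scores


# Placeholder values, NOT taken from the study: three held-out labels and
# the predictions that hypothetical sLDA fits with k topics might produce.
actual = [100.0, 110.0, 120.0]
predictions_by_k = {
    3: [90.0, 100.0, 130.0],
    20: [98.0, 112.0, 119.0],
    50: [105.0, 100.0, 125.0],
}

best_k, scores = select_k(actual, predictions_by_k)
print(best_k, scores)
```

In practice the three metrics need not agree on a single k, which is presumably why the abstract reports a graph for each of them; passing `metric=mae` or `metric=mape` to `select_k` shows how the choice changes under a different criterion.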


Articles in Press, Accepted Manuscript
Available Online from 13 July 2025
  • Receive Date: 09 April 2025
  • Revise Date: 02 June 2025
  • Accept Date: 02 June 2025
  • First Publish Date: 13 July 2025
  • Publish Date: 13 July 2025