IMPACT OF PARAMETER-TO-DATA RATIO ON LLM FINE-TUNING IN RUSSIAN TEXT CLASSIFICATION TASKS
Abstract and keywords
Abstract:
This paper addresses the optimization of fine-tuning large language models (LLMs) for Russian-language text classification under constrained computational resources. The proposed approach hinges on balancing model size (i.e., the number of parameters) against the volume of training data: a smaller model fine-tuned on a larger dataset is compared against a larger model fine-tuned on a smaller dataset. The aim was to determine how the ratio of model parameters to fine-tuning data affects the classification quality of large language models. We hypothesized that a weaker (i.e., smaller) model trained on more data could achieve classification performance comparable to, or even surpassing, that of a stronger (i.e., larger) model trained on less data. The hypothesis was motivated by the need to adapt LLMs to Russian-language tasks, where a larger dataset may compensate for reduced model capacity. It was evaluated on three classification tasks: sentiment analysis of movie reviews, sentiment analysis of service reviews, and topic classification of news articles. The experiments were conducted on Russian-language datasets and employed two multilingual models: XLM-RoBERTa-comet-small (107M parameters) as the weaker model and XLM-RoBERTa-base (278M parameters) as the stronger one. The smaller model was fine-tuned on proportionally larger datasets (scaled by the difference in parameter count), while the larger model used correspondingly smaller datasets. The weaker model consistently matched or exceeded the performance of the stronger model while requiring 2–3 times fewer computational resources (measured in FLOPs). This result highlights the practical value of the approach for energy-efficient fine-tuning in Russian-language settings.
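
As a rough illustration of the setup described above, the sketch below fine-tunes the two XLM-RoBERTa variants with Hugging Face Transformers, giving the smaller model roughly 2.6 times more training examples in line with the 278M-to-107M parameter ratio. This is a minimal sketch under assumed conditions, not the authors' pipeline: the hub identifier of the small model, the CSV file names, the "text"/"label" column names, the example counts, and the hyperparameters are illustrative assumptions.

# Minimal sketch (assumed setup): compare a smaller model trained on more data
# with a larger model trained on less data for Russian text classification.
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)


def finetune(model_name: str, n_train: int, num_labels: int = 2) -> float:
    """Fine-tune model_name on n_train examples and return test-set accuracy."""
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(
        model_name, num_labels=num_labels
    )

    # Placeholder corpus: any Russian-language classification dataset with
    # "text" and "label" columns (e.g. a local CSV of movie reviews).
    raw = load_dataset(
        "csv",
        data_files={"train": "reviews_train.csv", "test": "reviews_test.csv"},
    )

    def tokenize(batch):
        return tokenizer(batch["text"], truncation=True, max_length=256)

    train = raw["train"].shuffle(seed=42).select(range(n_train)).map(tokenize, batched=True)
    test = raw["test"].map(tokenize, batched=True)

    args = TrainingArguments(
        output_dir=f"out/{model_name.replace('/', '_')}",
        num_train_epochs=3,
        per_device_train_batch_size=16,
        learning_rate=2e-5,
        report_to="none",
    )
    trainer = Trainer(model=model, args=args, train_dataset=train, tokenizer=tokenizer)
    trainer.train()

    predictions = trainer.predict(test)
    accuracy = (predictions.predictions.argmax(-1) == predictions.label_ids).mean()
    return float(accuracy)


# Larger model on a base amount of data vs. smaller model on ~2.6x the data
# (the 278M / 107M parameter ratio); the example counts are illustrative.
acc_strong = finetune("xlm-roberta-base", n_train=10_000)
acc_weak = finetune("Unbabel/xlm-roberta-comet-small", n_train=26_000)  # exact hub ID may differ
print(f"strong (278M, 10k examples): {acc_strong:.3f}")
print(f"weak   (107M, 26k examples): {acc_weak:.3f}")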

Keywords:
LLM, fine-tuning, XLM-RoBERTa, Russian-language datasets, text classification, sentiment, topic classification