Please use this identifier to cite or link to this item: http://elibrary.kdpu.edu.ua/xmlui/handle/123456789/12014
Full metadata record
DC Field | Value | Language
dc.contributor.author | Семеріков, Сергій Олексійович | -
dc.contributor.author | Вакалюк, Тетяна Анатоліївна | -
dc.contributor.author | Каневська, Ольга Борисівна | -
dc.contributor.author | Моісеєнко, Михайло Вікторович | -
dc.contributor.author | Дончев, Іван Іванович | -
dc.contributor.author | Колгатін, Андрій Олександрович | -
dc.date.accessioned | 2025-06-24T10:29:21Z | -
dc.date.available | 2025-06-24T10:29:21Z | -
dc.date.issued | 2025-03-14 | -
dc.identifier.citation | Semerikov S.O., Vakaliuk T.A., Kanevska O.B., Moiseienko M.V., Donchev I.I., Kolhatin A.O. LLM on the edge: the new frontier / CEUR Workshop Proceedings. – 2025. – Vol. 3943. – P. 137–161. – Access mode: https://ceur-ws.org/Vol-3943/paper28.pdf | uk
dc.identifier.uri | http://elibrary.kdpu.edu.ua/xmlui/handle/123456789/12014 | -
dc.identifier.uri | https://ceur-ws.org/Vol-3943/paper28.pdf | -
dc.description | [1] A. V. Slobodianiuk, S. O. Semerikov, Advances in neural text generation: A systematic review (2022-2024), CEUR Workshop Proceedings 3917 (2025) 332–361.
[2] R. O. Liashenko, S. O. Semerikov, Bibliometric analysis and experimental assessment of chatbot training approaches, CEUR Workshop Proceedings 3917 (2025) 199–225.
[3] J. Devlin, M. Chang, K. Lee, K. Toutanova, BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, in: J. Burstein, C. Doran, T. Solorio (Eds.), Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), Association for Computational Linguistics, 2019, pp. 4171–4186. doi:10.18653/V1/N19-1423.
[4] C. Raffel, N. Shazeer, A. Roberts, K. Lee, S. Narang, M. Matena, Y. Zhou, W. Li, P. J. Liu, Exploring the limits of transfer learning with a unified text-to-text transformer, The Journal of Machine Learning Research 21 (2020) 5485–5551.
[5] O. Friha, M. Amine Ferrag, B. Kantarci, B. Cakmak, A. Ozgun, N. Ghoualmi-Zine, LLM-Based Edge Intelligence: A Comprehensive Survey on Architectures, Applications, Security and Trustworthiness, IEEE Open Journal of the Communications Society 5 (2024) 5799–5856. doi:10.1109/OJCOMS.2024.3456549.
[6] F. Cai, D. Yuan, Z. Yang, L. Cui, Edge-LLM: A Collaborative Framework for Large Language Model Serving in Edge Computing, in: R. N. Chang, C. K. Chang, Z. Jiang, J. Yang, Z. Jin, M. Sheng, J. Fan, K. K. Fletcher, Q. He, Q. He, C. Ardagna, J. Yang, J. Yin, Z. Wang, A. Beheshti, S. Russo, N. Atukorala, J. Wu, P. S. Yu, H. Ludwig, S. Reiff-Marganiec, E. Zhang, A. Sailer, N. Bena, K. Li, Y. Watanabe, T. Zhao, S. Wang, Z. Tu, Y. Wang, K. Wei (Eds.), Proceedings of the IEEE International Conference on Web Services, ICWS, Institute of Electrical and Electronics Engineers Inc., 2024, pp. 799–809. doi:10.1109/ICWS62655.2024.00099.
[7] Q. Li, J. Wen, H. Jin, Governing Open Vocabulary Data Leaks Using an Edge LLM through Programming by Example, Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 8 (2024) 179. doi:10.1145/3699760.
[8] S. Bhardwaj, P. Singh, M. K. Pandit, A Survey on the Integration and Optimization of Large Language Models in Edge Computing Environments, in: 2024 16th International Conference on Computer and Automation Engineering, ICCAE 2024, Institute of Electrical and Electronics Engineers Inc., 2024, pp. 168–172. doi:10.1109/ICCAE59995.2024.10569285.
[9] M. Zhang, X. Shen, J. Cao, Z. Cui, S. Jiang, EdgeShard: Efficient LLM Inference via Collaborative Edge Computing, IEEE Internet of Things Journal (2024). doi:10.1109/JIOT.2024.3524255.
[10] T. B. Brown, B. Mann, N. Ryder, M. Subbiah, J. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, S. Agarwal, A. Herbert-Voss, G. Krueger, T. Henighan, R. Child, A. Ramesh, D. M. Ziegler, J. Wu, C. Winter, C. Hesse, M. Chen, E. Sigler, M. Litwin, S. Gray, B. Chess, J. Clark, C. Berner, S. McCandlish, A. Radford, I. Sutskever, D. Amodei, Language Models are Few-Shot Learners, in: H. Larochelle, M. Ranzato, R. Hadsell, M. Balcan, H. Lin (Eds.), Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6-12, 2020, virtual, 2020, pp. 1877–1901. URL: https://proceedings.neurips.cc/paper/2020/hash/1457c0d6bfcb4967418bfb8ac142f64a-Abstract.html.
[11] Y. Shen, J. Shao, X. Zhang, Z. Lin, H. Pan, D. Li, J. Zhang, K. B. Letaief, Large Language Models Empowered Autonomous Edge AI for Connected Intelligence, IEEE Communications Magazine 62 (2024) 140–146. doi:10.1109/MCOM.001.2300550.
[12] Z. Yu, Z. Wang, Y. Li, R. Gao, X. Zhou, S. R. Bommu, Y. K. Zhao, Y. C. Lin, EDGE-LLM: Enabling Efficient Large Language Model Adaptation on Edge Devices via Unified Compression and Adaptive Layer Voting, in: Proceedings of the 61st ACM/IEEE Design Automation Conference, DAC ’24, Association for Computing Machinery, New York, NY, USA, 2024, p. 327. doi:10.1145/3649329.3658473.
[13] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, I. Polosukhin, Attention is All you Need, in: I. Guyon, U. von Luxburg, S. Bengio, H. M. Wallach, R. Fergus, S. V. N. Vishwanathan, R. Garnett (Eds.), Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, December 4-9, 2017, Long Beach, CA, USA, 2017, pp. 5998–6008. URL: https://proceedings.neurips.cc/paper/2017/hash/3f5ee243547dee91fbd053c1c4a845aa-Abstract.html.
[14] D. Qiao, X. Ao, Y. Liu, X. Chen, F. Song, Z. Qin, W. Jin, Tri-AFLLM: Resource-Efficient Adaptive Asynchronous Accelerated Federated LLMs, IEEE Transactions on Circuits and Systems for Video Technology (2024). doi:10.1109/TCSVT.2024.3519790.
[15] L. E. Erdogan, N. Lee, S. Jha, S. Kim, R. Tabrizi, S. Moon, C. Hooper, G. Anumanchipalli, K. Keutzer, A. Gholami, TinyAgent: Function Calling at the Edge, in: D. I. H. Farias, T. Hope, M. Li (Eds.), EMNLP 2024 - 2024 Conference on Empirical Methods in Natural Language Processing, Proceedings of System Demonstrations, Association for Computational Linguistics (ACL), 2024, pp. 80–88.
[16] Z. Wang, J. Yang, X. Qian, S. Xing, X. Jiang, C. Lv, S. Zhang, MNN-LLM: A Generic Inference Engine for Fast Large Language Model Deployment on Mobile Devices, in: Proceedings of the 6th ACM International Conference on Multimedia in Asia Workshops, MMAsia ’24 Workshops, Association for Computing Machinery, New York, NY, USA, 2024, p. 11. doi:10.1145/3700410.3702126.
[17] A. Candel, J. McKinney, P. Singer, P. Pfeiffer, M. Jeblick, C. M. Lee, M. V. Conde, H2O Open Ecosystem for State-of-the-art Large Language Models, in: Y. Feng, E. Lefever (Eds.), EMNLP 2023 - 2023 Conference on Empirical Methods in Natural Language Processing, Proceedings of the System Demonstrations, Association for Computational Linguistics (ACL), 2023, pp. 82–89.
[18] X. Shen, Z. Han, L. Lu, Z. Kong, P. Dong, Z. Li, Y. Xie, C. Wu, M. Leeser, P. Zhao, X. Lin, Y. Wang, HotaQ: Hardware Oriented Token Adaptive Quantization for Large Language Models, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (2024). doi:10.1109/TCAD.2024.3487781.
[19] T. Tambe, J. Zhang, C. Hooper, T. Jia, P. N. Whatmough, J. Zuckerman, M. C. D. Santos, E. J. Loscalzo, D. Giri, K. Shepard, L. Carloni, A. Rush, D. Brooks, G.-Y. Wei, 22.9 A 12nm 18.1TFLOPs/W Sparse Transformer Processor with Entropy-Based Early Exit, Mixed-Precision Predication and Fine-Grained Power Management, in: Digest of Technical Papers - IEEE International Solid-State Circuits Conference, volume 2023-February, Institute of Electrical and Electronics Engineers Inc., 2023, pp. 342–344. doi:10.1109/ISSCC42615.2023.10067817.
[20] V. Sanh, L. Debut, J. Chaumond, T. Wolf, DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter, CoRR abs/1910.01108 (2019). URL: http://arxiv.org/abs/1910.01108. arXiv:1910.01108.
[21] N. Houlsby, A. Giurgiu, S. Jastrzebski, B. Morrone, Q. de Laroussilhe, A. Gesmundo, M. Attariyan, S. Gelly, Parameter-Efficient Transfer Learning for NLP, in: K. Chaudhuri, R. Salakhutdinov (Eds.), Proceedings of the 36th International Conference on Machine Learning, ICML 2019, 9-15 June 2019, Long Beach, California, USA, volume 97 of Proceedings of Machine Learning Research, PMLR, 2019, pp. 2790–2799. URL: http://proceedings.mlr.press/v97/houlsby19a.html.
[22] G. E. Hinton, O. Vinyals, J. Dean, Distilling the Knowledge in a Neural Network, CoRR abs/1503.02531 (2015). URL: http://arxiv.org/abs/1503.02531. arXiv:1503.02531.
[23] X. Jiao, Y. Yin, L. Shang, X. Jiang, X. Chen, L. Li, F. Wang, Q. Liu, TinyBERT: Distilling BERT for natural language understanding, in: T. Cohn, Y. He, Y. Liu (Eds.), Findings of the Association for Computational Linguistics: EMNLP 2020, Association for Computational Linguistics, Online, 2020, pp. 4163–4174. doi:10.18653/v1/2020.findings-emnlp.372.
[24] X. Zhou, Q. Jia, Y. Hu, R. Xie, T. Huang, F. R. Yu, GenG: An LLM-Based Generic Time Series Data Generation Approach for Edge Intelligence via Cross-Domain Collaboration, in: IEEE INFOCOM 2024 - IEEE Conference on Computer Communications Workshops, INFOCOM WKSHPS 2024, Institute of Electrical and Electronics Engineers Inc., 2024, pp. 1–6. doi:10.1109/INFOCOMWKSHPS61880.2024.10620716.
[25] W. Zhao, W. Jing, Z. Lu, X. Wen, Edge and Terminal Cooperation Enabled LLM Deployment Optimization in Wireless Network, in: International Conference on Communications in China, ICCC Workshops 2024, Institute of Electrical and Electronics Engineers Inc., 2024, pp. 220–225. doi:10.1109/ICCCWorkshops62562.2024.10693742.
[26] Y. Yao, Z. Li, H. Zhao, GKT: A Novel Guidance-Based Knowledge Transfer Framework For Efficient Cloud-edge Collaboration LLM Deployment, in: L.-W. Ku, A. Martins, V. Srikumar (Eds.), Proceedings of the Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics (ACL), 2024, pp. 3433–3446.
[27] Z. Yao, Z. Tang, J. Lou, P. Shen, W. Jia, VELO: A Vector Database-Assisted Cloud-Edge Collaborative LLM QoS Optimization Framework, in: R. N. Chang, C. K. Chang, Z. Jiang, J. Yang, Z. Jin, M. Sheng, J. Fan, K. K. Fletcher, Q. He, Q. He, C. Ardagna, J. Yang, J. Yin, Z. Wang, A. Beheshti, S. Russo, N. Atukorala, J. Wu, P. S. Yu, H. Ludwig, S. Reiff-Marganiec, E. Zhang, A. Sailer, N. Bena, K. Li, Y. Watanabe, T. Zhao, S. Wang, Z. Tu, Y. Wang, K. Wei (Eds.), Proceedings of the IEEE International Conference on Web Services, ICWS, Institute of Electrical and Electronics Engineers Inc., 2024, pp. 865–876. doi:10.1109/ICWS62655.2024.00105.
[28] N. Nazari, F. Xiang, C. Fang, H. M. Makrani, A. Puri, K. Patwari, H. Sayadi, S. Rafatirad, C.-N. Chuah, H. Homayoun, LLM-FIN: Large Language Models Fingerprinting Attack on Edge Devices, in: Proceedings - International Symposium on Quality Electronic Design, ISQED, IEEE Computer Society, 2024, pp. 1–6. doi:10.1109/ISQED60706.2024.10528736.
[29] B. Ouyang, S. Ye, L. Zeng, T. Qian, J. Li, X. Chen, Pluto and Charon: A Time and Memory Efficient Collaborative Edge AI Framework for Personal LLMs Fine-tuning, in: Proceedings of the 53rd International Conference on Parallel Processing, ICPP ’24, Association for Computing Machinery, New York, NY, USA, 2024, pp. 762–771. doi:10.1145/3673038.3673043.
[30] Z. Yu, S. Liang, T. Ma, Y. Cai, Z. Nan, D. Huang, X. Song, Y. Hao, J. Zhang, T. Zhi, Y. Zhao, Z. Du, X. Hu, Q. Guo, T. Chen, Cambricon-LLM: A Chiplet-Based Hybrid Architecture for On-Device Inference of 70B LLM, in: Proceedings of the Annual International Symposium on Microarchitecture, MICRO, IEEE Computer Society, 2024, pp. 1474–1488. doi:10.1109/MICRO61859.2024.00108.
[31] T. Glint, B. Mittal, S. Sharma, A. Q. Ronak, A. Goud, N. Kasture, Z. Momin, A. Krishna, J. Mekie, AxLaM: Energy-efficient accelerator design for language models for edge computing, Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences 383 (2025) 20230395. doi:10.1098/rsta.2023.0395.
[32] T. Yang, F. Ma, X. Li, F. Liu, Y. Zhao, Z. He, L. Jiang, DTATrans: Leveraging Dynamic Token-Based Quantization With Accuracy Compensation Mechanism for Efficient Transformer Architecture, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 42 (2023) 509–520. doi:10.1109/TCAD.2022.3181541.
[33] T. Yang, D. Li, Z. Song, Y. Zhao, F. Liu, Z. Wang, Z. He, L. Jiang, DTQAtten: Leveraging Dynamic Token-based Quantization for Efficient Attention Architecture, in: C. Bolchini, I. Verbauwhede, I. Vatajelu (Eds.), Proceedings of the 2022 Design, Automation and Test in Europe Conference and Exhibition, DATE 2022, Institute of Electrical and Electronics Engineers Inc., 2022, pp. 700–705. doi:10.23919/DATE54114.2022.9774692.
[34] M. Ibrahim, Z. Wan, H. Li, P. Panda, T. Krishna, P. Kanerva, Y. Chen, A. Raychowdhury, Special Session: Neuro-Symbolic Architecture Meets Large Language Models: A Memory-Centric Perspective, in: Proceedings - 2024 International Conference on Hardware/Software Codesign and System Synthesis, CODES+ISSS 2024, Institute of Electrical and Electronics Engineers Inc., 2024, pp. 11–20. doi:10.1109/CODES-ISSS60120.2024.00012.
[35] A. Basit, M. Shafique, TinyDigiClones: A Multi-Modal LLM-Based Framework for Edge-optimized Personalized Avatars, in: Proceedings of the International Joint Conference on Neural Networks, Institute of Electrical and Electronics Engineers Inc., 2024, pp. 1–9. doi:10.1109/IJCNN60899.2024.10649909.
[36] L. Wu, Y. Zhao, C. Wang, T. Liu, H. Wang, A First Look at LLM-powered Smartphones, in: Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering Workshops, ASEW ’24, Association for Computing Machinery, New York, NY, USA, 2024, pp. 208–217. doi:10.1145/3691621.3694952.
[37] D. Zhang, W. Shi, Blockchain-based Edge Intelligence Enabled by AI Large Models for Future Internet of Things, in: 2024 IEEE 12th International Conference on Information and Communication Networks, ICICN 2024, Institute of Electrical and Electronics Engineers Inc., 2024, pp. 368–374. doi:10.1109/ICICN62625.2024.10761527.
[38] Y. Rong, Y. Mao, X. He, M. Chen, Large-Scale Traffic Flow Forecast with Lightweight LLM in Edge Intelligence, IEEE Internet of Things Magazine 8 (2025) 12–18. doi:10.1109/IOTM.001.2400047.
[39] F. Piccialli, D. Chiaro, P. Qi, V. Bellandi, E. Damiani, Federated and edge learning for large language models, Information Fusion 117 (2025) 102840. doi:10.1016/j.inffus.2024.102840.
[40] J. Du, T. Lin, C. Jiang, Q. Yang, C. F. Bader, Z. Han, Distributed Foundation Models for Multi-Modal Learning in 6G Wireless Networks, IEEE Wireless Communications 31 (2024) 20–30. doi:10.1109/MWC.009.2300501.
[41] Y. Hu, Y. Wang, R. Liu, Z. Shen, H. Lipson, Reconfigurable Robot Identification from Motion Data, in: IEEE International Conference on Intelligent Robots and Systems, Institute of Electrical and Electronics Engineers Inc., 2024, pp. 14133–14140. doi:10.1109/IROS58592.2024.10801809.
[42] K. Kawaharazuka, Y. Obinata, N. Kanazawa, K. Okada, M. Inaba, Robotic Applications of Pre-Trained Vision-Language Models to Various Recognition Behaviors, in: IEEE-RAS International Conference on Humanoid Robots, IEEE Computer Society, 2023, pp. 1–8. doi:10.1109/Humanoids57100.2023.10375211.
[43] C. Xu, X. Hou, J. Liu, C. Li, T. Huang, X. Zhu, M. Niu, L. Sun, P. Tang, T. Xu, K.-T. Cheng, M. Guo, MMBench: Benchmarking End-to-End Multi-modal DNNs and Understanding Their Hardware-Software Implications, in: Proceedings - 2023 IEEE International Symposium on Workload Characterization, IISWC 2023, Institute of Electrical and Electronics Engineers Inc., 2023, pp. 154–166. doi:10.1109/IISWC59245.2023.00014.
[44] N. Shazeer, A. Mirhoseini, K. Maziarz, A. Davis, Q. V. Le, G. E. Hinton, J. Dean, Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer, in: 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings, OpenReview.net, 2017. URL: https://openreview.net/forum?id=B1ckMDqlg.
[45] C. Finn, P. Abbeel, S. Levine, Model-agnostic meta-learning for fast adaptation of deep networks, in: Proceedings of the 34th International Conference on Machine Learning - Volume 70, ICML’17, JMLR.org, 2017, pp. 1126–1135.
[46] N. P. Jouppi, C. Young, N. Patil, D. Patterson, G. Agrawal, R. Bajwa, S. Bates, S. Bhatia, N. Boden, A. Borchers, R. Boyle, P.-l. Cantin, C. Chao, C. Clark, J. Coriell, M. Daley, M. Dau, J. Dean, B. Gelb, T. V. Ghaemmaghami, R. Gottipati, W. Gulland, R. Hagmann, C. R. Ho, D. Hogberg, J. Hu, R. Hundt, D. Hurt, J. Ibarz, A. Jaffey, A. Jaworski, A. Kaplan, H. Khaitan, D. Killebrew, A. Koch, N. Kumar, S. Lacy, J. Laudon, J. Law, D. Le, C. Leary, Z. Liu, K. Lucke, A. Lundin, G. MacKean, A. Maggiore, M. Mahony, K. Miller, R. Nagarajan, R. Narayanaswami, R. Ni, K. Nix, T. Norrie, M. Omernick, N. Penukonda, A. Phelps, J. Ross, M. Ross, A. Salek, E. Samadiani, C. Severn, G. Sizikov, M. Snelham, J. Souter, D. Steinberg, A. Swing, M. Tan, G. Thorson, B. Tian, H. Toma, E. Tuttle, V. Vasudevan, R. Walter, W. Wang, E. Wilcox, D. H. Yoon, In-Datacenter Performance Analysis of a Tensor Processing Unit, SIGARCH Comput. Archit. News 45 (2017) 1–12. doi:10.1145/3140659.3080246.
[47] Y. Xue, Y. Liu, J. Huang, System Virtualization for Neural Processing Units, in: Proceedings of the 19th Workshop on Hot Topics in Operating Systems, HOTOS ’23, Association for Computing Machinery, New York, NY, USA, 2023, pp. 80–86. doi:10.1145/3593856.3595912.
[48] C. Dwork, A. Roth, The Algorithmic Foundations of Differential Privacy, Foundations and Trends in Theoretical Computer Science 9 (2014) 211–407. doi:10.1561/0400000042.
[49] D. Kim, Y. Lee, S. Cheon, H. Choi, J. Lee, H. Youm, D. Lee, H. Kim, Privacy Set: Privacy-Authority-Aware Compiler for Homomorphic Encryption on Edge-Cloud System, IEEE Internet Things J. 11 (2024) 35167–35184. doi:10.1109/JIOT.2024.3437356.
[50] M. Sabt, M. Achemlal, A. Bouabdallah, Trusted Execution Environment: What It is, and What It is Not, in: 2015 IEEE Trustcom/BigDataSE/ISPA, volume 1, 2015, pp. 57–64. doi:10.1109/Trustcom.2015.357.
[51] P. Dubey, M. Kumar, Integrating Explainable AI with Federated Learning for Next-Generation IoT: A comprehensive review and prospective insights, Computer Science Review 56 (2025) 100697. doi:10.1016/J.COSREV.2024.100697.
[52] A. Petrella, M. Miozzo, P. Dini, Mobile Traffic Prediction at the Edge Through Distributed and Deep Transfer Learning, IEEE Access 12 (2024) 191288–191303. doi:10.1109/ACCESS.2024.3518483.
[53] B. Thomas, S. Kessler, S. Karout, Efficient Adapter Transfer of Self-Supervised Speech Models for Automatic Speech Recognition, in: ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022, pp. 7102–7106. doi:10.1109/ICASSP43922.2022.9746223.
[54] L. Falissard, S. Affeldt, M. Nadif, Attentive Perturbation: Extending Prefix Tuning to Large Language Models Inner Representations, in: G. Nicosia, V. Ojha, E. L. Malfa, G. L. Malfa, P. M. Pardalos, R. Umeton (Eds.), Machine Learning, Optimization, and Data Science - 9th International Conference, LOD 2023, Grasmere, UK, September 22-26, 2023, Revised Selected Papers, Part I, volume 14505 of Lecture Notes in Computer Science, Springer, 2023, pp. 488–496. doi:10.1007/978-3-031-53969-5_36.
[55] B. Yuan, Y. Chen, Y. Zhang, W. Jiang, Hide and Seek in Noise Labels: Noise-Robust Collaborative Active Learning with LLMs-Powered Assistance, in: L. Ku, A. Martins, V. Srikumar (Eds.), Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), ACL 2024, Bangkok, Thailand, August 11-16, 2024, Association for Computational Linguistics, 2024, pp. 10977–11011. doi:10.18653/V1/2024.ACL-LONG.592.
[56] Q. Zhang, C. Xu, J. Li, Y. Sun, J. Bao, D. Zhang, LLM-TSFD: An industrial time series human-in-the-loop fault diagnosis method based on a large language model, Expert Syst. Appl. 264 (2025). doi:10.1016/j.eswa.2024.125861.
[57] M. Garofalo, M. Colosi, A. Catalfamo, M. Villari, Web-Centric Federated Learning over the Cloud-Edge Continuum Leveraging ONNX and WASM, in: IEEE Symposium on Computers and Communications, ISCC 2024, Paris, France, June 26-29, 2024, IEEE, 2024, pp. 1–7. doi:10.1109/ISCC61673.2024.10733614.
[58] I. D. Martinez-Casanueva, L. Bellido, C. M. Lentisco, D. Fernández, An Initial Approach to a Multi-access Edge Computing Reference Architecture Implementation Using Kubernetes, in: H. Gao, R. J. D. Barroso, S. Pang, R. Li (Eds.), Broadband Communications, Networks, and Systems - 11th EAI International Conference, BROADNETS 2020, Qingdao, China, December 11-12, 2020, Proceedings, volume 355 of Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, Springer, 2020, pp. 185–193. doi:10.1007/978-3-030-68737-3_13.
[59] E. K. Lua, J. Crowcroft, M. Pias, R. Sharma, S. Lim, A survey and comparison of peer-to-peer overlay network schemes, IEEE Communications Surveys & Tutorials 7 (2005) 72–93. doi:10.1109/COMST.2005.1610546.
[60] F. Zhu, F. Huang, Y. Yu, G. Liu, T. Huang, Task Offloading with LLM-Enhanced Multi-Agent Reinforcement Learning in UAV-Assisted Edge Computing, Sensors 25 (2025) 175. doi:10.3390/s25010175.
[61] M. Xu, D. Niyato, C. G. Brinton, Serving Long-Context LLMs at the Mobile Edge: Test-Time Reinforcement Learning-based Model Caching and Inference Offloading, CoRR abs/2501.14205 (2025). doi:10.48550/ARXIV.2501.14205. arXiv:2501.14205.
[62] C. Fu, Y. Su, K. Su, Y. Liu, J. Shi, B. Wu, C. Liu, C. T. Ishi, H. Ishiguro, HAM-GNN: A hierarchical attention-based multi-dimensional edge graph neural network for dialogue act classification, Expert Syst. Appl. 261 (2025) 125459. doi:10.1016/J.ESWA.2024.125459.
[63] M. Yang, Y. Yang, P. Jiang, A design method for edge–cloud collaborative product service system: a dynamic event-state knowledge graph-based approach with real case study, International Journal of Production Research 62 (2024) 2584–2605. doi:10.1080/00207543.2023.2219345.
[64] N. Wang, J. Xie, H. Luo, Q. Cheng, J. Wu, M. Jia, L. Li, Efficient Image Captioning for Edge Devices, in: B. Williams, Y. Chen, J. Neville (Eds.), Thirty-Seventh AAAI Conference on Artificial Intelligence, AAAI 2023, Thirty-Fifth Conference on Innovative Applications of Artificial Intelligence, IAAI 2023, Thirteenth Symposium on Educational Advances in Artificial Intelligence, EAAI 2023, Washington, DC, USA, February 7-14, 2023, AAAI Press, 2023, pp. 2608–2616. doi:10.1609/AAAI.V37I2.25359.
[65] Y. Wang, Y. Dong, S. Guo, Y. Yang, X. Liao, Latency-Aware Adaptive Video Summarization for Mobile Edge Clouds, IEEE Trans. Multim. 22 (2020) 1193–1207. doi:10.1109/TMM.2019.2939753.
[66] R. Liashenko, S. Semerikov, The Determination and Visualisation of Key Concepts Related to the Training of Chatbots, in: E. Faure, Y. Tryus, T. Vartiainen, O. Danchenko, M. Bondarenko, C. Bazilo, G. Zaspa (Eds.), Information Technology for Education, Science, and Technics, volume 222 of Lecture Notes on Data Engineering and Communications Technologies, Springer Nature Switzerland, Cham, 2024, pp. 111–126. doi:10.1007/978-3-031-71804-5_8.
[67] V. Mukovoz, T. Vakaliuk, S. Semerikov, Road Sign Recognition Using Convolutional Neural Networks, in: Information Technology for Education, Science, and Technics, volume 222 of Lecture Notes on Data Engineering and Communications Technologies, Springer Nature Switzerland, Cham, 2024, pp. 172–188. doi:10.1007/978-3-031-71804-5_12.
[68] M. Fakih, R. Dharmaji, Y. Moghaddas, G. Quiros, O. Ogundare, M. A. Al Faruque, LLM4PLC: Harnessing Large Language Models for Verifiable Programming of PLCs in Industrial Control Systems, in: Proceedings of the 46th International Conference on Software Engineering: Software Engineering in Practice, ICSE-SEIP ’24, Association for Computing Machinery, New York, NY, USA, 2024, pp. 192–203. doi:10.1145/3639477.3639743.
[69] S. Ji, X. Zheng, J. Sun, R. Chen, W. Gao, M. Srivastava, MindGuard: Towards Accessible and Stigma-free Mental Health First Aid via Edge LLM, CoRR abs/2409.10064 (2024). doi:10.48550/ARXIV.2409.10064. arXiv:2409.10064.
[70] E. Strubell, A. Ganesh, A. McCallum, Energy and Policy Considerations for Modern Deep Learning Research, Proceedings of the AAAI Conference on Artificial Intelligence 34 (2020) 13693–13696. doi:10.1609/aaai.v34i09.7123.
[71] A. Khoshsirat, G. Perin, M. Rossi, Decentralized LLM inference over edge networks with energy harvesting, CoRR abs/2408.15907 (2024). doi:10.48550/ARXIV.2408.15907. arXiv:2408.15907.
[72] I. Mohiuddin, A. Almogren, Workload aware VM consolidation method in edge/cloud computing for iot applications, J. Parallel Distributed Comput. 123 (2019) 204–214. doi:10.1016/J.JPDC.2018.09.011.
[73] M. S. Hossain, Y. Hao, L. Hu, J. Liu, G. Wei, M. Chen, Immersive Multimedia Service Caching in Edge Cloud with Renewable Energy, ACM Trans. Multim. Comput. Commun. Appl. 20 (2024) 173:1–173:23. doi:10.1145/3643818.
[74] D. O. Hanchuk, S. O. Semerikov, Implementing MLOps practices for effective machine learning model deployment: A meta synthesis, CEUR Workshop Proceedings 3918 (2024) 329–337.
[75] D. O. Hanchuk, S. O. Semerikov, Automating machine learning: A meta-synthesis of MLOps tools, frameworks and architectures, CEUR Workshop Proceedings 3917 (2025) 362–414. | uk
dc.description.abstract | The advent of large language models (LLMs) has revolutionized natural language processing, enabling unprecedented capabilities in text generation, reasoning, and human-machine interaction. However, their deployment on resource-constrained edge devices presents significant challenges due to high computational complexity, large model sizes, and stringent latency and privacy requirements. This survey provides a comprehensive examination of the emerging field of edge-based LLMs, exploring the techniques, frameworks, hardware solutions, and real-world applications that enable their efficient deployment at the edge. We review key strategies such as model quantization, pruning, knowledge distillation, and adapter tuning, alongside edge-cloud collaborative architectures like EdgeShard, Edge-LLM, and PAC. Additionally, we analyze hardware acceleration solutions, including Cambricon-LLM, AxLaM, and DTATrans/DTQAtten, and their role in overcoming resource limitations. The survey highlights diverse applications, from IoT and smart cities to personalized services and multi-modal intelligence, supported by case studies of real-world deployments. Finally, we discuss open challenges – such as resource efficiency, privacy, security, and scalability – and propose future research directions to advance this transformative technology. | uk
dc.language.iso | en | uk
dc.publisher | CEUR Workshop Proceedings | uk
dc.subject | edge computing | uk
dc.subject | large language models (LLMs) | uk
dc.subject | model compression | uk
dc.subject | edge-cloud collaboration | uk
dc.subject | hardware acceleration | uk
dc.subject | IoT applications | uk
dc.subject | personalized services | uk
dc.subject | multi-modal intelligence | uk
dc.subject | privacy-preserving AI | uk
dc.subject | resource efficiency | uk
dc.subject | периферійні обчислення | -
dc.subject | великі мовні моделі (LLM) | -
dc.subject | стиснення моделей | -
dc.subject | периферійно-хмарна співпраця | -
dc.subject | апаратне прискорення | -
dc.subject | додатки Інтернету речей | -
dc.subject | персоналізовані послуги | -
dc.subject | багатомодальний інтелект | -
dc.subject | штучний інтелект, що зберігає конфіденційність | -
dc.subject | ефективність використання ресурсів | -
dc.title | LLM on the edge: the new frontier | uk
Appears in collections: Кафедра інформатики та прикладної математики

Files in this item:
File | Description | Size | Format
paper28.pdf | | 1.19 MB | Adobe PDF | View/Open


All items in this electronic repository are protected by copyright, with all rights reserved.