Please use this identifier to cite or link to this item:
http://elibrary.kdpu.edu.ua/xmlui/handle/123456789/12014

Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.author | Семеріков, Сергій Олексійович | - |
| dc.contributor.author | Вакалюк, Тетяна Анатоліївна | - |
| dc.contributor.author | Каневська, Ольга Борисівна | - |
| dc.contributor.author | Моісеєнко, Михайло Вікторович | - |
| dc.contributor.author | Дончев, Іван Іванович | - |
| dc.contributor.author | Колгатін, Андрій Олександрович | - |
| dc.date.accessioned | 2025-06-24T10:29:21Z | - |
| dc.date.available | 2025-06-24T10:29:21Z | - |
| dc.date.issued | 2025-03-14 | - |
| dc.identifier.citation | Semerikov S.O., Vakaliuk T.A., Kanevska O.B., Moiseienko M.V., Donchev I.I., Kolhatin A.O. LLM on the edge: the new frontier // CEUR Workshop Proceedings. – 2025. – Vol. 3943. – P. 137–161. – Access mode: https://ceur-ws.org/Vol-3943/paper28.pdf | uk |
| dc.identifier.uri | http://elibrary.kdpu.edu.ua/xmlui/handle/123456789/12014 | - |
| dc.identifier.uri | https://ceur-ws.org/Vol-3943/paper28.pdf | - |
| dc.description.abstract | The advent of large language models (LLMs) has revolutionized natural language processing, enabling unprecedented capabilities in text generation, reasoning, and human-machine interaction. However, their deployment on resource-constrained edge devices presents significant challenges due to high computational complexity, large model sizes, and stringent latency and privacy requirements. This survey provides a comprehensive examination of the emerging field of edge-based LLMs, exploring the techniques, frameworks, hardware solutions, and real-world applications that enable their efficient deployment at the edge. We review key strategies such as model quantization, pruning, knowledge distillation, and adapter tuning, alongside edge-cloud collaborative architectures like EdgeShard, Edge-LLM, and PAC. Additionally, we analyze hardware acceleration solutions, including Cambricon-LLM, AxLaM, and DTATrans/DTQAtten, and their role in overcoming resource limitations. The survey highlights diverse applications, from IoT and smart cities to personalized services and multi-modal intelligence, supported by case studies of real-world deployments. Finally, we discuss open challenges – such as resource efficiency, privacy, security, and scalability – and propose future research directions to advance this transformative technology. | uk |
| dc.language.iso | en | uk |
| dc.publisher | CEUR Workshop Proceedings | uk |
| dc.subject | edge computing | uk |
| dc.subject | large language models (LLMs) | uk |
| dc.subject | model compression | uk |
| dc.subject | edge-cloud collaboration | uk |
| dc.subject | hardware acceleration | uk |
| dc.subject | IoT applications | uk |
| dc.subject | personalized services | uk |
| dc.subject | multi-modal intelligence | uk |
| dc.subject | privacy-preserving AI | uk |
| dc.subject | resource efficiency | uk |
| dc.subject | периферійні обчислення | - |
| dc.subject | великі мовні моделі (LLM) | - |
| dc.subject | стиснення моделей | - |
| dc.subject | периферійно-хмарна співпраця | - |
| dc.subject | апаратне прискорення | - |
| dc.subject | додатки Інтернету речей | - |
| dc.subject | персоналізовані послуги | - |
| dc.subject | багатомодальний інтелект | - |
| dc.subject | штучний інтелект, що зберігає конфіденційність | - |
| dc.subject | ефективність використання ресурсів | - |
| dc.title | LLM on the edge: the new frontier | uk |
| Appears in Collections: | Кафедра інформатики та прикладної математики |
Files in this item:
| File | Description | Size | Format | |
|---|---|---|---|---|
| paper28.pdf | | 1.19 MB | Adobe PDF | View/Open |
All items in the electronic archive are protected by copyright, with all rights reserved.