A conceptual framework for deep learning-based multimodal emotion detection using facial expressions and physiological signals

Volume 3, Issue 1, Article Number: 261005 (2026)

Akshata G. Shinde1,* | Shivaji G. Shinde1  

1TPCT’s College of Engineering, Dharashiv – 413501, Maharashtra (India)

*Corresponding Author: akshatashinde620@gmail.com

Received: 30 April 2026 | Revised: 18 May 2026

Accepted: 01 June 2026 | Published Online: 05 June 2026

DOI: https://doi.org/10.5281/zenodo.20514398

© 2026 The Authors. Published by Scholarly Publication under a Creative Commons license.

Abstract

Emotion recognition has attracted growing interest as an important area of affective computing: it allows intelligent systems to analyze human emotions and makes human–machine interaction more effective. Traditional approaches rely on facial expression recognition (FER) as their only modality and are therefore less robust to variability across individuals and environments. This paper reviews recent advances in multimodal emotion recognition (MER) systems that combine deep learning-based FER with physiological signal-based emotion recognition (PSER). From the reviewed works, a conceptual multimodal framework is derived that consists of data acquisition, data preprocessing, feature extraction, multimodal fusion, and classification stages. Previous studies have extensively used Convolutional Neural Networks (CNNs) for facial feature extraction, while physiological signals such as electrodermal activity (EDA) and heart rate variability (HRV) have traditionally been analyzed with statistical and frequency-domain approaches. According to the literature, feature-level fusion strategies make it possible to fuse heterogeneous multimodal representations and to learn emotional patterns across domains. Recent studies also report that temporal modeling approaches are more stable under dynamic environmental conditions. The reviewed architectures are typically modular and designed to be scalable and adaptable to future real-world deployment. The primary aim of this work is to review recent multimodal emotion recognition systems and to present a conceptual framework grounded in the literature. Experimental implementation and quantitative validation are suggested as directions for future research.
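
The feature extraction, fusion, and classification stages named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the 16-dimensional facial embedding (a random stand-in for a CNN output), the choice of three time-domain HRV statistics, the four emotion classes, and the untrained linear classifier are all assumptions made for illustration only.

```python
import numpy as np

def hrv_time_features(rr_ms):
    """Time-domain HRV statistics from RR intervals given in milliseconds."""
    rr = np.asarray(rr_ms, dtype=float)
    diffs = np.diff(rr)                    # successive RR differences
    return np.array([
        rr.mean(),                         # mean RR interval
        rr.std(ddof=1),                    # SDNN: sample standard deviation of RR
        np.sqrt(np.mean(diffs ** 2)),      # RMSSD: root mean square of differences
    ])

def fuse_features(face_embedding, physio_features):
    """Feature-level fusion: concatenate per-modality feature vectors."""
    return np.concatenate([face_embedding, physio_features])

def softmax(z):
    z = z - z.max()                        # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def classify(fused, weights, bias):
    """Linear layer plus softmax; weights here are untrained placeholders."""
    return softmax(weights @ fused + bias)

rng = np.random.default_rng(0)

# Hypothetical inputs: a stand-in CNN embedding and a short RR-interval series.
face_embedding = rng.normal(size=16)
rr_ms = [812, 790, 805, 830, 798, 776, 801]

physio = hrv_time_features(rr_ms)
fused = fuse_features(face_embedding, physio)   # 16 + 3 = 19 features

# Assumed 4 emotion classes; random weights only demonstrate the data flow.
weights = rng.normal(size=(4, fused.size))
bias = np.zeros(4)
probs = classify(fused, weights, bias)
```

In a real pipeline each feature would normally be scaled with statistics from the training set before concatenation, because RR-interval values in milliseconds would otherwise dominate the embedding values, and the classifier weights would be learned from labeled data.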

Keywords

Embedded System Design, Energy Efficiency, FPGA, Low-Power Wireless Protocols, Microcontroller Integration, Optimization of Network Topologies, Smart Grid Communication, WSNs

Cite This Article

A. G. Shinde and S. G. Shinde, “A conceptual framework for deep learning-based multimodal emotion detection using facial expressions and physiological signals,” Radius: Journal of Science and Technology 3(1) (2026) 261005. https://doi.org/10.5281/zenodo.20514398

Rights & Permission

This is an open access article published under the Creative Commons Attribution (CC BY) International License, which allows unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. No permission is needed to reuse this content under the terms of the license.
For uses not covered above, please contact the Scholarly Publication Rights Department.