A conceptual framework for deep learning-based multimodal emotion detection using facial expressions and physiological signals

Akshata G. Shinde; Shivaji G. Shinde

doi:10.5281/zenodo.20514398

Radius: Journal of Science and Technology

Volume 3, Issue 1, Article Number: 261005 (2026)

Akshata G. Shinde¹,* | Shivaji G. Shinde¹

¹TPCT’s College of Engineering, Dharashiv – 413501, Maharashtra (India)

*Corresponding Author: akshatashinde620@gmail.com

Received: 30 April 2026 | Revised: 18 May 2026

Accepted: 01 June 2026 | Published Online: 05 June 2026

Abstract

Emotion recognition has attracted growing interest as an important field of affective computing: it enables intelligent systems to analyze human emotions and improves the efficiency of human–machine interaction. Traditional approaches rely on facial expression recognition (FER) alone, and this single-modality design makes them less robust to inter-personal and environmental variability. This paper reviews recent advances in multimodal emotion recognition (MER) systems that combine deep learning-based FER with physiological signal-based emotion recognition (PSER). Drawing on the reviewed works, it presents a conceptual multimodal framework consisting of five stages: data acquisition, data preprocessing, feature extraction, multimodal fusion, and classification. Previous studies have made extensive use of Convolutional Neural Networks (CNNs) for facial feature extraction, whereas physiological signals such as electrodermal activity (EDA) and heart rate variability (HRV) have traditionally been analyzed with statistical and frequency-domain methods. The literature reports that feature-level fusion strategies can combine heterogeneous multimodal representations and learn emotional patterns across domains. Recent studies also indicate that temporal modeling approaches are more stable under dynamic environmental conditions. The reviewed architectures are typically modular and designed to be scalable and adaptable to future real-world deployment. The primary aim of this work is therefore to review the latest multimodal emotion recognition systems and to present a conceptual framework grounded in the literature.
Experimental implementation and quantitative validation are suggested as directions for future research.
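The feature extraction, fusion, and classification stages described in the abstract can be illustrated with a minimal, standard-library-only Python sketch. It rests on several assumptions that do not come from the paper: the CNN facial feature extractor is replaced by a precomputed embedding vector; HRV is summarized only by the time-domain statistics mean RR, SDNN, and RMSSD (frequency-domain analysis is omitted); EDA is reduced to its mean, standard deviation, and range; feature-level fusion is plain concatenation followed by z-score standardization; and a nearest-centroid rule stands in for a deep classifier. The names hrv_features, eda_features, fuse, and NearestCentroidClassifier are hypothetical, and the toy values are synthetic, chosen for illustration rather than taken from any dataset or result.

```python
import math
from statistics import mean, pstdev, stdev


# --- Feature extraction (physiological signals) ---------------------------
def hrv_features(rr_ms):
    """Time-domain HRV statistics from RR intervals (ms): mean RR, SDNN, RMSSD."""
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    rmssd = math.sqrt(mean(d * d for d in diffs))
    return [mean(rr_ms), stdev(rr_ms), rmssd]


def eda_features(eda_us):
    """Simple statistics of a skin-conductance trace (microsiemens): mean, SD, range."""
    return [mean(eda_us), stdev(eda_us), max(eda_us) - min(eda_us)]


# --- Feature-level fusion ---------------------------------------------------
def fuse(face_embedding, rr_ms, eda_us):
    """Concatenate per-modality features into one joint vector.

    face_embedding is a hypothetical stand-in for the output of a CNN facial
    feature extractor; no CNN is implemented here.
    """
    return list(face_embedding) + hrv_features(rr_ms) + eda_features(eda_us)


# --- Classification ---------------------------------------------------------
class NearestCentroidClassifier:
    """Z-score each fused feature with training statistics, then assign the
    label of the closest class centroid (a stand-in for a deep classifier)."""

    def fit(self, X, y):
        columns = list(zip(*X))
        self.mu = [mean(c) for c in columns]
        # Guard against constant features, whose standard deviation is zero.
        self.sd = [pstdev(c) or 1.0 for c in columns]
        scaled = [self._scale(x) for x in X]
        self.centroids = {}
        for label in set(y):
            members = [z for z, lab in zip(scaled, y) if lab == label]
            self.centroids[label] = [mean(c) for c in zip(*members)]
        return self

    def _scale(self, x):
        return [(v - m) / s for v, m, s in zip(x, self.mu, self.sd)]

    def predict(self, x):
        z = self._scale(x)
        return min(self.centroids, key=lambda lab: math.dist(z, self.centroids[lab]))


# --- Toy usage (synthetic values, for illustration only) -------------------
train = [
    (fuse([0.1, 0.9], [900, 920, 880, 910, 890], [2.0, 2.1, 2.0, 2.2, 2.1]), "calm"),
    (fuse([0.2, 0.8], [880, 900, 870, 895, 885], [2.2, 2.3, 2.2, 2.4, 2.3]), "calm"),
    (fuse([0.8, 0.2], [650, 640, 655, 645, 650], [6.0, 6.5, 7.0, 7.2, 7.5]), "stressed"),
    (fuse([0.9, 0.1], [620, 630, 615, 625, 620], [6.5, 7.0, 7.4, 7.8, 8.0]), "stressed"),
]
X, y = zip(*train)
clf = NearestCentroidClassifier().fit(list(X), list(y))

query = fuse([0.85, 0.15], [630, 640, 635, 628, 636], [6.8, 7.1, 7.5, 7.6, 7.9])
print(clf.predict(query))  # stressed
```

Standardizing each fused dimension with training-set statistics is one common design choice for feature-level fusion: it keeps large-magnitude features such as mean RR (hundreds of milliseconds) from dominating the distance computation over small-magnitude ones such as EDA (a few microsiemens). A deep model would typically learn its own weighting of the fused representation instead.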

Keywords

Embedded System Design, Energy Efficiency, FPGA, Low-Power Wireless Protocols, Microcontroller Integration, Optimization of Network Topologies, Smart Grid Communication, WSNs
