Sparse landmarks for facial action unit detection using vision transformer and perceiver
Citation
Cakir, D., Yilmaz, G., & Arica, N. (2024). Sparse landmarks for facial action unit detection using vision transformer and perceiver. International Journal of Computational Science and Engineering, 27(5), 607-620.

Abstract
The ability to accurately detect facial expressions, represented by facial action units (AUs), holds significant implications across diverse fields such as mental health diagnosis, security, and human-computer interaction. Although earlier approaches have made progress, the burgeoning complexity of facial actions demands more nuanced, computationally efficient techniques. This study pioneers the integration of sparse learning with vision transformer (ViT) and perceiver networks, focusing on the most active and descriptive landmarks for AU detection across both controlled (DISFA, BP4D) and in-the-wild (EmotioNet) datasets. Our novel approach, employing active landmark patches instead of the whole face, not only attains state-of-the-art performance but also uncovers insights into the differing attention mechanisms of ViT and perceiver. This fusion of techniques marks a significant advancement in facial analysis, potentially reshaping strategies in noise reduction and patch optimisation, setting a robust foundation for future research in the domain.
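The abstract's core idea of feeding active landmark patches, rather than the whole face, into a transformer can be illustrated with a minimal sketch. The function below is an assumption about the preprocessing step, not the authors' implementation: it crops a fixed-size square patch around each detected landmark, padding at image borders so every patch has the same shape (the `patch_size` of 32 is an arbitrary illustrative choice).

```python
import numpy as np

def extract_landmark_patches(image, landmarks, patch_size=32):
    """Crop a square patch centred on each facial landmark.

    image:     (H, W, C) uint8/float array
    landmarks: (N, 2) array of (x, y) pixel coordinates
    returns:   (N, patch_size, patch_size, C) array of patches

    Illustrative sketch only; the paper's actual patch selection
    (active landmarks, patch size) may differ.
    """
    half = patch_size // 2
    # Pad with edge replication so patches near the border stay full-sized.
    padded = np.pad(image, ((half, half), (half, half), (0, 0)), mode="edge")
    patches = []
    for x, y in landmarks.astype(int):
        # (x, y) in the original image maps to (x + half, y + half) in the
        # padded image, so the crop below is centred on the landmark.
        patches.append(padded[y:y + patch_size, x:x + patch_size])
    return np.stack(patches)
```

Each resulting patch could then be linearly embedded and passed as a token sequence to a ViT or perceiver encoder, which is the sparse-input strategy the paper contrasts with whole-face processing.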