Abstract: In this paper, we propose a method to improve the accuracy of speech emotion recognition (SER) by using a vision transformer (ViT) to attend to the correlation of frequency (y-axis) with time ...
Abstract: In this article, we propose an active target detector that uses a shallow neural network (NN) with novel features for small sonar datasets, where deep learning (DL) models are restricted. The ...
This is the first step of a bigger audio-visual project. For now, I have added a simple Streamlit app to this repository to preview your audio tracks from a specific folder and convert them into their Mel ...
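
For readers unfamiliar with the mel scale mentioned above, here is a minimal sketch of the Hz-to-mel mapping that mel-scaled audio representations are built on. It is not taken from the repository: the function names (`hz_to_mel`, `mel_to_hz`, `mel_band_edges`) are hypothetical, and the HTK-style formula `2595 * log10(1 + f / 700)` is an assumption, since other variants of the scale exist.

```python
import math


def hz_to_mel(freq_hz: float) -> float:
    """Convert a frequency in Hz to mels (HTK-style formula, assumed here)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel: convert mels back to Hz."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_edges(f_min: float, f_max: float, n_bands: int) -> list[float]:
    """Return n_bands + 2 edge frequencies in Hz, equally spaced on the mel scale.

    Consecutive triples of edges would delimit one triangular mel filter each.
    """
    mel_lo, mel_hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (mel_hi - mel_lo) / (n_bands + 1)
    return [mel_to_hz(mel_lo + i * step) for i in range(n_bands + 2)]


if __name__ == "__main__":
    # Illustrative values only: 40 bands between 0 Hz and 8 kHz.
    edges = mel_band_edges(0.0, 8000.0, 40)
    print(f"{len(edges)} edges, first band width {edges[1] - edges[0]:.1f} Hz, "
          f"last band width {edges[-1] - edges[-2]:.1f} Hz")
```

Because the edges are evenly spaced in mels rather than in Hz, the bands are narrow at low frequencies and wide at high frequencies, which is the property that makes mel-scaled spectrograms a common input for audio models.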