Project SpeechRecognition

The SpeechRecognition project, developed as part of the Digital Signal Processing Techniques course, aimed to build a simple EdgeAI system capable of classifying basic voice commands such as "left", "right", "stop", and "go" using a neural network deployed on an STM32 microcontroller.

The first step was to define the architecture of the entire system, which consists of three main components:

  • Signal acquisition (analog microphone, sampling with the on-chip ADC)
  • Audio preprocessing (filtering, decimation to 8 kHz, conversion to a spectrogram, and spectral noise reduction; see the sketch after this list)
  • Spectrogram classification
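The exact filter design, frame sizes, and noise-reduction parameters used on the device are not spelled out here, but an offline reference for the preprocessing stage might look like the following Python sketch. It assumes a 48 kHz input rate, a 256-point STFT, and spectral subtraction against a noise floor estimated from the leading frames; all of these values are assumptions, not the project's confirmed settings.

```python
import numpy as np
from scipy import signal

def preprocess(audio, fs_in=48_000, fs_out=8_000, n_fft=256, hop=128):
    """Decimate to 8 kHz, compute a magnitude spectrogram, and apply
    simple spectral subtraction (parameter values are assumed)."""
    # Decimation with a built-in anti-aliasing FIR filter
    audio = signal.decimate(audio, fs_in // fs_out, ftype="fir")
    # Short-time Fourier transform -> magnitude spectrogram
    _, _, stft = signal.stft(audio, fs=fs_out, nperseg=n_fft,
                             noverlap=n_fft - hop)
    mag = np.abs(stft)
    # Spectral subtraction: estimate the noise floor from the first
    # few frames (assumed to contain no speech) and subtract it
    noise = mag[:, :5].mean(axis=1, keepdims=True)
    return np.maximum(mag - noise, 0.0)
```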

The neural network was trained offline with the TensorFlow library on the Speech Commands dataset; a sketch of the training setup is shown below, followed by the confusion matrix for the best-performing model on the test data.
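The actual architecture and hyperparameters of the best model are not listed here; the following is a minimal Keras sketch of a small spectrogram-classifying CNN of the kind that fits on an STM32. The input shape, layer sizes, and class count are assumptions chosen to match the preprocessing sketch above, not the project's confirmed configuration.

```python
import tensorflow as tf

NUM_CLASSES = 12            # full Speech Commands label set (assumed)
INPUT_SHAPE = (129, 63, 1)  # spectrogram bins x frames (assumed)

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=INPUT_SHAPE),
    tf.keras.layers.Conv2D(8, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(16, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# train_ds / val_ds: tf.data pipelines yielding (spectrogram, label) pairs
# model.fit(train_ds, validation_data=val_ds, epochs=20)
```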

The best model was then fine-tuned on samples recorded with the target microphone connected to the microcontroller, and the number of output classes was reduced to four. Before deployment to the STM32 microcontroller using the X-Cube-AI tool, the network was further optimized and quantized.
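The post does not say which quantization route was used; one common option is full-integer post-training quantization with the TFLite converter, since X-Cube-AI can import quantized .tflite models directly. The sketch below assumes the Keras model from the training sketch and a hypothetical calibration_specs collection of preprocessed spectrograms recorded with the target microphone.

```python
import numpy as np
import tensorflow as tf

# calibration_specs: preprocessed spectrograms recorded with the target
# microphone, each shaped like the model input (hypothetical collection)
def representative_data():
    for spec in calibration_specs:
        yield [spec[np.newaxis, ...].astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

# The resulting file can be imported into an STM32CubeMX project
# through the X-Cube-AI expansion package.
with open("speech_commands_int8.tflite", "wb") as f:
    f.write(converter.convert())
```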

Below is the confusion matrix showing the results of system-level tests.

The system’s performance could be improved by using a MEMS microphone with a digital I2S interface, increasing the audio sampling rate to fs = 16 kHz, and deploying on a microcontroller with more available SRAM.