1. Speech database collection
Each of the sub-networks must be trained to recognize emotions. The most important and most difficult issue in neural network training is collecting a large amount of speech data containing emotions. Because our target is content-independent emotion recognition, we adopted one hundred phoneme-balanced words as the training word set. Since it is difficult for ordinary people to utter these words with intentional emotion, we adopted the following strategy.
2. Training and recognition experiment
Of the fifty speakers used for data collection, thirty were used for training. To examine the effect of the number of training speakers, we carried out five types of neural network training, using from one to thirty speakers. In addition, we carried out two types of recognition experiments to evaluate the performance of the resulting neural networks.
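As a rough illustration of this setup, the following Python sketch draws a thirty-speaker training pool from fifty speakers and forms five training subsets of increasing size. It is only a sketch under stated assumptions: the function name `make_training_subsets`, the integer speaker IDs, the random selection, the nesting of the subsets, and the intermediate subset sizes (5, 10, 20) are all hypothetical. The text specifies only that five training conditions ranged from one to thirty speakers.

```python
import random


def make_training_subsets(speaker_ids, pool_size, subset_sizes, seed=0):
    """Draw a training pool and build nested training subsets from it.

    Returns a dict mapping subset size -> list of speaker IDs, plus the
    list of speakers left out of the training pool.
    """
    rng = random.Random(seed)
    # Training pool, e.g. 30 of the 50 recorded speakers (selection rule assumed).
    pool = rng.sample(speaker_ids, pool_size)
    pool_set = set(pool)
    # Speakers not used for training; how they were used is not stated in the text.
    remaining = [s for s in speaker_ids if s not in pool_set]
    # Nested subsets (an assumption): each larger subset contains the smaller ones.
    subsets = {n: pool[:n] for n in subset_sizes}
    return subsets, remaining


speakers = list(range(1, 51))      # 50 speakers, hypothetical integer IDs
sizes = [1, 5, 10, 20, 30]         # hypothetical: only the endpoints 1 and 30 come from the text
subsets, remaining = make_training_subsets(speakers, 30, sizes)

for n in sizes:
    print(f"{n:2d} training speakers: {sorted(subsets[n])}")
print(f"{len(remaining)} speakers outside the training pool")
```

Nesting the subsets is one way to make the training conditions differ only in the number of speakers. Independent random draws per condition would be an equally plausible reading of the text.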