Hi,
I read your paper "Speaker Diarization using Deep Recurrent Convolutional Neural Networks for speaker embedding". The details of the convolutional part were very clear.
But for the two recurrent blocks, how many neurons did you use?
How did you flatten the output of the second recurrent layer to connect it to the fully connected layer?
As I understand it, the fully connected embedding layer is used only for the embedding, meaning it is connected to a further classification layer. For that classification layer, did you mix the classes across the different datasets?
Thank you.