Language Recognition of Three Indian Languages based on Clustering and Supervised Learning

doi:10.3850/978-981-08-7302-8_1287

Pinki Roy

Computer Science & Engineering Department, National Institute of Technology, Silchar, Assam, India.

ABSTRACT

Language identification has long been regarded as a fascinating field in human-computer interaction. It is one of the fundamental steps towards understanding human cognition and behavior. The main objective here is to identify the language of speech samples spoken by an individual speaker. This paper explains the theory and implementation of a speaker-dependent language identification system for three Indian languages: Assamese, Hindi and Indian English. The first step obtains feature vectors from linear prediction (LP) coefficients. These vectors are then grouped into clusters using the K-means algorithm, with each speech vector assigned to the cluster at minimum distance. Supervised learning is then used to recognize the cluster to which a test speech vector most probably belongs. Finally, the accuracy of the system is calculated from the total number of speech samples correctly assigned to each cluster over the complete speech database. The system gives the highest recognition accuracy for Hindi, followed by Assamese and then Indian English.

Keywords: Accuracy, Cluster, K-means algorithm, Language identification, Speech.
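
The feature-extraction step can be sketched as follows. This is a minimal sketch and not the author's implementation. It assumes the autocorrelation method solved with the Levinson-Durbin recursion, Hamming-windowed frames of 240 samples with a 120-sample hop (30 ms / 15 ms at an assumed 8 kHz sampling rate), and a prediction order of 10. The abstract specifies none of these settings, and all function names are hypothetical. Each frame yields one LP coefficient vector.

```python
import math
from typing import List


def hamming(n: int) -> List[float]:
    """Symmetric Hamming window of length n."""
    if n == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def autocorrelation(x: List[float], max_lag: int) -> List[float]:
    """Autocorrelation r[0..max_lag] of one windowed frame."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]


def levinson_durbin(r: List[float], order: int) -> List[float]:
    """Solve the LP normal equations for a[1..order], where the predictor is
    x[n] ~ a[1] x[n-1] + ... + a[order] x[n-order]."""
    a = [0.0] * (order + 1)
    err = r[0]  # prediction-error energy at order 0
    if err <= 0.0:  # silent frame: return an all-zero vector
        return a[1:]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err  # reflection coefficient of stage i
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
        if err <= 0.0:  # numerically degenerate frame: stop early
            break
    return a[1:]


def lpc_features(signal: List[float], frame_len: int = 240, hop: int = 120,
                 order: int = 10) -> List[List[float]]:
    """One LP coefficient vector per Hamming-windowed frame (settings assumed)."""
    window = hamming(frame_len)
    features = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [s * w for s, w in zip(signal[start:start + frame_len], window)]
        features.append(levinson_durbin(autocorrelation(frame, order), order))
    return features


if __name__ == "__main__":
    # One second of a synthetic 440 Hz tone at an assumed 8 kHz sampling rate.
    tone = [math.sin(2 * math.pi * 440 * n / 8000) for n in range(8000)]
    feats = lpc_features(tone)
    print(len(feats), "frames of", len(feats[0]), "coefficients")  # 65 frames of 10 coefficients
```

LP analysis of speech often also applies pre-emphasis before windowing. This sketch omits that step, and silent frames simply produce all-zero vectors.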
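
The clustering step can be sketched with Lloyd's formulation of K-means. In this formulation, every vector joins the cluster whose centroid is at minimum (squared Euclidean) distance, and each centroid then moves to the mean of its members. This is a hedged illustration. The distance measure, the iteration limit and the deterministic farthest-first seeding are assumptions chosen to keep the output reproducible, because the abstract does not say how the clusters were initialized. The helper names are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def sq_dist(u: Vector, v: Vector) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def nearest(x: Vector, centroids: List[List[float]]) -> int:
    """Index of the centroid at minimum distance from x (ties -> lowest index)."""
    return min(range(len(centroids)), key=lambda c: sq_dist(x, centroids[c]))


def farthest_first_init(data: List[Vector], k: int) -> List[List[float]]:
    """Assumed deterministic seeding: start from the first vector, then
    repeatedly add the vector farthest from all centroids chosen so far."""
    centroids = [list(data[0])]
    while len(centroids) < k:
        far = max(data, key=lambda x: min(sq_dist(x, c) for c in centroids))
        centroids.append(list(far))
    return centroids


def kmeans(data: List[Vector], k: int,
           max_iter: int = 100) -> Tuple[List[List[float]], List[int]]:
    """Lloyd's K-means: alternate nearest-centroid assignment and mean update."""
    centroids = farthest_first_init(data, k)
    assign: List[int] = []
    dim = len(data[0])
    for _ in range(max_iter):
        # Assignment step: each vector joins the cluster at minimum distance.
        new_assign = [nearest(x, centroids) for x in data]
        if new_assign == assign:
            break  # no vector changed cluster: converged
        assign = new_assign
        # Update step: move each centroid to the mean of its member vectors.
        for c in range(k):
            members = [data[i] for i, a in enumerate(assign) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(m[d] for m in members) / len(members)
                                for d in range(dim)]
    return centroids, assign
```

In the setting described above, `data` would hold the LP coefficient vectors, and `k` might equal the number of languages (three), although that choice is also an assumption. Farthest-first seeding is sensitive to outliers. Random or k-means++ seeding with several restarts is a common alternative.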
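
The abstract does not name the supervised learner, so the sketch below assumes one simple possibility. Each cluster is tagged with the majority language among the labelled training vectors assigned to it. A test vector is then recognized as the language of its nearest cluster, and the accuracy for each language is the fraction of its test vectors recognized correctly. The function names are hypothetical, and language labels are arbitrary strings such as "Hindi". Toy data fed to these functions says nothing about the paper's results.

```python
from collections import Counter
from typing import Dict, List, Sequence

Vector = Sequence[float]


def sq_dist(u: Vector, v: Vector) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def nearest(x: Vector, centroids: List[List[float]]) -> int:
    """Index of the centroid at minimum distance from x."""
    return min(range(len(centroids)), key=lambda c: sq_dist(x, centroids[c]))


def label_clusters(centroids: List[List[float]], train_x: List[Vector],
                   train_y: List[str]) -> Dict[int, str]:
    """Assumed supervised step: tag each cluster with the majority
    language among the labelled training vectors assigned to it."""
    votes: Dict[int, Counter] = {c: Counter() for c in range(len(centroids))}
    for x, y in zip(train_x, train_y):
        votes[nearest(x, centroids)][y] += 1
    # Clusters that received no training vectors stay unlabelled.
    return {c: v.most_common(1)[0][0] for c, v in votes.items() if v}


def per_language_accuracy(centroids: List[List[float]], cluster_lang: Dict[int, str],
                          test_x: List[Vector], test_y: List[str]) -> Dict[str, float]:
    """Fraction of each language's test vectors whose nearest cluster
    carries that language's label."""
    correct: Counter = Counter()
    total: Counter = Counter()
    for x, y in zip(test_x, test_y):
        total[y] += 1
        if cluster_lang.get(nearest(x, centroids)) == y:
            correct[y] += 1
    return {lang: correct[lang] / total[lang] for lang in total}
```

The same counts also give an overall accuracy for the complete database. To obtain it, sum the correct assignments across languages and divide by the total number of test vectors.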
