CLASSIFICATION OF KAZAKH MUSIC GENRES USING MACHINE LEARNING TECHNIqUES

: This article analysis a Kazakh Music dataset, which consists of 800 audio


Introduction
The digital era has revolutionized the way we interact with music.With the vast amount of music available online, the ability to accurately categorize songs by genre is crucial for music discovery, recommendation systems, and audience targeting.
Music genre classification is an important sub-discipline of music information retrieval (MIR), a field that intersects various aspects from musicology to AI and ML.The challenges in this field, notably the ambiguous nature of music genres, have led researchers to explore various ML techniques for more effective classification [1].Historically, music genre classification relied on acoustic and sound characteristics.However, the introduction of ML, especially deep learning and neural networks, has revolutionized this approach.The ability of these models to handle complex, multi-layered data has significantly improved the accuracy and effectiveness of music genre classification [2].Supervised learning is predominantly used in music classification, involving model training using labeled data to predict genres.The process includes extracting key features from audio and training a model with these features.Evaluation metrics vary depending on the balance of the dataset, with accuracy being a standard measure in balanced cases [3].
Music genre classification can encompass various types of classification problems, ranging from binary to multi-class and multi-label classifications, reflecting the complexity and diversity of musical genres [4].The standard procedure in music classification involves transforming audio into representative features, then constructing and evaluating a model based on these features.This evaluation is crucial to assess the model's accuracy and effectiveness in real-world applications [5].
Additionally, it is important to note that audio data is initially in analog format.To process it on a computer, it needs to be converted to a digital format using an analog-to-digital converter.This conversion involves sampling the analog signal at regular intervals and quantizing the amplitude values into discrete levels, creating a digital representation of the audio [6].
Overall, this research paper revolves around understanding and manipulating song data, whether it involves only musical instruments, acoustic compositions, or a combination of instruments and vocals [7].
Music holds a significant place in Kazakhstan's cultural heritage, reflecting the rich history and traditions of the Kazakh people.The traditional music of Kazakhstan, including both instrumental and vocal genres, has deep roots and has evolved over centuries.Like many other countries, Kazakhstan has a rich musical tapestry, blending traditional Kazakh music with modern genres from around the world.It's important to note that the success of music genre classification using machine learning depends on the quality of the dataset, the selection of relevant features, and the choice of an appropriate model.Additionally, incorporating domain knowledge about Kazakh music and culture can enhance the accuracy and cultural relevance of the classification system

Literature Review and problem statement
The literature review underscores the evolving role of ML in music genre classification, highlighting the challenges and advancements in the field.This background sets the stage for the project's objective of developing an ML model to classify songs into Rock or Hip-Hop genres, contributing to the broader field of MIR.The connection of machine learning and music has gained significant attention in recent years, with researchers exploring the development of models capable of automatically classifying music genres.This literature review observes key studies that contribute to the deeper understanding of machine learning models applied to the music genre classification.
In reference [8] authors mainly focused on the practical research using GTZAN dataset and showing the readers different kinds of models that can help to classify audio files into various kinds of music genres.They had approximately 1000 music files and gave us figures and tables with visual information about the overall methodology and formats using spectrogram files and training/test data.Comparisons and testing have been made using Keras framework and Google Colab.
The [9] research introduces an original approach by combining audio and lyrics features for music genre classification.The study explores the complementary nature of audio and information in the text format, employing a "deep neural network architecture" to integrate both modes.This specific work highlights the potential for multimodal approaches to enhance the accuracy and robustness of music genre classification models.
In [10] research paper, authors used the same dataset and techniques (CNN) as the background and framework of their study, to analyze and compare the results and overall information about the whole topic.The main difference that they have in their research is that they did not use visual features and representations with CNN as classifier, the authors represent music segments in the dataset by mel frequency cepstral coefficients (MFCC).Mainly, it was used for reprocessing the music segments in the dataset and only then train the CNN model.To conclude, the results were positive and successful with 94.5 percent of accuracy.
The [11] research is mostly about transfer learning, a technique where a model trained on one task and is adapted for another related task.The authors, for their research, used given technique in the context of music genre classification.The study explores the "transferability" of features learned from a large dataset to improve the performance on a smaller target dataset.Experimental results showed that the transfer learning way can achieve a higher average classification accuracy (95.8 percent) than the same deep Recurrent Neural Network (RNN) which marks the parameters randomly (93.7 percent).
Authors in reference [12] proposed an end-to-end learning approach for music audio analysis, eliminating the need for handcrafted feature engineering.The study introduces a "convolutional neural network architecture'' that is capable of directly processing raw audio waveforms.This approach abridged the model pipeline and showcased the overall potential of end-to-end learning for tasks like music genre classification.
This literature review showed the overall evolution of machine learning models for music genre classification, with a range from early comparative studies to the integration of deep learning and multimodal approaches.Possibly, future research in this field may focus on addressing the issues like scalability, interpretability and the integration of additional modalities to advance the possibilities of music genre classification systems.

Purpose and objectives of the study
This project aims to develop a machine learning model capable of classifying songs based solely on their audio features.The approach involves processing and analyzing audio data, reducing feature dimensions, and applying classification algorithms.
The scientific novelty of this research lies in its potential to enhance music recommendation systems and support artists and producers in understanding the categorization of their music in digital platforms.
The objects of the study are the sound characteristics of Kazakh music's over the past 8 years.
In the course of the work the following tasks were set: 1. Loading and preparing the dataset.2. Pairwise relationships between continuous variables 3. Normalize and scale the data 4. Compare decision tree model to a logistic regression 5. Balance data for greater performance 6. Evaluate models.

Materials and methods
In order to make a comparative analysis of the considered software tools, the materials of Scopus, Web of Science, ResearchGate international databases on the mentioned topic were analyzed, and machine learning algorithms for increasing the productivity of agricultural crops were considered.
The Kazakh_music dataset is a widely used benchmark dataset for music genre classification, which consists of 1000 audio tracks equally distributed across 5 different genres, dataset was manually collected in 2016-2023.The genres in the dataset are Kazakh Rock, Kazakh pop music, Q-pop, Zazz and Toi.Each audio track is 30 seconds long, sampled at 25,050 Hz, and stored in the WAV format.The dataset is carefully curated and annotated with ground truth labels by music experts.
Data Sources CSV File: Contains basic track information and genre labels.JSON File: Includes musical features like danceability and acousticness.

Tools and Technologies
Programming Language: Python Libraries: Pandas for data manipulation, Matplotlib and Seaborn for data visualization, scikit-learn for machine learning.
Methodology Data Cleaning: Handling missing or irrelevant data in the datasets.Exploratory Data Analysis: Visualizing data to uncover patterns and insights.
Feature Reduction: Using techniques like PCA to reduce feature dimensions while retaining variance.
Model Training: Implementing Decision Tree and Logistic Regression models and evaluating their performance.
Heatmap A heatmap is a graphical representation of data in which values are depicted using a color scale.Mathematically, a heatmap can be represented as a matrix M, where each element Mij corresponds to the value of a variable at position (i,j) in the dataset.Once you have this matrix representation, you can apply various mathematical operations or visualization techniques to analyze or display the data.For example, it is uses to calculate statistics like the mean or standard deviation of the values in the matrix, or for to visualize the matrix as an image where different colors represent different values [13].

Logistic regression
Logistic regression predicts "probability value" through a linear combination of the given features plugged inside a logistic function given as below (1): 87 (1) where -intercept, as the input feature vector (a column vector of predictors or independent variables), y as the binary output variable (0 or 1, representing the two classes).
The hypothesis function in logistic regression is defined as (2): (2) Here, is the dot product of the parameter vector and the feature vector [14].

Decision Tree
A decision tree is a popular supervised machine learning algorithm used for classification and regression tasks.It works by partitioning the data into subsets based on certain attributes/ features, with the goal of creating a tree-like structure of decisions.Each internal node represents a "decision" based on the value of a feature, leading to different branches, and each leaf node represents the outcome or predicted value [15].

Gini Impurity in Decision Tree
The algorithm selects the partition that minimizes impurities and maximizes the purity of the split.Informally, the impurity is a gauge of the similarity of the labels at the node.
Let's assume that a dataset D Tree contains examples from n classes.Its Gini Index, Gini(D Tree ), is defined as (3): (3) where p j is the relative frequency of class j in n.The lowest value is zero, which signifies a node that contains the elements of the same class [16].

Load and prepare the dataset
The heat map between the musical characteristics of a given dataset was computed.From the above graph, we can see that none of the features have a strong correlation.Therefore, it is not necessary to remove any attributes from the given data.

Normalize the feature data
The importance of simplifying models and using as few features as possible without sacrificing performance was mentioned above.Since we did not find any strong correlations between our features, we can use a common dimensionality reduction technique called principal component analysis (PCA).PCA works by rotating the data along the axis of highest variance, which allows us to identify the features that contribute the most to the differences between classes.
However, since PCA uses the absolute variance of a feature to rotate the data, features with a wider range of values can overshadow and bias the algorithm.To avoid this, we must first normalize the data, which means scaling all features so that they have a mean of 0 and a standard deviation of 1 [17].

Principal component analysis on scaled data
After preprocessing our data, we can use Principal Component Analysis (PCA) to reduce its dimensionality.Scree plots and cumulative explained ratio plots can help us determine the number of components to retain for further analysis.Scree plots show the number of components on the x-axis and the variance explained by each component on the y-axis, sorted by decreasing variance.This helps us visualize which components explain the most variance in our data.To determine an appropriate cutoff, we look for an "elbow" in the plot, where there is a steep drop in variance from one component to the next (Fig. 3).Unfortunately, there does not appear to be a clear elbow in this scree plot, which means it is not straightforward to find the number of intrinsic dimensions using this method.

Further visualization of PCA
If the scree plot doesn't help us decide how many features to keep, we can use the cumulative explained variance plot.This plot shows how much variance each component explains, and we can use it to find the point where 90% of the variance is explained.This point represents the number of features we need to keep (Fig. 4).

Train a decision tree and a logistic regression model to classify the genre
Decision trees are rule-based classifiers that work by asking a series of binary questions about the data until they reach a conclusion.Decision Tree model was built to classify music genres, Gini index was used for splitting criteria [18].Thus, the Decision tree music genre classification model (Fig. 5) was constructed, using the Gini index to calculate the probability that at random selection a certain feature will be classified incorrectly.Next, we use Logistic regression to classify genres and then compare the classification accuracy of both methods.

Logistic Regression Model
Lets assume that y is the desired probability of class membership, x₁, x₂, x₃, ... are the features of the audio tracks that are used for prediction, and β₀, β₁, β₂, β₃, ... are the coefficients describing the effect of each characteristic of the given Kazakh_music.csvdataset on the probability.
The logistic regression model of genre classification of given dataset is (4): Sigmoid function is used for optimization, which basically compresses the entire function in the range of 0 to 1 [19].Table 1 shows the coefficients of Logistic regression model for Kazakh_music dataset.

Compare decision tree model to a logistic regression
Once we have trained two models, we can compare how well they predict by looking at how many data points they get wrong.

Figure 5. Comparative table of results of Decision Tree Classifier and Logistic Regression models
Both models do similarly well, boasting an average precision of 60% and 70%.However, looking at our classification report, we can see that rock song are fairly well classified, but hiphop songs are disproportionately misclassified as rock songs.

Balance data for greater performance
To compensate for the fact that there are more data points in some classes than others, we can give more weight to the correct classification of data points in the smaller classes.This will help us to train a model that is more accurate for all classes (Fig. 6).We've now balanced our dataset, but in doing so, we've removed a lot of data points that might have been crucial to training our models.

Does balancing our dataset improve model bias?
We have already made our dataset smaller, so we will not use any more dimensionality reduction techniques.We would normally use dimensionality reduction more carefully when working with very large datasets or when it takes too long to run our model.Balancing our data has fixed the problem where our model was biased towards the more common class.To get a better idea of how well our models are actually working, we can use a technique called cross-validation (CV).CV lets us compare models more fairly.

Using cross-validation to evaluate models
Since the way our data is split into train and test sets can impact model performance, CV attempts to split the data multiple ways and test the model on each of the splits.Although there are many different CV methods, all with their own advantages and disadvantages, we will use what's known as K-fold cross-validation here.K-fold first splits the data into K different, equally sized subsets.Then, it iteratively uses each subset as a test set while using the remainder of the data as train sets [20].Finally, we can then aggregate the results from each fold for a final model performance score (Fig. 7).Now, that we have performed k fold cross-validation on our dataset, we can be pretty sure that our model will generalize 77% of the times on the future unseen data points.

Conclusion
This report presents a comprehensive study on the classification of music genres using machine learning techniques, with a specific focus on distinguishing between 5 types of genres.The analysis demonstrates the potential of machine learning algorithms to effectively categorize music based on audio data, without relying on human listening.
Key takeaways from the research include: Effectiveness of Machine Learning in Music Classification: The study successfully employed machine learning algorithms, namely decision trees and logistic regression, to classify music genres.This underscores the capability of these algorithms in handling complex pattern recognition tasks like music genre classification.
Critical Role of Data Preparation and Feature Analysis: The study highlighted the importance of meticulous data preparation, including loading, cleaning, and standardizing data.The absence of strong correlations among features and the application of principal component analysis (PCA) for dimensionality reduction were crucial steps that enhanced the efficiency and effectiveness of the models.
Importance of Data Balance: Addressing the imbalance in the dataset was a pivotal aspect of the study.Balancing the number of data points for each class was essential in reducing model bias and ensuring that the model's accuracy was not skewed towards the more prevalent class.
Model Validation and Reliability: The use of cross-validation, specifically the K-fold method, provided a thorough evaluation of the model's performance.This step was instrumental in assessing the generalization capability of the models, ensuring that they could reliably predict genre classifications on new, unseen data.
Implications for Music Streaming Services: The findings of this study have significant implications for music streaming services.music genres using machine learning can lead to more precise and user-friendly music recommendation systems.This advancement can enhance user experience on streaming platforms, helping users discover music that aligns more closely with their tastes.
In conclusion, this research not only demonstrates the practical application of machine learning in the field of music genre classification but also sets a precedent for future research.It opens avenues for exploring more sophisticated machine learning models and incorporating a broader range of musical features to further refine genre classification systems.The methodologies and insights gained from this study can be pivotal in advancing digital music platforms, making them more responsive and tailored to individual user preferences.

Figure 2 .
Figure 2. Heat map of musical features of tracks

Figure 5 .
Figure 5. Decision tree model visualization of given dataset

Figure 6 .
Figure 6.Code fragment for balancing the original dataset

Figure 7 .
Figure 7. Code snippet for cross validation