CyberCrime and Confusion matrices ..

Tushar Agarwal
2 min readJun 6, 2021

What is a confusion matrix:-

A Confusion matrix is an N x N matrix used for evaluating the performance of a classification model, where N is the number of target classes. The matrix compares the actual target values with those predicted by the machine learning model. … The rows represent the predicted values of the target variable.

Confusion Matrix’s implementation in monitoring Cyber Attacks:

The data set used for The Third International Knowledge Discovery and Data Mining Tools Competition, which was held in conjunction with KDD-99 The Fifth International Conference on Knowledge Discovery and Data Mining. The competition task was to build a network intrusion detector, a predictive model capable of distinguishing between ``bad’’ connections, called
intrusions or attacks, and ``good’’ normal connections. This database contains a standard set of data to be audited, which includes a wide variety of intrusions simulated in a military network
environment In KDD99 dataset these four attack classes (DoS, U2R,R2L, and probe) are divided into 22 different attack classes that tabulated below:

Conclusion:

A confusion matrix is a tabular summary of the number of correct and incorrect predictions made by a classifier. It is used to measure the performance of a classification model. It can be used to evaluate the performance of a classification model through the calculation of
performance metrics like accuracy, precision, recall, and F1-score.

Need for Confusion Matrix in Machine learning:

1.) It evaluates the performance of the classification models, when they make predictions on test data, and tells how good our classification model is.
2.) It not only tells the error made by the classifiers but also the type of errors such as it is either type-I or type-II error.
3.) With the help of the confusion matrix, we can calculate the different parameters for the model, such as accuracy, precision, etc.

The confusion matrix is a matrix used to determine the performance of the classification models for a given set of test data. It can only be determined if the true values for test data are known. The matrix itself can be easily understood and implemented to test a ML model.

--

--