Characteristics             | Percentage of phishing attacks
Other                       | 22.71%
Online Stores               | 17.61%
Global Internet Portals     | 17.27%
Payment Systems             | 13.11%
Banks                       | 11.11%
Social Networks and Blogs   |  6.34%
IMS                         |  4.36%
Telecommunication Companies |  2.09%
IT Companies                |  2.00%
Financial Services          |  1.90%
Delivery Companies          |  1.51%
[Figure: Proposed methodology. Step 1: Collection of datasets; Step 2: Feature selection using the KMO test; Step 3: Classification using the SVM algorithm; Step 4: Display of results.]
Filter Method: Filter methods are generally used as a pre-processing step. The selection of features is independent of any machine learning algorithm; instead, features are selected on the basis of their scores in various statistical tests of their correlation with the outcome variable.

Wrapper Method: In wrapper methods, we try to use a subset of features and train a model using them. The problem is essentially reduced to a search problem, and these methods are usually computationally very expensive. Common examples are forward selection, backward elimination, and recursive feature elimination.

Embedded Method: Embedded methods combine the qualities of filter and wrapper methods. They are implemented by algorithms that have their own built-in feature selection methods, such as lasso regression and ridge regression.
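As a minimal sketch of the filter approach described above (assuming a correlation-based score; the data and function name here are illustrative, not the paper's):

```python
import numpy as np

# Filter-method sketch: score each feature by the absolute Pearson correlation
# with the outcome variable and keep the top-k features. This runs before, and
# independently of, any learning algorithm.
def filter_select(X, y, k):
    # Correlation of every feature column with the target vector.
    scores = np.array([abs(np.corrcoef(X[:, j], y)[0, 1])
                       for j in range(X.shape[1])])
    # Indices of the k highest-scoring features.
    return np.argsort(scores)[::-1][:k]

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=200).astype(float)
noise = rng.normal(size=(200, 4))
# Feature 0 tracks the target closely; features 1-3 are pure noise.
X = np.column_stack([y + 0.1 * noise[:, 0],
                     noise[:, 1], noise[:, 2], noise[:, 3]])
selected = filter_select(X, y, k=2)
print(selected)  # feature 0 ranks first
```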
KMO           | Interpretation
0.9 and above | Marvellous
0.8-0.9       | Meritorious
0.7-0.8       | Middling
0.6-0.7       | Mediocre
0.5-0.6       | Miserable
Under 0.5     | Unacceptable
i. Create a function KMO(a).
ii. Find the correlation matrix of 'a' using the cor() function.
iii. Apply the per-variable KMO index formula: the sum of squared correlations of 'a' divided by the sum of squared correlations plus the sum of squared partial correlations of 'a'.
iv. Repeat step (iii) with the colSums() function to calculate the overall KMO index.
v. Call the KMO(a) function.
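The steps above can be sketched as follows. This is a Python translation (the cor()/colSums() calls suggest the original is R), assuming the usual KMO formula with squared partial correlations obtained from the inverse correlation matrix:

```python
import numpy as np

# Overall KMO index: KMO = sum(r^2) / (sum(r^2) + sum(p^2)), summed over the
# off-diagonal correlations r and partial correlations p.
def kmo(a):
    r = np.corrcoef(a, rowvar=False)           # correlation matrix (step ii)
    inv_r = np.linalg.inv(r)                   # basis for partial correlations
    d = np.sqrt(np.outer(np.diag(inv_r), np.diag(inv_r)))
    p = -inv_r / d                             # partial correlation matrix
    mask = ~np.eye(r.shape[0], dtype=bool)     # off-diagonal entries only
    r2 = np.sum(r[mask] ** 2)
    p2 = np.sum(p[mask] ** 2)
    return r2 / (r2 + p2)                      # step iii/iv

rng = np.random.default_rng(1)
base = rng.normal(size=(300, 1))
# Three correlated columns plus one independent column (illustrative data).
a = np.column_stack([base[:, 0] + 0.3 * rng.normal(size=300) for _ in range(3)]
                    + [rng.normal(size=300)])
score = kmo(a)                                 # step v
print(round(score, 3))
```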
FORMULA:
Linear classifier: f(x) = Σ(i=1..N) αi yi (xi · x) + b
||X1 - X2|| is the Euclidean distance between X1 and X2. Using the distance in the original space, we calculate the dot product (similarity) of X1 and X2.
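A small check of the identity behind this: the dot product can be recovered from the norms and the Euclidean distance alone.

```python
import numpy as np

# The squared Euclidean distance expands as
#   ||x1 - x2||^2 = ||x1||^2 - 2 * (x1 . x2) + ||x2||^2,
# so the dot product (similarity) follows from distances:
#   x1 . x2 = (||x1||^2 + ||x2||^2 - ||x1 - x2||^2) / 2
x1 = np.array([1.0, 2.0, 3.0])
x2 = np.array([4.0, 0.0, 1.0])

dist_sq = np.sum((x1 - x2) ** 2)
dot_from_dist = (np.dot(x1, x1) + np.dot(x2, x2) - dist_sq) / 2
print(dot_from_dist, np.dot(x1, x2))  # both 7.0
```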
Step 1: Normalize the dataset.
Step 2: Split the dataset into training and test sets.
Step 3: Compute the linear SVM formula on the trained attributes and predict with the model.
Step 4: Repeat step 3 to calculate the confusion matrix by calling the prediction model.
Step 5: Call the linear kernel function to calculate the parameters with cost = 1.
Step 6: Print and plot the parameters.
Step 7: End the process.
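A minimal end-to-end sketch of these steps, assuming synthetic data and a hand-rolled hinge-loss linear SVM standing in for the R e1071 svm() call the model summary later suggests:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 400
X = rng.normal(size=(n, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)      # linearly separable labels

# Step 1: normalize the dataset.
X = (X - X.mean(axis=0)) / X.std(axis=0)
# Step 2: train/test split.
X_tr, X_te, y_tr, y_te = X[:300], X[300:], y[:300], y[300:]

# Step 3: train a linear SVM f(x) = w.x + b by sub-gradient descent on the
# L2-regularized hinge loss, with cost C = 1 as in step 5 of the paper.
w, b, C, lr = np.zeros(2), 0.0, 1.0, 0.01
for epoch in range(300):
    margins = y_tr * (X_tr @ w + b)
    viol = margins < 1                           # margin violations
    grad_w = w - C * (y_tr[viol, None] * X_tr[viol]).sum(axis=0)
    grad_b = -C * y_tr[viol].sum()
    w -= lr * grad_w
    b -= lr * grad_b

# Step 4: predict on the test set and build the confusion matrix.
pred = np.where(X_te @ w + b >= 0, 1, -1)
cm = np.array([[np.sum((pred == p) & (y_te == t)) for t in (-1, 1)]
               for p in (-1, 1)])                # rows = prediction
accuracy = np.trace(cm) / cm.sum()
print(cm)
print(round(accuracy, 3))
```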
S No | Feature               | Measure
1    | ON_MOUSE              | 0.93
2    | HTTP_TOKEN            | 0.90
3    | ABNORMAL_URL          | 0.87
4    | SHORTING_SERVICE      | 0.87
5    | REDIRECT              | 0.87
6    | RIGHT_CLICK           | 0.86
7    | LINKS_IN_TAGS         | 0.85
8    | DOUBLE_SLASH_REDIRECT | 0.84
9    | PORT                  | 0.84
10   | SUBMIITING_TO_EMAIL   | 0.84
11   | PERFIX_SUFFIX         | 0.81
12   | SSL_FINAL_STATE       | 0.81
13   | WEB_TRAFFIC           | 0.81
14   | FAVICON               | 0.80
15   | POP_UP_WINDOW         | 0.80
16   | I_FRAME               | 0.80
17   | STATISTICAL_REPORT    | 0.79
18   | HAVING_SUB_DOMAIN     | 0.78
           | Reference
Prediction | -1  | 1
        -1 | 142 | 7
         1 | 30  | 876
Algorithm | Feature Selection | No. of Features | Sensitivity | Specificity | Accuracy | Positive Class
SVM       | KMO Test          | 18              | 0.8256      | 0.9921      | 0.9649   | -1 (phishing website)
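As a quick arithmetic check, the reported sensitivity, specificity, and accuracy follow from the confusion matrix above with positive class -1 (phishing):

```python
# Confusion matrix from the paper: rows = prediction, columns = reference,
# classes ordered (-1, 1); the positive class is -1 (phishing website).
cm = [[142, 7],
      [30, 876]]

tp, fp = cm[0][0], cm[0][1]   # predicted -1: true positives / false positives
fn, tn = cm[1][0], cm[1][1]   # predicted  1: false negatives / true negatives

sensitivity = tp / (tp + fn)            # 142 / 172
specificity = tn / (tn + fp)            # 876 / 883
accuracy = (tp + tn) / (tp + fp + fn + tn)
print(round(sensitivity, 4), round(specificity, 4), round(accuracy, 4))
# 0.8256 0.9921 0.9649
```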
SVM-Type: C-classification
SVM-Kernel: linear
Cost: 1
Number of Support Vectors: 8
Authors: SAURABH RANKA, CHETAN JAIN
International Journal for Legal Research and Analysis