HYPERPARAMETER OPTIMIZATION OF A LOGISTIC REGRESSION MODEL TO IMPROVE THE ACCURACY OF CONTENT- AND METADATA-BASED PHISHING DETECTION

Abstract

This study evaluates and optimizes the performance of the Logistic Regression algorithm for phishing email detection. The main challenge is balancing technical features (metadata) against textual features (content) without overfitting. The research uses a large combined dataset of 102,486 emails, drawn from the Phishing dataset (Naser Abdullah Alam) and the Valid (legitimate email) dataset (Enron), processed with TF-IDF vectorization and metadata feature extraction. Unlike previous studies, this research applies hyperparameter optimization of the regularization parameter C to assess model stability. Experimental results show that the Content-Only model delivers the best and most stable performance, with an Area Under the Curve (AUC) of 0.99 and an F1-score above 95.61%. In contrast, adding metadata features in the Hybrid model reduced accuracy at high regularization (C) values, indicating that the metadata acts as noise. The study concludes that Logistic Regression using content features alone is robust and efficient enough for phishing detection, without the added complexity of metadata.
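
The pipeline summarized above (TF-IDF content features, optional metadata features, and a sweep over the Logistic Regression regularization parameter C, scored by AUC) can be illustrated with the minimal, dependency-free Python sketch below. It is not the authors' implementation and rests on several assumptions: the six training and two held-out emails are a hypothetical toy corpus rather than the 102,486-email dataset, the `num_links` and `has_attachment` columns are hypothetical stand-ins for the paper's metadata features, the model is fitted with plain gradient descent, and the C grid is arbitrary. C follows the convention of scikit-learn's `LogisticRegression`, where it multiplies the log-loss term, so a smaller C means a stronger L2 penalty; the smoothed IDF formula likewise mirrors the default of scikit-learn's `TfidfVectorizer`.

```python
import math
from collections import Counter

# Hypothetical toy corpus, NOT the study's dataset: (text, 1=phishing/0=legitimate,
# num_links, has_attachment). The two metadata columns are illustrative only.
TRAIN = [
    ("urgent verify your account password", 1, 3, 0),
    ("account suspended click link to login", 1, 2, 0),
    ("claim your prize winner click link", 1, 1, 0),
    ("meeting agenda for quarterly report", 0, 0, 1),
    ("team lunch schedule next week", 0, 0, 0),
    ("budget review notes from call", 0, 0, 1),
]
HELD_OUT = [
    ("bank login required verify password urgent", 1, 2, 1),
    ("quarterly budget meeting moved friday", 0, 1, 1),
]


def fit_tfidf(docs):
    """Vocabulary plus smoothed IDF, ln((1 + n) / (1 + df)) + 1."""
    df = Counter(tok for doc in docs for tok in set(doc.lower().split()))
    idf = {tok: math.log((1 + len(docs)) / (1 + k)) + 1 for tok, k in df.items()}
    return sorted(df), idf


def tfidf(doc, vocab, idf):
    """Term count times IDF, L2-normalised; out-of-vocabulary tokens are ignored."""
    counts = Counter(doc.lower().split())
    vec = [counts[tok] * idf[tok] for tok in vocab]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def features(rows, vocab, idf, meta_max, hybrid):
    """Content-only: TF-IDF row. Hybrid: TF-IDF row plus max-scaled metadata."""
    out = []
    for text, _label, links, attach in rows:
        meta = [links / meta_max[0], attach / meta_max[1]] if hybrid else []
        out.append(tfidf(text, vocab, idf) + meta)
    return out


def sigmoid(z):
    """Numerically stable logistic function."""
    e = math.exp(-abs(z))
    return 1.0 / (1.0 + e) if z >= 0 else e / (1.0 + e)


def score(w, b, x):
    """Decision function w.x + b, i.e. the predicted log-odds of phishing."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_logreg(X, y, C, epochs=300):
    """Minimise 0.5*||w||^2 + C * sum(log-loss) (averaged over n) by gradient descent."""
    n, lr = len(X), 1.0 / (1.0 + C)  # step shrinks as C grows, keeping updates stable
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        gw, gb = list(w), 0.0  # gradient of the penalty; the bias is not penalised
        for xi, yi in zip(X, y):
            err = sigmoid(score(w, b, xi)) - yi
            for j, xij in enumerate(xi):
                gw[j] += C * err * xij
            gb += C * err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def auc(scores, labels):
    """ROC AUC as the share of (phishing, legitimate) pairs ranked correctly."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def sweep(c_values=(0.01, 0.1, 1.0, 10.0, 100.0)):
    """Held-out AUC per (model, C); TF-IDF and metadata scaling are fit on TRAIN only."""
    vocab, idf = fit_tfidf([row[0] for row in TRAIN])
    meta_max = [max(row[2] for row in TRAIN) or 1, max(row[3] for row in TRAIN) or 1]
    y_tr, y_te = [row[1] for row in TRAIN], [row[1] for row in HELD_OUT]
    results = {}
    for name, hybrid in (("content_only", False), ("hybrid", True)):
        X_tr = features(TRAIN, vocab, idf, meta_max, hybrid)
        X_te = features(HELD_OUT, vocab, idf, meta_max, hybrid)
        for C in c_values:
            w, b = train_logreg(X_tr, y_tr, C)
            results[(name, C)] = auc([score(w, b, x) for x in X_te], y_te)
    return results


if __name__ == "__main__":
    for (name, C), value in sorted(sweep().items()):
        print(f"{name:<12} C={C:<6} held-out AUC={value:.2f}")
```

Two held-out emails give a single ranking pair, so these AUC values only demonstrate the mechanics and say nothing about the study's findings. A real experiment would tune C with cross-validation on a proper train/test split of the full dataset, typically via scikit-learn's `TfidfVectorizer`, `LogisticRegression`, and `GridSearchCV` rather than hand-written training code.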
