Create a Machine Learning Model : Phishing websites vs Regular websites

Date : 3/21-3/25 | Time : 8:00 am – 9:00 am | Medium : Hybrid | Max Attendees : 60

To register for this training, please submit the form below:

A Conference and Training Pass is required to attend this training.

Training Abstract :

The challenge: Create a machine learning model that distinguishes phishing websites from regular websites based on their URLs.

3/21/22

  • Lecture (90 minutes): basics of machine learning using scikit-learn in Python, CSV datasets, feature extraction, and measures of accuracy.
  • Participants are given access to a partial dataset with two columns: the website URL and a label indicating whether it is a phishing website. They will use this for training and validation. The rest of the dataset will be kept secret from them until the final day.
  • Participants split into teams of about four.
  • Teams are given access to a folder in which to upload their final submission.
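
As a rough illustration of the lecture topics, the sketch below reads a two-column CSV and turns each URL into a few numeric features. The column names `url` and `label`, the sample rows, and the particular features are assumptions made for this example; the actual dataset and the features teams choose may differ.

```python
import csv
import io
from urllib.parse import urlparse

# Hypothetical sample in the assumed two-column layout (url, label);
# label 1 = phishing, 0 = regular. The real dataset's headers may differ.
SAMPLE_CSV = """url,label
http://192.168.0.1/login/verify-account,1
https://www.example.com/about,0
"""


def url_features(url):
    """Turn one URL string into a list of simple numeric features."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return [
        len(url),                              # total length of the URL
        url.count("."),                        # number of dots
        url.count("-"),                        # number of hyphens
        sum(ch.isdigit() for ch in url),       # number of digit characters
        int(parsed.scheme == "https"),         # 1 if the scheme is HTTPS
        int(host.replace(".", "").isdigit()),  # 1 if the host is digits and dots only
    ]


def load_dataset(csv_text):
    """Parse CSV text into a feature matrix X and a label list y."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    X = [url_features(row["url"]) for row in rows]
    y = [int(row["label"]) for row in rows]
    return X, y


X, y = load_dataset(SAMPLE_CSV)
```

Hand-written features like these are only a starting point; part of the challenge is finding which URL properties actually separate the two classes.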

3/22/22 to 3/24/22

  • No formal lecture; the session time is used to answer questions.
  • Teams split their dataset into training and validation sets.
  • Teams extract features from the dataset and train models on the training dataset, looking for the best performance on their validation dataset.
  • Final submissions are due by 8 PM EST 3/24/22.
  • The final submission is an uploaded folder named after the team, containing at least two files: extract_features.py, which exposes a function mapping the CSV dataset to features, and model.joblib, which contains a trained scikit-learn model.
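
The workflow for these days might look like the sketch below: extract features, hold out a validation set, train a model, and save it as model.joblib. The `extract_features` signature, the synthetic URLs, the 75/25 split, and the choice of a random forest are all assumptions for illustration; the training description does not prescribe any of them. In a real submission the function would live in extract_features.py, and the model would be saved into the team folder rather than a temporary directory.

```python
import os
import tempfile

import joblib
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the provided partial dataset (hypothetical URLs);
# label 1 = phishing, 0 = regular.
urls = [f"http://secure-login-{i}.example.test/verify?id={i}" for i in range(20)]
urls += [f"https://www.site{i}.example.com/" for i in range(20)]
labels = [1] * 20 + [0] * 20


def extract_features(url_list):
    """Map URL strings to numeric feature rows (assumed signature)."""
    return [
        [len(u), u.count("-"), u.count("."), int(u.startswith("https://"))]
        for u in url_list
    ]


X = extract_features(urls)

# Hold out 25% of the labeled data for validation, keeping class balance.
X_train, X_val, y_train, y_val = train_test_split(
    X, labels, test_size=0.25, stratify=labels, random_state=0
)

# Train on the training split only, then check accuracy on the held-out split.
model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(X_train, y_train)
val_accuracy = model.score(X_val, y_val)

# Persist the fitted model in the format the submission asks for.
model_path = os.path.join(tempfile.mkdtemp(), "model.joblib")
joblib.dump(model, model_path)

# Reload to confirm the saved file yields a usable model.
reloaded = joblib.load(model_path)
```

Because the secret test set is scored with the team's own feature function, the same `extract_features` code has to be used for training, validation, and the final submission.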

3/25/22

  • Final evaluation and discussion (60 minutes): I gather the submissions and test them on the secret test dataset.
  • Teams are scored on three measures: true positive rate, true negative rate, and execution time.
  • Teams discuss what they found worked best with their feature selections and models.
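
The three scoring measures can be computed along the lines of the sketch below. The `score_submission` helper, the toy predictor, and the sample data are hypothetical; how the three numbers are weighed against each other in the final ranking is not specified here.

```python
import time


def score_submission(predict, X_test, y_test):
    """Return (true positive rate, true negative rate, seconds elapsed).

    `predict` is any callable mapping feature rows to 0/1 predictions,
    where 1 means phishing and 0 means regular.
    """
    start = time.perf_counter()
    y_pred = list(predict(X_test))
    elapsed = time.perf_counter() - start

    pairs = list(zip(y_test, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # phishing caught
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # phishing missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # regular passed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # regular flagged

    # Guard against division by zero when a class is absent from the test set.
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    tnr = tn / (tn + fp) if (tn + fp) else 0.0
    return tpr, tnr, elapsed


def toy_predict(rows):
    """Hypothetical predictor: flag a row when its first feature exceeds 40."""
    return [1 if row[0] > 40 else 0 for row in rows]


X_test = [[55], [38], [47], [20], [33], [60]]
y_test = [1, 1, 1, 0, 0, 0]
tpr, tnr, seconds = score_submission(toy_predict, X_test, y_test)
```

Reporting the two rates separately matters because a model that labels every site as phishing would reach a perfect true positive rate while its true negative rate drops to zero.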

Materials needed:

  • A storage account (such as OneDrive)
  • A room with a projector / large TV for the presentation
