Dates: 15th Feb - 5th Apr 2019
The University of Edinburgh - MSc AI: Data Mining and Exploration
Authors: Cecilia Cobos Santes, Chon In (Haydn) Cheong, Hong Tin Chan, Ivaylo Genev
Supervisor: Dr. Arno Onken
Motivation: In the ever-growing internet advertising ecosystem, image-based advertisements are highly beneficial to advertisers. However, such images can increase a page’s load time, negatively affecting users’ browsing experience. This creates a desire to remove advertising images.
Several exploratory data analysis (EDA) methods are presented
to inform the data pre-processing and modelling steps.
Because the classes are highly imbalanced and some data are missing,
methods such as SMOTE and multiple imputation are employed to ensure reliable classification.
Moreover, three classification models for advertising detection are proposed.
It is found that combining random forest for missing-data imputation,
LDA for dimensionality reduction, SMOTE for class imbalance,
and a multi-layer perceptron (MLP) classifier yields over 99% balanced accuracy.
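
Balanced accuracy is the mean of the per-class recalls, so the rare advertisement class counts as much as the common non-advertisement class. The sketch below is a minimal illustration of that metric and is not the project's own evaluation code. The function name `balanced_accuracy` and the toy labels are hypothetical and chosen only for this example.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Return the mean of per-class recall, so every class is weighted equally."""
    total = defaultdict(int)    # number of true samples per class
    correct = defaultdict(int)  # number of correctly predicted samples per class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Hypothetical toy data: 9 non-ads, 1 ad, and a classifier that always says "nonad".
y_true = ["nonad"] * 9 + ["ad"]
y_pred = ["nonad"] * 10

plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(plain_accuracy)                     # 0.9
print(balanced_accuracy(y_true, y_pred))  # 0.5
```

On this toy data a classifier that never predicts an advertisement still reaches 0.9 plain accuracy, while its balanced accuracy is only 0.5. This is why balanced accuracy is the more informative figure on a highly imbalanced dataset.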