Projects

Two cases are presented here: the Power BI case and the Data Science case.

Power BI Case

Intro

Fashion Direct is a global jewelry brand selling products to different markets.

As a marketing data analyst, you are responsible for helping the local marketing team understand its market performance and for delivering a product analysis report to the marketing management team. The report must describe sales performance across markets, the best- and worst-performing products, the sales trend, and the primary driver of revenue growth. The local marketing team also wants to see the figures in its own currency; the company's reporting currency is DKK.

Who needs to see the sales dashboard?

Marketing management team

What do they want to see?

- Sales amount

- Sales growth

- Sales of new stores

- Exchange rate

- Business line

Where does the team want to see it?

- Web

- Mobile

Which period do they want to see?

2015-2020

Sales fact:

- Sales amount

- Sales quantity

- Sales growth

- Business line

Time Dimension:

- Year

- Month

- Week

- Date

Product Dimension:

- Product category

- Product price

- Product group

Business Line Dimension:

- Business line country

- Business line

Date Dimension

Exchange rates fact:

- Currency

- Year

- Exchange rate

Currency Dimension:

- Currency
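The sales and exchange rate facts above can be combined to show figures in a local currency and to compute sales growth. The following is a minimal sketch, not the dashboard's actual implementation. It assumes that each exchange rate is expressed as units of local currency per 1 DKK and is keyed by currency and year. The `Sale` type, the business line names, and all numeric values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sale:
    year: int
    business_line: str
    amount_dkk: float  # sales amount in the company currency (DKK)


# Hypothetical exchange rates fact: (currency, year) -> local units per 1 DKK.
# The rates are illustrative, not real market data.
EXCHANGE_RATES = {
    ("DKK", 2019): 1.0,
    ("DKK", 2020): 1.0,
    ("EUR", 2019): 0.13,
    ("EUR", 2020): 0.13,
}

# Hypothetical sales fact rows.
SALES = [
    Sale(2019, "Retail", 1000.0),
    Sale(2019, "Online", 500.0),
    Sale(2020, "Retail", 1200.0),
    Sale(2020, "Online", 600.0),
]


def to_local(sale, currency, rates=EXCHANGE_RATES):
    """Convert a DKK amount using the rate for the sale's year."""
    return sale.amount_dkk * rates[(currency, sale.year)]


def yearly_totals(sales):
    """Sum DKK sales per year."""
    totals = {}
    for s in sales:
        totals[s.year] = totals.get(s.year, 0.0) + s.amount_dkk
    return totals


def yoy_growth(totals, year):
    """Year-over-year growth as a fraction; None if the prior year is missing or zero."""
    prev = totals.get(year - 1)
    if not prev:
        return None
    return (totals[year] - prev) / prev


totals = yearly_totals(SALES)
growth_2020 = yoy_growth(totals, 2020)
first_sale_eur = to_local(SALES[0], "EUR")
```

Keying the rate by year mirrors the Year attribute of the exchange rates fact; a finer grain (month or date) would need a correspondingly finer key.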

Data Science Case

Intro

Use data collected from a bank's clients (loan history, family type, mortgage, etc.) to predict which clients are likely to default on their credit obligations within a given period.

Solution design and implementation:

Step 1: Business Understanding

The data mining goal of this process is to predict a customer's likelihood of default as accurately as possible, based on the variables most relevant to that outcome. Customer criteria such as industry or financial health may be crucial for identifying firms with a high likelihood of default.

Step 2: Analysis and Data Understanding

Data description: The data set contains 25,912 instances, each representing a professional loan. Each loan has 44 attributes and a target variable.

Data exploration: The target variable (Label_Default) indicates whether a loan is good (N) or bad (Y). The data set contains both continuous and categorical variables, which must be treated differently when the data is encoded. We identified 743 default cases and 25,169 non-defaults. The resulting default rate of roughly 2.9% (743 of 25,912) shows that the data is imbalanced, which is taken into account during the modeling phase.
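The imbalance can be checked directly from the class counts quoted above. The sketch below also derives "balanced" class weights, computed as total samples divided by (number of classes times class count), the same heuristic scikit-learn uses for `class_weight="balanced"`. Class weighting is only one common way to handle imbalance and is an assumption here; the text does not state which technique was actually used.

```python
# Class counts as reported in the data exploration step.
n_default = 743
n_non_default = 25_169
n_total = n_default + n_non_default

# Share of loans that defaulted.
default_rate = n_default / n_total


def balanced_weights(counts):
    """Weight each class by n_total / (n_classes * n_in_class)."""
    n = sum(counts.values())
    k = len(counts)
    return {label: n / (k * c) for label, c in counts.items()}


# Assumption: weights keyed by the Label_Default values "Y" and "N".
weights = balanced_weights({"Y": n_default, "N": n_non_default})

print(f"total loans: {n_total}")
print(f"default rate: {default_rate:.2%}")
print(f"class weights: {weights}")
```

With these weights, each misclassified default counts far more than a misclassified non-default, which offsets the scarcity of the positive class.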

Step 3: Data Preparation (Clean data)

- Checking data completeness and duplication

- Handling missing values

- Encoding data
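The three preparation steps can be sketched as follows using only the standard library. The text does not say how missing values were filled or how variables were encoded, so median and mode imputation and one-hot encoding are assumptions. The column names `loan_amount` and `industry` and the sample rows are hypothetical; only `Label_Default` comes from the data set description.

```python
from collections import Counter
from statistics import median

# Hypothetical raw rows; None marks a missing value.
RAW = [
    {"loan_amount": 100.0, "industry": "retail", "Label_Default": "N"},
    {"loan_amount": 100.0, "industry": "retail", "Label_Default": "N"},  # exact duplicate
    {"loan_amount": None, "industry": "construction", "Label_Default": "Y"},
    {"loan_amount": 300.0, "industry": None, "Label_Default": "N"},
    {"loan_amount": 250.0, "industry": "retail", "Label_Default": "N"},
]
NUMERIC = ["loan_amount"]
CATEGORICAL = ["industry"]


def drop_duplicates(rows):
    """Keep the first occurrence of each identical row."""
    seen, unique = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def impute(rows, numeric_cols, categorical_cols):
    """Fill numeric gaps with the column median, categorical gaps with the mode."""
    filled = [dict(r) for r in rows]
    for col in numeric_cols:
        fill = median(r[col] for r in rows if r[col] is not None)
        for r in filled:
            if r[col] is None:
                r[col] = fill
    for col in categorical_cols:
        observed = Counter(r[col] for r in rows if r[col] is not None)
        fill = observed.most_common(1)[0][0]
        for r in filled:
            if r[col] is None:
                r[col] = fill
    return filled


def one_hot(rows, categorical_cols):
    """Replace each categorical column with one 0/1 column per level."""
    levels = {c: sorted({r[c] for r in rows}) for c in categorical_cols}
    encoded = []
    for r in rows:
        out = {k: v for k, v in r.items() if k not in categorical_cols}
        for c in categorical_cols:
            for level in levels[c]:
                out[f"{c}={level}"] = 1 if r[c] == level else 0
        encoded.append(out)
    return encoded


clean = one_hot(impute(drop_duplicates(RAW), NUMERIC, CATEGORICAL), CATEGORICAL)
```

Continuous columns pass through unchanged apart from imputation, while categorical columns are expanded, which reflects the different treatment the two variable types need.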

Step 4: Machine Learning Modeling and Evaluation

After completing the preparation stage, we fed the pre-processed data into six different prediction models. We then calculated each model's accuracy and Area Under the Curve (AUC) on the validation set. Finally, the model with the highest AUC was selected and applied to new, unseen data points, which in this case are our test set.
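The selection step can be sketched as below. AUC is computed here from its pairwise definition: the fraction of (default, non-default) pairs in which the default receives the higher score, with ties counted as one half. The model names `model_a` and `model_b`, the labels, and the scores are hypothetical stand-ins; the six models actually used are not named in the text.

```python
def roc_auc(y_true, scores):
    """AUC as the share of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def accuracy(y_true, scores, threshold=0.5):
    """Share of correct predictions after thresholding the scores."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, y_true)) / len(y_true)


def select_best(y_val, val_scores_by_model):
    """Return the name of the model with the highest validation AUC, plus all AUCs."""
    aucs = {name: roc_auc(y_val, s) for name, s in val_scores_by_model.items()}
    best = max(aucs, key=aucs.get)
    return best, aucs


# Hypothetical validation labels (1 = default) and predicted default probabilities.
y_val = [0, 0, 0, 1, 1]
val_scores = {
    "model_a": [0.10, 0.40, 0.35, 0.80, 0.30],
    "model_b": [0.20, 0.10, 0.30, 0.90, 0.60],
}

best, aucs = select_best(y_val, val_scores)
acc_a = accuracy(y_val, val_scores["model_a"])
```

Ranking by AUC rather than accuracy suits the imbalance in this data set: a model that labels every loan as non-default would already be right for 25,169 of 25,912 loans (about 97%) without identifying a single default.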

Power BI case: How do they want to see it? (Dimensions)

- By time

- By product

- By business line

- By currency