Projects
Two cases are presented here: a Power BI case and a Data Science case.
Power BI Case
Intro
Fashion Direct is a global jewelry brand selling products to different markets.
As a marketing data analyst, you are responsible for helping the local marketing teams understand their market performance and for providing a product analysis report to the marketing management team. The report must describe sales performance across markets, the best- and worst-performing products, the sales trend, and the primary driver of revenue growth. The local marketing teams also want to see the figures in their own currency, whereas the company's own currency is DKK.
Who needs to see the sales dashboard?
Marketing management team
What do they want to see?
- Sales amount
- Sales growth
- Sales of new stores
- Exchange rate
- Business line
Where does the team want to see it?
- Web
- Mobile
For which period do they want to see data?
2015-2020
Sales fact:
- Sales amount
- Sales quantity
- Sales growth
- Business line
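Sales growth is the one derived measure in the sales fact. A minimal sketch, assuming growth is defined year over year as (current - previous) / previous; the yearly totals below are hypothetical and only illustrate the calculation:

```python
def yoy_growth(sales_by_year):
    """Return {year: growth}, with growth = (current - previous) / previous."""
    growth = {}
    years = sorted(sales_by_year)
    for prev, cur in zip(years, years[1:]):
        # Skip gaps in the series and years where growth is undefined.
        if cur - prev != 1 or sales_by_year[prev] == 0:
            continue
        growth[cur] = (sales_by_year[cur] - sales_by_year[prev]) / sales_by_year[prev]
    return growth


# Hypothetical yearly sales totals in DKK, for illustration only.
sales_dkk = {2015: 1000.0, 2016: 1200.0, 2017: 1080.0}
growth = yoy_growth(sales_dkk)
```

The first year in the range has no growth value because there is no prior year to compare against.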
Time Dimension:
- Year
- Month
- Week
- Date
Product Dimension:
- Product category
- Product price
- Product group
Business Line Dimension:
- Business line country
- Business line
Date Dimension
Exchange rates fact:
- Currency
- Year
- Exchange rate
Currency Dimension:
- Currency
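The exchange rates fact is what lets the local teams see sales in their own currency. The sketch below shows one way the two facts could be combined, assuming each rate is stored as units of local currency per 1 DKK and is looked up by currency and year. The field names, business lines, and rate values are hypothetical, not taken from the actual model.

```python
# Hypothetical sales rows recorded in DKK, the company's currency.
sales = [
    {"year": 2019, "business_line": "Retail", "currency": "EUR", "amount_dkk": 7500.0},
    {"year": 2020, "business_line": "Online", "currency": "USD", "amount_dkk": 1000.0},
]

# Hypothetical yearly rates: units of local currency per 1 DKK.
rates = {("EUR", 2019): 0.134, ("USD", 2020): 0.153}


def to_local(rows, rates):
    """Attach the local-currency amount to each sales row."""
    converted = []
    for row in rows:
        key = (row["currency"], row["year"])
        if key not in rates:
            raise KeyError(f"no exchange rate for {key}")
        local = round(row["amount_dkk"] * rates[key], 2)
        converted.append({**row, "amount_local": local})
    return converted


local_sales = to_local(sales, rates)
```

Raising on a missing rate, rather than silently skipping the row, keeps totals from being understated when a currency or year is absent from the rates table.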
Data Science Case
Intro
Use data collected from a bank's clients (loan history, family type, mortgage, etc.) to predict which clients are likely to default on their credit obligations within a given period.
Solution design and implementation:
Step 1: Business Understanding
The data mining goal is to predict a customer's likelihood of default as accurately as possible, based on the variables most relevant to that outcome. Criteria such as a firm's industry or financial health may be crucial for predicting whether it has a high likelihood of defaulting.
Step 2: Analysis and Data Understanding
Data description: The data set contains 25,912 instances, each representing a professional loan. Each loan has 44 attributes and a target variable.
Data exploration: The target variable (Label_Default) indicates whether a loan is good (N) or bad (Y). The data set contains both continuous and categorical variables, which need to be treated differently during encoding. We identified 743 default cases and 25,169 non-defaults. The resulting default rate of about 2.9% indicates a strong class imbalance, which is taken into account during the modeling phase.
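The imbalance can be quantified directly from the two counts: 743 / 25,912 is roughly 2.87%. The sketch below also derives inverse-frequency class weights, which is one common way to account for imbalance during modeling. The source does not say which technique was actually used, so the weighting scheme here is an assumption.

```python
# Class counts reported in the data exploration step.
n_default, n_non_default = 743, 25_169
n_total = n_default + n_non_default

default_rate = n_default / n_total  # roughly 0.0287

# Assumed scheme: "balanced" weights, n_total / (n_classes * class_count),
# so that both classes contribute equally to a weighted loss.
n_classes = 2
class_weight = {
    "Y": n_total / (n_classes * n_default),
    "N": n_total / (n_classes * n_non_default),
}
```

With these weights, each default case counts about 17 times as much as a non-default, and the weighted totals of the two classes are equal.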
Step 3: Data Preparation (Clean data)
- Checking data completeness and duplication
- Handling missing values
- Encoding data
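The three preparation steps can be sketched as follows. This is a minimal illustration, not the project's actual pipeline: it assumes exact-match deduplication, median imputation for numeric columns, a "missing" category for categorical columns, and one-hot encoding. The column names are hypothetical.

```python
from statistics import median


def prepare(rows, numeric_cols, categorical_cols):
    # 1. Remove exact duplicate rows, keeping the first occurrence.
    seen, unique = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(dict(row))

    # 2. Fill missing values: median for numeric, "missing" for categorical.
    for col in numeric_cols:
        fill = median(r[col] for r in unique if r[col] is not None)
        for r in unique:
            if r[col] is None:
                r[col] = fill
    for col in categorical_cols:
        for r in unique:
            if r[col] is None:
                r[col] = "missing"

    # 3. One-hot encode each categorical column.
    for col in categorical_cols:
        levels = sorted({r[col] for r in unique})
        for r in unique:
            value = r.pop(col)
            for level in levels:
                r[f"{col}={level}"] = int(value == level)
    return unique


# Hypothetical rows, for illustration only.
rows = [
    {"loan_amount": 100.0, "family_type": "single"},
    {"loan_amount": 100.0, "family_type": "single"},  # exact duplicate
    {"loan_amount": None, "family_type": "couple"},
    {"loan_amount": 300.0, "family_type": None},
]
clean = prepare(rows, ["loan_amount"], ["family_type"])
```

In a real project the imputation values and category levels would normally be learned on the training split only and then reused on the validation and test sets.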
Step 4: Machine Learning Modeling and Evaluation
After the data preparation stage, we fed the pre-processed data into six different prediction models. We then calculated each model's accuracy and Area Under the Curve (AUC) on the validation set. Finally, the model with the highest AUC was selected and applied to new, unseen data points, in this case our test set.
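Selecting on AUC rather than accuracy matters with this class balance: a model that always predicts "N" would already be about 97% accurate (25,169 of 25,912 loans). The sketch below computes AUC as the share of (default, non-default) pairs that a model ranks correctly, then picks the model with the highest validation AUC. The model names, labels, and scores are hypothetical; the six models used in the project are not named in the text.

```python
def auc(labels, scores):
    """Share of (positive, negative) pairs ranked correctly; ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical validation labels (1 = default) and predicted default scores.
y_val = [0, 0, 1, 0, 1, 0]
val_scores = {
    "model_a": [0.1, 0.4, 0.8, 0.3, 0.7, 0.2],
    "model_b": [0.2, 0.6, 0.5, 0.1, 0.9, 0.3],
}

val_auc = {name: auc(y_val, scores) for name, scores in val_scores.items()}
best_model = max(val_auc, key=val_auc.get)
```

The pairwise loop is quadratic in the number of samples, which is fine for illustration; a rank-based implementation scales better on a data set of this size.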
How do they want to see it? (Dimensions)
- By time
- By product
- By business line
- By currency