Operation Analytics and Investigating Metric Spike SQL
Operation Analytics and Investigating Metric Spike SQL
Introduction
This project focuses on analyzing the data which is provided by the company. My task is to derive insights and answer the questions asked by different departments. So that these insights are then used by opsteam, support team, marketing team, etc to predict the overall growth or decline of a company’sfortune. It means better automation, and better understanding between cross-functional effective workflows.
In case study 1 there is job_data table while in case study 2 there are users, events and email_events tables
Project Approach
This project is developed using SQL Workbench. First I need to create database by using dataset file which was provided by the company. Next step load the data into SQL Workben h then performed analysis and find the information that will the help the ops team, support team, marketing team, etc to understand questions like - Why is there a dip in daily engagement? Why have sales taken a dip? Etc. Questions like these must be answered daily and for that it’s very important to investigate metric spike.
Tech-Stack Used
MySQL Workbench is a visual editor that unifies data modeling, SQL development, and database administration in one interface.
MySQL Workbench is widely used to handle structured data. It is an open-source Relational Database Management System (RDBMS) developed by Oracle Corporation, Sun Microsystems that uses Structured Query language (SQL) to interact with databases.
MySQL Workbench offers database migration options, making it easier to move data to and from the Microsoft SQL Server, Microsoft Access and other RDBMS tables.
Insights
Case Study 1 (Job Data)
1. Calculate the number of jobs reviewed per hour per day for November 2020?
SELECT ds AS Dates, ROUND((COUNT(job_id)/SUM(time_spent))*3600) AS "Jobs Reviewed per Hour per Day"
FROM job_data
WHERE ds BETWEEN '2020-11-01' AND '2020-11-30' GROUP BY ds;
On date 2020-11-28 there is maximum number of jobs reviewed that is 218.

2. Calculate 7 day rolling average of throughput? For throughput, do you prefer daily metric or 7- day rolling and why?
SELECT ROUND(COUNT(event)/SUM(time_spent), 2) AS "Weekly Throughput" FROM job_data;
The weekly throughput is 0.03.

SELECT ds AS Dates, ROUND(COUNT(event)/SUM(time_spent), 2) AS "Daily Throughput" FROM job_data
GROUP BY ds ORDER BY ds;
On date 2020-11-28 the throughput is highest 0.06.

Metrics will always go up and down on a weekly and daily basis. You’ll get numbers faster every day or minute if you want. As a result, rolling metrics are superb at showing if your metrics are trending up or down on a daily level.
3. Calculate the percentage share of each language in the last 30 days?
SELECT language AS Languages, ROUND(100 * COUNT(*)/total, 2) AS Percentage FROM job_data
CROSS JOIN (SELECT COUNT(*) AS total FROM job_data) sub GROUP BY language;
Persian language is highest with 37.5% total.

4. Let’s say you see some duplicate rows in the data. How will you display duplicates from the table?
SELECT actor_id, COUNT(*) AS Duplicates FROM job_data
GROUP BY actor_id HAVING COUNT(*) > 1;
Actor ID 1003 has duplicate rows.

Case Study 2 (Investigating Metric Spike)
1. Calculate the weekly user engagement?
SELECT EXTRACT(WEEK FROM occurred_at) AS "Week Numbers", COUNT(DISTINCT user_id) AS "Weekly Active Users"
FROM events
WHERE event_type = 'engagement' GROUP BY 1;

2. Calculate the user growth for product?
SELECT Months, Users, ROUND(((Users/LAG(Users, 1) OVER (ORDER BY Months) -
1)*100), 2) AS "Growth in %" FROM
(SELECT EXTRACT(MONTH FROM created_at) AS Months, COUNT(activated_at) AS Users FROM users
WHERE activated_at NOT IN("") GROUP BY 1
ORDER BY 1) sub;

3. Calculate the weekly retention of users-sign up cohort?
SELECT first AS "Week Numbers",
SUM(CASE WHEN week_number = 0 THEN 1 ELSE 0 END) AS "Week 0",
SUM(CASE WHEN week_number = 1 THEN 1 ELSE 0 END) AS "Week 1",
SUM(CASE WHEN week_number = 2 THEN 1 ELSE 0 END) AS "Week 2",
SUM(CASE WHEN week_number = 3 THEN 1 ELSE 0 END) AS "Week 3",
SUM(CASE WHEN week_number = 4 THEN 1 ELSE 0 END) AS "Week 4",
SUM(CASE WHEN week_number = 5 THEN 1 ELSE 0 END) AS "Week 5",
SUM(CASE WHEN week_number = 6 THEN 1 ELSE 0 END) AS "Week 6",
SUM(CASE WHEN week_number = 7 THEN 1 ELSE 0 END) AS "Week 7",
SUM(CASE WHEN week_number = 8 THEN 1 ELSE 0 END) AS "Week 8",
SUM(CASE WHEN week_number = 9 THEN 1 ELSE 0 END) AS "Week 9",
SUM(CASE WHEN week_number = 10 THEN 1 ELSE 0 END) AS "Week 10",
SUM(CASE WHEN week_number = 11 THEN 1 ELSE 0 END) AS "Week 11",
SUM(CASE WHEN week_number = 12 THEN 1 ELSE 0 END) AS "Week 12",
SUM(CASE WHEN week_number = 13 THEN 1 ELSE 0 END) AS "Week 13",
SUM(CASE WHEN week_number = 14 THEN 1 ELSE 0 END) AS "Week 14",
SUM(CASE WHEN week_number = 15 THEN 1 ELSE 0 END) AS "Week 15",
SUM(CASE WHEN week_number = 16 THEN 1 ELSE 0 END) AS "Week 16",
SUM(CASE WHEN week_number = 17 THEN 1 ELSE 0 END) AS "Week 17",
SUM(CASE WHEN week_number = 18 THEN 1 ELSE 0 END) AS "Week 18", FROM
(
SELECT m.user_id, m.login_week, n.first, m.login_week - first AS week_number FROM
(SELECT user_id, EXTRACT(WEEK FROM occurred_at) AS login_week FROM events GROUP BY 1, 2) m,
(SELECT user_id, MIN(EXTRACT(WEEK FROM occurred_at)) AS first FROM events GROUP BY 1) n
WHERE m.user_id = n.user_id
) sub
GROUP BY first ORDER BY first;

6
4. Calculate the weekly engagement per device?
SELECT EXTRACT(WEEK FROM occurred_at) AS "Week Numbers",
COUNT(DISTINCT CASE WHEN device IN('dell inspiron notebook') THEN user_id ELSE NULL END) AS "Dell Inspiron Notebook",
COUNT(DISTINCT CASE WHEN device IN('iphone 5') THEN user_id ELSE NULL END) AS
"iPhone 5",
COUNT(DISTINCT CASE WHEN device IN('iphone 4s') THEN user_id ELSE NULL END) AS
"iPhone 4S",
COUNT(DISTINCT CASE WHEN device IN('windows surface') THEN user_id ELSE NULL END) AS "Windows Surface",
COUNT(DISTINCT CASE WHEN device IN('macbook air') THEN user_id ELSE NULL END) AS "Macbook Air",
COUNT(DISTINCT CASE WHEN device IN('iphone 5s') THEN user_id ELSE NULL END) AS
"iPhone 5S",
COUNT(DISTINCT CASE WHEN device IN('macbook pro') THEN user_id ELSE NULL END) AS "Macbook Pro",
COUNT(DISTINCT CASE WHEN device IN('kindle fire') THEN user_id ELSE NULL END) AS "Kindle Fire",
COUNT(DISTINCT CASE WHEN device IN('ipad mini') THEN user_id ELSE NULL END) AS "iPad Mini",
COUNT(DISTINCT CASE WHEN device IN('nexus 7') THEN user_id ELSE NULL END) AS
"Nexus 7",
COUNT(DISTINCT CASE WHEN device IN('nexus 5') THEN user_id ELSE NULL END) AS
"Nexus 5",
COUNT(DISTINCT CASE WHEN device IN('samsung galaxy s4') THEN user_id ELSE NULL END) AS "Samsung Galaxy S4",
COUNT(DISTINCT CASE WHEN device IN('lenovo thinkpad') THEN user_id ELSE NULL END) AS "Lenovo Thinkpad",
COUNT(DISTINCT CASE WHEN device IN('samsumg galaxy tablet') THEN user_id ELSE NULL END) AS "Samsumg Galaxy Tablet",
COUNT(DISTINCT CASE WHEN device IN('acer aspire notebook') THEN user_id ELSE NULL END) AS "Acer Aspire Notebook",
COUNT(DISTINCT CASE WHEN device IN('asus chromebook') THEN user_id ELSE NULL END) AS "Asus Chromebook",
COUNT(DISTINCT CASE WHEN device IN('htc one') THEN user_id ELSE NULL END) AS "HTC One",
COUNT(DISTINCT CASE WHEN device IN('nokia lumia 635') THEN user_id ELSE NULL END) AS "Nokia Lumia 635",
COUNT(DISTINCT CASE WHEN device IN('samsung galaxy note') THEN user_id ELSE NULL END) AS "Samsung Galaxy Note",
COUNT(DISTINCT CASE WHEN device IN('acer aspire desktop') THEN user_id ELSE NULL END) AS "Acer Aspire Desktop",
COUNT(DISTINCT CASE WHEN device IN('mac mini') THEN user_id ELSE NULL END) AS "Mac Mini",
COUNT(DISTINCT CASE WHEN device IN('hp pavilion desktop') THEN user_id ELSE NULL END) AS "HP Pavilion Desktop",
COUNT(DISTINCT CASE WHEN device IN('dell inspiron desktop') THEN user_id ELSE NULL END) AS "Dell Inspiron Desktop",
COUNT(DISTINCT CASE WHEN device IN('ipad air') THEN user_id ELSE NULL END) AS "iPad Air",
7
COUNT(DISTINCT CASE WHEN device IN('amazon fire phone') THEN user_id ELSE NULL END) AS "Amazon Fire Phone",
COUNT(DISTINCT CASE WHEN device IN('nexus 10') THEN user_id ELSE NULL END) AS
"Nexus 10" FROM events
WHERE event_type = 'engagement' GROUP BY 1
ORDER BY 1;


8
5. Calculate the email engagement metrics?
SELECT Week,
ROUND((weekly_digest/total*100),2) AS "Weekly Digest Rate", ROUND((email_opens/total*100),2) AS "Email Open Rate", ROUND((email_clickthroughs/total*100),2) AS "Email Clickthrough Rate", ROUND((reengagement_emails/total*100),2) AS "Reengagement Email Rate" FROM
(SELECT EXTRACT(WEEK FROM occurred_at) AS Week,
COUNT(CASE WHEN action = 'sent_weekly_digest' THEN user_id ELSE NULL END) AS weekly_digest,
COUNT(CASE WHEN
email_opens,
action = 'email_open' THEN user_id ELSE
NULL END) AS
COUNT(CASE WHEN action = 'email_clickthrough' THEN user_id ELSE NULL END) AS email_clickthroughs,
COUNT(CASE WHEN action = 'sent_reengagement_email' THEN user_id ELSE NULL END) AS reengagement_emails,
COUNT(user_id) AS total FROM email_events GROUP BY 1
) sub
GROUP BY 1
ORDER BY 1;

No comments:
Post a Comment