Introduction
Pizza Runner, a uberized company making & delivering fresh pizza, requires assistance to clean data and apply some calculation for metrics to better direct his runners and optimise its operations.
Entity Relationship Diagram
Table 1: runners
The runners table shows the registration_date for each new runner.
Table 2: customer_orders
Customer pizza orders are captured in the customer_orders table with 1 row for each individual pizza that is part of the order.The pizza_id relates to the type of pizza which was ordered whilst the exclusions are the ingredient_id values which should be removed from the pizza and the extras are the ingredient_id values which need to be added to the pizza.
Table 3: runner_orders
After each orders are received through the system - they are assigned to a runner - however not all orders are fully completed and can be cancelled by the restaurant or the customer. The pickup_time is the timestamp at which the runner arrives at the Pizza Runner headquarters to pick up the freshly cooked pizzas. The distance and duration fields are related to how far and long the runner had to travel to deliver the order to the respective customer.
Table 4: pizza_names
At the moment - Pizza Runner only has 2 pizzas available the Meat Lovers or Vegetarian!
Data Cleansing
customer_orders table
- In the exclusions and extras columns, there are blank spaces and null values. So, we should unify the values to ‘NULL’ .
- (Qurey)
DROP TABLE IF EXISTS customer_orders_temp;
CREATE TEMPORARY TABLE customer_orders_temp AS
SELECT order_id,
customer_id,
pizza_id,
CASE
WHEN exclusions = '' THEN NULL
WHEN exclusions = 'null' THEN NULL
ELSE exclusions
END AS exclusions,
CASE
WHEN extras = '' THEN NULL
WHEN extras = 'null' THEN NULL
ELSE extras
END AS extras,
order_time
FROM pizza_runner.customer_orders;
SELECT * FROM customer_orders_temp;
runner_orders table
- The pickup_time, distance, duration and cancellation columns in runner_orders table will need to be cleaned up before using them in the queries
- In the pickup_time column, there are null values.
- In the distance column, there are null values. It contains unit - km. The 'km' must also be stripped
- In the duration column, there are null values. The 'minutes', 'mins' 'minute' must be stripped
- In the cancellation column, there are blank spaces and null values.
- (Query)
DROP TABLE IF EXISTS runner_orders_temp; CREATE TEMPORARY TABLE runner_orders_temp AS SELECT order_id, runner_id, CASE WHEN pickup_time LIKE 'null' THEN NULL ELSE pickup_time END AS pickup_time, CASE WHEN distance LIKE 'null' THEN NULL ELSE CAST(regexp_replace(distance, '[a-z]+', '') AS FLOAT) END AS distance, CASE WHEN duration LIKE 'null' THEN NULL ELSE CAST(regexp_replace(duration, '[a-z]+', '') AS FLOAT) END AS duration, CASE WHEN cancellation LIKE '' THEN NULL WHEN cancellation LIKE 'null' THEN NULL ELSE cancellation END AS cancellation FROM pizza_runner.runner_orders; SELECT * FROM runner_orders_temp;
Data Analyzing (1) - Order Metrics
1. How many pizzas were ordered?
- Query :
SELECT count(pizza_id) AS "total_order" FROM pizza_runner.customer_orders_temp;
- Results : 14 orders were made right now.
2. How many unique customer orders were made?
- Query :
SELECT COUNT(DISTINCT order_id) AS 'n_order' FROM customer_orders_temp;
3. How many times has each customer orderd/reorderd pizza?
- Query :
-- distict time SELECT customer_id, GROUP_CONCAT(distinct order_time)as order_date, count(distinct order_time) as n_order FROM pizza_runner.customer_orders_temp GROUP BY customer_id ORDER BY customer_id;
- Results : Almost every customer (except customer_id = ‘105’) reorder pizza.
4. How long does it takes to re-order pizza by each customer ?
- Query :
WITH tb1 as ( SELECT customer_id , order_time , lag(order_time) over (partition by customer_id order by order_time asc) prev_order_time FROM pizza_runner.customer_orders_temp ) SELECT customer_id , avg(datediff(order_time, prev_order_time)) avg_reorder FROM tb1 WHERE prev_order_time is not null GROUP BY 1
- Results : 4.875days (average for all customers) takes to reorder.
5. How many of each type of pizza was delivered?
- Query :
SELECT p.pizza_name, count(c.*) AS n_pizza_type FROM pizza_runner.customer_orders_temp AS c LEFT JOIN pizza_runner.pizza_names AS p ON p.pizza_id = c.pizza_id LEFT JOIN pizza_runner.runner_orders_temp AS r ON c.order_id = r.order_id WHERE cancellation IS NULL GROUP BY p.pizza_name ORDER BY n_pizza_type DESC;
6. How many Vegetarian and Meatlovers were ordered by each customer?
- Query :
SELECT customer_id, SUM(CASE WHEN pizza_id = 1 THEN 1 ELSE 0 END) AS 'meat_lovers', SUM(CASE WHEN pizza_id = 2 THEN 1 ELSE 0 END) AS 'vegetarian' FROM pizza_runner.customer_orders_temp GROUP BY customer_id ORDER BY customer_id;
- Results : there are someone who ordered both, and who ordered only meat, who ordered only vegetarian.
7. Which pizza was the most popular for each customer?
- Query : Except 1 person, most loved ‘Meatlovers’ pizza.
WITH order_table AS ( SELECT customer_id, p.pizza_name, count(*) as ordertime, dense_rank() over(PARTITION BY customer_id ORDER BY count(*) desc) as ranking FROM pizza_runner.customer_orders_temp AS c LEFT JOIN pizza_runner.pizza_names AS p ON p.pizza_id = c.pizza_id GROUP BY customer_id, p.pizza_name) SELECT customer_id, pizza_name, ordertime FROM order_table WHERE ranking = 1;
8. What was the maximum number of pizzas delivered in a single order?
- Query :
SELECT customer_id, order_id, count(order_id) AS pizza_count FROM pizza_runner.customer_orders_temp GROUP BY order_id ORDER BY pizza_count DESC LIMIT 1;
9. What was the total volume of pizzas ordered for each hour of the day?
- Query :
SELECT hour(order_time) AS 'Hour', count(order_id) AS 'n_order', round(100*count(order_id) /sum(count(order_id)) over(), 2) AS 'ratio' FROM pizza_runner.customer_orders_temp GROUP BY 1 ORDER BY 1;
10. What was the volume of orders for each day of the week?
- The DAYOFWEEK() function returns the weekday index for a given date ( 1=Sunday, 2=Monday, 3=Tuesday, 4=Wednesday, 5=Thursday, 6=Friday, 7=Saturday )
- DAYNAME() returns the name of the week day
- Query :
SELECT dayname(order_time) AS 'Day Of Week', count(order_id) AS 'n_order', round(100*count(order_id) /sum(count(order_id)) over(), 2) AS 'ratio' FROM pizza_runner.customer_orders_temp GROUP BY 1 ORDER BY 2 DESC;
Data Analyzing (2) - Runner and Customer Experience
1. How many runners signed up for each 1 week period? (i.e. week starts 2021-01-01)
- Query :
2. What was the average time in minutes it took for each runner to arrive at the Pizza Runner HQ to pickup the order?WITH runner_signups as (SELECT runner_id, registration_date, date_sub(registration_date, INTERVAL datediff(registration_date,'2021-01-01') %7 DAY) AS starting_week FROM pizza_runner.runners ) SELECT starting_week, count(runner_id) AS n_runners from runner_signups GROUP BY starting_week ORDER BY starting_week;
- Query :
SELECT runner_id, round(avg(TIMESTAMPDIFF(MINUTE, order_time, pickup_time)), 2) avg_arrival_time FROM runner_orders_temp INNER JOIN customer_orders_temp USING (order_id) WHERE cancellation IS NULL GROUP BY runner_id;
- Query :
3. Is there any relationship between the number of pizzas and how long the order takes to prepare?
- Query :
WITH order_count_cte AS (SELECT order_id, COUNT(order_id) AS pizzas_order_count, TIMESTAMPDIFF(MINUTE, order_time, pickup_time) AS prep_time FROM runner_orders_temp INNER JOIN customer_orders_temp USING (order_id) WHERE cancellation IS NULL GROUP BY order_id) SELECT pizzas_order_count, round(avg(prep_time), 2) AS prep_time FROM order_count_cte GROUP BY pizzas_order_count;
- Results : There is a positive correlation between the number of orders and prep time. The more pizzas are orders, the more it takes to be prepared.pizzas_order_count prep_time
1 12.00 2 18.00 3 29.00
4a. What was the average distance travelled for each customer?
SELECT customer_id,
round(avg(distance), 2) AS 'avg_distance'
FROM runner_orders_temp
INNER JOIN customer_orders_temp USING (order_id)
WHERE cancellation IS NULL
GROUP BY customer_id
ORDER BY customer_id;
4b. What was the average distance travelled for each runner?
- Query :
SELECT runner_id, round(avg(distance), 2) AS 'avg_distance' FROM runner_orders_temp WHERE cancellation IS NULL GROUP BY runner_id ORDER BY runner_id;
6. What is the successful delivery percentage for each runner?
- Query :
SELECT runner_id, COUNT(pickup_time) AS delivered_orders, COUNT(*) AS total_orders, ROUND(100 * COUNT(pickup_time) / COUNT(*)) AS delivered_percentage FROM runner_orders_temp GROUP BY runner_id ORDER BY runner_id;
'데이터 스터디 > SQL' 카테고리의 다른 글
[SQL 케이스 스터디] 유저 인게이지먼트 하락 원인 찾기 - Drop in Engagement (0) | 2025.04.01 |
---|---|
[SQL 케이스 스터디] #1. Danny's Diner (0) | 2025.03.31 |
[programmers] SQL 고득점 Kit - JOIN (0) | 2023.04.06 |
[programmers] SQL 고득점 Kit - NOT NULL (0) | 2023.04.05 |
[programmers] SQL 고득점 Kit - SUM, MAX, MIN (0) | 2023.04.05 |