데이터 스터디/SQL

[SQL 케이스 스터디] Pizza Runner

케이와이엠 2025. 4. 1. 14:32

Introduction

Pizza Runner, a uberized company making & delivering fresh pizza, requires assistance to clean data and apply some calculation for metrics to better direct his runners and optimise its operations.

Entity Relationship Diagram

 

Datasets used

Table 1: runners
The runners table shows the registration_date for each new runner.

Table 2: customer_orders
Customer pizza orders are captured in the customer_orders table with 1 row for each individual pizza that is part of the order.The pizza_id relates to the type of pizza which was ordered whilst the exclusions are the ingredient_id values which should be removed from the pizza and the extras are the ingredient_id values which need to be added to the pizza.

Table 3: runner_orders
After each orders are received through the system - they are assigned to a runner - however not all orders are fully completed and can be cancelled by the restaurant or the customer. The pickup_time is the timestamp at which the runner arrives at the Pizza Runner headquarters to pick up the freshly cooked pizzas. The distance and duration fields are related to how far and long the runner had to travel to deliver the order to the respective customer.

Table 4: pizza_names
At the moment - Pizza Runner only has 2 pizzas available the Meat Lovers or Vegetarian!

Data Cleansing

customer_orders table

  • In the exclusions and extras columns, there are blank spaces and null values. So, we should unify the values to ‘NULL’ .
  • (Qurey)
DROP TABLE IF EXISTS customer_orders_temp;

CREATE TEMPORARY TABLE customer_orders_temp AS
SELECT order_id,
       customer_id,
       pizza_id,
       CASE
           WHEN exclusions = '' THEN NULL
           WHEN exclusions = 'null' THEN NULL
           ELSE exclusions
       END AS exclusions,
       CASE
           WHEN extras = '' THEN NULL
           WHEN extras = 'null' THEN NULL
           ELSE extras
       END AS extras,
       order_time
FROM pizza_runner.customer_orders;

SELECT * FROM customer_orders_temp;

runner_orders table

  • The pickup_time, distance, duration and cancellation columns in runner_orders table will need to be cleaned up before using them in the queries
  • In the pickup_time column, there are null values.
  • In the distance column, there are null values. It contains unit - km. The 'km' must also be stripped
  • In the duration column, there are null values. The 'minutes', 'mins' 'minute' must be stripped
  • In the cancellation column, there are blank spaces and null values.
  • (Query)
    DROP TABLE IF EXISTS runner_orders_temp;
    
    CREATE TEMPORARY TABLE runner_orders_temp AS
    
    SELECT order_id,
           runner_id,
           CASE
               WHEN pickup_time LIKE 'null' THEN NULL
               ELSE pickup_time
           END AS pickup_time,
           CASE
               WHEN distance LIKE 'null' THEN NULL
               ELSE CAST(regexp_replace(distance, '[a-z]+', '') AS FLOAT)
           END AS distance,
           CASE
               WHEN duration LIKE 'null' THEN NULL
               ELSE CAST(regexp_replace(duration, '[a-z]+', '') AS FLOAT)
           END AS duration,
           CASE
               WHEN cancellation LIKE '' THEN NULL
               WHEN cancellation LIKE 'null' THEN NULL
               ELSE cancellation
           END AS cancellation
    FROM pizza_runner.runner_orders;
    
    SELECT * FROM runner_orders_temp;
    ​
     
  •  

Data Analyzing (1) - Order Metrics

1. How many pizzas were ordered?

  • Query :
    SELECT count(pizza_id) AS "total_order"
    FROM pizza_runner.customer_orders_temp;
  • Results : 14 orders were made right now.

2. How many unique customer orders were made?

  • Query :
    SELECT 
      COUNT(DISTINCT order_id) AS 'n_order'
    FROM customer_orders_temp;
    ​
     

3. How many times has each customer orderd/reorderd pizza?

  • Query :
    -- distict time
    SELECT customer_id, GROUP_CONCAT(distinct order_time)as order_date, 
            count(distinct order_time) as n_order
    FROM pizza_runner.customer_orders_temp
    GROUP BY customer_id
    ORDER BY customer_id;
  • Results : Almost every customer (except customer_id = ‘105’) reorder pizza.

4. How long does it takes to re-order pizza by each customer ?

  • Query :
    WITH tb1 as (
    SELECT customer_id
    		, order_time
    		, lag(order_time) over (partition by customer_id order by order_time asc) prev_order_time
    FROM pizza_runner.customer_orders_temp
    )
    
    SELECT customer_id 
    , avg(datediff(order_time, prev_order_time)) avg_reorder
    FROM tb1
    WHERE prev_order_time is not null
    GROUP BY 1
     

 - Results : 4.875days (average for all customers) takes to reorder.

 

 

5. How many of each type of pizza was delivered?

  • Query :
    SELECT p.pizza_name,
    	count(c.*) AS n_pizza_type
    FROM pizza_runner.customer_orders_temp AS c
    	LEFT JOIN pizza_runner.pizza_names AS p ON p.pizza_id = c.pizza_id
    	LEFT JOIN pizza_runner.runner_orders_temp AS r ON c.order_id = r.order_id
    WHERE cancellation IS NULL
    GROUP BY p.pizza_name
    ORDER BY n_pizza_type DESC;
    ​

 

6. How many Vegetarian and Meatlovers were ordered by each customer?

  • Query :
    SELECT customer_id,
           SUM(CASE
                   WHEN pizza_id = 1 THEN 1
                   ELSE 0
               END) AS 'meat_lovers',
           SUM(CASE
                   WHEN pizza_id = 2 THEN 1
                   ELSE 0
               END) AS 'vegetarian'
    FROM pizza_runner.customer_orders_temp
    GROUP BY customer_id
    ORDER BY customer_id;
    ​

- Results : there are someone who ordered both, and who ordered only meat, who ordered only vegetarian.

7. Which pizza was the most popular for each customer?

  • Query : Except 1 person, most loved ‘Meatlovers’ pizza.
    WITH order_table AS (
    SELECT customer_id, p.pizza_name, count(*) as ordertime, 
           dense_rank() over(PARTITION BY customer_id ORDER BY count(*) desc) as ranking
    FROM pizza_runner.customer_orders_temp AS c
    	LEFT JOIN pizza_runner.pizza_names AS p ON p.pizza_id = c.pizza_id
    GROUP BY customer_id, p.pizza_name)
    
    SELECT customer_id, pizza_name, ordertime
    FROM order_table
    WHERE ranking = 1;
    ​

 

8. What was the maximum number of pizzas delivered in a single order?

  • Query :
    SELECT customer_id,
           order_id,
           count(order_id) AS pizza_count
    FROM pizza_runner.customer_orders_temp
    GROUP BY order_id
    ORDER BY pizza_count DESC
    LIMIT 1;
    ​

9. What was the total volume of pizzas ordered for each hour of the day?

  • Query :
    SELECT hour(order_time) AS 'Hour',
           count(order_id) AS 'n_order',
           round(100*count(order_id) /sum(count(order_id)) over(), 2) AS 'ratio'
    FROM pizza_runner.customer_orders_temp
    GROUP BY 1
    ORDER BY 1;
    ​

 

10. What was the volume of orders for each day of the week?

  • The DAYOFWEEK() function returns the weekday index for a given date ( 1=Sunday, 2=Monday, 3=Tuesday, 4=Wednesday, 5=Thursday, 6=Friday, 7=Saturday )
  • DAYNAME() returns the name of the week day
  • Query :
    SELECT dayname(order_time) AS 'Day Of Week',
           count(order_id) AS 'n_order',
           round(100*count(order_id) /sum(count(order_id)) over(), 2) AS 'ratio'
    FROM pizza_runner.customer_orders_temp
    GROUP BY 1
    ORDER BY 2 DESC;
    ​

 

Data Analyzing (2) - Runner and Customer Experience

1. How many runners signed up for each 1 week period? (i.e. week starts 2021-01-01)

  • Query :
    WITH runner_signups as (SELECT runner_id,
    		registration_date,
    		date_sub(registration_date, INTERVAL datediff(registration_date,'2021-01-01') %7 DAY) 
                 AS starting_week
    FROM pizza_runner.runners )
    
    SELECT starting_week,
    	count(runner_id) AS n_runners
    from runner_signups
    GROUP BY starting_week
    ORDER BY starting_week;
    ​
    2. What was the average time in minutes it took for each runner to arrive at the Pizza Runner HQ to pickup the order?
    • Query :
      SELECT runner_id,
             round(avg(TIMESTAMPDIFF(MINUTE, order_time, pickup_time)), 2) avg_arrival_time
      FROM runner_orders_temp
      INNER JOIN customer_orders_temp USING (order_id)
      WHERE cancellation IS NULL
      GROUP BY runner_id;
      ​

3. Is there any relationship between the number of pizzas and how long the order takes to prepare?

  • Query :
    WITH order_count_cte AS
      (SELECT order_id,
              COUNT(order_id) AS pizzas_order_count,
              TIMESTAMPDIFF(MINUTE, order_time, pickup_time) AS prep_time
       FROM runner_orders_temp
       INNER JOIN customer_orders_temp USING (order_id)
       WHERE cancellation IS NULL
       GROUP BY order_id)
    SELECT pizzas_order_count,
           round(avg(prep_time), 2) AS prep_time
    FROM order_count_cte
    GROUP BY pizzas_order_count;
    ​
  • Results : There is a positive correlation between the number of orders and prep time. The more pizzas are orders, the more it takes to be prepared.pizzas_order_count prep_time
    1 12.00
    2 18.00
    3 29.00

4a. What was the average distance travelled for each customer?

SELECT customer_id,
       round(avg(distance), 2) AS 'avg_distance'
FROM runner_orders_temp
INNER JOIN customer_orders_temp USING (order_id)
WHERE cancellation IS NULL
GROUP BY customer_id
ORDER BY customer_id;

 

 

4b. What was the average distance travelled for each runner?

  • Query :
    SELECT runner_id,
           round(avg(distance), 2) AS 'avg_distance'
    FROM runner_orders_temp
    WHERE cancellation IS NULL
    GROUP BY runner_id
    ORDER BY runner_id;
    ​

 

 

 

6. What is the successful delivery percentage for each runner?

  • Query :
    SELECT runner_id,
           COUNT(pickup_time) AS delivered_orders,
           COUNT(*) AS total_orders,
           ROUND(100 * COUNT(pickup_time) / COUNT(*)) AS delivered_percentage
    FROM runner_orders_temp
    GROUP BY runner_id
    ORDER BY runner_id;
    ​