Building Analytical Reports with PL/pgSQL
Analytical reports are systematic representations of data that help you make decisions. For example:
- Managers want to see what the revenue was for the past month.
- Analysts are looking for trends in the market.
- Developers monitor app performance.
Imagine you're a head chef running a huge restaurant. To figure out which dishes are ordered most often, you need a report. PostgreSQL here is your database of recipes and orders, and PL/pgSQL (procedures) is your kitchen assistant who automates the order analysis process.
Basics of Building Analytical Reports
An analytical report is a tool for aggregating, filtering, sorting, and organizing data to get useful info. Usually, the structure of a report includes these steps:
- Data prep: selecting info from tables, filtering, and pre-processing.
- Data aggregation: calculating metrics (average check, total sales, etc.).
- Formatting: organizing data in a way that's easy to read.
- Output: showing the report to users or saving it to a table for storage.
Each of these steps can be done using PL/pgSQL procedures.
Creating a Procedure for an Analytical Report
Let's break down a basic example of creating an analytical report. Suppose we have an orders table that stores order data:
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
customer_id INT,
order_date DATE,
total_amount NUMERIC(10, 2)
);
Our task: create a report of total sales for a given month. So, we want to see:
- Month.
- Total sales for that month.
Procedure Structure
Here's the plan for our little procedure (don't worry, PL/pgSQL programming doesn't bite):
- Take an input parameter — the month.
- Select data for that month from the
orderstable. - Calculate the total sales.
- Return the result.
Procedure Implementation
Example code:
CREATE OR REPLACE FUNCTION monthly_sales_report(p_month DATE)
RETURNS TABLE (
month DATE,
total_sales NUMERIC(10, 2)
) AS $$
BEGIN
-- Select data for the given month and aggregate it
RETURN QUERY
SELECT
DATE_TRUNC('month', o.order_date) AS month,
SUM(o.total_amount) AS total_sales
FROM orders o
WHERE DATE_TRUNC('month', o.order_date) = DATE_TRUNC('month', p_month)
GROUP BY 1;
END;
$$ LANGUAGE plpgsql;
- Input parameter:
p_month— a date. We'll use it to filter data by month. - RETURN QUERY: this is the magic thing that lets you return data right from the procedure.
- DATE_TRUNC: used to round
order_dateto the start of the month. - SUM: aggregate function to sum up all orders.
- GROUP BY: we group data by month, since reports are built by month.
Now we can call our function:
SELECT * FROM monthly_sales_report('2023-08-01');
And we'll get something like:
| month | total_sales |
|---|---|
| 2023-08-01 | 50000.00 |
This function is the base. Let's make it more interesting!
Creating a More Complex Report
Now imagine we want to break down sales by customer. So, our report should show:
- Customer
- Month
- Total orders for that customer in the month
Let's change the procedure
CREATE OR REPLACE FUNCTION customer_monthly_report(p_month DATE)
RETURNS TABLE (
customer_id INT,
month DATE,
total_sales NUMERIC(10, 2)
) AS $$
BEGIN
RETURN QUERY
SELECT
o.customer_id,
DATE_TRUNC('month', o.order_date) AS month,
SUM(o.total_amount) AS total_sales
FROM orders o
WHERE DATE_TRUNC('month', o.order_date) = DATE_TRUNC('month', p_month)
GROUP BY o.customer_id, DATE_TRUNC('month', o.order_date);
END;
$$ LANGUAGE plpgsql;
Now call the procedure:
SELECT * FROM customer_monthly_report('2023-08-01');
And the result might look like this:
| customer_id | month | total_sales |
|---|---|---|
| 101 | 2023-08-01 | 20000.00 |
| 102 | 2023-08-01 | 30000.00 |
Using Temporary Tables
Sometimes when building complex reports, it's handy to use temporary tables. For example, if you need to process intermediate data.
CREATE OR REPLACE FUNCTION temp_table_example(p_month DATE)
RETURNS VOID AS $$
BEGIN
-- Create a temporary table
CREATE TEMP TABLE temp_sales AS
SELECT
customer_id,
DATE_TRUNC('month', order_date) AS month,
SUM(total_amount) AS total_sales
FROM orders
WHERE DATE_TRUNC('month', order_date) = DATE_TRUNC('month', p_month)
GROUP BY customer_id, DATE_TRUNC('month', order_date);
-- Do extra calculations or manipulations with this table
-- For example, show top-3 customers by order sum
RAISE NOTICE 'Top-3 customers for month %:', p_month;
FOR record IN
SELECT customer_id, total_sales
FROM temp_sales
ORDER BY total_sales DESC
LIMIT 3
LOOP
RAISE NOTICE 'Customer: %, Amount: %', record.customer_id, record.total_sales;
END LOOP;
END;
$$ LANGUAGE plpgsql;
In this case, the temp table temp_sales is used to store intermediate results.
Handy Tips
- Optimization: use indexes to speed up data selection.
- Divide by zero errors: always check the divisor so you don't "kill" your report.
- Date formatting: use functions like
TO_CHARfor nice output.
Hope you didn't have to yawn too much! Even cooler and more interesting tasks are coming up, so don't get too comfy!
GO TO FULL VERSION