Arrays in PostgreSQL let you store a bunch of values in a single table cell. Super handy when you want to group related stuff together, like a list of tags for an article or a set of product categories. But as soon as you start searching, filtering, or intersecting arrays, performance can take a nosedive. That’s where array indexing comes to the rescue. Indexes help speed up things like:
- checking if an array contains a specific element,
- finding arrays that contain certain elements,
- checking if arrays overlap.
Operators for Working with Arrays
Before we dive into creating indexes, let’s break down the main operators for working with arrays:
@> (contains) — checks if an array contains all elements from another array.
SELECT *
FROM courses
WHERE tags @> ARRAY['SQL'];
Here we’re looking for courses that have the tag "SQL".
<@ (is contained by) — checks if one array is contained in another.
SELECT *
FROM courses
WHERE ARRAY['PostgreSQL', 'SQL'] <@ tags;
Here we’re searching for courses whose tags include all elements from ARRAY['PostgreSQL', 'SQL'].
&& (overlap) — checks if arrays have any elements in common.
SELECT *
FROM courses
WHERE tags && ARRAY['NoSQL', 'Big Data'];
This query finds courses that have at least one of the tags "NoSQL" or "Big Data".
How Does Indexing Help?
Imagine you’ve got a courses table with millions of rows, and you run a query using one of the operators above. Without an index, PostgreSQL has to check every single row — an operation that could take forever (especially if your patience is like a dev waiting for a huge project to compile).
Indexes save you from all that pain. PostgreSQL gives you two types of indexes that work with arrays:
GIN(Generalized Inverted Index) — the best choice for arrays.BTREE— used for comparing whole arrays.
Example: Creating an Index for Arrays
Let’s make a small table with arrays so we can test everything out for real.
CREATE TABLE courses (
id SERIAL PRIMARY KEY,
name TEXT NOT NULL,
tags TEXT[] NOT NULL
);
Let’s add a few records:
INSERT INTO courses (name, tags)
VALUES
('SQL Basics', ARRAY['SQL', 'PostgreSQL', 'Databases']),
('Working with Big Data', ARRAY['Hadoop', 'Big Data', 'NoSQL']),
('Python Development', ARRAY['Python', 'Web', 'Data']),
('PostgreSQL Course', ARRAY['PostgreSQL', 'Advanced', 'SQL']);
Here’s what the table looks like:
| id | name | tags |
|---|---|---|
| 1 | SQL Basics | {SQL, PostgreSQL, Databases} |
| 2 | Working with Big Data | {Hadoop, Big Data, NoSQL} |
| 3 | Python Development | {Python, Web, Data} |
| 4 | PostgreSQL Course | {PostgreSQL, Advanced, SQL} |
No Index: Slow Search
Now imagine we want to find all courses that have the SQL tag.
EXPLAIN ANALYZE
SELECT *
FROM courses
WHERE tags @> ARRAY['SQL'];
This query will work, but if you’ve got a ton of data, it’ll be painfully slow. PostgreSQL will do what’s called a sequential scan — basically checking every row in the table.
Sample query result:
| id | name | tags |
|---|---|---|
| 1 | SQL Basics | {SQL, PostgreSQL, Databases} |
| 4 | PostgreSQL Course | {PostgreSQL, Advanced, SQL} |
Creating a GIN Index
To speed up the search, let’s create a GIN index:
CREATE INDEX idx_courses_tags
ON courses USING GIN (tags);
Let’s try the same query again:
EXPLAIN ANALYZE
SELECT *
FROM courses
WHERE tags @> ARRAY['SQL'];
Now PostgreSQL will use the GIN index, which will make the query run way faster.
If before you saw a sequential scan (Seq Scan), now the query plan will show a Bitmap Index Scan:
| Step | Rows | Cost | Info |
|---|---|---|---|
| Bitmap Index Scan | N | low | using index idx_courses_tags |
| Bitmap Heap Scan | N | low | fetching rows from table |
The exact values for Rows and Cost depend on your data size, but the main thing — the plan now uses the index.
How Do Operators Work with Indexes?
Example 1: The @> Operator
Query:
SELECT *
FROM courses
WHERE tags @> ARRAY['SQL'];
The GIN index is perfect for this operator. Postgres quickly checks which rows have the given element and returns the result.
Query result:
| id | name | tags |
|---|---|---|
| 1 | SQL Basics | {SQL, PostgreSQL, Databases} |
| 4 | PostgreSQL Course | {PostgreSQL, Advanced, SQL} |
@> reads as "contains" — this query returns all courses where the tags array has the value SQL.
Example 2: The && Operator
Query:
SELECT *
FROM courses
WHERE tags && ARRAY['NoSQL', 'Big Data'];
This operator checks for array overlap: it returns rows where the tags array overlaps with at least one element from the given array.
The GIN index does its magic again — the search is fast, even with tons of data.
Query result:
| id | name | tags |
|---|---|---|
| 2 | Working with Big Data | {Hadoop, Big Data, NoSQL} |
reads as "has overlap" — the condition is true if at least one tag matches.
Indexing and Optimization
When working with arrays, try to stick to these tips:
- Use
GINindexes for searching inside arrays. They’re way faster than sequential scans. - Only add indexes to columns you actually use a lot in queries. Indexes take up space and slow down inserts, so don’t index everything just because you can.
- Profile your queries with
EXPLAINandEXPLAIN ANALYZEto make sure your index is really being used.
Examples: Creating Indexes for Arrays
Let’s see how to create indexes for specific array operations and why you’d want to do this in real life.
Index for the @> Operator
Let’s say we already have this courses table:
| id | name | tags |
|---|---|---|
| 1 | SQL Basics | {SQL, PostgreSQL, Databases} |
| 2 | Working with Big Data | {Hadoop, Big Data, NoSQL} |
| 3 | Python Development | {Python, Web, Data} |
| 4 | PostgreSQL Course | {PostgreSQL, Advanced, SQL} |
To speed up queries with the @> operator (array contains element), let’s create a GIN index:
CREATE INDEX idx_courses_tags_gin
ON courses USING GIN (tags);
Now let’s run the query:
SELECT *
FROM courses
WHERE tags @> ARRAY['SQL'];
Result:
| id | name | tags |
|---|---|---|
| 1 | SQL Basics | {SQL, PostgreSQL, Databases} |
| 4 | PostgreSQL Course | {PostgreSQL, Advanced, SQL} |
Index for @>, <@, and && Operators
The table is the same as in the previous example.
Since the @>, <@, and && operators all work great with GIN indexes, you can create one universal index to speed up queries with any of these operators:
CREATE INDEX idx_tags
ON courses USING GIN (tags);
Query examples and their results:
@>— check if the array contains the given elements:
SELECT *
FROM courses
WHERE tags @> ARRAY['SQL'];
| id | name | tags |
|---|---|---|
| 1 | SQL Basics | {SQL, PostgreSQL, Databases} |
| 4 | PostgreSQL Course | {PostgreSQL, Advanced, SQL} |
<@— check if the array is contained in another array:
SELECT *
FROM courses
WHERE tags <@ ARRAY['SQL', 'PostgreSQL', 'Advanced', 'Big Data', 'NoSQL', 'Python'];
| id | name | tags |
|---|---|---|
| 1 | SQL Basics | {SQL, PostgreSQL, Databases} |
| 2 | Working with Big Data | {Hadoop, Big Data, NoSQL} |
| 3 | Python Development | {Python, Web, Data} |
| 4 | PostgreSQL Course | {PostgreSQL, Advanced, SQL} |
&&— check for array overlap:
SELECT *
FROM courses
WHERE tags && ARRAY['NoSQL', 'Big Data'];
| id | name | tags |
|---|---|---|
| 2 | Working with Big Data | {Hadoop, Big Data, NoSQL} |
Let’s Try Something a Bit Harder
Let’s write a query that returns courses where the tags have at least one overlap with the list ['Python', 'SQL', 'NoSQL']:
SELECT *
FROM courses
WHERE tags && ARRAY['Python', 'SQL', 'NoSQL'];
Output:
| id | name | tags |
|---|---|---|
| 1 | SQL Basics | {SQL,PostgreSQL,Databases} |
| 2 | Working with Big Data | {Hadoop,Big Data,NoSQL} |
| 3 | Python Development | {Python,Web,Data} |
With a GIN index, this kind of query runs instantly, even if your table has millions of rows.
Common Mistakes When Working with Arrays
Index not used: if you see Seq Scan in your EXPLAIN output, check that the index exists and that the operator you’re using actually supports indexing.
Rarely used array column: if your array column is rarely used in queries or gets updated a lot, the index might just take up space and not give you much benefit.
Too many indexes: indexes take up disk space and slow down writes, so only create the ones you really need and that will actually be used in queries.
Now you’ve got all the tools you need to work efficiently with arrays in PostgreSQL — speed up your queries with the @>, <@, && operators and GIN indexes. Don’t be shy, try it out on your own data!
GO TO FULL VERSION