How to Select Top N Rows in PostgreSQL

Joy Idialu Feb 02, 2024
  1. Create a Table in PostgreSQL
  2. Use the FETCH Clause to Select the Top N Rows in PostgreSQL
  3. Use the LIMIT Clause to Select the Top N Rows in PostgreSQL
  4. Use the ROW_NUMBER() Function to Select the Top N Rows in PostgreSQL
  5. Use the RANK() Function to Select the Top N Rows in PostgreSQL
  6. Use the DENSE_RANK() Function to Select the Top N Rows in PostgreSQL
  7. Table Indexing in PostgreSQL
  8. Performance Analysis
How to Select Top N Rows in PostgreSQL

Getting the highest values in a table is an important query to know. One of its areas of application is in generating reports.

This tutorial walks you through different ways to get the N highest scores in a PostgreSQL table.

Create a Table in PostgreSQL

Consider a student_score table with N students and their respective scores as shown here:

id studentid score
1 1 54
2 2 75
3 3 52
4 4 55
5 5 93
6 6 74
7 7 92
8 8 64
9 9 89
10 10 81

Query:

CREATE TABLE public.student_score
(
    id bigint NOT NULL
        GENERATED ALWAYS AS IDENTITY
        ( INCREMENT 1 START 1 MINVALUE 1 MAXVALUE 9223372036854775807 CACHE 1 ),
    studentid bigint NOT NULL,
    score integer NOT NULL,
    CONSTRAINT student_score_pkey PRIMARY KEY (id)
)

Here’s the INSERT statement to populate the table with 1 million rows of test data:

INSERT INTO student_score (studentid, score)
SELECT i, floor(random()*(100-50+1))+50
FROM generate_series(1,1000000) i

Use the FETCH Clause to Select the Top N Rows in PostgreSQL

The FETCH clause returns only the specified number of scores. The table is sorted in descending order, with the highest value at the top and the score decreasing as there is a progression down the table.

Then the clause is used to return only 10 of the highest scores, as specified in the query below.

SELECT * FROM student_score
ORDER BY score DESC
FETCH NEXT 10 ROWS ONLY

Output:

id   | studentid | score
-----|---------- |--------
324  | 324       | 100
379  | 379       | 100
282  | 282       | 100
79   | 79        | 100
335  | 335       | 100
91   | 91        | 100
13   | 13        | 100
108  | 108       | 100
52   | 52        | 100
414  | 414       | 100

This query, however, returns the top 10 students with the highest scores. In this example, 100 is the highest score any student can achieve, and the query’s output shows that at least 10 students scored 100.

It doesn’t give the 10 different highest values that any student has achieved. If there is a need to get the highest 10 unique scores acquired by any student, the DISTINCT clause is used as shown here:

SELECT DISTINCT score FROM student_score
ORDER BY score DESC
FETCH NEXT 10 ROWS ONLY

In addition, if there is a requirement to know how many students had the highest 10 scores recorded, this query can be used to accomplish that:

SELECT DISTINCT score, COUNT(studentid) FROM student_score
GROUP BY score
ORDER BY score DESC
FETCH NEXT 10 ROWS ONLY

Output:

score | count
----- |-------
100   | 19518
99    | 19719
98    | 19412
97    | 19588
96    | 19652
95    | 19396
94    | 19649
93    | 19427
92    | 19880
91    | 19580

Use the LIMIT Clause to Select the Top N Rows in PostgreSQL

The LIMIT clause returns the maximum number of rows specified. In this case, 10 is the maximum number of rows to be returned.

Using this clause also requires that the rows be sorted in descending order. Here’s the query:

SELECT DISTINCT score, COUNT(studentid) FROM student_score
GROUP BY score
ORDER BY score DESC
LIMIT 10

Use the ROW_NUMBER() Function to Select the Top N Rows in PostgreSQL

The ROW_NUMBER() function can be used to obtain the same result. The query is given below.

SELECT score, student_count
FROM (
    SELECT DISTINCT score AS score,
    COUNT(studentid) AS student_count,
    ROW_NUMBER() OVER (ORDER BY score DESC) AS score_rank
    FROM student_score
    GROUP BY score
) subquery
WHERE score_rank <= 10
ORDER BY score_rank

Use the RANK() Function to Select the Top N Rows in PostgreSQL

Here’s a query using the RANK() function:

SELECT score, student_count
FROM (
    SELECT DISTINCT score AS score,
    COUNT(studentid) AS student_count,
    RANK() OVER (ORDER BY score DESC) AS score_rank
    FROM student_score
    GROUP BY score
) subquery
WHERE score_rank <= 10
ORDER BY score_rank

Use the DENSE_RANK() Function to Select the Top N Rows in PostgreSQL

Here’s a query using the DENSE_RANK() function:

SELECT score, student_count
FROM (
    SELECT DISTINCT score AS score,
    COUNT(studentid) AS student_count,
    DENSE_RANK() OVER (ORDER BY score DESC) AS score_rank
    FROM student_score
    GROUP BY score
) subquery
WHERE score_rank <= 10
ORDER BY score_rank

Table Indexing in PostgreSQL

To improve performance on a large database, you can use indexes. The score column sorted in descending order can be indexed in this example.

It is done by creating an index and adding specific conditions, in this case, sorting the score column in descending order, as shown below.

CREATE INDEX score_desc_idx ON student_score(score DESC)

As mentioned earlier, indexing improves query performance for tables with larger records most effectively. But it isn’t advisable to use it if the table is frequently modified (witH frequent inserts and updates).

Also, PostgreSQL will not use indexes in certain cases, such as when a table scan is quicker.

Performance Analysis

Before creating the score_desc_idx index, the execution time for each operation was given as,

Clause/Function Execution time (ms)
FETCH 844
LIMIT 797
ROW_NUMBER() 745
RANK() 816
DENSE_RANK() 701

However, this was just after a single run. Running it multiple times gave varying times which did not exceed a range.

You can use the EXPLAIN ANALYZE clause to determine which approach is less costly for your database.

EXPLAIN ANALYZE SELECT DISTINCT score, COUNT(studentid) FROM student_score
GROUP BY score
ORDER BY score DESC
LIMIT 10

Output:

Limit  (cost=13636.35..13636.42 rows=10 width=12) (actual time=763.166..797.358 rows=10 loops=1)
->  Unique  (cost=13636.35..13636.73 rows=51 width=12) (actual time=763.163..797.351 rows=10 loops=1)"
        ->  Sort  (cost=13636.35..13636.48 rows=51 width=12) (actual time=763.161..797.342 rows=10 loops=1)"
             Sort Key: score DESC, (count(studentid))"
              Sort Method: quicksort  Memory: 27kB"
             ->  Finalize GroupAggregate  (cost=13621.98..13634.90 rows=51 width=12) (actual time=762.733..797.231 rows=51 loops=1)"
                   Group Key: score"
                   ->  Gather Merge  (cost=13621.98..13633.88 rows=102 width=12) (actual time=762.715..797.047 rows=153 loops=1)"
                          Workers Planned: 2"
                          Workers Launched: 2"
                          ->  Sort  (cost=12621.96..12622.09 rows=51 width=12) (actual time=632.708..632.724 rows=51 loops=3)
                                Sort Key: score DESC
                                Sort Method: quicksort  Memory: 27kB"
                                Worker 0:  Sort Method: quicksort  Memory: 27kB"
                                Worker 1:  Sort Method: quicksort  Memory: 27kB"
                                ->  Partial HashAggregate  (cost=12620.00..12620.51 rows=51 width=12) (actual time=632.509..632.546 rows=51 loops=3)"
                                      Group Key: score
                                      ->  Parallel Seq Scan on student_score  (cost=0.00..10536.67 rows=416667 width=12) (actual time=0.067..176.426 rows=333333 loops=3)
Planning Time: 3.228 ms
Execution Time: 799.928 ms

It can be seen that PostgreSQL did not use the index created earlier, as scanning the table is relatively fast in this case. Since adding an index does not make any significant change to the execution time, the index can be removed by executing this command:

DROP INDEX score_desc_idx;

It’s up to you to decide which approach works best, and any approach can be used to achieve the same result.

Related Article - PostgreSQL Row