Using the PostGIS extension on Managed Databases

PostGIS - Overview

PostGIS is an extension to the PostgreSQL database management system that allows to store and manage Geographic Information Systems (GIS) objects in PostgreSQL. It is available on Scaleway Managed Databases. It was developed by the Canadian company Refractions Research Inc. PostGIS was published under the GNU GPL license, which means that the solution is available as open source software, contrary to many other GIS system which are closed source software. Spatial data types refer to shapes such as point, line, and polygon. Spatial databases fully integrate spatial data with an object relational database. PostGIS includes support for GiST-based R-Tree spatial indexes, and functions for analysis and processing of GIS objects.

How close is the nearest metro station from my office? What is the shortest route from my home to the shopping mall? Where should a new hospital be build to be as near as possible to a maximum of the population of a city?

These questions and other like them are asked on every day by people all ofter the world. Answers to these questions were historically provided by specialised GIS (Geographic Information Systems) applications. While this is completely fine for answering a few questions, it is not a solution for our modern lives where massive spatial datasets have to be handled, such as all the routes in Europe in one dataset. This is where modern systems like PostGIS step in.

Requirements

Creating a New Database for PostGIS

The following steps assume that you have already created a Database Instance. For more information how to create and setup a managed database instance, you may follow the Database documentation.

1 . Connect to your Scaleway Console and enter the Database section.

2 . Choose your Database instance by clicking on its name.

3 . Click on the Databases tab, then on + Create Database.

4 . Enter the name of the new database, in this tutorial we use townssurvey.

5 . Click on the Users tab, followed by and Update Permissions.

6 . Grant the permissions to the newly created townssurvey database to your user.

Enabling the PostGIS Extension

1 . Connect to your managed Database using the psql client:

psql -h <rbd_host> --port <rdb_port> -d townssurvey -U <rdb_user>

Enter the password for your database user when prompted.

You can find the rdb_host and rdb_port on the database information page.

2 . Once connected enable the postgis extension for the database:

CREATE EXTENSION postgis;

3 . Verify that everything went well, by running the following command:

SELECT PostGIS_version();

You will see an output like the following example, which indicates that the extension has been correctly installed on the database:

townssurvey=> SELECT PostGIS_version();
            postgis_version
---------------------------------------
 3.0 USE_GEOS=1 USE_PROJ=1 USE_STATS=1
(1 row)

4 . The database is ready, log out of the PostgreSQL shell now:

\q

Optimizing the Database Instance for GIS Database Objects

PostgreSQL is designed to run in very different configurations, from small test or development setups to large databases handling millions rows. GIS database objects are large in comparison to plain text data. Scaleway Managed databases are optimized for performance by default, but it is possible to fine-tune them for some parameters.

1 . Enter the configuration section of your database instance in the Scaleway console

2 . Choose your database instance from the list, and click on it to see the database information page. Then click on Advanced Settings.

3 . The advanced settings tab displays. Click on Add Advanced Settings to configure the parameters of your PostgreSQL server.

The parameter work_mem defines the amount of memory that internal sorting operations and hash tables can consume in the RAM, set this value to 16MB.

4 . Save the configuration by clicking on the check mark symbol (✔). The configuration of your database instance is being updated:

Your database instance is configured now, and it is time to load some spatial data.

Downloading and Converting a Dataset

To get familiar with spatial data, download some demo data and insert them in the database.

Note: You can run the following commands either on a local computer running Ubuntu or on a virtual cloud instance.

1 . Install the required software for the following steps using apt:

apt update && apt install -y wget unzip postgis

2 . Create a directory to store the downloaded data and enter the newly created directory:

mkdir -p /postgis/townssurvey && cd /postgis/townssurvey

3 . Download a dataset. We use a sample provided by MassGIS Data which contains information about the legal boundary for each of the 351 municipalities in Massachusetts.

wget http://download.massgis.digital.mass.gov/shapefiles/state/townssurvey_shp.zip

4 . Unzip the downloaded data and list the unpacked data to view it:

unzip townssurvey_shp.zip && ls

5 . You will notice several .dbf, .prj, .shx and .shp files in the folder. These files together are called a ShapeFile, a common digital vector storage format for storing geometric location an associated attribute information for legacy GIS software.

To import the data into the database, it has to be converted to SQL. To achieve this, PostGIS provides a tool called shp2pgsql which can automatically convert .shp files into readable .sql files. Convert the file TOWNSSURVEY_POLY.shp into a SQL file by running the following command:

shp2pgsql -d ./TOWNSSURVEY_POLY.shp > /postgis/TOWNSSURVEY_POLY.sql

6 . Import the generated SQL file into your database using the psql client:

psql -h <rbd_host> --port <rdb_port> -d townssurvey -U <rdb_user> -f /postgis/TOWNSSURVEY_POLY.sql

An output like the following displays on the screen:

INSERT 0 1
INSERT 0 1
[...]
INSERT 0 1
INSERT 0 1
COMMIT
ANALYZE

The import has been completed successfully once the word ANALYZE displays.

Querying Spatial Data

Now as some demo data is available in the database, you can start querying it.

1 . Connect yourself to the psql console:

psql -h <rbd_host> --port <rdb_port> -d townssurvey -U <rdb_user>

2 . List the tables available in the database:

townssurvey=> \dt

The command above will return a list of two tables:

                  List of relations
 Schema |       Name       | Type  |      Owner
--------+------------------+-------+-----------------
 public | spatial_ref_sys  | table | _rdb_superadmin
 public | townssurvey_poly | table | rdb_user
(2 rows)

3 . The table to query is townssurvey_poly. To view a list of all available columns in the table, run the following command:

townssurvey=> \d townssurvey_poly

Which returns a list as follows:

                                      Table "public.townssurvey_poly"
   Column   |          Type          | Collation | Nullable |                    Default
------------+------------------------+-----------+----------+-----------------------------------------------
 gid        | integer                |           | not null | nextval('townssurvey_poly_gid_seq'::regclass)
 town       | character varying(21)  |           |          |
 town_id    | integer                |           |          |
 pop1980    | bigint                 |           |          |
 pop1990    | bigint                 |           |          |
 pop2000    | bigint                 |           |          |
 popch80_90 | integer                |           |          |
 popch90_00 | bigint                 |           |          |
 type       | character varying(2)   |           |          |
 island     | integer                |           |          |
 coastal_po | character varying(3)   |           |          |
 fourcolor  | integer                |           |          |
 fips_stco  | bigint                 |           |          |
 ccd_mcd    | character varying(3)   |           |          |
 fips_place | character varying(5)   |           |          |
 fips_mcd   | bigint                 |           |          |
 fips_count | integer                |           |          |
 acres      | numeric                |           |          |
 square_mil | numeric                |           |          |
 pop2010    | bigint                 |           |          |
 popch00_10 | bigint                 |           |          |
 shape_leng | numeric                |           |          |
 shape_area | numeric                |           |          |
 geom       | geometry(MultiPolygon) |           |          |
Indexes:
    "townssurvey_poly_pkey" PRIMARY KEY, btree (gid)

The geom coulum data type contains spatial data of the MultiPolygon type.

As the data is now ready, we can find answers to questions like “What are the 10 towns that are located the in most northern part of Massachusetts?

To get the answer to this question, it is possible to query the spatial data in the database. As the town boundaries are not linear, there is more than one value for a latitude. So to get the latitude for each town, the centroid of each town has to be gathered using the ST_Centroid function of PostGIS. The centroids Y value is then extracted using the ST_Y function and this value can be used as latitude.

As several cities have more than one entry in the database, the results are filtered using the DISTINCT ON function from PostgreSQL.

The following query returns a list of the 10 most north towns in Massachusetts:

SELECT town, ST_Y(ST_Centroid(geom)) as latitude FROM (
  SELECT DISTINCT ON (town) *
  FROM townssurvey_poly
) t
ORDER BY latitude DESC
LIMIT 10;

The result will look like the following list:

town     |     latitude
--------------+------------------
AMESBURY     | 953947.884028235
SALISBURY    | 952906.606868324
MERRIMAC     | 952462.995879863
NEWBURYPORT  | 950674.541316222
WEST NEWBURY | 949284.811099743
NEWBURY      | 945386.010693761
HAVERHILL    |  944694.91004576
GROVELAND    | 944664.684131484
CLARKSBURG   | 943028.430506038
MONROE       | 942395.280914487
(10 rows)

As you have your answer to your question now, quit the database:

\q

For more information about PostGIS, refer to the official documentation.

Discover the Cloud That Makes Sense