Postgres
Airbyte's certified Postgres connector offers the following features:
- Replicate data from tables, views and materilized views. Other data objects won't be replicated to the destination like indexes, permissions.
- Multiple methods of keeping your data fresh, including Change Data Capture (CDC) and replication using the xmin system column.
- All available sync modes, providing flexibility in how data is delivered to your destination.
- Reliable replication at any table size with checkpointing and chunking of database reads.
The contents below include a 'Quick Start' guide, advanced setup steps, and reference information (data type mapping, and changelogs). See here to troubleshooting issues with the Postgres connector.
Please note the required minimum platform version is v0.58.0 for this connector.
Quick Start
Here is an outline of the minimum required steps to configure a Postgres connector:
- Create a dedicated read-only Postgres user with permissions for replicating data
- Create a new Postgres source in the Airbyte UI using
xmin
system column - (Airbyte Cloud Only) Allow inbound traffic from Airbyte IPs
Once this is complete, you will be able to select Postgres as a source for replicating data.
Step 1: Create a dedicated read-only Postgres user
These steps create a dedicated read-only user for replicating data. Alternatively, you can use an existing Postgres user in your database.
The following commands will create a new user:
CREATE USER <user_name> PASSWORD 'your_password_here';
Now, provide this user with read-only access to relevant schemas and tables. Re-run this command for each schema you expect to replicate data from:
GRANT USAGE ON SCHEMA <schema_name> TO <user_name>;
GRANT SELECT ON ALL TABLES IN SCHEMA <schema_name> TO <user_name>;
ALTER DEFAULT PRIVILEGES IN SCHEMA <schema_name> GRANT SELECT ON TABLES TO <user_name>;
Step 2: Create a new Postgres source in Airbyte UI
From your Airbyte Cloud or Airbyte Open Source account, select Sources
from the left navigation bar, search for Postgres
, then create a new Postgres source.
To fill out the required information:
- Enter the hostname, port number, and name for your Postgres database.
- You may optionally opt to list each of the schemas you want to sync. These are case-sensitive, and multiple schemas may be entered. By default,
public
is the only selected schema. - Enter the username and password you created in Step 1.
- Select an SSL mode. You will most frequently choose
require
orverify-ca
. Both of these always require encryption.verify-ca
also requires certificates from your Postgres database. See here to learn about other SSL modes and SSH tunneling. - Select
Standard (xmin)
from available replication methods. This uses the xmin system column to reliably replicate data from your database.- If your database is particularly large (> 500 GB), you will benefit from configuring your Postgres source using logical replication (CDC).
Step 3: (Airbyte Cloud Only) Allow inbound traffic from Airbyte IPs.
If you are on Airbyte Cloud, you will always need to modify your database configuration to allow inbound traffic from Airbyte IPs. You can find a list of all IPs that need to be allowlisted in our Airbyte Security docs.
Now, click Set up source
in the Airbyte UI. Airbyte will now test connecting to your database. Once this succeeds, you've configured an Airbyte Postgres source!
Advanced Configuration
Setup using CDC
Airbyte uses logical replication of the Postgres write-ahead log (WAL) to incrementally capture deletes using a replication plugin:
- See here to learn more on how Airbyte implements CDC.
- See here to learn more about Postgres CDC requirements and limitations.
We recommend configuring your Postgres source with CDC when:
- You need a record of deletions.
- You have a very large database (500 GB or more).
- Your table has a primary key but doesn't have a reasonable cursor field for incremental syncing (
updated_at
).
These are the additional steps required (after following the quick start) to configure your Postgres source using CDC:
- Provide additional
REPLICATION
permissions to read-only user - Enable logical replication on your Postgres database
- Create a replication slot on your Postgres database
- Create publication and replication identities for each Postgres table
- Enable CDC replication in the Airbyte UI
Step 1: Prepopulate your Postgres source configuration
We recommend following the steps in the quick start section to confirm that Airbyte can connect to your Postgres database prior to configuring CDC settings.
For CDC, you must connect to primary/master databases. Pointing the connector configuration to replica database hosts for CDC will lead to failures.
Step 2: Provide additional permissions to read-only user
To configure CDC for the Postgres source connector, grant REPLICATION
permissions to the user created in step 1 of the quick start:
ALTER USER <user_name> REPLICATION;
Step 3: Enable logical replication on your Postgres database
To enable logical replication on bare metal, VMs (EC2/GCE/etc), or Docker, configure the following parameters in the postgresql.conf file for your Postgres database:
Parameter | Description | Set value to |
---|---|---|
wal_level | Type of coding used within the Postgres write-ahead log | logical |
max_wal_senders | The maximum number of processes used for handling WAL changes | min: 1 |
max_replication_slots | The maximum number of replication slots that are allowed to stream WAL changes | 1 (if Airbyte is the only service reading subscribing to WAL changes. More than 1 if other services are also reading from the WAL) |
To enable logical replication on AWS Postgres RDS or Aurora:
- Go to the Configuration tab for your DB cluster.
- Find your cluster parameter group. Either edit the parameters for this group or create a copy of this parameter group to edit. If you create a copy, change your cluster's parameter group before restarting.
- Within the parameter group page, search for
rds.logical_replication
. Select this row and click Edit parameters. Set this value to 1. - Wait for a maintenance window to automatically restart the instance or restart it manually.
To enable logical replication on Azure Database for Postgres, change the replication mode of your Postgres DB on Azure to logical
using the replication menu of your PostgreSQL instance in the Azure Portal. Alternatively, use the Azure CLI to run the following command:
az postgres server configuration set --resource-group group --server-name server --name azure.replication_support --value logical
az postgres server restart --resource-group group --name server
Step 4: Create a replication slot on your Postgres database
Airbyte requires a replication slot configured only for its use. Only one source should be configured that uses this replication slot.
For this step, Airbyte requires use of the pgoutput plugin. To create a replication slot called airbyte_slot
using pgoutput, run as the user with the newly granted REPLICATION
role:
SELECT pg_create_logical_replication_slot('airbyte_slot', 'pgoutput');
The output of this command will include the name of the replication slot to fill into the Airbyte source setup page.
Step 5: Create publication and replication identities for each Postgres table
For each table you want to replicate with CDC, follow the steps below:
- Add the replication identity (the method of distinguishing between rows) for each table you want to replicate:
ALTER TABLE tbl1 REPLICA IDENTITY DEFAULT;
In rare cases, if your tables use data types that support TOAST or have very large field values, consider instead using replica identity type full: ALTER TABLE tbl1 REPLICA IDENTITY FULL;
.
- Create the Postgres publication. You should include all tables you want to replicate as part of the publication:
CREATE PUBLICATION airbyte_publication FOR TABLE <tbl1, tbl2, tbl3>;`
The publication name is customizable. Refer to the Postgres docs if you need to add or remove tables from your publication in the future.
The Airbyte UI currently allows selecting any tables for CDC. If a table is selected that is not part of the publication, it will not be replicated even though it is selected. If a table is part of the publication but does not have a replication identity, that replication identity will be created automatically on the first run if the Airbyte user has the necessary permissions.
Step 6: Enable CDC replication in Airbyte UI
In your Postgres source, change the replication mode to Logical Replication (CDC)
, and enter the replication slot and publication you just created.
Postgres Replication Methods
The Postgres source currently offers 3 methods of replicating updates to your destination: CDC, xmin and standard (with a user defined cursor). Both CDC and xmin are the most reliable methods of updating your data.
CDC
Airbyte uses logical replication of the Postgres write-ahead log (WAL) to incrementally capture deletes using a replication plugin. To learn more how Airbyte implements CDC, refer to Change Data Capture (CDC). We recommend configuring your Postgres source with CDC when:
- You need a record of deletions.
- You have a very large database (500 GB or more).
- Your table has a primary key but doesn't have a reasonable cursor field for incremental syncing (
updated_at
).
If your goal is to maintain a snapshot of your table in the destination but the limitations prevent you from using CDC, consider using the xmin replication method.
Xmin
Xmin replication is the new cursor-less replication method for Postgres. Cursorless syncs enable syncing new or updated rows without explicitly choosing a cursor field. The xmin system column which (available in all Postgres databases) is used to track inserts and updates to your source data.
This is a good solution if:
- There is not a well-defined cursor candidate to use for Standard incremental mode.
- You want to replace a previously configured full-refresh sync.
- You are replicating Postgres tables less than 500GB.
- Your database doesn't incur heavy writes that would lead to transaction ID wrap around
- You are not replicating non-materialized views. Non-materialized views are not supported by xmin replication.
Connecting with SSL or SSH Tunneling
SSL Modes
Airbyte Cloud uses SSL by default. You are not permitted to disable
SSL while using Airbyte Cloud.
Here is a breakdown of available SSL connection modes:
disable
to disable encrypted communication between Airbyte and the sourceallow
to enable encrypted communication only when required by the sourceprefer
to allow unencrypted communication only when the source doesn't support encryptionrequire
to always require encryption. Note: The connection will fail if the source doesn't support encryption.verify-ca
to always require encryption and verify that the source has a valid SSL certificateverify-full
to always require encryption and verify the identity of the source