
Friday, January 3, 2020

Replication in PostgreSQL

Knowledge of high availability is a must for anybody managing PostgreSQL. It is a topic that we have seen over and over, but that never gets old.
When talking about replication, we will be talking a lot about WALs, so let's first review what they are.

Write Ahead Log (WAL):

Write-Ahead Logging is a standard method for ensuring data integrity, and it is enabled by default.

The WALs are the REDO logs in PostgreSQL. But what are REDO logs?

REDO logs contain all changes that were made in the database and they are used by replication, recovery, online backup and point in time recovery (PITR). Any changes that have not been applied to the data pages can be redone from the REDO logs.

Using WAL results in a significantly reduced number of disk writes, because only the log file needs to be flushed to disk to guarantee that a transaction is committed, rather than every data file changed by the transaction.

A WAL record specifies, bit by bit, the changes made to the data. Each WAL record is appended to a WAL file. The insert position is described by a Log Sequence Number (LSN), a byte offset into the log that increases with each new record.

The WALs are stored in the pg_xlog directory (renamed pg_wal in PostgreSQL 10), under the data directory. These files have a default size of 16MB (the size can be changed with the --with-wal-segsize configure option when building the server). They have a unique incremental name built from three hexadecimal parts (timeline, log and segment), in the following format: "00000001 00000000 00000000".
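For example, on PostgreSQL 10 and later the current insert position and the WAL file it falls into can be checked like this (on older versions the equivalent functions are pg_current_xlog_location() and pg_xlogfile_name()):

postgres=# SELECT pg_current_wal_lsn();
postgres=# SELECT pg_walfile_name(pg_current_wal_lsn());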

The number of WAL files contained in pg_xlog (or pg_wal) will depend on the value assigned to the parameter checkpoint_segments (or min_wal_size and max_wal_size, depending on the version) in the postgresql.conf configuration file.

One parameter that we need to set up in all our PostgreSQL installations is wal_level. It determines how much information is written to the WAL. In older versions the default value is minimal, which writes only the information needed to recover from a crash or immediate shutdown; archive adds the logging required for WAL archiving; hot_standby further adds the information required to run read-only queries on a standby server; and, finally, logical adds the information necessary to support logical decoding. From 9.6 onwards archive and hot_standby are merged into the single value replica, which is also the default from PostgreSQL 10. This parameter requires a restart, so it can be hard to change on a running production database if we have forgotten about it.
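To see the value currently in effect, and to change it (remembering the change only applies after a restart), something along these lines can be used; ALTER SYSTEM is available from 9.4 onwards, and editing postgresql.conf directly works on any version:

postgres=# SHOW wal_level;
postgres=# ALTER SYSTEM SET wal_level = 'replica';
-- then restart the service, e.g.:
$ pg_ctl -D $PGDATA restart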

History of Replication in PostgreSQL:

The first replication method (warm standby) that PostgreSQL implemented (version 8.2, back in 2006) was based on the log-shipping method.

This means that the WAL records are moved directly from one database server to another to be applied. We can say it is a continuous PITR.

PostgreSQL implements file-based log shipping by transferring WAL records one file (WAL segment) at a time.

This replication implementation has the downside that if there is a major failure on the primary server, transactions not yet shipped will be lost. So there is a window for data loss (you can tune this with the archive_timeout parameter, which can be set to as low as a few seconds, but such a low setting will substantially increase the bandwidth required for file shipping).
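As a sketch, the primary-side configuration for file-based log shipping is driven by the archiver; the parameter names are real, the archive path is just an example:

archive_mode = on
archive_command = 'cp %p /var/lib/pgsql/archive/%f'   # copy each completed WAL segment to the archive
archive_timeout = 60                                   # force a segment switch at least every 60 seconds

A low archive_timeout bounds the data-loss window described above, at the cost of shipping (mostly empty) 16MB segments every interval.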

Replication types:
1. Warm standby replication (up to 8.4)
2. Streaming replication (from 9.0 onwards)
3. Cascading replication
4. Logical replication

Sync types:
async (asynchronous) - the default
sync (synchronous)

Pictorial representation of replication in PostgreSQL: [diagram not reproduced here]

1) Warm standby replication:

  • A warm standby is simply a replica that is not open for read-only SQL statements. In other words, this is an active-passive setup: we can access the master server, but we cannot connect to the slave server.
  • If the master server crashes or is completely shut down, we can promote the slave server to master. But afterwards, if we want to add the old master back as a slave, or fail back, those options will not work.
  • The only option is to rebuild the replication from scratch.
  • This setup is suitable for disaster recovery purposes.

In the older versions we used wal_level = archive (the WAL files are archived from the master and restored on the slave server). Since this works one finished WAL file at a time it is a time-lagging process, so most of the time we will see lag between master and slave.
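A minimal warm-standby configuration on the slave side (recovery.conf, for versions before 12) would look something like this, assuming the archived WAL files are available under /var/lib/pgsql/archive:

standby_mode = 'on'
restore_command = 'cp /var/lib/pgsql/archive/%f %p'

The standby stays in continuous recovery, restoring one archived segment after another, and does not accept client connections.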

2) Streaming Replication:

  • Streaming replication is the improved technique that sends the WAL data to the standby without waiting for the WAL file to be closed by the primary.
  • The records are sent as soon as they are written to the WAL file, so we can reduce the lag between the primary and secondary nodes.
  • Here we will keep wal_level = replica.
  • The WAL is streamed continuously, so there is very little chance of lag.
  • In this setup we have two active servers, but one server will be in read-only mode: you can run SELECT queries on it only.
  • The slave server always allows only SELECT queries, so at least we can split the read and write queries across the two servers (see the example after this list).
  • If the master server goes down, we can simply promote the slave to master, and if you want to fail back you can simply do that as well.
  • This is for high availability: at any point in time the database should be available to users.
  • Streaming replication is asynchronous by default.
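For example, on the slave any write attempt is rejected while reads keep working; a session on the standby behaves roughly like this (the table names are just examples):

postgres=# SELECT count(*) FROM accounts;        -- read-only queries work
postgres=# CREATE TABLE t1 (id int);
ERROR:  cannot execute CREATE TABLE in a read-only transaction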

Multi-slave replication:

master --> slave1, slave2    (here both slaves depend directly on the master)

3) Cascading replication:

A (M)  --> B (S)
B (M)  --> C (S)

We call this concept cascading replication: A is the master for B, so B is in read-only mode; B is the master for C, so C is also in read-only mode. There is no direct extra impact on the master, and we can now split the read queries across the slave servers.

4) Logical replication:

Logical replication is used for replicating a set of tables, schemas or a complete database from one server to another; there is no physical master-slave concept. We will not discuss this replication in detail here; I will cover logical replication in a separate post.
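Just to give a flavour until that post, logical replication (PostgreSQL 10+) is driven by a publication on the source and a subscription on the target; a hypothetical example for a single table, assuming wal_level = logical on the source and that the table already exists on the target:

-- on the source server
CREATE PUBLICATION my_pub FOR TABLE accounts;
-- on the target server
CREATE SUBSCRIPTION my_sub
    CONNECTION 'host=192.168.10.11 dbname=postgres user=postgres'
    PUBLICATION my_pub;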

Now from here we can discuss how to set up replication.

For the replication setup:

wal_level = replica (for streaming, cascading and multi-slave replication; logical for logical replication) -- master side
recovery.conf (slave side) -- contains the connection information of the master server

Assume:
192.168.10.11   -master (example01)
192.168.10.12   -slave (example02)

Enable SSH connectivity:
Enable passwordless SSH connectivity between the master and slave servers, for example:
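$ ssh-keygen -t rsa                      # as the postgres OS user; accept the defaults, empty passphrase
$ ssh-copy-id postgres@192.168.10.12     # run on the master, towards the slave
$ ssh-copy-id postgres@192.168.10.11     # run on the slave, towards the master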

Configuration- Master server

1) Verify the port and services on the master database.
2) For the initial sync we need to run a base backup with pg_basebackup (see the example after these steps).
3) Enable the pg_hba.conf entries on both sides:
host    replication     postgres        192.168.10.11/32       trust
host    replication     postgres        192.168.10.12/32       trust
or allow all IPs
host    replication     postgres        0.0.0.0/0        trust

4) Configure the master server:
listen_addresses = '*' # any IP can connect
wal_level = replica
NOTE:
 minimal: writes only the information needed to recover from a crash or immediate shutdown.
 replica: writes enough WAL information to support archiving and replication, which is what this setup needs.
 logical: adds the information needed for logical decoding; we will use this for logical replication.
5) Set the parameters below for the WAL files:
max_wal_senders = 10
wal_keep_segments = 1000 #(keep 1000 WAL files available in the pg_wal directory; from PostgreSQL 13 this parameter is wal_keep_size and is specified as a size, e.g. 2GB/3GB/4GB)
Changing max_wal_senders requires a restart of the service.
6) Turn archive mode on:
         archive_mode = on
         archive_command = 'cp %p /var/lib/pgsql/archive/%f'

7) Enable the parameters below for failover and fail-back:
wal_log_hints = on #(needed later by pg_rewind when failing back)
recovery_min_apply_delay = 0 #(delay between master and slave; this one goes in recovery.conf on the slave. If you want your slave to apply the WAL with a one hour delay, set it to '1h')
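For the initial sync mentioned in step 2, a typical pg_basebackup run from the slave server looks something like this (the target directory and options are examples; -R writes the recovery settings for us):

[postgres@example02 ~]$ pg_basebackup -h 192.168.10.11 -U postgres -D /var/lib/pgsql/11/data -X stream -R -P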

Configuration - Slave server

1) Restore the backup taken with pg_basebackup and use a different port (if both instances run on the same host).
2) Configure the connection information of the master server. In PostgreSQL 11 and earlier this goes into recovery.conf in the data directory:
       standby_mode = 'on'
       primary_conninfo = 'user=postgres host=192.168.10.11 port=5432'
   and in postgresql.conf:
       hot_standby = on
3) Give permissions to the data directory:
chmod -R 700 $PGDATA
4) Start the slave server:
pg_ctl start

Note: the slave server runs in recovery mode only.

5) For large environments, you should always have the slave fetch WAL files from the archive, with a restore_command like the example below in recovery.conf:

  restore_command='scp postgres@192.168.10.11:/var/lib/pgsql/archive/%f %p'
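Putting the pieces together, a complete recovery.conf on the slave (PostgreSQL 11 and earlier; from version 12, standby_mode is replaced by an empty standby.signal file and the remaining settings move into postgresql.conf) might look like this; application_name is optional and example02 is just an example name:

standby_mode = 'on'
primary_conninfo = 'user=postgres host=192.168.10.11 port=5432 application_name=example02'
restore_command = 'scp postgres@192.168.10.11:/var/lib/pgsql/archive/%f %p'
recovery_min_apply_delay = 0

The application_name is what shows up in pg_stat_replication and what you would list in synchronous_standby_names later.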

Check the replication status

from master:

[postgres@example02 11]$ ps -ef|grep sender
postgres   2576   2529  0 19:40 ?        00:00:00 postgres: walsender postgres 192.168.10.12(16864) streaming 0/5000060

postgres=# \x
Expanded display is on.
postgres=# select * from pg_stat_replication;
-[ RECORD 1 ]----+------------------------------
pid              | 2576
usesysid         | 10
usename          | postgres
application_name | walreceiver
client_addr      | 192.168.10.12
client_hostname  |
client_port      | 16864
backend_start    | 2019-09-06 19:40:49.430354-07
backend_xmin     |
state            | streaming
sent_lsn         | 0/5000140
write_lsn        | 0/5000140
flush_lsn        | 0/5000140
replay_lsn       | 0/5000140
write_lag        |
flush_lag        |
replay_lag       |
sync_priority    | 0
sync_state       | async

From slave server:

[postgres@example01 11]$ ps -ef|grep receiver
postgres  13546  13541  0 19:40 ?        00:00:00 postgres: walreceiver   streaming 0/5000140

postgres=# select pg_is_in_recovery();
 pg_is_in_recovery
-------------------
 t

Note:
On a slave server it should always return (t) true; that is how we know it is a slave. Likewise, if we see a recovery.conf file in a server's data directory, we can tell that server is a slave.

Verify lag:

select now()-pg_last_xact_replay_timestamp() as replication_lag;

SELECT
  pg_last_wal_receive_lsn() receive,
  pg_last_wal_replay_lsn() replay,
  (
   extract(epoch FROM now()) -
   extract(epoch FROM pg_last_xact_replay_timestamp())
  )::int lag;

If there is lag, it is displayed in seconds. Some lag is acceptable for any length of time as long as the required WAL/archive files still exist, because then the slave server keeps applying the logs one by one.

For a big application environment it is recommended to keep a 'restore_command' pointing to the archive location in the recovery.conf file.

This takes effect when a required WAL file no longer exists in the master's pg_wal: the standby then simply fetches it from the archive location.
e.g.,
   restore_command='scp postgres@192.168.10.11:/var/lib/pgsql/archive/%f %p'

==================================
To make the replication synchronous
==================================

In the master server's postgresql.conf file make this entry and reload:

synchronous_standby_names = '*'

[postgres@example02 11]$ pg_ctl reload
server signaled
2019-09-06 19:48:44.581 PDT [2529] LOG:  received SIGHUP, reloading configuration files
2019-09-06 19:48:44.582 PDT [2529] LOG:  parameter "synchronous_standby_names" changed to "*"
[postgres@example02 11]$


[postgres@example02 11]$ psql
psql (11.0)
Type "help" for help.

postgres=# \x
Expanded display is on.
postgres=# select * from pg_stat_replication;
-[ RECORD 1 ]----+------------------------------
pid              | 2576
usesysid         | 10
usename          | postgres
application_name | walreceiver
client_addr      | 192.168.10.12
client_hostname  |
client_port      | 16864
backend_start    | 2019-09-06 19:40:49.430354-07
backend_xmin     |
state            | streaming
sent_lsn         | 0/5000140
write_lsn        | 0/5000140
flush_lsn        | 0/5000140
replay_lsn       | 0/5000140
write_lag        |
flush_lag        |
replay_lag       |
sync_priority    | 1
sync_state       | sync

If the slave server's services are stopped, the primary will be affected: commits will wait for the synchronous standby.

Note:
If your setup has only two nodes, do not keep the slave in 'sync' mode.
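Instead of '*', the standbys can be named explicitly. A sketch, assuming the slave sets application_name=example02 in its primary_conninfo (example03 here is a hypothetical second standby):

synchronous_standby_names = 'FIRST 1 (example02, example03)'

With several standbys listed and a smaller number required (here 1 of 2), the primary can still commit as long as at least one of them is up, which softens the problem described in the note above.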

Test the sync functionality:

First create a sample database on the master while the slave is running; it commits normally (the log even confirms the standby is now synchronous). Then stop the slave server and create another database: in sync mode the transaction will not return as committed automatically, because the master keeps waiting for the standby. We have to cancel the wait, and only then does the master report the CREATE DATABASE as done; the transaction has committed locally but may not have been replicated to the standby. (In async mode the master would not wait at all; it would simply commit, and the slave would catch up automatically once it was started again.)

postgres=# create database test1db;
2020-01-03 17:57:00.479 PST [3443] LOG:  standby "walreceiver" is now a synchronous standby with priority 1
CREATE DATABASE
postgres=# create database test2db;
^CCancel request sent
2020-01-03 17:57:38.372 PST [3439] WARNING:  canceling wait for synchronous replication due to user request
2020-01-03 17:57:38.372 PST [3439] DETAIL:  The transaction has already committed locally, but might not have been replicated to the standby.
WARNING:  canceling wait for synchronous replication due to user request
DETAIL:  The transaction has already committed locally, but might not have been replicated to the standby.
CREATE DATABASE

==================
multi slave
===================
With one master and multiple slaves, the whole replication load falls directly on the master, and there is a chance of network delay.

You are now connected to database "postgres" as user "postgres".

postgres=# \x
Expanded display is on.
postgres=# select * from pg_stat_replication;
-[ RECORD 1 ]----+------------------------------
pid              | 3448
usesysid         | 10
usename          | postgres
application_name | walreceiver
client_addr      | 192.168.38.133
client_hostname  |
client_port      | 5401
backend_start    | 2020-01-03 17:58:05.884697-08
backend_xmin     |
state            | streaming
sent_lsn         | 0/14000060
write_lsn        | 0/14000060
flush_lsn        | 0/14000060
replay_lsn       | 0/14000060
write_lag        |
flush_lag        |
replay_lag       |
sync_priority    | 0
sync_state       | async
-[ RECORD 2 ]----+------------------------------
pid              | 3546
usesysid         | 10
usename          | postgres
application_name | walreceiver
client_addr      | 192.168.38.133
client_hostname  |
client_port      | 5404
backend_start    | 2020-01-03 18:07:46.055172-08
backend_xmin     |
state            | streaming
sent_lsn         | 0/14000060
write_lsn        | 0/14000060
flush_lsn        | 0/14000060
replay_lsn       | 0/14000060
write_lag        |
flush_lag        |
replay_lag       |
sync_priority    | 0
sync_state       | async

Here I have set up two replicas (both on one server), so the master shows two walsender records.

===============================
cascading replication:
==============================
Here there is no single master for everything: one of the slaves acts as master for another slave. The direct dependency of multiple slaves on one master is avoided, while at the same time we still have several high-availability/replication servers for different purposes.

A master - B slave
B master - C slave

Here the two slaves are not both depending on one master, so the load on the master server is lower; and if you want to run reporting queries on top of the slave servers, you can use both slaves.

To set up cascading replication within the same server (the new slave will run on a different port):

e.g., Taking a base backup
$ /usr/local/pgsql/bin/pg_basebackup -D /var/lib/pgsql/slave/ -X fetch -R -p5432
$

The -R option has written a recovery.conf for the new slave containing:

standby_mode = 'on'
primary_conninfo = 'user=postgres port=5432'

Change the port in the new slave's postgresql.conf before starting it.

$ /usr/local/pgsql/bin/pg_ctl -D /var/lib/pgsql/slave/ start
waiting for server to start....2020-01-03 18:14:43.221 PST [13805] LOG:  listening on IPv4 address "0.0.0.0", port 5433
2020-01-03 18:14:43.221 PST [13805] LOG:  listening on IPv6 address "::", port 5433
2020-01-03 18:14:43.222 PST [13805] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5433"
2020-01-03 18:14:43.238 PST [13806] LOG:  database system was interrupted while in recovery at log time 2020-01-03 18:10:06 PST
2020-01-03 18:14:43.238 PST [13806] HINT:  If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.
2020-01-03 18:14:43.250 PST [13806] LOG:  entering standby mode
2020-01-03 18:14:43.253 PST [13806] LOG:  redo starts at 0/14000060
2020-01-03 18:14:43.255 PST [13806] LOG:  consistent recovery state reached at 0/14000140
2020-01-03 18:14:43.255 PST [13806] LOG:  invalid record length at 0/14000140: wanted 24, got 0
2020-01-03 18:14:43.255 PST [13805] LOG:  database system is ready to accept read only connections
2020-01-03 18:14:43.262 PST [13810] LOG:  started streaming WAL from primary at 0/14000000 on timeline 1
 done
server started

-- In primary ( master)

$ /usr/local/pgsql/bin/psql -p5432
psql (11.4)
Type "help" for help.

postgres=# \x
Expanded display is on.
postgres=# select * from pg_stat_replication;
-[ RECORD 1 ]----+------------------------------
pid              | 13811
usesysid         | 10
usename          | postgres
application_name | walreceiver
client_addr      |
client_hostname  |
client_port      | -1
backend_start    | 2020-01-03 18:14:43.259911-08
backend_xmin     |
state            | streaming
sent_lsn         | 0/14000140
write_lsn        | 0/14000140
flush_lsn        | 0/14000140
replay_lsn       | 0/14000140
write_lag        |
flush_lag        |
replay_lag       |
sync_priority    | 0
sync_state       | async

postgres=#

The slave is also acting as master for another slave; here we have two slaves, but there is still no direct extra load on the master.
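To confirm that the middle node (B) is really cascading, check that it runs both a walreceiver process (from its master) and a walsender process (towards its own slave), for example:

$ ps -ef | grep -E 'walreceiver|walsender' | grep -v grep

Querying pg_stat_replication on B (even though B itself is in recovery) will likewise list its downstream standby.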