Cause 1: forced full-page writes are switched on when a backup starts. Cause 2: pg_basebackup holds back WAL recycling through a replication slot.
Users running PostgreSQL in the public cloud often notice that WAL disk usage rises noticeably while a backup is running.
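One obvious reason is that starting a backup forces full-page writes: complete data pages are written to WAL, so that if a dirty page is flushed while the backup is copying that same page, the resulting partially valid (torn) copy can be repaired during recovery.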
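To make the torn-page problem concrete, here is a toy model in C. It assumes, purely for illustration, an 8 KB page that the copy tool reads in two 4 KB halves, with the server flushing a newer version of the page between the two reads. All names (disk_page, torn_copy, replay_full_page_image, and so on) are hypothetical and are not PostgreSQL functions.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical sizes for the toy model: an 8 KB page read in two halves. */
#define PAGE_SIZE 8192
#define HALF_PAGE (PAGE_SIZE / 2)

/* Stand-in for the data file page on disk. */
static unsigned char disk_page[PAGE_SIZE];

/* Simulates the server flushing a newer version of the page. */
static void flush_new_version(void)
{
    memset(disk_page, 2, PAGE_SIZE);
}

/* Naive copy in two reads; `between` runs between them, as a concurrent
 * dirty-page flush might. */
static void torn_copy(unsigned char *backup, void (*between)(void))
{
    memcpy(backup, disk_page, HALF_PAGE);
    if (between != NULL)
        between();
    memcpy(backup + HALF_PAGE, disk_page + HALF_PAGE, HALF_PAGE);
}

/* In this model a page is consistent if every byte comes from one version. */
static int page_is_consistent(const unsigned char *page)
{
    for (size_t i = 1; i < PAGE_SIZE; i++)
        if (page[i] != page[0])
            return 0;
    return 1;
}

/* Replaying a full-page image from WAL overwrites the whole page. */
static void replay_full_page_image(unsigned char *page, const unsigned char *image)
{
    assert(page != NULL && image != NULL);
    memcpy(page, image, PAGE_SIZE);
}
```

Real recovery restores the page image and then replays the later WAL records for that page; the sketch keeps only the first step, which is the part that removes the tear.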
Taking the PG12 code as an example:
do_pg_start_backup(...)
{
    ……
    /* force full-page writes until the backup ends */
    XLogCtl->Insert.forcePageWrites = true;
    ……
}
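A rough way to reason about the extra volume: with full-page writes forced, the first change to each page after the backup's starting checkpoint carries a page image of up to one block (BLCKSZ, 8 KB by default). The helper below is a hypothetical back-of-the-envelope sketch, not PostgreSQL code; it ignores wal_compression, the unused "hole" in the middle of a page that PostgreSQL omits from page images, and record headers, so treat its result only as an order of magnitude.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: rough WAL volume contributed by full-page images when
 * `distinct_pages` different pages are each modified at least once after the
 * backup's starting checkpoint. Only the first change to a page after that
 * checkpoint logs an image, so repeated changes to the same page add nothing
 * here. Ignores wal_compression, the skipped page hole and record headers. */
static uint64_t estimate_fpi_bytes(uint64_t distinct_pages, uint32_t block_size)
{
    assert(block_size > 0);
    return distinct_pages * (uint64_t) block_size;
}
```

For example, touching 1,000,000 distinct pages after the checkpoint adds about 1,000,000 × 8192 bytes, roughly 8.2 GB of page images, however small the individual row changes were.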
Another reason is often overlooked. Many backup programs use PostgreSQL's native pg_basebackup tool, and for WAL it offers two methods (chosen with -X / --wal-method):
fetch
The write-ahead log files are collected at the end of the backup. Therefore, it is necessary for the wal_keep_segments parameter to be set high enough that the log is not removed before the end of the backup. If the log has been rotated when it's time to transfer it, the backup will fail and be unusable.
stream
Stream the write-ahead log while the backup is created. This will open a second connection to the server and start streaming the write-ahead log in parallel while running the backup. Therefore, it will use up two connections configured by the max_wal_senders parameter. As long as the client can keep up with write-ahead log received, using this mode requires no extra write-ahead logs to be saved on the master.
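The first method behaves like many third-party backup tools: a WAL segment can be recycled while the backup is still copying, so an invalid WAL file ends up in the backup and the backup is unusable.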
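The failure mode of fetch can be written as a simple retention check. The sketch below is a simplified model rather than pg_basebackup code: it assumes wal_keep_segments is the only thing holding WAL back (no replication slot, no archiving delay), works in whole segment numbers, and uses a hypothetical function name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified model of -X fetch (not pg_basebackup code). The backup started
 * at WAL segment start_seg but only asks for its WAL once the data files are
 * copied, when the server has reached current_seg. Assuming wal_keep_segments
 * is the only thing holding WAL back, segments older than
 * current_seg - wal_keep_segments may already be recycled.
 * Returns true if the model guarantees start_seg is still on disk. */
static bool fetch_wal_still_available(uint64_t start_seg, uint64_t current_seg,
                                      uint64_t wal_keep_segments)
{
    assert(current_seg >= start_seg);
    return current_seg - start_seg <= wal_keep_segments;
}
```

With wal_keep_segments = 10, a backup that starts at segment 100 and finishes copying data files at segment 111 falls outside the window, which is exactly the case the documentation warns about.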
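The second method is the default. pg_basebackup then behaves much like a standby: it opens a streaming-replication connection to receive WAL and creates a replication slot that delays WAL recycling, which guarantees that the WAL the backup needs is complete and usable.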
Again taking the PG12 code as an example:
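StartLogStreamer(xlogstart, starttli, sysidentifier)
{
    ……
    /* arguments: connection, slot name, no output plugin (NULL),
       temporary slot?, physical slot, reserve WAL now, fail if slot exists */
    if (!CreateReplicationSlot(param->bgconn, replication_slot, NULL, temp_replication_slot, true, true, false))
    ……
}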
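The value that matters on the server side is the oldest restart_lsn among all slots. The sketch below models that computation in simplified form; SlotSketch and slots_min_restart_lsn are hypothetical names, loosely following what ReplicationSlotsComputeRequiredLSN() publishes for XLogGetReplicationSlotMinimumLSN() to read.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;
#define InvalidXLogRecPtr ((XLogRecPtr) 0)

/* Hypothetical, simplified slot: only the fields that matter here. */
typedef struct
{
    int        in_use;
    XLogRecPtr restart_lsn; /* oldest WAL position this slot still needs */
} SlotSketch;

/* Minimum restart_lsn over all in-use slots, or InvalidXLogRecPtr if no slot
 * reserves WAL. A single lagging slot (such as a backup's) sets the result. */
static XLogRecPtr slots_min_restart_lsn(const SlotSketch *slots, size_t n)
{
    XLogRecPtr min = InvalidXLogRecPtr;

    assert(slots != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
    {
        if (!slots[i].in_use || slots[i].restart_lsn == InvalidXLogRecPtr)
            continue;
        if (min == InvalidXLogRecPtr || slots[i].restart_lsn < min)
            min = slots[i].restart_lsn;
    }
    return min;
}
```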
During a checkpoint, the minimum LSN held by the replication slots blocks WAL from being recycled. The cutoff is computed in KeepLogSeg:
static void
KeepLogSeg(XLogRecPtr recptr, XLogSegNo *logSegNo)
{
    XLogSegNo   segno;
    XLogRecPtr  keep;

    XLByteToSeg(recptr, segno, wal_segment_size);
    keep = XLogGetReplicationSlotMinimumLSN();

    /* compute limit for wal_keep_segments first */
    if (wal_keep_segments > 0)
    {
        /* avoid underflow, don't go below 1 */
        if (segno <= wal_keep_segments)
            segno = 1;
        else
            segno = segno - wal_keep_segments;
    }

    /* then check whether slots limit removal further */
    if (max_replication_slots > 0 && keep != InvalidXLogRecPtr)
    {
        XLogSegNo   slotSegNo;

        XLByteToSeg(keep, slotSegNo, wal_segment_size);
        if (slotSegNo <= 0)
            segno = 1;
        else if (slotSegNo < segno)
            segno = slotSegNo;
    }

    /* don't delete WAL segments newer than the calculated segment */
    if (segno < *logSegNo)
        *logSegNo = segno;
}
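To experiment with this cutoff outside the server, the same logic can be restated as a standalone function. This is a sketch: the globals (wal_keep_segments, max_replication_slots, wal_segment_size) and the slot minimum become parameters, and the name keep_log_seg_sketch is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;
typedef uint64_t XLogSegNo;
#define InvalidXLogRecPtr ((XLogRecPtr) 0)

/* Byte position -> segment number, the same division XLByteToSeg does. */
static XLogSegNo byte_to_seg(XLogRecPtr ptr, uint64_t seg_size)
{
    return ptr / seg_size;
}

/* Standalone restatement of the PG12 KeepLogSeg logic. Returns the oldest
 * segment that must be kept; older segments may be recycled. */
static XLogSegNo keep_log_seg_sketch(XLogRecPtr recptr, XLogSegNo log_seg_no,
                                     int wal_keep_segments,
                                     int max_replication_slots,
                                     XLogRecPtr slot_min_lsn,
                                     uint64_t seg_size)
{
    assert(seg_size > 0);
    XLogSegNo segno = byte_to_seg(recptr, seg_size);

    /* first line of defense: wal_keep_segments */
    if (wal_keep_segments > 0)
    {
        if (segno <= (XLogSegNo) wal_keep_segments)
            segno = 1;
        else
            segno -= (XLogSegNo) wal_keep_segments;
    }

    /* second line of defense: the oldest slot position */
    if (max_replication_slots > 0 && slot_min_lsn != InvalidXLogRecPtr)
    {
        XLogSegNo slot_seg = byte_to_seg(slot_min_lsn, seg_size);

        if (slot_seg == 0)          /* mirrors slotSegNo <= 0 */
            segno = 1;
        else if (slot_seg < segno)
            segno = slot_seg;
    }

    /* only ever move the caller's cutoff backwards */
    return segno < log_seg_no ? segno : log_seg_no;
}
```

With 16 MB segments, a checkpoint at segment 100, wal_keep_segments = 10 and a backup slot still at segment 50, everything from segment 50 onward must stay on disk: the slot, not wal_keep_segments, decides.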
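As the pg_basebackup documentation suggests, wal_keep_segments is the first line of defense against WAL recycling and the minimum LSN of the replication slots is the second. KeepLogSeg keeps whichever cutoff is older, so a backup slot that lags far behind can pin much more WAL than wal_keep_segments alone would.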
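To see how much WAL a backup slot is pinning at a given moment, one can compare the slot's restart_lsn (from the pg_replication_slots view) with the current position (pg_current_wal_lsn()). Both are shown as text such as 16/B374D848: two 32-bit hex halves. The helpers below are a hypothetical sketch that parses that form and performs the same subtraction pg_wal_lsn_diff() does.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Parse a textual LSN "HI/LO" (two hex 32-bit halves) into a 64-bit value.
 * Returns 0 on success, -1 on malformed input. */
static int parse_lsn(const char *text, uint64_t *out)
{
    unsigned int hi, lo;

    if (text == NULL || out == NULL || sscanf(text, "%X/%X", &hi, &lo) != 2)
        return -1;
    *out = ((uint64_t) hi << 32) | lo;
    return 0;
}

/* Bytes of WAL held back by a slot: current position minus its restart_lsn. */
static uint64_t wal_retained_bytes(uint64_t current_lsn, uint64_t restart_lsn)
{
    assert(current_lsn >= restart_lsn);
    return current_lsn - restart_lsn;
}
```

For instance, a current position of 1/0 and a slot at 0/FF000000 differ by 0x1000000 bytes, i.e. exactly one 16 MB segment.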
Together, these two causes explain why WAL space balloons during a backup.