Now I'm tackling the fourth GTM improvement: correcting the backup algorithm.
The current implementation just proxies begin-transaction/get-GXID type commands directly to the backup. The backup then assigns the GXID and the transaction handle independently, so depending on the order in which each thread executes, a transaction can be assigned a different GXID at the GTM standby. This may cause serious problems when the standby fails over.
The improvement corrects this. Now both the handle and the GXID are backed up. The GTM standby assigns the transaction struct slot based on the backed-up handle and uses the backed-up GXID to keep everything consistent.
One thought about sequences. Because it is not practical to back up the current sequence value, sequence commands are essentially proxied to the backup (standby). Because the history of sequence values has very little meaning and the requirement is only to begin with the correct next value after failover, I think the current mechanism should work well.
Moreover, I eliminated needless responses from the standby to the active GTM (ACT). When synchronous backup is specified, acknowledgements are exchanged to make sure that the backups have reached the standby.
Most of this is skipped when a message comes through GTM-Proxy. I need to find where I should insert the code to synchronize in this case.
----
(Addenda: 30th Dec., 2011)
1. Add "Backup_synchronously" to GTM_conn or GTM_ThreadInfo.
2. Check Backup_synchronously when flushing to the client and in "send_something". Send to the backup before sending to the client.
3. Check whether I'm running as standby or not when accepting commands.
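The ordering in item 2 can be sketched as a small planning function. This is only an illustration: `ReplyStep` and `plan_reply` are hypothetical names, and the real send, flush and acknowledgement routines are not shown in these notes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum {
    STEP_SEND_TO_BACKUP,
    STEP_WAIT_BACKUP_ACK,
    STEP_FLUSH_TO_CLIENT
} ReplyStep;

/*
 * Fill `steps` with the order of operations for one reply and return how
 * many steps were written. `has_backup` is false when no standby is connected.
 */
static size_t
plan_reply(bool has_backup, bool backup_synchronously, ReplyStep steps[3])
{
    size_t n = 0;

    assert(steps != NULL);
    if (has_backup) {
        steps[n++] = STEP_SEND_TO_BACKUP;         /* the backup goes out first */
        if (backup_synchronously)
            steps[n++] = STEP_WAIT_BACKUP_ACK;    /* only in synchronous mode */
    }
    steps[n++] = STEP_FLUSH_TO_CLIENT;            /* the client is answered last */
    return n;
}
```

With this order the client never sees a result that the standby has not at least been sent, and in synchronous mode has not acknowledged.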
Dec 28, 2011
Mac, Linux, Windows in the office
Bought a new LCD display with 1080p resolution and am now running Mac, Linux and Windows. The MacBook Air runs pretty quick. More than that, thanks to KeyRemap4, the key bindings are set to emacs! This is done at a very low level and makes typing really fun.
One tweak for using a Mac (or other system) through HDMI: the HDMI picture is originally tuned for movies or digital TV. You should set it to PC or still picture, which makes the screen really beautiful.
My keyboards:
Happy Hacking Keyboard Professional for Linux and Windows. The coolest keyboard I've ever met.
Apple wireless keyboard for the MacBook Air in clamshell mode. I didn't expect much but found this keyboard is really as cool as the HHK Pro.
Happy typing!
Dec 20, 2011
GTM-standby third patch
Now the third patch is done. What I did today:
- Added a missing option definition.
- Corrected a wrong option description.
- Added log messages for error detection, GTM connection retry and reconnect to GTM-standby.
Dec 19, 2011
GTM-Proxy fails with SEG-V
Sudo-san reported to me that GTM-Proxy fails with SEG-V. It runs normally on Ubuntu, and without the O2 build option it also runs on CentOS. Finally, I found that it fails only with the O2 option on CentOS. It looks like the entry to the memory allocation handler is corrupted. I will look into it tomorrow.
---
It was caused by an uninitialized thrinfo, which points to all the memory contexts. Just adding memset() fixed the problem.
I checked all the other malloc() calls in GTM-related code and found that all the others are either properly initialized or written before being read.
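The shape of the fix, as a minimal sketch. `ThreadInfoSketch` is a much-reduced, hypothetical stand-in for the real thread info struct, and its field names are chosen only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in: a few context pointers plus a status word. */
typedef struct {
    void *thr_thread_context;
    void *thr_message_context;
    void *thr_current_context;
    int   thr_status;
} ThreadInfoSketch;

static ThreadInfoSketch *
alloc_thread_info(void)
{
    ThreadInfoSketch *thrinfo = malloc(sizeof(*thrinfo));

    if (thrinfo == NULL)
        return NULL;
    /*
     * malloc() returns indeterminate bytes. Clear them so that no context
     * pointer holds garbage before the contexts are actually created.
     */
    memset(thrinfo, 0, sizeof(*thrinfo));
    return thrinfo;
}

/* True while no memory context has been attached yet. */
static bool
thread_info_is_fresh(const ThreadInfoSketch *thrinfo)
{
    assert(thrinfo != NULL);
    return thrinfo->thr_thread_context == NULL &&
           thrinfo->thr_message_context == NULL &&
           thrinfo->thr_current_context == NULL;
}
```

Reading uninitialized memory is undefined behavior, so what the program does can change with the optimization level and platform. That fits a crash that showed up only with O2 on CentOS. calloc() would do the same job in one call.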
Dec 16, 2011
GTM-proxy error handling for reconnect
Added code to allow GTM-Proxy to do the following:
1) Optionally retry the connection to the current GTM. Specify count, idle and interval.
2) Optionally wait for a reconnect command. Specify count, idle and interval.
To reduce the number of options, I don't want to introduce an "option" flag to ask yes or no. Instead, maybe we should treat all zeros as meaning no retry or no reconnect. This code has not been written yet. Maybe next Monday.
Anyway, I need to log this activity.
----
Gee! After I made the first commit of this feature, I found a couple of issues to fix before testing.
1) Description of new option GTM_OPTNAME_RETRY_IDLE ... This is wrong!!
2) GTM_OPTNAME_ERR_WAIT_IDLE definition is missing!!
I need to fix them as well as the documentation.
Now GTM_OPTNAME_ERR_WAIT_OPT is removed. So, for connection retry and wait for reconnect,
if idle, count and interval are all zero, no such action is taken.
If a communication error is detected and neither retry nor reconnect wait is specified, this causes a FATAL error because GTM-Proxy cannot continue service.
---
Koichi
Dec 1, 2011
GTM Standalone connectivity improvement (2)
Today, I finished the first code for this. Tested the following:
- start gtm
- start gtm_proxies
- start coordinators/datanodes and run some sessions through psql.
- start gtm standby and run more psql sessions.
They worked okay. Promote and reconnect seem to work fine.
However, when the gtm standby is shut down and tries to connect again, gtm/gtm_proxies seem to stall. Maybe the disconnect sequence doesn't work well. Need to fix this before commit.
Nov 29, 2011
GTM standalone connectivity improvement
Currently, GTM standby can connect to and back up GTM only if it connects before any other GTM client connects. Here's an improvement.
- GTM standby connects to GTM and sends a new command, "BEGIN BACKUP" or something like this. When GTM receives this command, it first locks all the other threads at their static points, typically just before they read a request from clients.
- The GTM thread connected to GTM standby then establishes connections to GTM-Standby and links each connection to the thread information of the corresponding thread. Then it releases the locks so that the other threads can continue to work with the new backup connection to GTM standby.
- Main thread does not have any connection to the standby.
Needed extensions:
- New entry in GTM thread information to indicate if the thread is main or not.
- New commands "BEGIN BACKUP" and "END BACKUP". When "BEGIN BACKUP" is received, GTM will lock all the other threads. When "END BACKUP" is received, a new connection to GTM-Standby will be established for each worker thread (not for the main thread), then all the thread locks will be released.
- To lock each thread, thr_lock can be used. It is defined and initialized but not used so far.
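The BEGIN BACKUP / END BACKUP sequence could look roughly like this. It is a sketch under assumptions: `thr_lock` is modeled as a plain pthread mutex, `BackupThreadInfo` and the function names are hypothetical, and a boolean stands in for the per-thread connection to the standby. Each worker is assumed to take its own `thr_lock` just before reading the next client request.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    pthread_mutex_t thr_lock;          /* per-thread lock (a mutex in this sketch) */
    bool            is_main_thread;    /* the new "main or not" entry */
    bool            has_standby_conn;  /* stand-in for a real connection handle */
} BackupThreadInfo;

static void
backup_threads_init(BackupThreadInfo *threads, size_t n, size_t main_index)
{
    for (size_t i = 0; i < n; i++) {
        pthread_mutex_init(&threads[i].thr_lock, NULL);
        threads[i].is_main_thread = (i == main_index);
        threads[i].has_standby_conn = false;
    }
}

/* BEGIN BACKUP: the thread serving the standby (`self`) stops all the others. */
static void
begin_backup(BackupThreadInfo *threads, size_t n, size_t self)
{
    assert(threads != NULL && self < n);
    for (size_t i = 0; i < n; i++)
        if (i != self)
            pthread_mutex_lock(&threads[i].thr_lock);
}

/* END BACKUP: attach a standby connection to each worker, then release the locks. */
static void
end_backup(BackupThreadInfo *threads, size_t n, size_t self)
{
    assert(threads != NULL && self < n);
    for (size_t i = 0; i < n; i++) {
        if (!threads[i].is_main_thread)
            threads[i].has_standby_conn = true;   /* real code would connect here */
        if (i != self)
            pthread_mutex_unlock(&threads[i].thr_lock);
    }
}
```

Between the two calls no other thread can start a new request, so the initial backup sees a stable state.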
Nov 22, 2011
GTM standby problem
Testing GTM and GTM standby. Somehow, GTM does not recognize GTM standby although startup and initial backup are successful.
Looked into the threads of GTM. Although I'm running only two GTM proxies with one worker thread each, there are two threads in GTM for each GTM Proxy. Each GTM Proxy is running only one worker thread, as specified. Sigh...
Remembered that the main thread also has a connection to GTM. So the connections are main thread + worker threads.
BTW, why does the main thread need a connection to GTM? It does nothing but accept client connections and assign them to one of the worker threads???
Problem fixed
There were essentially two failures. One was the condition to find GTM standby from the node registration. The other was in constructing the connection information for GTM standby: the format string was wrong. With other fixes to handle non-null-terminated strings in the libpq connection, it worked!!
Nov 18, 2011
Tackling GTM promote/reconnect
Okay, now GTM/Proxy works with the configuration file under the NODE_NAME extension.
Now, promote/reconnect. Promote looked like it worked, but in reality it didn't. Somehow, the promoted standby tries to register itself to itself, which obviously doesn't work. I need to find out how this happens, because it worked fine before the NODE_NAME work (with node id instead).
One idea: should we change "register" into a connection negotiation that tells GTM who I am (coordinator/datanode/gtm_proxy/gtm_standby)? Although this has very little to do with registering self....
Another thing about the current GTM protocol is that strings are not null-terminated. Each string has a prefix to indicate its length. That's fine. However, the string that follows is not null-terminated, and the recipient has to reconstruct it as a null-terminated string, which increases the number of lines of code and could be a very good source of failures.
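This is the kind of helper every recipient ends up needing. It is a sketch only: `read_lp_string` is a hypothetical name, and the 4-byte big-endian length prefix is an assumption for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Read one length-prefixed string from `buf` (4-byte big-endian length, then
 * that many bytes, no terminator) and return a malloc'ed, null-terminated
 * copy. `*consumed` receives the number of bytes used. Returns NULL on short
 * input or allocation failure.
 */
static char *
read_lp_string(const unsigned char *buf, size_t buflen, size_t *consumed)
{
    uint32_t len;
    char    *out;

    assert(buf != NULL && consumed != NULL);
    if (buflen < 4)
        return NULL;
    len = ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
          ((uint32_t)buf[2] << 8)  |  (uint32_t)buf[3];
    if (len > buflen - 4)
        return NULL;                  /* the prefix claims more than we have */

    out = malloc((size_t)len + 1);
    if (out == NULL)
        return NULL;
    memcpy(out, buf + 4, len);
    out[len] = '\0';                  /* the terminator the wire format lacks */
    *consumed = 4 + (size_t)len;
    return out;
}
```

Forgetting the bounds check or the final terminator in just one call site is enough to cause the kind of failure mentioned above.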
Nov 17, 2011
Nov.17, GTM config file
Found several pieces of code that were changed due to the NODE NAME work, and found that some of them lead to failures.
Fixed most of them, including promote and reconnect. After promote and reconnect, GTM_Proxy won't shut down with gtm_ctl. And GTM won't shut down either.
This should be taken care of tomorrow.
--- A bit tired ...
Node_Name Initialization (XC)
How to initialize node names and node groups in initdb?
Edit the file "cluster_nodes.sql" at $prefix/share/postgresql.
$prefix is the installation point of Postgres-XC, as specified in ./configure.
Cluster-wide settings can be done here. Node-specific settings should go in postgresql.conf at each coordinator/datanode database cluster.
Questions:
In cluster_nodes.sql, we specify all the node information using CREATE NODES nodename ....
In postgresql.conf, we specify individual node name using pgxc_node_name.
The former is case insensitive and the latter is case sensitive. Should we assume that they're stored in lower case, which is PostgreSQL specific?
Node_Name comments
1. CREATE TABLE ... TO node or group
--> What if the [TO ...] clause is omitted?
2. CREATE NODE ... Preferred
--> "Preferred" is not a node attribute. The preferred node is an attribute of a coordinator.
3. Node (coordinator/data node) and node member
--> A node consists of one or more members. One of them must be the master (or root master). The other members should be slaves. A member that is not the root master is a slave of another member. The master (member A) of a member can be a slave of another master if and only if member A is not the root master.
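The member rules in comment 3 describe a tree rooted at the root master. A minimal sketch of a check for them, assuming each member simply points to its master. `NodeMember` and `node_members_valid` are hypothetical names, and every master is assumed to be a member of the same node.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct NodeMember {
    const char              *name;
    const struct NodeMember *master;   /* NULL only for the root master */
} NodeMember;

/*
 * Check the rules: exactly one member has no master (the root master), and
 * every other member reaches it by following master links.
 */
static bool
node_members_valid(const NodeMember *members, size_t n)
{
    size_t roots = 0;

    assert(members != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        const NodeMember *m = &members[i];
        size_t hops = 0;

        if (m->master == NULL)
            roots++;
        /* Walk up; more than n hops means the links form a cycle. */
        while (m->master != NULL) {
            if (++hops > n)
                return false;
            m = m->master;
        }
    }
    return roots == 1;
}
```

A chain such as a <- b <- c (b is a slave of a, c is a slave of b) passes, which is the cascading case described by the last sentence of the comment.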
Nov 15, 2011
Per-command backup to GTM Standby and its protocol
MSG_NODE_REGISTER -> MSG_NODE_REGISTER_BKUP
    GTM_PGXCNodeType: nodetype (internal format)
    int32: len(nodename)
    string: nodename(not null-terminated)
    * No response to GTM.
MSG_NODE_UNREGISTER -> MSG_NODE_UNREGISTER_BKUP
    GTM_PGXCNodeType: nodetype (internal format)
    int32: len(nodename)
    string: nodename(not null-terminated)
    * No response to GTM
MSG_NODE_LIST -> none (This message is from standby to backup the node list)
MSG_NODE_BEGIN_REPLICATION_INIT -> none
MSG_NODE_END_REPLICATION_INIT -> none
MSG_TXN_BEGIN -> MSG_TXN_BEGIN_BKUP
    GTM_TransactionHandle: txn
    GTM_Timestamp: timestamp
    GTM_IsolationLevel: isolationLevel
    bool: txn_read_only
    note) No response to GTM
MSG_TXN_BEGIN_GETGXID -> MSG_TXN_BEGIN_GETGXID_BKUP
    GTM_TransactionHandle: txn
    GlobalTransactionId: gxid
    GTM_IsolationLevel: txn_isolation_level
    bool: txn_read_only
    GTM_Timestamp: timestamp
    note) No response to GTM
MSG_TXN_BEGIN_GETGXID_AUTOVACUUM -> MSG_TXN_BEGIN_GETGXID_AUTOVACUUM_BKUP
    GTM_TransactionHandle: txn
    GlobalTransactionId: gxid
    GTM_IsolationLevel: txn_isolation_level
    bool: txn_read_only
    GTM_Timestamp: timestamp
    note) No response to GTM
MSG_TXN_BEGIN_GETGXID_MULTI -> MSG_TXN_BEGIN_GETGXID_MULTI_BKUP
    GTM_TransactionId: gxid
    int: txn_count
    GTMTimestamp: timestamp
    {
        GTM_IsolationLevel: txn_isolation_level
        bool: txn_read_only
    } * (txn_count)
    note) No response to GTM
MSG_TXN_START_PREPARED -> MSG_TXN_START_PREPARED_BKUP
    int: isgxid
    (GTM_TransactionHandle | GlobalTransactionId)
    GTM_Strlen: gidlen
    string: gid
    GTM_Strlen: nodelen
    string: nodestring
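As an illustration of the first layout in this list, here is a sketch that encodes the body of MSG_NODE_REGISTER_BKUP. It assumes the node type is a 4-byte value copied as raw bytes ("internal format") and the length is a 4-byte big-endian integer. Those sizes and the byte order are my assumptions for the sketch, and `encode_node_register_bkup` is a hypothetical helper, not a GTM function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Store a 32-bit value in big-endian order. */
static void
put_int32_be(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v >> 24);
    p[1] = (unsigned char)(v >> 16);
    p[2] = (unsigned char)(v >> 8);
    p[3] = (unsigned char)v;
}

/*
 * Encode: nodetype (raw 4 bytes), len(nodename), nodename without terminator.
 * Returns the number of bytes written, or 0 if `buf` is too small.
 */
static size_t
encode_node_register_bkup(unsigned char *buf, size_t cap,
                          int32_t nodetype, const char *nodename)
{
    size_t len;

    assert(buf != NULL && nodename != NULL);
    len = strlen(nodename);
    if (len > INT32_MAX || cap < 8 + len)
        return 0;

    memcpy(buf, &nodetype, 4);              /* internal format: raw bytes */
    put_int32_be(buf + 4, (uint32_t)len);   /* length prefix */
    memcpy(buf + 8, nodename, len);         /* no '\0' on the wire */
    return 8 + len;
}
```

Since no response is sent back for the backup messages, the sender has nothing to read after writing these bytes.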
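Allow GTM Standby to connect while GTM is running
Currently, the standby cannot connect unless it has completed its connection by the time a client (GTM-Proxy, etc.) connects to GTM. (Standby setup is done only before entering ThreadLoop().) Here is the improvement plan.
1. In each thread, before deciding whether to back up to the standby, acquire the ThreadInfo lock and check whether a standby has newly connected. A new ThreadInfo member may be needed.
2. If a standby has newly connected, connect to the standby from this thread, set the Conn in ThreadInfo, and release the lock. This thread's information can now be backed up.
3. When a connection from the standby arrives, GTM checks every ThreadInfo. If a ThreadInfo has not been set up yet, the existing code performs the initial standby setup. Otherwise, GTM acquires each ThreadInfo's lock and sets a flag indicating that a standby has connected (a new ThreadInfo member).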
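The three steps can be sketched as follows. The struct and function names (`LateStandbyThreadInfo`, `flag_standby_connected`, `ensure_backup_conn`) are hypothetical, the lock is modeled as a pthread mutex, and a boolean stands in for the Conn to the standby.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    pthread_mutex_t thr_lock;
    bool standby_connected;   /* hypothetical new member set in step 3 */
    bool has_backup_conn;     /* stand-in for "Conn to the standby is set" */
} LateStandbyThreadInfo;

/* Step 3: run for each existing thread when a standby connects. */
static void
flag_standby_connected(LateStandbyThreadInfo *t)
{
    assert(t != NULL);
    pthread_mutex_lock(&t->thr_lock);
    t->standby_connected = true;
    pthread_mutex_unlock(&t->thr_lock);
}

/*
 * Steps 1 and 2: run by each thread just before it decides whether to back up.
 * Returns true when a backup connection is available.
 */
static bool
ensure_backup_conn(LateStandbyThreadInfo *t)
{
    bool ready;

    assert(t != NULL);
    pthread_mutex_lock(&t->thr_lock);
    if (t->standby_connected && !t->has_backup_conn) {
        /* The real code would open a connection to the standby here. */
        t->has_backup_conn = true;
    }
    ready = t->has_backup_conn;
    pthread_mutex_unlock(&t->thr_lock);
    return ready;
}
```

The flag is cheap to check, so calling the second function before every backup decision costs one lock round trip per request.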
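GTM connection to standby from the master remains after promotion
Currently, even when GTM-Standby is promoted, the existing connection with the master is not closed.
Well, the old master will eventually go away and the connection will be dropped, but I'm not quite happy with this. In cases such as switchover, I think unneeded resources may be left behind. I think this code should be added somewhere too.
All we need is to cancel the connection from the standby to GTM and the connection from GTM to the standby, so each thread can close its connection and exit. Don't forget to clean up memory and fd resources.
Since the connection from the standby is checked just before a backup, create a common routine that checks the connection and updates ThreadInfo, and call it from every place just before backing up. I think the routine can be added to gtm_standby.c.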