Jan 31, 2012

Testing XC toward 1.0

This is now becoming the goal of this quarter's development.   Depending on each member's schedule, we will go into dedicated testing for 1.0.
So far, from the discussion among the core members, we have agreed that the testing should cover at least three points of view.

  1. How many bugs are hidden and how many should be found.    Considering that the written code is around 100,000 lines, I think we should find around 500 bugs, that is, roughly five defects per thousand lines.    I do hope the code is much better than that.   We need many more test cases to run.
  2. SQL functionality check: what is supported and what is not.    I've been discussing this with the sponsor.   I don't know whether the development team should (and can) do this.    From my experience, it will take at least five to six man-months just to write a test-case document for this purpose.
  3. Code test coverage.   There should be a handy tool to measure this (gcov/lcov, for instance).   Although test coverage doesn't tell under what conditions each piece of code runs, Pavan says this will be a good way to check the coverage of error handling.   I agree with this.
I should write the first draft of the XC test schedule by next Tuesday to discuss at the core members' teleconference.

After consideration...
Maybe increasing code coverage makes the most sense for the current development team.   Feature testing and performance testing are already being done for each development project.

Jan 29, 2012

Cluster Summit and PGCon

Discussion with Josh about the Cluster Summit at PGCon has now begun.   Josh still likes the idea of holding CHAR in Europe, Japan, and the US in turn.    Yes, I think this is quite a good idea, but it seems to me that we need some independent organization (within the PostgreSQL community) to manage this, as well as a local organization to help with preparation.     In February, we may be able to initiate this with Simon in Paris through Michael.    Should the Japanese local organization be within JPUG?

Jan 27, 2012

pgxc_clean

Finished initial pgxc_clean code.  It is not tested yet.

At first, I thought I needed a direct connection to GTM to clean up transaction status, but in the end I found that I just need to tell the coordinator to commit or abort each prepared transaction.  With this, the code is simple enough.   It is just a libpq application.   If I wished, I could have written it as an ECPG application or even a JDBC application.

Anyway, because this depends completely upon XC-internal catalogs and proprietary function calls, it is local to XC and not portable at all.
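To make the "just a libpq application" point concrete, here is a minimal sketch of the approach, not the actual pgxc_clean code: it lists prepared transactions on a coordinator through pg_prepared_xacts and resolves each one with COMMIT PREPARED or ROLLBACK PREPARED.  The connection string and the "abort everything" policy are placeholder assumptions; the real tool consults XC-internal catalogs and functions to decide commit versus abort per transaction.

    /* Sketch only: resolve prepared transactions through a coordinator. */
    #include <stdio.h>
    #include <libpq-fe.h>

    int main(void)
    {
        /* Placeholder connection string for the target coordinator. */
        PGconn *conn = PQconnectdb("host=localhost port=5432 dbname=postgres");
        if (PQstatus(conn) != CONNECTION_OK)
        {
            fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
            return 1;
        }

        /* List outstanding two-phase transactions visible on this coordinator. */
        PGresult *res = PQexec(conn, "SELECT gid FROM pg_prepared_xacts");
        if (PQresultStatus(res) == PGRES_TUPLES_OK)
        {
            int i;
            for (i = 0; i < PQntuples(res); i++)
            {
                char cmd[1024];

                /* Placeholder policy: abort everything.  The real tool decides
                 * per transaction whether to COMMIT PREPARED or ROLLBACK PREPARED,
                 * and the GID should be properly quoted, not just pasted in. */
                snprintf(cmd, sizeof(cmd), "ROLLBACK PREPARED '%s'",
                         PQgetvalue(res, i, 0));
                PGresult *r2 = PQexec(conn, cmd);
                if (PQresultStatus(r2) != PGRES_COMMAND_OK)
                    fprintf(stderr, "%s: %s", cmd, PQerrorMessage(conn));
                PQclear(r2);
            }
        }
        PQclear(res);
        PQfinish(conn);
        return 0;
    }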

Jan 12, 2012

GTM HA for 0.9.7

Committed the patch.   Now GTM's transaction backup to the standby is accurate, even under heavy workload.   The remaining issues are:

  1. The tcp_keepalives feature works for monitoring client connections at the server.   Some more work is needed to enable monitoring of the server connection at clients; a socket-level sketch of the idea follows this list.
  2. Need to add a "status" feature.   I once considered adding this to gtm_ctl.   That may work, but a new command could be okay too.    Anyway, this feature should be implemented using normal communication with gtm/gtm_proxy, not a signal.
  3. "gtminit", associated with XC cluster bootstrap.

Jan 10, 2012

GTM standby test

Found that GTM synchronous backup does not work well.
What I found:

  1. GTM tries to sync with GTM-Standby via gtm_sync_standby().
  2. Somehow it does not return.
Before this, bkup_node_register_internal() was called to back up the command to GTM-Standby, and that looked successful.

Maybe GTM-Standby does not handle the backup messages correctly and is waiting for some more information.

Jan 6, 2012

GTM standby

Now GTM standby is at the last stage.
  1. Error handling in GTM-Proxy is done: when a GTM communication error occurs, the proxy retries, and if that fails it waits for a reconnect operation.
  2. GTM standby can now connect to GTM at any moment.
  3. After reconnecting and shutting down the whole cluster, GTM-Standby can start as GTM.   Do not forget to rearrange the configuration files properly.
  4. GTM backup is now corrected so that the transaction handle and GXID are backed up to the standby.
  5. Asynchronous backup works fine.
  6. Synchronous backup still has some issues, probably in some minor protocol handling.   It will be tested next.

Jan 3, 2012

XC bootstrap

Pavan wrote his proposal on XC bootstrap.   Here are some of my ideas/comments:

  1. It's nice to run initdb as independently as possible and register each node after initdb.   It will be even nicer if initdb runs VACUUM FREEZE, so that tuple visibility no longer depends on the node's local XID counter and any node can begin with any GXID.   That will make adding nodes safer.
  2. He is right that an XC configuration makes sense with at least one coordinator registered.   The issues are:
    1. Should a coordinator be registered first?  The order of registration can be more flexible.   Because clients connect to a coordinator, there is no problem with having only datanodes initialized and registered in the initial phase of the bootstrap.
    2. Should a datanode be registered with GTM?  What is the registration for?
  3. The coordinator was made a separate node because we thought the coordinator and datanode should be different binaries.  Now that they share the same binary, what happens if a node is both a coordinator and a datanode?   If they're the same, the configuration may look simpler.