From mboxrd@z Thu Jan 1 00:00:00 1970
From: Andreas Wiener
Date: Mon, 07 Oct 2013 14:48:49 +0200
Subject: [Cluster-devel] postgres/drbd start-up issue with clusvcadm
Message-ID: <5252ADB1.7010705@mazbr.de>
List-Id:
To: cluster-devel.redhat.com
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

Dear cluster-devel list,

I have a very strange problem with my postgres service. I defined the service in cluster.conf and tested it with

rg_test test cluster.conf start service pgsql

The result was a successfully started postgres service. However, if I start the service via

clusvcadm -e pgsql

the service fails to start:

Oct 07 14:26:16 rgmanager Starting disabled service service:pgsql
Oct 07 14:26:16 rgmanager [ip] Link for bridge0: Detected
Oct 07 14:26:16 rgmanager [ip] Adding IPv4 address 10.0.1.15/21 to bridge0
Oct 07 14:26:16 rgmanager [ip] Pinging addr 10.0.1.15 from dev bridge0
Oct 07 14:26:18 rgmanager [ip] Sending gratuitous ARP: 10.0.1.15 00:25:90:a2:c7:b6 brd ff:ff:ff:ff:ff:ff
Oct 07 14:26:19 rgmanager [drbd] Setting resource pgsql to state : primary
Oct 07 14:26:20 rgmanager [fs] mounting /dev/drbd6 on /var/lib/pgsql
Oct 07 14:26:20 rgmanager [fs] mount -t ext4 -o noatime /dev/drbd6 /var/lib/pgsql
Oct 07 14:26:20 rgmanager [postgres-8] Verifying Configuration Of postgres-8:pgsqld
Oct 07 14:26:20 rgmanager [postgres-8] Verifying Configuration Of postgres-8:pgsqld > Succeed
Oct 07 14:26:20 rgmanager [postgres-8] Starting Service postgres-8:pgsqld
Oct 07 14:26:20 rgmanager [postgres-8] PID File "/var/run/cluster/postgres-8/postgres-8:pgsqld.pid" Was Removed - Zero length
Oct 07 14:26:20 rgmanager [postgres-8] Looking For IP Addresses
Oct 07 14:26:20 rgmanager [postgres-8] IP 10.0.1.15 found @ /cluster/rm/service[@name="pgsql"]/ip[1]
Oct 07 14:26:21 rgmanager [postgres-8] 1 IP addresses found for pgsql/pgsqld
Oct 07 14:26:21 rgmanager [postgres-8] Looking For IP Addresses > Succeed - IP Addresses Found
Oct 07 14:26:21 rgmanager [postgres-8] Checking: SHA1 checksum of config file /etc/cluster/postgres-8/postgres-8:pgsqld/postgresql.conf
Oct 07 14:26:21 rgmanager [ip] Checking 10.0.1.12, Level 0
Oct 07 14:26:21 rgmanager [ip] Checking 10.0.1.13, Level 0
Oct 07 14:26:21 rgmanager [postgres-8] Checking: SHA1 checksum > succeed
Oct 07 14:26:21 rgmanager [ip] Checking 10.0.1.14, Level 0
Oct 07 14:26:21 rgmanager [ip] 10.0.1.12 present on bridge0
Oct 07 14:26:21 rgmanager [ip] 10.0.1.13 present on bridge0
Oct 07 14:26:21 rgmanager [postgres-8] Generating New Config File /etc/cluster/postgres-8/postgres-8:pgsqld/postgresql.conf From /var/lib/pgsql/data/posOct 07 14:26:21 rgmanager [ip] 10.0.1.14 present on bridge0
Oct 07 14:26:21 rgmanager [postgres-8] #x#x#x# forcing a cr here
Oct 07 14:26:22 rgmanager [postgres-8] Generating New Config File /etc/cluster/postgres-8/postgres-8:pgsqld/postgresql.conf From /var/lib/pgsql/data/posOct 07 14:26:22 rgmanager [ip] Link detected on bridge0
Oct 07 14:26:22 rgmanager [fs] Checking fs "install_fs", Level 10
Oct 07 14:26:22 rgmanager [postgres-8] #x#x#x# forcing a cr here
Oct 07 14:26:22 rgmanager [fs] Checking fs "www_fs", Level 10
Oct 07 14:26:22 rgmanager [postgres-8] Waiting for 2 seconds before calling pg_ctl status..
Oct 07 14:26:24 rgmanager [postgres-8] trying to get status : su - "postgres" -c "/usr/bin/pg_ctl status -D/var/lib/pgsql/data" &> /dev/null
Oct 07 14:26:24 rgmanager [postgres-8] pg_ctl status: failed
Oct 07 14:26:24 rgmanager [postgres-8] Starting Service postgres-8:pgsqld > Failed
Oct 07 14:26:24 rgmanager start on postgres-8 "pgsqld" returned 1 (generic error)

I tried to debug the problem by adding extra ocf_log lines to the postgres-8.sh script. The results are confusing, however: the line that starts the postmaster process does not seem to generate any output at all. Redirecting that output to a file instead of /dev/null produced nothing either.
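The log shows the agent running pg_ctl status exactly once, after a fixed 2-second wait, with all output sent to /dev/null. A minimal debugging sketch, assuming bash: wait_for_status is a hypothetical helper, not part of postgres-8.sh. It retries a status command once per second up to a given number of tries and keeps each attempt's output on stderr, which may help tell a postmaster that starts slowly apart from one that never starts.

```shell
#!/bin/bash
# Sketch only: wait_for_status is a hypothetical helper, not part of
# postgres-8.sh. It runs "$@" up to TRIES times, one second apart, and
# returns 0 as soon as the command succeeds, 1 if it never does.
# Each attempt's stdout goes to stderr instead of /dev/null so the
# reason for a failure stays visible.
wait_for_status() {
    local tries=$1
    shift
    local i
    for ((i = 1; i <= tries; i++)); do
        echo "status attempt $i/$tries: $*" >&2
        if "$@" >&2; then
            return 0
        fi
        # Do not sleep after the last attempt.
        if ((i < tries)); then
            sleep 1
        fi
    done
    return 1
}

# Example, using the user and data directory from the log above:
#   wait_for_status 10 su - postgres -c "/usr/bin/pg_ctl status -D /var/lib/pgsql/data"
```

Run by hand as root on the node, the example shows pg_ctl's own message for every attempt rather than just "failed".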
I also enabled syslog logging for the postgres process in /var/lib/pgsql/data/postgresql.conf. In addition, I checked the latest git version of postgres-8.sh: it contains a small change related to stopping the service, but the start code is unchanged.

I am at a loss, and any help debugging this issue further would be greatly appreciated.

I also found the following small issues:

a) The log lines generated by the generate_config_file() call are missing a line break (CR), so the following messages are printed overlapping them.

b) During start of the service, the variable pguser_group is set with

pguser_group=`groups $OCF_RESKEY_postmaster_user | cut -f1 -d' '`

I believe this is incorrect, as the first field of the groups output is the user name, not the group. In this case it makes no difference, since the user name and group name for postgres are the same, but I believe it should read:

pguser_group=`groups $OCF_RESKEY_postmaster_user | cut -f3 -d' '`

Thanks
Andi
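Regarding point b): with GNU coreutils, `groups USER` prints `USER : group1 group2 ...`, so field 1 is indeed the user name and field 3 the first group listed. A minimal sketch, assuming bash and a local account: instead of parsing that output, `id -gn USER` prints the user's primary group name directly. Both function names below are hypothetical, not taken from postgres-8.sh.

```shell
#!/bin/bash
# Sketch only: primary_group_of and first_group_of are hypothetical
# helper names, not functions from postgres-8.sh.

# `id -gn USER` prints the name of USER's primary group directly,
# so no parsing of `groups` output is needed.
primary_group_of() {
    id -gn "$1"
}

# GNU coreutils `groups USER` prints "USER : group1 group2 ...".
# Field 1 is the user name and field 3 the first group listed,
# which is what the proposed `cut -f3 -d' '` selects.
# Reads the `groups` output on stdin.
first_group_of() {
    cut -f3 -d' '
}

# Examples (assume a local "postgres" account exists):
#   pguser_group=$(primary_group_of postgres)
#   pguser_group=$(groups postgres | first_group_of)
```

The id -gn form does not depend on the output layout of groups, which may differ between implementations.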