* [tabled patch 1/5] Fix crash when stopping slave
From: Pete Zaitcev @ 2010-08-12 19:21 UTC
  To: Jeff Garzik; +Cc: Project Hail List

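When a slave is shut down, it crashes, because tdb_down must not be
called without a prior tdb_up, and tdb_up only happens on a master.
Also, plug a problem inside rtdb_fini: only call tdb_fini if the DB
environment exists, since there may be a gap between the DB going
master and the environment being brought up (this needs a better fix,
but should do for now).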
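The teardown rule described above can be sketched in isolation. This is a minimal, self-contained illustration and not tabled's code. `struct fake_tdb`, `fake_tdb_down()`, `fake_tdb_fini()` and `shutdown_tdb()` are hypothetical stand-ins for the tdb handle, tdb_down, tdb_fini and the shutdown path in main(). The state names are borrowed from the diff below.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for tabled's TDB state and handle. */
enum tdb_state { ST_TDB_INIT, ST_TDB_OPEN, ST_TDB_MASTER, ST_TDB_SLAVE };

struct fake_tdb {
	void *env;	/* non-NULL once the DB environment exists */
	bool up;	/* set by a tdb_up() equivalent, masters only */
};

/* Call counters, so the teardown order can be checked. */
static int down_calls, fini_calls;

static void fake_tdb_down(struct fake_tdb *tdb)
{
	tdb->up = false;
	down_calls++;
}

static void fake_tdb_fini(struct fake_tdb *tdb)
{
	tdb->env = NULL;
	fini_calls++;
}

/*
 * Teardown as arranged by patch 1: undo tdb_up only on a master,
 * then finalize the environment only if it was ever created.
 */
static void shutdown_tdb(enum tdb_state state, struct fake_tdb *tdb)
{
	if (state == ST_TDB_MASTER)
		fake_tdb_down(tdb);
	if (tdb->env)
		fake_tdb_fini(tdb);
}
```

A slave therefore skips the down step entirely. A process that never created its environment skips both steps.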

Signed-off-by: Pete Zaitcev <zaitcev@redhat.com>

---
 server/metarep.c |    9 ++++++++-
 server/server.c  |   11 +++--------
 2 files changed, 11 insertions(+), 9 deletions(-)

commit dd6ed2529efa71902f7ac1958f01a2541b4f9133
Author: Pete Zaitcev <zaitcev@yahoo.com>
Date:   Thu Aug 12 12:27:48 2010 -0600

    Crash on tdb_down if slave.

diff --git a/server/metarep.c b/server/metarep.c
index d3eec49..e13b13f 100644
--- a/server/metarep.c
+++ b/server/metarep.c
@@ -1240,6 +1240,13 @@ int rtdb_restart(struct tablerep *rtdb, bool we_are_master)
 void rtdb_fini(struct tablerep *rtdb)
 {
 	__rtdb_fini(rtdb);
-	tdb_fini(&rtdb->tdb);
+	/*
+	 * This check is ewwww, but unfortunately there's potentially a gap
+	 * between DB going master and us bringing up the environment.
+	 * If we condition the tdb_fini on DB status, we'll end up crashing
+	 * if the server terminates during the gap.
+	 */
+	if (rtdb->tdb.env)
+		tdb_fini(&rtdb->tdb);
 }
 
diff --git a/server/server.c b/server/server.c
index 8859847..1f8164b 100644
--- a/server/server.c
+++ b/server/server.c
@@ -2326,18 +2326,13 @@ int main (int argc, char *argv[])
 	applog(LOG_INFO, "shutting down");
 
 	rc = 0;
-
+	if (tabled_srv.state_tdb == ST_TDB_MASTER)
+		tdb_down(&tdbrep.tdb);
 	cld_end();
 err_cld_session:
 	/* net_close(); */
 err_out_net:
-	if (tabled_srv.state_tdb == ST_TDB_MASTER ||
-	    tabled_srv.state_tdb == ST_TDB_SLAVE) {
-		tdb_down(&tdbrep.tdb);
-		rtdb_fini(&tdbrep);
-	} else if (tabled_srv.state_tdb == ST_TDB_OPEN) {
-		rtdb_fini(&tdbrep);
-	}
+	rtdb_fini(&tdbrep);
 err_rtdb:
 	event_del(&tabled_srv.pevt);
 err_pevt:

* [tabled patch 2/5] Clean name vs host
From: Pete Zaitcev @ 2010-08-12 19:22 UTC
  To: Jeff Garzik; +Cc: Project Hail List

Mop up small problems from the name/host split. In one place we
printed "slave (null)". In another place, printing the name made more
sense.

Most importantly, while the CLD client used the name correctly, we sent
the slave login using the host (the argument to rtdb_start), which was
incorrect. Switch that to the replication name.
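The fallback that makes every consumer agree on one instance name fits in a few lines. This sketch is modeled on the two lines patch 2 adds to main(). `struct srv_names` and `effective_rep_name()` are hypothetical names used only for illustration.

```c
#include <stddef.h>

/* Hypothetical subset of the server's configuration. */
struct srv_names {
	const char *ourhost;	/* local hostname, possibly forced */
	const char *rep_name;	/* TDBRepName, or NULL if not configured */
};

/*
 * Default the replication name to the hostname, so that CLD, the slave
 * login and the status output all use one name for this instance.
 */
static const char *effective_rep_name(struct srv_names *s)
{
	if (s->rep_name == NULL)
		s->rep_name = s->ourhost;
	return s->rep_name;
}
```

Resolving the name once, early, means later code never has to choose between host and name on its own.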

Signed-off-by: Pete Zaitcev <zaitcev@redhat.com>

---
 server/metarep.c |    2 +-
 server/server.c  |   10 ++++++----
 2 files changed, 7 insertions(+), 5 deletions(-)

commit 294393ed1a0789b71b673f894232ba1b5cdb9dc3
Author: Pete Zaitcev <zaitcev@yahoo.com>
Date:   Thu Aug 12 12:30:45 2010 -0600

    Cleanup of name vs host.

diff --git a/server/metarep.c b/server/metarep.c
index e13b13f..9466029 100644
--- a/server/metarep.c
+++ b/server/metarep.c
@@ -884,7 +884,7 @@ static int rtdb_master_login_reply(struct db_conn *dbc, unsigned char *msgbuf)
 	}
 	if (debugging)
 		applog(LOG_DEBUG, "Link login, slave %s dbid %d",
-		       slave->host, slave->dbid);
+		       slave->name, slave->dbid);
 
 	/*
 	 * Dispose of all existing connections. Our current implementation
diff --git a/server/server.c b/server/server.c
index 1f8164b..829d2db 100644
--- a/server/server.c
+++ b/server/server.c
@@ -399,7 +399,7 @@ static void stats_dump(void)
 
 	applog(LOG_INFO, "TDB: group %s state %s host %s rep_port %d dbid %d%s",
 	       tabled_srv.group, state_name_tdb[tabled_srv.state_tdb],
-	       tabled_srv.ourhost, tabled_srv.rep_port, tdbrep.thisid,
+	       tabled_srv.rep_name, tabled_srv.rep_port, tdbrep.thisid,
 	       (tabled_srv.mc_delay)? " mc_delay": "");
 	for (tmp = tabled_srv.rep_remotes; tmp; tmp = tmp->next) {
 		rp = tmp->data;
@@ -447,7 +447,7 @@ bool stat_status(struct client *cli, GList *content)
 		     "<p>TDB: group %s "
 		     "state %s host %s rep_port %d dbid %d%s</p>\r\n",
 		     tabled_srv.group, state_name_tdb[tabled_srv.state_tdb],
-		     tabled_srv.ourhost, tabled_srv.rep_port, tdbrep.thisid,
+		     tabled_srv.rep_name, tabled_srv.rep_port, tdbrep.thisid,
 		     (tabled_srv.mc_delay)? " mc_delay": "") < 0)
 		return false;
 	content = g_list_append(content, str);
@@ -1719,7 +1719,7 @@ int tdb_slave_login_cb(int srcid)
 		if (rtdb_start(&tdbrep, tabled_srv.tdb_dir,
 			       false,
 			       master,
-			       tabled_srv.rep_port, tdb_state_cb)) {
+			       0, tdb_state_cb)) {
 			tabled_srv.state_tdb = ST_TDB_INIT;
 			applog(LOG_ERR, "Failed to open TDB, limping");
 			return -1;
@@ -2248,6 +2248,8 @@ int main (int argc, char *argv[])
 	else if (debugging)
 		applog(LOG_INFO, "Forcing local hostname to %s",
 		       tabled_srv.ourhost);
+	if (!tabled_srv.rep_name)
+		tabled_srv.rep_name = tabled_srv.ourhost;
 
 	/*
 	 * background outselves, write PID file ASAP
@@ -2294,7 +2296,7 @@ int main (int argc, char *argv[])
 	}
 
 	/* late-construct structures with allocations */
-	if (rtdb_init(&tdbrep, tabled_srv.ourhost)) {
+	if (rtdb_init(&tdbrep, tabled_srv.rep_name)) {
 		applog(LOG_WARNING, "rtdb_init");
 		rc = 1;
 		goto err_rtdb;

* [tabled patch 3/5] cleanup a call to closelog()
From: Pete Zaitcev @ 2010-08-12 19:22 UTC
  To: Jeff Garzik; +Cc: Project Hail List

While we're at it, only call closelog() if openlog() was called.
Calling it unconditionally does not crash, but it is not correct either.
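The pairing rule can be sketched with a flag that records whether openlog() ran. The patch itself tests the existing `use_syslog` variable. The `log_opened` flag, the `log_open()`/`log_close()` helpers and the LOG_DAEMON facility below are assumptions for illustration only.

```c
#include <stdbool.h>
#include <syslog.h>

/* Hypothetical flag: true only between openlog() and closelog(). */
static bool log_opened;

static void log_open(bool want_syslog, const char *ident)
{
	if (want_syslog) {
		/* Facility is an assumption; tabled may use another. */
		openlog(ident, LOG_PID, LOG_DAEMON);
		log_opened = true;
	}
}

static void log_close(void)
{
	/* Only undo what was actually done. */
	if (log_opened) {
		closelog();
		log_opened = false;
	}
}
```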

Signed-off-by: Pete Zaitcev <zaitcev@redhat.com>

---
 server/server.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

commit ae8ac067adde81a34c9d6114cfbaa1f95c9b48da
Author: Pete Zaitcev <zaitcev@yahoo.com>
Date:   Thu Aug 12 12:33:39 2010 -0600

    Syslog cleanup.

diff --git a/server/server.c b/server/server.c
index 829d2db..7a9fb7a 100644
--- a/server/server.c
+++ b/server/server.c
@@ -2344,7 +2344,8 @@ err_evpipe:
 	unlink(tabled_srv.pid_file);
 	close(tabled_srv.pid_fd);
 err_out:
-	closelog();
+	if (use_syslog)
+		closelog();
 	return rc;
 }
 

* [tabled patch 4/5] Support "auto" replication port
From: Pete Zaitcev @ 2010-08-12 19:22 UTC
  To: Jeff Garzik; +Cc: Project Hail List

Allow the replication master to listen on a random (kernel-assigned) port.

The patch is somewhat larger than expected because previously the
MASTER file was written in full right after locking. Now it may be
written without the listening parameters, and the slaves must be
ready to deal with that.
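What "ready to deal with it" means on the slave side can be sketched as a parser. It requires the name and treats host and port as optional, forgetting both unless both are present. This mirrors the new cldu_parse_master logic. `struct master_info`, `get_field()` and `parse_master()` are hypothetical. The port check here is stricter than the patch because it also rejects trailing garbage via strtol's end pointer.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parsed view of a MASTER file. */
struct master_info {
	char name[64];
	char host[64];		/* empty if not posted yet */
	unsigned int port;	/* 0 if not posted yet */
};

/* Copy the value of a "key: value" line into out; false if absent/too long. */
static bool get_field(const char *buf, const char *key, char *out,
		      size_t outlen)
{
	size_t klen = strlen(key);
	const char *p = buf;

	while (*p) {
		const char *eol = strchr(p, '\n');
		size_t linelen = eol ? (size_t)(eol - p) : strlen(p);

		if (linelen > klen + 1 && !strncmp(p, key, klen) &&
		    p[klen] == ':') {
			const char *v = p + klen + 1;
			size_t vlen;

			while (v < p + linelen && *v == ' ')
				v++;
			vlen = (size_t)(p + linelen - v);
			if (vlen == 0 || vlen >= outlen)
				return false;
			memcpy(out, v, vlen);
			out[vlen] = 0;
			return true;
		}
		if (!eol)
			break;
		p = eol + 1;
	}
	return false;
}

/* Name is mandatory; host and port are used only as a pair. */
static bool parse_master(const char *buf, struct master_info *mi)
{
	char portbuf[16];
	char *end;
	long n;

	memset(mi, 0, sizeof(*mi));
	if (!get_field(buf, "name", mi->name, sizeof(mi->name)))
		return false;
	if (!get_field(buf, "host", mi->host, sizeof(mi->host)) ||
	    !get_field(buf, "port", portbuf, sizeof(portbuf))) {
		mi->host[0] = 0;	/* master not listening yet */
		return true;
	}
	n = strtol(portbuf, &end, 10);
	if (*end != '\0' || n <= 0 || n >= 65536)
		return false;
	mi->port = (unsigned int)n;
	return true;
}
```

A slave that gets back an empty host keeps waiting rather than reusing stale contact data, much like the "Master not up yet" path in __rtdb_start.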

Unlike the auto client port, we do not need to write any "accessor"
files, because we already report the host and port through CLD.
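The two stages of writing the MASTER file can be sketched with snprintf. First comes the name alone, as cldu_set_master now writes it. Later comes the full record, as cld_post_rep_conn writes it once the socket is bound. `format_master()` is a hypothetical helper, and it uses snprintf into a caller buffer instead of the GNU asprintf used in the patch.

```c
#include <stdio.h>
#include <string.h>

/*
 * Format MASTER file contents. With no host or port yet (listener not
 * bound, e.g. TDBRepPort "auto"), write only the name.
 * Returns the length written, or -1 on bad input or truncation.
 */
static int format_master(char *buf, size_t len, const char *name,
			 const char *host, unsigned int port)
{
	int n;

	if (name == NULL || strlen(name) == 0)
		return -1;
	if (host == NULL || port == 0)
		n = snprintf(buf, len, "name: %s\n", name);
	else
		n = snprintf(buf, len, "name: %s\nhost: %s\nport: %u\n",
			     name, host, port);
	if (n < 0 || (size_t)n >= len)
		return -1;
	return n;
}
```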

Listening on random ports has security implications: a fixed port is
easier to firewall, and replication has no authentication or
authorization yet.
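The mechanism behind "auto" is standard POSIX sockets. Bind to port 0, then ask getsockname() which port the kernel picked. This is roughly what rtdb_rep_listen does after the patch. The sketch is IPv4-only and binds to loopback to stay self-contained, whereas a real server would normally bind INADDR_ANY. `listen_any_port()` is a hypothetical name. getsockname() takes a `struct sockaddr *`, so passing `&addr4` or `&addr6` as the patch does needs a cast to avoid an incompatible-pointer warning.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Open a listening socket on port (0 = let the kernel choose).
 * Returns the fd and stores the bound port in *bound_port, or -1.
 */
static int listen_any_port(unsigned short port, unsigned short *bound_port)
{
	struct sockaddr_in addr;
	socklen_t addr_len = sizeof(addr);
	int fd;

	fd = socket(AF_INET, SOCK_STREAM, 0);
	if (fd < 0)
		return -1;

	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	addr.sin_port = htons(port);

	if (bind(fd, (struct sockaddr *) &addr, sizeof(addr)) < 0 ||
	    listen(fd, 16) < 0)
		goto err;

	/* Ask which port was actually assigned. */
	if (getsockname(fd, (struct sockaddr *) &addr, &addr_len) < 0)
		goto err;

	*bound_port = ntohs(addr.sin_port);
	return fd;

err:
	close(fd);
	return -1;
}
```

The discovered port is only known after bind(). That is why the full MASTER record can no longer be written at lock time.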

Signed-off-by: Pete Zaitcev <zaitcev@redhat.com>

---
 doc/etc.tabled.conf   |   16 ++-
 server/cldu.c         |  166 ++++++++++++++++++++++++++--------------
 server/config.c       |   22 +++--
 server/metarep.c      |   34 ++++++++
 server/tabled.h       |    1 
 test/tabled-test.conf |    2 
 6 files changed, 173 insertions(+), 68 deletions(-)

commit 7ca8a587348e315ab7c6c9e32476d8fa387718d5
Author: Pete Zaitcev <zaitcev@yahoo.com>
Date:   Thu Aug 12 12:37:27 2010 -0600

    TDBRepPort "auto".

diff --git a/doc/etc.tabled.conf b/doc/etc.tabled.conf
index c3b1d1d..5112e8a 100644
--- a/doc/etc.tabled.conf
+++ b/doc/etc.tabled.conf
@@ -12,14 +12,22 @@
 </Listen>
 
 <!--
-  One group per DB, don't skimp on groups. Also, make sure the replication
-  ports do not conflict when you make boxes to host several groups or use
-  replication instances iwth TDBRepName.
+  One group per DB, don't skimp on groups.
   -->
 <Group>ultracart2</Group>
+
 <TDB>/path/tabled-uc2/</TDB>        <!-- mkdir -p /path/tabled-uc2 -->
-<!-- <TDBRepName>12345.my_local_node_name.example.com</TDBRepName> -->
+
+<!--
+  The usual practice is to set a fixed TDBRepPort (8083) because this
+  makes firewall configuration easy. Remember that replication
+  has no authentication and authorization whatsoever for now!
+  When running two test instances on the same host, you may use "auto".
+  But if so, do not forget to set replication instances with TDBRepName.
+  By default, the hostname serves as the instance name and the port is "auto".
+  -->
 <TDBRepPort>8083</TDBRepPort>
+<!-- <TDBRepName>inst-b.my_local_node_name.example.com</TDBRepName> -->
 
 <!--
   The clause <CLD> is not to be used in production configurations.
diff --git a/server/cldu.c b/server/cldu.c
index 45a6a83..57d486e 100644
--- a/server/cldu.c
+++ b/server/cldu.c
@@ -67,7 +67,6 @@ struct cld_session {
 
 	char *thisname;
 	char *thisgroup;
-	char *thishost;
 	char *cfname;		/* /tabled-group directory */
 	struct ncld_fh *cfh;	/* /tabled-group directory, keep open for scan */
 	char *ffname;		/* /tabled-group/thisname */
@@ -119,24 +118,17 @@ static int cldu_nextactive(struct cld_session *sp)
  * chunkservers that it uses, so this function only takes one group argument.
  */
 static int cldu_setgroup(struct cld_session *sp,
-			 const char *thisgroup, const char *thishost,
-			 const char *thisname)
+			 const char *thisgroup, const char *thisname)
 {
 	char *mem;
 
 	if (thisgroup == NULL) {
 		thisgroup = "default";
 	}
-	if (thisname == NULL) {
-		thisname = thishost;
-	}
 
 	sp->thisgroup = strdup(thisgroup);
 	if (!sp->thisgroup)
 		goto err_oom;
-	sp->thishost = strdup(thishost);
-	if (!sp->thishost)
-		goto err_oom;
 	sp->thisname = strdup(thisname);
 	if (!sp->thisname)
 		goto err_oom;
@@ -256,59 +248,82 @@ static void cldu_parse_master(const char *mfname, const char *mfile, long len)
 			applog(LOG_DEBUG, "%s: No name", mfname);
 		return;
 	}
+	if (namelen >= sizeof(namebuf)) {
+		applog(LOG_ERR, "Long master name");
+		return;
+	}
+	memcpy(namebuf, name, namelen);
+	namebuf[namelen] = 0;
+
 	if (!host || !hostlen) {
 		if (debugging)
 			applog(LOG_DEBUG, "%s: No host", mfname);
-		return;
+		hostlen = 0;
 	}
 	if (!port || !portlen) {
 		if (debugging)
 			applog(LOG_DEBUG, "%s: No port", mfname);
-		return;
+		portlen = 0;
 	}
 
-	if (namelen >= sizeof(namebuf)) {
-		applog(LOG_ERR, "Long master name");
-		return;
-	}
-	memcpy(namebuf, name, namelen);
-	namebuf[namelen] = 0;
+	if (hostlen != 0 && portlen != 0) {
 
-	if (hostlen >= sizeof(hostbuf)) {
-		applog(LOG_ERR, "Long host");
-		return;
-	}
-	memcpy(hostbuf, host, hostlen);
-	hostbuf[hostlen] = 0;
+		if (hostlen >= sizeof(hostbuf)) {
+			applog(LOG_ERR, "Long host");
+			return;
+		}
+		memcpy(hostbuf, host, hostlen);
+		hostbuf[hostlen] = 0;
 
-	if (portlen >= sizeof(portbuf)) {
-		applog(LOG_ERR, "Long port");
-		return;
-	}
-	memcpy(portbuf, port, portlen);
-	portbuf[portlen] = 0;
-	portnum = strtol(port, NULL, 10);
-	if (portnum <= 0 || portnum >= 65536) {
-		applog(LOG_ERR, "Bad port %s", portbuf);
-		return;
-	}
+		if (portlen >= sizeof(portbuf)) {
+			applog(LOG_ERR, "Long port");
+			return;
+		}
+		memcpy(portbuf, port, portlen);
+		portbuf[portlen] = 0;
+		portnum = strtol(port, NULL, 10);
+		if (portnum <= 0 || portnum >= 65536) {
+			applog(LOG_ERR, "Bad port %s", portbuf);
+			return;
+		}
 
-	rp = tdb_find_remote_byname(namebuf);
-	if (!rp) {
+		rp = tdb_find_remote_byname(namebuf);
+		if (!rp) {
+			if (debugging)
+				applog(LOG_DEBUG, "%s: Not found master %s",
+				       mfname, namebuf);
+			return;
+		}
 		if (debugging)
-			applog(LOG_DEBUG, "%s: Not found master %s",
-			       mfname, namebuf);
-		return;
-	}
+			applog(LOG_DEBUG, "Found master %s host %s port %u",
+			       namebuf, hostbuf, portnum);
 
-	if (debugging)
-		applog(LOG_DEBUG, "Found master %s host %s port %u",
-		       namebuf, hostbuf, portnum);
+		free(rp->host);
+		rp->host = strdup(hostbuf);
+		rp->port = portnum;
+		if (!rp->host)
+			return;
+	} else {
 
-	rp->host = strdup(hostbuf);
-	rp->port = portnum;
-	if (!rp->host)
-		return;
+		rp = tdb_find_remote_byname(namebuf);
+		if (!rp) {
+			if (debugging)
+				applog(LOG_DEBUG, "%s: Not found master %s",
+				       mfname, namebuf);
+			return;
+		}
+		if (debugging)
+			applog(LOG_DEBUG, "Found master %s", namebuf);
+
+		/*
+		 * At this point some other node owns the MASTER file, but
+		 * it did not supply the host and port. There is no reason
+		 * to rely on obsolete contact information, so remove it.
+		 */
+		free(rp->host);
+		rp->host = NULL;
+		rp->port = 0;
+	}
 	tabled_srv.rep_master = rp;
 }
 
@@ -357,8 +372,6 @@ static void cldu_get_master(const char *mfname, struct ncld_fh *mfh)
  * N.B. Only call this if you know that mfh is closed or never open:
  * right after cldu_set_cldc (disposing of session closes handles),
  * or when we were slave and so should not kept mfh ...
- * FIXME this will become more interesting when we keep mfh open in slave
- * state so we can have outstanding locks for master failover notification.
  */
 static int cldu_set_master(struct cld_session *sp)
 {
@@ -391,8 +404,13 @@ static int cldu_set_master(struct cld_session *sp)
 		goto err_lock;
 	}
 
-	len = asprintf(&buf, "name: %s\nhost: %s\nport: %u\n",
-		       sp->thisname, sp->thishost, tabled_srv.rep_port);
+	/*
+	 * If "auto" is used, we do not know the replication socket host
+	 * and port at this time, so we just write the name and expect
+	 * the caller to update the MASTER file later. In case of a fixed
+	 * host and port we can write it here, but there is no point.
+	 */
+	len = asprintf(&buf, "name: %s\n", sp->thisname);
 	if (len < 0) {
 		applog(LOG_ERR, "internal error: no core");
 		goto err_wmem;
@@ -879,7 +897,7 @@ int cld_begin(const char *thishost, const char *thisgroup,
 
 	evtimer_set(&ses.tm_rescan, cldu_tm_rescan, &ses);
 
-	if (cldu_setgroup(sp, thisgroup, thishost, thisname)) {
+	if (cldu_setgroup(sp, thisgroup, thisname)) {
 		/* Already logged error */
 		goto err_group;
 	}
@@ -981,6 +999,48 @@ void cldu_add_host(const char *hostname, unsigned int port)
 	sp->forced_hosts = true;
 }
 
+void cld_post_rep_conn(const char *rep_host, unsigned int rep_port)
+{
+	static struct cld_session *sp = &ses;
+	char *buf;
+	int len;
+	int rc;
+
+	if (!sp->nsp || sp->is_dead)
+		return;
+	if (!sp->mfh) {
+		/*
+		 * We should only get here when we are a master, and since
+		 * the session is up, the MASTER handle must be present.
+		 * Report an internal error.
+		 */
+		applog(LOG_WARNING,
+		       "Unable to post connection, no MASTER file");
+		return;
+	}
+
+	len = asprintf(&buf, "name: %s\nhost: %s\nport: %u\n",
+		       sp->thisname, rep_host, rep_port);
+	if (len < 0) {
+		applog(LOG_ERR, "internal error: no core");
+		goto err_wmem;
+	}
+
+	rc = ncld_write(sp->mfh, buf, len);
+	if (rc) {
+		applog(LOG_ERR, "CLD put(%s) failed: %d", sp->mfname, rc);
+		goto err_write;
+	}
+
+	free(buf);
+	return;
+
+ err_write:
+	free(buf);
+ err_wmem:
+	return;
+}
+
 void cld_end(void)
 {
 	static struct cld_session *sp = &ses;
@@ -1012,8 +1072,6 @@ void cld_end(void)
 	sp->mfname = NULL;
 	free(sp->thisgroup);
 	sp->thisgroup = NULL;
-	free(sp->thishost);
-	sp->thishost = NULL;
 	free(sp->thisname);
 	sp->thisname = NULL;
 }
diff --git a/server/config.c b/server/config.c
index 293a5dd..f94886e 100644
--- a/server/config.c
+++ b/server/config.c
@@ -211,15 +211,20 @@ static void cfg_elm_end (GMarkupParseContext *context,
 			return;
 		}
 
-		n = strtol(cc->text, NULL, 10);
-		if (n <= 0 || n >= 65536) {
-			applog(LOG_WARNING,
-			       "TDBRepPort '%s' invalid, ignoring", cc->text);
-			free(cc->text);
-			cc->text = NULL;
-			return;
+		if (!strcmp(cc->text, "auto")) {
+			tabled_srv.rep_port = 0;
+		} else {
+			n = strtol(cc->text, NULL, 10);
+			if (n <= 0 || n >= 65536) {
+				applog(LOG_WARNING,
+				       "TDBRepPort '%s' invalid, ignoring",
+				       cc->text);
+				free(cc->text);
+				cc->text = NULL;
+				return;
+			}
+			tabled_srv.rep_port = n;
 		}
-		tabled_srv.rep_port = n;
 		free(cc->text);
 		cc->text = NULL;
 	}
@@ -432,7 +437,6 @@ void read_config(void)
 	memset(&ctx, 0, sizeof(struct config_context));
 
 	tabled_srv.port = strdup("8080");
-	tabled_srv.rep_port = 8083;
 
 	if (!g_file_get_contents(tabled_srv.config, &text, &len, NULL)) {
 		applog(LOG_ERR, "failed to read config file %s",
diff --git a/server/metarep.c b/server/metarep.c
index 9466029..49c5157 100644
--- a/server/metarep.c
+++ b/server/metarep.c
@@ -676,6 +676,7 @@ static int rtdb_rep_listen(struct tablerep *rtdb, unsigned short port)
 {
 	struct sockaddr_in addr4;
 	struct sockaddr_in6 addr6;
+	socklen_t addr_len;
 	int rc;
 
 	memset(&addr6, 0, sizeof(addr6));
@@ -694,6 +695,16 @@ static int rtdb_rep_listen(struct tablerep *rtdb, unsigned short port)
 			  tdb_conn_event, rtdb);
 		if (event_add(&rtdb->lsev6, NULL) < 0)
 			applog(LOG_ERR, "event_add failed");
+
+		if (!port) {
+			addr_len = sizeof(addr6);
+			if (getsockname(rtdb->sockfd6, (struct sockaddr *) &addr6, &addr_len) < 0) {
+				applog(LOG_ERR, "getsockname failed: %s",
+				       strerror(errno));
+			} else {
+				port = ntohs(addr6.sin6_port);
+			}
+		}
 	}
 
 	memset(&addr4, 0, sizeof(addr4));
@@ -712,8 +723,21 @@ static int rtdb_rep_listen(struct tablerep *rtdb, unsigned short port)
 			  tdb_conn_event, rtdb);
 		if (event_add(&rtdb->lsev4, NULL) < 0)
 			applog(LOG_ERR, "event_add failed");
+
+		if (!port) {
+			addr_len = sizeof(addr4);
+			if (getsockname(rtdb->sockfd4, (struct sockaddr *) &addr4, &addr_len) < 0) {
+				applog(LOG_ERR, "getsockname failed: %s",
+				       strerror(errno));
+			} else {
+				port = ntohs(addr4.sin_port);
+			}
+		}
 	}
 
+	if (port)
+		cld_post_rep_conn(tabled_srv.ourhost, port);
+
 	return 0;
 }
 
@@ -1053,6 +1077,11 @@ static int rtdb_rep_connect(struct db_conn *dbc)
 	return 0;
 }
 
+/*
+ * Sadly, this has to be idempotent, because it's called for cleanup
+ * in rtdb_start, and because various link resets may invoke this, then fail.
+ * So, clear all the tested flags.
+ */
 static void __rtdb_fini(struct tablerep *rtdb)
 {
 	struct db_conn *dbc;
@@ -1097,6 +1126,11 @@ static int __rtdb_start(struct tablerep *rtdb, bool we_are_master,
 			applog(LOG_INFO, "No master yet"); /* P3 */
 			return -1;
 		}
+		if (!rep_master->host || !rep_master->port) {
+			/* FIXME This should be retried quicker than usual. */
+			applog(LOG_INFO, "Master not up yet"); /* P3 */
+			return -1;
+		}
 		if (!rtdb->mdbc) {
 			dbc = dbc_alloc(rtdb, rep_master);
 			if (!dbc)
diff --git a/server/tabled.h b/server/tabled.h
index c90511c..d4d2048 100644
--- a/server/tabled.h
+++ b/server/tabled.h
@@ -370,6 +370,7 @@ extern void cld_init(void);
 extern int cld_begin(const char *fqdn, const char *group, const char *name,
 		int verbose);
 extern void cldu_add_host(const char *host, unsigned int port);
+extern void cld_post_rep_conn(const char *rep_host, unsigned int rep_port);
 extern void cld_end(void);
 
 /* util.c */
diff --git a/test/tabled-test.conf b/test/tabled-test.conf
index f06ae2c..af2227c 100644
--- a/test/tabled-test.conf
+++ b/test/tabled-test.conf
@@ -6,7 +6,7 @@
   <PortFile>tabled.acc</PortFile>
 </Listen>
 <TDB>data/tdb</TDB>
-<TDBRepPort>18083</TDBRepPort>
+<TDBRepPort>auto</TDBRepPort>
 
 <CLD>
   <PortFile>cld.port</PortFile>

* [tabled patch 5/5] Metadata replication test
From: Pete Zaitcev @ 2010-08-12 19:22 UTC
  To: Jeff Garzik; +Cc: Project Hail List

This patch adds a build-time test for the metadata replication feature.
It is not a complete test. In particular, it does not verify that the
database was synced correctly at the slave. It does verify that a second
tabled instance joins the group and reports itself as Slave, which
covers quite a bit.
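The test's pass criterion is a line in the slave's log, which the script checks with `grep -s`. The same check can be sketched in C. `log_has_line()` is a hypothetical helper, and it treats a missing file as "not found", which makes the script fail just as grep's nonzero exit does. It reads line by line with a fixed buffer, so a match split across a very long line (over 511 bytes) would be missed.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Return true if any line of the file at path contains needle. */
static bool log_has_line(const char *path, const char *needle)
{
	char line[512];
	FILE *fp = fopen(path, "r");
	bool found = false;

	if (!fp)
		return false;	/* missing log: treat as failure */
	while (fgets(line, sizeof(line), fp)) {
		if (strstr(line, needle)) {
			found = true;
			break;
		}
	}
	fclose(fp);
	return found;
}
```

For example, `log_has_line("tabled-bis.log", "TDB: group default state Slave")` corresponds to the script's grep.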

Signed-off-by: Pete Zaitcev <zaitcev@redhat.com>

---
 test/Makefile.am          |    2 +
 test/metadata-rep         |   48 ++++++++++++++++++++++++++++++++++++
 test/prep-db              |    2 +
 test/stop-daemon          |   15 ++++++++---
 test/tabled-test-bis.conf |   21 +++++++++++++++
 test/wait-for-listen.c    |   11 +++++++-
 6 files changed, 94 insertions(+), 5 deletions(-)

commit 5a8b74d5541bbdf111f037246883182efaf637ba
Author: Pete Zaitcev <zaitcev@yahoo.com>
Date:   Thu Aug 12 12:44:30 2010 -0600

    Test for metadata replication (incomplete: does not verify
    key integrity, just that the server thinks it went slave).

diff --git a/test/Makefile.am b/test/Makefile.am
index 9264cba..7799a79 100644
--- a/test/Makefile.am
+++ b/test/Makefile.am
@@ -7,6 +7,7 @@ EXTRA_DIST =			\
 	users.data		\
 	chunkd-test.conf	\
 	tabled-test.conf	\
+	tabled-test-bis.conf	\
 	prep-db			\
 	start-daemon		\
 	pid-exists		\
@@ -26,6 +27,7 @@ TESTS =				\
 	large-object		\
 	hdr-content-type	\
 	hdr-meta		\
+	metadata-rep		\
 	stop-daemon		\
 	clean-db
 
diff --git a/test/metadata-rep b/test/metadata-rep
new file mode 100644
index 0000000..b8b84f7
--- /dev/null
+++ b/test/metadata-rep
@@ -0,0 +1,48 @@
+#!/bin/sh
+
+#
+# First, start the tabled-bis
+#
+# We use tee so that if things go pear-shaped in the Fedora build system,
+# we get to see the error. At the same time, we use the log to see
+# how far the test went.
+#
+# However, a pipeline with tee forces us to use a non-daemonized tabled
+# and put it into the background with normal shell means.
+#
+(../server/tabled -C $top_srcdir/test/tabled-test-bis.conf -E -F 2>&1 | tee tabled-bis.log) &
+
+# Why so much? Well... The time needed is gated by the time the master
+# needs to re-read the group in CLD. Until the re-read happens, the
+# would-be slave is rejected with "unknown slave". We could easily
+# pre-seed the group with the TDBRepName using cldcli, but that would
+# be cheating. So, that's 50s (cldu_rescan_delay). Then, we need to
+# give slave time to retry, done once in 35s (TABLED_MCWAIT_SEC).
+# That's 85s. And add a few seconds for databases to sync.
+echo "Sleeping 90s"
+sleep 90
+
+# Signal slave to dump the authoritative state
+kill -USR1 $(cat tabled-bis.pid)
+sleep 3
+
+grep -s "TDB: group default state Slave" tabled-bis.log
+if [ "$?" != 0 ]; then
+  echo "The slave state is not found in the log" >&2
+  exit 1
+fi
+
+kill $(cat tabled.pid)
+
+# This is the pits. The wait-for-listen waits for 25s, but sometimes fails.
+# But when it succeeds, it's always "went up after 0 s". What the deuce.
+sleep 10
+
+echo "Running wait-for-listen"
+./wait-for-listen tabled-bis.acc
+if [ "$?" != 0 ]; then
+  # don't echo anything, wait-for-listen prints an error
+  exit 1
+fi
+
+exit 0
diff --git a/test/prep-db b/test/prep-db
index ef90e1c..a44c9be 100755
--- a/test/prep-db
+++ b/test/prep-db
@@ -2,10 +2,12 @@
 
 DATADIR=data
 TDBDIR=$DATADIR/tdb
+TDB2DIR=$DATADIR/tdb-bis
 CLDDIR=$DATADIR/cld
 CHUNKDIR=$DATADIR/chunk
 
 mkdir -p $TDBDIR
+mkdir -p $TDB2DIR
 mkdir -p $CLDDIR
 mkdir -p $CHUNKDIR
 
diff --git a/test/stop-daemon b/test/stop-daemon
index de5db0c..17a9b13 100755
--- a/test/stop-daemon
+++ b/test/stop-daemon
@@ -19,28 +19,35 @@ killpid () {
 	return 1
 }
 
-rm -f cld.port tabled.acc
+rm -f cld.port tabled.acc tabled-bis.acc tabled-bis.log
 
 ret=0
 
 if [ ! -f tabled.pid ]
 then
 	# Just a warning. Previous test somehow made the daemon to die.
-	echo "No tabled PID file found." >&2
+	echo "No tabled.pid file found." >&2
 else
 	killpid tabled.pid || ret=1
 fi
 
+if [ ! -f tabled-bis.pid ]
+then
+	echo "No tabled-bis.pid file found." >&2
+else
+	killpid tabled-bis.pid || ret=1
+fi
+
 if [ ! -f chunkd.pid ]
 then
-	echo "No chunkd PID file found." >&2
+	echo "No chunkd.pid file found." >&2
 else
 	killpid chunkd.pid || ret=1
 fi
 
 if [ ! -f cld.pid ]
 then
-	echo "No cld PID file found." >&2
+	echo "No cld.pid file found." >&2
 else
 	killpid cld.pid || ret=1
 fi
diff --git a/test/tabled-test-bis.conf b/test/tabled-test-bis.conf
new file mode 100644
index 0000000..0130491
--- /dev/null
+++ b/test/tabled-test-bis.conf
@@ -0,0 +1,21 @@
+
+<PID>tabled-bis.pid</PID>
+<ForceHost>localhost.localdomain</ForceHost>
+<Listen>
+  <Port>auto</Port>
+  <PortFile>tabled-bis.acc</PortFile>
+</Listen>
+<TDB>data/tdb-bis</TDB>
+<TDBRepPort>auto</TDBRepPort>
+<TDBRepName>bis.localhost.localdomain</TDBRepName>
+
+<CLD>
+  <PortFile>cld.port</PortFile>
+  <Host>localhost</Host>
+</CLD>
+
+<ChunkUser>testuser</ChunkUser>
+<ChunkKey>testuser</ChunkKey>
+
+<!-- Setting "default" just to make sure tabled configures it correctly. -->
+<Group>default</Group>
diff --git a/test/wait-for-listen.c b/test/wait-for-listen.c
index fef5028..06bbaeb 100644
--- a/test/wait-for-listen.c
+++ b/test/wait-for-listen.c
@@ -86,13 +86,22 @@ int main(int argc, char **argv)
 {
 	struct server_node snode, *sn = &snode;
 	time_t start_time;
-	const static char accname[] = TEST_FILE_TB;
+	char *accname;
 	char accbuf[80];
 	char *s;
 	int cnt;
 	int sfd;
 	int rc;
 
+	if (argc == 2) {
+		accname = argv[1];
+	} else if (argc == 1) {
+		accname = TEST_FILE_TB;
+	} else {
+		fprintf(stderr, "Usage: wait-for-listen [accessor.file]\n");
+		exit(1);
+	}
+
 	cnt = 0;
 	for (;;) {
 		rc = tb_readport(accname, accbuf, sizeof(accbuf));

* Re: [tabled patch 4/5] Support "auto" replication port
From: Jeff Garzik @ 2010-08-13 20:21 UTC
  To: Pete Zaitcev; +Cc: Project Hail List

On 08/12/2010 03:22 PM, Pete Zaitcev wrote:
> Allow random ports for replication master to listen on.
>
> The patch is somewhat larger than expected, because before we had
> the MASTER file written right after locking. Now we may have it
> written without listening parameters, and the slaves must be
> ready to deal with it.
>
> Unlike the auto client port, we do not need to write any "accessor"
> files, because we already report the host and port through CLD.
>
> Listening on random ports has security implications.
>
> Signed-off-by: Pete Zaitcev<zaitcev@redhat.com>

applied 1-4 of 5

