* [Patch 2/2] tabled: rotate the input chunkserver
From: Pete Zaitcev @ 2009-10-23 5:26 UTC (permalink / raw)
To: Jeff Garzik; +Cc: Project Hail List
Another step toward data redundancy: select the input chunkserver more
carefully than just using the first one. Note: at present, if the
chunkserver dies while we're accessing it, we do not recover.
However, the application can retry and hit an available one.
Signed-off-by: Pete Zaitcev <zaitcev@redhat.com>
---
server/object.c | 49 ++++++++++++++++++++++++++++++++--------------
server/tabled.h | 1
2 files changed, 36 insertions(+), 14 deletions(-)
commit 13ba45da80def6e206ba0ebb69185c20461f523a
Author: Master <zaitcev@hitlain.zaitcev.lan>
Date: Thu Oct 22 22:51:23 2009 -0600
Rotate input chunkservers.
diff --git a/server/object.c b/server/object.c
index 8fc5c6f..69f3af0 100644
--- a/server/object.c
+++ b/server/object.c
@@ -1054,7 +1054,7 @@ bool object_get_body(struct client *cli, const char *user, const char *bucket,
{
char *md5;
char timestr[64], modstr[64], *hdr, *tmp;
- int rc, i;
+ int rc, i, n;
enum errcode err = InternalError;
char buf[4096];
ssize_t bytes;
@@ -1150,35 +1150,56 @@ bool object_get_body(struct client *cli, const char *user, const char *bucket,
cli->in_objid = GUINT64_FROM_LE(obj->d.a.oid);
+ n = 0;
for (i = 0; i < MAXWAY; i++ ) {
uint32_t nid;
+ nid = GUINT32_FROM_LE(obj->d.a.nidv[i]);
+ if (nid)
+ n++;
+ }
+ cli->in_retry = n * 2;
- nid = GUINT32_FROM_LE(obj->d.a.nidv[0]);
- if (!nid)
- continue;
- stnode = stor_node_by_nid(nid);
- if (stnode) /* FIXME temporarily 1-way */
- break;
+ stnode_open_retry:
+ if (cli->in_retry == 0) {
+ applog(LOG_ERR, "No input nodes for oid %llX", cli->in_objid);
+ goto err_out_str;
+ }
+ --cli->in_retry;
- applog(LOG_ERR, "No chunk node nid %u for oid %llX",
- nid, cli->in_objid);
+ stnode = NULL;
+ n = rand() % MAXWAY;
+ for (i = 0; i < MAXWAY; i++ ) {
+ uint32_t nid;
+ nid = GUINT32_FROM_LE(obj->d.a.nidv[n]);
+ if (nid) {
+ stnode = stor_node_by_nid(nid);
+ if (stnode) {
+ if (debugging)
+ applog(LOG_DEBUG,
+ "Selected nid %u for oid %llX",
+ nid, cli->in_objid);
+ break;
+ }
+ }
+ n = (n + 1) % MAXWAY;
}
if (!stnode)
- goto err_out_str;
+ goto stnode_open_retry;
rc = stor_open(&cli->in_ce, stnode);
if (rc < 0) {
applog(LOG_WARNING, "Cannot open input chunk, nid %u (%d)",
stnode->id, rc);
- goto err_out_str;
+ goto stnode_open_retry;
}
rc = stor_open_read(&cli->in_ce, object_get_event, cli->in_objid,
&objsize);
if (rc < 0) {
- applog(LOG_ERR, "open oid %llX failed, nid %u (%d)",
- (unsigned long long) cli->in_objid, stnode->id, rc);
- goto err_out_str;
+ applog(LOG_ERR, "Cannot start nid %u for oid %llX (%d)",
+ stnode->id, (unsigned long long) cli->in_objid, rc);
+ stor_close(&cli->in_ce);
+ goto stnode_open_retry;
}
cli->in_ce.cli = cli;
diff --git a/server/tabled.h b/server/tabled.h
index d10835e..e4dbbd5 100644
--- a/server/tabled.h
+++ b/server/tabled.h
@@ -181,6 +181,7 @@ struct client {
unsigned char *in_mem;
uint64_t in_objid;
long in_len;
+ int in_retry;
/* we put the big arrays and objects at the end... */
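The selection logic in the object.c hunk can be read in isolation as "start at a random slot, scan all slots with wrap-around, and give up after a bounded number of passes". The sketch below restates that idea outside of tabled, under stated assumptions: MAXWAY is set to 3 only for illustration, node_lookup_fn is a hypothetical stand-in for stor_node_by_nid(), and pick_replica() and retry_budget() are names invented here, not functions from the patch.

```c
#include <assert.h>
#include <stdint.h>

#define MAXWAY 3	/* assumed for illustration; the real value is defined by tabled */

/* Hypothetical stand-in for stor_node_by_nid(): nonzero if the node is usable. */
typedef int (*node_lookup_fn)(uint32_t nid);

/*
 * Scan all MAXWAY slots beginning at `start`, wrapping around at the end.
 * Returns the index of the first slot that holds a nonzero nid accepted by
 * `lookup`, or -1 if no slot qualifies.
 */
static int pick_replica(const uint32_t nidv[MAXWAY], int start,
			node_lookup_fn lookup)
{
	int n, i;

	assert(start >= 0);
	n = start % MAXWAY;
	for (i = 0; i < MAXWAY; i++) {
		uint32_t nid = nidv[n];

		if (nid && lookup(nid))
			return n;
		n = (n + 1) % MAXWAY;
	}
	return -1;
}

/* Retry budget in the spirit of the patch: twice the number of populated slots. */
static int retry_budget(const uint32_t nidv[MAXWAY])
{
	int i, n = 0;

	for (i = 0; i < MAXWAY; i++) {
		if (nidv[i])
			n++;
	}
	return n * 2;
}

/* Demo lookup used only for the example: pretend only nid 7 is reachable. */
static int only_nid_7_is_up(uint32_t nid)
{
	return nid == 7;
}
```

With nidv = { 5, 0, 7 } and only_nid_7_is_up() as the lookup, pick_replica() returns slot 2 from any starting index, and retry_budget() returns 4. A caller would pass rand() % MAXWAY as the start and loop until the budget runs out, which is roughly what the stnode_open_retry label does in the patch.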
* Re: [Patch 2/2] tabled: rotate the input chunkserver
From: Jeff Garzik @ 2009-10-23 20:40 UTC (permalink / raw)
To: Pete Zaitcev; +Cc: Project Hail List
On 10/23/2009 01:26 AM, Pete Zaitcev wrote:
> Another step to data redundancy: select the input chunkserver more
> properly than just using the first one. Note: presently, if the
> chunkserver dies while we're accessing it, we do not recover.
> However, application can retry and hit an available one.
>
> Signed-off-by: Pete Zaitcev <zaitcev@redhat.com>
applied 1-2
As long as we (a) return an error documented in S3 spec as needing a
retry, or (b) close the TCP connection before Content-Length is
satisfied, the S3 client should retry, or at least, notice an error and
not silently corrupt data.
Have you verified that (b) occurs, if chunk->tabled->client data
pipeline is interrupted after data transfer is under way?
Jeff
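Case (b) above relies on the client comparing the bytes it received against the advertised Content-Length and retrying when they differ. A minimal sketch of that client-side check follows. It is not taken from tabled, Boto, or any S3 library: struct get_result, body_complete(), get_with_retry(), and the flaky_fetch() demo stub are all hypothetical names made up for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical record of one GET attempt, as a client might keep it. */
struct get_result {
	long content_length;	/* Content-Length header value, -1 if absent */
	long body_bytes;	/* bytes actually received before the close */
};

/* The body is complete only if a length was advertised and fully received. */
static bool body_complete(const struct get_result *r)
{
	return r->content_length >= 0 && r->body_bytes == r->content_length;
}

/* Hypothetical fetch callback: performs one attempt, returns 0 on success. */
typedef int (*fetch_fn)(void *ctx, struct get_result *out);

/*
 * Try up to max_tries times. Returns the 1-based number of the attempt that
 * delivered a complete body, or 0 if every attempt failed or was cut short.
 */
static int get_with_retry(fetch_fn fetch, void *ctx, int max_tries)
{
	int attempt;

	assert(fetch != NULL);
	for (attempt = 1; attempt <= max_tries; attempt++) {
		struct get_result r = { -1, 0 };

		if (fetch(ctx, &r) == 0 && body_complete(&r))
			return attempt;
	}
	return 0;
}

/* Demo stub: the first call is cut short, later calls deliver everything. */
static int flaky_fetch(void *ctx, struct get_result *out)
{
	int *calls = ctx;

	out->content_length = 1000;
	out->body_bytes = ((*calls)++ == 0) ? 400 : 1000;
	return 0;
}
```

With the demo stub, get_with_retry(flaky_fetch, &calls, 3) succeeds on the second attempt, while a budget of one attempt reports failure instead of accepting the truncated 400-byte body.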
* Re: [Patch 2/2] tabled: rotate the input chunkserver
From: Pete Zaitcev @ 2009-10-24 2:19 UTC (permalink / raw)
To: Jeff Garzik; +Cc: Project Hail List
On Fri, 23 Oct 2009 16:40:32 -0400, Jeff Garzik <jeff@garzik.org> wrote:
> As long as we (a) return an error documented in S3 spec as needing a
> retry, or (b) close the TCP connection before Content-Length is
> satisfied, the S3 client should retry, or at least, notice an error and
> not silently corrupt data.
>
> Have you verified that (b) occurs, if chunk->tabled->client data
> pipeline is interrupted after data transfer is under way?
I haven't, sorry. Boto by itself doesn't retry, and I don't think
Duplicity does, but I don't know. It's on my "list" to try, but...
-- Pete