From: Menyhart Zoltan <Zoltan.Menyhart@bull.net>
To: cluster-devel.redhat.com
Subject: [Cluster-devel] Patch: making DLM more robust
Date: Wed, 01 Dec 2010 10:23:25 +0100
Message-ID: <4CF6140D.7060809@bull.net>
In-Reply-To: <20101130173051.GB27123@redhat.com>
David Teigland wrote:
> Thanks, I'll take a look; as long as it's disabled by default I don't
> expect I'd object much. There are two main problems with this idea,
> though, that need to be handled before it's generally usable:
>
> 1. The kernel can wait on user space indefinitely during completely normal
> situations, e.g. the loss of quorum or fencing failures can delay
> completion indefinitely.
In my eyes, a networked application should indicate a failure within a
"human expectable" time delay. E.g.:
- You can try a DLM_USER_CREATE_LOCKSPACE for 5 seconds
- If it times out, you can log it, display some status telling the user
that it has already been retried for H hours M minutes and S seconds
- And retry (if configured to do so by itself) if there is no intervention
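As a rough sketch of that try / log / retry loop: the names retry_with_status, attempt_fn and fake_attempt below are made up for illustration and are not part of libdlm. A real attempt would wrap a bounded DLM_USER_CREATE_LOCKSPACE call, which I assume fails with ETIMEDOUT when its time is up.

```c
#include <errno.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical callback: one bounded attempt. Returns 0 on success,
 * -1 with errno set (ETIMEDOUT if the attempt timed out). */
typedef int (*attempt_fn)(void *arg, int timeout_sec);

/* Retry until success, a non-timeout error, or max_retries is used up.
 * A negative max_retries means "keep retrying until someone intervenes". */
static int retry_with_status(attempt_fn attempt, void *arg,
                             int timeout_sec, int max_retries)
{
    time_t start = time(NULL);
    int tries = 0;

    for (;;) {
        if (attempt(arg, timeout_sec) == 0)
            return 0;
        if (errno != ETIMEDOUT)
            return -1;  /* some other error: let the caller unwind */

        /* Tell the user how long we have been at it. */
        long elapsed = (long)difftime(time(NULL), start);
        fprintf(stderr, "still waiting: retried for %ldh %ldm %lds\n",
                elapsed / 3600, (elapsed / 60) % 60, elapsed % 60);

        if (max_retries >= 0 && ++tries > max_retries) {
            errno = ETIMEDOUT;  /* fprintf may have changed errno */
            return -1;
        }
    }
}

/* Stand-in for a real attempt, for illustration only: times out until
 * the counter behind arg reaches 0, then succeeds. */
static int fake_attempt(void *arg, int timeout_sec)
{
    int *remaining = arg;

    (void)timeout_sec;
    if (*remaining > 0) {
        (*remaining)--;
        errno = ETIMEDOUT;
        return -1;
    }
    return 0;
}
```

With max_retries < 0 this keeps going until the user steps in, which is the "retry if configured so" case; a small non-negative value gives a hard upper bound instead.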
> This means you can easily introduce false
> failures when using a timeout.
If we cannot obtain a given resource within a limited time frame,
then it is a real error for the customer: s/he cannot mount an OCFS2
volume, cannot issue a cluster command, etc.
> EINTR, since it's driven by user
> intervention, is a better idea, e.g. killing a mount process.
>
> 2. The difficulty, even with EINTR, is correctly and cleanly unwinding the
> dlm_controld state.
Let's take this example in dlm/libdlm/libdlm.c:
int create_lockspace_v6(const char *name, uint32_t flags)
{
	char reqbuf[sizeof(struct dlm_write_request) + DLM_LOCKSPACE_LEN];
	struct dlm_write_request *req = (struct dlm_write_request *)reqbuf;
	int namelen = strlen(name);

	memset(reqbuf, 0, sizeof(reqbuf));
	set_version_v6(req);
	req->cmd = DLM_USER_CREATE_LOCKSPACE;
	req->i.lspace.flags = flags;
	if (namelen > DLM_LOCKSPACE_LEN) {
		errno = EINVAL;
		return -1;
	}
	memcpy(req->i.lspace.name, name, namelen);
	return write(control_fd, req, sizeof(*req) + namelen);
}
The caller should already be prepared to unwind everything in case
EINVAL is returned due to a name length error.
"write()" can also return several errors.
We will have two more error codes:
EINTR: there is not much difference whether the signal arrives just before we
call "write()" or inside the system call...
If you already ignore it... If you already handle it...
ETIMEDOUT: see above
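To illustrate that the two new codes can go through the unwind path the caller needs anyway, here is a minimal sketch. create_or_unwind, create_fn, unwind_fn and the two fake_* helpers are hypothetical names, not libdlm API; the create hook is assumed to behave like create_lockspace_v6 above (non-negative on success, -1 with errno set on failure).

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical hooks, not part of libdlm. */
typedef int (*create_fn)(const char *name);   /* >= 0 ok, -1 + errno */
typedef void (*unwind_fn)(const char *name);  /* undo caller-side setup */

/* One error path for EINVAL, write() errors, EINTR and ETIMEDOUT alike. */
static int create_or_unwind(const char *name, create_fn create,
                            unwind_fn unwind)
{
    if (create(name) >= 0)
        return 0;

    int err = errno;  /* save it before logging or unwinding clobbers it */

    switch (err) {
    case EINTR:
        fprintf(stderr, "%s: interrupted by a signal\n", name);
        break;
    case ETIMEDOUT:
        fprintf(stderr, "%s: no answer within the time limit\n", name);
        break;
    default:
        fprintf(stderr, "%s: %s\n", name, strerror(err));
        break;
    }

    unwind(name);     /* same cleanup whatever the error was */
    errno = err;
    return -1;
}

/* Stand-ins for illustration only. */
static int unwind_calls;

static int fake_create_timeout(const char *name)
{
    (void)name;
    errno = ETIMEDOUT;
    return -1;
}

static void fake_unwind(const char *name)
{
    (void)name;
    unwind_calls++;
}
```

The only thing EINTR and ETIMEDOUT add here is a different log message; the cleanup is whatever the caller already does for EINVAL.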
There should be a smooth way out from errors, other than hard resetting the
machine :-)
Thanks,
Zoltan Menyhart