[Cluster-devel] Patch: making DLM more robust

From: Menyhart Zoltan <Zoltan.Menyhart@bull.net>
To: cluster-devel.redhat.com
Subject: [Cluster-devel] Patch: making DLM more robust
Date: Wed, 01 Dec 2010 10:23:25 +0100
Message-ID: <4CF6140D.7060809@bull.net>
In-Reply-To: <20101130173051.GB27123@redhat.com>

David Teigland wrote:

> Thanks, I'll take a look; as long as it's disabled by default I don't
> expect I'd object much.  There are two main problems with this idea,
> though, that need to be handled before it's generally usable:
>
> 1. The kernel can wait on user space indefinitely during completely normal
> situations, e.g. the loss of quorum or fencing failures can delay
> completion indefinitely.

In my eyes, a networked application should report a failure within a
delay that a human would consider reasonable. E.g.:
- You can try a DLM_USER_CREATE_LOCKSPACE for 5 seconds
- If it times out, you can log it and display a status telling the user
   that it has already been retrying for H hours, M minutes and S seconds
- And retry (if configured to do so automatically) if there is no intervention
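
The retry policy in the list above could be sketched roughly as follows.
This is only an illustration: retry_on_timeout(), timed_op_fn,
format_elapsed() and flaky_op() are hypothetical names, not part of libdlm,
and the operation is assumed to report an expired attempt by returning -1
with errno set to ETIMEDOUT.

```c
#include <errno.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical operation type: returns 0 on success, or -1 with errno set
 * (ETIMEDOUT when the attempt did not complete within timeout_sec). */
typedef int (*timed_op_fn)(void *arg, int timeout_sec);

/* Format a duration in seconds as "H hours M minutes S seconds". */
static void format_elapsed(long secs, char *buf, size_t len)
{
        snprintf(buf, len, "%ld hours %ld minutes %ld seconds",
                 secs / 3600, (secs % 3600) / 60, secs % 60);
}

/* Try op repeatedly. Each timed-out attempt is logged together with the
 * total time spent so far. max_attempts <= 0 means "retry forever".
 * Any error other than ETIMEDOUT is returned to the caller immediately. */
static int retry_on_timeout(timed_op_fn op, void *arg,
                            int timeout_sec, int max_attempts)
{
        time_t start = time(NULL);
        char elapsed[64];
        int attempt;

        for (attempt = 1; max_attempts <= 0 || attempt <= max_attempts; attempt++) {
                if (op(arg, timeout_sec) == 0)
                        return 0;
                if (errno != ETIMEDOUT)
                        return -1;
                format_elapsed((long)(time(NULL) - start), elapsed, sizeof(elapsed));
                fprintf(stderr, "attempt %d timed out, retrying for %s\n",
                        attempt, elapsed);
        }
        errno = ETIMEDOUT;      /* fprintf() may have changed errno */
        return -1;
}

/* Stand-in operation for demonstration only: it times out until the
 * counter pointed to by arg reaches zero, then succeeds. */
static int flaky_op(void *arg, int timeout_sec)
{
        int *remaining = arg;

        (void)timeout_sec;
        if (*remaining > 0) {
                (*remaining)--;
                errno = ETIMEDOUT;
                return -1;
        }
        return 0;
}
```

A real caller would pass an operation wrapping the lockspace creation with a
5 second limit; how that limit is enforced is outside this sketch.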

> This means you can easily introduce false
> failures when using a timeout.

If we cannot obtain a given resource within a limited time frame,
then it is a real error for the customer: s/he cannot mount an OCFS2
volume, cannot issue a cluster command, etc.

> EINTR, since it's driven by user
> intervention, is a better idea, e.g. killing a mount process.
>
> 2. The difficulty, even with EINTR, is correctly and cleanly unwinding the
> dlm_controld state.

Let's take this example in dlm/libdlm/libdlm.c:

int create_lockspace_v6(const char *name, uint32_t flags)
{
         char reqbuf[sizeof(struct dlm_write_request) + DLM_LOCKSPACE_LEN];
         struct dlm_write_request *req = (struct dlm_write_request *)reqbuf;
         int namelen = strlen(name);

         memset(reqbuf, 0, sizeof(reqbuf));
         set_version_v6(req);
         req->cmd = DLM_USER_CREATE_LOCKSPACE;
         req->i.lspace.flags = flags;
         if (namelen > DLM_LOCKSPACE_LEN) {
                 errno = EINVAL;
                 return -1;
         }
         memcpy(req->i.lspace.name, name, namelen);
         return write(control_fd, req, sizeof(*req) + namelen);
}

The caller should already be prepared to unwind everything in case
EINVAL is returned due to a name length error.
"write()" can also return several errors of its own.

We will have two more error codes:

EINTR: there is not much difference whether the signal arrives just before
we call "write()" or inside the system call...
If you already ignore the signal, nothing changes. If you already handle
it, the same handling applies.

ETIMEDOUT: see above
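
As a rough sketch of what this means for a caller: the two new codes can go
through the same unwind path that EINVAL already needs. The enum and
classify_result() below are hypothetical, not libdlm API, and retrying on
ETIMEDOUT is just one possible policy.

```c
#include <errno.h>

/* Hypothetical outcome of one create-lockspace attempt. */
enum ls_action {
        LS_OK,          /* request was written, carry on */
        LS_RETRY,       /* timed out and the caller is configured to retry */
        LS_UNWIND       /* give up and undo what was set up so far */
};

/* rc is the return value of the call, err is the errno value saved right
 * after it, retry_on_timeout is a caller-side configuration flag. */
static enum ls_action classify_result(int rc, int err, int retry_on_timeout)
{
        if (rc >= 0)
                return LS_OK;
        if (err == ETIMEDOUT && retry_on_timeout)
                return LS_RETRY;
        /* EINVAL, EINTR, ETIMEDOUT without retry, and any error from
         * write() all share the one unwind path. */
        return LS_UNWIND;
}
```

The point of the sketch is that no new cleanup code is needed: EINTR and
ETIMEDOUT land in the branch that already exists for EINVAL.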

There should be a smooth way out of errors, other than hard resetting the
machine :-)

Thanks,

Zoltan Menyhart
