Re: [PATCH 3/7] dlm: recovery

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Nish Aravamudan <nish.aravamudan@gmail.com>
To: David Teigland <teigland@redhat.com>
Cc: linux-kernel@vger.kernel.org, akpm@osdl.org
Subject: Re: [PATCH 3/7] dlm: recovery
Date: Mon, 25 Apr 2005 09:42:21 -0700	[thread overview]
Message-ID: <29495f1d05042509426681c4a0@mail.gmail.com> (raw)
In-Reply-To: <20050425151239.GD6826@redhat.com>

On 4/25/05, David Teigland <teigland@redhat.com> wrote:
> 
> When a node is removed from a lockspace, recovery is required for that
> lockspace on all the remaining lockspace members.  Recovery involves: a
> full rebuild of the distributed resource directory, selecting a new master
> node for locks/resources previously mastered on the removed node, and
> rebuilding master-copy locks on newly selected masters.
> 
> Signed-Off-By: Dave Teigland <teigland@redhat.com>
> Signed-Off-By: Patrick Caulfield <pcaulfie@redhat.com>

<snip>

> +int dlm_wait_function(struct dlm_ls *ls, int (*testfn) (struct dlm_ls *ls))
> +{
> +       int error = 0;
> +
> +       for (;;) {
> +               wait_event_interruptible_timeout(ls->ls_wait_general,
> +                                       testfn(ls) ||
> +                                       test_bit(LSFL_LS_STOP, &ls->ls_flags),

Shouldn't this be
     dlm_recover_stopped(ls) and not test_bit()?
?

> +                                       (dlm_config.recover_timer * HZ));
> +               if (testfn(ls))
> +                       break;
> +               if (dlm_recovery_stopped(ls)) {
> +                       error = -1;
> +                       break;
> +               }
> +       }
> +
> +       return error;
> +}

Rather than wrap an infinite loop in an infinite loop, since all you
really want the second loop for is a periodic gurantee of
recover_timer seconds, wouldn't it be easier to have a sof-timer set
up to go off in that amount of time, and have it's callback do an
unconditional wake-up on the local wait-queue then mod_timer the timer
back in? Or might there be other tasks sleeping on the same
wait-queue? At that point, since your code does not seem to care about
signals (uses interruptible version, but doesn't check for
signal_pending() return value from wait_event), you probably could
just use the stock version of wait_event(). The conditions would be
checked in wait_event() on the desired periodic basis, just as they
are now, but maybe slightly more efficiently (and cleaner code, IMO :)
). If you do *need* interruptible, please put in a comment as to why.

Maybe we need yet another interface? :)

<snip>

> +int dlm_wait_status_all(struct dlm_ls *ls, unsigned int wait_status)
> +{
> +       struct dlm_rcom *rc = (struct dlm_rcom *) ls->ls_recover_buf;
> +       struct dlm_member *memb;
> +       int error = 0;
> +
> +       list_for_each_entry(memb, &ls->ls_nodes, list) {
> +               for (;;) {
> +                       error = dlm_recovery_stopped(ls);
> +                       if (error)
> +                               goto out;
> +
> +                       error = dlm_rcom_status(ls, memb->nodeid);
> +                       if (error)
> +                               goto out;
> +
> +                       if (rc->rc_result & wait_status)
> +                               break;
> +                       else {
> +                               set_current_state(TASK_INTERRUPTIBLE);
> +                               schedule_timeout(HZ >> 1);

<snip>

> +int dlm_wait_status_low(struct dlm_ls *ls, unsigned int wait_status)
> +{
> +       struct dlm_rcom *rc = (struct dlm_rcom *) ls->ls_recover_buf;
> +       int error = 0, nodeid = ls->ls_low_nodeid;
> +
> +       for (;;) {
> +               error = dlm_recovery_stopped(ls);
> +               if (error)
> +                       goto out;
> +
> +               error = dlm_rcom_status(ls, nodeid);
> +               if (error)
> +                       break;
> +
> +               if (rc->rc_result & wait_status)
> +                       break;
> +               else {
> +                       set_current_state(TASK_INTERRUPTIBLE);
> +                       schedule_timeout(HZ >> 1);

500 ms is a long time! What's the justification? No comments?
Especially considering you will wake up on *every* signal. It's
unlikely you'll actually sleep for 500 ms. Also, please use
msleep_interruptible(), unless you expect to be woken by  wait-queues
(I didn't have enough time to trace all the possible code paths :) )
-- in which case, make it explicit with comments, please.

Thanks,
Nish

next prev parent reply	other threads:[~2005-04-25 16:44 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2005-04-25 15:12 [PATCH 3/7] dlm: recovery David Teigland
2005-04-25 16:42 ` Nish Aravamudan [this message]
2005-04-26  7:27   ` David Teigland
2005-04-26 16:30     ` Nish Aravamudan
2005-04-27  5:23       ` David Teigland
2005-04-25 18:24 ` Jesper Juhl

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=29495f1d05042509426681c4a0@mail.gmail.com \
    --to=nish.aravamudan@gmail.com \
    --cc=akpm@osdl.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=teigland@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.