Re: [PATCH 3/7] dlm: recovery

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

From: Nish Aravamudan <nish.aravamudan@gmail.com>
To: David Teigland <teigland@redhat.com>
Cc: linux-kernel@vger.kernel.org, akpm@osdl.org
Subject: Re: [PATCH 3/7] dlm: recovery
Date: Tue, 26 Apr 2005 09:30:25 -0700	[thread overview]
Message-ID: <29495f1d05042609301d47351e@mail.gmail.com> (raw)
In-Reply-To: <20050426072742.GF12096@redhat.com>

On 4/26/05, David Teigland <teigland@redhat.com> wrote:
> On Mon, Apr 25, 2005 at 09:42:21AM -0700, Nish Aravamudan wrote:
> > On 4/25/05, David Teigland <teigland@redhat.com> wrote:
 
<snip>

> > Rather than wrap an infinite loop in an infinite loop, since all you
> > really want the second loop for is a periodic gurantee of
> > recover_timer seconds, wouldn't it be easier to have a sof-timer set
> > up to go off in that amount of time, and have it's callback do an
> > unconditional wake-up on the local wait-queue then mod_timer the timer
> > back in? Or might there be other tasks sleeping on the same
> > wait-queue? At that point, since your code does not seem to care about
> > signals (uses interruptible version, but doesn't check for
> > signal_pending() return value from wait_event), you probably could
> > just use the stock version of wait_event(). The conditions would be
> > checked in wait_event() on the desired periodic basis, just as they
> > are now, but maybe slightly more efficiently (and cleaner code, IMO :)
> > ). If you do *need* interruptible, please put in a comment as to why.
> 
> Don't need inter, wait_event_timeout should work.  Let's see if I followed
> all that; are you suggesting:
> 
> static void dlm_wait_timer_fn(unsigned long data)
> {
>         struct dlm_ls *ls = (struct dlm_ls *) data;

You can do the mod_timer() here if you make the timer_list global
(dlm_timer or something).

mod_timer(&dlm_timer, jiffies + dlm_config.recover_timer * HZ;

>         wake_up(&ls->ls_wait_general);
> }
> 
> int dlm_wait_function(struct dlm_ls *ls, int (*testfn) (struct dlm_ls *ls))
> {
>         struct timer_list timer;
>         int error = 0;
> 
>         init_timer(&timer);
>         timer.function = dlm_wait_timer_fn;
>         timer.data = (long) ls;

do an

timer.expires = jiffies + (dlm_config.recover_timer * HZ);
add_timer(&dlm_timer);

before wait_event().

>         for (;;) {

Should not be need anymore.

>                 mod_timer(&timer, jiffies + (dlm_config.recover_timer * HZ));

Neither should this.

>                 wait_event(ls->ls_wait_general,
>                            testfn(ls) || dlm_recovery_stopped(ls));

So, now, you get woken up every so often to check the conditions *in*
wait_event() and if either is set you'll return to this context.
Otherwise, you'll go back to sleep in wait_event() -- that's why you
shouldn't need the infinite loop here.

>                 if (timer_pending(&timer))
>                         del_timer(&timer);

Good, this makes sure that you won't have the callback happening again
:) Do you need the sync version?

>                 if (testfn(ls))
>                         break;

Since you don't modify error in this case, you won't need this anymore

>                 if (dlm_recovery_stopped(ls)) {
>                         error = -1;
>                         break;
>                 }

Shouldn't need the break anymore, since you there is no loop.

>         }
>         return error;
> }
> 
> [Another thread should usually be calling wake_up().]
> 
> This is actually how it was some months ago, but I thought the timers made
> it more complicated given that wait_event_timeout is available.  Do you
> really think the timers are nicer if we don't use interruptible?

The reason I think timers are nicer is that you don't need to nest
wait_event*() in a loop (and really, it looks strange when you do).
Complicated or not, I think they are the "right thing" for this case.

> > > +int dlm_wait_status_low(struct dlm_ls *ls, unsigned int wait_status)
> > > +{
> > > +       struct dlm_rcom *rc = (struct dlm_rcom *) ls->ls_recover_buf;
> > > +       int error = 0, nodeid = ls->ls_low_nodeid;
> > > +
> > > +       for (;;) {
> > > +               error = dlm_recovery_stopped(ls);
> > > +               if (error)
> > > +                       goto out;
> > > +
> > > +               error = dlm_rcom_status(ls, nodeid);
> > > +               if (error)
> > > +                       break;
> > > +
> > > +               if (rc->rc_result & wait_status)
> > > +                       break;
> > > +               else {
> > > +                       set_current_state(TASK_INTERRUPTIBLE);
> > > +                       schedule_timeout(HZ >> 1);
> >
> > 500 ms is a long time! What's the justification? No comments?
> > Especially considering you will wake up on *every* signal. It's
> > unlikely you'll actually sleep for 500 ms. Also, please use
> > msleep_interruptible(), unless you expect to be woken by  wait-queues
> > (I didn't have enough time to trace all the possible code paths :) )
> > -- in which case, make it explicit with comments, please.
> 
> This is polling the status of a remote node.  All I'm after here is a
> delay between the dlm_rcom_status() calls so that repeated status messages
> aren't flooding the network.  We keep sending status messages until the
> remote node reports the status we want to see.
> 
> When the remote node has _failed_ we're repeating these status requests
> hopelessly until the membership system detects the node failure and
> recovery is aborted (dlm_recovery_stopped) -- up to several seconds.  When
> the remote node is just _slower_ at doing recovery, I'm guessing the
> difference is on average a second.
> 
> Given that info, do you have a suggested delay?  I've already switched to
> msleep.

Nope, that seems reasonable (500ms).

Thanks,
Nish

next prev parent reply	other threads:[~2005-04-26 16:35 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2005-04-25 15:12 [PATCH 3/7] dlm: recovery David Teigland
2005-04-25 16:42 ` Nish Aravamudan
2005-04-26  7:27   ` David Teigland
2005-04-26 16:30     ` Nish Aravamudan [this message]
2005-04-27  5:23       ` David Teigland
2005-04-25 18:24 ` Jesper Juhl

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=29495f1d05042609301d47351e@mail.gmail.com \
    --to=nish.aravamudan@gmail.com \
    --cc=akpm@osdl.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=teigland@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox