From: David Teigland <teigland@redhat.com>
To: Nish Aravamudan <nish.aravamudan@gmail.com>
Cc: linux-kernel@vger.kernel.org, akpm@osdl.org
Subject: Re: [PATCH 3/7] dlm: recovery
Date: Tue, 26 Apr 2005 15:27:42 +0800 [thread overview]
Message-ID: <20050426072742.GF12096@redhat.com> (raw)
In-Reply-To: <29495f1d05042509426681c4a0@mail.gmail.com>
On Mon, Apr 25, 2005 at 09:42:21AM -0700, Nish Aravamudan wrote:
> On 4/25/05, David Teigland <teigland@redhat.com> wrote:
> > +int dlm_wait_function(struct dlm_ls *ls, int (*testfn) (struct dlm_ls *ls))
> > +{
> > + int error = 0;
> > +
> > + for (;;) {
> > + wait_event_interruptible_timeout(ls->ls_wait_general,
> > + testfn(ls) ||
> > + test_bit(LSFL_LS_STOP, &ls->ls_flags),
>
> Shouldn't this be
> dlm_recover_stopped(ls) and not test_bit()?
yes, done
> > + (dlm_config.recover_timer * HZ));
> > + if (testfn(ls))
> > + break;
> > + if (dlm_recovery_stopped(ls)) {
> > + error = -1;
> > + break;
> > + }
> > + }
> > +
> > + return error;
> > +}
>
> Rather than wrap an infinite loop in an infinite loop, since all you
> really want the second loop for is a periodic gurantee of
> recover_timer seconds, wouldn't it be easier to have a sof-timer set
> up to go off in that amount of time, and have it's callback do an
> unconditional wake-up on the local wait-queue then mod_timer the timer
> back in? Or might there be other tasks sleeping on the same
> wait-queue? At that point, since your code does not seem to care about
> signals (uses interruptible version, but doesn't check for
> signal_pending() return value from wait_event), you probably could
> just use the stock version of wait_event(). The conditions would be
> checked in wait_event() on the desired periodic basis, just as they
> are now, but maybe slightly more efficiently (and cleaner code, IMO :)
> ). If you do *need* interruptible, please put in a comment as to why.
Don't need inter, wait_event_timeout should work. Let's see if I followed
all that; are you suggesting:
static void dlm_wait_timer_fn(unsigned long data)
{
struct dlm_ls *ls = (struct dlm_ls *) data;
wake_up(&ls->ls_wait_general);
}
int dlm_wait_function(struct dlm_ls *ls, int (*testfn) (struct dlm_ls *ls))
{
struct timer_list timer;
int error = 0;
init_timer(&timer);
timer.function = dlm_wait_timer_fn;
timer.data = (long) ls;
for (;;) {
mod_timer(&timer, jiffies + (dlm_config.recover_timer * HZ));
wait_event(ls->ls_wait_general,
testfn(ls) || dlm_recovery_stopped(ls));
if (timer_pending(&timer))
del_timer(&timer);
if (testfn(ls))
break;
if (dlm_recovery_stopped(ls)) {
error = -1;
break;
}
}
return error;
}
[Another thread should usually be calling wake_up().]
This is actually how it was some months ago, but I thought the timers made
it more complicated given that wait_event_timeout is available. Do you
really think the timers are nicer if we don't use interruptible?
> > +int dlm_wait_status_low(struct dlm_ls *ls, unsigned int wait_status)
> > +{
> > + struct dlm_rcom *rc = (struct dlm_rcom *) ls->ls_recover_buf;
> > + int error = 0, nodeid = ls->ls_low_nodeid;
> > +
> > + for (;;) {
> > + error = dlm_recovery_stopped(ls);
> > + if (error)
> > + goto out;
> > +
> > + error = dlm_rcom_status(ls, nodeid);
> > + if (error)
> > + break;
> > +
> > + if (rc->rc_result & wait_status)
> > + break;
> > + else {
> > + set_current_state(TASK_INTERRUPTIBLE);
> > + schedule_timeout(HZ >> 1);
>
> 500 ms is a long time! What's the justification? No comments?
> Especially considering you will wake up on *every* signal. It's
> unlikely you'll actually sleep for 500 ms. Also, please use
> msleep_interruptible(), unless you expect to be woken by wait-queues
> (I didn't have enough time to trace all the possible code paths :) )
> -- in which case, make it explicit with comments, please.
This is polling the status of a remote node. All I'm after here is a
delay between the dlm_rcom_status() calls so that repeated status messages
aren't flooding the network. We keep sending status messages until the
remote node reports the status we want to see.
When the remote node has _failed_ we're repeating these status requests
hopelessly until the membership system detects the node failure and
recovery is aborted (dlm_recovery_stopped) -- up to several seconds. When
the remote node is just _slower_ at doing recovery, I'm guessing the
difference is on average a second.
Given that info, do you have a suggested delay? I've already switched to
msleep.
Dave
next prev parent reply other threads:[~2005-04-26 7:24 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2005-04-25 15:12 [PATCH 3/7] dlm: recovery David Teigland
2005-04-25 16:42 ` Nish Aravamudan
2005-04-26 7:27 ` David Teigland [this message]
2005-04-26 16:30 ` Nish Aravamudan
2005-04-27 5:23 ` David Teigland
2005-04-25 18:24 ` Jesper Juhl
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20050426072742.GF12096@redhat.com \
--to=teigland@redhat.com \
--cc=akpm@osdl.org \
--cc=linux-kernel@vger.kernel.org \
--cc=nish.aravamudan@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox