From: Eric Barton <eeb@sun.com>
To: lustre-devel@lists.lustre.org
Subject: [Lustre-devel] server-side resending & bulk transfer
Date: Sat, 06 Feb 2010 12:28:45 +0000
Message-ID: <008101caa727$ed16ad90$c74408b0$@com>
In-Reply-To: <20100205202013.GN1061@Sun.COM>
Nico,
> -----Original Message-----
> From: Nicolas Williams [mailto:Nicolas.Williams at Sun.COM]
> Sent: 05 February 2010 8:20 PM
> To: Eric Barton
> Cc: 'Johann Lombardi'; lustre-devel at lists.lustre.org
> Subject: Re: [Lustre-devel] server-side resending & bulk transfer
<snip>
> I agree that tying down a server thread on a long block is not a good
> thing. If the LLNL proposal (resend the start bulk signal) is on the
> money, then the thing to do would be to create a queue and separate
> service thread(s) to handle such resends.
That's a dreadful layering violation - how LNET implements
GET and PUT is down to each LND in each network traversed. The only
thing you can do at the Lustre level is retry the GET or PUT on the
assumption that router failure caused the timeout, not the client's
death.
> > Roll on the health network! :)
>
> Well, if the deadline here is on the order of 1s or thereabouts then the
> health network isn't likely to help much because we're not going to get
> sub-second dead node detection. (Well, if we jack up the ping rate and
> reduce the time-to-declare-death low enough, and make sure that HN
> threads and messaging are suitably prioritized, then we might be able to
> get sub-second dead node detection, but my gut feeling is that any
> heuristic approach should wait for longer than 1s.)
The point is that the server can legitimately dedicate a thread to
retrying communications with the client until it discovers the client
is dead. Currently the bulk timeout is the sole, yet unreliable,
indication of this. A health network that provided reliable
notification within tens of seconds would be a considerable improvement.
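A minimal sketch of that retry policy, in Python for brevity. It is an
illustration only: retry_bulk, start_bulk and client_is_dead are
hypothetical names, not Lustre or LNET APIs. start_bulk stands for one
whole GET or PUT issued at the Lustre level, and client_is_dead stands
for the reliable verdict a health network would supply.

```python
import enum
from typing import Callable, Tuple


class BulkResult(enum.Enum):
    OK = "ok"
    TIMEOUT = "timeout"


class TransferOutcome(enum.Enum):
    COMPLETED = "completed"
    CLIENT_DEAD = "client_dead"


def retry_bulk(
    start_bulk: Callable[[], BulkResult],
    client_is_dead: Callable[[], bool],
) -> Tuple[TransferOutcome, int]:
    """Retry a whole GET/PUT until it completes or the client is dead.

    start_bulk     -- hypothetical: issues one GET or PUT and blocks until
                      it completes or its bulk timeout expires.
    client_is_dead -- hypothetical: reliable liveness verdict, e.g. from a
                      health network.
    Returns the outcome and the number of attempts made.
    """
    attempts = 0
    while True:
        attempts += 1
        if start_bulk() is BulkResult.OK:
            return TransferOutcome.COMPLETED, attempts
        # A timeout alone is ambiguous: a router may have failed, or the
        # client may have died. Only the liveness verdict ends the retries.
        if client_is_dead():
            return TransferOutcome.CLIENT_DEAD, attempts


# Usage: two timeouts (say, a failed router), then the transfer succeeds.
results = iter([BulkResult.TIMEOUT, BulkResult.TIMEOUT, BulkResult.OK])
outcome, attempts = retry_bulk(lambda: next(results), lambda: False)
```

The loop never looks inside the transfer: it retries the GET or PUT as
a unit, so how each LND moves the data stays below the Lustre level.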
Cheers,
Eric