From: Peter Braam <Peter.Braam@Sun.COM>
To: lustre-devel@lists.lustre.org
Subject: [Lustre-devel] hiding non-fatal communications errors
Date: Thu, 05 Jun 2008 20:29:48 -0700
Message-ID: <C46DFD3C.587B%peter.braam@sun.com>
In-Reply-To: <1F3456C3-7172-4762-93DB-8589B44189D2@Sun.COM>

Why can we not send early replies?


On 6/5/08 9:59 AM, "Oleg Drokin" <Oleg.Drokin@Sun.COM> wrote:

> Hello!
> 
> On Jun 5, 2008, at 12:42 PM, Robert Read wrote:
> 
>>>> I suspect this could be adapted to allowing a fixed number of retries for
>>>> server-originated RPCs also.  In the case of LDLM blocking callbacks sent
>>>> to a client, a resend is currently harmless (either the client is already
>>>> processing the callback, or the lock was cancelled).
>>> We need to be careful here and decide on a good strategy on when to resend.
>>> E.g. recent case at ORNL (even if a bit pathologic) is they pound through
>>> thousands of clients to 4 OSSes via 2 routers. That creates request waiting
>>> lists on OSSes well into tens of thousands. When we block on a lock and send
>>> blocking AST to the client, it quickly turns around and puts in his data...
>>> at the end of our list that takes hundreds of seconds (more than obd_timeout,
>>> obviously). No matter how much you resend, it won't help.
>> This looks like the poster child for adaptive timeouts, although we
>> might need some version of the early margin update patch on
>> 15501.  Have you tried enabling AT?
> 
> The problem is that AT does not handle this specific case: there is no way
> to deliver an "early reply" from a client to the server saying "I am working
> on it" other than by sending the dirty data itself. But the dirty data sits
> in a queue for far too long.
> There are no timed-out requests; the only thing timing out is the lock that
> is not cancelled in time.
> AT was not tried - this is hard to do at ORNL, as the client side is a Cray
> XT4 machine, and updating clients is hard. So they are on 1.4.11 of some sort.
> They can easily update servers, but this won't help, of course.
> 
>> Maybe that was done to discourage people from disabling AT?
>> Seriously, though, I don't know why that was changed. Perhaps it was
>> done on b1_6 before AT landed?
> 
> hm, indeed. I see this change in 1.6.3.
> 
> Bye,
>      Oleg
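
The ORNL scenario quoted above comes down to simple queueing arithmetic, which a rough sketch can make concrete. Everything in it is an assumption for illustration: the function names, the FIFO model, and the numbers (30,000 queued requests, 100 requests served per second, a 100-second lock timeout) are hypothetical and are not taken from Lustre code or from measurements at ORNL.

```python
def queue_wait_seconds(queued_requests: int, service_rate_per_sec: float) -> float:
    """Time a newly arrived request waits behind a FIFO queue (hypothetical model)."""
    if service_rate_per_sec <= 0:
        raise ValueError("service rate must be positive")
    return queued_requests / service_rate_per_sec


def lock_cancelled_in_time(queued_requests: int,
                           service_rate_per_sec: float,
                           lock_timeout_sec: float,
                           ast_resends: int = 0) -> bool:
    """Return True if the client's dirty data is serviced before the lock times out.

    ast_resends is deliberately ignored: in this model, resending the blocking
    AST does not move the client's write any further up the server's queue.
    """
    wait = queue_wait_seconds(queued_requests, service_rate_per_sec)
    return wait <= lock_timeout_sec


# Hypothetical numbers: tens of thousands of queued requests on one OSS.
wait = queue_wait_seconds(30_000, 100.0)
print(f"dirty data waits about {wait:.0f} s")
for resends in (0, 1, 5):
    ok = lock_cancelled_in_time(30_000, 100.0, lock_timeout_sec=100.0,
                                ast_resends=resends)
    print(f"resends={resends}: cancelled in time? {ok}")
```

In this toy model the outcome is the same for any number of resends, which is the point made in the quoted text: the wait is set by the queue depth on the server, not by whether the callback reached the client.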