From: Isaac Huang <He.Huang@Sun.COM>
To: lustre-devel@lists.lustre.org
Subject: [Lustre-devel] protocol backofs
Date: Tue, 17 Mar 2009 11:28:44 -0400
Message-ID: <20090317152844.GG17185@sun.com>
In-Reply-To: <49BEB984.5030206@lbl.gov>
On Mon, Mar 16, 2009 at 01:41:40PM -0700, Andrew C. Uselton wrote:
> Howdy Isaac,
> Nice to meet you. As Eric suggested I am also cc:ing Nick Henke,
> since he might find this an interesting discussion. For all you
> lustre-devel dwellers out there, feel free to chime in.
Hello Andrew, please see my comments inline.
> ......
> The "frank_jag" page shows data collected during 4 tests with 256 tasks
> (4 tasks per node on 64 nodes). The target is a single file striped
> across all OSTs of the Lustre file system. Two tests are on Franklin
> and two on Jaguar. Each machine runs a test using the POSIX I/O
> interface and another using the MPI-I/O interface. In the third column
> the Franklin, MPI-I/O test has extremely long delays in the reads in the
> middle phase, but not during the other reads or any of the writes. This
I have no knowledge of MPI-IO. Could you please elaborate a bit on
how these "delays in the reads" are measured and what "the middle
phase" is?
> does not happen for POSIX, nor does it happen for Jaguar using MPI-I/O.
> The results shown are entirely reproducible and not due to interference
> from other jobs on the system. The only difference between the Franklin
> and Jaguar configurations is that Jaguar has 144 OSTs on 72 OSSs instead
> of 80 OSTs on 20 OSSs.
Not sure about Franklin, but on Jaguar, depending on the file-system in
use, the OSSs could reside in either the Sea-Star network or an IB
network (accessed via lnet routers). I think it would be worthwhile to
double-check which server network was used.
> Eric put the notion in my head that we may be looking at a
> contention issue in the Sea-Star network. Since the I/O is being necked
> down to 20 OSSs in the case of Franklin, this seems plausible. If you
> guys have a moment to consider the subject I'd like to think about:
> a) Why would contention introduce the catastrophic delays rather than
> just slow things down generally and more or less evenly? Is there some
> form of back-off in the protocol(s) that could occasionally get kicked
> up to tens of seconds?
It involves many layers:
1. At the Lustre/PTLRPC layer, there is a limit on the number of
in-flight RPCs to a server. This is end-to-end, and the limit could
change at runtime.
2. At the lnet/lnd layer, for ptllnd and o2iblnd, there's a credit-based
mechanism to prevent a sending node from overrunning buffers at the
remote end. This is not end-to-end, and the number of pre-granted
credits doesn't change at runtime.
3. Cray Portals and the Sea-Star network run beneath lnet/ptllnd,
and I'd think that there could also be some similar mechanisms.
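
To make the difference between layers 1 and 2 concrete, here is a
minimal toy sketch in Python. It is not Lustre or lnet code: the class
names (RpcWindow, CreditedPeer), the demo() helper, and the numbers 8
and 3 are all hypothetical, chosen only for illustration. It assumes an
adjustable end-to-end cap on in-flight RPCs sitting on top of a fixed,
per-peer pool of send credits.

```python
from collections import deque


class RpcWindow:
    """End-to-end sketch: cap on in-flight RPCs to one server (hypothetical)."""

    def __init__(self, limit):
        self.limit = limit          # may be changed at runtime
        self.in_flight = 0

    def try_issue(self):
        # Issue an RPC only if the window has room.
        if self.in_flight < self.limit:
            self.in_flight += 1
            return True
        return False

    def complete(self):
        # A reply arrived, so one slot in the window frees up.
        self.in_flight -= 1


class CreditedPeer:
    """Hop-by-hop sketch: fixed pool of send credits to one peer (hypothetical)."""

    def __init__(self, credits):
        self.credits = credits      # pre-granted, never resized in this sketch
        self.queued = deque()       # messages waiting for a credit
        self.on_wire = []           # messages actually handed to the network

    def send(self, msg):
        # Each message on the wire consumes one credit; otherwise it waits.
        if self.credits > 0:
            self.credits -= 1
            self.on_wire.append(msg)
        else:
            self.queued.append(msg)

    def credit_returned(self):
        # The peer freed a buffer; the oldest queued message (if any) goes out.
        self.credits += 1
        if self.queued:
            self.credits -= 1
            self.on_wire.append(self.queued.popleft())


def demo():
    window = RpcWindow(limit=8)
    peer = CreditedPeer(credits=3)
    issued = 0
    for i in range(10):
        if window.try_issue():
            peer.send(f"rpc-{i}")
            issued += 1
    return issued, len(peer.on_wire), len(peer.queued)
```

In this toy, demo() tries to issue 10 RPCs: the window admits 8, only 3
reach the wire, and 5 wait in the queue until credits come back. A
message stuck in that queue is not slowed down gradually; it simply does
not move until the remote end returns a credit.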
Thanks,
Isaac