From: Stuart Kendrick <skendric@fhcrc.org>
To: linux-nfs@vger.kernel.org
Subject: ls stalls
Date: Wed, 16 May 2012 16:34:18 -0700
Message-ID: <4FB4397A.6060104@fhcrc.org>
In-Reply-To: <4FB43655.5040907@fhcrc.org>
Hi folks,
On our large-memory (64GB) HPC nodes, we intermittently see what we call
'interactive stalls': pauses in receiving 'ls' output. Bash tab
completion and emacs stall as well. We've hacked /bin/ls to time how
long it takes to complete and to log diagnostic information when
that time exceeds 3 seconds. In some cases, the result isn't surprising
-- a directory containing thousands or tens of thousands of files,
hosted on slow storage, might well take seconds to display. But most of
the time, these stalls occur on directories containing tens or
occasionally hundreds of files; 'ls' on such a directory normally takes
a millisecond or less to complete. Stalls vary in length: most last
under 10s, a significant portion last up to 100s, and the occasional
stall falls in the 100-300s range.
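For anyone who wants to reproduce the measurement without patching
/bin/ls, here is a minimal Python sketch of the same idea: time a
directory listing and log only when it exceeds the 3-second threshold.
This is not our actual patch; the function name, the log format, and
the injectable clock are assumptions made for illustration.

```python
import os
import sys
import time

STALL_THRESHOLD_S = 3.0  # same 3-second threshold as described above


def timed_listdir(path, threshold=STALL_THRESHOLD_S, log=sys.stderr,
                  clock=time.monotonic):
    """List `path`; write one line to `log` if it took longer than `threshold`.

    Returns (sorted entry names, elapsed seconds). `clock` is injectable
    so the stall path can be exercised without a real slow filesystem.
    """
    start = clock()
    entries = sorted(os.listdir(path))
    elapsed = clock() - start
    if elapsed > threshold:
        # Hypothetical log format; a real wrapper would add whatever
        # diagnostics are useful (load, mount point, and so on).
        print(f"stall: listing {path!r} took {elapsed:.3f}s "
              f"({len(entries)} entries)", file=log)
    return entries, elapsed
```

Calling timed_listdir("/some/nfs/dir") returns the names plus the
elapsed time, and stays silent unless the listing was slow.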
I've been correlating strace output ('strace -f -tt ls {directory}')
with packet traces, and I see the following pattern:
(A) The stall occurs between the 'stat' on the directory and the 'open'
on the directory ... and sometimes, though not always, also between the
'open' and the following 'fcntl'. Here's an example of a 10s stall:
17:20:01.365375 stat("/shared/silo_r/xxx/colongwas_archive/plco-sshfs/pancreatic-panscan-dbgap/panscan-work/610-gtc",
    {st_mode=S_IFDIR|0770, st_size=327680, ...}) = 0
17:20:11.774368 open("/shared/silo_r/xxx/colongwas_archive/plco-sshfs/pancreatic-panscan-dbgap/panscan-work/610-gtc",
    O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 3
And of a ~200s stall:
10:30:01.768459 stat("/shared/silo_r/xxx/colongwas_archive/plco-sshfs/pancreatic-panscan-dbgap/panscan-work/610-gtc",
    {st_mode=S_IFDIR|0770, st_size=327680, ...}) = 0
10:33:06.072659 open("/shared/silo_r/xxx/colongwas_archive/plco-sshfs/pancreatic-panscan-dbgap/panscan-work/610-gtc",
    O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 3
10:33:28.884426 fcntl(3, F_GETFD) = 0x1 (flags FD_CLOEXEC)
10:33:28.884600 getdents64(3, /* 683 entries */, 32768) = 32736
(B) On the wire, during that stall, the HPC node sends the NFS server
nothing related to the listing. Sometimes it is literally silent;
sometimes it is reading or writing in support of some other task/user.
In neither case does it emit GETATTR, READDIR, or READDIRPLUS calls.
[No dropped frames, no TCP pathology.]
(C) Network IO is noticeable: the node is rapidly reading from and/or
writing to at least one of the handful of NFS servers which provide
storage to the HPC environment.
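The gaps in (A) can be pulled out of a trace mechanically. Below is a
minimal Python sketch, assuming each syscall sits on one line that
starts with a 'strace -tt' timestamp (optionally preceded by the pid
prefix that '-f' adds). The function names and the 3-second default are
assumptions for illustration. Two caveats: as far as I know, '-tt'
stamps each call when it is entered, so a gap also counts the time the
earlier call spent in the kernel (adding '-T' reports per-call duration
separately); and with '-f' the gaps are measured across all traced
processes, which is fine for a single-process 'ls'.

```python
import re

# Optional "[pid N] " or "N " prefix, then HH:MM:SS.microseconds, then
# the syscall name followed by "(".
LINE_RE = re.compile(
    r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?"
    r"(\d{2}):(\d{2}):(\d{2})\.(\d{6})\s+"
    r"(\w+)\("
)


def parse_line(line):
    """Return (seconds since midnight, syscall name), or None if no match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    hours, minutes, seconds, micros = (int(g) for g in m.group(1, 2, 3, 4))
    stamp = hours * 3600 + minutes * 60 + seconds + micros / 1e6
    return stamp, m.group(5)


def find_gaps(lines, threshold=3.0):
    """Return (earlier call, later call, gap in seconds) for gaps > threshold."""
    gaps = []
    prev = None
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue  # continuation lines, signals, exit messages
        if prev is not None:
            delta = parsed[0] - prev[0]
            if delta < 0:
                delta += 86400.0  # assume the trace crossed midnight once
            if delta > threshold:
                gaps.append((prev[1], parsed[1], delta))
        prev = parsed
    return gaps
```

Fed the ~200s trace above, this reports two gaps: roughly 184.3s
between 'stat' and 'open', and roughly 22.8s between 'open' and 'fcntl'.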
The clients are all running openSUSE 11.3 Teal (kernel
2.6.34.10-0.2-default). The NFS servers are a mix -- Solaris 10,
several NetApps, Windows 2008 -- backed by several different storage
systems.
Diagrams and related information visible at
https://vishnu.fhcrc.org/Rhino-RCA/
Insights? Suggestions?
--sk
Stuart Kendrick
Fred Hutchinson Cancer Research Center
Seattle, WA USA