From: Xeno <xeno@overture.com>
To: "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: 2.4: NFS client kmapping across I/O
Date: Mon, 28 Jan 2002 18:25:08 -0800
Message-ID: <3C560804.C68BC6F4@overture.com>
Trond, thanks for the excellent fattr race fix. I'm sorry I haven't
been able to give feedback until now; things got busy for a while. I
have not yet had a chance to run your fixes, but after studying them I
believe that they will resolve the race nicely, especially with the use
of nfs_inode_lock in the recent NFS_ALL experimental patches. FWIW.
Now I also have time to mention the other NFS client issue we ran into
recently; I have not found mention of it on the mailing lists. The NFS
client is kmapping pages for the duration of reads from and writes to
the server. This creates a scaling limitation, especially under
CONFIG_HIGHMEM64G and i386 where there are only 512 entries in the
highmem kmap table. Under I/O load, it is easy to fill up the table,
so that every process that needs to map highmem pages spends a
substantial fraction of the time hung waiting for a free entry.
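
To put rough numbers on that limit, here is a minimal sketch. It assumes
4 KiB pages and that each outstanding request keeps all of its pages
mapped until the server replies. The helper names and the request sizes
used below are illustrative assumptions, not taken from the kernel
source.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative constants: 512 is the table size quoted above; 4096 is
 * the usual i386 page size (an assumption of this sketch). */
enum { KMAP_ENTRIES = 512, PAGE_BYTES = 4096 };

/* Bytes of highmem that can be mapped at once through the table. */
static size_t kmap_window_bytes(size_t entries, size_t page_bytes)
{
    return entries * page_bytes;
}

/* Number of in-flight requests that exhaust the table, if every request
 * holds one entry per page for the whole round trip. */
static size_t requests_to_fill(size_t entries, size_t page_bytes,
                               size_t request_bytes)
{
    assert(page_bytes > 0 && request_bytes > 0);
    /* Round up: a partial page still pins a whole entry. */
    size_t pages_per_request = (request_bytes + page_bytes - 1) / page_bytes;
    return entries / pages_per_request;
}
```

Under these assumptions, kmap_window_bytes(KMAP_ENTRIES, PAGE_BYTES) is
2 MiB, and a hypothetical 8 KiB transfer size means 256 outstanding
requests are enough to fill the table.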
Before 2.4.15 it is particularly bad: nfs_flushd locks up the kernel
under I/O load. My testcase was to copy twelve 100M files from one NFS
server to another; it produced the lockup reliably and right
away. nfs_flushd fills up the kmap table as it sends out requests, then
blocks waiting for another kmap entry to free up. But it is running in
rpciod, so rpciod is blocked and cannot process any responses to the
requests. No kmap entries are ever freed once the table fills up. In
this state, the machine pings and responds to SysRq on the serial port,
but just about everything else hangs.
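
The lockup described above can be captured in a toy model. This is a
user-space sketch, not kernel code: flush_completes() and its parameters
are hypothetical names, and the model only counts table entries. It
assumes each request pins one entry until its reply is handled.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a sender that pins one table entry per request.
 * Returns true if every request can be sent, false if the sender
 * blocks forever.
 *
 * replies_on_other_thread == false models the case where the sender
 * runs in the same thread that would handle replies, so nothing can
 * free an entry while the sender waits. */
static bool flush_completes(int slots, int to_send,
                            bool replies_on_other_thread)
{
    assert(slots > 0 && to_send >= 0);

    int held = 0; /* entries pinned by requests awaiting replies */

    while (to_send > 0) {
        if (held == slots) {
            /* Table is full: the sender must wait for a free entry. */
            if (!replies_on_other_thread)
                return false; /* the only thread that could free one is waiting */
            held--; /* another thread handles a reply and unmaps */
        }
        held++;
        to_send--;
    }
    return true;
}
```

In this model, a sender sharing its thread with reply handling gets
stuck as soon as it has one more request to send than there are
entries, while moving reply handling elsewhere lets any number of
requests drain.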
It looks like nfs_flushd was turned off in 2.4.15; that is the
workaround I have applied to our machines. I have also limited the
number of requests across all NFS servers to LAST_PKMAP-64, to leave
some kmap entries available for non-NFS use. It is not an ideal
workaround, though: it artificially limits I/O to multiple servers.
I've thought about bumping up LAST_PKMAP to increase the size of the
highmem kmap table, but the table looks like it was designed to be
small.
I've also thought about pushing the kmaps and kunmaps down into the RPC
layer, so the pages are only mapped while data is copied to or from
them, not while waiting for the network. That would be more work, but
it looks doable, so I wanted to run the problem and the approach by you
knowledgeable folks while I'm waiting for hardware to free up for kernel
hacking.
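
As a rough illustration of the idea of mapping only around the copy,
here is a user-space sketch. Everything in it is a stand-in: struct
pool, pool_map() and pool_unmap() merely count entries in place of
kmap() and kunmap(), the 4-entry pool is arbitrarily small, and a full
pool returns -1 where the kernel would sleep. It is not a proposed
patch.

```c
#include <assert.h>
#include <string.h>

enum { POOL_SLOTS = 4, PAGE_BYTES = 4096 };

/* Counts mapping entries in use; peak records the high-water mark. */
struct pool { int in_use; int peak; };

/* Stand-in for kmap(): fails with -1 when the pool is full. */
static int pool_map(struct pool *p)
{
    if (p->in_use == POOL_SLOTS)
        return -1;
    p->in_use++;
    if (p->in_use > p->peak)
        p->peak = p->in_use;
    return 0;
}

/* Stand-in for kunmap(). */
static void pool_unmap(struct pool *p)
{
    assert(p->in_use > 0);
    p->in_use--;
}

/* Map each page only for the duration of its memcpy. */
static int copy_short_map(struct pool *p, unsigned char pages[][PAGE_BYTES],
                          int npages, const unsigned char *src)
{
    for (int i = 0; i < npages; i++) {
        if (pool_map(p) != 0)
            return -1;
        memcpy(pages[i], src + (size_t)i * PAGE_BYTES, PAGE_BYTES);
        pool_unmap(p);
    }
    return 0;
}

/* Contrast: map every page up front and hold the entries until the
 * whole request is done, as when mappings span the network wait. */
static int copy_long_map(struct pool *p, unsigned char pages[][PAGE_BYTES],
                         int npages, const unsigned char *src)
{
    int mapped = 0;

    while (mapped < npages && pool_map(p) == 0)
        mapped++;
    if (mapped < npages) {
        /* Could not map the whole request: release what we took. */
        while (mapped-- > 0)
            pool_unmap(p);
        return -1;
    }
    for (int i = 0; i < npages; i++)
        memcpy(pages[i], src + (size_t)i * PAGE_BYTES, PAGE_BYTES);
    /* The wait for the server would happen here, entries still held. */
    for (int i = 0; i < npages; i++)
        pool_unmap(p);
    return 0;
}
```

With an 8-page request and a 4-entry pool, copy_short_map() succeeds
with a peak of one entry in use, while copy_long_map() cannot even
start. The point of the sketch is that the short-lived variant's entry
usage does not grow with the number of pages waiting on the network.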
Thanks,
Xeno
Thread overview: 5+ messages
2002-01-29 2:25 Xeno [this message]
2002-01-29 9:05 ` 2.4: NFS client kmapping across I/O Trond Myklebust
2002-01-29 17:42 ` Hugh Dickins
2002-01-29 18:34 ` Rik van Riel
2002-01-30 0:56 ` Xeno