From: Martin Knoblauch <knobi-Ys4E+72pFW0hFhg+JK9F0w@public.gmane.org>
To: Peter Staubach <staubach@redhat.com>
Cc: linux-nfs list <linux-nfs@vger.kernel.org>, linux-kernel@vger.kernel.org
Subject: Re: [RFC][Resend] Make NFS-Client readahead tunable
Date: Wed, 17 Sep 2008 09:03:17 -0700 (PDT)
Message-ID: <270934.3119.qm@web32606.mail.mud.yahoo.com>
----- Original Message ----
> From: Peter Staubach <staubach@redhat.com>
> To: Martin Knoblauch <knobi-Ys4E+72pFW0hFhg+JK9F0w@public.gmane.org>
> Cc: linux-nfs list <linux-nfs@vger.kernel.org>; linux-kernel@vger.kernel.org
> Sent: Wednesday, September 17, 2008 4:06:44 PM
> Subject: Re: [RFC][Resend] Make NFS-Client readahead tunable
>
> Martin Knoblauch wrote:
> > Hi,
> >
> > the following/attached patch works around an [obscure] problem when a 2.6
> > (not sure/caring about 2.4) NFS client accesses an "offline" file on a
> > Sun/Solaris-10 NFS server when the underlying filesystem is of type SAM-FS.
> > Happens with RHEL4/5 and mainline kernels. Frankly, it is not a Linux
> > problem, but the chances for a short-/mid-term solution from Sun are very
> > slim. So, being lazy, I would love to get this patch into Linux. If not, I
> > will just have to maintain it for eternity out of tree.
> >
> > The problem: SAM-FS is Sun's proprietary HSM filesystem. It stores
> > meta-data and a relatively small amount of data "online" on disk and pushes
> > old or infrequently used data to "offline" media, e.g. tape. This is
> > completely transparent to the users. If the data for an "offline" file is
> > needed, the so-called "stager daemon" copies it back from the offline
> > medium. All of this works great most of the time. Now, if a Linux NFS
> > client tries to read such an offline file, performance drops to "extremely
> > slow". After lengthy investigation of tcp-dumps, mount options and
> > procedures involving black cats at midnight, we found out that the
> > readahead behaviour of the Linux NFS client causes the problem. Basically
> > it seems to issue read requests up to 15*rsize to the server. In the case
> > of the "offline" files, this behaviour causes heavy competition for the
> > inode lock between the NFSD process and the stager daemon on the Solaris
> > server.
> >
> > - The real solution: fixing SAM-FS/NFSD interaction. Sun engineering acks
> >   the problem, but a solution will need time. Lots of it.
> > - The working solution: disable the client side readahead, or make it
> >   tunable. The patch does that by introducing an NFS module parameter
> >   "ra_factor" which can take values between 1 and 15 (default 15) and a
> >   tunable "/proc/sys/fs/nfs/nfs_ra_factor" with the same range and default.
>
> Hi.
>
> I was curious if a design to limit or eliminate read-ahead
> activity when the server returns EJUKEBOX was considered?
Not seriously, because that would need a lot more knowledge about the internal workings of the NFS client than I have. The Solaris client seems to work along those lines, but the code to modify the readahead window looks complicated. The Solaris client also seems to be a lot less aggressive when doing readahead; the maximum seems to be 4x8k. As far as I can see, the Linux client doesn't really care about the readahead handling at all. It just fills "server->backing_dev_info.ra_pages" and leaves the handling to the MM system.
Also, there is no guarantee that EJUKEBOX is ever sent by the server. If the offline archive resides on disk (e.g. a cheap SATA array), delivery will start almost immediately and the server will not send that error. Tracked that :-( Same for already positioned tapes.
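
For illustration only, the readahead arithmetic discussed above can be sketched in a few lines of Python. This is not the patch and not kernel code: the function name, the 4 KiB page size and the rounding of rsize up to whole pages are assumptions made for the sketch. Only the factor range of 1 to 15 (default 15), the "up to 15*rsize" window and the Solaris figure of about 4x8k come from this thread.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages on the client


def readahead_window(rsize, ra_factor=15, page_size=PAGE_SIZE):
    """Return (pages, bytes) of readahead for a given rsize and factor.

    Hypothetical helper: models the window as ra_factor * rsize,
    with rsize rounded up to whole pages.
    """
    if not 1 <= ra_factor <= 15:
        # Range described for the proposed "ra_factor" parameter.
        raise ValueError("ra_factor must be between 1 and 15")
    rpages = (rsize + page_size - 1) // page_size  # pages per read request
    ra_pages = rpages * ra_factor
    return ra_pages, ra_pages * page_size


if __name__ == "__main__":
    for label, rsize, factor in [
        ("Linux default, rsize=32k", 32768, 15),
        ("ra_factor=1, rsize=32k", 32768, 1),
        ("Solaris-like 4x8k", 8192, 4),
    ]:
        pages, nbytes = readahead_window(rsize, factor)
        print(f"{label}: {pages} pages, {nbytes // 1024} KiB")
```

Under these assumptions, an rsize of 32k with the default factor gives a window of 480 KiB, while ra_factor=1 and the Solaris-like 4x8k case both end up at 32 KiB.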
> Unless one can know that the server and client can get into
> this situation ahead of time, how would the tunable be used?
>
Basically one has to know that the problem exists (that is easily detected) and that the readahead factor is involved.
My patch of course has some pitfalls. At least:
a) As implemented, the nfs_ra_factor will be used for all NFS mounts. It should/could be per filesystem, but that needs a new mount option and I did not want to touch that code due to lack of understanding (and no time to acquire said understanding). But frankly, so far we have not observed any serious performance drawbacks with ra_factor=1.
b) Changing the factor needs a remount, as the NFS client only looks at it at mount time.
Not a problem in my situation of course.
Cheers
Martin