From: Peter Staubach <staubach@redhat.com>
To: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Brian R Cowan <brcowan@us.ibm.com>,
Chuck Lever <chuck.lever@oracle.com>,
linux-nfs@vger.kernel.org, linux-nfs-owner@vger.kernel.org
Subject: Re: Read/Write NFS I/O performance degraded by FLUSH_STABLE page flushing
Date: Fri, 29 May 2009 13:48:57 -0400
Message-ID: <4A202009.4010202@redhat.com>
In-Reply-To: <1243615595.7155.48.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org>
Trond Myklebust wrote:
> Look... This happens when you _flush_ the file to stable storage if
> there is only a single write < wsize. It isn't the business of the NFS
> layer to decide when you flush the file; that's an application
> decision...
>
>
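A minimal C sketch of the heuristic being debated, for illustration only. When a flush consists of a single write that fits in one WRITE of at most wsize bytes, it goes out as one FILE_SYNC WRITE. Otherwise it goes out as UNSTABLE WRITEs followed by a COMMIT. The stable_how values come from RFC 1813. The names struct flush_plan and plan_flush() are hypothetical, and this is not the kernel's actual flush code.

```c
#include <assert.h>
#include <stddef.h>

/* NFSv3 stable_how values as defined in RFC 1813. */
enum stable_how {
    UNSTABLE  = 0,
    DATA_SYNC = 1,
    FILE_SYNC = 2
};

/* Hypothetical summary of how one flush would be sent to the server. */
struct flush_plan {
    enum stable_how how;  /* stability level requested on the WRITE(s) */
    int need_commit;      /* nonzero if a COMMIT must follow the WRITE(s) */
};

/* Simplified model (an assumption, not kernel code): a flush whose dirty
 * data fits in a single WRITE becomes one FILE_SYNC WRITE; anything larger
 * becomes UNSTABLE WRITEs plus a COMMIT. */
static struct flush_plan plan_flush(size_t dirty_bytes, size_t wsize)
{
    struct flush_plan p;

    assert(wsize > 0);
    if (dirty_bytes <= wsize) {
        p.how = FILE_SYNC;   /* one round trip; server syncs before replying */
        p.need_commit = 0;
    } else {
        p.how = UNSTABLE;    /* server may cache; COMMIT makes data stable */
        p.need_commit = 1;
    }
    return p;
}
```

For example, plan_flush(4096, 32768) selects FILE_SYNC with no COMMIT, which is exactly the case that becomes expensive when pages are flushed one at a time.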
I think that one easy way to show why this optimization is
not quite what we would all like, and why there being only a
single write _now_ isn't sufficient, is to write a
block of a file and then read it back. Compilers and linkers
might do exactly this during their random access to the file
being created. I would guess that the build audit that Brian
referred to does the same sort of thing.
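The write-then-read pattern can be reproduced with a short POSIX C sketch. The helper name write_then_read_block() and the choice of a partial (sub-page) write are assumptions made for illustration. Whether the read actually forces a flush of the dirty page depends on the client kernel and on whether the page has been fully written.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write 'len' bytes at 'offset', then immediately read
 * the same range back, as a compiler or linker might while building an
 * output file. Returns 0 if the data read back matches what was written. */
static int write_then_read_block(const char *path, off_t offset, size_t len)
{
    char wbuf[4096], rbuf[4096];
    int fd, rc = -1;

    assert(len <= sizeof(wbuf));
    memset(wbuf, 'x', len);

    fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0)
        return -1;

    /* The pread() hits a page that was dirtied by the pwrite() just before. */
    if (pwrite(fd, wbuf, len, offset) == (ssize_t)len &&
        pread(fd, rbuf, len, offset) == (ssize_t)len &&
        memcmp(wbuf, rbuf, len) == 0)
        rc = 0;

    close(fd);
    return rc;
}
```

For example, pointing it at a file on an NFS mount with offset 0 and len 512 dirties part of the first page and then reads that page straight back.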
ps
P.S. Why do we flush dirty pages before they can be read?
I am not even clear why we need to wait for an already
in-progress flush to complete before using the page to
satisfy a read system call.
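One way to frame the P.S. question is as a per-page check. The sketch below is a hypothetical model, not the NFS client's data structures. It covers two cases: the page is up to date, or the requested range lies entirely within bytes the application itself wrote. In either case, the read could in principle be served from the page cache without flushing first.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-page state (not the kernel's struct nfs_page). */
struct page_state {
    int uptodate;        /* nonzero if every byte of the page is valid in cache */
    size_t dirty_start;  /* first byte written locally and not yet flushed */
    size_t dirty_end;    /* one past the last such byte */
};

/* Return nonzero if a read of [start, end) within the page can be served
 * from the cache without first flushing the dirty range to the server. */
static int can_read_without_flush(const struct page_state *pg,
                                  size_t start, size_t end)
{
    assert(start <= end);
    if (pg->uptodate)
        return 1;  /* server data and local writes are already merged */
    /* Otherwise only the bytes the application wrote are known here. */
    return start >= pg->dirty_start && end <= pg->dirty_end;
}
```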
> Trond
>
>
>
> On Fri, 2009-05-29 at 11:55 -0400, Brian R Cowan wrote:
>
>> I've been working this issue with Red Hat and hadn't needed to go to the list...
>> Well, now I do... You mention that "The main type of workload we're
>> targeting with this patch is the app that opens a file, writes < 4k and
>> then closes the file." Well, it appears that this issue also impacts
>> flushing pages from filesystem caches.
>>
>> The reason this came up in my environment is that our product's build
>> auditing gives the filesystem cache an interesting workout. When
>> ClearCase audits a build, the build places data in a few places,
>> including:
>> 1) a build audit file that usually resides in /tmp. This build audit is
>> essentially a log of EVERY file open/read/write/delete/rename/etc. that
>> the programs called in the build script make in the ClearCase "view"
>> you're building in. As a result, this file can get pretty large.
>> 2) The build outputs themselves, which in this case are being written to a
>> remote storage location on a Linux or Solaris server, and
>> 3) a file called .cmake.state, which is a local cache that is written to
>> after the build script completes containing what is essentially a "Bill of
>> materials" for the files created during builds in this "view."
>>
>> We believe that the build audit file access is causing build output to get
>> flushed out of the filesystem cache. These flushes happen *in 4k chunks.*
>> Because the cache pages appear to get flushed individually, each flush
>> trips over this change.
>>
>> One note is that if the build outputs were going to a ClearCase view
>> stored on an enterprise-level NAS device, there isn't as much of an issue,
>> because many of these devices return from the stable write request as soon
>> as the data goes into the battery-backed memory disk cache on the NAS.
>> However, it really impacts writes to general-purpose OSes that follow Sun's
>> lead in how they handle "stable" writes. The truly annoying part about this
>> rather subtle change is that the NFS client is effectively ignoring the
>> client mount options: we cannot use the "async" mount option to turn off
>> this behavior.
>>
>> =================================================================
>> Brian Cowan
>> Advisory Software Engineer
>> ClearCase Customer Advocacy Group (CAG)
>> Rational Software
>> IBM Software Group
>> 81 Hartwell Ave
>> Lexington, MA
>>
>> Phone: 1.781.372.3580
>> Web: http://www.ibm.com/software/rational/support/
>>
>>
>> Please be sure to update your PMR using ESR at
>> http://www-306.ibm.com/software/support/probsub.html or cc all
>> correspondence to sw_support@us.ibm.com to be sure your PMR is updated in
>> case I am not available.
>>
>>
>>
>> From: Trond Myklebust <trond.myklebust@fys.uio.no>
>> To: Peter Staubach <staubach@redhat.com>
>> Cc: Chuck Lever <chuck.lever@oracle.com>, Brian R Cowan/Cupertino/IBM@IBMUS,
>> linux-nfs@vger.kernel.org
>> Date: 04/30/2009 05:23 PM
>> Subject: Re: Read/Write NFS I/O performance degraded by FLUSH_STABLE page flushing
>> Sent by: linux-nfs-owner@vger.kernel.org
>>
>>
>>
>> On Thu, 2009-04-30 at 16:41 -0400, Peter Staubach wrote:
>>
>>> Chuck Lever wrote:
>>>
>>>> On Apr 30, 2009, at 4:12 PM, Brian R Cowan wrote:
>>>>
>>>>> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=ab0a3dbedc51037f3d2e22ef67717a987b3d15e2
>>
>>
>>> Actually, the "stable" part can be a killer. It depends upon
>>> why and when nfs_flush_inode() is invoked.
>>>
>>> I did quite a bit of work on this aspect of RHEL-5 and discovered
>>> that this particular code was leading to some serious slowdowns.
>>> The server would end up doing a very slow FILE_SYNC write when
>>> all that was really required was an UNSTABLE write at the time.
>>>
>>> Did anyone actually measure this optimization and if so, what
>>> were the numbers?
>>>
>> As usual, the optimisation is workload dependent. The main type of
>> workload we're targeting with this patch is the app that opens a file,
>> writes < 4k and then closes the file. For that case, it's a no-brainer
>> that you don't need to split a single stable write into an unstable + a
>> commit.
>>
>> So if the application isn't doing the above type of short write followed
>> by close, then exactly what is causing a flush to disk in the first
>> place? Ordinarily, the client will try to cache writes until the cows
>> come home (or until the VM tells it to reclaim memory - whichever comes
>> first)...
>>
>> Cheers
>> Trond
>>
>>
>>
>>
>
>
>
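Here is a simplified C counting model that makes the cost of Brian's 4k-chunk flushing concrete. The names are hypothetical and this is not kernel code. It assumes that each individually flushed page becomes a single FILE_SYNC WRITE under the heuristic. It compares that case with sending the same pages as UNSTABLE WRITEs that one later COMMIT covers. In each case it counts how many times the server must push data to stable storage.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical counters for the RPCs a flush sends and the server work it causes. */
struct flush_cost {
    size_t writes;        /* WRITE RPCs sent */
    size_t commits;       /* COMMIT RPCs sent */
    size_t server_syncs;  /* times the server must reach stable storage */
};

/* Model A: each dirty page is flushed on its own as one FILE_SYNC WRITE,
 * so every WRITE forces its own synchronous update on the server. */
static struct flush_cost flush_individually_file_sync(size_t npages)
{
    struct flush_cost c = { npages, 0, npages };
    return c;
}

/* Model B (assumption): the same pages are sent one at a time as UNSTABLE
 * WRITEs, and a single COMMIT later covers all of them, so the server
 * only has to sync once. */
static struct flush_cost flush_individually_unstable(size_t npages)
{
    struct flush_cost c;

    c.writes = npages;
    c.commits = npages > 0 ? 1 : 0;
    c.server_syncs = c.commits;
    assert(c.server_syncs <= c.writes);
    return c;
}
```

For example, flushing 256 pages individually costs 256 synchronous server updates in the first model and one in the second. On a NAS that acknowledges stable writes from battery-backed cache, each of those updates is cheap. That fits Brian's observation that such servers suffer less.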