From: Dave Chinner <david@fromorbit.com>
To: Jens Axboe <axboe@kernel.dk>
Cc: Christoph Hellwig <hch@infradead.org>,
linux-fsdevel@vger.kernel.org, xfs@oss.sgi.com,
linux-mm@kvack.org, linux-man@vger.kernel.org
Subject: Re: [PATCH] [RFC] xfs: wire up aio_fsync method
Date: Wed, 18 Jun 2014 16:13:20 +1000
Message-ID: <20140618061320.GP9508@dastard>
In-Reply-To: <20140618050230.GO9508@dastard>

On Wed, Jun 18, 2014 at 03:02:30PM +1000, Dave Chinner wrote:
> On Tue, Jun 17, 2014 at 08:20:55PM -0700, Jens Axboe wrote:
> > On 2014-06-17 20:13, Dave Chinner wrote:
> > >On Tue, Jun 17, 2014 at 07:24:10PM -0700, Jens Axboe wrote:
> > >>On 2014-06-17 17:28, Dave Chinner wrote:
> > >>>[cc linux-mm]
> > >>>
> > >>>On Tue, Jun 17, 2014 at 07:23:58AM -0600, Jens Axboe wrote:
> > >>>>On 2014-06-16 16:27, Dave Chinner wrote:
> > >>>>>On Mon, Jun 16, 2014 at 01:30:42PM -0600, Jens Axboe wrote:
> > >>>>>>On 06/16/2014 01:19 AM, Dave Chinner wrote:
> > >>>>>>>On Sun, Jun 15, 2014 at 08:58:46PM -0600, Jens Axboe wrote:
> > >>>>>>>>On 2014-06-15 20:00, Dave Chinner wrote:
> > >>>>>>>>>On Mon, Jun 16, 2014 at 08:33:23AM +1000, Dave Chinner wrote:
> > >>>>>>>>>FWIW, the non-linear system CPU overhead of a fs_mark test I've been
> > >>>>>>>>>running isn't anything related to XFS. The async fsync workqueue
> > >>>>>>>>>results in several thousand worker threads dispatching IO
> > >>>>>>>>>concurrently across 16 CPUs:
> > >....
> > >>>>>>>>>I know that the tag allocator has been rewritten, so I tested
> > >>>>>>>>>against a current Linus kernel with the XFS aio-fsync
> > >>>>>>>>>patch. The results are all over the place - from several sequential
> > >>>>>>>>>runs of the same test (removing the files in between so each test
> > >>>>>>>>>starts from an empty fs):
> > >>>>>>>>>
> > >>>>>>>>>Wall time    sys time     IOPS    files/s
> > >>>>>>>>>4m58.151s    11m12.648s   30,000  13,500
> > >>>>>>>>>4m35.075s    12m45.900s   45,000  15,000
> > >>>>>>>>>3m10.665s    11m15.804s   65,000  21,000
> > >>>>>>>>>3m27.384s    11m54.723s   85,000  20,000
> > >>>>>>>>>3m59.574s    11m12.012s   50,000  16,500
> > >>>>>>>>>4m12.704s    12m15.720s   50,000  17,000
>
> ....
> > >But the IOPS rate has definitely increased with this config
> > >- I just saw 90k, 100k and 110k IOPS in the last 3 iterations of the
> > >workload (the above profile is from the 100k IOPS period). However,
> > >the wall time was still only 3m58s, which again tends to implicate
> > >the write() portion of the benchmark for causing the slowdowns
> > >rather than the fsync() portion that is dispatching all the IO...
> >
> > Some contention for this case is hard to avoid, and the above looks
> > better than 3.15 does. So the big question is whether it's worth
> > fixing the gaps with multiple waitqueues (and if that actually still
> > buys us anything), or whether we should just disable them.
> >
> > If I can get you to try one more thing, can you apply this patch and
> > give that a whirl? Get rid of the other patches I sent first; this
> > has everything.
>
> Not much difference in the CPU usage profiles or base line
> performance. It runs at 3m10s from empty memory, and ~3m45s when
> memory starts full of clean pages. System time varies from 10m40s to
> 12m55s with no real correlation to overall runtime.
>
> From observation of all the performance metrics I graph in real
> time, however, the pattern of the peaks and troughs from run to run
> and even iteration to iteration is much more regular than with the
> previous patches. So from that perspective it is an improvement.
> Again, all the variability in the graphs shows up when free memory
> runs out...

And I've identified the commit that caused the memory reclaim
behaviour to go south:

commit 1f6d64829db78a7e1d63e15c9f48f0a5d2b5a679
Author: Dave Chinner <dchinner@redhat.com>
Date:   Fri Jun 6 15:59:59 2014 +1000

    xfs: block allocation work needs to be kswapd aware

    Upon memory pressure, kswapd calls xfs_vm_writepage() from
    shrink_page_list(). This can result in delayed allocation occurring
    and that gets deferred to the allocation workqueue.

    The allocation then runs outside kswapd context, which means if it
    needs memory (and it does to demand-page metadata from disk) it can
    block in shrink_inactive_list() waiting for IO congestion. These
    blocking waits are normally avoided in kswapd context, so under
    memory pressure writeback from kswapd can be arbitrarily delayed by
    memory reclaim.

    To avoid this, pass the kswapd context to the allocation being done
    by the workqueue, so that memory reclaim understands correctly that
    the work is being done for kswapd and therefore it is not blocked
    and does not delay memory reclaim.

    To avoid issues with int->char conversion of flag fields (as noticed
    in v1 of this patch), convert the flag fields in struct
    xfs_bmalloca to bool types. pahole indicates these variables are
    still single-byte variables, so no extra space is consumed by this
    change.

    cc: <stable@vger.kernel.org>
    Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
    Signed-off-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Dave Chinner <david@fromorbit.com>

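For anyone following along without the code in front of them, the
change boils down to something like this - a from-memory sketch of
the key pieces, not the verbatim diff, so field and helper names may
differ slightly:

	/* Tag the deferred allocation with the submitter's context. */
	struct xfs_bmalloca {
		...
		bool			kswapd;	/* allocating in kswapd context */
	};

	/* In xfs_bmapi_allocate(), before queueing the work item: */
	args->kswapd = current_is_kswapd();

	static void
	xfs_bmapi_allocate_worker(
		struct work_struct	*work)
	{
		struct xfs_bmalloca	*args = container_of(work,
						struct xfs_bmalloca, work);
		unsigned long		pflags;
		unsigned long		new_pflags = PF_FSTRANS;

		/*
		 * If the submitter was kswapd, run the allocation as
		 * kswapd, too, so that reclaim recursion inside the
		 * worker doesn't block on IO congestion the way a
		 * normal task would.
		 */
		if (args->kswapd)
			new_pflags |= PF_KSWAPD;

		current_set_flags_nested(&pflags, new_pflags);
		args->result = __xfs_bmapi_allocate(args);
		complete(args->done);
		current_restore_flags_nested(&pflags, new_pflags);
	}

The net effect is that PF_KSWAPD follows the work item, so the
reclaim heuristics treat the worker thread the same way they would
have treated kswapd itself.
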
Reverting this patch results in runtimes of between 3m and 3m10s
regardless of the amount of free memory when the test starts.

I'm probably going to have to revert this and make sure it stays out
of the stable kernels now - I think that unbalancing memory reclaim
and introducing performance degradations of 25-30% to work around a
problem that is only hit by an extreme memory pressure stress test
is a bad trade-off.....

Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com