Re: [-stable 3.8.1 performance regression] madvise POSIX_FADV_DONTNEED

From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
To: Dave Chinner <david@fromorbit.com>
Cc: Rob van der Heij <rvdheij@gmail.com>,
	Mel Gorman <mgorman@suse.de>,
	Andrew Morton <akpm@linux-foundation.org>,
	Yannick Brosseau <yannick.brosseau@gmail.com>,
	stable@vger.kernel.org, LKML <linux-kernel@vger.kernel.org>,
	"lttng-dev@lists.lttng.org" <lttng-dev@lists.lttng.org>
Subject: Re: [-stable 3.8.1 performance regression] madvise POSIX_FADV_DONTNEED
Date: Tue, 2 Jul 2013 20:55:14 -0400
Message-ID: <20130703005514.GA17149@Krystal>
In-Reply-To: <20130702135858.GA30837@Krystal>

* Mathieu Desnoyers (mathieu.desnoyers@efficios.com) wrote:
> * Dave Chinner (david@fromorbit.com) wrote:
> > On Thu, Jun 20, 2013 at 08:20:16AM -0400, Mathieu Desnoyers wrote:
> > > * Rob van der Heij (rvdheij@gmail.com) wrote:
> > > > Wouldn't you batch the calls to drop the pages from cache rather than drop
> > > > one packet at a time?
> > > 
> > > By default for kernel tracing, lttng's trace packets are 1MB, so I
> > > consider the call to fadvise to be already batched by applying it to 1MB
> > > packets rather than individual pages. Even there, it seems that the
> > > extra overhead added by the lru drain on each CPU is noticeable.
> > > 
> > > Another reason for not batching this in larger chunks is to limit the
> > > impact of the tracer on the kernel page cache. LTTng limits itself to
> > > its own set of buffers, and uses the page cache for what is absolutely
> > > needed to perform I/O, but no more.
> > 
> > I think you are doing it wrong. This is a poster child case for
> > using Direct IO and completely avoiding the page cache altogether....
> 
> I just tried replacing my sync_file_range()+fadvise() calls and instead
> pass the O_DIRECT flag to open(). Unfortunately, I must be doing
> something very wrong, because I get only 1/3rd of the throughput, and
> the page cache fills up. Any idea why ?

Since O_DIRECT does not seem to provide acceptable throughput, it may be
interesting to investigate other ways to lessen the latency impact of
the fadvise DONTNEED hint.

Given that it is only a hint, we should be allowed to perform page
deactivation lazily. Is there any fundamental reason to wait for worker
threads on each CPU to complete their lru drain before returning from
fadvise() to user space?
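
For reference, the sync_file_range()+fadvise() pattern under discussion
looks roughly like the sketch below. It is only an illustration, not the
actual LTTng code: flush_and_drop() is a hypothetical helper name, and
the choice of sync_file_range() flags is my assumption about a
reasonable way to make the pages clean before the DONTNEED hint.

```c
#define _GNU_SOURCE             /* sync_file_range() is Linux-specific */
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: write back [offset, offset + len) of fd, then
 * hint that the kernel may drop those pages from the page cache.
 *
 * Returns 0 on success, -1 if sync_file_range() fails (errno is set),
 * or the positive error number returned by posix_fadvise().
 */
static int flush_and_drop(int fd, off_t offset, off_t len)
{
	int ret;

	/*
	 * Wait for any writeback already in flight, start writeback of
	 * the dirty pages in the range, and wait for it to complete.
	 * Only clean pages can be dropped by the hint below.
	 */
	ret = sync_file_range(fd, offset, len,
			      SYNC_FILE_RANGE_WAIT_BEFORE |
			      SYNC_FILE_RANGE_WRITE |
			      SYNC_FILE_RANGE_WAIT_AFTER);
	if (ret < 0)
		return -1;

	/*
	 * This is a hint only. Note that posix_fadvise() returns the
	 * error number directly rather than setting errno.
	 */
	return posix_fadvise(fd, offset, len, POSIX_FADV_DONTNEED);
}
```

A tracer would call something like this once per packet written (1MB
by default for kernel tracing, as mentioned earlier in the thread), so
the cost of the fadvise() call is paid once per packet.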

Thanks,

Mathieu

> 
> Here are my results:
> 
> heavy-syscall.c: 30M sigaction() syscalls with bad parameters (each
> returns immediately). Used as high-throughput stress-test for the tracer.
> Tracing to disk with LTTng, all kernel tracepoints activated, including
> system calls.
> 
> Tracer configuration: per-core buffers split into 4 sub-buffers of
> 262kB. splice() is used to transfer data from buffers to disk. Runs on
> an 8-core Intel machine.
> 
> Writing to a software raid-1 ext3 partition.
> ext3 mount options: rw,errors=remount-ro
> 
> * sync_file_range+fadvise 3.9.8
>   - with lru drain on fadvise
> 
> Kernel cache usage:
> Before tracing: 56272k cached
> After tracing:  56388k cached
> 
> 939M	/root/lttng-traces/auto-20130702-090430
> time ./heavy-syscall 
> real	0m21.910s
> throughput: 42MB/s
> 
> 
> * sync_file_range+fadvise 3.9.8
>   - without lru drain on fadvise: manually reverted
> 
> Kernel cache usage:
> Before tracing: 67968k cached
> After tracing:  67984k cached
> 
> 945M	/root/lttng-traces/auto-20130702-092505
> time ./heavy-syscall 
> real	0m21.872s
> throughput: 43MB/s
> 
> 
> * O_DIRECT 3.9.8
>   - O_DIRECT flag on open(), removed fadvise and sync_file_range calls
> 
> Kernel cache usage:
> Before tracing:  99480k cached
> After tracing:  360132k cached
> 
> 258M	/root/lttng-traces/auto-20130702-090603
> time ./heavy-syscall 
> real	0m19.627s
> throughput: 13MB/s
> 
> 
> * No cache hints 3.9.8
>   - only removed fadvise and sync_file_range calls
> 
> Kernel cache usage:
> Before tracing: 103556k cached
> After tracing:  363712k cached
> 
> 945M	/root/lttng-traces/auto-20130702-092505
> time ./heavy-syscall 
> real	0m19.672s
> throughput: 48MB/s
> 
> Thoughts ?
> 
> Thanks,
> 
> Mathieu
> 
> -- 
> Mathieu Desnoyers
> EfficiOS Inc.
> http://www.efficios.com

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

Thread overview: 23+ messages
     [not found] <51BE1828.3060206@gmail.com>
2013-06-17 14:13 ` [-stable 3.8.1 performance regression] madvise POSIX_FADV_DONTNEED Mathieu Desnoyers
2013-06-17 21:24   ` Andrew Morton
2013-06-17 21:39     ` Raphaël Beamonte
     [not found]     ` <CAE_Gge34HCroSgNgiXL1j7Le3CNKRXR=7TZQhJSmY+wfWniKug@mail.gmail.com>
2013-06-17 21:57       ` [lttng-dev] " Andrew Morton
2013-06-18  2:15         ` Mathieu Desnoyers
2013-06-18  2:44           ` Andrew Morton
2013-06-18  9:29     ` Mel Gorman
2013-06-18 10:11       ` Mel Gorman
2013-06-19 19:25         ` Mathieu Desnoyers
2013-06-20  6:36           ` Rob van der Heij
     [not found]           ` <CAJCc=kijujORhPUmPvzHj-MMdyVbf-iHEK0Jx-VHbTO8q4ESFA@mail.gmail.com>
2013-06-20 12:20             ` Mathieu Desnoyers
2013-06-25  1:56               ` Dave Chinner
2013-07-02 13:58                 ` Mathieu Desnoyers
2013-07-03  0:55                   ` Mathieu Desnoyers [this message]
2013-07-03  8:47                     ` Mel Gorman
2013-07-03 14:53                       ` Jeff Moyer
2013-07-04  0:03                         ` Dave Chinner
2013-07-04  0:31                           ` Mathieu Desnoyers
2013-07-04 21:11                             ` Rob van der Heij
2013-07-05  1:42                             ` Dave Chinner
2013-07-05  2:34                               ` Mathieu Desnoyers
2013-07-03 18:47                       ` Yannick Brosseau
2013-07-05 14:18                         ` Mel Gorman
