public inbox for linux-xfs@vger.kernel.org
From: Shawn Bohrer <shawn.bohrer@gmail.com>
To: Dave Chinner <david@fromorbit.com>
Cc: xfs@oss.sgi.com
Subject: Re: High latencies writing to a memory mapped file
Date: Mon, 26 Jul 2010 17:09:09 -0500	[thread overview]
Message-ID: <20100726220909.GA3017@BohrerMBP.rgmadvisors.com>
In-Reply-To: <20100722232846.GC32635@dastard>

Hey Dave,

Thanks for the suggestions.

On Fri, Jul 23, 2010 at 09:28:46AM +1000, Dave Chinner wrote:
> On Thu, Jul 22, 2010 at 09:47:06AM -0500, Shawn Bohrer wrote:
> > Hello,
> > 
> > We have an application that receives data over the network, and then
> > writes data out to a shared memory mapped file on xfs.  Previously
> > everything worked great when we were running a CentOS 5.3 kernel
> > (2.6.18-128.7.1.el5), but when we upgraded to the newer CentOS 5.4 or
> > 5.5 kernels we started to see occasional spikes (up to 1 second) in
> > the time it takes to write the data to the memory mapped file.  I
> > noticed that between the CentOS 5.3 and 5.4 kernels the upstream xfs
> > changes were backported.  Testing on 2.6.32.16 shows the same
> > occasional latency spikes.
> > 
> > After doing some debugging on 2.6.32.16 I have found the cause of the
> > latency spikes.  The spikes occur every time pdflush runs to flush the
> > dirty pages.  Note our app only writes data at about 1.5 MB/s and we
> > are well below both the dirty_background_ratio and the dirty_ratio so
> > these flushes are only of pages that are dirty_expire_centisecs old.
> > Capturing one of the spikes with ftrace shows:
> > 
> > ...
> > flush-8:0-601   [011] 1291592.127590: funcgraph_entry:                   |  xfs_ilock() {
> > flush-8:0-601   [011] 1291592.127590: funcgraph_entry:                   |    down_write() {
> > flush-8:0-601   [011] 1291592.127590: funcgraph_entry:        0.213 us   |      _cond_resched();
> > flush-8:0-601   [011] 1291592.127591: funcgraph_exit:         0.657 us   |    }
> > flush-8:0-601   [011] 1291592.127591: funcgraph_exit:         1.174 us   |  }
> > flush-8:0-601   [011] 1291592.127764: sched_switch:         task flush-8:0:601 [120] (D) ==> swapper:0 [120]
> > ...
> 
> what is the lead-in context to this hunk? i.e. how does xfs_ilock()
> get called? That will tell me if it is blocking on the iolock or the
> ilock...

I don't have the lead-in context here.  I'm using the function_graph
tracer, and with stalls up to a second in length it easily overflowed
the ftrace ring buffer.  Once I found that my process was blocking on
xfs_ilock(), I limited my traces to xfs_ilock, xfs_iunlock,
handle_mm_fault, and the scheduler events.  If you are still
interested in seeing more detail here I can try to collect some more
captures.
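For reference, my capture setup was along these lines (this is a
sketch; it assumes debugfs is mounted at /sys/kernel/debug, and the
exact filter file and event paths can differ between kernel versions):

```shell
# Trace only the XFS lock functions and the fault path with the
# function_graph tracer, plus scheduler events, so the ring buffer
# doesn't overflow during a one-second stall.
cd /sys/kernel/debug/tracing

echo 0 > tracing_on
echo function_graph > current_tracer

# Restrict the graph to the call trees of the functions of interest.
echo xfs_ilock      >  set_graph_function
echo xfs_iunlock    >> set_graph_function
echo handle_mm_fault >> set_graph_function

# Also record context switches and wakeups.
echo 1 > events/sched/sched_switch/enable
echo 1 > events/sched/sched_wakeup/enable

echo 1 > tracing_on
# ... reproduce the stall, then save the buffer ...
cat trace > /tmp/stall.trace
```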

> > Here you can see flush-8:0 acquires the xfs_ilock() then I believe it
> > blocks waiting for the I/O to complete.  Next you can see my
> > file_writer process generates a page fault while writing new data to
> > the memory mapped file.  The page fault tries to acquire the
> > xfs_ilock() and then blocks because it is held by the flush-8:0
> > process.
> > 
> > ...
> > file_writer-10549 [001] 1291593.100416: print:                00000000 write_delay start
> > file_writer-10549 [001] 1291593.100419: funcgraph_entry:                   |  handle_mm_fault() {
> > file_writer-10549 [001] 1291593.100420: funcgraph_entry:                   |    __do_fault() {
> > file_writer-10549 [001] 1291593.100420: funcgraph_entry:                   |      filemap_fault() {
> > file_writer-10549 [001] 1291593.100421: funcgraph_entry:        0.775 us   |        find_get_page();
> > file_writer-10549 [001] 1291593.100422: funcgraph_entry:        0.191 us   |        _cond_resched();
> > file_writer-10549 [001] 1291593.100422: funcgraph_exit:         1.980 us   |      }
> > file_writer-10549 [001] 1291593.100422: funcgraph_entry:                   |      unlock_page() {
> > file_writer-10549 [001] 1291593.100423: funcgraph_entry:        0.154 us   |        page_waitqueue();
> > file_writer-10549 [001] 1291593.100423: funcgraph_entry:        0.313 us   |        __wake_up_bit();
> > file_writer-10549 [001] 1291593.100423: funcgraph_exit:         1.080 us   |      }
> > file_writer-10549 [001] 1291593.100424: funcgraph_entry:                   |      xfs_vm_page_mkwrite() {
> > file_writer-10549 [001] 1291593.100424: funcgraph_entry:                   |        block_page_mkwrite() {
> > file_writer-10549 [001] 1291593.100424: funcgraph_entry:        0.240 us   |          _cond_resched();
> > file_writer-10549 [001] 1291593.100425: funcgraph_entry:                   |          block_prepare_write() {
> > file_writer-10549 [001] 1291593.100425: funcgraph_entry:                   |            __block_prepare_write() {
> > file_writer-10549 [001] 1291593.100426: funcgraph_entry:                   |              create_empty_buffers() {
> > file_writer-10549 [001] 1291593.100426: funcgraph_entry:                   |                alloc_page_buffers() {
> > file_writer-10549 [001] 1291593.100426: funcgraph_entry:                   |                  alloc_buffer_head() {
> > file_writer-10549 [001] 1291593.100426: funcgraph_entry:                   |                    kmem_cache_alloc() {
> > file_writer-10549 [001] 1291593.100427: funcgraph_entry:        0.491 us   |                      _cond_resched();
> > file_writer-10549 [001] 1291593.100427: funcgraph_exit:         1.043 us   |                    }
> > file_writer-10549 [001] 1291593.100428: funcgraph_entry:        0.185 us   |                    recalc_bh_state();
> > file_writer-10549 [001] 1291593.100428: funcgraph_exit:         2.003 us   |                  }
> > file_writer-10549 [001] 1291593.100428: funcgraph_entry:        0.137 us   |                  set_bh_page();
> > file_writer-10549 [001] 1291593.100428: funcgraph_exit:         2.701 us   |                }
> > file_writer-10549 [001] 1291593.100429: funcgraph_entry:        0.271 us   |                _spin_lock();
> > file_writer-10549 [001] 1291593.100429: funcgraph_exit:         3.751 us   |              }
> > file_writer-10549 [001] 1291593.100429: funcgraph_entry:                   |              xfs_get_blocks() {
> > file_writer-10549 [001] 1291593.100430: funcgraph_entry:                   |                __xfs_get_blocks() {
> > file_writer-10549 [001] 1291593.100430: funcgraph_entry:                   |                  xfs_iomap() {
> > file_writer-10549 [001] 1291593.100430: funcgraph_entry:                   |                    xfs_ilock() {
> 
> So this is trying to lock the ilock exclusively to map the range or
> do delayed allocation reservations.
> 
> > Finally xfslogd/1 wakes up the flush-8:0 process which releases the
> > xfs_ilock() waking up my file_writer process and allowing it to
> > continue.
> > ...
> > xfslogd/1-213   [001] 1291593.158606: sched_wakeup:         task flush-8:0:601 [120] success=1 [011]
> >    <idle>-0     [011] 1291593.158609: sched_switch:         task swapper:0 [120] (R) ==> flush-8:0:601 [120]
> 
> xfslogd? Inodes do not get locked across log writes; they are
> unlocked in the transaction commit.
> 
> Hmmm - perhaps the issue is that the transaction commit has not been
> able to get a iclog buffer to write the transaction into and so is
> sleeping waiting for an iclog IO to complete. That would be woken by
> the xfslogd, and then the transaction would write into the
> now-available iclog and unlock the inode.
> 
> So, I'd say that in this case your application is waiting for log IO
> completion to occur. Nothing you can do to avoid that, but you can
> minimise the impact - increase the log buffer size via the mount
> option logbsize (i.e. -o logbsize=262144) to reduce the frequency of
> iclog buffer IOs and make it less likely the system will stall
> after running out of iclog space.
> 
> The other possibility is that the log IO is being slowed by barriers
> being issued with the log writes. If you've got non-volatile caches
> on your storage, you can turn off barriers and that should help
> reduce log IO latency significantly.

I've tried mounting with "noatime,nobarrier,logbufs=8,logbsize=262144"
but I don't notice much difference and still see the stalls.
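For the record, the remount I tested looked roughly like this (the
device and mount point here are placeholders for our actual setup):

```shell
# Remount with larger log buffers and barriers disabled.  nobarrier
# is only safe when the storage has a non-volatile write cache.
mount -o remount,noatime,nobarrier,logbufs=8,logbsize=262144 \
    /dev/sdb1 /data

# Confirm the options took effect.
grep /data /proc/mounts
```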

> Another possibility is to switch to the deadline or no-op IO
> scheduler to avoid log IO being delayed by (the ever changing)
> CFQ IO scheduling idiosyncrasies.

I also tried the noop IO scheduler, but didn't see any improvement
there either.
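I switched schedulers at runtime like so (sdb is a placeholder for
the device backing the filesystem):

```shell
# Switch the block device to the noop elevator; this takes effect
# immediately, no remount needed.
echo noop > /sys/block/sdb/queue/scheduler

# Verify -- the active scheduler is shown in square brackets.
cat /sys/block/sdb/queue/scheduler
```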

Thanks,
Shawn

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

Thread overview: 13+ messages
2010-07-22 14:47 High latencies writing to a memory mapped file Shawn Bohrer
2010-07-22 23:28 ` Dave Chinner
2010-07-26 22:09   ` Shawn Bohrer [this message]
2010-07-26 23:22     ` Dave Chinner
2010-08-03 22:03       ` Shawn Bohrer
2010-07-27  9:24 ` Matthias Schniedermeyer
2010-07-27 10:47   ` Emmanuel Florac
2010-07-27 11:27     ` Matthias Schniedermeyer
2010-07-27 13:06       ` Emmanuel Florac
2010-07-27 13:00     ` Stan Hoeppner
     [not found]   ` <4C4EDEFD.7000401@hardwarefreak.com>
2010-07-27 14:49     ` Matthias Schniedermeyer
2010-07-27 19:59       ` Kinzel, David
2010-07-27 21:02         ` Matthias Schniedermeyer
