All of lore.kernel.org
 help / color / mirror / Atom feed
From: Ben Myers <bpm@sgi.com>
To: Dave Chinner <david@fromorbit.com>
Cc: xfs@oss.sgi.com
Subject: Re: [PATCH 0/2] xfs: regression fixes for 3.5-rc7
Date: Fri, 13 Jul 2012 12:36:53 -0500	[thread overview]
Message-ID: <20120713173653.GG22462@sgi.com> (raw)
In-Reply-To: <20120711235227.GA19223@dastard>

On Thu, Jul 12, 2012 at 09:52:27AM +1000, Dave Chinner wrote:
> On Wed, Jul 11, 2012 at 05:48:22PM -0500, Ben Myers wrote:
> > On Thu, Jul 12, 2012 at 07:40:41AM +1000, Dave Chinner wrote:
> > > These two patches need to go to Linus before 3.5 is released. The
> > > first fixes the btree cursor leak properly, and the second fixes a
> > > significant performance regression reported by Mel Gorman. Can you
> > > review them and send them onwards to Linus?
> > 
> > I have some test exposure on the first one, but not on the second.  I'd rather
> > have a known performance regression in 3.5 than start popping stacks again this
> > late toward the end of the rc cycle.  I suggest that #2 go in for the 3.6 merge
> > window and back to 3.5 via -stable.  Any other opinions?
> 
> I'd prefer not to have to go via a -stable kernel. It's a pain to
> have to keep track and maintain multiple -stable kernels...

I tend to agree.

> As it is, I've never once seen a popped stack through a metadata
> allocation triggered path.  The worst I've seen in the past few days
> is this: a bmbt block allocation in the sync write path:
> 
>        Depth    Size   Location    (45 entries)
>         -----    ----   --------
>   0)     5880      40   zone_statistics+0xbd/0xc0
>   1)     5840     272   get_page_from_freelist+0x3db/0x870
>   2)     5568     304   __alloc_pages_nodemask+0x192/0x840
>   3)     5264      80   alloc_pages_current+0xb0/0x120
>   4)     5184      96   xfs_buf_allocate_memory+0xfc/0x270
>   5)     5088      80   xfs_buf_get_map+0x15d/0x1b0
>   6)     5008      64   xfs_buf_read_map+0x33/0x130
>   7)     4944      48   xfs_buf_readahead_map+0x51/0x70
>   8)     4896      64   xfs_btree_reada_bufs+0xa3/0xc0
>   9)     4832      48   xfs_btree_readahead_sblock+0x7e/0xa0
>  10)     4784      16   xfs_btree_readahead+0x5f/0x80
>  11)     4768      96   xfs_btree_increment+0x52/0x360
>  12)     4672     208   xfs_btree_rshift+0x6bf/0x730
>  13)     4464     288   xfs_btree_delrec+0x1308/0x14d0
>  14)     4176      64   xfs_btree_delete+0x41/0xe0
>  15)     4112     112   xfs_alloc_fixup_trees+0x21b/0x580
>  16)     4000     192   xfs_alloc_ag_vextent_near+0xb05/0xcd0
>  17)     3808      32   xfs_alloc_ag_vextent+0x228/0x2a0
>  18)     3776     112   __xfs_alloc_vextent+0x4b7/0x910
>  19)     3664      80   xfs_alloc_vextent+0xa5/0xb0
>  20)     3584     224   xfs_bmbt_alloc_block+0xd1/0x240
>  21)     3360     240   xfs_btree_split+0xe2/0x7e0
>  22)     3120      96   xfs_btree_make_block_unfull+0xde/0x1d0
>  23)     3024     240   xfs_btree_insrec+0x587/0x8e0
>  24)     2784     112   xfs_btree_insert+0x6b/0x1d0
>  25)     2672     256   xfs_bmap_add_extent_delay_real+0x1fd4/0x2210
>  26)     2416      80   xfs_bmapi_allocate+0x290/0x350
>  27)     2336     368   xfs_bmapi_write+0x66e/0xa60
>  28)     1968     176   xfs_iomap_write_allocate+0x131/0x350
>  29)     1792     112   xfs_map_blocks+0x302/0x390
>  30)     1680     192   xfs_vm_writepage+0x19b/0x5a0
>  31)     1488      32   __writepage+0x1a/0x50
>  32)     1456     304   write_cache_pages+0x258/0x4d0
>  33)     1152      96   generic_writepages+0x4d/0x70
>  34)     1056      48   xfs_vm_writepages+0x4d/0x60
>  35)     1008      16   do_writepages+0x21/0x50
>  36)      992      64   __filemap_fdatawrite_range+0x59/0x60
>  37)      928      16   filemap_fdatawrite_range+0x13/0x20
>  38)      912      80   xfs_flush_pages+0x75/0xb0
>  39)      832     176   xfs_file_buffered_aio_write+0xdf/0x1e0
>  40)      656     128   xfs_file_aio_write+0x157/0x1d0
>  41)      528     272   do_sync_write+0xde/0x120
>  42)      256      48   vfs_write+0xa8/0x160
>  43)      208      80   sys_write+0x4a/0x90
>  44)      128     128   system_call_fastpath+0x16/0x1b

Nice stack!

> This is not quite the insanely complex, worst case double tree split
> path (i.e. bmbt split, followed by a allocbt split), but it's still
> very close to the worst case stack usage I've seen in above
> xfs_bmapi_write() (i.e. >3.5k of stack from xfs_vm_writepage() to
> the last XFS function in the stack).
> 
> In theory, the directory code could trigger such a stack, and it
> calls xfs_bmapi_write() from roughly the same stack depth. It woul
> dtake quite a large directory to be causing splits in the bmbt tree,
> so it's a relatively rare occurance....
> 
> Also, while this is arguably a metadata allocation here, this is in
> the data writeback path so perhaps the xfs_bmbt_alloc_block() call
> needs to trigger allocation via a workqueue here. That would make
> the worst case bmbt split usage (even for directory allocation)
> switch stacks.  It's rare enough that it shouldn't introduce
> performance issues....
> 
> What is notable here, however, is that none of the directory or
> inode allocation paths consumed more stack than this sync write
> case. That indicates that my statement assumption about inode and
> directory allocation does hold some water....
>
> > I'll feel better about if after some testing, so I'll get tests running asap.
> > What testing have you and Mel done?
> 
> Mel reran his performance tests that were showing regressions, and
> those regressions went away, and I've rerun all my perf tests and
> xfstests for the past couple of days and not seen any changes in
> performance or stack usage.

Sounds ok.  Wednesday night's testing didn't go well due to a hardware issue
and a PEBKAC, but last night's was fine.  Based on that smoke test and what
you've described I am more comfortable with this.  I really don't want to break
something at the last minute.

Reviewed-by: Ben Myers <bpm@sgi.com>

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

      parent reply	other threads:[~2012-07-13 17:31 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-07-11 21:40 [PATCH 0/2] xfs: regression fixes for 3.5-rc7 Dave Chinner
2012-07-11 21:40 ` [PATCH 1/2] xfs: really fix the cursor leak in xfs_alloc_ag_vextent_near Dave Chinner
2012-07-13 17:18   ` Ben Myers
2012-07-11 21:40 ` [PATCH 2/2] xfs: don't defer metadata allocation to the workqueue Dave Chinner
2012-07-11 22:48 ` [PATCH 0/2] xfs: regression fixes for 3.5-rc7 Ben Myers
2012-07-11 23:52   ` Dave Chinner
2012-07-12  8:24     ` Christoph Hellwig
2012-07-13 17:36     ` Ben Myers [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20120713173653.GG22462@sgi.com \
    --to=bpm@sgi.com \
    --cc=david@fromorbit.com \
    --cc=xfs@oss.sgi.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.