From: Ben Myers <bpm@sgi.com>
To: Dave Chinner <david@fromorbit.com>
Cc: xfs@oss.sgi.com
Subject: Re: [PATCH 0/2] xfs: regression fixes for 3.5-rc7
Date: Fri, 13 Jul 2012 12:36:53 -0500 [thread overview]
Message-ID: <20120713173653.GG22462@sgi.com> (raw)
In-Reply-To: <20120711235227.GA19223@dastard>
On Thu, Jul 12, 2012 at 09:52:27AM +1000, Dave Chinner wrote:
> On Wed, Jul 11, 2012 at 05:48:22PM -0500, Ben Myers wrote:
> > On Thu, Jul 12, 2012 at 07:40:41AM +1000, Dave Chinner wrote:
> > > These two patches need to go to Linus before 3.5 is released. The
> > > first fixes the btree cursor leak properly, and the second fixes a
> > > significant performance regression reported by Mel Gorman. Can you
> > > review them and send them onwards to Linus?
> >
> > I have some test exposure on the first one, but not on the second. I'd rather
> > have a known performance regression in 3.5 than start popping stacks again this
> > late toward the end of the rc cycle. I suggest that #2 go in for the 3.6 merge
> > window and back to 3.5 via -stable. Any other opinions?
>
> I'd prefer not to have to go via a -stable kernel. It's a pain to
> have to keep track and maintain multiple -stable kernels...
I tend to agree.
> As it is, I've never once seen a popped stack through a metadata
> allocation triggered path. The worst I've seen in the past few days
> is this: a bmbt block allocation in the sync write path:
>
> Depth Size Location (45 entries)
> ----- ---- --------
> 0) 5880 40 zone_statistics+0xbd/0xc0
> 1) 5840 272 get_page_from_freelist+0x3db/0x870
> 2) 5568 304 __alloc_pages_nodemask+0x192/0x840
> 3) 5264 80 alloc_pages_current+0xb0/0x120
> 4) 5184 96 xfs_buf_allocate_memory+0xfc/0x270
> 5) 5088 80 xfs_buf_get_map+0x15d/0x1b0
> 6) 5008 64 xfs_buf_read_map+0x33/0x130
> 7) 4944 48 xfs_buf_readahead_map+0x51/0x70
> 8) 4896 64 xfs_btree_reada_bufs+0xa3/0xc0
> 9) 4832 48 xfs_btree_readahead_sblock+0x7e/0xa0
> 10) 4784 16 xfs_btree_readahead+0x5f/0x80
> 11) 4768 96 xfs_btree_increment+0x52/0x360
> 12) 4672 208 xfs_btree_rshift+0x6bf/0x730
> 13) 4464 288 xfs_btree_delrec+0x1308/0x14d0
> 14) 4176 64 xfs_btree_delete+0x41/0xe0
> 15) 4112 112 xfs_alloc_fixup_trees+0x21b/0x580
> 16) 4000 192 xfs_alloc_ag_vextent_near+0xb05/0xcd0
> 17) 3808 32 xfs_alloc_ag_vextent+0x228/0x2a0
> 18) 3776 112 __xfs_alloc_vextent+0x4b7/0x910
> 19) 3664 80 xfs_alloc_vextent+0xa5/0xb0
> 20) 3584 224 xfs_bmbt_alloc_block+0xd1/0x240
> 21) 3360 240 xfs_btree_split+0xe2/0x7e0
> 22) 3120 96 xfs_btree_make_block_unfull+0xde/0x1d0
> 23) 3024 240 xfs_btree_insrec+0x587/0x8e0
> 24) 2784 112 xfs_btree_insert+0x6b/0x1d0
> 25) 2672 256 xfs_bmap_add_extent_delay_real+0x1fd4/0x2210
> 26) 2416 80 xfs_bmapi_allocate+0x290/0x350
> 27) 2336 368 xfs_bmapi_write+0x66e/0xa60
> 28) 1968 176 xfs_iomap_write_allocate+0x131/0x350
> 29) 1792 112 xfs_map_blocks+0x302/0x390
> 30) 1680 192 xfs_vm_writepage+0x19b/0x5a0
> 31) 1488 32 __writepage+0x1a/0x50
> 32) 1456 304 write_cache_pages+0x258/0x4d0
> 33) 1152 96 generic_writepages+0x4d/0x70
> 34) 1056 48 xfs_vm_writepages+0x4d/0x60
> 35) 1008 16 do_writepages+0x21/0x50
> 36) 992 64 __filemap_fdatawrite_range+0x59/0x60
> 37) 928 16 filemap_fdatawrite_range+0x13/0x20
> 38) 912 80 xfs_flush_pages+0x75/0xb0
> 39) 832 176 xfs_file_buffered_aio_write+0xdf/0x1e0
> 40) 656 128 xfs_file_aio_write+0x157/0x1d0
> 41) 528 272 do_sync_write+0xde/0x120
> 42) 256 48 vfs_write+0xa8/0x160
> 43) 208 80 sys_write+0x4a/0x90
> 44) 128 128 system_call_fastpath+0x16/0x1b
Nice stack!
> This is not quite the insanely complex, worst case double tree split
> path (i.e. bmbt split, followed by a allocbt split), but it's still
> very close to the worst case stack usage I've seen in above
> xfs_bmapi_write() (i.e. >3.5k of stack from xfs_vm_writepage() to
> the last XFS function in the stack).
>
> In theory, the directory code could trigger such a stack, and it
> calls xfs_bmapi_write() from roughly the same stack depth. It woul
> dtake quite a large directory to be causing splits in the bmbt tree,
> so it's a relatively rare occurance....
>
> Also, while this is arguably a metadata allocation here, this is in
> the data writeback path so perhaps the xfs_bmbt_alloc_block() call
> needs to trigger allocation via a workqueue here. That would make
> the worst case bmbt split usage (even for directory allocation)
> switch stacks. It's rare enough that it shouldn't introduce
> performance issues....
>
> What is notable here, however, is that none of the directory or
> inode allocation paths consumed more stack than this sync write
> case. That indicates that my statement assumption about inode and
> directory allocation does hold some water....
>
> > I'll feel better about if after some testing, so I'll get tests running asap.
> > What testing have you and Mel done?
>
> Mel reran his performance tests that were showing regressions, and
> those regressions went away, and I've rerun all my perf tests and
> xfstests for the past couple of days and not seen any changes in
> performance or stack usage.
Sounds ok. Wednesday night's testing didn't go well due to a hardware issue
and a PEBKAC, but last night's was fine. Based on that smoke test and what
you've described I am more comfortable with this. I really don't want to break
something at the last minute.
Reviewed-by: Ben Myers <bpm@sgi.com>
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
prev parent reply other threads:[~2012-07-13 17:31 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-07-11 21:40 [PATCH 0/2] xfs: regression fixes for 3.5-rc7 Dave Chinner
2012-07-11 21:40 ` [PATCH 1/2] xfs: really fix the cursor leak in xfs_alloc_ag_vextent_near Dave Chinner
2012-07-13 17:18 ` Ben Myers
2012-07-11 21:40 ` [PATCH 2/2] xfs: don't defer metadata allocation to the workqueue Dave Chinner
2012-07-11 22:48 ` [PATCH 0/2] xfs: regression fixes for 3.5-rc7 Ben Myers
2012-07-11 23:52 ` Dave Chinner
2012-07-12 8:24 ` Christoph Hellwig
2012-07-13 17:36 ` Ben Myers [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20120713173653.GG22462@sgi.com \
--to=bpm@sgi.com \
--cc=david@fromorbit.com \
--cc=xfs@oss.sgi.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.