From: Dave Chinner <david@fromorbit.com>
To: Jens Axboe <jaxboe@fusionio.com>
Cc: "hch@infradead.org" <hch@infradead.org>,
NeilBrown <neilb@suse.de>, Mike Snitzer <snitzer@redhat.com>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"dm-devel@redhat.com" <dm-devel@redhat.com>,
"linux-raid@vger.kernel.org" <linux-raid@vger.kernel.org>
Subject: Re: [PATCH 05/10] block: remove per-queue plugging
Date: Wed, 13 Apr 2011 09:35:36 +1000 [thread overview]
Message-ID: <20110412233536.GK31057@dastard> (raw)
In-Reply-To: <4DA4581A.4090600@fusionio.com>

On Tue, Apr 12, 2011 at 03:48:10PM +0200, Jens Axboe wrote:
> On 2011-04-12 15:40, Dave Chinner wrote:
> > On Tue, Apr 12, 2011 at 02:28:31PM +0200, Jens Axboe wrote:
> >> On 2011-04-12 14:22, Dave Chinner wrote:
> >>> On Tue, Apr 12, 2011 at 10:36:30AM +0200, Jens Axboe wrote:
> >>>> On 2011-04-12 03:12, hch@infradead.org wrote:
> >>>>> On Mon, Apr 11, 2011 at 02:48:45PM +0200, Jens Axboe wrote:
> >>>>> function calls.
> >>>>> - Why is having a plug in blk_flush_plug marked unlikely? Note that
> >>>>> unlikely is the static branch prediction hint to mark the case
> >>>>> extremely unlikely and is even used for hot/cold partitioning. But
> >>>>> when we call it we usually check beforehand if we actually have
> >>>>> plugs, so it's actually likely to happen.
> >>>>
> >>>> The existence and out-of-line placement are for the scheduler() hook. It should be
> >>>> an unlikely event to schedule with a plug held, normally the plug should
> >>>> have been explicitly unplugged before that happens.
> >>>
> >>> Though if it does, haven't you just added a significant amount of
> >>> depth to the worst case stack usage? I'm seeing this sort of thing
> >>> from io_schedule():
> >>>
> >>> Depth Size Location (40 entries)
> >>> ----- ---- --------
> >>> 0) 4256 16 mempool_alloc_slab+0x15/0x20
> >>> 1) 4240 144 mempool_alloc+0x63/0x160
> >>> 2) 4096 16 scsi_sg_alloc+0x4c/0x60
> >>> 3) 4080 112 __sg_alloc_table+0x66/0x140
> >>> 4) 3968 32 scsi_init_sgtable+0x33/0x90
> >>> 5) 3936 48 scsi_init_io+0x31/0xc0
> >>> 6) 3888 32 scsi_setup_fs_cmnd+0x79/0xe0
> >>> 7) 3856 112 sd_prep_fn+0x150/0xa90
> >>> 8) 3744 48 blk_peek_request+0x6a/0x1f0
> >>> 9) 3696 96 scsi_request_fn+0x60/0x510
> >>> 10) 3600 32 __blk_run_queue+0x57/0x100
> >>> 11) 3568 80 flush_plug_list+0x133/0x1d0
> >>> 12) 3488 32 __blk_flush_plug+0x24/0x50
> >>> 13) 3456 32 io_schedule+0x79/0x80
> >>>
> >>> (This is from a page fault on ext3 that is doing page cache
> >>> readahead and blocking on a locked buffer.)
> >
> > FYI, the next step in the allocation chain adds >900 bytes to that
> > stack:
> >
> > $ cat /sys/kernel/debug/tracing/stack_trace
> > Depth Size Location (47 entries)
> > ----- ---- --------
> > 0) 5176 40 zone_statistics+0xad/0xc0
> > 1) 5136 288 get_page_from_freelist+0x2cf/0x840
> > 2) 4848 304 __alloc_pages_nodemask+0x121/0x930
> > 3) 4544 48 kmem_getpages+0x62/0x160
> > 4) 4496 96 cache_grow+0x308/0x330
> > 5) 4400 80 cache_alloc_refill+0x21c/0x260
> > 6) 4320 64 kmem_cache_alloc+0x1b7/0x1e0
> > 7) 4256 16 mempool_alloc_slab+0x15/0x20
> > 8) 4240 144 mempool_alloc+0x63/0x160
> > 9) 4096 16 scsi_sg_alloc+0x4c/0x60
> > 10) 4080 112 __sg_alloc_table+0x66/0x140
> > 11) 3968 32 scsi_init_sgtable+0x33/0x90
> > 12) 3936 48 scsi_init_io+0x31/0xc0
> > 13) 3888 32 scsi_setup_fs_cmnd+0x79/0xe0
> > 14) 3856 112 sd_prep_fn+0x150/0xa90
> > 15) 3744 48 blk_peek_request+0x6a/0x1f0
> > 16) 3696 96 scsi_request_fn+0x60/0x510
> > 17) 3600 32 __blk_run_queue+0x57/0x100
> > 18) 3568 80 flush_plug_list+0x133/0x1d0
> > 19) 3488 32 __blk_flush_plug+0x24/0x50
> > 20) 3456 32 io_schedule+0x79/0x80
> >
> > That's close to 1800 bytes now, and that's not entering the reclaim
> > path. If i get one deeper than that, I'll be sure to post it. :)
>
> Do you have traces from 2.6.38, or are you just doing them now?

I do stack checks like this all the time. I generally don't keep
them around, just pay attention to the path and depth. ext3 is used
for / on my test VMs, and it has never shown up as the worst-case
stack usage when running xfstests. Since the block plugging code
went in, this trace is the top stack user for the first ~130 tests,
and often for the entire test run on XFS....
> The path you quote above should not go into reclaim, it's a GFP_ATOMIC
> allocation.

Right. I'm still trying to produce a trace that shows more stack
usage in the block layer. It's random chance as to what pops up most
of the time. However, some of the stacks that are showing up in
2.6.39 are quite different from any I've ever seen before...

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com