* RE: per-cpu blk_plug_list
From: Chen, Kenneth W @ 2004-03-03 4:20 UTC
To: Andrew Morton, Jesse Barnes; +Cc: linux-kernel, linux-ia64
I don't understand the proposal here. There is a per-device lock
already. But a plugged queue needs to be on some list outside itself
so that a group of them can be unplugged later to flush all the I/O.
- Ken
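
For context, a simplified sketch of the global arrangement under discussion,
reconstructed from memory of the 2.6.3-era ll_rw_blk.c rather than copied
from it (flag handling and error paths are elided):

static LIST_HEAD(blk_plug_list);
static spinlock_t blk_plug_lock = SPIN_LOCK_UNLOCKED;

/* Plugging a device parks its queue on the single global list... */
void blk_plug_device(request_queue_t *q)
{
	if (!blk_queue_plugged(q)) {
		spin_lock(&blk_plug_lock);
		list_add_tail(&q->plug_list, &blk_plug_list);
		spin_unlock(&blk_plug_lock);
	}
}

/* ...and flushing pending I/O walks that same list, so every CPU that
 * submits I/O bounces blk_plug_lock and the list's cachelines. */
void blk_run_queues(void)
{
	LIST_HEAD(local);

	spin_lock_irq(&blk_plug_lock);
	list_splice_init(&blk_plug_list, &local);
	spin_unlock_irq(&blk_plug_lock);

	while (!list_empty(&local)) {
		request_queue_t *q = list_entry(local.next,
						request_queue_t, plug_list);

		list_del_init(&q->plug_list);	/* no longer plugged */
		q->unplug_fn(q);		/* kick the driver */
	}
}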
-----Original Message-----
From: Andrew Morton [mailto:akpm@osdl.org]
Sent: Tuesday, March 02, 2004 7:56 PM
To: Jesse Barnes
Cc: Chen, Kenneth W; linux-kernel@vger.kernel.org
Subject: Re: per-cpu blk_plug_list
> On Mon, Mar 01, 2004 at 01:18:40PM -0800, Chen, Kenneth W wrote:
> > blk_plug_list/blk_plug_lock manages plug/unplug action. When you have
> > lots of CPUs simultaneously submitting I/O, there is a lot of movement
> > of device queues on and off that global list. Our measurements showed
> > that blk_plug_lock contention prevents the linux-2.6.3 kernel from
> > scaling past 40 thousand I/Os per second in the I/O submit path.
>
> This helped out our machines quite a bit too. Without the patch, we
> weren't able to scale above 80000 IOPS, but now we exceed 110000 (and
> parity with our internal XSCSI based tree).
>
> Maybe the plug lists and locks should be per-device though, rather than
> per-cpu? That would make the migration case easier I think. Is that
> possible?
It's possible, yes. It is the preferred solution. We need to identify all
the queues which need to be unplugged to permit a VFS-level I/O request to
complete. It involves running down the device stack and running around all
the contributing queues at each level.
Relatively straightforward, but first those dang semaphores in device
mapper need to become spinlocks. I haven't looked into what difficulties
might be present in the RAID implementation.
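
What that per-level walk might look like for device mapper, as a rough
illustrative sketch (the function name dm_table_unplug_all() and the
t->devices list are assumptions about dm internals, modeled on the
any_congested() iteration discussed below in this thread):

/* Illustrative sketch: unplug every queue that sits underneath one
 * device-mapper table.  Each underlying device may itself be a stacked
 * device whose own unplug_fn repeats the walk one level down. */
static void dm_table_unplug_all(struct dm_table *t)
{
	struct list_head *entry;

	list_for_each(entry, &t->devices) {
		struct dm_dev *dd = list_entry(entry, struct dm_dev, list);
		request_queue_t *q = bdev_get_queue(dd->bdev);

		if (q && q->unplug_fn)
			q->unplug_fn(q);
	}
}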
* Re: per-cpu blk_plug_list
From: Andrew Morton @ 2004-03-03 5:13 UTC
To: Chen, Kenneth W; +Cc: jbarnes, linux-kernel, linux-ia64
"Chen, Kenneth W" <kenneth.w.chen@intel.com> wrote:
>
> I don't understand the proposal here. There is a per-device lock
> already. But a plugged queue needs to be on some list outside itself
> so that a group of them can be unplugged later to flush all the I/O.
here's the proposal:
Regarding this:
http://www.ussg.iu.edu/hypermail/linux/kernel/0403.0/0179.html
And also having looked at Miquel's (currently slightly defective)
implementation of the any_congested() API for devicemapper:
ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.4-rc1/2.6.4-rc1-mm1/broken-out/queue-congestion-dm-implementation.patch
I am thinking that an appropriate way of solving the blk_run_queues() lock
contention problem is to nuke the global plug list altogether and make the
unplug function a method in struct backing_dev_info.
This is conceptually the appropriate place to put it - it is almost always
the case that when we run blk_run_queues() it is on behalf of an
address_space, and the few remaining cases can simply be deleted -
mm/mempool.c is the last one, I think.
The implementation of backing_dev_info.unplug() would have to run the
unplug_fn of every queue which contributes to the top-level queue (the
thing which the address_space is sitting on top of).
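
One way the proposed method could look, as a hypothetical sketch (the unplug
field and the blk_backing_dev_unplug() / blk_run_address_space() names are
invented for illustration, not an existing API):

/* Proposed (illustrative) addition to struct backing_dev_info: a method
 * that unplugs whatever queues back this address_space, replacing the
 * global blk_plug_list walk in blk_run_queues(). */
struct backing_dev_info {
	unsigned long ra_pages;		/* existing fields elided */
	/* ... */
	void (*unplug)(struct backing_dev_info *bdi);
};

/* Default for a plain block device: the bdi is embedded in the request
 * queue, so unplug just that one queue. */
static void blk_backing_dev_unplug(struct backing_dev_info *bdi)
{
	request_queue_t *q = container_of(bdi, request_queue_t,
					  backing_dev_info);

	if (q->unplug_fn)
		q->unplug_fn(q);
}

/* Stacked drivers (dm, md) would install their own method that walks
 * their contributing queues, i.e. the per-level walk sketched earlier.
 * Callers that today use blk_run_queues() would instead do: */
static inline void blk_run_address_space(struct address_space *mapping)
{
	struct backing_dev_info *bdi = mapping->backing_dev_info;

	if (bdi && bdi->unplug)
		bdi->unplug(bdi);
}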
We discussed this maybe a year ago with Jens but decided against it for
complexity reasons, but gee, dm_table_any_congested() isn't complex. Do we
foresee any particular problem with this?