From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754345AbYEDTLt (ORCPT ); Sun, 4 May 2008 15:11:49 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1754213AbYEDTLm (ORCPT ); Sun, 4 May 2008 15:11:42 -0400 Received: from brick.kernel.dk ([87.55.233.238]:3507 "EHLO kernel.dk" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754071AbYEDTLk (ORCPT ); Sun, 4 May 2008 15:11:40 -0400 Date: Sun, 4 May 2008 21:11:33 +0200 From: Jens Axboe To: Mikulas Patocka Cc: Mike Anderson , linux-kernel@vger.kernel.org, Alasdair Graeme Kergon Subject: Re: [PATCH] Optimize lock in queue unplugging Message-ID: <20080504191132.GQ12774@kernel.dk> References: <20080429192556.GB12774@kernel.dk> <20080429202928.GB10641@linux.vnet.ibm.com> <20080430071415.GM12774@kernel.dk> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Apr 30 2008, Mikulas Patocka wrote: > > > On Wed, 30 Apr 2008, Jens Axboe wrote: > > >On Tue, Apr 29 2008, Mike Anderson wrote: > >>Jens Axboe wrote: > >>>On Tue, Apr 29 2008, Mikulas Patocka wrote: > >>>>Hi > >>>> > >>>>Mike Anderson was doing an OLTP benchmark on a computer with 48 physical > >>>>disks mapped to one logical device via device mapper. > >>>> > >>>>He found that there was a slowdown on request_queue->lock in function > >>>>generic_unplug_device. The slowdown is caused by the fact that when some > >>>>code calls unplug on the device mapper, device mapper calls unplug on > >>>>all > >>>>physical disks. These unplug calls take the lock, find that the queue is > >>>>already unplugged, release the lock and exit. > >>>> > >>>>With the below patch, performance of the benchmark was increased by 18% > >>>>(the whole OLTP application, not just block layer microbenchmarks). > >>>> > >>>>So I'm submitting this patch for upstream. I think the patch is correct, > >>>>because when more threads call simultaneously plug and unplug, it is > >>>>unspecified, if the queue is or isn't plugged (so the patch can't make > >>>>this worse). And the caller that plugged the queue should unplug it > >>>>anyway. (if it doesn't, there's 3ms timeout). > >>> > >>>Where were these unplug calls coming from? The block layer will > >>>generally only unplug when it is already unplugged, so if you are seeing > >>>so many unplug calls that the patch redues overhead by as much > >>>described, perhaps the callsite is buggy? > >> > >>I do not have direct access the the benchmark setup, but here is the data > >>I have received. > >> > >>The oprofile data was showing ll_rw_blk::generic_unplug_device() as a top > >>routine at 13% of the samples. Annotation of the samples shows hits on > >>spin_lock_irq(q->queue_lock). > >> > >>Here are some sample call traces: > >> > >>Call trace #1 > >> > >>kernel: [] generic_unplug_device+0x5d/0xc6 > >>kernel: [] :dm_mod:dm_table_unplug_all+0x33/0x41 > >>kernel: [] :dm_mod:dm_unplug_all+0x1d/0x28 > >>kernel: [] blk_backing_dev_unplug+0x56/0x5b > >>kernel: [] sync_buffer+0x36/0x3f > >>kernel: [] __wait_on_bit+0x40/0x6f > >>kernel: [] sync_buffer+0x0/0x3f > >>kernel: [] out_of_line_wait_on_bit+0x6c/0x78 > >>kernel: [] wake_bit_function+0x0/0x23 > >>kernel: [] :jbd:journal_commit_transaction+0x91f/0x1086 > >>kernel: [] lock_timer_base+0x1b/0x3c > >>kernel: [] :jbd:kjournald+0xc1/0x213 > >>kernel: [] autoremove_wake_function+0x0/0x2e > >>kernel: [] keventd_create_kthread+0x0/0x61 > >>kernel: [] :jbd:kjournald+0x0/0x213 > >>kernel: [] keventd_create_kthread+0x0/0x61 > >>kernel: [] kthread+0xfe/0x132 > >>kernel: [] child_rip+0xa/0x11 > >>kernel: [] keventd_create_kthread+0x0/0x61 > >>kernel: [] kthread+0x0/0x132 > >>kernel: [] child_rip+0x0/0x11 > >> > >>Call trace #2 > >>kernel: [] generic_unplug_device+0x5d/0xc6 > >>kernel: [] :dm_mod:dm_table_unplug_all+0x33/0x41 > >>kernel: [] :dm_mod:dm_unplug_all+0x1d/0x28 > >>kernel: [] blk_backing_dev_unplug+0x56/0x5b > >>kernel: [] __blockdev_direct_IO+0x889/0xaa2 > >>kernel: [] :ext3:ext3_direct_IO+0xf3/0x18b > >>kernel: [] :ext3:ext3_get_block+0x0/0xe3 > >>kernel: [] generic_file_direct_IO+0xbd/0xfb > >>kernel: [] generic_file_direct_write+0x60/0xf2 > >>kernel: [] __generic_file_aio_write_nolock+0x2b7/0x3b8 > >>kernel: [] generic_file_aio_write+0x65/0xc1 > >>kernel: [] :ext3:ext3_file_write+0x16/0x91 > >>kernel: [] do_sync_write+0xc7/0x104 > >>kernel: [] autoremove_wake_function+0x0/0x2e > >>kernel: [] free_msg+0x22/0x3c > >>kernel: [] vfs_write+0xce/0x174 > >>kernel: [] sys_pwrite64+0x50/0x70 > >>kernel: [] error_exit+0x0/0x84 > >>kernel: [] system_call+0x7e/0x83 > > > >So it's basically dm calling into blk_unplug() all the time, which > >doesn't check if the queue is plugged. The reason why I didn't like the > >initial patch is that ->unplug_fn() really should not be called unless > >the queue IS plugged. So how about this instead: > > > >http://git.kernel.dk/?p=linux-2.6-block.git;a=commit;h=c44993018887e82abd49023e92e8d8b6000e03ed > > > >That's a lot more appropriate, imho. > > > >-- > >Jens Axboe > > This doesn't seem correct to me. The difference between blk_unplug and > generic_unplug_device is that blk_unplug is called on every type of device > and generic_unplug_device (pointed to by q->unplug_fn) is a method that is > called on low-level disk devices. > > dm and md redefine q->unplug_fn to point to their own method. On dm and > md, blk_unplug is called, but generic_unplug_device is not. > > So if you have this setup > dm-linear(unplugged) -> disk(plugged) > > then, with your patch, a call to blk_unplug(dm-linear) will not unplug the > disk. With my patch, a call to blk_unplug(dm-linear) will unplug the disk > --- it calls q->unplug_fn that points to dm_unplug_all, that calls > blk_unplug again on the disk and that calls generic_unplug_device on disk > queue. That is because the md/dm don't set the plugged flag which I think they should. So we fix that instead so that plugging works the same from the block core or from a driver instead of adding work-arounds in the block unplug handler. Adding a check for plugged in the plug handler is a hack, I don't see how you can argue against that. -- Jens Axboe