From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1763444AbYD2U3p (ORCPT ); Tue, 29 Apr 2008 16:29:45 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1756678AbYD2U3d (ORCPT ); Tue, 29 Apr 2008 16:29:33 -0400 Received: from e3.ny.us.ibm.com ([32.97.182.143]:35569 "EHLO e3.ny.us.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1756210AbYD2U3c (ORCPT ); Tue, 29 Apr 2008 16:29:32 -0400 Date: Tue, 29 Apr 2008 13:29:28 -0700 From: Mike Anderson To: Jens Axboe Cc: Mikulas Patocka , linux-kernel@vger.kernel.org, Alasdair Graeme Kergon Subject: Re: [PATCH] Optimize lock in queue unplugging Message-ID: <20080429202928.GB10641@linux.vnet.ibm.com> References: <20080429192556.GB12774@kernel.dk> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20080429192556.GB12774@kernel.dk> User-Agent: Mutt/1.5.17 (2007-11-01) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Jens Axboe wrote: > On Tue, Apr 29 2008, Mikulas Patocka wrote: > > Hi > > > > Mike Anderson was doing an OLTP benchmark on a computer with 48 physical > > disks mapped to one logical device via device mapper. > > > > He found that there was a slowdown on request_queue->lock in function > > generic_unplug_device. The slowdown is caused by the fact that when some > > code calls unplug on the device mapper, device mapper calls unplug on all > > physical disks. These unplug calls take the lock, find that the queue is > > already unplugged, release the lock and exit. > > > > With the below patch, performance of the benchmark was increased by 18% > > (the whole OLTP application, not just block layer microbenchmarks). > > > > So I'm submitting this patch for upstream. I think the patch is correct, > > because when more threads call simultaneously plug and unplug, it is > > unspecified, if the queue is or isn't plugged (so the patch can't make > > this worse). And the caller that plugged the queue should unplug it > > anyway. (if it doesn't, there's 3ms timeout). > > Where were these unplug calls coming from? The block layer will > generally only unplug when it is already unplugged, so if you are seeing > so many unplug calls that the patch redues overhead by as much > described, perhaps the callsite is buggy? I do not have direct access the the benchmark setup, but here is the data I have received. The oprofile data was showing ll_rw_blk::generic_unplug_device() as a top routine at 13% of the samples. Annotation of the samples shows hits on spin_lock_irq(q->queue_lock). Here are some sample call traces: Call trace #1 kernel: [] generic_unplug_device+0x5d/0xc6 kernel: [] :dm_mod:dm_table_unplug_all+0x33/0x41 kernel: [] :dm_mod:dm_unplug_all+0x1d/0x28 kernel: [] blk_backing_dev_unplug+0x56/0x5b kernel: [] sync_buffer+0x36/0x3f kernel: [] __wait_on_bit+0x40/0x6f kernel: [] sync_buffer+0x0/0x3f kernel: [] out_of_line_wait_on_bit+0x6c/0x78 kernel: [] wake_bit_function+0x0/0x23 kernel: [] :jbd:journal_commit_transaction+0x91f/0x1086 kernel: [] lock_timer_base+0x1b/0x3c kernel: [] :jbd:kjournald+0xc1/0x213 kernel: [] autoremove_wake_function+0x0/0x2e kernel: [] keventd_create_kthread+0x0/0x61 kernel: [] :jbd:kjournald+0x0/0x213 kernel: [] keventd_create_kthread+0x0/0x61 kernel: [] kthread+0xfe/0x132 kernel: [] child_rip+0xa/0x11 kernel: [] keventd_create_kthread+0x0/0x61 kernel: [] kthread+0x0/0x132 kernel: [] child_rip+0x0/0x11 Call trace #2 kernel: [] generic_unplug_device+0x5d/0xc6 kernel: [] :dm_mod:dm_table_unplug_all+0x33/0x41 kernel: [] :dm_mod:dm_unplug_all+0x1d/0x28 kernel: [] blk_backing_dev_unplug+0x56/0x5b kernel: [] __blockdev_direct_IO+0x889/0xaa2 kernel: [] :ext3:ext3_direct_IO+0xf3/0x18b kernel: [] :ext3:ext3_get_block+0x0/0xe3 kernel: [] generic_file_direct_IO+0xbd/0xfb kernel: [] generic_file_direct_write+0x60/0xf2 kernel: [] __generic_file_aio_write_nolock+0x2b7/0x3b8 kernel: [] generic_file_aio_write+0x65/0xc1 kernel: [] :ext3:ext3_file_write+0x16/0x91 kernel: [] do_sync_write+0xc7/0x104 kernel: [] autoremove_wake_function+0x0/0x2e kernel: [] free_msg+0x22/0x3c kernel: [] vfs_write+0xce/0x174 kernel: [] sys_pwrite64+0x50/0x70 kernel: [] error_exit+0x0/0x84 kernel: [] system_call+0x7e/0x83 -andmike -- Michael Anderson andmike@linux.vnet.ibm.com