From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755669AbYKJSbR (ORCPT ); Mon, 10 Nov 2008 13:31:17 -0500 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1754169AbYKJSbD (ORCPT ); Mon, 10 Nov 2008 13:31:03 -0500 Received: from pasmtpa.tele.dk ([80.160.77.114]:37797 "EHLO pasmtpA.tele.dk" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753859AbYKJSbB (ORCPT ); Mon, 10 Nov 2008 13:31:01 -0500 Date: Mon, 10 Nov 2008 19:29:28 +0100 From: Jens Axboe To: "Vitaly V. Bursov" Cc: Jeff Moyer , linux-kernel@vger.kernel.org Subject: Re: Slow file transfer speeds with CFQ IO scheduler in some cases Message-ID: <20081110182928.GN26778@kernel.dk> References: <4917263D.2090904@telenet.dn.ua> <20081110104423.GA26778@kernel.dk> <20081110135618.GI26778@kernel.dk> <49186C5A.5020809@telenet.dn.ua> <20081110173504.GL26778@kernel.dk> <49187D05.9050407@telenet.dn.ua> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <49187D05.9050407@telenet.dn.ua> Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, Nov 10 2008, Vitaly V. Bursov wrote: > Jens Axboe wrote: > > On Mon, Nov 10 2008, Vitaly V. Bursov wrote: > >> Jens Axboe wrote: > >>> On Mon, Nov 10 2008, Jeff Moyer wrote: > >>>> Jens Axboe writes: > >>>> > >>>>> http://bugzilla.kernel.org/attachment.cgi?id=18473&action=view > >>>> Funny, I was going to ask the same question. ;) The reason Jens wants > >>>> you to try this patch is that nfsd may be farming off the I/O requests > >>>> to different threads which are then performing interleaved I/O. The > >>>> above patch tries to detect this and allow cooperating processes to get > >>>> disk time instead of waiting for the idle timeout. > >>> Precisely :-) > >>> > >>> The only reason I haven't merged it yet is because of worry of extra > >>> cost, but I'll throw some SSD love at it and see how it turns out. > >>> > >> Sorry, but I get "oops" same moment nfs read transfer starts. > >> I can get directory list via nfs, read files locally (not > >> carefully tested, though) > >> > >> Dumps captured via netconsole, so these may not be completely accurate > >> but hopefully will give a hint. > > > > Interesting, strange how that hasn't triggered here. Or perhaps the > > version that Jeff posted isn't the one I tried. Anyway, search for: > > > > RB_CLEAR_NODE(&cfqq->rb_node); > > > > and add a > > > > RB_CLEAR_NODE(&cfqq->prio_node); > > > > just below that. It's in cfq_find_alloc_queue(). I think that should fix > > it. > > > > Same problem. > > I did make clean; make -j3; sync; on (2 times) patched kernel and it went OK > but It won't boot anymore with cfq with same error... > > Switching cfq io scheduler at runtime (booting with "as") appears to work with > two parallel local dd reads. > > But when NFS server starts up: > > [ 469.000105] BUG: unable to handle kernel > NULL pointer dereference > at 0000000000000000 > [ 469.000305] IP: > [] rb_erase+0x124/0x290 > ... > > [ 469.001905] Pid: 2296, comm: md1_raid5 Not tainted 2.6.27.5 #4 > [ 469.001982] RIP: 0010:[] > [] rb_erase+0x124/0x290 > ... > [ 469.002509] Call Trace: > [ 469.002509] [] ? rb_erase_init+0x9/0x17 > [ 469.002509] [] ? cfq_prio_tree_add+0x38/0xa8 > [ 469.002509] [] ? cfq_add_rq_rb+0xb5/0xc8 > [ 469.002509] [] ? cfq_insert_request+0x5a/0x356 > [ 469.002509] [] ? elv_insert+0x14b/0x218 > [ 469.002509] [] ? bio_phys_segments+0xf/0x15 > [ 469.002509] [] ? __make_request+0x3b9/0x3eb > [ 469.002509] [] ? generic_make_request+0x30b/0x346 > [ 469.002509] [] ? raid5_end_write_request+0x0/0xb8 > [ 469.002509] [] ? ops_run_io+0x16a/0x1c1 > [ 469.002509] [] ? handle_stripe5+0x9b5/0x9d6 > [ 469.002509] [] ? handle_stripe+0xc3a/0xc6a > [ 469.002509] [] ? pick_next_task_fair+0x8d/0x9c > [ 469.002509] [] ? thread_return+0x3a/0xaa > [ 469.002509] [] ? raid5d+0x396/0x3cd > [ 469.002509] [] ? schedule_timeout+0x1e/0xad > [ 469.002509] [] ? md_thread+0xdd/0xf9 > [ 469.002509] [] ? autoremove_wake_function+0x0/0x2e > [ 469.002509] [] ? md_thread+0x0/0xf9 > [ 469.002509] [] ? kthread+0x47/0x73 > [ 469.002509] [] ? schedule_tail+0x28/0x60 > [ 469.002509] [] ? child_rip+0xa/0x11 > [ 469.002509] [] ? kthread+0x0/0x73 > [ 469.002509] [] ? child_rip+0x0/0x11 > ... > [ 469.002509] RIP > [] rb_erase+0x124/0x290 > [ 469.002509] RSP > [ 469.002509] CR2: 0000000000000000 > [ 469.002509] ---[ end trace acdef779aeb56048 ]--- > > > "Best" result I got with NFS was > avg-cpu: %user %nice %system %iowait %steal %idle > 0,00 0,00 0,20 0,65 0,00 99,15 > > Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util > sda 11,30 0,00 7,60 0,00 245,60 0,00 32,32 0,01 1,18 0,79 0,60 > sdb 12,10 0,00 8,00 0,00 246,40 0,00 30,80 0,01 1,62 0,62 0,50 > > and it lasted around 30 seconds. OK, I'll throw some NFS at this patch in the morning and do some measurements as well, so it can get queued up. -- Jens Axboe