Date: Mon, 10 Nov 2008 19:42:42 +0100
From: Jens Axboe
To: "Vitaly V. Bursov"
Cc: Jeff Moyer, linux-kernel@vger.kernel.org
Subject: Re: Slow file transfer speeds with CFQ IO scheduler in some cases
Message-ID: <20081110184241.GO26778@kernel.dk>
References: <4917263D.2090904@telenet.dn.ua> <20081110104423.GA26778@kernel.dk>
 <20081110135618.GI26778@kernel.dk> <49186C5A.5020809@telenet.dn.ua>
 <20081110173504.GL26778@kernel.dk> <49187D05.9050407@telenet.dn.ua>
 <20081110182928.GN26778@kernel.dk>
In-Reply-To: <20081110182928.GN26778@kernel.dk>

On Mon, Nov 10 2008, Jens Axboe wrote:
> On Mon, Nov 10 2008, Vitaly V. Bursov wrote:
> > Jens Axboe wrote:
> > > On Mon, Nov 10 2008, Vitaly V. Bursov wrote:
> > >> Jens Axboe wrote:
> > >>> On Mon, Nov 10 2008, Jeff Moyer wrote:
> > >>>> Jens Axboe writes:
> > >>>>
> > >>>>> http://bugzilla.kernel.org/attachment.cgi?id=18473&action=view
> > >>>>
> > >>>> Funny, I was going to ask the same question. ;) The reason Jens wants
> > >>>> you to try this patch is that nfsd may be farming the I/O requests out
> > >>>> to different threads, which then perform interleaved I/O. The patch
> > >>>> above tries to detect this and lets cooperating processes get disk
> > >>>> time instead of waiting for the idle timeout.
> > >>>
> > >>> Precisely :-)
> > >>>
> > >>> The only reason I haven't merged it yet is worry about the extra cost,
> > >>> but I'll throw some SSD love at it and see how it turns out.
> > >>>
> > >> Sorry, but I get an oops the moment an NFS read transfer starts. I can
> > >> get a directory listing via NFS and read files locally (not carefully
> > >> tested, though).
> > >>
> > >> The dumps were captured via netconsole, so they may not be completely
> > >> accurate, but hopefully they will give a hint.
> > >
> > > Interesting, strange how that hasn't triggered here. Or perhaps the
> > > version that Jeff posted isn't the one I tried. Anyway, search for:
> > >
> > >     RB_CLEAR_NODE(&cfqq->rb_node);
> > >
> > > and add a
> > >
> > >     RB_CLEAR_NODE(&cfqq->prio_node);
> > >
> > > just below that. It's in cfq_find_alloc_queue(). I think that should
> > > fix it.
> > >
> > Same problem.
> >
> > I ran make clean; make -j3; sync twice on the patched kernel and it went
> > OK, but it won't boot anymore with cfq, hitting the same error...
> >
> > Switching to the cfq io scheduler at runtime (after booting with "as")
> > appears to work with two parallel local dd reads.
> >
> > But when the NFS server starts up:
> >
> > [  469.000105] BUG: unable to handle kernel NULL pointer dereference at
> > 0000000000000000
> > [  469.000305] IP: [] rb_erase+0x124/0x290
> > ...
> > [  469.001905] Pid: 2296, comm: md1_raid5 Not tainted 2.6.27.5 #4
> > [  469.001982] RIP: 0010:[] [] rb_erase+0x124/0x290
> > ...
> > [  469.002509] Call Trace:
> > [  469.002509]  [] ? rb_erase_init+0x9/0x17
> > [  469.002509]  [] ? cfq_prio_tree_add+0x38/0xa8
> > [  469.002509]  [] ? cfq_add_rq_rb+0xb5/0xc8
> > [  469.002509]  [] ? cfq_insert_request+0x5a/0x356
> > [  469.002509]  [] ? elv_insert+0x14b/0x218
> > [  469.002509]  [] ? bio_phys_segments+0xf/0x15
> > [  469.002509]  [] ? __make_request+0x3b9/0x3eb
> > [  469.002509]  [] ? generic_make_request+0x30b/0x346
> > [  469.002509]  [] ? raid5_end_write_request+0x0/0xb8
> > [  469.002509]  [] ? ops_run_io+0x16a/0x1c1
> > [  469.002509]  [] ? handle_stripe5+0x9b5/0x9d6
> > [  469.002509]  [] ? handle_stripe+0xc3a/0xc6a
> > [  469.002509]  [] ? pick_next_task_fair+0x8d/0x9c
> > [  469.002509]  [] ? thread_return+0x3a/0xaa
> > [  469.002509]  [] ? raid5d+0x396/0x3cd
> > [  469.002509]  [] ? schedule_timeout+0x1e/0xad
> > [  469.002509]  [] ? md_thread+0xdd/0xf9
> > [  469.002509]  [] ? autoremove_wake_function+0x0/0x2e
> > [  469.002509]  [] ? md_thread+0x0/0xf9
> > [  469.002509]  [] ? kthread+0x47/0x73
> > [  469.002509]  [] ? schedule_tail+0x28/0x60
> > [  469.002509]  [] ? child_rip+0xa/0x11
> > [  469.002509]  [] ? kthread+0x0/0x73
> > [  469.002509]  [] ? child_rip+0x0/0x11
> > ...
> > [  469.002509] RIP [] rb_erase+0x124/0x290
> > [  469.002509]  RSP
> > [  469.002509] CR2: 0000000000000000
> > [  469.002509] ---[ end trace acdef779aeb56048 ]---
> >
> > The "best" result I got with NFS was
> >
> > avg-cpu:  %user   %nice %system %iowait  %steal   %idle
> >            0,00    0,00    0,20    0,65    0,00   99,15
> >
> > Device:  rrqm/s  wrqm/s   r/s   w/s  rsec/s  wsec/s avgrq-sz avgqu-sz  await  svctm  %util
> > sda       11,30    0,00  7,60  0,00  245,60    0,00    32,32     0,01   1,18   0,79   0,60
> > sdb       12,10    0,00  8,00  0,00  246,40    0,00    30,80     0,01   1,62   0,62   0,50
> >
> > and it lasted around 30 seconds.
>
> OK, I'll throw some NFS at this patch in the morning and do some
> measurements as well, so it can get queued up.

I spotted a bug: if the ioprio of a process gets changed, the queue needs
to be repositioned in the cooperator tree, or we'll end up doing an erase
on the wrong root. Perhaps that is what is biting you here.

-- 
Jens Axboe
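
[Editorial sketches follow, for readers working through the thread.]

Jeff's description above of what the cooperator patch does boils down to
roughly the sketch below. This is an illustration of the idea, not the
actual patch (that is the bugzilla attachment linked at the top); the
lookup helper cfq_prio_tree_lookup is hypothetical, and the other names
are inferred from the symbols in the oops and from the cfq code of that
era.

    /*
     * Idea sketch: each cfq_queue is kept in an rbtree sorted by the
     * sector of its next pending request.  When the current queue runs
     * dry and cfq would normally sit out its idle window, look for a
     * queue whose next request is close to the current head position
     * and dispatch from it instead of idling.
     */
    static struct cfq_queue *
    cfq_close_cooperator(struct cfq_data *cfqd, struct cfq_queue *cur_cfqq)
    {
            /* hypothetical lookup in the sector-sorted prio tree */
            struct cfq_queue *cfqq =
                    cfq_prio_tree_lookup(cfqd, cur_cfqq, cfqd->last_position);

            if (cfqq && cfq_rq_close(cfqd, cfqq->next_rq))
                    return cfqq;    /* cooperator found, skip the idle wait */

            return NULL;
    }

This is why interleaved reads from multiple nfsd threads benefit: each
thread's queue keeps landing near the others' head position, so they take
turns instead of each paying the full idle timeout.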
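The one-liner Jens suggests above would land in cfq_find_alloc_queue()
roughly as follows (context abbreviated; prio_node is the rb_node that
the cooperator patch adds to struct cfq_queue):

    static struct cfq_queue *
    cfq_find_alloc_queue(struct cfq_data *cfqd, int is_sync,
                         struct io_context *ioc, gfp_t gfp_mask)
    {
            ...
            RB_CLEAR_NODE(&cfqq->rb_node);
            RB_CLEAR_NODE(&cfqq->prio_node);    /* suggested addition */
            ...
    }

RB_CLEAR_NODE() marks an rb_node as "not in a tree" by pointing its parent
pointer at itself. Without it, the node holds whatever the allocator handed
out, a later RB_EMPTY_NODE() check can see the node as linked, and
rb_erase() then chases garbage pointers, which is consistent with the NULL
dereference inside rb_erase() in the trace above.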
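As for the bug Jens spots at the end: a queue is inserted under the tree
selected by its ioprio, so if the ioprio changes while the queue is still
linked, a later erase must use the root it was originally inserted under.
A minimal sketch of the reposition step, assuming the patch caches the
insertion root in a p_root pointer (that field name is a guess, and
cfq_prio_tree_add is the insert helper visible in the backtrace):

    static void cfq_prio_tree_reposition(struct cfq_data *cfqd,
                                         struct cfq_queue *cfqq)
    {
            /*
             * Erase from the root the queue was actually inserted under,
             * never from the tree that the new ioprio would select.
             */
            if (!RB_EMPTY_NODE(&cfqq->prio_node)) {
                    rb_erase(&cfqq->prio_node, cfqq->p_root);
                    RB_CLEAR_NODE(&cfqq->prio_node);
            }

            /* re-insert under the tree matching the new ioprio */
            cfq_prio_tree_add(cfqd, cfqq);
    }

Something like this would presumably be called from the ioprio-change path
(around cfq_init_prio_data()), so the erase root and the insert root can
never get out of sync.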