From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path: 
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1755227AbYKJS1e (ORCPT );
	Mon, 10 Nov 2008 13:27:34 -0500
Received: (majordomo@vger.kernel.org) by vger.kernel.org
	id S1753273AbYKJS1Z (ORCPT );
	Mon, 10 Nov 2008 13:27:25 -0500
Received: from voyager.telenet.dn.ua ([195.39.211.35]:54470 "EHLO
	voyager.telenet.dn.ua" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1752408AbYKJS1Y (ORCPT );
	Mon, 10 Nov 2008 13:27:24 -0500
Message-ID: <49187D05.9050407@telenet.dn.ua>
Date: Mon, 10 Nov 2008 20:27:17 +0200
From: "Vitaly V. Bursov" 
User-Agent: Thunderbird 2.0.0.17 (X11/20081011)
MIME-Version: 1.0
To: Jens Axboe 
CC: Jeff Moyer , linux-kernel@vger.kernel.org
Subject: Re: Slow file transfer speeds with CFQ IO scheduler in some cases
References: <4917263D.2090904@telenet.dn.ua> <20081110104423.GA26778@kernel.dk>
	<20081110135618.GI26778@kernel.dk> <49186C5A.5020809@telenet.dn.ua>
	<20081110173504.GL26778@kernel.dk>
In-Reply-To: <20081110173504.GL26778@kernel.dk>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID: 
X-Mailing-List: linux-kernel@vger.kernel.org

Jens Axboe wrote:
> On Mon, Nov 10 2008, Vitaly V. Bursov wrote:
>> Jens Axboe wrote:
>>> On Mon, Nov 10 2008, Jeff Moyer wrote:
>>>> Jens Axboe writes:
>>>>
>>>>> http://bugzilla.kernel.org/attachment.cgi?id=18473&action=view
>>>> Funny, I was going to ask the same question. ;) The reason Jens wants
>>>> you to try this patch is that nfsd may be farming off the I/O requests
>>>> to different threads, which then perform interleaved I/O. The above
>>>> patch tries to detect this and allows cooperating processes to get
>>>> disk time instead of waiting for the idle timeout.
>>> Precisely :-)
>>>
>>> The only reason I haven't merged it yet is worry about the extra
>>> cost, but I'll throw some SSD love at it and see how it turns out.
>>>
>> Sorry, but I get an "oops" the moment an NFS read transfer starts.
>> I can get a directory listing via NFS and read files locally (not
>> carefully tested, though).
>>
>> Dumps were captured via netconsole, so they may not be completely
>> accurate, but hopefully they will give a hint.
>
> Interesting, strange how that hasn't triggered here. Or perhaps the
> version that Jeff posted isn't the one I tried. Anyway, search for:
>
>         RB_CLEAR_NODE(&cfqq->rb_node);
>
> and add a
>
>         RB_CLEAR_NODE(&cfqq->prio_node);
>
> just below that. It's in cfq_find_alloc_queue(). I think that should fix
> it.
>

Same problem. I did

  make clean; make -j3; sync;

on the (now twice) patched kernel and it went OK, but it won't boot
anymore with cfq, failing with the same error...

Switching the cfq I/O scheduler at runtime (after booting with "as")
appears to work with two parallel local dd reads. But when the NFS
server starts up:

[  469.000105] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
[  469.000305] IP: [] rb_erase+0x124/0x290
...
[  469.001905] Pid: 2296, comm: md1_raid5 Not tainted 2.6.27.5 #4
[  469.001982] RIP: 0010:[] [] rb_erase+0x124/0x290
...
[  469.002509] Call Trace:
[  469.002509]  [] ? rb_erase_init+0x9/0x17
[  469.002509]  [] ? cfq_prio_tree_add+0x38/0xa8
[  469.002509]  [] ? cfq_add_rq_rb+0xb5/0xc8
[  469.002509]  [] ? cfq_insert_request+0x5a/0x356
[  469.002509]  [] ? elv_insert+0x14b/0x218
[  469.002509]  [] ? bio_phys_segments+0xf/0x15
[  469.002509]  [] ? __make_request+0x3b9/0x3eb
[  469.002509]  [] ? generic_make_request+0x30b/0x346
[  469.002509]  [] ? raid5_end_write_request+0x0/0xb8
[  469.002509]  [] ? ops_run_io+0x16a/0x1c1
[  469.002509]  [] ? handle_stripe5+0x9b5/0x9d6
[  469.002509]  [] ? handle_stripe+0xc3a/0xc6a
[  469.002509]  [] ? pick_next_task_fair+0x8d/0x9c
[  469.002509]  [] ? thread_return+0x3a/0xaa
[  469.002509]  [] ? raid5d+0x396/0x3cd
[  469.002509]  [] ? schedule_timeout+0x1e/0xad
[  469.002509]  [] ? md_thread+0xdd/0xf9
[  469.002509]  [] ? autoremove_wake_function+0x0/0x2e
[  469.002509]  [] ? md_thread+0x0/0xf9
[  469.002509]  [] ? kthread+0x47/0x73
[  469.002509]  [] ? schedule_tail+0x28/0x60
[  469.002509]  [] ? child_rip+0xa/0x11
[  469.002509]  [] ? kthread+0x0/0x73
[  469.002509]  [] ? child_rip+0x0/0x11
...
[  469.002509] RIP  [] rb_erase+0x124/0x290
[  469.002509]  RSP 
[  469.002509] CR2: 0000000000000000
[  469.002509] ---[ end trace acdef779aeb56048 ]---

"Best" result I got with NFS was:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0,00    0,00    0,20    0,65    0,00   99,15

Device:  rrqm/s  wrqm/s    r/s    w/s  rsec/s  wsec/s avgrq-sz avgqu-sz  await  svctm  %util
sda       11,30    0,00   7,60   0,00  245,60    0,00    32,32     0,01   1,18   0,79   0,60
sdb       12,10    0,00   8,00   0,00  246,40    0,00    30,80     0,01   1,62   0,62   0,50

and it lasted around 30 seconds.

-- 
Thanks,
Vitaly