From: Tejun Heo
Subject: Re: [PATCH block/for-3.3/core] block: an exiting task should be allowed to create io_context
Date: Wed, 28 Dec 2011 08:48:36 -0800
Message-ID: <20111228164836.GP17712@google.com>
References: <20111222150836.af172886.akpm@linux-foundation.org>
	<20111222232036.GP17084@google.com>
	<20111222152427.c944c747.akpm@linux-foundation.org>
	<20111222233843.GR17084@google.com>
	<20111222154427.89b245c7.akpm@linux-foundation.org>
	<20111222234639.GS17084@google.com>
	<20111223004244.GU17084@google.com>
	<20111225010238.GA6013@htj.dyndns.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Sender: linux-ide-owner@vger.kernel.org
To: Hugh Dickins
Cc: Jens Axboe, Andrew Morton, Stephen Rothwell,
	linux-next@vger.kernel.org, LKML, linux-scsi@vger.kernel.org,
	linux-ide@vger.kernel.org, x86@kernel.org
List-Id: linux-next.vger.kernel.org

Hello, Hugh.

On Wed, Dec 28, 2011 at 12:33:01AM -0800, Hugh Dickins wrote:
> Thanks, I think I've now built enough kernels on -next plus your patch
> to say that it does indeed solve that problem.

Awesome, thanks for verifying the fix.

> However, there are a couple of other unhealthy symptoms I've noticed
> under load in -next's block/cfq layer, both with and without your patch.
>
> One is kernel BUG at block/cfq-iosched.c:2585!
> BUG_ON(RB_EMPTY_ROOT(&cfqq->sort_list));
>
> cfq_dispatch_request+0x1a
> cfq_dispatch_requests+0x5c
> blk_peek_request+0x195
> scsi_request_fn+0x6a
> __blk_run_queue+0x16
> scsi_run_queue+0x18a
> scsi_next_command+0x36
> scsi_io_completion+0x426
> scsi_finish_command+0xaf
> scsi_softirq_done+0xdd
> blk_done_softirq+0x6c
> __do_softirq+0x80
> call_softirq+0x1c
> do_softirq+0x33
> irq_exit+0x3f
> do_IRQ+0x97
> ret_from_intr
>
> I've had that one four times now on different machines; but quicker
> to reproduce are these warnings from CONFIG_DEBUG_LIST=y:
>
> ------------[ cut here ]------------
> WARNING: at lib/list_debug.c:53 __list_del_entry+0x8d/0x98()
> Hardware name: 4174AY9
> list_del corruption. prev->next should be ffff880005aa1380, but was 6b6b6b6b6b6b6b6b
> Modules linked in: snd_pcm_oss snd_mixer_oss snd_seq snd_seq_device
> Pid: 29241, comm: cc1 Tainted: G W 3.2.0-rc6-next-20111222 #18
> Call Trace:
> [] warn_slowpath_common+0x80/0x98
> [] warn_slowpath_fmt+0x41/0x43
> [] __list_del_entry+0x8d/0x98
> [] cfq_remove_request+0x3b/0xdf
> [] cfq_dispatch_insert+0x3a/0x87
> [] cfq_dispatch_request+0x65/0x92
> [] cfq_dispatch_requests+0x5c/0x133
> [] ? scsi_request_fn+0x3b6/0x3d3
> [] blk_peek_request+0x195/0x1a6
> [] ? scsi_request_fn+0x3b6/0x3d3
> [] scsi_request_fn+0x6d/0x3d3
> [] __blk_run_queue+0x19/0x1b
> [] blk_run_queue+0x21/0x35
> [] scsi_run_queue+0x11f/0x1b9
> [] scsi_next_command+0x36/0x46
> [] scsi_io_completion+0x426/0x4a9
> [] scsi_finish_command+0xaf/0xb8
> [] scsi_softirq_done+0xdd/0xe5
> [] blk_done_softirq+0x76/0x8a
> [] __do_softirq+0x98/0x136
> [] call_softirq+0x1c/0x30
> [] do_softirq+0x38/0x81
> [] irq_exit+0x4e/0xb6
> [] do_IRQ+0x97/0xae
> [] common_interrupt+0x70/0x70
> [] ? retint_swapgs+0xe/0x13
> ---[ end trace 61fdaa1b260613d1 ]---

Hmm... that looks like cfqq being freed before being unlinked. I'll try
to reproduce it. Is there any particular workload you were running?

Thanks.

-- 
tejun
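The reasoning behind "freed before being unlinked" can be illustrated with a small user-space sketch. It is not the kernel's code: `struct fake_queue`, `struct fake_request`, `checked_list_del()` and `remove_after()` are hypothetical stand-ins for `cfq_queue`, `struct request` and the CONFIG_DEBUG_LIST check in `__list_del_entry()`, and "freeing" is modelled by filling the queue with the slab poison byte 0x6b (`POISON_FREE` in include/linux/poison.h) instead of a real `kfree()`. If a queue is released while a request is still linked into a list the queue owns, the request's `prev` pointer lands in poisoned memory, so `prev->next` reads back as 6b6b6b6b6b6b6b6b rather than pointing at the request.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Slab debugging fills freed objects with this byte (POISON_FREE in
 * include/linux/poison.h); it is where 6b6b6b6b6b6b6b6b comes from. */
#define POISON_FREE 0x6b

struct list_head {
	struct list_head *next, *prev;
};

/* Hypothetical stand-ins for cfq_queue and struct request: the queue owns
 * a list head, and each queued request is linked into it. */
struct fake_queue {
	int id;
	struct list_head fifo;
};

struct fake_request {
	struct list_head queuelist;
};

static void init_list_head(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static void list_add_tail(struct list_head *n, struct list_head *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* Simplified model of the CONFIG_DEBUG_LIST check: before unlinking, verify
 * that both neighbours still point back at the entry.  *seen receives the
 * value found in prev->next. */
static bool checked_list_del(struct list_head *entry, uintptr_t *seen)
{
	struct list_head *prev = entry->prev;
	struct list_head *next = entry->next;

	*seen = (uintptr_t)prev->next;
	if (prev->next != entry) {
		printf("list_del corruption. prev->next should be %p, but was %p\n",
		       (void *)entry, (void *)prev->next);
		return false;
	}
	if (next->prev != entry) {
		printf("list_del corruption. next->prev should be %p, but was %p\n",
		       (void *)entry, (void *)next->prev);
		return false;
	}
	prev->next = next;
	next->prev = prev;
	return true;
}

/* Links one request into a queue, optionally "frees" the queue first, then
 * removes the request.  Freeing is modelled by poisoning the queue with
 * POISON_FREE while keeping the memory allocated, so the sketch itself never
 * touches released memory. */
static bool remove_after(bool free_queue_first, uintptr_t *seen)
{
	struct fake_queue *q = malloc(sizeof(*q));
	struct fake_request rq;
	bool ok;

	if (!q)
		return false;
	init_list_head(&q->fifo);
	list_add_tail(&rq.queuelist, &q->fifo);

	if (free_queue_first)
		memset(q, POISON_FREE, sizeof(*q)); /* queue "freed" while rq is still linked */

	ok = checked_list_del(&rq.queuelist, seen);
	free(q);
	return ok;
}
```

Calling `remove_after(false, &seen)` unlinks cleanly, while `remove_after(true, &seen)` prints a corruption message and leaves `seen` holding the poison pattern (0x6b6b6b6b6b6b6b6b on a 64-bit build), mirroring the shape of the warning in the report.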