From mboxrd@z Thu Jan 1 00:00:00 1970
From: Jens Axboe
Subject: Re: [Bug 12945] New: SCSI Generic (sg): BUG: sleeping function called from invalid context
Date: Fri, 27 Mar 2009 07:57:27 +0100
Message-ID: <20090327065727.GR27476@kernel.dk>
References: <20090326074952.40ffdcd9.akpm@linux-foundation.org> <20090326184301.GK27476@kernel.dk> <20090327130843Z.fujita.tomonori@lab.ntt.co.jp>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Return-path:
Received: from brick.kernel.dk ([93.163.65.50]:56458 "EHLO kernel.dk" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751020AbZC0G5c (ORCPT ); Fri, 27 Mar 2009 02:57:32 -0400
Content-Disposition: inline
In-Reply-To: <20090327130843Z.fujita.tomonori@lab.ntt.co.jp>
Sender: linux-scsi-owner@vger.kernel.org
List-Id: linux-scsi@vger.kernel.org
To: FUJITA Tomonori
Cc: akpm@linux-foundation.org, bugzilla-daemon@bugzilla.kernel.org, linux-scsi@vger.kernel.org, txtoxtox285@googlemail.com, dougg@torque.net, James.Bottomley@HansenPartnership.com

On Fri, Mar 27 2009, FUJITA Tomonori wrote:
> On Thu, 26 Mar 2009 19:43:02 +0100
> Jens Axboe wrote:
>
> > On Thu, Mar 26 2009, Andrew Morton wrote:
> > >
> > > (switched to email. Please respond via emailed reply-to-all, not via the
> > > bugzilla web interface).
> > >
> > > On Thu, 26 Mar 2009 12:27:53 GMT bugzilla-daemon@bugzilla.kernel.org wrote:
> > >
> > > > http://bugzilla.kernel.org/show_bug.cgi?id=12945
> > > >
> > > > Summary: SCSI Generic (sg): BUG: sleeping function called from
> > > > invalid context
> > > > Product: SCSI Drivers
> > > > Version: 2.5
> > > > Kernel Version: 2.6.28.9
> > > > Platform: All
> > > > OS/Version: Linux
> > > > Tree: Mainline
> > > > Status: NEW
> > > > Severity: normal
> > > > Priority: P1
> > > > Component: Other
> > > > AssignedTo: scsi_drivers-other@kernel-bugs.osdl.org
> > > > ReportedBy: txtoxtox285@googlemail.com
> > > > Regression: No
> > > >
> > > >
> > > > Created an attachment (id=20685)
> > > > --> (http://bugzilla.kernel.org/attachment.cgi?id=20685)
> > > > Stack trace on program kill (2.6.28.9)
> > > >
> > > > I am experimenting with CD audio extraction. I use the SCSI Generic driver for
> > > > this.
> > > >
> > > > My test program uses read() and write() (instead of ioctl) to send requests to
> > > > the driver and receive responses. I use SG_FLAG_DIRECT_IO.
> > > >
> > > > When I kill my program (because I don't want to wait until it has ripped the
> > > > entire CD), I am often rewarded with messages like "BUG: sleeping function
> > > > called from invalid context at linux-2.6.28.9/include/linux/pagemap.h:347". I
> > > > have attached a typical stack trace.
> > > >
> > > > Another case when I hit this BUG is when I set a time out and the CD drive
> > > > doesn't respond fast enough. A stack trace is attached.
> > >
> > > > [34215.786870] BUG: sleeping function called from invalid context at /mnt/var-pub/src/linux-2.6.28.9/include/linux/pagemap.h:347
> > > > [34215.786880] in_atomic(): 1, irqs_disabled(): 1, pid: 0, name: swapper
> > > > [34215.786886] Pid: 0, comm: swapper Not tainted 2.6.28.9 #1
> > > > [34215.786890] Call Trace:
> > > > [34215.786894] [] set_page_dirty_lock+0x1a/0x45
> > > > [34215.786911] [] bio_unmap_user+0x1e/0x4a
> > > > [34215.786920] [] __blk_rq_unmap_user+0x14/0x20
> > > > [34215.786928] [] pit_next_event+0x2e/0x49
> > > > [34215.786934] [] blk_rq_unmap_user+0x1e/0x4b
> > > > [34215.786965] [] sg_finish_rem_req+0x6d/0x88 [sg]
> > > > [34215.786979] [] sg_rq_end_io+0x131/0x205 [sg]
> > > > [34215.786986] [] end_that_request_last+0x58/0x194
> > > > [34215.786992] [] blk_end_io+0x48/0x7d
> > > > [34215.787019] [] scsi_next_command+0x219/0x283 [scsi_mod]
> > > > [34215.787039] [] scsi_io_completion+0x181/0x53b [scsi_mod]
> > > > [34215.787047] [] blk_done_softirq+0x5f/0x6d
> > > > [34215.787054] [] __do_softirq+0x5e/0xf8
> > > > [34215.787061] [] call_softirq+0x1c/0x28
> > > > [34215.787067] [] do_softirq+0x2c/0x68
> > > > [34215.787073] [] irq_exit+0x36/0x82
> > > > [34215.787079] [] do_IRQ+0xa6/0xb8
> > > > [34215.787085] [] ret_from_intr+0x0/0xa
> > > > [34215.787088] [] menu_reflect+0x0/0x6d
> > > > [34215.787112] [] acpi_idle_enter_simple+0x170/0x1d6 [processor]
> > > > [34215.787127] [] acpi_idle_enter_simple+0x166/0x1d6 [processor]
> > > > [34215.787134] [] cpuidle_idle_call+0x73/0xb1
> > > > [34215.787140] [] cpu_idle+0x3c/0x73
> > >
> > > Argh. sg_finish_rem_req() is called from interrupt context. But
> > > blk_rq_unmap_user() can run
> > > __bio_unmap_user()->set_page_dirty_lock()->lock_page(), which can call
> > > schedule(). If it does call schedule(), the machine will crash.
> > >
> > > afaict, blk_rq_unmap_user() has always been a can-sleep function, and
> > > this is a regression caused by
> > >
> > > commit 6e5a30cba5e7c03b2cd564e968f1dd667a0f7c42
> >
> > Yep, it is. The problem is the usage of:
> >
> >         blk_execute_rq_nowait(sdp->device->request_queue, sdp->disk,
> >                               srp->rq, 1, sg_rq_end_io);
> >
> > and then doing the sg_finish_rem_req() -> blk_rq_unmap_user() from the
> > end_io path, where other users do a sync request and then unmap from the
> > same context.
>
> Right. And only sg does that. I've already converted st and osst to
> use the block layer but they work synchronously.

Precisely.

> > Hmm. Perhaps we can add some request flag to specify doing
> > the completion from user context, then other users could be converted to
> > the _nowait() approach as well and get some unification/cleanup there as
> > well.
>
> Since only sg needs this, I simply fixed sg instead of changing the
> block layer. But it might be nice if block layer can handle this.
>
> Seems there are several patches for the block layer (including
> mapping) from Tejun and Boaz. I'll read them to see what we could do.
> I'm always too busy in March with the company matters.

OK, let me know what you find in the scsi tree. I'll hold off on this one.

-- 
Jens Axboe