From mboxrd@z Thu Jan 1 00:00:00 1970 From: Matthew Wilcox Subject: Re: [PATCH] mpt2sas: Remove queuecommand wrapper Date: Tue, 5 Jul 2011 16:24:53 -0400 Message-ID: <20110705202453.GC4061@linux.intel.com> References: <20110704192713.GZ4061@linux.intel.com> <20110705141434.GB389@parisc-linux.org> <20110705152143.GB4061@linux.intel.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Return-path: Received: from mga11.intel.com ([192.55.52.93]:9484 "EHLO mga11.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754736Ab1GEUYz (ORCPT ); Tue, 5 Jul 2011 16:24:55 -0400 Content-Disposition: inline In-Reply-To: Sender: linux-scsi-owner@vger.kernel.org List-Id: linux-scsi@vger.kernel.org To: "Desai, Kashyap" Cc: Matthew Wilcox , "linux-scsi@vger.kernel.org" , DL-MPT Fusion Linux , "nab@linux-iscsi.org" , Doug Gilbert On Tue, Jul 05, 2011 at 10:48:00PM +0530, Desai, Kashyap wrote: > > -----Original Message----- > > From: Matthew Wilcox [mailto:willy@linux.intel.com] > > Sent: Tuesday, July 05, 2011 8:52 PM > > To: Desai, Kashyap > > Cc: Matthew Wilcox; linux-scsi@vger.kernel.org; DL-MPT Fusion Linux; > > nab@linux-iscsi.org > > Subject: Re: [PATCH] mpt2sas: Remove queuecommand wrapper > > void scsi_eh_wakeup(struct Scsi_Host *shost) > > { > > if (shost->host_busy == shost->host_failed) { > > I think you are right here. > > I did below experiment. > Since now host_lock is removed from dispatch() callback. > I added sleep() inside scsih_qcmd().. Just to mimic preemption. And called Host reset using > "Sg_reset -h" at the sametime when driver is sleeping inside qcmd(). I see the kernel crash due to invalid scsi command pointer access after driver comes out from sleep(). OK, but I don't think we've introduced a new race here, just expanded the window. If you follow the path down from userspace, we get to scsi_nonblockable_ioctl() (doesn't take the host_lock), then scsi_reset_provider (which takes and releases the host_lock to set tmf_in_progress), then scsi_try_host_reset() which calls eh_host_reset_handler() without holding the host_lock. So it's entirely possible to hit the same race on an SMP machine already as there's nothing to synchronise the eh_host_reset_handler call with queuecommand. I think someone who uses sg_reset ought to be aware of the possibility that they can trash their system. Doug, what do you think? > As you said, due to actual Error handling will not wake up, but if user/customer do testing using sg_tools > They can kick off Host reset on demand. > > What is your input on my above test case/observation ? > > > trace_scsi_eh_wakeup(shost); > > wake_up_process(shost->ehandler); > > > > So the SCSI error handler doesn't run until host_busy == host_failed. > > host_busy is incremented in scsi_request_fn() (before scsi_dispatch_cmd > > is called). host_failed is incremented in scsi_eh_scmd_add(). So after > > a command has been issued, it prevents the EH from running until it > > too fails. That means queuecommand cannot be running at the same time > > as the EH. > > > > That's documented in Documentation/scsi/scsi_eh.txt section [1-3].