From mboxrd@z Thu Jan 1 00:00:00 1970 From: James Bottomley Subject: Re: Interrupt strangeness in scsi_request_fn() Date: Fri, 01 Aug 2014 18:00:03 -0400 Message-ID: <1406930403.2654.40.camel@jarvis> References: <20140801204848.GE7525@quack.suse.cz> Mime-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: 7bit Return-path: Received: from bedivere.hansenpartnership.com ([66.63.167.143]:59892 "EHLO bedivere.hansenpartnership.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1756441AbaHAWAH (ORCPT ); Fri, 1 Aug 2014 18:00:07 -0400 In-Reply-To: <20140801204848.GE7525@quack.suse.cz> Sender: linux-scsi-owner@vger.kernel.org List-Id: linux-scsi@vger.kernel.org To: Jan Kara Cc: linux-scsi@vger.kernel.org On Fri, 2014-08-01 at 22:48 +0200, Jan Kara wrote: > Hello, > > when debugging one bug, I've noticed one strangeness in > scsi_request_fn(). We enter it with interrupts disabled and queue_lock > held. In the function we do stuff like: > spin_unlock_irq(shost->host_lock); > /* > * Finally, initialize any error handling parameters, and > * set up > * the timers for timeouts. > */ > scsi_init_cmd_errh(cmd); > > /* > * Dispatch the command to the low-level driver. > */ > rtn = scsi_dispatch_cmd(cmd); > spin_lock_irq(q->queue_lock); > > Why do we enable interrupts there? The thing is that scsi_request_fn() can > be called from IO completion (thus softirq context) and if we enable > interrupts there, another HW interrupt can interrupt softirq processing > which seems wrong to me. According to my reading of kernel/softirq.c, it looks like interrupts *are* already enabled when softirq functions are called by design. That doesn't mean your patch is necessarily wrong, it just means the premise above isn't quite right. > Now to admit my motivation I have reports from one customer (SLES kernel so > too old to be really interesting upstream but still of some value I > believe) where he's seeing softlockup reports when fully loading his SAN > and from the crash dumps it seems the machine simply spends too long in IO > completion because HW interrupts keep interrupting it plus there's > contention on shost->host_lock. If I change scsi_request_fn() to never > enable interrupts problems go away. > > So what would people say to something like the attached patch? I think the theory for enabling interrupts here is that the path is rather long ... it goes through the driver to the hardware and that the problem you describe above *should* be self correcting. If we keep interrupting dispatch, then we're throttling the outbound I/O which should reduce the returning interrupts. I'm not immovably opposed, I'd just like to see better justification. James