From mboxrd@z Thu Jan 1 00:00:00 1970 From: Bill Davidsen Subject: Re: smp dead lock of io_request_lock/queue_lock patch Date: Thu, 15 Jan 2004 12:01:31 -0500 Sender: linux-kernel-owner@vger.kernel.org Message-ID: <4006C76B.3090206@tmr.com> References: <20040112092231.GG29177@suse.de> <1073914073.3114.263.camel@compaq.xsintricity.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <1073914073.3114.263.camel@compaq.xsintricity.com> To: Doug Ledford Cc: Jens Axboe , Arjan Van de Ven , Peter Yao , linux-kernel@vger.kernel.org, linux-scsi mailing list List-Id: linux-scsi@vger.kernel.org Doug Ledford wrote: > On Mon, 2004-01-12 at 04:22, Jens Axboe wrote: > >>On Mon, Jan 12 2004, Arjan van de Ven wrote: >> >>>On Mon, Jan 12, 2004 at 10:19:46AM +0100, Jens Axboe wrote: >>> >>>>... and still exists in your 2.4.21 based kernel. >>> >>>The RHL 2.4.21 kernels don't have the locking patch at all... >> >>But RHEL3 does: >> >>http://kernelnewbies.org/kernels/rhel3/SOURCES/linux-2.4.21-iorl.patch >> >>and the bug is there. > > > But in RHEL3 the bug is fixed already (not in a released kernel, but the > fix went into our internal kernel some time back and will be in our next > update kernel). From my internal bk tree for this stuff: "not in a released kernel..." Do I read this right? That you have a fix for a critical bug and it hasn't been pushed to customers yet? How about security bugs, has the fix you pushed in RH-9.0 been push to EL customers? > [dledford@compaq RHEL3-scsi]$ bk changes -r1.23 > ChangeSet@1.23, 2003-11-10 17:19:54-05:00, dledford@compaq.xsintricity.com > drivers/scsi/scsi_error.c > Don't panic if the eh thread is dead, instead do the same thing that > scsi_softirq_handler does and just complete the command as a failed > command. > Change when we wake the eh thread in scsi_times_out to accomodate > the changes to the mlqueue operations. > Clear blocked status on the host and all devices in scsi_restart_operations > -> Don't grab the host_lock in scsi_restart_operations, we aren't doing > anything that needs it. Just goose the queues unconditionally, > scsi_request_fn() will know to not send commands if they shouldn't > go for some reason. > Make sure we account SCSI_STATE_MLQUEUE commands as not being failed > commands in scsi_unjam_host. > > But, Jens is right, it's a real bug. I just fixed it in a different > way. And my fix is dependent on other changes in our scsi stack as > well. Yes, thanks to Peter for that fix, nice that someone provides timely fixes... -- bill davidsen CTO TMR Associates, Inc Doing interesting things with small computers since 1979