* Re: Oops in scsi_send_eh_cmnd 2.6.21-rc5-git6,7,10,13
@ 2007-04-06 2:21 Andrew Burgess
2007-04-06 13:44 ` James Bottomley
0 siblings, 1 reply; 16+ messages in thread
From: Andrew Burgess @ 2007-04-06 2:21 UTC (permalink / raw)
To: davem, James.Bottomley, linux-kernel, linux-scsi
On Thu, 2007-04-05 at 15:36 -0700, David Miller wrote:
> From: Andrew Burgess <aab@cichlid.com>
> Date: Thu, 5 Apr 2007 15:13:27 -0700
>
> > David, do you see any other problems with scsi_send_eh_cmnd?
> >
> > I've switched back to 2.6.18 which seems to not oops
> > and am happy to try patches.
>
> Does 2.6.20 with my patch OOPS too? Does reverting my patch
> make the oops go away?
>
> If reverting my patch makes the OOPS go away, we need to
> verify if page_address() is returning crap for some reason
> or the length is wrong.
2.6.20.4 with your patch dies in the memcpy (as does 21-gitN)
2.6.20.4 without your patch dies in the subsequent __free_page
with a null pointer ref at 000...008
James should I try your posted patch? On which kernel?
This machine will die in boot on these kernels until I power
cycle it (which somehow fixes the disk/controller for a
while), 2.6.18 continue to work (gets the scsi errors and
continues)
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Oops in scsi_send_eh_cmnd 2.6.21-rc5-git6,7,10,13
2007-04-06 2:21 Oops in scsi_send_eh_cmnd 2.6.21-rc5-git6,7,10,13 Andrew Burgess
@ 2007-04-06 13:44 ` James Bottomley
0 siblings, 0 replies; 16+ messages in thread
From: James Bottomley @ 2007-04-06 13:44 UTC (permalink / raw)
To: Andrew Burgess; +Cc: davem, linux-kernel, linux-scsi
On Thu, 2007-04-05 at 19:21 -0700, Andrew Burgess wrote:
> 2.6.20.4 with your patch dies in the memcpy (as does 21-gitN)
>
> 2.6.20.4 without your patch dies in the subsequent __free_page
> with a null pointer ref at 000...008
>
> James should I try your posted patch? On which kernel?
Well, mine will work, but will also cover up the problem. It looks like
the scatterlist is getting corrupted by the HBA driver somehow ... we
really need to root cause this, otherwise other things will go wrong in
strange ways.
Just to confirm, the HBA with the problem is 3w-xxxx?
James
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Oops in scsi_send_eh_cmnd 2.6.21-rc5-git6,7,10,13
@ 2007-04-06 18:18 Andrew Burgess
0 siblings, 0 replies; 16+ messages in thread
From: Andrew Burgess @ 2007-04-06 18:18 UTC (permalink / raw)
To: aradford, cebbert, davem, dougg, James.Bottomley, linux-kernel,
linux-scsi
James Bottomley wrote:
> On Fri, 2007-04-06 at 12:32 -0400, Douglas Gilbert wrote:
> > That last line should be:
> > request_buffer[7] = 10; /* minimum size per SPC: 18 bytes */
>
> Er, yes, I'm afraid when I see n-7 I always think n is the lenght, not n
> is the last byte ...
OK. That patch with Doug's fix works in 2.6.21rc6.
Thanks to you all, now I can recrimp my power connector
rats nest and get things back up again.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Oops in scsi_send_eh_cmnd 2.6.21-rc5-git6,7,10,13
@ 2007-04-06 15:51 Andrew Burgess
2007-04-06 16:14 ` James Bottomley
0 siblings, 1 reply; 16+ messages in thread
From: Andrew Burgess @ 2007-04-06 15:51 UTC (permalink / raw)
To: james.bottomley; +Cc: cebbert, davem, linux-kernel, linux-scsi, aradford
James Bottomley wrote:
>It's actually a long standing bug in the 3w-xxxx driver. Apparently it
>assumes request sense is always the use_sg == 0 case. This is what it
>does on a request sense:
>static int tw_scsiop_request_sense(TW_Device_Extension *tw_dev, int request_id)
>{
> dprintk(KERN_NOTICE "3w-xxxx: tw_scsiop_request_sense()\n");
> /* For now we just zero the request buffer */
> memset(tw_dev->srb[request_id]->request_buffer, 0, tw_dev->srb[request_id]->request_bufflen);
> tw_dev->state[request_id] = TW_S_COMPLETED;
> tw_state_request_finish(tw_dev, request_id);
>....
>Note that it's clearing the request buffer, which is actually zeroing the scatterlist, hence the problem.
OK. Is there a quick workaround or should I just wait for
Adam & Company to make a patch?
You said your earlier patch would hide it, and then said you
had a length wrong in it and I'm not sure what length you
mean.
Thanks for yours and Chuck's and Dave's time.
^ permalink raw reply [flat|nested] 16+ messages in thread* Re: Oops in scsi_send_eh_cmnd 2.6.21-rc5-git6,7,10,13
2007-04-06 15:51 Andrew Burgess
@ 2007-04-06 16:14 ` James Bottomley
2007-04-06 16:32 ` Douglas Gilbert
0 siblings, 1 reply; 16+ messages in thread
From: James Bottomley @ 2007-04-06 16:14 UTC (permalink / raw)
To: Andrew Burgess; +Cc: cebbert, davem, linux-kernel, linux-scsi, aradford
On Fri, 2007-04-06 at 08:51 -0700, Andrew Burgess wrote:
> James Bottomley wrote:
>
> >It's actually a long standing bug in the 3w-xxxx driver. Apparently it
> >assumes request sense is always the use_sg == 0 case. This is what it
> >does on a request sense:
>
> >static int tw_scsiop_request_sense(TW_Device_Extension *tw_dev, int request_id)
> >{
> > dprintk(KERN_NOTICE "3w-xxxx: tw_scsiop_request_sense()\n");
>
> > /* For now we just zero the request buffer */
> > memset(tw_dev->srb[request_id]->request_buffer, 0, tw_dev->srb[request_id]->request_bufflen);
> > tw_dev->state[request_id] = TW_S_COMPLETED;
> > tw_state_request_finish(tw_dev, request_id);
> >....
>
> >Note that it's clearing the request buffer, which is actually zeroing the scatterlist, hence the problem.
>
> OK. Is there a quick workaround or should I just wait for
> Adam & Company to make a patch?
Try this ... I think it's roughly the correct fix.
> You said your earlier patch would hide it, and then said you
> had a length wrong in it and I'm not sure what length you
> mean.
It's the length specifier in the error handler request sense command ...
I'll fix it up and redo my patch through scsi-misc, since it's not going
to fix the root cause of the problem.
James
diff --git a/drivers/scsi/3w-xxxx.c b/drivers/scsi/3w-xxxx.c
index bf5d63e..6b303ba 100644
--- a/drivers/scsi/3w-xxxx.c
+++ b/drivers/scsi/3w-xxxx.c
@@ -1864,10 +1864,17 @@ static int tw_scsiop_read_write(TW_Device_Extension *tw_dev, int request_id)
/* This function will handle the request sense scsi command */
static int tw_scsiop_request_sense(TW_Device_Extension *tw_dev, int request_id)
{
+ char request_buffer[18];
+
dprintk(KERN_NOTICE "3w-xxxx: tw_scsiop_request_sense()\n");
- /* For now we just zero the request buffer */
- memset(tw_dev->srb[request_id]->request_buffer, 0, tw_dev->srb[request_id]->request_bufflen);
+ memset(request_buffer, 0, sizeof(request_buffer));
+ request_buffer[0] = 0x70; /* Immediate fixed format */
+ request_buffer[7] = 11; /* minimum size per SPC: 18 bytes */
+ /* leave all other fields zero, giving effectively NO_SENSE return */
+ tw_transfer_internal(tw_dev, request_id, request_buffer,
+ sizeof(request_buffer));
+
tw_dev->state[request_id] = TW_S_COMPLETED;
tw_state_request_finish(tw_dev, request_id);
^ permalink raw reply related [flat|nested] 16+ messages in thread* Re: Oops in scsi_send_eh_cmnd 2.6.21-rc5-git6,7,10,13
2007-04-06 16:14 ` James Bottomley
@ 2007-04-06 16:32 ` Douglas Gilbert
2007-04-06 16:47 ` James Bottomley
0 siblings, 1 reply; 16+ messages in thread
From: Douglas Gilbert @ 2007-04-06 16:32 UTC (permalink / raw)
To: James Bottomley
Cc: Andrew Burgess, cebbert, davem, linux-kernel, linux-scsi,
aradford
James Bottomley wrote:
> On Fri, 2007-04-06 at 08:51 -0700, Andrew Burgess wrote:
>> James Bottomley wrote:
>>
>>> It's actually a long standing bug in the 3w-xxxx driver. Apparently it
>>> assumes request sense is always the use_sg == 0 case. This is what it
>>> does on a request sense:
>>> static int tw_scsiop_request_sense(TW_Device_Extension *tw_dev, int request_id)
>>> {
>>> dprintk(KERN_NOTICE "3w-xxxx: tw_scsiop_request_sense()\n");
>>> /* For now we just zero the request buffer */
>>> memset(tw_dev->srb[request_id]->request_buffer, 0, tw_dev->srb[request_id]->request_bufflen);
>>> tw_dev->state[request_id] = TW_S_COMPLETED;
>>> tw_state_request_finish(tw_dev, request_id);
>>> ....
>>> Note that it's clearing the request buffer, which is actually zeroing the scatterlist, hence the problem.
>> OK. Is there a quick workaround or should I just wait for
>> Adam & Company to make a patch?
>
> Try this ... I think it's roughly the correct fix.
>
>> You said your earlier patch would hide it, and then said you
>> had a length wrong in it and I'm not sure what length you
>> mean.
>
> It's the length specifier in the error handler request sense command ...
> I'll fix it up and redo my patch through scsi-misc, since it's not going
> to fix the root cause of the problem.
>
> James
>
> diff --git a/drivers/scsi/3w-xxxx.c b/drivers/scsi/3w-xxxx.c
> index bf5d63e..6b303ba 100644
> --- a/drivers/scsi/3w-xxxx.c
> +++ b/drivers/scsi/3w-xxxx.c
> @@ -1864,10 +1864,17 @@ static int tw_scsiop_read_write(TW_Device_Extension *tw_dev, int request_id)
> /* This function will handle the request sense scsi command */
> static int tw_scsiop_request_sense(TW_Device_Extension *tw_dev, int request_id)
> {
> + char request_buffer[18];
> +
> dprintk(KERN_NOTICE "3w-xxxx: tw_scsiop_request_sense()\n");
>
> - /* For now we just zero the request buffer */
> - memset(tw_dev->srb[request_id]->request_buffer, 0, tw_dev->srb[request_id]->request_bufflen);
> + memset(request_buffer, 0, sizeof(request_buffer));
> + request_buffer[0] = 0x70; /* Immediate fixed format */
> + request_buffer[7] = 11; /* minimum size per SPC: 18 bytes */
James,
That last line should be:
request_buffer[7] = 10; /* minimum size per SPC: 18 bytes */
Doug Gilbert
^ permalink raw reply [flat|nested] 16+ messages in thread* Re: Oops in scsi_send_eh_cmnd 2.6.21-rc5-git6,7,10,13
2007-04-06 16:32 ` Douglas Gilbert
@ 2007-04-06 16:47 ` James Bottomley
0 siblings, 0 replies; 16+ messages in thread
From: James Bottomley @ 2007-04-06 16:47 UTC (permalink / raw)
To: dougg; +Cc: Andrew Burgess, cebbert, davem, linux-kernel, linux-scsi,
aradford
On Fri, 2007-04-06 at 12:32 -0400, Douglas Gilbert wrote:
> That last line should be:
> request_buffer[7] = 10; /* minimum size per SPC: 18 bytes */
Er, yes, I'm afraid when I see n-7 I always think n is the lenght, not n
is the last byte ...
James
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Oops in scsi_send_eh_cmnd 2.6.21-rc5-git6,7,10,13
@ 2007-04-06 15:12 Andrew Burgess
2007-04-06 15:27 ` James Bottomley
0 siblings, 1 reply; 16+ messages in thread
From: Andrew Burgess @ 2007-04-06 15:12 UTC (permalink / raw)
To: cebbert, davem, James.Bottomley, linux-kernel, linux-scsi
James Bottomley said:
> On Thu, 2007-04-05 at 19:21 -0700, Andrew Burgess wrote:
> > 2.6.20.4 with your patch dies in the memcpy (as does 21-gitN)
> >
> > 2.6.20.4 without your patch dies in the subsequent __free_page
> > with a null pointer ref at 000...008
> >
> > James should I try your posted patch? On which kernel?
>
> Well, mine will work, but will also cover up the problem. It looks like
> the scatterlist is getting corrupted by the HBA driver somehow ... we
> really need to root cause this, otherwise other things will go wrong in
> strange ways.
>
> Just to confirm, the HBA with the problem is 3w-xxxx?
Yes. The 3w-xxxx.c driver changed between 2.6.18 and 2.6.20 but
nothing jumps out to my untrained eyes. Here's the diff:
Also, I should mention that the working kernel is a fedora
rpm (2.6.18-1.2798.fc6) so I don't know what patches are in it.
The vmlinuz is dated Oct 6 2006.
--------------------------------------------------------------------
root@athlon:kernel # diff -u linux-2.6.20-rt5/drivers/scsi/3w-xxxx.c linux-2.6.18.6/drivers/scsi/3w-xxxx.c
--- linux-2.6.20-rt5/drivers/scsi/3w-xxxx.c 2007-02-04 10:44:54.000000000 -0800
+++ linux-2.6.18.6/drivers/scsi/3w-xxxx.c 2006-12-16 16:21:00.000000000 -0800
@@ -6,7 +6,7 @@
Arnaldo Carvalho de Melo <acme@conectiva.com.br>
Brad Strand <linux@3ware.com>
- Copyright (C) 1999-2007 3ware Inc.
+ Copyright (C) 1999-2005 3ware Inc.
Kernel compatiblity By: Andre Hedrick <andre@suse.com>
Non-Copyright (C) 2000 Andre Hedrick <andre@suse.com>
@@ -191,9 +191,6 @@
before shutting down card.
Change to new 'change_queue_depth' api.
Fix 'handled=1' ISR usage, remove bogus IRQ check.
- 1.26.02.002 - Free irq handler in __tw_shutdown().
- Turn on RCD bit for caching mode page.
- Serialize reset code.
*/
#include <linux/module.h>
@@ -217,7 +214,7 @@
#include "3w-xxxx.h"
/* Globals */
-#define TW_DRIVER_VERSION "1.26.02.002"
+#define TW_DRIVER_VERSION "1.26.02.001"
static TW_Device_Extension *tw_device_extension_list[TW_MAX_SLOT];
static int tw_device_extension_count = 0;
static int twe_major = -1;
@@ -229,7 +226,7 @@
MODULE_VERSION(TW_DRIVER_VERSION);
/* Function prototypes */
-static int tw_reset_device_extension(TW_Device_Extension *tw_dev);
+static int tw_reset_device_extension(TW_Device_Extension *tw_dev, int ioctl_reset);
/* Functions */
@@ -987,12 +984,24 @@
/* Now wait for the command to complete */
timeout = wait_event_timeout(tw_dev->ioctl_wqueue,
tw_dev->chrdev_request_id == TW_IOCTL_CHRDEV_FREE, timeout);
+ /* See if we reset while waiting for the ioctl to complete */
+ if (test_bit(TW_IN_RESET, &tw_dev->flags)) {
+ clear_bit(TW_IN_RESET, &tw_dev->flags);
+ retval = -ERESTARTSYS;
+ goto out2;
+ }
+
/* We timed out, and didn't get an interrupt */
if (tw_dev->chrdev_request_id != TW_IOCTL_CHRDEV_FREE) {
/* Now we need to reset the board */
printk(KERN_WARNING "3w-xxxx: scsi%d: Character ioctl (0x%x) timed
out, resetting card.\n", tw_dev->host->host_no, cmd);
retval = -EIO;
- if (tw_reset_device_extension(tw_dev)) {
+ spin_lock_irqsave(tw_dev->host->host_lock, flags);
+ tw_dev->state[request_id] = TW_S_COMPLETED;
+ tw_state_request_finish(tw_dev, request_id);
+ tw_dev->posted_request_count--;
+ spin_unlock_irqrestore(tw_dev->host->host_lock, flags);
+ if (tw_reset_device_extension(tw_dev, 1)) {
printk(KERN_WARNING "3w-xxxx: tw_chrdev_ioctl(): Reset
failed for card %d.\n", tw_dev->host->host_no);
}
goto out2;
@@ -1327,7 +1336,7 @@
} /* End tw_unmap_scsi_data() */
/* This function will reset a device extension */
-static int tw_reset_device_extension(TW_Device_Extension *tw_dev)
+static int tw_reset_device_extension(TW_Device_Extension *tw_dev, int ioctl_reset)
{
int i = 0;
struct scsi_cmnd *srb;
@@ -1373,10 +1382,15 @@
printk(KERN_WARNING "3w-xxxx: scsi%d: Reset sequence failed.\n",
tw_dev->host->host_no);
return 1;
}
-
TW_ENABLE_AND_CLEAR_INTERRUPTS(tw_dev);
- clear_bit(TW_IN_RESET, &tw_dev->flags);
- tw_dev->chrdev_request_id = TW_IOCTL_CHRDEV_FREE;
+
+ /* Wake up any ioctl that was pending before the reset */
+ if ((tw_dev->chrdev_request_id == TW_IOCTL_CHRDEV_FREE) || (ioctl_reset)) {
+ clear_bit(TW_IN_RESET, &tw_dev->flags);
+ } else {
+ tw_dev->chrdev_request_id = TW_IOCTL_CHRDEV_FREE;
+ wake_up(&tw_dev->ioctl_wqueue);
+ }
return 0;
} /* End tw_reset_device_extension() */
@@ -1423,18 +1437,14 @@
"WARNING: Command (0x%x) timed out, resetting card.\n",
SCpnt->cmnd[0]);
- /* Make sure we are not issuing an ioctl or resetting from ioctl */
- mutex_lock(&tw_dev->ioctl_lock);
-
/* Now reset the card and some of the device extension data */
- if (tw_reset_device_extension(tw_dev)) {
+ if (tw_reset_device_extension(tw_dev, 0)) {
printk(KERN_WARNING "3w-xxxx: scsi%d: Reset failed.\n", tw_dev->host->host_no);
goto out;
}
retval = SUCCESS;
out:
- mutex_unlock(&tw_dev->ioctl_lock);
return retval;
} /* End tw_scsi_eh_reset() */
@@ -1650,9 +1660,9 @@
request_buffer[4] = 0x8; /* caching page */
request_buffer[5] = 0xa; /* page length */
if (*flags & 0x1)
- request_buffer[6] = 0x5; /* WCE on, RCD on */
+ request_buffer[6] = 0x4; /* WCE on */
else
- request_buffer[6] = 0x1; /* WCE off, RCD on */
+ request_buffer[6] = 0x0; /* WCE off */
tw_transfer_internal(tw_dev, request_id, request_buffer,
sizeof(request_buffer));
@@ -2002,10 +2012,6 @@
int retval = 1;
TW_Device_Extension *tw_dev = (TW_Device_Extension *)SCpnt->device->host->hostdata;
- /* If we are resetting due to timed out ioctl, report as busy */
- if (test_bit(TW_IN_RESET, &tw_dev->flags))
- return SCSI_MLQUEUE_HOST_BUSY;
-
/* Save done function into Scsi_Cmnd struct */
SCpnt->scsi_done = done;
@@ -2072,7 +2078,8 @@
} /* End tw_scsi_queue() */
/* This function is the interrupt service routine */
-static irqreturn_t tw_interrupt(int irq, void *dev_instance)
+static irqreturn_t tw_interrupt(int irq, void *dev_instance,
+ struct pt_regs *regs)
{
int request_id;
u32 status_reg_value;
@@ -2094,10 +2101,6 @@
handled = 1;
- /* If we are resetting, bail */
- if (test_bit(TW_IN_RESET, &tw_dev->flags))
- goto tw_interrupt_bail;
-
/* Check controller for errors */
if (tw_check_bits(status_reg_value)) {
dprintk(KERN_WARNING "3w-xxxx: tw_interrupt(): Unexpected bits.\n");
@@ -2274,9 +2277,6 @@
/* Disable interrupts */
TW_DISABLE_INTERRUPTS(tw_dev);
- /* Free up the IRQ */
- free_irq(tw_dev->tw_pci_dev->irq, tw_dev);
-
printk(KERN_WARNING "3w-xxxx: Shutting down host %d.\n", tw_dev->host->host_no);
/* Tell the card we are shutting down */
@@ -2445,6 +2445,9 @@
twe_major = -1;
}
+ /* Free up the IRQ */
+ free_irq(tw_dev->tw_pci_dev->irq, tw_dev);
+
/* Shutdown the card */
__tw_shutdown(tw_dev);
@@ -2483,7 +2486,7 @@
{
printk(KERN_WARNING "3ware Storage Controller device driver for Linux v%s.\n",
TW_DRIVER_VERSION);
- return pci_register_driver(&tw_driver);
+ return pci_module_init(&tw_driver);
} /* End tw_init() */
/* This function is called on driver exit */
^ permalink raw reply [flat|nested] 16+ messages in thread* Re: Oops in scsi_send_eh_cmnd 2.6.21-rc5-git6,7,10,13
2007-04-06 15:12 Andrew Burgess
@ 2007-04-06 15:27 ` James Bottomley
2007-04-06 18:32 ` David Miller
0 siblings, 1 reply; 16+ messages in thread
From: James Bottomley @ 2007-04-06 15:27 UTC (permalink / raw)
To: Andrew Burgess; +Cc: cebbert, davem, linux-kernel, linux-scsi, adam radford
On Fri, 2007-04-06 at 08:12 -0700, Andrew Burgess wrote:
> Yes. The 3w-xxxx.c driver changed between 2.6.18 and 2.6.20 but
> nothing jumps out to my untrained eyes. Here's the diff:
> Also, I should mention that the working kernel is a fedora
> rpm (2.6.18-1.2798.fc6) so I don't know what patches are in it.
> The vmlinuz is dated Oct 6 2006.
It's actually a long standing bug in the 3w-xxxx driver. Apparently it
assumes request sense is always the use_sg == 0 case. This is what it
does on a request sense:
static int tw_scsiop_request_sense(TW_Device_Extension *tw_dev, int request_id)
{
dprintk(KERN_NOTICE "3w-xxxx: tw_scsiop_request_sense()\n");
/* For now we just zero the request buffer */
memset(tw_dev->srb[request_id]->request_buffer, 0, tw_dev->srb[request_id]->request_bufflen);
tw_dev->state[request_id] = TW_S_COMPLETED;
tw_state_request_finish(tw_dev, request_id);
....
Note that it's clearing the request buffer, which is actually zeroing the scatterlist, hence the problem.
James
^ permalink raw reply [flat|nested] 16+ messages in thread* Re: Oops in scsi_send_eh_cmnd 2.6.21-rc5-git6,7,10,13
2007-04-06 15:27 ` James Bottomley
@ 2007-04-06 18:32 ` David Miller
2007-04-06 21:35 ` James Bottomley
0 siblings, 1 reply; 16+ messages in thread
From: David Miller @ 2007-04-06 18:32 UTC (permalink / raw)
To: James.Bottomley; +Cc: aab, cebbert, linux-kernel, linux-scsi, aradford
From: James Bottomley <James.Bottomley@SteelEye.com>
Date: Fri, 06 Apr 2007 10:27:57 -0500
> On Fri, 2007-04-06 at 08:12 -0700, Andrew Burgess wrote:
> > Yes. The 3w-xxxx.c driver changed between 2.6.18 and 2.6.20 but
> > nothing jumps out to my untrained eyes. Here's the diff:
> > Also, I should mention that the working kernel is a fedora
> > rpm (2.6.18-1.2798.fc6) so I don't know what patches are in it.
> > The vmlinuz is dated Oct 6 2006.
>
> It's actually a long standing bug in the 3w-xxxx driver. Apparently it
> assumes request sense is always the use_sg == 0 case. This is what it
> does on a request sense:
Thanks for figuring this out James. I was very worried there
was some bug in my scsi_send_eh_cmnd() fix :)
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Oops in scsi_send_eh_cmnd 2.6.21-rc5-git6,7,10,13
2007-04-06 18:32 ` David Miller
@ 2007-04-06 21:35 ` James Bottomley
0 siblings, 0 replies; 16+ messages in thread
From: James Bottomley @ 2007-04-06 21:35 UTC (permalink / raw)
To: David Miller; +Cc: aab, cebbert, linux-kernel, linux-scsi, aradford
On Fri, 2007-04-06 at 11:32 -0700, David Miller wrote:
> From: James Bottomley <James.Bottomley@SteelEye.com>
> Date: Fri, 06 Apr 2007 10:27:57 -0500
>
> > On Fri, 2007-04-06 at 08:12 -0700, Andrew Burgess wrote:
> > > Yes. The 3w-xxxx.c driver changed between 2.6.18 and 2.6.20 but
> > > nothing jumps out to my untrained eyes. Here's the diff:
> > > Also, I should mention that the working kernel is a fedora
> > > rpm (2.6.18-1.2798.fc6) so I don't know what patches are in it.
> > > The vmlinuz is dated Oct 6 2006.
> >
> > It's actually a long standing bug in the 3w-xxxx driver. Apparently it
> > assumes request sense is always the use_sg == 0 case. This is what it
> > does on a request sense:
>
> Thanks for figuring this out James. I was very worried there
> was some bug in my scsi_send_eh_cmnd() fix :)
You're welcome ... the code in the fix looked fine, which is why I was
initially suspicious ... however, the bug in __free_page() on reversion
pretty much confirmed a driver issue.
James
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Oops in scsi_send_eh_cmnd 2.6.21-rc5-git6,7,10,13
@ 2007-04-05 22:13 Andrew Burgess
2007-04-05 22:36 ` David Miller
0 siblings, 1 reply; 16+ messages in thread
From: Andrew Burgess @ 2007-04-05 22:13 UTC (permalink / raw)
To: davem, linux-kernel, linux-scsi
Chuck Ebbert wrote:
>Andrew Burgess wrote:
>
>> Apr 5 03:45:16 cichlid kernel: 3w-xxxx: scsi2: Command failed: status = 0xc7, flags = 0x7f, unit #4.
>> Apr 5 03:45:20 cichlid kernel: 3w-xxxx: scsi2: Command failed: status = 0xc7, flags = 0x80, unit #4.
>> Apr 5 03:47:20 cichlid kernel: 3w-xxxx: scsi0: Command failed: status = 0xc7, flags = 0x80, unit #0.
>> Apr 5 03:47:20 cichlid kernel: 3w-xxxx: scsi0: Command failed: status = 0xc7, flags = 0x80, unit #1.
>> Apr 5 04:00:08 cichlid kernel: 3w-xxxx: scsi0: Command failed: status = 0xc7, flags = 0x80, unit #0.
>...<BUG inside oops handler deleted>...
>> Apr 5 04:00:08 cichlid kernel:
>> Apr 5 04:00:08 cichlid kernel: general protection fault: 0000 [1] PREEMPT SMP
>> Apr 5 04:00:08 cichlid kernel: CPU 1
>> Apr 5 04:00:08 cichlid kernel: Modules linked in: dm_multipath multipath linear raid456 xor raid1 md_mod act_police sch_ingress sch_sfq sch_cbq ipt_TOS cls_u32 sch_htb ipt_MASQUERADE ipt_LOG xt_multiport nf_nat_ftp nf_conntrack_ftp iptable_mangle iptable_nat nf_nat emi26 w83627hf hwmon_vid i2c_isa sunrpc ipt_REJECT xt_tcpudp nf_conntrack_ipv4 xt_state nf_conntrack nfnetlink iptable_filter ip_tables x_tables freq_table sr_mod loop dm_mirror dm_mod video thermal sbs processor i2c_ec fan dock button battery asus_acpi ac parport_pc lp parport floppy nvram snd_usb_audio snd_ice1712 snd_ice17xx_ak4xxx snd_via82xx sg gameport snd_seq_dummy pcspkr snd_ak4xxx_adda snd_cs8427 snd_via82xx_modem snd_seq_oss snd_ac97_codec sata_via snd_i2c snd_seq_midi_event snd_seq skge i2c_viapro snd_mpu401_uart i2c_core k8temp ac97_bus snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd_page_alloc dsbr100 snd_usb_
lib snd_rawmidi compat_ioctl32 snd_seq_device videodev v4l2_common v4l1_compat snd_hwdep !
snd soundcore sisusbvga ftdi_sio usb_stor
>> Apr 5 04:00:08 cichlid kernel: ge serio_raw emi62 usbserial asix usbnet ata_piix 3w_xxxx ata_generic pata_via libata sd_mod scsi_mod ext3 jbd ehci_hcd ohci_hcd uhci_hcd
>> Apr 5 04:00:08 cichlid kernel: Pid: 386, comm: scsi_eh_0 Tainted: G M 2.6.21-rc5-git10-1-slab-debug #1
>> Apr 5 04:00:08 cichlid kernel: RIP: 0010:[memcpy_c+11/32] [memcpy_c+11/32] memcpy_c+0xb/0x20
>> Apr 5 04:00:08 cichlid kernel: RIP: 0010:[<ffffffff8026a7cb>] [<ffffffff8026a7cb>] memcpy_c+0xb/0x20
>> Apr 5 04:00:08 cichlid kernel: RSP: 0000:ffff8100beebbce8 EFLAGS: 00010246
>> Apr 5 04:00:08 cichlid kernel: RAX: ffff8100b4978140 RBX: 0000000000002003 RCX: 000000000000000c
>> Apr 5 04:00:08 cichlid kernel: RDX: 0000000000000000 RSI: 6ddaa59200000000 RDI: ffff8100b4978140
>> Apr 5 04:00:08 cichlid kernel: RBP: ffff8100beebbe20 R08: 0000000000000002 R09: 0000000000000001
>> Apr 5 04:00:08 cichlid kernel: R10: 0000000000000000 R11: 00000000ffffffff R12: ffff8100b4978140
>> Apr 5 04:00:08 cichlid kernel: R13: ffffffff8807a1ce R14: ffff8100b4978058 R15: ffff8100beebc000
>> Apr 5 04:00:08 cichlid kernel: FS: 00002b615f4dc240(0000) GS:ffff8100bffae4c8(0000) knlGS:00000000f795eb90
>> Apr 5 04:00:08 cichlid kernel: CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
>> Apr 5 04:00:08 cichlid kernel: CR2: 00002b615f4b9000 CR3: 00000000a25ad000 CR4: 00000000000006e0
>> Apr 5 04:00:08 cichlid kernel: Process scsi_eh_0 (pid: 386, threadinfo ffff8100beeba000, task ffff8100bedd0100)
>> Apr 5 04:00:08 cichlid kernel: Stack: ffffffff88055efc 0000271000000001 ffff8100b49780b4 ffff8100bef94508
>> Apr 5 04:00:08 cichlid kernel: 0000000200000002 000010000a240001 ffff810013404c78 0000000000000000
>> Apr 5 04:00:08 cichlid kernel: ffff810000000001 ffffffffdead4ead ffffffffffffffff 0000000000000000
>> Apr 5 04:00:08 cichlid kernel: Call Trace:
>> Apr 5 04:00:08 cichlid kernel: [_end+123674792/2126102956] :scsi_mod:scsi_send_eh_cmnd+0x3fc/0x480
>> Apr 5 04:00:08 cichlid kernel: [<ffffffff88055efc>] :scsi_mod:scsi_send_eh_cmnd+0x3fc/0x480
>> Apr 5 04:00:08 cichlid kernel: [thread_return+230/301] thread_return+0xe6/0x12d
>> Apr 5 04:00:08 cichlid kernel: [<ffffffff8026b699>] thread_return+0xe6/0x12d
>> Apr 5 04:00:08 cichlid kernel: [_end+123678124/2126102956] :scsi_mod:scsi_error_handler+0x0/0x540
>> Apr 5 04:00:08 cichlid kernel: [<ffffffff88056c00>] :scsi_mod:scsi_error_handler+0x0/0x540
>> Apr 5 04:00:08 cichlid kernel: [keventd_create_kthread+0/144] keventd_create_kthread+0x0/0x90
>> Apr 5 04:00:08 cichlid kernel: [<ffffffff802a21c0>] keventd_create_kthread+0x0/0x90
>> Apr 5 04:00:08 cichlid kernel: [kthread+218/272] kthread+0xda/0x110
>> Apr 5 04:00:08 cichlid kernel: [<ffffffff80237b1a>] kthread+0xda/0x110
>> Apr 5 04:00:08 cichlid autoblacklist[9591]: src= proto= srcport= destport= srcname= destportname= srcportname= icmptype= icmpcode=
>> Apr 5 04:00:08 cichlid kernel: [child_rip+10/18] child_rip+0xa/0x12
>> Apr 5 04:00:08 cichlid kernel: [<ffffffff80268ea8>] child_rip+0xa/0x12
>> Apr 5 04:00:08 cichlid kernel: [schedule_tail+124/240] schedule_tail+0x7c/0xf0
>> Apr 5 04:00:08 cichlid kernel: [<ffffffff8022b90c>] schedule_tail+0x7c/0xf0
>> Apr 5 04:00:08 cichlid kernel: [restore_args+0/48] restore_args+0x0/0x30
>> Apr 5 04:00:08 cichlid kernel: [<ffffffff80268590>] restore_args+0x0/0x30
>> Apr 5 04:00:08 cichlid kernel: [kthread+0/272] kthread+0x0/0x110
>> Apr 5 04:00:08 cichlid kernel: [<ffffffff80237a40>] kthread+0x0/0x110
>> Apr 5 04:00:08 cichlid kernel: [child_rip+0/18] child_rip+0x0/0x12
>> Apr 5 04:00:08 cichlid kernel: [<ffffffff80268e9e>] child_rip+0x0/0x12
>> Apr 5 04:00:08 cichlid kernel:
>> Apr 5 04:00:08 cichlid kernel:
>> Apr 5 04:00:08 cichlid kernel: Code: f3 48 a5 89 d1 f3 a4 c3 66 0f 1f 44 00 00 0f 1f 80 00 00 00
>> Apr 5 04:00:08 cichlid kernel: RIP [memcpy_c+11/32] memcpy_c+0xb/0x20
>> Apr 5 04:00:08 cichlid kernel: RIP [<ffffffff8026a7cb>] memcpy_c+0xb/0x20
>> Apr 5 04:00:08 cichlid kernel: RSP <ffff8100beebbce8>
> Probably fixed by this:
>
> Gitweb: http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=8cc574a3c5cea70229f243a6b57fd69e60491d82
> Commit: 8cc574a3c5cea70229f243a6b57fd69e60491d82
> Parent: d80f0a4beb15d817bfbb18a29e5ffc1d9dc353ea
> Author: David S. Miller <davem@sunset.davemloft.net>
> AuthorDate: Mon Apr 2 14:21:55 2007 -0700
> Committer: David S. Miller <davem@sunset.davemloft.net>
> CommitDate: Mon Apr 2 14:26:22 2007 -0700
>
> [SCSI]: Fix scsi_send_eh_cmnd scatterlist handling
>
> This fixes a regression caused by commit:
>
> 2dc611de5a3fd955cd0298c50691d4c05046db97
Sadly not. That patch was already in git10. I just tried
git13 and got the same memcpy oops in scsi_send_eh_cmnd.
David, do you see any other problems with scsi_send_eh_cmnd?
I've switched back to 2.6.18 which seems to not oops
and am happy to try patches.
Andrew
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Oops in scsi_send_eh_cmnd 2.6.21-rc5-git6,7,10,13
2007-04-05 22:13 Andrew Burgess
@ 2007-04-05 22:36 ` David Miller
2007-04-06 0:02 ` James Bottomley
0 siblings, 1 reply; 16+ messages in thread
From: David Miller @ 2007-04-05 22:36 UTC (permalink / raw)
To: aab; +Cc: linux-kernel, linux-scsi
From: Andrew Burgess <aab@cichlid.com>
Date: Thu, 5 Apr 2007 15:13:27 -0700
> David, do you see any other problems with scsi_send_eh_cmnd?
>
> I've switched back to 2.6.18 which seems to not oops
> and am happy to try patches.
Does 2.6.20 with my patch OOPS too? Does reverting my patch
make the oops go away?
If reverting my patch makes the OOPS go away, we need to
verify if page_address() is returning crap for some reason
or the length is wrong.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Oops in scsi_send_eh_cmnd 2.6.21-rc5-git6,7,10,13
2007-04-05 22:36 ` David Miller
@ 2007-04-06 0:02 ` James Bottomley
2007-04-06 0:15 ` David Miller
0 siblings, 1 reply; 16+ messages in thread
From: James Bottomley @ 2007-04-06 0:02 UTC (permalink / raw)
To: David Miller; +Cc: aab, linux-kernel, linux-scsi
On Thu, 2007-04-05 at 15:36 -0700, David Miller wrote:
> From: Andrew Burgess <aab@cichlid.com>
> Date: Thu, 5 Apr 2007 15:13:27 -0700
>
> > David, do you see any other problems with scsi_send_eh_cmnd?
> >
> > I've switched back to 2.6.18 which seems to not oops
> > and am happy to try patches.
>
> Does 2.6.20 with my patch OOPS too? Does reverting my patch
> make the oops go away?
>
> If reverting my patch makes the OOPS go away, we need to
> verify if page_address() is returning crap for some reason
> or the length is wrong.
Assuming this does turn out to be the problem, we should just junk the
page allocation ... it's completely unnecessary; when the slab allocated
commands were done, we made sure the actual sense_buffer is at the
correct location, so this should be the final fix:
James
diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index adb40f2..997532b 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -18,12 +18,12 @@
#include <linux/sched.h>
#include <linux/timer.h>
#include <linux/string.h>
-#include <linux/slab.h>
#include <linux/kernel.h>
#include <linux/kthread.h>
#include <linux/interrupt.h>
#include <linux/blkdev.h>
#include <linux/delay.h>
+#include <linux/scatterlist.h>
#include <scsi/scsi.h>
#include <scsi/scsi_cmnd.h>
@@ -641,16 +641,8 @@ static int scsi_send_eh_cmnd(struct scsi_cmnd *scmd, unsigned char *cmnd,
memcpy(scmd->cmnd, cmnd, cmnd_size);
if (copy_sense) {
- gfp_t gfp_mask = GFP_ATOMIC;
-
- if (shost->hostt->unchecked_isa_dma)
- gfp_mask |= __GFP_DMA;
-
- sgl.page = alloc_page(gfp_mask);
- if (!sgl.page)
- return FAILED;
- sgl.offset = 0;
- sgl.length = 252;
+ sg_init_one(&sgl, scmd->sense_buffer,
+ sizeof(scmd->sense_buffer));
scmd->sc_data_direction = DMA_FROM_DEVICE;
scmd->request_bufflen = sgl.length;
@@ -721,18 +713,6 @@ static int scsi_send_eh_cmnd(struct scsi_cmnd *scmd, unsigned char *cmnd,
/*
- * Last chance to have valid sense data.
- */
- if (copy_sense) {
- if (!SCSI_SENSE_VALID(scmd)) {
- memcpy(scmd->sense_buffer, page_address(sgl.page),
- sizeof(scmd->sense_buffer));
- }
- __free_page(sgl.page);
- }
-
-
- /*
* Restore original data
*/
scmd->request_buffer = old_buffer;
^ permalink raw reply related [flat|nested] 16+ messages in thread* Re: Oops in scsi_send_eh_cmnd 2.6.21-rc5-git6,7,10,13
2007-04-06 0:02 ` James Bottomley
@ 2007-04-06 0:15 ` David Miller
2007-04-06 0:51 ` James Bottomley
0 siblings, 1 reply; 16+ messages in thread
From: David Miller @ 2007-04-06 0:15 UTC (permalink / raw)
To: James.Bottomley; +Cc: aab, linux-kernel, linux-scsi
From: James Bottomley <James.Bottomley@SteelEye.com>
Date: Thu, 05 Apr 2007 19:02:19 -0500
> On Thu, 2007-04-05 at 15:36 -0700, David Miller wrote:
> > From: Andrew Burgess <aab@cichlid.com>
> > Date: Thu, 5 Apr 2007 15:13:27 -0700
> >
> > > David, do you see any other problems with scsi_send_eh_cmnd?
> > >
> > > I've switched back to 2.6.18 which seems to not oops
> > > and am happy to try patches.
> >
> > Does 2.6.20 with my patch OOPS too? Does reverting my patch
> > make the oops go away?
> >
> > If reverting my patch makes the OOPS go away, we need to
> > verify if page_address() is returning crap for some reason
> > or the length is wrong.
>
> Assuming this does turn out to be the problem, we should just junk the
> page allocation ... it's completely unnecessary; when the slab allocated
> commands were done, we made sure the actual sense_buffer is at the
> correct location, so this should be the final fix:
This won't work I believe.
There are cases that use smaller sense buffers than the minimum
specified by the SCSI layer.
One example is that do_sr_ioctl() stuff when the cgc passed
in has a sense buffer. That will only be as large as a
"struct request_sense".
I'm pretty sure that's one of the reasons why we cons up a local sense
buffer in this EH code.
So we could walk past the end of that and corrupt memory with
your patch.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Oops in scsi_send_eh_cmnd 2.6.21-rc5-git6,7,10,13
2007-04-06 0:15 ` David Miller
@ 2007-04-06 0:51 ` James Bottomley
0 siblings, 0 replies; 16+ messages in thread
From: James Bottomley @ 2007-04-06 0:51 UTC (permalink / raw)
To: David Miller; +Cc: aab, linux-kernel, linux-scsi
On Thu, 2007-04-05 at 17:15 -0700, David Miller wrote:
> This won't work I believe.
>
> There are cases that use smaller sense buffers than the minimum
> specified by the SCSI layer.
>
> One example is that do_sr_ioctl() stuff when the cgc passed
> in has a sense buffer. That will only be as large as a
> "struct request_sense".
>
> I'm pretty sure that's one of the reasons why we cons up a local sense
> buffer in this EH code.
>
> So we could walk past the end of that and corrupt memory with
> your patch.
That should be fine ... the application copies the sense out of
scmnd->sense_buffer ... it can take as much or as little as it wants
(sense_buffer is actually a SCSI_SENSE_BUFFERSIZE array inside the
command). There was one thing I missed, which is that the sense buffer
size of the command is 252, whereas I need to set it back down to
sizeof(scmnd->sense_buffer).
This is another area where we "could do better" ... the request actually
gives us a sense buffer, but we use our own and later copy data out of
it back into the request.
James
^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2007-04-06 21:37 UTC | newest]
Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-04-06 2:21 Oops in scsi_send_eh_cmnd 2.6.21-rc5-git6,7,10,13 Andrew Burgess
2007-04-06 13:44 ` James Bottomley
-- strict thread matches above, loose matches on Subject: below --
2007-04-06 18:18 Andrew Burgess
2007-04-06 15:51 Andrew Burgess
2007-04-06 16:14 ` James Bottomley
2007-04-06 16:32 ` Douglas Gilbert
2007-04-06 16:47 ` James Bottomley
2007-04-06 15:12 Andrew Burgess
2007-04-06 15:27 ` James Bottomley
2007-04-06 18:32 ` David Miller
2007-04-06 21:35 ` James Bottomley
2007-04-05 22:13 Andrew Burgess
2007-04-05 22:36 ` David Miller
2007-04-06 0:02 ` James Bottomley
2007-04-06 0:15 ` David Miller
2007-04-06 0:51 ` James Bottomley
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox