* Kernel Oops with 2.6.15
@ 2006-01-03 14:32 Ralf Müller
2006-01-04 3:10 ` Albert Lee
0 siblings, 1 reply; 7+ messages in thread
From: Ralf Müller @ 2006-01-03 14:32 UTC (permalink / raw)
To: linux-kernel; +Cc: jgarzik, linux-ide
The kernel oops reported for 2.6.15-rc7 when calling hddtemp on a sata
drive which is in standby mode is still present in 2.6.15. Maybe someone
is interested.
Regards
Ralf
Jan 3 15:21:35 DatenGrab kernel: ata1: unknown timeout, cmd 0xb0 stat 0xff
Jan 3 15:21:35 DatenGrab kernel: ata1: translated ATA stat/err 0xff/00
to SCSI SK/ASC/ASCQ 0xb/47/00
Jan 3 15:21:35 DatenGrab kernel: ata1: status=0xff { Busy }
Jan 3 15:21:35 DatenGrab kernel: Assertion failed! qc !=
NULL,drivers/scsi/libata-core.c,ata_pio_block,line=3216
Jan 3 15:21:35 DatenGrab kernel: Unable to handle kernel NULL pointer
dereference at virtual address 00000014
Jan 3 15:21:35 DatenGrab kernel: printing eip:
Jan 3 15:21:35 DatenGrab kernel: e02d9a3e
Jan 3 15:21:35 DatenGrab kernel: *pde = 00000000
Jan 3 15:21:35 DatenGrab kernel: Oops: 0000 [#1]
Jan 3 15:21:35 DatenGrab kernel: Modules linked in: edd ipv6 nfsd
w83627hf hwmon_vid hwmon eeprom i2c_isa button battery ac af_packet xfs
exportfs reiserfs ohci1394 ieee1394 sk98lin i2c_i801 i2c_core generic
i8xx_tco shpchp pci_hotplug uhci_hcd intel_agp agpgart raid5 xor
parport_pc lp parport sata_promise libata dm_mod sg skge ohci_hcd
ehci_hcd usb_storage usbcore fan thermal processor piix sd_mod scsi_mod
ide_disk ide_core
Jan 3 15:21:35 DatenGrab kernel: CPU: 0
Jan 3 15:21:35 DatenGrab kernel: EIP: 0060:[<e02d9a3e>] Not
tainted VLI
Jan 3 15:21:35 DatenGrab kernel: EFLAGS: 00010292 (2.6.15-default)
Jan 3 15:21:35 DatenGrab kernel: EIP is at ata_pio_block+0xb9/0xfd [libata]
Jan 3 15:21:35 DatenGrab kernel: eax: 00000056 ebx: c14d6284 ecx:
00000000 edx: 00000000
Jan 3 15:21:35 DatenGrab kernel: esi: 55b17c50 edi: 00000000 ebp:
c14d6284 esp: def3df70
Jan 3 15:21:35 DatenGrab kernel: ds: 007b es: 007b ss: 0068
Jan 3 15:21:35 DatenGrab kernel: Process ata/0 (pid: 2179,
threadinfo=def3c000 task=deefc580)
Jan 3 15:21:35 DatenGrab kernel: Stack: c14d6284 deefb3c0 00000287
e02d9b00 c14d683c c01230b6 e02d9ae3 00000001
Jan 3 15:21:35 DatenGrab kernel: 00000000 00000000 00010000
00000000 00000000 deefc580 c01150bc 00100100
Jan 3 15:21:35 DatenGrab kernel: 00200200 ffffffff ffffffff
def3c000 de189f54 deefb3c0 c0122f77 c0125c9f
Jan 3 15:21:35 DatenGrab kernel: Call Trace:
Jan 3 15:21:35 DatenGrab kernel: [<e02d9b00>] ata_pio_task+0x1d/0x54
[libata]
Jan 3 15:21:35 DatenGrab kernel: [<c01230b6>] worker_thread+0x13f/0x19d
Jan 3 15:21:35 DatenGrab kernel: [<e02d9ae3>] ata_pio_task+0x0/0x54
[libata]
Jan 3 15:21:35 DatenGrab kernel: [<c01150bc>]
default_wake_function+0x0/0xc
Jan 3 15:21:35 DatenGrab kernel: [<c0122f77>] worker_thread+0x0/0x19d
Jan 3 15:21:35 DatenGrab kernel: [<c0125c9f>] kthread+0x63/0x8f
Jan 3 15:21:35 DatenGrab kernel: [<c0125c3c>] kthread+0x0/0x8f
Jan 3 15:21:35 DatenGrab kernel: [<c0101281>] kernel_thread_helper+0x5/0xb
Jan 3 15:21:35 DatenGrab kernel: Code: 89 df 81 c7 d4 04 00 00 75 21 68
90 0c 00 00 68 fb cd 2d e0 68 ef cf 2d e0 68 b0 d5 2d e0 68 61 d0 2d e0
e8 19 e1 e3 df 83 c4 14 <8a> 47 14 83 e8 05 3c 02 77 1b 83 e6 08 75 0c
c7 83 e4 05 00 00
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Kernel Oops with 2.6.15
2006-01-03 14:32 Kernel Oops with 2.6.15 Ralf Müller
@ 2006-01-04 3:10 ` Albert Lee
2006-01-04 9:29 ` Ralf Müller
0 siblings, 1 reply; 7+ messages in thread
From: Albert Lee @ 2006-01-04 3:10 UTC (permalink / raw)
To: Ralf Müller; +Cc: linux-kernel, jgarzik, linux-ide
Ralf Müller wrote:
> The kernel oops reported for 2.6.15-rc7 when calling hddtemp on a sata
> drive which is in standby mode is still present in 2.6.15. Maybe someone
> is interested.
>
> Regards
> Ralf
>
Hi Ralf,
It looks like the command is running in PIO polling thread.
The timeout handler is called and the qc is cleared.
So, the PIO polling thread got NULL qc.
Could you post more trace before the timeout occurs, thanks,
Albert
> Jan 3 15:21:35 DatenGrab kernel: ata1: unknown timeout, cmd 0xb0 stat 0xff
> Jan 3 15:21:35 DatenGrab kernel: ata1: translated ATA stat/err 0xff/00
> to SCSI SK/ASC/ASCQ 0xb/47/00
> Jan 3 15:21:35 DatenGrab kernel: ata1: status=0xff { Busy }
> Jan 3 15:21:35 DatenGrab kernel: Assertion failed! qc !=
> NULL,drivers/scsi/libata-core.c,ata_pio_block,line=3216
> Jan 3 15:21:35 DatenGrab kernel: Unable to handle kernel NULL pointer
> dereference at virtual address 00000014
> Jan 3 15:21:35 DatenGrab kernel: printing eip:
> Jan 3 15:21:35 DatenGrab kernel: e02d9a3e
> Jan 3 15:21:35 DatenGrab kernel: *pde = 00000000
> Jan 3 15:21:35 DatenGrab kernel: Oops: 0000 [#1]
> Jan 3 15:21:35 DatenGrab kernel: Modules linked in: edd ipv6 nfsd
> w83627hf hwmon_vid hwmon eeprom i2c_isa button battery ac af_packet xfs
> exportfs reiserfs ohci1394 ieee1394 sk98lin i2c_i801 i2c_core generic
> i8xx_tco shpchp pci_hotplug uhci_hcd intel_agp agpgart raid5 xor
> parport_pc lp parport sata_promise libata dm_mod sg skge ohci_hcd
> ehci_hcd usb_storage usbcore fan thermal processor piix sd_mod scsi_mod
> ide_disk ide_core
> Jan 3 15:21:35 DatenGrab kernel: CPU: 0
> Jan 3 15:21:35 DatenGrab kernel: EIP: 0060:[<e02d9a3e>] Not
> tainted VLI
> Jan 3 15:21:35 DatenGrab kernel: EFLAGS: 00010292 (2.6.15-default)
> Jan 3 15:21:35 DatenGrab kernel: EIP is at ata_pio_block+0xb9/0xfd
> [libata]
> Jan 3 15:21:35 DatenGrab kernel: eax: 00000056 ebx: c14d6284 ecx:
> 00000000 edx: 00000000
> Jan 3 15:21:35 DatenGrab kernel: esi: 55b17c50 edi: 00000000 ebp:
> c14d6284 esp: def3df70
> Jan 3 15:21:35 DatenGrab kernel: ds: 007b es: 007b ss: 0068
> Jan 3 15:21:35 DatenGrab kernel: Process ata/0 (pid: 2179,
> threadinfo=def3c000 task=deefc580)
> Jan 3 15:21:35 DatenGrab kernel: Stack: c14d6284 deefb3c0 00000287
> e02d9b00 c14d683c c01230b6 e02d9ae3 00000001
> Jan 3 15:21:35 DatenGrab kernel: 00000000 00000000 00010000
> 00000000 00000000 deefc580 c01150bc 00100100
> Jan 3 15:21:35 DatenGrab kernel: 00200200 ffffffff ffffffff
> def3c000 de189f54 deefb3c0 c0122f77 c0125c9f
> Jan 3 15:21:35 DatenGrab kernel: Call Trace:
> Jan 3 15:21:35 DatenGrab kernel: [<e02d9b00>] ata_pio_task+0x1d/0x54
> [libata]
> Jan 3 15:21:35 DatenGrab kernel: [<c01230b6>] worker_thread+0x13f/0x19d
> Jan 3 15:21:35 DatenGrab kernel: [<e02d9ae3>] ata_pio_task+0x0/0x54
> [libata]
> Jan 3 15:21:35 DatenGrab kernel: [<c01150bc>]
> default_wake_function+0x0/0xc
> Jan 3 15:21:35 DatenGrab kernel: [<c0122f77>] worker_thread+0x0/0x19d
> Jan 3 15:21:35 DatenGrab kernel: [<c0125c9f>] kthread+0x63/0x8f
> Jan 3 15:21:35 DatenGrab kernel: [<c0125c3c>] kthread+0x0/0x8f
> Jan 3 15:21:35 DatenGrab kernel: [<c0101281>]
> kernel_thread_helper+0x5/0xb
> Jan 3 15:21:35 DatenGrab kernel: Code: 89 df 81 c7 d4 04 00 00 75 21 68
> 90 0c 00 00 68 fb cd 2d e0 68 ef cf 2d e0 68 b0 d5 2d e0 68 61 d0 2d e0
> e8 19 e1 e3 df 83 c4 14 <8a> 47 14 83 e8 05 3c 02 77 1b 83 e6 08 75 0c
> c7 83 e4 05 00 00
>
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Kernel Oops with 2.6.15
2006-01-04 3:10 ` Albert Lee
@ 2006-01-04 9:29 ` Ralf Müller
2006-01-06 3:41 ` [PATCH/RFC] libata-dev: update timeout for PIO polling Albert Lee
0 siblings, 1 reply; 7+ messages in thread
From: Ralf Müller @ 2006-01-04 9:29 UTC (permalink / raw)
To: Albert Lee; +Cc: linux-kernel, jgarzik, linux-ide
Albert Lee schrieb:
> It looks like the command is running in PIO polling thread.
> The timeout handler is called and the qc is cleared.
> So, the PIO polling thread got NULL qc.
>
> Could you post more trace before the timeout occurs, thanks,
Sure. Here it is:
Jan 3 15:21:32 DatenGrab kernel: ata1: command timeout
Jan 3 15:21:32 DatenGrab kernel: ATA: abnormal status 0xFF on port
0xE006821C
Jan 3 15:21:32 DatenGrab kernel: ata1: translated ATA stat/err 0xff/00
to SCSI SK/ASC/ASCQ 0xb/47/00
Jan 3 15:21:32 DatenGrab kernel: ata1: status=0xff { Busy }
Jan 3 15:21:32 DatenGrab kernel: ATA: abnormal status 0xFF on port
0xE006821C
Jan 3 15:21:32 DatenGrab kernel: ATA: abnormal status 0xFF on port
0xE006821C
Jan 3 15:21:32 DatenGrab kernel: ATA: abnormal status 0xFF on port
0xE006821C
Jan 3 15:21:32 DatenGrab kernel: ATA: abnormal status 0xFF on port
0xE006821C
Jan 3 15:21:35 DatenGrab kernel: ata1: unknown timeout, cmd 0xb0 stat 0xff
Jan 3 15:21:35 DatenGrab kernel: ata1: translated ATA stat/err 0xff/00
to SCSI SK/ASC/ASCQ 0xb/47/00
Jan 3 15:21:35 DatenGrab kernel: ata1: status=0xff { Busy }
Jan 3 15:21:35 DatenGrab kernel: Assertion failed! qc !=
NULL,drivers/scsi/libata-core.c,ata_pio_block,line=3216
Jan 3 15:21:35 DatenGrab kernel: Unable to handle kernel NULL pointer
dereference at virtual address 00000014
Jan 3 15:21:35 DatenGrab kernel: printing eip:
Jan 3 15:21:35 DatenGrab kernel: e02d9a3e
Jan 3 15:21:35 DatenGrab kernel: *pde = 00000000
Jan 3 15:21:35 DatenGrab kernel: Oops: 0000 [#1]
^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH/RFC] libata-dev: update timeout for PIO polling
2006-01-04 9:29 ` Ralf Müller
@ 2006-01-06 3:41 ` Albert Lee
2006-01-18 0:52 ` Jeff Garzik
0 siblings, 1 reply; 7+ messages in thread
From: Albert Lee @ 2006-01-06 3:41 UTC (permalink / raw)
To: Jeff Garzik; +Cc: Ralf Müller, Doug Maxey, Linux IDE
Jeff,
In the bug reported by Ralf Müller:
(http://marc.theaimsgroup.com/?l=linux-ide&m=113629906809278&w=2)
When the command times out, the qc is completed by the time
out handler. This might cause null pointer dereference if
the command is running in the PIO polling thread.
Changes:
- Update the timeout for PIO polling to 5 minutes.
This should be long enough to prevent the timeout handler from
interfering with the PIO polling thread.
Patch against the libata-dev irq-pio branch
(61420e147a706ee7c7a902008045547fb2a2a330).
For your review, thanks.
Albert
Signed-off-by: Albert Lee <albertcc@tw.ibm.com>
==========
--- irq-pio/include/linux/libata.h 2006-01-05 15:16:32.000000000 +0800
+++ update_tmout/include/linux/libata.h 2006-01-06 10:17:34.000000000 +0800
@@ -141,6 +141,7 @@ enum {
ATA_TMOUT_CDB_QUICK = 5 * HZ,
ATA_TMOUT_INTERNAL = 30 * HZ,
ATA_TMOUT_INTERNAL_QUICK = 5 * HZ,
+ ATA_TMOUT_LONG = 300 * HZ, /* looks long enough */
/* ATA bus states */
BUS_UNKNOWN = 0,
--- irq-pio/drivers/scsi/libata-core.c 2006-01-05 15:16:29.000000000 +0800
+++ update_tmout/drivers/scsi/libata-core.c 2006-01-06 10:41:23.000000000 +0800
@@ -3866,6 +3866,21 @@ int ata_qc_issue_prot(struct ata_queued_
}
}
+ /* The PIO polling thread has its own timeout handling.
+ * It won't loop forever. Even if it takes a long time,
+ * we can wait the PIO polling thread to finish its work.
+ * No need to stop the polling thread from the timeout handler.
+ * So the timeout is updated to a long time here. This can prevent
+ * the timeout handler from interfering with the PIO polling thread.
+ */
+ if (qc->tf.flags & ATA_TFLAG_POLLING &&
+ qc->scsicmd &&
+ qc->scsicmd->eh_timeout.function) {
+ DPRINTK("ata%u: timer updated for PIO polling\n", ap->id);
+ mod_timer(&qc->scsicmd->eh_timeout,
+ jiffies + ATA_TMOUT_LONG);
+ }
+
/* select the device */
ata_dev_select(ap, qc->dev->devno, 1, 0);
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH/RFC] libata-dev: update timeout for PIO polling
2006-01-06 3:41 ` [PATCH/RFC] libata-dev: update timeout for PIO polling Albert Lee
@ 2006-01-18 0:52 ` Jeff Garzik
2006-01-20 14:19 ` [PATCH/RFC] libata-dev: update timer for PIO polling - revised Albert Lee
0 siblings, 1 reply; 7+ messages in thread
From: Jeff Garzik @ 2006-01-18 0:52 UTC (permalink / raw)
To: Albert Lee; +Cc: Ralf Müller, Doug Maxey, Linux IDE
Albert Lee wrote:
> Jeff,
>
> In the bug reported by Ralf Müller:
> (http://marc.theaimsgroup.com/?l=linux-ide&m=113629906809278&w=2)
>
> When the command times out, the qc is completed by the time
> out handler. This might cause null pointer dereference if
> the command is running in the PIO polling thread.
>
> Changes:
> - Update the timeout for PIO polling to 5 minutes.
> This should be long enough to prevent the timeout handler from
> interfering with the PIO polling thread.
>
> Patch against the libata-dev irq-pio branch
> (61420e147a706ee7c7a902008045547fb2a2a330).
>
> For your review, thanks.
>
> Albert
> Signed-off-by: Albert Lee <albertcc@tw.ibm.com>
>
> ==========
>
> --- irq-pio/include/linux/libata.h 2006-01-05 15:16:32.000000000 +0800
> +++ update_tmout/include/linux/libata.h 2006-01-06 10:17:34.000000000 +0800
> @@ -141,6 +141,7 @@ enum {
> ATA_TMOUT_CDB_QUICK = 5 * HZ,
> ATA_TMOUT_INTERNAL = 30 * HZ,
> ATA_TMOUT_INTERNAL_QUICK = 5 * HZ,
> + ATA_TMOUT_LONG = 300 * HZ, /* looks long enough */
>
> /* ATA bus states */
> BUS_UNKNOWN = 0,
> --- irq-pio/drivers/scsi/libata-core.c 2006-01-05 15:16:29.000000000 +0800
> +++ update_tmout/drivers/scsi/libata-core.c 2006-01-06 10:41:23.000000000 +0800
> @@ -3866,6 +3866,21 @@ int ata_qc_issue_prot(struct ata_queued_
> }
> }
>
> + /* The PIO polling thread has its own timeout handling.
> + * It won't loop forever. Even if it takes a long time,
> + * we can wait the PIO polling thread to finish its work.
> + * No need to stop the polling thread from the timeout handler.
> + * So the timeout is updated to a long time here. This can prevent
> + * the timeout handler from interfering with the PIO polling thread.
> + */
> + if (qc->tf.flags & ATA_TFLAG_POLLING &&
> + qc->scsicmd &&
> + qc->scsicmd->eh_timeout.function) {
> + DPRINTK("ata%u: timer updated for PIO polling\n", ap->id);
> + mod_timer(&qc->scsicmd->eh_timeout,
> + jiffies + ATA_TMOUT_LONG);
> + }
While this may solve the problem, unfortunately we really don't want to
be directly touching the scsi EH timer.
A better method would be to implement ->eh_timed_out() SCSI host
template hook, which is the preferred way to accomplish what you are
attempting here.
Jeff
^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH/RFC] libata-dev: update timer for PIO polling - revised
2006-01-18 0:52 ` Jeff Garzik
@ 2006-01-20 14:19 ` Albert Lee
2006-01-21 1:05 ` Tejun Heo
0 siblings, 1 reply; 7+ messages in thread
From: Albert Lee @ 2006-01-20 14:19 UTC (permalink / raw)
To: Jeff Garzik; +Cc: Ralf Müller, Doug Maxey, Linux IDE
Jeff,
Revised per your advice to use the eh_timed_out() handler.
Description:
In the bug reported by Ralf Müller:
(http://marc.theaimsgroup.com/?l=linux-ide&m=113629906809278&w=2)
When the scsi command times out, the qc is completed by the time
out handler. This might cause null pointer dereference if
the qc is running in the PIO polling thread.
Changes:
Add ata_scsi_timed_out() to libata-scsi.
If the qc times out when running in PIO mode, EH_RESET_TIMER is returned.
Otherwise, EH_NOT_HANDLED is returned and the eh_strategy_handler()
is triggered as before.
(Will add ata_scsi_timed_out() to LLDs later if this is ok.)
Patch against the libata-dev irq-pio branch
(000080c3499cd5037e60c08a8053efb9e48aa9c0).
For your review, thanks.
Albert
Signed-off-by: Albert Lee <albertcc@tw.ibm.com>
=====
--- irq-pio/include/linux/libata.h 2006-01-20 10:35:11.000000000 +0800
+++ eh_timed_out/include/linux/libata.h 2006-01-20 10:48:29.000000000 +0800
@@ -449,6 +449,7 @@ extern void ata_host_set_remove(struct a
extern int ata_scsi_detect(struct scsi_host_template *sht);
extern int ata_scsi_ioctl(struct scsi_device *dev, int cmd, void __user *arg);
extern int ata_scsi_queuecmd(struct scsi_cmnd *cmd, void (*done)(struct scsi_cmnd *));
+extern enum scsi_eh_timer_return ata_scsi_timed_out(struct scsi_cmnd *);
extern int ata_scsi_error(struct Scsi_Host *host);
extern int ata_scsi_release(struct Scsi_Host *host);
extern unsigned int ata_host_intr(struct ata_port *ap, struct ata_queued_cmd *qc);
--- irq-pio/drivers/scsi/libata-scsi.c 2006-01-20 10:35:05.000000000 +0800
+++ eh_timed_out/drivers/scsi/libata-scsi.c 2006-01-20 22:13:14.000000000 +0800
@@ -717,6 +717,59 @@ int ata_scsi_slave_config(struct scsi_de
}
/**
+ * ata_scsi_timed_out - SCSI layer time out handler callback
+ * @cmd: SCSI cmd on which time out occurred
+ *
+ * Handles SCSI-layer-thrown time out events.
+ *
+ * RETURNS:
+ * EH_HANDLED if qc was completed
+ * EH_RESET_TIMER for polling case
+ * EH_NOT_HANDLED otherwise.
+ */
+enum scsi_eh_timer_return ata_scsi_timed_out(struct scsi_cmnd *cmd)
+{
+ struct scsi_device *scsidev = cmd->device;
+ struct ata_port *ap = (struct ata_port *) &scsidev->host->hostdata[0];
+ struct ata_host_set *host_set = ap->host_set;
+ struct ata_queued_cmd *qc;
+ unsigned long flags;
+ int rc = EH_NOT_HANDLED;
+
+ DPRINTK("ENTER\n");
+ spin_lock_irqsave(&host_set->lock, flags);
+
+ qc = ata_qc_from_tag(ap, ap->active_tag);
+ if (!qc) {
+ /* ata_poll_qc_complete() or ata_interrupt() win the race.
+ */
+ printk(KERN_ERR "ata%u: timeout without qc\n", ap->id);
+ rc = EH_HANDLED;
+ goto out;
+ }
+
+ /* check if we are handling the right qc */
+ assert(qc->scsicmd == cmd);
+ assert(qc->flags & ATA_QCFLAG_ACTIVE);
+
+ if (qc->tf.flags & ATA_TFLAG_POLLING) {
+ /* The PIO polling thread has its own timeout handling.
+ * It won't loop forever, even if it takes a long time.
+ * We can wait the PIO polling thread to finish its work.
+ * No need to stop the polling thread from the timeout handler.
+ * Just reset the timer here.
+ */
+ printk(KERN_WARNING "ata%u: reset scsi timer for PIO polling\n", ap->id);
+ cmd->retries = 0;
+ rc = EH_RESET_TIMER;
+ }
+out:
+ spin_unlock_irqrestore(&host_set->lock, flags);
+ DPRINTK("EXIT rc %d\n", rc);
+ return rc;
+}
+
+/**
* ata_scsi_error - SCSI layer error handler callback
* @host: SCSI host on which error occurred
*
--- irq-pio/drivers/scsi/libata-core.c 2006-01-20 10:35:05.000000000 +0800
+++ eh_timed_out/drivers/scsi/libata-core.c 2006-01-20 10:48:44.000000000 +0800
@@ -5434,6 +5434,7 @@ EXPORT_SYMBOL_GPL(ata_port_disable);
EXPORT_SYMBOL_GPL(ata_ratelimit);
EXPORT_SYMBOL_GPL(ata_scsi_ioctl);
EXPORT_SYMBOL_GPL(ata_scsi_queuecmd);
+EXPORT_SYMBOL_GPL(ata_scsi_timed_out);
EXPORT_SYMBOL_GPL(ata_scsi_error);
EXPORT_SYMBOL_GPL(ata_scsi_slave_config);
EXPORT_SYMBOL_GPL(ata_scsi_release);
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH/RFC] libata-dev: update timer for PIO polling - revised
2006-01-20 14:19 ` [PATCH/RFC] libata-dev: update timer for PIO polling - revised Albert Lee
@ 2006-01-21 1:05 ` Tejun Heo
0 siblings, 0 replies; 7+ messages in thread
From: Tejun Heo @ 2006-01-21 1:05 UTC (permalink / raw)
To: Albert Lee; +Cc: Jeff Garzik, Ralf Müller, Doug Maxey, Linux IDE
Albert Lee wrote:
> Jeff,
>
> Revised per your advice to use the eh_timed_out() handler.
>
> Description:
> In the bug reported by Ralf Müller:
> (http://marc.theaimsgroup.com/?l=linux-ide&m=113629906809278&w=2)
>
> When the scsi command times out, the qc is completed by the time
> out handler. This might cause null pointer dereference if
> the qc is running in the PIO polling thread.
>
> Changes:
> Add ata_scsi_timed_out() to libata-scsi.
> If the qc times out when running in PIO mode, EH_RESET_TIMER is returned.
> Otherwise, EH_NOT_HANDLED is returned and the eh_strategy_handler()
> is triggered as before.
>
> (Will add ata_scsi_timed_out() to LLDs later if this is ok.)
>
> Patch against the libata-dev irq-pio branch
> (000080c3499cd5037e60c08a8053efb9e48aa9c0).
>
> For your review, thanks.
>
> Albert
> Signed-off-by: Albert Lee <albertcc@tw.ibm.com>
>
> =====
>
> --- irq-pio/include/linux/libata.h 2006-01-20 10:35:11.000000000 +0800
> +++ eh_timed_out/include/linux/libata.h 2006-01-20 10:48:29.000000000 +0800
> @@ -449,6 +449,7 @@ extern void ata_host_set_remove(struct a
> extern int ata_scsi_detect(struct scsi_host_template *sht);
> extern int ata_scsi_ioctl(struct scsi_device *dev, int cmd, void __user *arg);
> extern int ata_scsi_queuecmd(struct scsi_cmnd *cmd, void (*done)(struct scsi_cmnd *));
> +extern enum scsi_eh_timer_return ata_scsi_timed_out(struct scsi_cmnd *);
> extern int ata_scsi_error(struct Scsi_Host *host);
> extern int ata_scsi_release(struct Scsi_Host *host);
> extern unsigned int ata_host_intr(struct ata_port *ap, struct ata_queued_cmd *qc);
> --- irq-pio/drivers/scsi/libata-scsi.c 2006-01-20 10:35:05.000000000 +0800
> +++ eh_timed_out/drivers/scsi/libata-scsi.c 2006-01-20 22:13:14.000000000 +0800
> @@ -717,6 +717,59 @@ int ata_scsi_slave_config(struct scsi_de
> }
>
> /**
> + * ata_scsi_timed_out - SCSI layer time out handler callback
> + * @cmd: SCSI cmd on which time out occurred
> + *
> + * Handles SCSI-layer-thrown time out events.
> + *
> + * RETURNS:
> + * EH_HANDLED if qc was completed
> + * EH_RESET_TIMER for polling case
> + * EH_NOT_HANDLED otherwise.
> + */
> +enum scsi_eh_timer_return ata_scsi_timed_out(struct scsi_cmnd *cmd)
> +{
> + struct scsi_device *scsidev = cmd->device;
> + struct ata_port *ap = (struct ata_port *) &scsidev->host->hostdata[0];
> + struct ata_host_set *host_set = ap->host_set;
> + struct ata_queued_cmd *qc;
> + unsigned long flags;
> + int rc = EH_NOT_HANDLED;
> +
> + DPRINTK("ENTER\n");
> + spin_lock_irqsave(&host_set->lock, flags);
> +
> + qc = ata_qc_from_tag(ap, ap->active_tag);
> + if (!qc) {
> + /* ata_poll_qc_complete() or ata_interrupt() win the race.
> + */
> + printk(KERN_ERR "ata%u: timeout without qc\n", ap->id);
> + rc = EH_HANDLED;
> + goto out;
> + }
> +
> + /* check if we are handling the right qc */
> + assert(qc->scsicmd == cmd);
> + assert(qc->flags & ATA_QCFLAG_ACTIVE);
> +
> + if (qc->tf.flags & ATA_TFLAG_POLLING) {
> + /* The PIO polling thread has its own timeout handling.
> + * It won't loop forever, even if it takes a long time.
> + * We can wait the PIO polling thread to finish its work.
> + * No need to stop the polling thread from the timeout handler.
> + * Just reset the timer here.
> + */
> + printk(KERN_WARNING "ata%u: reset scsi timer for PIO polling\n", ap->id);
> + cmd->retries = 0;
> + rc = EH_RESET_TIMER;
> + }
> +out:
> + spin_unlock_irqrestore(&host_set->lock, flags);
> + DPRINTK("EXIT rc %d\n", rc);
> + return rc;
> +}
> +
> +/**
> * ata_scsi_error - SCSI layer error handler callback
> * @host: SCSI host on which error occurred
> *
> --- irq-pio/drivers/scsi/libata-core.c 2006-01-20 10:35:05.000000000 +0800
> +++ eh_timed_out/drivers/scsi/libata-core.c 2006-01-20 10:48:44.000000000 +0800
> @@ -5434,6 +5434,7 @@ EXPORT_SYMBOL_GPL(ata_port_disable);
> EXPORT_SYMBOL_GPL(ata_ratelimit);
> EXPORT_SYMBOL_GPL(ata_scsi_ioctl);
> EXPORT_SYMBOL_GPL(ata_scsi_queuecmd);
> +EXPORT_SYMBOL_GPL(ata_scsi_timed_out);
> EXPORT_SYMBOL_GPL(ata_scsi_error);
> EXPORT_SYMBOL_GPL(ata_scsi_slave_config);
> EXPORT_SYMBOL_GPL(ata_scsi_release);
>
Albert, Jeff, can you please hold off this for a while? I currently
have a lot of pending EH changes (reset, revalidation, reconfiguration,
etc...). I'm in the process of splitting the changes and throwing out
unnecessary ones.
One of the changes is ata_flush_pio_task() which waits for the pio tasks
to finish. This function is called by error handler. This achieves
about the same thing, but in sligthly different manner.
* Doesn't use eh_timed_out. I think this callback interface is somewhat
weird and it's also slightly broken in that commands can be completed
before timer is re-added. It's not fatal but the command will have to
wait until the next timeout.
* ata_flush_pio_task() uses a flag to tell pio tasks to exit early. All
pio tasks have to do is check whether a flag is set before rescheduling
itself.
Sorry about asking to hold off after a lot is done. I've been lazy with
mailing list for a week or so and am catching up now.
--
tejun
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2006-01-21 1:05 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-01-03 14:32 Kernel Oops with 2.6.15 Ralf Müller
2006-01-04 3:10 ` Albert Lee
2006-01-04 9:29 ` Ralf Müller
2006-01-06 3:41 ` [PATCH/RFC] libata-dev: update timeout for PIO polling Albert Lee
2006-01-18 0:52 ` Jeff Garzik
2006-01-20 14:19 ` [PATCH/RFC] libata-dev: update timer for PIO polling - revised Albert Lee
2006-01-21 1:05 ` Tejun Heo
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).