From: Brad Campbell <brad@wasp.net.au>
To: linux-ide@vger.kernel.org
Subject: Re: sata media errors not propagated? (was Re: sata_via issue)
Date: Tue, 17 Aug 2004 22:57:00 +0400 [thread overview]
Message-ID: <412254FC.2010207@wasp.net.au> (raw)
In-Reply-To: <4122350F.3040700@wasp.net.au>
Brad Campbell wrote:
> About here, scsi passes a command down to libata, ata_queued_cmd which
> causes the next 3 messages.
> libata never errors out here and completes the request so it all falls
> in a heap.
> I'm going out to dinner, I'll see if I can trace it further later tonight.
>
>> ATA: abnormal status 0x59 on port 0xB807
>> ATA: abnormal status 0x59 on port 0xB807
>> ATA: abnormal status 0x59 on port 0xB807
OOOooooo it's timing sensitive.
I sprinkled some printk's around and now scsi_request_fn is stuck in an infinite loop as this
evaluates as true. In case you are wondering where all the 12345 and the like come from, I'm
including my scsi_request_fn with it's added printk's
if (!scsi_host_queue_ready(q, shost, sdev))
goto not_ready;
Aug 17 22:32:54 srv kernel: 123456789ABata_queued_cmd
Aug 17 22:32:54 srv kernel: ata_qc_new_init
Aug 17 22:32:54 srv kernel: ata_qc_new
Aug 17 22:32:54 srv kernel: ata_qc_new out
Aug 17 22:32:54 srv kernel: ata_qc_new_init out
Aug 17 22:32:54 srv kernel: ata_qc_issue
Aug 17 22:32:54 srv kernel: ata_qc_issue_out
Aug 17 22:32:54 srv kernel: ata_qc_issue_prot
Aug 17 22:32:54 srv kernel: ata_qc_issue_prot out
Aug 17 22:32:54 srv kernel: CD3EHI
Aug 17 22:33:24 srv kernel: ata13: command 0x25 timeout, stat 0xd0 host_stat 0x1
Aug 17 22:33:24 srv kernel: ata_qc_complete
Aug 17 22:33:24 srv kernel: ata_scsi_qc_complete
Aug 17 22:33:24 srv kernel: scsi12: ERROR on channel 0, id 0, lun 0, CDB: Read (10) 00 00 00 00 00
00 00 80 00
Aug 17 22:33:24 srv kernel: Current sda: sense key Medium Error
Aug 17 22:33:24 srv kernel: Additional sense: Unrecovered read error - auto reallocate failed
Aug 17 22:33:24 srv kernel: end_request: I/O error, dev sda, sector 0
Aug 17 22:33:24 srv kernel: Buffer I/O error on device sda, logical block 0
Aug 17 22:33:24 srv kernel: 123456FGHI
Aug 17 22:33:24 srv kernel: 123456FGHI
Aug 17 22:33:24 srv kernel: ata_qc_complete_out
Aug 17 22:33:24 srv kernel: 123456FGHI
Aug 17 22:33:24 srv kernel: 123456FGHI
This last line loops forever and prevent me from doing anything at all on the console.
Notice no abnormal status messages anymore.
Progress! Without the extra printk's in libata-core and libata-scsi this was how it went
Aug 17 22:01:10 srv kernel: 123456789ABata_queued_cmd
Aug 17 22:01:10 srv kernel: CD3EHI
Aug 17 22:01:40 srv kernel: ata13: command 0x25 timeout, stat 0xd0 host_stat 0x1
Aug 17 22:01:40 srv kernel: ata_scsi_qc_complete
Aug 17 22:01:40 srv kernel: scsi12: ERROR on channel 0, id 0, lun 0, CDB: Read (10) 00 00 00 00 80
00 00 80 00
Aug 17 22:01:40 srv kernel: Current sda: sense key Medium Error
Aug 17 22:01:41 srv kernel: Additional sense: Unrecovered read error - auto reallocate failed
Aug 17 22:01:41 srv kernel: end_request: I/O error, dev sda, sector 128
Aug 17 22:01:41 srv kernel: Buffer I/O error on device sda, logical block 128
Aug 17 22:01:41 srv kernel: 123456FGHI
Aug 17 22:01:41 srv kernel: 123456FGHI
Aug 17 22:01:41 srv kernel: 123456789ABata_queued_cmd
Aug 17 22:01:41 srv kernel: ATA: abnormal status 0xD0 on port 0x8807
Aug 17 22:01:41 srv last message repeated 2 times
Aug 17 22:01:41 srv kernel: CD3EHI
This just sits here idle with dd in the D state.
dd's call trace for the above log.
Aug 17 20:00:52 srv kernel: dd D DFCA4C88 0 1681 1585 (NOTLB)
Aug 17 20:00:52 srv kernel: dce31dfc 00000086 dce31e34 dfca4c88 c01048b4 78601dca 00000034 dfaa1790
Aug 17 20:00:52 srv kernel: 000d6653 78603c1f 00000034 0000031d 7860af72 00000034 dcccea68
c130cd60
Aug 17 20:00:52 srv kernel: c1400588 dce31e34 dce31e04 c02fd98e dfca4c88 c01310f5 c130cd60
00000000
Aug 17 20:00:52 srv kernel: Call Trace:
Aug 17 20:00:52 srv kernel: [<c01048b4>] common_interrupt+0x18/0x20
Aug 17 20:00:52 srv kernel: [<c02fd98e>] io_schedule+0xe/0x20
Aug 17 20:00:52 srv kernel: [<c01310f5>] __lock_page+0xc5/0xf0
Aug 17 20:00:52 srv kernel: [<c0130e40>] page_wake_function+0x0/0x50
Aug 17 20:00:52 srv kernel: [<c0130e40>] page_wake_function+0x0/0x50
Aug 17 20:00:52 srv kernel: [<c01318da>] do_generic_mapping_read+0x3ca/0x480
Aug 17 20:00:52 srv kernel: [<c0131c47>] __generic_file_aio_read+0x1d7/0x210
Aug 17 20:00:52 srv kernel: [<c0131990>] file_read_actor+0x0/0xe0
Aug 17 20:00:52 srv kernel: [<c0131d5f>] generic_file_read+0x6f/0x90
Aug 17 20:00:52 srv kernel: [<c01154eb>] scheduler_tick+0x1b/0x400
Aug 17 20:00:52 srv kernel: [<c012060f>] do_timer+0xcf/0xe0
Aug 17 20:00:52 srv kernel: [<c014c5b0>] vfs_read+0xb0/0x100
Aug 17 20:00:52 srv kernel: [<c014c7e7>] sys_read+0x47/0x80
Aug 17 20:00:52 srv kernel: [<c0103f47>] syscall_call+0x7/0xb
static void scsi_request_fn(struct request_queue *q)
{
struct scsi_device *sdev = q->queuedata;
struct Scsi_Host *shost = sdev->host;
struct scsi_cmnd *cmd;
struct request *req;
printk("1");
if(!get_device(&sdev->sdev_gendev))
/* We must be tearing the block queue down already */
return;
/*
* To start with, we keep looping until the queue is empty, or until
* the host is no longer able to accept any more requests.
*/
printk("2");
while (!blk_queue_plugged(q)) {
int rtn;
/*
* get next queueable request. We do this early to make sure
* that the request is fully prepared even if we cannot
* accept it.
*/
printk("3");
req = elv_next_request(q);
if (!req || !scsi_dev_queue_ready(q, sdev))
break;
printk("4");
if (unlikely(!scsi_device_online(sdev))) {
printk(KERN_ERR "scsi%d (%d:%d): rejecting I/O to offline device\n",
sdev->host->host_no, sdev->id, sdev->lun);
blkdev_dequeue_request(req);
req->flags |= REQ_QUIET;
while (end_that_request_first(req, 0, req->nr_sectors))
;
end_that_request_last(req);
continue;
}
/*
* Remove the request from the request list.
*/
if (!(blk_queue_tagged(q) && !blk_queue_start_tag(q, req)))
blkdev_dequeue_request(req);
sdev->device_busy++;
printk("5");
spin_unlock(q->queue_lock);
spin_lock(shost->host_lock);
printk("6");
if (!scsi_host_queue_ready(q, shost, sdev))
goto not_ready;
printk("7");
if (sdev->single_lun) {
if (sdev->sdev_target->starget_sdev_user &&
sdev->sdev_target->starget_sdev_user != sdev)
goto not_ready;
sdev->sdev_target->starget_sdev_user = sdev;
}
shost->host_busy++;
printk("8");
/*
* XXX(hch): This is rather suboptimal, scsi_dispatch_cmd will
* take the lock again.
*/
spin_unlock_irq(shost->host_lock);
printk("9");
cmd = req->special;
if (unlikely(cmd == NULL)) {
printk(KERN_CRIT "impossible request in %s.\n"
"please mail a stack trace to "
"linux-scsi@vger.kernel.org",
__FUNCTION__);
BUG();
}
printk("A");
/*
* Finally, initialize any error handling parameters, and set up
* the timers for timeouts.
*/
scsi_init_cmd_errh(cmd);
printk("B");
/*
* Dispatch the command to the low-level driver.
*/
rtn = scsi_dispatch_cmd(cmd);
spin_lock_irq(q->queue_lock);
printk("C");
if(rtn) {
/* we're refusing the command; because of
* the way locks get dropped, we need to
* check here if plugging is required */
if(sdev->device_busy == 0)
blk_plug_device(q);
break;
}
printk("D");
}
printk("E");
goto out;
not_ready:
spin_unlock_irq(shost->host_lock);
printk("F");
/*
* lock q, handle tag, requeue req, and decrement device_busy. We
* must return with queue_lock held.
*
* Decrementing device_busy without checking it is OK, as all such
* cases (host limits or settings) should run the queue at some
* later time.
*/
spin_lock_irq(q->queue_lock);
blk_requeue_request(q, req);
sdev->device_busy--;
if(sdev->device_busy == 0)
blk_plug_device(q);
printk("G");
out:
/* must be careful here...if we trigger the ->remove() function
* we cannot be holding the q lock */
printk("H");
spin_unlock_irq(q->queue_lock);
put_device(&sdev->sdev_gendev);
spin_lock_irq(q->queue_lock);
printk("I\n");
}
Does anyone get the idea I'm stumbling around in the dark ? ;p)
Regards,
Brad
prev parent reply other threads:[~2004-08-17 18:56 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2004-08-17 11:17 sata media errors not propagated? (was Re: sata_via issue) Kevin Shanahan
2004-08-17 12:38 ` Brad Campbell
2004-08-17 16:40 ` Brad Campbell
2004-08-17 18:57 ` Brad Campbell [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=412254FC.2010207@wasp.net.au \
--to=brad@wasp.net.au \
--cc=linux-ide@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).