linux-ide.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Brad Campbell <brad@wasp.net.au>
To: linux-ide@vger.kernel.org
Subject: Re: sata media errors not propagated? (was Re: sata_via issue)
Date: Tue, 17 Aug 2004 22:57:00 +0400	[thread overview]
Message-ID: <412254FC.2010207@wasp.net.au> (raw)
In-Reply-To: <4122350F.3040700@wasp.net.au>

Brad Campbell wrote:
> About here, scsi passes a command down to libata, ata_queued_cmd which 
> causes the next 3 messages.
> libata never errors out here and completes the request so it all falls 
> in a heap.
> I'm going out to dinner, I'll see if I can trace it further later tonight.
> 
>> ATA: abnormal status 0x59 on port 0xB807
>> ATA: abnormal status 0x59 on port 0xB807
>> ATA: abnormal status 0x59 on port 0xB807

OOOooooo it's timing sensitive.
I sprinkled some printk's around and now scsi_request_fn is stuck in an infinite loop as this 
evaluates as true. In case you are wondering where all the 12345 and the like come from, I'm 
including my scsi_request_fn with it's added printk's

                 if (!scsi_host_queue_ready(q, shost, sdev))
                         goto not_ready;

Aug 17 22:32:54 srv kernel: 123456789ABata_queued_cmd
Aug 17 22:32:54 srv kernel: ata_qc_new_init
Aug 17 22:32:54 srv kernel: ata_qc_new
Aug 17 22:32:54 srv kernel: ata_qc_new out
Aug 17 22:32:54 srv kernel: ata_qc_new_init out
Aug 17 22:32:54 srv kernel:  ata_qc_issue
Aug 17 22:32:54 srv kernel: ata_qc_issue_out
Aug 17 22:32:54 srv kernel: ata_qc_issue_prot
Aug 17 22:32:54 srv kernel: ata_qc_issue_prot out
Aug 17 22:32:54 srv kernel: CD3EHI
Aug 17 22:33:24 srv kernel: ata13: command 0x25 timeout, stat 0xd0 host_stat 0x1
Aug 17 22:33:24 srv kernel: ata_qc_complete
Aug 17 22:33:24 srv kernel: ata_scsi_qc_complete
Aug 17 22:33:24 srv kernel: scsi12: ERROR on channel 0, id 0, lun 0, CDB: Read (10) 00 00 00 00 00 
00 00 80 00
Aug 17 22:33:24 srv kernel: Current sda: sense key Medium Error
Aug 17 22:33:24 srv kernel: Additional sense: Unrecovered read error - auto reallocate failed
Aug 17 22:33:24 srv kernel: end_request: I/O error, dev sda, sector 0
Aug 17 22:33:24 srv kernel: Buffer I/O error on device sda, logical block 0
Aug 17 22:33:24 srv kernel: 123456FGHI
Aug 17 22:33:24 srv kernel: 123456FGHI
Aug 17 22:33:24 srv kernel: ata_qc_complete_out
Aug 17 22:33:24 srv kernel: 123456FGHI
Aug 17 22:33:24 srv kernel: 123456FGHI
This last line loops forever and prevent me from doing anything at all on the console.
Notice no abnormal status messages anymore.

Progress! Without the extra printk's in libata-core and libata-scsi this was how it went

Aug 17 22:01:10 srv kernel: 123456789ABata_queued_cmd
Aug 17 22:01:10 srv kernel: CD3EHI
Aug 17 22:01:40 srv kernel: ata13: command 0x25 timeout, stat 0xd0 host_stat 0x1
Aug 17 22:01:40 srv kernel: ata_scsi_qc_complete
Aug 17 22:01:40 srv kernel: scsi12: ERROR on channel 0, id 0, lun 0, CDB: Read (10) 00 00 00 00 80 
00 00 80 00
Aug 17 22:01:40 srv kernel: Current sda: sense key Medium Error
Aug 17 22:01:41 srv kernel: Additional sense: Unrecovered read error - auto reallocate failed
Aug 17 22:01:41 srv kernel: end_request: I/O error, dev sda, sector 128
Aug 17 22:01:41 srv kernel: Buffer I/O error on device sda, logical block 128
Aug 17 22:01:41 srv kernel: 123456FGHI
Aug 17 22:01:41 srv kernel: 123456FGHI
Aug 17 22:01:41 srv kernel: 123456789ABata_queued_cmd
Aug 17 22:01:41 srv kernel: ATA: abnormal status 0xD0 on port 0x8807
Aug 17 22:01:41 srv last message repeated 2 times
Aug 17 22:01:41 srv kernel: CD3EHI
This just sits here idle with dd in the D state.

dd's call trace for the above log.

Aug 17 20:00:52 srv kernel: dd            D DFCA4C88     0  1681   1585                     (NOTLB)
Aug 17 20:00:52 srv kernel: dce31dfc 00000086 dce31e34 dfca4c88 c01048b4 78601dca 00000034 dfaa1790
Aug 17 20:00:52 srv kernel:        000d6653 78603c1f 00000034 0000031d 7860af72 00000034 dcccea68 
c130cd60
Aug 17 20:00:52 srv kernel:        c1400588 dce31e34 dce31e04 c02fd98e dfca4c88 c01310f5 c130cd60 
00000000
Aug 17 20:00:52 srv kernel: Call Trace:
Aug 17 20:00:52 srv kernel:  [<c01048b4>] common_interrupt+0x18/0x20
Aug 17 20:00:52 srv kernel:  [<c02fd98e>] io_schedule+0xe/0x20
Aug 17 20:00:52 srv kernel:  [<c01310f5>] __lock_page+0xc5/0xf0
Aug 17 20:00:52 srv kernel:  [<c0130e40>] page_wake_function+0x0/0x50
Aug 17 20:00:52 srv kernel:  [<c0130e40>] page_wake_function+0x0/0x50
Aug 17 20:00:52 srv kernel:  [<c01318da>] do_generic_mapping_read+0x3ca/0x480
Aug 17 20:00:52 srv kernel:  [<c0131c47>] __generic_file_aio_read+0x1d7/0x210
Aug 17 20:00:52 srv kernel:  [<c0131990>] file_read_actor+0x0/0xe0
Aug 17 20:00:52 srv kernel:  [<c0131d5f>] generic_file_read+0x6f/0x90
Aug 17 20:00:52 srv kernel:  [<c01154eb>] scheduler_tick+0x1b/0x400
Aug 17 20:00:52 srv kernel:  [<c012060f>] do_timer+0xcf/0xe0
Aug 17 20:00:52 srv kernel:  [<c014c5b0>] vfs_read+0xb0/0x100
Aug 17 20:00:52 srv kernel:  [<c014c7e7>] sys_read+0x47/0x80
Aug 17 20:00:52 srv kernel:  [<c0103f47>] syscall_call+0x7/0xb



static void scsi_request_fn(struct request_queue *q)
{
         struct scsi_device *sdev = q->queuedata;
         struct Scsi_Host *shost = sdev->host;
         struct scsi_cmnd *cmd;
         struct request *req;

         printk("1");
         if(!get_device(&sdev->sdev_gendev))
                 /* We must be tearing the block queue down already */
                 return;

         /*
          * To start with, we keep looping until the queue is empty, or until
          * the host is no longer able to accept any more requests.
          */
         printk("2");
         while (!blk_queue_plugged(q)) {
                 int rtn;
                 /*
                  * get next queueable request.  We do this early to make sure
                  * that the request is fully prepared even if we cannot
                  * accept it.
                  */
                 printk("3");
                 req = elv_next_request(q);
                 if (!req || !scsi_dev_queue_ready(q, sdev))
                         break;
                 printk("4");
                 if (unlikely(!scsi_device_online(sdev))) {
                         printk(KERN_ERR "scsi%d (%d:%d): rejecting I/O to offline device\n",
                                sdev->host->host_no, sdev->id, sdev->lun);
                         blkdev_dequeue_request(req);
                         req->flags |= REQ_QUIET;
                         while (end_that_request_first(req, 0, req->nr_sectors))
                                 ;
                         end_that_request_last(req);
                         continue;
                 }


                 /*
                  * Remove the request from the request list.
                  */
                 if (!(blk_queue_tagged(q) && !blk_queue_start_tag(q, req)))
                         blkdev_dequeue_request(req);
                 sdev->device_busy++;
                 printk("5");
                 spin_unlock(q->queue_lock);
                 spin_lock(shost->host_lock);
                 printk("6");
                 if (!scsi_host_queue_ready(q, shost, sdev))
                         goto not_ready;
                 printk("7");
                 if (sdev->single_lun) {
                         if (sdev->sdev_target->starget_sdev_user &&
                             sdev->sdev_target->starget_sdev_user != sdev)
                                 goto not_ready;
                         sdev->sdev_target->starget_sdev_user = sdev;
                 }
                 shost->host_busy++;
                 printk("8");

                 /*
                  * XXX(hch): This is rather suboptimal, scsi_dispatch_cmd will
                  *              take the lock again.
                  */
                 spin_unlock_irq(shost->host_lock);
                 printk("9");

                 cmd = req->special;
                 if (unlikely(cmd == NULL)) {
                         printk(KERN_CRIT "impossible request in %s.\n"
                                          "please mail a stack trace to "
                                          "linux-scsi@vger.kernel.org",
                                          __FUNCTION__);
                         BUG();
                 }
                 printk("A");

                 /*
                  * Finally, initialize any error handling parameters, and set up
                  * the timers for timeouts.
                  */
                 scsi_init_cmd_errh(cmd);
                 printk("B");

                 /*
                  * Dispatch the command to the low-level driver.
                  */
                 rtn = scsi_dispatch_cmd(cmd);
                 spin_lock_irq(q->queue_lock);
                 printk("C");
                 if(rtn) {
                         /* we're refusing the command; because of
                          * the way locks get dropped, we need to
                          * check here if plugging is required */
                         if(sdev->device_busy == 0)
                                 blk_plug_device(q);

                         break;
                 }
                 printk("D");
         }

                 printk("E");
         goto out;

  not_ready:
         spin_unlock_irq(shost->host_lock);
                 printk("F");

         /*
          * lock q, handle tag, requeue req, and decrement device_busy. We
          * must return with queue_lock held.
          *
          * Decrementing device_busy without checking it is OK, as all such
          * cases (host limits or settings) should run the queue at some
          * later time.
          */
         spin_lock_irq(q->queue_lock);
         blk_requeue_request(q, req);
         sdev->device_busy--;
         if(sdev->device_busy == 0)
                 blk_plug_device(q);
                 printk("G");
  out:
         /* must be careful here...if we trigger the ->remove() function
          * we cannot be holding the q lock */
                 printk("H");
         spin_unlock_irq(q->queue_lock);
         put_device(&sdev->sdev_gendev);
         spin_lock_irq(q->queue_lock);
                 printk("I\n");
}


Does anyone get the idea I'm stumbling around in the dark ? ;p)

Regards,
Brad

      reply	other threads:[~2004-08-17 18:56 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2004-08-17 11:17 sata media errors not propagated? (was Re: sata_via issue) Kevin Shanahan
2004-08-17 12:38 ` Brad Campbell
2004-08-17 16:40 ` Brad Campbell
2004-08-17 18:57   ` Brad Campbell [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=412254FC.2010207@wasp.net.au \
    --to=brad@wasp.net.au \
    --cc=linux-ide@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).