linux-ide.vger.kernel.org archive mirror
* Instability
@ 2005-04-19 14:50 Frank Henkel
  2005-04-20  2:30 ` Instability Albert Lee
  0 siblings, 1 reply; 2+ messages in thread
From: Frank Henkel @ 2005-04-19 14:50 UTC
  To: Jeff Garzik; +Cc: linux-ide

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Dear Jeff,

we have a stability problem under x86_64 Linux using the 
sata_sil driver for the SiI 3512A dual port SATA onboard
controller (BIOS Version 4.3.47) of the MSI MS-9145 Dual 
Opteron (So940) MoBo.

The proprietary drivers from SiI don't match our kernel
versions, so we rely on the sata_sil alternative.

Linux distros I have tested:

Distro                        Kernel                     sata_sil Version
- -------------------------------------------------------------------------
SuSE 9.3 x86_64               2.6.11.4-20a-smp           0.8
Scientific Linux CERN 3.0.4   2.4.21-27.0.2.EL.cernsmp   0.54
RHEL WS3 U4                   2.4.X (server crashed before information was saved)
- -------------------------------------------------------------------------

The problem is that a lot of console messages like

    ata1: status=0x51 { DriveReady SeekComplete Error }
    ata1: error=0x04 { DriveStatus Error }

appear when data is written to disk.  Then, suddenly, I/O
errors are reported and the system hangs.  In the case of
RHEL WS3 U4 I lost the complete installation, because
fsck could not repair all the errors in the filesystem.
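
For reference, the two register values in those messages can be decoded
by hand.  The following is only a minimal user-space sketch, assuming
the standard ATA bit layout (status bit 6 = DRDY, bit 4 = DSC, bit 0 =
ERR; error bit 2 = ABRT).  The macro and function names are made up for
this illustration and are not taken from the driver.

```c
#include <stddef.h>
#include <stdio.h>

/* ATA status register bits (names are hypothetical, values are standard) */
#define ST_BUSY 0x80
#define ST_DRDY 0x40
#define ST_DF   0x20
#define ST_DSC  0x10
#define ST_DRQ  0x08
#define ST_ERR  0x01

/* ATA error register, bit 2: command aborted (ABRT) */
#define ER_ABRT 0x04

/* Write the names of all set status bits into out, separated by spaces. */
char *decode_status(unsigned char st, char *out, size_t len)
{
	static const struct {
		unsigned char bit;
		const char *name;
	} tab[] = {
		{ ST_BUSY, "Busy" },
		{ ST_DRDY, "DriveReady" },
		{ ST_DF,   "DeviceFault" },
		{ ST_DSC,  "SeekComplete" },
		{ ST_DRQ,  "DataRequest" },
		{ ST_ERR,  "Error" },
	};
	size_t i, used = 0;

	if (len == 0)
		return out;
	out[0] = '\0';

	for (i = 0; i < sizeof(tab) / sizeof(tab[0]); i++) {
		int n;

		if (!(st & tab[i].bit))
			continue;
		n = snprintf(out + used, len - used, "%s%s",
			     used ? " " : "", tab[i].name);
		if (n < 0 || (size_t)n >= len - used)
			break;	/* output buffer too small, stop here */
		used += (size_t)n;
	}
	return out;
}

/* True if the device flagged an error and reported the command as aborted. */
int command_aborted(unsigned char status, unsigned char error)
{
	return (status & ST_ERR) && (error & ER_ABRT);
}
```

With status=0x51 and error=0x04, decode_status() yields the same three
flag names as the console message and command_aborted() returns true,
i.e. the drive rejected the command rather than reporting a media error.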

Do you know of any problems with this specific SATA controller?
Do you have a solution (2.4 kernel)?
Can I help you with more information to enhance the driver?

Thank you.
Best regards,
Frank
==================================================
Frank Henkel
Application Analyst
NEC High Performance Computing Europe GmbH, EHPCTC
Hessbruehlstr. 21B, 70565 Stuttgart, Germany
Tel: +49 711 78055 14      fhenkel@hpce.nec.com
Fax: +49 711 78055 25      http://www.hpce.nec.com
==================================================
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.1 (GNU/Linux)
Comment: Further Information: www.gnupg.org, www.gnupp.de, www.sicherheit-im-internet.de/themes/engl.phtml, http://hp.vector.co.jp/authors/VA019487

iD8DBQFCZRqhbC4eWe/BlrIRAkdhAJ4oRRetabT4d3eoMIqr9CaAScOFRgCgjvzx
UHq0/yn9gxTtsZzuvNpXP14=
=qC02
-----END PGP SIGNATURE-----



* Re: Instability
  2005-04-19 14:50 Instability Frank Henkel
@ 2005-04-20  2:30 ` Albert Lee
  0 siblings, 0 replies; 2+ messages in thread
From: Albert Lee @ 2005-04-20  2:30 UTC
  To: fhenkel; +Cc: Jeff Garzik, linux-ide

[-- Attachment #1: Type: text/plain, Size: 1746 bytes --]



Frank Henkel wrote:
> 
> Dear Jeff,
> 
> we have a stability problem under x86_64 Linux using the 
> sata_sil driver for the SiI 3512A dual port SATA onboard
> controller (BIOS Version 4.3.47) of the MSI MS-9145 Dual 
> Opteron (So940) MoBo.
> 
> The proprietary drivers from SiI don't match our kernel
> versions, so we rely on the sata_sil alternative.
> 
> Linux distros I have tested:
> 
> Distro                        Kernel                     sata_sil Version
> - -------------------------------------------------------------------------
> SuSE 9.3 x86_64               2.6.11.4-20a-smp           0.8
> Scientific Linux CERN 3.0.4   2.4.21-27.0.2.EL.cernsmp   0.54
> RHEL WS3 U4                   2.4.X (server crashed before information was saved)
> - -------------------------------------------------------------------------
> 
> The problem is that a lot of console messages like
> 
>     ata1: status=0x51 { DriveReady SeekComplete Error }
>     ata1: error=0x04 { DriveStatus Error }
> 
> appear when data is written to disk.  Then, suddenly, I/O
> errors are reported and the system hangs.  In the case of
> RHEL WS3 U4 I lost the complete installation, because
> fsck could not repair all the errors in the filesystem.
> 
> Do you know of any problems with this specific SATA controller?
> Do you have a solution (2.4 kernel)?
> Can I help you with more information to enhance the driver?
> 

Hi Frank,

Since you are running x86-64, I guess the problem might be similar to
the sg_dma_len() problem seen on ppc64:

http://marc.theaimsgroup.com/?l=linux-ide&m=111113103410355&w=2

Could you please try the attached patch?  Thanks.
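
To illustrate the suspected failure mode, here is a simplified
user-space model, not kernel code.  It assumes an architecture whose
scatterlist keeps a separate DMA length field that is filled in only
when the buffer is DMA-mapped, so a PIO path reading that field before
any mapping sees a stale value.  The struct, macro and function names
are hypothetical stand-ins invented for this sketch.

```c
/* Simplified model, NOT the kernel's struct scatterlist. */
struct model_sg {
	unsigned int length;     /* CPU-side buffer length, set by the caller */
	unsigned int dma_length; /* DMA mapping length, valid only once mapped */
};

/* Hypothetical stand-in for an sg_dma_len() that reads a separate field. */
#define model_sg_dma_len(sg) ((sg)->dma_length)

/* Set up a single-entry list for a buffer of buflen bytes. */
void model_sg_init_one(struct model_sg *sg, unsigned int buflen)
{
	sg->length = buflen;
	sg->dma_length = 0;	/* nothing has been mapped yet */
}

/* Model of the DMA mapping step: only now is dma_length meaningful. */
void model_dma_map(struct model_sg *sg)
{
	sg->dma_length = sg->length;
}

/*
 * Bytes a PIO transfer may still copy from this entry, starting at ofs.
 * use_dma_len selects the old behaviour (read the DMA length) versus the
 * patched behaviour (read the CPU-side length).
 */
unsigned int pio_bytes_left(const struct model_sg *sg, unsigned int ofs,
			    int use_dma_len)
{
	unsigned int len = use_dma_len ? model_sg_dma_len(sg) : sg->length;

	return ofs < len ? len - ofs : 0;
}
```

In this model an unmapped 512-byte buffer reports 0 bytes left when the
DMA length is consulted and 512 when sg->length is used, which is why
the patch switches the PIO paths to sg->length and sets sg_dma_len()
only after dma_map_single() has succeeded.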

Albert

[-- Attachment #2: sg_dma_len2.diff --]
[-- Type: text/plain, Size: 4351 bytes --]

--- linux-2.6.5-SLES9_SP2_BRANCH_20050418161416/drivers/scsi/libata-core.c.ori	2005-04-19 15:34:14.000000000 +0800
+++ linux-2.6.5-SLES9_SP2_BRANCH_20050418161416/drivers/scsi/libata-core.c	2005-04-19 16:39:25.000000000 +0800
@@ -1054,6 +1054,7 @@
 	}
 
 	qc->waiting = &wait;
+	qc->private_data = &status;
 	qc->complete_fn = ata_qc_complete_noop;
 
 	spin_lock_irqsave(&ap->host_set->lock, flags);
@@ -1065,7 +1066,6 @@
 	else
 		wait_for_completion(&wait);
 
-	status = ata_chk_status(ap);
 	if (status & ATA_ERR) {
 		/*
 		 * arg!  EDD works for all test cases, but seems to return
@@ -1918,6 +1918,7 @@
 	struct ata_queued_cmd *qc;
 	int rc;
 	unsigned long flags;
+	u8 status;
 
 	/* set up set-features taskfile */
 	DPRINTK("set features - xfer mode\n");
@@ -1932,6 +1933,7 @@
 	qc->tf.nsect = dev->xfer_mode;
 
 	qc->waiting = &wait;
+	qc->private_data = &status;
 	qc->complete_fn = ata_qc_complete_noop;
 
 	spin_lock_irqsave(&ap->host_set->lock, flags);
@@ -2071,7 +2073,7 @@
 	sg = qc->sg;
 	sg->page = virt_to_page(buf);
 	sg->offset = (unsigned long) buf & ~PAGE_MASK;
-	sg_dma_len(sg) = buflen;
+	sg->length = buflen;
 }
 
 void ata_sg_init(struct ata_queued_cmd *qc, struct scatterlist *sg,
@@ -2101,11 +2103,12 @@
 	dma_addr_t dma_address;
 
 	dma_address = dma_map_single(ap->host_set->dev, qc->buf_virt,
-				     sg_dma_len(sg), dir);
+				     sg->length, dir);
 	if (dma_mapping_error(dma_address))
 		return -1;
 
 	sg_dma_address(sg) = dma_address;
+	sg_dma_len(sg) = sg->length;
 
 	DPRINTK("mapped buffer of %d bytes for %s\n", sg_dma_len(sg),
 		qc->tf.flags & ATA_TFLAG_WRITE ? "write" : "read");
@@ -2310,7 +2313,7 @@
 	qc->cursect++;
 	qc->cursg_ofs++;
 
-	if ((qc->cursg_ofs * ATA_SECT_SIZE) == sg_dma_len(&sg[qc->cursg])) {
+	if ((qc->cursg_ofs * ATA_SECT_SIZE) == (&sg[qc->cursg])->length) {
 		qc->cursg++;
 		qc->cursg_ofs = 0;
 	}
@@ -2333,13 +2336,29 @@
 	unsigned char *buf;
 	unsigned int offset, count;
 
-	if (qc->curbytes == qc->nbytes - bytes)
+	if (qc->curbytes + bytes >= qc->nbytes)
 		ap->pio_task_state = PIO_ST_LAST;
 
 next_sg:
+	/* check whether qc->sg is full */
+	if (unlikely(qc->cursg >= qc->n_elem)) {
+		unsigned char pad_buf[2];
+		unsigned int words = (bytes+1) >> 1; /* pad to word boundary */
+		unsigned int i;
+
+		DPRINTK("ata%u: padding %u bytes\n", ap->id, bytes);
+
+		memset(&pad_buf, 0, sizeof(pad_buf));
+		for (i = 0; i < words; i++) {
+			ata_data_xfer(ap, pad_buf, sizeof(pad_buf), do_write);
+		}
+
+		ap->pio_task_state = PIO_ST_LAST;
+		return;
+	} 
+
 	sg = &qc->sg[qc->cursg];
 
-next_page:
 	page = sg->page;
 	offset = sg->offset + qc->cursg_ofs;
 
@@ -2347,18 +2366,25 @@
 	page = nth_page(page, (offset >> PAGE_SHIFT));
 	offset %= PAGE_SIZE;
 
-	count = min(sg_dma_len(sg) - qc->cursg_ofs, bytes);
+	/* don't overrun current sg */
+	count = min(sg->length - qc->cursg_ofs, bytes);
 
 	/* don't cross page boundaries */
 	count = min(count, (unsigned int)PAGE_SIZE - offset);
 
+	/* handle the odd condition */
+	if (unlikely(count & 0x01)) {
+		printk(KERN_WARNING "ata%u: odd count %u rounded: qc->nbytes %u, bytes %u\n", 
+		       ap->id, count, qc->nbytes, bytes);
+		count++;
+	}
+
 	buf = kmap(page) + offset;
 
-	bytes -= count;
 	qc->curbytes += count;
 	qc->cursg_ofs += count;
 
-	if (qc->cursg_ofs == sg_dma_len(sg)) {
+	if (qc->cursg_ofs >= sg->length) {
 		qc->cursg++;
 		qc->cursg_ofs = 0;
 	}
@@ -2370,9 +2396,9 @@
 
 	kunmap(page);
 
-	if (bytes) {
-		if (qc->cursg_ofs < sg_dma_len(sg))
-			goto next_page;
+	if (bytes > count) {
+		bytes -= count;
+
 		goto next_sg;
 	}
 }
@@ -2475,8 +2501,7 @@
 	assert(qc != NULL);
 
 	drv_stat = ata_chk_status(ap);
-	printk(KERN_WARNING "ata%u: PIO error, drv_stat 0x%x\n",
-	       ap->id, drv_stat);
+	DPRINTK("ata%u: PIO error, drv_stat 0x%x\n", ap->id, drv_stat);
 
 	ap->pio_task_state = PIO_ST_IDLE;
 
@@ -2527,6 +2552,7 @@
 	struct ata_queued_cmd *qc;
 	unsigned long flags;
 	int rc;
+	u8 status;
 
 	DPRINTK("ATAPI request sense\n");
 
@@ -2552,6 +2578,7 @@
 	qc->nbytes = SCSI_SENSE_BUFFERSIZE;
 
 	qc->waiting = &wait;
+	qc->private_data = &status;
 	qc->complete_fn = ata_qc_complete_noop;
 
 	spin_lock_irqsave(&ap->host_set->lock, flags);
@@ -2745,6 +2772,7 @@
 
 static int ata_qc_complete_noop(struct ata_queued_cmd *qc, u8 drv_stat)
 {
+	*((u8*)qc->private_data) = drv_stat;
 	return 0;
 }
 
