[patch]: ide dma timeout retry in pio

public inbox for linux-kernel@vger.kernel.org

* [patch]: ide dma timeout retry in pio
@ 2001-05-28 18:34 Jens Axboe
  2001-05-28 19:39 ` Mark Hahn
  0 siblings, 1 reply; 15+ messages in thread
From: Jens Axboe @ 2001-05-28 18:34 UTC (permalink / raw)
  To: Linux Kernel; +Cc: Andre M. Hedrick, Alan Cox

[-- Attachment #1: Type: text/plain, Size: 618 bytes --]

Hi,

We currently have the problem that ide dma may toss out a complete
request when we hit a dma timeout. In this case, what we really want to
do is retry the request in pio mode and revert to normal dma operations
again later.

This patch catches the dma timeout. It clears the dma engine, turns dma
off, sanity checks the request, and makes sure that the ide request
handler restarts the request (now in pio mode). When the first chunk of
the request is finished, return to dma mode. If the dma timeouts keep
happening, stay in pio mode.
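
As a rough illustration of the policy just described, here is a small standalone sketch. It is not kernel code: `struct toy_drive`, `toy_dma_timeout()` and `toy_chunk_done()` are hypothetical names; only the `DMA_PIO_RETRY` flag value and the limit of 3 are taken from the patch in this mail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DMA_PIO_RETRY   1   /* same flag value as in the patch */
#define MAX_DMA_RETRIES 3   /* the "random magic" limit from the patch */

/* Hypothetical stand-in for the few drive fields the patch touches. */
struct toy_drive {
	bool using_dma;           /* is DMA currently enabled? */
	unsigned char retry_pio;  /* number of DMA timeouts seen so far */
	unsigned char state;      /* DMA_PIO_RETRY while retrying in PIO */
};

/* A DMA transfer timed out: count it and fall back to PIO for the retry. */
static void toy_dma_timeout(struct toy_drive *d)
{
	assert(d != NULL);
	d->retry_pio++;
	d->state = DMA_PIO_RETRY;
	d->using_dma = false;
}

/* A chunk of the request finished: go back to DMA unless it keeps failing. */
static void toy_chunk_done(struct toy_drive *d)
{
	assert(d != NULL);
	if (d->state == DMA_PIO_RETRY && d->retry_pio <= MAX_DMA_RETRIES) {
		d->state = 0;
		d->using_dma = true;
	}
}
```

With this sketch, the first three timeouts each cost one chunk in PIO before DMA comes back; after a fourth timeout the toy drive stays in PIO.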

Patch is untested for obvious reasons, and is against 2.4.5-ac3.

-- 
Jens Axboe


[-- Attachment #2: ide-dma-timeout-1 --]
[-- Type: text/plain, Size: 3527 bytes --]

--- ../linux-2.4.5-ac3-clean/drivers/ide/ide.c	Mon May 28 20:28:05 2001
+++ drivers/ide/ide.c	Mon May 28 20:21:48 2001
@@ -543,10 +543,20 @@
 {
 	struct request *rq;
 	unsigned long flags;
+	ide_drive_t *drive = hwgroup->drive;
 
 	spin_lock_irqsave(&io_request_lock, flags);
 	rq = hwgroup->rq;
 
+	/*
+	 * decide whether to reenable DMA -- 3 is a random magic for now,
+	 * if we DMA timeout more than 3 times, just stay in PIO
+	 */
+	if (drive->state == DMA_PIO_RETRY && drive->retry_pio <= 3) {
+		drive->state = 0;
+		hwgroup->hwif->dmaproc(ide_dma_on, drive);
+	}
+
 	if (!end_that_request_first(rq, uptodate, hwgroup->drive->name)) {
 		add_blkdev_randomness(MAJOR(rq->rq_dev));
 		blkdev_dequeue_request(rq);
@@ -1419,6 +1429,49 @@
 }
 
 /*
+ * un-busy the hwgroup etc, and clear any pending DMA status. we want to
+ * retry the current request in pio mode instead of risking tossing it
+ * all away
+ */
+void ide_dma_timeout_retry(ide_drive_t *drive)
+{
+	ide_hwif_t *hwif = HWIF(drive);
+	struct request *rq;
+
+	/*
+	 * end current dma transaction
+	 */
+	(void) hwif->dmaproc(ide_dma_end, drive);
+
+	/*
+	 * complain a little, later we might remove some of this verbosity
+	 */
+	printk("%s: timeout waiting for DMA\n", drive->name);
+	(void) hwif->dmaproc(ide_dma_timeout, drive);
+
+	/*
+	 * disable dma for now, but remember that we did so because of
+	 * a timeout -- we'll reenable after we finish this next request
+	 * (or rather the first chunk of it) in pio.
+	 */
+	drive->retry_pio++;
+	drive->state = DMA_PIO_RETRY;
+	(void) hwif->dmaproc(ide_dma_off_quietly, drive);
+
+	/*
+	 * un-busy drive etc (hwgroup->busy is cleared on return) and
+	 * make sure request is sane
+	 */
+	rq = HWGROUP(drive)->rq;
+	HWGROUP(drive)->rq = NULL;
+
+	rq->errors = 0;
+	rq->sector = rq->bh->b_rsector;
+	rq->current_nr_sectors = rq->bh->b_size >> 9;
+	rq->buffer = rq->bh->b_data;
+}
+
+/*
  * ide_timer_expiry() is our timeout function for all drive operations.
  * But note that it can also be invoked as a result of a "sleep" operation
  * triggered by the mod_timer() call in ide_do_request.
@@ -1491,11 +1544,10 @@
 				startstop = handler(drive);
 			} else {
 				if (drive->waiting_for_dma) {
-					(void) hwgroup->hwif->dmaproc(ide_dma_end, drive);
-					printk("%s: timeout waiting for DMA\n", drive->name);
-					(void) hwgroup->hwif->dmaproc(ide_dma_timeout, drive);
-				}
-				startstop = ide_error(drive, "irq timeout", GET_STAT());
+					startstop = ide_stopped;
+					ide_dma_timeout_retry(drive);
+				} else
+					startstop = ide_error(drive, "irq timeout", GET_STAT());
 			}
 			set_recovery_timer(hwif);
 			drive->service_time = jiffies - drive->service_start;
--- ../linux-2.4.5-ac3-clean/include/linux/ide.h	Mon May 28 20:28:13 2001
+++ include/linux/ide.h	Mon May 28 20:21:18 2001
@@ -87,6 +87,11 @@
 #define ERROR_RECAL	1	/* Recalibrate every 2nd retry */
 
 /*
+ * state flags
+ */
+#define DMA_PIO_RETRY	1	/* retrying in PIO */
+
+/*
  * Ensure that various configuration flags have compatible settings
  */
 #ifdef REALLY_SLOW_IO
@@ -299,6 +304,8 @@
 	special_t	special;	/* special action flags */
 	byte     keep_settings;		/* restore settings after drive reset */
 	byte     using_dma;		/* disk is using dma for read/write */
+	byte	 retry_pio;		/* retrying dma capable host in pio */
+	byte	 state;			/* retry state */
 	byte     waiting_for_dma;	/* dma currently in progress */
 	byte     unmask;		/* flag: okay to unmask other irqs */
 	byte     slow;			/* flag: slow data port */


* Re: [patch]: ide dma timeout retry in pio
  2001-05-28 18:34 [patch]: ide dma timeout retry in pio Jens Axboe
@ 2001-05-28 19:39 ` Mark Hahn
  2001-05-28 20:13   ` Christopher B. Liebman
                     ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Mark Hahn @ 2001-05-28 19:39 UTC (permalink / raw)
  To: Jens Axboe; +Cc: andre, alan, linux-kernel

> request, when we hit a dma timeout. In this case, what we really want to
> do is retry the request in pio mode and revert to normal dma operations
> later again.

really?  do we know the nature of the DMA engine problem well enough?
is there a reason to believe that it'll work better "later"?
I guess I was surprised at resorting to PIO - couldn't we just
break the request up into smaller chunks, still using DMA?

I seem to recall Andre saying that the problem arises when the 
ide DMA engine loses PCI arbitration during a burst.  shorter
bursts would seem like the best workaround if this is the problem...

resorting to PIO would be such a shame, not only because it eats
CPU so badly, but also because it has no checksum like UDMA...

thanks, mark hahn.


* RE: [patch]: ide dma timeout retry in pio
  2001-05-28 19:39 ` Mark Hahn
@ 2001-05-28 20:13   ` Christopher B. Liebman
  2001-05-28 20:37   ` Jens Axboe
  2001-05-28 21:12   ` Alan Cox
  2 siblings, 0 replies; 15+ messages in thread
From: Christopher B. Liebman @ 2001-05-28 20:13 UTC (permalink / raw)
  To: Jens Axboe, Mark Hahn
  Cc: Acpi@Phobos. Fachschaften. Tu-Muenchen. De, linux-kernel, alan,
	andre

I think that this may be an issue with ACPI processor power saving...  I
have documented issues with ide DMA timeouts when the processor is put into
the C3 power state.  One of the things that happens in this state is that
bus master arbitration is *disabled*.....  bus master activity is
*supposed* to transition the system back to a C0 power state.  I'll bet
there are some issues with the Linux IDE dma and disabling bus master
arbitration......  ideas?  thoughts?  patches? ;-)

	-- Chris

> -----Original Message-----
> From: linux-kernel-owner@vger.kernel.org
> [mailto:linux-kernel-owner@vger.kernel.org]On Behalf Of Mark Hahn
>
> I seem to recall Andre saying that the problem arises when the
> ide DMA engine loses PCI arbitration during a burst.  shorter
> bursts would seem like the best workaround if this is the problem...
>


* Re: [patch]: ide dma timeout retry in pio
  2001-05-28 19:39 ` Mark Hahn
  2001-05-28 20:13   ` Christopher B. Liebman
@ 2001-05-28 20:37   ` Jens Axboe
  2001-05-28 22:15     ` Andre Hedrick
  2001-05-28 21:12   ` Alan Cox
  2 siblings, 1 reply; 15+ messages in thread
From: Jens Axboe @ 2001-05-28 20:37 UTC (permalink / raw)
  To: Mark Hahn; +Cc: andre, alan, linux-kernel

On Mon, May 28 2001, Mark Hahn wrote:
> > request, when we hit a dma timeout. In this case, what we really want to
> > do is retry the request in pio mode and revert to normal dma operations
> > later again.
> 
> really?  do we know the nature of the DMA engine problem well enough?
> is there a reason to believe that it'll work better "later"?
> I guess I was surprised at resorting to PIO - couldn't we just
> break the request up into smaller chunks, still using DMA?

That is indeed possible, it will require some surgery to the dma request
path though. IDE has no concept of doing part of a request for dma
currently, it's an all-or-nothing approach. That's why it falls back to
pio right now.

> I seem to recall Andre saying that the problem arises when the 
> ide DMA engine loses PCI arbitration during a burst.  shorter
> bursts would seem like the best workaround if this is the problem...

It's worth a shot. My patch was not meant as the end-all solution,
however we need something _now_. Losing sectors is not funny.
Dynamically limiting general request size to make dma work is a
piece of cake, that'll be about a one-liner addition to the current
patch. So the logic could be something of the order of:

	- 1st dma timeout
	- scale max size down from 128kB (127.5kB really) to half that
	...
	- things aren't working, 2nd dma timeout. Scale down to 32kB.

and so forth, revert to pio and reset full size if it's really no good.
If limiting transfer sizes solves the problem, this would be the way to
go. I'll do another version that does this.
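
The scale-down logic outlined above could look roughly like the following standalone sketch. None of this is from the patch: `struct toy_limit`, `toy_on_dma_timeout()` and the 8kB floor are assumptions made for illustration; the mail only says "and so forth" before giving up and reverting to pio.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_KB 128  /* full request size from the mail (127.5kB really) */
#define TOY_MIN_KB 8    /* hypothetical floor; the mail does not name one */

struct toy_limit {
	unsigned int max_kb;  /* current cap on request size, in kB */
	bool pio;             /* true once we have given up on dma */
};

/* Halve the cap on each dma timeout; once at the floor, revert to pio. */
static void toy_on_dma_timeout(struct toy_limit *l)
{
	assert(l != NULL);
	if (l->pio)
		return;
	if (l->max_kb / 2 >= TOY_MIN_KB) {
		l->max_kb /= 2;
	} else {
		l->pio = true;
		l->max_kb = TOY_MAX_KB;  /* "reset full size" */
	}
}
```

Starting from 128kB, successive timeouts in this sketch give 64kB, 32kB, 16kB and 8kB, and the next one switches to pio with the full size restored.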

Testers? Who has frequent ide dma timeout problems??

> resorting to PIO would be such a shame, not only because it eats
> CPU so badly, but also because it has no checksum like UDMA...

Look at the patch -- we resort to pio for _one_ hunk. That's 8 sectors
tops, then back to dma. Hardly a big issue.

-- 
Jens Axboe


* Re: [patch]: ide dma timeout retry in pio
  2001-05-28 20:37   ` Jens Axboe
@ 2001-05-28 22:15     ` Andre Hedrick
  2001-05-28 22:26       ` Jens Axboe
  0 siblings, 1 reply; 15+ messages in thread
From: Andre Hedrick @ 2001-05-28 22:15 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Mark Hahn, alan, linux-kernel

On Mon, 28 May 2001, Jens Axboe wrote:

> On Mon, May 28 2001, Mark Hahn wrote:
> > > request, when we hit a dma timeout. In this case, what we really want to
> > > do is retry the request in pio mode and revert to normal dma operations
> > > later again.
> > 
> > really?  do we know the nature of the DMA engine problem well enough?
> > is there a reason to believe that it'll work better "later"?
> > I guess I was surprised at resorting to PIO - couldn't we just
> > break the request up into smaller chunks, still using DMA?
> 
> That is indeed possible, it will require some surgery to the dma request
> path though. IDE has no concept of doing part of a request for dma
> currently, it's an all-or-nothing approach. That's why it falls back to
> pio right now.
> 
> > I seem to recall Andre saying that the problem arises when the 
> > ide DMA engine loses PCI arbitration during a burst.  shorter
> > bursts would seem like the best workaround if this is the problem...
> 
> It's worth a shot. My patch was not meant as the end-all solution,
> however we need something _now_. Losing sectors is not funny.
> Dynamically limiting general request size to make dma work is a
> piece of cake, that'll be about a one-liner addition to the current
> patch. So the logic could be something of the order of:
> 
> 	- 1st dma timeout
> 	- scale max size down from 128kB (127.5kB really) to half that
> 	...
> 	- things aren't working, 2nd dma timeout. Scale down to 32kB.
> 
> and so forth, revert to pio and reset full size if it's really no good.
> If limiting transfer sizes solves the problem, this would be the way to
> go. I'll do another version that does this.
> 
> Testers? Who has frequent ide dma timeout problems??
> 
> > resorting to PIO would be such a shame, not only because it eats
> > CPU so badly, but also because it has no checksum like UDMA...
> 
> Look at the patch -- we resort to pio for _one_ hunk. That's 8 sectors
> tops, then back to dma. Hardly a big issue.

Unless we reissue the entire request from scratch you have no idea what if
anything is on the platters.  Since one can generally only get control
over the device with a soft reset, you have to assume that anything and
everything about that request was lost at the device level and begin
again.

<RANT>
This is why it is so important to change to TFAM, because we carry a copy
of the setup-seek operations with the request, and not unless we error out
do we change that content.  Thus, if a timeout fault is not an error case, we
have all the info to re-issue or copy into a retry queue.  But as we all
know the proper fix can not be even attempted until 2.5...
</RANT>

One thing that I have been trying is to pop the local ISR at a timeout and
that looks to handle much of the problem; however, I need to reassemble the
one test box on which I can reproduce the fault 100% of the time.

As I recall, there is a way to reinsert the faulted request, but that
means the request_struct needs a fault counter.  If it is truly a DMA error
because of re-seeks then the timeout value for that request must be
expanded.

Now the final issue could be that the value of the calculated get-out
timer may be too short for some other reason and the driver is jumping the gun
to notify and recover.

Cheers,

Andre Hedrick
Linux ATA Development
ASL Kernel Development
-----------------------------------------------------------------------------
ASL, Inc.                                     Toll free: 1-877-ASL-3535
1757 Houret Court                             Fax: 1-408-941-2071
Milpitas, CA 95035                            Web: www.aslab.com


* Re: [patch]: ide dma timeout retry in pio
  2001-05-28 22:15     ` Andre Hedrick
@ 2001-05-28 22:26       ` Jens Axboe
  2001-05-29  0:09         ` Andre Hedrick
  0 siblings, 1 reply; 15+ messages in thread
From: Jens Axboe @ 2001-05-28 22:26 UTC (permalink / raw)
  To: Andre Hedrick; +Cc: Mark Hahn, alan, linux-kernel

On Mon, May 28 2001, Andre Hedrick wrote:
> > > resorting to PIO would be such a shame, not only because it eats
> > > CPU so badly, but also because it has no checksum like UDMA...
> > 
> > Look at the patch -- we resort to pio for _one_ hunk. That's 8 sectors
> > tops, then back to dma. Hardly a big issue.
> 
> Unless we reissue the entire request from scratch you have no idea what if
> anything is on the platters.  Since one can generally only get control
> over the device with a soft reset, you have to assume that anything and
> everything about that request was lost at the device level and begin
> again.

Look at the patch, that's what it does. For ide dma, it's all or
nothing. So if it times out, no part of the request is ended.
ide_dma_timeout_retry does the sanity re-setup of the request for good
measure, and it might be needed in the future when ide dma can do
partial requests (2.5, not now). The request _is_ reissued from scratch.

> <RANT>
> This is why it is so important to change to TFAM, because we carry a copy
> of the setup-seek operations with the request, and not unless we error out
> do we change that content.  Thus is a timeout fault not a error case we
> have all the info to re-issue or copy into a retry queue.  But as we all
> know the proper fix can not be even attempted until 2.5...
> </RANT>

This is bull shit. If IDE didn't muck around with the request so much in
the first place, the info could always be trusted. Even so, we have the
hard_* numbers to go by. So this argument does not hold.

> As I recall, there is a way to reinsert the faulted request, but that

Again, look at the patch. The request is never off the list, so there is
never a reason to reinsert. hwgroup->busy is cleared (and, again for
good measure, hwgroup->rq), so ide_do_request/start_request will get the
same request that we just handled.

> means the request_struct needs fault counter.  If it is truly a DMA error

->errors, it's already there.

> because of re-seeks then the timeout value for that request must be
> expanded.

Yep

-- 
Jens Axboe


* Re: [patch]: ide dma timeout retry in pio
  2001-05-28 22:26       ` Jens Axboe
@ 2001-05-29  0:09         ` Andre Hedrick
  2001-05-29  0:30           ` Jens Axboe
  0 siblings, 1 reply; 15+ messages in thread
From: Andre Hedrick @ 2001-05-29  0:09 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Mark Hahn, alan, linux-kernel

On Tue, 29 May 2001, Jens Axboe wrote:

> This is bull shit. If IDE didn't muck around with the request so much in
> the first place, the info could always be trusted. Even so, we have the
> hard_* numbers to go by. So this argument does not hold.

Maybe if you looked at the new code model as a whole you would see that
the request-forking is gone.  The object is to preserve a copy of the io
instructions out to the registers to not have to repeat the do_request
call unless it is a do or die thing.  Also it is good to carry a copy of
the local request even if it is never used.  Also are you resetting the
pointer (back to the beginning) on rq->buffer on the retry?

You first flush the DMA engine and issue a device soft reset, not using the
current drive reset; it preserves the hwgroup->busy state and allows the
request to be retried without reinserting.

> > As I recall, there is a way to reinsert the faulted request, but that
> 
> Again, look at the patch. The request is never off the list, so there is
> never a reason to reinsert. hwgroup->busy is cleared (and, again for
> good measure, hwgroup->rq), so ide_do_request/start_request will get the
> same request that we just handled.

I will have to poke in a few flags to verify this but if you say so.

> > means the request_struct needs fault counter.  If it is truly a DMA error
> 
> ->errors, it's already there.

Wrong location to poke and by that time it requires a full retry.
The new code would have had the task structs filled with the error.

> > because of re-seeks then the timeout value for that request must be
> > expanded.
> 
> Yep

In some cases yes, but it would be better if I had a standard counter that
meant something.  Also changing the jiffie counter in ide_delay_50ms to a
mdelay may have done more harm than good.

Andre Hedrick
Linux ATA Development
ASL Kernel Development
-----------------------------------------------------------------------------
ASL, Inc.                                     Toll free: 1-877-ASL-3535
1757 Houret Court                             Fax: 1-408-941-2071
Milpitas, CA 95035                            Web: www.aslab.com


* Re: [patch]: ide dma timeout retry in pio
  2001-05-29  0:09         ` Andre Hedrick
@ 2001-05-29  0:30           ` Jens Axboe
  0 siblings, 0 replies; 15+ messages in thread
From: Jens Axboe @ 2001-05-29  0:30 UTC (permalink / raw)
  To: Andre Hedrick; +Cc: Mark Hahn, alan, linux-kernel

On Mon, May 28 2001, Andre Hedrick wrote:
> On Tue, 29 May 2001, Jens Axboe wrote:
> 
> > This is bull shit. If IDE didn't muck around with the request so much in
> > the first place, the info could always be trusted. Even so, we have the
> > hard_* numbers to go by. So this argument does not hold.
> 
> Maybe if you looked at the new code model as a whole you would see that
> the request-forking is gone.  The object is to preserve a copy of the io

There's only a 'fork' (I'm assuming you mean copy?) for the pio
multwrite path, which I don't consider a big issue. I'm not saying that
TFAM is not needed, I'm saying don't go touting that as the big saviour
when it really has no relation to this topic at all.

> instructions out to the registers to not have to repeat the do_request
> call unless it is a do or die thing.  Also it is good to carry a copy of
> the local request even if it is never used.  Also are you resetting the

Retry is by no means a performance thing, and I would indeed prefer to
just pretend this is a brand new request rather than count on anything
being in a sane state.

> pointer (back to the beginning) on rq->buffer on the retry?

Of course

> You first flush the DMA engine and issue a device soft reset not using the
> current drive reset; it preserves the hwgroup->busy state and allows the
> request to be retried without reinserting.

Again, _there is no reinsertion_. And why would we want to preserve
hwgroup->busy? In fact, we need to clear it at all times to start the
request over sanely.

> > > As I recall, there is a way to reinsert the faulted request, but that
> > 
> > Again, look at the patch. The request is never off the list, so there is
> > never a reason to reinsert. hwgroup->busy is cleared (and, again for
> > good measure, hwgroup->rq), so ide_do_request/start_request will get the
> > same request that we just handled.
> 
> I will have to poke in a few flags to verify this but if you say so.

Ok, this is how it goes: a queue list is built of requests. IDE grabs
the very first request on the list, the path:

ide_do_request:
	hwgroup->rq = blkdev_entry_next_request(&drive->queue.queue_head)

start_request:
	struct request *rq = blkdev_entry_next_request(&drive->queue.queue_head)

There is no deletion going on. Request is started, runs, and is done. At
this point the low level driver calls ide_end_request:

ide_end_request:
	if (last_part_of_request) {
		blkdev_dequeue_request(rq);
		hwgroup->rq = NULL;
		...
	}

_here_ we take the request off the list, and we better be completely
done with it at this point. So as long as the request is not ended
completely, it will always be at the head of the pending queue.
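
To make the "peek, don't dequeue" behaviour concrete, here is a miniature standalone model. The structures and the names `toy_peek()` and `toy_end_request()` are hypothetical and much simpler than the real block layer; the sketch only shows that a request stays at the head until its last part is ended.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature request and queue; not the kernel's structures. */
struct toy_request {
	int sectors_left;          /* sectors still to be transferred */
	struct toy_request *next;  /* next pending request, or NULL */
};

struct toy_queue {
	struct toy_request *head;
};

/* Like the "next request" lookups above: look at the head, remove nothing. */
static struct toy_request *toy_peek(struct toy_queue *q)
{
	assert(q != NULL);
	return q->head;
}

/* Like ide_end_request: dequeue only once the last part is finished. */
static void toy_end_request(struct toy_queue *q, int sectors_done)
{
	struct toy_request *rq;

	assert(q != NULL);
	rq = q->head;
	if (rq == NULL)
		return;
	rq->sectors_left -= sectors_done;
	if (rq->sectors_left <= 0)
		q->head = rq->next;
}
```

A timeout path that never calls the end function therefore leaves the same request at the head, so the next peek simply picks it up again.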

> > > means the request_struct needs fault counter.  If it is truly a DMA error
> > 
> > ->errors, it's already there.
> 
> Wrong location to poke and by that time it requires a full retry.
> The new code would have had the task structs filled with the error.

->errors is just meant as a simple counter, so you can do stuff like

	if (io_error)
		if (rq->errors++ > ERROR_MAX)
			/* uh oh */

if you need more than this, then yes a dedicated 'ide command' pointer
is what we want. You would need that for queueing anyway.
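
Spelled out as a compilable fragment, the counter idiom above might look like this. `struct toy_rq`, `toy_note_error()` and the limit of 8 are assumptions for illustration only, not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_ERROR_MAX 8  /* hypothetical limit, for illustration only */

struct toy_rq {
	int errors;  /* simple per-request error counter */
};

/*
 * Record one io error. Returns true once the counter value seen before
 * the increment exceeds the limit, i.e. the "uh oh" case.
 */
static bool toy_note_error(struct toy_rq *rq)
{
	assert(rq != NULL);
	return rq->errors++ > TOY_ERROR_MAX;
}
```

Note that with the post-increment the comparison sees the old value, so this sketch tolerates nine errors and only gives up on the tenth.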

-- 
Jens Axboe


* Re: [patch]: ide dma timeout retry in pio
  2001-05-28 19:39 ` Mark Hahn
  2001-05-28 20:13   ` Christopher B. Liebman
  2001-05-28 20:37   ` Jens Axboe
@ 2001-05-28 21:12   ` Alan Cox
  2001-05-28 22:11     ` James Turinsky
  2001-05-29  6:18     ` Larry McVoy
  2 siblings, 2 replies; 15+ messages in thread
From: Alan Cox @ 2001-05-28 21:12 UTC (permalink / raw)
  To: Mark Hahn; +Cc: Jens Axboe, andre, alan, linux-kernel

> really?  do we know the nature of the DMA engine problem well enough?

I can categorise some of them:

1.	Hardware that just doesn't support it.
2.	Timeouts that are false positives caused by disks having problems
	and being very slow to recover
3.	Bad cabling
4.	Stalls caused by heavy PCI traffic

> is there a reason to believe that it'll work better "later"?

#1 will go fail, fail, fail -> PIO now (or should do, I'm about to try it)
#2 and #4 will be transient
#3 could go either way


* Re: [patch]: ide dma timeout retry in pio
  2001-05-28 21:12   ` Alan Cox
@ 2001-05-28 22:11     ` James Turinsky
  2001-05-29  6:18     ` Larry McVoy
  1 sibling, 0 replies; 15+ messages in thread
From: James Turinsky @ 2001-05-28 22:11 UTC (permalink / raw)
  To: Mark Hahn, Alan Cox; +Cc: Jens Axboe, andre, linux-kernel


----- Original Message -----
From: "Alan Cox" <alan@lxorguk.ukuu.org.uk>
To: "Mark Hahn" <hahn@coffee.psychology.mcmaster.ca>
Cc: "Jens Axboe" <axboe@suse.de>; <andre@linux-ide.org>;
<alan@lxorguk.ukuu.org.uk>; <linux-kernel@vger.kernel.org>
Sent: Monday, May 28, 2001 5:12 PM
Subject: Re: [patch]: ide dma timeout retry in pio


> > really?  do we know the nature of the DMA engine problem well enough?
>
> I can categorise some of them:
>
> 1. Hardware that just doesn't support it.
> 2. Timeouts that are false positives caused by disks having problems
>    and being very slow to recover
> 3. Bad cabling
> 4. Stalls caused by heavy PCI traffic
>
> > is there a reason to believe that it'll work better "later"?
>
> #1 will go fail, fail, fail -> PIO now (or should do, I'm about to try it)
> #2 and #4 will be transient
> #3 could go either way


Where does the "'DMA Timeout -> disable DMA' then lose all
responsiveness when I issue 'hdparm -d1' while it tries and fails to
re-enable DMA" fit in?  The disk will happily run for several days in
UDMA33 and then at some point it craps out with a DMA timeout which
results in 1) DMA being turned off and 2) all attempts to re-enable DMA
failing.

And what's up with this:

[root@MoveAlong james]# hdparm /dev/hda

/dev/hda:
 multcount    =  0 (off)
 I/O support  =  1 (32-bit)
 unmaskirq    =  1 (on)
 using_dma    =  1 (on)
 keepsettings =  1 (on)
 nowerr       =  0 (off)
 readonly     =  0 (off)
 readahead    =  8 (on)
 geometry     = 19590/16/63, sectors = 19746720, start = 0
[root@MoveAlong james]# hdparm -tT /dev/hda

/dev/hda:
 Timing buffer-cache reads:   128 MB in  6.98 seconds = 18.34 MB/sec
 Timing buffered disk reads:  64 MB in  5.77 seconds = 11.09 MB/sec
Hmm.. suspicious results: probably not enough free memory for a proper test.
[root@MoveAlong james]# free
             total       used       free     shared    buffers     cached
Mem:        126800     123460       3340          0      67284      41572
-/+ buffers/cache:      14604     112196
Swap:       394624      32816     361808

I used to get ~33MB/sec on buffer-cache and ~10MB/sec on buffered disk
reads in 2.2...

--JT



* Re: [patch]: ide dma timeout retry in pio
  2001-05-28 21:12   ` Alan Cox
  2001-05-28 22:11     ` James Turinsky
@ 2001-05-29  6:18     ` Larry McVoy
  2001-05-28 22:20       ` Alan Cox
  2001-05-28 22:56       ` Meelis Roos
  1 sibling, 2 replies; 15+ messages in thread
From: Larry McVoy @ 2001-05-29  6:18 UTC (permalink / raw)
  To: Alan Cox; +Cc: Mark Hahn, Jens Axboe, andre, linux-kernel

On Mon, May 28, 2001 at 10:12:31PM +0100, Alan Cox wrote:
> > really?  do we know the nature of the DMA engine problem well enough?
> 3.	Bad cabling

For what it is worth, in the recent postings I made about this topic, you
suggested that it was bad cabling, I swapped the cabling, same problem.
I swapped the mother board from Abit K7T to ASUS A7V and all cables worked
fine.

I really think there is a software problem in there with certain chipsets,
those from VIA seem to be problematic.
-- 
---
Larry McVoy            	 lm at bitmover.com           http://www.bitmover.com/lm 


* Re: [patch]: ide dma timeout retry in pio
  2001-05-29  6:18     ` Larry McVoy
@ 2001-05-28 22:20       ` Alan Cox
  2001-05-28 22:56       ` Meelis Roos
  1 sibling, 0 replies; 15+ messages in thread
From: Alan Cox @ 2001-05-28 22:20 UTC (permalink / raw)
  To: Larry McVoy; +Cc: Alan Cox, Mark Hahn, Jens Axboe, andre, linux-kernel

> For what it is worth, in the recent postings I made about this topic, you
> suggested that it was bad cabling, I swapped the cabling, same problem.
> I swapped the mother board from Abit K7T to ASUS A7V and all cables worked
> fine.
> 
> I really think there is a software problem in there with certain chipsets,
> those from VIA seem to be problematic.

Well given the catalogue of VIA chipset problems popping up on news sites right
now that would not surprise me. Also the non -ac tree has a very out of date
VIA ide driver although I don't think that impacts this case.

Alan



* Re: [patch]: ide dma timeout retry in pio
  2001-05-29  6:18     ` Larry McVoy
  2001-05-28 22:20       ` Alan Cox
@ 2001-05-28 22:56       ` Meelis Roos
  2001-05-29  7:11         ` Larry McVoy
  1 sibling, 1 reply; 15+ messages in thread
From: Meelis Roos @ 2001-05-28 22:56 UTC (permalink / raw)
  To: linux-kernel

LM> For what it is worth, in the recent postings I made about this topic, you
LM> suggested that it was bad cabling, I swapped the cabling, same problem.
LM> I swapped the mother board from Abit K7T to ASUS A7V and all cables worked
LM> fine.

Similar info about KT7 - changing cables (both 30 and 80 wire) on Abit KT7 did
not help, still CRC errors (with all disks tried). So it looks like some KT7
boards have problems with IDE interface cabling or something like that.

-- 
Meelis Roos (mroos@linux.ee)


* Re: [patch]: ide dma timeout retry in pio
  2001-05-28 22:56       ` Meelis Roos
@ 2001-05-29  7:11         ` Larry McVoy
  0 siblings, 0 replies; 15+ messages in thread
From: Larry McVoy @ 2001-05-29  7:11 UTC (permalink / raw)
  To: Meelis Roos; +Cc: linux-kernel

On Tue, May 29, 2001 at 12:56:37AM +0200, Meelis Roos wrote:
> LM> For what it is worth, in the recent postings I made about this topic, you
> LM> suggested that it was bad cabling, I swapped the cabling, same problem.
> LM> I swapped the mother board from Abit K7T to ASUS A7V and all cables worked
> LM> fine.
> 
> Similar info about KT7 - changing cables (both 30 and 80 wire) on Abit KT7 did
> not help, still CRC errors (with all disks tried). So it looks like some KT7
> boards have problems with IDE interface cabling or something like that.

I don't think it is a cabling problem, I think it is that motherboard.  I
suspect that the chipset on that motherboard is not well supported by
Linux.  

As an aside, I am less than impressed with the IDE support in Linux.
It's been a constant source of problems for the last couple of years
and it doesn't seem to get fixed.  We seem to get lots of chip sets 
almost working and then move on to the next one.
-- 
---
Larry McVoy            	 lm at bitmover.com           http://www.bitmover.com/lm 


* RE: [patch]: ide dma timeout retry in pio
@ 2001-05-30 21:09 Diefenbaugh, Paul S
  0 siblings, 0 replies; 15+ messages in thread
From: Diefenbaugh, Paul S @ 2001-05-30 21:09 UTC (permalink / raw)
  To: 'Christopher B. Liebman', Jens Axboe, Mark Hahn
  Cc: Acpi@Phobos. Fachschaften. Tu-Muenchen. De, linux-kernel, alan,
	andre

Chris/All:

I think your assumptions are correct.  I'm guessing that IDE DMA activity is
not being properly handled when the CPU is in C3, resulting in memory (and
therefore file system) corruption.  We haven't seen corruption on our
development systems, but this is probably due to the fact that we don't
explicitly enable IDE DMA transfers (?).

I'm concerned that the CPU is being put into C3 during what appears to be
times of high bus mastering activity.  The default policy (prpolicy.c) is
configured to only use C3 when bus mastering (BM_STS) is silent for 4 or
more 'quantums'.  You can see if this is working by causing disk activity
while cat'ing the file '/proc/acpi/processor/0/status': the C3 counter
should not be incrementing (or not by much, anyway).
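
For readers unfamiliar with the policy, the "silent for 4 or more quantums" rule can be pictured with the following standalone sketch. It does not reproduce prpolicy.c: `struct toy_policy` and `toy_c3_allowed()` are hypothetical names, and the only value taken from this mail is the threshold of 4.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_QUIET_QUANTUMS 4  /* threshold quoted in this mail */

struct toy_policy {
	unsigned int quiet;  /* consecutive quantums without bus mastering */
};

/*
 * Feed in one quantum's BM_STS reading (true = bus master activity seen).
 * Returns true when the sketch would consider C3 acceptable.
 */
static bool toy_c3_allowed(struct toy_policy *p, bool bm_sts)
{
	assert(p != NULL);
	if (bm_sts)
		p->quiet = 0;  /* any activity restarts the count */
	else if (p->quiet < TOY_QUIET_QUANTUMS)
		p->quiet++;
	return p->quiet >= TOY_QUIET_QUANTUMS;
}
```

Under this model, steady disk DMA keeps resetting the counter, which is why the C3 count should barely move while the disk is busy.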

The C3 handler should block bus master activity while the CPU is in C3.  DMA
activity (writes) during C3 would result in cache-incoherency (since the CPU
is not snooping) and thus memory corruption.  The idea is to block bus
mastering activity while in C3 (ARB_DIS), but allow the CPU to wakeup
whenever bus mastering is requested (BM_RLD).  I'm betting that DMA is
happening during C3 resulting in fs corruption.

To verify if C3 is really the culprit we should try disabling its use on a
vulnerable system.  I'd recommend mapping the C3 handler to use C2 instead,
which could be done by modifying the switch statement in pr_power_idle()
within prpower.c (see below).  Note that we'll still be setting BM_RLD for
C3's during pr_power_activate_state(), but this shouldn't be an issue.

	case PR_C2:
	case PR_C3:
		/* Interrupts must be disabled during C2 transitions */
		disable();
		/* See how long we're asleep for */
		acpi_get_timer(&start_ticks);
		/* Invoke C2 */
		acpi_os_in8(processor->power.p_lvl2);
		/* Dummy op - must do something useless after P_LVL2 read */
		acpi_hw_register_bit_access(ACPI_READ, ACPI_MTX_DO_NOT_LOCK,
			BM_STS);
		/* Compute time elapsed */
		acpi_get_timer(&end_ticks);
		/* Re-enable interrupts */
		enable();
		break;

	<remove previous 'case PR_C3'>

Could somebody give this a try and let me know?

Thanks,

-- Paul Diefenbaugh
   Intel Corporation

-----Original Message-----
From: Christopher B. Liebman [mailto:liebman@sponsera.com]
Sent: Monday, May 28, 2001 1:13 PM
To: Jens Axboe; Mark Hahn
Cc: Acpi@Phobos. Fachschaften. Tu-Muenchen. De;
linux-kernel@vger.kernel.org; alan@lxorguk.ukuu.org.uk;
andre@linux-ide.org
Subject: RE: [patch]: ide dma timeout retry in pio

I think that this may be an issue with ACPI processor power saving...  I
have documented issues with ide DMA timeouts when the processor is put into
the C3 power state.  One of the things that happens in this state is that
bus master arbitration is *disabled*.....  bus master activity is
*supposed* to transition the system back to a C0 power state.  I'll bet
there are some issues with the Linux IDE dma and disabling bus master
arbitration......  ideas?  thoughts?  patches? ;-)

	-- Chris

> -----Original Message-----
> From: linux-kernel-owner@vger.kernel.org
> [mailto:linux-kernel-owner@vger.kernel.org]On Behalf Of Mark Hahn
>
> I seem to recall Andre saying that the problem arises when the
> ide DMA engine loses PCI arbitration during a burst.  shorter
> bursts would seem like the best workaround if this is the problem...
>


