hda: error: DMA in progress..

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

* hda: error: DMA in progress..
@ 2002-06-21  9:24 Jens Axboe
  2002-06-21 10:05 ` Martin Dalecki
  0 siblings, 1 reply; 9+ messages in thread
From: Jens Axboe @ 2002-06-21  9:24 UTC (permalink / raw)
  To: Martin Dalecki; +Cc: Linux Kernel

Martin,

I gave 2.5.24 a spin, and it quickly dies with the error in subject,
under moderate disk load. It's an IBM travel star on a PIIX4.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: hda: error: DMA in progress..
  2002-06-21  9:24 hda: error: DMA in progress Jens Axboe
@ 2002-06-21 10:05 ` Martin Dalecki
  2002-06-21 10:12   ` Jens Axboe
  0 siblings, 1 reply; 9+ messages in thread
From: Martin Dalecki @ 2002-06-21 10:05 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Linux Kernel

Użytkownik Jens Axboe napisał:
> Martin,
> 
> I gave 2.5.24 a spin, and it quickly dies with the error in subject,
> under moderate disk load. It's an IBM travel star on a PIIX4.
> 

if (test_bit(IDE_DMA, ch->active)) {
		printk(KERN_ERR "%s: error: DMA in progress...\n", drive->name);
			break;
}

Well I did the change we where talking about .waiting_for_dma -> xxx_bit(IDE_DMA.
And I was asking about it's possible interactions with TCQ.
And now we see that it is indeed apparently really interacting with
the TCQ code in bad ways. But if I look down from the above code (Just below in 
ide.c)

	if (blk_queue_plugged(&drive->queue)) {
			BUG_ON(!drive->using_tcq);
			break;
		}

It seems like the check which is catching reality right now
is bogous in itself. Becouse having DMA running would be
only problematic if the queue was still plugged. (Right?)
So please just disable the check.

This time it's no new damage - just detecting weak code
from the past...

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: hda: error: DMA in progress..
  2002-06-21 10:05 ` Martin Dalecki
@ 2002-06-21 10:12   ` Jens Axboe
  2002-06-21 10:28     ` Jens Axboe
  2002-06-21 10:31     ` Martin Dalecki
  0 siblings, 2 replies; 9+ messages in thread
From: Jens Axboe @ 2002-06-21 10:12 UTC (permalink / raw)
  To: Martin Dalecki; +Cc: Linux Kernel

On Fri, Jun 21 2002, Martin Dalecki wrote:
> U?ytkownik Jens Axboe napisa?:
> >Martin,
> >
> >I gave 2.5.24 a spin, and it quickly dies with the error in subject,
> >under moderate disk load. It's an IBM travel star on a PIIX4.
> >
> 
> 
> if (test_bit(IDE_DMA, ch->active)) {
> 		printk(KERN_ERR "%s: error: DMA in progress...\n", 
> 		drive->name);
> 			break;
> }
> 
> Well I did the change we where talking about .waiting_for_dma -> 
> xxx_bit(IDE_DMA.

Yeah I noticed.

> And I was asking about it's possible interactions with TCQ.

Haven't even tried TCQ yet, the above is just plain dma (no travelstarts
can do tcq).

> And now we see that it is indeed apparently really interacting with
> the TCQ code in bad ways. But if I look down from the above code (Just 
> below in ide.c)
> 
> 	if (blk_queue_plugged(&drive->queue)) {
> 			BUG_ON(!drive->using_tcq);
> 			break;
> 		}
> 
> It seems like the check which is catching reality right now
> is bogous in itself. Becouse having DMA running would be
> only problematic if the queue was still plugged. (Right?)
> So please just disable the check.

Not exactly, let me see if I remember the race here... The queue can
become plugged when we queue one request with the drive (the only on the
queue at that time), and then try to queue another right after (hence
only a tcq issue). In that time period, we drop the queue lock, so it's
indeed possible for the block layer to plug the queue before we reach
the above code again. The drive can be in two states here, 1) IDE_DMA is
set because the drive didn't release the bus (or it did, and it already
reconnected), or 2) drive is disconnected from the bus.

For non-tcq, hitting IDE_DMA set queue_commands() is a bug. The old
IDE_BUSY/IDE_DMA worked because IDE_DMA must not be set if IDE_BUSY is
not set.

> This time it's no new damage - just detecting weak code
> from the past...

Smells like new breakage to me :-)

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: hda: error: DMA in progress..
  2002-06-21 10:12   ` Jens Axboe
@ 2002-06-21 10:28     ` Jens Axboe
  2002-06-21 10:31     ` Martin Dalecki
  1 sibling, 0 replies; 9+ messages in thread
From: Jens Axboe @ 2002-06-21 10:28 UTC (permalink / raw)
  To: Martin Dalecki; +Cc: Linux Kernel

On Fri, Jun 21 2002, Jens Axboe wrote:
> > And now we see that it is indeed apparently really interacting with
> > the TCQ code in bad ways. But if I look down from the above code (Just 
> > below in ide.c)
> > 
> > 	if (blk_queue_plugged(&drive->queue)) {
> > 			BUG_ON(!drive->using_tcq);
> > 			break;
> > 		}
> > 
> > It seems like the check which is catching reality right now
> > is bogous in itself. Becouse having DMA running would be
> > only problematic if the queue was still plugged. (Right?)
> > So please just disable the check.
> 
> Not exactly, let me see if I remember the race here... The queue can
> become plugged when we queue one request with the drive (the only on the
> queue at that time), and then try to queue another right after (hence
> only a tcq issue). In that time period, we drop the queue lock, so it's
> indeed possible for the block layer to plug the queue before we reach
> the above code again. The drive can be in two states here, 1) IDE_DMA is
> set because the drive didn't release the bus (or it did, and it already
> reconnected), or 2) drive is disconnected from the bus.

Sorry that's not true (#1), it's _never_ valid for IDE_DMA to be set
here even when using TCQ (less so, if possible, for non-tcq :-). At
least according to the old semantics. That's only the case when using
speculative starts of the dma engine for tcq with release interrupt
enabled, which we don't use on Linux.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: hda: error: DMA in progress..
  2002-06-21 10:12   ` Jens Axboe
  2002-06-21 10:28     ` Jens Axboe
@ 2002-06-21 10:31     ` Martin Dalecki
  2002-06-21 10:35       ` Jens Axboe
  1 sibling, 1 reply; 9+ messages in thread
From: Martin Dalecki @ 2002-06-21 10:31 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Linux Kernel

Użytkownik Jens Axboe napisał:
> On Fri, Jun 21 2002, Martin Dalecki wrote:

> 
>>And I was asking about it's possible interactions with TCQ.
> 
> 
> Haven't even tried TCQ yet, the above is just plain dma (no travelstarts
> can do tcq).

Argh...


>>
>>	if (blk_queue_plugged(&drive->queue)) {
>>			BUG_ON(!drive->using_tcq);
>>			break;

> 
> Not exactly, let me see if I remember the race here... The queue can
> become plugged when we queue one request with the drive (the only on the
> queue at that time), and then try to queue another right after (hence
> only a tcq issue). In that time period, we drop the queue lock, so it's
> indeed possible for the block layer to plug the queue before we reach
> the above code again. The drive can be in two states here, 1) IDE_DMA is
> set because the drive didn't release the bus (or it did, and it already
> reconnected), or 2) drive is disconnected from the bus.

OK. We have now just one single place where IDE_DMA gets unset ->
udma_stop. This to too early to reset IDE_BUSY. However it well
may be that ide_dma_intr() simply doesn't care about IDE_BUSY.
Let's have a look...

> 
> For non-tcq, hitting IDE_DMA set queue_commands() is a bug. The old
> IDE_BUSY/IDE_DMA worked because IDE_DMA must not be set if IDE_BUSY is
> not set.
> 
> 
>>This time it's no new damage - just detecting weak code
>>from the past...
> 
> 
> Smells like new breakage to me :-)

Well lets look at ata_irq_intr, the end of it:

	 * Note that handler() may have set things up for another
	 * interrupt to occur soon, but it cannot happen until
	 * we exit from this routine, because it will be the
	 * same irq as is currently being serviced here, and Linux
	 * won't allow another of the same (on any CPU) until we return.
	 */
	if (startstop == ide_stopped) {
		if (!ch->handler) {	/* paranoia */
			clear_bit(IDE_BUSY, ch->active);
			do_request(ch);
		} else {
			printk("%s: %s: huh? expected NULL handler on exit\n", drive->name, __FUNCTION__);
		}
	} else if (startstop == ide_released)
		queue_commands(drive);

I think the above needs more tough now...


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: hda: error: DMA in progress..
  2002-06-21 10:31     ` Martin Dalecki
@ 2002-06-21 10:35       ` Jens Axboe
  2002-06-21 11:10         ` Martin Dalecki
  0 siblings, 1 reply; 9+ messages in thread
From: Jens Axboe @ 2002-06-21 10:35 UTC (permalink / raw)
  To: Martin Dalecki; +Cc: Linux Kernel

On Fri, Jun 21 2002, Martin Dalecki wrote:
> U?ytkownik Jens Axboe napisa?:
> >On Fri, Jun 21 2002, Martin Dalecki wrote:
> 
> >
> >>And I was asking about it's possible interactions with TCQ.
> >
> >
> >Haven't even tried TCQ yet, the above is just plain dma (no travelstarts
> >can do tcq).
> 
> Argh...

Indeed

> >>
> >>	if (blk_queue_plugged(&drive->queue)) {
> >>			BUG_ON(!drive->using_tcq);
> >>			break;
> 
> >
> >Not exactly, let me see if I remember the race here... The queue can
> >become plugged when we queue one request with the drive (the only on the
> >queue at that time), and then try to queue another right after (hence
> >only a tcq issue). In that time period, we drop the queue lock, so it's
> >indeed possible for the block layer to plug the queue before we reach
> >the above code again. The drive can be in two states here, 1) IDE_DMA is
> >set because the drive didn't release the bus (or it did, and it already
> >reconnected), or 2) drive is disconnected from the bus.
> 
> OK. We have now just one single place where IDE_DMA gets unset ->
> udma_stop. This to too early to reset IDE_BUSY. However it well
> may be that ide_dma_intr() simply doesn't care about IDE_BUSY.
> Let's have a look...

You can leave IDE_BUSY there, that's ok. It's not invalid for IDE_BUSY
to be set while IDE_DMA gets cleared. That's expected.

> >For non-tcq, hitting IDE_DMA set queue_commands() is a bug. The old
> >IDE_BUSY/IDE_DMA worked because IDE_DMA must not be set if IDE_BUSY is
> >not set.
> >
> >
> >>This time it's no new damage - just detecting weak code
> >>from the past...
> >
> >
> >Smells like new breakage to me :-)
> 
> Well lets look at ata_irq_intr, the end of it:
> 
> 	 * Note that handler() may have set things up for another
> 	 * interrupt to occur soon, but it cannot happen until
> 	 * we exit from this routine, because it will be the
> 	 * same irq as is currently being serviced here, and Linux
> 	 * won't allow another of the same (on any CPU) until we return.
> 	 */
> 	if (startstop == ide_stopped) {
> 		if (!ch->handler) {	/* paranoia */
> 			clear_bit(IDE_BUSY, ch->active);
> 			do_request(ch);
> 		} else {
> 			printk("%s: %s: huh? expected NULL handler on 
> 			exit\n", drive->name, __FUNCTION__);
> 		}
> 	} else if (startstop == ide_released)
> 		queue_commands(drive);
> 
> I think the above needs more tough now...

Same case as the one I described in the email following this, will only
happen for TCQ with release interrupt enabled. Otherwise it's illegal to
release the bus from the tcq interrupt handler. Since I removed all
traces of that long ago, you can safely kill the

	} else if (startstop == ide_released)
		queue_commands(drive);

part of it.

The rest looks sane. If handler returns it's no longer busy
(ide_stopped), we clear IDE_BUSY (IDE_DMA damn well better be cleared at
this point as well!!) and let do_request() start a new request (heck or
the same, we don't know and don't care).

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: hda: error: DMA in progress..
  2002-06-21 10:35       ` Jens Axboe
@ 2002-06-21 11:10         ` Martin Dalecki
  2002-06-21 14:57           ` Stelian Pop
  0 siblings, 1 reply; 9+ messages in thread
From: Martin Dalecki @ 2002-06-21 11:10 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Linux Kernel

Użytkownik Jens Axboe napisał:

>>OK. We have now just one single place where IDE_DMA gets unset ->
>>udma_stop. This to too early to reset IDE_BUSY. However it well
>>may be that ide_dma_intr() simply doesn't care about IDE_BUSY.
>>Let's have a look...
> 
> 
> You can leave IDE_BUSY there, that's ok. It's not invalid for IDE_BUSY
> to be set while IDE_DMA gets cleared. That's expected.
> 

>>Well lets look at ata_irq_intr, the end of it:
>>
>>	 * Note that handler() may have set things up for another
>>	 * interrupt to occur soon, but it cannot happen until
>>	 * we exit from this routine, because it will be the
>>	 * same irq as is currently being serviced here, and Linux
>>	 * won't allow another of the same (on any CPU) until we return.
>>	 */
>>	if (startstop == ide_stopped) {
>>		if (!ch->handler) {	/* paranoia */
>>			clear_bit(IDE_BUSY, ch->active);
>>			do_request(ch);
>>		} else {
>>			printk("%s: %s: huh? expected NULL handler on 
>>			exit\n", drive->name, __FUNCTION__);
>>		}
>>	} else if (startstop == ide_released)
>>		queue_commands(drive);
>>
>>I think the above needs more tough now...
> 
> 
> Same case as the one I described in the email following this, will only
> happen for TCQ with release interrupt enabled. Otherwise it's illegal to
> release the bus from the tcq interrupt handler. Since I removed all
> traces of that long ago, you can safely kill the
> 
> 	} else if (startstop == ide_released)
> 		queue_commands(drive);
> 
> part of it.

I'm glad to get confirmation on this. This leaves only one place, vide:
do_request, where we can queue up new commands. Much easier to trace and
makes queue_commands never run from IRQ context, which is simplyfiying
things too.

> The rest looks sane. If handler returns it's no longer busy
> (ide_stopped), we clear IDE_BUSY (IDE_DMA damn well better be cleared at
> this point as well!!) and let do_request() start a new request (heck or
> the same, we don't know and don't care).

Right now the handlers are expected to clear IDE_BUSY and ->handler
themself. I have now an idea: Could you add a reporting about
the handler function there:

  	if (test_bit(IDE_DMA, ch->active)) {
			printk(KERN_ERR "%s: error: DMA in progress... %p\n", drive->name, ch->handler);
			break;
		}

And please take a short look at System.map.

This will show which IRQ handler is the culprit...

If it's indeed ide_dma_intr, let's have a look on it:

We see that it's calling udma_stop() immediately. This should
reset IDE_DMA unconditionally.. immediately on enty:

static inline int udma_stop(struct ata_device *drive)
{
	clear_bit(IDE_DMA, drive->channel->active);

	return drive->channel->udma_stop(drive);
}

Argh... There is a race in the above it should be:

static inline int udma_stop(struct ata_device *drive)
{
	int ret = drive->channel->udma_stop(drive);
         clear_bit(IDE_DMA, drive->channel->active);
         return ret;
}

Or we should move the clar_bit down do ide_dma_intr and
silbings behind __ata_end_request().
And finally we don't clear the IDE_BUSY on this code path.


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: hda: error: DMA in progress..
  2002-06-21 11:10         ` Martin Dalecki
@ 2002-06-21 14:57           ` Stelian Pop
  2002-06-24  8:54             ` Stelian Pop
  0 siblings, 1 reply; 9+ messages in thread
From: Stelian Pop @ 2002-06-21 14:57 UTC (permalink / raw)
  To: dalecki, Jens Axboe; +Cc: Linux Kernel

> Could you add a reporting about
> the handler function there:
> 
>   	if (test_bit(IDE_DMA, ch->active)) {
> 			printk(KERN_ERR "%s: error: DMA in progress... %p\n",
>   drive->name, ch->handler);
> 			break;
> 		}
> 
> And please take a short look at System.map.
> 
> This will show which IRQ handler is the culprit...

Martin, I have the same problem on my Sony Vaio C1VE, 
Intel Corp. 82371AB/EB/MB PIIX4 IDE (rev 01), HITACHI_DK23AA-12 disk.

It doesn't even boot, the "DMA in progress error..." appears just
after having mounted the root partition. 2.5.23 worked on this laptop.

I've added, as you suggested the following patch:

===== drivers/ide/ide.c 1.110 vs edited =====
--- 1.110/drivers/ide/ide.c	Thu Jun 20 13:28:43 2002
+++ edited/drivers/ide/ide.c	Fri Jun 21 16:46:29 2002
@@ -863,7 +863,7 @@
 		drive->sleep = 0;
 
 		if (test_bit(IDE_DMA, ch->active)) {
-			printk(KERN_ERR "%s: error: DMA in progress...\n", drive->name);
+			printk(KERN_ERR "%s: error: DMA in progress...%p\n", drive->name, ch->handler);
 			break;
 		}

And the bad news is that ch->handler is NULL...

Stelian.
-- 
Stelian Pop <stelian.pop@fr.alcove.com>
Alcove - http://www.alcove.com

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: hda: error: DMA in progress..
  2002-06-21 14:57           ` Stelian Pop
@ 2002-06-24  8:54             ` Stelian Pop
  0 siblings, 0 replies; 9+ messages in thread
From: Stelian Pop @ 2002-06-24  8:54 UTC (permalink / raw)
  To: Linux Kernel Mailing List; +Cc: Martin Dalecki, Jens Axboe

On Fri, Jun 21, 2002 at 04:57:24PM +0200, Stelian Pop wrote:

> Martin, I have the same problem on my Sony Vaio C1VE, 
> Intel Corp. 82371AB/EB/MB PIIX4 IDE (rev 01), HITACHI_DK23AA-12 disk.
> 
> It doesn't even boot, the "DMA in progress error..." appears just
> after having mounted the root partition. 2.5.23 worked on this laptop.

Ok, after poking around the "waiting_for_dma" changes, I found that
the attached patch is solving all my problems, my laptop works again.

BIG FAT WARNING: I have a very limited knowledge in the ide driver
internals, the attached patch could destroy all your data!

Please advice.

Stelian.

===== include/linux/ide.h 1.90 vs edited =====
--- 1.90/include/linux/ide.h	Thu Jun 20 13:35:15 2002
+++ edited/include/linux/ide.h	Mon Jun 24 10:17:00 2002
@@ -766,10 +766,12 @@
  */
 static inline ide_startstop_t udma_init(struct ata_device *drive, struct request *rq)
 {
-	int ret = drive->channel->udma_init(drive, rq);
-	if (ret == ide_started)
-		set_bit(IDE_DMA, drive->channel->active);
-
+	int ret;
+	
+	set_bit(IDE_DMA, drive->channel->active);
+	ret = drive->channel->udma_init(drive, rq);
+	if (ret != ide_started)
+		clear_bit(IDE_DMA, drive->channel->active);
 	return ret;
 }
 
-- 
Stelian Pop <stelian.pop@fr.alcove.com>
Alcove - http://www.alcove.com

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2002-06-24  8:55 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2002-06-21  9:24 hda: error: DMA in progress Jens Axboe
2002-06-21 10:05 ` Martin Dalecki
2002-06-21 10:12   ` Jens Axboe
2002-06-21 10:28     ` Jens Axboe
2002-06-21 10:31     ` Martin Dalecki
2002-06-21 10:35       ` Jens Axboe
2002-06-21 11:10         ` Martin Dalecki
2002-06-21 14:57           ` Stelian Pop
2002-06-24  8:54             ` Stelian Pop

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox