* Re: 2.6.20-rc6: known unfixed regressions (v2) (part 2)
2007-01-29 7:04 ` Linus Torvalds
@ 2007-01-29 7:19 ` Mike Galbraith
2007-01-29 10:01 ` Mike Galbraith
` (3 subsequent siblings)
4 siblings, 0 replies; 37+ messages in thread
From: Mike Galbraith @ 2007-01-29 7:19 UTC (permalink / raw)
To: Linus Torvalds
Cc: Uwe Bugla, Adrian Bunk, Andrew Morton, gd, alan, linux-ide,
B.Zolnierkiewicz, Jeff Garzik, Jens Axboe, Mike Christie,
James Bottomley
On Sun, 2007-01-28 at 23:04 -0800, Linus Torvalds wrote:
> Can somebody try to bisect this?
I'm bisecting the old fashioned way right now. I'll get it to at least a
specific rc, and maybe further.
-Mike
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: 2.6.20-rc6: known unfixed regressions (v2) (part 2)
2007-01-29 7:04 ` Linus Torvalds
2007-01-29 7:19 ` Mike Galbraith
@ 2007-01-29 10:01 ` Mike Galbraith
2007-01-29 18:16 ` Mike Galbraith
2007-01-29 17:16 ` Mike Christie
` (2 subsequent siblings)
4 siblings, 1 reply; 37+ messages in thread
From: Mike Galbraith @ 2007-01-29 10:01 UTC (permalink / raw)
To: Linus Torvalds
Cc: Uwe Bugla, Adrian Bunk, Andrew Morton, gd, alan, linux-ide,
B.Zolnierkiewicz, Jeff Garzik, Jens Axboe, Mike Christie,
James Bottomley
On Sun, 2007-01-28 at 23:04 -0800, Linus Torvalds wrote:
>
> [ Added Jeff, Jens and Mike Christie to Cc. I would _guess_ this is
> associated with the "larger block pc request" stuff: Mike, Jens? James B
> added for good luck.
>
> It apparently started happening somewhere between 2.6.19 and 2.6.20-rc2,
2.6.20-rc1 is bad for me. Ripping the diff apart manually is proving
challenging, so I suppose I'll bite the bullet and do the git clone. No
idea how long that'll take at ~45KB/S, but I'll do the bisect if nobody
beats me to it.
-Mike
* Re: 2.6.20-rc6: known unfixed regressions (v2) (part 2)
2007-01-29 10:01 ` Mike Galbraith
@ 2007-01-29 18:16 ` Mike Galbraith
2007-01-29 18:43 ` Linus Torvalds
0 siblings, 1 reply; 37+ messages in thread
From: Mike Galbraith @ 2007-01-29 18:16 UTC (permalink / raw)
To: Linus Torvalds
Cc: Uwe Bugla, Adrian Bunk, Andrew Morton, gd, alan, linux-ide,
B.Zolnierkiewicz, Jeff Garzik, Jens Axboe, Mike Christie,
James Bottomley
On Mon, 2007-01-29 at 11:01 +0100, Mike Galbraith wrote:
> On Sun, 2007-01-28 at 23:04 -0800, Linus Torvalds wrote:
> >
> > [ Added Jeff, Jens and Mike Christie to Cc. I would _guess_ this is
> > associated with the "larger block pc request" stuff: Mike, Jens? James B
> > added for good luck.
> >
> > It apparently started happening somewhere between 2.6.19 and 2.6.20-rc2,
>
> 2.6.20-rc1 is bad for me. Ripping the diff apart manually is proving
> challenging, so I suppose I'll bite the bullet and do the git clone. No
> idea how long that'll take at ~45KB/S, but I'll do the bisect if nobody
> beats me to it.
Done. Horse-pookey result.
My criterion for "good" during bisection was no endless error loop. At
no time during bisection did nero actually work, as in allow me to
select a burner. It started off "bad", turned "good" after the third
build and stayed that way.
The extremely unlikely winner is:
29b08d2bae854f66d3cfd5f57aaf2e7c2c7fce32 is first bad commit
commit 29b08d2bae854f66d3cfd5f57aaf2e7c2c7fce32
Author: Heiko Carstens <heiko.carstens@de.ibm.com>
Date: Mon Dec 4 15:40:40 2006 +0100
[S390] pfault code cleanup.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
:040000 040000 cdf0a6e6468eb76f58702a35394bb54368aba916
ad1bb21e85c76efa44b2a25af0f05c2bde160a80 M arch
:040000 040000 de5ead61f0a183d7b9cb81ac919b08ae1712a4c7
6fbc043b58d4045e3e9544f079f9edc352909ba6 M include
poo.
-Mike
* Re: 2.6.20-rc6: known unfixed regressions (v2) (part 2)
2007-01-29 18:16 ` Mike Galbraith
@ 2007-01-29 18:43 ` Linus Torvalds
2007-01-30 4:14 ` Mike Galbraith
0 siblings, 1 reply; 37+ messages in thread
From: Linus Torvalds @ 2007-01-29 18:43 UTC (permalink / raw)
To: Mike Galbraith
Cc: Uwe Bugla, Adrian Bunk, Andrew Morton, gd, alan, linux-ide,
B.Zolnierkiewicz, Jeff Garzik, Jens Axboe, Mike Christie,
James Bottomley
On Mon, 29 Jan 2007, Mike Galbraith wrote:
>
> The extremely unlikely winner is:
>
> 29b08d2bae854f66d3cfd5f57aaf2e7c2c7fce32 is first bad commit
Yeah, that's not going to be it. You probably had a bad kernel there
somewhere that you called "good".
Git bisect is wonderful for figuring out (reasonably quickly) where
problems are, but exactly because it zeroes in on the bug so quickly, if
you give it faulty data, it zeroes in on something *else* very quickly and
without any kind of sense.
There's just no redundancy there, so one small bit error in the input, and
you get a totally wrong end result ;)
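The lack of redundancy can be illustrated with a minimal sketch, assuming a strictly linear history of numbered commits (real git history is a graph, so this is a simplification). The name `model_bisect` and its parameters are hypothetical and are not part of git; `lie_at` marks the one commit whose test verdict is reported wrongly.

```c
#include <assert.h>

/*
 * Simplified model of bisection over a linear history of n commits,
 * numbered 0..n-1. Commit n-1 is assumed known bad, and every commit
 * from true_first_bad onwards is really bad. If lie_at >= 0, the
 * verdict for that single commit is inverted, modelling one mistaken
 * "good"/"bad" answer from the tester.
 * Returns the commit the search reports as "first bad".
 */
int model_bisect(int n, int true_first_bad, int lie_at)
{
    int lo = 0;        /* lowest commit that may still be the first bad one */
    int hi;            /* known-bad upper end of the range */

    assert(n > 0 && true_first_bad >= 0 && true_first_bad < n);
    hi = n - 1;

    while (lo < hi) {
        int mid = lo + (hi - lo) / 2;
        int bad = (mid >= true_first_bad);

        if (mid == lie_at)
            bad = !bad;     /* the one wrong answer */

        if (bad)
            hi = mid;       /* first bad commit is at or before mid */
        else
            lo = mid + 1;   /* first bad commit is after mid */
    }
    return lo;
}
```

With 1024 commits and the real culprit at commit 300, `model_bisect(1024, 300, -1)` returns 300. Marking the bad commit 511 as "good" (`model_bisect(1024, 300, 511)`) makes it return 512: every later answer is truthful, yet the search converges on the commit right after the mislabelled one.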
I'm trying to narrow it down myself. It all *seemed* to work with the
commit I suspected initially (the "support larger block pc requests" one).
But yes, I haven't actually tried to burn anything either. I also just
started up "nero" and looked whether I'd get the error messages in dmesg.
Linus
* Re: 2.6.20-rc6: known unfixed regressions (v2) (part 2)
2007-01-29 18:43 ` Linus Torvalds
@ 2007-01-30 4:14 ` Mike Galbraith
0 siblings, 0 replies; 37+ messages in thread
From: Mike Galbraith @ 2007-01-30 4:14 UTC (permalink / raw)
To: Linus Torvalds
Cc: Uwe Bugla, Adrian Bunk, Andrew Morton, gd, alan, linux-ide,
B.Zolnierkiewicz, Jeff Garzik, Jens Axboe, Mike Christie,
James Bottomley
On Mon, 2007-01-29 at 10:43 -0800, Linus Torvalds wrote:
>
> On Mon, 29 Jan 2007, Mike Galbraith wrote:
> >
> > The extremely unlikely winner is:
> >
> > 29b08d2bae854f66d3cfd5f57aaf2e7c2c7fce32 is first bad commit
>
> Yeah, that's not going to be it. You probably had a bad kernel there
> somewhere that you called "good".
Oh well, all was not wasted. I'm now gitified (the clone wasn't nearly
as bad as I thought it was going to be with my lousy connection). Now I
need another repeatable bug to try it against.
-Mike
* Re: 2.6.20-rc6: known unfixed regressions (v2) (part 2)
2007-01-29 7:04 ` Linus Torvalds
2007-01-29 7:19 ` Mike Galbraith
2007-01-29 10:01 ` Mike Galbraith
@ 2007-01-29 17:16 ` Mike Christie
2007-01-29 20:37 ` Mike Christie
2007-02-02 13:07 ` Uwe Bugla
4 siblings, 0 replies; 37+ messages in thread
From: Mike Christie @ 2007-01-29 17:16 UTC (permalink / raw)
To: Linus Torvalds
Cc: Mike Galbraith, Uwe Bugla, Adrian Bunk, Andrew Morton, gd, alan,
linux-ide, B.Zolnierkiewicz, Jeff Garzik, Jens Axboe,
James Bottomley
Linus Torvalds wrote:
>
> [ Added Jeff, Jens and Mike Christie to Cc. I would _guess_ this is
> associated with the "larger block pc request" stuff: Mike, Jens? James B
> added for good luck.
>
> It apparently started happening somewhere between 2.6.19 and 2.6.20-rc2,
> and doing a
>
> gitk v2.6.19..v2.6.20-rc2 block/scsi_ioctl.c drivers/ide/
>
> I don't see anything else that really looks all that suspicious.. Unless
> maybe it's that "Fix SG_IO leak". Jeff added because of the hddtemp
> issue, but I think that was effectively SATA-only, so probably isn't
> relevant.
>
> Damn, I just realized that Jens is in the middle of his vacation for a
> week..
>
> Mike, can you please look at this and check? ]
>
Ok, I am looking into it now.
* Re: 2.6.20-rc6: known unfixed regressions (v2) (part 2)
2007-01-29 7:04 ` Linus Torvalds
` (2 preceding siblings ...)
2007-01-29 17:16 ` Mike Christie
@ 2007-01-29 20:37 ` Mike Christie
2007-01-29 20:58 ` Linus Torvalds
` (2 more replies)
2007-02-02 13:07 ` Uwe Bugla
4 siblings, 3 replies; 37+ messages in thread
From: Mike Christie @ 2007-01-29 20:37 UTC (permalink / raw)
To: Linus Torvalds
Cc: Mike Galbraith, Uwe Bugla, Adrian Bunk, Andrew Morton, gd, alan,
linux-ide, B.Zolnierkiewicz, Jeff Garzik, Jens Axboe,
James Bottomley
Linus Torvalds wrote:
>>
>> [ 4362.972995] hdd: status error: status=0x58 { DriveReady SeekComplete DataRequest }
>> [ 4362.981475] ide: failed opcode was: unknown
>> [ 4362.986183] hdd: drive not ready for command
>
What chipsets are you guys using?
I tried 2.6.20-rc6, and forced my cdrom to be used by the ide piix
driver instead of the sata one, so the device shows up as hda instead of
a scsi one. And, if I use the ide piix driver for the cdrom then it
works fine. I see IO going through the block layer sg io code which was
modified in 2.6.20 to support larger IOs.
Strangely, if I use the sata driver, then nero uses the sg driver
(drivers/scsi/sg.c) instead of the block layer sg io code. But the sata
code spits out:
Jan 29 12:03:21 madmax kernel: ata1.00: ATAPI check failed
Jan 29 12:03:21 madmax kernel: ata1.00: exception Emask 0x0 SAct 0x0
SErr 0x0 action 0x2 frozen
Jan 29 12:03:21 madmax kernel: ata1.00: cmd
a0/00:00:00:00:20/00:00:00:00:00/a0 tag 0 cdb 0xac data 808 in
Jan 29 12:03:21 madmax kernel: res
51/51:03:00:00:20/00:00:00:00:00/a0 Emask 0x3 (HSM violation)
Jan 29 12:03:21 madmax kernel: ata1: soft resetting port
Jan 29 12:03:21 madmax kernel: ATA: abnormal status 0xD0 on port 0x1F7
Jan 29 12:03:21 madmax kernel: ata1.00: failed to IDENTIFY (I/O error,
err_mask=0x2)
Jan 29 12:03:21 madmax kernel: ata1.00: revalidation failed (errno=-5)
Jan 29 12:03:21 madmax kernel: ata1: failed to recover some devices,
retrying in 5 secs
Jan 29 12:03:33 madmax kernel: ata1: port is slow to respond, please be
patient (Status 0xd0)
Jan 29 12:03:56 madmax kernel: ata1: port failed to respond (30 secs,
Status 0xd0)
Jan 29 12:03:56 madmax kernel: ata1: soft resetting port
Jan 29 12:03:57 madmax kernel: ata1.00: configured for PIO4
Jan 29 12:03:57 madmax kernel: ata1: EH complete
nero still works in this case, but it is sluggish. It takes a couple of
minutes to start up, and almost every operation, such as "Choose Recorder",
spits out more error messages like the ones above.
This may not be related. I am going to keep digging around.
* Re: 2.6.20-rc6: known unfixed regressions (v2) (part 2)
2007-01-29 20:37 ` Mike Christie
@ 2007-01-29 20:58 ` Linus Torvalds
2007-01-29 22:16 ` Linus Torvalds
2007-01-30 3:25 ` Mike Galbraith
2007-01-30 4:14 ` Jeff Garzik
2 siblings, 1 reply; 37+ messages in thread
From: Linus Torvalds @ 2007-01-29 20:58 UTC (permalink / raw)
To: Mike Christie
Cc: Mike Galbraith, Uwe Bugla, Adrian Bunk, Andrew Morton, gd, alan,
linux-ide, B.Zolnierkiewicz, Jeff Garzik, Jens Axboe,
James Bottomley
On Mon, 29 Jan 2007, Mike Christie wrote:
>
> What chipsets are you guys using?
I can see it with a
00:1f.1 IDE interface: Intel Corporation 82801G (ICH7 Family) IDE Controller (rev 02)
00:1f.2 IDE interface: Intel Corporation 82801GBM/GHM (ICH7 Family) Serial ATA Storage Controller IDE (rev 02)
where the disk is on the SATA side, and the DVD writer is PATA using
ide_piix.
> I tried 2.6.20-rc6, and forced my cdrom to be used by the ide piix
> driver instead of the sata one, so the device shows up as hda instead of
> a scsi one. And, if I use the ide piix driver for the cdrom then it
> works fine. I see IO going through the block layer sg io code which was
> modified in 2.6.20 to support larger IOs.
That doesn't sound very different from my setup.. I wonder why it works
for you.
That said, I'm making progress with my bisection. "16 revisions left to
test after this", and three of those sixteen are
Remove unnecessary blk_queue_bounce in SG_IO
fix SG_IO bio leak
remove blk_queue_activity_fn
but I'll do the bisection 'til the bitter end, in case I get into a
similar nonsense value that the other Mike got into.
Linus
* Re: 2.6.20-rc6: known unfixed regressions (v2) (part 2)
2007-01-29 20:58 ` Linus Torvalds
@ 2007-01-29 22:16 ` Linus Torvalds
2007-01-29 23:01 ` Linus Torvalds
0 siblings, 1 reply; 37+ messages in thread
From: Linus Torvalds @ 2007-01-29 22:16 UTC (permalink / raw)
To: Mike Christie
Cc: Mike Galbraith, Uwe Bugla, Adrian Bunk, Andrew Morton, gd, alan,
linux-ide, B.Zolnierkiewicz, Jeff Garzik, Jens Axboe,
James Bottomley, FUJITA Tomonori, Boaz Harrosh
On Mon, 29 Jan 2007, Linus Torvalds wrote:
>
> That said, I'm making progress with my bisection. "16 revisions left to
> test after this", and three of those sixteen are
>
> Remove unnecessary blk_queue_bounce in SG_IO
> fix SG_IO bio leak
> remove blk_queue_activity_fn
I've now bisected to the point where those are the only commits left, so
it's definitely one of the three.
Two more reboots and I should know exactly which one broke "nero".
Linus
* Re: 2.6.20-rc6: known unfixed regressions (v2) (part 2)
2007-01-29 22:16 ` Linus Torvalds
@ 2007-01-29 23:01 ` Linus Torvalds
2007-01-29 23:32 ` Linus Torvalds
0 siblings, 1 reply; 37+ messages in thread
From: Linus Torvalds @ 2007-01-29 23:01 UTC (permalink / raw)
To: Mike Christie
Cc: Mike Galbraith, Uwe Bugla, Adrian Bunk, Andrew Morton, gd, alan,
linux-ide, B.Zolnierkiewicz, Jeff Garzik, Jens Axboe,
James Bottomley, FUJITA Tomonori, Boaz Harrosh
On Mon, 29 Jan 2007, Linus Torvalds wrote:
>
> Two more reboots and I should know exactly which one broke "nero".
This one.
However, the scary thing is that I think the patch really is correct, and
I wonder if nero has some strange work-around for an older bug.. Although
I don't see how you could even have that, since afaik, the behaviour
before the fix was literally just a leak that a user process shouldn't be
able to see.
Very strange. Will add some debugging printk's.
Linus
----
commit 77d172ce2719b5ad2dc0637452c8871d9cba344c
Author: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Date: Mon Dec 11 10:01:34 2006 +0100
[PATCH] fix SG_IO bio leak
This patch fixes bio leaks in SG_IO. rq->bio can be changed after io
completion, so we need to reset rq->bio before calling blk_rq_unmap_user()
http://marc.theaimsgroup.com/?l=linux-kernel&m=116570666807983&w=2
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
---
block/scsi_ioctl.c | 3 +++
1 files changed, 3 insertions(+), 0 deletions(-)
diff --git a/block/scsi_ioctl.c b/block/scsi_ioctl.c
index b3e2107..045cabd 100644
--- a/block/scsi_ioctl.c
+++ b/block/scsi_ioctl.c
@@ -228,6 +228,7 @@ static int sg_io(struct file *file, request_queue_t *q,
struct request *rq;
char sense[SCSI_SENSE_BUFFERSIZE];
unsigned char cmd[BLK_MAX_CDB];
+ struct bio *bio;
if (hdr->interface_id != 'S')
return -EINVAL;
@@ -308,6 +309,7 @@ static int sg_io(struct file *file, request_queue_t *q,
if (ret)
goto out;
+ bio = rq->bio;
rq->retries = 0;
start_time = jiffies;
@@ -338,6 +340,7 @@ static int sg_io(struct file *file, request_queue_t *q,
hdr->sb_len_wr = len;
}
+ rq->bio = bio;
if (blk_rq_unmap_user(rq))
ret = -EFAULT;
* Re: 2.6.20-rc6: known unfixed regressions (v2) (part 2)
2007-01-29 23:01 ` Linus Torvalds
@ 2007-01-29 23:32 ` Linus Torvalds
2007-01-29 23:42 ` Mike Christie
0 siblings, 1 reply; 37+ messages in thread
From: Linus Torvalds @ 2007-01-29 23:32 UTC (permalink / raw)
To: Mike Christie
Cc: Mike Galbraith, Uwe Bugla, Adrian Bunk, Andrew Morton, gd, alan,
linux-ide, B.Zolnierkiewicz, Jeff Garzik, Jens Axboe,
James Bottomley, FUJITA Tomonori, Boaz Harrosh
Uwe, others, does this patch fix your problem?
It will have a few printk's that it spews out, but if it fixes your
problem, at least we know a bit more.
Linus
---
diff --git a/block/scsi_ioctl.c b/block/scsi_ioctl.c
index 2528a0c..f0ff151 100644
--- a/block/scsi_ioctl.c
+++ b/block/scsi_ioctl.c
@@ -333,8 +333,13 @@ static int sg_io(struct file *file, request_queue_t *q,
hdr->sb_len_wr = len;
}
- if (blk_rq_unmap_user(bio))
+ if (rq->bio != bio)
+ printk("rq->bio = %p, bio = %p\n", rq->bio, bio);
+
+ if (blk_rq_unmap_user(rq->bio)) {
+ printk("blk_rq_unmap_user failed!\n");
ret = -EFAULT;
+ }
/* may not have succeeded, but output values written to control
* structure (struct sg_io_hdr). */
* Re: 2.6.20-rc6: known unfixed regressions (v2) (part 2)
2007-01-29 23:32 ` Linus Torvalds
@ 2007-01-29 23:42 ` Mike Christie
2007-01-30 0:23 ` Linus Torvalds
0 siblings, 1 reply; 37+ messages in thread
From: Mike Christie @ 2007-01-29 23:42 UTC (permalink / raw)
To: Linus Torvalds
Cc: Mike Galbraith, Uwe Bugla, Adrian Bunk, Andrew Morton, gd, alan,
linux-ide, B.Zolnierkiewicz, Jeff Garzik, Jens Axboe,
James Bottomley, FUJITA Tomonori, Boaz Harrosh
Linus Torvalds wrote:
> Uwe, others, does this patch fix your problem?
>
I can replicate the problem now using an older box, but the same driver.
> It will have a few printk's that it spews out, but if it fixes your
> problem, at least we know a bit more.
>
> Linus
> ---
> diff --git a/block/scsi_ioctl.c b/block/scsi_ioctl.c
> index 2528a0c..f0ff151 100644
> --- a/block/scsi_ioctl.c
> +++ b/block/scsi_ioctl.c
> @@ -333,8 +333,13 @@ static int sg_io(struct file *file, request_queue_t *q,
> hdr->sb_len_wr = len;
> }
>
> - if (blk_rq_unmap_user(bio))
> + if (rq->bio != bio)
> + printk("rq->bio = %p, bio = %p\n", rq->bio, bio);
> +
rq->bio is NULL here, so no data is copied back to userspace and it seems
nero just stops trying to talk to the drive after this.
Because nero just gives up, no more commands are sent and we do not get
flooded with status errors like before so it sort of looks like it
solves the problem but it doesn't - at least that is what is happening here.
The reason for using the bio in that patch is that
__end_that_request_first eventually sets rq->bio to NULL so the caller
of the blk_execute is supposed to save a pointer to the first bio for
later unmapping.
* Re: 2.6.20-rc6: known unfixed regressions (v2) (part 2)
2007-01-29 23:42 ` Mike Christie
@ 2007-01-30 0:23 ` Linus Torvalds
2007-01-30 0:55 ` Mike Christie
0 siblings, 1 reply; 37+ messages in thread
From: Linus Torvalds @ 2007-01-30 0:23 UTC (permalink / raw)
To: Mike Christie
Cc: Mike Galbraith, Uwe Bugla, Adrian Bunk, Andrew Morton, gd, alan,
linux-ide, B.Zolnierkiewicz, Jeff Garzik, Jens Axboe,
James Bottomley, FUJITA Tomonori, Boaz Harrosh
On Mon, 29 Jan 2007, Mike Christie wrote:
>
> rq->bio is NULL here, so no data is copied back to userspace and it seems
> nero just stops trying to talk to the drive after this.
Well, except that's what we used to do in 2.6.19 too. So what changed?
> Because nero just gives up, no more commands are sent and we do not get
> flooded with status errors like before so it sort of looks like it
> solves the problem but it doesn't - at least that is what is happening here.
Yeah, I can't burn with Nero either, although I'll also say that I've
never even tried before, so my leet Nero skillz are nonexistent.
Linus
* Re: 2.6.20-rc6: known unfixed regressions (v2) (part 2)
2007-01-30 0:23 ` Linus Torvalds
@ 2007-01-30 0:55 ` Mike Christie
2007-01-30 1:04 ` Mike Christie
0 siblings, 1 reply; 37+ messages in thread
From: Mike Christie @ 2007-01-30 0:55 UTC (permalink / raw)
To: Linus Torvalds
Cc: Mike Galbraith, Uwe Bugla, Adrian Bunk, Andrew Morton, gd, alan,
linux-ide, B.Zolnierkiewicz, Jeff Garzik, Jens Axboe,
James Bottomley, FUJITA Tomonori, Boaz Harrosh
Linus Torvalds wrote:
>
> On Mon, 29 Jan 2007, Mike Christie wrote:
>> rq->bio is NULL here, so no data is copied back to userspace and it seems
>> nero just stops trying to talk to the drive after this.
>
> Well, except that's what we used to do in 2.6.19 too. So what changed?
Oops, you are right. I thought you reverted the place where rq->bio was
getting set to bio. Ignore my comment.
* Re: 2.6.20-rc6: known unfixed regressions (v2) (part 2)
2007-01-30 0:55 ` Mike Christie
@ 2007-01-30 1:04 ` Mike Christie
2007-01-30 1:45 ` Linus Torvalds
0 siblings, 1 reply; 37+ messages in thread
From: Mike Christie @ 2007-01-30 1:04 UTC (permalink / raw)
To: Mike Christie
Cc: Linus Torvalds, Mike Galbraith, Uwe Bugla, Adrian Bunk,
Andrew Morton, gd, alan, linux-ide, B.Zolnierkiewicz, Jeff Garzik,
Jens Axboe, James Bottomley, FUJITA Tomonori, Boaz Harrosh
Mike Christie wrote:
> Linus Torvalds wrote:
>> On Mon, 29 Jan 2007, Mike Christie wrote:
>>> rq->bio is NULL here, so no data is copied back to userspace and it seems
>>> nero just stops trying to talk to the drive after this.
>> Well, except that's what we used to do in 2.6.19 too. So what changed?
>
Actually, I do not think we did this in 2.6.19. Tomo added a bug when he
ported a patch and mixed up some things so we did something weird for
2.6.20-rc1.
> Oops, you are right. I thought you reverted the place where rq->bio was
> getting set to bio. Ignore my comment.
>
I think I am right now :)
In 2.6.19, we did:
bio = rq->bio;
blk_execute_rq() <- the execution sets rq->bio to null so that is why we
save a bio pointer.
blk_rq_unmap_user(bio);
For a while in 2.6.20-rc1, we basically did
blk_execute_rq()
blk_rq_unmap_user(rq->bio) <- this was a bug that caused the mem leak
and caused data to not be copied because rq->bio was null.
Tomo and Jens then fixed this in 2.6.20-rc2 or rc3 to what we have in rc6:
bio = rq->bio;
blk_execute_rq() <- the execution sets rq->bio to null so that is why we
save a bio pointer.
blk_rq_unmap_user(bio);
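The difference between the two orderings can be shown with a minimal userspace sketch. It assumes only what the message states, namely that execution leaves rq->bio NULL; all types and functions here (`model_bio`, `model_request`, `model_execute`, `model_unmap`) are hypothetical stand-ins and are not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures. */
struct model_bio { int mapped; };
struct model_request { struct model_bio *bio; };

/* Stand-in for executing the request: completion leaves rq->bio NULL. */
void model_execute(struct model_request *rq)
{
    assert(rq != NULL);
    rq->bio = NULL;
}

/* Stand-in for unmapping: returns 1 if a bio was released, 0 if none. */
int model_unmap(struct model_bio *bio)
{
    if (bio == NULL)
        return 0;
    bio->mapped = 0;
    return 1;
}

/* Ordering described for 2.6.20-rc1: unmap whatever rq->bio holds afterwards. */
int run_without_saved_pointer(struct model_bio *bio)
{
    struct model_request rq = { .bio = bio };

    model_execute(&rq);
    return model_unmap(rq.bio);     /* rq.bio is NULL here: nothing released */
}

/* Ordering described for 2.6.19 and rc6: save the first bio before executing. */
int run_with_saved_pointer(struct model_bio *bio)
{
    struct model_request rq = { .bio = bio };
    struct model_bio *saved = rq.bio;

    model_execute(&rq);
    return model_unmap(saved);      /* the saved pointer is still valid */
}
```

In this model, `run_without_saved_pointer()` returns 0 and leaves the bio marked as mapped, which corresponds to the leak, while `run_with_saved_pointer()` returns 1 and releases it.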
* Re: 2.6.20-rc6: known unfixed regressions (v2) (part 2)
2007-01-30 1:04 ` Mike Christie
@ 2007-01-30 1:45 ` Linus Torvalds
2007-01-30 2:50 ` Mike Christie
0 siblings, 1 reply; 37+ messages in thread
From: Linus Torvalds @ 2007-01-30 1:45 UTC (permalink / raw)
To: Mike Christie
Cc: Mike Galbraith, Uwe Bugla, Adrian Bunk, Andrew Morton, gd, alan,
linux-ide, B.Zolnierkiewicz, Jeff Garzik, Jens Axboe,
James Bottomley, FUJITA Tomonori, Boaz Harrosh
On Mon, 29 Jan 2007, Mike Christie wrote:
>
> Actually, I do not think we did this in 2.6.19. Tomo added a bug when he
> ported a patch and mixed up some things so we did something weird for
> 2.6.20-rc1.
Ah, ok. Warring bugs. Have you pinpointed the original one? Is it your
original "block: support larger block pc requests" after all? It looks
like yours is the one that did the big changes, with Tomo then fixing some
of the fallout?
In fact, now that I look closer, I see that it's definitely your first
patch that first removes the old code:
> In 2.6.19, we did:
>
> bio = rq->bio;
> blk_execute_rq() <- the execution sets rq->bio to null so that is why we
> save a bio pointer.
> blk_rq_unmap_user(bio);
And then your patch introduces the bug:
> For a while in 2.6.20-rc1, we basically did
>
> blk_execute_rq()
> blk_rq_unmap_user(rq->bio) <- this was a bug that caused the mem leak
> and caused data to not be copied because rq->bio was null.
.. and perhaps introduced something else too?
and thus:
> Tomo and Jens then fixed this in 2.6.20-rc2 or rc3 to what we have in rc6:
>
> bio = rq->bio;
> blk_execute_rq() <- the execution sets rq->bio to null so that is why we
> save a bio pointer.
> blk_rq_unmap_user(bio);
didn't actually help, because it fixed the bio leak, but it didn't fix
whatever else went wrong..
Mike, with (a) you being attributed as the author of that original
"support larger block pc requests" and (b) Jens apparently off gallivanting
somewhere, I really hope you can find this, because otherwise I get the
feeling that I have to revert that series if I'm to release a 2.6.20 in
any kind of timely manner.
(Not that I haven't held up releases before, but this seems to be the main
remaining thing, and in that light I think I'd probably end up
reverting..)
No pressure ;)
(But it certainly doesn't have to be "today", so don't take it that way)
Linus
* Re: 2.6.20-rc6: known unfixed regressions (v2) (part 2)
2007-01-30 1:45 ` Linus Torvalds
@ 2007-01-30 2:50 ` Mike Christie
2007-01-30 3:02 ` Mike Christie
2007-01-30 3:08 ` Andrew Morton
0 siblings, 2 replies; 37+ messages in thread
From: Mike Christie @ 2007-01-30 2:50 UTC (permalink / raw)
To: Linus Torvalds
Cc: Mike Galbraith, Uwe Bugla, Adrian Bunk, Andrew Morton, gd, alan,
linux-ide, B.Zolnierkiewicz, Jeff Garzik, Jens Axboe,
James Bottomley, FUJITA Tomonori, Boaz Harrosh
[-- Attachment #1: Type: text/plain, Size: 820 bytes --]
Linus Torvalds wrote:
>
> On Mon, 29 Jan 2007, Mike Christie wrote:
>> Actually, I do not think we did this in 2.6.19. Tomo added a bug when he
>> ported a patch and mixed up some things so we did something weird for
>> 2.6.20-rc1.
>
> Ah, ok. Warring bugs. Have you pinpointed the original one? Is it your
> original "block: support larger block pc requests" after all? It looks
> like yours is the one that did the big changes, with Tomo then fixing some
> of the fallout?
I am not completely convinced this bug is the "block: support larger
block pc requests" patches' fault. If I revert the jiffies_to_msecs usage
then it works for me. On my system I have
CONFIG_HZ_250=y
CONFIG_HZ=250
With the attached patch, nero finds the cd drives and I can burn disks.
There are no errors from the ide layer like before.
[-- Attachment #2: use-old-timeout-calc.patch --]
[-- Type: text/x-patch, Size: 398 bytes --]
diff --git a/block/scsi_ioctl.c b/block/scsi_ioctl.c
index 2528a0c..aded9a0 100644
--- a/block/scsi_ioctl.c
+++ b/block/scsi_ioctl.c
@@ -271,7 +271,7 @@ static int sg_io(struct file *file, requ
rq->cmd_type = REQ_TYPE_BLOCK_PC;
- rq->timeout = jiffies_to_msecs(hdr->timeout);
+ rq->timeout = (hdr->timeout * HZ) / 1000;
if (!rq->timeout)
rq->timeout = q->sg_timeout;
if (!rq->timeout)
* Re: 2.6.20-rc6: known unfixed regressions (v2) (part 2)
2007-01-30 2:50 ` Mike Christie
@ 2007-01-30 3:02 ` Mike Christie
2007-01-30 3:08 ` Andrew Morton
1 sibling, 0 replies; 37+ messages in thread
From: Mike Christie @ 2007-01-30 3:02 UTC (permalink / raw)
To: Linus Torvalds
Cc: Mike Galbraith, Uwe Bugla, Adrian Bunk, Andrew Morton, gd, alan,
linux-ide, B.Zolnierkiewicz, Jeff Garzik, Jens Axboe,
James Bottomley, FUJITA Tomonori, Boaz Harrosh
Mike Christie wrote:
> Linus Torvalds wrote:
>> On Mon, 29 Jan 2007, Mike Christie wrote:
>>> Actually, I do not think we did this in 2.6.19. Tomo added a bug when he
>>> ported a patch and mixed up some things so we did something weird for
>>> 2.6.20-rc1.
>> Ah, ok. Warring bugs. Have you pinpointed the original one? Is it your
>> original "block: support larger block pc requests" after all? It looks
>> like yours is the one that did the big changes, with Tomo then fixing some
>> of the fallout?
>
> I am not completely convinced this bug is the "block: support larger
> block pc requests" patches' fault. If I revert the jiffies_to_msecs usage
> then it works for me. On my system I have
> CONFIG_HZ_250=y
> CONFIG_HZ=250
>
> With the attached patch, nero finds the cd drives and I can burn disks.
> There are no errors from the ide layer like before.
>
>
> ------------------------------------------------------------------------
>
> diff --git a/block/scsi_ioctl.c b/block/scsi_ioctl.c
> index 2528a0c..aded9a0 100644
> --- a/block/scsi_ioctl.c
> +++ b/block/scsi_ioctl.c
> @@ -271,7 +271,7 @@ static int sg_io(struct file *file, requ
>
> rq->cmd_type = REQ_TYPE_BLOCK_PC;
>
> - rq->timeout = jiffies_to_msecs(hdr->timeout);
> + rq->timeout = (hdr->timeout * HZ) / 1000;
> if (!rq->timeout)
> rq->timeout = q->sg_timeout;
> if (!rq->timeout)
Or if that patch is wrong maybe we want something like what
drivers/scsi/sg.c uses:
ul_timeout = msecs_to_jiffies(srp->header.timeout);
timeout = (ul_timeout < INT_MAX) ? ul_timeout : INT_MAX;
I do not think we should be doing jiffies_to_msecs though.
* Re: 2.6.20-rc6: known unfixed regressions (v2) (part 2)
2007-01-30 2:50 ` Mike Christie
2007-01-30 3:02 ` Mike Christie
@ 2007-01-30 3:08 ` Andrew Morton
2007-01-30 2:18 ` Mike Christie
2007-01-30 4:00 ` Mike Galbraith
1 sibling, 2 replies; 37+ messages in thread
From: Andrew Morton @ 2007-01-30 3:08 UTC (permalink / raw)
To: Mike Christie
Cc: Linus Torvalds, Mike Galbraith, Uwe Bugla, Adrian Bunk, gd, alan,
linux-ide, B.Zolnierkiewicz, Jeff Garzik, Jens Axboe,
James Bottomley, FUJITA Tomonori, Boaz Harrosh
On Mon, 29 Jan 2007 20:50:58 -0600
Mike Christie <michaelc@cs.wisc.edu> wrote:
> With the attached patch, nero finds the cd drives and I can burn disks.
> There are no errors from the ide layer like before.
>
>
> [use-old-timeout-calc.patch text/x-patch (399B)]
> diff --git a/block/scsi_ioctl.c b/block/scsi_ioctl.c
> index 2528a0c..aded9a0 100644
> --- a/block/scsi_ioctl.c
> +++ b/block/scsi_ioctl.c
> @@ -271,7 +271,7 @@ static int sg_io(struct file *file, requ
>
> rq->cmd_type = REQ_TYPE_BLOCK_PC;
>
> - rq->timeout = jiffies_to_msecs(hdr->timeout);
> + rq->timeout = (hdr->timeout * HZ) / 1000;
Yes, that was a buggy conversion - it should have been msecs_to_jiffies().
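The size of the error is easy to work out with a small sketch, assuming the CONFIG_HZ=250 configuration reported earlier in the thread. The `model_*` functions below are simplified, hypothetical stand-ins (truncating division, HZ dividing 1000 evenly) and are not the kernel's implementations.

```c
#include <assert.h>

#define MODEL_HZ 250u   /* assumption: matches the CONFIG_HZ=250 reported above */

/* Ticks to milliseconds: the direction the buggy line used. */
unsigned long model_jiffies_to_msecs(unsigned long ticks)
{
    assert(1000u % MODEL_HZ == 0);
    return ticks * (1000u / MODEL_HZ);
}

/* Milliseconds to ticks: the direction a millisecond timeout needs.
 * For HZ dividing 1000 this equals the truncating (ms * HZ) / 1000. */
unsigned long model_msecs_to_jiffies(unsigned long ms)
{
    assert(1000u % MODEL_HZ == 0);
    return ms / (1000u / MODEL_HZ);
}
```

For an example timeout of 60000 ms (an assumed value), `model_msecs_to_jiffies(60000)` gives 15000 ticks, whereas feeding the same number through `model_jiffies_to_msecs(60000)` gives 240000, a factor of 16 away from the intended value at this HZ.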
* Re: 2.6.20-rc6: known unfixed regressions (v2) (part 2)
2007-01-30 3:08 ` Andrew Morton
@ 2007-01-30 2:18 ` Mike Christie
2007-01-30 3:33 ` Andrew Morton
2007-01-30 4:44 ` Linus Torvalds
2007-01-30 4:00 ` Mike Galbraith
1 sibling, 2 replies; 37+ messages in thread
From: Mike Christie @ 2007-01-30 2:18 UTC (permalink / raw)
To: Andrew Morton
Cc: Linus Torvalds, Mike Galbraith, Uwe Bugla, Adrian Bunk, gd, alan,
linux-ide, B.Zolnierkiewicz, Jeff Garzik, Jens Axboe,
James Bottomley, FUJITA Tomonori, Boaz Harrosh
On Mon, 2007-01-29 at 19:08 -0800, Andrew Morton wrote:
> On Mon, 29 Jan 2007 20:50:58 -0600
> Mike Christie <michaelc@cs.wisc.edu> wrote:
>
> > With the attached patch, nero finds the cd drives and I can burn disks.
> > There are no errors from the ide layer like before.
> >
> >
> > [use-old-timeout-calc.patch text/x-patch (399B)]
> > diff --git a/block/scsi_ioctl.c b/block/scsi_ioctl.c
> > index 2528a0c..aded9a0 100644
> > --- a/block/scsi_ioctl.c
> > +++ b/block/scsi_ioctl.c
> > @@ -271,7 +271,7 @@ static int sg_io(struct file *file, requ
> >
> > rq->cmd_type = REQ_TYPE_BLOCK_PC;
> >
> > - rq->timeout = jiffies_to_msecs(hdr->timeout);
> > + rq->timeout = (hdr->timeout * HZ) / 1000;
>
> Yes, that was a buggy conversion - it should have been msecs_to_jiffies().
Ok. here is a fix with the overflow check sg.c has. Patch
was made against Linus's tree and tested with nero.
Userspace does not send us jiffies. Use msecs_to_jiffies
and check for overflow like sg.c
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
diff --git a/block/scsi_ioctl.c b/block/scsi_ioctl.c
index 2528a0c..5ca72c5 100644
--- a/block/scsi_ioctl.c
+++ b/block/scsi_ioctl.c
@@ -223,7 +223,7 @@ static int verify_command(struct file *f
static int sg_io(struct file *file, request_queue_t *q,
struct gendisk *bd_disk, struct sg_io_hdr *hdr)
{
- unsigned long start_time;
+ unsigned long start_time. timeout;
int writing = 0, ret = 0;
struct request *rq;
char sense[SCSI_SENSE_BUFFERSIZE];
@@ -271,7 +271,8 @@ static int sg_io(struct file *file, requ
rq->cmd_type = REQ_TYPE_BLOCK_PC;
- rq->timeout = jiffies_to_msecs(hdr->timeout);
+ timeout = msecs_to_jiffies(hdr->timeout);
+ rq->timeout = (timeout < INT_MAX) ? timeout : INT_MAX;
if (!rq->timeout)
rq->timeout = q->sg_timeout;
if (!rq->timeout)
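The overflow check can be tried in isolation with a minimal userspace sketch. It assumes the converted value is later treated as an int, as the sg.c-style comparison against INT_MAX suggests; `model_request_timeout` and its `hz` parameter are hypothetical, and the conversion truncates rather than reproducing the kernel's msecs_to_jiffies().

```c
#include <assert.h>
#include <limits.h>

/*
 * Convert a millisecond timeout to ticks at the given tick rate and
 * clamp the result to INT_MAX, mirroring the shape of the check in
 * the patch above.
 */
int model_request_timeout(unsigned int timeout_ms, unsigned int hz)
{
    unsigned long ticks;

    assert(hz > 0 && 1000u % hz == 0);

    ticks = timeout_ms / (1000u / hz);
    return (ticks < (unsigned long)INT_MAX) ? (int)ticks : INT_MAX;
}
```

At 250 ticks per second, a 60000 ms timeout becomes 15000 ticks. At 1000 ticks per second, a timeout of UINT_MAX milliseconds would not fit in an int, so the function returns INT_MAX instead.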
* Re: 2.6.20-rc6: known unfixed regressions (v2) (part 2)
2007-01-30 2:18 ` Mike Christie
@ 2007-01-30 3:33 ` Andrew Morton
2007-01-30 4:44 ` Linus Torvalds
1 sibling, 0 replies; 37+ messages in thread
From: Andrew Morton @ 2007-01-30 3:33 UTC (permalink / raw)
To: Mike Christie
Cc: Linus Torvalds, Mike Galbraith, Uwe Bugla, Adrian Bunk, gd, alan,
linux-ide, B.Zolnierkiewicz, Jeff Garzik, Jens Axboe,
James Bottomley, FUJITA Tomonori, Boaz Harrosh
On Mon, 29 Jan 2007 21:18:38 -0500
Mike Christie <michaelc@cs.wisc.edu> wrote:
> Ok. here is a fix with the overflow check sg.c has. Patch
> was made against Linus's tree and tested with nero.
>
> Userspace does not send us jiffies. Use msecs_to_jiffies
> and check for overflow like sg.c
>
> Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
>
> diff --git a/block/scsi_ioctl.c b/block/scsi_ioctl.c
> index 2528a0c..5ca72c5 100644
> --- a/block/scsi_ioctl.c
> +++ b/block/scsi_ioctl.c
> @@ -223,7 +223,7 @@ static int verify_command(struct file *f
> static int sg_io(struct file *file, request_queue_t *q,
> struct gendisk *bd_disk, struct sg_io_hdr *hdr)
> {
> - unsigned long start_time;
> + unsigned long start_time. timeout;
you'll be wanting a comma there.
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: 2.6.20-rc6: known unfixed regressions (v2) (part 2)
2007-01-30 2:18 ` Mike Christie
2007-01-30 3:33 ` Andrew Morton
@ 2007-01-30 4:44 ` Linus Torvalds
1 sibling, 0 replies; 37+ messages in thread
From: Linus Torvalds @ 2007-01-30 4:44 UTC (permalink / raw)
To: Mike Christie
Cc: Andrew Morton, Mike Galbraith, Uwe Bugla, Adrian Bunk, gd, alan,
linux-ide, B.Zolnierkiewicz, Jeff Garzik, Jens Axboe,
James Bottomley, FUJITA Tomonori, Boaz Harrosh
On Mon, 29 Jan 2007, Mike Christie wrote:
>
> Ok. here is a fix with the overflow check sg.c has. Patch
> was made against Linus's tree and tested with nero.
>
> Userspace does not send us jiffies. Use msecs_to_jiffies
> and check for overflow like sg.c
>
> Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Thanks. Fixed the ',' that Andrew pointed out, committed and pushed out.
What a stupid bug. And I even _looked_ at that patch, since there weren't
that many that changed anything in this area, and it looked just subtly
right enough that no warning bells ever went off.
Thanks to everybody, but especially Mike for showing us the errors of our
ways.
Linus
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: 2.6.20-rc6: known unfixed regressions (v2) (part 2)
2007-01-30 3:08 ` Andrew Morton
2007-01-30 2:18 ` Mike Christie
@ 2007-01-30 4:00 ` Mike Galbraith
1 sibling, 0 replies; 37+ messages in thread
From: Mike Galbraith @ 2007-01-30 4:00 UTC (permalink / raw)
To: Andrew Morton
Cc: Mike Christie, Linus Torvalds, Uwe Bugla, Adrian Bunk, gd, alan,
linux-ide, B.Zolnierkiewicz, Jeff Garzik, Jens Axboe,
James Bottomley, FUJITA Tomonori, Boaz Harrosh
On Mon, 2007-01-29 at 19:08 -0800, Andrew Morton wrote:
> On Mon, 29 Jan 2007 20:50:58 -0600
> Mike Christie <michaelc@cs.wisc.edu> wrote:
>
> > With the attached patch, nero finds the cd drives and I can burn disks.
> > There are no errors from the ide layer like before.
> >
> >
> > [use-old-timeout-calc.patch text/x-patch (399B)]
> > diff --git a/block/scsi_ioctl.c b/block/scsi_ioctl.c
> > index 2528a0c..aded9a0 100644
> > --- a/block/scsi_ioctl.c
> > +++ b/block/scsi_ioctl.c
> > @@ -271,7 +271,7 @@ static int sg_io(struct file *file, requ
> >
> > rq->cmd_type = REQ_TYPE_BLOCK_PC;
> >
> > - rq->timeout = jiffies_to_msecs(hdr->timeout);
> > + rq->timeout = (hdr->timeout * HZ) / 1000;
>
> Yes, that was a buggy conversion - it should have been msecs_to_jiffies().
Confirmed. With that line changed to msecs_to_jiffies() in 2.6.20-rc6,
nero is a happy camper. Burner selected, disk burned, no whimpering.
-Mike
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: 2.6.20-rc6: known unfixed regressions (v2) (part 2)
2007-01-29 20:37 ` Mike Christie
2007-01-29 20:58 ` Linus Torvalds
@ 2007-01-30 3:25 ` Mike Galbraith
2007-01-30 4:14 ` Jeff Garzik
2 siblings, 0 replies; 37+ messages in thread
From: Mike Galbraith @ 2007-01-30 3:25 UTC (permalink / raw)
To: Mike Christie
Cc: Linus Torvalds, Uwe Bugla, Adrian Bunk, Andrew Morton, gd, alan,
linux-ide, B.Zolnierkiewicz, Jeff Garzik, Jens Axboe,
James Bottomley
On Mon, 2007-01-29 at 14:37 -0600, Mike Christie wrote:
> Linus Torvalds wrote:
> >>
> >> [ 4362.972995] hdd: status error: status=0x58 { DriveReady SeekComplete DataRequest }
> >> [ 4362.981475] ide: failed opcode was: unknown
> >> [ 4362.986183] hdd: drive not ready for command
> >
>
> What chipsets are you guys using?
ICH5 here.
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: 2.6.20-rc6: known unfixed regressions (v2) (part 2)
2007-01-29 20:37 ` Mike Christie
2007-01-29 20:58 ` Linus Torvalds
2007-01-30 3:25 ` Mike Galbraith
@ 2007-01-30 4:14 ` Jeff Garzik
2 siblings, 0 replies; 37+ messages in thread
From: Jeff Garzik @ 2007-01-30 4:14 UTC (permalink / raw)
To: Mike Christie
Cc: Linus Torvalds, Mike Galbraith, Uwe Bugla, Adrian Bunk,
Andrew Morton, gd, alan, linux-ide, B.Zolnierkiewicz, Jens Axboe,
James Bottomley, Tejun Heo
Mike Christie wrote:
> Strangely, if I use the sata driver, then nero uses the sg driver
> (drivers/scsi/sg.c) instead of the block layer sg io code. But the sata
> code spits out:
>
> Jan 29 12:03:21 madmax kernel: ata1.00: ATAPI check failed
> Jan 29 12:03:21 madmax kernel: ata1.00: exception Emask 0x0 SAct 0x0
> SErr 0x0 action 0x2 frozen
> Jan 29 12:03:21 madmax kernel: ata1.00: cmd
> a0/00:00:00:00:20/00:00:00:00:00/a0 tag 0 cdb 0xac data 808 in
> Jan 29 12:03:21 madmax kernel: res
> 51/51:03:00:00:20/00:00:00:00:00/a0 Emask 0x3 (HSM violation)
> Jan 29 12:03:21 madmax kernel: ata1: soft resetting port
> Jan 29 12:03:21 madmax kernel: ATA: abnormal status 0xD0 on port 0x1F7
> Jan 29 12:03:21 madmax kernel: ata1.00: failed to IDENTIFY (I/O error,
> err_mask=0x2)
> Jan 29 12:03:21 madmax kernel: ata1.00: revalidation failed (errno=-5)
-5 == EIO
> Jan 29 12:03:21 madmax kernel: ata1: failed to recover some devices,
> retrying in 5 secs
> Jan 29 12:03:33 madmax kernel: ata1: port is slow to respond, please be
> patient (Status 0xd0)
> Jan 29 12:03:56 madmax kernel: ata1: port failed to respond (30 secs,
> Status 0xd0)
> Jan 29 12:03:56 madmax kernel: ata1: soft resetting port
> Jan 29 12:03:57 madmax kernel: ata1.00: configured for PIO4
> Jan 29 12:03:57 madmax kernel: ata1: EH complete
It's not critical for 2.6.20, since the default recommendation for PATA
is still "use drivers/ide", but Alan Cox (and Tejun and me) should be
aware of cases where drivers/ide works but libata does not.
So, if you could post some machine and .config info regarding this to
linux-ide (CC'ing Alan), that would be appreciated.
Jeff
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: 2.6.20-rc6: known unfixed regressions (v2) (part 2)
2007-01-29 7:04 ` Linus Torvalds
` (3 preceding siblings ...)
2007-01-29 20:37 ` Mike Christie
@ 2007-02-02 13:07 ` Uwe Bugla
4 siblings, 0 replies; 37+ messages in thread
From: Uwe Bugla @ 2007-02-02 13:07 UTC (permalink / raw)
To: Linus Torvalds, efault
Cc: James.Bottomley, michaelc, jens.axboe, jgarzik, B.Zolnierkiewicz,
linux-ide, alan, gd, akpm, bunk
-------- Original Message --------
Date: Sun, 28 Jan 2007 23:04:49 -0800 (PST)
From: Linus Torvalds <torvalds@linux-foundation.org>
To: Mike Galbraith <efault@gmx.de>
CC: Uwe Bugla <uwe.bugla@gmx.de>, Adrian Bunk <bunk@stusta.de>, Andrew Morton <akpm@osdl.org>, gd@spherenet.de, alan@lxorguk.ukuu.org.uk, linux-ide@vger.kernel.org, B.Zolnierkiewicz@elka.pw.edu.pl, Jeff Garzik <jgarzik@pobox.com>, Jens Axboe <jens.axboe@oracle.com>, Mike Christie <michaelc@cs.wisc.edu>, James Bottomley <James.Bottomley@SteelEye.com>
Subject: Re: 2.6.20-rc6: known unfixed regressions (v2) (part 2)
>
>
> [ Added Jeff, Jens and Mike Christie to Cc. I would _guess_ this is
> associated with the "larger block pc request" stuff: Mike, Jens? James B
> added for good luck.
>
> It apparently started happening somewhere between 2.6.19 and 2.6.20-rc2,
> and doing a
>
> gitk v2.6.19..v2.6.20-rc2 block/scsi_ioctl.c drivers/ide/
>
> I don't see anything else that really looks all that suspicious.. Unless
> maybe it's that "Fix SG_IO leak". Jeff added because of the hddtemp
> issue, but I think that was effectively SATA-only, so probably isn't
> relevant.
>
> Damn, I just realized that Jens is in the middle of his vacation for a
> week..
>
> Mike, can you please look at this and check? ]
>
> On Mon, 29 Jan 2007, Mike Galbraith wrote:
> >
> > FWIW, I just tried it with 2.6.20-rc6, and can confirm. Once nero is
> > run, the kernel never gives up retrying whatever command failed, so I
> > get...
> >
> > [ 4362.972995] hdd: status error: status=0x58 { DriveReady SeekComplete
> DataRequest }
> > [ 4362.981475] ide: failed opcode was: unknown
> > [ 4362.986183] hdd: drive not ready for command
>
> Ok, I tried the demo version you can download at
>
> http://www.nero.com/eng/nerolinux-prog.html
>
> and yes, nerolinux seems broken. I've never used it before, but it
> triggers:
>
> hda: irq timeout: status=0xc0 { Busy }
> ide: failed opcode was: unknown
>
> and after that there is indeed an endless stream of:
>
> hda: status error: status=0x58 { DriveReady SeekComplete DataRequest }
> ide: failed opcode was: unknown
> hda: drive not ready for command
>
> Which eventually switches to
>
> hda: status error: status=0x59 { DriveReady SeekComplete DataRequest
> Error }
> hda: status error: error=0x40 { LastFailedSense=0x04 }
> ide: failed opcode was: unknown
> hda: drive not ready for command
>
> However, it appears to be rather hard to debug, with nerolinux being some
> closed black box. Does anybody who knows nero know if there is some way to
> get debug information out of it to see what it tried to do?
>
> Can somebody try to bisect this?
>
> Linus
Unfortunately not me, Linus, sorry for my ignorance in those things.
But I will try rc7 and see whether this terrible bug is gone.
Regards
Uwe
^ permalink raw reply [flat|nested] 37+ messages in thread