* Re: [Bugme-new] [Bug 16058] New: [BUG] Cannot boot any kernel from 2.6.27 on if a 256 byte sector SCSI disk is attached
[not found] <20100527100752.c12d8e2e.akpm@linux-foundation.org>
@ 2010-05-27 20:30 ` Alan Stern
2010-05-27 20:43 ` Mark Hounschell
2010-05-28 11:51 ` Mark Hounschell
0 siblings, 2 replies; 6+ messages in thread
From: Alan Stern @ 2010-05-27 20:30 UTC
To: markh; +Cc: SCSI development list, bugzilla-daemon
On Thu, 27 May 2010, Andrew Morton wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=16058
>
> Summary: [BUG] Cannot boot any kernel from 2.6.27 on if a 256
> byte sector SCSI disk is attached
> As of 2.6.27 if any SCSI disk is attached that has been formatted with a 256
> byte sector size, the boot process hangs. 512, 768, and 1024 byte sector disks
> do not seem to trigger this. The disks in use do NOT have a partition table.
> They are being used by our applications via the sg_io interface only.
>
> A 2.6.26.8 kernel works fine.
>
> I have bisected this problem to the following commit:
>
> # git bisect good
> 427e59f09fdba387547106de7bab980b7fff77be is first bad commit
> commit 427e59f09fdba387547106de7bab980b7fff77be
> Author: James Bottomley <James.Bottomley@HansenPartnership.com>
> Date: Sat Mar 8 18:24:17 2008 -0600
>
> [SCSI] make use of the residue value
>
> USB sometimes doesn't return an error but instead returns a residue
> value indicating part (or all) of the command wasn't completed. So if
> the driver _done() error processing indicates the command was fully
> processed, subtract off the residue so that this USB error gets
> propagated.
>
> Cc: Alan Stern <stern@rowland.harvard.edu>
> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
>
> :040000 040000 d3bad84ebe1bc231e8e7d6267907ca62fd4d0dcd
> c85f8cb8bd4910724f0101e41054555980727e16 M drivers
>
> Now, what USB has to do with my SCSI disks is beyond me. I have a
> feeling that this commit is just uncovering another problem. I've attached
> a bootlog from a serial console that ends where the boot hangs.
>
> This does the same thing on a 2.6.34 kernel. Anything I can do to help, I'm
> available.
I'd guess that this has nothing to do with the sector size. Instead
the drive probably reports a non-zero residue when it shouldn't. Can
you add some debugging printk's to the patch to find out in more detail
what's going wrong?
Alan Stern
* Re: [Bugme-new] [Bug 16058] New: [BUG] Cannot boot any kernel from 2.6.27 on if a 256 byte sector SCSI disk is attached
2010-05-27 20:30 ` [Bugme-new] [Bug 16058] New: [BUG] Cannot boot any kernel from 2.6.27 on if a 256 byte sector SCSI disk is attached Alan Stern
@ 2010-05-27 20:43 ` Mark Hounschell
2010-05-27 21:17 ` Alan Stern
2010-05-28 11:51 ` Mark Hounschell
1 sibling, 1 reply; 6+ messages in thread
From: Mark Hounschell @ 2010-05-27 20:43 UTC
To: Alan Stern; +Cc: SCSI development list, bugzilla-daemon
On 05/27/2010 04:30 PM, Alan Stern wrote:
> On Thu, 27 May 2010, Andrew Morton wrote:
>
>> https://bugzilla.kernel.org/show_bug.cgi?id=16058
>>
>> Summary: [BUG] Cannot boot any kernel from 2.6.27 on if a 256
>> byte sector SCSI disk is attached
>
>> As of 2.6.27 if any SCSI disk is attached that has been formatted with a 256
>> byte sector size, the boot process hangs. 512, 768, and 1024 byte sector disks
>> do not seem to trigger this. The disks in use do NOT have a partition table.
>> They are being used by our applications via the sg_io interface only.
>>
>> A 2.6.26.8 kernel works fine.
>>
>> I have bisected this problem to the following commit:
>>
>> # git bisect good
>> 427e59f09fdba387547106de7bab980b7fff77be is first bad commit
>> commit 427e59f09fdba387547106de7bab980b7fff77be
>> Author: James Bottomley <James.Bottomley@HansenPartnership.com>
>> Date: Sat Mar 8 18:24:17 2008 -0600
>>
>> [SCSI] make use of the residue value
>>
>> USB sometimes doesn't return an error but instead returns a residue
>> value indicating part (or all) of the command wasn't completed. So if
>> the driver _done() error processing indicates the command was fully
>> processed, subtract off the residue so that this USB error gets
>> propagated.
>>
>> Cc: Alan Stern <stern@rowland.harvard.edu>
>> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
>>
>> :040000 040000 d3bad84ebe1bc231e8e7d6267907ca62fd4d0dcd
>> c85f8cb8bd4910724f0101e41054555980727e16 M drivers
>>
>> Now, what USB has to do with my SCSI disks is beyond me. I have a
>> feeling that this commit is just uncovering another problem. I've attached
>> a bootlog from a serial console that ends where the boot hangs.
>>
>> This does the same thing on a 2.6.34 kernel. Anything I can do to help, I'm
>> available.
>
> I'd guess that this has nothing to do with the sector size. Instead
> the drive probably reports a non-zero residue when it shouldn't. Can
> you add some debugging printk's to the patch to find out in more detail
> what's going wrong?
>
> Alan Stern
>
>
Yes, I can. But first let me ask: since reverting this patch on at least
2.6.32 through 2.6.34 does not help, would it be better if I first did a
little more work to find out in which release reverting the patch stops
helping?
Mark
* Re: [Bugme-new] [Bug 16058] New: [BUG] Cannot boot any kernel from 2.6.27 on if a 256 byte sector SCSI disk is attached
2010-05-27 20:43 ` Mark Hounschell
@ 2010-05-27 21:17 ` Alan Stern
0 siblings, 0 replies; 6+ messages in thread
From: Alan Stern @ 2010-05-27 21:17 UTC
To: Mark Hounschell; +Cc: SCSI development list, bugzilla-daemon
On Thu, 27 May 2010, Mark Hounschell wrote:
> >> As of 2.6.27 if any SCSI disk is attached that has been formatted with a 256
> >> byte sector size, the boot process hangs. 512, 768, and 1024 byte sector disks
> >> This does the same thing on a 2.6.34 kernel. Anything I can do to help, I'm
> >> available.
> Yes, I can. But first let me ask: since reverting this patch on at least
> 2.6.32 through 2.6.34 does not help, would it be better if I first did a
> little more work to find out in which release reverting the patch stops
> helping?
I'd do 2.6.27 first, since that's where the problem started. Then move
on to 2.6.34. There may be two different problems.
Alan Stern
* Re: [Bugme-new] [Bug 16058] New: [BUG] Cannot boot any kernel from 2.6.27 on if a 256 byte sector SCSI disk is attached
2010-05-27 20:30 ` [Bugme-new] [Bug 16058] New: [BUG] Cannot boot any kernel from 2.6.27 on if a 256 byte sector SCSI disk is attached Alan Stern
2010-05-27 20:43 ` Mark Hounschell
@ 2010-05-28 11:51 ` Mark Hounschell
2010-05-28 14:58 ` Alan Stern
1 sibling, 1 reply; 6+ messages in thread
From: Mark Hounschell @ 2010-05-28 11:51 UTC
To: Alan Stern; +Cc: markh, SCSI development list, bugzilla-daemon
On 05/27/2010 04:30 PM, Alan Stern wrote:
> On Thu, 27 May 2010, Andrew Morton wrote:
>
>
>> https://bugzilla.kernel.org/show_bug.cgi?id=16058
>>
>> Summary: [BUG] Cannot boot any kernel from 2.6.27 on if a 256
>> byte sector SCSI disk is attached
>>
>
>> As of 2.6.27 if any SCSI disk is attached that has been formatted with a 256
>> byte sector size, the boot process hangs. 512, 768, and 1024 byte sector disks
>> do not seem to trigger this. The disks in use do NOT have a partition table.
>> They are being used by our applications via the sg_io interface only.
>>
>> A 2.6.26.8 kernel works fine.
>>
>> I have bisected this problem to the following commit:
>>
>> # git bisect good
>> 427e59f09fdba387547106de7bab980b7fff77be is first bad commit
>> commit 427e59f09fdba387547106de7bab980b7fff77be
>> Author: James Bottomley <James.Bottomley@HansenPartnership.com>
>> Date: Sat Mar 8 18:24:17 2008 -0600
>>
>> [SCSI] make use of the residue value
>>
>> USB sometimes doesn't return an error but instead returns a residue
>> value indicating part (or all) of the command wasn't completed. So if
>> the driver _done() error processing indicates the command was fully
>> processed, subtract off the residue so that this USB error gets
>> propagated.
>>
>> Cc: Alan Stern <stern@rowland.harvard.edu>
>> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
>>
>> :040000 040000 d3bad84ebe1bc231e8e7d6267907ca62fd4d0dcd
>> c85f8cb8bd4910724f0101e41054555980727e16 M drivers
>>
>> Now, what USB has to do with my SCSI disks is beyond me. I have a
>> feeling that this commit is just uncovering another problem. I've attached
>> a bootlog from a serial console that ends where the boot hangs.
>>
>> This does the same thing on a 2.6.34 kernel. Anything I can do to help, I'm
>> available.
>>
> I'd guess that this has nothing to do with the sector size. Instead
> the drive probably reports a non-zero residue when it shouldn't. Can
> you add some debugging printk's to the patch to find out in more detail
> what's going wrong?
>
> Alan Stern
>
>
Alan,
I've added some printks in scsi.c and aic7xxx_core.c.
A TUR:
ahc_calc_residual: Entered
ahc_calc_residual: return Case 2 sgptr = 0x00000001
ahc_calc_residual: Entered
ahc_calc_residual: return Case 5-1 resid = 0xe
ahc_calc_residual: return Case 5-2 resid = 0xe
scsi_finish_command: Entered for cmd(6):0x00 0x00 0x00 0x00 0x00 0x00
cmd->result = 0x08000002
good_bytes = 0x0
scsi_finish_command: Complete
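For readers following the log, the cmd->result word printed above can be unpacked into its byte fields. This is a minimal sketch, assuming the conventional Linux SCSI mid-layer packing of that era (driver byte in bits 24-31, host byte in bits 16-23, message byte in bits 8-15, status byte in bits 0-7). The helper name decode_result and the small name tables are illustrative only and are not kernel code.

```python
# Illustrative decoder for a SCSI result word as printed in the log.
# Assumed layout: (driver << 24) | (host << 16) | (msg << 8) | status.

# Only the values needed for this log are listed here.
STATUS_NAMES = {0x00: "GOOD", 0x02: "CHECK CONDITION"}
DRIVER_NAMES = {0x00: "DRIVER_OK", 0x08: "DRIVER_SENSE"}


def decode_result(result):
    """Split a 32-bit result word into its four byte fields."""
    status = result & 0xFF
    msg = (result >> 8) & 0xFF
    host = (result >> 16) & 0xFF
    driver = (result >> 24) & 0xFF
    return {
        "status": status,
        "msg": msg,
        "host": host,
        "driver": driver,
        "status_name": STATUS_NAMES.get(status, "unknown"),
        "driver_name": DRIVER_NAMES.get(driver, "unknown"),
    }


if __name__ == "__main__":
    for word in (0x08000002, 0x00000000):
        print(hex(word), decode_result(word))
```

Under these assumptions the first TUR's 0x08000002 reads as a CHECK CONDITION status with sense data present, while the later 0x00000000 results read as GOOD. The log does not include the sense data itself, so the reason for the check condition is not shown.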
Another TUR:
scsi_finish_command: Entered for cmd(6):0x00 0x00 0x00 0x00 0x00 0x00
cmd->result = 0x00000000
good_bytes = 0x0
scsi_finish_command: Complete
A Read Capacity:
scsi_finish_command: Entered for cmd(10):0x25 0x00 0x00 0x00 0x00 0x00
0x00 0x00 0x00 0x00
cmd->result = 0x00000000
good_bytes = 0x8
scsi_finish_command: Complete
sd 8:0:0:0: [sde] 7260582 256-byte hardware sectors (1859 MB)
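The size in the sd line above can be checked by hand. This is a small sketch, assuming the "MB" figure is in decimal megabytes (10^6 bytes) and rounded to the nearest whole number. It is not the kernel's own formatting code, and capacity_bytes is a hypothetical helper name.

```python
def capacity_bytes(sector_count, sector_size):
    """Total capacity in bytes for a disk with the given geometry."""
    return sector_count * sector_size


if __name__ == "__main__":
    total = capacity_bytes(7260582, 256)
    print(total, "bytes")
    # Assumption: decimal megabytes, rounded to the nearest integer.
    print(round(total / 1_000_000), "MB")
```

With the values from the log this gives 1858708992 bytes, which rounds to the 1859 MB reported.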
A Mode Sense:
scsi_finish_command: Entered for cmd(6):0x1a 0x00 0x3f 0x00 0x04 0x00
cmd->result = 0x00000000
good_bytes = 0x4
scsi_finish_command: Complete
sd 8:0:0:0: [sde] Write Protect is off
Another Mode Sense:
scsi_finish_command: Entered for cmd(6):0x1a 0x00 0x08 0x00 0x04 0x00
cmd->result = 0x00000000
good_bytes = 0x4
scsi_finish_command: Complete
Another Mode Sense:
ahc_calc_residual: Entered
ahc_calc_residual: return Case 5-1 resid = 0x8
ahc_calc_residual: return Case 5-2 resid = 0x8
scsi_finish_command: Entered for cmd(6):0x1a 0x00 0x08 0x00 0x20 0x00
cmd->result = 0x00000000
good_bytes = 0x20
scsi_finish_command: Complete
sd 8:0:0:0: [sde] Write cache: disabled, read cache: enabled, supports
DPO and FUA
Another TUR:
scsi_finish_command: Entered for cmd(6):0x00 0x00 0x00 0x00 0x00 0x00
cmd->result = 0x00000000
good_bytes = 0x0
scsi_finish_command: Complete
Another Read Capacity:
scsi_finish_command: Entered for cmd(10):0x25 0x00 0x00 0x00 0x00 0x00
0x00 0x00 0x00 0x00
cmd->result = 0x00000000
good_bytes = 0x8
scsi_finish_command: Complete
sd 8:0:0:0: [sde] 7260582 256-byte hardware sectors (1859 MB)
Another Mode Sense:
scsi_finish_command: Entered for cmd(6):0x1a 0x00 0x3f 0x00 0x04 0x00
cmd->result = 0x00000000
good_bytes = 0x4
scsi_finish_command: Complete
sd 8:0:0:0: [sde] Write Protect is off
Another Mode Sense:
scsi_finish_command: Entered for cmd(6):0x1a 0x00 0x08 0x00 0x04 0x00
cmd->result = 0x00000000
good_bytes = 0x4
scsi_finish_command: Complete
Another Mode Sense:
ahc_calc_residual: Entered
ahc_calc_residual: return Case 5-1 resid = 0x8
ahc_calc_residual: return Case 5-2 resid = 0x8
scsi_finish_command: Entered for cmd(6):0x1a 0x00 0x08 0x00 0x20 0x00
cmd->result = 0x00000000
good_bytes = 0x20
scsi_finish_command: Complete
sd 8:0:0:0: [sde] Write cache: disabled, read cache: enabled, supports
DPO and FUA
Another TUR:
scsi_finish_command: Entered for cmd(6):0x00 0x00 0x00 0x00 0x00 0x00
cmd->result = 0x00000000
good_bytes = 0x0
scsi_finish_command: Complete
Another Read Capacity:
scsi_finish_command: Entered for cmd(10):0x25 0x00 0x00 0x00 0x00 0x00
0x00 0x00 0x00 0x00
cmd->result = 0x00000000
good_bytes = 0x8
scsi_finish_command: Complete
sd 8:0:0:0: [sde] 7260582 256-byte hardware sectors (1859 MB)
Another Mode Sense:
scsi_finish_command: Entered for cmd(6):0x1a 0x00 0x3f 0x00 0x04 0x00
cmd->result = 0x00000000
good_bytes = 0x4
scsi_finish_command: Complete
sd 8:0:0:0: [sde] Write Protect is off
Another Mode Sense:
scsi_finish_command: Entered for cmd(6):0x1a 0x00 0x08 0x00 0x04 0x00
cmd->result = 0x00000000
good_bytes = 0x4
scsi_finish_command: Complete
Another Mode Sense:
ahc_calc_residual: Entered
ahc_calc_residual: return Case 5-1 resid = 0x8
ahc_calc_residual: return Case 5-2 resid = 0x8
scsi_finish_command: Entered for cmd(6):0x1a 0x00 0x08 0x00 0x20 0x00
cmd->result = 0x00000000
good_bytes = 0x20
scsi_finish_command: Complete
sd 8:0:0:0: [sde] Write cache: disabled, read cache: enabled, supports
DPO and FUA
First READ(10):
sde:
ahc_calc_residual: Entered
ahc_calc_residual: return Case 5-1 resid = 0x800
ahc_calc_residual: return Case 5-2 resid = 0x800
scsi_finish_command: Entered for cmd(10):0x28 0x00 0x00 0x00 0x00 0x00
0x00 0x00 0x08 0x00
cmd->result = 0x00000000
good_bytes == old_good_bytes = 0x800 scsi_get_resid(cmd) = 0x800
New good_bytes = 0x0
scsi_finish_command: Complete
From here it just keeps repeating this read of 8 blocks (2048 bytes), so
it looks like the machine is hung.
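The arithmetic in the last log entry can be reproduced with a short sketch. It models only what the printk output shows: when good_bytes equals the original request length, the residue reported by the driver is subtracted. The function name adjust_good_bytes is hypothetical, and this is a Python paraphrase of the logged behaviour, not the kernel's C code.

```python
def adjust_good_bytes(good_bytes, request_len, resid):
    """Subtract the residue when the whole request was reported as done.

    Mirrors the logged condition "good_bytes == old_good_bytes"; a request
    that was already reported as partially complete is left unchanged.
    """
    if good_bytes == request_len:
        good_bytes -= resid
    return good_bytes


if __name__ == "__main__":
    # Values from the first READ(10) in the log: 0x800 requested, resid 0x800.
    print(hex(adjust_good_bytes(0x800, 0x800, 0x800)))
    # The same request with a zero residue would be fully credited.
    print(hex(adjust_good_bytes(0x800, 0x800, 0x0)))
```

With a residue equal to the full request length, nothing is credited as transferred, which fits the read being issued again and again.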
Now, I know for a fact that _if_ this read CDB is actually being sent to
the drive, its actual residual count will be zero. These are working
disks and that read CDB is valid.
Why is ahc_calc_residual saying that the residual count is as though the
read never took place? I noticed that the first read on all the SATA
drives was for 4096 bytes. Why is this one only 2048? Should it have
been 4096, and does ahc_calc_residual assume that?
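As a cross-check on the numbers in this question, the READ(10) CDB from the log can be decoded by hand. This is a minimal sketch, assuming the standard READ(10) layout (opcode 0x28, big-endian 32-bit LBA in bytes 2-5, big-endian 16-bit transfer length in bytes 7-8). decode_read10 is an illustrative helper, not part of any driver.

```python
import struct


def decode_read10(cdb, sector_size):
    """Return (lba, block_count, byte_count) for a READ(10) CDB."""
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) CDB")
    (lba,) = struct.unpack(">I", bytes(cdb[2:6]))      # bytes 2-5
    (blocks,) = struct.unpack(">H", bytes(cdb[7:9]))   # bytes 7-8
    return lba, blocks, blocks * sector_size


if __name__ == "__main__":
    # CDB bytes copied from the log entry for the first READ(10).
    cdb = [0x28, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x08, 0x00]
    print(decode_read10(cdb, 256))
```

On a 256-byte-sector disk this decodes to LBA 0, 8 blocks, 2048 bytes, matching the 0x800 request length in the log.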
BTW, I'll be in and out all day today so I may not be able to respond
quickly.
One thing that all the machines showing this problem have in common is
the SCSI controller (aic7xxx).
Regards
Mark
* Re: [Bugme-new] [Bug 16058] New: [BUG] Cannot boot any kernel from 2.6.27 on if a 256 byte sector SCSI disk is attached
2010-05-28 11:51 ` Mark Hounschell
@ 2010-05-28 14:58 ` Alan Stern
2010-05-28 16:01 ` James Bottomley
0 siblings, 1 reply; 6+ messages in thread
From: Alan Stern @ 2010-05-28 14:58 UTC
To: Mark Hounschell; +Cc: markh, SCSI development list, bugzilla-daemon
On Fri, 28 May 2010, Mark Hounschell wrote:
> First READ(10):
>
> sde:
> ahc_calc_residual: Entered
> ahc_calc_residual: return Case 5-1 resid = 0x800
> ahc_calc_residual: return Case 5-2 resid = 0x800
>
> scsi_finish_command: Entered for cmd(10):0x28 0x00 0x00 0x00 0x00 0x00
> 0x00 0x00 0x08 0x00
> cmd->result = 0x00000000
> good_bytes == old_good_bytes = 0x800 scsi_get_resid(cmd) = 0x800
> New good_bytes = 0x0
> scsi_finish_command: Complete
>
> From here it just keeps repeating this read of 8 blocks (2048 bytes), so
> it looks like the machine is hung.
Probably not hung, just doing a lot of retries. It should time out
eventually, but it might take a long time (perhaps as long as 15
minutes). The combination of the block layer and the SCSI layer isn't
very good at knowing when to give up.
> Now, I know for a fact that _if_ this read CDB is actually being sent to
> the drive, its actual residual count will be zero. These are working
> disks and that read CDB is valid.
>
> Why is ahc_calc_residual saying that the residual count is as though the
> read never took place? I noticed that the first read on all the SATA
> drives was for 4096 bytes. Why is this one only 2048? Should it have
> been 4096, and does ahc_calc_residual assume that?
I don't know the answer to any of these questions. They could well be
due to bugs in the driver, and I know nothing about how the aic7xxx
driver works. You should talk to someone who does.
In the meantime, you can track this down a little farther by adding
printk's to the appropriate places in drivers/scsi/sd.c. Look at
sd_prep_fn() to see why there's 2048 bytes instead of 4096.
Alan Stern
* Re: [Bugme-new] [Bug 16058] New: [BUG] Cannot boot any kernel from 2.6.27 on if a 256 byte sector SCSI disk is attached
2010-05-28 14:58 ` Alan Stern
@ 2010-05-28 16:01 ` James Bottomley
0 siblings, 0 replies; 6+ messages in thread
From: James Bottomley @ 2010-05-28 16:01 UTC
To: Alan Stern; +Cc: Mark Hounschell, markh, SCSI development list, bugzilla-daemon
On Fri, 2010-05-28 at 10:58 -0400, Alan Stern wrote:
> On Fri, 28 May 2010, Mark Hounschell wrote:
>
> > First READ(10):
> >
> > sde:
> > ahc_calc_residual: Entered
> > ahc_calc_residual: return Case 5-1 resid = 0x800
> > ahc_calc_residual: return Case 5-2 resid = 0x800
> >
> > scsi_finish_command: Entered for cmd(10):0x28 0x00 0x00 0x00 0x00 0x00
> > 0x00 0x00 0x08 0x00
> > cmd->result = 0x00000000
> > good_bytes == old_good_bytes = 0x800 scsi_get_resid(cmd) = 0x800
> > New good_bytes = 0x0
> > scsi_finish_command: Complete
> >
> > From here it just keeps repeating this read of 8 blocks (2048 bytes), so
> > it looks like the machine is hung.
>
> Probably not hung, just doing a lot of retries. It should time out
> eventually, but it might take a long time (perhaps as long as 15
> minutes). The combination of the block layer and the SCSI layer isn't
> very good at knowing when to give up.
Actually, I think this is a partition read. Each partition manager
tends to read a page through the page cache. If we get an error, we
seem to re-read to fill the cache.
> > Now, I know for a fact that _if_ this read CDB is actually being sent to
> > the drive, its actual residual count will be zero. These are working
> > disks and that read CDB is valid.
> >
> > Why is ahc_calc_residual saying that the residual count is as though the
> > read never took place? I noticed that the first read on all the SATA
> > drives was for 4096 bytes. Why is this one only 2048? Should it have
> > been 4096, and does ahc_calc_residual assume that?
>
> I don't know the answer to any of these questions. They could well be
> due to bugs in the driver, and I know nothing about how the aic7xxx
> driver works. You should talk to someone who does.
I'll take this one ... although we're a bit lacking in documentation for
this driver.
I think the 2048 is because something is hardcoded to think 8 sectors is
a page.
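James's guess can be illustrated with simple arithmetic. The sketch below assumes a request fixed at 8 sectors regardless of the disk's sector size, which equals a 4096-byte page only when sectors are 512 bytes. Both the fixed count and the page size are assumptions taken from the discussion, not confirmed behaviour, and bytes_requested is a hypothetical name.

```python
# Assumption from the thread: something treats "8 sectors" as one page.
ASSUMED_SECTORS_PER_REQUEST = 8
ASSUMED_PAGE_SIZE = 4096


def bytes_requested(sector_size, sectors=ASSUMED_SECTORS_PER_REQUEST):
    """Bytes covered by a fixed-sector-count read on a given disk."""
    return sectors * sector_size


if __name__ == "__main__":
    for size in (256, 512, 768, 1024):
        total = bytes_requested(size)
        note = "one page" if total == ASSUMED_PAGE_SIZE else "not one page"
        print(f"{size}-byte sectors: {total} bytes ({note})")
```

If that assumption holds, a 512-byte-sector disk would see a 4096-byte first read while a 256-byte-sector disk would see 2048 bytes, which is the difference Mark observed.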
James