[Bug 219652] New: READ CAPACITY(16) not used on large USB-attached drive in recent kernels

public inbox for linux-scsi@vger.kernel.org
 help / color / mirror / Atom feed

* [Bug 219652] New: READ CAPACITY(16) not used on large USB-attached drive in recent kernels
@ 2025-01-02  0:57 bugzilla-daemon
  2025-01-02  0:58 ` [Bug 219652] " bugzilla-daemon
                   ` (15 more replies)
  0 siblings, 16 replies; 18+ messages in thread
From: bugzilla-daemon @ 2025-01-02  0:57 UTC (permalink / raw)
  To: linux-scsi

https://bugzilla.kernel.org/show_bug.cgi?id=219652

            Bug ID: 219652
           Summary: READ CAPACITY(16) not used on large USB-attached drive
                    in recent kernels
           Product: IO/Storage
           Version: 2.5
          Hardware: All
                OS: Linux
            Status: NEW
          Severity: normal
          Priority: P3
         Component: SCSI
          Assignee: linux-scsi@vger.kernel.org
          Reporter: bugs-a21@moonlit-rail.com
                CC: stern@rowland.harvard.edu
        Regression: Yes

Created attachment 307435
  --> https://bugzilla.kernel.org/attachment.cgi?id=307435&action=edit
diff without stamps of the usbmon output, good (-) to bad (+)

Upgrading from old mainline LTS kernel 6.6 to current LTS kernel 6.12, a large
3.00TB SATA drive connected via USB through an "Initio Corporation INIC-1610P
SATA bridge" (id 13fd:1e40) is falsely reported as a 2.00TiB capacity drive. 
According to the dmesg output, the newer kernel fails to identify the need to
call read_capacity_16(), resulting in a 32-bit size calculation.

I had reported this initially to the linux-usb mailing list.  Alan Stern
(CC'ed) wrote back, suggesting I redirect to linux-scsi, and include some
usbmon traces.  To upload the traces, I'm opening this on bugzilla.

A few notes:

* The USB-to-SATA bridge (via lsusb) is:
  ID 13fd:1e40 Initio Corporation INIC-1610P SATA bridge

* When booting into kernel 6.12.7, the external drive capacity
  is mis-reported.  Booting /back/ into 6.6.68, the capacity
  continues to be mis-reported.  One must now yank the USB
  cable (or power-cycle) to get the correct size detection again.

Here is a diff of the dmesg output when plugging in the drive.
From (-) is kernel 6.1.122
To   (+) is kernel 6.12.7:
 scsi host12: usb-storage 1-13:1.0
 scsi 12:0:0:0: Direct-Access     TOSHIBA  DT01ACA300       3.00 PQ: 0 ANSI: 4
 sd 12:0:0:0: Attached scsi generic sg1 type 0
-sd 12:0:0:0: [sdb] Very big device. Trying to use READ CAPACITY(16).
-sd 12:0:0:0: [sdb] 5860533167 512-byte logical blocks: (3.00 TB/2.73 TiB)
+sd 12:0:0:0: [sdb] 4294967295 512-byte logical blocks: (2.20 TB/2.00 TiB)
 sd 12:0:0:0: [sdb] Write Protect is off
 sd 12:0:0:0: [sdb] Mode Sense: 23 00 00 00
 sd 12:0:0:0: [sdb] No Caching mode page found
 sd 12:0:0:0: [sdb] Assuming drive cache: write through
-sd 12:0:0:0: [sdb] Very big device. Trying to use READ CAPACITY(16).
- sdb: sdb1
 sd 12:0:0:0: [sdb] Attached SCSI disk

I am attaching a diff of the usbmon output for the earlier kernel (good) and
current kernel (bad), in case the sequence of commands sent to/from the bridge
shows whatever it is that puts the bridge into a low-capacity state.  As I
think I can only attach one file to this posting, I'll make separate
attachments for the good and bad usbmon listings.

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are the assignee for the bug.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Bug 219652] READ CAPACITY(16) not used on large USB-attached drive in recent kernels
  2025-01-02  0:57 [Bug 219652] New: READ CAPACITY(16) not used on large USB-attached drive in recent kernels bugzilla-daemon
@ 2025-01-02  0:58 ` bugzilla-daemon
  2025-01-02  0:59 ` bugzilla-daemon
                   ` (14 subsequent siblings)
  15 siblings, 0 replies; 18+ messages in thread
From: bugzilla-daemon @ 2025-01-02  0:58 UTC (permalink / raw)
  To: linux-scsi

https://bugzilla.kernel.org/show_bug.cgi?id=219652

--- Comment #1 from Kris Karas (bugs-a21@moonlit-rail.com) ---
Created attachment 307436
  --> https://bugzilla.kernel.org/attachment.cgi?id=307436&action=edit
usbmon trace from kernel 6.12 showing failed capacity negotiation

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are the assignee for the bug.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Bug 219652] READ CAPACITY(16) not used on large USB-attached drive in recent kernels
  2025-01-02  0:57 [Bug 219652] New: READ CAPACITY(16) not used on large USB-attached drive in recent kernels bugzilla-daemon
  2025-01-02  0:58 ` [Bug 219652] " bugzilla-daemon
@ 2025-01-02  0:59 ` bugzilla-daemon
  2025-01-02  5:22 ` bugzilla-daemon
                   ` (13 subsequent siblings)
  15 siblings, 0 replies; 18+ messages in thread
From: bugzilla-daemon @ 2025-01-02  0:59 UTC (permalink / raw)
  To: linux-scsi

https://bugzilla.kernel.org/show_bug.cgi?id=219652

--- Comment #2 from Kris Karas (bugs-a21@moonlit-rail.com) ---
Created attachment 307437
  --> https://bugzilla.kernel.org/attachment.cgi?id=307437&action=edit
usbmon trace from kernel 6.1 showing correct capacity negotiation

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are the assignee for the bug.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Bug 219652] READ CAPACITY(16) not used on large USB-attached drive in recent kernels
  2025-01-02  0:57 [Bug 219652] New: READ CAPACITY(16) not used on large USB-attached drive in recent kernels bugzilla-daemon
  2025-01-02  0:58 ` [Bug 219652] " bugzilla-daemon
  2025-01-02  0:59 ` bugzilla-daemon
@ 2025-01-02  5:22 ` bugzilla-daemon
  2025-01-02  5:26 ` bugzilla-daemon
                   ` (12 subsequent siblings)
  15 siblings, 0 replies; 18+ messages in thread
From: bugzilla-daemon @ 2025-01-02  5:22 UTC (permalink / raw)
  To: linux-scsi

https://bugzilla.kernel.org/show_bug.cgi?id=219652

Kris Karas (bugs-a21@moonlit-rail.com) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
 Bisected commit-id|                            |0f11328f2f46618c8c4734041fd
                   |                            |b2aacfa99b802

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are the assignee for the bug.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Bug 219652] READ CAPACITY(16) not used on large USB-attached drive in recent kernels
  2025-01-02  0:57 [Bug 219652] New: READ CAPACITY(16) not used on large USB-attached drive in recent kernels bugzilla-daemon
                   ` (2 preceding siblings ...)
  2025-01-02  5:22 ` bugzilla-daemon
@ 2025-01-02  5:26 ` bugzilla-daemon
  2025-01-02 15:38 ` bugzilla-daemon
                   ` (11 subsequent siblings)
  15 siblings, 0 replies; 18+ messages in thread
From: bugzilla-daemon @ 2025-01-02  5:26 UTC (permalink / raw)
  To: linux-scsi

https://bugzilla.kernel.org/show_bug.cgi?id=219652

--- Comment #3 from Kris Karas (bugs-a21@moonlit-rail.com) ---
I had a feeling that Greg KH would amble by and say, "Can you bisect that?" 
:-)

So, I figured I'd try to get to that first.
And the winner is...

    commit 0f11328f2f46618c8c4734041fdb2aacfa99b802
    Author: Mike Christie <michael.christie@oracle.com>
    Date:   Mon Jan 22 18:22:16 2024 -0600

        scsi: sd: Have midlayer retry read_capacity_10() errors

        This has read_capacity_10() have the SCSI midlayer retry errors
        instead of driving them itself.

I have tested 6.12.7 with that backed-out, and confirm that read capacity works
normally again.

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are the assignee for the bug.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Bug 219652] READ CAPACITY(16) not used on large USB-attached drive in recent kernels
  2025-01-02  0:57 [Bug 219652] New: READ CAPACITY(16) not used on large USB-attached drive in recent kernels bugzilla-daemon
                   ` (3 preceding siblings ...)
  2025-01-02  5:26 ` bugzilla-daemon
@ 2025-01-02 15:38 ` bugzilla-daemon
  2025-01-02 23:07 ` bugzilla-daemon
                   ` (10 subsequent siblings)
  15 siblings, 0 replies; 18+ messages in thread
From: bugzilla-daemon @ 2025-01-02 15:38 UTC (permalink / raw)
  To: linux-scsi

https://bugzilla.kernel.org/show_bug.cgi?id=219652

--- Comment #4 from Alan Stern (stern@rowland.harvard.edu) ---
Your best bet is to instrument the new read_capacity_10() routine and its
caller (sd_read_capacity()) with sd_printk() lines to find out exactly what is
going on.

For instance, does sd_try_rc16_first() return true?  And why does the "Very big
device. Trying to use READ CAPACITY(16)" line not get executed?

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are the assignee for the bug.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Bug 219652] READ CAPACITY(16) not used on large USB-attached drive in recent kernels
  2025-01-02  0:57 [Bug 219652] New: READ CAPACITY(16) not used on large USB-attached drive in recent kernels bugzilla-daemon
                   ` (4 preceding siblings ...)
  2025-01-02 15:38 ` bugzilla-daemon
@ 2025-01-02 23:07 ` bugzilla-daemon
  2025-01-03  0:53 ` [Bug 219652] New: " Mike Christie
                   ` (9 subsequent siblings)
  15 siblings, 0 replies; 18+ messages in thread
From: bugzilla-daemon @ 2025-01-02 23:07 UTC (permalink / raw)
  To: linux-scsi

https://bugzilla.kernel.org/show_bug.cgi?id=219652

Martin K. Petersen (mkp@mkp.net) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |mkp@mkp.net

--- Comment #5 from Martin K. Petersen (mkp@mkp.net) ---
Also, please provide the output of:

# sg_readcap --hex /dev/sdb
# sg_readcap --hex --16 /dev/sdb

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are the assignee for the bug.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Bug 219652] New: READ CAPACITY(16) not used on large USB-attached drive in recent kernels
  2025-01-02  0:57 [Bug 219652] New: READ CAPACITY(16) not used on large USB-attached drive in recent kernels bugzilla-daemon
                   ` (5 preceding siblings ...)
  2025-01-02 23:07 ` bugzilla-daemon
@ 2025-01-03  0:53 ` Mike Christie
  2025-01-03  0:53 ` [Bug 219652] " bugzilla-daemon
                   ` (8 subsequent siblings)
  15 siblings, 0 replies; 18+ messages in thread
From: Mike Christie @ 2025-01-03  0:53 UTC (permalink / raw)
  To: bugzilla-daemon, linux-scsi

On 1/1/25 6:57 PM, bugzilla-daemon@kernel.org wrote:
> * When booting into kernel 6.12.7, the external drive capacity
>   is mis-reported.  Booting /back/ into 6.6.68, the capacity
>   continues to be mis-reported.  One must now yank the USB
>   cable (or power-cycle) to get the correct size detection again.

That part is really strange. I wonder if the new code is not
resetting the device and the old code was somehow.
> 
> Here is a diff of the dmesg output when plugging in the drive.
> From (-) is kernel 6.1.122
> To   (+) is kernel 6.12.7:
>  scsi host12: usb-storage 1-13:1.0
>  scsi 12:0:0:0: Direct-Access     TOSHIBA  DT01ACA300       3.00 PQ: 0 ANSI: 4
>  sd 12:0:0:0: Attached scsi generic sg1 type 0
> -sd 12:0:0:0: [sdb] Very big device. Trying to use READ CAPACITY(16).
> -sd 12:0:0:0: [sdb] 5860533167 512-byte logical blocks: (3.00 TB/2.73 TiB)
> +sd 12:0:0:0: [sdb] 4294967295 512-byte logical blocks: (2.20 TB/2.00 TiB)
>  sd 12:0:0:0: [sdb] Write Protect is off
>  sd 12:0:0:0: [sdb] Mode Sense: 23 00 00 00
>  sd 12:0:0:0: [sdb] No Caching mode page found
>  sd 12:0:0:0: [sdb] Assuming drive cache: write through
> -sd 12:0:0:0: [sdb] Very big device. Trying to use READ CAPACITY(16).
> - sdb: sdb1
>  sd 12:0:0:0: [sdb] Attached SCSI disk
> 
> I am attaching a diff of the usbmon output for the earlier kernel (good) and
> current kernel (bad),

Just to start can you tell me what the device is returning?

We hit that "Very big device" code path if the device returns a size
of 0xffffffff or larger so with the old code we at some point returned
a large size. With the new code the device returns only 0xfffffffe
(4294967294) (read_capacity_10 does a lba that we got from the device + 1
so you see 4294967295 in the log message) so we never hit that very
bit device code path and try READ CAPACITY(16).

I might have changed some behavior during the retries that causes us
to not get the right size now.  However, I couldn't fully understand
the output of the usbmon traces. Can you tell me when we do the READ
CAPACITY (10) what the device is returning? Is it returning a sense
error like UNIT_ATTENTION a couple times and when it returns success
what's in the data buffer?

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Bug 219652] READ CAPACITY(16) not used on large USB-attached drive in recent kernels
  2025-01-02  0:57 [Bug 219652] New: READ CAPACITY(16) not used on large USB-attached drive in recent kernels bugzilla-daemon
                   ` (6 preceding siblings ...)
  2025-01-03  0:53 ` [Bug 219652] New: " Mike Christie
@ 2025-01-03  0:53 ` bugzilla-daemon
  2025-01-03  2:02 ` bugzilla-daemon
                   ` (7 subsequent siblings)
  15 siblings, 0 replies; 18+ messages in thread
From: bugzilla-daemon @ 2025-01-03  0:53 UTC (permalink / raw)
  To: linux-scsi

https://bugzilla.kernel.org/show_bug.cgi?id=219652

--- Comment #6 from michael.christie@oracle.com ---
On 1/1/25 6:57 PM, bugzilla-daemon@kernel.org wrote:
> * When booting into kernel 6.12.7, the external drive capacity
>   is mis-reported.  Booting /back/ into 6.6.68, the capacity
>   continues to be mis-reported.  One must now yank the USB
>   cable (or power-cycle) to get the correct size detection again.

That part is really strange. I wonder if the new code is not
resetting the device and the old code was somehow.
> 
> Here is a diff of the dmesg output when plugging in the drive.
> From (-) is kernel 6.1.122
> To   (+) is kernel 6.12.7:
>  scsi host12: usb-storage 1-13:1.0
>  scsi 12:0:0:0: Direct-Access     TOSHIBA  DT01ACA300       3.00 PQ: 0 ANSI:
>  4
>  sd 12:0:0:0: Attached scsi generic sg1 type 0
> -sd 12:0:0:0: [sdb] Very big device. Trying to use READ CAPACITY(16).
> -sd 12:0:0:0: [sdb] 5860533167 512-byte logical blocks: (3.00 TB/2.73 TiB)
> +sd 12:0:0:0: [sdb] 4294967295 512-byte logical blocks: (2.20 TB/2.00 TiB)
>  sd 12:0:0:0: [sdb] Write Protect is off
>  sd 12:0:0:0: [sdb] Mode Sense: 23 00 00 00
>  sd 12:0:0:0: [sdb] No Caching mode page found
>  sd 12:0:0:0: [sdb] Assuming drive cache: write through
> -sd 12:0:0:0: [sdb] Very big device. Trying to use READ CAPACITY(16).
> - sdb: sdb1
>  sd 12:0:0:0: [sdb] Attached SCSI disk
> 
> I am attaching a diff of the usbmon output for the earlier kernel (good) and
> current kernel (bad),

Just to start can you tell me what the device is returning?

We hit that "Very big device" code path if the device returns a size
of 0xffffffff or larger so with the old code we at some point returned
a large size. With the new code the device returns only 0xfffffffe
(4294967294) (read_capacity_10 does a lba that we got from the device + 1
so you see 4294967295 in the log message) so we never hit that very
bit device code path and try READ CAPACITY(16).

I might have changed some behavior during the retries that causes us
to not get the right size now.  However, I couldn't fully understand
the output of the usbmon traces. Can you tell me when we do the READ
CAPACITY (10) what the device is returning? Is it returning a sense
error like UNIT_ATTENTION a couple times and when it returns success
what's in the data buffer?

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are the assignee for the bug.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Bug 219652] READ CAPACITY(16) not used on large USB-attached drive in recent kernels
  2025-01-02  0:57 [Bug 219652] New: READ CAPACITY(16) not used on large USB-attached drive in recent kernels bugzilla-daemon
                   ` (7 preceding siblings ...)
  2025-01-03  0:53 ` [Bug 219652] " bugzilla-daemon
@ 2025-01-03  2:02 ` bugzilla-daemon
  2025-01-03  2:26 ` bugzilla-daemon
                   ` (6 subsequent siblings)
  15 siblings, 0 replies; 18+ messages in thread
From: bugzilla-daemon @ 2025-01-03  2:02 UTC (permalink / raw)
  To: linux-scsi

https://bugzilla.kernel.org/show_bug.cgi?id=219652

--- Comment #7 from Alan Stern (stern@rowland.harvard.edu) ---
The two kernels perform identical USB initializations, and the two SCSI
initializations also start out identically: INQUIRY, TEST UNIT READY, and READ
CAPACITY(10).  Under both kernels, the response to the READ CAPACITY(10) is
0xffffffff blocks with blocksize 0x00000200, no errors.

At that point the bad kernel repeats the READ CAPACITY(10) command, issuing it
a total of 4 times before stopping (the fourth time, the returned value says
0xfffffffe blocks instead of 0xffffffff -- I'm not sure if the device does this
or if usb-storage changes the value in flight, but it doesn't really matter
because the error has already occurred).

By contrast, at that point the good kernel issues READ CAPACITY(16), receiving
a response of 0x15d50a3ae blocks of size 0x00000200, the correct values.

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are the assignee for the bug.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Bug 219652] READ CAPACITY(16) not used on large USB-attached drive in recent kernels
  2025-01-02  0:57 [Bug 219652] New: READ CAPACITY(16) not used on large USB-attached drive in recent kernels bugzilla-daemon
                   ` (8 preceding siblings ...)
  2025-01-03  2:02 ` bugzilla-daemon
@ 2025-01-03  2:26 ` bugzilla-daemon
  2025-01-03  3:01 ` bugzilla-daemon
                   ` (5 subsequent siblings)
  15 siblings, 0 replies; 18+ messages in thread
From: bugzilla-daemon @ 2025-01-03  2:26 UTC (permalink / raw)
  To: linux-scsi

https://bugzilla.kernel.org/show_bug.cgi?id=219652

--- Comment #8 from Kris Karas (bugs-a21@moonlit-rail.com) ---
For Martin in comment #5:

    # : Working kernel sg_readcap
     00     ff ff ff ff 00 00 02 00
    # : Ditto --16
     00     00 00 00 01 5d 50 a3 ae  00 00 02 00 00 00 00 00
     10     00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
    # : Failing kernel sg_readcap
     00     ff ff ff fe 00 00 02 00
    # : Ditto --16
     00     00 00 00 01 5d 50 a3 ae  00 00 02 00 00 00 00 00
     10     00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00

As Michael surmised, the only difference I can see from the above is in the
regular READ_CAPACITY, returning ffffffff normally and fffffffe under the
failing patch.

For Alan in comment #4 and Michael in comment #6:

I spent a couple hours staring at the code (all of it new to me), and the only
thing I could see pre/post patch is that the retry loop was moved one level
down.  Commands passed, buffers, etc., look the same.  Timing/Race?

Command buffer is a bit bigger (16 vs 10).  sshdr.ascq is checked (== 0x00)
pre-patch but not post-patch.  Retry count 13 vs 10.

Can you guys write me a couple sd_printk()'s you would like to see in there?  I
am not familiar with the structure extractors enough to craft a printk with
useful info in it.

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are the assignee for the bug.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Bug 219652] READ CAPACITY(16) not used on large USB-attached drive in recent kernels
  2025-01-02  0:57 [Bug 219652] New: READ CAPACITY(16) not used on large USB-attached drive in recent kernels bugzilla-daemon
                   ` (9 preceding siblings ...)
  2025-01-03  2:26 ` bugzilla-daemon
@ 2025-01-03  3:01 ` bugzilla-daemon
  2025-01-03  3:30 ` bugzilla-daemon
                   ` (4 subsequent siblings)
  15 siblings, 0 replies; 18+ messages in thread
From: bugzilla-daemon @ 2025-01-03  3:01 UTC (permalink / raw)
  To: linux-scsi

https://bugzilla.kernel.org/show_bug.cgi?id=219652

--- Comment #9 from Kris Karas (bugs-a21@moonlit-rail.com) ---
Just saw Alan's comment #7 (posting race).

FWIW, there are two drives showing this error.  The first is a Toshiba
DT01ACA300, and the second, a Seagate ST3000DM001.  So it's rather unlikely to
be a drive firmware bug.

Given that this patch has been out since 6.9.1 (if I read git correctly), it's
a bit odd that my searches for relevant/duplicate bug reports came up empty. 
Hardware bug with this particular USB/SATA bridge?  I checked the
manufacturer's website, and the model 1610 sports an embedded 8051 uP with
downloadable firmware; however, there are no firmware resources on their
website.

I found one "smoking gun" in this report from site hddguru:
https://forum.hddguru.com/viewtopic.php?t=33369
It indicates that other initio USB-to-SATA adapters report sector_count-1 in
some cases.

If this is specific to Initio adapters, perhaps the easiest patch here is to
assume that if read_capacity_10 is reported as 0xfffffffe, then it's buggy
hardware, treat it as 0xffffffff and proceed thusly.

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are the assignee for the bug.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Bug 219652] READ CAPACITY(16) not used on large USB-attached drive in recent kernels
  2025-01-02  0:57 [Bug 219652] New: READ CAPACITY(16) not used on large USB-attached drive in recent kernels bugzilla-daemon
                   ` (10 preceding siblings ...)
  2025-01-03  3:01 ` bugzilla-daemon
@ 2025-01-03  3:30 ` bugzilla-daemon
  2025-01-04  3:50   ` Mike Christie
  2025-01-03  3:37 ` bugzilla-daemon
                   ` (3 subsequent siblings)
  15 siblings, 1 reply; 18+ messages in thread
From: bugzilla-daemon @ 2025-01-03  3:30 UTC (permalink / raw)
  To: linux-scsi

https://bugzilla.kernel.org/show_bug.cgi?id=219652

--- Comment #10 from Alan Stern (stern@rowland.harvard.edu) ---
Re comment #8: I would like to know the answers to the questions asked in
comment #4 about the bad kernel:

   In sd_read_capacity(), does sd_try_rc16_first() return true?

   And why doesn't the "Very big device. Trying to use READ CAPACITY(16)" line,
along with the subsequent call to read_capacity_16(), get executed in the bad
kernel?

   Or does read_capacity_16() get executed but return immediately with an error
instead of communicating with the device?

The whole business about 0xffffffff vs. 0xfffffffe is a red herring.  If the
bad kernel were working properly, the issue wouldn't even arise.  Notice that
you never see the erroneous value under the good kernel.

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are the assignee for the bug.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Bug 219652] READ CAPACITY(16) not used on large USB-attached drive in recent kernels
  2025-01-02  0:57 [Bug 219652] New: READ CAPACITY(16) not used on large USB-attached drive in recent kernels bugzilla-daemon
                   ` (11 preceding siblings ...)
  2025-01-03  3:30 ` bugzilla-daemon
@ 2025-01-03  3:37 ` bugzilla-daemon
  2025-01-04  3:50 ` bugzilla-daemon
                   ` (2 subsequent siblings)
  15 siblings, 0 replies; 18+ messages in thread
From: bugzilla-daemon @ 2025-01-03  3:37 UTC (permalink / raw)
  To: linux-scsi

https://bugzilla.kernel.org/show_bug.cgi?id=219652

--- Comment #11 from Alan Stern (stern@rowland.harvard.edu) ---
For example, at the appropriate points in sd_read_capacity(), add:

   sd_printk(KERN_NOTICE, sdkp, "sd_try_rc16_first(): %d\n",
sd_try_rc16_first(sdp));

   sd_printk(KERN_NOTICE, sdkp, "sector_size: %d\n", sector_size);

   sd_printk(KERN_NOTICE, sdkp, "sizeof(sdkp->capacity): %d\n", (int)
sizeof(sdkp->capacity));

   sd_printk(KERN_NOTICE, sdkp, "sdkp->capacity: %llx\n", (unsigned long long)
sdkp->capacity);

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are the assignee for the bug.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Bug 219652] READ CAPACITY(16) not used on large USB-attached drive in recent kernels
  2025-01-03  3:30 ` bugzilla-daemon
@ 2025-01-04  3:50   ` Mike Christie
  0 siblings, 0 replies; 18+ messages in thread
From: Mike Christie @ 2025-01-04  3:50 UTC (permalink / raw)
  To: bugzilla-daemon, linux-scsi

On 1/2/25 9:30 PM, bugzilla-daemon@kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=219652
> 
> --- Comment #10 from Alan Stern (stern@rowland.harvard.edu) ---
> Re comment #8: I would like to know the answers to the questions asked in
> comment #4 about the bad kernel:
> 
>    In sd_read_capacity(), does sd_try_rc16_first() return true?
> 
>    And why doesn't the "Very big device. Trying to use READ CAPACITY(16)" line,> along with the subsequent call to read_capacity_16(), get executed in the bad
> kernel?

I see it.

With the new code we think the command is failing. We then retry the command 3 times
like you described you saw in the trace. Then because on the 3rd retry we get
0xfffffffe instead of 0xffffffff, we don't hit the check below like I mentioned before:

                sector_size = read_capacity_10(sdkp, sdp, buffer);
                if (sector_size == -EOVERFLOW)
                        goto got_data;
                if (sector_size < 0)
                        return;
                if ((sizeof(sdkp->capacity) > 4) &&
                    (sdkp->capacity > 0xffffffffULL)) {
                        int old_sector_size = sector_size;
                        sd_printk(KERN_NOTICE, sdkp, "Very big device. "
                                        "Trying to use READ CAPACITY(16).\n");
                        sector_size = read_capacity_16(sdkp, sdp, lim, buffer);

With the old kernel, we saw the first try succeed. We then saw the 0xffffffff
and then tried read_capacity_16 above.

And the reason for the difference was that with the new code, I forgot to
add a check for if there was even an error. We ended up always retrying
3 times and that lead us to get the 0xfffffffe value on that last retry.

so we need:

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index adee6f60c966..2dcf225c7017 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -210,6 +210,9 @@ static int scsi_check_passthrough(struct scsi_cmnd *scmd,
 	struct scsi_sense_hdr sshdr;
 	enum sam_status status;
 
+	if (!scmd->result)
+		return 0;
+
 	if (!failures)
 		return 0;
 

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [Bug 219652] READ CAPACITY(16) not used on large USB-attached drive in recent kernels
  2025-01-02  0:57 [Bug 219652] New: READ CAPACITY(16) not used on large USB-attached drive in recent kernels bugzilla-daemon
                   ` (12 preceding siblings ...)
  2025-01-03  3:37 ` bugzilla-daemon
@ 2025-01-04  3:50 ` bugzilla-daemon
  2025-01-06 20:33 ` bugzilla-daemon
  2025-01-06 21:08 ` bugzilla-daemon
  15 siblings, 0 replies; 18+ messages in thread
From: bugzilla-daemon @ 2025-01-04  3:50 UTC (permalink / raw)
  To: linux-scsi

https://bugzilla.kernel.org/show_bug.cgi?id=219652

--- Comment #12 from michael.christie@oracle.com ---
On 1/2/25 9:30 PM, bugzilla-daemon@kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=219652
> 
> --- Comment #10 from Alan Stern (stern@rowland.harvard.edu) ---
> Re comment #8: I would like to know the answers to the questions asked in
> comment #4 about the bad kernel:
> 
>    In sd_read_capacity(), does sd_try_rc16_first() return true?
> 
>    And why doesn't the "Very big device. Trying to use READ CAPACITY(16)"
>    line,> along with the subsequent call to read_capacity_16(), get executed
>    in the bad
> kernel?

I see it.

With the new code we think the command is failing. We then retry the command 3
times
like you described you saw in the trace. Then because on the 3rd retry we get
0xfffffffe instead of 0xffffffff, we don't hit the check below like I mentioned
before:

                sector_size = read_capacity_10(sdkp, sdp, buffer);
                if (sector_size == -EOVERFLOW)
                        goto got_data;
                if (sector_size < 0)
                        return;
                if ((sizeof(sdkp->capacity) > 4) &&
                    (sdkp->capacity > 0xffffffffULL)) {
                        int old_sector_size = sector_size;
                        sd_printk(KERN_NOTICE, sdkp, "Very big device. "
                                        "Trying to use READ CAPACITY(16).\n");
                        sector_size = read_capacity_16(sdkp, sdp, lim, buffer);

With the old kernel, we saw the first try succeed. We then saw the 0xffffffff
and then tried read_capacity_16 above.

And the reason for the difference was that with the new code, I forgot to
add a check for if there was even an error. We ended up always retrying
3 times and that lead us to get the 0xfffffffe value on that last retry.

so we need:

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index adee6f60c966..2dcf225c7017 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -210,6 +210,9 @@ static int scsi_check_passthrough(struct scsi_cmnd *scmd,
        struct scsi_sense_hdr sshdr;
        enum sam_status status;

+       if (!scmd->result)
+               return 0;
+
        if (!failures)
                return 0;

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are the assignee for the bug.

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [Bug 219652] READ CAPACITY(16) not used on large USB-attached drive in recent kernels
  2025-01-02  0:57 [Bug 219652] New: READ CAPACITY(16) not used on large USB-attached drive in recent kernels bugzilla-daemon
                   ` (13 preceding siblings ...)
  2025-01-04  3:50 ` bugzilla-daemon
@ 2025-01-06 20:33 ` bugzilla-daemon
  2025-01-06 21:08 ` bugzilla-daemon
  15 siblings, 0 replies; 18+ messages in thread
From: bugzilla-daemon @ 2025-01-06 20:33 UTC (permalink / raw)
  To: linux-scsi

https://bugzilla.kernel.org/show_bug.cgi?id=219652

--- Comment #13 from Kris Karas (bugs-a21@moonlit-rail.com) ---
Sorry for late reply, was out of town.

Just checked Michael's proposed two-liner in comment #12, and am most happy to 
confirm it fixes this bug!

FWIW, I also investigated the related issue mentioned in comment #9, from the
HDDGuru forum, and can confirm that (with this same brand, Initio) of
USB-to-SATA connector, when you perform a disk capacity check via USB, it
returns exactly one sector less than when the enclosure is connected via eSATA.
 So their chipset/firmware clearly has a bug.

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are the assignee for the bug.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Bug 219652] READ CAPACITY(16) not used on large USB-attached drive in recent kernels
  2025-01-02  0:57 [Bug 219652] New: READ CAPACITY(16) not used on large USB-attached drive in recent kernels bugzilla-daemon
                   ` (14 preceding siblings ...)
  2025-01-06 20:33 ` bugzilla-daemon
@ 2025-01-06 21:08 ` bugzilla-daemon
  15 siblings, 0 replies; 18+ messages in thread
From: bugzilla-daemon @ 2025-01-06 21:08 UTC (permalink / raw)
  To: linux-scsi

https://bugzilla.kernel.org/show_bug.cgi?id=219652

michael.christie@oracle.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |michael.christie@oracle.com

--- Comment #14 from michael.christie@oracle.com ---
No problem. Thanks for doing the investigation and lots of debugging. I'll send
the patch upstream later today.

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are the assignee for the bug.

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2025-01-06 21:08 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-01-02  0:57 [Bug 219652] New: READ CAPACITY(16) not used on large USB-attached drive in recent kernels bugzilla-daemon
2025-01-02  0:58 ` [Bug 219652] " bugzilla-daemon
2025-01-02  0:59 ` bugzilla-daemon
2025-01-02  5:22 ` bugzilla-daemon
2025-01-02  5:26 ` bugzilla-daemon
2025-01-02 15:38 ` bugzilla-daemon
2025-01-02 23:07 ` bugzilla-daemon
2025-01-03  0:53 ` [Bug 219652] New: " Mike Christie
2025-01-03  0:53 ` [Bug 219652] " bugzilla-daemon
2025-01-03  2:02 ` bugzilla-daemon
2025-01-03  2:26 ` bugzilla-daemon
2025-01-03  3:01 ` bugzilla-daemon
2025-01-03  3:30 ` bugzilla-daemon
2025-01-04  3:50   ` Mike Christie
2025-01-03  3:37 ` bugzilla-daemon
2025-01-04  3:50 ` bugzilla-daemon
2025-01-06 20:33 ` bugzilla-daemon
2025-01-06 21:08 ` bugzilla-daemon

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox