[Bug 153241] New: Kernel (unconditionally?) repeatedly attempts to issue SMART commands via ATA pass-through

linux-scsi.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [Bug 153241] New: Kernel (unconditionally?) repeatedly attempts to issue SMART commands via ATA pass-through
@ 2016-08-16 19:59 bugzilla-daemon
  2016-09-22  0:44 ` [Bug 153241] " bugzilla-daemon
  0 siblings, 1 reply; 2+ messages in thread
From: bugzilla-daemon @ 2016-08-16 19:59 UTC (permalink / raw)
  To: linux-scsi

https://bugzilla.kernel.org/show_bug.cgi?id=153241

            Bug ID: 153241
           Summary: Kernel (unconditionally?) repeatedly attempts to issue
                    SMART commands via ATA pass-through
           Product: IO/Storage
           Version: 2.5
    Kernel Version: 4.4.14
          Hardware: All
                OS: Linux
              Tree: Mainline
            Status: NEW
          Severity: normal
          Priority: P1
         Component: SCSI
          Assignee: linux-scsi@vger.kernel.org
          Reporter: markk@clara.co.uk
        Regression: No

With a recent kernel update (on Lubuntu 16.04 x64-64), I noticed some error
messages in the log on connecting a USB hard drive, for example:

[ 1580.500043] usb 2-1: new high-speed USB device number 4 using ehci-pci
[ 1580.633247] usb 2-1: New USB device found, idVendor=0bc2, idProduct=3300
[ 1580.633255] usb 2-1: New USB device strings: Mfr=1, Product=2,
SerialNumber=3
[ 1580.633260] usb 2-1: Product: Desktop         
[ 1580.633264] usb 2-1: Manufacturer: Seagate 
[ 1580.633268] usb 2-1: SerialNumber: [redacted]
[ 1580.672701] usb-storage 2-1:1.0: USB Mass Storage device detected
[ 1580.674539] scsi host5: usb-storage 2-1:1.0
[ 1580.674639] usbcore: registered new interface driver usb-storage
[ 1580.676522] usbcore: registered new interface driver uas
[ 1581.673205] scsi 5:0:0:0: Direct-Access     Seagate  Desktop          0146
PQ: 0 ANSI: 4
[ 1581.676331] sd 5:0:0:0: Attached scsi generic sg2 type 0
[ 1581.677416] sd 5:0:0:0: [sdb] 732566644 4096-byte logical blocks: (3.00
TB/2.73 TiB)
[ 1581.677907] sd 5:0:0:0: [sdb] Write Protect is off
[ 1581.677918] sd 5:0:0:0: [sdb] Mode Sense: 1c 00 00 00
[ 1581.678407] sd 5:0:0:0: [sdb] Write cache: enabled, read cache: enabled,
doesn't support DPO or FUA
[ 1581.690549]  sdb: sdb1
[ 1581.692636] sd 5:0:0:0: [sdb] Attached SCSI disk
[ 1581.846416] sd 5:0:0:0: [sdb] tag#0 FAILED Result: hostbyte=DID_ERROR
driverbyte=DRIVER_SENSE
[ 1581.846426] sd 5:0:0:0: [sdb] tag#0 Sense Key : Hardware Error [current]
[descriptor] 
[ 1581.846432] sd 5:0:0:0: [sdb] tag#0 Add. Sense: No additional sense
information
[ 1581.846439] sd 5:0:0:0: [sdb] tag#0 CDB: ATA command pass through(16) 85 06
20 00 00 00 00 00 00 00 00 00 00 00 e5 00
[ 1581.929398] sd 5:0:0:0: [sdb] tag#0 FAILED Result: hostbyte=DID_ERROR
driverbyte=DRIVER_SENSE
[ 1581.929403] sd 5:0:0:0: [sdb] tag#0 Sense Key : Hardware Error [current]
[descriptor] 
[ 1581.929405] sd 5:0:0:0: [sdb] tag#0 Add. Sense: No additional sense
information
[ 1581.929409] sd 5:0:0:0: [sdb] tag#0 CDB: ATA command pass through(12)/Blank
a1 06 20 da 00 00 4f c2 00 b0 00 00

The ATA pass-through commands are attempted every ten minutes, so the tag#0...
lines repeat in the kernel log. The commands are something to do with SMART.

It seems other people are seeing this problem too:
"kernel-4.6.3-300 shows false warnings about USB hard disks."
https://bugzilla.redhat.com/show_bug.cgi?id=1351305

"Worrisome USB disk messages in 4.6.0-1, but not 4.5.0-2"
https://lists.debian.org/debian-user/2016/07/msg00988.html


I bisected between 4.4.13 and 4.4.14 with this result:

$ git bisect bad
0dec8c0d67c64401d97122e4eba347ccc5850622 is the first bad commit
commit 0dec8c0d67c64401d97122e4eba347ccc5850622
Author: James Bottomley <James.Bottomley@HansenPartnership.com>
Date:   Fri May 13 12:04:06 2016 -0700

    scsi_lib: correctly retry failed zero length REQ_TYPE_FS commands

    commit a621bac3044ed6f7ec5fa0326491b2d4838bfa93 upstream.

    When SCSI was written, all commands coming from the filesystem
    (REQ_TYPE_FS commands) had data.  This meant that our signal for needing
    to complete the command was the number of bytes completed being equal to
    the number of bytes in the request.  Unfortunately, with the advent of
    flush barriers, we can now get zero length REQ_TYPE_FS commands, which
    confuse this logic because they satisfy the condition every time.  This
    means they never get retried even for retryable conditions, like UNIT
    ATTENTION because we complete them early assuming they're done.  Fix
    this by special casing the early completion condition to recognise zero
    length commands with errors and let them drop through to the retry code.

    Reported-by: Sebastian Parschauer <s.parschauer@gmx.de>
    Signed-off-by: James E.J. Bottomley <jejb@linux.vnet.ibm.com>
    Tested-by: Jack Wang <jinpu.wang@profitbricks.com>
    Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


>From that commit description, it sounds like it may not be the real/root cause
of the problem though? Just that now, commands which might have been silently
dropped/ignored before, no longer are.


So, questions...

Does the kernel now unconditionally try to issue ATA pass-through commands
(opcodes 0x85 and 0xA1)? That could be problematic for several reasons:

 - The FAILED log messages are likely to make the user think their drive is
defective. Perhaps the majority of USB hard drives (well, USB-to-IDE/SATA
bridges) don't support ATA pass-through. And things like USB card readers or
flash drives definitely won't.

 - Over time the log gets filled up with repeated messages as the kernel issues
the commands every ten minutes. If the first attempt fails, don't keep trying
to avoid spamming the log and giving users heart attacks thinking their disks
are about to die. :)
Perhaps print some explanation, e.g. "target probably does not support ATA
pass-through commands".

 - Knowing the state of drive/bridge firmware:
  - some devices could hang or lock up on receiving the unknown commands,
perhaps requiring power cycling to recover
  - some devices could corrupt data being read or written when receiving the
unknown commands
  - some devices could even be "bricked" on receiving the commands, if 0x85 or
0xA1 opcodes have some vendor-specific function

 - Is there any way to blacklist a given device, e.g. by USB ID, so no
SMART/ATA pass-through commands are issued? Though that wouldn't work for
native SCSI drives.

-- 
You are receiving this mail because:
You are the assignee for the bug.

^ permalink raw reply	[flat|nested] 2+ messages in thread

* [Bug 153241] Kernel (unconditionally?) repeatedly attempts to issue SMART commands via ATA pass-through
  2016-08-16 19:59 [Bug 153241] New: Kernel (unconditionally?) repeatedly attempts to issue SMART commands via ATA pass-through bugzilla-daemon
@ 2016-09-22  0:44 ` bugzilla-daemon
  0 siblings, 0 replies; 2+ messages in thread
From: bugzilla-daemon @ 2016-09-22  0:44 UTC (permalink / raw)
  To: linux-scsi

https://bugzilla.kernel.org/show_bug.cgi?id=153241

Arno Moeller <arno@d-tor.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |arno@d-tor.org

--- Comment #1 from Arno Moeller <arno@d-tor.org> ---

I'm seeing this too on an USB-Disk (/dev/sdj) every 10 minutes:

Sep 22 01:47:05 hadante kernel: sd 7:0:0:0: [sdj] tag#0 UNKNOWN(0x2003) Result:
hostbyte=0x07 driverbyte=0x08
Sep 22 01:47:05 hadante kernel: sd 7:0:0:0: [sdj] tag#0 Sense Key : 0x4
[current] [descriptor] 
Sep 22 01:47:05 hadante kernel: sd 7:0:0:0: [sdj] tag#0 ASC=0x0 ASCQ=0x0 
Sep 22 01:47:05 hadante kernel: sd 7:0:0:0: [sdj] tag#0 CDB: opcode=0x85 85 06
20 00 00 00 00 00 00 00 00 00 00 00 e5 00
Sep 22 01:47:05 hadante kernel: sd 7:0:0:0: [sdj] tag#0 UNKNOWN(0x2003) Result:
hostbyte=0x07 driverbyte=0x08
Sep 22 01:47:05 hadante kernel: sd 7:0:0:0: [sdj] tag#0 Sense Key : 0x4
[current] [descriptor] 
Sep 22 01:47:05 hadante kernel: sd 7:0:0:0: [sdj] tag#0 ASC=0x0 ASCQ=0x0 
Sep 22 01:47:05 hadante kernel: sd 7:0:0:0: [sdj] tag#0 CDB: opcode=0xa1 a1 06
20 da 00 00 4f c2 00 b0 00 00
Sep 22 01:57:05 hadante kernel: sd 7:0:0:0: [sdj] tag#0 UNKNOWN(0x2003) Result:
hostbyte=0x07 driverbyte=0x08
Sep 22 01:57:05 hadante kernel: sd 7:0:0:0: [sdj] tag#0 Sense Key : 0x4
[current] [descriptor] 
Sep 22 01:57:05 hadante kernel: sd 7:0:0:0: [sdj] tag#0 ASC=0x0 ASCQ=0x0 
Sep 22 01:57:05 hadante kernel: sd 7:0:0:0: [sdj] tag#0 CDB: opcode=0x85 85 06
20 00 00 00 00 00 00 00 00 00 00 00 e5 00
Sep 22 01:57:05 hadante kernel: sd 7:0:0:0: [sdj] tag#0 UNKNOWN(0x2003) Result:
hostbyte=0x07 driverbyte=0x08
Sep 22 01:57:05 hadante kernel: sd 7:0:0:0: [sdj] tag#0 Sense Key : 0x4
[current] [descriptor] 
Sep 22 01:57:05 hadante kernel: sd 7:0:0:0: [sdj] tag#0 ASC=0x0 ASCQ=0x0 
Sep 22 01:57:05 hadante kernel: sd 7:0:0:0: [sdj] tag#0 CDB: opcode=0xa1 a1 06
20 da 00 00 4f c2 00 b0 00 00
[...]

I *think* this is caused by udisks2 v2.1.7 (source:
https://udisks.freedesktop.org/releases/).

It's doing this in src/udiskslinuxdrive.c (line 1212ff.):
  io_v4.guard = 'Q';
  io_v4.protocol = BSG_PROTOCOL_SCSI;
  io_v4.subprotocol = BSG_SUB_PROTOCOL_SCSI_CMD;
  io_v4.request_len = cdb_len;
  io_v4.request = (uintptr_t) cdb;
  io_v4.max_response_len = sizeof (sense);
  io_v4.response = (uintptr_t) sense;
  io_v4.timeout = timeout_msec;

  rc = ioctl (fd, SG_IO, &io_v4);
  if (rc != 0)
    {
      /* could be that the driver doesn't do version 4, try version 3 */
      if (errno == EINVAL)
        {
          struct sg_io_hdr io_hdr;
          memset (&io_hdr, 0, sizeof (struct sg_io_hdr));
          io_hdr.interface_id = 'S';
          io_hdr.cmdp = (unsigned char*) cdb;
          io_hdr.cmd_len = cdb_len;
          io_hdr.dxfer_direction = SG_DXFER_NONE;
          io_hdr.sbp = sense;
          io_hdr.mx_sb_len = sizeof (sense);
          io_hdr.timeout = timeout_msec;

stracing udisksd gives:

[pid 27993] open("/dev/sdj", O_RDONLY|O_NONBLOCK) = 16
[pid 27993] ioctl(16, SG_IO, {'Q', BSG_PROTOCOL_SCSI,
BSG_SUB_PROTOCOL_SCSI_CMD, request[16]=[85, 06, 20, 00, 00, 00, 00, 00, 00, 00,
00, 00, 00, 00, e5, 00], request_tag=0, request_attr=0, request_priority=0,
request_extra=0, max_response_len=32, dout_iovec_count=0, dout_xfer_len=0,
din_iovec_count=0, din_xfer_len=0, timeout=5000, flags=0, usr_ptr=0,
spare_in=0, dout[0]=NULL}) = -1 EINVAL (Invalid argument)
^^^^ this is the culprit???
[pid 27993] ioctl(16, SG_IO, {'S', SG_DXFER_NONE, cmd[16]=[85, 06, 20, 00, 00,
00, 00, 00, 00, 00, 00, 00, 00, 00, e5, 00], mx_sb_len=32, iovec_count=0,
dxfer_len=0, timeout=5000, flags=0, status=00, masked_status=00, sb[22]=[72,
04, 00, 00, 00, 00, 00, 0e, 09, 0c, 00, 00, 00, ff, 00, 00, 00, 00, 00, 00, 00,
50], host_status=0x7, driver_status=0x8, resid=0, duration=0, info=0x1}) = 0
[pid 27993] close(16)

It's really annoying and spamming the kernel log. At first I thought the drive
is defective, because google told me that Sense Key 0x4 is a Hardware Error
(http://www.t10.org/lists/2sensekey.htm), but since the drive is working fine,
I dug deeper...

Further information:

# smartctl -a /dev/sdj
smartctl 6.5 2016-05-07 r4318 [x86_64-linux-4.7.2-1-ARCH] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Green
Device Model:     WDC WD20EZRX-00D8PB0
Serial Number:    WD-WMC4M1418198
LU WWN Device Id: 5 0014ee 603ddaee2
Firmware Version: 80.00A80
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical                        
Rotation Rate:    5400 rpm                                                      
Device is:        In smartctl database [for details use: -P show]               
ATA Version is:   ACS-2 (minor revision not indicated)                          
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 1.5 Gb/s)                        
Local Time is:    Thu Sep 22 02:21:33 2016 CEST                                 
SMART support is: Available - device has SMART capability.                      
SMART support is: Enabled
...

# uname -a
Linux hadante 4.7.2-1-ARCH #1 SMP PREEMPT Sat Aug 20 23:02:56 CEST 2016 x86_64
GNU/Linux

kernel config:
https://git.archlinux.org/svntogit/packages.git/tree/trunk/config?h=packages/linux

-- 
You are receiving this mail because:
You are the assignee for the bug.

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2016-09-22  0:45 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-08-16 19:59 [Bug 153241] New: Kernel (unconditionally?) repeatedly attempts to issue SMART commands via ATA pass-through bugzilla-daemon
2016-09-22  0:44 ` [Bug 153241] " bugzilla-daemon

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).