linux-scsi.vger.kernel.org archive mirror
* [Bug 13594] New: SMART responses for SATA disks on SAS get interpreted as errors
@ 2009-06-21 17:26 bugzilla-daemon
  2009-06-21 18:47 ` James Bottomley
                   ` (8 more replies)
  0 siblings, 9 replies; 27+ messages in thread
From: bugzilla-daemon @ 2009-06-21 17:26 UTC (permalink / raw)
  To: linux-scsi

http://bugzilla.kernel.org/show_bug.cgi?id=13594

           Summary: SMART responses for SATA disks on SAS get interpreted
                    as errors
           Product: IO/Storage
           Version: 2.5
    Kernel Version: 2.6.30-rc6
          Platform: All
        OS/Version: Linux
              Tree: Mainline
            Status: NEW
          Severity: normal
          Priority: P1
         Component: SCSI
        AssignedTo: linux-scsi@vger.kernel.org
        ReportedBy: sgunderson@bigfoot.com
        Regression: No


Hi,

I just bought an LSI SAS3081E-R which I use against a Supermicro backplane to
drive ten Seagate SATA disks (7200.11, 750GB and 1.5TB). I'm using the
standard Linux Fusion MPT device driver (CONFIG_FUSION_SAS) under Linux
2.6.30-rc6. Everything seems to work pretty well, with one exception: When I
use SMART against the drives (say, smartctl -a /dev/sda) the kernel complains
with:

  [  811.091916] sd 0:0:0:0: [sda] Sense Key : Recovered Error [current] [descriptor]
  [  811.099807] Descriptor sense data with sense descriptors (in hex):
  [  811.106175]         72 01 00 1d 00 00 00 0e 09 0c 00 00 00 00 00 00
  [  811.113262]         00 4f 00 c2 00 50
  [  811.117379] sd 0:0:0:0: [sda] Add. Sense: ATA pass through information available
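
For readers who want to decode the hex dump by hand, here is a minimal sketch in C of how the first eight bytes of descriptor-format sense data break down. The struct and function names are made up for illustration and are not kernel APIs; the byte offsets follow the standard descriptor sense header layout (response code 72h/73h).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Header fields of descriptor-format sense data (illustrative struct). */
struct sense_hdr {
    uint8_t response_code; /* byte 0, low 7 bits: 0x72 = current, 0x73 = deferred */
    uint8_t sense_key;     /* byte 1, low 4 bits */
    uint8_t asc;           /* byte 2: additional sense code */
    uint8_t ascq;          /* byte 3: additional sense code qualifier */
    uint8_t add_len;       /* byte 7: number of descriptor bytes that follow */
};

/* The 22 bytes from the log excerpt above. */
const uint8_t logged_sense[22] = {
    0x72, 0x01, 0x00, 0x1d, 0x00, 0x00, 0x00, 0x0e,
    0x09, 0x0c, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
    0x00, 0x4f, 0x00, 0xc2, 0x00, 0x50
};

/* Returns 0 on success, -1 if buf is too short or not descriptor format. */
int parse_desc_sense(const uint8_t *buf, size_t len, struct sense_hdr *out)
{
    assert(buf != NULL && out != NULL);
    if (len < 8)
        return -1;
    out->response_code = buf[0] & 0x7f;
    if (out->response_code != 0x72 && out->response_code != 0x73)
        return -1;
    out->sense_key = buf[1] & 0x0f;
    out->asc = buf[2];
    out->ascq = buf[3];
    out->add_len = buf[7];
    return 0;
}
```

Calling `parse_desc_sense(logged_sense, sizeof logged_sense, &h)` on these bytes gives sense key 1h (RECOVERED ERROR), ASC/ASCQ 00h/1Dh ("ATA pass through information available") and 14 bytes of descriptors, which matches what the kernel printed.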

I've tried upgrading to the newest firmware (1.28.02.00, 05-MAY-2009), but
all that changed is that the hex dump was added to the error message.

Whenever this happens, it appears like all the disks “hiccup” and the kernel
loses contact with the controller for a small while. If too many of these
happen at once, eventually disks start falling off RAIDs, and the entire
machine goes down. It looks to me as if these messages should simply not be
treated as errors by the kernel -- smartctl explicitly asks for a response even
if the command doesn't fail (by setting CK_COND), so the response probably
shouldn't be taken as an error.
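
To make the CK_COND point concrete, here is a rough sketch in C of a 16-byte ATA PASS-THROUGH CDB with that bit set. It assumes the command in question is SMART RETURN STATUS (the health check); the report only says that smartctl sets CK_COND, so the exact command is an assumption. The macro and function names are illustrative; the field offsets follow the SAT definition of ATA PASS-THROUGH (16).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ATA_PT16_OPCODE   0x85 /* ATA PASS-THROUGH (16) */
#define ATA_PROTO_NONDATA 3    /* PROTOCOL value for non-data commands */
#define ATA_PT_CK_COND    0x20 /* byte 2, bit 5: always return ATA registers */

/* Fill cdb[0..15] for SMART RETURN STATUS with CK_COND set (sketch only). */
void build_smart_status_cdb(uint8_t *cdb)
{
    assert(cdb != NULL);
    memset(cdb, 0, 16);
    cdb[0]  = ATA_PT16_OPCODE;
    cdb[1]  = ATA_PROTO_NONDATA << 1; /* PROTOCOL in bits 4:1, EXTEND = 0 */
    cdb[2]  = ATA_PT_CK_COND;         /* no data phase, other bits stay 0 */
    cdb[4]  = 0xda;                   /* FEATURES (7:0): SMART RETURN STATUS */
    cdb[10] = 0x4f;                   /* LBA mid (7:0): SMART signature */
    cdb[12] = 0xc2;                   /* LBA high (7:0): SMART signature */
    cdb[14] = 0xb0;                   /* COMMAND: SMART */
}
```

With CK_COND set, the SAT layer is expected to return CHECK CONDITION with the ATA register contents in the sense buffer even when the ATA command succeeds, which is why a sense key of RECOVERED ERROR shows up for a command that did not fail.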

-- 
Configure bugmail: http://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: [Bug 13594] New: SMART responses for SATA disks on SAS get interpreted as errors
  2009-06-21 17:26 [Bug 13594] New: SMART responses for SATA disks on SAS get interpreted as errors bugzilla-daemon
@ 2009-06-21 18:47 ` James Bottomley
  2009-06-21 18:55   ` James Bottomley
  2009-06-21 18:48 ` [Bug 13594] " bugzilla-daemon
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 27+ messages in thread
From: James Bottomley @ 2009-06-21 18:47 UTC (permalink / raw)
  To: bugzilla-daemon; +Cc: linux-scsi

On Sun, 2009-06-21 at 17:26 +0000, bugzilla-daemon@bugzilla.kernel.org
wrote:
> http://bugzilla.kernel.org/show_bug.cgi?id=13594
> 
>            Summary: SMART responses for SATA disks on SAS get interpreted
>                     as errors
>            Product: IO/Storage
>            Version: 2.5
>     Kernel Version: 2.6.30-rc6
>           Platform: All
>         OS/Version: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: SCSI
>         AssignedTo: linux-scsi@vger.kernel.org
>         ReportedBy: sgunderson@bigfoot.com
>         Regression: No
> 
> 
> Hi,
> 
> I just bought an LSI SAS3081E-R which I use against a Supermicro backplane to
> drive ten Seagate SATA disks (7200.11, 750GB and 1.5TB). I'm using the
> standard Linux Fusion MPT device driver (CONFIG_FUSION_SAS) under Linux
> 2.6.30-rc6. Everything seems to work pretty well, with one exception: When I
> use SMART against the drives (say, smartctl -a /dev/sda) the kernel complains
> with:
> 
>   [  811.091916] sd 0:0:0:0: [sda] Sense Key : Recovered Error [current] [descriptor]
>   [  811.099807] Descriptor sense data with sense descriptors (in hex):
>   [  811.106175]         72 01 00 1d 00 00 00 0e 09 0c 00 00 00 00 00 00
>   [  811.113262]         00 4f 00 c2 00 50
>   [  811.117379] sd 0:0:0:0: [sda] Add. Sense: ATA pass through information available

This is a message the kernel prints out on all recovered error returns
(except those marked REQ_QUIET).  It's purely informational and doesn't
affect return processing of the command at all, so the kernel is
actually treating this as a successful completion not an error.

> I've tried upgrading to the newest firmware (1.28.02.00, 05-MAY-2009), but
> all that changed is that the hex dump was added to the error message.
> 
> Whenever this happens, it appears like all the disks “hiccup” and the kernel
> loses contact with the controller for a small while. If too many of these
> happen at once, eventually disks start falling off RAIDs, and the entire
> machine goes down. It looks to me as if these messages should simply not be
> treated as errors by the kernel -- smartctl explicitly asks for a response even
> if the command doesn't fail (by setting CK_COND), so the response probably
> shouldn't be taken as an error.

So this sounds like the bug ... however, for the LSI card, this bug will
be in the SAT layer in the fusion firmware.  I can shut the kernel up by
making the recovered error processing clause look for 01/00/1D as well
as REQ_QUIET, but it won't affect this problem.

James



* [Bug 13594] SMART responses for SATA disks on SAS get interpreted as errors
  2009-06-21 17:26 [Bug 13594] New: SMART responses for SATA disks on SAS get interpreted as errors bugzilla-daemon
  2009-06-21 18:47 ` James Bottomley
@ 2009-06-21 18:48 ` bugzilla-daemon
  2009-06-21 18:55 ` bugzilla-daemon
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 27+ messages in thread
From: bugzilla-daemon @ 2009-06-21 18:48 UTC (permalink / raw)
  To: linux-scsi

http://bugzilla.kernel.org/show_bug.cgi?id=13594





--- Comment #1 from Anonymous Emailer <anonymous@kernel-bugs.osdl.org>  2009-06-21 18:47:59 ---
Reply-To: James.Bottomley@HansenPartnership.com

On Sun, 2009-06-21 at 17:26 +0000, bugzilla-daemon@bugzilla.kernel.org
wrote:
> http://bugzilla.kernel.org/show_bug.cgi?id=13594
> 
>            Summary: SMART responses for SATA disks on SAS get interpreted
>                     as errors
>            Product: IO/Storage
>            Version: 2.5
>     Kernel Version: 2.6.30-rc6
>           Platform: All
>         OS/Version: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: SCSI
>         AssignedTo: linux-scsi@vger.kernel.org
>         ReportedBy: sgunderson@bigfoot.com
>         Regression: No
> 
> 
> Hi,
> 
> I just bought an LSI SAS3081E-R which I use against a Supermicro backplane to
> drive ten Seagate SATA disks (7200.11, 750GB and 1.5TB). I'm using the
> standard Linux Fusion MPT device driver (CONFIG_FUSION_SAS) under Linux
> 2.6.30-rc6. Everything seems to work pretty well, with one exception: When I
> use SMART against the drives (say, smartctl -a /dev/sda) the kernel complains
> with:
> 
>   [  811.091916] sd 0:0:0:0: [sda] Sense Key : Recovered Error [current] [descriptor]
>   [  811.099807] Descriptor sense data with sense descriptors (in hex):
>   [  811.106175]         72 01 00 1d 00 00 00 0e 09 0c 00 00 00 00 00 00
>   [  811.113262]         00 4f 00 c2 00 50
>   [  811.117379] sd 0:0:0:0: [sda] Add. Sense: ATA pass through information available

This is a message the kernel prints out on all recovered error returns
(except those marked REQ_QUIET).  It's purely informational and doesn't
affect return processing of the command at all, so the kernel is
actually treating this as a successful completion not an error.

> I've tried upgrading to the newest firmware (1.28.02.00, 05-MAY-2009), but
> all that changed is that the hex dump was added to the error message.
> 
> Whenever this happens, it appears like all the disks “hiccup” and the kernel
> loses contact with the controller for a small while. If too many of these
> happen at once, eventually disks start falling off RAIDs, and the entire
> machine goes down. It looks to me as if these messages should simply not be
> treated as errors by the kernel -- smartctl explicitly asks for a response even
> if the command doesn't fail (by setting CK_COND), so the response probably
> shouldn't be taken as an error.

So this sounds like the bug ... however, for the LSI card, this bug will
be in the SAT layer in the fusion firmware.  I can shut the kernel up by
making the recovered error processing clause look for 01/00/1D as well
as REQ_QUIET, but it won't affect this problem.

James


* Re: [Bug 13594] New: SMART responses for SATA disks on SAS get interpreted as errors
  2009-06-21 18:47 ` James Bottomley
@ 2009-06-21 18:55   ` James Bottomley
  0 siblings, 0 replies; 27+ messages in thread
From: James Bottomley @ 2009-06-21 18:55 UTC (permalink / raw)
  To: bugzilla-daemon; +Cc: linux-scsi

On Sun, 2009-06-21 at 13:47 -0500, James Bottomley wrote:
> >   [  811.091916] sd 0:0:0:0: [sda] Sense Key : Recovered Error [current] [descriptor]
> >   [  811.099807] Descriptor sense data with sense descriptors (in hex):
> >   [  811.106175]         72 01 00 1d 00 00 00 0e 09 0c 00 00 00 00 00 00
> >   [  811.113262]         00 4f 00 c2 00 50
> >   [  811.117379] sd 0:0:0:0: [sda] Add. Sense: ATA pass through information available
> 
> This is a message the kernel prints out on all recovered error returns
> (except those marked REQ_QUIET).  It's purely informational and doesn't
> affect return processing of the command at all, so the kernel is
> actually treating this as a successful completion not an error.
> 
> > I've tried upgrading to the newest firmware (1.28.02.00, 05-MAY-2009), but
> > all that changed is that the hex dump was added to the error message.
> > 
> > Whenever this happens, it appears like all the disks “hiccup” and the kernel
> > loses contact with the controller for a small while. If too many of these
> > happen at once, eventually disks start falling off RAIDs, and the entire
> > machine goes down. It looks to me as if these messages should simply not be
> > treated as errors by the kernel -- smartctl explicitly asks for a response even
> > if the command doesn't fail (by setting CK_COND), so the response probably
> > shouldn't be taken as an error.
> 
> So this sounds like the bug ... however, for the LSI card, this bug will
> be in the SAT layer in the fusion firmware.  I can shut the kernel up by
> making the recovered error processing clause look for 01/00/1D as well
> as REQ_QUIET, but it won't affect this problem.

Actually quieting the message is trivially easy, try this.

James

---

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index f3c4089..a0235c9 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -774,7 +774,8 @@ void scsi_io_completion(struct scsi_cmnd *cmd, unsigned int good_bytes)
 	 * is what gets returned to the user
 	 */
 	if (sense_valid && sshdr.sense_key == RECOVERED_ERROR) {
-		if (!(req->cmd_flags & REQ_QUIET))
+		if (!(req->cmd_flags & REQ_QUIET) &&
+		    !(sshdr.asc == 0x00 && sshdr.ascq == 0x1d))
 			scsi_print_sense("", cmd);
 		result = 0;
 		/* BLOCK_PC may have set error */
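
The condition the patch above introduces can be tried out in userspace. The following is only a sketch: `QUIET_FLAG` is a hypothetical stand-in for the kernel's REQ_QUIET bit, and the function name is made up. It returns true when a RECOVERED ERROR sense would still be printed after the change.

```c
#include <stdbool.h>
#include <stdint.h>

#define QUIET_FLAG 0x1u /* hypothetical stand-in for REQ_QUIET */

/*
 * Mirrors the patched test: print the sense data unless the request is
 * marked quiet or the ASC/ASCQ pair is 00h/1Dh ("ATA pass through
 * information available").
 */
bool should_log_recovered(unsigned int cmd_flags, uint8_t asc, uint8_t ascq)
{
    return !(cmd_flags & QUIET_FLAG) && !(asc == 0x00 && ascq == 0x1d);
}
```

For the sense data in this report, `should_log_recovered(0, 0x00, 0x1d)` is false, so the message is suppressed, while any other recovered error on a non-quiet request is still logged. In both cases the command result is unaffected, since the patch touches only the print.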



* [Bug 13594] SMART responses for SATA disks on SAS get interpreted as errors
  2009-06-21 17:26 [Bug 13594] New: SMART responses for SATA disks on SAS get interpreted as errors bugzilla-daemon
  2009-06-21 18:47 ` James Bottomley
  2009-06-21 18:48 ` [Bug 13594] " bugzilla-daemon
@ 2009-06-21 18:55 ` bugzilla-daemon
  2009-06-21 18:58 ` bugzilla-daemon
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 27+ messages in thread
From: bugzilla-daemon @ 2009-06-21 18:55 UTC (permalink / raw)
  To: linux-scsi

http://bugzilla.kernel.org/show_bug.cgi?id=13594





--- Comment #2 from Anonymous Emailer <anonymous@kernel-bugs.osdl.org>  2009-06-21 18:55:06 ---
Reply-To: James.Bottomley@HansenPartnership.com

On Sun, 2009-06-21 at 13:47 -0500, James Bottomley wrote:
> >   [  811.091916] sd 0:0:0:0: [sda] Sense Key : Recovered Error [current] [descriptor]
> >   [  811.099807] Descriptor sense data with sense descriptors (in hex):
> >   [  811.106175]         72 01 00 1d 00 00 00 0e 09 0c 00 00 00 00 00 00
> >   [  811.113262]         00 4f 00 c2 00 50
> >   [  811.117379] sd 0:0:0:0: [sda] Add. Sense: ATA pass through information available
> 
> This is a message the kernel prints out on all recovered error returns
> (except those marked REQ_QUIET).  It's purely informational and doesn't
> affect return processing of the command at all, so the kernel is
> actually treating this as a successful completion not an error.
> 
> > I've tried upgrading to the newest firmware (1.28.02.00, 05-MAY-2009), but
> > all that changed is that the hex dump was added to the error message.
> > 
> > Whenever this happens, it appears like all the disks “hiccup” and the kernel
> > loses contact with the controller for a small while. If too many of these
> > happen at once, eventually disks start falling off RAIDs, and the entire
> > machine goes down. It looks to me as if these messages should simply not be
> > treated as errors by the kernel -- smartctl explicitly asks for a response even
> > if the command doesn't fail (by setting CK_COND), so the response probably
> > shouldn't be taken as an error.
> 
> So this sounds like the bug ... however, for the LSI card, this bug will
> be in the SAT layer in the fusion firmware.  I can shut the kernel up by
> making the recovered error processing clause look for 01/00/1D as well
> as REQ_QUIET, but it won't affect this problem.

Actually quieting the message is trivially easy, try this.

James

---

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index f3c4089..a0235c9 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -774,7 +774,8 @@ void scsi_io_completion(struct scsi_cmnd *cmd, unsigned int good_bytes)
      * is what gets returned to the user
      */
     if (sense_valid && sshdr.sense_key == RECOVERED_ERROR) {
-        if (!(req->cmd_flags & REQ_QUIET))
+        if (!(req->cmd_flags & REQ_QUIET) &&
+            !(sshdr.asc == 0x00 && sshdr.ascq == 0x1d))
             scsi_print_sense("", cmd);
         result = 0;
         /* BLOCK_PC may have set error */


* [Bug 13594] SMART responses for SATA disks on SAS get interpreted as errors
  2009-06-21 17:26 [Bug 13594] New: SMART responses for SATA disks on SAS get interpreted as errors bugzilla-daemon
                   ` (2 preceding siblings ...)
  2009-06-21 18:55 ` bugzilla-daemon
@ 2009-06-21 18:58 ` bugzilla-daemon
  2009-06-21 19:07   ` James Bottomley
  2009-06-21 19:07 ` bugzilla-daemon
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 27+ messages in thread
From: bugzilla-daemon @ 2009-06-21 18:58 UTC (permalink / raw)
  To: linux-scsi

http://bugzilla.kernel.org/show_bug.cgi?id=13594





--- Comment #3 from Steinar H. Gunderson <sgunderson@bigfoot.com>  2009-06-21 18:58:28 ---
(In reply to comment #1)
> This is a message the kernel prints out on all recovered error returns
> (except those marked REQ_QUIET).  It's purely informational and doesn't
> affect return processing of the command at all, so the kernel is
> actually treating this as a successful completion not an error.

OK.

> So this sounds like the bug ... however, for the LSI card, this bug will
> be in the SAT layer in the fusion firmware.  I can shut the kernel up by
> making the recovered error processing clause look for 01/00/1D as well
> as REQ_QUIET, but it won't affect this problem.

I tried reporting this to the Linux fusionmpt driver people a while ago, but
never received any response (thus this bug)... I guess I'm out of luck, then,
if there's nothing that can be done for it in the kernel. It's a bit weird,
though; one would believe people ran smartd on their systems and discovered
this already.

/* Steinar */


* Re: [Bug 13594] SMART responses for SATA disks on SAS get interpreted as errors
  2009-06-21 18:58 ` bugzilla-daemon
@ 2009-06-21 19:07   ` James Bottomley
  0 siblings, 0 replies; 27+ messages in thread
From: James Bottomley @ 2009-06-21 19:07 UTC (permalink / raw)
  To: bugzilla-daemon; +Cc: linux-scsi, Moore, Eric

On Sun, 2009-06-21 at 18:58 +0000, bugzilla-daemon@bugzilla.kernel.org
wrote:
> http://bugzilla.kernel.org/show_bug.cgi?id=13594
> 
> 
> 
> 
> 
> --- Comment #3 from Steinar H. Gunderson <sgunderson@bigfoot.com>  2009-06-21 18:58:28 ---
> (In reply to comment #1)
> > This is a message the kernel prints out on all recovered error returns
> > (except those marked REQ_QUIET).  It's purely informational and doesn't
> > affect return processing of the command at all, so the kernel is
> > actually treating this as a successful completion not an error.
> 
> OK.
> 
> > So this sounds like the bug ... however, for the LSI card, this bug will
> > be in the SAT layer in the fusion firmware.  I can shut the kernel up by
> > making the recovered error processing clause look for 01/00/1D as well
> > as REQ_QUIET, but it won't affect this problem.
> 
> I tried reporting this to the Linux fusionmpt driver people a while ago, but
> never received any response (thus this bug)... I guess I'm out of luck,

OK, cc'd LSI people, let's see if I get better luck

>  then,
> if there's nothing that can be done for it in the kernel. It's a bit weird,
> though; one would believe people ran smartd on their systems and discovered
> this already.

I can guess that it's some type of firmware mode problem: either it runs
for SMART or it runs for normal commands, hence the hiatus.  If that's
true, you'd likely only see the problem in a large disk setup ... it
might also be possible to work around by simply quiescing the card
before sending down SMART commands (that would be grossly inefficient,
but at least devices wouldn't get errored).

James




* [Bug 13594] SMART responses for SATA disks on SAS get interpreted as errors
  2009-06-21 17:26 [Bug 13594] New: SMART responses for SATA disks on SAS get interpreted as errors bugzilla-daemon
                   ` (3 preceding siblings ...)
  2009-06-21 18:58 ` bugzilla-daemon
@ 2009-06-21 19:07 ` bugzilla-daemon
  2009-06-21 20:53   ` Douglas Gilbert
  2009-06-21 20:53 ` bugzilla-daemon
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 27+ messages in thread
From: bugzilla-daemon @ 2009-06-21 19:07 UTC (permalink / raw)
  To: linux-scsi

http://bugzilla.kernel.org/show_bug.cgi?id=13594





--- Comment #4 from Anonymous Emailer <anonymous@kernel-bugs.osdl.org>  2009-06-21 19:07:13 ---
Reply-To: James.Bottomley@HansenPartnership.com

On Sun, 2009-06-21 at 18:58 +0000, bugzilla-daemon@bugzilla.kernel.org
wrote:
> http://bugzilla.kernel.org/show_bug.cgi?id=13594
> 
> 
> 
> 
> 
> --- Comment #3 from Steinar H. Gunderson <sgunderson@bigfoot.com>  2009-06-21 18:58:28 ---
> (In reply to comment #1)
> > This is a message the kernel prints out on all recovered error returns
> > (except those marked REQ_QUIET).  It's purely informational and doesn't
> > affect return processing of the command at all, so the kernel is
> > actually treating this as a successful completion not an error.
> 
> OK.
> 
> > So this sounds like the bug ... however, for the LSI card, this bug will
> > be in the SAT layer in the fusion firmware.  I can shut the kernel up by
> > making the recovered error processing clause look for 01/00/1D as well
> > as REQ_QUIET, but it won't affect this problem.
> 
> I tried reporting this to the Linux fusionmpt driver people a while ago, but
> never received any response (thus this bug)... I guess I'm out of luck,

OK, cc'd LSI people, let's see if I get better luck

>  then,
> if there's nothing that can be done for it in the kernel. It's a bit weird,
> though; one would believe people ran smartd on their systems and discovered
> this already.

I can guess that it's some type of firmware mode problem: either it runs
for SMART or it runs for normal commands, hence the hiatus.  If that's
true, you'd likely only see the problem in a large disk setup ... it
might also be possible to work around by simply quiescing the card
before sending down SMART commands (that would be grossly inefficient,
but at least devices wouldn't get errored).

James


* Re: [Bug 13594] SMART responses for SATA disks on SAS get interpreted as errors
  2009-06-21 19:07 ` bugzilla-daemon
@ 2009-06-21 20:53   ` Douglas Gilbert
  2009-06-22 12:04     ` Matthew Wilcox
  0 siblings, 1 reply; 27+ messages in thread
From: Douglas Gilbert @ 2009-06-21 20:53 UTC (permalink / raw)
  To: bugzilla-daemon; +Cc: linux-scsi

bugzilla-daemon@bugzilla.kernel.org wrote:
> http://bugzilla.kernel.org/show_bug.cgi?id=13594
> 
> 
> 
> 
> 
> --- Comment #4 from Anonymous Emailer <anonymous@kernel-bugs.osdl.org>  2009-06-21 19:07:13 ---
> Reply-To: James.Bottomley@HansenPartnership.com
> 
> On Sun, 2009-06-21 at 18:58 +0000, bugzilla-daemon@bugzilla.kernel.org
> wrote:
>> http://bugzilla.kernel.org/show_bug.cgi?id=13594
>>
>>
>>
>>
>>
>> --- Comment #3 from Steinar H. Gunderson <sgunderson@bigfoot.com>  2009-06-21 18:58:28 ---
>> (In reply to comment #1)
>>> This is a message the kernel prints out on all recovered error returns
>>> (except those marked REQ_QUIET).  It's purely informational and doesn't
>>> affect return processing of the command at all, so the kernel is
>>> actually treating this as a successful completion not an error.
>> OK.
>>
>>> So this sounds like the bug ... however, for the LSI card, this bug will
>>> be in the SAT layer in the fusion firmware.  I can shut the kernel up by
>>> making the recovered error processing clause look for 01/00/1D as well
>>> as REQ_QUIET, but it won't affect this problem.
>> I tried reporting this to the Linux fusionmpt driver people a while ago, but
>> never received any response (thus this bug)... I guess I'm out of luck,
> 
> OK, cc'd LSI people, let's see if I get better luck
> 
>>  then,
>> if there's nothing that can be done for it in the kernel. It's a bit weird,
>> though; one would believe people ran smartd on their systems and discovered
>> this already.
> 
> I can guess that it's some type of firmware mode problem: either it runs
> for SMART or it runs for normal commands, hence the hiatus.  If that's
> true, you'd likely only see the problem in a large disk setup ... it
> might also be possible to work around by simply quiescing the card
> before sending down SMART commands (that would be grossly inefficient,
> but at least devices wouldn't get errored).

I have just replicated the "ATA pass through information
available" message report on a similar vintage LSI
controller and a SATA disk with a recent smartctl
version.

There is no need to report this in the kernel error log,
as the smartmontools ATA pass-through (SCSI) command asked
for the final state of the ATA registers and the sense
buffer is the conduit for that information. That ASC/ASCQ
pair basically means "you asked for them and here they
are". [reference: sat2r07b.pdf section 12.2.5 table 107
when CK_COND is 1]
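
As an illustration of "you asked for them and here they are", the sketch below walks the descriptors in a descriptor-format sense buffer and pulls the ATA registers out of the ATA Status Return descriptor (type 09h). The struct and function names are invented for this example; the offsets are the ones SAT gives for that descriptor, and only a few of the registers are extracted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Subset of the ATA registers carried in the descriptor (illustrative). */
struct ata_regs {
    uint8_t error;
    uint8_t lba_mid;
    uint8_t lba_high;
    uint8_t device;
    uint8_t status;
};

/* Returns 0 and fills *out if an ATA Status Return descriptor is found. */
int find_ata_return(const uint8_t *sense, size_t len, struct ata_regs *out)
{
    assert(sense != NULL && out != NULL);
    if (len < 8)
        return -1;
    uint8_t rc = sense[0] & 0x7f;
    if (rc != 0x72 && rc != 0x73)
        return -1;                         /* not descriptor format */

    size_t end = 8 + (size_t)sense[7];     /* header + additional length */
    if (end > len)
        end = len;

    for (size_t off = 8; off + 2 <= end; ) {
        size_t dlen = 2 + (size_t)sense[off + 1];
        if (off + dlen > end)
            break;                         /* truncated descriptor */
        if (sense[off] == 0x09 && dlen >= 14) {
            const uint8_t *d = sense + off;
            out->error    = d[3];
            out->lba_mid  = d[9];          /* low byte of LBA mid */
            out->lba_high = d[11];         /* low byte of LBA high */
            out->device   = d[12];
            out->status   = d[13];
            return 0;
        }
        off += dlen;
    }
    return -1;
}
```

Applied to the 22 bytes from the original report, this yields status 50h (DRDY and DSC set, ERR clear), error 00h, and LBA mid/high 4Fh/C2h, which is the signature a drive returns from SMART RETURN STATUS when no threshold has been exceeded. In other words, the sense buffer is carrying a healthy result.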

As for the hiccup, I have noticed that with SAS (SCSI)
disks from Seagate there is a curious sound and a pause
before the response to LOG SENSE SCSI command (the
type the smartmontools uses on SCSI disks).

Another annoyance is that the disk must be ready (i.e.
spun up) before MODE SENSE and LOG SENSE work; haven't
Seagate heard of flash? :-)
SCSI standards permit that (i.e. only
a small number of commands have to work when the disk
is not ready), but you would think accessing metadata,
given the disk has spun up once since power-up, could
be accomplished from RAM or flash.

Doug Gilbert


* [Bug 13594] SMART responses for SATA disks on SAS get interpreted as errors
  2009-06-21 17:26 [Bug 13594] New: SMART responses for SATA disks on SAS get interpreted as errors bugzilla-daemon
                   ` (4 preceding siblings ...)
  2009-06-21 19:07 ` bugzilla-daemon
@ 2009-06-21 20:53 ` bugzilla-daemon
  2009-06-21 21:14 ` bugzilla-daemon
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 27+ messages in thread
From: bugzilla-daemon @ 2009-06-21 20:53 UTC (permalink / raw)
  To: linux-scsi

http://bugzilla.kernel.org/show_bug.cgi?id=13594





--- Comment #5 from Anonymous Emailer <anonymous@kernel-bugs.osdl.org>  2009-06-21 20:53:36 ---
Reply-To: dgilbert@interlog.com

bugzilla-daemon@bugzilla.kernel.org wrote:
> http://bugzilla.kernel.org/show_bug.cgi?id=13594
> 
> 
> 
> 
> 
> --- Comment #4 from Anonymous Emailer <anonymous@kernel-bugs.osdl.org>  2009-06-21 19:07:13 ---
> Reply-To: James.Bottomley@HansenPartnership.com
> 
> On Sun, 2009-06-21 at 18:58 +0000, bugzilla-daemon@bugzilla.kernel.org
> wrote:
>> http://bugzilla.kernel.org/show_bug.cgi?id=13594
>>
>>
>>
>>
>>
>> --- Comment #3 from Steinar H. Gunderson <sgunderson@bigfoot.com>  2009-06-21 18:58:28 ---
>> (In reply to comment #1)
>>> This is a message the kernel prints out on all recovered error returns
>>> (except those marked REQ_QUIET).  It's purely informational and doesn't
>>> affect return processing of the command at all, so the kernel is
>>> actually treating this as a successful completion not an error.
>> OK.
>>
>>> So this sounds like the bug ... however, for the LSI card, this bug will
>>> be in the SAT layer in the fusion firmware.  I can shut the kernel up by
>>> making the recovered error processing clause look for 01/00/1D as well
>>> as REQ_QUIET, but it won't affect this problem.
>> I tried reporting this to the Linux fusionmpt driver people a while ago, but
>> never received any response (thus this bug)... I guess I'm out of luck,
> 
> OK, cc'd LSI people, let's see if I get better luck
> 
>>  then,
>> if there's nothing that can be done for it in the kernel. It's a bit weird,
>> though; one would believe people ran smartd on their systems and discovered
>> this already.
> 
> I can guess that it's some type of firmware mode problem: either it runs
> for SMART or it runs for normal commands, hence the hiatus.  If that's
> true, you'd likely only see the problem in a large disk setup ... it
> might also be possible to work around by simply quiescing the card
> before sending down SMART commands (that would be grossly inefficient,
> but at least devices wouldn't get errored).

I have just replicated the "ATA pass through information
available" message report on a similar vintage LSI
controller and a SATA disk with a recent smartctl
version.

There is no need to report this in the kernel error log,
as the smartmontools ATA pass-through (SCSI) command asked
for the final state of the ATA registers and the sense
buffer is the conduit for that information. That ASC/ASCQ
pair basically means "you asked for them and here they
are". [reference: sat2r07b.pdf section 12.2.5 table 107
when CK_COND is 1]

As for the hiccup, I have noticed that with SAS (SCSI)
disks from Seagate there is a curious sound and a pause
before the response to LOG SENSE SCSI command (the
type the smartmontools uses on SCSI disks).

Another annoyance is that the disk must be ready (i.e.
spun up) before MODE SENSE and LOG SENSE work, haven't
Seagate heard of flash :-)
SCSI standards permit that (i.e. only
a small number of commands have to work when the disk
is not ready) but you would think accessing metadata
given the disk has spun up once since power up could
be accomplished from RAM or flash.

Doug Gilbert


* [Bug 13594] SMART responses for SATA disks on SAS get interpreted as errors
  2009-06-21 17:26 [Bug 13594] New: SMART responses for SATA disks on SAS get interpreted as errors bugzilla-daemon
                   ` (5 preceding siblings ...)
  2009-06-21 20:53 ` bugzilla-daemon
@ 2009-06-21 21:14 ` bugzilla-daemon
  2009-06-22 12:04 ` bugzilla-daemon
  2009-11-21  0:20 ` bugzilla-daemon
  8 siblings, 0 replies; 27+ messages in thread
From: bugzilla-daemon @ 2009-06-21 21:14 UTC (permalink / raw)
  To: linux-scsi

http://bugzilla.kernel.org/show_bug.cgi?id=13594





--- Comment #6 from Steinar H. Gunderson <sgunderson@bigfoot.com>  2009-06-21 21:14:37 ---
On Sun, Jun 21, 2009 at 08:53:37PM +0000, bugzilla-daemon@bugzilla.kernel.org
wrote:
> I have just replicated the "ATA pass through information
> available" message report on a similar vintage LSI
> controller and a SATA disk with a recent smartctl
> version.
> 
> There is no need to report this in the kernel error log,
> as the smartmontools ATA pass-through (SCSI) command asked
> for the final state of the ATA registers and the sense
> buffer is the conduit for that information. That ASC/ASCQ
> pair basically means "you asked for them and here they
> are". [reference: sat2r07b.pdf section 12.2.5 table 107
> when CK_COND is 1]

OK, this is basically what we agreed on already. I'm not able to
test the given patch right now, though (the machine is a production
machine).

> As for the hiccup, I have noticed that with SAS (SCSI)
> disks from Seagate there is a curious sound and a pause
> before the response to LOG SENSE SCSI command (the
> type the smartmontools uses on SCSI disks).

FWIW, I've used the same disks on SATA controllers with smartctl
without any problems. I'm not entirely sure how to parse your
message, though -- do you imply that the problem is in smartctl?
The disk?

/* Steinar */


* Re: [Bug 13594] SMART responses for SATA disks on SAS get interpreted as errors
  2009-06-21 20:53   ` Douglas Gilbert
@ 2009-06-22 12:04     ` Matthew Wilcox
  0 siblings, 0 replies; 27+ messages in thread
From: Matthew Wilcox @ 2009-06-22 12:04 UTC (permalink / raw)
  To: Douglas Gilbert; +Cc: bugzilla-daemon, linux-scsi

On Sun, Jun 21, 2009 at 04:53:29PM -0400, Douglas Gilbert wrote:
> As for the hiccup, I have noticed that with SAS (SCSI)
> disks from Seagate there is a curious sound and a pause
> before the response to LOG SENSE SCSI command (the
> type the smartmontools uses on SCSI disks).
>
> Another annoyance is that the disk must be ready (i.e.
> spun up) before MODE SENSE and LOG SENSE work, haven't
> Seagate heard of flash :-)
> SCSI standards permit that (i.e. only
> a small number of commands have to work when the disk
> is not ready) but you would think accessing metadata
> given the disk has spun up once since power up could
> be accomplished from RAM or flash.

We've experienced similar problems at Intel with an LSI card and Intel
SSDs (SATA, not SAS).  This issue got pushed into the 'investigate later'
category, as we were able to just disable smartd.  I'll try and get some
more information on this later.

-- 
Matthew Wilcox				Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."


* [Bug 13594] SMART responses for SATA disks on SAS get interpreted as errors
  2009-06-21 17:26 [Bug 13594] New: SMART responses for SATA disks on SAS get interpreted as errors bugzilla-daemon
                   ` (6 preceding siblings ...)
  2009-06-21 21:14 ` bugzilla-daemon
@ 2009-06-22 12:04 ` bugzilla-daemon
  2009-11-21  0:20 ` bugzilla-daemon
  8 siblings, 0 replies; 27+ messages in thread
From: bugzilla-daemon @ 2009-06-22 12:04 UTC (permalink / raw)
  To: linux-scsi

http://bugzilla.kernel.org/show_bug.cgi?id=13594





--- Comment #7 from Matthew Wilcox <matthew@wil.cx>  2009-06-22 12:04:31 ---
On Sun, Jun 21, 2009 at 04:53:29PM -0400, Douglas Gilbert wrote:
> As for the hiccup, I have noticed that with SAS (SCSI)
> disks from Seagate there is a curious sound and a pause
> before the response to LOG SENSE SCSI command (the
> type the smartmontools uses on SCSI disks).
>
> Another annoyance is that the disk must be ready (i.e.
> spun up) before MODE SENSE and LOG SENSE work, haven't
> Seagate heard of flash :-)
> SCSI standards permit that (i.e. only
> a small number of commands have to work when the disk
> is not ready) but you would think accessing metadata
> given the disk has spun up once since power up could
> be accomplished from RAM or flash.

We've experienced similar problems at Intel with an LSI card and Intel
SSDs (SATA, not SAS).  This issue got pushed into the 'investigate later'
category, as we were able to just disable smartd.  I'll try and get some
more information on this later.


* [Bug 13594] SMART responses for SATA disks on SAS get interpreted as errors
  2009-06-21 17:26 [Bug 13594] New: SMART responses for SATA disks on SAS get interpreted as errors bugzilla-daemon
                   ` (7 preceding siblings ...)
  2009-06-22 12:04 ` bugzilla-daemon
@ 2009-11-21  0:20 ` bugzilla-daemon
  8 siblings, 0 replies; 27+ messages in thread
From: bugzilla-daemon @ 2009-11-21  0:20 UTC (permalink / raw)
  To: linux-scsi

http://bugzilla.kernel.org/show_bug.cgi?id=13594


Al Tobey <tobert@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |tobert@gmail.com




--- Comment #8 from Al Tobey <tobert@gmail.com>  2009-11-21 00:20:30 ---
I get the same issue on LSI SAS2008 using the mpt2sas driver in 2.6.32-rc5.  
It wouldn't be a big deal, but it actually increments
/sys/block/$dev/device/ioerr_cnt, which I'd like to use for quick & dirty
checks for drives going south (I realize it's not perfect).

This occurs with both smartmontools 5.38-2+lenny1 as shipped with Debian 5 and
with a local backport of 5.38+svn2956 from experimental.

Trying smartctl -d scsi returns an outright failure. 

I can also reproduce with sg_sat_identify -c.

~$ sudo sg_sat_identify -c /dev/sg13
~$ dmesg |tail -n 5
sd 4:0:11:0: [sg13] Sense Key : Recovered Error [current] [descriptor]
Descriptor sense data with sense descriptors (in hex):
        72 01 00 1d 00 00 00 0e 09 0c 00 00 00 00 00 00 
        00 00 00 00 00 00 
sd 4:0:11:0: [sg13] Add. Sense: ATA pass through information available

~$ cat /sys/block/sdm/device/ioerr_cnt
0x5

~$ sudo smartctl -d sat -q errorsonly -H /dev/sdm
smartctl 5.39 2009-10-10 r2955 [x86_64-unknown-linux-gnu] (local build)
Copyright (C) 2002-9 by Bruce Allen, http://smartmontools.sourceforge.net

Warning! SMART Attribute Thresholds Structure error: invalid SMART checksum.
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

~$ cat /sys/block/sdm/device/ioerr_cnt
0x6

~$ cat /sys/class/scsi_host/host4/device_delay
00
~$ cat /sys/class/scsi_host/host4/version_fw
02.00.50.00
~$ cat /sys/class/scsi_host/host4/version_mpi
200.0b
~$ cat /sys/class/scsi_host/host4/version_product 
LSISAS2008
~$ cat /sys/class/scsi_host/host4/version_bios
07.01.01.00

~$ sudo sg_inq /dev/sg12
standard INQUIRY:
  PQual=0  Device_type=0  RMB=0  version=0x05  [SPC-3]
  [AERC=0]  [TrmTsk=0]  NormACA=0  HiSUP=1  Resp_data_format=2
  SCCS=0  ACC=0  TGPS=0  3PC=0  Protect=0  BQue=0
  EncServ=0  MultiP=0  [MChngr=0]  [ACKREQQ=0]  Addr16=0
  [RelAdr=0]  WBus16=0  Sync=0  Linked=0  [TranDis=0]  CmdQue=1
  [SPI: Clocking=0x0  QAS=0  IUS=0]
    length=74 (0x4a)   Peripheral device type: disk
 Vendor identification: ATA     
 Product identification: WDC WD2002FYPS-0
 Product revision level: 5G04
 Unit serial number:      WD-WCAVY0517841
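
The before/after ioerr_cnt comparison in the transcript above can be scripted. This is a minimal sketch under a few assumptions: the helper names are hypothetical, the counter is assumed to be printed in hexadecimal as in the "0x5" and "0x6" values shown, and the sysfs path is the one used in this comment.

```python
from pathlib import Path


def parse_scsi_counter(text: str) -> int:
    """Parse a sysfs SCSI counter such as '0x5' into an integer."""
    return int(text.strip(), 16)


def read_ioerr_cnt(disk: str, sysfs_root: str = "/sys/block") -> int:
    """Read /sys/block/<disk>/device/ioerr_cnt (sysfs_root is overridable for testing)."""
    path = Path(sysfs_root) / disk / "device" / "ioerr_cnt"
    return parse_scsi_counter(path.read_text())


def ioerr_delta(before: int, after: int) -> int:
    """Number of errors counted between two readings."""
    return after - before
```

Reading the counter with read_ioerr_cnt("sdm") before and after a smartctl run and taking ioerr_delta of the two values would show the spurious increment described here (one per pass-through command in the transcript above).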


* [Bug 13594] SMART responses for SATA disks on SAS get interpreted as errors
       [not found] <bug-13594-11613@https.bugzilla.kernel.org/>
@ 2010-04-03 22:07 ` bugzilla-daemon
  2010-04-27 22:31 ` bugzilla-daemon
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 27+ messages in thread
From: bugzilla-daemon @ 2010-04-03 22:07 UTC (permalink / raw)
  To: linux-scsi

https://bugzilla.kernel.org/show_bug.cgi?id=13594


Cláudio Martins <ctpm@ist.utl.pt> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ctpm@ist.utl.pt




--- Comment #9 from Cláudio Martins <ctpm@ist.utl.pt>  2010-04-03 22:07:47 ---
Hello,

 I'd like to point out that this bug is still present on kernel version
2.6.34-rc3-00163-g5e11611.

 I'm using a Supermicro enclosure with a SAS backplane and 16 SATA 1.5TB drives
(ST31500341AS).

The onboard controller, as reported by lspci:

05:00.0 SCSI storage controller: LSI Logic / Symbios Logic SAS1068E PCI-Express Fusion-MPT SAS (rev 08)

At boot time the mptsas kernel driver reports:

scsi4 : ioc0: LSISAS1068E B3, FwRev=011a0000h, Ports=1, MaxQ=478, IRQ=16

Smartmontools is version 5.38-2+lenny1 (v5.38 from Debian Lenny)


While generating I/O on the disks, I can easily make all I/O stall for several
minutes and even kick drives out of an MD array by running "smartctl -a
/dev/sdX" repeatedly on several drives. During the stall, the kernel logged the
following messages:

mptbase: ioc0: LogInfo(0x31123000): Originator={PL}, Code={Abort}, SubCode(0x3000)
mptbase: ioc0: LogInfo(0x31123000): Originator={PL}, Code={Abort}, SubCode(0x3000)
mptbase: ioc0: LogInfo(0x31123000): Originator={PL}, Code={Abort}, SubCode(0x3000)
mptbase: ioc0: LogInfo(0x31123000): Originator={PL}, Code={Abort}, SubCode(0x3000)
mptscsih: ioc0: attempting task abort! (sc=ffff8802b57aa100)
sd 4:0:10:0: [sdk] CDB: ATA command pass through(16): 85 08 0e 00 d5 00 01 00 09 00 4f 00 c2 00 b0 00
mptbase: ioc0: LogInfo(0x31140000): Originator={PL}, Code={IO Executed}, SubCode(0x0000)
mptscsih: ioc0: task abort: SUCCESS (sc=ffff8802b57aa100)
mptscsih: ioc0: attempting task abort! (sc=ffff8802b57aa100)
sd 4:0:10:0: [sdk] CDB: Test Unit Ready: 00 00 00 00 00 00
mptbase: ioc0: LogInfo(0x31140000): Originator={PL}, Code={IO Executed}, SubCode(0x0000)
mptscsih: ioc0: task abort: SUCCESS (sc=ffff8802b57aa100)
mptscsih: ioc0: attempting task abort! (sc=ffff8802be35ec00)
sd 4:0:10:0: [sdk] CDB: Write(10): 2a 00 96 27 78 00 00 04 00 00
mptscsih: ioc0: task abort: SUCCESS (sc=ffff8802be35ec00)
mptbase: ioc0: LogInfo(0x31123000): Originator={PL}, Code={Abort}, SubCode(0x3000)
mptbase: ioc0: LogInfo(0x31123000): Originator={PL}, Code={Abort}, SubCode(0x3000)
mptscsih: ioc0: attempting task abort! (sc=ffff8802be35eb00)
sd 4:0:10:0: [sdk] CDB: Write(10): 2a 00 96 27 7c 00 00 04 00 00
mptscsih: ioc0: task abort: SUCCESS (sc=ffff8802be35eb00)
mptscsih: ioc0: attempting task abort! (sc=ffff8802be35eb00)
sd 4:0:10:0: [sdk] CDB: Test Unit Ready: 00 00 00 00 00 00
mptbase: ioc0: LogInfo(0x31130000): Originator={PL}, Code={IO Not Yet Executed}, SubCode(0x0000)
mptscsih: ioc0: task abort: SUCCESS (sc=ffff8802be35eb00)
mptscsih: ioc0: attempting target reset! (sc=ffff8802b57aa100)
sd 4:0:10:0: [sdk] CDB: ATA command pass through(16): 85 08 0e 00 d5 00 01 00 09 00 4f 00 c2 00 b0 00
mptscsih: ioc0: target reset: FAILED (sc=ffff8802b57aa100)
mptscsih: ioc0: attempting bus reset! (sc=ffff8802b57aa100)
sd 4:0:10:0: [sdk] CDB: ATA command pass through(16): 85 08 0e 00 d5 00 01 00 09 00 4f 00 c2 00 b0 00
mptscsih: ioc0: bus reset: SUCCESS (sc=ffff8802b57aa100)
mptscsih: ioc0: attempting task abort! (sc=ffff8802b57aa100)
sd 4:0:10:0: [sdk] CDB: Test Unit Ready: 00 00 00 00 00 00
mptbase: ioc0: LogInfo(0x31130000): Originator={PL}, Code={IO Not Yet Executed}, SubCode(0x0000)
mptscsih: ioc0: task abort: SUCCESS (sc=ffff8802b57aa100)
mptbase: ioc0: LogInfo(0x31123000): Originator={PL}, Code={Abort}, SubCode(0x3000)
mptbase: ioc0: LogInfo(0x31123000): Originator={PL}, Code={Abort}, SubCode(0x3000)
mptscsih: ioc0: attempting task abort! (sc=ffff8802be35eb00)
sd 4:0:10:0: [sdk] CDB: Test Unit Ready: 00 00 00 00 00 00
mptbase: ioc0: LogInfo(0x31130000): Originator={PL}, Code={IO Not Yet Executed}, SubCode(0x0000)
mptscsih: ioc0: task abort: SUCCESS (sc=ffff8802be35eb00)
mptscsih: ioc0: attempting host reset! (sc=ffff8802b57aa100)
mptbase: ioc0: Initiating recovery
mptscsih: ioc0: host reset: SUCCESS (sc=ffff8802b57aa100)
end_request: I/O error, dev sdb, sector 3903551
md: super_written gets error=-5, uptodate=0
raid1: Disk failure on sdb1, disabling device.
raid1: Operation continuing on 1 devices.
end_request: I/O error, dev sda, sector 3903551
md: super_written gets error=-5, uptodate=0
RAID1 conf printout:
 --- wd:1 rd:2
 disk 0, wo:0, o:1, dev:sda1
 disk 1, wo:1, o:0, dev:sdb1
RAID1 conf printout:
 --- wd:1 rd:2
 disk 0, wo:0, o:1, dev:sda1
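
The command that keeps getting aborted in the log above is the 16-byte ATA pass-through CDB "85 08 0e 00 d5 00 01 00 09 00 4f 00 c2 00 b0 00". A rough decode helps to see what smartctl was asking for. This is only a sketch: the function name is hypothetical, and the field positions are my reading of the SAT ATA PASS-THROUGH (16) layout, so please check them against the standard before relying on them.

```python
def decode_ata_pass_through_16(cdb: bytes) -> dict:
    """Split an ATA PASS-THROUGH (16) CDB into its fields.

    Hypothetical helper; assumes the SAT layout with opcode 0x85.
    Only the low byte of each 16-bit register field is returned.
    """
    if len(cdb) != 16 or cdb[0] != 0x85:
        raise ValueError("not an ATA PASS-THROUGH (16) CDB")
    return {
        "protocol": (cdb[1] >> 1) & 0x0F,   # 4 should be PIO Data-In
        "extend": bool(cdb[1] & 0x01),
        "ck_cond": bool(cdb[2] & 0x20),     # request ATA registers in sense data
        "t_dir_from_device": bool(cdb[2] & 0x08),
        "byte_block": bool(cdb[2] & 0x04),
        "t_length": cdb[2] & 0x03,
        "features": cdb[4],
        "sector_count": cdb[6],
        "lba_low": cdb[8],
        "lba_mid": cdb[10],
        "lba_high": cdb[12],
        "device": cdb[13],
        "command": cdb[14],
        "control": cdb[15],
    }


if __name__ == "__main__":
    cdb = bytes.fromhex("85 08 0e 00 d5 00 01 00 09 00 4f 00 c2 00 b0 00")
    for name, value in decode_ata_pass_through_16(cdb).items():
        print(f"{name}: {value!r}")
```

For this CDB the command register is B0h (SMART) with features D5h, which I believe is SMART READ LOG, a one-sector PIO data-in transfer with the 4Fh/C2h SMART signature in LBA mid/high. CK_COND is clear in this particular command, so it is not the same case as the "ATA pass through information available" sense data from the original report.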


--------------

 I have this hardware available for a few weeks, so I am willing to help with
any tests, diagnostic operations, patches or firmware that you might have.

 Any help with this is appreciated, since the fact that drives are being kicked
from MD arrays makes using Smartmontools quite difficult.

 Thanks in advance for your help.

Best regards 

Cláudio


* [Bug 13594] SMART responses for SATA disks on SAS get interpreted as errors
       [not found] <bug-13594-11613@https.bugzilla.kernel.org/>
  2010-04-03 22:07 ` bugzilla-daemon
@ 2010-04-27 22:31 ` bugzilla-daemon
  2010-05-01  4:45 ` bugzilla-daemon
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 27+ messages in thread
From: bugzilla-daemon @ 2010-04-27 22:31 UTC (permalink / raw)
  To: linux-scsi

https://bugzilla.kernel.org/show_bug.cgi?id=13594


kdesai <kashyap.desai@lsi.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |kashyap.desai@lsi.com

andcycle-bugzilla.kernel.org@andcycle.idv.tw changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |andcycle-bugzilla.kernel.or
                   |                            |g@andcycle.idv.tw




--- Comment #10 from kdesai <kashyap.desai@lsi.com>  2010-04-05 07:49:42 ---
Claudio,

I tried a similar test on my setup and was not able to reproduce the issue
you reported.

We need to know whether it is specific to these SATA disks or a generic issue.

Can you please provide the details below?

a) What is the behavior if you use a different SATA disk model instead of the
one you are using currently?
b) I used the steps below to try to reproduce it. (Please correct me if I
missed anything while mimicking your test case.)
    mdadm --create --verbose /dev/md0 --level=raid1 --raid-devices=2 /dev/sdc /dev/sdd
    while true; do smartctl -a /dev/sdX; done
I kept this running for 15 minutes and could not see any issue on my setup.
Is this the correct way to reproduce the issue?

My disks are Seagate ST320000641AS (2TB), FW version CC12.


I suspect this issue may also depend on the end devices.
I need to confirm that by trying some other combinations. Can you answer the
questions above so that we can move on to the next steps of the investigation?
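
Since comment #9 mentions running smartctl repeatedly on several drives while I/O is in flight, a reproduction attempt probably needs to poll more than one device at a time. Below is a minimal sketch of such a loop, with hypothetical helper names; it only wraps the "smartctl -a" invocation quoted above, and the injectable run argument exists purely so the loop can be exercised without touching real disks. On affected hardware this may kick drives out of arrays, so it should only be pointed at a test machine.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def smartctl_cmd(device: str) -> list:
    """Command line for one SMART query, as used in this thread."""
    return ["smartctl", "-a", device]


def poll_device(device: str, rounds: int, run=subprocess.run) -> list:
    """Run smartctl -a against one device `rounds` times; return exit codes."""
    codes = []
    for _ in range(rounds):
        proc = run(smartctl_cmd(device), capture_output=True, text=True)
        codes.append(proc.returncode)
    return codes


def poll_devices(devices: list, rounds: int, run=subprocess.run) -> dict:
    """Poll all devices concurrently, one worker thread per device."""
    with ThreadPoolExecutor(max_workers=max(1, len(devices))) as pool:
        futures = {dev: pool.submit(poll_device, dev, rounds, run)
                   for dev in devices}
        return {dev: fut.result() for dev, fut in futures.items()}
```

Something like poll_devices(["/dev/sdc", "/dev/sdd"], rounds=100), run as root while the array is under write load, would be closer to the reported scenario than a single sequential loop.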


--Kashyap

--- Comment #11 from andcycle-bugzilla.kernel.org@andcycle.idv.tw  2010-04-27 22:29:53 ---
Oops, someone just posted a more detailed analysis of this problem on LKML;
I am going to try it.

Date    Mon, 26 Apr 2010 18:11:54 -0500
From    Ryan Kuester <>
Subject    mptsas hangs caused by ATA pass-through explained

http://lkml.org/lkml/2010/4/26/335


* [Bug 13594] SMART responses for SATA disks on SAS get interpreted as errors
       [not found] <bug-13594-11613@https.bugzilla.kernel.org/>
  2010-04-03 22:07 ` bugzilla-daemon
  2010-04-27 22:31 ` bugzilla-daemon
@ 2010-05-01  4:45 ` bugzilla-daemon
  2010-05-12 14:09 ` bugzilla-daemon
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 27+ messages in thread
From: bugzilla-daemon @ 2010-05-01  4:45 UTC (permalink / raw)
  To: linux-scsi

https://bugzilla.kernel.org/show_bug.cgi?id=13594


Don Bindner <don.bindner@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |don.bindner@gmail.com





* [Bug 13594] SMART responses for SATA disks on SAS get interpreted as errors
       [not found] <bug-13594-11613@https.bugzilla.kernel.org/>
                   ` (2 preceding siblings ...)
  2010-05-01  4:45 ` bugzilla-daemon
@ 2010-05-12 14:09 ` bugzilla-daemon
  2010-05-12 14:43   ` Douglas Gilbert
  2010-05-12 15:20 ` bugzilla-daemon
                   ` (7 subsequent siblings)
  11 siblings, 1 reply; 27+ messages in thread
From: bugzilla-daemon @ 2010-05-12 14:09 UTC (permalink / raw)
  To: linux-scsi

https://bugzilla.kernel.org/show_bug.cgi?id=13594


Ken Stailey <kstailey@yahoo.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |kstailey@yahoo.com




--- Comment #12 from Ken Stailey <kstailey@yahoo.com>  2010-05-12 14:09:28 ---
A utility from LSI is available here:
ftp://ftp.lsil.com/HostAdapterDrivers/linux/lsiutil/ 

Some information from my use of lsiutil:

Board name: LSISAS3442E-R
Board assembly: L3-00120-05E

Current active firmware version is 01172b00 (1.23.43)
Firmware image's version is MPTFW-01.23.43.00-IE
  LSI Logic
x86 BIOS image's version is MPTBIOS-6.18.05.00 (2008.05.14)
EFI BIOS image's version is 3.05.01.01

Diagnostics -> Display phy counters:
Adapter Phy 1: Link Up
  Invalid DWord Count 2,734
  Running Disparity Error Count 2,757
  Loss of DWord Synch Count 0
  Phy Reset Problem Count 0 

Other information:

$ lspci | grep LSI
03:00.0 SCSI storage controller: LSI Logic / Symbios Logic SAS1068E PCI-Express Fusion-MPT SAS (rev 08)

$ uname -srvm
Linux 2.6.31-21-generic #59-Ubuntu SMP Wed Mar 24 07:28:27 UTC 2010 x86_64

$ strings /lib/modules/2.6.31-21-generic/kernel/drivers/message/fusion/mptsas.ko | grep version=
version=3.04.10
srcversion=4023EA52994688E9AE61982

$ lsb_release -d
Description:    Ubuntu 9.10


* Re: [Bug 13594] SMART responses for SATA disks on SAS get interpreted as errors
  2010-05-12 14:09 ` bugzilla-daemon
@ 2010-05-12 14:43   ` Douglas Gilbert
  0 siblings, 0 replies; 27+ messages in thread
From: Douglas Gilbert @ 2010-05-12 14:43 UTC (permalink / raw)
  To: bugzilla-daemon; +Cc: linux-scsi

The originally reported problem has been fixed. See:
http://git.kernel.org/?p=linux/kernel/git/jejb/scsi-misc-2.6.git;a=commit;h=91b25002bd58f55207e4662a611a6cded4ef9834

I was told that was scheduled to go in lk 2.6.33

Reading the bugzilla entry, some of the later posts
could be reporting other LSI-related problems.
Anyway, the bug report should be closed.

Doug Gilbert



* [Bug 13594] SMART responses for SATA disks on SAS get interpreted as errors
       [not found] <bug-13594-11613@https.bugzilla.kernel.org/>
                   ` (3 preceding siblings ...)
  2010-05-12 14:09 ` bugzilla-daemon
@ 2010-05-12 15:20 ` bugzilla-daemon
  2010-05-12 17:42 ` bugzilla-daemon
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 27+ messages in thread
From: bugzilla-daemon @ 2010-05-12 15:20 UTC (permalink / raw)
  To: linux-scsi

https://bugzilla.kernel.org/show_bug.cgi?id=13594





--- Comment #13 from Anonymous Emailer <anonymous@kernel-bugs.osdl.org>  2010-05-12 15:20:09 ---
Reply-To: dgilbert@interlog.com

The originally reported problem has been fixed. See:
http://git.kernel.org/?p=linux/kernel/git/jejb/scsi-misc-2.6.git;a=commit;h=91b25002bd58f55207e4662a611a6cded4ef9834

I was told that was scheduled to go in lk 2.6.33

Reading the bugzilla entry, some of the later posts
could be reporting other LSI-related problems.
Anyway, the bug report should be closed.

Doug Gilbert



* [Bug 13594] SMART responses for SATA disks on SAS get interpreted as errors
       [not found] <bug-13594-11613@https.bugzilla.kernel.org/>
                   ` (4 preceding siblings ...)
  2010-05-12 15:20 ` bugzilla-daemon
@ 2010-05-12 17:42 ` bugzilla-daemon
  2010-05-12 17:43 ` bugzilla-daemon
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 27+ messages in thread
From: bugzilla-daemon @ 2010-05-12 17:42 UTC (permalink / raw)
  To: linux-scsi

https://bugzilla.kernel.org/show_bug.cgi?id=13594





--- Comment #14 from Steinar H. Gunderson <sgunderson@bigfoot.com>  2010-05-12 17:42:13 ---
On Wed, May 12, 2010 at 03:20:14PM +0000, bugzilla-daemon@bugzilla.kernel.org
wrote:
> The originally reported problem has been fixed. See:
> http://git.kernel.org/?p=linux/kernel/git/jejb/scsi-misc-2.6.git;a=commit;h=91b25002bd58f55207e4662a611a6cded4ef9834
> 
> I was told that was scheduled to go in lk 2.6.33
> 
> Reading the bugzilla entry some of the latter posts
> could be reporting some other LSI related problems.
> Anyway, the bug report should be closed.

It actually seems that in 2.6.34-rc6 I can use SMART pretty much with
impunity. Don't know if I'm just luckier now or what happened...

/* Steinar */


* [Bug 13594] SMART responses for SATA disks on SAS get interpreted as errors
       [not found] <bug-13594-11613@https.bugzilla.kernel.org/>
                   ` (5 preceding siblings ...)
  2010-05-12 17:42 ` bugzilla-daemon
@ 2010-05-12 17:43 ` bugzilla-daemon
  2010-05-18 15:04 ` bugzilla-daemon
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 27+ messages in thread
From: bugzilla-daemon @ 2010-05-12 17:43 UTC (permalink / raw)
  To: linux-scsi

https://bugzilla.kernel.org/show_bug.cgi?id=13594





--- Comment #15 from Steinar H. Gunderson <sgunderson@bigfoot.com>  2010-05-12 17:42:55 ---
On Wed, May 12, 2010 at 06:45:33PM +0200, Steinar H. Gunderson wrote:
> It actually seems like that in 2.6.34-rc6, I can use SMART pretty much with
> impunity. Don't know if I'm just luckier now or what happened...

Scratch that; I could use smartctl all I wanted, but installing smartd
promptly floored the entire card (and with it, the machine, since the RAID
went away). dmesg below.

At reboot, I kept seeing the “IOC is in FAULT state” until I got logged in
and killed smartd again.

/* Steinar */
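
For anyone trying to read the LogInfo words in the dmesg output that follows, they appear to split into fixed bit fields. The sketch below is based on my understanding of how the mptbase driver prints them (bus type in the top nibble, then originator, code and subcode); the helper name is hypothetical, and the code table only lists the values that actually occur in the logs in this thread.

```python
# Originator and code names as printed by mptbase in the logs in this thread.
ORIGINATORS = {0: "IOP", 1: "PL", 2: "IR"}
PL_CODES = {
    0x11: "Reset",
    0x12: "Abort",
    0x13: "IO Not Yet Executed",
    0x14: "IO Executed",
}


def decode_sas_loginfo(value: int) -> dict:
    """Split a 32-bit SAS LogInfo word into its fields (assumed layout)."""
    originator = (value >> 24) & 0x0F
    code = (value >> 16) & 0xFF
    return {
        "bus_type": (value >> 28) & 0x0F,   # 3 is used for SAS
        "originator": ORIGINATORS.get(originator, "unknown"),
        "code": code,
        # Name lookup is only attempted for PL-originated codes seen here.
        "code_name": PL_CODES.get(code, "unknown") if originator == 1 else "unknown",
        "subcode": value & 0xFFFF,
    }


if __name__ == "__main__":
    for word in (0x31140000, 0x31111000, 0x31112000, 0x31123000, 0x31130000):
        print(f"{word:#010x}", decode_sas_loginfo(word))
```

Decoded this way, 0x31111000 and 0x31112000 are both PL "Reset" events that differ only in the subcode, which matches what the driver prints.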

[588630.695020] mptscsih: ioc0: attempting task abort! (sc=ffff880182bab200)
[588630.702007] sd 0:0:4:0: [sde] CDB: ATA command pass through(16): 85 08 0e 00 d5 00 01 00 09 00 4f 00 c2 00 b0 00
[588632.074809] mptbase: ioc0: LogInfo(0x31140000): Originator={PL}, Code={IO Executed}, SubCode(0x0000)
[588632.084283] mptscsih: ioc0: task abort: SUCCESS (sc=ffff880182bab200)
[588638.081578] mptbase: ioc0: LogInfo(0x31111000): Originator={PL}, Code={Reset}, SubCode(0x1000)
[588638.095332] mptbase: ioc0: LogInfo(0x31112000): Originator={PL}, Code={Reset}, SubCode(0x2000)
[588642.090380] mptscsih: ioc0: attempting task abort! (sc=ffff880182bab200)
[588642.097310] sd 0:0:4:0: [sde] CDB: Test Unit Ready: 00 00 00 00 00 00
[588642.104177] mptscsih: ioc0: task abort: SUCCESS (sc=ffff880182bab200)
[588642.110862] mptscsih: ioc0: attempting task abort! (sc=ffff8801c9ffb600)
[588642.117813] sd 0:0:4:0: [sde] CDB: Synchronize Cache(10): 35 00 00 00 00 00 00 00 00 00
[588642.126382] mptscsih: ioc0: task abort: SUCCESS (sc=ffff8801c9ffb600)
[588652.133012] mptscsih: ioc0: attempting task abort! (sc=ffff8801c9ffb600)
[588652.140020] sd 0:0:4:0: [sde] CDB: Test Unit Ready: 00 00 00 00 00 00
[588652.146909] mptscsih: ioc0: task abort: SUCCESS (sc=ffff8801c9ffb600)
[588652.153621] mptscsih: ioc0: attempting target reset! (sc=ffff880182bab200)
[588652.160768] sd 0:0:4:0: [sde] CDB: ATA command pass through(16): 85 08 0e 00 d5 00 01 00 09 00 4f 00 c2 00 b0 00
[588652.177222] mptbase: ioc0: LogInfo(0x31112000): Originator={PL}, Code={Reset}, SubCode(0x2000)
[588653.575199] mptscsih: ioc0: target reset: SUCCESS (sc=ffff880182bab200)
[588656.583548] mptbase: ioc0: LogInfo(0x31111000): Originator={PL}, Code={Reset}, SubCode(0x1000)
[588656.594285] mptbase: ioc0: LogInfo(0x31112000): Originator={PL}, Code={Reset}, SubCode(0x2000)
[588663.582006] mptscsih: ioc0: attempting task abort! (sc=ffff880182bab200)
[588663.588952] sd 0:0:4:0: [sde] CDB: Test Unit Ready: 00 00 00 00 00 00
[588663.595810] mptscsih: ioc0: task abort: SUCCESS (sc=ffff880182bab200)
[588663.602509] mptscsih: ioc0: attempting bus reset! (sc=ffff880182bab200)
[588663.609381] sd 0:0:4:0: [sde] CDB: ATA command pass through(16): 85 08 0e 00 d5 00 01 00 09 00 4f 00 c2 00 b0 00
[588663.670443] mptbase: ioc0: LogInfo(0x31112000): Originator={PL}, Code={Reset}, SubCode(0x2000)
[588665.077991] mptscsih: ioc0: bus reset: SUCCESS (sc=ffff880182bab200)
[588668.083821] mptbase: ioc0: LogInfo(0x31111000): Originator={PL}, Code={Reset}, SubCode(0x1000)
[588675.085656] sd 0:0:4:0: [sde] Device not ready
[588675.090326] sd 0:0:4:0: [sde] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[588675.097771] sd 0:0:4:0: [sde] Sense Key : Not Ready [current]
[588675.103969] sd 0:0:4:0: [sde] Add. Sense: Logical unit failed self-configuration
[588675.111743] sd 0:0:4:0: [sde] CDB: Write(10): 2a 00 57 54 52 08 00 00 08 00
[588675.119172] end_request: I/O error, dev sde, sector 1465143816
[588675.125343] end_request: I/O error, dev sde, sector 1465143816
[588675.126238] md: super_written gets error=-5, uptodate=0
[588675.126238] raid5: Disk failure on sde6, disabling device.
[588675.126238] raid5: Operation continuing on 5 devices.
[588675.148011] sd 0:0:4:0: [sde] Device not ready
[588675.152712] sd 0:0:4:0: [sde] Device not ready
[588675.152723] sd 0:0:4:0: [sde] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[588675.152725] sd 0:0:4:0: [sde] Sense Key : Not Ready [current]
[588675.152727] sd 0:0:4:0: [sde] Add. Sense: Logical unit failed self-configuration
[588675.152730] sd 0:0:4:0: [sde] CDB: Read(10): 28 00 44 bc ae 88 00 00 80 00
[588675.152733] end_request: I/O error, dev sde, sector 1153216136
[588675.152740] sd 0:0:4:0: [sde] Device not ready
[588675.152741] sd 0:0:4:0: [sde] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[588675.152743] sd 0:0:4:0: [sde] Sense Key : Not Ready [current]
[588675.152745] sd 0:0:4:0: [sde] Add. Sense: Logical unit failed self-configuration
[588675.152747] sd 0:0:4:0: [sde] CDB: Read(10): 28 00 2b 6a 42 70 00 00 18 00
[588675.152751] end_request: I/O error, dev sde, sector 728384112
[588675.152755] sd 0:0:4:0: [sde] Device not ready
[588675.152756] sd 0:0:4:0: [sde] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[588675.152758] sd 0:0:4:0: [sde] Sense Key : Not Ready [current]
[588675.152759] sd 0:0:4:0: [sde] Add. Sense: Logical unit failed self-configuration
[588675.152762] sd 0:0:4:0: [sde] CDB: Read(10): 28 00 45 9c d0 08 00 00 80 00
[588675.152765] end_request: I/O error, dev sde, sector 1167904776
[588675.152769] sd 0:0:4:0: [sde] Device not ready
[588675.152770] sd 0:0:4:0: [sde] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[588675.152771] sd 0:0:4:0: [sde] Sense Key : Not Ready [current]
[588675.152773] sd 0:0:4:0: [sde] Add. Sense: Logical unit failed self-configuration
[588675.152775] sd 0:0:4:0: [sde] CDB: Read(10): 28 00 2c c6 b8 88 00 00 80 00
[588675.152779] end_request: I/O error, dev sde, sector 751220872
[588675.152783] sd 0:0:4:0: [sde] Device not ready
[588675.152784] sd 0:0:4:0: [sde] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[588675.152785] sd 0:0:4:0: [sde] Sense Key : Not Ready [current]
[588675.152787] sd 0:0:4:0: [sde] Add. Sense: Logical unit failed self-configuration
[588675.152789] sd 0:0:4:0: [sde] CDB: Read(10): 28 00 32 fb f1 08 00 00 80 00
[588675.152793] end_request: I/O error, dev sde, sector 855372040
[588675.152796] sd 0:0:4:0: [sde] Device not ready
[588675.152797] sd 0:0:4:0: [sde] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[588675.152799] sd 0:0:4:0: [sde] Sense Key : Not Ready [current]
[588675.152801] sd 0:0:4:0: [sde] Add. Sense: Logical unit failed self-configuration
[588675.152803] sd 0:0:4:0: [sde] CDB: Read(10): 28 00 34 1e a8 88 00 00 80 00
[588675.152806] end_request: I/O error, dev sde, sector 874424456
[588675.152811] sd 0:0:4:0: [sde] Device not ready
[588675.152812] sd 0:0:4:0: [sde] Result: hostbyte=DID_OK
driverbyte=DRIVER_SENSE
[588675.152813] sd 0:0:4:0: [sde] Sense Key : Not Ready [current]
[588675.152815] sd 0:0:4:0: [sde] Add. Sense: Logical unit failed
self-configuration
[588675.152817] sd 0:0:4:0: [sde] CDB: Read(10): 28 00 3e 4d bd 88 00 00 80 00
[588675.152821] end_request: I/O error, dev sde, sector 1045282184
[588675.152824] sd 0:0:4:0: [sde] Device not ready
[588675.152825] sd 0:0:4:0: [sde] Result: hostbyte=DID_OK
driverbyte=DRIVER_SENSE
[588675.152827] sd 0:0:4:0: [sde] Sense Key : Not Ready [current]
[588675.152828] sd 0:0:4:0: [sde] Add. Sense: Logical unit failed
self-configuration
[588675.152831] sd 0:0:4:0: [sde] CDB: Read(10): 28 00 18 a3 50 88 00 00 10 00
[588675.152834] end_request: I/O error, dev sde, sector 413356168
[588675.152838] sd 0:0:4:0: [sde] Device not ready
[588675.152839] sd 0:0:4:0: [sde] Result: hostbyte=DID_OK
driverbyte=DRIVER_SENSE
[588675.152841] sd 0:0:4:0: [sde] Sense Key : Not Ready [current]
[588675.152842] sd 0:0:4:0: [sde] Add. Sense: Logical unit failed
self-configuration
[588675.152845] sd 0:0:4:0: [sde] CDB: Read(10): 28 00 18 a3 50 a0 00 00 68 00
[588675.152848] end_request: I/O error, dev sde, sector 413356192
[588675.152855] sd 0:0:4:0: [sde] Device not ready
[588675.152856] sd 0:0:4:0: [sde] Result: hostbyte=DID_OK
driverbyte=DRIVER_SENSE
[588675.152857] sd 0:0:4:0: [sde] Sense Key : Not Ready [current]
[588675.152859] sd 0:0:4:0: [sde] Add. Sense: Logical unit failed
self-configuration
[588675.152861] sd 0:0:4:0: [sde] CDB: Read(10): 28 00 36 12 e4 08 00 00 80 00
[588675.152865] end_request: I/O error, dev sde, sector 907207688
[588675.152868] sd 0:0:4:0: [sde] Device not ready
[588675.152869] sd 0:0:4:0: [sde] Result: hostbyte=DID_OK
driverbyte=DRIVER_SENSE
[588675.152871] sd 0:0:4:0: [sde] Sense Key : Not Ready [current]
[588675.152873] sd 0:0:4:0: [sde] Add. Sense: Logical unit failed
self-configuration
[588675.152875] sd 0:0:4:0: [sde] CDB: Read(10): 28 00 31 8e c0 08 00 00 80 00
[588675.152878] end_request: I/O error, dev sde, sector 831438856
[588675.152882] sd 0:0:4:0: [sde] Device not ready
[588675.152883] sd 0:0:4:0: [sde] Result: hostbyte=DID_OK
driverbyte=DRIVER_SENSE
[588675.152885] sd 0:0:4:0: [sde] Sense Key : Not Ready [current]
[588675.152886] sd 0:0:4:0: [sde] Add. Sense: Logical unit failed
self-configuration
[588675.152889] sd 0:0:4:0: [sde] CDB: Read(10): 28 00 3d e3 2f 08 00 00 80 00
[588675.152892] end_request: I/O error, dev sde, sector 1038298888
[588675.152910] end_request: I/O error, dev sdj, sector 2930271882
[588675.152912] md: super_written gets error=-5, uptodate=0
[588675.152915] raid5: Disk failure on sdj6, disabling device.
[588675.152915] raid5: Operation continuing on 4 devices.
[588675.158145] end_request: I/O error, dev sdh, sector 2930271882
[588675.158147] md: super_written gets error=-5, uptodate=0
[588675.158149] raid5: Disk failure on sdh6, disabling device.
[588675.158150] raid5: Operation continuing on 3 devices.
[588675.160440] end_request: I/O error, dev sdk, sector 2930271882
[588675.160442] md: super_written gets error=-5, uptodate=0
[588675.160444] raid5: Disk failure on sdk6, disabling device.
[588675.160445] raid5: Operation continuing on 2 devices.
[588675.161965] end_request: I/O error, dev sdg, sector 2930271882
[588675.161967] md: super_written gets error=-5, uptodate=0
[588675.161969] raid5: Disk failure on sdg6, disabling device.
[588675.161970] raid5: Operation continuing on 1 devices.
[588675.168925] end_request: I/O error, dev sdi, sector 2930271882
[588675.168927] md: super_written gets error=-5, uptodate=0
[588675.168929] raid5: Disk failure on sdi6, disabling device.
[588675.168930] raid5: Operation continuing on 0 devices.
[588675.168948] RAID5 conf printout:
[588675.168950]  --- rd:5 wd:0
[588675.168951]  disk 0, o:0, dev:sdg6
[588675.168952]  disk 1, o:0, dev:sdh6
[588675.168953]  disk 2, o:0, dev:sdi6
[588675.168955]  disk 3, o:0, dev:sdj6
[588675.168956]  disk 4, o:0, dev:sdk6
[588675.758839] sd 0:0:4:0: [sde] Result: hostbyte=DID_OK
driverbyte=DRIVER_SENSE
[588675.766304] sd 0:0:4:0: [sde] Sense Key : Not Ready [current]
[588675.772415] sd 0:0:4:0: [sde] Add. Sense: Logical unit failed
self-configuration
[588675.780146] sd 0:0:4:0: [sde] CDB: Read(10): 28 00 00 5a cc 98 00 00 08 00
[588675.787487] end_request: I/O error, dev sde, sector 5950616
[588675.793314] sd 0:0:4:0: [sde] Device not ready
[588675.797984] sd 0:0:4:0: [sde] Result: hostbyte=DID_OK
driverbyte=DRIVER_SENSE
[588675.805453] sd 0:0:4:0: [sde] Sense Key : Not Ready [current]
[588675.811572] sd 0:0:4:0: [sde] Add. Sense: Logical unit failed
self-configuration
[588675.819312] sd 0:0:4:0: [sde] CDB: Read(10): 28 00 00 5a cc b0 00 00 08 00
[588675.826713] end_request: I/O error, dev sde, sector 5950640
[588675.838005] RAID5 conf printout:
[588675.841422]  --- rd:5 wd:0
[588675.844323]  disk 1, o:0, dev:sdh6
[588675.847913]  disk 2, o:0, dev:sdi6
[588675.851516]  disk 3, o:0, dev:sdj6
[588675.855106]  disk 4, o:0, dev:sdk6
[588675.858697] RAID5 conf printout:
[588675.862139]  --- rd:5 wd:0
[588675.862154] RAID5 conf printout:
[588675.862155]  --- rd:6 wd:5
[588675.862157]  disk 0, o:1, dev:sda6
[588675.862158]  disk 1, o:1, dev:sdf6
[588675.862160]  disk 2, o:0, dev:sde6
[588675.862162]  disk 3, o:1, dev:sdc6
[588675.862163]  disk 4, o:1, dev:sdb1
[588675.862164]  disk 5, o:1, dev:sdd1
[588675.892893]  disk 1, o:0, dev:sdh6
[588675.896490]  disk 2, o:0, dev:sdi6
[588675.900086]  disk 3, o:0, dev:sdj6
[588675.903674]  disk 4, o:0, dev:sdk6
[588675.912005] RAID5 conf printout:
[588675.915438]  --- rd:5 wd:0
[588675.918339]  disk 1, o:0, dev:sdh6
[588675.919256] RAID5 conf printout:
[588675.919258]  --- rd:6 wd:5
[588675.919259]  disk 0, o:1, dev:sda6
[588675.919261]  disk 1, o:1, dev:sdf6
[588675.919262]  disk 3, o:1, dev:sdc6
[588675.919263]  disk 4, o:1, dev:sdb1
[588675.919264]  disk 5, o:1, dev:sdd1
[588675.946353]  disk 2, o:0, dev:sdi6
[588675.949956]  disk 3, o:0, dev:sdj6
[588675.953553] RAID5 conf printout:
[588675.956990]  --- rd:5 wd:0
[588675.959890]  disk 1, o:0, dev:sdh6
[588675.963502]  disk 2, o:0, dev:sdi6
[588675.967103]  disk 3, o:0, dev:sdj6
[588675.974006] RAID5 conf printout:
[588675.977431]  --- rd:5 wd:0
[588675.980380]  disk 1, o:0, dev:sdh6
[588675.984017]  disk 2, o:0, dev:sdi6
[588675.987633] RAID5 conf printout:
[588675.991069]  --- rd:5 wd:0
[588675.993989]  disk 1, o:0, dev:sdh6
[588675.997600]  disk 2, o:0, dev:sdi6
[588676.006006] RAID5 conf printout:
[588676.009465]  --- rd:5 wd:0
[588676.012375]  disk 1, o:0, dev:sdh6
[588676.015985] RAID5 conf printout:
[588676.019473]  --- rd:5 wd:0
[588676.022367]  disk 1, o:0, dev:sdh6
[588676.030005] RAID5 conf printout:
[588676.033439]  --- rd:5 wd:0
[588676.036350] Buffer I/O error on device dm-15, logical block 307593216
[588676.043081] lost page write due to I/O error on dm-15
[588676.751012] ttyS0: 1 input overrun(s)
[588679.821212] ttyS0: 1 input overrun(s)
[588702.915013] mptbase: ioc0: WARNING - IOC is in FAULT state (7827h)!!!
[588702.921701] mptbase: ioc0: WARNING - Issuing HardReset from
mpt_fault_reset_work!!
[588702.929579] mptbase: ioc0: Initiating recovery
[588702.934245] mptbase: ioc0: WARNING - IOC is in FAULT state!!!
[588702.940210] mptbase: ioc0: WARNING -            FAULT code = 7827h
[588706.051011] mptbase: ioc0: Recovered from IOC FAULT
[588717.036031] mptbase: ioc0: WARNING - mpt_fault_reset_work: HardReset:
success
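
For reference, the "ATA command pass through(16)" CDB that was in flight
when the bus reset happened can be unpacked by hand. Below is a minimal
Python sketch, assuming the byte layout of the SAT ATA PASS-THROUGH (16)
command; the function name and the dictionary keys are made up for
illustration and are not taken from the kernel or from smartmontools.

```python
def decode_ata_pass_through_16(cdb_hex):
    """Split a 16-byte ATA PASS-THROUGH CDB (opcode 0x85) into its fields.

    Assumed layout (SAT): byte 1 carries protocol and extend, byte 2 the
    transfer flags, bytes 3-12 the ATA task file (high byte first for each
    register pair), byte 13 device, byte 14 the ATA command.
    """
    cdb = bytes.fromhex(cdb_hex)  # fromhex() skips the spaces between bytes
    if len(cdb) != 16 or cdb[0] != 0x85:
        raise ValueError("not an ATA PASS-THROUGH (16) CDB")
    return {
        "protocol": (cdb[1] >> 1) & 0x0F,  # 4 = PIO Data-In
        "extend": cdb[1] & 0x01,           # 0 = 28-bit command
        "ck_cond": (cdb[2] >> 5) & 0x01,   # 1 = always return ATA registers
        "t_dir": (cdb[2] >> 3) & 0x01,     # 1 = data flows from the device
        "features": cdb[4],
        "sector_count": cdb[6],
        "lba_low": cdb[8],
        "lba_mid": cdb[10],
        "lba_high": cdb[12],
        "device": cdb[13],
        "command": cdb[14],
    }


# The CDB from the log above.
fields = decode_ata_pass_through_16(
    "85 08 0e 00 d5 00 01 00 09 00 4f 00 c2 00 b0 00"
)
```

Read this way, the command is 0xb0 (SMART) with feature 0xd5 (SMART READ
LOG) and the usual 0x4f/0xc2 SMART signature in LBA mid/high, so the
command that timed out looks like an ordinary smartctl query.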

-- 
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [Bug 13594] SMART responses for SATA disks on SAS get interpreted as errors
       [not found] <bug-13594-11613@https.bugzilla.kernel.org/>
                   ` (6 preceding siblings ...)
  2010-05-12 17:43 ` bugzilla-daemon
@ 2010-05-18 15:04 ` bugzilla-daemon
  2010-07-20 20:08 ` bugzilla-daemon
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 27+ messages in thread
From: bugzilla-daemon @ 2010-05-18 15:04 UTC (permalink / raw)
  To: linux-scsi

https://bugzilla.kernel.org/show_bug.cgi?id=13594





--- Comment #16 from Ken Stailey <kstailey@yahoo.com>  2010-05-18 15:04:15 ---
If this bug report is to be closed on the grounds that it only covers
suppressing some log messages, can anyone post the IDs of any bug reports
that cover the "real" LSI MPT driver issues?


^ permalink raw reply	[flat|nested] 27+ messages in thread

* [Bug 13594] SMART responses for SATA disks on SAS get interpreted as errors
       [not found] <bug-13594-11613@https.bugzilla.kernel.org/>
                   ` (7 preceding siblings ...)
  2010-05-18 15:04 ` bugzilla-daemon
@ 2010-07-20 20:08 ` bugzilla-daemon
  2010-10-29  3:30 ` bugzilla-daemon
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 27+ messages in thread
From: bugzilla-daemon @ 2010-07-20 20:08 UTC (permalink / raw)
  To: linux-scsi

https://bugzilla.kernel.org/show_bug.cgi?id=13594





--- Comment #17 from Ken Stailey <kstailey@yahoo.com>  2010-07-20 20:08:27 ---
Related bug reports:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/605939
https://bugzilla.redhat.com/show_bug.cgi?id=616572


^ permalink raw reply	[flat|nested] 27+ messages in thread

* [Bug 13594] SMART responses for SATA disks on SAS get interpreted as errors
       [not found] <bug-13594-11613@https.bugzilla.kernel.org/>
                   ` (8 preceding siblings ...)
  2010-07-20 20:08 ` bugzilla-daemon
@ 2010-10-29  3:30 ` bugzilla-daemon
  2012-06-08 15:40 ` bugzilla-daemon
  2012-06-08 15:40 ` bugzilla-daemon
  11 siblings, 0 replies; 27+ messages in thread
From: bugzilla-daemon @ 2010-10-29  3:30 UTC (permalink / raw)
  To: linux-scsi

https://bugzilla.kernel.org/show_bug.cgi?id=13594


pipa.tk <bigplum@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |bigplum@gmail.com




--- Comment #18 from pipa.tk <bigplum@gmail.com>  2010-10-29 03:30:34 ---
I also use an LSI Logic / Symbios Logic SAS1068E PCI-Express Fusion-MPT SAS
controller with Seagate ST31500341AS 1.5TB hard disks.

I found that the ST31500341AS has a firmware issue:
http://www.avsforum.com/avs-vb/showthread.php?t=1080005. So I checked
/var/log/messages and lsscsi: there are two firmware versions in the server,
and all the sdX error messages logged come from drives with version SD17.
The SD17 drives should be upgraded to SD1B, or they will randomly hang I/O
for almost half a minute.

Oct 29 08:27:21 XEN-ST-27 kernel: mptscsih: ioc0: attempting task abort!
(sc=ffff8801e5465840)
Oct 29 08:27:21 XEN-ST-27 kernel: sd 4:0:3:0:
Oct 29 08:27:21 XEN-ST-27 kernel:         command: Synchronize Cache(10): 35 00
00 00 00 00 00 00 00 00
Oct 29 08:27:23 XEN-ST-27 kernel: mptbase: ioc0: LogInfo(0x31140000):
Originator={PL}, Code={IO Executed}, SubCode(0x0000)
Oct 29 08:27:23 XEN-ST-27 kernel: mptscsih: ioc0: task abort: SUCCESS
(sc=ffff8801e5465840)
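
The LogInfo word printed by mptbase can be split into the fields the
driver names in the same line. This is a minimal Python sketch; the bit
layout (bus type, originator, code, subcode from high to low bits) is an
assumption that happens to match the three LogInfo values quoted in this
thread, and the name tables only cover values that actually appear here.

```python
# Originator names as printed by the driver in this thread ("PL");
# the other two entries are assumed for completeness.
ORIGINATORS = {0: "IOP", 1: "PL", 2: "IR"}

# Only the PL codes seen in this thread; anything else stays numeric.
PL_CODES = {0x11: "Reset", 0x14: "IO Executed"}


def decode_sas_loginfo(loginfo):
    """Split a 32-bit mptbase SAS LogInfo value into its assumed fields."""
    originator = (loginfo >> 24) & 0xF
    code = (loginfo >> 16) & 0xFF
    return {
        "bus_type": (loginfo >> 28) & 0xF,  # 3 for the SAS values seen here
        "originator": ORIGINATORS.get(originator, hex(originator)),
        "code": PL_CODES.get(code, hex(code)),
        "subcode": loginfo & 0xFFFF,
    }


io_executed = decode_sas_loginfo(0x31140000)
reset = decode_sas_loginfo(0x31112000)
```

With this split, 0x31140000 reads as originator PL, code "IO Executed",
subcode 0x0000, which is what the driver printed above.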

[4:0:0:0]    disk    ATA      ST31500341AS     SD17  /dev/sda
[4:0:1:0]    disk    ATA      ST31500341AS     CC1H  /dev/sdb
[4:0:2:0]    disk    ATA      ST31500341AS     CC1H  /dev/sdc
[4:0:3:0]    disk    ATA      ST31500341AS     SD17  /dev/sdd
[4:0:4:0]    disk    ATA      ST31500341AS     CC1H  /dev/sde
[4:0:5:0]    disk    ATA      ST31500341AS     SD17  /dev/sdf
[4:0:6:0]    disk    ATA      ST31500341AS     SD17  /dev/sdg
[4:0:7:0]    disk    ATA      ST31500341AS     CC1H  /dev/sdh
[4:0:8:0]    disk    ATA      ST31500341AS     CC1H  /dev/sdi
[4:0:9:0]    disk    ATA      ST31500341AS     CC1H  /dev/sdj
[4:0:10:0]   disk    ATA      ST31500341AS     CC1H  /dev/sdk
[4:0:11:0]   disk    ATA      ST31500341AS     CC1H  /dev/sdl
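
To match log entries against firmware revisions without reading the
listing by eye, the lsscsi output can be filtered by its revision column.
This is a minimal Python sketch, assuming the six-column layout shown
above and vendor/model strings without embedded spaces; the function name
is hypothetical.

```python
def drives_with_firmware(lsscsi_output, revision):
    """Return the device nodes of disks whose revision column matches."""
    devices = []
    for line in lsscsi_output.splitlines():
        fields = line.split()
        # Expected columns: [H:C:T:L] type vendor model revision device
        if len(fields) == 6 and fields[1] == "disk" and fields[4] == revision:
            devices.append(fields[5])
    return devices


# First four lines of the listing above.
SAMPLE = """\
[4:0:0:0]    disk    ATA      ST31500341AS     SD17  /dev/sda
[4:0:1:0]    disk    ATA      ST31500341AS     CC1H  /dev/sdb
[4:0:2:0]    disk    ATA      ST31500341AS     CC1H  /dev/sdc
[4:0:3:0]    disk    ATA      ST31500341AS     SD17  /dev/sdd
"""

sd17_drives = drives_with_firmware(SAMPLE, "SD17")
```

On the sample this selects /dev/sda and /dev/sdd; the "task abort" quoted
earlier was on 4:0:3:0, which is /dev/sdd.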

I am seeing I/O hangs on many Xen servers. I applied the patch from
http://lkml.org/lkml/2010/4/26/335 to 2.6.18-xen with mpt version
mptlinux-3.04.01, and "task abort" still shows up in dmesg. But there
smartctl -a does not trigger the error even without the patch. So I think the
heavy I/O hang issue may be caused by the Seagate firmware and by an ATA
pass-through bug in the kernel.

I did not find the ATA pass-through issue in 2.6.18-xen or 2.6.16-xen, but
2.6.29, 2.6.31 and 2.6.32 all have it. It can be reproduced easily by running
"while true; do smartctl -a /dev/sdd > /dev/null; done", even with the patch
http://lkml.org/lkml/2010/4/26/335 applied and with every MPT Fusion driver I
could find, from 3.04.01 to the latest LSI version 4.0.22.
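
A bounded variant of that reproduction loop may be easier to use when
comparing kernels, since it stops by itself and records exit codes. This
is a minimal Python sketch; the function names and the injectable `run`
parameter are illustrative assumptions, and only the `smartctl -a <device>`
invocation comes from the report above. As the logs in this thread show,
hammering SMART on an affected setup can knock disks out of their arrays,
so this belongs on a lab machine only.

```python
import subprocess


def smartctl_command(device):
    """Build the same smartctl invocation used in the shell loop."""
    return ["smartctl", "-a", device]


def hammer_smart(device, iterations, run=subprocess.run):
    """Run `smartctl -a` on one device repeatedly and collect exit codes.

    `run` defaults to subprocess.run and can be replaced for dry runs.
    """
    exit_codes = []
    for _ in range(iterations):
        result = run(
            smartctl_command(device),
            stdout=subprocess.DEVNULL,  # same effect as "> /dev/null"
            stderr=subprocess.DEVNULL,
        )
        exit_codes.append(result.returncode)
    return exit_codes


# Example (needs root and smartmontools; not executed here):
#     codes = hammer_smart("/dev/sdd", 1000)
```

While it runs, watching dmesg for "attempting task abort" shows whether
the kernel under test is affected.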

Finally I tested 2.6.36, and the ATA issue seems to be solved there. But it
does not support Xen dom0, so I cannot test this kernel on a production
server. I am trying to reproduce the I/O hang in the lab, and will upgrade the
Seagate firmware to verify it.

Related bug: https://bugzilla.kernel.org/show_bug.cgi?id=18652


^ permalink raw reply	[flat|nested] 27+ messages in thread

* [Bug 13594] SMART responses for SATA disks on SAS get interpreted as errors
       [not found] <bug-13594-11613@https.bugzilla.kernel.org/>
                   ` (9 preceding siblings ...)
  2010-10-29  3:30 ` bugzilla-daemon
@ 2012-06-08 15:40 ` bugzilla-daemon
  2012-06-08 15:40 ` bugzilla-daemon
  11 siblings, 0 replies; 27+ messages in thread
From: bugzilla-daemon @ 2012-06-08 15:40 UTC (permalink / raw)
  To: linux-scsi

https://bugzilla.kernel.org/show_bug.cgi?id=13594


Alan <alan@lxorguk.ukuu.org.uk> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
                 CC|                            |alan@lxorguk.ukuu.org.uk
         Resolution|                            |OBSOLETE





^ permalink raw reply	[flat|nested] 27+ messages in thread

* [Bug 13594] SMART responses for SATA disks on SAS get interpreted as errors
       [not found] <bug-13594-11613@https.bugzilla.kernel.org/>
                   ` (10 preceding siblings ...)
  2012-06-08 15:40 ` bugzilla-daemon
@ 2012-06-08 15:40 ` bugzilla-daemon
  11 siblings, 0 replies; 27+ messages in thread
From: bugzilla-daemon @ 2012-06-08 15:40 UTC (permalink / raw)
  To: linux-scsi

https://bugzilla.kernel.org/show_bug.cgi?id=13594


Alan <alan@lxorguk.ukuu.org.uk> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |CLOSED





^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2012-06-08 15:40 UTC | newest]

Thread overview: 27+ messages
2009-06-21 17:26 [Bug 13594] New: SMART responses for SATA disks on SAS get interpreted as errors bugzilla-daemon
2009-06-21 18:47 ` James Bottomley
2009-06-21 18:55   ` James Bottomley
2009-06-21 18:48 ` [Bug 13594] " bugzilla-daemon
2009-06-21 18:55 ` bugzilla-daemon
2009-06-21 18:58 ` bugzilla-daemon
2009-06-21 19:07   ` James Bottomley
2009-06-21 19:07 ` bugzilla-daemon
2009-06-21 20:53   ` Douglas Gilbert
2009-06-22 12:04     ` Matthew Wilcox
2009-06-21 20:53 ` bugzilla-daemon
2009-06-21 21:14 ` bugzilla-daemon
2009-06-22 12:04 ` bugzilla-daemon
2009-11-21  0:20 ` bugzilla-daemon
     [not found] <bug-13594-11613@https.bugzilla.kernel.org/>
2010-04-03 22:07 ` bugzilla-daemon
2010-04-27 22:31 ` bugzilla-daemon
2010-05-01  4:45 ` bugzilla-daemon
2010-05-12 14:09 ` bugzilla-daemon
2010-05-12 14:43   ` Douglas Gilbert
2010-05-12 15:20 ` bugzilla-daemon
2010-05-12 17:42 ` bugzilla-daemon
2010-05-12 17:43 ` bugzilla-daemon
2010-05-18 15:04 ` bugzilla-daemon
2010-07-20 20:08 ` bugzilla-daemon
2010-10-29  3:30 ` bugzilla-daemon
2012-06-08 15:40 ` bugzilla-daemon
2012-06-08 15:40 ` bugzilla-daemon
