public inbox for linux-scsi@vger.kernel.org
From: Damien Le Moal <dlemoal@kernel.org>
To: Yafang Shao <laoar.shao@gmail.com>
Cc: Christoph Hellwig <hch@infradead.org>,
	"Martin K . Petersen" <martin.petersen@oracle.com>,
	linux-scsi@vger.kernel.org,
	Sathya Prakash <sathya.prakash@broadcom.com>,
	Kashyap Desai <kashyap.desai@broadcom.com>,
	Sreekanth Reddy <sreekanth.reddy@broadcom.com>,
	Suganath Prabu Subramani <suganath-prabu.subramani@broadcom.com>,
	mpi3mr-linuxdrv.pdl@broadcom.com,
	MPT-FusionLinux.pdl@broadcom.com
Subject: Re: [PATCH 0/2] Improve ATA NCQ command error in mpt3sas and mpi3mr
Date: Wed, 11 Jun 2025 12:57:45 +0900
Message-ID: <b46403b6-3eaf-4152-ad2a-c6c5402dfcac@kernel.org>
In-Reply-To: <CALOAHbC2nhFN7FR5k+9V6=Bx+b4wpT_XZ8R6U-TPKHFn6tss-A@mail.gmail.com>

On 6/11/25 12:27 PM, Yafang Shao wrote:
> On Mon, Jun 9, 2025 at 3:18 PM Damien Le Moal <dlemoal@kernel.org> wrote:
>>
>> On 6/9/25 16:09, Yafang Shao wrote:
>>> On Mon, Jun 9, 2025 at 1:50 PM Christoph Hellwig <hch@infradead.org> wrote:
>>>>
>>>> Adding Yafang Shao <laoar.shao@gmail.com>, who has a test case, which
>>>> I think prompted this.
>>
>> Note that the cdl-tools test suite has many test cases that do not pass
>> without the mpi3mr patch. CDL makes it easy to trigger the issue.
>>
>>>
>>> Thank you for the information and for addressing this so quickly!
>>>
>>>>
>>>> Yafang, can you check if this makes the writeback errors you're seeing
>>>> go away?
>>>
>>> I’m happy to test the fix and will share the results as soon as I have them.
>>
>> Thanks. And my apologies for forgetting to CC you on these patches.
> 
> We have developed and deployed a hotfix to hundreds of our production
> servers equipped with this drive, and it has been functioning as
> expected so far. We are currently rolling it out to the remaining
> production servers, though the process may take some time to complete.
> 
> Additionally, we encountered a writeback error in the libata driver,
> which appears to be related to DID_TIME_OUT. Do you have any insights
> into this issue?
> 
> [Tue Jun 10 13:27:47 2025] sd 14:0:0:0: [sdl] tag#30 FAILED Result:
> hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK cmd_age=31s

No idea. Timeouts are hard to debug... A bus trace would help. It could be that
the drives are really bad and taking too long per command, causing timeouts
for commands that are at the end of the queue. E.g., if you hit many bad
sectors with read commands, the drive's internal retry mechanism may cause such
a read command to take several seconds. So if you are running the drives at
high queue depth, this can easily cause timeouts for the commands that are at
the end of the queue.
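
To make the arithmetic concrete, here is a rough sketch of that reasoning. It
is a simplified model, not a description of how any particular drive schedules
NCQ commands: it assumes the drive completes queued commands strictly one after
another, each taking the same time, and it assumes a 30 s command timeout
(consistent with the cmd_age=31s in your log, but check the timeout configured
for your device). The function name and parameters are made up for
illustration.

```python
def timed_out_positions(queue_depth, per_cmd_seconds, timeout_seconds=30.0):
    """Return the 1-based queue positions that complete after the timeout.

    Assumed model: commands are served serially and each one takes
    per_cmd_seconds, so the command at position N completes after
    N * per_cmd_seconds.
    """
    return [
        pos
        for pos in range(1, queue_depth + 1)
        if pos * per_cmd_seconds > timeout_seconds
    ]


if __name__ == "__main__":
    # Healthy drive, about 10 ms per command: nothing gets near the timeout.
    print(timed_out_positions(32, 0.010))
    # Drive retrying bad sectors, about 2 s per command at queue depth 32:
    # every command queued behind the first 15 exceeds the 30 s timeout.
    print(timed_out_positions(32, 2.0))
```

With these assumptions, lowering the queue depth to 15 or less would keep every
command under the timeout even at 2 s per command, which is one way to test
whether slow commands are the cause.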

> [Tue Jun 10 13:27:47 2025] sd 14:0:0:0: [sdl] tag#30 CDB: Write(16) 8a
> 00 00 00 00 08 2c eb 4a 58 00 00 02 c0 00 00
> [Tue Jun 10 13:27:47 2025] I/O error, dev sdl, sector 35113355864 op
> 0x1:(WRITE) flags 0x100000 phys_seg 88 prio class 2
> [Tue Jun 10 13:27:47 2025] sd 14:0:0:0: [sdl] tag#29 FAILED Result:
> hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK cmd_age=31s
> [Tue Jun 10 13:27:47 2025] sd 14:0:0:0: [sdl] tag#29 CDB: Write(16) 8a
> 00 00 00 00 08 2c eb 45 18 00 00 05 40 00 00
> [Tue Jun 10 13:27:47 2025] I/O error, dev sdl, sector 35113354520 op
> 0x1:(WRITE) flags 0x104000 phys_seg 168 prio class 2
> [Tue Jun 10 13:27:47 2025] sd 14:0:0:0: [sdl] tag#28 FAILED Result:
> hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK cmd_age=31s
> [Tue Jun 10 13:27:47 2025] sd 14:0:0:0: [sdl] tag#28 CDB: Write(16) 8a
> 00 00 00 00 08 2c eb 42 58 00 00 02 c0 00 00
> [Tue Jun 10 13:27:47 2025] I/O error, dev sdl, sector 35113353816 op
> 0x1:(WRITE) flags 0x100000 phys_seg 88 prio class 2
> [Tue Jun 10 13:27:47 2025] sd 14:0:0:0: [sdl] tag#22 FAILED Result:
> hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK cmd_age=31s
> [Tue Jun 10 13:27:47 2025] sd 14:0:0:0: [sdl] tag#22 CDB: Read(16) 88
> 00 00 00 00 03 d8 33 44 18 00 00 02 c0 00 00
> [Tue Jun 10 13:27:47 2025] I/O error, dev sdl, sector 16512140312 op
> 0x0:(READ) flags 0x80700 phys_seg 88 prio class 2
> [Tue Jun 10 13:27:47 2025] sd 14:0:0:0: [sdl] tag#21 FAILED Result:
> hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK cmd_age=31s
> [Tue Jun 10 13:27:47 2025] sd 14:0:0:0: [sdl] tag#21 CDB: Read(16) 88
> 00 00 00 00 03 d8 33 3e d8 00 00 05 40 00 00
> [Tue Jun 10 13:27:47 2025] I/O error, dev sdl, sector 16512138968 op
> 0x0:(READ) flags 0x84700 phys_seg 168 prio class 2
> [Tue Jun 10 13:27:47 2025] sd 14:0:0:0: [sdl] tag#20 FAILED Result:
> hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK cmd_age=31s
> [Tue Jun 10 13:27:47 2025] sd 14:0:0:0: [sdl] tag#20 CDB: Read(16) 88
> 00 00 00 00 03 d8 33 3c 18 00 00 02 c0 00 00
> [Tue Jun 10 13:27:47 2025] I/O error, dev sdl, sector 16512138264 op
> 0x0:(READ) flags 0x80700 phys_seg 88 prio class 2
> [Tue Jun 10 13:27:47 2025] sd 14:0:0:0: [sdl] tag#19 FAILED Result:
> hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK cmd_age=31s
> [Tue Jun 10 13:27:47 2025] sd 14:0:0:0: [sdl] tag#19 CDB: Write(16) 8a
> 00 00 00 00 08 2c eb 3d 18 00 00 05 40 00 00
> [Tue Jun 10 13:27:47 2025] I/O error, dev sdl, sector 35113352472 op
> 0x1:(WRITE) flags 0x104000 phys_seg 168 prio class 2
> [Tue Jun 10 13:27:47 2025] sd 14:0:0:0: [sdl] tag#15 FAILED Result:
> hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK cmd_age=31s
> [Tue Jun 10 13:27:47 2025] sd 14:0:0:0: [sdl] tag#15 CDB: Read(16) 88
> 00 00 00 00 03 2c e9 4e b8 00 00 00 98 00 00
> [Tue Jun 10 13:27:47 2025] I/O error, dev sdl, sector 13638389432 op
> 0x0:(READ) flags 0x80700 phys_seg 19 prio class 2
> [Tue Jun 10 13:27:47 2025] sd 14:0:0:0: [sdl] tag#14 FAILED Result:
> hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK cmd_age=31s
> [Tue Jun 10 13:27:47 2025] sd 14:0:0:0: [sdl] tag#14 CDB: Read(16) 88
> 00 00 00 00 03 2c e9 4c b8 00 00 02 00 00 00
> [Tue Jun 10 13:27:47 2025] I/O error, dev sdl, sector 13638388920 op
> 0x0:(READ) flags 0x80700 phys_seg 64 prio class 2
> [Tue Jun 10 13:27:47 2025] sd 14:0:0:0: [sdl] tag#13 FAILED Result:
> hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK cmd_age=31s
> [Tue Jun 10 13:27:47 2025] sd 14:0:0:0: [sdl] tag#13 CDB: Read(16) 88
> 00 00 00 00 03 d8 33 36 d8 00 00 05 40 00 00
> [Tue Jun 10 13:27:47 2025] I/O error, dev sdl, sector 16512136920 op
> 0x0:(READ) flags 0x84700 phys_seg 168 prio class 2
> [Tue Jun 10 13:27:47 2025] XFS (sdl): metadata I/O error in
> "xfs_imap_to_bp+0x4f/0x70 [xfs]" at daddr 0x4804c4398 len 32 error 5
> [Tue Jun 10 13:27:48 2025] sdl: writeback error on inode 32213499599,
> offset 0, sector 35113332208
> 
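
For anyone correlating the CDB lines with the "I/O error ... sector" lines
above, a small decoder sketch may help. It relies only on the standard
READ(16)/WRITE(16) layout (opcode in byte 0, 8-byte big-endian LBA in bytes
2-9, 4-byte big-endian transfer length in bytes 10-13). The helper name is
made up for illustration.

```python
OPCODES = {0x88: "READ(16)", 0x8A: "WRITE(16)"}


def decode_rw16(cdb_hex):
    """Decode a READ(16)/WRITE(16) CDB given as space-separated hex bytes.

    Returns (command name, starting LBA, number of logical blocks).
    """
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 16:
        raise ValueError("expected a 16-byte CDB, got %d bytes" % len(cdb))
    if cdb[0] not in OPCODES:
        raise ValueError("unsupported opcode 0x%02x" % cdb[0])
    lba = int.from_bytes(cdb[2:10], "big")       # bytes 2..9
    nblocks = int.from_bytes(cdb[10:14], "big")  # bytes 10..13
    return OPCODES[cdb[0]], lba, nblocks


if __name__ == "__main__":
    # CDB of tag#30 from the log above.
    print(decode_rw16("8a 00 00 00 00 08 2c eb 4a 58 00 00 02 c0 00 00"))
    # -> ('WRITE(16)', 35113355864, 704)
```

The decoded LBA equals the sector number the block layer reports for tag#30,
and 704 blocks of 512 bytes is 352 KiB, i.e. 88 segments of 4 KiB, matching
phys_seg 88. That suggests a 512-byte logical block size on this device,
though that is an inference from the log and not something it states.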


-- 
Damien Le Moal
Western Digital Research

Thread overview: 16+ messages
2025-06-06  5:27 [PATCH 0/2] Improve ATA NCQ command error in mpt3sas and mpi3mr Damien Le Moal
2025-06-06  5:27 ` [PATCH 1/2] scsi: mpi3mr: Correctly handle ATA device errors Damien Le Moal
2025-06-06  5:27 ` [PATCH 2/2] scsi: mpt3sas: " Damien Le Moal
2025-06-09  5:50 ` [PATCH 0/2] Improve ATA NCQ command error in mpt3sas and mpi3mr Christoph Hellwig
2025-06-09  7:09   ` Yafang Shao
2025-06-09  7:17     ` Damien Le Moal
2025-06-11  3:27       ` Yafang Shao
2025-06-11  3:57         ` Damien Le Moal [this message]
2025-06-11  5:42           ` Yafang Shao
2025-06-16  2:13     ` Yafang Shao
2025-06-16  2:28       ` Damien Le Moal
2025-06-16 12:40         ` Yafang Shao
2025-06-16 20:51 ` Martin K. Petersen
2025-06-20  3:00 ` Martin K. Petersen
2025-06-20 17:28   ` Sathya Prakash Veerichetty
2025-06-25  1:44 ` Martin K. Petersen
