* [bug report] block: Non-NCQ commands will never be executed while fio is continuously running
From: yangxingui @ 2024-09-09 13:10 UTC
To: axboe, John Garry
Cc: linux-block, linux-kernel, James.Bottomley, Martin K. Petersen, damien.lemoal

Hello axboe & John,

After the driver exposes all HW queues to the block layer, non-NCQ
commands, such as those sent by smartctl, are never executed while fio is
running continuously.

The cause of the problem is that the other hctxs used by NCQ commands are
still active and keep issuing NCQ commands to the SATA disk, while the PIO
command keeps being retried on its own hctx because qc_defer() always
returns true.

hctx0: ncq, pio, ncq
hctx1: ncq, ncq, ...
...
hctxn: ncq, ncq, ...

Is there any good solution for this?

Thanks.
Xingui
* Re: [bug report] block: Non-NCQ commands will never be executed while fio is continuously running
From: Damien Le Moal @ 2024-09-09 13:21 UTC
To: yangxingui, axboe, John Garry
Cc: linux-block, linux-kernel, James.Bottomley, Martin K. Petersen, damien.lemoal

On 9/9/24 22:10, yangxingui wrote:
> Hello axboe & John,
>
> After the driver exposes all HW queues to the block layer, non-NCQ
> commands will never be executed while fio is continuously running, such
> as a smartctl command.
[...]
> Is there any good solution for this?

SATA devices are single queue, so how can you have multiple queues?
What adapter are you using?

--
Damien Le Moal
Western Digital Research
* Re: [bug report] block: Non-NCQ commands will never be executed while fio is continuously running
From: yangxingui @ 2024-09-10 1:09 UTC
To: Damien Le Moal, axboe, John Garry
Cc: linux-block, linux-kernel, James.Bottomley, Martin K. Petersen, damien.lemoal

On 2024/9/9 21:21, Damien Le Moal wrote:
> On 9/9/24 22:10, yangxingui wrote:
[...]
> SATA devices are single queue so how can you have multiple queues ?
> What adapter are you using ?

In the following patch, we expose the host's 16 hardware queues to the
block layer, so a connected SATA disk uses 16 hctxs:

8d98416a55eb ("scsi: hisi_sas: Switch v3 hw to MQ")

Thanks,
Xingui
* Re: [bug report] block: Non-NCQ commands will never be executed while fio is continuously running
From: Damien Le Moal @ 2024-09-10 4:45 UTC
To: yangxingui, axboe, John Garry
Cc: linux-block, linux-kernel, James.Bottomley, Martin K. Petersen, damien.lemoal

On 9/10/24 10:09 AM, yangxingui wrote:
> On 2024/9/9 21:21, Damien Le Moal wrote:
>> SATA devices are single queue so how can you have multiple queues ?
>> What adapter are you using ?
>
> In the following patch, we expose the host's 16 hardware queues to the block
> layer. And when connecting to a sata disk, 16 hctx are used.
>
> 8d98416a55eb ("scsi: hisi_sas: Switch v3 hw to MQ")

OK, so the HBA is a hisi one, using libsas...
What is the device? An SSD? An HDD?

Do you set a block I/O scheduler for the drive, e.g. mq-deadline? If not,
does setting a scheduler resolve the issue?

I do not have any hisi HBA. I use a lot of mpt3sas and mpi3mr HBAs, which
also have multiple queues with a shared tagset. I have never seen the issue
you are reporting with HDDs using mq-deadline or bfq as the scheduler,
though.

--
Damien Le Moal
Western Digital Research
* Re: [bug report] block: Non-NCQ commands will never be executed while fio is continuously running
From: yangxingui @ 2024-09-10 6:34 UTC
To: Damien Le Moal, axboe, John Garry
Cc: linux-block, linux-kernel, James.Bottomley, Martin K. Petersen, damien.lemoal

On 2024/9/10 12:45, Damien Le Moal wrote:
> OK, so the HBA is a hisi one, using libsas...
> What is the device ? An SSD ? and HDD ?

Both SATA SSDs and SATA HDDs have this problem.

> Do you set a block I/O scheduler for the drive, e.g. mq-deadline. If not, does
> setting a scheduler resolve the issue ?

The default mq-deadline scheduler is currently in use, and the same thing
happens when I set it to none. It does not seem to depend on the
scheduler.

> I do not have any hisi HBA. I use a lot of mpt3sas and mpi3mr HBAs which also
> have multiple queues with a shared tagset. Never seen the issue you are
> reporting though using HDDs with mq-deadline or bfq as the scheduler.

Is that because, unlike libsas, these hosts don't use qc_defer()?

Thanks,
Xingui
* Re: [bug report] block: Non-NCQ commands will never be executed while fio is continuously running
From: Niklas Cassel @ 2024-09-10 11:27 UTC
To: yangxingui
Cc: Damien Le Moal, axboe, John Garry, linux-block, linux-kernel, James.Bottomley, Martin K. Petersen, damien.lemoal

On Tue, Sep 10, 2024 at 02:34:06PM +0800, yangxingui wrote:
> On 2024/9/10 12:45, Damien Le Moal wrote:
[...]
> > I do not have any hisi HBA. I use a lot of mpt3sas and mpi3mr HBAs which also
> > have multiple queues with a shared tagset. Never seen the issue you are
> > reporting though using HDDs with mq-deadline or bfq as the scheduler.
> Unlike libsas, as these hosts don't use qc_defer()?

mpt3sas and mpi3mr do not use any libata code at all. The SCSI to ATA
Translation (SAT) is done entirely by the HBA, so from a Linux
perspective, we are issuing SCSI commands to the HBA.

We can see that libsas uses ata_std_qc_defer() as its .qc_defer callback:
https://github.com/torvalds/linux/blob/v6.11-rc7/drivers/scsi/libsas/sas_ata.c#L566

If you look at the SATA 3.a Gold specification,
"13.6.3 Intermixing Non-NCQ commands and NCQ commands":
"The host shall not issue a non-NCQ command while an NCQ command is
outstanding."

In the AHCI 1.3.1 specification, "1.7 Theory of Operation":
"System software is responsible to ensure that queued and non-queued
commands are not mixed in the command list for the same device with the
exception of the NCQ Unload command."

Usually, tools like smartctl submit SCSI commands of type "ATA-16
passthrough", which is a specific SCSI command that just carries a regular
ATA command as its payload:
https://www.smartmontools.org/browser/trunk/smartmontools/scsiata.cpp?desc=1&order=date#L346

For an "ATA-16 passthrough" SCSI command, libata simply copies the fields
from the "ATA-16 passthrough" SCSI command to the corresponding fields of a
newly created ATA command; see the SAT specification and:
https://github.com/torvalds/linux/blob/v6.11-rc7/drivers/ata/libata-scsi.c#L2878-L2887

See also the SAT-6 specification, "6.2.4 Mechanism for processing some
commands as NCQ commands":
"The ACS-5 standard defines a mechanism for NCQ encapsulation of some
commands. Use of this mechanism allows these commands to be processed
without quiescing the ATA device."

Without considering whether it is a good idea or not, it should be
possible to translate some commands to use the "NCQ encapsulated" variant
of the ATA command carried in the "ATA-16 passthrough" SCSI command
instead.

However, looking at e.g.:
https://www.smartmontools.org/browser/trunk/smartmontools/scsiata.cpp?desc=1&order=date#L566
smartctl is sending an IDENTIFY DEVICE (ECh) ATA command, and this command
has no NCQ encapsulated variant. (Had the application instead used a READ
LOG DMA EXT command to read the IDENTIFY DEVICE data log, where log page
01h is a copy of the IDENTIFY DEVICE data, we would have been able to
convert the command to an NCQ encapsulated variant.)

TL;DR: I do not see an easy generic solution to this problem.

To be able to send a non-queued command, there must be no NCQ commands
queued on the device. I guess you could implement a scheduler that
quiesces the queue, processes the non-queued command, and then thaws the
queue, but that would essentially make non-queued commands high-priority
commands, and could thus be used to seriously limit throughput by just
sending some non-queued commands every now and then :)

Kind regards,
Niklas
* Re: [bug report] block: Non-NCQ commands will never be executed while fio is continuously running
From: Damien Le Moal @ 2024-09-10 22:38 UTC
To: Niklas Cassel, yangxingui
Cc: axboe, John Garry, linux-block, linux-kernel, James.Bottomley, Martin K. Petersen

On 9/10/24 20:27, Niklas Cassel wrote:
> On Tue, Sep 10, 2024 at 02:34:06PM +0800, yangxingui wrote:
[...]
> mpt3sas and mpi3mr do not use any libata code at all, the SCSI to ATA
> Translation (SAT) is done completely by the HBA, so from a Linux
> perspective, we are issuing SCSI commands to the HBA.

Yes, but we can still get requeues happening. For a SATA drive, though,
that is unlikely, since the maximum queue depth is clearly defined, unlike
for SAS drives.

> We can see that libsas uses ata_std_qc_defer() as its .qc_defer callback:
> https://github.com/torvalds/linux/blob/v6.11-rc7/drivers/scsi/libsas/sas_ata.c#L566

And that may be the issue. More on this below.

> Without considering if it is a good idea or not, it should be possible to
> translate some commands to instead use the "NCQ encapsulated" variant of
> the ATA command that was used in the "ATA-16 passthrough" SCSI command.

That would be way too much work on the user side, and would likely open up
a can of device bugs unseen until now.

> To be able to send a non-queued command, there has to be no NCQ commands queued
> on the device. I guess you could implement a scheduler that would be quiescing
> the queue, processes the non-queued command, and then thaw the queue, but that
> would essentially make non-queued commands high priority commands, and could
> thus be used to seriously limit throughput by just sending some non-queued
> commands every now and then :)

Passthrough commands do not go through the scheduler; they are submitted
directly to the dispatch queue, generally at its head (see
blk_mq_insert_request()).

So for a single queue device, even if ata_qc_defer causes a requeue, the
passthrough command ends up back at the top of the dispatch queue. After
this repeats a few times, all in-flight NCQ commands complete and the
passthrough command goes through.

But I feel this is very fragile, given that the block layer requeue is done
through a work item, i.e. in parallel with an application submitting IOs.
So in theory, I think the requeue of the passthrough command could happen
forever...

And for a multi-queue setup like the one with the hisi adapter, that is
exactly what is happening.

I do not have any good idea how to fix that yet. We need to find something.
scsi_queue_rq() and the budget/host or device blocked state management may
help with that, or we have a bug there... In any case, I do not think it is
a block layer issue, as the block layer knows nothing about NCQ vs non-NCQ.

--
Damien Le Moal
Western Digital Research
* Re: [bug report] block: Non-NCQ commands will never be executed while fio is continuously running
From: yangxingui @ 2024-09-11 9:41 UTC
To: Damien Le Moal, Niklas Cassel
Cc: axboe, John Garry, linux-block, linux-kernel, James.Bottomley, Martin K. Petersen

On 2024/9/11 6:38, Damien Le Moal wrote:
[...]
> I do not have any good idea how to fix that yet. We need to find something.
> scsi_queue_rq() and the budget/host or device blocked state management may help
> with that, or we have a bug there... In any case, I do not think it is a block
> layer issue as the block layer knows nothing about NCQ vs non-NCQ.

Thanks for your reply. Could we provide a module parameter that controls
whether multiple queues are exposed to the upper layer, and let users
choose?

Thanks,
Xingui
* Re: [bug report] block: Non-NCQ commands will never be executed while fio is continuously running
From: Yu Kuai @ 2024-09-19 12:26 UTC
To: Damien Le Moal, Niklas Cassel, yangxingui
Cc: axboe, John Garry, linux-block, linux-kernel, James.Bottomley, Martin K. Petersen, yukuai (C), yangerkun@huawei.com

Hi,

On 2024/09/11 6:38, Damien Le Moal wrote:
[...]
> And for a multi-queue setup like with the hisi adapter, that is what is happening.
>
> I do not have any good idea how to fix that yet. We need to find something.
> scsi_queue_rq() and the budget/host or device blocked state management may help
> with that, or we have a bug there... In any case, I do not think it is a block
> layer issue as the block layer knows nothing about NCQ vs non-NCQ.

Does libata return a specific value in this case? If so, maybe we can stop
the other hctxs until this IO is handled.

For now, I think libata should use a single hctx; it just doesn't support
multiple hctxs yet.

Thanks,
Kuai
* Re: [bug report] block: Non-NCQ commands will never be executed while fio is continuously running
  2024-09-19 12:26 ` Yu Kuai
@ 2024-09-19 14:14 ` Damien Le Moal
  2024-10-31 14:12 ` Niklas Cassel
  0 siblings, 1 reply; 12+ messages in thread
From: Damien Le Moal @ 2024-09-19 14:14 UTC
To: Yu Kuai, Niklas Cassel, yangxingui
Cc: axboe, John Garry, linux-block, linux-kernel, James.Bottomley, Martin K. Petersen, yukuai (C), yangerkun@huawei.com

On 2024/09/19 14:26, Yu Kuai wrote:
> Hi,
>
> On 2024/09/11 6:38, Damien Le Moal wrote:
>> On 9/10/24 20:27, Niklas Cassel wrote:
>>> On Tue, Sep 10, 2024 at 02:34:06PM +0800, yangxingui wrote:
>>>>
>>>>
>>>> On 2024/9/10 12:45, Damien Le Moal wrote:
>>>>> On 9/10/24 10:09 AM, yangxingui wrote:
>>>>>>
>>>>>>
>>>>>> On 2024/9/9 21:21, Damien Le Moal wrote:
>>>>>>> On 9/9/24 22:10, yangxingui wrote:
>>>>>>>> Hello axboe & John,
>>>>>>>>
>>>>>>>> After the driver exposes all HW queues to the block layer, non-NCQ
>>>>>>>> commands will never be executed while fio is continuously running, such
>>>>>>>> as a smartctl command.
>>>>>>>>
>>>>>>>> The cause of the problem is that other hctx used by the NCQ command is
>>>>>>>> still active and can continue to issue NCQ commands to the sata disk.
>>>>>>>> And the pio command keeps retrying in its corresponding hctx because
>>>>>>>> qc_defer() always returns true.
>>>>>>>>
>>>>>>>> hctx0: ncq, pio, ncq
>>>>>>>> hctx1: ncq, ncq, ...
>>>>>>>> ...
>>>>>>>> hctxn: ncq, ncq, ...
>>>>>>>>
>>>>>>>> Is there any good solution for this?
>>>>>>>
>>>>>>> SATA devices are single queue, so how can you have multiple queues?
>>>>>>> What adapter are you using?
>>>>>>
>>>>>> In the following patch, we expose the host's 16 hardware queues to the block
>>>>>> layer. And when connecting to a sata disk, 16 hctx are used.
>>>>>>
>>>>>> 8d98416a55eb ("scsi: hisi_sas: Switch v3 hw to MQ")
>>>>>
>>>>> OK, so the HBA is a hisi one, using libsas...
>>>>> What is the device? An SSD? An HDD?
>>>> Both SATA SSD and SATA HDD have this problem.
>>>>
>>>>>
>>>>> Do you set a block I/O scheduler for the drive, e.g. mq-deadline? If not, does
>>>>> setting a scheduler resolve the issue?
>>>> Currently, the default configuration mq-deadline is used, and the same
>>>> phenomenon occurs when I try setting it to none. It seems to have nothing to
>>>> do with the scheduling strategy.
>>>>
>>>>>
>>>>> I do not have any hisi HBA. I use a lot of mpt3sas and mpi3mr HBAs which also
>>>>> have multiple queues with a shared tagset. I have never seen the issue you are
>>>>> reporting, though, using HDDs with mq-deadline or bfq as the scheduler.
>>>> Is that because, unlike libsas, these hosts don't use qc_defer()?
>>>
>>> mpt3sas and mpi3mr do not use any libata code at all: the SCSI to ATA
>>> Translation (SAT) is done completely by the HBA, so from a Linux
>>> perspective, we are issuing SCSI commands to the HBA.
>>
>> Yes, but we can still get requeues happening. Though for a SATA drive, that is
>> unlikely since the max queue depth is clearly defined, unlike for SAS drives.
>>
>>> We can see that libsas uses ata_std_qc_defer() as its .qc_defer callback:
>>> https://github.com/torvalds/linux/blob/v6.11-rc7/drivers/scsi/libsas/sas_ata.c#L566
>>
>> And that may be the issue. More on this below.
>>
>>> Without considering if it is a good idea or not, it should be possible to
>>> translate some commands to instead use the "NCQ encapsulated" variant of
>>> the ATA command that was used in the "ATA-16 passthrough" SCSI command.
>>
>> That would be way too much work on the user side, and would likely open up a
>> can of device bugs unseen until now.
>>
>>> To be able to send a non-queued command, there must be no NCQ commands queued
>>> on the device. I guess you could implement a scheduler that would quiesce
>>> the queue, process the non-queued command, and then thaw the queue, but that
>>> would essentially make non-queued commands high priority commands, and could
>>> thus be used to seriously limit throughput by just sending some non-queued
>>> commands every now and then :)
>>
>> Passthrough commands do not go through the scheduler and are submitted directly
>> to the dispatch queue, generally at the head of it (see blk_mq_insert_request()).
>>
>> So for a single queue device, even if ata_qc_defer causes a requeue, the
>> passthrough command ends up back at the top of the dispatch queue. After
>> repeating this a few times, all in-flight NCQ commands complete and the
>> passthrough command goes through.
>>
>> But I feel this is very fragile given that the block layer requeue is done
>> through a work item, so in parallel with an application submitting IOs. So in
>> theory, I think that the requeue for the passthrough command could happen forever...
>>
>> And for a multi-queue setup like with the hisi adapter, that is what is happening.
>>
>> I do not have any good idea how to fix that yet. We need to find something.
>> scsi_queue_rq() and the budget/host or device blocked state management may help
>> with that, or we have a bug there... In any case, I do not think it is a block
>> layer issue as the block layer knows nothing about NCQ vs non-NCQ.
>
> Does libata return a specific value in this case? If so, maybe we can
> stop other hctx until this IO is handled.
>
> For now, I think libata should use a single hctx; it just doesn't support
> multiple hctx yet.

libata does not care/know about hctx. It only issues commands to ATA devices,
which are always single queue. And pure SATA adapters like AHCI are always
single queue.

The issue at hand can happen only for libsas based SAS HBAs that have multiple
command submission queues (with a shared tag set). Commands for the same device
may end up being submitted through different queues, and when the submitted
commands include a mix of NCQ and non-NCQ commands, the problem happens without
libata being able to easily do anything about it. No control is possible at the
scsi layer either, since the commands submitted are SCSI (not yet translated to
ATA commands), which do not have any NCQ/non-NCQ exclusion knowledge at all.
NCQ is an ATA concept unknown to the scsi and block layers.

We (Niklas and I) are trying to find a solution, but that may not be within
libata itself. It may need changes to libsas as well. Not sure yet. Still exploring.

> Thanks,
> Kuai

--
Damien Le Moal
Western Digital Research
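The single-queue vs multi-queue difference Damien describes can be reproduced with a toy model. The C sketch below is a hypothetical simulation, not kernel code: `struct link_model`, `should_defer()`, `simulate()`, the two-tick NCQ latency and the "one NCQ command per queue per tick" fio load are all assumptions. `should_defer()` encodes the rule as we understand `ata_std_qc_defer()` to apply it: an NCQ command is deferred only while a non-NCQ command is active, whereas a non-NCQ command needs the link to be completely idle. Queue 0 holds the requeued PIO command at the head of its dispatch list, so nothing queued behind it is dispatched.

```c
#include <assert.h>
#include <stdbool.h>

#define NCQ_LATENCY 2  /* assumed: ticks an NCQ command stays in flight */
#define MAX_QUEUES 16  /* e.g. the 16 hctx exposed by the hisi_sas v3 HBA */

/* Toy per-link state (hypothetical, not the kernel's struct ata_link). */
struct link_model {
	int sactive;                 /* NCQ commands in flight */
	bool noncq_active;           /* a non-NCQ command is in flight */
	int completing[NCQ_LATENCY]; /* completing[k]: retire k+1 ticks from now */
};

/* Simplified model of the ata_std_qc_defer() rule (our reading of it). */
static bool should_defer(const struct link_model *l, bool is_ncq)
{
	if (is_ncq)
		return l->noncq_active;               /* NCQ only waits for non-NCQ */
	return l->noncq_active || l->sactive > 0; /* non-NCQ needs an idle link */
}

/* Retire NCQ commands whose latency has elapsed, then age the rest. */
static void tick_completions(struct link_model *l)
{
	l->sactive -= l->completing[0];
	for (int k = 0; k < NCQ_LATENCY - 1; k++)
		l->completing[k] = l->completing[k + 1];
	l->completing[NCQ_LATENCY - 1] = 0;
}

/*
 * Queue 0 holds one PIO command (e.g. from smartctl) at the head of its
 * dispatch list; queues 1..nr_queues-1 are kept busy with NCQ commands by fio.
 * Returns the tick at which the PIO command is issued, or -1 if it is still
 * deferred after max_ticks.
 */
static int simulate(int nr_queues, int max_ticks)
{
	struct link_model l = { .sactive = NCQ_LATENCY };

	assert(nr_queues >= 1 && nr_queues <= MAX_QUEUES);
	for (int k = 0; k < NCQ_LATENCY; k++)
		l.completing[k] = 1;   /* fio already has NCQ commands in flight */

	for (int t = 0; t < max_ticks; t++) {
		tick_completions(&l);

		/* The other hctx keep dispatching, in parallel with the requeue work. */
		for (int q = 1; q < nr_queues; q++) {
			if (!should_defer(&l, true)) {
				l.sactive++;
				l.completing[NCQ_LATENCY - 1]++;
			}
		}

		/* Retry the PIO command; if deferred it is requeued at the head of
		 * queue 0, so nothing behind it is dispatched this tick. */
		if (!should_defer(&l, false))
			return t;
	}
	return -1;
}
```

With one queue the in-flight NCQ commands drain and `simulate(1, 100)` returns 1. With two or more queues the other queues keep `sactive` above zero and the call returns -1, which matches the hctx0/hctx1...hctxn picture in the original report.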
* Re: [bug report] block: Non-NCQ commands will never be executed while fio is continuously running
  2024-09-19 14:14 ` Damien Le Moal
@ 2024-10-31 14:12 ` Niklas Cassel
  2024-11-01 2:17 ` yangxingui
  0 siblings, 1 reply; 12+ messages in thread
From: Niklas Cassel @ 2024-10-31 14:12 UTC
To: Damien Le Moal
Cc: Yu Kuai, yangxingui, axboe, John Garry, linux-block, linux-kernel, James.Bottomley, Martin K. Petersen, yukuai (C), yangerkun@huawei.com

On Thu, Sep 19, 2024 at 04:14:15PM +0200, Damien Le Moal wrote:
> On 2024/09/19 14:26, Yu Kuai wrote:
> >
> > Does libata return a specific value in this case? If so, maybe we can
> > stop other hctx until this IO is handled.
> >
> > For now, I think libata should use a single hctx; it just doesn't support
> > multiple hctx yet.
>
> libata does not care/know about hctx. It only issues commands to ATA devices,
> which are always single queue. And pure SATA adapters like AHCI are always
> single queue.
>
> The issue at hand can happen only for libsas based SAS HBAs that have multiple
> command submission queues (with a shared tag set). Commands for the same device
> may end up being submitted through different queues, and when the submitted
> commands include a mix of NCQ and non-NCQ commands, the problem happens without
> libata being able to easily do anything about it. No control is possible at the
> scsi layer either, since the commands submitted are SCSI (not yet translated to
> ATA commands), which do not have any NCQ/non-NCQ exclusion knowledge at all.
> NCQ is an ATA concept unknown to the scsi and block layers.
>
> We (Niklas and I) are trying to find a solution, but that may not be within
> libata itself. It may need changes to libsas as well. Not sure yet. Still exploring.

Hello Xingui,

I sent a proposed solution to this problem here:
https://lore.kernel.org/linux-ide/20241031140731.224589-4-cassel@kernel.org/

Please test and see if it addresses your problem.


Kind regards,
Niklas
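Yu Kuai's suggestion to stop the other hctx until the non-NCQ command is handled, and Niklas's earlier quiesce/thaw idea, both amount to draining the link before the non-NCQ command runs. The C sketch below illustrates only that general idea in a toy model (two-tick NCQ latency, one NCQ command per queue per tick, both assumed): a hypothetical `noncq_waiting` flag is raised when a non-NCQ command is deferred, and new NCQ commands are deferred while it is set. It is not a description of the linked patch, and every name in it is made up.

```c
#include <assert.h>
#include <stdbool.h>

#define NCQ_LATENCY 2  /* assumed: ticks an NCQ command stays in flight */
#define MAX_QUEUES 16  /* assumed upper bound on hardware queues */

/* Toy per-link state; every name here is hypothetical. */
struct link_model {
	int sactive;                 /* NCQ commands in flight */
	bool noncq_waiting;          /* a deferred non-NCQ command is pending */
	int completing[NCQ_LATENCY]; /* completing[k]: retire k+1 ticks from now */
};

/*
 * Gated NCQ rule: hold back new NCQ commands while a non-NCQ command waits.
 * (An in-flight non-NCQ command would also block NCQ, but the simulation
 * stops as soon as the non-NCQ command is issued, so that case is omitted.)
 */
static bool defer_ncq_gated(const struct link_model *l)
{
	return l->noncq_waiting;
}

/* Non-NCQ rule: needs an idle link; raises the gate while it has to wait. */
static bool defer_noncq_gated(struct link_model *l)
{
	l->noncq_waiting = l->sactive > 0;
	return l->noncq_waiting;
}

/* Retire NCQ commands whose latency has elapsed, then age the rest. */
static void tick_completions(struct link_model *l)
{
	l->sactive -= l->completing[0];
	for (int k = 0; k < NCQ_LATENCY - 1; k++)
		l->completing[k] = l->completing[k + 1];
	l->completing[NCQ_LATENCY - 1] = 0;
}

/*
 * Workload from the bug report: one PIO command at the head of queue 0,
 * fio NCQ traffic on queues 1..nr_queues-1. Returns the tick at which the
 * PIO command is issued, or -1 if it is still deferred after max_ticks.
 */
static int simulate_gated(int nr_queues, int max_ticks)
{
	struct link_model l = { .sactive = NCQ_LATENCY };

	assert(nr_queues >= 1 && nr_queues <= MAX_QUEUES);
	for (int k = 0; k < NCQ_LATENCY; k++)
		l.completing[k] = 1;   /* NCQ commands already in flight */

	for (int t = 0; t < max_ticks; t++) {
		tick_completions(&l);
		for (int q = 1; q < nr_queues; q++) {
			if (!defer_ncq_gated(&l)) {
				l.sactive++;
				l.completing[NCQ_LATENCY - 1]++;
			}
		}
		if (!defer_noncq_gated(&l))
			return t;
	}
	return -1;
}
```

In this model `simulate_gated(16, 100)` issues the PIO command at tick 2, bounded by the NCQ latency, instead of never. The cost is the one Niklas raised in the thread: every non-queued command now stalls all queues until the link drains, so a steady trickle of such commands could be used to throttle NCQ throughput.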
* Re: [bug report] block: Non-NCQ commands will never be executed while fio is continuously running
  2024-10-31 14:12 ` Niklas Cassel
@ 2024-11-01 2:17 ` yangxingui
  0 siblings, 0 replies; 12+ messages in thread
From: yangxingui @ 2024-11-01 2:17 UTC
To: Niklas Cassel, Damien Le Moal
Cc: Yu Kuai, axboe, John Garry, linux-block, linux-kernel, James.Bottomley, Martin K. Petersen, yukuai (C), yangerkun@huawei.com

On 2024/10/31 22:12, Niklas Cassel wrote:
> On Thu, Sep 19, 2024 at 04:14:15PM +0200, Damien Le Moal wrote:
>> On 2024/09/19 14:26, Yu Kuai wrote:
>>>
>>> Does libata return a specific value in this case? If so, maybe we can
>>> stop other hctx until this IO is handled.
>>>
>>> For now, I think libata should use a single hctx; it just doesn't support
>>> multiple hctx yet.
>>
>> libata does not care/know about hctx. It only issues commands to ATA devices,
>> which are always single queue. And pure SATA adapters like AHCI are always
>> single queue.
>>
>> The issue at hand can happen only for libsas based SAS HBAs that have multiple
>> command submission queues (with a shared tag set). Commands for the same device
>> may end up being submitted through different queues, and when the submitted
>> commands include a mix of NCQ and non-NCQ commands, the problem happens without
>> libata being able to easily do anything about it. No control is possible at the
>> scsi layer either, since the commands submitted are SCSI (not yet translated to
>> ATA commands), which do not have any NCQ/non-NCQ exclusion knowledge at all.
>> NCQ is an ATA concept unknown to the scsi and block layers.
>>
>> We (Niklas and I) are trying to find a solution, but that may not be within
>> libata itself. It may need changes to libsas as well. Not sure yet. Still exploring.
>
> Hello Xingui,
>
> I sent a proposed solution to this problem here:
> https://lore.kernel.org/linux-ide/20241031140731.224589-4-cassel@kernel.org/
>
> Please test and see if it addresses your problem.
>
OK, thanks for following up on this issue and fixing it. We will verify it as soon as possible.

Thanks.
Xingui
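Damien's point, quoted again in this reply, that NCQ is invisible to the SCSI and block layers can be made concrete at the CDB level. For an ATA PASS-THROUGH command (the kind of command a SMART tool sends), whether the ATA command is NCQ is encoded only in the PROTOCOL field, bits 4:1 of byte 1 (value 12 = FPDMA), which libata decodes during SCSI-to-ATA translation. For an ordinary READ(10), libata decides later, based on device capabilities, whether to use NCQ. The helpers `classify_cdb()` and `build_smart_read_data_cdb()` below are hypothetical; the byte layout follows the SAT ATA PASS-THROUGH(16) definition as we understand it, and the SMART READ DATA values (features 0xD0, LBA mid/high 0x4F/0xC2, command 0xB0) are the standard ATA ones.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ATA_16 0x85           /* ATA PASS-THROUGH (16) opcode */
#define ATA_12 0xa1           /* ATA PASS-THROUGH (12) opcode */
#define SAT_PROT_PIO_IN 4     /* PROTOCOL field: PIO Data-In */
#define SAT_PROT_FPDMA 12     /* PROTOCOL field: FPDMA, i.e. NCQ */

enum cdb_kind {
	CDB_NOT_PASSTHROUGH, /* e.g. READ(10): NCQ or not is decided later by libata */
	CDB_ATA_NCQ,         /* passthrough of an NCQ (FPDMA) ATA command */
	CDB_ATA_NON_NCQ,     /* passthrough of any other ATA protocol (PIO, DMA, ...) */
};

/* Hypothetical helper: what a CDB reveals about NCQ before ATA translation. */
static enum cdb_kind classify_cdb(const unsigned char *cdb, size_t len)
{
	assert(cdb != NULL && len >= 2);
	if (cdb[0] != ATA_16 && cdb[0] != ATA_12)
		return CDB_NOT_PASSTHROUGH;
	/* PROTOCOL lives in bits 4:1 of byte 1 in both CDB sizes. */
	unsigned int protocol = (cdb[1] >> 1) & 0x0f;
	return protocol == SAT_PROT_FPDMA ? CDB_ATA_NCQ : CDB_ATA_NON_NCQ;
}

/* Hypothetical helper: ATA PASS-THROUGH (16) CDB for SMART READ DATA. */
static void build_smart_read_data_cdb(unsigned char cdb[16])
{
	memset(cdb, 0, 16);
	cdb[0] = ATA_16;
	cdb[1] = SAT_PROT_PIO_IN << 1; /* PROTOCOL = PIO Data-In, EXTEND = 0 */
	cdb[2] = 0x0e;                 /* T_DIR = from device, BYTE_BLOCK = 1, T_LENGTH = count */
	cdb[4] = 0xd0;                 /* FEATURES (7:0): SMART READ DATA */
	cdb[6] = 1;                    /* COUNT (7:0): one 512-byte block */
	cdb[10] = 0x4f;                /* LBA mid: SMART signature */
	cdb[12] = 0xc2;                /* LBA high: SMART signature */
	cdb[14] = 0xb0;                /* COMMAND: SMART */
}
```

A SMART READ DATA CDB built this way classifies as non-NCQ, a passthrough CDB with PROTOCOL 12 classifies as NCQ, and a READ(10) CDB (opcode 0x28) carries no NCQ information at all, which is why only code that performs the ATA translation can apply NCQ/non-NCQ exclusion.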
end of thread

Thread overview: 12+ messages
2024-09-09 13:10 [bug report] block: Non-NCQ commands will never be executed while fio is continuously running yangxingui
2024-09-09 13:21 ` Damien Le Moal
2024-09-10 1:09 ` yangxingui
2024-09-10 4:45 ` Damien Le Moal
2024-09-10 6:34 ` yangxingui
2024-09-10 11:27 ` Niklas Cassel
2024-09-10 22:38 ` Damien Le Moal
2024-09-11 9:41 ` yangxingui
2024-09-19 12:26 ` Yu Kuai
2024-09-19 14:14 ` Damien Le Moal
2024-10-31 14:12 ` Niklas Cassel
2024-11-01 2:17 ` yangxingui