From: Andreas Hindborg <andreas.hindborg@wdc.com>
To: Ming Lei <ming.lei@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>, linux-block@vger.kernel.org
Subject: Reordering of ublk IO requests
Date: Wed, 16 Nov 2022 16:00:15 +0100
Message-ID: <87sfii99e7.fsf@wdc.com>


Hi Ming,

I have a question regarding ublk. For context, I am working on adding
zoned storage support to ublk, and you have been very kind to help me on
GitHub [1].

I have a problem with the ordering of requests after they are issued to
the ublk driver: ublk reverses the order of requests that are submitted
as a batch. The behavior can be observed with the following fio
workload:

> fio --name=test --ioengine=io_uring --rw=read --bs=4m --direct=1
> --size=4m --filename=/dev/ublkb0

For a loopback ublk target I get the following from blktrace:

> 259,2    0     3469   286.337681303   724  D   R 0 + 1024 [fio]
> 259,2    0     3470   286.337691313   724  D   R 1024 + 1024 [fio]
> 259,2    0     3471   286.337694423   724  D   R 2048 + 1024 [fio]
> 259,2    0     3472   286.337696583   724  D   R 3072 + 1024 [fio]
> 259,2    0     3473   286.337698433   724  D   R 4096 + 1024 [fio]
> 259,2    0     3474   286.337700213   724  D   R 5120 + 1024 [fio]
> 259,2    0     3475   286.337702723   724  D   R 6144 + 1024 [fio]
> 259,2    0     3476   286.337704323   724  D   R 7168 + 1024 [fio]
> 259,1    0     1794   286.337794934   390  D   R 6144 + 2048 [ublk]
> 259,1    0     1795   286.337805504   390  D   R 4096 + 2048 [ublk]
> 259,1    0     1796   286.337816274   390  D   R 2048 + 2048 [ublk]
> 259,1    0     1797   286.337821744   390  D   R 0 + 2048 [ublk]

And enabling debug prints in ublk shows that the reversal happens when
ublk defers work to the io_uring context:

> kernel: ublk_queue_rq: qid 0, tag 180, sect 0
> kernel: ublk_queue_rq: qid 0, tag 181, sect 1024
> kernel: ublk_queue_rq: qid 0, tag 182, sect 2048
> kernel: ublk_queue_rq: qid 0, tag 183, sect 3072
> kernel: ublk_queue_rq: qid 0, tag 184, sect 4096
> kernel: ublk_queue_rq: qid 0, tag 185, sect 5120
> kernel: ublk_queue_rq: qid 0, tag 186, sect 6144
> kernel: ublk_queue_rq: qid 0, tag 187, sect 7168
> kernel: __ublk_rq_task_work: complete: op 33, qid 0 tag 187 io_flags 1 addr 7f245d359000, sect 7168
> kernel: __ublk_rq_task_work: complete: op 33, qid 0 tag 186 io_flags 1 addr 7f245fcfd000, sect 6144
> kernel: __ublk_rq_task_work: complete: op 33, qid 0 tag 185 io_flags 1 addr 7f245fd7f000, sect 5120
> kernel: __ublk_rq_task_work: complete: op 33, qid 0 tag 184 io_flags 1 addr 7f245fe01000, sect 4096
> kernel: __ublk_rq_task_work: complete: op 33, qid 0 tag 183 io_flags 1 addr 7f245fe83000, sect 3072
> kernel: __ublk_rq_task_work: complete: op 33, qid 0 tag 182 io_flags 1 addr 7f245ff05000, sect 2048
> kernel: __ublk_rq_task_work: complete: op 33, qid 0 tag 181 io_flags 1 addr 7f245ff87000, sect 1024
> kernel: __ublk_rq_task_work: complete: op 33, qid 0 tag 180 io_flags 1 addr 7f2460009000, sect 0

The problem seems to be that task_work_add(), the mechanism ublk uses to
defer work to the io_uring context, queues the callbacks on a stack, so
a batch of requests ends up being processed in reverse (LIFO) submission
order.
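
To illustrate what I mean, here is a small userspace toy (just my own
sketch of the LIFO effect, not the actual task_work code): pushing work
at the head of a singly linked list and then draining from the head
reproduces the reversal seen in the trace above.

#include <stdio.h>
#include <stdlib.h>

struct work {
	struct work *next;
	int sect;		/* stand-in for the request's start sector */
};

static struct work *head;	/* plays the role of the task_works stack */

/* Push at the head, LIFO style. */
static void work_add(int sect)
{
	struct work *w = malloc(sizeof(*w));

	w->sect = sect;
	w->next = head;
	head = w;
}

/* Drain from the head: the most recently added entry runs first. */
static void work_run(void)
{
	while (head) {
		struct work *w = head;

		head = w->next;
		printf("handling sect %d\n", w->sect);
		free(w);
	}
}

int main(void)
{
	/* "Queue" requests for sectors 0, 1024, ..., 7168 in order. */
	for (int sect = 0; sect < 8192; sect += 1024)
		work_add(sect);

	/* Prints sect 7168 first and sect 0 last, like the log above. */
	work_run();
	return 0;
}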

As you are probably aware, writes to zoned storage must happen at the
write pointer, so when ordering is lost there is a problem. But I would
assume that this behavior is also undesirable in other circumstances.

I am not sure what the best approach to changing this behavior would be.
Maybe keeping the deferred requests on a separate FIFO queue and only
scheduling work if a worker is not already processing that queue? I have
put a rough sketch of the idea below.
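
The sketch is completely untested, and the names (ubq_fifo,
ubq_defer_rq, ubq_drain_fn, handle_rq) are placeholders I made up, not
the existing ublk code; handle_rq stands in for __ublk_rq_task_work(),
and the fifo would be initialized elsewhere with spin_lock_init(),
INIT_LIST_HEAD() and init_task_work(&q->cb, ubq_drain_fn).

#include <linux/blk-mq.h>
#include <linux/list.h>
#include <linux/sched.h>
#include <linux/spinlock.h>
#include <linux/task_work.h>

/* Hypothetical per-queue state; not the actual struct ublk_queue layout. */
struct ubq_fifo {
	spinlock_t lock;
	struct list_head rq_list;	/* deferred requests, FIFO order */
	bool work_pending;		/* is a drain already scheduled? */
	struct callback_head cb;	/* handed to task_work_add() */
	struct task_struct *daemon;	/* the ubq daemon task */
};

/* Placeholder for __ublk_rq_task_work(). */
static void handle_rq(struct request *rq);

/*
 * Called from ->queue_rq(): append the request in submission order and
 * schedule a drain only if one is not already pending.
 */
static blk_status_t ubq_defer_rq(struct ubq_fifo *q, struct request *rq)
{
	unsigned long flags;
	bool need_schedule;

	spin_lock_irqsave(&q->lock, flags);
	list_add_tail(&rq->queuelist, &q->rq_list);
	need_schedule = !q->work_pending;
	q->work_pending = true;
	spin_unlock_irqrestore(&q->lock, flags);

	/* Real code would have to unwind the list addition on failure. */
	if (need_schedule &&
	    task_work_add(q->daemon, &q->cb, TWA_SIGNAL))
		return BLK_STS_IOERR;

	return BLK_STS_OK;
}

/* The task_work callback: drain the list in the order requests arrived. */
static void ubq_drain_fn(struct callback_head *cb)
{
	struct ubq_fifo *q = container_of(cb, struct ubq_fifo, cb);
	struct request *rq, *next;
	unsigned long flags;
	LIST_HEAD(local);

	spin_lock_irqsave(&q->lock, flags);
	list_splice_init(&q->rq_list, &local);
	q->work_pending = false;
	spin_unlock_irqrestore(&q->lock, flags);

	list_for_each_entry_safe(rq, next, &local, queuelist) {
		list_del_init(&rq->queuelist);
		handle_rq(rq);
	}
}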

If you like, I can try to implement a fix.

Best regards,
Andreas Hindborg

[1] https://github.com/ming1/ubdsrv/pull/28

Thread overview: 27+ messages
2022-11-16 15:00 Andreas Hindborg [this message]
2022-11-17  2:18 ` Reordering of ublk IO requests Ming Lei
2022-11-17  8:05   ` Andreas Hindborg
2022-11-17  8:52     ` Ming Lei
2022-11-17  9:07       ` Andreas Hindborg
2022-11-17 11:47         ` Ming Lei
2022-11-17 11:59           ` Andreas Hindborg
2022-11-17 13:11             ` Damien Le Moal
2022-11-17 13:31               ` Andreas Hindborg
2022-11-18  1:51                 ` Damien Le Moal
2022-11-18  9:29                   ` Andreas Hindborg
2022-11-18  4:12             ` Ming Lei
2022-11-18  4:35               ` Damien Le Moal
2022-11-18  6:07                 ` Ming Lei
2022-11-18  9:41                   ` Andreas Hindborg
2022-11-18 11:28                     ` Ming Lei
2022-11-18 11:49                       ` Andreas Hindborg
2022-11-18 12:46                         ` Andreas Hindborg
2022-11-18 12:47                         ` Ming Lei
2022-11-19  0:24                           ` Damien Le Moal
2022-11-19  7:36                             ` Andreas Hindborg
2022-11-21 10:15                               ` Andreas Hindborg
2022-11-20 14:37                             ` Ming Lei
2022-11-21  1:25                               ` Damien Le Moal
2022-11-21  8:03                             ` Christoph Hellwig
2022-11-21  8:13                               ` Ming Lei
2022-11-17 13:00         ` Damien Le Moal
