From: Randy Dunlap <rdunlap@infradead.org>
To: Uday Shankar <ushankar@purestorage.com>,
Ming Lei <ming.lei@redhat.com>, Jens Axboe <axboe@kernel.dk>,
Caleb Sander Mateos <csander@purestorage.com>,
Andrew Morton <akpm@linux-foundation.org>,
Shuah Khan <shuah@kernel.org>, Jonathan Corbet <corbet@lwn.net>
Cc: linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-kselftest@vger.kernel.org, linux-doc@vger.kernel.org
Subject: Re: [PATCH v6 8/8] Documentation: ublk: document UBLK_F_RR_TAGS
Date: Wed, 7 May 2025 16:08:49 -0700
Message-ID: <e454bcf1-ae3c-41bd-b376-6560ea534925@infradead.org>
In-Reply-To: <20250507-ublk_task_per_io-v6-8-a2a298783c01@purestorage.com>
Hi,
On 5/7/25 2:49 PM, Uday Shankar wrote:
> Document the new flag UBLK_F_RR_TAGS along with its intended use case.
> Also describe the new restrictions on threading model imposed by
> ublk_drv (one (qid,tag) pair is can be served by only one thread), and
> remove references to ubq_daemon/per-queue threads, since such a concept
> no longer exists.
>
> Signed-off-by: Uday Shankar <ushankar@purestorage.com>
> ---
> Documentation/block/ublk.rst | 83 ++++++++++++++++++++++++++++++++++++++------
> 1 file changed, 72 insertions(+), 11 deletions(-)
>
> diff --git a/Documentation/block/ublk.rst b/Documentation/block/ublk.rst
> index 854f823b46c2add01d0b65ba36aecd26c45bb65d..e9cbabdd69c5539a02463780ba5e51de0416c3f6 100644
> --- a/Documentation/block/ublk.rst
> +++ b/Documentation/block/ublk.rst
> @@ -115,15 +115,15 @@ managing and controlling ublk devices with help of several control commands:
>
> - ``UBLK_CMD_START_DEV``
>
> - After the server prepares userspace resources (such as creating per-queue
> - pthread & io_uring for handling ublk IO), this command is sent to the
> + After the server prepares userspace resources (such as creating I/O handler
> + threads & io_uring for handling ublk IO), this command is sent to the
> driver for allocating & exposing ``/dev/ublkb*``. Parameters set via
> ``UBLK_CMD_SET_PARAMS`` are applied for creating the device.
>
> - ``UBLK_CMD_STOP_DEV``
>
> Halt IO on ``/dev/ublkb*`` and remove the device. When this command returns,
> - ublk server will release resources (such as destroying per-queue pthread &
> + ublk server will release resources (such as destroying I/O handler threads &
> io_uring).
>
> - ``UBLK_CMD_DEL_DEV``
> @@ -208,15 +208,15 @@ managing and controlling ublk devices with help of several control commands:
> modify how I/O is handled while the ublk server is dying/dead (this is called
> the ``nosrv`` case in the driver code).
>
> - With just ``UBLK_F_USER_RECOVERY`` set, after one ubq_daemon(ublk server's io
> - handler) is dying, ublk does not delete ``/dev/ublkb*`` during the whole
> + With just ``UBLK_F_USER_RECOVERY`` set, after the ublk server exits,
> + ublk does not delete ``/dev/ublkb*`` during the whole
> recovery stage and ublk device ID is kept. It is ublk server's
> responsibility to recover the device context by its own knowledge.
> Requests which have not been issued to userspace are requeued. Requests
> which have been issued to userspace are aborted.
>
> - With ``UBLK_F_USER_RECOVERY_REISSUE`` additionally set, after one ubq_daemon
> - (ublk server's io handler) is dying, contrary to ``UBLK_F_USER_RECOVERY``,
> + With ``UBLK_F_USER_RECOVERY_REISSUE`` additionally set, after the ublk server
> + exits, contrary to ``UBLK_F_USER_RECOVERY``,
> requests which have been issued to userspace are requeued and will be
> re-issued to the new process after handling ``UBLK_CMD_END_USER_RECOVERY``.
> ``UBLK_F_USER_RECOVERY_REISSUE`` is designed for backends who tolerate
> @@ -241,10 +241,11 @@ can be controlled/accessed just inside this container.
> Data plane
> ----------
>
> -ublk server needs to create per-queue IO pthread & io_uring for handling IO
> -commands via io_uring passthrough. The per-queue IO pthread
> -focuses on IO handling and shouldn't handle any control & management
> -tasks.
> +The ublk server should create dedicated threads for handling I/O. Each
> +thread should have its own io_uring through which it is notified of new
> +I/O, and through which it can complete I/O. These dedicated threads
> +should focus on IO handling and shouldn't handle any control &
> +management tasks.
>
> The's IO is assigned by a unique tag, which is 1:1 mapping with IO
??? ("The's IO" looks like a typo.)
> request of ``/dev/ublkb*``.
> @@ -265,6 +266,13 @@ with specified IO tag in the command data:
> destined to ``/dev/ublkb*``. This command is sent only once from the server
> IO pthread for ublk driver to setup IO forward environment.
>
> + Once a thread issues this command against a given (qid,tag) pair, the thread
> + registers itself as that I/O's daemon. In the future, only that I/O's daemon
> + is allowed to issue commands against the I/O. If any other thread attempts
> + to issue a command against a (qid,tag) pair for which the thread is not the
> + daemon, the command will fail. Daemons can be reset only be going through
> + recovery.
> +
> - ``UBLK_IO_COMMIT_AND_FETCH_REQ``
>
> When an IO request is destined to ``/dev/ublkb*``, the driver stores
> @@ -309,6 +317,59 @@ with specified IO tag in the command data:
> ``UBLK_IO_COMMIT_AND_FETCH_REQ`` to the server, ublkdrv needs to copy
> the server buffer (pages) read to the IO request pages.
>
> +Load balancing
> +--------------
> +
> +A simple approach to designing a ublk server might involve selecting a
> +number of I/O handler threads N, creating devices with N queues, and
> +pairing up I/O handler threads with queues, so that each thread gets a
> +unique qid, and it issues ``FETCH_REQ``s against all tags for that qid.
> +Indeed, before the introduction of the ``UBLK_F_RR_TAGS`` feature, this
> +was essentially the only option (*)
Add ending period (full stop), please.
> +
> +This approach can run into performance issues under imbalanced load.
> +This architecture taken together with the `blk-mq architecture
> +<https://docs.kernel.org/block/blk-mq.html>`_ implies that there is a
> +fixed mapping from I/O submission CPU to the ublk server thread that
> +handles it. If the workload is CPU-bottlenecked, only allowing one ublk
> +server thread to handle all the I/O generated from a single CPU can
> +limit peak bandwidth.
> +
> +To address this issue, two changes were made:
> +
> +- ublk servers can now pair up threads with I/Os (i.e. (qid,tag) pairs)
> + arbitrarily. In particular, the preexisting restriction that all I/Os
> + in one queue must be served by the same thread is lifted.
> +- ublk servers can now specify ``UBLK_F_RR_TAGS`` when creating a ublk
> + device to get round-robin tag allocation on each queue
Add ending period (full stop), please.
> +
> +The ublk server can check for the presence of these changes by testing
> +for the ``UBLK_F_RR_TAGS`` feature.
> +
> +With these changes, a ublk server can balance load as follows:
> +
> +- create the device with ``UBLK_F_RR_TAGS`` set in
> + ``ublksrv_ctrl_dev_info::flags`` when issuing the ``ADD_DEV`` command
> +- issue ``FETCH_REQ``s from ublk server threads to (qid,tag) pairs in
> + a round-robin manner. For example, for a device configured with
> + ``nr_hw_queues=2`` and ``queue_depth=4``, and a ublk server having 4
> + I/O handling threads, ``FETCH_REQ``s could be issued as follows, where
> + each entry in the table is the pair (``ublksrv_io_cmd::q_id``,
> + ``ublksrv_io_cmd::tag``) in the payload of the ``FETCH_REQ``.
> +
> + ======== ======== ======== ========
> + thread 0 thread 1 thread 2 thread 3
> + ======== ======== ======== ========
> + (0, 0)   (0, 1)   (0, 2)   (0, 3)
> + (1, 3)   (1, 0)   (1, 1)   (1, 2)
> +
> +With this setup, I/O submitted on a CPU which maps to queue 0 will be
> +balanced across all threads instead of all landing on the same thread.
> +Thus, a potential bottleneck is avoided.
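In case it helps readers, here is a minimal sketch of one mapping that
reproduces the table above. io_to_thread() is a hypothetical helper, not
something taken from ublk_drv or kublk, and the (q_id + tag) % nr_threads
formula is only an assumption inferred from the example table; a server
is free to pick any other assignment.

```c
/*
 * Hypothetical helper (not part of ublk_drv or kublk): pick the I/O
 * handler thread that issues FETCH_REQ for a given (q_id, tag) pair.
 * Rotating the assignment by one thread per queue spreads the tags of
 * every queue across all threads, as in the example table.
 */
static unsigned int io_to_thread(unsigned int q_id, unsigned int tag,
				 unsigned int nr_threads)
{
	return (q_id + tag) % nr_threads;
}
```

With nr_hw_queues=2, queue_depth=4 and 4 threads, looping over every
(q_id, tag) pair and calling io_to_thread(q_id, tag, 4) gives thread 0
the pairs (0, 0) and (1, 3), thread 1 the pairs (0, 1) and (1, 0), and
so on, matching the table.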
> +
> +(*) technically, one I/O handling thread could service multiple queues
Capitalize: "Technically,"
> +if it wanted to, but that doesn't help with imbalanced load
Add ending period (full stop), please.
> +
> Zero copy
> ---------
>
>
--
~Randy