From: "Michael S. Tsirkin" <mst@redhat.com>
To: michael.christie@oracle.com
Cc: stefanha@redhat.com, virtualization@lists.linux-foundation.org
Subject: Re: [PATCH v7 00/14] vhost: multiple worker support
Date: Fri, 2 Jun 2023 07:49:54 -0400 [thread overview]
Message-ID: <20230602074855-mutt-send-email-mst@kernel.org> (raw)
In-Reply-To: <b7bc4a59-af42-bc4e-0bc8-05b5e6885750@oracle.com>
Hi Mike,
On Fri, Apr 28, 2023 at 11:35:20AM -0500, michael.christie@oracle.com wrote:
> The following patches were built over Linus's tree. They allow us to
> support multiple vhost worker tasks per device. The design is a modified
> version of Stefan's original idea where userspace has the kernel create a
> worker and we pass back the pid. In this version, instead of passing the
> pid between user/kernel space, we use a worker_id, which is just an integer
> managed by the vhost driver, and we allow userspace to create and free
> workers and then attach them to virtqueues at setup time.
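A minimal userspace sketch of that flow might look like the following. It
assumes the ioctl names and structs proposed in this series (VHOST_NEW_WORKER,
VHOST_ATTACH_VRING_WORKER, struct vhost_worker_state, struct
vhost_vring_worker); the exact uAPI could still change before merge.

        /* Sketch only, assuming the uAPI proposed in this series:
         * VHOST_NEW_WORKER returns a vhost-managed worker_id and
         * VHOST_ATTACH_VRING_WORKER binds a virtqueue to that worker.
         */
        #include <sys/ioctl.h>
        #include <linux/vhost.h>

        static int setup_worker_per_vq(int vhost_fd, unsigned int num_vqs)
        {
                unsigned int i;

                for (i = 0; i < num_vqs; i++) {
                        struct vhost_worker_state w = {};
                        struct vhost_vring_worker vw = {};

                        /* Create a worker task; the driver hands back an
                         * integer worker_id (no pid crosses the
                         * user/kernel boundary). */
                        if (ioctl(vhost_fd, VHOST_NEW_WORKER, &w) < 0)
                                return -1;

                        /* Attach virtqueue i to the new worker. */
                        vw.index = i;
                        vw.worker_id = w.worker_id;
                        if (ioctl(vhost_fd, VHOST_ATTACH_VRING_WORKER, &vw) < 0)
                                return -1;
                }
                return 0;
        }
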
>
> All review comments from past reviews should be handled. If I didn't
> reply to a review comment, I agreed with the comment and should have
> handled it in this posting. Let me know if I missed one.
>
> Results:
> --------
>
> fio jobs        1      2      4      8      12     16
> ------------------------------------------------------
> 1 worker        160k   488k   -      -      -      -
> worker per vq   160k   310k   620k   1300k  1836k  2326k
>
>
> Notes:
> 0. This used a simple fio command:
>
> fio --filename=/dev/sdb --direct=1 --rw=randrw --bs=4k \
> --ioengine=libaio --iodepth=128 --numjobs=$JOBS_ABOVE
>
> and I used a VM with 16 vCPUs and 16 virtqueues.
>
> 1. The patches were tested with LIO's emulate_pr=0 which drops the
> LIO PR lock use. This was a bottleneck at around 12 vqs/jobs.
>
> 2. Because we have a hard limit of 1024 cmds, if num jobs * iodepth
> was greater than 1024, I decreased iodepth to keep the product at or
> under 1024. So 12 jobs used an iodepth of 85 (12 * 85 = 1020), and 16
> jobs used 64 (16 * 64 = 1024).
>
> 3. The perf issue above at 2 jobs is because when we only have 1 worker
> we execute more cmds per vhost_work, since all vqs funnel to one worker.
> I have 2 patches that fix this. One just submits from the worker thread
> instead of kicking off to a workqueue, like the vhost block patches do.
> I'll post it after the worker support is merged. I'm also working on
> patches to add back batching during LIO completion and to do polling on
> the submission side.
>
> We will still want the threading patches, because if we batch at the fio
> level and also use the vhost threading patches, we see a big boost like
> the one below. So hopefully doing this in the kernel will allow apps to
> just work without having to be smart like fio.
>
> fio using io_uring and batching with the iodepth_batch* settings:
>
> fio jobs        1      2      4      8      12     16
> ------------------------------------------------------
> 1 worker        494k   520k   -      -      -      -
> worker per vq   496k   878k   1542k  2436k  2304k  2590k
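For reference, the io_uring runs above would use a command along these lines;
the specific iodepth_batch values shown here are an assumption for
illustration, not necessarily the ones used for the numbers above:

        fio --filename=/dev/sdb --direct=1 --rw=randrw --bs=4k \
            --ioengine=io_uring --iodepth=128 --numjobs=$JOBS_ABOVE \
            --iodepth_batch_submit=16 --iodepth_batch_complete_min=16 \
            --iodepth_batch_complete_max=32
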
Could you rebase on the latest rc and repost, please?

Thanks!
> V7:
> - Add more comments about assumptions.
> - Drop refcounting and just use an attachment_cnt variable, so there
> is no confusion about when a worker is freed.
> - Do an opt-in model, because vsock has an issue where it can queue works
> before it's running and it doesn't even need multiple workers, so there
> is no point in changing the driver or core code.
> - Add back the get worker ioctl.
> - Broke up the last patches to make them easier to read.
>
> V6:
> - Rebase against vhost_task patchset.
> - Used xa instead of idr.
> V5:
> - Rebase against user_worker patchset.
> - Rebase against flush patchset.
> - Redo vhost-scsi tmf flush handling so it doesn't access vq->worker.
> V4:
> - fix vhost-sock VSOCK_VQ_RX use.
> - name functions called directly by ioctl cmds to match the ioctl cmd.
> - break up VHOST_SET_VRING_WORKER into new, free, and attach cmds.
> - document worker lifetime, and cgroup, namespace, mm, and rlimit
> inheritance; make it clear we currently only support sharing within the
> device.
> - add support to attach workers while IO is running.
> - instead of passing a pid_t of the kernel thread, pass an int allocated
> by the vhost layer with an idr.
>
> V3:
> - fully convert vhost code to use vq based APIs instead of leaving it
> half per dev and half per vq.
> - rebase against kernel worker API.
> - Drop delayed worker creation. We always create the default worker at
> VHOST_SET_OWNER time. Userspace can create and bind workers after that.
>
> V2:
> - change the loop in which we take a refcount to the worker.
> - replaced pid == -1 with a define.
> - fixed tabbing/spacing coding style issues.
> - use a hash instead of a list to look up workers.
> - I dropped the patch that added an ioctl cmd to get a vq's worker's
> pid, since it looks like we might do a generic netlink interface instead.
Thread overview: 18+ messages
2023-04-28 16:31 [PATCH 01/14] vhost: add vhost_worker pointer to vhost_virtqueue Mike Christie
2023-04-28 16:31 ` [PATCH 02/14] vhost, vhost_net: add helper to check if vq has work Mike Christie
2023-04-28 16:31 ` [PATCH 03/14] vhost: take worker or vq instead of dev for queueing Mike Christie
2023-04-28 16:31 ` [PATCH 04/14] vhost: take worker or vq instead of dev for flushing Mike Christie
2023-04-28 16:31 ` [PATCH 05/14] vhost: convert poll work to be vq based Mike Christie
2023-04-28 16:31 ` [PATCH 06/14] vhost_sock: convert to vhost_vq_work_queue Mike Christie
2023-04-28 16:31 ` [PATCH 07/14] vhost_scsi: make SCSI cmd completion per vq Mike Christie
2023-04-28 16:31 ` [PATCH 08/14] vhost_scsi: convert to vhost_vq_work_queue Mike Christie
2023-04-28 16:31 ` [PATCH 09/14] vhost: remove vhost_work_queue Mike Christie
2023-04-28 16:31 ` [PATCH 10/14] vhost_scsi: flush IO vqs then send TMF rsp Mike Christie
2023-04-28 16:31 ` [PATCH 11/14] vhost: add helper to parse userspace vring state/file Mike Christie
2023-04-28 16:31 ` [PATCH 12/14] vhost: replace single worker pointer with xarray Mike Christie
2023-04-28 16:31 ` [PATCH 13/14] vhost: allow userspace to create workers Mike Christie
2023-05-17 3:10 ` Jason Wang
2023-05-18 23:21 ` Mike Christie
2023-04-28 16:31 ` [PATCH 14/14] vhost_scsi: add support for worker ioctls Mike Christie
2023-04-28 16:35 ` [PATCH v7 00/14] vhost: multiple worker support michael.christie
2023-06-02 11:49 ` Michael S. Tsirkin [this message]