virtualization.lists.linux-foundation.org archive mirror
From: "Michael S. Tsirkin" <mst@redhat.com>
To: Mike Christie <michael.christie@oracle.com>
Cc: virtualization@lists.linux-foundation.org, stefanha@redhat.com
Subject: Re: [PATCH v6 00/11] vhost: multiple worker support
Date: Fri, 21 Apr 2023 02:49:56 -0400
Message-ID: <20230421024930-mutt-send-email-mst@kernel.org>
In-Reply-To: <20230328021717.42268-1-michael.christie@oracle.com>

On Mon, Mar 27, 2023 at 09:17:06PM -0500, Mike Christie wrote:
> The following patches were built over linux-next, which contains various
> vhost patches in mst's tree and the vhost_task patchset in Christian
> Brauner's tree:
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux.git
> 
> kernel.user_worker branch:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux.git/log/?h=kernel.user_worker


Looks like it's going to miss this merge cycle. Hopefully in the next
one.


> The latter patchset handles the review comments from earlier postings of
> this thread by making sure that the worker threads we create are accounted
> for in the parent process's NPROC limit. Those patches are scheduled to be
> sent to Linus for 6.4.
> 
> The patches in this patchset allow us to support multiple vhost workers
> per device. The design is a modified version of Stefan's original idea,
> where userspace has the kernel create a worker and we pass back the pid.
> In this version, instead of passing the pid between user and kernel space,
> we use a worker_id, which is just an integer managed by the vhost driver,
> and we allow userspace to create and free workers and then attach them to
> virtqueues at setup time.
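> 
> For illustration, here is a rough sketch of the intended userspace flow.
> The ioctl names (VHOST_NEW_WORKER, VHOST_ATTACH_VRING_WORKER) and struct
> fields below follow the new/attach cmds this series adds, but treat this
> as a sketch of the flow rather than the exact uAPI:
> 
>   /* Run after VHOST_SET_OWNER, which creates the default worker. */
>   #include <sys/ioctl.h>
>   #include <linux/vhost.h>
> 
>   static int setup_per_vq_workers(int vhost_fd, unsigned int nvqs)
>   {
>       for (unsigned int i = 0; i < nvqs; i++) {
>           struct vhost_worker_state w = {};
>           struct vhost_vring_worker vw = {};
> 
>           /* Have the vhost layer create a worker; it returns a worker_id. */
>           if (ioctl(vhost_fd, VHOST_NEW_WORKER, &w) < 0)
>               return -1;
> 
>           /* Attach virtqueue i to the newly created worker. */
>           vw.index = i;
>           vw.worker_id = w.worker_id;
>           if (ioctl(vhost_fd, VHOST_ATTACH_VRING_WORKER, &vw) < 0)
>               return -1;
>       }
>       return 0;
>   }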
> 
> All review comments from the past reviews should be handled. If I didn't
> reply to a review comment, I agreed with the comment and should have
> handled it in this posting. Let me know if I missed one.
> 
> Results (in IOPS):
> ------------------
> 
> fio jobs        1       2       4       8       12      16
> ----------------------------------------------------------
> 1 worker        160k    488k    -       -       -       -
> worker per vq   160k    310k    620k    1300k   1836k   2326k
> 
> Notes:
> 0. This used a simple fio command:
> 
> fio --filename=/dev/sdb  --direct=1 --rw=randrw --bs=4k \
> --ioengine=libaio --iodepth=128  --numjobs=$JOBS_ABOVE
> 
> and I used a VM with 16 vCPUs and 16 virtqueues.
> 
> 1. The patches were tested with LIO's emulate_pr=0, which drops the use of
> the LIO PR lock; that lock was a bottleneck at around 12 vqs/jobs.
> 
> 2. Because we have a hard limit of 1024 cmds, if num_jobs * iodepth was
> greater than 1024, I decreased the iodepth. So 12 jobs used an iodepth of
> 85 (roughly 1024 / 12), and 16 jobs used 64 (1024 / 16).
> 
> 3. The perf gap above at 2 jobs is because when we only have 1 worker we
> execute more cmds per vhost_work, due to all vqs funneling to one worker.
> This results in fewer context switches and better batching without having
> to tweak any settings. I'm working on patches to add back batching during
> LIO completion and to do polling on the submission side.
> 
> We will still want the threading patches, because if we batch at the fio
> level and also use the vhost threading patches, we see a big boost like
> the one below. So hopefully doing the batching in the kernel will allow
> apps to just work without having to be smart like fio.
> 
> fio using io_uring and batching with the iodepth_batch* settings:
> 
> fio jobs        1       2       4       8       12      16
> -------------------------------------------------------------
> 1 worker        494k    520k    -       -       -       -
> worker per vq   496k    878k    1542k   2436k   2304k   2590k
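> 
> The iodepth_batch* settings referred to above are fio options along these
> lines; the batch values shown here are just an example, not the exact ones
> used for the numbers above:
> 
> fio --filename=/dev/sdb --direct=1 --rw=randrw --bs=4k \
> --ioengine=io_uring --iodepth=128 --iodepth_batch_submit=16 \
> --iodepth_batch_complete_min=16 --numjobs=$JOBS_ABOVE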
> 
> 
> V6:
> - Rebase against vhost_task patchset.
> - Used an xarray (xa) instead of an idr.
> V5:
> - Rebase against user_worker patchset.
> - Rebase against flush patchset.
> - Redo vhost-scsi tmf flush handling so it doesn't access vq->worker.
> V4:
> - fix vhost-sock VSOCK_VQ_RX use.
> - name functions called directly by ioctl cmds to match the ioctl cmd.
> - break up VHOST_SET_VRING_WORKER into separate new, free and attach cmds.
> - document worker lifetime, and cgroup, namespace, mm and rlimit
> inheritance; make it clear we currently only support sharing within the
> device.
> - add support to attach workers while IO is running.
> - instead of passing a pid_t of the kernel thread, pass an int allocated
> by the vhost layer with an idr.
> 
> V3:
> - fully convert vhost code to use vq based APIs instead of leaving it
> half per dev and half per vq.
> - rebase against kernel worker API.
> - Drop delayed worker creation. We always create the default worker at
> VHOST_SET_OWNER time. Userspace can create and bind workers after that.
> 
> V2:
> - change the loop in which we take a refcount on the worker.
> - replaced pid == -1 with a define.
> - fixed a tabbing/spacing coding style issue.
> - use a hash instead of a list to look up workers.
> - dropped the patch that added an ioctl cmd to get a vq's worker's pid,
> since it looks like we might do a generic netlink interface instead.
> 


Thread overview: 29+ messages
2023-03-28  2:17 [PATCH v6 00/11] vhost: multiple worker support Mike Christie
2023-03-28  2:17 ` [PATCH v6 01/11] vhost: add vhost_worker pointer to vhost_virtqueue Mike Christie
2023-04-04  7:04   ` Jason Wang
2023-04-04 18:38   ` Michael S. Tsirkin
2023-04-04 23:15     ` Mike Christie
2023-03-28  2:17 ` [PATCH v6 02/11] vhost, vhost-net: add helper to check if vq has work Mike Christie
2023-04-04  7:05   ` Jason Wang
2023-03-28  2:17 ` [PATCH v6 03/11] vhost: take worker or vq instead of dev for queueing Mike Christie
2023-04-04  7:07   ` Jason Wang
2023-03-28  2:17 ` [PATCH v6 04/11] vhost: take worker or vq instead of dev for flushing Mike Christie
2023-04-04  7:08   ` Jason Wang
2023-03-28  2:17 ` [PATCH v6 05/11] vhost: convert poll work to be vq based Mike Christie
2023-03-28  2:17 ` [PATCH v6 06/11] vhost-sock: convert to vhost_vq_work_queue Mike Christie
2023-03-28  2:17 ` [PATCH v6 07/11] vhost-scsi: make SCSI cmd completion per vq Mike Christie
2023-03-28  2:17 ` [PATCH v6 08/11] vhost-scsi: convert to vhost_vq_work_queue Mike Christie
2023-03-28  2:17 ` [PATCH v6 09/11] vhost: remove vhost_work_queue Mike Christie
2023-03-28  2:17 ` [PATCH v6 10/11] vhost-scsi: flush IO vqs then send TMF rsp Mike Christie
2023-03-28  2:17 ` [PATCH v6 11/11] vhost: allow userspace to create workers Mike Christie
2023-04-04  8:00   ` Jason Wang
2023-04-04 23:08     ` Mike Christie
2023-04-10  7:04       ` Jason Wang
2023-04-10 17:16         ` Mike Christie
2023-04-11  3:00           ` Jason Wang
2023-04-11 22:15             ` Mike Christie
2023-04-12  7:56               ` Jason Wang
2023-04-13 22:36                 ` Mike Christie
2023-04-14  2:26                   ` Jason Wang
2023-04-14 16:49                     ` Mike Christie
2023-04-21  6:49 ` Michael S. Tsirkin [this message]
