All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Michael S. Tsirkin" <mst@redhat.com>
To: Mike Christie <michael.christie@oracle.com>
Cc: virtualization@lists.linux-foundation.org, stefanha@redhat.com
Subject: Re: [PATCH v6 00/11] vhost: multiple worker support
Date: Fri, 21 Apr 2023 02:49:56 -0400	[thread overview]
Message-ID: <20230421024930-mutt-send-email-mst@kernel.org> (raw)
In-Reply-To: <20230328021717.42268-1-michael.christie@oracle.com>

On Mon, Mar 27, 2023 at 09:17:06PM -0500, Mike Christie wrote:
> The following patches were built over linux-next which contains various
> vhost patches in mst's tree and the vhost_task patchset in Christian
> Brauner's tree:
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux.git
> 
> kernel.user_worker branch:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux.git/log/?h=kernel.user_worker


Looks like it's going to miss this merge cycle. Hopefully in the next
one.


> The latter patchset handles the review comment for the patches in thread
> to make sure that worker threads we create are accounted for in the parent
> process's NPROC limit. The patches are scheduled to be sent to Linus for
> 6.4.
> 
> The patches in this patchset allow us to support multiple vhost workers
> per device. The design is a modified version of Stefan's original idea
> where userspace has the kernel create a worker and we pass back the pid.
> In this version instead of passing the pid between user/kernel space we
> use a worker_id which is just an integer managed by the vhost driver and
> we allow userspace to create and free workers and then attach them to
> virtqueues at setup time.
> 
> All review comments from the past reviews should be handled. If I didn't
> reply to a review comment, I agreed with the comment and should have
> handled it in this posting. Let me know if I missed one.
> 
> Results:
> --------
> 
> fio jobs        1       2       4       8       12      16
> ----------------------------------------------------------
> 1 worker        160k   488k     -       -       -       -
> worker per vq   160k   310k    620k    1300k   1836k   2326k
> 
> Notes:
> 0. This used a simple fio command:
> 
> fio --filename=/dev/sdb  --direct=1 --rw=randrw --bs=4k \
> --ioengine=libaio --iodepth=128  --numjobs=$JOBS_ABOVE
> 
> and I used a VM with 16 vCPUs and 16 virtqueues.
> 
> 1. The patches were tested with LIO's emulate_pr=0 which drops the
> LIO PR lock use. This was a bottleneck at around 12 vqs/jobs.
> 
> 2. Because we have a hard limit of 1024 cmds, if the num jobs * iodepth
> was greater than 1024, I would decrease iodepth. So 12 jobs used 85 cmds,
> and 16 used 64.
> 
> 3. The perf issue above at 2 jobs is because when we only have 1 worker
> we execute more cmds per vhost_work due to all vqs funneling to one worker.
> This results in less context switches and better batching without having to
> tweak any settings. I'm working on patches to add back batching during lio
> completion and do polling on the submission side.
> 
> We will still want the threading patches, because if we batch at the fio
> level plus use the vhost theading patches, we can see a big boost like
> below. So hopefully doing it at the kernel will allow apps to just work
> without having to be smart like fio.
> 
> fio using io_uring and batching with the iodepth_batch* settings:
> 
> fio jobs        1       2       4       8       12      16
> -------------------------------------------------------------
> 1 worker        494k    520k    -       -       -       -
> worker per vq   496k    878k    1542k   2436k   2304k   2590k
> 
> 
> V6:
> - Rebase against vhost_task patchset.
> - Used xa instead of idr.
> V5:
> - Rebase against user_worker patchset.
> - Rebase against flush patchset.
> - Redo vhost-scsi tmf flush handling so it doesn't access vq->worker.
> V4:
> - fix vhost-sock VSOCK_VQ_RX use.
> - name functions called directly by ioctl cmd's to match the ioctl cmd.
> - break up VHOST_SET_VRING_WORKER into a new, free and attach cmd.
> - document worker lifetime, and cgroup, namespace, mm, rlimit
> inheritance, make it clear we currently only support sharing within the
> device.
> - add support to attach workers while IO is running.
> - instead of passing a pid_t of the kernel thread, pass a int allocated
> by the vhost layer with an idr.
> 
> V3:
> - fully convert vhost code to use vq based APIs instead of leaving it
> half per dev and half per vq.
> - rebase against kernel worker API.
> - Drop delayed worker creation. We always create the default worker at
> VHOST_SET_OWNER time. Userspace can create and bind workers after that.
> 
> V2:
> - change loop that we take a refcount to the worker in
> - replaced pid == -1 with define.
> - fixed tabbing/spacing coding style issue
> - use hash instead of list to lookup workers.
> - I dropped the patch that added an ioctl cmd to get a vq's worker's
> pid. I saw we might do a generic netlink interface instead.
> 

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

      parent reply	other threads:[~2023-04-21  6:50 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-03-28  2:17 [PATCH v6 00/11] vhost: multiple worker support Mike Christie
2023-03-28  2:17 ` [PATCH v6 01/11] vhost: add vhost_worker pointer to vhost_virtqueue Mike Christie
2023-04-04  7:04   ` Jason Wang
2023-04-04 18:38   ` Michael S. Tsirkin
2023-04-04 23:15     ` Mike Christie
2023-03-28  2:17 ` [PATCH v6 02/11] vhost, vhost-net: add helper to check if vq has work Mike Christie
2023-04-04  7:05   ` Jason Wang
2023-03-28  2:17 ` [PATCH v6 03/11] vhost: take worker or vq instead of dev for queueing Mike Christie
2023-04-04  7:07   ` Jason Wang
2023-03-28  2:17 ` [PATCH v6 04/11] vhost: take worker or vq instead of dev for flushing Mike Christie
2023-04-04  7:08   ` Jason Wang
2023-03-28  2:17 ` [PATCH v6 05/11] vhost: convert poll work to be vq based Mike Christie
2023-03-28  2:17 ` [PATCH v6 06/11] vhost-sock: convert to vhost_vq_work_queue Mike Christie
2023-03-28  2:17 ` [PATCH v6 07/11] vhost-scsi: make SCSI cmd completion per vq Mike Christie
2023-03-28  2:17 ` [PATCH v6 08/11] vhost-scsi: convert to vhost_vq_work_queue Mike Christie
2023-03-28  2:17 ` [PATCH v6 09/11] vhost: remove vhost_work_queue Mike Christie
2023-03-28  2:17 ` [PATCH v6 10/11] vhost-scsi: flush IO vqs then send TMF rsp Mike Christie
2023-03-28  2:17 ` [PATCH v6 11/11] vhost: allow userspace to create workers Mike Christie
2023-04-04  8:00   ` Jason Wang
2023-04-04 23:08     ` Mike Christie
2023-04-10  7:04       ` Jason Wang
2023-04-10 17:16         ` Mike Christie
2023-04-11  3:00           ` Jason Wang
2023-04-11 22:15             ` Mike Christie
2023-04-12  7:56               ` Jason Wang
2023-04-13 22:36                 ` Mike Christie
2023-04-14  2:26                   ` Jason Wang
2023-04-14 16:49                     ` Mike Christie
2023-04-21  6:49 ` Michael S. Tsirkin [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230421024930-mutt-send-email-mst@kernel.org \
    --to=mst@redhat.com \
    --cc=michael.christie@oracle.com \
    --cc=stefanha@redhat.com \
    --cc=virtualization@lists.linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.