Re: thoughts stac/clac and get user for vhost

From: "Michael S. Tsirkin" <mst@redhat.com>
To: Jason Wang <jasowang@redhat.com>
Cc: netdev@vger.kernel.org
Subject: Re: thoughts stac/clac and get user for vhost
Date: Fri, 4 Jan 2019 16:25:20 -0500
Message-ID: <20190104162014-mutt-send-email-mst@kernel.org>
In-Reply-To: <3a58f172-f36f-044d-f8ac-8e24b2dc61a5@redhat.com>

On Wed, Jan 02, 2019 at 11:25:14AM +0800, Jason Wang wrote:
> 
> On 2018/12/31 2:40 AM, Michael S. Tsirkin wrote:
> > On Thu, Dec 27, 2018 at 05:55:52PM +0800, Jason Wang wrote:
> > > On 2018/12/26 11:06 PM, Michael S. Tsirkin wrote:
> > > > On Wed, Dec 26, 2018 at 12:03:50PM +0800, Jason Wang wrote:
> > > > > On 2018/12/26 12:41 AM, Michael S. Tsirkin wrote:
> > > > > > Hi!
> > > > > > I was just wondering: packed ring batches things naturally.
> > > > > > E.g.
> > > > > > 
> > > > > > user_access_begin
> > > > > > check descriptor valid
> > > > > > smp_rmb
> > > > > > copy descriptor
> > > > > > user_access_end
> > > > > But without speculation on the descriptor (which may only work for in-order
> > > > > or even be a violation of the spec), only one or two accesses to a single
> > > > > descriptor can be batched. For split ring, we can batch more since we know
> > > > > how many descriptors are pending (avail_idx - last_avail_idx).
> > > > > 
> > > > > Am I missing anything?
> > > > > 
> > > > > Thanks
> > > > > 
> > > > just check more descriptors in a loop:
> > > > 
> > > >    user_access_begin
> > > >    for (i = 0; i < 16; ++i) {
> > > > 	 if (!descriptor valid)
> > > > 		break;
> > > > 	 smp_rmb
> > > > 	 copy descriptor
> > > >    }
> > > >    user_access_end
> > > > 
> > > > you don't really need to know how many there are
> > > > ahead of time, as you still copy them one by one.
> > > 
> > > So let's see the case of split ring
> > > 
> > > 
> > > user_access_begin
> > > 
> > > n = avail_idx - last_avail_idx (1)
> > > 
> > > n = MIN(n, 16)
> > > 
> > > smp_rmb
> > > 
> > > read n entries from avail_ring (2)
> > > 
> > > for (i = 0; i < n; i++)
> > > 
> > >      copy descriptor (3)
> > > 
> > > user_access_end
> > > 
> > > 
> > > Consider the case of a heavy workload. For packed ring, we have 32
> > > userspace accesses and 16 calls to smp_rmb().
> > > 
> > > For split ring we have
> > > 
> > > (1) 1 time
> > > 
> > > (2) 2 times at most
> > > 
> > > (3) 16 times
> > > 
> > > That is 19 userspace accesses and 1 smp_rmb(). In fact (2) could be
> > > eliminated with in-order, and (3) could be batched completely with in-order
> > > and partially when out of order.
> > > 
> > > I don't see how packed ring helps here, especially considering that lfence
> > > on x86 is more than a memory fence: it in fact prevents speculation.
> > > 
> > > Thanks
> > So on x86 at least RMB is free, this is why I never bothered optimizing
> > it out. Is smp_rmb still worth optimizing out for ARM? Does it cost
> > more than the extra indirection in the split ring?
> 
> 
> I don't know, but RMB obviously has some chance of hurting performance.
> But even on architectures where RMB is free, packed ring still does not
> show an obvious advantage.

People do measure gains with a PMD on host+guest.
So it's a question of optimizing the packed ring implementation in Linux.
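
For reference, the per-batch counts from the earlier exchange (32 accesses
and 16 smp_rmb() for packed, 19 and 1 for split, at a batch of 16) can be
reproduced with a rough userspace model. This is only a sketch: struct cost,
packed_cost() and split_cost() are made-up names, not vhost code, and it
assumes one user access for the validity check, one for the descriptor copy,
and at most two copies of the avail ring when the range wraps.

```c
/* Hypothetical cost model, not vhost code. */
struct cost {
	unsigned user_accesses;	/* reads of userspace memory */
	unsigned rmbs;		/* smp_rmb() calls */
};

/* Packed ring: check flags, smp_rmb, copy descriptor, one by one. */
static struct cost packed_cost(unsigned pending, unsigned batch)
{
	struct cost c = { 0, 0 };
	unsigned i;

	for (i = 0; i < batch; ++i) {
		c.user_accesses++;	/* check descriptor valid */
		if (i >= pending)
			break;		/* first invalid descriptor ends the batch */
		c.rmbs++;		/* smp_rmb */
		c.user_accesses++;	/* copy descriptor */
	}
	return c;
}

/* Split ring: read avail_idx once, then the avail entries, then descriptors. */
static struct cost split_cost(unsigned pending, unsigned batch, int wraps)
{
	struct cost c = { 0, 0 };
	unsigned n = pending < batch ? pending : batch;

	c.user_accesses++;			/* (1) read avail_idx */
	if (n == 0)
		return c;			/* assumption: nothing more to read */
	c.rmbs++;				/* smp_rmb */
	c.user_accesses += wraps ? 2 : 1;	/* (2) avail ring entries */
	c.user_accesses += n;			/* (3) one copy per descriptor */
	return c;
}
```

With pending = 16 and batch = 16 this gives 32 accesses and 16 barriers for
packed, and 19 accesses and 1 barrier for split when the avail range wraps.
It says nothing about what each access or barrier actually costs.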


> 
> > 
> > But my point was really fundamental - if ring accesses are expensive
> > then we should batch them.
> 
> 
> I don't object to batching. The reasons ring accesses are expensive could be:
> 
> 1) unnecessary overhead caused by speculation barriers and checks like SMAP
> 2) cache contention
> 
> So batching does not conflict with the effort I made to remove 1). My plan is:
> for metadata, try to eliminate 1) completely. For data, we can do
> batch copying to amortize its cost. For avail/descriptor batching, we can
> try to do it on top.
> 
> 
> >   Right now we have an API that gets
> > an iovec directly. That limits the optimizations you can do.
> > 
> > The translation works like this:
> > 
> > ring -> valid descriptors -> iovecs
> > 
> > We should have APIs for each step that work in batches.
> > 
> 
> Yes.
> 
> Thanks
> 
> 
> > 
> > > > 
> > > > > > So packed layout should show the gain with this approach.
> > > > > > That could be motivation enough to finally enable vhost packed ring
> > > > > > support.
> > > > > > 
> > > > > > Thoughts?
> > > > > >

Thread overview: 12+ messages
2018-12-25 16:41 thoughts stac/clac and get user for vhost Michael S. Tsirkin
2018-12-26  4:03 ` Jason Wang
2018-12-26 15:06   ` Michael S. Tsirkin
2018-12-27  9:55     ` Jason Wang
2018-12-30 18:40       ` Michael S. Tsirkin
2019-01-02  3:25         ` Jason Wang
2019-01-04 21:25           ` Michael S. Tsirkin [this message]
2019-01-07  4:26             ` Jason Wang
2019-01-07  5:42               ` Michael S. Tsirkin
2019-01-07  6:54                 ` Jason Wang
2019-01-07 14:45                   ` Michael S. Tsirkin
2019-01-08 10:09                     ` Jason Wang
