From: "Michael S. Tsirkin" <mst@redhat.com>
To: Jason Wang <jasowang@redhat.com>
Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
David Miller <davem@davemloft.net>,
kvm@vger.kernel.org, virtualization@lists.linux-foundation.org
Subject: Re: [PATCH net-next 0/3] vhost: accelerate metadata access through vmap()
Date: Mon, 24 Dec 2018 14:09:29 -0500
Message-ID: <20181224140420-mutt-send-email-mst@kernel.org>
In-Reply-To: <f6ce1fbb-b634-b17d-e9cf-36c662f49d75@redhat.com>
On Mon, Dec 24, 2018 at 04:44:14PM +0800, Jason Wang wrote:
>
> On 2018/12/17 3:57 AM, Michael S. Tsirkin wrote:
> > On Sat, Dec 15, 2018 at 11:43:08AM -0800, David Miller wrote:
> > > From: Jason Wang <jasowang@redhat.com>
> > > Date: Fri, 14 Dec 2018 12:29:54 +0800
> > >
> > > > On 2018/12/14 4:12 AM, Michael S. Tsirkin wrote:
> > > > > On Thu, Dec 13, 2018 at 06:10:19PM +0800, Jason Wang wrote:
> > > > > > Hi:
> > > > > >
> > > > > > This series tries to access virtqueue metadata through kernel virtual
> > > > > > address instead of copy_user() friends since they had too much
> > > > > > overheads like checks, spec barriers or even hardware feature
> > > > > > toggling.
> > > > > >
> > > > > > Test shows about 24% improvement on TX PPS. It should benefit other
> > > > > > cases as well.
> > > > > >
> > > > > > Please review
> > > > > I think the idea of speeding up userspace access is a good one.
> > > > > However I think that moving all checks to start is way too aggressive.
> > > >
> > > > So did packet and AF_XDP. Anyway, sharing address space and access
> > > > them directly is the fastest way. Performance is the major
> > > > consideration for people to choose backend. Compare to userspace
> > > > implementation, vhost does not have security advantages at any
> > > > level. If vhost is still slow, people will start to develop backends
> > > > based on e.g AF_XDP.
> > > Exactly, this is precisely how this kind of problem should be solved.
> > >
> > > Michael, I strongly support the approach Jason is taking here, and I
> > > would like to ask you to seriously reconsider your objections.
> > >
> > > Thank you.
> > Okay. Won't be the first time I'm wrong.
> >
> > Let's say we ignore security aspects, but we need to make sure the
> > following all keep working (broken with this revision):
> > - file backed memory (I didn't see where we mark memory dirty -
> > if we don't we get guest memory corruption on close, if we do
> > then host crash as https://lwn.net/Articles/774411/ seems to apply here?)
>
>
> We only pin metadata pages, so I don't think they can be used for DMA. So it
> was probably not an issue. The real issue is zerocopy codes, maybe it's time
> to disable it by default?
>
>
> > - THP
>
>
> We will miss 2 or 4 pages for THP, I wonder whether or not it's measurable.
>
>
> > - auto-NUMA
>
>
> I'm not sure auto-NUMA will help for the case of IPC. It can damage the
> performance in the worst case if vhost and userspace are running in two
> different nodes. Anyway I can measure.
>
>
> >
> > Because vhost isn't like AF_XDP where you can just tell people "use
> > hugetlbfs" and "data is removed on close" - people are using it in lots
> > of configurations with guest memory shared between rings and unrelated
> > data.
>
>
> This series doesn't share data, only metadata is shared.
Let me clarify: I mean that the metadata is in the same huge page as
unrelated guest data.
>
> >
> > Jason, thoughts on these?
> >
>
> Based on the above, I can measure the impact of THP to see how it impacts.
>
> For unsafe variants, it can only work for when we can batch the access and
> it needs non trivial rework on the vhost codes with unexpected amount of
> work for archs other than x86. I'm not sure it's worth to try.
>
> Thanks
Yes, I think we need better APIs in vhost. Right now
we have an API to get and translate a single buffer.
We should have one that gets a batch of descriptors
and stores them, then one that translates this batch.
IMHO this will benefit everyone, even if we do vmap, due to
better code locality.
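
To make the two-step shape concrete, here is a rough userspace sketch,
not actual vhost code. All names (sketch_desc, sketch_region,
sketch_batch, sketch_fetch_batch, sketch_translate_batch) and the
simplified descriptor layout are hypothetical, chosen only to
illustrate "get and store a batch, then translate the batch":

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified descriptor; not the real vring layout. */
struct sketch_desc {
	uint64_t addr;	/* guest address of the buffer */
	uint32_t len;	/* buffer length in bytes */
};

/* Hypothetical guest-to-host mapping for one memory region. */
struct sketch_region {
	uint64_t guest_base;
	uint64_t size;
	uint8_t *host_base;
};

#define SKETCH_BATCH_MAX 16

/* Descriptors copied out of the ring plus their translated pointers. */
struct sketch_batch {
	struct sketch_desc desc[SKETCH_BATCH_MAX];
	void *host[SKETCH_BATCH_MAX];
	size_t count;
};

/*
 * Step 1: copy up to SKETCH_BATCH_MAX descriptors out of the ring in
 * one pass and store them. ring_size must be non-zero.
 */
static size_t sketch_fetch_batch(const struct sketch_desc *ring,
				 size_t ring_size, size_t head,
				 size_t avail, struct sketch_batch *b)
{
	size_t n = avail < SKETCH_BATCH_MAX ? avail : SKETCH_BATCH_MAX;

	for (size_t i = 0; i < n; i++)
		b->desc[i] = ring[(head + i) % ring_size];
	b->count = n;
	return n;
}

/*
 * Step 2: translate every stored descriptor in a second pass.
 * Returns 0 on success, -1 if any buffer is not fully inside a region.
 */
static int sketch_translate_batch(const struct sketch_region *regions,
				  size_t nregions, struct sketch_batch *b)
{
	for (size_t i = 0; i < b->count; i++) {
		const struct sketch_desc *d = &b->desc[i];

		b->host[i] = NULL;
		for (size_t r = 0; r < nregions; r++) {
			const struct sketch_region *reg = &regions[r];

			/* Overflow-safe check that [addr, addr + len) fits. */
			if (d->addr >= reg->guest_base &&
			    d->len <= reg->size &&
			    d->addr - reg->guest_base <= reg->size - d->len) {
				b->host[i] = reg->host_base +
					     (d->addr - reg->guest_base);
				break;
			}
		}
		if (!b->host[i])
			return -1;
	}
	return 0;
}
```

The point of the split is that each loop touches one kind of data
(the ring in the first pass, the region table in the second), which
is where the locality benefit would be expected to come from.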
--
MST