From: "Michael S. Tsirkin" <mst@redhat.com>
To: Jason Wang <jasowang@redhat.com>
Cc: kvm@vger.kernel.org, virtualization@lists.linux-foundation.org,
netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
Jintack Lim <jintack@cs.columbia.edu>
Subject: Re: [PATCH net V2 4/4] vhost: log dirty page correctly
Date: Wed, 26 Dec 2018 08:46:39 -0500
Message-ID: <20181226083630-mutt-send-email-mst@kernel.org>
In-Reply-To: <2a78e991-1917-256b-4f09-60c228c17979@redhat.com>
On Wed, Dec 26, 2018 at 01:43:26PM +0800, Jason Wang wrote:
>
> On 2018/12/26 12:25 AM, Michael S. Tsirkin wrote:
> > On Tue, Dec 25, 2018 at 05:43:25PM +0800, Jason Wang wrote:
> > > On 2018/12/25 1:41 AM, Michael S. Tsirkin wrote:
> > > > On Mon, Dec 24, 2018 at 11:43:31AM +0800, Jason Wang wrote:
> > > > > On 2018/12/14 9:20 PM, Michael S. Tsirkin wrote:
> > > > > > On Fri, Dec 14, 2018 at 10:43:03AM +0800, Jason Wang wrote:
> > > > > > > On 2018/12/13 10:31 PM, Michael S. Tsirkin wrote:
> > > > > > > > > Just to make sure I understand this. It looks to me we should:
> > > > > > > > >
> > > > > > > > > - allow passing GIOVA->GPA through UAPI
> > > > > > > > >
> > > > > > > > > - cache GIOVA->GPA somewhere but still use GIOVA->HVA in device IOTLB for
> > > > > > > > > performance
> > > > > > > > >
> > > > > > > > > Is this what you suggest?
> > > > > > > > >
> > > > > > > > > Thanks
> > > > > > > > Not really. We already have GPA->HVA, so I suggested a flag to pass
> > > > > > > > GIOVA->GPA in the IOTLB.
> > > > > > > >
> > > > > > > > This has advantages for security since a single table needs
> > > > > > > > then to be validated to ensure guest does not corrupt
> > > > > > > > QEMU memory.
> > > > > > > >
> > > > > > > I wonder how much we can gain through this. Currently, qemu IOMMU gives
> > > > > > > GIOVA->GPA mapping, and qemu vhost code will translate GPA to HVA then pass
> > > > > > > > GIOVA->HVA to vhost. I see no difference.
> > > > > > >
> > > > > > > Thanks
> > > > > > The difference is in security not in performance. Getting a bad HVA
> > > > > > corrupts QEMU memory and it might be guest controlled. Very risky.
> > > > > How can this be controlled by guest? HVA was generated from qemu ram blocks
> > > > > which is totally under the control of qemu memory core instead of guest.
> > > > >
> > > > >
> > > > > Thanks
> > > > It is ultimately under guest influence as guest supplies IOVA->GPA
> > > > translations. qemu translates GPA->HVA and gives the translated result
> > > > to the kernel. If it's not buggy and kernel isn't buggy it's all
> > > > fine.
> > >
> > > If qemu provides a buggy GPA->HVA, we can't work around this. And I don't get
> > > the point why we even want to try this. Buggy qemu code can crash itself in
> > > many ways.
> > >
> > >
> > > > But that's the approach that was proven not to work in the 20th century.
> > > > In the 21st century we are trying defence in depth approach.
> > > >
> > > > My point is that a single code path that is responsible for
> > > > the HVA translations is better than two.
> > > >
> > > So the difference is whether or not to use the memory table information:
> > >
> > > Current:
> > >
> > > 1) SET_MEM_TABLE: GPA->HVA
> > >
> > > 2) Qemu GIOVA->GPA
> > >
> > > 3) Qemu GPA->HVA
> > >
> > > 4) IOTLB_UPDATE: GIOVA->HVA
> > >
> > > If I understand correctly, you want to drop step 3 because it might be buggy,
> > > even though it is just 19 lines of code in qemu (vhost_memory_region_lookup()).
> > > This will end up with:
> > >
> > > 1) Do GPA->HVA translation in IOTLB_UPDATE path (I believe we won't want to
> > > do it during device IOTLB lookup).
> > >
> > > 2) Extra bits to enable this capability.
> > >
> > > So this looks like it needs more code in the kernel than what qemu does in
> > > userspace. Is this really worthwhile?
> > >
> > > Thanks
> > So there are several points I would like to make
> >
> > 1. At the moment without an iommu it is possible to
> > change GPA-HVA mappings and everything keeps working
> > because a change in memory tables flushes the rings.
>
>
> Interesting, I didn't know this before. But when can this happen?
It doesn't happen with existing qemu. But it seems like a valid
thing to do to remap memory at a different address.
>
> > However I don't see the iotlb cache being invalidated
> > on that path - did I miss it? If it is not there it's
> > a related minor bug.
>
>
> It might be a bug. But consider the case without an IOMMU: we only update
> the mem table (SET_MEM_TABLE), but not the vring address. Doesn't this look
> like a bug as well?
I think that without an iommu it can only work without races if the backend is
stopped (or if the vring isn't in guest memory with ring aliasing).
>
> >
> > 2. qemu already has a GPA. Discarding it and re-calculating
> > when logging is on just seems wrong.
> > However if you would like to *also* keep the HVA in the iotlb
> > to avoid doing extra translations, that sounds like a
> > reasonable optimization.
>
>
> Yes, traversing the GPA->HVA mapping seems unnecessary.
>
>
> >
> > 3. it also means that the hva->gpa translation only runs
> > when logging is enabled. That is a rarely exercised
> > path so any bugs there will not be caught.
>
>
> I wonder whether some kind of unit test may help here.
>
>
> >
> > So I really would like us long term to move away from
> > hva->gpa translations, keep them for legacy userspace only
> > but I don't really mind how we do it.
> >
> > How about
> > - a new flag to pass an iotlb with *both* a gpa and hva
> > - for legacy userspace, calculate the gpa on iotlb update
> > so the device then uses a shared code path
> >
> > what do you think?
> >
> >
>
> I don't object to this idea, so I can try. I just want to figure out why it
> was a must.
>
> Thanks
Not a must but I think it's a good interface extension.
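
As a rough illustration of the extension discussed above (this is not the
vhost UAPI or kernel code), the following userspace C sketch shows an IOTLB
entry that carries both a GPA and an HVA. Every name in it is hypothetical
(IOTLB_F_HAS_GPA, struct mem_region, struct iotlb_entry, iotlb_update(),
dirty_pfn()), and it assumes 4 KiB pages and that each IOTLB range falls
inside a single memory region. New userspace passes the GPA directly; for
legacy userspace the GPA is derived once at update time, so dirty logging
uses the same code path in both cases.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag: userspace supplied a GPA along with the HVA. */
#define IOTLB_F_HAS_GPA 0x1u

/* One GPA->HVA region, as described by the memory table. */
struct mem_region {
    uint64_t gpa;   /* guest physical start */
    uint64_t size;  /* length in bytes */
    uint64_t hva;   /* host virtual start */
};

/* Hypothetical IOTLB entry caching both translations of a GIOVA range. */
struct iotlb_entry {
    uint64_t giova;
    uint64_t size;
    uint64_t hva;   /* used for data access */
    uint64_t gpa;   /* used for dirty logging */
};

/* Legacy path: recover the GPA by walking the memory table (HVA->GPA). */
static bool hva_to_gpa(const struct mem_region *tbl, size_t n,
                       uint64_t hva, uint64_t *gpa)
{
    for (size_t i = 0; i < n; i++) {
        if (hva >= tbl[i].hva && hva - tbl[i].hva < tbl[i].size) {
            *gpa = tbl[i].gpa + (hva - tbl[i].hva);
            return true;
        }
    }
    return false;
}

/*
 * Fill an entry on IOTLB update. With IOTLB_F_HAS_GPA the caller's gpa is
 * trusted as given; without it the gpa argument is ignored and computed
 * from the memory table. Returns false if the HVA is not in any region.
 */
static bool iotlb_update(struct iotlb_entry *e,
                         const struct mem_region *tbl, size_t n,
                         uint64_t giova, uint64_t size,
                         uint64_t hva, uint64_t gpa, unsigned int flags)
{
    assert(e != NULL);
    if (!(flags & IOTLB_F_HAS_GPA) && !hva_to_gpa(tbl, n, hva, &gpa))
        return false;
    e->giova = giova;
    e->size = size;
    e->hva = hva;
    e->gpa = gpa;
    return true;
}

/* Dirty logging: guest page frame to mark for a write at giova. */
static bool dirty_pfn(const struct iotlb_entry *e, uint64_t giova,
                      uint64_t *pfn)
{
    if (giova < e->giova || giova - e->giova >= e->size)
        return false;
    *pfn = (e->gpa + (giova - e->giova)) >> 12; /* assumes 4 KiB pages */
    return true;
}
```

With this layout the HVA->GPA walk runs at most once per update rather than
on every logged write, and not at all when userspace supplies the GPA.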
--
MST