From: Eduardo Habkost <ehabkost@redhat.com>
To: Marcel Apfelbaum <marcel@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>,
qemu-devel@nongnu.org, cohuck@redhat.com, f4bug@amsat.org,
yuval.shaia@oracle.com, borntraeger@de.ibm.com,
pbonzini@redhat.com, imammedo@redhat.com
Subject: Re: [Qemu-devel] [PATCH V8 1/4] mem: add share parameter to memory-backend-ram
Date: Thu, 1 Feb 2018 16:21:08 -0200
Message-ID: <20180201182108.GE26425@localhost.localdomain>
In-Reply-To: <db40e5ca-9a15-4faf-4675-f29b0ee4ab22@redhat.com>
On Thu, Feb 01, 2018 at 08:03:53PM +0200, Marcel Apfelbaum wrote:
> On 01/02/2018 15:53, Eduardo Habkost wrote:
> > On Thu, Feb 01, 2018 at 02:29:25PM +0200, Marcel Apfelbaum wrote:
> >> On 01/02/2018 14:10, Eduardo Habkost wrote:
> >>> On Thu, Feb 01, 2018 at 07:36:50AM +0200, Marcel Apfelbaum wrote:
> >>>> On 01/02/2018 4:22, Michael S. Tsirkin wrote:
> >>>>> On Wed, Jan 31, 2018 at 09:34:22PM -0200, Eduardo Habkost wrote:
> >>> [...]
> >>>>>> BTW, what's the root cause for requiring HVAs in the buffer?
> >>>>>
> >>>>> It's a side effect of the kernel/userspace API which always wants
> >>>>> a single HVA/len pair to map memory for the application.
> >>>>>
> >>>>>
> >>>>
> >>>> Hi Eduardo and Michael,
> >>>>
> >>>>>> Can
> >>>>>> this be fixed?
> >>>>>
> >>>>> I think yes. It'd need to be a kernel patch for the RDMA subsystem
> >>>>> mapping an s/g list with actual memory. The HVA/len pair would then just
> >>>>> be used to refer to the region, without creating the two mappings.
> >>>>>
> >>>>> Something like splitting the register mr into
> >>>>>
> >>>>> mr = create mr (va/len) - allocate a handle and record the va/len
> >>>>>
> >>>>> addmemory(mr, offset, hva, len) - pin memory
> >>>>>
> >>>>> register mr - pass it to HW
> >>>>>
> >>>>> As a nice side effect we won't burn so much virtual address space.
> >>>>>
> >>>>
> >>>> We would still need a contiguous virtual address space range (for post-send)
> >>>> which we don't have since guest contiguous virtual address space
> >>>> will always end up as non-contiguous host virtual address space.
> >>>>
> >>>> I am not sure the RDMA HW can handle a large VA with holes.
> >>>
> >>> I'm confused. Why would the hardware see and care about virtual
> >>> addresses?
> >>
> >> The post-send operations bypass the kernel, and the process
> >> puts GVAs in the work request.
> >>
> >>> How exactly does the hardware translate VAs to
> >>> PAs?
> >>
> >> The HW maintains a page-directory-like structure, different from the MMU:
> >> VA -> phys pages
> >>
> >>> What if the process page tables change?
> >>>
> >>
> >> Since the HW uses its own page tables, we just need the phys
> >> pages to be pinned.
> >
> > So there's no hardware-imposed requirement that the hardware VAs
> > (mapped by the HW page directory) match the VAs in QEMU
> > address-space, right?
>
> Actually there is. Today it works exactly as you described.
Are you sure there's such a hardware-imposed requirement?
Why would the hardware require VAs to match the ones in the
userspace address-space, if it doesn't use the CPU MMU at all?
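
To make the question concrete, here is a toy model of a device-private
translation table. It is only a sketch under stated assumptions, not a
description of how real RDMA hardware or the kernel verbs API works: the
names ToyDeviceMR, add_memory and translate are hypothetical, loosely
following the "create mr / addmemory / register mr" split quoted above.
In this model the device resolves a VA purely through its own table, so
the VA range given to the device is an arbitrary label over pinned
physical pages and is never looked up in the CPU page tables.

```python
PAGE_SIZE = 4096  # assumed page size for this toy model


class ToyDeviceMR:
    """Hypothetical memory region with its own VA -> physical page table."""

    def __init__(self, base_va, length):
        # "create mr (va/len)": record the VA range, no memory attached yet.
        if base_va % PAGE_SIZE or length % PAGE_SIZE:
            raise ValueError("base_va and length must be page aligned")
        self.base_va = base_va
        self.length = length
        self.pages = [None] * (length // PAGE_SIZE)

    def add_memory(self, offset, phys_pages):
        # "addmemory(mr, offset, ...)": attach pinned physical pages at
        # the given offset. The pages need not be physically contiguous.
        first = offset // PAGE_SIZE
        for i, phys_addr in enumerate(phys_pages):
            self.pages[first + i] = phys_addr

    def translate(self, va):
        # Device-side lookup: uses only this table, never the CPU MMU.
        offset = va - self.base_va
        if not 0 <= offset < self.length:
            raise ValueError("VA outside the memory region")
        phys_addr = self.pages[offset // PAGE_SIZE]
        if phys_addr is None:
            raise ValueError("hole: no memory attached at this VA")
        return phys_addr + offset % PAGE_SIZE


# A contiguous device-visible VA range backed by scattered physical pages.
mr = ToyDeviceMR(base_va=0x10000000, length=2 * PAGE_SIZE)
mr.add_memory(0, [0x7F000])
mr.add_memory(PAGE_SIZE, [0x3000])
second_page_pa = mr.translate(0x10000000 + PAGE_SIZE + 8)
```

If real hardware behaves like this model, base_va could be the guest VA
range even though the backing pages sit at unrelated host VAs. Whether
the hardware or the verbs API adds a further constraint is exactly what
I am asking.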
--
Eduardo