From: "Michael S. Tsirkin" <mst@redhat.com>
To: Marcel Apfelbaum <marcel@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>,
qemu-devel@nongnu.org, cohuck@redhat.com, f4bug@amsat.org,
yuval.shaia@oracle.com, borntraeger@de.ibm.com,
pbonzini@redhat.com, imammedo@redhat.com
Subject: Re: [Qemu-devel] [PATCH V8 1/4] mem: add share parameter to memory-backend-ram
Date: Thu, 1 Feb 2018 14:57:35 +0200 [thread overview]
Message-ID: <20180201145228-mutt-send-email-mst@kernel.org> (raw)
In-Reply-To: <8dbc7c99-84f6-0023-526b-359fdf2b5162@redhat.com>
On Thu, Feb 01, 2018 at 07:36:50AM +0200, Marcel Apfelbaum wrote:
> On 01/02/2018 4:22, Michael S. Tsirkin wrote:
> > On Wed, Jan 31, 2018 at 09:34:22PM -0200, Eduardo Habkost wrote:
> >> On Wed, Jan 31, 2018 at 11:10:07PM +0200, Michael S. Tsirkin wrote:
> >>> On Wed, Jan 31, 2018 at 06:40:59PM -0200, Eduardo Habkost wrote:
> >>>> On Wed, Jan 17, 2018 at 11:54:18AM +0200, Marcel Apfelbaum wrote:
> >>>>> Currently only the file-backed memory backend can
> >>>>> be created with a "share" flag, which allows sharing
> >>>>> guest RAM with other processes on the host.
> >>>>>
> >>>>> Add the "share" flag to the RAM memory backend as well,
> >>>>> in order to allow remapping parts of the guest RAM
> >>>>> to different host virtual addresses. This is needed
> >>>>> by the RDMA devices in order to remap non-contiguous
> >>>>> QEMU virtual addresses to a contiguous virtual address range.
> >>>>>
> >>>>
> >>>> Why do we need to make this configurable? Would anything break
> >>>> if MAP_SHARED was always used if possible?
> >>>
> >>> See Documentation/vm/numa_memory_policy.txt for a list
> >>> of complications.
> >>
> >> Ew.
> >>
> >>>
> >>> Maybe we should make more of an effort to detect and report
> >>> these issues.
> >>
> >> Probably. Having other features break silently when using
> >> pvrdma doesn't sound good. We should at least describe those
> >> problems in the memory-backend-ram documentation.
> >>
> >> BTW, what's the root cause for requiring HVAs in the buffer?
> >
> > It's a side effect of the kernel/userspace API which always wants
> > a single HVA/len pair to map memory for the application.
> >
> >
>
> Hi Eduardo and Michael,
>
> >> Can
> >> this be fixed?
> >
> > I think yes. It'd need to be a kernel patch for the RDMA subsystem
> > mapping an s/g list with actual memory. The HVA/len pair would then just
> > be used to refer to the region, without creating the two mappings.
> >
> > Something like splitting the register mr into
> >
> > mr = create mr (va/len) - allocate a handle and record the va/len
> >
> > addmemory(mr, offset, hva, len) - pin memory
> >
> > register mr - pass it to HW
> >
> > As a nice side effect we won't burn so much virtual address space.
> >
>
> We would still need a contiguous virtual address range (for post-send),
> which we don't have, since a contiguous guest virtual address range
> will always end up as a non-contiguous host virtual address range.
It just needs to be contiguous in the HCA virtual address space.
Software never accesses through this pointer.
In other words - basically expose register physical mr to userspace.
>
> I am not sure the RDMA HW can handle a large VA with holes.
>
> An alternative would be a 0-based MR: QEMU intercepts the post-send
> operations and subtracts the guest VA base address.
> However, I didn't see a kernel implementation for 0-based MRs,
> and the RDMA maintainer also said it would work for local keys
> but not for remote keys.
>
> > This will fix rdma with hugetlbfs as well which is currently broken.
> >
> >
>
> There is already a discussion on the linux-rdma list:
> https://www.spinics.net/lists/linux-rdma/msg60079.html
> But it will take some (actually a lot of) time; we are currently
> discussing a possible API.
You probably need to pass the s/g list piece by piece, since it might
exceed any reasonable array size.
> And it does not solve the re-mapping...
>
> Thanks,
> Marcel
I haven't read through that discussion, but at least what I posted
solves it, since you no longer need the range to be contiguous in HVA.
> >> --
> >> Eduardo
Thread overview: 41+ messages
2018-01-17 9:54 [Qemu-devel] [PATCH V8 0/4] hw/pvrdma: PVRDMA device implementation Marcel Apfelbaum
2018-01-17 9:54 ` [Qemu-devel] [PATCH V8 1/4] mem: add share parameter to memory-backend-ram Marcel Apfelbaum
2018-01-31 20:40 ` Eduardo Habkost
2018-01-31 21:10 ` Michael S. Tsirkin
2018-01-31 23:34 ` Eduardo Habkost
2018-02-01 2:22 ` Michael S. Tsirkin
2018-02-01 5:36 ` Marcel Apfelbaum
2018-02-01 12:10 ` Eduardo Habkost
2018-02-01 12:29 ` Marcel Apfelbaum
2018-02-01 13:53 ` Eduardo Habkost
2018-02-01 18:03 ` Marcel Apfelbaum
2018-02-01 18:21 ` Eduardo Habkost
2018-02-01 18:31 ` Marcel Apfelbaum
2018-02-01 18:51 ` Eduardo Habkost
2018-02-01 18:58 ` Marcel Apfelbaum
2018-02-01 19:21 ` Eduardo Habkost
2018-02-01 19:28 ` Marcel Apfelbaum
2018-02-01 19:35 ` Paolo Bonzini
2018-02-01 18:52 ` Michael S. Tsirkin
2018-02-01 14:24 ` Michael S. Tsirkin
2018-02-01 16:31 ` Eduardo Habkost
2018-02-01 16:48 ` Michael S. Tsirkin
2018-02-01 16:57 ` Eduardo Habkost
2018-02-01 16:59 ` Michael S. Tsirkin
2018-02-01 17:01 ` Eduardo Habkost
2018-02-01 17:12 ` Michael S. Tsirkin
2018-02-01 17:36 ` Eduardo Habkost
2018-02-01 17:58 ` Marcel Apfelbaum
2018-02-01 18:18 ` Eduardo Habkost
2018-02-01 18:34 ` Marcel Apfelbaum
2018-02-01 18:01 ` Michael S. Tsirkin
2018-02-01 18:07 ` Marcel Apfelbaum
2018-02-01 12:57 ` Michael S. Tsirkin [this message]
2018-02-01 18:11 ` Marcel Apfelbaum
2018-01-17 9:54 ` [Qemu-devel] [PATCH V8 2/4] docs: add pvrdma device documentation Marcel Apfelbaum
2018-01-17 9:54 ` [Qemu-devel] [PATCH V8 3/4] pvrdma: initial implementation Marcel Apfelbaum
2018-02-01 19:10 ` Michael S. Tsirkin
2018-02-01 19:46 ` Marcel Apfelbaum
2018-01-17 9:54 ` [Qemu-devel] [PATCH V8 4/4] MAINTAINERS: add entry for hw/rdma Marcel Apfelbaum
2018-01-17 10:50 ` [Qemu-devel] [PATCH V8 0/4] hw/pvrdma: PVRDMA device implementation no-reply
2018-01-17 11:22 ` Yuval Shaia