Re: [Qemu-devel] [PATCH V8 1/4] mem: add share parameter to memory-backend-ram

From: "Michael S. Tsirkin" <mst@redhat.com>
To: Marcel Apfelbaum <marcel@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>,
	qemu-devel@nongnu.org, cohuck@redhat.com, f4bug@amsat.org,
	yuval.shaia@oracle.com, borntraeger@de.ibm.com,
	pbonzini@redhat.com, imammedo@redhat.com
Subject: Re: [Qemu-devel] [PATCH V8 1/4] mem: add share parameter to memory-backend-ram
Date: Thu, 1 Feb 2018 14:57:35 +0200
Message-ID: <20180201145228-mutt-send-email-mst@kernel.org>
In-Reply-To: <8dbc7c99-84f6-0023-526b-359fdf2b5162@redhat.com>

On Thu, Feb 01, 2018 at 07:36:50AM +0200, Marcel Apfelbaum wrote:
> On 01/02/2018 4:22, Michael S. Tsirkin wrote:
> > On Wed, Jan 31, 2018 at 09:34:22PM -0200, Eduardo Habkost wrote:
> >> On Wed, Jan 31, 2018 at 11:10:07PM +0200, Michael S. Tsirkin wrote:
> >>> On Wed, Jan 31, 2018 at 06:40:59PM -0200, Eduardo Habkost wrote:
> >>>> On Wed, Jan 17, 2018 at 11:54:18AM +0200, Marcel Apfelbaum wrote:
> >>>>> Currently only file backed memory backend can
> >>>>> be created with a "share" flag in order to allow
> >>>>> sharing guest RAM with other processes in the host.
> >>>>>
> >>>>> Add the "share" flag also to RAM Memory Backend
> >>>>> in order to allow remapping parts of the guest RAM
> >>>>> to different host virtual addresses. This is needed
> >>>>> by the RDMA devices in order to remap non-contiguous
> >>>>> QEMU virtual addresses to a contiguous virtual address range.
> >>>>>
> >>>>
> >>>> Why do we need to make this configurable?  Would anything break
> >>>> if MAP_SHARED was always used if possible?
> >>>
> >>> See Documentation/vm/numa_memory_policy.txt for a list
> >>> of complications.
> >>
> >> Ew.
> >>
> >>>
> >>> Maybe we should make more of an effort to detect and report these
> >>> issues.
> >>
> >> Probably.  Having other features breaking silently when using
> >> pvrdma doesn't sound good.  We must at least describe those
> >> problems in the documentation for memory-backend-ram.
> >>
> >> BTW, what's the root cause for requiring HVAs in the buffer?
> > 
> > It's a side effect of the kernel/userspace API which always wants
> > a single HVA/len pair to map memory for the application.
> > 
> > 
> 
> Hi Eduardo and Michael,
> 
> >>  Can
> >> this be fixed?
> > 
> > I think yes.  It'd need to be a kernel patch for the RDMA subsystem
> > mapping an s/g list with actual memory. The HVA/len pair would then just
> > be used to refer to the region, without creating the two mappings.
> > 
> > Something like splitting the register mr into
> > 
> > mr = create mr (va/len) - allocate a handle and record the va/len
> > 
> > addmemory(mr, offset, hva, len) - pin memory
> > 
> > register mr - pass it to HW
> > 
> > As a nice side effect we won't burn so much virtual address space.
> >
> 
> We would still need a contiguous virtual address range (for post-send),
> which we don't have, since a guest-contiguous virtual address range
> will always end up as a non-contiguous host virtual address range.

It just needs to be contiguous in the HCA virtual address space.
Software never accesses memory through this pointer.
In other words, basically expose "register physical MR" to userspace.


> 
> I am not sure the RDMA HW can handle a large VA with holes.
> 
> An alternative would be a 0-based MR: QEMU intercepts the post-send
> operations and can subtract the guest VA base address.
> However, I didn't see an implementation of 0-based MRs in the kernel,
> and the RDMA maintainer also said it would work for local keys
> but not for remote keys.
> 
> > This will fix rdma with hugetlbfs as well which is currently broken.
> > 
> > 
> 
> There is already a discussion on the linux-rdma list:
>     https://www.spinics.net/lists/linux-rdma/msg60079.html
> But it will take some (actually a lot of) time; we are currently discussing
> a possible API.

You probably need to pass the s/g piece by piece since it might exceed
any reasonable array size.

> And it does not solve the re-mapping...
> 
> Thanks,
> Marcel

Haven't read through that discussion. But at least what I posted solves
it, since the memory no longer needs to be contiguous in HVA.

> >> -- 
> >> Eduardo

Thread overview: 41+ messages
2018-01-17  9:54 [Qemu-devel] [PATCH V8 0/4] hw/pvrdma: PVRDMA device implementation Marcel Apfelbaum
2018-01-17  9:54 ` [Qemu-devel] [PATCH V8 1/4] mem: add share parameter to memory-backend-ram Marcel Apfelbaum
2018-01-31 20:40   ` Eduardo Habkost
2018-01-31 21:10     ` Michael S. Tsirkin
2018-01-31 23:34       ` Eduardo Habkost
2018-02-01  2:22         ` Michael S. Tsirkin
2018-02-01  5:36           ` Marcel Apfelbaum
2018-02-01 12:10             ` Eduardo Habkost
2018-02-01 12:29               ` Marcel Apfelbaum
2018-02-01 13:53                 ` Eduardo Habkost
2018-02-01 18:03                   ` Marcel Apfelbaum
2018-02-01 18:21                     ` Eduardo Habkost
2018-02-01 18:31                       ` Marcel Apfelbaum
2018-02-01 18:51                         ` Eduardo Habkost
2018-02-01 18:58                           ` Marcel Apfelbaum
2018-02-01 19:21                             ` Eduardo Habkost
2018-02-01 19:28                               ` Marcel Apfelbaum
2018-02-01 19:35                               ` Paolo Bonzini
2018-02-01 18:52                         ` Michael S. Tsirkin
2018-02-01 14:24                 ` Michael S. Tsirkin
2018-02-01 16:31                   ` Eduardo Habkost
2018-02-01 16:48                     ` Michael S. Tsirkin
2018-02-01 16:57                       ` Eduardo Habkost
2018-02-01 16:59                         ` Michael S. Tsirkin
2018-02-01 17:01                           ` Eduardo Habkost
2018-02-01 17:12                             ` Michael S. Tsirkin
2018-02-01 17:36                               ` Eduardo Habkost
2018-02-01 17:58                                 ` Marcel Apfelbaum
2018-02-01 18:18                                   ` Eduardo Habkost
2018-02-01 18:34                                     ` Marcel Apfelbaum
2018-02-01 18:01                                 ` Michael S. Tsirkin
2018-02-01 18:07                   ` Marcel Apfelbaum
2018-02-01 12:57             ` Michael S. Tsirkin [this message]
2018-02-01 18:11               ` Marcel Apfelbaum
2018-01-17  9:54 ` [Qemu-devel] [PATCH V8 2/4] docs: add pvrdma device documentation Marcel Apfelbaum
2018-01-17  9:54 ` [Qemu-devel] [PATCH V8 3/4] pvrdma: initial implementation Marcel Apfelbaum
2018-02-01 19:10   ` Michael S. Tsirkin
2018-02-01 19:46     ` Marcel Apfelbaum
2018-01-17  9:54 ` [Qemu-devel] [PATCH V8 4/4] MAINTAINERS: add entry for hw/rdma Marcel Apfelbaum
2018-01-17 10:50 ` [Qemu-devel] [PATCH V8 0/4] hw/pvrdma: PVRDMA device implementation no-reply
2018-01-17 11:22   ` Yuval Shaia