From: Halil Pasic <pasic@linux.ibm.com>
To: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: David Hildenbrand <david@redhat.com>,
Cornelia Huck <cohuck@redhat.com>, Frank Yang <lfy@google.com>,
virtio-comment@lists.oasis-open.org,
Stefan Hajnoczi <stefanha@redhat.com>
Subject: Re: [virtio-comment] [PATCH 1/3] shared memory: Define shared memory regions
Date: Fri, 15 Feb 2019 22:42:39 +0100
Message-ID: <20190215224239.6c0cb1d1@oc2783563651>
In-Reply-To: <20190215151424.GI2630@work-vm>
On Fri, 15 Feb 2019 15:14:25 +0000
"Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> * David Hildenbrand (david@redhat.com) wrote:
> > On 15.02.19 15:02, Dr. David Alan Gilbert wrote:
> > > * David Hildenbrand (david@redhat.com) wrote:
> > >> On 15.02.19 14:50, Dr. David Alan Gilbert wrote:
> > >>> * Cornelia Huck (cohuck@redhat.com) wrote:
> > >>>> On Fri, 15 Feb 2019 13:33:06 +0100
> > >>>> David Hildenbrand <david@redhat.com> wrote:
> > >>>>
> > >>>>> On 15.02.19 13:28, Cornelia Huck wrote:
> > >>>>>> On Fri, 15 Feb 2019 12:26:00 +0100
> > >>>>>> David Hildenbrand <david@redhat.com> wrote:
> > >>>>>>
> > >>>>>>> Probing is always ugly. But I think we can add something like
> > >>>>>>> the x86 PCI hole between 3 and 4 GB after our initial boot memory.
> > >>>>>>> So there, we would have a memory region just like e.g. x86 has.
> > >>>>>>
> > >>>>>> A special region is probably the best way out of this pickle. We would
> > >>>>>> only need the discovery ccw for virtio, then.
> > >>>>>>
> > >>>>>>>
> > >>>>>>> This should even work with other mechanisms I am working on. E.g.
> > >>>>>>> for memory devices, we will add yet another memory region above
> > >>>>>>> the special PCI region.
> > >>>>>>>
> > >>>>>>> The layout of the guest would then be something like
> > >>>>>>>
> > >>>>>>> [0x000000000000000]
> > >>>>>>> ... Memory region containing RAM
> > >>>>>>> [ram_size ]
> > >>>>>>> ... Memory region for e.g. special PCI devices
> > >>>>>>> [ram_size +1 GB ]
> > >>>>>>> ... Memory region for memory devices (virtio-pmem, virtio-mem ...)
> > >>>>>>> [maxram_size - ram_size + 1GB]
> > >>>>>>>
> > >>>>>>> We would have to create proper page tables for guest backing that take
> > >>>>>>> care of the new guest size (not just ram_size). Also, to the guest we
> > >>>>>>> would indicate "maximum ram size == ram_size" so it does not try to
> > >>>>>>> probe the "special" memory.
> > >>>>>>
> > >>>>>> Hm... so that would be:
> > >>>>>> - 0..ram_size: just like it is handled now
> > >>>>>> - ram_size..ram_size + 1GB: guest does not treat it as ram, but does
> > >>>>>> build page tables for it
> > >>>>>> - ram_size + 1GB..maxram_size: for whatever memory devices do with it
> > >>>>>>
> > >>>>>> How does the guest probe this? (SCLP?) Or does the guest simply know
> > >>>>>> via some kind of probable feature that there's a 1GB region there?
> > >>>>>
> > >>>>> As the guest only "knows" ram, there is a "maximum ram size" specified
> > >>>>> via SCLP. An unmodified guest will not probe beyond that.
> > >>>>
> > >>>> Nod.
> > >>>>
> > >>>>> The parts of the 1GB used by a device should be communicated via the
> > >>>>> paravirtualized device I guess. PCI bars don't really fit I assume, so
> > >>>>> we might need some virtio-ccw thingy (you're the expert :)) on top. That
> > >>>>> is one part to be clarified.
> > >>>>>
> > >>>>> I guess the guest does not need to know about the whole 1GB, only per
> > >>>>> device about the used part. We can then build page tables in the guest
> > >>>>> for that part when plugging.
> > >>>>
> > >>>> Hm. With my proposal, the guest would get a list of region addresses
> > >>>> from the device via a new ccw. It could then proceed to set up page
> > >>>> tables for it and start to use it. As long as it is aware that the
> > >>>> addresses it will get are beyond max_ram, that should be fine, I think.
> > >>>
> > >>> Which is the same as my virtio-mmio proposal; the host gets to put it
> > >>> where ever it sees fit (outside ram) and you've just got a way of
> > >>> telling the guest where it lives.
> > >>>
> > >>> Davidh's 1GB window is pretty much how older PCs worked I think;
> > >>> the problem is that 1GB is never enough and you still need a way
> > >>> to enumerate what devices are where, so it doesn't help you.
> > >>> (Our current virtio-fs dax mappings we're using are a few GB).
> > >>>
> > >>
> > >> How does that work on x86? You cannot suddenly move stuff into the
> > >> memory device memory region and potentially mess with DIMMs to be
> > >> plugged later. QEMU wise, this sounds wrong.
> > >
> > > Because it's PCI based, it becomes the guests problem - the guest
> > > sets the PCI BARs which set the GPA of the PCI devices; I assume
> > > there's some protection that happens if it gets mapped over RAM (?!)
> > >
> > > I think that varies by firmware as well, with EFI mapping
> > > them differently from our bios.
> > > I think the guest knows the total number of DIMM slots and max-ram
> > > limit, so knows where not-to-map.
> >
> > On s390x, we have to define the size of the host->guest page table when
> > starting the guest. So we need some upper limit.
>
> That's OK; x86 also has that because they have a limited physical
> and virtual address size [which may or may not be correctly passed to
> the guest!].
>
> > Mapping anywhere, I
> > really don't like. Letting the guest define the mapping, I really don't
> > like.
>
> Well it's OK to have a hole for it, but letting the guest choose where
> those mappings go in the hole is the norm for PCI (there are
> exceptions).
>
> > We can of course switch the order of mappings
> >
> > [0x000000000000000 ]
> > ... Memory region containing RAM
> > [ram_size ]
> > ... Memory region for memory devices (virtio-pmem, virtio-mem ...)
> > [maxram_size - ram_size ]
> > ... Memory region for e.g. special PCI/CCW devices
> > [ TBD]
> >
> > We can size TBD in a way that we e.g. max out the current page table
> > size before having to switch to more levels.
>
> Yes, that's fine to set some upper limit; you've just got to make sure
> that the hypervisor knows where it can put stuff and if the guest
> does PCI that it knows where it's allowed to put stuff and as long
> as the two don't overlap everyone is happy.
>
> [We should probably take this level of detail off this list - it's
> parsecs away from the detail of virtio]
If you do take the detailed discussion off this list, please keep me in the
loop.
Regards,
Halil
This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.
In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.
Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/