From: "Zhang, Haozhong" <haozhong.zhang@intel.com>
To: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Juergen Gross <jgross@suse.com>, Wei Liu <wei.liu2@citrix.com>,
"Tian, Kevin" <kevin.tian@intel.com>, Keir Fraser <keir@xen.org>,
Ian Campbell <ian.campbell@citrix.com>,
Stefano Stabellini <stefano.stabellini@eu.citrix.com>,
George Dunlap <George.Dunlap@eu.citrix.com>,
Andrew Cooper <andrew.cooper3@citrix.com>,
Ian Jackson <ian.jackson@eu.citrix.com>,
"xen-devel@lists.xen.org" <xen-devel@lists.xen.org>,
Jan Beulich <jbeulich@suse.com>,
"Nakajima, Jun" <jun.nakajima@intel.com>,
Xiao Guangrong <guangrong.xiao@linux.intel.com>
Subject: Re: [RFC Design Doc] Add vNVDIMM support for Xen
Date: Mon, 15 Feb 2016 17:04:55 +0800 [thread overview]
Message-ID: <20160215090455.GC8938@hz-desktop.sh.intel.com> (raw)
In-Reply-To: <20160203154744.GG20732@char.us.oracle.com>
On 02/03/16 23:47, Konrad Rzeszutek Wilk wrote:
> > > > > Open: It seems no system call/ioctl is provided by Linux kernel to
> > > > > get the physical address from a virtual address.
> > > > > /proc/<qemu_pid>/pagemap provides information of mapping from
> > > > > VA to PA. Is it an acceptable solution to let QEMU parse this
> > > > > file to get the physical address?
> > > >
> > > > Does it work in a non-root scenario?
> > > >
> > >
> > > Seemingly no, according to Documentation/vm/pagemap.txt in Linux kernel:
> > > | Since Linux 4.0 only users with the CAP_SYS_ADMIN capability can get PFNs.
> > > | In 4.0 and 4.1 opens by unprivileged fail with -EPERM. Starting from
> > > | 4.2 the PFN field is zeroed if the user does not have CAP_SYS_ADMIN.
> > > | Reason: information about PFNs helps in exploiting Rowhammer vulnerability.
>
> Ah right.
> > >
> > > A possible alternative is to add a new hypercall similar to
> > > XEN_DOMCTL_memory_mapping but receiving virtual address as the address
> > > parameter and translating to machine address in the hypervisor.
> >
> > That might work.
>
> That won't work.
>
> This is a userspace VMA - which means the once the ioctl is done we swap
> to kernel virtual addresses. Now we may know that the prior cr3 has the
> userspace virtual address and walk it down - but what if the domain
> that is doing this is PVH? (or HVM) - the cr3 of userspace is tucked somewhere
> inside the kernel.
>
> Which means this hypercall would need to know the Linux kernel task structure
> to find this.
>
> May I propose another solution - an stacking driver (similar to loop). You
> setup it up (ioctl /dev/pmem0/guest.img, get some /dev/mapper/guest.img created).
> Then mmap the /dev/mapper/guest.img - all of the operations are the same - except
> it may have an extra ioctl - get_pfns - which would provide the data in similar
> form to pagemap.txt.
>
This stack driver approach seems still need privileged permission and
more modifications in kernel, so ...
> But folks will then ask - why don't you just use pagemap? Could the pagemap
> have an extra security capability check? One that can be set for
> QEMU?
>
... I would like to use pagemap and mlock.
Haozhong
> >
> >
> > > > > Open: For a large pmem, mmap(2) is very possible to not map all SPA
> > > > > occupied by pmem at the beginning, i.e. QEMU may not be able to
> > > > > get all SPA of pmem from buf (in virtual address space) when
> > > > > calling XEN_DOMCTL_memory_mapping.
> > > > > Can mmap flag MAP_LOCKED or mlock(2) be used to enforce the
> > > > > entire pmem being mmaped?
> > > >
> > > > Ditto
> > > >
> > >
> > > No. If I take the above alternative for the first open, maybe the new
> > > hypercall above can inject page faults into dom0 for the unmapped
> > > virtual address so as to enforce dom0 Linux to create the page
> > > mapping.
>
> Ugh. That sounds hacky. And you wouldn't neccessarily be safe.
> Imagine that the system admin decides to defrag the /dev/pmem filesystem.
> Or move the files (disk images) around. If they do that - we may
> still have the guest mapped to system addresses which may contain filesystem
> metadata now, or a different guest image. We MUST mlock or lock the file
> during the duration of the guest.
>
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel
next prev parent reply other threads:[~2016-02-15 9:04 UTC|newest]
Thread overview: 121+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-02-01 5:44 [RFC Design Doc] Add vNVDIMM support for Xen Haozhong Zhang
2016-02-01 18:25 ` Andrew Cooper
2016-02-02 3:27 ` Tian, Kevin
2016-02-02 3:44 ` Haozhong Zhang
2016-02-02 11:09 ` Andrew Cooper
2016-02-02 6:33 ` Tian, Kevin
2016-02-02 7:39 ` Zhang, Haozhong
2016-02-02 7:48 ` Tian, Kevin
2016-02-02 7:53 ` Zhang, Haozhong
2016-02-02 8:03 ` Tian, Kevin
2016-02-02 8:49 ` Zhang, Haozhong
2016-02-02 19:01 ` Konrad Rzeszutek Wilk
2016-02-02 17:11 ` Stefano Stabellini
2016-02-03 7:00 ` Haozhong Zhang
2016-02-03 9:13 ` Jan Beulich
2016-02-03 14:09 ` Andrew Cooper
2016-02-03 14:23 ` Haozhong Zhang
2016-02-05 14:40 ` Ross Philipson
2016-02-06 1:43 ` Haozhong Zhang
2016-02-06 16:17 ` Ross Philipson
2016-02-03 12:02 ` Stefano Stabellini
2016-02-03 13:11 ` Haozhong Zhang
2016-02-03 14:20 ` Andrew Cooper
2016-02-04 3:10 ` Haozhong Zhang
2016-02-03 15:16 ` George Dunlap
2016-02-03 15:22 ` Stefano Stabellini
2016-02-03 15:35 ` Konrad Rzeszutek Wilk
2016-02-03 15:35 ` George Dunlap
2016-02-04 2:55 ` Haozhong Zhang
2016-02-04 12:24 ` Stefano Stabellini
2016-02-15 3:16 ` Zhang, Haozhong
2016-02-16 11:14 ` Stefano Stabellini
2016-02-16 12:55 ` Jan Beulich
2016-02-17 9:03 ` Haozhong Zhang
2016-03-04 7:30 ` Haozhong Zhang
2016-03-16 12:55 ` Haozhong Zhang
2016-03-16 13:13 ` Konrad Rzeszutek Wilk
2016-03-16 13:16 ` Jan Beulich
2016-03-16 13:55 ` Haozhong Zhang
2016-03-16 14:23 ` Jan Beulich
2016-03-16 14:55 ` Haozhong Zhang
2016-03-16 15:23 ` Jan Beulich
2016-03-17 8:58 ` Haozhong Zhang
2016-03-17 11:04 ` Jan Beulich
2016-03-17 12:44 ` Haozhong Zhang
2016-03-17 12:59 ` Jan Beulich
2016-03-17 13:29 ` Haozhong Zhang
2016-03-17 13:52 ` Jan Beulich
2016-03-17 14:00 ` Ian Jackson
2016-03-17 14:21 ` Haozhong Zhang
2016-03-29 8:47 ` Haozhong Zhang
2016-03-29 9:11 ` Jan Beulich
2016-03-29 10:10 ` Haozhong Zhang
2016-03-29 10:49 ` Jan Beulich
2016-04-08 5:02 ` Haozhong Zhang
2016-04-08 15:52 ` Jan Beulich
2016-04-12 8:45 ` Haozhong Zhang
2016-04-21 5:09 ` Haozhong Zhang
2016-04-21 7:04 ` Jan Beulich
2016-04-22 2:36 ` Haozhong Zhang
2016-04-22 8:24 ` Jan Beulich
2016-04-22 10:16 ` Haozhong Zhang
2016-04-22 10:53 ` Jan Beulich
2016-04-22 12:26 ` Haozhong Zhang
2016-04-22 12:36 ` Jan Beulich
2016-04-22 12:54 ` Haozhong Zhang
2016-04-22 13:22 ` Jan Beulich
2016-03-17 13:32 ` Konrad Rzeszutek Wilk
2016-02-03 15:47 ` Konrad Rzeszutek Wilk
2016-02-04 2:36 ` Haozhong Zhang
2016-02-15 9:04 ` Zhang, Haozhong [this message]
2016-02-02 19:15 ` Konrad Rzeszutek Wilk
2016-02-03 8:28 ` Haozhong Zhang
2016-02-03 9:18 ` Jan Beulich
2016-02-03 12:22 ` Haozhong Zhang
2016-02-03 12:38 ` Jan Beulich
2016-02-03 12:49 ` Haozhong Zhang
2016-02-03 14:30 ` Andrew Cooper
2016-02-03 14:39 ` Jan Beulich
2016-02-15 8:43 ` Haozhong Zhang
2016-02-15 11:07 ` Jan Beulich
2016-02-17 9:01 ` Haozhong Zhang
2016-02-17 9:08 ` Jan Beulich
2016-02-18 7:42 ` Haozhong Zhang
2016-02-19 2:14 ` Konrad Rzeszutek Wilk
2016-03-01 7:39 ` Haozhong Zhang
2016-03-01 18:33 ` Ian Jackson
2016-03-01 18:49 ` Konrad Rzeszutek Wilk
2016-03-02 7:14 ` Haozhong Zhang
2016-03-02 13:03 ` Jan Beulich
2016-03-04 2:20 ` Haozhong Zhang
2016-03-08 9:15 ` Haozhong Zhang
2016-03-08 9:27 ` Jan Beulich
2016-03-09 12:22 ` Haozhong Zhang
2016-03-09 16:17 ` Jan Beulich
2016-03-10 3:27 ` Haozhong Zhang
2016-03-17 11:05 ` Ian Jackson
2016-03-17 13:37 ` Haozhong Zhang
2016-03-17 13:56 ` Jan Beulich
2016-03-17 14:22 ` Haozhong Zhang
2016-03-17 14:12 ` Xu, Quan
2016-03-17 14:22 ` Zhang, Haozhong
2016-03-07 20:53 ` Konrad Rzeszutek Wilk
2016-03-08 5:50 ` Haozhong Zhang
2016-02-18 17:17 ` Jan Beulich
2016-02-24 13:28 ` Haozhong Zhang
2016-02-24 14:00 ` Ross Philipson
2016-02-24 16:42 ` Haozhong Zhang
2016-02-24 17:50 ` Ross Philipson
2016-02-24 14:24 ` Jan Beulich
2016-02-24 15:48 ` Haozhong Zhang
2016-02-24 16:54 ` Jan Beulich
2016-02-28 14:48 ` Haozhong Zhang
2016-02-29 9:01 ` Jan Beulich
2016-02-29 9:45 ` Haozhong Zhang
2016-02-29 10:12 ` Jan Beulich
2016-02-29 11:52 ` Haozhong Zhang
2016-02-29 12:04 ` Jan Beulich
2016-02-29 12:22 ` Haozhong Zhang
2016-03-01 13:51 ` Ian Jackson
2016-03-01 15:04 ` Jan Beulich
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20160215090455.GC8938@hz-desktop.sh.intel.com \
--to=haozhong.zhang@intel.com \
--cc=George.Dunlap@eu.citrix.com \
--cc=andrew.cooper3@citrix.com \
--cc=guangrong.xiao@linux.intel.com \
--cc=ian.campbell@citrix.com \
--cc=ian.jackson@eu.citrix.com \
--cc=jbeulich@suse.com \
--cc=jgross@suse.com \
--cc=jun.nakajima@intel.com \
--cc=keir@xen.org \
--cc=kevin.tian@intel.com \
--cc=konrad.wilk@oracle.com \
--cc=stefano.stabellini@eu.citrix.com \
--cc=wei.liu2@citrix.com \
--cc=xen-devel@lists.xen.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.