From mboxrd@z Thu Jan 1 00:00:00 1970 From: Konrad Rzeszutek Wilk Subject: Re: [PATCH 3/3] Port of mmap_batch_v2 to support paging in Xen Date: Sat, 17 Dec 2011 16:29:51 -0500 Message-ID: <20111217212951.GA17479@phenom.dumpdata.com> References: <1324092141-9730-1-git-send-email-adin@scannell.ca> <1324092141-9730-4-git-send-email-adin@scannell.ca> <20111217144003.GE14816@konrad-lan> Mime-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable Return-path: Content-Disposition: inline In-Reply-To: List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Sender: xen-devel-bounces@lists.xensource.com Errors-To: xen-devel-bounces@lists.xensource.com To: Adin Scannell Cc: olaf@aepfle.de, xen-devel@lists.xensource.com, Adin Scannell , andres@gridcentric.ca, JBeulich@suse.com, Konrad Rzeszutek Wilk List-Id: xen-devel@lists.xenproject.org On Sat, Dec 17, 2011 at 11:51:11AM -0500, Adin Scannell wrote: > Hi Konrad, > = > Thanks for the quick turnaround. I'll incorporate your feedback and > resend. Some NOTES are inline below. > = > On Sat, Dec 17, 2011 at 9:40 AM, Konrad Rzeszutek Wilk > wrote: > > On Fri, Dec 16, 2011 at 10:22:21PM -0500, Adin Scannell wrote: > >> This wasn't ported from any patch, but was rewritten based on the XCP = 2.6.32 > >> tree. =A0The code structure is significantly different and this patch = mirrors the > >> existing Linux code. > >> > >> The primary reason for need the V2 interface is to support foreign map= pings > >> (i.e. qemu) of paged-out pages. =A0The libxc code will already retry m= appings > >> when an ENOENT is returned. =A0The V2 interface provides a richer erro= r value, > >> so the user-space code is capable of handling these errors specificall= y. > > > > Can you give more details on how to use paged-out pages. Perhaps a > > pointer to the xen's docs? > = > Hrm, in usual fashion the docs are a bit lacking. > = > Simply: the kernel has to do nothing to support paging (other than the > backend drivers which need to handle the grant EAGAIN case -- other > patch). The only reason the V2 interface is needed here is to pass Hm I did not see the netback one? Should that also incorporate this? > back full error codes from the mmu_update()'s. Xen and libxc have a > mutual understanding that when you receive an ENOENT error code, you > back off and try again. What you described is pretty good. Please do include it in the git description. Thx > = > >> > >> Signed-off-by: Adin Scannell > >> > >> Index: linux/drivers/xen/xenfs/privcmd.c > >> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D > >> --- > >> =A0drivers/xen/xenfs/privcmd.c | =A0 90 ++++++++++++++++++++++++++++++= ++++++++++++- > > > > So that file just moved to drivers/xen/privcmd.c > = > Of course it has :) Well, we can't have it easy can we! :-) > = > >> =A0include/xen/privcmd.h =A0 =A0 =A0 | =A0 10 +++++ > >> =A02 files changed, 99 insertions(+), 1 deletions(-) > >> > >> diff --git a/drivers/xen/xenfs/privcmd.c b/drivers/xen/xenfs/privcmd.c > >> index dbd3b16..21cbb5a 100644 > >> --- a/drivers/xen/xenfs/privcmd.c > >> +++ b/drivers/xen/xenfs/privcmd.c > >> @@ -70,7 +70,7 @@ static void free_page_list(struct list_head *pages) > >> =A0 */ > >> =A0static int gather_array(struct list_head *pagelist, > >> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 unsigned nelem, size_t siz= e, > >> - =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 void __user *data) > >> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 const void __user *data) > >> =A0{ > >> =A0 =A0 =A0 unsigned pageidx; > >> =A0 =A0 =A0 void *pagedata; > >> @@ -245,6 +245,15 @@ struct mmap_batch_state { > >> =A0 =A0 =A0 xen_pfn_t __user *user; > >> =A0}; > >> > >> +struct mmap_batch_v2_state { > >> + =A0 =A0 domid_t domain; > >> + =A0 =A0 unsigned long va; > >> + =A0 =A0 struct vm_area_struct *vma; > >> + =A0 =A0 int paged_out; > > > > Should this be unsigned int? > = > Noted below. > = > >> + > >> + =A0 =A0 int __user *err; > >> +}; > >> + > >> =A0static int mmap_batch_fn(void *data, void *state) > >> =A0{ > >> =A0 =A0 =A0 xen_pfn_t *mfnp =3D data; > >> @@ -260,6 +269,20 @@ static int mmap_batch_fn(void *data, void *state) > >> =A0 =A0 =A0 return 0; > >> =A0} > >> > >> +static int mmap_batch_v2_fn(void *data, void *state) > >> +{ > >> + =A0 =A0 xen_pfn_t *mfnp =3D data; > >> + =A0 =A0 struct mmap_batch_v2_state *st =3D state; > >> + > >> + =A0 =A0 int rc =3D xen_remap_domain_mfn_range(st->vma, st->va & PAGE= _MASK, *mfnp, 1, > >> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 = =A0st->vma->vm_page_prot, st->domain); > > > > You don't want to check that st is not NULL? > = > This could be an assertion as it's only used in the > ioctl_mmap_batch_v2 function where it's guaranteed to pass in a > non-null state. > = > I'll add an additional patch to the queue that adds these assertions > to both mmap_batch_fn and mmap_batch_fn_v2. > = > >> + =A0 =A0 =A0 =A0 =A0 =A0 st->paged_out++; > > Is it possible that this ends overflowing and hitting 0? > = > I don't think this is an issue practically, as the only way to trigger > this would be to be on a 64bit machine and map an ungodly number of > pages with a single mmap_batch (MAX_INT). For correctness though, I > can update this variable and the err variable in mmap_batch_state > which suffers from the same problem -- turn them into unsigned long. > This will be included in the additional patch. > >> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0m.arr); > >> + > >> + =A0 =A0 if (ret || list_empty(&pagelist)) > >> + =A0 =A0 =A0 =A0 =A0 =A0 goto out; > >> + > >> + =A0 =A0 down_write(&mm->mmap_sem); > >> + > >> + =A0 =A0 vma =3D find_vma(mm, m.addr); > >> + =A0 =A0 ret =3D -EINVAL; > >> + =A0 =A0 /* We allow multiple shots here, because this interface > >> + =A0 =A0 =A0* is used by libxc and mappings for specific pages will > >> + =A0 =A0 =A0* be retried when pages are paged-out (ENOENT). */ > >> + =A0 =A0 if (!vma || > >> + =A0 =A0 =A0 =A0 vma->vm_ops !=3D &privcmd_vm_ops || > >> + =A0 =A0 =A0 =A0 (m.addr < vma->vm_start) || > >> + =A0 =A0 =A0 =A0 ((m.addr + (nr_pages << PAGE_SHIFT)) > vma->vm_end))= { > >> + =A0 =A0 =A0 =A0 =A0 =A0 up_write(&mm->mmap_sem); > >> + =A0 =A0 =A0 =A0 =A0 =A0 goto out; > >> + =A0 =A0 } > >> + > >> + =A0 =A0 state.domain =3D m.dom; > > > > Should you check the m.dom for incorrect ones? Like -1? or DOMID_IO? > = > I followed the example for mmap_batch, which just tries the call and > returns whatever error Xen gives. I think that's the right approach > for these. OK. I think a another patch to actually check for that in every other ioclt in this code might be worth it. > = > Cheers! > -Adin