From mboxrd@z Thu Jan 1 00:00:00 1970
From: annie li
Subject: Re: Rebooting domu fails in nfs share exported from another domu on the same dom0
Date: Mon, 28 Jul 2014 12:14:13 -0400
Message-ID: <53D676D5.4090909@oracle.com>
References: <53C6E24D.7050903@oracle.com> <53D65AD3.4030804@citrix.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; Format="flowed"
Content-Transfer-Encoding: 7bit
Return-path:
In-Reply-To: <53D65AD3.4030804@citrix.com>
List-Unsubscribe: ,
List-Post:
List-Help:
List-Subscribe: ,
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: David Vrabel
Cc: "xen-devel@lists.xen.org" , roger.pau@citrix.com
List-Id: xen-devel@lists.xenproject.org

On 2014/7/28 10:14, David Vrabel wrote:
> On 16/07/14 21:36, annie li wrote:
>> Hi
>>
>> I hit a problem in such scenario: vm1 is running and export nfs service,
>> dom0 mount this nfs, and vm2 is booted in this nfs location. vm1 and vm2
>> are running on the same dom0.
>>
>> When this bug happens, the data flow is: vm2 blkfront-> vm2 blkback->
>> loop -> nfs file -> nfs client -> bridge priv1 -> vm1 vif -> vm1 netback
>> -> vm1 netfront.
>>
>> In above data flow, nfs implements direct io, blkfront and blkback uses
>> grantmap. This makes page mapping works well through vm2 blkfront to vm1
>> netback. However, when netback does grant copy, the error happens in
>> this routine:
>> __gnttab_copy->__get_paged_frame->get_page_from_gfn->get_page.
>> See /xen/arch/x86/mm.c get_page(),
>>     if ( likely(owner == domain) )
>>         return 1;
>> In above if condition, the src page is from vm2, so owner is id of vm2,
>> domain is 0 here. Then get_page return 0, hence get_page_from_gfn return
>> NULL and __get_paged_frame return GNTST_bad_page. Finally, put_page is
>> called in __grant_copy directly and grant copy fails in netback. As a
>> result, writing to nfsfile fails and this results damage to nfsfile,
>> then vm can not be rebooted successfully.
>>
>> Disable the nfs direct io can be a workaround, however, this will cause
>> performance penalty. Or any copy is involved between vm2 blkfront->vm1
>> netback probably helps in this case. But zerocopy is the best thing for
>> performance, so any suggestions for this issue?
> I planned (eventually) for foreign struct page's for grant mapped frames
> to be marked as such and then the gref and original domain accessible.
> The netback specific code for dealing with foreign pages could then be
> made generic.

It would be good if the netback handling of foreign pages could be made
generic.

>
> The difficultly lies in extending struct page without actually making it
> bigger and without adding Xen-specific fields into it...

Yes...

>
> Other alternatives I explored were using the guest mapping to copy
> to/from instead of having to use the grant ref to find the page. But
> page sharing etc. made this look like a nightmare.

What I am thinking of is adding one more member named "frame" to the
grant_mapping structure, see xen/include/xen/grant_table.h. With this,
we can get the ref from the foreign page, though this probably involves
some searching work. I was interrupted by other work, though, and have
not started on it yet. For example,

struct grant_mapping {
    u32      ref;           /* grant ref */
    u16      flags;         /* 0-4: GNTMAP_* ; 5-15: unused */
    domid_t  domid;         /* granting domain */
+   unsigned long frame;    /* grant frame */
};

Thanks
Annie
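
P.S. To make the "searching work" above a bit more concrete, here is a
rough, untested sketch of what the lookup might look like. It is only an
illustration, not existing Xen code: find_mapping_by_frame() is a
hypothetical helper name, the typedefs are stand-ins so the snippet
compiles on its own, and treating flags == 0 as "entry unused" is an
assumption of the sketch. A plain linear scan is used only for
simplicity.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the Xen typedefs so the sketch compiles on its own. */
typedef uint32_t u32;
typedef uint16_t u16;
typedef uint16_t domid_t;

/* grant_mapping as shown above, with the proposed "frame" member. */
struct grant_mapping {
    u32           ref;    /* grant ref */
    u16           flags;  /* 0-4: GNTMAP_* ; 5-15: unused */
    domid_t       domid;  /* granting domain */
    unsigned long frame;  /* grant frame (proposed addition) */
};

/*
 * Hypothetical helper: scan an array of mappings for an in-use entry
 * whose frame matches. Assumes flags == 0 marks an unused entry.
 * Returns the matching entry, or NULL if there is none.
 */
static const struct grant_mapping *
find_mapping_by_frame(const struct grant_mapping *maps, size_t nr,
                      unsigned long frame)
{
    size_t i;

    for ( i = 0; i < nr; i++ )
        if ( maps[i].flags != 0 && maps[i].frame == frame )
            return &maps[i];

    return NULL;
}
```

With the entry in hand, ->ref and ->domid would give the grant ref and
the granting domain for the foreign page. A linear scan is probably too
slow for a real implementation, so some kind of index keyed by frame
might be needed.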