From mboxrd@z Thu Jan 1 00:00:00 1970 From: Konrad Rzeszutek Wilk Subject: Re: [PATCH v2 3/3] xen: use idle vcpus to scrub pages Date: Tue, 22 Jul 2014 20:38:58 -0400 Message-ID: <20140723003858.GB5022@laptop.dumpdata.com> References: <1404135584-29206-1-git-send-email-bob.liu@oracle.com> <1404135584-29206-3-git-send-email-bob.liu@oracle.com> <53B2979C020000780001EE97@mail.emea.novell.com> <53B2A8C7.9040601@oracle.com> <53B2CCD1020000780001F027@mail.emea.novell.com> <53C4F171.8060807@oracle.com> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Return-path: Received: from mail6.bemta14.messagelabs.com ([193.109.254.103]) by lists.xen.org with esmtp (Exim 4.72) (envelope-from ) id 1X9kaQ-0003n2-Sy for xen-devel@lists.xenproject.org; Wed, 23 Jul 2014 00:39:15 +0000 Content-Disposition: inline In-Reply-To: <53C4F171.8060807@oracle.com> List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Sender: xen-devel-bounces@lists.xen.org Errors-To: xen-devel-bounces@lists.xen.org To: Bob Liu Cc: Bob Liu , keir@xen.org, ian.campbell@citrix.com, George.Dunlap@eu.citrix.com, andrew.cooper3@citrix.com, Jan Beulich , xen-devel@lists.xenproject.org List-Id: xen-devel@lists.xenproject.org On Tue, Jul 15, 2014 at 05:16:33PM +0800, Bob Liu wrote: > Hi Jan & Konrad, > > On 07/01/2014 08:59 PM, Jan Beulich wrote: > >>>> On 01.07.14 at 14:25, wrote: > >> On 07/01/2014 05:12 PM, Jan Beulich wrote: > >>>>>> On 30.06.14 at 15:39, wrote: > >>>> @@ -948,6 +954,7 @@ static void free_heap_pages( > >>>> { > >>>> if ( !tainted ) > >>>> { > >>>> + node_need_scrub[node] = 1; > >>>> for ( i = 0; i < (1 << order); i++ ) > >>>> pg[i].count_info |= PGC_need_scrub; > >>>> } > >>> > >>> Iirc it was more than this single place where you set > >>> PGC_need_scrub, and hence where you'd now need to set the > >>> other flag too. > >>> > >> > >> I'm afraid this is the only place where PGC_need_scrub was set. > > > > Ah, indeed - I misremembered others, they are all tests for the flag. > > > >> I'm sorry for all of the coding style problems. > >> > >> By the way is there any script which can be used to check the code > >> before submitting? Something like ./scripts/checkpatch.pl under linux. > > > > No, there isn't. But avoiding (or spotting) hard tabs should be easy > > enough, and other things you ought to simply inspect your patch for > > - after all that's no different from what reviewers do. > > > >>>> + } > >>>> + > >>>> + /* free percpu free list */ > >>>> + if ( !page_list_empty(local_free_list) ) > >>>> + { > >>>> + spin_lock(&heap_lock); > >>>> + page_list_for_each_safe( pg, tmp, local_free_list ) > >>>> + { > >>>> + order = PFN_ORDER(pg); > >>>> + page_list_del(pg, local_free_list); > >>>> + for ( i = 0; i < (1 << order); i++ ) > >>>> + { > >>>> + pg[i].count_info |= PGC_state_free; > >>>> + pg[i].count_info &= ~PGC_need_scrub; > >>> > >>> This needs to happen earlier - the scrub flag should be cleared right > >>> after scrubbing, and the free flag should imo be set when the page > >>> gets freed. That's for two reasons: > >>> 1) Hypervisor allocations don't need scrubbed pages, i.e. they can > >>> allocate memory regardless of the scrub flag's state. > >> > >> AFAIR, the reason I set those flags here is to avoid a panic happen. > > > > That's pretty vague a statement. > > > >>> 2) You still detain the memory on the local lists from allocation. On a > >>> many-node system, the 16Mb per node can certainly sum up (which > >>> is not to say that I don't view the 16Mb on a single node as already > >>> problematic). > >> > >> Right, but we can adjust SCRUB_BATCH_ORDER. > >> Anyway I'll take a retry as you suggested. > > > > You should really drop the idea of removing pages temporarily. > > All you need to do is make sure a page being allocated and getting > > simultaneously scrubbed by another CPU won't get passed to the > > caller until the scrubbing finished. In particular it's no problem if > > the allocating CPU occasionally ends up scrubbing a page already > > being scrubbed elsewhere. > > > > After so many days I haven't make a workable solution if don't remove > pages temporarily. The hardest part is iterating the heap free list > without holding heap_lock because if holding the lock it might be heavy > lock contention. Why do you need to hold on the heap_lock? I presume that the flag changes need an cmpxchg operation. In which case could you add another flag that would be 'PG_scrubbing_now` - and the allocator could bypass all of the pages that have said flag (and also PG_need_scrub). If the allocator can't find enough pages to satisfy the demands - it goes back in and looks at PG_need_scrub pages and scrubs those. In parallel, another thread (threads?) pick the ones that have PG_need_scrub and change them to PG_scrubbing_now and scrubs them. When it has completed it removes both flags. > So do you think it's acceptable if fixed all other concerns about this > patch? > > And there is another idea based on the assumption that only guest > allocation need all pages to be scrubbed(It's safe for xen hypervisor to > use an un-scrubbed page): > 1. set the need_scrub flag in free_heap_pages(). > 2. alloc pages from heap free list, don't scrub and neither clear the > need_scrub flag. > 3. When the guest accessing the real memory(machine pages) the first > time, page fault should happen. We track this event and before build the I think you mean that we can set the EPT (or shadow) all of those PG_need_scrub pages to be r/o. And then when the guest access them we would trap in the hypervisor and scrub them. However, that means the guest can still read those pages without trapping - and that means we might be reading sensitive data from pages that came from another guest! > right PFN to MFN page table mappings(correct me if it doesn't work as > this), at this place scrub the page if the need_scrub flag was set. > > By this way only guest real used pages are needed scrubbing and the time > is scattered. > How do you think of this solution? > > -- > Regards, > -Bob