From mboxrd@z Thu Jan  1 00:00:00 1970
From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Subject: Re: [PATCH v2 3/3] xen: use idle vcpus to scrub pages
Date: Tue, 22 Jul 2014 20:38:58 -0400
Message-ID: <20140723003858.GB5022@laptop.dumpdata.com>
References: <1404135584-29206-1-git-send-email-bob.liu@oracle.com>
	<1404135584-29206-3-git-send-email-bob.liu@oracle.com>
	<53B2979C020000780001EE97@mail.emea.novell.com>
	<53B2A8C7.9040601@oracle.com>
	<53B2CCD1020000780001F027@mail.emea.novell.com>
	<53C4F171.8060807@oracle.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Return-path: <xen-devel-bounces@lists.xen.org>
Received: from mail6.bemta14.messagelabs.com ([193.109.254.103])
	by lists.xen.org with esmtp (Exim 4.72)
	(envelope-from <konrad.wilk@oracle.com>) id 1X9kaQ-0003n2-Sy
	for xen-devel@lists.xenproject.org; Wed, 23 Jul 2014 00:39:15 +0000
Content-Disposition: inline
In-Reply-To: <53C4F171.8060807@oracle.com>
List-Unsubscribe: <http://lists.xen.org/cgi-bin/mailman/options/xen-devel>,
	<mailto:xen-devel-request@lists.xen.org?subject=unsubscribe>
List-Post: <mailto:xen-devel@lists.xen.org>
List-Help: <mailto:xen-devel-request@lists.xen.org?subject=help>
List-Subscribe: <http://lists.xen.org/cgi-bin/mailman/listinfo/xen-devel>,
	<mailto:xen-devel-request@lists.xen.org?subject=subscribe>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Bob Liu <bob.liu@oracle.com>
Cc: Bob Liu <lliubbo@gmail.com>, keir@xen.org, ian.campbell@citrix.com, George.Dunlap@eu.citrix.com, andrew.cooper3@citrix.com, Jan Beulich <JBeulich@suse.com>, xen-devel@lists.xenproject.org
List-Id: xen-devel@lists.xenproject.org

On Tue, Jul 15, 2014 at 05:16:33PM +0800, Bob Liu wrote:
> Hi Jan & Konrad,
> 
> On 07/01/2014 08:59 PM, Jan Beulich wrote:
> >>>> On 01.07.14 at 14:25, <bob.liu@oracle.com> wrote:
> >> On 07/01/2014 05:12 PM, Jan Beulich wrote:
> >>>>>> On 30.06.14 at 15:39, <lliubbo@gmail.com> wrote:
> >>>> @@ -948,6 +954,7 @@ static void free_heap_pages(
> >>>>      {
> >>>>          if ( !tainted )
> >>>>          {
> >>>> +            node_need_scrub[node] = 1;
> >>>>              for ( i = 0; i < (1 << order); i++ )
> >>>>                  pg[i].count_info |= PGC_need_scrub;
> >>>>          }
> >>>
> >>> Iirc it was more than this single place where you set
> >>> PGC_need_scrub, and hence where you'd now need to set the
> >>> other flag too.
> >>>
> >>
> >> I'm afraid this is the only place where PGC_need_scrub was set.
> > 
> > Ah, indeed - I misremembered others, they are all tests for the flag.
> > 
> >> I'm sorry for all of the coding style problems.
> >>
> >> By the way is there any script which can be used to check the code
> >> before submitting? Something like ./scripts/checkpatch.pl under linux.
> > 
> > No, there isn't. But avoiding (or spotting) hard tabs should be easy
> > enough, and other things you ought to simply inspect your patch for
> > - after all that's no different from what reviewers do.
> > 
> >>>> +    }
> >>>> +
> >>>> +    /* free percpu free list */
> >>>> +    if ( !page_list_empty(local_free_list) )
> >>>> +    {
> >>>> +        spin_lock(&heap_lock);
> >>>> +        page_list_for_each_safe( pg, tmp, local_free_list )
> >>>> +        {
> >>>> +            order = PFN_ORDER(pg);
> >>>> +            page_list_del(pg, local_free_list);
> >>>> +            for ( i = 0; i < (1 << order); i++ )
> >>>> +	    {
> >>>> +                pg[i].count_info |= PGC_state_free;
> >>>> +                pg[i].count_info &= ~PGC_need_scrub;
> >>>
> >>> This needs to happen earlier - the scrub flag should be cleared right
> >>> after scrubbing, and the free flag should imo be set when the page
> >>> gets freed. That's for two reasons:
> >>> 1) Hypervisor allocations don't need scrubbed pages, i.e. they can
> >>> allocate memory regardless of the scrub flag's state.
> >>
> >> AFAIR, the reason I set those flags here is to avoid a panic happen.
> > 
> > That's pretty vague a statement.
> > 
> >>> 2) You still detain the memory on the local lists from allocation. On a
> >>> many-node system, the 16Mb per node can certainly sum up (which
> >>> is not to say that I don't view the 16Mb on a single node as already
> >>> problematic).
> >>
> >> Right, but we can adjust SCRUB_BATCH_ORDER.
> >> Anyway I'll take a retry as you suggested.
> > 
> > You should really drop the idea of removing pages temporarily.
> > All you need to do is make sure a page being allocated and getting
> > simultaneously scrubbed by another CPU won't get passed to the
> > caller until the scrubbing finished. In particular it's no problem if
> > the allocating CPU occasionally ends up scrubbing a page already
> > being scrubbed elsewhere.
> > 
> 
> After so many days I haven't make a workable solution if don't remove
> pages temporarily. The hardest part is iterating the heap free list
> without holding heap_lock because if holding the lock it might be heavy
> lock contention.

Why do you need to hold on the heap_lock?

I presume that the flag changes need an cmpxchg operation. In which case
could you add another flag that would be 'PG_scrubbing_now` - and the
allocator could bypass all of the pages that have said flag (and also
PG_need_scrub).

If the allocator can't find enough pages to satisfy the demands - it goes
back in and looks at PG_need_scrub pages and scrubs those.

In parallel, another thread (threads?) pick the ones that have PG_need_scrub
and change them to PG_scrubbing_now and scrubs them. When it has
completed it removes both flags.

> So do you think it's acceptable if fixed all other concerns about this
> patch?
> 
> And there is another idea based on the assumption that only guest
> allocation need all pages to be scrubbed(It's safe for xen hypervisor to
> use an un-scrubbed page):
> 1. set the need_scrub flag in free_heap_pages().
> 2. alloc pages from heap free list, don't scrub and neither clear the
> need_scrub flag.
> 3. When the guest accessing the real memory(machine pages) the first
> time, page fault should happen. We track this event and before build the

I think you mean that we can set the EPT (or shadow) all of those PG_need_scrub
pages to be r/o. And then when the guest access them we would trap in
the hypervisor and scrub them. However, that means the guest can
still read those pages without trapping - and that means we might
be reading sensitive data from pages that came from another guest!


> right PFN to MFN page table mappings(correct me if it doesn't work as
> this), at this place scrub the page if the need_scrub flag was set.
> 
> By this way only guest real used pages are needed scrubbing and the time
> is scattered.
> How do you think of this solution?
> 
> -- 
> Regards,
> -Bob