Re: [PATCH v2 3/3] xen: use idle vcpus to scrub pages

xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed

From: Bob Liu <bob.liu@oracle.com>
To: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Bob Liu <lliubbo@gmail.com>,
	keir@xen.org, ian.campbell@citrix.com,
	George.Dunlap@eu.citrix.com, andrew.cooper3@citrix.com,
	Jan Beulich <JBeulich@suse.com>,
	xen-devel@lists.xenproject.org
Subject: Re: [PATCH v2 3/3] xen: use idle vcpus to scrub pages
Date: Wed, 23 Jul 2014 09:30:09 +0800	[thread overview]
Message-ID: <53CF1021.2090209@oracle.com> (raw)
In-Reply-To: <20140723003858.GB5022@laptop.dumpdata.com>


On 07/23/2014 08:38 AM, Konrad Rzeszutek Wilk wrote:
> On Tue, Jul 15, 2014 at 05:16:33PM +0800, Bob Liu wrote:
>> Hi Jan & Konrad,
>>
>> On 07/01/2014 08:59 PM, Jan Beulich wrote:
>>>>>> On 01.07.14 at 14:25, <bob.liu@oracle.com> wrote:
>>>> On 07/01/2014 05:12 PM, Jan Beulich wrote:
>>>>>>>> On 30.06.14 at 15:39, <lliubbo@gmail.com> wrote:
>>>>>> @@ -948,6 +954,7 @@ static void free_heap_pages(
>>>>>>      {
>>>>>>          if ( !tainted )
>>>>>>          {
>>>>>> +            node_need_scrub[node] = 1;
>>>>>>              for ( i = 0; i < (1 << order); i++ )
>>>>>>                  pg[i].count_info |= PGC_need_scrub;
>>>>>>          }
>>>>>
>>>>> Iirc it was more than this single place where you set
>>>>> PGC_need_scrub, and hence where you'd now need to set the
>>>>> other flag too.
>>>>>
>>>>
>>>> I'm afraid this is the only place where PGC_need_scrub was set.
>>>
>>> Ah, indeed - I misremembered others, they are all tests for the flag.
>>>
>>>> I'm sorry for all of the coding style problems.
>>>>
>>>> By the way is there any script which can be used to check the code
>>>> before submitting? Something like ./scripts/checkpatch.pl under linux.
>>>
>>> No, there isn't. But avoiding (or spotting) hard tabs should be easy
>>> enough, and other things you ought to simply inspect your patch for
>>> - after all that's no different from what reviewers do.
>>>
>>>>>> +    }
>>>>>> +
>>>>>> +    /* free percpu free list */
>>>>>> +    if ( !page_list_empty(local_free_list) )
>>>>>> +    {
>>>>>> +        spin_lock(&heap_lock);
>>>>>> +        page_list_for_each_safe( pg, tmp, local_free_list )
>>>>>> +        {
>>>>>> +            order = PFN_ORDER(pg);
>>>>>> +            page_list_del(pg, local_free_list);
>>>>>> +            for ( i = 0; i < (1 << order); i++ )
>>>>>> +	    {
>>>>>> +                pg[i].count_info |= PGC_state_free;
>>>>>> +                pg[i].count_info &= ~PGC_need_scrub;
>>>>>
>>>>> This needs to happen earlier - the scrub flag should be cleared right
>>>>> after scrubbing, and the free flag should imo be set when the page
>>>>> gets freed. That's for two reasons:
>>>>> 1) Hypervisor allocations don't need scrubbed pages, i.e. they can
>>>>> allocate memory regardless of the scrub flag's state.
>>>>
>>>> AFAIR, the reason I set those flags here is to avoid a panic happen.
>>>
>>> That's pretty vague a statement.
>>>
>>>>> 2) You still detain the memory on the local lists from allocation. On a
>>>>> many-node system, the 16Mb per node can certainly sum up (which
>>>>> is not to say that I don't view the 16Mb on a single node as already
>>>>> problematic).
>>>>
>>>> Right, but we can adjust SCRUB_BATCH_ORDER.
>>>> Anyway I'll take a retry as you suggested.
>>>
>>> You should really drop the idea of removing pages temporarily.
>>> All you need to do is make sure a page being allocated and getting
>>> simultaneously scrubbed by another CPU won't get passed to the
>>> caller until the scrubbing finished. In particular it's no problem if
>>> the allocating CPU occasionally ends up scrubbing a page already
>>> being scrubbed elsewhere.
>>>
>>
>> After so many days I haven't make a workable solution if don't remove
>> pages temporarily. The hardest part is iterating the heap free list
>> without holding heap_lock because if holding the lock it might be heavy
>> lock contention.
> 
> Why do you need to hold on the heap_lock?
> 
> I presume that the flag changes need an cmpxchg operation. In which case
> could you add another flag that would be 'PG_scrubbing_now` - and the
> allocator could bypass all of the pages that have said flag (and also
> PG_need_scrub).

That's what I'm struggling with these days.

> 
> If the allocator can't find enough pages to satisfy the demands - it goes
> back in and looks at PG_need_scrub pages and scrubs those.
> 
> In parallel, another thread (threads?) pick the ones that have PG_need_scrub
> and change them to PG_scrubbing_now and scrubs them. When it has
> completed it removes both flags.
> 

But there is still race condition:

A:alloc path                        B:idle scrub path

                                    page = list_entry(&heap_list);

^^^
Delist the page from heap_list
(Can happen any time)

                                    if (PG_need_scrub)
                                        set PG_scrubbing_now
                                    scrub and clear both flags
                                    page = page_list_next(page);
                                           ^^^ Get panic


Another problem Jan pointed out before is if order=20 and being scrubbed
in idle vcpus, alloc path will fail because of the PG_scrubbing_now
flag. This will introduce a long time we have large memory but can't
allocate any page.

>> So do you think it's acceptable if fixed all other concerns about this
>> patch?
>>
>> And there is another idea based on the assumption that only guest
>> allocation need all pages to be scrubbed(It's safe for xen hypervisor to
>> use an un-scrubbed page):
>> 1. set the need_scrub flag in free_heap_pages().
>> 2. alloc pages from heap free list, don't scrub and neither clear the
>> need_scrub flag.
>> 3. When the guest accessing the real memory(machine pages) the first
>> time, page fault should happen. We track this event and before build the
> 
> I think you mean that we can set the EPT (or shadow) all of those PG_need_scrub

No, I mean at the point when setting the EPT or shadow page table entry,
scrub the related machine page if it has the PG_need_scrub flag.

I think in Pod mode the place is in p2m_set_entry(), scrub the mfn
related page if PG_need_scrub before we set up the entry.

> pages to be r/o. And then when the guest access them we would trap in
> the hypervisor and scrub them. However, that means the guest can
> still read those pages without trapping - and that means we might
> be reading sensitive data from pages that came from another guest!
> 
> 
>> right PFN to MFN page table mappings(correct me if it doesn't work as
>> this), at this place scrub the page if the need_scrub flag was set.
>>
>> By this way only guest real used pages are needed scrubbing and the time
>> is scattered.
>> How do you think of this solution?

-- 
Regards,
-Bob

next prev parent reply	other threads:[~2014-07-23  1:30 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-06-30 13:39 [PATCH v2 1/3] xen: delay page scrubbing to allocation path Bob Liu
2014-06-30 13:39 ` [PATCH v2 2/3] xen: introduce function merge_free_trunks Bob Liu
2014-06-30 15:58   ` Jan Beulich
2014-07-01  8:14     ` Bob Liu
2014-07-01  8:27       ` Jan Beulich
2014-06-30 13:39 ` [PATCH v2 3/3] xen: use idle vcpus to scrub pages Bob Liu
2014-07-01  9:12   ` Jan Beulich
2014-07-01 12:25     ` Bob Liu
2014-07-01 12:59       ` Jan Beulich
2014-07-02  6:27         ` Bob Liu
2014-07-07 12:20           ` Bob Liu
2014-07-15  9:16         ` Bob Liu
2014-07-23  0:38           ` Konrad Rzeszutek Wilk
2014-07-23  1:30             ` Bob Liu [this message]
2014-07-23  7:28           ` Jan Beulich
2014-07-24  2:08             ` Bob Liu
2014-07-24  6:24               ` Jan Beulich
2014-07-25  0:42                 ` Bob Liu
2014-07-25  6:51                   ` Jan Beulich
2014-07-25  7:28                     ` Bob Liu
2014-07-25  7:36                       ` Jan Beulich
2014-07-25  8:18                         ` Bob Liu
2014-07-25  8:28                           ` Jan Beulich
2014-06-30 15:56 ` [PATCH v2 1/3] xen: delay page scrubbing to allocation path Jan Beulich
2014-07-01  8:12   ` Bob Liu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=53CF1021.2090209@oracle.com \
    --to=bob.liu@oracle.com \
    --cc=George.Dunlap@eu.citrix.com \
    --cc=JBeulich@suse.com \
    --cc=andrew.cooper3@citrix.com \
    --cc=ian.campbell@citrix.com \
    --cc=keir@xen.org \
    --cc=konrad.wilk@oracle.com \
    --cc=lliubbo@gmail.com \
    --cc=xen-devel@lists.xenproject.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).