Re: [PATCH v2 3/3] xen: use idle vcpus to scrub pages

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Bob Liu <bob.liu@oracle.com>
To: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Bob Liu <lliubbo@gmail.com>,
	keir@xen.org, ian.campbell@citrix.com,
	George.Dunlap@eu.citrix.com, andrew.cooper3@citrix.com,
	Jan Beulich <JBeulich@suse.com>,
	xen-devel@lists.xenproject.org
Subject: Re: [PATCH v2 3/3] xen: use idle vcpus to scrub pages
Date: Wed, 23 Jul 2014 09:30:09 +0800	[thread overview]
Message-ID: <53CF1021.2090209@oracle.com> (raw)
In-Reply-To: <20140723003858.GB5022@laptop.dumpdata.com>


On 07/23/2014 08:38 AM, Konrad Rzeszutek Wilk wrote:
> On Tue, Jul 15, 2014 at 05:16:33PM +0800, Bob Liu wrote:
>> Hi Jan & Konrad,
>>
>> On 07/01/2014 08:59 PM, Jan Beulich wrote:
>>>>>> On 01.07.14 at 14:25, <bob.liu@oracle.com> wrote:
>>>> On 07/01/2014 05:12 PM, Jan Beulich wrote:
>>>>>>>> On 30.06.14 at 15:39, <lliubbo@gmail.com> wrote:
>>>>>> @@ -948,6 +954,7 @@ static void free_heap_pages(
>>>>>>      {
>>>>>>          if ( !tainted )
>>>>>>          {
>>>>>> +            node_need_scrub[node] = 1;
>>>>>>              for ( i = 0; i < (1 << order); i++ )
>>>>>>                  pg[i].count_info |= PGC_need_scrub;
>>>>>>          }
>>>>>
>>>>> Iirc it was more than this single place where you set
>>>>> PGC_need_scrub, and hence where you'd now need to set the
>>>>> other flag too.
>>>>>
>>>>
>>>> I'm afraid this is the only place where PGC_need_scrub was set.
>>>
>>> Ah, indeed - I misremembered others, they are all tests for the flag.
>>>
>>>> I'm sorry for all of the coding style problems.
>>>>
>>>> By the way is there any script which can be used to check the code
>>>> before submitting? Something like ./scripts/checkpatch.pl under linux.
>>>
>>> No, there isn't. But avoiding (or spotting) hard tabs should be easy
>>> enough, and other things you ought to simply inspect your patch for
>>> - after all that's no different from what reviewers do.
>>>
>>>>>> +    }
>>>>>> +
>>>>>> +    /* free percpu free list */
>>>>>> +    if ( !page_list_empty(local_free_list) )
>>>>>> +    {
>>>>>> +        spin_lock(&heap_lock);
>>>>>> +        page_list_for_each_safe( pg, tmp, local_free_list )
>>>>>> +        {
>>>>>> +            order = PFN_ORDER(pg);
>>>>>> +            page_list_del(pg, local_free_list);
>>>>>> +            for ( i = 0; i < (1 << order); i++ )
>>>>>> +	    {
>>>>>> +                pg[i].count_info |= PGC_state_free;
>>>>>> +                pg[i].count_info &= ~PGC_need_scrub;
>>>>>
>>>>> This needs to happen earlier - the scrub flag should be cleared right
>>>>> after scrubbing, and the free flag should imo be set when the page
>>>>> gets freed. That's for two reasons:
>>>>> 1) Hypervisor allocations don't need scrubbed pages, i.e. they can
>>>>> allocate memory regardless of the scrub flag's state.
>>>>
>>>> AFAIR, the reason I set those flags here is to avoid a panic happen.
>>>
>>> That's pretty vague a statement.
>>>
>>>>> 2) You still detain the memory on the local lists from allocation. On a
>>>>> many-node system, the 16Mb per node can certainly sum up (which
>>>>> is not to say that I don't view the 16Mb on a single node as already
>>>>> problematic).
>>>>
>>>> Right, but we can adjust SCRUB_BATCH_ORDER.
>>>> Anyway I'll take a retry as you suggested.
>>>
>>> You should really drop the idea of removing pages temporarily.
>>> All you need to do is make sure a page being allocated and getting
>>> simultaneously scrubbed by another CPU won't get passed to the
>>> caller until the scrubbing finished. In particular it's no problem if
>>> the allocating CPU occasionally ends up scrubbing a page already
>>> being scrubbed elsewhere.
>>>
>>
>> After so many days I haven't make a workable solution if don't remove
>> pages temporarily. The hardest part is iterating the heap free list
>> without holding heap_lock because if holding the lock it might be heavy
>> lock contention.
> 
> Why do you need to hold on the heap_lock?
> 
> I presume that the flag changes need an cmpxchg operation. In which case
> could you add another flag that would be 'PG_scrubbing_now` - and the
> allocator could bypass all of the pages that have said flag (and also
> PG_need_scrub).

That's what I'm struggling with these days.

> 
> If the allocator can't find enough pages to satisfy the demands - it goes
> back in and looks at PG_need_scrub pages and scrubs those.
> 
> In parallel, another thread (threads?) pick the ones that have PG_need_scrub
> and change them to PG_scrubbing_now and scrubs them. When it has
> completed it removes both flags.
> 

But there is still race condition:

A:alloc path                        B:idle scrub path

                                    page = list_entry(&heap_list);

^^^
Delist the page from heap_list
(Can happen any time)

                                    if (PG_need_scrub)
                                        set PG_scrubbing_now
                                    scrub and clear both flags
                                    page = page_list_next(page);
                                           ^^^ Get panic


Another problem Jan pointed out before is if order=20 and being scrubbed
in idle vcpus, alloc path will fail because of the PG_scrubbing_now
flag. This will introduce a long time we have large memory but can't
allocate any page.

>> So do you think it's acceptable if fixed all other concerns about this
>> patch?
>>
>> And there is another idea based on the assumption that only guest
>> allocation need all pages to be scrubbed(It's safe for xen hypervisor to
>> use an un-scrubbed page):
>> 1. set the need_scrub flag in free_heap_pages().
>> 2. alloc pages from heap free list, don't scrub and neither clear the
>> need_scrub flag.
>> 3. When the guest accessing the real memory(machine pages) the first
>> time, page fault should happen. We track this event and before build the
> 
> I think you mean that we can set the EPT (or shadow) all of those PG_need_scrub

No, I mean at the point when setting the EPT or shadow page table entry,
scrub the related machine page if it has the PG_need_scrub flag.

I think in Pod mode the place is in p2m_set_entry(), scrub the mfn
related page if PG_need_scrub before we set up the entry.

> pages to be r/o. And then when the guest access them we would trap in
> the hypervisor and scrub them. However, that means the guest can
> still read those pages without trapping - and that means we might
> be reading sensitive data from pages that came from another guest!
> 
> 
>> right PFN to MFN page table mappings(correct me if it doesn't work as
>> this), at this place scrub the page if the need_scrub flag was set.
>>
>> By this way only guest real used pages are needed scrubbing and the time
>> is scattered.
>> How do you think of this solution?

-- 
Regards,
-Bob

next prev parent reply	other threads:[~2014-07-23  1:30 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-06-30 13:39 [PATCH v2 1/3] xen: delay page scrubbing to allocation path Bob Liu
2014-06-30 13:39 ` [PATCH v2 2/3] xen: introduce function merge_free_trunks Bob Liu
2014-06-30 15:58   ` Jan Beulich
2014-07-01  8:14     ` Bob Liu
2014-07-01  8:27       ` Jan Beulich
2014-06-30 13:39 ` [PATCH v2 3/3] xen: use idle vcpus to scrub pages Bob Liu
2014-07-01  9:12   ` Jan Beulich
2014-07-01 12:25     ` Bob Liu
2014-07-01 12:59       ` Jan Beulich
2014-07-02  6:27         ` Bob Liu
2014-07-07 12:20           ` Bob Liu
2014-07-15  9:16         ` Bob Liu
2014-07-23  0:38           ` Konrad Rzeszutek Wilk
2014-07-23  1:30             ` Bob Liu [this message]
2014-07-23  7:28           ` Jan Beulich
2014-07-24  2:08             ` Bob Liu
2014-07-24  6:24               ` Jan Beulich
2014-07-25  0:42                 ` Bob Liu
2014-07-25  6:51                   ` Jan Beulich
2014-07-25  7:28                     ` Bob Liu
2014-07-25  7:36                       ` Jan Beulich
2014-07-25  8:18                         ` Bob Liu
2014-07-25  8:28                           ` Jan Beulich
2014-06-30 15:56 ` [PATCH v2 1/3] xen: delay page scrubbing to allocation path Jan Beulich
2014-07-01  8:12   ` Bob Liu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=53CF1021.2090209@oracle.com \
    --to=bob.liu@oracle.com \
    --cc=George.Dunlap@eu.citrix.com \
    --cc=JBeulich@suse.com \
    --cc=andrew.cooper3@citrix.com \
    --cc=ian.campbell@citrix.com \
    --cc=keir@xen.org \
    --cc=konrad.wilk@oracle.com \
    --cc=lliubbo@gmail.com \
    --cc=xen-devel@lists.xenproject.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.