From mboxrd@z Thu Jan 1 00:00:00 1970
From: Bob Liu
Subject: Re: [PATCH v2 3/3] xen: use idle vcpus to scrub pages
Date: Mon, 07 Jul 2014 20:20:48 +0800
Message-ID: <53BA90A0.8050907@oracle.com>
References: <1404135584-29206-1-git-send-email-bob.liu@oracle.com>
 <1404135584-29206-3-git-send-email-bob.liu@oracle.com>
 <53B2979C020000780001EE97@mail.emea.novell.com>
 <53B2A8C7.9040601@oracle.com>
 <53B2CCD1020000780001F027@mail.emea.novell.com>
 <53B3A664.7070401@oracle.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Received: from mail6.bemta5.messagelabs.com ([195.245.231.135])
 by lists.xen.org with esmtp (Exim 4.72) (envelope-from )
 id 1X47uv-0000bU-F9 for xen-devel@lists.xenproject.org;
 Mon, 07 Jul 2014 12:21:09 +0000
In-Reply-To: <53B3A664.7070401@oracle.com>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Jan Beulich
Cc: Bob Liu, keir@xen.org, ian.campbell@citrix.com,
 George.Dunlap@eu.citrix.com, andrew.cooper3@citrix.com,
 xen-devel@lists.xenproject.org
List-Id: xen-devel@lists.xenproject.org

On 07/02/2014 02:27 PM, Bob Liu wrote:
>
> On 07/01/2014 08:59 PM, Jan Beulich wrote:
>>>>> On 01.07.14 at 14:25, wrote:
>>> On 07/01/2014 05:12 PM, Jan Beulich wrote:
>>>>>>> On 30.06.14 at 15:39, wrote:
>>>>> @@ -948,6 +954,7 @@ static void free_heap_pages(
>>>>>      {
>>>>>          if ( !tainted )
>>>>>          {
>>>>> +            node_need_scrub[node] = 1;
>>>>>              for ( i = 0; i < (1 << order); i++ )
>>>>>                  pg[i].count_info |= PGC_need_scrub;
>>>>>          }
>>>>
>>>> Iirc it was more than this single place where you set
>>>> PGC_need_scrub, and hence where you'd now need to set the
>>>> other flag too.
>>>>
>>>
>>> I'm afraid this is the only place where PGC_need_scrub was set.
>>
>> Ah, indeed - I misremembered others, they are all tests for the flag.
>>
>>> I'm sorry for all of the coding style problems.
>>>
>>> By the way is there any script which can be used to check the code
>>> before submitting? Something like ./scripts/checkpatch.pl under linux.
>>
>> No, there isn't. But avoiding (or spotting) hard tabs should be easy
>> enough, and other things you ought to simply inspect your patch for
>> - after all that's no different from what reviewers do.
>>
>>>>> +    }
>>>>> +
>>>>> +    /* free percpu free list */
>>>>> +    if ( !page_list_empty(local_free_list) )
>>>>> +    {
>>>>> +        spin_lock(&heap_lock);
>>>>> +        page_list_for_each_safe( pg, tmp, local_free_list )
>>>>> +        {
>>>>> +            order = PFN_ORDER(pg);
>>>>> +            page_list_del(pg, local_free_list);
>>>>> +            for ( i = 0; i < (1 << order); i++ )
>>>>> +            {
>>>>> +                pg[i].count_info |= PGC_state_free;
>>>>> +                pg[i].count_info &= ~PGC_need_scrub;
>>>>
>>>> This needs to happen earlier - the scrub flag should be cleared right
>>>> after scrubbing, and the free flag should imo be set when the page
>>>> gets freed. That's for two reasons:
>>>> 1) Hypervisor allocations don't need scrubbed pages, i.e. they can
>>>>    allocate memory regardless of the scrub flag's state.
>>>
>>> AFAIR, the reason I set those flags here is to avoid a panic happen.
>>
>> That's pretty vague a statement.
>>
>>>> 2) You still detain the memory on the local lists from allocation. On a
>>>>    many-node system, the 16Mb per node can certainly sum up (which
>>>>    is not to say that I don't view the 16Mb on a single node as already
>>>>    problematic).
>>>
>>> Right, but we can adjust SCRUB_BATCH_ORDER.
>>> Anyway I'll take a retry as you suggested.
>>
>> You should really drop the idea of removing pages temporarily.
>> All you need to do is make sure a page being allocated and getting
>> simultaneously scrubbed by another CPU won't get passed to the
>> caller until the scrubbing finished. In particular it's no problem if
>> the allocating CPU occasionally ends up scrubbing a page already
>> being scrubbed elsewhere.
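
For illustration, the requirement in the last quoted paragraph could be
sketched roughly as below. This is only a standalone C11 sketch under my
own assumptions, not Xen code: struct demo_page, the PG_* states,
demo_scrub_if_needed() and demo_prepare_for_alloc() are hypothetical
names, and C11 atomics stand in for the hypervisor's own bit operations.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <string.h>

#define PAGE_BYTES 4096

/* Hypothetical per-page scrub states; these are NOT Xen's PGC_* flags. */
enum { PG_CLEAN = 0, PG_NEED_SCRUB = 1, PG_SCRUBBING = 2 };

struct demo_page {
    atomic_int scrub_state;
    unsigned char data[PAGE_BYTES];
};

/*
 * Scrubber side: claim a dirty page, zero it, then publish it as clean.
 * Returns true only if this caller performed the scrub.
 */
static bool demo_scrub_if_needed(struct demo_page *pg)
{
    int expected = PG_NEED_SCRUB;

    /* Only one CPU wins the NEED_SCRUB -> SCRUBBING transition. */
    if ( !atomic_compare_exchange_strong(&pg->scrub_state, &expected,
                                         PG_SCRUBBING) )
        return false;

    memset(pg->data, 0, PAGE_BYTES);
    atomic_store(&pg->scrub_state, PG_CLEAN);
    return true;
}

/* Allocator side: the page must be clean before the caller sees it. */
static void demo_prepare_for_alloc(struct demo_page *pg)
{
    /* Scrub it ourselves if no other CPU has started yet. */
    demo_scrub_if_needed(pg);

    /* If another CPU is in the middle of scrubbing, wait for it. */
    while ( atomic_load(&pg->scrub_state) != PG_CLEAN )
        ;
}
```

In this sketch the allocating side waits when another CPU is already
scrubbing the page; scrubbing it a second time instead would equally
satisfy the requirement quoted above.
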
>>
>
> Yes, I also like to drop percpu lists which can make things simper. But
> I'm afraid which also means I can't use any spinlock(&heap_lock) any
> more because of potential heavy lock contentions. I'm not sure whether
> things can work fine without heap_lock.
>

In my attempt to get rid of heap_lock, a panic occurred while iterating
the heap free list. My implementation looks like this:

scrub_free_pages()
{
    for ( zone = 0; zone < NR_ZONES; zone++ )
    {
        for ( order = MAX_ORDER; order >= 0; order-- )
        {
            /* Walks the free list WITHOUT holding heap_lock. */
            page_list_for_each_safe( pg, tmp, &heap(node, zone, order) )
            {
                /* Skip chunks whose head page is already clean. */
                if ( !test_bit(_PGC_need_scrub, &(pg->count_info)) )
                    continue;

                for ( i = 0; i < (1 << order); i++ )
                {
                    /* Test each page of the chunk, not only the head. */
                    if ( test_bit(_PGC_need_scrub, &pg[i].count_info) )
                    {
                        scrub_one_page(&pg[i]);
                        clear_bit(_PGC_need_scrub, &pg[i].count_info);
                    }
                }
            }
        }
    }
}

The panic was in page_list_next(), presumably because other CPUs can add
or remove entries while the list is being walked without the lock. I
haven't found a good way to iterate the free list without holding
heap_lock, but holding the lock may cause heavy contention, which is why
I had to move pages temporarily from the heap free list to a percpu
list.

-- 
Regards,
-Bob