From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1757682AbaHGOer (ORCPT );
	Thu, 7 Aug 2014 10:34:47 -0400
Received: from cantor2.suse.de ([195.135.220.15]:37045 "EHLO mx2.suse.de"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1757488AbaHGOeq (ORCPT );
	Thu, 7 Aug 2014 10:34:46 -0400
Message-ID: <53E38E81.3030301@suse.cz>
Date: Thu, 07 Aug 2014 16:34:41 +0200
From: Vlastimil Babka
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.6.0
MIME-Version: 1.0
To: Joonsoo Kim, Andrew Morton
CC: "Kirill A. Shutemov", Rik van Riel, Mel Gorman, Johannes Weiner,
	Minchan Kim, Yasuaki Ishimatsu, Zhang Yanfei, "Srivatsa S. Bhat",
	Tang Chen, Naoya Horiguchi, Bartlomiej Zolnierkiewicz, Wen Congyang,
	Marek Szyprowski, Michal Nazarewicz, Laura Abbott, Heesub Shin,
	"Aneesh Kumar K.V", Ritesh Harjani, t.stanislaws@samsung.com,
	Gioh Kim, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2 4/8] mm/isolation: close the two race problems related
	to pageblock isolation
References: <1407309517-3270-1-git-send-email-iamjoonsoo.kim@lge.com>
	<1407309517-3270-8-git-send-email-iamjoonsoo.kim@lge.com>
In-Reply-To: <1407309517-3270-8-git-send-email-iamjoonsoo.kim@lge.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On 08/06/2014 09:18 AM, Joonsoo Kim wrote:
> We got migratetype of the freeing page without holding the zone lock so
> it could be racy. There are two cases of this race.
>
> 1. pages are added to isolate buddy list after restoring original
> migratetype.
> 2. pages are added to normal buddy list while pageblock is isolated.
>
> If case 1 happens, we can't allocate freepages on isolate buddy list
> until next pageblock isolation occurs.
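Case 2 above is easiest to see as a timeline. Below is a minimal user-space model of that timeline (all names are hypothetical illustrations, not the kernel implementation): the free path samples the pageblock migratetype without the zone lock, the isolation path then flips the pageblock to MIGRATE_ISOLATE and moves its existing free pages, and the free path finally queues the page on the list chosen from its stale sample.

```c
#include <assert.h>

/* Hypothetical user-space model of race case 2; illustrative only. */
enum mt { MT_MOVABLE, MT_ISOLATE };

struct pageblock_model {
	enum mt migratetype;    /* pageblock's current migratetype */
	int normal_list_pages;  /* pages on the normal buddy list */
	int isolate_list_pages; /* pages on the isolate buddy list */
};

/* Free path, step 1: sample the migratetype without any lock. */
static enum mt free_path_sample(struct pageblock_model *pb)
{
	return pb->migratetype;
}

/* Isolation path: runs in the race window, after the sample above. */
static void isolate_pageblock(struct pageblock_model *pb)
{
	pb->migratetype = MT_ISOLATE;
	/* like move_freepages_block(): normal list drained to isolate list */
	pb->isolate_list_pages += pb->normal_list_pages;
	pb->normal_list_pages = 0;
}

/* Free path, step 2: queue the page using the stale sample. */
static void free_path_queue(struct pageblock_model *pb, enum mt stale)
{
	if (stale == MT_ISOLATE)
		pb->isolate_list_pages++;
	else
		pb->normal_list_pages++;
}

/* Returns 1 when a page ended up on the normal buddy list of an
 * isolated pageblock, i.e. the misplacement described above. */
static int demo_race(void)
{
	struct pageblock_model pb = { MT_MOVABLE, 0, 0 };
	enum mt stale = free_path_sample(&pb); /* reads MT_MOVABLE */

	isolate_pageblock(&pb);                /* now MT_ISOLATE */
	free_path_queue(&pb, stale);           /* queued from stale sample */

	return pb.migratetype == MT_ISOLATE && pb.normal_list_pages > 0;
}
```

demo_race() returning 1 corresponds to a page sitting on the normal buddy list while its pageblock is isolated, which is exactly the misplacement the patch closes.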
> In case of 2, pages could be merged with pages on isolate buddy list and
> located on normal buddy list. This makes freepage counting incorrect
> and break the property of pageblock isolation.
>
> One solution to this problem is checking pageblock migratetype with
> holding zone lock in __free_one_page() and I posted it before, but,
> it didn't get welcome since it needs the hook in zone lock critical
> section on freepath.
>
> This is another solution to this problem and impose most overhead on
> pageblock isolation logic. Following is how this solution works.
>
> 1. Extends irq disabled period on freepath to call
> get_pfnblock_migratetype() with irq disabled. With this, we can be
> sure that future freed pages will see modified pageblock migratetype
> after certain synchronization point so we don't need to hold the zone
> lock to get correct pageblock migratetype. Although it extends irq
> disabled period on freepath, I guess it is marginal and better than
> adding the hook in zone lock critical section.
>
> 2. #1 requires IPI for synchronization and we can't hold the zone lock

It would be better to explain here that the synchronization point is
pcplists draining.

> during processing IPI. In this time, some pages could be moved from buddy
> list to pcp list on page allocation path and later it could be moved again
> from pcp list to buddy list. In this time, this page would be on isolate

It is difficult to understand the problem just by reading this. I guess
the timelines you included while explaining the problem to me would help
here :)

> pageblock, so, the hook is required on free_pcppages_bulk() to prevent

More clearly, a recheck for pageblock's migratetype would be needed in
free_pcppages_bulk(), which would again impose overhead outside isolation.

> misplacement. To remove this possibility, disabling and draining pcp
> list is needed during isolation.
> It guarantees that there is no page on pcp
> list on all cpus while isolation, so misplacement problem can't happen.
>
> Note that this doesn't fix freepage counting problem. To fix it,
> we need more logic. Following patches will do it.
>
> Signed-off-by: Joonsoo Kim
> ---
>  mm/internal.h       |    2 ++
>  mm/page_alloc.c     |   27 ++++++++++++++++++++-------
>  mm/page_isolation.c |   45 +++++++++++++++++++++++++++++++++------------
>  3 files changed, 55 insertions(+), 19 deletions(-)
>
> diff --git a/mm/internal.h b/mm/internal.h
> index a1b651b..81b8884 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -108,6 +108,8 @@ extern pmd_t *mm_find_pmd(struct mm_struct *mm, unsigned long address);
>  /*
>   * in mm/page_alloc.c
>   */
> +extern void zone_pcp_disable(struct zone *zone);
> +extern void zone_pcp_enable(struct zone *zone);
>  extern void __free_pages_bootmem(struct page *page, unsigned int order);
>  extern void prep_compound_page(struct page *page, unsigned long order);
>  #ifdef CONFIG_MEMORY_FAILURE
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 3e1e344..4517b1d 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -726,11 +726,7 @@ static void free_pcppages_bulk(struct zone *zone, int count,
>  			/* MIGRATE_MOVABLE list may include MIGRATE_RESERVEs */
>  			__free_one_page(page, page_to_pfn(page), zone, 0, mt);
>  			trace_mm_page_pcpu_drain(page, 0, mt);
> -			if (likely(!is_migrate_isolate_page(page))) {
> -				__mod_zone_page_state(zone, NR_FREE_PAGES, 1);
> -				if (is_migrate_cma(mt))
> -					__mod_zone_page_state(zone, NR_FREE_CMA_PAGES, 1);
> -			}
> +			__mod_zone_freepage_state(zone, 1, mt);

Could be worth mentioning that this can now be removed as it was an
incomplete attempt to fix freepage counting, but didn't address the
misplacement.
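To illustrate what that hunk changes, here is a hedged user-space sketch of the two accounting variants (names are illustrative; this assumes __mod_zone_freepage_state() updates NR_FREE_PAGES unconditionally, as its vmstat.h definition did at the time): the removed code skipped counting pages from isolated pageblocks, while the replacement counts them too, which is why the counting fix is deferred to the later patches.

```c
#include <assert.h>

/* User-space sketch of the accounting change; simplified, not kernel code. */
enum mt2 { MT2_MOVABLE, MT2_CMA, MT2_ISOLATE };

struct counters {
	long nr_free;     /* models NR_FREE_PAGES */
	long nr_free_cma; /* models NR_FREE_CMA_PAGES */
};

/* Removed hunk: only count the page when its pageblock is not isolated
 * (the is_migrate_isolate_page() check). */
static void old_accounting(struct counters *c, enum mt2 mt,
			   int isolated_pageblock)
{
	if (!isolated_pageblock) {
		c->nr_free++;
		if (mt == MT2_CMA)
			c->nr_free_cma++;
	}
}

/* Replacement: __mod_zone_freepage_state()-style unconditional counting;
 * isolated pages are counted too, so counting is still imperfect here. */
static void new_accounting(struct counters *c, enum mt2 mt)
{
	c->nr_free++;
	if (mt == MT2_CMA)
		c->nr_free_cma++;
}
```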
> 		} while (--to_free && --batch_free && !list_empty(list));
> 	}
> 	spin_unlock(&zone->lock);
> @@ -789,8 +785,8 @@ static void __free_pages_ok(struct page *page, unsigned int order)
> 	if (!free_pages_prepare(page, order))
> 		return;
>
> -	migratetype = get_pfnblock_migratetype(page, pfn);
> 	local_irq_save(flags);
> +	migratetype = get_pfnblock_migratetype(page, pfn);
> 	__count_vm_events(PGFREE, 1 << order);
> 	set_freepage_migratetype(page, migratetype);
> 	free_one_page(page_zone(page), page, pfn, order, migratetype);
> @@ -1410,9 +1406,9 @@ void free_hot_cold_page(struct page *page, bool cold)
> 	if (!free_pages_prepare(page, 0))
> 		return;
>
> +	local_irq_save(flags);
> 	migratetype = get_pfnblock_migratetype(page, pfn);
> 	set_freepage_migratetype(page, migratetype);
> -	local_irq_save(flags);
> 	__count_vm_event(PGFREE);

Maybe add comments to these two to make it clear that this cannot be
moved outside of the irq disabled part, in case anyone considers it
(again) in the future?

> @@ -55,20 +56,32 @@ int set_migratetype_isolate(struct page *page, bool skip_hwpoisoned_pages)
> 	 */
>
> out:
> -	if (!ret) {
> -		unsigned long nr_pages;
> -		int migratetype = get_pageblock_migratetype(page);
> +	if (ret) {
> +		spin_unlock_irqrestore(&zone->lock, flags);
> +		return ret;
> +	}
> on pcplists
> -	set_pageblock_migratetype(page, MIGRATE_ISOLATE);
> -	nr_pages = move_freepages_block(zone, page, MIGRATE_ISOLATE);
> +	migratetype = get_pageblock_migratetype(page);
> +	set_pageblock_migratetype(page, MIGRATE_ISOLATE);
> +	spin_unlock_irqrestore(&zone->lock, flags);
>
> -	__mod_zone_freepage_state(zone, -nr_pages, migratetype);
> -	}
> +	zone_pcp_disable(zone);
> +
> +	/*
> +	 * After this point, freed pages will see MIGRATE_ISOLATE as
> +	 * their pageblock migratetype on all cpus. And pcp list has
> +	 * no free page.
> +	 */
> +	on_each_cpu(drain_local_pages, NULL, 1);

Is there any difference between drain_all_pages() and this, or why
didn't you use drain_all_pages()?
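For the record, the ordering this hunk relies on can be sketched as a user-space model (hypothetical names; per-cpu work reduced to a loop): the pageblock is marked isolated first, then pcplists are disabled and every cpu's pcp list is drained, so any page flushed by the drain already sees MIGRATE_ISOLATE and nothing stale can reach a normal buddy list afterwards.

```c
#include <assert.h>

/* Sketch of the patch's isolation sequence against a per-cpu pcp model.
 * Names and structure are illustrative only, not the kernel code. */
#define NCPU 4

struct zone_model {
	int pcp_disabled;
	int pcp_count[NCPU]; /* pages cached on each cpu's pcp list */
	int buddy_misplaced; /* pages flushed to a wrong buddy list */
};

/* drain_local_pages() stand-in: flush one cpu's pcp list to buddy.
 * Before the pageblock is marked isolated, flushed pages would be
 * misplaced; afterwards the free path sees MIGRATE_ISOLATE. */
static void drain_cpu(struct zone_model *z, int cpu, int pageblock_isolated)
{
	if (!pageblock_isolated)
		z->buddy_misplaced += z->pcp_count[cpu];
	z->pcp_count[cpu] = 0;
}

/* The patch's ordering: mark isolated first, then disable pcp and
 * drain every cpu. Returns 0 when no page was misplaced and all pcp
 * lists ended up empty, -1 if a pcp list still held a page. */
static int set_migratetype_isolate_model(struct zone_model *z)
{
	int cpu;
	int pageblock_isolated = 1; /* set_pageblock_migratetype(ISOLATE) */

	z->pcp_disabled = 1;        /* zone_pcp_disable() */
	for (cpu = 0; cpu < NCPU; cpu++) /* on_each_cpu(drain_local_pages) */
		drain_cpu(z, cpu, pageblock_isolated);

	/* post-condition from the description: no page on any pcp list */
	for (cpu = 0; cpu < NCPU; cpu++)
		if (z->pcp_count[cpu])
			return -1;
	return z->buddy_misplaced;
}
```

Flipping the order in the model (draining before marking isolated) makes buddy_misplaced nonzero, which is the window the recheck-in-free_pcppages_bulk() alternative would otherwise have to cover.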