From: Minchan Kim
To: Mel Gorman
Cc: Andrew Morton, linux-mm, LKML, Christoph Lameter, Johannes Weiner,
 KAMEZAWA Hiroyuki, KOSAKI Motohiro, Rik van Riel, Andrea Arcangeli
Date: Sat, 30 Apr 2011 00:47:03 +0900
Subject: Re: [RFC 6/8] In order putback lru core
In-Reply-To: <20110428110623.GU4658@suse.de>
References: <51e7412097fa62f86656c77c1934e3eb96d5eef6.1303833417.git.minchan.kim@gmail.com>
 <20110428110623.GU4658@suse.de>

On Thu, Apr 28, 2011 at 8:06 PM, Mel Gorman wrote:
> On Wed, Apr 27, 2011 at 01:25:23AM +0900, Minchan Kim wrote:
>> This patch defines new APIs to put a page back into its previous position in the LRU.
>> The idea is simple.
>>
>> When we try to put a page back into the LRU list and its friends (prev, next)
>> are still nearest neighbours, we can insert the isolated page after prev instead of
>> at the head of the LRU list. So it keeps the LRU history without losing the LRU
>> information.
>>
>> Before:
>>       LRU POV : H - P1 - P2 - P3 - P4 - T
>>
>> Isolate P3:
>>       LRU POV : H - P1 - P2 - P4 - T
>>
>> Putback P3:
>>       if (P2->next == P4)
>>               putback(P3, P2);
>>       So,
>>       LRU POV : H - P1 - P2 - P3 - P4 - T
>>
>> To implement this, we define a new structure, pages_lru, which remembers
>> both LRU friend pages of the isolated one, plus the handling functions.
>>
>> But this approach has a problem with contiguous pages.
>> In that case my idea cannot work, since the friend pages are isolated too.
>> It means prev_page->next == next_page is always false, and both pages are no
>> longer on the LRU at that time. Rik pointed this out at the LSF/MM summit.
>> To solve the problem, I can change the idea:
>> I think we don't need the relation to both friends (prev, next); it is enough to
>> consider either the prev or the next page, as long as it is still on the same LRU.
>> In the worst case of this approach, the prev or next page is freed and newly
>> allocated, so it sits at the head of the LRU and our isolated page ends up right
>> after the head. But that is almost the same situation as the current problem, so
>> it doesn't make things worse than now, and it should be rare. In this version,
>> however, I implemented the idea discussed at LSF/MM. If the new idea makes
>> sense, I will change it.
>>
>> Any comment?
>>
>> Cc: KOSAKI Motohiro
>> Cc: Mel Gorman
>> Cc: Rik van Riel
>> Cc: Andrea Arcangeli
>> Signed-off-by: Minchan Kim
>> ---
>>  include/linux/migrate.h  |    2 +
>>  include/linux/mm_types.h |    7 ++++
>>  include/linux/swap.h     |    4 ++-
>>  mm/compaction.c          |    3 +-
>>  mm/internal.h            |    2 +
>>  mm/memcontrol.c          |    2 +-
>>  mm/migrate.c             |   36 +++++++++++++++++++++
>>  mm/swap.c                |    2 +-
>>  mm/vmscan.c              |   79 +++++++++++++++++++++++++++++++++++++++++++--
>>  9 files changed, 129 insertions(+), 8 deletions(-)
>>
>> diff --git a/include/linux/migrate.h b/include/linux/migrate.h
>> index e39aeec..3aa5ab6 100644
>> --- a/include/linux/migrate.h
>> +++ b/include/linux/migrate.h
>> @@ -9,6 +9,7 @@ typedef struct page *new_page_t(struct page *, unsigned long private, int **);
>>  #ifdef CONFIG_MIGRATION
>>  #define PAGE_MIGRATION 1
>>
>> +extern void putback_pages_lru(struct list_head *l);
>>  extern void putback_lru_pages(struct list_head *l);
>>  extern int migrate_page(struct address_space *,
>>                          struct page *, struct page *);
>> @@ -33,6 +34,7 @@ extern int migrate_huge_page_move_mapping(struct address_space *mapping,
>>  #else
>>  #define PAGE_MIGRATION 0
>>
>> +static inline void putback_pages_lru(struct list_head *l) {}
>>  static inline void putback_lru_pages(struct list_head *l) {}
>>  static inline int migrate_pages(struct list_head *l, new_page_t x,
>>                  unsigned long private, bool offlining,
>> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
>> index ca01ab2..35e80fb 100644
>> --- a/include/linux/mm_types.h
>> +++ b/include/linux/mm_types.h
>> @@ -102,6 +102,13 @@ struct page {
>>  #endif
>>  };
>>
>> +/* This structure is used for keeping the LRU ordering of an isolated page */
>> +struct pages_lru {
>> +        struct page *page;      /* isolated page */
>> +        struct page *prev_page; /* previous page of isolated page in LRU order */
>> +        struct page *next_page; /* next page of isolated page in LRU order */
>> +        struct list_head lru;
>> +};
>>  /*
>
> So this thing has to be allocated from somewhere. We can't put it
> on the stack as we're already in danger there so we must be using
> kmalloc. In the reclaim paths, this should be avoided obviously.
> For compaction, we might hurt the compaction success rates if pages
> are pinned with control structures. It's something to be wary of.

Actually, the 32-page migration unit in compaction should not be a big problem.
Maybe just one page is enough to hold the pages_lru arrays, but I admit that dynamic
allocation in the reclaim path and pinning pages in the compaction path aren't good.
So I have thought about a pagevec-like approach that is statically allocated; a rough
sketch follows below. But that approach has to assume the migration unit stays small,
like SWAP_CLUSTER_MAX. That is true now, but I don't really want to depend on the size.
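Something like this is what I have in mind. It is only an illustrative sketch, not part
of the patch: the names (pages_lru_vec, PAGES_LRU_NR, pages_lru_vec_get) are made up
here, and it assumes the migration unit never exceeds SWAP_CLUSTER_MAX entries.

/*
 * Sketch of a pagevec-like, statically sized carrier for pages_lru
 * entries, so the reclaim/compaction paths need no kmalloc.
 * Hypothetical names; assumes at most SWAP_CLUSTER_MAX isolated pages.
 */
#define PAGES_LRU_NR	SWAP_CLUSTER_MAX	/* 32 today */

struct pages_lru_vec {
	unsigned long nr;			/* slots already used */
	struct pages_lru entries[PAGES_LRU_NR];
};

/* hand out the next free slot, or NULL when the batch is full */
static inline struct pages_lru *pages_lru_vec_get(struct pages_lru_vec *vec)
{
	if (vec->nr >= PAGES_LRU_NR)
		return NULL;
	return &vec->entries[vec->nr++];
}

The caller would keep one pages_lru_vec per isolation batch (in a single page or a
per-CPU buffer rather than on the stack) and pass &vec->entries[i] to
__isolate_lru_page() instead of a kmalloc'ed pages_lru.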
But to be free of the size limitation, we would have to allocate new pages dynamically.
Hmm, any ideas?
Personally, I don't have any idea other than the pagevec-like approach of 32 pages.

>
> At LSF/MM, I stated a preference for swapping the source and
> destination pages in the LRU. This unfortunately means that the LRU

I can't understand your point. Swapping source and destination?
Could you elaborate on it?

> now contains a page in the process of being migrated to and the backout
> paths for migration failure become a lot more complex. Reclaim should
> be ok as it should fail to lock the page and recycle it in the list.
> This avoids allocations but I freely admit that I'm not in the position
> to implement such a thing right now :(
>
>>  * A region containing a mapping of a non-memory backed file under NOMMU
>>  * conditions.  These are held in a global tree and are pinned by the VMAs that
>> diff --git a/include/linux/swap.h b/include/linux/swap.h
>> index baef4ad..4ad0a0c 100644
>> --- a/include/linux/swap.h
>> +++ b/include/linux/swap.h
>> @@ -227,6 +227,8 @@ extern void rotate_reclaimable_page(struct page *page);
>>  extern void deactivate_page(struct page *page);
>>  extern void swap_setup(void);
>>
>> +extern void update_page_reclaim_stat(struct zone *zone, struct page *page,
>> +                                     int file, int rotated);
>>  extern void add_page_to_unevictable_list(struct page *page);
>>
>>  /**
>> @@ -260,7 +262,7 @@ extern unsigned long mem_cgroup_shrink_node_zone(struct mem_cgroup *mem,
>>                                                  struct zone *zone,
>>                                                  unsigned long *nr_scanned);
>>  extern int __isolate_lru_page(struct page *page, int mode, int file,
>> -                              int not_dirty, int not_mapped);
>> +                int not_dirty, int not_mapped, struct pages_lru *pages_lru);
>>  extern unsigned long shrink_all_memory(unsigned long nr_pages);
>>  extern int vm_swappiness;
>>  extern int remove_mapping(struct address_space *mapping, struct page *page);
>> diff --git a/mm/compaction.c b/mm/compaction.c
>> index 653b02b..c453000 100644
>> --- a/mm/compaction.c
>> +++ b/mm/compaction.c
>> @@ -335,7 +335,8 @@ static unsigned long isolate_migratepages(struct zone *zone,
>>               }
>>
>>               /* Try isolate the page */
>> -             if (__isolate_lru_page(page, ISOLATE_BOTH, 0, !cc->sync, 0) != 0)
>> +             if (__isolate_lru_page(page, ISOLATE_BOTH, 0,
>> +                                    !cc->sync, 0, NULL) != 0)
>>                       continue;
>>
>>               VM_BUG_ON(PageTransCompound(page));
>> diff --git a/mm/internal.h b/mm/internal.h
>> index d071d38..3c8182c 100644
>> --- a/mm/internal.h
>> +++ b/mm/internal.h
>> @@ -43,6 +43,8 @@ extern unsigned long highest_memmap_pfn;
>>   * in mm/vmscan.c:
>>   */
>>  extern int isolate_lru_page(struct page *page);
>> +extern bool keep_lru_order(struct pages_lru *pages_lru);
>> +extern void putback_page_to_lru(struct page *page, struct list_head *head);
>>  extern void putback_lru_page(struct page *page);
>>
>>  /*
>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>> index 471e7fd..92a9046 100644
>> --- a/mm/memcontrol.c
>> +++ b/mm/memcontrol.c
>> @@ -1193,7 +1193,7 @@ unsigned long mem_cgroup_isolate_pages(unsigned long nr_to_scan,
>>                       continue;
>>
>>               scan++;
>> -             ret = __isolate_lru_page(page, mode, file, 0, 0);
>> +             ret = __isolate_lru_page(page, mode, file, 0, 0, NULL);
>>               switch (ret) {
>>               case 0:
>>                       list_move(&page->lru, dst);
>> diff --git a/mm/migrate.c b/mm/migrate.c
>> index 819d233..9cfb63b 100644
>> --- a/mm/migrate.c
>> +++ b/mm/migrate.c
>> @@ -85,6 +85,42 @@ void putback_lru_pages(struct list_head *l)
>>  }
>>
>>  /*
>> + * This function is almost the same as putback_lru_pages.
>> + * The difference is that it receives a list of struct pages_lru
>> + * and, if possible, adds pages back into their original position
>> + * in the LRU instead of at the LRU's head.
>> + */
>> +void putback_pages_lru(struct list_head *l)
>> +{
>> +        struct pages_lru *isolated_page;
>> +        struct pages_lru *isolated_page2;
>> +        struct page *page;
>> +
>> +        list_for_each_entry_safe(isolated_page, isolated_page2, l, lru) {
>> +                struct zone *zone;
>> +                page = isolated_page->page;
>> +                list_del(&isolated_page->lru);
>> +
>> +                dec_zone_page_state(page, NR_ISOLATED_ANON +
>> +                                page_is_file_cache(page));
>> +
>> +                zone = page_zone(page);
>> +                spin_lock_irq(&zone->lru_lock);
>> +                if (keep_lru_order(isolated_page)) {
>> +                        putback_page_to_lru(page, &isolated_page->prev_page->lru);
>> +                        spin_unlock_irq(&zone->lru_lock);
>> +                }
>> +                else {
>> +                        spin_unlock_irq(&zone->lru_lock);
>> +                        putback_lru_page(page);
>> +                }
>> +
>> +                kfree(isolated_page);
>> +        }
>> +}
>
> I think we also need counters at least at discussion stage to see
> how successful this is.

Actually, I had some numbers but didn't publish them.
(4 cores, 1G DRAM, THP always on, heavy stress: qemu, compile, web browsing,
eclipse and other programs forking; the success rate is about 50%.)
That's because I have another idea to solve the contiguous-PFN problem.
Please see the description and my reply to Rik's question on this thread.
I guess that approach could achieve a high success rate if there are no
concurrent direct compaction processes.

>
> For example, early in the system there is a casual relationship
> between the age of a page and its location in physical memory. The
> buddy allocator gives pages back in PFN order where possible and
> there is a loose relationship between when pages get allocated and
> when they get reclaimed. As compaction is a linear scanner, there
> is a likelihood (that is highly variable) that physically contiguous
> pages will have similar positions in the LRU. They'll be isolated at
> the same time meaning they also won't be put back in order.

Indeed! But I hope my second idea will solve the problem.
What do you think about it?

>
> This might cease to matter when the system is running for some time but
> it's a concern.

Yes. It depends on the aging of the workloads.
I think it's not easy to get a meaningful number to justify all of this. :(

--
Kind regards,
Minchan Kim