From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx170.postini.com [74.125.245.170]) by kanga.kvack.org (Postfix) with SMTP id 61C046B0082 for ; Thu, 16 Feb 2012 10:55:01 -0500 (EST) Received: by bkty12 with SMTP id y12so2767784bkt.14 for ; Thu, 16 Feb 2012 07:54:59 -0800 (PST) Message-ID: <4F3D26CF.2040102@openvz.org> Date: Thu, 16 Feb 2012 19:54:55 +0400 From: Konstantin Khlebnikov MIME-Version: 1.0 Subject: Re: [PATCH RFC 00/15] mm: memory book keeping and lru_lock splitting References: <20120215224221.22050.80605.stgit@zurg> <20120216110408.f35c3448.kamezawa.hiroyu@jp.fujitsu.com> <4F3C9798.7050800@openvz.org> <20120216172409.5fa18608.kamezawa.hiroyu@jp.fujitsu.com> <4F3CE243.9050203@openvz.org> In-Reply-To: <4F3CE243.9050203@openvz.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: KAMEZAWA Hiroyuki Cc: "linux-mm@kvack.org" , "linux-kernel@vger.kernel.org" , Hugh Dickins , "hannes@cmpxchg.org" Konstantin Khlebnikov wrote: > KAMEZAWA Hiroyuki wrote: >> On Thu, 16 Feb 2012 09:43:52 +0400 >> Konstantin Khlebnikov wrote: >> >>> KAMEZAWA Hiroyuki wrote: >>>> On Thu, 16 Feb 2012 02:57:04 +0400 >>>> Konstantin Khlebnikov wrote: >> >>>>> * optimize page to book translations, move it upper in the call stack, >>>>> replace some struct zone arguments with struct book pointer. >>>>> >>>> >>>> a page->book transrater from patch 2/15 >>>> >>>> +struct book *page_book(struct page *page) >>>> +{ >>>> + struct mem_cgroup_per_zone *mz; >>>> + struct page_cgroup *pc; >>>> + >>>> + if (mem_cgroup_disabled()) >>>> + return&page_zone(page)->book; >>>> + >>>> + pc = lookup_page_cgroup(page); >>>> + if (!PageCgroupUsed(pc)) >>>> + return&page_zone(page)->book; >>>> + /* Ensure pc->mem_cgroup is visible after reading PCG_USED. */ >>>> + smp_rmb(); >>>> + mz = mem_cgroup_zoneinfo(pc->mem_cgroup, >>>> + page_to_nid(page), page_zonenum(page)); >>>> + return&mz->book; >>>> +} >>>> >>>> What happens when pc->mem_cgroup is rewritten by move_account() ? >>>> Where is the guard for lockless access of this ? >>> >>> Initially this suppose to be protected with lru_lock, in final patch they are protected with rcu. >> >> Hmm, VM_BUG_ON(!PageLRU(page)) ? > > Where? > >> >> move_account() overwrites pc->mem_cgroup with isolating page from LRU. >> but it doesn't take lru_lock. > > There three kinds of lock_page_book() users: > 1) caller want to catch page in LRU, it will lock either old or new book and > recheck PageLRU() after locking, if page not it in LRU it don't touch anything. > some of these functions has stable reference to page, some of them not. > [ There actually exist small race, I knew about it, just forget to pick this chunk from old code. See below. ] > 2) page is isolated by caller, it want to put it back. book link is stable. no problems. > 3) page-release functions. page-counter is zero. no references -- no problems. > > race for 1) > > catcher switcher > > # isolate > old_book = lock_page_book(page) > ClearPageLRU(page) > unlock_book(old_book) > # charge > old_book = lock_page_book(page) > # switch > page->book = new_book > # putback > lock_book(new_book) > SetPageLRU(page) > unlock_book(new_book) > if (PageLRU(page)) > oops, page actually in new_book > unlock_book(old_book) > > > I'll protect "switch" phase with old_book lru-lock: > > lock_book(old_book) > page->book = new_book > unlock_book(old_book) I found better solution for switcher sequence: #isolate old_book = lock_page_book(page) ClearPageLRU(page) unlock_book(old_book) #charge #switch page->book = new_book spin_unlock_wait(&old_book->lru_lock) #putback lock_book(new_book) SetPageLRU(page) unlock_book(new_book) this spin_unlock_wait() effectively stabilize PageLRU() sign for potential old_book lock holder. > > The other option is recheck in "catcher" page book after PageLRU() > maybe there exists some other variants. > >> BTW, what amount of perfomance benefit ? > > It depends, but usually lru_lock is very-very hot. > This lock splitting can be used without cgroups and containers, > now huge zones can be easily sliced into arbitrary pieces, for example one book per 256Mb. > > > > According to my experience, one of complicated thing there is how to postpone "book" destroying > if some its pages are isolated. For example lumpy reclaim and memory compaction isolates pages > from several books. And they wants to put them back. Currently this can be broken, if someone removes > cgroup in wrong moment. There appears funny races with three players: catcher, switcher and destroyer. > This can be fixed with some extra reference-counting or some other sleepable synchronizing. > In my rhel6-based implementation I uses extra reference-counting, and it looks ugly. So I want to invent something better. > Other option is just never release books, reuse them after rcu grace period for rcu-list iterating. Looks like it is not broken, charged page will keep memcg books alive. To make it completely safe rcu-free callback must wait on spin_unlock_wait(book->lru_lock). > >> >> Thanks, >> -Kame >> > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ > Don't email: email@kvack.org -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org