From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: 
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S932183Ab2BVEFz (ORCPT );
	Tue, 21 Feb 2012 23:05:55 -0500
Received: from mail-bk0-f46.google.com ([209.85.214.46]:52449 "EHLO
	mail-bk0-f46.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S932106Ab2BVEFy (ORCPT );
	Tue, 21 Feb 2012 23:05:54 -0500
Authentication-Results: mr.google.com; spf=pass (google.com: domain of
	koct9i@gmail.com designates 10.205.120.141 as permitted sender)
	smtp.mail=koct9i@gmail.com; dkim=pass header.i=koct9i@gmail.com
Message-ID: <4F44699D.3020309@openvz.org>
Date: Wed, 22 Feb 2012 08:05:49 +0400
From: Konstantin Khlebnikov
User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.19)
	Gecko/20120201 Iceape/2.0.14
MIME-Version: 1.0
To: Hugh Dickins
CC: KAMEZAWA Hiroyuki, Andrew Morton, Johannes Weiner, Ying Han,
	"linux-mm@kvack.org", "linux-kernel@vger.kernel.org"
Subject: Re: [PATCH 6/10] mm/memcg: take care over pc->mem_cgroup
References: <20120221181321.637556cd.kamezawa.hiroyu@jp.fujitsu.com>
In-Reply-To: 
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID: 
X-Mailing-List: linux-kernel@vger.kernel.org

Hugh Dickins wrote:
> On Tue, 21 Feb 2012, KAMEZAWA Hiroyuki wrote:
>> On Mon, 20 Feb 2012 15:34:28 -0800 (PST)
>> Hugh Dickins wrote:
>>>  		return NULL;
>>>
>>> +	lruvec = page_lock_lruvec(page);
>>>  	lock_page_cgroup(pc);
>>>
>>
>> Do we need to take lru_lock + irq disable per page in this very very hot path?
>
> I'm sure we don't want to: I hope you were pleased to find it goes away
> (from most cases) a couple of patches later.
>
> I had lruvec lock nested inside page_cgroup lock in the rollup I sent in
> December, whereas you went for page_cgroup lock nested inside lruvec lock
> in your lrucare patch.
>
> I couldn't find an imperative reason why they should be one way round or
> the other, so I tried hard to stick with your ordering, and it did work
> (in this 6/10).  But then I couldn't work out how to get rid of the
> overheads added in doing it this way round, so swapped them back.
>
>>
>> Hmm.... How about adding an NR_ISOLATED counter into lruvec?
>>
>> Then we can delay freeing the lruvec until all counters go down to zero,
>> as...
>>
>> 	bool we_can_free_lruvec = true;
>>
>> 	lock_lruvec(lruvec->lock);
>> 	for_each_lru_lruvec(lru)
>> 		if (!list_empty(&lruvec->lru[lru]))
>> 			we_can_free_lruvec = false;
>> 	if (lruvec->nr_isolated)
>> 		we_can_free_lruvec = false;
>> 	unlock_lruvec(lruvec);
>> 	if (we_can_free_lruvec)
>> 		kfree(lruvec);
>>
>> If compaction or lumpy reclaim free a page taken from the LRU,
>> they know what they are doing and can decrement lruvec->nr_isolated
>> properly (it seems the zone's NR_ISOLATED is decremented at putback).
>
> At the moment I'm thinking that what we end up with by 9/10 is
> better than adding such a refcount.  But I'm not entirely happy with
> mem_cgroup_reset_uncharged_to_root (it adds a further page_cgroup
> lookup just after I got rid of some others), and need yet to think
> about the race which Konstantin posits, so all options remain open.

This lruvec->nr_isolated seems reasonable, and managing it is not very costly.
In move_account() we anyway need to touch old_lruvec->lru_lock after the
recharge, to stabilize PageLRU() before adding the page to new_lruvec
(because of that race). In migration/compaction this is handled automatically,
because they always call putback_lru_page() at the end.
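
Roughly, the accounting I have in mind could look something like the sketch
below. This is an uncompiled illustration only: the lruvec layout, the
per-lruvec lru_lock and the helper names are invented here, not taken from
Hugh's series.

struct lruvec {
	spinlock_t		lru_lock;	/* per-lruvec lock, as in this series */
	struct list_head	lists[NR_LRU_LISTS];
	atomic_t		nr_isolated;	/* pages taken off these lists */
};

/* isolate_lru_page()-style path: account the page as isolated */
static void lruvec_isolate_page(struct lruvec *lruvec, struct page *page)
{
	spin_lock_irq(&lruvec->lru_lock);
	list_del(&page->lru);
	ClearPageLRU(page);
	atomic_inc(&lruvec->nr_isolated);
	spin_unlock_irq(&lruvec->lru_lock);
}

/* putback_lru_page()-style path: drop the isolated count again */
static void lruvec_putback_page(struct lruvec *lruvec, struct page *page,
				enum lru_list lru)
{
	spin_lock_irq(&lruvec->lru_lock);
	list_add(&page->lru, &lruvec->lists[lru]);
	SetPageLRU(page);
	atomic_dec(&lruvec->nr_isolated);
	spin_unlock_irq(&lruvec->lru_lock);
}

/* memcg teardown: the lruvec may be freed only when nothing references it */
static bool lruvec_can_free(struct lruvec *lruvec)
{
	enum lru_list lru;
	bool empty = true;

	spin_lock_irq(&lruvec->lru_lock);
	for_each_lru(lru)
		if (!list_empty(&lruvec->lists[lru]))
			empty = false;
	if (atomic_read(&lruvec->nr_isolated))
		empty = false;
	spin_unlock_irq(&lruvec->lru_lock);

	return empty;
}

With something like that, isolated pages keep pinning the old lruvec via
nr_isolated until they are put back, and move_account() only has to take
old_lruvec->lru_lock once more after the recharge to see a stable PageLRU().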
The main problem is shrink_page_list() in lumpy reclaim, but it seems it is
never used when memory compaction is enabled, so it can afford to be slow and
ineffective, with tons of lru_list relocks.

>
> Hugh