From: Konstantin Khlebnikov <khlebnikov@openvz.org>
To: Hugh Dickins <hughd@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
Johannes Weiner <hannes@cmpxchg.org>,
Ying Han <yinghan@google.com>,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 9/10] mm/memcg: move lru_lock into lruvec
Date: Tue, 21 Feb 2012 11:08:48 +0400
Message-ID: <4F434300.3080001@openvz.org>
In-Reply-To: <alpine.LSU.2.00.1202201537040.23274@eggly.anvils>
Hugh Dickins wrote:
> We're nearly there. Now move lru_lock and irqflags into struct lruvec,
> so they are in every zone (for !MEM_RES_CTLR and mem_cgroup_disabled()
> cases) and in every memcg lruvec.
>
> Extend the memcg version of page_relock_lruvec() to drop old and take
> new lock whenever changing lruvec. But the memcg will only be stable
> once we already have the lock: so, having got it, check if it's still
> the lock we want, and retry if not. It's for this retry that we route
> all page lruvec locking through page_relock_lruvec().
>
> No need for lock_page_cgroup() in here (which would entail reinverting
> the lock ordering, and _irq'ing all of its calls): the lrucare protocol
> when charging (holding old lock while changing owner then acquiring new)
> fits correctly with this retry protocol. In some places we rely also on
> page_count 0 preventing further references, in some places on !PageLRU
> protecting a page from outside interference: mem_cgroup_move_account()
>
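To make the retry protocol described above concrete, here is a minimal userspace sketch. It is only a model under stated assumptions: pthread mutexes stand in for the lruvec spinlocks, the hypothetical struct owner stands in for a memcg lruvec and struct item for a page, and owners are assumed never to be freed (the kernel code relies on rcu_read_lock() for that instead). The mover changes ownership only while holding the old owner's lock, as in the lrucare charging protocol, and that is what makes a single re-check after locking sufficient.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a memcg lruvec and its lock. */
struct owner {
	pthread_mutex_t lock;
};

/* Hypothetical stand-in for a page; owner changes only under the old owner's lock. */
struct item {
	_Atomic(struct owner *) owner;
};

/*
 * Lock the current owner of @it and return it. The owner read before
 * locking may already be stale, so re-check once the lock is held and
 * retry if ownership moved in the meantime.
 */
static struct owner *lock_item_owner(struct item *it)
{
	for (;;) {
		struct owner *o = atomic_load(&it->owner);

		pthread_mutex_lock(&o->lock);
		if (atomic_load(&it->owner) == o)
			return o;	/* stable: any mover would need this lock */
		pthread_mutex_unlock(&o->lock);	/* moved under us: retry */
	}
}

/*
 * Move @it to @to, changing ownership while holding the old owner's
 * lock (the analogue of the lrucare protocol when charging).
 */
static void move_item(struct item *it, struct owner *to)
{
	struct owner *from;

	assert(to != NULL);
	from = lock_item_owner(it);
	atomic_store(&it->owner, to);
	pthread_mutex_unlock(&from->lock);
}
```

Build with -pthread. A locker that raced with move_item() either sees the old owner before the move and then fails the re-check, or sees the new owner directly.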
> What if page_relock_lruvec() were preempted for a while, after reading
> a valid mem_cgroup from page_cgroup, but before acquiring the lock?
> In that case, a rmdir might free the mem_cgroup and its associated
> zoneinfo, and we take a spin_lock in freed memory. But rcu_read_lock()
> before we read mem_cgroup keeps it safe: cgroup.c uses synchronize_rcu()
> in between pre_destroy (force_empty) and destroy (freeing structures).
> mem_cgroup_force_empty() cannot succeed while there's any charge, or any
> page on any of its lrus - and checks list_empty() while holding the lock.
Heh, your code is RCU-protected too. =)

On lumpy/compaction isolation you do:

	if (!PageLRU(page))
		continue
	__isolate_lru_page()
	page_relock_lruvec()
		rcu_read_lock()
		rcu_dereference()...
		spin_lock()...
		rcu_read_unlock()

You protect page_relock_lruvec() by switching pointers back to root.
I do:

	catch_page_lruvec()
		rcu_read_lock()
		if (!PageLRU(page))
			return false
		rcu_dereference()...
		spin_lock()...
		rcu_read_unlock()
		if (PageLRU(page))
			return true
	if it returned true:
		__isolate_lru_page()

I protect my catch_page_lruvec() with PageLRU() checks done within a single RCU interval together with the locking.
So my code is better, because it does not require switching pointers back to the root memcg.
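For comparison, a rough userspace model of the catch_page_lruvec() check, lock, re-check pattern. Everything here is hypothetical: pthread mutexes replace spinlocks, atomic fields replace PageLRU() and the page_cgroup lookup, and RCU is left out by assuming lruvecs are never freed. Unlike the pseudocode above, this sketch also re-checks the lruvec pointer after locking, because in this model nothing else stops the page from moving to another lruvec in that window.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a lruvec and its lock. */
struct lruvec_s {
	pthread_mutex_t lock;
};

/* Hypothetical stand-in for a page. */
struct page_s {
	atomic_bool on_lru;			/* models PageLRU() */
	_Atomic(struct lruvec_s *) lruvec;	/* models the pc->mem_cgroup lookup */
};

/*
 * Return true with *locked set and its lock held if the page is still
 * on an LRU once the lock is taken; otherwise return false, no lock held.
 */
static bool catch_page_lruvec_sketch(struct page_s *page,
				     struct lruvec_s **locked)
{
	struct lruvec_s *lv;

	assert(locked != NULL);
	if (!atomic_load(&page->on_lru))
		return false;		/* cheap unlocked check first */
	lv = atomic_load(&page->lruvec);
	pthread_mutex_lock(&lv->lock);
	/*
	 * Re-check under the lock: the page may have left the LRU, or
	 * (extra check in this model) moved to a different lruvec.
	 */
	if (atomic_load(&page->on_lru) && atomic_load(&page->lruvec) == lv) {
		*locked = lv;
		return true;
	}
	pthread_mutex_unlock(&lv->lock);
	return false;
}
```

On success the caller would isolate the page (clear on_lru) under *locked and then unlock; on failure it simply skips the page.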
Meanwhile, after seeing your patches, I realized that this RCU protection is
required only for lock-by-pfn in lumpy/compaction isolation,
so my locking can be simplified and optimized.
>
> But although we are now fully prepared, in this patch keep on using
> the zone->lru_lock for all of its memcgs: so that the cost or benefit
> of split locking can be easily compared with the final patch (but
> of course, some costs and benefits come earlier in the series).
>
> Signed-off-by: Hugh Dickins <hughd@google.com>
> ---
> include/linux/mmzone.h | 4 +-
> include/linux/swap.h | 13 +++---
> mm/memcontrol.c | 74 ++++++++++++++++++++++++++-------------
> mm/page_alloc.c | 2 -
> 4 files changed, 59 insertions(+), 34 deletions(-)
>
> --- mmotm.orig/include/linux/mmzone.h 2012-02-18 11:57:42.675524592 -0800
> +++ mmotm/include/linux/mmzone.h 2012-02-18 11:58:09.047525220 -0800
> @@ -174,6 +174,8 @@ struct zone_reclaim_stat {
>
> struct lruvec {
> struct zone *zone;
> + spinlock_t lru_lock;
> + unsigned long irqflags;
> struct list_head lists[NR_LRU_LISTS];
> struct zone_reclaim_stat reclaim_stat;
> };
> @@ -373,8 +375,6 @@ struct zone {
> ZONE_PADDING(_pad1_)
>
> /* Fields commonly accessed by the page reclaim scanner */
> - spinlock_t lru_lock;
> - unsigned long irqflags;
> struct lruvec lruvec;
>
> unsigned long pages_scanned; /* since last reclaim */
> --- mmotm.orig/include/linux/swap.h 2012-02-18 11:57:42.675524592 -0800
> +++ mmotm/include/linux/swap.h 2012-02-18 11:58:09.047525220 -0800
> @@ -252,25 +252,24 @@ static inline void lru_cache_add_file(st
>
> static inline spinlock_t *lru_lockptr(struct lruvec *lruvec)
> {
> - return &lruvec->zone->lru_lock;
> + /* Still use per-zone lru_lock */
> + return &lruvec->zone->lruvec.lru_lock;
> }
>
> static inline void lock_lruvec(struct lruvec *lruvec)
> {
> - struct zone *zone = lruvec->zone;
> unsigned long irqflags;
>
> - spin_lock_irqsave(&zone->lru_lock, irqflags);
> - zone->irqflags = irqflags;
> + spin_lock_irqsave(lru_lockptr(lruvec), irqflags);
> + lruvec->irqflags = irqflags;
> }
>
> static inline void unlock_lruvec(struct lruvec *lruvec)
> {
> - struct zone *zone = lruvec->zone;
> unsigned long irqflags;
>
> - irqflags = zone->irqflags;
> - spin_unlock_irqrestore(&zone->lru_lock, irqflags);
> + irqflags = lruvec->irqflags;
> + spin_unlock_irqrestore(lru_lockptr(lruvec), irqflags);
> }
>
> #ifdef CONFIG_CGROUP_MEM_RES_CTLR
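The interim lru_lockptr() indirection in this hunk can be modelled as follows: every lruvec gets its own initialized lock, but all lruvecs of a zone resolve to the lock embedded in the zone's own lruvec, so locking behaves exactly as with the old zone->lru_lock until the final patch. This is a userspace sketch only, with pthread mutexes standing in for spinlock_t; the irqflags save/restore has no userspace equivalent and is omitted, and init_lruvec() is a hypothetical helper mirroring the spin_lock_init() calls added to page_alloc.c and memcontrol.c.

```c
#include <assert.h>
#include <pthread.h>

struct zone;

/* Simplified model of struct lruvec after this patch. */
struct lruvec {
	struct zone *zone;
	pthread_mutex_t lru_lock;	/* initialized for every lruvec... */
};

/* Simplified model of struct zone: it embeds its own lruvec. */
struct zone {
	struct lruvec lruvec;		/* ...but only this lock is used for now */
};

/* Interim behaviour: every lruvec in a zone maps to the zone lruvec's lock. */
static pthread_mutex_t *lru_lockptr(struct lruvec *lruvec)
{
	return &lruvec->zone->lruvec.lru_lock;
}

static void lock_lruvec(struct lruvec *lruvec)
{
	pthread_mutex_lock(lru_lockptr(lruvec));
}

static void unlock_lruvec(struct lruvec *lruvec)
{
	pthread_mutex_unlock(lru_lockptr(lruvec));
}

/* Hypothetical init helper: link to the zone and initialize the lock. */
static void init_lruvec(struct lruvec *lruvec, struct zone *zone)
{
	assert(zone != NULL);
	lruvec->zone = zone;
	pthread_mutex_init(&lruvec->lru_lock, NULL);
}
```

Switching to split locking then only requires lru_lockptr() to return &lruvec->lru_lock; no caller changes.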
> --- mmotm.orig/mm/memcontrol.c 2012-02-18 11:58:02.451525062 -0800
> +++ mmotm/mm/memcontrol.c 2012-02-18 11:58:09.051525220 -0800
> @@ -1048,39 +1048,64 @@ void page_relock_lruvec(struct page *pag
> struct page_cgroup *pc;
> struct lruvec *lruvec;
>
> - if (mem_cgroup_disabled())
> + if (unlikely(mem_cgroup_disabled())) {
> lruvec = &page_zone(page)->lruvec;
> - else {
> - pc = lookup_page_cgroup(page);
> - memcg = pc->mem_cgroup;
> - /*
> - * At present we start up with all page_cgroups initialized
> - * to zero: correct that to root_mem_cgroup once we see it.
> - */
> - if (unlikely(!memcg))
> - memcg = pc->mem_cgroup = root_mem_cgroup;
> - /*
> - * We must reset pc->mem_cgroup back to root before freeing
> - * a page: avoid additional callouts from hot paths by doing
> - * it here when we see the page is frozen (can safely be done
> - * before taking lru_lock because the page is frozen).
> - */
> - if (memcg != root_mem_cgroup && !page_count(page))
> - pc->mem_cgroup = root_mem_cgroup;
> - mz = page_cgroup_zoneinfo(memcg, page);
> - lruvec = &mz->lruvec;
> + if (*lruvp && *lruvp != lruvec) {
> + unlock_lruvec(*lruvp);
> + *lruvp = NULL;
> + }
> + if (!*lruvp) {
> + *lruvp = lruvec;
> + lock_lruvec(lruvec);
> + }
> + return;
> }
>
> + pc = lookup_page_cgroup(page);
> + /*
> + * Imagine being preempted for a long time: we need to make sure that
> + * the structure at pc->mem_cgroup, and structures it links to, cannot
> + * be freed while we locate and acquire its zone lru_lock. cgroup's
> + * synchronize_rcu() between pre_destroy and destroy makes this safe.
> + */
> + rcu_read_lock();
> +again:
> + memcg = rcu_dereference(pc->mem_cgroup);
> /*
> - * For the moment, simply lock by zone just as before.
> + * At present we start up with all page_cgroups initialized
> + * to zero: here treat NULL as root_mem_cgroup, then correct
> + * the page_cgroup below once we really have it locked.
> */
> - if (*lruvp && (*lruvp)->zone != lruvec->zone) {
> + mz = page_cgroup_zoneinfo(memcg ? : root_mem_cgroup, page);
> + lruvec = &mz->lruvec;
> +
> + /*
> + * Sometimes we are called with non-NULL *lruvp spinlock already held:
> + * hold on if we want the same lock again, otherwise drop and acquire.
> + */
> + if (*lruvp && *lruvp != lruvec) {
> unlock_lruvec(*lruvp);
> *lruvp = NULL;
> }
> - if (!*lruvp)
> + if (!*lruvp) {
> + *lruvp = lruvec;
> lock_lruvec(lruvec);
> - *lruvp = lruvec;
> + /*
> + * But pc->mem_cgroup may have changed since we looked...
> + */
> + if (unlikely(pc->mem_cgroup != memcg))
> + goto again;
> + }
> +
> + /*
> + * We must reset pc->mem_cgroup back to root before freeing a page:
> + * avoid additional callouts from hot paths by doing it here when we
> + * see the page is frozen. Also initialize pc at first use of page.
> + */
> + if (memcg != root_mem_cgroup && (!memcg || !page_count(page)))
> + pc->mem_cgroup = root_mem_cgroup;
> +
> + rcu_read_unlock();
> }
>
> void mem_cgroup_reset_uncharged_to_root(struct page *page)
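The *lruvp handling in page_relock_lruvec() amounts to a "keep the lock if it is the right one, otherwise swap it" helper, which lets callers batch consecutive pages of the same lruvec under one lock. A minimal userspace sketch, with hypothetical struct lruvec_s/struct page_s types and pthread mutexes standing in for the kernel structures and spinlocks; as in the patch, a re-check after acquiring a new lock restarts from the top if the page moved meanwhile.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins for a lruvec and a page. */
struct lruvec_s {
	pthread_mutex_t lock;
};

struct page_s {
	_Atomic(struct lruvec_s *) lruvec;	/* changes only under the old lock */
};

/*
 * Ensure *held is @page's lruvec with its lock held. If the caller
 * already holds the right lock, keep it; otherwise drop the old lock,
 * take the new one, and re-check that the page did not move meanwhile.
 */
static void relock_page_lruvec(struct page_s *page, struct lruvec_s **held)
{
	struct lruvec_s *want;

	assert(page != NULL && held != NULL);
again:
	want = atomic_load(&page->lruvec);
	if (*held && *held != want) {
		pthread_mutex_unlock(&(*held)->lock);
		*held = NULL;
	}
	if (!*held) {
		*held = want;
		pthread_mutex_lock(&want->lock);
		if (atomic_load(&page->lruvec) != want)
			goto again;	/* moved: the next pass drops this lock */
	}
}

/* Drop whatever lock the batch is still holding. */
static void unlock_held(struct lruvec_s **held)
{
	if (*held) {
		pthread_mutex_unlock(&(*held)->lock);
		*held = NULL;
	}
}
```

Walking a list of pages that mostly share one lruvec then costs one lock round-trip per lruvec change rather than per page.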
> @@ -4744,6 +4769,7 @@ static int alloc_mem_cgroup_per_zone_inf
> for (zone = 0; zone < MAX_NR_ZONES; zone++) {
> mz = &pn->zoneinfo[zone];
> mz->lruvec.zone = &NODE_DATA(node)->node_zones[zone];
> + spin_lock_init(&mz->lruvec.lru_lock);
> for_each_lru(lru)
> INIT_LIST_HEAD(&mz->lruvec.lists[lru]);
> mz->usage_in_excess = 0;
> --- mmotm.orig/mm/page_alloc.c 2012-02-18 11:57:28.375524252 -0800
> +++ mmotm/mm/page_alloc.c 2012-02-18 11:58:09.051525220 -0800
> @@ -4360,12 +4360,12 @@ static void __paginginit free_area_init_
> #endif
> zone->name = zone_names[j];
> spin_lock_init(&zone->lock);
> - spin_lock_init(&zone->lru_lock);
> zone_seqlock_init(zone);
> zone->zone_pgdat = pgdat;
>
> zone_pcp_init(zone);
> zone->lruvec.zone = zone;
> + spin_lock_init(&zone->lruvec.lru_lock);
> for_each_lru(lru)
> INIT_LIST_HEAD(&zone->lruvec.lists[lru]);
> zone->lruvec.reclaim_stat.recent_rotated[0] = 0;