From: Alex Shi <alex.shi@linux.alibaba.com>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: cgroups@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, akpm@linux-foundation.org,
	mgorman@techsingularity.net, tj@kernel.org, hughd@google.com,
	khlebnikov@yandex-team.ru, daniel.m.jordan@oracle.com,
	yang.shi@linux.alibaba.com, willy@infradead.org,
	"Michal Hocko" <mhocko@kernel.org>,
	"Vladimir Davydov" <vdavydov.dev@gmail.com>,
	"Roman Gushchin" <guro@fb.com>,
	"Shakeel Butt" <shakeelb@google.com>,
	"Chris Down" <chris@chrisdown.name>,
	"Thomas Gleixner" <tglx@linutronix.de>,
	"Vlastimil Babka" <vbabka@suse.cz>, "Qian Cai" <cai@lca.pw>,
	"Andrey Ryabinin" <aryabinin@virtuozzo.com>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	"Jérôme Glisse" <jglisse@redhat.com>,
	"Andrea Arcangeli" <aarcange@redhat.com>,
	"David Rientjes" <rientjes@google.c>
Subject: Re: [PATCH v3 3/7] mm/lru: replace pgdat lru_lock with lruvec lock
Date: Tue, 19 Nov 2019 18:04:35 +0800
Message-ID: <85702d50-daf0-ece8-9a5d-e4b860ef2f99@linux.alibaba.com>
In-Reply-To: <20191118161126.GB365174@cmpxchg.org>



On 2019/11/19 at 12:11 AM, Johannes Weiner wrote:
> On Sat, Nov 16, 2019 at 11:15:02AM +0800, Alex Shi wrote:
>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>> index 62470325f9bc..cf274739e619 100644
>> --- a/mm/memcontrol.c
>> +++ b/mm/memcontrol.c
>> @@ -1246,6 +1246,42 @@ struct lruvec *mem_cgroup_page_lruvec(struct page *page, struct pglist_data *pgd
>>  	return lruvec;
>>  }
>>  
>> +struct lruvec *lock_page_lruvec_irq(struct page *page,
>> +					struct pglist_data *pgdat)
>> +{
>> +	struct lruvec *lruvec;
>> +
>> +again:
>> +	lruvec = mem_cgroup_page_lruvec(page, pgdat);
>> +	spin_lock_irq(&lruvec->lru_lock);
> 
> This isn't safe. Nothing prevents the page from being moved to a
> different memcg in between these two operations, and the lruvec having
> been freed by the time you try to acquire the spinlock.
> 
> You need to use the rcu lock to dereference page->mem_cgroup and its
> lruvec when coming from the page like this.

Hi Johannes,

Yes, you are right. Thanks a lot for pointing this out!
Could we use the rcu lock to guard the lruvec until lruvec->lru_lock is acquired?

+struct lruvec *lock_page_lruvec_irq(struct page *page,
+                                       struct pglist_data *pgdat)
+{
+       struct lruvec *lruvec;
+
+again:
+       rcu_read_lock();
+       lruvec = mem_cgroup_page_lruvec(page, pgdat);
+       spin_lock_irq(&lruvec->lru_lock);
+       rcu_read_unlock();
+
+       /* lruvec may have changed in commit_charge() */
+       if (lruvec != mem_cgroup_page_lruvec(page, pgdat)) {
+               spin_unlock_irq(&lruvec->lru_lock);
+               goto again;
+       }
+
+       return lruvec;
+}
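
The lookup, lock, re-check and retry shape of the function above can be tried out in userspace. The following is only a minimal sketch under stated assumptions: struct bucket, struct item, lock_item_bucket() and move_item() are hypothetical stand-ins for the lruvec, the page and the kernel helpers, a pthread mutex replaces the spinlock, and buckets are assumed never to be freed, which is exactly the lifetime guarantee that rcu is meant to provide in the kernel version.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for struct lruvec: a container with its own lock. */
struct bucket {
	pthread_mutex_t lock;
};

/* Hypothetical stand-in for struct page: it points at its current owner. */
struct item {
	_Atomic(struct bucket *) owner;	/* may be changed by a concurrent mover */
};

/*
 * Look up the item's current bucket, lock it, then re-check that the
 * item still belongs to that bucket. If the owner changed between the
 * lookup and the lock, drop the stale lock and retry.
 *
 * Assumption: buckets are never freed while this runs, and every mover
 * changes the owner only while holding the old owner's lock.
 */
static struct bucket *lock_item_bucket(struct item *it)
{
	struct bucket *b;

	assert(it != NULL);
	for (;;) {
		b = atomic_load(&it->owner);
		pthread_mutex_lock(&b->lock);
		if (b == atomic_load(&it->owner))
			return b;	/* owner cannot change while we hold its lock */
		pthread_mutex_unlock(&b->lock);
	}
}

/* Move an item to another bucket while holding the current owner's lock. */
static void move_item(struct item *it, struct bucket *to)
{
	struct bucket *from = lock_item_bucket(it);

	atomic_store(&it->owner, to);
	pthread_mutex_unlock(&from->lock);
}
```

The re-check only means something because move_item() changes the owner under the old owner's lock. Whether every path that changes page->mem_cgroup follows that rule is a separate question that this sketch does not answer.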
> 
> You also need to use page_memcg_rcu() to insert the appropriate
> lockless access barriers, which mem_cgroup_page_lruvec() does not do
> since it's designed for use with pgdat->lru_lock.
> 

Since page_memcg_rcu() cannot know that it is being called under a
spin_lock, I guess the following enhancement is fine:
@@ -1225,7 +1224,7 @@ struct lruvec *mem_cgroup_page_lruvec(struct page *page, struct pglist_data *pgd
                goto out;
        }

-       memcg = page->mem_cgroup;
+       memcg = READ_ONCE(page->mem_cgroup);
        /*
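
For readers outside the kernel tree, the intent of the READ_ONCE() change can be approximated in userspace. This is a rough sketch only: the macro below is a simplified approximation of the kernel one (a single volatile load), it relies on the GNU __typeof__ extension, and struct memcg, struct page_like and page_memcg_id() are hypothetical names used for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace approximation of the kernel's READ_ONCE(). */
#define READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

/* Hypothetical stand-ins for struct mem_cgroup and struct page. */
struct memcg {
	int id;
};

struct page_like {
	struct memcg *mem_cgroup;	/* may be rewritten concurrently */
};

/*
 * Load the pointer exactly once into a local and use only the local
 * afterwards, so the NULL check and the dereference see the same value
 * even if another thread updates page->mem_cgroup in between.
 */
static int page_memcg_id(struct page_like *page)
{
	struct memcg *memcg;

	assert(page != NULL);
	memcg = READ_ONCE(page->mem_cgroup);
	return memcg ? memcg->id : -1;
}
```

Note that a volatile load only stops the compiler from re-reading or tearing the access. It does not keep the pointed-to object alive, so it does not replace the rcu protection discussed above.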


Many thanks!
Alex
