All of lore.kernel.org
 help / color / mirror / Atom feed
From: Johannes Weiner <hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
To: Alex Shi <alex.shi-KPsoFbNs7GizrGE5bRqYAgC/G2K4zDHf@public.gmane.org>
Cc: cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org,
	mgorman-3eNAlZScCAx27rWaFMvyedHuzzzSOjJt@public.gmane.org,
	tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org,
	hughd-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org,
	khlebnikov-XoJtRXgx1JseBXzfvpsJ4g@public.gmane.org,
	daniel.m.jordan-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org,
	yang.shi-KPsoFbNs7GizrGE5bRqYAgC/G2K4zDHf@public.gmane.org,
	willy-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org,
	shakeelb-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org,
	"Michal Hocko" <mhocko-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>,
	"Vladimir Davydov"
	<vdavydov.dev-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>,
	"Roman Gushchin" <guro-b10kYP2dOMg@public.gmane.org>,
	"Chris Down" <chris-6Bi1550iOqEnzZ6mRAm98g@public.gmane.org>,
	"Thomas Gleixner" <tglx-hfZtesqFncYOwBW4kG4KsQ@public.gmane.org>,
	"Vlastimil Babka" <vbabka-AlSwsSmVLrQ@public.gmane.org>,
	"Qian Cai" <cai-J5quhbR+WMc@public.gmane.org>,
	"Andrey Ryabinin"
	<aryabinin-5HdwGun5lf+gSpxsJD1C4w@public.gmane.org>,
	"Kirill A. Shutemov"
	<kirill.shutemov-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>,
	"Jérôme Glisse" <jglisse-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>,
	"Andrea Arcangeli"
	<aarcange-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>,
	"David Rientjes"
	<rientjes-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
Subject: Re: [PATCH v8 03/10] mm/lru: replace pgdat lru_lock with lruvec lock
Date: Tue, 21 Jan 2020 11:00:05 -0500	[thread overview]
Message-ID: <20200121160005.GA69293@cmpxchg.org> (raw)
In-Reply-To: <9ee80b68-a78f-714a-c727-1f6d2b4f87ea-KPsoFbNs7GizrGE5bRqYAgC/G2K4zDHf@public.gmane.org>

On Mon, Jan 20, 2020 at 08:58:09PM +0800, Alex Shi wrote:
> 
> 
> 在 2020/1/17 上午5:52, Johannes Weiner 写道:
> 
> > You simply cannot serialize on page->mem_cgroup->lruvec when
> > page->mem_cgroup isn't stable. You need to serialize on the page
> > itself, one way or another, to make this work.
> > 
> > 
> > So here is a crazy idea that may be worth exploring:
> > 
> > Right now, pgdat->lru_lock protects both PageLRU *and* the lruvec's
> > linked list.
> > 
> > Can we make PageLRU atomic and use it to stabilize the lru_lock
> > instead, and then use the lru_lock only serialize list operations?
> > 
> 
> Hi Johannes,
> 
> I am trying to figure out the solution of atomic PageLRU, but is 
> blocked by the following sitations, when PageLRU and lru list was protected
> together under lru_lock, the PageLRU could be a indicator if page on lru list
> But now seems it can't be the indicator anymore.
> Could you give more clues of stabilization usage of PageLRU?

There are two types of PageLRU checks: optimistic and deterministic.

The check in activate_page() for example is optimistic and the result
unstable, but that's okay, because if we miss a page here and there
it's not the end of the world.

But the check in __activate_page() is deterministic, because we need
to be sure before del_page_from_lru_list(). Currently it's made
deterministic by testing under the lock: whoever acquires the lock
first gets to touch the LRU state. The same can be done with an atomic
TestClearPagLRU: whoever clears the flag first gets to touch the LRU
state (the lock is then only acquired to not corrupt the linked list,
in case somebody adds or removes a different page at the same time).

I.e. in my proposal, if you want to get a stable read of PageLRU, you
have to clear it atomically. But AFAICS, everybody who currently does
need a stable read either already clears it or can easily be converted
to clear it and then set it again (like __activate_page and friends).

> __page_cache_release/release_pages/compaction            __pagevec_lru_add
> if (TestClearPageLRU(page))                              if (!PageLRU())
>                                                                 lruvec_lock();
>                                                                 list_add();
>         			                                lruvec_unlock();
>         			                                SetPageLRU() //position 1
>         lock_page_lruvec_irqsave(page, &flags);
>         del_page_from_lru_list(page, lruvec, ..);
>         unlock_page_lruvec_irqrestore(lruvec, flags);
>                                                                 SetPageLRU() //position 2

Hm, that's not how __pagevec_lru_add() looks. In fact,
__pagevec_lru_add_fn() has a BUG_ON(PageLRU).

That's because only one thread can own the isolation state at a time.

If PageLRU is set, only one thread can claim it. Right now, whoever
takes the lock first and clears it wins. When we replace it with
TestClearPageLRU, it's the same thing: only one thread can win.

And you cannot set PageLRU, unless you own it. Either you isolated the
page using TestClearPageLRU, or you allocated a new page.

So you can have multiple threads trying to isolate a page from the LRU
list, hence the atomic testclear. But no two threads should ever be
racing to add a page to the LRU list, because only one thread can own
the isolation state.

With the atomic PageLRU flag, the sequence would be this:

__pagevec_lru_add:

  BUG_ON(PageLRU()) // Caller *must* own the isolation state

  lruvec_lock()     // The lruvec is stable, because changing
                    // page->mem_cgroup requires owning the
                    // isolation state (PageLRU) and we own it

  list_add()        // Linked list protected by lru_lock

  lruvec_unlock()

  SetPageLRU()      // The page has been added to the linked
                    // list, give up our isolation state. Once
                    // this flag becomes visible, other threads
                    // can isolate the page from the LRU list

WARNING: multiple messages have this Message-ID (diff)
From: Johannes Weiner <hannes@cmpxchg.org>
To: Alex Shi <alex.shi@linux.alibaba.com>
Cc: cgroups@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, akpm@linux-foundation.org,
	mgorman@techsingularity.net, tj@kernel.org, hughd@google.com,
	khlebnikov@yandex-team.ru, daniel.m.jordan@oracle.com,
	yang.shi@linux.alibaba.com, willy@infradead.org,
	shakeelb@google.com, "Michal Hocko" <mhocko@kernel.org>,
	"Vladimir Davydov" <vdavydov.dev@gmail.com>,
	"Roman Gushchin" <guro@fb.com>,
	"Chris Down" <chris@chrisdown.name>,
	"Thomas Gleixner" <tglx@linutronix.de>,
	"Vlastimil Babka" <vbabka@suse.cz>, "Qian Cai" <cai@lca.pw>,
	"Andrey Ryabinin" <aryabinin@virtuozzo.com>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	"Jérôme Glisse" <jglisse@redhat.com>,
	"Andrea Arcangeli" <aarcange@redhat.com>,
	"David Rientjes" <rientjes@google.com>,
	"Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>,
	swkhack <swkhack@gmail.com>,
	"Potyra, Stefan" <Stefan.Potyra@elektrobit.com>,
	"Mike Rapoport" <rppt@linux.vnet.ibm.com>,
	"Stephen Rothwell" <sfr@canb.auug.org.au>,
	"Colin Ian King" <colin.king@canonical.com>,
	"Jason Gunthorpe" <jgg@ziepe.ca>,
	"Mauro Carvalho Chehab" <mchehab+samsung@kernel.org>,
	"Peng Fan" <peng.fan@nxp.com>,
	"Nikolay Borisov" <nborisov@suse.com>,
	"Ira Weiny" <ira.weiny@intel.com>,
	"Kirill Tkhai" <ktkhai@virtuozzo.com>,
	"Yafang Shao" <laoar.shao@gmail.com>
Subject: Re: [PATCH v8 03/10] mm/lru: replace pgdat lru_lock with lruvec lock
Date: Tue, 21 Jan 2020 11:00:05 -0500	[thread overview]
Message-ID: <20200121160005.GA69293@cmpxchg.org> (raw)
In-Reply-To: <9ee80b68-a78f-714a-c727-1f6d2b4f87ea@linux.alibaba.com>

On Mon, Jan 20, 2020 at 08:58:09PM +0800, Alex Shi wrote:
> 
> 
> 在 2020/1/17 上午5:52, Johannes Weiner 写道:
> 
> > You simply cannot serialize on page->mem_cgroup->lruvec when
> > page->mem_cgroup isn't stable. You need to serialize on the page
> > itself, one way or another, to make this work.
> > 
> > 
> > So here is a crazy idea that may be worth exploring:
> > 
> > Right now, pgdat->lru_lock protects both PageLRU *and* the lruvec's
> > linked list.
> > 
> > Can we make PageLRU atomic and use it to stabilize the lru_lock
> > instead, and then use the lru_lock only serialize list operations?
> > 
> 
> Hi Johannes,
> 
> I am trying to figure out the solution of atomic PageLRU, but is 
> blocked by the following sitations, when PageLRU and lru list was protected
> together under lru_lock, the PageLRU could be a indicator if page on lru list
> But now seems it can't be the indicator anymore.
> Could you give more clues of stabilization usage of PageLRU?

There are two types of PageLRU checks: optimistic and deterministic.

The check in activate_page() for example is optimistic and the result
unstable, but that's okay, because if we miss a page here and there
it's not the end of the world.

But the check in __activate_page() is deterministic, because we need
to be sure before del_page_from_lru_list(). Currently it's made
deterministic by testing under the lock: whoever acquires the lock
first gets to touch the LRU state. The same can be done with an atomic
TestClearPagLRU: whoever clears the flag first gets to touch the LRU
state (the lock is then only acquired to not corrupt the linked list,
in case somebody adds or removes a different page at the same time).

I.e. in my proposal, if you want to get a stable read of PageLRU, you
have to clear it atomically. But AFAICS, everybody who currently does
need a stable read either already clears it or can easily be converted
to clear it and then set it again (like __activate_page and friends).

> __page_cache_release/release_pages/compaction            __pagevec_lru_add
> if (TestClearPageLRU(page))                              if (!PageLRU())
>                                                                 lruvec_lock();
>                                                                 list_add();
>         			                                lruvec_unlock();
>         			                                SetPageLRU() //position 1
>         lock_page_lruvec_irqsave(page, &flags);
>         del_page_from_lru_list(page, lruvec, ..);
>         unlock_page_lruvec_irqrestore(lruvec, flags);
>                                                                 SetPageLRU() //position 2

Hm, that's not how __pagevec_lru_add() looks. In fact,
__pagevec_lru_add_fn() has a BUG_ON(PageLRU).

That's because only one thread can own the isolation state at a time.

If PageLRU is set, only one thread can claim it. Right now, whoever
takes the lock first and clears it wins. When we replace it with
TestClearPageLRU, it's the same thing: only one thread can win.

And you cannot set PageLRU, unless you own it. Either you isolated the
page using TestClearPageLRU, or you allocated a new page.

So you can have multiple threads trying to isolate a page from the LRU
list, hence the atomic testclear. But no two threads should ever be
racing to add a page to the LRU list, because only one thread can own
the isolation state.

With the atomic PageLRU flag, the sequence would be this:

__pagevec_lru_add:

  BUG_ON(PageLRU()) // Caller *must* own the isolation state

  lruvec_lock()     // The lruvec is stable, because changing
                    // page->mem_cgroup requires owning the
                    // isolation state (PageLRU) and we own it

  list_add()        // Linked list protected by lru_lock

  lruvec_unlock()

  SetPageLRU()      // The page has been added to the linked
                    // list, give up our isolation state. Once
                    // this flag becomes visible, other threads
                    // can isolate the page from the LRU list


  parent reply	other threads:[~2020-01-21 16:00 UTC|newest]

Thread overview: 50+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-01-16  3:04 [PATCH v8 00/10] per lruvec lru_lock for memcg Alex Shi
     [not found] ` <1579143909-156105-1-git-send-email-alex.shi-KPsoFbNs7GizrGE5bRqYAgC/G2K4zDHf@public.gmane.org>
2020-01-16  3:05   ` [PATCH v8 01/10] mm/vmscan: remove unnecessary lruvec adding Alex Shi
2020-01-16  3:05     ` Alex Shi
2020-01-16  3:05 ` [PATCH v8 02/10] mm/memcg: fold lock_page_lru into commit_charge Alex Shi
2020-01-16  3:05 ` [PATCH v8 03/10] mm/lru: replace pgdat lru_lock with lruvec lock Alex Shi
2020-01-16  3:05   ` Alex Shi
2020-01-16 21:52   ` Johannes Weiner
2020-01-16 21:52     ` Johannes Weiner
     [not found]     ` <20200116215222.GA64230-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
2020-01-19 11:32       ` Alex Shi
2020-01-19 11:32         ` Alex Shi
2020-01-20 12:58     ` Alex Shi
2020-01-20 12:58       ` Alex Shi
     [not found]       ` <9ee80b68-a78f-714a-c727-1f6d2b4f87ea-KPsoFbNs7GizrGE5bRqYAgC/G2K4zDHf@public.gmane.org>
2020-01-21 16:00         ` Johannes Weiner [this message]
2020-01-21 16:00           ` Johannes Weiner
     [not found]           ` <20200121160005.GA69293-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
2020-01-22 12:01             ` Alex Shi
2020-01-22 12:01               ` Alex Shi
2020-01-22 18:31               ` Johannes Weiner
2020-01-22 18:31                 ` Johannes Weiner
2020-04-13 10:48     ` Alex Shi
2020-04-13 10:48       ` Alex Shi
     [not found]       ` <cdcdb710-1d78-6fac-48d7-35519ddcdc6a-KPsoFbNs7GizrGE5bRqYAgC/G2K4zDHf@public.gmane.org>
2020-04-13 18:07         ` Johannes Weiner
2020-04-13 18:07           ` Johannes Weiner
     [not found]           ` <20200413180725.GA99267-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
2020-04-14  4:52             ` Alex Shi
2020-04-14  4:52               ` Alex Shi
     [not found]               ` <8e7bf170-2bb5-f862-c12b-809f7f7d96cb-KPsoFbNs7GizrGE5bRqYAgC/G2K4zDHf@public.gmane.org>
2020-04-14 16:31                 ` Johannes Weiner
2020-04-14 16:31                   ` Johannes Weiner
     [not found]                   ` <20200414163114.GA136578-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
2020-04-15 13:42                     ` Alex Shi
2020-04-15 13:42                       ` Alex Shi
     [not found]                       ` <54af0662-cbb4-88c7-7eae-f969684025dd-KPsoFbNs7GizrGE5bRqYAgC/G2K4zDHf@public.gmane.org>
2020-04-16  8:01                         ` Alex Shi
2020-04-16  8:01                           ` Alex Shi
     [not found]                           ` <0bed9f1a-400d-d9a9-aeb4-de1dd9ccbb45-KPsoFbNs7GizrGE5bRqYAgC/G2K4zDHf@public.gmane.org>
2020-04-16 15:28                             ` Johannes Weiner
2020-04-16 15:28                               ` Johannes Weiner
     [not found]                               ` <20200416152830.GA195132-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
2020-04-16 17:47                                 ` Shakeel Butt
2020-04-16 17:47                                   ` Shakeel Butt
     [not found]                                   ` <CALvZod4bdmkd_YG=96O8+zCSCFNpsBQiN+3Cq+6oD7jn3GTYog-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2020-04-17 13:18                                     ` Alex Shi
2020-04-17 13:18                                       ` Alex Shi
2020-04-17 14:39                                 ` Alex Shi
2020-04-17 14:39                                   ` Alex Shi
2020-04-14  8:19             ` Alex Shi
2020-04-14  8:19               ` Alex Shi
2020-04-14 16:36               ` Johannes Weiner
2020-04-14 16:36                 ` Johannes Weiner
2020-01-16  3:05 ` [PATCH v8 04/10] mm/lru: introduce the relock_page_lruvec function Alex Shi
2020-01-16  3:05 ` [PATCH v8 05/10] mm/mlock: optimize munlock_pagevec by relocking Alex Shi
2020-01-16  3:05 ` [PATCH v8 06/10] mm/swap: only change the lru_lock iff page's lruvec is different Alex Shi
2020-01-16  3:05 ` [PATCH v8 07/10] mm/pgdat: remove pgdat lru_lock Alex Shi
2020-01-16  3:05 ` [PATCH v8 08/10] mm/lru: revise the comments of lru_lock Alex Shi
2020-01-16  3:05   ` Alex Shi
2020-01-16  3:05 ` [PATCH v8 09/10] mm/lru: add debug checking for page memcg moving Alex Shi
2020-01-16  3:05 ` [PATCH v8 10/10] mm/memcg: add debug checking in lock_page_memcg Alex Shi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200121160005.GA69293@cmpxchg.org \
    --to=hannes-druugvl0lcnafugrpc6u6w@public.gmane.org \
    --cc=aarcange-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org \
    --cc=akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org \
    --cc=alex.shi-KPsoFbNs7GizrGE5bRqYAgC/G2K4zDHf@public.gmane.org \
    --cc=aryabinin-5HdwGun5lf+gSpxsJD1C4w@public.gmane.org \
    --cc=cai-J5quhbR+WMc@public.gmane.org \
    --cc=cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
    --cc=chris-6Bi1550iOqEnzZ6mRAm98g@public.gmane.org \
    --cc=daniel.m.jordan-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org \
    --cc=guro-b10kYP2dOMg@public.gmane.org \
    --cc=hughd-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org \
    --cc=jglisse-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org \
    --cc=khlebnikov-XoJtRXgx1JseBXzfvpsJ4g@public.gmane.org \
    --cc=kirill.shutemov-VuQAYsv1563Yd54FQh9/CA@public.gmane.org \
    --cc=linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
    --cc=linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org \
    --cc=mgorman-3eNAlZScCAx27rWaFMvyedHuzzzSOjJt@public.gmane.org \
    --cc=mhocko-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org \
    --cc=rientjes-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org \
    --cc=shakeelb-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org \
    --cc=tglx-hfZtesqFncYOwBW4kG4KsQ@public.gmane.org \
    --cc=tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org \
    --cc=vbabka-AlSwsSmVLrQ@public.gmane.org \
    --cc=vdavydov.dev-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org \
    --cc=willy-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org \
    --cc=yang.shi-KPsoFbNs7GizrGE5bRqYAgC/G2K4zDHf@public.gmane.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.