Re: [PATCH next] mm/swap.c: reduce lock contention in lru_cache_add

From: Yu Zhao <yuzhao@google.com>
To: Alex Shi <alex.shi@linux.alibaba.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Hugh Dickins <hughd@google.com>, Michal Hocko <mhocko@suse.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH next] mm/swap.c: reduce lock contention in lru_cache_add
Date: Wed, 25 Nov 2020 21:52:34 -0700
Message-ID: <20201126045234.GA1014081@google.com>
In-Reply-To: <1605860847-47445-1-git-send-email-alex.shi@linux.alibaba.com>

On Fri, Nov 20, 2020 at 04:27:27PM +0800, Alex Shi wrote:
> The current relock logic switches lru_lock whenever it finds a new
> lruvec, so if 2 memcgs are reading files or allocating pages at the
> same time, they can take the lru_lock alternately and wait for each
> other because of the fairness of the ticket spinlock.
> 
> This patch sorts the pages by lruvec so that each lru_lock is taken
> only once in the above scenario. That reduces the fairness-induced
> waiting when a lock is reacquired. With it,
> vm-scalability/case-lru-file-readtwice gains ~5% performance on my
> 2P*20core*HT machine.
> 
> Suggested-by: Konstantin Khlebnikov <koct9i@gmail.com>
> Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com>
> Cc: Konstantin Khlebnikov <koct9i@gmail.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Hugh Dickins <hughd@google.com>
> Cc: Yu Zhao <yuzhao@google.com>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: linux-mm@kvack.org
> Cc: linux-kernel@vger.kernel.org
> ---
>  mm/swap.c | 57 +++++++++++++++++++++++++++++++++++++++++++++++--------
>  1 file changed, 49 insertions(+), 8 deletions(-)
> 
> diff --git a/mm/swap.c b/mm/swap.c
> index 490553f3f9ef..c787b38bf9c0 100644
> --- a/mm/swap.c
> +++ b/mm/swap.c
> @@ -1009,24 +1009,65 @@ static void __pagevec_lru_add_fn(struct page *page, struct lruvec *lruvec)
>  	trace_mm_lru_insertion(page, lru);
>  }
>  
> +struct lruvecs {
> +	struct list_head lists[PAGEVEC_SIZE];
> +	struct lruvec *vecs[PAGEVEC_SIZE];
> +};
> +
> +/* Sort pvec pages on their lruvec */
> +int sort_page_lruvec(struct lruvecs *lruvecs, struct pagevec *pvec)
> +{
> +	int i, j, nr_lruvec;
> +	struct page *page;
> +	struct lruvec *lruvec = NULL;
> +
> +	lruvecs->vecs[0] = NULL;
> +	for (i = nr_lruvec = 0; i < pagevec_count(pvec); i++) {
> +		page = pvec->pages[i];
> +		lruvec = mem_cgroup_page_lruvec(page, page_pgdat(page));
> +
> +		/* Try to find a same lruvec */
> +		for (j = 0; j <= nr_lruvec; j++)
> +			if (lruvec == lruvecs->vecs[j])
> +				break;
> +
> +		/* A new lruvec */
> +		if (j > nr_lruvec) {
> +			INIT_LIST_HEAD(&lruvecs->lists[nr_lruvec]);
> +			lruvecs->vecs[nr_lruvec] = lruvec;
> +			j = nr_lruvec++;
> +			lruvecs->vecs[nr_lruvec] = 0;
> +		}
> +
> +		list_add_tail(&page->lru, &lruvecs->lists[j]);
> +	}
> +
> +	return nr_lruvec;
> +}
> +
>  /*
>   * Add the passed pages to the LRU, then drop the caller's refcount
>   * on them.  Reinitialises the caller's pagevec.
>   */
>  void __pagevec_lru_add(struct pagevec *pvec)
>  {
> -	int i;
> -	struct lruvec *lruvec = NULL;
> +	int i, nr_lruvec;
>  	unsigned long flags = 0;
> +	struct page *page;
> +	struct lruvecs lruvecs;
>  
> -	for (i = 0; i < pagevec_count(pvec); i++) {
> -		struct page *page = pvec->pages[i];
> +	nr_lruvec = sort_page_lruvec(&lruvecs, pvec);

Simply looping pvec multiple times (15 at most) for different lruvecs
would be better because 1) it requires no extra data structures and
therefore has better cache locality (theoretically faster) 2) it only
loops once when !CONFIG_MEMCG and !CONFIG_NUMA and therefore has no
impact on Android and Chrome OS.
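
To make the suggestion concrete, here is a rough userspace sketch of the
multi-pass idea. It is not code from this thread and not a proposed
kernel patch: struct lruvec, struct page and struct pagevec below are
hypothetical stand-ins, lruvec_lock()/lruvec_unlock() only count
acquisitions instead of taking a real lru_lock, and the `handled`
bitmask is one assumed way to remember which pages an earlier pass
already added.

```c
#include <assert.h>
#include <stddef.h>

#define PAGEVEC_SIZE 15

/* Hypothetical userspace stand-ins for the kernel structures. */
struct lruvec {
	int lock_acquisitions;	/* how many times the mock lock was taken */
	int nr_added;		/* pages added while holding it */
};

struct page {
	struct lruvec *lruvec;	/* stands in for mem_cgroup_page_lruvec() */
};

struct pagevec {
	unsigned int nr;
	struct page *pages[PAGEVEC_SIZE];
};

/* Mock lock helpers: they only record that the lock was acquired. */
static void lruvec_lock(struct lruvec *lruvec)
{
	lruvec->lock_acquisitions++;
}

static void lruvec_unlock(struct lruvec *lruvec)
{
	(void)lruvec;
}

/* Mock of the per-page add callback. */
static void lru_add_fn(struct page *page, struct lruvec *lruvec)
{
	(void)page;
	lruvec->nr_added++;
}

/*
 * Walk the pagevec once per distinct lruvec. Each pass locks the lruvec
 * of the first page not yet handled, adds every remaining page that
 * belongs to that lruvec, then unlocks. Returns the number of passes.
 */
static int pagevec_lru_add_multipass(struct pagevec *pvec)
{
	unsigned long handled = 0;	/* bit i set: pages[i] already added */
	unsigned int i, start = 0;
	int passes = 0;

	assert(pvec->nr <= PAGEVEC_SIZE);

	while (start < pvec->nr) {
		struct lruvec *lruvec = pvec->pages[start]->lruvec;

		lruvec_lock(lruvec);
		for (i = start; i < pvec->nr; i++) {
			if ((handled & (1UL << i)) ||
			    pvec->pages[i]->lruvec != lruvec)
				continue;
			lru_add_fn(pvec->pages[i], lruvec);
			handled |= 1UL << i;
		}
		lruvec_unlock(lruvec);
		passes++;

		/* Skip ahead to the first page no pass has handled yet. */
		while (start < pvec->nr && (handled & (1UL << start)))
			start++;
	}

	return passes;
}
```

In this sketch, when every page maps to the same lruvec (as with
!CONFIG_MEMCG and !CONFIG_NUMA) the outer loop runs exactly once, and
the only state beyond the pagevec itself is a single word.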

> -		lruvec = relock_page_lruvec_irqsave(page, lruvec, &flags);
> -		__pagevec_lru_add_fn(page, lruvec);
> +	for (i = 0; i < nr_lruvec; i++) {
> +		spin_lock_irqsave(&lruvecs.vecs[i]->lru_lock, flags);
> +		while (!list_empty(&lruvecs.lists[i])) {
> +			page = lru_to_page(&lruvecs.lists[i]);
> +			list_del(&page->lru);
> +			__pagevec_lru_add_fn(page, lruvecs.vecs[i]);
> +		}
> +		spin_unlock_irqrestore(&lruvecs.vecs[i]->lru_lock, flags);
>  	}
> -	if (lruvec)
> -		unlock_page_lruvec_irqrestore(lruvec, flags);
> +
>  	release_pages(pvec->pages, pvec->nr);
>  	pagevec_reinit(pvec);
>  }
> -- 
> 2.29.GIT
>

Thread overview: 23+ messages
2020-11-20  8:27 [PATCH next] mm/swap.c: reduce lock contention in lru_cache_add Alex Shi
2020-11-20 23:19 ` Andrew Morton
2020-11-23  4:46   ` Alex Shi
2020-11-25 15:38 ` Vlastimil Babka
2020-11-26  3:12   ` Alex Shi
2020-11-26 11:05     ` Vlastimil Babka
2020-11-26  4:52 ` Yu Zhao [this message]
2020-11-26  6:39   ` Alex Shi
2020-11-26  7:24     ` Yu Zhao
2020-11-26  8:09       ` Alex Shi
2020-11-26 11:22       ` Vlastimil Babka
2020-11-26 15:44         ` Vlastimil Babka
2020-11-26 15:55           ` Matthew Wilcox
2020-11-27  3:14             ` Alex Shi
2020-12-01  8:02             ` [PATCH 1/3] mm/swap.c: pre-sort pages in pagevec for pagevec_lru_move_fn Alex Shi
2020-12-01  8:02               ` [PATCH 2/3] mm/swap.c: bail out early for no memcg and no numa Alex Shi
2020-12-01  8:02               ` [PATCH 3/3] mm/swap.c: extend the usage to pagevec_lru_add Alex Shi
2020-12-01  8:10               ` [PATCH 1/3] mm/swap.c: pre-sort pages in pagevec for pagevec_lru_move_fn Michal Hocko
2020-12-01  8:20                 ` Alex Shi
2020-12-25  9:59             ` [RFC PATCH 0/4] pre sort pages on lruvec in pagevec Alex Shi
2020-12-25  9:59               ` [RFC PATCH 1/4] mm/swap.c: pre-sort pages in pagevec for pagevec_lru_move_fn Alex Shi
2020-12-25  9:59               ` [RFC PATCH 2/4] mm/swap.c: bail out early for no memcg and no numa Alex Shi
2020-12-25  9:59               ` [RFC PATCH 3/4] mm/swap.c: extend the usage to pagevec_lru_add Alex Shi
