From: Yang Shi
Date: Thu, 11 Feb 2021 09:29:01 -0800
Subject: Re: [v7 PATCH 12/12] mm: vmscan: shrink deferred objects proportional to priority
To: Vlastimil Babka
Cc: Roman Gushchin, Kirill Tkhai, Shakeel Butt, Dave Chinner, Johannes Weiner,
    Michal Hocko, Andrew Morton, Linux MM, Linux FS-devel Mailing List,
    Linux Kernel Mailing List

On Thu, Feb 11, 2021 at 5:10 AM Vlastimil Babka wrote:
>
> On 2/9/21 6:46 PM, Yang Shi wrote:
> > The number of deferred objects might wind up at an absurd value, which
> > results in clamping of slab objects. That is undesirable for sustaining
> > the working set.
> >
> > So shrink deferred objects proportionally to priority and cap nr_deferred
> > at twice the number of cache items.
>
> Makes sense to me; minimally, it's simpler than the old code, and avoiding
> absurd growth of nr_deferred should be a good thing, as well as the
> "proportional to priority" part.

Thanks.

> I just suspect there's a bit of unnecessary bias in the implementation, as
> explained below:
>
> > The idea is borrowed from Dave Chinner's patch:
> > https://lore.kernel.org/linux-xfs/20191031234618.15403-13-david@fromorbit.com/
> >
> > Tested with a kernel build and a vfs-metadata-heavy workload in our
> > production environment; no regression has been spotted so far.
> >
> > Signed-off-by: Yang Shi
> > ---
> >  mm/vmscan.c | 40 +++++----------------------------------
> >  1 file changed, 5 insertions(+), 35 deletions(-)
> >
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index 66163082cc6f..d670b119d6bd 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -654,7 +654,6 @@ static unsigned long do_shrink_slab(struct shrink_control *shrinkctl,
> >  	 */
> >  	nr = count_nr_deferred(shrinker, shrinkctl);
> >
> > -	total_scan = nr;
> >  	if (shrinker->seeks) {
> >  		delta = freeable >> priority;
> >  		delta *= 4;
> > @@ -668,37 +667,9 @@ static unsigned long do_shrink_slab(struct shrink_control *shrinkctl,
> >  		delta = freeable / 2;
> >  	}
> >
> > +	total_scan = nr >> priority;
> >  	total_scan += delta;
>
> So, our scan goal consists of the part based on freeable objects (delta),
> plus a part of the deferred objects (nr >> priority). Fine.
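(As a concrete model of the above: the new scan goal can be computed
standalone as in the sketch below. The numbers are illustrative only, the
shrinker is assumed to use DEFAULT_SEEKS == 2, and this is a simplified
model rather than the kernel code itself.)

	#include <stdio.h>

	int main(void)
	{
		long freeable = 10000;  /* objects reported by the shrinker */
		long nr = 8000;         /* nr_deferred carried over from past runs */
		int priority = 2;       /* lower value == more reclaim pressure */
		int seeks = 2;          /* assumed DEFAULT_SEEKS */

		/* New work for this round, as in the patch: */
		long delta = (freeable >> priority) * 4 / seeks;

		/* Only a priority-scaled share of the deferred work is added
		 * back, instead of the whole of nr as before: */
		long total_scan = (nr >> priority) + delta;

		/* Cap: never aim to scan more than twice the freeable objects. */
		if (total_scan > 2 * freeable)
			total_scan = 2 * freeable;

		printf("delta=%ld deferred-share=%ld goal=%ld\n",
		       delta, nr >> priority, total_scan);
		return 0;
	}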
>
> > -	if (total_scan < 0) {
> > -		pr_err("shrink_slab: %pS negative objects to delete nr=%ld\n",
> > -		       shrinker->scan_objects, total_scan);
> > -		total_scan = freeable;
> > -		next_deferred = nr;
> > -	} else
> > -		next_deferred = total_scan;
> > -
> > -	/*
> > -	 * We need to avoid excessive windup on filesystem shrinkers
> > -	 * due to large numbers of GFP_NOFS allocations causing the
> > -	 * shrinkers to return -1 all the time. This results in a large
> > -	 * nr being built up so when a shrink that can do some work
> > -	 * comes along it empties the entire cache due to nr >>>
> > -	 * freeable. This is bad for sustaining a working set in
> > -	 * memory.
> > -	 *
> > -	 * Hence only allow the shrinker to scan the entire cache when
> > -	 * a large delta change is calculated directly.
> > -	 */
> > -	if (delta < freeable / 4)
> > -		total_scan = min(total_scan, freeable / 2);
> > -
> > -	/*
> > -	 * Avoid risking looping forever due to too large nr value:
> > -	 * never try to free more than twice the estimate number of
> > -	 * freeable entries.
> > -	 */
> > -	if (total_scan > freeable * 2)
> > -		total_scan = freeable * 2;
> > +	total_scan = min(total_scan, (2 * freeable));
>
> Probably unnecessary, as we cap next_deferred below anyway, so total_scan
> cannot grow without limits anymore. But it can't hurt.
>
> >  	trace_mm_shrink_slab_start(shrinker, shrinkctl, nr,
> >  				   freeable, delta, total_scan, priority);
> > @@ -737,10 +708,9 @@ static unsigned long do_shrink_slab(struct shrink_control *shrinkctl,
> >  		cond_resched();
> >  	}
> >
> > -	if (next_deferred >= scanned)
> > -		next_deferred -= scanned;
> > -	else
> > -		next_deferred = 0;
> > +	next_deferred = max_t(long, (nr - scanned), 0) + total_scan;
>
> And here's the bias, I think. Suppose we scanned 0 due to e.g. GFP_NOFS.
> We count as newly deferred both the "delta" part of total_scan, which is
> fine, but also the "nr >> priority" part, where we failed to do our share
> of the "reduce nr_deferred" work; I don't think that means we should also
> increase nr_deferred by that amount of failed work.

Here "nr" is the deferred work saved since the last scan, "scanned" is the
work scanned in this round, and total_scan is the *unscanned* work, which is
actually "total_scan - scanned" (total_scan is decreased by the scanned
amount in each loop iteration).

So the logic is "subtract any scanned work from deferred, then add the newly
unscanned work to deferred". IIUC this is what "deferred" meant even before
this patch.

> OTOH if we succeed and scan exactly the whole goal, we are subtracting from
> nr_deferred both the "nr >> priority" part, which is correct, but also
> delta, which was new work, not deferred work, so that's incorrect IMHO as
> well.

I don't think so. The deferred work comes from new work, so why not subtract
new work from deferred?

And the old code did:

	if (next_deferred >= scanned)
		next_deferred -= scanned;
	else
		next_deferred = 0;

IIUC, it also subtracts the new work (the scanned work includes both the
last deferred work and the new delta).

> So the calculation should probably be something like this?
>
> next_deferred = max_t(long, nr + delta - scanned, 0);
>
> Thanks,
> Vlastimil
>
> > +	next_deferred = min(next_deferred, (2 * freeable));
> > +
> >  	/*
> >  	 * move the unused scan count back into the shrinker in a
> >  	 * manner that handles concurrent updates.
> >
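(For the record, a standalone sketch contrasting the patch's next_deferred
accounting with the formula proposed above. The numbers are illustrative,
DEFAULT_SEEKS == 2 is assumed, the lmax() helper is made up for the sketch,
and "unscanned" stands for the post-loop value of total_scan, per the
explanation earlier in this mail.)

	#include <stdio.h>

	static long lmax(long a, long b) { return a > b ? a : b; }

	int main(void)
	{
		long freeable = 10000, nr = 8000;
		int priority = 2;
		long delta = (freeable >> priority) * 4 / 2; /* DEFAULT_SEEKS */
		long goal = (nr >> priority) + delta;        /* scan goal */
		long cases[] = { 0, goal }; /* GFP_NOFS (no scan) vs full scan */

		for (int i = 0; i < 2; i++) {
			long scanned = cases[i];
			long unscanned = goal - scanned; /* total_scan after loop */

			/* Patch: subtract scanned work, re-defer unscanned work. */
			long patch = lmax(nr - scanned, 0) + unscanned;
			if (patch > 2 * freeable)
				patch = 2 * freeable;

			/* Proposal: defer only what of (nr + delta) wasn't done. */
			long proposal = lmax(nr + delta - scanned, 0);

			printf("scanned=%5ld: patch=%ld proposal=%ld\n",
			       scanned, patch, proposal);
		}
		return 0;
	}

With these inputs the two schemes differ by exactly nr >> priority when
nothing is scanned and by exactly delta when the whole goal is scanned,
which is the bias being debated.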