From: Yu Zhao <yuzhao@google.com>
To: Yafang Shao <laoar.shao@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
linux-mm@kvack.org, stable@vger.kernel.org
Subject: Re: [PATCH] mm: mglru: Fix soft lockup attributed to scanning folios
Date: Tue, 12 Mar 2024 16:11:55 -0600 [thread overview]
Message-ID: <ZfDTK79iQNlax-h6@google.com> (raw)
In-Reply-To: <ZfC7PO0-3Kg88Wj3@google.com>
On Tue, Mar 12, 2024 at 02:29:48PM -0600, Yu Zhao wrote:
> On Fri, Mar 08, 2024 at 04:57:08PM +0800, Yafang Shao wrote:
> > On Fri, Mar 8, 2024 at 1:06 AM Andrew Morton <akpm@linux-foundation.org> wrote:
> > >
> > > On Thu, 7 Mar 2024 11:19:52 +0800 Yafang Shao <laoar.shao@gmail.com> wrote:
> > >
> > > > After we enabled mglru on our 384C1536GB production servers, we
> > > > encountered frequent soft lockups attributed to scanning folios.
> > > >
> > > > The soft lockup as follows,
> > > >
> > > > ...
> > > >
> > > > There were a total of 22 tasks waiting for this spinlock
> > > > (RDI: ffff99d2b6ff9050):
> > > >
> > > > crash> foreach RU bt | grep -B 8 queued_spin_lock_slowpath | grep "RDI: ffff99d2b6ff9050" | wc -l
> > > > 22
> > >
> > > If we're holding the lock for this long then there's a possibility of
> > > getting hit by the NMI watchdog also.
> >
> > The NMI watchdog is disabled as these servers are KVM guest.
> >
> > kernel.nmi_watchdog = 0
> > kernel.soft_watchdog = 1
> >
> > >
> > > > Additionally, two other threads were also engaged in scanning folios, one
> > > > with 19 waiters and the other with 15 waiters.
> > > >
> > > > To address this issue under heavy reclaim conditions, we introduced a
> > > > hotfix version of the fix, incorporating cond_resched() in scan_folios().
> > > > Following the application of this hotfix to our servers, the soft lockup
> > > > issue ceased.
> > > >
> > > > ...
> > > >
> > > > --- a/mm/vmscan.c
> > > > +++ b/mm/vmscan.c
> > > > @@ -4367,6 +4367,10 @@ static int scan_folios(struct lruvec *lruvec, struct scan_control *sc,
> > > >
> > > > if (!--remaining || max(isolated, skipped_zone) >= MIN_LRU_BATCH)
> > > > break;
> > > > +
> > > > + spin_unlock_irq(&lruvec->lru_lock);
> > > > + cond_resched();
> > > > + spin_lock_irq(&lruvec->lru_lock);
> > > > }
> > >
> > > Presumably wrapping this with `if (need_resched())' will save some work.
> >
> > good suggestion.
> >
> > >
> > > This lock is held for a reason. I'd like to see an analysis of why
> > > this change is safe.
> >
> > I believe the key point here is whether we can reduce the scope of
> > this lock from:
> >
> > evict_folios
> > spin_lock_irq(&lruvec->lru_lock);
> > scanned = isolate_folios(lruvec, sc, swappiness, &type, &list);
> > scanned += try_to_inc_min_seq(lruvec, swappiness);
> > if (get_nr_gens(lruvec, !swappiness) == MIN_NR_GENS)
> > scanned = 0;
> > spin_unlock_irq(&lruvec->lru_lock);
> >
> > to:
> >
> > evict_folios
> > spin_lock_irq(&lruvec->lru_lock);
> > scanned = isolate_folios(lruvec, sc, swappiness, &type, &list);
> > spin_unlock_irq(&lruvec->lru_lock);
> >
> > spin_lock_irq(&lruvec->lru_lock);
> > scanned += try_to_inc_min_seq(lruvec, swappiness);
> > if (get_nr_gens(lruvec, !swappiness) == MIN_NR_GENS)
> > scanned = 0;
> > spin_unlock_irq(&lruvec->lru_lock);
> >
> > In isolate_folios(), it merely utilizes the min_seq to retrieve the
> > generation without modifying it. If multiple tasks are running
> > evict_folios() concurrently, it seems inconsequential whether min_seq
> > is incremented by one task or another. I'd appreciate Yu's
> > confirmation on this matter.
>
> Hi Yafang,
>
> Thanks for the patch!
>
> Yes, your second analysis is correct -- we can't just drop the lock
> as the original patch does because min_seq can be updated in the mean
> time. If this happens, the gen value becomes invalid, since it's based
> on the expired min_seq:
>
> sort_folio()
> {
> ..
> gen = lru_gen_from_seq(lrugen->min_seq[type]);
> ..
> }
>
> The following might be a better approach (untested):
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 4255619a1a31..6fe53cfa8ef8 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -4365,7 +4365,8 @@ static int scan_folios(struct lruvec *lruvec, struct scan_control *sc,
> skipped_zone += delta;
> }
>
> - if (!--remaining || max(isolated, skipped_zone) >= MIN_LRU_BATCH)
> + if (!--remaining || max(isolated, skipped_zone) >= MIN_LRU_BATCH ||
> + spin_is_contended(&lruvec->lru_lock))
> break;
> }
>
> @@ -4375,7 +4376,8 @@ static int scan_folios(struct lruvec *lruvec, struct scan_control *sc,
> skipped += skipped_zone;
> }
>
> - if (!remaining || isolated >= MIN_LRU_BATCH)
> + if (!remaining || isolated >= MIN_LRU_BATCH ||
> + (scanned && spin_is_contended(&lruvec->lru_lock)))
> break;
> }
A better way might be:
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 4255619a1a31..ac59f064c4e1 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -4367,6 +4367,11 @@ static int scan_folios(struct lruvec *lruvec, struct scan_control *sc,
if (!--remaining || max(isolated, skipped_zone) >= MIN_LRU_BATCH)
break;
+
+ if (need_resched() || spin_is_contended(&lruvec->lru_lock)) {
+ remaining = 0;
+ break;
+ }
}
if (skipped_zone) {
next prev parent reply other threads:[~2024-03-12 22:12 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-03-07 3:19 [PATCH] mm: mglru: Fix soft lockup attributed to scanning folios Yafang Shao
2024-03-07 17:06 ` Andrew Morton
2024-03-08 8:57 ` Yafang Shao
2024-03-12 20:29 ` Yu Zhao
2024-03-12 22:11 ` Yu Zhao [this message]
2024-03-13 2:21 ` Yafang Shao
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZfDTK79iQNlax-h6@google.com \
--to=yuzhao@google.com \
--cc=akpm@linux-foundation.org \
--cc=laoar.shao@gmail.com \
--cc=linux-mm@kvack.org \
--cc=stable@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.