All of lore.kernel.org
 help / color / mirror / Atom feed
From: Yu Zhao <yuzhao@google.com>
To: Yafang Shao <laoar.shao@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org, stable@vger.kernel.org
Subject: Re: [PATCH] mm: mglru: Fix soft lockup attributed to scanning folios
Date: Tue, 12 Mar 2024 14:29:48 -0600	[thread overview]
Message-ID: <ZfC7PO0-3Kg88Wj3@google.com> (raw)
In-Reply-To: <CALOAHbAsyT9ms739DLZeAf88XsrxjJgm1D8wr+dKNFxROOQFFw@mail.gmail.com>

On Fri, Mar 08, 2024 at 04:57:08PM +0800, Yafang Shao wrote:
> On Fri, Mar 8, 2024 at 1:06 AM Andrew Morton <akpm@linux-foundation.org> wrote:
> >
> > On Thu,  7 Mar 2024 11:19:52 +0800 Yafang Shao <laoar.shao@gmail.com> wrote:
> >
> > > After we enabled mglru on our 384C1536GB production servers, we
> > > encountered frequent soft lockups attributed to scanning folios.
> > >
> > > The soft lockup as follows,
> > >
> > > ...
> > >
> > > There were a total of 22 tasks waiting for this spinlock
> > > (RDI: ffff99d2b6ff9050):
> > >
> > >  crash> foreach RU bt | grep -B 8  queued_spin_lock_slowpath |  grep "RDI: ffff99d2b6ff9050" | wc -l
> > >  22
> >
> > If we're holding the lock for this long then there's a possibility of
> > getting hit by the NMI watchdog also.
> 
> The NMI watchdog is disabled as these servers are KVM guest.
> 
>     kernel.nmi_watchdog = 0
>     kernel.soft_watchdog = 1
> 
> >
> > > Additionally, two other threads were also engaged in scanning folios, one
> > > with 19 waiters and the other with 15 waiters.
> > >
> > > To address this issue under heavy reclaim conditions, we introduced a
> > > hotfix version of the fix, incorporating cond_resched() in scan_folios().
> > > Following the application of this hotfix to our servers, the soft lockup
> > > issue ceased.
> > >
> > > ...
> > >
> > > --- a/mm/vmscan.c
> > > +++ b/mm/vmscan.c
> > > @@ -4367,6 +4367,10 @@ static int scan_folios(struct lruvec *lruvec, struct scan_control *sc,
> > >
> > >                       if (!--remaining || max(isolated, skipped_zone) >= MIN_LRU_BATCH)
> > >                               break;
> > > +
> > > +                     spin_unlock_irq(&lruvec->lru_lock);
> > > +                     cond_resched();
> > > +                     spin_lock_irq(&lruvec->lru_lock);
> > >               }
> >
> > Presumably wrapping this with `if (need_resched())' will save some work.
> 
> good suggestion.
> 
> >
> > This lock is held for a reason.  I'd like to see an analysis of why
> > this change is safe.
> 
> I believe the key point here is whether we can reduce the scope of
> this lock from:
> 
>   evict_folios
>       spin_lock_irq(&lruvec->lru_lock);
>       scanned = isolate_folios(lruvec, sc, swappiness, &type, &list);
>       scanned += try_to_inc_min_seq(lruvec, swappiness);
>       if (get_nr_gens(lruvec, !swappiness) == MIN_NR_GENS)
>           scanned = 0;
>       spin_unlock_irq(&lruvec->lru_lock);
> 
> to:
> 
>   evict_folios
>       spin_lock_irq(&lruvec->lru_lock);
>       scanned = isolate_folios(lruvec, sc, swappiness, &type, &list);
>       spin_unlock_irq(&lruvec->lru_lock);
> 
>       spin_lock_irq(&lruvec->lru_lock);
>       scanned += try_to_inc_min_seq(lruvec, swappiness);
>       if (get_nr_gens(lruvec, !swappiness) == MIN_NR_GENS)
>           scanned = 0;
>       spin_unlock_irq(&lruvec->lru_lock);
> 
> In isolate_folios(), it merely utilizes the min_seq to retrieve the
> generation without modifying it. If multiple tasks are running
> evict_folios() concurrently, it seems inconsequential whether min_seq
> is incremented by one task or another. I'd appreciate Yu's
> confirmation on this matter.

Hi Yafang,

Thanks for the patch!

Yes, your second analysis is correct -- we can't just drop the lock
as the original patch does because min_seq can be updated in the mean
time. If this happens, the gen value becomes invalid, since it's based
on the expired min_seq:

  sort_folio()
  {
    ..
    gen = lru_gen_from_seq(lrugen->min_seq[type]);
    ..
  }

The following might be a better approach (untested):

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 4255619a1a31..6fe53cfa8ef8 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -4365,7 +4365,8 @@ static int scan_folios(struct lruvec *lruvec, struct scan_control *sc,
 				skipped_zone += delta;
 			}
 
-			if (!--remaining || max(isolated, skipped_zone) >= MIN_LRU_BATCH)
+			if (!--remaining || max(isolated, skipped_zone) >= MIN_LRU_BATCH ||
+			    spin_is_contended(&lruvec->lru_lock))
 				break;
 		}
 
@@ -4375,7 +4376,8 @@ static int scan_folios(struct lruvec *lruvec, struct scan_control *sc,
 			skipped += skipped_zone;
 		}
 
-		if (!remaining || isolated >= MIN_LRU_BATCH)
+		if (!remaining || isolated >= MIN_LRU_BATCH ||
+		    (scanned && spin_is_contended(&lruvec->lru_lock)))
 			break;
 	}
 

  reply	other threads:[~2024-03-12 20:29 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-03-07  3:19 [PATCH] mm: mglru: Fix soft lockup attributed to scanning folios Yafang Shao
2024-03-07 17:06 ` Andrew Morton
2024-03-08  8:57   ` Yafang Shao
2024-03-12 20:29     ` Yu Zhao [this message]
2024-03-12 22:11       ` Yu Zhao
2024-03-13  2:21         ` Yafang Shao

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZfC7PO0-3Kg88Wj3@google.com \
    --to=yuzhao@google.com \
    --cc=akpm@linux-foundation.org \
    --cc=laoar.shao@gmail.com \
    --cc=linux-mm@kvack.org \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.