From: Yafang Shao
Date: Wed, 13 Mar 2024 10:21:49 +0800
Subject: Re: [PATCH] mm: mglru: Fix soft lockup attributed to scanning folios
To: Yu Zhao
Cc: Andrew Morton, linux-mm@kvack.org, stable@vger.kernel.org
References: <20240307031952.2123-1-laoar.shao@gmail.com> <20240307090618.50da28040e1263f8af39046f@linux-foundation.org>

On Wed, Mar 13, 2024 at 6:12 AM Yu Zhao wrote:
>
> On Tue, Mar 12, 2024 at 02:29:48PM -0600, Yu Zhao wrote:
> > On Fri, Mar 08, 2024 at 04:57:08PM +0800, Yafang Shao wrote:
> > > On Fri, Mar 8, 2024 at 1:06 AM Andrew Morton wrote:
> > > >
> > > > On Thu, 7 Mar 2024 11:19:52 +0800 Yafang Shao wrote:
> > > >
> > > > > After we enabled mglru on our 384C1536GB production servers, we
> > > > > encountered frequent soft lockups attributed to scanning folios.
> > > > >
> > > > > The soft lockup as follows,
> > > > >
> > > > > ...
> > > > >
> > > > > There were a total of 22 tasks waiting for this spinlock
> > > > > (RDI: ffff99d2b6ff9050):
> > > > >
> > > > >  crash> foreach RU bt | grep -B 8 queued_spin_lock_slowpath | grep "RDI: ffff99d2b6ff9050" | wc -l
> > > > >  22
> > > >
> > > > If we're holding the lock for this long then there's a possibility of
> > > > getting hit by the NMI watchdog also.
> > >
> > > The NMI watchdog is disabled as these servers are KVM guest.
> > >
> > >     kernel.nmi_watchdog = 0
> > >     kernel.soft_watchdog = 1
> > >
> > > >
> > > > > Additionally, two other threads were also engaged in scanning folios, one
> > > > > with 19 waiters and the other with 15 waiters.
> > > > >
> > > > > To address this issue under heavy reclaim conditions, we introduced a
> > > > > hotfix version of the fix, incorporating cond_resched() in scan_folios().
> > > > > Following the application of this hotfix to our servers, the soft lockup
> > > > > issue ceased.
> > > > >
> > > > > ...
> > > > >
> > > > > --- a/mm/vmscan.c
> > > > > +++ b/mm/vmscan.c
> > > > > @@ -4367,6 +4367,10 @@ static int scan_folios(struct lruvec *lruvec, struct scan_control *sc,
> > > > >
> > > > >  			if (!--remaining || max(isolated, skipped_zone) >= MIN_LRU_BATCH)
> > > > >  				break;
> > > > > +
> > > > > +			spin_unlock_irq(&lruvec->lru_lock);
> > > > > +			cond_resched();
> > > > > +			spin_lock_irq(&lruvec->lru_lock);
> > > > >  		}
> > > >
> > > > Presumably wrapping this with `if (need_resched())' will save some work.
> > >
> > > good suggestion.
> > >
> > > >
> > > > This lock is held for a reason.  I'd like to see an analysis of why
> > > > this change is safe.
> > >
> > > I believe the key point here is whether we can reduce the scope of
> > > this lock from:
> > >
> > >   evict_folios
> > >       spin_lock_irq(&lruvec->lru_lock);
> > >       scanned = isolate_folios(lruvec, sc, swappiness, &type, &list);
> > >       scanned += try_to_inc_min_seq(lruvec, swappiness);
> > >       if (get_nr_gens(lruvec, !swappiness) == MIN_NR_GENS)
> > >           scanned = 0;
> > >       spin_unlock_irq(&lruvec->lru_lock);
> > >
> > > to:
> > >
> > >   evict_folios
> > >       spin_lock_irq(&lruvec->lru_lock);
> > >       scanned = isolate_folios(lruvec, sc, swappiness, &type, &list);
> > >       spin_unlock_irq(&lruvec->lru_lock);
> > >
> > >       spin_lock_irq(&lruvec->lru_lock);
> > >       scanned += try_to_inc_min_seq(lruvec, swappiness);
> > >       if (get_nr_gens(lruvec, !swappiness) == MIN_NR_GENS)
> > >           scanned = 0;
> > >       spin_unlock_irq(&lruvec->lru_lock);
> > >
> > > In isolate_folios(), it merely utilizes the min_seq to retrieve the
> > > generation without modifying it. If multiple tasks are running
> > > evict_folios() concurrently, it seems inconsequential whether min_seq
> > > is incremented by one task or another. I'd appreciate Yu's
> > > confirmation on this matter.
> >
> > Hi Yafang,
> >
> > Thanks for the patch!
> >
> > Yes, your second analysis is correct -- we can't just drop the lock
> > as the original patch does because min_seq can be updated in the mean
> > time. If this happens, the gen value becomes invalid, since it's based
> > on the expired min_seq:
> >
> >   sort_folio()
> >   {
> >     ..
> >     gen = lru_gen_from_seq(lrugen->min_seq[type]);
> >     ..
> >   }
> >
> > The following might be a better approach (untested):
> >
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index 4255619a1a31..6fe53cfa8ef8 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -4365,7 +4365,8 @@ static int scan_folios(struct lruvec *lruvec, struct scan_control *sc,
> >  				skipped_zone += delta;
> >  			}
> >
> > -			if (!--remaining || max(isolated, skipped_zone) >= MIN_LRU_BATCH)
> > +			if (!--remaining || max(isolated, skipped_zone) >= MIN_LRU_BATCH ||
> > +			    spin_is_contended(&lruvec->lru_lock))
> >  				break;
> >  		}
> >
> > @@ -4375,7 +4376,8 @@ static int scan_folios(struct lruvec *lruvec, struct scan_control *sc,
> >  			skipped += skipped_zone;
> >  		}
> >
> > -		if (!remaining || isolated >= MIN_LRU_BATCH)
> > +		if (!remaining || isolated >= MIN_LRU_BATCH ||
> > +		    (scanned && spin_is_contended(&lruvec->lru_lock)))
> >  			break;
> >  	}
>
> A better way might be:
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 4255619a1a31..ac59f064c4e1 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -4367,6 +4367,11 @@ static int scan_folios(struct lruvec *lruvec, struct scan_control *sc,
>
>  			if (!--remaining || max(isolated, skipped_zone) >= MIN_LRU_BATCH)
>  				break;
> +
> +			if (need_resched() || spin_is_contended(&lruvec->lru_lock)) {
> +				remaining = 0;
> +				break;
> +			}
>  		}
>
>  		if (skipped_zone) {

It is better. Thanks for your suggestion.
I will verify it on our production servers, which may take several days.

-- 
Regards
Yafang
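
For reference, the control flow of the last proposal can be modelled in
user space. The following is only a rough sketch, not kernel code:
scan_sketch(), should_yield(), struct scan_env and BATCH are hypothetical
names made up for illustration, should_yield() stands in for
"need_resched() || spin_is_contended(...)", and BATCH is an arbitrary
stand-in for MIN_LRU_BATCH. It shows how setting remaining to 0 before the
inner break also ends the outer per-zone loop, so the caller still holds
the lock throughout and simply gets a shorter batch.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_ZONES 3
#define BATCH    64	/* illustrative stand-in for MIN_LRU_BATCH */

/* Hypothetical model of "someone else wants the CPU or the lock". */
struct scan_env {
	int contended_after;	/* yield once this many items were scanned; -1 = never */
};

/* Stand-in for need_resched() || spin_is_contended(&lruvec->lru_lock). */
bool should_yield(const struct scan_env *env, int scanned)
{
	return env->contended_after >= 0 && scanned >= env->contended_after;
}

/*
 * Walk up to `budget` items spread over NR_ZONES lists and return how many
 * were scanned. The sketch isolates every item it scans.
 */
int scan_sketch(const struct scan_env *env, const int zone_len[NR_ZONES],
		int budget, int *isolated_out)
{
	int remaining = budget;
	int scanned = 0, isolated = 0;

	assert(budget > 0);

	for (int zone = 0; zone < NR_ZONES; zone++) {
		for (int i = 0; i < zone_len[zone]; i++) {
			scanned++;
			isolated++;

			/* Existing exit: budget used up or batch is full. */
			if (!--remaining || isolated >= BATCH)
				break;

			/* Proposed exit: give up the rest of the budget. */
			if (should_yield(env, scanned)) {
				remaining = 0;
				break;
			}
		}

		/* remaining == 0 stops the outer loop as well. */
		if (!remaining || isolated >= BATCH)
			break;
	}

	*isolated_out = isolated;
	return scanned;
}
```

With three lists of 10 items each and no contention the sketch scans all
30 items; with contention reported after the 4th item it stops at 4 and
does not move on to the next list.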