From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6627BC52D7C for ; Sun, 18 Aug 2024 07:07:41 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 9EBEB6B0447; Sun, 18 Aug 2024 03:07:40 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 99B836B0448; Sun, 18 Aug 2024 03:07:40 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 86DD46B0449; Sun, 18 Aug 2024 03:07:40 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0015.hostedemail.com [216.40.44.15]) by kanga.kvack.org (Postfix) with ESMTP id 6A88B6B0447 for ; Sun, 18 Aug 2024 03:07:40 -0400 (EDT) Received: from smtpin23.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay09.hostedemail.com (Postfix) with ESMTP id 04F4880A74 for ; Sun, 18 Aug 2024 07:07:39 +0000 (UTC) X-FDA: 82464485880.23.9F991C9 Received: from mail-yb1-f181.google.com (mail-yb1-f181.google.com [209.85.219.181]) by imf07.hostedemail.com (Postfix) with ESMTP id 3801440014 for ; Sun, 18 Aug 2024 07:07:38 +0000 (UTC) Authentication-Results: imf07.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b="gtTIw/kC"; dmarc=pass (policy=none) header.from=gmail.com; spf=pass (imf07.hostedemail.com: domain of laoar.shao@gmail.com designates 209.85.219.181 as permitted sender) smtp.mailfrom=laoar.shao@gmail.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1723964773; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=dz/w/7aMh+rwPp/zbjE/lpS1G5EmTh/C0/ZB753IjuE=; b=dSNSZtcn1bjcACqmHD31Ubfd7CFPEYonfKR0JWORKDPotrxUEz2ahjRTqOJzUwVmYeSJtt k4NmvoRapL2/M5AUPc7Ry9TH1WCMb+5WHhMzPMsvTugoJvHNRPZT/zATALLWnLXVdxyHRG 3glJwrO3XJxG5DXFeD5eAub+ILFjZI8= ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1723964773; a=rsa-sha256; cv=none; b=B5aEsR35gaCi+Ml/k3YVX0hopn2DmBYYG2O/75PJj2pVl1jRiE5hxbZU9l7su8X4ChiuoZ wp7xiuE3ISxLa9mkGusoPqhbhRwSINbkE6npnihJDdUsn1BRnwtm31f/lFi2dIkj/C9h5f 2/eP+pLRxIQgxrXChbnKukzBKOR6FCE= ARC-Authentication-Results: i=1; imf07.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b="gtTIw/kC"; dmarc=pass (policy=none) header.from=gmail.com; spf=pass (imf07.hostedemail.com: domain of laoar.shao@gmail.com designates 209.85.219.181 as permitted sender) smtp.mailfrom=laoar.shao@gmail.com Received: by mail-yb1-f181.google.com with SMTP id 3f1490d57ef6-e13d5cbc067so877917276.2 for ; Sun, 18 Aug 2024 00:07:38 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1723964857; x=1724569657; darn=kvack.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=dz/w/7aMh+rwPp/zbjE/lpS1G5EmTh/C0/ZB753IjuE=; b=gtTIw/kCd6ZgUIbQ68H4lnP7YRaPwFyZmKCLSYe3NWn2jpevq38puJjygZxAu8/IXw ALx6Igw4LOSp68sxpNocqGCsQbdsvNZHJ45idhwRNxgSOoeh6KaAsN9Uj67YMe04YYg/ sqa3Rkwokj0kpOxTA9bjegt/SpzsaMezHSd+y7UdCwDIQqU9Woe3crgdc+zh10nJSJbl RwEC9BSTWPuR7Ghr5xhpZjkE6COpaeBycR5G9m7oEDiUGyN2MEdR1eS59GM+iHu4VELo kIn23DYt5zD2kdIzzU8iLbRZ1ZDowTCrzzU28izLx27ks8Etg6qhqxKKmV0KrmtoFZ2s DlzA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1723964857; x=1724569657; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=dz/w/7aMh+rwPp/zbjE/lpS1G5EmTh/C0/ZB753IjuE=; b=nCaWGmwBwAcln3Pdec2VEp5Bub/7vL0Fa8cNJonTaX64W1/81AOUiW9QnpaKgx+vbQ gy7Atdnqjr1ap7N11F1pPukFNt1YzRF81hsSpX0AoAopOgbNA69J3HQucKlStL5NeUcg ctI/qAlBJr55vMyA2lc11s0dQ2u3vlVVCxltpNu7XhU1JgaxLm+8pYh2oaQm9iVvAYqD gQW+lr4rc/Z1gHEU37wRncDJ3zNDJu67obEmKO1yGFmATVka/Vc+sntCEe/yQzJAoRes CxRskBw00PdwHj0VnfmxPvwpVF+/4z2o2MNpgFr5v17G/X/wMG/nlc9s0Tnm3MGbr6GY Bf0A== X-Forwarded-Encrypted: i=1; AJvYcCVZcDoSEkfc7mnxsouULW5xM4HCjkB2iLLJf4epRq0u3YK7YS0e9Mapkak4/2dT0d4luSxl99U5emefIM6DPMWYf18= X-Gm-Message-State: AOJu0YzJgYTjZfmwh96LohAEHrZhD2Mw8YW5HJxvIj2hVlazurl2e4jv I87V3N29kvVHEEY82avNvl6MLR2cN6xtJPMznEmsFvHDSrFu727TS6azciyvCc5BrOYcWZJyjql 2bOQLAE7CBz10Kp3JHDQnWT77Zpo= X-Google-Smtp-Source: AGHT+IEOfwtacAGNpo2AtWA6BFDlLTMS64eflyJpNpn1WwCNEuZFsHYCTVY07cswdoiGqINujJOB8B3xP3Y1YgXRurY= X-Received: by 2002:a25:d801:0:b0:e13:c739:5b56 with SMTP id 3f1490d57ef6-e13c7395d0bmr5831421276.40.1723964857145; Sun, 18 Aug 2024 00:07:37 -0700 (PDT) MIME-Version: 1.0 References: <20240817062449.21164-1-21cnbao@gmail.com> <20240817062449.21164-5-21cnbao@gmail.com> In-Reply-To: From: Yafang Shao Date: Sun, 18 Aug 2024 15:07:00 +0800 Message-ID: Subject: Re: [PATCH v3 4/4] mm: prohibit NULL deference exposed for unsupported non-blockable __GFP_NOFAIL To: Barry Song <21cnbao@gmail.com> Cc: akpm@linux-foundation.org, linux-mm@kvack.org, 42.hyeyoo@gmail.com, cl@linux.com, hailong.liu@oppo.com, hch@infradead.org, iamjoonsoo.kim@lge.com, mhocko@suse.com, penberg@kernel.org, rientjes@google.com, roman.gushchin@linux.dev, torvalds@linux-foundation.org, urezki@gmail.com, v-songbaohua@oppo.com, vbabka@suse.cz, virtualization@lists.linux.dev, Lorenzo Stoakes , Kees Cook , =?UTF-8?Q?Eugenio_P=C3=A9rez?= , Jason Wang , Maxime Coquelin , "Michael S. Tsirkin" , Xuan Zhuo Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Rspamd-Server: rspam07 X-Rspamd-Queue-Id: 3801440014 X-Stat-Signature: wt39c6sohmfirrbzcbz6bbs6mchoupx7 X-Rspam-User: X-HE-Tag: 1723964858-28946 X-HE-Meta: U2FsdGVkX1+fFDStXnuXaXL8/pL0i596kVRIe9VD+PF54Oug4r7aZ4/boFS5zMzz/r/7/KZLdS6oTPi/zRSxxwcW7dk4xDMTjfy5XS9Cmaj2O0EWdwEhgHoD1lP/s+G3DPY9n8/2el+UgPlLmiecmF7ZGoZJrr0Tx+YxChTDXOWuDVCWux+OVVNilICRyF8LQ7WSlYaaNBQmKVmey1xNRgVEFGEezz2E1Ku2CvF6ILiMrQdPb6eGSjc6rp8MuE6iyVJGvqkWD/Qhj7B2LfB/vEzbtajpRIc1jLkEVKPM/Gq8Fz+sLN2yB0RN+pqRGIvU3Oe08Dbb5LPhgrJ+kPfjI3ecVRlOzJBMu8NKc/6WzS7yL+DRo9pzZU0Mrng9Hlm8LYbGP0csDFwHOWLoguvR/j3VYu5NTjeJL7VTuEnB9rEIrIIT3+WBJWR81zdjtowY85y52J57lqeINfrk+wlrCHZs67lv9pJhwkRh5KMtVGZt8DXJMz/rThC3pAKhVw7Z2Q95YGy61+qkbfeyR9c6w6lN1YyxpKqX7AhKo1zHUrj19useCsZpIp35eBX8ABpWmAirgLNlOZEKUEY9Pwi/cKs1ayQL/DWG4a/OF2iDBY+utSmwcmfLVqeXSshVAIyHpzFAWJbAe5hDvctLDtvUJ7DZR29WspexLMJs8McuWXn7xqklv/yZkgB/zyJAA4qxZ46Kthg+U8anByqomfOIqIJCmNbbgCnFS4TlfP5sKXnYllk4dG/IKGzBvNOPvbek3sSVAHY7FSRHILxnBNsP4d5w0cr0y0iowNnskH0l71320oFOP2RVWiosiLCgqE0bc094Z6uWet7cWvi1RKmgmVF9KYu9pDVAjmBjl1Pk2pzFEey/jcLyHP2oG6PSXrU/Vt/pB1pfFs1Eyebn3SvE+yLT3HG1Ue3/cSfPX8fg70+8zk95VwvoPggZMn8vL92HqZ8Y0YHZeGEdpDkbASe 46n8wvUi noR/QF2DvRniR2VxgwsYO2yO4jRfnZFk5F6FZhF6mXOpucjepT47qZisDM7SfnUrksnH+U5FXfkYPuy/c62nvG66zc0yxwUFnOy8iSFq9swCrel+yBfMDuWD4Awfqw8+GCl+rLVKOLhB7EnpFS4gIDGgeUr1PDLJeJ6PeJgKJgTct53d/uv75icfLkNHo+huwDJyOicihK3I/jCM58+hUbKqRSbHyH7mT3jdk8JNlRsG2GJV6ZXITxqaZdTPo86NRByggz3Qv162PtEk+2M8vVfgojFGv7TCM52mwvhZk2doLnrK3GU4IRnXz0CTCA8/uiRSLJK1lyrAEHPxkvtt6WXzqzeAzg78IpL3UI26ikDY27EwgEcFL+cCqBDktz4HBTgtLSvAoHkxwA154WqoRqVl0rZUKSaL7ME7Ycs22Wq0VJS45eP4bWfm2cSX6592UkCleTarVT6hGI6/7KiHFBg6as8ivppNcqxcFbOMFfpa1pqBTpnjvan3YXhaiQffMMsqmn65tfi+vsL+jiyjmAP8IVf8cTNYZQQeX4h+iUws6VmGy3Q0C6e/tZbSdgiFc5vpNul7nSn4usAXu51rb4Q/md5F8TbXmt5lAWSeSB748oGmAeNRUS5g5gKhScJuY0CDO6vdC3JNrbC/EmVw0po8tONKt+2ngdSrcV2qYovir1UU+Y79eQrBVy1d3x/vr2MwmGZHNKNhLSViPsp+I+wla8BL0yv4edJ2h+WegI8vMfnOhwj9jtOPQ0pxelUjf8JA/QwgVtOEBiAr5bVoi5jek9mEF2m57I3YnvqRpx68yo/OHYt3c90s/KF3CK0tZ7sZTnwquSHp01EEOfibapy2gXhBDgymVpQs7mc6JJhize3kiAF6jWVPeGouijYMeYy8PwyoK9sg6OKtU42q1b6R6cyWRIM3rP44Do9FwWulJx6uIPciCW1//INitIeQu7+u+TSI8zR4HzjR8Y12p5AjrYYYT w+nIcoC/ nU7PRIigre5pELr/pLNcHdS0xji3+jJDID7V3bk03ha06e1Qmnw1cg== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Sun, Aug 18, 2024 at 2:45=E2=80=AFPM Barry Song <21cnbao@gmail.com> wrot= e: > > On Sun, Aug 18, 2024 at 6:27=E2=80=AFPM Barry Song <21cnbao@gmail.com> wr= ote: > > > > On Sun, Aug 18, 2024 at 5:51=E2=80=AFPM Yafang Shao wrote: > > > > > > On Sun, Aug 18, 2024 at 11:48=E2=80=AFAM Barry Song <21cnbao@gmail.co= m> wrote: > > > > > > > > On Sun, Aug 18, 2024 at 2:55=E2=80=AFPM Yafang Shao wrote: > > > > > > > > > > On Sat, Aug 17, 2024 at 2:25=E2=80=AFPM Barry Song <21cnbao@gmail= .com> wrote: > > > > > > > > > > > > From: Barry Song > > > > > > > > > > > > When users allocate memory with the __GFP_NOFAIL flag, they mig= ht > > > > > > incorrectly use it alongside GFP_ATOMIC, GFP_NOWAIT, etc. This= kind of > > > > > > non-blockable __GFP_NOFAIL is not supported and is pointless. = If we > > > > > > attempt and still fail to allocate memory for these users, we h= ave two > > > > > > choices: > > > > > > > > > > > > 1. We could busy-loop and hope that some other direct recla= mation or > > > > > > kswapd rescues the current process. However, this is unreli= able > > > > > > and could ultimately lead to hard or soft lockups, > > > > > > > > > > That can occur even if we set both __GFP_NOFAIL and > > > > > __GFP_DIRECT_RECLAIM, right? So, I don't believe the issue is rel= ated > > > > > to setting __GFP_DIRECT_RECLAIM; rather, it stems from the flawed > > > > > design of __GFP_NOFAIL itself. > > > > > > > > the point of GFP_NOFAIL is that it won't fail and its user won't ch= eck > > > > the return value. without direct_reclamation, it is sometimes impos= sible. > > > > but with direct reclamation, users constantly wait and finally they= can > > > > > > So, what exactly is the difference between 'constantly waiting' and > > > 'busy looping'? Could you please clarify? Also, why can't we > > > 'constantly wait' when __GFP_DIRECT_RECLAIM is not set? > > > > I list two options in changelog > > 1: busy loop 2. bug_on. I am actually fine with either one. either one = is > > better than the existing code. but returning null in the current code > > is definitely wrong. > > > > 1 somehow has the attempt to make __GFP_NOFAIL without direct_reclamati= on > > legal. so it is a bit suspicious going in the wrong direction. > > > > busy-loop is that you are not reclaiming memory you are not sleeping. > > cpu is constantly working and busy, so it might result in a lockup, eit= her > > soft lockup or hard lockup. Thanks for the clarification. That can be avoided by a simple cond_resched() if the hard lockup or softlockup is the main issue ;) > > > > with direct_reclamation, wait is the case you can sleep. it is not hold= ing > > cpu, not a busy loop. in rare case, users might end in endless wait, > > but it matches the doc of __GFP_NOFAIL, never return till memory > > is gotten (the current code is implemented in this way unless users > > incorrectly combine __GFP_NOFAIL with aotmic/nowait etc.) > > > > and the essential difference between "w/ and w/o direct_reclaim": with > direct reclaim, the user is actively reclaiming memory to rescue itself > by all kinds of possible ways(compact, oom, reclamation), while without > direct reclamation, it can do nothing and just loop (busy-loop). It can wake up kswapd, which can then reclaim memory. If kswapd can't keep up, the system is likely under heavy memory pressure. In such a case, it makes little difference whether __GFP_DIRECT_RECLAIM is set or not. For reference, see the old issue: https://lore.kernel.org/lkml/d9802b6a-949b-b327-c4a6-3dbca485ec20@gmx.com/. I believe the core issue persists, and the design of __GFP_NOFAIL exacerbates it. By the way, I believe we could trigger an asynchronous OOM kill in the case without direct reclaim to avoid busy looping. > > > note, long-term we won't expose __GFP_NOFAIL any more. we > > will only expose GFP_NOFAIL which enforces Blockable. I am > > quite busy on other issues, so this won't happen in a short time. > > > > > > > > > get memory. if you read the doc of __GFP_NOFAIL you will find it. > > > > it is absolutely clearly documented. > > > > > > > > > > > > > > > which might not > > > > > > be well supported by some architectures. > > > > > > > > > > > > 2. We could use BUG_ON to trigger a reliable system crash, = avoiding > > > > > > exposing NULL dereference. > > > > > > > > > > > > Neither option is ideal, but both are improvements over the exi= sting code. > > > > > > This patch selects the second option because, with the introduc= tion of > > > > > > scoped API and GFP_NOFAIL=E2=80=94capable of enforcing direct r= eclamation for > > > > > > nofail users(which is in my plan), non-blockable nofail allocat= ions will > > > > > > no longer be possible. > > > > > > > > > > > > Signed-off-by: Barry Song > > > > > > Cc: Michal Hocko > > > > > > Cc: Uladzislau Rezki (Sony) > > > > > > Cc: Christoph Hellwig > > > > > > Cc: Lorenzo Stoakes > > > > > > Cc: Christoph Lameter > > > > > > Cc: Pekka Enberg > > > > > > Cc: David Rientjes > > > > > > Cc: Joonsoo Kim > > > > > > Cc: Vlastimil Babka > > > > > > Cc: Roman Gushchin > > > > > > Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com> > > > > > > Cc: Linus Torvalds > > > > > > Cc: Kees Cook > > > > > > Cc: "Eugenio P=C3=A9rez" > > > > > > Cc: Hailong.Liu > > > > > > Cc: Jason Wang > > > > > > Cc: Maxime Coquelin > > > > > > Cc: "Michael S. Tsirkin" > > > > > > Cc: Xuan Zhuo > > > > > > --- > > > > > > mm/page_alloc.c | 10 +++++----- > > > > > > 1 file changed, 5 insertions(+), 5 deletions(-) > > > > > > > > > > > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > > > > > > index d2c37f8f8d09..fb5850ecd3ae 100644 > > > > > > --- a/mm/page_alloc.c > > > > > > +++ b/mm/page_alloc.c > > > > > > @@ -4399,11 +4399,11 @@ __alloc_pages_slowpath(gfp_t gfp_mask, = unsigned int order, > > > > > > */ > > > > > > if (gfp_mask & __GFP_NOFAIL) { > > > > > > /* > > > > > > - * All existing users of the __GFP_NOFAIL are b= lockable, so warn > > > > > > - * of any new users that actually require GFP_N= OWAIT > > > > > > + * All existing users of the __GFP_NOFAIL are b= lockable > > > > > > + * otherwise we introduce a busy loop with insi= de the page > > > > > > + * allocator from non-sleepable contexts > > > > > > */ > > > > > > - if (WARN_ON_ONCE_GFP(!can_direct_reclaim, gfp_m= ask)) > > > > > > - goto fail; > > > > > > + BUG_ON(!can_direct_reclaim); > > > > > > > > > > I'm not in favor of using BUG_ON() here, as many call sites alrea= dy > > > > > handle the return value of __GFP_NOFAIL. > > > > > > > > > > > > > it is not correct to handle the return value of __GFP_NOFAIL. > > > > if you check the ret, don't use __GFP_NOFAIL. > > > > > > If so, you have many code changes to make in the linux kernel ;) > > > > > > > Please list those code using __GFP_NOFAIL and check the result > > might fail, we should get them fixed. This is insane. NOFAIL means > > no fail. You can find some instances with grep commands, but there's no reliable way to capture them all with a single command. Here are a few examples: // drivers/infiniband/hw/cxgb4/mem.c skb =3D alloc_skb(wr_len, GFP_KERNEL | __GFP_NOFAIL); if (!skb) return -ENOMEM; // fs/xfs/libxfs/xfs_dir2.c args =3D kzalloc(sizeof(*args), GFP_KERNEL | __GFP_NOFAIL); if (!args) return -ENOMEM; -- Regards Yafang