From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0CB74C52D7C for ; Sun, 18 Aug 2024 06:45:24 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 93D518D00DB; Sun, 18 Aug 2024 02:45:23 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 8EBD68D00B8; Sun, 18 Aug 2024 02:45:23 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 7B3F38D00DB; Sun, 18 Aug 2024 02:45:23 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0010.hostedemail.com [216.40.44.10]) by kanga.kvack.org (Postfix) with ESMTP id 5F3F88D00B8 for ; Sun, 18 Aug 2024 02:45:23 -0400 (EDT) Received: from smtpin20.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay05.hostedemail.com (Postfix) with ESMTP id D527C40A31 for ; Sun, 18 Aug 2024 06:45:22 +0000 (UTC) X-FDA: 82464429684.20.3AB6CBF Received: from mail-vk1-f180.google.com (mail-vk1-f180.google.com [209.85.221.180]) by imf04.hostedemail.com (Postfix) with ESMTP id 1C52740006 for ; Sun, 18 Aug 2024 06:45:19 +0000 (UTC) Authentication-Results: imf04.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=QieaN7Hq; spf=pass (imf04.hostedemail.com: domain of 21cnbao@gmail.com designates 209.85.221.180 as permitted sender) smtp.mailfrom=21cnbao@gmail.com; dmarc=pass (policy=none) header.from=gmail.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1723963444; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=FGe0T7l9uL3Dzm+n7R4FudLcaAnAIzAU+FrtXP7P/TE=; b=K9mdxYS+nxJgxy+bg2VGng4+5E88vewCzDh7rUhpsLMWb0G+T5dhTHcMCv/B2trFX2qp1+ 5OW/Azca+VsfDtAuCDAoqEEY3EnjxVNu+589nxaqPUc4uo/i/ZeNi0frVRDx05jyQfUQkS q7ll1feRoZmXjpZ8siJQn2EYAV+MqcY= ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1723963444; a=rsa-sha256; cv=none; b=XLWt+8KL/gyZsLFPrBUNeRMPKnwHNvcd2MbgC7qVWHP00QDE9QifHjETRMEfAFgyrEQU9q oHDL5CMUUc2ljVJMhOpWeI7isoUMuJKba7tiILO3cHIaFTv59isfHasWMJpqWKEGzXw/pl rhEDXqYk192P2qHJr5OPnH3PqBw5NfQ= ARC-Authentication-Results: i=1; imf04.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=QieaN7Hq; spf=pass (imf04.hostedemail.com: domain of 21cnbao@gmail.com designates 209.85.221.180 as permitted sender) smtp.mailfrom=21cnbao@gmail.com; dmarc=pass (policy=none) header.from=gmail.com Received: by mail-vk1-f180.google.com with SMTP id 71dfb90a1353d-4f6b7250d6dso1310254e0c.0 for ; Sat, 17 Aug 2024 23:45:19 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1723963519; x=1724568319; darn=kvack.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=FGe0T7l9uL3Dzm+n7R4FudLcaAnAIzAU+FrtXP7P/TE=; b=QieaN7HqBV7k2/3oROiF+013bKj+0uA8gspGhV3WY32T55FcnCP1ty6yk8WmShNwfX 5BBv/qhcmiuDa1NCG++F8+ar7k6n5+Ka3kvx1lH8jQZs+grJs+N+4U4NKrwa++2/pkl+ oFjTKwXj6yTbr2Fwtf+NaTsiGKLL9pfBmCc9qbt9qFfWctkBNJkJe3jID3TB9l1vkDPz 63AmACED187S2vW8ump2/fqUD1Dg+1WXE9nQ2uhDFuHpsJFrkFjJEIROytb+D+fl5/Z3 ItsopVG/k/kAsST1BB5eyp7PB1dhcv+icIQIK5VPIaVVmW7jX74p2vykOdewbSNfmEHG Gzyg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1723963519; x=1724568319; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=FGe0T7l9uL3Dzm+n7R4FudLcaAnAIzAU+FrtXP7P/TE=; b=s7iPIY1Ol5a6PtU6lnlkBHpM8iYlNP0B44rJE9xdK2MQSrMs6Xtu1mq86VWnVW/aVs Xe9SiBgE/Id14wavgxKRiplY2z+RSTy6fW8HDuhLzWsdaNR8myIV7M2QbJYYb47wp6Mv qKmbHHKfvy/S4CMMWmblIwqup8Dtxn66ioUAJXZMBV+EibzusMyE86vjMfVTQn7o8thZ UjwUs3m7SKI00LoDiUZzAo4S2dRp1XGbfT/BT3mSTNVWxq2h8Nz/Cqo5XRdN+zfHQP00 ATai9L1kzAcKFtmst95Oj/iBGKl3mPOWxDfIfwO1qbhIjkRD+gpJZ+liHXqpLC4PUKlz hzFQ== X-Forwarded-Encrypted: i=1; AJvYcCVl2TRE+zWOnl84+BCbkVIhxHj7wVtHXGjmeshQ/Uh4NH0OYnWOPY1QP+b3Pl/zSq76gc7JhSRSYSSejq8fVGOv8Ok= X-Gm-Message-State: AOJu0Yyh5eKRnVYXo47+tw0brF0+PT/VA2EvfQ2tFpcaIurueHfFblYJ reDB8g6Ae3ZRMCzAw5UVyZTe4B8fIGURtnDx9BJPNGuHz9aT0DLCaJX3scvSbslEYM99RBY0HAF jegZgx3bWHqwl5WrrRND4OV9QSr8= X-Google-Smtp-Source: AGHT+IEpxFT80QCZFL31v0q3iPKPZK65nLiPytLgDBTnvcLISCQg2QqNS0aoCVe1Zn0+UxiRUAweq1K8cq2A1j9nC2U= X-Received: by 2002:a05:6122:2a07:b0:4f6:a85d:38b3 with SMTP id 71dfb90a1353d-4fc6ca10d25mr9619582e0c.13.1723963519091; Sat, 17 Aug 2024 23:45:19 -0700 (PDT) MIME-Version: 1.0 References: <20240817062449.21164-1-21cnbao@gmail.com> <20240817062449.21164-5-21cnbao@gmail.com> In-Reply-To: From: Barry Song <21cnbao@gmail.com> Date: Sun, 18 Aug 2024 18:45:08 +1200 Message-ID: Subject: Re: [PATCH v3 4/4] mm: prohibit NULL deference exposed for unsupported non-blockable __GFP_NOFAIL To: Yafang Shao Cc: akpm@linux-foundation.org, linux-mm@kvack.org, 42.hyeyoo@gmail.com, cl@linux.com, hailong.liu@oppo.com, hch@infradead.org, iamjoonsoo.kim@lge.com, mhocko@suse.com, penberg@kernel.org, rientjes@google.com, roman.gushchin@linux.dev, torvalds@linux-foundation.org, urezki@gmail.com, v-songbaohua@oppo.com, vbabka@suse.cz, virtualization@lists.linux.dev, Lorenzo Stoakes , Kees Cook , =?UTF-8?Q?Eugenio_P=C3=A9rez?= , Jason Wang , Maxime Coquelin , "Michael S. Tsirkin" , Xuan Zhuo Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Rspam-User: X-Rspamd-Server: rspam04 X-Rspamd-Queue-Id: 1C52740006 X-Stat-Signature: urkggz95kxc5m61spkotqt1113mooi6h X-HE-Tag: 1723963519-889203 X-HE-Meta: U2FsdGVkX1++dV+x7OD8twHoeBM0EPIQuDdcmJbtyMFcMUl0jBNs+tVGaBc20PHFVKIRlznL4O0yrMZG0lKTmeugCJh8+AxE/Mq35U4IaUoz1PMz5HnEext/XR+8/qSuC7ha48BUa9MXCikXnBH/tX4pvobWycuo6qemwDslQHzKhetXNiiNqnekAv3tKvKWwIzD38Z9g0RPdEslThthNStEKlvdda8TeOnEN2Iwg0PVemW1LWXHVAJFtI8d8S/BadI0AhHFQyyt41Xk29BzwQmPKGTv5P+dBGjnqirHkXhcD/I830DLwsJjcY6yQvkjUXkIKXqMjV2NSCvrUCLHI2bXPI27TN1wRvswd78WAWSLQBbXGWfDB9Tfq386Vm5cjiaHnYc0QaMFcQD7ygp67U3TwNhC6b9C1aVqMu20ZJ2KU9QmWlnKB2njSicwu9IzGP9rbQnMoQVFQsL2njZR0++Rkn7ui91MRBA6PYONo4Gj0B92PjF1DtB+qj10RDE15kN9PDIy5SwqRfkEH51IVlX/+HwOAoez8xAdJG1ompydIWTRXEfBFXKbxyisIzLvaPS3Asm/KQRVL+iJxUHpFX5Qq/WhRZ0FvYoaToMC8apSDWANGOqeKcpmhnEejZlhm9AYREpPgvjXJhzgxH+dvQyYqugLEqbcM+vSCzLbO6yiuTglYR42BMXvIv9OZQ/BsTq/7AdHB0Wswgzh9GKq+M7gr/ixk26uLzRlpIgHVu6NdUUa9nvBa41cMC5UwMegk74nShuz4fv7UzfO0zJli9B39com9z8iVKamAz7+eyomEDG7wbx5pTK47VWlUo7Jc/kcSgNEHS2k5hvP/GaqqTPBHjmWKkkwgEOCK4IFkzeLU/TcohFusll2gQKt1+kQ5PZ11Ag8zLJkULMsXR0IbEcEB1abfRCo9KgaR65fcm9WS6DE74geVZsn6YtNmDduZlReAknrKL8oSFXKKnZ 8uS8Pdv9 JsRKW5/uUPbuguMPdPXnrrBeZUXPRrTH5inidhWlZvQCzDkRLFk+ba/k3CWKEaeZ/XBuqM0JCV3HJZsePjpY7VA2u/rraHnb72EJXqSYA6wVrsTZriOc55wP8jjFCkELYIublcyfUzvHb+tZDYRo9vSH94JC+8HD6scyADScYOI1H+yqZ4QaOOIyLBpMrc3XOj7DttUjRyBYkKK7WvPb46RJHSCH14c+pIikBfDx4y29p1TMvGQpg+V325RlhYcZeQFxrJhZ9e9y9KXvm7xP4OPeGp2r4AjEyx1PmX7g0vZqqsDmqBuVZ8trwF1/hqPbpTuhrCZ6YihSPTUAMbnfx4PiIlapEiLHaZVsPLXIvICb+rBfwhywdKl6qNONMHXAgFbnP7r4/gkiMdZ+Nabf6Cd6dTyc6ZgacKNpTmypkBqRShSbaR48i/wRWWtQ17yPsAQkghnup6D35B1H+N3d9Q0XpeCgiyUHRO2dSHv//93BJkjfStvmRLvhCobTUUu1lOEYJYy+uZnhN9cd6PLfPxaB9MX378Wc1vn9JXY7T2blML4ppGenok1elYZY783mjB4qgD41+ibfhATX/MW3lMJFy8r79lRHpnwfToKr2PXWpbPF6mxn3e9Fhp/zL5/VcyLOrDUUisVH+xl5KRsKzHI8f/afjSnVCuK+Q3ND6GoC3N3yafEhrFARzQihAwnl7QhnI8jZmV2AQ0a/HCl1IpMvk9MCpAy+0pMMg1hUtWRmy41NzWl95dRjj/oCgaZviHYGDWfOsHjOz+FmVpooNBpjuUXsHkWshQ5+cA1W7zcO5M19IQFCehBMmQ4fkfi+RVVCXufTna2ITu9ur9UGMTyXvF8XxTYr8wsgdMX0QDynRp9/P9KoDuNC1wuwSDvmkj0gUSDGr5uWD6z03cxH4B10yq1Y77o7aUf2k1/Uu/aLty2lCEXItcU/txDTg1IX2P4xx X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Sun, Aug 18, 2024 at 6:27=E2=80=AFPM Barry Song <21cnbao@gmail.com> wrot= e: > > On Sun, Aug 18, 2024 at 5:51=E2=80=AFPM Yafang Shao wrote: > > > > On Sun, Aug 18, 2024 at 11:48=E2=80=AFAM Barry Song <21cnbao@gmail.com>= wrote: > > > > > > On Sun, Aug 18, 2024 at 2:55=E2=80=AFPM Yafang Shao wrote: > > > > > > > > On Sat, Aug 17, 2024 at 2:25=E2=80=AFPM Barry Song <21cnbao@gmail.c= om> wrote: > > > > > > > > > > From: Barry Song > > > > > > > > > > When users allocate memory with the __GFP_NOFAIL flag, they might > > > > > incorrectly use it alongside GFP_ATOMIC, GFP_NOWAIT, etc. This k= ind of > > > > > non-blockable __GFP_NOFAIL is not supported and is pointless. If= we > > > > > attempt and still fail to allocate memory for these users, we hav= e two > > > > > choices: > > > > > > > > > > 1. We could busy-loop and hope that some other direct reclama= tion or > > > > > kswapd rescues the current process. However, this is unreliab= le > > > > > and could ultimately lead to hard or soft lockups, > > > > > > > > That can occur even if we set both __GFP_NOFAIL and > > > > __GFP_DIRECT_RECLAIM, right? So, I don't believe the issue is relat= ed > > > > to setting __GFP_DIRECT_RECLAIM; rather, it stems from the flawed > > > > design of __GFP_NOFAIL itself. > > > > > > the point of GFP_NOFAIL is that it won't fail and its user won't chec= k > > > the return value. without direct_reclamation, it is sometimes impossi= ble. > > > but with direct reclamation, users constantly wait and finally they c= an > > > > So, what exactly is the difference between 'constantly waiting' and > > 'busy looping'? Could you please clarify? Also, why can't we > > 'constantly wait' when __GFP_DIRECT_RECLAIM is not set? > > I list two options in changelog > 1: busy loop 2. bug_on. I am actually fine with either one. either one is > better than the existing code. but returning null in the current code > is definitely wrong. > > 1 somehow has the attempt to make __GFP_NOFAIL without direct_reclamation > legal. so it is a bit suspicious going in the wrong direction. > > busy-loop is that you are not reclaiming memory you are not sleeping. > cpu is constantly working and busy, so it might result in a lockup, eithe= r > soft lockup or hard lockup. > > with direct_reclamation, wait is the case you can sleep. it is not holdin= g > cpu, not a busy loop. in rare case, users might end in endless wait, > but it matches the doc of __GFP_NOFAIL, never return till memory > is gotten (the current code is implemented in this way unless users > incorrectly combine __GFP_NOFAIL with aotmic/nowait etc.) > and the essential difference between "w/ and w/o direct_reclaim": with direct reclaim, the user is actively reclaiming memory to rescue itself by all kinds of possible ways(compact, oom, reclamation), while without direct reclamation, it can do nothing and just loop (busy-loop). > note, long-term we won't expose __GFP_NOFAIL any more. we > will only expose GFP_NOFAIL which enforces Blockable. I am > quite busy on other issues, so this won't happen in a short time. > > > > > > get memory. if you read the doc of __GFP_NOFAIL you will find it. > > > it is absolutely clearly documented. > > > > > > > > > > > > which might not > > > > > be well supported by some architectures. > > > > > > > > > > 2. We could use BUG_ON to trigger a reliable system crash, av= oiding > > > > > exposing NULL dereference. > > > > > > > > > > Neither option is ideal, but both are improvements over the exist= ing code. > > > > > This patch selects the second option because, with the introducti= on of > > > > > scoped API and GFP_NOFAIL=E2=80=94capable of enforcing direct rec= lamation for > > > > > nofail users(which is in my plan), non-blockable nofail allocatio= ns will > > > > > no longer be possible. > > > > > > > > > > Signed-off-by: Barry Song > > > > > Cc: Michal Hocko > > > > > Cc: Uladzislau Rezki (Sony) > > > > > Cc: Christoph Hellwig > > > > > Cc: Lorenzo Stoakes > > > > > Cc: Christoph Lameter > > > > > Cc: Pekka Enberg > > > > > Cc: David Rientjes > > > > > Cc: Joonsoo Kim > > > > > Cc: Vlastimil Babka > > > > > Cc: Roman Gushchin > > > > > Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com> > > > > > Cc: Linus Torvalds > > > > > Cc: Kees Cook > > > > > Cc: "Eugenio P=C3=A9rez" > > > > > Cc: Hailong.Liu > > > > > Cc: Jason Wang > > > > > Cc: Maxime Coquelin > > > > > Cc: "Michael S. Tsirkin" > > > > > Cc: Xuan Zhuo > > > > > --- > > > > > mm/page_alloc.c | 10 +++++----- > > > > > 1 file changed, 5 insertions(+), 5 deletions(-) > > > > > > > > > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > > > > > index d2c37f8f8d09..fb5850ecd3ae 100644 > > > > > --- a/mm/page_alloc.c > > > > > +++ b/mm/page_alloc.c > > > > > @@ -4399,11 +4399,11 @@ __alloc_pages_slowpath(gfp_t gfp_mask, un= signed int order, > > > > > */ > > > > > if (gfp_mask & __GFP_NOFAIL) { > > > > > /* > > > > > - * All existing users of the __GFP_NOFAIL are blo= ckable, so warn > > > > > - * of any new users that actually require GFP_NOW= AIT > > > > > + * All existing users of the __GFP_NOFAIL are blo= ckable > > > > > + * otherwise we introduce a busy loop with inside= the page > > > > > + * allocator from non-sleepable contexts > > > > > */ > > > > > - if (WARN_ON_ONCE_GFP(!can_direct_reclaim, gfp_mas= k)) > > > > > - goto fail; > > > > > + BUG_ON(!can_direct_reclaim); > > > > > > > > I'm not in favor of using BUG_ON() here, as many call sites already > > > > handle the return value of __GFP_NOFAIL. > > > > > > > > > > it is not correct to handle the return value of __GFP_NOFAIL. > > > if you check the ret, don't use __GFP_NOFAIL. > > > > If so, you have many code changes to make in the linux kernel ;) > > > > Please list those code using __GFP_NOFAIL and check the result > might fail, we should get them fixed. This is insane. NOFAIL means > no fail. > > > > > > > > If we believe BUG_ON() is necessary, why not place it at the beginn= ing > > > > of __alloc_pages_slowpath() instead of after numerous operations, > > > > which could potentially obscure the issue? > > > > > > to some extent I agree with you. but the point here is that we might > > > want to avoid this check in the hot path. so basically, we check when > > > we have to check. in 99%+ case, this check can be avoided. > > > > It's on the slow path, but that's not the main point here. > > I actually recommended the approach, we can do an earlier check in the ho= tpath. > somehow, in the previous discussion, people didn't like it. > > > > > -- > > Regards > > Yafang > > Thanks > barry