Date: Tue, 27 Aug 2024 09:38:50 +0200
From: Vlastimil Babka <vbabka@suse.cz>
To: Barry Song <21cnbao@gmail.com>
Cc: Linus Torvalds, David Hildenbrand, Michal Hocko, Yafang Shao,
 akpm@linux-foundation.org, linux-mm@kvack.org, 42.hyeyoo@gmail.com,
 cl@linux.com, hailong.liu@oppo.com, hch@infradead.org,
 iamjoonsoo.kim@lge.com, penberg@kernel.org, rientjes@google.com,
 roman.gushchin@linux.dev, urezki@gmail.com, v-songbaohua@oppo.com,
 virtualization@lists.linux.dev
Subject: Re: [PATCH v3 0/4] mm: clarify nofail memory allocation
References: <59e90825-4efa-4384-8286-06c0235304dc@redhat.com>

On 8/27/24 09:15, Barry Song wrote:
> On Tue, Aug 27, 2024 at 12:10 AM Vlastimil Babka wrote:
>>
>> On 8/22/24 11:34, Linus Torvalds wrote:
>> > On Thu, 22 Aug 2024 at 17:27, David Hildenbrand wrote:
>> >>
>> >> To me, that implies that if you pass in MAX_ORDER+1 the VM will "retry
>> >> infinitely". if that implies just OOPSing or actually be in a busy loop,
>> >> I don't care.
>> >> It could effectively happen with MAX_ORDER as well, as
>> >> stated. But certainly not BUG_ON.
>> >
>> > No BUG_ON(), but also no endless loop.
>> >
>> > Just return NULL for bogus users. Really. Give a WARN_ON_ONCE() to
>> > make it easy to find offenders, and then let them deal with it.
>>
>> Right now we give the WARN_ON_ONCE() (for !can_direct_reclaim) only when
>> we're about to actually return NULL, so the memory has to be depleted
>> already. To make it easier to find the offenders much more reliably, we
>> should consider doing it sooner, but also not add unnecessary overhead to
>> allocator fastpaths just because of the potentially buggy users. So either
>> always in __alloc_pages_slowpath(), which should be often enough (unless the
>> system never needs to wake up kswapd to reclaim) but with negligible enough
>> overhead, or on every allocation but only with e.g. CONFIG_DEBUG_VM?
>
> We already have a WARN_ON for order > 1 in rmqueue. we might extend
> the condition there to include checking flags as well?

Ugh, wasn't aware, well spotted. So it means there at least shouldn't be
existing users of __GFP_NOFAIL with order > 1 :)

But also the check is in the hotpath, even before trying the pcplists, so we
could move it to __alloc_pages_slowpath() while extending it?

> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 7dcb0713eb57..b5717c6569f9 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -3071,8 +3071,11 @@ struct page *rmqueue(struct zone *preferred_zone,
>         /*
>          * We most definitely don't want callers attempting to
>          * allocate greater than order-1 page units with __GFP_NOFAIL.
> +        * Also we don't support __GFP_NOFAIL without __GFP_DIRECT_RECLAIM,
> +        * which can result in a lockup
>          */
> -       WARN_ON_ONCE((gfp_flags & __GFP_NOFAIL) && (order > 1));
> +       WARN_ON_ONCE((gfp_flags & __GFP_NOFAIL) &&
> +                    (order > 1 || !(gfp_flags & __GFP_DIRECT_RECLAIM)));
>
>         if (likely(pcp_allowed_order(order))) {
>                 page = rmqueue_pcplist(preferred_zone, zone, order,
>
>>
>> > Don't take it upon yourself to say "we have to deal with any amount of
>> > stupidity".
>> >
>> > The MM layer is not some slave to users. The MM layer is one of the
>> > most core pieces of code in the kernel, and as such the MM layer is
>> > damn well in charge.
>> >
>> > Nobody has the right to say "I will not deal with allocation
>> > failures". The MM should not bend over backwards over something like
>> > that.
>> >
>> > Seriously. Get a spine already, people. Tell random drivers that claim
>> > that they cannot deal with errors to just f-ck off.
>> >
>> > And you don't do it by looping forever, and you don't do it by killing
>> > the kernel. You do it by ignoring their bullying tactics.
>> >
>> > Then you document the *LIMITED* cases where you actually will try forever.
>> >
>> > This discussion has gone on for too damn long.
>> >
>> > Linus
>>
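
For illustration only, the extended condition from the quoted diff can be
modelled in plain userspace C. This is a rough sketch, not kernel code: the
SKETCH_* flag values and the helper name nofail_request_unsupported() are
made up here, and the real __GFP_* bits are different. It only shows which
combinations of flags and order the proposed WARN_ON_ONCE() would catch.

```c
#include <stdbool.h>

/*
 * Hypothetical flag bits, chosen only for this sketch. They do not
 * match the kernel's real __GFP_DIRECT_RECLAIM / __GFP_NOFAIL values.
 */
#define SKETCH_GFP_DIRECT_RECLAIM 0x1u
#define SKETCH_GFP_NOFAIL         0x2u

/*
 * Same shape as the condition in the quoted diff: a nofail request is
 * flagged when its order is greater than 1, or when it does not allow
 * direct reclaim. Requests without the nofail bit are never flagged.
 */
static bool nofail_request_unsupported(unsigned int gfp_flags,
				       unsigned int order)
{
	return (gfp_flags & SKETCH_GFP_NOFAIL) &&
	       (order > 1 || !(gfp_flags & SKETCH_GFP_DIRECT_RECLAIM));
}
```

With this model, a nofail request with direct reclaim allowed and order 0 or
1 passes, while the same request at order 2, or any nofail request without
direct reclaim, would trip the warning. Where the check runs (rmqueue() or
__alloc_pages_slowpath()) is the open question in this thread and is not
something the sketch tries to answer.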