Date: Mon, 2 Mar 2026 19:51:25 +0100
From: Michal Hocko
To: Mikulas Patocka
Cc: "Uladzislau Rezki (Sony)", linux-mm@kvack.org, Andrew Morton, Vishal Moola, Baoquan He, LKML
Subject: Re: [PATCH] vmalloc: support __GFP_RETRY_MAYFAIL and __GFP_NORETRY
References: <20260302114740.2668450-1-urezki@gmail.com> <20260302114740.2668450-2-urezki@gmail.com>

On Mon 02-03-26 18:38:53, Mikulas Patocka wrote:
>
>
> On Mon, 2 Mar 2026, Uladzislau Rezki (Sony) wrote:
>
> > From: Michal Hocko
> >
> > __GFP_RETRY_MAYFAIL and __GFP_NORETRY haven't been supported so far
> > because their semantic (i.e. to not trigger OOM killer) is not possible
> > with the existing vmalloc page table allocation which is allowing for
> > the OOM killer.
> >
> > Example: __vmalloc(size, GFP_KERNEL | __GFP_RETRY_MAYFAIL);
> >
> >
> > vmalloc_test/55 invoked oom-killer:
> > gfp_mask=0x40dc0(
> > GFP_KERNEL|__GFP_ZERO|__GFP_COMP), order=0, oom_score_adj=0
> > active_anon:0 inactive_anon:0 isolated_anon:0
> > active_file:0 inactive_file:0 isolated_file:0
> > unevictable:0 dirty:0 writeback:0
> > slab_reclaimable:700 slab_unreclaimable:33708
> > mapped:0 shmem:0 pagetables:5174
> > sec_pagetables:0 bounce:0
> > kernel_misc_reclaimable:0
> > free:850 free_pcp:319 free_cma:0
> > CPU: 4 UID: 0 PID: 639 Comm: vmalloc_test/55 ...
> > Hardware name: QEMU Standard PC (i440FX + PIIX, ...
> > Call Trace:
> >
> > dump_stack_lvl+0x5d/0x80
> > dump_header+0x43/0x1b3
> > out_of_memory.cold+0x8/0x78
> > __alloc_pages_slowpath.constprop.0+0xef5/0x1130
> > __alloc_frozen_pages_noprof+0x312/0x330
> > alloc_pages_mpol+0x7d/0x160
> > alloc_pages_noprof+0x50/0xa0
> > __pte_alloc_kernel+0x1e/0x1f0
> > ...
> >
> >
> > There are usecases for these modifiers when a large allocation request
> > should rather fail than trigger OOM killer which wouldn't be able to
> > handle the situation anyway [1].
> >
> > While we cannot change existing page table allocation code easily we can
> > piggy back on scoped NOWAIT allocation for them that we already have in
> > place. The rationale is that the bulk of the consumed memory is sitting
> > in pages backing the vmalloc allocation. Page tables are only
> > participating a tiny fraction. Moreover page tables for virtually allocated
> > areas are never reclaimed so the longer the system runs to less likely
> > they are. It makes sense to allow an approximation of __GFP_RETRY_MAYFAIL
> > and __GFP_NORETRY even if the page table allocation part is much weaker.
> > This doesn't break the failure mode while it allows for the no OOM
> > semantic.
> >
> > [1] https://lore.kernel.org/all/32bd9bed-a939-69c4-696d-f7f9a5fe31d8@redhat.com/T/#u
> >
> > Tested-by: Uladzislau Rezki (Sony)
> > Signed-off-by: Michal Hocko
> > Signed-off-by: Uladzislau Rezki (Sony)
> > ---
> >  mm/vmalloc.c | 17 ++++++++++++-----
> >  1 file changed, 12 insertions(+), 5 deletions(-)
> >
> > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > index a06f4b3ea367..975592b0ec89 100644
> > --- a/mm/vmalloc.c
> > +++ b/mm/vmalloc.c
> > @@ -3798,6 +3798,8 @@ static void defer_vm_area_cleanup(struct vm_struct *area)
> >   * non-blocking (no __GFP_DIRECT_RECLAIM) - memalloc_noreclaim_save()
> >   * GFP_NOFS - memalloc_nofs_save()
> >   * GFP_NOIO - memalloc_noio_save()
> > + * __GFP_RETRY_MAYFAIL, __GFP_NORETRY - memalloc_noreclaim_save()
> > + * to prevent OOMs
> >   *
> >   * Returns a flag cookie to pair with restore.
> >   */
> > @@ -3806,7 +3808,8 @@ memalloc_apply_gfp_scope(gfp_t gfp_mask)
> >  {
> >  	unsigned int flags = 0;
> >
> > -	if (!gfpflags_allow_blocking(gfp_mask))
> > +	if (!gfpflags_allow_blocking(gfp_mask) ||
> > +			(gfp_mask & (__GFP_RETRY_MAYFAIL | __GFP_NORETRY)))
> >  		flags = memalloc_noreclaim_save();
>
> I wouldn't do this because:
>
> 1. it makes the __GFP_RETRY_MAYFAIL allocations unreliable.

__GFP_RETRY_MAYFAIL doesn't provide any reliability. It just promises to
not OOM while trying hard. I believe this implementation doesn't break
that promise.

> 2. The comment at memalloc_noreclaim_save says that it may deplete memory
> reserves: "This should only be used when the caller guarantees the
> allocation will allow more memory to be freed very shortly, i.e. it needs
> to allocate some memory in the process of freeing memory, and cannot
> reclaim due to potential recursion."

Yes, this allocation clearly doesn't guarantee to free more memory. That
comment is rather dated. Anyway, the crux is to make sure that the
allocation is not unbounded.
The idea behind this decision is that the page tables are only a tiny
fraction of the resulting memory allocated. Moreover this virtually
allocated space is recycled, so over time there should be fewer and
fewer page tables allocated as well.

> I think that the cleanest solution to this problem would be to get rid of
> PF_MEMALLOC_NOFS and PF_MEMALLOC_NOIO and instead introduce two per-thread
> variables "gfp_t set_flags" and "gfp_t clear_flags" and set and clear gfp
> flags according to them in the allocator: "gfp = (gfp |
> current->set_flags) & ~current->clear_flags";

We've been through discussions like this one way too many times and the
conclusion is that no, this will not work. The gfp space we have and
need to support without rewriting a large part of the kernel is simply
incompatible with a more sane interface. Yeah, I hate that as well but
here we are. We need to be creative to keep things sensible and not
introduce even more weirdness to the interface.
-- 
Michal Hocko
SUSE Labs