Date: Mon, 20 Apr 2026 12:35:48 +0000
X-Mailing-List: linux-kernel@vger.kernel.org
X-Mailer: git-send-email 2.54.0.rc1.555.g9c883467ad-goog
Message-ID: <20260420123548.2116177-1-joonwonkang@google.com>
Subject: Re: [PATCH] percpu: Fix hint invariant breakage
From: Joonwon Kang
To: dennis@kernel.org
Cc: akpm@linux-foundation.org, cl@gentwo.org, dodam@google.com,
 joonwonkang@google.com, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
 tj@kernel.org
Content-Type: text/plain; charset="UTF-8"

> Hello,
>
> Sorry for the delay, I've been a bit sick.
>
> On Mon, Mar 23, 2026 at 02:05:14PM +0000, Joonwon Kang wrote:
> > > Hello,
> > >
> > > On Fri, Mar 20, 2026 at 11:52:14AM +0000, Joonwon Kang wrote:
> > > > The invariant "scan_hint_start > contig_hint_start if and only if
> > > > scan_hint == contig_hint" should be kept for hint management. However,
> > > > it could be broken in some cases:
> > > >
> > >
> > > First I'd just like to apologize. I spent an hour yesterday trying to
> > > remember why the invariant exists and the reality is this code is more
> > > clever than it needs to be.
> >
> > Thanks for taking time for this and sharing more context. While you are at
> > it, I have a fundamental question on the invariant. I had deliberation and
> > discussion on what benefits the invariant gets to the percpu allocator by
> > its existence. My understanding is that if we put contig_hint before
> > scan_hint when they are the same, it is more likely that contig_hint is
> > broken by a future allocation, which leads to a linear scan after the
> > scan_hint for hints update, although we could save scanning up to scan_hint
> > when contig_hint is not broken. On the other hand, if we put scan_hint
> > before contig_hint instead, it is more likely that scan_hint is broken
> > while keeping contig_hint, which does not lead to the linear scan for
> > hints update, although we could not save the scanning that could be saved
> > in the other case.
> >
> > In other words, if contig_hint-breaking allocations occur a lot in general
> > with the current invariant, the performance may suffer more than without
> > the invariant. I also think that there would be no strict reason for having
> > the invariant.
> >
>
> I think the original premise is that percpu memory is quite expensive, 1
> allocation costs nr_cpus * sizeof(allocation). So we do our best to bin
> pack at the cost of faster allocations. We could always just break the
> contig_hint but then over time we could cause more fragmentation.
>
> The case that triggered this was netdev needing 8 byte objects with 16
> byte alignment [1].
>

Thank you for sharing the points about bin packing. I do not yet fully
understand how breaking the contig_hint relates to the fragmentation
trend, so the case you mentioned would be a helpful reference. I think
the link for reference [1] may be missing. Could you provide it, if you
intended to include one?

> > So, could you clarify the necessity of the invariant? If there is no must
> > reason, then I could post another spin-off patch to remove the invariant
> > at all so that we could simplify the code and experiment the result. How
> > do you think?
> >
>
> I can't really recall the exact reasoning for the invariant, but it was
> probably along the lines of wanting to not lose information if possible.
>
> Say an earlier area becomes free that is the same size as the
> contig_hint but with better alignment, we want to use that as the
> contig_hint but then we either have to lose the scan_hint or keep it
> with the invariant. Given the premise above, I believe we want to
> continue bin packing, I think the general idea of scanning next time
> around isn't the worst thing.
>
> Sadly because it's already there, and has worked for quite some time,
> it's kind of on us today to provide data / reasoning to delete it. I'd
> wager that some upcoming work is going to change how percpu gives out
> objects either through some sort of slab caching that we can revisit
> this more in that context.
>

Understood, and thanks for the detailed explanation. I will keep the
invariant as-is unless I have clear data that justifies removing it. I
recently sent patch set v3 with this in mind. Please help review it ;)

Thanks,
Joonwon Kang