Date: Tue, 17 Mar 2026 20:32:18 +0800
From: Kuan-Wei Chiu
To: Zhihao Cheng
Cc: richard@nod.at, akpm@linux-foundation.org, jserv@ccns.ncku.edu.tw,
	eleanor15x@gmail.com, marscheng@google.com,
	linux-mtd@lists.infradead.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] lib/list_sort: introduce list_sort_nonatomic() and
	remove dummy cmp() calls
References: <20260315193900.218737-1-visitorckw@gmail.com>
	<7bb23ce1-a0f3-b576-ad79-c9fa746f11ed@huawei.com>
In-Reply-To: <7bb23ce1-a0f3-b576-ad79-c9fa746f11ed@huawei.com>

Hi Zhihao,

On Tue, Mar 17, 2026 at 12:05:54PM +0800, Zhihao Cheng wrote:
> On 2026/3/16 3:39, Kuan-Wei Chiu wrote:
> > Historically, list_sort() implemented a hack in merge_final():
> > 
> > 	if (unlikely(!++count))
> > 		cmp(priv, b, b);
> > 
> > This was designed specifically so that callers could periodically
> > invoke cond_resched() within their comparison functions when merging
> > highly unbalanced lists.
> > 
> > However, an audit of the kernel tree reveals that only fs/ubifs/
> > relies on this mechanism. For the vast majority of list_sort() users
> > (such as block layer IO schedulers and file systems), this results in
> > completely wasted function calls. In the worst-case scenario (merging
> > an already sorted list where 'a' is exhausted quickly), this results
> > in approximately (N/2)/256 unnecessary cmp() calls.
> > 
> > To clean up this API while ensuring behavior compatibility:
> > 1. Introduce list_sort_nonatomic(), which explicitly calls
> >    cond_resched() internally when count overflows.
> > 2. Remove the dummy cmp(priv, b, b) fallback from standard
> >    list_sort(), saving unnecessary function calls and improving
> >    determinism.
> > 3. Convert the sole user (fs/ubifs/) to the new API.
> > 
> > Note that ubifs still maintains cond_resched() inside its own
> > comparison functions. This patch does not alter the frequency or
> > timing of those scheduling points, guaranteeing no regressions for
> > ubifs, while benefiting all other kernel users.
> > 
> > Signed-off-by: Kuan-Wei Chiu
> > ---
> >  fs/ubifs/gc.c             |   4 +-
> >  fs/ubifs/replay.c         |   2 +-
> >  include/linux/list_sort.h |   3 +
> >  lib/list_sort.c           | 166 +++++++++++++++++++++-----------------
> >  4 files changed, 100 insertions(+), 75 deletions(-)
> 
> lgtm for UBIFS.
> 
> Reviewed-by: Zhihao Cheng

Thanks for your review!

> one small nit below.
> 
> [...]
> 
> > diff --git a/include/linux/list_sort.h b/include/linux/list_sort.h
> > index 453105f74e05..f7af29073d48 100644
> > --- a/include/linux/list_sort.h
> > +++ b/include/linux/list_sort.h
> > @@ -11,4 +11,7 @@ typedef int __attribute__((nonnull(2,3))) (*list_cmp_func_t)(void *,
> >  
> >  __attribute__((nonnull(2,3)))
> >  void list_sort(void *priv, struct list_head *head, list_cmp_func_t cmp);
> > +
> > +__attribute__((nonnull(2, 3)))
> > +void list_sort_nonatomic(void *priv, struct list_head *head, list_cmp_func_t cmp);
> >  #endif
> > diff --git a/lib/list_sort.c b/lib/list_sort.c
> > index a310ecb7ccc0..788bfc26cf7b 100644
> > --- a/lib/list_sort.c
> > +++ b/lib/list_sort.c
> > @@ -3,6 +3,7 @@
> >  #include
> >  #include
> >  #include
> > +#include
> >  
> >  /*
> >   * Returns a list organized in an intermediate format suited
> > @@ -47,7 +48,7 @@ static struct list_head *merge(void *priv, list_cmp_func_t cmp,
> >   */
> >  __attribute__((nonnull(2,3,4,5)))
> >  static void merge_final(void *priv, list_cmp_func_t cmp, struct list_head *head,
> > -			struct list_head *a, struct list_head *b)
> > +			struct list_head *a, struct list_head *b, bool may_schedule)
> >  {
> >  	struct list_head *tail = head;
> >  	u8 count = 0;
> > @@ -79,12 +80,11 @@ static void merge_final(void *priv, list_cmp_func_t cmp, struct list_head *head,
> >  		/*
> >  		 * If the merge is highly unbalanced (e.g. the input is
> >  		 * already sorted), this loop may run many iterations.
> > -		 * Continue callbacks to the client even though no
> > -		 * element comparison is needed, so the client's cmp()
> > -		 * routine can invoke cond_resched() periodically.
> > +		 * If may_schedule is true, periodically invoke cond_resched()
> > +		 * to avoid soft lockups.
> >  		 */
> > -		if (unlikely(!++count))
> > -			cmp(priv, b, b);
> > +		if (may_schedule && unlikely(!++count))
> > +			cond_resched();
> 
> The cond_resched() already has a judgment on whether to schedule out, so
> the 'count' could be removed?
However, I think keeping the u8 count rate-limiter makes more sense here
due to the overhead difference. Evaluating unlikely(!++count) is
essentially a single ALU instruction (a register increment) plus a
zero-flag check, which has virtually zero cost. cond_resched(), on the
other hand, is a macro that does much more than a simple flag check:
depending on the kernel config, it often invokes __might_resched()
(which reads current to check task_struct state, irq flags, etc.) and
makes a call to __cond_resched(). Evaluating all of that machinery on
every single iteration of such a tight loop would likely introduce
noticeable overhead.

Actually, your comment brings up another thought I wanted to discuss.
Since we are introducing list_sort_nonatomic(), I wonder if we should
eventually move the cond_resched() out of UBIFS's cmp() functions
entirely and handle it inside list_sort_nonatomic(). Right now, because
the cmp() callback is inherently invoked at every step of the merge
process, UBIFS ends up evaluating the cond_resched() macro every 3 or 4
pointer assignments during the main sort. While UBIFS needs to prevent
soft lockups on huge lists, checking for resched at such a
micro-granularity still feels excessive and likely leaves performance
on the table.

I didn't make this change in the current patch because I don't have the
proper UBIFS hardware/setup to benchmark the performance difference,
and I wanted to keep the current scheduling frequency exactly the same
to guarantee no regressions. But I'd love to hear your thoughts on
whether reducing the frequency and moving it out of UBIFS's cmp() is
something worth doing in the future.

Regards,
Kuan-Wei