Linux kernel -stable discussions

* [RFC PATCH v2.1 3/3] mm/damon/stat: detect and use fresh enabled value
From: SeongJae Park @ 2026-04-18 22:27 UTC
  Cc: SeongJae Park, # 6 . 17 . x, Andrew Morton, damon, linux-kernel,
	linux-mm
In-Reply-To: <20260418222758.39795-1-sj@kernel.org>

DAMON_STAT updates the 'enabled' parameter value, which represents the
running status of its kdamond, when the user explicitly requests
start/stop of the kdamond.  The kdamond can, however, be stopped even if
the user did not explicitly request the stop, if the
ctx->regions_score_histogram allocation fails at the beginning of the
execution of the kdamond.  Hence, if the kdamond is stopped by the
allocation failure, the value of the parameter can be stale.

Users could see the stale value and be confused.  The problem will only
rarely happen in real and common setups because the allocation is
arguably too small to fail.  Also, unlike the similar bugs that are now
fixed in DAMON_RECLAIM and DAMON_LRU_SORT, kdamond can be restarted in
this case, because DAMON_STAT force-updates the enabled parameter value
for user inputs.  The bug is a bug, though.

The issue stems from the fact that there are multiple events that can
change the status, and following all the events is challenging.
Dynamically detect and use the fresh status for the parameters when
those are requested.

The issue was discovered [1] by Sashiko.

[1] https://lore.kernel.org/20260416040602.88665-1-sj@kernel.org

Fixes: 369c415e6073 ("mm/damon: introduce DAMON_STAT module")
Cc: <stable@vger.kernel.org> # 6.17.x
Signed-off-by: SeongJae Park <sj@kernel.org>
---
 mm/damon/stat.c | 30 ++++++++++++++++++++----------
 1 file changed, 20 insertions(+), 10 deletions(-)

diff --git a/mm/damon/stat.c b/mm/damon/stat.c
index 99ba346f9e325..3951b762cbddf 100644
--- a/mm/damon/stat.c
+++ b/mm/damon/stat.c
@@ -19,14 +19,17 @@
 static int damon_stat_enabled_store(
 		const char *val, const struct kernel_param *kp);

+static int damon_stat_enabled_load(char *buffer,
+		const struct kernel_param *kp);
+
 static const struct kernel_param_ops enabled_param_ops = {
 	.set = damon_stat_enabled_store,
-	.get = param_get_bool,
+	.get = damon_stat_enabled_load,
 };

 static bool enabled __read_mostly = IS_ENABLED(
 	CONFIG_DAMON_STAT_ENABLED_DEFAULT);
-module_param_cb(enabled, &enabled_param_ops, &enabled, 0600);
+module_param_cb(enabled, &enabled_param_ops, NULL, 0600);
 MODULE_PARM_DESC(enabled, "Enable of disable DAMON_STAT");

 static unsigned long estimated_memory_bandwidth __read_mostly;
@@ -273,17 +276,23 @@ static void damon_stat_stop(void)
 	damon_stat_context = NULL;
 }

+static bool damon_stat_enabled(void)
+{
+	if (!damon_stat_context)
+		return false;
+	return damon_is_running(damon_stat_context);
+}
+
 static int damon_stat_enabled_store(
 		const char *val, const struct kernel_param *kp)
 {
-	bool is_enabled = enabled;
 	int err;

 	err = kstrtobool(val, &enabled);
 	if (err)
 		return err;

-	if (is_enabled == enabled)
+	if (damon_stat_enabled() == enabled)
 		return 0;

 	if (!damon_initialized())
@@ -293,16 +302,17 @@ static int damon_stat_enabled_store(
 		 */
 		return 0;

-	if (enabled) {
-		err = damon_stat_start();
-		if (err)
-			enabled = false;
-		return err;
-	}
+	if (enabled)
+		return damon_stat_start();
 	damon_stat_stop();
 	return 0;
 }

+static int damon_stat_enabled_load(char *buffer, const struct kernel_param *kp)
+{
+	return sprintf(buffer, "%c\n", damon_stat_enabled() ? 'Y' : 'N');
+}
+
 static int __init damon_stat_init(void)
 {
 	int err = 0;
-- 
2.47.3


* [RFC PATCH v2.1 2/3] mm/damon/lru_sort: detect and use fresh enabled and kdamond_pid values
From: SeongJae Park @ 2026-04-18 22:27 UTC
  Cc: SeongJae Park, # 6 . 0 . x, Andrew Morton, damon, linux-kernel,
	linux-mm, Liew Rui Yan
In-Reply-To: <20260418222758.39795-1-sj@kernel.org>

DAMON_LRU_SORT updates the 'enabled' and 'kdamond_pid' parameter values,
which represent the running status of its kdamond, when the user
explicitly requests start/stop of the kdamond.  The kdamond can,
however, also be stopped without an explicit user request, in the
following three events.

1. ctx->regions_score_histogram allocation failure at beginning of the
   execution,
2. damon_commit_ctx() failure due to invalid user input, and
3. damon_commit_ctx() failure due to its internal allocation failures.

Hence, if the kdamond is stopped by any of the above three events, the
values of the status parameters can be stale.  Users could see the stale
values and be confused.  This is already bad, but the real consequence
is worse.  DAMON_LRU_SORT avoids unnecessary damon_start() and
damon_stop() calls based on the 'enabled' parameter value.  And the
update of the 'enabled' parameter value depends on the damon_start() and
damon_stop() call results.  Hence, once the kdamond has been stopped by
one of the unintentional events, the user cannot restart the kdamond
until the system reboots.  For example, the issue can be reproduced via
the steps below.

    # cd /sys/module/damon_lru_sort/parameters
    #
    # # start DAMON_LRU_SORT
    # echo Y > enabled
    # ps -ef | grep kdamond
    root         806       2  0 17:53 ?        00:00:00 [kdamond.0]
    root         808     803  0 17:53 pts/4    00:00:00 grep kdamond
    #
    # # commit wrong input to stop kdamond without explicit stop request
    # echo 3 > addr_unit
    # echo Y > commit_inputs
    bash: echo: write error: Invalid argument
    #
    # # confirm kdamond is stopped
    # ps -ef | grep kdamond
    root         811     803  0 17:53 pts/4    00:00:00 grep kdamond
    #
    # # users can now see the stale status
    # cat enabled
    Y
    # cat kdamond_pid
    806
    #
    # # even after fixing the wrong parameter,
    # # kdamond cannot be restarted.
    # echo 1 > addr_unit
    # echo Y > enabled
    # ps -ef | grep kdamond
    root         815     803  0 17:54 pts/4    00:00:00 grep kdamond

The problem will only rarely happen in real and common setups for the
following reasons.  The allocation failures are unlikely in such setups
since those allocations are arguably too small to fail.  Also, sane users
in real production environments are unlikely to commit wrong input
parameters.
But once it happens, the consequence is quite bad.  And the bug is a
bug.

The issue stems from the fact that there are multiple events that can
change the status, and following all the events is challenging.
Dynamically detect and use the fresh status for the parameters when
those are requested.

Fixes: 40e983cca927 ("mm/damon: introduce DAMON-based LRU-lists Sorting")
Cc: <stable@vger.kernel.org> # 6.0.x
Co-developed-by: Liew Rui Yan <aethernet65535@gmail.com>
Signed-off-by: Liew Rui Yan <aethernet65535@gmail.com>
Signed-off-by: SeongJae Park <sj@kernel.org>
---
 mm/damon/lru_sort.c | 85 +++++++++++++++++++++++++++++----------------
 1 file changed, 55 insertions(+), 30 deletions(-)

diff --git a/mm/damon/lru_sort.c b/mm/damon/lru_sort.c
index 554559d729760..8494040b1ee48 100644
--- a/mm/damon/lru_sort.c
+++ b/mm/damon/lru_sort.c
@@ -161,15 +161,6 @@ module_param(monitor_region_end, ulong, 0600);
  */
 static unsigned long addr_unit __read_mostly = 1;
 
-/*
- * PID of the DAMON thread
- *
- * If DAMON_LRU_SORT is enabled, this becomes the PID of the worker thread.
- * Else, -1.
- */
-static int kdamond_pid __read_mostly = -1;
-module_param(kdamond_pid, int, 0400);
-
 static struct damos_stat damon_lru_sort_hot_stat;
 DEFINE_DAMON_MODULES_DAMOS_STATS_PARAMS(damon_lru_sort_hot_stat,
 		lru_sort_tried_hot_regions, lru_sorted_hot_regions,
@@ -386,12 +377,8 @@ static int damon_lru_sort_turn(bool on)
 {
 	int err;
 
-	if (!on) {
-		err = damon_stop(&ctx, 1);
-		if (!err)
-			kdamond_pid = -1;
-		return err;
-	}
+	if (!on)
+		return damon_stop(&ctx, 1);
 
 	err = damon_lru_sort_apply_parameters();
 	if (err)
@@ -400,9 +387,6 @@ static int damon_lru_sort_turn(bool on)
 	err = damon_start(&ctx, 1, true);
 	if (err)
 		return err;
-	kdamond_pid = damon_kdamond_pid(ctx);
-	if (kdamond_pid < 0)
-		return kdamond_pid;
 	return damon_call(ctx, &call_control);
 }
 
@@ -430,42 +414,83 @@ module_param_cb(addr_unit, &addr_unit_param_ops, &addr_unit, 0600);
 MODULE_PARM_DESC(addr_unit,
 	"Scale factor for DAMON_LRU_SORT to ops address conversion (default: 1)");
 
+static bool damon_lru_sort_enabled(void)
+{
+	if (!ctx)
+		return false;
+	return damon_is_running(ctx);
+}
+
 static int damon_lru_sort_enabled_store(const char *val,
 		const struct kernel_param *kp)
 {
-	bool is_enabled = enabled;
-	bool enable;
 	int err;
 
-	err = kstrtobool(val, &enable);
+	err = kstrtobool(val, &enabled);
 	if (err)
 		return err;
 
-	if (is_enabled == enable)
+	if (damon_lru_sort_enabled() == enabled)
 		return 0;
 
 	/* Called before init function.  The function will handle this. */
 	if (!damon_initialized())
-		goto set_param_out;
+		return 0;
 
-	err = damon_lru_sort_turn(enable);
-	if (err)
-		return err;
+	return damon_lru_sort_turn(enabled);
+}
 
-set_param_out:
-	enabled = enable;
-	return err;
+static int damon_lru_sort_enabled_load(char *buffer,
+		const struct kernel_param *kp)
+{
+	return sprintf(buffer, "%c\n", damon_lru_sort_enabled() ? 'Y' : 'N');
 }
 
 static const struct kernel_param_ops enabled_param_ops = {
 	.set = damon_lru_sort_enabled_store,
-	.get = param_get_bool,
+	.get = damon_lru_sort_enabled_load,
 };
 
 module_param_cb(enabled, &enabled_param_ops, &enabled, 0600);
 MODULE_PARM_DESC(enabled,
 	"Enable or disable DAMON_LRU_SORT (default: disabled)");
 
+static int damon_lru_sort_kdamond_pid_store(const char *val,
+		const struct kernel_param *kp)
+{
+	/*
+	 * kdamond_pid is read-only, but kernel command line could write it.
+	 * Do nothing here.
+	 */
+	return 0;
+}
+
+static int damon_lru_sort_kdamond_pid_load(char *buffer,
+		const struct kernel_param *kp)
+{
+	int kdamond_pid = -1;
+
+	if (ctx) {
+		kdamond_pid = damon_kdamond_pid(ctx);
+		if (kdamond_pid < 0)
+			kdamond_pid = -1;
+	}
+	return sprintf(buffer, "%d\n", kdamond_pid);
+}
+
+static const struct kernel_param_ops kdamond_pid_param_ops = {
+	.set = damon_lru_sort_kdamond_pid_store,
+	.get = damon_lru_sort_kdamond_pid_load,
+};
+
+/*
+ * PID of the DAMON thread
+ *
+ * If DAMON_LRU_SORT is enabled, this becomes the PID of the worker thread.
+ * Else, -1.
+ */
+module_param_cb(kdamond_pid, &kdamond_pid_param_ops, NULL, 0400);
+
 static int __init damon_lru_sort_init(void)
 {
 	int err;
-- 
2.47.3


* [RFC PATCH v2.1 1/3] mm/damon/reclaim: detect and use fresh enabled and kdamond_pid values
From: SeongJae Park @ 2026-04-18 22:27 UTC
  Cc: SeongJae Park, # 5 . 19 . x, Andrew Morton, damon, linux-kernel,
	linux-mm, Liew Rui Yan
In-Reply-To: <20260418222758.39795-1-sj@kernel.org>

DAMON_RECLAIM updates the 'enabled' and 'kdamond_pid' parameter values,
which represent the running status of its kdamond, when the user
explicitly requests start/stop of the kdamond.  The kdamond can,
however, also be stopped without an explicit user request, in the
following three events.

1. ctx->regions_score_histogram allocation failure at beginning of the
   execution,
2. damon_commit_ctx() failure due to invalid user input, and
3. damon_commit_ctx() failure due to its internal allocation failures.

Hence, if the kdamond is stopped by any of the above three events, the
values of the status parameters can be stale.  Users could see the stale
values and be confused.  This is already bad, but the real consequence
is worse.  DAMON_RECLAIM avoids unnecessary damon_start() and
damon_stop() calls based on the 'enabled' parameter value.  And the
update of the 'enabled' parameter value depends on the damon_start() and
damon_stop() call results.  Hence, once the kdamond has been stopped by
one of the unintentional events, the user cannot restart the kdamond
until the system reboots.  For example, the issue can be reproduced via
the steps below.

    # cd /sys/module/damon_reclaim/parameters
    #
    # # start DAMON_RECLAIM
    # echo Y > enabled
    # ps -ef | grep kdamond
    root         806       2  0 17:53 ?        00:00:00 [kdamond.0]
    root         808     803  0 17:53 pts/4    00:00:00 grep kdamond
    #
    # # commit wrong input to stop kdamond without explicit stop request
    # echo 3 > addr_unit
    # echo Y > commit_inputs
    bash: echo: write error: Invalid argument
    #
    # # confirm kdamond is stopped
    # ps -ef | grep kdamond
    root         811     803  0 17:53 pts/4    00:00:00 grep kdamond
    #
    # # users can now see the stale status
    # cat enabled
    Y
    # cat kdamond_pid
    806
    #
    # # even after fixing the wrong parameter,
    # # kdamond cannot be restarted.
    # echo 1 > addr_unit
    # echo Y > enabled
    # ps -ef | grep kdamond
    root         815     803  0 17:54 pts/4    00:00:00 grep kdamond

The problem will only rarely happen in real and common setups for the
following reasons.  The allocation failures are unlikely in such setups
since those allocations are arguably too small to fail.  Also, sane users
in real production environments are unlikely to commit wrong input
parameters.
But once it happens, the consequence is quite bad.  And the bug is a
bug.

The issue stems from the fact that there are multiple events that can
change the status, and following all the events is challenging.
Dynamically detect and use the fresh status for the parameters when
those are requested.

Fixes: e035c280f6df ("mm/damon/reclaim: support online inputs update")
Cc: <stable@vger.kernel.org> # 5.19.x
Co-developed-by: Liew Rui Yan <aethernet65535@gmail.com>
Signed-off-by: Liew Rui Yan <aethernet65535@gmail.com>
Signed-off-by: SeongJae Park <sj@kernel.org>
---
 mm/damon/reclaim.c | 85 ++++++++++++++++++++++++++++++----------------
 1 file changed, 55 insertions(+), 30 deletions(-)

diff --git a/mm/damon/reclaim.c b/mm/damon/reclaim.c
index 86da147786583..fe7fce26cf6ce 100644
--- a/mm/damon/reclaim.c
+++ b/mm/damon/reclaim.c
@@ -144,15 +144,6 @@ static unsigned long addr_unit __read_mostly = 1;
 static bool skip_anon __read_mostly;
 module_param(skip_anon, bool, 0600);
 
-/*
- * PID of the DAMON thread
- *
- * If DAMON_RECLAIM is enabled, this becomes the PID of the worker thread.
- * Else, -1.
- */
-static int kdamond_pid __read_mostly = -1;
-module_param(kdamond_pid, int, 0400);
-
 static struct damos_stat damon_reclaim_stat;
 DEFINE_DAMON_MODULES_DAMOS_STATS_PARAMS(damon_reclaim_stat,
 		reclaim_tried_regions, reclaimed_regions, quota_exceeds);
@@ -288,12 +279,8 @@ static int damon_reclaim_turn(bool on)
 {
 	int err;
 
-	if (!on) {
-		err = damon_stop(&ctx, 1);
-		if (!err)
-			kdamond_pid = -1;
-		return err;
-	}
+	if (!on)
+		return damon_stop(&ctx, 1);
 
 	err = damon_reclaim_apply_parameters();
 	if (err)
@@ -302,9 +289,6 @@ static int damon_reclaim_turn(bool on)
 	err = damon_start(&ctx, 1, true);
 	if (err)
 		return err;
-	kdamond_pid = damon_kdamond_pid(ctx);
-	if (kdamond_pid < 0)
-		return kdamond_pid;
 	return damon_call(ctx, &call_control);
 }
 
@@ -332,42 +316,83 @@ module_param_cb(addr_unit, &addr_unit_param_ops, &addr_unit, 0600);
 MODULE_PARM_DESC(addr_unit,
 	"Scale factor for DAMON_RECLAIM to ops address conversion (default: 1)");
 
+static bool damon_reclaim_enabled(void)
+{
+	if (!ctx)
+		return false;
+	return damon_is_running(ctx);
+}
+
 static int damon_reclaim_enabled_store(const char *val,
 		const struct kernel_param *kp)
 {
-	bool is_enabled = enabled;
-	bool enable;
 	int err;
 
-	err = kstrtobool(val, &enable);
+	err = kstrtobool(val, &enabled);
 	if (err)
 		return err;
 
-	if (is_enabled == enable)
+	if (damon_reclaim_enabled() == enabled)
 		return 0;
 
 	/* Called before init function.  The function will handle this. */
 	if (!damon_initialized())
-		goto set_param_out;
+		return 0;
 
-	err = damon_reclaim_turn(enable);
-	if (err)
-		return err;
+	return damon_reclaim_turn(enabled);
+}
 
-set_param_out:
-	enabled = enable;
-	return err;
+static int damon_reclaim_enabled_load(char *buffer,
+		const struct kernel_param *kp)
+{
+	return sprintf(buffer, "%c\n", damon_reclaim_enabled() ? 'Y' : 'N');
 }
 
 static const struct kernel_param_ops enabled_param_ops = {
 	.set = damon_reclaim_enabled_store,
-	.get = param_get_bool,
+	.get = damon_reclaim_enabled_load,
 };
 
 module_param_cb(enabled, &enabled_param_ops, &enabled, 0600);
 MODULE_PARM_DESC(enabled,
 	"Enable or disable DAMON_RECLAIM (default: disabled)");
 
+static int damon_reclaim_kdamond_pid_store(const char *val,
+		const struct kernel_param *kp)
+{
+	/*
+	 * kdamond_pid is read-only, but kernel command line could write it.
+	 * Do nothing here.
+	 */
+	return 0;
+}
+
+static int damon_reclaim_kdamond_pid_load(char *buffer,
+		const struct kernel_param *kp)
+{
+	int kdamond_pid = -1;
+
+	if (ctx) {
+		kdamond_pid = damon_kdamond_pid(ctx);
+		if (kdamond_pid < 0)
+			kdamond_pid = -1;
+	}
+	return sprintf(buffer, "%d\n", kdamond_pid);
+}
+
+static const struct kernel_param_ops kdamond_pid_param_ops = {
+	.set = damon_reclaim_kdamond_pid_store,
+	.get = damon_reclaim_kdamond_pid_load,
+};
+
+/*
+ * PID of the DAMON thread
+ *
+ * If DAMON_RECLAIM is enabled, this becomes the PID of the worker thread.
+ * Else, -1.
+ */
+module_param_cb(kdamond_pid, &kdamond_pid_param_ops, NULL, 0400);
+
 static int __init damon_reclaim_init(void)
 {
 	int err;
-- 
2.47.3


* [RFC PATCH v2.1 0/3] mm/damon/modules: detect and use fresh status
From: SeongJae Park @ 2026-04-18 22:27 UTC
  Cc: SeongJae Park, # 5 . 19 . x, Andrew Morton, damon, linux-kernel,
	linux-mm

DAMON modules including DAMON_RECLAIM, DAMON_LRU_SORT and DAMON_STAT
commonly expose the kdamond running status via their parameters.  Under
certain scenarios including wrong user inputs and memory allocation
failures, those parameter values can be stale.  It can confuse users.
For DAMON_RECLAIM and DAMON_LRU_SORT, it even makes it impossible to
restart the kdamond until the system reboots.

The problem comes from the fact that there are multiple events that
change the status, and it is difficult to keep track of all of them.  Fix
the issue by detecting and using the status on demand, instead of using
a cached status that is difficult to keep up to date.
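
The difference between the cached and the on-demand approach can be
illustrated with a small user-space sketch.  This is not DAMON code:
worker_running, cached_enabled and all helper functions below are
hypothetical stand-ins for the kdamond state, the old 'enabled' variable
and the parameter store callbacks.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the live kdamond state. */
static bool worker_running;
/* Hypothetical stand-in for the old cached 'enabled' variable. */
static bool cached_enabled;

/* The worker stops on its own, e.g. after rejecting a bad input. */
static void worker_dies_unexpectedly(void)
{
	worker_running = false;	/* nobody updates cached_enabled */
}

/* Old scheme: skip the request if the cache says nothing changes. */
static void store_cached(bool want)
{
	if (cached_enabled == want)
		return;
	worker_running = want;
	cached_enabled = want;
}

/* New scheme: compare the request against the live status. */
static void store_fresh(bool want)
{
	if (worker_running == want)
		return;
	worker_running = want;
}

/*
 * Replay the scenario: start, unexpected stop, start again.
 * Returns whether the worker is running at the end.
 */
static bool replay(void (*store)(bool))
{
	worker_running = false;
	cached_enabled = false;
	store(true);
	worker_dies_unexpectedly();
	store(true);
	return worker_running;
}
```

With the cached scheme, the second start request is ignored because the
cache still says "enabled", so replay(store_cached) ends with the worker
stopped.  With the on-demand scheme, replay(store_fresh) ends with the
worker running again.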

Patches 1-3 fix the bugs in DAMON_RECLAIM, DAMON_LRU_SORT and DAMON_STAT,
in that order.

Changes from RFC v2
- rfc v2: https://lore.kernel.org/20260418014439.6353-1-sj@kernel.org
- Set kdamond_pid set callbacks.
- Support multiple enabled parameters setup on boot commandline.
- Acknowledge the third patch was discovered by Sashiko.
Changes from v2
- v2: https://lore.kernel.org/20260413185249.5921-1-aethernet65535@gmail.com
- Add RFC tag back, for sashiko review.
- Detect and use fresh status instead of trying to catch up all scenarios.
- Change Liew from the responsible author to a credit-deserved co-developer.
- Move authorship responsibility to SJ.
- Add DAMON_STAT fix.
  - RFC of the fix was posted separately
    (https://lore.kernel.org/20260416143857.76146-1-sj@kernel.org), and
    only commit message wordsmithing is added in this version.
Changes from RFC
- rfc: https://lore.kernel.org/20260330164347.12772-1-aethernet65535@gmail.com
- Remove RFC tag.
- Remove 'damon_thread_status' structure and damon_update_thread_status()
  (SJ pointed out this was too much extension of core API for a problem
  that can be fixed more simply).
- Add a fallback in damon_{lru_sort, reclaim}_turn() 'N' path. If
  damon_stop() fails but kdamond is not running, forcefully reset the
  parameters.
- Reset 'enabled' and 'kdamond_pid' when damon_commit_ctx() fails in
  damon_{lru_sort, reclaim}_apply_parameters() (kdamond will terminate
  eventually in this case).

SeongJae Park (3):
  mm/damon/reclaim: detect and use fresh enabled and kdamond_pid values
  mm/damon/lru_sort: detect and use fresh enabled and kdamond_pid values
  mm/damon/stat: detect and use fresh enabled value

 mm/damon/lru_sort.c | 85 +++++++++++++++++++++++++++++----------------
 mm/damon/reclaim.c  | 85 +++++++++++++++++++++++++++++----------------
 mm/damon/stat.c     | 30 ++++++++++------
 3 files changed, 130 insertions(+), 70 deletions(-)


base-commit: 710b7b26c423290803f447f5ed2fb264e91cda56
-- 
2.47.3


* Re: [PATCH] bpf: crypto: reject unterminated type and algorithm names
From: Vadim Fedorenko @ 2026-04-18 21:33 UTC
  To: Pengpeng Hou, Alexei Starovoitov
  Cc: Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau,
	Eduard Zingerman, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, bpf,
	linux-kernel, stable
In-Reply-To: <20260417073128.91029-1-pengpeng@iscas.ac.cn>

On 17.04.2026 08:31, Pengpeng Hou wrote:
> bpf_crypto_ctx_create() validates the overall size of
> struct bpf_crypto_params, but it does not verify that the fixed-width
> type[14] and algo[128] fields are NUL-terminated before passing them to
> string consumers.
> 
> A caller can therefore fill either field without a terminator and cause
> bpf_crypto_get_type(), has_algo(), or alloc_tfm() to read past the end
> of the fixed buffer.
How can this happen for statically defined type/algo structures?
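
For reference, a termination check on a fixed-width field of the kind
the quoted changelog proposes is usually a one-liner.  The sketch below
is generic user-space C, not the code of the patch under discussion;
field_is_terminated() is a hypothetical helper name, and whether such a
check is needed here is exactly what the question above asks.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Return true if the fixed-width buffer buf[0..size) holds a NUL byte. */
static bool field_is_terminated(const char *buf, size_t size)
{
	return memchr(buf, '\0', size) != NULL;
}
```

A caller would reject the input when field_is_terminated(field,
sizeof(field)) returns false, before handing the field to any function
that expects a C string.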


* Re: [PATCH net] sctp: fix OOB write to userspace in sctp_getsockopt_peer_auth_chunks
From: patchwork-bot+netdevbpf @ 2026-04-18 19:30 UTC
  To: Michael Bommarito
  Cc: linux-sctp, marcelo.leitner, lucien.xin, davem, edumazet, kuba,
	pabeni, horms, netdev, linux-kernel, stable
In-Reply-To: <20260416031903.1447072-1-michael.bommarito@gmail.com>

Hello:

This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Wed, 15 Apr 2026 23:19:03 -0400 you wrote:
> sctp_getsockopt_peer_auth_chunks() checks that the caller's optval
> buffer is large enough for the peer AUTH chunk list with
> 
>     if (len < num_chunks)
>             return -EINVAL;
> 
> but then writes num_chunks bytes to p->gauth_chunks, which lives
> at offset offsetof(struct sctp_authchunks, gauth_chunks) == 8
> inside optval.  The check is missing the sizeof(struct
> sctp_authchunks) = 8-byte header.  When the caller supplies
> len == num_chunks (for any num_chunks > 0) the test passes but
> copy_to_user() writes sizeof(struct sctp_authchunks) = 8 bytes
> past the declared buffer.
> 
> [...]
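
The size mismatch described in the quoted changelog can be sketched in
user space as follows.  struct authchunks_like is a hypothetical mirror
of a header followed by a flexible array, and both helpers are
illustrative only; the applied commit may be written differently.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of a header-plus-flexible-array layout. */
struct authchunks_like {
	uint32_t assoc_id;
	uint32_t number_of_chunks;
	uint8_t chunks[];	/* starts at offset 8 */
};

/* Check as quoted above: ignores the 8-byte header. */
static bool len_ok_without_header(size_t len, size_t num_chunks)
{
	return len >= num_chunks;
}

/* Check that also accounts for the header before the chunk array. */
static bool len_ok_with_header(size_t len, size_t num_chunks)
{
	return len >= offsetof(struct authchunks_like, chunks) + num_chunks;
}
```

With len == num_chunks == 4, the first check accepts the buffer although
the chunk bytes would land at offsets 8..11, while the second check
rejects it and only accepts len >= 12.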

Here is the summary with links:
  - [net] sctp: fix OOB write to userspace in sctp_getsockopt_peer_auth_chunks
    https://git.kernel.org/netdev/net/c/0cf004ffb61c

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html




* Re: [net,PATCH v4 1/2] net: ks8851: Reinstate disabling of BHs around IRQ handler
From: patchwork-bot+netdevbpf @ 2026-04-18 19:20 UTC
  To: Marek Vasut
  Cc: netdev, bigeasy, stable, davem, andrew+netdev, edumazet, kuba, nb,
	pabeni, ronald.wahl, yiconghui, linux-kernel
In-Reply-To: <20260415231020.455298-1-marex@nabladev.com>

Hello:

This series was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Thu, 16 Apr 2026 01:09:44 +0200 you wrote:
> If the driver executes ks8851_irq() AND a TX packet has been sent, then
> the driver enables TX queue via netif_wake_queue() which schedules TX
> softirq to queue packets for this device.
> 
> If CONFIG_PREEMPT_RT=y is set AND a packet has also been received by
> the MAC, then ks8851_rx_pkts() calls netdev_alloc_skb_ip_align() to
> allocate SKBs for the received packets. If netdev_alloc_skb_ip_align()
> is called with BH enabled, then local_bh_enable() at the end of
> netdev_alloc_skb_ip_align() will trigger the pending softirq processing,
> which may ultimately call the .xmit callback ks8851_start_xmit_par().
> The ks8851_start_xmit_par() will try to lock struct ks8851_net_par
> .lock spinlock, which is already locked by ks8851_irq() from which
> ks8851_start_xmit_par() was called. This leads to a deadlock, which
> is reported by the kernel, including a trace listed below.
> 
> [...]

Here is the summary with links:
  - [net,v4,1/2] net: ks8851: Reinstate disabling of BHs around IRQ handler
    https://git.kernel.org/netdev/net/c/5c9fcac3c872
  - [net,v4,2/2] net: ks8851: Avoid excess softirq scheduling
    https://git.kernel.org/netdev/net/c/22230e68b2cf

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html




* [PATCH] eventfs: Hold eventfs_mutex and SRCU when remount walks events
From: David Carlier @ 2026-04-18 19:17 UTC
  To: rostedt, mhiramat
  Cc: mathieu.desnoyers, linux-trace-kernel, linux-kernel,
	David Carlier, stable

Commit 340f0c7067a9 ("eventfs: Update all the eventfs_inodes from the
events descriptor") had eventfs_set_attrs() recurse through ei->children
on remount.  The walk only holds the rcu_read_lock() taken by
tracefs_apply_options() over tracefs_inodes, which is wrong:

  - list_for_each_entry over ei->children races with the list_del_rcu()
    in eventfs_remove_rec() -- LIST_POISON1 deref, same shape as
    d2603279c7d6.
  - eventfs_inodes are freed via call_srcu(&eventfs_srcu, ...).
    rcu_read_lock() does not extend an SRCU grace period, so ti->private
    can be reclaimed under the walk.
  - The writes to ei->attr race with eventfs_set_attr(), which holds
    eventfs_mutex.

Reproducer:

  while :; do mount -o remount,uid=$((RANDOM%1000)) /sys/kernel/tracing; done &
  while :; do
      echo "p:kp submit_bio" > /sys/kernel/tracing/kprobe_events
      echo > /sys/kernel/tracing/kprobe_events
  done

Wrap the events portion of tracefs_apply_options() in
eventfs_remount_lock()/_unlock() that take eventfs_mutex and
srcu_read_lock(&eventfs_srcu).  eventfs_set_attrs() doesn't sleep so the
nested rcu_read_lock() is fine; lockdep_assert_held() pins the contract.

Comment in tracefs_drop_inode() said "RCU cycle" -- it is SRCU.

Fixes: 340f0c7067a9 ("eventfs: Update all the eventfs_inodes from the events descriptor")
Cc: stable@vger.kernel.org
Signed-off-by: David Carlier <devnexen@gmail.com>
---
 fs/tracefs/event_inode.c | 14 ++++++++++++++
 fs/tracefs/inode.c       |  5 ++++-
 fs/tracefs/internal.h    |  3 +++
 3 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
index 81df94038f2e..79193021c6b0 100644
--- a/fs/tracefs/event_inode.c
+++ b/fs/tracefs/event_inode.c
@@ -244,6 +244,8 @@ static void eventfs_set_attrs(struct eventfs_inode *ei, bool update_uid, kuid_t
 {
 	struct eventfs_inode *ei_child;
 
+	lockdep_assert_held(&eventfs_mutex);
+
 	/* Update events/<system>/<event> */
 	if (WARN_ON_ONCE(level > 3))
 		return;
@@ -886,3 +888,15 @@ void eventfs_remove_events_dir(struct eventfs_inode *ei)
 	d_invalidate(dentry);
 	d_make_discardable(dentry);
 }
+
+int eventfs_remount_lock(void)
+{
+	mutex_lock(&eventfs_mutex);
+	return srcu_read_lock(&eventfs_srcu);
+}
+
+void eventfs_remount_unlock(int srcu_idx)
+{
+	srcu_read_unlock(&eventfs_srcu, srcu_idx);
+	mutex_unlock(&eventfs_mutex);
+}
diff --git a/fs/tracefs/inode.c b/fs/tracefs/inode.c
index 03f768536fd5..f3d6188a3b7b 100644
--- a/fs/tracefs/inode.c
+++ b/fs/tracefs/inode.c
@@ -313,6 +313,7 @@ static int tracefs_apply_options(struct super_block *sb, bool remount)
 	struct inode *inode = d_inode(sb->s_root);
 	struct tracefs_inode *ti;
 	bool update_uid, update_gid;
+	int srcu_idx;
 	umode_t tmp_mode;
 
 	/*
@@ -337,6 +338,7 @@ static int tracefs_apply_options(struct super_block *sb, bool remount)
 		update_uid = fsi->opts & BIT(Opt_uid);
 		update_gid = fsi->opts & BIT(Opt_gid);
 
+		srcu_idx = eventfs_remount_lock();
 		rcu_read_lock();
 		list_for_each_entry_rcu(ti, &tracefs_inodes, list) {
 			if (update_uid) {
@@ -358,6 +360,7 @@ static int tracefs_apply_options(struct super_block *sb, bool remount)
 				eventfs_remount(ti, update_uid, update_gid);
 		}
 		rcu_read_unlock();
+		eventfs_remount_unlock(srcu_idx);
 	}
 
 	return 0;
@@ -403,7 +406,7 @@ static int tracefs_drop_inode(struct inode *inode)
 	 * This inode is being freed and cannot be used for
 	 * eventfs. Clear the flag so that it doesn't call into
 	 * eventfs during the remount flag updates. The eventfs_inode
-	 * gets freed after an RCU cycle, so the content will still
+	 * gets freed after an SRCU cycle, so the content will still
 	 * be safe if the iteration is going on now.
 	 */
 	ti->flags &= ~TRACEFS_EVENT_INODE;
diff --git a/fs/tracefs/internal.h b/fs/tracefs/internal.h
index d83c2a25f288..a4a7f8431aff 100644
--- a/fs/tracefs/internal.h
+++ b/fs/tracefs/internal.h
@@ -76,4 +76,7 @@ struct inode *tracefs_get_inode(struct super_block *sb);
 void eventfs_remount(struct tracefs_inode *ti, bool update_uid, bool update_gid);
 void eventfs_d_release(struct dentry *dentry);
 
+int eventfs_remount_lock(void);
+void eventfs_remount_unlock(int srcu_idx);
+
 #endif /* _TRACEFS_INTERNAL_H */
-- 
2.53.0



* Re: [PATCH net v2 00/12] Intel Wired LAN Driver Updates 2026-04-14 (ice, i40e, iavf, idpf, e1000e)
From: patchwork-bot+netdevbpf @ 2026-04-18 19:10 UTC
  To: Jacob Keller
  Cc: andrew+netdev, davem, edumazet, kuba, pabeni, netdev,
	grzegorz.nitka, aleksandr.loktionov, horms, sx.rinitha,
	zoltan.fodor, sunithax.d.mekala, lgs201920130244, stable,
	mschmidt, paul.greenwalt, przemyslaw.kitszel, kmta1236, kohei,
	poros, pmenzel, rafal.romanowski, emil.s.tantilov, patryk.holda,
	tactii, avigailx.dahan
In-Reply-To: <20260416-iwl-net-submission-2026-04-14-v2-0-686c33c9828d@intel.com>

Hello:

This series was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Thu, 16 Apr 2026 17:53:24 -0700 you wrote:
> Grzegorz updates the logic for adjusting the PTP hardware clock on E830,
> fixing a bug that prevented adjustments below S32_MAX/MIN nanoseconds.
> 
> Grzegorz and Zoli update the PCS latency settings for E825 devices at 10GbE
> and 25GbE, improving the accuracy of timestamps based on data from
> production hardware.
> 
> [...]

Here is the summary with links:
  - [net,v2,01/12] ice: fix 'adjust' timer programming for E830 devices
    https://git.kernel.org/netdev/net/c/885c5e57924d
  - [net,v2,02/12] ice: update PCS latency settings for E825 10G/25Gb modes
    https://git.kernel.org/netdev/net/c/05567e405273
  - [net,v2,03/12] ice: fix double free in ice_sf_eth_activate() error path
    https://git.kernel.org/netdev/net/c/9aab1c3d7299
  - [net,v2,04/12] ice: fix double-free of tx_buf skb
    https://git.kernel.org/netdev/net/c/1a303baa715e
  - [net,v2,05/12] ice: fix PHY config on media change with link-down-on-close
    https://git.kernel.org/netdev/net/c/55e74f9ea7fe
  - [net,v2,06/12] ice: fix ICE_AQ_LINK_SPEED_M for 200G
    https://git.kernel.org/netdev/net/c/4a3a940059e9
  - [net,v2,07/12] ice: fix race condition in TX timestamp ring cleanup
    https://git.kernel.org/netdev/net/c/7c72ec18c2a4
  - [net,v2,08/12] ice: fix potential NULL pointer deref in error path of ice_set_ringparam()
    https://git.kernel.org/netdev/net/c/fa28351f970f
  - [net,v2,09/12] i40e: don't advertise IFF_SUPP_NOFCS
    https://git.kernel.org/netdev/net/c/a24162f18825
  - [net,v2,10/12] iavf: fix wrong VLAN mask for legacy Rx descriptors L2TAG2
    https://git.kernel.org/netdev/net/c/496d9f91062f
  - [net,v2,11/12] idpf: fix xdp crash in soft reset error path
    (no matching commit)
  - [net,v2,12/12] e1000e: Unroll PTP in probe error handling
    https://git.kernel.org/netdev/net/c/aa3f7fe40935

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html




* Re: [PATCH net v2 11/12] idpf: fix xdp crash in soft reset error path
From: Jakub Kicinski @ 2026-04-18 19:00 UTC
  To: jacob.e.keller
  Cc: Jakub Kicinski, andrew+netdev, davem, edumazet, pabeni, netdev,
	emil.s.tantilov, stable, aleksandr.loktionov, patryk.holda
In-Reply-To: <20260416-iwl-net-submission-2026-04-14-v2-11-686c33c9828d@intel.com>

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.

Jakub: I'll drop this patch and apply the rest.
---
idpf: fix xdp crash in soft reset error path

This commit fixes a NULL pointer dereference in idpf_vport_open() when
soft reset fails by restoring vport->xdp_prog to the old value. It also
updates the restart check to use IDPF_VPORT_UP instead of netif_running().

Cc: stable@vger.kernel.org

> diff --git a/drivers/net/ethernet/intel/idpf/xdp.c b/drivers/net/ethernet/intel/idpf/xdp.c
> --- a/drivers/net/ethernet/intel/idpf/xdp.c
> +++ b/drivers/net/ethernet/intel/idpf/xdp.c
> @@ -488,6 +488,7 @@ static int idpf_xdp_setup_prog(struct idpf_vport *vport,
>  				   "Could not reopen the vport after XDP setup");
>  
>  		cfg->user_config.xdp_prog = old;
> +		vport->xdp_prog = old;

Does this also need to roll back vport->num_xdp_txq?

Because idpf_initiate_soft_reset() overwrites the vport struct with the new
queue configuration before failing, vport->num_xdp_txq might remain > 0
while vport->xdp_prog is restored to NULL.

When the interface is brought up, idpf_xdpsqs_get() evaluates
idpf_xdp_enabled(vport) as false and returns early without allocating the
NAPI timers (xdpsq->timer).

Could another interface redirect XDP frames via ndo_xdp_xmit, bypassing the
check because vport->num_xdp_txq > 0, and cause a NULL pointer dereference
in libeth_xdpsq_queue_timer() during idpf_xdp_tx_finalize()?

>  		old = prog;
>  	}

Is this old = prog assignment leading to a double-put of the BPF program?

If idpf_initiate_soft_reset() fails, this error path drops a reference to
the newly provided prog when it calls bpf_prog_put(old) right after this
block.

Since the function returns an error code to the core network stack, won't
dev_xdp_install() in net/core/dev.c also explicitly call bpf_prog_put(prog)
assuming ownership of the error cleanup?

Could this underflow the reference count and cause a use-after-free?

[ ... ]

> diff --git a/drivers/net/ethernet/intel/idpf/xsk.c b/drivers/net/ethernet/intel/idpf/xsk.c
> --- a/drivers/net/ethernet/intel/idpf/xsk.c
> +++ b/drivers/net/ethernet/intel/idpf/xsk.c
> @@ -568,7 +569,8 @@ int idpf_xsk_pool_setup(struct idpf_vport *vport, struct netdev_bpf *bpf)
>  		return -EINVAL;
>  	}
>  
> -	restart = idpf_xdp_enabled(vport) && netif_running(vport->netdev);
> +	restart = idpf_xdp_enabled(vport) &&
> +		  test_bit(IDPF_VPORT_UP, np->state);
>  	if (!restart)
>  		goto pool;

Does this function leave the queue pair permanently disabled if the pool
setup fails?

If restart is true, the queue pair is disabled via:
    idpf_qp_switch(vport, qid, false);

If the subsequent call to libeth_xsk_setup_pool() fails, the function
returns early:

    ret = libeth_xsk_setup_pool(vport->netdev, qid, pool);
    if (ret) {
        ...
        return ret;
    }

Does this early return bypass the idpf_qp_switch(vport, qid, true) call
expected later, causing a permanent partial loss of connectivity until the
interface is manually restarted?

^ permalink raw reply

* [PATCH net v2 1/2] tcp: call sk_data_ready() after listener migration
From: Zhenzhong Wu @ 2026-04-18 18:13 UTC (permalink / raw)
  To: netdev
  Cc: edumazet, ncardwell, kuniyu, davem, dsahern, kuba, pabeni, horms,
	shuah, tamird, linux-kernel, linux-kselftest, Zhenzhong Wu,
	stable
In-Reply-To: <20260418181333.1713389-1-jt26wzz@gmail.com>

When inet_csk_listen_stop() migrates an established child socket from
a closing listener to another socket in the same SO_REUSEPORT group,
the target listener gets a new accept-queue entry via
inet_csk_reqsk_queue_add(), but that path never notifies the target
listener's waiters. A nonblocking accept() still works because it
checks the queue directly, but poll()/epoll_wait() waiters and
blocking accept() callers can remain asleep indefinitely.

Call READ_ONCE(nsk->sk_data_ready)(nsk) after a successful migration
in inet_csk_listen_stop().

However, after inet_csk_reqsk_queue_add() succeeds, the ref acquired
in reuseport_migrate_sock() is effectively transferred to
nreq->rsk_listener. Another CPU can then dequeue nreq via accept()
or listener shutdown, hit reqsk_put(), and drop that listener ref.
Since listeners are SOCK_RCU_FREE, wrap the post-queue_add()
dereferences of nsk in rcu_read_lock()/rcu_read_unlock(), which also
covers the existing sock_net(nsk) access in that path.

The reqsk_timer_handler() path does not need the same changes for two
reasons: half-open requests become readable only after the final ACK,
where tcp_child_process() already wakes the listener; and once nreq is
visible via inet_ehash_insert(), the success path no longer touches
nsk directly.

Fixes: 54b92e841937 ("tcp: Migrate TCP_ESTABLISHED/TCP_SYN_RECV sockets in accept queues.")
Cc: stable@vger.kernel.org
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Zhenzhong Wu <jt26wzz@gmail.com>
---
 net/ipv4/inet_connection_sock.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index 4ac3ae1bc..928654c34 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -1479,16 +1479,19 @@ void inet_csk_listen_stop(struct sock *sk)
 			if (nreq) {
 				refcount_set(&nreq->rsk_refcnt, 1);

+				rcu_read_lock();
 				if (inet_csk_reqsk_queue_add(nsk, nreq, child)) {
 					__NET_INC_STATS(sock_net(nsk),
 							LINUX_MIB_TCPMIGRATEREQSUCCESS);
 					reqsk_migrate_reset(req);
+					READ_ONCE(nsk->sk_data_ready)(nsk);
 				} else {
 					__NET_INC_STATS(sock_net(nsk),
 							LINUX_MIB_TCPMIGRATEREQFAILURE);
 					reqsk_migrate_reset(nreq);
 					__reqsk_free(nreq);
 				}
+				rcu_read_unlock();

 				/* inet_csk_reqsk_queue_add() has already
 				 * called inet_child_forget() on failure case.
-- 
2.43.0

^ permalink raw reply related

* Re: [PATCH net v2] ibmveth: Disable GSO for packets with small MSS
From: Jakub Kicinski @ 2026-04-18 17:54 UTC (permalink / raw)
  To: mmc
  Cc: Jakub Kicinski, netdev, davem, edumazet, pabeni, horms, bjking1,
	haren, ricklind, maddy, mpe, linuxppc-dev, stable, shaik.abdulla1,
	naveedaus
In-Reply-To: <20260417172910.81433-1-mmc@linux.ibm.com>

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
---
ibmveth: Disable GSO for packets with small MSS

This commit adds an ndo_features_check callback to the ibmveth driver to
disable segmentation offload when the MSS is less than 224 bytes. This
prevents physical adapters in SEA configurations from freezing when given
small MSS packets.

> diff --git a/drivers/net/ethernet/ibm/ibmveth.c b/drivers/net/ethernet/ibm/ibmveth.c
> --- a/drivers/net/ethernet/ibm/ibmveth.c
> +++ b/drivers/net/ethernet/ibm/ibmveth.c

[ ... ]

> @@ -1756,6 +1756,25 @@ static int ibmveth_set_mac_addr(struct net_device *dev, void *p)
>  	return 0;
>  }
>  
> +static netdev_features_t ibmveth_features_check(struct sk_buff *skb,
> +						struct net_device *dev,
> +						netdev_features_t features)
> +{
> +	/* Some physical adapters do not support segmentation offload with
> +	 * MSS < 224. Disable GSO for such packets to avoid adapter freeze.
> +	 */
> +	if (skb_is_gso(skb)) {

The commit message mentions this matches the ibmvnic hardware errata from
commit f10b09ef687f. That commit disabled GSO for both gso_size < 224 and
gso_segs == 1.

By omitting the check for skb_shinfo(skb)->gso_segs == 1, could a
single-segment GSO packet still cause the adapter to freeze?

> +		if (skb_shinfo(skb)->gso_size < IBMVETH_MIN_LSO_MSS) {
> +			netdev_warn_once(dev,
> +					 "MSS %u too small for LSO, disabling GSO\n",
> +					 skb_shinfo(skb)->gso_size);
> +			features &= ~NETIF_F_GSO_MASK;
> +		}
> +	}
> +
> +	return features;

When implementing a custom ndo_features_check callback, netif_skb_features()
will bypass the standard dflt_features_check() function. Does this mean
vlan_features_check() is no longer called for this driver?

If vlan_features_check() is skipped, multi-tagged (QinQ) VLAN packets
might retain GSO and protocol-specific checksum offloads that the hardware
parser cannot handle.

Should this return vlan_features_check(skb, features) instead?

> +}
-- 
pw-bot: cr
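
The first review question above can be made concrete with a small user-space model. This is only a sketch of the condition the review describes for the ibmvnic fix (small MSS or a single segment), not the ibmveth driver's code; MODEL_MIN_LSO_MSS and model_must_disable_gso() are hypothetical names invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the 224-byte threshold named in the review. */
#define MODEL_MIN_LSO_MSS 224u

/*
 * Model of the combined condition: a GSO packet loses segmentation
 * offload when its MSS is below the threshold or when it consists of
 * exactly one segment. Non-GSO packets are never affected.
 */
static bool model_must_disable_gso(bool is_gso, unsigned int gso_size,
				   unsigned int gso_segs)
{
	if (!is_gso)
		return false;

	return gso_size < MODEL_MIN_LSO_MSS || gso_segs == 1;
}
```

With only the size check, a packet with a normal MSS and a single segment would keep GSO enabled; the extra gso_segs == 1 term is what the review asks about.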

^ permalink raw reply

* [PATCH 2/2] ksmbd: reset rcount per connection in ksmbd_conn_wait_idle_sess_id()
From: DaeMyung Kang @ 2026-04-18 17:28 UTC (permalink / raw)
  To: linkinjeon, smfrench
  Cc: senozhatsky, tom, linux-cifs, linux-kernel, stable,
	Henrique Carvalho, DaeMyung Kang
In-Reply-To: <20260418172844.1333378-1-charsyam@gmail.com>

rcount is intended to be connection-specific: 2 for curr_conn, 1 for
every other connection sharing the same session.  However, it is
initialised only once before the hash iteration and is never reset.
After the loop visits curr_conn, later sibling connections are also
checked against rcount == 2, so a sibling with req_running == 1 is
incorrectly treated as idle.  This makes the outcome depend on the
hash iteration order: whether a given sibling is checked against the
loose (< 2) or the strict (< 1) threshold is decided by whether it
happens to be visited before or after curr_conn.

The function's contract is "wait until every connection sharing this
session is idle" so that destroy_previous_session() can safely tear
the session down.  The latched rcount violates that contract and
reopens the teardown race window the wait logic was meant to close:
destroy_previous_session() may proceed before sibling channels have
actually quiesced, overlapping session teardown with in-flight work
on those connections.

Recompute rcount inside the loop so each connection is compared
against its own threshold regardless of iteration order.

This is a code-inspection fix for an iteration-order-dependent logic
error; a targeted reproducer would require SMB3 multichannel with
in-flight work on a sibling channel landing after curr_conn in hash
order, which is not something that can be triggered reliably.

Fixes: 76e98a158b20 ("ksmbd: fix race condition between destroy_previous_session() and smb2 operations()")
Cc: stable@vger.kernel.org
Signed-off-by: DaeMyung Kang <charsyam@gmail.com>
---
 fs/smb/server/connection.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/fs/smb/server/connection.c b/fs/smb/server/connection.c
index a26899d12df1..b5e077f272cf 100644
--- a/fs/smb/server/connection.c
+++ b/fs/smb/server/connection.c
@@ -237,7 +237,7 @@ int ksmbd_conn_wait_idle_sess_id(struct ksmbd_conn *curr_conn, u64 sess_id)
 {
 	struct ksmbd_conn *conn;
 	int rc, retry_count = 0, max_timeout = 120;
-	int rcount = 1, bkt;
+	int rcount, bkt;

 retry_idle:
 	if (retry_count >= max_timeout)
@@ -246,8 +246,7 @@ int ksmbd_conn_wait_idle_sess_id(struct ksmbd_conn *curr_conn, u64 sess_id)
 	down_read(&conn_list_lock);
 	hash_for_each(conn_list, bkt, conn, hlist) {
 		if (conn->binding || xa_load(&conn->sessions, sess_id)) {
-			if (conn == curr_conn)
-				rcount = 2;
+			rcount = (conn == curr_conn) ? 2 : 1;
 			if (atomic_read(&conn->req_running) >= rcount) {
 				rc = wait_event_timeout(conn->req_running_q,
 					atomic_read(&conn->req_running) < rcount,
-- 
2.43.0
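
The iteration-order dependency described in the commit message can be reproduced in a small user-space model. This is a simplified sketch, assuming a flat array in place of the connection hash and an immediate return in place of wait_event_timeout(); struct conn_model and both helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one connection bound to the session. */
struct conn_model {
	int req_running;	/* requests currently in flight */
	bool is_curr;		/* true for the caller's own connection */
};

/* Pre-patch logic: the threshold stays at 2 once curr_conn was visited. */
static bool all_idle_latched(const struct conn_model *c, size_t n)
{
	int rcount = 1;

	for (size_t i = 0; i < n; i++) {
		if (c[i].is_curr)
			rcount = 2;
		if (c[i].req_running >= rcount)
			return false;
	}
	return true;
}

/* Patched logic: the threshold is recomputed for every connection. */
static bool all_idle_per_conn(const struct conn_model *c, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		int rcount = c[i].is_curr ? 2 : 1;

		if (c[i].req_running >= rcount)
			return false;
	}
	return true;
}
```

Given a sibling with one request in flight, the latched variant reports "idle" only when curr_conn happens to come first in the array, while the per-connection variant reports "busy" in either order.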

^ permalink raw reply related

* [PATCH 1/2] ksmbd: fix active_num_conn leak when alloc_transport() fails
From: DaeMyung Kang @ 2026-04-18 17:28 UTC (permalink / raw)
  To: linkinjeon, smfrench
  Cc: senozhatsky, tom, linux-cifs, linux-kernel, stable,
	Henrique Carvalho, DaeMyung Kang
In-Reply-To: <20260418172844.1333378-1-charsyam@gmail.com>

ksmbd_kthread_fn() increments active_num_conn right after accept(),
before calling ksmbd_tcp_new_connection().  The decrement normally
happens in ksmbd_tcp_disconnect() at the end of the connection's
lifetime.

If alloc_transport() fails in ksmbd_tcp_new_connection(), the function
releases the socket and returns -ENOMEM without going through
ksmbd_tcp_disconnect(), so active_num_conn never gets decremented.
Under memory pressure, repeated failures monotonically inflate the
counter until max_connections is reached and new clients are refused
indefinitely.

Decrement active_num_conn on this error path, matching the accounting
rule used by ksmbd_kthread_fn() and ksmbd_tcp_disconnect().

Commit 77ffbcac4e56 ("smb: server: fix leak of active_num_conn in
ksmbd_tcp_new_connection()") fixed the sibling leak on the kthread_run()
failure path; this patch closes the remaining one.

Reproduced with a debug build that adds a temporary module parameter
guarding an early return at the top of alloc_transport(), forcing
the first N accept-time transport allocations to fail:

  * Configure ksmbd with "max connections = 3".
  * Force 5 successive alloc_transport() failures at the accept path.
  * Without the fix: active_num_conn drifts up to max_connections and
    subsequent legitimate mount.cifs attempts are refused with
    "ksmbd: Limit the maximum number of connections(3)" in dmesg.
  * With the fix: the counter is correctly decremented on each
    failure and legitimate mounts continue to succeed.

Tested by injecting 5 alloc_transport() failures with
max_connections=3 and verifying that subsequent mount.cifs attempts
still succeed on the patched kernel while the unpatched kernel
refuses them.

Fixes: 0d0d4680db22 ("ksmbd: add max connections parameter")
Cc: stable@vger.kernel.org
Signed-off-by: DaeMyung Kang <charsyam@gmail.com>
---
 fs/smb/server/transport_tcp.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/smb/server/transport_tcp.c b/fs/smb/server/transport_tcp.c
index 7e29b06820e2..400412444838 100644
--- a/fs/smb/server/transport_tcp.c
+++ b/fs/smb/server/transport_tcp.c
@@ -182,6 +182,8 @@ static int ksmbd_tcp_new_connection(struct socket *client_sk)

 	t = alloc_transport(client_sk);
 	if (!t) {
+		if (server_conf.max_connections)
+			atomic_dec(&active_num_conn);
 		sock_release(client_sk);
 		return -ENOMEM;
 	}
-- 
2.43.0
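
The accounting drift described above can be sketched as a user-space model. This is an illustration under simplifying assumptions, not the ksmbd code: the limit comparison is reduced to "active >= max", and struct model_server, model_accept() and model_refused_after_failures() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the connection counter and its limit. */
struct model_server {
	int active;	/* models active_num_conn */
	int max;	/* models max_connections, 0 means unlimited */
};

/* One accept attempt; returns true if the connection was set up. */
static bool model_accept(struct model_server *s, bool alloc_fails, bool patched)
{
	if (s->max && s->active >= s->max)
		return false;		/* refused: limit reached */
	if (s->max)
		s->active++;		/* counted right after accept() */
	if (alloc_fails) {
		if (patched && s->max)
			s->active--;	/* the decrement the patch adds */
		return false;
	}
	return true;
}

/* After 'failures' failed allocations, is a healthy client refused? */
static bool model_refused_after_failures(int failures, int max, bool patched)
{
	struct model_server s = { 0, max };

	for (int i = 0; i < failures; i++)
		model_accept(&s, true, patched);

	return !model_accept(&s, false, patched);
}
```

With max set to 3 and five injected failures, the unpatched model refuses the next healthy client while the patched model accepts it, which mirrors the reproduction steps listed in the commit message.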

^ permalink raw reply related

* [PATCH 0/2] ksmbd: connection accounting and session teardown fixes
From: DaeMyung Kang @ 2026-04-18 17:28 UTC (permalink / raw)
  To: linkinjeon, smfrench
  Cc: senozhatsky, tom, linux-cifs, linux-kernel, stable,
	Henrique Carvalho, DaeMyung Kang

Two independent correctness fixes in the ksmbd server.

 1/2 ksmbd_tcp_new_connection() does not decrement active_num_conn on
     the alloc_transport() failure path, so repeated allocation
     failures monotonically inflate the counter until max_connections
     is reached and new clients are refused indefinitely.  This is
     the remaining half of the same family of accounting bugs
     addressed by 77ffbcac4e56 ("smb: server: fix leak of
     active_num_conn in ksmbd_tcp_new_connection()"), which only
     closed the kthread_run() failure path.  Reproduced under a debug
     build that forces alloc_transport() to return NULL for a bounded
     number of calls; details in the commit log.

 2/2 ksmbd_conn_wait_idle_sess_id() stores its per-connection
     threshold (rcount) in cross-iteration state, so whether a given
     sibling connection is compared against the loose (< 2) or the
     strict (< 1) threshold is decided by hash iteration order
     relative to curr_conn.  Connections visited after curr_conn can
     slip through the idle check while still processing requests
     against the same session, reopening the teardown race
     destroy_previous_session() was meant to close.  This is a
     code-inspection fix; the iteration-order dependency makes a
     targeted reproducer impractical.

The two patches are independent; the series order is not significant.

DaeMyung Kang (2):
  ksmbd: fix active_num_conn leak when alloc_transport() fails
  ksmbd: reset rcount per connection in ksmbd_conn_wait_idle_sess_id()

 fs/smb/server/connection.c    | 5 ++---
 fs/smb/server/transport_tcp.c | 2 ++
 2 files changed, 4 insertions(+), 3 deletions(-)

--
2.43.0

^ permalink raw reply

* Re: [PATCH 5.10 311/491] dmaengine: xilinx: xilinx_dma: Fix unmasked residue subtraction
From: Marek Vasut @ 2026-04-18 16:51 UTC (permalink / raw)
  To: Ben Hutchings, Greg Kroah-Hartman, stable
  Cc: patches, Vinod Koul, Sasha Levin
In-Reply-To: <6def01a404f3b10ac374c011000637c86598453b.camel@decadent.org.uk>

On 4/16/26 8:43 PM, Ben Hutchings wrote:
> On Thu, 2026-04-16 at 20:20 +0200, Marek Vasut wrote:
>> On 4/16/26 7:58 PM, Ben Hutchings wrote:
>>> On Mon, 2026-04-13 at 17:59 +0200, Greg Kroah-Hartman wrote:
>>>> 5.10-stable review patch.  If anyone has any objections, please let me know.
>>>>
>>>> ------------------
>>>>
>>>> From: Marek Vasut <marex@nabladev.com>
>>>>
>>>> [ Upstream commit c7d812e33f3e8ca0fa9eeabf71d1c7bc3acedc09 ]
>>>>
>>>> The segment .control and .status fields both contain top bits which are
>>>> not part of the buffer size, the buffer size is located only in the bottom
>>>> max_buffer_len bits. To avoid interference from those top bits, mask out
>>>> the size using max_buffer_len first, and only then subtract the values.
>>>
>>> This change is harmless, but the problem it claims to fix does not
>>> exist.
>>
>> The current code subtracts two independently read values which both
>> contain status/control MSbits and the actual value LSbits. Depending on
>> the MSbits being identical in both separately read values is unsafe, so
>> the change in this patch masks out the MSbits first and then does the
>> subtraction on the actual value LSbits only, which is safe.
>>
>> Why do you think the original unsafe behavior can not trigger a failure?
> 
> The old code masked out the MSbits after subtraction.  So, there was no
> dependency on their being equal before subtraction.  Since borrows
> propagate to the left, not the right, the MSbits could not "interfere"
> with the LSbits.
> 
> If you still aren't convinced, please try to find some example values
> for which the result would actually change.

Ah sigh, you're right. I will add this to the list of lessons learnt
the hard way. Thank you for the clarification.
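
The point about borrows can be checked mechanically. The sketch below is a user-space illustration, assuming the mask covers contiguous low bits (as a buffer-length mask would); the function names are hypothetical and the code is not taken from the xilinx_dma driver. Unsigned subtraction is arithmetic modulo 2^32, and reducing modulo 2^n before or after the subtraction gives the same low n bits.

```c
#include <assert.h>
#include <stdint.h>

/* Old order: subtract the raw words, then keep the low bits. */
static uint32_t residue_mask_after(uint32_t control, uint32_t status,
				   uint32_t mask)
{
	return (control - status) & mask;
}

/* New order: keep the low bits of each word, then subtract. */
static uint32_t residue_mask_first(uint32_t control, uint32_t status,
				   uint32_t mask)
{
	return ((control & mask) - (status & mask)) & mask;
}

/*
 * Exhaustively compare both orders for every pair of 'bits'-wide words,
 * using a mask of the low 'len_bits' bits. Returns the mismatch count.
 */
static unsigned long count_mismatches(unsigned int bits, unsigned int len_bits)
{
	uint32_t limit = 1u << bits;
	uint32_t mask = (1u << len_bits) - 1;
	unsigned long bad = 0;

	for (uint32_t c = 0; c < limit; c++)
		for (uint32_t s = 0; s < limit; s++)
			if (residue_mask_after(c, s, mask) !=
			    residue_mask_first(c, s, mask))
				bad++;

	return bad;
}
```

Calling count_mismatches(10, 6) walks all pairs of 10-bit words with a 6-bit length field, so the upper four bits play the role of the status/control flags.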

^ permalink raw reply

* Re: [net,PATCH v3 1/2] net: ks8851: Reinstate disabling of BHs around IRQ handler
From: Marek Vasut @ 2026-04-18 16:46 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: netdev, stable, David S. Miller, Andrew Lunn, Eric Dumazet,
	Jakub Kicinski, Nicolai Buchwitz, Paolo Abeni, Ronald Wahl,
	Yicong Hui, linux-kernel
In-Reply-To: <20260416104818._EDbo9hA@linutronix.de>

On 4/16/26 12:48 PM, Sebastian Andrzej Siewior wrote:
> On 2026-04-16 11:26:00 [+0200], Marek Vasut wrote:
>>> memory allocation. Therefore I am saying this backtrace is from an older
>>> kernel.
>>
>> I actually did update the backtrace in V3 with the one from next 20260413
>> that contained b44596ffe1b4 ("ARM: Allow to enable RT") from
>> stable-rt/v6.12-rt-rebase branch [1] .
>>
>> I think I misunderstood the usage of "softirq is raised" vs. "softirq is
>> invoked" above . Is it possible that there was an already raised softirq
>> before the threaded IRQ handler was invoked, and __netdev_alloc_skb() is
>> what invoked that softirq ?
> 
> It is not impossible. Something needs to netif_wake_queue() and
> ks8851_irq() must only report IRQ_RXI (not IRQ_TXI). Then it can happen.
> But usually the driver "stops" the queue if it can't process any new
> packets and resumes it once a packet has been sent so it has room again.

This driver's .start_xmit is very simple: if there is space in the 6 kiB
TX FIFO, the packet is written into it; otherwise .start_xmit returns
NETDEV_TX_BUSY. There does not seem to be any
netif_{start,stop,wake}_queue() call in the .start_xmit path.

^ permalink raw reply

* [PATCH net] seg6: fix seg6 lwtunnel output redirect for L2 reduced encap mode
From: Andrea Mayer @ 2026-04-18 16:28 UTC (permalink / raw)
  To: davem, dsahern, edumazet, kuba, pabeni, horms
  Cc: anton.makarov11235, stefano.salsano, netdev, linux-kernel,
	Andrea Mayer, stable

When SEG6_IPTUN_MODE_L2ENCAP_RED (L2ENCAP_RED) was introduced, the
condition in seg6_build_state() that excludes L2 encap modes from
setting LWTUNNEL_STATE_OUTPUT_REDIRECT was not updated to account for
the new mode.
As a consequence, L2ENCAP_RED routes incorrectly trigger seg6_output()
on the output path, where the packet is silently dropped because
skb_mac_header_was_set() fails on L3 packets.

Extend the check to also exclude L2ENCAP_RED, consistent with L2ENCAP.

Fixes: 13f0296be8ec ("seg6: add support for SRv6 H.L2Encaps.Red behavior")
Cc: stable@vger.kernel.org
Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it>
---
 net/ipv6/seg6_iptunnel.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/ipv6/seg6_iptunnel.c b/net/ipv6/seg6_iptunnel.c
index 97b50d9b1365..9b64343ebad6 100644
--- a/net/ipv6/seg6_iptunnel.c
+++ b/net/ipv6/seg6_iptunnel.c
@@ -746,7 +746,8 @@ static int seg6_build_state(struct net *net, struct nlattr *nla,
 	newts->type = LWTUNNEL_ENCAP_SEG6;
 	newts->flags |= LWTUNNEL_STATE_INPUT_REDIRECT;
 
-	if (tuninfo->mode != SEG6_IPTUN_MODE_L2ENCAP)
+	if (tuninfo->mode != SEG6_IPTUN_MODE_L2ENCAP &&
+	    tuninfo->mode != SEG6_IPTUN_MODE_L2ENCAP_RED)
 		newts->flags |= LWTUNNEL_STATE_OUTPUT_REDIRECT;
 
 	newts->headroom = seg6_lwt_headroom(tuninfo);
-- 
2.20.1
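
The before/after behaviour of the condition can be sketched in user space. The enum below is a local stand-in for the SEG6_IPTUN_MODE_* constants (its names and values are assumptions for illustration, not the uapi header), and both predicates are hypothetical helpers that mirror the check in the diff.

```c
#include <assert.h>
#include <stdbool.h>

/* Local stand-in for the tunnel modes; not the kernel's definitions. */
enum model_mode {
	MODEL_INLINE,
	MODEL_ENCAP,
	MODEL_L2ENCAP,
	MODEL_ENCAP_RED,
	MODEL_L2ENCAP_RED,
};

/* Condition before the patch: only L2ENCAP is excluded. */
static bool output_redirect_before(enum model_mode mode)
{
	return mode != MODEL_L2ENCAP;
}

/* Condition after the patch: both L2 encap modes are excluded. */
static bool output_redirect_after(enum model_mode mode)
{
	return mode != MODEL_L2ENCAP && mode != MODEL_L2ENCAP_RED;
}
```

Only the result for the reduced L2 mode changes; the L3 modes keep the output redirect in both versions.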


^ permalink raw reply related

* Re: [PATCH v5 net] nfc: hci: fix out-of-bounds read in HCP header parsing
From: Simon Horman @ 2026-04-18 16:30 UTC (permalink / raw)
  To: Ashutosh Desai
  Cc: netdev, kuba, edumazet, davem, pabeni, stable, linux-kernel
In-Reply-To: <20260416051522.4154698-1-ashutoshdesai993@gmail.com>

On Thu, Apr 16, 2026 at 05:15:22AM +0000, Ashutosh Desai wrote:
> nfc_hci_recv_from_llc() and nci_hci_data_received_cb() cast skb->data
> to struct hcp_packet and read the message header byte without checking
> that enough data is present in the linear sk_buff area. A malicious NFC
> peer can send a 1-byte HCP frame that passes through the SHDLC layer
> and reaches these functions, causing an out-of-bounds heap read.
> 
> Fix this by adding pskb_may_pull() before each cast to ensure the full
> 2-byte HCP header is pulled into the linear area before it is accessed.
> 
> Fixes: 8b8d2e08bf0d ("NFC: HCI support")
> Fixes: 11f54f228643 ("NFC: nci: Add HCI over NCI protocol support")
> Cc: stable@vger.kernel.org
> Signed-off-by: Ashutosh Desai <ashutoshdesai993@gmail.com>
> ---
> V4 -> V5: fix whitespace damage
> V3 -> V4: add Fixes tags
> V2 -> V3: drop redundant checks from nfc_hci_msg_rx_work/nci_hci_msg_rx_work;
>           remove incorrect Suggested-by tag
> V1 -> V2: use pskb_may_pull() instead of skb->len check
> 
> v4: https://lore.kernel.org/netdev/177614425081.3600288.2536320552978506086@gmail.com/
> v3: https://lore.kernel.org/netdev/20260413024329.3293075-1-ashutoshdesai993@gmail.com/
> v2: https://lore.kernel.org/netdev/20260409150825.2217133-1-ashutoshdesai993@gmail.com/
> v1: https://lore.kernel.org/netdev/20260408223113.2009304-1-ashutoshdesai993@gmail.com/
> 
>  net/nfc/hci/core.c | 5 +++++
>  net/nfc/nci/hci.c  | 5 +++++
>  2 files changed, 10 insertions(+)

Reviewed-by: Simon Horman <horms@kernel.org>

Review of this patch at Sashiko.dev flags a number of related problems in
this code. I believe none of them were introduced by this patch, and that
they can all be treated as areas for possible follow-up.


^ permalink raw reply

* Re: [f2fs-dev] [PATCH] f2fs: fix node_cnt race between extent node destroy and writeback
From: Yongpeng Yang @ 2026-04-18 16:29 UTC (permalink / raw)
  To: Chao Yu, Yongpeng Yang, Jaegeuk Kim
  Cc: Yongpeng Yang, stable, linux-f2fs-devel
In-Reply-To: <ac9d0f35-52dc-4371-a692-39c1d4ae5555@kernel.org>


On 4/18/26 8:51 AM, Chao Yu via Linux-f2fs-devel wrote:
> On 4/17/26 21:26, Yongpeng Yang wrote:
>>
>> On 4/17/26 17:00, Chao Yu via Linux-f2fs-devel wrote:
>>> On 4/3/26 22:40, Yongpeng Yang wrote:
>>>> From: Yongpeng Yang <yangyongpeng@xiaomi.com>
>>>>
>>>> f2fs_destroy_extent_node() does not set FI_NO_EXTENT before clearing
>>>> extent nodes. When called from f2fs_drop_inode() with I_SYNC set,
>>>> concurrent kworker writeback can insert new extent nodes into the same
>>>> extent tree, racing with the destroy and triggering f2fs_bug_on() in
>>>> __destroy_extent_node(). The scenario is as follows:
>>>>
>>>> drop inode                            writeback
>>>>    - iput
>>>>     - f2fs_drop_inode  // I_SYNC set
>>>>      - f2fs_destroy_extent_node
>>>>       - __destroy_extent_node
>>>>        - while (node_cnt) {
>>>>           write_lock(&et->lock)
>>>>           __free_extent_tree
>>>>           write_unlock(&et->lock)
>>>>                                          - __writeback_single_inode
>>>>                                           - f2fs_outplace_write_data
>>>>                                            - f2fs_update_read_extent_cache
>>>>                                             - __update_extent_tree_range
>>>>                                              // FI_NO_EXTENT not set,
>>>>                                              // insert new extent node
>>>>          } // node_cnt == 0, exit while
>>>>        - f2fs_bug_on(node_cnt)  // node_cnt > 0
>>>>
>>>> Additionally, __update_extent_tree_range() only checks FI_NO_EXTENT for
>>>> EX_READ type, leaving EX_BLOCK_AGE updates completely unprotected.
>>>>
>>>> This patch sets FI_NO_EXTENT under et->lock in __destroy_extent_node(),
>>>> consistent with other callers (__update_extent_tree_range and
>>>> __drop_extent_tree), and checks FI_NO_EXTENT for both the EX_READ and
>>>> EX_BLOCK_AGE trees.
>>>
>>> I hit the test failures below and bisected them to this change.
>>>
>>>      generic/475  84s ... [failed, exit status 1]- output mismatch (see /share/git/fstests/results//generic/475.out.bad)
>>>      --- tests/generic/475.out   2025-01-12 21:57:40.279440664 +0800
>>>      +++ /share/git/fstests/results//generic/475.out.bad 2026-04-17 12:08:28.000000000 +0800
>>>      @@ -1,2 +1,6 @@
>>>       QA output created by 475
>>>       Silence is golden.
>>>      +mount: /mnt/scratch_f2fs: mount system call failed: Structure needs cleaning.
>>>      +       dmesg(1) may have more information after failed mount system call.
>>>      +mount failed
>>>      +(see /share/git/fstests/results//generic/475.full for details)
>>>      ...
>>>      (Run 'diff -u /share/git/fstests/tests/generic/475.out /share/git/fstests/results//generic/475.out.bad'  to see the entire diff)
>>>
>>>
>>>      generic/388  73s ... [failed, exit status 1]- output mismatch (see /share/git/fstests/results//generic/388.out.bad)
>>>      --- tests/generic/388.out   2025-01-12 21:57:40.275440602 +0800
>>>      +++ /share/git/fstests/results//generic/388.out.bad 2026-04-17 11:58:05.000000000 +0800
>>>      @@ -1,2 +1,6 @@
>>>       QA output created by 388
>>>       Silence is golden.
>>>      +mount: /mnt/scratch_f2fs: mount system call failed: Structure needs cleaning.
>>>      +       dmesg(1) may have more information after failed mount system call.
>>>      +cycle mount failed
>>>      +(see /share/git/fstests/results//generic/388.full for details)
>>>      ...
>>>      (Run 'diff -u /share/git/fstests/tests/generic/388.out /share/git/fstests/results//generic/388.out.bad'  to see the entire diff)
>>>
>>>
>>>      F2FS-fs (dm-0): sanity_check_extent_cache: inode (ino=1761) extent info [220057, 57, 6] is incorrect, run fsck to fix
>>>
>>> I suspect we may miss extent updates after FI_NO_EXTENT is set in
>>> __destroy_extent_node(), which then makes sanity_check_extent_cache() fail.
>>>
>>> Can we just relocate f2fs_bug_on(node_cnt) rather than make a more
>>> complicated change? Thoughts?
>>
>> Oh, I overlooked the largest extent. How about relocating
>> f2fs_bug_on(node_cnt) to __destroy_extent_tree()?
>>
>> static void __destroy_extent_tree(struct inode *inode, enum extent_type type)
>>
>>          /* free all extent info belong to this extent tree */
>>          node_cnt = __destroy_extent_node(inode, type);
>> +       f2fs_bug_on(sbi, atomic_read(&et->node_cnt));
> 
>      /* free all extent info belong to this extent tree */
>      node_cnt = __destroy_extent_node(inode, type);
> 
>      /* delete extent tree entry in radix tree */
>      mutex_lock(&eti->extent_tree_lock);
>      f2fs_bug_on(sbi, atomic_read(&et->node_cnt));  <---
> 
> Oh, node_cnt is already checked here, so maybe we can just remove the
> check in __destroy_extent_node()?

Yes. BTW, is it correct to remove the call to f2fs_destroy_extent_node()
in f2fs_drop_inode()? It seems this call is unnecessary, since
f2fs_evict_inode() will eventually delete all extent nodes properly.

Thanks
Yongpeng,

> 
> Thanks,
> 
> 
>>
>> Thanks
>> Yongpeng,
>>
>>>
>>> Thanks,
>>>
>>>>
>>>> Fixes: 3fc5d5a182f6 ("f2fs: fix to shrink read extent node in batches")
>>>> Cc: stable@vger.kernel.org
>>>> Signed-off-by: Yongpeng Yang <yangyongpeng@xiaomi.com>
>>>> ---
>>>>    fs/f2fs/extent_cache.c | 17 ++++++++++-------
>>>>    1 file changed, 10 insertions(+), 7 deletions(-)
>>>>
>>>> diff --git a/fs/f2fs/extent_cache.c b/fs/f2fs/extent_cache.c
>>>> index 0ed84cc065a7..87169fd29d89 100644
>>>> --- a/fs/f2fs/extent_cache.c
>>>> +++ b/fs/f2fs/extent_cache.c
>>>> @@ -119,9 +119,10 @@ static bool __may_extent_tree(struct inode *inode, enum extent_type type)
>>>>        if (!__init_may_extent_tree(inode, type))
>>>>            return false;
>>>>    +    if (is_inode_flag_set(inode, FI_NO_EXTENT))
>>>> +        return false;
>>>> +
>>>>        if (type == EX_READ) {
>>>> -        if (is_inode_flag_set(inode, FI_NO_EXTENT))
>>>> -            return false;
>>>>            if (is_inode_flag_set(inode, FI_COMPRESSED_FILE) &&
>>>>                     !f2fs_sb_has_readonly(F2FS_I_SB(inode)))
>>>>                return false;
>>>> @@ -644,6 +645,8 @@ static unsigned int __destroy_extent_node(struct inode *inode,
>>>>          while (atomic_read(&et->node_cnt)) {
>>>>            write_lock(&et->lock);
>>>> +        if (!is_inode_flag_set(inode, FI_NO_EXTENT))
>>>> +            set_inode_flag(inode, FI_NO_EXTENT);
>>>>            node_cnt += __free_extent_tree(sbi, et, nr_shrink);
>>>>            write_unlock(&et->lock);
>>>>        }
>>>> @@ -688,12 +691,12 @@ static void __update_extent_tree_range(struct inode *inode,
>>>>          write_lock(&et->lock);
>>>>    -    if (type == EX_READ) {
>>>> -        if (is_inode_flag_set(inode, FI_NO_EXTENT)) {
>>>> -            write_unlock(&et->lock);
>>>> -            return;
>>>> -        }
>>>> +    if (is_inode_flag_set(inode, FI_NO_EXTENT)) {
>>>> +        write_unlock(&et->lock);
>>>> +        return;
>>>> +    }
>>>>    +    if (type == EX_READ) {
>>>>            prev = et->largest;
>>>>            dei.len = 0;
>>>
>>>
>>>
>>> _______________________________________________
>>> Linux-f2fs-devel mailing list
>>> Linux-f2fs-devel@lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel
>>
> 
> 
> 
> _______________________________________________
> Linux-f2fs-devel mailing list
> Linux-f2fs-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


^ permalink raw reply

* Re: [REGRESSION] Return change in 6.12.80+ with volatile mounting
From: Amir Goldstein @ 2026-04-18 15:39 UTC (permalink / raw)
  To: Chenglong Tang; +Cc: Derek Taylor, stable, regressions, Kevin Berry, overlayfs
In-Reply-To: <CAOdxtTbwipkyAfDakLAB6aVp6YkPWtKpDdVDUTz88WDB-18HXQ@mail.gmail.com>

On Sat, Apr 18, 2026 at 1:33 AM Chenglong Tang <chenglongtang@google.com> wrote:
>
> CC Amir,
>
> For example, containerd 2.2.0 uses `volatile` instead of `fsync=volatile`:
> https://github.com/containerd/containerd/blob/main/core/mount/temp.go#L91C1-L92C1
>
> On Fri, Apr 17, 2026 at 3:41 PM Derek Taylor <ddtaylor@google.com> wrote:
> >
> > This change seems to have so far affected at least containerd in an
> > issue reported here
> > https://github.com/containerd/containerd/issues/13250.
> >
> > In stable versions 6.12.80+, commit
> > 6c0cfbe020c0fcd2a544fcd2931fbc366ee3cd12 with the specific change
> > being:
> > [*] The mount option "volatile" is an alias to "fsync=volatile".
> > In this scenario, code relying on checking "volatile" will now fail
> > due to the return being "fsync=volatile".
> >
> > #regzbot introduced:v6.12.80

Hi Chenglong,

Thanks for the report.

Is this problem in production containerd or in a test suite?
I did not understand the purpose of WithTempMount().

Is it possible to fix this function to use strings.Contains() instead of
an exact match on the "volatile" mount option?

If needed I can fix the kernel to show the legacy "volatile" option,
but I would like to first understand how bad the impact of this regression
is on real production workloads.

Thanks,
Amir.

^ permalink raw reply
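
The matching change asked about above can be sketched as follows. This is
only an illustration in C (containerd itself is written in Go), and the
helper name opts_have_volatile() is hypothetical. It assumes the options
arrive as one comma-separated string and accepts both spellings mentioned
in the thread, "volatile" and "fsync=volatile".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not containerd or kernel code: returns true if a
 * comma-separated mount option string carries either spelling of the
 * overlayfs volatile option.
 */
static bool opts_have_volatile(const char *opts)
{
	static const char *const names[] = { "volatile", "fsync=volatile" };
	const char *p = opts;

	assert(opts != NULL);

	while (*p) {
		size_t len = strcspn(p, ",");	/* length of the current token */
		size_t i;

		for (i = 0; i < sizeof(names) / sizeof(names[0]); i++) {
			if (len == strlen(names[i]) &&
			    strncmp(p, names[i], len) == 0)
				return true;
		}
		p += len;
		if (*p == ',')
			p++;			/* skip the separator */
	}
	return false;
}
```

Comparing whole tokens rather than substrings avoids matching an unrelated
option that merely contains the word "volatile".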

* [PATCH] eventfs: Use list_add_tail_rcu() for SRCU-protected children list
From: David Carlier @ 2026-04-18 15:22 UTC (permalink / raw)
  To: rostedt, mhiramat
  Cc: mathieu.desnoyers, linux-trace-kernel, linux-kernel,
	David Carlier, stable

Commit d2603279c7d6 ("eventfs: Use list_del_rcu() for SRCU protected
list variable") converted the removal side to pair with the
list_for_each_entry_srcu() walker in eventfs_iterate(). The insertion
in eventfs_create_dir() was left as a plain list_add_tail(), which on
weakly-ordered architectures can expose a new entry to the SRCU reader
before its list pointers and fields are observable.

Use list_add_tail_rcu() so the publication pairs with the existing
list_del_rcu() and list_for_each_entry_srcu().

Fixes: 43aa6f97c2d0 ("eventfs: Get rid of dentry pointers without refcounts")
Cc: stable@vger.kernel.org
Signed-off-by: David Carlier <devnexen@gmail.com>
---
 fs/tracefs/event_inode.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
index 81df94038f2e..8dd554508828 100644
--- a/fs/tracefs/event_inode.c
+++ b/fs/tracefs/event_inode.c
@@ -706,7 +706,7 @@ struct eventfs_inode *eventfs_create_dir(const char *name, struct eventfs_inode

 	scoped_guard(mutex, &eventfs_mutex) {
 		if (!parent->is_freed)
-			list_add_tail(&ei->list, &parent->children);
+			list_add_tail_rcu(&ei->list, &parent->children);
 	}
 	/* Was the parent freed? */
 	if (list_empty(&ei->list)) {
-- 
2.53.0

^ permalink raw reply related
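
The ordering problem described in the patch above can be illustrated with
a user-space analogy. This is a minimal sketch using C11 atomics, not the
kernel implementation: the struct and function names are hypothetical, a
single writer is assumed (the role eventfs_mutex plays in the patch), and
the release store stands in for the publication that list_add_tail_rcu()
performs through rcu_assign_pointer().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical node type standing in for a list entry. */
struct node {
	int value;
	struct node *next;
};

/* Head of a singly linked list that readers walk without locks. */
static _Atomic(struct node *) head;

/* Writer side; assumes callers are serialized by an external lock. */
static void publish(struct node *n, int value)
{
	assert(n != NULL);

	/* Initialize every field before the node becomes reachable. */
	n->value = value;
	n->next = atomic_load_explicit(&head, memory_order_relaxed);

	/*
	 * Release store: the field writes above are ordered before the
	 * pointer becomes visible. A plain store would allow a reader on a
	 * weakly-ordered CPU to see the pointer before the fields.
	 */
	atomic_store_explicit(&head, n, memory_order_release);
}

/* Reader side: the acquire load pairs with the release store above. */
static int read_first(int fallback)
{
	struct node *n = atomic_load_explicit(&head, memory_order_acquire);

	return n ? n->value : fallback;
}
```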

* [PATCH] lockdep: fix NULL pointer dereference in __lock_set_class()
From: Xiang Gao @ 2026-04-18 14:14 UTC (permalink / raw)
  To: peterz, mingo, will, boqun; +Cc: longman, linux-kernel, Xiang Gao, stable

From: Xiang Gao <gaoxiang17@xiaomi.com>

register_lock_class() can return NULL on failure (e.g., exceeding
MAX_LOCKDEP_KEYS or lock_keys_in_use overflow). __lock_set_class()
uses the return value directly in pointer arithmetic without a NULL
check:

  class = register_lock_class(lock, subclass, 0);
  hlock->class_idx = class - lock_classes;

If class is NULL, this computes a garbage negative offset that corrupts
hlock->class_idx (a bitfield). Any subsequent hlock_class() call on
this hlock returns a garbage pointer, leading to memory corruption or
a crash.

The other call site in __lock_acquire() (line 5112) already handles
this correctly with an explicit NULL check. Add the same guard here.

Fixes: 64aa348edc61 ("lockdep: lock_set_subclass - reset a held lock's subclass")
Cc: stable@vger.kernel.org
Signed-off-by: Xiang Gao <gaoxiang17@xiaomi.com>
---
 kernel/locking/lockdep.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 2d4c5bab5af8..e0de81114824 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -5437,6 +5437,8 @@ __lock_set_class(struct lockdep_map *lock, const char *name,
 			      lock->wait_type_outer,
 			      lock->lock_type);
 	class = register_lock_class(lock, subclass, 0);
+	if (!class)
+		return 0;
 	hlock->class_idx = class - lock_classes;

 	curr->lockdep_depth = i;
-- 
2.34.1

^ permalink raw reply related
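
The guard added by the patch above follows a common pattern: check the
pointer before turning it into an array index. A minimal user-space
sketch, assuming a small fixed table; MAX_CLASSES, register_class() and
set_class_idx() are hypothetical stand-ins, not the lockdep functions.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CLASSES 4	/* hypothetical table size */

struct lock_class {
	int key;
};

static struct lock_class classes[MAX_CLASSES];
static size_t nr_classes;

/* Returns NULL once the table is full, like a failed registration. */
static struct lock_class *register_class(int key)
{
	if (nr_classes >= MAX_CLASSES)
		return NULL;
	classes[nr_classes].key = key;
	return &classes[nr_classes++];
}

/*
 * Returns 1 and stores the class index on success. Returns 0 and leaves
 * *idx untouched on failure, instead of computing an index from NULL.
 */
static int set_class_idx(unsigned int *idx, int key)
{
	struct lock_class *class = register_class(key);

	assert(idx != NULL);
	if (!class)
		return 0;
	*idx = (unsigned int)(class - classes);
	return 1;
}
```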

* [PATCH net v2] net/rds: zero per-item info buffer before handing it to visitors
From: Michael Bommarito @ 2026-04-18 14:10 UTC (permalink / raw)
  To: Allison Henderson, David S . Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni
  Cc: Sharath Srinivasan, Simon Horman, netdev, linux-rdma, rds-devel,
	linux-kernel, stable
In-Reply-To: <20260417141916.494761-1-michael.bommarito@gmail.com>

rds_for_each_conn_info() and rds_walk_conn_path_info() both hand a
caller-allocated on-stack u64 buffer to a per-connection visitor and
then copy the full item_len bytes back to user space via
rds_info_copy() regardless of how much of the buffer the visitor
actually wrote.

rds_ib_conn_info_visitor() and rds6_ib_conn_info_visitor() only
write a subset of their output struct when the underlying
rds_connection is not in state RDS_CONN_UP (src/dst addr, tos, sl
and the two GIDs via explicit memsets). Several u32 fields
(max_send_wr, max_recv_wr, max_send_sge, rdma_mr_max, rdma_mr_size,
cache_allocs) and the 2-byte alignment hole between sl and
cache_allocs remain as whatever stack contents preceded the visitor
call and are then copied out to user space by rds_info_copy().

struct rds_info_rdma_connection and struct rds6_info_rdma_connection
are the only rds_info_* structs in include/uapi/linux/rds.h that are
not marked __attribute__((packed)), so they have a real alignment
hole. The other info visitors (rds_conn_info_visitor,
rds6_conn_info_visitor, rds_tcp_tc_info, ...) write all fields of
their packed output struct today and are not known to be vulnerable,
but a future visitor that adds a conditional write-path would have
the same bug.

Reproduction on a kernel built without CONFIG_INIT_STACK_ALL_ZERO=y:
a local unprivileged user opens AF_RDS, sets SO_RDS_TRANSPORT=IB,
binds to a local address on an RDMA-capable netdev (rxe soft-RoCE on
any netdev is sufficient), sendto()'s any peer on the same subnet
(fails cleanly but installs an rds_connection in the global hash in
RDS_CONN_CONNECTING), then calls getsockopt(SOL_RDS,
RDS_INFO_IB_CONNECTIONS). The returned 68-byte item contains 26
bytes of stack garbage including kernel text/data pointers:

    0..7   0a 63 00 01 0a 63 00 02     src=10.99.0.1 dst=10.99.0.2
    8..39  00 ...                      gids (memset-zeroed)
    40..47 e0 92 a3 81 ff ff ff ff     kernel pointer (max_send_wr)
    48..55 7f 37 b5 81 ff ff ff ff     kernel pointer (rdma_mr_max)
    56..59 01 00 08 00                 rdma_mr_size (garbage)
    60..61 00 00                       tos, sl
    62..63 00 00                       alignment padding
    64..67 18 00 00 00                 cache_allocs (garbage)

Fix by zeroing the per-item buffer in both rds_for_each_conn_info()
and rds_walk_conn_path_info() before invoking the visitor. This
covers the IPv4/IPv6 IB visitors and hardens all current and future
visitors against the same class of bug.

No functional change for visitors that fully populate their output.

Changes in v2:
- retarget at the net tree (subject prefix "[PATCH net v2]",
  net/rds: prefix in the title)
- add Cc: stable@vger.kernel.org
- pick up Reviewed-by tags from Sharath Srinivasan and
  Allison Henderson

Fixes: ec16227e1414 ("RDS/IB: Infiniband transport")
Cc: stable@vger.kernel.org
Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
Reviewed-by: Sharath Srinivasan <sharath.srinivasan@oracle.com>
Reviewed-by: Allison Henderson <achender@kernel.org>
Assisted-by: Claude:claude-opus-4-7
---
 net/rds/connection.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/net/rds/connection.c b/net/rds/connection.c
index 412441aaa298..c10b7ed06c49 100644
--- a/net/rds/connection.c
+++ b/net/rds/connection.c
@@ -701,6 +701,13 @@ void rds_for_each_conn_info(struct socket *sock, unsigned int len,
 	     i++, head++) {
 		hlist_for_each_entry_rcu(conn, head, c_hash_node) {

+			/* Zero the per-item buffer before handing it to the
+			 * visitor so any field the visitor does not write -
+			 * including implicit alignment padding - cannot leak
+			 * stack contents to user space via rds_info_copy().
+			 */
+			memset(buffer, 0, item_len);
+
 			/* XXX no c_lock usage.. */
 			if (!visitor(conn, buffer))
 				continue;
@@ -750,6 +757,13 @@ static void rds_walk_conn_path_info(struct socket *sock, unsigned int len,
 			 */
 			cp = conn->c_path;

+			/* Zero the per-item buffer for the same reason as
+			 * rds_for_each_conn_info(): any byte the visitor
+			 * does not write (including alignment padding) must
+			 * not leak stack contents via rds_info_copy().
+			 */
+			memset(buffer, 0, item_len);
+
 			/* XXX no cp_lock usage.. */
 			if (!visitor(cp, buffer))
 				continue;
-- 
2.53.0

^ permalink raw reply related
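
The leak described in the patch above can be reproduced in miniature in
user space. This is a sketch under stated assumptions: struct info_item is
a hypothetical stand-in that follows the field order given in the commit
message, partial_visitor() mimics a visitor for a connection that is not
up, and the 0xAA fill simulates stale stack contents.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define STALE 0xAA	/* marker byte simulating old stack contents */

/* Hypothetical stand-in for the unpacked info struct. */
struct info_item {
	uint32_t src_addr;
	uint32_t dst_addr;
	uint8_t src_gid[16];
	uint8_t dst_gid[16];
	uint32_t max_send_wr;
	uint32_t max_recv_wr;
	uint32_t max_send_sge;
	uint32_t rdma_mr_max;
	uint32_t rdma_mr_size;
	uint8_t tos;
	uint8_t sl;
	/* implicit alignment padding sits here on common ABIs */
	uint32_t cache_allocs;
};

/* Writes only the fields a not-yet-connected visitor would fill in. */
static int partial_visitor(struct info_item *it)
{
	assert(it != NULL);
	it->src_addr = 0x0a630001u;
	it->dst_addr = 0x0a630002u;
	memset(it->src_gid, 0, sizeof(it->src_gid));
	memset(it->dst_gid, 0, sizeof(it->dst_gid));
	it->tos = 0;
	it->sl = 0;
	return 1;
}

/* Counts bytes that still hold the stale marker after the visitor ran. */
static size_t stale_bytes(int zero_first)
{
	struct info_item item;
	const unsigned char *bytes = (const unsigned char *)&item;
	size_t i, stale = 0;

	memset(&item, STALE, sizeof(item));
	if (zero_first)
		memset(&item, 0, sizeof(item));	/* the fix in the patch */
	partial_visitor(&item);

	for (i = 0; i < sizeof(item); i++)
		if (bytes[i] == STALE)
			stale++;
	return stale;
}
```

Calling stale_bytes(0) counts the bytes the visitor never wrote, while
stale_bytes(1) shows that zeroing the buffer first leaves nothing stale to
copy out. On ABIs where uint32_t is 4-byte aligned the struct is 68 bytes
with a 2-byte hole before cache_allocs.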

* [PATCH] hfsplus: zero-initialize buffer in hfs_bnode_read
From: Tristan Madani @ 2026-04-18 13:40 UTC (permalink / raw)
  To: slava, glaubitz, frank.li
  Cc: linux-fsdevel, akpm, stable, syzbot+217eb327242d08197efb,
	Tristan Madani

hfs_bnode_read() can return early without initializing the output
buffer when the offset is invalid or the requested length is
corrected to zero by check_and_correct_requested_length().  Callers
such as hfs_bnode_read_u16() pass stack-allocated buffers and use the
result unconditionally, leading to KMSAN uninit-value reports.

Rather than initializing at each individual call site, zero the buffer
at the start of hfs_bnode_read() before any validation checks.  This
ensures the buffer is always in a known state regardless of which
early-return path is taken.

Reported-by: syzbot+217eb327242d08197efb@syzkaller.appspotmail.com
Tested-by: syzbot+217eb327242d08197efb@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=217eb327242d08197efb
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Cc: stable@vger.kernel.org
Signed-off-by: Tristan Madani <tristan@talencesecurity.com>
---
 fs/hfsplus/bnode.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/hfsplus/bnode.c b/fs/hfsplus/bnode.c
index f8b5a8ae58ff5..14d1af2c7ba93 100644
--- a/fs/hfsplus/bnode.c
+++ b/fs/hfsplus/bnode.c
@@ -25,6 +25,8 @@ void hfs_bnode_read(struct hfs_bnode *node, void *buf, u32 off, u32 len)
 	struct page **pagep;
 	u32 l;

+	memset(buf, 0, len);
+
 	if (!is_bnode_offset_valid(node, off))
 		return;

-- 
2.47.3

^ permalink raw reply related
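
The approach taken by the patch above, zeroing the output buffer before
any early return, can be sketched in user space. The names below
(NODE_SIZE, node_data, node_read, node_read_u16) are hypothetical
stand-ins, not the hfsplus functions, and the bounds checks are simplified.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NODE_SIZE 16u	/* hypothetical node size */

/* Hypothetical backing store for one node. */
static const unsigned char node_data[NODE_SIZE] = {
	[0] = 0x12, [1] = 0x34, [15] = 0xAB,
};

/* Copies up to len bytes from the node; buf is always fully defined. */
static void node_read(void *buf, uint32_t off, uint32_t len)
{
	assert(buf != NULL);

	/* Zero first so every early-return path leaves known contents. */
	memset(buf, 0, len);

	if (off >= NODE_SIZE)
		return;			/* invalid offset: buf stays zeroed */
	if (len > NODE_SIZE - off)
		len = NODE_SIZE - off;	/* clamp the requested length */
	memcpy(buf, node_data + off, len);
}

/* Caller that uses the result unconditionally, as in the report. */
static uint16_t node_read_u16(uint32_t off)
{
	unsigned char raw[2];

	node_read(raw, off, sizeof(raw));
	return (uint16_t)((raw[0] << 8) | raw[1]);	/* big-endian decode */
}
```

With the memset in place, a read at an invalid offset yields 0 rather than
whatever happened to be in the caller's stack buffer.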
