* [PATCH] net/9p: fix infinite loop in p9_client_rpc on fatal signal
@ 2026-04-15 15:52 Vasiliy Kovalev
From: Vasiliy Kovalev @ 2026-04-15 15:52 UTC (permalink / raw)
To: Eric Van Hensbergen, Latchesar Ionkov, Dominique Martinet,
Christian Schoenebeck, v9fs
Cc: linux-kernel, lvc-project, kovalev
When p9_client_rpc() is called with type P9_TFLUSH and the transport
has no peer (e.g. fd transport backed by pipes with no 9p server),
a fatal signal causes an infinite loop:
again:
	err = io_wait_event_killable(req->wq, ...);
	/* SIGKILL wakes the task, returns -ERESTARTSYS */
	if (err == -ERESTARTSYS && c->status == Connected &&
	    type == P9_TFLUSH) {
		sigpending = 1;
		clear_thread_flag(TIF_SIGPENDING);
		goto again;
	}
clear_thread_flag() clears TIF_SIGPENDING before jumping back to
io_wait_event_killable(). signal_pending_state() checks TIF_SIGPENDING,
finds it zero, and the task goes to sleep again. The task can only wake
on the next signal delivery that calls signal_wake_up() and sets
TIF_SIGPENDING again. When that happens the loop repeats, clears
TIF_SIGPENDING, and sleeps again indefinitely.
This is triggered in practice by coredump_wait(): when a thread in a
multi-threaded process causes a coredump (e.g. via SIGSYS from Syscall
User Dispatch), coredump_wait() sends SIGKILL to all other threads and
waits for them to call mm_release(). If one of those threads is blocked
in p9_client_rpc() over an fd transport with no peer, it enters the
P9_TFLUSH loop and never calls mm_release(), so coredump_wait() stalls
forever:
INFO: task syz.0.18:676 blocked for more than 143 seconds.
Not tainted 6.12.77+ #1
task:syz.0.18 state:D stack:27600 pid:676 tgid:673 ppid:630 flags:0x00000004
Call Trace:
<TASK>
context_switch kernel/sched/core.c:5344 [inline]
__schedule+0xcb4/0x5d50 kernel/sched/core.c:6724
__schedule_loop kernel/sched/core.c:6801 [inline]
schedule+0xe5/0x350 kernel/sched/core.c:6816
schedule_timeout+0x253/0x290 kernel/time/timer.c:2593
do_wait_for_common kernel/sched/completion.c:95 [inline]
__wait_for_common+0x409/0x600 kernel/sched/completion.c:116
wait_for_common kernel/sched/completion.c:127 [inline]
wait_for_completion_state+0x1d/0x40 kernel/sched/completion.c:264
coredump_wait fs/coredump.c:448 [inline]
do_coredump+0x854/0x4350 fs/coredump.c:629
get_signal+0x1425/0x2730 kernel/signal.c:2903
arch_do_signal_or_restart+0x81/0x880 arch/x86/kernel/signal.c:337
exit_to_user_mode_loop kernel/entry/common.c:111 [inline]
exit_to_user_mode_prepare include/linux/entry-common.h:328 [inline]
__syscall_exit_to_user_mode_work kernel/entry/common.c:207 [inline]
syscall_exit_to_user_mode+0xf9/0x160 kernel/entry/common.c:218
do_syscall_64+0x102/0x220 arch/x86/entry/common.c:84
entry_SYSCALL_64_after_hwframe+0x77/0x7f
</TASK>
Fix: check fatal_signal_pending() before clearing TIF_SIGPENDING in the
P9_TFLUSH retry loop. At that point TIF_SIGPENDING is still set, so
fatal_signal_pending() works correctly. If a fatal signal is pending,
jump to recalc_sigpending to restore TIF_SIGPENDING and return
-ERESTARTSYS to the caller.
The same defect is present in stable kernels back to 5.4. On those
kernels the infinite loop is broken earlier by a second SIGKILL from
the parent process (e.g. kill_and_wait() retrying after a timeout),
resulting in a zombie process and a shutdown delay rather than a
permanent D-state hang, but the underlying flaw is the same.
Found by Linux Verification Center (linuxtesting.org) with Syzkaller.
Fixes: 91b8534fa8f5 ("9p: make rpc code common and rework flush code")
Closes: https://syzkaller.appspot.com/bug?extid=3ce7863f8fc836a427e7
Cc: stable@vger.kernel.org
Signed-off-by: Vasiliy Kovalev <kovalev@altlinux.org>
---
net/9p/client.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/net/9p/client.c b/net/9p/client.c
index f0dcf252af7e..748b92d3f0c1 100644
--- a/net/9p/client.c
+++ b/net/9p/client.c
@@ -600,6 +600,8 @@ p9_client_rpc(struct p9_client *c, int8_t type, const char *fmt, ...)
 		if (err == -ERESTARTSYS && c->status == Connected &&
 		    type == P9_TFLUSH) {
+			if (fatal_signal_pending(current))
+				goto recalc_sigpending;
 			sigpending = 1;
 			clear_thread_flag(TIF_SIGPENDING);
 			goto again;
--
2.50.1
* Re: [PATCH] net/9p: fix infinite loop in p9_client_rpc on fatal signal
From: Dominique Martinet @ 2026-04-16 1:41 UTC (permalink / raw)
To: Vasiliy Kovalev
Cc: Eric Van Hensbergen, Latchesar Ionkov, Christian Schoenebeck,
v9fs, linux-kernel, lvc-project
Vasiliy Kovalev wrote on Wed, Apr 15, 2026 at 06:52:37PM +0300:
> The same defect is present in stable kernels back to 5.4. On those
> kernels the infinite loop is broken earlier by a second SIGKILL from
> the parent process (e.g. kill_and_wait() retrying after a timeout),
> resulting in a zombie process and a shutdown delay rather than a
> permanent D-state hang, but the underlying flaw is the same.
>
> Found by Linux Verification Center (linuxtesting.org) with Syzkaller.
It's unfortunately not that simple, just leaving here will leak things.
There was another patch last month, help is welcome,
please see https://lore.kernel.org/r/ab_b-95Ok9zLQCdn@codewreck.org
(By the way, what's with the rekindled interest on this?...)
--
Dominique Martinet | Asmadeus
* Re: [PATCH] net/9p: fix infinite loop in p9_client_rpc on fatal signal
From: Vasiliy Kovalev @ 2026-04-16 12:49 UTC (permalink / raw)
To: Dominique Martinet
Cc: Eric Van Hensbergen, Latchesar Ionkov, Christian Schoenebeck,
v9fs, linux-kernel, lvc-project
Hi Dominique,
On 4/16/26 04:41, Dominique Martinet wrote:
> Vasiliy Kovalev wrote on Wed, Apr 15, 2026 at 06:52:37PM +0300:
>> The same defect is present in stable kernels back to 5.4. On those
>> kernels the infinite loop is broken earlier by a second SIGKILL from
>> the parent process (e.g. kill_and_wait() retrying after a timeout),
>> resulting in a zombie process and a shutdown delay rather than a
>> permanent D-state hang, but the underlying flaw is the same.
>>
>> Found by Linux Verification Center (linuxtesting.org) with Syzkaller.
>
> It's unfortunately not that simple, just leaving here will leak things.
Following up on your concern about leaking things (as mentioned in the
thread with Shigeru Yoshida's patch) - I did some tracing with `printk`
inside `p9_req_put` and `p9_tag_remove` to see exactly what happens to
the request lifecycle with my proposed patch vs Shigeru's approach:
diff --git a/net/9p/client.c b/net/9p/client.c
index 748b92d3f0c1..4e8f4e0195b4 100644
--- a/net/9p/client.c
+++ b/net/9p/client.c
@@ -274,10 +274,13 @@ static void p9_tag_remove(struct p9_client *c, struct p9_req_t *r)
 	spin_lock_irqsave(&c->lock, flags);
 	idr_remove(&c->reqs, tag);
 	spin_unlock_irqrestore(&c->lock, flags);
+	printk("tag %d removed from IDR\n", tag);
 }

 int p9_req_put(struct p9_client *c, struct p9_req_t *r)
 {
+	printk("p9_req_put req %p tag %d refcount before=%d\n",
+	       r, r->tc.tag, refcount_read(&r->refcount));
 	if (refcount_dec_and_test(&r->refcount)) {
 		p9_tag_remove(c, r);
There is a significant difference.
If we use Shigeru's approach [1] (preventing the flush from being
sent/processed using `!fatal_signal_pending(current)` in the condition),
the original request (tag 0) is never freed from the IDR. It creates a
definitive memory and tag leak:
[root@localhost]# ./repro
executing program
[ 50.613200] p9_req_put req ff11000110157cb0 tag 65535 refcount before=3
[ 50.663513] p9_req_put req ff11000110157cb0 tag 65535 refcount before=2
[ 50.665156] p9_req_put req ff11000110157cb0 tag 65535 refcount before=1
[ 50.666593] tag 65535 removed from IDR
executing program
[ 50.688232] p9_req_put req ff11000110157ba0 tag 65535 refcount before=3
[ 50.739634] p9_req_put req ff11000110157ba0 tag 65535 refcount before=2
[ 50.741170] p9_req_put req ff11000110157ba0 tag 65535 refcount before=1
[ 50.742587] tag 65535 removed from IDR
executing program
[ 50.775708] p9_req_put req ff11000110157a90 tag 65535 refcount before=3
[ 50.825104] p9_req_put req ff11000110157a90 tag 65535 refcount before=2
[ 50.827719] p9_req_put req ff11000110157a90 tag 65535 refcount before=1
[ 50.830432] tag 65535 removed from IDR
executing program
...
[1]
https://lore.kernel.org/all/20260322111848.2837057-1-syoshida@redhat.com/
However, with my patch (jumping to `recalc_sigpending` *inside* the
`P9_TFLUSH` wait loop), the logic works cleanly due to the `refcount`
mechanism:
[root@localhost]# ./repro
executing program
[ 48.875093] p9_req_put req ff110001056fdcb0 tag 65535 refcount before=3
[ 48.925981] p9_req_put req ff110001056fdba0 tag 0 refcount before=3
[ 53.873969] p9_req_put req ff110001056fdba0 tag 0 refcount before=2
[ 53.876638] p9_req_put req ff110001056fdcb0 tag 65535 refcount before=2
[ 53.879882] p9_req_put req ff110001056fdba0 tag 0 refcount before=1
[ 53.882505] tag 0 removed from IDR
[ 53.884184] p9_req_put req ff110001056fdcb0 tag 65535 refcount before=1
[ 53.886905] tag 65535 removed from IDR
executing program
[ 53.914764] p9_req_put req ff1100010ca97a90 tag 65535 refcount before=3
[ 53.967191] p9_req_put req ff1100010ca97980 tag 0 refcount before=3
[ 58.909626] p9_req_put req ff1100010ca97980 tag 0 refcount before=2
[ 58.912420] p9_req_put req ff1100010ca97a90 tag 65535 refcount before=2
[ 58.915358] p9_req_put req ff1100010ca97980 tag 0 refcount before=1
[ 58.918478] tag 0 removed from IDR
[ 58.920730] p9_req_put req ff1100010ca97a90 tag 65535 refcount before=1
[ 58.923715] tag 65535 removed from IDR
executing program
[ 58.946905] p9_req_put req ff1100010c5a3870 tag 65535 refcount before=3
[ 58.998260] p9_req_put req ff1100010c5a3760 tag 0 refcount before=3
[ 63.945960] p9_req_put req ff1100010c5a3760 tag 0 refcount before=2
[ 63.948549] p9_req_put req ff1100010c5a3870 tag 65535 refcount before=2
[ 63.951153] p9_req_put req ff1100010c5a3760 tag 0 refcount before=1
[ 63.953651] tag 0 removed from IDR
[ 63.957254] p9_req_put req ff1100010c5a3870 tag 65535 refcount before=1
[ 63.960655] tag 65535 removed from IDR
executing program
...
1. The dying thread breaks out of the loop and calls `p9_req_put()`.
The refcount correctly drops from 2 to 1.
2. The tag is NOT removed from the IDR yet, preventing any Use-After-Free.
The request is safely "orphaned".
3. Shortly after, when the parent process closes the pipe/socket (or the
transport is torn down due to process exit), the transport layer cleans up
the pending requests. It calls the final `p9_req_put()`, refcount hits 0,
and the tag is properly removed from the IDR.
My logs clearly show `refcount before=1 -> tag 0 removed from IDR` upon
transport teardown, proving there is no permanent leak when breaking out
of the `TFLUSH` loop this way.
Since this safely resolves the permanent D-state hang during
`coredump_wait()` without causing the UAF or leaks that plagued earlier
kernels (pre-`728356dedeff` ("9p: Add refcount to p9_req_t")), do you
think this pragmatic fix is acceptable for mainline and stable?
> There was another patch last month, help is welcome,
> please see https://lore.kernel.org/r/ab_b-95Ok9zLQCdn@codewreck.org
While the ideal long-term goal is the asynchronous implementation (as
seen in your 9p-async-v2 branch [2]), this patch serves as a reliable
intermediate solution for a critical regression.
[2] https://github.com/martinetd/linux/commits/9p-async-v2
> (By the way, what's with the rekindled interest on this?...)
>
The issue was identified by Syzkaller, and I have a stable C reproducer
that consistently triggers the hang. There is also a similar report on
Syzbot with a nearly identical reproducer:
https://syzkaller.appspot.com/text?tag=ReproC&x=156aa534580000
--
Thanks,
Vasiliy
* Re: [PATCH] net/9p: fix infinite loop in p9_client_rpc on fatal signal
From: Dominique Martinet @ 2026-04-16 22:52 UTC (permalink / raw)
To: Vasiliy Kovalev
Cc: Eric Van Hensbergen, Latchesar Ionkov, Christian Schoenebeck,
v9fs, linux-kernel, lvc-project
Vasiliy Kovalev wrote on Thu, Apr 16, 2026 at 03:49:59PM +0300:
> > It's unfortunately not that simple, just leaving here will leak things.
>
> Following up on your concern about leaking things (as mentioned in the
> thread with Shigeru Yoshida's patch) - I did some tracing with `printk`
> inside `p9_req_put` and `p9_tag_remove` to see exactly what happens to
> the request lifecycle with my proposed patch vs Shigeru's approach:
>
> diff --git a/net/9p/client.c b/net/9p/client.c
> index 748b92d3f0c1..4e8f4e0195b4 100644
> --- a/net/9p/client.c
> +++ b/net/9p/client.c
> @@ -274,10 +274,13 @@ static void p9_tag_remove(struct p9_client *c, struct
> p9_req_t *r)
> spin_lock_irqsave(&c->lock, flags);
> idr_remove(&c->reqs, tag);
> spin_unlock_irqrestore(&c->lock, flags);
> + printk("tag %d removed from IDR\n", tag);
> }
>
> int p9_req_put(struct p9_client *c, struct p9_req_t *r)
> {
> + printk("p9_req_put req %p tag %d refcount before=%d\n",
> + r, r->tc.tag, refcount_read(&r->refcount));
> if (refcount_dec_and_test(&r->refcount)) {
> p9_tag_remove(c, r);
>
> There is a significant difference.
>
> If we use Shigeru's approach [1] (preventing the flush from being
> sent/processed using `!fatal_signal_pending(current)` in the condition),
> the original request (tag 0) is never freed from the IDR. It creates a
> definitive memory and tag leak:
>
> [root@localhost]# ./repro
> executing program
> [ 50.613200] p9_req_put req ff11000110157cb0 tag 65535 refcount before=3
> [ 50.663513] p9_req_put req ff11000110157cb0 tag 65535 refcount before=2
> [ 50.665156] p9_req_put req ff11000110157cb0 tag 65535 refcount before=1
> [ 50.666593] tag 65535 removed from IDR
> executing program
> [ 50.688232] p9_req_put req ff11000110157ba0 tag 65535 refcount before=3
> [ 50.739634] p9_req_put req ff11000110157ba0 tag 65535 refcount before=2
> [ 50.741170] p9_req_put req ff11000110157ba0 tag 65535 refcount before=1
> [ 50.742587] tag 65535 removed from IDR
> executing program
> [ 50.775708] p9_req_put req ff11000110157a90 tag 65535 refcount before=3
> [ 50.825104] p9_req_put req ff11000110157a90 tag 65535 refcount before=2
> [ 50.827719] p9_req_put req ff11000110157a90 tag 65535 refcount before=1
> [ 50.830432] tag 65535 removed from IDR
> executing program
> ...
>
> [1]
> https://lore.kernel.org/all/20260322111848.2837057-1-syoshida@redhat.com/
>
> However, with my patch (jumping to `recalc_sigpending` *inside* the
> `P9_TFLUSH` wait loop), the logic works cleanly due to the `refcount`
> mechanism:
>
> [root@localhost]# ./repro
> executing program
> [ 48.875093] p9_req_put req ff110001056fdcb0 tag 65535 refcount before=3
> [ 48.925981] p9_req_put req ff110001056fdba0 tag 0 refcount before=3
> [ 53.873969] p9_req_put req ff110001056fdba0 tag 0 refcount before=2
> [ 53.876638] p9_req_put req ff110001056fdcb0 tag 65535 refcount before=2
> [ 53.879882] p9_req_put req ff110001056fdba0 tag 0 refcount before=1
> [ 53.882505] tag 0 removed from IDR
> [ 53.884184] p9_req_put req ff110001056fdcb0 tag 65535 refcount before=1
> [ 53.886905] tag 65535 removed from IDR
> executing program
> [ 53.914764] p9_req_put req ff1100010ca97a90 tag 65535 refcount before=3
> [ 53.967191] p9_req_put req ff1100010ca97980 tag 0 refcount before=3
> [ 58.909626] p9_req_put req ff1100010ca97980 tag 0 refcount before=2
> [ 58.912420] p9_req_put req ff1100010ca97a90 tag 65535 refcount before=2
> [ 58.915358] p9_req_put req ff1100010ca97980 tag 0 refcount before=1
> [ 58.918478] tag 0 removed from IDR
> [ 58.920730] p9_req_put req ff1100010ca97a90 tag 65535 refcount before=1
> [ 58.923715] tag 65535 removed from IDR
> executing program
> [ 58.946905] p9_req_put req ff1100010c5a3870 tag 65535 refcount before=3
> [ 58.998260] p9_req_put req ff1100010c5a3760 tag 0 refcount before=3
> [ 63.945960] p9_req_put req ff1100010c5a3760 tag 0 refcount before=2
> [ 63.948549] p9_req_put req ff1100010c5a3870 tag 65535 refcount before=2
> [ 63.951153] p9_req_put req ff1100010c5a3760 tag 0 refcount before=1
> [ 63.953651] tag 0 removed from IDR
> [ 63.957254] p9_req_put req ff1100010c5a3870 tag 65535 refcount before=1
> [ 63.960655] tag 65535 removed from IDR
> executing program
> ...
>
> 1. The dying thread breaks out of the loop and calls `p9_req_put()`.
> The refcount correctly drops from 2 to 1.
>
> 2. The tag is NOT removed from the IDR yet, preventing any Use-After-Free.
> The request is safely "orphaned".
>
> 3. Shortly after, when the parent process closes the pipe/socket (or the
> transport is torn down due to process exit), the transport layer cleans up
> the pending requests. It calls the final `p9_req_put()`, refcount hits 0,
> and the tag is properly removed from the IDR.
Thanks for taking the time to verify
So if I understand this right, with a sane server, the tag will be used
until umount or a reply (to either the flush or the original request)?
That sounds sane enough (and definitely better than Shigeru's patch)
> Since this safely resolves the permanent D-state hang during
> `coredump_wait()` without causing the UAF or leaks that plagued earlier
> kernels (pre-`728356dedeff` ("9p: Add refcount to p9_req_t")), do you
> think this pragmatic fix is acceptable for mainline and stable?
I'll need to find time (or someone) to perform some heavy tests with
this -- basically my problem is that these hangs are easy to trigger
with syzkaller, but they are never supposed to happen with a real
(correct) server. So while I agree that not having stuck threads would
help improve test efficiency, I don't see it as a "real" problem
(as in, 9p mount is a privileged operation, so normal users aren't
supposed to mess around with a pipe/fd mount the way syzkaller does)
Basically I'd want to "just" mess around with a real system on a 9p
mount and play with ^C/sigkills and see what happens, but I don't have
time for that short term, so this will likely be a few months
> While the ideal long-term goal is the asynchronous implementation (as
> seen in your 9p-async-v2 branch [2]), this patch serves as a reliable
> intermediate solution for a critical regression.
> [2] https://github.com/martinetd/linux/commits/9p-async-v2
iirc one of the problems with the async branch is that the process would
quit immediately on, say, ^C, before the IO has completed, but it's
possible for the server to process the IO (and not the flush) afterwards,
and you'd get something that's not supposed to happen, e.g.:
	p1                                  p2
	write(1)
	^C/sigkill
	flush sent, but process exits
	without waiting for server ack
	(server: 1 not written yet)
	                                    write(2) to the same spot
	                                    write(2) done
	write(1) completes on the server
	(data isn't 2 as expected, even
	though p2 completed afterwards)
So it's quite possible async isn't the way to go, but that there is no
good solution for this
(given this is true even without async on sigkill: if we have something
that works safely, there's no reason to wait only for non-fatal signals...)
> > (By the way, what's with the rekindled interest on this?...)
> >
>
> The issue was identified by Syzkaller, and I have a stable C reproducer
> that consistently triggers the hang. There is also a similar report on
> Syzbot with a nearly identical reproducer:
> https://syzkaller.appspot.com/text?tag=ReproC&x=156aa534580000
Yes, it's been a known problem since syzkaller started testing
9p... many years ago.
Back then requests didn't have a refcount, so it's quite likely nobody
looked hard enough after this became possible.
Thanks,
--
Dominique Martinet | Asmadeus