* [PATCH 0/2] af_unix: Fix priority inversion issue
@ 2026-07-01 16:35 Nam Cao
  2026-07-01 16:35 ` [PATCH 1/2] af_unix: Do not wait for garbage collector in sendmsg() Nam Cao
  2026-07-01 16:35 ` [PATCH 2/2] af_unix: Clean up unix_schedule_gc() Nam Cao
  0 siblings, 2 replies; 5+ messages in thread
From: Nam Cao @ 2026-07-01 16:35 UTC
  To: Kuniyuki Iwashima, David S . Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, netdev, linux-kernel, linux-rt-devel
  Cc: Nam Cao

Hi,

While auditing AF_UNIX sockets, I noticed that the sendmsg() code paths may
block on the garbage collector, which runs from a workqueue. This can cause
priority inversion and latency for real-time users.

The implementation does kindly avoid blocking "sane users". However, it is
impossible to tell whether the kernel's definition of "sane users"
accurately describes all users out there.

After digging into the history to find out why sendmsg() needs to wait for
the garbage collector, I determined that those reasons no longer apply.

The first patch removes that blocking, and the second patch is a simple
follow-up cleanup.

Nam Cao (2):
  af_unix: Do not wait for garbage collector in sendmsg()
  af_unix: Clean up unix_schedule_gc()

 net/unix/af_unix.c |  2 +-
 net/unix/af_unix.h |  2 +-
 net/unix/garbage.c | 16 +---------------
 3 files changed, 3 insertions(+), 17 deletions(-)

-- 
2.47.3

* [PATCH 1/2] af_unix: Do not wait for garbage collector in sendmsg()
From: Nam Cao @ 2026-07-01 16:35 UTC
  To: Kuniyuki Iwashima, David S . Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, netdev, linux-kernel, linux-rt-devel
  Cc: Nam Cao

AF_UNIX sockets' sendmsg() schedules and blocks on the garbage collector if
the user has too many inflight unix sockets. This causes real-time issues,
as high-priority processes that do need to send lots of unix sockets get
blocked by the garbage collector, which runs from a workqueue, causing a
priority inversion scenario.

The reason for blocking on the garbage collector goes back to 2008, when
it was reported that "Local/unprivileged users can cause soft lockups
and take out system processes by triggering the OOM killer":
https://bugzilla.redhat.com/show_bug.cgi?id=470201

The soft lockup happened because a process could keep queueing AF_UNIX
sockets to another process that was exiting. Back in 2008, the garbage
collector was run synchronously by the exiting process, so the continued
queueing of AF_UNIX sockets blocked that process from exiting.

The solution to that issue was forcing sendmsg() to wait for an ongoing
garbage collection.

The OOM killer issue was brought up again in 2010:
https://lore.kernel.org/lkml/AANLkTi=Q967xpX0KLMwX-=_4_1AKO5wjHEuJ1TrNjCj9@mail.gmail.com/

To resolve that report, besides blocking on the garbage collector, sendmsg()
also schedules the garbage collector if the number of inflight AF_UNIX
sockets in the system is too high.

Then in 2015, once again, the OOM killer problem was brought up:
https://lore.kernel.org/lkml/20151228141435.GA13351@1wt.eu/

That time, the issue was resolved by disallowing a user from having more
inflight AF_UNIX sockets than their RLIMIT_NOFILE. That was done by commit
712f4aad406b ("unix: properly account for FDs passed over unix sockets")
and commit 415e3d3e90ce ("unix: correctly track in-flight fds in sending
process user_struct").

Now, sendmsg() does not have to block on the garbage collector anymore,
because:

  - The OOM killer issue has already been addressed by checking
    RLIMIT_NOFILE.

  - The soft lockup issue is no longer relevant, because the garbage
    collector now runs asynchronously since commit d9f21b361333 ("af_unix:
    Try to run GC async.")

Therefore, remove the wait to prevent priority inversion. After this patch,
running all the reproducers from the bug reports above shows no problems.

Signed-off-by: Nam Cao <namcao@linutronix.de>
---
 net/unix/garbage.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/net/unix/garbage.c b/net/unix/garbage.c
index 0783555e2526..f180c59b3da9 100644
--- a/net/unix/garbage.c
+++ b/net/unix/garbage.c
@@ -300,8 +300,6 @@ int unix_prepare_fpl(struct scm_fp_list *fpl)
 	if (!fpl->edges)
 		goto err;
 
-	unix_schedule_gc(fpl->user);
-
 	return 0;
 
 err:
-- 
2.47.3

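For readers unfamiliar with the term used in the commit message: a socket (or any file) is "in flight" from the moment its descriptor is passed over an AF_UNIX socket with SCM_RIGHTS until the receiver picks it up. The userspace sketch below illustrates that mechanism only. send_fd() and recv_fd() are hypothetical helper names and are not part of the patch or of the kernel.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: pass one descriptor over an AF_UNIX socket.
 * Until the peer calls recvmsg(), the passed file is "in flight".
 * Returns 0 on success, -1 on error.
 */
int send_fd(int sock, int fd)
{
	char data = 'x';	/* one byte of ordinary payload */
	struct iovec iov = { .iov_base = &data, .iov_len = 1 };
	union {
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align;	/* forces correct alignment */
	} u;
	struct msghdr msg = { 0 };
	struct cmsghdr *cmsg;

	memset(&u, 0, sizeof(u));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Hypothetical helper: receive one descriptor sent by send_fd().
 * Returns the new descriptor, or -1 on error.
 */
int recv_fd(int sock)
{
	char data;
	struct iovec iov = { .iov_base = &data, .iov_len = 1 };
	union {
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align;
	} u;
	struct msghdr msg = { 0 };
	struct cmsghdr *cmsg;
	int fd;

	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	if (recvmsg(sock, &msg, 0) != 1)
		return -1;

	cmsg = CMSG_FIRSTHDR(&msg);
	if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS)
		return -1;

	memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	return fd;
}
```

As an illustration of the cyclic case raised later in the thread: with a pair from socketpair(AF_UNIX, SOCK_STREAM, 0, sv), calling send_fd(sv[1], sv[0]) and then closing both descriptors without receiving leaves sv[0] referenced only from its own receive queue, which is the kind of reference the garbage collector exists to reclaim.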
* Re: [PATCH 1/2] af_unix: Do not wait for garbage collector in sendmsg()
From: Kuniyuki Iwashima @ 2026-07-02  3:27 UTC
  To: Nam Cao
  Cc: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, netdev, linux-kernel, linux-rt-devel

On Wed, Jul 1, 2026 at 9:36 AM Nam Cao <namcao@linutronix.de> wrote:
>
> AF_UNIX sockets' sendmsg() schedules and blocks on the garbage collector if
> the user has too many inflight unix sockets

AND there is a cyclic reference

>. This causes real-time issues, as
> high-priority processes that do need to send lots of unix sockets get
> blocked by the garbage collector, which runs from a workqueue, causing a
> priority inversion scenario.

So the real problem is the process creating cyclic references.

>
> The reason for blocking on the garbage collector goes back to 2008, when
> it was reported that "Local/unprivileged users can cause soft lockups
> and take out system processes by triggering the OOM killer":
> https://bugzilla.redhat.com/show_bug.cgi?id=470201
>
> The soft lockup happened because a process could keep queueing AF_UNIX
> sockets to another process that was exiting. Back in 2008, the garbage
> collector was run synchronously by the exiting process, so the continued
> queueing of AF_UNIX sockets blocked that process from exiting.
>
> The solution to that issue was forcing sendmsg() to wait for an ongoing
> garbage collection.
>
> The OOM killer issue was brought up again in 2010:
> https://lore.kernel.org/lkml/AANLkTi=Q967xpX0KLMwX-=_4_1AKO5wjHEuJ1TrNjCj9@mail.gmail.com/
>
> To resolve that report, besides blocking on the garbage collector, sendmsg()
> also schedules the garbage collector if the number of inflight AF_UNIX
> sockets in the system is too high.
>
> Then in 2015, once again, the OOM killer problem was brought up:
> https://lore.kernel.org/lkml/20151228141435.GA13351@1wt.eu/
>
> That time, the issue was resolved by disallowing a user from having more
> inflight AF_UNIX sockets than their RLIMIT_NOFILE. That was done by commit
> 712f4aad406b ("unix: properly account for FDs passed over unix sockets")
> and commit 415e3d3e90ce ("unix: correctly track in-flight fds in sending
> process user_struct").
>
> Now, sendmsg() does not have to block on the garbage collector anymore,
> because:
>
>   - The OOM killer issue has already been addressed by checking
>     RLIMIT_NOFILE.
>
>   - The soft lockup issue is no longer relevant, because the garbage
>     collector now runs asynchronously since commit d9f21b361333 ("af_unix:
>     Try to run GC async.")

I don't think the latter is resolved. Without blocking insane
users, they can keep pushing sockets to the kernel work,
which could be soft-lockup'd.

>
> Therefore, remove the wait to prevent priority inversion. After this patch,
> running all the reproducers from the bug reports above shows no problems.
>
> Signed-off-by: Nam Cao <namcao@linutronix.de>
> ---
>  net/unix/garbage.c | 2 --
>  1 file changed, 2 deletions(-)
>
> diff --git a/net/unix/garbage.c b/net/unix/garbage.c
> index 0783555e2526..f180c59b3da9 100644
> --- a/net/unix/garbage.c
> +++ b/net/unix/garbage.c
> @@ -300,8 +300,6 @@ int unix_prepare_fpl(struct scm_fp_list *fpl)
>  	if (!fpl->edges)
>  		goto err;
>
> -	unix_schedule_gc(fpl->user);
> -
>  	return 0;
>
>  err:
> --
> 2.47.3
>

* Re: [PATCH 1/2] af_unix: Do not wait for garbage collector in sendmsg()
From: Nam Cao @ 2026-07-02  3:56 UTC
  To: Kuniyuki Iwashima
  Cc: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, netdev, linux-kernel, linux-rt-devel

Kuniyuki Iwashima <kuniyu@google.com> writes:
> On Wed, Jul 1, 2026 at 9:36 AM Nam Cao <namcao@linutronix.de> wrote:
>> Now, sendmsg() does not have to block on the garbage collector anymore,
>> because:
>>
>>   - The OOM killer issue has already been addressed by checking
>>     RLIMIT_NOFILE.
>>
>>   - The soft lockup issue is no longer relevant, because the garbage
>>     collector now runs asynchronously since commit d9f21b361333 ("af_unix:
>>     Try to run GC async.")
>
> I don't think the latter is resolved. Without blocking insane
> users, they can keep pushing sockets to the kernel work,
> which could be soft-lockup'd.

A user cannot push more than RLIMIT_NOFILE sockets before the GC runs. The
GC grabs the spin lock, cleans up whatever is present, and exits. So a user
could make the GC run again and again, but there would be no soft lockup,
because the GC yields after short intervals.

Nam

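The bound referred to in this reply is the sender's RLIMIT_NOFILE. A minimal userspace sketch for inspecting it follows. It assumes the relevant value is the soft limit (rlim_cur), which the thread does not state, and nofile_soft_limit() is a hypothetical helper name.

```c
#include <sys/resource.h>

/* Hypothetical helper: store the calling process's soft RLIMIT_NOFILE
 * in *limit. Assumption: the in-flight check discussed in the thread
 * compares against this soft limit.
 * Returns 0 on success, -1 on error (errno is set by getrlimit()).
 */
int nofile_soft_limit(rlim_t *limit)
{
	struct rlimit rl;

	if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
		return -1;

	*limit = rl.rlim_cur;
	return 0;
}
```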
* [PATCH 2/2] af_unix: Clean up unix_schedule_gc()
From: Nam Cao @ 2026-07-01 16:35 UTC
  To: Kuniyuki Iwashima, David S . Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, netdev, linux-kernel, linux-rt-devel
  Cc: Nam Cao

After the previous patch, the only remaining caller of unix_schedule_gc()
passes NULL as the argument. Therefore, simplify the function by deleting
the parameter.

Signed-off-by: Nam Cao <namcao@linutronix.de>
---
 net/unix/af_unix.c |  2 +-
 net/unix/af_unix.h |  2 +-
 net/unix/garbage.c | 14 +-------------
 3 files changed, 3 insertions(+), 15 deletions(-)

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index f7a9d55eee8a..759db734a866 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -733,7 +733,7 @@ static void unix_release_sock(struct sock *sk, int embrion)
 
 	/* ---- Socket is dead now and most probably destroyed ---- */
 
-	unix_schedule_gc(NULL);
+	unix_schedule_gc();
 }
 
 struct unix_peercred {
diff --git a/net/unix/af_unix.h b/net/unix/af_unix.h
index 8119dbeef3a3..600d56fdcde4 100644
--- a/net/unix/af_unix.h
+++ b/net/unix/af_unix.h
@@ -30,7 +30,7 @@ void unix_update_edges(struct unix_sock *receiver);
 int unix_prepare_fpl(struct scm_fp_list *fpl);
 void unix_destroy_fpl(struct scm_fp_list *fpl);
 void unix_peek_fpl(struct scm_fp_list *fpl);
-void unix_schedule_gc(struct user_struct *user);
+void unix_schedule_gc(void);
 
 /* SOCK_DIAG */
 long unix_inq_len(struct sock *sk);
diff --git a/net/unix/garbage.c b/net/unix/garbage.c
index f180c59b3da9..d46aeb9d2051 100644
--- a/net/unix/garbage.c
+++ b/net/unix/garbage.c
@@ -635,23 +635,11 @@ static void unix_gc(struct work_struct *work)
 
 static DECLARE_WORK(unix_gc_work, unix_gc);
 
-#define UNIX_INFLIGHT_SANE_USER	(SCM_MAX_FD * 8)
-
-void unix_schedule_gc(struct user_struct *user)
+void unix_schedule_gc(void)
 {
 	if (READ_ONCE(unix_graph_state) == UNIX_GRAPH_NOT_CYCLIC)
 		return;
 
-	/* Penalise users who want to send AF_UNIX sockets
-	 * but whose sockets have not been received yet.
-	 */
-	if (user &&
-	    READ_ONCE(user->unix_inflight) < UNIX_INFLIGHT_SANE_USER)
-		return;
-
 	if (!READ_ONCE(gc_in_progress))
 		queue_work(system_dfl_wq, &unix_gc_work);
-
-	if (user && READ_ONCE(unix_graph_cyclic_sccs))
-		flush_work(&unix_gc_work);
 }
-- 
2.47.3

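To make the behavioural difference in the hunk above easier to see, here is a userspace model of unix_schedule_gc() before and after the series. It is a sketch, not kernel code: struct gc_model, struct user_model, schedule_gc_old() and schedule_gc_new() are hypothetical names, counters stand in for queue_work() and flush_work(), and the value 253 for SCM_MAX_FD is an assumption about current kernels.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumption: SCM_MAX_FD is 253, making the "sane user" bound 2024. */
#define SCM_MAX_FD		253
#define UNIX_INFLIGHT_SANE_USER	(SCM_MAX_FD * 8)

enum graph_state { GRAPH_NOT_CYCLIC, GRAPH_MAYBE_CYCLIC };

/* Model of the global GC state read by unix_schedule_gc(). */
struct gc_model {
	enum graph_state state;
	bool gc_in_progress;
	unsigned long cyclic_sccs;
	unsigned int queued;	/* stands in for queue_work() calls */
	unsigned int flushed;	/* stands in for blocking flush_work() calls */
};

/* Model of the per-user in-flight counter. */
struct user_model {
	unsigned long unix_inflight;
};

/* Before the series: a non-NULL user (the sendmsg() path) may block. */
void schedule_gc_old(struct gc_model *m, const struct user_model *user)
{
	if (m->state == GRAPH_NOT_CYCLIC)
		return;

	/* "Sane" senders are let through without scheduling or waiting. */
	if (user && user->unix_inflight < UNIX_INFLIGHT_SANE_USER)
		return;

	if (!m->gc_in_progress)
		m->queued++;

	/* This is the wait that the first patch takes out of sendmsg(). */
	if (user && m->cyclic_sccs)
		m->flushed++;
}

/* After the series: schedule only, never wait. */
void schedule_gc_new(struct gc_model *m)
{
	if (m->state == GRAPH_NOT_CYCLIC)
		return;

	if (!m->gc_in_progress)
		m->queued++;
}
```

In this model, the old function increments flushed only when three conditions hold together: the graph may be cyclic, the sender has at least UNIX_INFLIGHT_SANE_USER sockets in flight, and cyclic SCCs exist. That matches the "AND there is a cyclic reference" remark in the review. The new function never increments flushed.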