* [PATCH v2] netlink, audit: prevent false ENOBUFS on timeout expiry
@ 2026-05-13 17:24 Ricardo Robaina
2026-05-18 11:03 ` Simon Horman
2026-05-19 0:35 ` Jakub Kicinski
0 siblings, 2 replies; 3+ messages in thread

From: Ricardo Robaina @ 2026-05-13 17:24 UTC
To: audit, linux-kernel, netdev
Cc: paul, eparis, edumazet, kuba, pabeni, horms, Ricardo Robaina,
Steve Grubb

When auditd is bottlenecked (e.g., by slow disk I/O), kauditd blocks on
the netlink socket. If the wait timeout fully expires (timeo == 0),
netlink mistakenly interprets the zeroed timeout as a non-blocking
request. It then triggers netlink_overrun that drops the event,
completely bypassing the audit subsystem's internal retry queue, and
falsely returns ENOBUFS to user-space, resulting in the following error:

auditd[]: Error receiving audit netlink packet (No buffer space available)

Fix this by detecting when a blocking sender's timeout has expired
(timeo == 0 && !nonblock) in netlink_unicast(). In this case, instead
of retrying with timeo=0 (which would incorrectly trigger netlink_overrun
on the next iteration), safely free the skb and return -EAGAIN, allowing
the audit subsystem to gracefully enqueue the pending event into its
internal backlog.

Suggested-by: Steve Grubb <sgrubb@redhat.com>
Suggested-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Ricardo Robaina <rrobaina@redhat.com>
---
Changes in v2:
- Use the simple check (timeo == 0 && !nonblock) to detect an
  expired timeout, avoiding adding a new NETLINK flag.

 net/netlink/af_netlink.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 2aeb0680807d..fdc3db74b178 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -1351,8 +1351,18 @@ int netlink_unicast(struct sock *ssk, struct sk_buff *skb,
 	}
 
 	err = netlink_attachskb(sk, skb, &timeo, ssk);
-	if (err == 1)
+	if (err == 1) {
+		/* timeo may have been zeroed by schedule_timeout inside
+		 * netlink_attachskb. If the caller is a timed-blocking sender
+		 * (not genuinely nonblocking), don't re-enter with timeo=0 as
+		 * that would misfire netlink_overrun on the next iteration.
+		 */
+		if (timeo == 0 && !nonblock) {
+			kfree_skb(skb);
+			return -EAGAIN;
+		}
 		goto retry;
+	}
 	if (err)
 		return err;
--
2.53.0

* Re: [PATCH v2] netlink, audit: prevent false ENOBUFS on timeout expiry
From: Simon Horman @ 2026-05-18 11:03 UTC
To: rrobaina
Cc: 'Simon Horman', audit, linux-kernel, netdev, paul, eparis,
edumazet, kuba, pabeni, sgrubb
From: 'Simon Horman' <horms@kernel.org>

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
Full review at: https://sashiko.dev
---
netlink, audit: prevent false ENOBUFS on timeout expiry

This commit modifies netlink_unicast() to return -EAGAIN when a blocking
sender's timeout expires. This bypasses the netlink_overrun() call to
allow the audit subsystem to gracefully enqueue the pending event into its
internal backlog without falsely returning ENOBUFS to user-space.

> diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
> --- a/net/netlink/af_netlink.c
> +++ b/net/netlink/af_netlink.c
> @@ -1351,8 +1351,18 @@ int netlink_unicast(struct sock *ssk, struct sk_buff *skb,
>  	}
>
>  	err = netlink_attachskb(sk, skb, &timeo, ssk);
> -	if (err == 1)
> +	if (err == 1) {
> +		/* timeo may have been zeroed by schedule_timeout inside
> +		 * netlink_attachskb. If the caller is a timed-blocking sender
> +		 * (not genuinely nonblocking), don't re-enter with timeo=0 as
> +		 * that would misfire netlink_overrun on the next iteration.
> +		 */
> +		if (timeo == 0 && !nonblock) {
> +			kfree_skb(skb);
> +			return -EAGAIN;

Could this regression cause silent packet drops in non-audit subsystems?

While this change prevents false ENOBUFS for the audit subsystem, other
subsystems like InfiniBand's IWPM use rdma_nl_unicast_wait(), which sets
nonblock=0 and operates on a kernel socket with a timeout.

If the timeout expires because the receiving user-space daemon is
bottlenecked, this new path will free the skb and return -EAGAIN without
ever invoking netlink_overrun().

Without netlink_overrun(), the receiver's socket error is never set to
ENOBUFS. Would this prevent user-space from being notified of the dropped
message, leading to permanent state desynchronization between the kernel
and user-space?

> +		}
>  		goto retry;
> +	}
>  	if (err)
>  		return err;

* Re: [PATCH v2] netlink, audit: prevent false ENOBUFS on timeout expiry
From: Jakub Kicinski @ 2026-05-19 0:35 UTC
To: Ricardo Robaina
Cc: audit, linux-kernel, netdev, paul, eparis, edumazet, pabeni,
horms, Steve Grubb

On Wed, 13 May 2026 14:24:43 -0300 Ricardo Robaina wrote:
> When auditd is bottlenecked (e.g., by slow disk I/O), kauditd blocks on
> the netlink socket.

Holding socket lock during slow IO sounds very wrong. One could say -
that's abuse of the socket lock?

> If the wait timeout fully expires (timeo == 0),
> netlink mistakenly interprets the zeroed timeout as a non-blocking
> request. It then triggers netlink_overrun that drops the event,
> completely bypassing the audit subsystem's internal retry queue, and
> falsely returns ENOBUFS to user-space, resulting in the following error:
>
> auditd[]: Error receiving audit netlink packet (No buffer space available)
>
> Fix this by detecting when a blocking sender's timeout has expired
> (timeo == 0 && !nonblock) in netlink_unicast(). In this case, instead
> of retrying with timeo=0 (which would incorrectly trigger netlink_overrun
> on the next iteration), safely free the skb and return -EAGAIN, allowing
> the audit subsystem to gracefully enqueue the pending event into its
> internal backlog.

The socket _is_ the queue, normally.
Please explore fixing this in audit?

--
pw-bot: cr