Netdev List
* [PATCH v2] netlink, audit: prevent false ENOBUFS on timeout expiry
@ 2026-05-13 17:24 Ricardo Robaina
  2026-05-18 11:03 ` Simon Horman
  2026-05-19  0:35 ` Jakub Kicinski
  0 siblings, 2 replies; 3+ messages in thread
From: Ricardo Robaina @ 2026-05-13 17:24 UTC
  To: audit, linux-kernel, netdev
  Cc: paul, eparis, edumazet, kuba, pabeni, horms, Ricardo Robaina,
	Steve Grubb

When auditd is bottlenecked (e.g., by slow disk I/O), kauditd blocks on
the netlink socket. If the wait timeout fully expires (timeo == 0),
netlink mistakenly interprets the zeroed timeout as a non-blocking
request. It then triggers netlink_overrun that drops the event,
completely bypassing the audit subsystem's internal retry queue, and
falsely returns ENOBUFS to user-space, resulting in the following error:

 auditd[]: Error receiving audit netlink packet (No buffer space available)

Fix this by detecting when a blocking sender's timeout has expired
(timeo == 0 && !nonblock) in netlink_unicast(). In this case, instead
of retrying with timeo=0 (which would incorrectly trigger netlink_overrun
on the next iteration), safely free the skb and return -EAGAIN, allowing
the audit subsystem to gracefully enqueue the pending event into its
internal backlog.

Suggested-by: Steve Grubb <sgrubb@redhat.com>
Suggested-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Ricardo Robaina <rrobaina@redhat.com>
---
Changes in v2:
- Use the simple check (timeo == 0 && !nonblock) to detect
  expired timeout, avoiding adding a new NETLINK flag.

 net/netlink/af_netlink.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 2aeb0680807d..fdc3db74b178 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -1351,8 +1351,18 @@ int netlink_unicast(struct sock *ssk, struct sk_buff *skb,
 	}
 
 	err = netlink_attachskb(sk, skb, &timeo, ssk);
-	if (err == 1)
+	if (err == 1) {
+		/* timeo may have been zeroed by schedule_timeout inside
+		 * netlink_attachskb. If the caller is a timed-blocking sender
+		 * (not genuinely nonblocking), don't re-enter with timeo=0 as
+		 * that would misfire netlink_overrun on the next iteration.
+		 */
+		if (timeo == 0 && !nonblock) {
+			kfree_skb(skb);
+			return -EAGAIN;
+		}
 		goto retry;
+	}
 	if (err)
 		return err;
 
-- 
2.53.0
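
The control flow described in the commit message can be modelled in user space. The following is only a minimal sketch, not kernel code: `model_sock`, `model_attachskb()` and `model_unicast()` are hypothetical stand-ins, the sender is assumed to be a kernel socket, and skb handling, reference counting and the real sleep are left out. The `patched` flag switches the v2 early return on and off so the two behaviours can be compared.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the receiving socket. */
struct model_sock {
	bool rx_full;	/* receive queue has no room */
	int overruns;	/* how often the overrun path ran */
};

/*
 * Models netlink_attachskb(): 0 means the message can be queued,
 * 1 means the sender slept and should retry, -EAGAIN means the
 * message was dropped and an overrun was signalled.
 */
int model_attachskb(struct model_sock *sk, long *timeo)
{
	if (!sk->rx_full)
		return 0;
	if (*timeo == 0) {
		sk->overruns++;	/* stands in for netlink_overrun() */
		return -EAGAIN;
	}
	*timeo = 0;		/* assume the sleep ran to full expiry */
	return 1;
}

/* Models the retry loop in netlink_unicast(). */
int model_unicast(struct model_sock *sk, long sndtimeo,
		  bool nonblock, bool patched)
{
	/* A non-blocking sender starts with a zero timeout. */
	long timeo = nonblock ? 0 : sndtimeo;
	int err;

retry:
	err = model_attachskb(sk, &timeo);
	if (err == 1) {
		/* v2 check: expired blocking sender, skip the overrun */
		if (patched && timeo == 0 && !nonblock)
			return -EAGAIN;
		goto retry;
	}
	return err;
}
```

In this model a blocking sender whose timeout expires gets -EAGAIN with or without the patch. The difference is that the unpatched loop re-enters with `timeo == 0` and increments `overruns`, while the patched loop returns first and leaves it untouched. A genuinely non-blocking sender takes the overrun path in both cases.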



* Re: [PATCH v2] netlink, audit: prevent false ENOBUFS on timeout expiry
  2026-05-13 17:24 [PATCH v2] netlink, audit: prevent false ENOBUFS on timeout expiry Ricardo Robaina
@ 2026-05-18 11:03 ` Simon Horman
  2026-05-19  0:35 ` Jakub Kicinski
  1 sibling, 0 replies; 3+ messages in thread
From: Simon Horman @ 2026-05-18 11:03 UTC
  To: rrobaina
  Cc: 'Simon Horman', audit, linux-kernel, netdev, paul, eparis,
	edumazet, kuba, pabeni, sgrubb

From: 'Simon Horman' <horms@kernel.org>

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
Full review at: https://sashiko.dev
---
netlink, audit: prevent false ENOBUFS on timeout expiry

This commit modifies netlink_unicast() to return -EAGAIN when a blocking
sender's timeout expires. This bypasses the netlink_overrun() call to
allow the audit subsystem to gracefully enqueue the pending event into its
internal backlog without falsely returning ENOBUFS to user-space.

> diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
> --- a/net/netlink/af_netlink.c
> +++ b/net/netlink/af_netlink.c
> @@ -1351,8 +1351,18 @@ int netlink_unicast(struct sock *ssk, struct sk_buff *skb,
>  	}
>  
>  	err = netlink_attachskb(sk, skb, &timeo, ssk);
> -	if (err == 1)
> +	if (err == 1) {
> +		/* timeo may have been zeroed by schedule_timeout inside
> +		 * netlink_attachskb. If the caller is a timed-blocking sender
> +		 * (not genuinely nonblocking), don't re-enter with timeo=0 as
> +		 * that would misfire netlink_overrun on the next iteration.
> +		 */
> +		if (timeo == 0 && !nonblock) {
> +			kfree_skb(skb);
> +			return -EAGAIN;

Could this regression cause silent packet drops in non-audit subsystems?

While this change prevents false ENOBUFS for the audit subsystem, other
subsystems like InfiniBand's IWPM use rdma_nl_unicast_wait(), which sets
nonblock=0 and operates on a kernel socket with a timeout.

If the timeout expires because the receiving user-space daemon is
bottlenecked, this new path will free the skb and return -EAGAIN without
ever invoking netlink_overrun().

Without netlink_overrun(), the receiver's socket error is never set to
ENOBUFS. Would this prevent user-space from being notified of the dropped
message, leading to permanent state desynchronization between the kernel
and user-space?

> +		}
>  		goto retry;
> +	}
>  	if (err)
>  		return err;
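
The receiver-side concern raised above can be illustrated with a small user-space model. This is a sketch under stated assumptions, not the kernel implementation: `model_rx`, `drop_with_overrun()`, `drop_silently()` and `model_recv()` are hypothetical names, and the pending error is modelled as a plain integer that a receive call reports once and then clears.

```c
#include <errno.h>

/* Hypothetical receiver state; field names are illustrative only. */
struct model_rx {
	int pending_err;	/* models the socket's pending error */
	int drops;		/* messages discarded on the sender side */
};

/* Drop that also signals the receiver (the overrun path). */
void drop_with_overrun(struct model_rx *rx)
{
	rx->pending_err = ENOBUFS;
	rx->drops++;
}

/*
 * Drop without signalling: the new early-return path, assuming the
 * caller does not requeue the message itself.
 */
void drop_silently(struct model_rx *rx)
{
	rx->drops++;
}

/* Models a receive call: a pending error is reported once, then cleared. */
int model_recv(struct model_rx *rx)
{
	int err = rx->pending_err;

	rx->pending_err = 0;
	return err ? -err : 0;
}
```

With `drop_with_overrun()` the next `model_recv()` returns -ENOBUFS, so the receiver learns that something was lost. With `drop_silently()` the drop counter still advances but `model_recv()` returns 0, which is the desynchronization scenario the review asks about for callers that, unlike audit, have no retry queue of their own.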


* Re: [PATCH v2] netlink, audit: prevent false ENOBUFS on timeout expiry
  2026-05-13 17:24 [PATCH v2] netlink, audit: prevent false ENOBUFS on timeout expiry Ricardo Robaina
  2026-05-18 11:03 ` Simon Horman
@ 2026-05-19  0:35 ` Jakub Kicinski
  1 sibling, 0 replies; 3+ messages in thread
From: Jakub Kicinski @ 2026-05-19  0:35 UTC
  To: Ricardo Robaina
  Cc: audit, linux-kernel, netdev, paul, eparis, edumazet, pabeni,
	horms, Steve Grubb

On Wed, 13 May 2026 14:24:43 -0300 Ricardo Robaina wrote:
> When auditd is bottlenecked (e.g., by slow disk I/O), kauditd blocks on
> the netlink socket.

Holding socket lock during slow IO sounds very wrong. One could say -
that's abuse of the socket lock?

> If the wait timeout fully expires (timeo == 0),
> netlink mistakenly interprets the zeroed timeout as a non-blocking
> request. It then triggers netlink_overrun that drops the event,
> completely bypassing the audit subsystem's internal retry queue, and
> falsely returns ENOBUFS to user-space, resulting in the following error:
> 
>  auditd[]: Error receiving audit netlink packet (No buffer space available)
> 
> Fix this by detecting when a blocking sender's timeout has expired
> (timeo == 0 && !nonblock) in netlink_unicast(). In this case, instead
> of retrying with timeo=0 (which would incorrectly trigger netlink_overrun
> on the next iteration), safely free the skb and return -EAGAIN, allowing
> the audit subsystem to gracefully enqueue the pending event into its
> internal backlog.

The socket _is_ the queue, normally.

Please explore fixing this in audit?
-- 
pw-bot: cr
