From: Steve Grubb <sgrubb@redhat.com>
To: Ricardo Robaina <rrobaina@redhat.com>, Jakub Kicinski <kuba@kernel.org>
Cc: audit@vger.kernel.org, linux-kernel@vger.kernel.org,
netdev@vger.kernel.org, paul@paul-moore.com, eparis@redhat.com,
edumazet@google.com, pabeni@redhat.com, horms@kernel.org
Subject: Re: [PATCH v2] netlink, audit: prevent false ENOBUFS on timeout expiry
Date: Thu, 28 May 2026 18:40:44 -0400
Message-ID: <2143396.Jadu78ljVU@x2>
In-Reply-To: <20260527152936.001d5d28@kernel.org>
On Wednesday, May 27, 2026 6:29:36 PM Eastern Daylight Time Jakub Kicinski
wrote:
> On Wed, 27 May 2026 16:29:37 -0300 Ricardo Robaina wrote:
> > On Mon, May 18, 2026 at 9:35 PM Jakub Kicinski <kuba@kernel.org> wrote:
> > > On Wed, 13 May 2026 14:24:43 -0300 Ricardo Robaina wrote:
> > > > When auditd is bottlenecked (e.g., by slow disk I/O), kauditd blocks on
> > > > the netlink socket.
> > >
> > > Holding socket lock during slow IO sounds very wrong. One could say -
> > > that's abuse of the socket lock?
> > >
> > > > If the wait timeout fully expires (timeo == 0),
> > > > netlink mistakenly interprets the zeroed timeout as a non-blocking
> > > > request. It then triggers netlink_overrun that drops the event,
> > > > completely bypassing the audit subsystem's internal retry queue, and
> > > > falsely returns ENOBUFS to user-space, resulting in the following
> > > > error:
> > > >
> > > > auditd[]: Error receiving audit netlink packet (No buffer space
> > > > available)
> > > >
> > > > Fix this by detecting when a blocking sender's timeout has expired
> > > > (timeo == 0 && !nonblock) in netlink_unicast(). In this case, instead
> > > > of retrying with timeo=0 (which would incorrectly trigger
> > > > netlink_overrun on the next iteration), safely free the skb and return
> > > > -EAGAIN, allowing the audit subsystem to gracefully enqueue the pending
> > > > event into its internal backlog.
> > >
> > > The socket _is_ the queue, normally.
> > >
> > > Please explore fixing this in audit?
> > > --
> > > pw-bot: cr
> >
> > Hi Jakub,
> >
> > Thanks for reviewing this patch as well.
> >
> > First, regarding the lock: kauditd does not hold the socket lock during
> > slow I/O. The sleep in netlink_attachskb() uses schedule_timeout() on
> > nlk->wait (a wait queue). No socket lock or mutex is held during the
> > sleep.
>
> So you're saying the queue _is_ actually congested?

Yes. The socket buffer is genuinely full because auditd can't drain it fast
enough.

> netlink_attachskb() sleeps because there's no space left in the socket's
> rcvbuf? So the skbs are moved to audit_retry_queue "temporarily" until
> user space drains its socket and kernel can succeed sending?
>
> Could you confirm this understanding is correct?

Yes. kauditd sleeps in netlink_attachskb(), the HZ/10 timeout expires, and
the skb is moved to audit_retry_queue until auditd drains enough for delivery
to succeed. The record is not lost.

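The flow confirmed above can be sketched as a small userspace model. This is only an illustration under simplifying assumptions: every struct, constant, and function name here is hypothetical (none are kernel APIs), the bounded wait is collapsed into an immediate "buffer full" check, and the retry queue is modeled as an unbounded counter.

```c
#include <stddef.h>

/* Hypothetical userspace model; these names are NOT kernel APIs. */
#define RCVBUF_SLOTS 4          /* stand-in for the receiver's rcvbuf space */

struct model {
	int rcvbuf_used;        /* records sitting in the reader's socket */
	int retry_used;         /* records parked for a later attempt */
};

enum send_result { SENT, TIMED_OUT };

/* Try to deliver one record. A full buffer is treated as "the bounded
 * wait expired", which is the case the thread is discussing. */
static enum send_result model_send(struct model *m)
{
	if (m->rcvbuf_used < RCVBUF_SLOTS) {
		m->rcvbuf_used++;
		return SENT;
	}
	return TIMED_OUT;
}

/* Emit one record: on timeout, park it instead of dropping it. */
static void model_emit(struct model *m)
{
	if (model_send(m) == TIMED_OUT)
		m->retry_used++;
}

/* The reader drains up to n records, then parked records are re-sent
 * for as long as there is room. */
static void model_drain(struct model *m, int n)
{
	m->rcvbuf_used = (n > m->rcvbuf_used) ? 0 : m->rcvbuf_used - n;
	while (m->retry_used > 0 && model_send(m) == SENT)
		m->retry_used--;
}
```

In this model, emitting six records into a four-slot buffer leaves two parked; once the reader drains three, both parked records are delivered and nothing is lost.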
> > Second, regarding an audit-only fix: the symptom manifests as sk->sk_err
> > = ENOBUFS set inside netlink_overrun() (called from netlink_attachskb
> > when timeo == 0). Audit has no mechanism to prevent or clear this socket
> > state from the outside. Potential workarounds all fail:
> >
> > (1) Clearing sk_err after the fact is racy and affects other socket ops
>
> Why would you clear the sk_err, it's the reader's responsibility to
> clear the congestion and the reader is AFAIU a user space process.

The reader is fighting to clear the congestion, but with one reader thread
against 32 cores generating events, it can fall behind. That doesn't happen
often, but it does happen once in a great while. The reader doesn't want an
ENOBUFS; it logs that as an exceptional condition. It wants to rely on the
kernel's backlog mechanism instead.

> > (2) Avoiding timeouts entirely defeats the anti-deadlock mechanism
>
> What's the anti-deadlock mechanism?

It's sk_sndtimeo = HZ/10, set in audit_net_init(). Without it, kauditd would
sleep indefinitely in netlink_attachskb() if auditd is stalled or dead. The
timeout lets kauditd escape and route the skb to its retry queue.

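For reference, HZ/10 jiffies works out to about 100 ms for the common HZ settings. A minimal sketch of that arithmetic follows; both helper names are hypothetical and are not kernel functions.

```c
/* Hypothetical helper: convert a jiffies count to milliseconds for a
 * given tick rate (hz ticks per second). */
static unsigned int model_jiffies_to_ms(unsigned long j, unsigned int hz)
{
	return (unsigned int)(j * 1000UL / hz);
}

/* The send timeout as stated in the thread: HZ/10 jiffies. */
static unsigned long model_audit_sndtimeo(unsigned int hz)
{
	return hz / 10;
}
```

For hz values of 100, 250, 300, and 1000, model_jiffies_to_ms(model_audit_sndtimeo(hz), hz) evaluates to 100 in each case.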
> > (3) A new NETLINK_F_RECV_NO_ENOBUFS socket flag doesn't exist in stable
> > kernels where this bug is actively impacting users
>
> Which commit are you referring to? Isn't that flag itself ancient?

You're right, it is. I see how this flag would fix the pathological behavior
that was reported. But looking at this suggestion more closely, there seems
to be one wrinkle. User space should not need to know that the audit code in
the kernel has this retry mechanism. It seems like the audit subsystem should
set the flag on auditd's socket at registration time in auditd_set(). The
kernel is the right place for this because it's the kernel that manages the
retry/hold queues and sets the sk_sndtimeo that triggers the overrun path;
auditd has no knowledge of these internals.

NETLINK_F_RECV_NO_ENOBUFS and nlk_sk() are private to
net/netlink/af_netlink.h, so audit.c can't set the flag directly. Should we
propose a small exported helper, netlink_sock_set_no_enobufs(), that mirrors
the existing setsockopt(NETLINK_NO_ENOBUFS) handler? Then the rest of the fix
lives entirely in kernel/audit.c, as you suggested.

Something like:
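
void netlink_sock_set_no_enobufs(struct sock *sk)
{
	struct netlink_sock *nlk = nlk_sk(sk);

	/* Stop reporting ENOBUFS to the receiver on overrun. */
	nlk->flags |= NETLINK_F_RECV_NO_ENOBUFS;
	/* Clear any existing congestion state and wake waiting senders. */
	clear_bit(NETLINK_S_CONGESTED, &nlk->state);
	wake_up_interruptible(&nlk->wait);
}
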
and then auditd_set() calls this as it sets up the connection. Is this
the way you wanted to handle this?

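To illustrate what the flag would change for the reader, here is a simplified userspace model. It assumes the overrun path only records ENOBUFS on the socket when the flag is clear and the socket is not already marked congested, and that the drop is counted either way. All names are hypothetical stand-ins, not the kernel's definitions.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the receiver-side state touched on overrun. */
struct model_sock {
	bool no_enobufs;  /* stands in for NETLINK_F_RECV_NO_ENOBUFS */
	bool congested;   /* stands in for NETLINK_S_CONGESTED */
	int  err;         /* stands in for sk->sk_err */
	int  drops;       /* stands in for the socket's drop counter */
};

/* Overrun: report ENOBUFS to the reader only if the flag is clear. */
static void model_overrun(struct model_sock *s)
{
	if (!s->no_enobufs && !s->congested) {
		s->congested = true;
		s->err = ENOBUFS;
	}
	s->drops++;
}

/* Mirrors the shape of the helper proposed above: set the flag and
 * clear any existing congestion mark. */
static void model_set_no_enobufs(struct model_sock *s)
{
	s->no_enobufs = true;
	s->congested = false;
}
```

In this model, a socket without the flag ends up with err set to ENOBUFS after an overrun, while a socket that had model_set_no_enobufs() applied first keeps err at 0, leaving the sender's own retry logic as the only consequence of the failed delivery.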
-Steve
Thread overview: 10+ messages
2026-05-13 17:24 [PATCH v2] netlink, audit: prevent false ENOBUFS on timeout expiry Ricardo Robaina
2026-05-18 11:03 ` Simon Horman
2026-05-27 19:26 ` Ricardo Robaina
2026-05-19 0:35 ` Jakub Kicinski
2026-05-26 20:53 ` Paul Moore
2026-05-27 19:34 ` Ricardo Robaina
2026-05-27 19:29 ` Ricardo Robaina
2026-05-27 22:29 ` Jakub Kicinski
2026-05-28 22:40 ` Steve Grubb [this message]
2026-05-28 23:29 ` Jakub Kicinski