From: Jakub Kicinski <kuba@kernel.org>
To: Ricardo Robaina <rrobaina@redhat.com>
Cc: audit@vger.kernel.org, linux-kernel@vger.kernel.org,
netdev@vger.kernel.org, paul@paul-moore.com, eparis@redhat.com,
edumazet@google.com, pabeni@redhat.com, horms@kernel.org,
Steve Grubb <sgrubb@redhat.com>
Subject: Re: [PATCH v2] netlink, audit: prevent false ENOBUFS on timeout expiry
Date: Wed, 27 May 2026 15:29:36 -0700
Message-ID: <20260527152936.001d5d28@kernel.org>
In-Reply-To: <CAABTaaC98dqM8U-7xkdW=b=50UKu0SQyBO629LDdphQ9DC=P=g@mail.gmail.com>

On Wed, 27 May 2026 16:29:37 -0300 Ricardo Robaina wrote:
> On Mon, May 18, 2026 at 9:35 PM Jakub Kicinski <kuba@kernel.org> wrote:
> >
> > On Wed, 13 May 2026 14:24:43 -0300 Ricardo Robaina wrote:
> > > When auditd is bottlenecked (e.g., by slow disk I/O), kauditd blocks on
> > > the netlink socket.
> >
> > Holding socket lock during slow IO sounds very wrong. One could say -
> > that's abuse of the socket lock?
> >
> > > If the wait timeout fully expires (timeo == 0),
> > > netlink mistakenly interprets the zeroed timeout as a non-blocking
> > > request. It then triggers netlink_overrun that drops the event,
> > > completely bypassing the audit subsystem's internal retry queue, and
> > > falsely returns ENOBUFS to user-space, resulting in the following error:
> > >
> > > auditd[]: Error receiving audit netlink packet (No buffer space available)
> > >
> > > Fix this by detecting when a blocking sender's timeout has expired
> > > (timeo == 0 && !nonblock) in netlink_unicast(). In this case, instead
> > > of retrying with timeo=0 (which would incorrectly trigger netlink_overrun
> > > on the next iteration), safely free the skb and return -EAGAIN, allowing
> > > the audit subsystem to gracefully enqueue the pending event into its
> > > internal backlog.
> >
> > The socket _is_ the queue, normally.
> >
> > Please explore fixing this in audit?
> > --
> > pw-bot: cr
> >
>
> Hi Jakub,
>
> Thanks for reviewing this patch as well.
>
> First, regarding the lock: kauditd does not hold the socket lock during
> slow I/O. The sleep in netlink_attachskb() uses schedule_timeout() on
> nlk->wait (a wait queue). No socket lock or mutex is held during the sleep.

So you're saying the queue _is_ actually congested?
netlink_attachskb() sleeps because there's no space left in the socket's
rcvbuf? So the skbs are moved to audit_retry_queue "temporarily" until
user space drains its socket and kernel can succeed sending?
Could you confirm this understanding is correct?
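
To pin down the scenario being asked about, here is a toy user-space model
of the sequence the commit message describes: the receive buffer is full,
the sender waits, the timeout runs out, and the retry with timeo == 0 is
treated as an overrun. It is not kernel code. `struct toy_sock`,
`toy_attach()` and the `TOY_*` results are hypothetical names, and the sleep
is modelled by assuming the reader never drains, so the whole timeout
elapses.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy stand-in for a receiving socket; NOT the kernel's struct sock. */
struct toy_sock {
	int rmem_alloc;	/* bytes currently queued for the reader */
	int rcvbuf;	/* receive buffer limit */
	int sk_err;	/* pending error that the reader will see */
};

enum toy_result { TOY_QUEUED, TOY_WAITED, TOY_OVERRUN };

/* Hypothetical model of one delivery attempt to a possibly full socket. */
static enum toy_result toy_attach(struct toy_sock *sk, int len, long *timeo)
{
	assert(sk != NULL && timeo != NULL);

	/* Room left: the message is queued and the timeout is untouched. */
	if (sk->rmem_alloc <= sk->rcvbuf) {
		sk->rmem_alloc += len;
		return TOY_QUEUED;
	}

	/* Congested with nothing left to wait: report an overrun to the reader. */
	if (*timeo == 0) {
		sk->sk_err = ENOBUFS;
		return TOY_OVERRUN;
	}

	/* Congested with time left: model a sleep that uses up the timeout. */
	*timeo = 0;
	return TOY_WAITED;
}
```

In this model a blocking sender that starts with a non-zero timeout first
gets TOY_WAITED and, if it simply retries, then gets TOY_OVERRUN with
sk_err set to ENOBUFS, which is the behaviour the patch description
objects to.
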
> Second, regarding an audit-only fix: the symptom manifests as sk->sk_err =
> ENOBUFS set inside netlink_overrun() (called from netlink_attachskb when
> timeo == 0). Audit has no mechanism to prevent or clear this socket state
> from the outside. Potential workarounds all fail:
>
> (1) Clearing sk_err after the fact is racy and affects other socket ops

Why would you clear sk_err? It's the reader's responsibility to
clear the congestion, and the reader is AFAIU a user space process.
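
As a rough illustration of what "the reader's responsibility" can look like,
below is a minimal user-space sketch of a receive loop that treats ENOBUFS
as "some messages were lost" and keeps reading, so the socket continues to
drain. `recv_skip_overrun()` and its `lost` counter are hypothetical names
for this example and are not taken from auditd; a real daemon would likely
also log or resynchronise when the counter increases.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: receive one message from fd.
 * ENOBUFS is counted in *lost and the read is retried, so the reader
 * keeps draining the socket. EINTR is retried as well.
 * Returns the number of bytes received, or -1 with errno set.
 */
static ssize_t recv_skip_overrun(int fd, void *buf, size_t len,
				 unsigned int *lost)
{
	assert(buf != NULL && lost != NULL);

	for (;;) {
		ssize_t n = recv(fd, buf, len, 0);

		if (n >= 0)
			return n;
		if (errno == ENOBUFS) {
			(*lost)++;	/* messages were dropped; keep reading */
			continue;
		}
		if (errno == EINTR)
			continue;
		return -1;
	}
}
```
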
> (2) Avoiding timeouts entirely defeats the anti-deadlock mechanism

What's the anti-deadlock mechanism?

> (3) A new NETLINK_F_RECV_NO_ENOBUFS socket flag doesn't exist in stable
> kernels where this bug is actively impacting users

Which commit are you referring to? Isn't that flag itself ancient?

> I've submitted v3 [1] with NETLINK_UNICAST_TIMED as an explicit opt-in
> constant.

It's really not great to fall silent for 10+ days, then respond and
immediately post an equally pointless next version of the patch :/