Re: [PATCH v2] netlink, audit: prevent false ENOBUFS on timeout expiry

From: Jakub Kicinski <kuba@kernel.org>
To: Ricardo Robaina <rrobaina@redhat.com>
Cc: audit@vger.kernel.org, linux-kernel@vger.kernel.org,
	netdev@vger.kernel.org, paul@paul-moore.com, eparis@redhat.com,
	edumazet@google.com, pabeni@redhat.com, horms@kernel.org,
	Steve Grubb <sgrubb@redhat.com>
Subject: Re: [PATCH v2] netlink, audit: prevent false ENOBUFS on timeout expiry
Date: Wed, 27 May 2026 15:29:36 -0700
Message-ID: <20260527152936.001d5d28@kernel.org>
In-Reply-To: <CAABTaaC98dqM8U-7xkdW=b=50UKu0SQyBO629LDdphQ9DC=P=g@mail.gmail.com>

On Wed, 27 May 2026 16:29:37 -0300 Ricardo Robaina wrote:
> On Mon, May 18, 2026 at 9:35 PM Jakub Kicinski <kuba@kernel.org> wrote:
> >
> > On Wed, 13 May 2026 14:24:43 -0300 Ricardo Robaina wrote:  
> > > When auditd is bottlenecked (e.g., by slow disk I/O), kauditd blocks on
> > > the netlink socket.  
> >
> > Holding socket lock during slow IO sounds very wrong. One could say -
> > that's abuse of the socket lock?
> >  
> > > If the wait timeout fully expires (timeo == 0),
> > > netlink mistakenly interprets the zeroed timeout as a non-blocking
> > > request. It then triggers netlink_overrun that drops the event,
> > > completely bypassing the audit subsystem's internal retry queue, and
> > > falsely returns ENOBUFS to user-space, resulting in the following error:
> > >
> > >  auditd[]: Error receiving audit netlink packet (No buffer space available)
> > >
> > > Fix this by detecting when a blocking sender's timeout has expired
> > > (timeo == 0 && !nonblock) in netlink_unicast(). In this case, instead
> > > of retrying with timeo=0 (which would incorrectly trigger netlink_overrun
> > > on the next iteration), safely free the skb and return -EAGAIN, allowing
> > > the audit subsystem to gracefully enqueue the pending event into its
> > > internal backlog.  
> >
> > The socket _is_ the queue, normally.
> >
> > Please explore fixing this in audit?
> > --
> > pw-bot: cr
> >  
> 
> Hi Jakub,
> 
> Thanks for reviewing this patch as well.
> 
> First, regarding the lock: kauditd does not hold the socket lock during
> slow I/O. The sleep in netlink_attachskb() uses schedule_timeout() on
> nlk->wait (a wait queue). No socket lock or mutex is held during the sleep.

So you're saying the queue _is_ actually congested?
netlink_attachskb() sleeps because there's no space left in the socket's
rcvbuf? So the skbs are moved to audit_retry_queue "temporarily" until
user space drains its socket and the kernel can send successfully?

Could you confirm this understanding is correct?
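
To make the question concrete, the behaviour described in the quoted
commit message can be modelled roughly as below. This is a simplified
user-space sketch, not kernel code: model_attach(), the enum and the
parameter names are made up for illustration, and the real
netlink_attachskb() handles more cases than this.

```c
#include <assert.h>
#include <stdbool.h>

/* Possible outcomes of one delivery attempt in this toy model. */
enum attach_result {
	ATTACH_QUEUED,	/* room in the receive buffer: skb is queued */
	ATTACH_SLEEP,	/* buffer full, sender still has wait budget left */
	ATTACH_OVERRUN,	/* buffer full and timeo == 0: overrun, reader sees ENOBUFS */
};

/*
 * Hypothetical model of the decision discussed in the thread.
 *
 * rcvbuf_full: the receiver's buffer has no room for the skb.
 * timeo:       remaining wait budget. It is 0 both for a non-blocking
 *              sender and for a blocking sender whose timeout has just
 *              expired, which is the ambiguity the patch is about.
 */
static enum attach_result model_attach(bool rcvbuf_full, long timeo)
{
	assert(timeo >= 0);

	if (!rcvbuf_full)
		return ATTACH_QUEUED;
	if (timeo == 0)
		return ATTACH_OVERRUN;
	return ATTACH_SLEEP;
}
```

In this model a blocking sender that slept until its budget ran out is
indistinguishable from a non-blocking one on the next attempt, since
both arrive with timeo == 0.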

> Second, regarding an audit-only fix: the symptom manifests as sk->sk_err =
> ENOBUFS set inside netlink_overrun() (called from netlink_attachskb when
> timeo == 0). Audit has no mechanism to prevent or clear this socket state
> from the outside. Potential workarounds all fail:
> 
> (1) Clearing sk_err after the fact is racy and affects other socket ops

Why would you clear the sk_err? It's the reader's responsibility to
clear the congestion, and the reader is AFAIU a user space process.
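
As an illustration of what "the reader's responsibility" could look
like, here is a minimal sketch of how a user space netlink reader might
classify recv() errors. The enum and reader_action_for() are
hypothetical names invented for this example; this is not how auditd
is implemented, only one plausible policy.

```c
#include <errno.h>

/* What a reader could do after recv() fails, in this sketch. */
enum reader_action {
	READER_DRAIN_AND_RESYNC,	/* messages were lost: keep reading, then resync */
	READER_RETRY,			/* transient condition: call recv() again */
	READER_FAIL,			/* anything else: treat as a real error */
};

/* Map an errno value from a failed recv() to an action (assumed policy). */
static enum reader_action reader_action_for(int err)
{
	switch (err) {
	case ENOBUFS:
		/* The kernel reported an overrun on this socket. */
		return READER_DRAIN_AND_RESYNC;
	case EINTR:
	case EAGAIN:
		return READER_RETRY;
	default:
		return READER_FAIL;
	}
}
```

The point of the sketch is only that ENOBUFS is a signal for the reader
to act on: draining the socket is what relieves the congestion.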

> (2) Avoiding timeouts entirely defeats the anti-deadlock mechanism

What's the anti-deadlock mechanism?

> (3) A new NETLINK_F_RECV_NO_ENOBUFS socket flag doesn't exist in stable
> kernels where this bug is actively impacting users

Which commit are you referring to? Isn't that flag itself ancient?
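
For context, user space has long been able to request this behaviour
with the NETLINK_NO_ENOBUFS socket option, which netlink(7) documents
as available since Linux 2.6.30. A minimal sketch follows; the helper
name netlink_disable_enobufs() is made up here, and the fallback
definition of SOL_NETLINK is an assumption for older libc headers.

```c
#include <sys/socket.h>
#include <linux/netlink.h>

#ifndef SOL_NETLINK
#define SOL_NETLINK 270	/* fallback for libc headers that lack it */
#endif

/*
 * Ask the kernel not to report ENOBUFS on this netlink socket.
 * Returns 0 on success, -1 with errno set on failure.
 * Messages that do not fit are still dropped; only the error
 * report to the reader is suppressed.
 */
static int netlink_disable_enobufs(int fd)
{
	int one = 1;

	return setsockopt(fd, SOL_NETLINK, NETLINK_NO_ENOBUFS,
			  &one, sizeof(one));
}
```

A reader would call this once, right after creating its netlink socket.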

> I've submitted v3 [1] with NETLINK_UNICAST_TIMED as an explicit opt-in
> constant. 

It's really not great to fall silent for 10+ days, then respond and
immediately post an equally pointless next version of the patch :/
