From mboxrd@z Thu Jan  1 00:00:00 1970
From: Steve Grubb <sgrubb@redhat.com>
To: Ricardo Robaina <rrobaina@redhat.com>, Jakub Kicinski <kuba@kernel.org>
Cc: audit@vger.kernel.org, linux-kernel@vger.kernel.org,
 netdev@vger.kernel.org, paul@paul-moore.com, eparis@redhat.com,
 edumazet@google.com, pabeni@redhat.com, horms@kernel.org
Subject: Re: [PATCH v2] netlink,
 audit: prevent false ENOBUFS on timeout expiry
Date: Thu, 28 May 2026 18:40:44 -0400
Message-ID: <2143396.Jadu78ljVU@x2>
Organization: Red Hat
In-Reply-To: <20260527152936.001d5d28@kernel.org>
References:
 <20260513172443.1128496-1-rrobaina@redhat.com>
 <CAABTaaC98dqM8U-7xkdW=b=50UKu0SQyBO629LDdphQ9DC=P=g@mail.gmail.com>
 <20260527152936.001d5d28@kernel.org>
Precedence: bulk
X-Mailing-List: netdev@vger.kernel.org
List-Id: <netdev.vger.kernel.org>
List-Subscribe: <mailto:netdev+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:netdev+unsubscribe@vger.kernel.org>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"
X-Scanned-By: MIMEDefang 3.0 on 10.30.177.17

On Wednesday, May 27, 2026 6:29:36 PM Eastern Daylight Time Jakub Kicinski
wrote:
> On Wed, 27 May 2026 16:29:37 -0300 Ricardo Robaina wrote:
> > On Mon, May 18, 2026 at 9:35 PM Jakub Kicinski <kuba@kernel.org> wrote:
> > > On Wed, 13 May 2026 14:24:43 -0300 Ricardo Robaina wrote:
> > > > When auditd is bottlenecked (e.g., by slow disk I/O), kauditd blocks
> > > > on the netlink socket.
> > >
> > > Holding socket lock during slow IO sounds very wrong. One could say -
> > > that's abuse of the socket lock?
> > >=20
> > > > If the wait timeout fully expires (timeo == 0),
> > > > netlink mistakenly interprets the zeroed timeout as a non-blocking
> > > > request. It then triggers netlink_overrun that drops the event,
> > > > completely bypassing the audit subsystem's internal retry queue, and
> > > > falsely returns ENOBUFS to user-space, resulting in the following
> > > > error:
> > > >
> > > >  auditd[]: Error receiving audit netlink packet (No buffer space
> > > >  available)
> > > >
> > > > Fix this by detecting when a blocking sender's timeout has expired
> > > > (timeo == 0 && !nonblock) in netlink_unicast(). In this case, instead
> > > > of retrying with timeo=0 (which would incorrectly trigger
> > > > netlink_overrun on the next iteration), safely free the skb and
> > > > return -EAGAIN, allowing the audit subsystem to gracefully enqueue
> > > > the pending event into its internal backlog.
> > >
> > > The socket _is_ the queue, normally.
> > >=20
> > > Please explore fixing this in audit?
> > > --
> > > pw-bot: cr
> >
> > Hi Jakub,
> >
> > Thanks for reviewing this patch as well.
> >
> > First, regarding the lock: kauditd does not hold the socket lock during
> > slow I/O. The sleep in netlink_attachskb() uses schedule_timeout() on
> > nlk->wait (a wait queue). No socket lock or mutex is held during the
> > sleep.
>
> So you're saying the queue _is_ actually congested?

Yes. The socket buffer is genuinely full because auditd can't drain it fast
enough.

> netlink_attachskb() sleeps because there's no space left in the socket's
> rcvbuf? So the skbs are moved to audit_retry_queue "temporarily" until
> user space drains its socket and kernel can succeed sending?
>
> Could you confirm this understanding is correct?

Yes. kauditd sleeps in netlink_attachskb(), the HZ/10 timeout expires, and
the skb is moved to audit_retry_queue until auditd drains enough for
delivery to succeed. The record is not lost.
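
To make that flow concrete, here is a small user-space model in C. It is
not kernel code: it assumes a fixed-size receive buffer, counts the timeout
in abstract ticks rather than jiffies, and every name in it (struct model,
model_send, model_flush_retry, the slot counts) is made up for
illustration. It only shows the property described above, that a record
which times out is parked on a retry queue and delivered later rather than
dropped.

```c
#include <assert.h>
#include <stddef.h>

#define RCVBUF_SLOTS 4   /* hypothetical capacity of the reader's socket buffer */
#define RETRY_SLOTS  16  /* hypothetical capacity of the retry queue */

enum send_result { SENT, QUEUED_FOR_RETRY };

struct model {
	int rcvbuf_used;         /* records waiting for the reader */
	int retry[RETRY_SLOTS];  /* records parked after a timeout */
	size_t retry_len;
	int delivered;           /* records that reached the socket buffer */
};

/* The reader drains up to n records from the socket buffer. */
static void model_reader_drain(struct model *m, int n)
{
	m->rcvbuf_used = (n >= m->rcvbuf_used) ? 0 : m->rcvbuf_used - n;
}

/*
 * Try to deliver one record, waiting at most timeout_ticks ticks for space.
 * drains_per_tick models how fast the reader empties the buffer while the
 * sender sleeps. On timeout the record is parked, not dropped.
 */
static enum send_result model_send(struct model *m, int record,
				   int timeout_ticks, int drains_per_tick)
{
	for (int tick = 0; ; tick++) {
		if (m->rcvbuf_used < RCVBUF_SLOTS) {
			m->rcvbuf_used++;
			m->delivered++;
			return SENT;
		}
		if (tick >= timeout_ticks)
			break;	/* timeout expired, stop waiting */
		model_reader_drain(m, drains_per_tick);	/* one tick of sleep */
	}
	assert(m->retry_len < RETRY_SLOTS);
	m->retry[m->retry_len++] = record;
	return QUEUED_FOR_RETRY;
}

/* Re-send parked records in order; stop as soon as the buffer is full again. */
static size_t model_flush_retry(struct model *m)
{
	size_t sent = 0;

	while (sent < m->retry_len && m->rcvbuf_used < RCVBUF_SLOTS) {
		m->rcvbuf_used++;
		m->delivered++;
		sent++;
	}
	/* Shift the records that are still pending to the front. */
	for (size_t i = sent; i < m->retry_len; i++)
		m->retry[i - sent] = m->retry[i];
	m->retry_len -= sent;
	return sent;
}
```

With a full buffer and a stalled reader, model_send() returns
QUEUED_FOR_RETRY; once the reader drains, model_flush_retry() delivers the
parked record.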

> > Second, regarding an audit-only fix: the symptom manifests as sk->sk_err
> > = ENOBUFS set inside netlink_overrun() (called from netlink_attachskb
> > when timeo == 0). Audit has no mechanism to prevent or clear this socket
> > state from the outside. Potential workarounds all fail:
> >
> > (1) Clearing sk_err after the fact is racy and affects other socket ops
>
> Why would you clear the sk_err, it's the reader's responsibility to
> clear the congestion and the reader is AFAIU a user space process.

The reader is fighting to clear the congestion, but with 1 reader thread
against 32 cores generating events, it can fall behind. It doesn't happen
very often, but it does once in a great while. The reader doesn't want an
ENOBUFS and logs it as an exceptional condition when it happens. It wants
to rely on the kernel's backlog mechanism.

> > (2) Avoiding timeouts entirely defeats the anti-deadlock mechanism
>
> What's the anti-deadlock mechanism?

sk_sndtimeo = HZ/10, set in audit_net_init(). Without it, kauditd would
sleep indefinitely in netlink_attachskb() if auditd is stalled or dead. The
timeout lets kauditd escape and route the skb to its retry queue.
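
For a sense of scale, HZ/10 jiffies works out to about 100 ms whatever the
tick rate. A quick user-space check of that arithmetic follows;
jiffies_to_ms_model() and audit_sndtimeo_ms() are hypothetical helpers
written for this mail (not the kernel's jiffies_to_msecs()), and the HZ
values one would feed it are just common config choices.

```c
/* Convert a jiffies count to milliseconds for a given tick rate (HZ). */
static unsigned int jiffies_to_ms_model(unsigned int jiffies, unsigned int hz)
{
	return jiffies * 1000u / hz;
}

/* The audit send timeout, HZ/10 jiffies, expressed in milliseconds. */
static unsigned int audit_sndtimeo_ms(unsigned int hz)
{
	return jiffies_to_ms_model(hz / 10, hz);
}
```

So with auditd stalled, kauditd gives up on each blocked send after roughly
a tenth of a second.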

> > (3) A new NETLINK_F_RECV_NO_ENOBUFS socket flag doesn't exist in stable
> > kernels where this bug is actively impacting users
>
> Which commit are you referring to? Isn't that flag itself ancient?

You're right, it is. I see how this flag would fix the pathological
behavior that was reported. But having looked at this suggestion, there
seems to be one wrinkle. User space should not need to know that the audit
code in the kernel has this retry mechanism. It seems like the audit
subsystem should set the flag on auditd's socket at registration time in
auditd_set(). The kernel is the right place for this because it's the
kernel that manages the retry/hold queues and sets the sk_sndtimeo that
triggers the overrun path - auditd has no knowledge of these internals.

NETLINK_F_RECV_NO_ENOBUFS and nlk_sk are private to
net/netlink/af_netlink.h, so audit.c can't set the flag directly. Should we
propose a small exported helper, netlink_sock_set_no_enobufs(), that
mirrors the existing setsockopt(NETLINK_NO_ENOBUFS) handler? Then the rest
of the fix itself lives entirely in kernel/audit.c as you suggested.

Something like:
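/* Let an in-kernel netlink user opt a socket out of ENOBUFS reporting. */
void netlink_sock_set_no_enobufs(struct sock *sk)
{
	struct netlink_sock *nlk = nlk_sk(sk);

	nlk->flags |= NETLINK_F_RECV_NO_ENOBUFS;
	/* Drop any stale congestion state and wake waiting senders. */
	clear_bit(NETLINK_S_CONGESTED, &nlk->state);
	wake_up_interruptible(&nlk->wait);
}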

and then auditd_set() would call this as it sets up the connection. Is this
the way you wanted to handle this?
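
To spell out what the helper is meant to change, here is a user-space model
of the overrun decision as I understand it. This is a sketch, not the
af_netlink.c code: struct model_sock, MODEL_F_RECV_NO_ENOBUFS, and both
functions are hypothetical stand-ins, and the real path also does error
reporting and uses atomic bit operations.

```c
#include <errno.h>
#include <stdbool.h>

#define MODEL_F_RECV_NO_ENOBUFS 0x1u  /* stands in for NETLINK_F_RECV_NO_ENOBUFS */

struct model_sock {
	unsigned int flags;
	bool congested;      /* stands in for the NETLINK_S_CONGESTED state bit */
	int sk_err;          /* error the reader would see on its next receive */
	unsigned int drops;  /* overruns counted regardless of the flag */
};

/* Model of the proposed helper: opt the socket out of ENOBUFS reporting. */
static void model_set_no_enobufs(struct model_sock *sk)
{
	sk->flags |= MODEL_F_RECV_NO_ENOBUFS;
	sk->congested = false;
}

/* Model of the overrun path: flag the error unless the socket opted out. */
static void model_overrun(struct model_sock *sk)
{
	if (!(sk->flags & MODEL_F_RECV_NO_ENOBUFS) && !sk->congested) {
		sk->congested = true;
		sk->sk_err = ENOBUFS;
	}
	sk->drops++;
}
```

In this model, a socket that went through model_set_no_enobufs() still
counts the overrun but never has ENOBUFS set on it, which is the behavior
auditd wants.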

-Steve