* [PATCH] net: datagram: Drain queue before reporting EOF or ENOTCONN
@ 2026-04-28 5:23 Petr Malat
2026-04-28 5:58 ` Kuniyuki Iwashima
0 siblings, 1 reply; 3+ messages in thread
From: Petr Malat @ 2026-04-28 5:23 UTC (permalink / raw)
To: netdev, linux-kernel; +Cc: davem, kuba, ebiggers, Petr Malat
If a packet is queued and RCV_SHUTDOWN flag is set after the function
__skb_wait_for_more_packets() checked the queue, the function returns
EOF, which is then propagated by __unix_dgram_recvmsg() and the user
reads EOF although there is a message or messages still pending.
The function should check if the queue is empty before returning EOF.
As the same is true for disconnect and it's also reasonable for a pending
signal, check in a common place before returning from the function.
Signed-off-by: Petr Malat <oss@malat.biz>
---
net/core/datagram.c | 38 +++++++++++++++++++++-----------------
1 file changed, 21 insertions(+), 17 deletions(-)
diff --git a/net/core/datagram.c b/net/core/datagram.c
index c285c6465923..5952950f7233 100644
--- a/net/core/datagram.c
+++ b/net/core/datagram.c
@@ -98,40 +98,44 @@ int __skb_wait_for_more_packets(struct sock *sk,
struct sk_buff_head *queue,
/* Socket errors? */
error = sock_error(sk);
if (error)
- goto out_err;
+ goto out;
if (READ_ONCE(queue->prev) != skb)
goto out;
/* Socket shut down? */
- if (sk->sk_shutdown & RCV_SHUTDOWN)
- goto out_noerr;
+ if (sk->sk_shutdown & RCV_SHUTDOWN) {
+ error = 1;
+ goto check_queue;
+ }
/* Sequenced packets can come disconnected.
* If so we report the problem
*/
- error = -ENOTCONN;
if (connection_based(sk) &&
- !(sk->sk_state == TCP_ESTABLISHED || sk->sk_state == TCP_LISTEN))
- goto out_err;
+ !(sk->sk_state == TCP_ESTABLISHED || sk->sk_state == TCP_LISTEN)) {
+ error = -ENOTCONN;
+ goto check_queue;
+ }
/* handle signals */
- if (signal_pending(current))
- goto interrupted;
+ if (signal_pending(current)) {
+ error = sock_intr_errno(*timeo_p);
+ goto check_queue;
+ }
- error = 0;
*timeo_p = schedule_timeout(*timeo_p);
out:
+ *err = error < 0 ? error : 0;
finish_wait(sk_sleep(sk), &wait);
return error;
-interrupted:
- error = sock_intr_errno(*timeo_p);
-out_err:
- *err = error;
- goto out;
-out_noerr:
- *err = 0;
- error = 1;
+check_queue:
+ /* A packet may have arrived between the initial queue check and any
+ * of the early-exit conditions above. Return 0 to let the caller
+ * drain the queue before acting on the shutdown / disconnect / signal.
+ */
+ if (READ_ONCE(queue->prev) != skb)
+ error = 0;
goto out;
}
EXPORT_SYMBOL(__skb_wait_for_more_packets);
--
2.47.3
^ permalink raw reply related [flat|nested] 3+ messages in thread
* Re: [PATCH] net: datagram: Drain queue before reporting EOF or ENOTCONN
2026-04-28 5:23 [PATCH] net: datagram: Drain queue before reporting EOF or ENOTCONN Petr Malat
@ 2026-04-28 5:58 ` Kuniyuki Iwashima
2026-04-28 6:49 ` Petr Malat
0 siblings, 1 reply; 3+ messages in thread
From: Kuniyuki Iwashima @ 2026-04-28 5:58 UTC (permalink / raw)
To: oss; +Cc: davem, ebiggers, kuba, linux-kernel, netdev
From: Petr Malat <oss@malat.biz>
Date: Mon, 27 Apr 2026 22:23:33 -0700
> If a packet is queued and RCV_SHUTDOWN flag is set after the function
> __skb_wait_for_more_packets() checked the queue, the function returns
> EOF, which is then propagated by __unix_dgram_recvmsg() and the user
> reads EOF although there is a message or messages still pending.
>
> The function should check if the queue is empty before returning EOF.
> As the same is true for disconnect and it's also reasonable for a pending
> signal, check in a common place before returning from the function.
>
> Signed-off-by: Petr Malat <oss@malat.biz>
> ---
> net/core/datagram.c | 38 +++++++++++++++++++++-----------------
> 1 file changed, 21 insertions(+), 17 deletions(-)
>
> diff --git a/net/core/datagram.c b/net/core/datagram.c
> index c285c6465923..5952950f7233 100644
> --- a/net/core/datagram.c
> +++ b/net/core/datagram.c
> @@ -98,40 +98,44 @@ int __skb_wait_for_more_packets(struct sock *sk,
> struct sk_buff_head *queue,
> /* Socket errors? */
> error = sock_error(sk);
> if (error)
> - goto out_err;
> + goto out;
>
> if (READ_ONCE(queue->prev) != skb)
> goto out;
>
> /* Socket shut down? */
> - if (sk->sk_shutdown & RCV_SHUTDOWN)
> - goto out_noerr;
> + if (sk->sk_shutdown & RCV_SHUTDOWN) {
> + error = 1;
> + goto check_queue;
We already have checked the same condition just above, and this
is a matter of timing.
Even after the duplicated check is evaluated to false, there is
a small chance that the concurrent sendmsg() enqueues a new skb.
Considering __skb_wait_for_more_packets() is called only when
the queue is empty, it's not worth another round when shutdown()ed.
> + }
>
> /* Sequenced packets can come disconnected.
> * If so we report the problem
> */
> - error = -ENOTCONN;
> if (connection_based(sk) &&
> - !(sk->sk_state == TCP_ESTABLISHED || sk->sk_state == TCP_LISTEN))
> - goto out_err;
> + !(sk->sk_state == TCP_ESTABLISHED || sk->sk_state == TCP_LISTEN)) {
> + error = -ENOTCONN;
Also, the queue is always empty if SOCK_SEQPACKET sk is at TCP_CLOSE.
> + goto check_queue;
> + }
>
> /* handle signals */
> - if (signal_pending(current))
> - goto interrupted;
> + if (signal_pending(current)) {
> + error = sock_intr_errno(*timeo_p);
> + goto check_queue;
and we don't want to delay signal if it arrived first.
> + }
>
> - error = 0;
> *timeo_p = schedule_timeout(*timeo_p);
> out:
> + *err = error < 0 ? error : 0;
> finish_wait(sk_sleep(sk), &wait);
> return error;
> -interrupted:
> - error = sock_intr_errno(*timeo_p);
> -out_err:
> - *err = error;
> - goto out;
> -out_noerr:
> - *err = 0;
> - error = 1;
> +check_queue:
> + /* A packet may have arrived between the initial queue check and any
> + * of the early-exit conditions above. Return 0 to let the caller
> + * drain the queue before acting on the shutdown / disconnect / signal.
> + */
> + if (READ_ONCE(queue->prev) != skb)
> + error = 0;
> goto out;
> }
> EXPORT_SYMBOL(__skb_wait_for_more_packets);
> --
> 2.47.3
>
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [PATCH] net: datagram: Drain queue before reporting EOF or ENOTCONN
2026-04-28 5:58 ` Kuniyuki Iwashima
@ 2026-04-28 6:49 ` Petr Malat
0 siblings, 0 replies; 3+ messages in thread
From: Petr Malat @ 2026-04-28 6:49 UTC (permalink / raw)
To: Kuniyuki Iwashima; +Cc: davem, ebiggers, kuba, linux-kernel, netdev
On Tue, Apr 28, 2026 at 05:58:41AM +0000, Kuniyuki Iwashima wrote:
> From: Petr Malat <oss@malat.biz>
> Date: Mon, 27 Apr 2026 22:23:33 -0700
> > If a packet is queued and RCV_SHUTDOWN flag is set after the function
> > __skb_wait_for_more_packets() checked the queue, the function returns
> > EOF, which is then propagated by __unix_dgram_recvmsg() and the user
> > reads EOF although there is a message or messages still pending.
> >
> > The function should check if the queue is empty before returning EOF.
> > As the same is true for disconnect and it's also reasonable for a pending
> > signal, check in a common place before returning from the function.
> >
> > Signed-off-by: Petr Malat <oss@malat.biz>
> > ---
> > net/core/datagram.c | 38 +++++++++++++++++++++-----------------
> > 1 file changed, 21 insertions(+), 17 deletions(-)
> >
> > diff --git a/net/core/datagram.c b/net/core/datagram.c
> > index c285c6465923..5952950f7233 100644
> > --- a/net/core/datagram.c
> > +++ b/net/core/datagram.c
> > @@ -98,40 +98,44 @@ int __skb_wait_for_more_packets(struct sock *sk,
> > struct sk_buff_head *queue,
> > /* Socket errors? */
> > error = sock_error(sk);
> > if (error)
> > - goto out_err;
> > + goto out;
> >
> > if (READ_ONCE(queue->prev) != skb)
> > goto out;
> >
> > /* Socket shut down? */
> > - if (sk->sk_shutdown & RCV_SHUTDOWN)
> > - goto out_noerr;
> > + if (sk->sk_shutdown & RCV_SHUTDOWN) {
> > + error = 1;
> > + goto check_queue;
>
> We already have checked the same condition just above, and this
> is a matter of timing.
Yes, but both a message followed by RCV_SHUTDOWN can arrive after the
condition has been checked and in which case returning the message
must be prefered.
> Even after the duplicated check is evaluated to false, there is
> a small chance that the concurrent sendmsg() enqueues a new skb.
No, there isn't, because the sendmsg checks the state and the sender
gets error in that case. This is done udner a lock, so the situation
you described is not possible, see unix_dgram_sendmsg():
unix_state_lock(other);
....
if (other->sk_shutdown & RCV_SHUTDOWN) {
err = -EPIPE;
goto out_unlock;
}
> Considering __skb_wait_for_more_packets() is called only when
> the queue is empty, it's not worth another round when shutdown()ed.
This breaks the interface as POSIX is quite clear here:
If no messages are available to be received and the peer has
performed an orderly shutdown, recv() shall return 0.
In this case messages were available, but 0 was returned. This
behavior breaks even the simple example in unix(7), if you run it long
enough (https://man7.org/linux/man-pages/man7/unix.7.html).
> > + }
> >
> > /* Sequenced packets can come disconnected.
> > * If so we report the problem
> > */
> > - error = -ENOTCONN;
> > if (connection_based(sk) &&
> > - !(sk->sk_state == TCP_ESTABLISHED || sk->sk_state == TCP_LISTEN))
> > - goto out_err;
> > + !(sk->sk_state == TCP_ESTABLISHED || sk->sk_state == TCP_LISTEN)) {
> > + error = -ENOTCONN;
>
> Also, the queue is always empty if SOCK_SEQPACKET sk is at TCP_CLOSE.
In my opinion this is not true if UDS reconnects, see the code
at unix_dgram_connect(), section "If it was connected, reconnect.",
where it sets TCP_CLOSE without checking the queue is empty. I haven't
checked other cases as it's enough to have one to justify the change.
Also, it makes the whole function simpler and one doesn't have to
review all callers or be afraid a future change could break it.
>
>
> > + goto check_queue;
> > + }
> >
> > /* handle signals */
> > - if (signal_pending(current))
> > - goto interrupted;
> > + if (signal_pending(current)) {
> > + error = sock_intr_errno(*timeo_p);
> > + goto check_queue;
>
> and we don't want to delay signal if it arrived first.
- We already do it anyway, if a signal followed by a message is
delivered before the initial queue check, the message handling gets
prefered.
- The signal is handled the same way, the only difference is the
user gets the message instead EINTR afterwards if he had a
nonrestartable handler set up. If the signal leads to the process,
termination, the syscal never returns. So, there is no delay.
Regards,
Petr
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2026-04-28 6:49 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-04-28 5:23 [PATCH] net: datagram: Drain queue before reporting EOF or ENOTCONN Petr Malat
2026-04-28 5:58 ` Kuniyuki Iwashima
2026-04-28 6:49 ` Petr Malat
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox