From mboxrd@z Thu Jan 1 00:00:00 1970 From: Soheil Hassas Yeganeh Subject: [PATCH net-next] sock: do not set sk_err in sock_dequeue_err_skb Date: Thu, 3 Nov 2016 18:24:27 -0400 Message-ID: <1478211867-24569-1-git-send-email-soheil.kdev@gmail.com> Cc: edumazet@google.com, willemb@google.com, ncardwell@google.com, Soheil Hassas Yeganeh To: davem@davemloft.net, netdev@vger.kernel.org Return-path: Received: from mail-qt0-f196.google.com ([209.85.216.196]:36726 "EHLO mail-qt0-f196.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752404AbcKCWYl (ORCPT ); Thu, 3 Nov 2016 18:24:41 -0400 Received: by mail-qt0-f196.google.com with SMTP id n34so2210952qtb.3 for ; Thu, 03 Nov 2016 15:24:41 -0700 (PDT) Sender: netdev-owner@vger.kernel.org List-ID: From: Soheil Hassas Yeganeh Do not set sk_err when dequeuing errors from the error queue. Doing so results in: a) Bugs: By overwriting existing sk_err values, it possibly hides legitimate errors. It is also incorrect when local errors are queued with ip_local_error. That happens in the context of a system call, which already returns the error code. b) Inconsistent behavior: When there are pending errors on the error queue, sk_err is sometimes 0 (e.g., for the first timestamp on the error queue) and sometimes set to an error code (after dequeuing the first timestamp). c) Suboptimality: Setting sk_err to ENOMSG on simple TX timestamps can abort parallel reads and writes. Removing this line doesn't break userspace. This is because userspace code cannot rely on sk_err for detecting whether there is something on the error queue. Except for ICMP messages received for UDP and RAW, sk_err is not set at enqueue time, and as a result sk_err can be 0 while there are plenty of errors on the error queue. For ICMP packets in UDP and RAW, sk_err is set when they are enqueued on the error queue, but that does not result in aborting reads and writes. For such cases, sk_err is only readable via getsockopt(SO_ERROR) which will reset the value of sk_err on its own. More importantly, prior to this patch, recvmsg(MSG_ERRQUEUE) has a race on setting sk_err (i.e., sk_err is set by sock_dequeue_err_skb without atomic ops or locks) which can store 0 in sk_err even when we have ICMP messages pending. Removing this line from sock_dequeue_err_skb eliminates that race. Signed-off-by: Soheil Hassas Yeganeh Signed-off-by: Eric Dumazet Signed-off-by: Willem de Bruijn Signed-off-by: Neal Cardwell --- net/core/skbuff.c | 1 - 1 file changed, 1 deletion(-) diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 1e3e008..0b2a6e9 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -3725,7 +3725,6 @@ struct sk_buff *sock_dequeue_err_skb(struct sock *sk) err = SKB_EXT_ERR(skb_next)->ee.ee_errno; spin_unlock_irqrestore(&q->lock, flags); - sk->sk_err = err; if (err) sk->sk_error_report(sk); -- 2.8.0.rc3.226.g39d4020