From mboxrd@z Thu Jan 1 00:00:00 1970 From: Eric Dumazet Subject: Re: [PATCH net] af_key: free SKBs under RCU protection Date: Thu, 20 Sep 2018 15:10:59 -0700 Message-ID: <82b2ac27-0f61-3be6-b09e-ec31cf95881f@gmail.com> References: <1537402712-12875-1-git-send-email-stranche@codeaurora.org> <6d194ac2-15e8-76d7-31d0-b4c4eed68d5a@gmail.com> <357e28c3fa0c7bacaffde4e960f58a87@codeaurora.org> Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8bit Cc: netdev@vger.kernel.org, steffen.klassert@secunet.com To: stranche@codeaurora.org, Eric Dumazet Return-path: Received: from mail-lf1-f65.google.com ([209.85.167.65]:35265 "EHLO mail-lf1-f65.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727771AbeIUD4p (ORCPT ); Thu, 20 Sep 2018 23:56:45 -0400 Received: by mail-lf1-f65.google.com with SMTP id x20-v6so9761187lfg.2 for ; Thu, 20 Sep 2018 15:11:05 -0700 (PDT) In-Reply-To: <357e28c3fa0c7bacaffde4e960f58a87@codeaurora.org> Content-Language: en-US Sender: netdev-owner@vger.kernel.org List-ID: On 09/20/2018 12:25 PM, stranche@codeaurora.org wrote: >> >> I do not believe the changelog or the patch makes sense. >> >> Having skb still referencing a socket prevents this socket being released. >> >> If you think about it, what would prevent the freeing happening >> _before_ the rcu_read_lock() in pfkey_broadcast() ? >> >> Maybe the correct fix is that pfkey_broadcast_one() should ensure the >> socket is still valid. >> >> I would suggest something like : >> >> diff --git a/net/key/af_key.c b/net/key/af_key.c >> index >> 9d61266526e767770d9a1ce184ac8cdd59de309a..5ce309d020dda5e46e4426c4a639bfb551e2260d >> 100644 >> --- a/net/key/af_key.c >> +++ b/net/key/af_key.c >> @@ -201,7 +201,9 @@ static int pfkey_broadcast_one(struct sk_buff >> *skb, struct sk_buff **skb2, >>  { >>         int err = -ENOBUFS; >> >> -       sock_hold(sk); >> +       if (!refcount_inc_not_zero(&sk->sk_refcnt)) >> +               return -ENOENT; >> + >>         if (*skb2 == NULL) { >>                 if (refcount_read(&skb->users) != 1) { >>                         *skb2 = skb_clone(skb, allocation); > > Hi Eric, > > I'm not sure that the socket getting freed before the rcu_read_lock() would > be an issue, since then it would no longer be in the net_pkey->table that > we loop through (since we call pfkey_remove() from pfkey_relase()). Because of > that, all the sockets processed in pfkey_broadcast_one() have valid refcounts, > so checking for zero there doesn't prevent the crash that I'm seeing. > > However, after going over the call flow again, I see that the actual problem > occurs because of pfkey_broadcast_one(). Specifically, because of this check: > >     if (*skb2 == NULL) { >         if (refcount_read(&skb->users) != 1) { >             *skb2 = skb_clone(skb, allocation); >         } else { >             *skb2 = skb; >             refcount_inc(&skb->users); >         } >     } > > Since we always pass a freshly cloned SKB to this function, skb->users is > always 1, and skb2 just becomes skb. We then set skb2 (and thus skb) to > belong to the socket. > > If the socket we queue skb2 to frees this SKB (thereby decrementing its > refcount to 1) and the socket is freed before pfkey_broadcast() can > execute the kfree_skb(skb) on line 284, we will then attempt to run > sock_rfree() on an SKB with a dangling reference to this socket. > > Perhaps a cleaner solution here is to always clone the SKB in > pfkey_broadcast_one(). That will ensure that the two kfree_skb() calls > in pfkey_broadcast() will never be passed an SKB with sock_rfree() as > its destructor, and we can avoid this race condition. As long as one skb has sock_rfree has its destructor, the socket attached to this skb can not be released. There is no race here. Note that skb_clone() does not propagate the destructor. The issue here is that in the rcu lookup, we can find a socket that has been dismantled, with a 0 refcount. We must not use sock_hold() in this case, since we are not sure the socket refcount is not already 0. pfkey_broadcast() and pfkey_broadcast_one() violate basic RCU rules. When in a RCU lookup, one want to increment an object refcount, it needs to be extra-careful, as I did in my proposal. Note that the race could be automatically detected with CONFIG_REFCOUNT_FULL=y