Re: [PATCH net] af_key: free SKBs under RCU protection

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Eric Dumazet <eric.dumazet@gmail.com>
To: stranche@codeaurora.org, Eric Dumazet <eric.dumazet@gmail.com>
Cc: netdev@vger.kernel.org, steffen.klassert@secunet.com
Subject: Re: [PATCH net] af_key: free SKBs under RCU protection
Date: Thu, 20 Sep 2018 15:10:59 -0700	[thread overview]
Message-ID: <82b2ac27-0f61-3be6-b09e-ec31cf95881f@gmail.com> (raw)
In-Reply-To: <357e28c3fa0c7bacaffde4e960f58a87@codeaurora.org>



On 09/20/2018 12:25 PM, stranche@codeaurora.org wrote:
>>
>> I do not believe the changelog or the patch makes sense.
>>
>> Having skb still referencing a socket prevents this socket being released.
>>
>> If you think about it, what would prevent the freeing happening
>> _before_ the rcu_read_lock() in pfkey_broadcast() ?
>>
>> Maybe the correct fix is that pfkey_broadcast_one() should ensure the
>> socket is still valid.
>>
>> I would suggest something like :
>>
>> diff --git a/net/key/af_key.c b/net/key/af_key.c
>> index
>> 9d61266526e767770d9a1ce184ac8cdd59de309a..5ce309d020dda5e46e4426c4a639bfb551e2260d
>> 100644
>> --- a/net/key/af_key.c
>> +++ b/net/key/af_key.c
>> @@ -201,7 +201,9 @@ static int pfkey_broadcast_one(struct sk_buff
>> *skb, struct sk_buff **skb2,
>>  {
>>         int err = -ENOBUFS;
>>
>> -       sock_hold(sk);
>> +       if (!refcount_inc_not_zero(&sk->sk_refcnt))
>> +               return -ENOENT;
>> +
>>         if (*skb2 == NULL) {
>>                 if (refcount_read(&skb->users) != 1) {
>>                         *skb2 = skb_clone(skb, allocation);
> 
> Hi Eric,
> 
> I'm not sure that the socket getting freed before the rcu_read_lock() would
> be an issue, since then it would no longer be in the net_pkey->table that
> we loop through (since we call pfkey_remove() from pfkey_relase()). Because of
> that, all the sockets processed in pfkey_broadcast_one() have valid refcounts,
> so checking for zero there doesn't prevent the crash that I'm seeing.
> 
> However, after going over the call flow again, I see that the actual problem
> occurs because of pfkey_broadcast_one(). Specifically, because of this check:
> 
>     if (*skb2 == NULL) {
>         if (refcount_read(&skb->users) != 1) {
>             *skb2 = skb_clone(skb, allocation);
>         } else {
>             *skb2 = skb;
>             refcount_inc(&skb->users);
>         }
>     }
> 
> Since we always pass a freshly cloned SKB to this function, skb->users is
> always 1, and skb2 just becomes skb. We then set skb2 (and thus skb) to
> belong to the socket.
> 
> If the socket we queue skb2 to frees this SKB (thereby decrementing its
> refcount to 1) and the socket is freed before pfkey_broadcast() can
> execute the kfree_skb(skb) on line 284, we will then attempt to run
> sock_rfree() on an SKB with a dangling reference to this socket.
> 
> Perhaps a cleaner solution here is to always clone the SKB in
> pfkey_broadcast_one(). That will ensure that the two kfree_skb() calls
> in pfkey_broadcast() will never be passed an SKB with sock_rfree() as
> its destructor, and we can avoid this race condition.

As long as one skb has sock_rfree has its destructor, the socket attached to
this skb can not be released. There is no race here.

Note that skb_clone() does not propagate the destructor.

The issue here is that in the rcu lookup, we can find a socket that has been
dismantled, with a 0 refcount.

We must not use sock_hold() in this case, since we are not sure the socket refcount is not already 0.

pfkey_broadcast() and pfkey_broadcast_one() violate basic RCU rules.

When in a RCU lookup, one want to increment an object refcount, it needs
to be extra-careful, as I did in my proposal.

Note that the race could be automatically detected with CONFIG_REFCOUNT_FULL=y

next prev parent reply	other threads:[~2018-09-21  3:56 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-09-20  0:18 [PATCH net] af_key: free SKBs under RCU protection Sean Tranchetti
2018-09-20 13:29 ` Eric Dumazet
2018-09-20 19:25   ` stranche
2018-09-20 22:10     ` Eric Dumazet [this message]
2018-09-20 22:29       ` Eric Dumazet
2018-09-21 17:09         ` stranche
2018-09-21 17:40           ` Eric Dumazet
2018-09-21 18:44             ` stranche
2018-09-23 17:15     ` Eric Dumazet
2018-09-24 18:46       ` stranche

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=82b2ac27-0f61-3be6-b09e-ec31cf95881f@gmail.com \
    --to=eric.dumazet@gmail.com \
    --cc=netdev@vger.kernel.org \
    --cc=steffen.klassert@secunet.com \
    --cc=stranche@codeaurora.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).