* [PATCH v2] ipv4: dst_entry leak in ip_append_data()
@ 2014-10-14 4:57 Vasily Averin
2014-10-14 20:12 ` David Miller
2014-10-15 4:46 ` Eric Dumazet
0 siblings, 2 replies; 7+ messages in thread
From: Vasily Averin @ 2014-10-14 4:57 UTC (permalink / raw)
To: netdev, David S. Miller
Cc: Alexey Kuznetsov, James Morris, Hideaki YOSHIFUJI,
Patrick McHardy, Eric Dumazet
v2: adjust the indentation of the arguments of the __ip_append_data() call
Fixes: 2e77d89b2fa8 ("net: avoid a pair of dst_hold()/dst_release() in ip_append_data()")
If sk_write_queue is empty, ip_append_data() executes ip_setup_cork(),
which "steals" the dst entry from rt and hands it to the cork. It then calls
__ip_append_data(), which creates an skb and adds it to sk_write_queue.
If the skb was added successfully, the following ip_push_pending_frames() call
reassigns the dst entry from the cork to the skb, and kfree_skb() frees the dst_entry.
However, nobody frees the stolen dst_entry if no skb was added to sk_write_queue.
Signed-off-by: Vasily Averin <vvs@parallels.com>
---
net/ipv4/ip_output.c | 29 +++++++++++++++++------------
1 file changed, 17 insertions(+), 12 deletions(-)
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index e35b712..3ba2291 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -1120,6 +1120,15 @@ static int ip_setup_cork(struct sock *sk, struct inet_cork *cork,
 	return 0;
 }
 
+static void ip_cork_release(struct inet_cork *cork)
+{
+	cork->flags &= ~IPCORK_OPT;
+	kfree(cork->opt);
+	cork->opt = NULL;
+	dst_release(cork->dst);
+	cork->dst = NULL;
+}
+
 /*
  *	ip_append_data() and ip_append_page() can make one large IP datagram
  *	from many pieces of data. Each pieces will be holded on the socket
@@ -1152,9 +1161,14 @@ int ip_append_data(struct sock *sk, struct flowi4 *fl4,
 		transhdrlen = 0;
 	}
 
-	return __ip_append_data(sk, fl4, &sk->sk_write_queue, &inet->cork.base,
-				sk_page_frag(sk), getfrag,
-				from, length, transhdrlen, flags);
+	err = __ip_append_data(sk, fl4, &sk->sk_write_queue, &inet->cork.base,
+			       sk_page_frag(sk), getfrag,
+			       from, length, transhdrlen, flags);
+
+	if (skb_queue_empty(&sk->sk_write_queue))
+		ip_cork_release(&inet->cork.base);
+
+	return err;
 }
 
 ssize_t ip_append_page(struct sock *sk, struct flowi4 *fl4, struct page *page,
@@ -1304,15 +1318,6 @@ error:
 	return err;
 }
 
-static void ip_cork_release(struct inet_cork *cork)
-{
-	cork->flags &= ~IPCORK_OPT;
-	kfree(cork->opt);
-	cork->opt = NULL;
-	dst_release(cork->dst);
-	cork->dst = NULL;
-}
-
 /*
  *	Combined all pending IP fragments on the socket as one IP datagram
  *	and push them out.
--
1.9.1
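
The leak described in the changelog can be modelled in userspace. What follows is only a toy sketch, not kernel code: toy_dst, toy_cork, toy_sock, toy_append() and toy_reply() are hypothetical stand-ins, the refcount is a plain int, and the caller is assumed to push frames only when the queue is non-empty and then drop its own route reference. The "fixed" flag models the skb_queue_empty() check added by this patch.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for illustration only; these are NOT the kernel structures. */
struct toy_dst  { int refcnt; };
struct toy_cork { struct toy_dst *dst; };
struct toy_sock { struct toy_cork cork; int queue_len; };

static void toy_dst_release(struct toy_dst *dst)
{
	if (dst)
		dst->refcnt--;
}

static void toy_cork_release(struct toy_cork *cork)
{
	toy_dst_release(cork->dst);
	cork->dst = NULL;
}

/*
 * Models ip_append_data(): on an empty queue the cork steals the caller's
 * reference. 'fail' forces the append to add nothing; 'fixed' enables the
 * release-if-queue-still-empty check proposed in the patch.
 */
static int toy_append(struct toy_sock *sk, struct toy_dst **rtp,
		      int fail, int fixed)
{
	int err = 0;

	if (sk->queue_len == 0) {
		sk->cork.dst = *rtp;
		*rtp = NULL;	/* the caller's later put becomes a no-op */
	}
	if (fail)
		err = -1;
	else
		sk->queue_len++;

	if (fixed && sk->queue_len == 0)
		toy_cork_release(&sk->cork);
	return err;
}

/* Models a caller; returns the reference count left over at the end. */
static int toy_reply(int fail, int fixed)
{
	struct toy_dst dst = { 1 };	/* reference taken by the route lookup */
	struct toy_dst *rt = &dst;
	struct toy_sock sk = { { NULL }, 0 };

	toy_append(&sk, &rt, fail, fixed);
	if (sk.queue_len > 0) {
		/* push: the skb takes over the cork's reference and drops it */
		toy_cork_release(&sk.cork);
		sk.queue_len = 0;
	}
	toy_dst_release(rt);	/* no-op when the reference was stolen */
	return dst.refcnt;
}

/* Usage: a leftover count of 1 is the leaked dst reference. */
static void toy_demo(void)
{
	assert(toy_reply(0, 0) == 0);	/* append succeeds: balanced */
	assert(toy_reply(1, 0) == 1);	/* append fails, no fix: leak */
	assert(toy_reply(1, 1) == 0);	/* append fails, with fix: balanced */
}
```

In this model the only unbalanced path is the one where the append fails after the reference has already moved into the cork, which is the case the changelog describes.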
* Re: [PATCH v2] ipv4: dst_entry leak in ip_append_data()
2014-10-14 4:57 [PATCH v2] ipv4: dst_entry leak in ip_append_data() Vasily Averin
@ 2014-10-14 20:12 ` David Miller
2014-10-15 7:48 ` Vasily Averin
2014-10-15 4:46 ` Eric Dumazet
1 sibling, 1 reply; 7+ messages in thread
From: David Miller @ 2014-10-14 20:12 UTC (permalink / raw)
To: vvs; +Cc: netdev, kuznet, jmorris, yoshfuji, kaber, eric.dumazet
From: Vasily Averin <vvs@parallels.com>
Date: Tue, 14 Oct 2014 08:57:14 +0400
> v2: adjust the indentation of the arguments __ip_append_data() call
>
> Fixes: 2e77d89b2fa8 ("net: avoid a pair of dst_hold()/dst_release() in ip_append_data()")
>
> If sk_write_queue is empty ip_append_data() executes ip_setup_cork()
> that "steals" dst entry from rt to cork. Later it calls __ip_append_data()
> that creates skb and adds it to sk_write_queue.
>
> If skb was added successfully following ip_push_pending_frames() call
> reassign dst entries from cork to skb, and kfree_skb frees dst_entry.
>
> However nobody frees stolen dst_entry if skb was not added into sk_write_queue.
>
> Signed-off-by: Vasily Averin <vvs@parallels.com>
Why doesn't ip_make_skb() need the same fix? It seems to do the same
thing.
* Re: [PATCH v2] ipv4: dst_entry leak in ip_append_data()
2014-10-14 4:57 [PATCH v2] ipv4: dst_entry leak in ip_append_data() Vasily Averin
2014-10-14 20:12 ` David Miller
@ 2014-10-15 4:46 ` Eric Dumazet
2014-10-15 6:56 ` Vasily Averin
1 sibling, 1 reply; 7+ messages in thread
From: Eric Dumazet @ 2014-10-15 4:46 UTC (permalink / raw)
To: Vasily Averin
Cc: netdev, David S. Miller, Alexey Kuznetsov, James Morris,
Hideaki YOSHIFUJI, Patrick McHardy
On Tue, 2014-10-14 at 08:57 +0400, Vasily Averin wrote:
> v2: adjust the indentation of the arguments __ip_append_data() call
>
> Fixes: 2e77d89b2fa8 ("net: avoid a pair of dst_hold()/dst_release() in ip_append_data()")
>
> If sk_write_queue is empty ip_append_data() executes ip_setup_cork()
> that "steals" dst entry from rt to cork. Later it calls __ip_append_data()
> that creates skb and adds it to sk_write_queue.
>
> If skb was added successfully following ip_push_pending_frames() call
> reassign dst entries from cork to skb, and kfree_skb frees dst_entry.
>
> However nobody frees stolen dst_entry if skb was not added into sk_write_queue.
I thought this was done by ip_flush_pending_frames() ?
Can you describe the issue more precisely ?
Thanks !
* Re: [PATCH v2] ipv4: dst_entry leak in ip_append_data()
2014-10-15 4:46 ` Eric Dumazet
@ 2014-10-15 6:56 ` Vasily Averin
2014-10-15 9:30 ` Eric Dumazet
0 siblings, 1 reply; 7+ messages in thread
From: Vasily Averin @ 2014-10-15 6:56 UTC (permalink / raw)
To: Eric Dumazet
Cc: netdev, David S. Miller, Alexey Kuznetsov, James Morris,
Hideaki YOSHIFUJI, Patrick McHardy
On 15.10.2014 08:46, Eric Dumazet wrote:
> On Tue, 2014-10-14 at 08:57 +0400, Vasily Averin wrote:
>> v2: adjust the indentation of the arguments __ip_append_data() call
>>
>> Fixes: 2e77d89b2fa8 ("net: avoid a pair of dst_hold()/dst_release() in ip_append_data()")
>>
>> If sk_write_queue is empty ip_append_data() executes ip_setup_cork()
>> that "steals" dst entry from rt to cork. Later it calls __ip_append_data()
>> that creates skb and adds it to sk_write_queue.
>>
>> If skb was added successfully following ip_push_pending_frames() call
>> reassign dst entries from cork to skb, and kfree_skb frees dst_entry.
>>
>> However nobody frees stolen dst_entry if skb was not added into sk_write_queue.
>
> I thought this was done by ip_flush_pending_frames() ?
Take a look at ip_send_unicast_reply():
ip_flush_pending_frames() is not called if the skb was not added to sk_write_queue.
And ip_rt_put() has no effect, because the dst entry was stolen in ip_setup_cork().
It can probably happen in raw_sendmsg() and udp_sendmsg() too.
* Re: [PATCH v2] ipv4: dst_entry leak in ip_append_data()
2014-10-14 20:12 ` David Miller
@ 2014-10-15 7:48 ` Vasily Averin
0 siblings, 0 replies; 7+ messages in thread
From: Vasily Averin @ 2014-10-15 7:48 UTC (permalink / raw)
To: David Miller; +Cc: netdev, kuznet, jmorris, yoshfuji, kaber, eric.dumazet
On 15.10.2014 00:12, David Miller wrote:
> From: Vasily Averin <vvs@parallels.com>
> Date: Tue, 14 Oct 2014 08:57:14 +0400
>
>> v2: adjust the indentation of the arguments __ip_append_data() call
>>
>> Fixes: 2e77d89b2fa8 ("net: avoid a pair of dst_hold()/dst_release() in ip_append_data()")
>>
>> If sk_write_queue is empty ip_append_data() executes ip_setup_cork()
>> that "steals" dst entry from rt to cork. Later it calls __ip_append_data()
>> that creates skb and adds it to sk_write_queue.
>>
>> If skb was added successfully following ip_push_pending_frames() call
>> reassign dst entries from cork to skb, and kfree_skb frees dst_entry.
>>
>> However nobody frees stolen dst_entry if skb was not added into sk_write_queue.
>>
>> Signed-off-by: Vasily Averin <vvs@parallels.com>
>
> Why doesn't ip_make_skb() need the same fix? It seems to do the same
> thing.
It seems to me that ip_make_skb() works (almost) correctly,
but the refcounting may be incorrect if the queue can be non-empty
(please see the details below).
If __ip_append_data() returns an error, ip_make_skb() calls
__ip_flush_pending_frames(), which calls ip_cork_release() internally
and frees the stolen dst_entry.
If __ip_append_data() returns success, no dst refcount changes are required.
In this case an skb is created and added to the queue (so the queue is not empty).
Later, in __ip_make_skb(), this skb gets the dst reference,
and the refcount is decremented during kfree_skb().
I do not like having such an unclear dependency between functions,
but it seems to work correctly at present.
However, I am afraid dst refcounting can go wrong if sk_write_queue
is not empty at the moment ip_append_data() is called.
This does not happen in the ip_send_unicast_reply() case, but it can probably
happen in other places.
Let's tally the dst refcount changes in this case.
First packet:
the dst refcount was incremented in the ip_append_data() caller, during the rt lookup
- ip_append_data():
-- sk_write_queue is empty, ip_setup_cork() steals the dst entry
-- __ip_append_data() adds an skb to the queue; the queue is not flushed, waiting for the next packets.
ip_rt_put() in the ip_append_data() caller has no effect, because the dst reference was stolen.
dst refcount here: +1
Then we want to send the 2nd packet:
the dst refcount was incremented in the ip_append_data() caller
- ip_append_data():
-- sk_write_queue is NOT empty, the dst is not stolen
-- __ip_append_data() adds an skb to the queue
ip_rt_put() in the ip_append_data() caller decrements the dst refcount, because it was not stolen.
dst refcount here: +1
Then we handle new packets; all of them are added to the queue.
dst refcount is still +1
Then the queue is flushed.
Each packet in the queue gets a dst reference from the cork,
each kfree_skb() decrements the dst refcount, and it may become negative.
Am I wrong, perhaps?
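
The tally above can be replayed mechanically. The sketch below is a toy under stated assumptions, not kernel code: toy_refs_after_flush() and its per_skb_drop parameter are hypothetical. It only counts increments and decrements in the order listed above, and lets the flush either drop one reference per queued packet (the worry raised above) or drop the single reference held by the cork. It takes no position on which behaviour matches the kernel, since that is exactly the question being asked.

```c
#include <assert.h>

/*
 * Net dst references left after n_packets appends followed by one flush.
 * per_skb_drop != 0: the flush drops one reference per queued packet.
 * per_skb_drop == 0: the flush drops only the one reference in the cork.
 * Hypothetical helper for illustration; nothing here is a kernel API.
 */
static int toy_refs_after_flush(int n_packets, int per_skb_drop)
{
	int refs = 0;
	int queue_len = 0;
	int i;

	if (n_packets <= 0)
		return 0;

	for (i = 0; i < n_packets; i++) {
		refs++;			/* route lookup in the caller */
		if (queue_len > 0)
			refs--;		/* caller's put works: nothing was stolen */
		/* else: the cork stole the reference, the caller's put is a no-op */
		queue_len++;
	}

	refs -= per_skb_drop ? queue_len : 1;
	return refs;
}

/* Usage: three queued packets under each assumption. */
static void toy_tally_demo(void)
{
	assert(toy_refs_after_flush(3, 1) == -2);	/* one drop per packet */
	assert(toy_refs_after_flush(3, 0) == 0);	/* one drop for the cork */
	assert(toy_refs_after_flush(1, 1) == 0);	/* single packet: same either way */
}
```

With a single packet both assumptions agree, so any difference can only show up when the queue holds more than one packet at flush time.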
* Re: [PATCH v2] ipv4: dst_entry leak in ip_append_data()
2014-10-15 6:56 ` Vasily Averin
@ 2014-10-15 9:30 ` Eric Dumazet
2014-10-15 11:31 ` Vasily Averin
0 siblings, 1 reply; 7+ messages in thread
From: Eric Dumazet @ 2014-10-15 9:30 UTC (permalink / raw)
To: Vasily Averin
Cc: netdev, David S. Miller, Alexey Kuznetsov, James Morris,
Hideaki YOSHIFUJI, Patrick McHardy
On Wed, 2014-10-15 at 10:56 +0400, Vasily Averin wrote:
> On 15.10.2014 08:46, Eric Dumazet wrote:
> > On Tue, 2014-10-14 at 08:57 +0400, Vasily Averin wrote:
> >> v2: adjust the indentation of the arguments __ip_append_data() call
> >>
> >> Fixes: 2e77d89b2fa8 ("net: avoid a pair of dst_hold()/dst_release() in ip_append_data()")
> >>
> >> If sk_write_queue is empty ip_append_data() executes ip_setup_cork()
> >> that "steals" dst entry from rt to cork. Later it calls __ip_append_data()
> >> that creates skb and adds it to sk_write_queue.
> >>
> >> If skb was added successfully following ip_push_pending_frames() call
> >> reassign dst entries from cork to skb, and kfree_skb frees dst_entry.
> >>
> >> However nobody frees stolen dst_entry if skb was not added into sk_write_queue.
> >
> > I thought this was done by ip_flush_pending_frames() ?
>
> Take look at ip_send_unicast_reply():
So maybe the bug is here ?
>
> ip_flush_pending_frames() is not called if skb was not added to sk_write_queue.
> And ip_rt_put() does not work, because dst entry was stolen in ip_setup_cork().
>
> Probably it can happen in raw_sendmsg() and udp_sendmsg() too.
UDP & RAW do :
err = ip_append_data(...);
if (err)
udp_flush_pending_frames(sk);
It seems you chose to add a test in the fast path, without even adding an
unlikely() clause, while it seems that we took care of all the cases but
missed a single one : ip_send_unicast_reply()
I am suggesting to fix this bug in another way.
Thanks.
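
As a rough illustration of the caller-side approach suggested above, here is a toy userspace sketch that mirrors the UDP/RAW pattern (check the error, then flush). It is not an actual kernel patch: the mini_* names are hypothetical, mini_unlikely() merely imitates the kernel's unlikely() hint, and the refcount is a plain int.

```c
#include <assert.h>
#include <stddef.h>

/* Branch hint in the spirit of the kernel's unlikely(); plain value elsewhere. */
#if defined(__GNUC__)
#define mini_unlikely(x) __builtin_expect(!!(x), 0)
#else
#define mini_unlikely(x) (x)
#endif

/* Toy stand-ins for illustration only; these are NOT the kernel structures. */
struct mini_dst  { int refcnt; };
struct mini_sock { struct mini_dst *cork_dst; int queue_len; };

static void mini_cork_release(struct mini_sock *sk)
{
	if (sk->cork_dst)
		sk->cork_dst->refcnt--;
	sk->cork_dst = NULL;
}

/* Append stays untouched: no extra test on the success path. */
static int mini_append(struct mini_sock *sk, struct mini_dst **rtp, int fail)
{
	if (sk->queue_len == 0) {
		sk->cork_dst = *rtp;	/* the cork steals the reference */
		*rtp = NULL;
	}
	if (fail)
		return -1;
	sk->queue_len++;
	return 0;
}

/* Error path: drop whatever is queued and release the cork. */
static void mini_flush(struct mini_sock *sk)
{
	sk->queue_len = 0;
	mini_cork_release(sk);
}

/* Caller with the error check; returns the leftover reference count. */
static int mini_reply(int fail)
{
	struct mini_dst dst = { 1 };	/* reference taken by the route lookup */
	struct mini_dst *rt = &dst;
	struct mini_sock sk = { NULL, 0 };

	if (mini_unlikely(mini_append(&sk, &rt, fail))) {
		mini_flush(&sk);
	} else {
		/* push: the skb takes over the cork's reference and drops it */
		sk.queue_len = 0;
		mini_cork_release(&sk);
	}
	if (rt)
		rt->refcnt--;	/* the caller's own put, a no-op once stolen */
	return dst.refcnt;
}

/* Usage: both outcomes end with a balanced count. */
static void mini_demo(void)
{
	assert(mini_reply(0) == 0);
	assert(mini_reply(1) == 0);
}
```

The design point is that the cleanup lives only on the error branch of the one caller that lacked it, so the common success path of the append function gains no new test.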
* Re: [PATCH v2] ipv4: dst_entry leak in ip_append_data()
2014-10-15 9:30 ` Eric Dumazet
@ 2014-10-15 11:31 ` Vasily Averin
0 siblings, 0 replies; 7+ messages in thread
From: Vasily Averin @ 2014-10-15 11:31 UTC (permalink / raw)
To: Eric Dumazet
Cc: netdev, David S. Miller, Alexey Kuznetsov, James Morris,
Hideaki YOSHIFUJI, Patrick McHardy
On 15.10.2014 13:30, Eric Dumazet wrote:
> On Wed, 2014-10-15 at 10:56 +0400, Vasily Averin wrote:
>> On 15.10.2014 08:46, Eric Dumazet wrote:
>>> On Tue, 2014-10-14 at 08:57 +0400, Vasily Averin wrote:
>>>> v2: adjust the indentation of the arguments __ip_append_data() call
>>>>
>>>> Fixes: 2e77d89b2fa8 ("net: avoid a pair of dst_hold()/dst_release() in ip_append_data()")
>>>>
>>>> If sk_write_queue is empty ip_append_data() executes ip_setup_cork()
>>>> that "steals" dst entry from rt to cork. Later it calls __ip_append_data()
>>>> that creates skb and adds it to sk_write_queue.
>>>>
>>>> If skb was added successfully following ip_push_pending_frames() call
>>>> reassign dst entries from cork to skb, and kfree_skb frees dst_entry.
>>>>
>>>> However nobody frees stolen dst_entry if skb was not added into sk_write_queue.
>>>
>>> I thought this was done by ip_flush_pending_frames() ?
>>
>> Take look at ip_send_unicast_reply():
>
> So maybe the bug is here ?
Thank you, I'll remake my patch.