netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2] ipv4: dst_entry leak in ip_append_data()
@ 2014-10-14  4:57 Vasily Averin
  2014-10-14 20:12 ` David Miller
  2014-10-15  4:46 ` Eric Dumazet
  0 siblings, 2 replies; 7+ messages in thread
From: Vasily Averin @ 2014-10-14  4:57 UTC (permalink / raw)
  To: netdev, David S. Miller
  Cc: Alexey Kuznetsov, James Morris, Hideaki YOSHIFUJI,
	Patrick McHardy, Eric Dumazet

v2: adjust the indentation of the arguments __ip_append_data() call

Fixes: 2e77d89b2fa8 ("net: avoid a pair of dst_hold()/dst_release() in ip_append_data()")

If sk_write_queue is empty ip_append_data() executes ip_setup_cork()
that "steals" dst entry from rt to cork. Later it calls __ip_append_data()
that creates skb and adds it to sk_write_queue.

If skb was added successfully following ip_push_pending_frames() call
reassign dst entries from cork to skb, and kfree_skb frees dst_entry.

However nobody frees stolen dst_entry if skb was not added into sk_write_queue.

Signed-off-by: Vasily Averin <vvs@parallels.com>
---
 net/ipv4/ip_output.c | 29 +++++++++++++++++------------
 1 file changed, 17 insertions(+), 12 deletions(-)

diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index e35b712..3ba2291 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -1120,6 +1120,15 @@ static int ip_setup_cork(struct sock *sk, struct inet_cork *cork,
 	return 0;
 }
 
+static void ip_cork_release(struct inet_cork *cork)
+{
+	cork->flags &= ~IPCORK_OPT;
+	kfree(cork->opt);
+	cork->opt = NULL;
+	dst_release(cork->dst);
+	cork->dst = NULL;
+}
+
 /*
  *	ip_append_data() and ip_append_page() can make one large IP datagram
  *	from many pieces of data. Each pieces will be holded on the socket
@@ -1152,9 +1161,14 @@ int ip_append_data(struct sock *sk, struct flowi4 *fl4,
 		transhdrlen = 0;
 	}
 
-	return __ip_append_data(sk, fl4, &sk->sk_write_queue, &inet->cork.base,
-				sk_page_frag(sk), getfrag,
-				from, length, transhdrlen, flags);
+	err = __ip_append_data(sk, fl4, &sk->sk_write_queue, &inet->cork.base,
+			       sk_page_frag(sk), getfrag,
+			       from, length, transhdrlen, flags);
+
+	if (skb_queue_empty(&sk->sk_write_queue))
+		ip_cork_release(&inet->cork.base);
+
+	return err;
 }
 
 ssize_t	ip_append_page(struct sock *sk, struct flowi4 *fl4, struct page *page,
@@ -1304,15 +1318,6 @@ error:
 	return err;
 }
 
-static void ip_cork_release(struct inet_cork *cork)
-{
-	cork->flags &= ~IPCORK_OPT;
-	kfree(cork->opt);
-	cork->opt = NULL;
-	dst_release(cork->dst);
-	cork->dst = NULL;
-}
-
 /*
  *	Combined all pending IP fragments on the socket as one IP datagram
  *	and push them out.
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH v2] ipv4: dst_entry leak in ip_append_data()
  2014-10-14  4:57 [PATCH v2] ipv4: dst_entry leak in ip_append_data() Vasily Averin
@ 2014-10-14 20:12 ` David Miller
  2014-10-15  7:48   ` Vasily Averin
  2014-10-15  4:46 ` Eric Dumazet
  1 sibling, 1 reply; 7+ messages in thread
From: David Miller @ 2014-10-14 20:12 UTC (permalink / raw)
  To: vvs; +Cc: netdev, kuznet, jmorris, yoshfuji, kaber, eric.dumazet

From: Vasily Averin <vvs@parallels.com>
Date: Tue, 14 Oct 2014 08:57:14 +0400

> v2: adjust the indentation of the arguments __ip_append_data() call
> 
> Fixes: 2e77d89b2fa8 ("net: avoid a pair of dst_hold()/dst_release() in ip_append_data()")
> 
> If sk_write_queue is empty ip_append_data() executes ip_setup_cork()
> that "steals" dst entry from rt to cork. Later it calls __ip_append_data()
> that creates skb and adds it to sk_write_queue.
> 
> If skb was added successfully following ip_push_pending_frames() call
> reassign dst entries from cork to skb, and kfree_skb frees dst_entry.
> 
> However nobody frees stolen dst_entry if skb was not added into sk_write_queue.
> 
> Signed-off-by: Vasily Averin <vvs@parallels.com>

Why doesn't ip_make_skb() need the same fix?  It seems to do the same
thing.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v2] ipv4: dst_entry leak in ip_append_data()
  2014-10-14  4:57 [PATCH v2] ipv4: dst_entry leak in ip_append_data() Vasily Averin
  2014-10-14 20:12 ` David Miller
@ 2014-10-15  4:46 ` Eric Dumazet
  2014-10-15  6:56   ` Vasily Averin
  1 sibling, 1 reply; 7+ messages in thread
From: Eric Dumazet @ 2014-10-15  4:46 UTC (permalink / raw)
  To: Vasily Averin
  Cc: netdev, David S. Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy

On Tue, 2014-10-14 at 08:57 +0400, Vasily Averin wrote:
> v2: adjust the indentation of the arguments __ip_append_data() call
> 
> Fixes: 2e77d89b2fa8 ("net: avoid a pair of dst_hold()/dst_release() in ip_append_data()")
> 
> If sk_write_queue is empty ip_append_data() executes ip_setup_cork()
> that "steals" dst entry from rt to cork. Later it calls __ip_append_data()
> that creates skb and adds it to sk_write_queue.
> 
> If skb was added successfully following ip_push_pending_frames() call
> reassign dst entries from cork to skb, and kfree_skb frees dst_entry.
> 
> However nobody frees stolen dst_entry if skb was not added into sk_write_queue.

I thought this was done by ip_flush_pending_frames() ?

Can you describe the issue more precisely ?

Thanks !

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v2] ipv4: dst_entry leak in ip_append_data()
  2014-10-15  4:46 ` Eric Dumazet
@ 2014-10-15  6:56   ` Vasily Averin
  2014-10-15  9:30     ` Eric Dumazet
  0 siblings, 1 reply; 7+ messages in thread
From: Vasily Averin @ 2014-10-15  6:56 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: netdev, David S. Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy

On 15.10.2014 08:46, Eric Dumazet wrote:
> On Tue, 2014-10-14 at 08:57 +0400, Vasily Averin wrote:
>> v2: adjust the indentation of the arguments __ip_append_data() call
>>
>> Fixes: 2e77d89b2fa8 ("net: avoid a pair of dst_hold()/dst_release() in ip_append_data()")
>>
>> If sk_write_queue is empty ip_append_data() executes ip_setup_cork()
>> that "steals" dst entry from rt to cork. Later it calls __ip_append_data()
>> that creates skb and adds it to sk_write_queue.
>>
>> If skb was added successfully following ip_push_pending_frames() call
>> reassign dst entries from cork to skb, and kfree_skb frees dst_entry.
>>
>> However nobody frees stolen dst_entry if skb was not added into sk_write_queue.
> 
> I thought this was done by ip_flush_pending_frames() ?

Take look at ip_send_unicast_reply():

ip_flush_pending_frames() is not called if skb was not added to sk_write_queue.
And ip_rt_put() does not work, because dst entry was stolen in ip_setup_cork().

Probably it can happen in raw_sendmsg() and udp_sendmsg() too.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v2] ipv4: dst_entry leak in ip_append_data()
  2014-10-14 20:12 ` David Miller
@ 2014-10-15  7:48   ` Vasily Averin
  0 siblings, 0 replies; 7+ messages in thread
From: Vasily Averin @ 2014-10-15  7:48 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, kuznet, jmorris, yoshfuji, kaber, eric.dumazet



On 15.10.2014 00:12, David Miller wrote:
> From: Vasily Averin <vvs@parallels.com>
> Date: Tue, 14 Oct 2014 08:57:14 +0400
> 
>> v2: adjust the indentation of the arguments __ip_append_data() call
>>
>> Fixes: 2e77d89b2fa8 ("net: avoid a pair of dst_hold()/dst_release() in ip_append_data()")
>>
>> If sk_write_queue is empty ip_append_data() executes ip_setup_cork()
>> that "steals" dst entry from rt to cork. Later it calls __ip_append_data()
>> that creates skb and adds it to sk_write_queue.
>>
>> If skb was added successfully following ip_push_pending_frames() call
>> reassign dst entries from cork to skb, and kfree_skb frees dst_entry.
>>
>> However nobody frees stolen dst_entry if skb was not added into sk_write_queue.
>>
>> Signed-off-by: Vasily Averin <vvs@parallels.com>
> 
> Why doesn't ip_make_skb() need the same fix?  It seems to do the same
> thing.

It seems for me ip_make_skb() works (almost) correctly,
but seems refcounting can be is incorrect if queue can be not empty 
(Please see details below).

If __ip_append_data() returns errors ip_make_skb() calls 
__ip_flush_pending_frames() that calls ip_cork_release() inside
and frees stolen dst_entry.

If __ip_append_data() returns success -- dst refcounter changes are not required.
In this case skb will be created and added to queue (and it will not be empty)
Later in __ip_make_skb() these skb will get dst reference,
and refcounter will be decremented during kfree_skb(). 

I do not like that there is such unclear dependency between functions,
but seems currently it works correctly.

However I afraid dst refcountng can work incorrectly if sk_write_queue
can be not empty at the moment of ip_append_data() call.
It was not happen in case ip_send_unicast_reply() but probably
can happen in other places.

Let's calculate dst refcounters changes in this case.

First packet:

dst_refcount increment was happen in ip_append_data() caller, taken during rt lookup
- ip_append_data():
--  sk_write_queue is empty, ip_setup_cork() steals dst entry
-- __ip_append_data() adds skb to queue, queue is not flushed, waiting for next packets.
ip_rt_put in ip_append_data() caller does not work, because dst reference was stolen.

dst refcount here +1

then we want to sent 2nd packet:
dst refcount increment was happen in ip_append_data() caller
- ip_append_data():
-- sk_write_queue is NOT empty, dst was not stolen
-- __ip_append_data() adds skb to queue
ip_rt_put in ip_append_data() caller decrements dst refcount, because it as not stolen

dst refcount here +1 

Then we handle new packets, all of them are added to queue

dst refcount is still +1

Then queue is flushed.
Each packet in queue get dst reference from cork,
Each kfree_skb decrements dst refcounter, and it may become negative.

Am I wrong probably?

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v2] ipv4: dst_entry leak in ip_append_data()
  2014-10-15  6:56   ` Vasily Averin
@ 2014-10-15  9:30     ` Eric Dumazet
  2014-10-15 11:31       ` Vasily Averin
  0 siblings, 1 reply; 7+ messages in thread
From: Eric Dumazet @ 2014-10-15  9:30 UTC (permalink / raw)
  To: Vasily Averin
  Cc: netdev, David S. Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy

On Wed, 2014-10-15 at 10:56 +0400, Vasily Averin wrote:
> On 15.10.2014 08:46, Eric Dumazet wrote:
> > On Tue, 2014-10-14 at 08:57 +0400, Vasily Averin wrote:
> >> v2: adjust the indentation of the arguments __ip_append_data() call
> >>
> >> Fixes: 2e77d89b2fa8 ("net: avoid a pair of dst_hold()/dst_release() in ip_append_data()")
> >>
> >> If sk_write_queue is empty ip_append_data() executes ip_setup_cork()
> >> that "steals" dst entry from rt to cork. Later it calls __ip_append_data()
> >> that creates skb and adds it to sk_write_queue.
> >>
> >> If skb was added successfully following ip_push_pending_frames() call
> >> reassign dst entries from cork to skb, and kfree_skb frees dst_entry.
> >>
> >> However nobody frees stolen dst_entry if skb was not added into sk_write_queue.
> > 
> > I thought this was done by ip_flush_pending_frames() ?
> 
> Take look at ip_send_unicast_reply():

So maybe the bug is here ?


> 
> ip_flush_pending_frames() is not called if skb was not added to sk_write_queue.
> And ip_rt_put() does not work, because dst entry was stolen in ip_setup_cork().
> 
> Probably it can happen in raw_sendmsg() and udp_sendmsg() too.

UDP & RAW do :

err = ip_append_data(...);
if (err)
    udp_flush_pending_frames(sk);


It seems you chose to add a test in fast path, with not even adding an
unlikely() clause, while it seems that we took care of all the cases but
missed a single one : ip_send_unicast_reply()

I am suggesting to fix this bug in another way.

Thanks.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v2] ipv4: dst_entry leak in ip_append_data()
  2014-10-15  9:30     ` Eric Dumazet
@ 2014-10-15 11:31       ` Vasily Averin
  0 siblings, 0 replies; 7+ messages in thread
From: Vasily Averin @ 2014-10-15 11:31 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: netdev, David S. Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy

On 15.10.2014 13:30, Eric Dumazet wrote:
> On Wed, 2014-10-15 at 10:56 +0400, Vasily Averin wrote:
>> On 15.10.2014 08:46, Eric Dumazet wrote:
>>> On Tue, 2014-10-14 at 08:57 +0400, Vasily Averin wrote:
>>>> v2: adjust the indentation of the arguments __ip_append_data() call
>>>>
>>>> Fixes: 2e77d89b2fa8 ("net: avoid a pair of dst_hold()/dst_release() in ip_append_data()")
>>>>
>>>> If sk_write_queue is empty ip_append_data() executes ip_setup_cork()
>>>> that "steals" dst entry from rt to cork. Later it calls __ip_append_data()
>>>> that creates skb and adds it to sk_write_queue.
>>>>
>>>> If skb was added successfully following ip_push_pending_frames() call
>>>> reassign dst entries from cork to skb, and kfree_skb frees dst_entry.
>>>>
>>>> However nobody frees stolen dst_entry if skb was not added into sk_write_queue.
>>>
>>> I thought this was done by ip_flush_pending_frames() ?
>>
>> Take look at ip_send_unicast_reply():
> 
> So maybe the bug is here ?

Thank you, I'll remake my patch.

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2014-10-15 11:33 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-10-14  4:57 [PATCH v2] ipv4: dst_entry leak in ip_append_data() Vasily Averin
2014-10-14 20:12 ` David Miller
2014-10-15  7:48   ` Vasily Averin
2014-10-15  4:46 ` Eric Dumazet
2014-10-15  6:56   ` Vasily Averin
2014-10-15  9:30     ` Eric Dumazet
2014-10-15 11:31       ` Vasily Averin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).