* [PATCH] net: ipv4: Fix incorrect free in ICMP receive
@ 2015-01-16 7:48 subashab
2015-01-16 8:09 ` David Miller
2015-01-16 12:52 ` Eric Dumazet
0 siblings, 2 replies; 10+ messages in thread
From: subashab @ 2015-01-16 7:48 UTC (permalink / raw)
To: netdev
An exception is seen in ICMP ping receive path where the skb
destructor sock_rfree() tries to access a freed socket. This happens
because ping_rcv() releases socket reference with sock_put() and this
internally frees up the socket. Later icmp_rcv() will try to free the
skb and as part of this, skb destructor is called and panics as the
socket is freed already in ping_rcv().
WARN stack trace @ WARN_ON(atomic_read(&sk->sk_rmem_alloc));
dump_backtrace+0x0/0x248
show_stack+0x10/0x1c
dump_stack+0x1c/0x28
warn_slowpath_common+0x74/0x9c
warn_slowpath_null+0x14/0x20
inet_sock_destruct+0x130/0x1a0
__sk_free+0x1c/0x168
sk_free+0x24/0x30
ping_rcv+0xf4/0x124
icmp_rcv+0x224/0x2c4
ip_local_deliver_finish+0x108/0x214
ip_local_deliver+0x88/0xa0
ip_rcv_finish+0x234/0x284
ip_rcv+0x258/0x2e8
__netif_receive_skb_core+0x640/0x6b4
<snip>
-->|exception
-007|sk_mem_uncharge
-007|sock_rfree
-008|skb_release_head_state
-009|skb_release_all
-009|__kfree_skb
-010|kfree_skb
-011|icmp_rcv
-012|ip_local_deliver_finish
Fix this by orphaning the skb's before freeing the socket
Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
---
net/ipv4/af_inet.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index b507a47..0c58f0e5 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -147,6 +147,12 @@ EXPORT_SYMBOL(ipv4_config);
void inet_sock_destruct(struct sock *sk)
{
struct inet_sock *inet = inet_sk(sk);
+ struct sk_buff *skb;
+
+ skb_queue_walk(&sk->sk_receive_queue, skb)
+ skb_orphan(skb);
+ skb_queue_walk(&sk->sk_error_queue, skb)
+ skb_orphan(skb);
__skb_queue_purge(&sk->sk_receive_queue);
__skb_queue_purge(&sk->sk_error_queue);
--
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH] net: ipv4: Fix incorrect free in ICMP receive
2015-01-16 7:48 [PATCH] net: ipv4: Fix incorrect free in ICMP receive subashab
@ 2015-01-16 8:09 ` David Miller
2015-01-16 12:52 ` Eric Dumazet
1 sibling, 0 replies; 10+ messages in thread
From: David Miller @ 2015-01-16 8:09 UTC (permalink / raw)
To: subashab; +Cc: netdev
From: subashab@codeaurora.org
Date: Fri, 16 Jan 2015 07:48:36 -0000
> An exception is seen in ICMP ping receive path where the skb
> destructor sock_rfree() tries to access a freed socket. This happens
> because ping_rcv() releases socket reference with sock_put() and this
> internally frees up the socket. Later icmp_rcv() will try to free the
> skb and as part of this, skb destructor is called and panics as the
> socket is freed already in ping_rcv().
At the point in which foo_sock_destruct() is called, the 'sk' is not
freed. We're operating on it inside of the destruct function itself.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] net: ipv4: Fix incorrect free in ICMP receive
2015-01-16 7:48 [PATCH] net: ipv4: Fix incorrect free in ICMP receive subashab
2015-01-16 8:09 ` David Miller
@ 2015-01-16 12:52 ` Eric Dumazet
2015-01-16 20:59 ` subashab
1 sibling, 1 reply; 10+ messages in thread
From: Eric Dumazet @ 2015-01-16 12:52 UTC (permalink / raw)
To: subashab; +Cc: netdev
On Fri, 2015-01-16 at 07:48 +0000, subashab@codeaurora.org wrote:
> An exception is seen in ICMP ping receive path where the skb
> destructor sock_rfree() tries to access a freed socket. This happens
> because ping_rcv() releases socket reference with sock_put() and this
> internally frees up the socket. Later icmp_rcv() will try to free the
> skb and as part of this, skb destructor is called and panics as the
> socket is freed already in ping_rcv().
>
> WARN stack trace @ WARN_ON(atomic_read(&sk->sk_rmem_alloc));
> dump_backtrace+0x0/0x248
> show_stack+0x10/0x1c
> dump_stack+0x1c/0x28
> warn_slowpath_common+0x74/0x9c
> warn_slowpath_null+0x14/0x20
> inet_sock_destruct+0x130/0x1a0
> __sk_free+0x1c/0x168
> sk_free+0x24/0x30
> ping_rcv+0xf4/0x124
> icmp_rcv+0x224/0x2c4
> ip_local_deliver_finish+0x108/0x214
> ip_local_deliver+0x88/0xa0
> ip_rcv_finish+0x234/0x284
> ip_rcv+0x258/0x2e8
> __netif_receive_skb_core+0x640/0x6b4
> <snip>
>
> -->|exception
> -007|sk_mem_uncharge
> -007|sock_rfree
> -008|skb_release_head_state
> -009|skb_release_all
> -009|__kfree_skb
> -010|kfree_skb
> -011|icmp_rcv
> -012|ip_local_deliver_finish
>
> Fix this by orphaning the skb's before freeing the socket
>
> Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
> ---
> net/ipv4/af_inet.c | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
> index b507a47..0c58f0e5 100644
> --- a/net/ipv4/af_inet.c
> +++ b/net/ipv4/af_inet.c
> @@ -147,6 +147,12 @@ EXPORT_SYMBOL(ipv4_config);
> void inet_sock_destruct(struct sock *sk)
> {
> struct inet_sock *inet = inet_sk(sk);
> + struct sk_buff *skb;
> +
> + skb_queue_walk(&sk->sk_receive_queue, skb)
> + skb_orphan(skb);
> + skb_queue_walk(&sk->sk_error_queue, skb)
> + skb_orphan(skb);
>
> __skb_queue_purge(&sk->sk_receive_queue);
> __skb_queue_purge(&sk->sk_error_queue);
Sorry this makes absolutely no sense to me.
skb_queue_purge() is also calling skb_orphan() on all skb.
And no, sock_put() does not free the socket.
We release a refcount that was acquired when we did the socket lookup.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] net: ipv4: Fix incorrect free in ICMP receive
2015-01-16 12:52 ` Eric Dumazet
@ 2015-01-16 20:59 ` subashab
2015-01-16 21:48 ` Eric Dumazet
2015-01-16 21:48 ` David Miller
0 siblings, 2 replies; 10+ messages in thread
From: subashab @ 2015-01-16 20:59 UTC (permalink / raw)
To: Eric Dumazet; +Cc: netdev
>skb_queue_purge() is also calling skb_orphan() on all skb
>From my reading, it looked like skb_queue_purge() is dequeuing and calling
kfree_skb() which will release a reference. I did not see skb_orphan()
being called directly. Am I missing something?
I think that if it had really orphaned the skb, then this crash would not
be seen in the first place.
>And no, sock_put() does not free the socket.
>We release a refcount that was acquired when we did the socket lookup.
I agree that the refcount is released, but it looked to me like inside
sock_put() the reference count reached 0 hence the socket was freed.
Here is why I say this
(from dmesg)
WARN stack trace @ WARN_ON(atomic_read(&sk->sk_rmem_alloc));
dump_backtrace+0x0/0x248
show_stack+0x10/0x1c
dump_stack+0x1c/0x28
warn_slowpath_common+0x74/0x9c
warn_slowpath_null+0x14/0x20
inet_sock_destruct+0x130/0x1a0
__sk_free+0x1c/0x168
sk_free+0x24/0x30 [sock_put() inlined hence not seen in the
stack trace I believe]
ping_rcv+0xf4/0x124
icmp_rcv+0x224/0x2c4
ip_local_deliver_finish+0x108/0x214
and here is stack at the time of the exception
-007|sk_mem_uncharge [trying to access freed sk structure]
-007|sock_rfree
-008|skb_release_head_state
-009|skb_release_all
-009|__kfree_skb
-010|kfree_skb
-011|icmp_rcv
-012|ip_local_deliver_finish
I am not sure why the ref reached zero and if that is even the right trail
of thought. Could the userspace have released the sock reference also and
this is race condition? Any thoughts?
Note that we are using the 3.10 kernel version and does not have Rick
Jones patch in this area e3e3217029a35c579bf100998b43976d0b1cb8d7
"icmp: Remove some spurious dropped packet profile hits from the ICMP",
although it looks to me that this patch would not solve this problem
either.
Thanks
KS
--
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project
> On Fri, 2015-01-16 at 07:48 +0000, subashab@codeaurora.org wrote:
>> An exception is seen in ICMP ping receive path where the skb
>> destructor sock_rfree() tries to access a freed socket. This happens
>> because ping_rcv() releases socket reference with sock_put() and this
>> internally frees up the socket. Later icmp_rcv() will try to free the
>> skb and as part of this, skb destructor is called and panics as the
>> socket is freed already in ping_rcv().
>>
>> WARN stack trace @ WARN_ON(atomic_read(&sk->sk_rmem_alloc));
>> dump_backtrace+0x0/0x248
>> show_stack+0x10/0x1c
>> dump_stack+0x1c/0x28
>> warn_slowpath_common+0x74/0x9c
>> warn_slowpath_null+0x14/0x20
>> inet_sock_destruct+0x130/0x1a0
>> __sk_free+0x1c/0x168
>> sk_free+0x24/0x30
>> ping_rcv+0xf4/0x124
>> icmp_rcv+0x224/0x2c4
>> ip_local_deliver_finish+0x108/0x214
>> ip_local_deliver+0x88/0xa0
>> ip_rcv_finish+0x234/0x284
>> ip_rcv+0x258/0x2e8
>> __netif_receive_skb_core+0x640/0x6b4
>> <snip>
>>
>> -->|exception
>> -007|sk_mem_uncharge
>> -007|sock_rfree
>> -008|skb_release_head_state
>> -009|skb_release_all
>> -009|__kfree_skb
>> -010|kfree_skb
>> -011|icmp_rcv
>> -012|ip_local_deliver_finish
>>
>> Fix this by orphaning the skb's before freeing the socket
>>
>> Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
>> ---
>> net/ipv4/af_inet.c | 6 ++++++
>> 1 file changed, 6 insertions(+)
>>
>> diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
>> index b507a47..0c58f0e5 100644
>> --- a/net/ipv4/af_inet.c
>> +++ b/net/ipv4/af_inet.c
>> @@ -147,6 +147,12 @@ EXPORT_SYMBOL(ipv4_config);
>> void inet_sock_destruct(struct sock *sk)
>> {
>> struct inet_sock *inet = inet_sk(sk);
>> + struct sk_buff *skb;
>> +
>> + skb_queue_walk(&sk->sk_receive_queue, skb)
>> + skb_orphan(skb);
>> + skb_queue_walk(&sk->sk_error_queue, skb)
>> + skb_orphan(skb);
>>
>> __skb_queue_purge(&sk->sk_receive_queue);
>> __skb_queue_purge(&sk->sk_error_queue);
>
> Sorry this makes absolutely no sense to me.
>
> skb_queue_purge() is also calling skb_orphan() on all skb.
>
> And no, sock_put() does not free the socket.
> We release a refcount that was acquired when we did the socket lookup.
>
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] net: ipv4: Fix incorrect free in ICMP receive
2015-01-16 20:59 ` subashab
@ 2015-01-16 21:48 ` Eric Dumazet
2015-01-16 21:48 ` David Miller
1 sibling, 0 replies; 10+ messages in thread
From: Eric Dumazet @ 2015-01-16 21:48 UTC (permalink / raw)
To: subashab; +Cc: netdev
On Fri, 2015-01-16 at 20:59 +0000, subashab@codeaurora.org wrote:
> >skb_queue_purge() is also calling skb_orphan() on all skb
> From my reading, it looked like skb_queue_purge() is dequeuing and calling
> kfree_skb() which will release a reference. I did not see skb_orphan()
> being called directly. Am I missing something?
> I think that if it had really orphaned the skb, then this crash would not
> be seen in the first place.
Well, I never said you had no crash.
I said your 'fix' made no sense.
Sure, we automatically call skb_orphan() when skb are freed, as a in
kfree_skb()
Thats whole point having skb->destructor
> >And no, sock_put() does not free the socket.
> >We release a refcount that was acquired when we did the socket lookup.
> I agree that the refcount is released, but it looked to me like inside
> sock_put() the reference count reached 0 hence the socket was freed.
>
> Here is why I say this
> (from dmesg)
> WARN stack trace @ WARN_ON(atomic_read(&sk->sk_rmem_alloc));
> dump_backtrace+0x0/0x248
> show_stack+0x10/0x1c
> dump_stack+0x1c/0x28
> warn_slowpath_common+0x74/0x9c
> warn_slowpath_null+0x14/0x20
> inet_sock_destruct+0x130/0x1a0
> __sk_free+0x1c/0x168
> sk_free+0x24/0x30 [sock_put() inlined hence not seen in the
> stack trace I believe]
> ping_rcv+0xf4/0x124
> icmp_rcv+0x224/0x2c4
> ip_local_deliver_finish+0x108/0x214
>
> and here is stack at the time of the exception
> -007|sk_mem_uncharge [trying to access freed sk structure]
> -007|sock_rfree
> -008|skb_release_head_state
> -009|skb_release_all
> -009|__kfree_skb
> -010|kfree_skb
> -011|icmp_rcv
> -012|ip_local_deliver_finish
>
> I am not sure why the ref reached zero and if that is even the right trail
> of thought. Could the userspace have released the sock reference also and
> this is race condition? Any thoughts?
Normally it is not possible. By definition, ping_lookup() does a
sock_hold() on a socket that is still alive (since it was found in hash
table). Hash table is protected by ping_table.lock rwlock.
>
> Note that we are using the 3.10 kernel version and does not have Rick
> Jones patch in this area e3e3217029a35c579bf100998b43976d0b1cb8d7
> "icmp: Remove some spurious dropped packet profile hits from the ICMP",
> although it looks to me that this patch would not solve this problem
> either.
Seriously guys.
Do not post upstream patches if you cannot reproduce the issue on
current kernel.
Thanks
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] net: ipv4: Fix incorrect free in ICMP receive
2015-01-16 20:59 ` subashab
2015-01-16 21:48 ` Eric Dumazet
@ 2015-01-16 21:48 ` David Miller
2015-01-20 2:34 ` subashab
1 sibling, 1 reply; 10+ messages in thread
From: David Miller @ 2015-01-16 21:48 UTC (permalink / raw)
To: subashab; +Cc: eric.dumazet, netdev
From: subashab@codeaurora.org
Date: Fri, 16 Jan 2015 20:59:14 -0000
>>skb_queue_purge() is also calling skb_orphan() on all skb
> From my reading, it looked like skb_queue_purge() is dequeuing and calling
> kfree_skb() which will release a reference. I did not see skb_orphan()
> being called directly. Am I missing something?
> I think that if it had really orphaned the skb, then this crash would not
> be seen in the first place.
The calls to skb->destructor(), done by skb_queue_purge() (via
kfree_skb()) do this.
But even if it didn't, the fact remains that we are operating on the
socket right here in the destructor. It still exists and has not been
freed yet.
And furthermore, exactly what skb_orphan() does is call skb->destructor(),
_JUST_LIKE_ skb_queue_purge() will via kfree_skb().
So either sock_rfree() is safe to call here, or it isn't. You are not
eliminating the calls to sock_rfree() which operate on this socket at
all. If you did, then the socket memory counters would end up being
corrupts and the warnings would trigger:
WARN_ON(atomic_read(&sk->sk_rmem_alloc));
WARN_ON(atomic_read(&sk->sk_wmem_alloc));
You're just moving the skb->destructor() call up a few lines in the
same function, it makes therefore no sense why this would fix a bug
or not.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] net: ipv4: Fix incorrect free in ICMP receive
2015-01-16 21:48 ` David Miller
@ 2015-01-20 2:34 ` subashab
2015-01-20 3:50 ` Eric Dumazet
0 siblings, 1 reply; 10+ messages in thread
From: subashab @ 2015-01-20 2:34 UTC (permalink / raw)
To: David Miller; +Cc: eric.dumazet, netdev
Thanks David and Eric for the insights. In order for me to steer this
debug in the right direction, can you please help me? Based on your input
I looked into this a little deeper to understand the refcnts for sockets
and skb's in this ping receive path.
from ping_rcv()
sk = ping_lookup(net, skb, ntohs(icmph->un.echo.id));
if (sk != NULL) {
pr_debug("rcv on socket %p\n", sk);
ping_queue_rcv_skb(sk, skb_get(skb));
sock_put(sk);
return;
}
>From my understanding I have made the following analysis, please correct
if I am wrong.
1) There is no guarantee that sock_put() in the above code snippet
will not drop the socket refcount to 0 and free the socket. This can
hypothetically happen if say
sock_close()->ping_close()->*->ping_unhash()->sock_put()
can happen between in a different context between ping_lookup() and
sock_put() in the above code snippet. Is this observation accurate?
2) Now since this socket is being freed in the ping receive path, I think
the following is what is happening with the skb.
alloc_skb()[skb->users=1] -> deliver_skb()[skb->users=2] -> * ->
icmp_rcv() -> ping_rcv() -> sk_free --> inet_sock_destruct()->
__skb_queue_purge()->kfree_skb()[dec ref cnt, skb->users=1]
when stack unwinds to icmp_rcv(), refcnt actually hits zero and packet is
freed calling the destructor which tries to access the freed socket.
If these observations are right, Can you please tell me what is the call
flow that is not supposed to happen but is happening in this issue? I am
trying to understand better to identify next steps to tackle this issue.
--
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project
> From: subashab@codeaurora.org
> Date: Fri, 16 Jan 2015 20:59:14 -0000
>
>>>skb_queue_purge() is also calling skb_orphan() on all skb
>> From my reading, it looked like skb_queue_purge() is dequeuing and
>> calling
>> kfree_skb() which will release a reference. I did not see skb_orphan()
>> being called directly. Am I missing something?
>> I think that if it had really orphaned the skb, then this crash would
>> not
>> be seen in the first place.
>
> The calls to skb->destructor(), done by skb_queue_purge() (via
> kfree_skb()) do this.
>
> But even if it didn't, the fact remains that we are operating on the
> socket right here in the destructor. It still exists and has not been
> freed yet.
>
> And furthermore, exactly what skb_orphan() does is call skb->destructor(),
> _JUST_LIKE_ skb_queue_purge() will via kfree_skb().
>
> So either sock_rfree() is safe to call here, or it isn't. You are not
> eliminating the calls to sock_rfree() which operate on this socket at
> all. If you did, then the socket memory counters would end up being
> corrupts and the warnings would trigger:
>
> WARN_ON(atomic_read(&sk->sk_rmem_alloc));
> WARN_ON(atomic_read(&sk->sk_wmem_alloc));
>
> You're just moving the skb->destructor() call up a few lines in the
> same function, it makes therefore no sense why this would fix a bug
> or not.
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] net: ipv4: Fix incorrect free in ICMP receive
2015-01-20 2:34 ` subashab
@ 2015-01-20 3:50 ` Eric Dumazet
2015-01-20 19:42 ` Cong Wang
0 siblings, 1 reply; 10+ messages in thread
From: Eric Dumazet @ 2015-01-20 3:50 UTC (permalink / raw)
To: subashab; +Cc: David Miller, netdev
On Tue, 2015-01-20 at 02:34 +0000, subashab@codeaurora.org wrote:
> Thanks David and Eric for the insights. In order for me to steer this
> debug in the right direction, can you please help me? Based on your input
> I looked into this a little deeper to understand the refcnts for sockets
> and skb's in this ping receive path.
>
> from ping_rcv()
>
> sk = ping_lookup(net, skb, ntohs(icmph->un.echo.id));
> if (sk != NULL) {
> pr_debug("rcv on socket %p\n", sk);
> ping_queue_rcv_skb(sk, skb_get(skb));
> sock_put(sk);
> return;
> }
>
> From my understanding I have made the following analysis, please correct
> if I am wrong.
>
> 1) There is no guarantee that sock_put() in the above code snippet
> will not drop the socket refcount to 0 and free the socket. This can
> hypothetically happen if say
> sock_close()->ping_close()->*->ping_unhash()->sock_put()
> can happen between in a different context between ping_lookup() and
> sock_put() in the above code snippet. Is this observation accurate?
>
> 2) Now since this socket is being freed in the ping receive path, I think
> the following is what is happening with the skb.
> alloc_skb()[skb->users=1] -> deliver_skb()[skb->users=2] -> * ->
> icmp_rcv() -> ping_rcv() -> sk_free --> inet_sock_destruct()->
> __skb_queue_purge()->kfree_skb()[dec ref cnt, skb->users=1]
>
> when stack unwinds to icmp_rcv(), refcnt actually hits zero and packet is
> freed calling the destructor which tries to access the freed socket.
>
> If these observations are right, Can you please tell me what is the call
> flow that is not supposed to happen but is happening in this issue? I am
> trying to understand better to identify next steps to tackle this issue.
This is why skb_get() is very often a bug.
There is no guarantee the consume_skb() in icmp_rcv() is done before the
skb_queue_purge().
diff --git a/net/ipv4/ping.c b/net/ipv4/ping.c
index c0d82f78d364..2a3720fb5a5f 100644
--- a/net/ipv4/ping.c
+++ b/net/ipv4/ping.c
@@ -966,8 +966,11 @@ bool ping_rcv(struct sk_buff *skb)
sk = ping_lookup(net, skb, ntohs(icmph->un.echo.id));
if (sk != NULL) {
+ struct sk_buff *skb2 = skb_clone(skb, GFP_ATOMIC);
+
pr_debug("rcv on socket %p\n", sk);
- ping_queue_rcv_skb(sk, skb_get(skb));
+ if (skb2)
+ ping_queue_rcv_skb(sk, skb2);
sock_put(sk);
return true;
}
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH] net: ipv4: Fix incorrect free in ICMP receive
2015-01-20 3:50 ` Eric Dumazet
@ 2015-01-20 19:42 ` Cong Wang
2015-01-20 21:35 ` Eric Dumazet
0 siblings, 1 reply; 10+ messages in thread
From: Cong Wang @ 2015-01-20 19:42 UTC (permalink / raw)
To: Eric Dumazet; +Cc: subashab, David Miller, netdev
On Mon, Jan 19, 2015 at 7:50 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Tue, 2015-01-20 at 02:34 +0000, subashab@codeaurora.org wrote:
>> Thanks David and Eric for the insights. In order for me to steer this
>> debug in the right direction, can you please help me? Based on your input
>> I looked into this a little deeper to understand the refcnts for sockets
>> and skb's in this ping receive path.
>>
>> from ping_rcv()
>>
>> sk = ping_lookup(net, skb, ntohs(icmph->un.echo.id));
>> if (sk != NULL) {
>> pr_debug("rcv on socket %p\n", sk);
>> ping_queue_rcv_skb(sk, skb_get(skb));
>> sock_put(sk);
>> return;
>> }
>>
>> From my understanding I have made the following analysis, please correct
>> if I am wrong.
>>
>> 1) There is no guarantee that sock_put() in the above code snippet
>> will not drop the socket refcount to 0 and free the socket. This can
>> hypothetically happen if say
>> sock_close()->ping_close()->*->ping_unhash()->sock_put()
>> can happen between in a different context between ping_lookup() and
>> sock_put() in the above code snippet. Is this observation accurate?
>>
>> 2) Now since this socket is being freed in the ping receive path, I think
>> the following is what is happening with the skb.
>> alloc_skb()[skb->users=1] -> deliver_skb()[skb->users=2] -> * ->
>> icmp_rcv() -> ping_rcv() -> sk_free --> inet_sock_destruct()->
>> __skb_queue_purge()->kfree_skb()[dec ref cnt, skb->users=1]
>>
>> when stack unwinds to icmp_rcv(), refcnt actually hits zero and packet is
>> freed calling the destructor which tries to access the freed socket.
>>
>> If these observations are right, Can you please tell me what is the call
>> flow that is not supposed to happen but is happening in this issue? I am
>> trying to understand better to identify next steps to tackle this issue.
>
> This is why skb_get() is very often a bug.
>
> There is no guarantee the consume_skb() in icmp_rcv() is done before the
> skb_queue_purge().
>
Or let the socket layer drop the packet instead of unconditionally
dropping all icmp packets after success?
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] net: ipv4: Fix incorrect free in ICMP receive
2015-01-20 19:42 ` Cong Wang
@ 2015-01-20 21:35 ` Eric Dumazet
0 siblings, 0 replies; 10+ messages in thread
From: Eric Dumazet @ 2015-01-20 21:35 UTC (permalink / raw)
To: Cong Wang; +Cc: subashab, David Miller, netdev
On Tue, 2015-01-20 at 11:42 -0800, Cong Wang wrote:
>
> Or let the socket layer drop the packet instead of unconditionally
> dropping all icmp packets after success?
Sure, but that's probably at least 50 lines patch.
net-next candidate most likely.
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2015-01-20 21:35 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-01-16 7:48 [PATCH] net: ipv4: Fix incorrect free in ICMP receive subashab
2015-01-16 8:09 ` David Miller
2015-01-16 12:52 ` Eric Dumazet
2015-01-16 20:59 ` subashab
2015-01-16 21:48 ` Eric Dumazet
2015-01-16 21:48 ` David Miller
2015-01-20 2:34 ` subashab
2015-01-20 3:50 ` Eric Dumazet
2015-01-20 19:42 ` Cong Wang
2015-01-20 21:35 ` Eric Dumazet
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).