net/unix: sk_socket can disappear when state is unlocked

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* net/unix: sk_socket can disappear when state is unlocked
@ 2015-05-21 16:25 Mark Salyzyn
  2015-05-22  9:50 ` Hannes Frederic Sowa
  0 siblings, 1 reply; 7+ messages in thread
From: Mark Salyzyn @ 2015-05-21 16:25 UTC (permalink / raw)
  To: linux-kernel
  Cc: Mark Salyzyn, David S. Miller, Al Viro, David Howells, Ying Xue,
	Christoph Hellwig, netdev

got a rare NULL pointer dereference in clear_bit

Signed-off-by: Mark Salyzyn <salyzyn@android.com>
---
 net/unix/af_unix.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 5266ea7..37a8925 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -1880,6 +1880,11 @@ static long unix_stream_data_wait(struct sock *sk, long timeo,
 		unix_state_unlock(sk);
 		timeo = freezable_schedule_timeout(timeo);
 		unix_state_lock(sk);
+
+		/* sk_socket may have been killed while unlocked */
+		if (!sk->sk_socket)
+			break;
+
 		clear_bit(SOCK_ASYNC_WAITDATA, &sk->sk_socket->flags);
 	}
 
-- 
2.2.0.rc0.207.ga3a616c

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: net/unix: sk_socket can disappear when state is unlocked
  2015-05-21 16:25 net/unix: sk_socket can disappear when state is unlocked Mark Salyzyn
@ 2015-05-22  9:50 ` Hannes Frederic Sowa
  2015-05-22 14:51   ` Mark Salyzyn
  0 siblings, 1 reply; 7+ messages in thread
From: Hannes Frederic Sowa @ 2015-05-22  9:50 UTC (permalink / raw)
  To: Mark Salyzyn
  Cc: linux-kernel, David S. Miller, Al Viro, David Howells, Ying Xue,
	Christoph Hellwig, netdev

On Do, 2015-05-21 at 09:25 -0700, Mark Salyzyn wrote:
> got a rare NULL pointer dereference in clear_bit
> 
> Signed-off-by: Mark Salyzyn <salyzyn@android.com>
> ---
>  net/unix/af_unix.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
> index 5266ea7..37a8925 100644
> --- a/net/unix/af_unix.c
> +++ b/net/unix/af_unix.c
> @@ -1880,6 +1880,11 @@ static long unix_stream_data_wait(struct sock *sk, long timeo,
>  		unix_state_unlock(sk);
>  		timeo = freezable_schedule_timeout(timeo);
>  		unix_state_lock(sk);
> +
> +		/* sk_socket may have been killed while unlocked */
> +		if (!sk->sk_socket)
> +			break;
> +
>  		clear_bit(SOCK_ASYNC_WAITDATA, &sk->sk_socket->flags);
>  	}
>  

Canonical way is to test for sock_flag(sk, SOCK_DEAD). Also it does not
seem like we are returning an error to user space but are still looping
to try to dequeue skbs from sk_receive_queue, which is concurrently
emptied by unix_release (maybe, without holding unix_state_lock).

Bye,
Hannes

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: net/unix: sk_socket can disappear when state is unlocked
  2015-05-22  9:50 ` Hannes Frederic Sowa
@ 2015-05-22 14:51   ` Mark Salyzyn
  2015-05-22 15:35     ` Hannes Frederic Sowa
  0 siblings, 1 reply; 7+ messages in thread
From: Mark Salyzyn @ 2015-05-22 14:51 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: linux-kernel, David S. Miller, Al Viro, David Howells, Ying Xue,
	Christoph Hellwig, netdev

On 05/22/2015 02:50 AM, Hannes Frederic Sowa wrote:
> On Do, 2015-05-21 at 09:25 -0700, Mark Salyzyn wrote:
>> got a rare NULL pointer dereference in clear_bit
>>
>> Signed-off-by: Mark Salyzyn <salyzyn@android.com>
>> ---
>>   net/unix/af_unix.c | 5 +++++
>>   1 file changed, 5 insertions(+)
>>
>> diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
>> index 5266ea7..37a8925 100644
>> --- a/net/unix/af_unix.c
>> +++ b/net/unix/af_unix.c
>> @@ -1880,6 +1880,11 @@ static long unix_stream_data_wait(struct sock *sk, long timeo,
>>   		unix_state_unlock(sk);
>>   		timeo = freezable_schedule_timeout(timeo);
>>   		unix_state_lock(sk);
>> +
>> +		/* sk_socket may have been killed while unlocked */
>> +		if (!sk->sk_socket)
>> +			break;
>> +
>>   		clear_bit(SOCK_ASYNC_WAITDATA, &sk->sk_socket->flags);
>>   	}
>>   
> Canonical way is to test for sock_flag(sk, SOCK_DEAD). Also it does not
> seem like we are returning an error to user space but are still looping
> to try to dequeue skbs from sk_receive_queue, which is concurrently
> emptied by unix_release (maybe, without holding unix_state_lock).
>
> Bye,
> Hannes
>
I will send an updated patch shortly.

It may be acceptable given the expectation that sk_set_socket(sk, NULL) 
occurs after SOCK_DEAD flag is set since we would not be here during the 
socket initialization/connection phases. As such, for all phases (and I 
re-iterate, we can only be here if in connected state), it is not a 
generic guarantee of sk_socket != NULL. But I only saw one apparent 
example (in net/decnet/dn_nsp_in.c) of using sock_flag(sk, SOCK_DEAD) as 
protection against a possible deference NULL access with sk_socket, and 
many KISS examples of checking sk_socket for NULL to protect against thus.

Thanks for making me look though, it appears that I missed the same 
problem in net/caif/caif_socket.c and will add it!

Sincerely -- Mark Salyzyn

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: net/unix: sk_socket can disappear when state is unlocked
  2015-05-22 14:51   ` Mark Salyzyn
@ 2015-05-22 15:35     ` Hannes Frederic Sowa
  2015-05-22 16:24       ` Mark Salyzyn
  0 siblings, 1 reply; 7+ messages in thread
From: Hannes Frederic Sowa @ 2015-05-22 15:35 UTC (permalink / raw)
  To: Mark Salyzyn
  Cc: linux-kernel, David S. Miller, Al Viro, David Howells, Ying Xue,
	Christoph Hellwig, netdev

On Fr, 2015-05-22 at 07:51 -0700, Mark Salyzyn wrote:
> On 05/22/2015 02:50 AM, Hannes Frederic Sowa wrote:
> > On Do, 2015-05-21 at 09:25 -0700, Mark Salyzyn wrote:
> >> got a rare NULL pointer dereference in clear_bit
> >>
> >> Signed-off-by: Mark Salyzyn <salyzyn@android.com>
> >> ---
> >>   net/unix/af_unix.c | 5 +++++
> >>   1 file changed, 5 insertions(+)
> >>
> >> diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
> >> index 5266ea7..37a8925 100644
> >> --- a/net/unix/af_unix.c
> >> +++ b/net/unix/af_unix.c
> >> @@ -1880,6 +1880,11 @@ static long unix_stream_data_wait(struct sock *sk, long timeo,
> >>   		unix_state_unlock(sk);
> >>   		timeo = freezable_schedule_timeout(timeo);
> >>   		unix_state_lock(sk);
> >> +
> >> +		/* sk_socket may have been killed while unlocked */
> >> +		if (!sk->sk_socket)
> >> +			break;
> >> +
> >>   		clear_bit(SOCK_ASYNC_WAITDATA, &sk->sk_socket->flags);
> >>   	}
> >>   
> > Canonical way is to test for sock_flag(sk, SOCK_DEAD). Also it does not
> > seem like we are returning an error to user space but are still looping
> > to try to dequeue skbs from sk_receive_queue, which is concurrently
> > emptied by unix_release (maybe, without holding unix_state_lock).
> >
> > Bye,
> > Hannes
> >
> I will send an updated patch shortly.
> 
> It may be acceptable given the expectation that sk_set_socket(sk, NULL) 
> occurs after SOCK_DEAD flag is set since we would not be here during the 
> socket initialization/connection phases. As such, for all phases (and I 
> re-iterate, we can only be here if in connected state), it is not a 
> generic guarantee of sk_socket != NULL. But I only saw one apparent 
> example (in net/decnet/dn_nsp_in.c) of using sock_flag(sk, SOCK_DEAD) as 
> protection against a possible deference NULL access with sk_socket, and 
> many KISS examples of checking sk_socket for NULL to protect against thus.
> 
> Thanks for making me look though, it appears that I missed the same 
> problem in net/caif/caif_socket.c and will add it!

Thank you for v2 of the patch.

I still wonder if we need to actually recheck the condition and not
simply break out of unix_stream_data_wait:

We return to the unix_stream_recvmsg loop and recheck the
sk_receive_queue. At this point sk_receive_queue is not really protected
with unix_state_lock against concurrent modification with unix_release,
as such we could end up concurrently dequeueing packets if socket is
DEAD.

Does that make sense?

Thanks,
Hannes

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: net/unix: sk_socket can disappear when state is unlocked
  2015-05-22 15:35     ` Hannes Frederic Sowa
@ 2015-05-22 16:24       ` Mark Salyzyn
  2015-05-22 18:16         ` Hannes Frederic Sowa
  0 siblings, 1 reply; 7+ messages in thread
From: Mark Salyzyn @ 2015-05-22 16:24 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: linux-kernel, David S. Miller, Al Viro, David Howells, Ying Xue,
	Christoph Hellwig, netdev

On 05/22/2015 08:35 AM, Hannes Frederic Sowa wrote:
> I still wonder if we need to actually recheck the condition and not
> simply break out of unix_stream_data_wait:
>
> We return to the unix_stream_recvmsg loop and recheck the
> sk_receive_queue. At this point sk_receive_queue is not really protected
> with unix_state_lock against concurrent modification with unix_release,
> as such we could end up concurrently dequeueing packets if socket is
> DEAD.
sock destroy(sic) is called before sock_orphan which sets SOCK_DEAD, so 
the receive queue has already been drained.

Sincerely -- Mark Salyzyn

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: net/unix: sk_socket can disappear when state is unlocked
  2015-05-22 16:24       ` Mark Salyzyn
@ 2015-05-22 18:16         ` Hannes Frederic Sowa
  2015-05-22 19:59           ` Mark Salyzyn
  0 siblings, 1 reply; 7+ messages in thread
From: Hannes Frederic Sowa @ 2015-05-22 18:16 UTC (permalink / raw)
  To: Mark Salyzyn, Hannes Frederic Sowa
  Cc: linux-kernel, David S. Miller, Al Viro, David Howells, Ying Xue,
	Christoph Hellwig, netdev

On Fri, May 22, 2015, at 18:24, Mark Salyzyn wrote:
> On 05/22/2015 08:35 AM, Hannes Frederic Sowa wrote:
> > I still wonder if we need to actually recheck the condition and not
> > simply break out of unix_stream_data_wait:
> >
> > We return to the unix_stream_recvmsg loop and recheck the
> > sk_receive_queue. At this point sk_receive_queue is not really protected
> > with unix_state_lock against concurrent modification with unix_release,
> > as such we could end up concurrently dequeueing packets if socket is
> > DEAD.
> sock destroy(sic) is called before sock_orphan which sets SOCK_DEAD, so 
> the receive queue has already been drained.

I am still afraid that there is a race:

When we break out in unix_stream_data_wait we most of the time hit the
continue statement in unix_stream_recvmsg. Albeit we acquired state lock
again, we could end up in a situation where the sk_receive_queue is not
completely drained. We would miss the recheck of the sk_shutdown mask,
because it is possible we dequeue a non-null skb from the receive queue.
This is because unix_release_sock acquires state lock, sets appropriate
flags but the draining of the receive queue does happen without locks,
state lock is unlocked before that. So theoretically both, release_sock
and recvmsg could dequeue skbs concurrently in nondeterministic
behavior.

The fix would be to recheck SOCK_DEAD or even better, sk_shutdown right
after we reacquired state_lock and break out of the loop altogether,
maybe with -ECONNRESET.

Thanks,
Hannes

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: net/unix: sk_socket can disappear when state is unlocked
  2015-05-22 18:16         ` Hannes Frederic Sowa
@ 2015-05-22 19:59           ` Mark Salyzyn
  0 siblings, 0 replies; 7+ messages in thread
From: Mark Salyzyn @ 2015-05-22 19:59 UTC (permalink / raw)
  To: Hannes Frederic Sowa, Hannes Frederic Sowa
  Cc: linux-kernel, David S. Miller, Al Viro, David Howells, Ying Xue,
	Christoph Hellwig, netdev

On 05/22/2015 11:16 AM, Hannes Frederic Sowa wrote:
> On Fri, May 22, 2015, at 18:24, Mark Salyzyn wrote:
>> On 05/22/2015 08:35 AM, Hannes Frederic Sowa wrote:
>>> I still wonder if we need to actually recheck the condition and not
>>> simply break out of unix_stream_data_wait:
>>>
>>> We return to the unix_stream_recvmsg loop and recheck the
>>> sk_receive_queue. At this point sk_receive_queue is not really protected
>>> with unix_state_lock against concurrent modification with unix_release,
>>> as such we could end up concurrently dequeueing packets if socket is
>>> DEAD.
>> sock destroy(sic) is called before sock_orphan which sets SOCK_DEAD, so
>> the receive queue has already been drained.
> I am still afraid that there is a race:
>
> When we break out in unix_stream_data_wait we most of the time hit the
> continue statement in unix_stream_recvmsg. Albeit we acquired state lock
> again, we could end up in a situation where the sk_receive_queue is not
> completely drained. We would miss the recheck of the sk_shutdown mask,
> because it is possible we dequeue a non-null skb from the receive queue.
> This is because unix_release_sock acquires state lock, sets appropriate
> flags but the draining of the receive queue does happen without locks,
> state lock is unlocked before that. So theoretically both, release_sock
> and recvmsg could dequeue skbs concurrently in nondeterministic
> behavior.
>
> The fix would be to recheck SOCK_DEAD or even better, sk_shutdown right
> after we reacquired state_lock and break out of the loop altogether,
> maybe with -ECONNRESET.
>
> Thanks,
> Hannes
I am trying to figure out _how_ to appease your worries.

Keep in mind what I hit was rare already, and resulted in a panic. 
Nondeterministic packet delivery during shutdown is a given, but if I 
buy that one can receive another frame after packet flush and 
RCV_SHUTDOWN, and SOCK_DEAD is set under lock then returning to the 
thread in wait, would you be more comfortable with:

         do {
                 int chunk;
                 struct sk_buff *skb, *last;

                 unix_state_lock(sk);
                 last = skb = skb_peek(&sk->sk_receive_queue);
  again:
-               if (skb == NULL) {
+               if (!skb || sock_flag(sk, SOCK_DEAD)) {
                         unix_sk(sk)->recursion_level = 0;
                         if (copied >= target)
                                 goto unlock;

- or -

+               skb = NULL;
+               if (!sock_flag(sk, SOCK_DEAD))   // check after loop, 
but not in again loop?
+                       skb = skb_peek(&sk->sk_receive_queue
+               last = skb;

I know this does not give you -ECONNRESET (but we will we get 
sock_error(sk) disposition, another check for sock_flag if err == 0 
could fix that)

Too far to deal with nondeterministic packet flow? getting a last packet 
or not does not seem worth the cycles of CPU trouble?

Sincerely -- Mark Salyzyn

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2015-05-22 19:59 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-05-21 16:25 net/unix: sk_socket can disappear when state is unlocked Mark Salyzyn
2015-05-22  9:50 ` Hannes Frederic Sowa
2015-05-22 14:51   ` Mark Salyzyn
2015-05-22 15:35     ` Hannes Frederic Sowa
2015-05-22 16:24       ` Mark Salyzyn
2015-05-22 18:16         ` Hannes Frederic Sowa
2015-05-22 19:59           ` Mark Salyzyn

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).