NULL primary

All of lore.kernel.org
 help / color / mirror / Atom feed

* NULL primary_path
@ 2013-03-06 22:57 Karl Heiss
  2013-03-07  1:53 ` Vlad Yasevich
                   ` (23 more replies)
  0 siblings, 24 replies; 25+ messages in thread
From: Karl Heiss @ 2013-03-06 22:57 UTC (permalink / raw)
  To: linux-sctp

I am getting kernel panics due to a NULL dereference in
sctp_getsockopt_sctp_status() when calling getsockopt() with
SCTP_STATUS immediately after establishing a connection.  This occurs
when transport = asoc->peer.primary_path; is NULL and transport is
later dereferenced.  Is there any way that an association would be
present but have no primary_path?  Should
sctp_getsockopt_sctp_status() be checking asoc->peer.primary_path and
returning -EINVAL?

Karl

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: NULL primary_path
  2013-03-06 22:57 NULL primary_path Karl Heiss
@ 2013-03-07  1:53 ` Vlad Yasevich
  2013-03-07 14:52 ` Cristian Constantin
                   ` (22 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Vlad Yasevich @ 2013-03-07  1:53 UTC (permalink / raw)
  To: linux-sctp

On 03/06/2013 05:57 PM, Karl Heiss wrote:
> I am getting kernel panics due to a NULL dereference in
> sctp_getsockopt_sctp_status() when calling getsockopt() with
> SCTP_STATUS immediately after establishing a connection.  This occurs
> when transport = asoc->peer.primary_path; is NULL and transport is
> later dereferenced.  Is there any way that an association would be
> present but have no primary_path?

No, that shouldn't happen.  The very first transport that is added
to the association is assigned to the primary_path.  Primary_path can
never be null since the association must have at least 1 transport and
that 1 transport will always be primary.

Is this happening on the server or the client side?

Which kernel version?

Is Add-IP on and are there any Add-IP options in the packets?

Thanks
-vlad

>  Should
> sctp_getsockopt_sctp_status() be checking asoc->peer.primary_path and
> returning -EINVAL?
>
> Karl
> --
> To unsubscribe from this list: send the line "unsubscribe linux-sctp" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: NULL primary_path
  2013-03-06 22:57 NULL primary_path Karl Heiss
  2013-03-07  1:53 ` Vlad Yasevich
@ 2013-03-07 14:52 ` Cristian Constantin
  2013-03-07 15:29 ` Neil Horman
                   ` (21 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Cristian Constantin @ 2013-03-07 14:52 UTC (permalink / raw)
  To: linux-sctp

On Wed, Mar 06, 2013 at 08:53:02PM -0500, Vlad Yasevich wrote:
> On 03/06/2013 05:57 PM, Karl Heiss wrote:
> >I am getting kernel panics due to a NULL dereference in
> >sctp_getsockopt_sctp_status() when calling getsockopt() with
> >SCTP_STATUS immediately after establishing a connection.  This occurs
> >when transport = asoc->peer.primary_path; is NULL and transport is
> >later dereferenced.  Is there any way that an association would be
> >present but have no primary_path?
> 
> No, that shouldn't happen.  The very first transport that is added
> to the association is assigned to the primary_path.  Primary_path can
> never be null since the association must have at least 1 transport and
> that 1 transport will always be primary.

cristian: hi!

but, vlad, at least in 2.6.30.2 (which I have on my hdd with
the "tags" already computed), sctp_getsockopt_primary_addr() is a little
bit more conservative and does:
    
    ...
    asoc = sctp_id2assoc(sk, prim.ssp_assoc_id);
    if (!asoc)
        return -EINVAL;

    if (!asoc->peer.primary_path)
        return -ENOTCONN;
    ...

any idea if is there any big difference btw. the two getsockopt()s? behind
them, everything is assumed to be asynch., right? (in the sense that 
the assoc. can come and go at any time)

thanks!
bye now!
cristian

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: NULL primary_path
  2013-03-06 22:57 NULL primary_path Karl Heiss
  2013-03-07  1:53 ` Vlad Yasevich
  2013-03-07 14:52 ` Cristian Constantin
@ 2013-03-07 15:29 ` Neil Horman
  2013-03-07 15:44 ` Vlad Yasevich
                   ` (20 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Neil Horman @ 2013-03-07 15:29 UTC (permalink / raw)
  To: linux-sctp

On Thu, Mar 07, 2013 at 03:52:35PM +0100, Cristian Constantin wrote:
> On Wed, Mar 06, 2013 at 08:53:02PM -0500, Vlad Yasevich wrote:
> > On 03/06/2013 05:57 PM, Karl Heiss wrote:
> > >I am getting kernel panics due to a NULL dereference in
> > >sctp_getsockopt_sctp_status() when calling getsockopt() with
> > >SCTP_STATUS immediately after establishing a connection.  This occurs
> > >when transport = asoc->peer.primary_path; is NULL and transport is
> > >later dereferenced.  Is there any way that an association would be
> > >present but have no primary_path?
> > 
> > No, that shouldn't happen.  The very first transport that is added
> > to the association is assigned to the primary_path.  Primary_path can
> > never be null since the association must have at least 1 transport and
> > that 1 transport will always be primary.
> 
> cristian: hi!
> 
> but, vlad, at least in 2.6.30.2 (which I have on my hdd with
> the "tags" already computed), sctp_getsockopt_primary_addr() is a little
> bit more conservative and does:
>     
>     ...
>     asoc = sctp_id2assoc(sk, prim.ssp_assoc_id);
>     if (!asoc)
>         return -EINVAL;
> 
>     if (!asoc->peer.primary_path)
>         return -ENOTCONN;
>     ...
> 
> any idea if is there any big difference btw. the two getsockopt()s? behind
> them, everything is assumed to be asynch., right? (in the sense that 
> the assoc. can come and go at any time)
> 
All paths that touch primary_path hold the associated socket lock for the socket
that owns the transport path, so there shouldn't be any changing of the primary
path while accessing it here, at least not in the upstream kernel.

Regards
Neil

> thanks!
> bye now!
> cristian
> --
> To unsubscribe from this list: send the line "unsubscribe linux-sctp" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: NULL primary_path
  2013-03-06 22:57 NULL primary_path Karl Heiss
                   ` (2 preceding siblings ...)
  2013-03-07 15:29 ` Neil Horman
@ 2013-03-07 15:44 ` Vlad Yasevich
  2013-03-07 15:48 ` Vlad Yasevich
                   ` (19 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Vlad Yasevich @ 2013-03-07 15:44 UTC (permalink / raw)
  To: linux-sctp

On 03/07/2013 10:29 AM, Neil Horman wrote:
> On Thu, Mar 07, 2013 at 03:52:35PM +0100, Cristian Constantin wrote:
>> On Wed, Mar 06, 2013 at 08:53:02PM -0500, Vlad Yasevich wrote:
>>> On 03/06/2013 05:57 PM, Karl Heiss wrote:
>>>> I am getting kernel panics due to a NULL dereference in
>>>> sctp_getsockopt_sctp_status() when calling getsockopt() with
>>>> SCTP_STATUS immediately after establishing a connection.  This occurs
>>>> when transport = asoc->peer.primary_path; is NULL and transport is
>>>> later dereferenced.  Is there any way that an association would be
>>>> present but have no primary_path?
>>>
>>> No, that shouldn't happen.  The very first transport that is added
>>> to the association is assigned to the primary_path.  Primary_path can
>>> never be null since the association must have at least 1 transport and
>>> that 1 transport will always be primary.
>>
>> cristian: hi!
>>
>> but, vlad, at least in 2.6.30.2 (which I have on my hdd with
>> the "tags" already computed), sctp_getsockopt_primary_addr() is a little
>> bit more conservative and does:
>>
>>      ...
>>      asoc = sctp_id2assoc(sk, prim.ssp_assoc_id);
>>      if (!asoc)
>>          return -EINVAL;
>>
>>      if (!asoc->peer.primary_path)
>>          return -ENOTCONN;
>>      ...
>>
>> any idea if is there any big difference btw. the two getsockopt()s? behind
>> them, everything is assumed to be asynch., right? (in the sense that
>> the assoc. can come and go at any time)
>>
> All paths that touch primary_path hold the associated socket lock for the socket
> that owns the transport path, so there shouldn't be any changing of the primary
> path while accessing it here, at least not in the upstream kernel.

Right.  I don't this that condition is possible.  Association can't live 
without at least 1 transport and that 1 transport will always be primary.

-vlad

>
> Regards
> Neil
>
>> thanks!
>> bye now!
>> cristian
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-sctp" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: NULL primary_path
  2013-03-06 22:57 NULL primary_path Karl Heiss
                   ` (3 preceding siblings ...)
  2013-03-07 15:44 ` Vlad Yasevich
@ 2013-03-07 15:48 ` Vlad Yasevich
  2013-03-07 17:06 ` Karl Heiss
                   ` (18 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Vlad Yasevich @ 2013-03-07 15:48 UTC (permalink / raw)
  To: linux-sctp

On 03/06/2013 08:53 PM, Vlad Yasevich wrote:
> On 03/06/2013 05:57 PM, Karl Heiss wrote:
>> I am getting kernel panics due to a NULL dereference in
>> sctp_getsockopt_sctp_status() when calling getsockopt() with
>> SCTP_STATUS immediately after establishing a connection.  This occurs
>> when transport = asoc->peer.primary_path; is NULL and transport is
>> later dereferenced.  Is there any way that an association would be
>> present but have no primary_path?
>
> No, that shouldn't happen.  The very first transport that is added
> to the association is assigned to the primary_path.  Primary_path can
> never be null since the association must have at least 1 transport and
> that 1 transport will always be primary.
>
> Is this happening on the server or the client side?
>
> Which kernel version?
>
> Is Add-IP on and are there any Add-IP options in the packets?

Also, are you using SOCK_STREAM or SOCK_SEQPACKET sockets?

Thanks
-vlad

>
> Thanks
> -vlad
>
>>  Should
>> sctp_getsockopt_sctp_status() be checking asoc->peer.primary_path and
>> returning -EINVAL?
>>
>> Karl
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-sctp" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: NULL primary_path
  2013-03-06 22:57 NULL primary_path Karl Heiss
                   ` (4 preceding siblings ...)
  2013-03-07 15:48 ` Vlad Yasevich
@ 2013-03-07 17:06 ` Karl Heiss
  2013-03-07 17:17 ` Vlad Yasevich
                   ` (17 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Karl Heiss @ 2013-03-07 17:06 UTC (permalink / raw)
  To: linux-sctp

The issue appears to manifest itself when the connection is closed
from the remote end and getsockopt(SCTP_STATUS) is called within a
small window in which the association is still valid but
asoc->peer.primary_path is NULL.

On Thu, Mar 7, 2013 at 10:48 AM, Vlad Yasevich <vyasevich@gmail.com> wrote:
> On 03/06/2013 08:53 PM, Vlad Yasevich wrote:
>>
>> On 03/06/2013 05:57 PM, Karl Heiss wrote:
>>>
>>> I am getting kernel panics due to a NULL dereference in
>>> sctp_getsockopt_sctp_status() when calling getsockopt() with
>>> SCTP_STATUS immediately after establishing a connection.  This occurs
>>> when transport = asoc->peer.primary_path; is NULL and transport is
>>> later dereferenced.  Is there any way that an association would be
>>> present but have no primary_path?
>>
>>
>> No, that shouldn't happen.  The very first transport that is added
>> to the association is assigned to the primary_path.  Primary_path can
>> never be null since the association must have at least 1 transport and
>> that 1 transport will always be primary.
>>
>> Is this happening on the server or the client side?
>>
>> Which kernel version?
>>
>> Is Add-IP on and are there any Add-IP options in the packets?
>
>
> Also, are you using SOCK_STREAM or SOCK_SEQPACKET sockets?
>
> Thanks
> -vlad
>
>
>>
>> Thanks
>> -vlad
>>
>>>  Should
>>> sctp_getsockopt_sctp_status() be checking asoc->peer.primary_path and
>>> returning -EINVAL?
>>>
>>> Karl
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-sctp" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>
>>
>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: NULL primary_path
  2013-03-06 22:57 NULL primary_path Karl Heiss
                   ` (5 preceding siblings ...)
  2013-03-07 17:06 ` Karl Heiss
@ 2013-03-07 17:17 ` Vlad Yasevich
  2013-03-07 21:51 ` Karl Heiss
                   ` (16 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Vlad Yasevich @ 2013-03-07 17:17 UTC (permalink / raw)
  To: linux-sctp

On 03/07/2013 12:06 PM, Karl Heiss wrote:
> The issue appears to manifest itself when the connection is closed
> from the remote end and getsockopt(SCTP_STATUS) is called within a
> small window in which the association is still valid but
> asoc->peer.primary_path is NULL.

Aha!  Thanks.  There was a bug in the rcu clean-up that allowed the 
association to remain while all transports have been removed.

Here is a patch that should have addressed this condition:

commit 8c98653f05534acd1cb07ea4929702a3659177d1
Author: Daniel Borkmann <dborkman@redhat.com>
Date:   Fri Feb 1 04:37:43 2013 +0000

     sctp: sctp_close: fix release of bindings for deferred call_rcu's

Full patch is here:
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?idŒ98653f05534acd1cb07ea4929702a3659177d1

Make sure that you have this patch in the kernel you are running

-vlad

>
> On Thu, Mar 7, 2013 at 10:48 AM, Vlad Yasevich <vyasevich@gmail.com> wrote:
>> On 03/06/2013 08:53 PM, Vlad Yasevich wrote:
>>>
>>> On 03/06/2013 05:57 PM, Karl Heiss wrote:
>>>>
>>>> I am getting kernel panics due to a NULL dereference in
>>>> sctp_getsockopt_sctp_status() when calling getsockopt() with
>>>> SCTP_STATUS immediately after establishing a connection.  This occurs
>>>> when transport = asoc->peer.primary_path; is NULL and transport is
>>>> later dereferenced.  Is there any way that an association would be
>>>> present but have no primary_path?
>>>
>>>
>>> No, that shouldn't happen.  The very first transport that is added
>>> to the association is assigned to the primary_path.  Primary_path can
>>> never be null since the association must have at least 1 transport and
>>> that 1 transport will always be primary.
>>>
>>> Is this happening on the server or the client side?
>>>
>>> Which kernel version?
>>>
>>> Is Add-IP on and are there any Add-IP options in the packets?
>>
>>
>> Also, are you using SOCK_STREAM or SOCK_SEQPACKET sockets?
>>
>> Thanks
>> -vlad
>>
>>
>>>
>>> Thanks
>>> -vlad
>>>
>>>>   Should
>>>> sctp_getsockopt_sctp_status() be checking asoc->peer.primary_path and
>>>> returning -EINVAL?
>>>>
>>>> Karl
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe linux-sctp" in
>>>> the body of a message to majordomo@vger.kernel.org
>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>
>>>
>>


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: NULL primary_path
  2013-03-06 22:57 NULL primary_path Karl Heiss
                   ` (6 preceding siblings ...)
  2013-03-07 17:17 ` Vlad Yasevich
@ 2013-03-07 21:51 ` Karl Heiss
  2013-03-07 22:08 ` Vlad Yasevich
                   ` (15 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Karl Heiss @ 2013-03-07 21:51 UTC (permalink / raw)
  To: linux-sctp

On Thu, Mar 7, 2013 at 12:17 PM, Vlad Yasevich <vyasevich@gmail.com> wrote:
> On 03/07/2013 12:06 PM, Karl Heiss wrote:
>>
>> The issue appears to manifest itself when the connection is closed
>> from the remote end and getsockopt(SCTP_STATUS) is called within a
>> small window in which the association is still valid but
>> asoc->peer.primary_path is NULL.
>
>
> Aha!  Thanks.  There was a bug in the rcu clean-up that allowed the
> association to remain while all transports have been removed.
>
> Here is a patch that should have addressed this condition:
>
> commit 8c98653f05534acd1cb07ea4929702a3659177d1
> Author: Daniel Borkmann <dborkman@redhat.com>
> Date:   Fri Feb 1 04:37:43 2013 +0000
>
>     sctp: sctp_close: fix release of bindings for deferred call_rcu's
>
> Full patch is here:
> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?idŒ98653f05534acd1cb07ea4929702a3659177d1
>
> Make sure that you have this patch in the kernel you are running
>
> -vlad
>
>
>>

Unfortunately this patch wont apply to the version of the SCTP stack
that we are using (2.6.36.2) since it does not have a
sctp_transport_destroy_rcu() function.  Is there any chance that
simply swapping the order of the instructions without moving them
would have any effect?  I ask this hypothetically because the race
condition window seems to be difficult to recreate, thus nothing to
test against (aside from in the field!).

Karl

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: NULL primary_path
  2013-03-06 22:57 NULL primary_path Karl Heiss
                   ` (7 preceding siblings ...)
  2013-03-07 21:51 ` Karl Heiss
@ 2013-03-07 22:08 ` Vlad Yasevich
  2013-03-07 23:09 ` Vlad Yasevich
                   ` (14 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Vlad Yasevich @ 2013-03-07 22:08 UTC (permalink / raw)
  To: linux-sctp

On 03/07/2013 04:51 PM, Karl Heiss wrote:
> On Thu, Mar 7, 2013 at 12:17 PM, Vlad Yasevich <vyasevich@gmail.com> wrote:
>> On 03/07/2013 12:06 PM, Karl Heiss wrote:
>>>
>>> The issue appears to manifest itself when the connection is closed
>>> from the remote end and getsockopt(SCTP_STATUS) is called within a
>>> small window in which the association is still valid but
>>> asoc->peer.primary_path is NULL.
>>
>>
>> Aha!  Thanks.  There was a bug in the rcu clean-up that allowed the
>> association to remain while all transports have been removed.
>>
>> Here is a patch that should have addressed this condition:
>>
>> commit 8c98653f05534acd1cb07ea4929702a3659177d1
>> Author: Daniel Borkmann <dborkman@redhat.com>
>> Date:   Fri Feb 1 04:37:43 2013 +0000
>>
>>      sctp: sctp_close: fix release of bindings for deferred call_rcu's
>>
>> Full patch is here:
>> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?idŒ98653f05534acd1cb07ea4929702a3659177d1
>>
>> Make sure that you have this patch in the kernel you are running
>>
>> -vlad
>>
>>
>>>
>
> Unfortunately this patch wont apply to the version of the SCTP stack
> that we are using (2.6.36.2) since it does not have a
> sctp_transport_destroy_rcu() function.  Is there any chance that
> simply swapping the order of the instructions without moving them
> would have any effect?  I ask this hypothetically because the race
> condition window seems to be difficult to recreate, thus nothing to
> test against (aside from in the field!).
>
> Karl
>

I see.  Let me take another look.  If this race was happening before the
rcu, then it'll probably be there after.
If you don't hear anything for a few days, pester me.

Thanks
-vlad

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: NULL primary_path
  2013-03-06 22:57 NULL primary_path Karl Heiss
                   ` (8 preceding siblings ...)
  2013-03-07 22:08 ` Vlad Yasevich
@ 2013-03-07 23:09 ` Vlad Yasevich
  2013-03-08 13:52 ` Karl Heiss
                   ` (13 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Vlad Yasevich @ 2013-03-07 23:09 UTC (permalink / raw)
  To: linux-sctp

On 03/07/2013 04:51 PM, Karl Heiss wrote:
> On Thu, Mar 7, 2013 at 12:17 PM, Vlad Yasevich <vyasevich@gmail.com> wrote:
>> On 03/07/2013 12:06 PM, Karl Heiss wrote:
>>>
>>> The issue appears to manifest itself when the connection is closed
>>> from the remote end and getsockopt(SCTP_STATUS) is called within a
>>> small window in which the association is still valid but
>>> asoc->peer.primary_path is NULL.
>>
>>
>> Aha!  Thanks.  There was a bug in the rcu clean-up that allowed the
>> association to remain while all transports have been removed.
>>
>> Here is a patch that should have addressed this condition:
>>
>> commit 8c98653f05534acd1cb07ea4929702a3659177d1
>> Author: Daniel Borkmann <dborkman@redhat.com>
>> Date:   Fri Feb 1 04:37:43 2013 +0000
>>
>>      sctp: sctp_close: fix release of bindings for deferred call_rcu's
>>
>> Full patch is here:
>> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?idŒ98653f05534acd1cb07ea4929702a3659177d1
>>
>> Make sure that you have this patch in the kernel you are running
>>
>> -vlad
>>
>>
>>>
>
> Unfortunately this patch wont apply to the version of the SCTP stack
> that we are using (2.6.36.2) since it does not have a
> sctp_transport_destroy_rcu() function.  Is there any chance that
> simply swapping the order of the instructions without moving them
> would have any effect?  I ask this hypothetically because the race
> condition window seems to be difficult to recreate, thus nothing to
> test against (aside from in the field!).
>
> Karl
>

Hi Karl

I think I see the problem now.  The problem happens when the association 
is destroyed.  We delay removing the association from
the association id pool until all references on the association
have dropped.  As a result, it is possible (for a very short
period of time) for an association structure to still exist in
the kernel and still be found via the association id, but that 
association has no transports and is about to be completely destroyed.

This is a really interesting race and I need to figure out if it is
there on purpose or not?

In the mean time, here is a patch that should solve it for you.

diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index b907073..2d92c89 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -223,7 +223,7 @@ struct sctp_association *sctp_id2assoc(struct sock 
*sk, sctp_assoc_t id)
		if (!list_empty(&sctp_sk(sk)->ep->asocs))
			asoc = list_entry(sctp_sk(sk)->ep->asocs.next,
					  struct sctp_association, asocs);
-		return asoc;
+		goto done;
	}

	/* Otherwise this is a UDP-style socket. */
@@ -234,6 +234,7 @@ struct sctp_association *sctp_id2assoc(struct sock 
*sk, sctp_assoc_t id)
	asoc = (struct sctp_association *)idr_find(&sctp_assocs_id, (int)id);
	spin_unlock_bh(&sctp_assocs_id_lock);

+done:
	if (!asoc || (asoc->base.sk != sk) || asoc->base.dead)
		return NULL;


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: NULL primary_path
  2013-03-06 22:57 NULL primary_path Karl Heiss
                   ` (9 preceding siblings ...)
  2013-03-07 23:09 ` Vlad Yasevich
@ 2013-03-08 13:52 ` Karl Heiss
  2013-03-08 14:31 ` Karl Heiss
                   ` (12 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Karl Heiss @ 2013-03-08 13:52 UTC (permalink / raw)
  To: linux-sctp

On Thu, Mar 7, 2013 at 6:09 PM, Vlad Yasevich <vyasevich@gmail.com> wrote:
> On 03/07/2013 04:51 PM, Karl Heiss wrote:
>>
>> On Thu, Mar 7, 2013 at 12:17 PM, Vlad Yasevich <vyasevich@gmail.com>
>> wrote:
>>>
>>> On 03/07/2013 12:06 PM, Karl Heiss wrote:
>>>>
>>>>
>>>> The issue appears to manifest itself when the connection is closed
>>>> from the remote end and getsockopt(SCTP_STATUS) is called within a
>>>> small window in which the association is still valid but
>>>> asoc->peer.primary_path is NULL.
>>>
>>>
>>>
>>> Aha!  Thanks.  There was a bug in the rcu clean-up that allowed the
>>> association to remain while all transports have been removed.
>>>
>>> Here is a patch that should have addressed this condition:
>>>
>>> commit 8c98653f05534acd1cb07ea4929702a3659177d1
>>> Author: Daniel Borkmann <dborkman@redhat.com>
>>> Date:   Fri Feb 1 04:37:43 2013 +0000
>>>
>>>      sctp: sctp_close: fix release of bindings for deferred call_rcu's
>>>
>>> Full patch is here:
>>>
>>> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?idŒ98653f05534acd1cb07ea4929702a3659177d1
>>>
>>> Make sure that you have this patch in the kernel you are running
>>>
>>> -vlad
>>>
>>>
>>>>
>>
>> Unfortunately this patch wont apply to the version of the SCTP stack
>> that we are using (2.6.36.2) since it does not have a
>> sctp_transport_destroy_rcu() function.  Is there any chance that
>> simply swapping the order of the instructions without moving them
>> would have any effect?  I ask this hypothetically because the race
>> condition window seems to be difficult to recreate, thus nothing to
>> test against (aside from in the field!).
>>
>> Karl
>>
>
> Hi Karl
>
> I think I see the problem now.  The problem happens when the association is
> destroyed.  We delay removing the association from
> the association id pool until all references on the association
> have dropped.  As a result, it is possible (for a very short
> period of time) for an association structure to still exist in
> the kernel and still be found via the association id, but that association
> has no transports and is about to be completely destroyed.
>
> This is a really interesting race and I need to figure out if it is
> there on purpose or not?
>
> In the mean time, here is a patch that should solve it for you.
>
> diff --git a/net/sctp/socket.c b/net/sctp/socket.c
> index b907073..2d92c89 100644
> --- a/net/sctp/socket.c
> +++ b/net/sctp/socket.c
> @@ -223,7 +223,7 @@ struct sctp_association *sctp_id2assoc(struct sock *sk,
> sctp_assoc_t id)
>                 if (!list_empty(&sctp_sk(sk)->ep->asocs))
>                         asoc = list_entry(sctp_sk(sk)->ep->asocs.next,
>                                           struct sctp_association, asocs);
> -               return asoc;
> +               goto done;
>         }
>
>         /* Otherwise this is a UDP-style socket. */
> @@ -234,6 +234,7 @@ struct sctp_association *sctp_id2assoc(struct sock *sk,
> sctp_assoc_t id)
>         asoc = (struct sctp_association *)idr_find(&sctp_assocs_id,
> (int)id);
>         spin_unlock_bh(&sctp_assocs_id_lock);
>
> +done:
>         if (!asoc || (asoc->base.sk != sk) || asoc->base.dead)
>                 return NULL;
>

Vlad,

Looking at the kdump from the panic, I am seeing that your patch above
may not work in this case since the asoc is valid, the base.sk is
valid, and base.dead is 0.  Unless base.sk is valid but doesn't match
sk, this wouldn't appear to fix this issue.

Karl

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: NULL primary_path
  2013-03-06 22:57 NULL primary_path Karl Heiss
                   ` (10 preceding siblings ...)
  2013-03-08 13:52 ` Karl Heiss
@ 2013-03-08 14:31 ` Karl Heiss
  2013-03-08 14:35 ` Neil Horman
                   ` (11 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Karl Heiss @ 2013-03-08 14:31 UTC (permalink / raw)
  To: linux-sctp

On Fri, Mar 8, 2013 at 8:52 AM, Karl Heiss <kheiss@gmail.com> wrote:
> On Thu, Mar 7, 2013 at 6:09 PM, Vlad Yasevich <vyasevich@gmail.com> wrote:
>> On 03/07/2013 04:51 PM, Karl Heiss wrote:
>>>
>>> On Thu, Mar 7, 2013 at 12:17 PM, Vlad Yasevich <vyasevich@gmail.com>
>>> wrote:
>>>>
>>>> On 03/07/2013 12:06 PM, Karl Heiss wrote:
>>>>>
>>>>>
>>>>> The issue appears to manifest itself when the connection is closed
>>>>> from the remote end and getsockopt(SCTP_STATUS) is called within a
>>>>> small window in which the association is still valid but
>>>>> asoc->peer.primary_path is NULL.
>>>>
>>>>
>>>>
>>>> Aha!  Thanks.  There was a bug in the rcu clean-up that allowed the
>>>> association to remain while all transports have been removed.
>>>>
>>>> Here is a patch that should have addressed this condition:
>>>>
>>>> commit 8c98653f05534acd1cb07ea4929702a3659177d1
>>>> Author: Daniel Borkmann <dborkman@redhat.com>
>>>> Date:   Fri Feb 1 04:37:43 2013 +0000
>>>>
>>>>      sctp: sctp_close: fix release of bindings for deferred call_rcu's
>>>>
>>>> Full patch is here:
>>>>
>>>> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?idŒ98653f05534acd1cb07ea4929702a3659177d1
>>>>
>>>> Make sure that you have this patch in the kernel you are running
>>>>
>>>> -vlad
>>>>
>>>>
>>>>>
>>>
>>> Unfortunately this patch wont apply to the version of the SCTP stack
>>> that we are using (2.6.36.2) since it does not have a
>>> sctp_transport_destroy_rcu() function.  Is there any chance that
>>> simply swapping the order of the instructions without moving them
>>> would have any effect?  I ask this hypothetically because the race
>>> condition window seems to be difficult to recreate, thus nothing to
>>> test against (aside from in the field!).
>>>
>>> Karl
>>>
>>
>> Hi Karl
>>
>> I think I see the problem now.  The problem happens when the association is
>> destroyed.  We delay removing the association from
>> the association id pool until all references on the association
>> have dropped.  As a result, it is possible (for a very short
>> period of time) for an association structure to still exist in
>> the kernel and still be found via the association id, but that association
>> has no transports and is about to be completely destroyed.
>>
>> This is a really interesting race and I need to figure out if it is
>> there on purpose or not?
>>
>> In the mean time, here is a patch that should solve it for you.
>>
>> diff --git a/net/sctp/socket.c b/net/sctp/socket.c
>> index b907073..2d92c89 100644
>> --- a/net/sctp/socket.c
>> +++ b/net/sctp/socket.c
>> @@ -223,7 +223,7 @@ struct sctp_association *sctp_id2assoc(struct sock *sk,
>> sctp_assoc_t id)
>>                 if (!list_empty(&sctp_sk(sk)->ep->asocs))
>>                         asoc = list_entry(sctp_sk(sk)->ep->asocs.next,
>>                                           struct sctp_association, asocs);
>> -               return asoc;
>> +               goto done;
>>         }
>>
>>         /* Otherwise this is a UDP-style socket. */
>> @@ -234,6 +234,7 @@ struct sctp_association *sctp_id2assoc(struct sock *sk,
>> sctp_assoc_t id)
>>         asoc = (struct sctp_association *)idr_find(&sctp_assocs_id,
>> (int)id);
>>         spin_unlock_bh(&sctp_assocs_id_lock);
>>
>> +done:
>>         if (!asoc || (asoc->base.sk != sk) || asoc->base.dead)
>>                 return NULL;
>>
>
> Vlad,
>
> Looking at the kdump from the panic, I am seeing that your patch above
> may not work in this case since the asoc is valid, the base.sk is
> valid, and base.dead is 0.  Unless base.sk is valid but doesn't match
> sk, this wouldn't appear to fix this issue.
>
> Karl

Vlad,

One other thing, with the difficulty we are having recreating this
issue, is there any generic way to increase the likelihood for the
transport to be cleared out while delaying the association cleanup?
Is there any way that the association is initialized without any
transport information? The reason I ask; we believe the issue is
happening very shortly after the association is brought up (we bring
it up and then do the getsockopt()).

Thanks,
Karl

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: NULL primary_path
  2013-03-06 22:57 NULL primary_path Karl Heiss
                   ` (11 preceding siblings ...)
  2013-03-08 14:31 ` Karl Heiss
@ 2013-03-08 14:35 ` Neil Horman
  2013-03-08 15:31 ` Vlad Yasevich
                   ` (10 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Neil Horman @ 2013-03-08 14:35 UTC (permalink / raw)
  To: linux-sctp

On Thu, Mar 07, 2013 at 04:51:18PM -0500, Karl Heiss wrote:
> On Thu, Mar 7, 2013 at 12:17 PM, Vlad Yasevich <vyasevich@gmail.com> wrote:
> > On 03/07/2013 12:06 PM, Karl Heiss wrote:
> >>
> >> The issue appears to manifest itself when the connection is closed
> >> from the remote end and getsockopt(SCTP_STATUS) is called within a
> >> small window in which the association is still valid but
> >> asoc->peer.primary_path is NULL.
> >
> >
> > Aha!  Thanks.  There was a bug in the rcu clean-up that allowed the
> > association to remain while all transports have been removed.
> >
> > Here is a patch that should have addressed this condition:
> >
> > commit 8c98653f05534acd1cb07ea4929702a3659177d1
> > Author: Daniel Borkmann <dborkman@redhat.com>
> > Date:   Fri Feb 1 04:37:43 2013 +0000
> >
> >     sctp: sctp_close: fix release of bindings for deferred call_rcu's
> >
> > Full patch is here:
> > http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?idŒ98653f05534acd1cb07ea4929702a3659177d1
> >
> > Make sure that you have this patch in the kernel you are running
> >
> > -vlad
> >
> >
> >>
> 
> Unfortunately this patch wont apply to the version of the SCTP stack
> that we are using (2.6.36.2) since it does not have a
> sctp_transport_destroy_rcu() function.  Is there any chance that
> simply swapping the order of the instructions without moving them
> would have any effect?  I ask this hypothetically because the race
> condition window seems to be difficult to recreate, thus nothing to
> test against (aside from in the field!).
> 
> Karl

you also need to backport comit 45122ca26ced7fae41049326a3797a73f961db2e.  You
may also need to massage these patches manually somewhat so that they apply
properly.
Neil

> --
> To unsubscribe from this list: send the line "unsubscribe linux-sctp" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: NULL primary_path
  2013-03-06 22:57 NULL primary_path Karl Heiss
                   ` (12 preceding siblings ...)
  2013-03-08 14:35 ` Neil Horman
@ 2013-03-08 15:31 ` Vlad Yasevich
  2013-03-08 15:37 ` Karl Heiss
                   ` (9 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Vlad Yasevich @ 2013-03-08 15:31 UTC (permalink / raw)
  To: linux-sctp

On 03/08/2013 09:31 AM, Karl Heiss wrote:
> On Fri, Mar 8, 2013 at 8:52 AM, Karl Heiss <kheiss@gmail.com> wrote:
>> On Thu, Mar 7, 2013 at 6:09 PM, Vlad Yasevich <vyasevich@gmail.com> wrote:
>>> On 03/07/2013 04:51 PM, Karl Heiss wrote:
>>>>
>>>> On Thu, Mar 7, 2013 at 12:17 PM, Vlad Yasevich <vyasevich@gmail.com>
>>>> wrote:
>>>>>
>>>>> On 03/07/2013 12:06 PM, Karl Heiss wrote:
>>>>>>
>>>>>>
>>>>>> The issue appears to manifest itself when the connection is closed
>>>>>> from the remote end and getsockopt(SCTP_STATUS) is called within a
>>>>>> small window in which the association is still valid but
>>>>>> asoc->peer.primary_path is NULL.
>>>>>
>>>>>
>>>>>
>>>>> Aha!  Thanks.  There was a bug in the rcu clean-up that allowed the
>>>>> association to remain while all transports have been removed.
>>>>>
>>>>> Here is a patch that should have addressed this condition:
>>>>>
>>>>> commit 8c98653f05534acd1cb07ea4929702a3659177d1
>>>>> Author: Daniel Borkmann <dborkman@redhat.com>
>>>>> Date:   Fri Feb 1 04:37:43 2013 +0000
>>>>>
>>>>>       sctp: sctp_close: fix release of bindings for deferred call_rcu's
>>>>>
>>>>> Full patch is here:
>>>>>
>>>>> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?idŒ98653f05534acd1cb07ea4929702a3659177d1
>>>>>
>>>>> Make sure that you have this patch in the kernel you are running
>>>>>
>>>>> -vlad
>>>>>
>>>>>
>>>>>>
>>>>
>>>> Unfortunately this patch wont apply to the version of the SCTP stack
>>>> that we are using (2.6.36.2) since it does not have a
>>>> sctp_transport_destroy_rcu() function.  Is there any chance that
>>>> simply swapping the order of the instructions without moving them
>>>> would have any effect?  I ask this hypothetically because the race
>>>> condition window seems to be difficult to recreate, thus nothing to
>>>> test against (aside from in the field!).
>>>>
>>>> Karl
>>>>
>>>
>>> Hi Karl
>>>
>>> I think I see the problem now.  The problem happens when the association is
>>> destroyed.  We delay removing the association from
>>> the association id pool until all references on the association
>>> have dropped.  As a result, it is possible (for a very short
>>> period of time) for an association structure to still exist in
>>> the kernel and still be found via the association id, but that association
>>> has no transports and is about to be completely destroyed.
>>>
>>> This is a really interesting race and I need to figure out if it is
>>> there on purpose or not?
>>>
>>> In the mean time, here is a patch that should solve it for you.
>>>
>>> diff --git a/net/sctp/socket.c b/net/sctp/socket.c
>>> index b907073..2d92c89 100644
>>> --- a/net/sctp/socket.c
>>> +++ b/net/sctp/socket.c
>>> @@ -223,7 +223,7 @@ struct sctp_association *sctp_id2assoc(struct sock *sk,
>>> sctp_assoc_t id)
>>>                  if (!list_empty(&sctp_sk(sk)->ep->asocs))
>>>                          asoc = list_entry(sctp_sk(sk)->ep->asocs.next,
>>>                                            struct sctp_association, asocs);
>>> -               return asoc;
>>> +               goto done;
>>>          }
>>>
>>>          /* Otherwise this is a UDP-style socket. */
>>> @@ -234,6 +234,7 @@ struct sctp_association *sctp_id2assoc(struct sock *sk,
>>> sctp_assoc_t id)
>>>          asoc = (struct sctp_association *)idr_find(&sctp_assocs_id,
>>> (int)id);
>>>          spin_unlock_bh(&sctp_assocs_id_lock);
>>>
>>> +done:
>>>          if (!asoc || (asoc->base.sk != sk) || asoc->base.dead)
>>>                  return NULL;
>>>
>>
>> Vlad,
>>
>> Looking at the kdump from the panic, I am seeing that your patch above
>> may not work in this case since the asoc is valid, the base.sk is
>> valid, and base.dead is 0.  Unless base.sk is valid but doesn't match
>> sk, this wouldn't appear to fix this issue.

Hm..  If the association is not marked "dead", it should still have all 
its transports present.  If you look at the peer.transport_addr_list in
you kdump, is that list empty or not?

Are any other peer transport pointers set (active_path, retran_path)?

>>
>> Karl
>
> Vlad,
>
> One other thing, with the difficulty we are having recreating this
> issue, is there any generic way to increase the likelihood for the
> transport to be cleared out while delaying the association cleanup?
> Is there any way that the association is initialized without any
> transport information?

When the association is initialized, the lists are empty, but the next
thing that happens is that we add transport of the destination we are
sending to or receiving from to the association and mark it as primary 
and active.  All this happens under a socket lock, so getsockopt can't
access the association until all actions on that association complete.

> The reason I ask; we believe the issue is
> happening very shortly after the association is brought up (we bring
> it up and then do the getsockopt()).

Can you check what the association state is?  Alternately, can you 
provide the kdump and the kernel so I can dig around.

Thanks
-vlad
>
> Thanks,
> Karl
>


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: NULL primary_path
  2013-03-06 22:57 NULL primary_path Karl Heiss
                   ` (13 preceding siblings ...)
  2013-03-08 15:31 ` Vlad Yasevich
@ 2013-03-08 15:37 ` Karl Heiss
  2013-03-08 16:42 ` Vlad Yasevich
                   ` (8 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Karl Heiss @ 2013-03-08 15:37 UTC (permalink / raw)
  To: linux-sctp

On Fri, Mar 8, 2013 at 10:31 AM, Vlad Yasevich <vyasevich@gmail.com> wrote:
> On 03/08/2013 09:31 AM, Karl Heiss wrote:
>>
>> On Fri, Mar 8, 2013 at 8:52 AM, Karl Heiss <kheiss@gmail.com> wrote:
>>>
>>> On Thu, Mar 7, 2013 at 6:09 PM, Vlad Yasevich <vyasevich@gmail.com>
>>> wrote:
>>>>
>>>> On 03/07/2013 04:51 PM, Karl Heiss wrote:
>>>>>
>>>>>
>>>>> On Thu, Mar 7, 2013 at 12:17 PM, Vlad Yasevich <vyasevich@gmail.com>
>>>>> wrote:
>>>>>>
>>>>>>
>>>>>> On 03/07/2013 12:06 PM, Karl Heiss wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> The issue appears to manifest itself when the connection is closed
>>>>>>> from the remote end and getsockopt(SCTP_STATUS) is called within a
>>>>>>> small window in which the association is still valid but
>>>>>>> asoc->peer.primary_path is NULL.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> Aha!  Thanks.  There was a bug in the rcu clean-up that allowed the
>>>>>> association to remain while all transports have been removed.
>>>>>>
>>>>>> Here is a patch that should have addressed this condition:
>>>>>>
>>>>>> commit 8c98653f05534acd1cb07ea4929702a3659177d1
>>>>>> Author: Daniel Borkmann <dborkman@redhat.com>
>>>>>> Date:   Fri Feb 1 04:37:43 2013 +0000
>>>>>>
>>>>>>       sctp: sctp_close: fix release of bindings for deferred
>>>>>> call_rcu's
>>>>>>
>>>>>> Full patch is here:
>>>>>>
>>>>>>
>>>>>> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?idŒ98653f05534acd1cb07ea4929702a3659177d1
>>>>>>
>>>>>> Make sure that you have this patch in the kernel you are running
>>>>>>
>>>>>> -vlad
>>>>>>
>>>>>>
>>>>>>>
>>>>>
>>>>> Unfortunately this patch wont apply to the version of the SCTP stack
>>>>> that we are using (2.6.36.2) since it does not have a
>>>>> sctp_transport_destroy_rcu() function.  Is there any chance that
>>>>> simply swapping the order of the instructions without moving them
>>>>> would have any effect?  I ask this hypothetically because the race
>>>>> condition window seems to be difficult to recreate, thus nothing to
>>>>> test against (aside from in the field!).
>>>>>
>>>>> Karl
>>>>>
>>>>
>>>> Hi Karl
>>>>
>>>> I think I see the problem now.  The problem happens when the association
>>>> is
>>>> destroyed.  We delay removing the association from
>>>> the association id pool until all references on the association
>>>> have dropped.  As a result, it is possible (for a very short
>>>> period of time) for an association structure to still exist in
>>>> the kernel and still be found via the association id, but that
>>>> association
>>>> has no transports and is about to be completely destroyed.
>>>>
>>>> This is a really interesting race and I need to figure out if it is
>>>> there on purpose or not?
>>>>
>>>> In the mean time, here is a patch that should solve it for you.
>>>>
>>>> diff --git a/net/sctp/socket.c b/net/sctp/socket.c
>>>> index b907073..2d92c89 100644
>>>> --- a/net/sctp/socket.c
>>>> +++ b/net/sctp/socket.c
>>>> @@ -223,7 +223,7 @@ struct sctp_association *sctp_id2assoc(struct sock
>>>> *sk,
>>>> sctp_assoc_t id)
>>>>                  if (!list_empty(&sctp_sk(sk)->ep->asocs))
>>>>                          asoc = list_entry(sctp_sk(sk)->ep->asocs.next,
>>>>                                            struct sctp_association,
>>>> asocs);
>>>> -               return asoc;
>>>> +               goto done;
>>>>          }
>>>>
>>>>          /* Otherwise this is a UDP-style socket. */
>>>> @@ -234,6 +234,7 @@ struct sctp_association *sctp_id2assoc(struct sock
>>>> *sk,
>>>> sctp_assoc_t id)
>>>>          asoc = (struct sctp_association *)idr_find(&sctp_assocs_id,
>>>> (int)id);
>>>>          spin_unlock_bh(&sctp_assocs_id_lock);
>>>>
>>>> +done:
>>>>          if (!asoc || (asoc->base.sk != sk) || asoc->base.dead)
>>>>                  return NULL;
>>>>
>>>
>>> Vlad,
>>>
>>> Looking at the kdump from the panic, I am seeing that your patch above
>>> may not work in this case since the asoc is valid, the base.sk is
>>> valid, and base.dead is 0.  Unless base.sk is valid but doesn't match
>>> sk, this wouldn't appear to fix this issue.
>
>
> Hm..  If the association is not marked "dead", it should still have all its
> transports present.  If you look at the peer.transport_addr_list in
> you kdump, is that list empty or not?
>
> Are any other peer transport pointers set (active_path, retran_path)?
>

crash> p ((struct sctp_association *) 0xffff8107670e3000).peer
$14 = {
  rwnd = 65535,
  transport_addr_list = {
    next = 0xffff8107670e3180,
    prev = 0xffff8107670e3180
  },
  transport_count = 0,
  port = 3868,
  primary_path = 0x0,
  primary_addr = {
    v4 = {
      sin_family = 0,
      sin_port = 0,
      sin_addr = {
        s_addr = 0
      },
      __pad = "\000\000\000\000\000\000\000"
    },
    v6 = {
      sin6_family = 0,
      sin6_port = 0,
      sin6_flowinfo = 0,
      sin6_addr = {
        in6_u = {
          u6_addr8 "\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000",
          u6_addr16 = {0, 0, 0, 0, 0, 0, 0, 0},
          u6_addr32 = {0, 0, 0, 0}
        }
      },
      sin6_scope_id = 0
    },
    sa = {
      sa_family = 0,
      sa_data = "\000\000\000\000\000\000\000\000\000\000\000\000\000"
    }
  },
  active_path = 0x0,
  retran_path = 0x0,
  last_sent_to = 0x0,
  last_data_from = 0x0,
  tsn_map = {
    tsn_map = 0x0,
    base_tsn = 0,
    cumulative_tsn_ack_point = 0,
    max_tsn_seen = 0,
    len = 0,
    pending_data = 0,
    num_dup_tsns = 0,
    dup_tsns = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}
  },
  sack_needed = 1 '\001',
  sack_cnt = 0,
  ecn_capable = 0 '\0',
  ipv4_address = 1 '\001',
  ipv6_address = 0 '\0',
  hostname_address = 0 '\0',
  asconf_capable = 0 '\0',
  prsctp_capable = 0 '\0',
  auth_capable = 0 '\0',
  adaptation_ind = 0,
  addip_disabled_mask = 0,
  i = {
    init_tag = 0,
    a_rwnd = 0,
    num_outbound_streams = 0,
    num_inbound_streams = 0,
    initial_tsn = 0
  },
  cookie_len = 0,
  cookie = 0x0,
  addip_serial = 0,
  peer_random = 0x0,
  peer_chunks = 0x0,
  peer_hmacs = 0x0
}

>
>>>
>>> Karl
>>
>>
>> Vlad,
>>
>> One other thing, with the difficulty we are having recreating this
>> issue, is there any generic way to increase the likelihood for the
>> transport to be cleared out while delaying the association cleanup?
>> Is there any way that the association is initialized without any
>> transport information?
>
>
> When the association is initialized, the lists are empty, but the next
> thing that happens is that we add transport of the destination we are
> sending to or receiving from to the association and mark it as primary and
> active.  All this happens under a socket lock, so getsockopt can't
> access the association until all actions on that association complete.
>
>
>> The reason I ask; we believe the issue is
>> happening very shortly after the association is brought up (we bring
>> it up and then do the getsockopt()).
>
>
> Can you check what the association state is?  Alternately, can you provide
> the kdump and the kernel so I can dig around.

crash> p ((struct sctp_association *) 0xffff8107670e3000).state
$15 = SCTP_STATE_CLOSED

>
> Thanks
> -vlad
>>
>>
>> Thanks,
>> Karl
>>
>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: NULL primary_path
  2013-03-06 22:57 NULL primary_path Karl Heiss
                   ` (14 preceding siblings ...)
  2013-03-08 15:37 ` Karl Heiss
@ 2013-03-08 16:42 ` Vlad Yasevich
  2013-03-08 17:06 ` Karl Heiss
                   ` (7 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Vlad Yasevich @ 2013-03-08 16:42 UTC (permalink / raw)
  To: linux-sctp

On 03/08/2013 10:37 AM, Karl Heiss wrote:
> On Fri, Mar 8, 2013 at 10:31 AM, Vlad Yasevich <vyasevich@gmail.com> wrote:
>> On 03/08/2013 09:31 AM, Karl Heiss wrote:
>>>
>>> On Fri, Mar 8, 2013 at 8:52 AM, Karl Heiss <kheiss@gmail.com> wrote:
>>>>
>>>> On Thu, Mar 7, 2013 at 6:09 PM, Vlad Yasevich <vyasevich@gmail.com>
>>>> wrote:
>>>>>
>>>>> On 03/07/2013 04:51 PM, Karl Heiss wrote:
>>>>>>
>>>>>>
>>>>>> On Thu, Mar 7, 2013 at 12:17 PM, Vlad Yasevich <vyasevich@gmail.com>
>>>>>> wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 03/07/2013 12:06 PM, Karl Heiss wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> The issue appears to manifest itself when the connection is closed
>>>>>>>> from the remote end and getsockopt(SCTP_STATUS) is called within a
>>>>>>>> small window in which the association is still valid but
>>>>>>>> asoc->peer.primary_path is NULL.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Aha!  Thanks.  There was a bug in the rcu clean-up that allowed the
>>>>>>> association to remain while all transports have been removed.
>>>>>>>
>>>>>>> Here is a patch that should have addressed this condition:
>>>>>>>
>>>>>>> commit 8c98653f05534acd1cb07ea4929702a3659177d1
>>>>>>> Author: Daniel Borkmann <dborkman@redhat.com>
>>>>>>> Date:   Fri Feb 1 04:37:43 2013 +0000
>>>>>>>
>>>>>>>        sctp: sctp_close: fix release of bindings for deferred
>>>>>>> call_rcu's
>>>>>>>
>>>>>>> Full patch is here:
>>>>>>>
>>>>>>>
>>>>>>> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?idŒ98653f05534acd1cb07ea4929702a3659177d1
>>>>>>>
>>>>>>> Make sure that you have this patch in the kernel you are running
>>>>>>>
>>>>>>> -vlad
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>
>>>>>> Unfortunately this patch wont apply to the version of the SCTP stack
>>>>>> that we are using (2.6.36.2) since it does not have a
>>>>>> sctp_transport_destroy_rcu() function.  Is there any chance that
>>>>>> simply swapping the order of the instructions without moving them
>>>>>> would have any effect?  I ask this hypothetically because the race
>>>>>> condition window seems to be difficult to recreate, thus nothing to
>>>>>> test against (aside from in the field!).
>>>>>>
>>>>>> Karl
>>>>>>
>>>>>
>>>>> Hi Karl
>>>>>
>>>>> I think I see the problem now.  The problem happens when the association
>>>>> is
>>>>> destroyed.  We delay removing the association from
>>>>> the association id pool until all references on the association
>>>>> have dropped.  As a result, it is possible (for a very short
>>>>> period of time) for an association structure to still exist in
>>>>> the kernel and still be found via the association id, but that
>>>>> association
>>>>> has no transports and is about to be completely destroyed.
>>>>>
>>>>> This is a really interesting race and I need to figure out if it is
>>>>> there on purpose or not?
>>>>>
>>>>> In the mean time, here is a patch that should solve it for you.
>>>>>
>>>>> diff --git a/net/sctp/socket.c b/net/sctp/socket.c
>>>>> index b907073..2d92c89 100644
>>>>> --- a/net/sctp/socket.c
>>>>> +++ b/net/sctp/socket.c
>>>>> @@ -223,7 +223,7 @@ struct sctp_association *sctp_id2assoc(struct sock
>>>>> *sk,
>>>>> sctp_assoc_t id)
>>>>>                   if (!list_empty(&sctp_sk(sk)->ep->asocs))
>>>>>                           asoc = list_entry(sctp_sk(sk)->ep->asocs.next,
>>>>>                                             struct sctp_association,
>>>>> asocs);
>>>>> -               return asoc;
>>>>> +               goto done;
>>>>>           }
>>>>>
>>>>>           /* Otherwise this is a UDP-style socket. */
>>>>> @@ -234,6 +234,7 @@ struct sctp_association *sctp_id2assoc(struct sock
>>>>> *sk,
>>>>> sctp_assoc_t id)
>>>>>           asoc = (struct sctp_association *)idr_find(&sctp_assocs_id,
>>>>> (int)id);
>>>>>           spin_unlock_bh(&sctp_assocs_id_lock);
>>>>>
>>>>> +done:
>>>>>           if (!asoc || (asoc->base.sk != sk) || asoc->base.dead)
>>>>>                   return NULL;
>>>>>
>>>>
>>>> Vlad,
>>>>
>>>> Looking at the kdump from the panic, I am seeing that your patch above
>>>> may not work in this case since the asoc is valid, the base.sk is
>>>> valid, and base.dead is 0.  Unless base.sk is valid but doesn't match
>>>> sk, this wouldn't appear to fix this issue.
>>
>>
>> Hm..  If the association is not marked "dead", it should still have all its
>> transports present.  If you look at the peer.transport_addr_list in
>> you kdump, is that list empty or not?
>>
>> Are any other peer transport pointers set (active_path, retran_path)?
>>
>
> crash> p ((struct sctp_association *) 0xffff8107670e3000).peer
> $14 = {
>    rwnd = 65535,
>    transport_addr_list = {
>      next = 0xffff8107670e3180,
>      prev = 0xffff8107670e3180
>    },
>    transport_count = 0,
>    port = 3868,
>    primary_path = 0x0,
>    primary_addr = {
>      v4 = {
>        sin_family = 0,
>        sin_port = 0,
>        sin_addr = {
>          s_addr = 0
>        },
>        __pad = "\000\000\000\000\000\000\000"
>      },
>      v6 = {
>        sin6_family = 0,
>        sin6_port = 0,
>        sin6_flowinfo = 0,
>        sin6_addr = {
>          in6_u = {
>            u6_addr8 > "\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000",
>            u6_addr16 = {0, 0, 0, 0, 0, 0, 0, 0},
>            u6_addr32 = {0, 0, 0, 0}
>          }
>        },
>        sin6_scope_id = 0
>      },
>      sa = {
>        sa_family = 0,
>        sa_data = "\000\000\000\000\000\000\000\000\000\000\000\000\000"
>      }
>    },
>    active_path = 0x0,
>    retran_path = 0x0,
>    last_sent_to = 0x0,
>    last_data_from = 0x0,
>    tsn_map = {
>      tsn_map = 0x0,
>      base_tsn = 0,
>      cumulative_tsn_ack_point = 0,
>      max_tsn_seen = 0,
>      len = 0,
>      pending_data = 0,
>      num_dup_tsns = 0,
>      dup_tsns = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}
>    },
>    sack_needed = 1 '\001',
>    sack_cnt = 0,
>    ecn_capable = 0 '\0',
>    ipv4_address = 1 '\001',
>    ipv6_address = 0 '\0',
>    hostname_address = 0 '\0',
>    asconf_capable = 0 '\0',
>    prsctp_capable = 0 '\0',
>    auth_capable = 0 '\0',
>    adaptation_ind = 0,
>    addip_disabled_mask = 0,
>    i = {
>      init_tag = 0,
>      a_rwnd = 0,
>      num_outbound_streams = 0,
>      num_inbound_streams = 0,
>      initial_tsn = 0
>    },
>    cookie_len = 0,
>    cookie = 0x0,
>    addip_serial = 0,
>    peer_random = 0x0,
>    peer_chunks = 0x0,
>    peer_hmacs = 0x0
> }
>
>>
>>>>
>>>> Karl
>>>
>>>
>>> Vlad,
>>>
>>> One other thing, with the difficulty we are having recreating this
>>> issue, is there any generic way to increase the likelihood for the
>>> transport to be cleared out while delaying the association cleanup?
>>> Is there any way that the association is initialized without any
>>> transport information?
>>
>>
>> When the association is initialized, the lists are empty, but the next
>> thing that happens is that we add transport of the destination we are
>> sending to or receiving from to the association and mark it as primary and
>> active.  All this happens under a socket lock, so getsockopt can't
>> access the association until all actions on that association complete.
>>
>>
>>> The reason I ask; we believe the issue is
>>> happening very shortly after the association is brought up (we bring
>>> it up and then do the getsockopt()).
>>
>>
>> Can you check what the association state is?  Alternately, can you provide
>> the kdump and the kernel so I can dig around.
>
> crash> p ((struct sctp_association *) 0xffff8107670e3000).state
> $15 = SCTP_STATE_CLOSED


Hi Karl

Was this the client or the server side?  Also what was the socket type 
(STREAM or SEQPACKET)?

-vlad
>
>>
>> Thanks
>> -vlad
>>>
>>>
>>> Thanks,
>>> Karl
>>>
>>


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: NULL primary_path
  2013-03-06 22:57 NULL primary_path Karl Heiss
                   ` (15 preceding siblings ...)
  2013-03-08 16:42 ` Vlad Yasevich
@ 2013-03-08 17:06 ` Karl Heiss
  2013-03-09 20:19 ` Karl Heiss
                   ` (6 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Karl Heiss @ 2013-03-08 17:06 UTC (permalink / raw)
  To: linux-sctp

On Fri, Mar 8, 2013 at 11:42 AM, Vlad Yasevich <vyasevich@gmail.com> wrote:
> On 03/08/2013 10:37 AM, Karl Heiss wrote:
>>
>> On Fri, Mar 8, 2013 at 10:31 AM, Vlad Yasevich <vyasevich@gmail.com>
>> wrote:
>>>
>>> On 03/08/2013 09:31 AM, Karl Heiss wrote:
>>>>
>>>>
>>>> On Fri, Mar 8, 2013 at 8:52 AM, Karl Heiss <kheiss@gmail.com> wrote:
>>>>>
>>>>>
>>>>> On Thu, Mar 7, 2013 at 6:09 PM, Vlad Yasevich <vyasevich@gmail.com>
>>>>> wrote:
>>>>>>
>>>>>>
>>>>>> On 03/07/2013 04:51 PM, Karl Heiss wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Mar 7, 2013 at 12:17 PM, Vlad Yasevich <vyasevich@gmail.com>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 03/07/2013 12:06 PM, Karl Heiss wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> The issue appears to manifest itself when the connection is closed
>>>>>>>>> from the remote end and getsockopt(SCTP_STATUS) is called within a
>>>>>>>>> small window in which the association is still valid but
>>>>>>>>> asoc->peer.primary_path is NULL.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Aha!  Thanks.  There was a bug in the rcu clean-up that allowed the
>>>>>>>> association to remain while all transports have been removed.
>>>>>>>>
>>>>>>>> Here is a patch that should have addressed this condition:
>>>>>>>>
>>>>>>>> commit 8c98653f05534acd1cb07ea4929702a3659177d1
>>>>>>>> Author: Daniel Borkmann <dborkman@redhat.com>
>>>>>>>> Date:   Fri Feb 1 04:37:43 2013 +0000
>>>>>>>>
>>>>>>>>        sctp: sctp_close: fix release of bindings for deferred
>>>>>>>> call_rcu's
>>>>>>>>
>>>>>>>> Full patch is here:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?idŒ98653f05534acd1cb07ea4929702a3659177d1
>>>>>>>>
>>>>>>>> Make sure that you have this patch in the kernel you are running
>>>>>>>>
>>>>>>>> -vlad
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>>> Unfortunately this patch wont apply to the version of the SCTP stack
>>>>>>> that we are using (2.6.36.2) since it does not have a
>>>>>>> sctp_transport_destroy_rcu() function.  Is there any chance that
>>>>>>> simply swapping the order of the instructions without moving them
>>>>>>> would have any effect?  I ask this hypothetically because the race
>>>>>>> condition window seems to be difficult to recreate, thus nothing to
>>>>>>> test against (aside from in the field!).
>>>>>>>
>>>>>>> Karl
>>>>>>>
>>>>>>
>>>>>> Hi Karl
>>>>>>
>>>>>> I think I see the problem now.  The problem happens when the
>>>>>> association
>>>>>> is
>>>>>> destroyed.  We delay removing the association from
>>>>>> the association id pool until all references on the association
>>>>>> have dropped.  As a result, it is possible (for a very short
>>>>>> period of time) for an association structure to still exist in
>>>>>> the kernel and still be found via the association id, but that
>>>>>> association
>>>>>> has no transports and is about to be completely destroyed.
>>>>>>
>>>>>> This is a really interesting race and I need to figure out if it is
>>>>>> there on purpose or not?
>>>>>>
>>>>>> In the mean time, here is a patch that should solve it for you.
>>>>>>
>>>>>> diff --git a/net/sctp/socket.c b/net/sctp/socket.c
>>>>>> index b907073..2d92c89 100644
>>>>>> --- a/net/sctp/socket.c
>>>>>> +++ b/net/sctp/socket.c
>>>>>> @@ -223,7 +223,7 @@ struct sctp_association *sctp_id2assoc(struct sock
>>>>>> *sk,
>>>>>> sctp_assoc_t id)
>>>>>>                   if (!list_empty(&sctp_sk(sk)->ep->asocs))
>>>>>>                           asoc >>>>>> list_entry(sctp_sk(sk)->ep->asocs.next,
>>>>>>                                             struct sctp_association,
>>>>>> asocs);
>>>>>> -               return asoc;
>>>>>> +               goto done;
>>>>>>           }
>>>>>>
>>>>>>           /* Otherwise this is a UDP-style socket. */
>>>>>> @@ -234,6 +234,7 @@ struct sctp_association *sctp_id2assoc(struct sock
>>>>>> *sk,
>>>>>> sctp_assoc_t id)
>>>>>>           asoc = (struct sctp_association *)idr_find(&sctp_assocs_id,
>>>>>> (int)id);
>>>>>>           spin_unlock_bh(&sctp_assocs_id_lock);
>>>>>>
>>>>>> +done:
>>>>>>           if (!asoc || (asoc->base.sk != sk) || asoc->base.dead)
>>>>>>                   return NULL;
>>>>>>
>>>>>
>>>>> Vlad,
>>>>>
>>>>> Looking at the kdump from the panic, I am seeing that your patch above
>>>>> may not work in this case since the asoc is valid, the base.sk is
>>>>> valid, and base.dead is 0.  Unless base.sk is valid but doesn't match
>>>>> sk, this wouldn't appear to fix this issue.
>>>
>>>
>>>
>>> Hm..  If the association is not marked "dead", it should still have all
>>> its
>>> transports present.  If you look at the peer.transport_addr_list in
>>> you kdump, is that list empty or not?
>>>
>>> Are any other peer transport pointers set (active_path, retran_path)?
>>>
>>
>> crash> p ((struct sctp_association *) 0xffff8107670e3000).peer
>> $14 = {
>>    rwnd = 65535,
>>    transport_addr_list = {
>>      next = 0xffff8107670e3180,
>>      prev = 0xffff8107670e3180
>>    },
>>    transport_count = 0,
>>    port = 3868,
>>    primary_path = 0x0,
>>    primary_addr = {
>>      v4 = {
>>        sin_family = 0,
>>        sin_port = 0,
>>        sin_addr = {
>>          s_addr = 0
>>        },
>>        __pad = "\000\000\000\000\000\000\000"
>>      },
>>      v6 = {
>>        sin6_family = 0,
>>        sin6_port = 0,
>>        sin6_flowinfo = 0,
>>        sin6_addr = {
>>          in6_u = {
>>            u6_addr8 >> "\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000",
>>            u6_addr16 = {0, 0, 0, 0, 0, 0, 0, 0},
>>            u6_addr32 = {0, 0, 0, 0}
>>          }
>>        },
>>        sin6_scope_id = 0
>>      },
>>      sa = {
>>        sa_family = 0,
>>        sa_data = "\000\000\000\000\000\000\000\000\000\000\000\000\000"
>>      }
>>    },
>>    active_path = 0x0,
>>    retran_path = 0x0,
>>    last_sent_to = 0x0,
>>    last_data_from = 0x0,
>>    tsn_map = {
>>      tsn_map = 0x0,
>>      base_tsn = 0,
>>      cumulative_tsn_ack_point = 0,
>>      max_tsn_seen = 0,
>>      len = 0,
>>      pending_data = 0,
>>      num_dup_tsns = 0,
>>      dup_tsns = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}
>>    },
>>    sack_needed = 1 '\001',
>>    sack_cnt = 0,
>>    ecn_capable = 0 '\0',
>>    ipv4_address = 1 '\001',
>>    ipv6_address = 0 '\0',
>>    hostname_address = 0 '\0',
>>    asconf_capable = 0 '\0',
>>    prsctp_capable = 0 '\0',
>>    auth_capable = 0 '\0',
>>    adaptation_ind = 0,
>>    addip_disabled_mask = 0,
>>    i = {
>>      init_tag = 0,
>>      a_rwnd = 0,
>>      num_outbound_streams = 0,
>>      num_inbound_streams = 0,
>>      initial_tsn = 0
>>    },
>>    cookie_len = 0,
>>    cookie = 0x0,
>>    addip_serial = 0,
>>    peer_random = 0x0,
>>    peer_chunks = 0x0,
>>    peer_hmacs = 0x0
>> }
>>
>>>
>>>>>
>>>>> Karl
>>>>
>>>>
>>>>
>>>> Vlad,
>>>>
>>>> One other thing, with the difficulty we are having recreating this
>>>> issue, is there any generic way to increase the likelihood for the
>>>> transport to be cleared out while delaying the association cleanup?
>>>> Is there any way that the association is initialized without any
>>>> transport information?
>>>
>>>
>>>
>>> When the association is initialized, the lists are empty, but the next
>>> thing that happens is that we add transport of the destination we are
>>> sending to or receiving from to the association and mark it as primary
>>> and
>>> active.  All this happens under a socket lock, so getsockopt can't
>>> access the association until all actions on that association complete.
>>>
>>>
>>>> The reason I ask; we believe the issue is
>>>> happening very shortly after the association is brought up (we bring
>>>> it up and then do the getsockopt()).
>>>
>>>
>>>
>>> Can you check what the association state is?  Alternately, can you
>>> provide
>>> the kdump and the kernel so I can dig around.
>>
>>
>> crash> p ((struct sctp_association *) 0xffff8107670e3000).state
>> $15 = SCTP_STATE_CLOSED
>
>
>
> Hi Karl
>
> Was this the client or the server side?  Also what was the socket type
> (STREAM or SEQPACKET)?
>
> -vlad
>>
>>
>>>
>>> Thanks
>>> -vlad
>>>>
>>>>
>>>>
>>>> Thanks,
>>>> Karl
>>>>
>>>
>

We believe this is occurring on the client side (still working on
confirming, this system is a Diameter router so we get connections
going in both directions).  The connections are all STREAM.  We are
also seeing ABORTs fairly regularly on the connections in suspect.

Karl

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: NULL primary_path
  2013-03-06 22:57 NULL primary_path Karl Heiss
                   ` (16 preceding siblings ...)
  2013-03-08 17:06 ` Karl Heiss
@ 2013-03-09 20:19 ` Karl Heiss
  2013-03-11 21:59 ` Vlad Yasevich
                   ` (5 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Karl Heiss @ 2013-03-09 20:19 UTC (permalink / raw)
  To: linux-sctp

On Fri, Mar 8, 2013 at 12:06 PM, Karl Heiss <kheiss@gmail.com> wrote:
> On Fri, Mar 8, 2013 at 11:42 AM, Vlad Yasevich <vyasevich@gmail.com> wrote:
>> On 03/08/2013 10:37 AM, Karl Heiss wrote:
>>>
>>> On Fri, Mar 8, 2013 at 10:31 AM, Vlad Yasevich <vyasevich@gmail.com>
>>> wrote:
>>>>
>>>> On 03/08/2013 09:31 AM, Karl Heiss wrote:
>>>>>
>>>>>
>>>>> On Fri, Mar 8, 2013 at 8:52 AM, Karl Heiss <kheiss@gmail.com> wrote:
>>>>>>
>>>>>>
>>>>>> On Thu, Mar 7, 2013 at 6:09 PM, Vlad Yasevich <vyasevich@gmail.com>
>>>>>> wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 03/07/2013 04:51 PM, Karl Heiss wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Mar 7, 2013 at 12:17 PM, Vlad Yasevich <vyasevich@gmail.com>
>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 03/07/2013 12:06 PM, Karl Heiss wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> The issue appears to manifest itself when the connection is closed
>>>>>>>>>> from the remote end and getsockopt(SCTP_STATUS) is called within a
>>>>>>>>>> small window in which the association is still valid but
>>>>>>>>>> asoc->peer.primary_path is NULL.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Aha!  Thanks.  There was a bug in the rcu clean-up that allowed the
>>>>>>>>> association to remain while all transports have been removed.
>>>>>>>>>
>>>>>>>>> Here is a patch that should have addressed this condition:
>>>>>>>>>
>>>>>>>>> commit 8c98653f05534acd1cb07ea4929702a3659177d1
>>>>>>>>> Author: Daniel Borkmann <dborkman@redhat.com>
>>>>>>>>> Date:   Fri Feb 1 04:37:43 2013 +0000
>>>>>>>>>
>>>>>>>>>        sctp: sctp_close: fix release of bindings for deferred
>>>>>>>>> call_rcu's
>>>>>>>>>
>>>>>>>>> Full patch is here:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?idŒ98653f05534acd1cb07ea4929702a3659177d1
>>>>>>>>>
>>>>>>>>> Make sure that you have this patch in the kernel you are running
>>>>>>>>>
>>>>>>>>> -vlad
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>>> Unfortunately this patch wont apply to the version of the SCTP stack
>>>>>>>> that we are using (2.6.36.2) since it does not have a
>>>>>>>> sctp_transport_destroy_rcu() function.  Is there any chance that
>>>>>>>> simply swapping the order of the instructions without moving them
>>>>>>>> would have any effect?  I ask this hypothetically because the race
>>>>>>>> condition window seems to be difficult to recreate, thus nothing to
>>>>>>>> test against (aside from in the field!).
>>>>>>>>
>>>>>>>> Karl
>>>>>>>>
>>>>>>>
>>>>>>> Hi Karl
>>>>>>>
>>>>>>> I think I see the problem now.  The problem happens when the
>>>>>>> association
>>>>>>> is
>>>>>>> destroyed.  We delay removing the association from
>>>>>>> the association id pool until all references on the association
>>>>>>> have dropped.  As a result, it is possible (for a very short
>>>>>>> period of time) for an association structure to still exist in
>>>>>>> the kernel and still be found via the association id, but that
>>>>>>> association
>>>>>>> has no transports and is about to be completely destroyed.
>>>>>>>
>>>>>>> This is a really interesting race and I need to figure out if it is
>>>>>>> there on purpose or not?
>>>>>>>
>>>>>>> In the mean time, here is a patch that should solve it for you.
>>>>>>>
>>>>>>> diff --git a/net/sctp/socket.c b/net/sctp/socket.c
>>>>>>> index b907073..2d92c89 100644
>>>>>>> --- a/net/sctp/socket.c
>>>>>>> +++ b/net/sctp/socket.c
>>>>>>> @@ -223,7 +223,7 @@ struct sctp_association *sctp_id2assoc(struct sock
>>>>>>> *sk,
>>>>>>> sctp_assoc_t id)
>>>>>>>                   if (!list_empty(&sctp_sk(sk)->ep->asocs))
>>>>>>>                           asoc >>>>>>> list_entry(sctp_sk(sk)->ep->asocs.next,
>>>>>>>                                             struct sctp_association,
>>>>>>> asocs);
>>>>>>> -               return asoc;
>>>>>>> +               goto done;
>>>>>>>           }
>>>>>>>
>>>>>>>           /* Otherwise this is a UDP-style socket. */
>>>>>>> @@ -234,6 +234,7 @@ struct sctp_association *sctp_id2assoc(struct sock
>>>>>>> *sk,
>>>>>>> sctp_assoc_t id)
>>>>>>>           asoc = (struct sctp_association *)idr_find(&sctp_assocs_id,
>>>>>>> (int)id);
>>>>>>>           spin_unlock_bh(&sctp_assocs_id_lock);
>>>>>>>
>>>>>>> +done:
>>>>>>>           if (!asoc || (asoc->base.sk != sk) || asoc->base.dead)
>>>>>>>                   return NULL;
>>>>>>>
>>>>>>
>>>>>> Vlad,
>>>>>>
>>>>>> Looking at the kdump from the panic, I am seeing that your patch above
>>>>>> may not work in this case since the asoc is valid, the base.sk is
>>>>>> valid, and base.dead is 0.  Unless base.sk is valid but doesn't match
>>>>>> sk, this wouldn't appear to fix this issue.
>>>>
>>>>
>>>>
>>>> Hm..  If the association is not marked "dead", it should still have all
>>>> its
>>>> transports present.  If you look at the peer.transport_addr_list in
>>>> you kdump, is that list empty or not?
>>>>
>>>> Are any other peer transport pointers set (active_path, retran_path)?
>>>>
>>>
>>> crash> p ((struct sctp_association *) 0xffff8107670e3000).peer
>>> $14 = {
>>>    rwnd = 65535,
>>>    transport_addr_list = {
>>>      next = 0xffff8107670e3180,
>>>      prev = 0xffff8107670e3180
>>>    },
>>>    transport_count = 0,
>>>    port = 3868,
>>>    primary_path = 0x0,
>>>    primary_addr = {
>>>      v4 = {
>>>        sin_family = 0,
>>>        sin_port = 0,
>>>        sin_addr = {
>>>          s_addr = 0
>>>        },
>>>        __pad = "\000\000\000\000\000\000\000"
>>>      },
>>>      v6 = {
>>>        sin6_family = 0,
>>>        sin6_port = 0,
>>>        sin6_flowinfo = 0,
>>>        sin6_addr = {
>>>          in6_u = {
>>>            u6_addr8 >>> "\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000",
>>>            u6_addr16 = {0, 0, 0, 0, 0, 0, 0, 0},
>>>            u6_addr32 = {0, 0, 0, 0}
>>>          }
>>>        },
>>>        sin6_scope_id = 0
>>>      },
>>>      sa = {
>>>        sa_family = 0,
>>>        sa_data = "\000\000\000\000\000\000\000\000\000\000\000\000\000"
>>>      }
>>>    },
>>>    active_path = 0x0,
>>>    retran_path = 0x0,
>>>    last_sent_to = 0x0,
>>>    last_data_from = 0x0,
>>>    tsn_map = {
>>>      tsn_map = 0x0,
>>>      base_tsn = 0,
>>>      cumulative_tsn_ack_point = 0,
>>>      max_tsn_seen = 0,
>>>      len = 0,
>>>      pending_data = 0,
>>>      num_dup_tsns = 0,
>>>      dup_tsns = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}
>>>    },
>>>    sack_needed = 1 '\001',
>>>    sack_cnt = 0,
>>>    ecn_capable = 0 '\0',
>>>    ipv4_address = 1 '\001',
>>>    ipv6_address = 0 '\0',
>>>    hostname_address = 0 '\0',
>>>    asconf_capable = 0 '\0',
>>>    prsctp_capable = 0 '\0',
>>>    auth_capable = 0 '\0',
>>>    adaptation_ind = 0,
>>>    addip_disabled_mask = 0,
>>>    i = {
>>>      init_tag = 0,
>>>      a_rwnd = 0,
>>>      num_outbound_streams = 0,
>>>      num_inbound_streams = 0,
>>>      initial_tsn = 0
>>>    },
>>>    cookie_len = 0,
>>>    cookie = 0x0,
>>>    addip_serial = 0,
>>>    peer_random = 0x0,
>>>    peer_chunks = 0x0,
>>>    peer_hmacs = 0x0
>>> }
>>>
>>>>
>>>>>>
>>>>>> Karl
>>>>>
>>>>>
>>>>>
>>>>> Vlad,
>>>>>
>>>>> One other thing, with the difficulty we are having recreating this
>>>>> issue, is there any generic way to increase the likelihood for the
>>>>> transport to be cleared out while delaying the association cleanup?
>>>>> Is there any way that the association is initialized without any
>>>>> transport information?
>>>>
>>>>
>>>>
>>>> When the association is initialized, the lists are empty, but the next
>>>> thing that happens is that we add transport of the destination we are
>>>> sending to or receiving from to the association and mark it as primary
>>>> and
>>>> active.  All this happens under a socket lock, so getsockopt can't
>>>> access the association until all actions on that association complete.
>>>>
>>>>
>>>>> The reason I ask; we believe the issue is
>>>>> happening very shortly after the association is brought up (we bring
>>>>> it up and then do the getsockopt()).
>>>>
>>>>
>>>>
>>>> Can you check what the association state is?  Alternately, can you
>>>> provide
>>>> the kdump and the kernel so I can dig around.
>>>
>>>
>>> crash> p ((struct sctp_association *) 0xffff8107670e3000).state
>>> $15 = SCTP_STATE_CLOSED
>>
>>
>>
>> Hi Karl
>>
>> Was this the client or the server side?  Also what was the socket type
>> (STREAM or SEQPACKET)?
>>
>> -vlad
>>>
>>>
>>>>
>>>> Thanks
>>>> -vlad
>>>>>
>>>>>
>>>>>
>>>>> Thanks,
>>>>> Karl
>>>>>
>>>>
>>
>
> We believe this is occurring on the client side (still working on
> confirming, this system is a Diameter router so we get connections
> going in both directions).  The connections are all STREAM.  We are
> also seeing ABORTs fairly regularly on the connections in suspect.
>
> Karl

So we finally got a capture around the time of the panic.  The
panicing system is acting as a server and the client is connecting,
gets through INIT and COOKIE_ECHO, and sends several data packets when
the client sends another INIT.  At this point, the server handles the
INIT, starts over and it starts sending data packets again when the
server sends an ABORT because the application doesn't support
restarting the connection.  It is around this time that the panic
occurs.  One thing that I noticed is that the sctp_association
structure looks awfully similar to a temporary association that is
created when an unexpected INIT is received, but before it is
populated with peer information. However the temp value is not set to
0 as would be expected.

Karl

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: NULL primary_path
  2013-03-06 22:57 NULL primary_path Karl Heiss
                   ` (17 preceding siblings ...)
  2013-03-09 20:19 ` Karl Heiss
@ 2013-03-11 21:59 ` Vlad Yasevich
  2013-03-11 22:44 ` Karl Heiss
                   ` (4 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Vlad Yasevich @ 2013-03-11 21:59 UTC (permalink / raw)
  To: linux-sctp

On 03/09/2013 03:19 PM, Karl Heiss wrote:
> On Fri, Mar 8, 2013 at 12:06 PM, Karl Heiss <kheiss@gmail.com> wrote:
>> On Fri, Mar 8, 2013 at 11:42 AM, Vlad Yasevich <vyasevich@gmail.com> wrote:
>>> On 03/08/2013 10:37 AM, Karl Heiss wrote:
>>>>
>>>> On Fri, Mar 8, 2013 at 10:31 AM, Vlad Yasevich <vyasevich@gmail.com>
>>>> wrote:
>>>>>
>>>>> On 03/08/2013 09:31 AM, Karl Heiss wrote:
>>>>>>
>>>>>>
>>>>>> On Fri, Mar 8, 2013 at 8:52 AM, Karl Heiss <kheiss@gmail.com> wrote:
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Mar 7, 2013 at 6:09 PM, Vlad Yasevich <vyasevich@gmail.com>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 03/07/2013 04:51 PM, Karl Heiss wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Thu, Mar 7, 2013 at 12:17 PM, Vlad Yasevich <vyasevich@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 03/07/2013 12:06 PM, Karl Heiss wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> The issue appears to manifest itself when the connection is closed
>>>>>>>>>>> from the remote end and getsockopt(SCTP_STATUS) is called within a
>>>>>>>>>>> small window in which the association is still valid but
>>>>>>>>>>> asoc->peer.primary_path is NULL.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Aha!  Thanks.  There was a bug in the rcu clean-up that allowed the
>>>>>>>>>> association to remain while all transports have been removed.
>>>>>>>>>>
>>>>>>>>>> Here is a patch that should have addressed this condition:
>>>>>>>>>>
>>>>>>>>>> commit 8c98653f05534acd1cb07ea4929702a3659177d1
>>>>>>>>>> Author: Daniel Borkmann <dborkman@redhat.com>
>>>>>>>>>> Date:   Fri Feb 1 04:37:43 2013 +0000
>>>>>>>>>>
>>>>>>>>>>         sctp: sctp_close: fix release of bindings for deferred
>>>>>>>>>> call_rcu's
>>>>>>>>>>
>>>>>>>>>> Full patch is here:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?idŒ98653f05534acd1cb07ea4929702a3659177d1
>>>>>>>>>>
>>>>>>>>>> Make sure that you have this patch in the kernel you are running
>>>>>>>>>>
>>>>>>>>>> -vlad
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Unfortunately this patch wont apply to the version of the SCTP stack
>>>>>>>>> that we are using (2.6.36.2) since it does not have a
>>>>>>>>> sctp_transport_destroy_rcu() function.  Is there any chance that
>>>>>>>>> simply swapping the order of the instructions without moving them
>>>>>>>>> would have any effect?  I ask this hypothetically because the race
>>>>>>>>> condition window seems to be difficult to recreate, thus nothing to
>>>>>>>>> test against (aside from in the field!).
>>>>>>>>>
>>>>>>>>> Karl
>>>>>>>>>
>>>>>>>>
>>>>>>>> Hi Karl
>>>>>>>>
>>>>>>>> I think I see the problem now.  The problem happens when the
>>>>>>>> association
>>>>>>>> is
>>>>>>>> destroyed.  We delay removing the association from
>>>>>>>> the association id pool until all references on the association
>>>>>>>> have dropped.  As a result, it is possible (for a very short
>>>>>>>> period of time) for an association structure to still exist in
>>>>>>>> the kernel and still be found via the association id, but that
>>>>>>>> association
>>>>>>>> has no transports and is about to be completely destroyed.
>>>>>>>>
>>>>>>>> This is a really interesting race and I need to figure out if it is
>>>>>>>> there on purpose or not?
>>>>>>>>
>>>>>>>> In the mean time, here is a patch that should solve it for you.
>>>>>>>>
>>>>>>>> diff --git a/net/sctp/socket.c b/net/sctp/socket.c
>>>>>>>> index b907073..2d92c89 100644
>>>>>>>> --- a/net/sctp/socket.c
>>>>>>>> +++ b/net/sctp/socket.c
>>>>>>>> @@ -223,7 +223,7 @@ struct sctp_association *sctp_id2assoc(struct sock
>>>>>>>> *sk,
>>>>>>>> sctp_assoc_t id)
>>>>>>>>                    if (!list_empty(&sctp_sk(sk)->ep->asocs))
>>>>>>>>                            asoc >>>>>>>> list_entry(sctp_sk(sk)->ep->asocs.next,
>>>>>>>>                                              struct sctp_association,
>>>>>>>> asocs);
>>>>>>>> -               return asoc;
>>>>>>>> +               goto done;
>>>>>>>>            }
>>>>>>>>
>>>>>>>>            /* Otherwise this is a UDP-style socket. */
>>>>>>>> @@ -234,6 +234,7 @@ struct sctp_association *sctp_id2assoc(struct sock
>>>>>>>> *sk,
>>>>>>>> sctp_assoc_t id)
>>>>>>>>            asoc = (struct sctp_association *)idr_find(&sctp_assocs_id,
>>>>>>>> (int)id);
>>>>>>>>            spin_unlock_bh(&sctp_assocs_id_lock);
>>>>>>>>
>>>>>>>> +done:
>>>>>>>>            if (!asoc || (asoc->base.sk != sk) || asoc->base.dead)
>>>>>>>>                    return NULL;
>>>>>>>>
>>>>>>>
>>>>>>> Vlad,
>>>>>>>
>>>>>>> Looking at the kdump from the panic, I am seeing that your patch above
>>>>>>> may not work in this case since the asoc is valid, the base.sk is
>>>>>>> valid, and base.dead is 0.  Unless base.sk is valid but doesn't match
>>>>>>> sk, this wouldn't appear to fix this issue.
>>>>>
>>>>>
>>>>>
>>>>> Hm..  If the association is not marked "dead", it should still have all
>>>>> its
>>>>> transports present.  If you look at the peer.transport_addr_list in
>>>>> you kdump, is that list empty or not?
>>>>>
>>>>> Are any other peer transport pointers set (active_path, retran_path)?
>>>>>
>>>>
>>>> crash> p ((struct sctp_association *) 0xffff8107670e3000).peer
>>>> $14 = {
>>>>     rwnd = 65535,
>>>>     transport_addr_list = {
>>>>       next = 0xffff8107670e3180,
>>>>       prev = 0xffff8107670e3180
>>>>     },
>>>>     transport_count = 0,
>>>>     port = 3868,
>>>>     primary_path = 0x0,
>>>>     primary_addr = {
>>>>       v4 = {
>>>>         sin_family = 0,
>>>>         sin_port = 0,
>>>>         sin_addr = {
>>>>           s_addr = 0
>>>>         },
>>>>         __pad = "\000\000\000\000\000\000\000"
>>>>       },
>>>>       v6 = {
>>>>         sin6_family = 0,
>>>>         sin6_port = 0,
>>>>         sin6_flowinfo = 0,
>>>>         sin6_addr = {
>>>>           in6_u = {
>>>>             u6_addr8 >>>> "\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000",
>>>>             u6_addr16 = {0, 0, 0, 0, 0, 0, 0, 0},
>>>>             u6_addr32 = {0, 0, 0, 0}
>>>>           }
>>>>         },
>>>>         sin6_scope_id = 0
>>>>       },
>>>>       sa = {
>>>>         sa_family = 0,
>>>>         sa_data = "\000\000\000\000\000\000\000\000\000\000\000\000\000"
>>>>       }
>>>>     },
>>>>     active_path = 0x0,
>>>>     retran_path = 0x0,
>>>>     last_sent_to = 0x0,
>>>>     last_data_from = 0x0,
>>>>     tsn_map = {
>>>>       tsn_map = 0x0,
>>>>       base_tsn = 0,
>>>>       cumulative_tsn_ack_point = 0,
>>>>       max_tsn_seen = 0,
>>>>       len = 0,
>>>>       pending_data = 0,
>>>>       num_dup_tsns = 0,
>>>>       dup_tsns = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}
>>>>     },
>>>>     sack_needed = 1 '\001',
>>>>     sack_cnt = 0,
>>>>     ecn_capable = 0 '\0',
>>>>     ipv4_address = 1 '\001',
>>>>     ipv6_address = 0 '\0',
>>>>     hostname_address = 0 '\0',
>>>>     asconf_capable = 0 '\0',
>>>>     prsctp_capable = 0 '\0',
>>>>     auth_capable = 0 '\0',
>>>>     adaptation_ind = 0,
>>>>     addip_disabled_mask = 0,
>>>>     i = {
>>>>       init_tag = 0,
>>>>       a_rwnd = 0,
>>>>       num_outbound_streams = 0,
>>>>       num_inbound_streams = 0,
>>>>       initial_tsn = 0
>>>>     },
>>>>     cookie_len = 0,
>>>>     cookie = 0x0,
>>>>     addip_serial = 0,
>>>>     peer_random = 0x0,
>>>>     peer_chunks = 0x0,
>>>>     peer_hmacs = 0x0
>>>> }
>>>>
>>>>>
>>>>>>>
>>>>>>> Karl
>>>>>>
>>>>>>
>>>>>>
>>>>>> Vlad,
>>>>>>
>>>>>> One other thing, with the difficulty we are having recreating this
>>>>>> issue, is there any generic way to increase the likelihood for the
>>>>>> transport to be cleared out while delaying the association cleanup?
>>>>>> Is there any way that the association is initialized without any
>>>>>> transport information?
>>>>>
>>>>>
>>>>>
>>>>> When the association is initialized, the lists are empty, but the next
>>>>> thing that happens is that we add transport of the destination we are
>>>>> sending to or receiving from to the association and mark it as primary
>>>>> and
>>>>> active.  All this happens under a socket lock, so getsockopt can't
>>>>> access the association until all actions on that association complete.
>>>>>
>>>>>
>>>>>> The reason I ask; we believe the issue is
>>>>>> happening very shortly after the association is brought up (we bring
>>>>>> it up and then do the getsockopt()).
>>>>>
>>>>>
>>>>>
>>>>> Can you check what the association state is?  Alternately, can you
>>>>> provide
>>>>> the kdump and the kernel so I can dig around.
>>>>
>>>>
>>>> crash> p ((struct sctp_association *) 0xffff8107670e3000).state
>>>> $15 = SCTP_STATE_CLOSED
>>>
>>>
>>>
>>> Hi Karl
>>>
>>> Was this the client or the server side?  Also what was the socket type
>>> (STREAM or SEQPACKET)?
>>>
>>> -vlad
>>>>
>>>>
>>>>>
>>>>> Thanks
>>>>> -vlad
>>>>>>
>>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>> Karl
>>>>>>
>>>>>
>>>
>>
>> We believe this is occurring on the client side (still working on
>> confirming, this system is a Diameter router so we get connections
>> going in both directions).  The connections are all STREAM.  We are
>> also seeing ABORTs fairly regularly on the connections in suspect.
>>
>> Karl
>
> So we finally got a capture around the time of the panic.  The
> panicing system is acting as a server and the client is connecting,
> gets through INIT and COOKIE_ECHO, and sends several data packets when
> the client sends another INIT.  At this point, the server handles the
> INIT, starts over and it starts sending data packets again when the
> server sends an ABORT because the application doesn't support
> restarting the connection.

Do you know if this is done through SO_LINGER or with sendmsg and MSG_ABORT?

-vlad

>  It is around this time that the panic
> occurs.  One thing that I noticed is that the sctp_association
> structure looks awfully similar to a temporary association that is
> created when an unexpected INIT is received, but before it is
> populated with peer information. However the temp value is not set to
> 0 as would be expected.
>
> Karl
>


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: NULL primary_path
  2013-03-06 22:57 NULL primary_path Karl Heiss
                   ` (18 preceding siblings ...)
  2013-03-11 21:59 ` Vlad Yasevich
@ 2013-03-11 22:44 ` Karl Heiss
  2013-03-11 23:10 ` Vlad Yasevich
                   ` (3 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Karl Heiss @ 2013-03-11 22:44 UTC (permalink / raw)
  To: linux-sctp

On Mon, Mar 11, 2013 at 5:59 PM, Vlad Yasevich <vyasevich@gmail.com> wrote:
> On 03/09/2013 03:19 PM, Karl Heiss wrote:
>>
>> On Fri, Mar 8, 2013 at 12:06 PM, Karl Heiss <kheiss@gmail.com> wrote:
>>>
>>> On Fri, Mar 8, 2013 at 11:42 AM, Vlad Yasevich <vyasevich@gmail.com>
>>> wrote:
>>>>
>>>> On 03/08/2013 10:37 AM, Karl Heiss wrote:
>>>>>
>>>>>
>>>>> On Fri, Mar 8, 2013 at 10:31 AM, Vlad Yasevich <vyasevich@gmail.com>
>>>>> wrote:
>>>>>>
>>>>>>
>>>>>> On 03/08/2013 09:31 AM, Karl Heiss wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Mar 8, 2013 at 8:52 AM, Karl Heiss <kheiss@gmail.com> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Mar 7, 2013 at 6:09 PM, Vlad Yasevich <vyasevich@gmail.com>
>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 03/07/2013 04:51 PM, Karl Heiss wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Thu, Mar 7, 2013 at 12:17 PM, Vlad Yasevich
>>>>>>>>>> <vyasevich@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 03/07/2013 12:06 PM, Karl Heiss wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> The issue appears to manifest itself when the connection is
>>>>>>>>>>>> closed
>>>>>>>>>>>> from the remote end and getsockopt(SCTP_STATUS) is called within
>>>>>>>>>>>> a
>>>>>>>>>>>> small window in which the association is still valid but
>>>>>>>>>>>> asoc->peer.primary_path is NULL.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Aha!  Thanks.  There was a bug in the rcu clean-up that allowed
>>>>>>>>>>> the
>>>>>>>>>>> association to remain while all transports have been removed.
>>>>>>>>>>>
>>>>>>>>>>> Here is a patch that should have addressed this condition:
>>>>>>>>>>>
>>>>>>>>>>> commit 8c98653f05534acd1cb07ea4929702a3659177d1
>>>>>>>>>>> Author: Daniel Borkmann <dborkman@redhat.com>
>>>>>>>>>>> Date:   Fri Feb 1 04:37:43 2013 +0000
>>>>>>>>>>>
>>>>>>>>>>>         sctp: sctp_close: fix release of bindings for deferred
>>>>>>>>>>> call_rcu's
>>>>>>>>>>>
>>>>>>>>>>> Full patch is here:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?idŒ98653f05534acd1cb07ea4929702a3659177d1
>>>>>>>>>>>
>>>>>>>>>>> Make sure that you have this patch in the kernel you are running
>>>>>>>>>>>
>>>>>>>>>>> -vlad
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Unfortunately this patch wont apply to the version of the SCTP
>>>>>>>>>> stack
>>>>>>>>>> that we are using (2.6.36.2) since it does not have a
>>>>>>>>>> sctp_transport_destroy_rcu() function.  Is there any chance that
>>>>>>>>>> simply swapping the order of the instructions without moving them
>>>>>>>>>> would have any effect?  I ask this hypothetically because the race
>>>>>>>>>> condition window seems to be difficult to recreate, thus nothing
>>>>>>>>>> to
>>>>>>>>>> test against (aside from in the field!).
>>>>>>>>>>
>>>>>>>>>> Karl
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Hi Karl
>>>>>>>>>
>>>>>>>>> I think I see the problem now.  The problem happens when the
>>>>>>>>> association
>>>>>>>>> is
>>>>>>>>> destroyed.  We delay removing the association from
>>>>>>>>> the association id pool until all references on the association
>>>>>>>>> have dropped.  As a result, it is possible (for a very short
>>>>>>>>> period of time) for an association structure to still exist in
>>>>>>>>> the kernel and still be found via the association id, but that
>>>>>>>>> association
>>>>>>>>> has no transports and is about to be completely destroyed.
>>>>>>>>>
>>>>>>>>> This is a really interesting race and I need to figure out if it is
>>>>>>>>> there on purpose or not?
>>>>>>>>>
>>>>>>>>> In the mean time, here is a patch that should solve it for you.
>>>>>>>>>
>>>>>>>>> diff --git a/net/sctp/socket.c b/net/sctp/socket.c
>>>>>>>>> index b907073..2d92c89 100644
>>>>>>>>> --- a/net/sctp/socket.c
>>>>>>>>> +++ b/net/sctp/socket.c
>>>>>>>>> @@ -223,7 +223,7 @@ struct sctp_association *sctp_id2assoc(struct
>>>>>>>>> sock
>>>>>>>>> *sk,
>>>>>>>>> sctp_assoc_t id)
>>>>>>>>>                    if (!list_empty(&sctp_sk(sk)->ep->asocs))
>>>>>>>>>                            asoc >>>>>>>>> list_entry(sctp_sk(sk)->ep->asocs.next,
>>>>>>>>>                                              struct
>>>>>>>>> sctp_association,
>>>>>>>>> asocs);
>>>>>>>>> -               return asoc;
>>>>>>>>> +               goto done;
>>>>>>>>>            }
>>>>>>>>>
>>>>>>>>>            /* Otherwise this is a UDP-style socket. */
>>>>>>>>> @@ -234,6 +234,7 @@ struct sctp_association *sctp_id2assoc(struct
>>>>>>>>> sock
>>>>>>>>> *sk,
>>>>>>>>> sctp_assoc_t id)
>>>>>>>>>            asoc = (struct sctp_association
>>>>>>>>> *)idr_find(&sctp_assocs_id,
>>>>>>>>> (int)id);
>>>>>>>>>            spin_unlock_bh(&sctp_assocs_id_lock);
>>>>>>>>>
>>>>>>>>> +done:
>>>>>>>>>            if (!asoc || (asoc->base.sk != sk) || asoc->base.dead)
>>>>>>>>>                    return NULL;
>>>>>>>>>
>>>>>>>>
>>>>>>>> Vlad,
>>>>>>>>
>>>>>>>> Looking at the kdump from the panic, I am seeing that your patch
>>>>>>>> above
>>>>>>>> may not work in this case since the asoc is valid, the base.sk is
>>>>>>>> valid, and base.dead is 0.  Unless base.sk is valid but doesn't
>>>>>>>> match
>>>>>>>> sk, this wouldn't appear to fix this issue.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> Hm..  If the association is not marked "dead", it should still have
>>>>>> all
>>>>>> its
>>>>>> transports present.  If you look at the peer.transport_addr_list in
>>>>>> you kdump, is that list empty or not?
>>>>>>
>>>>>> Are any other peer transport pointers set (active_path, retran_path)?
>>>>>>
>>>>>
>>>>> crash> p ((struct sctp_association *) 0xffff8107670e3000).peer
>>>>> $14 = {
>>>>>     rwnd = 65535,
>>>>>     transport_addr_list = {
>>>>>       next = 0xffff8107670e3180,
>>>>>       prev = 0xffff8107670e3180
>>>>>     },
>>>>>     transport_count = 0,
>>>>>     port = 3868,
>>>>>     primary_path = 0x0,
>>>>>     primary_addr = {
>>>>>       v4 = {
>>>>>         sin_family = 0,
>>>>>         sin_port = 0,
>>>>>         sin_addr = {
>>>>>           s_addr = 0
>>>>>         },
>>>>>         __pad = "\000\000\000\000\000\000\000"
>>>>>       },
>>>>>       v6 = {
>>>>>         sin6_family = 0,
>>>>>         sin6_port = 0,
>>>>>         sin6_flowinfo = 0,
>>>>>         sin6_addr = {
>>>>>           in6_u = {
>>>>>             u6_addr8 >>>>> "\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000",
>>>>>             u6_addr16 = {0, 0, 0, 0, 0, 0, 0, 0},
>>>>>             u6_addr32 = {0, 0, 0, 0}
>>>>>           }
>>>>>         },
>>>>>         sin6_scope_id = 0
>>>>>       },
>>>>>       sa = {
>>>>>         sa_family = 0,
>>>>>         sa_data >>>>> "\000\000\000\000\000\000\000\000\000\000\000\000\000"
>>>>>       }
>>>>>     },
>>>>>     active_path = 0x0,
>>>>>     retran_path = 0x0,
>>>>>     last_sent_to = 0x0,
>>>>>     last_data_from = 0x0,
>>>>>     tsn_map = {
>>>>>       tsn_map = 0x0,
>>>>>       base_tsn = 0,
>>>>>       cumulative_tsn_ack_point = 0,
>>>>>       max_tsn_seen = 0,
>>>>>       len = 0,
>>>>>       pending_data = 0,
>>>>>       num_dup_tsns = 0,
>>>>>       dup_tsns = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}
>>>>>     },
>>>>>     sack_needed = 1 '\001',
>>>>>     sack_cnt = 0,
>>>>>     ecn_capable = 0 '\0',
>>>>>     ipv4_address = 1 '\001',
>>>>>     ipv6_address = 0 '\0',
>>>>>     hostname_address = 0 '\0',
>>>>>     asconf_capable = 0 '\0',
>>>>>     prsctp_capable = 0 '\0',
>>>>>     auth_capable = 0 '\0',
>>>>>     adaptation_ind = 0,
>>>>>     addip_disabled_mask = 0,
>>>>>     i = {
>>>>>       init_tag = 0,
>>>>>       a_rwnd = 0,
>>>>>       num_outbound_streams = 0,
>>>>>       num_inbound_streams = 0,
>>>>>       initial_tsn = 0
>>>>>     },
>>>>>     cookie_len = 0,
>>>>>     cookie = 0x0,
>>>>>     addip_serial = 0,
>>>>>     peer_random = 0x0,
>>>>>     peer_chunks = 0x0,
>>>>>     peer_hmacs = 0x0
>>>>> }
>>>>>
>>>>>>
>>>>>>>>
>>>>>>>> Karl
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Vlad,
>>>>>>>
>>>>>>> One other thing, with the difficulty we are having recreating this
>>>>>>> issue, is there any generic way to increase the likelihood for the
>>>>>>> transport to be cleared out while delaying the association cleanup?
>>>>>>> Is there any way that the association is initialized without any
>>>>>>> transport information?
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> When the association is initialized, the lists are empty, but the next
>>>>>> thing that happens is that we add transport of the destination we are
>>>>>> sending to or receiving from to the association and mark it as primary
>>>>>> and
>>>>>> active.  All this happens under a socket lock, so getsockopt can't
>>>>>> access the association until all actions on that association complete.
>>>>>>
>>>>>>
>>>>>>> The reason I ask; we believe the issue is
>>>>>>> happening very shortly after the association is brought up (we bring
>>>>>>> it up and then do the getsockopt()).
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> Can you check what the association state is?  Alternately, can you
>>>>>> provide
>>>>>> the kdump and the kernel so I can dig around.
>>>>>
>>>>>
>>>>>
>>>>> crash> p ((struct sctp_association *) 0xffff8107670e3000).state
>>>>> $15 = SCTP_STATE_CLOSED
>>>>
>>>>
>>>>
>>>>
>>>> Hi Karl
>>>>
>>>> Was this the client or the server side?  Also what was the socket type
>>>> (STREAM or SEQPACKET)?
>>>>
>>>> -vlad
>>>>>
>>>>>
>>>>>
>>>>>>
>>>>>> Thanks
>>>>>> -vlad
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Karl
>>>>>>>
>>>>>>
>>>>
>>>
>>> We believe this is occurring on the client side (still working on
>>> confirming, this system is a Diameter router so we get connections
>>> going in both directions).  The connections are all STREAM.  We are
>>> also seeing ABORTs fairly regularly on the connections in suspect.
>>>
>>> Karl
>>
>>
>> So we finally got a capture around the time of the panic.  The
>> panicing system is acting as a server and the client is connecting,
>> gets through INIT and COOKIE_ECHO, and sends several data packets when
>> the client sends another INIT.  At this point, the server handles the
>> INIT, starts over and it starts sending data packets again when the
>> server sends an ABORT because the application doesn't support
>> restarting the connection.
>
>
> Do you know if this is done through SO_LINGER or with sendmsg and MSG_ABORT?
>
> -vlad
>
>
>>  It is around this time that the panic
>> occurs.  One thing that I noticed is that the sctp_association
>> structure looks awfully similar to a temporary association that is
>> created when an unexpected INIT is received, but before it is
>> populated with peer information. However the temp value is not set to
>> 0 as would be expected.
>>
>> Karl
>>
>

This is done with SO_LINGER.  However, we just were able to reproduce the issue.

When a duplicate cookie-echo message is received, and the
sctp_sf_do_5_2_4_dupcook() => sctp_unpack_cookie() is called, it calls
sctp_association_new() instead of sctp_make_temp_asoc(), and ends up
creating a full-fledged association instead of one with "temp" set.
Now, if we enter collision case, the primary path does not get written
in the association. When the next command is set to SCTP_CMD_NEW_ASOC,
since the association does not have "temp" marked, it gets added to
the association hash table and the endpoint. Even when the command
SCTP_CMD_DELETE_TCB is processed, since the association is not
temporary, the following check in sctp_cmd_delete_tcb() prevents the
association from being deleted from the hash table or the endpoint.

                if (sctp_style(sk, TCP) && sctp_sstate(sk, LISTENING) &&
                    (!asoc->temp) && (sk->sk_shutdown != SHUTDOWN_MASK))
                                return;

                sctp_unhash_established(asoc);
     <<< never reached
                sctp_association_free(asoc);
           <<< never reached

When we duplicate the traffic using netem, we are able to get this to
occur when getsockopt(SCTP_STATUS) is called due to the transport
being NULL.

Karl

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: NULL primary_path
  2013-03-06 22:57 NULL primary_path Karl Heiss
                   ` (19 preceding siblings ...)
  2013-03-11 22:44 ` Karl Heiss
@ 2013-03-11 23:10 ` Vlad Yasevich
  2013-03-12  1:05 ` Karl Heiss
                   ` (2 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Vlad Yasevich @ 2013-03-11 23:10 UTC (permalink / raw)
  To: linux-sctp

[-- Attachment #1: Type: text/plain, Size: 12856 bytes --]

On 03/11/2013 06:44 PM, Karl Heiss wrote:
> On Mon, Mar 11, 2013 at 5:59 PM, Vlad Yasevich <vyasevich@gmail.com> wrote:
>> On 03/09/2013 03:19 PM, Karl Heiss wrote:
>>>
>>> On Fri, Mar 8, 2013 at 12:06 PM, Karl Heiss <kheiss@gmail.com> wrote:
>>>>
>>>> On Fri, Mar 8, 2013 at 11:42 AM, Vlad Yasevich <vyasevich@gmail.com>
>>>> wrote:
>>>>>
>>>>> On 03/08/2013 10:37 AM, Karl Heiss wrote:
>>>>>>
>>>>>>
>>>>>> On Fri, Mar 8, 2013 at 10:31 AM, Vlad Yasevich <vyasevich@gmail.com>
>>>>>> wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 03/08/2013 09:31 AM, Karl Heiss wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, Mar 8, 2013 at 8:52 AM, Karl Heiss <kheiss@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Thu, Mar 7, 2013 at 6:09 PM, Vlad Yasevich <vyasevich@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 03/07/2013 04:51 PM, Karl Heiss wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Mar 7, 2013 at 12:17 PM, Vlad Yasevich
>>>>>>>>>>> <vyasevich@gmail.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 03/07/2013 12:06 PM, Karl Heiss wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> The issue appears to manifest itself when the connection is
>>>>>>>>>>>>> closed
>>>>>>>>>>>>> from the remote end and getsockopt(SCTP_STATUS) is called within
>>>>>>>>>>>>> a
>>>>>>>>>>>>> small window in which the association is still valid but
>>>>>>>>>>>>> asoc->peer.primary_path is NULL.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Aha!  Thanks.  There was a bug in the rcu clean-up that allowed
>>>>>>>>>>>> the
>>>>>>>>>>>> association to remain while all transports have been removed.
>>>>>>>>>>>>
>>>>>>>>>>>> Here is a patch that should have addressed this condition:
>>>>>>>>>>>>
>>>>>>>>>>>> commit 8c98653f05534acd1cb07ea4929702a3659177d1
>>>>>>>>>>>> Author: Daniel Borkmann <dborkman@redhat.com>
>>>>>>>>>>>> Date:   Fri Feb 1 04:37:43 2013 +0000
>>>>>>>>>>>>
>>>>>>>>>>>>          sctp: sctp_close: fix release of bindings for deferred
>>>>>>>>>>>> call_rcu's
>>>>>>>>>>>>
>>>>>>>>>>>> Full patch is here:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=8c98653f05534acd1cb07ea4929702a3659177d1
>>>>>>>>>>>>
>>>>>>>>>>>> Make sure that you have this patch in the kernel you are running
>>>>>>>>>>>>
>>>>>>>>>>>> -vlad
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Unfortunately this patch wont apply to the version of the SCTP
>>>>>>>>>>> stack
>>>>>>>>>>> that we are using (2.6.36.2) since it does not have a
>>>>>>>>>>> sctp_transport_destroy_rcu() function.  Is there any chance that
>>>>>>>>>>> simply swapping the order of the instructions without moving them
>>>>>>>>>>> would have any effect?  I ask this hypothetically because the race
>>>>>>>>>>> condition window seems to be difficult to recreate, thus nothing
>>>>>>>>>>> to
>>>>>>>>>>> test against (aside from in the field!).
>>>>>>>>>>>
>>>>>>>>>>> Karl
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Hi Karl
>>>>>>>>>>
>>>>>>>>>> I think I see the problem now.  The problem happens when the
>>>>>>>>>> association
>>>>>>>>>> is
>>>>>>>>>> destroyed.  We delay removing the association from
>>>>>>>>>> the association id pool until all references on the association
>>>>>>>>>> have dropped.  As a result, it is possible (for a very short
>>>>>>>>>> period of time) for an association structure to still exist in
>>>>>>>>>> the kernel and still be found via the association id, but that
>>>>>>>>>> association
>>>>>>>>>> has no transports and is about to be completely destroyed.
>>>>>>>>>>
>>>>>>>>>> This is a really interesting race and I need to figure out if it is
>>>>>>>>>> there on purpose or not?
>>>>>>>>>>
>>>>>>>>>> In the mean time, here is a patch that should solve it for you.
>>>>>>>>>>
>>>>>>>>>> diff --git a/net/sctp/socket.c b/net/sctp/socket.c
>>>>>>>>>> index b907073..2d92c89 100644
>>>>>>>>>> --- a/net/sctp/socket.c
>>>>>>>>>> +++ b/net/sctp/socket.c
>>>>>>>>>> @@ -223,7 +223,7 @@ struct sctp_association *sctp_id2assoc(struct
>>>>>>>>>> sock
>>>>>>>>>> *sk,
>>>>>>>>>> sctp_assoc_t id)
>>>>>>>>>>                     if (!list_empty(&sctp_sk(sk)->ep->asocs))
>>>>>>>>>>                             asoc =
>>>>>>>>>> list_entry(sctp_sk(sk)->ep->asocs.next,
>>>>>>>>>>                                               struct
>>>>>>>>>> sctp_association,
>>>>>>>>>> asocs);
>>>>>>>>>> -               return asoc;
>>>>>>>>>> +               goto done;
>>>>>>>>>>             }
>>>>>>>>>>
>>>>>>>>>>             /* Otherwise this is a UDP-style socket. */
>>>>>>>>>> @@ -234,6 +234,7 @@ struct sctp_association *sctp_id2assoc(struct
>>>>>>>>>> sock
>>>>>>>>>> *sk,
>>>>>>>>>> sctp_assoc_t id)
>>>>>>>>>>             asoc = (struct sctp_association
>>>>>>>>>> *)idr_find(&sctp_assocs_id,
>>>>>>>>>> (int)id);
>>>>>>>>>>             spin_unlock_bh(&sctp_assocs_id_lock);
>>>>>>>>>>
>>>>>>>>>> +done:
>>>>>>>>>>             if (!asoc || (asoc->base.sk != sk) || asoc->base.dead)
>>>>>>>>>>                     return NULL;
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Vlad,
>>>>>>>>>
>>>>>>>>> Looking at the kdump from the panic, I am seeing that your patch
>>>>>>>>> above
>>>>>>>>> may not work in this case since the asoc is valid, the base.sk is
>>>>>>>>> valid, and base.dead is 0.  Unless base.sk is valid but doesn't
>>>>>>>>> match
>>>>>>>>> sk, this wouldn't appear to fix this issue.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Hm..  If the association is not marked "dead", it should still have
>>>>>>> all
>>>>>>> its
>>>>>>> transports present.  If you look at the peer.transport_addr_list in
>>>>>>> you kdump, is that list empty or not?
>>>>>>>
>>>>>>> Are any other peer transport pointers set (active_path, retran_path)?
>>>>>>>
>>>>>>
>>>>>> crash> p ((struct sctp_association *) 0xffff8107670e3000).peer
>>>>>> $14 = {
>>>>>>      rwnd = 65535,
>>>>>>      transport_addr_list = {
>>>>>>        next = 0xffff8107670e3180,
>>>>>>        prev = 0xffff8107670e3180
>>>>>>      },
>>>>>>      transport_count = 0,
>>>>>>      port = 3868,
>>>>>>      primary_path = 0x0,
>>>>>>      primary_addr = {
>>>>>>        v4 = {
>>>>>>          sin_family = 0,
>>>>>>          sin_port = 0,
>>>>>>          sin_addr = {
>>>>>>            s_addr = 0
>>>>>>          },
>>>>>>          __pad = "\000\000\000\000\000\000\000"
>>>>>>        },
>>>>>>        v6 = {
>>>>>>          sin6_family = 0,
>>>>>>          sin6_port = 0,
>>>>>>          sin6_flowinfo = 0,
>>>>>>          sin6_addr = {
>>>>>>            in6_u = {
>>>>>>              u6_addr8 =
>>>>>> "\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000",
>>>>>>              u6_addr16 = {0, 0, 0, 0, 0, 0, 0, 0},
>>>>>>              u6_addr32 = {0, 0, 0, 0}
>>>>>>            }
>>>>>>          },
>>>>>>          sin6_scope_id = 0
>>>>>>        },
>>>>>>        sa = {
>>>>>>          sa_family = 0,
>>>>>>          sa_data =
>>>>>> "\000\000\000\000\000\000\000\000\000\000\000\000\000"
>>>>>>        }
>>>>>>      },
>>>>>>      active_path = 0x0,
>>>>>>      retran_path = 0x0,
>>>>>>      last_sent_to = 0x0,
>>>>>>      last_data_from = 0x0,
>>>>>>      tsn_map = {
>>>>>>        tsn_map = 0x0,
>>>>>>        base_tsn = 0,
>>>>>>        cumulative_tsn_ack_point = 0,
>>>>>>        max_tsn_seen = 0,
>>>>>>        len = 0,
>>>>>>        pending_data = 0,
>>>>>>        num_dup_tsns = 0,
>>>>>>        dup_tsns = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}
>>>>>>      },
>>>>>>      sack_needed = 1 '\001',
>>>>>>      sack_cnt = 0,
>>>>>>      ecn_capable = 0 '\0',
>>>>>>      ipv4_address = 1 '\001',
>>>>>>      ipv6_address = 0 '\0',
>>>>>>      hostname_address = 0 '\0',
>>>>>>      asconf_capable = 0 '\0',
>>>>>>      prsctp_capable = 0 '\0',
>>>>>>      auth_capable = 0 '\0',
>>>>>>      adaptation_ind = 0,
>>>>>>      addip_disabled_mask = 0,
>>>>>>      i = {
>>>>>>        init_tag = 0,
>>>>>>        a_rwnd = 0,
>>>>>>        num_outbound_streams = 0,
>>>>>>        num_inbound_streams = 0,
>>>>>>        initial_tsn = 0
>>>>>>      },
>>>>>>      cookie_len = 0,
>>>>>>      cookie = 0x0,
>>>>>>      addip_serial = 0,
>>>>>>      peer_random = 0x0,
>>>>>>      peer_chunks = 0x0,
>>>>>>      peer_hmacs = 0x0
>>>>>> }
>>>>>>
>>>>>>>
>>>>>>>>>
>>>>>>>>> Karl
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Vlad,
>>>>>>>>
>>>>>>>> One other thing, with the difficulty we are having recreating this
>>>>>>>> issue, is there any generic way to increase the likelihood for the
>>>>>>>> transport to be cleared out while delaying the association cleanup?
>>>>>>>> Is there any way that the association is initialized without any
>>>>>>>> transport information?
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> When the association is initialized, the lists are empty, but the next
>>>>>>> thing that happens is that we add transport of the destination we are
>>>>>>> sending to or receiving from to the association and mark it as primary
>>>>>>> and
>>>>>>> active.  All this happens under a socket lock, so getsockopt can't
>>>>>>> access the association until all actions on that association complete.
>>>>>>>
>>>>>>>
>>>>>>>> The reason I ask; we believe the issue is
>>>>>>>> happening very shortly after the association is brought up (we bring
>>>>>>>> it up and then do the getsockopt()).
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Can you check what the association state is?  Alternately, can you
>>>>>>> provide
>>>>>>> the kdump and the kernel so I can dig around.
>>>>>>
>>>>>>
>>>>>>
>>>>>> crash> p ((struct sctp_association *) 0xffff8107670e3000).state
>>>>>> $15 = SCTP_STATE_CLOSED
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Hi Karl
>>>>>
>>>>> Was this the client or the server side?  Also what was the socket type
>>>>> (STREAM or SEQPACKET)?
>>>>>
>>>>> -vlad
>>>>>>
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> Thanks
>>>>>>> -vlad
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Karl
>>>>>>>>
>>>>>>>
>>>>>
>>>>
>>>> We believe this is occurring on the client side (still working on
>>>> confirming, this system is a Diameter router so we get connections
>>>> going in both directions).  The connections are all STREAM.  We are
>>>> also seeing ABORTs fairly regularly on the connections in suspect.
>>>>
>>>> Karl
>>>
>>>
>>> So we finally got a capture around the time of the panic.  The
>>> panicing system is acting as a server and the client is connecting,
>>> gets through INIT and COOKIE_ECHO, and sends several data packets when
>>> the client sends another INIT.  At this point, the server handles the
>>> INIT, starts over and it starts sending data packets again when the
>>> server sends an ABORT because the application doesn't support
>>> restarting the connection.
>>
>>
>> Do you know if this is done through SO_LINGER or with sendmsg and MSG_ABORT?
>>
>> -vlad
>>
>>
>>>   It is around this time that the panic
>>> occurs.  One thing that I noticed is that the sctp_association
>>> structure looks awfully similar to a temporary association that is
>>> created when an unexpected INIT is received, but before it is
>>> populated with peer information. However the temp value is not set to
>>> 0 as would be expected.
>>>
>>> Karl
>>>
>>
>
> This is done with SO_LINGER.  However, we just were able to reproduce the issue.
>
> When a duplicate cookie-echo message is received, and the
> sctp_sf_do_5_2_4_dupcook() => sctp_unpack_cookie() is called, it calls
> sctp_association_new() instead of sctp_make_temp_asoc(), and ends up
> creating a full-fledged association instead of one with "temp" set.
> Now, if we enter collision case, the primary path does not get written
> in the association. When the next command is set to SCTP_CMD_NEW_ASOC,
> since the association does not have "temp" marked, it gets added to
> the association hash table and the endpoint. Even when the command
> SCTP_CMD_DELETE_TCB is processed, since the association is not
> temporary, the following check in sctp_cmd_delete_tcb() prevents the
> association from being deleted from the hash table or the endpoint.
>
>                  if (sctp_style(sk, TCP) && sctp_sstate(sk, LISTENING) &&
>                      (!asoc->temp) && (sk->sk_shutdown != SHUTDOWN_MASK))
>                                  return;
>
>                  sctp_unhash_established(asoc);
>       <<< never reached
>                  sctp_association_free(asoc);
>             <<< never reached
>
> When we duplicate the traffic using netem, we are able to get this to
> occur when getsockopt(SCTP_STATUS) is called due to the transport
> being NULL.
>
> Karl

Hi Karl

Yep, this is the code I've been looking at as well, just didn't get far
enough.  I was focusing the dookcook_a case().

I'm attaching a patch (untested) that should fix this.

-vlad
>


[-- Attachment #2: test.patch --]
[-- Type: text/x-patch, Size: 557 bytes --]

diff --git a/net/sctp/sm_statefuns.c b/net/sctp/sm_statefuns.c
index 5131fcf..de1a013 100644
--- a/net/sctp/sm_statefuns.c
+++ b/net/sctp/sm_statefuns.c
@@ -2082,7 +2082,7 @@ sctp_disposition_t sctp_sf_do_5_2_4_dupcook(struct net *net,
 	}
 
 	/* Delete the tempory new association. */
-	sctp_add_cmd_sf(commands, SCTP_CMD_NEW_ASOC, SCTP_ASOC(new_asoc));
+	sctp_add_cmd_sf(commands, SCTP_CMD_SET_ASOC, SCTP_ASOC(new_asoc));
 	sctp_add_cmd_sf(commands, SCTP_CMD_DELETE_TCB, SCTP_NULL());
 
 	/* Restore association pointer to provide SCTP command interpeter

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: NULL primary_path
  2013-03-06 22:57 NULL primary_path Karl Heiss
                   ` (20 preceding siblings ...)
  2013-03-11 23:10 ` Vlad Yasevich
@ 2013-03-12  1:05 ` Karl Heiss
  2013-03-12 16:18 ` Karl Heiss
  2013-03-12 17:23 ` Vlad Yasevich
  23 siblings, 0 replies; 25+ messages in thread
From: Karl Heiss @ 2013-03-12  1:05 UTC (permalink / raw)
  To: linux-sctp

On Mon, Mar 11, 2013 at 7:10 PM, Vlad Yasevich <vyasevich@gmail.com> wrote:
> On 03/11/2013 06:44 PM, Karl Heiss wrote:
>>
>> On Mon, Mar 11, 2013 at 5:59 PM, Vlad Yasevich <vyasevich@gmail.com>
>> wrote:
>>>
>>> On 03/09/2013 03:19 PM, Karl Heiss wrote:
>>>>
>>>>
>>>> On Fri, Mar 8, 2013 at 12:06 PM, Karl Heiss <kheiss@gmail.com> wrote:
>>>>>
>>>>>
>>>>> On Fri, Mar 8, 2013 at 11:42 AM, Vlad Yasevich <vyasevich@gmail.com>
>>>>> wrote:
>>>>>>
>>>>>>
>>>>>> On 03/08/2013 10:37 AM, Karl Heiss wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Mar 8, 2013 at 10:31 AM, Vlad Yasevich <vyasevich@gmail.com>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 03/08/2013 09:31 AM, Karl Heiss wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Fri, Mar 8, 2013 at 8:52 AM, Karl Heiss <kheiss@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Thu, Mar 7, 2013 at 6:09 PM, Vlad Yasevich
>>>>>>>>>> <vyasevich@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 03/07/2013 04:51 PM, Karl Heiss wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, Mar 7, 2013 at 12:17 PM, Vlad Yasevich
>>>>>>>>>>>> <vyasevich@gmail.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 03/07/2013 12:06 PM, Karl Heiss wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The issue appears to manifest itself when the connection is
>>>>>>>>>>>>>> closed
>>>>>>>>>>>>>> from the remote end and getsockopt(SCTP_STATUS) is called
>>>>>>>>>>>>>> within
>>>>>>>>>>>>>> a
>>>>>>>>>>>>>> small window in which the association is still valid but
>>>>>>>>>>>>>> asoc->peer.primary_path is NULL.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Aha!  Thanks.  There was a bug in the rcu clean-up that allowed
>>>>>>>>>>>>> the
>>>>>>>>>>>>> association to remain while all transports have been removed.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Here is a patch that should have addressed this condition:
>>>>>>>>>>>>>
>>>>>>>>>>>>> commit 8c98653f05534acd1cb07ea4929702a3659177d1
>>>>>>>>>>>>> Author: Daniel Borkmann <dborkman@redhat.com>
>>>>>>>>>>>>> Date:   Fri Feb 1 04:37:43 2013 +0000
>>>>>>>>>>>>>
>>>>>>>>>>>>>          sctp: sctp_close: fix release of bindings for deferred
>>>>>>>>>>>>> call_rcu's
>>>>>>>>>>>>>
>>>>>>>>>>>>> Full patch is here:
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?idŒ98653f05534acd1cb07ea4929702a3659177d1
>>>>>>>>>>>>>
>>>>>>>>>>>>> Make sure that you have this patch in the kernel you are
>>>>>>>>>>>>> running
>>>>>>>>>>>>>
>>>>>>>>>>>>> -vlad
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Unfortunately this patch wont apply to the version of the SCTP
>>>>>>>>>>>> stack
>>>>>>>>>>>> that we are using (2.6.36.2) since it does not have a
>>>>>>>>>>>> sctp_transport_destroy_rcu() function.  Is there any chance that
>>>>>>>>>>>> simply swapping the order of the instructions without moving
>>>>>>>>>>>> them
>>>>>>>>>>>> would have any effect?  I ask this hypothetically because the
>>>>>>>>>>>> race
>>>>>>>>>>>> condition window seems to be difficult to recreate, thus nothing
>>>>>>>>>>>> to
>>>>>>>>>>>> test against (aside from in the field!).
>>>>>>>>>>>>
>>>>>>>>>>>> Karl
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Hi Karl
>>>>>>>>>>>
>>>>>>>>>>> I think I see the problem now.  The problem happens when the
>>>>>>>>>>> association
>>>>>>>>>>> is
>>>>>>>>>>> destroyed.  We delay removing the association from
>>>>>>>>>>> the association id pool until all references on the association
>>>>>>>>>>> have dropped.  As a result, it is possible (for a very short
>>>>>>>>>>> period of time) for an association structure to still exist in
>>>>>>>>>>> the kernel and still be found via the association id, but that
>>>>>>>>>>> association
>>>>>>>>>>> has no transports and is about to be completely destroyed.
>>>>>>>>>>>
>>>>>>>>>>> This is a really interesting race and I need to figure out if it
>>>>>>>>>>> is
>>>>>>>>>>> there on purpose or not?
>>>>>>>>>>>
>>>>>>>>>>> In the mean time, here is a patch that should solve it for you.
>>>>>>>>>>>
>>>>>>>>>>> diff --git a/net/sctp/socket.c b/net/sctp/socket.c
>>>>>>>>>>> index b907073..2d92c89 100644
>>>>>>>>>>> --- a/net/sctp/socket.c
>>>>>>>>>>> +++ b/net/sctp/socket.c
>>>>>>>>>>> @@ -223,7 +223,7 @@ struct sctp_association *sctp_id2assoc(struct
>>>>>>>>>>> sock
>>>>>>>>>>> *sk,
>>>>>>>>>>> sctp_assoc_t id)
>>>>>>>>>>>                     if (!list_empty(&sctp_sk(sk)->ep->asocs))
>>>>>>>>>>>                             asoc >>>>>>>>>>> list_entry(sctp_sk(sk)->ep->asocs.next,
>>>>>>>>>>>                                               struct
>>>>>>>>>>> sctp_association,
>>>>>>>>>>> asocs);
>>>>>>>>>>> -               return asoc;
>>>>>>>>>>> +               goto done;
>>>>>>>>>>>             }
>>>>>>>>>>>
>>>>>>>>>>>             /* Otherwise this is a UDP-style socket. */
>>>>>>>>>>> @@ -234,6 +234,7 @@ struct sctp_association *sctp_id2assoc(struct
>>>>>>>>>>> sock
>>>>>>>>>>> *sk,
>>>>>>>>>>> sctp_assoc_t id)
>>>>>>>>>>>             asoc = (struct sctp_association
>>>>>>>>>>> *)idr_find(&sctp_assocs_id,
>>>>>>>>>>> (int)id);
>>>>>>>>>>>             spin_unlock_bh(&sctp_assocs_id_lock);
>>>>>>>>>>>
>>>>>>>>>>> +done:
>>>>>>>>>>>             if (!asoc || (asoc->base.sk != sk) ||
>>>>>>>>>>> asoc->base.dead)
>>>>>>>>>>>                     return NULL;
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Vlad,
>>>>>>>>>>
>>>>>>>>>> Looking at the kdump from the panic, I am seeing that your patch
>>>>>>>>>> above
>>>>>>>>>> may not work in this case since the asoc is valid, the base.sk is
>>>>>>>>>> valid, and base.dead is 0.  Unless base.sk is valid but doesn't
>>>>>>>>>> match
>>>>>>>>>> sk, this wouldn't appear to fix this issue.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Hm..  If the association is not marked "dead", it should still have
>>>>>>>> all
>>>>>>>> its
>>>>>>>> transports present.  If you look at the peer.transport_addr_list in
>>>>>>>> you kdump, is that list empty or not?
>>>>>>>>
>>>>>>>> Are any other peer transport pointers set (active_path,
>>>>>>>> retran_path)?
>>>>>>>>
>>>>>>>
>>>>>>> crash> p ((struct sctp_association *) 0xffff8107670e3000).peer
>>>>>>> $14 = {
>>>>>>>      rwnd = 65535,
>>>>>>>      transport_addr_list = {
>>>>>>>        next = 0xffff8107670e3180,
>>>>>>>        prev = 0xffff8107670e3180
>>>>>>>      },
>>>>>>>      transport_count = 0,
>>>>>>>      port = 3868,
>>>>>>>      primary_path = 0x0,
>>>>>>>      primary_addr = {
>>>>>>>        v4 = {
>>>>>>>          sin_family = 0,
>>>>>>>          sin_port = 0,
>>>>>>>          sin_addr = {
>>>>>>>            s_addr = 0
>>>>>>>          },
>>>>>>>          __pad = "\000\000\000\000\000\000\000"
>>>>>>>        },
>>>>>>>        v6 = {
>>>>>>>          sin6_family = 0,
>>>>>>>          sin6_port = 0,
>>>>>>>          sin6_flowinfo = 0,
>>>>>>>          sin6_addr = {
>>>>>>>            in6_u = {
>>>>>>>              u6_addr8 >>>>>>> "\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000",
>>>>>>>              u6_addr16 = {0, 0, 0, 0, 0, 0, 0, 0},
>>>>>>>              u6_addr32 = {0, 0, 0, 0}
>>>>>>>            }
>>>>>>>          },
>>>>>>>          sin6_scope_id = 0
>>>>>>>        },
>>>>>>>        sa = {
>>>>>>>          sa_family = 0,
>>>>>>>          sa_data >>>>>>> "\000\000\000\000\000\000\000\000\000\000\000\000\000"
>>>>>>>        }
>>>>>>>      },
>>>>>>>      active_path = 0x0,
>>>>>>>      retran_path = 0x0,
>>>>>>>      last_sent_to = 0x0,
>>>>>>>      last_data_from = 0x0,
>>>>>>>      tsn_map = {
>>>>>>>        tsn_map = 0x0,
>>>>>>>        base_tsn = 0,
>>>>>>>        cumulative_tsn_ack_point = 0,
>>>>>>>        max_tsn_seen = 0,
>>>>>>>        len = 0,
>>>>>>>        pending_data = 0,
>>>>>>>        num_dup_tsns = 0,
>>>>>>>        dup_tsns = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}
>>>>>>>      },
>>>>>>>      sack_needed = 1 '\001',
>>>>>>>      sack_cnt = 0,
>>>>>>>      ecn_capable = 0 '\0',
>>>>>>>      ipv4_address = 1 '\001',
>>>>>>>      ipv6_address = 0 '\0',
>>>>>>>      hostname_address = 0 '\0',
>>>>>>>      asconf_capable = 0 '\0',
>>>>>>>      prsctp_capable = 0 '\0',
>>>>>>>      auth_capable = 0 '\0',
>>>>>>>      adaptation_ind = 0,
>>>>>>>      addip_disabled_mask = 0,
>>>>>>>      i = {
>>>>>>>        init_tag = 0,
>>>>>>>        a_rwnd = 0,
>>>>>>>        num_outbound_streams = 0,
>>>>>>>        num_inbound_streams = 0,
>>>>>>>        initial_tsn = 0
>>>>>>>      },
>>>>>>>      cookie_len = 0,
>>>>>>>      cookie = 0x0,
>>>>>>>      addip_serial = 0,
>>>>>>>      peer_random = 0x0,
>>>>>>>      peer_chunks = 0x0,
>>>>>>>      peer_hmacs = 0x0
>>>>>>> }
>>>>>>>
>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Karl
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Vlad,
>>>>>>>>>
>>>>>>>>> One other thing, with the difficulty we are having recreating this
>>>>>>>>> issue, is there any generic way to increase the likelihood for the
>>>>>>>>> transport to be cleared out while delaying the association cleanup?
>>>>>>>>> Is there any way that the association is initialized without any
>>>>>>>>> transport information?
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> When the association is initialized, the lists are empty, but the
>>>>>>>> next
>>>>>>>> thing that happens is that we add transport of the destination we
>>>>>>>> are
>>>>>>>> sending to or receiving from to the association and mark it as
>>>>>>>> primary
>>>>>>>> and
>>>>>>>> active.  All this happens under a socket lock, so getsockopt can't
>>>>>>>> access the association until all actions on that association
>>>>>>>> complete.
>>>>>>>>
>>>>>>>>
>>>>>>>>> The reason I ask; we believe the issue is
>>>>>>>>> happening very shortly after the association is brought up (we
>>>>>>>>> bring
>>>>>>>>> it up and then do the getsockopt()).
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Can you check what the association state is?  Alternately, can you
>>>>>>>> provide
>>>>>>>> the kdump and the kernel so I can dig around.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> crash> p ((struct sctp_association *) 0xffff8107670e3000).state
>>>>>>> $15 = SCTP_STATE_CLOSED
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> Hi Karl
>>>>>>
>>>>>> Was this the client or the server side?  Also what was the socket type
>>>>>> (STREAM or SEQPACKET)?
>>>>>>
>>>>>> -vlad
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> -vlad
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Karl
>>>>>>>>>
>>>>>>>>
>>>>>>
>>>>>
>>>>> We believe this is occurring on the client side (still working on
>>>>> confirming, this system is a Diameter router so we get connections
>>>>> going in both directions).  The connections are all STREAM.  We are
>>>>> also seeing ABORTs fairly regularly on the connections in suspect.
>>>>>
>>>>> Karl
>>>>
>>>>
>>>>
>>>> So we finally got a capture around the time of the panic.  The
>>>> panicing system is acting as a server and the client is connecting,
>>>> gets through INIT and COOKIE_ECHO, and sends several data packets when
>>>> the client sends another INIT.  At this point, the server handles the
>>>> INIT, starts over and it starts sending data packets again when the
>>>> server sends an ABORT because the application doesn't support
>>>> restarting the connection.
>>>
>>>
>>>
>>> Do you know if this is done through SO_LINGER or with sendmsg and
>>> MSG_ABORT?
>>>
>>> -vlad
>>>
>>>
>>>>   It is around this time that the panic
>>>> occurs.  One thing that I noticed is that the sctp_association
>>>> structure looks awfully similar to a temporary association that is
>>>> created when an unexpected INIT is received, but before it is
>>>> populated with peer information. However the temp value is not set to
>>>> 0 as would be expected.
>>>>
>>>> Karl
>>>>
>>>
>>
>> This is done with SO_LINGER.  However, we just were able to reproduce the
>> issue.
>>
>> When a duplicate cookie-echo message is received, and the
>> sctp_sf_do_5_2_4_dupcook() => sctp_unpack_cookie() is called, it calls
>> sctp_association_new() instead of sctp_make_temp_asoc(), and ends up
>> creating a full-fledged association instead of one with "temp" set.
>> Now, if we enter collision case, the primary path does not get written
>> in the association. When the next command is set to SCTP_CMD_NEW_ASOC,
>> since the association does not have "temp" marked, it gets added to
>> the association hash table and the endpoint. Even when the command
>> SCTP_CMD_DELETE_TCB is processed, since the association is not
>> temporary, the following check in sctp_cmd_delete_tcb() prevents the
>> association from being deleted from the hash table or the endpoint.
>>
>>                  if (sctp_style(sk, TCP) && sctp_sstate(sk, LISTENING) &&
>>                      (!asoc->temp) && (sk->sk_shutdown != SHUTDOWN_MASK))
>>                                  return;
>>
>>                  sctp_unhash_established(asoc);
>>       <<< never reached
>>                  sctp_association_free(asoc);
>>             <<< never reached
>>
>> When we duplicate the traffic using netem, we are able to get this to
>> occur when getsockopt(SCTP_STATUS) is called due to the transport
>> being NULL.
>>
>> Karl
>
>
> Hi Karl
>
> Yep, this is the code I've been looking at as well, just didn't get far
> enough.  I was focusing the dookcook_a case().
>
> I'm attaching a patch (untested) that should fix this.
>
> -vlad
>>
>>
>
Vlad,

That looks promising, however SCTP_CMD_SET_ASOC doesn't exist in this
(2.6.36.2) SCTP stack.  I will look into backporting this side effect
state or finding an alternate way of preventing the association from
being added to the endpoint.

Karl

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: NULL primary_path
  2013-03-06 22:57 NULL primary_path Karl Heiss
                   ` (21 preceding siblings ...)
  2013-03-12  1:05 ` Karl Heiss
@ 2013-03-12 16:18 ` Karl Heiss
  2013-03-12 17:23 ` Vlad Yasevich
  23 siblings, 0 replies; 25+ messages in thread
From: Karl Heiss @ 2013-03-12 16:18 UTC (permalink / raw)
  To: linux-sctp

On Mon, Mar 11, 2013 at 9:05 PM, Karl Heiss <kheiss@gmail.com> wrote:
> On Mon, Mar 11, 2013 at 7:10 PM, Vlad Yasevich <vyasevich@gmail.com> wrote:
>> On 03/11/2013 06:44 PM, Karl Heiss wrote:
>>>
>>> On Mon, Mar 11, 2013 at 5:59 PM, Vlad Yasevich <vyasevich@gmail.com>
>>> wrote:
>>>>
>>>> On 03/09/2013 03:19 PM, Karl Heiss wrote:
>>>>>
>>>>>
>>>>> On Fri, Mar 8, 2013 at 12:06 PM, Karl Heiss <kheiss@gmail.com> wrote:
>>>>>>
>>>>>>
>>>>>> On Fri, Mar 8, 2013 at 11:42 AM, Vlad Yasevich <vyasevich@gmail.com>
>>>>>> wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 03/08/2013 10:37 AM, Karl Heiss wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, Mar 8, 2013 at 10:31 AM, Vlad Yasevich <vyasevich@gmail.com>
>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 03/08/2013 09:31 AM, Karl Heiss wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Fri, Mar 8, 2013 at 8:52 AM, Karl Heiss <kheiss@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Mar 7, 2013 at 6:09 PM, Vlad Yasevich
>>>>>>>>>>> <vyasevich@gmail.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 03/07/2013 04:51 PM, Karl Heiss wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, Mar 7, 2013 at 12:17 PM, Vlad Yasevich
>>>>>>>>>>>>> <vyasevich@gmail.com>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 03/07/2013 12:06 PM, Karl Heiss wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The issue appears to manifest itself when the connection is
>>>>>>>>>>>>>>> closed
>>>>>>>>>>>>>>> from the remote end and getsockopt(SCTP_STATUS) is called
>>>>>>>>>>>>>>> within
>>>>>>>>>>>>>>> a
>>>>>>>>>>>>>>> small window in which the association is still valid but
>>>>>>>>>>>>>>> asoc->peer.primary_path is NULL.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Aha!  Thanks.  There was a bug in the rcu clean-up that allowed
>>>>>>>>>>>>>> the
>>>>>>>>>>>>>> association to remain while all transports have been removed.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Here is a patch that should have addressed this condition:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> commit 8c98653f05534acd1cb07ea4929702a3659177d1
>>>>>>>>>>>>>> Author: Daniel Borkmann <dborkman@redhat.com>
>>>>>>>>>>>>>> Date:   Fri Feb 1 04:37:43 2013 +0000
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>          sctp: sctp_close: fix release of bindings for deferred
>>>>>>>>>>>>>> call_rcu's
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Full patch is here:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?idŒ98653f05534acd1cb07ea4929702a3659177d1
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Make sure that you have this patch in the kernel you are
>>>>>>>>>>>>>> running
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -vlad
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Unfortunately this patch wont apply to the version of the SCTP
>>>>>>>>>>>>> stack
>>>>>>>>>>>>> that we are using (2.6.36.2) since it does not have a
>>>>>>>>>>>>> sctp_transport_destroy_rcu() function.  Is there any chance that
>>>>>>>>>>>>> simply swapping the order of the instructions without moving
>>>>>>>>>>>>> them
>>>>>>>>>>>>> would have any effect?  I ask this hypothetically because the
>>>>>>>>>>>>> race
>>>>>>>>>>>>> condition window seems to be difficult to recreate, thus nothing
>>>>>>>>>>>>> to
>>>>>>>>>>>>> test against (aside from in the field!).
>>>>>>>>>>>>>
>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Hi Karl
>>>>>>>>>>>>
>>>>>>>>>>>> I think I see the problem now.  The problem happens when the
>>>>>>>>>>>> association
>>>>>>>>>>>> is
>>>>>>>>>>>> destroyed.  We delay removing the association from
>>>>>>>>>>>> the association id pool until all references on the association
>>>>>>>>>>>> have dropped.  As a result, it is possible (for a very short
>>>>>>>>>>>> period of time) for an association structure to still exist in
>>>>>>>>>>>> the kernel and still be found via the association id, but that
>>>>>>>>>>>> association
>>>>>>>>>>>> has no transports and is about to be completely destroyed.
>>>>>>>>>>>>
>>>>>>>>>>>> This is a really interesting race and I need to figure out if it
>>>>>>>>>>>> is
>>>>>>>>>>>> there on purpose or not?
>>>>>>>>>>>>
>>>>>>>>>>>> In the mean time, here is a patch that should solve it for you.
>>>>>>>>>>>>
>>>>>>>>>>>> diff --git a/net/sctp/socket.c b/net/sctp/socket.c
>>>>>>>>>>>> index b907073..2d92c89 100644
>>>>>>>>>>>> --- a/net/sctp/socket.c
>>>>>>>>>>>> +++ b/net/sctp/socket.c
>>>>>>>>>>>> @@ -223,7 +223,7 @@ struct sctp_association *sctp_id2assoc(struct
>>>>>>>>>>>> sock
>>>>>>>>>>>> *sk,
>>>>>>>>>>>> sctp_assoc_t id)
>>>>>>>>>>>>                     if (!list_empty(&sctp_sk(sk)->ep->asocs))
>>>>>>>>>>>>                             asoc >>>>>>>>>>>> list_entry(sctp_sk(sk)->ep->asocs.next,
>>>>>>>>>>>>                                               struct
>>>>>>>>>>>> sctp_association,
>>>>>>>>>>>> asocs);
>>>>>>>>>>>> -               return asoc;
>>>>>>>>>>>> +               goto done;
>>>>>>>>>>>>             }
>>>>>>>>>>>>
>>>>>>>>>>>>             /* Otherwise this is a UDP-style socket. */
>>>>>>>>>>>> @@ -234,6 +234,7 @@ struct sctp_association *sctp_id2assoc(struct
>>>>>>>>>>>> sock
>>>>>>>>>>>> *sk,
>>>>>>>>>>>> sctp_assoc_t id)
>>>>>>>>>>>>             asoc = (struct sctp_association
>>>>>>>>>>>> *)idr_find(&sctp_assocs_id,
>>>>>>>>>>>> (int)id);
>>>>>>>>>>>>             spin_unlock_bh(&sctp_assocs_id_lock);
>>>>>>>>>>>>
>>>>>>>>>>>> +done:
>>>>>>>>>>>>             if (!asoc || (asoc->base.sk != sk) ||
>>>>>>>>>>>> asoc->base.dead)
>>>>>>>>>>>>                     return NULL;
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Vlad,
>>>>>>>>>>>
>>>>>>>>>>> Looking at the kdump from the panic, I am seeing that your patch
>>>>>>>>>>> above
>>>>>>>>>>> may not work in this case since the asoc is valid, the base.sk is
>>>>>>>>>>> valid, and base.dead is 0.  Unless base.sk is valid but doesn't
>>>>>>>>>>> match
>>>>>>>>>>> sk, this wouldn't appear to fix this issue.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Hm..  If the association is not marked "dead", it should still have
>>>>>>>>> all
>>>>>>>>> its
>>>>>>>>> transports present.  If you look at the peer.transport_addr_list in
>>>>>>>>> you kdump, is that list empty or not?
>>>>>>>>>
>>>>>>>>> Are any other peer transport pointers set (active_path,
>>>>>>>>> retran_path)?
>>>>>>>>>
>>>>>>>>
>>>>>>>> crash> p ((struct sctp_association *) 0xffff8107670e3000).peer
>>>>>>>> $14 = {
>>>>>>>>      rwnd = 65535,
>>>>>>>>      transport_addr_list = {
>>>>>>>>        next = 0xffff8107670e3180,
>>>>>>>>        prev = 0xffff8107670e3180
>>>>>>>>      },
>>>>>>>>      transport_count = 0,
>>>>>>>>      port = 3868,
>>>>>>>>      primary_path = 0x0,
>>>>>>>>      primary_addr = {
>>>>>>>>        v4 = {
>>>>>>>>          sin_family = 0,
>>>>>>>>          sin_port = 0,
>>>>>>>>          sin_addr = {
>>>>>>>>            s_addr = 0
>>>>>>>>          },
>>>>>>>>          __pad = "\000\000\000\000\000\000\000"
>>>>>>>>        },
>>>>>>>>        v6 = {
>>>>>>>>          sin6_family = 0,
>>>>>>>>          sin6_port = 0,
>>>>>>>>          sin6_flowinfo = 0,
>>>>>>>>          sin6_addr = {
>>>>>>>>            in6_u = {
>>>>>>>>              u6_addr8 >>>>>>>> "\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000",
>>>>>>>>              u6_addr16 = {0, 0, 0, 0, 0, 0, 0, 0},
>>>>>>>>              u6_addr32 = {0, 0, 0, 0}
>>>>>>>>            }
>>>>>>>>          },
>>>>>>>>          sin6_scope_id = 0
>>>>>>>>        },
>>>>>>>>        sa = {
>>>>>>>>          sa_family = 0,
>>>>>>>>          sa_data >>>>>>>> "\000\000\000\000\000\000\000\000\000\000\000\000\000"
>>>>>>>>        }
>>>>>>>>      },
>>>>>>>>      active_path = 0x0,
>>>>>>>>      retran_path = 0x0,
>>>>>>>>      last_sent_to = 0x0,
>>>>>>>>      last_data_from = 0x0,
>>>>>>>>      tsn_map = {
>>>>>>>>        tsn_map = 0x0,
>>>>>>>>        base_tsn = 0,
>>>>>>>>        cumulative_tsn_ack_point = 0,
>>>>>>>>        max_tsn_seen = 0,
>>>>>>>>        len = 0,
>>>>>>>>        pending_data = 0,
>>>>>>>>        num_dup_tsns = 0,
>>>>>>>>        dup_tsns = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}
>>>>>>>>      },
>>>>>>>>      sack_needed = 1 '\001',
>>>>>>>>      sack_cnt = 0,
>>>>>>>>      ecn_capable = 0 '\0',
>>>>>>>>      ipv4_address = 1 '\001',
>>>>>>>>      ipv6_address = 0 '\0',
>>>>>>>>      hostname_address = 0 '\0',
>>>>>>>>      asconf_capable = 0 '\0',
>>>>>>>>      prsctp_capable = 0 '\0',
>>>>>>>>      auth_capable = 0 '\0',
>>>>>>>>      adaptation_ind = 0,
>>>>>>>>      addip_disabled_mask = 0,
>>>>>>>>      i = {
>>>>>>>>        init_tag = 0,
>>>>>>>>        a_rwnd = 0,
>>>>>>>>        num_outbound_streams = 0,
>>>>>>>>        num_inbound_streams = 0,
>>>>>>>>        initial_tsn = 0
>>>>>>>>      },
>>>>>>>>      cookie_len = 0,
>>>>>>>>      cookie = 0x0,
>>>>>>>>      addip_serial = 0,
>>>>>>>>      peer_random = 0x0,
>>>>>>>>      peer_chunks = 0x0,
>>>>>>>>      peer_hmacs = 0x0
>>>>>>>> }
>>>>>>>>
>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Karl
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Vlad,
>>>>>>>>>>
>>>>>>>>>> One other thing, with the difficulty we are having recreating this
>>>>>>>>>> issue, is there any generic way to increase the likelihood for the
>>>>>>>>>> transport to be cleared out while delaying the association cleanup?
>>>>>>>>>> Is there any way that the association is initialized without any
>>>>>>>>>> transport information?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> When the association is initialized, the lists are empty, but the
>>>>>>>>> next
>>>>>>>>> thing that happens is that we add transport of the destination we
>>>>>>>>> are
>>>>>>>>> sending to or receiving from to the association and mark it as
>>>>>>>>> primary
>>>>>>>>> and
>>>>>>>>> active.  All this happens under a socket lock, so getsockopt can't
>>>>>>>>> access the association until all actions on that association
>>>>>>>>> complete.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> The reason I ask; we believe the issue is
>>>>>>>>>> happening very shortly after the association is brought up (we
>>>>>>>>>> bring
>>>>>>>>>> it up and then do the getsockopt()).
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Can you check what the association state is?  Alternately, can you
>>>>>>>>> provide
>>>>>>>>> the kdump and the kernel so I can dig around.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> crash> p ((struct sctp_association *) 0xffff8107670e3000).state
>>>>>>>> $15 = SCTP_STATE_CLOSED
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Hi Karl
>>>>>>>
>>>>>>> Was this the client or the server side?  Also what was the socket type
>>>>>>> (STREAM or SEQPACKET)?
>>>>>>>
>>>>>>> -vlad
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>> -vlad
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Karl
>>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>>
>>>>>> We believe this is occurring on the client side (still working on
>>>>>> confirming, this system is a Diameter router so we get connections
>>>>>> going in both directions).  The connections are all STREAM.  We are
>>>>>> also seeing ABORTs fairly regularly on the connections in suspect.
>>>>>>
>>>>>> Karl
>>>>>
>>>>>
>>>>>
>>>>> So we finally got a capture around the time of the panic.  The
>>>>> panicing system is acting as a server and the client is connecting,
>>>>> gets through INIT and COOKIE_ECHO, and sends several data packets when
>>>>> the client sends another INIT.  At this point, the server handles the
>>>>> INIT, starts over and it starts sending data packets again when the
>>>>> server sends an ABORT because the application doesn't support
>>>>> restarting the connection.
>>>>
>>>>
>>>>
>>>> Do you know if this is done through SO_LINGER or with sendmsg and
>>>> MSG_ABORT?
>>>>
>>>> -vlad
>>>>
>>>>
>>>>>   It is around this time that the panic
>>>>> occurs.  One thing that I noticed is that the sctp_association
>>>>> structure looks awfully similar to a temporary association that is
>>>>> created when an unexpected INIT is received, but before it is
>>>>> populated with peer information. However the temp value is not set to
>>>>> 0 as would be expected.
>>>>>
>>>>> Karl
>>>>>
>>>>
>>>
>>> This is done with SO_LINGER.  However, we just were able to reproduce the
>>> issue.
>>>
>>> When a duplicate cookie-echo message is received, and the
>>> sctp_sf_do_5_2_4_dupcook() => sctp_unpack_cookie() is called, it calls
>>> sctp_association_new() instead of sctp_make_temp_asoc(), and ends up
>>> creating a full-fledged association instead of one with "temp" set.
>>> Now, if we enter collision case, the primary path does not get written
>>> in the association. When the next command is set to SCTP_CMD_NEW_ASOC,
>>> since the association does not have "temp" marked, it gets added to
>>> the association hash table and the endpoint. Even when the command
>>> SCTP_CMD_DELETE_TCB is processed, since the association is not
>>> temporary, the following check in sctp_cmd_delete_tcb() prevents the
>>> association from being deleted from the hash table or the endpoint.
>>>
>>>                  if (sctp_style(sk, TCP) && sctp_sstate(sk, LISTENING) &&
>>>                      (!asoc->temp) && (sk->sk_shutdown != SHUTDOWN_MASK))
>>>                                  return;
>>>
>>>                  sctp_unhash_established(asoc);
>>>       <<< never reached
>>>                  sctp_association_free(asoc);
>>>             <<< never reached
>>>
>>> When we duplicate the traffic using netem, we are able to get this to
>>> occur when getsockopt(SCTP_STATUS) is called due to the transport
>>> being NULL.
>>>
>>> Karl
>>
>>
>> Hi Karl
>>
>> Yep, this is the code I've been looking at as well, just didn't get far
>> enough.  I was focusing the dookcook_a case().
>>
>> I'm attaching a patch (untested) that should fix this.
>>
>> -vlad
>>>
>>>
>>
> Vlad,
>
> That looks promising, however SCTP_CMD_SET_ASOC doesn't exist in this
> (2.6.36.2) SCTP stack.  I will look into backporting this side effect
> state or finding an alternate way of preventing the association from
> being added to the endpoint.
>
> Karl

Vlad,

I have another kernel which experiences panics with the same
duplicated SCTP traffic and has a SCTP stack from 3.1.7, to which your
previous patch cleanly applies.  Unfortunately, the panic now occurs
when sctp_unhash_established() is called from sctp_cmd_delete_tcb(),
attempting to delete a node from the association base.

As a test, I attempted the crude method of setting all associations
generated from sctp_unpack_cookie() in sctp_sf_do_5_2_4_dupcook() to
be temporary associations and we are unable to panic the system.  From
my (somewhat poor) understanding, however, this would break the
behavior described in the RFC for case 'A' and possibly 'B'.  Does it
make sense instead to modify case 'A' and 'B' to alter asoc instead of
new_asoc and leave new_asoc as a temporary association for all cases?

Thanks for the patience and help so far.

Karl

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: NULL primary_path
  2013-03-06 22:57 NULL primary_path Karl Heiss
                   ` (22 preceding siblings ...)
  2013-03-12 16:18 ` Karl Heiss
@ 2013-03-12 17:23 ` Vlad Yasevich
  23 siblings, 0 replies; 25+ messages in thread
From: Vlad Yasevich @ 2013-03-12 17:23 UTC (permalink / raw)
  To: linux-sctp

On 03/12/2013 12:18 PM, Karl Heiss wrote:
> On Mon, Mar 11, 2013 at 9:05 PM, Karl Heiss <kheiss@gmail.com> wrote:
>> On Mon, Mar 11, 2013 at 7:10 PM, Vlad Yasevich <vyasevich@gmail.com> wrote:
>>> On 03/11/2013 06:44 PM, Karl Heiss wrote:
>>>>
>>>> On Mon, Mar 11, 2013 at 5:59 PM, Vlad Yasevich <vyasevich@gmail.com>
>>>> wrote:
>>>>>
>>>>> On 03/09/2013 03:19 PM, Karl Heiss wrote:
>>>>>>
>>>>>>
>>>>>> On Fri, Mar 8, 2013 at 12:06 PM, Karl Heiss <kheiss@gmail.com> wrote:
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Mar 8, 2013 at 11:42 AM, Vlad Yasevich <vyasevich@gmail.com>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 03/08/2013 10:37 AM, Karl Heiss wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Fri, Mar 8, 2013 at 10:31 AM, Vlad Yasevich <vyasevich@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 03/08/2013 09:31 AM, Karl Heiss wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Mar 8, 2013 at 8:52 AM, Karl Heiss <kheiss@gmail.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, Mar 7, 2013 at 6:09 PM, Vlad Yasevich
>>>>>>>>>>>> <vyasevich@gmail.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 03/07/2013 04:51 PM, Karl Heiss wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Thu, Mar 7, 2013 at 12:17 PM, Vlad Yasevich
>>>>>>>>>>>>>> <vyasevich@gmail.com>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 03/07/2013 12:06 PM, Karl Heiss wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The issue appears to manifest itself when the connection is
>>>>>>>>>>>>>>>> closed
>>>>>>>>>>>>>>>> from the remote end and getsockopt(SCTP_STATUS) is called
>>>>>>>>>>>>>>>> within
>>>>>>>>>>>>>>>> a
>>>>>>>>>>>>>>>> small window in which the association is still valid but
>>>>>>>>>>>>>>>> asoc->peer.primary_path is NULL.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Aha!  Thanks.  There was a bug in the rcu clean-up that allowed
>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>> association to remain while all transports have been removed.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Here is a patch that should have addressed this condition:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> commit 8c98653f05534acd1cb07ea4929702a3659177d1
>>>>>>>>>>>>>>> Author: Daniel Borkmann <dborkman@redhat.com>
>>>>>>>>>>>>>>> Date:   Fri Feb 1 04:37:43 2013 +0000
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>           sctp: sctp_close: fix release of bindings for deferred
>>>>>>>>>>>>>>> call_rcu's
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Full patch is here:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?idŒ98653f05534acd1cb07ea4929702a3659177d1
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Make sure that you have this patch in the kernel you are
>>>>>>>>>>>>>>> running
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> -vlad
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Unfortunately this patch wont apply to the version of the SCTP
>>>>>>>>>>>>>> stack
>>>>>>>>>>>>>> that we are using (2.6.36.2) since it does not have a
>>>>>>>>>>>>>> sctp_transport_destroy_rcu() function.  Is there any chance that
>>>>>>>>>>>>>> simply swapping the order of the instructions without moving
>>>>>>>>>>>>>> them
>>>>>>>>>>>>>> would have any effect?  I ask this hypothetically because the
>>>>>>>>>>>>>> race
>>>>>>>>>>>>>> condition window seems to be difficult to recreate, thus nothing
>>>>>>>>>>>>>> to
>>>>>>>>>>>>>> test against (aside from in the field!).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Karl
>>>>>>>>>>>>>
>>>>>>>>>>>>> I think I see the problem now.  The problem happens when the
>>>>>>>>>>>>> association
>>>>>>>>>>>>> is
>>>>>>>>>>>>> destroyed.  We delay removing the association from
>>>>>>>>>>>>> the association id pool until all references on the association
>>>>>>>>>>>>> have dropped.  As a result, it is possible (for a very short
>>>>>>>>>>>>> period of time) for an association structure to still exist in
>>>>>>>>>>>>> the kernel and still be found via the association id, but that
>>>>>>>>>>>>> association
>>>>>>>>>>>>> has no transports and is about to be completely destroyed.
>>>>>>>>>>>>>
>>>>>>>>>>>>> This is a really interesting race and I need to figure out if it
>>>>>>>>>>>>> is
>>>>>>>>>>>>> there on purpose or not?
>>>>>>>>>>>>>
>>>>>>>>>>>>> In the mean time, here is a patch that should solve it for you.
>>>>>>>>>>>>>
>>>>>>>>>>>>> diff --git a/net/sctp/socket.c b/net/sctp/socket.c
>>>>>>>>>>>>> index b907073..2d92c89 100644
>>>>>>>>>>>>> --- a/net/sctp/socket.c
>>>>>>>>>>>>> +++ b/net/sctp/socket.c
>>>>>>>>>>>>> @@ -223,7 +223,7 @@ struct sctp_association *sctp_id2assoc(struct
>>>>>>>>>>>>> sock
>>>>>>>>>>>>> *sk,
>>>>>>>>>>>>> sctp_assoc_t id)
>>>>>>>>>>>>>                      if (!list_empty(&sctp_sk(sk)->ep->asocs))
>>>>>>>>>>>>>                              asoc >>>>>>>>>>>>> list_entry(sctp_sk(sk)->ep->asocs.next,
>>>>>>>>>>>>>                                                struct
>>>>>>>>>>>>> sctp_association,
>>>>>>>>>>>>> asocs);
>>>>>>>>>>>>> -               return asoc;
>>>>>>>>>>>>> +               goto done;
>>>>>>>>>>>>>              }
>>>>>>>>>>>>>
>>>>>>>>>>>>>              /* Otherwise this is a UDP-style socket. */
>>>>>>>>>>>>> @@ -234,6 +234,7 @@ struct sctp_association *sctp_id2assoc(struct
>>>>>>>>>>>>> sock
>>>>>>>>>>>>> *sk,
>>>>>>>>>>>>> sctp_assoc_t id)
>>>>>>>>>>>>>              asoc = (struct sctp_association
>>>>>>>>>>>>> *)idr_find(&sctp_assocs_id,
>>>>>>>>>>>>> (int)id);
>>>>>>>>>>>>>              spin_unlock_bh(&sctp_assocs_id_lock);
>>>>>>>>>>>>>
>>>>>>>>>>>>> +done:
>>>>>>>>>>>>>              if (!asoc || (asoc->base.sk != sk) ||
>>>>>>>>>>>>> asoc->base.dead)
>>>>>>>>>>>>>                      return NULL;
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Vlad,
>>>>>>>>>>>>
>>>>>>>>>>>> Looking at the kdump from the panic, I am seeing that your patch
>>>>>>>>>>>> above
>>>>>>>>>>>> may not work in this case since the asoc is valid, the base.sk is
>>>>>>>>>>>> valid, and base.dead is 0.  Unless base.sk is valid but doesn't
>>>>>>>>>>>> match
>>>>>>>>>>>> sk, this wouldn't appear to fix this issue.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Hm..  If the association is not marked "dead", it should still have
>>>>>>>>>> all
>>>>>>>>>> its
>>>>>>>>>> transports present.  If you look at the peer.transport_addr_list in
>>>>>>>>>> you kdump, is that list empty or not?
>>>>>>>>>>
>>>>>>>>>> Are any other peer transport pointers set (active_path,
>>>>>>>>>> retran_path)?
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> crash> p ((struct sctp_association *) 0xffff8107670e3000).peer
>>>>>>>>> $14 = {
>>>>>>>>>       rwnd = 65535,
>>>>>>>>>       transport_addr_list = {
>>>>>>>>>         next = 0xffff8107670e3180,
>>>>>>>>>         prev = 0xffff8107670e3180
>>>>>>>>>       },
>>>>>>>>>       transport_count = 0,
>>>>>>>>>       port = 3868,
>>>>>>>>>       primary_path = 0x0,
>>>>>>>>>       primary_addr = {
>>>>>>>>>         v4 = {
>>>>>>>>>           sin_family = 0,
>>>>>>>>>           sin_port = 0,
>>>>>>>>>           sin_addr = {
>>>>>>>>>             s_addr = 0
>>>>>>>>>           },
>>>>>>>>>           __pad = "\000\000\000\000\000\000\000"
>>>>>>>>>         },
>>>>>>>>>         v6 = {
>>>>>>>>>           sin6_family = 0,
>>>>>>>>>           sin6_port = 0,
>>>>>>>>>           sin6_flowinfo = 0,
>>>>>>>>>           sin6_addr = {
>>>>>>>>>             in6_u = {
>>>>>>>>>               u6_addr8 >>>>>>>>> "\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000",
>>>>>>>>>               u6_addr16 = {0, 0, 0, 0, 0, 0, 0, 0},
>>>>>>>>>               u6_addr32 = {0, 0, 0, 0}
>>>>>>>>>             }
>>>>>>>>>           },
>>>>>>>>>           sin6_scope_id = 0
>>>>>>>>>         },
>>>>>>>>>         sa = {
>>>>>>>>>           sa_family = 0,
>>>>>>>>>           sa_data >>>>>>>>> "\000\000\000\000\000\000\000\000\000\000\000\000\000"
>>>>>>>>>         }
>>>>>>>>>       },
>>>>>>>>>       active_path = 0x0,
>>>>>>>>>       retran_path = 0x0,
>>>>>>>>>       last_sent_to = 0x0,
>>>>>>>>>       last_data_from = 0x0,
>>>>>>>>>       tsn_map = {
>>>>>>>>>         tsn_map = 0x0,
>>>>>>>>>         base_tsn = 0,
>>>>>>>>>         cumulative_tsn_ack_point = 0,
>>>>>>>>>         max_tsn_seen = 0,
>>>>>>>>>         len = 0,
>>>>>>>>>         pending_data = 0,
>>>>>>>>>         num_dup_tsns = 0,
>>>>>>>>>         dup_tsns = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}
>>>>>>>>>       },
>>>>>>>>>       sack_needed = 1 '\001',
>>>>>>>>>       sack_cnt = 0,
>>>>>>>>>       ecn_capable = 0 '\0',
>>>>>>>>>       ipv4_address = 1 '\001',
>>>>>>>>>       ipv6_address = 0 '\0',
>>>>>>>>>       hostname_address = 0 '\0',
>>>>>>>>>       asconf_capable = 0 '\0',
>>>>>>>>>       prsctp_capable = 0 '\0',
>>>>>>>>>       auth_capable = 0 '\0',
>>>>>>>>>       adaptation_ind = 0,
>>>>>>>>>       addip_disabled_mask = 0,
>>>>>>>>>       i = {
>>>>>>>>>         init_tag = 0,
>>>>>>>>>         a_rwnd = 0,
>>>>>>>>>         num_outbound_streams = 0,
>>>>>>>>>         num_inbound_streams = 0,
>>>>>>>>>         initial_tsn = 0
>>>>>>>>>       },
>>>>>>>>>       cookie_len = 0,
>>>>>>>>>       cookie = 0x0,
>>>>>>>>>       addip_serial = 0,
>>>>>>>>>       peer_random = 0x0,
>>>>>>>>>       peer_chunks = 0x0,
>>>>>>>>>       peer_hmacs = 0x0
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Karl
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Vlad,
>>>>>>>>>>>
>>>>>>>>>>> One other thing, with the difficulty we are having recreating this
>>>>>>>>>>> issue, is there any generic way to increase the likelihood for the
>>>>>>>>>>> transport to be cleared out while delaying the association cleanup?
>>>>>>>>>>> Is there any way that the association is initialized without any
>>>>>>>>>>> transport information?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> When the association is initialized, the lists are empty, but the
>>>>>>>>>> next
>>>>>>>>>> thing that happens is that we add transport of the destination we
>>>>>>>>>> are
>>>>>>>>>> sending to or receiving from to the association and mark it as
>>>>>>>>>> primary
>>>>>>>>>> and
>>>>>>>>>> active.  All this happens under a socket lock, so getsockopt can't
>>>>>>>>>> access the association until all actions on that association
>>>>>>>>>> complete.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> The reason I ask; we believe the issue is
>>>>>>>>>>> happening very shortly after the association is brought up (we
>>>>>>>>>>> bring
>>>>>>>>>>> it up and then do the getsockopt()).
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Can you check what the association state is?  Alternately, can you
>>>>>>>>>> provide
>>>>>>>>>> the kdump and the kernel so I can dig around.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> crash> p ((struct sctp_association *) 0xffff8107670e3000).state
>>>>>>>>> $15 = SCTP_STATE_CLOSED
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Hi Karl
>>>>>>>>
>>>>>>>> Was this the client or the server side?  Also what was the socket type
>>>>>>>> (STREAM or SEQPACKET)?
>>>>>>>>
>>>>>>>> -vlad
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Thanks
>>>>>>>>>> -vlad
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Karl
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> We believe this is occurring on the client side (still working on
>>>>>>> confirming, this system is a Diameter router so we get connections
>>>>>>> going in both directions).  The connections are all STREAM.  We are
>>>>>>> also seeing ABORTs fairly regularly on the connections in suspect.
>>>>>>>
>>>>>>> Karl
>>>>>>
>>>>>>
>>>>>>
>>>>>> So we finally got a capture around the time of the panic.  The
>>>>>> panicing system is acting as a server and the client is connecting,
>>>>>> gets through INIT and COOKIE_ECHO, and sends several data packets when
>>>>>> the client sends another INIT.  At this point, the server handles the
>>>>>> INIT, starts over and it starts sending data packets again when the
>>>>>> server sends an ABORT because the application doesn't support
>>>>>> restarting the connection.
>>>>>
>>>>>
>>>>>
>>>>> Do you know if this is done through SO_LINGER or with sendmsg and
>>>>> MSG_ABORT?
>>>>>
>>>>> -vlad
>>>>>
>>>>>
>>>>>>    It is around this time that the panic
>>>>>> occurs.  One thing that I noticed is that the sctp_association
>>>>>> structure looks awfully similar to a temporary association that is
>>>>>> created when an unexpected INIT is received, but before it is
>>>>>> populated with peer information. However the temp value is not set to
>>>>>> 0 as would be expected.
>>>>>>
>>>>>> Karl
>>>>>>
>>>>>
>>>>
>>>> This is done with SO_LINGER.  However, we just were able to reproduce the
>>>> issue.
>>>>
>>>> When a duplicate cookie-echo message is received, and the
>>>> sctp_sf_do_5_2_4_dupcook() => sctp_unpack_cookie() is called, it calls
>>>> sctp_association_new() instead of sctp_make_temp_asoc(), and ends up
>>>> creating a full-fledged association instead of one with "temp" set.
>>>> Now, if we enter collision case, the primary path does not get written
>>>> in the association. When the next command is set to SCTP_CMD_NEW_ASOC,
>>>> since the association does not have "temp" marked, it gets added to
>>>> the association hash table and the endpoint. Even when the command
>>>> SCTP_CMD_DELETE_TCB is processed, since the association is not
>>>> temporary, the following check in sctp_cmd_delete_tcb() prevents the
>>>> association from being deleted from the hash table or the endpoint.
>>>>
>>>>                   if (sctp_style(sk, TCP) && sctp_sstate(sk, LISTENING) &&
>>>>                       (!asoc->temp) && (sk->sk_shutdown != SHUTDOWN_MASK))
>>>>                                   return;
>>>>
>>>>                   sctp_unhash_established(asoc);
>>>>        <<< never reached
>>>>                   sctp_association_free(asoc);
>>>>              <<< never reached
>>>>
>>>> When we duplicate the traffic using netem, we are able to get this to
>>>> occur when getsockopt(SCTP_STATUS) is called due to the transport
>>>> being NULL.
>>>>
>>>> Karl
>>>
>>>
>>> Hi Karl
>>>
>>> Yep, this is the code I've been looking at as well, just didn't get far
>>> enough.  I was focusing the dookcook_a case().
>>>
>>> I'm attaching a patch (untested) that should fix this.
>>>
>>> -vlad
>>>>
>>>>
>>>
>> Vlad,
>>
>> That looks promising, however SCTP_CMD_SET_ASOC doesn't exist in this
>> (2.6.36.2) SCTP stack.  I will look into backporting this side effect
>> state or finding an alternate way of preventing the association from
>> being added to the endpoint.
>>
>> Karl
>
> Vlad,
>
> I have another kernel which experiences panics with the same
> duplicated SCTP traffic and has a SCTP stack from 3.1.7, to which your
> previous patch cleanly applies.  Unfortunately, the panic now occurs
> when sctp_unhash_established() is called from sctp_cmd_delete_tcb(),
> attempting to delete a node from the association base.

There was a patch to address this.

2eebc1e188e9e45886ee00662519849339884d6d
  sctp: Fix list corruption resulting from freeing an association on a list

This should be in stable.  Can you make sure you have that patch in your 
tree?

>
> As a test, I attempted the crude method of setting all associations
> generated from sctp_unpack_cookie() in sctp_sf_do_5_2_4_dupcook() to
> be temporary associations and we are unable to panic the system.  From
> my (somewhat poor) understanding, however, this would break the
> behavior described in the RFC for case 'A' and possibly 'B'.  Does it
> make sense instead to modify case 'A' and 'B' to alter asoc instead of
> new_asoc and leave new_asoc as a temporary association for all cases?

Technically, it might be safe to tag the new_asoc at "temporary" in the 
'A' and 'B' cases, and may be even in all of them, since we'll be 
destroying it at the end of the duplicate cookie processing.  I don't 
think this would break things, since we actually try to modify the 
existing 'asoc' based on all the values we've written into 'new_assoc'.

I am concerned however about the above crash, and I hope that above 
patch should fix it for your.  If it does,  changing the command is
the best solution as that is exactly what we want.  Changing the 'temp'
flag on the fly is a bit dangerous and may have unintended side effects.

-vlad
>
> Thanks for the patience and help so far.
>
> Karl
>


^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2013-03-12 17:23 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-03-06 22:57 NULL primary_path Karl Heiss
2013-03-07  1:53 ` Vlad Yasevich
2013-03-07 14:52 ` Cristian Constantin
2013-03-07 15:29 ` Neil Horman
2013-03-07 15:44 ` Vlad Yasevich
2013-03-07 15:48 ` Vlad Yasevich
2013-03-07 17:06 ` Karl Heiss
2013-03-07 17:17 ` Vlad Yasevich
2013-03-07 21:51 ` Karl Heiss
2013-03-07 22:08 ` Vlad Yasevich
2013-03-07 23:09 ` Vlad Yasevich
2013-03-08 13:52 ` Karl Heiss
2013-03-08 14:31 ` Karl Heiss
2013-03-08 14:35 ` Neil Horman
2013-03-08 15:31 ` Vlad Yasevich
2013-03-08 15:37 ` Karl Heiss
2013-03-08 16:42 ` Vlad Yasevich
2013-03-08 17:06 ` Karl Heiss
2013-03-09 20:19 ` Karl Heiss
2013-03-11 21:59 ` Vlad Yasevich
2013-03-11 22:44 ` Karl Heiss
2013-03-11 23:10 ` Vlad Yasevich
2013-03-12  1:05 ` Karl Heiss
2013-03-12 16:18 ` Karl Heiss
2013-03-12 17:23 ` Vlad Yasevich

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.