On 03/11/2013 06:44 PM, Karl Heiss wrote:
> On Mon, Mar 11, 2013 at 5:59 PM, Vlad Yasevich <vyasevich@gmail.com> wrote:
>> On 03/09/2013 03:19 PM, Karl Heiss wrote:
>>>
>>> On Fri, Mar 8, 2013 at 12:06 PM, Karl Heiss <kheiss@gmail.com> wrote:
>>>>
>>>> On Fri, Mar 8, 2013 at 11:42 AM, Vlad Yasevich <vyasevich@gmail.com>
>>>> wrote:
>>>>>
>>>>> On 03/08/2013 10:37 AM, Karl Heiss wrote:
>>>>>>
>>>>>>
>>>>>> On Fri, Mar 8, 2013 at 10:31 AM, Vlad Yasevich <vyasevich@gmail.com>
>>>>>> wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 03/08/2013 09:31 AM, Karl Heiss wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, Mar 8, 2013 at 8:52 AM, Karl Heiss <kheiss@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Thu, Mar 7, 2013 at 6:09 PM, Vlad Yasevich <vyasevich@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 03/07/2013 04:51 PM, Karl Heiss wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Mar 7, 2013 at 12:17 PM, Vlad Yasevich
>>>>>>>>>>> <vyasevich@gmail.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 03/07/2013 12:06 PM, Karl Heiss wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> The issue appears to manifest itself when the connection is
>>>>>>>>>>>>> closed
>>>>>>>>>>>>> from the remote end and getsockopt(SCTP_STATUS) is called within
>>>>>>>>>>>>> a
>>>>>>>>>>>>> small window in which the association is still valid but
>>>>>>>>>>>>> asoc->peer.primary_path is NULL.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Aha!  Thanks.  There was a bug in the rcu clean-up that allowed
>>>>>>>>>>>> the
>>>>>>>>>>>> association to remain while all transports have been removed.
>>>>>>>>>>>>
>>>>>>>>>>>> Here is a patch that should have addressed this condition:
>>>>>>>>>>>>
>>>>>>>>>>>> commit 8c98653f05534acd1cb07ea4929702a3659177d1
>>>>>>>>>>>> Author: Daniel Borkmann <dborkman@redhat.com>
>>>>>>>>>>>> Date:   Fri Feb 1 04:37:43 2013 +0000
>>>>>>>>>>>>
>>>>>>>>>>>>          sctp: sctp_close: fix release of bindings for deferred
>>>>>>>>>>>> call_rcu's
>>>>>>>>>>>>
>>>>>>>>>>>> Full patch is here:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=8c98653f05534acd1cb07ea4929702a3659177d1
>>>>>>>>>>>>
>>>>>>>>>>>> Make sure that you have this patch in the kernel you are running
>>>>>>>>>>>>
>>>>>>>>>>>> -vlad
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Unfortunately this patch wont apply to the version of the SCTP
>>>>>>>>>>> stack
>>>>>>>>>>> that we are using (2.6.36.2) since it does not have a
>>>>>>>>>>> sctp_transport_destroy_rcu() function.  Is there any chance that
>>>>>>>>>>> simply swapping the order of the instructions without moving them
>>>>>>>>>>> would have any effect?  I ask this hypothetically because the race
>>>>>>>>>>> condition window seems to be difficult to recreate, thus nothing
>>>>>>>>>>> to
>>>>>>>>>>> test against (aside from in the field!).
>>>>>>>>>>>
>>>>>>>>>>> Karl
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Hi Karl
>>>>>>>>>>
>>>>>>>>>> I think I see the problem now.  The problem happens when the
>>>>>>>>>> association
>>>>>>>>>> is
>>>>>>>>>> destroyed.  We delay removing the association from
>>>>>>>>>> the association id pool until all references on the association
>>>>>>>>>> have dropped.  As a result, it is possible (for a very short
>>>>>>>>>> period of time) for an association structure to still exist in
>>>>>>>>>> the kernel and still be found via the association id, but that
>>>>>>>>>> association
>>>>>>>>>> has no transports and is about to be completely destroyed.
>>>>>>>>>>
>>>>>>>>>> This is a really interesting race and I need to figure out if it is
>>>>>>>>>> there on purpose or not?
>>>>>>>>>>
>>>>>>>>>> In the mean time, here is a patch that should solve it for you.
>>>>>>>>>>
>>>>>>>>>> diff --git a/net/sctp/socket.c b/net/sctp/socket.c
>>>>>>>>>> index b907073..2d92c89 100644
>>>>>>>>>> --- a/net/sctp/socket.c
>>>>>>>>>> +++ b/net/sctp/socket.c
>>>>>>>>>> @@ -223,7 +223,7 @@ struct sctp_association *sctp_id2assoc(struct
>>>>>>>>>> sock
>>>>>>>>>> *sk,
>>>>>>>>>> sctp_assoc_t id)
>>>>>>>>>>                     if (!list_empty(&sctp_sk(sk)->ep->asocs))
>>>>>>>>>>                             asoc =
>>>>>>>>>> list_entry(sctp_sk(sk)->ep->asocs.next,
>>>>>>>>>>                                               struct
>>>>>>>>>> sctp_association,
>>>>>>>>>> asocs);
>>>>>>>>>> -               return asoc;
>>>>>>>>>> +               goto done;
>>>>>>>>>>             }
>>>>>>>>>>
>>>>>>>>>>             /* Otherwise this is a UDP-style socket. */
>>>>>>>>>> @@ -234,6 +234,7 @@ struct sctp_association *sctp_id2assoc(struct
>>>>>>>>>> sock
>>>>>>>>>> *sk,
>>>>>>>>>> sctp_assoc_t id)
>>>>>>>>>>             asoc = (struct sctp_association
>>>>>>>>>> *)idr_find(&sctp_assocs_id,
>>>>>>>>>> (int)id);
>>>>>>>>>>             spin_unlock_bh(&sctp_assocs_id_lock);
>>>>>>>>>>
>>>>>>>>>> +done:
>>>>>>>>>>             if (!asoc || (asoc->base.sk != sk) || asoc->base.dead)
>>>>>>>>>>                     return NULL;
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Vlad,
>>>>>>>>>
>>>>>>>>> Looking at the kdump from the panic, I am seeing that your patch
>>>>>>>>> above
>>>>>>>>> may not work in this case since the asoc is valid, the base.sk is
>>>>>>>>> valid, and base.dead is 0.  Unless base.sk is valid but doesn't
>>>>>>>>> match
>>>>>>>>> sk, this wouldn't appear to fix this issue.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Hm..  If the association is not marked "dead", it should still have
>>>>>>> all
>>>>>>> its
>>>>>>> transports present.  If you look at the peer.transport_addr_list in
>>>>>>> you kdump, is that list empty or not?
>>>>>>>
>>>>>>> Are any other peer transport pointers set (active_path, retran_path)?
>>>>>>>
>>>>>>
>>>>>> crash> p ((struct sctp_association *) 0xffff8107670e3000).peer
>>>>>> $14 = {
>>>>>>      rwnd = 65535,
>>>>>>      transport_addr_list = {
>>>>>>        next = 0xffff8107670e3180,
>>>>>>        prev = 0xffff8107670e3180
>>>>>>      },
>>>>>>      transport_count = 0,
>>>>>>      port = 3868,
>>>>>>      primary_path = 0x0,
>>>>>>      primary_addr = {
>>>>>>        v4 = {
>>>>>>          sin_family = 0,
>>>>>>          sin_port = 0,
>>>>>>          sin_addr = {
>>>>>>            s_addr = 0
>>>>>>          },
>>>>>>          __pad = "\000\000\000\000\000\000\000"
>>>>>>        },
>>>>>>        v6 = {
>>>>>>          sin6_family = 0,
>>>>>>          sin6_port = 0,
>>>>>>          sin6_flowinfo = 0,
>>>>>>          sin6_addr = {
>>>>>>            in6_u = {
>>>>>>              u6_addr8 =
>>>>>> "\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000",
>>>>>>              u6_addr16 = {0, 0, 0, 0, 0, 0, 0, 0},
>>>>>>              u6_addr32 = {0, 0, 0, 0}
>>>>>>            }
>>>>>>          },
>>>>>>          sin6_scope_id = 0
>>>>>>        },
>>>>>>        sa = {
>>>>>>          sa_family = 0,
>>>>>>          sa_data =
>>>>>> "\000\000\000\000\000\000\000\000\000\000\000\000\000"
>>>>>>        }
>>>>>>      },
>>>>>>      active_path = 0x0,
>>>>>>      retran_path = 0x0,
>>>>>>      last_sent_to = 0x0,
>>>>>>      last_data_from = 0x0,
>>>>>>      tsn_map = {
>>>>>>        tsn_map = 0x0,
>>>>>>        base_tsn = 0,
>>>>>>        cumulative_tsn_ack_point = 0,
>>>>>>        max_tsn_seen = 0,
>>>>>>        len = 0,
>>>>>>        pending_data = 0,
>>>>>>        num_dup_tsns = 0,
>>>>>>        dup_tsns = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}
>>>>>>      },
>>>>>>      sack_needed = 1 '\001',
>>>>>>      sack_cnt = 0,
>>>>>>      ecn_capable = 0 '\0',
>>>>>>      ipv4_address = 1 '\001',
>>>>>>      ipv6_address = 0 '\0',
>>>>>>      hostname_address = 0 '\0',
>>>>>>      asconf_capable = 0 '\0',
>>>>>>      prsctp_capable = 0 '\0',
>>>>>>      auth_capable = 0 '\0',
>>>>>>      adaptation_ind = 0,
>>>>>>      addip_disabled_mask = 0,
>>>>>>      i = {
>>>>>>        init_tag = 0,
>>>>>>        a_rwnd = 0,
>>>>>>        num_outbound_streams = 0,
>>>>>>        num_inbound_streams = 0,
>>>>>>        initial_tsn = 0
>>>>>>      },
>>>>>>      cookie_len = 0,
>>>>>>      cookie = 0x0,
>>>>>>      addip_serial = 0,
>>>>>>      peer_random = 0x0,
>>>>>>      peer_chunks = 0x0,
>>>>>>      peer_hmacs = 0x0
>>>>>> }
>>>>>>
>>>>>>>
>>>>>>>>>
>>>>>>>>> Karl
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Vlad,
>>>>>>>>
>>>>>>>> One other thing, with the difficulty we are having recreating this
>>>>>>>> issue, is there any generic way to increase the likelihood for the
>>>>>>>> transport to be cleared out while delaying the association cleanup?
>>>>>>>> Is there any way that the association is initialized without any
>>>>>>>> transport information?
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> When the association is initialized, the lists are empty, but the next
>>>>>>> thing that happens is that we add transport of the destination we are
>>>>>>> sending to or receiving from to the association and mark it as primary
>>>>>>> and
>>>>>>> active.  All this happens under a socket lock, so getsockopt can't
>>>>>>> access the association until all actions on that association complete.
>>>>>>>
>>>>>>>
>>>>>>>> The reason I ask; we believe the issue is
>>>>>>>> happening very shortly after the association is brought up (we bring
>>>>>>>> it up and then do the getsockopt()).
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Can you check what the association state is?  Alternately, can you
>>>>>>> provide
>>>>>>> the kdump and the kernel so I can dig around.
>>>>>>
>>>>>>
>>>>>>
>>>>>> crash> p ((struct sctp_association *) 0xffff8107670e3000).state
>>>>>> $15 = SCTP_STATE_CLOSED
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Hi Karl
>>>>>
>>>>> Was this the client or the server side?  Also what was the socket type
>>>>> (STREAM or SEQPACKET)?
>>>>>
>>>>> -vlad
>>>>>>
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> Thanks
>>>>>>> -vlad
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Karl
>>>>>>>>
>>>>>>>
>>>>>
>>>>
>>>> We believe this is occurring on the client side (still working on
>>>> confirming, this system is a Diameter router so we get connections
>>>> going in both directions).  The connections are all STREAM.  We are
>>>> also seeing ABORTs fairly regularly on the connections in suspect.
>>>>
>>>> Karl
>>>
>>>
>>> So we finally got a capture around the time of the panic.  The
>>> panicing system is acting as a server and the client is connecting,
>>> gets through INIT and COOKIE_ECHO, and sends several data packets when
>>> the client sends another INIT.  At this point, the server handles the
>>> INIT, starts over and it starts sending data packets again when the
>>> server sends an ABORT because the application doesn't support
>>> restarting the connection.
>>
>>
>> Do you know if this is done through SO_LINGER or with sendmsg and MSG_ABORT?
>>
>> -vlad
>>
>>
>>>   It is around this time that the panic
>>> occurs.  One thing that I noticed is that the sctp_association
>>> structure looks awfully similar to a temporary association that is
>>> created when an unexpected INIT is received, but before it is
>>> populated with peer information. However the temp value is not set to
>>> 0 as would be expected.
>>>
>>> Karl
>>>
>>
>
> This is done with SO_LINGER.  However, we just were able to reproduce the issue.
>
> When a duplicate cookie-echo message is received, and the
> sctp_sf_do_5_2_4_dupcook() => sctp_unpack_cookie() is called, it calls
> sctp_association_new() instead of sctp_make_temp_asoc(), and ends up
> creating a full-fledged association instead of one with "temp" set.
> Now, if we enter collision case, the primary path does not get written
> in the association. When the next command is set to SCTP_CMD_NEW_ASOC,
> since the association does not have "temp" marked, it gets added to
> the association hash table and the endpoint. Even when the command
> SCTP_CMD_DELETE_TCB is processed, since the association is not
> temporary, the following check in sctp_cmd_delete_tcb() prevents the
> association from being deleted from the hash table or the endpoint.
>
>                  if (sctp_style(sk, TCP) && sctp_sstate(sk, LISTENING) &&
>                      (!asoc->temp) && (sk->sk_shutdown != SHUTDOWN_MASK))
>                                  return;
>
>                  sctp_unhash_established(asoc);
>       <<< never reached
>                  sctp_association_free(asoc);
>             <<< never reached
>
> When we duplicate the traffic using netem, we are able to get this to
> occur when getsockopt(SCTP_STATUS) is called due to the transport
> being NULL.
>
> Karl

Hi Karl

Yep, this is the code I've been looking at as well, just didn't get far
enough.  I was focusing the dookcook_a case().

I'm attaching a patch (untested) that should fix this.

-vlad
>