On 03/11/2013 06:44 PM, Karl Heiss wrote: > On Mon, Mar 11, 2013 at 5:59 PM, Vlad Yasevich wrote: >> On 03/09/2013 03:19 PM, Karl Heiss wrote: >>> >>> On Fri, Mar 8, 2013 at 12:06 PM, Karl Heiss wrote: >>>> >>>> On Fri, Mar 8, 2013 at 11:42 AM, Vlad Yasevich >>>> wrote: >>>>> >>>>> On 03/08/2013 10:37 AM, Karl Heiss wrote: >>>>>> >>>>>> >>>>>> On Fri, Mar 8, 2013 at 10:31 AM, Vlad Yasevich >>>>>> wrote: >>>>>>> >>>>>>> >>>>>>> On 03/08/2013 09:31 AM, Karl Heiss wrote: >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On Fri, Mar 8, 2013 at 8:52 AM, Karl Heiss wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On Thu, Mar 7, 2013 at 6:09 PM, Vlad Yasevich >>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On 03/07/2013 04:51 PM, Karl Heiss wrote: >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Thu, Mar 7, 2013 at 12:17 PM, Vlad Yasevich >>>>>>>>>>> >>>>>>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On 03/07/2013 12:06 PM, Karl Heiss wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> The issue appears to manifest itself when the connection is >>>>>>>>>>>>> closed >>>>>>>>>>>>> from the remote end and getsockopt(SCTP_STATUS) is called within >>>>>>>>>>>>> a >>>>>>>>>>>>> small window in which the association is still valid but >>>>>>>>>>>>> asoc->peer.primary_path is NULL. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Aha! Thanks. There was a bug in the rcu clean-up that allowed >>>>>>>>>>>> the >>>>>>>>>>>> association to remain while all transports have been removed. >>>>>>>>>>>> >>>>>>>>>>>> Here is a patch that should have addressed this condition: >>>>>>>>>>>> >>>>>>>>>>>> commit 8c98653f05534acd1cb07ea4929702a3659177d1 >>>>>>>>>>>> Author: Daniel Borkmann >>>>>>>>>>>> Date: Fri Feb 1 04:37:43 2013 +0000 >>>>>>>>>>>> >>>>>>>>>>>> sctp: sctp_close: fix release of bindings for deferred >>>>>>>>>>>> call_rcu's >>>>>>>>>>>> >>>>>>>>>>>> Full patch is here: >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=8c98653f05534acd1cb07ea4929702a3659177d1 >>>>>>>>>>>> >>>>>>>>>>>> Make sure that you have this patch in the kernel you are running >>>>>>>>>>>> >>>>>>>>>>>> -vlad >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Unfortunately this patch wont apply to the version of the SCTP >>>>>>>>>>> stack >>>>>>>>>>> that we are using (2.6.36.2) since it does not have a >>>>>>>>>>> sctp_transport_destroy_rcu() function. Is there any chance that >>>>>>>>>>> simply swapping the order of the instructions without moving them >>>>>>>>>>> would have any effect? I ask this hypothetically because the race >>>>>>>>>>> condition window seems to be difficult to recreate, thus nothing >>>>>>>>>>> to >>>>>>>>>>> test against (aside from in the field!). >>>>>>>>>>> >>>>>>>>>>> Karl >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Hi Karl >>>>>>>>>> >>>>>>>>>> I think I see the problem now. The problem happens when the >>>>>>>>>> association >>>>>>>>>> is >>>>>>>>>> destroyed. We delay removing the association from >>>>>>>>>> the association id pool until all references on the association >>>>>>>>>> have dropped. As a result, it is possible (for a very short >>>>>>>>>> period of time) for an association structure to still exist in >>>>>>>>>> the kernel and still be found via the association id, but that >>>>>>>>>> association >>>>>>>>>> has no transports and is about to be completely destroyed. >>>>>>>>>> >>>>>>>>>> This is a really interesting race and I need to figure out if it is >>>>>>>>>> there on purpose or not? >>>>>>>>>> >>>>>>>>>> In the mean time, here is a patch that should solve it for you. >>>>>>>>>> >>>>>>>>>> diff --git a/net/sctp/socket.c b/net/sctp/socket.c >>>>>>>>>> index b907073..2d92c89 100644 >>>>>>>>>> --- a/net/sctp/socket.c >>>>>>>>>> +++ b/net/sctp/socket.c >>>>>>>>>> @@ -223,7 +223,7 @@ struct sctp_association *sctp_id2assoc(struct >>>>>>>>>> sock >>>>>>>>>> *sk, >>>>>>>>>> sctp_assoc_t id) >>>>>>>>>> if (!list_empty(&sctp_sk(sk)->ep->asocs)) >>>>>>>>>> asoc = >>>>>>>>>> list_entry(sctp_sk(sk)->ep->asocs.next, >>>>>>>>>> struct >>>>>>>>>> sctp_association, >>>>>>>>>> asocs); >>>>>>>>>> - return asoc; >>>>>>>>>> + goto done; >>>>>>>>>> } >>>>>>>>>> >>>>>>>>>> /* Otherwise this is a UDP-style socket. */ >>>>>>>>>> @@ -234,6 +234,7 @@ struct sctp_association *sctp_id2assoc(struct >>>>>>>>>> sock >>>>>>>>>> *sk, >>>>>>>>>> sctp_assoc_t id) >>>>>>>>>> asoc = (struct sctp_association >>>>>>>>>> *)idr_find(&sctp_assocs_id, >>>>>>>>>> (int)id); >>>>>>>>>> spin_unlock_bh(&sctp_assocs_id_lock); >>>>>>>>>> >>>>>>>>>> +done: >>>>>>>>>> if (!asoc || (asoc->base.sk != sk) || asoc->base.dead) >>>>>>>>>> return NULL; >>>>>>>>>> >>>>>>>>> >>>>>>>>> Vlad, >>>>>>>>> >>>>>>>>> Looking at the kdump from the panic, I am seeing that your patch >>>>>>>>> above >>>>>>>>> may not work in this case since the asoc is valid, the base.sk is >>>>>>>>> valid, and base.dead is 0. Unless base.sk is valid but doesn't >>>>>>>>> match >>>>>>>>> sk, this wouldn't appear to fix this issue. >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> Hm.. If the association is not marked "dead", it should still have >>>>>>> all >>>>>>> its >>>>>>> transports present. If you look at the peer.transport_addr_list in >>>>>>> you kdump, is that list empty or not? >>>>>>> >>>>>>> Are any other peer transport pointers set (active_path, retran_path)? >>>>>>> >>>>>> >>>>>> crash> p ((struct sctp_association *) 0xffff8107670e3000).peer >>>>>> $14 = { >>>>>> rwnd = 65535, >>>>>> transport_addr_list = { >>>>>> next = 0xffff8107670e3180, >>>>>> prev = 0xffff8107670e3180 >>>>>> }, >>>>>> transport_count = 0, >>>>>> port = 3868, >>>>>> primary_path = 0x0, >>>>>> primary_addr = { >>>>>> v4 = { >>>>>> sin_family = 0, >>>>>> sin_port = 0, >>>>>> sin_addr = { >>>>>> s_addr = 0 >>>>>> }, >>>>>> __pad = "\000\000\000\000\000\000\000" >>>>>> }, >>>>>> v6 = { >>>>>> sin6_family = 0, >>>>>> sin6_port = 0, >>>>>> sin6_flowinfo = 0, >>>>>> sin6_addr = { >>>>>> in6_u = { >>>>>> u6_addr8 = >>>>>> "\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000", >>>>>> u6_addr16 = {0, 0, 0, 0, 0, 0, 0, 0}, >>>>>> u6_addr32 = {0, 0, 0, 0} >>>>>> } >>>>>> }, >>>>>> sin6_scope_id = 0 >>>>>> }, >>>>>> sa = { >>>>>> sa_family = 0, >>>>>> sa_data = >>>>>> "\000\000\000\000\000\000\000\000\000\000\000\000\000" >>>>>> } >>>>>> }, >>>>>> active_path = 0x0, >>>>>> retran_path = 0x0, >>>>>> last_sent_to = 0x0, >>>>>> last_data_from = 0x0, >>>>>> tsn_map = { >>>>>> tsn_map = 0x0, >>>>>> base_tsn = 0, >>>>>> cumulative_tsn_ack_point = 0, >>>>>> max_tsn_seen = 0, >>>>>> len = 0, >>>>>> pending_data = 0, >>>>>> num_dup_tsns = 0, >>>>>> dup_tsns = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0} >>>>>> }, >>>>>> sack_needed = 1 '\001', >>>>>> sack_cnt = 0, >>>>>> ecn_capable = 0 '\0', >>>>>> ipv4_address = 1 '\001', >>>>>> ipv6_address = 0 '\0', >>>>>> hostname_address = 0 '\0', >>>>>> asconf_capable = 0 '\0', >>>>>> prsctp_capable = 0 '\0', >>>>>> auth_capable = 0 '\0', >>>>>> adaptation_ind = 0, >>>>>> addip_disabled_mask = 0, >>>>>> i = { >>>>>> init_tag = 0, >>>>>> a_rwnd = 0, >>>>>> num_outbound_streams = 0, >>>>>> num_inbound_streams = 0, >>>>>> initial_tsn = 0 >>>>>> }, >>>>>> cookie_len = 0, >>>>>> cookie = 0x0, >>>>>> addip_serial = 0, >>>>>> peer_random = 0x0, >>>>>> peer_chunks = 0x0, >>>>>> peer_hmacs = 0x0 >>>>>> } >>>>>> >>>>>>> >>>>>>>>> >>>>>>>>> Karl >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> Vlad, >>>>>>>> >>>>>>>> One other thing, with the difficulty we are having recreating this >>>>>>>> issue, is there any generic way to increase the likelihood for the >>>>>>>> transport to be cleared out while delaying the association cleanup? >>>>>>>> Is there any way that the association is initialized without any >>>>>>>> transport information? >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> When the association is initialized, the lists are empty, but the next >>>>>>> thing that happens is that we add transport of the destination we are >>>>>>> sending to or receiving from to the association and mark it as primary >>>>>>> and >>>>>>> active. All this happens under a socket lock, so getsockopt can't >>>>>>> access the association until all actions on that association complete. >>>>>>> >>>>>>> >>>>>>>> The reason I ask; we believe the issue is >>>>>>>> happening very shortly after the association is brought up (we bring >>>>>>>> it up and then do the getsockopt()). >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> Can you check what the association state is? Alternately, can you >>>>>>> provide >>>>>>> the kdump and the kernel so I can dig around. >>>>>> >>>>>> >>>>>> >>>>>> crash> p ((struct sctp_association *) 0xffff8107670e3000).state >>>>>> $15 = SCTP_STATE_CLOSED >>>>> >>>>> >>>>> >>>>> >>>>> Hi Karl >>>>> >>>>> Was this the client or the server side? Also what was the socket type >>>>> (STREAM or SEQPACKET)? >>>>> >>>>> -vlad >>>>>> >>>>>> >>>>>> >>>>>>> >>>>>>> Thanks >>>>>>> -vlad >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> Thanks, >>>>>>>> Karl >>>>>>>> >>>>>>> >>>>> >>>> >>>> We believe this is occurring on the client side (still working on >>>> confirming, this system is a Diameter router so we get connections >>>> going in both directions). The connections are all STREAM. We are >>>> also seeing ABORTs fairly regularly on the connections in suspect. >>>> >>>> Karl >>> >>> >>> So we finally got a capture around the time of the panic. The >>> panicing system is acting as a server and the client is connecting, >>> gets through INIT and COOKIE_ECHO, and sends several data packets when >>> the client sends another INIT. At this point, the server handles the >>> INIT, starts over and it starts sending data packets again when the >>> server sends an ABORT because the application doesn't support >>> restarting the connection. >> >> >> Do you know if this is done through SO_LINGER or with sendmsg and MSG_ABORT? >> >> -vlad >> >> >>> It is around this time that the panic >>> occurs. One thing that I noticed is that the sctp_association >>> structure looks awfully similar to a temporary association that is >>> created when an unexpected INIT is received, but before it is >>> populated with peer information. However the temp value is not set to >>> 0 as would be expected. >>> >>> Karl >>> >> > > This is done with SO_LINGER. However, we just were able to reproduce the issue. > > When a duplicate cookie-echo message is received, and the > sctp_sf_do_5_2_4_dupcook() => sctp_unpack_cookie() is called, it calls > sctp_association_new() instead of sctp_make_temp_asoc(), and ends up > creating a full-fledged association instead of one with "temp" set. > Now, if we enter collision case, the primary path does not get written > in the association. When the next command is set to SCTP_CMD_NEW_ASOC, > since the association does not have "temp" marked, it gets added to > the association hash table and the endpoint. Even when the command > SCTP_CMD_DELETE_TCB is processed, since the association is not > temporary, the following check in sctp_cmd_delete_tcb() prevents the > association from being deleted from the hash table or the endpoint. > > if (sctp_style(sk, TCP) && sctp_sstate(sk, LISTENING) && > (!asoc->temp) && (sk->sk_shutdown != SHUTDOWN_MASK)) > return; > > sctp_unhash_established(asoc); > <<< never reached > sctp_association_free(asoc); > <<< never reached > > When we duplicate the traffic using netem, we are able to get this to > occur when getsockopt(SCTP_STATUS) is called due to the transport > being NULL. > > Karl Hi Karl Yep, this is the code I've been looking at as well, just didn't get far enough. I was focusing the dookcook_a case(). I'm attaching a patch (untested) that should fix this. -vlad >