From: Vlad Yasevich <vyasevich@gmail.com>
To: linux-sctp@vger.kernel.org
Subject: Re: NULL primary_path
Date: Fri, 08 Mar 2013 15:31:04 +0000
Message-ID: <513A0438.8050703@gmail.com>
In-Reply-To: <CAGugRbVkLt2AzKkxX3SzJhjDfyPJpviD64_Y5-hgtqxSTwzQiA@mail.gmail.com>
On 03/08/2013 09:31 AM, Karl Heiss wrote:
> On Fri, Mar 8, 2013 at 8:52 AM, Karl Heiss <kheiss@gmail.com> wrote:
>> On Thu, Mar 7, 2013 at 6:09 PM, Vlad Yasevich <vyasevich@gmail.com> wrote:
>>> On 03/07/2013 04:51 PM, Karl Heiss wrote:
>>>>
>>>> On Thu, Mar 7, 2013 at 12:17 PM, Vlad Yasevich <vyasevich@gmail.com>
>>>> wrote:
>>>>>
>>>>> On 03/07/2013 12:06 PM, Karl Heiss wrote:
>>>>>>
>>>>>>
>>>>>> The issue appears to manifest itself when the connection is closed
>>>>>> from the remote end and getsockopt(SCTP_STATUS) is called within a
>>>>>> small window in which the association is still valid but
>>>>>> asoc->peer.primary_path is NULL.
>>>>>
>>>>>
>>>>>
>>>>> Aha! Thanks. There was a bug in the rcu clean-up that allowed the
>>>>> association to remain while all transports have been removed.
>>>>>
>>>>> Here is a patch that should have addressed this condition:
>>>>>
>>>>> commit 8c98653f05534acd1cb07ea4929702a3659177d1
>>>>> Author: Daniel Borkmann <dborkman@redhat.com>
>>>>> Date: Fri Feb 1 04:37:43 2013 +0000
>>>>>
>>>>> sctp: sctp_close: fix release of bindings for deferred call_rcu's
>>>>>
>>>>> Full patch is here:
>>>>>
>>>>> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=8c98653f05534acd1cb07ea4929702a3659177d1
>>>>>
>>>>> Make sure that you have this patch in the kernel you are running
>>>>>
>>>>> -vlad
>>>>>
>>>>>
>>>>>>
>>>>
>>>> Unfortunately this patch won't apply to the version of the SCTP stack
>>>> that we are using (2.6.36.2), since it does not have a
>>>> sctp_transport_destroy_rcu() function. Is there any chance that
>>>> simply swapping the order of the instructions without moving them
>>>> would have any effect? I ask this hypothetically because the race
>>>> condition window seems to be difficult to recreate, so we have nothing
>>>> to test against (aside from in the field!).
>>>>
>>>> Karl
>>>>
>>>
>>> Hi Karl
>>>
>>> I think I see the problem now. The problem happens when the association is
>>> destroyed. We delay removing the association from
>>> the association id pool until all references on the association
>>> have dropped. As a result, it is possible (for a very short
>>> period of time) for an association structure to still exist in
>>> the kernel and still be found via the association id, but that association
>>> has no transports and is about to be completely destroyed.
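A minimal user-space model of the window described above may help. It assumes hypothetical, heavily simplified `model_*` structures (not the kernel's `struct sctp_association`) and a plain array standing in for the association id pool. Teardown frees the transport and marks the association dead, but the id slot is only cleared when the last reference is dropped, so a lookup by id can still return the half-destroyed object.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, heavily simplified stand-ins; not the kernel's structures. */
struct model_transport { int addr; };

struct model_asoc {
	int id;                               /* key in the id table */
	int refcnt;                           /* outstanding references */
	int dead;                             /* set when teardown begins */
	struct model_transport *primary_path; /* NULL once transports are freed */
};

#define MODEL_MAX_ID 16
static struct model_asoc *model_ids[MODEL_MAX_ID]; /* stands in for the id pool */

static struct model_asoc *model_asoc_new(int id)
{
	struct model_asoc *asoc;

	assert(id > 0 && id < MODEL_MAX_ID);
	asoc = calloc(1, sizeof(*asoc));
	if (!asoc)
		return NULL;
	asoc->primary_path = calloc(1, sizeof(*asoc->primary_path));
	if (!asoc->primary_path) {
		free(asoc);
		return NULL;
	}
	asoc->id = id;
	asoc->refcnt = 1;
	model_ids[id] = asoc;
	return asoc;
}

static void model_asoc_hold(struct model_asoc *asoc)
{
	asoc->refcnt++;
}

/* The id slot is released only when the last reference goes away. */
static void model_asoc_put(struct model_asoc *asoc)
{
	if (--asoc->refcnt == 0) {
		model_ids[asoc->id] = NULL;
		free(asoc);
	}
}

/* Teardown: the transport goes first, then the owner's reference is dropped. */
static void model_asoc_free(struct model_asoc *asoc)
{
	asoc->dead = 1;
	free(asoc->primary_path);
	asoc->primary_path = NULL;
	model_asoc_put(asoc);
}

/* Raw lookup by id, with no liveness check. */
static struct model_asoc *model_lookup(int id)
{
	return (id > 0 && id < MODEL_MAX_ID) ? model_ids[id] : NULL;
}
```

If some other user holds an extra reference (via `model_asoc_hold()`) when `model_asoc_free()` runs, `model_lookup()` still returns the object, now with `primary_path == NULL`, until the final `model_asoc_put()`. That is why the lookup path needs its own `dead` check.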
>>>
>>> This is a really interesting race, and I need to figure out whether it
>>> is there on purpose or not.
>>>
>>> In the meantime, here is a patch that should solve it for you.
>>>
>>> diff --git a/net/sctp/socket.c b/net/sctp/socket.c
>>> index b907073..2d92c89 100644
>>> --- a/net/sctp/socket.c
>>> +++ b/net/sctp/socket.c
>>> @@ -223,7 +223,7 @@ struct sctp_association *sctp_id2assoc(struct sock *sk, sctp_assoc_t id)
>>>  		if (!list_empty(&sctp_sk(sk)->ep->asocs))
>>>  			asoc = list_entry(sctp_sk(sk)->ep->asocs.next,
>>>  					  struct sctp_association, asocs);
>>> -		return asoc;
>>> +		goto done;
>>>  	}
>>>
>>>  	/* Otherwise this is a UDP-style socket. */
>>> @@ -234,6 +234,7 @@ struct sctp_association *sctp_id2assoc(struct sock *sk, sctp_assoc_t id)
>>>  	asoc = (struct sctp_association *)idr_find(&sctp_assocs_id,
>>>  						    (int)id);
>>>  	spin_unlock_bh(&sctp_assocs_id_lock);
>>>
>>> +done:
>>>  	if (!asoc || (asoc->base.sk != sk) || asoc->base.dead)
>>>  		return NULL;
>>>
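To see the effect of the patch in isolation, here is a user-space model of the patched lookup. It is a sketch, not kernel code. `struct model_sock`, `struct model_asoc` and `model_id2assoc()` are hypothetical simplifications of `sctp_id2assoc()`, with a plain array standing in for the idr and no locking. After the change, the TCP-style branch (one association per socket) falls through to the same owner and `dead` check as the UDP-style branch instead of returning early.

```c
#include <assert.h>
#include <stddef.h>

struct model_sock;

/* Hypothetical stand-in for the fields the lookup inspects. */
struct model_asoc {
	struct model_sock *sk; /* owning socket (base.sk) */
	int dead;              /* base.dead */
};

struct model_sock {
	int udp_style;            /* one-to-many (UDP-style) socket? */
	struct model_asoc *first; /* the only association of a TCP-style socket */
	struct model_asoc **ids;  /* hypothetical id table for UDP-style sockets */
	int nids;                 /* number of slots in ids */
};

static struct model_asoc *model_id2assoc(struct model_sock *sk, int id)
{
	struct model_asoc *asoc = NULL;

	assert(sk != NULL);

	if (!sk->udp_style) {
		asoc = sk->first;
		goto done; /* patched: previously "return asoc;" skipped the check */
	}

	/* UDP-style: reject ids outside the hypothetical table. */
	if (id <= 0 || id >= sk->nids)
		return NULL;
	asoc = sk->ids[id];

done:
	/* Common validation for both socket styles. */
	if (!asoc || asoc->sk != sk || asoc->dead)
		return NULL;
	return asoc;
}
```

As the follow-up below points out, this only helps when the association has already been marked dead. A live association with missing transports would still pass the check.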
>>
>> Vlad,
>>
>> Looking at the kdump from the panic, I am seeing that your patch above
>> may not work in this case since the asoc is valid, the base.sk is
>> valid, and base.dead is 0. Unless base.sk is valid but doesn't match
>> sk, this wouldn't appear to fix this issue.
Hm... If the association is not marked "dead", it should still have all
its transports present. If you look at the peer.transport_addr_list in
your kdump, is that list empty or not?
Are any other peer transport pointers set (active_path, retran_path)?
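The questions above amount to checking one invariant: an association that is not marked dead should still have at least one transport, and its path pointers should be set. The sketch below expresses that checklist as a hypothetical classifier over values read out of a crash dump. The struct and enum names are illustrative only and are not kernel or crash-utility APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical snapshot of the association fields one might read from a kdump. */
struct model_asoc_snapshot {
	bool dead;                /* base.dead */
	size_t nr_transports;     /* entries on peer.transport_addr_list */
	const void *primary_path; /* peer.primary_path */
	const void *active_path;  /* peer.active_path */
	const void *retran_path;  /* peer.retran_path */
};

enum model_verdict {
	MODEL_CONSISTENT,         /* live, with transports and all paths set */
	MODEL_DYING,              /* marked dead: transports may already be gone */
	MODEL_LIVE_NO_TRANSPORTS, /* not dead, but the transport list is empty */
	MODEL_LIVE_DANGLING_PATH, /* transports present, but a path pointer is NULL */
};

static enum model_verdict model_check(const struct model_asoc_snapshot *s)
{
	assert(s != NULL);
	if (s->dead)
		return MODEL_DYING;
	if (s->nr_transports == 0)
		return MODEL_LIVE_NO_TRANSPORTS;
	if (!s->primary_path || !s->active_path || !s->retran_path)
		return MODEL_LIVE_DANGLING_PATH;
	return MODEL_CONSISTENT;
}
```

A dump showing `dead == 0` with a NULL `primary_path` would fall into one of the two "live" buckets. That separates it from the teardown race, where the association is already marked dead.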
>>
>> Karl
>
> Vlad,
>
> One other thing, with the difficulty we are having recreating this
> issue, is there any generic way to increase the likelihood for the
> transport to be cleared out while delaying the association cleanup?
> Is there any way that the association is initialized without any
> transport information?
When the association is initialized, the lists are empty, but the next
thing that happens is that we add the transport for the destination we
are sending to (or receiving from) to the association and mark it as
primary and active. All of this happens under the socket lock, so
getsockopt can't access the association until all actions on it complete.
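The ordering described above can be sketched in user space. This assumes a pthread mutex as a stand-in for the socket lock, and all `model_*` names are hypothetical. The association is created, given its first transport, and marked primary and active inside one critical section, and the status reader takes the same lock. A reader can therefore observe either no association or a fully initialized one, never the empty intermediate state.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins; not kernel structures. */
struct model_transport { int addr; };

struct model_asoc {
	struct model_transport *primary_path;
	struct model_transport *active_path;
};

struct model_sock {
	pthread_mutex_t lock;    /* stand-in for the socket lock */
	struct model_asoc *asoc; /* current association, or NULL */
};

/* Create the association and its first transport in one critical section. */
static int model_connect(struct model_sock *sk, int addr)
{
	struct model_asoc *asoc;
	struct model_transport *t;

	pthread_mutex_lock(&sk->lock);
	asoc = calloc(1, sizeof(*asoc)); /* starts with no transports */
	t = calloc(1, sizeof(*t));
	if (!asoc || !t) {
		free(asoc);
		free(t);
		pthread_mutex_unlock(&sk->lock);
		return -1;
	}
	t->addr = addr;
	asoc->primary_path = t; /* mark primary ... */
	asoc->active_path = t;  /* ... and active */
	sk->asoc = asoc;        /* readers can see it once the lock is dropped */
	pthread_mutex_unlock(&sk->lock);
	return 0;
}

/* getsockopt(SCTP_STATUS)-style reader that takes the same lock. */
static int model_status_primary(struct model_sock *sk, int *addr)
{
	int ret = -1;

	pthread_mutex_lock(&sk->lock);
	if (sk->asoc) {
		assert(sk->asoc->primary_path != NULL); /* holds under the lock */
		*addr = sk->asoc->primary_path->addr;
		ret = 0;
	}
	pthread_mutex_unlock(&sk->lock);
	return ret;
}
```

Under this model, a NULL `primary_path` seen shortly after setup would point away from the initialization path and toward teardown, or toward an access that does not hold the lock.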
> The reason I ask: we believe the issue is
> happening very shortly after the association is brought up (we bring
> it up and then call getsockopt()).
Can you check what the association state is? Alternatively, can you
provide the kdump and the kernel so I can dig around?
Thanks
-vlad
>
> Thanks,
> Karl
>
Thread overview: 25+ messages
2013-03-06 22:57 NULL primary_path Karl Heiss
2013-03-07 1:53 ` Vlad Yasevich
2013-03-07 14:52 ` Cristian Constantin
2013-03-07 15:29 ` Neil Horman
2013-03-07 15:44 ` Vlad Yasevich
2013-03-07 15:48 ` Vlad Yasevich
2013-03-07 17:06 ` Karl Heiss
2013-03-07 17:17 ` Vlad Yasevich
2013-03-07 21:51 ` Karl Heiss
2013-03-07 22:08 ` Vlad Yasevich
2013-03-07 23:09 ` Vlad Yasevich
2013-03-08 13:52 ` Karl Heiss
2013-03-08 14:31 ` Karl Heiss
2013-03-08 14:35 ` Neil Horman
2013-03-08 15:31 ` Vlad Yasevich [this message]
2013-03-08 15:37 ` Karl Heiss
2013-03-08 16:42 ` Vlad Yasevich
2013-03-08 17:06 ` Karl Heiss
2013-03-09 20:19 ` Karl Heiss
2013-03-11 21:59 ` Vlad Yasevich
2013-03-11 22:44 ` Karl Heiss
2013-03-11 23:10 ` Vlad Yasevich
2013-03-12 1:05 ` Karl Heiss
2013-03-12 16:18 ` Karl Heiss
2013-03-12 17:23 ` Vlad Yasevich