From: Vlad Yasevich <vyasevic@redhat.com>
To: Neil Horman <nhorman@tuxdriver.com>
Cc: Vlad Yasevich <vyasevich@gmail.com>,
"linux-sctp@vger.kernel.org" <linux-sctp@vger.kernel.org>,
Xufeng Zhang <xufengzhang.main@gmail.com>,
davem@davemloft.net, netdev@vger.kernel.org
Subject: Re: [PATCH] sctp: optimize searching the active path for tsns
Date: Wed, 13 Mar 2013 12:40:14 -0400
Message-ID: <5140ABEE.2020903@redhat.com>
In-Reply-To: <20130313142104.GE17592@hmsreliant.think-freely.org>
On 03/13/2013 10:21 AM, Neil Horman wrote:
> On Wed, Mar 13, 2013 at 10:06:43AM -0400, Vlad Yasevich wrote:
>> On 03/13/2013 09:28 AM, Neil Horman wrote:
>>> On Tue, Mar 12, 2013 at 09:43:20PM -0400, Vlad Yasevich wrote:
>>>> On 03/12/2013 09:20 PM, Neil Horman wrote:
>>>>> On Tue, Mar 12, 2013 at 05:01:50PM -0400, Vlad Yasevich wrote:
>>>>>> Hi Neil
>>>>>>
>>>>>> On 03/12/2013 01:29 PM, Neil Horman wrote:
>>>>>>> SCTP currently attempts to optimize the search for tsns on a transport by first
>>>>>>> checking the active_path, then searching alternate transports. This operation
>>>>>>> however is a bit convoluted, as we explicitly search the active path, then search
>>>>>>> all other transports, skipping the active path when it's detected. Let's
>>>>>>> optimize this by performing a move-to-front on the transport_addr_list every
>>>>>>> time the active_path is changed. The active_path changes occur in relatively
>>>>>>> non-critical paths, and doing so allows us to just search the
>>>>>>> transport_addr_list in order, avoiding an extra conditional check in the
>>>>>>> relatively hot tsn lookup path. This also happens to fix a bug where we break
>>>>>>> out of the for loop early in the tsn lookup.
>>>>>>>
>>>>>>> CC: Xufeng Zhang <xufengzhang.main@gmail.com>
>>>>>>> CC: vyasevich@gmail.com
>>>>>>> CC: davem@davemloft.net
>>>>>>> CC: netdev@vger.kernel.org
>>>>>>> ---
>>>>>>> net/sctp/associola.c | 31 ++++++++++++-------------------
>>>>>>> 1 file changed, 12 insertions(+), 19 deletions(-)
>>>>>>>
>>>>>>> diff --git a/net/sctp/associola.c b/net/sctp/associola.c
>>>>>>> index 43cd0dd..7af96b3 100644
>>>>>>> --- a/net/sctp/associola.c
>>>>>>> +++ b/net/sctp/associola.c
>>>>>>> @@ -513,8 +513,11 @@ void sctp_assoc_set_primary(struct sctp_association *asoc,
>>>>>>> * user wants to use this new path.
>>>>>>> */
>>>>>>> if ((transport->state == SCTP_ACTIVE) ||
>>>>>>> - (transport->state == SCTP_UNKNOWN))
>>>>>>> + (transport->state == SCTP_UNKNOWN)) {
>>>>>>> + list_del_rcu(&transport->transports);
>>>>>>> + list_add_rcu(&transport->transports, &asoc->peer.transport_addr_list);
>>>>>>> asoc->peer.active_path = transport;
>>>>>>> + }
>>>>>>
>>>>>> What would happen if at the same time someone is walking the list
>>>>>> through the proc interfaces?
>>>>>>
>>>>>> Since you are effectively changing the .next pointer, I think it is
>>>>>> possible to get duplicate transport output, essentially corrupting
>>>>>> the output.
>>>>>>
>>>>> That would be the case, but we're already using the rcu variants of the
>>>>> list_add macros at all the points where we modify the list (some of which
>>>>> we do at call sites right before we call set_primary; see
>>>>> sctp_assoc_add_peer, where list_add_tail_rcu also modifies a next
>>>>> pointer). So if this is a problem, it's a problem without this patch. In
>>>>> fact, looking at it, our list access to transport_addr_list is broken: we
>>>>> use rcu apis to modify the list but non-rcu apis to traverse it. I'll
>>>>> need to fix that first.
>>>>
>>>> As long as we are under lock, we don't need rcu variants for
>>>> traversal. The traversals done by the sctp_seq_ functions already
>>>> use correct rcu variants.
>>>>
>>>> I don't see this as a problem right now since we either delete the
>>>> transport, or add it. We don't move it to a new location in the list.
>>>> With the move, what could happen is that while sctp_seq_ is printing
>>>> a transport, you might move it to another spot, and then when you grab
>>>> the .next pointer on the next iteration, it points to a completely
>>>> different spot.
>>>>
>>> Ok, I see what you're saying, and looking at the seq code, with your description
>>> I see how we're using half the rcu code to allow the proc interface to avoid
>>> grabbing the socket lock. But this just strikes me as poor coding. It's
>>> confusing to say the least, and begging for mistakes like the one I just made to
>>> be made again. Regardless of necessity, it seems to me the code would be both
>>> more readable and understandable if we modified it so that we used the rcu api
>>> consistently to access that list.
>>> Neil
>>>
>>
>> Can you elaborate a bit on why you believe we are using half of the
>> rcu code?
>>
> We're using the rcu list add/del apis on the write side when
> we modify the transport_addr_list, but we're using the non-rcu primitives on the
> read side, save for the proc interface.
What other lockless read side do we have?
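The only lockless reader I know of is the proc interface itself, which
walks the list under rcu_read_lock() alone. As a sketch (the seq-file
plumbing is elided and print_transport() is a made-up stand-in):

	struct sctp_transport *tsp;

	rcu_read_lock();
	list_for_each_entry_rcu(tsp, &asoc->peer.transport_addr_list,
				transports) {
		/* Each iteration follows tsp->transports.next, so a
		 * concurrent move of 'tsp' redirects the rest of the
		 * walk.
		 */
		print_transport(seq, tsp);
	}
	rcu_read_unlock();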
> Granted, that's generally safe, except for
> any case in which you do exactly what we were speaking about above. I know we're
> not doing that now, but it seems to me it would be better to use the rcu
> primitives consistently, so that it's clear how to access the list. It would
> prevent mistakes like the one I just made, as well as other possible future
> coding errors.
It would cost extra barriers that are completely unnecessary.
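To make the cost concrete, a sketch of the two read sides
(do_something() is just a placeholder):

	struct sctp_transport *t;

	/* Locked read side: the socket lock excludes writers, so the
	 * plain iterator is enough; ->next is loaded directly with no
	 * extra barrier.
	 */
	list_for_each_entry(t, &asoc->peer.transport_addr_list, transports)
		do_something(t);

	/* Lockless read side (proc): every ->next load must go through
	 * rcu_dereference(), which is where the extra barrier lives
	 * (a real read barrier on alpha, a compiler barrier elsewhere).
	 */
	rcu_read_lock();
	list_for_each_entry_rcu(t, &asoc->peer.transport_addr_list, transports)
		do_something(t);
	rcu_read_unlock();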
>
>> I think to support the move operation you are proposing here, you
>> need something like
>> list_for_each_entry_safe_rcu()
>>
> Not to the best of my knowledge. Consistent use of the rcu list access
> primitives is safe against rcu list mutation primitives, as long as the read
> side is protected by the rcu_read_lock. As long as that's held, any mutations
> won't be seen by the read context until after we exit the read-side critical
> section.
That's not the way I understand rcu. rcu_read_lock() will not prevent a
reader from seeing new contents. This is why list_del_rcu() is careful
not to change the .next pointer.
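For reference, list_del_rcu() in include/linux/rculist.h is essentially:

	static inline void list_del_rcu(struct list_head *entry)
	{
		/* Unlink: prev->next = next, next->prev = prev. */
		__list_del_entry(entry);
		/* Only ->prev is poisoned.  ->next is deliberately left
		 * intact, so a reader currently standing on 'entry' can
		 * still walk forward into the list it was removed from.
		 */
		entry->prev = LIST_POISON2;
	}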
In the case you are proposing, if the reader is currently processing entry
X, and the writer at the same time moves X to another place, then
at the time the reader looks at X.next, it may point to the new place.
If that happens, you've now corrupted the output.
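Spelled out as an interleaving, with a three-entry list and the reader
sitting on B while the writer moves B to the front:

	/*
	 * list: head -> A -> B -> C     reader has just processed B
	 * writer: list_del_rcu(&B->transports);
	 *         head -> A -> C        (B->next still points to C)
	 * writer: list_add_rcu(&B->transports, head);
	 *         head -> B -> A -> C   (B->next now points to A)
	 * reader: follows B->next, lands on A, and emits A a second time.
	 */

As far as I can tell, this is also why rculist.h offers no
list_move_rcu(): the del+add pair cannot be made atomic with respect to
readers.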
>
>> where the .next pointer is fetched through rcu_dereference() to
>> guard against this possibility.
>>
> You won't see the updated next pointer until the read-side critical section
> ends. That's why you need to do an rcu_read_lock in addition to grabbing the
> socket lock.
>
No, that's not how it works. It's based on memory barriers at the time
of deletion/addition and dereference.
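For reference, the publish side, __list_add_rcu(), is essentially:

	static inline void __list_add_rcu(struct list_head *new,
			struct list_head *prev, struct list_head *next)
	{
		new->next = next;
		new->prev = prev;
		/* The write barrier in rcu_assign_pointer() orders the
		 * initialization of 'new' before it becomes reachable.
		 */
		rcu_assign_pointer(list_next_rcu(prev), new);
		next->prev = new;
	}

Nothing here waits for readers. Only synchronize_rcu()/call_rcu() wait,
and they delay freeing, not visibility: a reader inside rcu_read_lock()
can observe the new ->next the moment it is assigned.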
-vlad
>> But even that would only work if the move happens to an earlier
>> spot (like head) of the list. If the move happens to the later part
>> of the list (like tail), you may end up visiting the same list node
>> twice.
>>
> Again, not if you hold the rcu_read_lock. Doing so creates a quiescent point in
> which the status of the list is held until such time as the read-side critical
> section exits. The move (specifically the delete and the add) won't be seen by
> the reading context until that quiescent point completes, which by definition is
> when that reader is done reading.
>
> Neil
>
>> I think rcu simply can't guard against this.
>>
>> -vlad
>>
>>