netdev.vger.kernel.org archive mirror
From: Vlad Yasevich <vyasevic@redhat.com>
To: Neil Horman <nhorman@tuxdriver.com>
Cc: Vlad Yasevich <vyasevich@gmail.com>,
	"linux-sctp@vger.kernel.org" <linux-sctp@vger.kernel.org>,
	Xufeng Zhang <xufengzhang.main@gmail.com>,
	davem@davemloft.net, netdev@vger.kernel.org
Subject: Re: [PATCH] sctp: optimize searching the active path for tsns
Date: Wed, 13 Mar 2013 12:40:14 -0400	[thread overview]
Message-ID: <5140ABEE.2020903@redhat.com> (raw)
In-Reply-To: <20130313142104.GE17592@hmsreliant.think-freely.org>

On 03/13/2013 10:21 AM, Neil Horman wrote:
> On Wed, Mar 13, 2013 at 10:06:43AM -0400, Vlad Yasevich wrote:
>> On 03/13/2013 09:28 AM, Neil Horman wrote:
>>> On Tue, Mar 12, 2013 at 09:43:20PM -0400, Vlad Yasevich wrote:
>>>> On 03/12/2013 09:20 PM, Neil Horman wrote:
>>>>> On Tue, Mar 12, 2013 at 05:01:50PM -0400, Vlad Yasevich wrote:
>>>>>> Hi Neil
>>>>>>
>>>>>> On 03/12/2013 01:29 PM, Neil Horman wrote:
>>>>>>> SCTP currently attempts to optimize the search for tsns on a transport by first
>>>>>>> checking the active_path, then searching alternate transports.  This operation
>>>>>>> however is a bit convoluted, as we explicitly search the active path, then search
>>>>>>> all other transports, skipping the active path, when it is detected.  Let's
>>>>>>> optimize this by performing a move to front on the transport_addr_list every
>>>>>>> time the active_path is changed.  The active_path changes occur in relatively
>>>>>>> non-critical paths, and doing so allows us to just search the
>>>>>>> transport_addr_list in order, avoiding an extra conditional check in the
>>>>>>> relatively hot tsn lookup path.  This also happens to fix a bug where we break
>>>>>>> out of the for loop early in the tsn lookup.
>>>>>>>
>>>>>>> CC: Xufeng Zhang <xufengzhang.main@gmail.com>
>>>>>>> CC: vyasevich@gmail.com
>>>>>>> CC: davem@davemloft.net
>>>>>>> CC: netdev@vger.kernel.org
>>>>>>> ---
>>>>>>>   net/sctp/associola.c | 31 ++++++++++++-------------------
>>>>>>>   1 file changed, 12 insertions(+), 19 deletions(-)
>>>>>>>
>>>>>>> diff --git a/net/sctp/associola.c b/net/sctp/associola.c
>>>>>>> index 43cd0dd..7af96b3 100644
>>>>>>> --- a/net/sctp/associola.c
>>>>>>> +++ b/net/sctp/associola.c
>>>>>>> @@ -513,8 +513,11 @@ void sctp_assoc_set_primary(struct sctp_association *asoc,
>>>>>>>   	 * user wants to use this new path.
>>>>>>>   	 */
>>>>>>>   	if ((transport->state == SCTP_ACTIVE) ||
>>>>>>> -	    (transport->state == SCTP_UNKNOWN))
>>>>>>> +	    (transport->state == SCTP_UNKNOWN)) {
>>>>>>> +		list_del_rcu(&transport->transports);
>>>>>>> +		list_add_rcu(&transport->transports, &asoc->peer.transport_addr_list);
>>>>>>>   		asoc->peer.active_path = transport;
>>>>>>> +	}
>>>>>>
>>>>>> What would happen if at the same time someone is walking the list
>>>>>> through the proc interfaces?
>>>>>>
>>>>>> Since you are effectively changing the .next pointer, I think it is
>>>>>> possible to get a duplicate transport output essentially corrupting
>>>>>> the output.
>>>>>>
>>>>> It would be the case, but you're using the rcu variants of the list_add macros
>>>>> at all the points where we modify the list (some of which we do at call sites
>>>>> right before we call set_primary, see sctp_assoc_add_peer, where
>>>>> list_add_tail_rcu also modifies a next pointer).  So if this is a problem, it's a
>>>>> problem without this patch.  In fact looking at it, our list access to
>>>>> transport_addr_list is broken, as we use rcu apis to modify the list but non-rcu
>>>>> apis to traverse the list.  I'll need to fix that first.
>>>>
>>>> As long as we are under lock, we don't need rcu variants for
>>>> traversal.  The traversals done by the sctp_seq_ functions already
>>>> use correct rcu variants.
>>>>
>>>> I don't see this as a problem right now since we either delete the
>>>> transport, or add it.  We don't move it to a new location in the list.
>>>> With the move, what could happen is that while sctp_seq_ is printing
>>>> a transport, you might move it to another spot, and then when you grab
>>>> the .next pointer on the next iteration, it points to a completely
>>>> different spot.
>>>>
>>> Ok, I see what you're saying, and looking at the seq code, with your description
>>> I see how we're using half the rcu code to allow the proc interface to avoid
>>> grabbing the socket lock.  But this just strikes me as poor coding.  It's
>>> confusing to say the least, and begging for mistakes like the one I just made to
>>> be made again.  Regardless of necessity, it seems to me the code would be both
>>> more readable and understandable if we modified it so that we used the rcu api
>>> consistently to access that list.
>>> Neil
>>>
>>
>> Can you elaborate a bit on why you believe we are using half of the
>> rcu code?
>>
> We're using the rcu list add/del apis on the write side when
> we modify the transport_addr_list, but we're using the non-rcu primitives on the
>  read side, save for the proc interface.

What other lockless read side do we have?


>  Granted, that is generally safe, except for
> any case in which you do exactly what we were speaking about above.  I know we're
> not doing that now, but it seems to me it would be better to use the rcu
> primitives consistently, so that it is clear how to access the list.  It would
> prevent mistakes like the one I just made, as well as other possible mistakes
> in future coding.

It would cost extra barriers that are completely unnecessary.

>
>> I think to support the move operation you are proposing here, you
>> need something like
>> 	list_for_each_entry_safe_rcu()
>>
> Not to the best of my knowledge.  Consistent use of the rcu list access
> primitives is safe against rcu list mutation primitives, as long as the read
> side is protected by the rcu_read_lock.  As long as that's locked, any mutations
> won't be seen by the read context until after we exit the read side critical
> section.

That's not the way I understand rcu.  rcu_read_lock will not prevent a 
reader from seeing new contents.  This is why list_del_rcu() is careful 
not to change the .next pointer.
In the case you are proposing, if the reader is currently processing entry 
X, and the writer, at the same time, moves X to another place, then
at the time the reader looks at X.next, it may point to the new place. 
If that happens, you've now corrupted the output.

>
>> where the .next pointer is fetched through rcu_dereference() to
>> guard against this possibility.
>>
> You won't see the updated next pointer until the read-side critical section
> ends.  That's why you need to do an rcu_read_lock in addition to grabbing the
> socket lock.
>

No, that's not how it works.  It's based on memory barriers at the time 
of deletion/addition and dereference.

-vlad

>> But even that would only work if the move happens to the earlier
>> spot (like head) of the list.  If the move happens to the later part
>> of the list (like tail), you may end up visiting the same list node
>> twice.
>>
> Again, not if you hold the rcu_read_lock.  Doing so creates a quiescent point in
> which the status of the list is held until such time as read side critical
> section exits.  The move (specifically the delete and the add) won't be seen by
> the reading context until that quiescent point completes, which by definition is
> when that reader is done reading.
>
> Neil
>
>> I think rcu simply can't guard against this.
>>
>> -vlad
>>
>>

Thread overview: 21+ messages
2013-03-08  7:39 [PATCH] sctp: don't break the loop while meeting the active_path so as to find the matched transport Xufeng Zhang
2013-03-08 14:27 ` Neil Horman
2013-03-11  2:14   ` Xufeng Zhang
2013-03-11 13:31     ` Neil Horman
2013-03-12  2:24       ` Xufeng Zhang
2013-03-12 11:30         ` Neil Horman
2013-03-12 12:11       ` Vlad Yasevich
2013-03-12 15:44         ` Neil Horman
2013-03-12 16:00           ` Vlad Yasevich
2013-03-12 17:29             ` [PATCH] sctp: optimize searching the active path for tsns Neil Horman
2013-03-12 21:01               ` Vlad Yasevich
2013-03-13  1:20                 ` Neil Horman
2013-03-13  1:43                   ` Vlad Yasevich
2013-03-13 13:28                     ` Neil Horman
2013-03-13 14:06                       ` Vlad Yasevich
2013-03-13 14:21                         ` Neil Horman
2013-03-13 16:40                           ` Vlad Yasevich [this message]
2013-03-13 16:41                       ` Paul E. McKenney
2013-03-13 13:33 ` [PATCH] sctp: don't break the loop while meeting the active_path so as to find the matched transport Neil Horman
2013-03-13 13:52 ` Vlad Yasevich
2013-03-13 14:11   ` David Miller
