Re: [PATCH] sctp: optimize searching the active path for tsns

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Vlad Yasevich <vyasevic@redhat.com>
To: Neil Horman <nhorman@tuxdriver.com>
Cc: Vlad Yasevich <vyasevich@gmail.com>,
	"linux-sctp@vger.kernel.org" <linux-sctp@vger.kernel.org>,
	Xufeng Zhang <xufengzhang.main@gmail.com>,
	davem@davemloft.net, netdev@vger.kernel.org
Subject: Re: [PATCH] sctp: optimize searching the active path for tsns
Date: Wed, 13 Mar 2013 16:40:14 +0000	[thread overview]
Message-ID: <5140ABEE.2020903@redhat.com> (raw)
In-Reply-To: <20130313142104.GE17592@hmsreliant.think-freely.org>

On 03/13/2013 10:21 AM, Neil Horman wrote:
> On Wed, Mar 13, 2013 at 10:06:43AM -0400, Vlad Yasevich wrote:
>> On 03/13/2013 09:28 AM, Neil Horman wrote:
>>> On Tue, Mar 12, 2013 at 09:43:20PM -0400, Vlad Yasevich wrote:
>>>> On 03/12/2013 09:20 PM, Neil Horman wrote:
>>>>> On Tue, Mar 12, 2013 at 05:01:50PM -0400, Vlad Yasevich wrote:
>>>>>> Hi Neil
>>>>>>
>>>>>> On 03/12/2013 01:29 PM, Neil Horman wrote:
>>>>>>> SCTP currently attempts to optimize the search for tsns on a transport by first
>>>>>>> checking the active_path, then searching alternate transports.  This operation
>>>>>>> however is a bit convoluted, as we explicitly search the active path, then serch
>>>>>>> all other transports, skipping the active path, when its detected.  Lets
>>>>>>> optimize this by preforming a move to front on the transport_addr_list every
>>>>>>> time the active_path is changed.  The active_path changes occur in relatively
>>>>>>> non-critical paths, and doing so allows us to just search the
>>>>>>> transport_addr_list in order, avoiding an extra conditional check in the
>>>>>>> relatively hot tsn lookup path.  This also happens to fix a bug where we break
>>>>>>> out of the for loop early in the tsn lookup.
>>>>>>>
>>>>>>> CC: Xufeng Zhang <xufengzhang.main@gmail.com>
>>>>>>> CC: vyasevich@gmail.com
>>>>>>> CC: davem@davemloft.net
>>>>>>> CC: netdev@vger.kernel.org
>>>>>>> ---
>>>>>>>   net/sctp/associola.c | 31 ++++++++++++-------------------
>>>>>>>   1 file changed, 12 insertions(+), 19 deletions(-)
>>>>>>>
>>>>>>> diff --git a/net/sctp/associola.c b/net/sctp/associola.c
>>>>>>> index 43cd0dd..7af96b3 100644
>>>>>>> --- a/net/sctp/associola.c
>>>>>>> +++ b/net/sctp/associola.c
>>>>>>> @@ -513,8 +513,11 @@ void sctp_assoc_set_primary(struct sctp_association *asoc,
>>>>>>>   	 * user wants to use this new path.
>>>>>>>   	 */
>>>>>>>   	if ((transport->state = SCTP_ACTIVE) ||
>>>>>>> -	    (transport->state = SCTP_UNKNOWN))
>>>>>>> +	    (transport->state = SCTP_UNKNOWN)) {
>>>>>>> +		list_del_rcu(&transport->transports);
>>>>>>> +		list_add_rcu(&transport->transports, &asoc->peer.transport_addr_list);
>>>>>>>   		asoc->peer.active_path = transport;
>>>>>>> +	}
>>>>>>
>>>>>> What would happen if at the same time someone is walking the list
>>>>>> through the proc interfaces?
>>>>>>
>>>>>> Since you are effectively changing the .next pointer, I think it is
>>>>>> possible to get a duplicate transport output essentially corrupting
>>>>>> the output.
>>>>>>
>>>>> It would be the case, but you're using the rcu variants of the list_add macros
>>>>> at all the points where we modify the list (some of which we do at call sites
>>>>> right before we call set_primary, see sctp_assoc_add_peer, where
>>>>> list_add_tail_rcu also modifies a next pointer).  So if this is a problem, its a
>>>>> problem without this patch.  In fact looking at it, our list access to
>>>>> transport_addr_list is broken, as we use rcu apis to modify the list but non-rcu
>>>>> apis to traverse the list.  I'll need to fix that first.
>>>>
>>>> As long as we are under lock, we don't need rcu variants for
>>>> traversal.  The traversals done by the sctp_seq_ functions already
>>>> use correct rcu variants.
>>>>
>>>> I don't see this as a problem right now since we either delete the
>>>> transport, or add it.  We don't move it to a new location in the list.
>>>> With the move, what could happen is that while sctp_seq_ is printing
>>>> a transport, you might move it to another spot, and the when you grab
>>>> the .next pointer on the next iteration, it points to a completely
>>>> different spot.
>>>>
>>> Ok, I see what you're saying, and looking at the seq code, with your description
>>> I see how we're using half the rcu code to allow the proc interface to avoid
>>> grabbing the socket lock.  But this just strikes me a poor coding.  Its
>>> confusing to say the least, and begging for mistakes like the one I just made to
>>> be made again.  Regardless of necessisty, it seems to me the code would be both
>>> more readable and understandible if we modified it so that we used the rcu api
>>> consistently to access that list.
>>> Neil
>>>
>>
>> Can you elaborate a bit on why you believe we are using half of the
>> rcu code?
>>
> We're using the rcu list add/del apis on the write side when
> we modify the transport_addr_list, but we're using the non-rcu primitives on the
>  read side, save for the proc interface.

What other lockless read side do we have?


>  Granted that generally safe, exepct for
> any case in which you do exactly what we were speaking about above.  I know were
> not doing that now, but it seems to me it would be better to use the rcu
> primitives consistently, so that it was clear how to access the list.  It would
> prevent mistakes like the one I just made, as well as other possible mistakes,
> in which future coding errors.

It would cost extra barriers that are completely unnecessary.

>
>> I think to support the move operation you are proposing here, you
>> need something like
>> 	list_for_each_entry_safe_rcu()
>>
> Not to the best of my knoweldge.  Consistent use of the rcu list access
> primitives is safe against rcu list mutation primitives, as long as the read
> side is protected by the rcu_read_lock.  As long as thats locked, any mutations
> won't be seen by the read context until after we exit the read side critical
> section.

Not the way I understand rcu.  rcu_read_lock will not prevent access to 
new contents.  This is why list_del_rcu() is careful not to change the 
.next pointer.
In the case you are proposing, if the reader is current processing entry 
X, and the writer, at the same time moves X to another place, then
at the time the reader looks at X.next, it may point to the new place. 
If that happens, you've now corrupted output.

>
>> where the .next pointer is fetched through rcu_dereference() to
>> guard against its possible.
>>
> You won't see the updated next pointer until the read critical side ends.  Thats
> why you need to do an rcu_read_lock in addition to grabbing the socket lock.
>

No, that's now how it works.  It's based on memory barriers at the time 
of deletion/addition and dereference.

-vlad

>> But even that would only work if the move happens to the the earlier
>> spot (like head) of the list.  If the move happens to the later part
>> of the list (like tail), you may end up visiting the same list node
>> twice.
>>
> Again, not if you hold the rcu_read_lock.  Doing so creates a quiescent point in
> which the status of the list is held until such time as read side critical
> section exits.  The move (specifically the delete and the add) won't be seen by
> the reading context until that quiescent point completes, which by definition is
> when that reader is done reading.
>
> Neil
>
>> I think rcu simply can't guard against this.
>>
>> -vlad
>>
>>
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

WARNING: multiple messages have this Message-ID (diff)

From: Vlad Yasevich <vyasevic@redhat.com>
To: Neil Horman <nhorman@tuxdriver.com>
Cc: Vlad Yasevich <vyasevich@gmail.com>,
	"linux-sctp@vger.kernel.org" <linux-sctp@vger.kernel.org>,
	Xufeng Zhang <xufengzhang.main@gmail.com>,
	davem@davemloft.net, netdev@vger.kernel.org
Subject: Re: [PATCH] sctp: optimize searching the active path for tsns
Date: Wed, 13 Mar 2013 12:40:14 -0400	[thread overview]
Message-ID: <5140ABEE.2020903@redhat.com> (raw)
In-Reply-To: <20130313142104.GE17592@hmsreliant.think-freely.org>

On 03/13/2013 10:21 AM, Neil Horman wrote:
> On Wed, Mar 13, 2013 at 10:06:43AM -0400, Vlad Yasevich wrote:
>> On 03/13/2013 09:28 AM, Neil Horman wrote:
>>> On Tue, Mar 12, 2013 at 09:43:20PM -0400, Vlad Yasevich wrote:
>>>> On 03/12/2013 09:20 PM, Neil Horman wrote:
>>>>> On Tue, Mar 12, 2013 at 05:01:50PM -0400, Vlad Yasevich wrote:
>>>>>> Hi Neil
>>>>>>
>>>>>> On 03/12/2013 01:29 PM, Neil Horman wrote:
>>>>>>> SCTP currently attempts to optimize the search for tsns on a transport by first
>>>>>>> checking the active_path, then searching alternate transports.  This operation
>>>>>>> however is a bit convoluted, as we explicitly search the active path, then serch
>>>>>>> all other transports, skipping the active path, when its detected.  Lets
>>>>>>> optimize this by preforming a move to front on the transport_addr_list every
>>>>>>> time the active_path is changed.  The active_path changes occur in relatively
>>>>>>> non-critical paths, and doing so allows us to just search the
>>>>>>> transport_addr_list in order, avoiding an extra conditional check in the
>>>>>>> relatively hot tsn lookup path.  This also happens to fix a bug where we break
>>>>>>> out of the for loop early in the tsn lookup.
>>>>>>>
>>>>>>> CC: Xufeng Zhang <xufengzhang.main@gmail.com>
>>>>>>> CC: vyasevich@gmail.com
>>>>>>> CC: davem@davemloft.net
>>>>>>> CC: netdev@vger.kernel.org
>>>>>>> ---
>>>>>>>   net/sctp/associola.c | 31 ++++++++++++-------------------
>>>>>>>   1 file changed, 12 insertions(+), 19 deletions(-)
>>>>>>>
>>>>>>> diff --git a/net/sctp/associola.c b/net/sctp/associola.c
>>>>>>> index 43cd0dd..7af96b3 100644
>>>>>>> --- a/net/sctp/associola.c
>>>>>>> +++ b/net/sctp/associola.c
>>>>>>> @@ -513,8 +513,11 @@ void sctp_assoc_set_primary(struct sctp_association *asoc,
>>>>>>>   	 * user wants to use this new path.
>>>>>>>   	 */
>>>>>>>   	if ((transport->state == SCTP_ACTIVE) ||
>>>>>>> -	    (transport->state == SCTP_UNKNOWN))
>>>>>>> +	    (transport->state == SCTP_UNKNOWN)) {
>>>>>>> +		list_del_rcu(&transport->transports);
>>>>>>> +		list_add_rcu(&transport->transports, &asoc->peer.transport_addr_list);
>>>>>>>   		asoc->peer.active_path = transport;
>>>>>>> +	}
>>>>>>
>>>>>> What would happen if at the same time someone is walking the list
>>>>>> through the proc interfaces?
>>>>>>
>>>>>> Since you are effectively changing the .next pointer, I think it is
>>>>>> possible to get a duplicate transport output essentially corrupting
>>>>>> the output.
>>>>>>
>>>>> It would be the case, but you're using the rcu variants of the list_add macros
>>>>> at all the points where we modify the list (some of which we do at call sites
>>>>> right before we call set_primary, see sctp_assoc_add_peer, where
>>>>> list_add_tail_rcu also modifies a next pointer).  So if this is a problem, its a
>>>>> problem without this patch.  In fact looking at it, our list access to
>>>>> transport_addr_list is broken, as we use rcu apis to modify the list but non-rcu
>>>>> apis to traverse the list.  I'll need to fix that first.
>>>>
>>>> As long as we are under lock, we don't need rcu variants for
>>>> traversal.  The traversals done by the sctp_seq_ functions already
>>>> use correct rcu variants.
>>>>
>>>> I don't see this as a problem right now since we either delete the
>>>> transport, or add it.  We don't move it to a new location in the list.
>>>> With the move, what could happen is that while sctp_seq_ is printing
>>>> a transport, you might move it to another spot, and the when you grab
>>>> the .next pointer on the next iteration, it points to a completely
>>>> different spot.
>>>>
>>> Ok, I see what you're saying, and looking at the seq code, with your description
>>> I see how we're using half the rcu code to allow the proc interface to avoid
>>> grabbing the socket lock.  But this just strikes me a poor coding.  Its
>>> confusing to say the least, and begging for mistakes like the one I just made to
>>> be made again.  Regardless of necessisty, it seems to me the code would be both
>>> more readable and understandible if we modified it so that we used the rcu api
>>> consistently to access that list.
>>> Neil
>>>
>>
>> Can you elaborate a bit on why you believe we are using half of the
>> rcu code?
>>
> We're using the rcu list add/del apis on the write side when
> we modify the transport_addr_list, but we're using the non-rcu primitives on the
>  read side, save for the proc interface.

What other lockless read side do we have?


>  Granted that generally safe, exepct for
> any case in which you do exactly what we were speaking about above.  I know were
> not doing that now, but it seems to me it would be better to use the rcu
> primitives consistently, so that it was clear how to access the list.  It would
> prevent mistakes like the one I just made, as well as other possible mistakes,
> in which future coding errors.

It would cost extra barriers that are completely unnecessary.

>
>> I think to support the move operation you are proposing here, you
>> need something like
>> 	list_for_each_entry_safe_rcu()
>>
> Not to the best of my knoweldge.  Consistent use of the rcu list access
> primitives is safe against rcu list mutation primitives, as long as the read
> side is protected by the rcu_read_lock.  As long as thats locked, any mutations
> won't be seen by the read context until after we exit the read side critical
> section.

Not the way I understand rcu.  rcu_read_lock will not prevent access to 
new contents.  This is why list_del_rcu() is careful not to change the 
.next pointer.
In the case you are proposing, if the reader is current processing entry 
X, and the writer, at the same time moves X to another place, then
at the time the reader looks at X.next, it may point to the new place. 
If that happens, you've now corrupted output.

>
>> where the .next pointer is fetched through rcu_dereference() to
>> guard against its possible.
>>
> You won't see the updated next pointer until the read critical side ends.  Thats
> why you need to do an rcu_read_lock in addition to grabbing the socket lock.
>

No, that's now how it works.  It's based on memory barriers at the time 
of deletion/addition and dereference.

-vlad

>> But even that would only work if the move happens to the the earlier
>> spot (like head) of the list.  If the move happens to the later part
>> of the list (like tail), you may end up visiting the same list node
>> twice.
>>
> Again, not if you hold the rcu_read_lock.  Doing so creates a quiescent point in
> which the status of the list is held until such time as read side critical
> section exits.  The move (specifically the delete and the add) won't be seen by
> the reading context until that quiescent point completes, which by definition is
> when that reader is done reading.
>
> Neil
>
>> I think rcu simply can't guard against this.
>>
>> -vlad
>>
>>
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

next prev parent reply	other threads:[~2013-03-13 16:40 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-03-08  7:39 [PATCH] sctp: don't break the loop while meeting the active_path so as to find the matched transport Xufeng Zhang
2013-03-08  7:39 ` Xufeng Zhang
2013-03-08 14:27 ` [PATCH] sctp: don't break the loop while meeting the active_path so as to find the matched trans Neil Horman
2013-03-08 14:27   ` [PATCH] sctp: don't break the loop while meeting the active_path so as to find the matched transport Neil Horman
2013-03-11  2:14   ` [PATCH] sctp: don't break the loop while meeting the active_path so as to find the matched trans Xufeng Zhang
2013-03-11  2:14     ` [PATCH] sctp: don't break the loop while meeting the active_path so as to find the matched transport Xufeng Zhang
2013-03-11 13:31     ` [PATCH] sctp: don't break the loop while meeting the active_path so as to find the matched trans Neil Horman
2013-03-11 13:31       ` [PATCH] sctp: don't break the loop while meeting the active_path so as to find the matched transport Neil Horman
2013-03-12  2:24       ` [PATCH] sctp: don't break the loop while meeting the active_path so as to find the matched trans Xufeng Zhang
2013-03-12  2:24         ` [PATCH] sctp: don't break the loop while meeting the active_path so as to find the matched transport Xufeng Zhang
2013-03-12 11:30         ` [PATCH] sctp: don't break the loop while meeting the active_path so as to find the matched trans Neil Horman
2013-03-12 11:30           ` [PATCH] sctp: don't break the loop while meeting the active_path so as to find the matched transport Neil Horman
2013-03-12 12:11       ` [PATCH] sctp: don't break the loop while meeting the active_path so as to find the matched trans Vlad Yasevich
2013-03-12 12:11         ` [PATCH] sctp: don't break the loop while meeting the active_path so as to find the matched transport Vlad Yasevich
2013-03-12 15:44         ` [PATCH] sctp: don't break the loop while meeting the active_path so as to find the matched trans Neil Horman
2013-03-12 15:44           ` [PATCH] sctp: don't break the loop while meeting the active_path so as to find the matched transport Neil Horman
2013-03-12 16:00           ` [PATCH] sctp: don't break the loop while meeting the active_path so as to find the matched trans Vlad Yasevich
2013-03-12 16:00             ` [PATCH] sctp: don't break the loop while meeting the active_path so as to find the matched transport Vlad Yasevich
2013-03-12 17:29             ` [PATCH] sctp: optimize searching the active path for tsns Neil Horman
2013-03-12 21:01               ` Vlad Yasevich
2013-03-13  1:20                 ` Neil Horman
2013-03-13  1:43                   ` Vlad Yasevich
2013-03-13 13:28                     ` Neil Horman
2013-03-13 14:06                       ` Vlad Yasevich
2013-03-13 14:06                         ` Vlad Yasevich
2013-03-13 14:21                         ` Neil Horman
2013-03-13 14:21                           ` Neil Horman
2013-03-13 16:40                           ` Vlad Yasevich [this message]
2013-03-13 16:40                             ` Vlad Yasevich
2013-03-13 16:41                       ` Paul E. McKenney
2013-03-13 13:33 ` [PATCH] sctp: don't break the loop while meeting the active_path so as to find the matched trans Neil Horman
2013-03-13 13:33   ` [PATCH] sctp: don't break the loop while meeting the active_path so as to find the matched transport Neil Horman
2013-03-13 13:52 ` [PATCH] sctp: don't break the loop while meeting the active_path so as to find the matched trans Vlad Yasevich
2013-03-13 13:52   ` [PATCH] sctp: don't break the loop while meeting the active_path so as to find the matched transport Vlad Yasevich
2013-03-13 14:11   ` [PATCH] sctp: don't break the loop while meeting the active_path so as to find the matched trans David Miller
2013-03-13 14:11     ` [PATCH] sctp: don't break the loop while meeting the active_path so as to find the matched transport David Miller

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5140ABEE.2020903@redhat.com \
    --to=vyasevic@redhat.com \
    --cc=davem@davemloft.net \
    --cc=linux-sctp@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=nhorman@tuxdriver.com \
    --cc=vyasevich@gmail.com \
    --cc=xufengzhang.main@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.