netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Eric Dumazet <eric.dumazet@gmail.com>
To: Octavian Purdila <opurdila@ixiacom.com>
Cc: netdev@vger.kernel.org,
	Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Subject: Re: [RFC PATCH] llc: convert the socket list to RCU locking (was Re: [PATCH 3/4] llc: use a device based hash table to speed up multicast delivery)
Date: Wed, 09 Dec 2009 21:19:24 +0100	[thread overview]
Message-ID: <4B20064C.7070301@gmail.com> (raw)
In-Reply-To: <200912082310.45846.opurdila@ixiacom.com>

Le 08/12/2009 22:10, Octavian Purdila a écrit :
> On Friday 04 December 2009 01:52:44 you wrote:
> 
>>> Since at this point we are using UP ports contention is not really an
>>> issue for us. I've extrapolated this (lock per hash bucket) based on how
>>> locking is done in other places, like UDP.
>>
>> Yes but you know we want to remove those locks per UDP hash bucket, since
>>  we dont really need them anymore. ;)
>>
>>
>> If you remember, we had in the past one rwlock for the whole UDP table.
>>
>> Then this was converted to one spinlock per hash slot (128 slots) + RCU
>>  lookups for unicast RX
>>
>> Then we dynamically sized udp table at boot (up to 65536 slots)
>>
>> multicast optimization (holding lock for small duration + double hashing)
>>
>> bind optimization (thanks to double hashing)
>>
>> To be done :
>>
>> 1) multicast RX can be done without taking any lock, and RCU lookups
>> 2) zap all locks and use one lock, or a small array of hashed spinlocks
>>
> 
> OK, here is my first try at llc RCU-fication. 
> 
> One doubt before pasting the code: In slab.h comment and in udp.c I see the lookup is restarted if an improper object is returned. Is that really required?
> 
> Thanks!
> tavi
>  
> [RFC PATCH] llc: convert the socket list to RCU locking
>     
> For the reclamation phase we use the SLAB_DESTROY_BY_RCU mechanism,
> which require some extra checks in the lookup code:
>     
> a) Since socket can be free while looking it up, or getting a
> reference to it, we need to check the socket reference counter to make
> sure the reference we got points to a live socket
>     
> b) It can also happen that the reference we got during lookup will be
> freed and the memory reused by another socket. Thus, after getting the
> reference, we must check again that we got the right socket.
>     
> Note that the /proc code was not yet converted to RCU, it still uses
> spinlocks for protection.
>     
> Signed-off-by: Octavian Purdila <opurdila@ixiacom.com>
> ---
>  include/net/llc.h  |    7 ++--
>  net/llc/af_llc.c   |    1 +
>  net/llc/llc_conn.c |   89 ++++++++++++++++++++++++++++++++++------------------
>  net/llc/llc_core.c |    5 ++-
>  net/llc/llc_proc.c |   20 ++++++------
>  net/llc/llc_sap.c  |   72 ++++++++++++++++++++++++++++-------------
>  6 files changed, 124 insertions(+), 70 deletions(-)
> 
> diff --git a/include/net/llc.h b/include/net/llc.h
> index 7940da1..1559cf1 100644
> --- a/include/net/llc.h
> +++ b/include/net/llc.h
> @@ -16,6 +16,7 @@
>  #include <linux/if_ether.h>
>  #include <linux/list.h>
>  #include <linux/spinlock.h>
> +#include <linux/rculist_nulls.h>
>  
>  #include <asm/atomic.h>
>  
> @@ -53,10 +54,8 @@ struct llc_sap {
>  				     struct net_device *orig_dev);
>  	struct llc_addr	 laddr;
>  	struct list_head node;
> -	struct {
> -		rwlock_t	  lock;
> -		struct hlist_head list;
> -	} sk_list;
> +	spinlock_t sk_lock;
> +	struct hlist_nulls_head sk_list;
>  };
>  
>  #define LLC_DEST_INVALID         0      /* Invalid LLC PDU type */
> diff --git a/net/llc/af_llc.c b/net/llc/af_llc.c
> index ffc96d3..baefb4f 100644
> --- a/net/llc/af_llc.c
> +++ b/net/llc/af_llc.c
> @@ -140,6 +140,7 @@ static struct proto llc_proto = {
>  	.name	  = "LLC",
>  	.owner	  = THIS_MODULE,
>  	.obj_size = sizeof(struct llc_sock),
> +	.slab_flags = SLAB_DESTROY_BY_RCU,
>  };
>  
>  /**
> diff --git a/net/llc/llc_conn.c b/net/llc/llc_conn.c
> index c6bab39..8331fc8 100644
> --- a/net/llc/llc_conn.c
> +++ b/net/llc/llc_conn.c
> @@ -468,6 +468,19 @@ static int llc_exec_conn_trans_actions(struct sock *sk,
>  	return rc;
>  }
>  
> +static inline bool llc_estab_match(const struct llc_sap *sap,
> +				   const struct llc_addr *daddr,
> +				   const struct llc_addr *laddr,
> +				   const struct sock *sk)
> +{
> +	struct llc_sock *llc = llc_sk(sk);
> +
> +	return llc->laddr.lsap == laddr->lsap &&
> +		llc->daddr.lsap == daddr->lsap &&
> +		llc_mac_match(llc->laddr.mac, laddr->mac) &&
> +		llc_mac_match(llc->daddr.mac, daddr->mac);
> +}
> +
>  /**
>   *	__llc_lookup_established - Finds connection for the remote/local sap/mac
>   *	@sap: SAP
> @@ -484,23 +497,24 @@ static struct sock *__llc_lookup_established(struct llc_sap *sap,
>  					     struct llc_addr *laddr)
>  {
>  	struct sock *rc;
> -	struct hlist_node *node;
> -
> -	read_lock(&sap->sk_list.lock);
> -	sk_for_each(rc, node, &sap->sk_list.list) {
> -		struct llc_sock *llc = llc_sk(rc);
> -
> -		if (llc->laddr.lsap == laddr->lsap &&
> -		    llc->daddr.lsap == daddr->lsap &&
> -		    llc_mac_match(llc->laddr.mac, laddr->mac) &&
> -		    llc_mac_match(llc->daddr.mac, daddr->mac)) {
> -			sock_hold(rc);
> +	struct hlist_nulls_node *node;
> +
> +	rcu_read_lock();
> +	sk_nulls_for_each_rcu(rc, node, &sap->sk_list) {
> +		if (llc_estab_match(sap, daddr, laddr, rc)) {
> +			/* Extra checks required by SLAB_DESTROY_BY_RCU */
> +			if (unlikely(!atomic_inc_not_zero(&rc->sk_refcnt)))
> +				continue;

Hmm, this wont work in fact, because if we have several llc_sap allocated on machine,
we have no guarantee a freed socket wont be reused and inserted in another llc_sap list.

So before calling llc_estab_match() (and/or llc_listener_match()) you should
check if 'rc' really belong to current sap.

If not, you must restart the lookup (you hit a socket that was moved to another sap)


> +			if (unlikely(!llc_estab_match(sap, daddr, laddr, rc))) {
> +				sock_put(rc);
> +				continue;
> +			}
>  			goto found;
>  		}
>  	}
>  	rc = NULL;
>  found:
> -	read_unlock(&sap->sk_list.lock);
> +	rcu_read_unlock();
>  	return rc;
>  }
>  


  parent reply	other threads:[~2009-12-09 20:19 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-12-03 22:31 [PATCH 0/4] llc enhancements Octavian Purdila
2009-12-03 22:31 ` [PATCH 1/4] llc: use dev_hard_header Octavian Purdila
2009-12-03 22:31 ` [PATCH 2/4] llc: add support for LLC_OPT_PKTINFO Octavian Purdila
2009-12-03 22:31 ` [PATCH 3/4] llc: use a device based hash table to speed up multicast delivery Octavian Purdila
2009-12-03 22:59   ` Eric Dumazet
2009-12-03 23:30     ` Octavian Purdila
2009-12-03 23:52       ` Eric Dumazet
2009-12-04  0:15         ` Octavian Purdila
2009-12-04  0:28           ` Eric Dumazet
2009-12-08 21:10         ` [RFC PATCH] llc: convert the socket list to RCU locking (was Re: [PATCH 3/4] llc: use a device based hash table to speed up multicast delivery) Octavian Purdila
2009-12-08 21:26           ` Eric Dumazet
2009-12-09 20:19           ` Eric Dumazet [this message]
2009-12-09 20:36             ` Octavian Purdila
2009-12-09 21:49               ` Octavian Purdila
2009-12-09 22:34                 ` Eric Dumazet
2009-12-09 20:52           ` Jarek Poplawski
2009-12-03 23:25   ` [PATCH 3/4] llc: use a device based hash table to speed up multicast delivery Stephen Hemminger
2009-12-03 23:53     ` Octavian Purdila
2009-12-04  0:37       ` Stephen Hemminger
2009-12-03 22:31 ` [PATCH 4/4] llc: replace the socket list with a local address based hash Octavian Purdila
2009-12-03 23:55 ` [PATCH 0/4] llc enhancements David Miller
2009-12-04  0:20   ` Octavian Purdila

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4B20064C.7070301@gmail.com \
    --to=eric.dumazet@gmail.com \
    --cc=acme@ghostprotocols.net \
    --cc=netdev@vger.kernel.org \
    --cc=opurdila@ixiacom.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).