From: Eric Dumazet <eric.dumazet@gmail.com>
To: Octavian Purdila <opurdila@ixiacom.com>
Cc: netdev@vger.kernel.org,
Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Subject: Re: [RFC PATCH] llc: convert the socket list to RCU locking (was Re: [PATCH 3/4] llc: use a device based hash table to speed up multicast delivery)
Date: Wed, 09 Dec 2009 21:19:24 +0100 [thread overview]
Message-ID: <4B20064C.7070301@gmail.com> (raw)
In-Reply-To: <200912082310.45846.opurdila@ixiacom.com>
Le 08/12/2009 22:10, Octavian Purdila a écrit :
> On Friday 04 December 2009 01:52:44 you wrote:
>
>>> Since at this point we are using UP ports contention is not really an
>>> issue for us. I've extrapolated this (lock per hash bucket) based on how
>>> locking is done in other places, like UDP.
>>
>> Yes but you know we want to remove those locks per UDP hash bucket, since
>> we dont really need them anymore. ;)
>>
>>
>> If you remember, we had in the past one rwlock for the whole UDP table.
>>
>> Then this was converted to one spinlock per hash slot (128 slots) + RCU
>> lookups for unicast RX
>>
>> Then we dynamically sized udp table at boot (up to 65536 slots)
>>
>> multicast optimization (holding lock for small duration + double hashing)
>>
>> bind optimization (thanks to double hashing)
>>
>> To be done :
>>
>> 1) multicast RX can be done without taking any lock, and RCU lookups
>> 2) zap all locks and use one lock, or a small array of hashed spinlocks
>>
>
> OK, here is my first try at llc RCU-fication.
>
> One doubt before pasting the code: In slab.h comment and in udp.c I see the lookup is restarted if an improper object is returned. Is that really required?
>
> Thanks!
> tavi
>
> [RFC PATCH] llc: convert the socket list to RCU locking
>
> For the reclamation phase we use the SLAB_DESTROY_BY_RCU mechanism,
> which require some extra checks in the lookup code:
>
> a) Since socket can be free while looking it up, or getting a
> reference to it, we need to check the socket reference counter to make
> sure the reference we got points to a live socket
>
> b) It can also happen that the reference we got during lookup will be
> freed and the memory reused by another socket. Thus, after getting the
> reference, we must check again that we got the right socket.
>
> Note that the /proc code was not yet converted to RCU, it still uses
> spinlocks for protection.
>
> Signed-off-by: Octavian Purdila <opurdila@ixiacom.com>
> ---
> include/net/llc.h | 7 ++--
> net/llc/af_llc.c | 1 +
> net/llc/llc_conn.c | 89 ++++++++++++++++++++++++++++++++++------------------
> net/llc/llc_core.c | 5 ++-
> net/llc/llc_proc.c | 20 ++++++------
> net/llc/llc_sap.c | 72 ++++++++++++++++++++++++++++-------------
> 6 files changed, 124 insertions(+), 70 deletions(-)
>
> diff --git a/include/net/llc.h b/include/net/llc.h
> index 7940da1..1559cf1 100644
> --- a/include/net/llc.h
> +++ b/include/net/llc.h
> @@ -16,6 +16,7 @@
> #include <linux/if_ether.h>
> #include <linux/list.h>
> #include <linux/spinlock.h>
> +#include <linux/rculist_nulls.h>
>
> #include <asm/atomic.h>
>
> @@ -53,10 +54,8 @@ struct llc_sap {
> struct net_device *orig_dev);
> struct llc_addr laddr;
> struct list_head node;
> - struct {
> - rwlock_t lock;
> - struct hlist_head list;
> - } sk_list;
> + spinlock_t sk_lock;
> + struct hlist_nulls_head sk_list;
> };
>
> #define LLC_DEST_INVALID 0 /* Invalid LLC PDU type */
> diff --git a/net/llc/af_llc.c b/net/llc/af_llc.c
> index ffc96d3..baefb4f 100644
> --- a/net/llc/af_llc.c
> +++ b/net/llc/af_llc.c
> @@ -140,6 +140,7 @@ static struct proto llc_proto = {
> .name = "LLC",
> .owner = THIS_MODULE,
> .obj_size = sizeof(struct llc_sock),
> + .slab_flags = SLAB_DESTROY_BY_RCU,
> };
>
> /**
> diff --git a/net/llc/llc_conn.c b/net/llc/llc_conn.c
> index c6bab39..8331fc8 100644
> --- a/net/llc/llc_conn.c
> +++ b/net/llc/llc_conn.c
> @@ -468,6 +468,19 @@ static int llc_exec_conn_trans_actions(struct sock *sk,
> return rc;
> }
>
> +static inline bool llc_estab_match(const struct llc_sap *sap,
> + const struct llc_addr *daddr,
> + const struct llc_addr *laddr,
> + const struct sock *sk)
> +{
> + struct llc_sock *llc = llc_sk(sk);
> +
> + return llc->laddr.lsap == laddr->lsap &&
> + llc->daddr.lsap == daddr->lsap &&
> + llc_mac_match(llc->laddr.mac, laddr->mac) &&
> + llc_mac_match(llc->daddr.mac, daddr->mac);
> +}
> +
> /**
> * __llc_lookup_established - Finds connection for the remote/local sap/mac
> * @sap: SAP
> @@ -484,23 +497,24 @@ static struct sock *__llc_lookup_established(struct llc_sap *sap,
> struct llc_addr *laddr)
> {
> struct sock *rc;
> - struct hlist_node *node;
> -
> - read_lock(&sap->sk_list.lock);
> - sk_for_each(rc, node, &sap->sk_list.list) {
> - struct llc_sock *llc = llc_sk(rc);
> -
> - if (llc->laddr.lsap == laddr->lsap &&
> - llc->daddr.lsap == daddr->lsap &&
> - llc_mac_match(llc->laddr.mac, laddr->mac) &&
> - llc_mac_match(llc->daddr.mac, daddr->mac)) {
> - sock_hold(rc);
> + struct hlist_nulls_node *node;
> +
> + rcu_read_lock();
> + sk_nulls_for_each_rcu(rc, node, &sap->sk_list) {
> + if (llc_estab_match(sap, daddr, laddr, rc)) {
> + /* Extra checks required by SLAB_DESTROY_BY_RCU */
> + if (unlikely(!atomic_inc_not_zero(&rc->sk_refcnt)))
> + continue;
Hmm, this wont work in fact, because if we have several llc_sap allocated on machine,
we have no guarantee a freed socket wont be reused and inserted in another llc_sap list.
So before calling llc_estab_match() (and/or llc_listener_match()) you should
check if 'rc' really belong to current sap.
If not, you must restart the lookup (you hit a socket that was moved to another sap)
> + if (unlikely(!llc_estab_match(sap, daddr, laddr, rc))) {
> + sock_put(rc);
> + continue;
> + }
> goto found;
> }
> }
> rc = NULL;
> found:
> - read_unlock(&sap->sk_list.lock);
> + rcu_read_unlock();
> return rc;
> }
>
next prev parent reply other threads:[~2009-12-09 20:19 UTC|newest]
Thread overview: 22+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-12-03 22:31 [PATCH 0/4] llc enhancements Octavian Purdila
2009-12-03 22:31 ` [PATCH 1/4] llc: use dev_hard_header Octavian Purdila
2009-12-03 22:31 ` [PATCH 2/4] llc: add support for LLC_OPT_PKTINFO Octavian Purdila
2009-12-03 22:31 ` [PATCH 3/4] llc: use a device based hash table to speed up multicast delivery Octavian Purdila
2009-12-03 22:59 ` Eric Dumazet
2009-12-03 23:30 ` Octavian Purdila
2009-12-03 23:52 ` Eric Dumazet
2009-12-04 0:15 ` Octavian Purdila
2009-12-04 0:28 ` Eric Dumazet
2009-12-08 21:10 ` [RFC PATCH] llc: convert the socket list to RCU locking (was Re: [PATCH 3/4] llc: use a device based hash table to speed up multicast delivery) Octavian Purdila
2009-12-08 21:26 ` Eric Dumazet
2009-12-09 20:19 ` Eric Dumazet [this message]
2009-12-09 20:36 ` Octavian Purdila
2009-12-09 21:49 ` Octavian Purdila
2009-12-09 22:34 ` Eric Dumazet
2009-12-09 20:52 ` Jarek Poplawski
2009-12-03 23:25 ` [PATCH 3/4] llc: use a device based hash table to speed up multicast delivery Stephen Hemminger
2009-12-03 23:53 ` Octavian Purdila
2009-12-04 0:37 ` Stephen Hemminger
2009-12-03 22:31 ` [PATCH 4/4] llc: replace the socket list with a local address based hash Octavian Purdila
2009-12-03 23:55 ` [PATCH 0/4] llc enhancements David Miller
2009-12-04 0:20 ` Octavian Purdila
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4B20064C.7070301@gmail.com \
--to=eric.dumazet@gmail.com \
--cc=acme@ghostprotocols.net \
--cc=netdev@vger.kernel.org \
--cc=opurdila@ixiacom.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.