* [PATCH 1/1][PKT_CLS] Avoid multiple tree locks @ 2007-03-21 9:58 jamal 2007-03-21 10:10 ` Patrick McHardy 0 siblings, 1 reply; 11+ messages in thread From: jamal @ 2007-03-21 9:58 UTC (permalink / raw) To: David S. Miller; +Cc: netdev, Patrick McHardy [-- Attachment #1: Type: text/plain, Size: 198 bytes --] Seems to have been around a while. IMO, mterial for 2.6.21 but not stable. I have only compile-tested but it looks right(tm). I could have moved the lock down, but this looked safer. cheers, jamal [-- Attachment #2: qdtl_p --] [-- Type: text/plain, Size: 2163 bytes --] [PKT_CLS] Avoid multiple tree locks This fixes: When dumping filters the tree is locked first in the main dump function then when looking qdisc Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca> --- commit 4a52cdd599f259b05320219d7aba1bac58fdf6d0 tree e9e4b83f7a2925b4408e4f18211365c3f9bff3fa parent 0a14fe6e5efd0af0f9c6c01e0433445d615d0110 author Jamal Hadi Salim <hadi@cyberus.ca> Wed, 21 Mar 2007 05:27:55 -0400 committer Jamal Hadi Salim <hadi@cyberus.ca> Wed, 21 Mar 2007 05:27:55 -0400 include/net/pkt_sched.h | 1 + net/sched/cls_api.c | 2 +- net/sched/sch_api.c | 2 +- 3 files changed, 3 insertions(+), 2 deletions(-) diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h index f6afee7..dd930bd 100644 --- a/include/net/pkt_sched.h +++ b/include/net/pkt_sched.h @@ -212,6 +212,7 @@ extern struct Qdisc_ops bfifo_qdisc_ops; extern int register_qdisc(struct Qdisc_ops *qops); extern int unregister_qdisc(struct Qdisc_ops *qops); +extern struct Qdisc *__qdisc_lookup(struct net_device *dev, u32 handle); extern struct Qdisc *qdisc_lookup(struct net_device *dev, u32 handle); extern struct Qdisc *qdisc_lookup_class(struct net_device *dev, u32 handle); extern struct qdisc_rate_table *qdisc_get_rtab(struct tc_ratespec *r, diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c index 5c6ffdb..17d4d37 100644 --- a/net/sched/cls_api.c +++ b/net/sched/cls_api.c @@ -403,7 +403,7 @@ static int tc_dump_tfilter(struct sk_buff *skb, struct netlink_callback *cb) if (!tcm->tcm_parent) q = dev->qdisc_sleeping; else - q = qdisc_lookup(dev, TC_H_MAJ(tcm->tcm_parent)); + q = __qdisc_lookup(dev, TC_H_MAJ(tcm->tcm_parent)); if (!q) goto out; if ((cops = q->ops->cl_ops) == NULL) diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c index ecc988a..1a3b65e 100644 --- a/net/sched/sch_api.c +++ b/net/sched/sch_api.c @@ -190,7 +190,7 @@ int unregister_qdisc(struct Qdisc_ops *qops) (root qdisc, all its children, children of children etc.) */ -static struct Qdisc *__qdisc_lookup(struct net_device *dev, u32 handle) +struct Qdisc *__qdisc_lookup(struct net_device *dev, u32 handle) { struct Qdisc *q; ^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH 1/1][PKT_CLS] Avoid multiple tree locks 2007-03-21 9:58 [PATCH 1/1][PKT_CLS] Avoid multiple tree locks jamal @ 2007-03-21 10:10 ` Patrick McHardy 2007-03-21 10:17 ` Patrick McHardy 2007-03-21 10:38 ` jamal 0 siblings, 2 replies; 11+ messages in thread From: Patrick McHardy @ 2007-03-21 10:10 UTC (permalink / raw) To: hadi; +Cc: David S. Miller, netdev jamal wrote: > Seems to have been around a while. IMO, mterial for 2.6.21 but not > stable. I have only compile-tested but it looks right(tm). Its harmless since its a read lock, which can be nested. I actually don't see any need for qdisc_tree_lock at all, all changes and all walking is done under the RTNL, which is why I've removed it in my (upcoming) patches. I suggest to leave it as is for now so I don't need to change the __qdisc_lookup back to qdisc_lookup in 2.6.22. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 1/1][PKT_CLS] Avoid multiple tree locks 2007-03-21 10:10 ` Patrick McHardy @ 2007-03-21 10:17 ` Patrick McHardy 2007-03-21 12:35 ` Patrick McHardy 2007-03-21 10:38 ` jamal 1 sibling, 1 reply; 11+ messages in thread From: Patrick McHardy @ 2007-03-21 10:17 UTC (permalink / raw) To: hadi; +Cc: David S. Miller, netdev Patrick McHardy wrote: > jamal wrote: > >>Seems to have been around a while. IMO, mterial for 2.6.21 but not >>stable. I have only compile-tested but it looks right(tm). > > > > Its harmless since its a read lock, which can be nested. I actually > don't see any need for qdisc_tree_lock at all, all changes and all > walking is done under the RTNL, which is why I've removed it in > my (upcoming) patches. I suggest to leave it as is for now so I > don't need to change the __qdisc_lookup back to qdisc_lookup in > 2.6.22. Alexey just explained to me why we do need qdisc_tree_lock in private mail. While dumping only the first skb is filled under the RTNL, while filling further skbs we don't hold the RTNL anymore. So I will probably have to drop that patch. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 1/1][PKT_CLS] Avoid multiple tree locks 2007-03-21 10:17 ` Patrick McHardy @ 2007-03-21 12:35 ` Patrick McHardy 2007-03-21 14:04 ` Patrick McHardy 0 siblings, 1 reply; 11+ messages in thread From: Patrick McHardy @ 2007-03-21 12:35 UTC (permalink / raw) To: hadi; +Cc: David S. Miller, netdev, Thomas Graf Patrick McHardy wrote: >>Its harmless since its a read lock, which can be nested. I actually >>don't see any need for qdisc_tree_lock at all, all changes and all >>walking is done under the RTNL, which is why I've removed it in >>my (upcoming) patches. I suggest to leave it as is for now so I >>don't need to change the __qdisc_lookup back to qdisc_lookup in >>2.6.22. > > > > Alexey just explained to me why we do need qdisc_tree_lock in private > mail. While dumping only the first skb is filled under the RTNL, > while filling further skbs we don't hold the RTNL anymore. So I will > probably have to drop that patch. What we could do is replace the netlink cb_lock spinlock by a user-supplied mutex (supplied to netlink_kernel_create, rtnl_mutex in this case). That would put the entire dump under the rtnl and allow us to get rid of qdisc_tree_lock and avoid the need to take dev_base_lock during qdisc dumping. Same in other spots like rtnl_dump_ifinfo, inet_dump_ifaddr, ... What do you think? ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 1/1][PKT_CLS] Avoid multiple tree locks 2007-03-21 12:35 ` Patrick McHardy @ 2007-03-21 14:04 ` Patrick McHardy 2007-03-21 14:06 ` Patrick McHardy 2007-03-22 6:07 ` jamal 0 siblings, 2 replies; 11+ messages in thread From: Patrick McHardy @ 2007-03-21 14:04 UTC (permalink / raw) To: hadi; +Cc: David S. Miller, netdev, Thomas Graf [-- Attachment #1: Type: text/plain, Size: 1246 bytes --] Patrick McHardy wrote: >>Alexey just explained to me why we do need qdisc_tree_lock in private >>mail. While dumping only the first skb is filled under the RTNL, >>while filling further skbs we don't hold the RTNL anymore. So I will >>probably have to drop that patch. > > > > What we could do is replace the netlink cb_lock spinlock by a > user-supplied mutex (supplied to netlink_kernel_create, rtnl_mutex > in this case). That would put the entire dump under the rtnl and > allow us to get rid of qdisc_tree_lock and avoid the need to take > dev_base_lock during qdisc dumping. Same in other spots like > rtnl_dump_ifinfo, inet_dump_ifaddr, ... These (compile tested) patches demonstrate the idea. The first one lets netlink_kernel_create users specify a mutex that should be held during dump callbacks, the second one uses this for rtnetlink and changes inet_dump_ifaddr for demonstration. A complete patch would allow us to simplify locking in lots of spots, all rtnetlink users currently need to implement extra locking just for the dump functions, and a number of them already get it wrong and seem to rely on the rtnl. If there are no objections to this change I'm going to update the second patch to include all rtnetlink users. [-- Attachment #2: 01.diff --] [-- Type: text/x-diff, Size: 1515 bytes --] [NET_SCHED]: cls_basic: fix NULL pointer dereference cls_basic doesn't allocate tp->root before it is linked into the active classifier list, resulting in a NULL pointer dereference when packets hit the classifier before its ->change function is called. Reported by Chris Madden <chris@reflexsecurity.com> Signed-off-by: Patrick McHardy <kaber@trash.net> --- commit f1b9a0694552e18e7a43c292d21abe3b51dfcae2 tree f5ae39c1746fdc1ffbee6c1d90d035ee48ca4904 parent 0a14fe6e5efd0af0f9c6c01e0433445d615d0110 author Patrick McHardy <kaber@trash.net> Tue, 20 Mar 2007 16:08:54 +0100 committer Patrick McHardy <kaber@trash.net> Tue, 20 Mar 2007 16:08:54 +0100 net/sched/cls_basic.c | 16 +++++++--------- 1 files changed, 7 insertions(+), 9 deletions(-) diff --git a/net/sched/cls_basic.c b/net/sched/cls_basic.c index fad08e5..70fe36e 100644 --- a/net/sched/cls_basic.c +++ b/net/sched/cls_basic.c @@ -81,6 +81,13 @@ static void basic_put(struct tcf_proto * static int basic_init(struct tcf_proto *tp) { + struct basic_head *head; + + head = kzalloc(sizeof(*head), GFP_KERNEL); + if (head == NULL) + return -ENOBUFS; + INIT_LIST_HEAD(&head->flist); + tp->root = head; return 0; } @@ -176,15 +183,6 @@ static int basic_change(struct tcf_proto } err = -ENOBUFS; - if (head == NULL) { - head = kzalloc(sizeof(*head), GFP_KERNEL); - if (head == NULL) - goto errout; - - INIT_LIST_HEAD(&head->flist); - tp->root = head; - } - f = kzalloc(sizeof(*f), GFP_KERNEL); if (f == NULL) goto errout; [-- Attachment #3: 02.diff --] [-- Type: text/x-diff, Size: 1137 bytes --] [NET_SCHED]: Fix ingress locking Ingress queueing uses a seperate lock for serializing enqueue operations, but fails to properly protect itself against concurrent changes to the qdisc tree. Use queue_lock for now since the real fix it quite intrusive. Signed-off-by: Patrick McHardy <kaber@trash.net> --- commit 11985909b582dc688b5a7c0f73f16244224116f4 tree 0ee26bec34053f6c9b5f905ffbc1437881428eeb parent f1b9a0694552e18e7a43c292d21abe3b51dfcae2 author Patrick McHardy <kaber@trash.net> Tue, 20 Mar 2007 16:11:56 +0100 committer Patrick McHardy <kaber@trash.net> Tue, 20 Mar 2007 16:11:56 +0100 net/core/dev.c | 4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) diff --git a/net/core/dev.c b/net/core/dev.c index cf71614..5984b55 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -1750,10 +1750,10 @@ static int ing_filter(struct sk_buff *sk skb->tc_verd = SET_TC_AT(skb->tc_verd,AT_INGRESS); - spin_lock(&dev->ingress_lock); + spin_lock(&dev->queue_lock); if ((q = dev->qdisc_ingress) != NULL) result = q->enqueue(skb, q); - spin_unlock(&dev->ingress_lock); + spin_unlock(&dev->queue_lock); } ^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH 1/1][PKT_CLS] Avoid multiple tree locks 2007-03-21 14:04 ` Patrick McHardy @ 2007-03-21 14:06 ` Patrick McHardy 2007-03-22 6:07 ` jamal 1 sibling, 0 replies; 11+ messages in thread From: Patrick McHardy @ 2007-03-21 14:06 UTC (permalink / raw) To: hadi; +Cc: David S. Miller, netdev, Thomas Graf [-- Attachment #1: Type: text/plain, Size: 1335 bytes --] Patrick McHardy wrote: > Patrick McHardy wrote: > >>>Alexey just explained to me why we do need qdisc_tree_lock in private >>>mail. While dumping only the first skb is filled under the RTNL, >>>while filling further skbs we don't hold the RTNL anymore. So I will >>>probably have to drop that patch. >> >> >> >>What we could do is replace the netlink cb_lock spinlock by a >>user-supplied mutex (supplied to netlink_kernel_create, rtnl_mutex >>in this case). That would put the entire dump under the rtnl and >>allow us to get rid of qdisc_tree_lock and avoid the need to take >>dev_base_lock during qdisc dumping. Same in other spots like >>rtnl_dump_ifinfo, inet_dump_ifaddr, ... > > > > These (compile tested) patches demonstrate the idea. The first one > lets netlink_kernel_create users specify a mutex that should be > held during dump callbacks, the second one uses this for rtnetlink > and changes inet_dump_ifaddr for demonstration. > > A complete patch would allow us to simplify locking in lots of > spots, all rtnetlink users currently need to implement extra > locking just for the dump functions, and a number of them > already get it wrong and seem to rely on the rtnl. > > If there are no objections to this change I'm going to update > the second patch to include all rtnetlink users D'oh .. wrong patches. [-- Attachment #2: 01.diff --] [-- Type: text/x-diff, Size: 14747 bytes --] [NETLINK]: Put dump callback under mutex, optionally user supplied Replace the callback spinlock by a mutex and allow users to supply their own mutex to allow getting rid of seperate locking in dump callbacks. For users that don't supply their own mutex nothing changes. Signed-off-by: Patrick McHardy <kaber@trash.net> --- commit c3400c45267a1fd291da75b0fe4b7970c846ff50 tree 96a4dc6050d74e72b4fffe9c047a0e695085e6db parent 2c31e4429748f2629c59379b1113931a13a0cca9 author Patrick McHardy <kaber@trash.net> Wed, 21 Mar 2007 14:43:02 +0100 committer Patrick McHardy <kaber@trash.net> Wed, 21 Mar 2007 14:43:02 +0100 drivers/connector/connector.c | 2 +- drivers/scsi/scsi_netlink.c | 3 ++- drivers/scsi/scsi_transport_iscsi.c | 2 +- fs/ecryptfs/netlink.c | 2 +- include/linux/netlink.h | 5 ++++- lib/kobject_uevent.c | 2 +- net/bridge/netfilter/ebt_ulog.c | 2 +- net/core/rtnetlink.c | 2 +- net/decnet/netfilter/dn_rtmsg.c | 2 +- net/ipv4/fib_frontend.c | 2 +- net/ipv4/inet_diag.c | 2 +- net/ipv4/netfilter/ip_queue.c | 2 +- net/ipv4/netfilter/ipt_ULOG.c | 2 +- net/ipv6/netfilter/ip6_queue.c | 2 +- net/netfilter/nfnetlink.c | 2 +- net/netlink/af_netlink.c | 30 +++++++++++++++++++----------- net/netlink/genetlink.c | 2 +- net/xfrm/xfrm_user.c | 2 +- security/selinux/netlink.c | 2 +- 19 files changed, 41 insertions(+), 29 deletions(-) diff --git a/drivers/connector/connector.c b/drivers/connector/connector.c index 7f9c4fb..a7b9e9b 100644 --- a/drivers/connector/connector.c +++ b/drivers/connector/connector.c @@ -448,7 +448,7 @@ static int __devinit cn_init(void) dev->nls = netlink_kernel_create(NETLINK_CONNECTOR, CN_NETLINK_USERS + 0xf, - dev->input, THIS_MODULE); + dev->input, NULL, THIS_MODULE); if (!dev->nls) return -EIO; diff --git a/drivers/scsi/scsi_netlink.c b/drivers/scsi/scsi_netlink.c index 45646a2..4bf9aa5 100644 --- a/drivers/scsi/scsi_netlink.c +++ b/drivers/scsi/scsi_netlink.c @@ -168,7 +168,8 @@ scsi_netlink_init(void) } scsi_nl_sock = netlink_kernel_create(NETLINK_SCSITRANSPORT, - SCSI_NL_GRP_CNT, scsi_nl_rcv, THIS_MODULE); + SCSI_NL_GRP_CNT, scsi_nl_rcv, NULL, + THIS_MODULE); if (!scsi_nl_sock) { printk(KERN_ERR "%s: register of recieve handler failed\n", __FUNCTION__); diff --git a/drivers/scsi/scsi_transport_iscsi.c b/drivers/scsi/scsi_transport_iscsi.c index 10590cd..aabaa05 100644 --- a/drivers/scsi/scsi_transport_iscsi.c +++ b/drivers/scsi/scsi_transport_iscsi.c @@ -1435,7 +1435,7 @@ static __init int iscsi_transport_init(void) if (err) goto unregister_conn_class; - nls = netlink_kernel_create(NETLINK_ISCSI, 1, iscsi_if_rx, + nls = netlink_kernel_create(NETLINK_ISCSI, 1, iscsi_if_rx, NULL, THIS_MODULE); if (!nls) { err = -ENOBUFS; diff --git a/fs/ecryptfs/netlink.c b/fs/ecryptfs/netlink.c index 8405d21..fe91863 100644 --- a/fs/ecryptfs/netlink.c +++ b/fs/ecryptfs/netlink.c @@ -229,7 +229,7 @@ int ecryptfs_init_netlink(void) ecryptfs_nl_sock = netlink_kernel_create(NETLINK_ECRYPTFS, 0, ecryptfs_receive_nl_message, - THIS_MODULE); + NULL, THIS_MODULE); if (!ecryptfs_nl_sock) { rc = -EIO; ecryptfs_printk(KERN_ERR, "Failed to create netlink socket\n"); diff --git a/include/linux/netlink.h b/include/linux/netlink.h index 0d11f6a..f41688f 100644 --- a/include/linux/netlink.h +++ b/include/linux/netlink.h @@ -157,7 +157,10 @@ struct netlink_skb_parms #define NETLINK_CREDS(skb) (&NETLINK_CB((skb)).creds) -extern struct sock *netlink_kernel_create(int unit, unsigned int groups, void (*input)(struct sock *sk, int len), struct module *module); +extern struct sock *netlink_kernel_create(int unit, unsigned int groups, + void (*input)(struct sock *sk, int len), + struct mutex *cb_mutex, + struct module *module); extern void netlink_ack(struct sk_buff *in_skb, struct nlmsghdr *nlh, int err); extern int netlink_has_listeners(struct sock *sk, unsigned int group); extern int netlink_unicast(struct sock *ssk, struct sk_buff *skb, __u32 pid, int nonblock); diff --git a/lib/kobject_uevent.c b/lib/kobject_uevent.c index 84272ed..82fc179 100644 --- a/lib/kobject_uevent.c +++ b/lib/kobject_uevent.c @@ -293,7 +293,7 @@ EXPORT_SYMBOL_GPL(add_uevent_var); static int __init kobject_uevent_init(void) { uevent_sock = netlink_kernel_create(NETLINK_KOBJECT_UEVENT, 1, NULL, - THIS_MODULE); + NULL, THIS_MODULE); if (!uevent_sock) { printk(KERN_ERR diff --git a/net/bridge/netfilter/ebt_ulog.c b/net/bridge/netfilter/ebt_ulog.c index 259f5c3..445d2fd 100644 --- a/net/bridge/netfilter/ebt_ulog.c +++ b/net/bridge/netfilter/ebt_ulog.c @@ -304,7 +304,7 @@ static int __init ebt_ulog_init(void) } ebtulognl = netlink_kernel_create(NETLINK_NFLOG, EBT_ULOG_MAXNLGROUPS, - NULL, THIS_MODULE); + NULL, NULL, THIS_MODULE); if (!ebtulognl) ret = -ENOMEM; else if ((ret = ebt_register_watcher(&ulog))) diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c index 6055074..777c4c0 100644 --- a/net/core/rtnetlink.c +++ b/net/core/rtnetlink.c @@ -873,7 +873,7 @@ void __init rtnetlink_init(void) panic("rtnetlink_init: cannot allocate rta_buf\n"); rtnl = netlink_kernel_create(NETLINK_ROUTE, RTNLGRP_MAX, rtnetlink_rcv, - THIS_MODULE); + NULL, THIS_MODULE); if (rtnl == NULL) panic("rtnetlink_init: cannot initialize rtnetlink\n"); netlink_set_nonroot(NETLINK_ROUTE, NL_NONROOT_RECV); diff --git a/net/decnet/netfilter/dn_rtmsg.c b/net/decnet/netfilter/dn_rtmsg.c index 9e8256a..7799f36 100644 --- a/net/decnet/netfilter/dn_rtmsg.c +++ b/net/decnet/netfilter/dn_rtmsg.c @@ -138,7 +138,7 @@ static int __init dn_rtmsg_init(void) int rv = 0; dnrmg = netlink_kernel_create(NETLINK_DNRTMSG, DNRNG_NLGRP_MAX, - dnrmg_receive_user_sk, THIS_MODULE); + dnrmg_receive_user_sk, NULL, THIS_MODULE); if (dnrmg == NULL) { printk(KERN_ERR "dn_rtmsg: Cannot create netlink socket"); return -ENOMEM; diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c index 35515fb..2d79413 100644 --- a/net/ipv4/fib_frontend.c +++ b/net/ipv4/fib_frontend.c @@ -816,7 +816,7 @@ static void nl_fib_input(struct sock *sk, int len) static void nl_fib_lookup_init(void) { - netlink_kernel_create(NETLINK_FIB_LOOKUP, 0, nl_fib_input, THIS_MODULE); + netlink_kernel_create(NETLINK_FIB_LOOKUP, 0, nl_fib_input, NULL, THIS_MODULE); } static void fib_disable_ip(struct net_device *dev, int force) diff --git a/net/ipv4/inet_diag.c b/net/ipv4/inet_diag.c index 62c2e9f..abb012a 100644 --- a/net/ipv4/inet_diag.c +++ b/net/ipv4/inet_diag.c @@ -918,7 +918,7 @@ static int __init inet_diag_init(void) goto out; idiagnl = netlink_kernel_create(NETLINK_INET_DIAG, 0, inet_diag_rcv, - THIS_MODULE); + NULL, THIS_MODULE); if (idiagnl == NULL) goto out_free_table; err = 0; diff --git a/net/ipv4/netfilter/ip_queue.c b/net/ipv4/netfilter/ip_queue.c index 17f7c98..c81fd41 100644 --- a/net/ipv4/netfilter/ip_queue.c +++ b/net/ipv4/netfilter/ip_queue.c @@ -680,7 +680,7 @@ static int __init ip_queue_init(void) netlink_register_notifier(&ipq_nl_notifier); ipqnl = netlink_kernel_create(NETLINK_FIREWALL, 0, ipq_rcv_sk, - THIS_MODULE); + NULL, THIS_MODULE); if (ipqnl == NULL) { printk(KERN_ERR "ip_queue: failed to create netlink socket\n"); goto cleanup_netlink_notifier; diff --git a/net/ipv4/netfilter/ipt_ULOG.c b/net/ipv4/netfilter/ipt_ULOG.c index 3e5566b..17094c1 100644 --- a/net/ipv4/netfilter/ipt_ULOG.c +++ b/net/ipv4/netfilter/ipt_ULOG.c @@ -398,7 +398,7 @@ static int __init ipt_ulog_init(void) } nflognl = netlink_kernel_create(NETLINK_NFLOG, ULOG_MAXNLGROUPS, NULL, - THIS_MODULE); + NULL, THIS_MODULE); if (!nflognl) return -ENOMEM; diff --git a/net/ipv6/netfilter/ip6_queue.c b/net/ipv6/netfilter/ip6_queue.c index 275e625..9caa7f7 100644 --- a/net/ipv6/netfilter/ip6_queue.c +++ b/net/ipv6/netfilter/ip6_queue.c @@ -669,7 +669,7 @@ static int __init ip6_queue_init(void) struct proc_dir_entry *proc; netlink_register_notifier(&ipq_nl_notifier); - ipqnl = netlink_kernel_create(NETLINK_IP6_FW, 0, ipq_rcv_sk, + ipqnl = netlink_kernel_create(NETLINK_IP6_FW, 0, ipq_rcv_sk, NULL, THIS_MODULE); if (ipqnl == NULL) { printk(KERN_ERR "ip6_queue: failed to create netlink socket\n"); diff --git a/net/netfilter/nfnetlink.c b/net/netfilter/nfnetlink.c index dec36ab..51acce6 100644 --- a/net/netfilter/nfnetlink.c +++ b/net/netfilter/nfnetlink.c @@ -279,7 +279,7 @@ static int __init nfnetlink_init(void) printk("Netfilter messages via NETLINK v%s.\n", nfversion); nfnl = netlink_kernel_create(NETLINK_NETFILTER, NFNLGRP_MAX, - nfnetlink_rcv, THIS_MODULE); + nfnetlink_rcv, NULL, THIS_MODULE); if (!nfnl) { printk(KERN_ERR "cannot initialize nfnetlink!\n"); return -1; diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c index 5a7e6c8..7f6c666 100644 --- a/net/netlink/af_netlink.c +++ b/net/netlink/af_netlink.c @@ -76,7 +76,8 @@ struct netlink_sock { unsigned long state; wait_queue_head_t wait; struct netlink_callback *cb; - spinlock_t cb_lock; + struct mutex *cb_mutex; + struct mutex cb_def_mutex; void (*data_ready)(struct sock *sk, int bytes); struct module *module; }; @@ -108,6 +109,7 @@ struct netlink_table { unsigned long *listeners; unsigned int nl_nonroot; unsigned int groups; + struct mutex *cb_mutex; struct module *module; int registered; }; @@ -384,7 +386,7 @@ static int __netlink_create(struct socket *sock, int protocol) sock_init_data(sock, sk); nlk = nlk_sk(sk); - spin_lock_init(&nlk->cb_lock); + mutex_init(nlk->cb_mutex); init_waitqueue_head(&nlk->wait); sk->sk_destruct = netlink_sock_destruct; @@ -395,6 +397,7 @@ static int __netlink_create(struct socket *sock, int protocol) static int netlink_create(struct socket *sock, int protocol) { struct module *module = NULL; + struct mutex *cb_mutex; struct netlink_sock *nlk; unsigned int groups; int err = 0; @@ -419,6 +422,7 @@ static int netlink_create(struct socket *sock, int protocol) try_module_get(nl_table[protocol].module)) module = nl_table[protocol].module; groups = nl_table[protocol].groups; + cb_mutex = nl_table[protocol].cb_mutex; netlink_unlock_table(); if ((err = __netlink_create(sock, protocol)) < 0) @@ -426,6 +430,7 @@ static int netlink_create(struct socket *sock, int protocol) nlk = nlk_sk(sock->sk); nlk->module = module; + nlk->cb_mutex = cb_mutex ? cb_mutex : &nlk->cb_def_mutex; out: return err; @@ -445,14 +450,14 @@ static int netlink_release(struct socket *sock) netlink_remove(sk); nlk = nlk_sk(sk); - spin_lock(&nlk->cb_lock); + mutex_lock(nlk->cb_mutex); if (nlk->cb) { if (nlk->cb->done) nlk->cb->done(nlk->cb); netlink_destroy_callback(nlk->cb); nlk->cb = NULL; } - spin_unlock(&nlk->cb_lock); + mutex_unlock(nlk->cb_mutex); /* OK. Socket is unlinked, and, therefore, no new packets will arrive */ @@ -1268,6 +1273,7 @@ static void netlink_data_ready(struct sock *sk, int len) struct sock * netlink_kernel_create(int unit, unsigned int groups, void (*input)(struct sock *sk, int len), + struct mutex *cb_mutex, struct module *module) { struct socket *sock; @@ -1303,11 +1309,13 @@ netlink_kernel_create(int unit, unsigned int groups, nlk = nlk_sk(sk); nlk->flags |= NETLINK_KERNEL_SOCKET; + nlk->cb_mutex = &nlk->cb_def_mutex; netlink_table_grab(); nl_table[unit].groups = groups; nl_table[unit].listeners = listeners; nl_table[unit].module = module; + nl_table[unit].cb_mutex = cb_mutex; nl_table[unit].registered = 1; netlink_table_ungrab(); @@ -1349,7 +1357,7 @@ static int netlink_dump(struct sock *sk) if (!skb) goto errout; - spin_lock(&nlk->cb_lock); + mutex_lock(nlk->cb_mutex); cb = nlk->cb; if (cb == NULL) { @@ -1360,7 +1368,7 @@ static int netlink_dump(struct sock *sk) len = cb->dump(skb, cb); if (len > 0) { - spin_unlock(&nlk->cb_lock); + mutex_unlock(nlk->cb_mutex); skb_queue_tail(&sk->sk_receive_queue, skb); sk->sk_data_ready(sk, len); return 0; @@ -1378,13 +1386,13 @@ static int netlink_dump(struct sock *sk) if (cb->done) cb->done(cb); nlk->cb = NULL; - spin_unlock(&nlk->cb_lock); + mutex_unlock(nlk->cb_mutex); netlink_destroy_callback(cb); return 0; errout_skb: - spin_unlock(&nlk->cb_lock); + mutex_unlock(nlk->cb_mutex); kfree_skb(skb); errout: return err; @@ -1416,15 +1424,15 @@ int netlink_dump_start(struct sock *ssk, struct sk_buff *skb, } nlk = nlk_sk(sk); /* A dump is in progress... */ - spin_lock(&nlk->cb_lock); + mutex_lock(nlk->cb_mutex); if (nlk->cb) { - spin_unlock(&nlk->cb_lock); + mutex_unlock(nlk->cb_mutex); netlink_destroy_callback(cb); sock_put(sk); return -EBUSY; } nlk->cb = cb; - spin_unlock(&nlk->cb_lock); + mutex_unlock(nlk->cb_mutex); netlink_dump(sk); sock_put(sk); diff --git a/net/netlink/genetlink.c b/net/netlink/genetlink.c index c299679..dd45b7b 100644 --- a/net/netlink/genetlink.c +++ b/net/netlink/genetlink.c @@ -586,7 +586,7 @@ static int __init genl_init(void) netlink_set_nonroot(NETLINK_GENERIC, NL_NONROOT_RECV); genl_sock = netlink_kernel_create(NETLINK_GENERIC, GENL_MAX_ID, - genl_rcv, THIS_MODULE); + genl_rcv, NULL, THIS_MODULE); if (genl_sock == NULL) panic("GENL: Cannot initialize generic netlink\n"); diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c index 699a7e0..8d0654e 100644 --- a/net/xfrm/xfrm_user.c +++ b/net/xfrm/xfrm_user.c @@ -2467,7 +2467,7 @@ static int __init xfrm_user_init(void) printk(KERN_INFO "Initializing XFRM netlink socket\n"); nlsk = netlink_kernel_create(NETLINK_XFRM, XFRMNLGRP_MAX, - xfrm_netlink_rcv, THIS_MODULE); + xfrm_netlink_rcv, NULL, THIS_MODULE); if (nlsk == NULL) return -ENOMEM; rcu_assign_pointer(xfrm_nl, nlsk); diff --git a/security/selinux/netlink.c b/security/selinux/netlink.c index 33f2e06..f49046d 100644 --- a/security/selinux/netlink.c +++ b/security/selinux/netlink.c @@ -104,7 +104,7 @@ void selnl_notify_policyload(u32 seqno) static int __init selnl_init(void) { - selnl = netlink_kernel_create(NETLINK_SELINUX, SELNLGRP_MAX, NULL, + selnl = netlink_kernel_create(NETLINK_SELINUX, SELNLGRP_MAX, NULL, NULL, THIS_MODULE); if (selnl == NULL) panic("SELinux: Cannot create netlink socket."); [-- Attachment #3: 02.diff --] [-- Type: text/x-diff, Size: 1714 bytes --] diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c index 777c4c0..3f5d198 100644 --- a/net/core/rtnetlink.c +++ b/net/core/rtnetlink.c @@ -873,7 +873,7 @@ void __init rtnetlink_init(void) panic("rtnetlink_init: cannot allocate rta_buf\n"); rtnl = netlink_kernel_create(NETLINK_ROUTE, RTNLGRP_MAX, rtnetlink_rcv, - NULL, THIS_MODULE); + &rtnl_mutex, THIS_MODULE); if (rtnl == NULL) panic("rtnetlink_init: cannot initialize rtnetlink\n"); netlink_set_nonroot(NETLINK_ROUTE, NL_NONROOT_RECV); diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c index 043857b..c4b82e3 100644 --- a/net/ipv4/devinet.c +++ b/net/ipv4/devinet.c @@ -1183,17 +1183,13 @@ static int inet_dump_ifaddr(struct sk_buff *skb, struct netlink_callback *cb) int s_ip_idx, s_idx = cb->args[0]; s_ip_idx = ip_idx = cb->args[1]; - read_lock(&dev_base_lock); for (dev = dev_base, idx = 0; dev; dev = dev->next, idx++) { if (idx < s_idx) continue; if (idx > s_idx) s_ip_idx = 0; - rcu_read_lock(); - if ((in_dev = __in_dev_get_rcu(dev)) == NULL) { - rcu_read_unlock(); + if ((in_dev = __in_dev_get_rtnl(dev)) == NULL) continue; - } for (ifa = in_dev->ifa_list, ip_idx = 0; ifa; ifa = ifa->ifa_next, ip_idx++) { @@ -1201,16 +1197,12 @@ static int inet_dump_ifaddr(struct sk_buff *skb, struct netlink_callback *cb) continue; if (inet_fill_ifaddr(skb, ifa, NETLINK_CB(cb->skb).pid, cb->nlh->nlmsg_seq, - RTM_NEWADDR, NLM_F_MULTI) <= 0) { - rcu_read_unlock(); + RTM_NEWADDR, NLM_F_MULTI) <= 0) goto done; - } } - rcu_read_unlock(); } done: - read_unlock(&dev_base_lock); cb->args[0] = idx; cb->args[1] = ip_idx; ^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH 1/1][PKT_CLS] Avoid multiple tree locks 2007-03-21 14:04 ` Patrick McHardy 2007-03-21 14:06 ` Patrick McHardy @ 2007-03-22 6:07 ` jamal 2007-03-22 11:36 ` Patrick McHardy 1 sibling, 1 reply; 11+ messages in thread From: jamal @ 2007-03-22 6:07 UTC (permalink / raw) To: Patrick McHardy; +Cc: David S. Miller, netdev, Thomas Graf On Wed, 2007-21-03 at 15:04 +0100, Patrick McHardy wrote: > Patrick McHardy wrote: > > What we could do is replace the netlink cb_lock spinlock by a > > user-supplied mutex (supplied to netlink_kernel_create, rtnl_mutex > > in this case). That would put the entire dump under the rtnl and > > allow us to get rid of qdisc_tree_lock and avoid the need to take > > dev_base_lock during qdisc dumping. Same in other spots like > > rtnl_dump_ifinfo, inet_dump_ifaddr, ... > > > These (compile tested) patches demonstrate the idea. > > The first one > lets netlink_kernel_create users specify a mutex that should be > held during dump callbacks, the second one uses this for rtnetlink > and changes inet_dump_ifaddr for demonstration. > > A complete patch would allow us to simplify locking in lots of > spots, all rtnetlink users currently need to implement extra > locking just for the dump functions, and a number of them > already get it wrong and seem to rely on the rtnl. > The mutex is certainly a cleaner approach; and a lot of the RCU protection would go away. I like it. Knowing you i sense theres something clever in there that i am missing. I dont see how you could get rid of the tree locking since we need to protect against the data path still, no? Or are you looking at that as a separate effort? > If there are no objections to this change I'm going to update > the second patch to include all rtnetlink users. No objections here. cheers, jamal ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 1/1][PKT_CLS] Avoid multiple tree locks 2007-03-22 6:07 ` jamal @ 2007-03-22 11:36 ` Patrick McHardy 2007-03-23 13:12 ` jamal 2007-03-27 23:44 ` David Miller 0 siblings, 2 replies; 11+ messages in thread From: Patrick McHardy @ 2007-03-22 11:36 UTC (permalink / raw) To: hadi; +Cc: David S. Miller, netdev, Thomas Graf jamal wrote: > On Wed, 2007-21-03 at 15:04 +0100, Patrick McHardy wrote: > >>These (compile tested) patches demonstrate the idea. >> >>The first one >>lets netlink_kernel_create users specify a mutex that should be >>held during dump callbacks, the second one uses this for rtnetlink >>and changes inet_dump_ifaddr for demonstration. >> >>A complete patch would allow us to simplify locking in lots of >>spots, all rtnetlink users currently need to implement extra >>locking just for the dump functions, and a number of them >>already get it wrong and seem to rely on the rtnl. >> > > > The mutex is certainly a cleaner approach; > and a lot of the RCU protection would go away. I like it. Not as much as I initially thought, but at least we would have consistent locking for the dump callbacks. > Knowing you i sense theres something clever in there that i am > missing. I dont see how you could get rid of the tree locking > since we need to protect against the data path still, no? > Or are you looking at that as a separate effort? We can remove qdisc_tree_lock since with this patch all changes and all tree walking happen under the RTNL. We still need to keep dev->queue_lock for the data path. I'll update the patches to include all rtnetlink users and repost in a day or two. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 1/1][PKT_CLS] Avoid multiple tree locks 2007-03-22 11:36 ` Patrick McHardy @ 2007-03-23 13:12 ` jamal 2007-03-27 23:44 ` David Miller 1 sibling, 0 replies; 11+ messages in thread From: jamal @ 2007-03-23 13:12 UTC (permalink / raw) To: Patrick McHardy; +Cc: David S. Miller, netdev, Thomas Graf On Thu, 2007-22-03 at 12:36 +0100, Patrick McHardy wrote: > jamal wrote: > > On Wed, 2007-21-03 at 15:04 +0100, Patrick McHardy wrote: > We can remove qdisc_tree_lock since with this patch all changes > and all tree walking happen under the RTNL. > We still need to keep dev->queue_lock for the data path. > ok, that would work. Should have been obvious to me. cheers, jamal ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 1/1][PKT_CLS] Avoid multiple tree locks 2007-03-22 11:36 ` Patrick McHardy 2007-03-23 13:12 ` jamal @ 2007-03-27 23:44 ` David Miller 1 sibling, 0 replies; 11+ messages in thread From: David Miller @ 2007-03-27 23:44 UTC (permalink / raw) To: kaber; +Cc: hadi, netdev, tgraf From: Patrick McHardy <kaber@trash.net> Date: Thu, 22 Mar 2007 12:36:17 +0100 > jamal wrote: > > The mutex is certainly a cleaner approach; > > and a lot of the RCU protection would go away. I like it. > > Not as much as I initially thought, but at least we would have > consistent locking for the dump callbacks. > > > Knowing you i sense theres something clever in there that i am > > missing. I dont see how you could get rid of the tree locking > > since we need to protect against the data path still, no? > > Or are you looking at that as a separate effort? > > We can remove qdisc_tree_lock since with this patch all changes > and all tree walking happen under the RTNL. We still need to keep > dev->queue_lock for the data path. > > I'll update the patches to include all rtnetlink users and repost > in a day or two. The existing weird "first SKB only" locking is unintuitive to me as well, so I'm all for these mutex patches once you respin them FWIW. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 1/1][PKT_CLS] Avoid multiple tree locks 2007-03-21 10:10 ` Patrick McHardy 2007-03-21 10:17 ` Patrick McHardy @ 2007-03-21 10:38 ` jamal 1 sibling, 0 replies; 11+ messages in thread From: jamal @ 2007-03-21 10:38 UTC (permalink / raw) To: Patrick McHardy; +Cc: David S. Miller, netdev On Wed, 2007-21-03 at 11:10 +0100, Patrick McHardy wrote: > Its harmless since its a read lock, which can be nested. I actually > don't see any need for qdisc_tree_lock at all, all changes and all > walking is done under the RTNL, which is why I've removed it in > my (upcoming) patches. I suggest to leave it as is for now so I > don't need to change the __qdisc_lookup back to qdisc_lookup in > 2.6.22. Sounds good to me. cheers, jamal ^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2007-03-27 23:45 UTC | newest] Thread overview: 11+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2007-03-21 9:58 [PATCH 1/1][PKT_CLS] Avoid multiple tree locks jamal 2007-03-21 10:10 ` Patrick McHardy 2007-03-21 10:17 ` Patrick McHardy 2007-03-21 12:35 ` Patrick McHardy 2007-03-21 14:04 ` Patrick McHardy 2007-03-21 14:06 ` Patrick McHardy 2007-03-22 6:07 ` jamal 2007-03-22 11:36 ` Patrick McHardy 2007-03-23 13:12 ` jamal 2007-03-27 23:44 ` David Miller 2007-03-21 10:38 ` jamal
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).