* [PATCH Round 4 0/3][RFC] Network Event Notifier Mechanism
From: Steve Wise @ 2006-07-18 18:48 UTC
To: davem, rdreier; +Cc: openib-general, netdev
All,
I'm posting this one more time for a definitive decision on pulling in
this netevent notifier patch.
I've included in this patchset changes to the Infiniband Core to use
netevents instead of packet snooping to discover IPoIB ARP changes.
See patch 3/3 for the Infiniband changes.
Thanks,
Steve.
Round 4 Changes:
- changed drivers/infiniband/core/addr.c to use netevents for
discovering IPoIB ARP events.
Round 3 Changes:
- changed netlink msg for neighbour change to (RTM_NEIGHUPD)
- added netlink msg for PMTU change events (RTM_ROUTEUPD)
- added netlink messages for redirect (RTM_DELROUTE + RTM_NEWROUTE)
- tested neighbour change events via netlink for ipv4 and ipv6.
- tested redirect change events via netlink for ipv4.
Round 2 Changes:
- cleaned up event structures per review feedback.
- began integration with netlink (see neighbour changes in patch 2).
- added IPv6 support.
------
This patch implements a mechanism that allows interested clients to
register for notification of certain network events. The intended use
is to allow RDMA devices (linux/drivers/infiniband) to be notified of
neighbour updates, ICMP redirects, path MTU changes, and route changes.
These devices need update events because they typically cache this
information in hardware and must be notified when it changes. For
information on RDMA protocols, see:
http://www.ietf.org/html.charters/rddp-charter.html.
The key events of interest are:
- neighbour MAC address change
- routing redirect (the next hop neighbour changes for a dst_entry)
- path MTU change (the path MTU for a dst_entry changes)
- route adds/deletes
NOTE: These new netevents are also passed up to user space via netlink.
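
To illustrate why a device that caches neighbour and path state wants these
events, here is a minimal user-space C sketch. It is not kernel code and not
part of the patchset: the hw_entry table, the EV_* constants, the *_ev
structures, and handle_event() are all hypothetical stand-ins for a driver's
cached state and its notifier callback.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical event codes, loosely modelled on the events listed above. */
enum ev_type { EV_NEIGH_UPDATE = 1, EV_PMTU_UPDATE, EV_REDIRECT };

/* Hypothetical per-connection state that a device might cache in hardware. */
struct hw_entry {
	uint32_t dst;        /* destination address, host byte order */
	uint32_t next_hop;   /* next hop used to reach dst */
	uint8_t  mac[6];     /* cached link-layer address of next_hop */
	uint32_t pmtu;       /* cached path MTU towards dst */
};

/* Hypothetical event payloads (the real patch passes kernel structures). */
struct neigh_ev    { uint32_t ip;  uint8_t mac[6]; };
struct pmtu_ev     { uint32_t dst; uint32_t mtu; };
struct redirect_ev { uint32_t dst; uint32_t new_hop; };

/* Apply one event to every cached entry; returns how many entries changed. */
static int handle_event(struct hw_entry *tbl, int n, enum ev_type type,
			const void *arg)
{
	int changed = 0;

	assert(tbl != NULL && arg != NULL);
	for (int i = 0; i < n; i++) {
		struct hw_entry *e = &tbl[i];

		switch (type) {
		case EV_NEIGH_UPDATE: {
			const struct neigh_ev *ev = arg;
			/* Every entry routed via this neighbour needs the new MAC. */
			if (e->next_hop == ev->ip &&
			    memcmp(e->mac, ev->mac, 6) != 0) {
				memcpy(e->mac, ev->mac, 6);
				changed++;
			}
			break;
		}
		case EV_PMTU_UPDATE: {
			const struct pmtu_ev *ev = arg;
			if (e->dst == ev->dst && e->pmtu != ev->mtu) {
				e->pmtu = ev->mtu;
				changed++;
			}
			break;
		}
		case EV_REDIRECT: {
			const struct redirect_ev *ev = arg;
			/* The cached MAC is stale after this and must be re-resolved. */
			if (e->dst == ev->dst && e->next_hop != ev->new_hop) {
				e->next_hop = ev->new_hop;
				changed++;
			}
			break;
		}
		}
	}
	return changed;
}
```

Without such events the only alternatives are polling or snooping packets,
which is what the InfiniBand address code did before this series.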
We would like to get this or similar functionality included in 2.6.19
and request comments.
This patchset consists of 3 patches:
1) New files implementing the Network Event Notifier
2) Core network changes to generate network event notifications
3) Cleanup ib_addr modules to use the netevent patch
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
* [PATCH Round 4 1/3] Network Event Notifier Mechanism.
From: Steve Wise @ 2006-07-18 18:48 UTC
To: davem, rdreier; +Cc: openib-general, netdev

This patch uses notifier blocks to implement a network event
notifier mechanism.

Clients register their callback function by calling
register_netevent_notifier() like this:

static struct notifier_block nb = {
	.notifier_call = my_callback_func
};

...

register_netevent_notifier(&nb);
---

 include/net/netevent.h |   49 +++++++++++++++++++++++++++++++++++
 net/core/netevent.c    |   68 ++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 117 insertions(+), 0 deletions(-)

diff --git a/include/net/netevent.h b/include/net/netevent.h
new file mode 100644
index 0000000..22214c8
--- /dev/null
+++ b/include/net/netevent.h
@@ -0,0 +1,49 @@
+#ifndef _NET_EVENT_H
+#define _NET_EVENT_H
+
+/*
+ *	Generic netevent notifiers
+ *
+ *	Authors:
+ *      Tom Tucker              <tom@opengridcomputing.com>
+ *
+ * 	Changes:
+ */
+
+#ifdef __KERNEL__
+
+#include <net/dst.h>
+
+/*
+ * Generic route info structure.
+ *
+ * Family     Data ptr type
+ * --------------------------------
+ * AF_INET    - struct fib_info *
+ * AF_INET6   - struct rt6_info *
+ * AF_DECnet  - struct dn_route *
+ */
+struct netevent_route_info {
+	u16 family;
+	void *data;
+};
+
+struct netevent_redirect {
+	struct dst_entry *old;
+	struct dst_entry *new;
+};
+
+enum netevent_notif_type {
+	NETEVENT_NEIGH_UPDATE = 1, /* arg is struct neighbour ptr */
+	NETEVENT_ROUTE_ADD,	   /* arg is struct netevent_route_info ptr */
+	NETEVENT_ROUTE_DEL,	   /* arg is struct netevent_route_info ptr */
+	NETEVENT_PMTU_UPDATE,	   /* arg is struct dst_entry ptr */
+	NETEVENT_REDIRECT,	   /* arg is struct netevent_redirect ptr */
+};
+
+extern int register_netevent_notifier(struct notifier_block *nb);
+extern int unregister_netevent_notifier(struct notifier_block *nb);
+extern int call_netevent_notifiers(unsigned long val, void *v);
+
+#endif
+#endif
diff --git a/net/core/netevent.c b/net/core/netevent.c
new file mode 100644
index 0000000..e995751
--- /dev/null
+++ b/net/core/netevent.c
@@ -0,0 +1,68 @@
+/*
+ *	Network event notifiers
+ *
+ *	Authors:
+ *      Tom Tucker             <tom@opengridcomputing.com>
+ *
+ *	This program is free software; you can redistribute it and/or
+ *      modify it under the terms of the GNU General Public License
+ *      as published by the Free Software Foundation; either version
+ *      2 of the License, or (at your option) any later version.
+ *
+ *	Fixes:
+ */
+
+#include <linux/rtnetlink.h>
+#include <linux/notifier.h>
+
+static ATOMIC_NOTIFIER_HEAD(netevent_notif_chain);
+
+/**
+ *	register_netevent_notifier - register a netevent notifier block
+ *	@nb: notifier
+ *
+ *	Register a notifier to be called when a netevent occurs.
+ *	The notifier passed is linked into the kernel structures and must
+ *	not be reused until it has been unregistered. A negative errno code
+ *	is returned on a failure.
+ */
+int register_netevent_notifier(struct notifier_block *nb)
+{
+	int err;
+
+	err = atomic_notifier_chain_register(&netevent_notif_chain, nb);
+	return err;
+}
+
+/**
+ *	netevent_unregister_notifier - unregister a netevent notifier block
+ *	@nb: notifier
+ *
+ *	Unregister a notifier previously registered by
+ *	register_neigh_notifier(). The notifier is unlinked into the
+ *	kernel structures and may then be reused. A negative errno code
+ *	is returned on a failure.
+ */
+
+int unregister_netevent_notifier(struct notifier_block *nb)
+{
+	return atomic_notifier_chain_unregister(&netevent_notif_chain, nb);
+}
+
+/**
+ *	call_netevent_notifiers - call all netevent notifier blocks
+ *      @val: value passed unmodified to notifier function
+ *      @v:   pointer passed unmodified to notifier function
+ *
+ *	Call all neighbour notifier blocks.  Parameters and return value
+ *	are as for notifier_call_chain().
+ */
+
+int call_netevent_notifiers(unsigned long val, void *v)
+{
+	return atomic_notifier_call_chain(&netevent_notif_chain, val, v);
+}
+
+EXPORT_SYMBOL_GPL(register_netevent_notifier);
+EXPORT_SYMBOL_GPL(unregister_netevent_notifier);
+EXPORT_SYMBOL_GPL(call_netevent_notifiers);
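
For readers unfamiliar with notifier chains, the register / unregister / call
pattern that this patch wraps can be modelled in user space roughly as
follows. This is a simplified sketch, not the kernel implementation: every
sk_* name and the SK_NOTIFY_* return codes are invented for the example, it
has no locking or RCU, it ignores priorities, and it returns -1 rather than an
errno value.

```c
#include <assert.h>
#include <stddef.h>

/* Return codes for this sketch only; the kernel defines its own NOTIFY_* values. */
#define SK_NOTIFY_DONE 0
#define SK_NOTIFY_STOP 1

struct sk_notifier {
	int (*call)(struct sk_notifier *self, unsigned long event, void *arg);
	struct sk_notifier *next;
};

static struct sk_notifier *sk_chain;	/* head of the singly linked chain */

/* Append nb to the chain; returns 0, or -1 if nb is already registered. */
static int sk_register(struct sk_notifier *nb)
{
	struct sk_notifier **pp = &sk_chain;

	assert(nb != NULL && nb->call != NULL);
	while (*pp) {
		if (*pp == nb)
			return -1;
		pp = &(*pp)->next;
	}
	nb->next = NULL;
	*pp = nb;
	return 0;
}

/* Unlink nb from the chain; returns 0, or -1 if it was not registered. */
static int sk_unregister(struct sk_notifier *nb)
{
	struct sk_notifier **pp;

	for (pp = &sk_chain; *pp; pp = &(*pp)->next) {
		if (*pp == nb) {
			*pp = nb->next;
			nb->next = NULL;
			return 0;
		}
	}
	return -1;
}

/* Invoke every callback in order until one asks to stop. */
static int sk_call_chain(unsigned long event, void *arg)
{
	struct sk_notifier *nb = sk_chain;
	int ret = SK_NOTIFY_DONE;

	while (nb) {
		struct sk_notifier *next = nb->next;

		ret = nb->call(nb, event, arg);
		if (ret == SK_NOTIFY_STOP)
			break;
		nb = next;
	}
	return ret;
}

/* Example callback: counts how often each event number (0..7) is seen. */
static unsigned int sk_seen[8];

static int sk_count_cb(struct sk_notifier *self, unsigned long event, void *arg)
{
	(void)self;
	(void)arg;
	if (event < 8)
		sk_seen[event]++;
	return SK_NOTIFY_DONE;
}
```

A caller would fill in a struct sk_notifier with .call = sk_count_cb, pass it
to sk_register(), and from then on every sk_call_chain(event, arg) reaches the
callback until sk_unregister() is called.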
* [PATCH Round 4 2/3] Core network changes to support network event notification.
From: Steve Wise @ 2006-07-18 18:49 UTC
To: davem, rdreier; +Cc: openib-general, netdev

This patch adds netevent and netlink calls for neighbour change, route
add/del, pmtu change, and routing redirect events.

Netlink Details:

Neighbour change events are broadcast as a new ndmsg type RTM_NEIGHUPD.

Path mtu change events are broadcast as a new rtmsg type RTM_ROUTEUPD.

Routing redirect events are broadcast as a pair of rtmsgs, RTM_DELROUTE and
RTM_NEWROUTE.
---

 include/linux/rtnetlink.h |    4 ++
 net/core/Makefile         |    2 +
 net/core/neighbour.c      |   37 ++++++++++++++++---
 net/ipv4/fib_semantics.c  |    9 +++++
 net/ipv4/route.c          |   86 ++++++++++++++++++++++++++++++++++++++++++--
 net/ipv6/route.c          |   87 +++++++++++++++++++++++++++++++++++++++++++++
 6 files changed, 213 insertions(+), 12 deletions(-)

diff --git a/include/linux/rtnetlink.h b/include/linux/rtnetlink.h
index facd9ee..340ca4f 100644
--- a/include/linux/rtnetlink.h
+++ b/include/linux/rtnetlink.h
@@ -35,6 +35,8 @@ #define RTM_NEWROUTE RTM_NEWROUTE
 #define RTM_DELROUTE	RTM_DELROUTE
 	RTM_GETROUTE,
 #define RTM_GETROUTE	RTM_GETROUTE
+	RTM_ROUTEUPD,
+#define RTM_ROUTEUPD	RTM_ROUTEUPD

 	RTM_NEWNEIGH	= 28,
 #define RTM_NEWNEIGH	RTM_NEWNEIGH
@@ -42,6 +44,8 @@ #define RTM_NEWNEIGH RTM_NEWNEIGH
 #define RTM_DELNEIGH	RTM_DELNEIGH
 	RTM_GETNEIGH,
 #define RTM_GETNEIGH	RTM_GETNEIGH
+	RTM_NEIGHUPD,
+#define RTM_NEIGHUPD	RTM_NEIGHUPD

 	RTM_NEWRULE	= 32,
 #define RTM_NEWRULE	RTM_NEWRULE
diff --git a/net/core/Makefile b/net/core/Makefile
index e9bd246..2645ba4 100644
--- a/net/core/Makefile
+++ b/net/core/Makefile
@@ -7,7 +7,7 @@ obj-y := sock.o request_sock.o skbuff.o

 obj-$(CONFIG_SYSCTL) += sysctl_net_core.o

-obj-y		     += dev.o ethtool.o dev_mcast.o dst.o \
+obj-y		     += dev.o ethtool.o dev_mcast.o dst.o netevent.o \
 			neighbour.o rtnetlink.o utils.o link_watch.o filter.o

 obj-$(CONFIG_XFRM) += flow.o
diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index 7ad681f..11c7643 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -29,9 +29,11 @@ #include <linux/times.h>
 #include <net/neighbour.h>
 #include <net/dst.h>
 #include <net/sock.h>
+#include <net/netevent.h>
 #include <linux/rtnetlink.h>
 #include <linux/random.h>
 #include <linux/string.h>
+#include <linux/notifier.h>

 #define NEIGH_DEBUG 1

@@ -58,6 +60,7 @@ static void neigh_app_notify(struct neig
 #endif
 static int pneigh_ifdown(struct neigh_table *tbl, struct net_device *dev);
 void neigh_changeaddr(struct neigh_table *tbl, struct net_device *dev);
+static void rtm_neigh_change(struct neighbour *n);

 static struct neigh_table *neigh_tables;
 #ifdef CONFIG_PROC_FS
@@ -754,6 +757,7 @@ #endif
 				neigh->nud_state = NUD_STALE;
 				neigh->updated = jiffies;
 				neigh_suspect(neigh);
+				notify = 1;
 			}
 		} else if (state & NUD_DELAY) {
 			if (time_before_eq(now,
@@ -762,6 +766,7 @@ #endif
 			neigh->nud_state = NUD_REACHABLE;
 			neigh->updated = jiffies;
 			neigh_connect(neigh);
+			notify = 1;
 			next = neigh->confirmed + neigh->parms->reachable_time;
 		} else {
 			NEIGH_PRINTK2("neigh %p is probed.\n", neigh);
@@ -819,6 +824,8 @@ #endif
 out:
 		write_unlock(&neigh->lock);
 	}
+	if (notify)
+		rtm_neigh_change(neigh);

 #ifdef CONFIG_ARPD
 	if (notify && neigh->parms->app_probes)
@@ -926,9 +933,7 @@ int neigh_update(struct neighbour *neigh
 {
 	u8 old;
 	int err;
-#ifdef CONFIG_ARPD
 	int notify = 0;
-#endif
 	struct net_device *dev;
 	int update_isrouter = 0;

@@ -948,9 +953,7 @@ #endif
 		neigh_suspect(neigh);
 		neigh->nud_state = new;
 		err = 0;
-#ifdef CONFIG_ARPD
 		notify = old & NUD_VALID;
-#endif
 		goto out;
 	}

@@ -1022,9 +1025,7 @@ #endif
 		if (!(new & NUD_CONNECTED))
 			neigh->confirmed = jiffies -
 				      (neigh->parms->base_reachable_time << 1);
-#ifdef CONFIG_ARPD
 		notify = 1;
-#endif
 	}
 	if (new == old)
 		goto out;
@@ -1055,7 +1056,11 @@ out:
 			(neigh->flags | NTF_ROUTER) :
 			(neigh->flags & ~NTF_ROUTER);
 	}
+
 	write_unlock_bh(&neigh->lock);
+
+	if (notify)
+		rtm_neigh_change(neigh);
 #ifdef CONFIG_ARPD
 	if (notify && neigh->parms->app_probes)
 		neigh_app_notify(neigh);
@@ -2369,9 +2374,27 @@ static void neigh_app_notify(struct neig
 	NETLINK_CB(skb).dst_group  = RTNLGRP_NEIGH;
 	netlink_broadcast(rtnl, skb, 0, RTNLGRP_NEIGH, GFP_ATOMIC);
 }
-
 #endif /* CONFIG_ARPD */

+static void rtm_neigh_change(struct neighbour *n)
+{
+	struct nlmsghdr *nlh;
+	int size = NLMSG_SPACE(sizeof(struct ndmsg) + 256);
+	struct sk_buff *skb = alloc_skb(size, GFP_ATOMIC);
+
+	call_netevent_notifiers(NETEVENT_NEIGH_UPDATE, n);
+	if (!skb)
+		return;
+
+	if (neigh_fill_info(skb, n, 0, 0, RTM_NEIGHUPD, 0) < 0) {
+		kfree_skb(skb);
+		return;
+	}
+	nlh = (struct nlmsghdr *)skb->data;
+	NETLINK_CB(skb).dst_group = RTNLGRP_NEIGH;
+	netlink_broadcast(rtnl, skb, 0, RTNLGRP_NEIGH, GFP_ATOMIC);
+}
+
 #ifdef CONFIG_SYSCTL

 static struct neigh_sysctl_table {
diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
index 5f87533..33d8a83 100644
--- a/net/ipv4/fib_semantics.c
+++ b/net/ipv4/fib_semantics.c
@@ -44,6 +44,7 @@ #include <net/tcp.h>
 #include <net/sock.h>
 #include <net/ip_fib.h>
 #include <net/ip_mp_alg.h>
+#include <net/netevent.h>

 #include "fib_lookup.h"

@@ -279,6 +280,14 @@ void rtmsg_fib(int event, u32 key, struc
 	struct sk_buff *skb;
 	u32 pid = req ? req->pid : n->nlmsg_pid;
 	int size = NLMSG_SPACE(sizeof(struct rtmsg)+256);
+	struct netevent_route_info nri;
+	int netevent;
+
+	nri.family = AF_INET;
+	nri.data = &fa->fa_info;
+	netevent = event == RTM_NEWROUTE ? NETEVENT_ROUTE_ADD
+					 : NETEVENT_ROUTE_DEL;
+	call_netevent_notifiers(netevent, &nri);

 	skb = alloc_skb(size, GFP_KERNEL);
 	if (!skb)
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 2dc6dbb..18879e6 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -104,6 +104,7 @@ #include <net/tcp.h>
 #include <net/icmp.h>
 #include <net/xfrm.h>
 #include <net/ip_mp_alg.h>
+#include <net/netevent.h>
 #ifdef CONFIG_SYSCTL
 #include <linux/sysctl.h>
 #endif
@@ -151,6 +152,8 @@ static struct dst_entry *ipv4_negative_a
 static void		 ipv4_link_failure(struct sk_buff *skb);
 static void		 ip_rt_update_pmtu(struct dst_entry *dst, u32 mtu);
 static int rt_garbage_collect(void);
+static int rt_fill_info(struct sk_buff *skb, u32 pid, u32 seq, int event,
+			int nowait, unsigned int flags, unsigned int prot);


 static struct dst_ops ipv4_dst_ops = {
@@ -1117,6 +1120,52 @@ static void rt_del(unsigned hash, struct
 	spin_unlock_bh(rt_hash_lock_addr(hash));
 }

+static void rtm_redirect(struct rtable *old, struct rtable *new)
+{
+	struct netevent_redirect netevent;
+	struct sk_buff *skb;
+	int err;
+
+	netevent.old = &old->u.dst;
+	netevent.new = &new->u.dst;
+
+	/* notify netevent subscribers */
+	call_netevent_notifiers(NETEVENT_REDIRECT, &netevent);
+
+	/* Post NETLINK messages:  RTM_DELROUTE for old route,
+				   RTM_NEWROUTE for new route */
+	skb = alloc_skb(NLMSG_GOODSIZE, GFP_ATOMIC);
+	if (!skb)
+		return;
+	skb->mac.raw = skb->nh.raw = skb->data;
+	skb->dst = &old->u.dst;
+	NETLINK_CB(skb).dst_pid = 0;
+
+	err = rt_fill_info(skb, 0, 0, RTM_DELROUTE, 1, 0, RTPROT_UNSPEC);
+	if (err <= 0)
+		goto out_free;
+
+	netlink_broadcast(rtnl, skb, 0, RTNLGRP_IPV4_ROUTE, GFP_ATOMIC);
+
+	skb = alloc_skb(NLMSG_GOODSIZE, GFP_ATOMIC);
+	if (!skb)
+		return;
+	skb->mac.raw = skb->nh.raw = skb->data;
+	skb->dst = &new->u.dst;
+	NETLINK_CB(skb).dst_pid = 0;
+
+	err = rt_fill_info(skb, 0, 0, RTM_NEWROUTE, 1, 0, RTPROT_REDIRECT);
+	if (err <= 0)
+		goto out_free;
+
+	netlink_broadcast(rtnl, skb, 0, RTNLGRP_IPV4_ROUTE, GFP_ATOMIC);
+	return;
+
+out_free:
+	kfree_skb(skb);
+	return;
+}
+
 void ip_rt_redirect(u32 old_gw, u32 daddr, u32 new_gw,
 		    u32 saddr, struct net_device *dev)
 {
@@ -1216,6 +1265,8 @@ void ip_rt_redirect(u32 old_gw, u32 dadd
 					rt_drop(rt);
 					goto do_next;
 				}
+
+				rtm_redirect(rth, rt);

 				rt_del(hash, rth);
 				if (!rt_intern_hash(hash, rt, &rt))
@@ -1442,6 +1493,32 @@ unsigned short ip_rt_frag_needed(struct
 	return est_mtu ? : new_mtu;
 }

+static void rtm_pmtu_update(struct rtable *rt)
+{
+	struct sk_buff *skb;
+	int err;
+
+	call_netevent_notifiers(NETEVENT_PMTU_UPDATE, &rt->u.dst);
+
+	skb = alloc_skb(NLMSG_GOODSIZE, GFP_ATOMIC);
+	if (!skb)
+		return;
+	skb->mac.raw = skb->nh.raw = skb->data;
+	skb->dst = &rt->u.dst;
+	NETLINK_CB(skb).dst_pid = 0;
+
+	err = rt_fill_info(skb, 0, 0, RTM_ROUTEUPD, 1, 0, RTPROT_UNSPEC);
+	if (err <= 0)
+		goto out_free;
+
+	netlink_broadcast(rtnl, skb, 0, RTNLGRP_IPV4_ROUTE, GFP_ATOMIC);
+	return;
+
+out_free:
+	kfree_skb(skb);
+	return;
+}
+
 static void ip_rt_update_pmtu(struct dst_entry *dst, u32 mtu)
 {
 	if (dst->metrics[RTAX_MTU-1] > mtu && mtu >= 68 &&
@@ -1452,6 +1529,7 @@ static void ip_rt_update_pmtu(struct dst
 		}
 		dst->metrics[RTAX_MTU-1] = mtu;
 		dst_set_expires(dst, ip_rt_mtu_expires);
+		rtm_pmtu_update((struct rtable *)dst);
 	}
 }

@@ -2627,7 +2705,7 @@ int ip_route_output_key(struct rtable **
 }

 static int rt_fill_info(struct sk_buff *skb, u32 pid, u32 seq, int event,
-			int nowait, unsigned int flags)
+			int nowait, unsigned int flags, unsigned int prot)
 {
 	struct rtable *rt = (struct rtable*)skb->dst;
 	struct rtmsg *r;
@@ -2646,7 +2724,7 @@ #endif
 	r->rtm_table	= RT_TABLE_MAIN;
 	r->rtm_type	= rt->rt_type;
 	r->rtm_scope	= RT_SCOPE_UNIVERSE;
-	r->rtm_protocol = RTPROT_UNSPEC;
+	r->rtm_protocol = prot;
 	r->rtm_flags	= (rt->rt_flags & ~0xFFFF) | RTM_F_CLONED;
 	if (rt->rt_flags & RTCF_NOTIFY)
 		r->rtm_flags |= RTM_F_NOTIFY;
@@ -2792,7 +2870,7 @@ int inet_rtm_getroute(struct sk_buff *in
 	NETLINK_CB(skb).dst_pid = NETLINK_CB(in_skb).pid;

 	err = rt_fill_info(skb, NETLINK_CB(in_skb).pid, nlh->nlmsg_seq,
-				RTM_NEWROUTE, 0, 0);
+				RTM_NEWROUTE, 0, 0, RTPROT_UNSPEC);
 	if (!err)
 		goto out_free;
 	if (err < 0) {
@@ -2830,7 +2908,7 @@ int ip_rt_dump(struct sk_buff *skb, str
 			skb->dst = dst_clone(&rt->u.dst);
 			if (rt_fill_info(skb, NETLINK_CB(cb->skb).pid,
 					 cb->nlh->nlmsg_seq, RTM_NEWROUTE,
-					 1, NLM_F_MULTI) <= 0) {
+					 1, NLM_F_MULTI, RTPROT_UNSPEC) <= 0) {
 				dst_release(xchg(&skb->dst, NULL));
 				rcu_read_unlock_bh();
 				goto done;
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 87c39c9..a2b1d53 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -53,6 +53,7 @@ #include <net/tcp.h>
 #include <linux/rtnetlink.h>
 #include <net/dst.h>
 #include <net/xfrm.h>
+#include <net/netevent.h>

 #include <asm/uaccess.h>

@@ -96,6 +97,10 @@ static int ip6_pkt_discard(struct sk_bu
 static int		ip6_pkt_discard_out(struct sk_buff *skb);
 static void		ip6_link_failure(struct sk_buff *skb);
 static void		ip6_rt_update_pmtu(struct dst_entry *dst, u32 mtu);
+static int rt6_fill_node(struct sk_buff *skb, struct rt6_info *rt,
+			 struct in6_addr *dst, struct in6_addr *src,
+			 int iif, int type, u32 pid, u32 seq,
+			 int prefix, unsigned int flags);

 #ifdef CONFIG_IPV6_ROUTE_INFO
 static struct rt6_info *rt6_add_route_info(struct in6_addr *prefix, int prefixlen,
@@ -731,6 +736,32 @@ static void ip6_link_failure(struct sk_b
 	}
 }

+static void rtm_pmtu_update(struct rt6_info *rt)
+{
+	struct sk_buff *skb;
+	int err;
+
+	call_netevent_notifiers(NETEVENT_PMTU_UPDATE, &rt->u.dst);
+
+	skb = alloc_skb(NLMSG_GOODSIZE, GFP_ATOMIC);
+	if (!skb)
+		return;
+	skb->mac.raw = skb->nh.raw = skb->data;
+	skb->dst = &rt->u.dst;
+	NETLINK_CB(skb).dst_pid = 0;
+
+	err = rt6_fill_node(skb, rt, NULL, NULL, 0, RTM_ROUTEUPD, 0, 0, 0, 0);
+	if (err <= 0)
+		goto out_free;
+
+	netlink_broadcast(rtnl, skb, 0, RTNLGRP_IPV6_ROUTE, GFP_ATOMIC);
+	return;
+
+out_free:
+	kfree_skb(skb);
+	return;
+}
+
 static void ip6_rt_update_pmtu(struct dst_entry *dst, u32 mtu)
 {
 	struct rt6_info *rt6 = (struct rt6_info*)dst;
@@ -742,6 +773,7 @@ static void ip6_rt_update_pmtu(struct ds
 			dst->metrics[RTAX_FEATURES-1] |= RTAX_FEATURE_ALLFRAG;
 		}
 		dst->metrics[RTAX_MTU-1] = mtu;
+		rtm_pmtu_update(rt6);
 	}
 }

@@ -907,6 +939,7 @@ int ip6_route_add(struct in6_rtmsg *rtms
 	struct net_device *dev = NULL;
 	struct inet6_dev *idev = NULL;
 	int addr_type;
+	struct netevent_route_info nri;

 	rta = (struct rtattr **) _rtattr;

@@ -1085,6 +1118,9 @@ install_route:
 		rt->u.dst.metrics[RTAX_ADVMSS-1] = ipv6_advmss(dst_mtu(&rt->u.dst));
 	rt->u.dst.dev = dev;
 	rt->rt6i_idev = idev;
+	nri.family = AF_INET6;
+	nri.data = rt;
+	call_netevent_notifiers(NETEVENT_ROUTE_ADD, &nri);
 	return ip6_ins_rt(rt, nlh, _rtattr, req);

 out:
@@ -1116,6 +1152,7 @@ static int ip6_route_del(struct in6_rtms
 	struct fib6_node *fn;
 	struct rt6_info *rt;
 	int err = -ESRCH;
+	struct netevent_route_info nri;

 	read_lock_bh(&rt6_lock);

@@ -1137,6 +1174,10 @@ static int ip6_route_del(struct in6_rtms
 				continue;
 			dst_hold(&rt->u.dst);
 			read_unlock_bh(&rt6_lock);
+
+			nri.family = AF_INET6;
+			nri.data = rt;
+			call_netevent_notifiers(NETEVENT_ROUTE_DEL, &nri);

 			return ip6_del_rt(rt, nlh, _rtattr, req);
 		}
@@ -1146,6 +1187,50 @@ static int ip6_route_del(struct in6_rtms
 	return err;
 }

+static void rtm_redirect(struct rt6_info *old, struct rt6_info *new)
+{
+	struct netevent_redirect netevent;
+	struct sk_buff *skb;
+	int err;
+
+	netevent.old = &old->u.dst;
+	netevent.new = &new->u.dst;
+	call_netevent_notifiers(NETEVENT_REDIRECT, &netevent);
+
+	/* Post NETLINK messages:  RTM_DELROUTE for old route,
+				   RTM_NEWROUTE for new route */
+	skb = alloc_skb(NLMSG_GOODSIZE, GFP_ATOMIC);
+	if (!skb)
+		return;
+	skb->mac.raw = skb->nh.raw = skb->data;
+	NETLINK_CB(skb).dst_pid = 0;
+	NETLINK_CB(skb).dst_group = RTNLGRP_IPV6_ROUTE;
+
+	err = rt6_fill_node(skb, old, NULL, NULL, 0, RTM_DELROUTE, 0, 0, 0, 0);
+	if (err <= 0)
+		goto out_free;
+
+	netlink_broadcast(rtnl, skb, 0, RTNLGRP_IPV6_ROUTE, GFP_ATOMIC);
+
+	skb = alloc_skb(NLMSG_GOODSIZE, GFP_ATOMIC);
+	if (!skb)
+		return;
+	skb->mac.raw = skb->nh.raw = skb->data;
+	NETLINK_CB(skb).dst_pid = 0;
+	NETLINK_CB(skb).dst_group = RTNLGRP_IPV6_ROUTE;
+
+	err = rt6_fill_node(skb, new, NULL, NULL, 0, RTM_NEWROUTE, 0, 0, 0, 0);
+	if (err <= 0)
+		goto out_free;
+
+	netlink_broadcast(rtnl, skb, 0, RTNLGRP_IPV6_ROUTE, GFP_ATOMIC);
+	return;
+
+out_free:
+	kfree_skb(skb);
+	return;
+}
+
 /*
  *	Handle redirects
  */
@@ -1252,6 +1337,8 @@ restart:
 	if (ip6_ins_rt(nrt, NULL, NULL, NULL))
 		goto out;

+	rtm_redirect(rt, nrt);
+
 	if (rt->rt6i_flags&RTF_CACHE) {
 		ip6_del_rt(rt, NULL, NULL, NULL);
 		return;
* Re: [PATCH Round 4 2/3] Core network changes to support network event notification.
From: Herbert Xu @ 2006-07-25 7:39 UTC
To: Steve Wise; +Cc: davem, rdreier, openib-general, netdev

Steve Wise <swise@opengridcomputing.com> wrote:
>
> Routing redirect events are broadcast as a pair of rtmsgs, RTM_DELROUTE
> and RTM_NEWROUTE.

This may confuse existing rtnetlink users since you're generating an
RTM_DELROUTE message that's identical to one triggered by something
like 'ip route del'.

As you're introducing a completely new RTM_ROUTEUPD type, it might
be better to attach any information from the existing route that you
need to the ROUTEUPD message.

Actually, what was the reason you need the existing route here?

> diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
> index 5f87533..33d8a83 100644
> --- a/net/ipv4/fib_semantics.c
> +++ b/net/ipv4/fib_semantics.c
> @@ -44,6 +44,7 @@ #include <net/tcp.h>
>  #include <net/sock.h>
>  #include <net/ip_fib.h>
>  #include <net/ip_mp_alg.h>
> +#include <net/netevent.h>
>
>  #include "fib_lookup.h"
>
> @@ -279,6 +280,14 @@ void rtmsg_fib(int event, u32 key, struc
>  	struct sk_buff *skb;
>  	u32 pid = req ? req->pid : n->nlmsg_pid;
>  	int size = NLMSG_SPACE(sizeof(struct rtmsg)+256);
> +	struct netevent_route_info nri;
> +	int netevent;
> +
> +	nri.family = AF_INET;
> +	nri.data = &fa->fa_info;
> +	netevent = event == RTM_NEWROUTE ? NETEVENT_ROUTE_ADD
> +					 : NETEVENT_ROUTE_DEL;
> +	call_netevent_notifiers(netevent, &nri);

Hmm, this is broken. These route events are meaningless without the
corresponding IP rule events. Are you sure you really want to make
your hardware/driver grok multiple routing tables?

Perhaps you should simply stick to dst entries and flush all your
tables when the routes are changed. This is what the Linux IP stack
does.

> diff --git a/net/ipv4/route.c b/net/ipv4/route.c
> index 2dc6dbb..18879e6 100644
> --- a/net/ipv4/route.c
> +++ b/net/ipv4/route.c
> @@ -1117,6 +1120,52 @@ static void rt_del(unsigned hash, struct
>  	spin_unlock_bh(rt_hash_lock_addr(hash));
>  }
>
> +static void rtm_redirect(struct rtable *old, struct rtable *new)
> +{
> +	struct netevent_redirect netevent;
> +	struct sk_buff *skb;
> +	int err;
> +
> +	netevent.old = &old->u.dst;
> +	netevent.new = &new->u.dst;
> +
> +	/* notify netevent subscribers */
> +	call_netevent_notifiers(NETEVENT_REDIRECT, &netevent);
> +
> +	/* Post NETLINK messages:  RTM_DELROUTE for old route,
> +				   RTM_NEWROUTE for new route */
> +	skb = alloc_skb(NLMSG_GOODSIZE, GFP_ATOMIC);

Please use a better size estimate rather than NLMSG_GOODSIZE here since
you're doing GFP_ATOMIC.

> @@ -1442,6 +1493,32 @@ unsigned short ip_rt_frag_needed(struct
>  	return est_mtu ? : new_mtu;
>  }
>
> +static void rtm_pmtu_update(struct rtable *rt)
> +{
> +	struct sk_buff *skb;
> +	int err;
> +
> +	call_netevent_notifiers(NETEVENT_PMTU_UPDATE, &rt->u.dst);
> +
> +	skb = alloc_skb(NLMSG_GOODSIZE, GFP_ATOMIC);

Ditto.

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
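
As an illustration of the "better size estimate" request, the sketch below
adds up a netlink message from its parts instead of reaching for a generic
large buffer. It is a user-space approximation under stated assumptions: the
sk_* structs are local stand-ins that copy the field layout of nlmsghdr, rtmsg
and rtattr as I understand them, SK_ALIGN4 models netlink's 4-byte alignment
rule, and the attribute list passed in is hypothetical, not the set that
rt_fill_info() actually emits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round up to a multiple of 4, the alignment netlink uses for headers and attributes. */
#define SK_ALIGN4(n) (((size_t)(n) + 3u) & ~(size_t)3u)

/* Local stand-ins with the same field layout as the netlink headers. */
struct sk_nlmsghdr { uint32_t len; uint16_t type; uint16_t flags; uint32_t seq; uint32_t pid; };
struct sk_rtmsg {
	uint8_t family, dst_len, src_len, tos, table, protocol, scope, type;
	uint32_t flags;
};
struct sk_rtattr { uint16_t len; uint16_t type; };

/* Space taken by one attribute: aligned attribute header plus payload, aligned again. */
static size_t sk_attr_space(size_t payload)
{
	return SK_ALIGN4(SK_ALIGN4(sizeof(struct sk_rtattr)) + payload);
}

/* Total message size: netlink header, family payload, then each attribute. */
static size_t sk_msg_space(size_t payload, const size_t *attr_payloads,
			   size_t nattrs)
{
	size_t total;

	assert(nattrs == 0 || attr_payloads != NULL);
	total = SK_ALIGN4(sizeof(struct sk_nlmsghdr)) + SK_ALIGN4(payload);
	for (size_t i = 0; i < nattrs; i++)
		total += sk_attr_space(attr_payloads[i]);
	return total;
}
```

For example, a route message carrying three hypothetical 4-byte attributes
(say destination, gateway and output interface) would be sized with
sk_msg_space(sizeof(struct sk_rtmsg), (size_t[]){4, 4, 4}, 3). A smaller,
exact request is more likely to succeed under GFP_ATOMIC than a page-sized
one.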
* Re: [PATCH Round 4 2/3] Core network changes to support network event notification.
From: Steve Wise @ 2006-07-25 15:05 UTC
To: Herbert Xu; +Cc: davem, rdreier, openib-general, netdev

On Tue, 2006-07-25 at 17:39 +1000, Herbert Xu wrote:
> Steve Wise <swise@opengridcomputing.com> wrote:
> >
> > Routing redirect events are broadcast as a pair of rtmsgs, RTM_DELROUTE
> > and RTM_NEWROUTE.
>
> This may confuse existing rtnetlink users since you're generating an
> RTM_DELROUTE message that's identical to one triggered by something
> like 'ip route del'.
>

Yea, I didn't really want to create a REDIRECT rtmsg, so I punted. :-)

But they really are seeing a delete followed by an add. That's what the
kernel is doing.

> As you're introducing a completely new RTM_ROUTEUPD type, it might
> be better to attach any information from the existing route that you
> need to the ROUTEUPD message.

Yea, the main change is the next hop ip address or gateway field.

>
> Actually, what was the reason you need the existing route here?
>

The rdma driver needs to update all established rdma connections that
are using the next-hop information of the existing route and make them
use the next-hop information of the new route. In addition, the rdma
driver might have a reference to the old dst entry. So it can release
that ref and add a ref to the new dst entry.

> > diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
> > index 5f87533..33d8a83 100644
> > --- a/net/ipv4/fib_semantics.c
> > +++ b/net/ipv4/fib_semantics.c
> > @@ -44,6 +44,7 @@ #include <net/tcp.h>
> >  #include <net/sock.h>
> >  #include <net/ip_fib.h>
> >  #include <net/ip_mp_alg.h>
> > +#include <net/netevent.h>
> >
> >  #include "fib_lookup.h"
> >
> > @@ -279,6 +280,14 @@ void rtmsg_fib(int event, u32 key, struc
> >  	struct sk_buff *skb;
> >  	u32 pid = req ? req->pid : n->nlmsg_pid;
> >  	int size = NLMSG_SPACE(sizeof(struct rtmsg)+256);
> > +	struct netevent_route_info nri;
> > +	int netevent;
> > +
> > +	nri.family = AF_INET;
> > +	nri.data = &fa->fa_info;
> > +	netevent = event == RTM_NEWROUTE ? NETEVENT_ROUTE_ADD
> > +					 : NETEVENT_ROUTE_DEL;
> > +	call_netevent_notifiers(netevent, &nri);
>
> Hmm, this is broken. These route events are meaningless without the
> corresponding IP rule events. Are you sure you really want to make
> your hardware/driver grok multiple routing tables?
>
> Perhaps you should simply stick to dst entries and flush all your
> tables when the routes are changed. This is what the Linux IP stack
> does.
>

I have to admit I'm a little fuzzy on the routing stuff. The main
netevents I've utilized in the rdma driver I'm writing are the
neighbour update event and the redirect event. Route add/del was added
for completeness of "routing" netevents.

Can you expand further or point me to code where the IP stack "flushes
its tables" when routes are changed?

From my experience, all the rdma driver needs is the dst entry. It uses
the routing table to determine the dst_entry at connection establish
time. And it needs to know if the next-hop or PMTU ever changes.

> > diff --git a/net/ipv4/route.c b/net/ipv4/route.c
> > index 2dc6dbb..18879e6 100644
> > --- a/net/ipv4/route.c
> > +++ b/net/ipv4/route.c
> > @@ -1117,6 +1120,52 @@ static void rt_del(unsigned hash, struct
> >  	spin_unlock_bh(rt_hash_lock_addr(hash));
> >  }
> >
> > +static void rtm_redirect(struct rtable *old, struct rtable *new)
> > +{
> > +	struct netevent_redirect netevent;
> > +	struct sk_buff *skb;
> > +	int err;
> > +
> > +	netevent.old = &old->u.dst;
> > +	netevent.new = &new->u.dst;
> > +
> > +	/* notify netevent subscribers */
> > +	call_netevent_notifiers(NETEVENT_REDIRECT, &netevent);
> > +
> > +	/* Post NETLINK messages:  RTM_DELROUTE for old route,
> > +				   RTM_NEWROUTE for new route */
> > +	skb = alloc_skb(NLMSG_GOODSIZE, GFP_ATOMIC);
>
> Please use a better size estimate rather than NLMSG_GOODSIZE here since
> you're doing GFP_ATOMIC.
>

ok

> > @@ -1442,6 +1493,32 @@ unsigned short ip_rt_frag_needed(struct
> >  	return est_mtu ? : new_mtu;
> >  }
> >
> > +static void rtm_pmtu_update(struct rtable *rt)
> > +{
> > +	struct sk_buff *skb;
> > +	int err;
> > +
> > +	call_netevent_notifiers(NETEVENT_PMTU_UPDATE, &rt->u.dst);
> > +
> > +	skb = alloc_skb(NLMSG_GOODSIZE, GFP_ATOMIC);
>
> Ditto.
>

ok

Thanks,

Steve.
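
The "release the old ref and take a ref on the new dst entry" step described
in this reply can be sketched in user space as below. This is only a model
under assumptions: sk_dst stands in for a refcounted dst_entry, sk_conn for an
established RDMA connection, and sk_on_redirect() for what a driver's
NETEVENT_REDIRECT handler might do. The names are hypothetical, and the plain
int refcount ignores the atomics and locking a real driver would need.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical refcounted route-cache entry, standing in for a dst_entry. */
struct sk_dst {
	int refcnt;
	unsigned int next_hop;
};

static void sk_dst_hold(struct sk_dst *d)
{
	d->refcnt++;
}

static void sk_dst_release(struct sk_dst *d)
{
	assert(d->refcnt > 0);
	d->refcnt--;
}

/* Hypothetical connection that pins the dst it was established with. */
struct sk_conn {
	struct sk_dst *dst;
};

/*
 * On a redirect, move every connection that still points at 'old' over to
 * 'new_dst'. The new reference is taken before the old one is dropped so a
 * connection never holds an unreferenced entry. Returns the number moved.
 */
static int sk_on_redirect(struct sk_conn *conns, int n,
			  struct sk_dst *old, struct sk_dst *new_dst)
{
	int moved = 0;

	for (int i = 0; i < n; i++) {
		if (conns[i].dst != old)
			continue;
		sk_dst_hold(new_dst);
		conns[i].dst = new_dst;
		sk_dst_release(old);
		moved++;
	}
	return moved;
}
```

This is why the redirect event carries both the old and the new entry: the
handler needs the old pointer to find the affected connections and the new one
to repoint them.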
* Re: [PATCH Round 4 2/3] Core network changes to support network event notification.
From: Herbert Xu @ 2006-07-26 3:39 UTC
To: Steve Wise; +Cc: davem, rdreier, openib-general, netdev

On Tue, Jul 25, 2006 at 10:05:40AM -0500, Steve Wise wrote:
>
> But they really are seeing a delete followed by an add. That's what the
> kernel is doing.

Actually that's the other thing I don't really like. The user-space
monitor may perceive that a route was actually deleted and replaced
by a new one even though this isn't what's happening at all.

In fact the problem here is that you're sending route notifications
when it's really the dst_entry that's changing. User-space as it
stands only gets notifications about fib changes which is quite different
from changes to the transient dst_entry objects which only exist in the
route cache.

Is anyone actually going to use the user-space interface of this? If not
perhaps we should wait until someone really needs it before adding the
netlink part of the patch.

We can change the kernel interface at will so if we make a mistake with
netevent it can be easily corrected. For user-space though the rules
are totally different. I'd really hate to be stuck with an interface
which turns out to not be the one that people actually want to have.

> The rdma driver needs to update all established rdma connections that
> are using the next-hop information of the existing route and make them
> use the next-hop information of the new route. In addition, the rdma
> driver might have a reference to the old dst entry. So it can release
> that ref and add a ref to the new dst entry.

Do you really need the old route for the user-space part of your patch?

> I have to admit I'm a little fuzzy on the routing stuff. The main
> netevents I've utilized in the rdma driver I'm writing are the
> neighbour update event and the redirect event. Route add/del was added
> for completeness of "routing" netevents.

So you mean you aren't going to use the route notifications? In that case
we should probably just drop them and add them when someone actually needs
it. At that point they can tell us what semantics they want from it :)

> Can you expand further or point me to code where the IP stack "flushes
> its tables" when routes are changed?

Grep for rt_cache_flush in net/ipv4/fib_hash.c.

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
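
The "flush everything when routes change" approach suggested in this thread
does not require interpreting individual route events. One cheap way to model
it, shown here purely as an illustration, is a generation counter: bump it on
any route change and treat every cached path with an older generation as
stale. The sk_* names are hypothetical, and this makes no claim about how
rt_cache_flush() itself is implemented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Bumped whenever any route changes; cached paths remember the value they saw. */
static unsigned int sk_route_gen;

/* Hypothetical cached path, e.g. the state a driver keeps per connection. */
struct sk_cached_path {
	unsigned int gen;	/* generation at the time the path was resolved */
	unsigned int next_hop;
};

/* Called on any route add/delete: invalidates every cached path at once. */
static void sk_routes_changed(void)
{
	sk_route_gen++;
}

/* Record a freshly resolved path under the current generation. */
static void sk_path_fill(struct sk_cached_path *p, unsigned int next_hop)
{
	assert(p != NULL);
	p->next_hop = next_hop;
	p->gen = sk_route_gen;
}

/* A path is usable only if no route change happened since it was resolved. */
static bool sk_path_valid(const struct sk_cached_path *p)
{
	return p->gen == sk_route_gen;
}
```

A user of the cache checks sk_path_valid() before relying on an entry and
re-resolves it through the normal routing lookup when the check fails, which
sidesteps the multiple-routing-table problem raised earlier in the thread.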
* Re: [PATCH Round 4 2/3] Core network changes to support network event notification.
2006-07-26 3:39 ` Herbert Xu
@ 2006-07-26 16:15 ` Steve Wise
2006-07-26 20:56 ` David Miller
0 siblings, 1 reply; 9+ messages in thread
From: Steve Wise @ 2006-07-26 16:15 UTC (permalink / raw)
To: Herbert Xu; +Cc: davem, rdreier, openib-general, netdev

On Wed, 2006-07-26 at 13:39 +1000, Herbert Xu wrote:
> On Tue, Jul 25, 2006 at 10:05:40AM -0500, Steve Wise wrote:
> >
> > But they really are seeing a delete followed by an add. That's what the
> > kernel is doing.
>
> Actually that's the other thing I don't really like. The user-space
> monitor may perceive that a route was actually deleted and replaced
> by a new one even though this isn't what's happening at all.
>
> In fact the problem here is that you're sending route notifications
> when it's really the dst_entry that's changing. User-space as it
> stands only gets notifications about fib changes which is quite different
> from changes to the transient dst_entry objects which only exist in the
> route cache.
>
> Is anyone actually going to use the user-space interface of this? If not
> perhaps we should wait until someone really needs it before adding the
> netlink part of the patch.
>
> We can change the kernel interface at will so if we make a mistake with
> netevent it can be easily corrected. For user-space though the rules
> are totally different. I'd really hate to be stuck with an interface
> which turns out to not be the one that people actually want to have.
>

The user interface is not needed for the rdma users. They are all in
kernel. I added this at the request of reviewers of this patch. I have
no problem at all deferring the rtnetlink integration until someone really
needs it.

> > The rdma driver needs to update all established rdma connections that
> > are using the next-hop information of the existing route and make them
> > use the next-hop information of the new route. In addition, the rdma
> > driver might have a reference to the old dst entry. So it can release
> > that ref and add a ref to the new dst entry.
>
> Do you really need the old route for the user-space part of your patch?
>

Not if we remove the user-space parts. :-)

> > I have to admit I'm a little fuzzy on the routing stuff. The main
> > netevents I've utilized in the rdma driver I'm writing is the
> > neighbour update event and the redirect event. Route add/del was added
> > for completeness of "routing" netevents.
>
> So you mean you aren't going to use the route notifications? In that case
> we should probably just drop them and add them when someone actually needs
> it. At that point they can tell us what semantics they want from it :)
>

This is fine by me too! The key events needed for rdma are:

- neighbour update events
- rtredirect events
- pmtu change events

> > Can you expand further or point me to code where the IP stack "flushes
> > its tables" when routes are changed?
>
> Grep for rt_cache_flush in net/ipv4/fib_hash.c.
>

thanks.

Dave, what do you think about removing the user-space stuff for the
first round of integration? IE: Just add netevents and kernel hooks to
generate them.

Steve.
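
The redirect handling described in this exchange (release the reference on the old dst entry, take one on the new entry, and switch each affected connection to the new next hop) can be sketched in user-space C. This is only an illustrative model, not code from the patch: `toy_dst`, `toy_conn`, `toy_dst_hold`, `toy_dst_release`, and `toy_handle_redirect` are hypothetical names, and a real driver would use the kernel's own dst reference counting and locking.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a cached route entry with a reference count. */
struct toy_dst {
	int refcnt;
	unsigned int next_hop;
};

/* Hypothetical RDMA connection that caches next-hop state from its dst. */
struct toy_conn {
	struct toy_dst *dst;
	unsigned int cached_next_hop;
};

static void toy_dst_hold(struct toy_dst *d)
{
	d->refcnt++;
}

static void toy_dst_release(struct toy_dst *d)
{
	assert(d->refcnt > 0);
	d->refcnt--;
}

/*
 * Model of a redirect event handler: every connection that still points
 * at old_dst is moved to new_dst. The new reference is taken before the
 * old one is dropped. Returns the number of connections updated.
 */
static size_t toy_handle_redirect(struct toy_conn *conns, size_t n,
				  struct toy_dst *old_dst,
				  struct toy_dst *new_dst)
{
	size_t i, moved = 0;

	for (i = 0; i < n; i++) {
		if (conns[i].dst != old_dst)
			continue;
		toy_dst_hold(new_dst);
		toy_dst_release(old_dst);
		conns[i].dst = new_dst;
		conns[i].cached_next_hop = new_dst->next_hop;
		moved++;
	}
	return moved;
}
```

Connections that use an unrelated dst entry are left alone, which mirrors the point that only connections using the old next-hop information need updating.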
* Re: [PATCH Round 4 2/3] Core network changes to support network event notification.
2006-07-26 16:15 ` Steve Wise
@ 2006-07-26 20:56 ` David Miller
0 siblings, 0 replies; 9+ messages in thread
From: David Miller @ 2006-07-26 20:56 UTC (permalink / raw)
To: swise; +Cc: herbert, rdreier, openib-general, netdev

From: Steve Wise <swise@opengridcomputing.com>
Date: Wed, 26 Jul 2006 11:15:43 -0500

> Dave, what do you think about removing the user-space stuff for the
> first round of integration? IE: Just add netevents and kernel hooks to
> generate them.

Sure.
* [PATCH Round 4 3/3] Cleanup ib_addr module to use the netevent patch.
2006-07-18 18:48 [PATCH Round 4 0/3][RFC] Network Event Notifier Mechanism Steve Wise
2006-07-18 18:48 ` [PATCH Round 4 1/3] " Steve Wise
2006-07-18 18:49 ` [PATCH Round 4 2/3] Core network changes to support network event notification Steve Wise
@ 2006-07-18 18:49 ` Steve Wise
2 siblings, 0 replies; 9+ messages in thread
From: Steve Wise @ 2006-07-18 18:49 UTC (permalink / raw)
To: davem, rdreier; +Cc: openib-general, netdev

---

 drivers/infiniband/core/addr.c |   30 ++++++++++++++----------------
 1 files changed, 14 insertions(+), 16 deletions(-)

diff --git a/drivers/infiniband/core/addr.c b/drivers/infiniband/core/addr.c
index d294bbc..1205e80 100644
--- a/drivers/infiniband/core/addr.c
+++ b/drivers/infiniband/core/addr.c
@@ -35,6 +35,7 @@ #include <linux/if_arp.h>
 #include <net/arp.h>
 #include <net/neighbour.h>
 #include <net/route.h>
+#include <net/netevent.h>
 #include <rdma/ib_addr.h>

 MODULE_AUTHOR("Sean Hefty");
@@ -326,25 +327,22 @@ void rdma_addr_cancel(struct rdma_dev_ad
 }
 EXPORT_SYMBOL(rdma_addr_cancel);

-static int addr_arp_recv(struct sk_buff *skb, struct net_device *dev,
-			 struct packet_type *pkt, struct net_device *orig_dev)
+static int netevent_callback(struct notifier_block *self, unsigned long event,
+	void *ctx)
 {
-	struct arphdr *arp_hdr;
+	if (event == NETEVENT_NEIGH_UPDATE) {
+		struct neighbour *neigh = ctx;

-	arp_hdr = (struct arphdr *) skb->nh.raw;
-
-	if (arp_hdr->ar_op == htons(ARPOP_REQUEST) ||
-	    arp_hdr->ar_op == htons(ARPOP_REPLY))
-		set_timeout(jiffies);
-
-	kfree_skb(skb);
+		if (neigh->dev->type == ARPHRD_INFINIBAND &&
+		    (neigh->nud_state & NUD_VALID)) {
+			set_timeout(jiffies);
+		}
+	}
 	return 0;
 }

-static struct packet_type addr_arp = {
-	.type = __constant_htons(ETH_P_ARP),
-	.func = addr_arp_recv,
-	.af_packet_priv = (void*) 1,
+static struct notifier_block nb = {
+	.notifier_call = netevent_callback
 };

 static int addr_init(void)
@@ -353,13 +351,13 @@ static int addr_init(void)
 	if (!addr_wq)
 		return -ENOMEM;

-	dev_add_pack(&addr_arp);
+	register_netevent_notifier(&nb);
 	return 0;
 }

 static void addr_cleanup(void)
 {
-	dev_remove_pack(&addr_arp);
+	unregister_netevent_notifier(&nb);
 	destroy_workqueue(addr_wq);
 }

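
For readers unfamiliar with the notifier pattern the patch switches to, the following user-space C sketch models the flow: a callback is registered on a chain, the chain is invoked with an event code and a context pointer, and the callback acts only on neighbour updates for a valid InfiniBand neighbour. Everything here is a hypothetical stand-in (`toy_notifier`, `toy_register`, `toy_unregister`, `toy_call_chain`, `toy_neigh`, and the `TOY_*` event codes); it is not the kernel's notifier API and does not reproduce the netevent header from patch 1/3.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event codes; the real ones are defined by the patch. */
enum toy_event { TOY_NEIGH_UPDATE = 1, TOY_REDIRECT, TOY_PMTU_UPDATE };

/* Hypothetical notifier block: a callback plus a link to the next one. */
struct toy_notifier {
	int (*call)(struct toy_notifier *self, unsigned long event, void *ctx);
	struct toy_notifier *next;
};

static struct toy_notifier *toy_chain;

/* Add a notifier at the head of the chain. */
static void toy_register(struct toy_notifier *nb)
{
	assert(nb != NULL && nb->call != NULL);
	nb->next = toy_chain;
	toy_chain = nb;
}

/* Unlink a notifier from the chain if it is present. */
static void toy_unregister(struct toy_notifier *nb)
{
	struct toy_notifier **p;

	for (p = &toy_chain; *p; p = &(*p)->next) {
		if (*p == nb) {
			*p = nb->next;
			nb->next = NULL;
			return;
		}
	}
}

/* Deliver one event to every registered notifier. */
static void toy_call_chain(unsigned long event, void *ctx)
{
	struct toy_notifier *nb;

	for (nb = toy_chain; nb; nb = nb->next)
		nb->call(nb, event, ctx);
}

/* Toy stand-in for the neighbour fields the patch's callback inspects. */
struct toy_neigh {
	int dev_is_infiniband;
	int valid;
};

/* Counts how often the callback would have rescheduled address work. */
static int toy_timeouts;

/* Mirrors the filter in the patch: neighbour update, IB device, valid. */
static int toy_callback(struct toy_notifier *self, unsigned long event,
			void *ctx)
{
	(void)self;
	if (event == TOY_NEIGH_UPDATE) {
		struct toy_neigh *n = ctx;

		if (n->dev_is_infiniband && n->valid)
			toy_timeouts++;
	}
	return 0;
}
```

Compared with the packet-type hook being removed, the callback no longer sees every ARP packet; it is invoked only when an event is published on the chain and then filters on the neighbour's device type and state.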
end of thread, other threads:[~2006-07-26 20:56 UTC | newest]

Thread overview: 9+ messages
2006-07-18 18:48 [PATCH Round 4 0/3][RFC] Network Event Notifier Mechanism Steve Wise
2006-07-18 18:48 ` [PATCH Round 4 1/3] " Steve Wise
2006-07-18 18:49 ` [PATCH Round 4 2/3] Core network changes to support network event notification Steve Wise
2006-07-25 7:39   ` Herbert Xu
2006-07-25 15:05     ` Steve Wise
2006-07-26 3:39       ` Herbert Xu
2006-07-26 16:15         ` Steve Wise
2006-07-26 20:56           ` David Miller
2006-07-18 18:49 ` [PATCH Round 4 3/3] Cleanup ib_addr module to use the netevent patch Steve Wise