* [PATCH 0/2][RFC] Network Event Notifier Mechanism
@ 2006-06-21 18:45 Steve Wise
From: Steve Wise @ 2006-06-21 18:45 UTC (permalink / raw)
To: netdev
This patch implements a mechanism that allows interested clients to
register for notification of certain network events. The intended use
is to allow RDMA devices (linux/drivers/infiniband) to be notified of
neighbour updates, ICMP redirects, path MTU changes, and route changes.
These devices need update events because they typically cache this
information in hardware and must be notified whenever it changes.
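
For illustration, a kernel client such as an RDMA driver could hook these
events roughly as follows. This is only a minimal sketch against the API
introduced in patch 1/2; the handler and the my_refresh_hw_neigh() helper
are invented names, not part of the patches:

#include <linux/notifier.h>
#include <net/neighbour.h>
#include <net/netevent.h>

/* Called on the netevent chain whenever the core stack reports an event. */
static int my_netevent_handler(struct notifier_block *nb,
                               unsigned long event, void *ctx)
{
        switch (event) {
        case NETEVENT_NEIGH_UPDATE:
                /* ctx is the struct neighbour that changed; refresh any
                 * cached dmac/state the hardware derived from it. */
                my_refresh_hw_neigh((struct neighbour *)ctx);
                break;
        default:
                break;
        }
        return NOTIFY_DONE;
}

static struct notifier_block my_netevent_nb = {
        .notifier_call = my_netevent_handler,
};

/* At driver load:   register_netevent_notifier(&my_netevent_nb);
 * at teardown:      unregister_netevent_notifier(&my_netevent_nb); */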
This approach is one of many possibilities and may be preferred because it
uses an existing notification mechanism that has precedent in the stack.
An alternative would be to add a netdev method to notify affected devices
of these events.
This code does not yet implement path MTU change because the number of
places in which this value is updated is large and if this mechanism
seems reasonable, it would probably be best to funnel these updates
through a single function.
We would like to get this or similar functionality included in 2.6.19
and request comments.
This patchset consists of 2 patches:
1) New files implementing the Network Event Notifier
2) Core network changes to generate network event notifications
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
* [PATCH 1/2] Network Event Notifier Mechanism
From: Steve Wise @ 2006-06-21 18:45 UTC (permalink / raw)
To: netdev

This patch uses notifier blocks to implement a network event notifier
mechanism.  Clients register their callback function by calling
register_netevent_notifier() like this:

static struct notifier_block nb = {
        .notifier_call = my_callback_func
};
...
register_netevent_notifier(&nb);

---

 include/net/netevent.h |   41 +++++++++++++++++++++++++++++
 net/core/netevent.c    |   67 ++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 108 insertions(+), 0 deletions(-)

diff --git a/include/net/netevent.h b/include/net/netevent.h
new file mode 100644
index 0000000..9ceab27
--- /dev/null
+++ b/include/net/netevent.h
@@ -0,0 +1,41 @@
+#ifndef _NET_EVENT_H
+#define _NET_EVENT_H
+
+/*
+ *	Generic netevent notifiers
+ *
+ *	Authors:
+ *		Tom Tucker <tom@opengridcomputing.com>
+ *
+ *	Changes:
+ */
+
+#ifdef __KERNEL__
+
+#include <net/dst.h>
+
+struct netevent_redirect {
+	struct dst_entry *old;
+	struct dst_entry *new;
+};
+
+struct netevent_route_change {
+	int event;
+	struct fib_info *fib_info;
+};
+
+enum netevent_notif_type {
+	NETEVENT_NEIGH_UPDATE = 1,	/* arg is struct neighbour * */
+	NETEVENT_ROUTE_UPDATE,		/* arg is struct netevent_route_change * */
+	NETEVENT_PMTU_UPDATE,
+	NETEVENT_REDIRECT,		/* arg is struct netevent_redirect * */
+};
+
+extern int register_netevent_notifier(struct notifier_block *nb);
+extern int unregister_netevent_notifier(struct notifier_block *nb);
+extern int call_netevent_notifiers(unsigned long val, void *v);
+
+#endif
+#endif
diff --git a/net/core/netevent.c b/net/core/netevent.c
new file mode 100644
index 0000000..2261fb3
--- /dev/null
+++ b/net/core/netevent.c
@@ -0,0 +1,67 @@
+/*
+ *	Network event notifiers
+ *
+ *	Authors:
+ *		Tom Tucker <tom@opengridcomputing.com>
+ *
+ *	This program is free software; you can redistribute it and/or
+ *	modify it under the terms of the GNU General Public License
+ *	as published by the Free Software Foundation; either version
+ *	2 of the License, or (at your option) any later version.
+ *
+ *	Fixes:
+ */
+
+#include <linux/rtnetlink.h>
+#include <linux/notifier.h>
+
+static struct atomic_notifier_head netevent_notif_chain;
+
+/**
+ *	register_netevent_notifier - register a netevent notifier block
+ *	@nb: notifier
+ *
+ *	Register a notifier to be called when a netevent occurs.
+ *	The notifier passed is linked into the kernel structures and must
+ *	not be reused until it has been unregistered.  A negative errno code
+ *	is returned on a failure.
+ */
+int register_netevent_notifier(struct notifier_block *nb)
+{
+	int err;
+
+	err = atomic_notifier_chain_register(&netevent_notif_chain, nb);
+	return err;
+}
+
+/**
+ *	unregister_netevent_notifier - unregister a netevent notifier block
+ *	@nb: notifier
+ *
+ *	Unregister a notifier previously registered by
+ *	register_netevent_notifier().  The notifier is unlinked from the
+ *	kernel structures and may then be reused.  A negative errno code
+ *	is returned on a failure.
+ */
+int unregister_netevent_notifier(struct notifier_block *nb)
+{
+	return atomic_notifier_chain_unregister(&netevent_notif_chain, nb);
+}
+
+/**
+ *	call_netevent_notifiers - call all netevent notifier blocks
+ *	@val: value passed unmodified to notifier function
+ *	@v:   pointer passed unmodified to notifier function
+ *
+ *	Call all netevent notifier blocks.  Parameters and return value
+ *	are as for notifier_call_chain().
+ */
+int call_netevent_notifiers(unsigned long val, void *v)
+{
+	return atomic_notifier_call_chain(&netevent_notif_chain, val, v);
+}
+
+EXPORT_SYMBOL_GPL(register_netevent_notifier);
+EXPORT_SYMBOL_GPL(unregister_netevent_notifier);
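
To sketch the consumer side of the redirect and route events defined above
(the callback and the two my_fixup_*() helpers are hypothetical; only the
types and event values come from netevent.h):

static int my_netevent_cb(struct notifier_block *nb,
                          unsigned long event, void *ctx)
{
        if (event == NETEVENT_REDIRECT) {
                struct netevent_redirect *nr = ctx;

                /* nr->old is the dst_entry being replaced, nr->new the one
                 * installed by the ICMP redirect; a driver would re-resolve
                 * its cached next hop from nr->new. */
                my_fixup_next_hop(nr->old, nr->new);
        } else if (event == NETEVENT_ROUTE_UPDATE) {
                struct netevent_route_change *rc = ctx;

                /* rc->event is the rtnetlink event, rc->fib_info the
                 * affected route. */
                my_fixup_route(rc->event, rc->fib_info);
        }
        return NOTIFY_DONE;
}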
* [PATCH 2/2] Core network changes to support network event notification
From: Steve Wise @ 2006-06-21 18:45 UTC (permalink / raw)
To: netdev

This patch adds event calls for neighbour change, route update, and routing
redirect events.

TODO: PMTU change events.

---

 net/core/Makefile        |    2 +-
 net/core/neighbour.c     |    8 ++++++++
 net/ipv4/fib_semantics.c |    7 +++++++
 net/ipv4/route.c         |    6 ++++++
 4 files changed, 22 insertions(+), 1 deletions(-)

diff --git a/net/core/Makefile b/net/core/Makefile
index e9bd246..2645ba4 100644
--- a/net/core/Makefile
+++ b/net/core/Makefile
@@ -7,7 +7,7 @@ obj-y := sock.o request_sock.o skbuff.o
 
 obj-$(CONFIG_SYSCTL) += sysctl_net_core.o
 
-obj-y += dev.o ethtool.o dev_mcast.o dst.o \
+obj-y += dev.o ethtool.o dev_mcast.o dst.o netevent.o \
	neighbour.o rtnetlink.o utils.o link_watch.o filter.o
 
 obj-$(CONFIG_XFRM) += flow.o
diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index 50a8c73..c637897 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -30,9 +30,11 @@ #include <linux/times.h>
 #include <net/neighbour.h>
 #include <net/dst.h>
 #include <net/sock.h>
+#include <net/netevent.h>
 #include <linux/rtnetlink.h>
 #include <linux/random.h>
 #include <linux/string.h>
+#include <linux/notifier.h>
 
 #define NEIGH_DEBUG 1
@@ -755,6 +757,7 @@ #endif
 			neigh->nud_state = NUD_STALE;
 			neigh->updated = jiffies;
 			neigh_suspect(neigh);
+			call_netevent_notifiers(NETEVENT_NEIGH_UPDATE, neigh);
 		}
 	} else if (state & NUD_DELAY) {
 		if (time_before_eq(now,
@@ -763,6 +766,7 @@ #endif
 			neigh->nud_state = NUD_REACHABLE;
 			neigh->updated = jiffies;
 			neigh_connect(neigh);
+			call_netevent_notifiers(NETEVENT_NEIGH_UPDATE, neigh);
 			next = neigh->confirmed + neigh->parms->reachable_time;
 		} else {
 			NEIGH_PRINTK2("neigh %p is probed.\n", neigh);
@@ -783,6 +787,7 @@ #endif
 		neigh->nud_state = NUD_FAILED;
 		neigh->updated = jiffies;
 		notify = 1;
+		call_netevent_notifiers(NETEVENT_NEIGH_UPDATE, neigh);
 		NEIGH_CACHE_STAT_INC(neigh->tbl, res_failed);
 		NEIGH_PRINTK2("neigh %p is failed.\n", neigh);
@@ -1056,6 +1061,9 @@ out:
 			(neigh->flags | NTF_ROUTER) :
 			(neigh->flags & ~NTF_ROUTER);
 	}
+
+	call_netevent_notifiers(NETEVENT_NEIGH_UPDATE, neigh);
+
 	write_unlock_bh(&neigh->lock);
 #ifdef CONFIG_ARPD
 	if (notify && neigh->parms->app_probes)
diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
index 0f4145b..67a30af 100644
--- a/net/ipv4/fib_semantics.c
+++ b/net/ipv4/fib_semantics.c
@@ -45,6 +45,7 @@ #include <net/tcp.h>
 #include <net/sock.h>
 #include <net/ip_fib.h>
 #include <net/ip_mp_alg.h>
+#include <net/netevent.h>
 
 #include "fib_lookup.h"
@@ -278,9 +279,15 @@ void rtmsg_fib(int event, u32 key, struc
	       struct nlmsghdr *n, struct netlink_skb_parms *req)
 {
	struct sk_buff *skb;
+	struct netevent_route_change rev;
+
	u32 pid = req ? req->pid : n->nlmsg_pid;
	int size = NLMSG_SPACE(sizeof(struct rtmsg)+256);
 
+	rev.event = event;
+	rev.fib_info = fa->fa_info;
+	call_netevent_notifiers(NETEVENT_ROUTE_UPDATE, &rev);
+
	skb = alloc_skb(size, GFP_KERNEL);
	if (!skb)
		return;
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index cc9423d..e9ba831 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -105,6 +105,7 @@ #include <net/tcp.h>
 #include <net/icmp.h>
 #include <net/xfrm.h>
 #include <net/ip_mp_alg.h>
+#include <net/netevent.h>
 #ifdef CONFIG_SYSCTL
 #include <linux/sysctl.h>
 #endif
@@ -1120,6 +1121,7 @@ void ip_rt_redirect(u32 old_gw, u32 dadd
	struct rtable *rth, **rthp;
	u32  skeys[2] = { saddr, 0 };
	int  ikeys[2] = { dev->ifindex, 0 };
+	struct netevent_redirect netevent;
 
	if (!in_dev)
		return;
@@ -1211,6 +1213,10 @@ void ip_rt_redirect(u32 old_gw, u32 dadd
				rt_drop(rt);
				goto do_next;
			}
+
+			netevent.old = &rth->u.dst;
+			netevent.new = &rt->u.dst;
+			call_netevent_notifiers(NETEVENT_REDIRECT, &netevent);
 
			rt_del(hash, rth);
			if (!rt_intern_hash(hash, rt, &rt))
* Re: [PATCH 0/2][RFC] Network Event Notifier Mechanism
From: YOSHIFUJI Hideaki / 吉藤英明 @ 2006-06-21 19:08 UTC (permalink / raw)
To: swise; +Cc: netdev, yoshfuji

In article <20060621184519.10425.69175.stgit@stevo-desktop> (at Wed, 21 Jun 2006 13:45:19 -0500), Steve Wise <swise@opengridcomputing.com> says:

> This patch implements a mechanism that allows interested clients to register for notification of certain network events. The intended use is to allow RDMA devices (linux/drivers/infiniband) to be notified of neighbour updates, ICMP redirects, path MTU changes, and route changes.

Why not netlink? Neighbor / routing updates should be transmitted via netlink, at least.

--yoshfuji
* Re: [PATCH 0/2][RFC] Network Event Notifier Mechanism
From: David Miller @ 2006-06-22 8:57 UTC (permalink / raw)
To: swise; +Cc: netdev

From: Steve Wise <swise@opengridcomputing.com>
Date: Wed, 21 Jun 2006 13:45:19 -0500

> This patch implements a mechanism that allows interested clients to register for notification of certain network events.

We have a generic network event notification facility called netlink, please use it and extend it for your needs if necessary.
* Re: [PATCH 0/2][RFC] Network Event Notifier Mechanism
From: Steve Wise @ 2006-06-22 13:53 UTC (permalink / raw)
To: David Miller; +Cc: netdev

On Thu, 2006-06-22 at 01:57 -0700, David Miller wrote:
> From: Steve Wise <swise@opengridcomputing.com>
> Date: Wed, 21 Jun 2006 13:45:19 -0500
>
> > This patch implements a mechanism that allows interested clients to register for notification of certain network events.
>
> We have a generic network event notification facility called netlink, please use it and extend it for your needs if necessary.

I'll investigate this.

Thanks,

Steve.
* Re: [PATCH 0/2][RFC] Network Event Notifier Mechanism
From: Steve Wise @ 2006-06-22 15:27 UTC (permalink / raw)
To: David Miller; +Cc: netdev

On Thu, 2006-06-22 at 08:53 -0500, Steve Wise wrote:
> On Thu, 2006-06-22 at 01:57 -0700, David Miller wrote:
> > From: Steve Wise <swise@opengridcomputing.com>
> > Date: Wed, 21 Jun 2006 13:45:19 -0500
> >
> > > This patch implements a mechanism that allows interested clients to register for notification of certain network events.
> >
> > We have a generic network event notification facility called netlink, please use it and extend it for your needs if necessary.
>
> I'll investigate this.
>
> Thanks,

The in-kernel Infiniband subsystem needs to know when certain events happen. For example, if the mac address of a neighbour changes. Any rdma devices that are using said neighbour need to be notified of the change. You are asking that I extend the netlink facility (if necessary) to provide this functionality.

Are you suggesting, then, that the Infiniband subsystem should create an in-kernel NETLINK socket and obtain these events (and the pertinent information) via the socket?

I'm still learning about netlink, but my understanding to date is that it's a way to pass events/commands between the kernel and user applications. It perhaps seems overkill to use this mechanism for kernel->kernel event notifications. That's why I started with notifier blocks and added a netevent_notifier mechanism.

Any help is greatly appreciated. Sorry if I'm being dense...

Steve.
* Re: [PATCH 0/2][RFC] Network Event Notifier Mechanism
From: jamal @ 2006-06-22 19:43 UTC (permalink / raw)
To: Steve Wise; +Cc: netdev, David Miller

On Thu, 2006-22-06 at 10:27 -0500, Steve Wise wrote:
>
> The in-kernel Infiniband subsystem needs to know when certain events happen. For example, if the mac address of a neighbour changes. Any rdma devices that are using said neighbour need to be notified of the change. You are asking that I extend the netlink facility (if necessary) to provide this functionality.
>

No - what these 2 gents are saying is that these events and infrastructure already exist. If there are some events that don't, then extend what already exists. Your patch was a serious reinvention of the wheel (and in the case of the neighbor code it looks very wrong).

As an example, search for NETDEV_CHANGEADDR, NETDEV_CHANGEMTU etc. Actually you are probably making this too complicated. Listen to events in user space and tell infiniband from user space.

cheers,
jamal
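
(For reference, the netdevice notifier pattern jamal points to looks roughly
like the sketch below in a 2.6-era kernel. The handler name is invented, and
the assumption that the callback's void * argument is the struct net_device
itself reflects kernels of that vintage. Note these events cover device-level
changes such as the netdev's own MAC address or MTU, not per-neighbour or
per-route state.)

#include <linux/netdevice.h>
#include <linux/notifier.h>

static int my_netdev_event(struct notifier_block *nb,
                           unsigned long event, void *ptr)
{
        struct net_device *dev = ptr;

        switch (event) {
        case NETDEV_CHANGEADDR:
                /* dev->dev_addr now holds the new MAC; refresh cached
                 * source MACs programmed into the hardware. */
                break;
        case NETDEV_CHANGEMTU:
                /* dev->mtu has changed; recheck per-connection limits. */
                break;
        }
        return NOTIFY_DONE;
}

static struct notifier_block my_netdev_nb = {
        .notifier_call = my_netdev_event,
};

/* register_netdevice_notifier(&my_netdev_nb) at init,
 * unregister_netdevice_notifier(&my_netdev_nb) at exit. */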
* Re: [PATCH 0/2][RFC] Network Event Notifier Mechanism
From: Steve Wise @ 2006-06-22 20:18 UTC (permalink / raw)
To: hadi; +Cc: netdev, David Miller

On Thu, 2006-06-22 at 15:43 -0400, jamal wrote:
> On Thu, 2006-22-06 at 10:27 -0500, Steve Wise wrote:
> >
> > The in-kernel Infiniband subsystem needs to know when certain events happen. For example, if the mac address of a neighbour changes. Any rdma devices that are using said neighbour need to be notified of the change. You are asking that I extend the netlink facility (if necessary) to provide this functionality.
> >
>
> No - what these 2 gents are saying was these events and infrastructure already exist. If there are some events that dont and you need to extend what already exists. Your patch was a serious reinvention of the wheel (and in the case of the neighbor code looking very wrong).

ok.

> As an example, search for NETDEV_CHANGEADDR,NETDEV_CHANGEMTU etc.
> Actually you are probably making this too complicated.

NETDEV_CHANGEADDR uses a notifier block, and the network subsystem calls call_netdevice_notifiers() when it sets an addr. And any kernel module can register for these events. That's the model I used to create the netevent_notifier mechanism in the patch I posted.

I could add the new events to this netdevice notifier, but these aren't really net device events. They're network events.

> Listen to events in user space and tell infiniband from user space.

I can indeed extend the rtnetlink stuff to add the events in question (neighbour mac addr change, route redirect, etc). In fact, there is similar functionality under the CONFIG_ARPD option to support a user space arp daemon. It's not quite the same, and it doesn't cover redirect and routing events, just neighbour events.

But in the case of the RDMA subsystem, the consumer of these events is in the kernel. Why is it better to propagate events all the way up to user space, then send the event back down into the Infiniband kernel subsystem? That seems very inefficient.

Steve.
* Re: [PATCH 0/2][RFC] Network Event Notifier Mechanism
From: jamal @ 2006-06-22 20:36 UTC (permalink / raw)
To: Steve Wise; +Cc: David Miller, netdev

On Thu, 2006-22-06 at 15:18 -0500, Steve Wise wrote:
> On Thu, 2006-06-22 at 15:43 -0400, jamal wrote:
> > As an example, search for NETDEV_CHANGEADDR,NETDEV_CHANGEMTU etc.
> > Actually you are probably making this too complicated.
>
> NETDEV_CHANGEADDR uses a notifier block, and the network subsystem calls call_netdevice_notifiers() when it sets an addr. And any kernel module can register for these events. That's the model I used to create the netevent_notifier mechanism in the patch I posted.
>

it also gets emitted as a netlink event.

> I could add the new events to this netdevice notifier, but these aren't really net device events. They're network events.
>

Different blocks for sure - the point is that the infrastructure for using notifiers exists. And it is joined at the hip with netlink.

> I can indeed extend the rtnetlink stuff to add the events in question (neighbour mac addr change, route redirect, etc). In fact, there is similar functionality under the CONFIG_ARPD option to support a user space arp daemon. Its not quite the same, and it doesn't cover redirect and routing events, just neighbour events.
>

CONFIG_ARPD will give you all the neighbor events you need. rt_redirect doesn't exist; neither do route cache creation/update/deletion events. FIB changes exist, etc.

> But in the case of the RDMA subsystem, the consumer of these events is in the kernel. Why is it better to propagate events all the way up to user space, then send the event back down into the Infiniband kernel subsystem? That seems very inefficient.

Your mileage may vary. If you do it in user space you don't have to wait for the next kernel release in case of a bug. Additionally, it allows for more feature richness that would tend to bloat the kernel/infiniband otherwise.

Out of curiosity - what does an RDMA NIC have that would need these events? A route table or L2 table etc? Can you elucidate a little?

cheers,
jamal
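
(For reference, the user-space side being suggested amounts to a NETLINK_ROUTE
socket subscribed to the relevant multicast groups. A minimal sketch for
neighbour events follows, with error handling and attribute parsing omitted;
RTMGRP_NEIGH carries the RTM_NEWNEIGH/RTM_DELNEIGH messages referred to above.)

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

int main(void)
{
        struct sockaddr_nl addr;
        char buf[8192];
        int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);

        if (fd < 0)
                return 1;

        memset(&addr, 0, sizeof(addr));
        addr.nl_family = AF_NETLINK;
        addr.nl_groups = RTMGRP_NEIGH;  /* neighbour add/change/delete events */
        if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
                return 1;

        for (;;) {
                int len = recv(fd, buf, sizeof(buf), 0);
                struct nlmsghdr *nh;

                if (len < 0)
                        continue;
                for (nh = (struct nlmsghdr *)buf; NLMSG_OK(nh, len);
                     nh = NLMSG_NEXT(nh, len)) {
                        if (nh->nlmsg_type == RTM_NEWNEIGH)
                                printf("neighbour entry updated\n");
                }
        }
        return 0;
}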
* Re: [PATCH 0/2][RFC] Network Event Notifier Mechanism
From: Steve Wise @ 2006-06-22 20:58 UTC (permalink / raw)
To: hadi; +Cc: David Miller, netdev

On Thu, 2006-06-22 at 16:36 -0400, jamal wrote:
> On Thu, 2006-22-06 at 15:18 -0500, Steve Wise wrote:
> > On Thu, 2006-06-22 at 15:43 -0400, jamal wrote:
> > > As an example, search for NETDEV_CHANGEADDR,NETDEV_CHANGEMTU etc.
> > > Actually you are probably making this too complicated.
> >
> > NETDEV_CHANGEADDR uses a notifier block, and the network subsystem calls call_netdevice_notifiers() when it sets an addr. And any kernel module can register for these events. That's the model I used to create the netevent_notifier mechanism in the patch I posted.
> >
>
> it also gets emmited as a netlink event.

right.

> > I could add the new events to this netdevice notifier, but these aren't really net device events. Their network events.
> >
>
> Different blocks for sure - the point is the infrastructure which constitutes using notifiers exists. And it is joined at the hip with netlink.

I created a new notifier block in my patch for these network events. I guess I thought I was using the existing infrastructure to provide this notification service. (I thought my patch was lovely :) But I didn't integrate with netlink for user space notification. Mainly cuz I didn't think these events should be propagated up to users unless there was a need.

> > I can indeed extend the rtnetlink stuff to add the events in question (neighbour mac addr change, route redirect, etc). In fact, there is similar functionality under the CONFIG_ARPD option to support a user space arp daemon. Its not quite the same, and it doesn't cover redirect and routing events, just neighbour events.
> >
>
> CONFIG_ARPD will give you all neighbor events you need. => rt_redirect doesnt exist neither do route cache creation/updates/deletions. FIB changes exist etc

Just to clarify, you're suggesting I add any needed netlink hooks for rt_redirect and the others that don't exist today, and use a NETLINK socket in user space to discover these events. Yes?

> > But in the case of the RDMA subsystem, the consumer of these events is in the kernel. Why is it better to propagate events all the way up to user space, then send the event back down into the Infiniband kernel subsystem? That seems very inefficient.
>
> Your mileage may vary. If you do it in user space you dont have to wait for the next kernel release in case of a bug.

As long as all the events are passed up correctly :-)

> Additionally, it allows for more feature richness that would tend to bloat the kernel/infiniband otherwise.

Another issue I see with netlink is that the event notifications aren't reliable. Especially the CONFIG_ARPD stuff, because it allocs an sk_buff with ATOMIC. A lost neighbour macaddr change is perhaps fatal for an RDMA connection...

> Out of curiosity - what does RDMA NIC have that would need these events? a route table or L2 table etc? Can you elucidate a little?

Mainly the L2 table, next hop ip addr, and the path mtu. RDMA NICs implement the entire RDMA stack in HW. How they deal with L2 and L3 changes varies to some degree, but what seems to be emerging is that they get this information from the native stack because ARP and ICMP, for example, are always passed up to the native stack.

These devices also act as a standard Ethernet NIC btw...

Steve.
* Re: [PATCH 0/2][RFC] Network Event Notifier Mechanism
From: jamal @ 2006-06-22 22:14 UTC (permalink / raw)
To: Steve Wise; +Cc: netdev, David Miller

On Thu, 2006-22-06 at 15:58 -0500, Steve Wise wrote:
> On Thu, 2006-06-22 at 16:36 -0400, jamal wrote:
>
> I created a new notifier block in my patch for these network events. I guess I thought I was using the existing infrastructure to provide this notification service. (I thought my patch was lovely :) But I didn't integrate with netlink for user space notification. Mainly cuz I didn't think these events should be propagated up to users unless there was a need.

I think they will be useful in user space. Typically you only propagate them if there's a user space program subscribed to listening (there are hooks which will tell you if there's anyone listening).

The netdevice events tend to be a lot more usable in a few other blocks because they are lower in the hierarchy within the kernel (i.e. routing depends on ip addresses which depend on netdevices), unlike in this case where you are the only consumer; so it does sound logical to me to do it in user space. However, it is not totally unreasonable to do it in the kernel.

> Just to clarify, you're suggesting I add any needed netlink hooks for rt_redirect and the others that don't exist today, and use a NETLINK socket in user space to discover these events. Yes?
>

indeed.

> > Your mileage may vary. If you do it in user space you dont have to wait for the next kernel release in case of a bug.
>
> As long as all the events are passed up correctly :-)
>

They have been for years ;->

> > Additionally, it allows for more feature richness that would tend to bloat the kernel/infiniband otherwise.
>
> Another issue I see with netlink is that the event notifications aren't reliable. Especially the CONFIG_ARPD stuff because it allocs an sk_buff with ATOMIC. A lost neighbour macaddr change is perhaps fatal for an RDMA connection...
>

This would happen in the cases where you are short on memory; I would suspect you will also need to allocate memory in your driver to update something in the hardware - so same problem. You can however work around issues like these in netlink.

> > Out of curiosity - what does RDMA NIC have that would need these events? a route table or L2 table etc? Can you elucidate a little?
> >
>
> Mainly the L2 table, next hop ip addr, and the path mtu. RDMA NICs implement the entire RDMA stack in HW. How they deal with L2 and L3 changes vary to some degree, but what seems to be emerging is that they get this information from the native stack because ARP and ICMP, for example, are always passed up to the native stack.
>

I am still unclear: you have the destination IP address, the dstMAC of the nexthop to get the packet to this IP address, and I suspect some srcMAC address you will use when sending out, as well as the pathMTU to get there, correct? Because of the IP address it sounds to me like you are populating an L3 table.

How is this info used in hardware? Can you explain how an arriving packet would be used by the RDMA in conjunction with this info once it is in the hardware?

> These devices also act a standard Ethernet NIC btw...
>

Meaning there is no funky hardware processing?

cheers,
jamal
* Re: [PATCH 0/2][RFC] Network Event Notifier Mechanism
From: Steve Wise @ 2006-06-23 13:11 UTC (permalink / raw)
To: hadi; +Cc: netdev, David Miller

> > > Out of curiosity - what does RDMA NIC have that would need these events? a route table or L2 table etc? Can you elucidate a little?
> > >
> >
> > Mainly the L2 table, next hop ip addr, and the path mtu. RDMA NICs implement the entire RDMA stack in HW. How they deal with L2 and L3 changes vary to some degree, but what seems to be emerging is that they get this information from the native stack because ARP and ICMP, for example, are always passed up to the native stack.
> >
>
> I am still unclear: You have destination IP address, the dstMAC of the nexthop to get the packet to this IP address and i suspect some srcMAC address you will use sending out as well as the pathMTU to get there correct? Because of the IP address it sounds to me like you are populating an L3 table

I misspoke. The HW I'm using really only maintains a table of next hop mac addrs and a table of src mac addrs. Each active RDMA connection in HW keeps an index into each table for building the ethernet header. The _driver_ needs to know when the next hop mac addr changes, or when the next hop itself changes for a given destination, so that it can update the active connections and/or the L2T table accordingly. Same deal with the path mtu...

> How is this info used in hardware? Can you explain how an arriving packet would be used by the RDMA in conjunction with this info once it is in the hardware?
>

I think my stuff above explains this, eh?

> > These devices also act a standard Ethernet NIC btw...
> >
>
> Meaning there is no funky hardware processing?

If an incoming packet is not for one of the active RDMA connections (or a listening RDMA endpoint), then the packet is passed up to the native stack via the device's netdev driver.

Stevo.
* Re: [PATCH 0/2][RFC] Network Event Notifier Mechanism
From: Steve Wise @ 2006-06-22 20:40 UTC (permalink / raw)
To: hadi; +Cc: netdev, David Miller

On Thu, 2006-06-22 at 15:43 -0400, jamal wrote:
>
> No - what these 2 gents are saying was these events and infrastructure already exist.

Notification of the exact events needed does not exist today.

The key events, again, are:

- the neighbour entry mac address has changed.

- the next hop ip address (ie the neighbour) for a given dst_entry has changed.

- the path mtu for a given dst_entry has changed.

Steve.
* Re: [PATCH 0/2][RFC] Network Event Notifier Mechanism
From: jamal @ 2006-06-22 20:56 UTC (permalink / raw)
To: Steve Wise; +Cc: David Miller, netdev

On Thu, 2006-22-06 at 15:40 -0500, Steve Wise wrote:
> On Thu, 2006-06-22 at 15:43 -0400, jamal wrote:
> >
> > No - what these 2 gents are saying was these events and infrastructure already exist.
>
> Notification of the exact events needed does not exist today.
>

Ok, so you can't even make use of anything that already exists? Or is a subset of what you need already there?

> The key events, again, are:
>
> - the neighbour entry mac address has changed.
>
> - the next hop ip address (ie the neighbour) for a given dst_entry has changed.

I don't see a difference between the above two from an L2 perspective. Are you keeping track of IP addresses? You didn't answer my question in the previous email as to what RDMA needs to keep track of in hardware.

> - the path mtu for a given dst_entry has changed.

Same with this.

cheers,
jamal
* Re: [PATCH 0/2][RFC] Network Event Notifier Mechanism
From: Steve Wise @ 2006-06-23 13:17 UTC (permalink / raw)
To: hadi; +Cc: David Miller, netdev

On Thu, 2006-06-22 at 16:56 -0400, jamal wrote:
> On Thu, 2006-22-06 at 15:40 -0500, Steve Wise wrote:
> > On Thu, 2006-06-22 at 15:43 -0400, jamal wrote:
> > >
> > > No - what these 2 gents are saying was these events and infrastructure already exist.
> >
> > Notification of the exact events needed does not exist today.
> >
>
> Ok, so you cant event make use of anything that already exists? Or is a subset of what you need already there?
>
> > The key events, again, are:
> >
> > - the neighbour entry mac address has changed.
> >
> > - the next hop ip address (ie the neighbour) for a given dst_entry has changed.
>
> I dont see a difference for the above two from an L2 perspective. Are you keeping track of IP addresses?

There is no difference from an L2 perspective, but the RDMA driver needs notification of each so it can correctly manipulate the L2 table in HW and/or the control blocks for the affected active connections.

> You didn't answer my question in the previous email as to what RDMA needs to keep track of in hardware.
>

See my previous email. To reiterate: the HW I'm working on maintains an L2 table, and each active RDMA connection keeps an index into this table. If the mac addr of the next hop changes, then the L2 table gets updated. If the next hop itself changes, then each active connection must be kicked to update its index into the L2 table.

> > - the path mtu for a given dst_entry has changed.
>
> Same with this.

The RDMA HW needs the path mtu for each connection in order to do segmentation.

Steve.