From: Steve Wise <swise@opengridcomputing.com>
To: Herbert Xu <herbert@gondor.apana.org.au>
Cc: davem@davemloft.net, rdreier@cisco.com,
openib-general@openib.org, netdev@vger.kernel.org
Subject: Re: [PATCH Round 4 2/3] Core network changes to support network event notification.
Date: Tue, 25 Jul 2006 10:05:40 -0500
Message-ID: <1153839940.14354.37.camel@stevo-desktop>
In-Reply-To: <E1G5HW5-0005xn-00@gondolin.me.apana.org.au>
On Tue, 2006-07-25 at 17:39 +1000, Herbert Xu wrote:
> Steve Wise <swise@opengridcomputing.com> wrote:
> >
> > Routing redirect events are broadcast as a pair of rtmsgs, RTM_DELROUTE
> > and RTM_NEWROUTE.
>
> This may confuse existing rtnetlink users since you're generating an
> RTM_DELROUTE message that's identical to one triggered by something
> like 'ip route del'.
>
Yea, I didn't really want to create a REDIRECT rtmsg, so I punted. :-)
But they really are seeing a delete followed by an add. That's what the
kernel is doing.
> As you're introducing a completely new RTM_ROUTEUPD type, it might
> be better to attach any information from the existing route that you
> need to the ROUTEUPD message.
Yea, the main change is the next hop ip address or gateway field.
>
> Actually, what was the reason you need the existing route here?
>
The rdma driver needs to update all established rdma connections that
are using the next-hop information of the existing route and make them
use the next-hop information of the new route. In addition, the rdma
driver might have a reference to the old dst entry. So it can release
that ref and add a ref to the new dst entry.
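
To make the redirect handling described above concrete, here is a minimal
userspace sketch. It is a model only, not driver or kernel code: the types
rd_dst, rd_conn and rd_redirect and the helpers rd_hold(), rd_release() and
rd_handle_redirect() are hypothetical stand-ins for a dst entry, an
established rdma connection, the old/new pair carried by the redirect event,
and the refcount helpers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a dst entry: a refcount plus next-hop address. */
struct rd_dst {
	int refcnt;
	unsigned int gateway;
};

/* Hypothetical stand-in for the old/new pair delivered with a redirect. */
struct rd_redirect {
	struct rd_dst *old;
	struct rd_dst *new_dst;
};

/* Hypothetical stand-in for one established connection. */
struct rd_conn {
	struct rd_dst *dst;
	unsigned int next_hop;
};

static void rd_hold(struct rd_dst *d)
{
	d->refcnt++;
}

static void rd_release(struct rd_dst *d)
{
	assert(d->refcnt > 0);
	d->refcnt--;
}

/*
 * Move every connection that uses the old dst over to the new one.
 * The new reference is taken before the old one is dropped.
 * Returns the number of connections that were moved.
 */
static int rd_handle_redirect(struct rd_conn *conns, size_t n,
			      const struct rd_redirect *ev)
{
	int moved = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (conns[i].dst != ev->old)
			continue;
		rd_hold(ev->new_dst);
		rd_release(ev->old);
		conns[i].dst = ev->new_dst;
		conns[i].next_hop = ev->new_dst->gateway;
		moved++;
	}
	return moved;
}
```

Connections that point at an unrelated dst are left alone, which is why the
event needs to carry both the old and the new entry.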
> > diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
> > index 5f87533..33d8a83 100644
> > --- a/net/ipv4/fib_semantics.c
> > +++ b/net/ipv4/fib_semantics.c
> > @@ -44,6 +44,7 @@ #include <net/tcp.h>
> > #include <net/sock.h>
> > #include <net/ip_fib.h>
> > #include <net/ip_mp_alg.h>
> > +#include <net/netevent.h>
> >
> > #include "fib_lookup.h"
> >
> > @@ -279,6 +280,14 @@ void rtmsg_fib(int event, u32 key, struc
> >  	struct sk_buff *skb;
> >  	u32 pid = req ? req->pid : n->nlmsg_pid;
> >  	int size = NLMSG_SPACE(sizeof(struct rtmsg)+256);
> > +	struct netevent_route_info nri;
> > +	int netevent;
> > +
> > +	nri.family = AF_INET;
> > +	nri.data = &fa->fa_info;
> > +	netevent = event == RTM_NEWROUTE ? NETEVENT_ROUTE_ADD
> > +					  : NETEVENT_ROUTE_DEL;
> > +	call_netevent_notifiers(netevent, &nri);
>
> Hmm, this is broken. These route events are meaningless without the
> corresponding IP rule events. Are you sure you really want to make
> your hardware/driver grok multiple routing tables?
>
> Perhaps you should simply stick to dst entries and flush all your
> tables when the routes are changed. This is what the Linux IP stack
> does.
>
I have to admit I'm a little fuzzy on the routing stuff. The main
netevents I've utilized in the rdma driver I'm writing are the
neighbour update event and the redirect event. Route add/del was added
for completeness of "routing" netevents.

Can you expand further or point me to code where the IP stack "flushes
its tables" when routes are changed?

From my experience, all the rdma driver needs is the dst entry. It
uses the routing table to determine the dst_entry at connection
establishment time. And it needs to know if the next-hop or PMTU ever
changes.
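
As a rough illustration of how the two positions above could fit together,
here is a small userspace model. It is a sketch under assumptions, not kernel
code and not a description of what the IP stack does internally: the enum
ev_kind and the types ev_dst and ev_conn are hypothetical. A PMTU event
updates only the connections that use the affected dst, while any route
add/del simply drops every cached dst so it is looked up again, in the spirit
of the "flush all your tables" suggestion.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event kinds, loosely named after the netevents in the patch. */
enum ev_kind {
	EV_ROUTE_ADD,
	EV_ROUTE_DEL,
	EV_PMTU_UPDATE
};

/* Hypothetical stand-in for a dst entry; only the path MTU is modelled. */
struct ev_dst {
	unsigned int mtu;
};

/* Hypothetical connection state: dst cached at connection setup time. */
struct ev_conn {
	struct ev_dst *dst;	/* NULL means "needs a fresh route lookup" */
	unsigned int mtu;
};

/*
 * Apply one event to the connection table.
 * Returns the number of connections that were touched.
 */
static int ev_dispatch(struct ev_conn *conns, size_t n,
		       enum ev_kind kind, struct ev_dst *dst)
{
	int touched = 0;
	size_t i;

	assert(conns != NULL || n == 0);

	for (i = 0; i < n; i++) {
		switch (kind) {
		case EV_PMTU_UPDATE:
			/* Only connections using this dst care. */
			if (conns[i].dst == dst) {
				conns[i].mtu = dst->mtu;
				touched++;
			}
			break;
		case EV_ROUTE_ADD:
		case EV_ROUTE_DEL:
			/* Do not interpret the route; just invalidate. */
			conns[i].dst = NULL;
			touched++;
			break;
		}
	}
	return touched;
}
```

The route cases never look at the route itself, so this model does not need
to understand IP rules or multiple routing tables.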
> > diff --git a/net/ipv4/route.c b/net/ipv4/route.c
> > index 2dc6dbb..18879e6 100644
> > --- a/net/ipv4/route.c
> > +++ b/net/ipv4/route.c
> > @@ -1117,6 +1120,52 @@ static void rt_del(unsigned hash, struct
> >  	spin_unlock_bh(rt_hash_lock_addr(hash));
> >  }
> >
> > +static void rtm_redirect(struct rtable *old, struct rtable *new)
> > +{
> > +	struct netevent_redirect netevent;
> > +	struct sk_buff *skb;
> > +	int err;
> > +
> > +	netevent.old = &old->u.dst;
> > +	netevent.new = &new->u.dst;
> > +
> > +	/* notify netevent subscribers */
> > +	call_netevent_notifiers(NETEVENT_REDIRECT, &netevent);
> > +
> > +	/* Post NETLINK messages: RTM_DELROUTE for old route,
> > +	   RTM_NEWROUTE for new route */
> > +	skb = alloc_skb(NLMSG_GOODSIZE, GFP_ATOMIC);
>
> Please use a better size estimate rather than NLMSG_GOODSIZE here since
> you're doing GFP_ATOMIC.
>
ok
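
For reference, a tighter estimate can be built from the netlink alignment
rules instead of a page-sized default. The sketch below is a userspace
calculation only. The helpers sz_align4(), sz_nlmsg_space(), sz_rta_space()
and sz_route_msg() are hypothetical local stand-ins that mirror what
NLMSG_SPACE() and RTA_SPACE() compute, and the assumption of four 4-byte
attributes per message is for illustration, since the exact attribute set
the patch emits is not shown here.

```c
#include <assert.h>
#include <stddef.h>

/* Netlink headers, payloads and attributes are padded to 4 bytes. */
static size_t sz_align4(size_t len)
{
	return (len + 3) & ~(size_t)3;
}

/* Space for one message: 16-byte struct nlmsghdr plus aligned payload. */
static size_t sz_nlmsg_space(size_t payload)
{
	return sz_align4(16 + payload);
}

/* Space for one attribute: 4-byte struct rtattr header plus aligned data. */
static size_t sz_rta_space(size_t payload)
{
	return sz_align4(4 + payload);
}

/*
 * Estimate for one route message carrying a 12-byte struct rtmsg and
 * n_u32_attrs attributes of 4 bytes each (for example an IPv4 address
 * or an interface index).
 */
static size_t sz_route_msg(size_t n_u32_attrs)
{
	return sz_nlmsg_space(12) + n_u32_attrs * sz_rta_space(4);
}
```

Under these assumptions one message needs sz_route_msg(4) bytes, and an skb
carrying both the RTM_DELROUTE and the RTM_NEWROUTE message needs twice
that, which is far less than a page-sized allocation.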
> > @@ -1442,6 +1493,32 @@ unsigned short ip_rt_frag_needed(struct
> >  	return est_mtu ? : new_mtu;
> >  }
> >
> > +static void rtm_pmtu_update(struct rtable *rt)
> > +{
> > +	struct sk_buff *skb;
> > +	int err;
> > +
> > +	call_netevent_notifiers(NETEVENT_PMTU_UPDATE, &rt->u.dst);
> > +
> > +	skb = alloc_skb(NLMSG_GOODSIZE, GFP_ATOMIC);
>
> Ditto.
>
ok
Thanks,
Steve.
Thread overview: 9+ messages
2006-07-18 18:48 [PATCH Round 4 0/3][RFC] Network Event Notifier Mechanism Steve Wise
2006-07-18 18:48 ` [PATCH Round 4 1/3] " Steve Wise
2006-07-18 18:49 ` [PATCH Round 4 2/3] Core network changes to support network event notification Steve Wise
2006-07-25 7:39 ` Herbert Xu
2006-07-25 15:05 ` Steve Wise [this message]
2006-07-26 3:39 ` Herbert Xu
2006-07-26 16:15 ` Steve Wise
2006-07-26 20:56 ` David Miller
2006-07-18 18:49 ` [PATCH Round 4 3/3] Cleanup ib_addr module to use the netevent patch Steve Wise