* [Q] How to invalidate ARP cache for a network device from within kernel @ 2010-11-26 17:38 Maxim Levitsky 2010-11-27 1:18 ` Stefan Richter 0 siblings, 1 reply; 8+ messages in thread From: Maxim Levitsky @ 2010-11-26 17:38 UTC (permalink / raw) To: netdev@vger.kernel.org; +Cc: Stefan Richter, linux1394-devel Hi, I am currently dealing with one of the last issues present in implementation of networking over firewire as described in RFC 2734. The driver in question is firewire-net. Here is the description of the problem: To communicate with a remote network node, the sender node must know the target node ID and a bus address within its bus address space. Both things, unlike ethernet MAC, aren't fixed and can change on each bus reset that happens every time a device appears/disappers from a bus and can happen due to few more reasons. The nodes on the bus do have a unique permanent label, a GUID, but it can't be used for addressing. Unfortunately, instead of defining a private ARP like broadcast message that would map a GUID to nodeID/bus address on every bus reset, the RFC 2734 abuses ARP for that. Every ARP response in addition to IP, includes all needed information about the sending node. This sort of works as long as firewire-net driver translates between hardware specific and standard ARP formats. However as soon as bus reset happens, the upper layer ARP cache isn't invalidated, thus all attempts to send packets to remote node now fail, because the additional information (node id and bus address) about remote node is now invalid, but ARP core doesn't send ARP requests because it has the response in the cache. Now there are 2 ways to resolve this issue, I can implement ether of them, but I need some assistance from you. 1. Make firewire-net send ARP requests on its own for all new discovered remote nodes. The easiest solution, but I am not sure if right one. Also if such ARP packet won't reach the destination on first try, which can happen because bus resets severe the IO for a while, the problem will persist. 2. Allow firewire-net to discard the ARP cache for its generated entries. Not sure if that is possible now, but that will solve the problem for sure. I did look at arp core code, and I didn't find a way to do that yet. Best regards, Maxim Levitsky ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [Q] How to invalidate ARP cache for a network device from within kernel 2010-11-26 17:38 [Q] How to invalidate ARP cache for a network device from within kernel Maxim Levitsky @ 2010-11-27 1:18 ` Stefan Richter 2010-11-27 1:25 ` Maxim Levitsky 0 siblings, 1 reply; 8+ messages in thread From: Stefan Richter @ 2010-11-27 1:18 UTC (permalink / raw) To: Maxim Levitsky; +Cc: netdev@vger.kernel.org, linux1394-devel On Nov 26 Maxim Levitsky wrote: > However as soon as bus reset happens, the upper layer ARP cache isn't > invalidated, thus all attempts to send packets to remote node now fail, > because the additional information (node id and bus address) about > remote node is now invalid, but ARP core doesn't send ARP requests > because it has the response in the cache. When is this a problem? With nodes which stay on the bus (i.e. are present before and after the bus reset)? Or with nodes which go away and come back much later (but before the old ARP cache entry was cleaned out)? -- Stefan Richter -=====-==-=- =-== ==-== http://arcgraph.de/sr/ ------------------------------------------------------------------------------ Increase Visibility of Your 3D Game App & Earn a Chance To Win $500! Tap into the largest installed PC base & get more eyes on your game by optimizing for Intel(R) Graphics Technology. Get started today with the Intel(R) Software Partner Program. Five $500 cash prizes are up for grabs. http://p.sf.net/sfu/intelisp-dev2dev ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [Q] How to invalidate ARP cache for a network device from within kernel 2010-11-27 1:18 ` Stefan Richter @ 2010-11-27 1:25 ` Maxim Levitsky 2010-11-27 8:33 ` Stefan Richter 2010-11-27 14:13 ` Stefan Richter 0 siblings, 2 replies; 8+ messages in thread From: Maxim Levitsky @ 2010-11-27 1:25 UTC (permalink / raw) To: Stefan Richter; +Cc: netdev@vger.kernel.org, linux1394-devel [-- Attachment #1: Type: text/plain, Size: 1187 bytes --] On Sat, 2010-11-27 at 02:18 +0100, Stefan Richter wrote: > On Nov 26 Maxim Levitsky wrote: > > However as soon as bus reset happens, the upper layer ARP cache isn't > > invalidated, thus all attempts to send packets to remote node now fail, > > because the additional information (node id and bus address) about > > remote node is now invalid, but ARP core doesn't send ARP requests > > because it has the response in the cache. > > When is this a problem? With nodes which stay on the bus (i.e. are > present before and after the bus reset)? Or with nodes which go away > and come back much later (but before the old ARP cache entry was cleaned > out)? Its about later. A node that disconnects and connects after 5 seconds for example or 20 seconds. ARP timeout is I think 30 seconds or even more. Btw I already solved that problem. Patches attached. With this and all great patches from you and Clemens, the firewire networking strongly resembles ethernet in terms of speed and reliability. (When I resume from ram my desktop, connection restores after less that 5 seconds). On laptop I still see few issues on s2ram cycle, I am tackling them now. Best regards, Maxim Levitsky [-- Attachment #2: 0001-firewire-net-restart-ISO-channel-on-bus-resets.patch --] [-- Type: text/x-patch, Size: 1017 bytes --] >From e77f71cfed34d35bcec48d7cbcda88d0626c2d09 Mon Sep 17 00:00:00 2001 From: Maxim Levitsky <maximlevitsky@gmail.com> Date: Sun, 14 Nov 2010 23:24:46 +0200 Subject: [PATCH 1/3] firewire: net: restart ISO channel on bus resets --- drivers/firewire/net.c | 5 +++++ 1 files changed, 5 insertions(+), 0 deletions(-) diff --git a/drivers/firewire/net.c b/drivers/firewire/net.c index 1a467a9..007969c 100644 --- a/drivers/firewire/net.c +++ b/drivers/firewire/net.c @@ -1593,10 +1593,15 @@ static void fwnet_update(struct fw_unit *unit) { struct fw_device *device = fw_parent_device(unit); struct fwnet_peer *peer = dev_get_drvdata(&unit->device); + struct fwnet_device *dev = peer->dev; int generation; generation = device->generation; + fw_iso_context_stop(dev->broadcast_rcv_context); + fw_iso_context_start(dev->broadcast_rcv_context, -1, 0, + FW_ISO_CONTEXT_MATCH_ALL_TAGS); + spin_lock_irq(&peer->dev->lock); peer->node_id = device->node_id; peer->generation = generation; -- 1.7.1 [-- Attachment #3: 0002-NET-ARP-allow-to-invalidate-specific-ARP-entries.patch --] [-- Type: text/x-patch, Size: 2602 bytes --] >From 70bab78da7677767f476290e94ea8fb5903e2e2a Mon Sep 17 00:00:00 2001 From: Maxim Levitsky <maximlevitsky@gmail.com> Date: Sat, 27 Nov 2010 00:50:45 +0200 Subject: [PATCH 2/3] NET: ARP: allow to invalidate specific ARP entries IPv4 over firewire needs to be able to remove ARP entries from cache that belong to nodes that are removed, because IPv4 over firewire uses ARP packets for private information about nodes. This information becames invalid on node removal, thus as soon as it is connected again, ARP packet should be sent to it which is not done due to valid cache entry. CC: netdev@vger.kernel.org Signed-off-by: Maxim Levitsky <maximlevitsky@gmail.com> --- include/net/arp.h | 1 + net/ipv4/arp.c | 29 ++++++++++++++++++----------- 2 files changed, 19 insertions(+), 11 deletions(-) diff --git a/include/net/arp.h b/include/net/arp.h index f4cf6ce..91f0568 100644 --- a/include/net/arp.h +++ b/include/net/arp.h @@ -25,5 +25,6 @@ extern struct sk_buff *arp_create(int type, int ptype, __be32 dest_ip, const unsigned char *src_hw, const unsigned char *target_hw); extern void arp_xmit(struct sk_buff *skb); +int arp_invalidate(struct net_device *dev, __be32 ip); #endif /* _ARP_H */ diff --git a/net/ipv4/arp.c b/net/ipv4/arp.c index d8e540c..35b1272 100644 --- a/net/ipv4/arp.c +++ b/net/ipv4/arp.c @@ -1142,6 +1142,23 @@ static int arp_req_get(struct arpreq *r, struct net_device *dev) return err; } +int arp_invalidate(struct net_device *dev, __be32 ip) +{ + int err = -ENXIO; + struct neighbour *neigh = neigh_lookup(&arp_tbl, &ip, dev); + + if (neigh) { + if (neigh->nud_state & ~NUD_NOARP) + err = neigh_update(neigh, NULL, NUD_FAILED, + NEIGH_UPDATE_F_OVERRIDE| + NEIGH_UPDATE_F_ADMIN); + neigh_release(neigh); + } + + return err; +} +EXPORT_SYMBOL(arp_invalidate); + static int arp_req_delete_public(struct net *net, struct arpreq *r, struct net_device *dev) { @@ -1162,7 +1179,6 @@ static int arp_req_delete(struct net *net, struct arpreq *r, { int err; __be32 ip; - struct neighbour *neigh; if (r->arp_flags & ATF_PUBL) return arp_req_delete_public(net, r, dev); @@ -1180,16 +1196,7 @@ static int arp_req_delete(struct net *net, struct arpreq *r, if (!dev) return -EINVAL; } - err = -ENXIO; - neigh = neigh_lookup(&arp_tbl, &ip, dev); - if (neigh) { - if (neigh->nud_state & ~NUD_NOARP) - err = neigh_update(neigh, NULL, NUD_FAILED, - NEIGH_UPDATE_F_OVERRIDE| - NEIGH_UPDATE_F_ADMIN); - neigh_release(neigh); - } - return err; + return arp_invalidate(dev, ip); } /* -- 1.7.1 [-- Attachment #4: 0003-firewire-net-invalidate-ARP-entries-for-removed-node.patch --] [-- Type: text/x-patch, Size: 1672 bytes --] >From f284b2644694797f42df9df94cc6ccbaa17155ca Mon Sep 17 00:00:00 2001 From: Maxim Levitsky <maximlevitsky@gmail.com> Date: Sat, 27 Nov 2010 00:35:04 +0200 Subject: [PATCH 3/3] firewire: net: invalidate ARP entries for removed nodes. This allows to be able to connect to nodes that disappered from the bus and after some time appeared again. Signed-off-by: Maxim Levitsky <maximlevitsky@gmail.com> --- drivers/firewire/net.c | 7 +++++++ 1 files changed, 7 insertions(+), 0 deletions(-) diff --git a/drivers/firewire/net.c b/drivers/firewire/net.c index 007969c..bb7939a 100644 --- a/drivers/firewire/net.c +++ b/drivers/firewire/net.c @@ -189,6 +189,7 @@ struct fwnet_peer { struct fwnet_device *dev; u64 guid; u64 fifo; + __be32 ip; /* guarded by dev->lock */ struct list_head pd_list; /* received partial datagrams */ @@ -568,6 +569,8 @@ static int fwnet_finish_incoming_packet(struct net_device *net, peer->speed = sspd; if (peer->max_payload > max_payload) peer->max_payload = max_payload; + + peer->ip = arp1394->sip; } spin_unlock_irqrestore(&dev->lock, flags); @@ -1443,6 +1446,7 @@ static int fwnet_add_peer(struct fwnet_device *dev, peer->dev = dev; peer->guid = (u64)device->config_rom[3] << 32 | device->config_rom[4]; peer->fifo = FWNET_NO_FIFO_ADDR; + peer->ip = 0; INIT_LIST_HEAD(&peer->pd_list); peer->pdg_size = 0; peer->datagram_label = 0; @@ -1558,6 +1562,9 @@ static int fwnet_remove(struct device *_dev) mutex_lock(&fwnet_device_mutex); + if (dev->netdev && peer->ip) + arp_invalidate(dev->netdev, peer->ip); + fwnet_remove_peer(peer); if (list_empty(&dev->peer_list)) { -- 1.7.1 ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [Q] How to invalidate ARP cache for a network device from within kernel 2010-11-27 1:25 ` Maxim Levitsky @ 2010-11-27 8:33 ` Stefan Richter 2010-11-27 15:19 ` Stefan Richter 2010-11-27 14:13 ` Stefan Richter 1 sibling, 1 reply; 8+ messages in thread From: Stefan Richter @ 2010-11-27 8:33 UTC (permalink / raw) To: Maxim Levitsky; +Cc: netdev@vger.kernel.org, linux1394-devel On Nov 27 Maxim Levitsky wrote: > Subject: [PATCH 1/3] firewire: net: restart ISO channel on bus resets > > --- > drivers/firewire/net.c | 5 +++++ > 1 files changed, 5 insertions(+), 0 deletions(-) > > diff --git a/drivers/firewire/net.c b/drivers/firewire/net.c > index 1a467a9..007969c 100644 > --- a/drivers/firewire/net.c > +++ b/drivers/firewire/net.c > @@ -1593,10 +1593,15 @@ static void fwnet_update(struct fw_unit *unit) > { > struct fw_device *device = fw_parent_device(unit); > struct fwnet_peer *peer = dev_get_drvdata(&unit->device); > + struct fwnet_device *dev = peer->dev; > int generation; > > generation = device->generation; > > + fw_iso_context_stop(dev->broadcast_rcv_context); > + fw_iso_context_start(dev->broadcast_rcv_context, -1, 0, > + FW_ISO_CONTEXT_MATCH_ALL_TAGS); > + > spin_lock_irq(&peer->dev->lock); > peer->node_id = device->node_id; > peer->generation = generation; Could you add a changelog? And it can be optimized to do only once per bus generation. E.g. add a generation field for the IR context in fwnet_device. Or do it only if fw_unit is that of the local node. OTOH, is this actually necessary on normal bus resets? It should only be necessary after PM resume, right? If so, perhaps do it only if the generation increased by more than one. Also, wouldn't a fw_iso_context_start alone, without prior stop, suffice? -- Stefan Richter -=====-==-=- =-== ==-== http://arcgraph.de/sr/ ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [Q] How to invalidate ARP cache for a network device from within kernel 2010-11-27 8:33 ` Stefan Richter @ 2010-11-27 15:19 ` Stefan Richter 2010-11-27 15:44 ` Maxim Levitsky 0 siblings, 1 reply; 8+ messages in thread From: Stefan Richter @ 2010-11-27 15:19 UTC (permalink / raw) To: Maxim Levitsky; +Cc: netdev@vger.kernel.org, linux1394-devel On Nov 27 Stefan Richter wrote: > > @@ -1593,10 +1593,15 @@ static void fwnet_update(struct fw_unit > > *unit) { > > struct fw_device *device = fw_parent_device(unit); > > struct fwnet_peer *peer = dev_get_drvdata(&unit->device); > > + struct fwnet_device *dev = peer->dev; > > int generation; > > > > generation = device->generation; > > > > + fw_iso_context_stop(dev->broadcast_rcv_context); > > + fw_iso_context_start(dev->broadcast_rcv_context, -1, 0, > > + FW_ISO_CONTEXT_MATCH_ALL_TAGS); [...] > OTOH, is this actually necessary on normal bus resets? It should only > be necessary after PM resume, right? If so, perhaps do it only if the > generation increased by more than one. Actually, shouldn't context restart at resume happen in firewire-ohci rather? Any other protocol driver which uses isochronous I/O (be it a kernel driver or userspace driver) would have the same issue, I guess. -- Stefan Richter -=====-==-=- =-== ==-== http://arcgraph.de/sr/ ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [Q] How to invalidate ARP cache for a network device from within kernel 2010-11-27 15:19 ` Stefan Richter @ 2010-11-27 15:44 ` Maxim Levitsky 0 siblings, 0 replies; 8+ messages in thread From: Maxim Levitsky @ 2010-11-27 15:44 UTC (permalink / raw) To: Stefan Richter; +Cc: netdev@vger.kernel.org, linux1394-devel On Sat, 2010-11-27 at 16:19 +0100, Stefan Richter wrote: > On Nov 27 Stefan Richter wrote: > > > @@ -1593,10 +1593,15 @@ static void fwnet_update(struct fw_unit > > > *unit) { > > > struct fw_device *device = fw_parent_device(unit); > > > struct fwnet_peer *peer = dev_get_drvdata(&unit->device); > > > + struct fwnet_device *dev = peer->dev; > > > int generation; > > > > > > generation = device->generation; > > > > > > + fw_iso_context_stop(dev->broadcast_rcv_context); > > > + fw_iso_context_start(dev->broadcast_rcv_context, -1, 0, > > > + FW_ISO_CONTEXT_MATCH_ALL_TAGS); > [...] > > OTOH, is this actually necessary on normal bus resets? It should only > > be necessary after PM resume, right? If so, perhaps do it only if the > > generation increased by more than one. > > Actually, shouldn't context restart at resume happen in firewire-ohci > rather? Any other protocol driver which uses isochronous I/O (be it a > kernel driver or userspace driver) would have the same issue, I guess. Well, the spec says that upon bus reset ISO channels must be allocated again. This somewhat mimics that, although it can be done better. Of course the same true after suspend/resume. Best regards, Maxim Levitsky ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [Q] How to invalidate ARP cache for a network device from within kernel 2010-11-27 1:25 ` Maxim Levitsky 2010-11-27 8:33 ` Stefan Richter @ 2010-11-27 14:13 ` Stefan Richter 2010-11-27 14:33 ` Maxim Levitsky 1 sibling, 1 reply; 8+ messages in thread From: Stefan Richter @ 2010-11-27 14:13 UTC (permalink / raw) To: Maxim Levitsky; +Cc: netdev@vger.kernel.org, linux1394-devel On Nov 27 Maxim Levitsky wrote: > > > However as soon as bus reset happens, the upper layer ARP cache > > > isn't invalidated, thus all attempts to send packets to remote > > > node now fail, because the additional information (node id and > > > bus address) about remote node is now invalid, but ARP core > > > doesn't send ARP requests because it has the response in the > > > cache. > > > > When is this a problem? With nodes which stay on the bus (i.e. are > > present before and after the bus reset)? Or with nodes which go > > away and come back much later (but before the old ARP cache entry > > was cleaned out)? > Its about later. > A node that disconnects and connects after 5 seconds for example or 20 > seconds. > ARP timeout is I think 30 seconds or even more. > > Btw I already solved that problem. > Patches attached. [...] > Subject: [PATCH 2/3] NET: ARP: allow to invalidate specific ARP entries > > IPv4 over firewire needs to be able to remove ARP entries > from cache that belong to nodes that are removed, because > IPv4 over firewire uses ARP packets for private information > about nodes. > > This information becames invalid on node removal, thus > as soon as it is connected again, ARP packet should be sent > to it which is not done due to valid cache entry. > > CC: netdev@vger.kernel.org > Signed-off-by: Maxim Levitsky <maximlevitsky@gmail.com> > --- > include/net/arp.h | 1 + > net/ipv4/arp.c | 29 ++++++++++++++++++----------- > 2 files changed, 19 insertions(+), 11 deletions(-) [...] > Subject: [PATCH 3/3] firewire: net: invalidate ARP entries for > removed nodes. > > This allows to be able to connect to nodes that disappered > from the bus and after some time appeared again. > > Signed-off-by: Maxim Levitsky <maximlevitsky@gmail.com> > --- > drivers/firewire/net.c | 7 +++++++ > 1 files changed, 7 insertions(+), 0 deletions(-) I wonder if this is the right approach. Suppose somebody implements IPv6 over 1394 (RFC 3146) which uses Neighbour Discovery (RFC 2461). What are we going to do then to solve the very same problem? (Is it a problem at all? There is just an annoying period of 30 seconds or so during which packets are dropped. And that period starts when the cable was pulled or the remote node PM-suspended or a hub powered down or the likes.) Anyhow. I suspect eth1394's/ firewire-net's neighbour (fwnet_peer) management is lacking. Consider this example session between Linux/firewire-net and OS X. 1.) Plug them together, ifup on Linux. On the Linux node, the local node is fw5 and the remote OS X node is fw9. 2.) On OS X, don't start any user action on the FireWire networking interface. On Linux, start pinging the remote node. Ping gets replies. 3.) Unplug the cable. Ping's requests are being dropped from now on. There is a bit of log spam until firewire-core releases the fw9 fw_device instance, which includes that firewire-net removes the corresponding fwnet_peer instance: Nov 27 12:17:15 stein kernel: firewire_net: fwnet_write_complete: failed: 13 Nov 27 12:17:16 stein kernel: firewire_net: fwnet_write_complete: failed: 13 4.) Plug the cable back in a few seconds later. Resulting dmesg: Nov 27 12:17:19 stein kernel: firewire_core: skipped bus generations, destroying all nodes Nov 27 12:17:20 stein kernel: firewire_net: No peer for ARP packet from 0017f2fffe66fb80 Nov 27 12:17:20 stein kernel: firewire_net: No peer for ARP packet from 0017f2fffe66fb80 Nov 27 12:17:20 stein kernel: firewire_core: rediscovered device fw5 Nov 27 12:17:20 stein kernel: firewire_core: phy config: card 2, new root=ffc1, gap_count=5 Nov 27 12:17:20 stein kernel: firewire_net: No peer for ARP packet from 0017f2fffe66fb80 Nov 27 12:17:20 stein kernel: firewire_net: No peer for ARP packet from 0017f2fffe66fb80 Nov 27 12:17:20 stein kernel: firewire_net: No peer for ARP packet from 0017f2fffe66fb80 Nov 27 12:17:21 stein kernel: firewire_net: No peer for ARP packet from 0017f2fffe66fb80 Nov 27 12:17:21 stein kernel: firewire_net: No peer for ARP packet from 0017f2fffe66fb80 Nov 27 12:17:21 stein kernel: firewire_net: No peer for ARP packet from 0017f2fffe66fb80 Nov 27 12:17:22 stein kernel: firewire_net: No peer for ARP packet from 0017f2fffe66fb80 Nov 27 12:17:23 stein kernel: firewire_core: created device fw9: GUID 0017f2fffe66fb80, S400, 1 config ROM retries 5.) At this point, ping's requests are still being dropped. 6.) A whole while later, ping is back in business again, obviously because the old ARP entry was cleared and a new ARP request--response was performed. We learn two things from that: - OS X sends gratuitous ARP messages. Maybe that's Zeroconf (RFC 3927), or maybe that's just part of their RFC 2734 driver. There seem to be consistently nine of such messages sent within a period of 3 or 4 seconds, starting almost immediately after self-ID-complete after cable replug. - fwnet_probe, which adds the fwnet_peer instance that pertains to fw9, is performed just a little bit too late to match one of those ARP packets with an fwnet_peer instance. Should firewire-net send gratuitous ARP messages too? I.e., in fwnet_probe, if the interface is up, send an ARP Request packet which solicits a response. Likewise, if/when IPv6-over-1394 is implemented, let fwnet_probe send a Neighbour Solicitation packet. --- In effect, this means that we would not add EXPORT_SYMBOL(arp_invalidate) and, perspectively, EXPORT_SYMBOL(ndisc_invalidate), and call those when a node went away. Instead, we solicit an ARP Response or a Neighbor Advertisement when a node joined us and let that response or advertisement update the ARP cache or NDP cache. The question is, is the link-layer driver firewire-net a proper place to call arp_send() and ndisc_send_ns()? And is this any better than a new arp_invalidate() and ndisc_invalidate()? ---- On a loosely related note, after looking at 1394 AR and at NDP, shouldn't we rather set net_device.addr_len = 16 and net_device.dev_addr = concatenation of EUI-64, max_rec, spd, and unicast_FIFO ? -- Stefan Richter -=====-==-=- =-== ==-== http://arcgraph.de/sr/ ------------------------------------------------------------------------------ Increase Visibility of Your 3D Game App & Earn a Chance To Win $500! Tap into the largest installed PC base & get more eyes on your game by optimizing for Intel(R) Graphics Technology. Get started today with the Intel(R) Software Partner Program. Five $500 cash prizes are up for grabs. http://p.sf.net/sfu/intelisp-dev2dev ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [Q] How to invalidate ARP cache for a network device from within kernel 2010-11-27 14:13 ` Stefan Richter @ 2010-11-27 14:33 ` Maxim Levitsky 0 siblings, 0 replies; 8+ messages in thread From: Maxim Levitsky @ 2010-11-27 14:33 UTC (permalink / raw) To: Stefan Richter; +Cc: netdev@vger.kernel.org, linux1394-devel On Sat, 2010-11-27 at 15:13 +0100, Stefan Richter wrote: > On Nov 27 Maxim Levitsky wrote: > > > > However as soon as bus reset happens, the upper layer ARP cache > > > > isn't invalidated, thus all attempts to send packets to remote > > > > node now fail, because the additional information (node id and > > > > bus address) about remote node is now invalid, but ARP core > > > > doesn't send ARP requests because it has the response in the > > > > cache. > > > > > > When is this a problem? With nodes which stay on the bus (i.e. are > > > present before and after the bus reset)? Or with nodes which go > > > away and come back much later (but before the old ARP cache entry > > > was cleaned out)? > > Its about later. > > A node that disconnects and connects after 5 seconds for example or 20 > > seconds. > > ARP timeout is I think 30 seconds or even more. > > > > Btw I already solved that problem. > > Patches attached. > [...] > > Subject: [PATCH 2/3] NET: ARP: allow to invalidate specific ARP entries > > > > IPv4 over firewire needs to be able to remove ARP entries > > from cache that belong to nodes that are removed, because > > IPv4 over firewire uses ARP packets for private information > > about nodes. > > > > This information becames invalid on node removal, thus > > as soon as it is connected again, ARP packet should be sent > > to it which is not done due to valid cache entry. > > > > CC: netdev@vger.kernel.org > > Signed-off-by: Maxim Levitsky <maximlevitsky@gmail.com> > > --- > > include/net/arp.h | 1 + > > net/ipv4/arp.c | 29 ++++++++++++++++++----------- > > 2 files changed, 19 insertions(+), 11 deletions(-) > > [...] > > > Subject: [PATCH 3/3] firewire: net: invalidate ARP entries for > > removed nodes. > > > > This allows to be able to connect to nodes that disappered > > from the bus and after some time appeared again. > > > > Signed-off-by: Maxim Levitsky <maximlevitsky@gmail.com> > > --- > > drivers/firewire/net.c | 7 +++++++ > > 1 files changed, 7 insertions(+), 0 deletions(-) > > I wonder if this is the right approach. > > Suppose somebody implements IPv6 over 1394 (RFC 3146) which uses > Neighbour Discovery (RFC 2461). What are we going to do then to solve > the very same problem? Well, thats a problem, but firewire is somewhat unique. I don't image any other networking transport to be protocol dependent. > > (Is it a problem at all? There is just an annoying period of 30 > seconds or so during which packets are dropped. And that period > starts when the cable was pulled or the remote node PM-suspended or a > hub powered down or the likes.) It is somewhat a problem, if you for example suspend a system by mistake and on resume you need to wait too much. It is annoying. > > Anyhow. I suspect eth1394's/ firewire-net's neighbour (fwnet_peer) > management is lacking. Consider this example session between > Linux/firewire-net and OS X. > > 1.) Plug them together, ifup on Linux. On the Linux node, the local > node is fw5 and the remote OS X node is fw9. > > 2.) On OS X, don't start any user action on the FireWire networking > interface. On Linux, start pinging the remote node. Ping gets replies. > > 3.) Unplug the cable. Ping's requests are being dropped from now on. > There is a bit of log spam until firewire-core releases the fw9 > fw_device instance, which includes that firewire-net removes the > corresponding fwnet_peer instance: > Nov 27 12:17:15 stein kernel: firewire_net: fwnet_write_complete: failed: 13 > Nov 27 12:17:16 stein kernel: firewire_net: fwnet_write_complete: failed: 13 > > 4.) Plug the cable back in a few seconds later. Resulting dmesg: > Nov 27 12:17:19 stein kernel: firewire_core: skipped bus generations, destroying all nodes > Nov 27 12:17:20 stein kernel: firewire_net: No peer for ARP packet from 0017f2fffe66fb80 > Nov 27 12:17:20 stein kernel: firewire_net: No peer for ARP packet from 0017f2fffe66fb80 > Nov 27 12:17:20 stein kernel: firewire_core: rediscovered device fw5 > Nov 27 12:17:20 stein kernel: firewire_core: phy config: card 2, new root=ffc1, gap_count=5 > Nov 27 12:17:20 stein kernel: firewire_net: No peer for ARP packet from 0017f2fffe66fb80 > Nov 27 12:17:20 stein kernel: firewire_net: No peer for ARP packet from 0017f2fffe66fb80 > Nov 27 12:17:20 stein kernel: firewire_net: No peer for ARP packet from 0017f2fffe66fb80 > Nov 27 12:17:21 stein kernel: firewire_net: No peer for ARP packet from 0017f2fffe66fb80 > Nov 27 12:17:21 stein kernel: firewire_net: No peer for ARP packet from > 0017f2fffe66fb80 Nov 27 12:17:21 stein kernel: firewire_net: No peer > for ARP packet from 0017f2fffe66fb80 Nov 27 12:17:22 stein kernel: > firewire_net: No peer for ARP packet from 0017f2fffe66fb80 Nov 27 > 12:17:23 stein kernel: firewire_core: created device fw9: GUID > 0017f2fffe66fb80, S400, 1 config ROM retries > > 5.) At this point, ping's requests are still being dropped. > > 6.) A whole while later, ping is back in business again, obviously > because the old ARP entry was cleared and a new ARP request--response > was performed. > > We learn two things from that: > > - OS X sends gratuitous ARP messages. Maybe that's Zeroconf (RFC > 3927), or maybe that's just part of their RFC 2734 driver. > There seem to be consistently nine of such messages sent within a > period of 3 or 4 seconds, starting almost immediately after > self-ID-complete after cable replug. > > - fwnet_probe, which adds the fwnet_peer instance that pertains to > fw9, is performed just a little bit too late to match one of those > ARP packets with an fwnet_peer instance. Which means that even if we teach firewire-net to send ARP requests, these won't be handled by other side that runs firewire-net too. Of course > > Should firewire-net send gratuitous ARP messages too? I.e., in > fwnet_probe, if the interface is up, send an ARP Request packet which > solicits a response. Likewise, if/when IPv6-over-1394 is implemented, > let fwnet_probe send a Neighbour Solicitation packet. --- In effect, > this means that we would not add EXPORT_SYMBOL(arp_invalidate) and, > perspectively, EXPORT_SYMBOL(ndisc_invalidate), and call those when a > node went away. Instead, we solicit an ARP Response or a Neighbor > Advertisement when a node joined us and let that response or > advertisement update the ARP cache or NDP cache. I am not against that at all. Clearning the cache seemed just to be very robust and solve a root case. This is less robust solution (which you even proved because OSX does it...) > > The question is, is the link-layer driver firewire-net a proper place > to call arp_send() and ndisc_send_ns()? > > And is this any better than a new arp_invalidate() and > ndisc_invalidate()? That what I am not sure at all. I can bypass arp_send, and just create a 1394 ARP packet and send it using fw_request. But doing that as I did seemed to be also quite simple. It is protocol depedent but that is firewire fault not mine. > > ---- > > On a loosely related note, after looking at 1394 AR and at NDP, > shouldn't we rather set > net_device.addr_len = 16 > and > net_device.dev_addr = concatenation of EUI-64, max_rec, spd, > and unicast_FIFO > ? The problem is that except GUID, the rest can change. And hardware addresses should be fixed. Best regards, Maxim Levitsky ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2010-11-27 15:44 UTC | newest] Thread overview: 8+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2010-11-26 17:38 [Q] How to invalidate ARP cache for a network device from within kernel Maxim Levitsky 2010-11-27 1:18 ` Stefan Richter 2010-11-27 1:25 ` Maxim Levitsky 2010-11-27 8:33 ` Stefan Richter 2010-11-27 15:19 ` Stefan Richter 2010-11-27 15:44 ` Maxim Levitsky 2010-11-27 14:13 ` Stefan Richter 2010-11-27 14:33 ` Maxim Levitsky
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).