From mboxrd@z Thu Jan 1 00:00:00 1970
From: Stefan Richter
Subject: Re: [Q] How to invalidate ARP cache for a network device from within kernel
Date: Sat, 27 Nov 2010 15:13:15 +0100
Message-ID: <20101127151315.631dc1dd@stein>
References: <1290793099.3716.21.camel@maxim-laptop> <20101127021833.328e8942@stein> <1290821143.4145.3.camel@maxim-laptop>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Cc: "netdev@vger.kernel.org" , linux1394-devel
To: Maxim Levitsky
Return-path:
In-Reply-To: <1290821143.4145.3.camel@maxim-laptop>
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: linux1394-devel-bounces@lists.sourceforge.net
List-Id: netdev.vger.kernel.org

On Nov 27 Maxim Levitsky wrote:
> > > However as soon as bus reset happens, the upper layer ARP cache
> > > isn't invalidated, thus all attempts to send packets to remote
> > > node now fail, because the additional information (node id and
> > > bus address) about remote node is now invalid, but ARP core
> > > doesn't send ARP requests because it has the response in the
> > > cache.
> >
> > When is this a problem? With nodes which stay on the bus (i.e. are
> > present before and after the bus reset)? Or with nodes which go
> > away and come back much later (but before the old ARP cache entry
> > was cleaned out)?
> Its about later.
> A node that disconnects and connects after 5 seconds for example or 20
> seconds.
> ARP timeout is I think 30 seconds or even more.
>
> Btw I already solved that problem.
> Patches attached.
[...]
> Subject: [PATCH 2/3] NET: ARP: allow to invalidate specific ARP entries
>
> IPv4 over firewire needs to be able to remove ARP entries
> from cache that belong to nodes that are removed, because
> IPv4 over firewire uses ARP packets for private information
> about nodes.
>
> This information becames invalid on node removal, thus
> as soon as it is connected again, ARP packet should be sent
> to it which is not done due to valid cache entry.
>
> CC: netdev@vger.kernel.org
> Signed-off-by: Maxim Levitsky
> ---
> include/net/arp.h | 1 +
> net/ipv4/arp.c | 29 ++++++++++++++++++-----------
> 2 files changed, 19 insertions(+), 11 deletions(-)
[...]
> Subject: [PATCH 3/3] firewire: net: invalidate ARP entries for
> removed nodes.
>
> This allows to be able to connect to nodes that disappered
> from the bus and after some time appeared again.
>
> Signed-off-by: Maxim Levitsky
> ---
> drivers/firewire/net.c | 7 +++++++
> 1 files changed, 7 insertions(+), 0 deletions(-)

I wonder if this is the right approach. Suppose somebody implements IPv6
over 1394 (RFC 3146) which uses Neighbour Discovery (RFC 2461). What are
we going to do then to solve the very same problem?

(Is it a problem at all? There is just an annoying period of 30 seconds
or so during which packets are dropped. And that period starts when the
cable was pulled or the remote node PM-suspended or a hub powered down or
the like.)

Anyhow. I suspect eth1394's/firewire-net's neighbour (fwnet_peer)
management is lacking. Consider this example session between
Linux/firewire-net and OS X.

 1.) Plug them together, ifup on Linux. On the Linux node, the local
     node is fw5 and the remote OS X node is fw9.

 2.) On OS X, don't start any user action on the FireWire networking
     interface. On Linux, start pinging the remote node. Ping gets
     replies.

 3.) Unplug the cable. Ping's requests are being dropped from now on.
     There is a bit of log spam until firewire-core releases the fw9
     fw_device instance, at which point firewire-net also removes the
     corresponding fwnet_peer instance:
Nov 27 12:17:15 stein kernel: firewire_net: fwnet_write_complete: failed: 13
Nov 27 12:17:16 stein kernel: firewire_net: fwnet_write_complete: failed: 13

 4.) Plug the cable back in a few seconds later.
     Resulting dmesg:
Nov 27 12:17:19 stein kernel: firewire_core: skipped bus generations, destroying all nodes
Nov 27 12:17:20 stein kernel: firewire_net: No peer for ARP packet from 0017f2fffe66fb80
Nov 27 12:17:20 stein kernel: firewire_net: No peer for ARP packet from 0017f2fffe66fb80
Nov 27 12:17:20 stein kernel: firewire_core: rediscovered device fw5
Nov 27 12:17:20 stein kernel: firewire_core: phy config: card 2, new root=ffc1, gap_count=5
Nov 27 12:17:20 stein kernel: firewire_net: No peer for ARP packet from 0017f2fffe66fb80
Nov 27 12:17:20 stein kernel: firewire_net: No peer for ARP packet from 0017f2fffe66fb80
Nov 27 12:17:20 stein kernel: firewire_net: No peer for ARP packet from 0017f2fffe66fb80
Nov 27 12:17:21 stein kernel: firewire_net: No peer for ARP packet from 0017f2fffe66fb80
Nov 27 12:17:21 stein kernel: firewire_net: No peer for ARP packet from 0017f2fffe66fb80
Nov 27 12:17:21 stein kernel: firewire_net: No peer for ARP packet from 0017f2fffe66fb80
Nov 27 12:17:22 stein kernel: firewire_net: No peer for ARP packet from 0017f2fffe66fb80
Nov 27 12:17:23 stein kernel: firewire_core: created device fw9: GUID 0017f2fffe66fb80, S400, 1 config ROM retries

 5.) At this point, ping's requests are still being dropped.

 6.) A good while later, ping is back in business again, obviously
     because the old ARP entry was cleared and a new ARP
     request--response was performed.

We learn two things from that:

  - OS X sends gratuitous ARP messages. Maybe that's Zeroconf
    (RFC 3927), or maybe that's just part of their RFC 2734 driver.
    There seem to be consistently nine such messages sent within a
    period of 3 or 4 seconds, starting almost immediately after
    self-ID-complete after cable replug.

  - fwnet_probe, which adds the fwnet_peer instance that pertains to
    fw9, is performed just a little bit too late to match one of those
    ARP packets with an fwnet_peer instance.

Should firewire-net send gratuitous ARP messages too?
I.e., in fwnet_probe, if the interface is up, send an ARP Request packet
which solicits a response. Likewise, if/when IPv6-over-1394 is
implemented, let fwnet_probe send a Neighbour Solicitation packet.

---

In effect, this means that we would not add EXPORT_SYMBOL(arp_invalidate)
and, eventually, EXPORT_SYMBOL(ndisc_invalidate), and call those when a
node goes away. Instead, we solicit an ARP Response or a Neighbour
Advertisement when a node joins us and let that response or advertisement
update the ARP cache or NDP cache.

The question is, is the link-layer driver firewire-net a proper place to
call arp_send() and ndisc_send_ns()? And is this any better than a new
arp_invalidate() and ndisc_invalidate()?

----

On a loosely related note, after looking at 1394 ARP and at NDP,
shouldn't we rather set
    net_device.addr_len = 16 and
    net_device.dev_addr = concatenation of EUI-64, max_rec, spd, and
                          unicast_FIFO?
--
Stefan Richter
-=====-==-=- =-== ==-==
http://arcgraph.de/sr/
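
As a rough illustration of the 16-byte address suggested in the closing note of the message, the following user-space sketch packs the four fields in the order in which RFC 2734 lays out the sender fields of a 1394 ARP packet: 8 bytes unique ID (EUI-64), 1 byte max_rec, 1 byte speed code, 6 bytes unicast FIFO offset. The names FWNET_HWADDR_LEN and fwnet_pack_hwaddr are hypothetical and not taken from firewire-net; this is not kernel code, only a sketch of the byte layout.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical name: length of the proposed link-layer address. */
#define FWNET_HWADDR_LEN 16

/*
 * Pack EUI-64, max_rec, spd and the 48-bit unicast FIFO offset into
 * 16 bytes, most significant byte first. The field order is assumed
 * to mirror the sender fields of the RFC 2734 1394 ARP packet.
 */
static void fwnet_pack_hwaddr(uint8_t out[FWNET_HWADDR_LEN],
                              uint64_t eui64, uint8_t max_rec,
                              uint8_t spd, uint64_t unicast_fifo)
{
    int i;

    /* Bytes 0..7: EUI-64 (GUID) of the node. */
    for (i = 0; i < 8; i++)
        out[i] = (uint8_t)(eui64 >> (56 - 8 * i));

    /* Byte 8: max_rec, byte 9: speed code. */
    out[8] = max_rec;
    out[9] = spd;

    /* Bytes 10..15: lower 48 bits of the unicast FIFO offset. */
    for (i = 0; i < 6; i++)
        out[10 + i] = (uint8_t)(unicast_fifo >> (40 - 8 * i));
}
```

With the GUID 0017f2fffe66fb80 from the log above, the first eight bytes of the result would be 00 17 f2 ff fe 66 fb 80. Whatever max_rec, spd and FIFO values a real peer advertises would follow in bytes 8 to 15.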