netdev.vger.kernel.org archive mirror
From: Maxim Levitsky <maximlevitsky@gmail.com>
To: Stefan Richter <stefanr@s5r6.in-berlin.de>
Cc: "netdev@vger.kernel.org" <netdev@vger.kernel.org>,
	linux1394-devel <linux1394-devel@lists.sourceforge.net>
Subject: Re: [Q] How to invalidate ARP cache for a network device from within kernel
Date: Sat, 27 Nov 2010 16:33:15 +0200
Message-ID: <1290868395.5305.14.camel@maxim-laptop>
In-Reply-To: <20101127151315.631dc1dd@stein>

On Sat, 2010-11-27 at 15:13 +0100, Stefan Richter wrote:
> On Nov 27 Maxim Levitsky wrote:
> > > > However as soon as bus reset happens, the upper layer ARP cache
> > > > isn't invalidated, thus all attempts to send packets to remote
> > > > node now fail, because the additional information (node id and
> > > > bus address) about remote node is now invalid, but ARP core
> > > > doesn't send ARP requests because it has the response in the
> > > > cache.  
> > > 
> > > When is this a problem?  With nodes which stay on the bus (i.e. are
> > > present before and after the bus reset)?  Or with nodes which go
> > > away and come back much later (but before the old ARP cache entry
> > > was cleaned out)?  
> > It's about the latter.
> > A node that disconnects and reconnects after, for example, 5 or 20
> > seconds.
> > The ARP timeout is, I think, 30 seconds or even more.
> > 
> > Btw I already solved that problem.
> > Patches attached.
> [...]
> > Subject: [PATCH 2/3] NET: ARP: allow to invalidate specific ARP entries
> > 
> > IPv4 over firewire needs to be able to remove ARP entries
> > from the cache that belong to nodes that are removed, because
> > IPv4 over firewire uses ARP packets to carry private information
> > about nodes.
> > 
> > This information becomes invalid on node removal, so
> > as soon as the node is connected again, an ARP packet should be sent
> > to it, which does not happen because the cache entry is still valid.
> > 
> > CC: netdev@vger.kernel.org
> > Signed-off-by: Maxim Levitsky <maximlevitsky@gmail.com>
> > ---
> >  include/net/arp.h |    1 +
> >  net/ipv4/arp.c    |   29 ++++++++++++++++++-----------
> >  2 files changed, 19 insertions(+), 11 deletions(-)
> 
> [...]
> 
> > Subject: [PATCH 3/3] firewire: net: invalidate ARP entries for
> > removed nodes.
> > 
> > This makes it possible to connect to nodes that disappeared
> > from the bus and reappeared some time later.
> > 
> > Signed-off-by: Maxim Levitsky <maximlevitsky@gmail.com>
> > ---
> >  drivers/firewire/net.c |    7 +++++++
> >  1 files changed, 7 insertions(+), 0 deletions(-)
> 
> I wonder if this is the right approach.
> 
> Suppose somebody implements IPv6 over 1394 (RFC 3146) which uses
> Neighbour Discovery (RFC 2461).  What are we going to do then to solve
> the very same problem?
Well, that's a problem, but firewire is somewhat unique.
I can't imagine any other network transport being protocol dependent.

> 
> (Is it a problem at all?  There is just an annoying period of 30
> seconds or so during which packets are dropped.  And that period
> starts when the cable was pulled or the remote node PM-suspended or a
> hub powered down or the likes.)
It is somewhat of a problem: if, for example, you suspend a system by
mistake, on resume you have to wait too long.
It is annoying.


> 
> Anyhow.  I suspect eth1394's/ firewire-net's neighbour (fwnet_peer)
> management is lacking.  Consider this example session between
> Linux/firewire-net and OS X.
> 
> 1.) Plug them together, ifup on Linux.  On the Linux node, the local
> node is fw5 and the remote OS X node is fw9.
> 
> 2.) On OS X, don't start any user action on the FireWire networking
> interface.  On Linux, start pinging the remote node.  Ping gets replies.
> 
> 3.) Unplug the cable.  Ping's requests are being dropped from now on.
> There is a bit of log spam until firewire-core releases the fw9
> fw_device instance, which includes that firewire-net removes the
> corresponding fwnet_peer instance:
> Nov 27 12:17:15 stein kernel: firewire_net: fwnet_write_complete: failed: 13
> Nov 27 12:17:16 stein kernel: firewire_net: fwnet_write_complete: failed: 13
> 
> 4.) Plug the cable back in a few seconds later.  Resulting dmesg:
> Nov 27 12:17:19 stein kernel: firewire_core: skipped bus generations, destroying all nodes
> Nov 27 12:17:20 stein kernel: firewire_net: No peer for ARP packet from 0017f2fffe66fb80
> Nov 27 12:17:20 stein kernel: firewire_net: No peer for ARP packet from 0017f2fffe66fb80
> Nov 27 12:17:20 stein kernel: firewire_core: rediscovered device fw5
> Nov 27 12:17:20 stein kernel: firewire_core: phy config: card 2, new root=ffc1, gap_count=5
> Nov 27 12:17:20 stein kernel: firewire_net: No peer for ARP packet from 0017f2fffe66fb80
> Nov 27 12:17:20 stein kernel: firewire_net: No peer for ARP packet from 0017f2fffe66fb80
> Nov 27 12:17:20 stein kernel: firewire_net: No peer for ARP packet from 0017f2fffe66fb80
> Nov 27 12:17:21 stein kernel: firewire_net: No peer for ARP packet from 0017f2fffe66fb80
> Nov 27 12:17:21 stein kernel: firewire_net: No peer for ARP packet from 0017f2fffe66fb80
> Nov 27 12:17:21 stein kernel: firewire_net: No peer for ARP packet from 0017f2fffe66fb80
> Nov 27 12:17:22 stein kernel: firewire_net: No peer for ARP packet from 0017f2fffe66fb80
> Nov 27 12:17:23 stein kernel: firewire_core: created device fw9: GUID 0017f2fffe66fb80, S400, 1 config ROM retries
> 
> 5.) At this point, ping's requests are still being dropped.
> 
> 6.) A good while later, ping is back in business again, obviously
> because the old ARP entry was cleared and a new ARP request--response
> was performed.
> 
> We learn two things from that:
> 
>   - OS X sends gratuitous ARP messages.  Maybe that's Zeroconf (RFC
>     3927), or maybe that's just part of their RFC 2734 driver.
>     There seem to be consistently nine of such messages sent within a
>     period of 3 or 4 seconds, starting almost immediately after
>     self-ID-complete after cable replug.
> 
>   - fwnet_probe, which adds the fwnet_peer instance that pertains to
>     fw9, is performed just a little bit too late to match one of those
>     ARP packets with an fwnet_peer instance.
Which means that even if we teach firewire-net to send ARP requests,
these won't be handled by the other side if it also runs firewire-net.
Of course 

> 
> Should firewire-net send gratuitous ARP messages too?  I.e., in
> fwnet_probe, if the interface is up, send an ARP Request packet which
> solicits a response.  Likewise, if/when IPv6-over-1394 is implemented,
> let fwnet_probe send a Neighbour Solicitation packet.  ---  In effect,
> this means that we would not add EXPORT_SYMBOL(arp_invalidate) and,
> prospectively, EXPORT_SYMBOL(ndisc_invalidate), and call those when a
> node went away.  Instead, we solicit an ARP Response or a Neighbor
> Advertisement when a node joined us and let that response or
> advertisement update the ARP cache or NDP cache.
I am not against that at all.
Clearing the cache just seemed very robust, and it addresses the root cause.
Soliciting is a less robust solution (which you have even demonstrated:
OS X already does it, and its ARP packets were still dropped...)
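To make the difference between the two approaches concrete, here is a minimal user-space sketch. It is not kernel code and does not use the kernel's ARP API: `toy_cache_update`, `toy_cache_lookup`, `toy_invalidate_node` and `toy_solicit_on_probe` are hypothetical names for a toy cache that, like firewire-net, keeps per-node data (here only the unicast FIFO) learned from ARP packets. Strategy A drops a node's entries when it is removed, so the next lookup misses and forces a fresh ARP request. Strategy B only helps if the solicited reply actually arrives; if it is lost, the stale entry survives.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical toy neighbour cache; NOT the kernel's ARP implementation. */
#define TOY_CACHE_SIZE 8

struct toy_entry {
	bool     valid;
	uint32_t ip;    /* IPv4 address, host byte order */
	uint64_t guid;  /* EUI-64 of the remote 1394 node */
	uint64_t fifo;  /* 48-bit unicast_FIFO learned from the node's ARP packet */
};

static struct toy_entry toy_cache[TOY_CACHE_SIZE];

static void toy_cache_reset(void)
{
	memset(toy_cache, 0, sizeof(toy_cache));
}

/* Insert or overwrite the entry for @ip, as receiving an ARP reply would.
 * If the cache is full and @ip is not present, the update is dropped. */
static void toy_cache_update(uint32_t ip, uint64_t guid, uint64_t fifo)
{
	struct toy_entry *slot = NULL;

	for (int i = 0; i < TOY_CACHE_SIZE; i++) {
		if (toy_cache[i].valid && toy_cache[i].ip == ip) {
			slot = &toy_cache[i];  /* existing entry wins */
			break;
		}
		if (!toy_cache[i].valid && slot == NULL)
			slot = &toy_cache[i];  /* remember first free slot */
	}
	if (slot != NULL)
		*slot = (struct toy_entry){ true, ip, guid, fifo };
}

/* Returns true and stores the cached FIFO if @ip has a valid entry.
 * A miss is what forces a fresh ARP request on the next transmit. */
static bool toy_cache_lookup(uint32_t ip, uint64_t *fifo)
{
	assert(fifo != NULL);
	for (int i = 0; i < TOY_CACHE_SIZE; i++) {
		if (toy_cache[i].valid && toy_cache[i].ip == ip) {
			*fifo = toy_cache[i].fifo;
			return true;
		}
	}
	return false;
}

/* Strategy A (the idea behind PATCH 2/3): on node removal, drop every
 * entry that belongs to that node. Returns the number of entries dropped. */
static int toy_invalidate_node(uint64_t guid)
{
	int dropped = 0;

	for (int i = 0; i < TOY_CACHE_SIZE; i++) {
		if (toy_cache[i].valid && toy_cache[i].guid == guid) {
			toy_cache[i].valid = false;
			dropped++;
		}
	}
	return dropped;
}

/* Strategy B (soliciting on probe): the reply refreshes the entry, but only
 * if it is delivered; @reply_delivered models whether it gets through. */
static void toy_solicit_on_probe(uint32_t ip, uint64_t guid,
				 uint64_t new_fifo, bool reply_delivered)
{
	if (reply_delivered)
		toy_cache_update(ip, guid, new_fifo);
}
```

In short, strategy A does not depend on any packet reaching either side in time, while strategy B inherits every timing problem of the reply path.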


> 
> The question is, is the link-layer driver firewire-net a proper place
> to call arp_send() and ndisc_send_ns()?
> 
> And is this any better than a new arp_invalidate() and
> ndisc_invalidate()?
That is what I am not sure about at all.
I could bypass arp_send() and just create a 1394 ARP packet and send it
using fw_request.
But doing it the way I did also seemed quite simple.
It is protocol dependent, but that is firewire's fault, not mine.
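Bypassing arp_send() would mean building the RFC 2734 ARP payload by hand. The sketch below only fills a 32-byte buffer in user space, assuming the RFC 2734 layout: the usual 8-byte ARP header with hardware type 0x0018 and a 16-byte hardware address length, followed by the sender's EUI-64, max_rec, sspd and 48-bit unicast FIFO, the sender IP and the target IP (there is no target hardware address field). `build_arp1394` and the `ARP1394_*` constants are hypothetical names; the 1394 encapsulation header and the actual transmission through the firewire core are not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ARP1394_HW_TYPE  0x0018  /* IEEE 1394 hardware type */
#define ARP1394_PROTO_IP 0x0800  /* protocol being resolved: IPv4 */
#define ARP1394_HW_LEN   16      /* unique_ID + max_rec + sspd + unicast_FIFO */
#define ARP1394_IP_LEN   4
#define ARP1394_PKT_LEN  32

/* Big-endian (network order) stores. */
static void put_be16(uint8_t *p, uint16_t v)
{
	p[0] = (uint8_t)(v >> 8);
	p[1] = (uint8_t)v;
}

static void put_be32(uint8_t *p, uint32_t v)
{
	put_be16(p, (uint16_t)(v >> 16));
	put_be16(p + 2, (uint16_t)v);
}

static void put_be64(uint8_t *p, uint64_t v)
{
	put_be32(p, (uint32_t)(v >> 32));
	put_be32(p + 4, (uint32_t)v);
}

/* Fills @out with an RFC 2734 ARP packet (no encapsulation header) and
 * returns its length. @fifo is the 48-bit sender_unicast_FIFO address. */
static size_t build_arp1394(uint8_t out[ARP1394_PKT_LEN], uint16_t opcode,
			    uint64_t guid, uint8_t max_rec, uint8_t sspd,
			    uint64_t fifo, uint32_t sender_ip,
			    uint32_t target_ip)
{
	assert(fifo <= 0xffffffffffffULL);  /* FIFO address is 48 bits */

	put_be16(out + 0, ARP1394_HW_TYPE);
	put_be16(out + 2, ARP1394_PROTO_IP);
	out[4] = ARP1394_HW_LEN;
	out[5] = ARP1394_IP_LEN;
	put_be16(out + 6, opcode);                  /* 1 = request, 2 = reply */
	put_be64(out + 8, guid);                    /* sender_unique_ID (EUI-64) */
	out[16] = max_rec;                          /* sender_max_rec */
	out[17] = sspd;                             /* speed code, e.g. 2 = S400 */
	put_be16(out + 18, (uint16_t)(fifo >> 32)); /* sender_unicast_FIFO_hi */
	put_be32(out + 20, (uint32_t)fifo);         /* sender_unicast_FIFO_lo */
	put_be32(out + 24, sender_ip);              /* sender_IP_address */
	put_be32(out + 28, target_ip);              /* target_IP_address */
	return ARP1394_PKT_LEN;
}
```

Whether such a packet is built by arp_send() or by hand, it is still IPv4-specific, which is the protocol dependence mentioned above.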

> 
> ----
> 
> On a loosely related note, after looking at 1394 AR and at NDP,
> shouldn't we rather set
> 	net_device.addr_len = 16
> and
> 	net_device.dev_addr = concatenation of EUI-64, max_rec, spd,
> 	                      and unicast_FIFO
> ?
The problem is that, except for the GUID, the rest can change,
and hardware addresses should be fixed.
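To illustrate the objection, a sketch of the 16-byte address proposed above, packed as EUI-64 | max_rec | spd | unicast_FIFO in big-endian order. `pack_hwaddr16` and `same_node` are hypothetical helpers. Two packings for the same node can differ byte-wise once max_rec, spd or the FIFO change, while their first 8 bytes (the EUI-64) still match, so only that prefix works as a stable identity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define HWADDR16_LEN 16

/* Hypothetical packing: EUI-64 (8) | max_rec (1) | spd (1) | FIFO (6). */
static void pack_hwaddr16(uint8_t out[HWADDR16_LEN], uint64_t guid,
			  uint8_t max_rec, uint8_t spd, uint64_t fifo)
{
	assert(fifo <= 0xffffffffffffULL);  /* FIFO address is 48 bits */
	for (int i = 0; i < 8; i++)
		out[i] = (uint8_t)(guid >> (56 - 8 * i));
	out[8] = max_rec;
	out[9] = spd;
	for (int i = 0; i < 6; i++)
		out[10 + i] = (uint8_t)(fifo >> (40 - 8 * i));
}

/* Only the EUI-64 is fixed; the remaining 8 bytes may change,
 * so node identity is decided by the first 8 bytes alone. */
static bool same_node(const uint8_t a[HWADDR16_LEN],
		      const uint8_t b[HWADDR16_LEN])
{
	return memcmp(a, b, 8) == 0;
}
```

In an RFC 2734 ARP packet the same 16 bytes follow the 8-byte ARP header, so the two representations carry identical information; the disagreement is only about whether the mutable part belongs in dev_addr.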


Best regards,
	Maxim Levitsky



Thread overview: 8+ messages
2010-11-26 17:38 [Q] How to invalidate ARP cache for a network device from within kernel Maxim Levitsky
2010-11-27  1:18 ` Stefan Richter
2010-11-27  1:25   ` Maxim Levitsky
2010-11-27  8:33     ` Stefan Richter
2010-11-27 15:19       ` Stefan Richter
2010-11-27 15:44         ` Maxim Levitsky
2010-11-27 14:13     ` Stefan Richter
2010-11-27 14:33       ` Maxim Levitsky [this message]
