public inbox for linux-kernel@vger.kernel.org
* What does "Neighbour table overflow" message indicate?
@ 2001-07-29  1:23 Steve Snyder
       [not found] ` <20010729133848.A3254@weta.f00f.org>
  2001-07-29  5:41 ` Riley Williams
  0 siblings, 2 replies; 11+ messages in thread
From: Steve Snyder @ 2001-07-29  1:23 UTC (permalink / raw)
  To: linux-kernel

I just got this sequence of messages in my system log:

Jul 28 19:47:44 sunburn kernel: Neighbour table overflow.
Jul 28 19:47:44 sunburn last message repeated 9 times
Jul 28 19:47:49 sunburn kernel: NET: 53 messages suppressed.
Jul 28 19:47:49 sunburn kernel: Neighbour table overflow.
Jul 28 19:48:07 sunburn kernel: NET: 21 messages suppressed.
Jul 28 19:48:07 sunburn kernel: Neighbour table overflow.
Jul 28 19:48:09 sunburn last message repeated 3 times
Jul 28 19:48:14 sunburn kernel: NET: 4 messages suppressed.
Jul 28 19:48:14 sunburn kernel: Neighbour table overflow.

This is on a RedHat v7.1 + SMP kernel v2.4.7 system.  What is the kernel 
trying to tell me here?

Please cc me as I am not a subscriber to this list.

Thanks.


* Re: What does "Neighbour table overflow" message indicate?
       [not found] ` <20010729133848.A3254@weta.f00f.org>
@ 2001-07-29  1:53   ` Steve Snyder
  2001-07-29  1:57     ` Chris Wedgwood
  2001-07-30 12:38     ` Carlos O'Donell Jr.
  0 siblings, 2 replies; 11+ messages in thread
From: Steve Snyder @ 2001-07-29  1:53 UTC (permalink / raw)
  To: linux-kernel; +Cc: Chris Wedgwood

No, and no errors are shown for it either:

# ifconfig lo
lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:196907 errors:0 dropped:0 overruns:0 frame:0
          TX packets:196907 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0

All *seems* well.  Just that 30-second period of messages and then silence.

Thanks for the response.


On Saturday 28 July 2001 08:38 pm, you wrote:
> is lo down?
>
>   --cw
>
> On Sat, Jul 28, 2001 at 08:23:14PM -0500, Steve Snyder wrote:
>     I just got this sequence of messages in my system log:
>
>     Jul 28 19:47:44 sunburn kernel: Neighbour table overflow.
>     Jul 28 19:47:44 sunburn last message repeated 9 times
>     Jul 28 19:47:49 sunburn kernel: NET: 53 messages suppressed.
>     Jul 28 19:47:49 sunburn kernel: Neighbour table overflow.
>     Jul 28 19:48:07 sunburn kernel: NET: 21 messages suppressed.
>     Jul 28 19:48:07 sunburn kernel: Neighbour table overflow.
>     Jul 28 19:48:09 sunburn last message repeated 3 times
>     Jul 28 19:48:14 sunburn kernel: NET: 4 messages suppressed.
>     Jul 28 19:48:14 sunburn kernel: Neighbour table overflow.
>
>     This is on a RedHat v7.1 + SMP kernel v2.4.7 system.  What is the
> kernel trying to tell me here?
>
>     Please cc me as I am not a subscriber to this list.
>
>     Thanks.


* Re: What does "Neighbour table overflow" message indicate?
  2001-07-29  1:53   ` Steve Snyder
@ 2001-07-29  1:57     ` Chris Wedgwood
  2001-07-29  2:15       ` Steve Snyder
  2001-07-30 12:38     ` Carlos O'Donell Jr.
  1 sibling, 1 reply; 11+ messages in thread
From: Chris Wedgwood @ 2001-07-29  1:57 UTC (permalink / raw)
  To: Steve Snyder; +Cc: linux-kernel

On Sat, Jul 28, 2001 at 08:53:48PM -0500, Steve Snyder wrote:

    No, and no errors are shown for it either:

    # ifconfig lo
    lo        Link encap:Local Loopback
              inet addr:127.0.0.1  Mask:255.0.0.0
              UP LOOPBACK RUNNING  MTU:16436  Metric:1
              RX packets:196907 errors:0 dropped:0 overruns:0 frame:0
              TX packets:196907 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:0

    All *seems* well.  Just that 30-second period of messages and then
    silence.


What is the machine doing?  What kind of network is it attached to and
with how many hosts on it?



  --cw


* Re: What does "Neighbour table overflow" message indicate?
  2001-07-29  1:57     ` Chris Wedgwood
@ 2001-07-29  2:15       ` Steve Snyder
  2001-07-29  9:08         ` Eric W. Biederman
                           ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Steve Snyder @ 2001-07-29  2:15 UTC (permalink / raw)
  To: Chris Wedgwood; +Cc: linux-kernel

On Saturday 28 July 2001 08:57 pm, Chris Wedgwood wrote:
> On Sat, Jul 28, 2001 at 08:53:48PM -0500, Steve Snyder wrote:
>
>     No, and no errors are shown for it either:
>
>     # ifconfig lo
>     lo        Link encap:Local Loopback
>               inet addr:127.0.0.1  Mask:255.0.0.0
>               UP LOOPBACK RUNNING  MTU:16436  Metric:1
>               RX packets:196907 errors:0 dropped:0 overruns:0 frame:0
>               TX packets:196907 errors:0 dropped:0 overruns:0 carrier:0
>               collisions:0 txqueuelen:0
>
>     All *seems* well.  Just that 30-second period of messages and then
>     silence.
>
>
> What is the machine doing?  What kind of network is it attached to and
> with how many hosts on it?

It is a server for a small LAN.  Interfaces: eth0=LAN, eth1=cable modem.  I 
believe that I was playing Quake3 (multi-player across internet) on one of 
the LAN's client machines when the messages were logged.  No corresponding 
messages are seen in the client's (another RHL v7.1 box) system log, but 
then, it's not running iptables.

Further snooping shows the error msg text in file linux/net/ipv4/route.c:

    if (net_ratelimit())
        printk("Neighbour table overflow.\n");

The reference to "net_ratelimit" makes me wonder if it is related to 
iptables.  I am using iptables, and have since kernel 2.4.1, but I've seen 
these messages before.  Hmmm.


* Re: What does "Neighbour table overflow" message indicate?
  2001-07-29  1:23 What does "Neighbour table overflow" message indicate? Steve Snyder
       [not found] ` <20010729133848.A3254@weta.f00f.org>
@ 2001-07-29  5:41 ` Riley Williams
  2001-07-29 13:50   ` jeff millar
  1 sibling, 1 reply; 11+ messages in thread
From: Riley Williams @ 2001-07-29  5:41 UTC (permalink / raw)
  To: Steve Snyder; +Cc: Linux Kernel

Hi Steve.

 > I just got this sequence of messages in my system log:
 >
 > Jul 28 19:47:44 sunburn kernel: Neighbour table overflow.
 > Jul 28 19:47:44 sunburn last message repeated 9 times
 > Jul 28 19:47:49 sunburn kernel: NET: 53 messages suppressed.
 > Jul 28 19:47:49 sunburn kernel: Neighbour table overflow.
 > Jul 28 19:48:07 sunburn kernel: NET: 21 messages suppressed.
 > Jul 28 19:48:07 sunburn kernel: Neighbour table overflow.
 > Jul 28 19:48:09 sunburn last message repeated 3 times
 > Jul 28 19:48:14 sunburn kernel: NET: 4 messages suppressed.
 > Jul 28 19:48:14 sunburn kernel: Neighbour table overflow.
 >
 > This is on a RedHat v7.1 + SMP kernel v2.4.7 system.  What is
 > the kernel trying to tell me here?
 >
 > Please cc me as I am not a subscriber to this list.

This could be on completely the wrong track, but here's one of the
entries from the 2.4.5 kernel's Configure.help file (I don't yet have
2.4.7 on my system):

 Q> ARP daemon support (EXPERIMENTAL)
 Q> CONFIG_ARPD
 Q>   Normally, the kernel maintains an internal cache which maps IP
 Q>   addresses to hardware addresses on the local network, so that
 Q>   Ethernet/Token Ring/ etc. frames are sent to the proper address
 Q>   on the physical networking layer. For small networks having a
 Q>   few hundred directly connected hosts or less, keeping this
 Q>   address resolution (ARP) cache inside the kernel works well.
 Q>
 Q>   However, maintaining an internal ARP cache does not work well
 Q>   for very large switched networks, and will use a lot of kernel
 Q>   memory if TCP/IP connections are made to many machines on the
 Q>   network.
 Q>
 Q>   If you say Y here, the kernel's internal ARP cache will never
 Q>   grow to more than 256 entries (the oldest entries are expired
 Q>   in a LIFO manner) and communication will be attempted with the
 Q>   user space ARP daemon arpd. Arpd then answers the address
 Q>   resolution request either from its own cache or by asking the
 Q>   net.
 Q>
 Q>   This code is experimental and also obsolete. If you want to
 Q>   use it, you need to find a version of the daemon arpd on the
 Q>   net somewhere, and you should also say Y to "Kernel/User
 Q>   network link driver", below. If unsure, say N.

The text in there looks suspiciously related to your problem to me.

Best wishes from Riley.



* Re: What does "Neighbour table overflow" message indicate?
  2001-07-29  2:15       ` Steve Snyder
@ 2001-07-29  9:08         ` Eric W. Biederman
  2001-07-29  9:46         ` Kurt Roeckx
  2001-07-29 13:55         ` Bernd Eckenfels
  2 siblings, 0 replies; 11+ messages in thread
From: Eric W. Biederman @ 2001-07-29  9:08 UTC (permalink / raw)
  To: swsnyder; +Cc: Chris Wedgwood, linux-kernel

> Further snooping shows the error msg text in file linux/net/ipv4/route.c:
> 
>     if (net_ratelimit())
>         printk("Neighbour table overflow.\n");

> 
> The reference to "net_ratelimit" makes me wonder if it is related to 
> iptables.  I am using iptables, and have since kernel 2.4.1, but I've seen 
> these messages before.  Hmmm.

My experience with this is that the message occurs when a machine starts
ARPing for a nonexistent IP address.  I suspect net_ratelimit triggers
when there are too many ARPs.

Run tcpdump -n -i eth0 (assuming your network is on eth0) and see if you
see an arp request that never gets answered.

Eric


* Re: What does "Neighbour table overflow" message indicate?
  2001-07-29  2:15       ` Steve Snyder
  2001-07-29  9:08         ` Eric W. Biederman
@ 2001-07-29  9:46         ` Kurt Roeckx
  2001-07-29 13:55         ` Bernd Eckenfels
  2 siblings, 0 replies; 11+ messages in thread
From: Kurt Roeckx @ 2001-07-29  9:46 UTC (permalink / raw)
  To: Steve Snyder; +Cc: Chris Wedgwood, linux-kernel

On Sat, Jul 28, 2001 at 09:15:11PM -0500, Steve Snyder wrote:
> 
> Further snooping shows the error msg text in file linux/net/ipv4/route.c:
> 
>     if (net_ratelimit())
>         printk("Neighbour table overflow.\n");
> 
> The reference to "net_ratelimit" makes me wonder if it is related to 
> iptables.  I am using iptables, and have since kernel 2.4.1, but I've seen 
> these messages before.  Hmmm.

net_ratelimit() is there to only log something every 5 seconds,
so your logs don't get flooded.  It should be used for every
printk that has to do with net.

See net/core/utils.c
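
For illustration, here is a minimal user-space sketch of that kind of
limiter.  It is not the kernel's code: the struct, the function names,
and the numbers in the usage note (a burst of 10 messages, then one
message per 5 seconds) are assumptions chosen to match the pattern in
the log above, where 10 messages get through and later ones are
reported as suppressed.

```c
/* Hypothetical token-bucket rate limiter, modelled in user space. */
struct ratelimit {
    unsigned long cost;    /* credit consumed by one allowed message */
    unsigned long burst;   /* maximum credit that can accumulate */
    unsigned long tokens;  /* current credit */
    unsigned long last;    /* time of the previous call */
    unsigned int missed;   /* messages dropped since the last allowed one */
};

void ratelimit_init(struct ratelimit *rl, unsigned long cost,
                    unsigned long burst, unsigned long now)
{
    rl->cost = cost;
    rl->burst = burst;
    rl->tokens = burst;    /* start with a full bucket */
    rl->last = now;
    rl->missed = 0;
}

/* Returns 1 if the caller may log now, 0 if the message must be dropped.
 * On success, *suppressed receives the number of messages dropped since
 * the previous allowed one. */
int ratelimit_allow(struct ratelimit *rl, unsigned long now,
                    unsigned int *suppressed)
{
    rl->tokens += now - rl->last;      /* earn credit for elapsed time */
    rl->last = now;
    if (rl->tokens > rl->burst)
        rl->tokens = rl->burst;        /* cap the accumulated credit */

    if (rl->tokens >= rl->cost) {
        *suppressed = rl->missed;
        rl->missed = 0;
        rl->tokens -= rl->cost;
        return 1;
    }

    rl->missed++;
    *suppressed = 0;
    return 0;
}
```

With cost = 5 and burst = 50 (both in seconds), ten calls at t=0 are
allowed, further calls at t=0 are counted as missed, and the next call
at t=5 is allowed again and reports how many were dropped in between.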


Kurt



* Re: What does "Neighbour table overflow" message indicate?
  2001-07-29  5:41 ` Riley Williams
@ 2001-07-29 13:50   ` jeff millar
  0 siblings, 0 replies; 11+ messages in thread
From: jeff millar @ 2001-07-29 13:50 UTC (permalink / raw)
  To: Riley Williams, Steve Snyder; +Cc: Linux Kernel

We used to get this from an embedded PowerPC processor under 2.2.x when the
hardware to device driver interface got screwed up.

jeff

----- Original Message -----
From: "Riley Williams" <rhw@MemAlpha.CX>
To: "Steve Snyder" <swsnyder@home.com>
Cc: "Linux Kernel" <linux-kernel@vger.kernel.org>
Sent: Sunday, July 29, 2001 1:41 AM
Subject: Re: What does "Neighbour table overflow" message indicate?


> Hi Steve.
>
>  > I just got this sequence of messages in my system log:
>  >
>  > Jul 28 19:47:44 sunburn kernel: Neighbour table overflow.
>  > Jul 28 19:47:44 sunburn last message repeated 9 times
>  > Jul 28 19:47:49 sunburn kernel: NET: 53 messages suppressed.
>  > Jul 28 19:47:49 sunburn kernel: Neighbour table overflow.
>  > Jul 28 19:48:07 sunburn kernel: NET: 21 messages suppressed.
>  > Jul 28 19:48:07 sunburn kernel: Neighbour table overflow.
>  > Jul 28 19:48:09 sunburn last message repeated 3 times
>  > Jul 28 19:48:14 sunburn kernel: NET: 4 messages suppressed.
>  > Jul 28 19:48:14 sunburn kernel: Neighbour table overflow.
>  >
>  > This is on a RedHat v7.1 + SMP kernel v2.4.7 system.  What is
>  > the kernel trying to tell me here?
>  >
>  > Please cc me as I am not a subscriber to this list.
>
> This could be on completely the wrong track, but here's one of the
> entries from the 2.4.5 kernel's Configure.help file (I don't yet have
> 2.4.7 on my system):
>
>  Q> ARP daemon support (EXPERIMENTAL)
>  Q> CONFIG_ARPD
>  Q>   Normally, the kernel maintains an internal cache which maps IP
>  Q>   addresses to hardware addresses on the local network, so that
>  Q>   Ethernet/Token Ring/ etc. frames are sent to the proper address
>  Q>   on the physical networking layer. For small networks having a
>  Q>   few hundred directly connected hosts or less, keeping this
>  Q>   address resolution (ARP) cache inside the kernel works well.
>  Q>
>  Q>   However, maintaining an internal ARP cache does not work well
>  Q>   for very large switched networks, and will use a lot of kernel
>  Q>   memory if TCP/IP connections are made to many machines on the
>  Q>   network.
>  Q>
>  Q>   If you say Y here, the kernel's internal ARP cache will never
>  Q>   grow to more than 256 entries (the oldest entries are expired
>  Q>   in a LIFO manner) and communication will be attempted with the
>  Q>   user space ARP daemon arpd. Arpd then answers the address
>  Q>   resolution request either from its own cache or by asking the
>  Q>   net.
>  Q>
>  Q>   This code is experimental and also obsolete. If you want to
>  Q>   use it, you need to find a version of the daemon arpd on the
>  Q>   net somewhere, and you should also say Y to "Kernel/User
>  Q>   network link driver", below. If unsure, say N.
>
> The text in there looks suspiciously related to your problem to me.
>
> Best wishes from Riley.
>



* Re: What does "Neighbour table overflow" message indicate?
  2001-07-29  2:15       ` Steve Snyder
  2001-07-29  9:08         ` Eric W. Biederman
  2001-07-29  9:46         ` Kurt Roeckx
@ 2001-07-29 13:55         ` Bernd Eckenfels
  2 siblings, 0 replies; 11+ messages in thread
From: Bernd Eckenfels @ 2001-07-29 13:55 UTC (permalink / raw)
  To: linux-kernel

In article <01072821151103.01125@mercury.snydernet.lan> you wrote:
>    if (net_ratelimit())
>        printk("Neighbour table overflow.\n");

> The reference to "net_ratelimit" makes me wonder if it is related to 
> iptables.  I am using iptables, and have since kernel 2.4.1, but I've seen 
> these messages before.  Hmmm.

Net ratelimit is used to limit the rate of messages or actions done by the
network module. In this case it only ensures that the printk message is not
printed too often. The actual condition that triggers the message is above
this if.

Greetings
Bernd


* Re: What does "Neighbour table overflow" message indicate?
  2001-07-29  1:53   ` Steve Snyder
  2001-07-29  1:57     ` Chris Wedgwood
@ 2001-07-30 12:38     ` Carlos O'Donell Jr.
  2001-07-30 23:28       ` Rob Landley
  1 sibling, 1 reply; 11+ messages in thread
From: Carlos O'Donell Jr. @ 2001-07-30 12:38 UTC (permalink / raw)
  To: linux-kernel; +Cc: swsnyder


> network module. In this case it only ensures that the printk message is not
> printed too often. The actual condition that triggers the message is above
> this if.
> 
> Greetings
> Bernd
> -

Snyder,

Just by looking at your email, @home, I can guess that your
cable modem is connected to an HFC Cable network segment.

In general these segments are extremely large and due to the
nature of the users, can cause large amounts of arp broadcast
traffic during peak times.

The message you are seeing is directly related to your arp cache
overflowing.

I've seen this message during high traffic hours on my 2.2.x 
firewall.

Things to check:

- Is your netmask set correctly?
- How many hosts are on your segment?
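
To put numbers on the second question, here is a small sketch that
counts the usable host addresses implied by an IPv4 netmask.  The
function name is made up for this example, and it assumes a contiguous
mask given in host byte order.

```c
#include <stdint.h>

/* Usable host addresses in the subnet described by `mask` (host byte
 * order).  Assumes a contiguous netmask; the network and broadcast
 * addresses are excluded from the count. */
unsigned long hosts_for_netmask(uint32_t mask)
{
    uint32_t host_bits = ~mask;   /* all-ones value of the host part */

    if (host_bits < 2)            /* /31 and /32 leave no ordinary hosts */
        return 0;

    /* (host_bits + 1) addresses in total, minus network and broadcast. */
    return (unsigned long)host_bits - 1;
}
```

A 255.255.255.0 mask gives 254 hosts.  A 255.255.224.0 mask, used here
only as an example of a large segment, gives 8190, well above the
gc_thresh3 value of 1024 quoted further down.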

======================================================
Why the kernel spat what it spat : blow by blow
======================================================

N.B. Using 2.4.7 Kernel Source.

I think the critical point is:

In route.c:

639: int err = arp_bind_neighbour(&rt->u.dst);
640:                if (err) {
...			[snip]

Which means that if the binding of an arp neighbour fails, then
we head down the path toward that printk that has
caused us so much distress.

In arp.c, we look for "arp_bind_neighbour" and find it on line 429:

Right off the bat, we hope that:

434:        if (dev == NULL)
435:                return -EINVAL;

Isn't the case :)

Unless it's already bound, the next check is true...

436:        if (n == NULL) {

And the only return that is non-zero is from:

440:                n = __neigh_lookup_errno(
441:#ifdef CONFIG_ATM_CLIP
442:                    dev->type == ARPHRD_ATM ? &clip_tbl :
443:#endif
444:                    &arp_tbl, &nexthop, dev);
445:                if (IS_ERR(n)) 
446:                        return PTR_ERR(n);

So __neigh_lookup_errno is the culprit...

In ./include/net/neighbour.h we have the function defined:

266:static inline struct neighbour *
267:__neigh_lookup_errno(struct neigh_table *tbl, const void *pkey,
268:struct net_device *dev)
...
275:       return neigh_create(tbl, pkey, dev);

This is the interesting point: since our table is overflowing, we
need to find the point where the entry is created :)

Off we go to line 288 in ./net/core/neighbour.c:
(I love to trace source!)

296:        n = neigh_alloc(tbl);
297:        if (n == NULL)
298:                return ERR_PTR(-ENOBUFS);

Hrm... -ENOBUFS :)

In neigh_alloc, same file:

235:         if (tbl->entries > tbl->gc_thresh3 ||
236:            (tbl->entries > tbl->gc_thresh2 &&
237:             now - tbl->last_flush > 5*HZ)) {
238:                if (neigh_forced_gc(tbl) == 0 &&
239:                    tbl->entries > tbl->gc_thresh3)
240:                        return NULL;
241:        }

So if the cache grows faster than the garbage collector can reclaim
entries, and the entry count climbs past gc_thresh3, neigh_alloc
returns NULL, neigh_create returns -ENOBUFS, and we get the table
overflow message.
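
The decision can be modelled in user space to make the two thresholds
easier to see.  This is a sketch under stated assumptions, not the
kernel code: the struct, the `reclaimable` field, and both function
names are hypothetical, and the forced GC is reduced to "free whatever
is currently reclaimable".

```c
/* Hypothetical user-space model of a neighbour table. */
struct neigh_table_model {
    int entries;               /* entries currently in the table */
    int gc_thresh2;            /* soft limit */
    int gc_thresh3;            /* hard limit */
    unsigned long last_flush;  /* time of the last forced GC, in ticks */
    int reclaimable;           /* assumption: entries a forced GC can free */
};

/* Model of a forced GC: frees every reclaimable entry, returns the count. */
int model_forced_gc(struct neigh_table_model *tbl, unsigned long now)
{
    int freed = tbl->reclaimable;

    tbl->entries -= freed;
    tbl->reclaimable = 0;
    tbl->last_flush = now;
    return freed;
}

/* Returns 1 if a new entry was added, 0 on "table overflow".
 * `hz` is the number of ticks per second. */
int model_neigh_alloc(struct neigh_table_model *tbl, unsigned long now,
                      unsigned long hz)
{
    if (tbl->entries > tbl->gc_thresh3 ||
        (tbl->entries > tbl->gc_thresh2 &&
         now - tbl->last_flush > 5 * hz)) {
        /* Over a limit: try to reclaim.  Refuse only if nothing could
         * be freed and the table is still above the hard limit. */
        if (model_forced_gc(tbl, now) == 0 &&
            tbl->entries > tbl->gc_thresh3)
            return 0;
    }

    tbl->entries++;
    return 1;
}
```

In this model a table sitting at 1025 entries with nothing reclaimable
refuses the next allocation, while the same table with 10 reclaimable
entries frees them and accepts it.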

Can you make the tables bigger?
What type of impact does this have?
Should we be asking @Home to make segments smaller? 
(Probably not possible)

In ./net/ipv4/arp.c you could change the GC parameters...
I'm not sure how they were tuned.

Line 187:
        gc_interval:    30 * HZ,
        gc_thresh1:     128,
        gc_thresh2:     512,
        gc_thresh3:     1024,
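
On kernels that expose these thresholds as sysctl files, the current
value can be read without touching the source.  The sketch below is a
plain helper that reads one integer from a file; the helper name is
hypothetical, and the path /proc/sys/net/ipv4/neigh/default/gc_thresh3
is an assumption that should be checked on the running system.

```c
#include <stdio.h>

/* Read one decimal integer from a file such as a /proc/sys entry.
 * Returns 0 on success and stores the value in *out; returns -1 if the
 * file cannot be opened or does not start with a number. */
int read_int_from_file(const char *path, long *out)
{
    FILE *f = fopen(path, "r");
    long value;
    int ok;

    if (f == NULL)
        return -1;

    ok = (fscanf(f, "%ld", &value) == 1);
    fclose(f);

    if (!ok)
        return -1;

    *out = value;
    return 0;
}
```

Calling read_int_from_file() with the assumed gc_thresh3 path and
comparing the result with the number of hosts on the segment shows
whether the hard limit is likely to be hit.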

Hrm... just pondering.


=================================================================

Cheers,
Carlos O'Donell Jr.


* Re: What does "Neighbour table overflow" message indicate?
  2001-07-30 12:38     ` Carlos O'Donell Jr.
@ 2001-07-30 23:28       ` Rob Landley
  0 siblings, 0 replies; 11+ messages in thread
From: Rob Landley @ 2001-07-30 23:28 UTC (permalink / raw)
  To: Carlos O'Donell Jr., linux-kernel; +Cc: swsnyder

On Monday 30 July 2001 08:38, Carlos O'Donell Jr. wrote:
> > network module. In this case it only ensures that the printk message is
> > not printed too often. The actual condition that triggers the message is
> > above this if.
> >
> > Greetings
> > Bernd
> > -
>
> Snyder,
>
> Just by looking at your email, @home, I can guess that your
> cable modem is connected to an HFC Cable network segment.

Random datapoint: it does this on Road Runner cable modems in Austin, too 
(2.2.17 or so did, anyway.  Haven't had a monitor physically hooked up to my 
old 486 gateway since I moved it to 2.4).  Basically any cable modem that 
acts like a hub instead of a router (letting all packets through instead of 
just the ones for the MAC address it's connected to).

I made it go away somehow (~6 months back).  Possibly deselecting all the ARP 
stuff when I recompiled, I honestly don't remember at this point.  2.4 never 
did it (that I saw), but I compiled that sucker myself and don't log into my 
gateway much (it's a dumb ip masquerader running no daemons) so...

> In general these segments are extremely large and due to the
> nature of the users, can cause large amounts of arp broadcast
> traffic during peak times.

Forget peak times.  It did this 24/7 for me.

Bloated the logs something fierce.  I vaguely remember looking in the source 
and finding the message and going "so why doesn't it just delete something 
out of the neighbor table then?"  Never did figure that one out.  (It 
happened when the machine wasn't actually in USE, just connected to the net 
but otherwise idle.  No connections masquerading through it, nothing...)

Rob


