* Virtual device and ARP table
@ 2010-06-07 10:21 Christophe Jelger
  2010-06-07 12:22 ` Eric Dumazet
  0 siblings, 1 reply; 5+ messages in thread
From: Christophe Jelger @ 2010-06-07 10:21 UTC (permalink / raw)
  To: linux-kernel

Hello,

I am currently "resurrecting" a Linux module (called LUNAR) which I 
co-developed in 2007 and I'm having a weird kernel crash. This code 
basically used to work fine up to 2.6.18 which was the latest version 
before we stopped our development. I quickly ported it to 2.6.{31,32}: 
it compiles fine and loads fine, but it crashes/hangs the kernel when 
it's really being used.

The module is a virtual device used for MANET routing: with the current 
version, it basically "captures" DNS requests sent to the virtual 
interface --> this triggers the sending of a fake DNS reply (see below) 
and the creation of an ARP table entry for the destination (the MANET 
route is built at the same time). Packets can then be sent to the 
destination.

The problem I'm having is that the kernel quickly hangs after I create a 
new ARP entry (actually only if it's being used). If the entry I create 
is set to NUD_PERMANENT, then everything works fine! I use 
__neigh_lookup_errno to lookup/create the entry and neigh_lookup to 
set/update the MAC address. Note that the ARP entry is created without 
problem, but typically even just doing a userspace "arp -a" command can 
crash the kernel (it also hangs the userspace command!). Doing "arp -na" 
usually does NOT crash the kernel!
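
A possible clue for the "arp -a" versus "arp -na" difference: without -n, arp tries to show symbolic host names, which means a reverse name lookup per entry, and such lookups could reach a module that intercepts DNS traffic. With -n, addresses are printed numerically and no lookup is made. The userspace sketch below only illustrates that distinction with getnameinfo(); the helper name format_addr is hypothetical and this is not arp's actual code.

```c
#define _POSIX_C_SOURCE 200112L

#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: turn a dotted-quad IPv4 address into display text.
 * numeric_only != 0 mimics "arp -n": the resolver is never consulted.
 * numeric_only == 0 mimics "arp -a": a reverse lookup may be issued.
 * Returns 0 on success and a nonzero value on failure.
 */
static int format_addr(const char *dotted, int numeric_only,
                       char *out, size_t outlen)
{
    struct sockaddr_in sa;

    memset(&sa, 0, sizeof(sa));
    sa.sin_family = AF_INET;
    if (inet_pton(AF_INET, dotted, &sa.sin_addr) != 1)
        return 1; /* not a valid IPv4 address */

    return getnameinfo((struct sockaddr *)&sa, sizeof(sa),
                       out, (socklen_t)outlen, NULL, 0,
                       numeric_only ? NI_NUMERICHOST : 0);
}
```

Calling format_addr() with numeric_only set to 0 on an address routed through the virtual interface would be one way to check whether name resolution alone is enough to trigger the hang.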

I guess the problem comes from a combination of ARP + DNS 
lookups/replies. Note that my kernel module has its own internal fake 
DNS server which captures lookups and sends replies directly back to the 
stack. What is amazing: if the ARP entry I create is set to 
NUD_PERMANENT, then I don't get any crash (however I cannot develop my 
module with permanent ARP entries).
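
One difference that may be relevant here: NUD_PERMANENT is not among the states for which the neighbour core keeps a per-entry timer running, so a permanent entry never goes through the timer-driven state machine. The sketch below reproduces the state bits and masks as they are defined in the kernel headers (include/linux/neighbour.h and include/net/neighbour.h) in a userspace form; the helper functions nud_in_timer and nud_valid are hypothetical names. It does not identify the bug, it only narrows down where to look.

```c
/* NUD state bits, with the values from include/linux/neighbour.h. */
#define NUD_INCOMPLETE 0x01
#define NUD_REACHABLE  0x02
#define NUD_STALE      0x04
#define NUD_DELAY      0x08
#define NUD_PROBE      0x10
#define NUD_FAILED     0x20
#define NUD_NOARP      0x40
#define NUD_PERMANENT  0x80

/* State masks, as defined in include/net/neighbour.h. */
#define NUD_IN_TIMER (NUD_INCOMPLETE | NUD_REACHABLE | NUD_DELAY | NUD_PROBE)
#define NUD_VALID    (NUD_PERMANENT | NUD_NOARP | NUD_REACHABLE | \
                      NUD_PROBE | NUD_STALE | NUD_DELAY)

/* Hypothetical helper: 1 if an entry in this state has a timer pending. */
static int nud_in_timer(unsigned int state)
{
    return (state & NUD_IN_TIMER) != 0;
}

/* Hypothetical helper: 1 if the state carries a usable link-layer address. */
static int nud_valid(unsigned int state)
{
    return (state & NUD_VALID) != 0;
}
```

For example, nud_in_timer(NUD_PERMANENT) is 0 while nud_in_timer(NUD_REACHABLE) is 1, although both states are valid. If the hang only appears with non-permanent entries, the timer path and whatever the module does to the entry from that context would be reasonable places to inspect first.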

I'm wondering if there were any major changes to the neighbor and arp 
code (between 2.6.18 and 2.6.31) that are somehow causing this problem ?...

Any hint is very welcome.

thanks in advance,
Christophe

PS: I can easily reproduce the problem, and have been trying to debug it 
with qemu and gdb server, but so far without success in clearly 
identifying the problem. Last point: it seems the kernel does not really 
"crash" but rather ends up in some unstable state, and maybe in a loop.


* Re: Virtual device and ARP table
  2010-06-07 10:21 Virtual device and ARP table Christophe Jelger
@ 2010-06-07 12:22 ` Eric Dumazet
  2010-06-07 13:03   ` Christophe Jelger
  0 siblings, 1 reply; 5+ messages in thread
From: Eric Dumazet @ 2010-06-07 12:22 UTC (permalink / raw)
  To: Christophe Jelger; +Cc: linux-kernel, netdev

On Monday, 7 June 2010 at 12:21 +0200, Christophe Jelger wrote:
> I'm wondering if there were any major changes to the neighbor and arp 
> code (between 2.6.18 and 2.6.31) that are somehow causing this problem ?...

Hi Christophe

You should ask this kind of question on netdev instead of lkml.

And of course, post your patch, or send us a crystal ball ;)

Yes, many things changed between 2.6.18 and 2.6.34




* Re: Virtual device and ARP table
  2010-06-07 12:22 ` Eric Dumazet
@ 2010-06-07 13:03   ` Christophe Jelger
  2010-06-07 13:30     ` Eric Dumazet
  0 siblings, 1 reply; 5+ messages in thread
From: Christophe Jelger @ 2010-06-07 13:03 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev

Eric Dumazet wrote:
> 
> Hi Christophe
> 
> You should ask this kind of question on netdev instead of lkml.
> 
> And of course, post your patch, or send us a crystal ball ;)
> 
> Yes, many things changed between 2.6.18 and 2.6.34
> 

Eric: thanks for the forward to the netdev list. Regarding the code, I 
of course welcome any help but didn't want to pollute the list with 
unsolicited code: I can of course send it directly to anyone who is 
willing to help (I can easily reproduce the problem on different 
machines).

Christophe





* Re: Virtual device and ARP table
  2010-06-07 13:03   ` Christophe Jelger
@ 2010-06-07 13:30     ` Eric Dumazet
  2010-06-07 14:19       ` [RFC] lunar manet routing module Christophe Jelger
  0 siblings, 1 reply; 5+ messages in thread
From: Eric Dumazet @ 2010-06-07 13:30 UTC (permalink / raw)
  To: Christophe Jelger; +Cc: netdev

On Monday, 7 June 2010 at 15:03 +0200, Christophe Jelger wrote:

> 
> Eric: thanks for the forward to the netdev list. Regarding the code, I 
> of course welcome any help but didn't want to pollute the list with 
> unsolicited code: I can of course send it directly to anyone who is 
> willing to help (I can easily reproduce the problem on different 
> machines).
> 

Christophe,

Unless the patch is really huge, it's OK to send it on netdev, with an RFC
label, so that only people with free time take a look, eventually.

[RFC] lunar: ....




* [RFC] lunar manet routing module
  2010-06-07 13:30     ` Eric Dumazet
@ 2010-06-07 14:19       ` Christophe Jelger
  0 siblings, 0 replies; 5+ messages in thread
From: Christophe Jelger @ 2010-06-07 14:19 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev


Eric Dumazet wrote:
> 
> Christophe,
> 
> Unless the patch is really huge, it's OK to send it on netdev, with an RFC
> label, so that only people with free time take a look, eventually.
> 
> [RFC] lunar: ....

[not sure what 'huge' means, I'm sending 60k -- sorry for the pollution]

Eric: thanks for the advice. Instead of a patch I attach a .tgz (hope 
it's ok) of the module code with a README explaining everything: it 
compiles for 2.6.31 and 2.6.32, didn't try more recent kernels because I 
actually want to deploy the whole thing on OpenWRT 2.6.32 on Linksys 
devices.

To all: the lunar module crashes the kernel, so be careful. I tried 
debugging with qemu and gdb server but could not find the bug(s) -- my 
experience with kernel debugging is in fact limited.

thanks in advance for any help,
Christophe

[-- Attachment #2: lunar.tgz --]
[-- Type: application/x-compressed-tar, Size: 59498 bytes --]

