Netdev List
* Re: [Bugme-new] [Bug 33502] New: Caught 64-bit read from uninitialized memory in __alloc_skb
From: Eric Dumazet @ 2011-05-10 10:03 UTC
  To: Pekka Enberg
  Cc: Christoph Lameter, casteyde.christian, Andrew Morton, netdev,
	bugzilla-daemon, bugme-daemon, Vegard Nossum
In-Reply-To: <4DC909BD.5080903@cs.helsinki.fi>

On Tuesday, 10 May 2011 at 12:47 +0300, Pekka Enberg wrote:
> On 5/10/11 11:43 AM, Eric Dumazet wrote:
> > I am trying to follow things, but honestly I am lost.
> > Isn't commit 1759415e63 planned for 2.6.40?
> > ( ref :
> > http://git.kernel.org/?p=linux/kernel/git/penberg/slab-2.6.git;a=commitdiff;h=1759415e630e5db0dd2390df9f94892cbfb9a8a2 )
> 
> Yes, it's for 2.6.40.
> 
> > How shall we fix things for 2.6.39 ? I thought my patch was OK for that.
> >
> 
> It's so late in the release cycle that I think the best option is to fix 
> it in 2.6.40 and backport it to -stable together with the above commit.
> 
> > It's a bit hard to work with you on this stuff. For a report I made ages
> > ago, I find it incredible it's not yet fixed in linux-2.6.
> 
> It's not incredible, I simply managed to miss your patch. Sorry about that.
> 

Pekka, my wording was probably too strong, sorry about that.

What I meant is that I don't understand how Christoph expects to solve this
problem if irqsafe_cpu_cmpxchg_double() is used everywhere.





* Re: [Bugme-new] [Bug 33502] New: Caught 64-bit read from uninitialized memory in __alloc_skb
From: Pekka Enberg @ 2011-05-10  9:47 UTC
  To: Eric Dumazet
  Cc: Christoph Lameter, Pekka Enberg, casteyde.christian,
	Andrew Morton, netdev, bugzilla-daemon, bugme-daemon,
	Vegard Nossum
In-Reply-To: <1305016988.2614.6.camel@edumazet-laptop>

On 5/10/11 11:43 AM, Eric Dumazet wrote:
> I am trying to follow things, but honestly I am lost.
> Isn't commit 1759415e63 planned for 2.6.40?
> ( ref :
> http://git.kernel.org/?p=linux/kernel/git/penberg/slab-2.6.git;a=commitdiff;h=1759415e630e5db0dd2390df9f94892cbfb9a8a2 )

Yes, it's for 2.6.40.

> How shall we fix things for 2.6.39 ? I thought my patch was OK for that.
>

It's so late in the release cycle that I think the best option is to fix 
it in 2.6.40 and backport it to -stable together with the above commit.

> It's a bit hard to work with you on this stuff. For a report I made ages
> ago, I find it incredible it's not yet fixed in linux-2.6.

It's not incredible, I simply managed to miss your patch. Sorry about that.

                     Pekka

* Re: future developments of usbnet
From: Richard Cochran @ 2011-05-10  9:42 UTC
  To: Oliver Neukum
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, linux-usb-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <201105062045.37336.oliver-GvhC2dPhHPQdnm+yROfE0A@public.gmane.org>

On Fri, May 06, 2011 at 08:45:37PM +0200, Oliver Neukum wrote:
> Hi,
> 
> I'd like to get a feeling for what people out there are working on regarding
> usbnet. So if you are doing something, or think something ought to be done,
> please speak up now.
> 
> IMHO usbnet needs better support for
> 
> - batching protocols
> - double buffering on the rx path

I would like to see phylib and (if possible) NAPI, although for different
reasons than the problems mentioned in this thread. I am willing and
able to do the work, but I obviously don't have all of the usbnet
hardware available for testing.

Thanks,

Richard

* Re: Fw: oops during unregister_netdevice interface enslaved to bond - regression
From: Eric Dumazet @ 2011-05-10  8:59 UTC
  To: Einar EL Lueck; +Cc: davem, netdev, Frank Blaschka
In-Reply-To: <1305017672.2614.9.camel@edumazet-laptop>

On Tuesday, 10 May 2011 at 10:54 +0200, Eric Dumazet wrote:

> I am currently working on this stuff [adding even more batching and
> probably bugs as well], so instead of a revert I'll try to find a way to
> fix this.
> 
> If you already have a script to reproduce the bug on virtual devices on
> x86 (not on s390 machines, which I don't have ;) ), I'd appreciate having a
> copy of it.
> 
> Thanks for the reminder.

BTW, make sure the latest linux-2.6 still exhibits the problem; we fixed
some things after Octavian's original commit.

List of commits:

commit ceaaec98ad99859ac90ac6863ad0a6cd075d8e0e
net: deinit automatic LIST_HEAD

commit f87e6f47933e3ebeced9bb12615e830a72cedce4 
net: dont leave active on stack LIST_HEAD 




* Re: Fw: oops during unregister_netdevice interface enslaved to bond - regression
From: Eric Dumazet @ 2011-05-10  8:54 UTC
  To: Einar EL Lueck; +Cc: davem, netdev, Frank Blaschka
In-Reply-To: <OF0F4919C3.B9CFCAE8-ONC125788C.002D5EB1-C125788C.002D83B7@de.ibm.com>

On Tuesday, 10 May 2011 at 10:17 +0200, Einar EL Lueck wrote:
> Hi Dave,
> 
> Einar EL Lueck/Germany/IBM wrote on 04/29/2011 04:45:45 PM:
> 
> > From:
> >
> > Einar EL Lueck/Germany/IBM
> >
> > To:
> >
> > opurdila@ixiacom.com, netdev@vger.kernel.org, linux-
> > s390@vger.kernel.org, davem@davemloft.net
> >
> > Cc:
> >
> > Frank Blaschka/Germany/IBM@IBMDE
> >
> > Date:
> >
> > 04/29/2011 04:45 PM
> >
> > Subject:
> >
> > Re: oops during unregister_netdevice interface enslaved to bond - regression
> >
> > Hi Octavian,
> >
> > On 04/15/2011 10:53 AM, Frank Blaschka wrote:
> > > Hi Octavian,
> > >
> > > your commit 443457242beb6716b43db4d62fe148eab5515505 introduced this
> > > regression. I have reviewed the net device unregister code but did not
> > > understand it very well. I have seen the problem only in combination
> > > with bonding. Can you give me some help on how to go on with this
> > > problem? I can reproduce it very easily on a single-CPU machine.
> > >
> >
> > In this case rollback_registered_many iterates over the list of devs
> > that initially has just one device in it. In a loop it calls
> > call_netdevice_notifiers(NETDEV_UNREGISTER, dev) which triggers the
> > bonding driver to call dev_close_many for the same device. That call
> > to dev_close_many leads to the addition of the same device to the
> > list over which rollback_registered_many is iterating. Consequently,
> > netdev_unregister_kobject(dev) is called twice for the same device.
> > Frank captured the result in his mail.
> >
> 
> Calls to the *_many functions introduced by Octavian must never interleave,
> because the traversed lists modify each other. This was the root cause of
> the symptom that Frank discovered. Octavian's address is no longer valid,
> and he has not responded from any new address. I suggest reverting the
> commit.
> 

Hello Einar

I am currently working on this stuff [adding even more batching and
probably bugs as well], so instead of a revert I'll try to find a way to
fix this.

If you already have a script to reproduce the bug on virtual devices on
x86 (not on s390 machines, which I don't have ;) ), I'd appreciate having a
copy of it.

Thanks for the reminder.


* Re: [Bugme-new] [Bug 33502] New: Caught 64-bit read from uninitialized memory in __alloc_skb
From: Eric Dumazet @ 2011-05-10  8:43 UTC
  To: Christoph Lameter
  Cc: Pekka Enberg, Pekka Enberg, casteyde.christian, Andrew Morton,
	netdev, bugzilla-daemon, bugme-daemon, Vegard Nossum
In-Reply-To: <alpine.DEB.2.00.1105091502260.26839@router.home>

On Monday, 09 May 2011 at 15:04 -0500, Christoph Lameter wrote:
> On Mon, 9 May 2011, Pekka Enberg wrote:
> 
> > On Wed, 20 Apr 2011, Eric Dumazet wrote:
> > > [PATCH v4] slub: dont use cmpxchg_double if KMEMCHECK or DEBUG_PAGEALLOC
> > >
> > > Christian Casteyde reported a KMEMCHECK splat in slub code.
> > >
> > > Problem is now we are lockless and allow IRQ in slab_alloc(), the object
> > > we manipulate from freelist can be allocated and freed right before we
> > > try to read object->next.
> > >
> > > Same problem can happen with DEBUG_PAGEALLOC
> > >
> > > Just dont use cmpxchg_double() if either CONFIG_KMEMCHECK or
> > > CONFIG_DEBUG_PAGEALLOC is defined.
> >
> > Christoph, Eric, is this still relevant after commit 1759415 ("slub: Remove
> > CONFIG_CMPXCHG_LOCAL ifdeffery") in slab/next of slab.git?
> 
> There is still an issue and now you can no longer fix the thing through
> CONFIG_CMPXCHG_LOCAL.
> 
> It needs to be legal for slub to deref the counter even if the object has
> been freed.
> 

I am trying to follow things, but honestly I am lost.

Isn't commit 1759415e63 planned for 2.6.40?
( ref :
http://git.kernel.org/?p=linux/kernel/git/penberg/slab-2.6.git;a=commitdiff;h=1759415e630e5db0dd2390df9f94892cbfb9a8a2 )

How shall we fix things for 2.6.39? I thought my patch was OK for that.


It's a bit hard to work with you on this stuff. For a report I made ages
ago, I find it incredible it's not yet fixed in linux-2.6.

Christoph, is your plan to make SLUB incompatible with
CONFIG_DEBUG_PAGEALLOC? Luckily we still have SLAB ;)

I am a bit surprised by commit 1759415e63. It's obviously wrong for
DEBUG_PAGEALLOC. Sure, KMEMCHECK could be handled differently (since we
are !SMP in this case).

???




* Fw: oops during unregister_netdevice interface enslaved to bond - regression
From: Einar EL Lueck @ 2011-05-10  8:17 UTC
  To: davem; +Cc: netdev, Frank Blaschka


Hi Dave,

Einar EL Lueck/Germany/IBM wrote on 04/29/2011 04:45:45 PM:

> From:
>
> Einar EL Lueck/Germany/IBM
>
> To:
>
> opurdila@ixiacom.com, netdev@vger.kernel.org, linux-
> s390@vger.kernel.org, davem@davemloft.net
>
> Cc:
>
> Frank Blaschka/Germany/IBM@IBMDE
>
> Date:
>
> 04/29/2011 04:45 PM
>
> Subject:
>
> Re: oops during unregister_netdevice interface enslaved to bond - regression
>
> Hi Octavian,
>
> On 04/15/2011 10:53 AM, Frank Blaschka wrote:
> > Hi Octavian,
> >
> > your commit 443457242beb6716b43db4d62fe148eab5515505 introduced this
> > regression. I have reviewed the net device unregister code but did not
> > understand it very well. I have seen the problem only in combination
> > with bonding. Can you give me some help on how to go on with this
> > problem? I can reproduce it very easily on a single-CPU machine.
> >
>
> In this case rollback_registered_many iterates over the list of devs
> that initially has just one device in it. In a loop it calls
> call_netdevice_notifiers(NETDEV_UNREGISTER, dev) which triggers the
> bonding driver to call dev_close_many for the same device. That call
> to dev_close_many leads to the addition of the same device to the
> list over which rollback_registered_many is iterating. Consequently,
> netdev_unregister_kobject(dev) is called twice for the same device.
> Frank captured the result in his mail.
>

Calls to the *_many functions introduced by Octavian must never interleave,
because the traversed lists modify each other. This was the root cause of
the symptom that Frank discovered. Octavian's address is no longer valid,
and he has not responded from any new address. I suggest reverting the
commit.
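
The failure mode described above can be modelled in a few lines of
user-space C. This is only a toy sketch, not kernel code: toy_dev,
toy_list, toy_notify_unregister() and toy_rollback_many() are made-up
names, and a fixed-size array stands in for the kernel's linked list. It
assumes, as the analysis quoted above says, that the notifier re-queues the
device on the very list that is being walked.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_ENTRIES 8

struct toy_dev {
	const char *name;
	int enslaved;          /* nonzero: the "bonding" notifier reacts to it */
	int unregister_calls;  /* how often the final teardown step ran */
};

struct toy_list {
	struct toy_dev *entry[TOY_MAX_ENTRIES];
	size_t len;
};

static void toy_list_add(struct toy_list *l, struct toy_dev *d)
{
	assert(l->len < TOY_MAX_ENTRIES);
	l->entry[l->len++] = d;
}

/*
 * Stand-in for the NETDEV_UNREGISTER notifier: the bond releases its
 * slave, and the close path puts the device on the list being walked.
 */
static void toy_notify_unregister(struct toy_list *walking, struct toy_dev *d)
{
	if (!d->enslaved)
		return;
	d->enslaved = 0;
	toy_list_add(walking, d);
}

/* Walks the list and returns how many teardown calls were made. */
static int toy_rollback_many(struct toy_list *l)
{
	int calls = 0;

	for (size_t i = 0; i < l->len; i++) {  /* len can grow while we loop */
		struct toy_dev *d = l->entry[i];

		toy_notify_unregister(l, d);
		d->unregister_calls++;
		calls++;
	}
	return calls;
}
```

Starting from a list that holds one enslaved device, the walk visits that
device twice, which corresponds to netdev_unregister_kobject(dev) running
twice in the report. A device that is not enslaved is visited once.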

Regards,
Einar.


* Re: 8390 drivers (was: Re: Debian kernel 2.6.38-5)
From: Christian T. Steigies @ 2011-05-10  7:29 UTC
  To: Geert Uytterhoeven
  Cc: Finn Thain, Thorsten Glaser, linux-m68k, Stephen Hemminger,
	David S. Miller, netdev
In-Reply-To: <BANLkTik27HjWOF0s5HRqgva5zYzQRis27Q@mail.gmail.com>

On Tue, May 10, 2011 at 08:52:48AM +0200, Geert Uytterhoeven wrote:
> >
> > zorro8390.mod.c
> 
> That's a generated file, so you can ignore it.

ok

> Yeah, speed depends on CPU type and motherboard/CPU interface.

well, it was much faster for a while. Now, after downloading and during
installing packages (and switching to dependency based booting, etc), I get:

eth0: mismatched read page pointer:  1 vs ff.
and
eth0: mismatched read page pointer:  1 vs f0. (I think, maybe fe or  0)

A lot of them. So many that I can not stop the output or do anything on the
machine :-(
There is a different message as well, but it is not readable.

Christian

* Re: 8390 drivers (was: Re: Debian kernel 2.6.38-5)
From: Geert Uytterhoeven @ 2011-05-10  6:52 UTC
  To: Geert Uytterhoeven, Finn Thain, Thorsten Glaser, linux-m68k,
	Stephen Hemminger
In-Reply-To: <20110509213213.GA28675@chumley.earth.sol>

On Mon, May 9, 2011 at 23:32, Christian T. Steigies <cts@debian.org> wrote:
> On Mon, May 09, 2011 at 10:38:18PM +0200, Geert Uytterhoeven wrote:
>> On Mon, May 9, 2011 at 22:25, Christian T. Steigies <cts@debian.org> wrote:
>> > On Mon, May 09, 2011 at 09:16:16AM +0200, Geert Uytterhoeven wrote:
>> >> On Mon, May 9, 2011 at 01:14, Finn Thain <fthain@telegraphics.com.au> wrote:
>> >> > On Sun, 8 May 2011, Christian T. Steigies wrote:
>> >> >> PS 2.6.28 did not boot: kernel too old. When was TLS introduced? I'll try to
>> >> >> apply the patch you mentioned in your other message.
>> >> >
>> >> > I sometimes test network cards with busybox. It can be built without
>> >> > linking in glibc and doesn't need TLS.
>> >> >
>> >> >
>> >> >> [  130.870000] eth0: trigger_send() called with the transmitter busy.
>> >> >> [  132.240000] ------------[ cut here ]------------
>> >> >> [  132.240000] WARNING: at net/sched/sch_generic.c:256 dev_watchdog+0x1ac/0x1cc()
>> >> >> [  132.250000] NETDEV WATCHDOG: eth0 (): transmit queue 0 timed out
>> >> >
>> >> > Looks a lot like this problem:
>> >> >
>> >> > http://patchwork.ozlabs.org/patch/27774/
>> >>
>> >> Yeah, that's a very likely culprit. Thx!
>> >
>> > I applied the following patch and it works! apt-get update that is, there is
>> > no sshd on this freshly installed system yet...
>> >
>> > I also changed this line to have 4 underscores at the beginning:
>> >
>> >        { 0xec940559, "____alloc_ei_netdev" },
>>
>> In which file is that?
>
> zorro8390.mod.c

That's a generated file, so you can ignore it.

>> > it does not show up in git diff, so I am not sure if this is a leftover from
>> > a previous checkout, or if it is also needed. And if it is needed, if it
>> > should be two underscores or four or none, and if so why or why not...
>> > Looking at the other drivers, there does not seem to be a lot of consistency?
>>
>> From this, I guess hydra and ne-h8300 are also affected, as they
>> include lib8390.c themselves and are linked with 8390.o?
>>
>> > diff --git a/drivers/net/zorro8390.c b/drivers/net/zorro8390.c
>> > index b78a38d9..8c7c522 100644
>> > --- a/drivers/net/zorro8390.c
>> > +++ b/drivers/net/zorro8390.c
>> > @@ -126,7 +126,7 @@ static int __devinit zorro8390_init_one(struct zorro_dev *z,
>> >
>> >     board = z->resource.start;
>> >     ioaddr = board+cards[i].offset;
>> > -    dev = alloc_ei_netdev();
>> > +    dev = ____alloc_ei_netdev(0);
>> >     if (!dev)
>> >        return -ENOMEM;
>> >     if (!request_mem_region(ioaddr, NE_IO_EXTENT*2, DRV_NAME)) {
>> > @@ -146,15 +146,15 @@ static int __devinit zorro8390_init_one(struct zorro_dev *z,
>> >  static const struct net_device_ops zorro8390_netdev_ops = {
>> >        .ndo_open               = zorro8390_open,
>> >        .ndo_stop               = zorro8390_close,
>> > -       .ndo_start_xmit         = ei_start_xmit,
>> > -       .ndo_tx_timeout         = ei_tx_timeout,
>> > -       .ndo_get_stats          = ei_get_stats,
>> > -       .ndo_set_multicast_list = ei_set_multicast_list,
>> > +       .ndo_start_xmit         = __ei_start_xmit,
>> > +       .ndo_tx_timeout         = __ei_tx_timeout,
>> > +       .ndo_get_stats          = __ei_get_stats,
>> > +       .ndo_set_multicast_list = __ei_set_multicast_list,
>> >        .ndo_validate_addr      = eth_validate_addr,
>> >        .ndo_set_mac_address    = eth_mac_addr,
>> >        .ndo_change_mtu         = eth_change_mtu,
>> >  #ifdef CONFIG_NET_POLL_CONTROLLER
>> > -       .ndo_poll_controller    = ei_poll,
>> > +       .ndo_poll_controller    = __ei_poll,
>> >  #endif
>> >  };
>>
>> And 8390.o can be removed from drivers/net/Makefile?
>
> I did not change that and it still works, modprobe zorro8390 did not load
> 8390. The speed is rather slow, though, 10kbit/sec seems to be the max
> speed. I have not been able to log in via ssh yet, maybe after it has
> finished downloading packages.

Yeah, speed depends on CPU type and motherboard/CPU interface.

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

* [PATCH net-2.6] vlan: fix GVRP at dismantle time
From: Eric Dumazet @ 2011-05-10  6:40 UTC
  To: David Miller; +Cc: mirqus, alex, netdev, jesse, greearb, Patrick McHardy
In-Reply-To: <1304972251.3050.11.camel@edumazet-laptop>

On Monday, 09 May 2011 at 22:17 +0200, Eric Dumazet wrote:
> On Monday, 09 May 2011 at 21:05 +0200, Eric Dumazet wrote:
> 
> > BTW, bug must be present in net-2.6, if we unload vlan module (since in this
> > case we also had a non NULL head )
> 
> Yes, I confirm we have the bug in linux-2.6
> 

Here is a patch to address this problem.

Thanks !

[PATCH net-2.6] vlan: fix GVRP at dismantle time

ip link add link eth2 eth2.103 type vlan id 103 gvrp on loose_binding on
ip link set eth2.103 up
rmmod tg3    # driver providing eth2

 BUG: unable to handle kernel NULL pointer dereference at           (null)
 IP: [<ffffffffa0030c9e>] garp_request_leave+0x3e/0xc0 [garp]
 PGD 11d251067 PUD 11b9e0067 PMD 0 
 Oops: 0000 [#1] SMP 
 last sysfs file: /sys/devices/virtual/net/eth2.104/ifindex
 CPU 0 
 Modules linked in: tg3(-) 8021q garp nfsd lockd auth_rpcgss sunrpc libphy sg [last unloaded: x_tables]
 
 Pid: 11494, comm: rmmod Tainted: G        W   2.6.39-rc6-00261-gfd71257-dirty #580 HP ProLiant BL460c G6
 RIP: 0010:[<ffffffffa0030c9e>]  [<ffffffffa0030c9e>] garp_request_leave+0x3e/0xc0 [garp]
 RSP: 0018:ffff88007a19bae8  EFLAGS: 00010286
 RAX: 0000000000000000 RBX: ffff88011b5e2000 RCX: 0000000000000002
 RDX: 0000000000000000 RSI: 0000000000000175 RDI: ffffffffa0030d5b
 RBP: ffff88007a19bb18 R08: 0000000000000001 R09: ffff88011bd64a00
 R10: ffff88011d34ec00 R11: 0000000000000000 R12: 0000000000000002
 R13: ffff88007a19bc48 R14: ffff88007a19bb88 R15: 0000000000000001
 FS:  0000000000000000(0000) GS:ffff88011fc00000(0063) knlGS:00000000f77d76c0
 CS:  0010 DS: 002b ES: 002b CR0: 000000008005003b
 CR2: 0000000000000000 CR3: 000000011a675000 CR4: 00000000000006f0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
 Process rmmod (pid: 11494, threadinfo ffff88007a19a000, task ffff8800798595c0)
 Stack:
  ffff88007a19bb36 ffff88011c84b800 ffff88011b5e2000 ffff88007a19bc48
  ffff88007a19bb88 0000000000000006 ffff88007a19bb38 ffffffffa003a5f6
  ffff88007a19bb38 670088007a19bba8 ffff88007a19bb58 ffffffffa00397e7
 Call Trace:
  [<ffffffffa003a5f6>] vlan_gvrp_request_leave+0x46/0x50 [8021q]
  [<ffffffffa00397e7>] vlan_dev_stop+0xb7/0xc0 [8021q]
  [<ffffffff8137e427>] __dev_close_many+0x87/0xe0
  [<ffffffff8137e507>] dev_close_many+0x87/0x110
  [<ffffffff8137e630>] rollback_registered_many+0xa0/0x240
  [<ffffffff8137e7e9>] unregister_netdevice_many+0x19/0x60
  [<ffffffffa00389eb>] vlan_device_event+0x53b/0x550 [8021q]
  [<ffffffff8143f448>] ? ip6mr_device_event+0xa8/0xd0
  [<ffffffff81479d03>] notifier_call_chain+0x53/0x80
  [<ffffffff81062539>] __raw_notifier_call_chain+0x9/0x10
  [<ffffffff81062551>] raw_notifier_call_chain+0x11/0x20
  [<ffffffff8137df82>] call_netdevice_notifiers+0x32/0x60
  [<ffffffff8137e69f>] rollback_registered_many+0x10f/0x240
  [<ffffffff8137e85f>] rollback_registered+0x2f/0x40
  [<ffffffff8137e8c8>] unregister_netdevice_queue+0x58/0x90
  [<ffffffff8137e9eb>] unregister_netdev+0x1b/0x30
  [<ffffffffa005d73f>] tg3_remove_one+0x6f/0x10b [tg3]

We should call vlan_gvrp_request_leave() from unregister_vlan_dev(),
not from vlan_dev_stop(), because vlan_gvrp_uninit_applicant() 
is called right after unregister_netdevice_queue(). In batch mode,
unregister_netdevice_queue() doesn’t immediately call vlan_dev_stop().
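
The ordering problem can be illustrated with a small user-space model.
This is a sketch under assumptions, not the kernel code: all toy_* names
are invented, and toy_request_leave() returns -1 at the point where the
real garp_request_leave() would dereference a NULL pointer.

```c
#include <assert.h>
#include <stddef.h>

struct toy_applicant {
	int leaves_sent;
};

struct toy_vlan {
	struct toy_applicant *app;  /* NULL once the applicant is torn down */
};

/* Stand-in for the leave request; -1 marks where the real code oopses. */
static int toy_request_leave(struct toy_vlan *v)
{
	assert(v != NULL);
	if (!v->app)
		return -1;
	v->app->leaves_sent++;
	return 0;
}

static void toy_uninit_applicant(struct toy_vlan *v)
{
	v->app = NULL;
}

/*
 * Old order: the leave is requested from the stop handler. In batch mode
 * the stop is deferred, so it runs after the applicant is already gone.
 */
static int toy_unregister_old(struct toy_vlan *v)
{
	toy_uninit_applicant(v);      /* right after queueing the unregister */
	return toy_request_leave(v);  /* deferred stop, run when batch flushes */
}

/* New order: request the leave while the applicant still exists. */
static int toy_unregister_new(struct toy_vlan *v)
{
	int err = toy_request_leave(v);

	toy_uninit_applicant(v);      /* the deferred stop no longer needs it */
	return err;
}
```

With the old order the leave request finds no applicant; with the new
order it is sent before the applicant is torn down.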

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Ben Greear <greearb@candelatech.com>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Jesse Gross <jesse@nicira.com>
Cc: Michał Mirosław <mirq-linux@rere.qmqm.pl>
---
 net/8021q/vlan.c     |    3 +++
 net/8021q/vlan_dev.c |    3 ---
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c
index 7850412..0eb1a88 100644
--- a/net/8021q/vlan.c
+++ b/net/8021q/vlan.c
@@ -124,6 +124,9 @@ void unregister_vlan_dev(struct net_device *dev, struct list_head *head)
 
 	grp->nr_vlans--;
 
+	if (vlan->flags & VLAN_FLAG_GVRP)
+		vlan_gvrp_request_leave(dev);
+
 	vlan_group_set_device(grp, vlan_id, NULL);
 	if (!grp->killall)
 		synchronize_net();
diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c
index e34ea9e..b2ff6c8 100644
--- a/net/8021q/vlan_dev.c
+++ b/net/8021q/vlan_dev.c
@@ -487,9 +487,6 @@ static int vlan_dev_stop(struct net_device *dev)
 	struct vlan_dev_info *vlan = vlan_dev_info(dev);
 	struct net_device *real_dev = vlan->real_dev;
 
-	if (vlan->flags & VLAN_FLAG_GVRP)
-		vlan_gvrp_request_leave(dev);
-
 	dev_mc_unsync(real_dev, dev);
 	dev_uc_unsync(real_dev, dev);
 	if (dev->flags & IFF_ALLMULTI)



* slcan: fix ldisc->open retval
From: Oliver Hartkopp @ 2011-05-10  6:09 UTC
  To: David Miller; +Cc: netdev, matvejchikov

TTY layer expects 0 if the ldisc->open operation succeeded.

Reported-by: Matvejchikov Ilya <matvejchikov@gmail.com>
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>

---

diff --git a/drivers/net/can/slcan.c b/drivers/net/can/slcan.c
index b423965..1b49df6 100644
--- a/drivers/net/can/slcan.c
+++ b/drivers/net/can/slcan.c
@@ -583,7 +583,9 @@ static int slcan_open(struct tty_struct *tty)
 	/* Done.  We have linked the TTY line to a channel. */
 	rtnl_unlock();
 	tty->receive_room = 65536;	/* We don't flow control */
-	return sl->dev->base_addr;
+
+	/* TTY layer expects 0 on success */
+	return 0;
 
 err_free_chan:
 	sl->tty = NULL;


* [PATCH] xfrm: Don't allow esn with disabled anti replay detection
From: Steffen Klassert @ 2011-05-10  5:43 UTC
  To: David Miller, Herbert Xu; +Cc: netdev

Unlike the standard case, disabled anti replay detection needs some
nontrivial extra treatment on ESN. RFC 4303 states:

Note: If a receiver chooses to not enable anti-replay for an SA, then
the receiver SHOULD NOT negotiate ESN in an SA management protocol.
Use of ESN creates a need for the receiver to manage the anti-replay
window (in order to determine the correct value for the high-order
bits of the ESN, which are employed in the ICV computation), which is
generally contrary to the notion of disabling anti-replay for an SA.

So return an error if an ESN state with disabled anti replay detection
is inserted for now and add the extra treatment later if we need it.
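
As a rough user-space illustration of the rule stated above, the check
amounts to rejecting a state that asks for ESN while its replay window is
zero. The toy_* types and the TOY_STATE_ESN flag value are assumptions
made for this sketch only; they are not the kernel's definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_STATE_ESN 0x80u  /* hypothetical flag bit for this sketch */

struct toy_replay_esn {
	uint32_t replay_window;  /* 0 means anti-replay detection is disabled */
};

struct toy_state {
	uint32_t flags;
	struct toy_replay_esn *replay_esn;  /* may be NULL */
};

/* Returns 0 if the combination is acceptable, -EINVAL otherwise. */
static int toy_init_replay(const struct toy_state *x)
{
	assert(x != NULL);

	if (x->replay_esn &&
	    (x->flags & TOY_STATE_ESN) &&
	    x->replay_esn->replay_window == 0)
		return -EINVAL;  /* ESN requested, but no window to track it */

	return 0;
}
```

A state with ESN and a non-zero window, or a state without ESN and a zero
window, passes; only ESN combined with a zero window is refused.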

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 net/xfrm/xfrm_replay.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/net/xfrm/xfrm_replay.c b/net/xfrm/xfrm_replay.c
index e8a7814..47f1b86 100644
--- a/net/xfrm/xfrm_replay.c
+++ b/net/xfrm/xfrm_replay.c
@@ -535,6 +535,9 @@ int xfrm_init_replay(struct xfrm_state *x)
 		    replay_esn->bmp_len * sizeof(__u32) * 8)
 			return -EINVAL;
 
+	if ((x->props.flags & XFRM_STATE_ESN) && replay_esn->replay_window == 0)
+		return -EINVAL;
+
 	if ((x->props.flags & XFRM_STATE_ESN) && x->replay_esn)
 		x->repl = &xfrm_replay_esn;
 	else
-- 
1.7.0.4


* [PATCH] xfrm: Assign the inner mode output function to the dst entry
From: Steffen Klassert @ 2011-05-10  5:36 UTC
  To: David Miller, Herbert Xu; +Cc: netdev

As it is, we assign the outer mode's output function to the dst entry
when we create the xfrm bundle. This leads to two problems in interfamily
scenarios. First, we might insert ipv4 packets into ip6_fragment when called
from xfrm6_output, and the system crashes if we try to fragment an ipv4
packet with ip6_fragment. This issue was introduced with git commit
ad0081e4 (ipv6: Fragment locally generated tunnel-mode IPSec6 packets
as needed). Second, we might insert ipv4 packets into netfilter6, and
vice versa, in interfamily scenarios.

With this patch we assign the inner mode's output function to the dst entry
when we create the xfrm bundle, so xfrm4_output/xfrm6_output from the inner
mode is used and the right fragmentation and netfilter functions are called.
We then switch to the outer mode with the output_finish functions.
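
The idea can be reduced to a toy model: before the transform runs, the
packet still belongs to the inner family, so the output path has to be
picked by the inner family. Everything below (the toy_* names, the enum)
is invented for illustration and only mirrors the reasoning of the commit
message, not the kernel data structures.

```c
#include <assert.h>
#include <stddef.h>

enum toy_family { TOY_AF_INET, TOY_AF_INET6 };

struct toy_pkt {
	enum toy_family family;  /* family of the not yet encapsulated packet */
};

struct toy_state {
	enum toy_family inner;   /* family of the traffic being protected */
	enum toy_family outer;   /* family of the tunnel endpoints */
};

/* Old behaviour: the output path follows the outer mode. */
static enum toy_family toy_pick_output_old(const struct toy_state *x)
{
	assert(x != NULL);
	return x->outer;
}

/* Patched behaviour: the output path follows the inner mode. */
static enum toy_family toy_pick_output_new(const struct toy_state *x)
{
	assert(x != NULL);
	return x->inner;
}

/*
 * A family's fragmentation and netfilter code only understands its own
 * packets; -1 stands for the crash or misrouting described above.
 */
static int toy_output(enum toy_family path, const struct toy_pkt *p)
{
	return path == p->family ? 0 : -1;
}
```

For an ipv4 packet sent through an ipv6 tunnel, the old selection hands
the packet to the ipv6 path and fails, while the new selection keeps it on
the ipv4 path.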

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 include/net/xfrm.h      |    3 +++
 net/ipv4/xfrm4_output.c |    8 ++++++--
 net/ipv4/xfrm4_state.c  |    1 +
 net/ipv6/xfrm6_output.c |    6 +++---
 net/ipv6/xfrm6_state.c  |    1 +
 net/xfrm/xfrm_policy.c  |   14 +++++++++++++-
 6 files changed, 27 insertions(+), 6 deletions(-)

diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index 6ae4bc5..20afeaa 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -324,6 +324,7 @@ struct xfrm_state_afinfo {
 	int			(*tmpl_sort)(struct xfrm_tmpl **dst, struct xfrm_tmpl **src, int n);
 	int			(*state_sort)(struct xfrm_state **dst, struct xfrm_state **src, int n);
 	int			(*output)(struct sk_buff *skb);
+	int			(*output_finish)(struct sk_buff *skb);
 	int			(*extract_input)(struct xfrm_state *x,
 						 struct sk_buff *skb);
 	int			(*extract_output)(struct xfrm_state *x,
@@ -1454,6 +1455,7 @@ static inline int xfrm4_rcv_spi(struct sk_buff *skb, int nexthdr, __be32 spi)
 extern int xfrm4_extract_output(struct xfrm_state *x, struct sk_buff *skb);
 extern int xfrm4_prepare_output(struct xfrm_state *x, struct sk_buff *skb);
 extern int xfrm4_output(struct sk_buff *skb);
+extern int xfrm4_output_finish(struct sk_buff *skb);
 extern int xfrm4_tunnel_register(struct xfrm_tunnel *handler, unsigned short family);
 extern int xfrm4_tunnel_deregister(struct xfrm_tunnel *handler, unsigned short family);
 extern int xfrm6_extract_header(struct sk_buff *skb);
@@ -1470,6 +1472,7 @@ extern __be32 xfrm6_tunnel_spi_lookup(struct net *net, xfrm_address_t *saddr);
 extern int xfrm6_extract_output(struct xfrm_state *x, struct sk_buff *skb);
 extern int xfrm6_prepare_output(struct xfrm_state *x, struct sk_buff *skb);
 extern int xfrm6_output(struct sk_buff *skb);
+extern int xfrm6_output_finish(struct sk_buff *skb);
 extern int xfrm6_find_1stfragopt(struct xfrm_state *x, struct sk_buff *skb,
 				 u8 **prevhdr);
 
diff --git a/net/ipv4/xfrm4_output.c b/net/ipv4/xfrm4_output.c
index 571aa96..2d51840 100644
--- a/net/ipv4/xfrm4_output.c
+++ b/net/ipv4/xfrm4_output.c
@@ -69,7 +69,7 @@ int xfrm4_prepare_output(struct xfrm_state *x, struct sk_buff *skb)
 }
 EXPORT_SYMBOL(xfrm4_prepare_output);
 
-static int xfrm4_output_finish(struct sk_buff *skb)
+int xfrm4_output_finish(struct sk_buff *skb)
 {
 #ifdef CONFIG_NETFILTER
 	if (!skb_dst(skb)->xfrm) {
@@ -86,7 +86,11 @@ static int xfrm4_output_finish(struct sk_buff *skb)
 
 int xfrm4_output(struct sk_buff *skb)
 {
+	struct dst_entry *dst = skb_dst(skb);
+	struct xfrm_state *x = dst->xfrm;
+
 	return NF_HOOK_COND(NFPROTO_IPV4, NF_INET_POST_ROUTING, skb,
-			    NULL, skb_dst(skb)->dev, xfrm4_output_finish,
+			    NULL, dst->dev,
+			    x->outer_mode->afinfo->output_finish,
 			    !(IPCB(skb)->flags & IPSKB_REROUTED));
 }
diff --git a/net/ipv4/xfrm4_state.c b/net/ipv4/xfrm4_state.c
index 1717c64..805d63e 100644
--- a/net/ipv4/xfrm4_state.c
+++ b/net/ipv4/xfrm4_state.c
@@ -78,6 +78,7 @@ static struct xfrm_state_afinfo xfrm4_state_afinfo = {
 	.init_tempsel		= __xfrm4_init_tempsel,
 	.init_temprop		= xfrm4_init_temprop,
 	.output			= xfrm4_output,
+	.output_finish		= xfrm4_output_finish,
 	.extract_input		= xfrm4_extract_input,
 	.extract_output		= xfrm4_extract_output,
 	.transport_finish	= xfrm4_transport_finish,
diff --git a/net/ipv6/xfrm6_output.c b/net/ipv6/xfrm6_output.c
index 8e688b3..49a91c5 100644
--- a/net/ipv6/xfrm6_output.c
+++ b/net/ipv6/xfrm6_output.c
@@ -79,7 +79,7 @@ int xfrm6_prepare_output(struct xfrm_state *x, struct sk_buff *skb)
 }
 EXPORT_SYMBOL(xfrm6_prepare_output);
 
-static int xfrm6_output_finish(struct sk_buff *skb)
+int xfrm6_output_finish(struct sk_buff *skb)
 {
 #ifdef CONFIG_NETFILTER
 	IP6CB(skb)->flags |= IP6SKB_XFRM_TRANSFORMED;
@@ -97,9 +97,9 @@ static int __xfrm6_output(struct sk_buff *skb)
 	if ((x && x->props.mode == XFRM_MODE_TUNNEL) &&
 	    ((skb->len > ip6_skb_dst_mtu(skb) && !skb_is_gso(skb)) ||
 		dst_allfrag(skb_dst(skb)))) {
-			return ip6_fragment(skb, xfrm6_output_finish);
+			return ip6_fragment(skb, x->outer_mode->afinfo->output_finish);
 	}
-	return xfrm6_output_finish(skb);
+	return x->outer_mode->afinfo->output_finish(skb);
 }
 
 int xfrm6_output(struct sk_buff *skb)
diff --git a/net/ipv6/xfrm6_state.c b/net/ipv6/xfrm6_state.c
index afe941e..248f0b2 100644
--- a/net/ipv6/xfrm6_state.c
+++ b/net/ipv6/xfrm6_state.c
@@ -178,6 +178,7 @@ static struct xfrm_state_afinfo xfrm6_state_afinfo = {
 	.tmpl_sort		= __xfrm6_tmpl_sort,
 	.state_sort		= __xfrm6_state_sort,
 	.output			= xfrm6_output,
+	.output_finish		= xfrm6_output_finish,
 	.extract_input		= xfrm6_extract_input,
 	.extract_output		= xfrm6_extract_output,
 	.transport_finish	= xfrm6_transport_finish,
diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index 15792d8..b4d745e 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -1406,6 +1406,7 @@ static struct dst_entry *xfrm_bundle_create(struct xfrm_policy *policy,
 	struct net *net = xp_net(policy);
 	unsigned long now = jiffies;
 	struct net_device *dev;
+	struct xfrm_mode *inner_mode;
 	struct dst_entry *dst_prev = NULL;
 	struct dst_entry *dst0 = NULL;
 	int i = 0;
@@ -1436,6 +1437,17 @@ static struct dst_entry *xfrm_bundle_create(struct xfrm_policy *policy,
 			goto put_states;
 		}
 
+		if (xfrm[i]->sel.family == AF_UNSPEC) {
+			inner_mode = xfrm_ip2inner_mode(xfrm[i],
+							xfrm_af2proto(family));
+			if (!inner_mode) {
+				err = -EAFNOSUPPORT;
+				dst_release(dst);
+				goto put_states;
+			}
+		} else
+			inner_mode = xfrm[i]->inner_mode;
+
 		if (!dst_prev)
 			dst0 = dst1;
 		else {
@@ -1464,7 +1476,7 @@ static struct dst_entry *xfrm_bundle_create(struct xfrm_policy *policy,
 		dst1->lastuse = now;
 
 		dst1->input = dst_discard;
-		dst1->output = xfrm[i]->outer_mode->afinfo->output;
+		dst1->output = inner_mode->afinfo->output;
 
 		dst1->next = dst_prev;
 		dst_prev = dst1;
-- 
1.7.0.4


* [PATCH 10/10] ipv4: xfrm: Eliminate ->rt_src reference in policy code.
From: David Miller @ 2011-05-10  5:31 UTC
  To: netdev


Rearrange xfrm4_dst_lookup() so that it works by calling a helper
function __xfrm4_dst_lookup() that takes an explicit flow key storage
area as an argument.

Use this new helper in xfrm4_get_saddr() so we can fetch the selected
source address from the flow instead of from rt->rt_src.
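
The pattern is a helper that fills a caller-owned key, so the caller can
read back what the lookup selected. A minimal user-space sketch follows,
assuming that the route lookup writes the chosen source address into the
key; the toy_* names and the address 10.0.0.1 are made up for
illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct toy_flow {
	uint32_t daddr;
	uint32_t saddr;
	int tos;
};

/* Hypothetical route lookup: selects a source address if none was given. */
static int toy_route_lookup(struct toy_flow *fl)
{
	if (fl->daddr == 0)
		return -1;                 /* unreachable */
	if (fl->saddr == 0)
		fl->saddr = 0x0a000001u;   /* 10.0.0.1, arbitrary */
	return 0;
}

/* Helper taking explicit key storage, so the caller can inspect it later. */
static int toy_dst_lookup_key(struct toy_flow *fl, int tos,
			      const uint32_t *saddr, uint32_t daddr)
{
	assert(fl != NULL);
	memset(fl, 0, sizeof(*fl));
	fl->daddr = daddr;
	fl->tos = tos;
	if (saddr)
		fl->saddr = *saddr;
	return toy_route_lookup(fl);
}

/* Original entry point: same interface as before, key lives on the stack. */
static int toy_dst_lookup(int tos, const uint32_t *saddr, uint32_t daddr)
{
	struct toy_flow fl;

	return toy_dst_lookup_key(&fl, tos, saddr, daddr);
}

/* Reads the selected source address from the key, not from the route. */
static int toy_get_saddr(uint32_t *saddr, uint32_t daddr)
{
	struct toy_flow fl;

	if (toy_dst_lookup_key(&fl, 0, NULL, daddr) < 0)
		return -1;
	*saddr = fl.saddr;
	return 0;
}
```

Callers that only need the route keep using the wrapper, while a caller
that needs the selected source address passes its own key and reads the
field afterwards.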

Signed-off-by: David S. Miller <davem@davemloft.net>
---
 net/ipv4/xfrm4_policy.c |   34 +++++++++++++++++++++-------------
 1 files changed, 21 insertions(+), 13 deletions(-)

diff --git a/net/ipv4/xfrm4_policy.c b/net/ipv4/xfrm4_policy.c
index 7ff973b..981e43e 100644
--- a/net/ipv4/xfrm4_policy.c
+++ b/net/ipv4/xfrm4_policy.c
@@ -18,38 +18,46 @@
 
 static struct xfrm_policy_afinfo xfrm4_policy_afinfo;
 
-static struct dst_entry *xfrm4_dst_lookup(struct net *net, int tos,
-					  const xfrm_address_t *saddr,
-					  const xfrm_address_t *daddr)
+static struct dst_entry *__xfrm4_dst_lookup(struct net *net, struct flowi4 *fl4,
+					    int tos,
+					    const xfrm_address_t *saddr,
+					    const xfrm_address_t *daddr)
 {
-	struct flowi4 fl4 = {
-		.daddr = daddr->a4,
-		.flowi4_tos = tos,
-	};
 	struct rtable *rt;
 
+	memset(fl4, 0, sizeof(*fl4));
+	fl4->daddr = daddr->a4;
+	fl4->flowi4_tos = tos;
 	if (saddr)
-		fl4.saddr = saddr->a4;
+		fl4->saddr = saddr->a4;
 
-	rt = __ip_route_output_key(net, &fl4);
+	rt = __ip_route_output_key(net, fl4);
 	if (!IS_ERR(rt))
 		return &rt->dst;
 
 	return ERR_CAST(rt);
 }
 
+static struct dst_entry *xfrm4_dst_lookup(struct net *net, int tos,
+					  const xfrm_address_t *saddr,
+					  const xfrm_address_t *daddr)
+{
+	struct flowi4 fl4;
+
+	return __xfrm4_dst_lookup(net, &fl4, tos, saddr, daddr);
+}
+
 static int xfrm4_get_saddr(struct net *net,
 			   xfrm_address_t *saddr, xfrm_address_t *daddr)
 {
 	struct dst_entry *dst;
-	struct rtable *rt;
+	struct flowi4 fl4;
 
-	dst = xfrm4_dst_lookup(net, 0, NULL, daddr);
+	dst = __xfrm4_dst_lookup(net, &fl4, 0, NULL, daddr);
 	if (IS_ERR(dst))
 		return -EHOSTUNREACH;
 
-	rt = (struct rtable *)dst;
-	saddr->a4 = rt->rt_src;
+	saddr->a4 = fl4.saddr;
 	dst_release(dst);
 	return 0;
 }
-- 
1.7.5.1



* [PATCH 9/10] infiniband: Remove rt->rt_src usage in addr4_resolve()
From: David Miller @ 2011-05-10  5:31 UTC (permalink / raw)
  To: netdev


Use an explicit flow key and fetch the source address from there.

Signed-off-by: David S. Miller <davem@davemloft.net>
---
 drivers/infiniband/core/addr.c |    9 +++++++--
 1 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/core/addr.c b/drivers/infiniband/core/addr.c
index 4ffc224..8e21d45 100644
--- a/drivers/infiniband/core/addr.c
+++ b/drivers/infiniband/core/addr.c
@@ -185,15 +185,20 @@ static int addr4_resolve(struct sockaddr_in *src_in,
 	__be32 dst_ip = dst_in->sin_addr.s_addr;
 	struct rtable *rt;
 	struct neighbour *neigh;
+	struct flowi4 fl4;
 	int ret;
 
-	rt = ip_route_output(&init_net, dst_ip, src_ip, 0, addr->bound_dev_if);
+	memset(&fl4, 0, sizeof(fl4));
+	fl4.daddr = dst_ip;
+	fl4.saddr = src_ip;
+	fl4.flowi4_oif = addr->bound_dev_if;
+	rt = ip_route_output_key(&init_net, &fl4);
 	if (IS_ERR(rt)) {
 		ret = PTR_ERR(rt);
 		goto out;
 	}
 	src_in->sin_family = AF_INET;
-	src_in->sin_addr.s_addr = rt->rt_src;
+	src_in->sin_addr.s_addr = fl4.saddr;
 
 	if (rt->dst.dev->flags & IFF_LOOPBACK) {
 		ret = rdma_translate_ip((struct sockaddr *) dst_in, addr);
-- 
1.7.5.1



* [PATCH 8/10] sctp: Remove rt->rt_src usage in sctp_v4_get_saddr()
From: David Miller @ 2011-05-10  5:31 UTC (permalink / raw)
  To: netdev


The flow key is available, so fetch the source address from there.

Signed-off-by: David S. Miller <davem@davemloft.net>
---
 net/sctp/protocol.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/sctp/protocol.c b/net/sctp/protocol.c
index 4f270ac..4de77cb 100644
--- a/net/sctp/protocol.c
+++ b/net/sctp/protocol.c
@@ -566,7 +566,7 @@ static void sctp_v4_get_saddr(struct sctp_sock *sk,
 
 	if (rt) {
 		saddr->v4.sin_family = AF_INET;
-		saddr->v4.sin_addr.s_addr = rt->rt_src;
+		saddr->v4.sin_addr.s_addr = fl->u.ip4.saddr;
 	}
 }
 
-- 
1.7.5.1



* [PATCH 7/10] ipvs: Remove all remaining references to rt->rt_{src,dst}
From: David Miller @ 2011-05-10  5:31 UTC (permalink / raw)
  To: netdev


These values are always obtainable via the ip_vs_conn flow key.

Signed-off-by: David S. Miller <davem@davemloft.net>
---
 net/netfilter/ipvs/ip_vs_core.c |    2 +-
 net/netfilter/ipvs/ip_vs_xmit.c |   17 +++++++++++++----
 2 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/net/netfilter/ipvs/ip_vs_core.c b/net/netfilter/ipvs/ip_vs_core.c
index 07accf6..fa8c1fd 100644
--- a/net/netfilter/ipvs/ip_vs_core.c
+++ b/net/netfilter/ipvs/ip_vs_core.c
@@ -1384,7 +1384,7 @@ ip_vs_in_icmp(struct sk_buff *skb, int *related, unsigned int hooknum)
 	    skb_rtable(skb)->rt_flags & RTCF_LOCAL) {
 		IP_VS_DBG(1, "%s(): "
 			  "local delivery to %pI4 but in FORWARD\n",
-			  __func__, &skb_rtable(skb)->rt_dst);
+			  __func__, &cp->fl.u.ip4.daddr);
 		verdict = NF_DROP;
 	}
 
diff --git a/net/netfilter/ipvs/ip_vs_xmit.c b/net/netfilter/ipvs/ip_vs_xmit.c
index 2a300fe..99e7644 100644
--- a/net/netfilter/ipvs/ip_vs_xmit.c
+++ b/net/netfilter/ipvs/ip_vs_xmit.c
@@ -510,6 +510,7 @@ ip_vs_nat_xmit(struct sk_buff *skb, struct ip_vs_conn *cp,
 	struct rtable *rt;		/* Route to the other host */
 	int mtu;
 	struct iphdr *iph = ip_hdr(skb);
+	struct flowi4 *fl4;
 	int local;
 
 	EnterFunction(10);
@@ -549,8 +550,10 @@ ip_vs_nat_xmit(struct sk_buff *skb, struct ip_vs_conn *cp,
 	}
 #endif
 
+	fl4 = &cp->fl.u.ip4;
+
 	/* From world but DNAT to loopback address? */
-	if (local && ipv4_is_loopback(rt->rt_dst) &&
+	if (local && ipv4_is_loopback(fl4->daddr) &&
 	    rt_is_input_route(skb_rtable(skb))) {
 		IP_VS_DBG_RL_PKT(1, AF_INET, pp, skb, 0, "ip_vs_nat_xmit(): "
 				 "stopping DNAT to loopback address");
@@ -767,6 +770,7 @@ ip_vs_tunnel_xmit(struct sk_buff *skb, struct ip_vs_conn *cp,
 	__be16 df = old_iph->frag_off;
 	struct iphdr  *iph;			/* Our new IP header */
 	unsigned int max_headroom;		/* The extra header space needed */
+	struct flowi4 *fl4;
 	int    mtu;
 	int ret;
 
@@ -833,6 +837,8 @@ ip_vs_tunnel_xmit(struct sk_buff *skb, struct ip_vs_conn *cp,
 	skb_dst_drop(skb);
 	skb_dst_set(skb, &rt->dst);
 
+	fl4 = &cp->fl.u.ip4;
+
 	/*
 	 *	Push down and install the IPIP header.
 	 */
@@ -842,8 +848,8 @@ ip_vs_tunnel_xmit(struct sk_buff *skb, struct ip_vs_conn *cp,
 	iph->frag_off		=	df;
 	iph->protocol		=	IPPROTO_IPIP;
 	iph->tos		=	tos;
-	iph->daddr		=	rt->rt_dst;
-	iph->saddr		=	rt->rt_src;
+	iph->daddr		=	fl4->daddr;
+	iph->saddr		=	fl4->saddr;
 	iph->ttl		=	old_iph->ttl;
 	ip_select_ident(iph, &rt->dst, NULL);
 
@@ -1127,6 +1133,7 @@ ip_vs_icmp_xmit(struct sk_buff *skb, struct ip_vs_conn *cp,
 		struct ip_vs_protocol *pp, int offset)
 {
 	struct rtable	*rt;	/* Route to the other host */
+	struct flowi4 *fl4;
 	int mtu;
 	int rc;
 	int local;
@@ -1176,8 +1183,10 @@ ip_vs_icmp_xmit(struct sk_buff *skb, struct ip_vs_conn *cp,
 	}
 #endif
 
+	fl4 = &cp->fl.u.ip4;
+
 	/* From world but DNAT to loopback address? */
-	if (local && ipv4_is_loopback(rt->rt_dst) &&
+	if (local && ipv4_is_loopback(fl4->daddr) &&
 	    rt_is_input_route(skb_rtable(skb))) {
 		IP_VS_DBG(1, "%s(): "
 			  "stopping DNAT to loopback %pI4\n",
-- 
1.7.5.1



* [PATCH 6/10] ipvs: Store a flow key in ip_vs_conn and use it in route lookups.
From: David Miller @ 2011-05-10  5:31 UTC (permalink / raw)
  To: netdev


This is a key step in being able to eliminate the remaining references
to rt->rt_{src,dst} in the IPVS code.

Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/net/ip_vs.h             |    1 +
 net/netfilter/ipvs/ip_vs_xmit.c |   25 +++++++++++++++++--------
 2 files changed, 18 insertions(+), 8 deletions(-)

diff --git a/include/net/ip_vs.h b/include/net/ip_vs.h
index e0b7f13..6122c71 100644
--- a/include/net/ip_vs.h
+++ b/include/net/ip_vs.h
@@ -481,6 +481,7 @@ struct ip_vs_conn {
 	struct net              *net;           /* Name space */
 #endif
 	/* Protocol, addresses and port numbers */
+	struct flowi		fl;
 	u16                     af;             /* address family */
 	__be16                  cport;
 	__be16                  vport;
diff --git a/net/netfilter/ipvs/ip_vs_xmit.c b/net/netfilter/ipvs/ip_vs_xmit.c
index e5ef75b..2a300fe 100644
--- a/net/netfilter/ipvs/ip_vs_xmit.c
+++ b/net/netfilter/ipvs/ip_vs_xmit.c
@@ -86,19 +86,25 @@ __ip_vs_dst_check(struct ip_vs_dest *dest, u32 rtos)
 
 /* Get route to destination or remote server */
 static struct rtable *
-__ip_vs_get_out_rt(struct sk_buff *skb, struct ip_vs_dest *dest,
+__ip_vs_get_out_rt(struct sk_buff *skb, struct ip_vs_conn *cp,
+		   struct ip_vs_dest *dest,
 		   __be32 daddr, u32 rtos, int rt_mode)
 {
 	struct net *net = dev_net(skb_dst(skb)->dev);
+	struct flowi4 *fl4;
 	struct rtable *rt;			/* Route to the other host */
 	struct rtable *ort;			/* Original route */
 	int local;
 
+	fl4 = &cp->fl.u.ip4;
 	if (dest) {
 		spin_lock(&dest->dst_lock);
 		if (!(rt = (struct rtable *)
 		      __ip_vs_dst_check(dest, rtos))) {
-			rt = ip_route_output(net, dest->addr.ip, 0, rtos, 0);
+			memset(fl4, 0, sizeof(*fl4));
+			fl4->daddr = dest->addr.ip;
+			fl4->flowi4_tos = rtos;
+			rt = ip_route_output_key(net, fl4);
 			if (IS_ERR(rt)) {
 				spin_unlock(&dest->dst_lock);
 				IP_VS_DBG_RL("ip_route_output error, dest: %pI4\n",
@@ -113,7 +119,10 @@ __ip_vs_get_out_rt(struct sk_buff *skb, struct ip_vs_dest *dest,
 		daddr = dest->addr.ip;
 		spin_unlock(&dest->dst_lock);
 	} else {
-		rt = ip_route_output(net, daddr, 0, rtos, 0);
+		memset(fl4, 0, sizeof(*fl4));
+		fl4->daddr = daddr;
+		fl4->flowi4_tos = rtos;
+		rt = ip_route_output_key(net, fl4);
 		if (IS_ERR(rt)) {
 			IP_VS_DBG_RL("ip_route_output error, dest: %pI4\n",
 				     &daddr);
@@ -386,7 +395,7 @@ ip_vs_bypass_xmit(struct sk_buff *skb, struct ip_vs_conn *cp,
 
 	EnterFunction(10);
 
-	if (!(rt = __ip_vs_get_out_rt(skb, NULL, iph->daddr, RT_TOS(iph->tos),
+	if (!(rt = __ip_vs_get_out_rt(skb, cp, NULL, iph->daddr, RT_TOS(iph->tos),
 				      IP_VS_RT_MODE_NON_LOCAL)))
 		goto tx_error_icmp;
 
@@ -515,7 +524,7 @@ ip_vs_nat_xmit(struct sk_buff *skb, struct ip_vs_conn *cp,
 		IP_VS_DBG(10, "filled cport=%d\n", ntohs(*p));
 	}
 
-	if (!(rt = __ip_vs_get_out_rt(skb, cp->dest, cp->daddr.ip,
+	if (!(rt = __ip_vs_get_out_rt(skb, cp, cp->dest, cp->daddr.ip,
 				      RT_TOS(iph->tos),
 				      IP_VS_RT_MODE_LOCAL |
 					IP_VS_RT_MODE_NON_LOCAL |
@@ -763,7 +772,7 @@ ip_vs_tunnel_xmit(struct sk_buff *skb, struct ip_vs_conn *cp,
 
 	EnterFunction(10);
 
-	if (!(rt = __ip_vs_get_out_rt(skb, cp->dest, cp->daddr.ip,
+	if (!(rt = __ip_vs_get_out_rt(skb, cp, cp->dest, cp->daddr.ip,
 				      RT_TOS(tos), IP_VS_RT_MODE_LOCAL |
 						   IP_VS_RT_MODE_NON_LOCAL)))
 		goto tx_error_icmp;
@@ -994,7 +1003,7 @@ ip_vs_dr_xmit(struct sk_buff *skb, struct ip_vs_conn *cp,
 
 	EnterFunction(10);
 
-	if (!(rt = __ip_vs_get_out_rt(skb, cp->dest, cp->daddr.ip,
+	if (!(rt = __ip_vs_get_out_rt(skb, cp, cp->dest, cp->daddr.ip,
 				      RT_TOS(iph->tos),
 				      IP_VS_RT_MODE_LOCAL |
 					IP_VS_RT_MODE_NON_LOCAL)))
@@ -1141,7 +1150,7 @@ ip_vs_icmp_xmit(struct sk_buff *skb, struct ip_vs_conn *cp,
 	 * mangle and send the packet here (only for VS/NAT)
 	 */
 
-	if (!(rt = __ip_vs_get_out_rt(skb, cp->dest, cp->daddr.ip,
+	if (!(rt = __ip_vs_get_out_rt(skb, cp, cp->dest, cp->daddr.ip,
 				      RT_TOS(ip_hdr(skb)->tos),
 				      IP_VS_RT_MODE_LOCAL |
 					IP_VS_RT_MODE_NON_LOCAL |
-- 
1.7.5.1



* [PATCH 5/10] ipvs: Eliminate rt->rt_dst usage in __ip_vs_get_out_rt().
From: David Miller @ 2011-05-10  5:31 UTC (permalink / raw)
  To: netdev


We can simply track what destination address is used based upon which
code block is taken at the top of the function.

Signed-off-by: David S. Miller <davem@davemloft.net>
---
 net/netfilter/ipvs/ip_vs_xmit.c |    7 ++++---
 1 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/net/netfilter/ipvs/ip_vs_xmit.c b/net/netfilter/ipvs/ip_vs_xmit.c
index c4a19f9..e5ef75b 100644
--- a/net/netfilter/ipvs/ip_vs_xmit.c
+++ b/net/netfilter/ipvs/ip_vs_xmit.c
@@ -110,6 +110,7 @@ __ip_vs_get_out_rt(struct sk_buff *skb, struct ip_vs_dest *dest,
 				  &dest->addr.ip,
 				  atomic_read(&rt->dst.__refcnt), rtos);
 		}
+		daddr = dest->addr.ip;
 		spin_unlock(&dest->dst_lock);
 	} else {
 		rt = ip_route_output(net, daddr, 0, rtos, 0);
@@ -125,7 +126,7 @@ __ip_vs_get_out_rt(struct sk_buff *skb, struct ip_vs_dest *dest,
 	      rt_mode)) {
 		IP_VS_DBG_RL("Stopping traffic to %s address, dest: %pI4\n",
 			     (rt->rt_flags & RTCF_LOCAL) ?
-			     "local":"non-local", &rt->rt_dst);
+			     "local":"non-local", &daddr);
 		ip_rt_put(rt);
 		return NULL;
 	}
@@ -133,14 +134,14 @@ __ip_vs_get_out_rt(struct sk_buff *skb, struct ip_vs_dest *dest,
 	    !((ort = skb_rtable(skb)) && ort->rt_flags & RTCF_LOCAL)) {
 		IP_VS_DBG_RL("Redirect from non-local address %pI4 to local "
 			     "requires NAT method, dest: %pI4\n",
-			     &ip_hdr(skb)->daddr, &rt->rt_dst);
+			     &ip_hdr(skb)->daddr, &daddr);
 		ip_rt_put(rt);
 		return NULL;
 	}
 	if (unlikely(!local && ipv4_is_loopback(ip_hdr(skb)->saddr))) {
 		IP_VS_DBG_RL("Stopping traffic from loopback address %pI4 "
 			     "to non-local address, dest: %pI4\n",
-			     &ip_hdr(skb)->saddr, &rt->rt_dst);
+			     &ip_hdr(skb)->saddr, &daddr);
 		ip_rt_put(rt);
 		return NULL;
 	}
-- 
1.7.5.1



* [PATCH 4/10] ipvs: Use IP_VS_RT_MODE_* instead of magic constants.
From: David Miller @ 2011-05-10  5:31 UTC (permalink / raw)
  To: netdev


Signed-off-by: David S. Miller <davem@davemloft.net>
---
 net/netfilter/ipvs/ip_vs_xmit.c |   17 ++++++++++++-----
 1 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/net/netfilter/ipvs/ip_vs_xmit.c b/net/netfilter/ipvs/ip_vs_xmit.c
index 6132b21..c4a19f9 100644
--- a/net/netfilter/ipvs/ip_vs_xmit.c
+++ b/net/netfilter/ipvs/ip_vs_xmit.c
@@ -440,7 +440,8 @@ ip_vs_bypass_xmit_v6(struct sk_buff *skb, struct ip_vs_conn *cp,
 
 	EnterFunction(10);
 
-	if (!(rt = __ip_vs_get_out_rt_v6(skb, NULL, &iph->daddr, NULL, 0, 2)))
+	if (!(rt = __ip_vs_get_out_rt_v6(skb, NULL, &iph->daddr, NULL, 0,
+					 IP_VS_RT_MODE_NON_LOCAL)))
 		goto tx_error_icmp;
 
 	/* MTU checking */
@@ -632,7 +633,9 @@ ip_vs_nat_xmit_v6(struct sk_buff *skb, struct ip_vs_conn *cp,
 	}
 
 	if (!(rt = __ip_vs_get_out_rt_v6(skb, cp->dest, &cp->daddr.in6, NULL,
-					 0, 1|2|4)))
+					 0, (IP_VS_RT_MODE_LOCAL |
+					     IP_VS_RT_MODE_NON_LOCAL |
+					     IP_VS_RT_MODE_RDR))))
 		goto tx_error_icmp;
 	local = __ip_vs_is_local_route6(rt);
 	/*
@@ -875,7 +878,8 @@ ip_vs_tunnel_xmit_v6(struct sk_buff *skb, struct ip_vs_conn *cp,
 	EnterFunction(10);
 
 	if (!(rt = __ip_vs_get_out_rt_v6(skb, cp->dest, &cp->daddr.in6,
-					 &saddr, 1, 1|2)))
+					 &saddr, 1, (IP_VS_RT_MODE_LOCAL |
+						     IP_VS_RT_MODE_NON_LOCAL))))
 		goto tx_error_icmp;
 	if (__ip_vs_is_local_route6(rt)) {
 		dst_release(&rt->dst);
@@ -1050,7 +1054,8 @@ ip_vs_dr_xmit_v6(struct sk_buff *skb, struct ip_vs_conn *cp,
 	EnterFunction(10);
 
 	if (!(rt = __ip_vs_get_out_rt_v6(skb, cp->dest, &cp->daddr.in6, NULL,
-					 0, 1|2)))
+					 0, (IP_VS_RT_MODE_LOCAL |
+					     IP_VS_RT_MODE_NON_LOCAL))))
 		goto tx_error_icmp;
 	if (__ip_vs_is_local_route6(rt)) {
 		dst_release(&rt->dst);
@@ -1254,7 +1259,9 @@ ip_vs_icmp_xmit_v6(struct sk_buff *skb, struct ip_vs_conn *cp,
 	 */
 
 	if (!(rt = __ip_vs_get_out_rt_v6(skb, cp->dest, &cp->daddr.in6, NULL,
-					 0, 1|2|4)))
+					 0, (IP_VS_RT_MODE_LOCAL |
+					     IP_VS_RT_MODE_NON_LOCAL |
+					     IP_VS_RT_MODE_RDR))))
 		goto tx_error_icmp;
 
 	local = __ip_vs_is_local_route6(rt);
-- 
1.7.5.1



* [PATCH 3/10] ipv4: udp: Eliminate remaining uses of rt->rt_src
From: David Miller @ 2011-05-10  5:31 UTC (permalink / raw)
  To: netdev


We already track and pass around the correct flow key,
so simply use it in udp_send_skb().

Signed-off-by: David S. Miller <davem@davemloft.net>
---
 net/ipv4/udp.c |   13 ++++++-------
 1 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 66341a3..599374f 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -706,12 +706,11 @@ static void udp4_hwcsum(struct sk_buff *skb, __be32 src, __be32 dst)
 	}
 }
 
-static int udp_send_skb(struct sk_buff *skb, __be32 daddr, __be32 dport)
+static int udp_send_skb(struct sk_buff *skb, struct flowi4 *fl4)
 {
 	struct sock *sk = skb->sk;
 	struct inet_sock *inet = inet_sk(sk);
 	struct udphdr *uh;
-	struct rtable *rt = (struct rtable *)skb_dst(skb);
 	int err = 0;
 	int is_udplite = IS_UDPLITE(sk);
 	int offset = skb_transport_offset(skb);
@@ -723,7 +722,7 @@ static int udp_send_skb(struct sk_buff *skb, __be32 daddr, __be32 dport)
 	 */
 	uh = udp_hdr(skb);
 	uh->source = inet->inet_sport;
-	uh->dest = dport;
+	uh->dest = fl4->fl4_dport;
 	uh->len = htons(len);
 	uh->check = 0;
 
@@ -737,14 +736,14 @@ static int udp_send_skb(struct sk_buff *skb, __be32 daddr, __be32 dport)
 
 	} else if (skb->ip_summed == CHECKSUM_PARTIAL) { /* UDP hardware csum */
 
-		udp4_hwcsum(skb, rt->rt_src, daddr);
+		udp4_hwcsum(skb, fl4->saddr, fl4->daddr);
 		goto send;
 
 	} else
 		csum = udp_csum(skb);
 
 	/* add protocol-dependent pseudo-header */
-	uh->check = csum_tcpudp_magic(rt->rt_src, daddr, len,
+	uh->check = csum_tcpudp_magic(fl4->saddr, fl4->daddr, len,
 				      sk->sk_protocol, csum);
 	if (uh->check == 0)
 		uh->check = CSUM_MANGLED_0;
@@ -778,7 +777,7 @@ static int udp_push_pending_frames(struct sock *sk)
 	if (!skb)
 		goto out;
 
-	err = udp_send_skb(skb, fl4->daddr, fl4->fl4_dport);
+	err = udp_send_skb(skb, fl4);
 
 out:
 	up->len = 0;
@@ -963,7 +962,7 @@ back_from_confirm:
 				  msg->msg_flags);
 		err = PTR_ERR(skb);
 		if (skb && !IS_ERR(skb))
-			err = udp_send_skb(skb, daddr, dport);
+			err = udp_send_skb(skb, fl4);
 		goto out;
 	}
 
-- 
1.7.5.1



* [PATCH 2/10] ipv4: icmp: Eliminate remaining uses of rt->rt_src
From: David Miller @ 2011-05-10  5:31 UTC (permalink / raw)
  To: netdev


On input packets, rt->rt_src always equals ip_hdr(skb)->saddr.

Anything that mangles or otherwise changes the IP header must
relookup the route found at skb_rtable().  Therefore this
invariant must always hold true.
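
As a rough userspace illustration of relying on that invariant (not the
kernel code), the reply destination can be taken straight from the
received header.  struct mock_iphdr, struct mock_ipc and
mock_reply_setup() are hypothetical names used only for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed stand-in for struct iphdr. */
struct mock_iphdr {
	uint32_t saddr;
	uint32_t daddr;
};

/* Hypothetical, trimmed stand-in for struct ipcm_cookie. */
struct mock_ipc {
	uint32_t addr;
};

/* The reply goes back to whoever sent the packet, so the source address
 * in the received header is used as the reply destination.  Returns that
 * destination so the caller can reuse it. */
static uint32_t mock_reply_setup(const struct mock_iphdr *iph,
				 struct mock_ipc *ipc)
{
	assert(iph != NULL && ipc != NULL);

	ipc->addr = iph->saddr;
	return ipc->addr;
}
```

The sketch only holds as long as the header seen here is the one the
route was looked up for, which is what the invariant above states.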

Signed-off-by: David S. Miller <davem@davemloft.net>
---
 net/ipv4/icmp.c |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
index 853a670..3314394 100644
--- a/net/ipv4/icmp.c
+++ b/net/ipv4/icmp.c
@@ -345,7 +345,7 @@ static void icmp_reply(struct icmp_bxm *icmp_param, struct sk_buff *skb)
 	icmp_param->data.icmph.checksum = 0;
 
 	inet->tos = ip_hdr(skb)->tos;
-	daddr = ipc.addr = rt->rt_src;
+	daddr = ipc.addr = ip_hdr(skb)->saddr;
 	ipc.opt = NULL;
 	ipc.tx_flags = 0;
 	if (icmp_param->replyopts.opt.opt.optlen) {
@@ -930,12 +930,12 @@ static void icmp_address_reply(struct sk_buff *skb)
 		BUG_ON(mp == NULL);
 		for (ifa = in_dev->ifa_list; ifa; ifa = ifa->ifa_next) {
 			if (*mp == ifa->ifa_mask &&
-			    inet_ifa_match(rt->rt_src, ifa))
+			    inet_ifa_match(ip_hdr(skb)->saddr, ifa))
 				break;
 		}
 		if (!ifa && net_ratelimit()) {
 			printk(KERN_INFO "Wrong address mask %pI4 from %s/%pI4\n",
-			       mp, dev->name, &rt->rt_src);
+			       mp, dev->name, &ip_hdr(skb)->saddr);
 		}
 	}
 }
-- 
1.7.5.1



* [PATCH 1/10] ipv4: Pass explicit daddr arg to ip_send_reply().
From: David Miller @ 2011-05-10  5:31 UTC (permalink / raw)
  To: netdev


This eliminates an access to rt->rt_src.

Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/net/ip.h     |    4 ++--
 net/ipv4/ip_output.c |    7 +++----
 net/ipv4/tcp_ipv4.c  |    4 ++--
 3 files changed, 7 insertions(+), 8 deletions(-)

diff --git a/include/net/ip.h b/include/net/ip.h
index 0b30d3a..07118c7 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -174,8 +174,8 @@ static inline __u8 ip_reply_arg_flowi_flags(const struct ip_reply_arg *arg)
 	return (arg->flags & IP_REPLY_ARG_NOSRCCHECK) ? FLOWI_FLAG_ANYSRC : 0;
 }
 
-void ip_send_reply(struct sock *sk, struct sk_buff *skb, struct ip_reply_arg *arg,
-		   unsigned int len); 
+void ip_send_reply(struct sock *sk, struct sk_buff *skb, __be32 daddr,
+		   struct ip_reply_arg *arg, unsigned int len); 
 
 struct ipv4_config {
 	int	log_martians;
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index cd89d22..70778d4 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -1459,20 +1459,19 @@ static int ip_reply_glue_bits(void *dptr, char *to, int offset,
  *	Should run single threaded per socket because it uses the sock
  *     	structure to pass arguments.
  */
-void ip_send_reply(struct sock *sk, struct sk_buff *skb, struct ip_reply_arg *arg,
-		   unsigned int len)
+void ip_send_reply(struct sock *sk, struct sk_buff *skb, __be32 daddr,
+		   struct ip_reply_arg *arg, unsigned int len)
 {
 	struct inet_sock *inet = inet_sk(sk);
 	struct ip_options_data replyopts;
 	struct ipcm_cookie ipc;
 	struct flowi4 fl4;
-	__be32 daddr;
 	struct rtable *rt = skb_rtable(skb);
 
 	if (ip_options_echo(&replyopts.opt.opt, skb))
 		return;
 
-	daddr = ipc.addr = rt->rt_src;
+	ipc.addr = daddr;
 	ipc.opt = NULL;
 	ipc.tx_flags = 0;
 
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 2b65503..f67fb34 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -651,7 +651,7 @@ static void tcp_v4_send_reset(struct sock *sk, struct sk_buff *skb)
 	arg.flags = (sk && inet_sk(sk)->transparent) ? IP_REPLY_ARG_NOSRCCHECK : 0;
 
 	net = dev_net(skb_dst(skb)->dev);
-	ip_send_reply(net->ipv4.tcp_sock, skb,
+	ip_send_reply(net->ipv4.tcp_sock, skb, ip_hdr(skb)->saddr,
 		      &arg, arg.iov[0].iov_len);
 
 	TCP_INC_STATS_BH(net, TCP_MIB_OUTSEGS);
@@ -726,7 +726,7 @@ static void tcp_v4_send_ack(struct sk_buff *skb, u32 seq, u32 ack,
 	if (oif)
 		arg.bound_dev_if = oif;
 
-	ip_send_reply(net->ipv4.tcp_sock, skb,
+	ip_send_reply(net->ipv4.tcp_sock, skb, ip_hdr(skb)->saddr,
 		      &arg, arg.iov[0].iov_len);
 
 	TCP_INC_STATS_BH(net, TCP_MIB_OUTSEGS);
-- 
1.7.5.1



* [PATCH 0/10] Another batch of rt->rt_{src,dst} removals
From: David Miller @ 2011-05-10  5:31 UTC (permalink / raw)
  To: netdev


After this series, almost no references remain outside of
net/ipv4/route.c.

They all come in the usual pattern, making sure the flow key can be
materialized at the site where rt->rt_{src,dst} is needed.
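
A minimal userspace sketch of that pattern, for illustration only:
struct sketch_flow4, struct sketch_route, sketch_route_output_key() and
sketch_resolve_saddr() are hypothetical stand-ins, and the sketch
assumes the lookup fills the chosen source address into the key, as the
patches in this series rely on.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical address the mock lookup "selects" (10.0.0.1). */
#define SKETCH_LOCAL_ADDR 0x0a000001u

/* Hypothetical stand-ins for struct flowi4 and struct rtable. */
struct sketch_flow4 {
	uint32_t daddr;
	uint32_t saddr;
	int oif;
};

struct sketch_route {
	int ifindex;
};

/* Mock route lookup: resolves the key and writes the selected source
 * address back into it.  Returns 0 on success, -1 on failure. */
static int sketch_route_output_key(struct sketch_flow4 *fl4,
				   struct sketch_route *rt)
{
	assert(fl4 != NULL && rt != NULL);

	if (fl4->daddr == 0)
		return -1;
	if (fl4->saddr == 0)
		fl4->saddr = SKETCH_LOCAL_ADDR;
	rt->ifindex = fl4->oif ? fl4->oif : 1;
	return 0;
}

/* Call site: build the key locally, look up the route, then read the
 * addresses from the key rather than from fields cached in the route. */
static int sketch_resolve_saddr(uint32_t daddr, int oif, uint32_t *saddr)
{
	struct sketch_flow4 fl4;
	struct sketch_route rt;

	assert(saddr != NULL);

	memset(&fl4, 0, sizeof(fl4));
	fl4.daddr = daddr;
	fl4.oif = oif;
	if (sketch_route_output_key(&fl4, &rt) < 0)
		return -1;

	*saddr = fl4.saddr;
	return 0;
}
```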

Signed-off-by: David S. Miller <davem@davemloft.net>


* Re: Bug#625914: linux-image-2.6.38-2-amd64: bridging is not interacting well with multicast in 2.6.38-4
From: Noah Meyerhans @ 2011-05-10  4:38 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: 625914, netdev, bridge
In-Reply-To: <1304995124.4065.157.camel@localhost>

[-- Attachment #1: Type: text/plain, Size: 603 bytes --]

On Tue, May 10, 2011 at 03:38:44AM +0100, Ben Hutchings wrote:
> This is pretty weird.  Debian version 2.6.38-3 has a few bridging
> changes from stable 2.6.38.3 and 2.6.38.4, but they don't look like they
> would cause this.

I have apparently filed the bug against the wrong version of Debian's
kernel.  2.6.38-3 is not affected, and works as expected.  The change
was introduced in -4.  That may have been clear from the report itself,
but the report was filed against -3.  I've fixed that in the BTS.

I've also confirmed that -5 is affected, to no great surprise.

I'll investigate further.

noah




