Netdev List
* Re: Strange packet drops with heavy firewalling
From: Benny Amorsen @ 2010-04-12  6:20 UTC
  To: Eric Dumazet; +Cc: netdev
In-Reply-To: <1270819762.2623.97.camel@edumazet-laptop>

Eric Dumazet <eric.dumazet@gmail.com> writes:

> On Friday 09 April 2010 at 14:33 +0200, Benny Amorsen wrote:
>
>> Thank you very much for the help! I will report back whether it was the
>> hash buckets.
>
> OK
>
> You could try :
>
> ethtool -C eth0 tx-usecs 200 tx-frames 100 tx-frames-irq 100
> ethtool -C eth1 tx-usecs 200 tx-frames 100 tx-frames-irq 100
>
> (to reduce tx completion irqs)

Alas, even with the hash buckets I still have the same problem. Perhaps
slightly less severe, but it's still there.

I implemented the other changes you suggested as well except for the
ethtool -G. I may try to switch to net-next if I can find an easy way to
make an RPM out of it.

Thank you for the help!

/proc/sys/net/netfilter/nf_conntrack_acct:1
/proc/sys/net/netfilter/nf_conntrack_buckets:1048576
/proc/sys/net/netfilter/nf_conntrack_checksum:1
/proc/sys/net/netfilter/nf_conntrack_count:43430
/proc/sys/net/netfilter/nf_conntrack_events:1
/proc/sys/net/netfilter/nf_conntrack_events_retry_timeout:15
/proc/sys/net/netfilter/nf_conntrack_expect_max:2048
/proc/sys/net/netfilter/nf_conntrack_generic_timeout:600
/proc/sys/net/netfilter/nf_conntrack_icmp_timeout:30
/proc/sys/net/netfilter/nf_conntrack_log_invalid:1
/proc/sys/net/netfilter/nf_conntrack_max:1048576
/proc/sys/net/netfilter/nf_conntrack_tcp_be_liberal:0
/proc/sys/net/netfilter/nf_conntrack_tcp_loose:1
/proc/sys/net/netfilter/nf_conntrack_tcp_max_retrans:3
/proc/sys/net/netfilter/nf_conntrack_tcp_timeout_close:10
/proc/sys/net/netfilter/nf_conntrack_tcp_timeout_close_wait:60
/proc/sys/net/netfilter/nf_conntrack_tcp_timeout_established:432000
/proc/sys/net/netfilter/nf_conntrack_tcp_timeout_fin_wait:120
/proc/sys/net/netfilter/nf_conntrack_tcp_timeout_last_ack:30
/proc/sys/net/netfilter/nf_conntrack_tcp_timeout_max_retrans:300
/proc/sys/net/netfilter/nf_conntrack_tcp_timeout_syn_recv:60
/proc/sys/net/netfilter/nf_conntrack_tcp_timeout_syn_sent:120
/proc/sys/net/netfilter/nf_conntrack_tcp_timeout_time_wait:120
/proc/sys/net/netfilter/nf_conntrack_tcp_timeout_unacknowledged:300
/proc/sys/net/netfilter/nf_conntrack_udp_timeout:30
/proc/sys/net/netfilter/nf_conntrack_udp_timeout_stream:180
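
As a rough check on whether the hash table is still a bottleneck, the
values above allow an estimate of the average chain length per bucket.
This is a minimal sketch; the function name is made up for illustration,
and since conntrack is understood to hash each connection under both
directions, the real per-bucket figure may be about twice this.

```c
/* Average number of tracked connections per hash bucket.
 * Hypothetical helper, not part of any kernel or netfilter API. */
double conntrack_avg_chain(unsigned long count, unsigned long buckets)
{
	if (buckets == 0)
		return 0.0;
	return (double)count / (double)buckets;
}
```

With nf_conntrack_count = 43430 and nf_conntrack_buckets = 1048576 the
result is well below one entry per bucket, which fits the observation
that enlarging the table did not remove the drops.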


/Benny


* Re: NULL pointer dereference panic in stable (2.6.33.2), amd64
From: Eric Dumazet @ 2010-04-12  6:01 UTC
  To: Krishna Kumar2; +Cc: David Miller, netdev, Denys Fedorysychenko
In-Reply-To: <OF19A2A36F.5B268C61-ON65257703.0012EB6A-65257703.0013F81C@in.ibm.com>

On Monday 12 April 2010 at 09:08 +0530, Krishna Kumar2 wrote:
> Hi Eric,
> 
> Eric Dumazet <eric.dumazet@gmail.com> wrote on 04/12/2010 04:05:53 AM:
> 
> > I believe the following lines from dev_pick_tx() are the problem :
> >
> >    if (sk && sk->sk_dst_cache)
> >       sk_tx_queue_set(sk, queue_index);
> >
> > It is IMHO not safe, because route for this socket might have just
> > changed and we are transmitting an old packet (queued some milli seconds
> > before, when route was different).
> >
> > We then memorize a queue_index that might be too big for the new device
> > of new selected route.
> >
> > Next packet we want to transmit will take the cached value of
> > queue_index, correct for old device, maybe not correct for new device.
> 
> When route changes, I think my patch had reset sk->sk_tx_queue_mapping
> by calling sk_tx_queue_clear. I don't know if I missed any path where
> the route changes and sk_dst_reset() was not called.
> 

The problem is the following sequence:

1) You reset sk->sk_tx_queue_mapping at the very moment the route (or
destination) changes, but we might still have old packets queued in the
tx queues of the old ethernet device (eth0 : multiqueue capable)

2) Application does a sendmsg() or connect() call and sk->sk_dst_cache
is rebuilt; it now points to a dst_entry referring to a new device
(eth1 : non multiqueue)

3) When one old packet finally is transmitted, we do :

	queue_index = 1; // any value > 0

	if (sk && sk->sk_dst_cache)
		sk_tx_queue_set(sk, queue_index); // remember a >0 value

4) application does a sendmsg(), enqueues a new skb on eth1

5) We re-enter dev_pick_tx() and treat the value cached in 3) as valid,
   so we pick a nonexistent txq for the eth1 device.

6) We crash.
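
The sequence above can be replayed in a small userspace model. This is
only a sketch: the model_* structs and functions are simplified stand-ins
invented for illustration, not the kernel's struct net_device, struct
sock or dev_pick_tx(), and the queue counts (8 and 1) are assumed.

```c
#include <stdint.h>

/* Simplified stand-ins, not the kernel structures. */
struct model_dev {
	uint16_t real_num_tx_queues;
};

struct model_sock {
	int tx_queue_mapping;	/* -1 means nothing is recorded */
};

/* Models the cached fast path: a recorded index is returned without
 * being checked against the device it will be used on. */
uint16_t model_pick_tx(const struct model_dev *dev, struct model_sock *sk,
		       uint16_t hashed_index)
{
	(void)dev;	/* the missing check is exactly the point */
	if (sk->tx_queue_mapping >= 0)
		return (uint16_t)sk->tx_queue_mapping;
	sk->tx_queue_mapping = hashed_index;
	return hashed_index;
}

/* Returns 1 if the index names an existing queue on dev. */
int model_index_valid(const struct model_dev *dev, uint16_t index)
{
	return index < dev->real_num_tx_queues;
}

/* Replays the scenario; returns 1 if eth1 ends up with a bad index. */
int model_stale_index_scenario(void)
{
	struct model_dev eth0 = { .real_num_tx_queues = 8 };
	struct model_dev eth1 = { .real_num_tx_queues = 1 };
	struct model_sock sk = { .tx_queue_mapping = -1 }; /* reset on route change */
	uint16_t picked;

	/* An old packet still queued for eth0 is sent and records queue 1. */
	(void)model_pick_tx(&eth0, &sk, 1);

	/* A new packet goes to eth1; the cached value wins over the hash. */
	picked = model_pick_tx(&eth1, &sk, 0);

	return !model_index_valid(&eth1, picked);
}
```

In the model the out-of-range index is only detected; in the kernel it
would be used to index the tx queue array of eth1.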


> The following might be better to prove the panic is due to this, since
> your suggestion will hide a panic that happens rather rarely (according
> to Denys):
> 
>       if (sk_tx_queue_recorded(sk)) {
>             queue_index = sk_tx_queue_get(sk);
> +           queue_index = dev_cap_txqueue(dev, queue_index);
>       } else {
> 

Sure, but I thought I was clear enough to prove this commit was wrong,
and we have to find a fix.

> Thanks,
> 
> - KK
> 
> > You could try to revert commit a4ee3ce3293dc931fab19beb472a8bde1295aebe
> >
> > commit a4ee3ce3293dc931fab19beb472a8bde1295aebe
> > Author: Krishna Kumar <krkumar2@in.ibm.com>
> > Date:   Mon Oct 19 23:50:07 2009 +0000
> >
> >     net: Use sk_tx_queue_mapping for connected sockets
> >
> >     For connected sockets, the first run of dev_pick_tx saves the
> >     calculated txq in sk_tx_queue_mapping. This is not saved if
> >     either the device has a queue select or the socket is not
> >     connected. Next iterations of dev_pick_tx uses the cached value
> >     of sk_tx_queue_mapping.
> >
> >     Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
> >     Signed-off-by: David S. Miller <davem@davemloft.net>
> >
> > diff --git a/net/core/dev.c b/net/core/dev.c
> > index 28b0b9e..fa88dcd 100644
> > --- a/net/core/dev.c
> > +++ b/net/core/dev.c
> > @@ -1791,13 +1791,25 @@ EXPORT_SYMBOL(skb_tx_hash);
> >  static struct netdev_queue *dev_pick_tx(struct net_device *dev,
> >                 struct sk_buff *skb)
> >  {
> > -   const struct net_device_ops *ops = dev->netdev_ops;
> > -   u16 queue_index = 0;
> > +   u16 queue_index;
> > +   struct sock *sk = skb->sk;
> > +
> > +   if (sk_tx_queue_recorded(sk)) {
> > +      queue_index = sk_tx_queue_get(sk);
> > +   } else {
> > +      const struct net_device_ops *ops = dev->netdev_ops;
> >
> > -   if (ops->ndo_select_queue)
> > -      queue_index = ops->ndo_select_queue(dev, skb);
> > -   else if (dev->real_num_tx_queues > 1)
> > -      queue_index = skb_tx_hash(dev, skb);
> > +      if (ops->ndo_select_queue) {
> > +         queue_index = ops->ndo_select_queue(dev, skb);
> > +      } else {
> > +         queue_index = 0;
> > +         if (dev->real_num_tx_queues > 1)
> > +            queue_index = skb_tx_hash(dev, skb);
> > +
> > +         if (sk && sk->sk_dst_cache)
> > +            sk_tx_queue_set(sk, queue_index);
> > +      }
> > +   }
> >
> >     skb_set_queue_mapping(skb, queue_index);
> >     return netdev_get_tx_queue(dev, queue_index);
> >
> >
> 
> 




* Re: [RFC PATCH 0/2] netdev: Add tracepoint to network/driver interface
From: Koki Sanagi @ 2010-04-12  5:20 UTC
  To: Neil Horman; +Cc: netdev, izumi.taku, kaneshige.kenji, davem
In-Reply-To: <20100409110420.GA16609@hmsreliant.think-freely.org>

(2010/04/09 20:04), Neil Horman wrote:
> On Fri, Apr 09, 2010 at 04:37:53PM +0900, Koki Sanagi wrote:
>> These patches add tracepoints to network/driver interface.
>>
>> These tracepoints are helpful for investigating whether a packet passes or not.
>> For example, when a heartbeat connection is lost, that information helps
>> determine whether the cause is on the driver/device side or not.
>>
>> An output is below.
>>
>>    sshd-2443  [001] 68238.415621: netdev_start_xmit: dev=eth3 skbaddr=f3db5138 len=114
>> <idle>-0     [001] 68238.417058: netdev_receive_skb: dev=eth3 skbaddr=f3c81540 len=52
>> <idle>-0     [001] 68238.704363: netdev_receive_skb: dev=eth3 skbaddr=f3c81540 len=100
>>    sshd-2443  [001] 68238.705459: netdev_start_xmit: dev=eth3 skbaddr=f3db5138 len=114
>> <idle>-0     [001] 68238.706891: netdev_receive_skb: dev=eth3 skbaddr=f3c81540 len=52
>> <idle>-0     [001] 68238.878736: netdev_receive_skb: dev=eth3 skbaddr=f3c81540 len=100
>>    sshd-2443  [001] 68238.880361: netdev_start_xmit: dev=eth3 skbaddr=f3db5138 len=114
>>
>> As another use case, we can get throughput per interface with some sort of
>> perf script. I plan to create one.
>>
>> Thanks
>> Koki Sanagi
>>
> You can get a reasonable estimate of per-interface throughput using ethtool or
> even ifconfig in a script.  What are the tracepoints needed for that?  Don't get
> me wrong, I think these tracepoints could have some potential use that's not
> covered by other tools, I just don't see the above as a conclusive reason to add
> them.

Yes, a script using ethtool or ifconfig can do the same thing, but it has to
be poll-driven. By contrast, an implementation using tracepoints is
event-driven. That means a record is reliably left in memory whenever the
kernel crashes, which is useful for investigating network activity from a dump.
In that respect it is superior to ethtool or ifconfig scripts.
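
As an illustration of the event-driven approach, per-interface byte
counts can be accumulated from trace records of the form shown in the
quoted output. This is a minimal sketch that assumes the text layout
"dev=<name> skbaddr=<hex> len=<n>" seen there; the function names are
invented for the example and are not part of perf or ftrace.

```c
#include <stdio.h>
#include <string.h>

/* Extract device name and length from one trace line.
 * Returns 1 on success, 0 if the line does not match the layout. */
int parse_netdev_event(const char *line, char *dev, size_t devsz,
		       unsigned int *len)
{
	const char *p = strstr(line, "dev=");
	char name[32];
	unsigned int n;

	if (p == NULL || devsz == 0)
		return 0;
	/* %*x skips the skb address; only name and length are stored. */
	if (sscanf(p, "dev=%31s skbaddr=%*x len=%u", name, &n) != 2)
		return 0;
	snprintf(dev, devsz, "%s", name);
	*len = n;
	return 1;
}

/* Sum the len= fields of all records for one event name and device. */
unsigned long sum_bytes_for_dev(const char *const *lines, size_t nlines,
				const char *event, const char *want_dev)
{
	unsigned long total = 0;
	size_t i;

	for (i = 0; i < nlines; i++) {
		char dev[32];
		unsigned int len;

		if (strstr(lines[i], event) == NULL)
			continue;
		if (!parse_netdev_event(lines[i], dev, sizeof(dev), &len))
			continue;
		if (strcmp(dev, want_dev) == 0)
			total += len;
	}
	return total;
}
```

Dividing such a sum by the span of the timestamps would give a
throughput estimate without any polling.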



* Re: net-next: 2.6.34-rc1 regression: panic when running diagnostic on interface with IPv6
From: Stephen Hemminger @ 2010-04-12  4:49 UTC
  To: Emil S Tantilov; +Cc: netdev, David Miller
In-Reply-To: <EA929A9653AAE14F841771FB1DE5A1365FE4F6B7C4@rrsmsx501.amr.corp.intel.com>


----- "Emil S Tantilov" <emil.s.tantilov@intel.com> wrote:

> Stephen Hemminger wrote:
> > Send me your kernel config. And are you running tests online or
> > offline 
> 
> Config attached. The kernel runs on top of RHEL5.4, not sure if that
> is significant, but should explain some of the sysfs deprecated
> options enabled in it. 
> 
> The test is offline (online doesn't do much other than link check).
> 
> I also reproduced it with a fresh pull from this morning. Looks like
> it is easier to reproduce after you pass some traffic, just assigning
> IPv6 address may not be enough. Also rmmod hangs.
> 
> Thanks,
> Emil

I have a fix, but it needs more testing.


* linux-next: manual merge of the wireless tree with the net tree
From: Stephen Rothwell @ 2010-04-12  4:05 UTC
  To: John W. Linville
  Cc: linux-next, linux-kernel, Jiri Pirko, David Miller, netdev,
	Saravanan Dhanabal, Luciano Coelho

Hi John,

Today's linux-next merge of the wireless tree got a conflict in
drivers/net/wireless/wl12xx/wl1271_main.c between commit
22bedad3ce112d5ca1eaf043d4990fa2ed698c87 ("net: convert multicast list to
list_head") from the net tree and commit
2c10bb9cb3f9cecb71bd2cbb771778136433ebe2 ("wl1271: Fix mac80211
configuration requests during WL1271_STATE_OFF") from the wireless tree.

Just context changes.  I fixed it up (see below) and can carry the fix as
necessary.
-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au

diff --cc drivers/net/wireless/wl12xx/wl1271_main.c
index aa970b7,283d5da..0000000
--- a/drivers/net/wireless/wl12xx/wl1271_main.c
+++ b/drivers/net/wireless/wl12xx/wl1271_main.c
@@@ -1267,11 -1304,15 +1305,15 @@@ struct wl1271_filter_params 
  	u8 mc_list[ACX_MC_ADDRESS_GROUP_MAX][ETH_ALEN];
  };
  
 -static u64 wl1271_op_prepare_multicast(struct ieee80211_hw *hw, int mc_count,
 -				       struct dev_addr_list *mc_list)
 +static u64 wl1271_op_prepare_multicast(struct ieee80211_hw *hw,
 +				       struct netdev_hw_addr_list *mc_list)
  {
  	struct wl1271_filter_params *fp;
 +	struct netdev_hw_addr *ha;
+ 	struct wl1271 *wl = hw->priv;
 -	int i;
+ 
+ 	if (unlikely(wl->state == WL1271_STATE_OFF))
+ 		return 0;
  
  	fp = kzalloc(sizeof(*fp), GFP_ATOMIC);
  	if (!fp) {


* Re: [PATCH] rps: add flow director support
From: Changli Gao @ 2010-04-12  3:58 UTC
  To: Eric Dumazet; +Cc: David S. Miller, netdev, Tom Herbert
In-Reply-To: <1271001939.2078.61.camel@edumazet-laptop>

On Mon, Apr 12, 2010 at 12:05 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Monday 12 April 2010 at 05:42 +0800, Changli Gao wrote:
>
> Changli
>
> I am a bit disappointed to find so many bugs in your patch.

:(. Thanks for your patient review.

>
> I believe this is over engineering at this stage, we yet have to get
> some benches or real world results.

OK. Leave this patch here for testing. I'll post benchmark results once I have them.

>
> Plus it conflicts with the much more interesting upcoming stuff (RFS).
> You name this patch 'flow director', to get our attention, but it's an
> old idea of yours, to get different weights on cpus, that RPS is not yet
> able to perform.
>

I don't think it conflicts with RFS. RFS is for applications, and RPS
flow director is for firewalls and routers. Because softirqs aren't
under scheduler control, in order to do softirq load balancing, we
have to schedule flows internally. At the same time, RPS flow
director exposes more than the old design, and keeps the rps_cpus
interface.

> Maybe this is the reason you forgot to CC Tom Herbert (and me) ?
>

Sorry for forgetting to add you and Tom to the CC list.

> Consider now :
>
> 1) echo 65000 >/sys/class/net/eth0/queues/rx-0/rps_flow_0
>   possible crash, dereferencing a smaller cpumap.

Yes, I assumed cpu_online() would check whether the cpu was valid. After
checking the code, I found that you were right. OK, I'll add this
sanity check.

if (cpu >= nr_cpumask_bits)
      return -EINVAL;
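
The same kind of check can be sketched in userspace C for a value
written through a sysfs-style file. The helper name and its parameters
are assumptions made for illustration; the kernel side would compare
against nr_cpumask_bits as shown above.

```c
#include <errno.h>
#include <stdlib.h>

/* Parse a decimal index from a write buffer and reject anything
 * outside [0, limit). Hypothetical helper, not a kernel function. */
int parse_index_below(const char *buf, unsigned long limit,
		      unsigned long *out)
{
	char *end;
	unsigned long val;

	errno = 0;
	val = strtoul(buf, &end, 10);
	if (errno != 0 || end == buf)
		return -EINVAL;
	/* Accept the single trailing newline that echo appends. */
	if (*end == '\n')
		end++;
	if (*end != '\0')
		return -EINVAL;
	if (val >= limit)
		return -EINVAL;
	*out = val;
	return 0;
}
```

With a limit of 8, writing "65000" is rejected before the value can be
used to index a smaller cpumask.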

>
> 2) echo 3000000000 >/sys/class/net/eth0/queues/rx-0/rps_flow_0
>   probable crash, because of overflow in RPS_MAP_SIZE(flows)

Did you mean rps_flows? Yes, RPS_MAP_SIZE may overflow. Maybe I need to
check the upper bound.

if (flows > USHORT_MAX)
     return -EINVAL;
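
Besides capping the count, the size computation itself can be guarded.
The sketch below assumes the map size has the general shape
"header + entry * count"; the real RPS_MAP_SIZE() macro in the patch is
not shown in this thread, so the function and its parameters are
hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Compute header + entry * count, refusing results that would wrap.
 * Returns 0 on success and -1 on overflow. */
int flow_map_size(size_t header, size_t entry, size_t count, size_t *out)
{
	if (entry != 0 && count > (SIZE_MAX - header) / entry)
		return -1;
	*out = header + entry * count;
	return 0;
}
```

A caller would refuse the write when this returns -1 instead of
allocating a buffer that is smaller than the number of flows implies.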

>
> 3) How can rps_flow_attribute & rps_flow_attribute_size be static (one
> instance for whole kernel), if your intent is to have a per rxqueue
> attributes ? (/sys/class/net/eth0/queues/rx-0/ ...). Or the first lines
> of update_rps_flow_files() are completely wrong...
>
> echo 10 > /sys/class/net/eth0/queues/rx-0/rps_flows
> echo 2 > /sys/class/net/eth1/queues/rx-0/rps_flows
> cat /sys/class/net/eth0/queues/rx-0/rps_flow_9

Yes, each rxqueue has its own attributes. rps_flow_attribute is much
like rps_cpus_attribute. Since sysfs_create_file() and
sysfs_remove_file() don't modify them and only read them, we can make
them static and share them across all the rxqueues.

>
> 4) Lack of atomic changes of the RPS flows -> many packet reordering can
> occur.

Yes, that can happen if packet dispatching changes dynamically. I don't
know how to avoid packet reordering entirely. If out-of-order (OOO)
delivery doesn't occur frequently, it isn't a real problem.

>
> 5) Many possible memory leaks in update_rps_flow_files(), you obviously
> were very lazy. We try to build a bug-free kernel, not only a 'cool
> kernel', and if you are lazy, your patches won't be accepted.
>

Yes, I was lazy, and didn't shrink the rps_flow_attribute. I'll add
reference counters for the rps_flow_attributes to track their usage, and
free them when they are no longer needed.


-- 
Regards,
Changli Gao(xiaosuo@gmail.com)


* Re: tulip_stop_rxtx() failed (CSR5 0xf0260000 CSR6 0xb3862002) on DEC Alpha Personal Workstation 433au
From: Grant Grundler @ 2010-04-12  3:44 UTC
  To: Adrian Glaubitz
  Cc: Grant Grundler, Kyle McMartin, David S. Miller, Joe Perches,
	netdev, linux-kernel
In-Reply-To: <20100405171318.GA18915@physik.fu-berlin.de>

On Mon, Apr 05, 2010 at 07:13:18PM +0200, Adrian Glaubitz wrote:
> Hi guys,
> 
> I installed Debian unstable on an old digital workstation "DEC Digital
> Personal Workstation 433au" (Miata) which has an on-board tulip
> network controller. I'm not really using that network controller but
> an off-board intel e1000 controller. However, I found that the tulip
> driver produces a lot of noise in the message log, the following
> message is repeated periodically and spams the whole message log:
> 
> 0000:00:03.0: tulip_stop_rxtx() failed (CSR5 0xf0260000 CSR6 0xb3862002)
> 
> Do you think this is related to the fact that no cable is connected to
> the network controller?

It shouldn't be. The "stop_rxtx failed" error is because either 
CSR5_TS or CSR5_RS are still set in CSR5. This means the chip
thinks the transmit or receive protocol engines are still running.
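
To see which engine the chip reports as active, the two state fields can
be extracted from the logged CSR5 value. This is a sketch only: the mask
values below (bits 22:20 for transmit, bits 19:17 for receive) are
assumed from memory of the tulip driver header and should be checked
against it or the chip manual, and the function names are made up.

```c
#include <stdint.h>

/* Assumed CSR5 process-state fields; verify before relying on them. */
#define CSR5_TS_MASK 0x00700000u	/* transmit process state, bits 22:20 */
#define CSR5_RS_MASK 0x000e0000u	/* receive process state, bits 19:17 */

/* Raw transmit process state field (0 means stopped). */
unsigned int csr5_tx_state(uint32_t csr5)
{
	return (csr5 & CSR5_TS_MASK) >> 20;
}

/* Raw receive process state field (0 means stopped). */
unsigned int csr5_rx_state(uint32_t csr5)
{
	return (csr5 & CSR5_RS_MASK) >> 17;
}

/* 1 if both fields read as stopped, which is what the stop
 * routine is waiting for. */
int csr5_engines_stopped(uint32_t csr5)
{
	return (csr5 & (CSR5_TS_MASK | CSR5_RS_MASK)) == 0;
}
```

Under those assumed masks, the logged value 0xf0260000 has a transmit
field of 2 and a receive field of 3, so neither engine reads as stopped.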

What I suspect is that the state isn't "running" but rather some other
"hung" state caused by a failed (missing) link. This has been reported
before, but I've not looked at it that closely yet.

> The lspci output of the hardware looks like this:
> 
> test-adrian1:~# lspci
> 00:03.0 Ethernet controller: Digital Equipment Corporation DECchip 21142/43 (rev 30)
> 00:07.0 ISA bridge: Contaq Microsystems 82c693
> 00:07.1 IDE interface: Contaq Microsystems 82c693
> 00:07.2 IDE interface: Contaq Microsystems 82c693
> 00:07.3 USB Controller: Contaq Microsystems 82c693
> 00:0b.0 VGA compatible controller: Matrox Graphics, Inc. MGA 2064W [Millennium] (rev 01)
> 00:14.0 PCI bridge: Digital Equipment Corporation DECchip 21152 (rev 03)
> 01:04.0 SCSI storage controller: QLogic Corp. ISP1020 Fast-wide SCSI (rev 05)
> 01:09.0 Ethernet controller: Intel Corporation 82541GI Gigabit Ethernet Controller
> 
> If you need any more verbose or debug output, please let me know.

I might. I first have to stare at the tulip programmer's guide for a
while.

cheers,
grant


* Re: NULL pointer dereference panic in stable (2.6.33.2), amd64
From: Krishna Kumar2 @ 2010-04-12  3:38 UTC
  To: Eric Dumazet; +Cc: David Miller, netdev, Denys Fedorysychenko
In-Reply-To: <1271025353.2078.155.camel@edumazet-laptop>

Hi Eric,

Eric Dumazet <eric.dumazet@gmail.com> wrote on 04/12/2010 04:05:53 AM:

> I believe the following lines from dev_pick_tx() are the problem :
>
>    if (sk && sk->sk_dst_cache)
>       sk_tx_queue_set(sk, queue_index);
>
> It is IMHO not safe, because route for this socket might have just
> changed and we are transmitting an old packet (queued some milli seconds
> before, when route was different).
>
> We then memorize a queue_index that might be too big for the new device
> of new selected route.
>
> Next packet we want to transmit will take the cached value of
> queue_index, correct for old device, maybe not correct for new device.

When route changes, I think my patch had reset sk->sk_tx_queue_mapping
by calling sk_tx_queue_clear. I don't know if I missed any path where
the route changes and sk_dst_reset() was not called.

The following might be better to prove the panic is due to this, since
your suggestion will hide a panic that happens rather rarely (according
to Denys):

      if (sk_tx_queue_recorded(sk)) {
            queue_index = sk_tx_queue_get(sk);
+           queue_index = dev_cap_txqueue(dev, queue_index);
      } else {
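
The thread names dev_cap_txqueue() but does not show its body. A minimal
userspace sketch of what such a capping step might do, assuming it falls
back to queue 0 when the cached index does not exist on the device, is
given here; cap_txqueue() is a hypothetical stand-in, not the kernel
function.

```c
#include <stdint.h>

/* Fall back to queue 0 when the cached index is out of range for a
 * device with real_num_tx_queues transmit queues. */
uint16_t cap_txqueue(uint16_t real_num_tx_queues, uint16_t queue_index)
{
	if (queue_index >= real_num_tx_queues)
		return 0;
	return queue_index;
}
```

A cap like this turns the out-of-range access into a valid but possibly
suboptimal queue choice; a warning at the same place would help confirm
the stale cached index as the cause of the panic.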

Thanks,

- KK

> You could try to revert commit a4ee3ce3293dc931fab19beb472a8bde1295aebe
>
> commit a4ee3ce3293dc931fab19beb472a8bde1295aebe
> Author: Krishna Kumar <krkumar2@in.ibm.com>
> Date:   Mon Oct 19 23:50:07 2009 +0000
>
>     net: Use sk_tx_queue_mapping for connected sockets
>
>     For connected sockets, the first run of dev_pick_tx saves the
>     calculated txq in sk_tx_queue_mapping. This is not saved if
>     either the device has a queue select or the socket is not
>     connected. Next iterations of dev_pick_tx uses the cached value
>     of sk_tx_queue_mapping.
>
>     Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
>     Signed-off-by: David S. Miller <davem@davemloft.net>
>
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 28b0b9e..fa88dcd 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -1791,13 +1791,25 @@ EXPORT_SYMBOL(skb_tx_hash);
>  static struct netdev_queue *dev_pick_tx(struct net_device *dev,
>                 struct sk_buff *skb)
>  {
> -   const struct net_device_ops *ops = dev->netdev_ops;
> -   u16 queue_index = 0;
> +   u16 queue_index;
> +   struct sock *sk = skb->sk;
> +
> +   if (sk_tx_queue_recorded(sk)) {
> +      queue_index = sk_tx_queue_get(sk);
> +   } else {
> +      const struct net_device_ops *ops = dev->netdev_ops;
>
> -   if (ops->ndo_select_queue)
> -      queue_index = ops->ndo_select_queue(dev, skb);
> -   else if (dev->real_num_tx_queues > 1)
> -      queue_index = skb_tx_hash(dev, skb);
> +      if (ops->ndo_select_queue) {
> +         queue_index = ops->ndo_select_queue(dev, skb);
> +      } else {
> +         queue_index = 0;
> +         if (dev->real_num_tx_queues > 1)
> +            queue_index = skb_tx_hash(dev, skb);
> +
> +         if (sk && sk->sk_dst_cache)
> +            sk_tx_queue_set(sk, queue_index);
> +      }
> +   }
>
>     skb_set_queue_mapping(skb, queue_index);
>     return netdev_get_tx_queue(dev, queue_index);
>
>



* Linux arp flux problem
From: Ming-Ching Tiew @ 2010-04-12  3:16 UTC
  To: Net Dev


The following link explains the Linux arp flux problem pretty well, and I myself have been burnt badly on a live site where "arp_filter" did not help at all.

         http://linux-ip.net/html/ether-arp.html

And I tested the kernel patch by Julian Anastasov, and it works 100% reliably :-

         http://www.ssi.bg/~ja/#hidden

My question is: the patch has been around for many years, so why has it not been included in the kernel? Is Linux supposed to have this arp flux "side effect" on purpose?

Best regards.


      


* Re: NULL pointer dereference panic in stable (2.6.33.2), amd64
From: Denys Fedorysychenko @ 2010-04-11 23:36 UTC
  To: Eric Dumazet; +Cc: netdev, Krishna Kumar, David Miller
In-Reply-To: <1271027483.2078.158.camel@edumazet-laptop>

On Monday 12 April 2010 02:11:23 Eric Dumazet wrote:
> On Monday 12 April 2010 at 02:04 +0300, Denys Fedorysychenko wrote:
> > > It is IMHO not safe, because route for this socket might have just
> > > changed and we are transmitting an old packet (queued some milli
> > > seconds before, when route was different).
> > >
> > > We then memorize a queue_index that might be too big for the new device
> > > of new selected route.
> > >
> > > Next packet we want to transmit will take the cached value of
> > > queue_index, correct for old device, maybe not correct for new device.
> >
> > Yes, it is possible, i have there RIP with 1k+ routes, changing non-stop.
> >
> > > You could try to revert commit a4ee3ce3293dc931fab19beb472a8bde1295aebe
> > >
> > > commit a4ee3ce3293dc931fab19beb472a8bde1295aebe
> >
> > I will try to revert it now. But to trigger bug probably i need 1-2 days.
> 
> I can try to reproduce bug here, with a multiqueue device, a non
> multiqueue device, and changing routes while transmitting bulk of
> packets, and of course some traffic shaping to slow down xmits.
> 
> Could you send :
> 
> ifconfig -a
> lspci
Here is a bit more than that; hopefully it is useful.
An additional note: I have a few blackhole routes, so if a specific route
announcement disappears, traffic falls back to a less specific blackhole route.

I also have some source routing (ip rule).


lspci

00:00.0 Host bridge: Intel Corporation 5000P Chipset Memory Controller Hub (rev b1)
00:02.0 PCI bridge: Intel Corporation 5000 Series Chipset PCI Express x4 Port 2 (rev b1)
00:03.0 PCI bridge: Intel Corporation 5000 Series Chipset PCI Express x4 Port 3 (rev b1)
00:04.0 PCI bridge: Intel Corporation 5000 Series Chipset PCI Express x8 Port 4-5 (rev b1)
00:05.0 PCI bridge: Intel Corporation 5000 Series Chipset PCI Express x4 Port 5 (rev b1)
00:06.0 PCI bridge: Intel Corporation 5000 Series Chipset PCI Express x8 Port 6-7 (rev b1)
00:07.0 PCI bridge: Intel Corporation 5000 Series Chipset PCI Express x4 Port 7 (rev b1)
00:10.0 Host bridge: Intel Corporation 5000 Series Chipset FSB Registers (rev b1)
00:10.1 Host bridge: Intel Corporation 5000 Series Chipset FSB Registers (rev b1)
00:10.2 Host bridge: Intel Corporation 5000 Series Chipset FSB Registers (rev b1)
00:11.0 Host bridge: Intel Corporation 5000 Series Chipset Reserved Registers (rev b1)
00:13.0 Host bridge: Intel Corporation 5000 Series Chipset Reserved Registers (rev b1)
00:15.0 Host bridge: Intel Corporation 5000 Series Chipset FBD Registers (rev b1)
00:16.0 Host bridge: Intel Corporation 5000 Series Chipset FBD Registers (rev b1)
00:1c.0 PCI bridge: Intel Corporation 631xESB/632xESB/3100 Chipset PCI Express Root Port 1 (rev 09)
00:1d.0 USB Controller: Intel Corporation 631xESB/632xESB/3100 Chipset UHCI USB Controller #1 (rev 09)
00:1d.1 USB Controller: Intel Corporation 631xESB/632xESB/3100 Chipset UHCI USB Controller #2 (rev 09)
00:1d.2 USB Controller: Intel Corporation 631xESB/632xESB/3100 Chipset UHCI USB Controller #3 (rev 09)
00:1d.3 USB Controller: Intel Corporation 631xESB/632xESB/3100 Chipset UHCI USB Controller #4 (rev 09)
00:1d.7 USB Controller: Intel Corporation 631xESB/632xESB/3100 Chipset EHCI USB2 Controller (rev 09)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev d9)
00:1f.0 ISA bridge: Intel Corporation 631xESB/632xESB/3100 Chipset LPC Interface Controller (rev 09)
00:1f.1 IDE interface: Intel Corporation 631xESB/632xESB IDE Controller (rev 09)
00:1f.2 SATA controller: Intel Corporation 631xESB/632xESB SATA AHCI Controller (rev 09)
00:1f.3 SMBus: Intel Corporation 631xESB/632xESB/3100 Chipset SMBus Controller (rev 09)
01:00.0 PCI bridge: Intel Corporation 6311ESB/6321ESB PCI Express Upstream Port (rev 01)
01:00.3 PCI bridge: Intel Corporation 6311ESB/6321ESB PCI Express to PCI-X Bridge (rev 01)
02:00.0 PCI bridge: Intel Corporation 6311ESB/6321ESB PCI Express Downstream Port E1 (rev 01)
02:02.0 PCI bridge: Intel Corporation 6311ESB/6321ESB PCI Express Downstream Port E3 (rev 01)
04:00.0 Ethernet controller: Intel Corporation 80003ES2LAN Gigabit Ethernet Controller (Copper) (rev 01)
04:00.1 Ethernet controller: Intel Corporation 80003ES2LAN Gigabit Ethernet Controller (Copper) (rev 01)
07:00.0 RAID bus controller: Adaptec AAC-RAID (rev 09)
0b:00.0 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet Controller (rev 06)
0b:00.1 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet Controller (rev 06)
0c:05.0 VGA compatible controller: ASPEED Technology, Inc. AST2000

eth0      Link encap:Ethernet  HWaddr 00:1E:68:57:4B:CC
          inet addr:194.146.153.17  Bcast:0.0.0.0  Mask:255.255.255.224
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:1047262147 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1017399476 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:698944316059 (650.9 GiB)  TX bytes:694648889407 (646.9 GiB)
          Memory:fc6e0000-fc700000

eth0.165  Link encap:Ethernet  HWaddr 00:1E:68:57:4B:CC
          inet addr:172.30.1.2  Bcast:0.0.0.0  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:5375126 errors:0 dropped:0 overruns:0 frame:0
          TX packets:3446299 errors:0 dropped:2 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:1910233297 (1.7 GiB)  TX bytes:814740452 (776.9 MiB)

eth0.32   Link encap:Ethernet  HWaddr 00:1E:68:57:4B:CC
          inet addr:10.22.22.2  Bcast:0.0.0.0  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:96167917 errors:0 dropped:0 overruns:0 frame:0
          TX packets:166845841 errors:0 dropped:2 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:14589314556 (13.5 GiB)  TX bytes:126771970521 (118.0 GiB)

eth0.33   Link encap:Ethernet  HWaddr 00:1E:68:57:4B:CC
          inet addr:10.152.152.2  Bcast:0.0.0.0  Mask:255.255.255.248
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:10171808 errors:0 dropped:0 overruns:0 frame:0
          TX packets:130698572 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:2690522850 (2.5 GiB)  TX bytes:25223310290 (23.4 GiB)

eth0.34   Link encap:Ethernet  HWaddr 00:1E:68:57:4B:CC
          inet addr:10.25.25.1  Bcast:0.0.0.0  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:133415723 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1352 errors:0 dropped:1 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:140007983122 (130.3 GiB)  TX bytes:117870 (115.1 KiB)

eth0.35   Link encap:Ethernet  HWaddr 00:1E:68:57:4B:CC
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0                                                                                                                   
          collisions:0 txqueuelen:0                                                                                                                                              
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)                                                                                                                                 
                                                                                                                                                                                 
eth0.36   Link encap:Ethernet  HWaddr 00:1E:68:57:4B:CC                                                                                                                          
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1                                                                                                                     
          RX packets:1474 errors:0 dropped:0 overruns:0 frame:0                                                                                                                  
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0                                                                                                                   
          collisions:0 txqueuelen:0                                                                                                                                              
          RX bytes:125741 (122.7 KiB)  TX bytes:0 (0.0 B)                                                                                                                        
                                                                                                                                                                                 
eth0.38   Link encap:Ethernet  HWaddr 00:1E:68:57:4B:CC                                                                                                                          
          inet addr:10.0.4.1  Bcast:0.0.0.0  Mask:255.255.255.248                                                                                                                
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1                                                                                                                     
          RX packets:674 errors:0 dropped:0 overruns:0 frame:0                                                                                                                   
          TX packets:188760217 errors:0 dropped:0 overruns:0 carrier:0                                                                                                           
          collisions:0 txqueuelen:0                                                                                                                                              
          RX bytes:31004 (30.2 KiB)  TX bytes:212060676669 (197.4 GiB)                                                                                                           
                                                                                                                                                                                 
eth0.39   Link encap:Ethernet  HWaddr 00:1E:68:57:4B:CC                                                                                                                          
          inet addr:10.22.24.1  Bcast:0.0.0.0  Mask:255.255.255.0                                                                                                                
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1                                                                                                                     
          RX packets:270892875 errors:0 dropped:0 overruns:0 frame:0                                                                                                             
          TX packets:309713487 errors:0 dropped:3 overruns:0 carrier:0                                                                                                           
          collisions:0 txqueuelen:0                                                                                                                                              
          RX bytes:77224626928 (71.9 GiB)  TX bytes:215883462565 (201.0 GiB)                                                                                                     
                                                                                                                                                                                 
eth0.40   Link encap:Ethernet  HWaddr 00:1E:68:57:4B:CC                                                                                                                          
          inet addr:62.84.94.13  Bcast:0.0.0.0  Mask:255.255.255.248                                                                                                             
          BROADCAST MULTICAST  MTU:1500  Metric:1                                                                                                                                
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0                                                                                                                     
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0                                                                                                                   
          collisions:0 txqueuelen:0                                                                                                                                              
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)                                                                                                                                 
                                                                                                                                                                                 
eth0.41   Link encap:Ethernet  HWaddr 00:1E:68:57:4B:CC                                                                                                                          
          inet addr:2.2.3.1  Bcast:0.0.0.0  Mask:255.255.255.128                                                                                                                 
          BROADCAST MULTICAST  MTU:1500  Metric:1                                                                                                                                
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0                                                                                                                     
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0                                                                                                                   
          collisions:0 txqueuelen:0                                                                                                                                              
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)                                                                                                                                 
                                                                                                                                                                                 
eth1      Link encap:Ethernet  HWaddr 00:1E:68:57:4B:CD                                                                                                                          
          UP BROADCAST RUNNING PROMISC MULTICAST  MTU:1500  Metric:1                                                                                                             
          RX packets:949912 errors:0 dropped:0 overruns:0 frame:0                                                                                                                
          TX packets:192 errors:0 dropped:0 overruns:0 carrier:0                                                                                                                 
          collisions:0 txqueuelen:1000                                                                                                                                           
          RX bytes:429565276 (409.6 MiB)  TX bytes:8064 (7.8 KiB)                                                                                                                
          Memory:fc6c0000-fc6e0000                                                                                                                                               
                                                                                                                                                                                 
eth1.2    Link encap:Ethernet  HWaddr 00:1C:C0:19:23:80                                                                                                                          
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1                                                                                                                     
          RX packets:5462 errors:0 dropped:0 overruns:0 frame:0                                                                                                                  
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0                                                                                                                   
          collisions:0 txqueuelen:0                                                                                                                                              
          RX bytes:273100 (266.6 KiB)  TX bytes:0 (0.0 B)                                                                                                                        
                                                                                                                                                                                 
eth1.3    Link encap:Ethernet  HWaddr 00:1E:68:57:4B:CD                                                                                                                          
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1                                                                                                                     
          RX packets:5462 errors:0 dropped:0 overruns:0 frame:0                                                                                                                  
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0                                                                                                                   
          collisions:0 txqueuelen:0                                                                                                                                              
          RX bytes:273100 (266.6 KiB)  TX bytes:0 (0.0 B)                                                                                                                        
                                                                                                                                                                                 
eth1.4    Link encap:Ethernet  HWaddr 00:1E:68:57:4B:CD                                                                                                                          
          inet addr:10.154.154.2  Bcast:0.0.0.0  Mask:255.255.255.0                                                                                                              
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1                                                                                                                     
          RX packets:5461 errors:0 dropped:0 overruns:0 frame:0                                                                                                                  
          TX packets:192 errors:0 dropped:0 overruns:0 carrier:0                                                                                                                 
          collisions:0 txqueuelen:0                                                                                                                                              
          RX bytes:273050 (266.6 KiB)  TX bytes:8064 (7.8 KiB)                                                                                                                   
                                                                                                                                                                                 
eth2      Link encap:Ethernet  HWaddr 00:1E:68:57:4B:CE                                                                                                                          
          BROADCAST MULTICAST  MTU:1500  Metric:1                                                                                                                                
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0                                                                                                                     
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0                                                                                                                   
          collisions:0 txqueuelen:1000                                                                                                                                           
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)                                                                                                                                 
          Memory:fcde0000-fce00000                                                                                                                                               
                                                                                                                                                                                 
eth3      Link encap:Ethernet  HWaddr 00:1E:68:57:4B:CF                                                                                                                          
          BROADCAST MULTICAST  MTU:1500  Metric:1                                                                                                                                
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0                                                                                                                     
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0                                                                                                                   
          collisions:0 txqueuelen:1000                                                                                                                                           
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)                                                                                                                                 
          Memory:fcd80000-fcda0000                                                                                                                                               
                                                                                                                                                                                 
ifb0      Link encap:Ethernet  HWaddr 6A:73:52:ED:4A:D6                                                                                                                          
          UP BROADCAST RUNNING NOARP  MTU:1500  Metric:1                                                                                                                         
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:32
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

ifb1      Link encap:Ethernet  HWaddr C2:97:7A:32:8B:59
          UP BROADCAST RUNNING NOARP  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:32
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

ifb2      Link encap:Ethernet  HWaddr 96:AC:DD:0D:68:B0
          UP BROADCAST RUNNING NOARP  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:32
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

ifb3      Link encap:Ethernet  HWaddr FA:D9:59:C6:16:E7
          UP BROADCAST RUNNING NOARP  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:32
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:3192 errors:0 dropped:0 overruns:0 frame:0
          TX packets:3192 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:222919 (217.6 KiB)  TX bytes:222919 (217.6 KiB)

tunx0     Link encap:UNSPEC  HWaddr 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00
          inet addr:1.1.1.2  P-t-P:1.1.1.2  Mask:255.255.255.255
          UP POINTOPOINT RUNNING  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:12531732 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1024
          RX bytes:0 (0.0 B)  TX bytes:1888404300 (1.7 GiB)

Router-Dora ~ # tc -d qdisc
qdisc hfsc 1: dev eth0 root refcnt 2
qdisc bfifo 101: dev eth0 parent 1:101 limit 12500000b
qdisc bfifo 111: dev eth0 parent 1:111 limit 3125Kb
qdisc bfifo 112: dev eth0 parent 1:112 limit 3125Kb
qdisc pfifo_fast 0: dev eth1 root refcnt 2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc hfsc 1: dev eth0.33 root refcnt 2
 linklayer ethernet overhead 4 mpu 64 mtu 1500 tsize 1436
qdisc bfifo 110: dev eth0.33 parent 1:110 limit 3000000b
qdisc bfifo 120: dev eth0.33 parent 1:120 limit 100000b
qdisc bfifo 130: dev eth0.33 parent 1:130 limit 100000b
qdisc bfifo 140: dev eth0.33 parent 1:140 limit 100000b
qdisc bfifo 150: dev eth0.33 parent 1:150 limit 1000000b
qdisc bfifo 160: dev eth0.33 parent 1:160 limit 100000b
qdisc bfifo 170: dev eth0.33 parent 1:170 limit 100000b
qdisc bfifo 180: dev eth0.33 parent 1:180 limit 100000b
qdisc bfifo 190: dev eth0.33 parent 1:190 limit 100000b
qdisc hfsc 1: dev eth0.38 root refcnt 2
 linklayer ethernet overhead 4 mpu 64 mtu 2047 tsize 512
qdisc bfifo 3: dev eth0.38 parent 1:3 limit 5000000b
qdisc bfifo 4: dev eth0.38 parent 1:4 limit 5000000b
qdisc hfsc 1: dev eth0.39 root refcnt 2
 linklayer ethernet overhead 4 mpu 64 mtu 2047 tsize 512
qdisc bfifo 3: dev eth0.39 parent 1:3 limit 3000000b
qdisc bfifo 20: dev eth0.39 parent 1:20 limit 20000000b
qdisc bfifo 31: dev eth0.39 parent 1:31 limit 1000000b
qdisc bfifo 30: dev eth0.39 parent 1:30 limit 5000000b
qdisc ingress ffff: dev eth1.3 parent ffff:fff1 ----------------
qdisc pfifo_fast 0: dev ifb0 root refcnt 2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc pfifo_fast 0: dev ifb1 root refcnt 2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc pfifo_fast 0: dev ifb2 root refcnt 2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc pfifo_fast 0: dev ifb3 root refcnt 2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc pfifo_fast 0: dev tunx0 root refcnt 2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1


Router-Dora ~ # ip link                                                                                                                                                          
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN                                                                                                              
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00                                                                                                                        
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP qlen 1000
    link/ether 00:1e:68:57:4b:cc brd ff:ff:ff:ff:ff:ff
3: eth1: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:1e:68:57:4b:cd brd ff:ff:ff:ff:ff:ff
4: eth2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
    link/ether 00:1e:68:57:4b:ce brd ff:ff:ff:ff:ff:ff
5: eth3: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
    link/ether 00:1e:68:57:4b:cf brd ff:ff:ff:ff:ff:ff
6: eth0.32@eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP
    link/ether 00:1e:68:57:4b:cc brd ff:ff:ff:ff:ff:ff
7: eth0.33@eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP
    link/ether 00:1e:68:57:4b:cc brd ff:ff:ff:ff:ff:ff
8: eth0.34@eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP
    link/ether 00:1e:68:57:4b:cc brd ff:ff:ff:ff:ff:ff
9: eth0.35@eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP
    link/ether 00:1e:68:57:4b:cc brd ff:ff:ff:ff:ff:ff
10: eth0.36@eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP
    link/ether 00:1e:68:57:4b:cc brd ff:ff:ff:ff:ff:ff
11: eth0.38@eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP
    link/ether 00:1e:68:57:4b:cc brd ff:ff:ff:ff:ff:ff
12: eth0.39@eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP
    link/ether 00:1e:68:57:4b:cc brd ff:ff:ff:ff:ff:ff
13: eth0.40@eth0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN
    link/ether 00:1e:68:57:4b:cc brd ff:ff:ff:ff:ff:ff
14: eth0.41@eth0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN
    link/ether 00:1e:68:57:4b:cc brd ff:ff:ff:ff:ff:ff
15: eth0.165@eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP
    link/ether 00:1e:68:57:4b:cc brd ff:ff:ff:ff:ff:ff
16: eth1.2@eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP
    link/ether 00:1c:c0:19:23:80 brd ff:ff:ff:ff:ff:ff
17: eth1.3@eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP
    link/ether 00:1e:68:57:4b:cd brd ff:ff:ff:ff:ff:ff
18: eth1.4@eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP
    link/ether 00:1e:68:57:4b:cd brd ff:ff:ff:ff:ff:ff
19: ifb0: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN qlen 32
    link/ether 6a:73:52:ed:4a:d6 brd ff:ff:ff:ff:ff:ff
20: ifb1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN qlen 32
    link/ether c2:97:7a:32:8b:59 brd ff:ff:ff:ff:ff:ff
21: ifb2: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN qlen 32
    link/ether 96:ac:dd:0d:68:b0 brd ff:ff:ff:ff:ff:ff
22: ifb3: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN qlen 32
    link/ether fa:d9:59:c6:16:e7 brd ff:ff:ff:ff:ff:ff
23: tunx0: <POINTOPOINT,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN qlen 1024
    link/[65534]





> 
> Thanks
> 
> 

^ permalink raw reply

* Re: NULL pointer dereference panic in stable (2.6.33.2), amd64
From: Eric Dumazet @ 2010-04-11 23:11 UTC (permalink / raw)
  To: Denys Fedorysychenko; +Cc: netdev, Krishna Kumar, David Miller
In-Reply-To: <201004120204.11405.nuclearcat@nuclearcat.com>

On Monday 12 April 2010 at 02:04 +0300, Denys Fedorysychenko wrote:
> > It is IMHO not safe, because the route for this socket might have just
> > changed and we are transmitting an old packet (queued some milliseconds
> > before, when the route was different).
> > 
> > We then memorize a queue_index that might be too big for the new device
> > of the newly selected route.
> > 
> > The next packet we want to transmit will take the cached value of
> > queue_index, correct for the old device, maybe not correct for the new device.
> Yes, it is possible. I have RIP running there with 1k+ routes, changing non-stop.
> 
> > 
> > You could try to revert commit a4ee3ce3293dc931fab19beb472a8bde1295aebe
> > 
> > commit a4ee3ce3293dc931fab19beb472a8bde1295aebe
> I will try to revert it now. But it will probably take 1-2 days to trigger the bug.

I can try to reproduce the bug here, with a multiqueue device, a
non-multiqueue device, and changing routes while transmitting bulk
packets, and of course some traffic shaping to slow down xmits.

Could you send:

ifconfig -a
lspci

Thanks



^ permalink raw reply

* Re: NULL pointer dereference panic in stable (2.6.33.2), amd64
From: Denys Fedorysychenko @ 2010-04-11 23:04 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, Krishna Kumar, David Miller
In-Reply-To: <1271025353.2078.155.camel@edumazet-laptop>

> It is IMHO not safe, because the route for this socket might have just
> changed and we are transmitting an old packet (queued some milliseconds
> before, when the route was different).
> 
> We then memorize a queue_index that might be too big for the new device
> of the newly selected route.
> 
> The next packet we want to transmit will take the cached value of
> queue_index, correct for the old device, maybe not correct for the new device.
Yes, it is possible. I have RIP running there with 1k+ routes, changing non-stop.

> 
> You could try to revert commit a4ee3ce3293dc931fab19beb472a8bde1295aebe
> 
> commit a4ee3ce3293dc931fab19beb472a8bde1295aebe
I will try to revert it now. But it will probably take 1-2 days to trigger the bug.
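
To make the suspected hazard concrete, here is a minimal userspace sketch
of a cached queue index going stale when the route (and so the output
device) changes. It is not the kernel code: struct fake_dev,
struct fake_sock and the three helpers are hypothetical names made up for
illustration, and the fallback to queue 0 is only one possible policy.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-ins; not the kernel's real structures. */
struct fake_dev {
    unsigned int real_num_tx_queues;   /* number of usable TX queues */
};

struct fake_sock {
    int cached_queue;                  /* -1 means nothing cached yet */
};

/* Remember the queue that was used for the last transmitted packet. */
static void remember_queue(struct fake_sock *sk, unsigned int queue)
{
    sk->cached_queue = (int)queue;
}

/* True if the cached index is usable on this particular device. */
static bool cached_queue_valid(const struct fake_sock *sk,
                               const struct fake_dev *dev)
{
    return sk->cached_queue >= 0 &&
           (unsigned int)sk->cached_queue < dev->real_num_tx_queues;
}

/* Pick a queue, falling back to queue 0 when the cached index is stale. */
static unsigned int pick_queue(const struct fake_sock *sk,
                               const struct fake_dev *dev)
{
    if (cached_queue_valid(sk, dev))
        return (unsigned int)sk->cached_queue;
    return 0;
}
```

With an 8-queue device, caching index 5 is fine. If the next packet goes
out through a 1-queue device, the same cached value is out of range, and
using it unchecked would index past that device's queue array.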


^ permalink raw reply

* Re: NULL pointer dereference panic in stable (2.6.33.2), amd64
From: Eric Dumazet @ 2010-04-11 22:35 UTC (permalink / raw)
  To: Denys Fedorysychenko; +Cc: netdev, Krishna Kumar, David Miller
In-Reply-To: <201004112338.47019.nuclearcat@nuclearcat.com>

On Sunday 11 April 2010 at 23:38 +0300, Denys Fedorysychenko wrote:
> Hi
> 
> I recently got a NULL pointer dereference.
> Note: I don't seem to have experienced this issue on 32-bit.
> 
> Router with NAT, HFSC shapers terminated with bfifo qdiscs (some attached to 
> 802.1Q VLANs), userspace application using tun.
> Hardware: e1000e network cards, Core 2 based architecture (two quad-core Xeons).
> 
> The full message received over netconsole is attached.
> I can provide more info on request.
> plain-text document attachment (BUG)
> Apr 11 23:28:52 ROUTER kernel: [21095.453138] BUG: unable to handle kernel NULL pointer dereference at (null)
> Apr 11 23:28:52 ROUTER kernel: [21095.453334] IP: [<ffffffff811e5877>] dev_queue_xmit+0x284/0x465
> Apr 11 23:28:52 ROUTER kernel: [21095.453522] PGD 20b247067 PUD 20b24c067 PMD 0 
> Apr 11 23:28:52 ROUTER kernel: [21095.453704] Oops: 0000 [#1] SMP 
> Apr 11 23:28:52 ROUTER kernel: [21095.453880] last sysfs file: /sys/devices/virtual/misc/tun/dev
> Apr 11 23:28:52 ROUTER kernel: [21095.454001] CPU 0 
> Apr 11 23:28:52 ROUTER kernel: [21095.454001] Pid: 2864, comm: globax Not tainted 2.6.33.2-build-0051-64 #4         /        
> Apr 11 23:28:52 ROUTER kernel: [21095.454001] RIP: 0010:[<ffffffff811e5877>]  [<ffffffff811e5877>] dev_queue_xmit+0x284/0x465
> Apr 11 23:28:52 ROUTER kernel: [21095.454001] RSP: 0000:ffff880028203db0  EFLAGS: 00010202
> Apr 11 23:28:52 ROUTER kernel: [21095.454001] RAX: 0000000000002000 RBX: 0000000000000000 RCX: ffff8802087d8d00
> Apr 11 23:28:52 ROUTER kernel: [21095.454001] RDX: ffff88021d818000 RSI: 0000000000000000 RDI: ffff8802135850e8
> Apr 11 23:28:52 ROUTER kernel: [21095.454001] RBP: ffff880028203de0 R08: ffff88021c01d89c R09: ffff88021c01dc00
> Apr 11 23:28:52 ROUTER kernel: [21095.454001] R10: dead000000200200 R11: dead000000100100 R12: ffff88021e212b80
> Apr 11 23:28:52 ROUTER kernel: [21095.454001] R13: ffff88021da51e80 R14: ffff8802135850e8 R15: ffff88021be62000
> Apr 11 23:28:52 ROUTER kernel: [21095.454001] FS:  0000000000000000(0000) GS:ffff880028200000(0063) knlGS:00000000f7e51b10
> Apr 11 23:28:52 ROUTER kernel: [21095.454001] CS:  0010 DS: 002b ES: 002b CR0: 0000000080050033
> Apr 11 23:28:52 ROUTER kernel: [21095.454001] CR2: 0000000000000000 CR3: 000000020b248000 CR4: 00000000000006f0
> Apr 11 23:28:52 ROUTER kernel: [21095.454001] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> Apr 11 23:28:52 ROUTER kernel: [21095.454001] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Apr 11 23:28:52 ROUTER kernel: [21095.454001] Process globax (pid: 2864, threadinfo ffff8802118c0000, task ffff8802182c2c20)
> Apr 11 23:28:52 ROUTER kernel: [21095.454001] Stack:
> Apr 11 23:28:52 ROUTER kernel: [21095.454001]  ffff88021d818000 ffff88021da51e80 0000000000000042 ffff88021da51e80
> Apr 11 23:28:52 ROUTER kernel: [21095.454001] <0> ffff88021be62000 ffff88021be62000 ffff880028203e00 ffffffffa01c12a9
> Apr 11 23:28:52 ROUTER kernel: [21095.454001] <0> 0000000000000000 ffff8802135850e8 ffff880028203e50 ffffffff811e540e
> Apr 11 23:28:52 ROUTER kernel: [21095.454001] Call Trace:
> Apr 11 23:28:52 ROUTER kernel: [21095.454001]  <IRQ> 
> Apr 11 23:28:52 ROUTER kernel: [21095.454001]  [<ffffffffa01c12a9>] vlan_dev_hwaccel_hard_start_xmit+0x68/0x86 [8021q]
> Apr 11 23:28:52 ROUTER kernel: [21095.454001]  [<ffffffff811e540e>] dev_hard_start_xmit+0x232/0x304
> Apr 11 23:28:52 ROUTER kernel: [21095.454001]  [<ffffffff811f6482>] sch_direct_xmit+0x5d/0x16b
> Apr 11 23:28:52 ROUTER kernel: [21095.454001]  [<ffffffff811f664c>] __qdisc_run+0xbc/0xdc
> Apr 11 23:28:52 ROUTER kernel: [21095.454001]  [<ffffffff811e2c89>] net_tx_action+0xc2/0x120
> Apr 11 23:28:52 ROUTER kernel: [21095.454001]  [<ffffffff81039670>] __do_softirq+0x96/0x11a
> Apr 11 23:28:52 ROUTER kernel: [21095.454001]  [<ffffffff810037cc>] call_softirq+0x1c/0x28
> Apr 11 23:28:52 ROUTER kernel: [21095.454001]  [<ffffffff81005543>] do_softirq+0x33/0x68
> Apr 11 23:28:52 ROUTER kernel: [21095.454001]  [<ffffffff81039407>] irq_exit+0x36/0x75
> Apr 11 23:28:52 ROUTER kernel: [21095.454001]  [<ffffffff81016f63>] smp_apic_timer_interrupt+0x88/0x96
> Apr 11 23:28:52 ROUTER kernel: [21095.454001]  [<ffffffff81003293>] apic_timer_interrupt+0x13/0x20
> Apr 11 23:28:52 ROUTER kernel: [21095.454001]  <EOI>
> Apr 11 23:28:52 ROUTER kernel: [21095.454001] Code: e2 

> 48 8b 55 d0 // mov    -0x30(%rbp),%rdx

> 49 c1 e4 07 // shl    $0x7,%r12 

>  66 41 8b 86 a6 00 00 00 // mov 0xa6(%r14),%ax

> 4c 03 a2 00 03 00 00 // add 0x0300(%rdx),%r12

> 80 e4 cf  // and    $0xcf,%ah 

> 80 cc 20 // or     $0x20,%ah

> 49 8b 5c 24 08 // mov    0x8(%r12),%rbx   rcu_dereference(txq->qdisc);

> 66 41 89 86 a6 00 00 00 // mov    %ax,0xa6(%r14) 

> <48> 83 3b 00 // cmpq   $0x0,(%rbx)

> 0f 84 bb 00 00 00 // je ...

>  4c 8d ab 9c 00 00 00 4c 89 ef e8
> Apr 11 23:28:52 ROUTER kernel: [21095.454001] RIP  [<ffffffff811e5877>] dev_queue_xmit+0x284/0x465
> Apr 11 23:28:52 ROUTER kernel: [21095.454001]  RSP <ffff880028203db0>
> Apr 11 23:28:52 ROUTER kernel: [21095.454001] CR2: 0000000000000000
> Apr 11 23:28:52 ROUTER kernel: [21095.462975] ---[ end trace 2421d995c1afd7c3 ]---
> Apr 11 23:28:52 ROUTER kernel: [21095.463199] Kernel panic - not syncing: Fatal exception in interrupt
> Apr 11 23:28:52 ROUTER kernel: [21095.463428] Pid: 2864, comm: globax Tainted: G      D    2.6.33.2-build-0051-64 #4
> Apr 11 23:28:52 ROUTER kernel: [21095.463819] Call Trace:
> Apr 11 23:28:52 ROUTER kernel: [21095.464036]  <IRQ>  [<ffffffff81259743>] panic+0xa0/0x161
> Apr 11 23:28:52 ROUTER kernel: [21095.464307]  [<ffffffff81003293>] ? apic_timer_interrupt+0x13/0x20
> Apr 11 23:28:52 ROUTER kernel: [21095.464533]  [<ffffffff81035673>] ? kmsg_dump+0x112/0x12c
> Apr 11 23:28:52 ROUTER kernel: [21095.464757]  [<ffffffff81006651>] oops_end+0xaa/0xba
> Apr 11 23:28:52 ROUTER kernel: [21095.464980]  [<ffffffff8101e653>] no_context+0x1f3/0x202
> Apr 11 23:28:52 ROUTER kernel: [21095.465211]  [<ffffffff810523a4>] ? tick_program_event+0x25/0x27
> Apr 11 23:28:52 ROUTER kernel: [21095.465437]  [<ffffffff8101e81c>] __bad_area_nosemaphore+0x1ba/0x1e0
> Apr 11 23:28:52 ROUTER kernel: [21095.465668]  [<ffffffff8113f8b3>] ? swiotlb_map_page+0x0/0xd5
> Apr 11 23:28:52 ROUTER kernel: [21095.465907]  [<ffffffffa015c55a>] ? pci_map_single+0x8a/0x99 [e1000e]
> Apr 11 23:28:52 ROUTER kernel: [21095.466139]  [<ffffffff8113f0c0>] ? swiotlb_dma_mapping_error+0x18/0x25
> Apr 11 23:28:52 ROUTER kernel: [21095.466372]  [<ffffffffa015a2e0>] ? pci_dma_mapping_error+0x31/0x3d [e1000e]
> Apr 11 23:28:52 ROUTER kernel: [21095.466605]  [<ffffffffa015cc37>] ? e1000_xmit_frame+0x6ce/0xa43 [e1000e]
> Apr 11 23:28:52 ROUTER kernel: [21095.466833]  [<ffffffff8101e850>] bad_area_nosemaphore+0xe/0x10
> Apr 11 23:28:52 ROUTER kernel: [21095.467064]  [<ffffffff8101eb32>] do_page_fault+0x114/0x24a
> Apr 11 23:28:52 ROUTER kernel: [21095.467293]  [<ffffffff8125bc9f>] page_fault+0x1f/0x30
> Apr 11 23:28:52 ROUTER kernel: [21095.467516]  [<ffffffff811e5877>] ? dev_queue_xmit+0x284/0x465
> Apr 11 23:28:52 ROUTER kernel: [21095.467743]  [<ffffffffa01c12a9>] vlan_dev_hwaccel_hard_start_xmit+0x68/0x86 [8021q]
> Apr 11 23:28:52 ROUTER kernel: [21095.468139]  [<ffffffff811e540e>] dev_hard_start_xmit+0x232/0x304
> Apr 11 23:28:52 ROUTER kernel: [21095.468366]  [<ffffffff811f6482>] sch_direct_xmit+0x5d/0x16b
> Apr 11 23:28:52 ROUTER kernel: [21095.468591]  [<ffffffff811f664c>] __qdisc_run+0xbc/0xdc
> Apr 11 23:28:52 ROUTER kernel: [21095.468814]  [<ffffffff811e2c89>] net_tx_action+0xc2/0x120
> Apr 11 23:28:52 ROUTER kernel: [21095.469042]  [<ffffffff81039670>] __do_softirq+0x96/0x11a
> Apr 11 23:28:52 ROUTER kernel: [21095.469267]  [<ffffffff810037cc>] call_softirq+0x1c/0x28
> Apr 11 23:28:52 ROUTER kernel: [21095.469491]  [<ffffffff81005543>] do_softirq+0x33/0x68
> Apr 11 23:28:52 ROUTER kernel: [21095.469714]  [<ffffffff81039407>] irq_exit+0x36/0x75
> Apr 11 23:28:52 ROUTER kernel: [21095.469936]  [<ffffffff81016f63>] smp_apic_timer_interrupt+0x88/0x96
> Apr 11 23:28:52 ROUTER kernel: [21095.470167]  [<ffffffff81003293>] apic_timer_interrupt+0x13/0x20

Hi Denys !

txq->qdisc is NULL at this point. I don't think that is even possible, so my
random guess is that dev_pick_tx() got a queue_index >=
dev->real_num_tx_queues


        txq = dev_pick_tx(dev, skb);
        q = rcu_dereference(txq->qdisc);  // q = NULL

#ifdef CONFIG_NET_CLS_ACT
        skb->tc_verd = SET_TC_AT(skb->tc_verd, AT_EGRESS);
#endif
        if (q->enqueue) {  // crash in dereference
                rc = __dev_xmit_skb(skb, q, dev, txq);
                goto out;
        }


I believe the following lines from dev_pick_tx() are the problem:

	if (sk && sk->sk_dst_cache) 
		sk_tx_queue_set(sk, queue_index);

It is IMHO not safe, because the route for this socket might have just
changed and we are transmitting an old packet (queued some milliseconds
before, when the route was different).

We then memorize a queue_index that might be too big for the new device
of the newly selected route.

The next packet we want to transmit will take the cached value of
queue_index, correct for the old device, maybe not correct for the new device.
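
To make the hazard concrete, here is a minimal userspace sketch of one possible guard: validate the cached index against the current device before using it. This is not the kernel code and not necessarily the fix that should be applied; the model_* types and model_pick_tx() are hypothetical stand-ins invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct model_txq {
	void *qdisc;	/* NULL models an unused or uninitialized slot */
};

struct model_dev {
	unsigned int real_num_tx_queues;	/* queues actually in use */
	struct model_txq *tx;			/* array of TX queues */
};

struct model_sock {
	int cached_queue;	/* -1 means "nothing recorded" */
};

/*
 * Return a queue index that is valid for 'dev'. A cached index is
 * used only if it fits the current device; otherwise fall back to
 * queue 0 and drop the stale cache entry.
 */
static unsigned int model_pick_tx(const struct model_dev *dev,
				  struct model_sock *sk)
{
	assert(dev->real_num_tx_queues > 0);

	if (sk && sk->cached_queue >= 0) {
		unsigned int idx = (unsigned int)sk->cached_queue;

		if (idx < dev->real_num_tx_queues)
			return idx;
		sk->cached_queue = -1;	/* recorded for another device */
	}
	return 0;
}
```

With a socket that cached index 1 on a two-queue device, calling model_pick_tx() for a one-queue device returns 0 and clears the cache instead of indexing past the end of the queue array.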

You could try to revert commit a4ee3ce3293dc931fab19beb472a8bde1295aebe

commit a4ee3ce3293dc931fab19beb472a8bde1295aebe
Author: Krishna Kumar <krkumar2@in.ibm.com>
Date:   Mon Oct 19 23:50:07 2009 +0000

    net: Use sk_tx_queue_mapping for connected sockets
    
    For connected sockets, the first run of dev_pick_tx saves the
    calculated txq in sk_tx_queue_mapping. This is not saved if
    either the device has a queue select or the socket is not
    connected. Next iterations of dev_pick_tx uses the cached value
    of sk_tx_queue_mapping.
    
    Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

diff --git a/net/core/dev.c b/net/core/dev.c
index 28b0b9e..fa88dcd 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1791,13 +1791,25 @@ EXPORT_SYMBOL(skb_tx_hash);
 static struct netdev_queue *dev_pick_tx(struct net_device *dev,
 					struct sk_buff *skb)
 {
-	const struct net_device_ops *ops = dev->netdev_ops;
-	u16 queue_index = 0;
+	u16 queue_index;
+	struct sock *sk = skb->sk;
+
+	if (sk_tx_queue_recorded(sk)) {
+		queue_index = sk_tx_queue_get(sk);
+	} else {
+		const struct net_device_ops *ops = dev->netdev_ops;
 
-	if (ops->ndo_select_queue)
-		queue_index = ops->ndo_select_queue(dev, skb);
-	else if (dev->real_num_tx_queues > 1)
-		queue_index = skb_tx_hash(dev, skb);
+		if (ops->ndo_select_queue) {
+			queue_index = ops->ndo_select_queue(dev, skb);
+		} else {
+			queue_index = 0;
+			if (dev->real_num_tx_queues > 1)
+				queue_index = skb_tx_hash(dev, skb);
+
+			if (sk && sk->sk_dst_cache)
+				sk_tx_queue_set(sk, queue_index);
+		}
+	}
 
 	skb_set_queue_mapping(skb, queue_index);
 	return netdev_get_tx_queue(dev, queue_index);



^ permalink raw reply related

* Re: [PATCH 1/3] tcp: Handle CHECKSUM_PARTIAL for SYNACK packets for IPv4
From: David Miller @ 2010-04-11 22:00 UTC (permalink / raw)
  To: yinghai.lu; +Cc: herbert, linux-kernel, netdev, torvalds
In-Reply-To: <4BC23A1C.9020607@oracle.com>

From: Yinghai <yinghai.lu@oracle.com>
Date: Sun, 11 Apr 2010 14:07:40 -0700

> On 04/11/2010 05:15 AM, Herbert Xu wrote:
>> tcp: Handle CHECKSUM_PARTIAL for SYNACK packets for IPv4
>> 
>> This patch moves the common code between tcp_v4_send_check and
>> tcp_v4_gso_send_check into a new function __tcp_v4_send_check.
>> 
>> It then uses the new function in tcp_v4_send_synack so that it
>> handles CHECKSUM_PARTIAL properly.
>> 
> Good, three patches fix the problem.

Thanks for testing.

^ permalink raw reply

* Re: ssh server etc doesn't work anymore with net-2.6
From: David Miller @ 2010-04-11 21:43 UTC (permalink / raw)
  To: herbert; +Cc: yinghai, linux-kernel, netdev, torvalds
In-Reply-To: <20100411120403.GA20565@gondor.apana.org.au>

From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Sun, 11 Apr 2010 20:04:03 +0800

> Herbert Xu <herbert@gondor.apana.org.au> wrote:
>> 
>> After looking at the actual net-2.6 tree I see that it is actually
>> CHECKSUM_PARTIAL that caused this breakage.
>> 
>> The fact that when this was first implemented we didn't use hw
>> checksums on dataless packets might not have been an oversight
>> after all.
> 
> Looks like I was too quick to blame the hardware, the synack
> code can't handle CHECKSUM_PARTIAL so this is probably the real
> cause.
> 
> I will send patches to fix this.

Thanks a lot for figuring this out.

I'll put my original patch (with fixed commit log message!)
and your fixes into net-next-2.6

^ permalink raw reply

* Re: IGB handling of zero length checksumming?
From: David Miller @ 2010-04-11 21:42 UTC (permalink / raw)
  To: netdev
  Cc: jeffrey.t.kirsher, jesse.brandeburg, bruce.w.allan,
	alexander.h.duyck, peter.p.waskiewicz.jr, john.ronciak
In-Reply-To: <20100411.024027.120459168.davem@davemloft.net>

From: David Miller <davem@davemloft.net>
Date: Sun, 11 Apr 2010 02:40:27 -0700 (PDT)

> 
> If the IGB is given a "skb->ip_summed == CHECKSUM_PARTIAL" packet and
> the data area past the TCP header is of zero length, will it do the
> right thing?

Hey guys you don't have to worry about this.

The problem turned out to be somewhere else. :-)


^ permalink raw reply

* Re: [PATCH 1/3] tcp: Handle CHECKSUM_PARTIAL for SYNACK packets for IPv4
From: Yinghai @ 2010-04-11 21:07 UTC (permalink / raw)
  To: Herbert Xu, davem; +Cc: linux-kernel, netdev, torvalds
In-Reply-To: <E1O0w4v-0005UZ-GF@gondolin.me.apana.org.au>

On 04/11/2010 05:15 AM, Herbert Xu wrote:
> tcp: Handle CHECKSUM_PARTIAL for SYNACK packets for IPv4
> 
> This patch moves the common code between tcp_v4_send_check and
> tcp_v4_gso_send_check into a new function __tcp_v4_send_check.
> 
> It then uses the new function in tcp_v4_send_synack so that it
> handles CHECKSUM_PARTIAL properly.
> 
Good, three patches fix the problem.

Thanks

Yinghai

^ permalink raw reply

* NULL pointer dereference panic in stable (2.6.33.2), amd64
From: Denys Fedorysychenko @ 2010-04-11 20:38 UTC (permalink / raw)
  To: netdev

[-- Attachment #1: Type: Text/Plain, Size: 386 bytes --]

Hi

I recently got a NULL pointer dereference.
Note: it seems I did not experience this issue on 32-bit.

Router with NAT, HFSC shapers terminated with bfifo qdiscs (some attached to
802.1Q VLANs), and a userspace application using tun.
Hardware: e1000e network cards, Core 2 based architecture (two quad-core Xeons).

The full message received over netconsole is attached.
I can provide more info on request.

[-- Attachment #2: BUG --]
[-- Type: text/plain, Size: 7186 bytes --]

Apr 11 23:28:52 ROUTER kernel: [21095.453138] BUG: unable to handle kernel NULL pointer dereference at (null)
Apr 11 23:28:52 ROUTER kernel: [21095.453334] IP: [<ffffffff811e5877>] dev_queue_xmit+0x284/0x465
Apr 11 23:28:52 ROUTER kernel: [21095.453522] PGD 20b247067 PUD 20b24c067 PMD 0 
Apr 11 23:28:52 ROUTER kernel: [21095.453704] Oops: 0000 [#1] SMP 
Apr 11 23:28:52 ROUTER kernel: [21095.453880] last sysfs file: /sys/devices/virtual/misc/tun/dev
Apr 11 23:28:52 ROUTER kernel: [21095.454001] CPU 0 
Apr 11 23:28:52 ROUTER kernel: [21095.454001] Pid: 2864, comm: globax Not tainted 2.6.33.2-build-0051-64 #4         /        
Apr 11 23:28:52 ROUTER kernel: [21095.454001] RIP: 0010:[<ffffffff811e5877>]  [<ffffffff811e5877>] dev_queue_xmit+0x284/0x465
Apr 11 23:28:52 ROUTER kernel: [21095.454001] RSP: 0000:ffff880028203db0  EFLAGS: 00010202
Apr 11 23:28:52 ROUTER kernel: [21095.454001] RAX: 0000000000002000 RBX: 0000000000000000 RCX: ffff8802087d8d00
Apr 11 23:28:52 ROUTER kernel: [21095.454001] RDX: ffff88021d818000 RSI: 0000000000000000 RDI: ffff8802135850e8
Apr 11 23:28:52 ROUTER kernel: [21095.454001] RBP: ffff880028203de0 R08: ffff88021c01d89c R09: ffff88021c01dc00
Apr 11 23:28:52 ROUTER kernel: [21095.454001] R10: dead000000200200 R11: dead000000100100 R12: ffff88021e212b80
Apr 11 23:28:52 ROUTER kernel: [21095.454001] R13: ffff88021da51e80 R14: ffff8802135850e8 R15: ffff88021be62000
Apr 11 23:28:52 ROUTER kernel: [21095.454001] FS:  0000000000000000(0000) GS:ffff880028200000(0063) knlGS:00000000f7e51b10
Apr 11 23:28:52 ROUTER kernel: [21095.454001] CS:  0010 DS: 002b ES: 002b CR0: 0000000080050033
Apr 11 23:28:52 ROUTER kernel: [21095.454001] CR2: 0000000000000000 CR3: 000000020b248000 CR4: 00000000000006f0
Apr 11 23:28:52 ROUTER kernel: [21095.454001] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Apr 11 23:28:52 ROUTER kernel: [21095.454001] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Apr 11 23:28:52 ROUTER kernel: [21095.454001] Process globax (pid: 2864, threadinfo ffff8802118c0000, task ffff8802182c2c20)
Apr 11 23:28:52 ROUTER kernel: [21095.454001] Stack:
Apr 11 23:28:52 ROUTER kernel: [21095.454001]  ffff88021d818000 ffff88021da51e80 0000000000000042 ffff88021da51e80
Apr 11 23:28:52 ROUTER kernel: [21095.454001] <0> ffff88021be62000 ffff88021be62000 ffff880028203e00 ffffffffa01c12a9
Apr 11 23:28:52 ROUTER kernel: [21095.454001] <0> 0000000000000000 ffff8802135850e8 ffff880028203e50 ffffffff811e540e
Apr 11 23:28:52 ROUTER kernel: [21095.454001] Call Trace:
Apr 11 23:28:52 ROUTER kernel: [21095.454001]  <IRQ> 
Apr 11 23:28:52 ROUTER kernel: [21095.454001]  [<ffffffffa01c12a9>] vlan_dev_hwaccel_hard_start_xmit+0x68/0x86 [8021q]
Apr 11 23:28:52 ROUTER kernel: [21095.454001]  [<ffffffff811e540e>] dev_hard_start_xmit+0x232/0x304
Apr 11 23:28:52 ROUTER kernel: [21095.454001]  [<ffffffff811f6482>] sch_direct_xmit+0x5d/0x16b
Apr 11 23:28:52 ROUTER kernel: [21095.454001]  [<ffffffff811f664c>] __qdisc_run+0xbc/0xdc
Apr 11 23:28:52 ROUTER kernel: [21095.454001]  [<ffffffff811e2c89>] net_tx_action+0xc2/0x120
Apr 11 23:28:52 ROUTER kernel: [21095.454001]  [<ffffffff81039670>] __do_softirq+0x96/0x11a
Apr 11 23:28:52 ROUTER kernel: [21095.454001]  [<ffffffff810037cc>] call_softirq+0x1c/0x28
Apr 11 23:28:52 ROUTER kernel: [21095.454001]  [<ffffffff81005543>] do_softirq+0x33/0x68
Apr 11 23:28:52 ROUTER kernel: [21095.454001]  [<ffffffff81039407>] irq_exit+0x36/0x75
Apr 11 23:28:52 ROUTER kernel: [21095.454001]  [<ffffffff81016f63>] smp_apic_timer_interrupt+0x88/0x96
Apr 11 23:28:52 ROUTER kernel: [21095.454001]  [<ffffffff81003293>] apic_timer_interrupt+0x13/0x20
Apr 11 23:28:52 ROUTER kernel: [21095.454001]  <EOI>
Apr 11 23:28:52 ROUTER kernel: [21095.454001] Code: e2 48 8b 55 d0 49 c1 e4 07 66 41 8b 86 a6 00 00 00 4c 03 a2 00 03 00 00 80 e4 cf 80 cc 20 49 8b 5c 24 08 66 41 89 86 a6 00 00 00 <48> 83 3b 00 0f 84 bb 00 00 00 4c 8d ab 9c 00 00 00 4c 89 ef e8
Apr 11 23:28:52 ROUTER kernel: [21095.454001] RIP  [<ffffffff811e5877>] dev_queue_xmit+0x284/0x465
Apr 11 23:28:52 ROUTER kernel: [21095.454001]  RSP <ffff880028203db0>
Apr 11 23:28:52 ROUTER kernel: [21095.454001] CR2: 0000000000000000
Apr 11 23:28:52 ROUTER kernel: [21095.462975] ---[ end trace 2421d995c1afd7c3 ]---
Apr 11 23:28:52 ROUTER kernel: [21095.463199] Kernel panic - not syncing: Fatal exception in interrupt
Apr 11 23:28:52 ROUTER kernel: [21095.463428] Pid: 2864, comm: globax Tainted: G      D    2.6.33.2-build-0051-64 #4
Apr 11 23:28:52 ROUTER kernel: [21095.463819] Call Trace:
Apr 11 23:28:52 ROUTER kernel: [21095.464036]  <IRQ>  [<ffffffff81259743>] panic+0xa0/0x161
Apr 11 23:28:52 ROUTER kernel: [21095.464307]  [<ffffffff81003293>] ? apic_timer_interrupt+0x13/0x20
Apr 11 23:28:52 ROUTER kernel: [21095.464533]  [<ffffffff81035673>] ? kmsg_dump+0x112/0x12c
Apr 11 23:28:52 ROUTER kernel: [21095.464757]  [<ffffffff81006651>] oops_end+0xaa/0xba
Apr 11 23:28:52 ROUTER kernel: [21095.464980]  [<ffffffff8101e653>] no_context+0x1f3/0x202
Apr 11 23:28:52 ROUTER kernel: [21095.465211]  [<ffffffff810523a4>] ? tick_program_event+0x25/0x27
Apr 11 23:28:52 ROUTER kernel: [21095.465437]  [<ffffffff8101e81c>] __bad_area_nosemaphore+0x1ba/0x1e0
Apr 11 23:28:52 ROUTER kernel: [21095.465668]  [<ffffffff8113f8b3>] ? swiotlb_map_page+0x0/0xd5
Apr 11 23:28:52 ROUTER kernel: [21095.465907]  [<ffffffffa015c55a>] ? pci_map_single+0x8a/0x99 [e1000e]
Apr 11 23:28:52 ROUTER kernel: [21095.466139]  [<ffffffff8113f0c0>] ? swiotlb_dma_mapping_error+0x18/0x25
Apr 11 23:28:52 ROUTER kernel: [21095.466372]  [<ffffffffa015a2e0>] ? pci_dma_mapping_error+0x31/0x3d [e1000e]
Apr 11 23:28:52 ROUTER kernel: [21095.466605]  [<ffffffffa015cc37>] ? e1000_xmit_frame+0x6ce/0xa43 [e1000e]
Apr 11 23:28:52 ROUTER kernel: [21095.466833]  [<ffffffff8101e850>] bad_area_nosemaphore+0xe/0x10
Apr 11 23:28:52 ROUTER kernel: [21095.467064]  [<ffffffff8101eb32>] do_page_fault+0x114/0x24a
Apr 11 23:28:52 ROUTER kernel: [21095.467293]  [<ffffffff8125bc9f>] page_fault+0x1f/0x30
Apr 11 23:28:52 ROUTER kernel: [21095.467516]  [<ffffffff811e5877>] ? dev_queue_xmit+0x284/0x465
Apr 11 23:28:52 ROUTER kernel: [21095.467743]  [<ffffffffa01c12a9>] vlan_dev_hwaccel_hard_start_xmit+0x68/0x86 [8021q]
Apr 11 23:28:52 ROUTER kernel: [21095.468139]  [<ffffffff811e540e>] dev_hard_start_xmit+0x232/0x304
Apr 11 23:28:52 ROUTER kernel: [21095.468366]  [<ffffffff811f6482>] sch_direct_xmit+0x5d/0x16b
Apr 11 23:28:52 ROUTER kernel: [21095.468591]  [<ffffffff811f664c>] __qdisc_run+0xbc/0xdc
Apr 11 23:28:52 ROUTER kernel: [21095.468814]  [<ffffffff811e2c89>] net_tx_action+0xc2/0x120
Apr 11 23:28:52 ROUTER kernel: [21095.469042]  [<ffffffff81039670>] __do_softirq+0x96/0x11a
Apr 11 23:28:52 ROUTER kernel: [21095.469267]  [<ffffffff810037cc>] call_softirq+0x1c/0x28
Apr 11 23:28:52 ROUTER kernel: [21095.469491]  [<ffffffff81005543>] do_softirq+0x33/0x68
Apr 11 23:28:52 ROUTER kernel: [21095.469714]  [<ffffffff81039407>] irq_exit+0x36/0x75
Apr 11 23:28:52 ROUTER kernel: [21095.469936]  [<ffffffff81016f63>] smp_apic_timer_interrupt+0x88/0x96
Apr 11 23:28:52 ROUTER kernel: [21095.470167]  [<ffffffff81003293>] apic_timer_interrupt+0x13/0x20

^ permalink raw reply

* Re: Linux 2.6.34-rc3 + CAN build problem
From: David Miller @ 2010-04-11 20:19 UTC (permalink / raw)
  To: socketcan
  Cc: nm127, eric.dumazet, oliver.hartkopp, urs.thuermann,
	socketcan-core, netdev, linux-kernel
In-Reply-To: <4BC21AC4.8060403@hartkopp.net>

From: Oliver Hartkopp <socketcan@hartkopp.net>
Date: Sun, 11 Apr 2010 20:53:56 +0200

> By applying the patch thankfully provided by Eric?

I'll probably do that, yes.

^ permalink raw reply

* Re: [PATCH] tcp: add setsockopt to disable slow start after idle
From: David Miller @ 2010-04-11 20:18 UTC (permalink / raw)
  To: cristiklein; +Cc: linux-kernel, netdev
In-Reply-To: <4BC218B6.8050703@gmail.com>

From: Cristian KLEIN <cristiklein@gmail.com>
Date: Sun, 11 Apr 2010 20:45:10 +0200

> Without this patch, an application which needs this behaviour
> (i.e. not to slow start after idle) is forced to implement its own
> UDP-based protocol with all the congestion control, retransmission
> etc. Undue congestion might still occur.

Ask your system administrator to set the existing sysctl, because it
is a physical network attribute whether this behavior is safe or not.

And if it is safe, it is safe for all applications, there is no reason
for one application to ask for it and others to not.  If it's legal,
it helps all applications without exception.

Your attempts to tie this to NAGLE are complete nonsense.

NAGLE changes packetization behavior, whereas this feature controls in what
way we trust congestion control information we've probed for in the
past.
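
For readers who want to check the system-wide setting, a small sketch follows. It assumes the sysctl referred to above is net.ipv4.tcp_slow_start_after_idle, exposed under /proc/sys; the helper read_int_file() and the SSAI_PATH macro are illustrative names, not part of any kernel or libc API.

```c
#include <stdio.h>

/*
 * Read a single integer from a procfs-style file.
 * Returns 0 on success, -1 if the file cannot be opened or parsed.
 */
static int read_int_file(const char *path, int *value)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;
	ok = (fscanf(f, "%d", value) == 1);
	fclose(f);
	return ok ? 0 : -1;
}

/* Assumed path of the system-wide knob for slow start after idle. */
#define SSAI_PATH "/proc/sys/net/ipv4/tcp_slow_start_after_idle"
```

Calling read_int_file(SSAI_PATH, &v) should leave 1 in v when slow start after idle is enabled and 0 when the administrator has disabled it; changing the value requires root and affects every TCP connection on the host.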

^ permalink raw reply

* Re: Linux 2.6.34-rc3 + CAN build problem
From: Oliver Hartkopp @ 2010-04-11 18:53 UTC (permalink / raw)
  To: David Miller
  Cc: nm127-Y8qEzhMunLyT9ig0jae3mg, eric.dumazet-Re5JQEeQqe8AvxtiuMwx3w,
	urs.thuermann-l29pVbxQd1IUtdQbppsyvg,
	netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	socketcan-core-0fE9KPoRgkgATYTw5x5z8w,
	oliver.hartkopp-l29pVbxQd1IUtdQbppsyvg
In-Reply-To: <20100410.155009.66184583.davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>

David Miller wrote:
> From: Oliver Hartkopp <socketcan-fJ+pQTUTwRTk1uMJSBkQmQ@public.gmane.org>
> Date: Sat, 10 Apr 2010 14:36:48 +0200
> 
>> So i wonder why Nemeth trapped into this problem ... probably an include file
>> mix-up?
> 
> Do you have CONFIG_DEBUG_STRICT_USER_COPY_CHECKS enabled in your
> kernel config?
> 
> That's the only way you get an actual failure of a build when
> the user copy size can't be proven to be in range by the
> compiler, otherwise it just warns.

No, indeed I do not have it set.

# CONFIG_DEBUG_STRICT_USER_COPY_CHECKS is not set

What would be the best approach to fix this build failure then?

By applying the patch thankfully provided by Eric?

Regards,
Oliver

^ permalink raw reply

* Re: [PATCH] tcp: add setsockopt to disable slow start after idle
From: Cristian KLEIN @ 2010-04-11 18:45 UTC (permalink / raw)
  To: David Miller; +Cc: linux-kernel, netdev
In-Reply-To: <20100410.154709.201110995.davem@davemloft.net>

On 11/04/2010 00:47, David Miller wrote:
> From: Cristian KLEIN<cristiklein@gmail.com>
> Date: Sat, 10 Apr 2010 14:09:03 +0200
>
>> Could you please explain me why it is dangerous? To me it seems that
>> it's just like allowing applications to disable NAGLE or to choose a
>> congestion control algorithm.
>
> Because you can cause undue congestion to other people on the network
> because you are believing path information that has been outdated and
> has not been validated by sending data for a certain amount of time.

I consider your argument an important concern, but I'm not quite 
convinced this patch is so bad.

An application which does not need this behaviour will continue to slow 
start after idle by default.

Without this patch, an application which needs this behaviour (i.e. not 
to slow start after idle) is forced to implement its own UDP-based 
protocol with all the congestion control, retransmission etc. Undue 
congestion might still occur.


If you don't agree with the above two points, would you consider 
accepting a patch with an allow_user_fast_start_after_idle sysctl?

Cristi.

^ permalink raw reply

* Re: [PATCH] net_sched: make traffic control network namespace aware
From: Patrick McHardy @ 2010-04-11 17:48 UTC (permalink / raw)
  To: Tom Goff; +Cc: netdev
In-Reply-To: <4BA7B13C.7020304@trash.net>

Patrick McHardy wrote:
> Patrick McHardy wrote:
>> Tom Goff wrote:
>>> Mostly minor changes to add a net argument to various functions and
>>> remove initial network namespace checks.
>>>
>>> Make /proc/net/psched per network namespace.
>> Looks fine from a qdisc POV. One thing that appears to be missing
>> though is teql master netdev registration in other than the initial
>> namespace.
> 
> Actually we could take this opportunity and add rtnl_link support
> for teql device registration. I can look into this in a couple of
> days.

I tried to do that, but adding proper netns support is more
complicated than I expected. sch_teql registers a qdisc for
each master device using the name of the master. Currently
qdisc registrations are global, so this doesn't work with
network namespaces.

We could of course make them per netns, but that would require
duplicating all global registrations for each namespace. I'm
not convinced that it's worth doing this since it's only teql
that needs it and it doesn't seem to be very useful to use
teql in a virtual environment.


^ permalink raw reply

* [RFC PATCH 8/9] net: ipmr: move mroute data into seperate structure
From: kaber @ 2010-04-11 17:37 UTC (permalink / raw)
  To: netdev
In-Reply-To: <1271007435-20035-1-git-send-email-kaber@trash.net>

From: Patrick McHardy <kaber@trash.net>

Signed-off-by: Patrick McHardy <kaber@trash.net>
---
 include/net/netns/ipv4.h |   13 +--
 net/ipv4/ipmr.c          |  369 +++++++++++++++++++++++++---------------------
 2 files changed, 200 insertions(+), 182 deletions(-)

diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h
index 5d06429..72e762a 100644
--- a/include/net/netns/ipv4.h
+++ b/include/net/netns/ipv4.h
@@ -59,18 +59,7 @@ struct netns_ipv4 {
 	atomic_t rt_genid;
 
 #ifdef CONFIG_IP_MROUTE
-	struct sock		*mroute_sk;
-	struct timer_list	ipmr_expire_timer;
-	struct list_head	mfc_unres_queue;
-	struct list_head	*mfc_cache_array;
-	struct vif_device	*vif_table;
-	int			maxvif;
-	atomic_t		cache_resolve_queue_len;
-	int			mroute_do_assert;
-	int			mroute_do_pim;
-#if defined(CONFIG_IP_PIMSM_V1) || defined(CONFIG_IP_PIMSM_V2)
-	int			mroute_reg_vif_num;
-#endif
+	struct mr_table		*mrt;
 #endif
 };
 #endif
diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
index 6107790..b733a12 100644
--- a/net/ipv4/ipmr.c
+++ b/net/ipv4/ipmr.c
@@ -67,6 +67,21 @@
 #define CONFIG_IP_PIMSM	1
 #endif
 
+struct mr_table {
+	struct sock		*mroute_sk;
+	struct timer_list	ipmr_expire_timer;
+	struct list_head	mfc_unres_queue;
+	struct list_head	mfc_cache_array[MFC_LINES];
+	struct vif_device	vif_table[MAXVIFS];
+	int			maxvif;
+	atomic_t		cache_resolve_queue_len;
+	int			mroute_do_assert;
+	int			mroute_do_pim;
+#if defined(CONFIG_IP_PIMSM_V1) || defined(CONFIG_IP_PIMSM_V2)
+	int			mroute_reg_vif_num;
+#endif
+};
+
 /* Big lock, protecting vif table, mrt cache and mroute socket state.
    Note that the changes are semaphored via rtnl_lock.
  */
@@ -77,7 +92,7 @@ static DEFINE_RWLOCK(mrt_lock);
  *	Multicast router control variables
  */
 
-#define VIF_EXISTS(_net, _idx) ((_net)->ipv4.vif_table[_idx].dev != NULL)
+#define VIF_EXISTS(_mrt, _idx) ((_mrt)->vif_table[_idx].dev != NULL)
 
 /* Special spinlock for queue of unresolved entries */
 static DEFINE_SPINLOCK(mfc_unres_lock);
@@ -92,11 +107,12 @@ static DEFINE_SPINLOCK(mfc_unres_lock);
 
 static struct kmem_cache *mrt_cachep __read_mostly;
 
-static int ip_mr_forward(struct net *net, struct sk_buff *skb,
-			 struct mfc_cache *cache, int local);
-static int ipmr_cache_report(struct net *net,
+static int ip_mr_forward(struct net *net, struct mr_table *mrt,
+			 struct sk_buff *skb, struct mfc_cache *cache,
+			 int local);
+static int ipmr_cache_report(struct mr_table *mrt,
 			     struct sk_buff *pkt, vifi_t vifi, int assert);
-static int ipmr_fill_mroute(struct net *net, struct sk_buff *skb,
+static int ipmr_fill_mroute(struct mr_table *mrt, struct sk_buff *skb,
 			    struct mfc_cache *c, struct rtmsg *rtm);
 
 /* Service routines creating virtual interfaces: DVMRP tunnels and PIMREG */
@@ -198,12 +214,12 @@ failure:
 static netdev_tx_t reg_vif_xmit(struct sk_buff *skb, struct net_device *dev)
 {
 	struct net *net = dev_net(dev);
+	struct mr_table *mrt = net->ipv4.mrt;
 
 	read_lock(&mrt_lock);
 	dev->stats.tx_bytes += skb->len;
 	dev->stats.tx_packets++;
-	ipmr_cache_report(net, skb, net->ipv4.mroute_reg_vif_num,
-			  IGMPMSG_WHOLEPKT);
+	ipmr_cache_report(mrt, skb, mrt->mroute_reg_vif_num, IGMPMSG_WHOLEPKT);
 	read_unlock(&mrt_lock);
 	kfree_skb(skb);
 	return NETDEV_TX_OK;
@@ -273,17 +289,17 @@ failure:
  *	@notify: Set to 1, if the caller is a notifier_call
  */
 
-static int vif_delete(struct net *net, int vifi, int notify,
+static int vif_delete(struct mr_table *mrt, int vifi, int notify,
 		      struct list_head *head)
 {
 	struct vif_device *v;
 	struct net_device *dev;
 	struct in_device *in_dev;
 
-	if (vifi < 0 || vifi >= net->ipv4.maxvif)
+	if (vifi < 0 || vifi >= mrt->maxvif)
 		return -EADDRNOTAVAIL;
 
-	v = &net->ipv4.vif_table[vifi];
+	v = &mrt->vif_table[vifi];
 
 	write_lock_bh(&mrt_lock);
 	dev = v->dev;
@@ -295,17 +311,17 @@ static int vif_delete(struct net *net, int vifi, int notify,
 	}
 
 #ifdef CONFIG_IP_PIMSM
-	if (vifi == net->ipv4.mroute_reg_vif_num)
-		net->ipv4.mroute_reg_vif_num = -1;
+	if (vifi == mrt->mroute_reg_vif_num)
+		mrt->mroute_reg_vif_num = -1;
 #endif
 
-	if (vifi+1 == net->ipv4.maxvif) {
+	if (vifi+1 == mrt->maxvif) {
 		int tmp;
 		for (tmp=vifi-1; tmp>=0; tmp--) {
-			if (VIF_EXISTS(net, tmp))
+			if (VIF_EXISTS(mrt, tmp))
 				break;
 		}
-		net->ipv4.maxvif = tmp+1;
+		mrt->maxvif = tmp+1;
 	}
 
 	write_unlock_bh(&mrt_lock);
@@ -333,12 +349,13 @@ static inline void ipmr_cache_free(struct mfc_cache *c)
    and reporting error to netlink readers.
  */
 
-static void ipmr_destroy_unres(struct net *net, struct mfc_cache *c)
+static void ipmr_destroy_unres(struct mr_table *mrt, struct mfc_cache *c)
 {
+	struct net *net = NULL; //mrt->net;
 	struct sk_buff *skb;
 	struct nlmsgerr *e;
 
-	atomic_dec(&net->ipv4.cache_resolve_queue_len);
+	atomic_dec(&mrt->cache_resolve_queue_len);
 
 	while ((skb = skb_dequeue(&c->mfc_un.unres.unresolved))) {
 		if (ip_hdr(skb)->version == 0) {
@@ -363,23 +380,23 @@ static void ipmr_destroy_unres(struct net *net, struct mfc_cache *c)
 
 static void ipmr_expire_process(unsigned long arg)
 {
-	struct net *net = (struct net *)arg;
+	struct mr_table *mrt = (struct mr_table *)arg;
 	unsigned long now;
 	unsigned long expires;
 	struct mfc_cache *c, *next;
 
 	if (!spin_trylock(&mfc_unres_lock)) {
-		mod_timer(&net->ipv4.ipmr_expire_timer, jiffies+HZ/10);
+		mod_timer(&mrt->ipmr_expire_timer, jiffies+HZ/10);
 		return;
 	}
 
-	if (list_empty(&net->ipv4.mfc_unres_queue))
+	if (list_empty(&mrt->mfc_unres_queue))
 		goto out;
 
 	now = jiffies;
 	expires = 10*HZ;
 
-	list_for_each_entry_safe(c, next, &net->ipv4.mfc_unres_queue, list) {
+	list_for_each_entry_safe(c, next, &mrt->mfc_unres_queue, list) {
 		if (time_after(c->mfc_un.unres.expires, now)) {
 			unsigned long interval = c->mfc_un.unres.expires - now;
 			if (interval < expires)
@@ -388,11 +405,11 @@ static void ipmr_expire_process(unsigned long arg)
 		}
 
 		list_del(&c->list);
-		ipmr_destroy_unres(net, c);
+		ipmr_destroy_unres(mrt, c);
 	}
 
-	if (!list_empty(&net->ipv4.mfc_unres_queue))
-		mod_timer(&net->ipv4.ipmr_expire_timer, jiffies + expires);
+	if (!list_empty(&mrt->mfc_unres_queue))
+		mod_timer(&mrt->ipmr_expire_timer, jiffies + expires);
 
 out:
 	spin_unlock(&mfc_unres_lock);
@@ -400,7 +417,7 @@ out:
 
 /* Fill oifs list. It is called under write locked mrt_lock. */
 
-static void ipmr_update_thresholds(struct net *net, struct mfc_cache *cache,
+static void ipmr_update_thresholds(struct mr_table *mrt, struct mfc_cache *cache,
 				   unsigned char *ttls)
 {
 	int vifi;
@@ -409,8 +426,8 @@ static void ipmr_update_thresholds(struct net *net, struct mfc_cache *cache,
 	cache->mfc_un.res.maxvif = 0;
 	memset(cache->mfc_un.res.ttls, 255, MAXVIFS);
 
-	for (vifi = 0; vifi < net->ipv4.maxvif; vifi++) {
-		if (VIF_EXISTS(net, vifi) &&
+	for (vifi = 0; vifi < mrt->maxvif; vifi++) {
+		if (VIF_EXISTS(mrt, vifi) &&
 		    ttls[vifi] && ttls[vifi] < 255) {
 			cache->mfc_un.res.ttls[vifi] = ttls[vifi];
 			if (cache->mfc_un.res.minvif > vifi)
@@ -421,16 +438,17 @@ static void ipmr_update_thresholds(struct net *net, struct mfc_cache *cache,
 	}
 }
 
-static int vif_add(struct net *net, struct vifctl *vifc, int mrtsock)
+static int vif_add(struct net *net, struct mr_table *mrt,
+		   struct vifctl *vifc, int mrtsock)
 {
 	int vifi = vifc->vifc_vifi;
-	struct vif_device *v = &net->ipv4.vif_table[vifi];
+	struct vif_device *v = &mrt->vif_table[vifi];
 	struct net_device *dev;
 	struct in_device *in_dev;
 	int err;
 
 	/* Is vif busy ? */
-	if (VIF_EXISTS(net, vifi))
+	if (VIF_EXISTS(mrt, vifi))
 		return -EADDRINUSE;
 
 	switch (vifc->vifc_flags) {
@@ -440,7 +458,7 @@ static int vif_add(struct net *net, struct vifctl *vifc, int mrtsock)
 		 * Special Purpose VIF in PIM
 		 * All the packets will be sent to the daemon
 		 */
-		if (net->ipv4.mroute_reg_vif_num >= 0)
+		if (mrt->mroute_reg_vif_num >= 0)
 			return -EADDRINUSE;
 		dev = ipmr_reg_vif(net);
 		if (!dev)
@@ -518,22 +536,22 @@ static int vif_add(struct net *net, struct vifctl *vifc, int mrtsock)
 	v->dev = dev;
 #ifdef CONFIG_IP_PIMSM
 	if (v->flags&VIFF_REGISTER)
-		net->ipv4.mroute_reg_vif_num = vifi;
+		mrt->mroute_reg_vif_num = vifi;
 #endif
-	if (vifi+1 > net->ipv4.maxvif)
-		net->ipv4.maxvif = vifi+1;
+	if (vifi+1 > mrt->maxvif)
+		mrt->maxvif = vifi+1;
 	write_unlock_bh(&mrt_lock);
 	return 0;
 }
 
-static struct mfc_cache *ipmr_cache_find(struct net *net,
+static struct mfc_cache *ipmr_cache_find(struct mr_table *mrt,
 					 __be32 origin,
 					 __be32 mcastgrp)
 {
 	int line = MFC_HASH(mcastgrp, origin);
 	struct mfc_cache *c;
 
-	list_for_each_entry(c, &net->ipv4.mfc_cache_array[line], list) {
+	list_for_each_entry(c, &mrt->mfc_cache_array[line], list) {
 		if (c->mfc_origin == origin && c->mfc_mcastgrp == mcastgrp)
 			return c;
 	}
@@ -566,8 +584,8 @@ static struct mfc_cache *ipmr_cache_alloc_unres(void)
  *	A cache entry has gone into a resolved state from queued
  */
 
-static void ipmr_cache_resolve(struct net *net, struct mfc_cache *uc,
-			       struct mfc_cache *c)
+static void ipmr_cache_resolve(struct net *net, struct mr_table *mrt,
+			       struct mfc_cache *uc, struct mfc_cache *c)
 {
 	struct sk_buff *skb;
 	struct nlmsgerr *e;
@@ -580,7 +598,7 @@ static void ipmr_cache_resolve(struct net *net, struct mfc_cache *uc,
 		if (ip_hdr(skb)->version == 0) {
 			struct nlmsghdr *nlh = (struct nlmsghdr *)skb_pull(skb, sizeof(struct iphdr));
 
-			if (ipmr_fill_mroute(net, skb, c, NLMSG_DATA(nlh)) > 0) {
+			if (ipmr_fill_mroute(mrt, skb, c, NLMSG_DATA(nlh)) > 0) {
 				nlh->nlmsg_len = (skb_tail_pointer(skb) -
 						  (u8 *)nlh);
 			} else {
@@ -594,7 +612,7 @@ static void ipmr_cache_resolve(struct net *net, struct mfc_cache *uc,
 
 			rtnl_unicast(skb, net, NETLINK_CB(skb).pid);
 		} else
-			ip_mr_forward(net, skb, c, 0);
+			ip_mr_forward(net, mrt, skb, c, 0);
 	}
 }
 
@@ -605,7 +623,7 @@ static void ipmr_cache_resolve(struct net *net, struct mfc_cache *uc,
  *	Called under mrt_lock.
  */
 
-static int ipmr_cache_report(struct net *net,
+static int ipmr_cache_report(struct mr_table *mrt,
 			     struct sk_buff *pkt, vifi_t vifi, int assert)
 {
 	struct sk_buff *skb;
@@ -638,7 +656,7 @@ static int ipmr_cache_report(struct net *net,
 		memcpy(msg, skb_network_header(pkt), sizeof(struct iphdr));
 		msg->im_msgtype = IGMPMSG_WHOLEPKT;
 		msg->im_mbz = 0;
-		msg->im_vif = net->ipv4.mroute_reg_vif_num;
+		msg->im_vif = mrt->mroute_reg_vif_num;
 		ip_hdr(skb)->ihl = sizeof(struct iphdr) >> 2;
 		ip_hdr(skb)->tot_len = htons(ntohs(ip_hdr(pkt)->tot_len) +
 					     sizeof(struct iphdr));
@@ -670,7 +688,7 @@ static int ipmr_cache_report(struct net *net,
 	skb->transport_header = skb->network_header;
 	}
 
-	if (net->ipv4.mroute_sk == NULL) {
+	if (mrt->mroute_sk == NULL) {
 		kfree_skb(skb);
 		return -EINVAL;
 	}
@@ -678,7 +696,7 @@ static int ipmr_cache_report(struct net *net,
 	/*
 	 *	Deliver to mrouted
 	 */
-	ret = sock_queue_rcv_skb(net->ipv4.mroute_sk, skb);
+	ret = sock_queue_rcv_skb(mrt->mroute_sk, skb);
 	if (ret < 0) {
 		if (net_ratelimit())
 			printk(KERN_WARNING "mroute: pending queue full, dropping entries.\n");
@@ -693,7 +711,7 @@ static int ipmr_cache_report(struct net *net,
  */
 
 static int
-ipmr_cache_unresolved(struct net *net, vifi_t vifi, struct sk_buff *skb)
+ipmr_cache_unresolved(struct mr_table *mrt, vifi_t vifi, struct sk_buff *skb)
 {
 	bool found = false;
 	int err;
@@ -701,7 +719,7 @@ ipmr_cache_unresolved(struct net *net, vifi_t vifi, struct sk_buff *skb)
 	const struct iphdr *iph = ip_hdr(skb);
 
 	spin_lock_bh(&mfc_unres_lock);
-	list_for_each_entry(c, &net->ipv4.mfc_unres_queue, list) {
+	list_for_each_entry(c, &mrt->mfc_unres_queue, list) {
 		if (c->mfc_mcastgrp == iph->daddr &&
 		    c->mfc_origin == iph->saddr) {
 			found = true;
@@ -714,7 +732,7 @@ ipmr_cache_unresolved(struct net *net, vifi_t vifi, struct sk_buff *skb)
 		 *	Create a new entry if allowable
 		 */
 
-		if (atomic_read(&net->ipv4.cache_resolve_queue_len) >= 10 ||
+		if (atomic_read(&mrt->cache_resolve_queue_len) >= 10 ||
 		    (c = ipmr_cache_alloc_unres()) == NULL) {
 			spin_unlock_bh(&mfc_unres_lock);
 
@@ -732,7 +750,7 @@ ipmr_cache_unresolved(struct net *net, vifi_t vifi, struct sk_buff *skb)
 		/*
 		 *	Reflect first query at mrouted.
 		 */
-		err = ipmr_cache_report(net, skb, vifi, IGMPMSG_NOCACHE);
+		err = ipmr_cache_report(mrt, skb, vifi, IGMPMSG_NOCACHE);
 		if (err < 0) {
 			/* If the report failed throw the cache entry
 			   out - Brad Parker
@@ -744,10 +762,10 @@ ipmr_cache_unresolved(struct net *net, vifi_t vifi, struct sk_buff *skb)
 			return err;
 		}
 
-		atomic_inc(&net->ipv4.cache_resolve_queue_len);
-		list_add_tail(&c->list, &net->ipv4.mfc_unres_queue);
+		atomic_inc(&mrt->cache_resolve_queue_len);
+		list_add_tail(&c->list, &mrt->mfc_unres_queue);
 
-		mod_timer(&net->ipv4.ipmr_expire_timer, c->mfc_un.unres.expires);
+		mod_timer(&mrt->ipmr_expire_timer, c->mfc_un.unres.expires);
 	}
 
 	/*
@@ -769,14 +787,14 @@ ipmr_cache_unresolved(struct net *net, vifi_t vifi, struct sk_buff *skb)
  *	MFC cache manipulation by user space mroute daemon
  */
 
-static int ipmr_mfc_delete(struct net *net, struct mfcctl *mfc)
+static int ipmr_mfc_delete(struct mr_table *mrt, struct mfcctl *mfc)
 {
 	int line;
 	struct mfc_cache *c, *next;
 
 	line = MFC_HASH(mfc->mfcc_mcastgrp.s_addr, mfc->mfcc_origin.s_addr);
 
-	list_for_each_entry_safe(c, next, &net->ipv4.mfc_cache_array[line], list) {
+	list_for_each_entry_safe(c, next, &mrt->mfc_cache_array[line], list) {
 		if (c->mfc_origin == mfc->mfcc_origin.s_addr &&
 		    c->mfc_mcastgrp == mfc->mfcc_mcastgrp.s_addr) {
 			write_lock_bh(&mrt_lock);
@@ -790,7 +808,8 @@ static int ipmr_mfc_delete(struct net *net, struct mfcctl *mfc)
 	return -ENOENT;
 }
 
-static int ipmr_mfc_add(struct net *net, struct mfcctl *mfc, int mrtsock)
+static int ipmr_mfc_add(struct net *net, struct mr_table *mrt,
+			struct mfcctl *mfc, int mrtsock)
 {
 	bool found = false;
 	int line;
@@ -801,7 +820,7 @@ static int ipmr_mfc_add(struct net *net, struct mfcctl *mfc, int mrtsock)
 
 	line = MFC_HASH(mfc->mfcc_mcastgrp.s_addr, mfc->mfcc_origin.s_addr);
 
-	list_for_each_entry(c, &net->ipv4.mfc_cache_array[line], list) {
+	list_for_each_entry(c, &mrt->mfc_cache_array[line], list) {
 		if (c->mfc_origin == mfc->mfcc_origin.s_addr &&
 		    c->mfc_mcastgrp == mfc->mfcc_mcastgrp.s_addr) {
 			found = true;
@@ -812,7 +831,7 @@ static int ipmr_mfc_add(struct net *net, struct mfcctl *mfc, int mrtsock)
 	if (found) {
 		write_lock_bh(&mrt_lock);
 		c->mfc_parent = mfc->mfcc_parent;
-		ipmr_update_thresholds(net, c, mfc->mfcc_ttls);
+		ipmr_update_thresholds(mrt, c, mfc->mfcc_ttls);
 		if (!mrtsock)
 			c->mfc_flags |= MFC_STATIC;
 		write_unlock_bh(&mrt_lock);
@@ -829,12 +848,12 @@ static int ipmr_mfc_add(struct net *net, struct mfcctl *mfc, int mrtsock)
 	c->mfc_origin = mfc->mfcc_origin.s_addr;
 	c->mfc_mcastgrp = mfc->mfcc_mcastgrp.s_addr;
 	c->mfc_parent = mfc->mfcc_parent;
-	ipmr_update_thresholds(net, c, mfc->mfcc_ttls);
+	ipmr_update_thresholds(mrt, c, mfc->mfcc_ttls);
 	if (!mrtsock)
 		c->mfc_flags |= MFC_STATIC;
 
 	write_lock_bh(&mrt_lock);
-	list_add_tail(&c->list, &net->ipv4.mfc_cache_array[line]);
+	list_add_tail(&c->list, &mrt->mfc_cache_array[line]);
 	write_unlock_bh(&mrt_lock);
 
 	/*
@@ -842,20 +861,20 @@ static int ipmr_mfc_add(struct net *net, struct mfcctl *mfc, int mrtsock)
 	 *	need to send on the frames and tidy up.
 	 */
 	spin_lock_bh(&mfc_unres_lock);
-	list_for_each_entry(uc, &net->ipv4.mfc_unres_queue, list) {
+	list_for_each_entry(uc, &mrt->mfc_unres_queue, list) {
 		if (uc->mfc_origin == c->mfc_origin &&
 		    uc->mfc_mcastgrp == c->mfc_mcastgrp) {
 			list_del(&uc->list);
-			atomic_dec(&net->ipv4.cache_resolve_queue_len);
+			atomic_dec(&mrt->cache_resolve_queue_len);
 			break;
 		}
 	}
-	if (list_empty(&net->ipv4.mfc_unres_queue))
-		del_timer(&net->ipv4.ipmr_expire_timer);
+	if (list_empty(&mrt->mfc_unres_queue))
+		del_timer(&mrt->ipmr_expire_timer);
 	spin_unlock_bh(&mfc_unres_lock);
 
 	if (uc) {
-		ipmr_cache_resolve(net, uc, c);
+		ipmr_cache_resolve(net, mrt, uc, c);
 		ipmr_cache_free(uc);
 	}
 	return 0;
@@ -865,7 +884,7 @@ static int ipmr_mfc_add(struct net *net, struct mfcctl *mfc, int mrtsock)
  *	Close the multicast socket, and clear the vif tables etc
  */
 
-static void mroute_clean_tables(struct net *net)
+static void mroute_clean_tables(struct mr_table *mrt)
 {
 	int i;
 	LIST_HEAD(list);
@@ -874,9 +893,9 @@ static void mroute_clean_tables(struct net *net)
 	/*
 	 *	Shut down all active vif entries
 	 */
-	for (i = 0; i < net->ipv4.maxvif; i++) {
-		if (!(net->ipv4.vif_table[i].flags&VIFF_STATIC))
-			vif_delete(net, i, 0, &list);
+	for (i = 0; i < mrt->maxvif; i++) {
+		if (!(mrt->vif_table[i].flags&VIFF_STATIC))
+			vif_delete(mrt, i, 0, &list);
 	}
 	unregister_netdevice_many(&list);
 
@@ -884,7 +903,7 @@ static void mroute_clean_tables(struct net *net)
 	 *	Wipe the cache
 	 */
 	for (i = 0; i < MFC_LINES; i++) {
-		list_for_each_entry_safe(c, next, &net->ipv4.mfc_cache_array[i], list) {
+		list_for_each_entry_safe(c, next, &mrt->mfc_cache_array[i], list) {
 			if (c->mfc_flags&MFC_STATIC)
 				continue;
 			write_lock_bh(&mrt_lock);
@@ -895,11 +914,11 @@ static void mroute_clean_tables(struct net *net)
 		}
 	}
 
-	if (atomic_read(&net->ipv4.cache_resolve_queue_len) != 0) {
+	if (atomic_read(&mrt->cache_resolve_queue_len) != 0) {
 		spin_lock_bh(&mfc_unres_lock);
-		list_for_each_entry_safe(c, next, &net->ipv4.mfc_unres_queue, list) {
+		list_for_each_entry_safe(c, next, &mrt->mfc_unres_queue, list) {
 			list_del(&c->list);
-			ipmr_destroy_unres(net, c);
+			ipmr_destroy_unres(mrt, c);
 		}
 		spin_unlock_bh(&mfc_unres_lock);
 	}
@@ -908,16 +927,17 @@ static void mroute_clean_tables(struct net *net)
 static void mrtsock_destruct(struct sock *sk)
 {
 	struct net *net = sock_net(sk);
+	struct mr_table *mrt = net->ipv4.mrt;
 
 	rtnl_lock();
-	if (sk == net->ipv4.mroute_sk) {
+	if (sk == mrt->mroute_sk) {
 		IPV4_DEVCONF_ALL(net, MC_FORWARDING)--;
 
 		write_lock_bh(&mrt_lock);
-		net->ipv4.mroute_sk = NULL;
+		mrt->mroute_sk = NULL;
 		write_unlock_bh(&mrt_lock);
 
-		mroute_clean_tables(net);
+		mroute_clean_tables(mrt);
 	}
 	rtnl_unlock();
 }
@@ -935,9 +955,10 @@ int ip_mroute_setsockopt(struct sock *sk, int optname, char __user *optval, unsi
 	struct vifctl vif;
 	struct mfcctl mfc;
 	struct net *net = sock_net(sk);
+	struct mr_table *mrt = net->ipv4.mrt;
 
 	if (optname != MRT_INIT) {
-		if (sk != net->ipv4.mroute_sk && !capable(CAP_NET_ADMIN))
+		if (sk != mrt->mroute_sk && !capable(CAP_NET_ADMIN))
 			return -EACCES;
 	}
 
@@ -950,7 +971,7 @@ int ip_mroute_setsockopt(struct sock *sk, int optname, char __user *optval, unsi
 			return -ENOPROTOOPT;
 
 		rtnl_lock();
-		if (net->ipv4.mroute_sk) {
+		if (mrt->mroute_sk) {
 			rtnl_unlock();
 			return -EADDRINUSE;
 		}
@@ -958,7 +979,7 @@ int ip_mroute_setsockopt(struct sock *sk, int optname, char __user *optval, unsi
 		ret = ip_ra_control(sk, 1, mrtsock_destruct);
 		if (ret == 0) {
 			write_lock_bh(&mrt_lock);
-			net->ipv4.mroute_sk = sk;
+			mrt->mroute_sk = sk;
 			write_unlock_bh(&mrt_lock);
 
 			IPV4_DEVCONF_ALL(net, MC_FORWARDING)++;
@@ -966,7 +987,7 @@ int ip_mroute_setsockopt(struct sock *sk, int optname, char __user *optval, unsi
 		rtnl_unlock();
 		return ret;
 	case MRT_DONE:
-		if (sk != net->ipv4.mroute_sk)
+		if (sk != mrt->mroute_sk)
 			return -EACCES;
 		return ip_ra_control(sk, 0, NULL);
 	case MRT_ADD_VIF:
@@ -979,9 +1000,9 @@ int ip_mroute_setsockopt(struct sock *sk, int optname, char __user *optval, unsi
 			return -ENFILE;
 		rtnl_lock();
 		if (optname == MRT_ADD_VIF) {
-			ret = vif_add(net, &vif, sk == net->ipv4.mroute_sk);
+			ret = vif_add(net, mrt, &vif, sk == mrt->mroute_sk);
 		} else {
-			ret = vif_delete(net, vif.vifc_vifi, 0, NULL);
+			ret = vif_delete(mrt, vif.vifc_vifi, 0, NULL);
 		}
 		rtnl_unlock();
 		return ret;
@@ -998,9 +1019,9 @@ int ip_mroute_setsockopt(struct sock *sk, int optname, char __user *optval, unsi
 			return -EFAULT;
 		rtnl_lock();
 		if (optname == MRT_DEL_MFC)
-			ret = ipmr_mfc_delete(net, &mfc);
+			ret = ipmr_mfc_delete(mrt, &mfc);
 		else
-			ret = ipmr_mfc_add(net, &mfc, sk == net->ipv4.mroute_sk);
+			ret = ipmr_mfc_add(net, mrt, &mfc, sk == mrt->mroute_sk);
 		rtnl_unlock();
 		return ret;
 		/*
@@ -1011,7 +1032,7 @@ int ip_mroute_setsockopt(struct sock *sk, int optname, char __user *optval, unsi
 		int v;
 		if (get_user(v,(int __user *)optval))
 			return -EFAULT;
-		net->ipv4.mroute_do_assert = (v) ? 1 : 0;
+		mrt->mroute_do_assert = (v) ? 1 : 0;
 		return 0;
 	}
 #ifdef CONFIG_IP_PIMSM
@@ -1025,9 +1046,9 @@ int ip_mroute_setsockopt(struct sock *sk, int optname, char __user *optval, unsi
 
 		rtnl_lock();
 		ret = 0;
-		if (v != net->ipv4.mroute_do_pim) {
-			net->ipv4.mroute_do_pim = v;
-			net->ipv4.mroute_do_assert = v;
+		if (v != mrt->mroute_do_pim) {
+			mrt->mroute_do_pim = v;
+			mrt->mroute_do_assert = v;
 		}
 		rtnl_unlock();
 		return ret;
@@ -1051,6 +1072,7 @@ int ip_mroute_getsockopt(struct sock *sk, int optname, char __user *optval, int
 	int olr;
 	int val;
 	struct net *net = sock_net(sk);
+	struct mr_table *mrt = net->ipv4.mrt;
 
 	if (optname != MRT_VERSION &&
 #ifdef CONFIG_IP_PIMSM
@@ -1072,10 +1094,10 @@ int ip_mroute_getsockopt(struct sock *sk, int optname, char __user *optval, int
 		val = 0x0305;
 #ifdef CONFIG_IP_PIMSM
 	else if (optname == MRT_PIM)
-		val = net->ipv4.mroute_do_pim;
+		val = mrt->mroute_do_pim;
 #endif
 	else
-		val = net->ipv4.mroute_do_assert;
+		val = mrt->mroute_do_assert;
 	if (copy_to_user(optval, &val, olr))
 		return -EFAULT;
 	return 0;
@@ -1092,16 +1114,17 @@ int ipmr_ioctl(struct sock *sk, int cmd, void __user *arg)
 	struct vif_device *vif;
 	struct mfc_cache *c;
 	struct net *net = sock_net(sk);
+	struct mr_table *mrt = net->ipv4.mrt;
 
 	switch (cmd) {
 	case SIOCGETVIFCNT:
 		if (copy_from_user(&vr, arg, sizeof(vr)))
 			return -EFAULT;
-		if (vr.vifi >= net->ipv4.maxvif)
+		if (vr.vifi >= mrt->maxvif)
 			return -EINVAL;
 		read_lock(&mrt_lock);
-		vif = &net->ipv4.vif_table[vr.vifi];
-		if (VIF_EXISTS(net, vr.vifi)) {
+		vif = &mrt->vif_table[vr.vifi];
+		if (VIF_EXISTS(mrt, vr.vifi)) {
 			vr.icount = vif->pkt_in;
 			vr.ocount = vif->pkt_out;
 			vr.ibytes = vif->bytes_in;
@@ -1119,7 +1142,7 @@ int ipmr_ioctl(struct sock *sk, int cmd, void __user *arg)
 			return -EFAULT;
 
 		read_lock(&mrt_lock);
-		c = ipmr_cache_find(net, sr.src.s_addr, sr.grp.s_addr);
+		c = ipmr_cache_find(mrt, sr.src.s_addr, sr.grp.s_addr);
 		if (c) {
 			sr.pktcnt = c->mfc_un.res.pkt;
 			sr.bytecnt = c->mfc_un.res.bytes;
@@ -1142,16 +1165,17 @@ static int ipmr_device_event(struct notifier_block *this, unsigned long event, v
 {
 	struct net_device *dev = ptr;
 	struct net *net = dev_net(dev);
+	struct mr_table *mrt = net->ipv4.mrt;
 	struct vif_device *v;
 	int ct;
 	LIST_HEAD(list);
 
 	if (event != NETDEV_UNREGISTER)
 		return NOTIFY_DONE;
-	v = &net->ipv4.vif_table[0];
-	for (ct = 0; ct < net->ipv4.maxvif; ct++, v++) {
+	v = &mrt->vif_table[0];
+	for (ct = 0; ct < mrt->maxvif; ct++, v++) {
 		if (v->dev == dev)
-			vif_delete(net, ct, 1, &list);
+			vif_delete(mrt, ct, 1, &list);
 	}
 	unregister_netdevice_many(&list);
 	return NOTIFY_DONE;
@@ -1210,11 +1234,11 @@ static inline int ipmr_forward_finish(struct sk_buff *skb)
  *	Processing handlers for ipmr_forward
  */
 
-static void ipmr_queue_xmit(struct net *net, struct sk_buff *skb,
-			    struct mfc_cache *c, int vifi)
+static void ipmr_queue_xmit(struct net *net, struct mr_table *mrt,
+			    struct sk_buff *skb, struct mfc_cache *c, int vifi)
 {
 	const struct iphdr *iph = ip_hdr(skb);
-	struct vif_device *vif = &net->ipv4.vif_table[vifi];
+	struct vif_device *vif = &mrt->vif_table[vifi];
 	struct net_device *dev;
 	struct rtable *rt;
 	int    encap = 0;
@@ -1228,7 +1252,7 @@ static void ipmr_queue_xmit(struct net *net, struct sk_buff *skb,
 		vif->bytes_out += skb->len;
 		vif->dev->stats.tx_bytes += skb->len;
 		vif->dev->stats.tx_packets++;
-		ipmr_cache_report(net, skb, vifi, IGMPMSG_WHOLEPKT);
+		ipmr_cache_report(mrt, skb, vifi, IGMPMSG_WHOLEPKT);
 		goto out_free;
 	}
 #endif
@@ -1311,12 +1335,12 @@ out_free:
 	return;
 }
 
-static int ipmr_find_vif(struct net_device *dev)
+static int ipmr_find_vif(struct mr_table *mrt, struct net_device *dev)
 {
-	struct net *net = dev_net(dev);
 	int ct;
-	for (ct = net->ipv4.maxvif-1; ct >= 0; ct--) {
-		if (net->ipv4.vif_table[ct].dev == dev)
+
+	for (ct = mrt->maxvif-1; ct >= 0; ct--) {
+		if (mrt->vif_table[ct].dev == dev)
 			break;
 	}
 	return ct;
@@ -1324,8 +1348,9 @@ static int ipmr_find_vif(struct net_device *dev)
 
 /* "local" means that we should preserve one skb (for local delivery) */
 
-static int ip_mr_forward(struct net *net, struct sk_buff *skb,
-			 struct mfc_cache *cache, int local)
+static int ip_mr_forward(struct net *net, struct mr_table *mrt,
+			 struct sk_buff *skb, struct mfc_cache *cache,
+			 int local)
 {
 	int psend = -1;
 	int vif, ct;
@@ -1337,7 +1362,7 @@ static int ip_mr_forward(struct net *net, struct sk_buff *skb,
 	/*
 	 * Wrong interface: drop packet and (maybe) send PIM assert.
 	 */
-	if (net->ipv4.vif_table[vif].dev != skb->dev) {
+	if (mrt->vif_table[vif].dev != skb->dev) {
 		int true_vifi;
 
 		if (skb_rtable(skb)->fl.iif == 0) {
@@ -1356,26 +1381,26 @@ static int ip_mr_forward(struct net *net, struct sk_buff *skb,
 		}
 
 		cache->mfc_un.res.wrong_if++;
-		true_vifi = ipmr_find_vif(skb->dev);
+		true_vifi = ipmr_find_vif(mrt, skb->dev);
 
-		if (true_vifi >= 0 && net->ipv4.mroute_do_assert &&
+		if (true_vifi >= 0 && mrt->mroute_do_assert &&
 		    /* pimsm uses asserts, when switching from RPT to SPT,
 		       so that we cannot check that packet arrived on an oif.
 		       It is bad, but otherwise we would need to move pretty
 		       large chunk of pimd to kernel. Ough... --ANK
 		     */
-		    (net->ipv4.mroute_do_pim ||
+		    (mrt->mroute_do_pim ||
 		     cache->mfc_un.res.ttls[true_vifi] < 255) &&
 		    time_after(jiffies,
 			       cache->mfc_un.res.last_assert + MFC_ASSERT_THRESH)) {
 			cache->mfc_un.res.last_assert = jiffies;
-			ipmr_cache_report(net, skb, true_vifi, IGMPMSG_WRONGVIF);
+			ipmr_cache_report(mrt, skb, true_vifi, IGMPMSG_WRONGVIF);
 		}
 		goto dont_forward;
 	}
 
-	net->ipv4.vif_table[vif].pkt_in++;
-	net->ipv4.vif_table[vif].bytes_in += skb->len;
+	mrt->vif_table[vif].pkt_in++;
+	mrt->vif_table[vif].bytes_in += skb->len;
 
 	/*
 	 *	Forward the frame
@@ -1385,7 +1410,8 @@ static int ip_mr_forward(struct net *net, struct sk_buff *skb,
 			if (psend != -1) {
 				struct sk_buff *skb2 = skb_clone(skb, GFP_ATOMIC);
 				if (skb2)
-					ipmr_queue_xmit(net, skb2, cache, psend);
+					ipmr_queue_xmit(net, mrt, skb2, cache,
+							psend);
 			}
 			psend = ct;
 		}
@@ -1394,9 +1420,9 @@ static int ip_mr_forward(struct net *net, struct sk_buff *skb,
 		if (local) {
 			struct sk_buff *skb2 = skb_clone(skb, GFP_ATOMIC);
 			if (skb2)
-				ipmr_queue_xmit(net, skb2, cache, psend);
+				ipmr_queue_xmit(net, mrt, skb2, cache, psend);
 		} else {
-			ipmr_queue_xmit(net, skb, cache, psend);
+			ipmr_queue_xmit(net, mrt, skb, cache, psend);
 			return 0;
 		}
 	}
@@ -1416,6 +1442,7 @@ int ip_mr_input(struct sk_buff *skb)
 {
 	struct mfc_cache *cache;
 	struct net *net = dev_net(skb->dev);
+	struct mr_table *mrt = net->ipv4.mrt;
 	int local = skb_rtable(skb)->rt_flags & RTCF_LOCAL;
 
 	/* Packet is looped back after forward, it should not be
@@ -1436,9 +1463,9 @@ int ip_mr_input(struct sk_buff *skb)
 			       that we can forward NO IGMP messages.
 			     */
 			    read_lock(&mrt_lock);
-			    if (net->ipv4.mroute_sk) {
+			    if (mrt->mroute_sk) {
 				    nf_reset(skb);
-				    raw_rcv(net->ipv4.mroute_sk, skb);
+				    raw_rcv(mrt->mroute_sk, skb);
 				    read_unlock(&mrt_lock);
 				    return 0;
 			    }
@@ -1447,7 +1474,7 @@ int ip_mr_input(struct sk_buff *skb)
 	}
 
 	read_lock(&mrt_lock);
-	cache = ipmr_cache_find(net, ip_hdr(skb)->saddr, ip_hdr(skb)->daddr);
+	cache = ipmr_cache_find(mrt, ip_hdr(skb)->saddr, ip_hdr(skb)->daddr);
 
 	/*
 	 *	No usable cache entry
@@ -1465,9 +1492,9 @@ int ip_mr_input(struct sk_buff *skb)
 			skb = skb2;
 		}
 
-		vif = ipmr_find_vif(skb->dev);
+		vif = ipmr_find_vif(mrt, skb->dev);
 		if (vif >= 0) {
-			int err = ipmr_cache_unresolved(net, vif, skb);
+			int err = ipmr_cache_unresolved(mrt, vif, skb);
 			read_unlock(&mrt_lock);
 
 			return err;
@@ -1477,7 +1504,7 @@ int ip_mr_input(struct sk_buff *skb)
 		return -ENODEV;
 	}
 
-	ip_mr_forward(net, skb, cache, local);
+	ip_mr_forward(net, mrt, skb, cache, local);
 
 	read_unlock(&mrt_lock);
 
@@ -1499,6 +1526,7 @@ static int __pim_rcv(struct sk_buff *skb, unsigned int pimlen)
 	struct net_device *reg_dev = NULL;
 	struct iphdr *encap;
 	struct net *net = dev_net(skb->dev);
+	struct mr_table *mrt = net->ipv4.mrt;
 
 	encap = (struct iphdr *)(skb_transport_header(skb) + pimlen);
 	/*
@@ -1513,8 +1541,8 @@ static int __pim_rcv(struct sk_buff *skb, unsigned int pimlen)
 		return 1;
 
 	read_lock(&mrt_lock);
-	if (net->ipv4.mroute_reg_vif_num >= 0)
-		reg_dev = net->ipv4.vif_table[net->ipv4.mroute_reg_vif_num].dev;
+	if (mrt->mroute_reg_vif_num >= 0)
+		reg_dev = mrt->vif_table[mrt->mroute_reg_vif_num].dev;
 	if (reg_dev)
 		dev_hold(reg_dev);
 	read_unlock(&mrt_lock);
@@ -1549,13 +1577,14 @@ int pim_rcv_v1(struct sk_buff * skb)
 {
 	struct igmphdr *pim;
 	struct net *net = dev_net(skb->dev);
+	struct mr_table *mrt = net->ipv4.mrt;
 
 	if (!pskb_may_pull(skb, sizeof(*pim) + sizeof(struct iphdr)))
 		goto drop;
 
 	pim = igmp_hdr(skb);
 
-	if (!net->ipv4.mroute_do_pim ||
+	if (!mrt->mroute_do_pim ||
 	    pim->group != PIM_V1_VERSION || pim->code != PIM_V1_REGISTER)
 		goto drop;
 
@@ -1591,7 +1620,7 @@ drop:
 #endif
 
 static int
-ipmr_fill_mroute(struct net *net, struct sk_buff *skb, struct mfc_cache *c,
+ipmr_fill_mroute(struct mr_table *mrt, struct sk_buff *skb, struct mfc_cache *c,
 		 struct rtmsg *rtm)
 {
 	int ct;
@@ -1603,19 +1632,19 @@ ipmr_fill_mroute(struct net *net, struct sk_buff *skb, struct mfc_cache *c,
 	if (c->mfc_parent > MAXVIFS)
 		return -ENOENT;
 
-	if (VIF_EXISTS(net, c->mfc_parent))
-		RTA_PUT(skb, RTA_IIF, 4, &net->ipv4.vif_table[c->mfc_parent].dev->ifindex);
+	if (VIF_EXISTS(mrt, c->mfc_parent))
+		RTA_PUT(skb, RTA_IIF, 4, &mrt->vif_table[c->mfc_parent].dev->ifindex);
 
 	mp_head = (struct rtattr *)skb_put(skb, RTA_LENGTH(0));
 
 	for (ct = c->mfc_un.res.minvif; ct < c->mfc_un.res.maxvif; ct++) {
-		if (VIF_EXISTS(net, ct) && c->mfc_un.res.ttls[ct] < 255) {
+		if (VIF_EXISTS(mrt, ct) && c->mfc_un.res.ttls[ct] < 255) {
 			if (skb_tailroom(skb) < RTA_ALIGN(RTA_ALIGN(sizeof(*nhp)) + 4))
 				goto rtattr_failure;
 			nhp = (struct rtnexthop *)skb_put(skb, RTA_ALIGN(sizeof(*nhp)));
 			nhp->rtnh_flags = 0;
 			nhp->rtnh_hops = c->mfc_un.res.ttls[ct];
-			nhp->rtnh_ifindex = net->ipv4.vif_table[ct].dev->ifindex;
+			nhp->rtnh_ifindex = mrt->vif_table[ct].dev->ifindex;
 			nhp->rtnh_len = sizeof(*nhp);
 		}
 	}
@@ -1633,11 +1662,12 @@ int ipmr_get_route(struct net *net,
 		   struct sk_buff *skb, struct rtmsg *rtm, int nowait)
 {
 	int err;
+	struct mr_table *mrt = net->ipv4.mrt;
 	struct mfc_cache *cache;
 	struct rtable *rt = skb_rtable(skb);
 
 	read_lock(&mrt_lock);
-	cache = ipmr_cache_find(net, rt->rt_src, rt->rt_dst);
+	cache = ipmr_cache_find(mrt, rt->rt_src, rt->rt_dst);
 
 	if (cache == NULL) {
 		struct sk_buff *skb2;
@@ -1651,7 +1681,7 @@ int ipmr_get_route(struct net *net,
 		}
 
 		dev = skb->dev;
-		if (dev == NULL || (vif = ipmr_find_vif(dev)) < 0) {
+		if (dev == NULL || (vif = ipmr_find_vif(mrt, dev)) < 0) {
 			read_unlock(&mrt_lock);
 			return -ENODEV;
 		}
@@ -1668,14 +1698,14 @@ int ipmr_get_route(struct net *net,
 		iph->saddr = rt->rt_src;
 		iph->daddr = rt->rt_dst;
 		iph->version = 0;
-		err = ipmr_cache_unresolved(net, vif, skb2);
+		err = ipmr_cache_unresolved(mrt, vif, skb2);
 		read_unlock(&mrt_lock);
 		return err;
 	}
 
 	if (!nowait && (rtm->rtm_flags&RTM_F_NOTIFY))
 		cache->mfc_flags |= MFC_NOTIFY;
-	err = ipmr_fill_mroute(net, skb, cache, rtm);
+	err = ipmr_fill_mroute(mrt, skb, cache, rtm);
 	read_unlock(&mrt_lock);
 	return err;
 }
@@ -1693,11 +1723,13 @@ static struct vif_device *ipmr_vif_seq_idx(struct net *net,
 					   struct ipmr_vif_iter *iter,
 					   loff_t pos)
 {
-	for (iter->ct = 0; iter->ct < net->ipv4.maxvif; ++iter->ct) {
-		if (!VIF_EXISTS(net, iter->ct))
+	struct mr_table *mrt = net->ipv4.mrt;
+
+	for (iter->ct = 0; iter->ct < mrt->maxvif; ++iter->ct) {
+		if (!VIF_EXISTS(mrt, iter->ct))
 			continue;
 		if (pos-- == 0)
-			return &net->ipv4.vif_table[iter->ct];
+			return &mrt->vif_table[iter->ct];
 	}
 	return NULL;
 }
@@ -1716,15 +1748,16 @@ static void *ipmr_vif_seq_next(struct seq_file *seq, void *v, loff_t *pos)
 {
 	struct ipmr_vif_iter *iter = seq->private;
 	struct net *net = seq_file_net(seq);
+	struct mr_table *mrt = net->ipv4.mrt;
 
 	++*pos;
 	if (v == SEQ_START_TOKEN)
 		return ipmr_vif_seq_idx(net, iter, 0);
 
-	while (++iter->ct < net->ipv4.maxvif) {
-		if (!VIF_EXISTS(net, iter->ct))
+	while (++iter->ct < mrt->maxvif) {
+		if (!VIF_EXISTS(mrt, iter->ct))
 			continue;
-		return &net->ipv4.vif_table[iter->ct];
+		return &mrt->vif_table[iter->ct];
 	}
 	return NULL;
 }
@@ -1738,6 +1771,7 @@ static void ipmr_vif_seq_stop(struct seq_file *seq, void *v)
 static int ipmr_vif_seq_show(struct seq_file *seq, void *v)
 {
 	struct net *net = seq_file_net(seq);
+	struct mr_table *mrt = net->ipv4.mrt;
 
 	if (v == SEQ_START_TOKEN) {
 		seq_puts(seq,
@@ -1748,7 +1782,7 @@ static int ipmr_vif_seq_show(struct seq_file *seq, void *v)
 
 		seq_printf(seq,
 			   "%2Zd %-10s %8ld %7ld  %8ld %7ld %05X %08X %08X\n",
-			   vif - net->ipv4.vif_table,
+			   vif - mrt->vif_table,
 			   name, vif->bytes_in, vif->pkt_in,
 			   vif->bytes_out, vif->pkt_out,
 			   vif->flags, vif->local, vif->remote);
@@ -1787,11 +1821,12 @@ struct ipmr_mfc_iter {
 static struct mfc_cache *ipmr_mfc_seq_idx(struct net *net,
 					  struct ipmr_mfc_iter *it, loff_t pos)
 {
+	struct mr_table *mrt = net->ipv4.mrt;
 	struct mfc_cache *mfc;
 
 	read_lock(&mrt_lock);
 	for (it->ct = 0; it->ct < MFC_LINES; it->ct++) {
-		it->cache = &net->ipv4.mfc_cache_array[it->ct];
+		it->cache = &mrt->mfc_cache_array[it->ct];
 		list_for_each_entry(mfc, it->cache, list)
 			if (pos-- == 0)
 				return mfc;
@@ -1799,7 +1834,7 @@ static struct mfc_cache *ipmr_mfc_seq_idx(struct net *net,
 	read_unlock(&mrt_lock);
 
 	spin_lock_bh(&mfc_unres_lock);
-	it->cache = &net->ipv4.mfc_unres_queue;
+	it->cache = &mrt->mfc_unres_queue;
 	list_for_each_entry(mfc, it->cache, list)
 		if (pos-- == 0)
 			return mfc;
@@ -1826,6 +1861,7 @@ static void *ipmr_mfc_seq_next(struct seq_file *seq, void *v, loff_t *pos)
 	struct mfc_cache *mfc = v;
 	struct ipmr_mfc_iter *it = seq->private;
 	struct net *net = seq_file_net(seq);
+	struct mr_table *mrt = net->ipv4.mrt;
 
 	++*pos;
 
@@ -1835,13 +1871,13 @@ static void *ipmr_mfc_seq_next(struct seq_file *seq, void *v, loff_t *pos)
 	if (mfc->list.next != it->cache)
 		return list_entry(mfc->list.next, struct mfc_cache, list);
 
-	if (it->cache == &net->ipv4.mfc_unres_queue)
+	if (it->cache == &mrt->mfc_unres_queue)
 		goto end_of_list;
 
-	BUG_ON(it->cache != &net->ipv4.mfc_cache_array[it->ct]);
+	BUG_ON(it->cache != &mrt->mfc_cache_array[it->ct]);
 
 	while (++it->ct < MFC_LINES) {
-		it->cache = &net->ipv4.mfc_cache_array[it->ct];
+		it->cache = &mrt->mfc_cache_array[it->ct];
 		if (list_empty(it->cache))
 			continue;
 		return list_first_entry(it->cache, struct mfc_cache, list);
@@ -1849,7 +1885,7 @@ static void *ipmr_mfc_seq_next(struct seq_file *seq, void *v, loff_t *pos)
 
 	/* exhausted cache_array, show unresolved */
 	read_unlock(&mrt_lock);
-	it->cache = &net->ipv4.mfc_unres_queue;
+	it->cache = &mrt->mfc_unres_queue;
 	it->ct = 0;
 
 	spin_lock_bh(&mfc_unres_lock);
@@ -1867,10 +1903,11 @@ static void ipmr_mfc_seq_stop(struct seq_file *seq, void *v)
 {
 	struct ipmr_mfc_iter *it = seq->private;
 	struct net *net = seq_file_net(seq);
+	struct mr_table *mrt = net->ipv4.mrt;
 
-	if (it->cache == &net->ipv4.mfc_unres_queue)
+	if (it->cache == &mrt->mfc_unres_queue)
 		spin_unlock_bh(&mfc_unres_lock);
-	else if (it->cache == &net->ipv4.mfc_cache_array[it->ct])
+	else if (it->cache == &mrt->mfc_cache_array[it->ct])
 		read_unlock(&mrt_lock);
 }
 
@@ -1878,6 +1915,7 @@ static int ipmr_mfc_seq_show(struct seq_file *seq, void *v)
 {
 	int n;
 	struct net *net = seq_file_net(seq);
+	struct mr_table *mrt = net->ipv4.mrt;
 
 	if (v == SEQ_START_TOKEN) {
 		seq_puts(seq,
@@ -1891,14 +1929,14 @@ static int ipmr_mfc_seq_show(struct seq_file *seq, void *v)
 			   (unsigned long) mfc->mfc_origin,
 			   mfc->mfc_parent);
 
-		if (it->cache != &net->ipv4.mfc_unres_queue) {
+		if (it->cache != &mrt->mfc_unres_queue) {
 			seq_printf(seq, " %8lu %8lu %8lu",
 				   mfc->mfc_un.res.pkt,
 				   mfc->mfc_un.res.bytes,
 				   mfc->mfc_un.res.wrong_if);
 			for (n = mfc->mfc_un.res.minvif;
 			     n < mfc->mfc_un.res.maxvif; n++ ) {
-				if (VIF_EXISTS(net, n) &&
+				if (VIF_EXISTS(mrt, n) &&
 				    mfc->mfc_un.res.ttls[n] < 255)
 					seq_printf(seq,
 					   " %2d:%-3d",
@@ -1950,35 +1988,27 @@ static const struct net_protocol pim_protocol = {
  */
 static int __net_init ipmr_net_init(struct net *net)
 {
+	struct mr_table *mrt;
 	unsigned int i;
 	int err = 0;
 
-	net->ipv4.vif_table = kcalloc(MAXVIFS, sizeof(struct vif_device),
-				      GFP_KERNEL);
-	if (!net->ipv4.vif_table) {
+	mrt = kzalloc(sizeof(*mrt), GFP_KERNEL);
+	if (mrt == NULL) {
 		err = -ENOMEM;
 		goto fail;
 	}
 
 	/* Forwarding cache */
-	net->ipv4.mfc_cache_array = kcalloc(MFC_LINES,
-					    sizeof(struct list_head),
-					    GFP_KERNEL);
-	if (!net->ipv4.mfc_cache_array) {
-		err = -ENOMEM;
-		goto fail_mfc_cache;
-	}
-
 	for (i = 0; i < MFC_LINES; i++)
-		INIT_LIST_HEAD(&net->ipv4.mfc_cache_array[i]);
+		INIT_LIST_HEAD(&mrt->mfc_cache_array[i]);
 
-	INIT_LIST_HEAD(&net->ipv4.mfc_unres_queue);
+	INIT_LIST_HEAD(&mrt->mfc_unres_queue);
 
-	setup_timer(&net->ipv4.ipmr_expire_timer, ipmr_expire_process,
+	setup_timer(&mrt->ipmr_expire_timer, ipmr_expire_process,
 		    (unsigned long)net);
 
 #ifdef CONFIG_IP_PIMSM
-	net->ipv4.mroute_reg_vif_num = -1;
+	mrt->mroute_reg_vif_num = -1;
 #endif
 
 #ifdef CONFIG_PROC_FS
@@ -1988,16 +2018,16 @@ static int __net_init ipmr_net_init(struct net *net)
 	if (!proc_net_fops_create(net, "ip_mr_cache", 0, &ipmr_mfc_fops))
 		goto proc_cache_fail;
 #endif
+
+	net->ipv4.mrt = mrt;
 	return 0;
 
 #ifdef CONFIG_PROC_FS
 proc_cache_fail:
 	proc_net_remove(net, "ip_mr_vif");
 proc_vif_fail:
-	kfree(net->ipv4.mfc_cache_array);
+	kfree(mrt);
 #endif
-fail_mfc_cache:
-	kfree(net->ipv4.vif_table);
 fail:
 	return err;
 }
@@ -2008,8 +2038,7 @@ static void __net_exit ipmr_net_exit(struct net *net)
 	proc_net_remove(net, "ip_mr_cache");
 	proc_net_remove(net, "ip_mr_vif");
 #endif
-	kfree(net->ipv4.mfc_cache_array);
-	kfree(net->ipv4.vif_table);
+	kfree(net->ipv4.mrt);
 }
 
 static struct pernet_operations ipmr_net_ops = {
-- 
1.7.0.4
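
For readers skimming the diff, here is a minimal userspace sketch of the pattern the patch applies. It is not kernel code. The multicast routing state moves out of the per-namespace fields into one separately allocated table, and helpers take the table pointer instead of looking the state up through the namespace. All names ending in `_sketch` are hypothetical stand-ins for `struct net`, `struct mr_table`, `struct vif_device`, `ipmr_find_vif()`, `ipmr_net_init()` and `ipmr_net_exit()`, and `calloc()`/`free()` stand in for `kzalloc()`/`kfree()`.

```c
#include <stdlib.h>

#define MAXVIFS_SKETCH 32	/* hypothetical; the kernel constant is MAXVIFS */

/* Hypothetical stand-in for struct vif_device: only the field the lookup needs. */
struct vif_sketch {
	const void *dev;	/* NULL means the slot is unused */
};

/* All multicast routing state lives in one table object. */
struct mr_table_sketch {
	struct vif_sketch vif_table[MAXVIFS_SKETCH];
	int maxvif;
};

/* The namespace keeps a single pointer, like net->ipv4.mrt in the patch. */
struct net_sketch {
	struct mr_table_sketch *mrt;
};

/* Takes the table directly, mirroring the new ipmr_find_vif(mrt, dev).
 * Returns the vif index, or -1 if the device is not in the table. */
static int find_vif_sketch(const struct mr_table_sketch *mrt, const void *dev)
{
	int ct;

	for (ct = mrt->maxvif - 1; ct >= 0; ct--) {
		if (mrt->vif_table[ct].dev == dev)
			break;
	}
	return ct;
}

/* One zeroed allocation replaces the separate per-array allocations. */
static int net_init_sketch(struct net_sketch *net)
{
	struct mr_table_sketch *mrt = calloc(1, sizeof(*mrt));

	if (mrt == NULL)
		return -1;
	net->mrt = mrt;
	return 0;
}

/* One free releases the whole table. */
static void net_exit_sketch(struct net_sketch *net)
{
	free(net->mrt);
	net->mrt = NULL;
}
```

Because the helpers depend only on the table pointer, a caller could in principle hand them a different table without touching the namespace, which is presumably what makes this split useful as a preparatory step.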

