Netdev List
* Re: [PATCH] usbnet/cdc_ncm: add missing .reset_resume hook
From: Stefan (metze) Metzmacher @ 2011-06-07  7:38 UTC (permalink / raw)
  To: Greg KH; +Cc: David Miller, oliver, linux-usb, netdev, linux-kernel
In-Reply-To: <20110606150655.GB3732@suse.de>


On 06.06.2011 17:06, Greg KH wrote:
> On Mon, Jun 06, 2011 at 02:23:16PM +0200, Stefan (metze) Metzmacher wrote:
>> Hi David,
>>
>>> From: Stefan Metzmacher <metze@samba.org>
>>> Date: Wed,  1 Jun 2011 14:01:41 +0200
>>>
>>>> This avoids messages like this after suspend:
>>>>
>>>>    cdc_ncm 2-1.4:1.6: no reset_resume for driver cdc_ncm?
>>>>    cdc_ncm 2-1.4:1.7: no reset_resume for driver cdc_ncm?
>>>>    cdc_ncm 2-1.4:1.6: usb0: unregister 'cdc_ncm' usb-0000:00:1d.0-1.4, CDC NCM
>>>>
>>>> This is important for the Ericsson F5521gw GSM/UMTS modem.
>>>> Otherwise modemmanager loses the fact that the cdc_ncm and cdc_acm devices
>>>> belong together.
>>>>
>>>> The cdc_ether module does the same.
>>>>
>>>> Signed-off-by: Stefan Metzmacher <metze@samba.org>
>>>
>>> Applied and queued up for -stable, thanks.
>>
>> It seems to be part of 3.0-rc2, but I'm not seeing it in any stable tree
>> yet...
>>
>> When can I expect it in stable trees like 2.6.38.y?
> 
> The .38.y tree is closed and will not have new releases, so you will
> never see it there, sorry.

Ok, are there chances for .39.y?

metze




* Re: [PATCH] net/ipv6: check for mistakenly passed in non-AF_INET6 sockaddrs
From: Reinhard Max @ 2011-06-07  7:25 UTC (permalink / raw)
  To: Marcus Meissner; +Cc: David Miller, netdev
In-Reply-To: <20110606160007.GD28535@suse.de>

On Mon, 6 Jun 2011 at 18:00, Marcus Meissner wrote:

> Same check as for IPv4, also do for IPv6.
> [...]
> +
> +	if (addr->sin6_family != AF_INET6)
> +		return -EINVAL;
> +

According to the POSIX manpage for bind(), the error code should be 
EAFNOSUPPORT ("The specified address is not a valid address for the 
address family of the specified socket"). This would also match the 
error code of connect() in the same situation.

And I think the family should be checked before the length for both, 
bind() and connect(), to get more descriptive error messages when 
passing an address of the wrong family.
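
For illustration, the ordering suggested above could look roughly like this in user space. This is only a sketch: check_bind_addr6() is a made-up name, not a kernel function, and the real bind()/connect() paths do more than this.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <netinet/in.h>

/*
 * Hypothetical user-space model of the suggested check order.
 * check_bind_addr6() is an illustrative name, not a kernel function.
 */
static int check_bind_addr6(const struct sockaddr *uaddr, size_t addr_len)
{
	/* The family field itself must be present before it is read. */
	if (addr_len < offsetof(struct sockaddr, sa_family) +
		       sizeof(uaddr->sa_family))
		return -EINVAL;

	/* Family first: a wrong family gets the more descriptive error. */
	if (uaddr->sa_family != AF_INET6)
		return -EAFNOSUPPORT;

	/* Only then reject an AF_INET6 address that is too short. */
	if (addr_len < sizeof(struct sockaddr_in6))
		return -EINVAL;

	return 0;
}
```

With this order, a full-size struct sockaddr_in passed by mistake yields -EAFNOSUPPORT instead of a length-related -EINVAL.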


cu
 	Reinhard


* Re: [PATCH] net: cpu offline cause napi stall
From: Eric Dumazet @ 2011-06-07  7:09 UTC (permalink / raw)
  To: David Miller; +Cc: heiko.carstens, blaschka, netdev, linux-s390
In-Reply-To: <1307429403.2642.77.camel@edumazet-laptop>

On Tuesday, 07 June 2011 at 08:50 +0200, Eric Dumazet wrote:
> While testing on a bnx2x adapter, I found the patch was working OK,
> but /proc/interrupts still increments the interrupt count on my offlined
> cpu... go figure...

Oh well, "cat /proc/interrupts" skips offlined cpus, of course, so that's a
false alarm.

I was doing "grep eth1 /proc/interrupts" so I didn't catch this.





* [PATCH] net: cpu offline cause napi stall
From: Eric Dumazet @ 2011-06-07  6:50 UTC (permalink / raw)
  To: David Miller; +Cc: heiko.carstens, blaschka, netdev, linux-s390
In-Reply-To: <20110606.145051.267562411413352856.davem@davemloft.net>

From: Heiko Carstens <heiko.carstens@de.ibm.com>

Frank Blaschka reported :
<quote>
  During heavy network load we turn off/on cpus.
  Sometimes this causes a stall on the network device.
  Digging into the dump I found out the following:

  napi is scheduled but does not run. From the I/O buffers
  and the napi state I see napi/rx_softirq processing has stopped
  because the budget was reached. napi stays in the
  softnet_data poll_list and the rx_softirq was raised again.

  I assume at this time the cpu offline comes in,
  the rx softirq is raised/moved to another cpu but napi stays in the
  poll_list of the softnet_data of the now offline cpu.

  Reviewing dev_cpu_callback (net/core/dev.c) I did not find that the
  poll_list is transferred to the new cpu.
</quote>

This patch is a straightforward implementation of Frank's suggestion:

Transfer poll_list and trigger NET_RX_SOFTIRQ on the new cpu.

Reported-by: Frank Blaschka <blaschka@linux.vnet.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Tested-by: Eric Dumazet <eric.dumazet@gmail.com>
---
While testing on a bnx2x adapter, I found the patch was working OK,
but /proc/interrupts still increments the interrupt count on my offlined
cpu... go figure...

 net/core/dev.c |    5 +++++
 1 files changed, 5 insertions(+)

diff --git a/net/core/dev.c b/net/core/dev.c
index 9393078..095909c 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -6178,6 +6178,11 @@ static int dev_cpu_callback(struct notifier_block *nfb,
 		oldsd->output_queue = NULL;
 		oldsd->output_queue_tailp = &oldsd->output_queue;
 	}
+	/* Append NAPI poll list from offline CPU. */
+	if (!list_empty(&oldsd->poll_list)) {
+		list_splice_init(&oldsd->poll_list, &sd->poll_list);
+		raise_softirq_irqoff(NET_RX_SOFTIRQ);
+	}
 
 	raise_softirq_irqoff(NET_TX_SOFTIRQ);
 	local_irq_enable();
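
To make the effect of the added hunk easier to follow outside the kernel, here is a small user-space model. It is a sketch only: struct softnet_model, migrate_poll_list() and the list helpers are invented for illustration, and a boolean flag stands in for raise_softirq_irqoff(NET_RX_SOFTIRQ).

```c
#include <stdbool.h>
#include <stddef.h>

/* Minimal circular doubly linked list, modeled on the kernel's list_head. */
struct list_node {
	struct list_node *prev, *next;
};

static void list_init(struct list_node *head)
{
	head->prev = head->next = head;
}

static bool list_is_empty(const struct list_node *head)
{
	return head->next == head;
}

static void list_add_tail_node(struct list_node *node, struct list_node *head)
{
	node->prev = head->prev;
	node->next = head;
	head->prev->next = node;
	head->prev = node;
}

static size_t list_length(const struct list_node *head)
{
	size_t n = 0;

	for (const struct list_node *p = head->next; p != head; p = p->next)
		n++;
	return n;
}

/* Hypothetical stand-in for the per-CPU softnet_data fields involved. */
struct softnet_model {
	struct list_node poll_list;
	bool rx_softirq_pending;
};

/*
 * Model of the added hunk: move all entries from the offline CPU's
 * poll_list to the front of the current CPU's list (the position the
 * kernel's list_splice_init() uses), leave the old list empty, and flag
 * RX processing on the current CPU.
 */
static void migrate_poll_list(struct softnet_model *oldsd,
			      struct softnet_model *sd)
{
	struct list_node *from = &oldsd->poll_list, *to = &sd->poll_list;

	if (list_is_empty(from))
		return;

	struct list_node *first = from->next, *last = from->prev;

	first->prev = to;
	last->next = to->next;
	to->next->prev = last;
	to->next = first;
	list_init(from);

	sd->rx_softirq_pending = true;	/* stands in for raise_softirq_irqoff() */
}
```

The point of the model is that nothing is left behind on the offline CPU's list, and the flag is only set when something was actually moved.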




* Re: [PATCH net-next] net: remove interrupt.h inclusion from netdevice.h
From: David Miller @ 2011-06-07  5:55 UTC (permalink / raw)
  To: adobriyan; +Cc: netdev
In-Reply-To: <20110606204346.GA24175@p183.telecom.by>

From: Alexey Dobriyan <adobriyan@gmail.com>
Date: Mon, 6 Jun 2011 23:43:46 +0300

> * remove interrupt.h inclusion from netdevice.h -- not needed
> * fixup fallout, add interrupt.h and hardirq.h back where needed.
> 
> Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>

Thanks for doing this, patch applied.


* Re: [PATCH net-next-2.6] be2net: Fix Rx pause counter for lancer
From: David Miller @ 2011-06-07  5:54 UTC (permalink / raw)
  To: padmanabh.ratnakar; +Cc: netdev, selvin.xavier
In-Reply-To: <78aa0be5-ee8d-4d65-88aa-6ab858bb7259@exht1.ad.emulex.com>

From: Padmanabh Ratnakar <padmanabh.ratnakar@Emulex.Com>
Date: Mon, 6 Jun 2011 17:57:13 +0530

> From: Selvin Xavier <selvin.xavier@emulex.com>
> 
> Fixed Rx pause counter for Lancer. Swapping hi and lo words.
> 
> Signed-off-by: Selvin Xavier <selvin.xavier@emulex.com>
> Signed-off-by: Padmanabh Ratnakar <padmanabh.ratnakar@emulex.com>

Applied, thanks.


* Re: [PATCH] qlge: remove unecessary if statement
From: David Miller @ 2011-06-07  5:53 UTC (permalink / raw)
  To: Gregory.Dietsche; +Cc: ron.mercer, linux-driver, netdev, linux-kernel
In-Reply-To: <1307321053-21695-1-git-send-email-Gregory.Dietsche@cuw.edu>

From: Greg Dietsche <Gregory.Dietsche@cuw.edu>
Date: Sun,  5 Jun 2011 19:44:13 -0500

> The code always returns 'status' regardless, so the if (status) check is unnecessary.
> 
> Signed-off-by: Greg Dietsche <Gregory.Dietsche@cuw.edu>

Applied, thanks.


* Re: [net-next-2.6 PATCH] enic: Add support for MTU change via port profile on a dynamic vnic
From: David Miller @ 2011-06-07  5:48 UTC (permalink / raw)
  To: roprabhu; +Cc: netdev
In-Reply-To: <20110604003517.21165.45654.stgit@savbu-pc100.cisco.com>

From: Roopa Prabhu <roprabhu@cisco.com>
Date: Fri, 03 Jun 2011 17:35:17 -0700

> From: Roopa Prabhu <roprabhu@cisco.com>
> 
> enic driver gets MTU change notifications for MTU changes in the
> port profile associated to a dynamic vnic. This patch adds support
> in enic driver to set new MTU on the dynamic vnic and dynamically
> adjust its buffers with new MTU size in response to such notifications.
> 
> Signed-off-by: Roopa Prabhu <roprabhu@cisco.com>
> Signed-off-by: Vasanthy Kolluri <vkolluri@cisco.com>
> Signed-off-by: David Wang <dwang2@cisco.com>
> Signed-off-by: Christian Benvenuti <benve@cisco.com>

Applied, thanks.


* Re: [PATCH] af_packet: prevent information leak
From: David Miller @ 2011-06-07  5:42 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev, kaber
In-Reply-To: <1307425119.2642.63.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Tue, 07 Jun 2011 07:38:39 +0200

> On Monday, 06 June 2011 at 22:24 -0700, David Miller wrote:
>> From: Eric Dumazet <eric.dumazet@gmail.com>
>> Date: Tue, 07 Jun 2011 07:02:19 +0200
>> 
>> > In 2.6.27, commit 393e52e33c6c2 (packet: deliver VLAN TCI to userspace)
>> > added a small information leak.
>> > 
>> > Add padding field and make sure it's zeroed before copy to user.
>> > 
>> > Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
>> > CC: Patrick McHardy <kaber@trash.net>
>> 
>> I fear this will change the size of these structures on some weird
>> architecture.  Doesn't ARM, for example, have weird rules
>> wrt. alignment and structure sizing when "smaller than word" elements
>> are involved?
>> 
>> That's why we need __packed on:
>> 
>> struct nd_opt_hdr {
>> 	__u8		nd_opt_type;
>> 	__u8		nd_opt_len;
>> } __packed;
>> 
>> for example.
>> 
>> Probably safe to just do a memset of the tail, and the constant length
>> will evaluate to zero on these weird platforms.  On others, where the
>> padding does matter, the memset will emit the same code as your new
>> assignments do.
> 
> It should not matter for these structures, or 393e52e33c6c2 would have
> broken applications on all arches (not only ARM), since it enlarged them
> without any warning ;)

Indeed.  Ok, I'm convinced.

Applied and queued up for -stable, thanks!


* Re: [PATCH] af_packet: prevent information leak
From: Eric Dumazet @ 2011-06-07  5:41 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, kaber
In-Reply-To: <1307425119.2642.63.camel@edumazet-laptop>

On Tuesday, 07 June 2011 at 07:38 +0200, Eric Dumazet wrote:

> It should not matter for these structures, or 393e52e33c6c2 would have
> broken applications on all arches (not only ARM), since it enlarged them
> without any warning ;)
> 
> 
> 

BTW, it means that next time we want to add a field to these structures,
we can't use the padding space, or else an application can't know for sure
whether the kernel wrote padding (say on linux-3.0) or the new field
(linux-3.8).





* Re: [PATCH] af_packet: prevent information leak
From: Eric Dumazet @ 2011-06-07  5:38 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, kaber
In-Reply-To: <20110606.222416.1114593259198630297.davem@davemloft.net>

On Monday, 06 June 2011 at 22:24 -0700, David Miller wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Tue, 07 Jun 2011 07:02:19 +0200
> 
> > In 2.6.27, commit 393e52e33c6c2 (packet: deliver VLAN TCI to userspace)
> > added a small information leak.
> > 
> > Add padding field and make sure it's zeroed before copy to user.
> > 
> > Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> > CC: Patrick McHardy <kaber@trash.net>
> 
> I fear this will change the size of these structures on some weird
> architecture.  Doesn't ARM, for example, have weird rules
> wrt. alignment and structure sizing when "smaller than word" elements
> are involved?
> 
> That's why we need __packed on:
> 
> struct nd_opt_hdr {
> 	__u8		nd_opt_type;
> 	__u8		nd_opt_len;
> } __packed;
> 
> for example.
> 
> Probably safe to just do a memset of the tail, and the constant length
> will evaluate to zero on these weird platforms.  On others, where the
> padding does matter, the memset will emit the same code as your new
> assignments do.

It should not matter for these structures, or 393e52e33c6c2 would have
broken applications on all arches (not only ARM), since it enlarged them
without any warning ;)
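
The size argument can be checked with a small user-space sketch. The structs below are local copies written for illustration (uintN_t standing in for __uN); the three leading 32-bit fields are not shown in the patch context and are filled in here as an assumption about the header layout.

```c
#include <stddef.h>
#include <stdint.h>

/* Local copies for illustration; uintN_t stands in for the kernel's __uN. */
struct aux_before {		/* layout without the explicit padding field */
	uint32_t tp_status;
	uint32_t tp_len;
	uint32_t tp_snaplen;
	uint16_t tp_mac;
	uint16_t tp_net;
	uint16_t tp_vlan_tci;
};

struct aux_after {		/* layout with the explicit padding field */
	uint32_t tp_status;
	uint32_t tp_len;
	uint32_t tp_snaplen;
	uint16_t tp_mac;
	uint16_t tp_net;
	uint16_t tp_vlan_tci;
	uint16_t tp_padding;
};

/* Bytes the compiler appends after tp_vlan_tci when no field is declared. */
static size_t implicit_tail_bytes(void)
{
	return sizeof(struct aux_before) -
	       (offsetof(struct aux_before, tp_vlan_tci) + sizeof(uint16_t));
}

/* Nonzero when naming the padding leaves the structure size unchanged. */
static int size_unchanged(void)
{
	return sizeof(struct aux_before) == sizeof(struct aux_after);
}
```

On ABIs where uint32_t is 4-byte aligned, the compiler already rounds the first layout up to a multiple of 4, so the explicit field only gives a name to bytes that were there anyway.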






* Re: [PATCH] af_packet: prevent information leak
From: David Miller @ 2011-06-07  5:24 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev, kaber
In-Reply-To: <1307422939.2642.57.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Tue, 07 Jun 2011 07:02:19 +0200

> In 2.6.27, commit 393e52e33c6c2 (packet: deliver VLAN TCI to userspace)
> added a small information leak.
> 
> Add padding field and make sure it's zeroed before copy to user.
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> CC: Patrick McHardy <kaber@trash.net>

I fear this will change the size of these structures on some weird
architecture.  Doesn't ARM, for example, have weird rules
wrt. alignment and structure sizing when "smaller than word" elements
are involved?

That's why we need __packed on:

struct nd_opt_hdr {
	__u8		nd_opt_type;
	__u8		nd_opt_len;
} __packed;

for example.

Probably safe to just do a memset of the tail, and the constant length
will evaluate to zero on these weird platforms.  On others, where the
padding does matter, the memset will emit the same code as your new
assignments do.
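
A rough user-space sketch of the "memset of the tail" idea follows. TAIL_OFFSET(), TAIL_LEN(), struct hdr_model and clear_tail() are names invented here, and the packed attribute is the GCC/Clang spelling; this is not the code that was applied.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Offset of the first byte after member 'last' in 'type'. */
#define TAIL_OFFSET(type, last) \
	(offsetof(type, last) + sizeof(((type *)0)->last))
/* Number of trailing padding bytes; a compile-time constant. */
#define TAIL_LEN(type, last) (sizeof(type) - TAIL_OFFSET(type, last))

/* Hypothetical structure ending in a 16-bit field, so it may carry padding. */
struct hdr_model {
	uint32_t tp_sec;
	uint32_t tp_nsec;
	uint16_t tp_vlan_tci;
};

/* Packed structure: no trailing padding, so TAIL_LEN() evaluates to 0. */
struct packed_model {
	uint8_t type;
	uint8_t len;
} __attribute__((packed));

/* Zero whatever lies between the last member and the end of the struct. */
static void clear_tail(struct hdr_model *h)
{
	memset((unsigned char *)h + TAIL_OFFSET(struct hdr_model, tp_vlan_tci),
	       0, TAIL_LEN(struct hdr_model, tp_vlan_tci));
}
```

Where the layout has no trailing padding the length is the constant 0 and the memset is a no-op; where padding exists, exactly those bytes are cleared.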


* Re: Fw: [PATCH] e100: Fix inconsistency in bad frames handling
From: Eric Dumazet @ 2011-06-07  5:11 UTC (permalink / raw)
  To: Andrew Morton; +Cc: netdev, e1000-devel, Andrea Merello
In-Reply-To: <20110606214454.4108aff4.akpm@linux-foundation.org>

On Monday, 06 June 2011 at 21:44 -0700, Andrew Morton wrote:
> 
> Begin forwarded message:
> 
> Date: Sun, 5 Jun 2011 03:14:49 +0200
> From: Andrea Merello <andrea.merello@gmail.com>
> To: linux-kernel@vger.kernel.org
> Subject: [PATCH] e100: Fix inconsistency in bad frames handling
> 
> 
> Hello!
> 
> In e100 driver it seems that the intention was to accept bad frames in
> promiscuous mode and loopback mode.
> I think this is evident because of the following code in the driver:
> 
> if (nic->flags & promiscuous || nic->loopback) {
>                config->rx_save_bad_frames = 0x1;       /* 1=save, 0=discard */
>                config->rx_discard_short_frames = 0x0;  /* 1=discard, 0=save */
>                config->promiscuous_mode = 0x1;         /* 1=on, 0=off */
>        }
> 
> 
> However this intention is not really realized because bad frames are
> discarded later by SW check.
> This patch finally honors the above intention, making the RX code
> let bad frames pass when the NIC is in promiscuous or loopback
> mode.
> 
> This helped me a lot to debug an FPGA ethernet core.
> Maybe it can be also useful to someone else..
> 
> Thanks
> Andrea
> 
>  --- drivers/net/e100_orig.c     2011-06-14 23:29:38.322267075 +0200
> +++ drivers/net/e100.c  2011-06-14 23:34:10.700791472 +0200
> @@ -1975,7 +1975,8 @@ static int e100_rx_indicate(struct nic *
>        skb_put(skb, actual_size);
>        skb->protocol = eth_type_trans(skb, nic->netdev);
> 
> -       if (unlikely(!(rfd_status & cb_ok))) {
> +       if (unlikely(!(nic->flags & promiscuous || nic->loopback) &&
> +           !(rfd_status & cb_ok))) {
>                /* Don't indicate if hardware indicates errors */
>                dev_kfree_skb_any(skb);
>        } else if (actual_size > ETH_DATA_LEN + VLAN_ETH_HLEN) {


Thanks Andrew, subject already opened on netdev :)

http://www.spinics.net/lists/netdev/msg166301.html

Let's close this thread and continue on the previous one?





* Re: [PATCH 2/3] ipv4: Fix packet size calculation for IPsec packets in __ip_append_data
From: Eric Dumazet @ 2011-06-07  5:06 UTC (permalink / raw)
  To: Steffen Klassert; +Cc: David Miller, Herbert Xu, netdev
In-Reply-To: <20110606085247.GE31505@secunet.com>

On Monday, 06 June 2011 at 10:52 +0200, Steffen Klassert wrote:
> On Mon, Jun 06, 2011 at 09:38:19AM +0200, Eric Dumazet wrote:
> > 
> > Woh, I am afraid I won't have time in following days to check your
> > assertion.
> 
> My test setup was the following:
> 
> I use an IPsec tunnel with tunnel endpoints 192.168.1.1 and 192.168.1.2
> 
> Then I do at 192.168.1.2
> 
> ping -c1 -M do -s 1410 192.168.1.1
> 
> PING 192.168.1.1 (192.168.1.1) 1410(1438) bytes of data.
> From 192.168.1.2 icmp_seq=1 Frag needed and DF set (mtu = 1438)
> 
> --- 192.168.1.1 ping statistics ---
> 0 packets transmitted, 0 received, +1 errors
> 
> So the packet matches the mtu but it is not sent.
> I used a kernel with your patch as head commit.
> 
> Reverting your patch (going one commit deeper in the history):
> 
> ping -c1 -M do -s 1410 192.168.1.1
> 
> PING 192.168.1.1 (192.168.1.1) 1410(1438) bytes of data.
> 1418 bytes from 192.168.1.1: icmp_seq=1 ttl=64 time=3.01 ms
> 
> --- 192.168.1.1 ping statistics ---
> 1 packets transmitted, 1 received, 0% packet loss, time 0ms
> rtt min/avg/max/mdev = 3.014/3.014/3.014/0.000 ms
> 
> > 
> > What about original problem then, how should we fix it ?
> > 
> 
> Hm, I don't know. I'll try to reproduce it here.
> 
> > We do have some cases where at least one fragment (the last one) is
> > oversized.
> 
> trailer_len is used only on IPsec so the problem exists only when
> using IPsec, right?
> 
> > 
> > I remember I used Nick Bowler scripts at that time, I might find them
> > again...
> 
> Would be nice if you could provide these scripts and some information
> on how to reproduce the problem.
> 

Nick's mail was:

http://www.spinics.net/lists/netdev/msg141308.html

Unfortunately I could not find where on my machines I put my own
scripts...

Not a big deal, I suspect we can revert my commit if you say it added a
regression :)
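
As a side note on the figures in the quoted ping output, "1410(1438)" is the ICMP payload plus the ICMP and IPv4 headers. A small sketch of that arithmetic, assuming an IPv4 header without options (the helper names are made up):

```c
/* Header sizes assumed here: IPv4 without options, ICMP echo header. */
enum {
	IPV4_HDR_LEN = 20,
	ICMP_HDR_LEN = 8,
};

/* Total IPv4 packet length for an ICMP echo with 'payload' data bytes. */
static unsigned int icmp_echo_ip_len(unsigned int payload)
{
	return payload + ICMP_HDR_LEN + IPV4_HDR_LEN;
}

/* Nonzero if such a packet fits into 'mtu' without fragmentation. */
static int fits_mtu(unsigned int payload, unsigned int mtu)
{
	return icmp_echo_ip_len(payload) <= mtu;
}
```

So a 1410-byte payload gives a 1438-byte packet, which is exactly the reported mtu and should therefore pass with DF set.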

Thanks




* [PATCH] af_packet: prevent information leak
From: Eric Dumazet @ 2011-06-07  5:02 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Patrick McHardy

In 2.6.27, commit 393e52e33c6c2 (packet: deliver VLAN TCI to userspace)
added a small information leak.

Add padding field and make sure it's zeroed before copy to user.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Patrick McHardy <kaber@trash.net>
---
 include/linux/if_packet.h |    2 ++
 net/packet/af_packet.c    |    2 ++
 2 files changed, 4 insertions(+)

diff --git a/include/linux/if_packet.h b/include/linux/if_packet.h
index 6d66ce1..7b31863 100644
--- a/include/linux/if_packet.h
+++ b/include/linux/if_packet.h
@@ -62,6 +62,7 @@ struct tpacket_auxdata {
 	__u16		tp_mac;
 	__u16		tp_net;
 	__u16		tp_vlan_tci;
+	__u16		tp_padding;
 };
 
 /* Rx ring - header status */
@@ -101,6 +102,7 @@ struct tpacket2_hdr {
 	__u32		tp_sec;
 	__u32		tp_nsec;
 	__u16		tp_vlan_tci;
+	__u16		tp_padding;
 };
 
 #define TPACKET2_HDRLEN		(TPACKET_ALIGN(sizeof(struct tpacket2_hdr)) + sizeof(struct sockaddr_ll))
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index ba248d9..c0c3cda 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -804,6 +804,7 @@ static int tpacket_rcv(struct sk_buff *skb, struct net_device *dev,
 		} else {
 			h.h2->tp_vlan_tci = 0;
 		}
+		h.h2->tp_padding = 0;
 		hdrlen = sizeof(*h.h2);
 		break;
 	default:
@@ -1736,6 +1737,7 @@ static int packet_recvmsg(struct kiocb *iocb, struct socket *sock,
 		} else {
 			aux.tp_vlan_tci = 0;
 		}
+		aux.tp_padding = 0;
 		put_cmsg(msg, SOL_PACKET, PACKET_AUXDATA, sizeof(aux), &aux);
 	}
 




* Fw: [PATCH] e100: Fix inconsistency in bad frames handling
From: Andrew Morton @ 2011-06-07  4:44 UTC (permalink / raw)
  To: netdev, e1000-devel; +Cc: Andrea Merello



Begin forwarded message:

Date: Sun, 5 Jun 2011 03:14:49 +0200
From: Andrea Merello <andrea.merello@gmail.com>
To: linux-kernel@vger.kernel.org
Subject: [PATCH] e100: Fix inconsistency in bad frames handling


Hello!

In e100 driver it seems that the intention was to accept bad frames in
promiscuous mode and loopback mode.
I think this is evident because of the following code in the driver:

if (nic->flags & promiscuous || nic->loopback) {
               config->rx_save_bad_frames = 0x1;       /* 1=save, 0=discard */
               config->rx_discard_short_frames = 0x0;  /* 1=discard, 0=save */
               config->promiscuous_mode = 0x1;         /* 1=on, 0=off */
       }


However this intention is not really realized because bad frames are
discarded later by SW check.
This patch finally honors the above intention, making the RX code
let bad frames pass when the NIC is in promiscuous or loopback
mode.

This helped me a lot to debug an FPGA ethernet core.
Maybe it can be also useful to someone else..

Thanks
Andrea

 --- drivers/net/e100_orig.c     2011-06-14 23:29:38.322267075 +0200
+++ drivers/net/e100.c  2011-06-14 23:34:10.700791472 +0200
@@ -1975,7 +1975,8 @@ static int e100_rx_indicate(struct nic *
       skb_put(skb, actual_size);
       skb->protocol = eth_type_trans(skb, nic->netdev);

-       if (unlikely(!(rfd_status & cb_ok))) {
+       if (unlikely(!(nic->flags & promiscuous || nic->loopback) &&
+           !(rfd_status & cb_ok))) {
               /* Don't indicate if hardware indicates errors */
               dev_kfree_skb_any(skb);
       } else if (actual_size > ETH_DATA_LEN + VLAN_ETH_HLEN) {
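
The changed condition can be read as a plain boolean function. The sketch below is a user-space model with invented names (e100_drops_frame_model and its parameters); it only mirrors the logic of the new if statement, not the driver's data structures.

```c
#include <stdbool.h>

/*
 * Model of the patched test: a frame is dropped only when the hardware
 * reported an error (cb_ok clear) and the NIC is neither promiscuous
 * nor in loopback mode.
 */
static bool e100_drops_frame_model(bool promiscuous, bool loopback, bool cb_ok)
{
	bool accept_bad_frames = promiscuous || loopback;

	return !accept_bad_frames && !cb_ok;
}
```

Before the patch the equivalent model would be just !cb_ok, so bad frames were dropped regardless of mode.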


* Re: [PATCH 2/2] net: dummy: allocate devices with alloc_netdev_id
From: David Miller @ 2011-06-07  3:38 UTC (permalink / raw)
  To: eric.dumazet; +Cc: lucian.grijincu, netdev
In-Reply-To: <1307416765.2642.37.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Tue, 07 Jun 2011 05:19:25 +0200

> On Tuesday, 07 June 2011 at 04:39 +0300, Lucian Adrian Grijincu wrote:
>> The most likely case is that no one else is registering devices with a
>> name like "dummy%d".
>> 
>> We can bring the complexity down by replacing:
>> - alloc_netdev which is O(N) with
>> - alloc_netdev_id which, on the average case, is O(1).
>> 
>> $ time modprobe dummy numdummies=5000
>> - with alloc_netdev   : 9.50s
>> - with alloc_netdev_id: 3.50s
>> 
>> NOTE: Stats generated on a heavily patched 3.0-rc1 which replaces the
>>       current O(N^2) sysctl algorithm with a better one.
> 
> Yes, and disabled hotplug I guess.
> 
> Don't try this on a random computer ;)
> 
> # time modprobe dummy numdummies=5000
> 
> real	4m45.646s
> user	0m0.000s
> sys	0m12.440s
> # uptime
>  05:13:46 up 13:30,  3 users,  load average: 11221.41, 6918.70, 3101.12

ROFL
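
The complexity claim in the quoted text can be illustrated with a toy model. Everything below is hypothetical: struct dev_table and both functions are invented for this sketch, the 'taken' array merely stands in for a constant-time name lookup, and how the proposed alloc_netdev_id actually works is not shown in this thread.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_DEVS 64

/* Toy registry; 'steps' counts device visits and lookups for comparison. */
struct dev_table {
	int ids[MAX_DEVS];	/* registered "dummy<N>" indices, in order */
	bool taken[MAX_DEVS];	/* stands in for a constant-time name lookup */
	size_t count;
	size_t steps;
};

/* Pattern style ("dummy%d"): visit every device, then pick the lowest free index. */
static int alloc_by_pattern(struct dev_table *t)
{
	bool inuse[MAX_DEVS] = { false };

	for (size_t i = 0; i < t->count; i++) {
		inuse[t->ids[i]] = true;
		t->steps++;
	}
	for (int id = 0; id < MAX_DEVS; id++) {
		if (!inuse[id]) {
			t->ids[t->count++] = id;
			t->taken[id] = true;
			return id;
		}
	}
	return -1;	/* table full */
}

/* Caller-supplied index: one lookup, assuming the name is usually free. */
static int alloc_with_id(struct dev_table *t, int id)
{
	t->steps++;
	if (id < 0 || id >= MAX_DEVS || t->taken[id])
		return -1;
	t->ids[t->count++] = id;
	t->taken[id] = true;
	return id;
}
```

Registering N devices costs on the order of N*N visits in the first style and N lookups in the second, which is the shape of the difference the quoted timings point at.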


* Re: KVM induced panic on 2.6.38[2367] & 2.6.39
From: Brad Campbell @ 2011-06-07  3:33 UTC (permalink / raw)
  To: Bart De Schuymer; +Cc: kvm, linux-mm, linux-kernel, netdev, netfilter-devel
In-Reply-To: <4DED344D.7000005@pandora.be>

On 07/06/11 04:10, Bart De Schuymer wrote:
> Hi Brad,
>
> This has probably nothing to do with ebtables, so please rmmod in case
> it's loaded.
> A few questions I didn't directly see an answer to in the threads I
> scanned...
> I'm assuming you actually use the bridging firewall functionality. So,
> what iptables modules do you use? Can you reduce your iptables rules to
> a core that triggers the bug?
> Or does it get triggered even with an empty set of firewall rules?
> Are you using a stock .35 kernel or is it patched?
> Is this something I can trigger on a poor guy's laptop or does it
> require specialized hardware (I'm catching up on qemu/kvm...)?

Not specialised hardware as such, I've just not been able to reproduce 
it outside of this specific operating scenario.

I can't trigger it with empty firewall rules as it relies on a DNAT to 
occur. If I try it directly to the internal IP address (as I have to 
without netfilter loaded) then of course nothing fails.

It's a pain in the bum as a fault, but it's one I can easily reproduce 
as long as I use the same set of circumstances.

I'll try using 3.0-rc2 (current git) tonight, and if I can reproduce it 
on that then I'll attempt to pare down the IPTABLES rules to a bare minimum.

It is nothing to do with ebtables as I don't compile it. I'm not really 
sure about "bridging firewall" functionality. I just use a couple of 
hand coded bash scripts to set the tables up.

brad@srv:~$ lsmod
Module                  Size  Used by
xt_iprange              1637  1
xt_DSCP                 2077  2
xt_length               1216  1
xt_CLASSIFY             1091  26
sch_sfq                 6681  4
xt_CHECKSUM             1229  2
ipt_REJECT              2277  1
ipt_MASQUERADE          1759  7
ipt_REDIRECT            1133  1
xt_recent               8223  2
xt_state                1226  5
iptable_nat             3993  1
nf_nat                 16773  3 ipt_MASQUERADE,ipt_REDIRECT,iptable_nat
nf_conntrack_ipv4      11868  8 iptable_nat,nf_nat
nf_conntrack           60962  5 ipt_MASQUERADE,xt_state,iptable_nat,nf_nat,nf_conntrack_ipv4
nf_defrag_ipv4          1417  1 nf_conntrack_ipv4
xt_TCPMSS               2567  2
xt_tcpmss               1469  0
xt_tcpudp               2467  56
iptable_mangle          1487  1
pppoe                   9574  2
pppox                   2188  1 pppoe
iptable_filter          1442  1
ip_tables              16762  3 iptable_nat,iptable_mangle,iptable_filter
x_tables               20462  17 xt_iprange,xt_DSCP,xt_length,xt_CLASSIFY,xt_CHECKSUM,ipt_REJECT,ipt_MASQUERADE,ipt_REDIRECT,xt_recent,xt_state,iptable_nat,xt_TCPMSS,xt_tcpmss,xt_tcpudp,iptable_mangle,iptable_filter,ip_tables
ppp_generic            24243  6 pppoe,pppox
slhc                    5293  1 ppp_generic
cls_u32                 6468  6
sch_htb                14432  2
deflate                 1937  0
zlib_deflate           21228  1 deflate
des_generic            16135  0
cbc                     2721  0
ecb                     1975  0
crypto_blkcipher       13645  2 cbc,ecb
sha1_generic            2095  0
md5                     4001  0
hmac                    2977  0
crypto_hash            14519  3 sha1_generic,md5,hmac
cryptomgr               2636  0
aead                    6137  1 cryptomgr
crypto_algapi          15289  9 deflate,des_generic,cbc,ecb,crypto_blkcipher,hmac,crypto_hash,cryptomgr,aead
af_key                 27372  0
fuse                   66747  1
w83627ehf              32052  0
hwmon_vid               2867  1 w83627ehf
vhost_net              16802  6
powernow_k8            12932  0
mperf                   1263  1 powernow_k8
kvm_amd                53431  24
kvm                   235155  1 kvm_amd
pl2303                 12732  1
xhci_hcd               62865  0
i2c_piix4               8391  0
k10temp                 3183  0
usbserial              34452  3 pl2303
usb_storage            37887  1
usb_libusual           10999  1 usb_storage
ohci_hcd               18105  0
ehci_hcd               33641  0
ahci                   20748  4
usbcore               130936  7 pl2303,xhci_hcd,usbserial,usb_storage,usb_libusual,ohci_hcd,ehci_hcd
libahci                21202  1 ahci
sata_mv                26939  0
megaraid_sas           71659  14

Nat Table (external ip substituted for xxx.xxx.xxx.xxx)

Chain PREROUTING (policy ACCEPT 1761K packets, 152M bytes)
  pkts bytes target     prot opt in     out     source               destination
     5   210 DNAT       udp  --  ppp0   *       0.0.0.0/0            0.0.0.0/0           udp dpt:1195 to:192.168.253.199
     6   252 DNAT       udp  --  !ppp0  *       0.0.0.0/0            xxx.xxx.xxx.xxx       udp dpt:1195 to:192.168.253.199
     0     0 DNAT       tcp  --  ppp0   *       0.0.0.0/0            0.0.0.0/0           tcp dpt:25001 to:192.168.253.199:465
     0     0 DNAT       tcp  --  ppp0   *       0.0.0.0/0            0.0.0.0/0           tcp dpt:25000 to:192.168.253.199:993
     0     0 DNAT       tcp  --  !ppp0  *       0.0.0.0/0            xxx.xxx.xxx.xxx       tcp dpt:25001 to:192.168.253.199:465
     0     0 DNAT       tcp  --  !ppp0  *       0.0.0.0/0            xxx.xxx.xxx.xxx       tcp dpt:25000 to:192.168.253.199:993
     2   142 DNAT       47   --  ppp0   *       0.0.0.0/0            0.0.0.0/0           to:192.168.253.199
    18   880 DNAT       tcp  --  ppp0   *       0.0.0.0/0            0.0.0.0/0           tcp dpt:1723 to:192.168.253.199
     0     0 DNAT       47   --  !ppp0  *       0.0.0.0/0            xxx.xxx.xxx.xxx       to:192.168.253.199
     0     0 DNAT       tcp  --  !ppp0  *       0.0.0.0/0            xxx.xxx.xxx.xxx       tcp dpt:1723 to:192.168.253.199
  2969  149K DNAT       tcp  --  ppp0   *       0.0.0.0/0            0.0.0.0/0           tcp dpt:443 to:192.168.253.198
    20  1280 DNAT       tcp  --  !ppp0  *       0.0.0.0/0            xxx.xxx.xxx.xxx       tcp dpt:443 to:192.168.253.198
     0     0 DNAT       tcp  --  ppp0   *       0.0.0.0/0            0.0.0.0/0           tcp dpt:3101 to:192.168.253.197
     0     0 DNAT       tcp  --  !ppp0  *       0.0.0.0/0            xxx.xxx.xxx.xxx       tcp dpt:3101 to:192.168.253.197
     0     0 DNAT       tcp  --  !ppp0  *       0.0.0.0/0            xxx.xxx.xxx.xxx       tcp dpt:4101 to:192.168.253.197
 44528 2718K REDIRECT   tcp  --  !ppp0  *       0.0.0.0/0            !192.168.0.0/16      tcp dpt:80 redir ports 8080
     0     0 DNAT       tcp  --  *      *       0.0.0.0/0            0.0.0.0/0           tcp dpt:3724 to:192.168.2.107
  596K   33M DNAT       tcp  --  ppp0   *       0.0.0.0/0            0.0.0.0/0           tcp dpts:2001:2030 to:10.99.99.2
 1420K  119M DNAT       udp  --  ppp0   *       0.0.0.0/0            0.0.0.0/0           udp dpts:2001:2030 to:10.99.99.2
  7483  449K DNAT       all  --  !ppp0  *       0.0.0.0/0            xxx.xxx.xxx.xxx       to:192.168.2.1


Mangle Table

Chain INPUT (policy ACCEPT 270K packets, 17M bytes)
  pkts bytes target     prot opt in     out     source               destination

Chain OUTPUT (policy ACCEPT 170K packets, 12M bytes)
  pkts bytes target     prot opt in     out     source               destination

Chain POSTROUTING (policy ACCEPT 2205K packets, 166M bytes)
  pkts bytes target     prot opt in     out     source               destination
     0     0 MASQUERADE  all  --  *      *       0.0.0.0/0            192.168.254.3
     6   360 ACCEPT     all  --  *      *       0.0.0.0/0            xxx.xxx.xxx.xxx
 20424 2120K MASQUERADE  all  --  *      ppp0    192.168.0.0/16       !192.168.0.0/16
     0     0 MASQUERADE  all  --  *      ppp0    10.0.0.0/24          0.0.0.0/0
     3   204 MASQUERADE  all  --  *      *       192.168.2.0/24       10.8.0.0/24
 1418K  128M MASQUERADE  all  --  *      *       10.99.99.0/24        0.0.0.0/0
 68248 4095K MASQUERADE  all  --  *      *       192.168.253.0/24     10.8.0.0/16
 13305 2405K MASQUERADE  all  --  *      *       192.168.253.0/24     !192.168.0.0/16

Chain PREROUTING (policy ACCEPT 278M packets, 293G bytes)
  pkts bytes target     prot opt in     out     source               destination
   169 55528 CHECKSUM   udp  --  br1    *       0.0.0.0/0            0.0.0.0/0           udp dpt:67 CHECKSUM fill

Chain INPUT (policy ACCEPT 180M packets, 250G bytes)
  pkts bytes target     prot opt in     out     source               destination

Chain FORWARD (policy ACCEPT 98M packets, 44G bytes)
  pkts bytes target     prot opt in     out     source               destination

Chain OUTPUT (policy ACCEPT 155M packets, 180G bytes)
  pkts bytes target     prot opt in     out     source               destination

Chain POSTROUTING (policy ACCEPT 253M packets, 223G bytes)
  pkts bytes target     prot opt in     out     source               destination
   165 54182 CHECKSUM   udp  --  *      br1     0.0.0.0/0            0.0.0.0/0           udp spt:67 CHECKSUM fill
    51  3712 CLASSIFY   tcp  --  *      ppp0    0.0.0.0/0            0.0.0.0/0           tcp dpt:53 CLASSIFY set 1:20
 85274 6454K CLASSIFY   udp  --  *      ppp0    0.0.0.0/0            0.0.0.0/0           udp dpt:53 CLASSIFY set 1:20
   187  257K CLASSIFY   tcp  --  *      ppp0    0.0.0.0/0            0.0.0.0/0           tcp spt:81 CLASSIFY set 1:20
   25M 1180M CLASSIFY   tcp  --  *      ppp0    0.0.0.0/0            0.0.0.0/0           tcp flags:0x3F/0x10 state ESTABLISHED length 40:100 CLASSIFY set 1:15
  728K   67M CLASSIFY   icmp --  *      ppp0    0.0.0.0/0            0.0.0.0/0           CLASSIFY set 1:15
   231 23484 CLASSIFY   tcp  --  *      ppp0    0.0.0.0/0            0.0.0.0/0           tcp dpt:2401 CLASSIFY set 1:15
 65636 5610K CLASSIFY   tcp  --  *      ppp0    0.0.0.0/0            0.0.0.0/0           tcp dpt:22 CLASSIFY set 1:10
  2018  315K CLASSIFY   tcp  --  *      ppp0    0.0.0.0/0            0.0.0.0/0           tcp spt:22 CLASSIFY set 1:10
    80 10092 CLASSIFY   tcp  --  *      ppp0    0.0.0.0/0            0.0.0.0/0           tcp dpt:3389 CLASSIFY set 1:10
 26063 8910K CLASSIFY   tcp  --  *      ppp0    0.0.0.0/0            0.0.0.0/0           tcp dpt:8080 CLASSIFY set 1:15
  932K  131M CLASSIFY   tcp  --  *      ppp0    0.0.0.0/0            0.0.0.0/0           tcp dpt:80 CLASSIFY set 1:15
  3511  267K CLASSIFY   udp  --  *      ppp0    0.0.0.0/0            0.0.0.0/0           udp dpt:123 CLASSIFY set 1:10
     0     0 CLASSIFY   tcp  --  *      ppp0    0.0.0.0/0            0.0.0.0/0           tcp spt:20 CLASSIFY set 1:15
     3   180 CLASSIFY   tcp  --  *      ppp0    0.0.0.0/0            0.0.0.0/0           tcp dpt:20 CLASSIFY set 1:15
 94058   38M CLASSIFY   47   --  *      ppp0    0.0.0.0/0            0.0.0.0/0           CLASSIFY set 1:10
 1086K  183M CLASSIFY   udp  --  *      ppp0    0.0.0.0/0            0.0.0.0/0           udp spt:1194 CLASSIFY set 1:10
 1086K  183M TOS        udp  --  *      ppp0    0.0.0.0/0            0.0.0.0/0           udp spt:1194 TOS set 0x10/0x3f
 48817   10M CLASSIFY   udp  --  *      ppp0    0.0.0.0/0            0.0.0.0/0           udp spt:1195 CLASSIFY set 1:10
 48817   10M TOS        udp  --  *      ppp0    0.0.0.0/0            0.0.0.0/0           udp spt:1195 TOS set 0x10/0x3f
 94058   38M CLASSIFY   47   --  *      ppp0    0.0.0.0/0            0.0.0.0/0           CLASSIFY set 1:15
   106  7207 CLASSIFY   tcp  --  *      ppp0    0.0.0.0/0            0.0.0.0/0           tcp dpt:1863 CLASSIFY set 1:15
  188K   34M CLASSIFY   tcp  --  *      ppp0    0.0.0.0/0            0.0.0.0/0           tcp dpt:443 CLASSIFY set 1:15
 51541 3327K CLASSIFY   tcp  --  *      ppp0    0.0.0.0/0            0.0.0.0/0           tcp dpts:6660:6669 CLASSIFY set 1:15
     0     0 CLASSIFY   tcp  --  *      ppp0    0.0.0.0/0            0.0.0.0/0           tcp spts:2021:2030 CLASSIFY set 1:15
    85  4944 CLASSIFY   tcp  --  *      ppp0    0.0.0.0/0            0.0.0.0/0           tcp dpt:19999 CLASSIFY set 1:15
  208K   86M CLASSIFY   udp  --  *      *       0.0.0.0/0            0.0.0.0/0           source IP range 192.168.2.80-192.168.2.120 CLASSIFY set 1:10
     0     0 CLASSIFY   tcp  --  *      ppp0    0.0.0.0/0 
0.0.0.0/0           tcp spt:12345 CLASSIFY set 1:15
     1    80 CLASSIFY   udp  --  *      ppp0    0.0.0.0/0 
0.0.0.0/0           udp spt:12345 CLASSIFY set 1:15


Default table

Chain INPUT (policy ACCEPT 176M packets, 247G bytes)
  pkts bytes target     prot opt in     out     source 
destination
     0     0 ACCEPT     udp  --  ppp0   *       0.0.0.0/0 
0.0.0.0/0           udp dpt:4569
1187K  582M ACCEPT     udp  --  ppp0   *       0.0.0.0/0 
0.0.0.0/0           udp dpt:1194
     2   577 ACCEPT     udp  --  ppp0   *       0.0.0.0/0 
0.0.0.0/0           udp dpt:1195
    28  1224 ACCEPT     tcp  --  ppp0   *       0.0.0.0/0 
0.0.0.0/0           tcp dpt:3389
   230 12372            tcp  --  ppp0   *       0.0.0.0/0 
0.0.0.0/0           tcp dpt:22 state NEW recent: SET name: DEFAULT side: 
source
     3   180 DROP       tcp  --  ppp0   *       0.0.0.0/0 
0.0.0.0/0           tcp dpt:22 state NEW recent: UPDATE seconds: 300 
hit_count: 4 name: DEFAULT side: source
  1750  143K ACCEPT     tcp  --  ppp0   *       0.0.0.0/0 
0.0.0.0/0           tcp dpt:22
     3   144 ACCEPT     tcp  --  ppp0   *       0.0.0.0/0 
0.0.0.0/0           tcp dpt:113
   120  6090 ACCEPT     tcp  --  ppp0   *       0.0.0.0/0 
0.0.0.0/0           tcp dpt:81
36094   29M ACCEPT     tcp  --  ppp0   *       0.0.0.0/0 
0.0.0.0/0           tcp dpt:25
1456K 1706M ACCEPT     all  --  ppp0   *       0.0.0.0/0 
0.0.0.0/0           state RELATED,ESTABLISHED
31047 2334K REJECT     tcp  --  ppp0   *       0.0.0.0/0 
0.0.0.0/0           tcp option=!2 reject-with tcp-reset
  552K   60M ACCEPT     all  --  !ppp0  *       0.0.0.0/0 
0.0.0.0/0           state NEW
13552 1207K ACCEPT     icmp --  ppp0   *       0.0.0.0/0 
0.0.0.0/0
  5712  392K DROP       all  --  ppp0   *       0.0.0.0/0 
0.0.0.0/0

Chain FORWARD (policy ACCEPT 98M packets, 44G bytes)
  pkts bytes target     prot opt in     out     source 
destination
1207K   68M TCPMSS     tcp  --  *      ppp0    0.0.0.0/0 
0.0.0.0/0           tcp flags:0x06/0x02 TCPMSS clamp to PMTU

Chain OUTPUT (policy ACCEPT 155M packets, 180G bytes)
  pkts bytes target     prot opt in     out     source 
destination
31675 1895K TCPMSS     tcp  --  *      ppp0    0.0.0.0/0 
0.0.0.0/0           tcp flags:0x06/0x02 TCPMSS clamp to PMTU

lsmod

ipt_REJECT              2277  1
ipt_MASQUERADE          1759  7
ipt_REDIRECT            1133  1
xt_recent               8223  2
xt_state                1226  5
iptable_nat             3993  1
nf_nat                 16773  3 ipt_MASQUERADE,ipt_REDIRECT,iptable_nat
nf_conntrack_ipv4      11868  8 iptable_nat,nf_nat
nf_conntrack           60962  5 ipt_MASQUERADE,xt_state,iptable_nat,nf_nat,nf_conntrack_ipv4
nf_defrag_ipv4          1417  1 nf_conntrack_ipv4
xt_TCPMSS               2567  2
xt_tcpmss               1469  0
xt_tcpudp               2467  56
iptable_mangle          1487  1
pppoe                   9574  2
pppox                   2188  1 pppoe
iptable_filter          1442  1
ip_tables              16762  3 iptable_nat,iptable_mangle,iptable_filter
x_tables               20462  17 xt_iprange,xt_DSCP,xt_length,xt_CLASSIFY,xt_CHECKSUM,ipt_REJECT,ipt_MASQUERADE,ipt_REDIRECT,xt_recent,xt_state,iptable_nat,xt_TCPMSS,xt_tcpmss,xt_tcpudp,iptable_mangle,iptable_filter,ip_tables
ppp_generic            24243  6 pppoe,pppox
slhc                    5293  1 ppp_generic
cls_u32                 6468  6
sch_htb                14432  2
deflate                 1937  0
zlib_deflate           21228  1 deflate
des_generic            16135  0
cbc                     2721  0
ecb                     1975  0
crypto_blkcipher       13645  2 cbc,ecb
sha1_generic            2095  0
md5                     4001  0
hmac                    2977  0
crypto_hash            14519  3 sha1_generic,md5,hmac
cryptomgr               2636  0
aead                    6137  1 cryptomgr
crypto_algapi          15289  9 deflate,des_generic,cbc,ecb,crypto_blkcipher,hmac,crypto_hash,cryptomgr,aead
af_key                 27372  0
fuse                   66747  1
w83627ehf              32052  0
hwmon_vid               2867  1 w83627ehf
vhost_net              16802  6
powernow_k8            12932  0
mperf                   1263  1 powernow_k8
kvm_amd                53431  24
kvm                   235155  1 kvm_amd
pl2303                 12732  1
xhci_hcd               62865  0
i2c_piix4               8391  0
k10temp                 3183  0
usbserial              34452  3 pl2303
usb_storage            37887  1
usb_libusual           10999  1 usb_storage
ohci_hcd               18105  0
ehci_hcd               33641  0
ahci                   20748  4
usbcore               130936  7 pl2303,xhci_hcd,usbserial,usb_storage,usb_libusual,ohci_hcd,ehci_hcd
libahci                21202  1 ahci
sata_mv                26939  0
megaraid_sas           71659  14


^ permalink raw reply

* Re: [PATCH 2/2] net: dummy: allocate devices with alloc_netdev_id
From: Eric Dumazet @ 2011-06-07  3:19 UTC (permalink / raw)
  To: Lucian Adrian Grijincu; +Cc: netdev, David S. Miller
In-Reply-To: <1307410786-19110-3-git-send-email-lucian.grijincu@gmail.com>

Le mardi 07 juin 2011 à 04:39 +0300, Lucian Adrian Grijincu a écrit :
> The most likely case is that no one else is registering devices with a
> name like "dummy%d".
> 
> We can bring the complexity down by replacing:
> - alloc_netdev, which is O(N), with
> - alloc_netdev_id, which is O(1) in the average case.
> 
> $ time modprobe dummy numdummies=5000
> - with alloc_netdev   : 9.50s
> - with alloc_netdev_id: 3.50s
> 
> NOTE: Stats generated on a heavily patched 3.0-rc1 which replaces the
>       current O(N^2) sysctl algorithm with a better one.

Yes, and disabled hotplug I guess.

Don't try this on a random computer ;)

# time modprobe dummy numdummies=5000

real	4m45.646s
user	0m0.000s
sys	0m12.440s
# uptime
 05:13:46 up 13:30,  3 users,  load average: 11221.41, 6918.70, 3101.12
# uptime
 05:18:45 up 13:35,  3 users,  load average: 12159.82, 10277.39, 5623.19




^ permalink raw reply

* [PATCH] bonding: use new value of lacp_rate and ad_select
From: Weiping Pan @ 2011-06-07  2:24 UTC (permalink / raw)
  To: jpirko, xiyou.wangcong
  Cc: Weiping Pan, Jay Vosburgh, Andy Gospodarek,
	open list:BONDING DRIVER, open list

There is a bug: when you modify lacp_rate via sysfs,
802.3ad does not use the new value of lacp_rate to transmit packets.
This is because port->actor_oper_port_state isn't changed.

ad_select does work, but both struct bond_params and struct ad_bond_info
have lacp_fast and ad_select. The copies are duplicates and need extra
synchronization, so 802.3ad can simply read them from bond_params
directly every time.
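
To illustrate the first point, here is a minimal userspace sketch of the
idea, not the driver code itself. The simplified structs, the function
name update_lacp_rate() and the value 0x2 for AD_STATE_LACP_TIMEOUT are
assumptions made for this example, and locking is left out entirely.

```c
#include <stdint.h>

/* Assumed value for this sketch: the timeout bit of the actor state. */
#define AD_STATE_LACP_TIMEOUT 0x2

/* Hypothetical, simplified stand-ins for the bonding structures. */
struct port {
	uint8_t actor_oper_port_state;
};

struct bond {
	int lacp_fast;		/* the single copy, as in bond->params */
	struct port ports[4];
	int nports;
};

/* Push the current lacp_fast setting into every port's state bits. */
void update_lacp_rate(struct bond *b)
{
	for (int i = 0; i < b->nports; i++) {
		if (b->lacp_fast)
			b->ports[i].actor_oper_port_state |= AD_STATE_LACP_TIMEOUT;
		else
			b->ports[i].actor_oper_port_state &= (uint8_t)~AD_STATE_LACP_TIMEOUT;
	}
}
```

Without such a step, changing the parameter alone leaves the per-port
state bits at whatever value they had when the port was initialized.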

Signed-off-by: Weiping Pan <panweiping3@gmail.com>
---
 drivers/net/bonding/bond_3ad.c   |   41 ++++++++++++++++++++++++++++++++-----
 drivers/net/bonding/bond_3ad.h   |    7 +----
 drivers/net/bonding/bond_main.c  |    3 +-
 drivers/net/bonding/bond_sysfs.c |    1 +
 4 files changed, 39 insertions(+), 13 deletions(-)

diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
index c7537abc..6122725 100644
--- a/drivers/net/bonding/bond_3ad.c
+++ b/drivers/net/bonding/bond_3ad.c
@@ -262,7 +262,7 @@ static inline u32 __get_agg_selection_mode(struct port *port)
 	if (bond == NULL)
 		return BOND_AD_STABLE;
 
-	return BOND_AD_INFO(bond).agg_select_mode;
+	return bond->params.ad_select;
 }
 
 /**
@@ -1859,7 +1859,6 @@ static void ad_marker_response_received(struct bond_marker *marker,
 void bond_3ad_initiate_agg_selection(struct bonding *bond, int timeout)
 {
 	BOND_AD_INFO(bond).agg_select_timer = timeout;
-	BOND_AD_INFO(bond).agg_select_mode = bond->params.ad_select;
 }
 
 static u16 aggregator_identifier;
@@ -1868,11 +1867,10 @@ static u16 aggregator_identifier;
  * bond_3ad_initialize - initialize a bond's 802.3ad parameters and structures
  * @bond: bonding struct to work on
  * @tick_resolution: tick duration (millisecond resolution)
- * @lacp_fast: boolean. whether fast periodic should be used
  *
  * Can be called only after the mac address of the bond is set.
  */
-void bond_3ad_initialize(struct bonding *bond, u16 tick_resolution, int lacp_fast)
+void bond_3ad_initialize(struct bonding *bond, u16 tick_resolution)
 {
 	// check that the bond is not initialized yet
 	if (MAC_ADDRESS_COMPARE(&(BOND_AD_INFO(bond).system.sys_mac_addr),
@@ -1880,7 +1878,6 @@ void bond_3ad_initialize(struct bonding *bond, u16 tick_resolution, int lacp_fas
 
 		aggregator_identifier = 0;
 
-		BOND_AD_INFO(bond).lacp_fast = lacp_fast;
 		BOND_AD_INFO(bond).system.sys_priority = 0xFFFF;
 		BOND_AD_INFO(bond).system.sys_mac_addr = *((struct mac_addr *)bond->dev->dev_addr);
 
@@ -1903,6 +1900,7 @@ void bond_3ad_initialize(struct bonding *bond, u16 tick_resolution, int lacp_fas
 int bond_3ad_bind_slave(struct slave *slave)
 {
 	struct bonding *bond = bond_get_bond_by_slave(slave);
+	int lacp_fast = bond->params.lacp_fast;
 	struct port *port;
 	struct aggregator *aggregator;
 
@@ -1918,7 +1916,7 @@ int bond_3ad_bind_slave(struct slave *slave)
 		// port initialization
 		port = &(SLAVE_AD_INFO(slave).port);
 
-		ad_initialize_port(port, BOND_AD_INFO(bond).lacp_fast);
+		ad_initialize_port(port, lacp_fast);
 
 		port->slave = slave;
 		port->actor_port_number = SLAVE_AD_INFO(slave).id;
@@ -2473,3 +2471,34 @@ void bond_3ad_lacpdu_recv(struct sk_buff *skb, struct bonding *bond,
 	bond_3ad_rx_indication((struct lacpdu *) skb->data, slave, skb->len);
 	read_unlock(&bond->lock);
 }
+
+/*
+ * When modify lacp_rate parameter via sysfs,
+ * update actor_oper_port_state of each port.
+ *
+ * Hold slave->state_machine_lock,
+ * so we can modify port->actor_oper_port_state,
+ * no matter bond is up or down.
+ */
+void bond_3ad_update_lacp_rate(struct bonding *bond)
+{
+	int i;
+	struct slave *slave;
+	struct port *port = NULL;
+	int lacp_fast;
+
+	read_lock(&bond->lock);
+	lacp_fast = bond->params.lacp_fast;
+
+	bond_for_each_slave(bond, slave, i) {
+		port = &(SLAVE_AD_INFO(slave).port);
+		__get_state_machine_lock(port);
+		if (lacp_fast)
+			port->actor_oper_port_state |= AD_STATE_LACP_TIMEOUT;
+		else
+			port->actor_oper_port_state &= ~AD_STATE_LACP_TIMEOUT;
+		__release_state_machine_lock(port);
+	}
+
+	read_unlock(&bond->lock);
+}
diff --git a/drivers/net/bonding/bond_3ad.h b/drivers/net/bonding/bond_3ad.h
index 0ee3f16..1682e69 100644
--- a/drivers/net/bonding/bond_3ad.h
+++ b/drivers/net/bonding/bond_3ad.h
@@ -253,10 +253,6 @@ struct ad_system {
 struct ad_bond_info {
 	struct ad_system system;	    /* 802.3ad system structure */
 	u32 agg_select_timer;	    // Timer to select aggregator after all adapter's hand shakes
-	u32 agg_select_mode;	    // Mode of selection of active aggregator(bandwidth/count)
-	int lacp_fast;		/* whether fast periodic tx should be
-				 * requested
-				 */
 	struct timer_list ad_timer;
 };
 
@@ -269,7 +265,7 @@ struct ad_slave_info {
 };
 
 // ================= AD Exported functions to the main bonding code ==================
-void bond_3ad_initialize(struct bonding *bond, u16 tick_resolution, int lacp_fast);
+void bond_3ad_initialize(struct bonding *bond, u16 tick_resolution);
 int  bond_3ad_bind_slave(struct slave *slave);
 void bond_3ad_unbind_slave(struct slave *slave);
 void bond_3ad_state_machine_handler(struct work_struct *);
@@ -282,5 +278,6 @@ int bond_3ad_xmit_xor(struct sk_buff *skb, struct net_device *dev);
 void bond_3ad_lacpdu_recv(struct sk_buff *skb, struct bonding *bond,
 			  struct slave *slave);
 int bond_3ad_set_carrier(struct bonding *bond);
+void bond_3ad_update_lacp_rate(struct bonding *bond);
 #endif //__BOND_3AD_H__
 
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 716c852..bb1af9c 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -1843,8 +1843,7 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
 			/* Initialize AD with the number of times that the AD timer is called in 1 second
 			 * can be called only after the mac address of the bond is set
 			 */
-			bond_3ad_initialize(bond, 1000/AD_TIMER_INTERVAL,
-					    bond->params.lacp_fast);
+			bond_3ad_initialize(bond, 1000/AD_TIMER_INTERVAL);
 		} else {
 			SLAVE_AD_INFO(new_slave).id =
 				SLAVE_AD_INFO(new_slave->prev).id + 1;
diff --git a/drivers/net/bonding/bond_sysfs.c b/drivers/net/bonding/bond_sysfs.c
index 88fcb25..03d1196 100644
--- a/drivers/net/bonding/bond_sysfs.c
+++ b/drivers/net/bonding/bond_sysfs.c
@@ -804,6 +804,7 @@ static ssize_t bonding_store_lacp(struct device *d,
 
 	if ((new_value == 1) || (new_value == 0)) {
 		bond->params.lacp_fast = new_value;
+		bond_3ad_update_lacp_rate(bond);
 		pr_info("%s: Setting LACP rate to %s (%d).\n",
 			bond->dev->name, bond_lacp_tbl[new_value].modename,
 			new_value);
-- 
1.7.4.4


^ permalink raw reply related

* [PATCH 2/2] net: dummy: allocate devices with alloc_netdev_id
From: Lucian Adrian Grijincu @ 2011-06-07  1:39 UTC (permalink / raw)
  To: netdev, David S. Miller; +Cc: Eric Dumazet, Lucian Adrian Grijincu
In-Reply-To: <1307410786-19110-1-git-send-email-lucian.grijincu@gmail.com>

The most likely case is that no one else is registering devices with a
name like "dummy%d".

We can bring the complexity down by replacing:
- alloc_netdev, which is O(N), with
- alloc_netdev_id, which is O(1) in the average case.

$ time modprobe dummy numdummies=5000
- with alloc_netdev   : 9.50s
- with alloc_netdev_id: 3.50s

NOTE: Stats generated on a heavily patched 3.0-rc1 which replaces the
      current O(N^2) sysctl algorithm with a better one.

Signed-off-by: Lucian Adrian Grijincu <lucian.grijincu@gmail.com>
---
 drivers/net/dummy.c |    4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/drivers/net/dummy.c b/drivers/net/dummy.c
index 39cf9b9..24d4ee5 100644
--- a/drivers/net/dummy.c
+++ b/drivers/net/dummy.c
@@ -159,12 +159,14 @@ static struct rtnl_link_ops dummy_link_ops __read_mostly = {
 module_param(numdummies, int, 0);
 MODULE_PARM_DESC(numdummies, "Number of dummy pseudo devices");
 
+
+static int last_device_id = -1;
 static int __init dummy_init_one(void)
 {
 	struct net_device *dev_dummy;
 	int err;
 
-	dev_dummy = alloc_netdev(0, "dummy%d", dummy_setup);
+	dev_dummy = alloc_netdev_id(0, "dummy%d", dummy_setup, &last_device_id);
 	if (!dev_dummy)
 		return -ENOMEM;
 
-- 
1.7.5.2.317.g391b14


^ permalink raw reply related

* [PATCH 1/2] net: add alloc_netdev_mqs_id
From: Lucian Adrian Grijincu @ 2011-06-07  1:39 UTC (permalink / raw)
  To: netdev, David S. Miller; +Cc: Eric Dumazet, Lucian Adrian Grijincu
In-Reply-To: <1307410786-19110-1-git-send-email-lucian.grijincu@gmail.com>

The complexity of alloc_netdev_mqs depends on the type of the device name:
- O(nr-net-devices) - for a device name with '%d' in it
- O(1)              - for given device name without any format.

The difference comes from the path chosen in __dev_alloc_name: if '%d'
is found in the name (e.g. 'dummy%d') it will:
- match all the devices in that network namespace against the device
  name format, extracting all used values for '%d' (e.g. 'dummy0',
  'dummy1', 'dummy3' => {0, 1, 3} are used)
- create a device with the smallest unused value (e.g. 'dummy2').

Obviously the O(N) part comes from the for_each_netdev loop. One could keep
around a precomputed table of values that are in use for each pattern
that is of interest (patterns for which there will be large numbers of
devices created) and make sure to mark slots as unused when
unregistering the device. The table would have no use after
registering a device and would need to be netns-specific.

Things get more complicated when taking into consideration device
renames and registration of devices that do not use patterns in names
(e.g. an explicit registration of a device with the 'dummy3' name).

This patch adds a new method of creating device names that aims to sit
in the middle: it accepts a device name pattern with '%d' and the last
value used for '%d'. If the next slot is not taken, alloc_netdev_mqs_id
will be an O(1) operation. If that name is taken, it falls back on
the O(N) algorithm.
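
The naming strategy described above can be sketched in userspace roughly
as follows. This is only an illustration under stated assumptions: the
registry struct and all three function names are hypothetical, the
linear name_in_use() stands in for the kernel's hashed name lookup, and
no real device is allocated.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define NAME_MAX_LEN 16	/* same size as the kernel's IFNAMSIZ */
#define MAX_DEVS     64	/* arbitrary limit for this sketch */

/* Hypothetical stand-in for the per-namespace device list. */
struct registry {
	char names[MAX_DEVS][NAME_MAX_LEN];
	int count;
};

/* Linear here for brevity; stands in for a hashed lookup by name. */
bool name_in_use(const struct registry *r, const char *name)
{
	for (int i = 0; i < r->count; i++)
		if (strcmp(r->names[i], name) == 0)
			return true;
	return false;
}

/* Slow path: scan every name, record which ids are taken and return
 * the smallest free one, or -1 if every id below MAX_DEVS is taken. */
int smallest_unused_id(const struct registry *r, const char *prefix)
{
	bool used[MAX_DEVS] = { false };
	size_t plen = strlen(prefix);

	for (int i = 0; i < r->count; i++) {
		int id;
		char tail;

		if (strncmp(r->names[i], prefix, plen) != 0)
			continue;
		/* Accept only "<prefix><number>" with nothing trailing. */
		if (sscanf(r->names[i] + plen, "%d%c", &id, &tail) == 1 &&
		    id >= 0 && id < MAX_DEVS)
			used[id] = true;
	}
	for (int id = 0; id < MAX_DEVS; id++)
		if (!used[id])
			return id;
	return -1;
}

/* Register a new "<prefix>N" name. *last_id is the hint: on entry the
 * last id handed out, on success the id that was used this time. */
int alloc_name_id(struct registry *r, const char *prefix, int *last_id)
{
	char buf[NAME_MAX_LEN];
	int id = *last_id + 1;

	if (r->count >= MAX_DEVS)
		return -1;

	/* Fast path: try the slot right after the hint. */
	snprintf(buf, sizeof(buf), "%s%d", prefix, id);
	if (id < 0 || id >= MAX_DEVS || name_in_use(r, buf)) {
		/* Fallback: full scan for the smallest unused id. */
		id = smallest_unused_id(r, prefix);
		if (id < 0)
			return -1;
		snprintf(buf, sizeof(buf), "%s%d", prefix, id);
	}
	strcpy(r->names[r->count++], buf);
	*last_id = id;
	return id;
}
```

Note that the fast path does not reuse holes: if an earlier name is
renamed away, the next call still hands out last_id + 1, and only the
fallback scan goes back to the smallest unused value.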

Signed-off-by: Lucian Adrian Grijincu <lucian.grijincu@gmail.com>
---
 include/linux/netdevice.h |    7 +++++
 net/core/dev.c            |   63 +++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 70 insertions(+), 0 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index ca333e7..612c1f3 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2452,9 +2452,16 @@ extern void		ether_setup(struct net_device *dev);
 extern struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name,
 				       void (*setup)(struct net_device *),
 				       unsigned int txqs, unsigned int rxqs);
+extern struct net_device *alloc_netdev_mqs_id(int sizeof_priv, const char *name,
+		void (*setup)(struct net_device *),
+		unsigned int txqs, unsigned int rxqs, int *p_last_id);
+
 #define alloc_netdev(sizeof_priv, name, setup) \
 	alloc_netdev_mqs(sizeof_priv, name, setup, 1, 1)
 
+#define alloc_netdev_id(sizeof_priv, name, setup, p_last_id)	\
+	alloc_netdev_mqs_id(sizeof_priv, name, setup, 1, 1, p_last_id)
+
 #define alloc_netdev_mq(sizeof_priv, name, setup, count) \
 	alloc_netdev_mqs(sizeof_priv, name, setup, count, count)
 
diff --git a/net/core/dev.c b/net/core/dev.c
index 9393078..0862e81 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5908,6 +5908,69 @@ free_p:
 }
 EXPORT_SYMBOL(alloc_netdev_mqs);
 
+
+/**
+ * alloc_netdev_mqs_id - allocate a network device
+ *
+ * @name      - format of the device name. E.g. 'dummy%d'
+ * @p_last_id - IN: last known value that was given to '%d'
+ *              OUT: the value used for '%d' for the newly created device
+ *
+ *	@sizeof_priv:	size of private data to allocate space for
+ *	@setup:		callback to initialize device
+ *	@txqs:		the number of TX subqueues to allocate
+ *	@rxqs:		the number of RX subqueues to allocate
+ *
+ * alloc_netdev_mqs' complexity depends on the device name:
+ *  - O(nr-net-devices) - for a device name with '%d' in it
+ *  - O(1)              - for given device name without any format.
+ *
+ * alloc_netdev_mqs_id takes an extra argument: the last value that was
+ * used to fill '%d' in the name pattern. It uses this to create name
+ * that is likely to not be used (last_id+1) and tries to register a
+ * device with that name - O(1). If that fails it drops to the O(N)
+ * algorithm by sending the device name format.
+ *
+ * alloc_netdev_mqs will always make sure to find the smallest unused
+ * value for the '%d' in the name. alloc_netdev_mqs_id does not.
+ *
+ * E.g.:
+ * - you create 8 devices by calling alloc_netdev_mqs_id ('eth0' .. 'eth7')
+ *   and you know the next free slot is 'eth8'.
+ * - someone renames 'eth2' to 'some-other-name'
+ * - the next device created by alloc_netdev_mqs_id will be 'eth8'
+ *   even though 'eth2' could have been used.
+ */
+struct net_device *alloc_netdev_mqs_id(int sizeof_priv, const char *name,
+		void (*setup)(struct net_device *),
+		unsigned int txqs, unsigned int rxqs, int *p_last_id)
+{
+	struct net_device *dev;
+	char buf[IFNAMSIZ];
+
+	int new_id = (*p_last_id) + 1;
+
+	/* first try with explicit name - O(1) */
+	snprintf(buf, IFNAMSIZ, name, new_id);
+	dev = alloc_netdev_mqs(sizeof_priv, buf, setup, txqs, rxqs);
+	if (dev)
+		goto out;
+
+	/* fallback: create a name automatically - O(N) */
+	dev = alloc_netdev_mqs(sizeof_priv, name, setup, txqs, rxqs);
+	if (!dev)
+		goto fail;
+
+	sscanf(dev->name, name, &new_id);
+
+out:
+	*p_last_id = new_id;
+fail:
+	return dev;
+}
+EXPORT_SYMBOL(alloc_netdev_mqs_id);
+
+
 /**
  *	free_netdev - free network device
  *	@dev: device
-- 
1.7.5.2.317.g391b14


^ permalink raw reply related

* [PATCH 0/2] speed up net device allocation using pattern names
From: Lucian Adrian Grijincu @ 2011-06-07  1:39 UTC (permalink / raw)
  To: netdev, David S. Miller; +Cc: Eric Dumazet, Lucian Adrian Grijincu

The next two patches:
- add a faster way to add net devices using pattern names like "dummy%d"
- call that routine for dummy

Patches are against net-next, but should apply cleanly to 3.0-rc1 too.

Lucian Adrian Grijincu (2):
  net: add alloc_netdev_mqs_id
  net: dummy: allocate devices with alloc_netdev_id

 drivers/net/dummy.c       |    4 ++-
 include/linux/netdevice.h |    7 +++++
 net/core/dev.c            |   63 +++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 73 insertions(+), 1 deletions(-)

-- 
1.7.5.2.317.g391b14


^ permalink raw reply

* Re: suspect locking in net/irda/iriap.c
From: Dave Jones @ 2011-06-07  0:13 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, samuel
In-Reply-To: <20110606.170056.193694265.davem@davemloft.net>

On Mon, Jun 06, 2011 at 05:00:56PM -0700, David Miller wrote:
 > From: Dave Jones <davej@redhat.com>
 > Date: Wed, 20 Apr 2011 23:40:58 -0400
 > 
 > > My reading of that comment suggests that the two locks aren't the same,
 > > so is this just missing a lockdep annotation ?
 > 
 > Dave, I'm going to check in the following to net-2.6 to try and
 > address this.  Let me know how it works for you.
 
will check it out once I'm done bisecting a different bug, thanks.

	Dave


^ permalink raw reply

* Re: [Bugme-new] [Bug 33902] New: tcpi_state field in tcp_info structure reports TCP_CLOSE instead of TCP_TIME_WAIT state
From: David Miller @ 2011-06-07  0:05 UTC (permalink / raw)
  To: akpm; +Cc: netdev, bugzilla-daemon, bugme-daemon, Dmitry.Izbitsky
In-Reply-To: <20110425143421.3267fcc1.akpm@linux-foundation.org>

From: Andrew Morton <akpm@linux-foundation.org>
Date: Mon, 25 Apr 2011 14:34:21 -0700

> On Mon, 25 Apr 2011 08:08:36 GMT
> bugzilla-daemon@bugzilla.kernel.org wrote:
> 
>> Setup - TCP connection in ESTABLISHED state. Local socket calls
>> shutdown(SHUT_RDWR). After that peer calls shutdown(SHUT_RDWR).
>> 
>> Local socket should now be in TIME_WAIT state (from specification point 
>> of view). And it's indeed in TIME_WAIT (TCP_TIME_WAIT) state if we look at 
>> /proc/net/tcp (or netstat -t). However, if one tries to get connection state
>> via tcp_info (getsockopt(TCP_INFO)) the reported state is CLOSED (TCP_CLOSE).
>> 
>> Looks like the problem is in tcp_time_wait() function
>> (net/ipv4/tcp_minisocks.c).
>> It's called with state=TCP_TIME_WAIT, and sets inet_timewait_sock
>> *tw->tw_state field to TCP_TIME_WAIT. That's why the state is reported
>> correctly when looking into /proc. However, at the end it calls tcp_done(sk),
>> which itself calls tcp_set_state(TCP_CLOSE), so sk->sk_state is set to
>> TCP_CLOSE instead of TCP_TIME_WAIT. And it's reported this way via TCP_INFO
>> socket option.
>> 
>> Problem is reproduced on 2.6.26, 2.6.38 and is probably observed on earlier
>> kernels.

As far as the user side of the socket is concerned, it is TCP_CLOSE.

For timewait connections we create a completely separate light-weight object
to manage the network side visible state of the TCP flow.  This is not
accessible from, and is entirely different from, the heavy-weight full
socket we keep around until the user gives up his final reference.

So I do not see this behavior changing, it would be quite invasive and
expensive to make this work as you expect, and only for marginal gain.

^ permalink raw reply

