* Re: bridge/netfilter: regression in 2.6.39.1
From: David Miller @ 2011-06-07 7:52 UTC (permalink / raw)
To: eric.dumazet; +Cc: holler, nhorman, linux-kernel, herbert, netdev
In-Reply-To: <1307370363.3098.37.camel@edumazet-laptop>
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Mon, 06 Jun 2011 16:26:03 +0200
> [PATCH] bridge: provide a cow_metrics method for fake_ops
>
> Like in commit 0972ddb237 (provide cow_metrics() methods to blackhole
> dst_ops), we must provide a cow_metrics for the bridge's fake_dst_ops as
> well.
>
> This fixes a regression coming from commits 62fa8a846d7d (net: Implement
> read-only protection and COW'ing of metrics.) and 33eb9873a28 (bridge:
> initialize fake_rtable metrics)
>
> ip link set mybridge mtu 1234
> -->
...
> Signed-off-by: Alexander Holler <holler@ahsoftware.de>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Applied, thanks everyone.
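The failure mode can be sketched as a small user-space model. It assumes, as the commit message implies, that writing a metric (for example the MTU set by `ip link set mybridge mtu 1234`) on an entry whose metrics are still read-only goes through the ops table's cow_metrics method, so an ops table without that method dereferences a NULL function pointer. All names below (entry, entry_ops, metric_set, the two method implementations, the metric index and values) are hypothetical; the actual patch body is elided above and is not reproduced here.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define N_METRICS 4

/* Hypothetical stand-ins for a dst entry and its ops table. */
struct entry;

struct entry_ops {
	/* Called before the first write to read-only metrics. Returns a
	 * writable array, or NULL to refuse the write. */
	uint32_t *(*cow_metrics)(struct entry *e);
};

struct entry {
	const struct entry_ops *ops;
	const uint32_t *ro_metrics;  /* shared, read-only defaults */
	uint32_t *rw_metrics;        /* private copy once COW'ed, else NULL */
};

static uint32_t private_copy[N_METRICS];

/* A copying method: duplicate the defaults into writable storage. */
static uint32_t *copying_cow_metrics(struct entry *e)
{
	memcpy(private_copy, e->ro_metrics, sizeof(private_copy));
	e->rw_metrics = private_copy;
	return private_copy;
}

/* A stub for a fake entry: decline the copy instead of leaving the
 * method pointer NULL. */
static uint32_t *refusing_cow_metrics(struct entry *e)
{
	(void)e;
	return NULL;
}

static const struct entry_ops copying_ops = { copying_cow_metrics };
static const struct entry_ops fake_ops = { refusing_cow_metrics };
/* Illustrative defaults; index 1 plays the role of "MTU" here. */
static const uint32_t default_metrics[N_METRICS] = { 0, 1500, 0, 0 };

/* Returns 0 if the metric was written, -1 if the write was refused. */
static int metric_set(struct entry *e, int idx, uint32_t val)
{
	uint32_t *p = e->rw_metrics;

	if (p == NULL)
		p = e->ops->cow_metrics(e); /* a NULL method pointer crashes here */
	if (p == NULL)
		return -1;
	p[idx] = val;
	return 0;
}
```

The model only shows why a missing method is fatal while a stub is not; whether a bridge's fake entry should accept or refuse the write is the patch's decision.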
* Re: [PATCH 2/2] net: dummy: allocate devices with alloc_netdev_id
From: Lucian Adrian Grijincu @ 2011-06-07 7:49 UTC (permalink / raw)
To: Eric Dumazet; +Cc: netdev, David S. Miller
In-Reply-To: <1307416765.2642.37.camel@edumazet-laptop>
On Tue, Jun 7, 2011 at 6:19 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Tuesday, 07 June 2011 at 04:39 +0300, Lucian Adrian Grijincu wrote:
>> The most likely case is that no one else is registering devices with a
>> name like "dummy%d".
>>
>> We can bring the complexity down by replacing:
>> - alloc_netdev which is O(N) with
>> - alloc_netdev_id which, on the average case, is O(1).
>>
>> $ time modprobe dummy numdummies=5000
>> - with alloc_netdev : 9.50s
>> - with alloc_netdev_id: 3.50s
>>
>> NOTE: Stats generated on a heavily patched 3.0-rc1 which replaces the
>> current O(N^2) sysctl algorithm with a better one.
>
> Yes, and disabled hotplug I guess.
$ cat .config | grep HOTPLUG
CONFIG_HOTPLUG=y
# CONFIG_MEMORY_HOTPLUG is not set
CONFIG_HOTPLUG_CPU=y
CONFIG_ARCH_ENABLE_MEMORY_HOTPLUG=y
CONFIG_ACPI_HOTPLUG_CPU=y
# CONFIG_HOTPLUG_PCI is not set
I would guess you're going for CONFIG_HOTPLUG_PCI or
CONFIG_HOTPLUG_CPU, but I don't understand the implications.
> Don't try this on a random computer ;)
Could you briefly explain what's at stake here? What can go wrong if
we do this? I wasn't advocating replacing all alloc_netdev calls with
the new call.
> # time modprobe dummy numdummies=5000
>
> real 4m45.646s
> user 0m0.000s
> sys 0m12.440s
> # uptime
> 05:13:46 up 13:30, 3 users, load average: 11221.41, 6918.70, 3101.12
> # uptime
> 05:18:45 up 13:35, 3 users, load average: 12159.82, 10277.39, 5623.19
I don't get where you're going with these stats.
I don't care much about these patches, and even less if they're
fundamentally borked, but I'd like to know how/why they're not ok.
Please spare a second and illuminate me; I feel left out on a big joke.
--
.
..: Lucian
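The O(N) versus average-case O(1) claim can be illustrated with a toy model. This is not the code from the patch (the body of alloc_netdev_id is not shown in the thread); it assumes the old path finds the lowest free "dummy%d" index by examining every existing name, while the new one first tries the index just after the previous allocation. The used[] array, the scans counter, and all function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_IDS 8192

/* used[i] is true when "dummy<i>" already exists (hypothetical model). */
static bool used[MAX_IDS];
static size_t scans;   /* slots inspected, to compare the strategies */
static int next_hint;  /* where the hinted allocator starts looking */

static void reset_model(void)
{
	for (int i = 0; i < MAX_IDS; i++)
		used[i] = false;
	scans = 0;
	next_hint = 0;
}

/* Scan from 0 for the lowest free index: O(N) per allocation,
 * so O(N^2) to create N devices. */
static int alloc_lowest_free(void)
{
	for (int i = 0; i < MAX_IDS; i++) {
		scans++;
		if (!used[i]) {
			used[i] = true;
			return i;
		}
	}
	return -1;
}

/* Try the slot after the previous allocation first: O(1) on average
 * when nobody else takes names with the same prefix. */
static int alloc_with_hint(void)
{
	for (int n = 0; n < MAX_IDS; n++) {
		int i = (next_hint + n) % MAX_IDS;

		scans++;
		if (!used[i]) {
			used[i] = true;
			next_hint = i + 1;
			return i;
		}
	}
	return -1;
}
```

Creating 1000 names inspects 500500 slots with the scan and 1000 with the hint. Design note: unlike the scan, the hinted version does not always return the lowest free index once names have been freed.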
* Re: [PATCH] usbnet/cdc_ncm: add missing .reset_resume hook
From: Stefan (metze) Metzmacher @ 2011-06-07 7:38 UTC (permalink / raw)
To: Greg KH; +Cc: David Miller, oliver, linux-usb, netdev, linux-kernel
In-Reply-To: <20110606150655.GB3732@suse.de>
[-- Attachment #1: Type: text/plain, Size: 1146 bytes --]
On 06.06.2011 17:06, Greg KH wrote:
> On Mon, Jun 06, 2011 at 02:23:16PM +0200, Stefan (metze) Metzmacher wrote:
>> Hi David,
>>
>>> From: Stefan Metzmacher <metze@samba.org>
>>> Date: Wed, 1 Jun 2011 14:01:41 +0200
>>>
>>>> This avoids messages like this after suspend:
>>>>
>>>> cdc_ncm 2-1.4:1.6: no reset_resume for driver cdc_ncm?
>>>> cdc_ncm 2-1.4:1.7: no reset_resume for driver cdc_ncm?
>>>> cdc_ncm 2-1.4:1.6: usb0: unregister 'cdc_ncm' usb-0000:00:1d.0-1.4, CDC NCM
>>>>
>>>> This is important for the Ericsson F5521gw GSM/UMTS modem.
>>>> Otherwise modemmanager loses track of the fact that the cdc_ncm and cdc_acm devices
>>>> belong together.
>>>>
>>>> The cdc_ether module does the same.
>>>>
>>>> Signed-off-by: Stefan Metzmacher <metze@samba.org>
>>>
>>> Applied and queued up for -stable, thanks.
>>
>> It seems to be part of 3.0-rc2, but I'm not seeing it in any stable tree
>> yet...
>>
>> When can I expect it in stable trees like 2.6.38.y?
>
> The .38.y tree is closed and will not have new releases, so you will
> never see it there, sorry.
OK, is there a chance for .39.y?
metze
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 262 bytes --]
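A toy model of why the missing hook matters, assuming the USB core behaviour implied by the log in the patch description: when a driver has no reset_resume method, the core logs "no reset_resume for driver ..." and unbinds the interface, so usb0 is unregistered and later re-created as a new device. The struct and function names are hypothetical, and the "after" driver assumes (as the patch says cdc_ether does) that the ordinary resume handler is reused as reset_resume.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a driver's PM hooks; not struct usb_driver. */
struct model_driver {
	int (*resume)(void);
	int (*reset_resume)(void);
};

/* Stand-in for the driver's normal resume handler. */
static int model_resume(void)
{
	return 0;
}

/* True if the interface keeps its binding (and its netdev) across a
 * resume that involved a device reset; false if the core would unbind
 * it after logging "no reset_resume for driver ...". */
static bool keeps_binding_after_reset(const struct model_driver *drv)
{
	if (drv->reset_resume == NULL)
		return false;
	return drv->reset_resume() == 0;
}

/* Before the patch: no reset_resume hook. */
static const struct model_driver before_patch = { model_resume, NULL };

/* After the patch (assumed): reuse the normal resume handler. */
static const struct model_driver after_patch = { model_resume, model_resume };
```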
* Re: [PATCH] net/ipv6: check for mistakenly passed in non-AF_INET6 sockaddrs
From: Reinhard Max @ 2011-06-07 7:25 UTC (permalink / raw)
To: Marcus Meissner; +Cc: David Miller, netdev
In-Reply-To: <20110606160007.GD28535@suse.de>
On Mon, 6 Jun 2011 at 18:00, Marcus Meissner wrote:
> Do the same check for IPv6 as for IPv4.
> [...]
> +
> + if (addr->sin6_family != AF_INET6)
> + return -EINVAL;
> +
According to the POSIX manpage for bind(), the error code should be
EAFNOSUPPORT ("The specified address is not a valid address for the
address family of the specified socket"). This would also match the
error code of connect() in the same situation.
And I think the family should be checked before the length for both
bind() and connect(), to get more descriptive error messages when
passing an address of the wrong family.
cu
Reinhard
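A sketch of the check ordering suggested above, written as a hypothetical user-space validator rather than the kernel's inet6_bind(): the family is checked first and reported as EAFNOSUPPORT, and only then is the full length checked (EINVAL). Requiring a complete struct sockaddr_in6 is a simplification made for this sketch.

```c
#include <errno.h>
#include <stddef.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical validator: family first, then length. */
static int check_in6_sockaddr(const struct sockaddr *addr, socklen_t len)
{
	/* The family field itself must be present before it can be read. */
	if (len < offsetof(struct sockaddr, sa_family) + sizeof(addr->sa_family))
		return -EINVAL;
	/* Wrong family: the error POSIX documents for bind(). */
	if (addr->sa_family != AF_INET6)
		return -EAFNOSUPPORT;
	/* Right family but truncated address. */
	if (len < sizeof(struct sockaddr_in6))
		return -EINVAL;
	return 0;
}
```

Passing a 16-byte struct sockaddr_in yields -EAFNOSUPPORT here; with the length checked first, the same call would get the less descriptive -EINVAL, because 16 bytes is shorter than a sockaddr_in6.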
* Re: [PATCH] net: cpu offline cause napi stall
From: Eric Dumazet @ 2011-06-07 7:09 UTC (permalink / raw)
To: David Miller; +Cc: heiko.carstens, blaschka, netdev, linux-s390
In-Reply-To: <1307429403.2642.77.camel@edumazet-laptop>
On Tuesday, 07 June 2011 at 08:50 +0200, Eric Dumazet wrote:
> While doing my tests on a bnx2x adapter, I found the patch was working OK,
> but /proc/interrupts still increments the interrupt count on my offlined
> CPU... go figure...
Oh well, "cat /proc/interrupts" skips offlined CPUs, of course, so that's a
false alarm.
I was doing "grep eth1 /proc/interrupts", so I didn't catch this.
* [PATCH] net: cpu offline cause napi stall
From: Eric Dumazet @ 2011-06-07 6:50 UTC (permalink / raw)
To: David Miller; +Cc: heiko.carstens, blaschka, netdev, linux-s390
In-Reply-To: <20110606.145051.267562411413352856.davem@davemloft.net>
From: Heiko Carstens <heiko.carstens@de.ibm.com>
Frank Blaschka reported:
<quote>
During heavy network load we turn off/on cpus.
Sometimes this causes a stall on the network device.
Digging into the dump, I found out the following:
napi is scheduled but does not run. From the I/O buffers
and the napi state I see napi/rx_softirq processing has stopped
because the budget was reached. napi stays in the
softnet_data poll_list and the rx_softirq was raised again.
I assume that at this time the cpu offline comes in,
the rx softirq is raised/moved to another cpu, but napi stays in the
poll_list of the softnet_data of the now offline cpu.
Reviewing dev_cpu_callback (net/core/dev.c), I did not find where the
poll_list is transferred to the new cpu.
</quote>
This patch is a straightforward implementation of Frank's suggestion:
transfer poll_list and trigger NET_RX_SOFTIRQ on the new CPU.
Reported-by: Frank Blaschka <blaschka@linux.vnet.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Tested-by: Eric Dumazet <eric.dumazet@gmail.com>
---
While doing my tests on a bnx2x adapter, I found the patch was working OK,
but /proc/interrupts still increments the interrupt count on my offlined
CPU... go figure...
net/core/dev.c | 5 +++++
1 files changed, 5 insertions(+)
diff --git a/net/core/dev.c b/net/core/dev.c
index 9393078..095909c 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -6178,6 +6178,11 @@ static int dev_cpu_callback(struct notifier_block *nfb,
oldsd->output_queue = NULL;
oldsd->output_queue_tailp = &oldsd->output_queue;
}
+ /* Append NAPI poll list from offline CPU. */
+ if (!list_empty(&oldsd->poll_list)) {
+ list_splice_init(&oldsd->poll_list, &sd->poll_list);
+ raise_softirq_irqoff(NET_RX_SOFTIRQ);
+ }
raise_softirq_irqoff(NET_TX_SOFTIRQ);
local_irq_enable();
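The hunk above can be modelled in user space. The list code is a small re-implementation in the style of the kernel's list_head (splicing at the head of the destination, as list_splice_init() does), not the kernel's own code, and cpu_state and take_over_poll_list are hypothetical stand-ins for softnet_data and the CPU notifier: when the dead CPU's poll list is non-empty, its entries move to the surviving CPU and an RX softirq is marked pending there.

```c
#include <stdbool.h>
#include <stddef.h>

/* Minimal circular doubly linked list in the style of list_head. */
struct list_node {
	struct list_node *next, *prev;
};

static void list_init(struct list_node *h) { h->next = h->prev = h; }
static bool list_is_empty(const struct list_node *h) { return h->next == h; }

static void list_add_tail_node(struct list_node *n, struct list_node *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

/* Move every entry of @from to the front of @to, then reinitialise @from. */
static void list_splice_init_model(struct list_node *from, struct list_node *to)
{
	if (list_is_empty(from))
		return;
	struct list_node *first = from->next, *last = from->prev, *at = to->next;

	first->prev = to;
	to->next = first;
	last->next = at;
	at->prev = last;
	list_init(from);
}

static size_t list_length(const struct list_node *h)
{
	size_t n = 0;

	for (const struct list_node *p = h->next; p != h; p = p->next)
		n++;
	return n;
}

/* Hypothetical per-CPU state: just the poll list and a pending flag. */
struct cpu_state {
	struct list_node poll_list;
	bool rx_softirq_pending;
};

/* Same shape as the hunk: take over the dead CPU's poll list and make
 * sure the surviving CPU will run its RX softirq. */
static void take_over_poll_list(struct cpu_state *dead, struct cpu_state *self)
{
	if (!list_is_empty(&dead->poll_list)) {
		list_splice_init_model(&dead->poll_list, &self->poll_list);
		self->rx_softirq_pending = true;
	}
}
```

Without the splice, entries left on the dead CPU's list are never polled again, which matches the stall Frank describes.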
* Re: [PATCH net-next] net: remove interrupt.h inclusion from netdevice.h
From: David Miller @ 2011-06-07 5:55 UTC (permalink / raw)
To: adobriyan; +Cc: netdev
In-Reply-To: <20110606204346.GA24175@p183.telecom.by>
From: Alexey Dobriyan <adobriyan@gmail.com>
Date: Mon, 6 Jun 2011 23:43:46 +0300
> * remove interrupt.h inclusion from netdevice.h -- not needed
> * fixup fallout, add interrupt.h and hardirq.h back where needed.
>
> Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Thanks for doing this, patch applied.
* Re: [PATCH net-next-2.6] be2net: Fix Rx pause counter for lancer
From: David Miller @ 2011-06-07 5:54 UTC (permalink / raw)
To: padmanabh.ratnakar; +Cc: netdev, selvin.xavier
In-Reply-To: <78aa0be5-ee8d-4d65-88aa-6ab858bb7259@exht1.ad.emulex.com>
From: Padmanabh Ratnakar <padmanabh.ratnakar@Emulex.Com>
Date: Mon, 6 Jun 2011 17:57:13 +0530
> From: Selvin Xavier <selvin.xavier@emulex.com>
>
> Fixed Rx pause counter for Lancer. Swapping hi and lo words.
>
> Signed-off-by: Selvin Xavier <selvin.xavier@emulex.com>
> Signed-off-by: Padmanabh Ratnakar <padmanabh.ratnakar@emulex.com>
Applied, thanks.
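The driver code is not shown in the thread; as a generic illustration of why the word order matters, a hypothetical helper that builds a 64-bit counter from 32-bit hi and lo words produces a very different value when the two words are swapped.

```c
#include <stdint.h>

/* Hypothetical helper: assemble a 64-bit counter from two 32-bit words. */
static uint64_t counter_from_words(uint32_t hi, uint32_t lo)
{
	return ((uint64_t)hi << 32) | lo;
}
```

For example, hi = 1 and lo = 2 give 0x100000002, while swapping them gives 0x200000001.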
* Re: [PATCH] qlge: remove unecessary if statement
From: David Miller @ 2011-06-07 5:53 UTC (permalink / raw)
To: Gregory.Dietsche; +Cc: ron.mercer, linux-driver, netdev, linux-kernel
In-Reply-To: <1307321053-21695-1-git-send-email-Gregory.Dietsche@cuw.edu>
From: Greg Dietsche <Gregory.Dietsche@cuw.edu>
Date: Sun, 5 Jun 2011 19:44:13 -0500
> The code always returns 'status' regardless, so the if (status) check is unnecessary.
>
> Signed-off-by: Greg Dietsche <Gregory.Dietsche@cuw.edu>
Applied, thanks.
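A hypothetical before/after of the pattern the patch removes (the actual qlge function is not shown in the thread): when both paths return status, the if (status) test changes nothing.

```c
/* Hypothetical helper standing in for whatever produces status;
 * -5 is an arbitrary error code for the example. */
static int do_step(int fail)
{
	return fail ? -5 : 0;
}

/* Before: the early return is redundant. */
static int op_before(int fail)
{
	int status = do_step(fail);

	if (status)
		return status;
	return status;
}

/* After: identical behaviour, single path. */
static int op_after(int fail)
{
	return do_step(fail);
}
```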
* Re: [net-next-2.6 PATCH] enic: Add support for MTU change via port profile on a dynamic vnic
From: David Miller @ 2011-06-07 5:48 UTC (permalink / raw)
To: roprabhu; +Cc: netdev
In-Reply-To: <20110604003517.21165.45654.stgit@savbu-pc100.cisco.com>
From: Roopa Prabhu <roprabhu@cisco.com>
Date: Fri, 03 Jun 2011 17:35:17 -0700
> From: Roopa Prabhu <roprabhu@cisco.com>
>
> The enic driver gets notifications for MTU changes in the
> port profile associated with a dynamic vnic. This patch adds support
> in the enic driver for setting the new MTU on the dynamic vnic and dynamically
> adjusting its buffers to the new MTU size in response to such notifications.
>
> Signed-off-by: Roopa Prabhu <roprabhu@cisco.com>
> Signed-off-by: Vasanthy Kolluri <vkolluri@cisco.com>
> Signed-off-by: David Wang <dwang2@cisco.com>
> Signed-off-by: Christian Benvenuti <benve@cisco.com>
Applied, thanks.
* Re: [PATCH] af_packet: prevent information leak
From: David Miller @ 2011-06-07 5:42 UTC (permalink / raw)
To: eric.dumazet; +Cc: netdev, kaber
In-Reply-To: <1307425119.2642.63.camel@edumazet-laptop>
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Tue, 07 Jun 2011 07:38:39 +0200
> On Monday, 06 June 2011 at 22:24 -0700, David Miller wrote:
>> From: Eric Dumazet <eric.dumazet@gmail.com>
>> Date: Tue, 07 Jun 2011 07:02:19 +0200
>>
>> > In 2.6.27, commit 393e52e33c6c2 (packet: deliver VLAN TCI to userspace)
>> > added a small information leak.
>> >
>> > Add a padding field and make sure it's zeroed before copying to user space.
>> >
>> > Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
>> > CC: Patrick McHardy <kaber@trash.net>
>>
>> I fear this will change the size of these structures on some weird
>> architecture. Doesn't ARM, for example, have weird rules
>> wrt. alignment and structure sizing when "smaller than word" elements
>> are involved?
>>
>> That's why we need __packed on:
>>
>> struct nd_opt_hdr {
>> __u8 nd_opt_type;
>> __u8 nd_opt_len;
>> } __packed;
>>
>> for example.
>>
>> Probably safe to just do a memset of the tail, and the constant length
>> will evaluate to zero on these weird platforms. On others, where the
>> padding does matter, the memset will emit the same code as your new
>> assignments do.
>
> It should not matter for these structures, or 393e52e33c6c2 would have
> broken applications on all arches (not only ARM), since it enlarged them
> without any warning ;)
Indeed. Ok, I'm convinced.
Applied and queued up for -stable, thanks!
* Re: [PATCH] af_packet: prevent information leak
From: Eric Dumazet @ 2011-06-07 5:41 UTC (permalink / raw)
To: David Miller; +Cc: netdev, kaber
In-Reply-To: <1307425119.2642.63.camel@edumazet-laptop>
On Tuesday, 07 June 2011 at 07:38 +0200, Eric Dumazet wrote:
> It should not matter for these structures, or 393e52e33c6c2 would have
> broken applications on all arches (not only ARM), since it enlarged them
> without any warning ;)
>
>
>
BTW, this means that next time we want to add a field to these structures,
we can't use the padding space, or else an application can't know for sure
whether the kernel put padding there (say, on linux-3.0) or the new field
(linux-3.8).
* Re: [PATCH] af_packet: prevent information leak
From: Eric Dumazet @ 2011-06-07 5:38 UTC (permalink / raw)
To: David Miller; +Cc: netdev, kaber
In-Reply-To: <20110606.222416.1114593259198630297.davem@davemloft.net>
On Monday, 06 June 2011 at 22:24 -0700, David Miller wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Tue, 07 Jun 2011 07:02:19 +0200
>
> > In 2.6.27, commit 393e52e33c6c2 (packet: deliver VLAN TCI to userspace)
> > added a small information leak.
> >
> > Add a padding field and make sure it's zeroed before copying to user space.
> >
> > Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> > CC: Patrick McHardy <kaber@trash.net>
>
> I fear this will change the size of these structures on some weird
> architecture. Doesn't ARM, for example, have weird rules
> wrt. alignment and structure sizing when "smaller than word" elements
> are involved?
>
> That's why we need __packed on:
>
> struct nd_opt_hdr {
> __u8 nd_opt_type;
> __u8 nd_opt_len;
> } __packed;
>
> for example.
>
> Probably safe to just do a memset of the tail, and the constant length
> will evaluate to zero on these weird platforms. On others, where the
> padding does matter, the memset will emit the same code as your new
> assignments do.
It should not matter for these structures, or 393e52e33c6c2 would have
broken applications on all arches (not only ARM), since it enlarged them
without any warning ;)
* Re: [PATCH] af_packet: prevent information leak
From: David Miller @ 2011-06-07 5:24 UTC (permalink / raw)
To: eric.dumazet; +Cc: netdev, kaber
In-Reply-To: <1307422939.2642.57.camel@edumazet-laptop>
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Tue, 07 Jun 2011 07:02:19 +0200
> In 2.6.27, commit 393e52e33c6c2 (packet: deliver VLAN TCI to userspace)
> added a small information leak.
>
> Add a padding field and make sure it's zeroed before copying to user space.
>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> CC: Patrick McHardy <kaber@trash.net>
I fear this will change the size of these structures on some weird
architecture. Doesn't ARM, for example, have weird rules
wrt. alignment and structure sizing when "smaller than word" elements
are involved?
That's why we need __packed on:
struct nd_opt_hdr {
__u8 nd_opt_type;
__u8 nd_opt_len;
} __packed;
for example.
Probably safe to just do a memset of the tail, and the constant length
will evaluate to zero on these weird platforms. On others, where the
padding does matter, the memset will emit the same code as your new
assignments do.
* Re: Fw: [PATCH] e100: Fix inconsistency in bad frames handling
From: Eric Dumazet @ 2011-06-07 5:11 UTC (permalink / raw)
To: Andrew Morton; +Cc: netdev, e1000-devel, Andrea Merello
In-Reply-To: <20110606214454.4108aff4.akpm@linux-foundation.org>
On Monday, 06 June 2011 at 21:44 -0700, Andrew Morton wrote:
>
> Begin forwarded message:
>
> Date: Sun, 5 Jun 2011 03:14:49 +0200
> From: Andrea Merello <andrea.merello@gmail.com>
> To: linux-kernel@vger.kernel.org
> Subject: [PATCH] e100: Fix inconsistency in bad frames handling
>
>
> Hello!
>
> In the e100 driver it seems that the intention was to accept bad frames in
> promiscuous mode and loopback mode.
> I think this is evident because of the following code in the driver:
>
> if (nic->flags & promiscuous || nic->loopback) {
> config->rx_save_bad_frames = 0x1; /* 1=save, 0=discard */
> config->rx_discard_short_frames = 0x0; /* 1=discard, 0=save */
> config->promiscuous_mode = 0x1; /* 1=on, 0=off */
> }
>
>
> However, this intention is not really realized, because bad frames are
> discarded later by a SW check.
> This patch finally honors the above intention, making the RX code
> let bad frames pass when the NIC is in promiscuous or loopback
> mode.
>
> This helped me a lot to debug an FPGA ethernet core.
> Maybe it can also be useful to someone else.
>
> Thanks
> Andrea
>
> --- drivers/net/e100_orig.c 2011-06-14 23:29:38.322267075 +0200
> +++ drivers/net/e100.c 2011-06-14 23:34:10.700791472 +0200
> @@ -1975,7 +1975,8 @@ static int e100_rx_indicate(struct nic *
> skb_put(skb, actual_size);
> skb->protocol = eth_type_trans(skb, nic->netdev);
>
> - if (unlikely(!(rfd_status & cb_ok))) {
> + if (unlikely(!(nic->flags & promiscuous || nic->loopback) &&
> + !(rfd_status & cb_ok))) {
> /* Don't indicate if hardware indicates errors */
> dev_kfree_skb_any(skb);
> } else if (actual_size > ETH_DATA_LEN + VLAN_ETH_HLEN) {
Thanks Andrew, this subject is already open on netdev :)
http://www.spinics.net/lists/netdev/msg166301.html
Let's close this thread and continue on the previous one?
* Re: [PATCH 2/3] ipv4: Fix packet size calculation for IPsec packets in __ip_append_data
From: Eric Dumazet @ 2011-06-07 5:06 UTC (permalink / raw)
To: Steffen Klassert; +Cc: David Miller, Herbert Xu, netdev
In-Reply-To: <20110606085247.GE31505@secunet.com>
On Monday, 06 June 2011 at 10:52 +0200, Steffen Klassert wrote:
> On Mon, Jun 06, 2011 at 09:38:19AM +0200, Eric Dumazet wrote:
> >
> > Whoa, I am afraid I won't have time in the following days to check your
> > assertion.
>
> My test setup was the following:
>
> I use an IPsec tunnel with tunnel endpoints 192.168.1.1 and 192.168.1.2
>
> Then I do at 192.168.1.2
>
> ping -c1 -M do -s 1410 192.168.1.1
>
> PING 192.168.1.1 (192.168.1.1) 1410(1438) bytes of data.
> From 192.168.1.2 icmp_seq=1 Frag needed and DF set (mtu = 1438)
>
> --- 192.168.1.1 ping statistics ---
> 0 packets transmitted, 0 received, +1 errors
>
> So the packet fits the mtu, but it is not sent.
> I used a kernel with your patch as head commit.
>
> Reverting your patch (going one commit deeper in the history):
>
> ping -c1 -M do -s 1410 192.168.1.1
>
> PING 192.168.1.1 (192.168.1.1) 1410(1438) bytes of data.
> 1418 bytes from 192.168.1.1: icmp_seq=1 ttl=64 time=3.01 ms
>
> --- 192.168.1.1 ping statistics ---
> 1 packets transmitted, 1 received, 0% packet loss, time 0ms
> rtt min/avg/max/mdev = 3.014/3.014/3.014/0.000 ms
>
> >
> > What about original problem then, how should we fix it ?
> >
>
> Hm, I don't know. I'll try to reproduce it here.
>
> > We do have some cases where at least one fragment (the last one) is
> > oversized.
>
> trailer_len is used only with IPsec, so the problem exists only when
> using IPsec, right?
>
> >
> > I remember I used Nick Bowler scripts at that time, I might find them
> > again...
>
> It would be nice if you could provide these scripts and some information
> on how to reproduce the problem.
>
Nick's mail was:
http://www.spinics.net/lists/netdev/msg141308.html
Unfortunately, I could not find where on my machines I put my own
scripts...
Not a big deal, I suspect we can revert my commit if you say it added a
regression :)
Thanks
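The numbers in Steffen's ping output decompose as follows: the 1410-byte `ping -s` payload plus the 8-byte ICMP echo header gives the 1418 bytes reported in the reply, and the 20-byte IPv4 header (no options) brings the packet to 1438 bytes, exactly the reported mtu. A small sketch of that arithmetic (helper names are hypothetical) shows why such a packet should pass a `len <= mtu` check.

```c
/* Fixed header sizes for an IPv4 ICMP echo without IP options. */
enum { ICMP_ECHO_HDR = 8, IPV4_HDR = 20 };

/* Bytes in the ICMP message for a given `ping -s` payload. */
static int icmp_bytes(int payload) { return payload + ICMP_ECHO_HDR; }

/* Bytes in the whole IPv4 packet. */
static int ipv4_bytes(int payload) { return icmp_bytes(payload) + IPV4_HDR; }

/* A DF packet fits when its total length does not exceed the MTU. */
static int fits_mtu(int packet_len, int mtu) { return packet_len <= mtu; }
```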
* [PATCH] af_packet: prevent information leak
From: Eric Dumazet @ 2011-06-07 5:02 UTC (permalink / raw)
To: David Miller; +Cc: netdev, Patrick McHardy
In 2.6.27, commit 393e52e33c6c2 (packet: deliver VLAN TCI to userspace)
added a small information leak.
Add a padding field and make sure it's zeroed before copying to user space.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Patrick McHardy <kaber@trash.net>
---
include/linux/if_packet.h | 2 ++
net/packet/af_packet.c | 2 ++
2 files changed, 4 insertions(+)
diff --git a/include/linux/if_packet.h b/include/linux/if_packet.h
index 6d66ce1..7b31863 100644
--- a/include/linux/if_packet.h
+++ b/include/linux/if_packet.h
@@ -62,6 +62,7 @@ struct tpacket_auxdata {
__u16 tp_mac;
__u16 tp_net;
__u16 tp_vlan_tci;
+ __u16 tp_padding;
};
/* Rx ring - header status */
@@ -101,6 +102,7 @@ struct tpacket2_hdr {
__u32 tp_sec;
__u32 tp_nsec;
__u16 tp_vlan_tci;
+ __u16 tp_padding;
};
#define TPACKET2_HDRLEN (TPACKET_ALIGN(sizeof(struct tpacket2_hdr)) + sizeof(struct sockaddr_ll))
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index ba248d9..c0c3cda 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -804,6 +804,7 @@ static int tpacket_rcv(struct sk_buff *skb, struct net_device *dev,
} else {
h.h2->tp_vlan_tci = 0;
}
+ h.h2->tp_padding = 0;
hdrlen = sizeof(*h.h2);
break;
default:
@@ -1736,6 +1737,7 @@ static int packet_recvmsg(struct kiocb *iocb, struct socket *sock,
} else {
aux.tp_vlan_tci = 0;
}
+ aux.tp_padding = 0;
put_cmsg(msg, SOL_PACKET, PACKET_AUXDATA, sizeof(aux), &aux);
}
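A user-space sketch of the leak and of both fixes discussed in this thread. It assumes the three leading __u32 members of struct tpacket_auxdata are tp_status, tp_len and tp_snaplen (only the tail of the struct appears in the diff) and uses <stdint.h> types in place of __u32/__u16. On ABIs that align uint32_t to 4 bytes, the old layout has 18 bytes of members rounded up to 20, so two trailing bytes are never written unless something clears them; naming them tp_padding (this patch) or memset()ing the tail (David's alternative) both close the hole without changing sizeof on such ABIs.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Layout before the patch (tail as in the diff). */
struct auxdata_old {
	uint32_t tp_status;
	uint32_t tp_len;
	uint32_t tp_snaplen;
	uint16_t tp_mac;
	uint16_t tp_net;
	uint16_t tp_vlan_tci;
	/* implicit tail padding here on ABIs that align uint32_t to 4 */
};

/* Layout after the patch: the padding is a named member. */
struct auxdata_new {
	uint32_t tp_status;
	uint32_t tp_len;
	uint32_t tp_snaplen;
	uint16_t tp_mac;
	uint16_t tp_net;
	uint16_t tp_vlan_tci;
	uint16_t tp_padding;  /* set to 0 explicitly before copying out */
};

/* Bytes after tp_vlan_tci in the old layout: the bytes that would carry
 * stale stack contents to user space if never written. */
static size_t old_tail_bytes(void)
{
	return sizeof(struct auxdata_old) -
	       (offsetof(struct auxdata_old, tp_vlan_tci) + sizeof(uint16_t));
}

/* Alternative from the thread: memset the tail. The length is a
 * compile-time constant and is 0 where the ABI adds no padding. */
static void old_clear_tail(struct auxdata_old *a)
{
	size_t end = offsetof(struct auxdata_old, tp_vlan_tci) + sizeof(a->tp_vlan_tci);

	memset((char *)a + end, 0, sizeof(*a) - end);
}
```

Explicit naming makes the zeroing visible at each fill site; the memset variant leaves the UAPI header untouched.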
* Fw: [PATCH] e100: Fix inconsistency in bad frames handling
From: Andrew Morton @ 2011-06-07 4:44 UTC (permalink / raw)
To: netdev, e1000-devel; +Cc: Andrea Merello
Begin forwarded message:
Date: Sun, 5 Jun 2011 03:14:49 +0200
From: Andrea Merello <andrea.merello@gmail.com>
To: linux-kernel@vger.kernel.org
Subject: [PATCH] e100: Fix inconsistency in bad frames handling
Hello!
In the e100 driver it seems that the intention was to accept bad frames in
promiscuous mode and loopback mode.
I think this is evident because of the following code in the driver:
if (nic->flags & promiscuous || nic->loopback) {
config->rx_save_bad_frames = 0x1; /* 1=save, 0=discard */
config->rx_discard_short_frames = 0x0; /* 1=discard, 0=save */
config->promiscuous_mode = 0x1; /* 1=on, 0=off */
}
However, this intention is not really realized, because bad frames are
discarded later by a SW check.
This patch finally honors the above intention, making the RX code
let bad frames pass when the NIC is in promiscuous or loopback
mode.
This helped me a lot to debug an FPGA ethernet core.
Maybe it can also be useful to someone else.
Thanks
Andrea
--- drivers/net/e100_orig.c 2011-06-14 23:29:38.322267075 +0200
+++ drivers/net/e100.c 2011-06-14 23:34:10.700791472 +0200
@@ -1975,7 +1975,8 @@ static int e100_rx_indicate(struct nic *
skb_put(skb, actual_size);
skb->protocol = eth_type_trans(skb, nic->netdev);
- if (unlikely(!(rfd_status & cb_ok))) {
+ if (unlikely(!(nic->flags & promiscuous || nic->loopback) &&
+ !(rfd_status & cb_ok))) {
/* Don't indicate if hardware indicates errors */
dev_kfree_skb_any(skb);
} else if (actual_size > ETH_DATA_LEN + VLAN_ETH_HLEN) {
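The patched condition can be checked in isolation. In this sketch MODEL_PROMISC and the function names are hypothetical; the real driver tests nic->flags & promiscuous, nic->loopback and rfd_status & cb_ok as in the diff. Each function returns true when the frame would be dropped.

```c
#include <stdbool.h>

/* Hypothetical flag bit standing in for the driver's `promiscuous`. */
enum { MODEL_PROMISC = 1 << 0 };

/* Old rule: drop any frame the hardware did not mark cb_ok. */
static bool drop_frame_old(bool cb_ok, unsigned int flags, bool loopback)
{
	(void)flags;
	(void)loopback;
	return !cb_ok;
}

/* Patched rule: in promiscuous or loopback mode, pass bad frames up too.
 * The explicit parentheses match how C already parses the diff, since
 * `&` binds more tightly than `||`. */
static bool drop_frame_new(bool cb_ok, unsigned int flags, bool loopback)
{
	return !((flags & MODEL_PROMISC) || loopback) && !cb_ok;
}
```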
* Re: [PATCH 2/2] net: dummy: allocate devices with alloc_netdev_id
From: David Miller @ 2011-06-07 3:38 UTC (permalink / raw)
To: eric.dumazet; +Cc: lucian.grijincu, netdev
In-Reply-To: <1307416765.2642.37.camel@edumazet-laptop>
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Tue, 07 Jun 2011 05:19:25 +0200
> On Tuesday, 07 June 2011 at 04:39 +0300, Lucian Adrian Grijincu wrote:
>> The most likely case is that no one else is registering devices with a
>> name like "dummy%d".
>>
>> We can bring the complexity down by replacing:
>> - alloc_netdev which is O(N) with
>> - alloc_netdev_id which, on the average case, is O(1).
>>
>> $ time modprobe dummy numdummies=5000
>> - with alloc_netdev : 9.50s
>> - with alloc_netdev_id: 3.50s
>>
>> NOTE: Stats generated on a heavily patched 3.0-rc1 which replaces the
>> current O(N^2) sysctl algorithm with a better one.
>
> Yes, and disabled hotplug I guess.
>
> Don't try this on a random computer ;)
>
> # time modprobe dummy numdummies=5000
>
> real 4m45.646s
> user 0m0.000s
> sys 0m12.440s
> # uptime
> 05:13:46 up 13:30, 3 users, load average: 11221.41, 6918.70, 3101.12
ROFL
* Re: KVM induced panic on 2.6.38[2367] & 2.6.39
From: Brad Campbell @ 2011-06-07 3:33 UTC (permalink / raw)
To: Bart De Schuymer; +Cc: kvm, linux-mm, linux-kernel, netdev, netfilter-devel
In-Reply-To: <4DED344D.7000005@pandora.be>
On 07/06/11 04:10, Bart De Schuymer wrote:
> Hi Brad,
>
> This has probably nothing to do with ebtables, so please rmmod in case
> it's loaded.
> A few questions I didn't directly see an answer to in the threads I
> scanned...
> I'm assuming you actually use the bridging firewall functionality. So,
> what iptables modules do you use? Can you reduce your iptables rules to
> a core that triggers the bug?
> Or does it get triggered even with an empty set of firewall rules?
> Are you using a stock .35 kernel or is it patched?
> Is this something I can trigger on a poor guy's laptop or does it
> require specialized hardware (I'm catching up on qemu/kvm...)?
Not specialised hardware as such, I've just not been able to reproduce
it outside of this specific operating scenario.
I can't trigger it with empty firewall rules as it relies on a DNAT to
occur. If I try it directly to the internal IP address (as I have to
without netfilter loaded) then of course nothing fails.
It's a pain in the bum as a fault, but it's one I can easily reproduce
as long as I use the same set of circumstances.
I'll try using 3.0-rc2 (current git) tonight, and if I can reproduce it
on that then I'll attempt to pare down the iptables rules to a bare minimum.
It has nothing to do with ebtables, as I don't compile it. I'm not really
sure about "bridging firewall" functionality. I just use a couple of
hand coded bash scripts to set the tables up.
brad@srv:~$ lsmod
Module Size Used by
xt_iprange 1637 1
xt_DSCP 2077 2
xt_length 1216 1
xt_CLASSIFY 1091 26
sch_sfq 6681 4
xt_CHECKSUM 1229 2
ipt_REJECT 2277 1
ipt_MASQUERADE 1759 7
ipt_REDIRECT 1133 1
xt_recent 8223 2
xt_state 1226 5
iptable_nat 3993 1
nf_nat 16773 3 ipt_MASQUERADE,ipt_REDIRECT,iptable_nat
nf_conntrack_ipv4 11868 8 iptable_nat,nf_nat
nf_conntrack 60962 5 ipt_MASQUERADE,xt_state,iptable_nat,nf_nat,nf_conntrack_ipv4
nf_defrag_ipv4 1417 1 nf_conntrack_ipv4
xt_TCPMSS 2567 2
xt_tcpmss 1469 0
xt_tcpudp 2467 56
iptable_mangle 1487 1
pppoe 9574 2
pppox 2188 1 pppoe
iptable_filter 1442 1
ip_tables 16762 3 iptable_nat,iptable_mangle,iptable_filter
x_tables 20462 17 xt_iprange,xt_DSCP,xt_length,xt_CLASSIFY,xt_CHECKSUM,ipt_REJECT,ipt_MASQUERADE,ipt_REDIRECT,xt_recent,xt_state,iptable_nat,xt_TCPMSS,xt_tcpmss,xt_tcpudp,iptable_mangle,iptable_filter,ip_tables
ppp_generic 24243 6 pppoe,pppox
slhc 5293 1 ppp_generic
cls_u32 6468 6
sch_htb 14432 2
deflate 1937 0
zlib_deflate 21228 1 deflate
des_generic 16135 0
cbc 2721 0
ecb 1975 0
crypto_blkcipher 13645 2 cbc,ecb
sha1_generic 2095 0
md5 4001 0
hmac 2977 0
crypto_hash 14519 3 sha1_generic,md5,hmac
cryptomgr 2636 0
aead 6137 1 cryptomgr
crypto_algapi 15289 9 deflate,des_generic,cbc,ecb,crypto_blkcipher,hmac,crypto_hash,cryptomgr,aead
af_key 27372 0
fuse 66747 1
w83627ehf 32052 0
hwmon_vid 2867 1 w83627ehf
vhost_net 16802 6
powernow_k8 12932 0
mperf 1263 1 powernow_k8
kvm_amd 53431 24
kvm 235155 1 kvm_amd
pl2303 12732 1
xhci_hcd 62865 0
i2c_piix4 8391 0
k10temp 3183 0
usbserial 34452 3 pl2303
usb_storage 37887 1
usb_libusual 10999 1 usb_storage
ohci_hcd 18105 0
ehci_hcd 33641 0
ahci 20748 4
usbcore 130936 7 pl2303,xhci_hcd,usbserial,usb_storage,usb_libusual,ohci_hcd,ehci_hcd
libahci 21202 1 ahci
sata_mv 26939 0
megaraid_sas 71659 14
Nat Table (external ip substituted for xxx.xxx.xxx.xxx)
Chain PREROUTING (policy ACCEPT 1761K packets, 152M bytes)
pkts bytes target prot opt in out source
destination
5 210 DNAT udp -- ppp0 * 0.0.0.0/0
0.0.0.0/0 udp dpt:1195 to:192.168.253.199
6 252 DNAT udp -- !ppp0 * 0.0.0.0/0
xxx.xxx.xxx.xxx udp dpt:1195 to:192.168.253.199
0 0 DNAT tcp -- ppp0 * 0.0.0.0/0
0.0.0.0/0 tcp dpt:25001 to:192.168.253.199:465
0 0 DNAT tcp -- ppp0 * 0.0.0.0/0
0.0.0.0/0 tcp dpt:25000 to:192.168.253.199:993
0 0 DNAT tcp -- !ppp0 * 0.0.0.0/0
xxx.xxx.xxx.xxx tcp dpt:25001 to:192.168.253.199:465
0 0 DNAT tcp -- !ppp0 * 0.0.0.0/0
xxx.xxx.xxx.xxx tcp dpt:25000 to:192.168.253.199:993
2 142 DNAT 47 -- ppp0 * 0.0.0.0/0
0.0.0.0/0 to:192.168.253.199
18 880 DNAT tcp -- ppp0 * 0.0.0.0/0
0.0.0.0/0 tcp dpt:1723 to:192.168.253.199
0 0 DNAT 47 -- !ppp0 * 0.0.0.0/0
xxx.xxx.xxx.xxx to:192.168.253.199
0 0 DNAT tcp -- !ppp0 * 0.0.0.0/0
xxx.xxx.xxx.xxx tcp dpt:1723 to:192.168.253.199
2969 149K DNAT tcp -- ppp0 * 0.0.0.0/0
0.0.0.0/0 tcp dpt:443 to:192.168.253.198
20 1280 DNAT tcp -- !ppp0 * 0.0.0.0/0
xxx.xxx.xxx.xxx tcp dpt:443 to:192.168.253.198
0 0 DNAT tcp -- ppp0 * 0.0.0.0/0
0.0.0.0/0 tcp dpt:3101 to:192.168.253.197
0 0 DNAT tcp -- !ppp0 * 0.0.0.0/0
xxx.xxx.xxx.xxx tcp dpt:3101 to:192.168.253.197
0 0 DNAT tcp -- !ppp0 * 0.0.0.0/0
xxx.xxx.xxx.xxx tcp dpt:4101 to:192.168.253.197
44528 2718K REDIRECT tcp -- !ppp0 * 0.0.0.0/0
!192.168.0.0/16 tcp dpt:80 redir ports 8080
0 0 DNAT tcp -- * * 0.0.0.0/0
0.0.0.0/0 tcp dpt:3724 to:192.168.2.107
596K 33M DNAT tcp -- ppp0 * 0.0.0.0/0
0.0.0.0/0 tcp dpts:2001:2030 to:10.99.99.2
1420K 119M DNAT udp -- ppp0 * 0.0.0.0/0
0.0.0.0/0 udp dpts:2001:2030 to:10.99.99.2
7483 449K DNAT all -- !ppp0 * 0.0.0.0/0
xxx.xxx.xxx.xxx to:192.168.2.1
Chain INPUT (policy ACCEPT 270K packets, 17M bytes)
pkts bytes target prot opt in out source
destination
Chain OUTPUT (policy ACCEPT 170K packets, 12M bytes)
pkts bytes target prot opt in out source
destination
Chain POSTROUTING (policy ACCEPT 2205K packets, 166M bytes)
pkts bytes target prot opt in out source
destination
0 0 MASQUERADE all -- * * 0.0.0.0/0
192.168.254.3
6 360 ACCEPT all -- * * 0.0.0.0/0
xxx.xxx.xxx.xxx
20424 2120K MASQUERADE all -- * ppp0 192.168.0.0/16
!192.168.0.0/16
0 0 MASQUERADE all -- * ppp0 10.0.0.0/24
0.0.0.0/0
3 204 MASQUERADE all -- * * 192.168.2.0/24
10.8.0.0/24
1418K 128M MASQUERADE all -- * * 10.99.99.0/24
0.0.0.0/0
68248 4095K MASQUERADE all -- * * 192.168.253.0/24
10.8.0.0/16
13305 2405K MASQUERADE all -- * * 192.168.253.0/24
!192.168.0.0/16
Mangle Table
Chain PREROUTING (policy ACCEPT 278M packets, 293G bytes)
pkts bytes target prot opt in out source
destination
169 55528 CHECKSUM udp -- br1 * 0.0.0.0/0
0.0.0.0/0 udp dpt:67 CHECKSUM fill
Chain INPUT (policy ACCEPT 180M packets, 250G bytes)
pkts bytes target prot opt in out source
destination
Chain FORWARD (policy ACCEPT 98M packets, 44G bytes)
pkts bytes target prot opt in out source
destination
Chain OUTPUT (policy ACCEPT 155M packets, 180G bytes)
pkts bytes target prot opt in out source
destination
Chain POSTROUTING (policy ACCEPT 253M packets, 223G bytes)
pkts bytes target prot opt in out source
destination
165 54182 CHECKSUM udp -- * br1 0.0.0.0/0 0.0.0.0/0 udp spt:67 CHECKSUM fill
51 3712 CLASSIFY tcp -- * ppp0 0.0.0.0/0 0.0.0.0/0 tcp dpt:53 CLASSIFY set 1:20
85274 6454K CLASSIFY udp -- * ppp0 0.0.0.0/0 0.0.0.0/0 udp dpt:53 CLASSIFY set 1:20
187 257K CLASSIFY tcp -- * ppp0 0.0.0.0/0 0.0.0.0/0 tcp spt:81 CLASSIFY set 1:20
25M 1180M CLASSIFY tcp -- * ppp0 0.0.0.0/0 0.0.0.0/0 tcp flags:0x3F/0x10 state ESTABLISHED length 40:100 CLASSIFY set 1:15
728K 67M CLASSIFY icmp -- * ppp0 0.0.0.0/0 0.0.0.0/0 CLASSIFY set 1:15
231 23484 CLASSIFY tcp -- * ppp0 0.0.0.0/0 0.0.0.0/0 tcp dpt:2401 CLASSIFY set 1:15
65636 5610K CLASSIFY tcp -- * ppp0 0.0.0.0/0 0.0.0.0/0 tcp dpt:22 CLASSIFY set 1:10
2018 315K CLASSIFY tcp -- * ppp0 0.0.0.0/0 0.0.0.0/0 tcp spt:22 CLASSIFY set 1:10
80 10092 CLASSIFY tcp -- * ppp0 0.0.0.0/0 0.0.0.0/0 tcp dpt:3389 CLASSIFY set 1:10
26063 8910K CLASSIFY tcp -- * ppp0 0.0.0.0/0 0.0.0.0/0 tcp dpt:8080 CLASSIFY set 1:15
932K 131M CLASSIFY tcp -- * ppp0 0.0.0.0/0 0.0.0.0/0 tcp dpt:80 CLASSIFY set 1:15
3511 267K CLASSIFY udp -- * ppp0 0.0.0.0/0 0.0.0.0/0 udp dpt:123 CLASSIFY set 1:10
0 0 CLASSIFY tcp -- * ppp0 0.0.0.0/0 0.0.0.0/0 tcp spt:20 CLASSIFY set 1:15
3 180 CLASSIFY tcp -- * ppp0 0.0.0.0/0 0.0.0.0/0 tcp dpt:20 CLASSIFY set 1:15
94058 38M CLASSIFY 47 -- * ppp0 0.0.0.0/0 0.0.0.0/0 CLASSIFY set 1:10
1086K 183M CLASSIFY udp -- * ppp0 0.0.0.0/0 0.0.0.0/0 udp spt:1194 CLASSIFY set 1:10
1086K 183M TOS udp -- * ppp0 0.0.0.0/0 0.0.0.0/0 udp spt:1194 TOS set 0x10/0x3f
48817 10M CLASSIFY udp -- * ppp0 0.0.0.0/0 0.0.0.0/0 udp spt:1195 CLASSIFY set 1:10
48817 10M TOS udp -- * ppp0 0.0.0.0/0 0.0.0.0/0 udp spt:1195 TOS set 0x10/0x3f
94058 38M CLASSIFY 47 -- * ppp0 0.0.0.0/0 0.0.0.0/0 CLASSIFY set 1:15
106 7207 CLASSIFY tcp -- * ppp0 0.0.0.0/0 0.0.0.0/0 tcp dpt:1863 CLASSIFY set 1:15
188K 34M CLASSIFY tcp -- * ppp0 0.0.0.0/0 0.0.0.0/0 tcp dpt:443 CLASSIFY set 1:15
51541 3327K CLASSIFY tcp -- * ppp0 0.0.0.0/0 0.0.0.0/0 tcp dpts:6660:6669 CLASSIFY set 1:15
0 0 CLASSIFY tcp -- * ppp0 0.0.0.0/0 0.0.0.0/0 tcp spts:2021:2030 CLASSIFY set 1:15
85 4944 CLASSIFY tcp -- * ppp0 0.0.0.0/0 0.0.0.0/0 tcp dpt:19999 CLASSIFY set 1:15
208K 86M CLASSIFY udp -- * * 0.0.0.0/0 0.0.0.0/0 source IP range 192.168.2.80-192.168.2.120 CLASSIFY set 1:10
0 0 CLASSIFY tcp -- * ppp0 0.0.0.0/0 0.0.0.0/0 tcp spt:12345 CLASSIFY set 1:15
1 80 CLASSIFY udp -- * ppp0 0.0.0.0/0 0.0.0.0/0 udp spt:12345 CLASSIFY set 1:15
Default table
Chain INPUT (policy ACCEPT 176M packets, 247G bytes)
pkts bytes target prot opt in out source destination
0 0 ACCEPT udp -- ppp0 * 0.0.0.0/0 0.0.0.0/0 udp dpt:4569
1187K 582M ACCEPT udp -- ppp0 * 0.0.0.0/0 0.0.0.0/0 udp dpt:1194
2 577 ACCEPT udp -- ppp0 * 0.0.0.0/0 0.0.0.0/0 udp dpt:1195
28 1224 ACCEPT tcp -- ppp0 * 0.0.0.0/0 0.0.0.0/0 tcp dpt:3389
230 12372 tcp -- ppp0 * 0.0.0.0/0 0.0.0.0/0 tcp dpt:22 state NEW recent: SET name: DEFAULT side: source
3 180 DROP tcp -- ppp0 * 0.0.0.0/0 0.0.0.0/0 tcp dpt:22 state NEW recent: UPDATE seconds: 300 hit_count: 4 name: DEFAULT side: source
1750 143K ACCEPT tcp -- ppp0 * 0.0.0.0/0 0.0.0.0/0 tcp dpt:22
3 144 ACCEPT tcp -- ppp0 * 0.0.0.0/0 0.0.0.0/0 tcp dpt:113
120 6090 ACCEPT tcp -- ppp0 * 0.0.0.0/0 0.0.0.0/0 tcp dpt:81
36094 29M ACCEPT tcp -- ppp0 * 0.0.0.0/0 0.0.0.0/0 tcp dpt:25
1456K 1706M ACCEPT all -- ppp0 * 0.0.0.0/0 0.0.0.0/0 state RELATED,ESTABLISHED
31047 2334K REJECT tcp -- ppp0 * 0.0.0.0/0 0.0.0.0/0 tcp option=!2 reject-with tcp-reset
552K 60M ACCEPT all -- !ppp0 * 0.0.0.0/0 0.0.0.0/0 state NEW
13552 1207K ACCEPT icmp -- ppp0 * 0.0.0.0/0 0.0.0.0/0
5712 392K DROP all -- ppp0 * 0.0.0.0/0 0.0.0.0/0
Chain FORWARD (policy ACCEPT 98M packets, 44G bytes)
pkts bytes target prot opt in out source destination
1207K 68M TCPMSS tcp -- * ppp0 0.0.0.0/0 0.0.0.0/0 tcp flags:0x06/0x02 TCPMSS clamp to PMTU
Chain OUTPUT (policy ACCEPT 155M packets, 180G bytes)
pkts bytes target prot opt in out source destination
31675 1895K TCPMSS tcp -- * ppp0 0.0.0.0/0 0.0.0.0/0 tcp flags:0x06/0x02 TCPMSS clamp to PMTU
lsmod
ipt_REJECT 2277 1
ipt_MASQUERADE 1759 7
ipt_REDIRECT 1133 1
xt_recent 8223 2
xt_state 1226 5
iptable_nat 3993 1
nf_nat 16773 3 ipt_MASQUERADE,ipt_REDIRECT,iptable_nat
nf_conntrack_ipv4 11868 8 iptable_nat,nf_nat
nf_conntrack 60962 5 ipt_MASQUERADE,xt_state,iptable_nat,nf_nat,nf_conntrack_ipv4
nf_defrag_ipv4 1417 1 nf_conntrack_ipv4
xt_TCPMSS 2567 2
xt_tcpmss 1469 0
xt_tcpudp 2467 56
iptable_mangle 1487 1
pppoe 9574 2
pppox 2188 1 pppoe
iptable_filter 1442 1
ip_tables 16762 3 iptable_nat,iptable_mangle,iptable_filter
x_tables 20462 17 xt_iprange,xt_DSCP,xt_length,xt_CLASSIFY,xt_CHECKSUM,ipt_REJECT,ipt_MASQUERADE,ipt_REDIRECT,xt_recent,xt_state,iptable_nat,xt_TCPMSS,xt_tcpmss,xt_tcpudp,iptable_mangle,iptable_filter,ip_tables
ppp_generic 24243 6 pppoe,pppox
slhc 5293 1 ppp_generic
cls_u32 6468 6
sch_htb 14432 2
deflate 1937 0
zlib_deflate 21228 1 deflate
des_generic 16135 0
cbc 2721 0
ecb 1975 0
crypto_blkcipher 13645 2 cbc,ecb
sha1_generic 2095 0
md5 4001 0
hmac 2977 0
crypto_hash 14519 3 sha1_generic,md5,hmac
cryptomgr 2636 0
aead 6137 1 cryptomgr
crypto_algapi 15289 9 deflate,des_generic,cbc,ecb,crypto_blkcipher,hmac,crypto_hash,cryptomgr,aead
af_key 27372 0
fuse 66747 1
w83627ehf 32052 0
hwmon_vid 2867 1 w83627ehf
vhost_net 16802 6
powernow_k8 12932 0
mperf 1263 1 powernow_k8
kvm_amd 53431 24
kvm 235155 1 kvm_amd
pl2303 12732 1
xhci_hcd 62865 0
i2c_piix4 8391 0
k10temp 3183 0
usbserial 34452 3 pl2303
usb_storage 37887 1
usb_libusual 10999 1 usb_storage
ohci_hcd 18105 0
ehci_hcd 33641 0
ahci 20748 4
usbcore 130936 7 pl2303,xhci_hcd,usbserial,usb_storage,usb_libusual,ohci_hcd,ehci_hcd
libahci 21202 1 ahci
sata_mv 26939 0
megaraid_sas 71659 14
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
* Re: [PATCH 2/2] net: dummy: allocate devices with alloc_netdev_id
From: Eric Dumazet @ 2011-06-07 3:19 UTC
To: Lucian Adrian Grijincu; +Cc: netdev, David S. Miller
In-Reply-To: <1307410786-19110-3-git-send-email-lucian.grijincu@gmail.com>
On Tuesday, 7 June 2011 at 04:39 +0300, Lucian Adrian Grijincu wrote:
> The most likely case is that no one else is registering devices with a
> name like "dummy%d".
>
> We can bring the complexity down by replacing:
> - alloc_netdev, which is O(N), with
> - alloc_netdev_id, which in the average case is O(1).
>
> $ time modprobe dummy numdummies=5000
> - with alloc_netdev : 9.50s
> - with alloc_netdev_id: 3.50s
>
> NOTE: Stats generated on a heavily patched 3.0-rc1 which replaces the
> current O(N^2) sysctl algorithm with a better one.
Yes, and disabled hotplug I guess.
Don't try this on a random computer ;)
# time modprobe dummy numdummies=5000
real 4m45.646s
user 0m0.000s
sys 0m12.440s
# uptime
05:13:46 up 13:30, 3 users, load average: 11221.41, 6918.70, 3101.12
# uptime
05:18:45 up 13:35, 3 users, load average: 12159.82, 10277.39, 5623.19
* [PATCH] bonding: use new value of lacp_rate and ad_select
From: Weiping Pan @ 2011-06-07 2:24 UTC
To: jpirko, xiyou.wangcong
Cc: Weiping Pan, Jay Vosburgh, Andy Gospodarek,
open list:BONDING DRIVER, open list
There is a bug: when you modify lacp_rate via sysfs,
802.3ad does not use the new value of lacp_rate when transmitting packets,
because port->actor_oper_port_state is not updated.
Changing ad_select does take effect, but struct bond_params and
struct ad_bond_info each keep their own copy of the LACP rate (lacp_fast)
and of the aggregator selection mode (ad_select / agg_select_mode);
the duplicates need extra synchronization.
802.3ad can simply read both values from bond_params every time.
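A minimal user-space sketch of the idea behind the fix, assuming simplified stand-in structs (not the real bonding structures) and that the timeout bit has the value 0x2, as AD_STATE_LACP_TIMEOUT does in bond_3ad.h: lacp_fast lives only in the parameters, and each port's timeout bit is re-derived after a sysfs write. update_lacp_rate is a hypothetical name; the patch's own bond_3ad_update_lacp_rate additionally takes bond->lock and each port's state-machine lock, which this sketch leaves out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed to mirror AD_STATE_LACP_TIMEOUT from bond_3ad.h (bit 1). */
#define AD_STATE_LACP_TIMEOUT 0x2

/* Hypothetical, heavily simplified stand-ins for the bonding structures. */
struct port {
	uint8_t actor_oper_port_state;
};

struct bond_params {
	int lacp_fast; /* the only copy; a sysfs write updates this field */
};

/*
 * Re-derive every port's LACP timeout bit from bond_params instead of
 * keeping a second lacp_fast copy in sync. Locking is omitted here.
 */
static void update_lacp_rate(const struct bond_params *params,
			     struct port *ports, size_t nports)
{
	assert(params != NULL && (ports != NULL || nports == 0));

	for (size_t i = 0; i < nports; i++) {
		if (params->lacp_fast)
			ports[i].actor_oper_port_state |= AD_STATE_LACP_TIMEOUT;
		else
			ports[i].actor_oper_port_state &=
				(uint8_t)~AD_STATE_LACP_TIMEOUT;
	}
}
```

Setting the bit advertises a short LACP timeout, which asks the partner to send LACPDUs at the fast rate; clearing it requests the long timeout. Only that one bit changes, so the other actor state flags are preserved.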
Signed-off-by: Weiping Pan <panweiping3@gmail.com>
---
drivers/net/bonding/bond_3ad.c | 41 ++++++++++++++++++++++++++++++++-----
drivers/net/bonding/bond_3ad.h | 7 +----
drivers/net/bonding/bond_main.c | 3 +-
drivers/net/bonding/bond_sysfs.c | 1 +
4 files changed, 39 insertions(+), 13 deletions(-)
diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
index c7537abc..6122725 100644
--- a/drivers/net/bonding/bond_3ad.c
+++ b/drivers/net/bonding/bond_3ad.c
@@ -262,7 +262,7 @@ static inline u32 __get_agg_selection_mode(struct port *port)
if (bond == NULL)
return BOND_AD_STABLE;
- return BOND_AD_INFO(bond).agg_select_mode;
+ return bond->params.ad_select;
}
/**
@@ -1859,7 +1859,6 @@ static void ad_marker_response_received(struct bond_marker *marker,
void bond_3ad_initiate_agg_selection(struct bonding *bond, int timeout)
{
BOND_AD_INFO(bond).agg_select_timer = timeout;
- BOND_AD_INFO(bond).agg_select_mode = bond->params.ad_select;
}
static u16 aggregator_identifier;
@@ -1868,11 +1867,10 @@ static u16 aggregator_identifier;
* bond_3ad_initialize - initialize a bond's 802.3ad parameters and structures
* @bond: bonding struct to work on
* @tick_resolution: tick duration (millisecond resolution)
- * @lacp_fast: boolean. whether fast periodic should be used
*
* Can be called only after the mac address of the bond is set.
*/
-void bond_3ad_initialize(struct bonding *bond, u16 tick_resolution, int lacp_fast)
+void bond_3ad_initialize(struct bonding *bond, u16 tick_resolution)
{
// check that the bond is not initialized yet
if (MAC_ADDRESS_COMPARE(&(BOND_AD_INFO(bond).system.sys_mac_addr),
@@ -1880,7 +1878,6 @@ void bond_3ad_initialize(struct bonding *bond, u16 tick_resolution, int lacp_fas
aggregator_identifier = 0;
- BOND_AD_INFO(bond).lacp_fast = lacp_fast;
BOND_AD_INFO(bond).system.sys_priority = 0xFFFF;
BOND_AD_INFO(bond).system.sys_mac_addr = *((struct mac_addr *)bond->dev->dev_addr);
@@ -1903,6 +1900,7 @@ void bond_3ad_initialize(struct bonding *bond, u16 tick_resolution, int lacp_fas
int bond_3ad_bind_slave(struct slave *slave)
{
struct bonding *bond = bond_get_bond_by_slave(slave);
+ int lacp_fast = bond->params.lacp_fast;
struct port *port;
struct aggregator *aggregator;
@@ -1918,7 +1916,7 @@ int bond_3ad_bind_slave(struct slave *slave)
// port initialization
port = &(SLAVE_AD_INFO(slave).port);
- ad_initialize_port(port, BOND_AD_INFO(bond).lacp_fast);
+ ad_initialize_port(port, lacp_fast);
port->slave = slave;
port->actor_port_number = SLAVE_AD_INFO(slave).id;
@@ -2473,3 +2471,34 @@ void bond_3ad_lacpdu_recv(struct sk_buff *skb, struct bonding *bond,
bond_3ad_rx_indication((struct lacpdu *) skb->data, slave, skb->len);
read_unlock(&bond->lock);
}
+
+/*
+ * When the lacp_rate parameter is modified via sysfs,
+ * update actor_oper_port_state of each port.
+ *
+ * Hold slave->state_machine_lock,
+ * so we can modify port->actor_oper_port_state
+ * whether the bond is up or down.
+ */
+void bond_3ad_update_lacp_rate(struct bonding *bond)
+{
+ int i;
+ struct slave *slave;
+ struct port *port = NULL;
+ int lacp_fast;
+
+ read_lock(&bond->lock);
+ lacp_fast = bond->params.lacp_fast;
+
+ bond_for_each_slave(bond, slave, i) {
+ port = &(SLAVE_AD_INFO(slave).port);
+ __get_state_machine_lock(port);
+ if (lacp_fast)
+ port->actor_oper_port_state |= AD_STATE_LACP_TIMEOUT;
+ else
+ port->actor_oper_port_state &= ~AD_STATE_LACP_TIMEOUT;
+ __release_state_machine_lock(port);
+ }
+
+ read_unlock(&bond->lock);
+}
diff --git a/drivers/net/bonding/bond_3ad.h b/drivers/net/bonding/bond_3ad.h
index 0ee3f16..1682e69 100644
--- a/drivers/net/bonding/bond_3ad.h
+++ b/drivers/net/bonding/bond_3ad.h
@@ -253,10 +253,6 @@ struct ad_system {
struct ad_bond_info {
struct ad_system system; /* 802.3ad system structure */
u32 agg_select_timer; // Timer to select aggregator after all adapter's hand shakes
- u32 agg_select_mode; // Mode of selection of active aggregator(bandwidth/count)
- int lacp_fast; /* whether fast periodic tx should be
- * requested
- */
struct timer_list ad_timer;
};
@@ -269,7 +265,7 @@ struct ad_slave_info {
};
// ================= AD Exported functions to the main bonding code ==================
-void bond_3ad_initialize(struct bonding *bond, u16 tick_resolution, int lacp_fast);
+void bond_3ad_initialize(struct bonding *bond, u16 tick_resolution);
int bond_3ad_bind_slave(struct slave *slave);
void bond_3ad_unbind_slave(struct slave *slave);
void bond_3ad_state_machine_handler(struct work_struct *);
@@ -282,5 +278,6 @@ int bond_3ad_xmit_xor(struct sk_buff *skb, struct net_device *dev);
void bond_3ad_lacpdu_recv(struct sk_buff *skb, struct bonding *bond,
struct slave *slave);
int bond_3ad_set_carrier(struct bonding *bond);
+void bond_3ad_update_lacp_rate(struct bonding *bond);
#endif //__BOND_3AD_H__
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 716c852..bb1af9c 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -1843,8 +1843,7 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
/* Initialize AD with the number of times that the AD timer is called in 1 second
* can be called only after the mac address of the bond is set
*/
- bond_3ad_initialize(bond, 1000/AD_TIMER_INTERVAL,
- bond->params.lacp_fast);
+ bond_3ad_initialize(bond, 1000/AD_TIMER_INTERVAL);
} else {
SLAVE_AD_INFO(new_slave).id =
SLAVE_AD_INFO(new_slave->prev).id + 1;
diff --git a/drivers/net/bonding/bond_sysfs.c b/drivers/net/bonding/bond_sysfs.c
index 88fcb25..03d1196 100644
--- a/drivers/net/bonding/bond_sysfs.c
+++ b/drivers/net/bonding/bond_sysfs.c
@@ -804,6 +804,7 @@ static ssize_t bonding_store_lacp(struct device *d,
if ((new_value == 1) || (new_value == 0)) {
bond->params.lacp_fast = new_value;
+ bond_3ad_update_lacp_rate(bond);
pr_info("%s: Setting LACP rate to %s (%d).\n",
bond->dev->name, bond_lacp_tbl[new_value].modename,
new_value);
--
1.7.4.4
* [PATCH 2/2] net: dummy: allocate devices with alloc_netdev_id
From: Lucian Adrian Grijincu @ 2011-06-07 1:39 UTC
To: netdev, David S. Miller; +Cc: Eric Dumazet, Lucian Adrian Grijincu
In-Reply-To: <1307410786-19110-1-git-send-email-lucian.grijincu@gmail.com>
The most likely case is that no one else is registering devices with a
name like "dummy%d".
We can bring the complexity down by replacing:
- alloc_netdev, which is O(N), with
- alloc_netdev_id, which in the average case is O(1).
$ time modprobe dummy numdummies=5000
- with alloc_netdev : 9.50s
- with alloc_netdev_id: 3.50s
NOTE: Stats generated on a heavily patched 3.0-rc1 which replaces the
current O(N^2) sysctl algorithm with a better one.
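A back-of-the-envelope cost model, assuming each '%d' allocation walks every device already present and each hinted allocation costs one name lookup. It counts name-allocation steps only; the wall-clock figures above also include sysctl, sysfs and other registration work. scan_cost and hinted_cost are hypothetical names.

```c
#include <assert.h>

/*
 * Hypothetical cost model counting name-allocation steps only.
 * scan_cost: the i-th '%d' allocation (i = 0 .. n-1) walks the i devices
 * that already exist, so the total is 0 + 1 + ... + (n - 1) = n(n - 1)/2.
 */
static unsigned long long scan_cost(unsigned long long n)
{
	assert(n <= (1ULL << 32));	/* keeps n * (n - 1) within 64 bits */
	return n == 0 ? 0 : n * (n - 1) / 2;
}

/* hinted_cost: one name lookup per allocation when the hinted slot is free. */
static unsigned long long hinted_cost(unsigned long long n)
{
	return n;
}
```

For numdummies=5000 the scan model gives 12,497,500 steps versus 5,000 with the hint: an O(N) cost per device adds up to O(N^2) over the whole module load.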
Signed-off-by: Lucian Adrian Grijincu <lucian.grijincu@gmail.com>
---
drivers/net/dummy.c | 4 +++-
1 files changed, 3 insertions(+), 1 deletions(-)
diff --git a/drivers/net/dummy.c b/drivers/net/dummy.c
index 39cf9b9..24d4ee5 100644
--- a/drivers/net/dummy.c
+++ b/drivers/net/dummy.c
@@ -159,12 +159,14 @@ static struct rtnl_link_ops dummy_link_ops __read_mostly = {
module_param(numdummies, int, 0);
MODULE_PARM_DESC(numdummies, "Number of dummy pseudo devices");
+
+static int last_device_id = -1;
static int __init dummy_init_one(void)
{
struct net_device *dev_dummy;
int err;
- dev_dummy = alloc_netdev(0, "dummy%d", dummy_setup);
+ dev_dummy = alloc_netdev_id(0, "dummy%d", dummy_setup, &last_device_id);
if (!dev_dummy)
return -ENOMEM;
--
1.7.5.2.317.g391b14
* [PATCH 1/2] net: add alloc_netdev_mqs_id
From: Lucian Adrian Grijincu @ 2011-06-07 1:39 UTC
To: netdev, David S. Miller; +Cc: Eric Dumazet, Lucian Adrian Grijincu
In-Reply-To: <1307410786-19110-1-git-send-email-lucian.grijincu@gmail.com>
The complexity of alloc_netdev_mqs depends on the type of the device name:
- O(nr-net-devices) - for a device name with '%d' in it
- O(1) - for a given device name without any format.
The difference comes from the path chosen in __dev_alloc_name: if '%d'
is found in the name (e.g. 'dummy%d') it will:
- match all the devices in that network namespace against the device
name format, extracting all used values for '%d' (e.g. 'dummy0',
'dummy1', 'dummy3' => {0, 1, 3} are used)
- create a device with the smallest unused value (e.g. 'dummy2').
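The '%d' path described above can be sketched in user space as follows. This is an illustration, not the kernel's __dev_alloc_name: the names array stands in for for_each_netdev, MAX_IDS is an assumed bound (the kernel sizes its bitmap to a page), and smallest_free_id is a hypothetical helper. The snprintf round-trip, a check __dev_alloc_name also performs, rejects names that sscanf parses but that the pattern would never produce.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define IFNAMSIZ 16   /* same value as the kernel's IFNAMSIZ */
#define MAX_IDS 256   /* hypothetical bound; the kernel sizes a page bitmap */

/*
 * Walk every existing name (the O(N) part), mark each id already used
 * for 'pattern', then return the smallest unused id, or -1 if all
 * MAX_IDS ids are taken. 'pattern' is expected to contain one %d.
 */
static int smallest_free_id(const char *pattern,
			    const char *const names[], size_t nnames)
{
	unsigned char used[MAX_IDS] = { 0 };
	char buf[IFNAMSIZ];

	assert(strchr(pattern, '%') != NULL);

	for (size_t k = 0; k < nnames; k++) {
		int id;

		if (sscanf(names[k], pattern, &id) != 1)
			continue;	/* name does not follow the pattern */
		if (id < 0 || id >= MAX_IDS)
			continue;
		/* sscanf is not an exact inverse of snprintf: "dummy1x" and
		 * "dummy01" both parse, so only count exact round-trips */
		snprintf(buf, sizeof(buf), pattern, id);
		if (strncmp(buf, names[k], IFNAMSIZ) == 0)
			used[id] = 1;
	}

	for (int id = 0; id < MAX_IDS; id++)
		if (!used[id])
			return id;
	return -1;
}
```

With 'dummy0', 'dummy1', 'dummy3' and 'eth0' present, smallest_free_id("dummy%d", ...) returns 2, matching the example in the text.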
Obviously the O(N) part comes from the for_each_netdev loop. One could keep
around a precomputed table of values that are in use for each pattern
of interest (patterns for which large numbers of devices will be
created) and make sure to mark slots as unused when
unregistering a device. The table would serve no purpose after
registration and would need to be netns-specific.
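The precomputed-table alternative mentioned above might look roughly like this. It is hypothetical throughout (struct id_table, TABLE_IDS and both helpers are invented for illustration) and ignores the netns and rename complications. Claiming an id still scans the table, but it no longer walks every device in the namespace or parses names.

```c
#include <assert.h>
#include <stdbool.h>

#define TABLE_IDS 64	/* hypothetical capacity */

/* Hypothetical per-namespace, per-pattern record of ids in use. */
struct id_table {
	bool used[TABLE_IDS];
};

/* On register: claim the smallest free id, or return -1 if full.
 * Scans the table, not the device list, and parses no names. */
static int id_table_claim(struct id_table *t)
{
	for (int id = 0; id < TABLE_IDS; id++) {
		if (!t->used[id]) {
			t->used[id] = true;
			return id;
		}
	}
	return -1;
}

/* On unregister (or a rename away from the pattern): free the slot. */
static void id_table_release(struct id_table *t, int id)
{
	assert(id >= 0 && id < TABLE_IDS && t->used[id]);
	t->used[id] = false;
}
```

The bookkeeping burden is visible in id_table_release: every unregister and every rename that leaves the pattern must call it, or the table drifts out of sync with the real device list.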
Things get more complicated when taking into consideration device
renames and registration of devices that do not use patterns in names
(e.g. an explicit registration of a device with the 'dummy3' name).
This patch adds a new method of creating device names that aims to sit
in the middle: it accepts a device name pattern with '%d' and the last
value used for '%d'. If the next slot is not taken, alloc_netdev_mqs_id
is an O(1) operation. If that name is taken, it falls back on
the O(N) algorithm.
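A user-space sketch of the hinted allocation, assuming a plain array of names in place of the namespace's device list. alloc_name_with_hint and name_taken are hypothetical, and MAX_IDS is an assumed bound. Unlike the patch, which probes by calling alloc_netdev_mqs, this sketch checks the candidate name directly, and its fallback tries ids upward from 0 instead of building a bitmap; both approaches yield the smallest unused id.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define IFNAMSIZ 16   /* same value as the kernel's IFNAMSIZ */
#define MAX_IDS 256   /* hypothetical upper bound for the fallback */

/* Linear for brevity; the kernel's name lookup is hashed. */
static bool name_taken(const char *name, const char *const names[], size_t n)
{
	for (size_t k = 0; k < n; k++)
		if (strncmp(name, names[k], IFNAMSIZ) == 0)
			return true;
	return false;
}

/*
 * Try the name for id *p_last_id + 1 first; if it is taken, fall back to
 * the smallest unused id. On success, write the name to 'out', store the
 * chosen id in *p_last_id and return 0; return -1 if nothing is free.
 */
static int alloc_name_with_hint(const char *pattern,
				const char *const names[], size_t n,
				int *p_last_id, char out[IFNAMSIZ])
{
	int id = *p_last_id + 1;

	assert(strchr(pattern, '%') != NULL);

	/* fast path: one lookup for the slot after the last one handed out */
	snprintf(out, IFNAMSIZ, pattern, id);
	if (!name_taken(out, names, n))
		goto done;

	/* slow path: probe ids from 0 upward, yielding the smallest free one */
	for (id = 0; id < MAX_IDS; id++) {
		snprintf(out, IFNAMSIZ, pattern, id);
		if (!name_taken(out, names, n))
			goto done;
	}
	return -1;

done:
	*p_last_id = id;
	return 0;
}
```

With 'eth0' through 'eth7' allocated, a last id of 7 and 'eth2' renamed away, the sketch returns 'eth8'. This is the trade-off the patch's kernel-doc comment describes: the hint is fast, but it does not reuse the freed 'eth2'.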
Signed-off-by: Lucian Adrian Grijincu <lucian.grijincu@gmail.com>
---
include/linux/netdevice.h | 7 +++++
net/core/dev.c | 63 +++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 70 insertions(+), 0 deletions(-)
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index ca333e7..612c1f3 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2452,9 +2452,16 @@ extern void ether_setup(struct net_device *dev);
extern struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name,
void (*setup)(struct net_device *),
unsigned int txqs, unsigned int rxqs);
+extern struct net_device *alloc_netdev_mqs_id(int sizeof_priv, const char *name,
+ void (*setup)(struct net_device *),
+ unsigned int txqs, unsigned int rxqs, int *p_last_id);
+
#define alloc_netdev(sizeof_priv, name, setup) \
alloc_netdev_mqs(sizeof_priv, name, setup, 1, 1)
+#define alloc_netdev_id(sizeof_priv, name, setup, p_last_id) \
+ alloc_netdev_mqs_id(sizeof_priv, name, setup, 1, 1, p_last_id)
+
#define alloc_netdev_mq(sizeof_priv, name, setup, count) \
alloc_netdev_mqs(sizeof_priv, name, setup, count, count)
diff --git a/net/core/dev.c b/net/core/dev.c
index 9393078..0862e81 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5908,6 +5908,69 @@ free_p:
}
EXPORT_SYMBOL(alloc_netdev_mqs);
+
+/**
+ * alloc_netdev_mqs_id - allocate a network device
+ *
+ * @name: format of the device name. E.g. 'dummy%d'
+ * @p_last_id: IN: last known value that was given to '%d'
+ *             OUT: the value used for '%d' for the newly created device
+ *
+ * @sizeof_priv: size of private data to allocate space for
+ * @setup: callback to initialize device
+ * @txqs: the number of TX subqueues to allocate
+ * @rxqs: the number of RX subqueues to allocate
+ *
+ * alloc_netdev_mqs' complexity depends on the device name:
+ * - O(nr-net-devices) - for a device name with '%d' in it
+ * - O(1) - for a given device name without any format.
+ *
+ * alloc_netdev_mqs_id takes an extra argument: the last value that was
+ * used to fill '%d' in the name pattern. It uses this to build a name
+ * that is probably unused (last_id+1) and tries to register a
+ * device with that name - O(1). If that fails, it falls back to the O(N)
+ * algorithm by passing the name format instead.
+ *
+ * alloc_netdev_mqs will always make sure to find the smallest unused
+ * value for the '%d' in the name. alloc_netdev_mqs_id does not.
+ *
+ * E.g.:
+ * - you create 8 devices by calling alloc_netdev_mqs_id ('eth0' .. 'eth7')
+ * and you know the next free slot is 'eth8'.
+ * - someone renames 'eth2' to 'some-other-name'
+ * - the next device created by alloc_netdev_mqs_id will be 'eth8'
+ * even though 'eth2' could have been used.
+ */
+struct net_device *alloc_netdev_mqs_id(int sizeof_priv, const char *name,
+ void (*setup)(struct net_device *),
+ unsigned int txqs, unsigned int rxqs, int *p_last_id)
+{
+ struct net_device *dev;
+ char buf[IFNAMSIZ];
+
+ int new_id = (*p_last_id) + 1;
+
+ /* first try with explicit name - O(1) */
+ snprintf(buf, IFNAMSIZ, name, new_id);
+ dev = alloc_netdev_mqs(sizeof_priv, buf, setup, txqs, rxqs);
+ if (dev)
+ goto out;
+
+ /* fallback: create a name automatically - O(N) */
+ dev = alloc_netdev_mqs(sizeof_priv, name, setup, txqs, rxqs);
+ if (!dev)
+ goto fail;
+
+ sscanf(dev->name, name, &new_id);
+
+out:
+ *p_last_id = new_id;
+fail:
+ return dev;
+}
+EXPORT_SYMBOL(alloc_netdev_mqs_id);
+
+
/**
* free_netdev - free network device
* @dev: device
--
1.7.5.2.317.g391b14
* [PATCH 0/2] speed up net device allocation using pattern names
From: Lucian Adrian Grijincu @ 2011-06-07 1:39 UTC
To: netdev, David S. Miller; +Cc: Eric Dumazet, Lucian Adrian Grijincu
The next two patches:
- add a faster way to add net devices using pattern names like "dummy%d"
- call that routine for dummy
Patches are against net-next, but should apply cleanly to 3.0-rc1 too.
Lucian Adrian Grijincu (2):
net: add alloc_netdev_mqs_id
net: dummy: allocate devices with alloc_netdev_id
drivers/net/dummy.c | 4 ++-
include/linux/netdevice.h | 7 +++++
net/core/dev.c | 63 +++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 73 insertions(+), 1 deletions(-)
--
1.7.5.2.317.g391b14