* [PATCH]: bnx2: Fix uninitialized variable warning in bnx2_disable_forced_2g5()
From: Prarit Bhargava @ 2010-05-27 22:13 UTC (permalink / raw)
To: netdev, mchan; +Cc: Prarit Bhargava
Fix warning:
drivers/net/bnx2.c: In function 'bnx2_disable_forced_2g5':
drivers/net/bnx2.c:1489: warning: 'bmcr' may be used uninitialized in this function
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
diff --git a/drivers/net/bnx2.c b/drivers/net/bnx2.c
index 667f419..7e2f8ac 100644
--- a/drivers/net/bnx2.c
+++ b/drivers/net/bnx2.c
@@ -1486,7 +1486,7 @@ bnx2_enable_forced_2g5(struct bnx2 *bp)
static void
bnx2_disable_forced_2g5(struct bnx2 *bp)
{
- u32 bmcr;
+ u32 uninitialized_var(bmcr);
if (!(bp->phy_flags & BNX2_PHY_FLAG_2_5G_CAPABLE))
return;
^ permalink raw reply related
* Re: boot crash in arp_error_report()
From: David Miller @ 2010-05-27 23:10 UTC (permalink / raw)
To: eric.dumazet; +Cc: torvalds, mingo, tglx, akpm, netdev, linux-kernel
In-Reply-To: <1274991504.2446.13.camel@edumazet-laptop>
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 27 May 2010 22:18:24 +0200
> [PATCH] net: fix __neigh_event_send()
>
> commit 7fee226ad23 (net: add a noref bit on skb dst) missed one spot
> where an skb is enqueued, with a possibly not refcounted dst entry.
>
> __neigh_event_send() inserts skb into arp_queue, so we must make sure
> dst entry is refcounted, or dst entry can be freed by garbage collector
> after caller exits from rcu protected section.
>
> Reported-by: Ingo Molnar <mingo@elte.hu>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Applied, thanks Eric.
Ingo can we get a confirmation that this fixes the bootup crash?
Thanks!
^ permalink raw reply
* Re: [GIT PULL net-2.6] vhost-net: error handling fixes
From: David Miller @ 2010-05-27 23:13 UTC (permalink / raw)
To: mst; +Cc: kvm, virtualization, netdev, linux-kernel, krkumar2
In-Reply-To: <20100527115714.GA7899@redhat.com>
From: "Michael S. Tsirkin" <mst@redhat.com>
Date: Thu, 27 May 2010 14:57:14 +0300
> David,
> The following tree includes fixes dealing with error handling
> in vhost-net. It is on top of net-2.6.
> Please merge it for 2.6.35.
> Thanks!
>
> The following changes since commit 8a74ad60a546b13bd1096b2a61a7a5c6fd9ae17c:
>
> net: fix lock_sock_bh/unlock_sock_bh (2010-05-27 00:30:53 -0700)
>
> are available in the git repository at:
> git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git vhost-net
Pulled, thanks Michael.
^ permalink raw reply
* Re: ipv6: Add GSO support on forwarding path
From: David Miller @ 2010-05-27 23:14 UTC (permalink / raw)
To: herbert; +Cc: ralf, netdev
In-Reply-To: <20100527115440.GA9952@gondor.apana.org.au>
From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Thu, 27 May 2010 21:54:40 +1000
> On Thu, May 27, 2010 at 08:24:29PM +1000, Herbert Xu wrote:
>>
>> Actually, this patch is still not quite right for the GSO case.
>> Right now it's only adding the IP header size, not the full header
>> size.
>>
>> I need to think a bit more about this.
>
> Let's do it the old way first, so that it at least works for the
> common case. I'll fix it properly later.
>
> ipv6: Add GSO support on forwarding path
Ok, applied, thanks Herbert.
^ permalink raw reply
* Re: [PATCH]: niu: fix uninitialized variable warning
From: David Miller @ 2010-05-27 23:23 UTC (permalink / raw)
To: prarit; +Cc: netdev
In-Reply-To: <20100527183100.24251.12228.sendpatchset@prarit.bos.redhat.com>
From: Prarit Bhargava <prarit@redhat.com>
Date: Thu, 27 May 2010 14:35:03 -0400
> Fixes warning:
>
> drivers/net/niu.c: In function ‘niu_process_rx_pkt’:
> drivers/net/niu.c:3457: error: ‘link’ may be used uninitialized in this function
>
> Signed-off-by: Prarit Bhargava <prarit@redhat.com>
These blind uninitialized_var() annotations are almost always the wrong
thing to do. And that is the case here.
The code that looks up the page here can return NULL if there is some
error and the page hashes aren't setup properly, and then the callers
go and blindly assume that 'page' is non-NULL without any error
checking.
Applying this patch does more harm than good because it hides this
issue. So I'm definitely not applying this patch.
^ permalink raw reply
* Re: [PATCH]: bnx2: Fix uninitialized variable warning in bnx2_disable_forced_2g5()
From: David Miller @ 2010-05-27 23:26 UTC (permalink / raw)
To: prarit; +Cc: netdev, mchan
In-Reply-To: <20100527220856.25418.86593.sendpatchset@prarit.bos.redhat.com>
From: Prarit Bhargava <prarit@redhat.com>
Date: Thu, 27 May 2010 18:13:01 -0400
> Fix warning:
>
> drivers/net/bnx2.c: In function 'bnx2_disable_forced_2g5':
> drivers/net/bnx2.c:1489: warning: 'bmcr' may be used uninitialized in this function
>
> Signed-off-by: Prarit Bhargava <prarit@redhat.com>
This is another bad patch, you're hiding a bug by making this warning
go away.
If bnx2_read_phy() in fact does not initialize "bmcr" we might
want to do something about that instead of just blindly using
whatever garbage is in that local variable.
Please stop making these patches without applying at least some
intelligence to the analysis of why these warnings are being
printed at all.
None of these cases you're posting patches for are situations where
the compiler simply can't see the control flow properly, they are real
issues.
Thanks.
^ permalink raw reply
* Re: [PATCH net-2.6] be2net: Patch removes redundant while statement in loop.
From: David Miller @ 2010-05-27 23:28 UTC (permalink / raw)
To: sarveshwarb; +Cc: netdev
In-Reply-To: <20100527093208.GA13740@serverengines.com>
From: Sarveshwar Bandi <sarveshwarb@serverengines.com>
Date: Thu, 27 May 2010 15:02:19 +0530
> Signed-off-by: Sarveshwar Bandi <sarveshwarb@serverengines.com>
Applied, thanks.
^ permalink raw reply
* Re: [PATCH 4/11] drivers/net/hamradio: Eliminate a NULL pointer dereference
From: David Miller @ 2010-05-27 23:29 UTC (permalink / raw)
To: julia; +Cc: jpr, linux-hams, netdev, linux-kernel, kernel-janitors
In-Reply-To: <Pine.LNX.4.64.1005271432450.5422@ask.diku.dk>
From: Julia Lawall <julia@diku.dk>
Date: Thu, 27 May 2010 14:33:00 +0200 (CEST)
> At the point of the print, dev is NULL.
>
> A simplified version of the semantic match that finds this problem is as
> follows: (http://coccinelle.lip6.fr/)
...
> Signed-off-by: Julia Lawall <julia@diku.dk>
Applied.
^ permalink raw reply
* Re: [PATCH 6/11] drivers/net: Eliminate a NULL pointer dereference
From: David Miller @ 2010-05-27 23:30 UTC (permalink / raw)
To: julia; +Cc: netdev, linux-kernel, kernel-janitors
In-Reply-To: <Pine.LNX.4.64.1005271433180.5422@ask.diku.dk>
From: Julia Lawall <julia@diku.dk>
Date: Thu, 27 May 2010 14:33:32 +0200 (CEST)
> At the point of the print, dev is NULL.
...
> Signed-off-by: Julia Lawall <julia@diku.dk>
Applied, thanks.
^ permalink raw reply
* Re: [PATCH] cnic: Fix context memory init. on 5709.
From: David Miller @ 2010-05-27 23:31 UTC (permalink / raw)
To: mchan; +Cc: netdev
In-Reply-To: <1274991858-6201-1-git-send-email-mchan@broadcom.com>
From: "Michael Chan" <mchan@broadcom.com>
Date: Thu, 27 May 2010 13:24:18 -0700
> We need to zero context memory on 5709 in the function cnic_init_context().
> Without this, iscsid restart on 5709 will not work because of stale data.
> TX context blocks should not be initialized by cnic_init_context() because
> of the special remapping on 5709.
>
> Update version to 2.1.2.
>
> Signed-off-by: Michael Chan <mchan@broadcom.com>
Applied, thanks Michael.
^ permalink raw reply
* RE: issue about virtio-net driver to suppoprt vhost mergeable buffer with zero-copy to support PS mode
From: Xin, Xiaohui @ 2010-05-28 1:13 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: netdev@vger.kernel.org, kvm@vger.kernel.org,
linux-kernel@vger.kernel.org, Herbert Xu
In-Reply-To: <20100527082025.GA5579@redhat.com>
Michael,
What's you have suggested could avoid to taint the guest virtio-net driver. Really thanks!
For the two alternative, the first will do more trick with driver or net-core stuff. So currently, I prefer to try the second one. Anyway, let me have a good think of it. Thanks!
Thanks
Xiaohui
>-----Original Message-----
>From: kvm-owner@vger.kernel.org [mailto:kvm-owner@vger.kernel.org] On Behalf Of
>Michael S. Tsirkin
>Sent: Thursday, May 27, 2010 4:20 PM
>To: Xin, Xiaohui
>Cc: netdev@vger.kernel.org; kvm@vger.kernel.org; linux-kernel@vger.kernel.org; Herbert
>Xu
>Subject: Re: issue about virtio-net driver to suppoprt vhost mergeable buffer with zero-copy
>to support PS mode
>
>On Thu, May 27, 2010 at 09:21:02AM +0800, Xin, Xiaohui wrote:
>> Michael,
>> I'm now looking into the vhost mergeable buffer, and I tried to use it to support PS mode
>with zero-copy. And I found an issue there that I have to modify the guest virito-net driver.
>>
>> When guest virtio-net driver submits mergeable buffers, it submits multiple pages outside.
>In zero-copy case, vhost cannot know which page is used to put header, and which page is
>used to put payload. Then vhost can only reserves 12 bytes for each page. That means, the
>page_offset of the payload DMAed into the guest buffer is always 12 bytes. But guest
>virtio-net driver always use offset 0 to put the data (See receive_mergeable()). That's where
>the zero-copy use mergeable buffer must modify.
>>
>> Have I missed something here? And how do you think about it?
>>
>> Thanks
>> Xiaohui
>
>Maybe you can teach the hardware skip the first 12 bytes: qemu will
>call an ioctl telling hardware what the virtio header size is.
>This is how we plan to do it for tap.
>
>Alternatively, buffers can be used in any order.
>So we can have hardware use N buffers for the packet, and then
>have vhost put the header in buffer N+1.
>
>--
>MST
>--
>To unsubscribe from this list: send the line "unsubscribe kvm" in
>the body of a message to majordomo@vger.kernel.org
>More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* Re: [v5 Patch 1/3] netpoll: add generic support for bridge and bonding devices
From: Cong Wang @ 2010-05-28 2:47 UTC (permalink / raw)
To: Flavio Leitner
Cc: linux-kernel, Matt Mackall, netdev, bridge, Andy Gospodarek,
Neil Horman, Jeff Moyer, Stephen Hemminger, bonding-devel,
Jay Vosburgh, David Miller
In-Reply-To: <20100527180545.GA2345@sysclose.org>
On 05/28/10 02:05, Flavio Leitner wrote:
>
> Hi guys!
>
> I finally could test this to see if an old problem reported on bugzilla[1] was
> fixed now, but unfortunately it is still there.
>
> The ticket is private I guess, but basically the problem happens when bonding
> driver tries to print something after it had taken the write_lock (monitor
> functions, enslave/de-enslave), so the printk() will pass through netpoll, then
> on bonding again which no matter what mode you use, it will try to read_lock()
> the lock again. The result is a deadlock and the entire system hangs.
This is true, I already fixed some similar issues.
>
> I manage to get a fresh backtrace with mode 1, see below:
>
>
> [ 93.167079] Call Trace:
> [ 93.167079] [<ffffffff81034cf9>] warn_slowpath_common+0x77/0x8f
> [ 93.167079] [<ffffffff81034d5e>] warn_slowpath_fmt+0x3c/0x3e
> [ 93.167079] [<ffffffff81366aef>] ? _raw_read_trylock+0x11/0x4b
> [ 93.167079] [<ffffffffa02a2c42>] ? bond_start_xmit+0x12b/0x401 [bonding]
> -> read_lock fails
> [ 93.167079] [<ffffffffa02a2c9f>] bond_start_xmit+0x188/0x401 [bonding]
> [ 93.167079] [<ffffffff81055b37>] ? trace_hardirqs_off+0xd/0xf
> [ 93.167079] [<ffffffff812dfdb9>] netpoll_send_skb+0xbd/0x1f3
> [ 93.167079] [<ffffffff812e00ed>] netpoll_send_udp+0x1fe/0x20d
> [ 93.167079] [<ffffffffa02c017c>] write_msg+0x89/0xcd [netconsole]
> [ 93.167079] [<ffffffff81034e65>] __call_console_drivers+0x67/0x79
> [ 93.167079] [<ffffffff81034ed0>] _call_console_drivers+0x59/0x5d
> [ 93.167079] [<ffffffff810352d3>] release_console_sem+0x121/0x1d7
> [ 93.167079] [<ffffffff8103590a>] vprintk+0x35d/0x393
> [ 93.167079] [<ffffffff8103f947>] ? add_timer+0x17/0x19
> [ 93.167079] [<ffffffff81046ddf>] ? queue_delayed_work_on+0xa2/0xa9
> [ 93.167079] [<ffffffff81363bb8>] printk+0x3c/0x44
> [ 93.167079] [<ffffffffa02a3b17>] bond_select_active_slave+0x105/0x109 [bonding]
> -> write_locked
> [ 93.167079] [<ffffffffa02a4798>] bond_mii_monitor+0x479/0x4ed [bonding]
> [ 93.167079] [<ffffffff81046009>] worker_thread+0x1ef/0x2e2
>
> In this case, the message should be
> "bonding: bond0: making interface eth0 the new active one"
Hmm, you triggered a warning here, let me check the source code
and try to reproduce it here.
>
> I did the following patch to discard the packet if it was IN_NETPOLL
> and the read_lock() fails, so I could go ahead testing it:
>
> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> index 5e12462..a3b8bad 100644
> --- a/drivers/net/bonding/bond_main.c
> +++ b/drivers/net/bonding/bond_main.c
> @@ -4258,8 +4258,19 @@ static int bond_xmit_activebackup(struct sk_buff *skb, struct net_device *bond_d
> struct bonding *bond = netdev_priv(bond_dev);
> int res = 1;
>
> - read_lock(&bond->lock);
> - read_lock(&bond->curr_slave_lock);
> + if (read_trylock(&bond->lock) == 0&&
> + (bond_dev->flags& IFF_IN_NETPOLL)) {
> + dev_kfree_skb(skb);
> + return NETDEV_TX_OK;
> + }
> +
> + if (read_trylock(&bond->curr_slave_lock) == 0&&
> + (bond_dev->flags& IFF_IN_NETPOLL)) {
> + read_unlock(&bond->lock);
> + dev_kfree_skb(skb);
> + return NETDEV_TX_OK;
> + }
> +
>
> if (!BOND_IS_OK(bond))
> goto out;
>
This looks like a workaround, not a fix. :)
>
> and I found another problem. The function netpoll_send_skb() checks
> if the npinfo's queue length is zero and if it's not, it will queue
> the packet to make sure it's in order and then schedule the thread
> to run. Later, the thread wakes up running queue_process() which disables
> interrupts before calling ndo_start_xmit(). However, dev_queue_xmit()
> uses rcu_*_bh() and before return, it will enable the interrupts again,
> spitting this:
>
> ------------[ cut here ]------------
> WARNING: at kernel/softirq.c:143 local_bh_enable+0x3c/0x86()
> Hardware name: Precision WorkStation 490
> Modules linked in: netconsole bonding sunrpc ip6t_REJECT xt_tcpudp nf_conntrack_ipv6]
> Pid: 17, comm: events/2 Not tainted 2.6.34-04700-gd938a70 #21
> Call Trace:
> [<ffffffff810381d6>] warn_slowpath_common+0x77/0x8f
> [<ffffffff810381fd>] warn_slowpath_null+0xf/0x11
> [<ffffffff8103d691>] local_bh_enable+0x3c/0x86
> [<ffffffff812e4d85>] dev_queue_xmit+0x462/0x493
> [<ffffffffa018805f>] bond_dev_queue_xmit+0x1bd/0x1e3 [bonding]
> [<ffffffffa01881dd>] bond_start_xmit+0x158/0x37b [bonding]
> -> interrupts disabled
> [<ffffffff812f3fca>] queue_process+0x9d/0xf9
> [<ffffffff8104d022>] worker_thread+0x19d/0x224
> [<ffffffff812f3f2d>] ? queue_process+0x0/0xf9
> [<ffffffff81050819>] ? autoremove_wake_function+0x0/0x34
> [<ffffffff8104ce85>] ? worker_thread+0x0/0x224
> [<ffffffff8105040b>] kthread+0x7a/0x82
> [<ffffffff810036d4>] kernel_thread_helper+0x4/0x10
> [<ffffffff81050391>] ? kthread+0x0/0x82
> [<ffffffff810036d0>] ? kernel_thread_helper+0x0/0x10
> ---[ end trace 74e3904503fdb632 ]---
>
> kernel/softirq.c:
> 141 static inline void _local_bh_enable_ip(unsigned long ip)
> 142 {
> 143 WARN_ON_ONCE(in_irq() || irqs_disabled());
> 144 #ifdef CONFIG_TRACE_IRQFLAGS
> 145 local_irq_disable();
> 146 #endif
> 147 /*
> 148 * Are softirqs going to be turned on now:
> 149 */
>
>
I am wondering if this was caused by the previous issue.
> The git is updated up to:
> d938a70 be2net: increase POST timeout for EEH recovery
>
> Two slave interfaces, bonding mode 1, netconsole over bond0.
>
> [1] https://bugzilla.redhat.com/show_bug.cgi?id=248374#c5
How did you reproduce this?
I will check that BZ to see if I can find how to reproduce this.
Thanks.
^ permalink raw reply
* fn_trie_lookup and fn_hash_lookup
From: ratheesh k @ 2010-05-28 3:23 UTC (permalink / raw)
To: netdev, linux-net
hi ,
I was looking into 2.6.18 kernel . I can see two functions
fn_trie_lookup , fn_hash_lookup
ipv4/fib_trie.c: tb->tb_lookup = fn_trie_lookup;
ipv4/fib_hash.c: tb->tb_lookup = fn_hash_lookup;
What is the differnce between these two functions ?
Thanks,
Ratheesh
^ permalink raw reply
* Re: [net-next-2.6 V9 PATCH 1/2] Add netlink support for virtual port management (was iovnl)
From: Scott Feldman @ 2010-05-28 3:24 UTC (permalink / raw)
To: Patrick McHardy; +Cc: davem, netdev, chrisw, arnd
In-Reply-To: <4BF54616.1020508@trash.net>
On 5/20/10 7:24 AM, "Patrick McHardy" <kaber@trash.net> wrote:
> Scott Feldman wrote:
>> +static int rtnl_vf_ports_fill(struct sk_buff *skb, struct net_device *dev)
>> +{
>> + struct nlattr *vf_ports;
>> + struct nlattr *vf_port;
>> + int vf;
>> + int err;
>> +
>> + vf_ports = nla_nest_start(skb, IFLA_VF_PORTS);
>> + if (!vf_ports)
>> + return -EMSGSIZE;
>> +
>> + for (vf = 0; vf < dev_num_vf(dev->dev.parent); vf++) {
>> + vf_port = nla_nest_start(skb, IFLA_VF_PORT);
>> + if (!vf_port) {
>> + nla_nest_cancel(skb, vf_ports);
>> + return -EMSGSIZE;
>> + }
>> + NLA_PUT_U32(skb, IFLA_PORT_VF, vf);
>> + err = dev->netdev_ops->ndo_get_vf_port(dev, vf, skb);
>> + if (err) {
>> +nla_put_failure:
>> + nla_nest_cancel(skb, vf_port);
>> + continue;
>
> Are you sure you want to continue here? During a full dump
> this means that the last VF port fitting in the skb will most
> likely be incomplete since the higher layer code won't
> notice that the size was exceeded and the entry should be
> dumped into another skb.
Drats, that's a bug. I'll get that fixed. Thanks Patrick for catching it.
-scott
^ permalink raw reply
* Re: fn_trie_lookup and fn_hash_lookup
From: Stephen Hemminger @ 2010-05-28 3:40 UTC (permalink / raw)
To: ratheesh k; +Cc: netdev, linux-net
In-Reply-To: <AANLkTikmIzPaOKdlXtCwosfJXzaKY1vxib_MdqI0vN_5@mail.gmail.com>
On Fri, 28 May 2010 08:53:11 +0530
ratheesh k <ratheesh.ksz@gmail.com> wrote:
> hi ,
>
> I was looking into 2.6.18 kernel . I can see two functions
> fn_trie_lookup , fn_hash_lookup
>
> ipv4/fib_trie.c: tb->tb_lookup = fn_trie_lookup;
> ipv4/fib_hash.c: tb->tb_lookup = fn_hash_lookup;
>
> What is the differnce between these two functions ?
They are the two possible FIB algorithms configurable.
net/ipv4/Kconfig:
choice
prompt "Choose IP: FIB lookup algorithm (choose FIB_HASH if unsure)"
depends on IP_ADVANCED_ROUTER
default ASK_IP_FIB_HASH
config ASK_IP_FIB_HASH
bool "FIB_HASH"
---help---
Current FIB is very proven and good enough for most users.
config IP_FIB_TRIE
bool "FIB_TRIE"
---help---
Use new experimental LC-trie as FIB lookup algorithm.
This improves lookup performance if you have a large
number of routes.
LC-trie is a longest matching prefix lookup algorithm which
performs better than FIB_HASH for large routing tables.
But, it consumes more memory and is more complex.
LC-trie is described in:
IP-address lookup using LC-tries. Stefan Nilsson and Gunnar Karlsson
IEEE Journal on Selected Areas in Communications, 17(6):1083-1092,
June 1999
An experimental study of compression methods for dynamic tries
Stefan Nilsson and Matti Tikkanen. Algorithmica, 33(1):19-33, 2002.
http://www.nada.kth.se/~snilsson/public/papers/dyntrie2/
Also see Documentation/networking/fib_trie.txt
^ permalink raw reply
* Re: [PATCH 1/2] netdev/fec: fix performance impact from mdio poll operation
From: Bryan Wu @ 2010-05-28 5:23 UTC (permalink / raw)
To: David Miller; +Cc: s.hauer, gerg, amit.kucheria, netdev, linux-kernel
In-Reply-To: <20100516.002818.133869940.davem@davemloft.net>
On 05/16/2010 03:28 PM, David Miller wrote:
> From: Bryan Wu <bryan.wu@canonical.com>
> Date: Fri, 7 May 2010 10:27:18 +0800
>
>> BugLink: http://bugs.launchpad.net/bugs/546649
>> BugLink: http://bugs.launchpad.net/bugs/457878
>>
>> After introducing phylib supporting, users experienced performace drop. That is
>> because of the mdio polling operation of phylib. Use msleep to replace the busy
>> waiting cpu_relax() and remove the warning message.
>>
>> Signed-off-by: Bryan Wu <bryan.wu@canonical.com>
>> Acked-by: Andy Whitcroft <apw@canonical.com>
>
> As you've already been told, making these MDIO interfaces fail silently
> is not acceptable.
>
> Please fix this bug properly and resubmit this patch series.
>
So sorry for the delay. My board's broken for several weeks. After I got a new,
I will try to fix this and resubmit this patch.
Thanks
--
Bryan Wu <bryan.wu@canonical.com>
Kernel Developer +86.138-1617-6545 Mobile
Ubuntu Kernel Team | Hardware Enablement Team
Canonical Ltd. www.canonical.com
Ubuntu - Linux for human beings | www.ubuntu.com
^ permalink raw reply
* Re: [RFC] netfilter: WIP: Xtables idletimer target implementation
From: Luciano Coelho @ 2010-05-28 5:25 UTC (permalink / raw)
To: ext Jan Engelhardt
Cc: netfilter-devel@vger.kernel.org, netdev@vger.kernel.org,
kaber@trash.net, Timo Teras
In-Reply-To: <alpine.LSU.2.01.1005280057190.14077@obet.zrqbmnf.qr>
Hi Jan,
Thanks a lot for your comments!
On Fri, 2010-05-28 at 01:17 +0200, ext Jan Engelhardt wrote:
> On Thursday 2010-05-27 22:54, Luciano Coelho wrote:
> >There are still some issues to be resolved:
> >
> >How to treat several rules for the same interface?
> >
> >We need a key for the timer list. I'm using the targinfo pointer for that,
> >but this looks shaky, because there is no assurance that this pointer will
> >live for the entire lifetime of the rule.
>
> Well xt_quota for example has its targinfo around at all times,
> as have other modules.
Great, so this will work. I had checked the x_tables code and it seemed
that the lifetime of targinfo was sufficient, but I was not sure this
would be safe in the future. Now, if this changes in the future, my
module won't be the only one to break ;)
> >This is now an x_tables target, so other protocols need to be implemented.
>
> Huh? You're not looking at packets, so why does it need proto-specific
> stuff?
I need to associate the timers with specific interfaces, because it's
the idle time of the interfaces that the userspace in interested in. I
didn't find any other way to associate the timers with them, except by
looking at the iniface and outiface values in ipt_ip (and eventually,
with IPv6 properly implemented, in ip6t_ip6). This is what Patrick
suggested when he checked my previous patch [1] and triggered me to do a
major rework on my module ;)
Do you have any other suggestion on how I can associate the rules to
specific interfaces?
> >+static
> >+struct utimer_t *__utimer_find_by_info(const struct xt_idletimer_info *info)
> >+{
> >+ struct utimer_t *entry;
> >+
> >+ list_for_each_entry(entry, &active_utimer_head, entry) {
> >+ if (info == entry->info) {
> >+ return entry;
> >+ }
> >+ }
> >+
> >+ return NULL;
> >+}
>
> Can do with less braces.
Sure. These remained there after I removed some traces. I'll clean
this up.
> >+static void utimer_expired(unsigned long data)
> >+{
> >+ struct utimer_t *timer = (struct utimer_t *) data;
> >+
> >+ pr_debug("xt_idletimer: timer '%s' ns %p expired\n",
> >+ timer->name, timer->net);
> >+
> >+ schedule_work(&timer->work);
> >+}
>
> You don't need xt_idletimer, because pr_debug already prints
> that (with #define pr_fmt(fmt) KBUILD_MODNAME ": " as many
> other modules do)
Ok, I'll change it. Thanks for pointing out.
> >+
> >+static struct utimer_t *utimer_create(const char *name,
> >+ struct net *net,
> >+ const struct xt_idletimer_info *info)
> >+{
> >+ struct utimer_t *timer;
> >+
> >+ timer = kmalloc(sizeof(struct utimer_t), GFP_ATOMIC);
> >+ if (timer == NULL)
> >+ return NULL;
>
> This is called from user context, so GFP_KERNEL will perfectly suffice.
Yup, will change.
> >+static int xt_idletimer_checkentry(const struct xt_tgchk_param *par)
> >+{
> >+ const struct xt_idletimer_info *info = par->targinfo;
> >+ const struct ipt_entry *entryinfo = par->entryinfo;
> >+ const struct ipt_ip *ip = &entryinfo->ip;
>
> I'm not sure spying on ipt_ip is a long-term viable solution.
Do you have any other suggestions on how I could get an interface
associated with the rule? I thought about having the userspace pass the
interface as an option to the rule (like I already do for the timeout
value), but that looked ugly to me, since the interface can already be
defined as part of the ruleset.
> >+ pr_debug("xt_idletimer: checkentry targinfo %p\n", par->targinfo);
> >+
> >+ if (info->timeout == 0) {
> >+ pr_debug("xt_idletimer: timeout value is zero\n");
> >+ return -EINVAL;
> >+ }
> >+
> >+ /* FIXME: implement support for other protocol families */
> >+ switch (par->family) {
> >+ case NFPROTO_IPV4 :
> >+ pr_debug("xt_idletimer: NFPROTO_IPV4\n");
> >+ break;
> >+
> >+ default:
> >+ pr_debug("xt_idletimer: Unsupported protocol family family!\n");
> >+ return -EINVAL;
> >+ }
> >+
> >+ if (strlen(ip->iniface) == 0 && strlen(ip->outiface) == 0) {
> >+ pr_debug("xt_idletimer: in or out interface must "
> >+ "be specified\n");
> >+ return -EINVAL;
> >+ }
> >+
> >+ if (utimer_create(ip->iniface, par->net, info) == NULL) {
> >+ pr_debug("xt_idletimer: failed to create timer\n");
> >+ return -ENOMEM;
> >+ }
>
> What about outiface?
> What blows up when iniface is empty?
Ooops! I've redone this part of the code so many times and in this
version I completely forgot to include the outiface. I'll fix it.
> >+static void xt_idletimer_destroy(const struct xt_tgdtor_param *par)
> >+{
> >+ const struct xt_idletimer_info *info = par->targinfo;
> >+
> >+ pr_debug("xt_idletimer: destroy targinfo %p\n", par->targinfo);
> >+
> >+ utimer_delete(info);
> >+}
> >+
> >+static int __init init(void)
> >+{
> >+ int ret;
> >+
> >+ ret = utimer_init();
> >+ if (ret)
> >+ return ret;
> >+
> >+ ret = xt_register_target(&xt_idletimer);
> >+ if (ret < 0) {
> >+ utimer_fini();
> >+ return ret;
> >+ }
> >+
> >+ return 0;
> >+}
> >+
> >+static void __exit fini(void)
> >+{
> >+ xt_unregister_target(&xt_idletimer);
> >+ utimer_fini();
> >+}
> >+
> >+module_init(init);
> >+module_exit(fini);
>
> Call it just exit?
Yes.
> Also give the functions better names (see other modules), that is going
> to be unrecognizable in stacktraces otherwise.
I agree. These names are coming from the original code. I thought
about changing them to something clearer, but didn't bother to do it
yet, because I was focusing on the actual functionality. I'll change
the names.
Again, thanks for your comments. I'll rework and submit v2 soon.
Ah, and please excuse my lameness of mistyping the netdev email address
when I submitted the patch. I fixed it now.
[1]
http://article.gmane.org/gmane.comp.security.firewalls.netfilter.devel/33934
--
Cheers,
Luca.
^ permalink raw reply
* Re: [PATCH 2/2] netdev/fec: fix ifconfig eth0 down hang issue
From: Bryan Wu @ 2010-05-28 5:26 UTC (permalink / raw)
To: afleming, davem
Cc: Sascha Hauer, Greg Ungerer, Amit Kucheria, netdev, linux-kernel
In-Reply-To: <1273199239-11057-3-git-send-email-bryan.wu@canonical.com>
Since this patch is for another bug and doesn't depend on my first one patch.
Could you please review this?
Thanks,
-Bryan
On 05/07/2010 10:27 AM, Bryan Wu wrote:
> BugLink: http://bugs.launchpad.net/bugs/559065
>
> In fec open/close function, we need to use phy_connect and phy_disconnect
> operation before we start/stop phy. Otherwise it will cause system hang.
>
> Only call fec_enet_mii_probe() in open function, because the first open
> action will cause NULL pointer error.
>
> Signed-off-by: Bryan Wu <bryan.wu@canonical.com>
> ---
> drivers/net/fec.c | 28 ++++++++++++++++------------
> 1 files changed, 16 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/net/fec.c b/drivers/net/fec.c
> index 9c58f6b..af4243f 100644
> --- a/drivers/net/fec.c
> +++ b/drivers/net/fec.c
> @@ -678,6 +678,8 @@ static int fec_enet_mii_probe(struct net_device *dev)
> struct phy_device *phy_dev = NULL;
> int phy_addr;
>
> + fep->phy_dev = NULL;
> +
> /* find the first phy */
> for (phy_addr = 0; phy_addr < PHY_MAX_ADDR; phy_addr++) {
> if (fep->mii_bus->phy_map[phy_addr]) {
> @@ -708,6 +710,11 @@ static int fec_enet_mii_probe(struct net_device *dev)
> fep->link = 0;
> fep->full_duplex = 0;
>
> + printk(KERN_INFO "%s: Freescale FEC PHY driver [%s] "
> + "(mii_bus:phy_addr=%s, irq=%d)\n", dev->name,
> + fep->phy_dev->drv->name, dev_name(&fep->phy_dev->dev),
> + fep->phy_dev->irq);
> +
> return 0;
> }
>
> @@ -753,13 +760,8 @@ static int fec_enet_mii_init(struct platform_device *pdev)
> if (mdiobus_register(fep->mii_bus))
> goto err_out_free_mdio_irq;
>
> - if (fec_enet_mii_probe(dev) != 0)
> - goto err_out_unregister_bus;
> -
> return 0;
>
> -err_out_unregister_bus:
> - mdiobus_unregister(fep->mii_bus);
> err_out_free_mdio_irq:
> kfree(fep->mii_bus->irq);
> err_out_free_mdiobus:
> @@ -912,7 +914,12 @@ fec_enet_open(struct net_device *dev)
> if (ret)
> return ret;
>
> - /* schedule a link state check */
> + /* Probe and connect to PHY when open the interface */
> + ret = fec_enet_mii_probe(dev);
> + if (ret) {
> + fec_enet_free_buffers(dev);
> + return ret;
> + }
> phy_start(fep->phy_dev);
> netif_start_queue(dev);
> fep->opened = 1;
> @@ -926,10 +933,12 @@ fec_enet_close(struct net_device *dev)
>
> /* Don't know what to do yet. */
> fep->opened = 0;
> - phy_stop(fep->phy_dev);
> netif_stop_queue(dev);
> fec_stop(dev);
>
> + if (fep->phy_dev)
> + phy_disconnect(fep->phy_dev);
> +
> fec_enet_free_buffers(dev);
>
> return 0;
> @@ -1293,11 +1302,6 @@ fec_probe(struct platform_device *pdev)
> if (ret)
> goto failed_register;
>
> - printk(KERN_INFO "%s: Freescale FEC PHY driver [%s] "
> - "(mii_bus:phy_addr=%s, irq=%d)\n", ndev->name,
> - fep->phy_dev->drv->name, dev_name(&fep->phy_dev->dev),
> - fep->phy_dev->irq);
> -
> return 0;
>
> failed_register:
^ permalink raw reply
* Re: [PATCH net-next-2.6] net: replace hooks in __netif_receive_skb V2
From: Jiri Pirko @ 2010-05-28 5:51 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: netdev, davem, kaber, eric.dumazet
In-Reply-To: <20100527130822.02cb1661@nehalam>
Thu, May 27, 2010 at 10:08:22PM CEST, shemminger@vyatta.com wrote:
>On Thu, 27 May 2010 20:08:24 +0200
>Jiri Pirko <jpirko@redhat.com> wrote:
>
>> changelog:
>> v1->v2
>> - writers are locked by rtnl_lock (removed unnecessary spinlock)
>> - using call_rcu in unregister
>> - nicer macvlan_port_create cleanup
>> - struct rx_hanler is made const in funtion parameters
>>
>> What this patch does is it removes two receive frame hooks (for bridge and for
>> macvlan) from __netif_receive_skb. These are replaced them with a general
>> list of rx_handlers which is iterated thru instead.
>>
>> Then a network driver (of virtual netdev like macvlan or bridge) can register
>> an rx_handler for needed net device.
>>
>> Signed-off-by: Jiri Pirko <jpirko@redhat.com>
>
>I wonder if we really need the chaining. What about allowing only one
>hook per device (and return -EBUSY)? That also gets rid of the allocation,
>and the extra overhead.
Hmm, that's a good question. I thought about it already. But I'm also wondering
if there is a possible scenario, when the chaining can be necessary. Also I do
not see any -significant- downside of having multiple handlers in a list, so I
would probably go this way now. Opinions?
>
>The handler hook, has to be export GPL, since we really don't want more
>network stack abuse by 3rd parties.
Noted, will be in the next patch version.
Jirka
^ permalink raw reply
* [PATCH net-next-2.6] net: replace hooks in __netif_receive_skb V3
From: Jiri Pirko @ 2010-05-28 6:12 UTC (permalink / raw)
To: netdev; +Cc: davem, shemminger, kaber, eric.dumazet
In-Reply-To: <20100528055154.GB2823@psychotron.redhat.com>
changelog:
v2->v3
- used GPL-ed exports
v1->v2
- writers are locked by rtnl_lock (removed unnecessary spinlock)
- using call_rcu in unregister
- nicer macvlan_port_create cleanup
- struct rx_hanler is made const in funtion parameters
What this patch does is it removes two receive frame hooks (for bridge and for
macvlan) from __netif_receive_skb. These are replaced them with a general
list of rx_handlers which is iterated thru instead.
Then a network driver (of virtual netdev like macvlan or bridge) can register
an rx_handler for needed net device.
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
---
drivers/net/macvlan.c | 27 ++++++--
include/linux/if_bridge.h | 2 -
include/linux/if_macvlan.h | 4 -
include/linux/netdevice.h | 18 +++++
net/bridge/br.c | 2 -
net/bridge/br_if.c | 14 ++++
net/bridge/br_input.c | 12 +++-
net/bridge/br_private.h | 3 +-
net/core/dev.c | 160 ++++++++++++++++++++++++++------------------
9 files changed, 160 insertions(+), 82 deletions(-)
diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
index 87e8d4c..f6b7924 100644
--- a/drivers/net/macvlan.c
+++ b/drivers/net/macvlan.c
@@ -145,15 +145,16 @@ static void macvlan_broadcast(struct sk_buff *skb,
}
/* called under rcu_read_lock() from netif_receive_skb */
-static struct sk_buff *macvlan_handle_frame(struct macvlan_port *port,
- struct sk_buff *skb)
+static struct sk_buff *macvlan_handle_frame(struct sk_buff *skb)
{
+ struct macvlan_port *port;
const struct ethhdr *eth = eth_hdr(skb);
const struct macvlan_dev *vlan;
const struct macvlan_dev *src;
struct net_device *dev;
unsigned int len;
+ port = rcu_dereference(skb->dev->macvlan_port);
if (is_multicast_ether_addr(eth->h_dest)) {
src = macvlan_hash_lookup(port, eth->h_source);
if (!src)
@@ -511,10 +512,16 @@ static void macvlan_setup(struct net_device *dev)
dev->tx_queue_len = 0;
}
+static const struct netdev_rx_handler macvlan_rx_handler = {
+ .order = NETDEV_RX_HANDLER_ORDER_MACVLAN,
+ .callback = macvlan_handle_frame,
+};
+
static int macvlan_port_create(struct net_device *dev)
{
struct macvlan_port *port;
unsigned int i;
+ int err;
if (dev->type != ARPHRD_ETHER || dev->flags & IFF_LOOPBACK)
return -EINVAL;
@@ -528,13 +535,26 @@ static int macvlan_port_create(struct net_device *dev)
for (i = 0; i < MACVLAN_HASH_SIZE; i++)
INIT_HLIST_HEAD(&port->vlan_hash[i]);
rcu_assign_pointer(dev->macvlan_port, port);
+
+ err = netdev_rx_handler_register(dev, &macvlan_rx_handler);
+ if (err)
+ goto cleanup;
+
return 0;
+
+cleanup:
+ rcu_assign_pointer(dev->macvlan_port, NULL);
+ synchronize_rcu();
+ kfree(port);
+
+ return err;
}
static void macvlan_port_destroy(struct net_device *dev)
{
struct macvlan_port *port = dev->macvlan_port;
+ netdev_rx_handler_unregister(dev, &macvlan_rx_handler);
rcu_assign_pointer(dev->macvlan_port, NULL);
synchronize_rcu();
kfree(port);
@@ -767,14 +787,12 @@ static int __init macvlan_init_module(void)
int err;
register_netdevice_notifier(&macvlan_notifier_block);
- macvlan_handle_frame_hook = macvlan_handle_frame;
err = macvlan_link_register(&macvlan_link_ops);
if (err < 0)
goto err1;
return 0;
err1:
- macvlan_handle_frame_hook = NULL;
unregister_netdevice_notifier(&macvlan_notifier_block);
return err;
}
@@ -782,7 +800,6 @@ err1:
static void __exit macvlan_cleanup_module(void)
{
rtnl_link_unregister(&macvlan_link_ops);
- macvlan_handle_frame_hook = NULL;
unregister_netdevice_notifier(&macvlan_notifier_block);
}
diff --git a/include/linux/if_bridge.h b/include/linux/if_bridge.h
index 938b7e8..0d241a5 100644
--- a/include/linux/if_bridge.h
+++ b/include/linux/if_bridge.h
@@ -102,8 +102,6 @@ struct __fdb_entry {
#include <linux/netdevice.h>
extern void brioctl_set(int (*ioctl_hook)(struct net *, unsigned int, void __user *));
-extern struct sk_buff *(*br_handle_frame_hook)(struct net_bridge_port *p,
- struct sk_buff *skb);
extern int (*br_should_route_hook)(struct sk_buff *skb);
#endif
diff --git a/include/linux/if_macvlan.h b/include/linux/if_macvlan.h
index 9ea047a..c26a0e4 100644
--- a/include/linux/if_macvlan.h
+++ b/include/linux/if_macvlan.h
@@ -84,8 +84,4 @@ extern int macvlan_link_register(struct rtnl_link_ops *ops);
extern netdev_tx_t macvlan_start_xmit(struct sk_buff *skb,
struct net_device *dev);
-
-extern struct sk_buff *(*macvlan_handle_frame_hook)(struct macvlan_port *,
- struct sk_buff *);
-
#endif /* _LINUX_IF_MACVLAN_H */
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index a1bff65..f54b97d 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -254,6 +254,16 @@ struct netdev_hw_addr_list {
#define netdev_for_each_mc_addr(ha, dev) \
netdev_hw_addr_list_for_each(ha, &(dev)->mc)
+
+struct netdev_rx_handler {
+ struct list_head list;
+ unsigned int order;
+#define NETDEV_RX_HANDLER_ORDER_BRIDGE 1
+#define NETDEV_RX_HANDLER_ORDER_MACVLAN 2
+ struct sk_buff *(*callback)(struct sk_buff *skb);
+ struct rcu_head rcu;
+};
+
struct hh_cache {
struct hh_cache *hh_next; /* Next entry */
atomic_t hh_refcnt; /* number of users */
@@ -1031,6 +1041,9 @@ struct net_device {
/* GARP */
struct garp_port *garp_port;
+ /* receive handlers (hooks) list */
+ struct list_head rx_handlers;
+
/* class/net/name entry */
struct device dev;
/* space for optional device, statistics, and wireless sysfs groups */
@@ -1685,6 +1698,11 @@ static inline void napi_free_frags(struct napi_struct *napi)
napi->skb = NULL;
}
+extern int netdev_rx_handler_register(struct net_device *dev,
+ const struct netdev_rx_handler *rh);
+extern void netdev_rx_handler_unregister(struct net_device *dev,
+ const struct netdev_rx_handler *rh);
+
extern void netif_nit_deliver(struct sk_buff *skb);
extern int dev_valid_name(const char *name);
extern int dev_ioctl(struct net *net, unsigned int cmd, void __user *);
diff --git a/net/bridge/br.c b/net/bridge/br.c
index 76357b5..c8436fa 100644
--- a/net/bridge/br.c
+++ b/net/bridge/br.c
@@ -63,7 +63,6 @@ static int __init br_init(void)
goto err_out4;
brioctl_set(br_ioctl_deviceless_stub);
- br_handle_frame_hook = br_handle_frame;
#if defined(CONFIG_ATM_LANE) || defined(CONFIG_ATM_LANE_MODULE)
br_fdb_test_addr_hook = br_fdb_test_addr;
@@ -100,7 +99,6 @@ static void __exit br_deinit(void)
br_fdb_test_addr_hook = NULL;
#endif
- br_handle_frame_hook = NULL;
br_fdb_fini();
}
diff --git a/net/bridge/br_if.c b/net/bridge/br_if.c
index 18b245e..45270e5 100644
--- a/net/bridge/br_if.c
+++ b/net/bridge/br_if.c
@@ -119,6 +119,11 @@ static void destroy_nbp_rcu(struct rcu_head *head)
destroy_nbp(p);
}
+static const struct netdev_rx_handler br_rx_handler = {
+ .order = NETDEV_RX_HANDLER_ORDER_BRIDGE,
+ .callback = br_handle_frame,
+};
+
/* Delete port(interface) from bridge is done in two steps.
* via RCU. First step, marks device as down. That deletes
* all the timers and stops new packets from flowing through.
@@ -147,6 +152,7 @@ static void del_nbp(struct net_bridge_port *p)
list_del_rcu(&p->list);
+ netdev_rx_handler_unregister(dev, &br_rx_handler);
rcu_assign_pointer(dev->br_port, NULL);
br_multicast_del_port(p);
@@ -429,6 +435,11 @@ int br_add_if(struct net_bridge *br, struct net_device *dev)
goto err2;
rcu_assign_pointer(dev->br_port, p);
+
+ err = netdev_rx_handler_register(dev, &br_rx_handler);
+ if (err)
+ goto err3;
+
dev_disable_lro(dev);
list_add_rcu(&p->list, &br->port_list);
@@ -451,6 +462,9 @@ int br_add_if(struct net_bridge *br, struct net_device *dev)
br_netpoll_enable(br, dev);
return 0;
+err3:
+ rcu_assign_pointer(dev->br_port, NULL);
+ synchronize_rcu();
err2:
br_fdb_delete_by_port(br, p, 1);
err1:
diff --git a/net/bridge/br_input.c b/net/bridge/br_input.c
index d36e700..99647d8 100644
--- a/net/bridge/br_input.c
+++ b/net/bridge/br_input.c
@@ -131,15 +131,19 @@ static inline int is_link_local(const unsigned char *dest)
}
/*
- * Called via br_handle_frame_hook.
* Return NULL if skb is handled
- * note: already called with rcu_read_lock (preempt_disabled)
+ * note: already called with rcu_read_lock (preempt_disabled) from
+ * netif_receive_skb
*/
-struct sk_buff *br_handle_frame(struct net_bridge_port *p, struct sk_buff *skb)
+struct sk_buff *br_handle_frame(struct sk_buff *skb)
{
+ struct net_bridge_port *p;
const unsigned char *dest = eth_hdr(skb)->h_dest;
int (*rhook)(struct sk_buff *skb);
+ if (skb->pkt_type == PACKET_LOOPBACK)
+ return skb;
+
if (!is_valid_ether_addr(eth_hdr(skb)->h_source))
goto drop;
@@ -147,6 +151,8 @@ struct sk_buff *br_handle_frame(struct net_bridge_port *p, struct sk_buff *skb)
if (!skb)
return NULL;
+ p = rcu_dereference(skb->dev->br_port);
+
if (unlikely(is_link_local(dest))) {
/* Pause frames shouldn't be passed up by driver anyway */
if (skb->protocol == htons(ETH_P_PAUSE))
diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index 0f4a74b..c83519b 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -331,8 +331,7 @@ extern void br_features_recompute(struct net_bridge *br);
/* br_input.c */
extern int br_handle_frame_finish(struct sk_buff *skb);
-extern struct sk_buff *br_handle_frame(struct net_bridge_port *p,
- struct sk_buff *skb);
+extern struct sk_buff *br_handle_frame(struct sk_buff *skb);
/* br_ioctl.c */
extern int br_dev_ioctl(struct net_device *dev, struct ifreq *rq, int cmd);
diff --git a/net/core/dev.c b/net/core/dev.c
index 6c82065..d2ab71a 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2585,70 +2585,14 @@ static inline int deliver_skb(struct sk_buff *skb,
return pt_prev->func(skb, skb->dev, pt_prev, orig_dev);
}
-#if defined(CONFIG_BRIDGE) || defined (CONFIG_BRIDGE_MODULE)
-
-#if defined(CONFIG_ATM_LANE) || defined(CONFIG_ATM_LANE_MODULE)
+#if (defined(CONFIG_BRIDGE) || defined(CONFIG_BRIDGE_MODULE)) && \
+ (defined(CONFIG_ATM_LANE) || defined(CONFIG_ATM_LANE_MODULE))
/* This hook is defined here for ATM LANE */
int (*br_fdb_test_addr_hook)(struct net_device *dev,
unsigned char *addr) __read_mostly;
EXPORT_SYMBOL_GPL(br_fdb_test_addr_hook);
#endif
-/*
- * If bridge module is loaded call bridging hook.
- * returns NULL if packet was consumed.
- */
-struct sk_buff *(*br_handle_frame_hook)(struct net_bridge_port *p,
- struct sk_buff *skb) __read_mostly;
-EXPORT_SYMBOL_GPL(br_handle_frame_hook);
-
-static inline struct sk_buff *handle_bridge(struct sk_buff *skb,
- struct packet_type **pt_prev, int *ret,
- struct net_device *orig_dev)
-{
- struct net_bridge_port *port;
-
- if (skb->pkt_type == PACKET_LOOPBACK ||
- (port = rcu_dereference(skb->dev->br_port)) == NULL)
- return skb;
-
- if (*pt_prev) {
- *ret = deliver_skb(skb, *pt_prev, orig_dev);
- *pt_prev = NULL;
- }
-
- return br_handle_frame_hook(port, skb);
-}
-#else
-#define handle_bridge(skb, pt_prev, ret, orig_dev) (skb)
-#endif
-
-#if defined(CONFIG_MACVLAN) || defined(CONFIG_MACVLAN_MODULE)
-struct sk_buff *(*macvlan_handle_frame_hook)(struct macvlan_port *p,
- struct sk_buff *skb) __read_mostly;
-EXPORT_SYMBOL_GPL(macvlan_handle_frame_hook);
-
-static inline struct sk_buff *handle_macvlan(struct sk_buff *skb,
- struct packet_type **pt_prev,
- int *ret,
- struct net_device *orig_dev)
-{
- struct macvlan_port *port;
-
- port = rcu_dereference(skb->dev->macvlan_port);
- if (!port)
- return skb;
-
- if (*pt_prev) {
- *ret = deliver_skb(skb, *pt_prev, orig_dev);
- *pt_prev = NULL;
- }
- return macvlan_handle_frame_hook(port, skb);
-}
-#else
-#define handle_macvlan(skb, pt_prev, ret, orig_dev) (skb)
-#endif
-
#ifdef CONFIG_NET_CLS_ACT
/* TODO: Maybe we should just force sch_ingress to be compiled in
* when CONFIG_NET_CLS_ACT is? otherwise some useless instructions
@@ -2744,6 +2688,85 @@ void netif_nit_deliver(struct sk_buff *skb)
rcu_read_unlock();
}
+static bool rx_handlers_equal(const struct netdev_rx_handler *rh1,
+ const struct netdev_rx_handler *rh2)
+{
+ return (rh1->order == rh2->order) && (rh1->callback == rh2->callback);
+}
+
+/**
+ * netdev_rx_handler_register - register receive handler
+ * @dev: device to register a handler for
+ * @rh: receive handler to register
+ *
+ * Register a receive hander for a device. This handler will then be
+ * called from __netif_receive_skb. A negative errno code is returned
+ * on a failure.
+ *
+ * The caller must hold the rtnl_mutex.
+ */
+int netdev_rx_handler_register(struct net_device *dev,
+ const struct netdev_rx_handler *rh)
+{
+ struct list_head *iter, *add_after;
+ struct netdev_rx_handler *rh1;
+ int err = 0;
+
+ ASSERT_RTNL();
+ add_after = &dev->rx_handlers;
+ list_for_each(iter, &dev->rx_handlers) {
+ rh1 = list_entry(iter, struct netdev_rx_handler, list);
+ if (rx_handlers_equal(rh, rh1))
+ return -EEXIST;
+ if (rh1->order > rh->order)
+ break;
+ add_after = iter;
+ }
+ rh1 = kzalloc(sizeof(*rh), GFP_KERNEL);
+ if (!rh1)
+ return -ENOMEM;
+
+ rh1->order = rh->order;
+ rh1->callback = rh->callback;
+ list_add_rcu(&rh1->list, add_after);
+
+ return err;
+}
+EXPORT_SYMBOL_GPL(netdev_rx_handler_register);
+
+static void rx_handler_rcu_free(struct rcu_head *head)
+{
+ struct netdev_rx_handler *rh;
+
+ rh = container_of(head, struct netdev_rx_handler, rcu);
+ kfree(rh);
+}
+
+/**
+ * netdev_rx_handler_unregister - unregister receive handler
+ * @dev: device to unregister a handler from
+ * @rh: receive handler to unregister
+ *
+ * Unregister a receive hander from a device.
+ *
+ * The caller must hold the rtnl_mutex.
+ */
+void netdev_rx_handler_unregister(struct net_device *dev,
+ const struct netdev_rx_handler *rh)
+{
+ struct netdev_rx_handler *rh1;
+
+ ASSERT_RTNL();
+ list_for_each_entry(rh1, &dev->rx_handlers, list) {
+ if (rx_handlers_equal(rh, rh1)) {
+ list_del_rcu(&rh1->list);
+ call_rcu(&rh1->rcu, rx_handler_rcu_free);
+ return;
+ }
+ }
+}
+EXPORT_SYMBOL_GPL(netdev_rx_handler_unregister);
+
static inline void skb_bond_set_mac_by_master(struct sk_buff *skb,
struct net_device *master)
{
@@ -2796,6 +2819,7 @@ EXPORT_SYMBOL(__skb_bond_should_drop);
static int __netif_receive_skb(struct sk_buff *skb)
{
struct packet_type *ptype, *pt_prev;
+ struct netdev_rx_handler *rh;
struct net_device *orig_dev;
struct net_device *master;
struct net_device *null_or_orig;
@@ -2859,12 +2883,18 @@ static int __netif_receive_skb(struct sk_buff *skb)
ncls:
#endif
- skb = handle_bridge(skb, &pt_prev, &ret, orig_dev);
- if (!skb)
- goto out;
- skb = handle_macvlan(skb, &pt_prev, &ret, orig_dev);
- if (!skb)
- goto out;
+ /*
+ * Go through various rx handlers, like bridge, macvlan etc.
+ */
+ list_for_each_entry_rcu(rh, &skb->dev->rx_handlers, list) {
+ if (pt_prev) {
+ ret = deliver_skb(skb, pt_prev, orig_dev);
+ pt_prev = NULL;
+ }
+ skb = rh->callback(skb);
+ if (!skb)
+ goto out;
+ }
/*
* Make sure frames received on VLAN interfaces stacked on
@@ -5371,6 +5401,8 @@ struct net_device *alloc_netdev_mq(int sizeof_priv, const char *name,
dev_mc_init(dev);
dev_uc_init(dev);
+ INIT_LIST_HEAD(&dev->rx_handlers);
+
dev_net_set(dev, &init_net);
dev->_tx = tx;
--
1.6.6.1
^ permalink raw reply related
* Re: fn_trie_lookup and fn_hash_lookup
From: ratheesh k @ 2010-05-28 6:13 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: netdev, linux-net
In-Reply-To: <20100527204055.406343eb@nehalam>
On Fri, May 28, 2010 at 9:10 AM, Stephen Hemminger
<shemminger@vyatta.com> wrote:
> On Fri, 28 May 2010 08:53:11 +0530
> ratheesh k <ratheesh.ksz@gmail.com> wrote:
>
>> hi ,
>>
>> I was looking into 2.6.18 kernel . I can see two functions
>> fn_trie_lookup , fn_hash_lookup
>>
>> ipv4/fib_trie.c: tb->tb_lookup = fn_trie_lookup;
>> ipv4/fib_hash.c: tb->tb_lookup = fn_hash_lookup;
>>
>> What is the differnce between these two functions ?
>
> They are the two possible FIB algorithms configurable.
>
> net/ipv4/Kconfig:
>
> choice
> prompt "Choose IP: FIB lookup algorithm (choose FIB_HASH if unsure)"
> depends on IP_ADVANCED_ROUTER
> default ASK_IP_FIB_HASH
>
> config ASK_IP_FIB_HASH
> bool "FIB_HASH"
> ---help---
> Current FIB is very proven and good enough for most users.
>
> config IP_FIB_TRIE
> bool "FIB_TRIE"
> ---help---
> Use new experimental LC-trie as FIB lookup algorithm.
> This improves lookup performance if you have a large
> number of routes.
>
> LC-trie is a longest matching prefix lookup algorithm which
> performs better than FIB_HASH for large routing tables.
> But, it consumes more memory and is more complex.
>
> LC-trie is described in:
>
> IP-address lookup using LC-tries. Stefan Nilsson and Gunnar Karlsson
> IEEE Journal on Selected Areas in Communications, 17(6):1083-1092,
> June 1999
>
> An experimental study of compression methods for dynamic tries
> Stefan Nilsson and Matti Tikkanen. Algorithmica, 33(1):19-33, 2002.
> http://www.nada.kth.se/~snilsson/public/papers/dyntrie2/
>
>
> Also see Documentation/networking/fib_trie.txt
>
>
>
Thanks ..
^ permalink raw reply
* Re: [PATCH net-next-2.6] net: replace hooks in __netif_receive_skb V3
From: Eric Dumazet @ 2010-05-28 7:02 UTC (permalink / raw)
To: Jiri Pirko; +Cc: netdev, davem, shemminger, kaber
In-Reply-To: <20100528061241.GC2823@psychotron.redhat.com>
Le vendredi 28 mai 2010 à 08:12 +0200, Jiri Pirko a écrit :
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index a1bff65..f54b97d 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -254,6 +254,16 @@ struct netdev_hw_addr_list {
> #define netdev_for_each_mc_addr(ha, dev) \
> netdev_hw_addr_list_for_each(ha, &(dev)->mc)
>
> +
> +struct netdev_rx_handler {
> + struct list_head list;
> + unsigned int order;
> +#define NETDEV_RX_HANDLER_ORDER_BRIDGE 1
> +#define NETDEV_RX_HANDLER_ORDER_MACVLAN 2
> + struct sk_buff *(*callback)(struct sk_buff *skb);
> + struct rcu_head rcu;
> +};
> +
> struct hh_cache {
> struct hh_cache *hh_next; /* Next entry */
> atomic_t hh_refcnt; /* number of users */
> @@ -1031,6 +1041,9 @@ struct net_device {
> /* GARP */
> struct garp_port *garp_port;
>
> + /* receive handlers (hooks) list */
> + struct list_head rx_handlers;
> +
> /* class/net/name entry */
> struct device dev;
> /* space for optional device, statistics, and wireless sysfs groups */
> @@ -1685,6 +1698,11 @@ static inline void napi_free_frags(struct napi_struct *napi)
> napi->skb = NULL;
> }
>
Please chose another place to hold rx_handlers, to keep rx path memory
needs to minimum cache lines. Somewhere in the following section :
/*
* Cache line mostly used on receive path (including eth_type_trans())
*/
unsigned long last_rx; /* Time of last Rx */
/* Interface address info used in eth_type_trans() */
unsigned char *dev_addr; /* hw address, (before bcast
because most packets are
unicast) */
struct netdev_hw_addr_list dev_addrs; /* list of device
hw addresses */
unsigned char broadcast[MAX_ADDR_LEN]; /* hw bcast add */
#ifdef CONFIG_RPS
struct kset *queues_kset;
struct netdev_rx_queue *_rx;
/* Number of RX queues allocated at alloc_netdev_mq() time */
unsigned int num_rx_queues;
#endif
struct netdev_queue rx_queue;
and before the :
struct netdev_queue *_tx ____cacheline_aligned_in_smp;
^ permalink raw reply
* [net-2.6 PATCH 2/2] netlink: bug fix: wrong size was calculated for vfinfo list blob
From: Scott Feldman @ 2010-05-28 7:15 UTC (permalink / raw)
To: davem; +Cc: chrisw, netdev, kaber, arnd
In-Reply-To: <20100528071546.4058.1332.stgit@localhost.localdomain>
From: Scott Feldman <scofeldm@cisco.com>
The wrong size was being calculated for vfinfo. In one case, it was over-
calculating using nlmsg_total_size on attrs, in another case, it was
under-calculating by assuming ifla_vf_* structs are packed together, but
each struct is it's own attr w/ hdr (and padding).
Signed-off-by: Scott Feldman <scofeldm@cisco.com>
---
net/core/rtnetlink.c | 11 ++++++-----
1 files changed, 6 insertions(+), 5 deletions(-)
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 7331bb2..1a2af24 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -650,11 +650,12 @@ static inline int rtnl_vfinfo_size(const struct net_device *dev)
if (dev->dev.parent && dev_is_pci(dev->dev.parent)) {
int num_vfs = dev_num_vf(dev->dev.parent);
- size_t size = nlmsg_total_size(sizeof(struct nlattr));
- size += nlmsg_total_size(num_vfs * sizeof(struct nlattr));
- size += num_vfs * (sizeof(struct ifla_vf_mac) +
- sizeof(struct ifla_vf_vlan) +
- sizeof(struct ifla_vf_tx_rate));
+ size_t size = nla_total_size(sizeof(struct nlattr));
+ size += nla_total_size(num_vfs * sizeof(struct nlattr));
+ size += num_vfs *
+ (nla_total_size(sizeof(struct ifla_vf_mac)) +
+ nla_total_size(sizeof(struct ifla_vf_vlan)) +
+ nla_total_size(sizeof(struct ifla_vf_tx_rate)));
return size;
} else
return 0;
^ permalink raw reply related
* [net-2.6 PATCH 1/2] netlink: bug fix: don't overrun skbs on vf_port dump
From: Scott Feldman @ 2010-05-28 7:15 UTC (permalink / raw)
To: davem; +Cc: chrisw, netdev, kaber, arnd
From: Scott Feldman <scofeldm@cisco.com>
Noticed by Patrick McHardy: was continuing to fill skb after a
nla_put_failure, ignoring the size calculated by upper layer. Now,
return -EMSGSIZE on any overruns, but also allow netdev to
fail ndo_get_vf_port with error other than -EMSGSIZE, thus unwinding
nest.
Signed-off-by: Scott Feldman <scofeldm@cisco.com>
---
net/core/rtnetlink.c | 15 +++++++++------
1 files changed, 9 insertions(+), 6 deletions(-)
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 7ab86f3..7331bb2 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -722,14 +722,13 @@ static int rtnl_vf_ports_fill(struct sk_buff *skb, struct net_device *dev)
for (vf = 0; vf < dev_num_vf(dev->dev.parent); vf++) {
vf_port = nla_nest_start(skb, IFLA_VF_PORT);
- if (!vf_port) {
- nla_nest_cancel(skb, vf_ports);
- return -EMSGSIZE;
- }
+ if (!vf_port)
+ goto nla_put_failure;
NLA_PUT_U32(skb, IFLA_PORT_VF, vf);
err = dev->netdev_ops->ndo_get_vf_port(dev, vf, skb);
+ if (err == -EMSGSIZE)
+ goto nla_put_failure;
if (err) {
-nla_put_failure:
nla_nest_cancel(skb, vf_port);
continue;
}
@@ -739,6 +738,10 @@ nla_put_failure:
nla_nest_end(skb, vf_ports);
return 0;
+
+nla_put_failure:
+ nla_nest_cancel(skb, vf_ports);
+ return -EMSGSIZE;
}
static int rtnl_port_self_fill(struct sk_buff *skb, struct net_device *dev)
@@ -753,7 +756,7 @@ static int rtnl_port_self_fill(struct sk_buff *skb, struct net_device *dev)
err = dev->netdev_ops->ndo_get_vf_port(dev, PORT_SELF_VF, skb);
if (err) {
nla_nest_cancel(skb, port_self);
- return err;
+ return (err == -EMSGSIZE) ? err : 0;
}
nla_nest_end(skb, port_self);
^ permalink raw reply related
* Re: [Patch] fix packet loss and massive ping spikes with PPP multi-link
From: walter harms @ 2010-05-28 7:28 UTC (permalink / raw)
To: Richard Hartmann; +Cc: linux-kernel, netdev, linux-ppp, erik_list
In-Reply-To: <AANLkTikeld4fbr9r1Sq0jhfkWxf0ZlwOsufUIRvb1rFH@mail.gmail.com>
Richard Hartmann schrieb:
> ==It seems LKML & netdev were dropped from the To list, re-adding==
>
> Hi Walter,
>
>
>> if (ppp->rrsched % ppp->n_channels == i)
>>
>> since both do not change in that while() loop you can calculate in advance
>> perhaps ppp->rrsched %= ppp->n_channels before the while ?
>> (that would reduce my bad feels about variables that only increments also :)
>
> rrsched and i do change when appropriate. As they are used as a cheap
> way to get round robin, rrsched is not even initialized. One can argue
> that this should be done, but as it literally does not matter where the
> value starts counting....
>
yep,
the problem is that you will trigger a warning "variable uninitialised".
And as programmer you are trained to spot such kind of code.
in short you violated "the rule of least surprise", simply set it to 99
and add a comment that the value does not matter because it is actualy a
random seed.
Basicly the same reason for the ppp->rrsched %= ppp->n_channels outside
the loop. 1. people/compiler are happy because they see the variable
is used. 2. no need to recalculate the if in a loop (never trust optimisers).
/*
perhaps rr_chanel is a better name ? round robin channel
that would requiere the changes but explain what it actualy is
*/
>
>> btw: you are doing after loop() if(pch->chan == NULL) continue;
>> that means the else in the if below if (pch->chan) should never be reached.
>> Or is it likely that some channel will be dropped (?) ?
>
> Channels could be dropped and we need to guard against that.
>
please add a comment about that. i can garantee you someone will spot it
and remove either the pch->chan == NULL or the else.
just my 2 cents,
wh
>
>> btw: this is intentional ? looks strange
>>
>> if(ppp_ml_noexplode) {
>> + }
>> + else {
>
> Leftover from various printks for debugging reasons.
>
>
> Thanks for your feedback,
> Richard
>
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox