Netdev List


* Re: [regression, 2.6.37-rc1] 'ip link tap0 up' stuck in do_exit()
From: Dave Chinner @ 2010-11-04  0:21 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: linux-kernel, netdev
In-Reply-To: <20101103112936.GB9169@dastard>

On Wed, Nov 03, 2010 at 10:29:36PM +1100, Dave Chinner wrote:
> On Wed, Nov 03, 2010 at 09:34:48PM +1100, Dave Chinner wrote:
> > On Wed, Nov 03, 2010 at 08:13:22AM +0100, Eric Dumazet wrote:
> > > On Wednesday 03 November 2010 at 17:26 +1100, Dave Chinner wrote:
> > > > Folks,
> > > > 
> > > > Starting up KVM on a current mainline kernel using the tap
> > > > device for the networking is resulting in the ip process trying to
> > > > up the tap interface hanging. KVM is started with this networking
> > > > config:
> > > > 
> > > > ....
> > > >         -net nic,vlan=0,macaddr=00:e4:b6:63:63:6d,model=virtio \
> > > >         -net tap,vlan=0,script=/vm-images/qemu-ifup,downscript=no \
> > > > ....
> > > > 
> > > > And the script is effectively:
> > > > 
> > > > switch=br0
> > > > if [ -n "$1" ];then
> > > >         /usr/bin/sudo /sbin/ip link set $1 up
> > > >         sleep 0.5s
> > > >         /usr/bin/sudo /usr/sbin/brctl addif $switch $1
> > > > 	exit 0
> > > > fi
> > > > exit 1
> > > > 
> > > > This is resulting in the command 'ip link set tap0 up' hanging as a zombie:
> > > > 
> > > > root      3005     1  0 16:53 pts/3    00:00:00 /bin/sh /vm-images/qemu-ifup tap0
> > > > root      3011  3005  0 16:53 pts/3    00:00:00 /usr/bin/sudo /sbin/ip link set tap0 up
> > > > root      3012  3011  0 16:53 pts/3    00:00:00 [ip] <defunct>
> > > > 
> > > > In do_exit() with this trace:
> > > > 
> > > > [ 1630.782255] ip            x ffff88063fcb3600     0  3012   3011 0x00000000
> > > > [ 1630.789121]  ffff880631328000 0000000000000046 0000000000000000 ffff880633104380
> > > > [ 1630.796524]  0000000000013600 ffff88062f031fd8 0000000000013600 0000000000013600
> > > > [ 1630.803925]  ffff8806313282d8 ffff8806313282e0 ffff880631328000 0000000000013600
> > > > [ 1630.811324] Call Trace:
> > > > [ 1630.813760]  [<ffffffff8104a90d>] ? do_exit+0x716/0x724
> > > > [ 1630.818964]  [<ffffffff8104a995>] ? do_group_exit+0x7a/0xa4
> > > > [ 1630.824512]  [<ffffffff8104a9d1>] ? sys_exit_group+0x12/0x16
> > > > [ 1630.830149]  [<ffffffff81009a82>] ? system_call_fastpath+0x16/0x1b
> > > > 
> > > > The address comes down to the schedule() call:
> > > > 
> > > > (gdb) l *(do_exit+0x716)
> > > > 0xffffffff8104a90d is in do_exit (kernel/exit.c:1034).
> > > > 1029            preempt_disable();
> > > > 1030            exit_rcu();
> > > > 1031            /* causes final put_task_struct in finish_task_switch(). */
> > > > 1032            tsk->state = TASK_DEAD;
> > > > 1033            schedule();
> > > > 1034            BUG();
> > > > 1035            /* Avoid "noreturn function does return".  */
> > > > 1036            for (;;)
> > > > 1037                    cpu_relax();    /* For when BUG is null */
> > > > 1038    }
> > > > 
> > > > Needless to say, KVM is not starting up. This works just fine on
> > > > 2.6.35.1 and so is a regression. I can't do a lot of testing on this as
> > > > the host is the machine that hosts all my build and test environments....
> > > > 
> > > > Cheers,
> > > > 
> > > > Dave.
> > > 
> > > Could it be the same problem as
> > > 
> > > http://kerneltrap.com/mailarchive/linux-netdev/2010/10/23/6288128
> > > 
> > > Try to revert bee31369ce16fc3898ec9a54161248c9eddb06bc ?
> > 
> > It's working fine on 2.6.36 right now, so it's something that came in
> > with the .37 merge cycle...
> 
> Actually, the machine isn't running a 2.6.36 kernel (it had booted
> to the working .35 kernel and I didn't notice). So I've just tested
> a 2.6.36 kernel, and the problem _is present_ in 2.6.36. I've
> reverted the above commit but that does not fix the problem.

Ok, so further investigation has shown I can reproduce this on
2.6.32 and 2.6.35. It's not a new bug, nor do I think that it is
a networking bug as it is not specific to the ip command.

The trigger for the problem is actually an upgrade of the sudo
package in debian unstable which changed the behaviour of sudo (has
some per-login/pty restriction on it now). Basically, the startup
script I'm running does:

sudo kvm .....

which then executes the qemu-ifup bash script which does:

	sudo ip ....
	sudo brctl ...

because at one point KVM did not create the tap device automatically
and so kvm could be run as a user with only the ifup script
requiring privileges to create the tap device and mark it up. When
KVM started creating the tap device, I added the sudo to the KVM
script, and everything worked again.

Now if I take the 'sudo' out of the ifup script, the hang goes away.
I first removed it from the ip command, and then the brctl command
hung in the same way the ip command was hanging. Hence my thoughts
that it is not directly related to networking utilities.
Unfortunately, it is not trivial to reproduce as I could only
trigger it through this kvm method, not on the command line. e.g:

$ sudo bash -c "sudo ip link set tap1 up"

does not hang.

This sudo package upgrade coincided with kernel upgrades, and so
that led to my confusion about where it occurred and what triggered
it.  Still, it appears to be a bug that has been around for some
time.....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply

* Re: [net-next-2.6 PATCH 1/3] 8021q: set hard_header_len when VLAN offload features are toggled
From: Jesse Gross @ 2010-11-04  0:46 UTC (permalink / raw)
  To: John Fastabend; +Cc: davem, netdev
In-Reply-To: <20101031002232.8691.41201.stgit@jf-dev1-dcblab>

On Sat, Oct 30, 2010 at 5:22 PM, John Fastabend
<john.r.fastabend@intel.com> wrote:
> Toggling the vlan tx|rx hw offloads needs to set the hard_header_len
> as well; otherwise, we end up using LL_RESERVED_SPACE incorrectly.
> This results in pskb_expand_head() being used unnecessarily.
>
> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>

Acked-by: Jesse Gross <jesse@nicira.com>
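
To make the arithmetic behind this patch concrete, here is a minimal userspace sketch. It assumes that LL_RESERVED_SPACE() rounds hard_header_len + needed_headroom down to a multiple of HH_DATA_MOD (16) and then adds one more HH_DATA_MOD, as in include/linux/netdevice.h; vlan_hard_header_len() is a hypothetical helper that models the two offload states and is not a kernel function.

```c
/* Values assumed for illustration (HH_DATA_MOD is 16 in the kernel). */
#define HH_DATA_MOD 16u
#define ETH_HLEN    14u
#define VLAN_HLEN   4u

/* Model of LL_RESERVED_SPACE(dev): round the header length plus any extra
 * headroom down to a multiple of HH_DATA_MOD, then add one HH_DATA_MOD. */
static unsigned int ll_reserved_space(unsigned int hard_header_len,
				      unsigned int needed_headroom)
{
	return ((hard_header_len + needed_headroom) & ~(HH_DATA_MOD - 1)) +
	       HH_DATA_MOD;
}

/* Hypothetical: the header length a VLAN device on top of a device with
 * real_hhl should advertise, depending on whether the NIC inserts the
 * tag in hardware (tx_offload != 0) or the stack does it in software. */
static unsigned int vlan_hard_header_len(unsigned int real_hhl, int tx_offload)
{
	return tx_offload ? real_hhl : real_hhl + VLAN_HLEN;
}
```

In this model, with an Ethernet lower device ll_reserved_space(14, 0) is 16 while ll_reserved_space(18, 0) is 32. If tx offload is switched off but hard_header_len still reports 14, only 16 bytes are reserved for an 18-byte tagged header, so the software tagging path has to reallocate.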

^ permalink raw reply

* Re: [net-next-2.6 PATCH 2/3] net: remove check for headroom in vlan_dev_create
From: Jesse Gross @ 2010-11-04  0:46 UTC (permalink / raw)
  To: John Fastabend; +Cc: davem, netdev
In-Reply-To: <20101031002237.8691.6266.stgit@jf-dev1-dcblab>

On Sat, Oct 30, 2010 at 5:22 PM, John Fastabend
<john.r.fastabend@intel.com> wrote:
> It is possible for the headroom to be smaller than the
> hard_header_len for a short period of time after toggling
> the vlan offload setting.
>
> This is not a hard error and skb_cow_head is called in
> __vlan_put_tag() to resolve this.
>
> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>

Acked-by: Jesse Gross <jesse@nicira.com>
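
skb_cow_head() is what makes a short headroom recoverable rather than fatal. The following is a simplified userspace analogue, assuming a flat buffer with no cloning or shared data; struct pkt and pkt_ensure_headroom() are hypothetical names, and the real skb_cow_head() also deals with cloned headers, which this sketch ignores.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified packet buffer: [head ... data ... data+len). */
struct pkt {
	unsigned char *head;  /* start of the allocation */
	unsigned char *data;  /* start of the frame */
	size_t len;           /* frame length in bytes */
};

static size_t pkt_headroom(const struct pkt *p)
{
	return (size_t)(p->data - p->head);
}

/* Rough analogue of skb_cow_head(): if there is less headroom than
 * needed, reallocate with enough headroom and copy the frame across.
 * Returns 0 on success, -1 on allocation failure. */
static int pkt_ensure_headroom(struct pkt *p, size_t needed)
{
	if (pkt_headroom(p) >= needed)
		return 0;  /* fast path: headroom already sufficient */

	unsigned char *nhead = malloc(needed + p->len);
	if (!nhead)
		return -1;
	memcpy(nhead + needed, p->data, p->len);
	free(p->head);
	p->head = nhead;
	p->data = nhead + needed;
	return 0;
}
```

Calling pkt_ensure_headroom(&p, 4) on a buffer with 2 bytes of headroom reallocates it so 4 bytes are available and leaves the frame contents unchanged; when the headroom is already large enough, it does nothing.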

^ permalink raw reply

* Re: [net-next-2.6 PATCH 3/3] net: consolidate 8021q tagging
From: Jesse Gross @ 2010-11-04  0:47 UTC (permalink / raw)
  To: John Fastabend; +Cc: davem, netdev
In-Reply-To: <20101031002242.8691.38060.stgit@jf-dev1-dcblab>

On Sat, Oct 30, 2010 at 5:22 PM, John Fastabend
<john.r.fastabend@intel.com> wrote:
> Now that VLAN packets are tagged in dev_hard_start_xmit()
> at the bottom of the stack we no longer need to tag them
> in the 8021Q module (except in the !VLAN_FLAG_REORDER_HDR
> case).
>
> This allows the accel path and non accel paths to be consolidated.
> Here the vlan_tci in the skb is always set and we allow the
> stack to add the actual tag in dev_hard_start_xmit().
>
> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>

Acked-by: Jesse Gross <jesse@nicira.com>

Thanks John.
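
At the byte level, tagging from vlan_tci means inserting a 4-byte 802.1Q header (TPID 0x8100 followed by the TCI) between the source MAC address and the original EtherType. The sketch below is a userspace illustration only: make_tci() and insert_vlan_tag() are hypothetical helpers, and, to stay short, the buffer is assumed to have 4 spare bytes at the tail instead of pushing the MAC addresses into headroom as, as far as I understand, the kernel's tagging code does.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN     6
#define ETH_P_8021Q  0x8100
#define VLAN_HLEN    4

/* Pack an 802.1Q TCI: 3-bit priority, 1-bit DEI (formerly CFI), 12-bit VID. */
static uint16_t make_tci(unsigned int prio, unsigned int dei, unsigned int vid)
{
	return (uint16_t)(((prio & 0x7u) << 13) | ((dei & 0x1u) << 12) |
			  (vid & 0xfffu));
}

/* Insert a VLAN header after the destination and source MAC addresses.
 * The buffer must have room for len + VLAN_HLEN bytes. Returns the new
 * frame length, or 0 if the frame is too short to hold two MACs. */
static size_t insert_vlan_tag(uint8_t *frame, size_t len, uint16_t tci)
{
	if (len < 2 * ETH_ALEN)
		return 0;

	uint8_t *tag = frame + 2 * ETH_ALEN;

	/* Shift the original EtherType and payload 4 bytes towards the tail. */
	memmove(tag + VLAN_HLEN, tag, len - 2 * ETH_ALEN);

	tag[0] = (uint8_t)(ETH_P_8021Q >> 8);   /* TPID 0x8100 */
	tag[1] = (uint8_t)(ETH_P_8021Q & 0xff);
	tag[2] = (uint8_t)(tci >> 8);           /* TCI in network byte order */
	tag[3] = (uint8_t)(tci & 0xff);
	return len + VLAN_HLEN;
}
```

For example, make_tci(3, 0, 100) yields 0x6064, and inserting it into a 16-byte untagged frame produces a 20-byte frame whose bytes 12 to 15 are 81 00 60 64.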

^ permalink raw reply

* Re: [RFC][net-next-2.6 PATCH 2/4] net: 8021Q consolidate header_ops routines
From: Jesse Gross @ 2010-11-04  0:47 UTC (permalink / raw)
  To: John Fastabend; +Cc: netdev
In-Reply-To: <20101021221010.22906.60238.stgit@jf-dev1-dcblab>

On Thu, Oct 21, 2010 at 3:10 PM, John Fastabend
<john.r.fastabend@intel.com> wrote:
> The only thing the 8021Q header ops routines are required
> for is VLAN_FLAG_REORDER_HDR; otherwise, by the time
> the VLAN tag has been added, the packet is already on
> its way down the stack. In this case, using the Ethernet
> ops works OK.
>
> At present the VLAN_FLAG_REORDER_HDR flag does not work
> with vlan offloads. As I understand the flag the intent
> is to allow taps on the vlan device and possibly the
> QOS layer to see the vlan tag info.
>
> By inserting the tag in vlan_tci any taps or QOS policies
> should be able to retrieve the vlan info. This allows
> the flag to work the same in both the offload case and
> non-offloaded case. And allows us to use the underlying
> ethernet ops.
>
> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>

I noticed that you dropped this patch from your most recent series, so
I went back to take a look at it.  I realized that it probably works
inconsistently since header caching doesn't take into account
skb->vlan_tci, so whether you see the tag depends on the state of the
cache.

It would be really good to have this type of code consolidation, both
for the sake of sanity and to eliminate the inconsistent behavior.  We
could do that by either not using header caching or making it work
with vlan offloading somehow.  However, I'm not sure that there's
really much point in that.  VLAN_FLAG_REORDER_HDR doesn't work with
cards that do vlan offloading, which is a pretty significant number of
them.  It similarly works inconsistently on the rx side.  So it's
broken most of the time and worse, the behavior changes depending on
the NIC (and now the ethtool setting).  Can we just eliminate it?

^ permalink raw reply

* Re: Linux 2.6.37-rc1 (net/sched: cls_cgroup)
From: Li Zefan @ 2010-11-04  1:46 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Eric Dumazet, Randy Dunlap, Linus Torvalds, Jamal Hadi Salim,
	Thomas Graf, Linux Kernel Mailing List, netdev, Ben Blum
In-Reply-To: <20101103233105.GA26124@gondor.apana.org.au>

>> commits 8e039d84b323c450 
>> (cgroups: net_cls as module)
>>
>> followed by commit f845172531f
>> (cls_cgroup: Store classid in struct sock)
> 
> Indeed, it looks like the tree I worked on didn't have the first
> patch applied for some reason.
> 

The first patch was merged in .34 and the second in .35, and judging
from the changelog and the diff, it seems you did know cls_cgroup
can be a module. ;)

> Anyway, this patch should fix the problem.  Thanks Eric!
> 
> cls_cgroup: Fix crash on module unload
> 
> Somewhere along the line net_cls_subsys_id became a macro when
> cls_cgroup is built as a module.  Not only did it make cls_cgroup
> completely useless, it also caused it to crash on module unload.
> 
> This patch fixes this by removing that macro.
> 
> Thanks to Eric Dumazet for diagnosing this problem.
> 
> Reported-by: Randy Dunlap <randy.dunlap@oracle.com>
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
> 

Reviewed-by: Li Zefan <lizf@cn.fujitsu.com>

> diff --git a/net/sched/cls_cgroup.c b/net/sched/cls_cgroup.c
> index 37dff78..d49c40f 100644
> --- a/net/sched/cls_cgroup.c
> +++ b/net/sched/cls_cgroup.c
> @@ -34,8 +34,6 @@ struct cgroup_subsys net_cls_subsys = {
>  	.populate	= cgrp_populate,
>  #ifdef CONFIG_NET_CLS_CGROUP
>  	.subsys_id	= net_cls_subsys_id,
> -#else
> -#define net_cls_subsys_id net_cls_subsys.subsys_id
>  #endif
>  	.module		= THIS_MODULE,
>  };
> 
> Cheers,
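
The patch does not spell out the exact failure path, so the following is only a generic, hypothetical illustration of why an object-like macro that shares its name with a global variable is dangerous: any later assignment to the "variable" is silently rewritten by the preprocessor. demo_subsys, demo_subsys_id and the functions below are made-up names, not the real cls_cgroup symbols.

```c
/* Hypothetical names for illustration; not the real cls_cgroup symbols. */
struct subsys {
	int subsys_id;
};

/* ID assigned by the framework when the module registers. */
struct subsys demo_subsys = { .subsys_id = -1 };

/* Global that other code reads to locate the subsystem. */
int demo_subsys_id = -1;

/* A leftover macro with the same name as the global above. */
#define demo_subsys_id demo_subsys.subsys_id

static void demo_register(int assigned_id)
{
	demo_subsys.subsys_id = assigned_id;
	/* Meant to publish the ID in the global, but after preprocessing this
	 * is demo_subsys.subsys_id = demo_subsys.subsys_id; (a no-op). */
	demo_subsys_id = demo_subsys.subsys_id;
}

#undef demo_subsys_id

static int demo_published_id(void)
{
	return demo_subsys_id;  /* refers to the global variable again */
}
```

After demo_register(5), demo_subsys.subsys_id is 5 but demo_published_id() still returns -1: the write intended for the global never happened, yet the code compiles without error.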

^ permalink raw reply

* Re: [PATCH] de2104x: fix panic on load
From: David Miller @ 2010-11-04  1:54 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev
In-Reply-To: <1288823132.2718.32.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 03 Nov 2010 23:25:32 +0100

> It's now illegal to call netif_stop_queue() before register_netdev().
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

Applied.
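
A toy model of the ordering rule stated in the patch, assuming for illustration that the transmit queue state only exists once the device has been registered. struct fake_netdev, fake_register() and fake_stop_queue() are hypothetical; the model returns an error where, per the patch, the real call is no longer allowed.

```c
#include <stdlib.h>

/* Hypothetical per-queue state. */
struct txq {
	int stopped;
};

/* Hypothetical device: tx is NULL until fake_register() allocates it. */
struct fake_netdev {
	struct txq *tx;
	int registered;
};

static int fake_register(struct fake_netdev *dev)
{
	dev->tx = calloc(1, sizeof(*dev->tx));
	if (!dev->tx)
		return -1;
	dev->registered = 1;
	return 0;
}

/* Stopping a queue requires the queue to exist; in this model a call made
 * before registration is rejected instead of touching missing state. */
static int fake_stop_queue(struct fake_netdev *dev)
{
	if (!dev->tx)
		return -1;
	dev->tx->stopped = 1;
	return 0;
}
```

In a probe routine, the safe order in this model is to register first and only then touch queue state, which is what dropping the early netif_stop_queue() call achieves.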

^ permalink raw reply

* Re: [PATCH 0/5] netfilter: netfilter fixes
From: David Miller @ 2010-11-04  1:54 UTC (permalink / raw)
  To: kaber; +Cc: netfilter-devel, netdev
In-Reply-To: <1288822372-21245-1-git-send-email-kaber@trash.net>

From: kaber@trash.net
Date: Wed,  3 Nov 2010 23:12:47 +0100

> Please apply or pull from:
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-2.6.git

Pulled, thanks a lot Patrick.

^ permalink raw reply

* Re: panic with 2.6.37-rc1
From: David Miller @ 2010-11-04  1:54 UTC (permalink / raw)
  To: teg; +Cc: eric.dumazet, linux-kernel, netdev
In-Reply-To: <AANLkTimY+cC=bu6LDb_e-Qbghd-F0RNZejhKaeXJWbY7@mail.gmail.com>

From: Tom Gundersen <teg@jklm.no>
Date: Wed, 3 Nov 2010 23:42:02 +0100

> On Wed, Nov 3, 2010 at 11:11 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> On Wednesday 03 November 2010 at 23:02 +0100, Tom Gundersen wrote:
>>> I get panic during boot both with 2.6.37-rc1 and with current head.
>>>
>>> Here is the error message:
>>> <http://www.lix.polytechnique.fr/~tom/2.6.37-rc1-panic.jpg>. My
>>> .config is below.
>>>
>>> Let me know if you want more info. If you want I can test patches or
>>> bisect (if no one has any better suggestions).
>>>
>>
>> Please test following patch, thanks
>>
>> [PATCH] atl1 : fix panic on load
>>
>> It's now illegal to call netif_stop_queue() before register_netdev().
>>
>> Reported-by: Tom Gundersen <teg@jklm.no>
>> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
 ...
> The patch solves the problem. Thanks!

Applied, thanks everyone.

^ permalink raw reply

* Re: [PATCH 5/5] ipv4: netfilter: ip_tables: fix information leak to userland
From: David Miller @ 2010-11-04  1:55 UTC (permalink / raw)
  To: jengelh; +Cc: kaber, netfilter-devel, netdev
In-Reply-To: <alpine.LNX.2.01.1011032355020.31351@obet.zrqbmnf.qr>

From: Jan Engelhardt <jengelh@medozas.de>
Date: Wed, 3 Nov 2010 23:55:58 +0100 (CET)

> netfilter: ip6_tables: fix information leak to userspace
> 
> Signed-off-by: Jan Engelhardt <jengelh@medozas.de>

Good catch, applied, thanks Jan.

^ permalink raw reply

* Re: Linux 2.6.37-rc1 (net/sched: cls_cgroup)
From: David Miller @ 2010-11-04  1:56 UTC (permalink / raw)
  To: herbert
  Cc: eric.dumazet, randy.dunlap, torvalds, hadi, tgraf, linux-kernel,
	netdev, bblum
In-Reply-To: <20101103233105.GA26124@gondor.apana.org.au>

From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Wed, 3 Nov 2010 18:31:05 -0500

> cls_cgroup: Fix crash on module unload
> 
> Somewhere along the line net_cls_subsys_id became a macro when
> cls_cgroup is built as a module.  Not only did it make cls_cgroup
> completely useless, it also caused it to crash on module unload.
> 
> This patch fixes this by removing that macro.
> 
> Thanks to Eric Dumazet for diagnosing this problem.
> 
> Reported-by: Randy Dunlap <randy.dunlap@oracle.com>
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Applied, and queued up for -stable, thanks everyone!

^ permalink raw reply

* Re: [SECURITY] memory corruption in X.25 facilities parsing
From: David Miller @ 2010-11-04  1:56 UTC (permalink / raw)
  To: drosenberg; +Cc: andrew.hendry, netdev, security, stable
In-Reply-To: <1288827871.22123.0.camel@dan>

From: Dan Rosenberg <drosenberg@vsecurity.com>
Date: Wed, 03 Nov 2010 19:44:31 -0400

> Looks good to me.  Thanks for the quick turnaround.

Applied, thanks!

^ permalink raw reply

* Re: [PATCH 1/2] caif: Bugfix for socket priority, bindtodev and dbg channel.
From: David Miller @ 2010-11-04  1:56 UTC (permalink / raw)
  To: sjur.brandeland; +Cc: netdev, andre.carvalho.matos
In-Reply-To: <1288648368-9062-1-git-send-email-sjur.brandeland@stericsson.com>

From: Sjur Braendeland <sjur.brandeland@stericsson.com>
Date: Mon,  1 Nov 2010 22:52:47 +0100

> From: André Carvalho de Matos <andre.carvalho.matos@stericsson.com>
> 
> Changes:
> o Bugfix: SO_PRIORITY for SOL_SOCKET could not be handled
>   in caif's setsockopt; use the struct sock attribute priority instead.
> 
> o Bugfix: SO_BINDTODEVICE for SOL_SOCKET could not be handled
>   in caif's setsockopt; use the struct sock attribute ifindex instead.
> 
> o Wrong assert statement for RFM layer segmentation.
> 
> o CAIF debug channels were not working over SPI; caif_payload_info
>   containing padding info must be initialized.
> 
> o Check pointer before dereferencing it when unregistering a dev in caif_dev.c.
> 
> Signed-off-by: Sjur Braendeland <sjur.brandeland@stericsson.com>

Applied.

^ permalink raw reply

* Re: [PATCH 2/2] caif: SPI-driver bugfix - incorrect padding.
From: David Miller @ 2010-11-04  1:56 UTC (permalink / raw)
  To: sjur.brandeland; +Cc: netdev
In-Reply-To: <1288648368-9062-2-git-send-email-sjur.brandeland@stericsson.com>

From: Sjur Braendeland <sjur.brandeland@stericsson.com>
Date: Mon,  1 Nov 2010 22:52:48 +0100

> From: Sjur Brændeland <sjur.brandeland@stericsson.com>
> 
> 
> Signed-off-by: Sjur Braendeland <sjur.brandeland@stericsson.com>

Applied.

^ permalink raw reply

* Re: [PATCH] caif: Remove noisy printout when disconnecting caif socket
From: David Miller @ 2010-11-04  1:57 UTC (permalink / raw)
  To: sjur.brandeland; +Cc: netdev
In-Reply-To: <1288815565-2616-1-git-send-email-sjur.brandeland@stericsson.com>

From: Sjur Braendeland <sjur.brandeland@stericsson.com>
Date: Wed, 03 Nov 2010 21:19:25 +0100

> 
> Signed-off-by: Sjur Brændeland <sjur.brandeland@stericsson.com>

Applied.

^ permalink raw reply

* Re: [Patch] netxen: remove unused firmware exports
From: David Miller @ 2010-11-04  1:58 UTC (permalink / raw)
  To: amit.salecha
  Cc: amwang, linux-kernel, dhananjay.phadke, narender.kumar, netdev
In-Reply-To: <99737F4847ED0A48AECC9F4A1974A4B80FCEAB435D@MNEXMB2.qlogic.org>

From: Amit Salecha <amit.salecha@qlogic.com>
Date: Wed, 3 Nov 2010 04:35:01 -0500

>> From: Amerigo Wang [amwang@redhat.com]
>> Sent: Wednesday, November 03, 2010 9:55 AM
>> To: linux-kernel@vger.kernel.org
>> Cc: Dhananjay Phadke; Amit Salecha; Narender Kumar; netdev@vger.kernel.org; David S. Miller; Amerigo Wang
>> Subject: [Patch] netxen: remove unused firmware exports
>>
>> Quote from Amit Salecha:
>> 
>> "Actually, I was not up to date: NX_UNIFIED_ROMIMAGE_NAME (phanfw.bin) is already
>> submitted and it's present in linux-firmware.git.
>>
>> I will get back to you on NX_P2_MN_ROMIMAGE_NAME, NX_P3_CT_ROMIMAGE_NAME and
>> NX_P3_MN_ROMIMAGE_NAME, and whether they will be submitted."
>>
>> We have to remove these; otherwise, modinfo will report wrong info.
>>
>> Signed-off-by: WANG Cong <amwang@redhat.com>
>> Cc: Amit Kumar Salecha <amit.salecha@qlogic.com>
>> Cc: "David S. Miller" <davem@davemloft.net>
>> Cc: Dhananjay Phadke <dhananjay.phadke@qlogic.com>
>> Cc: Narender Kumar <narender.kumar@qlogic.com>
> 
> Acked-by:  Amit Kumar Salecha <amit.salecha@qlogic.com>

Applied.

^ permalink raw reply

* Re: [PATCH] smsc911x: Set Ethernet EEPROM size to supported device's size
From: David Miller @ 2010-11-04  1:57 UTC (permalink / raw)
  To: jfaith7; +Cc: netdev, linux-omap
In-Reply-To: <1288647008-11846-1-git-send-email-jfaith7@gmail.com>

From: John Faith <jfaith7@gmail.com>
Date: Mon,  1 Nov 2010 14:30:08 -0700

> The SMSC911x supports 128 x 8-bit EEPROMs.  Increase the EEPROM size
> so more than just the MAC address can be stored.
> 
> Signed-off-by: John Faith <jfaith7@gmail.com>

Applied.

^ permalink raw reply

* Re: [PATCH] net dst: fix percpu_counter list corruption and poison overwritten
From: David Miller @ 2010-11-04  1:59 UTC (permalink / raw)
  To: eric.dumazet
  Cc: dfeng, netdev, linux-kernel, kuznet, pekkas, jmorris, yoshfuji,
	kaber
In-Reply-To: <1288761773.2467.535.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 03 Nov 2010 06:22:53 +0100

> On Wednesday 03 November 2010 at 10:11 +0800, Xiaotian Feng wrote:
>> There are some percpu_counter list corruption and poison overwritten warnings
>> in recent kernels, which are caused by fc66f95c.
>> 
>> Commit fc66f95c switched to using percpu_counter: in ip6_route_net_init, the kernel
>> initializes the percpu_counter for dst entries, but the percpu_counter is never destroyed
>> in ip6_route_net_exit. So if the related data is freed by the kernel, the freed percpu_counter
>> is still on the list, and if we then insert/remove another percpu_counter, list corruption
>> results. Also, if the insert/remove operation modifies the ->prev/->next pointers of
>> the freed value, the "poison overwritten" warning results.
>> 
>> With the following patch, the percpu_counter list corruption and poison overwritten
>> warnings disappeared.
>> 
>> Signed-off-by: Xiaotian Feng <dfeng@redhat.com>
 ...
> Acked-by: Eric Dumazet <eric.dumazet@gmail.com>

Applied, thanks!
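
The failure described here comes down to an object that registers itself on a global list at init time but is freed without being unlinked. The sketch below models that pairing with a minimal intrusive list. struct counter, counter_init() and counter_destroy() are hypothetical stand-ins, assumed to mirror how percpu_counter_init() and percpu_counter_destroy() add to and remove from a global list of counters in lib/percpu_counter.c when CPU hotplug support is enabled.

```c
#include <stddef.h>

/* Minimal doubly linked list in the style of <linux/list.h>. */
struct list_node {
	struct list_node *prev, *next;
};

static void list_init(struct list_node *h)
{
	h->prev = h->next = h;
}

static void list_add_node(struct list_node *n, struct list_node *h)
{
	n->next = h->next;
	n->prev = h;
	h->next->prev = n;
	h->next = n;
}

static void list_del_node(struct list_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->prev = n->next = NULL;  /* catch accidental reuse */
}

static int list_is_empty(const struct list_node *h)
{
	return h->next == h;
}

/* Hypothetical counter that links itself onto a global list on init. */
struct counter {
	long value;
	struct list_node link;
};

static struct list_node all_counters;

static void counter_init(struct counter *c)
{
	c->value = 0;
	list_add_node(&c->link, &all_counters);
}

/* Must run before the memory holding *c is freed. */
static void counter_destroy(struct counter *c)
{
	list_del_node(&c->link);
}
```

As long as every counter_init() in a pernet init hook is matched by counter_destroy() in the corresponding exit hook, the list is empty again after teardown. Freeing a struct counter while it is still linked would leave its neighbours pointing into freed memory, which is the corruption the patch avoids; the model deliberately does not demonstrate that undefined behaviour.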

^ permalink raw reply

* [PATCH 1/2] netlink: Make nlmsg_find_attr take a const nlmsghdr*.
From: Nelson Elhage @ 2010-11-04  2:35 UTC (permalink / raw)
  To: netdev; +Cc: Nelson Elhage

This will let us use it on a nlmsghdr stored inside a netlink_callback.

Signed-off-by: Nelson Elhage <nelhage@ksplice.com>
---
 include/net/netlink.h |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/include/net/netlink.h b/include/net/netlink.h
index f3b201d..9801c55 100644
--- a/include/net/netlink.h
+++ b/include/net/netlink.h
@@ -384,7 +384,7 @@ static inline int nlmsg_parse(const struct nlmsghdr *nlh, int hdrlen,
  *
  * Returns the first attribute which matches the specified type.
  */
-static inline struct nlattr *nlmsg_find_attr(struct nlmsghdr *nlh,
+static inline struct nlattr *nlmsg_find_attr(const struct nlmsghdr *nlh,
 					     int hdrlen, int attrtype)
 {
 	return nla_find(nlmsg_attrdata(nlh, hdrlen),
-- 
1.7.1.31.g6297e


^ permalink raw reply related

* [PATCH 2/2] inet_diag: Make sure we actually run the same bytecode we audited.
From: Nelson Elhage @ 2010-11-04  2:35 UTC (permalink / raw)
  To: netdev; +Cc: Nelson Elhage
In-Reply-To: <1288838141-17871-1-git-send-email-nelhage@ksplice.com>

We were using nlmsg_find_attr() to look up the bytecode by attribute when
auditing, but then just using the first attribute when actually running
bytecode. So, if we received a message with two attribute elements, where only
the second had type INET_DIAG_REQ_BYTECODE, we would validate and run different
bytecode strings.

Fix this by consistently using nlmsg_find_attr everywhere.

Signed-off-by: Nelson Elhage <nelhage@ksplice.com>
---
 net/ipv4/inet_diag.c |   27 ++++++++++++++++-----------
 1 files changed, 16 insertions(+), 11 deletions(-)

diff --git a/net/ipv4/inet_diag.c b/net/ipv4/inet_diag.c
index ba80426..2ada171 100644
--- a/net/ipv4/inet_diag.c
+++ b/net/ipv4/inet_diag.c
@@ -490,9 +490,11 @@ static int inet_csk_diag_dump(struct sock *sk,
 {
 	struct inet_diag_req *r = NLMSG_DATA(cb->nlh);
 
-	if (cb->nlh->nlmsg_len > 4 + NLMSG_SPACE(sizeof(*r))) {
+	if (nlmsg_attrlen(cb->nlh, sizeof(*r))) {
 		struct inet_diag_entry entry;
-		struct rtattr *bc = (struct rtattr *)(r + 1);
+		const struct nlattr *bc = nlmsg_find_attr(cb->nlh,
+							  sizeof(*r),
+							  INET_DIAG_REQ_BYTECODE);
 		struct inet_sock *inet = inet_sk(sk);
 
 		entry.family = sk->sk_family;
@@ -512,7 +514,7 @@ static int inet_csk_diag_dump(struct sock *sk,
 		entry.dport = ntohs(inet->inet_dport);
 		entry.userlocks = sk->sk_userlocks;
 
-		if (!inet_diag_bc_run(RTA_DATA(bc), RTA_PAYLOAD(bc), &entry))
+		if (!inet_diag_bc_run(nla_data(bc), nla_len(bc), &entry))
 			return 0;
 	}
 
@@ -527,9 +529,11 @@ static int inet_twsk_diag_dump(struct inet_timewait_sock *tw,
 {
 	struct inet_diag_req *r = NLMSG_DATA(cb->nlh);
 
-	if (cb->nlh->nlmsg_len > 4 + NLMSG_SPACE(sizeof(*r))) {
+	if (nlmsg_attrlen(cb->nlh, sizeof(*r))) {
 		struct inet_diag_entry entry;
-		struct rtattr *bc = (struct rtattr *)(r + 1);
+		const struct nlattr *bc = nlmsg_find_attr(cb->nlh,
+							  sizeof(*r),
+							  INET_DIAG_REQ_BYTECODE);
 
 		entry.family = tw->tw_family;
 #if defined(CONFIG_IPV6) || defined (CONFIG_IPV6_MODULE)
@@ -548,7 +552,7 @@ static int inet_twsk_diag_dump(struct inet_timewait_sock *tw,
 		entry.dport = ntohs(tw->tw_dport);
 		entry.userlocks = 0;
 
-		if (!inet_diag_bc_run(RTA_DATA(bc), RTA_PAYLOAD(bc), &entry))
+		if (!inet_diag_bc_run(nla_data(bc), nla_len(bc), &entry))
 			return 0;
 	}
 
@@ -618,7 +622,7 @@ static int inet_diag_dump_reqs(struct sk_buff *skb, struct sock *sk,
 	struct inet_diag_req *r = NLMSG_DATA(cb->nlh);
 	struct inet_connection_sock *icsk = inet_csk(sk);
 	struct listen_sock *lopt;
-	struct rtattr *bc = NULL;
+	const struct nlattr *bc = NULL;
 	struct inet_sock *inet = inet_sk(sk);
 	int j, s_j;
 	int reqnum, s_reqnum;
@@ -638,8 +642,9 @@ static int inet_diag_dump_reqs(struct sk_buff *skb, struct sock *sk,
 	if (!lopt || !lopt->qlen)
 		goto out;
 
-	if (cb->nlh->nlmsg_len > 4 + NLMSG_SPACE(sizeof(*r))) {
-		bc = (struct rtattr *)(r + 1);
+	if (nlmsg_attrlen(cb->nlh, sizeof(*r))) {
+		bc = nlmsg_find_attr(cb->nlh, sizeof(*r),
+				     INET_DIAG_REQ_BYTECODE);
 		entry.sport = inet->inet_num;
 		entry.userlocks = sk->sk_userlocks;
 	}
@@ -672,8 +677,8 @@ static int inet_diag_dump_reqs(struct sk_buff *skb, struct sock *sk,
 					&ireq->rmt_addr;
 				entry.dport = ntohs(ireq->rmt_port);
 
-				if (!inet_diag_bc_run(RTA_DATA(bc),
-						    RTA_PAYLOAD(bc), &entry))
+				if (!inet_diag_bc_run(nla_data(bc),
+						      nla_len(bc), &entry))
 					continue;
 			}
 
-- 
1.7.1.31.g6297e
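
The mismatch this patch fixes is easy to see with a toy attribute stream. The old dump code took whatever followed struct inet_diag_req as the bytecode (the (struct rtattr *)(r + 1) cast), while the audit path looked the attribute up by type. The sketch below contrasts the two lookups in userspace; struct attr_hdr, first_attr() and find_attr() are hypothetical, and DEMO_REQ_BYTECODE is assumed to correspond to INET_DIAG_REQ_BYTECODE.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed to mirror INET_DIAG_REQ_BYTECODE (value 1 in the kernel's enum). */
#define DEMO_REQ_BYTECODE 1

/* Same layout as struct nlattr: 16-bit length (header + payload), 16-bit type. */
struct attr_hdr {
	uint16_t len;
	uint16_t type;
};

#define ATTR_ALIGN(n) (((size_t)(n) + 3) & ~(size_t)3)
#define ATTR_HDRLEN   ATTR_ALIGN(sizeof(struct attr_hdr))

/* Old behaviour: whatever follows the request header is taken as the bytecode. */
static const struct attr_hdr *first_attr(const void *buf, size_t buflen)
{
	return buflen >= ATTR_HDRLEN ? (const struct attr_hdr *)buf : NULL;
}

/* New behaviour: walk the attributes and return the first one of the
 * requested type, in the spirit of nlmsg_find_attr()/nla_find(). */
static const struct attr_hdr *find_attr(const void *buf, size_t buflen,
					uint16_t type)
{
	const unsigned char *p = buf;
	size_t rem = buflen;

	while (rem >= ATTR_HDRLEN) {
		const struct attr_hdr *a = (const struct attr_hdr *)p;
		size_t step;

		if (a->len < ATTR_HDRLEN || a->len > rem)
			break;          /* malformed length: stop walking */
		if (a->type == type)
			return a;
		step = ATTR_ALIGN(a->len);
		if (step >= rem)
			break;          /* that was the last attribute */
		p += step;
		rem -= step;
	}
	return NULL;
}
```

With a decoy attribute (type 7) placed before the bytecode attribute, first_attr() returns the decoy while find_attr() returns the bytecode, so the old code could audit one string and run another. Using the same lookup on both paths, as the patch does, removes the discrepancy.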


^ permalink raw reply related

* Re: [RFC PATCH 1/1] vhost: TX used buffer guest signal accumulation
From: Shirley Ma @ 2010-11-04  5:38 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: David Miller, netdev, kvm, linux-kernel
In-Reply-To: <20101103104812.GB10555@redhat.com>

On Wed, 2010-11-03 at 12:48 +0200, Michael S. Tsirkin wrote:
> I mean in practice, you see a benefit from this patch?

Yes, I tested it. It does benefit the performance.

> > My concern here is whether checking only at setup would be
> > sufficient for security?
> 
> It better be sufficient because the checks that put_user does
> are not effective when run from the kernel thread, anyway.
> 
> > Would there be a case where the guest could corrupt the ring
> > later? If not, that's OK.
> 
> You mean change the pointer after it's checked?
> If you see such a case, please holler.

I was just wondering about it; I don't have such a case in mind.

> To clarify: the combination of __put_user and separate
> signalling is giving the same performance benefit as your
> patch?

Yes, it has similar performance, but I haven't finished comparing all
message sizes yet.

> I am mostly concerned with adding code that seems to help
> speed for reasons we don't completely understand, because
> then we might break the optimization easily without noticing.

I don't think the patch I submitted would break anything. It just
changes the per-used-buffer cost of three put_user() calls and a guest
signal from one-to-one to many-to-one.

Thanks
Shirley
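
The move from one-to-one to many-to-one signaling can be pictured with a small counting model. This is not the vhost code: struct signal_model, buffer_done() and flush_signals() are hypothetical, and the flush step stands in for whatever mechanism ensures a partially filled batch is eventually reported to the guest.

```c
/* Hypothetical model: how many guest notifications are sent for a stream
 * of completed TX buffers, signaling either per buffer or per batch. */
struct signal_model {
	unsigned int batch;    /* completions to accumulate per signal (>= 1) */
	unsigned int pending;  /* completed but not yet signaled */
	unsigned int signals;  /* notifications sent so far */
};

/* Record one completed (used) buffer; signal once the batch is full. */
static void buffer_done(struct signal_model *m)
{
	if (++m->pending >= m->batch) {
		m->signals++;
		m->pending = 0;
	}
}

/* Signal any leftover completions, e.g. when the queue goes idle, so a
 * partially filled batch is not held back indefinitely. */
static void flush_signals(struct signal_model *m)
{
	if (m->pending) {
		m->signals++;
		m->pending = 0;
	}
}
```

For ten completions, per-buffer signaling (batch = 1) sends 10 notifications, while a batch of 4 sends 3: two full batches plus one flush.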

^ permalink raw reply

* Re: [regression, 2.6.37-rc1] 'ip link tap0 up' stuck in do_exit()
From: Américo Wang @ 2010-11-04  5:47 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Eric Dumazet, linux-kernel, netdev
In-Reply-To: <20101104002140.GA13830@dastard>

On Thu, Nov 04, 2010 at 11:21:40AM +1100, Dave Chinner wrote:
>On Wed, Nov 03, 2010 at 10:29:36PM +1100, Dave Chinner wrote:
>> On Wed, Nov 03, 2010 at 09:34:48PM +1100, Dave Chinner wrote:
>> > On Wed, Nov 03, 2010 at 08:13:22AM +0100, Eric Dumazet wrote:
>> > > Le mercredi 03 novembre 2010 à 17:26 +1100, Dave Chinner a écrit :
>> > > > Folks,
>> > > > 
>> > > > Starting up KVM on a current mainline kernel using the tap
>> > > > device for the networking is resulting in the ip process tryin gto
>> > > > up the tap interface hanging. KVM is started with this networking
>> > > > config:
>> > > > 
>> > > > ....
>> > > >         -net nic,vlan=0,macaddr=00:e4:b6:63:63:6d,model=virtio \
>> > > >         -net tap,vlan=0,script=/vm-images/qemu-ifup,downscript=no \
>> > > > ....
>> > > > 
>> > > > And the script is effectively:
>> > > > 
>> > > > switch=br0
>> > > > if [ -n "$1" ];then
>> > > >         /usr/bin/sudo /sbin/ip link set $1 up
>> > > >         sleep 0.5s
>> > > >         /usr/bin/sudo /usr/sbin/brctl addif $switch $1
>> > > > 	exit 0
>> > > > fi
>> > > > exit 1
>> > > > 
>> > > > This is resulting in the command 'ip link set tap0 up' hanging as a zombie:
>> > > > 
>> > > > root      3005     1  0 16:53 pts/3    00:00:00 /bin/sh /vm-images/qemu-ifup tap0
>> > > > root      3011  3005  0 16:53 pts/3    00:00:00 /usr/bin/sudo /sbin/ip link set tap0 up
>> > > > root      3012  3011  0 16:53 pts/3    00:00:00 [ip] <defunct>
>> > > > 
>> > > > In do_exit() with this trace:
>> > > > 
>> > > > [ 1630.782255] ip            x ffff88063fcb3600     0  3012   3011 0x00000000
>> > > > [ 1630.789121]  ffff880631328000 0000000000000046 0000000000000000 ffff880633104380
>> > > > [ 1630.796524]  0000000000013600 ffff88062f031fd8 0000000000013600 0000000000013600
>> > > > [ 1630.803925]  ffff8806313282d8 ffff8806313282e0 ffff880631328000 0000000000013600
>> > > > [ 1630.811324] Call Trace:
>> > > > [ 1630.813760]  [<ffffffff8104a90d>] ? do_exit+0x716/0x724
>> > > > [ 1630.818964]  [<ffffffff8104a995>] ? do_group_exit+0x7a/0xa4
>> > > > [ 1630.824512]  [<ffffffff8104a9d1>] ? sys_exit_group+0x12/0x16
>> > > > [ 1630.830149]  [<ffffffff81009a82>] ? system_call_fastpath+0x16/0x1b
>> > > > 
>> > > > The address comes down to the schedule() call:
>> > > > 
>> > > > (gdb) l *(do_exit+0x716)
>> > > > 0xffffffff8104a90d is in do_exit (kernel/exit.c:1034).
>> > > > 1029            preempt_disable();
>> > > > 1030            exit_rcu();
>> > > > 1031            /* causes final put_task_struct in finish_task_switch(). */
>> > > > 1032            tsk->state = TASK_DEAD;
>> > > > 1033            schedule();
>> > > > 1034            BUG();
>> > > > 1035            /* Avoid "noreturn function does return".  */
>> > > > 1036            for (;;)
>> > > > 1037                    cpu_relax();    /* For when BUG is null */
>> > > > 1038    }
>> > > > 
>> > > > Needless to say, KVM is not starting up. This works just fine on
>> > > > 2.6.35.1 and so is a regression. I can't do a lot of testing on this as
>> > > > the host is the machine that hosts all my build and test environments....
>> > > > 
>> > > > Cheers,
>> > > > 
>> > > > Dave.
>> > > 
>> > > Could it be the same problem than 
>> > > 
>> > > http://kerneltrap.com/mailarchive/linux-netdev/2010/10/23/6288128
>> > > 
>> > > Try to revert bee31369ce16fc3898ec9a54161248c9eddb06bc ?
>> > 
>> > It's working fine on 2.6.36 right now, so it's something that came in
>> > with the .37 merge cycle...
>> 
>> Actually, the machine isn't running a 2.6.36 kernel (it had booted
>> to the working .35 kernel and I didn't notice). So i've just tested
>> a 2.6.36 kernel, and the problem _is present_ in 2.6.36. I've
>> reverted the above commit but that does not fix the problem.
>
>Ok, so further investigation has shown I can reproduce this on
>2.6.32 and 2.6.35. It's not a new bug, nor do I think that it is
>a networking bug as it is not specific to the ip command.
>
>The trigger for the problem is actually an upgrade of the sudo
>package in debian unstable which changed the behaviour of sudo (has
>some per-login/pty restriction on it now). Basically, the startup
>script I'm running does:
>
>sudo kvm .....
>
>which then executes the qemu-ifup bash script which does:
>
>	sudo ip ....
>	sudo brctl ...
>
>because at one point KVM did not create the tap device automatically
>and so kvm could be run as a user with only the ifup script
>requiring privileges to create the tap device and mark it up. When
>KVM started creating the tap device, I added the sudo to the KVM
>script, an everything worked again.
>
>Now if I take the 'sudo' out of the ifup script, the hang goes away.
>I first removed it from the ip command, and then the brctl command
>hung in the same way the ip command was hanging. Hence my thoughts
>that it is not directly related to networking utilities.
>Unfortunately, it is not trivial to reproduce as I could only
>trigger it through this kvm method, not on the command line. e.g:
>
>$ sudo bash -c "sudo ip link set tap1 up"
>
>does not hang.
>
>This sudo package upgrade coincided with kernel upgrades, and so
>that led to my confusion about where it occurred and what triggered
>it.  Still, it appears to be a bug that has been around for some
>time.....
>

Interesting: the scheduler failed to take the dead task off the
run queue, so to me this is likely a scheduler bug.
I have no idea how sudo could change the behaviour here.

Another guess is that we need an smp_wmb() before the schedule() above.

We need to Cc Oleg and Ingo.


* Re: [PATCH 1/1] UDEV - Add 'udevlom' command line param to start_udev
From: Sujit K M @ 2010-11-04  8:37 UTC
  To: Greg KH, Narendra_K, linux-hotplug, netdev, Matt_Domsch,
	Jordan_Hargrave, Charles_Rose
In-Reply-To: <20101103183247.GA28139@bongo.bofh.it>

> (Maybe with a more descriptive name than "UDEVLOM".)

How would it differ from the name already in use?

>
> --
> ciao,
> Marco
>

-- 
-- Sujit K M

blog(http://kmsujit.blogspot.com/)


* [PATCH v14 01/17] Add a new structure for skb buffer from external.
From: xiaohui.xin @ 2010-11-04  9:04 UTC
  To: netdev, kvm, linux-kernel, mst, mingo, davem, herbert, jdike; +Cc: Xin Xiaohui
In-Reply-To: <1288861513-5707-1-git-send-email-xiaohui.xin@intel.com>

From: Xin Xiaohui <xiaohui.xin@intel.com>

Signed-off-by: Xin Xiaohui <xiaohui.xin@intel.com>
Signed-off-by: Zhao Yu <yzhao81new@gmail.com>
Reviewed-by: Jeff Dike <jdike@linux.intel.com>
---
 include/linux/skbuff.h |    9 +++++++++
 1 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 77eb60d..696e690 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -211,6 +211,15 @@ struct skb_shared_info {
 	skb_frag_t	frags[MAX_SKB_FRAGS];
 };
 
+/* This structure is for an skb whose pages may point to
+ * an external buffer that is not allocated from kernel space.
+ * It also contains a destructor for itself.
+ */
+struct skb_ext_page {
+	struct page	*page;
+	void		(*dtor)(struct skb_ext_page *);
+};
+
 /* We divide dataref into two halves.  The higher 16 bits hold references
  * to the payload part of skb->data.  The lower 16 bits hold references to
  * the entire skb->data.  A clone of a headerless skb holds the length of
-- 
1.7.3



* [PATCH v14 02/17] Add a new struct for device to manipulate external buffer.
From: xiaohui.xin @ 2010-11-04  9:04 UTC
  To: netdev, kvm, linux-kernel, mst, mingo, davem, herbert, jdike; +Cc: Xin Xiaohui
In-Reply-To: <fc6e95d63a2c62aaf77f8ded22fc43ccefcdbbff.1288860477.git.xiaohui.xin@intel.com>

From: Xin Xiaohui <xiaohui.xin@intel.com>

    Add a new field, mp_port, to struct net_device. It is for
    mediate passthru (zero-copy). The new structure holds the
    capabilities of the net device driver, a socket, and an
    external buffer creator; "external" means that the skb
    buffers belonging to the device may not be allocated from
    kernel space.

    Signed-off-by: Xin Xiaohui <xiaohui.xin@intel.com>
    Signed-off-by: Zhao Yu <yzhao81new@gmail.com>
    Reviewed-by: Jeff Dike <jdike@linux.intel.com>
---
 include/linux/netdevice.h |   25 ++++++++++++++++++++++++-
 1 files changed, 24 insertions(+), 1 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 46c36ff..f6b1870 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -325,6 +325,28 @@ enum netdev_state_t {
 	__LINK_STATE_DORMANT,
 };
 
+/* The structure for mediate passthru (zero-copy). */
+struct mp_port {
+	/* the header len */
+	int		hdr_len;
+	/* the max payload len for one descriptor */
+	int		data_len;
+	/* the number of pages for one DMA operation */
+	int		npages;
+	/* the socket bound to */
+	struct socket	*sock;
+	/* the header len for virtio-net */
+	int		vnet_hlen;
+	/* the external buffer page creator */
+	struct skb_ext_page *(*ctor)(struct mp_port *,
+				struct sk_buff *, int);
+	/* the hash function used to find the corresponding
+	 * backend ring descriptor info for an external
+	 * buffer page.
+	 */
+	struct skb_ext_page *(*hash)(struct net_device *,
+				struct page *);
+};
 
 /*
  * This structure holds at boot time configured netdevice settings. They
@@ -1045,7 +1067,8 @@ struct net_device {
 
 	/* GARP */
 	struct garp_port	*garp_port;
-
+	/* mpassthru */
+	struct mp_port		*mp_port;
 	/* class/net/name entry */
 	struct device		dev;
 	/* space for optional device, statistics, and wireless sysfs groups */
-- 
1.7.3


