Netdev List
* RE: [PATCH net 3/7] qlge: Garbage values shown in extra info during selftest.
From: Jitendra Kalsaria @ 2012-07-05 17:13 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Ron Mercer, Dept-NX Linux NIC Driver
In-Reply-To: <20120705.002341.316014337743384600.davem@davemloft.net>



-----Original Message-----
>From: David Miller [mailto:davem@davemloft.net] 
>Sent: Thursday, July 05, 2012 12:24 AM
>To: Jitendra Kalsaria
>Cc: netdev; Ron Mercer; Dept-NX Linux NIC Driver
>Subject: Re: [PATCH net 3/7] qlge: Garbage values shown in extra info during selftest.
>
>
>Why are you posting an arbitrary patch from a patch series,
>yet not the rest of that series?
>
>This needs to be sent alongside the rest of the series.

I didn't send an arbitrary patch; it seems something is wrong with the mail server.

Thanks for letting me know about this; I will get it fixed.

^ permalink raw reply

* Re: [net-next RFC V5 0/5] Multiqueue virtio-net
From: Rick Jones @ 2012-07-05 17:45 UTC (permalink / raw)
  To: Jason Wang
  Cc: krkumar2, habanero, mashirle, kvm, mst, netdev, linux-kernel,
	virtualization, edumazet, tahm, jwhan, davem, sri
In-Reply-To: <1341484194-8108-1-git-send-email-jasowang@redhat.com>

On 07/05/2012 03:29 AM, Jason Wang wrote:

>
> Test result:
>
> 1) 1 vm 2 vcpu 1q vs 2q, 1 - 1q, 2 - 2q, no pinning
>
> - Guest to External Host TCP STREAM
> sessions size throughput1 throughput2    % norm1 norm2   %
> 1          64      650.55      655.61 100% 24.88 24.86 99%
> 2          64     1446.81     1309.44  90% 30.49 27.16 89%
> 4          64     1430.52     1305.59  91% 30.78 26.80 87%
> 8          64     1450.89     1270.82  87% 30.83 25.95 84%

Was the -D test-specific option used to set TCP_NODELAY?  I'm guessing 
from your description of how packet sizes were smaller with multiqueue 
and your need to hack tcp_write_xmit() it wasn't but since we don't have 
the specific netperf command lines (hint hint :) I wanted to make certain.

Instead of calling them throughput1 and throughput2, it might be more 
clear in future to identify them as singlequeue and multiqueue.

Also, how are you combining the concurrent netperf results?  Are you 
taking sums of what netperf reports, or are you gathering statistics 
outside of netperf?
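
For readers checking the quoted tables: the percent columns appear to be throughput2/throughput1 (and norm2/norm1) truncated to a whole percent. That is an inference from the quoted rows, not something stated in the thread, and `sum_results` and `percent_of` are hypothetical helper names. A minimal C sketch of summing per-session results and forming that ratio, under those assumptions:

```c
#include <stddef.h>

/* Sum the per-session figures reported by concurrent netperf
 * instances (one of the two aggregation methods asked about). */
double sum_results(const double *per_session, size_t n)
{
	double total = 0.0;
	size_t i;

	for (i = 0; i < n; i++)
		total += per_session[i];
	return total;
}

/* Express the multiqueue figure as a whole percent of the
 * singlequeue figure, truncating toward zero. Truncation is an
 * assumption chosen because it matches the quoted rows. */
int percent_of(double singlequeue, double multiqueue)
{
	return (int)(100.0 * multiqueue / singlequeue);
}
```

With the first quoted row, `percent_of(650.55, 655.61)` gives 100, and the TCP RR row gives 153, which agrees with the table.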

> - TCP RR
> sessions size throughput1 throughput2    %   norm1   norm2   %
> 50          1    54695.41    84164.98 153% 1957.33 1901.31 97%

A single instance TCP_RR test would help confirm/refute any non-trivial 
change in (effective) path length between the two cases.

happy benchmarking,

rick jones

^ permalink raw reply

* Re: [PATCH net-next 13/15] netfilter: nfdbus: Add D-bus message parsing
From: Javier Martinez Canillas @ 2012-07-05 17:54 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: Javier Martinez Canillas, Vincent Sanders, netdev, linux-kernel,
	David S. Miller, Alban Crequy
In-Reply-To: <20120704173047.GA8864@1984>

On Wed, Jul 4, 2012 at 7:30 PM, Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> On Mon, Jul 02, 2012 at 05:43:43PM +0200, Javier Martinez Canillas wrote:
>> On 06/29/2012 07:11 PM, Pablo Neira Ayuso wrote:
>> > On Fri, Jun 29, 2012 at 05:45:52PM +0100, Vincent Sanders wrote:
>> >> From: Javier Martinez Canillas <javier.martinez@collabora.co.uk>
>> >>
>> >> The netfilter D-Bus module needs to parse D-bus messages sent by
>> >> applications to decide whether a peer can receive or not a D-Bus
>> >> message. Add D-bus message parsing logic to be able to analyze.
>> >
>> > Not talking about the entire patchset, only about the part I'm
>> > responsible for.
>> >
>> > I don't see why you think this belongs to netfilter at all.
>> >
>> > This doesn't integrate into the existing filtering infrastructure,
>> > nor does it extend it in any way.
>> >
>>
>> Hello Pablo,
>>
>> Thanks a lot for your feedback.
>>
>> This is the first of a set of patches that adds a netfilter module to parse
>> D-Bus messages, the complete patch-set is:
>>
>> [PATCH 13/15] netfilter: nfdbus: Add D-bus message parsing
>> [PATCH 14/15] netfilter: nfdbus: Add D-bus match rule implementation
>> [PATCH 15/15] netfilter: add netfilter D-Bus module
>>
>> patches 13 and 14 just include D-Bus helper code to be used by the netfilter
>> module (added on patch 15) and specially the dbus_filter netfilter hook function.
>
> I see, the use of the netfilter hooks seems to be the only reason why
> you consider these chunks belong to netfilter.
>
>> For the next post version we will reorganize the patches so first the D-Bus
>> netfilter module is added with an empty dbus_filter function and then added the
>> D-Bus helper code.
>>
>> Also, we will move the nfdbus netfilter module to net/bus so is not inside the
>> netfilter core code.
>
> Yes, please, remove this stuff from my directory tree, I believe this
> filtering infrastructure has not much to do with Netfilter itself.
>
> It uses the connector to communicate kernel <-> userspace instead of
> nfnetlink and, as said, it neither integrates into the existing
> filtering kernel/userspace infrastructure nor extends it.
>
> So, please, if you plan to give another try to this patchset, move
> this to your net/bus directory as you propose and find a different
> (better) name for the filtering part (just to avoid confusion in the
> future).
>
> Thanks.

Hi Pablo,

Thanks a lot for your feedback and comments.

In the next post of the patch-set I'll move it to net/bus and change the
name as you suggest.

Best regards,

-- 
Javier Martínez Canillas
(+34) 682 39 81 69
Barcelona, Spain

^ permalink raw reply

* Re: [PATCH 0/5] rtcache remove respin
From: Eric Dumazet @ 2012-07-05 19:03 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20120705.031539.2275742387594459652.davem@davemloft.net>

On Thu, 2012-07-05 at 03:15 -0700, David Miller wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Mon, 02 Jul 2012 12:44:01 +0200
> 
> > If we still want __refcnt being on cache line boundary, we might find a
> > better way to accomplish this.
> 
> Back to this issue again.
> 
> Eric, if you take a look at net-next right now, I left a dummy padding
> in dst_entry where the neighbour pointer used to be.
> 
> Can you come up with some way to make use of that new space?
> 

If route cache is removed, I believe we can remove all paddings.

Each tcp session will have its own dst_entry, instead of being shared.

^ permalink raw reply

* Re: [net-next RFC V5 4/5] virtio_net: multiqueue support
From: Amos Kong @ 2012-07-05 20:02 UTC (permalink / raw)
  To: Jason Wang
  Cc: krkumar2, habanero, mashirle, kvm, mst, netdev, linux-kernel,
	virtualization, edumazet, tahm, jwhan, davem, sri
In-Reply-To: <1341484194-8108-5-git-send-email-jasowang@redhat.com>

On 07/05/2012 06:29 PM, Jason Wang wrote:
> This patch converts virtio_net to a multiqueue device. After negotiating the
> VIRTIO_NET_F_MULTIQUEUE feature, the virtio device has many tx/rx queue pairs,
> and the driver can read the number from config space.
> 
> The driver expects the number of rx/tx queue pairs to be equal to the number
> of vcpus. To maximize performance with these per-cpu rx/tx queue pairs, some
> optimizations were introduced:
> 
> - Txq selection is based on the processor id in order to avoid contending for
>   a lock whose owner may exit to the host.
> - Since the rxq/txq pairs are per-cpu, the affinity hint is set to the cpu
>   that owns the queue pair.
> 
> Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---

...

>  
>  static int virtnet_probe(struct virtio_device *vdev)
>  {
> -	int err;
> +	int i, err;
>  	struct net_device *dev;
>  	struct virtnet_info *vi;
> +	u16 num_queues, num_queue_pairs;
> +
> +	/* Find if host supports multiqueue virtio_net device */
> +	err = virtio_config_val(vdev, VIRTIO_NET_F_MULTIQUEUE,
> +				offsetof(struct virtio_net_config,
> +				num_queues), &num_queues);
> +
> +	/* We need atleast 2 queue's */


s/atleast/at least/


> +	if (err || num_queues < 2)
> +		num_queues = 2;
> +	if (num_queues > MAX_QUEUES * 2)
> +		num_queues = MAX_QUEUES;

                num_queues = MAX_QUEUES * 2;

MAX_QUEUES is the limitation of RX or TX.

> +
> +	num_queue_pairs = num_queues / 2;

...
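
The clamping discussed in this review, with the suggested correction applied, can be sketched as a standalone function. This is a userspace illustration only: `clamp_num_queues` and `num_queue_pairs_for` are hypothetical helpers, and the `MAX_QUEUES` value of 256 is an assumption made for the example, not a value taken from the patch.

```c
#define MAX_QUEUES 256	/* assumed value, for illustration only */

/* Return the total number of virtqueues (rx + tx) to use.
 * err is non-zero when reading num_queues from config space failed. */
unsigned int clamp_num_queues(int err, unsigned int num_queues)
{
	/* At least one rx/tx pair is needed, i.e. 2 queues. */
	if (err || num_queues < 2)
		num_queues = 2;
	/* MAX_QUEUES limits rx or tx individually, so the total
	 * is capped at twice that. */
	if (num_queues > MAX_QUEUES * 2)
		num_queues = MAX_QUEUES * 2;
	return num_queues;
}

/* Each pair consists of one rx and one tx queue. */
unsigned int num_queue_pairs_for(unsigned int num_queues)
{
	return num_queues / 2;
}
```

With the assignment as originally posted (`num_queues = MAX_QUEUES`), an oversized request would end up with only MAX_QUEUES / 2 pairs, which is the point of the correction.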

-- 
			Amos.

^ permalink raw reply

* Re: [net-next RFC V5 5/5] virtio_net: support negotiating the number of queues through ctrl vq
From: Amos Kong @ 2012-07-05 20:07 UTC (permalink / raw)
  To: Sasha Levin
  Cc: krkumar2, habanero, kvm, mst, netdev, mashirle, linux-kernel,
	virtualization, edumazet, tahm, jwhan, davem, sri
In-Reply-To: <1341492679.18786.18.camel@lappy>

On 07/05/2012 08:51 PM, Sasha Levin wrote:
> On Thu, 2012-07-05 at 18:29 +0800, Jason Wang wrote:
>> @@ -1387,6 +1404,10 @@ static int virtnet_probe(struct virtio_device *vdev)
>>         if (virtio_has_feature(vdev, VIRTIO_NET_F_CTRL_VQ))
>>                 vi->has_cvq = true;
>>  


>> +       /* Use single tx/rx queue pair as default */
>> +       vi->num_queue_pairs = 1;
>> +       vi->total_queue_pairs = num_queue_pairs; 

vi->total_queue_pairs also should be set to 1

           vi->total_queue_pairs = 1;

> 
> The code is using this "default" even if the amount of queue pairs it
> wants was specified during initialization. This basically limits any
> device to use 1 pair when starting up.
> 


-- 
			Amos.

^ permalink raw reply

* Re: AF_BUS socket address family
From: Jan Engelhardt @ 2012-07-05 21:06 UTC (permalink / raw)
  To: Vincent Sanders; +Cc: David Miller, netdev, linux-kernel
In-Reply-To: <20120629231236.GA28593@mail.collabora.co.uk>


On Saturday 2012-06-30 01:12, Vincent Sanders wrote:
>
>Firstly it is intended as an interprocess mechanism and not to rely on
>a configured IP system, indeed one of its primary usages is to
>provide a mechanism for various tools to set up IP networking.

Using IP as a localhost IPC is not uncommon (independent of
software preferring AF_UNIX, if so available). Distro boot
scripts have been running `ip addr add ::1/128 dev lo`
all these years along.

And now we suddenly need a DBUS program just to configure
IP-based localhost IPC? I can see the flaw in that.

^ permalink raw reply

* Re: [PATCH net-next] ipv6: Initialize the neighbour pointer of rt6_info on allocation
From: David Miller @ 2012-07-05 21:21 UTC (permalink / raw)
  To: steffen.klassert; +Cc: netdev
In-Reply-To: <20120705131828.GE1869@secunet.com>

From: Steffen Klassert <steffen.klassert@secunet.com>
Date: Thu, 5 Jul 2012 15:18:28 +0200

> git commit 97cac082 (ipv6: Store route neighbour in rt6_info struct)
> added a neighbour pointer to rt6_info. Currently we don't initialize
> this pointer at allocation time. We assume this pointer to be valid
> if it is not a null pointer, so initialize it on allocation.
> 
> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>

Applied, but as Eric said we need to find a way to avoid having to
make changes like this every time we simply want to add a struct
member to rt6_info.

^ permalink raw reply

* Re: [net-next:master] general protection fault in __nla_put()
From: David Miller @ 2012-07-05 21:22 UTC (permalink / raw)
  To: wfg; +Cc: netdev
In-Reply-To: <20120705134857.GA14643@localhost>


Steffen Klassert posted a patch which fixes this.

^ permalink raw reply

* Re: ipv6 problem with 6lowpan
From: David Miller @ 2012-07-05 21:22 UTC (permalink / raw)
  To: alex.bluesman.smirnov; +Cc: netdev
In-Reply-To: <CAJmB2rD8U1ihy4Ai6y5QGjj4f7txDabszesrNrQ=pgEbscePqQ@mail.gmail.com>


Should be fixed by Steffen Klassert's patch which I just pushed into net-next.

^ permalink raw reply

* Re: [PATCH 0/5] rtcache remove respin
From: David Miller @ 2012-07-05 21:32 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev
In-Reply-To: <1341515017.3265.6.camel@edumazet-glaptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 05 Jul 2012 21:03:37 +0200

> If route cache is removed, I believe we can remove all paddings.
> 
> Each tcp session will have its own dst_entry, instead of being shared.

Not really, the routing cache removal patches have poor performance
and won't go-in as-is. :-) Once PMTU/redirect/TCP-metrics are reworked
I plan to do things like the patch below to make the performance loss
more acceptable.

And then I'll do the same for input routes too, at which point your
'noref' case can be put back.

So really, we have to consider how to rework the layout of this
structure.

Thanks.

====================
ipv4: Cache output routes in fib_info nexthops.

Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/net/ip_fib.h     |    3 +++
 net/ipv4/fib_semantics.c |    2 ++
 net/ipv4/route.c         |    9 +++++++++
 3 files changed, 14 insertions(+)

diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
index 3dc7c96..ff9f0c4 100644
--- a/include/net/ip_fib.h
+++ b/include/net/ip_fib.h
@@ -45,6 +45,7 @@ struct fib_config {
  };
 
 struct fib_info;
+struct rtable;
 
 struct fib_nh {
 	struct net_device	*nh_dev;
@@ -63,6 +64,8 @@ struct fib_nh {
 	__be32			nh_gw;
 	__be32			nh_saddr;
 	int			nh_saddr_genid;
+
+	struct rtable		*rth;
 };
 
 /*
diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
index c46c20b..f3ada74 100644
--- a/net/ipv4/fib_semantics.c
+++ b/net/ipv4/fib_semantics.c
@@ -148,6 +148,8 @@ static void free_fib_info_rcu(struct rcu_head *head)
 	change_nexthops(fi) {
 		if (nexthop_nh->nh_dev)
 			dev_put(nexthop_nh->nh_dev);
+		if (nexthop_nh->rth)
+			dst_release(&nexthop_nh->rth->dst);
 	} endfor_nexthops(fi);
 
 	release_net(fi->fib_net);
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 9f68f74..35bfd98 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -914,6 +914,8 @@ static void rt_set_nexthop(struct rtable *rt, const struct flowi4 *fl4,
 #ifdef CONFIG_IP_ROUTE_CLASSID
 		dst->tclassid = FIB_RES_NH(*res).nh_tclassid;
 #endif
+		FIB_RES_NH(*res).rth = rt;
+		dst_clone(&rt->dst);
 	}
 
 	if (dst_mtu(dst) > IP_MAX_MTU)
@@ -1399,6 +1401,13 @@ static struct rtable *__mkroute_output(const struct fib_result *res,
 			fi = NULL;
 	}
 
+	if (fi) {
+		rth = FIB_RES_NH(*res).rth;
+		if (rth) {
+			dst_use(&rth->dst, jiffies);
+			return rth;
+		}
+	}
 	rth = rt_dst_alloc(dev_out,
 			   IN_DEV_CONF_GET(in_dev, NOPOLICY),
 			   IN_DEV_CONF_GET(in_dev, NOXFRM));
-- 
1.7.10
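
The idea in the patch above, keeping one output route per nexthop and handing it out again on later lookups, can be modelled in userspace. This is a toy sketch, not kernel code: `toy_route`, `toy_nexthop`, `toy_route_output`, and `toy_route_put` are hypothetical names, and a plain integer stands in for the dst reference count.

```c
#include <stdlib.h>

struct toy_route {
	int refcnt;		/* stands in for the dst reference count */
	unsigned long lastuse;	/* time of last use */
};

struct toy_nexthop {
	struct toy_route *cached;	/* plays the role of the new rth member */
};

/* Return a route for this nexthop, reusing the cached one if present.
 * The caller owns one reference on the returned route. */
struct toy_route *toy_route_output(struct toy_nexthop *nh, unsigned long now)
{
	struct toy_route *rt = nh->cached;

	if (rt) {
		/* cache hit: take a reference and update the timestamp */
		rt->refcnt++;
		rt->lastuse = now;
		return rt;
	}

	rt = calloc(1, sizeof(*rt));
	if (!rt)
		return NULL;
	rt->refcnt = 1;		/* reference for the caller */
	rt->lastuse = now;

	nh->cached = rt;
	rt->refcnt++;		/* reference held by the cache */
	return rt;
}

/* Drop one reference; free the route when the last one goes away. */
void toy_route_put(struct toy_route *rt)
{
	if (--rt->refcnt == 0)
		free(rt);
}
```

In this model the cache's own reference is dropped when the nexthop goes away, which corresponds to the dst_release() call the patch adds to free_fib_info_rcu().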

^ permalink raw reply related

* [PATCH] gianfar: fix potential sk_wmem_alloc imbalance
From: Eric Dumazet @ 2012-07-05 21:45 UTC (permalink / raw)
  To: David Miller
  Cc: netdev, Manfred Rudigier, Claudiu Manoil, Jiajun Wu,
	Paul Gortmaker, Andy Fleming

From: Eric Dumazet <edumazet@google.com>

commit db83d136d7f753 (gianfar: Fix missing sock reference when
processing TX time stamps) added a potential sk_wmem_alloc imbalance.

If the new skb has a different truesize than the old one, we can get a
negative sk_wmem_alloc once the new skb is orphaned at TX completion.

Now that we no longer orphan skbs early in dev_hard_start_xmit(), this
can probably lead to fatal bugs.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Manfred Rudigier <manfred.rudigier@omicron.at>
Cc: Claudiu Manoil <claudiu.manoil@freescale.com>
Cc: Jiajun Wu <b06378@freescale.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Andy Fleming <afleming@freescale.com>
---

Note : I don't have the hardware and discovered this problem by code
analysis. So please compile and run this patch before Acking it,
thanks !

BTW, dev->needed_headroom should be set to GMAC_FCB_LEN + GMAC_TXPAL_LEN
to avoid reallocations...
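
The imbalance described in the changelog can be illustrated with a small userspace model. Everything here is a toy: `toy_sock`, `toy_skb`, and the helper names are hypothetical stand-ins, and the charge/uncharge functions only mimic the accounting that skb_set_owner_w() and the write destructor are described as doing.

```c
#include <stddef.h>

struct toy_sock {
	int wmem_alloc;		/* models sk_wmem_alloc */
};

struct toy_skb {
	struct toy_sock *sk;
	int truesize;
};

/* Charge the skb's truesize to the socket (models skb_set_owner_w()). */
void toy_set_owner_w(struct toy_skb *skb, struct toy_sock *sk)
{
	skb->sk = sk;
	sk->wmem_alloc += skb->truesize;
}

/* Uncharge when the skb is freed or orphaned (models the destructor). */
void toy_orphan(struct toy_skb *skb)
{
	if (skb->sk) {
		skb->sk->wmem_alloc -= skb->truesize;
		skb->sk = NULL;
	}
}

/* Old approach: move the owner to the new skb without re-charging. */
int toy_balance_after_swap(int old_truesize, int new_truesize)
{
	struct toy_sock sk = { 0 };
	struct toy_skb old_skb = { NULL, old_truesize };
	struct toy_skb new_skb = { NULL, new_truesize };

	toy_set_owner_w(&old_skb, &sk);
	new_skb.sk = old_skb.sk;	/* "steal" the sock reference */
	old_skb.sk = NULL;
	toy_orphan(&old_skb);		/* no-op: old skb has no owner now */
	toy_orphan(&new_skb);		/* uncharges new_truesize instead */
	return sk.wmem_alloc;
}

/* Patched approach: charge the new skb, then release the old one. */
int toy_balance_after_fix(int old_truesize, int new_truesize)
{
	struct toy_sock sk = { 0 };
	struct toy_skb old_skb = { NULL, old_truesize };
	struct toy_skb new_skb = { NULL, new_truesize };

	toy_set_owner_w(&old_skb, &sk);
	if (old_skb.sk)
		toy_set_owner_w(&new_skb, old_skb.sk);
	toy_orphan(&old_skb);		/* uncharges old_truesize */
	toy_orphan(&new_skb);		/* uncharges new_truesize */
	return sk.wmem_alloc;
}
```

In the model, swapping owners between skbs of truesize 512 and 768 leaves the counter at -256 after completion, while the charge-then-release order returns it to 0.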

 drivers/net/ethernet/freescale/gianfar.c |    7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/freescale/gianfar.c b/drivers/net/ethernet/freescale/gianfar.c
index f2db8fc..ab1d80f 100644
--- a/drivers/net/ethernet/freescale/gianfar.c
+++ b/drivers/net/ethernet/freescale/gianfar.c
@@ -2063,10 +2063,9 @@ static int gfar_start_xmit(struct sk_buff *skb, struct net_device *dev)
 			return NETDEV_TX_OK;
 		}
 
-		/* Steal sock reference for processing TX time stamps */
-		swap(skb_new->sk, skb->sk);
-		swap(skb_new->destructor, skb->destructor);
-		kfree_skb(skb);
+		if (skb->sk)
+			skb_set_owner_w(skb_new, skb->sk);
+		consume_skb(skb);
 		skb = skb_new;
 	}
 

^ permalink raw reply related

* Re: [PATCH net-next] cnic: Fix mmap regression.
From: Michael Chan @ 2012-07-05 21:59 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20120629.153425.24594752441419170.davem@davemloft.net>

On Fri, 2012-06-29 at 15:34 -0700, David Miller wrote: 
> From: "Michael Chan" <mchan@broadcom.com>
> Date: Fri, 29 Jun 2012 12:32:45 -0700
> 
> > commit 1f85d58cdf15354a7120fc9ccc9bb9c45b53af88
> >     cnic: Remove uio mem[0].
> > 
> > introduced a regression as older versions of userspace app still rely
> > on this mmap.  Restore the mmap functionality and get the base address
> > from pci_resource_start() as the netdev->base_addr has been deprecated for
> > PCI devices.
> > 
> > Update version to 2.5.12.
> > 
> > Signed-off-by: Michael Chan <mchan@broadocm.com>
> 
> I really couldn't believe what you guys were doing in the original
> commit, but I decided to let you do stupid things and find out the
> hard way that removing any user visible interface is basically
> impossible.
> 
> Applied, thanks.
> 

David, this patch plus the earlier commit are also needed for the net
tree because netdev->base_addr was removed there.  Can you apply these
directly to the net tree?  Or do you want me to send you the equivalent
patches for net?  Thanks.

^ permalink raw reply

* Re: [PATCH] force dentry revalidation after namespace change
From: Serge E. Hallyn @ 2012-07-05 22:17 UTC (permalink / raw)
  To: Glauber Costa
  Cc: linux-kernel, netdev, Andrew Morton, Tejun Heo, Eric W. Biederman,
	Greg Kroah-Hartman
In-Reply-To: <1341496805-26394-1-git-send-email-glommer@parallels.com>

Quoting Glauber Costa (glommer@parallels.com):
> When we change the namespace tag of a sysfs entry, the associated dentry
> is still kept around. readdir() will work correctly and not display the
> old entries, but open() will still succeed, so will reads and writes.
> 
> This will no longer happen if sysfs is remounted, hinting that this is a
> cache-related problem.
> 
> I am using the following sequence to demonstrate that:
> 
> shell1:
> ip link add type veth
> unshare -nm
> 
> shell2:
> ip link set veth1 <pid_of_shell_1>
> cat /sys/devices/virtual/net/veth1/ifindex
> 
> Before that patch, this will succeed (fail to fail). After it, it will

Confirmed that it currently fails to fail :)

> correctly return an error. Unlike a normal rename, which we
> handle fine, changing the object namespace will keep its path intact.
> So this check seems necessary as well.
> 
> Signed-off-by: Glauber Costa <glommer@parallels.com>

Haven't run it, but the patch looks good.  Thanks, Glauber.

Acked-by: Serge Hallyn <serge.hallyn@canonical.com>

> CC: Tejun Heo <tj@kernel.org>
> CC: Eric W. Biederman <ebiederm@xmission.com>
> CC: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> ---
>  fs/sysfs/dir.c |    5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
> index e6bb9b2..c24bdd9 100644
> --- a/fs/sysfs/dir.c
> +++ b/fs/sysfs/dir.c
> @@ -307,6 +307,7 @@ static int sysfs_dentry_revalidate(struct dentry *dentry, struct nameidata *nd)
>  {
>  	struct sysfs_dirent *sd;
>  	int is_dir;
> +	int type;
>  
>  	if (nd->flags & LOOKUP_RCU)
>  		return -ECHILD;
> @@ -314,6 +315,10 @@ static int sysfs_dentry_revalidate(struct dentry *dentry, struct nameidata *nd)
>  	sd = dentry->d_fsdata;
>  	mutex_lock(&sysfs_mutex);
>  
> +	type = sysfs_ns_type(sd);
> +	if (sd->s_ns && (sysfs_info(dentry->d_sb)->ns[type] != sd->s_ns))
> +		goto out_bad;
> +
>  	/* The sysfs dirent has been deleted */
>  	if (sd->s_flags & SYSFS_FLAG_REMOVED)
>  		goto out_bad;
> -- 
> 1.7.10.4
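
The check added by the quoted patch can be modelled outside the kernel. This is a hedged sketch with hypothetical types (`toy_dirent`, `toy_super`) and a hypothetical function name; it collapses the per-type ns[] array into a single tag and only mirrors the comparison between the namespace tag recorded on the entry and the tag of the mounted sysfs instance.

```c
#include <stddef.h>

struct toy_dirent {
	const void *ns;		/* namespace tag of the entry, NULL if untagged */
	int removed;		/* entry has been deleted */
};

struct toy_super {
	const void *ns;		/* namespace tag this mount was created with */
};

/* Return 1 if a cached dentry for sd may still be used through sb,
 * 0 if it must be dropped and looked up again. */
int toy_revalidate(const struct toy_super *sb, const struct toy_dirent *sd)
{
	/* tagged entry that now belongs to a different namespace */
	if (sd->ns && sb->ns != sd->ns)
		return 0;
	/* entry has been deleted */
	if (sd->removed)
		return 0;
	return 1;
}
```

In the veth example from the changelog, moving the device changes the entry's tag while the first shell's mount keeps its own, so the first comparison fails and the stale dentry is rejected.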
> 

^ permalink raw reply

* Re: [PATCH net-next] cnic: Fix mmap regression.
From: David Miller @ 2012-07-05 22:36 UTC (permalink / raw)
  To: mchan; +Cc: netdev
In-Reply-To: <1341525586.7472.25.camel@LTIRV-MCHAN1.corp.ad.broadcom.com>

From: "Michael Chan" <mchan@broadcom.com>
Date: Thu, 5 Jul 2012 14:59:46 -0700

> Or do you want me to send you the equivalent patches for net?

Please do so.

^ permalink raw reply

* [PATCH net] Bug fix for batman-adv 2012-07-06
From: Antonio Quartulli @ 2012-07-05 22:48 UTC (permalink / raw)
  To: davem; +Cc: netdev, b.a.t.m.a.n

here I have a fix intended for net/linux-3.5.

The bug, discovered by Guido Iribarren and fixed by Simon Wunderlich, is caused
by the wrong interaction between the Bridge Loop Avoidance and the Gateway
feature of batman-adv.

Let me know if there are problems.

Thank you,
	Antonio

The following changes since commit 9e85a6f9dc231f3ed3c1dc1b12217505d970142a:

  Merge tag 'clk-fixes-for-linus' of git://git.linaro.org/people/mturquette/linux (2012-07-03 18:06:49 -0700)

are available in the git repository at:


  git://git.open-mesh.org/linux-merge.git tags/batman-adv-fix-for-davem

for you to fetch changes up to 2d3f6ccc4ea5c74d4b4af1b47c56b4cff4bbfcb7:

  batman-adv: check incoming packet type for bla (2012-07-06 00:08:46 +0200)

----------------------------------------------------------------
Included changes:
- fix a bug generated by the wrong interaction between the GW feature and the
  Bridge Loop Avoidance

----------------------------------------------------------------
Simon Wunderlich (1):
      batman-adv: check incoming packet type for bla

 net/batman-adv/bridge_loop_avoidance.c |   15 +++++++++++----
 net/batman-adv/bridge_loop_avoidance.h |    5 +++--
 net/batman-adv/soft-interface.c        |    6 +++++-
 3 files changed, 19 insertions(+), 7 deletions(-)

^ permalink raw reply

* [PATCH net] batman-adv: check incoming packet type for bla
From: Antonio Quartulli @ 2012-07-05 22:48 UTC (permalink / raw)
  To: davem; +Cc: netdev, b.a.t.m.a.n, Simon Wunderlich, Simon Wunderlich
In-Reply-To: <1341528514-27906-1-git-send-email-ordex@autistici.org>

From: Simon Wunderlich <simon.wunderlich@s2003.tu-chemnitz.de>

If the gateway functionality is used, some broadcast packets (DHCP
requests) may be transmitted as unicast packets. As the bridge loop
avoidance code now only considers the payload Ethernet destination,
it may drop the DHCP request for clients which are claimed by other
backbone gateways, because it falsely infers from the broadcast address
that the right backbone gateway should have handled the broadcast.

Fix this by checking and delegating the batman-adv packet type used
for transmission.

Reported-by: Guido Iribarren <guidoiribarren@buenosaireslibre.org>
Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
---
 net/batman-adv/bridge_loop_avoidance.c |   15 +++++++++++----
 net/batman-adv/bridge_loop_avoidance.h |    5 +++--
 net/batman-adv/soft-interface.c        |    6 +++++-
 3 files changed, 19 insertions(+), 7 deletions(-)

diff --git a/net/batman-adv/bridge_loop_avoidance.c b/net/batman-adv/bridge_loop_avoidance.c
index 8bf9751..c5863f4 100644
--- a/net/batman-adv/bridge_loop_avoidance.c
+++ b/net/batman-adv/bridge_loop_avoidance.c
@@ -1351,6 +1351,7 @@ void bla_free(struct bat_priv *bat_priv)
  * @bat_priv: the bat priv with all the soft interface information
  * @skb: the frame to be checked
  * @vid: the VLAN ID of the frame
+ * @is_bcast: the packet came in a broadcast packet type.
  *
  * bla_rx avoidance checks if:
  *  * we have to race for a claim
@@ -1361,7 +1362,8 @@ void bla_free(struct bat_priv *bat_priv)
  * process the skb.
  *
  */
-int bla_rx(struct bat_priv *bat_priv, struct sk_buff *skb, short vid)
+int bla_rx(struct bat_priv *bat_priv, struct sk_buff *skb, short vid,
+	   bool is_bcast)
 {
 	struct ethhdr *ethhdr;
 	struct claim search_claim, *claim = NULL;
@@ -1380,7 +1382,7 @@ int bla_rx(struct bat_priv *bat_priv, struct sk_buff *skb, short vid)
 
 	if (unlikely(atomic_read(&bat_priv->bla_num_requests)))
 		/* don't allow broadcasts while requests are in flight */
-		if (is_multicast_ether_addr(ethhdr->h_dest))
+		if (is_multicast_ether_addr(ethhdr->h_dest) && is_bcast)
 			goto handled;
 
 	memcpy(search_claim.addr, ethhdr->h_source, ETH_ALEN);
@@ -1406,8 +1408,13 @@ int bla_rx(struct bat_priv *bat_priv, struct sk_buff *skb, short vid)
 	}
 
 	/* if it is a broadcast ... */
-	if (is_multicast_ether_addr(ethhdr->h_dest)) {
-		/* ... drop it. the responsible gateway is in charge. */
+	if (is_multicast_ether_addr(ethhdr->h_dest) && is_bcast) {
+		/* ... drop it. the responsible gateway is in charge.
+		 *
+		 * We need to check is_bcast because with the gateway
+		 * feature, broadcasts (like DHCP requests) may be sent
+		 * using a unicast packet type.
+		 */
 		goto handled;
 	} else {
 		/* seems the client considers us as its best gateway.
diff --git a/net/batman-adv/bridge_loop_avoidance.h b/net/batman-adv/bridge_loop_avoidance.h
index e39f93a..dc5227b 100644
--- a/net/batman-adv/bridge_loop_avoidance.h
+++ b/net/batman-adv/bridge_loop_avoidance.h
@@ -23,7 +23,8 @@
 #define _NET_BATMAN_ADV_BLA_H_
 
 #ifdef CONFIG_BATMAN_ADV_BLA
-int bla_rx(struct bat_priv *bat_priv, struct sk_buff *skb, short vid);
+int bla_rx(struct bat_priv *bat_priv, struct sk_buff *skb, short vid,
+	   bool is_bcast);
 int bla_tx(struct bat_priv *bat_priv, struct sk_buff *skb, short vid);
 int bla_is_backbone_gw(struct sk_buff *skb,
 		       struct orig_node *orig_node, int hdr_size);
@@ -41,7 +42,7 @@ void bla_free(struct bat_priv *bat_priv);
 #else /* ifdef CONFIG_BATMAN_ADV_BLA */
 
 static inline int bla_rx(struct bat_priv *bat_priv, struct sk_buff *skb,
-			 short vid)
+			 short vid, bool is_bcast)
 {
 	return 0;
 }
diff --git a/net/batman-adv/soft-interface.c b/net/batman-adv/soft-interface.c
index 6e2530b..a0ec0e4 100644
--- a/net/batman-adv/soft-interface.c
+++ b/net/batman-adv/soft-interface.c
@@ -256,7 +256,11 @@ void interface_rx(struct net_device *soft_iface,
 	struct bat_priv *bat_priv = netdev_priv(soft_iface);
 	struct ethhdr *ethhdr;
 	struct vlan_ethhdr *vhdr;
+	struct batman_header *batadv_header = (struct batman_header *)skb->data;
 	short vid __maybe_unused = -1;
+	bool is_bcast;
+
+	is_bcast = (batadv_header->packet_type == BAT_BCAST);
 
 	/* check if enough space is available for pulling, and pull */
 	if (!pskb_may_pull(skb, hdr_size))
@@ -302,7 +306,7 @@ void interface_rx(struct net_device *soft_iface,
 	/* Let the bridge loop avoidance check the packet. If will
 	 * not handle it, we can safely push it up.
 	 */
-	if (bla_rx(bat_priv, skb, vid))
+	if (bla_rx(bat_priv, skb, vid, is_bcast))
 		goto out;
 
 	netif_rx(skb);
-- 
1.7.9.4
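
The decision the patch changes can be reduced to a small predicate. The sketch below is a userspace illustration with hypothetical names (`toy_is_multicast_ether_addr`, `toy_bla_drop_as_broadcast`); it covers only the final "is this a broadcast that another gateway handles" test, not the claim lookups that precede it in bla_rx().

```c
#include <stdbool.h>
#include <stdint.h>

/* Multicast (including broadcast) Ethernet addresses have the low
 * bit of the first octet set. */
bool toy_is_multicast_ether_addr(const uint8_t *addr)
{
	return (addr[0] & 0x01) != 0;
}

/* Return true if the frame should be dropped because the responsible
 * backbone gateway is in charge of it. The payload destination alone
 * is not enough: the frame must also have arrived in a broadcast
 * packet type, since DHCP requests may travel as unicast packets
 * when the gateway feature is used. */
bool toy_bla_drop_as_broadcast(const uint8_t *payload_dest, bool is_bcast)
{
	return toy_is_multicast_ether_addr(payload_dest) && is_bcast;
}
```

A frame whose payload destination is ff:ff:ff:ff:ff:ff but which arrived in a unicast packet type is therefore passed on rather than dropped.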

^ permalink raw reply related

* Re: [B.A.T.M.A.N.] [PATCH net] Bug fix for batman-adv 2012-07-06
From: Antonio Quartulli @ 2012-07-05 22:51 UTC (permalink / raw)
  To: davem; +Cc: netdev, b.a.t.m.a.n
In-Reply-To: <1341528514-27906-1-git-send-email-ordex@autistici.org>


On Fri, Jul 06, 2012 at 12:48:33 +0200, Antonio Quartulli wrote:
> here I have a fix intended for net/linux-3.5.
 ...


Hello David,

here you have our instructions to resolve the conflicts that you will hit while
merging net into net-next:




Conflict 1 (bridge_loop_avoidance.c):
<<<<<<<
int batadv_bla_rx(struct batadv_priv *bat_priv, struct sk_buff *skb, short vid)
=======
int bla_rx(struct bat_priv *bat_priv, struct sk_buff *skb, short vid,
	   bool is_bcast)
>>>>>>>

resolves to:
int batadv_bla_rx(struct batadv_priv *bat_priv, struct sk_buff *skb, short vid,
		  bool is_bcast)

Conflict 2 (bridge_loop_avoidance.h):
<<<<<<<
int batadv_bla_rx(struct batadv_priv *bat_priv, struct sk_buff *skb, short vid);
int batadv_bla_tx(struct batadv_priv *bat_priv, struct sk_buff *skb, short vid);
int batadv_bla_is_backbone_gw(struct sk_buff *skb,
			      struct batadv_orig_node *orig_node, int hdr_size);
int batadv_bla_claim_table_seq_print_text(struct seq_file *seq, void *offset);
int batadv_bla_is_backbone_gw_orig(struct batadv_priv *bat_priv, uint8_t *orig);
int batadv_bla_check_bcast_duplist(struct batadv_priv *bat_priv,
				   struct batadv_bcast_packet *bcast_packet,
				   int hdr_size);
void batadv_bla_update_orig_address(struct batadv_priv *bat_priv,
				    struct batadv_hard_iface *primary_if,
				    struct batadv_hard_iface *oldif);
int batadv_bla_init(struct batadv_priv *bat_priv);
void batadv_bla_free(struct batadv_priv *bat_priv);
=======
int bla_rx(struct bat_priv *bat_priv, struct sk_buff *skb, short vid,
	   bool is_bcast);
int bla_tx(struct bat_priv *bat_priv, struct sk_buff *skb, short vid);
int bla_is_backbone_gw(struct sk_buff *skb,
		       struct orig_node *orig_node, int hdr_size);
int bla_claim_table_seq_print_text(struct seq_file *seq, void *offset);
int bla_is_backbone_gw_orig(struct bat_priv *bat_priv, uint8_t *orig);
int bla_check_bcast_duplist(struct bat_priv *bat_priv,
			    struct bcast_packet *bcast_packet, int hdr_size);
void bla_update_orig_address(struct bat_priv *bat_priv,
			     struct hard_iface *primary_if,
			     struct hard_iface *oldif);
int bla_init(struct bat_priv *bat_priv);
void bla_free(struct bat_priv *bat_priv);
>>>>>>>

resolves to:
int batadv_bla_rx(struct batadv_priv *bat_priv, struct sk_buff *skb, short vid,
		  bool is_bcast);
int batadv_bla_tx(struct batadv_priv *bat_priv, struct sk_buff *skb, short vid);
int batadv_bla_is_backbone_gw(struct sk_buff *skb,
			      struct batadv_orig_node *orig_node, int hdr_size);
int batadv_bla_claim_table_seq_print_text(struct seq_file *seq, void *offset);
int batadv_bla_is_backbone_gw_orig(struct batadv_priv *bat_priv, uint8_t *orig);
int batadv_bla_check_bcast_duplist(struct batadv_priv *bat_priv, 
				   struct batadv_bcast_packet *bcast_packet,
				   int hdr_size);
void batadv_bla_update_orig_address(struct batadv_priv *bat_priv,
				    struct batadv_hard_iface *primary_if,
				    struct batadv_hard_iface *oldif);
int batadv_bla_init(struct batadv_priv *bat_priv);
void batadv_bla_free(struct batadv_priv *bat_priv);


Conflict 3 (bridge_loop_avoidance.h):
<<<<<<<
static inline int batadv_bla_rx(struct batadv_priv *bat_priv,
				struct sk_buff *skb, short vid)
=======                         
static inline int bla_rx(struct bat_priv *bat_priv, struct sk_buff *skb,
			 short vid, bool is_bcast)
>>>>>>>

resolves to:
static inline int batadv_bla_rx(struct batadv_priv *bat_priv,
				struct sk_buff *skb, short vid, bool is_bcast)

Conflict 4 (soft-interface.c):
<<<<<<<
	__be16 ethertype = __constant_htons(BATADV_ETH_P_BATMAN);
=======
	bool is_bcast;

	is_bcast = (batadv_header->packet_type == BAT_BCAST);
>>>>>>>

resolves to:
	bool is_bcast;
	__be16 ethertype = __constant_htons(BATADV_ETH_P_BATMAN);

	is_bcast = (batadv_header->packet_type == BATADV_BCAST);


Conflict 5 (soft-interface.c):
<<<<<<<
	if (batadv_bla_rx(bat_priv, skb, vid))
=======
	if (bla_rx(bat_priv, skb, vid, is_bcast))
>>>>>>>

resolves to:
	if (batadv_bla_rx(bat_priv, skb, vid, is_bcast))


Wrong merge by git (soft-interface.c):
line 270 must look like this:
	struct batadv_header *batadv_header = (struct batadv_header *)skb->data;




-- 
Antonio Quartulli

..each of us alone is worth nothing..
Ernesto "Che" Guevara


^ permalink raw reply

* [iproute2] display vlan configuration
From: Fabien C. @ 2012-07-05 23:06 UTC (permalink / raw)
  To: netdev

Hello, 

it looks like there is no way to show the vlan configuration with iproute (nor with any other tool apparently). 

This can lead to trouble since : 
 # ip link add link eth0 name eth2.333 type vlan id 444

will create an interface that will show up like this with "ip link show" : 
 51: eth2.333@eth0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN

The only hint we have is the interface name, which may not be related to the vlan id we set earlier. 

Is there any way to get that information? 

Thanks, 
Fabien 

^ permalink raw reply

* Re: [iproute2] display vlan configuration
From: John Fastabend @ 2012-07-05 23:20 UTC (permalink / raw)
  To: Fabien C.; +Cc: netdev
In-Reply-To: <4FF61DE9.7000507@jetable.org>

On 7/5/2012 4:06 PM, Fabien C. wrote:
> Hello,
>
> it looks like there is no way to show the vlan configuration with iproute (nor with any other tool apparently).
>
> This can lead to trouble since :
>   # ip link add link eth0 name eth2.333 type vlan id 444
>
> will create an interface that will show up like this with "ip link show" :
>   51: eth2.333@eth0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN
>
> The only hint we have is the interface name, which may not be related to the vlan id we set earlier.

Here you need to ask ip for the details with the -d option:

# ip -d link show dev eth2.333

From my current setup:

# ip -d link show dev vlan0
33: vlan0@eth3: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN
     link/ether 00:1b:21:55:23:59 brd ff:ff:ff:ff:ff:ff
     vlan id 101 <REORDER_HDR>
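
For scripts that need the VLAN id rather than the interface name, the
"vlan id" field of the detailed output above can be picked out of the
text. The following is only a minimal sketch: parse_vlan_id() is a
hypothetical helper (not part of iproute2), and it assumes the output
keeps the "vlan id <n>" wording shown above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Return the VLAN id found in text printed by "ip -d link show",
 * or -1 if the text carries no "vlan id <n>" field.
 * Hypothetical helper for illustration only.
 */
int parse_vlan_id(const char *ip_output)
{
	const char *key = "vlan id ";
	const char *pos;
	int vid;

	assert(ip_output != NULL);

	pos = strstr(ip_output, key);
	if (!pos)
		return -1;	/* plain "ip link show" output has no such field */
	if (sscanf(pos + strlen(key), "%d", &vid) != 1)
		return -1;	/* field present but not followed by a number */
	return vid;
}
```

Fed the plain "ip link show" line from the original question, the helper
reports -1, which is exactly the missing information being asked about.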

^ permalink raw reply

* Re: [PATCH] force dentry revalidation after namespace change
From: Eric W. Biederman @ 2012-07-05 23:31 UTC (permalink / raw)
  To: Glauber Costa
  Cc: linux-kernel, netdev, Andrew Morton, Tejun Heo,
	Greg Kroah-Hartman
In-Reply-To: <1341496805-26394-1-git-send-email-glommer@parallels.com>

Glauber Costa <glommer@parallels.com> writes:

> When we change the namespace tag of a sysfs entry, the associated dentry
> is still kept around. readdir() will work correctly and not display the
> old entries, but open() will still succeed, so will reads and writes.
>
> This will no longer happen if sysfs is remounted, hinting that this is a
> cache-related problem.

Equivalently to remounting, you can do
echo 3 > /proc/sys/vm/drop_caches.

> I am using the following sequence to demonstrate that:
>
> shell1:
> ip link add type veth
> unshare -nm
>
> shell2:
> ip link set veth1 <pid_of_shell_1>
> cat /sys/devices/virtual/net/veth1/ifindex
>
> Before that patch, this will succeed (fail to fail). After it, it will
> correctly return an error. Unlike a normal rename, which we handle
> fine, changing the object namespace will keep its path intact.
> So this check seems necessary as well.

Overall good bug spotting, and good spotting of where the fix should
live.

Your summary should have said:
[PATCH] fail dentry revalidation after namespace change

And you have the test slightly wrong below.

> Signed-off-by: Glauber Costa <glommer@parallels.com>
> CC: Tejun Heo <tj@kernel.org>
> CC: Eric W. Biederman <ebiederm@xmission.com>
> CC: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> ---
>  fs/sysfs/dir.c |    5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
> index e6bb9b2..c24bdd9 100644
> --- a/fs/sysfs/dir.c
> +++ b/fs/sysfs/dir.c
> @@ -307,6 +307,7 @@ static int sysfs_dentry_revalidate(struct dentry *dentry, struct nameidata *nd)
>  {
>  	struct sysfs_dirent *sd;
>  	int is_dir;
> +	int type;
>  
>  	if (nd->flags & LOOKUP_RCU)
>  		return -ECHILD;
> @@ -314,6 +315,10 @@ static int sysfs_dentry_revalidate(struct dentry *dentry, struct nameidata *nd)
>  	sd = dentry->d_fsdata;
>  	mutex_lock(&sysfs_mutex);
>  
> +	type = sysfs_ns_type(sd);
> +	if (sd->s_ns && (sysfs_info(dentry->d_sb)->ns[type] != sd->s_ns))
> +		goto out_bad;
> +

First, this check should be further down, after the other rename
checks.

Second, the test should be:
	type = KOBJ_NS_TYPE_NONE;
	if (sd->s_parent)
		type = sysfs_ns_type(sd->s_parent);
	if (type && (sysfs_info(dentry->d_sb)->ns[type] != sd->s_ns))
		goto out_bad;

The important difference is that the type comes from the directory that
the dirent is in, not from the dirent itself.
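
To make that distinction concrete, here is a small user-space model of
the test. It is a sketch only: struct dirent_model, struct sb_model and
ns_tag_is_stale() are hypothetical stand-ins for sysfs_dirent,
sysfs_info(sb) and the check inside sysfs_dentry_revalidate(), and the
enum values merely mimic KOBJ_NS_TYPE_NONE and a network type.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins; the real definitions live in the kernel. */
enum ns_type { NS_TYPE_NONE = 0, NS_TYPE_NET, NS_TYPES };

struct dirent_model {
	struct dirent_model *parent;	/* models sd->s_parent */
	enum ns_type ns_type;		/* tag type carried by this directory's children */
	const void *ns;			/* models sd->s_ns, the tag on this entry */
};

struct sb_model {
	const void *ns[NS_TYPES];	/* models sysfs_info(sb)->ns[] */
};

/* True when the cached entry no longer belongs to the mount's namespace. */
bool ns_tag_is_stale(const struct sb_model *sb, const struct dirent_model *sd)
{
	enum ns_type type = NS_TYPE_NONE;

	assert(sb != NULL && sd != NULL);

	/* The tag type is a property of the containing directory. */
	if (sd->parent)
		type = sd->parent->ns_type;

	return type != NS_TYPE_NONE && sb->ns[type] != sd->ns;
}
```

In this model an entry whose own ns_type is "none" (a device directory,
say) is still checked, because its parent directory is the one that
declares the children to be tagged.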

>  	/* The sysfs dirent has been deleted */
>  	if (sd->s_flags & SYSFS_FLAG_REMOVED)
>  		goto out_bad;

Glauber, do you think you can fix your patch and resubmit?

Eric

^ permalink raw reply

* Re: [PATCH net-next] cnic: Fix mmap regression.
From: Michael Chan @ 2012-07-05 23:34 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20120705.153638.790030674286651971.davem@davemloft.net>

On Thu, 2012-07-05 at 15:36 -0700, David Miller wrote: 
> From: "Michael Chan" <mchan@broadcom.com>
> Date: Thu, 5 Jul 2012 14:59:46 -0700
> 
> > Or you want me to send you the equivalent patches for net.
> 
> Please do so.
> 

OK.  I'll send you a single patch that fixes it in net, instead of one
patch that causes the regression and another one that fixes it.

^ permalink raw reply

* [PATCH net] cnic: Don't use netdev->base_addr
From: Michael Chan @ 2012-07-06  0:21 UTC (permalink / raw)
  To: davem; +Cc: netdev

    commit c0357e975afdbbedab5c662d19bef865f02adc17
    bnx2: stop using net_device.{base_addr, irq}.

removed netdev->base_addr so we need to update cnic to get the MMIO
base address from pci_resource_start().  Otherwise, mmap of the uio
device will fail.

Signed-off-by: Michael Chan <mchan@broadcom.com>
---
 drivers/net/ethernet/broadcom/cnic.c |    7 +++++--
 1 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/cnic.c b/drivers/net/ethernet/broadcom/cnic.c
index c95e7b5..3c95065 100644
--- a/drivers/net/ethernet/broadcom/cnic.c
+++ b/drivers/net/ethernet/broadcom/cnic.c
@@ -1053,12 +1053,13 @@ static int cnic_init_uio(struct cnic_dev *dev)
 
 	uinfo = &udev->cnic_uinfo;
 
-	uinfo->mem[0].addr = dev->netdev->base_addr;
+	uinfo->mem[0].addr = pci_resource_start(dev->pcidev, 0);
 	uinfo->mem[0].internal_addr = dev->regview;
-	uinfo->mem[0].size = dev->netdev->mem_end - dev->netdev->mem_start;
 	uinfo->mem[0].memtype = UIO_MEM_PHYS;
 
 	if (test_bit(CNIC_F_BNX2_CLASS, &dev->flags)) {
+		uinfo->mem[0].size = MB_GET_CID_ADDR(TX_TSS_CID +
+						     TX_MAX_TSS_RINGS + 1);
 		uinfo->mem[1].addr = (unsigned long) cp->status_blk.gen &
 					PAGE_MASK;
 		if (cp->ethdev->drv_state & CNIC_DRV_STATE_USING_MSIX)
@@ -1068,6 +1069,8 @@ static int cnic_init_uio(struct cnic_dev *dev)
 
 		uinfo->name = "bnx2_cnic";
 	} else if (test_bit(CNIC_F_BNX2X_CLASS, &dev->flags)) {
+		uinfo->mem[0].size = pci_resource_len(dev->pcidev, 0);
+
 		uinfo->mem[1].addr = (unsigned long) cp->bnx2x_def_status_blk &
 			PAGE_MASK;
 		uinfo->mem[1].size = sizeof(*cp->bnx2x_def_status_blk);
-- 
1.7.1

^ permalink raw reply related

* Re: BISECTED: Re: REGRESSION: 3.4.0->3.5.0-rc2 kernel WARNING on cable plug on Acer Aspire One, no network
From: Alex Villacís Lasso @ 2012-07-06  0:35 UTC (permalink / raw)
  To: Marek Szyprowski; +Cc: 'Francois Romieu', netdev
In-Reply-To: <012601cd5a7b$886fd4c0$994f7e40$%szyprowski@samsung.com>

On 05/07/12 01:58, Marek Szyprowski wrote:
> Hello,
>
> On Thursday, July 05, 2012 6:15 AM Alex Villacís Lasso wrote:
>
>> On 04/07/12 02:02, Marek Szyprowski wrote:
>>> Hello,
>>>
>>> On Tuesday, July 03, 2012 4:27 PM Alex Villacís Lasso wrote:
>>>
>>>> On 03/07/12 00:40, Marek Szyprowski wrote:
>>>>> Hi Alex,
>>>>>
>>>>> On Tuesday, July 03, 2012 4:45 AM Alex Villacís Lasso wrote:
>>>>>
>>>>>> -------- Original message --------
>>>>>> Subject: BISECTED: Re: REGRESSION: 3.4.0->3.5.0-rc2 kernel WARNING on cable
>>>>>>          plug on Acer Aspire One, no network
>>>>>> Date:    Mon, 02 Jul 2012 21:33:41 -0500
>>>>>> From:    Alex Villacís Lasso <a_villacis@palosanto.com>
>>>>>> To:      Francois Romieu <romieu@fr.zoreil.com>
>>>>>> CC:      netdev@vger.kernel.org
>>>>>>
>>>>>> On 01/07/12 08:50, Alex Villacís Lasso wrote:
>>>>>>> On 11/06/12 16:38, Francois Romieu wrote:
>>>>>>>> Alex Villacís Lasso <a_villacis@palosanto.com> :
>>>>>>>> [...]
>>>>>>>>> $ grep XID dmesg-3.5.0-rc2.txt
>>>>>>>>> [   15.873858] r8169 0000:02:00.0: eth0: RTL8102e at 0xf7c0e000,
>>>>>>>>> 00:1e:68:e5:5d:b1, XID 04a00000 IRQ 44
>>>>>>>> The 8102e has not been touched by that many suspect patches but I do
>>>>>>>> not see where the problem is :o(
>>>>>>>>
>>>>>>>> Can you peel off the r8169 patches between 3.4.0 and 3.5-rc ?
>>>>>>>>
>>>>>>> Still present in 3.5-rc5. Bisection still in progress.
>>>>>>>
>>>>>>> --
>>>>>>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>> My full bisection points to this commit:
>>>>>>
>>>>>> commit 0a2b9a6ea93650b8a00f9fd5ee8fdd25671e2df6
>>>>>> Author: Marek Szyprowski <m.szyprowski@samsung.com>
>>>>>> Date:   Thu Dec 29 13:09:51 2011 +0100
>>>>>>
>>>>>>        X86: integrate CMA with DMA-mapping subsystem
>>>>>>
>>>>>>        This patch adds support for CMA to dma-mapping subsystem for x86
>>>>>>        architecture that uses common pci-dma/pci-nommu implementation. This
>>>>>>        allows to test CMA on KVM/QEMU and a lot of common x86 boxes.
>>>>>>
>>>>>>        Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
>>>>>>        Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
>>>>>>        CC: Michal Nazarewicz <mina86@mina86.com>
>>>>>>        Acked-by: Arnd Bergmann <arnd@arndb.de>
>>>>>>
>>>>>> Is this commit somehow messing with the network card DMA?
>>>>> This commit in fact touches DMA-mapping subsystem and introduces a bug,
>>>>> which has been finally fixed by commit c080e26edc3a2a3 merged to v3.5-rc3.
>>>>> After applying it the DMA-mapping subsystem should work exactly the same way
>>>>> as in v3.4. Could you please check if it fixes this issue?
>>>>>
>>>>> Best regards
>>>> No. It still fails in 3.5-rc5, as mentioned before.
>>> Hmm. I was a bit confused, because both the subject and git bisect log pointed to v3.5-rc2,
>>> which had that bug. Maybe there is some other issue present in v3.5-rc5 not related to
>>> my patches?
>>>
>>> Could you check with v3.5-rc5 if reverting patch c080e26edc3a2a3cdfa4c430c663ee1c3bbd8fae
>>> and 0a2b9a6ea93650b8a00f9fd5ee8fdd25671e2df6 fixes the problems with rtl driver?
>>>
>>> Best regards
>> Reverting the two patches indeed fixes the bug on -rc5.
> That's really strange. Could you check if you have CMA disabled in the config? After preparing
> a c080e26edc3a2a3cdfa4c430c663ee1c3bbd8fae fixup patch, I was really convinced that there are
> no functional changes in x86 dma mapping code when CMA is disabled. I will provide some
> patches to revert different parts of my changes, so we will find which line causes issues.
>
> Best regards
The affected system is an Acer Aspire One, a 32-bit-only system. The
CMA option does not appear in menuconfig at all, and it does not appear
in the .config file as either set or unset. I assume this means that
CMA is disabled.

^ permalink raw reply

* Re: Network namespace and bonding WARNING at fs/proc/generic.c remove_proc_entry
From: Eric W. Biederman @ 2012-07-06  0:41 UTC (permalink / raw)
  To: Serge E. Hallyn; +Cc: Dilip Daya, linux-kernel, containers, netdev
In-Reply-To: <20120705220749.GA11255@mail.hallyn.com>

"Serge E. Hallyn" <serge@hallyn.com> writes:

> Quoting Dilip Daya (dilip.daya@hp.com):
>> Hi,
>> 
>> I'd discussed the following with Serge Hallyn.
>> 
>> => Environment based on 3.2.18 / x86_64 kernel.
>> => WARNING: at fs/proc/generic.c:808 remove_proc_entry+0xdb/0x21f()
>> => WARNING: at fs/proc/generic.c:849 remove_proc_entry+0x208/0x21f()
>
> Hi,
>
> thanks much for sending this.  I'm still getting this error on
> 3.5.0-2-generic (today's ubuntu quantal kernel)
>
>> network namespace and bonding
>> -----------------------------
>> 
>> * Migrate two phy nics from host to netns (netns0).
>>   - ip link set ethX netns netns0
>> 
>> * In host environment:
>>   - load bonding module, /sbin/modprobe -v bonding mode=1 miimon=100
>>   - /sys/class/net/bond0 exists.
>>   - /proc/net/bonding/bond0 exists.
>>   - /sys/class/net/bonding_masters has bond0.
>> 
>> * Migrate bond0 to netns (netns0):
>>   - ip link set bond0 netns netns0.
>> 
>> * Within netns (netns0):
>>   - /sys/class/net/bonding_masters is empty.
>>   - /sys/class/net/bond0 exists.
>>   - configure bond0 and ifenslave with two phy nics.
>>   - /proc/net/bonding/bond0 does not exist within netns0, but does
>>     exist in the host environment.
>>   - /sys/class/net/bonding_masters is empty.
>
> mine is not empty, fwiw.  However
>
>>   - ping to remote end of bond0 works.
>> 
>> * Within netns (netns0), flushing ethX and bondY:
>>   - down bond0 and its phy nic interfaces:
>>   - ip link set ... down
>>   - ip addr flush dev [bond0 | eth#]
>>   - deleting bond0, /sbin/ip link del dev bond0
>
> Yup I still get a remove_proc_entry WARNING at fs/proc/generic.c:808,
> which is the warning when (!de)

It looks like Dilip is running an old kernel.  There should have been
some version of /sys/class/net/bonding_masters in every network
namespace since sometime in 2009.

From the warning it looks like the proc files are being added/removed
to the wrong network namespace.  So in one namespace we get an error
when we delete the moved device and in the other network namespace
we get an error when we remove the /proc directory.

An old kernel without proper network namespace support is the only
reason I can imagine someone would be moving an existing bond device
between network namespaces.

If there are other reasons for wanting to move a bonding device between
network namespaces it is possible to catch the NETDEV_UNREGISTER and
NETDEV_REGISTER events to remove/add the per device proc files at the
appropriate time.
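
As a rough illustration of that alternative, here is a user-space model
of the bookkeeping. It is a sketch under stated assumptions: struct
bond_model, bond_netdev_event() and bond_move() are hypothetical names,
the EV_* values stand in for NETDEV_REGISTER and NETDEV_UNREGISTER, and
the model assumes a namespace move delivers an unregister event in the
old namespace followed by a register event in the new one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for NETDEV_REGISTER / NETDEV_UNREGISTER. */
enum netdev_event { EV_REGISTER, EV_UNREGISTER };

#define MAX_NS 4	/* arbitrary number of modelled namespaces */

struct bond_model {
	int ns;				/* namespace the device currently lives in */
	bool proc_entry[MAX_NS];	/* does namespace i hold a proc file for it? */
};

/* Add or remove the per-device proc file in the device's current namespace. */
void bond_netdev_event(struct bond_model *bond, enum netdev_event ev)
{
	assert(bond != NULL && bond->ns >= 0 && bond->ns < MAX_NS);

	switch (ev) {
	case EV_REGISTER:
		bond->proc_entry[bond->ns] = true;
		break;
	case EV_UNREGISTER:
		bond->proc_entry[bond->ns] = false;
		break;
	}
}

/* Model of a namespace move: leave the old namespace, then join the new one. */
void bond_move(struct bond_model *bond, int new_ns)
{
	assert(new_ns >= 0 && new_ns < MAX_NS);

	bond_netdev_event(bond, EV_UNREGISTER);
	bond->ns = new_ns;
	bond_netdev_event(bond, EV_REGISTER);
}
```

With the proc file tied to these two events, it always sits in the
namespace the device is in, so the later delete finds the entry it
expects to remove.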

However, since moving bonding devices appears to be an unneeded operation,
let's just do things simply and forbid moving bonding devices between
network namespaces.  Serge, Dilip, can you two test the patch below
and see if it fixes the warnings?

Eric


diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 2ee8cf9..818ed64 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -4345,6 +4345,9 @@ static void bond_setup(struct net_device *bond_dev)
        bond_dev->priv_flags |= IFF_BONDING;
        bond_dev->priv_flags &= ~(IFF_XMIT_DST_RELEASE | IFF_TX_SKB_SHARING);
 
+       /* Don't allow bond devices to change network namespaces. */
+       bond_dev->features |= NETIF_F_NETNS_LOCAL;
+
        /* At first, we block adding VLANs. That's the only way to
         * prevent problems that occur when adding VLANs over an
         * empty bond. The block will be removed once non-challenged

^ permalink raw reply related

