Netdev List
* [RFC 0/1] netfilter: xtables: xt_condition inclusion with namespace fix
From: Luciano Coelho @ 2010-07-22 14:09 UTC (permalink / raw)
  To: netfilter-devel; +Cc: netdev, kaber, jengelh, sameo

Hi,

This is a respin of the patch Jan sent to the list some time ago.  I've made
the changes proposed by Patrick in order to support multiple namespaces
correctly.

I still need to reapply my condition target and the u32 changes to the
condition on top of this, but I'd like to get some comments before I continue.

Please let me know how this looks.

Cheers,
Luca.

Luciano Coelho (1):
  netfilter: xtables: inclusion of xt_condition

 include/linux/netfilter/Kbuild         |    1 +
 include/linux/netfilter/xt_condition.h |   14 ++
 net/netfilter/Kconfig                  |    8 +
 net/netfilter/Makefile                 |    1 +
 net/netfilter/xt_condition.c           |  294 ++++++++++++++++++++++++++++++++
 5 files changed, 318 insertions(+), 0 deletions(-)
 create mode 100644 include/linux/netfilter/xt_condition.h
 create mode 100644 net/netfilter/xt_condition.c


^ permalink raw reply

* Re: [PATCH net-next] sysfs: add attribute to indicate hw address assignment type
From: Ben Hutchings @ 2010-07-22 14:07 UTC (permalink / raw)
  To: Stefan Assmann
  Cc: David Miller, abadea, netdev, linux-kernel, gospo, gregory.v.rose,
	alexander.h.duyck, leedom, harald
In-Reply-To: <4C483E8D.4080300@redhat.com>

On Thu, 2010-07-22 at 14:50 +0200, Stefan Assmann wrote:
> On 21.07.2010 15:54, Ben Hutchings wrote:
> > On Wed, 2010-07-21 at 10:10 +0200, Stefan Assmann wrote:
> >> I put Alex' idea into code for further discussion, keeping the names
> >> mentioned here until we agree on the scope of this attribute. When we
> >> have settled I'll post a patch with proper patch description.
> > [...]
> > 
> > Just a little nitpick: I think it would be clearer to use a more
> > specific term like 'address source' or 'address assignment type' rather
> > than 'address type'.
> 
> Here's a proposal for the final patch.

Looks good, but...

[...]
>  /**
> + * dev_hw_addr_random - Create random MAC and set device flag
> + * @dev: pointer to net_device structure
> + * @hwaddr: Pointer to a six-byte array containing the Ethernet address
> + *
> + * Generate random MAC to be used by a device and set addr_assign_type
> + * so the state can be read by sysfs and be used by udev.
> + */
> +static inline void dev_hw_addr_random(struct net_device *dev, u8 *hwaddr)
> +{
> +	dev->addr_assign_type |= NET_ADDR_RANDOM;
> +	random_ether_addr(hwaddr);
> +}
[...]

...why '|=' and not '='?
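
To illustrate the concern, here is a minimal userspace sketch (not kernel code; struct fake_netdev is a hypothetical stand-in for the one net_device field involved). The NET_ADDR_* values in the patch are 0, 1 and 2 rather than distinct bits, so OR-ing NET_ADDR_RANDOM into a field that already holds NET_ADDR_STOLEN yields 3, which matches none of the defined types, while plain assignment always leaves a defined value.

```c
/* Values copied from the quoted patch: plain enumerators, not bit flags. */
#define NET_ADDR_PERM   0
#define NET_ADDR_RANDOM 1
#define NET_ADDR_STOLEN 2

/* Hypothetical stand-in for the single net_device field that matters here. */
struct fake_netdev {
	unsigned char addr_assign_type;
};

/* What the patch does: OR the new type into whatever was there before. */
void mark_random_or(struct fake_netdev *dev)
{
	dev->addr_assign_type |= NET_ADDR_RANDOM;
}

/* What the question suggests: overwrite the previous type. */
void mark_random_assign(struct fake_netdev *dev)
{
	dev->addr_assign_type = NET_ADDR_RANDOM;
}
```

Starting from NET_ADDR_PERM (0) both variants give the same result; they only differ when the field was already set to something else.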

Ben.

-- 
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply

* [PATCH nf-next-2.6] netfilter: add xt_cpu match
From: Eric Dumazet @ 2010-07-22 14:03 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: Netfilter Development Mailinglist, netdev

This match is a bit strange, being packet content agnostic...

Still, in some situations a CPU match permits a better spreading of
connections, or selecting targets only for a given cpu.

With Receive Packet Steering (RPS) or a multiqueue NIC and appropriate IRQ
affinities, we can distribute traffic on available cpus, per session.
(all RX packets for a given flow are handled by a given cpu)

Some legacy applications being not SMP friendly, one way to scale a
server is to run multiple copies of them.

Instead of randomly choosing an instance, we can use the cpu number as a
key so that softirq handler for a whole instance is running on a single
cpu, maximizing cache effects in TCP/UDP stacks.

Using NAT for example, a four-way machine might run four copies of
the server application, using a separate listening port for each instance,
but still presenting a unique external port:

iptables -t nat -A PREROUTING -p tcp --dport 80 -m cpu --cpu 0 \
	-j REDIRECT --to-port 8080

iptables -t nat -A PREROUTING -p tcp --dport 80 -m cpu --cpu 1 \
	-j REDIRECT --to-port 8081

iptables -t nat -A PREROUTING -p tcp --dport 80 -m cpu --cpu 2 \
	-j REDIRECT --to-port 8082

iptables -t nat -A PREROUTING -p tcp --dport 80 -m cpu --cpu 3 \
	-j REDIRECT --to-port 8083


Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 include/linux/netfilter/Kbuild   |    1 
 include/linux/netfilter/xt_cpu.h |    8 +++
 net/netfilter/Kconfig            |    9 ++++
 net/netfilter/Makefile           |    1 
 net/netfilter/xt_cpu.c           |   65 +++++++++++++++++++++++++++++
 5 files changed, 84 insertions(+)

diff --git a/include/linux/netfilter/Kbuild b/include/linux/netfilter/Kbuild
index bb103f4..5c39a56 100644
--- a/include/linux/netfilter/Kbuild
+++ b/include/linux/netfilter/Kbuild
@@ -34,6 +34,7 @@ header-y += xt_helper.h
 header-y += xt_length.h
 header-y += xt_limit.h
 header-y += xt_mac.h
+header-y += xt_cpu.h
 header-y += xt_mark.h
 header-y += xt_multiport.h
 header-y += xt_osf.h
diff --git a/include/linux/netfilter/xt_cpu.h b/include/linux/netfilter/xt_cpu.h
index e69de29..fdf4202 100644
--- a/include/linux/netfilter/xt_cpu.h
+++ b/include/linux/netfilter/xt_cpu.h
@@ -0,0 +1,8 @@
+#ifndef _XT_CPU_H
+#define _XT_CPU_H
+
+struct xt_cpu_info {
+	unsigned int	cpu;
+	int		invert;
+};
+#endif /* _XT_CPU_H */
diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig
index aa2f106..85b07bd 100644
--- a/net/netfilter/Kconfig
+++ b/net/netfilter/Kconfig
@@ -754,6 +754,15 @@ config NETFILTER_XT_MATCH_MAC
 
 	  To compile it as a module, choose M here.  If unsure, say N.
 
+config NETFILTER_XT_MATCH_CPU
+	tristate '"cpu" match support'
+	depends on NETFILTER_ADVANCED
+	help
+	  CPU matching allows you to match packets based on the CPU
+	  currently handling the packet.
+
+	  To compile it as a module, choose M here.  If unsure, say N.
+
 config NETFILTER_XT_MATCH_MARK
 	tristate '"mark" match support'
 	depends on NETFILTER_ADVANCED
diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile
index e28420a..0fe7efd 100644
--- a/net/netfilter/Makefile
+++ b/net/netfilter/Makefile
@@ -79,6 +79,7 @@ obj-$(CONFIG_NETFILTER_XT_MATCH_IPRANGE) += xt_iprange.o
 obj-$(CONFIG_NETFILTER_XT_MATCH_LENGTH) += xt_length.o
 obj-$(CONFIG_NETFILTER_XT_MATCH_LIMIT) += xt_limit.o
 obj-$(CONFIG_NETFILTER_XT_MATCH_MAC) += xt_mac.o
+obj-$(CONFIG_NETFILTER_XT_MATCH_CPU) += xt_cpu.o
 obj-$(CONFIG_NETFILTER_XT_MATCH_MULTIPORT) += xt_multiport.o
 obj-$(CONFIG_NETFILTER_XT_MATCH_OSF) += xt_osf.o
 obj-$(CONFIG_NETFILTER_XT_MATCH_OWNER) += xt_owner.o
diff --git a/net/netfilter/xt_cpu.c b/net/netfilter/xt_cpu.c
index e69de29..23d5a76 100644
--- a/net/netfilter/xt_cpu.c
+++ b/net/netfilter/xt_cpu.c
@@ -0,0 +1,65 @@
+/* Kernel module to match running CPU */
+
+/*
+ * Might be used to distribute connections on several daemons, if
+ * RPS (Receive Packet Steering) is enabled or NIC is multiqueue capable,
+ * each RX queue IRQ affined to one CPU (1:1 mapping)
+ *
+ * iptables -t nat -A PREROUTING -p tcp --dport 80 -m cpu --cpu 0 -j REDIRECT --to-port 8080
+ * iptables -t nat -A PREROUTING -p tcp --dport 80 -m cpu --cpu 1 -j REDIRECT --to-port 8081
+ * iptables -t nat -A PREROUTING -p tcp --dport 80 -m cpu --cpu 2 -j REDIRECT --to-port 8082
+ * iptables -t nat -A PREROUTING -p tcp --dport 80 -m cpu --cpu 3 -j REDIRECT --to-port 8083
+ *
+ */
+
+/* (C) 2010 Eric Dumazet
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/module.h>
+#include <linux/skbuff.h>
+#include <linux/netfilter/xt_cpu.h>
+#include <linux/netfilter/x_tables.h>
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Eric Dumazet <eric.dumazet@gmail.com>");
+MODULE_DESCRIPTION("Xtables: CPU match");
+
+/*
+ * Yes, packet content is not interesting for us, we only take care
+ * of cpu handling this packet
+ */
+static bool cpu_mt(const struct sk_buff *skb, struct xt_action_param *par)
+{
+	const struct xt_cpu_info *info = par->matchinfo;
+	bool ret;
+
+	ret = info->cpu == smp_processor_id();
+	ret ^= info->invert;
+	return ret;
+}
+
+static struct xt_match cpu_mt_reg __read_mostly = {
+	.name      = "cpu",
+	.revision  = 0,
+	.family    = NFPROTO_UNSPEC,
+	.match     = cpu_mt,
+	.matchsize = sizeof(struct xt_cpu_info),
+	.me        = THIS_MODULE,
+};
+
+static int __init cpu_mt_init(void)
+{
+	return xt_register_match(&cpu_mt_reg);
+}
+
+static void __exit cpu_mt_exit(void)
+{
+	xt_unregister_match(&cpu_mt_reg);
+}
+
+module_init(cpu_mt_init);
+module_exit(cpu_mt_exit);
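
For readers without a kernel tree at hand, the match predicate in cpu_mt() can be modelled in userspace. This is only a sketch under assumptions: struct cpu_rule and cpu_rule_matches() are hypothetical names, current_cpu stands in for smp_processor_id(), and, unlike the patch, invert is normalised to 0/1 so that any non-zero value flips the result.

```c
#include <stdbool.h>

/* Mirrors the layout of struct xt_cpu_info from the patch. */
struct cpu_rule {
	unsigned int cpu;
	int invert;
};

/*
 * Userspace model of cpu_mt(): current_cpu replaces smp_processor_id(),
 * which is only available inside the kernel.
 */
bool cpu_rule_matches(const struct cpu_rule *rule, unsigned int current_cpu)
{
	bool ret = rule->cpu == current_cpu;

	/* Normalise invert to 0/1 so any non-zero value flips the result. */
	return ret ^ (rule->invert != 0);
}
```

With invert left at 0 the rule matches only packets handled on the named CPU; with invert set it matches packets handled on every other CPU.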



^ permalink raw reply related

* Re: With disable_ipv6 set to 1 on an interface, ff00:/8 and fe80::/64 are still added on device UP
From: Mahesh Kelkar @ 2010-07-22 14:03 UTC (permalink / raw)
  To: Brian Haley; +Cc: netdev, David Miller
In-Reply-To: <20100720.134851.09735512.davem@davemloft.net>

Brian,

Overall the patch seems to work.

On one occasion I saw an error when it tried to take rtnl_trylock() in
"addrconf_disable_ipv6" in addrconf.c. I am investigating it. If
you can think of anything, please let me know.

I also came across another odd behavior (unrelated to disable_ipv6 but
related to multicast & link local route):
A. configure unicast Ipv6 address (say 123:2:3:4:5:6:7:8/64) on an
interface. (link-local will be assigned when interface comes up)
B. Bring the interface down (ip link set eth0 down),

you will get following set of netlink notifications (ip monitor all):
1. Deleted - unicast address connected route (123:2:3:4::/64)
2. Deleted - link local (fe80::/64) route
3. Deleted - multicast (ff00::/8) route
4. Deleted - unicast address (123:2:3:4:5:6:7:8/64)
5. Deleted - link local address

C. re-configure the unicast Ipv6 address (say 123:2:3:4:5:6:7:8/64) on
the interface. (link-local will NOT be assigned as interface is down)

You will see the following netlink notifications:
6. Added - unicast address (123:2:3:4:5:6:7:8/64)
7. Added - unicast address connected route (123:2:3:4::/64)
8. Added - multicast (ff00::/8) route
9. Added - link local (fe80::/64) route
etc.

I am not sure why #7, #8 & #9 occurred. It doesn't happen in the case of
IPv4. The routes show up when the interface reaches the up state. Perhaps my
kernel is old and that could be the reason for this behavior.

BTW I am using 2.6.21 with following cherry-picked disable_ipv6 patches:
- ipv6: Add disable_ipv6 sysctl to disable IPv6 operaion on specific
interface(commit:778d80be52699596bf70e0eb0761cf5e1e46088d)
- ipv6: Plug sk_buff leak in ipv6_rcv (net/ipv6/ip6_input.c) (commit:
71f6f6dfdf7c7a67462386d9ea05c1095a89c555)
- IPv6: Add 'autoconf' and 'disable_ipv6' module parameters (ONLY
interface specific behavior)

Thanks very much for your help.
Mahesh

On Tue, Jul 20, 2010 at 4:48 PM, David Miller <davem@davemloft.net> wrote:
> From: Brian Haley <brian.haley@hp.com>
> Date: Tue, 20 Jul 2010 16:34:30 -0400
>
>> I believe the easiest way to fix this is the following patch, can
>> you please test it?
>  ...
>> If the interface has IPv6 disabled, don't add a multicast or
>> link-local route since we won't be adding a link-local address.
>>
>> Reported-by: Mahesh Kelkar <maheshkelkar@gmail.com>
>> Signed-off-by: Brian Haley <brian.haley@hp.com>
>
> This looks good to me, let me know when it has been tested.
>

^ permalink raw reply

* Re: [PATCH] Driver-core: Fix bluetooth network device rename regression
From: Kay Sievers @ 2010-07-22 13:38 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Greg KH, Andrew Morton, Greg KH, Rafael J. Wysocki,
	Maciej W. Rozycki, Johannes Berg, netdev
In-Reply-To: <m1zkxj27xp.fsf_-_@fess.ebiederm.org>

On Thu, Jul 22, 2010 at 11:16, Eric W. Biederman <ebiederm@xmission.com> wrote:
>
> With CONFIG_SYSFS_DEPRECATED_V2 enabled I can rename any network device
> anything as long as the new name does not conflict with another network
> device.
>
> With CONFIG_SYSFS_DEPRECATED_V2 disabled without this fix bluetooth bnep
> devices, and the mac80211_hwsim driver can not be renamed to any arbitrary
> name that happens to conflict with any other name that is used in their
> parent devices directory.

This is true for all devices: their children cannot carry names
of existing attributes or directories of the parent. These drivers
manage the parent-child relation on their own and know these limitations
very well, because they have created the conflicting names themselves.
The class glue directories which separate these namespaces are there
to prevent unknown clashes, not clashes originating from the same
subsystem.

The real fix is that the drivers should not try to stack classes, but
use buses. Stacking is not and never was supported by the core, especially
not for clashing names.

> --- a/drivers/base/core.c
> +++ b/drivers/base/core.c
> @@ -673,7 +673,7 @@ static struct kobject *get_device_parent(struct device *dev,
>                 */
>                if (parent == NULL)
>                        parent_kobj = virtual_device_parent(dev);
> -               else if (parent->class)
> +               else if (parent->class && (strcmp(dev->class->name, "net") != 0))

Subsystem-specific code must not leak into core code. We will never
be able to get rid of these hacks. As mentioned in earlier mails, it's
just plain wrong to do anything like this. It makes a specific
subsystem behave differently from all others, just to make some broken
drivers work with the newly introduced network sysfs namespaces.

Since the issue is not a regression, and not even a bug in the core,
it should not be done this way for mainline.

Please try to fix these drivers instead, or mark them broken for
namespaces, if nobody can fix them right now.

Thanks,
Kay

^ permalink raw reply

* Re: Fwd: LVS on local node
From: Simon Horman @ 2010-07-22 13:25 UTC (permalink / raw)
  To: Franchoze Eric; +Cc: wensong, lvs-devel, netdev, netfilter-devel
In-Reply-To: <27901279770680@web67.yandex.ru>

On Thu, Jul 22, 2010 at 07:51:20AM +0400, Franchoze Eric wrote:
> Hello,
> 
> I'm trying to do load balancing of incoming traffic to my applications. These applications are not very SMP friendly, and I want to try running several instances, according to the number of cpus, on a single machine, and balance the load of incoming traffic/connections across them.
> It looks like it should be similar to http://www.austintek.com/LVS/LVS-HOWTO/HOWTO/LVS-HOWTO.localnode.html
> 
>  linux kernel 2.6.32 with  or without hide interface patches.  Tried different configurations but could not see packets on application layer.
> 
> 192.168.1.165 - eth0 - interface for external connections
> 195.0.0.1 - dummy0 - virtual interface, real application is binded to that address.
> 
> Configuration is:
> -A -t 192.168.1.165:1234 -s wlc
> -a -t 192.168.1.165:1234 -r 195.0.0.1:1234 -g -w
> 
> #ipvsadm -L -n
> IP Virtual Server version 1.2.1 (size=4096)
> Prot LocalAddress:Port Scheduler Flags
>   -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
> TCP  192.168.1.165:1234 wlc
>   -> 195.0.0.1:1234               Local   1      0          0        
> #
> 
> Log:
> [ 2106.897409] IPVS: lookup/out TCP 192.168.1.165:44847->192.168.1.165:1234 not hit
> [ 2106.897412] IPVS: lookup service: fwm 0 TCP 192.168.1.165:1234 hit
> [ 2106.897414] IPVS: ip_vs_wlc_schedule(): Scheduling...
> [ 2106.897416] IPVS: WLC: server 195.0.0.1:1234 activeconns 0 refcnt 2 weight 1 overhead 1
> [ 2106.897418] IPVS: Enter: ip_vs_conn_new, net/netfilter/ipvs/ip_vs_conn.c line 693
> [ 2106.897421] IPVS: Bind-dest TCP c:192.168.1.165:44847 v:192.168.1.165:1234 d:195.0.0.1:1234 fwd:L s:0 conn->flags:181 conn->refcnt:1 dest->refcnt:3
> [ 2106.897425] IPVS: Schedule fwd:L c:192.168.1.165:44847 v:192.168.1.165:1234 d:195.0.0.1:1234 conn->flags:1C1 conn->refcnt:2
> [ 2106.897429] IPVS: TCP input  [S...] 195.0.0.1:1234->192.168.1.165:44847 state: NONE->SYN_RECV conn->refcnt:2
> [ 2106.897431] IPVS: Enter: ip_vs_null_xmit, net/netfilter/ipvs/ip_vs_xmit.c line 212
> [ 2106.897439] IPVS: lookup/in TCP 192.168.1.165:1234->192.168.1.165:44847 not hit
> [ 2106.897441] IPVS: lookup/out TCP 192.168.1.165:1234->192.168.1.165:44847 not hit
> [ 2107.277535] IPVS: packet type=1 proto=17 daddr=255.255.255.255 ignored
> [ 2108.542691] IPVS: packet type=1 proto=17 daddr=192.168.1.255 ignored
> 
> As a result, the server application does not receive anything on accept(). I tried to make dummy0 a hidden device and play with arp settings, but without result.
> 
> I will be happy to hear any idea how to do connection in this environment.

Hi,

While others have suggested not using LVS for this task for various reasons,
I would just like to comment that this should work and this smells
like a bug to me. I will try to confirm that, but it won't be today.


^ permalink raw reply

* Re: Fwd: LVS on local node
From: Simon Horman @ 2010-07-22 13:20 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Franchoze Eric, wensong, lvs-devel, netdev, netfilter-devel
In-Reply-To: <1279803583.2467.43.camel@edumazet-laptop>

On Thu, Jul 22, 2010 at 02:59:43PM +0200, Eric Dumazet wrote:
> Le jeudi 22 juillet 2010 à 21:24 +0900, Simon Horman a écrit :
> > On Thu, Jul 22, 2010 at 08:56:51AM +0200, Eric Dumazet wrote:
> > 
> > [snip]
> > 
> > > lvs seems not very SMP friendly and a bit complex.
> > 
> > I'd be interested to hear some thoughts on
> > how the SMP aspect of that statement could
> > be improved.
> 
> Hi Simon
> 
> I am not familiar with LVS code, so I am probably wrong, but it seems it
> could be changed a bit.
> 
> Some rwlocks might become spinlocks (faster than rwlocks)
> 
> __ip_vs_securetcp_lock for example is always used with
> write_lock()/write_unlock().
> This can be a regular spinlock without even knowing the code.

I'll get that fixed.

> Some lookups could use RCU to avoid cache line misses, and to be able to
> use spinlocks for the write side.

Agreed. I took a look at RCUing things a while back, but got bogged
down and then forgot about it. I'll take another stab at it.

> It would be good to have a bench setup with the case of 16 legacy
> daemons, and check how many new connections per second can be
> established, in an LVS setup and an iptables based one.
> 
> With 2.6.35 and RPS, a REDIRECT based solution can choose the target port
> without taking any lock (not counting conntrack internal costs of
> course), each cpu accessing local memory only.
> 
> # Not needed if eth0 is a multiqueue NIC
> echo ffff >/sys/class/net/eth0/queues/rx-0/rps_cpus
> 
> for c in `seq 0 15`
> do
>   iptables -t nat -A PREROUTING -p tcp --dport 80 -m cpu --cpu $c \
>     -j REDIRECT --to-port $((1000 + $c))
> done

It's hard for LVS to compete with that kind of lightweight solution, and
it probably shouldn't. However, I'd just like to see LVS working as
well as it can within the constraint that, as you pointed out, it's rather
complex. Thanks for your suggestions.


^ permalink raw reply

* Re: [PATCH for-2.6.35] tun: avoid BUG, dump packet on GSO errors
From: Herbert Xu @ 2010-07-22 13:05 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: netdev
In-Reply-To: <20100721143245.GA8423@redhat.com>

Michael S. Tsirkin <mst@redhat.com> wrote:
> There are still some LRO cards that cause GSO errors in tun,
> and BUG on this is an unfriendly way to tell the admin
> to disable LRO.
> 
> Further, experience shows we might have more GSO bugs lurking.
> See https://bugzilla.kernel.org/show_bug.cgi?id=16413
> as a recent example.
> dumping a packet will make it easier to figure it out.
> 
> Replace BUG with warning+dump+drop the packet to make
> GSO errors in tun less critical and easier to debug.
> 
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> Tested-by: Alex Unigovsky <unik@compot.ru>

Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: Fwd: LVS on local node
From: Eric Dumazet @ 2010-07-22 12:59 UTC (permalink / raw)
  To: Simon Horman; +Cc: Franchoze Eric, wensong, lvs-devel, netdev, netfilter-devel
In-Reply-To: <20100722122422.GD16234@verge.net.au>

Le jeudi 22 juillet 2010 à 21:24 +0900, Simon Horman a écrit :
> On Thu, Jul 22, 2010 at 08:56:51AM +0200, Eric Dumazet wrote:
> 
> [snip]
> 
> > lvs seems not very SMP friendly and a bit complex.
> 
> I'd be interested to hear some thoughts on
> how the SMP aspect of that statement could
> be improved.

Hi Simon

I am not familiar with LVS code, so I am probably wrong, but it seems it
could be changed a bit.

Some rwlocks might become spinlocks (faster than rwlocks)

__ip_vs_securetcp_lock for example is always used with
write_lock()/write_unlock().
This can be a regular spinlock without even knowing the code.
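
As a userspace illustration only (C11 atomics, hypothetical names; in the kernel this would simply be a spinlock_t with spin_lock()/spin_unlock()): when every user of a lock is a writer, there is no reader/writer distinction left to track, so a single exclusive flag is enough.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for a kernel spinlock_t. */
struct tas_lock {
	atomic_flag held;
};

#define TAS_LOCK_INIT { ATOMIC_FLAG_INIT }

/* Try once; returns true if the lock was acquired. */
bool tas_trylock(struct tas_lock *l)
{
	return !atomic_flag_test_and_set_explicit(&l->held,
						   memory_order_acquire);
}

/* Spin until the lock is acquired. */
void tas_acquire(struct tas_lock *l)
{
	while (!tas_trylock(l))
		;
}

/* Release the lock so the next spinner can take it. */
void tas_release(struct tas_lock *l)
{
	atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```

The lock state is one flag, whereas a reader/writer lock also has to account for readers on every acquisition.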

Some lookups could use RCU to avoid cache line misses, and to be able to
use spinlocks for the write side.

It would be good to have a bench setup with the case of 16 legacy
daemons, and check how many new connections per second can be
established, in an LVS setup and an iptables based one.

With 2.6.35 and RPS, a REDIRECT based solution can choose the target port
without taking any lock (not counting conntrack internal costs of
course), each cpu accessing local memory only.

# Not needed if eth0 is a multiqueue NIC
echo ffff >/sys/class/net/eth0/queues/rx-0/rps_cpus

for c in `seq 0 15`
do
  iptables -t nat -A PREROUTING -p tcp --dport 80 -m cpu --cpu $c \
    -j REDIRECT --to-port $((1000 + $c))
done
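
The two constants in the commands above can be derived rather than hard-coded. A small sketch, with hypothetical helper names and limited to 64 CPUs for simplicity: the value written to rps_cpus is a hex bitmask with one bit per CPU allowed to process packets from that queue, and each CPU number maps to its own listening port.

```c
#include <assert.h>

/*
 * Bitmask with one bit set per CPU in [0, ncpus), i.e. the value that
 * would be written in hex to rps_cpus. Limited to 64 CPUs in this sketch.
 */
unsigned long long rps_mask(unsigned int ncpus)
{
	assert(ncpus <= 64);
	if (ncpus == 64)
		return ~0ULL;
	return (1ULL << ncpus) - 1;
}

/* Listening port the loop above assigns to the daemon for a given CPU. */
unsigned int redirect_port(unsigned int cpu)
{
	return 1000 + cpu;
}
```

For 16 CPUs the mask is 0xffff, matching the echo above, and CPU 15 is redirected to port 1015.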




^ permalink raw reply

* Re: [PATCH V4] CAN: Add Flexcan CAN controller driver
From: Marc Kleine-Budde @ 2010-07-22 12:57 UTC (permalink / raw)
  To: Wolfgang Grandegger
  Cc: socketcan-core-0fE9KPoRgkgATYTw5x5z8w,
	netdev-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <4C483B1A.2040703-5Yr1BZd7O62+XT7JhA+gdA@public.gmane.org>


Hey,

Wolfgang Grandegger wrote:
> On 07/21/2010 11:04 PM, Marc Kleine-Budde wrote:
>> This core is found on some Freescale SoCs and also some Coldfire
>> SoCs. Support for Coldfire is missing though at the moment as
>> they have an older revision of the core which does not have RX FIFO
>> support.
>>
>> Signed-off-by: Sascha Hauer <s.hauer-bIcnvbaLZ9MEGnE8C9+IrQ@public.gmane.org>
>> Signed-off-by: Marc Kleine-Budde <mkl-bIcnvbaLZ9MEGnE8C9+IrQ@public.gmane.org>
> 
> Acked-by: Wolfgang Grandegger <wg-5Yr1BZd7O62+XT7JhA+gdA@public.gmane.org>

David, please don't apply the patch yet. I just got the information that
there is a problem with "a" flexcan driver. I'm about to get more
information and investigate this.

cheers, Marc

-- 
Pengutronix e.K.                  | Marc Kleine-Budde           |
Industrial Linux Solutions        | Phone: +49-231-2826-924     |
Vertretung West/Dortmund          | Fax:   +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |



^ permalink raw reply

* [PATCH net-next] sysfs: add attribute to indicate hw address assignment type
From: Stefan Assmann @ 2010-07-22 12:50 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: David Miller, abadea, netdev, linux-kernel, gospo, gregory.v.rose,
	alexander.h.duyck, leedom, harald
In-Reply-To: <1279720478.2089.3.camel@achroite.uk.solarflarecom.com>

On 21.07.2010 15:54, Ben Hutchings wrote:
> On Wed, 2010-07-21 at 10:10 +0200, Stefan Assmann wrote:
>> I put Alex' idea into code for further discussion, keeping the names
>> mentioned here until we agree on the scope of this attribute. When we
>> have settled I'll post a patch with proper patch description.
> [...]
> 
> Just a little nitpick: I think it would be clearer to use a more
> specific term like 'address source' or 'address assignment type' rather
> than 'address type'.

Here's a proposal for the final patch.

  Stefan

From: Stefan Assmann <sassmann@redhat.com>

Add addr_assign_type to struct net_device and expose it via sysfs.
This new attribute has the purpose of giving user-space the ability to
distinguish between different assignment types of MAC addresses.

For example user-space can treat NICs with randomly generated MAC
addresses differently than NICs that have permanent (locally assigned)
MAC addresses.
For the former udev could write a persistent net rule by matching the
device path instead of the MAC address.
There's also the case of devices that 'steal' MAC addresses from slave
devices, in which case it is also beneficial for user-space to be aware
of the fact.

This patch also introduces a helper function to assist adoption of
drivers that generate MAC addresses randomly.

Signed-off-by: Stefan Assmann <sassmann@redhat.com>
---
 include/linux/etherdevice.h |   14 ++++++++++++++
 include/linux/netdevice.h   |    6 ++++++
 net/core/net-sysfs.c        |    2 ++
 3 files changed, 22 insertions(+), 0 deletions(-)

diff --git a/include/linux/etherdevice.h b/include/linux/etherdevice.h
index 3d7a668..848480b 100644
--- a/include/linux/etherdevice.h
+++ b/include/linux/etherdevice.h
@@ -127,6 +127,20 @@ static inline void random_ether_addr(u8 *addr)
 }

 /**
+ * dev_hw_addr_random - Create random MAC and set device flag
+ * @dev: pointer to net_device structure
+ * @hwaddr: Pointer to a six-byte array containing the Ethernet address
+ *
+ * Generate random MAC to be used by a device and set addr_assign_type
+ * so the state can be read by sysfs and be used by udev.
+ */
+static inline void dev_hw_addr_random(struct net_device *dev, u8 *hwaddr)
+{
+	dev->addr_assign_type |= NET_ADDR_RANDOM;
+	random_ether_addr(hwaddr);
+}
+
+/**
  * compare_ether_addr - Compare two Ethernet addresses
  * @addr1: Pointer to a six-byte array containing the Ethernet address
  * @addr2: Pointer other six-byte array containing the Ethernet address
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index b626289..1bca617 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -66,6 +66,11 @@ struct wireless_dev;
 #define HAVE_FREE_NETDEV		/* free_netdev() */
 #define HAVE_NETDEV_PRIV		/* netdev_priv() */

+/* hardware address assignment types */
+#define NET_ADDR_PERM		0	/* address is permanent (default) */
+#define NET_ADDR_RANDOM		1	/* address is generated randomly */
+#define NET_ADDR_STOLEN		2	/* address is stolen from other device */
+
 /* Backlog congestion levels */
 #define NET_RX_SUCCESS		0	/* keep 'em coming, baby */
 #define NET_RX_DROP		1	/* packet dropped */
@@ -919,6 +924,7 @@ struct net_device {

 	/* Interface address info. */
 	unsigned char		perm_addr[MAX_ADDR_LEN]; /* permanent hw address */
+	unsigned char		addr_assign_type; /* hw address assignment type */
 	unsigned char		addr_len;	/* hardware address length	*/
 	unsigned short          dev_id;		/* for shared network cards */

diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index d2b5965..af4dfba 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -95,6 +95,7 @@ static ssize_t netdev_store(struct device *dev, struct device_attribute *attr,
 }

 NETDEVICE_SHOW(dev_id, fmt_hex);
+NETDEVICE_SHOW(addr_assign_type, fmt_dec);
 NETDEVICE_SHOW(addr_len, fmt_dec);
 NETDEVICE_SHOW(iflink, fmt_dec);
 NETDEVICE_SHOW(ifindex, fmt_dec);
@@ -295,6 +296,7 @@ static ssize_t show_ifalias(struct device *dev,
 }

 static struct device_attribute net_class_attributes[] = {
+	__ATTR(addr_assign_type, S_IRUGO, show_addr_assign_type, NULL),
 	__ATTR(addr_len, S_IRUGO, show_addr_len, NULL),
 	__ATTR(dev_id, S_IRUGO, show_dev_id, NULL),
 	__ATTR(ifalias, S_IRUGO | S_IWUSR, show_ifalias, store_ifalias),
-- 
1.6.5.2
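
A rough sketch of how user-space might consume the new attribute, assuming it shows up as /sys/class/net/<iface>/addr_assign_type and contains a decimal number followed by a newline (as NETDEVICE_SHOW with fmt_dec suggests). The function names are hypothetical, and the policy in should_match_by_device_path() is only the udev example from the description above, not an existing udev rule.

```c
#include <stdlib.h>

/* Values from the patch. */
#define NET_ADDR_PERM   0
#define NET_ADDR_RANDOM 1
#define NET_ADDR_STOLEN 2

/*
 * Parse the text read from the sysfs attribute.
 * Returns the assignment type, or -1 if the text is not a number
 * that fits the unsigned char field.
 */
int parse_addr_assign_type(const char *sysfs_text)
{
	char *end;
	long val = strtol(sysfs_text, &end, 10);

	if (end == sysfs_text || val < 0 || val > 255)
		return -1;
	return (int)val;
}

/*
 * Policy from the patch description: a randomly generated MAC is not
 * stable, so a persistent rule should key on the device path instead.
 */
int should_match_by_device_path(const char *sysfs_text)
{
	return parse_addr_assign_type(sysfs_text) == NET_ADDR_RANDOM;
}
```

A caller would read the attribute file into a buffer (for example with fopen() and fgets()) and pass that buffer to these helpers.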

^ permalink raw reply related

* minstrel_tx_status mac80211 WARNINGs in vanilla 2.6.34.1
From: Sven Geggus @ 2010-07-22 12:30 UTC (permalink / raw)
  To: netdev

Hello,

running vanilla 2.6.34.1 I get the following warnings in the kernel log:

WARNING: at net/mac80211/rc80211_minstrel.c:70 minstrel_tx_status+0x67/0xd1 [mac80211]()
Hardware name: SCENIC E300/E600
Modules linked in: i915 drm_kms_helper drm video backlight output lp loop
snd_intel8x0 snd_ac97_codec ac97_bus snd_pcm_oss snd_mixer_oss snd_pcm
snd_seq_dummy zd1211rw snd_seq_oss usbhid mac80211 option cfg80211
snd_seq_midi usbserial snd_rawmidi snd_seq_midi_event snd_seq snd_timer
snd_seq_device snd parport_pc ehci_hcd uhci_hcd soundcore intel_agp parport
usbcore nls_base snd_page_alloc agpgart rng_core floppy sg
Pid: 0, comm: swapper Tainted: G        W  2.6.34.1 #3
Call Trace:
 [<c102af25>] warn_slowpath_common+0x60/0x90
 [<c102af62>] warn_slowpath_null+0xd/0x10
 [<f83cbd48>] minstrel_tx_status+0x67/0xd1 [mac80211]
 [<f83b6eb1>] ieee80211_tx_status+0x1f6/0x5ac [mac80211]
 [<c1261533>] ? skb_dequeue+0x45/0x4c
 [<f83b6896>] ieee80211_tasklet_handler+0x61/0xd6 [mac80211]
 [<c102ed7d>] tasklet_action+0x62/0x9f
 [<c102f331>] __do_softirq+0x77/0xe5
 [<c102f3c5>] do_softirq+0x26/0x2b
 [<c102f52f>] irq_exit+0x29/0x66
 [<c1003e90>] do_IRQ+0x85/0x9b
 [<c1002d29>] common_interrupt+0x29/0x30
 [<c10083ac>] ? default_idle+0x2d/0x42
 [<c1001a9b>] cpu_idle+0x44/0x71
 [<c12e00de>] rest_init+0x96/0x98
 [<c1498862>] start_kernel+0x2a5/0x2aa
 [<c14980b7>] i386_start_kernel+0xb7/0xbf
---[ end trace f22ceacef336878f ]---

Wireless driver is zd1211rw.

Did not test with an older kernel because this device has not been in use on
this machine before.

WLAN does however seem to work anyway.

Regards

Sven

-- 
The source code is not comprehensible
                 (found in bug section of man 8 telnetd on Redhat Linux)

/me is giggls@ircnet, http://sven.gegg.us/ on the Web

^ permalink raw reply

* Re: [PATCH] LSM: Add post recvmsg() hook.
From: Tetsuo Handa @ 2010-07-22 12:46 UTC (permalink / raw)
  To: davem
  Cc: kuznet, pekkas, jmorris, yoshfuji, kaber, paul.moore, netdev,
	linux-security-module
In-Reply-To: <20100721.220611.267376790.davem@davemloft.net>

David Miller wrote:
> > Then, why does below proposal lose information?
> 
> Peek changes state, now it's possible that two processes end up
> receiving the packet.

Indeed. We will need to protect sock->ops->recvmsg() call using a lock like

 static inline int __sock_recvmsg_nosec(struct kiocb *iocb, struct socket *sock,
 				       struct msghdr *msg, size_t size, int flags)
 {
+	int err;
 	struct sock_iocb *si = kiocb_to_siocb(iocb);
 
 	sock_update_classid(sock->sk);
 
 	si->sock = sock;
 	si->scm = NULL;
 	si->msg = msg;
 	si->size = size;
 	si->flags = flags;
 
-	return sock->ops->recvmsg(iocb, sock, msg, size, flags);
+	err = security_socket_read_lock(sock);
+	if (err)
+		return err;
+	err = sock->ops->recvmsg(iocb, sock, msg, size, flags);
+	security_socket_read_unlock(sock);
+	return err;
 }

in addition to security_socket_recvmsg_force_peek() and
security_socket_post_recvmsg().

But locks like the above break MSG_DONTWAIT, since recv() without MSG_DONTWAIT
calls wait_for_packet() inside __skb_recv_datagram().
To make MSG_DONTWAIT work, I have to do something like the following.

 struct sk_buff *__skb_recv_datagram(struct sock *sk, unsigned flags,
 				    int *peeked, int *err)
(...snipped...)
 	do {
 		/* Again only user level code calls this function, so nothing
 		 * interrupt level will suddenly eat the receive_queue.
 		 *
 		 * Look at current nfs client by the way...
 		 * However, this function was corrent in any case. 8)
 		 */
 		unsigned long cpu_flags;
 
+		/* < 0 if lock failed, 0 if no need to lock, > 0 if locked */
+		int serialized = security_socket_read_lock(sk);
+		if (serialized < 0) {
+			error = serialized;
+			goto no_packet;
+		} else if (serialized > 0) {
+			int err;
+			spin_lock_irqsave(&sk->sk_receive_queue.lock, cpu_flags);
+			skb = skb_peek(&sk->sk_receive_queue);
+			spin_unlock_irqrestore(&sk->sk_receive_queue.lock,
+					       cpu_flags);
+			if (!skb)
+				goto no_skb;
+			err = security_socket_pre_recvmsg(sk, skb);
+			if (err < 0) {
+				error = err;
+				security_socket_read_unlock(sk);
+				goto no_packet;
+			}
+		}
+
 		spin_lock_irqsave(&sk->sk_receive_queue.lock, cpu_flags);
 		skb = skb_peek(&sk->sk_receive_queue);
 		if (skb) {
 			*peeked = skb->peeked;
 			if (flags & MSG_PEEK) {
 				skb->peeked = 1;
 				atomic_inc(&skb->users);
 			} else
 				__skb_unlink(skb, &sk->sk_receive_queue);
 		}
 		spin_unlock_irqrestore(&sk->sk_receive_queue.lock, cpu_flags);
 
+no_skb:
+		if (serialized > 0)
+			security_socket_read_unlock(sk);
 		if (skb)
 			return skb;
 
 		/* User doesn't want to wait */
 		error = -EAGAIN;
 		if (!timeo)
 			goto no_packet;
 
 	} while (!wait_for_packet(sk, err, &timeo));
(...snipped...)
 }

Inserting LSM hooks as above will be the only way to make this work properly
(i.e. handle MSG_DONTWAIT, avoid showing the same message to multiple readers,
and keep the queue's state unchanged upon error).
But you said ( http://marc.info/?l=linux-netdev&m=124022463014713&w=2 )

> We worked so hard to split out this common code, it is simply
> a non-starter for anyone to start putting protocol specific test
> into here, or even worse to move this code back to being locally
> copied into every protocol implementation.

when I proposed inserting LSM hooks into __skb_recv_datagram()
( http://marc.info/?l=linux-netdev&m=124022463014672&w=2 ).
So, I have no way to perform permission checks based on the combination of
"process that issued the recv() request" and "source address/port of the
message that the process is about to pick up" without breaking things (unless
you accept inserting LSM hooks into __skb_recv_datagram())...
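
As a side illustration of the queue semantics discussed above (MSG_PEEK leaves
the datagram queued, so more than one reader can see it, while MSG_DONTWAIT
must return immediately on an empty queue), here is a minimal user-space
sketch. It assumes an AF_UNIX datagram socketpair as a stand-in for the sockets
in question; peek_then_read_demo() is a made-up name, and nothing here touches
the proposed LSM hooks.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Returns 0 if the observed behaviour matches the description:
 * two MSG_PEEK reads see the same datagram, a plain read consumes it,
 * and a following MSG_DONTWAIT read fails with EAGAIN/EWOULDBLOCK.
 * Returns -1 otherwise.
 */
int peek_then_read_demo(void)
{
	int sv[2];
	char a[16] = "", b[16] = "", c[16] = "", d[16];
	int ok = 0;

	if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) == -1)
		return -1;
	if (send(sv[0], "hello", 5, 0) != 5)
		goto out;

	/* Two peeks: neither one unlinks the datagram from the queue. */
	if (recv(sv[1], a, sizeof(a), MSG_PEEK) != 5 ||
	    recv(sv[1], b, sizeof(b), MSG_PEEK) != 5)
		goto out;

	/* A normal read removes it from the queue... */
	if (recv(sv[1], c, sizeof(c), 0) != 5)
		goto out;

	/* ...so a non-blocking read now finds the queue empty. */
	if (recv(sv[1], d, sizeof(d), MSG_DONTWAIT) != -1 ||
	    (errno != EAGAIN && errno != EWOULDBLOCK))
		goto out;

	ok = !memcmp(a, "hello", 5) && !memcmp(b, "hello", 5) &&
	     !memcmp(c, "hello", 5);
out:
	close(sv[0]);
	close(sv[1]);
	return ok ? 0 : -1;
}

/* Example check from any caller: assert(peek_then_read_demo() == 0); */
```

The two peeks stand in for two readers; with separate processes the effect is
the same, which is why a per-reader permission check at peek time alone cannot
decide who finally dequeues the message.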

^ permalink raw reply

* Re: [PATCH V4] CAN: Add Flexcan CAN controller driver
From: Wolfgang Grandegger @ 2010-07-22 12:35 UTC (permalink / raw)
  To: Marc Kleine-Budde
  Cc: socketcan-core-0fE9KPoRgkgATYTw5x5z8w,
	netdev-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1279746286-19736-1-git-send-email-mkl-bIcnvbaLZ9MEGnE8C9+IrQ@public.gmane.org>

On 07/21/2010 11:04 PM, Marc Kleine-Budde wrote:
> This core is found on some Freescale SoCs and also some Coldfire
> SoCs. Support for Coldfire is missing though at the moment as
> they have an older revision of the core which does not have RX FIFO
> support.
> 
> Signed-off-by: Sascha Hauer <s.hauer-bIcnvbaLZ9MEGnE8C9+IrQ@public.gmane.org>
> Signed-off-by: Marc Kleine-Budde <mkl-bIcnvbaLZ9MEGnE8C9+IrQ@public.gmane.org>

Acked-by: Wolfgang Grandegger <wg-5Yr1BZd7O62+XT7JhA+gdA@public.gmane.org>

Thanks for your contribution.

Wolfgang.

^ permalink raw reply

* Another oops + repost
From: Martín Ferrari @ 2010-07-22 12:24 UTC (permalink / raw)
  To: netdev
  Cc: Ben Hutchings, 577640, Eric W. Biederman, Alexey Dobriyan,
	Mathieu Lacage, Daniel Lezcano

Hi again,

First of all, I would like to know if anybody was able to fix this
problem that got kinda lost in the thread:

On Thu, Apr 22, 2010 at 16:05, Martín Ferrari <martin.ferrari@gmail.com> wrote:

> I have just downloaded and compiled 2.6.32-2 and 2.6.34-rc5 from
> kernel.org using the .config from the debian package, and the oops is
> reproducible in both.
>
> This small C file reproduces the error every time:
>
> $ cat netnsoops.c
> #define _GNU_SOURCE	/* must precede all includes so <sched.h> declares unshare() */
> #include <stdio.h>
> #include <stdlib.h>
> #include <sched.h>
>
> int main(int argc, char *argv[])
> {
>        int c;
>        unsigned long flags = CLONE_NEWNET;
>
>        if(unshare(flags) == -1) {
>                perror("unshare");
>                return 1;
>        }
>        system("ip link add name FOO type veth peer name BAR");
>        system("ip link set FOO netns 1");
>        system("ip link show");
>        return 0;
> }

Secondly, I discovered another related bug, just more subtle.

I'm creating a dummy device, moving it into a name space and then
taking it back to netns 1. Later, when I delete the dummy, I get an
oops:

[  610.540091] BUG: unable to handle kernel NULL pointer dereference
at 0000000000000028
[  610.541512] IP: [<ffffffff81138f71>] sysfs_find_dirent+0x9/0x2f
[  610.542369] PGD 3799d067 PUD 37bb3067 PMD 0
[  610.543370] Oops: 0000 [#1] SMP
[  610.544018] last sysfs file: /sys/devices/virtual/net/lo/operstate
[  610.544018] CPU 0
[  610.544018] Modules linked in: dummy loop parport_pc parport
snd_pcm snd_timer tpm_tis tpm tpm_bios snd soundcore snd_page_alloc
pcspkr psmouse serio_raw i2c_piix4 evdev i2c_core button processor
ext3 jbd mbcache ide_cd_mod cdrom ide_gd_mod ata_generic ata_piix
8139too libata scsi_mod floppy piix 8139cp mii thermal thermal_sys
ide_core [last unloaded: scsi_wait_scan]
[  610.544018]
[  610.544018] Pid: 1359, comm: ip Tainted: G        W  2.6.34-rc5 #1 /
[  610.544018] RIP: 0010:[<ffffffff81138f71>]  [<ffffffff81138f71>]
sysfs_find_dirent+0x9/0x2f
[  610.544018] RSP: 0018:ffff88003789f988  EFLAGS: 00010296
[  610.544018] RAX: ffff88003789e000 RBX: 0000000000000000 RCX: ffff88007d3d2cd8
[  610.544018] RDX: 0000000000000003 RSI: ffffffff814a6352 RDI: 0000000000000000
[  610.544018] RBP: ffffffff814a6352 R08: 000000037f80e800 R09: ffff88007d3d2ce8
[  610.544018] R10: ffff88007d3d2800 R11: 0000000000000006 R12: ffffffff814a6352
[  610.544018] R13: ffff88007d3d2c48 R14: 0000000000000000 R15: ffff88003789fbb8
[  610.544018] FS:  00007f22855f7700(0000) GS:ffff880001a00000(0000)
knlGS:0000000000000000
[  610.544018] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  610.544018] CR2: 0000000000000028 CR3: 000000007d4df000 CR4: 00000000000006f0
[  610.544018] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  610.544018] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  610.544018] Process ip (pid: 1359, threadinfo ffff88003789e000,
task ffff88007d2b69f0)
[  610.544018] Stack:
[  610.544018]  000000037f80e800 0000000000000000 0000000000000000
ffffffff81138fbb
[  610.544018] <0> 0000000000000296 ffff88007d3d2c38 ffffffff816698d0
ffffffff8113a980
[  610.544018] <0> ffff88007d3d2c38 ffff88007d3d2c38 ffff88007d3d2800
0000000000000000
[  610.544018] Call Trace:
[  610.544018]  [<ffffffff81138fbb>] ? sysfs_get_dirent+0x24/0x43
[  610.544018]  [<ffffffff8113a980>] ? sysfs_remove_group+0x24/0xcf
[  610.544018]  [<ffffffff8120f776>] ? device_del+0x3b/0x1a0
[  610.544018]  [<ffffffff81244139>] ? rollback_registered_many+0x15d/0x1c8
[  610.544018]  [<ffffffff8124d81e>] ? rtnetlink_rcv_msg+0x0/0x1f5
[  610.544018]  [<ffffffff81244273>] ? unregister_netdevice_queue+0x78/0xa9
[  610.544018]  [<ffffffff8124c22b>] ? rtnl_dellink+0xb7/0xdb
[  610.544018]  [<ffffffff8125e887>] ? netlink_rcv_skb+0x34/0x7c
[  610.544018]  [<ffffffff8124d818>] ? rtnetlink_rcv+0x1f/0x25
[  610.544018]  [<ffffffff8125e67b>] ? netlink_unicast+0xe2/0x148
[  610.544018]  [<ffffffff8125edaf>] ? netlink_sendmsg+0x23f/0x252
[  610.544018]  [<ffffffff8123388d>] ? sock_sendmsg+0x83/0x9b
[  610.544018]  [<ffffffff810b3e0d>] ? __alloc_pages_nodemask+0x10f/0x5e2
[  610.544018]  [<ffffffff8123c76f>] ? copy_from_user+0x13/0x25
[  610.544018]  [<ffffffff8123cb25>] ? verify_iovec+0x49/0x84
[  610.544018]  [<ffffffff81233b62>] ? sys_sendmsg+0x225/0x2af
[  610.544018]  [<ffffffff81234e17>] ? sys_recvmsg+0x48/0x56
[  610.544018]  [<ffffffff81008ac2>] ? system_call_fastpath+0x16/0x1b
[  610.544018] Code: fb 74 1a 8b 07 85 c0 75 11 be 9d 00 00 00 48 c7
c7 da 56 4b 81 e8 2b cf f0 ff f0 ff 03 48 89 d8 5b c3 55 48 89 f5 53
48 83 ec 08 <48> 8b 5f 28 eb 14 48 8b 7b 18 48 89 ee e8 46 a4 04 00 85
c0 74
[  610.544018] RIP  [<ffffffff81138f71>] sysfs_find_dirent+0x9/0x2f
[  610.544018]  RSP <ffff88003789f988>
[  610.544018] CR2: 0000000000000028
[  610.597008] ---[ end trace f92104e98ea87a47 ]---
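
The register dump above is consistent with a member load through a NULL
pointer: RDI is 0 and the faulting address in CR2 is 0x28. A minimal sketch of
that arithmetic follows; struct fake_dirent is a hypothetical layout chosen
only so that one member sits at offset 0x28, it is not the real sysfs
structure, and member_address() is a made-up helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout, NOT the real struct sysfs_dirent. */
struct fake_dirent {
	uint64_t other_fields[5];	/* 40 bytes of preceding members */
	struct fake_dirent *children;	/* ends up at offset 0x28 */
};

/* Address the CPU would try to read for base->children. */
uintptr_t member_address(uintptr_t base)
{
	return base + offsetof(struct fake_dirent, children);
}

/* Example check from any caller: assert(member_address(0) == 0x28); */
```

With a NULL base the computed address is just the member offset, which is why
such oopses report small non-zero addresses rather than 0.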


To reproduce:

$ cat netnsoops2.c
#define _GNU_SOURCE	/* must precede all includes so <sched.h> declares unshare() */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>
#include <sched.h>

int main(int argc, char *argv[])
{
       int pipefd[2];
       pid_t pid;
       unsigned long flags = CLONE_NEWNET;

       if(system("ip link add name FOO type dummy"))
	       return 1;
       if(pipe(pipefd) == -1) {
               perror("pipe");
               return 1;
       }
       pid = fork();
       if(pid == -1) {
               perror("fork");
               return 1;
       }
       if(pid) {
	       char buf[256];
	       read(pipefd[0], buf, 1);
	       snprintf(buf, sizeof(buf), "ip link set FOO netns %d", pid);
	       system(buf);
       } else {
	       if(unshare(flags) == -1) {
		       perror("unshare");
		       return 1;
	       }
	       write(pipefd[1], "a", 1);
	       system("ip link set FOO netns 1");
	       return 0;
       }
       waitpid(pid, NULL, 0);
       system("ip link show");
       system("ip link del FOO");
       return 0;
}


-- 
Martín Ferrari

^ permalink raw reply

* Re: Fwd: LVS on local node
From: Simon Horman @ 2010-07-22 12:24 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Franchoze Eric, wensong, lvs-devel, netdev, netfilter-devel
In-Reply-To: <1279781811.2405.15.camel@edumazet-laptop>

On Thu, Jul 22, 2010 at 08:56:51AM +0200, Eric Dumazet wrote:

[snip]

> lvs seems not very SMP friendly and a bit complex.

I'd be interested to hear some thoughts on
how the SMP aspect of that statement could
be improved.

^ permalink raw reply

* Re: [PATCH] sysfs: Don't allow the creation of symlinks we can't remove
From: Johannes Berg @ 2010-07-22 11:30 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Greg KH, Andrew Morton, Rafael J. Wysocki, Maciej W. Rozycki,
	Kay Sievers, Greg KH, netdev
In-Reply-To: <m1sk3bycxl.fsf@fess.ebiederm.org>

On Thu, 2010-07-22 at 04:27 -0700, Eric W. Biederman wrote:

> >> Do we have a convenient command line tool to do this?
> >> I remember there being a different netlink message from
> >> normal network devices.
> >
> > iw phy0 set netns <pid>
> >
> > http://git.sipsolutions.net/iw.git
> >
> >> > root@kvm:~# ip link
> >> > 3: wlan0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
> >> >     link/ether 02:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff
> >> > 7: lo: <LOOPBACK> mtu 16436 qdisc noop state DOWN 
> >> >     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
> >> > root@kvm:~# ls /sys/class/net/
> >> > eth0  hwsim0  lo  wlan1  wlan2
> >> 
> >> I think this is actually the output of something working.
> >> 
> >> I expect after you created a new netns you didn't mount
> >> a new instance of /sys.  /sys remembers which netns you
> >> had when you mounted it.  So you have to mount /sys again
> >> so you can see the /sys/class/net for the network namespace
> >> you are in.
> >
> > Ohh, oops! I saw all the "current->" references in the code and somehow
> > expected the same instance of sysfs to show the right thing.
> >
> > Yes, it works now. But the patch below doesn't seem to work, am I
> > missing something?
> 
> You are trying to move the phy devices as well?

Yes. The intent is that each wireless phy lives in a netns along with
all of its child devices.

> My guess is that at least part of the problem is that you don't have a
> ieee80211 directory under hwsim.

But I should have? 'ieee80211' is a class just like 'net', no?

> My apologies for not thinking about the peculiarities of the wireless
> drivers.

No worries.

johannes


^ permalink raw reply

* Re: [PATCH] sysfs: Don't allow the creation of symlinks we can't remove
From: Eric W. Biederman @ 2010-07-22 11:27 UTC (permalink / raw)
  To: Johannes Berg
  Cc: Greg KH, Andrew Morton, Rafael J. Wysocki, Maciej W. Rozycki,
	Kay Sievers, Greg KH, netdev
In-Reply-To: <1279795286.12439.8.camel@jlt3.sipsolutions.net>

Johannes Berg <johannes@sipsolutions.net> writes:

> On Thu, 2010-07-22 at 03:35 -0700, Eric W. Biederman wrote:
>
>> >> The warning patch just makes things fail faster.  Although I get some of the
>> >> wireless interfaces for hwsim when I use this one.
>> >
>> > Hmm, I didn't.
>> 
>> To be clear I just get hwsim0.  Not wlan0 or wlan1.
>
> Ah, yes, but that's just a regular netdev, you can pretty much ignore
> it. It just shows all hwsim traffic as it is on the "air" for sniffing.
>
>> > Right, it actually starts working again with that patch you sent.
>> > However, netns support is really broken:
>> >
>> > <create net namespace, put phy0/wlan0 into it>
>> 
>> Do we have a convenient command line tool to do this?
>> I remember there being a different netlink message from
>> normal network devices.
>
> iw phy0 set netns <pid>
>
> http://git.sipsolutions.net/iw.git
>
>> > root@kvm:~# ip link
>> > 3: wlan0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
>> >     link/ether 02:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff
>> > 7: lo: <LOOPBACK> mtu 16436 qdisc noop state DOWN 
>> >     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
>> > root@kvm:~# ls /sys/class/net/
>> > eth0  hwsim0  lo  wlan1  wlan2
>> 
>> I think this is actually the output of something working.
>> 
>> I expect after you created a new netns you didn't mount
>> a new instance of /sys.  /sys remembers which netns you
>> had when you mounted it.  So you have to mount /sys again
>> so you can see the /sys/class/net for the network namespace
>> you are in.
>
> Ohh, oops! I saw all the "current->" references in the code and somehow
> expected the same instance of sysfs to show the right thing.
>
> Yes, it works now. But the patch below doesn't seem to work, am I
> missing something?

You are trying to move the phy devices as well?

My guess is that at least part of the problem is that you don't have a
ieee80211 directory under hwsim.

My apologies for not thinking about the peculiarities of the wireless
drivers.

Eric


> ---
>  include/linux/netdevice.h |    2 ++
>  net/core/net-sysfs.c      |    3 ++-
>  net/wireless/sysfs.c      |    9 +++++++++
>  3 files changed, 13 insertions(+), 1 deletion(-)
>
> --- wireless-testing.orig/include/linux/netdevice.h	2010-07-22 10:01:22.000000000 +0200
> +++ wireless-testing/include/linux/netdevice.h	2010-07-22 10:11:00.000000000 +0200
> @@ -2148,6 +2148,8 @@ extern void dev_seq_stop(struct seq_file
>  extern int netdev_class_create_file(struct class_attribute *class_attr);
>  extern void netdev_class_remove_file(struct class_attribute *class_attr);
>  
> +extern struct kobj_ns_type_operations net_ns_type_operations;
> +
>  extern char *netdev_drivername(const struct net_device *dev, char *buffer, int len);
>  
>  extern void linkwatch_run_queue(void);
> --- wireless-testing.orig/net/core/net-sysfs.c	2010-07-22 10:01:22.000000000 +0200
> +++ wireless-testing/net/core/net-sysfs.c	2010-07-22 10:11:51.000000000 +0200
> @@ -785,12 +785,13 @@ static const void *net_netlink_ns(struct
>  	return sock_net(sk);
>  }
>  
> -static struct kobj_ns_type_operations net_ns_type_operations = {
> +struct kobj_ns_type_operations net_ns_type_operations = {
>  	.type = KOBJ_NS_TYPE_NET,
>  	.current_ns = net_current_ns,
>  	.netlink_ns = net_netlink_ns,
>  	.initial_ns = net_initial_ns,
>  };
> +EXPORT_SYMBOL_GPL(net_ns_type_operations);
>  
>  static void net_kobj_ns_exit(struct net *net)
>  {
> --- wireless-testing.orig/net/wireless/sysfs.c	2010-07-22 10:01:22.000000000 +0200
> +++ wireless-testing/net/wireless/sysfs.c	2010-07-22 10:13:08.000000000 +0200
> @@ -110,6 +110,13 @@ static int wiphy_resume(struct device *d
>  	return ret;
>  }
>  
> +static const void *wiphy_namespace(struct device *d)
> +{
> +	struct wiphy *wiphy = container_of(d, struct wiphy, dev);
> +
> +	return wiphy_net(wiphy);
> +}
> +
>  struct class ieee80211_class = {
>  	.name = "ieee80211",
>  	.owner = THIS_MODULE,
> @@ -120,6 +127,8 @@ struct class ieee80211_class = {
>  #endif
>  	.suspend = wiphy_suspend,
>  	.resume = wiphy_resume,
> +	.ns_type = &net_ns_type_operations,
> +	.namespace = wiphy_namespace,
>  };
>  
>  int wiphy_sysfs_init(void)

^ permalink raw reply

* [patch -next] stmmac: handle allocation errors in setup functions
From: Dan Carpenter @ 2010-07-22 11:16 UTC (permalink / raw)
  To: Giuseppe Cavallaro; +Cc: David S. Miller, netdev, kernel-janitors

If the allocations fail in either dwmac1000_setup() or dwmac100_setup()
then return NULL.  These are called from stmmac_mac_device_setup().  The 
check for NULL returns in stmmac_mac_device_setup() needed to be moved 
forward a couple lines.

Signed-off-by: Dan Carpenter <error27@gmail.com>

diff --git a/drivers/net/stmmac/dwmac1000_core.c b/drivers/net/stmmac/dwmac1000_core.c
index 917b4e1..2b2f5c8 100644
--- a/drivers/net/stmmac/dwmac1000_core.c
+++ b/drivers/net/stmmac/dwmac1000_core.c
@@ -220,6 +220,8 @@ struct mac_device_info *dwmac1000_setup(unsigned long ioaddr)
 		((uid & 0x0000ff00) >> 8), (uid & 0x000000ff));
 
 	mac = kzalloc(sizeof(const struct mac_device_info), GFP_KERNEL);
+	if (!mac)
+		return NULL;
 
 	mac->mac = &dwmac1000_ops;
 	mac->dma = &dwmac1000_dma_ops;
diff --git a/drivers/net/stmmac/dwmac100_core.c b/drivers/net/stmmac/dwmac100_core.c
index 6f270a0..2fb165f 100644
--- a/drivers/net/stmmac/dwmac100_core.c
+++ b/drivers/net/stmmac/dwmac100_core.c
@@ -179,6 +179,8 @@ struct mac_device_info *dwmac100_setup(unsigned long ioaddr)
 	struct mac_device_info *mac;
 
 	mac = kzalloc(sizeof(const struct mac_device_info), GFP_KERNEL);
+	if (!mac)
+		return NULL;
 
 	pr_info("\tDWMAC100\n");
 
diff --git a/drivers/net/stmmac/stmmac_main.c b/drivers/net/stmmac/stmmac_main.c
index acf0616..0bdd332 100644
--- a/drivers/net/stmmac/stmmac_main.c
+++ b/drivers/net/stmmac/stmmac_main.c
@@ -1558,15 +1558,15 @@ static int stmmac_mac_device_setup(struct net_device *dev)
 	else
 		device = dwmac100_setup(ioaddr);
 
+	if (!device)
+		return -ENOMEM;
+
 	if (priv->enh_desc) {
 		device->desc = &enh_desc_ops;
 		pr_info("\tEnhanced descriptor structure\n");
 	} else
 		device->desc = &ndesc_ops;
 
-	if (!device)
-		return -ENOMEM;
-
 	priv->hw = device;
 
 	priv->wolenabled = priv->hw->pmt;	/* PMT supported */
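
For readers skimming the last hunk, the ordering issue can be sketched in user
space as follows. This is only an analogy under stated assumptions: calloc()
stands in for kzalloc(), and struct mac_info, mac_setup() and device_setup()
are hypothetical stand-ins, not the driver's code.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct mac_device_info. */
struct mac_info {
	const char *desc;
};

/* Stand-in for dwmac*_setup(): returns NULL when allocation fails. */
static struct mac_info *mac_setup(int simulate_oom)
{
	if (simulate_oom)
		return NULL;
	return calloc(1, sizeof(struct mac_info));
}

/* Stand-in for the caller: returns 0 on success, -ENOMEM on failure. */
int device_setup(int simulate_oom, struct mac_info **out)
{
	struct mac_info *dev = mac_setup(simulate_oom);

	/* The NULL check has to come before the first dereference. */
	if (!dev)
		return -ENOMEM;

	dev->desc = "normal descriptors";
	*out = dev;
	return 0;
}

/* Example check from any caller: assert(device_setup(1, &p) == -ENOMEM); */
```

Placing the check after the dev->desc assignment would dereference NULL on the
failure path, which is the situation the hunk above avoids.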

^ permalink raw reply related

* [patch -next] caif: precedence bug
From: Dan Carpenter @ 2010-07-22 11:11 UTC (permalink / raw)
  To: SjurBraendeland
  Cc: David S. Miller, Sjur Braendeland, netdev, kernel-janitors

Negation has higher precedence than comparison, so the original assert only
checked that "rfml->fragment_size" was larger than 1 or 0.

Signed-off-by: Dan Carpenter <error27@gmail.com>

diff --git a/net/caif/cfrfml.c b/net/caif/cfrfml.c
index 4b04d25..eb16020 100644
--- a/net/caif/cfrfml.c
+++ b/net/caif/cfrfml.c
@@ -193,7 +193,7 @@ out:
 
 static int cfrfml_transmit_segment(struct cfrfml *rfml, struct cfpkt *pkt)
 {
-	caif_assert(!cfpkt_getlen(pkt) < rfml->fragment_size);
+	caif_assert(cfpkt_getlen(pkt) >= rfml->fragment_size);
 
 	/* Add info for MUX-layer to route the packet out. */
 	cfpkt_info(pkt)->channel_id = rfml->serv.layer.id;
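
The precedence point can be checked with a small stand-alone sketch. The
function names are made up for illustration and plain ints stand in for the
packet length and fragment size; the first function spells out how the
compiler parses "!len < frag".

```c
#include <assert.h>

/* How "!len < frag" is parsed: the negation binds first. */
int parsed_check(int len, int frag)
{
	return (!len) < frag;
}

/* What the patched assert tests: len >= frag. */
int patched_check(int len, int frag)
{
	return len >= frag;
}

/* Example check from any caller: assert(parsed_check(10, 50) == 1); */
```

For len = 10 and frag = 50, parsed_check() yields 1 (because 0 < 50) while
patched_check() yields 0, so the old expression could not catch that case.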

^ permalink raw reply related

* Re: [PATCH] sysfs: Don't allow the creation of symlinks we can't remove
From: Johannes Berg @ 2010-07-22 10:41 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Greg KH, Andrew Morton, Rafael J. Wysocki, Maciej W. Rozycki,
	Kay Sievers, Greg KH, netdev
In-Reply-To: <m1mxtjztvy.fsf@fess.ebiederm.org>

On Thu, 2010-07-22 at 03:35 -0700, Eric W. Biederman wrote:

> >> The warning patch just makes things fail faster.  Although I get some of the
> >> wireless interfaces for hwsim when I use this one.
> >
> > Hmm, I didn't.
> 
> To be clear I just get hwsim0.  Not wlan0 or wlan1.

Ah, yes, but that's just a regular netdev, you can pretty much ignore
it. It just shows all hwsim traffic as it is on the "air" for sniffing.

> > Right, it actually starts working again with that patch you sent.
> > However, netns support is really broken:
> >
> > <create net namespace, put phy0/wlan0 into it>
> 
> Do we have a convenient command line tool to do this?
> I remember there being a different netlink message from
> normal network devices.

iw phy0 set netns <pid>

http://git.sipsolutions.net/iw.git

> > root@kvm:~# ip link
> > 3: wlan0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
> >     link/ether 02:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff
> > 7: lo: <LOOPBACK> mtu 16436 qdisc noop state DOWN 
> >     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
> > root@kvm:~# ls /sys/class/net/
> > eth0  hwsim0  lo  wlan1  wlan2
> 
> I think this is actually the output of something working.
> 
> I expect after you created a new netns you didn't mount
> a new instance of /sys.  /sys remembers which netns you
> had when you mounted it.  So you have to mount /sys again
> so you can see the /sys/class/net for the network namespace
> you are in.

Ohh, oops! I saw all the "current->" references in the code and somehow
expected the same instance of sysfs to show the right thing.

Yes, it works now. But the patch below doesn't seem to work, am I
missing something?

johannes

---
 include/linux/netdevice.h |    2 ++
 net/core/net-sysfs.c      |    3 ++-
 net/wireless/sysfs.c      |    9 +++++++++
 3 files changed, 13 insertions(+), 1 deletion(-)

--- wireless-testing.orig/include/linux/netdevice.h	2010-07-22 10:01:22.000000000 +0200
+++ wireless-testing/include/linux/netdevice.h	2010-07-22 10:11:00.000000000 +0200
@@ -2148,6 +2148,8 @@ extern void dev_seq_stop(struct seq_file
 extern int netdev_class_create_file(struct class_attribute *class_attr);
 extern void netdev_class_remove_file(struct class_attribute *class_attr);
 
+extern struct kobj_ns_type_operations net_ns_type_operations;
+
 extern char *netdev_drivername(const struct net_device *dev, char *buffer, int len);
 
 extern void linkwatch_run_queue(void);
--- wireless-testing.orig/net/core/net-sysfs.c	2010-07-22 10:01:22.000000000 +0200
+++ wireless-testing/net/core/net-sysfs.c	2010-07-22 10:11:51.000000000 +0200
@@ -785,12 +785,13 @@ static const void *net_netlink_ns(struct
 	return sock_net(sk);
 }
 
-static struct kobj_ns_type_operations net_ns_type_operations = {
+struct kobj_ns_type_operations net_ns_type_operations = {
 	.type = KOBJ_NS_TYPE_NET,
 	.current_ns = net_current_ns,
 	.netlink_ns = net_netlink_ns,
 	.initial_ns = net_initial_ns,
 };
+EXPORT_SYMBOL_GPL(net_ns_type_operations);
 
 static void net_kobj_ns_exit(struct net *net)
 {
--- wireless-testing.orig/net/wireless/sysfs.c	2010-07-22 10:01:22.000000000 +0200
+++ wireless-testing/net/wireless/sysfs.c	2010-07-22 10:13:08.000000000 +0200
@@ -110,6 +110,13 @@ static int wiphy_resume(struct device *d
 	return ret;
 }
 
+static const void *wiphy_namespace(struct device *d)
+{
+	struct wiphy *wiphy = container_of(d, struct wiphy, dev);
+
+	return wiphy_net(wiphy);
+}
+
 struct class ieee80211_class = {
 	.name = "ieee80211",
 	.owner = THIS_MODULE,
@@ -120,6 +127,8 @@ struct class ieee80211_class = {
 #endif
 	.suspend = wiphy_suspend,
 	.resume = wiphy_resume,
+	.ns_type = &net_ns_type_operations,
+	.namespace = wiphy_namespace,
 };
 
 int wiphy_sysfs_init(void)



^ permalink raw reply

* Re: [PATCH] sysfs: Don't allow the creation of symlinks we can't remove
From: Eric W. Biederman @ 2010-07-22 10:35 UTC (permalink / raw)
  To: Johannes Berg
  Cc: Greg KH, Andrew Morton, Rafael J. Wysocki, Maciej W. Rozycki,
	Kay Sievers, Greg KH, netdev
In-Reply-To: <1279793435.12439.3.camel@jlt3.sipsolutions.net>

Johannes Berg <johannes@sipsolutions.net> writes:

> On Thu, 2010-07-22 at 03:05 -0700, Eric W. Biederman wrote:
>
>> >> Detect this problem up front and simply don't create symlinks we won't
>> >> be able to remove later.  This prevents symlink leakage and fails in
>> >> a much clearer and more understandable way.
>> >
>> > Eric, I was looking into sysfs netns support for wireless, and with this
>> > patch applied I just get the warning and no network interfaces.
>> 
>> The warning patch just makes things fail faster.  Although I get some of the
>> wireless interfaces for hwsim when I use this one.
>
> Hmm, I didn't.

To be clear I just get hwsim0.  Not wlan0 or wlan1.

>> > Was there any patch that was supposed to fix hwsim?
>> 
>> - If you have my patches that fix CONFIG_SYSFS_DEPRECATED,
>>   you should find everything works there.
>
> But then I was carrying those two patches too.
>
>> As for a proper fix I have just resent my one liner to
>> drivers/base/core.c I can't think of a better option right now.
>> 
>> For hwsim it is arguable, but the behaviour of sysfs for the
>> bluetooth bnep driver is very clearly a 3 year old regression,
>> and the cause is exactly the same.
>
> Right, it actually starts working again with that patch you sent.
> However, netns support is really broken:
>
> <create net namespace, put phy0/wlan0 into it>

Do we have a convenient command line tool to do this?
I remember there being a different netlink message from
normal network devices.

> root@kvm:~# ip link
> 3: wlan0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
>     link/ether 02:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff
> 7: lo: <LOOPBACK> mtu 16436 qdisc noop state DOWN 
>     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
> root@kvm:~# ls /sys/class/net/
> eth0  hwsim0  lo  wlan1  wlan2

I think this is actually the output of something working.

I expect after you created a new netns you didn't mount
a new instance of /sys.  /sys remembers which netns you
had when you mounted it.  So you have to mount /sys again
so you can see the /sys/class/net for the network namespace
you are in.
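
A minimal sketch of the remount described above, for anyone who wants to try
it. Assumptions: it runs as root (CAP_SYS_ADMIN), it works in a forked child
with a private mount namespace so the caller's /sys is left alone, and the
function names are made up for illustration.

```c
#define _GNU_SOURCE	/* for unshare() and the CLONE_* flags */
#include <assert.h>
#include <dirent.h>
#include <sched.h>
#include <string.h>
#include <sys/mount.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs in a throwaway child so the caller's namespaces stay untouched. */
static int child_count_netdevs(void)
{
	struct dirent *de;
	DIR *dir;
	int count = 0;

	/* New network namespace plus a private mount namespace for /sys. */
	if (unshare(CLONE_NEWNET | CLONE_NEWNS) == -1)
		return -1;
	/* Keep our mounts from propagating back to the parent namespace. */
	if (mount(NULL, "/", NULL, MS_REC | MS_PRIVATE, NULL) == -1)
		return -1;
	/* A freshly mounted sysfs reflects the netns we are in now. */
	if (mount("sysfs", "/sys", "sysfs", 0, NULL) == -1)
		return -1;

	dir = opendir("/sys/class/net");
	if (!dir)
		return -1;
	while ((de = readdir(dir)) != NULL)
		if (strcmp(de->d_name, ".") && strcmp(de->d_name, ".."))
			count++;
	closedir(dir);
	return count;
}

/*
 * Returns the number of entries under /sys/class/net as seen from a new
 * netns after remounting /sys, or -1 on failure (e.g. when not root).
 */
int netdevs_in_fresh_netns(void)
{
	int status;
	pid_t pid = fork();

	if (pid == -1)
		return -1;
	if (pid == 0) {
		int n = child_count_netdevs();

		/* Exit status is 8 bits: clamp the count, 255 means error. */
		_exit(n < 0 ? 255 : (n > 254 ? 254 : n));
	}
	if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status))
		return -1;
	return WEXITSTATUS(status) == 255 ? -1 : WEXITSTATUS(status);
}

/* Example check: int n = netdevs_in_fresh_netns(); assert(n == -1 || n >= 1); */
```

In a fresh namespace the listing should contain at least "lo"; without the
second mount() the directory would still show the devices of the namespace
that /sys was originally mounted from.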

Eric


^ permalink raw reply

* Re: [PATCH] sysfs: Don't allow the creation of symlinks we can't remove
From: Johannes Berg @ 2010-07-22 10:10 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Greg KH, Andrew Morton, Rafael J. Wysocki, Maciej W. Rozycki,
	Kay Sievers, Greg KH, netdev
In-Reply-To: <m139vb25nz.fsf@fess.ebiederm.org>

On Thu, 2010-07-22 at 03:05 -0700, Eric W. Biederman wrote:

> >> Detect this problem up front and simply don't create symlinks we won't
> >> be able to remove later.  This prevents symlink leakage and fails in
> >> a much clearer and more understandable way.
> >
> > Eric, I was looking into sysfs netns support for wireless, and with this
> > patch applied I just get the warning and no network interfaces.
> 
> The warning patch just makes things fail faster.  Although I get some of the
> wireless interfaces for hwsim when I use this one.

Hmm, I didn't.

> > Was there any patch that was supposed to fix hwsim?
> 
> - If you have my patches that fix CONFIG_SYSFS_DEPRECATED,
>   you should find everything works there.

But then I was carrying those two patches too.

> As for a proper fix I have just resent my one liner to
> drivers/base/core.c I can't think of a better option right now.
> 
> For hwsim it is arguable, but the behaviour of sysfs for the
> bluetooth bnep driver is very clearly a 3 year old regression,
> and the cause is exactly the same.

Right, it actually starts working again with that patch you sent.
However, netns support is really broken:

<create net namespace, put phy0/wlan0 into it>

root@kvm:~# ip link
3: wlan0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
    link/ether 02:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff
7: lo: <LOOPBACK> mtu 16436 qdisc noop state DOWN 
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
root@kvm:~# ls /sys/class/net/
eth0  hwsim0  lo  wlan1  wlan2

johannes


^ permalink raw reply

* Re: mirred, redirect action vs. dev refcount issue
From: jamal @ 2010-07-22 10:11 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: David Miller, netdev
In-Reply-To: <20100721170024.60cd9ef4@nehalam>

On Wed, 2010-07-21 at 17:00 -0700, Stephen Hemminger wrote:
> On Wed, 21 Jul 2010 16:58:02 -0700 (PDT)
> David Miller <davem@davemloft.net> wrote:
> 

> > Whether the ifindex or the global list + delete scheme is better is a
> > topic for discussion.  Since from the user's perspective it is unclear
> > which semantic is less surprising, entries disappearing or suddenly
> > stop working (or start applying to a different device which has taken
> > a previous one's ifindex!).
> 
> ifindex is unique (until integer wraps) so that soft reference
> works.

The proper way to do it is via a notifier since we point to the
netdev - and yes, it is a little more complex, that's why i just
let the admin suffer (IMO) the well deserved consequences[1].
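
For comparison, the ifindex-style soft reference mentioned in the quoted text
can be sketched from user space: keep only the index and resolve it at the
time of use. This is a rough analogy, not the mirred code; resolve_ifindex()
is a made-up helper built on the standard if_indextoname() call.

```c
#include <assert.h>
#include <net/if.h>
#include <string.h>

/*
 * Resolve a stored ifindex at use time.
 * Returns 0 and fills name (IF_NAMESIZE bytes) if an interface with that
 * index exists right now, -1 otherwise.
 */
int resolve_ifindex(unsigned int ifindex, char name[IF_NAMESIZE])
{
	return if_indextoname(ifindex, name) ? 0 : -1;
}

/* Example check: assert(resolve_ifindex(0, buf) == -1);  (0 is never valid) */
```

A lookup that fails simply means the device is gone, so nothing pins the
netdev; the cost is that the rule silently stops matching a device.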

I am in travel mode - but i will do some background thinking and
come up with a good way to resolve it when i get back. Unless you
have a patch you want me to look at.

cheers,
jamal

[1] least element of surprise principle:
Admin adds a rule which says
"you see a packet matching blah incoming on eth0,
do action1 then action2 ... then actionN"
Say action2 is "mirror to ifb0".
And then this same admin goes and rmmods ifb0 - it is easier to
just reject this rmmod operation as we do today. Maybe we could be 
kinder and be more informative and syslog something along the lines of
"rejected to unregister device you rat-bastard because you have a rule
which says we should mirror to ifb0". 
Thoughts?



^ permalink raw reply

* Re: Fwd: LVS on local node
From: Changli Gao @ 2010-07-22 10:06 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Patrick McHardy, Jan Engelhardt, Franchoze Eric, wensong,
	lvs-devel, netdev, netfilter-devel
In-Reply-To: <1279792792.2467.15.camel@edumazet-laptop>

On Thu, Jul 22, 2010 at 5:59 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Le jeudi 22 juillet 2010 à 17:52 +0800, Changli Gao a écrit :
>
>>
>> FYI: the random option is documented in the manual page of iptables.
>>
>>    REDIRECT
>>        This  target is only valid in the nat table, in the PREROUTING and OUT-
>>        PUT chains, and user-defined chains which are only  called  from  those
>>        chains.   It redirects the packet to the machine itself by changing the
>>        destination IP  to  the  primary  address  of  the  incoming  interface
>>        (locally-generated packets are mapped to the 127.0.0.1 address).
>>
>>        --to-ports port[-port]
>>               This  specifies  a  destination  port  or range of ports to use:
>>               without this, the destination port is never  altered.   This  is
>>               only valid if the rule also specifies -p tcp or -p udp.
>>
>>        --random
>>               If  option --random is used then port mapping will be randomized
>>               (kernel >= 2.6.22).
>>
>>
>
> Note my patch has nothing to do with the man page, its already up2date.
>
> I usually dont read the Fine manuals, do you ?

Yes. And I don't object to your patch, so I added this as an FYI. Thanks.

>
> Try :
>
> iptables -t nat -A PREROUTING -p tcp --dport 1234 -j REDIRECT --help
>
> REDIRECT target options:
>  --to-ports <port>[-<port>]
>                                Port (range) to map to.
>
>
> You see [--random] is missing.
>
>


-- 
Regards,
Changli Gao(xiaosuo@gmail.com)

^ permalink raw reply

