* Re: [PATCH] LSM: Add post recvmsg() hook.
From: Tetsuo Handa @ 2010-07-22 12:46 UTC
To: davem
Cc: kuznet, pekkas, jmorris, yoshfuji, kaber, paul.moore, netdev,
linux-security-module
In-Reply-To: <20100721.220611.267376790.davem@davemloft.net>
David Miller wrote:
> > Then, why does below proposal lose information?
>
> Peek changes state, now it's possible that two processes end up
> receiving the packet.
Indeed. We will need to protect the sock->ops->recvmsg() call using a lock, like
static inline int __sock_recvmsg_nosec(struct kiocb *iocb, struct socket *sock,
struct msghdr *msg, size_t size, int flags)
{
+ int err;
struct sock_iocb *si = kiocb_to_siocb(iocb);
sock_update_classid(sock->sk);
si->sock = sock;
si->scm = NULL;
si->msg = msg;
si->size = size;
si->flags = flags;
- return sock->ops->recvmsg(iocb, sock, msg, size, flags);
+ err = security_socket_read_lock(sock);
+ if (err)
+ return err;
+ err = sock->ops->recvmsg(iocb, sock, msg, size, flags);
+ security_socket_read_unlock(sock);
+ return err;
}
in addition to security_socket_recvmsg_force_peek() and
security_socket_post_recvmsg().
But locks like the one above break MSG_DONTWAIT, since recv() without MSG_DONTWAIT
calls wait_for_packet() inside __skb_recv_datagram().
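The two user-visible behaviours at stake here (MSG_DONTWAIT must not sleep, and MSG_PEEK must leave the message queued) can be checked from user space. The following is a minimal sketch, assuming a Linux/POSIX system where AF_UNIX datagram socketpairs are available; the helper names are illustrative and are not part of any proposed patch.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* True if a recv() result means "nothing queued, would have to wait". */
static int would_block(ssize_t n)
{
	return n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK);
}

/* Returns 1 if recv(MSG_DONTWAIT) on an empty socket fails at once
 * with EAGAIN/EWOULDBLOCK, 0 if not, -1 if the socketpair failed. */
static int dontwait_returns_eagain(void)
{
	int sv[2], ok;
	char buf[16];

	if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) < 0)
		return -1;
	ok = would_block(recv(sv[0], buf, sizeof(buf), MSG_DONTWAIT));
	close(sv[0]);
	close(sv[1]);
	return ok;
}

/* Returns 1 if a peeked datagram is still delivered by the next
 * recv(), and the queue is empty only after that second call. */
static int peek_leaves_message_queued(const char *msg)
{
	int sv[2], ok;
	char a[16], b[16], c[16];
	ssize_t len;

	assert(msg != NULL && strlen(msg) > 0 && strlen(msg) <= sizeof(a));
	len = (ssize_t)strlen(msg);
	if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) < 0)
		return -1;
	ok = send(sv[1], msg, (size_t)len, 0) == len &&
	     recv(sv[0], a, sizeof(a), MSG_PEEK | MSG_DONTWAIT) == len &&
	     recv(sv[0], b, sizeof(b), MSG_DONTWAIT) == len &&
	     memcmp(a, b, (size_t)len) == 0 &&
	     would_block(recv(sv[0], c, sizeof(c), MSG_DONTWAIT));
	close(sv[0]);
	close(sv[1]);
	return ok;
}
```

Any hook placement has to preserve both properties: a non-blocking reader must never sleep on a lock held by a blocking reader, and a peek must not consume the message.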
To make MSG_DONTWAIT work, I would have to do something like the following.
struct sk_buff *__skb_recv_datagram(struct sock *sk, unsigned flags,
int *peeked, int *err)
(...snipped...)
do {
/* Again only user level code calls this function, so nothing
* interrupt level will suddenly eat the receive_queue.
*
* Look at current nfs client by the way...
* However, this function was correct in any case. 8)
*/
unsigned long cpu_flags;
+ /* < 0 if lock failed, 0 if no need to lock, > 0 if locked */
+ int serialized = security_socket_read_lock(sk);
+ if (serialized < 0) {
+ error = serialized;
+ goto no_packet;
+ } else if (serialized > 0) {
+ int err;
+ spin_lock_irqsave(&sk->sk_receive_queue.lock, cpu_flags);
+ skb = skb_peek(&sk->sk_receive_queue);
+ spin_unlock_irqrestore(&sk->sk_receive_queue.lock,
+ cpu_flags);
+ if (!skb)
+ goto no_skb;
+ err = security_socket_pre_recvmsg(sk, skb);
+ if (err < 0) {
+ error = err;
+ security_socket_read_unlock(sk);
+ goto no_packet;
+ }
+ }
+
spin_lock_irqsave(&sk->sk_receive_queue.lock, cpu_flags);
skb = skb_peek(&sk->sk_receive_queue);
if (skb) {
*peeked = skb->peeked;
if (flags & MSG_PEEK) {
skb->peeked = 1;
atomic_inc(&skb->users);
} else
__skb_unlink(skb, &sk->sk_receive_queue);
}
spin_unlock_irqrestore(&sk->sk_receive_queue.lock, cpu_flags);
+no_skb:
+ if (serialized > 0)
+ security_socket_read_unlock(sk);
if (skb)
return skb;
/* User doesn't want to wait */
error = -EAGAIN;
if (!timeo)
goto no_packet;
} while (!wait_for_packet(sk, err, &timeo));
(...snipped...)
}
Inserting LSM hooks like above will be the only way to work properly (i.e.
handle MSG_DONTWAIT and avoid showing the same message to multiple readers
and keep the queue's state unchanged upon error).
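The last two requirements can be modelled in a few lines of user-space C. This is only a toy sketch under stated assumptions: `toy_queue`, `toy_recv` and `toy_policy` are hypothetical names, the integers stand in for the source port of each queued message, and the locking is left out (in the kernel the whole of `toy_recv` would have to run with the serialization lock held).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define QLEN 8

/* Toy receive queue: msgs[head..tail) are pending messages, each
 * represented only by the source port it came from. */
struct toy_queue {
	int msgs[QLEN];
	size_t head, tail;
};

/* Hypothetical policy callback: 0 to allow, negative errno to deny. */
typedef int (*toy_check_fn)(int reader, int src_port);

/*
 * Peek, check and unlink as a single step.
 * Returns 1 and stores the message in *out when it is delivered,
 * 0 when the queue is empty (the caller may then wait or give up),
 * or a negative errno when the policy denies this reader; in that
 * case the queue is left exactly as it was.
 */
static int toy_recv(struct toy_queue *q, int reader, toy_check_fn check,
		    int *out)
{
	int err;

	assert(q != NULL && check != NULL && out != NULL);
	if (q->head == q->tail)
		return 0;
	err = check(reader, q->msgs[q->head]);
	if (err < 0)
		return err;
	*out = q->msgs[q->head++];
	return 1;
}

/* Example policy: reader 1 may not receive messages from port 53. */
static int toy_policy(int reader, int src_port)
{
	return (reader == 1 && src_port == 53) ? -EPERM : 0;
}
```

Because the check sits between the peek and the unlink, a denied reader never changes the queue, and each message is handed to exactly one reader.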
But you said ( http://marc.info/?l=linux-netdev&m=124022463014713&w=2 )
> We worked so hard to split out this common code, it is simply
> a non-starter for anyone to start putting protocol specific test
> into here, or even worse to move this code back to being locally
> copied into every protocol implementation.
when I proposed inserting LSM hooks into __skb_recv_datagram()
( http://marc.info/?l=linux-netdev&m=124022463014672&w=2 ).
So I have no way to allow permission checks based on the combination of
"the process that issued the recv() request" and "the source address/port of
the message which that process is about to pick up" without breaking things
(unless you accept inserting LSM hooks into __skb_recv_datagram())...
* minstrel_tx_status mac80211 WARNINGs in vanilla 2.6.34.1
From: Sven Geggus @ 2010-07-22 12:30 UTC
To: netdev
Hello,
running vanilla 2.6.34.1 I get the following warnings in kernel log:
WARNING: at net/mac80211/rc80211_minstrel.c:70 minstrel_tx_status+0x67/0xd1 [mac80211]()
Hardware name: SCENIC E300/E600
Modules linked in: i915 drm_kms_helper drm video backlight output lp loop
snd_intel8x0 snd_ac97_codec ac97_bus snd_pcm_oss snd_mixer_oss snd_pcm
snd_seq_dummy zd1211rw snd_seq_oss usbhid mac80211 option cfg80211
snd_seq_midi usbserial snd_rawmidi snd_seq_midi_event snd_seq snd_timer
snd_seq_device snd parport_pc ehci_hcd uhci_hcd soundcore intel_agp parport
usbcore nls_base snd_page_alloc agpgart rng_core floppy sg
Pid: 0, comm: swapper Tainted: G W 2.6.34.1 #3
Call Trace:
[<c102af25>] warn_slowpath_common+0x60/0x90
[<c102af62>] warn_slowpath_null+0xd/0x10
[<f83cbd48>] minstrel_tx_status+0x67/0xd1 [mac80211]
[<f83b6eb1>] ieee80211_tx_status+0x1f6/0x5ac [mac80211]
[<c1261533>] ? skb_dequeue+0x45/0x4c
[<f83b6896>] ieee80211_tasklet_handler+0x61/0xd6 [mac80211]
[<c102ed7d>] tasklet_action+0x62/0x9f
[<c102f331>] __do_softirq+0x77/0xe5
[<c102f3c5>] do_softirq+0x26/0x2b
[<c102f52f>] irq_exit+0x29/0x66
[<c1003e90>] do_IRQ+0x85/0x9b
[<c1002d29>] common_interrupt+0x29/0x30
[<c10083ac>] ? default_idle+0x2d/0x42
[<c1001a9b>] cpu_idle+0x44/0x71
[<c12e00de>] rest_init+0x96/0x98
[<c1498862>] start_kernel+0x2a5/0x2aa
[<c14980b7>] i386_start_kernel+0xb7/0xbf
---[ end trace f22ceacef336878f ]---
Wireless driver is zd1211rw.
I did not test with an older kernel because this device has not been in use on
this machine before.
WLAN does however seem to work anyway.
Regards
Sven
--
The source code is not comprehensible
(found in bug section of man 8 telnetd on Redhat Linux)
/me is giggls@ircnet, http://sven.gegg.us/ on the Web
* [PATCH net-next] sysfs: add attribute to indicate hw address assignment type
From: Stefan Assmann @ 2010-07-22 12:50 UTC
To: Ben Hutchings
Cc: David Miller, abadea, netdev, linux-kernel, gospo, gregory.v.rose,
alexander.h.duyck, leedom, harald
In-Reply-To: <1279720478.2089.3.camel@achroite.uk.solarflarecom.com>
On 21.07.2010 15:54, Ben Hutchings wrote:
> On Wed, 2010-07-21 at 10:10 +0200, Stefan Assmann wrote:
>> I put Alex' idea into code for further discussion, keeping the names
>> mentioned here until we agree on the scope of this attribute. When we
>> have settled I'll post a patch with proper patch description.
> [...]
>
> Just a little nitpick: I think it would be clearer to use a more
> specific term like 'address source' or 'address assignment type' rather
> than 'address type'.
Here's a proposal for the final patch.
Stefan
From: Stefan Assmann <sassmann@redhat.com>
Add addr_assign_type to struct net_device and expose it via sysfs.
This new attribute has the purpose of giving user-space the ability to
distinguish between different assignment types of MAC addresses.
For example user-space can treat NICs with randomly generated MAC
addresses differently than NICs that have permanent (locally assigned)
MAC addresses.
For the former udev could write a persistent net rule by matching the
device path instead of the MAC address.
There's also the case of devices that 'steal' MAC addresses from slave
devices, in which case it is also beneficial for user-space to be aware
of the fact.
This patch also introduces a helper function to assist adoption of
drivers that generate MAC addresses randomly.
Signed-off-by: Stefan Assmann <sassmann@redhat.com>
---
include/linux/etherdevice.h | 14 ++++++++++++++
include/linux/netdevice.h | 6 ++++++
net/core/net-sysfs.c | 2 ++
3 files changed, 22 insertions(+), 0 deletions(-)
diff --git a/include/linux/etherdevice.h b/include/linux/etherdevice.h
index 3d7a668..848480b 100644
--- a/include/linux/etherdevice.h
+++ b/include/linux/etherdevice.h
@@ -127,6 +127,20 @@ static inline void random_ether_addr(u8 *addr)
}
/**
+ * dev_hw_addr_random - Create random MAC and set device flag
+ * @dev: pointer to net_device structure
+ * @hwaddr: Pointer to a six-byte array containing the Ethernet address
+ *
+ * Generate random MAC to be used by a device and set addr_assign_type
+ * so the state can be read by sysfs and be used by udev.
+ */
+static inline void dev_hw_addr_random(struct net_device *dev, u8 *hwaddr)
+{
+ dev->addr_assign_type |= NET_ADDR_RANDOM;
+ random_ether_addr(hwaddr);
+}
+
+/**
* compare_ether_addr - Compare two Ethernet addresses
* @addr1: Pointer to a six-byte array containing the Ethernet address
* @addr2: Pointer other six-byte array containing the Ethernet address
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index b626289..1bca617 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -66,6 +66,11 @@ struct wireless_dev;
#define HAVE_FREE_NETDEV /* free_netdev() */
#define HAVE_NETDEV_PRIV /* netdev_priv() */
+/* hardware address assignment types */
+#define NET_ADDR_PERM 0 /* address is permanent (default) */
+#define NET_ADDR_RANDOM 1 /* address is generated randomly */
+#define NET_ADDR_STOLEN 2 /* address is stolen from other device */
+
/* Backlog congestion levels */
#define NET_RX_SUCCESS 0 /* keep 'em coming, baby */
#define NET_RX_DROP 1 /* packet dropped */
@@ -919,6 +924,7 @@ struct net_device {
/* Interface address info. */
unsigned char perm_addr[MAX_ADDR_LEN]; /* permanent hw address */
+ unsigned char addr_assign_type; /* hw address assignment type */
unsigned char addr_len; /* hardware address length */
unsigned short dev_id; /* for shared network cards */
diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index d2b5965..af4dfba 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -95,6 +95,7 @@ static ssize_t netdev_store(struct device *dev, struct device_attribute *attr,
}
NETDEVICE_SHOW(dev_id, fmt_hex);
+NETDEVICE_SHOW(addr_assign_type, fmt_dec);
NETDEVICE_SHOW(addr_len, fmt_dec);
NETDEVICE_SHOW(iflink, fmt_dec);
NETDEVICE_SHOW(ifindex, fmt_dec);
@@ -295,6 +296,7 @@ static ssize_t show_ifalias(struct device *dev,
}
static struct device_attribute net_class_attributes[] = {
+ __ATTR(addr_assign_type, S_IRUGO, show_addr_assign_type, NULL),
__ATTR(addr_len, S_IRUGO, show_addr_len, NULL),
__ATTR(dev_id, S_IRUGO, show_dev_id, NULL),
__ATTR(ifalias, S_IRUGO | S_IWUSR, show_ifalias, store_ifalias),
--
1.6.5.2
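As an illustration of how user-space might consume the new attribute, here is a minimal sketch in C. It assumes the value is read as decimal text from /sys/class/net/<ifname>/addr_assign_type, as the NETDEVICE_SHOW(addr_assign_type, fmt_dec) line suggests; the function names and the path-matching policy are hypothetical and only mirror the udev use case described in the patch description.

```c
#include <assert.h>
#include <stdlib.h>

/* Values as proposed in the patch above. */
#define NET_ADDR_PERM   0
#define NET_ADDR_RANDOM 1
#define NET_ADDR_STOLEN 2

/*
 * Parse the decimal text read from the attribute (for example "1\n").
 * Returns the assignment type, or -1 if the text is not a defined value.
 */
static int parse_addr_assign_type(const char *text)
{
	char *end;
	long v;

	assert(text != NULL);
	v = strtol(text, &end, 10);
	if (end == text || (*end != '\0' && *end != '\n'))
		return -1;
	if (v < NET_ADDR_PERM || v > NET_ADDR_STOLEN)
		return -1;
	return (int)v;
}

/* Hypothetical policy: match by device path when the MAC is random. */
static int should_match_by_path(const char *text)
{
	return parse_addr_assign_type(text) == NET_ADDR_RANDOM;
}
```

A caller would typically fill the buffer with fgets() from the sysfs file and pass it to should_match_by_path().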
* Re: [PATCH V4] CAN: Add Flexcan CAN controller driver
From: Marc Kleine-Budde @ 2010-07-22 12:57 UTC
To: Wolfgang Grandegger
Cc: socketcan-core-0fE9KPoRgkgATYTw5x5z8w,
netdev-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <4C483B1A.2040703-5Yr1BZd7O62+XT7JhA+gdA@public.gmane.org>
Hey,
Wolfgang Grandegger wrote:
> On 07/21/2010 11:04 PM, Marc Kleine-Budde wrote:
>> This core is found on some Freescale SoCs and also some Coldfire
>> SoCs. Support for Coldfire is missing though at the moment as
>> they have an older revision of the core which does not have RX FIFO
>> support.
>>
>> Signed-off-by: Sascha Hauer <s.hauer-bIcnvbaLZ9MEGnE8C9+IrQ@public.gmane.org>
>> Signed-off-by: Marc Kleine-Budde <mkl-bIcnvbaLZ9MEGnE8C9+IrQ@public.gmane.org>
>
> Acked-by: Wolfgang Grandegger <wg-5Yr1BZd7O62+XT7JhA+gdA@public.gmane.org>
David, please don't apply the patch yet. I just got the information that
there is a problem with "a" flexcan driver. I'm about to get more
information and investigate this.
cheers, Marc
--
Pengutronix e.K. | Marc Kleine-Budde |
Industrial Linux Solutions | Phone: +49-231-2826-924 |
Vertretung West/Dortmund | Fax: +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686 | http://www.pengutronix.de |
* Re: Fwd: LVS on local node
From: Eric Dumazet @ 2010-07-22 12:59 UTC
To: Simon Horman; +Cc: Franchoze Eric, wensong, lvs-devel, netdev, netfilter-devel
In-Reply-To: <20100722122422.GD16234@verge.net.au>
On Thursday, 22 July 2010 at 21:24 +0900, Simon Horman wrote:
> On Thu, Jul 22, 2010 at 08:56:51AM +0200, Eric Dumazet wrote:
>
> [snip]
>
> > lvs seems not very SMP friendly and a bit complex.
>
> I'd be interested to hear some thoughts on
> how the SMP aspect of that statement could
> be improved.
Hi Simon
I am not familiar with LVS code, so I am probably wrong, but it seems it
could be changed a bit.
Some rwlocks might become spinlocks (faster than rwlocks)
__ip_vs_securetcp_lock for example is always used with
write_lock()/write_unlock().
This can be a regular spinlock without even knowing the code.
Some lookups could use RCU to avoid cache line misses, and to be able to
use spinlocks for the write side.
It would be good to have a bench setup with the case of 16 legacy
daemons, and check how many new connections per second can be
established, in an LVS setup and an iptables based one.
With 2.6.35 and RPS, a REDIRECT based solution can choose the target port
without taking any lock (not counting conntrack internal costs of
course), each cpu accessing local memory only.
# Not needed if eth0 is a multiqueue NIC
echo ffff >/sys/class/net/eth0/queues/rx-0/rps_cpus
for c in `seq 0 15`
do
iptables -t nat -A PREROUTING -p tcp --dport 80 -m cpu --cpu $c \
-j REDIRECT --to-port $((1000 + $c))
done
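As a small worked example of the two numbers used in this script: the ffff mask selects CPUs 0 through 15, and CPU c is redirected to port 1000 + c. The sketch below is in C with hypothetical helper names, and assumes at most 32 CPUs so that the mask fits in a single hex word.

```c
#include <assert.h>

/* Bitmask selecting CPUs 0..ncpus-1, as written to rps_cpus in hex. */
static unsigned long long rps_cpu_mask(unsigned int ncpus)
{
	/* Assumption: 1..32 CPUs, so the value is one 32-bit hex group. */
	assert(ncpus >= 1 && ncpus <= 32);
	return (1ULL << ncpus) - 1;
}

/* Listening port of the daemon serving a given CPU (1000 + c above). */
static unsigned int redirect_port(unsigned int cpu)
{
	return 1000 + cpu;
}
```

Printing rps_cpu_mask(16) with the "%llx" format gives ffff, the value echoed into rps_cpus in the script.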
* Re: [PATCH for-2.6.35] tun: avoid BUG, dump packet on GSO errors
From: Herbert Xu @ 2010-07-22 13:05 UTC
To: Michael S. Tsirkin; +Cc: netdev
In-Reply-To: <20100721143245.GA8423@redhat.com>
Michael S. Tsirkin <mst@redhat.com> wrote:
> There are still some LRO cards that cause GSO errors in tun,
> and BUG on this is an unfriendly way to tell the admin
> to disable LRO.
>
> Further, experience shows we might have more GSO bugs lurking.
> See https://bugzilla.kernel.org/show_bug.cgi?id=16413
> as a recent example.
> dumping a packet will make it easier to figure it out.
>
> Replace BUG with warning+dump+drop the packet to make
> GSO errors in tun less critical and easier to debug.
>
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> Tested-by: Alex Unigovsky <unik@compot.ru>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
--
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
* Re: Fwd: LVS on local node
From: Simon Horman @ 2010-07-22 13:20 UTC
To: Eric Dumazet; +Cc: Franchoze Eric, wensong, lvs-devel, netdev, netfilter-devel
In-Reply-To: <1279803583.2467.43.camel@edumazet-laptop>
On Thu, Jul 22, 2010 at 02:59:43PM +0200, Eric Dumazet wrote:
> On Thursday, 22 July 2010 at 21:24 +0900, Simon Horman wrote:
> > On Thu, Jul 22, 2010 at 08:56:51AM +0200, Eric Dumazet wrote:
> >
> > [snip]
> >
> > > lvs seems not very SMP friendly and a bit complex.
> >
> > I'd be interested to hear some thoughts on
> > how the SMP aspect of that statement could
> > be improved.
>
> Hi Simon
>
> I am not familiar with LVS code, so I am probably wrong, but it seems it
> could be changed a bit.
>
> Some rwlocks might become spinlocks (faster than rwlocks)
>
> __ip_vs_securetcp_lock for example is always used with
> write_lock()/write_unlock().
> This can be a regular spinlock without even knowing the code.
I'll get that fixed.
> Some lookups could use RCU to avoid cache line misses, and to be able to
> use spinlocks for the write side.
Agreed. I took a look at RCUing things a while back, but got bogged
down and then forgot about it. I'll take another stab at it.
> It would be good to have a bench setup with the case of 16 legacy
> daemons, and check how many new connections per second can be
> established, in an LVS setup and an iptables based one.
>
> With 2.6.35 and RPS, a REDIRECT based solution can choose the target port
> without taking any lock (not counting conntrack internal costs of
> course), each cpu accessing local memory only.
>
> # Not needed if eth0 is a multiqueue NIC
> echo ffff >/sys/class/net/eth0/queues/rx-0/rps_cpus
>
> for c in `seq 0 15`
> do
> iptables -t nat -A PREROUTING -p tcp --dport 80 -m cpu --cpu $c \
> -j REDIRECT --to-port $((1000 + $c))
> done
It's hard for LVS to compete with that kind of lightweight solution, and
it probably shouldn't. However, I'd just like to see LVS working as
well as it can within the constraint that, as you pointed out, it's rather
complex. Thanks for your suggestions.
* Re: Fwd: LVS on local node
From: Simon Horman @ 2010-07-22 13:25 UTC
To: Franchoze Eric; +Cc: wensong, lvs-devel, netdev, netfilter-devel
In-Reply-To: <27901279770680@web67.yandex.ru>
On Thu, Jul 22, 2010 at 07:51:20AM +0400, Franchoze Eric wrote:
> Hello,
>
> I'm trying to load-balance incoming traffic to my applications. These applications are not very SMP friendly, so I want to run several instances, according to the number of CPUs, on a single machine, and balance the incoming traffic/connections across these instances.
> It looks like it should be similar to http://www.austintek.com/LVS/LVS-HOWTO/HOWTO/LVS-HOWTO.localnode.html
>
> Linux kernel 2.6.32, with or without the hide interface patches. I tried different configurations but could not see packets at the application layer.
>
> 192.168.1.165 - eth0 - interface for external connections
> 195.0.0.1 - dummy0 - virtual interface, the real application is bound to that address.
>
> Configuration is:
> -A -t 192.168.1.165:1234 -s wlc
> -a -t 192.168.1.165:1234 -r 195.0.0.1:1234 -g -w
>
> #ipvsadm -L -n
> IP Virtual Server version 1.2.1 (size=4096)
> Prot LocalAddress:Port Scheduler Flags
> -> RemoteAddress:Port Forward Weight ActiveConn InActConn
> TCP 192.168.1.165:1234 wlc
> -> 195.0.0.1:1234 Local 1 0 0
> #
>
> Log:
> [ 2106.897409] IPVS: lookup/out TCP 192.168.1.165:44847->192.168.1.165:1234 not hit
> [ 2106.897412] IPVS: lookup service: fwm 0 TCP 192.168.1.165:1234 hit
> [ 2106.897414] IPVS: ip_vs_wlc_schedule(): Scheduling...
> [ 2106.897416] IPVS: WLC: server 195.0.0.1:1234 activeconns 0 refcnt 2 weight 1 overhead 1
> [ 2106.897418] IPVS: Enter: ip_vs_conn_new, net/netfilter/ipvs/ip_vs_conn.c line 693
> [ 2106.897421] IPVS: Bind-dest TCP c:192.168.1.165:44847 v:192.168.1.165:1234 d:195.0.0.1:1234 fwd:L s:0 conn->flags:181 conn->refcnt:1 dest->refcnt:3
> [ 2106.897425] IPVS: Schedule fwd:L c:192.168.1.165:44847 v:192.168.1.165:1234 d:195.0.0.1:1234 conn->flags:1C1 conn->refcnt:2
> [ 2106.897429] IPVS: TCP input [S...] 195.0.0.1:1234->192.168.1.165:44847 state: NONE->SYN_RECV conn->refcnt:2
> [ 2106.897431] IPVS: Enter: ip_vs_null_xmit, net/netfilter/ipvs/ip_vs_xmit.c line 212
> [ 2106.897439] IPVS: lookup/in TCP 192.168.1.165:1234->192.168.1.165:44847 not hit
> [ 2106.897441] IPVS: lookup/out TCP 192.168.1.165:1234->192.168.1.165:44847 not hit
> [ 2107.277535] IPVS: packet type=1 proto=17 daddr=255.255.255.255 ignored
> [ 2108.542691] IPVS: packet type=1 proto=17 daddr=192.168.1.255 ignored
>
> As a result, the server application does not receive anything on accept(). I tried making dummy0 a hidden device and playing with the arp settings, but without result.
>
> I would be happy to hear any ideas on how to make connections work in this environment.
Hi,
While others have suggested not using LVS for this task for various reasons,
I would just like to comment that this should work, and this smells
like a bug to me. I will try to confirm that, but it won't be today.
* Re: [PATCH] Driver-core: Fix bluetooth network device rename regression
From: Kay Sievers @ 2010-07-22 13:38 UTC
To: Eric W. Biederman
Cc: Greg KH, Andrew Morton, Greg KH, Rafael J. Wysocki,
Maciej W. Rozycki, Johannes Berg, netdev
In-Reply-To: <m1zkxj27xp.fsf_-_@fess.ebiederm.org>
On Thu, Jul 22, 2010 at 11:16, Eric W. Biederman <ebiederm@xmission.com> wrote:
>
> With CONFIG_SYSFS_DEPRECATED_V2 enabled I can rename any network device
> anything as long as the new name does not conflict with another network
> device.
>
> With CONFIG_SYSFS_DEPRECATED_V2 disabled, without this fix bluetooth bnep
> devices and the mac80211_hwsim driver cannot be renamed to any arbitrary
> name that happens to conflict with any other name that is used in their
> parent device's directory.
It is true for all devices that their children cannot carry the names
of existing attributes or directories of the parent. These drivers
manage the parent-child relation on their own and know these limitations
very well, because they have created the conflicting names themselves.
The class glue directories which separate these namespaces are there
to prevent unknown clashes, not clashes originating from the same
subsystem.
The real fix is that the drivers should not try to stack classes, but
use buses. Stacking classes is not, and never was, supported by the core,
especially not for clashing names.
> --- a/drivers/base/core.c
> +++ b/drivers/base/core.c
> @@ -673,7 +673,7 @@ static struct kobject *get_device_parent(struct device *dev,
> */
> if (parent == NULL)
> parent_kobj = virtual_device_parent(dev);
> - else if (parent->class)
> + else if (parent->class && (strcmp(dev->class->name, "net") != 0))
Subsystem-specific code must not leak into core code; we would never
be able to get rid of such hacks. As mentioned in earlier mails, it's
just plain wrong to do anything like this. It makes a specific
subsystem behave differently from all others, just to make some broken
drivers work with the newly introduced network sysfs namespaces.
Since the issue is not a regression, and not even a bug in the core,
it should not be done this way for mainline.
Please try to fix these drivers instead, or mark them broken for
namespaces if nobody can fix them right now.
Thanks,
Kay
* Re: With disable_ipv6 set to 1 on an interface, ff00:/8 and fe80::/64 are still added on device UP
From: Mahesh Kelkar @ 2010-07-22 14:03 UTC
To: Brian Haley; +Cc: netdev, David Miller
In-Reply-To: <20100720.134851.09735512.davem@davemloft.net>
Brian,
Overall the patch seems to work.
On one occasion I saw an error when it tried to get rtnl_trylock() in
"addrconf_disable_ipv6" in addrconf.c. I am investigating it. If
you can think of anything, please let me know.
I also came across another odd behavior (unrelated to disable_ipv6 but
related to multicast & link local route):
A. Configure a unicast IPv6 address (say 123:2:3:4:5:6:7:8/64) on an
interface. (The link-local address will be assigned when the interface comes up.)
B. Bring the interface down (ip link set eth0 down).
You will get the following set of netlink notifications (ip monitor all):
1. Deleted - unicast address connected route (123:2:3:4::/64)
2. Deleted - link local (fe80::/64) route
3. Deleted - multicast (ff00::/8) route
4. Deleted - unicast address (123:2:3:4:5:6:7:8/64)
5. Deleted - link local address
C. Re-configure the unicast IPv6 address (say 123:2:3:4:5:6:7:8/64) on
the interface. (The link-local address will NOT be assigned, as the interface is down.)
You will see the following netlink notifications:
6. Added - unicast address (123:2:3:4:5:6:7:8/64)
7. Added - unicast address connected route (123:2:3:4::/64)
8. Added - multicast (ff00::/8) route
9. Added - link local (fe80::/64) route
etc.
I am not sure why #7, #8 & #9 occurred. It doesn't happen in the case of
IPv4. The routes show up when the interface reaches the up state. Perhaps my
kernel is old and that could be the reason for this behavior.
BTW I am using 2.6.21 with following cherry-picked disable_ipv6 patches:
- ipv6: Add disable_ipv6 sysctl to disable IPv6 operaion on specific
interface(commit:778d80be52699596bf70e0eb0761cf5e1e46088d)
- ipv6: Plug sk_buff leak in ipv6_rcv (net/ipv6/ip6_input.c) (commit:
71f6f6dfdf7c7a67462386d9ea05c1095a89c555)
- IPv6: Add 'autoconf' and 'disable_ipv6' module parameters (ONLY
interface specific behavior)
Thanks very much for your help.
Mahesh
On Tue, Jul 20, 2010 at 4:48 PM, David Miller <davem@davemloft.net> wrote:
> From: Brian Haley <brian.haley@hp.com>
> Date: Tue, 20 Jul 2010 16:34:30 -0400
>
>> I believe the easiest way to fix this is the following patch, can
>> you please test it?
> ...
>> If the interface has IPv6 disabled, don't add a multicast or
>> link-local route since we won't be adding a link-local address.
>>
>> Reported-by: Mahesh Kelkar <maheshkelkar@gmail.com>
>> Signed-off-by: Brian Haley <brian.haley@hp.com>
>
> This looks good to me, let me know when it has been tested.
>
* [PATCH nf-next-2.6] netfilter: add xt_cpu match
From: Eric Dumazet @ 2010-07-22 14:03 UTC
To: Patrick McHardy; +Cc: Netfilter Development Mailinglist, netdev
This match is a bit strange, being packet content agnostic...
Still, in some situations a CPU match permits a better spreading of
connections, or selecting targets only for a given cpu.
With Receive Packet Steering (RPS) or a multiqueue NIC and appropriate IRQ
affinities, we can distribute traffic on available cpus, per session
(all RX packets for a given flow are handled by a given cpu).
Some legacy applications are not SMP friendly, so one way to scale a
server is to run multiple copies of them.
Instead of randomly choosing an instance, we can use the cpu number as a
key so that the softirq handler for a whole instance runs on a single
cpu, maximizing cache effects in the TCP/UDP stacks.
Using NAT for example, a four-way machine might run four copies of
a server application, using a separate listening port for each instance,
but still presenting a unique external port:
iptables -t nat -A PREROUTING -p tcp --dport 80 -m cpu --cpu 0 \
-j REDIRECT --to-port 8080
iptables -t nat -A PREROUTING -p tcp --dport 80 -m cpu --cpu 1 \
-j REDIRECT --to-port 8081
iptables -t nat -A PREROUTING -p tcp --dport 80 -m cpu --cpu 2 \
-j REDIRECT --to-port 8082
iptables -t nat -A PREROUTING -p tcp --dport 80 -m cpu --cpu 3 \
-j REDIRECT --to-port 8083
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
include/linux/netfilter/Kbuild | 1
include/linux/netfilter/xt_cpu.h | 8 +++
net/netfilter/Kconfig | 9 ++++
net/netfilter/Makefile | 1
net/netfilter/xt_cpu.c | 65 +++++++++++++++++++++++++++++
5 files changed, 84 insertions(+)
diff --git a/include/linux/netfilter/Kbuild b/include/linux/netfilter/Kbuild
index bb103f4..5c39a56 100644
--- a/include/linux/netfilter/Kbuild
+++ b/include/linux/netfilter/Kbuild
@@ -34,6 +34,7 @@ header-y += xt_helper.h
header-y += xt_length.h
header-y += xt_limit.h
header-y += xt_mac.h
+header-y += xt_cpu.h
header-y += xt_mark.h
header-y += xt_multiport.h
header-y += xt_osf.h
diff --git a/include/linux/netfilter/xt_cpu.h b/include/linux/netfilter/xt_cpu.h
index e69de29..fdf4202 100644
--- a/include/linux/netfilter/xt_cpu.h
+++ b/include/linux/netfilter/xt_cpu.h
@@ -0,0 +1,8 @@
+#ifndef _XT_CPU_H
+#define _XT_CPU_H
+
+struct xt_cpu_info {
+ unsigned int cpu;
+ int invert;
+};
+#endif /* _XT_CPU_H */
diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig
index aa2f106..85b07bd 100644
--- a/net/netfilter/Kconfig
+++ b/net/netfilter/Kconfig
@@ -754,6 +754,15 @@ config NETFILTER_XT_MATCH_MAC
To compile it as a module, choose M here. If unsure, say N.
+config NETFILTER_XT_MATCH_CPU
+ tristate '"cpu" match support'
+ depends on NETFILTER_ADVANCED
+ help
+ CPU matching allows you to match packets based on the CPU
+ currently handling the packet.
+
+ To compile it as a module, choose M here. If unsure, say N.
+
config NETFILTER_XT_MATCH_MARK
tristate '"mark" match support'
depends on NETFILTER_ADVANCED
diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile
index e28420a..0fe7efd 100644
--- a/net/netfilter/Makefile
+++ b/net/netfilter/Makefile
@@ -79,6 +79,7 @@ obj-$(CONFIG_NETFILTER_XT_MATCH_IPRANGE) += xt_iprange.o
obj-$(CONFIG_NETFILTER_XT_MATCH_LENGTH) += xt_length.o
obj-$(CONFIG_NETFILTER_XT_MATCH_LIMIT) += xt_limit.o
obj-$(CONFIG_NETFILTER_XT_MATCH_MAC) += xt_mac.o
+obj-$(CONFIG_NETFILTER_XT_MATCH_CPU) += xt_cpu.o
obj-$(CONFIG_NETFILTER_XT_MATCH_MULTIPORT) += xt_multiport.o
obj-$(CONFIG_NETFILTER_XT_MATCH_OSF) += xt_osf.o
obj-$(CONFIG_NETFILTER_XT_MATCH_OWNER) += xt_owner.o
diff --git a/net/netfilter/xt_cpu.c b/net/netfilter/xt_cpu.c
index e69de29..23d5a76 100644
--- a/net/netfilter/xt_cpu.c
+++ b/net/netfilter/xt_cpu.c
@@ -0,0 +1,65 @@
+/* Kernel module to match running CPU */
+
+/*
+ * Might be used to distribute connections on several daemons, if
+ * RPS (Receive Packet Steering) is enabled or NIC is multiqueue capable,
+ * each RX queue IRQ affined to one CPU (1:1 mapping)
+ *
+ * iptables -t nat -A PREROUTING -p tcp --dport 80 -m cpu --cpu 0 -j REDIRECT --to-port 8080
+ * iptables -t nat -A PREROUTING -p tcp --dport 80 -m cpu --cpu 1 -j REDIRECT --to-port 8081
+ * iptables -t nat -A PREROUTING -p tcp --dport 80 -m cpu --cpu 2 -j REDIRECT --to-port 8082
+ * iptables -t nat -A PREROUTING -p tcp --dport 80 -m cpu --cpu 3 -j REDIRECT --to-port 8083
+ *
+ */
+
+/* (C) 2010 Eric Dumazet
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/module.h>
+#include <linux/skbuff.h>
+#include <linux/netfilter/xt_cpu.h>
+#include <linux/netfilter/x_tables.h>
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Eric Dumazet <eric.dumazet@gmail.com>");
+MODULE_DESCRIPTION("Xtables: CPU match");
+
+/*
+ * Yes, packet content is not interesting for us, we only take care
+ * of cpu handling this packet
+ */
+static bool cpu_mt(const struct sk_buff *skb, struct xt_action_param *par)
+{
+ const struct xt_cpu_info *info = par->matchinfo;
+ bool ret;
+
+ ret = info->cpu == smp_processor_id();
+ ret ^= info->invert;
+ return ret;
+}
+
+static struct xt_match cpu_mt_reg __read_mostly = {
+ .name = "cpu",
+ .revision = 0,
+ .family = NFPROTO_UNSPEC,
+ .match = cpu_mt,
+ .matchsize = sizeof(struct xt_cpu_info),
+ .me = THIS_MODULE,
+};
+
+static int __init cpu_mt_init(void)
+{
+ return xt_register_match(&cpu_mt_reg);
+}
+
+static void __exit cpu_mt_exit(void)
+{
+ xt_unregister_match(&cpu_mt_reg);
+}
+
+module_init(cpu_mt_init);
+module_exit(cpu_mt_exit);
* Re: [PATCH net-next] sysfs: add attribute to indicate hw address assignment type
From: Ben Hutchings @ 2010-07-22 14:07 UTC
To: Stefan Assmann
Cc: David Miller, abadea, netdev, linux-kernel, gospo, gregory.v.rose,
alexander.h.duyck, leedom, harald
In-Reply-To: <4C483E8D.4080300@redhat.com>
On Thu, 2010-07-22 at 14:50 +0200, Stefan Assmann wrote:
> On 21.07.2010 15:54, Ben Hutchings wrote:
> > On Wed, 2010-07-21 at 10:10 +0200, Stefan Assmann wrote:
> >> I put Alex' idea into code for further discussion, keeping the names
> >> mentioned here until we agree on the scope of this attribute. When we
> >> have settled I'll post a patch with proper patch description.
> > [...]
> >
> > Just a little nitpick: I think it would be clearer to use a more
> > specific term like 'address source' or 'address assignment type' rather
> > than 'address type'.
>
> Here's a proposal for the final patch.
Looks good, but...
[...]
> /**
> + * dev_hw_addr_random - Create random MAC and set device flag
> + * @dev: pointer to net_device structure
> + * @hwaddr: Pointer to a six-byte array containing the Ethernet address
> + *
> + * Generate random MAC to be used by a device and set addr_assign_type
> + * so the state can be read by sysfs and be used by udev.
> + */
> +static inline void dev_hw_addr_random(struct net_device *dev, u8 *hwaddr)
> +{
> + dev->addr_assign_type |= NET_ADDR_RANDOM;
> + random_ether_addr(hwaddr);
> +}
[...]
...why '|=' and not '='?
Ben.
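To make the question concrete, here is a small sketch of how the two operators differ for the values proposed in the patch. It assumes the field could already hold a non-zero type when the helper runs, which the thread does not confirm; the function names are illustrative only.

```c
#include <assert.h>

/* Values as proposed in the patch under review. */
#define NET_ADDR_PERM   0
#define NET_ADDR_RANDOM 1
#define NET_ADDR_STOLEN 2

/* What the helper in the patch does to an existing type value. */
static unsigned char set_random_with_or(unsigned char type)
{
	assert(type <= NET_ADDR_STOLEN);
	return (unsigned char)(type | NET_ADDR_RANDOM);
}

/* The alternative raised in the question: plain assignment. */
static unsigned char set_random_with_assign(unsigned char type)
{
	assert(type <= NET_ADDR_STOLEN);
	return NET_ADDR_RANDOM;
}
```

With '|=', a device already marked NET_ADDR_STOLEN ends up with 3, which is not one of the three defined values; with '=', the result is always NET_ADDR_RANDOM.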
--
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.
* [RFC 0/1] netfilter: xtables: xt_condition inclusion with namespace fix
From: Luciano Coelho @ 2010-07-22 14:09 UTC
To: netfilter-devel; +Cc: netdev, kaber, jengelh, sameo
Hi,
This is a respin of the patch Jan sent to the list some time ago. I've made
the changes proposed by Patrick in order to support multiple namespaces
correctly.
I still need to reapply my condition target and the u32 changes to the
condition on top of this, but I'd like to get some comments before I continue.
Please let me know how this looks.
Cheers,
Luca.
Luciano Coelho (1):
netfilter: xtables: inclusion of xt_condition
include/linux/netfilter/Kbuild | 1 +
include/linux/netfilter/xt_condition.h | 14 ++
net/netfilter/Kconfig | 8 +
net/netfilter/Makefile | 1 +
net/netfilter/xt_condition.c | 294 ++++++++++++++++++++++++++++++++
5 files changed, 318 insertions(+), 0 deletions(-)
create mode 100644 include/linux/netfilter/xt_condition.h
create mode 100644 net/netfilter/xt_condition.c
* [RFC 1/1] netfilter: xtables: inclusion of xt_condition
From: Luciano Coelho @ 2010-07-22 14:09 UTC (permalink / raw)
To: netfilter-devel; +Cc: netdev, kaber, jengelh, sameo
In-Reply-To: <1279807758-6876-1-git-send-email-luciano.coelho@nokia.com>
xt_condition can be used by userspace to influence decisions in rules
by means of togglable variables without having to reload the entire
ruleset.
This is a respin of the module in Xtables-addons, with support for
multiple namespaces and other small improvements.
Cc: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Luciano Coelho <luciano.coelho@nokia.com>
---
include/linux/netfilter/Kbuild | 1 +
include/linux/netfilter/xt_condition.h | 14 ++
net/netfilter/Kconfig | 8 +
net/netfilter/Makefile | 1 +
net/netfilter/xt_condition.c | 294 ++++++++++++++++++++++++++++++++
5 files changed, 318 insertions(+), 0 deletions(-)
create mode 100644 include/linux/netfilter/xt_condition.h
create mode 100644 net/netfilter/xt_condition.c
diff --git a/include/linux/netfilter/Kbuild b/include/linux/netfilter/Kbuild
index bb103f4..d873f67 100644
--- a/include/linux/netfilter/Kbuild
+++ b/include/linux/netfilter/Kbuild
@@ -20,6 +20,7 @@ header-y += xt_TCPOPTSTRIP.h
header-y += xt_TEE.h
header-y += xt_TPROXY.h
header-y += xt_comment.h
+header-y += xt_condition.h
header-y += xt_connbytes.h
header-y += xt_connlimit.h
header-y += xt_connmark.h
diff --git a/include/linux/netfilter/xt_condition.h b/include/linux/netfilter/xt_condition.h
new file mode 100644
index 0000000..4faf3ca
--- /dev/null
+++ b/include/linux/netfilter/xt_condition.h
@@ -0,0 +1,14 @@
+#ifndef _XT_CONDITION_H
+#define _XT_CONDITION_H
+
+#include <linux/types.h>
+
+struct xt_condition_mtinfo {
+ char name[31];
+ __u8 invert;
+
+ /* Used internally by the kernel */
+ void *condvar __attribute__((aligned(8)));
+};
+
+#endif /* _XT_CONDITION_H */
diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig
index aa2f106..8c114b8 100644
--- a/net/netfilter/Kconfig
+++ b/net/netfilter/Kconfig
@@ -605,6 +605,14 @@ config NETFILTER_XT_MATCH_COMMENT
If you want to compile it as a module, say M here and read
<file:Documentation/kbuild/modules.txt>. If unsure, say `N'.
+config NETFILTER_XT_MATCH_CONDITION
+ tristate '"condition" match support'
+ depends on NETFILTER_ADVANCED
+ depends on PROC_FS
+ ---help---
+ This option allows you to match firewall rules against condition
+ variables stored in the /proc/net/nf_condition directory.
+
config NETFILTER_XT_MATCH_CONNBYTES
tristate '"connbytes" per-connection counter match support'
depends on NF_CONNTRACK
diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile
index e28420a..474dd06 100644
--- a/net/netfilter/Makefile
+++ b/net/netfilter/Makefile
@@ -66,6 +66,7 @@ obj-$(CONFIG_NETFILTER_XT_TARGET_IDLETIMER) += xt_IDLETIMER.o
# matches
obj-$(CONFIG_NETFILTER_XT_MATCH_CLUSTER) += xt_cluster.o
obj-$(CONFIG_NETFILTER_XT_MATCH_COMMENT) += xt_comment.o
+obj-$(CONFIG_NETFILTER_XT_MATCH_CONDITION) += xt_condition.o
obj-$(CONFIG_NETFILTER_XT_MATCH_CONNBYTES) += xt_connbytes.o
obj-$(CONFIG_NETFILTER_XT_MATCH_CONNLIMIT) += xt_connlimit.o
obj-$(CONFIG_NETFILTER_XT_MATCH_CONNTRACK) += xt_conntrack.o
diff --git a/net/netfilter/xt_condition.c b/net/netfilter/xt_condition.c
new file mode 100644
index 0000000..162aa60
--- /dev/null
+++ b/net/netfilter/xt_condition.c
@@ -0,0 +1,294 @@
+/*
+ * "condition" match extension for Xtables
+ *
+ * Description: This module allows firewall rules to match using
+ * condition variables available through procfs.
+ *
+ * Authors:
+ * Stephane Ouellette <ouellettes@videotron.ca>, 2002-10-22
+ * Massimiliano Hofer <max@nucleus.it>, 2006-05-15
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License; either version 2
+ * or 3 of the License, as published by the Free Software Foundation.
+ */
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+#include <linux/kernel.h>
+#include <linux/list.h>
+#include <linux/module.h>
+#include <linux/proc_fs.h>
+#include <linux/spinlock.h>
+#include <linux/string.h>
+#include <linux/version.h>
+#include <linux/nsproxy.h>
+#include <linux/netfilter/x_tables.h>
+#include <linux/netfilter/xt_condition.h>
+#include <net/netns/generic.h>
+#include <asm/uaccess.h>
+
+MODULE_AUTHOR("Stephane Ouellette <ouellettes@videotron.ca>");
+MODULE_AUTHOR("Massimiliano Hofer <max@nucleus.it>");
+MODULE_AUTHOR("Jan Engelhardt <jengelh@medozas.de>");
+MODULE_DESCRIPTION("Allows rules to match against condition variables");
+MODULE_LICENSE("GPL");
+MODULE_PARM_DESC(condition_list_perms, "default permissions on /proc/net/nf_condition/* files");
+MODULE_PARM_DESC(condition_uid_perms, "default user owner of /proc/net/nf_condition/* files");
+MODULE_PARM_DESC(condition_gid_perms, "default group owner of /proc/net/nf_condition/* files");
+MODULE_ALIAS("ipt_condition");
+MODULE_ALIAS("ip6t_condition");
+
+struct condition_variable {
+ struct list_head list;
+ struct proc_dir_entry *status_proc;
+ unsigned int refcount;
+ bool enabled;
+};
+
+struct condition_net {
+ struct list_head list;
+ struct proc_dir_entry *proc_dir;
+ unsigned int list_perms;
+ unsigned int uid_perms;
+ unsigned int gid_perms;
+};
+
+static int condition_net_id;
+static inline struct condition_net *condition_pernet(struct net *net)
+{
+ return net_generic(net, condition_net_id);
+}
+
+/* proc_lock is a user context only semaphore used for write access */
+/* to the conditions' list. */
+static DEFINE_MUTEX(proc_lock);
+
+static int condition_proc_read(char __user *buffer, char **start, off_t offset,
+ int length, int *eof, void *data)
+{
+ const struct condition_variable *var = data;
+
+ buffer[0] = var->enabled ? '1' : '0';
+ buffer[1] = '\n';
+ if (length >= 2)
+ *eof = true;
+ return 2;
+}
+
+static int condition_proc_write(struct file *file, const char __user *buffer,
+ unsigned long length, void *data)
+{
+ struct condition_variable *var = data;
+ char newval;
+
+ if (length > 0) {
+ if (get_user(newval, buffer) != 0)
+ return -EFAULT;
+ /* Match only on the first character */
+ switch (newval) {
+ case '0':
+ var->enabled = false;
+ break;
+ case '1':
+ var->enabled = true;
+ break;
+ }
+ }
+ return length;
+}
+
+static bool
+condition_mt(const struct sk_buff *skb, struct xt_action_param *par)
+{
+ const struct xt_condition_mtinfo *info = par->matchinfo;
+ const struct condition_variable *var = info->condvar;
+
+ return var->enabled ^ info->invert;
+}
+
+static int condition_mt_check(const struct xt_mtchk_param *par)
+{
+ struct xt_condition_mtinfo *info = par->matchinfo;
+ struct condition_variable *var;
+ struct condition_net *cond_net =
+ condition_pernet(current->nsproxy->net_ns);
+
+ /* Forbid certain names */
+ if (*info->name == '\0' || *info->name == '.' ||
+ info->name[sizeof(info->name)-1] != '\0' ||
+ memchr(info->name, '/', sizeof(info->name)) != NULL) {
+ pr_info("name not allowed or too long: \"%.*s\"\n",
+ (unsigned int)sizeof(info->name), info->name);
+ return -EINVAL;
+ }
+
+ /*
+ * Let's acquire the lock, check for the condition and add it
+ * or increase the reference counter.
+ */
+ mutex_lock(&proc_lock);
+ list_for_each_entry(var, &cond_net->list, list) {
+ if (strcmp(info->name, var->status_proc->name) == 0) {
+ ++var->refcount;
+ mutex_unlock(&proc_lock);
+ info->condvar = var;
+ return 0;
+ }
+ }
+
+ /* At this point, we need to allocate a new condition variable. */
+ var = kmalloc(sizeof(struct condition_variable), GFP_KERNEL);
+ if (var == NULL) {
+ mutex_unlock(&proc_lock);
+ return -ENOMEM;
+ }
+
+ /* Create the condition variable's proc file entry. */
+ var->status_proc = create_proc_entry(info->name,
+ cond_net->list_perms,
+ cond_net->proc_dir);
+ if (var->status_proc == NULL) {
+ kfree(var);
+ mutex_unlock(&proc_lock);
+ return -ENOMEM;
+ }
+
+ var->refcount = 1;
+ var->enabled = false;
+ var->status_proc->data = var;
+ var->status_proc->read_proc = condition_proc_read;
+ var->status_proc->write_proc = condition_proc_write;
+ var->status_proc->uid = cond_net->uid_perms;
+ var->status_proc->gid = cond_net->gid_perms;
+ list_add(&var->list, &cond_net->list);
+ mutex_unlock(&proc_lock);
+ info->condvar = var;
+ return 0;
+}
+
+static void condition_mt_destroy(const struct xt_mtdtor_param *par)
+{
+ const struct xt_condition_mtinfo *info = par->matchinfo;
+ struct condition_variable *var = info->condvar;
+ struct condition_net *cond_net =
+ condition_pernet(current->nsproxy->net_ns);
+
+ mutex_lock(&proc_lock);
+ if (--var->refcount == 0) {
+ list_del(&var->list);
+ remove_proc_entry(var->status_proc->name,
+ cond_net->proc_dir);
+ mutex_unlock(&proc_lock);
+ kfree(var);
+ return;
+ }
+ mutex_unlock(&proc_lock);
+}
+
+static struct xt_match condition_mt_reg __read_mostly = {
+ .name = "condition",
+ .revision = 1,
+ .family = NFPROTO_UNSPEC,
+ .matchsize = sizeof(struct xt_condition_mtinfo),
+ .match = condition_mt,
+ .checkentry = condition_mt_check,
+ .destroy = condition_mt_destroy,
+ .me = THIS_MODULE,
+};
+
+static const char *const dir_name = "nf_condition";
+
+static int __net_init condnet_mt_init(struct net *net)
+{
+ struct condition_net *cond_net = condition_pernet(net);
+
+ INIT_LIST_HEAD(&cond_net->list);
+ cond_net->list_perms = S_IRUSR | S_IWUSR;
+ cond_net->uid_perms = S_IRUSR | S_IWUSR;
+ cond_net->gid_perms = S_IRUSR | S_IWUSR;
+
+ cond_net->proc_dir = proc_mkdir(dir_name, net->proc_net);
+
+ return (cond_net->proc_dir == NULL) ? -EACCES : 0;
+}
+
+static void __net_exit condnet_mt_exit(struct net *net)
+{
+ remove_proc_entry(dir_name, net->proc_net);
+}
+
+static struct pernet_operations condition_mt_netops = {
+ .init = condnet_mt_init,
+ .exit = condnet_mt_exit,
+ .id = &condition_net_id,
+ .size = sizeof(struct condition_net),
+};
+
+static int __init condition_mt_init(void)
+{
+ int ret;
+
+ mutex_init(&proc_lock);
+ ret = xt_register_match(&condition_mt_reg);
+ if (ret < 0)
+ return ret;
+
+ ret = register_pernet_subsys(&condition_mt_netops);
+ if (ret < 0) {
+ xt_unregister_match(&condition_mt_reg);
+ return ret;
+ }
+
+ return 0;
+}
+
+static void __exit condition_mt_exit(void)
+{
+ unregister_pernet_subsys(&condition_mt_netops);
+ xt_unregister_match(&condition_mt_reg);
+}
+
+int xt_condition_set_module_perms(const char *val, struct kernel_param *kp)
+{
+ unsigned long l;
+ int ret;
+ struct condition_net *cond_net =
+ condition_pernet(current->nsproxy->net_ns);
+
+ if (!val) return -EINVAL;
+ ret = strict_strtoul(val, 0, &l);
+ if (ret == -EINVAL || ((uint)l != l))
+ return -EINVAL;
+ *((u32 *) ((u8 *) cond_net + (size_t) kp->arg)) = l;
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(xt_condition_set_module_perms);
+
+int xt_condition_get_module_perms(char *buffer, struct kernel_param *kp)
+{
+ struct condition_net *cond_net =
+ condition_pernet(current->nsproxy->net_ns);
+
+ return sprintf(buffer, "%u",
+ *((u32 *) ((u8 *) cond_net + (size_t) kp->arg)));
+}
+EXPORT_SYMBOL_GPL(xt_condition_get_module_perms);
+
+module_param_call(list_perms,
+ xt_condition_set_module_perms,
+ xt_condition_get_module_perms,
+ (void *) offsetof(struct condition_net, list_perms),
+ 0600);
+module_param_call(uid_perms,
+ xt_condition_set_module_perms,
+ xt_condition_get_module_perms,
+ (void *) offsetof(struct condition_net, uid_perms),
+ 0600);
+module_param_call(gid_perms,
+ xt_condition_set_module_perms,
+ xt_condition_get_module_perms,
+ (void *) offsetof(struct condition_net, gid_perms),
+ 0600);
+
+module_init(condition_mt_init);
+module_exit(condition_mt_exit);
--
1.7.0.4
* Re: [PATCH] Driver-core: Fix bluetooth network device rename regression
From: Johannes Berg @ 2010-07-22 14:10 UTC (permalink / raw)
To: Kay Sievers
Cc: Eric W. Biederman, Greg KH, Andrew Morton, Greg KH,
Rafael J. Wysocki, Maciej W. Rozycki, netdev
In-Reply-To: <AANLkTimM5Ea8mQ7aX4kDq3dgF3P-t2Wm3dERhckC69Ja@mail.gmail.com>
On Thu, 2010-07-22 at 15:38 +0200, Kay Sievers wrote:
> Please try to fix these drivers instead, or mark the broken for
> namespaces, if nobody can fix them right now.
We've tried. Nobody, including you, has been able to suggest how to fix
it. And it's not just broken with network namespaces enabled either, as
Eric explained. I really don't see why you keep asking us to fix it when
clearly we cannot -- even you don't know how and you certainly have more
insight into the device model than we do.
johannes
* Re: [PATCH 1/1] Bluetooth: hidp: Add support for hidraw HIDIOCGFEATURE and HIDIOCSFEATURE
From: Jiri Kosina @ 2010-07-22 14:14 UTC (permalink / raw)
To: Marcel Holtmann
Cc: Alan Ott, David S Miller, Michael Poole, Bastien Nocera,
Eric Dumazet, linux-bluetooth, linux-kernel, netdev
In-Reply-To: <1278696815.10421.137.camel@localhost.localdomain>
On Fri, 9 Jul 2010, Marcel Holtmann wrote:
> > >>> what is usb-hid.ko doing here? I would expect a bunch of code
> > >>> duplication with minor difference between USB and Bluetooth.
> > >>>
> > >> usbhid doesn't have a lot of code for hidraw. Two functions are involved:
> > >> usbhid_output_raw_report()
> > >> - calls usb_control_msg() with Get_Report
> > >> usbhid_get_raw_report()
> > >> - calls usb_control_msg() with Set_Report
> > >> OR
> > >> - calls usb_interrupt_msg() on the Ouput pipe.
> > >>
> > >> This is of course easier than bluetooth because usb_control_msg() is
> > >> synchronous, even when requesting reports, mostly because of the nature
> > >> of USB, where the request and response are part of the same transfer.
> > >>
> > >> For Bluetooth, it's a bit more complicated since the kernel treats it
> > >> more like a networking interface (and indeed it is). My understanding is
> > >> that to make a synchronous transfer in bluetooth, one must:
> > >> - send the request packet
> > >> - block (wait_event_*())
> > >> - when the response is received in the input handler, wake_up_*().
> > >>
> > >> There's not really any code duplication, mostly because initiating
> > >> synchronous USB transfers (input and output) is easy (because of the
> > >> usb_*_msg() functions), while making synchronous Bluetooth transfers
> > >> must be done manually. If there's a nice, convenient, synchronous
> > >> function in Bluetooth similar to usb_control_msg() that I've missed,
> > >> then let me know, as it would simplify this whole thing.
> > >>
> > > there is not and I don't think we ever get one. My question here was
> > > more in the direction why HID core is doing these synchronously in the
> > > first place. Especially since USB can do everything async as well.
> >
> > I'm open to suggestions. The way I see it is from a user space
> > perspective. With Get_Feature being on an ioctl(), I don't see any clean
> > way to do it other than synchronously. Other operating systems (I can
> > say for sure Windows, Mac OS X, and FreeBSD) handle Get/Set Feature the
> > same way (synchronously) from user space.
> >
> > You seem to be proposing an asynchronous interface. What would that look
> > like from user space?
>
> not necessarily from user space, but at least from HID core to HIDP and
> usb-hid transports. At least that is what I would expect, Jiri?
Sorry for taking so long to respond (vacations, conferences, you name
it).
As all the _raw() callbacks are purely intended for userspace interaction
anyway, it's perfectly fine (and in fact desirable) for the low-level
transport drivers to perform these operations synchronously (and that's
what USB implementation does as well).
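For illustration only, the send / block / wake-up pattern described in the
quoted text can be sketched in userspace C with a pthread condition variable
standing in for wait_event_*() and wake_up_*(). This is not the HIDP code;
struct sync_req and the sync_req_*() helpers are hypothetical names, and a
real implementation would also want a timeout and signal handling.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical request slot shared by the caller and the input handler. */
struct sync_req {
	pthread_mutex_t lock;
	pthread_cond_t  done;
	bool            have_reply;
	unsigned char   reply[64];
	size_t          reply_len;
};

void sync_req_init(struct sync_req *r)
{
	pthread_mutex_init(&r->lock, NULL);
	pthread_cond_init(&r->done, NULL);
	r->have_reply = false;
	r->reply_len = 0;
}

/* Input-handler side: store the response and wake the waiter. */
void sync_req_complete(struct sync_req *r, const void *data, size_t len)
{
	pthread_mutex_lock(&r->lock);
	if (len > sizeof(r->reply))
		len = sizeof(r->reply);
	memcpy(r->reply, data, len);
	r->reply_len = len;
	r->have_reply = true;
	pthread_cond_signal(&r->done);
	pthread_mutex_unlock(&r->lock);
}

/* Caller side: after sending the request, block until the reply is in. */
size_t sync_req_wait(struct sync_req *r, void *buf, size_t buflen)
{
	size_t n;

	pthread_mutex_lock(&r->lock);
	while (!r->have_reply)
		pthread_cond_wait(&r->done, &r->lock);
	n = r->reply_len < buflen ? r->reply_len : buflen;
	memcpy(buf, r->reply, n);
	r->have_reply = false;
	pthread_mutex_unlock(&r->lock);
	return n;
}
```

The ioctl() path would call something like sync_req_wait() right after
queuing the request, which is what makes the operation look synchronous from
user space.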
Marcel, if your opposition to a synchronous interface is strong, we'll have
to think about other approaches, but from my POV, the patch is fine as-is
for Bluetooth.
Thanks,
--
Jiri Kosina
SUSE Labs, Novell Inc.
* Re: [PATCH nf-next-2.6] netfilter: add xt_cpu match
From: Jan Engelhardt @ 2010-07-22 14:19 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Patrick McHardy, Netfilter Development Mailinglist, netdev
In-Reply-To: <1279807385.2467.67.camel@edumazet-laptop>
On Thursday 2010-07-22 16:03, Eric Dumazet wrote:
>This match is a bit strange, being packet content agnostic...
>+/*
>+ * Yes, packet content is not interesting for us, we only take care
>+ * of cpu handling this packet
>+ */
That is not so strange after all; we have many packet-agnostic matches:
xt_time, xt_condition, xt_IDLETIMER, xt_iface.
So this little comment looks a bit redundant.
Or it seems that academia can't come up with new protocols fast enough, so
we have to resort to doing -m coffeemaker :)
>@@ -0,0 +1,8 @@
>+#ifndef _XT_CPU_H
>+#define _XT_CPU_H
>+
>+struct xt_cpu_info {
>+ unsigned int cpu;
>+ int invert;
>+};
>+#endif /*_XT_MAC_H*/
Please have a read of the "Writing Netfilter Modules" e-book :-)
It will tell you that types other than fixed-width ones are a no-no.
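As a rough sketch of what that rule asks for, here is a userspace mirror of
the structure using fixed-width types. The name xt_cpu_info_fixed is made up
for this example, and stdint's uint32_t stands in for the __u32 a kernel
header would use.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the match info with fixed-width fields. */
struct xt_cpu_info_fixed {
	uint32_t cpu;
	uint32_t invert;
};

/* The layout is spelled out by the types, not by sizeof(int). */
_Static_assert(sizeof(struct xt_cpu_info_fixed) == 8, "unexpected padding");
_Static_assert(offsetof(struct xt_cpu_info_fixed, invert) == 4,
	       "unexpected offset");

/* Helper so the size can be checked at run time as well. */
size_t xt_cpu_info_fixed_size(void)
{
	return sizeof(struct xt_cpu_info_fixed);
}
```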
>diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile
>index e28420a..0fe7efd 100644
>--- a/net/netfilter/Makefile
>+++ b/net/netfilter/Makefile
>@@ -79,6 +79,7 @@ obj-$(CONFIG_NETFILTER_XT_MATCH_IPRANGE) += xt_iprange.o
> obj-$(CONFIG_NETFILTER_XT_MATCH_LENGTH) += xt_length.o
> obj-$(CONFIG_NETFILTER_XT_MATCH_LIMIT) += xt_limit.o
> obj-$(CONFIG_NETFILTER_XT_MATCH_MAC) += xt_mac.o
>+obj-$(CONFIG_NETFILTER_XT_MATCH_CPU) += xt_cpu.o
> obj-$(CONFIG_NETFILTER_XT_MATCH_MULTIPORT) += xt_multiport.o
> obj-$(CONFIG_NETFILTER_XT_MATCH_OSF) += xt_osf.o
> obj-$(CONFIG_NETFILTER_XT_MATCH_OWNER) += xt_owner.o
Try to keep it alphabetical (Kconfig too).
>+ *
>+ * iptables -t nat -A PREROUTING -p tcp --dport 80 -m cpu --cpu 0 -j REDIRECT --to-port 8080
>+ * iptables -t nat -A PREROUTING -p tcp --dport 80 -m cpu --cpu 1 -j REDIRECT --to-port 8081
>+ * iptables -t nat -A PREROUTING -p tcp --dport 80 -m cpu --cpu 2 -j REDIRECT --to-port 8082
>+ * iptables -t nat -A PREROUTING -p tcp --dport 80 -m cpu --cpu 3 -j REDIRECT --to-port 8083
>+ *
>+ */
Well, you have already presented these commands in the commit log, and the
most useful place for them is actually the manpage.
>+static bool cpu_mt(const struct sk_buff *skb, struct xt_action_param *par)
>+{
>+ const struct xt_cpu_info *info = par->matchinfo;
>+ bool ret;
>+
>+ ret = info->cpu == smp_processor_id();
>+ ret ^= info->invert;
>+ return ret;
>+}
Looks simple enough to be done in a single line:
return (info->cpu == smp_processor_id()) ^ !!info->invert;
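A small userspace sketch of why the !! matters, assuming invert could ever
hold a value other than 0 or 1. The function names are made up, and cur_cpu
stands in for smp_processor_id().

```c
#include <stdbool.h>

/* One-line form: !! folds any non-zero invert down to 1 before the XOR. */
bool cpu_match(unsigned int want_cpu, int invert, unsigned int cur_cpu)
{
	return (want_cpu == cur_cpu) ^ !!invert;
}

/* Same expression without !!: an invert value such as 2 leaks its
 * extra bits into the result, so a match is no longer inverted. */
bool cpu_match_naive(unsigned int want_cpu, int invert, unsigned int cur_cpu)
{
	return (want_cpu == cur_cpu) ^ invert;
}
```

With invert set to 2 and the packet handled on the wanted CPU, cpu_match()
reports no match (the inverted result), while cpu_match_naive() still
reports a match.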
* Re: [RFC 1/1] netfilter: xtables: inclusion of xt_condition
From: Jan Engelhardt @ 2010-07-22 14:44 UTC (permalink / raw)
To: Luciano Coelho
Cc: Netfilter Developer Mailing List, netdev, Patrick McHardy, sameo,
Alexey Dobriyan
In-Reply-To: <1279807758-6876-2-git-send-email-luciano.coelho@nokia.com>
On Thursday 2010-07-22 16:09, Luciano Coelho wrote:
>+static int condition_mt_check(const struct xt_mtchk_param *par)
>+{
>+ struct xt_condition_mtinfo *info = par->matchinfo;
>+ struct condition_variable *var;
>+ struct condition_net *cond_net =
>+ condition_pernet(current->nsproxy->net_ns);
Cc'ing Alexey who has done the netns support.
Alexey, you added par->net, but given Luciano just did it with
current->nsproxy->net_ns, do we really need par->net?
>+int xt_condition_set_module_perms(const char *val, struct kernel_param *kp)
>+{
>+ unsigned long l;
>+ int ret;
>+ struct condition_net *cond_net =
>+ condition_pernet(current->nsproxy->net_ns);
>+
>+ if (!val) return -EINVAL;
newline before return.
>+ ret = strict_strtoul(val, 0, &l);
>+ if (ret == -EINVAL || ((uint)l != l))
>+ return -EINVAL;
>+ *((u32 *) ((u8 *) cond_net + (size_t) kp->arg)) = l;
I don't think we need this level of granularity; let the options be
global, similar to what xt_hashlimit does.
(I am not even sure if kp->arg can be non-multiples-of-4, in which case
this would be an alignment violation even.)
>+
>+ return 0;
>+}
>+EXPORT_SYMBOL_GPL(xt_condition_set_module_perms);
>+
>+int xt_condition_get_module_perms(char *buffer, struct kernel_param *kp)
>+{
>+ struct condition_net *cond_net =
>+ condition_pernet(current->nsproxy->net_ns);
>+
>+ return sprintf(buffer, "%u",
>+ *((u32 *) ((u8 *) cond_net + (size_t) kp->arg)));
>+}
>+EXPORT_SYMBOL_GPL(xt_condition_get_module_perms);
>+
>+module_param_call(list_perms,
>+ xt_condition_set_module_perms,
>+ xt_condition_get_module_perms,
>+ (void *) offsetof(struct condition_net, list_perms),
>+ 0600);
>+module_param_call(uid_perms,
>+ xt_condition_set_module_perms,
>+ xt_condition_get_module_perms,
>+ (void *) offsetof(struct condition_net, uid_perms),
>+ 0600);
>+module_param_call(gid_perms,
>+ xt_condition_set_module_perms,
>+ xt_condition_get_module_perms,
>+ (void *) offsetof(struct condition_net, gid_perms),
>+ 0600);
>+
>+module_init(condition_mt_init);
>+module_exit(condition_mt_exit);
>--
>1.7.0.4
>
* Re: [PATCH net-next] sysfs: add attribute to indicate hw address assignment type
From: Stefan Assmann @ 2010-07-22 14:47 UTC (permalink / raw)
To: Ben Hutchings
Cc: David Miller, abadea, netdev, linux-kernel, gospo, gregory.v.rose,
alexander.h.duyck, leedom, harald
In-Reply-To: <1279807643.2104.1.camel@achroite.uk.solarflarecom.com>
On 22.07.2010 16:07, Ben Hutchings wrote:
> On Thu, 2010-07-22 at 14:50 +0200, Stefan Assmann wrote:
>> On 21.07.2010 15:54, Ben Hutchings wrote:
>>> On Wed, 2010-07-21 at 10:10 +0200, Stefan Assmann wrote:
>>>> I put Alex' idea into code for further discussion, keeping the names
>>>> mentioned here until we agree on the scope of this attribute. When we
>>>> have settled I'll post a patch with proper patch description.
>>> [...]
>>>
>>> Just a little nitpick: I think it would be clearer to use a more
>>> specific term like 'address source' or 'address assignment type' rather
>>> than 'address type'.
>>
>> Here's a proposal for the final patch.
>
> Looks good, but...
>
> [...]
>> /**
>> + * dev_hw_addr_random - Create random MAC and set device flag
>> + * @dev: pointer to net_device structure
>> + * @addr: Pointer to a six-byte array containing the Ethernet address
>> + *
>> + * Generate random MAC to be used by a device and set addr_assign_type
>> + * so the state can be read by sysfs and be used by udev.
>> + */
>> +static inline void dev_hw_addr_random(struct net_device *dev, u8 *hwaddr)
>> +{
>> + dev->addr_assign_type |= NET_ADDR_RANDOM;
>> + random_ether_addr(hwaddr);
>> +}
> [...]
>
> ...why '|=' and not '='?
The intention is to use addr_assign_type as a bit field.
Okay, it might not make too much sense to 'steal' a random MAC
address, but in case we add more types later it might become useful.
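To make the difference concrete, a minimal sketch with made-up flag values
(EX_ADDR_RANDOM and EX_ADDR_STOLEN are placeholders for this example, not
the constants from the patch):

```c
/* Hypothetical flag bits, for illustration only. */
#define EX_ADDR_RANDOM (1u << 0)
#define EX_ADDR_STOLEN (1u << 1)

/* '|=' semantics: bits that were already set are preserved. */
unsigned int mark_random_or(unsigned int type)
{
	return type | EX_ADDR_RANDOM;
}

/* '=' semantics: whatever was recorded before is overwritten. */
unsigned int mark_random_set(unsigned int type)
{
	(void)type;
	return EX_ADDR_RANDOM;
}
```

Starting from a value that already has the "stolen" bit, the first helper
yields both bits and the second yields only the "random" bit.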
Stefan
--
Stefan Assmann | Red Hat GmbH
Software Engineer | Otto-Hahn-Strasse 20, 85609 Dornach
| HR: Amtsgericht Muenchen HRB 153243
| GF: Brendan Lane, Charlie Peters,
sassmann at redhat.com | Michael Cunningham, Charles Cachera
* Re: [RFC PATCH v3 4/5] skb: add tracepoints to freeing skb
From: Neil Horman @ 2010-07-22 14:57 UTC (permalink / raw)
To: Koki Sanagi
Cc: netdev, linux-kernel, davem, kaneshige.kenji, izumi.taku,
kosaki.motohiro, laijs, scott.a.mcmillan, rostedt, eric.dumazet,
fweisbec, mathieu.desnoyers
In-Reply-To: <4C4803B3.1020808@jp.fujitsu.com>
On Thu, Jul 22, 2010 at 05:39:15PM +0900, Koki Sanagi wrote:
> (2010/07/21 19:56), Neil Horman wrote:
> > On Wed, Jul 21, 2010 at 04:02:57PM +0900, Koki Sanagi wrote:
> >> (2010/07/20 20:50), Neil Horman wrote:
> >>> On Tue, Jul 20, 2010 at 09:49:10AM +0900, Koki Sanagi wrote:
> >>>> [RFC PATCH v3 4/5] skb: add tracepoints to freeing skb
> >>>> This patch adds tracepoint to consume_skb, dev_kfree_skb_irq and
> >>>> skb_free_datagram_locked. Combinating with tracepoint on dev_hard_start_xmit,
> >>>> we can check how long it takes to free transmited packets. And using it, we can
> >>>> calculate how many packets driver had at that time. It is useful when a drop of
> >>>> transmited packet is a problem.
> >>>>
> >>>> <idle>-0 [001] 241409.218333: consume_skb: skbaddr=dd6b2fb8
> >>>> <idle>-0 [001] 241409.490555: dev_kfree_skb_irq: skbaddr=f5e29840
> >>>>
> >>>> udp-recv-302 [001] 515031.206008: skb_free_datagram_locked: skbaddr=f5b1d900
> >>>>
> >>>>
> >>>> Signed-off-by: Koki Sanagi <sanagi.koki@jp.fujitsu.com>
> >>>> ---
> >>>> include/trace/events/skb.h | 42 ++++++++++++++++++++++++++++++++++++++++++
> >>>> net/core/datagram.c | 1 +
> >>>> net/core/dev.c | 2 ++
> >>>> net/core/skbuff.c | 1 +
> >>>> 4 files changed, 46 insertions(+), 0 deletions(-)
> >>>>
> >>>> diff --git a/include/trace/events/skb.h b/include/trace/events/skb.h
> >>>> index 4b2be6d..84c9041 100644
> >>>> --- a/include/trace/events/skb.h
> >>>> +++ b/include/trace/events/skb.h
> >>>> @@ -35,6 +35,48 @@ TRACE_EVENT(kfree_skb,
> >>>> __entry->skbaddr, __entry->protocol, __entry->location)
> >>>> );
> >>>>
> >>>> +DECLARE_EVENT_CLASS(free_skb,
> >>>> +
> >>>> + TP_PROTO(struct sk_buff *skb),
> >>>> +
> >>>> + TP_ARGS(skb),
> >>>> +
> >>>> + TP_STRUCT__entry(
> >>>> + __field( void *, skbaddr )
> >>>> + ),
> >>>> +
> >>>> + TP_fast_assign(
> >>>> + __entry->skbaddr = skb;
> >>>> + ),
> >>>> +
> >>>> + TP_printk("skbaddr=%p", __entry->skbaddr)
> >>>> +
> >>>> +);
> >>>> +
> >>>> +DEFINE_EVENT(free_skb, consume_skb,
> >>>> +
> >>>> + TP_PROTO(struct sk_buff *skb),
> >>>> +
> >>>> + TP_ARGS(skb)
> >>>> +
> >>>> +);
> >>>> +
> >>>> +DEFINE_EVENT(free_skb, dev_kfree_skb_irq,
> >>>> +
> >>>> + TP_PROTO(struct sk_buff *skb),
> >>>> +
> >>>> + TP_ARGS(skb)
> >>>> +
> >>>> +);
> >>>> +
> >>>> +DEFINE_EVENT(free_skb, skb_free_datagram_locked,
> >>>> +
> >>>> + TP_PROTO(struct sk_buff *skb),
> >>>> +
> >>>> + TP_ARGS(skb)
> >>>> +
> >>>> +);
> >>>> +
> >>>
> >>> Why create these last two tracepoints at all? dev_kfree_skb_irq will eventually
> >>> pass through kfree_skb anyway, getting picked up by the tracepoint there, the
> >>> while the latter won't (since it uses __kfree_skb instead), I think that could
> >>> be fixed up by add a call to trace_kfree_skb there directly, saving you two
> >>> tracepoints.
> >>>
> >>> Neil
> >>>
> >> I think dev_kfree_skb_irq isn't chased by trace_kfree_skb or trace_consume_skb
> >> completely. Because net_tx_action frees skb by __kfree_skb. So it is better to
> >> add trace_kfree_skb before it. skb_free_datagram_locked is same.
> >>
> > It isn't, you're right, but that was the point I made above. Those missed areas
> > could be easily handled by adding calls to trace_kfree_skb which already exists,
> > to the missed areas. Then you don't need to create those new tracepoints. The
> > way your doing this, if someone wants to trace all skb frees in debugfs, they
> > would have to enable three tracepoints, not just one. Not that thats the point
> > of your patch, but its something to consider, and it simplifies your code.
> > Neil
> >
>
> O.K. I've re-made a patch to use trace_kfree_skb instead of
> trace_dev_kfree_skb_irq and trace_skb_free_datagram_locked.
> But I've got a problem.
> I should use not __builtin_return_address, but macro or function which returns
> current address. But I don't know any macro like that. Do you know any solution ?
>
Since the trace call is the first thing in the function, why not just pass in
skb_free_datagram_locked as the pointer? That should work out properly.
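Roughly, as a userspace sketch: every ex_* name below is a stand-in invented
for this example (ex_trace_free() for the existing tracepoint,
ex_free_datagram() for skb_free_datagram_locked()), and the location is
modelled as a function pointer rather than the void * the tracepoint takes.

```c
#include <stddef.h>

/* Hypothetical "location" type for this sketch. */
typedef void (*ex_location_t)(void);

static ex_location_t ex_last_location;

/* Stand-in for the tracepoint: records which code path freed the buffer. */
static void ex_trace_free(const void *buf, ex_location_t location)
{
	(void)buf;
	ex_last_location = location;
}

/* Stand-in for the freeing function. */
void ex_free_datagram(void *buf)
{
	/* The trace call comes first, so the function's own address
	 * identifies the location; no "current address" macro needed. */
	ex_trace_free(buf, (ex_location_t)ex_free_datagram);
	/* ... the actual freeing would follow here ... */
}

/* Accessor so the recorded location can be inspected. */
ex_location_t ex_get_last_location(void)
{
	return ex_last_location;
}
```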
Neil
>
* Re: minstrel_tx_status mac80211 WARNINGs in vanilla 2.6.34.1
From: John W. Linville @ 2010-07-22 14:47 UTC (permalink / raw)
To: Sven Geggus; +Cc: netdev, linux-wireless, nbd
In-Reply-To: <20100722123000.GA16657@geggus.net>
On Thu, Jul 22, 2010 at 02:30:01PM +0200, Sven Geggus wrote:
> Hello,
>
> running vanilla 2.6.34.1 I get the following warnings in kernel log:
>
> WARNING: at net/mac80211/rc80211_minstrel.c:70 minstrel_tx_status+0x67/0xd1 [mac80211]()
> Hardware name: SCENIC E300/E600
> Modules linked in: i915 drm_kms_helper drm video backlight output lp loop
> snd_intel8x0 snd_ac97_codec ac97_bus snd_pcm_oss snd_mixer_oss snd_pcm
> snd_seq_dummy zd1211rw snd_seq_oss usbhid mac80211 option cfg80211
> snd_seq_midi usbserial snd_rawmidi snd_seq_midi_event snd_seq snd_timer
> snd_seq_device snd parport_pc ehci_hcd uhci_hcd soundcore intel_agp parport
> usbcore nls_base snd_page_alloc agpgart rng_core floppy sg
> Pid: 0, comm: swapper Tainted: G W 2.6.34.1 #3
> Call Trace:
> [<c102af25>] warn_slowpath_common+0x60/0x90
> [<c102af62>] warn_slowpath_null+0xd/0x10
> [<f83cbd48>] minstrel_tx_status+0x67/0xd1 [mac80211]
> [<f83b6eb1>] ieee80211_tx_status+0x1f6/0x5ac [mac80211]
> [<c1261533>] ? skb_dequeue+0x45/0x4c
> [<f83b6896>] ieee80211_tasklet_handler+0x61/0xd6 [mac80211]
> [<c102ed7d>] tasklet_action+0x62/0x9f
> [<c102f331>] __do_softirq+0x77/0xe5
> [<c102f3c5>] do_softirq+0x26/0x2b
> [<c102f52f>] irq_exit+0x29/0x66
> [<c1003e90>] do_IRQ+0x85/0x9b
> [<c1002d29>] common_interrupt+0x29/0x30
> [<c10083ac>] ? default_idle+0x2d/0x42
> [<c1001a9b>] cpu_idle+0x44/0x71
> [<c12e00de>] rest_init+0x96/0x98
> [<c1498862>] start_kernel+0x2a5/0x2aa
> [<c14980b7>] i386_start_kernel+0xb7/0xbf
> ---[ end trace f22ceacef336878f ]---
>
> Wireless driver is zd1211rw.
>
> Did not test with older kernel because this device has not been in user on
> this machine before.
>
> WLAN does however seem to work anyway.
Well, I just took a quick look -- so, I'm not 100% sure...
But, it looks to me like zd1211rw is reporting tx status on a rate
that minstrel didn't expect it to use. It seems like the hardware
is pre-wired to do retries on sequentially lower rates, which seems
a bit incompatible with minstrel's worldview.
Felix, can we accommodate this? The "WLAN does however seem to work
anyway" seems to suggest things work, so can we at least not yell
about it?
Thanks,
John
--
John W. Linville Someday the world will need a hero, and you
linville@tuxdriver.com might be all we have. Be ready.
* Re: [RFC 1/1] netfilter: xtables: inclusion of xt_condition
From: Luciano Coelho @ 2010-07-22 15:16 UTC (permalink / raw)
To: ext Jan Engelhardt
Cc: Netfilter Developer Mailing List, netdev@vger.kernel.org,
Patrick McHardy, sameo@linux.intel.com, Alexey Dobriyan
In-Reply-To: <alpine.LSU.2.01.1007221621270.1619@obet.zrqbmnf.qr>
On Thu, 2010-07-22 at 16:44 +0200, ext Jan Engelhardt wrote:
> On Thursday 2010-07-22 16:09, Luciano Coelho wrote:
> >+static int condition_mt_check(const struct xt_mtchk_param *par)
> >+{
> >+ struct xt_condition_mtinfo *info = par->matchinfo;
> >+ struct condition_variable *var;
> >+ struct condition_net *cond_net =
> >+ condition_pernet(current->nsproxy->net_ns);
>
> Cc'ing Alexey who has done the netns support.
>
> Alexey, you added par->net, but given Luciano just did it with
> current->nsproxy->net_ns, do we really need par->net?
>
>
> >+int xt_condition_set_module_perms(const char *val, struct kernel_param *kp)
> >+{
> >+ unsigned long l;
> >+ int ret;
> >+ struct condition_net *cond_net =
> >+ condition_pernet(current->nsproxy->net_ns);
> >+
> >+ if (!val) return -EINVAL;
>
> newline before return.
Sure! I copied this from params.c. I'll fix it.
> >+ ret = strict_strtoul(val, 0, &l);
> >+ if (ret == -EINVAL || ((uint)l != l))
> >+ return -EINVAL;
>
> >+ *((u32 *) ((u8 *) cond_net + (size_t) kp->arg)) = l;
>
> I don't think we need this level of granularity; let the options be
> global, similar to what xt_hashlimit does.
I did this according to Patrick's comment:
> > proc_net_condition is a global variable, so this won't work for
> > namespaces. What the code does is reinitialize it when instantiating
> > a new namespace, so it will always point to the last instantiated
> > namespace.
> >
> > The same problem exists for the condition_list, each namespace
> > should only be able to access its own conditions.
>
> This also applies to the permission variables. Basically, we shouldn't
> be having any globals except perhaps the mutex. You probably need a
> module_param_call function to set them for the correct namespace (you
> can access that through current->nsproxy->net_ns).
I found it a bit strange to be able to change the module params on a
per-netns basis, but it is actually possible if you're changing the
parameters via sysfs. I tried it and it even seems to work. ;)
I can't see any module parameters in the xt_hashlimit.c file. Am I
looking in the wrong place?
I would be fine with making the module params global (as they were
before), if that's fine with Patrick too.
> (I am not even sure if kp->arg can be non-multiples-of-4, in which case
> this would be an alignment violation even.)
I'm passing a size_t offset in kp->arg. It looks quite ugly, because
usually kp->arg is a pointer to some data. But at least this way, using
offsetof(), I could avoid lots of repeated code for the options...
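For readers following along, here is a minimal user-space sketch of the offsetof() approach described above. The struct layout, the field names and the set_u32_param() helper are hypothetical stand-ins (the patch itself uses struct condition_net, kp->arg and strict_strtoul()); standard strtoul() is substituted so the sketch builds outside the kernel. It also rejects misaligned offsets, which addresses the alignment concern raised above.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-namespace state; field names are illustrative only. */
struct cond_net_sketch {
	uint32_t list_perms;
	uint32_t uid_perms;
	uint32_t gid_perms;
};

/*
 * Generic setter: 'offset' plays the role of kp->arg and selects which
 * u32 member of 'net' receives the parsed value.
 */
static int set_u32_param(struct cond_net_sketch *net, size_t offset,
			 const char *val)
{
	unsigned long l;
	char *end;

	if (!val)
		return -EINVAL;
	/* Members are u32, so the offset must be 4-byte aligned and in range. */
	if (offset % sizeof(uint32_t) ||
	    offset > sizeof(*net) - sizeof(uint32_t))
		return -EINVAL;

	errno = 0;
	l = strtoul(val, &end, 0);
	if (errno || end == val || *end != '\0' || l > UINT32_MAX)
		return -EINVAL;

	*(uint32_t *)((uint8_t *)net + offset) = (uint32_t)l;
	return 0;
}
```

A caller would pass offsetof(struct cond_net_sketch, uid_perms) as the offset, so one setter can serve every option.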
--
Cheers,
Luca.
* Re: [PATCH nf-next-2.6] netfilter: add xt_cpu match
From: Eric Dumazet @ 2010-07-22 15:18 UTC (permalink / raw)
To: Jan Engelhardt; +Cc: Patrick McHardy, Netfilter Development Mailinglist, netdev
In-Reply-To: <alpine.LSU.2.01.1007221611510.1619@obet.zrqbmnf.qr>
On Thursday 22 July 2010 at 16:19 +0200, Jan Engelhardt wrote:
> On Thursday 2010-07-22 16:03, Eric Dumazet wrote:
>
> >This match is a bit strange, being packet content agnostic...
> >+/*
> >+ * Yes, packet content is not interesting for us, we only take care
> >+ * of cpu handling this packet
> >+ */
>
> That is not so strange after all, we have many packet agnostic matches:
> xt_time, xt_condition, xt_IDLETIMER, xt_iface.
> So this little comment looks a bit redundant.
>
> Or it seems that academia can't come up with enough new protocols in time that
> we have to resort to do -m coffeemaker :)
>
> >@@ -0,0 +1,8 @@
> >+#ifndef _XT_CPU_H
> >+#define _XT_CPU_H
> >+
> >+struct xt_cpu_info {
> >+ unsigned int cpu;
> >+ int invert;
> >+};
> >+#endif /*_XT_MAC_H*/
>
> Please take a read in "Writing Netfilter Modules" e-book :-)
> It will tell you that types other than fixed ones are a no-no.
Ok, let's do that, but I doubt sizeof(int) can be different from 4 on a
Linux 2.6 host right now.
I prefer not to do the !!info->invert on every packet, and to do the check
only once, in checkentry.
Thanks
[PATCH nf-next-2.6] netfilter: add xt_cpu match
In some situations a CPU match permits a better spreading of
connections, or selecting targets only for a given cpu.
With Receive Packet Steering (RPS) or a multiqueue NIC and appropriate IRQ
affinities, we can distribute traffic on available cpus, per session.
(all RX packets for a given flow are handled by a given cpu)
Some legacy applications being not SMP friendly, one way to scale a
server is to run multiple copies of them.
Instead of randomly choosing an instance, we can use the cpu number as a
key so that the softirq handler for a whole instance runs on a single
cpu, maximizing cache effects in TCP/UDP stacks.
Using NAT for example, a four-way machine might run four copies of
a server application, using a separate listening port for each instance,
but still presenting a unique external port:
iptables -t nat -A PREROUTING -p tcp --dport 80 -m cpu --cpu 0 \
-j REDIRECT --to-port 8080
iptables -t nat -A PREROUTING -p tcp --dport 80 -m cpu --cpu 1 \
-j REDIRECT --to-port 8081
iptables -t nat -A PREROUTING -p tcp --dport 80 -m cpu --cpu 2 \
-j REDIRECT --to-port 8082
iptables -t nat -A PREROUTING -p tcp --dport 80 -m cpu --cpu 3 \
-j REDIRECT --to-port 8083
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
include/linux/netfilter/Kbuild | 3 -
include/linux/netfilter/xt_cpu.h | 11 +++++
net/netfilter/Kconfig | 9 ++++
net/netfilter/Makefile | 1
net/netfilter/xt_cpu.c | 63 +++++++++++++++++++++++++++++
5 files changed, 86 insertions(+), 1 deletion(-)
diff --git a/include/linux/netfilter/Kbuild b/include/linux/netfilter/Kbuild
index bb103f4..1041a1d 100644
--- a/include/linux/netfilter/Kbuild
+++ b/include/linux/netfilter/Kbuild
@@ -19,12 +19,13 @@ header-y += xt_TCPMSS.h
header-y += xt_TCPOPTSTRIP.h
header-y += xt_TEE.h
header-y += xt_TPROXY.h
+header-y += xt_cluster.h
header-y += xt_comment.h
header-y += xt_connbytes.h
header-y += xt_connlimit.h
header-y += xt_connmark.h
header-y += xt_conntrack.h
-header-y += xt_cluster.h
+header-y += xt_cpu.h
header-y += xt_dccp.h
header-y += xt_dscp.h
header-y += xt_esp.h
diff --git a/include/linux/netfilter/xt_cpu.h b/include/linux/netfilter/xt_cpu.h
index e69de29..93c7f11 100644
--- a/include/linux/netfilter/xt_cpu.h
+++ b/include/linux/netfilter/xt_cpu.h
@@ -0,0 +1,11 @@
+#ifndef _XT_CPU_H
+#define _XT_CPU_H
+
+#include <linux/types.h>
+
+struct xt_cpu_info {
+	__u32	cpu;
+	__u32	invert;
+};
+
+#endif /*_XT_CPU_H*/
diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig
index aa2f106..523e8d0 100644
--- a/net/netfilter/Kconfig
+++ b/net/netfilter/Kconfig
@@ -647,6 +647,15 @@ config NETFILTER_XT_MATCH_CONNTRACK
To compile it as a module, choose M here. If unsure, say N.
+config NETFILTER_XT_MATCH_CPU
+	tristate '"cpu" match support'
+	depends on NETFILTER_ADVANCED
+	help
+	  CPU matching allows you to match packets based on the CPU
+	  currently handling the packet.
+
+	  To compile it as a module, choose M here. If unsure, say N.
+
config NETFILTER_XT_MATCH_DCCP
tristate '"dccp" protocol match support'
depends on NETFILTER_ADVANCED
diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile
index e28420a..6da84c3 100644
--- a/net/netfilter/Makefile
+++ b/net/netfilter/Makefile
@@ -69,6 +69,7 @@ obj-$(CONFIG_NETFILTER_XT_MATCH_COMMENT) += xt_comment.o
obj-$(CONFIG_NETFILTER_XT_MATCH_CONNBYTES) += xt_connbytes.o
obj-$(CONFIG_NETFILTER_XT_MATCH_CONNLIMIT) += xt_connlimit.o
obj-$(CONFIG_NETFILTER_XT_MATCH_CONNTRACK) += xt_conntrack.o
+obj-$(CONFIG_NETFILTER_XT_MATCH_CPU) += xt_cpu.o
obj-$(CONFIG_NETFILTER_XT_MATCH_DCCP) += xt_dccp.o
obj-$(CONFIG_NETFILTER_XT_MATCH_DSCP) += xt_dscp.o
obj-$(CONFIG_NETFILTER_XT_MATCH_ESP) += xt_esp.o
diff --git a/net/netfilter/xt_cpu.c b/net/netfilter/xt_cpu.c
index e69de29..b39db8a 100644
--- a/net/netfilter/xt_cpu.c
+++ b/net/netfilter/xt_cpu.c
@@ -0,0 +1,63 @@
+/* Kernel module to match running CPU */
+
+/*
+ * Might be used to distribute connections on several daemons, if
+ * RPS (Receive Packet Steering) is enabled or NIC is multiqueue capable,
+ * each RX queue IRQ affined to one CPU (1:1 mapping)
+ *
+ */
+
+/* (C) 2010 Eric Dumazet
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/module.h>
+#include <linux/skbuff.h>
+#include <linux/netfilter/xt_cpu.h>
+#include <linux/netfilter/x_tables.h>
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Eric Dumazet <eric.dumazet@gmail.com>");
+MODULE_DESCRIPTION("Xtables: CPU match");
+
+static int cpu_mt_check(const struct xt_mtchk_param *par)
+{
+	const struct xt_cpu_info *info = par->matchinfo;
+
+	if (info->invert & ~1)
+		return -EINVAL;
+	return 0;
+}
+
+static bool cpu_mt(const struct sk_buff *skb, struct xt_action_param *par)
+{
+	const struct xt_cpu_info *info = par->matchinfo;
+
+	return (info->cpu == smp_processor_id()) ^ info->invert;
+}
+
+static struct xt_match cpu_mt_reg __read_mostly = {
+	.name       = "cpu",
+	.revision   = 0,
+	.family     = NFPROTO_UNSPEC,
+	.checkentry = cpu_mt_check,
+	.match      = cpu_mt,
+	.matchsize  = sizeof(struct xt_cpu_info),
+	.me         = THIS_MODULE,
+};
+
+static int __init cpu_mt_init(void)
+{
+	return xt_register_match(&cpu_mt_reg);
+}
+
+static void __exit cpu_mt_exit(void)
+{
+	xt_unregister_match(&cpu_mt_reg);
+}
+
+module_init(cpu_mt_init);
+module_exit(cpu_mt_exit);
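
For anyone who wants to poke at the match semantics outside the kernel, the following is a rough user-space sketch of what cpu_mt_check() and cpu_mt() do. The *_sketch names are hypothetical, and the current_cpu parameter stands in for smp_processor_id(). The point it illustrates is that validating invert once at rule load time lets the per-packet path use a plain XOR with no !! normalization.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Mirrors the layout of struct xt_cpu_info from the patch. */
struct cpu_info_sketch {
	uint32_t cpu;
	uint32_t invert;
};

/* Rule-load-time check: reject anything but 0 or 1 in 'invert'. */
static int cpu_check_sketch(const struct cpu_info_sketch *info)
{
	if (info->invert & ~1U)
		return -EINVAL;
	return 0;
}

/*
 * Per-packet test. 'current_cpu' stands in for smp_processor_id().
 * Because invert is known to be 0 or 1, a plain XOR flips the result.
 */
static bool cpu_match_sketch(const struct cpu_info_sketch *info,
			     uint32_t current_cpu)
{
	return (info->cpu == current_cpu) ^ info->invert;
}
```

With cpu = 2 and invert = 0 the rule matches only on CPU 2; with invert = 1 it matches on every CPU except 2.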
* Re: [PATCH 1/1] Bluetooth: hidp: Add support for hidraw HIDIOCGFEATURE and HIDIOCSFEATURE
From: Marcel Holtmann @ 2010-07-22 15:21 UTC (permalink / raw)
To: Jiri Kosina
Cc: Alan Ott, David S Miller, Michael Poole, Bastien Nocera,
Eric Dumazet, linux-bluetooth, linux-kernel, netdev
In-Reply-To: <alpine.LNX.2.00.1007221612450.4741@pobox.suse.cz>
Hi Jiri,
> > > >>> what is usb-hid.ko doing here? I would expect a bunch of code
> > > >>> duplication with minor difference between USB and Bluetooth.
> > > >>>
> > > >> usbhid doesn't have a lot of code for hidraw. Two functions are involved:
> > > >> usbhid_output_raw_report()
> > > >> - calls usb_control_msg() with Set_Report
> > > >> OR
> > > >> - calls usb_interrupt_msg() on the Output pipe.
> > > >> usbhid_get_raw_report()
> > > >> - calls usb_control_msg() with Get_Report
> > > >>
> > > >> This is of course easier than bluetooth because usb_control_msg() is
> > > >> synchronous, even when requesting reports, mostly because of the nature
> > > >> of USB, where the request and response are part of the same transfer.
> > > >>
> > > >> For Bluetooth, it's a bit more complicated since the kernel treats it
> > > >> more like a networking interface (and indeed it is). My understanding is
> > > >> that to make a synchronous transfer in bluetooth, one must:
> > > >> - send the request packet
> > > >> - block (wait_event_*())
> > > >> - when the response is received in the input handler, wake_up_*().
> > > >>
> > > >> There's not really any code duplication, mostly because initiating
> > > >> synchronous USB transfers (input and output) is easy (because of the
> > > >> usb_*_msg() functions), while making synchronous Bluetooth transfers
> > > >> must be done manually. If there's a nice, convenient, synchronous
> > > >> function in Bluetooth similar to usb_control_msg() that I've missed,
> > > >> then let me know, as it would simplify this whole thing.
> > > >>
> > > > there is not, and I don't think we will ever get one. My question here was
> > > > more in the direction why HID core is doing these synchronously in the
> > > > first place. Especially since USB can do everything async as well.
> > >
> > > I'm open to suggestions. The way I see it is from a user space
> > > perspective. With Get_Feature being on an ioctl(), I don't see any clean
> > > way to do it other than synchronously. Other operating systems (I can
> > > say for sure Windows, Mac OS X, and FreeBSD) handle Get/Set Feature the
> > > same way (synchronously) from user space.
> > >
> > > You seem to be proposing an asynchronous interface. What would that look
> > > like from user space?
> >
> > not necessarily from user space, but at least from HID core to HIDP and
> > usb-hid transports. At least that is what I would expect, Jiri?
>
> Sorry for this taking too long (vacations, conferences, you name it) for
> me to respond.
>
> As all the _raw() callbacks are purely intended for userspace interaction
> anyway, it's perfectly fine (and in fact desirable) for the low-level
> transport drivers to perform these operations synchronously (and that's
> what USB implementation does as well).
>
> Marcel, if your opposition to synchronous interface is strong, we'll have
> to think about other approaches, but from my POV, the patch is fine as-is
> for Bluetooth.
that the ioctl() API is synchronous is fine with me; however, pushing that
down to the transport drivers seems wrong to me. I think the HID core
should be able to handle a fully asynchronous transport driver. I know
that with the USB subsystem you are a little bit spoiled here, but for
Bluetooth that is not the case. And in the end even using the asynchronous
USB URB calls would be nice as well.
So why not make the core actually wait for responses from the transport
driver? It would make the transport drivers a lot simpler in the long
run. And I know that most likely you won't see another transport besides
Bluetooth and USB, but you never know.
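To make the idea concrete, here is a rough, single-threaded user-space sketch of a core that does the waiting on top of a fully asynchronous transport. Everything in it is hypothetical: struct report_req, struct transport, transport_submit(), transport_poll() and core_get_report() are illustrative names, not existing HID, HIDP or usbhid interfaces. The polling loop only stands in for the sleep that kernel code would do with something like wait_event_*() and wake_up_*().

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical request object shared by core and transport. */
struct report_req {
	unsigned char buf[8];
	size_t len;
	bool done;		/* set by the transport's completion path */
};

/* Hypothetical asynchronous transport: submit returns immediately. */
struct transport {
	struct report_req *pending;
	int ticks_until_reply;	/* simulated latency */
};

static void transport_submit(struct transport *t, struct report_req *req)
{
	req->done = false;
	t->pending = req;
}

/* One step of the transport's input handling; completes the request
 * once the simulated reply has arrived. */
static void transport_poll(struct transport *t)
{
	static const unsigned char reply[] = { 0x01, 0x2a };

	if (!t->pending)
		return;
	if (t->ticks_until_reply-- > 0)
		return;
	memcpy(t->pending->buf, reply, sizeof(reply));
	t->pending->len = sizeof(reply);
	t->pending->done = true;	/* analogous to wake_up_*() */
	t->pending = NULL;
}

/*
 * Core-side synchronous wrapper: the only place that waits.
 * Returns the reply length, or -1 on timeout.
 */
static int core_get_report(struct transport *t, struct report_req *req,
			   int max_ticks)
{
	transport_submit(t, req);
	while (max_ticks-- > 0) {
		transport_poll(t);	/* kernel code would sleep here instead */
		if (req->done)
			return (int)req->len;
	}
	t->pending = NULL;		/* give up and drop the request */
	return -1;
}
```

In this shape the transport only needs submit and completion paths, and the blocking behavior that the ioctl() requires lives in one place.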
Regards
Marcel
* Re: [PATCH] Driver-core: Fix bluetooth network device rename regression
From: Kay Sievers @ 2010-07-22 15:30 UTC (permalink / raw)
To: Johannes Berg
Cc: Eric W. Biederman, Greg KH, Andrew Morton, Greg KH,
Rafael J. Wysocki, Maciej W. Rozycki, netdev
In-Reply-To: <1279807845.12439.20.camel@jlt3.sipsolutions.net>
On Thu, Jul 22, 2010 at 16:10, Johannes Berg <johannes@sipsolutions.net> wrote:
> On Thu, 2010-07-22 at 15:38 +0200, Kay Sievers wrote:
>>
>> Please try to fix these drivers instead, or mark them broken for
>> namespaces, if nobody can fix them right now.
>
> We've tried. Nobody, including you, has been able to suggest how to fix
> it.
I did, and several times. Here are the options again:
- Split the driver into two modules, so the auto-cleanup of the netdev
can be done by the second module when the device is force-unloaded
without taking any references to the code while in use (netdev-specific
behavior).
- Move the device cleanup code into the core somehow by adding proper
functions to bus devices, similar to the completely misused
device_create() function for class devices. This would also be a
proper fix for the weird driver core use.
- Do not allow force-unloading the module while the netdev is in use.
You would then need some "destruct" command for the device, which
removes the netdev so that the module can be unloaded.
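The third option can be pictured roughly as below. This is only a user-space sketch of the ordering it implies; struct drv_sketch, drv_destruct_netdev() and drv_unload() are hypothetical names, not existing driver-core or module-loader interfaces.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical driver state; names are illustrative only. */
struct drv_sketch {
	bool netdev_present;	/* true while the netdev exists */
	bool loaded;
};

/* The explicit "destruct" command: removes the netdev. */
static void drv_destruct_netdev(struct drv_sketch *d)
{
	d->netdev_present = false;
}

/* Unload is refused, even when forced, while the netdev is in use. */
static int drv_unload(struct drv_sketch *d, bool force)
{
	(void)force;	/* force does not override the in-use check */
	if (d->netdev_present)
		return -EBUSY;
	d->loaded = false;
	return 0;
}
```

The user would first issue the destruct command and only then unload the module, so no netdev can outlive the code that backs it.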
> And it's not just broken with network namespaces enabled either, as
So what's the problem without the sysfs ns then? I haven't heard of any
in the last couple of years.
> Eric explained. I really don't see why you keep asking us to fix it when
> clearly we cannot -- even you don't know how and you certainly have more
> insight into the device model than we do.
Sure, and I ask again to fix the drivers, instead of sneaking dirty
hacks into the core, just by calling an expected behavior a
"regression". This is not a core bug, and should not be worked around
that way in the core.
If there is a new requirement for the core (like possibly point 2
above), we can surely look into making this happen.
We cannot add lists of individual subsystems to generic core
functions to work around broken drivers. I hope you will understand
that.
Thanks,
Kay