Netdev List
* Re: [RFC PATCH v2 3/3] tun: fix LSM/SELinux labeling of tun/tap devices
From: Michael S. Tsirkin @ 2012-12-12  9:10 UTC (permalink / raw)
  To: Paul Moore; +Cc: netdev, linux-security-module, selinux, jasowang
In-Reply-To: <1963349.P9uq3yvlyR@sifl>

On Mon, Dec 10, 2012 at 05:43:49PM -0500, Paul Moore wrote:
> On Monday, December 10, 2012 07:50:35 PM Michael S. Tsirkin wrote:
> > On Mon, Dec 10, 2012 at 12:33:49PM -0500, Paul Moore wrote:
> > > On Monday, December 10, 2012 07:26:56 PM Michael S. Tsirkin wrote:
> > > > On Mon, Dec 10, 2012 at 12:04:35PM -0500, Paul Moore wrote:
> > > > > On Friday, December 07, 2012 02:25:16 PM Michael S. Tsirkin wrote:
> > > > > > On Thu, Dec 06, 2012 at 04:09:51PM -0500, Paul Moore wrote:
> > > > > > > On Thursday, December 06, 2012 10:57:16 PM Michael S. Tsirkin wrote:
> > > > > > > > On Thu, Dec 06, 2012 at 11:56:45AM -0500, Paul Moore wrote:
> > > > > > > > > The SETQUEUE/tun_socket:create_queue permissions do not yet
> > > > > > > > > exist
> > > > > > > > > in any released SELinux policy as we are just now adding them
> > > > > > > > > with
> > > > > > > > > this patchset. With current policies loaded into a kernel with
> > > > > > > > > this patchset applied the SETQUEUE/tun_socket:create_queue
> > > > > > > > > permission would be treated according to the policy's unknown
> > > > > > > > > permission setting.
> > > > > > > > 
> > > > > > > > OK I think we need to rethink what we are doing here: what you
> > > > > > > > sent
> > > > > > > > addresses the problem as stated but I think we mis-stated it. 
> > > > > > > > Let
> > > > > > > > me try to restate the problem: it is not just selinux problem.
> > > > > > > > Let's
> > > > > > > > assume qemu wants to use tun, I (libvirt) don't want to run it
> > > > > > > > as
> > > > > > > > root.
> > > > > > > > 
> > > > > > > > 1. TUNSETIFF: I can open tun, attach an fd and pass it to qemu.
> > > > > > > > Now, qemu does not invoke TUNSETIFF so it can run without
> > > > > > > > kernel privileges.
> > > > > > > 
> > > > > > > Correct me if I'm wrong, but I believe libvirt does this while
> > > > > > > running
> > > > > > > as root.  Assuming that is the case, why not simply
> > > > > > > setuid()/setgid()
> > > > > > > to the same credentials as the QEMU instance before creating the
> > > > > > > TUN
> > > > > > > device? You can always (re)configure the device afterwards while
> > > > > > > running as root/CAP_NET_ADMIN.
> > > > > > 
> > > > > > We want isolation between qemu instances.
> > > > > 
> > > > > Understood, I agree.
> > > > > 
> > > > > Achieving separation via SELinux is easily done, with libvirt/sVirt
> > > > > already doing this for us automatically in most cases; the only thing
> > > > > we
> > > > > will want to do is make sure the SELinux policy is aware of the new
> > > > > permission.
> > > > > 
> > > > > Achieving separation via DAC should also be easily done, simply run
> > > > > each
> > > > > QEMU instance with a separate UID and/or GID.
> > > > > 
> > > > > > Giving qemu right to open tun and SETIFF would give it rights
> > > > > > to access any tun device.
> > > > > 
> > > > > I quickly looked at tun_chr_open() again and I don't see any special
> > > > > rights/privileges required, the same for tun_chr_ioctl() and
> > > > > __tun_chr_ioctl().  Looking at tun_set_queue() I see we call
> > > > > tun_not_capable() which does a simple DAC check; it must have the same
> > > > > UID/GID or have CAP_NET_ADMIN.
> > > > > 
> > > > > I'm having a hard time seeing the problem you are describing; help me
> > > > > understand.
> > > > 
> > > > The issue is guest controls the number of queues in use.
> > > > So qemu would be required to be allowed to call tun_set_queue.
> > > > If we allow this we have a problem as one qemu will be
> > > > able to access any tun.
> > > 
> > > QEMU can call tun_set_queue() as long as it satisfies tun_not_capable(),
> > > which from a practical point of view means that the TUN device was
> > > created with the same UID/GID as the QEMU instance.  If you want TUN
> > > device separation between QEMU instances using DAC you need to run each
> > > QEMU instance with a different UID/GID (which you should be doing anyway
> > > if you want DAC enforced general separation).
> > > 
> > > I believe I've stated this point several times now and I don't feel you've
> > > addressed it properly.
> > 
> > Look at how it works at the moment:
> > a privileged libvirt server calls tun_set_iff
> > and passes the fd to qemu which is not privileged.
> > 
> > The result is isolation between qemu instances without
> > need to create uid per qemu instance.
> 
> Okay, good.  That is my understanding.
>  
> > How do we create multiple queues? It makes sense to
> > follow this model and pass in fds for individual queues.
> 
> Okay.
> 
> > However they need to be disabled initially
> > so libvirt can not do tun_set_queue for us.
> 
> Unrelated question: why do the queues need to be disabled initially?  Is this 
> to prevent traffic from being queued up?  Some other reason?  I'm just curious 
> as to the reason ...

Yes.
Basically because old guests only use a single queue.
If a guest comes along and declares multiqueue support
we can queue up traffic on new queues but if we
do this with a legacy guest it will not be able to
consume it.



> > can't utilize multiqueue.
> 
> I still don't understand why in the multiqueue case libvirt doesn't just 
> change its effective UID/GID when creating the TUN device, or just use the 
> TUNSETOWNER/TUNSETGROUP commands. This would solve the problem you describe 
> above and - at least to me - seems like a better solution conceptually.
> 
> Help me understand why you believe that will not work.
> 
> Do you not want to give ownership of the TUN device to QEMU?  That would be 
> the only reason I can think of, but all of your comments that I can recall 
> have been about isolation between QEMU instances and not access control 
> between a QEMU instance and its assigned TUN device.

I think I might have confused things more than clarified them.
Let me comment on the specific lines in the patch that worry me;
that will make it clear, I hope.

> > My solution is an unprivileged variant
> > of tun_set_queue that only enables/disables
> > a queue without attach/detach.
> 
> -- 
> paul moore
> security and virtualization @ redhat

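The DAC rule discussed in the quoted text for tun_set_queue() (the caller must match the device's owner UID/GID or hold CAP_NET_ADMIN) can be sketched as a small userspace model. This is an illustration only, not the kernel code: the struct and function names (tun_owner_model, caller_model, tun_not_capable_model) are hypothetical, and the group test is simplified to the effective GID, whereas the real check may also consider supplementary groups.

```c
#include <stdbool.h>
#include <sys/types.h>

/* Hypothetical model of a TUN device's ownership; not the kernel's struct. */
struct tun_owner_model {
	bool  has_owner;	/* an owner was set, e.g. via TUNSETOWNER */
	uid_t owner;
	bool  has_group;	/* a group was set, e.g. via TUNSETGROUP */
	gid_t group;
};

/* Hypothetical model of the calling process's credentials. */
struct caller_model {
	uid_t euid;
	gid_t egid;
	bool  cap_net_admin;
};

/*
 * Returns true when the caller must be refused: an owner or group is set
 * and does not match the caller, and the caller lacks CAP_NET_ADMIN.
 */
static bool tun_not_capable_model(const struct tun_owner_model *tun,
				  const struct caller_model *who)
{
	bool uid_mismatch = tun->has_owner && who->euid != tun->owner;
	bool gid_mismatch = tun->has_group && who->egid != tun->group;

	return (uid_mismatch || gid_mismatch) && !who->cap_net_admin;
}
```

With a device owned by UID 1001, a caller with EUID 1001 passes, a caller with EUID 1002 is refused, and a caller with EUID 1002 that holds CAP_NET_ADMIN passes. Under this rule, running each QEMU instance with its own UID/GID is what gives per-instance separation.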

* Re: [PATCH RFC 0/5] Containerize syslog
From: Glauber Costa @ 2012-12-12  8:56 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Serge Hallyn, Andrew Morton, Rui Xiang, netdev, containers
In-Reply-To: <87txrs30ur.fsf@xmission.com>

On 12/11/2012 10:22 PM, Eric W. Biederman wrote:
> Glauber Costa <glommer@parallels.com> writes:
> 
>> On 12/07/2012 10:05 PM, Eric W. Biederman wrote:
>>> Glauber Costa <glommer@parallels.com> writes:
>>>
>>>> I keep asking myself if it isn't the case of forwarding to a container
>>>> all messages printed in process context. That will obviously exclude all
>>>> messages resulting from kthreads - that will always be in the initial
>>>> namespace anyway, interrupts, etc. There is no harm, for instance, in
>>>> delivering the same message twice: one to the container, and the other
>>>> to the host system.
>>>
>>> Except that there is harm in double printing.  One of the better
>>> justifications for doing something with the kernel log is that it is
>>> possible to overflow the kernel log with operations performed
>>> exclusively in a container.
>>>
>> I don't agree with you here.
>>
>> If we are double printing, we are using up more memory, but we also have
>> an extra buffer anyway. The messages are printed on behalf of the user,
>> but still, by the kernel.
>>
>> So one of the following will necessarily hold:
>>
>> 1) There is no way that the process can overflow the main log, and as a
>> consequence, the container log, that has fewer messages than it.
>>
>> 2) The process will overflow the main log. But since we are not printing
>> anything extra to the main log compared to the scenario in which the
>> process lives in the main namespace, this would already be a problem
>> independent of namespaces. And needs to be fixed.
> 
> Well mounts, bringing network interfaces up and down, running packets
> through our own choice of firewall rules, possibly enabling debug
> messages on network interfaces have the potential to create messages we
> aren't seeing today.
> 

There are two kinds of messages: the ones that would be printed anyway if
you were not running in an ns, which we have no reason to fear, and the
ones that only exist because we wrote the code for them, due to ns
support. They are no different from writing a new fs support, driver,
etc. Any new functionality can print new messages, and we have to be
sure they won't fill the log. We write that code, so it is on us to make
sure the messages are being printed at a reasonable rate.

This should be true for all messages running in process context. It is
either that, or we have a bug and we are relying on a specific clone flag to
protect us against the buffer overrun.


>> IOW, double printing should not print anything *extra* to the main log.
>> It just prints to the container log, and leaves a copy to the box admin
>> to see. I think it is very reasonable to imagine that the main admin
>> would like to see anything the kernel has to tell him about the box.
> 
> The only reason that I have seen for doing anything with printks is
> because we are generating messages that would not be generated in a
> non-container environment.  At which point double printing is scary
> because it allows a container user to flood the kernel log ring buffer
> and suppress interesting messages.
> 
>>> I do think the idea of process context printks going to the current
>>> container is one worth playing with.
>>>
>>
>> It still leaves the problem of printks outside process context that
>> should go to a namespace open. But it is easy to extend this idea to do
>> both.
> 
> Hmm.  For printks from process context I think I can see a point where
> double printing makes sense, because that is a rather indiscriminate grab
> of printk messages.
> 
Exactly. What I have had in mind this whole time is that if you are
printing a message of interest to the container admin, it is *very
likely* also of interest to the box admin, especially if it indicates
something going wrong. Maybe what goes and does not go to the main
buffer can be determined by the log level of each buffer. But still, I
think that just hiding them from the box admin may not exactly be what
we want.

Cheers

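The idea floated above, double printing with each buffer applying its own log level, can be sketched as a userspace model. Everything here is hypothetical (log_ring, ring_store, ns_printk and the sizes are illustrative names and values, not anything from the patch set); levels follow the kernel convention that a lower number is more severe.

```c
#include <stdio.h>
#include <string.h>

#define LOG_SLOTS   4
#define LOG_MSG_LEN 64

/* Hypothetical fixed-size ring buffer standing in for a kernel log. */
struct log_ring {
	char msgs[LOG_SLOTS][LOG_MSG_LEN];
	unsigned int head;	/* total messages stored so far */
	int max_level;		/* keep messages with level <= max_level */
};

/* Store one message, overwriting the oldest slot once the ring wraps. */
static void ring_store(struct log_ring *r, int level, const char *msg)
{
	if (level > r->max_level)
		return;		/* filtered out by this buffer's log level */
	snprintf(r->msgs[r->head % LOG_SLOTS], LOG_MSG_LEN, "%s", msg);
	r->head++;
}

/*
 * Double printing: the container buffer (if any) gets the message, and the
 * host buffer gets a copy subject to its own, possibly stricter, level.
 */
static void ns_printk(struct log_ring *host, struct log_ring *container,
		      int level, const char *msg)
{
	if (container)
		ring_store(container, level, msg);
	ring_store(host, level, msg);
}
```

With the host buffer limited to level 4 and the container buffer accepting up to level 7, an informational message (level 6) lands only in the container log, while an error (level 3) reaches both. That keeps serious messages visible to the box admin without letting container chatter fill the host buffer.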

* RE: Gianfar driver issue
From: voncken @ 2012-12-12  8:46 UTC (permalink / raw)
  To: 'Claudiu Manoil'; +Cc: netdev
In-Reply-To: <50C744CE.7040109@freescale.com>

	Thanks Claudiu,

	I will reformat my patch and resend it.

	Regards.

Cedric Voncken 

-----Original Message-----
From: Claudiu Manoil [mailto:claudiu.manoil@freescale.com]
Sent: Tuesday, December 11, 2012 15:36
To: Cedric VONCKEN
Cc: netdev@vger.kernel.org
Subject: Re: Gianfar driver issue

On 12/11/2012 11:59 AM, Cedric VONCKEN wrote:
> 	Hi all,
>
> 	I think we have an issue in the Gianfar driver.
>
> 	When a netdev tx queue timeout occurs, the function
> gfar_timeout(..) is called. This function indirectly calls the
> gfar_init_mac(..) function.
>
> 	In this function, the rctrl register is set to a default value.
>
> 	If promiscuous mode is enabled on the netdev (flag IFF_PROMISC is
> set), the gfar_init_mac(..) function does not reactivate it.
>
> 	The Promiscuous mode is used for example when the netdev is bridged.
> 	
> 	I apply this patch to fix it.
>
> 	--- a/drivers/net/ethernet/freescale/gianfar.c.	2012-06-01
> 09:16:13.000000000 +0200
> 	+++ b/drivers/net/ethernet/freescale/gianfar.c	2012-12-11
> 10:38:23.000000000 +0100
> 	@@ -356,6 +356,11 @@
>   	/* Configure the coalescing support */
>   	gfar_configure_coalescing(priv, 0xFF, 0xFF);
>
> +	if (ndev->flags & IFF_PROMISC) {
> +		/* Set RCTRL to PROM */
> +		rctrl |= RCTRL_PROM;
> +	}
> +
>   	if (priv->rx_filer_enable) {
>   		rctrl |= RCTRL_FILREN;
>   		/* Program the RIR0 reg with the required distribution */

Hello,

I don't see any issue with this code change, and there are other drivers too
reconfiguring the promiscuous mode upon tx timeout.
A valid (formatted) patch needs to be sent however.

Thanks,
Claudiu

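The fix discussed above amounts to rebuilding the receive-control value from the device flags whenever the MAC is reinitialized, instead of resetting it to a bare default. A minimal userspace sketch of that idea follows. The function name rebuild_rctrl and the *_MODEL constants are hypothetical; the two RCTRL bit values are placeholders chosen for illustration and should not be taken as the driver's register layout.

```c
#include <stdint.h>

#define IFF_PROMISC_MODEL   0x100u	/* same value as IFF_PROMISC in <linux/if.h> */
#define RCTRL_PROM_MODEL    0x0008u	/* placeholder bit for promiscuous mode */
#define RCTRL_FILREN_MODEL  0x1000u	/* placeholder bit for the rx filer */

/* Rebuild the receive-control value from scratch, as a reset path would. */
static uint32_t rebuild_rctrl(unsigned int dev_flags, int rx_filer_enable)
{
	uint32_t rctrl = 0;	/* default value after reinitialization */

	if (dev_flags & IFF_PROMISC_MODEL)
		rctrl |= RCTRL_PROM_MODEL;	/* re-apply promiscuous mode */

	if (rx_filer_enable)
		rctrl |= RCTRL_FILREN_MODEL;

	return rctrl;
}
```

Without the first branch, a bridged interface would come back from a tx timeout with promiscuous mode silently cleared, which is the symptom described in the report.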

* Re: [BUG] Kernel recieves DNS reply, but doesn't deliver it to a waiting application
From: Andrew Savchenko @ 2012-12-12  8:27 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev
In-Reply-To: <20121023012759.ca7f91d6.bircoph@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 6266 bytes --]

Hello,

On Tue, 23 Oct 2012 01:27:59 +0400 Andrew Savchenko wrote:
> On Mon, 22 Oct 2012 08:48:09 +0200 Eric Dumazet wrote:
[...]
> > Some driver or protocol stack is messing with skb->truesize, as
> > your /proc/net/udp file contains anomalies :
> > 
> > $ cat /proc/net/udp
> >   sl  local_address rem_address   st tx_queue rx_queue tr tm->when retrnsmt   uid  timeout inode ref pointer drops
> > ...
> >   323: 074A070A:007B 00000000:0000 07 FFFDF700:00000000 00:00000000 00000000   123        0 254469 2 ffff88003d581880 0
> > ...
> >   323: 00FCA8C0:007B 00000000:0000 07 FFFFF900:00000000 00:00000000 00000000     0        0 5187 2 ffff880039993880 0
> > 
> > It's clearly not possible to get tx_queue = 0xFFFDF700 or 0xFFFFF900
> > 
> > So what drivers handle following IP addresses : 192.168.252.0 , 10.7.74.7  ?
> 
> 192.168.252.0 is handled by eth0 interface running on Realtek
> Semiconductor Co., Ltd. RTL-8139/8139C/8139C+ (10ec:8139) NIC.
> Kernel driver 8139too. This interface handles multiple subnetworks:
> 
> # ip addr show eth0
> 2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN qlen 1000 
> link/ether 00:80:48:30:ca:f3 brd ff:ff:ff:ff:ff:ff
> inet 10.51.15.126/25 brd 10.51.15.127 scope global eth0
> inet 192.168.252.0/31 scope global eth0
> 
> 10.7.74.7 is an l2tp connection handled by ppp over l2tp:
> CONFIG_PPPOL2TP=y
> It is running on top of eth0 described above.
> 
> # ip addr show ppp0
> 65: ppp0: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 1400 qdisc pfifo_fast state UNKNOWN qlen 3
> link/ppp 
> inet 10.7.74.7 peer 10.7.2.18/32 scope global ppp0

I updated the kernel on this system to 3.7.0 and the UDP anomaly is still
present:

$ cat /proc/net/udp
  sl  local_address rem_address   st tx_queue rx_queue tr tm->when retrnsmt   uid  timeout inode ref pointer drops             
    0: 00000000:06A5 00000000:0000 07 00000000:00000000 00:00000000 00000000     0        0 5326 2 ffff88003dbf0a80 0          
    8: 00000000:7EAD 00000000:0000 07 00000000:00000000 00:00000000 00000000     0        0 5157 2 ffff8800398c2000 0          
   89: 00000000:90FE 00000000:0000 07 00000000:00000000 00:00000000 00000000     0        0 5101 2 ffff88003dbd3500 0          
  160: 0100007F:2745 00000000:0000 07 00000000:00000000 00:00000000 00000000     0        0 4598 2 ffff88003d612700 0          
  184: 0100007F:035D 00000000:0000 07 00000000:00000000 00:00000000 00000000     0        0 4774 2 ffff88003d612a80 0          
  217: 00000000:857E 00000000:0000 07 00000000:00000000 00:00000000 00000000     0        0 5195 2 ffff8800398c2700 0          
  318: 00000000:A9E3 00000000:0000 07 00000000:00000000 00:00000000 00000000     0        0 4782 2 ffff88003d612e00 0          
  335: 7E0F330A:01F4 00000000:0000 07 00000000:00000000 00:00000000 00000000     0        0 5303 2 ffff8800398c2e00 0          
  348: 00000000:0801 00000000:0000 07 00000000:00000000 00:00000000 00000000     0        0 5186 2 ffff8800398c2380 0          
  387: 7E0F330A:DE28 1400320A:06A5 01 00000000:00000000 00:00000000 00000000     0        0 5332 4 ffff88003dbf0e00 0          
  400: 010013AC:0035 00000000:0000 07 00000000:00000000 00:00000000 00000000     0        0 4842 2 ffff88003d613880 0          
  400: 0100007F:0035 00000000:0000 07 00000000:00000000 00:00000000 00000000     0        0 4841 2 ffff88003d613500 0          
  414: 00000000:0043 00000000:0000 07 00000000:00000000 00:00000000 00000000     0        0 5273 2 ffff8800398c2a80 0          
  458: 00000000:006F 00000000:0000 07 00000000:00000000 00:00000000 00000000     0        0 4483 2 ffff88003d612000 0          
  459: 00000000:0270 00000000:0000 07 00000000:00000000 00:00000000 00000000     0        0 4507 2 ffff88003d612380 0          
  466: 00000000:0277 00000000:0000 07 00000000:00000000 00:00000000 00000000     0        0 4802 2 ffff88003d613180 0          
  470: 076A070A:007B 00000000:0000 07 FFFF4600:00000000 00:00000000 00000000   123        0 5552 2 ffff880039974380 0          
  470: 010213AC:007B 00000000:0000 07 00000000:00000000 00:00000000 00000000     0        0 4986 2 ffff88003dbd3180 0          
  470: 010013AC:007B 00000000:0000 07 00000000:00000000 00:00000000 00000000     0        0 4985 2 ffff88003dbd2e00 0          
  470: 00FCA8C0:007B 00000000:0000 07 FFFFFB00:00000000 00:00000000 00000000     0        0 4984 2 ffff88003dbd2a80 0          
  470: 7E0F330A:007B 00000000:0000 07 00000000:00000000 00:00000000 00000000     0        0 4983 2 ffff88003dbd2700 0          
  470: 0100007F:007B 00000000:0000 07 00000000:00000000 00:00000000 00000000     0        0 4982 2 ffff88003dbd2380 0          
  470: 00000000:007B 00000000:0000 07 00000000:00000000 00:00000000 00000000     0        0 4975 2 ffff88003d613c00 0          
  484: FF0013AC:0089 00000000:0000 07 00000000:00000000 00:00000000 00000000     0        0 5316 2 ffff88003dbf0000 0          
  484: 010013AC:0089 00000000:0000 07 00000000:00000000 00:00000000 00000000     0        0 5315 2 ffff88003dbd3880 0          
  484: FF0213AC:0089 00000000:0000 07 00000000:00000000 00:00000000 00000000     0        0 5312 2 ffff8800398c3c00 0          
  484: 010213AC:0089 00000000:0000 07 00000000:00000000 00:00000000 00000000     0        0 5311 2 ffff8800398c3880 0          
  484: 00000000:0089 00000000:0000 07 00000000:00000000 00:00000000 00000000     0        0 5308 2 ffff8800398c3180 0          
  485: FF0013AC:008A 00000000:0000 07 00000000:00000000 00:00000000 00000000     0        0 5318 2 ffff88003dbf0700 0          
  485: 010013AC:008A 00000000:0000 07 00000000:00000000 00:00000000 00000000     0        0 5317 2 ffff88003dbf0380 0          
  485: FF0213AC:008A 00000000:0000 07 00000000:00000000 00:00000000 00000000     0        0 5314 2 ffff88003dbd3c00 0          
  485: 010213AC:008A 00000000:0000 07 00000000:00000000 00:00000000 00000000     0        0 5313 2 ffff88003dbd2000 0          
  485: 00000000:008A 00000000:0000 07 00000000:00000000 00:00000000 00000000     0        0 5309 2 ffff8800398c3500 0

The bug hasn't shown up yet, I'll need to wait for about a week to see
if it is reproducible.

Best regards,
Andrew Savchenko

[-- Attachment #2: Type: application/pgp-signature, Size: 198 bytes --]

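For readers decoding the /proc/net/udp dumps in this thread, the sketch below shows how the hex fields map to the addresses and queue values being discussed. It assumes a little-endian host (as on the x86 system reported here), where the address is printed with its bytes reversed. The helper names decode_udp_addr and queue_as_signed are hypothetical.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Decode an "AABBCCDD:PPPP" field from /proc/net/udp into dotted-quad:port.
 * Assumes the file was produced on a little-endian host, so the lowest
 * byte of the printed word is the first octet of the address.
 */
static int decode_udp_addr(const char *field, char *out, size_t outlen)
{
	unsigned int addr, port;

	if (sscanf(field, "%8x:%4x", &addr, &port) != 2)
		return -1;

	snprintf(out, outlen, "%u.%u.%u.%u:%u",
		 addr & 0xff, (addr >> 8) & 0xff,
		 (addr >> 16) & 0xff, (addr >> 24) & 0xff, port);
	return 0;
}

/* Reinterpret an 8-digit hex queue value as a signed 32-bit quantity. */
static long long queue_as_signed(const char *hex)
{
	unsigned int raw;

	if (sscanf(hex, "%8x", &raw) != 1)
		return 0;

	return raw > 0x7fffffffu ? (long long)raw - 4294967296LL
				 : (long long)raw;
}
```

Decoding 074A070A:007B gives 10.7.74.7:123 and 00FCA8C0:007B gives 192.168.252.0:123, matching the addresses named in the quoted reply. Read as signed values, FFFDF700 is -133376 and FFFFF900 is -1792, which is consistent with an accounting underflow rather than a genuinely huge transmit queue.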

* [PATCH net-next 2/2] bridge: add support of adding and deleting mdb entries
From: Cong Wang @ 2012-12-12  8:23 UTC (permalink / raw)
  To: netdev
  Cc: bridge, Cong Wang, Herbert Xu, Stephen Hemminger, David S. Miller,
	Thomas Graf
In-Reply-To: <1355300590-2390-1-git-send-email-amwang@redhat.com>

From: Cong Wang <amwang@redhat.com>

This patch implements adding/deleting mdb entries via netlink.
Currently all entries are temporary; we probably need a flag to
distinguish permanent entries too.

Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Cong Wang <amwang@redhat.com>

---
 include/uapi/linux/if_bridge.h |    8 ++
 net/bridge/br_mdb.c            |  240 ++++++++++++++++++++++++++++++++++++++++
 net/bridge/br_multicast.c      |   55 +++++-----
 net/bridge/br_private.h        |   23 ++++
 4 files changed, 297 insertions(+), 29 deletions(-)

diff --git a/include/uapi/linux/if_bridge.h b/include/uapi/linux/if_bridge.h
index 9a0f6ff..afbb18a 100644
--- a/include/uapi/linux/if_bridge.h
+++ b/include/uapi/linux/if_bridge.h
@@ -157,6 +157,7 @@ enum {
 #define MDBA_ROUTER_MAX (__MDBA_ROUTER_MAX - 1)
 
 struct br_port_msg {
+	__u8  family;
 	__u32 ifindex;
 };
 
@@ -171,4 +172,11 @@ struct br_mdb_entry {
 	} addr;
 };
 
+enum {
+	MDBA_SET_ENTRY_UNSPEC,
+	MDBA_SET_ENTRY,
+	__MDBA_SET_ENTRY_MAX,
+};
+#define MDBA_SET_ENTRY_MAX (__MDBA_SET_ENTRY_MAX - 1)
+
 #endif /* _UAPI_LINUX_IF_BRIDGE_H */
diff --git a/net/bridge/br_mdb.c b/net/bridge/br_mdb.c
index a8cfbf5..6f0a2ee 100644
--- a/net/bridge/br_mdb.c
+++ b/net/bridge/br_mdb.c
@@ -4,6 +4,7 @@
 #include <linux/netdevice.h>
 #include <linux/rculist.h>
 #include <linux/skbuff.h>
+#include <linux/if_ether.h>
 #include <net/ip.h>
 #include <net/netlink.h>
 #if IS_ENABLED(CONFIG_IPV6)
@@ -235,7 +236,246 @@ void br_mdb_notify(struct net_device *dev, struct net_bridge_port *port,
 	__br_mdb_notify(dev, &entry, type);
 }
 
+static bool is_valid_mdb_entry(struct br_mdb_entry *entry)
+{
+	if (entry->ifindex == 0)
+		return false;
+
+	if (entry->addr.proto == htons(ETH_P_IP)) {
+		if (!ipv4_is_multicast(entry->addr.u.ip4))
+			return false;
+		if (ipv4_is_local_multicast(entry->addr.u.ip4))
+			return false;
+#if IS_ENABLED(CONFIG_IPV6)
+	} else if (entry->addr.proto == htons(ETH_P_IPV6)) {
+		if (!ipv6_is_transient_multicast(&entry->addr.u.ip6))
+			return false;
+#endif
+	} else
+		return false;
+
+	return true;
+}
+
+static int br_mdb_parse(struct sk_buff *skb, struct nlmsghdr *nlh,
+			struct net_device **pdev, struct br_mdb_entry **pentry)
+{
+	struct net *net = sock_net(skb->sk);
+	struct br_mdb_entry *entry;
+	struct br_port_msg *bpm;
+	struct nlattr *tb[MDBA_SET_ENTRY_MAX+1];
+	struct net_device *dev;
+	int err;
+
+	if (!capable(CAP_NET_ADMIN))
+		return -EPERM;
+
+	err = nlmsg_parse(nlh, sizeof(*bpm), tb, MDBA_SET_ENTRY, NULL);
+	if (err < 0)
+		return err;
+
+	bpm = nlmsg_data(nlh);
+	if (bpm->ifindex == 0) {
+		pr_info("PF_BRIDGE: br_mdb_parse() with invalid ifindex\n");
+		return -EINVAL;
+	}
+
+	dev = __dev_get_by_index(net, bpm->ifindex);
+	if (dev == NULL) {
+		pr_info("PF_BRIDGE: br_mdb_parse() with unknown ifindex\n");
+		return -ENODEV;
+	}
+
+	if (!(dev->priv_flags & IFF_EBRIDGE)) {
+		pr_info("PF_BRIDGE: br_mdb_parse() with non-bridge\n");
+		return -EOPNOTSUPP;
+	}
+
+	*pdev = dev;
+
+	if (!tb[MDBA_SET_ENTRY] ||
+	    nla_len(tb[MDBA_SET_ENTRY]) != sizeof(struct br_mdb_entry)) {
+		pr_info("PF_BRIDGE: br_mdb_parse() with invalid attr\n");
+		return -EINVAL;
+	}
+
+	entry = nla_data(tb[MDBA_SET_ENTRY]);
+	if (!is_valid_mdb_entry(entry)) {
+		pr_info("PF_BRIDGE: br_mdb_parse() with invalid entry\n");
+		return -EINVAL;
+	}
+
+	*pentry = entry;
+	return 0;
+}
+
+static int br_mdb_add_group(struct net_bridge *br, struct net_bridge_port *port,
+			    struct br_ip *group)
+{
+	struct net_bridge_mdb_entry *mp;
+	struct net_bridge_port_group *p;
+	struct net_bridge_port_group __rcu **pp;
+	struct net_bridge_mdb_htable *mdb;
+	int err;
+
+	mdb = mlock_dereference(br->mdb, br);
+	mp = br_mdb_ip_get(mdb, group);
+	if (!mp) {
+		mp = br_multicast_new_group(br, port, group);
+		err = PTR_ERR(mp);
+		if (IS_ERR(mp))
+			return err;
+	}
+
+	for (pp = &mp->ports;
+	     (p = mlock_dereference(*pp, br)) != NULL;
+	     pp = &p->next) {
+		if (p->port == port)
+			return -EEXIST;
+		if ((unsigned long)p->port < (unsigned long)port)
+			break;
+	}
+
+	p = br_multicast_new_port_group(port, group, *pp);
+	if (unlikely(!p))
+		return -ENOMEM;
+	rcu_assign_pointer(*pp, p);
+
+	br_mdb_notify(br->dev, port, group, RTM_NEWMDB);
+	return 0;
+}
+
+static int __br_mdb_add(struct net *net, struct net_bridge *br,
+			struct br_mdb_entry *entry)
+{
+	struct br_ip ip;
+	struct net_device *dev;
+	struct net_bridge_port *p;
+	int ret;
+
+	if (!netif_running(br->dev) || br->multicast_disabled)
+		return -EINVAL;
+
+	dev = __dev_get_by_index(net, entry->ifindex);
+	if (!dev)
+		return -ENODEV;
+
+	p = br_port_get_rtnl(dev);
+	if (!p || p->br != br || p->state == BR_STATE_DISABLED)
+		return -EINVAL;
+
+	ip.proto = entry->addr.proto;
+	if (ip.proto == htons(ETH_P_IP))
+		ip.u.ip4 = entry->addr.u.ip4;
+#if IS_ENABLED(CONFIG_IPV6)
+	else
+		ip.u.ip6 = entry->addr.u.ip6;
+#endif
+
+	spin_lock_bh(&br->multicast_lock);
+	ret = br_mdb_add_group(br, p, &ip);
+	spin_unlock_bh(&br->multicast_lock);
+	return ret;
+}
+
+static int br_mdb_add(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg)
+{
+	struct net *net = sock_net(skb->sk);
+	struct br_mdb_entry *entry;
+	struct net_device *dev;
+	struct net_bridge *br;
+	int err;
+
+	err = br_mdb_parse(skb, nlh, &dev, &entry);
+	if (err < 0)
+		return err;
+
+	br = netdev_priv(dev);
+
+	err = __br_mdb_add(net, br, entry);
+	if (!err)
+		__br_mdb_notify(dev, entry, RTM_NEWMDB);
+	return err;
+}
+
+static int __br_mdb_del(struct net_bridge *br, struct br_mdb_entry *entry)
+{
+	struct net_bridge_mdb_htable *mdb;
+	struct net_bridge_mdb_entry *mp;
+	struct net_bridge_port_group *p;
+	struct net_bridge_port_group __rcu **pp;
+	struct br_ip ip;
+	int err = -EINVAL;
+
+	if (!netif_running(br->dev) || br->multicast_disabled)
+		return -EINVAL;
+
+	if (timer_pending(&br->multicast_querier_timer))
+		return -EBUSY;
+
+	ip.proto = entry->addr.proto;
+	if (ip.proto == htons(ETH_P_IP))
+		ip.u.ip4 = entry->addr.u.ip4;
+#if IS_ENABLED(CONFIG_IPV6)
+	else
+		ip.u.ip6 = entry->addr.u.ip6;
+#endif
+
+	spin_lock_bh(&br->multicast_lock);
+	mdb = mlock_dereference(br->mdb, br);
+
+	mp = br_mdb_ip_get(mdb, &ip);
+	if (!mp)
+		goto unlock;
+
+	for (pp = &mp->ports;
+	     (p = mlock_dereference(*pp, br)) != NULL;
+	     pp = &p->next) {
+		if (!p->port || p->port->dev->ifindex != entry->ifindex)
+			continue;
+
+		if (p->port->state == BR_STATE_DISABLED)
+			goto unlock;
+
+		rcu_assign_pointer(*pp, p->next);
+		hlist_del_init(&p->mglist);
+		del_timer(&p->timer);
+		call_rcu_bh(&p->rcu, br_multicast_free_pg);
+		err = 0;
+
+		if (!mp->ports && !mp->mglist &&
+		    netif_running(br->dev))
+			mod_timer(&mp->timer, jiffies);
+		break;
+	}
+
+unlock:
+	spin_unlock_bh(&br->multicast_lock);
+	return err;
+}
+
+static int br_mdb_del(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg)
+{
+	struct net_device *dev;
+	struct br_mdb_entry *entry;
+	struct net_bridge *br;
+	int err;
+
+	err = br_mdb_parse(skb, nlh, &dev, &entry);
+	if (err < 0)
+		return err;
+
+	br = netdev_priv(dev);
+
+	err = __br_mdb_del(br, entry);
+	if (!err)
+		__br_mdb_notify(dev, entry, RTM_DELMDB);
+	return err;
+}
+
 void br_mdb_init(void)
 {
 	rtnl_register(PF_BRIDGE, RTM_GETMDB, NULL, br_mdb_dump, NULL);
+	rtnl_register(PF_BRIDGE, RTM_NEWMDB, br_mdb_add, NULL, NULL);
+	rtnl_register(PF_BRIDGE, RTM_DELMDB, br_mdb_del, NULL, NULL);
 }
diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
index d929586..977c3ee 100644
--- a/net/bridge/br_multicast.c
+++ b/net/bridge/br_multicast.c
@@ -27,27 +27,14 @@
 #if IS_ENABLED(CONFIG_IPV6)
 #include <net/ipv6.h>
 #include <net/mld.h>
-#include <net/addrconf.h>
 #include <net/ip6_checksum.h>
 #endif
 
 #include "br_private.h"
 
-#define mlock_dereference(X, br) \
-	rcu_dereference_protected(X, lockdep_is_held(&br->multicast_lock))
-
 static void br_multicast_start_querier(struct net_bridge *br);
 unsigned int br_mdb_rehash_seq;
 
-#if IS_ENABLED(CONFIG_IPV6)
-static inline int ipv6_is_transient_multicast(const struct in6_addr *addr)
-{
-	if (ipv6_addr_is_multicast(addr) && IPV6_ADDR_MC_FLAG_TRANSIENT(addr))
-		return 1;
-	return 0;
-}
-#endif
-
 static inline int br_ip_equal(const struct br_ip *a, const struct br_ip *b)
 {
 	if (a->proto != b->proto)
@@ -104,8 +91,8 @@ static struct net_bridge_mdb_entry *__br_mdb_ip_get(
 	return NULL;
 }
 
-static struct net_bridge_mdb_entry *br_mdb_ip_get(
-	struct net_bridge_mdb_htable *mdb, struct br_ip *dst)
+struct net_bridge_mdb_entry *br_mdb_ip_get(struct net_bridge_mdb_htable *mdb,
+					   struct br_ip *dst)
 {
 	if (!mdb)
 		return NULL;
@@ -208,7 +195,7 @@ static int br_mdb_copy(struct net_bridge_mdb_htable *new,
 	return maxlen > elasticity ? -EINVAL : 0;
 }
 
-static void br_multicast_free_pg(struct rcu_head *head)
+void br_multicast_free_pg(struct rcu_head *head)
 {
 	struct net_bridge_port_group *p =
 		container_of(head, struct net_bridge_port_group, rcu);
@@ -584,9 +571,8 @@ err:
 	return mp;
 }
 
-static struct net_bridge_mdb_entry *br_multicast_new_group(
-	struct net_bridge *br, struct net_bridge_port *port,
-	struct br_ip *group)
+struct net_bridge_mdb_entry *br_multicast_new_group(struct net_bridge *br,
+	struct net_bridge_port *port, struct br_ip *group)
 {
 	struct net_bridge_mdb_htable *mdb;
 	struct net_bridge_mdb_entry *mp;
@@ -633,6 +619,26 @@ out:
 	return mp;
 }
 
+struct net_bridge_port_group *br_multicast_new_port_group(
+			struct net_bridge_port *port,
+			struct br_ip *group,
+			struct net_bridge_port_group *next)
+{
+	struct net_bridge_port_group *p;
+
+	p = kzalloc(sizeof(*p), GFP_ATOMIC);
+	if (unlikely(!p))
+		return NULL;
+
+	p->addr = *group;
+	p->port = port;
+	p->next = next;
+	hlist_add_head(&p->mglist, &port->mglist);
+	setup_timer(&p->timer, br_multicast_port_group_expired,
+		    (unsigned long)p);
+	return p;
+}
+
 static int br_multicast_add_group(struct net_bridge *br,
 				  struct net_bridge_port *port,
 				  struct br_ip *group)
@@ -668,18 +674,9 @@ static int br_multicast_add_group(struct net_bridge *br,
 			break;
 	}
 
-	p = kzalloc(sizeof(*p), GFP_ATOMIC);
-	err = -ENOMEM;
+	p = br_multicast_new_port_group(port, group, *pp);
 	if (unlikely(!p))
 		goto err;
-
-	p->addr = *group;
-	p->port = port;
-	p->next = *pp;
-	hlist_add_head(&p->mglist, &port->mglist);
-	setup_timer(&p->timer, br_multicast_port_group_expired,
-		    (unsigned long)p);
-
 	rcu_assign_pointer(*pp, p);
 	br_mdb_notify(br->dev, port, group, RTM_NEWMDB);
 
diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index 2807c76..f21a739 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -434,10 +434,33 @@ extern int br_multicast_set_port_router(struct net_bridge_port *p,
 extern int br_multicast_toggle(struct net_bridge *br, unsigned long val);
 extern int br_multicast_set_querier(struct net_bridge *br, unsigned long val);
 extern int br_multicast_set_hash_max(struct net_bridge *br, unsigned long val);
+extern struct net_bridge_mdb_entry *br_mdb_ip_get(
+				struct net_bridge_mdb_htable *mdb,
+				struct br_ip *dst);
+extern struct net_bridge_mdb_entry *br_multicast_new_group(struct net_bridge *br,
+				struct net_bridge_port *port, struct br_ip *group);
+extern void br_multicast_free_pg(struct rcu_head *head);
+extern struct net_bridge_port_group *br_multicast_new_port_group(
+				struct net_bridge_port *port,
+				struct br_ip *group,
+				struct net_bridge_port_group *next);
 extern void br_mdb_init(void);
 extern void br_mdb_notify(struct net_device *dev, struct net_bridge_port *port,
 			  struct br_ip *group, int type);
 
+#define mlock_dereference(X, br) \
+	rcu_dereference_protected(X, lockdep_is_held(&br->multicast_lock))
+
+#if IS_ENABLED(CONFIG_IPV6)
+#include <net/addrconf.h>
+static inline int ipv6_is_transient_multicast(const struct in6_addr *addr)
+{
+	if (ipv6_addr_is_multicast(addr) && IPV6_ADDR_MC_FLAG_TRANSIENT(addr))
+		return 1;
+	return 0;
+}
+#endif
+
 static inline bool br_multicast_is_router(struct net_bridge *br)
 {
 	return br->multicast_router == 2 ||
-- 
1.7.7.6

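The IPv4 branch of is_valid_mdb_entry() in the patch above accepts a group only if it is multicast and not in the link-local 224.0.0.0/24 block. A userspace sketch of the same test is given here for illustration. The function names are hypothetical stand-ins; the masks are written to mirror the kernel helpers ipv4_is_multicast() and ipv4_is_local_multicast() as I understand them, and addresses are taken in network byte order.

```c
#include <arpa/inet.h>
#include <stdbool.h>
#include <stdint.h>

/* True for 224.0.0.0/4, i.e. any IPv4 multicast address. */
static bool is_multicast_v4(uint32_t addr_be)
{
	return (addr_be & htonl(0xf0000000)) == htonl(0xe0000000);
}

/* True for 224.0.0.0/24, the link-local control block. */
static bool is_local_multicast_v4(uint32_t addr_be)
{
	return (addr_be & htonl(0xffffff00)) == htonl(0xe0000000);
}

/* A group is acceptable if it is multicast but not link-local multicast. */
static bool is_valid_mdb_group_v4(uint32_t addr_be)
{
	return is_multicast_v4(addr_be) && !is_local_multicast_v4(addr_be);
}
```

Under this rule 239.1.1.1 is accepted, while 224.0.0.1 (link-local) and 10.0.0.1 (unicast) are rejected, which is why a request to add those groups would be expected to fail with -EINVAL in br_mdb_parse().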

* [PATCH net-next 1/2] bridge: notify mdb changes via netlink
From: Cong Wang @ 2012-12-12  8:23 UTC (permalink / raw)
  To: netdev
  Cc: bridge, Cong Wang, Herbert Xu, Stephen Hemminger, David S. Miller,
	Thomas Graf

From: Cong Wang <amwang@redhat.com>

As Stephen mentioned, we need to monitor the mdb
changes in user-space, so add notifications via netlink too.

Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Cong Wang <amwang@redhat.com>
---
 include/uapi/linux/rtnetlink.h |    6 +++
 net/bridge/br_mdb.c            |   80 ++++++++++++++++++++++++++++++++++++++++
 net/bridge/br_multicast.c      |    2 +
 net/bridge/br_private.h        |    2 +
 4 files changed, 90 insertions(+), 0 deletions(-)

diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
index 354a1e7..7a5eb19 100644
--- a/include/uapi/linux/rtnetlink.h
+++ b/include/uapi/linux/rtnetlink.h
@@ -125,6 +125,10 @@ enum {
 	RTM_GETNETCONF = 82,
 #define RTM_GETNETCONF RTM_GETNETCONF
 
+	RTM_NEWMDB = 84,
+#define RTM_NEWMDB RTM_NEWMDB
+	RTM_DELMDB = 85,
+#define RTM_DELMDB RTM_DELMDB
 	RTM_GETMDB = 86,
 #define RTM_GETMDB RTM_GETMDB
 
@@ -607,6 +611,8 @@ enum rtnetlink_groups {
 #define RTNLGRP_IPV4_NETCONF	RTNLGRP_IPV4_NETCONF
 	RTNLGRP_IPV6_NETCONF,
 #define RTNLGRP_IPV6_NETCONF	RTNLGRP_IPV6_NETCONF
+	RTNLGRP_MDB,
+#define RTNLGRP_MDB		RTNLGRP_MDB
 	__RTNLGRP_MAX
 };
 #define RTNLGRP_MAX	(__RTNLGRP_MAX - 1)
diff --git a/net/bridge/br_mdb.c b/net/bridge/br_mdb.c
index ccc43a9..a8cfbf5 100644
--- a/net/bridge/br_mdb.c
+++ b/net/bridge/br_mdb.c
@@ -155,6 +155,86 @@ out:
 	return skb->len;
 }
 
+static int nlmsg_populate_mdb_fill(struct sk_buff *skb,
+				   struct net_device *dev,
+				   struct br_mdb_entry *entry, u32 pid,
+				   u32 seq, int type, unsigned int flags)
+{
+	struct nlmsghdr *nlh;
+	struct br_port_msg *bpm;
+	struct nlattr *nest, *nest2;
+
+	nlh = nlmsg_put(skb, pid, seq, type, sizeof(*bpm), NLM_F_MULTI);
+	if (!nlh)
+		return -EMSGSIZE;
+
+	bpm = nlmsg_data(nlh);
+	bpm->family  = AF_BRIDGE;
+	bpm->ifindex = dev->ifindex;
+	nest = nla_nest_start(skb, MDBA_MDB);
+	if (nest == NULL)
+		goto cancel;
+	nest2 = nla_nest_start(skb, MDBA_MDB_ENTRY);
+	if (nest2 == NULL)
+		goto end;
+
+	if (nla_put(skb, MDBA_MDB_ENTRY_INFO, sizeof(*entry), entry))
+		goto end;
+
+	nla_nest_end(skb, nest2);
+	nla_nest_end(skb, nest);
+	return nlmsg_end(skb, nlh);
+
+end:
+	nla_nest_end(skb, nest);
+cancel:
+	nlmsg_cancel(skb, nlh);
+	return -EMSGSIZE;
+}
+
+static inline size_t rtnl_mdb_nlmsg_size(void)
+{
+	return NLMSG_ALIGN(sizeof(struct br_port_msg))
+		+ nla_total_size(sizeof(struct br_mdb_entry));
+}
+
+static void __br_mdb_notify(struct net_device *dev, struct br_mdb_entry *entry,
+			    int type)
+{
+	struct net *net = dev_net(dev);
+	struct sk_buff *skb;
+	int err = -ENOBUFS;
+
+	skb = nlmsg_new(rtnl_mdb_nlmsg_size(), GFP_ATOMIC);
+	if (!skb)
+		goto errout;
+
+	err = nlmsg_populate_mdb_fill(skb, dev, entry, 0, 0, type, NTF_SELF);
+	if (err < 0) {
+		kfree_skb(skb);
+		goto errout;
+	}
+
+	rtnl_notify(skb, net, 0, RTNLGRP_MDB, NULL, GFP_ATOMIC);
+	return;
+errout:
+	rtnl_set_sk_err(net, RTNLGRP_MDB, err);
+}
+
+void br_mdb_notify(struct net_device *dev, struct net_bridge_port *port,
+		   struct br_ip *group, int type)
+{
+	struct br_mdb_entry entry;
+
+	entry.ifindex = port->dev->ifindex;
+	entry.addr.proto = group->proto;
+	entry.addr.u.ip4 = group->u.ip4;
+#if IS_ENABLED(CONFIG_IPV6)
+	entry.addr.u.ip6 = group->u.ip6;
+#endif
+	__br_mdb_notify(dev, &entry, type);
+}
+
 void br_mdb_init(void)
 {
 	rtnl_register(PF_BRIDGE, RTM_GETMDB, NULL, br_mdb_dump, NULL);
diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
index 847b98a1..d929586 100644
--- a/net/bridge/br_multicast.c
+++ b/net/bridge/br_multicast.c
@@ -681,6 +681,7 @@ static int br_multicast_add_group(struct net_bridge *br,
 		    (unsigned long)p);
 
 	rcu_assign_pointer(*pp, p);
+	br_mdb_notify(br->dev, port, group, RTM_NEWMDB);
 
 found:
 	mod_timer(&p->timer, now + br->multicast_membership_interval);
@@ -1240,6 +1241,7 @@ static void br_multicast_leave_group(struct net_bridge *br,
 			hlist_del_init(&p->mglist);
 			del_timer(&p->timer);
 			call_rcu_bh(&p->rcu, br_multicast_free_pg);
+			br_mdb_notify(br->dev, port, group, RTM_DELMDB);
 
 			if (!mp->ports && !mp->mglist &&
 			    netif_running(br->dev))
diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index f95b766..2807c76 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -435,6 +435,8 @@ extern int br_multicast_toggle(struct net_bridge *br, unsigned long val);
 extern int br_multicast_set_querier(struct net_bridge *br, unsigned long val);
 extern int br_multicast_set_hash_max(struct net_bridge *br, unsigned long val);
 extern void br_mdb_init(void);
+extern void br_mdb_notify(struct net_device *dev, struct net_bridge_port *port,
+			  struct br_ip *group, int type);
 
 static inline bool br_multicast_is_router(struct net_bridge *br)
 {
-- 
1.7.7.6

^ permalink raw reply related

* [PATCH 2/2] iproute2: add support to monitor mdb entries too
From: Cong Wang @ 2012-12-12  8:23 UTC (permalink / raw)
  To: netdev; +Cc: bridge, Cong Wang, Stephen Hemminger, Thomas Graf
In-Reply-To: <1355300590-2390-1-git-send-email-amwang@redhat.com>

From: Cong Wang <amwang@redhat.com>

This patch implements `bridge monitor mdb`.
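
For readers unfamiliar with how a monitor picks a multicast group: with
legacy netlink binding, group N occupies bit N-1 of a 32-bit mask. The
following is only a minimal sketch of that mapping; `group_to_mask` is a
hypothetical name used for illustration and is not part of this patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (illustration only): map a netlink multicast
 * group number to the bit it occupies in a 32-bit bind mask.
 * Group 0 means "no group". Groups above 32 do not fit in the mask. */
static uint32_t group_to_mask(unsigned int group)
{
	assert(group <= 32);
	return group ? (UINT32_C(1) << (group - 1)) : 0;
}
```

A monitor would OR the resulting bit into its subscription mask before
binding the socket.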

Cc: Stephen Hemminger <shemminger@vyatta.com>
Cc: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Cong Wang <amwang@redhat.com>

---
 bridge/br_common.h        |    2 ++
 bridge/mdb.c              |    4 ++--
 bridge/monitor.c          |   14 ++++++++++++++
 include/linux/rtnetlink.h |    2 ++
 4 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/bridge/br_common.h b/bridge/br_common.h
index 892fb76..10f6ce9 100644
--- a/bridge/br_common.h
+++ b/bridge/br_common.h
@@ -3,6 +3,8 @@ extern int print_linkinfo(const struct sockaddr_nl *who,
 			  void *arg);
 extern int print_fdb(const struct sockaddr_nl *who,
 		     struct nlmsghdr *n, void *arg);
+extern int print_mdb(const struct sockaddr_nl *who,
+		     struct nlmsghdr *n, void *arg);
 
 extern int do_fdb(int argc, char **argv);
 extern int do_mdb(int argc, char **argv);
diff --git a/bridge/mdb.c b/bridge/mdb.c
index 4d8a896..121ce9c 100644
--- a/bridge/mdb.c
+++ b/bridge/mdb.c
@@ -82,8 +82,8 @@ int print_mdb(const struct sockaddr_nl *who, struct nlmsghdr *n, void *arg)
 	int len = n->nlmsg_len;
 	struct rtattr * tb[MDBA_MAX+1];
 
-	if (n->nlmsg_type != RTM_GETMDB) {
-		fprintf(stderr, "Not RTM_GETMDB: %08x %08x %08x\n",
+	if (n->nlmsg_type != RTM_GETMDB && n->nlmsg_type != RTM_NEWMDB && n->nlmsg_type != RTM_DELMDB) {
+		fprintf(stderr, "Not RTM_GETMDB, RTM_NEWMDB or RTM_DELMDB: %08x %08x %08x\n",
 			n->nlmsg_len, n->nlmsg_type, n->nlmsg_flags);
 
 		return 0;
diff --git a/bridge/monitor.c b/bridge/monitor.c
index 2f60655..44e14d8 100644
--- a/bridge/monitor.c
+++ b/bridge/monitor.c
@@ -68,6 +68,12 @@ int accept_msg(const struct sockaddr_nl *who,
 			fprintf(fp, "[NEIGH]");
 		return print_fdb(who, n, arg);
 
+	case RTM_NEWMDB:
+	case RTM_DELMDB:
+		if (prefix_banner)
+			fprintf(fp, "[MDB]");
+		return print_mdb(who, n, arg);
+
 	case 15:
 		return show_mark(fp, n);
 
@@ -84,6 +90,7 @@ int do_monitor(int argc, char **argv)
 	unsigned groups = ~RTMGRP_TC;
 	int llink=0;
 	int lneigh=0;
+	int lmdb=0;
 
 	rtnl_close(&rth);
 
@@ -97,6 +104,9 @@ int do_monitor(int argc, char **argv)
 		} else if (matches(*argv, "fdb") == 0) {
 			lneigh = 1;
 			groups = 0;
+		} else if (matches(*argv, "mdb") == 0) {
+			lmdb = 1;
+			groups = 0;
 		} else if (strcmp(*argv, "all") == 0) {
 			groups = ~RTMGRP_TC;
 			prefix_banner=1;
@@ -116,6 +126,10 @@ int do_monitor(int argc, char **argv)
 		groups |= nl_mgrp(RTNLGRP_NEIGH);
 	}
 
+	if (lmdb) {
+		groups |= nl_mgrp(RTNLGRP_MDB);
+	}
+
 	if (file) {
 		FILE *fp;
 		fp = fopen(file, "r");
diff --git a/include/linux/rtnetlink.h b/include/linux/rtnetlink.h
index 3ea85dc..87452b4 100644
--- a/include/linux/rtnetlink.h
+++ b/include/linux/rtnetlink.h
@@ -609,6 +609,8 @@ enum rtnetlink_groups {
 #define RTNLGRP_IPV4_NETCONF	RTNLGRP_IPV4_NETCONF
 	RTNLGRP_IPV6_NETCONF,
 #define RTNLGRP_IPV6_NETCONF	RTNLGRP_IPV6_NETCONF
+	RTNLGRP_MDB,
+#define RTNLGRP_MDB		RTNLGRP_MDB
 	__RTNLGRP_MAX
 };
 #define RTNLGRP_MAX	(__RTNLGRP_MAX - 1)
-- 
1.7.7.6

^ permalink raw reply related

* [PATCH 1/2] iproute2: implement add/del mdb entry
From: Cong Wang @ 2012-12-12  8:23 UTC (permalink / raw)
  To: netdev; +Cc: Thomas Graf, Stephen Hemminger, bridge, Cong Wang
In-Reply-To: <1355300590-2390-1-git-send-email-amwang@redhat.com>

From: Cong Wang <amwang@redhat.com>

This patch implements:

	bridge mdb { add | del } dev DEV port PORT grp GROUP
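
As a rough illustration of how GROUP can be interpreted, the sketch below
tries IPv4 first and then IPv6, and records the matching EtherType. The
struct and function names (`sketch_group`, `sketch_parse_group`) and the
SKETCH_* constants are assumptions made for this example; they only mirror
the shape of the address part of struct br_mdb_entry.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>

/* Illustrative constants; the values match ETH_P_IP and ETH_P_IPV6. */
#define SKETCH_ETH_P_IP   0x0800
#define SKETCH_ETH_P_IPV6 0x86DD

/* Hypothetical stand-in for the address part of an mdb entry. */
struct sketch_group {
	uint16_t proto;              /* EtherType, network byte order */
	union {
		struct in_addr  ip4;
		struct in6_addr ip6;
	} u;
};

/* Parse s as IPv4 first, then as IPv6.
 * Returns 0 on success, -1 if the string is neither. */
static int sketch_parse_group(const char *s, struct sketch_group *g)
{
	assert(s != NULL && g != NULL);

	memset(g, 0, sizeof(*g));
	if (inet_pton(AF_INET, s, &g->u.ip4) == 1) {
		g->proto = htons(SKETCH_ETH_P_IP);
		return 0;
	}
	if (inet_pton(AF_INET6, s, &g->u.ip6) == 1) {
		g->proto = htons(SKETCH_ETH_P_IPV6);
		return 0;
	}
	return -1;
}
```

Checking for a return value of exactly 1 also treats the -1 error return
of inet_pton() as a failure.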

Cc: Stephen Hemminger <shemminger@vyatta.com>
Cc: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Cong Wang <amwang@redhat.com>

---
 bridge/mdb.c              |   76 +++++++++++++++++++++++++++++++++++++++++++++
 include/linux/if_bridge.h |    8 +++++
 include/linux/rtnetlink.h |    4 ++
 3 files changed, 88 insertions(+), 0 deletions(-)

diff --git a/bridge/mdb.c b/bridge/mdb.c
index 390d7f6..4d8a896 100644
--- a/bridge/mdb.c
+++ b/bridge/mdb.c
@@ -28,6 +28,7 @@ int filter_index;
 
 static void usage(void)
 {
+	fprintf(stderr, "Usage: bridge mdb { add | del } dev DEV port PORT grp GROUP\n");
 	fprintf(stderr, "       bridge mdb {show} [ dev DEV ]\n");
 	exit(-1);
 }
@@ -153,11 +154,86 @@ static int mdb_show(int argc, char **argv)
 	return 0;
 }
 
+static int mdb_modify(int cmd, int flags, int argc, char **argv)
+{
+	struct {
+		struct nlmsghdr 	n;
+		struct br_port_msg	bpm;
+		char   			buf[1024];
+	} req;
+	struct br_mdb_entry entry;
+	char *d = NULL, *p = NULL, *grp = NULL;
+
+	memset(&req, 0, sizeof(req));
+	memset(&entry, 0, sizeof(entry));
+
+	req.n.nlmsg_len = NLMSG_LENGTH(sizeof(struct br_port_msg));
+	req.n.nlmsg_flags = NLM_F_REQUEST|flags;
+	req.n.nlmsg_type = cmd;
+	req.bpm.family = PF_BRIDGE;
+
+	while (argc > 0) {
+		if (strcmp(*argv, "dev") == 0) {
+			NEXT_ARG();
+			d = *argv;
+		} else if (strcmp(*argv, "grp") == 0) {
+			NEXT_ARG();
+			grp = *argv;
+		} else {
+			if (strcmp(*argv, "port") == 0) {
+				NEXT_ARG();
+				p = *argv;
+			}
+			if (matches(*argv, "help") == 0)
+				usage();
+		}
+		argc--; argv++;
+	}
+
+	if (d == NULL || grp == NULL || p == NULL) {
+		fprintf(stderr, "Device, group address and port name are required arguments.\n");
+		exit(-1);
+	}
+
+	req.bpm.ifindex = ll_name_to_index(d);
+	if (req.bpm.ifindex == 0) {
+		fprintf(stderr, "Cannot find device \"%s\"\n", d);
+		return -1;
+	}
+
+	entry.ifindex = ll_name_to_index(p);
+	if (entry.ifindex == 0) {
+		fprintf(stderr, "Cannot find device \"%s\"\n", p);
+		return -1;
+	}
+
+	if (!inet_pton(AF_INET, grp, &entry.addr.u.ip4)) {
+		if (!inet_pton(AF_INET6, grp, &entry.addr.u.ip6)) {
+			fprintf(stderr, "Invalid address \"%s\"\n", grp);
+			return -1;
+		} else
+			entry.addr.proto = htons(ETH_P_IPV6);
+	} else
+		entry.addr.proto = htons(ETH_P_IP);
+
+	addattr_l(&req.n, sizeof(req), MDBA_SET_ENTRY, &entry, sizeof(entry));
+
+	if (rtnl_talk(&rth, &req.n, 0, 0, NULL) < 0)
+		exit(2);
+
+	return 0;
+}
+
 int do_mdb(int argc, char **argv)
 {
 	ll_init_map(&rth);
 
 	if (argc > 0) {
+		if (matches(*argv, "add") == 0)
+			return mdb_modify(RTM_NEWMDB, NLM_F_CREATE|NLM_F_EXCL, argc-1, argv+1);
+		if (matches(*argv, "delete") == 0)
+			return mdb_modify(RTM_DELMDB, 0, argc-1, argv+1);
+
 		if (matches(*argv, "show") == 0 ||
 		    matches(*argv, "lst") == 0 ||
 		    matches(*argv, "list") == 0)
diff --git a/include/linux/if_bridge.h b/include/linux/if_bridge.h
index 151a8bb..b3b6a67 100644
--- a/include/linux/if_bridge.h
+++ b/include/linux/if_bridge.h
@@ -157,6 +157,7 @@ enum {
 #define MDBA_ROUTER_MAX (__MDBA_ROUTER_MAX - 1)
 
 struct br_port_msg {
+	__u8  family;
 	__u32 ifindex;
 };
 
@@ -171,4 +172,11 @@ struct br_mdb_entry {
 	} addr;
 };
 
+enum {
+	MDBA_SET_ENTRY_UNSPEC,
+	MDBA_SET_ENTRY,
+	__MDBA_SET_ENTRY_MAX,
+};
+#define MDBA_SET_ENTRY_MAX (__MDBA_SET_ENTRY_MAX - 1)
+
 #endif /* _LINUX_IF_BRIDGE_H */
diff --git a/include/linux/rtnetlink.h b/include/linux/rtnetlink.h
index c82a159..3ea85dc 100644
--- a/include/linux/rtnetlink.h
+++ b/include/linux/rtnetlink.h
@@ -125,6 +125,10 @@ enum {
 	RTM_GETNETCONF = 82,
 #define RTM_GETNETCONF RTM_GETNETCONF
 
+	RTM_NEWMDB = 84,
+#define RTM_NEWMDB RTM_NEWMDB
+	RTM_DELMDB = 85,
+#define RTM_DELMDB RTM_DELMDB
 	RTM_GETMDB = 86,
 #define RTM_GETMDB RTM_GETMDB
 
-- 
1.7.7.6

^ permalink raw reply related

* Re: [PATCH net-next v5] bridge: export multicast database via netlink
From: Cong Wang @ 2012-12-12  7:59 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: netdev, bridge, Herbert Xu, Jesper Dangaard Brouer, Thomas Graf,
	David S. Miller
In-Reply-To: <20121211164856.37ce94fe@nehalam.linuxnetplumber.net>

On Tue, 2012-12-11 at 16:48 -0800, Stephen Hemminger wrote:    
> 
> Applied, but required some manual fixing. It required adding if_bridge.h
> to include/linux in iproute2 exported headers. Also patch still had some fuzz
> against current version.
> 

Thanks, Stephen!

I thought those headers were synced with the kernel headers automatically,
but it seems we have to keep them up to date manually.

^ permalink raw reply

* Re: [PATCH net-next] pkt_sched: avoid requeues if possible
From: David Miller @ 2012-12-12  5:24 UTC (permalink / raw)
  To: erdnetdev; +Cc: netdev, jhs, john.r.fastabend
In-Reply-To: <1355277273.27891.166.camel@edumazet-glaptop>

From: Eric Dumazet <erdnetdev@gmail.com>
Date: Tue, 11 Dec 2012 17:54:33 -0800

> From: Eric Dumazet <edumazet@google.com>
> 
> With BQL being deployed, we can more likely have following behavior :
> 
> We dequeue a packet from qdisc in dequeue_skb(), then we realize target
> tx queue is in XOFF state in sch_direct_xmit(), and we have to hold the
> skb into gso_skb for later.
> 
> This shows in stats (tc -s qdisc dev eth0) as requeues.
> 
> Problem of these requeues is that high priority packets can not be
> dequeued as long as this (possibly low prio and big TSO packet) is not
> removed from gso_skb.
> 
> At 1Gbps speed, a full size TSO packet is 500 us of extra latency.
> 
> In some cases, we know that all packets dequeued from a qdisc are
> for a particular and known txq :
> 
> - If device is non multi queue
> - For all MQ/MQPRIO slave qdiscs
> 
> This patch introduces a new qdisc flag, TCQ_F_ONETXQUEUE to mark
> this capability, so that dequeue_skb() is allowed to dequeue a packet
> only if the associated txq is not stopped.
> 
> This indeed reduce latencies for high prio packets (or improve fairness
> with sfq/fq_codel), and almost remove qdisc 'requeues'.
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Applied.

^ permalink raw reply

* Re: [PATCH] solos-pci: fix double-free of TX skb in DMA mode
From: David Miller @ 2012-12-12  5:24 UTC (permalink / raw)
  To: dwmw2; +Cc: netdev, nathan
In-Reply-To: <20121212.002345.2154152659215725592.davem@davemloft.net>

From: David Miller <davem@davemloft.net>
Date: Wed, 12 Dec 2012 00:23:45 -0500 (EST)

> From: David Woodhouse <dwmw2@infradead.org>
> Date: Wed, 12 Dec 2012 00:57:14 +0000
> 
>> We weren't clearing card->tx_skb[port] when processing the TX done interrupt.
>> If there wasn't another skb ready to transmit immediately, this led to a
>> double-free because we'd free it *again* next time we did have a packet to
>> send.
>> 
>> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
>> Cc: stable@kernel.org
> 
> Acked-by: David S. Miller <davem@davemloft.net>

Sorry, fingers slipped, I meant "Applied" :-)

^ permalink raw reply

* Re: [PATCH] solos-pci: fix double-free of TX skb in DMA mode
From: David Miller @ 2012-12-12  5:23 UTC (permalink / raw)
  To: dwmw2; +Cc: netdev, nathan
In-Reply-To: <1355273834.23544.37.camel@shinybook.infradead.org>

From: David Woodhouse <dwmw2@infradead.org>
Date: Wed, 12 Dec 2012 00:57:14 +0000

> We weren't clearing card->tx_skb[port] when processing the TX done interrupt.
> If there wasn't another skb ready to transmit immediately, this led to a
> double-free because we'd free it *again* next time we did have a packet to
> send.
> 
> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
> Cc: stable@kernel.org

Acked-by: David S. Miller <davem@davemloft.net>

^ permalink raw reply

* Re: [PATCH 4/5] net: sfc: fix return value check in efx_ptp_probe_channel().
From: David Miller @ 2012-12-12  5:15 UTC (permalink / raw)
  To: tipecaml
  Cc: linux-kernel, kernel-janitors, linux-net-drivers, bhutchings,
	netdev
In-Reply-To: <1355271894-5284-5-git-send-email-tipecaml@gmail.com>

From: Cyril Roelandt <tipecaml@gmail.com>
Date: Wed, 12 Dec 2012 01:24:53 +0100

> The ptp_clock_register() returns ERR_PTR() and never returns NULL. Replace the
> NULL check by a call to IS_ERR().
> 
> Signed-off-by: Cyril Roelandt <tipecaml@gmail.com>

I'll let Ben queue this up.

Probably he'll want to avoid potentially leaving an ERR_PTR
in ptp->phc_clock even if, with this fix, that would be
harmless.
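
To make the point concrete, here is a userspace sketch of the pattern,
under stated assumptions: the sketch_* helpers re-create the kernel's
error-pointer convention for illustration, and `sketch_clock_register` is
a hypothetical stand-in for a function that never returns NULL. It is not
the sfc driver code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace re-creation of the error-pointer convention,
 * for illustration only. */
#define SKETCH_MAX_ERRNO 4095

static void *sketch_err_ptr(long err) { return (void *)(intptr_t)err; }
static long sketch_ptr_err(const void *p) { return (long)(intptr_t)p; }
static int sketch_is_err(const void *p)
{
	/* Encoded errors live in the last SKETCH_MAX_ERRNO addresses. */
	return (uintptr_t)p >= (uintptr_t)-SKETCH_MAX_ERRNO;
}

struct sketch_ptp {
	void *phc_clock;   /* NULL when no clock is registered */
};

/* Hypothetical registration function: returns either a valid pointer
 * or an encoded errno, never NULL. */
static void *sketch_clock_register(int fail)
{
	static int dummy_clock;

	return fail ? sketch_err_ptr(-ENOMEM) : (void *)&dummy_clock;
}

/* Test the result with the IS_ERR-style check, and do not leave the
 * encoded error stored in the structure. */
static int sketch_probe(struct sketch_ptp *ptp, int fail)
{
	void *clk;

	assert(ptp != NULL);
	clk = sketch_clock_register(fail);
	if (sketch_is_err(clk)) {
		ptp->phc_clock = NULL;
		return (int)sketch_ptr_err(clk);
	}
	ptp->phc_clock = clk;
	return 0;
}
```

Using a local variable for the result means later code that tests
phc_clock against NULL keeps working after a failed registration.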

^ permalink raw reply

* Re: [PATCH net-next rfc 2/2] tuntap: allow unpriveledge user to enable and disable queues
From: Jason Wang @ 2012-12-12  3:34 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: pmoore, netdev, linux-kernel, mprivozn
In-Reply-To: <20121211123012.GB15435@redhat.com>

On 12/11/2012 08:30 PM, Michael S. Tsirkin wrote:
> On Tue, Dec 11, 2012 at 07:03:47PM +0800, Jason Wang wrote:
>> Currently, when a file is attached to tuntap through TUNSETQUEUE, the uid/gid
>> and CAP_NET_ADMIN were checked, and we use this ioctl to create and destroy
>> queues. Sometimes, userspace such as qemu need to the ability to enable and
>> disable a specific queue without priveledge since guest operating system may
>> change the number of queues it want use.
>>
>> To support this kind of ability, this patch introduce a flag enabled which is
>> used to track whether the queue is enabled by userspace. And also restrict that
>> only one deivce could be used for a queue to attach. With this patch, the DAC
>> checking when adding queues through IFF_ATTACH_QUEUE is still done and after
>> this, IFF_DETACH_QUEUE/IFF_ATTACH_QUEUE  could be used to disable/enable this
>> queue.
>>
>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>> ---
>>  drivers/net/tun.c |   81 +++++++++++++++++++++++++++++++++++++++++++++++-----
>>  1 files changed, 73 insertions(+), 8 deletions(-)
>>
>> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
>> index d593f56..43831a7 100644
>> --- a/drivers/net/tun.c
>> +++ b/drivers/net/tun.c
>> @@ -138,6 +138,7 @@ struct tun_file {
>>  	/* only used for fasnyc */
>>  	unsigned int flags;
>>  	u16 queue_index;
>> +	bool enabled;
>>  };
>>  
>>  struct tun_flow_entry {
>> @@ -345,9 +346,11 @@ unlock:
>>  static u16 tun_select_queue(struct net_device *dev, struct sk_buff *skb)
>>  {
>>  	struct tun_struct *tun = netdev_priv(dev);
>> +	struct tun_file *tfile;
>>  	struct tun_flow_entry *e;
>>  	u32 txq = 0;
>>  	u32 numqueues = 0;
>> +	int i;
>>  
>>  	rcu_read_lock();
>>  	numqueues = tun->numqueues;
>> @@ -366,6 +369,19 @@ static u16 tun_select_queue(struct net_device *dev, struct sk_buff *skb)
>>  			txq -= numqueues;
>>  	}
>>  
>> +	tfile = rcu_dereference(tun->tfiles[txq]);
>> +	if (unlikely(!tfile->enabled))
> This unlikely tag is suspicious. It should be perfectly
> legal to use less queues than created.

Ok. will remove this check.
>
>> +		/* tun_detach() should make sure there's at least one queue
>> +		 * could be used to do the tranmission.
>> +		 */
>> +		for (i = 0; i < numqueues; i++) {
>> +			tfile = rcu_dereference(tun->tfiles[i]);
>> +			if (tfile->enabled) {
>> +				txq = i;
>> +				break;
>> +			}
>> +		}
>> +
> Worst case this will do a linear scan over all queueus on each packet.
> Instead, I think we need a list of all queues and only install
> the active ones in the array.

Another method is to use an extra variable, e.g. active_queues, to track how
many queues are enabled, and to re-shuffle the pointers during
detaching/attaching so that [0, active_queues) holds the enabled
queues and [active_queues, num_queues) holds the disabled ones. Then we
could avoid this issue.
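
A rough userspace sketch of that layout, assuming nothing about the final
tun.c code: all names here (sketch_dev, sketch_queue, and so on) are
hypothetical, and RCU and locking are left out on purpose.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_QUEUES 8

struct sketch_queue {
	int id;
	unsigned int index;          /* current slot in the array */
};

struct sketch_dev {
	struct sketch_queue *q[SKETCH_MAX_QUEUES];
	unsigned int num_queues;     /* all attached queues */
	unsigned int active_queues;  /* q[0 .. active_queues) are enabled */
};

/* Exchange two slots and keep each queue's back-index in sync. */
static void sketch_swap(struct sketch_dev *d, unsigned int a, unsigned int b)
{
	struct sketch_queue *tmp;

	assert(a < d->num_queues && b < d->num_queues);
	tmp = d->q[a];
	d->q[a] = d->q[b];
	d->q[b] = tmp;
	d->q[a]->index = a;
	d->q[b]->index = b;
}

/* Attach a new queue; it starts out enabled. */
static int sketch_attach(struct sketch_dev *d, struct sketch_queue *sq)
{
	if (d->num_queues == SKETCH_MAX_QUEUES)
		return -1;
	sq->index = d->num_queues;
	d->q[d->num_queues++] = sq;
	/* Move the new queue to the end of the enabled prefix. */
	sketch_swap(d, sq->index, d->active_queues);
	d->active_queues++;
	return 0;
}

static void sketch_disable(struct sketch_dev *d, struct sketch_queue *sq)
{
	if (sq->index >= d->active_queues)
		return;                  /* already disabled: no-op */
	d->active_queues--;
	sketch_swap(d, sq->index, d->active_queues);
}

static void sketch_enable(struct sketch_dev *d, struct sketch_queue *sq)
{
	if (sq->index < d->active_queues)
		return;                  /* already enabled: no-op */
	sketch_swap(d, sq->index, d->active_queues);
	d->active_queues++;
}

/* Queue selection needs no scan: every index below active_queues
 * refers to an enabled queue. Returns -1 if nothing is enabled. */
static int sketch_select(const struct sketch_dev *d, unsigned int hash)
{
	if (d->active_queues == 0)
		return -1;
	return (int)(hash % d->active_queues);
}
```

With this layout, enable and disable cost one swap each, and the
per-packet path only takes a modulo over active_queues.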
>
>>  	rcu_read_unlock();
>>  	return txq;
>>  }
>> @@ -386,6 +402,36 @@ static void tun_set_real_num_queues(struct tun_struct *tun)
>>  	netif_set_real_num_rx_queues(tun->dev, tun->numqueues);
>>  }
>>  
>> +static int tun_enable(struct tun_file *tfile)
>> +{
>> +	if (tfile->enabled == true)
> simply if (tfile->enabled)

Right.
>> +		return -EINVAL;
> Actually it's better to have operations be
> idempotent. If it's enabled, enabling should
> be a NOP not an error.

Ok.
>> +
>> +	tfile->enabled = true;
>> +	return 0;
>> +}
>> +
>> +static int tun_disable(struct tun_file *tfile)
>> +{
>> +	struct tun_struct *tun = rcu_dereference_protected(tfile->tun,
>> +							   lockdep_rtnl_is_held());
>> +	u16 index = tfile->queue_index;
>> +
>> +	if (!tun)
>> +		return -EINVAL;
>> +
>> +	if (tun->numqueues == 1)
>> +		return -EINVAL;
> So if there's a single queue we can't disable it,
> but if there are > 1 we can disable them all.
> This seems arbitrary.
>

The question is whether we should allow userspace to disable all queues,
which looks useless to me. So I tried to forbid this.
>> +
>> +	BUG_ON(index >= tun->numqueues);
>> +	tfile->enabled = false;
>> +
>> +	synchronize_net();
>> +	tun_flow_delete_by_queue(tun, index);
>> +
>> +	return 0;
>> +}
>> +
>>  static void __tun_detach(struct tun_file *tfile, bool clean)
>>  {
>>  	struct tun_file *ntfile;
>> @@ -446,6 +492,7 @@ static void tun_detach_all(struct net_device *dev)
>>  		BUG_ON(!tfile);
>>  		wake_up_all(&tfile->wq.wait);
>>  		rcu_assign_pointer(tfile->tun, NULL);
>> +		tfile->enabled = false;
>>  		--tun->numqueues;
>>  	}
>>  	BUG_ON(tun->numqueues != 0);
>> @@ -490,6 +537,7 @@ static int tun_attach(struct tun_struct *tun, struct file *file)
>>  	rcu_assign_pointer(tun->tfiles[tun->numqueues], tfile);
>>  	sock_hold(&tfile->sk);
>>  	tun->numqueues++;
>> +	tfile->enabled = true;
>>  
>>  	tun_set_real_num_queues(tun);
>>  
>> @@ -672,6 +720,10 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
>>  	if (txq >= tun->numqueues)
>>  		goto drop;
>>  
>> +	/* Drop packet if the queue was not enabled */
>> +	if (!tfile->enabled)
>> +		goto drop;
>> +
>>  	tun_debug(KERN_INFO, tun, "tun_net_xmit %d\n", skb->len);
>>  
>>  	BUG_ON(!tfile);
>> @@ -1010,6 +1062,9 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
>>  	bool zerocopy = false;
>>  	int err;
>>  
>> +	if (!tfile->enabled)
>> +		return -EINVAL;
>> +
>>  	if (!(tun->flags & TUN_NO_PI)) {
>>  		if ((len -= sizeof(pi)) > total_len)
>>  			return -EINVAL;
>> @@ -1199,6 +1254,9 @@ static ssize_t tun_put_user(struct tun_struct *tun,
>>  	struct tun_pi pi = { 0, skb->protocol };
>>  	ssize_t total = 0;
>>  
>> +	if (!tfile->enabled)
>> +		return -EINVAL;
>> +
>>  	if (!(tun->flags & TUN_NO_PI)) {
>>  		if ((len -= sizeof(pi)) < 0)
>>  			return -EINVAL;
>> @@ -1769,15 +1827,21 @@ static int tun_set_queue(struct file *file, struct ifreq *ifr)
>>  		if (dev->netdev_ops != &tap_netdev_ops &&
>>  			dev->netdev_ops != &tun_netdev_ops)
>>  			ret = -EINVAL;
>> -		else if (tun_not_capable(tun))
>> -			ret = -EPERM;
>> -		/* TUNSETIFF is needed to do permission checking */
>> -		else if (tun->numqueues == 0)
>> -			ret = -EPERM;
>> -		else
>> -			ret = tun_attach(tun, file);
>> +		else {
>> +			if (!rcu_dereference(tfile->tun)) {
> Should be rcu_dereference_protected.

True.
>
>> +				if (tun_not_capable(tun) ||
>> +				    tun->numqueues == 0)
>> +					ret = -EPERM;
>> +				else
>> +					ret = tun_attach(tun, file);
>> +			}
>> +			else {
>> +				/* FIXME: permission check? */
>> +				ret = tun_enable(tfile);
>> +			}
>> +		}
>>  	} else if (ifr->ifr_flags & IFF_DETACH_QUEUE)
>> -		__tun_detach(tfile, false);
>> +		tun_disable(tfile);
>>  	else
>>  		ret = -EINVAL;
>>  
>> @@ -2085,6 +2149,7 @@ static int tun_chr_open(struct inode *inode, struct file * file)
>>  	tfile->socket.file = file;
>>  	tfile->socket.ops = &tun_socket_ops;
>>  
>> +	tfile->enabled = false;
>>  	sock_init_data(&tfile->socket, &tfile->sk);
>>  	sk_change_net(&tfile->sk, tfile->net);
>>  
>> -- 
>> 1.7.1
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH net-next rfc 0/2] Allow unpriveledge user to disable tuntap queue
From: Jason Wang @ 2012-12-12  3:29 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: pmoore, netdev, linux-kernel, mprivozn
In-Reply-To: <20121211124616.GC15435@redhat.com>

On 12/11/2012 08:46 PM, Michael S. Tsirkin wrote:
> On Tue, Dec 11, 2012 at 07:03:45PM +0800, Jason Wang wrote:
>> This series is an rfc that tries to solve the issue that the queues of tuntap
>> could not be disabled/enabled by unpriveledged user. This is needed for
>> unpriveledge userspace such as qemu since guest may change the number of queues
>> at any time, qemu needs to configure the tuntap to disable/enable a specific
>> queue.
>>
>> Instead of introducting new flag/ioctls, this series tries to re-use the current
>> TUNSETQUEUE and IFF_ATTACH_QUEUE/IFF_DETACH_QUEUE. After this change,
>> IFF_DETACH_QUEUE is used to disable a specific queue instead of detaching all
>> its state from tuntap. IFF_ATTACH_QUEUE is used to do: 1) creating new queue to
>> a tuntap device, in this situation, previous DAC check is still done. 2)
>> re-enable the queue previously disabled by IFF_DETACH_QUEUE, in this situation,
>> we can bypass some checking when we do during queue creating (the check need to
>> be done here needs discussion.
>>
>> Management software (such as libvirt) then can do:
>> - TUNSETIFF to creating device and queue 0
>> - TUNSETQUEUE to create the rest of queues
>> - Passing them to unpriveledge userspace (such as qemu)
> Sorry I find this somewhat confusing.
> Why doesn't management call TUNSETIFF to create all queues -
> seems cleaner, no? Also has the advantage that it works
> without selinux changes.

The issue is how to return those fds through TUNSETIFF. It looks like
there's no space in ifreq for TUNSETIFF, so we would need another new ioctl to
do this.
>
> So why don't we simply fix TUNSETQUEUE such that
> 1. It only works if already attached to device by TUNSETIFF
> 2. It does not attach/detach, instead simply enables/disables the queue

This is just what this patch does; the only difference is that when
TUNSETQUEUE is called through an fd that is not yet attached to the device,
it is used to create the queue.
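
Roughly, the dispatch looks like the sketch below. This is a simplified
userspace model, not the tun.c code: the names are hypothetical,
`may_create` stands for the outcome of the permission check that only
applies when a queue is created, and enabling an already-enabled queue is
treated as a no-op here, which is a simplification and not what the
posted patch does.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical request codes for this sketch only. */
enum sketch_req { SKETCH_ATTACH, SKETCH_DETACH };

struct sketch_file {
	bool attached;   /* fd already belongs to the device */
	bool enabled;    /* queue currently carries packets */
};

static int sketch_set_queue(struct sketch_file *f, enum sketch_req req,
			    bool may_create)
{
	assert(f != NULL);

	switch (req) {
	case SKETCH_ATTACH:
		if (!f->attached) {
			/* Creating a queue: permission check applies. */
			if (!may_create)
				return -EPERM;
			f->attached = true;
		}
		/* Either freshly created or re-enabled. */
		f->enabled = true;
		return 0;
	case SKETCH_DETACH:
		if (!f->attached)
			return -EINVAL;
		/* Keep the queue attached, just stop using it. */
		f->enabled = false;
		return 0;
	}
	return -EINVAL;
}
```

The point of the model is that only the first attach needs privilege;
later enable and disable calls on the same fd do not.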
> This way no new flags, just tweak the semantics of the
> existing ones. Need to do this before 3.8 is out though
> otherwise we'll end up maintaining the old semantics forever.
>

Yes, I will try to solve this issue soon.
>> Then the unpriveledge userspace can enable and disable a specific queue through
>> IFF_ATTACH_QUEUE and IFF_DETACH_QUEUE.
>>
>> This is done by introducing a enabled flags were used to notify whether the
>> queue is enabled, and tuntap only send/receive packets when it was enabled.
>>
>> Please comment, thanks!
>>
>> Jason Wang (2):
>>   tuntap: forbid calling TUNSETQUEUE for a persistent device with no
>>     queues
>>   tuntap: allow unpriveledge user to enable and disable queues
>>
>>  drivers/net/tun.c |   78 +++++++++++++++++++++++++++++++++++++++++++++++++---
>>  1 files changed, 73 insertions(+), 5 deletions(-)

^ permalink raw reply

* Re: [PATCH] tun: allow setting ethernet addresss while running
From: Jan Engelhardt @ 2012-12-12  3:27 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: davem, netdev, jasowang
In-Reply-To: <1355188560-8388-1-git-send-email-shemminger@vyatta.com>

On Tuesday 2012-12-11 02:16, Stephen Hemminger wrote:

>This is a pure software device, and ok with live address change.
>--- a/drivers/net/tun.c
>+++ b/drivers/net/tun.c
>@@ -849,6 +849,7 @@ static void tun_net_init(struct net_device *dev)
> 		/* Ethernet TAP Device */
> 		ether_setup(dev);
> 		dev->priv_flags &= ~IFF_TX_SKB_SHARING;
>+		dev->priv_flags |= IFF_LIVE_ADDR_CHANGE;
> 
> 		eth_hw_addr_random(dev);

Would this possibly apply to L2TP devices as well?

^ permalink raw reply

* Re: [PATCH net-next 4/7] openvswitch: add ipv6 'set' action
From: Tom Herbert @ 2012-12-12  3:14 UTC (permalink / raw)
  To: Jesse Gross
  Cc: dev-yBygre7rU0TnMu66kgdUjQ, netdev-u79uwXL29TY76Z2rM5mHXA,
	David Miller
In-Reply-To: <1354214149-33651-5-git-send-email-jesse-l0M0P4e3n4LQT0dZR+AlfA@public.gmane.org>

> This patch adds ipv6 set action functionality. It allows to change
> traffic class, flow label, hop-limit, ipv6 source and destination
> address fields.
>
I have to wonder about these patches and the underlying design
direction.  Aren't these sorts of things, and more, already implemented
by iptables in a modular and extensible fashion?  Has there been
any thought given to hooking OVS to iptables to leverage all the existing
functionality?

Thanks,
Tom

> Signed-off-by: Ansis Atteka <aatteka-l0M0P4e3n4LQT0dZR+AlfA@public.gmane.org>
> Signed-off-by: Jesse Gross <jesse-l0M0P4e3n4LQT0dZR+AlfA@public.gmane.org>
> ---
>  net/openvswitch/actions.c  |   93 ++++++++++++++++++++++++++++++++++++++++++++
>  net/openvswitch/datapath.c |   20 ++++++++++
>  2 files changed, 113 insertions(+)
>
> diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
> index 0811447..a58ed27 100644
> --- a/net/openvswitch/actions.c
> +++ b/net/openvswitch/actions.c
> @@ -28,6 +28,7 @@
>  #include <linux/if_arp.h>
>  #include <linux/if_vlan.h>
>  #include <net/ip.h>
> +#include <net/ipv6.h>
>  #include <net/checksum.h>
>  #include <net/dsfield.h>
>
> @@ -162,6 +163,53 @@ static void set_ip_addr(struct sk_buff *skb, struct iphdr *nh,
>         *addr = new_addr;
>  }
>
> +static void update_ipv6_checksum(struct sk_buff *skb, u8 l4_proto,
> +                                __be32 addr[4], const __be32 new_addr[4])
> +{
> +       int transport_len = skb->len - skb_transport_offset(skb);
> +
> +       if (l4_proto == IPPROTO_TCP) {
> +               if (likely(transport_len >= sizeof(struct tcphdr)))
> +                       inet_proto_csum_replace16(&tcp_hdr(skb)->check, skb,
> +                                                 addr, new_addr, 1);
> +       } else if (l4_proto == IPPROTO_UDP) {
> +               if (likely(transport_len >= sizeof(struct udphdr))) {
> +                       struct udphdr *uh = udp_hdr(skb);
> +
> +                       if (uh->check || skb->ip_summed == CHECKSUM_PARTIAL) {
> +                               inet_proto_csum_replace16(&uh->check, skb,
> +                                                         addr, new_addr, 1);
> +                               if (!uh->check)
> +                                       uh->check = CSUM_MANGLED_0;
> +                       }
> +               }
> +       }
> +}
> +
> +static void set_ipv6_addr(struct sk_buff *skb, u8 l4_proto,
> +                         __be32 addr[4], const __be32 new_addr[4],
> +                         bool recalculate_csum)
> +{
> +       if (recalculate_csum)
> +               update_ipv6_checksum(skb, l4_proto, addr, new_addr);
> +
> +       skb->rxhash = 0;
> +       memcpy(addr, new_addr, sizeof(__be32[4]));
> +}
> +
> +static void set_ipv6_tc(struct ipv6hdr *nh, u8 tc)
> +{
> +       nh->priority = tc >> 4;
> +       nh->flow_lbl[0] = (nh->flow_lbl[0] & 0x0F) | ((tc & 0x0F) << 4);
> +}
> +
> +static void set_ipv6_fl(struct ipv6hdr *nh, u32 fl)
> +{
> +       nh->flow_lbl[0] = (nh->flow_lbl[0] & 0xF0) | (fl & 0x000F0000) >> 16;
> +       nh->flow_lbl[1] = (fl & 0x0000FF00) >> 8;
> +       nh->flow_lbl[2] = fl & 0x000000FF;
> +}
> +
>  static void set_ip_ttl(struct sk_buff *skb, struct iphdr *nh, u8 new_ttl)
>  {
>         csum_replace2(&nh->check, htons(nh->ttl << 8), htons(new_ttl << 8));
> @@ -195,6 +243,47 @@ static int set_ipv4(struct sk_buff *skb, const struct ovs_key_ipv4 *ipv4_key)
>         return 0;
>  }
>
> +static int set_ipv6(struct sk_buff *skb, const struct ovs_key_ipv6 *ipv6_key)
> +{
> +       struct ipv6hdr *nh;
> +       int err;
> +       __be32 *saddr;
> +       __be32 *daddr;
> +
> +       err = make_writable(skb, skb_network_offset(skb) +
> +                           sizeof(struct ipv6hdr));
> +       if (unlikely(err))
> +               return err;
> +
> +       nh = ipv6_hdr(skb);
> +       saddr = (__be32 *)&nh->saddr;
> +       daddr = (__be32 *)&nh->daddr;
> +
> +       if (memcmp(ipv6_key->ipv6_src, saddr, sizeof(ipv6_key->ipv6_src)))
> +               set_ipv6_addr(skb, ipv6_key->ipv6_proto, saddr,
> +                             ipv6_key->ipv6_src, true);
> +
> +       if (memcmp(ipv6_key->ipv6_dst, daddr, sizeof(ipv6_key->ipv6_dst))) {
> +               unsigned int offset = 0;
> +               int flags = IP6_FH_F_SKIP_RH;
> +               bool recalc_csum = true;
> +
> +               if (ipv6_ext_hdr(nh->nexthdr))
> +                       recalc_csum = ipv6_find_hdr(skb, &offset,
> +                                                   NEXTHDR_ROUTING, NULL,
> +                                                   &flags) != NEXTHDR_ROUTING;
> +
> +               set_ipv6_addr(skb, ipv6_key->ipv6_proto, daddr,
> +                             ipv6_key->ipv6_dst, recalc_csum);
> +       }
> +
> +       set_ipv6_tc(nh, ipv6_key->ipv6_tclass);
> +       set_ipv6_fl(nh, ntohl(ipv6_key->ipv6_label));
> +       nh->hop_limit = ipv6_key->ipv6_hlimit;
> +
> +       return 0;
> +}
> +
>  /* Must follow make_writable() since that can move the skb data. */
>  static void set_tp_port(struct sk_buff *skb, __be16 *port,
>                          __be16 new_port, __sum16 *check)
> @@ -347,6 +436,10 @@ static int execute_set_action(struct sk_buff *skb,
>                 err = set_ipv4(skb, nla_data(nested_attr));
>                 break;
>
> +       case OVS_KEY_ATTR_IPV6:
> +               err = set_ipv6(skb, nla_data(nested_attr));
> +               break;
> +
>         case OVS_KEY_ATTR_TCP:
>                 err = set_tcp(skb, nla_data(nested_attr));
>                 break;
> diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
> index 4c4b62c..fd4a6a4 100644
> --- a/net/openvswitch/datapath.c
> +++ b/net/openvswitch/datapath.c
> @@ -479,6 +479,7 @@ static int validate_set(const struct nlattr *a,
>
>         switch (key_type) {
>         const struct ovs_key_ipv4 *ipv4_key;
> +       const struct ovs_key_ipv6 *ipv6_key;
>
>         case OVS_KEY_ATTR_PRIORITY:
>         case OVS_KEY_ATTR_ETHERNET:
> @@ -500,6 +501,25 @@ static int validate_set(const struct nlattr *a,
>
>                 break;
>
> +       case OVS_KEY_ATTR_IPV6:
> +               if (flow_key->eth.type != htons(ETH_P_IPV6))
> +                       return -EINVAL;
> +
> +               if (!flow_key->ip.proto)
> +                       return -EINVAL;
> +
> +               ipv6_key = nla_data(ovs_key);
> +               if (ipv6_key->ipv6_proto != flow_key->ip.proto)
> +                       return -EINVAL;
> +
> +               if (ipv6_key->ipv6_frag != flow_key->ip.frag)
> +                       return -EINVAL;
> +
> +               if (ntohl(ipv6_key->ipv6_label) & 0xFFF00000)
> +                       return -EINVAL;
> +
> +               break;
> +
>         case OVS_KEY_ATTR_TCP:
>                 if (flow_key->ip.proto != IPPROTO_TCP)
>                         return -EINVAL;
> --
> 1.7.9.5
>
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH net-next] bnx2: Fix accidental reversions.
From: David Miller @ 2012-12-12  2:28 UTC (permalink / raw)
  To: mchan; +Cc: netdev
In-Reply-To: <1355279060-24192-1-git-send-email-mchan@broadcom.com>

From: "Michael Chan" <mchan@broadcom.com>
Date: Tue, 11 Dec 2012 18:24:20 -0800

> Commit 4ce45e02469c382699f4c5f6df727aea9dd2e1ca
> "bnx2: Add BNX2 prefix to CHIP ID and name macros"
> 
> accidentally reverted 2 commits to use pci_iounmap() and to make
> pci_error_handlers const.  This fixes those mistakes.
> 
> Signed-off-by: Michael Chan <mchan@broadcom.com>

Applied, thanks.

^ permalink raw reply

* [PATCH net-next] pkt_sched: avoid requeues if possible
From: Eric Dumazet @ 2012-12-12  1:54 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Jamal Hadi Salim, John Fastabend

From: Eric Dumazet <edumazet@google.com>

With BQL being deployed, we are more likely to see the following behavior:

We dequeue a packet from the qdisc in dequeue_skb(), then realize in
sch_direct_xmit() that the target tx queue is in XOFF state, and we have
to hold the skb in gso_skb for later.

This shows up in stats (tc -s qdisc dev eth0) as requeues.

The problem with these requeues is that high priority packets cannot be
dequeued as long as this packet (possibly a low prio, big TSO packet) is
not removed from gso_skb.

At 1Gbps, a full size TSO packet adds about 500 us of extra latency.
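
The 500 us figure is roughly the serialization time of one large packet. A minimal sketch of that arithmetic, assuming a full size TSO packet of 64 KB (65536 bytes), which the message does not state explicitly; the helper name is hypothetical and not part of the patch:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, for illustration only: time in microseconds needed
 * to put `bytes` on a wire running at `bits_per_sec`. Integer division
 * truncates the fractional part. */
static uint64_t serialization_delay_us(uint64_t bytes, uint64_t bits_per_sec)
{
	assert(bits_per_sec != 0);
	return (bytes * 8ULL * 1000000ULL) / bits_per_sec;
}
```

Under that assumption, serialization_delay_us(65536, 1000000000) gives 524, in line with the "500 us" order of magnitude quoted above.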

In some cases, we know that all packets dequeued from a qdisc are
for one particular, known txq:

- If the device is not multiqueue
- For all MQ/MQPRIO slave qdiscs

This patch introduces a new qdisc flag, TCQ_F_ONETXQUEUE, to mark
this capability, so that dequeue_skb() is allowed to dequeue a packet
only if the associated txq is not stopped.

This indeed reduces latencies for high prio packets (or improves fairness
with sfq/fq_codel), and almost removes qdisc 'requeues'.
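
The gating described above can be sketched as a small user-space model. This is not the kernel code: the model_* types and the single-slot queue are assumptions made for illustration, and the model uses one tx queue throughout, whereas the real gso_skb path looks the queue up from the skb's queue mapping. Only the TCQ_F_ONETXQUEUE value is taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TCQ_F_ONETXQUEUE 0x10	/* flag value as defined in the patch */

/* Hypothetical stand-ins for netdev_queue, sk_buff and Qdisc. */
struct model_txq { bool stopped; };
struct model_pkt { int id; };
struct model_qdisc {
	unsigned int flags;
	struct model_txq *dev_queue;	/* the one queue this qdisc feeds */
	struct model_pkt *gso_skb;	/* packet parked after a failed xmit */
	struct model_pkt *head;		/* single-slot stand-in for the queue */
};

/* Return the next packet to transmit, or NULL if nothing may be sent. */
static struct model_pkt *model_dequeue(struct model_qdisc *q)
{
	struct model_pkt *pkt;

	assert(q != NULL && q->dev_queue != NULL);

	pkt = q->gso_skb;
	if (pkt) {
		/* A parked packet blocks everything behind it. */
		if (q->dev_queue->stopped)
			return NULL;
		q->gso_skb = NULL;
		return pkt;
	}

	/* With the flag set, leave packets in the qdisc while the queue is
	 * stopped, so a later high prio packet can still be picked first. */
	if ((q->flags & TCQ_F_ONETXQUEUE) && q->dev_queue->stopped)
		return NULL;

	pkt = q->head;
	q->head = NULL;
	return pkt;
}
```

The point of the design is that the packet stays inside the qdisc, where priorities still apply, instead of being pulled out and parked in gso_skb.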

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: John Fastabend <john.r.fastabend@intel.com>
---
 include/net/sch_generic.h |    7 +++++++
 net/sched/sch_api.c       |    2 ++
 net/sched/sch_generic.c   |   11 ++++++-----
 net/sched/sch_mq.c        |    4 +++-
 net/sched/sch_mqprio.c    |    4 ++++
 5 files changed, 22 insertions(+), 6 deletions(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 4616f46..1540f9c 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -50,6 +50,13 @@ struct Qdisc {
 #define TCQ_F_INGRESS		2
 #define TCQ_F_CAN_BYPASS	4
 #define TCQ_F_MQROOT		8
+#define TCQ_F_ONETXQUEUE	0x10 /* dequeue_skb() can assume all skbs are for
+				      * q->dev_queue : It can test
+				      * netif_xmit_frozen_or_stopped() before
+				      * dequeueing next packet.
+				      * Its true for MQ/MQPRIO slaves, or non
+				      * multiqueue device.
+				      */
 #define TCQ_F_WARN_NONWC	(1 << 16)
 	int			padded;
 	const struct Qdisc_ops	*ops;
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index 4799c48..d84f7e7 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -833,6 +833,8 @@ qdisc_create(struct net_device *dev, struct netdev_queue *dev_queue,
 				goto err_out3;
 		}
 		lockdep_set_class(qdisc_lock(sch), &qdisc_tx_lock);
+		if (!netif_is_multiqueue(dev))
+			sch->flags |= TCQ_F_ONETXQUEUE;
 	}
 
 	sch->handle = handle;
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index aefc150..5d81a44 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -53,20 +53,19 @@ static inline int dev_requeue_skb(struct sk_buff *skb, struct Qdisc *q)
 static inline struct sk_buff *dequeue_skb(struct Qdisc *q)
 {
 	struct sk_buff *skb = q->gso_skb;
+	const struct netdev_queue *txq = q->dev_queue;
 
 	if (unlikely(skb)) {
-		struct net_device *dev = qdisc_dev(q);
-		struct netdev_queue *txq;
-
 		/* check the reason of requeuing without tx lock first */
-		txq = netdev_get_tx_queue(dev, skb_get_queue_mapping(skb));
+		txq = netdev_get_tx_queue(txq->dev, skb_get_queue_mapping(skb));
 		if (!netif_xmit_frozen_or_stopped(txq)) {
 			q->gso_skb = NULL;
 			q->q.qlen--;
 		} else
 			skb = NULL;
 	} else {
-		skb = q->dequeue(q);
+		if (!(q->flags & TCQ_F_ONETXQUEUE) || !netif_xmit_frozen_or_stopped(txq))
+			skb = q->dequeue(q);
 	}
 
 	return skb;
@@ -686,6 +685,8 @@ static void attach_one_default_qdisc(struct net_device *dev,
 			netdev_info(dev, "activation failed\n");
 			return;
 		}
+		if (!netif_is_multiqueue(dev))
+			qdisc->flags |= TCQ_F_ONETXQUEUE;
 	}
 	dev_queue->qdisc_sleeping = qdisc;
 }
diff --git a/net/sched/sch_mq.c b/net/sched/sch_mq.c
index 0a4b2f9..5da78a1 100644
--- a/net/sched/sch_mq.c
+++ b/net/sched/sch_mq.c
@@ -63,6 +63,7 @@ static int mq_init(struct Qdisc *sch, struct nlattr *opt)
 		if (qdisc == NULL)
 			goto err;
 		priv->qdiscs[ntx] = qdisc;
+		qdisc->flags |= TCQ_F_ONETXQUEUE;
 	}
 
 	sch->flags |= TCQ_F_MQROOT;
@@ -150,7 +151,8 @@ static int mq_graft(struct Qdisc *sch, unsigned long cl, struct Qdisc *new,
 		dev_deactivate(dev);
 
 	*old = dev_graft_qdisc(dev_queue, new);
-
+	if (new)
+		new->flags |= TCQ_F_ONETXQUEUE;
 	if (dev->flags & IFF_UP)
 		dev_activate(dev);
 	return 0;
diff --git a/net/sched/sch_mqprio.c b/net/sched/sch_mqprio.c
index d1831ca..accec33 100644
--- a/net/sched/sch_mqprio.c
+++ b/net/sched/sch_mqprio.c
@@ -132,6 +132,7 @@ static int mqprio_init(struct Qdisc *sch, struct nlattr *opt)
 			goto err;
 		}
 		priv->qdiscs[i] = qdisc;
+		qdisc->flags |= TCQ_F_ONETXQUEUE;
 	}
 
 	/* If the mqprio options indicate that hardware should own
@@ -205,6 +206,9 @@ static int mqprio_graft(struct Qdisc *sch, unsigned long cl, struct Qdisc *new,
 
 	*old = dev_graft_qdisc(dev_queue, new);
 
+	if (new)
+		new->flags |= TCQ_F_ONETXQUEUE;
+
 	if (dev->flags & IFF_UP)
 		dev_activate(dev);
 

^ permalink raw reply related

* [PATCH net-next] bnx2: Fix accidental reversions.
From: Michael Chan @ 2012-12-12  2:24 UTC (permalink / raw)
  To: davem; +Cc: netdev

Commit 4ce45e02469c382699f4c5f6df727aea9dd2e1ca
"bnx2: Add BNX2 prefix to CHIP ID and name macros"

accidentally reverted 2 commits to use pci_iounmap() and to make
pci_error_handlers const.  This fixes those mistakes.

Signed-off-by: Michael Chan <mchan@broadcom.com>
---
 drivers/net/ethernet/broadcom/bnx2.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnx2.c b/drivers/net/ethernet/broadcom/bnx2.c
index c16526d..a1adfaf 100644
--- a/drivers/net/ethernet/broadcom/bnx2.c
+++ b/drivers/net/ethernet/broadcom/bnx2.c
@@ -8572,7 +8572,7 @@ bnx2_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 	return 0;
 
 error:
-	iounmap(bp->regview);
+	pci_iounmap(pdev, bp->regview);
 	pci_release_regions(pdev);
 	pci_disable_device(pdev);
 	pci_set_drvdata(pdev, NULL);
@@ -8750,7 +8750,7 @@ static void bnx2_io_resume(struct pci_dev *pdev)
 	rtnl_unlock();
 }
 
-static struct pci_error_handlers bnx2_err_handler = {
+static const struct pci_error_handlers bnx2_err_handler = {
 	.error_detected	= bnx2_io_error_detected,
 	.slot_reset	= bnx2_io_slot_reset,
 	.resume		= bnx2_io_resume,
-- 
1.6.4.GIT

^ permalink raw reply related

* Re: netdevice wanrouter: Convert directly reference of netdev->priv
From: Paul Gortmaker @ 2012-12-12  0:58 UTC (permalink / raw)
  To: Dan Carpenter; +Cc: wangchen, netdev, David Miller
In-Reply-To: <20121203090405.GA12089@elgon.mountain>

On Mon, Dec 3, 2012 at 4:04 AM, Dan Carpenter <dan.carpenter@oracle.com> wrote:
> Hello Wang Chen,
>
> The patch 7be6065b39c3: "netdevice wanrouter: Convert directly
> reference of netdev->priv" from Nov 20, 2008, leads to the following
> Smatch warning:
> net/wanrouter/wanmain.c:610 wanrouter_device_new_if()
>          error: potential NULL dereference 'dev'.
>
> This is an old patch from 2008.  It removed the allocation in
> wanrouter_device_new_if() so it looks like wanrouter has been completely
> broken for four years.

Hi Dan,

Crap -- wishing I'd seen this earlier.  There was an RFC patch for
sending wanrouter to the bitbucket from Joe Perches, but aside
from the obvious build failures in it that Dave found (and I fixed)
there wasn't any real feedback (either positive or negative) to it:

http://patchwork.ozlabs.org/patch/198830/

Knowing it has been non-functional for ~4 years is, I think, a key
bit of information in justifying a removal, so folks like yourself
and JuliaL don't waste cycles fixing/auditing dead code.  But it
will need to be 3.9 material now, it seems.

Paul.
--

>
> @@ -589,10 +591,6 @@ static int wanrouter_device_new_if(struct wan_device *wandev,
>                 err = -EPROTONOSUPPORT;
>                 goto out;
>         } else {
> -               dev = kzalloc(sizeof(struct net_device), GFP_KERNEL);
> -               err = -ENOBUFS;
> -               if (dev == NULL)
> -                       goto out;
>                 err = wandev->new_if(wandev, dev, cnf);
>
> "dev" is still NULL after the call to ->new_if().
>
>         }
>
> Here is what the code looks like now:
>
> net/wanrouter/wanmain.c
>    590          if (cnf->config_id == WANCONFIG_MPPP) {
>    591                  printk(KERN_INFO "%s: Wanpipe Mulit-Port PPP support has not been compiled in!\n",
>    592                                  wandev->name);
>    593                  err = -EPROTONOSUPPORT;
>    594                  goto out;
>    595          } else {
>
> We were supposed to allocate "dev" here.
>
>    596                  err = wandev->new_if(wandev, dev, cnf);
>    597          }
>    598
>    599          if (!err) {
>    600                  /* Register network interface. This will invoke init()
>    601                   * function supplied by the driver.  If device registered
>    602                   * successfully, add it to the interface list.
>    603                   */
>    604
>    605  #ifdef WANDEBUG
>    606                  printk(KERN_INFO "%s: registering interface %s...\n",
>    607                         wanrouter_modname, dev->name);
>    608  #endif
>    609
>    610                  err = register_netdev(dev);
>                               ^^^^^^^^^^^^^^^^^^^^
>
> The kernel will always oops inside the call to register_netdev() because
> "dev" is still NULL.
>
> I suspect we should just revert the patch?
>
> regards,
> dan carpenter
>

^ permalink raw reply

* [PATCH] solos-pci: fix double-free of TX skb in DMA mode
From: David Woodhouse @ 2012-12-12  0:57 UTC (permalink / raw)
  To: netdev; +Cc: nathan

[-- Attachment #1: Type: text/plain, Size: 1234 bytes --]

We weren't clearing card->tx_skb[port] when processing the TX done interrupt.
If there wasn't another skb ready to transmit immediately, this led to a
double-free because we'd free it *again* next time we did have a packet to
send.
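
The failure mode can be illustrated with a small user-space model. This is a sketch, not the driver code: model_card, MODEL_PORTS and model_tx_done() are hypothetical names, and malloc()/free() stand in for skb allocation and release.

```c
#include <assert.h>
#include <stdlib.h>

#define MODEL_PORTS 4	/* hypothetical port count for the model */

struct model_card {
	void *tx_skb[MODEL_PORTS];	/* buffer currently in flight per port */
	int frees;			/* number of buffers released so far */
};

/* Model of the TX-done path: release the in-flight buffer and clear the
 * slot, so a later TX-done on the same port does not see a stale pointer
 * and release the same buffer a second time. */
static void model_tx_done(struct model_card *card, int port)
{
	void *old;

	assert(port >= 0 && port < MODEL_PORTS);

	old = card->tx_skb[port];
	if (old) {
		free(old);
		card->frees++;
		card->tx_skb[port] = NULL;	/* the step the fix adds */
	}
}
```

Without the assignment to NULL, a second call for the same port with no new buffer queued would pass the stale pointer to free() again.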

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Cc: stable@kernel.org
---
 drivers/atm/solos-pci.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/atm/solos-pci.c b/drivers/atm/solos-pci.c
index 6619a8a..c909b7b 100644
--- a/drivers/atm/solos-pci.c
+++ b/drivers/atm/solos-pci.c
@@ -945,10 +945,11 @@ static uint32_t fpga_tx(struct solos_card *card)
 	for (port = 0; tx_pending; tx_pending >>= 1, port++) {
 		if (tx_pending & 1) {
 			struct sk_buff *oldskb = card->tx_skb[port];
-			if (oldskb)
+			if (oldskb) {
 				pci_unmap_single(card->dev, SKB_CB(oldskb)->dma_addr,
 						 oldskb->len, PCI_DMA_TODEVICE);
-
+				card->tx_skb[port] = NULL;
+			}
 			spin_lock(&card->tx_queue_lock);
 			skb = skb_dequeue(&card->tx_queue[port]);
 			if (!skb)
-- 
1.8.0.1


-- 
David Woodhouse                            Open Source Technology Centre
David.Woodhouse@intel.com                              Intel Corporation





^ permalink raw reply related

* Re: [tcpdump-workers] vlan tagged packets and libpcap breakage
From: Ani Sinha @ 2012-12-12  0:50 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Eric W. Biederman, Michael Richardson, netdev, tcpdump-workers,
	Francesco Ruggeri
In-Reply-To: <1355267060.27891.139.camel@edumazet-glaptop>

On Tue, Dec 11, 2012 at 3:04 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Tue, 2012-12-11 at 14:36 -0800, Ani Sinha wrote:
>> >
>> > It is possible to test for the presence of support of the new vlan bpf

> If you want to test ANCILLARY possible values, it's already too late, as
> old kernels won't use any patch anyway.
>

So basically this means that if we generate a filter with these
special negative offset values and expect the kernel to complain
when it does not recognize the newer values, then we would be
wrong. And you are right. Old kernels never knew about them, and the
code wasn't written in a way to return EINVAL if it didn't recognize a
special negative ancillary offset value.
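
For concreteness, this is roughly how such a load is encoded in a classic BPF program. A minimal sketch, assuming Linux UAPI headers recent enough to define SKF_AD_VLAN_TAG_PRESENT in <linux/filter.h>; the function name is hypothetical. It only builds the instructions and says nothing about what a given kernel does with them when the filter is attached.

```c
#include <assert.h>
#include <string.h>
#include <linux/filter.h>

/* Hypothetical helper: fill `prog` with a two-instruction classic BPF
 * program that loads the "VLAN tag present" ancillary value into A and
 * returns it as the filter result. */
static void build_vlan_present_filter(struct sock_filter prog[2])
{
	/* SKF_AD_OFF is negative, so the k operand of the load is one of the
	 * "special negative offsets" discussed above. */
	static const struct sock_filter tmpl[2] = {
		BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
			 SKF_AD_OFF + SKF_AD_VLAN_TAG_PRESENT),
		BPF_STMT(BPF_RET | BPF_A, 0),
	};

	assert(prog != NULL);
	memcpy(prog, tmpl, sizeof(tmpl));
}
```

Nothing in the encoding distinguishes an ancillary offset the kernel knows from one it does not, which is why user space cannot rely on an error at attach time to probe for support.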

^ permalink raw reply

* Re: [PATCH net-next v5] bridge: export multicast database via netlink
From: Stephen Hemminger @ 2012-12-12  0:48 UTC (permalink / raw)
  To: Cong Wang
  Cc: netdev, bridge, Herbert Xu, Jesper Dangaard Brouer, Thomas Graf,
	David S. Miller
In-Reply-To: <1354874688-24564-1-git-send-email-amwang@redhat.com>

On Fri,  7 Dec 2012 18:04:48 +0800
Cong Wang <amwang@redhat.com> wrote:

> From: Cong Wang <amwang@redhat.com>
> 
> V5: fix two bugs pointed out by Thomas
>     remove seq check for now, mark it as TODO
> 
> V4: remove some useless #include
>     some coding style fix
> 
> V3: drop debugging printk's
>     update selinux perm table as well
> 
> V2: drop patch 1/2, export ifindex directly
>     Redesign netlink attributes
>     Improve netlink seq check
>     Handle IPv6 addr as well
> 
> This patch exports bridge multicast database via netlink
> message type RTM_GETMDB. Similar to fdb, but currently bridge-specific.
> We may need to support modify multicast database too (RTM_{ADD,DEL}MDB).
> 
> (Thanks to Thomas for patient reviews)
> 
> Cc: Herbert Xu <herbert@gondor.apana.org.au>
> Cc: Stephen Hemminger <shemminger@vyatta.com>
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: Thomas Graf <tgraf@suug.ch>
> Cc: Jesper Dangaard Brouer <brouer@redhat.com>
> Signed-off-by: Cong Wang <amwang@redhat.com>
>     

Applied, but it required some manual fixing. I had to add if_bridge.h
to include/linux in the iproute2 exported headers. Also, the patch still
had some fuzz against the current version.

^ permalink raw reply

