Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [Patch 1/1] net/phy: Add interrupt support for dp83640 phy.
From: Richard Cochran @ 2012-12-05 10:05 UTC (permalink / raw)
  To: Stephan Gatzka; +Cc: netdev, davem
In-Reply-To: <1354652498-16573-1-git-send-email-stephan.gatzka@gmail.com>

On Tue, Dec 04, 2012 at 09:21:38PM +0100, Stephan Gatzka wrote:
> Added functions for ack_interrupt and config_intr. Tested on an mpc5200b
> powerpc board.
> 
> Signed-off-by: Stephan Gatzka <stephan.gatzka@gmail.com>

The patch looks okay to me, but I worry that this might fail on boards
which have not connected the phyer's PWERDOWN/INTN pin to anything.
Such designs really need the PHY_POLL working.

Taking a brief glance at the drivers for two such boards I know of
(m5234bcc and an IXP), it looks like their MAC drivers set mii_bus irq
to PHY_POLL, so it might work fine, but this patch still makes me
nervous that some other board might break.

Maybe this should be a kconfig option?

Thanks,
Richard

^ permalink raw reply

* Re: [PATCH -next v2] ipw2200: return error code on error in ipw_wx_get_auth()
From: Stanislav Yakovlev @ 2012-12-05  9:44 UTC (permalink / raw)
  To: Wei Yongjun; +Cc: linville, yongjun_wei, linux-wireless, netdev
In-Reply-To: <CAPgLHd_AN+Tr5EQFBZMH0FZCp6ESHVURZHvd9SuW=a2PYkPoXw@mail.gmail.com>

On 5 December 2012 00:08, Wei Yongjun <weiyj.lk@gmail.com> wrote:
> From: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
>
> We have assinged error code to 'ret' when get auth from some
> option is not supported but never used it, but we'd better return
> the error code.
>
> Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>

Looks fine, thanks.

Stanislav.

> ---
>  drivers/net/wireless/ipw2x00/ipw2200.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
>
> diff --git a/drivers/net/wireless/ipw2x00/ipw2200.c b/drivers/net/wireless/ipw2x00/ipw2200.c
> index 482f505..b0879ad 100644
> --- a/drivers/net/wireless/ipw2x00/ipw2200.c
> +++ b/drivers/net/wireless/ipw2x00/ipw2200.c
> @@ -6812,7 +6812,6 @@ static int ipw_wx_get_auth(struct net_device *dev,
>         struct libipw_device *ieee = priv->ieee;
>         struct lib80211_crypt_data *crypt;
>         struct iw_param *param = &wrqu->param;
> -       int ret = 0;
>
>         switch (param->flags & IW_AUTH_INDEX) {
>         case IW_AUTH_WPA_VERSION:
> @@ -6822,8 +6821,7 @@ static int ipw_wx_get_auth(struct net_device *dev,
>                 /*
>                  * wpa_supplicant will control these internally
>                  */
> -               ret = -EOPNOTSUPP;
> -               break;
> +               return -EOPNOTSUPP;
>
>         case IW_AUTH_TKIP_COUNTERMEASURES:
>                 crypt = priv->ieee->crypt_info.crypt[priv->ieee->crypt_info.tx_keyidx];
>
>

^ permalink raw reply

* RE: [PATCH v2] net/macb: Use non-coherent memory for rx buffers
From: David Laight @ 2012-12-05  9:35 UTC (permalink / raw)
  To: Nicolas Ferre
  Cc: David S. Miller, netdev, linux-arm-kernel, linux-kernel,
	Joachim Eastwood, Jean-Christophe PLAGNIOL-VILLARD,
	Havard Skinnemoen
In-Reply-To: <50BE2FEC.2070500@atmel.com>

> If I understand well, you mean that the call to:
> 
> 		dma_sync_single_range_for_device(&bp->pdev->dev, phys,
> 				pg_offset, frag_len, DMA_FROM_DEVICE);
> 
> in the rx path after having copied the data to skb is not needed?
> That is also the conclusion that I found after having thinking about
> this again... I will check this.

You need to make sure that the memory isn't in the data cache
when you give the rx buffer back to the MAC.
(and ensure the cpu doesn't read it until the rx is complete.)
I've NFI what that dma_sync call does - you need to invalidate
the cache lines.

> For the CRC, my driver is not using the CRC offloading feature for the
> moment. So no CRC is written by the device.

I was thinking it would matter if the MAC wrote the CRC into the
buffer (even though it was excluded from the length).
It doesn't - you only need to worry about data you've read.

> > I was wondering if the code needs to do per page allocations?
> > Perhaps that is necessary to avoid needing a large block of
> > contiguous physical memory (and virtual addresses)?
> 
> The page management seems interesting for future management of RX
> buffers as skb fragments: that will allow to avoid copying received data.

Dunno - the complexities of such buffer loaning schemes often
exceed the gain of avoiding the data copy.
Using buffers allocated to the skb is a bit different - since
you completely forget about the memory once you pass the skb
upstream.

Some quick sums indicate you might want to allocate 8k memory
blocks and split into 5 buffers.

	David

^ permalink raw reply

* Re: ip_rt_min_pmtu
From: Christopher Schramm @ 2012-12-05  9:33 UTC (permalink / raw)
  To: Rami Rosen; +Cc: netdev
In-Reply-To: <CAKoUArn0EYaeebFXtCgqDSYeJD2oFKUfns7Lbk7Rsg=N=pnpHA@mail.gmail.com>

On Wed, 05 Dec 2012 09:46:18 +0100, Rami Rosen wrote:
> But RFC 791 also declares 576 as PMTU:
> "All hosts must be prepared to accept datagrams of up to 576 octets".

That's the common mistake I mentioned. The sentence goes on: "(whether 
they arrive whole or in fragments)", so it says nothing about lower 
layers, especially not about the MTU.

The 576 octects IP datagram every implementation is required to be able 
to handle could be transferred in 12 up to over 60 fragments if we had 
the minimal PMTU of 68, depending on the header size. Of course that 
leads to an overhead of nearly 90 percent, but it should be possible 
following the RFC.

With Linux, trying to send large data over a path with PMTU < 552 will 
probably fail (unless you change the min_pmtu value).

^ permalink raw reply

* Re: [net-next PATCH V3-evictor] net: frag evictor, avoid killing warm frag queues
From: Jesper Dangaard Brouer @ 2012-12-05  9:24 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S. Miller, Florian Westphal, netdev, Thomas Graf,
	Paul E. McKenney, Cong Wang, Herbert Xu
In-Reply-To: <20121204133007.20215.52566.stgit@dragon>

First of all, this patch contains a small bug (see below), which
resulted in me not testing the correct patch...

Second, this patch does NOT behave as I expected and claimed.  Thus, my
conclusions, in my previous respond might be wrong!

The previous evictor patch of letting new fragments enter, worked
amazingly well.  But I suspect, this might also be related to a
bug/problem in the evictor loop (which were being hidden by that patch).

My new *theory* is that the evictor loop, will be looping too much, if
it finds a fragment which is INET_FRAG_COMPLETE ... in that case, we
don't advance the LRU list, and thus will pickup the exact same
inet_frag_queue again in the loop... to get out of the loop we need
another CPU or packet to change the LRU list for us... I'll test that
theory... (its could also be CPUs fighting over the same LRU head
element that cause this) ... more to come...

On Tue, 2012-12-04 at 14:30 +0100, Jesper Dangaard Brouer wrote:
> diff --git a/net/ipv4/inet_fragment.c b/net/ipv4/inet_fragment.c
> index 4750d2b..d8bf59b 100644
> --- a/net/ipv4/inet_fragment.c
> +++ b/net/ipv4/inet_fragment.c
> @@ -178,6 +178,16 @@ int inet_frag_evictor(struct netns_frags *nf, struct inet_frags *f, bool force)
>  
>  		q = list_first_entry(&nf->lru_list,
>  				struct inet_frag_queue, lru_list);
> +
> +		/* When head of LRU is very new/warm, then the head is
> +		 * most likely the one with most fragments and the
> +		 * tail with least, thus drop tail
> +		 */
> +		if (!force && q->creation_ts == (u32) jiffies) {
> +			q = list_entry(&nf->lru_list.prev,

Remove the "&" in &nf->lru_list.prev

> +				struct inet_frag_queue, lru_list);
> +		}
> +
>  		atomic_inc(&q->refcnt);
>  		read_unlock(&f->lock);

^ permalink raw reply

* Re: ip_rt_min_pmtu
From: Rami Rosen @ 2012-12-05  8:46 UTC (permalink / raw)
  To: Christopher Schramm; +Cc: netdev
In-Reply-To: <50BE654D.2010602@shakaweb.org>

Hi,
Just a short note:
RFC 791 indeed set 68 for internet module MTU.

But RFC 791 also declares 576 as PMTU:
"All hosts must be prepared to accept datagrams of up to 576 octets".
and it says also:
"The number 576 is selected to allow a reasonable sized data block to
be transmitted in addition to the required header information."

It seems that there is a distinction between a module sending MTU and
hosts receiving MTU.

Regarding the historical details of why it was sent at that time  -
I don't have an idea.

Regards,
Rami Rosen
http://ramirose.wix.com/ramirosen



On Tue, Dec 4, 2012 at 11:04 PM, Christopher Schramm
<netdev@shakaweb.org> wrote:
> Hi,
>
> I'm looking into an interesting detail of the Linux IPv4 implementation I
> stumbled upon during a University course.
>
> In route.c there's a value ip_rt_min_pmtu, defined as 512 + 20 + 20, that
> tells Linux a minimum PMTU to use, even if e. g. an ICMP message tells it to
> set a smaller one.
>
> Of course, this is not a problem in real world, but not standard-compliant,
> since RFC 791 defines a minimum MTU of 68 for IPv4. So I wonder what's the
> reason for the restriction.
>
> I looked into it and found that it appeared in Linux 2.3.15 with the
> following ID in route.c:
>
> v 1.71 1999/08/20 11:05:58 davem
>
> While it was not present in Linux 2.3.14 with:
>
> v 1.69 1999/06/09 10:11:02 davem
>
> I couldn't find any related discussion or patch on the LKML around that
> dates, so I'm asking you for any hints to find out the reason for
> implementing this lower bound.
>
> What I've found on the LKML is a topic around February 15th, 2001, titled
> "MTU and 2.4.x kernel", where Alexey Kuznetsov points out that the handling
> of "DF on syn frames" is broken for MTUs smaller than 128 and "Preventing
> DoSes requires to block pmtu discovery at 576 or at least 552".
>
> Does anybody know the actual reason for the change in 2.3.15? I first
> thought it's the common misinterpretation that 576 would be the lower bound
> for MTUs in IPv4, but I wonder why it was put in place as a patch years
> after the IPv4 implementation was already done. There seems to have been
> some clear reason for it. I also wonder why it has never been removed up to
> today if it's really nothing more than a mistake.
>
> Would be great if someone could help me shed some light on this.
>
> Regards
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [Patch net-next] xfrm: add missing xfrm message types to selinux perm table
From: Steffen Klassert @ 2012-12-05  8:14 UTC (permalink / raw)
  To: Cong Wang
  Cc: netdev, Herbert Xu, David S. Miller, Eric Paris,
	linux-security-module
In-Reply-To: <1354693368-19494-1-git-send-email-amwang@redhat.com>

Ccing Eric Paris and linux-security-module.

On Wed, Dec 05, 2012 at 03:42:48PM +0800, Cong Wang wrote:
> SElinux perm table is not up-to-date.
> 
> Cc: Steffen Klassert <steffen.klassert@secunet.com>
> Cc: Herbert Xu <herbert@gondor.apana.org.au>
> Cc: "David S. Miller" <davem@davemloft.net>
> Signed-off-by: Cong Wang <amwang@redhat.com>
> 
> ---
> diff --git a/security/selinux/nlmsgtab.c b/security/selinux/nlmsgtab.c
> index d309e7f..cc191bc 100644
> --- a/security/selinux/nlmsgtab.c
> +++ b/security/selinux/nlmsgtab.c
> @@ -93,6 +93,13 @@ static struct nlmsg_perm nlmsg_xfrm_perms[] =
>  	{ XFRM_MSG_FLUSHPOLICY,	NETLINK_XFRM_SOCKET__NLMSG_WRITE },
>  	{ XFRM_MSG_NEWAE,	NETLINK_XFRM_SOCKET__NLMSG_WRITE },
>  	{ XFRM_MSG_GETAE,	NETLINK_XFRM_SOCKET__NLMSG_READ  },
> +	{ XFRM_MSG_REPORT,	NETLINK_XFRM_SOCKET__NLMSG_READ  },
> +	{ XFRM_MSG_MIGRATE,	NETLINK_XFRM_SOCKET__NLMSG_WRITE },
> +	{ XFRM_MSG_NEWSADINFO,	NETLINK_XFRM_SOCKET__NLMSG_WRITE },
> +	{ XFRM_MSG_GETSADINFO,	NETLINK_XFRM_SOCKET__NLMSG_READ  },
> +	{ XFRM_MSG_NEWSPDINFO,	NETLINK_XFRM_SOCKET__NLMSG_WRITE },
> +	{ XFRM_MSG_GETSPDINFO,	NETLINK_XFRM_SOCKET__NLMSG_READ  },
> +	{ XFRM_MSG_MAPPING,	NETLINK_XFRM_SOCKET__NLMSG_READ  },
>  };
>  
>  static struct nlmsg_perm nlmsg_audit_perms[] =

I'm not the maintainer of the file which this patch changes,
but I could take it into the ipsec-next tree if the selinux
people are fine with that.

^ permalink raw reply

* Re: WARNING: drivers/net/ethernet/dlink/sundance.o(.text+0x2e87): Section mismatch in reference from the function sundance_probe1() to the variable .devinit.rodata:sundance_pci_tbl
From: Denis Kirjanov @ 2012-12-05  8:12 UTC (permalink / raw)
  To: kbuild test robot; +Cc: Bill Pemberton, netdev, Greg Kroah-Hartman
In-Reply-To: <50be77ca.95imDYp6GlyHjRuw%fengguang.wu@intel.com>

I"ll fix it.

Thanks.

On 12/5/12, kbuild test robot <fengguang.wu@intel.com> wrote:
> tree:   git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git
> master
> head:   193c1e478cc496844fcbef402a10976c95a634ff
> commit: 64bc40de134bb5c7826ff384016f654219ed3956 dlink: remove __dev*
> attributes
> date:   27 hours ago
> config: make ARCH=x86_64 allmodconfig
>
> All warnings:
>
> WARNING: drivers/net/ethernet/dlink/sundance.o(.text+0x2e87): Section
> mismatch in reference from the function sundance_probe1() to the variable
> .devinit.rodata:sundance_pci_tbl
> The function sundance_probe1() references
> the variable __devinitconst sundance_pci_tbl.
> This is often because sundance_probe1 lacks a __devinitconst
> annotation or the annotation of sundance_pci_tbl is wrong.
>
> ---
> 0-DAY kernel build testing backend         Open Source Technology Center
> Fengguang Wu, Yuanhan Liu                              Intel Corporation
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

^ permalink raw reply

* [PATCH -next v2] ipw2200: return error code on error in ipw_wx_get_auth()
From: Wei Yongjun @ 2012-12-05  8:08 UTC (permalink / raw)
  To: stas.yakovlev, linville; +Cc: yongjun_wei, linux-wireless, netdev

From: Wei Yongjun <yongjun_wei@trendmicro.com.cn>

We have assinged error code to 'ret' when get auth from some
option is not supported but never used it, but we'd better return
the error code.

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
---
 drivers/net/wireless/ipw2x00/ipw2200.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/net/wireless/ipw2x00/ipw2200.c b/drivers/net/wireless/ipw2x00/ipw2200.c
index 482f505..b0879ad 100644
--- a/drivers/net/wireless/ipw2x00/ipw2200.c
+++ b/drivers/net/wireless/ipw2x00/ipw2200.c
@@ -6812,7 +6812,6 @@ static int ipw_wx_get_auth(struct net_device *dev,
 	struct libipw_device *ieee = priv->ieee;
 	struct lib80211_crypt_data *crypt;
 	struct iw_param *param = &wrqu->param;
-	int ret = 0;
 
 	switch (param->flags & IW_AUTH_INDEX) {
 	case IW_AUTH_WPA_VERSION:
@@ -6822,8 +6821,7 @@ static int ipw_wx_get_auth(struct net_device *dev,
 		/*
 		 * wpa_supplicant will control these internally
 		 */
-		ret = -EOPNOTSUPP;
-		break;
+		return -EOPNOTSUPP;
 
 	case IW_AUTH_TKIP_COUNTERMEASURES:
 		crypt = priv->ieee->crypt_info.crypt[priv->ieee->crypt_info.tx_keyidx];

^ permalink raw reply related

* [Patch net-next] xfrm: add missing xfrm message types to selinux perm table
From: Cong Wang @ 2012-12-05  7:42 UTC (permalink / raw)
  To: netdev; +Cc: Steffen Klassert, Herbert Xu, David S. Miller, Cong Wang

SElinux perm table is not up-to-date.

Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>

---
diff --git a/security/selinux/nlmsgtab.c b/security/selinux/nlmsgtab.c
index d309e7f..cc191bc 100644
--- a/security/selinux/nlmsgtab.c
+++ b/security/selinux/nlmsgtab.c
@@ -93,6 +93,13 @@ static struct nlmsg_perm nlmsg_xfrm_perms[] =
 	{ XFRM_MSG_FLUSHPOLICY,	NETLINK_XFRM_SOCKET__NLMSG_WRITE },
 	{ XFRM_MSG_NEWAE,	NETLINK_XFRM_SOCKET__NLMSG_WRITE },
 	{ XFRM_MSG_GETAE,	NETLINK_XFRM_SOCKET__NLMSG_READ  },
+	{ XFRM_MSG_REPORT,	NETLINK_XFRM_SOCKET__NLMSG_READ  },
+	{ XFRM_MSG_MIGRATE,	NETLINK_XFRM_SOCKET__NLMSG_WRITE },
+	{ XFRM_MSG_NEWSADINFO,	NETLINK_XFRM_SOCKET__NLMSG_WRITE },
+	{ XFRM_MSG_GETSADINFO,	NETLINK_XFRM_SOCKET__NLMSG_READ  },
+	{ XFRM_MSG_NEWSPDINFO,	NETLINK_XFRM_SOCKET__NLMSG_WRITE },
+	{ XFRM_MSG_GETSPDINFO,	NETLINK_XFRM_SOCKET__NLMSG_READ  },
+	{ XFRM_MSG_MAPPING,	NETLINK_XFRM_SOCKET__NLMSG_READ  },
 };

 static struct nlmsg_perm nlmsg_audit_perms[] =

^ permalink raw reply related

* Re: [Patch net-next] xfrm: add missing xfrm message types to selinux perm table
From: Steffen Klassert @ 2012-12-05  7:39 UTC (permalink / raw)
  To: Cong Wang; +Cc: David Miller, netdev, herbert
In-Reply-To: <1354690585.28951.10.camel@cr0>

On Wed, Dec 05, 2012 at 02:56:25PM +0800, Cong Wang wrote:
> On Wed, 2012-12-05 at 06:41 +0100, Steffen Klassert wrote:
> > On Tue, Dec 04, 2012 at 01:02:57PM -0500, David Miller wrote:
> > > From: Cong Wang <amwang@redhat.com>
> > > Date: Tue,  4 Dec 2012 13:39:31 +0800
> > > 
> > > > Cc: Steffen Klassert <steffen.klassert@secunet.com>
> > > > Cc: Herbert Xu <herbert@gondor.apana.org.au>
> > > > Cc: "David S. Miller" <davem@davemloft.net>
> > > > Signed-off-by: Cong Wang <amwang@redhat.com>
> > > 
> > > Steffen will you take this into your IPSEC tree?
> > > 
> > 
> > I would do so, but I have not seen this patch by now.
> > Was it really send to me? I have not even seen it on the
> > netdev list.
> > 
> > Anyway, I don't have it, please resend.
> > 
> 
> Weird, you are already Cc'ed in the original patch, Cc: Steffen Klassert
> <steffen.klassert@secunet.com>.
> 

Even patchwork does not list this patch, I can only guess that
something went wrong with your submission.

Please resend it,

Thanks.

^ permalink raw reply

* [PATCH v3] iproute2: add mdb command to bridge
From: Cong Wang @ 2012-12-05  7:14 UTC (permalink / raw)
  To: netdev
  Cc: Cong Wang, bridge, Herbert Xu, Jesper Dangaard Brouer,
	Thomas Graf, Stephen Hemminger, David S. Miller
In-Reply-To: <1354675804-13310-1-git-send-email-amwang@redhat.com>

V3: improve the output, display router info only for -d
    fix router parsing code

V2: sync with the kernel patch
    handle IPv6 addr
    a few cleanup

Sample output:

	# ./bridge/bridge mdb
	bridge br0:
	port eth0, group 224.8.8.9
	port eth1, group 224.8.8.8

	# ./bridge/bridge -d mdb
	bridge br0:
	port eth0, group 224.8.8.9
	port eth1, group 224.8.8.8
	router ports: eth0

Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Thomas Graf <tgraf@suug.ch>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Cong Wang <amwang@redhat.com>
---
 bridge/Makefile    |    2 +-
 bridge/br_common.h |    3 +-
 bridge/bridge.c    |    1 +
 bridge/mdb.c       |  177 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 181 insertions(+), 2 deletions(-)

diff --git a/bridge/Makefile b/bridge/Makefile
index 9a6743e..67aceb4 100644
--- a/bridge/Makefile
+++ b/bridge/Makefile
@@ -1,4 +1,4 @@
-BROBJ = bridge.o fdb.o monitor.o link.o
+BROBJ = bridge.o fdb.o monitor.o link.o mdb.o
 
 include ../Config
 
diff --git a/bridge/br_common.h b/bridge/br_common.h
index 718ecb9..892fb76 100644
--- a/bridge/br_common.h
+++ b/bridge/br_common.h
@@ -5,10 +5,11 @@ extern int print_fdb(const struct sockaddr_nl *who,
 		     struct nlmsghdr *n, void *arg);
 
 extern int do_fdb(int argc, char **argv);
+extern int do_mdb(int argc, char **argv);
 extern int do_monitor(int argc, char **argv);
 
 extern int preferred_family;
 extern int show_stats;
-extern int show_detail;
+extern int show_details;
 extern int timestamp;
 extern struct rtnl_handle rth;
diff --git a/bridge/bridge.c b/bridge/bridge.c
index e2c33b0..1fcd365 100644
--- a/bridge/bridge.c
+++ b/bridge/bridge.c
@@ -43,6 +43,7 @@ static const struct cmd {
 	int (*func)(int argc, char **argv);
 } cmds[] = {
 	{ "fdb", 	do_fdb },
+	{ "mdb", 	do_mdb },
 	{ "monitor",	do_monitor },
 	{ "help",	do_help },
 	{ 0 }
diff --git a/bridge/mdb.c b/bridge/mdb.c
new file mode 100644
index 0000000..b39b535
--- /dev/null
+++ b/bridge/mdb.c
@@ -0,0 +1,177 @@
+/*
+ * Get mdb table with netlink
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <time.h>
+#include <fcntl.h>
+#include <sys/socket.h>
+#include <sys/time.h>
+#include <net/if.h>
+#include <netinet/in.h>
+#include <linux/if_bridge.h>
+#include <linux/if_ether.h>
+#include <linux/neighbour.h>
+#include <linux/if_bridge.h>
+#include <string.h>
+#include <arpa/inet.h>
+
+#include "libnetlink.h"
+#include "br_common.h"
+#include "rt_names.h"
+#include "utils.h"
+
+#ifndef MDBA_RTA
+#define MDBA_RTA(r) \
+	((struct rtattr*)(((char*)(r)) + NLMSG_ALIGN(sizeof(struct br_port_msg))))
+#endif
+
+int filter_index;
+
+static void usage(void)
+{
+	fprintf(stderr, "       bridge mdb {show} [ dev DEV ]\n");
+	exit(-1);
+}
+
+static void br_print_router_ports(FILE *f, struct rtattr *attr)
+{
+	uint32_t *port_ifindex;
+	struct rtattr *i;
+	int rem;
+
+	rem = RTA_PAYLOAD(attr);
+	for (i = RTA_DATA(attr); RTA_OK(i, rem); i = RTA_NEXT(i, rem)) {
+		port_ifindex = RTA_DATA(i);
+		fprintf(f, "%s ", ll_index_to_name(*port_ifindex));
+	}
+
+	fprintf(f, "\n");
+}
+
+static void print_mdb_entry(FILE *f, struct br_mdb_entry *e)
+{
+	SPRINT_BUF(abuf);
+
+	if (e->addr.proto == htons(ETH_P_IP))
+		fprintf(f, "port %s, group %s\n", ll_index_to_name(e->ifindex),
+			inet_ntop(AF_INET, &e->addr.u.ip4, abuf, sizeof(abuf)));
+	else
+		fprintf(f, "port %s, group %s\n", ll_index_to_name(e->ifindex),
+			inet_ntop(AF_INET6, &e->addr.u.ip6, abuf, sizeof(abuf)));
+}
+
+static void br_print_mdb_entry(FILE *f, struct rtattr *attr)
+{
+	struct rtattr *i;
+	int rem;
+	struct br_mdb_entry *e;
+
+	rem = RTA_PAYLOAD(attr);
+	for (i = RTA_DATA(attr); RTA_OK(i, rem); i = RTA_NEXT(i, rem)) {
+		e = RTA_DATA(i);
+		print_mdb_entry(f, e);
+	}
+}
+
+int print_mdb(const struct sockaddr_nl *who, struct nlmsghdr *n, void *arg)
+{
+	FILE *fp = arg;
+	struct br_port_msg *r = NLMSG_DATA(n);
+	int len = n->nlmsg_len;
+	struct rtattr * tb[MDBA_MAX+1];
+
+	if (n->nlmsg_type != RTM_GETMDB) {
+		fprintf(stderr, "Not RTM_GETMDB: %08x %08x %08x\n",
+			n->nlmsg_len, n->nlmsg_type, n->nlmsg_flags);
+
+		return 0;
+	}
+
+	len -= NLMSG_LENGTH(sizeof(*r));
+	if (len < 0) {
+		fprintf(stderr, "BUG: wrong nlmsg len %d\n", len);
+		return -1;
+	}
+
+	if (filter_index && filter_index != r->ifindex)
+		return 0;
+
+	if (!filter_index && r->ifindex)
+		fprintf(fp, "bridge %s:\n", ll_index_to_name(r->ifindex));
+
+	parse_rtattr(tb, MDBA_MAX, MDBA_RTA(r), n->nlmsg_len - NLMSG_LENGTH(sizeof(*r)));
+
+	if (tb[MDBA_MDB]) {
+		struct rtattr *i;
+		int rem = RTA_PAYLOAD(tb[MDBA_MDB]);
+
+		for (i = RTA_DATA(tb[MDBA_MDB]); RTA_OK(i, rem); i = RTA_NEXT(i, rem))
+			br_print_mdb_entry(fp, i);
+	}
+
+	if (tb[MDBA_ROUTER]) {
+		if (show_details) {
+			fprintf(fp, "router ports: ");
+			br_print_router_ports(fp, tb[MDBA_ROUTER]);
+		}
+	}
+
+	return 0;
+}
+
+static int mdb_show(int argc, char **argv)
+{
+	char *filter_dev = NULL;
+
+	while (argc > 0) {
+		if (strcmp(*argv, "dev") == 0) {
+			NEXT_ARG();
+			if (filter_dev)
+				duparg("dev", *argv);
+			filter_dev = *argv;
+		}
+		argc--; argv++;
+	}
+
+	if (filter_dev) {
+		filter_index = if_nametoindex(filter_dev);
+		if (filter_index == 0) {
+			fprintf(stderr, "Cannot find device \"%s\"\n",
+				filter_dev);
+			return -1;
+		}
+	}
+
+	if (rtnl_wilddump_request(&rth, PF_BRIDGE, RTM_GETMDB) < 0) {
+		perror("Cannot send dump request");
+		exit(1);
+	}
+
+	if (rtnl_dump_filter(&rth, print_mdb, stdout) < 0) {
+		fprintf(stderr, "Dump terminated\n");
+		exit(1);
+	}
+
+	return 0;
+}
+
+int do_mdb(int argc, char **argv)
+{
+	ll_init_map(&rth);
+
+	if (argc > 0) {
+		if (matches(*argv, "show") == 0 ||
+		    matches(*argv, "lst") == 0 ||
+		    matches(*argv, "list") == 0)
+			return mdb_show(argc-1, argv+1);
+		if (matches(*argv, "help") == 0)
+			usage();
+	} else
+		return mdb_show(0, NULL);
+
+	fprintf(stderr, "Command \"%s\" is unknown, try \"bridge mdb help\".\n", *argv);
+	exit(-1);
+}

^ permalink raw reply related

* Re: [Patch net-next] xfrm: add missing xfrm message types to selinux perm table
From: Cong Wang @ 2012-12-05  6:56 UTC (permalink / raw)
  To: Steffen Klassert; +Cc: David Miller, netdev, herbert
In-Reply-To: <20121205054152.GO22290@secunet.com>

On Wed, 2012-12-05 at 06:41 +0100, Steffen Klassert wrote:
> On Tue, Dec 04, 2012 at 01:02:57PM -0500, David Miller wrote:
> > From: Cong Wang <amwang@redhat.com>
> > Date: Tue,  4 Dec 2012 13:39:31 +0800
> > 
> > > Cc: Steffen Klassert <steffen.klassert@secunet.com>
> > > Cc: Herbert Xu <herbert@gondor.apana.org.au>
> > > Cc: "David S. Miller" <davem@davemloft.net>
> > > Signed-off-by: Cong Wang <amwang@redhat.com>
> > 
> > Steffen will you take this into your IPSEC tree?
> > 
> 
> I would do so, but I have not seen this patch by now.
> Was it really send to me? I have not even seen it on the
> netdev list.
> 
> Anyway, I don't have it, please resend.
> 

Weird, you are already Cc'ed in the original patch, Cc: Steffen Klassert
<steffen.klassert@secunet.com>.

^ permalink raw reply

* Re: [PATCH net-next 2/3] virtio_net: multiqueue support
From: Jason Wang @ 2012-12-05  6:33 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: krkumar2, kvm, netdev, linux-kernel, virtualization, bhutchings,
	jwhan, davem, shiyer
In-Reply-To: <20121204151108.GI7499@redhat.com>

On 12/04/2012 11:11 PM, Michael S. Tsirkin wrote:
> On Tue, Dec 04, 2012 at 10:45:33PM +0800, Jason Wang wrote:
>> On Tuesday, December 04, 2012 03:24:22 PM Michael S. Tsirkin wrote:
>>> I found some bugs, see below.
>>> Also some style nitpicking, this is not mandatory to address.
>> Thanks for the reviewing.
>>> On Tue, Dec 04, 2012 at 07:07:57PM +0800, Jason Wang wrote:
>>>
[...]
>>>> +		set = false;
>>> This will overwrite affinity if it was set by userspace.
>>> Just
>>> 	if (set)
>>> 		return;
>>> will not have this problem.
>> But we need handle the situtaiton when switch back to sq from mq mode. 
>> Otherwise we may still get the affinity hint used in mq.
>>  This kind of overwrite 
>> is unavoidable or is there some method to detect whether userspac write 
>> something new?
> If we didn't set the affinity originally we should not overwrite it.
> I think this means we need a flag that tells us that
> virtio set the affinity.

Ok.

[...]
>>>> +
>>>> +	/* Parameters for control virtqueue, if any */
>>>> +	if (vi->has_cvq) {
>>>> +		callbacks[total_vqs - 1] = NULL;
>>>> +		names[total_vqs - 1] = kasprintf(GFP_KERNEL, "control");
>>>> +	}
>>>>
>>>> +	/* Allocate/initialize parameters for send/receive virtqueues */
>>>> +	for (i = 0; i < vi->max_queue_pairs; i++) {
>>>> +		callbacks[rxq2vq(i)] = skb_recv_done;
>>>> +		callbacks[txq2vq(i)] = skb_xmit_done;
>>>> +		names[rxq2vq(i)] = kasprintf(GFP_KERNEL, "input.%d", i);
>>>> +		names[txq2vq(i)] = kasprintf(GFP_KERNEL, "output.%d", i);
>>>> +	}
>>> We would need to check kasprintf return value.
>> Looks like a better method is to make the name as a memeber of receive_queue 
>> and send_queue, and use sprintf here.
>>> Also if you allocate names from slab we'll need to free them
>>> later.
>> Then it could be freed when the send_queue and receive_queue is freed.
>>> It's probably easier to just use fixed names for now -
>>> it's not like the index is really useful.
>> Looks useful for debugging e.g. check whether the irq distribution is as 
>> expected.
> Well it doesn't really matter which one goes where, right?
> As long as interrupts are distributed well.

Yes, anyway, we decide to store the name in the send/receive queue, so I
will keep the index.
>
>>>> +
>>>> +	ret = vi->vdev->config->find_vqs(vi->vdev, total_vqs, vqs, callbacks,
>>>> +					 (const char **)names);
>>> Please avoid casts, use a proper type for names.
>> I'm consider we need a minor change in this api, we need allocate the names 
>> dynamically which could not be a const char **.
> I don't see why. Any use that allocates on the fly as
> you did would leak memory. Any use like you suggest
> e.g. allocating as part of send/receive structure
> would be fine.

True
>>>> +	if (ret)
>>>> +		goto err_names;
>>>> +
>>>> +	if (vi->has_cvq) {
>>>> +		vi->cvq = vqs[total_vqs - 1];
>>>>
>>>>  		if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_VLAN))
>>>>  		
>>>>  			vi->dev->features |= NETIF_F_HW_VLAN_FILTER;
>>>>  	
>>>>  	}
>>>>
>>>> +
>>>> +	for (i = 0; i < vi->max_queue_pairs; i++) {
>>>> +		vi->rq[i].vq = vqs[rxq2vq(i)];
>>>> +		vi->sq[i].vq = vqs[txq2vq(i)];
>>>> +	}
>>>> +
>>>> +	kfree(callbacks);
>>>> +	kfree(vqs);
>>> Who frees names if there's no error?
>>>
>> The virtio core does not copy the name, so it need this and only used for 
>> debugging if I'm reading the code correctly.
> No, virtio core does not free either individual vq name or the names
> array passed in. So this leaks memory.

Yes, so when we use the names in receive/send queue, it can be freed
during queue destroying.

[...]
> @@ -1276,24 +1531,29 @@ static int virtnet_freeze(struct virtio_device
> *vdev)> 
>  static int virtnet_restore(struct virtio_device *vdev)
>  {
>  
>  	struct virtnet_info *vi = vdev->priv;
>
> -	int err;
> +	int err, i;
>
>  	err = init_vqs(vi);
>  	if (err)
>  	
>  		return err;
>  	
>  	if (netif_running(vi->dev))
>
> -		virtnet_napi_enable(&vi->rq);
> +		for (i = 0; i < vi->max_queue_pairs; i++)
> +			virtnet_napi_enable(&vi->rq[i]);
>
>  	netif_device_attach(vi->dev);
>
> -	if (!try_fill_recv(&vi->rq, GFP_KERNEL))
> -		schedule_delayed_work(&vi->refill, 0);
> +	for (i = 0; i < vi->max_queue_pairs; i++)
> +		if (!try_fill_recv(&vi->rq[i], GFP_KERNEL))
> +			schedule_delayed_work(&vi->refill, 0);
>
>  	mutex_lock(&vi->config_lock);
>  	vi->config_enable = true;
>  	mutex_unlock(&vi->config_lock);
>
> +	if (vi->has_cvq && virtio_has_feature(vi->vdev, VIRTIO_NET_F_RFS))
> +		virtnet_set_queues(vi);
> +
>>> I think it's easier to test
>>> if (curr_queue_pairs == max_queue_pairs)
>>> within virtnet_set_queues and make it
>>> a NOP if so.
>> Still need to send the command during restore since we reset the device during 
>> freezing.
>
> Then maybe check vi->has_cvq && virtio_has_feature(vi->vdev,
> VIRTIO_NET_F_RFS) in there?

Right.
>
>>>>  
[...]

^ permalink raw reply

* Re: [RFC PATCH 2/2] tun: fix LSM/SELinux labeling of tun/tap devices
From: Jason Wang @ 2012-12-05  6:19 UTC (permalink / raw)
  To: Paul Moore; +Cc: Michael S. Tsirkin, netdev, linux-security-module, selinux
In-Reply-To: <338664993.8Hd16PoY2S@sifl>

On 12/05/2012 02:17 AM, Paul Moore wrote:
> On Tuesday, December 04, 2012 07:36:26 PM Michael S. Tsirkin wrote:
>> On Tue, Dec 04, 2012 at 11:18:57AM -0500, Paul Moore wrote:
>>> Okay, based on your explanation of TUNSETQUEUE, the steps below are what I
>>> believe we need to do ... if you disagree speak up quickly please.
>>>
>>> A. TUNSETIFF (new, non-persistent device)
>>>
>>> [Allocate and initialize the tun_struct LSM state based on the calling
>>> process, use this state to label the TUN socket.]
>>>
>>> 1. Call security_tun_dev_create() which authorizes the action.
>>> 2. Call security_tun_dev_alloc_security() which allocates the tun_struct
>>> LSM blob and SELinux sets some internal blob state to record the label of
>>> the calling process.
>>> 3. Call security_tun_dev_attach() which sets the label of the TUN socket
>>> to match the label stored in the tun_struct LSM blob during A2.  No
>>> authorization is done at this point since the socket is new/unlabeled.
>>>
>>> B. TUNSETIFF (existing, persistent device)
>>>
>>> [Relabel the existing tun_struct LSM state based on the calling process,
>>> use this state to label the TUN socket.]
>>>
>>> 1. Attempt to relabel/reset the tun_struct LSM blob from the currently
>>> stored value, set during A2, to the label of the current calling process.
>>> *** THIS IS NOT CURRENTLY DONE IN THE RFC PATCH ***
>>> 2. Call security_tun_dev_attach() which sets the label of the TUN socket
>>> to match the label stored in the tun_struct LSM blob during B1. No
>>> authorization is done at this point since the socket is new/unlabeled.
>>>
>>> C. TUNSETQUEUE
>>>
>>> [Use the existing tun_struct LSM state to label the new TUN socket.]
>>>
>>> 1. Call security_tun_dev_attach() which sets the label of the TUN socket
>>> to match the label stored in the tun_struct LSM blob set during either A2
>>> or B1. No authorization is done at this point since the socket is
>>> new/unlabeled.
>> Here's what bothers me. libvirt currently opens tun and passes
>> fd to qemu. What would prevent qemu from attaching fd using TUNSETQUEUE
>> to another device it does not own?
> True, assuming all the above is correct and that I'm understanding it 
> correctly (Jason?), we should probably add a new SELinux access control for 
> TUNSETQUEUE.

Yes, we need make sure qemu can call TUNSETQUEUE for the device it does
not own.
>
> The current DAC code exists in tun_not_capable().
>

^ permalink raw reply

* Re: [RFC PATCH 2/2] tun: fix LSM/SELinux labeling of tun/tap devices
From: Jason Wang @ 2012-12-05  6:17 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Paul Moore, netdev, linux-security-module, selinux
In-Reply-To: <20121204152420.GJ7499@redhat.com>

On 12/04/2012 11:24 PM, Michael S. Tsirkin wrote:
> On Tue, Dec 04, 2012 at 09:24:43PM +0800, Jason Wang wrote:
>> On Monday, December 03, 2012 11:22:29 AM Paul Moore wrote:
>>> On Monday, December 03, 2012 06:15:42 PM Jason Wang wrote:
>>>> On 11/30/2012 06:06 AM, Paul Moore wrote:
>>>>> This patch corrects some problems with LSM/SELinux that were introduced
>>>>> with the multiqueue patchset.  The problem stems from the fact that the
>>>>> multiqueue work changed the relationship between the tun device and its
>>>>> associated socket; before the socket persisted for the life of the
>>>>> device, however after the multiqueue changes the socket only persisted
>>>>> for the life of the userspace connection (fd open).  For non-persistent
>>>>> devices this is not an issue, but for persistent devices this can cause
>>>>> the tun device to lose its SELinux label.
>>>>>
>>>>> We correct this problem by adding an opaque LSM security blob to the
>>>>> tun device struct which allows us to have the LSM security state, e.g.
>>>>> SELinux labeling information, persist for the lifetime of the tun
>>>>> device.
>>> ...
>>>
>>>>> -static int selinux_tun_dev_attach(struct sock *sk)
>>>>> +static int selinux_tun_dev_attach(struct sock *sk, void *security)
>>>>>
>>>>>  {
>>>>>
>>>>> +	struct tun_security_struct *tunsec = security;
>>>>>
>>>>>  	struct sk_security_struct *sksec = sk->sk_security;
>>>>>  	u32 sid = current_sid();
>>>>>  	int err;
>>>>>
>>>>> +	/* we don't currently perform any NetLabel based labeling here ...
>>>>>
>>>>>  	err = avc_has_perm(sid, sksec->sid, SECCLASS_TUN_SOCKET,
>>>>>  	
>>>>>  			   TUN_SOCKET__RELABELFROM, NULL);
>>>>>  	
>>>>>  	if (err)
>>>>>  	
>>>>>  		return err;
>>>>>
>>>>> -	err = avc_has_perm(sid, sid, SECCLASS_TUN_SOCKET,
>>>>> +	err = avc_has_perm(sid, tunsec->sid, SECCLASS_TUN_SOCKET,
>>>>>
>>>>>  			   TUN_SOCKET__RELABELTO, NULL);
>>>>>  	
>>>>>  	if (err)
>>>>>  	
>>>>>  		return err;
>>>>>
>>>>> -	sksec->sid = sid;
>>>>> +	sksec->sid = tunsec->sid;
>>>>> +	sksec->sclass = SECCLASS_TUN_SOCKET;
>>>> I'm not sure whether this is correct, looks like we need to differ between
>>>> TUNSETQUEUE and TUNSETIFF. When userspace call TUNSETIFF for persistent
>>>> device, looks like we need change the sid of tunsec like in the past.
>>> It may be that I'm misunderstanding TUNSETQUEUE and/or TUNSETIFF.  Can you
>>> elaborate as to why they should be different?
>> If I understand correctly, before multiqueue patchset, TUNSETIFF is used to:
>>
>> 1) Create the tun/tap network device
>> 2) For persistent device, re-attach the fd to the network device / socket. In 
>> this case, we call selinux_tun_dev_attch() to relabel the socket sid (in fact 
>> also the device's since the socket were persistent also) to the sid of process 
>> that calls TUNSETIFF.
>>
>> So, after the changes of multiqueue, we need try to preserve those policy. The 
>> interesting part is the introducing of TUNSETQUEUE, it's used to attach more 
>> file descriptors/sockets to a tun/tap device after at least one file descriptor 
>> were attached to the tun/tap device through TUNSETIFF. So I think maybe we 
>> need differ those two ioctls. This patch looks fine for TUNSETQUEUE, but for 
>> TUNSETIFF, we need relabel the tunsec to the process that calling TUNSETIFF 
>> for persistent device?
> Basically, it looks like currently once you get a tun fd,
> you can attach it to any device even if normally
> selinux would prevent you from accessing it.

Yes some checking during TUNSETQUEUE is missed.
> If we reuse selinux_tun_dev_attach, we won't need to
> change selinux policy, with a new capability we will need to change it
> to allow libvirt to do TUNSETQUEUE.
>

Also needed for qemu too since it may call TUNSETQUEUE when guest wants
to change the number of queues.
>> btw. Current code does allow calling TUNSETQUEUE to a persistent tun/tap 
>> device with no file attached. It should be a bug and need to be fixed.
> Is this a problem? You can always
> attach
> set queue
> detach
>
> and it would be hard to prevent this ...

Currently, the following steps is allowed:

1. fd1 = open("/dev/net/tun");
2. tunsetiff(fd1, "tap0");
3. tunsetpersistent("tap0");
4. close(fd1);
5. fd2 = open("/dev/net/tun");
6. tunsetqueue(fd2, "tap0);

Looks like step 6 should be forbidden since:

- no fd/sockets were attached to the device, we need use TUNSETIFF
instead to keep the API as we used do in single queue tun
- we need update the security information in tun_struct just like what
we discussed in this mail
- it may also miss checks in TUNSETIFF

>>> One thing that I think we probably should change is the relabelto/from
>>> permissions in the function above (selinux_tun_dev_attach()); in the case
>>> where the socket does not yet have a label, e.g. 'sksec->sid == 0', we
>>> should probably skip the relabel permissions since we want to assign the
>>> TUN device label regardless in this case.
>> I'm not familiar with the selinux, have a quick glance of the code, looks like 
>> the label has been initialized to SECINITSID_KERNEL in 
>> selinux_socket_post_create().
>>
>> Thanks

^ permalink raw reply

* Re: [Suggestion] net/atm : for sprintf, need check the total write length whether larger than a page.
From: Chen Gang @ 2012-12-05  5:59 UTC (permalink / raw)
  To: Chas Williams (CONTRACTOR); +Cc: David Miller, netdev
In-Reply-To: <50BEDE4E.8010408@asianux.com>

于 2012年12月05日 13:40, Chen Gang 写道:
> 于 2012年12月05日 12:56, Chen Gang 写道:
>>>>>>>> -		pos += sprintf(pos, "\n");
>>>>>>>> +		count += scnprintf(buf + count, PAGE_SIZE - count, "\n");
>>>> ..
>>>>>>  need we judge whether count >= PAGE_SIZE ?
>>>>
>>>> count will eventually make PAGE_SIZE - count reach 0 at which point,
>>>> scnprintf() won't be able to write into the buffer.
>>   I also think so.
>>
>>   I think, maybe it will be better to break the loop when we already
>> know that "count >= PAGE_SIZE" (it can save waste looping, although it
>> seems unlikly happen, for example, using unlikly(...) ).
>>
>> By the way:
>>   will it be better that always let "\n" at the end ?
>>   (if count == PAGE_SIZE in a loop, we can not let "\n" at the end).
> 
>    oh, sorry ! count will never >= PAGE_SIZE.
> 
>    I think let "PAGE_SIZE - 2" instead of "PAGE_SIZE" in the loop, so we
> can make the room for the end of "\n".
> 
> 
> 
   sorry, "PAGE_SIZE - 1" is enough, not need "PAGE_SIZE - 2".


-- 
Chen Gang

Asianux Corporation

^ permalink raw reply

* Re: [RFC PATCH 2/2] tun: fix LSM/SELinux labeling of tun/tap devices
From: Jason Wang @ 2012-12-05  5:44 UTC (permalink / raw)
  To: Paul Moore; +Cc: netdev, linux-security-module, selinux, mst
In-Reply-To: <8577392.82G063LYx2@sifl>

On Tuesday, December 04, 2012 11:18:57 AM Paul Moore wrote:
> On Tuesday, December 04, 2012 09:24:43 PM Jason Wang wrote:
> > On Monday, December 03, 2012 11:22:29 AM Paul Moore wrote:
> > > It may be that I'm misunderstanding TUNSETQUEUE and/or TUNSETIFF.  Can
> > > you
> > > elaborate as to why they should be different?
> > 
> > If I understand correctly, before multiqueue patchset, TUNSETIFF is used
> > to:
> > 
> > 1) Create the tun/tap network device
> > 2) For persistent device, re-attach the fd to the network device / socket.
> > In this case, we call selinux_tun_dev_attch() to relabel the socket sid
> > (in
> > fact also the device's since the socket were persistent also) to the sid
> > of
> > process that calls TUNSETIFF.
> > 
> > So, after the changes of multiqueue, we need try to preserve those policy.
> > The interesting part is the introducing of TUNSETQUEUE, it's used to
> > attach
> > more file descriptors/sockets to a tun/tap device after at least one file
> > descriptor were attached to the tun/tap device through TUNSETIFF. So I
> > think maybe we need differ those two ioctls. This patch looks fine for
> > TUNSETQUEUE, but for TUNSETIFF, we need relabel the tunsec to the process
> > that calling TUNSETIFF for persistent device?
> 
> Okay, based on your explanation of TUNSETQUEUE, the steps below are what I
> believe we need to do ... if you disagree speak up quickly please.
> 
> A. TUNSETIFF (new, non-persistent device)
> 
> [Allocate and initialize the tun_struct LSM state based on the calling
> process, use this state to label the TUN socket.]
> 
> 1. Call security_tun_dev_create() which authorizes the action.
> 2. Call security_tun_dev_alloc_security() which allocates the tun_struct LSM
> blob and SELinux sets some internal blob state to record the label of the
> calling process.
> 3. Call security_tun_dev_attach() which sets the label of the TUN socket to
> match the label stored in the tun_struct LSM blob during A2.  No
> authorization is done at this point since the socket is new/unlabeled.
> 
> B. TUNSETIFF (existing, persistent device)
> 
> [Relabel the existing tun_struct LSM state based on the calling process, use
> this state to label the TUN socket.]
> 
> 1. Attempt to relabel/reset the tun_struct LSM blob from the currently
> stored value, set during A2, to the label of the current calling process.
> *** THIS IS NOT CURRENTLY DONE IN THE RFC PATCH ***
> 2. Call security_tun_dev_attach() which sets the label of the TUN socket to
> match the label stored in the tun_struct LSM blob during B1. No
> authorization is done at this point since the socket is new/unlabeled.
> 
> C. TUNSETQUEUE
> 
> [Use the existing tun_struct LSM state to label the new TUN socket.]
> 
> 1. Call security_tun_dev_attach() which sets the label of the TUN socket to
> match the label stored in the tun_struct LSM blob set during either A2 or
> B1. No authorization is done at this point since the socket is
> new/unlabeled.

This looks fine to me.
> > btw. Current code does allow calling TUNSETQUEUE to a persistent tun/tap
> > device with no file attached. It should be a bug and need to be fixed.
> 
> Since you wrote that code will you be submitting a patch to fix that
> problem?

Yes, I will fix it.
> > > One thing that I think we probably should change is the relabelto/from
> > > permissions in the function above (selinux_tun_dev_attach()); in the
> > > case
> > > where the socket does not yet have a label, e.g. 'sksec->sid == 0', we
> > > should probably skip the relabel permissions since we want to assign the
> > > TUN device label regardless in this case.
> > 
> > I'm not familiar with the selinux, have a quick glance of the code, looks
> > like the label has been initialized to SECINITSID_KERNEL in
> > selinux_socket_post_create().
> 
> Unless I've missed something in your changes, the multiqueue code never
> calls any socket code which ends up calling
> {security,selinux}_socket_post_create(); I believe you only call sk_alloc()
> which ends up calling
> {security,selinux}_sk_alloc() which sets SECINITSID_UNLABELED (I mistakenly
> wrote 0 instead in my earlier email which is techincally SECSID_NULL). 
> Either way, I still think the logic I originally described above is
> correct.

Yes, I was wrong. Thanks for the checking.



^ permalink raw reply

* Re: [Patch net-next] xfrm: add missing xfrm message types to selinux perm table
From: Steffen Klassert @ 2012-12-05  5:41 UTC (permalink / raw)
  To: David Miller; +Cc: amwang, netdev, herbert
In-Reply-To: <20121204.130257.1178992077066719400.davem@davemloft.net>

On Tue, Dec 04, 2012 at 01:02:57PM -0500, David Miller wrote:
> From: Cong Wang <amwang@redhat.com>
> Date: Tue,  4 Dec 2012 13:39:31 +0800
> 
> > Cc: Steffen Klassert <steffen.klassert@secunet.com>
> > Cc: Herbert Xu <herbert@gondor.apana.org.au>
> > Cc: "David S. Miller" <davem@davemloft.net>
> > Signed-off-by: Cong Wang <amwang@redhat.com>
> 
> Steffen will you take this into your IPSEC tree?
> 

I would do so, but I have not seen this patch by now.
Was it really send to me? I have not even seen it on the
netdev list.

Anyway, I don't have it, please resend.

Thanks.

^ permalink raw reply

* Re: [Suggestion] net/atm : for sprintf, need check the total write length whether larger than a page.
From: Chen Gang @ 2012-12-05  5:40 UTC (permalink / raw)
  To: Chas Williams (CONTRACTOR); +Cc: David Miller, netdev
In-Reply-To: <50BED40D.9080100@asianux.com>

于 2012年12月05日 12:56, Chen Gang 写道:
>>>> >>> -		pos += sprintf(pos, "\n");
>>>> >>> +		count += scnprintf(buf + count, PAGE_SIZE - count, "\n");
>> > ..
>>> >>  need we judge whether count >= PAGE_SIZE ?
>> > 
>> > count will eventually make PAGE_SIZE - count reach 0 at which point,
>> > scnprintf() won't be able to write into the buffer.
>   I also think so.
> 
>   I think, maybe it will be better to break the loop when we already
> know that "count >= PAGE_SIZE" (it can save waste looping, although it
> seems unlikly happen, for example, using unlikly(...) ).
> 
> By the way:
>   will it be better that always let "\n" at the end ?
>   (if count == PAGE_SIZE in a loop, we can not let "\n" at the end).

   oh, sorry ! count will never >= PAGE_SIZE.

   I think let "PAGE_SIZE - 2" instead of "PAGE_SIZE" in the loop, so we
can make the room for the end of "\n".

-- 
Chen Gang

Asianux Corporation

^ permalink raw reply

* Re: [Suggestion] net/atm : for sprintf, need check the total write length whether larger than a page.
From: Chen Gang @ 2012-12-05  4:56 UTC (permalink / raw)
  To: Chas Williams (CONTRACTOR); +Cc: David Miller, netdev
In-Reply-To: <201212050357.qB53vHvT022706@thirdoffive.cmf.nrl.navy.mil>

于 2012年12月05日 11:57, Chas Williams (CONTRACTOR) 写道:
> In message <50BEA2CB.9000800@asianux.com>,Chen Gang writes:
>>> -	for (i = 0; i < (ESI_LEN - 1); i++)
>>> -		pos += sprintf(pos, "%02x:", adev->esi[i]);
>>> -	pos += sprintf(pos, "%02x\n", adev->esi[i]);
>>>  
>>> -	return pos - buf;
>>> +	return scnprintf(buf, PAGE_SIZE, "%pM\n", adev->esi);
>>>  }
>>>  
>>
>>  "%p" seems print a pointer, not contents of pointer (is it correct ?)
>>  will it change the original display format to outside ?
> 
> %pM means format this pointer as a mac address.  it didnt exist when the
> atm stack was originally written but can be used now to save a bit of
> messy code.
> 

  it is my fault. thank you

  :-)

>>> -		pos += sprintf(pos, "\n");
>>> +		count += scnprintf(buf + count, PAGE_SIZE - count, "\n");
> ..
>>  need we judge whether count >= PAGE_SIZE ?
> 
> count will eventually make PAGE_SIZE - count reach 0 at which point,
> scnprintf() won't be able to write into the buffer.

  I also think so.

  I think, maybe it will be better to break the loop when we already
know that "count >= PAGE_SIZE" (it can save waste looping, although it
seems unlikly happen, for example, using unlikly(...) ).

By the way:
  will it be better that always let "\n" at the end ?
  (if count == PAGE_SIZE in a loop, we can not let "\n" at the end).

  I think what I said above are minor, if you think, for this patch, do
not need consider them, it is ok (at least for me, it is true).

  :-)

> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 

-- 
Chen Gang

Asianux Corporation

^ permalink raw reply

* [PATCH net-next 2/2] net: doc: add default value for neighbour parameters
From: Shan Wei @ 2012-12-05  4:50 UTC (permalink / raw)
  To: David Miller, Eric Dumazet, NetDev, Shan Wei

From: Shan Wei <davidshan@tencent.com>

Signed-off-by: Shan Wei <davidshan@tencent.com>
---
 Documentation/networking/ip-sysctl.txt |    8 ++++++++
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index c6d5fee..0462a71 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -30,16 +30,24 @@ neigh/default/gc_thresh3 - INTEGER
 	Maximum number of neighbor entries allowed.  Increase this
 	when using large numbers of interfaces and when communicating
 	with large numbers of directly-connected peers.
+	Default: 1024
 
 neigh/default/unres_qlen_bytes - INTEGER
 	The maximum number of bytes which may be used by packets
 	queued for each	unresolved address by other network layers.
 	(added in linux 3.3)
+	Seting negative value is meaningless and will retrun error.
+	Default: 65536 Bytes(64KB)
 
 neigh/default/unres_qlen - INTEGER
 	The maximum number of packets which may be queued for each
 	unresolved address by other network layers.
 	(deprecated in linux 3.3) : use unres_qlen_bytes instead.
+	Prior to linux 3.3, the default value is 3 which may cause
+	secluded packet loss. The current default value is calculated
+	according to default value of unres_qlen_bytes and true size of
+	packet.
+	Default: 31
 
 mtu_expires - INTEGER
 	Time, in seconds, that cached PMTU information is kept.
-- 
1.7.1

^ permalink raw reply related

* [PATCH net-next 1/2] net: neighbour: prohibit negative value for unres_qlen_bytes parameter
From: Shan Wei @ 2012-12-05  4:49 UTC (permalink / raw)
  To: David Miller, Eric Dumazet, NetDev; +Cc: Shan Wei

From: Shan Wei <davidshan@tencent.com>

unres_qlen_bytes and unres_qlen are int type.
But multiple relation(unres_qlen_bytes = unres_qlen * SKB_TRUESIZE(ETH_FRAME_LEN))
will cause type overflow when seting unres_qlen. e.g.

$ echo 1027506 > /proc/sys/net/ipv4/neigh/eth1/unres_qlen
$ cat /proc/sys/net/ipv4/neigh/eth1/unres_qlen
1182657265
$ cat /proc/sys/net/ipv4/neigh/eth1/unres_qlen_bytes 
-2147479756

The gutted value is not that we setting。
But user/administrator don't know this is caused by int type overflow.

what's more, it is meaningless and even dangerous that unres_qlen_bytes is set
with negative number. Because, for unresolved neighbour address, kernel will cache packets
without limit in __neigh_event_send()(e.g. (u32)-1 = 2GB).


Signed-off-by: Shan Wei <davidshan@tencent.com>
---
 net/core/neighbour.c |   17 ++++++++++++-----
 1 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index f1c0c2e..36fc692 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -62,6 +62,9 @@ static void __neigh_notify(struct neighbour *n, int type, int flags);
 static void neigh_update_notify(struct neighbour *neigh);
 static int pneigh_ifdown(struct neigh_table *tbl, struct net_device *dev);
 
+static int zero;
+static int unres_qlen_max = INT_MAX / SKB_TRUESIZE(ETH_FRAME_LEN);
+
 static struct neigh_table *neigh_tables;
 #ifdef CONFIG_PROC_FS
 static const struct file_operations neigh_stat_seq_fops;
@@ -1787,8 +1790,7 @@ static int neightbl_fill_parms(struct sk_buff *skb, struct neigh_parms *parms)
 	    nla_put_u32(skb, NDTPA_QUEUE_LENBYTES, parms->queue_len_bytes) ||
 	    /* approximative value for deprecated QUEUE_LEN (in packets) */
 	    nla_put_u32(skb, NDTPA_QUEUE_LEN,
-			DIV_ROUND_UP(parms->queue_len_bytes,
-				     SKB_TRUESIZE(ETH_FRAME_LEN))) ||
+			parms->queue_len_bytes / SKB_TRUESIZE(ETH_FRAME_LEN)) ||
 	    nla_put_u32(skb, NDTPA_PROXY_QLEN, parms->proxy_qlen) ||
 	    nla_put_u32(skb, NDTPA_APP_PROBES, parms->app_probes) ||
 	    nla_put_u32(skb, NDTPA_UCAST_PROBES, parms->ucast_probes) ||
@@ -2777,9 +2779,13 @@ static int proc_unres_qlen(ctl_table *ctl, int write, void __user *buffer,
 	int size, ret;
 	ctl_table tmp = *ctl;
 
+	tmp.extra1 = &zero;
+	tmp.extra2 = &unres_qlen_max;
 	tmp.data = &size;
-	size = DIV_ROUND_UP(*(int *)ctl->data, SKB_TRUESIZE(ETH_FRAME_LEN));
-	ret = proc_dointvec(&tmp, write, buffer, lenp, ppos);
+
+	size = *(int *)ctl->data / SKB_TRUESIZE(ETH_FRAME_LEN);
+	ret = proc_dointvec_minmax(&tmp, write, buffer, lenp, ppos);
+
 	if (write && !ret)
 		*(int *)ctl->data = size * SKB_TRUESIZE(ETH_FRAME_LEN);
 	return ret;
@@ -2865,7 +2871,8 @@ static struct neigh_sysctl_table {
 			.procname	= "unres_qlen_bytes",
 			.maxlen		= sizeof(int),
 			.mode		= 0644,
-			.proc_handler	= proc_dointvec,
+			.extra1		= &zero,
+			.proc_handler   = proc_dointvec_minmax,
 		},
 		[NEIGH_VAR_PROXY_QLEN] = {
 			.procname	= "proxy_qlen",
-- 
1.7.1

^ permalink raw reply related

* Re: [Suggestion] net/atm : for sprintf, need check the total write length whether larger than a page.
From: Chas Williams (CONTRACTOR) @ 2012-12-05  3:57 UTC (permalink / raw)
  To: Chen Gang; +Cc: David Miller, netdev
In-Reply-To: <50BEA2CB.9000800@asianux.com>

In message <50BEA2CB.9000800@asianux.com>,Chen Gang writes:
>> -	for (i = 0; i < (ESI_LEN - 1); i++)
>> -		pos += sprintf(pos, "%02x:", adev->esi[i]);
>> -	pos += sprintf(pos, "%02x\n", adev->esi[i]);
>>  
>> -	return pos - buf;
>> +	return scnprintf(buf, PAGE_SIZE, "%pM\n", adev->esi);
>>  }
>>  
>
>  "%p" seems print a pointer, not contents of pointer (is it correct ?)
>  will it change the original display format to outside ?

%pM means format this pointer as a mac address.  it didnt exist when the
atm stack was originally written but can be used now to save a bit of
messy code.

>> -		pos += sprintf(pos, "\n");
>> +		count += scnprintf(buf + count, PAGE_SIZE - count, "\n");
...
>  need we judge whether count >= PAGE_SIZE ?

count will eventually make PAGE_SIZE - count reach 0 at which point,
scnprintf() won't be able to write into the buffer.

^ permalink raw reply

* Re: [Patch net-next] netlink: add missing netlink message types to selinux perm table
From: Cong Wang @ 2012-12-05  3:23 UTC (permalink / raw)
  To: netdev; +Cc: David S. Miller
In-Reply-To: <1354593208-21939-1-git-send-email-amwang@redhat.com>

On Tue, 2012-12-04 at 11:53 +0800, Cong Wang wrote:
> RTM_NEWNETCONF and RTM_GETNETCONF are missing in this table.

This patch has conflicts with "v3 bridge: export multicast database via
netlink", please drop this for now, I will resend it later.

Thanks.

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox