Netdev List


* Re: [PATCH 1/1] net: cpts: fix for build break after ARM SoC integration
From: Tomi Valkeinen @ 2012-12-13 11:07 UTC (permalink / raw)
  To: Mugunthan V N
  Cc: netdev, davem, linux-arm-kernel, linux-omap, b-cousson, paul,
	richardcochran
In-Reply-To: <1354012034-31686-1-git-send-email-mugunthanvnm@ti.com>

Hi,

On 2012-11-27 12:27, Mugunthan V N wrote:
>   CC      drivers/net/ethernet/ti/cpts.o
> drivers/net/ethernet/ti/cpts.c:30:24: fatal error: plat/clock.h: No such file or directory
> compilation terminated.
> make[4]: *** [drivers/net/ethernet/ti/cpts.o] Error 1
> make[3]: *** [drivers/net/ethernet/ti] Error 2
> make[2]: *** [drivers/net/ethernet] Error 2
> make[1]: *** [drivers/net] Error 2
> 
> fix for build break as the header file is removed from plat-omap as part of
> the below patch

linux-next still has this build problem, I guess this patch is lingering
somewhere. Somewhat annoying, as the driver is enabled by default. (btw,
why is it "default y"?)

 Tomi

^ permalink raw reply

* [PATCH v2] ipv6: Change skb->data before using icmpv6_notify() to propagate redirect
From: Duan Jiong @ 2012-12-13 11:21 UTC (permalink / raw)
  To: davem; +Cc: Steffen Klassert, netdev


In ndisc_redirect_rcv(), skb->data points to the transport header, but
icmpv6_notify() needs skb->data to point to the inner IP packet. So
before using icmpv6_notify() to propagate the redirect, change skb->data
to point to the inner IP packet that triggered the sending of the
Redirect, and introduce struct rd_msg to make this easy.

Many thanks to Steffen Klassert.

Signed-off-by: Duan Jiong <djduanjiong@gmail.com>
---
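The pointer adjustment described above can be sketched in user space. This
is only an illustration under stated assumptions, not the kernel code:
struct rd_msg_sketch and inner_packet_offset() are hypothetical names, the
ICMPv6 header is modelled as 8 raw bytes, and option type 4 (Redirected
Header) with its 8-byte option header is taken from RFC 4861.

```c
#include <stddef.h>
#include <stdint.h>

#define ND_OPT_REDIRECT_HDR 4	/* Redirected Header option type (RFC 4861) */

/* User-space mirror of the patch's struct rd_msg (hypothetical name). */
struct rd_msg_sketch {
	uint8_t icmph[8];	/* ICMPv6 header: type, code, checksum, reserved */
	uint8_t target[16];	/* target address */
	uint8_t dest[16];	/* destination address */
	uint8_t opt[];		/* ND options follow */
};

/*
 * Walk the ND options of a Redirect message and return the offset, from
 * the start of the message, of the inner IP packet carried in the
 * Redirected Header option. Returns -1 if the option is absent or the
 * options are malformed.
 */
static long inner_packet_offset(const uint8_t *msg, size_t len)
{
	size_t off = offsetof(struct rd_msg_sketch, opt);

	while (off + 2 <= len) {
		/* The length field counts units of 8 octets. */
		size_t optlen = (size_t)msg[off + 1] * 8;

		if (optlen == 0 || off + optlen > len)
			return -1;
		if (msg[off] == ND_OPT_REDIRECT_HDR)
			return (long)(off + 8);	/* skip type, length, reserved */
		off += optlen;
	}
	return -1;
}
```

In the patch, the equivalent of this offset is what gets passed to
pskb_pull(), so that skb->data lands on the inner packet before
icmpv6_notify() runs.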
 include/net/ndisc.h |    7 +++++++
 net/ipv6/ndisc.c    |   22 ++++++++++++++++++++++
 2 files changed, 29 insertions(+), 0 deletions(-)

diff --git a/include/net/ndisc.h b/include/net/ndisc.h
index 980d263..6b305d7 100644
--- a/include/net/ndisc.h
+++ b/include/net/ndisc.h
@@ -78,6 +78,13 @@ struct ra_msg {
 	__be32			retrans_timer;
 };
 
+struct rd_msg {
+	struct icmp6hdr icmph;
+	struct in6_addr	target;
+	struct in6_addr	dest;
+	__u8		opt[0];
+};
+
 struct nd_opt_hdr {
 	__u8		nd_opt_type;
 	__u8		nd_opt_len;
diff --git a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c
index 2edce30..03deabc 100644
--- a/net/ipv6/ndisc.c
+++ b/net/ipv6/ndisc.c
@@ -1333,6 +1333,12 @@ out:
 
 static void ndisc_redirect_rcv(struct sk_buff *skb)
 {
+	u8 *hdr;
+	struct ndisc_options ndopts;
+	struct rd_msg *msg = (struct rd_msg *)skb_transport_header(skb);
+	u32 ndoptlen = skb->tail - (skb->transport_header +
+				    offsetof(struct rd_msg, opt));
+
 #ifdef CONFIG_IPV6_NDISC_NODETYPE
 	switch (skb->ndisc_nodetype) {
 	case NDISC_NODETYPE_HOST:
@@ -1349,6 +1355,22 @@ static void ndisc_redirect_rcv(struct sk_buff *skb)
 		return;
 	}
 
+	if (!ndisc_parse_options(msg->opt, ndoptlen, &ndopts)) {
+		ND_PRINTK(2, warn,
+			  "Redirect: invalid ND options\n");
+		return;
+	}
+
+	if (!ndopts.nd_opts_rh) {
+		return;
+	}
+
+	hdr = (u8 *)ndopts.nd_opts_rh;
+	hdr += 8;
+	if(!pskb_pull(skb, hdr - skb_transport_header(skb))) {
+		return;
+	}
+
 	icmpv6_notify(skb, NDISC_REDIRECT, 0, 0);
 }
 
-- 
1.7.1

^ permalink raw reply related

* Re: netconsole fun
From: Neil Horman @ 2012-12-13 12:36 UTC (permalink / raw)
  To: Peter Hurley; +Cc: Cong Wang, netdev
In-Reply-To: <1355345957.2687.18.camel@thor>

On Wed, Dec 12, 2012 at 03:59:17PM -0500, Peter Hurley wrote:
> On Tue, 2012-12-11 at 11:45 -0500, Neil Horman wrote:
> > On Tue, Dec 11, 2012 at 10:16:51AM -0500, Peter Hurley wrote:
> > > On Tue, 2012-12-11 at 09:30 -0500, Neil Horman wrote:
> > > > On Tue, Dec 11, 2012 at 09:19:52AM -0500, Peter Hurley wrote:
> > > > > On Tue, 2012-12-11 at 04:51 +0000, Cong Wang wrote:
> > > > > > On Mon, 10 Dec 2012 at 14:17 GMT, Peter Hurley <peter@hurleysoftware.com> wrote:
> > > > > > > Now that netpoll has been disabled for slaved devices, is there a
> > > > > > > recommended method of running netconsole on a machine that has a slaved
> > > > > > > device?
> > > > > > >
> > > > > > 
> > > > > > Yes, running it on the master device instead.
> > > > > 
> > > > > Thanks for the suggestion, but:
> > > > > 
> > > > > [ 0.000000] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-3.7.0-rc8-xeon ...... netconsole=@192.168.10.99/br0,30000@192.168.10.100/xx:xx:xx:xx:xx:xx
> > > > > ...
> > > > > [ 5.289869] netpoll: netconsole: local port 6665
> > > > > [ 5.289885] netpoll: netconsole: local IP 192.168.10.99
> > > > > [ 5.289892] netpoll: netconsole: interface 'br0'
> > > > > [ 5.289898] netpoll: netconsole: remote port 30000
> > > > > [ 5.289907] netpoll: netconsole: remote IP 192.168.10.100
> > > > > [ 5.289914] netpoll: netconsole: remote ethernet address xx:xx:xx:xx:xx:xx
> > > > > [ 5.289922] netpoll: netconsole: br0 doesn't exist, aborting
> > > > > [ 5.289929] netconsole: cleaning up
> > > > > ...
> > > > > [ 9.392291] Bridge firewalling registered
> > > > > [ 9.396805] device eth1 entered promiscuous mode
> > > > > [ 9.418350] eth1:  setting full-duplex.
> > > > > [ 9.421268] br0: port 1(eth1) entered forwarding state
> > > > > [ 9.423354] br0: port 1(eth1) entered forwarding state
> > > > > 
> > > > > 
> > > > > Is there a way to control or associate network device names prior to
> > > > > udev renaming?
> > > > > 
> > > > That looks like a systemd problem (or more specifically a boot dependency
> > > > problem).  You need to modify your netconsole unit/service file to start after
> > > > all your networking is up.  NetworkManager provides a dummy service file for
> > > > this purpose, called NetworkManager-wait-online.service.
> > > 
> > > Ok. So with a single physical network interface that will be bridged,
> > > netconsole cannot be used for kernel boot messages.
> > > 
> > > On a machine with multiple NICs, is there a way to control device
> > > naming so that the interface name specified for netconsole on
> > > the boot command line actually corresponds to the intended
> > > device? For example,
> > > 
> > > [ 0.000000] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-3.7.0-rc8-xeon ...... netconsole=@192.168.1.123/eth0,30000@192.168.1.139/xx:xx:xx:xx:xx:xx
> > > ....
> > > [ 4.092184] 3c59x: Donald Becker and others.
> > > [ 4.092204] 0000:07:05.0: 3Com PCI 3c905C Tornado at ffffc9000186cf80.
> > > [ 4.094035] tg3.c:v3.125 (September 26, 2012)
> > > ....
> > > [ 4.125038] tg3 0000:08:00.0 eth1: Tigon3 [partno(BCM95754) rev b002] (PCI Express) MAC address xx:xx:xx:xx:xx:xx
> > > [ 4.125055] tg3 0000:08:00.0 eth1: attached PHY is 5787 (10/100/1000Base-T Ethernet) (WireSpeed[1], EEE[0])
> > > [ 4.125062] tg3 0000:08:00.0 eth1: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] TSOcap[1]
> > > [ 4.125068] tg3 0000:08:00.0 eth1: dma_rwctrl[76180000] dma_mask[64-bit]
> > > 
> > > This is attaching netconsole to the wrong device because bus
> > > enumeration, and therefore load order, is not consistent from boot to
> > > boot.
> > > 
> > No, there's no way to do that.  As you note, device enumeration isn't consistent
> > across boots; that's why udev creates rules to rename devices based on immutable
> > (or semi-immutable) data, like MAC addresses or PCI bus locations.  Once that
> > happens, you'll have consistent names for your interfaces, and that work will be
> > guaranteed to be done after NetworkManager has finished opening all the
> > interfaces that it needs (hence my suggestion to make the netconsole service
> > dependent on NetworkManager service startup completing).
> 
> Just wondering if you think something like the patch below is
> suitable/acceptable for insulating netconsole from inconsistent device
> name scenarios without changing the existing semantics. The basic idea
> is to allow an ethernet MAC address in the <dev> field of the
> netconsole= options, and if a MAC address was specified rather than a
> device name, to do the dev lookup from the MAC address instead.
> 
> This doesn't extend to, but also doesn't interfere with, the dynamic
> config of netconsole via configfs.
> 
> Would you mind reviewing it?
> 
> Regards,
> Peter
> 
This looks like a pretty good idea to me.  That said, something occurred to me
when you wrote your summary above.  Have you looked at the netconsole service
scripts that most distros provide in their packaging?  I'm almost positive Red
Hat/Fedora (and likely also SUSE and Ubuntu) already implement this functionality
from user space.  Basically, instead of people just modprobing netconsole, they
create a service script that parses a config file containing all the
options needed to load the netconsole module, and it has the intelligence to see
if you specified a MAC address rather than a device.  If you did, it finds
the device with the corresponding MAC address and uses that as the device.  I'm
sorry, I don't know why I didn't think of that before.  Check that out though;
that will likely give you exactly what you need.

Neil

P.S. Actually, looking at it, I think it does one better: it lets you specify the
destination netconsole address, then dynamically looks up the routing table
entry that gets you there and uses the output device specified in the routing
table.

http://www.cyberciti.biz/tips/linux-netconsole-log-management-tutorial.html
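
For illustration only, the "MAC address instead of a device name" idea
discussed above could look roughly like the sketch below. It is neither
Peter's patch nor any distro script; netconsole_dev_field() and
looks_like_mac() are hypothetical helpers, and the option layout assumed is
the documented netconsole=[src-port]@[src-ip]/[dev],[tgt-port]@<tgt-ip>/[tgt-macaddr].

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy the <dev> field (the text between the first '/' and the following
 * ',') of a netconsole= option string into out. Returns false if the
 * string does not have that shape or out is too small.
 */
static bool netconsole_dev_field(const char *opt, char *out, size_t outlen)
{
	const char *slash = strchr(opt, '/');
	const char *comma = slash ? strchr(slash, ',') : NULL;
	size_t len;

	if (!slash || !comma)
		return false;
	len = (size_t)(comma - slash - 1);
	if (len >= outlen)
		return false;
	memcpy(out, slash + 1, len);
	out[len] = '\0';
	return true;
}

/* True if s is exactly six hex byte pairs separated by ':'. */
static bool looks_like_mac(const char *s)
{
	if (strlen(s) != 17)
		return false;
	for (size_t i = 0; i < 17; i++) {
		if (i % 3 == 2) {
			if (s[i] != ':')
				return false;
		} else if (!isxdigit((unsigned char)s[i])) {
			return false;
		}
	}
	return true;
}
```

A caller would extract the field first and only fall back to a lookup by
hardware address when looks_like_mac() returns true, so plain names such as
eth0 or br0 keep their existing meaning.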

^ permalink raw reply

* Re: [PATCH 1/1] net: cpts: fix for build break after ARM SoC integration
From: Richard Cochran @ 2012-12-13 13:03 UTC (permalink / raw)
  To: Tomi Valkeinen
  Cc: Mugunthan V N, netdev, davem, linux-arm-kernel, linux-omap,
	b-cousson, paul
In-Reply-To: <50C9B6F9.9020300@iki.fi>

On Thu, Dec 13, 2012 at 01:07:37PM +0200, Tomi Valkeinen wrote:
> Hi,
> 
> On 2012-11-27 12:27, Mugunthan V N wrote:
> >   CC      drivers/net/ethernet/ti/cpts.o
> > drivers/net/ethernet/ti/cpts.c:30:24: fatal error: plat/clock.h: No such file or directory
> > compilation terminated.
> > make[4]: *** [drivers/net/ethernet/ti/cpts.o] Error 1
> > make[3]: *** [drivers/net/ethernet/ti] Error 2
> > make[2]: *** [drivers/net/ethernet] Error 2
> > make[1]: *** [drivers/net] Error 2
> > 
> > fix for build break as the header file is removed from plat-omap as part of
> > the below patch
> 
> linux-next still has this build problem, I guess this patch is lingering
> somewhere. Somewhat annoying, as the driver is enabled by default. (btw,
> why is it "default y"?)

Um, in Linus' master, net, and net-next, neither TI_CPSW nor TI_CPTS
are default y, so I don't know where you are coming from on that.

Sorry,
Richard

^ permalink raw reply

* Re: [RFC] net : add tx timestamp to packet mmap.
From: Richard Cochran @ 2012-12-13 13:29 UTC (permalink / raw)
  To: Paul Chavent; +Cc: davem, edumazet, daniel.borkmann, xemul, ebiederm, netdev
In-Reply-To: <1355326165-12277-1-git-send-email-paul.chavent@onera.fr>

On Wed, Dec 12, 2012 at 04:29:25PM +0100, Paul Chavent wrote:
> This patch allows tx timestamps to be generated for packets sent by the packet mmap interface.
> 
> Actually, you can't get tx timestamps with the sample code below.
> 
> I wonder if my current implementation is good. And if not, how should I get the timestamps?

In order for time stamps to appear, somebody has to call
skb_tx_timestamp() ...

> diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
> index e639645..948748b 100644
> --- a/net/packet/af_packet.c
> +++ b/net/packet/af_packet.c
> @@ -1857,6 +1857,10 @@ static int tpacket_fill_skb(struct packet_sock *po, struct sk_buff *skb,
>  	void *data;
>  	int err;
>  
> +	err = sock_tx_timestamp(&po->sk, &skb_shinfo(skb)->tx_flags);

and this call is only setting some flags.

HTH,
Richard
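
For context, the user-space half of this is small. The sketch below assumes
a Linux system with <linux/net_tstamp.h>; request_sw_tx_timestamps() is a
hypothetical helper name. It only asks for software transmit time stamps on
a socket. As noted above, that request ends up as flags on the skb, and a
time stamp is produced only if the transmit path actually calls
skb_tx_timestamp().

```c
#include <linux/net_tstamp.h>
#include <sys/socket.h>

/*
 * Ask the kernel for software TX time stamps on fd.
 * Returns 0 on success, -1 on error (errno is set by setsockopt()).
 */
static int request_sw_tx_timestamps(int fd)
{
	int flags = SOF_TIMESTAMPING_TX_SOFTWARE |	/* generate on transmit */
		    SOF_TIMESTAMPING_SOFTWARE;		/* report software stamps */

	return setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING,
			  &flags, sizeof(flags));
}
```

The time stamps themselves would then be read back from the socket error
queue with recvmsg() and MSG_ERRQUEUE, which is not shown here.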

^ permalink raw reply

* [PATCH iproute2 6/6] ip/link_iptnl: fix indentation
From: Nicolas Dichtel @ 2012-12-13 13:42 UTC (permalink / raw)
  To: shemminger; +Cc: netdev, Nicolas Dichtel
In-Reply-To: <1355406174-10586-1-git-send-email-nicolas.dichtel@6wind.com>

Use tabs instead of spaces when possible.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
---
 ip/link_iptnl.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/ip/link_iptnl.c b/ip/link_iptnl.c
index 238722d..b00d8d9 100644
--- a/ip/link_iptnl.c
+++ b/ip/link_iptnl.c
@@ -298,10 +298,10 @@ static void iptunnel_print_opt(struct link_util *lu, FILE *f, struct rtattr *tb[
 		fprintf(f, "nopmtudisc ");
 
 	if (tb[IFLA_IPTUN_FLAGS]) {
-	       __u16 iflags = rta_getattr_u16(tb[IFLA_IPTUN_FLAGS]);
+		__u16 iflags = rta_getattr_u16(tb[IFLA_IPTUN_FLAGS]);
 
-	      if (iflags & SIT_ISATAP)
-		      fprintf(f, "isatap ");
+		if (iflags & SIT_ISATAP)
+			fprintf(f, "isatap ");
 	}
 
 	if (tb[IFLA_IPTUN_6RD_PREFIXLEN] &&
@@ -314,12 +314,12 @@ static void iptunnel_print_opt(struct link_util *lu, FILE *f, struct rtattr *tb[
 
 		printf("6rd-prefix %s/%u ",
 		       inet_ntop(AF_INET6, RTA_DATA(tb[IFLA_IPTUN_6RD_PREFIX]),
-			         s1, sizeof(s1)),
+				 s1, sizeof(s1)),
 		       prefixlen);
 		if (relayprefix) {
 			printf("6rd-relay_prefix %s/%u ",
 			       format_host(AF_INET, 4, &relayprefix, s1,
-				           sizeof(s1)),
+					   sizeof(s1)),
 			       relayprefixlen);
 		}
 	}
-- 
1.8.0.1

^ permalink raw reply related

* [PATCH iproute2 5/6] ip: term OPTIONS was used twice in 'ip route' man pages
From: Nicolas Dichtel @ 2012-12-13 13:42 UTC (permalink / raw)
  To: shemminger; +Cc: netdev, Nicolas Dichtel
In-Reply-To: <1355406174-10586-1-git-send-email-nicolas.dichtel@6wind.com>

INFO_SPEC already uses the term 'OPTIONS' and describes it.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
---
 man/man8/ip-route.8.in | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/man/man8/ip-route.8.in b/man/man8/ip-route.8.in
index f06fcba..2c35a97 100644
--- a/man/man8/ip-route.8.in
+++ b/man/man8/ip-route.8.in
@@ -1,4 +1,4 @@
-.TH IP\-ROUTE 8 "20 Dec 2011" "iproute2" "Linux"
+.TH IP\-ROUTE 8 "13 Dec 2012" "iproute2" "Linux"
 .SH "NAME"
 ip-route \- routing table management
 .SH "SYNOPSIS"
@@ -7,7 +7,7 @@ ip-route \- routing table management
 .in +8
 .ti -8
 .B ip
-.RI "[ " OPTIONS " ]"
+.RI "[ " ip-OPTIONS " ]"
 .B route
 .RI " { " COMMAND " | "
 .BR help " }"
-- 
1.8.0.1

^ permalink raw reply related

* [PATCH iproute2 4/6] ip: update man pages for 'ip link'
From: Nicolas Dichtel @ 2012-12-13 13:42 UTC (permalink / raw)
  To: shemminger; +Cc: netdev, Nicolas Dichtel
In-Reply-To: <1355406174-10586-1-git-send-email-nicolas.dichtel@6wind.com>

Now 'ip link' supports ipip, sit and ip6tnl.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
---
 man/man8/ip-link.8.in | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/man/man8/ip-link.8.in b/man/man8/ip-link.8.in
index 43c4ac6..8d2a6f9 100644
--- a/man/man8/ip-link.8.in
+++ b/man/man8/ip-link.8.in
@@ -1,4 +1,4 @@
-.TH IP\-LINK 8 "20 Dec 2011" "iproute2" "Linux"
+.TH IP\-LINK 8 "13 Dec 2012" "iproute2" "Linux"
 .SH "NAME"
 ip-link \- network device configuration
 .SH "SYNOPSIS"
@@ -59,7 +59,10 @@ ip-link \- network device configuration
 .BR vcan " | "
 .BR veth " | "
 .BR vlan " | "
-.BR vxlan " ]"
+.BR vxlan " |"
+.BR ip6tnl " |"
+.BR ipip " |"
+.BR sit " ]"
 
 .ti -8
 .BI "ip link delete " DEVICE
@@ -174,6 +177,15 @@ Link types:
 .sp
 .BR vxlan
 - Virtual eXtended LAN
+.sp
+.BR ip6tnl
+- Virtual tunnel interface IPv4|IPv6 over IPv6
+.sp
+.BR ipip
+- Virtual tunnel interface IPv4 over IPv4
+.sp
+.BR sit
+- Virtual tunnel interface IPv6 over IPv4
 .in -8
 
 .TP
-- 
1.8.0.1

^ permalink raw reply related

* [PATCH iproute2 3/6] ip: update man pages and usage() for 'ip mroute'
From: Nicolas Dichtel @ 2012-12-13 13:42 UTC (permalink / raw)
  To: shemminger; +Cc: netdev, Nicolas Dichtel
In-Reply-To: <1355406174-10586-1-git-send-email-nicolas.dichtel@6wind.com>

Sync with the current code.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
---
 ip/ipmroute.c        |  2 ++
 man/man8/ip-mroute.8 | 14 +++++++++++---
 2 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/ip/ipmroute.c b/ip/ipmroute.c
index defcfc5..345576d 100644
--- a/ip/ipmroute.c
+++ b/ip/ipmroute.c
@@ -36,6 +36,8 @@ static void usage(void) __attribute__((noreturn));
 static void usage(void)
 {
 	fprintf(stderr, "Usage: ip mroute show [ [ to ] PREFIX ] [ from PREFIX ] [ iif DEVICE ]\n");
+	fprintf(stderr, "                      [ table TABLE_ID ]\n");
+	fprintf(stderr, "TABLE_ID := [ local | main | default | all | NUMBER ]\n");
 #if 0
 	fprintf(stderr, "Usage: ip mroute [ add | del ] DESTINATION from SOURCE [ iif DEVICE ] [ oif DEVICE ]\n");
 #endif
diff --git a/man/man8/ip-mroute.8 b/man/man8/ip-mroute.8
index 98aab88..870df5e 100644
--- a/man/man8/ip-mroute.8
+++ b/man/man8/ip-mroute.8
@@ -1,4 +1,4 @@
-.TH IP\-MROUTE 8 "20 Dec 2011" "iproute2" "Linux"
+.TH IP\-MROUTE 8 "13 Dec 2012" "iproute2" "Linux"
 .SH "NAME"
 ip-mroute \- multicast routing cache management
 .SH "SYNOPSIS"
@@ -6,12 +6,15 @@ ip-mroute \- multicast routing cache management
 .ad l
 .in +8
 .ti -8
-.BR "ip mroute show" " ["
+.BR "ip " " [ ip-OPTIONS ] " "mroute show" " [ [ "
+.BR " to " " ] "
 .IR PREFIX " ] [ "
 .B  from
 .IR PREFIX " ] [ "
 .B  iif
-.IR DEVICE " ]"
+.IR DEVICE " ] [ "
+.B table
+.IR TABLE_ID " ] "
 
 .SH DESCRIPTION
 .B mroute
@@ -42,6 +45,11 @@ the interface on which multicast packets are received.
 .BI from " PREFIX"
 the prefix selecting the IP source addresses of the multicast route.
 
+.TP
+.BI table " TABLE_ID"
+the table id selecting the multicast table. It can be
+.BR local ", " main ", " default ", " all " or a number."
+
 .SH SEE ALSO
 .br
 .BR ip (8)
-- 
1.8.0.1

^ permalink raw reply related

* [PATCH iproute2 1/6] ip: add man pages for netconf
From: Nicolas Dichtel @ 2012-12-13 13:42 UTC (permalink / raw)
  To: shemminger; +Cc: netdev, Nicolas Dichtel

This patch adds documentation for the 'ip netconf' command.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
---
 man/man8/Makefile     |  2 +-
 man/man8/ip-netconf.8 | 36 ++++++++++++++++++++++++++++++++++++
 2 files changed, 37 insertions(+), 1 deletion(-)
 create mode 100644 man/man8/ip-netconf.8

diff --git a/man/man8/Makefile b/man/man8/Makefile
index 4bad9d6..d208f3b 100644
--- a/man/man8/Makefile
+++ b/man/man8/Makefile
@@ -9,7 +9,7 @@ MAN8PAGES = $(TARGETS) ip.8 arpd.8 lnstat.8 routel.8 rtacct.8 rtmon.8 ss.8 \
 	ip-addrlabel.8 ip-l2tp.8 \
 	ip-maddress.8 ip-monitor.8 ip-mroute.8 ip-neighbour.8 \
 	ip-netns.8 ip-ntable.8 ip-rule.8 ip-tunnel.8 ip-xfrm.8 \
-	ip-tcp_metrics.8
+	ip-tcp_metrics.8 ip-netconf.8
 
 all: $(TARGETS)
 
diff --git a/man/man8/ip-netconf.8 b/man/man8/ip-netconf.8
new file mode 100644
index 0000000..8041ea2
--- /dev/null
+++ b/man/man8/ip-netconf.8
@@ -0,0 +1,36 @@
+.TH IP\-NETCONF 8 "13 Dec 2012" "iproute2" "Linux"
+.SH "NAME"
+ip-netconf \- network configuration monitoring
+.SH "SYNOPSIS"
+.sp
+.ad l
+.in +8
+.ti -8
+.BR "ip " " [ ip-OPTIONS ] " "netconf show" " [ "
+.B dev
+.IR STRING " ]"
+
+.SH DESCRIPTION
+The
+.B ip netconf
+utility can monitor IPv4 and IPv6 parameters (see
+.BR "/proc/sys/net/ipv[4|6]/conf/[all|DEV]/" ")"
+like forwarding, rp_filter
+or mc_forwarding status.
+
+If no interface is specified, the entry
+.B all
+is displayed.
+
+.SS ip netconf show - display network parameters
+
+.TP
+.BI dev " STRING"
+the name of the device to display network parameters.
+
+.SH SEE ALSO
+.br
+.BR ip (8)
+
+.SH AUTHOR
+Original Manpage by Nicolas Dichtel <nicolas.dichtel@6wind.com>
-- 
1.8.0.1

^ permalink raw reply related

* [PATCH iproute2 2/6] ip: update man pages and usage() for 'ip monitor'
From: Nicolas Dichtel @ 2012-12-13 13:42 UTC (permalink / raw)
  To: shemminger; +Cc: netdev, Nicolas Dichtel
In-Reply-To: <1355406174-10586-1-git-send-email-nicolas.dichtel@6wind.com>

Sync with the current code.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
---
 ip/ipmonitor.c        |  5 ++++-
 man/man8/ip-monitor.8 | 15 +++++++++------
 2 files changed, 13 insertions(+), 7 deletions(-)

diff --git a/ip/ipmonitor.c b/ip/ipmonitor.c
index 09a339c..a9ff1e8 100644
--- a/ip/ipmonitor.c
+++ b/ip/ipmonitor.c
@@ -29,7 +29,10 @@ int prefix_banner;
 
 static void usage(void)
 {
-	fprintf(stderr, "Usage: ip monitor [ all | LISTofOBJECTS ]\n");
+	fprintf(stderr, "Usage: ip monitor [ all | LISTofOBJECTS ] [ FILE ]\n");
+	fprintf(stderr, "LISTofOBJECTS := link | address | route | mroute | prefix |\n");
+	fprintf(stderr, "                 neigh | netconf\n");
+	fprintf(stderr, "FILE := file FILENAME\n");
 	exit(-1);
 }
 
diff --git a/man/man8/ip-monitor.8 b/man/man8/ip-monitor.8
index 351a744..b07cb0e 100644
--- a/man/man8/ip-monitor.8
+++ b/man/man8/ip-monitor.8
@@ -1,4 +1,4 @@
-.TH IP\-MONITOR 8 "20 Dec 2011" "iproute2" "Linux"
+.TH IP\-MONITOR 8 "13 Dec 2012" "iproute2" "Linux"
 .SH "NAME"
 ip-monitor, rtmon \- state monitoring
 .SH "SYNOPSIS"
@@ -6,8 +6,8 @@ ip-monitor, rtmon \- state monitoring
 .ad l
 .in +8
 .ti -8
-.BR "ip monitor" " [ " all " |"
-.IR LISTofOBJECTS " ]"
+.BR "ip " " [ ip-OPTIONS ] " "monitor" " [ " all " |"
+.IR LISTofOBJECTS " ] [ file " FILENAME " ]"
 .sp
 
 .SH DESCRIPTION
@@ -20,12 +20,13 @@ Namely, the
 command is the first in the command line and then the object list follows:
 
 .BR "ip monitor" " [ " all " |"
-.IR LISTofOBJECTS " ]"
+.IR LISTofOBJECTS " ] [ file " FILENAME " ]"
 
 .I OBJECT-LIST
 is the list of object types that we want to monitor.
 It may contain
-.BR link ", " address " and " route "."
+.BR link ", " address ", " route ", " mroute ", " prefix ", "
+.BR neigh " and " netconf "."
 If no
 .B file
 argument is given,
@@ -34,7 +35,9 @@ opens RTNETLINK, listens on it and dumps state changes in the format
 described in previous sections.
 
 .P
-If a file name is given, it does not listen on RTNETLINK,
+If a
+.I FILENAME
+is given, it does not listen on RTNETLINK,
 but opens the file containing RTNETLINK messages saved in binary format
 and dumps them.  Such a history file can be generated with the
 .B rtmon
-- 
1.8.0.1

^ permalink raw reply related

* Re: [RFC PATCH net-next 4/4 V4] try to fix performance regression
From: Weiping Pan @ 2012-12-13 14:05 UTC (permalink / raw)
  To: David Laight; +Cc: davem, brutus, netdev
In-Reply-To: <AE90C24D6B3A694183C094C60CF0A2F6026B70ED@saturn3.aculab.com>

On 12/12/2012 10:57 PM, David Laight wrote:
>>          MS       BASE    AF_UNIX    FRIENDS            TCP_STREAM_MS
>>           1      10.70       5.40       4.02   37%   74%
>>           2      28.01       9.67       7.97   28%   82%
>>           4      55.53      19.78      16.48   29%   83%
>>           8     115.40      38.22      33.51   29%   87%
>>          16     227.31      81.06      67.70   29%   83%
>>          32     446.20     166.59     129.31   28%   77%
>>          64     849.04     336.77     259.43   30%   77%
>>         128    1440.50     661.88     530.43   36%   80%
>>         256    2404.70    1279.67    1029.15   42%   80%
>>         512    4331.53    2501.30    1942.21   44%   77%
>>        1024    6819.78    4622.37    4128.10   60%   89%
>>        2048   10544.60    6348.81    6349.59   60%  100%
>>        4096   12830.41    8324.43    7984.43   62%   95%
>>        8192   13462.65    8355.49   11079.37   82%  132%
>>       16384    9960.87   10840.13   13037.81  130%  120%
>>       32768    8749.31   11372.15   15087.08  172%  132%
>>       65536    7580.27   12150.23   14971.42  197%  123%
>>      131072    6727.74   11451.34   13604.78  202%  118%
>>      262144    7673.14   11613.10   11436.97  149%   98%
>>      524288    7366.17   11675.95   11559.43  156%   99%
>>     1048576    6608.57   11883.01   10103.20  152%   85%
>> MS means Message Size in bytes, that is -m -M for netperf
> If I read that table correctly, it seems to imply that
> something goes badly wrong for 'normal' TCP loopback
> connections when the read/write size exceeds 8k.
> Putting effort into fixing that would appear to be
> more worthwhile than the 'friends' code.
>
> 	David
>
Hi, David,

In my test program, I run normal TCP loopback and then friends for each
message size, and that produces these strange numbers.

But if I run only normal TCP loopback for each message size, the
performance is stable.
[root@intel-s3e3432-01 ~]# cat base.sh
for s in 1 2 4 8 16 32 64 128 256 512 1024 2048 4096 8192 16384 32768 65536 131072 262144 524288 1048576
do
netperf -i -2,10 -I 95,20 -- -m $s -M $s | tail -n1
done


  87380  16384      1    10.09      15.51
  87380  16384      2    10.01      31.39
  87380  16384      4    10.00      55.78
  87380  16384      8    10.00     115.17
  87380  16384     16    10.00     231.66
  87380  16384     32    10.00     452.42
  87380  16384     64    10.00     859.92
  87380  16384    128    10.00    1464.91
  87380  16384    256    10.00    2613.12
  87380  16384    512    10.00    4338.88
  87380  16384   1024    10.00    7174.22
  87380  16384   2048    10.00    10452.84
  87380  16384   4096    10.00    11932.33
  87380  16384   8192    10.00    13750.49
  87380  16384  16384    10.00    13196.98
  87380  16384  32768    10.00    14881.25
  87380  16384  65536    10.00    13685.36
  87380  16384 131072    10.00    16088.71
  87380  16384 262144    10.00    17193.86
  87380  16384 524288    10.00    16696.07
  87380  16384 1048576    10.00    13638.13

thanks
Weiping Pan
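
A note on reading the quoted table: the two unlabeled percentage columns
appear to be FRIENDS relative to BASE and FRIENDS relative to AF_UNIX,
truncated to whole percent. That is an inference from the numbers, not
something stated in the thread; percent_of() below is a hypothetical helper
that reproduces the arithmetic.

```c
/*
 * Ratio of value to reference as a whole percentage, truncated toward
 * zero (the quoted table seems to truncate rather than round).
 */
static int percent_of(double value, double reference)
{
	return (int)(100.0 * value / reference);
}
```

For the first row (BASE 10.70, AF_UNIX 5.40, FRIENDS 4.02),
percent_of(4.02, 10.70) gives 37 and percent_of(4.02, 5.40) gives 74,
matching the table. Read that way, FRIENDS only overtakes plain TCP loopback
in rows where the BASE figure itself drops, which is the anomaly under
discussion.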

^ permalink raw reply

* Re: [RFC PATCH net-next 4/4 V4] try to fix performance regression
From: Weiping Pan @ 2012-12-13 14:09 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: davem, brutus, netdev
In-Reply-To: <1355329523.9139.578.camel@edumazet-glaptop>

On 12/13/2012 12:25 AM, Eric Dumazet wrote:
> On Wed, 2012-12-12 at 22:29 +0800, Weiping Pan wrote:
>
>>          MS       BASE    AF_UNIX    FRIENDS            TCP_STREAM_MS
>>           1      10.70       5.40       4.02   37%   74%
>>           2      28.01       9.67       7.97   28%   82%
>>           4      55.53      19.78      16.48   29%   83%
>>           8     115.40      38.22      33.51   29%   87%
>>          16     227.31      81.06      67.70   29%   83%
>>          32     446.20     166.59     129.31   28%   77%
>>          64     849.04     336.77     259.43   30%   77%
>>         128    1440.50     661.88     530.43   36%   80%
>>         256    2404.70    1279.67    1029.15   42%   80%
>>         512    4331.53    2501.30    1942.21   44%   77%
>>        1024    6819.78    4622.37    4128.10   60%   89%
>>        2048   10544.60    6348.81    6349.59   60%  100%
>>        4096   12830.41    8324.43    7984.43   62%   95%
>>        8192   13462.65    8355.49   11079.37   82%  132%
>>       16384    9960.87   10840.13   13037.81  130%  120%
>>       32768    8749.31   11372.15   15087.08  172%  132%
>>       65536    7580.27   12150.23   14971.42  197%  123%
>>      131072    6727.74   11451.34   13604.78  202%  118%
>>      262144    7673.14   11613.10   11436.97  149%   98%
>>      524288    7366.17   11675.95   11559.43  156%   99%
>>     1048576    6608.57   11883.01   10103.20  152%   85%
>> MS means Message Size in bytes, that is -m -M for netperf
> I cant reproduce your strange numbers here, they make no sense to me.
>
> for s in 1 2 4 8 16 32 64 128 256 512 1024 2048 4096 8192 16384 32768 65536 131072 262144 524288 1048576
> do
>   ./netperf -- -m $s -M $s | tail -n1
> done
>
> Results :
>
> 87380  16384      1    10.00      34.68
>   87380  16384      2    10.00      68.07
>   87380  16384      4    10.00     126.27
>   87380  16384      8    10.00     284.50
>   87380  16384     16    10.00     574.38
>   87380  16384     32    10.00    1091.74
>   87380  16384     64    10.00    2130.23
>   87380  16384    128    10.00    4001.83
>   87380  16384    256    10.00    7666.01
>   87380  16384    512    10.00    13425.81
>   87380  16384   1024    10.00    21146.43
>   87380  16384   2048    10.00    28551.42
>   87380  16384   4096    10.00    37878.95
>   87380  16384   8192    10.00    42507.23
>   87380  16384  16384    10.00    46782.53
>   87380  16384  32768    10.00    42410.97
>   87380  16384  65536    10.00    43053.09
>   87380  16384 131072    10.00    44504.20
>   87380  16384 262144    10.00    50211.74
>   87380  16384 524288    10.00    54004.23
>   87380  16384 1048576    10.00    53852.26
>
>
>
Hi, Eric,

In my test program, I run normal TCP loopback and then friends for each
message size, and that produces these strange numbers.

But if I run only normal TCP loopback for each message size, the
performance is stable.

Maybe I should clean up the environment before each test, for example by
dropping caches.

thanks
Weiping Pan

^ permalink raw reply

* [PATCH] ndisc: Fix padding error in link-layer address option.
From: YOSHIFUJI Hideaki @ 2012-12-13 14:29 UTC (permalink / raw)
  To: davem, netdev; +Cc: yoshfuji

If a natural number n exists where 2 + data_len <= 8n < 2 + data_len + pad,
post padding is not initialized correctly.

(Un)fortunately, the only type that requires padding is InfiniBand,
whose pad is 2 and data_len is 20, so this logic error has not
shown up in practice, but it is better to fix it.

Note that ndisc_opt_addr_space() handles the situation described
above correctly.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
---
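A worked example of the condition in the description, as a minimal sketch.
It assumes NDISC_OPT_SPACE() rounds len + 2 up to the next multiple of 8,
which is what the inequality above implies; OPT_SPACE and the two helper
functions are names made up for this illustration, and the data_len = 6
case is hypothetical (no current link type has those values).

```c
/* Assumed rounding: 2 bytes of type/length plus data, rounded up to 8. */
#define OPT_SPACE(len) (((len) + 2 + 7) & ~7)

/* Option size as computed before the fix: padding is ignored. */
static int space_before_fix(int data_len, int pad)
{
	(void)pad;
	return OPT_SPACE(data_len);
}

/* Option size as computed after the fix: padding is included. */
static int space_after_fix(int data_len, int pad)
{
	return OPT_SPACE(data_len + pad);
}
```

With the InfiniBand values (data_len = 20, pad = 2) both helpers return 24,
so the bug stays hidden. With data_len = 6 and pad = 2 the condition holds
for n = 1 (8 <= 8 < 10): the old computation yields 8 bytes while type,
length, pad and data need 10, and the fixed computation yields 16.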
 net/ipv6/ndisc.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c
index 92909d2..2ed42c8 100644
--- a/net/ipv6/ndisc.c
+++ b/net/ipv6/ndisc.c
@@ -151,8 +151,8 @@ static inline int ndisc_opt_addr_space(struct net_device *dev)
 static u8 *ndisc_fill_addr_option(u8 *opt, int type, void *data, int data_len,
 				  unsigned short addr_type)
 {
-	int space = NDISC_OPT_SPACE(data_len);
 	int pad   = ndisc_addr_option_pad(addr_type);
+	int space = NDISC_OPT_SPACE(data_len + pad);

 	opt[0] = type;
 	opt[1] = space>>3;
-- 
1.7.9.5

^ permalink raw reply related

* Re: netconsole fun
From: Peter Hurley @ 2012-12-13 14:49 UTC (permalink / raw)
  To: Neil Horman; +Cc: Cong Wang, netdev
In-Reply-To: <20121213123611.GA12269@hmsreliant.think-freely.org>

On Thu, 2012-12-13 at 07:36 -0500, Neil Horman wrote:
> On Wed, Dec 12, 2012 at 03:59:17PM -0500, Peter Hurley wrote:
> > On Tue, 2012-12-11 at 11:45 -0500, Neil Horman wrote:
> > > On Tue, Dec 11, 2012 at 10:16:51AM -0500, Peter Hurley wrote:
> > > > On Tue, 2012-12-11 at 09:30 -0500, Neil Horman wrote:
> > > > > On Tue, Dec 11, 2012 at 09:19:52AM -0500, Peter Hurley wrote:
> > > > > > On Tue, 2012-12-11 at 04:51 +0000, Cong Wang wrote:
> > > > > > > On Mon, 10 Dec 2012 at 14:17 GMT, Peter Hurley <peter@hurleysoftware.com> wrote:
> > > > > > > > Now that netpoll has been disabled for slaved devices, is there a
> > > > > > > > recommended method of running netconsole on a machine that has a slaved
> > > > > > > > device?
> > > > > > > >
> > > > > > > 
> > > > > > > Yes, running it on the master device instead.
> > > > > > 
> > > > > > Thanks for the suggestion, but:
> > > > > > 
> > > > > > [ 0.000000] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-3.7.0-rc8-xeon ...... netconsole=@192.168.10.99/br0,30000@192.168.10.100/xx:xx:xx:xx:xx:xx
> > > > > > ...
> > > > > > [ 5.289869] netpoll: netconsole: local port 6665
> > > > > > [ 5.289885] netpoll: netconsole: local IP 192.168.10.99
> > > > > > [ 5.289892] netpoll: netconsole: interface 'br0'
> > > > > > [ 5.289898] netpoll: netconsole: remote port 30000
> > > > > > [ 5.289907] netpoll: netconsole: remote IP 192.168.10.100
> > > > > > [ 5.289914] netpoll: netconsole: remote ethernet address xx:xx:xx:xx:xx:xx
> > > > > > [ 5.289922] netpoll: netconsole: br0 doesn't exist, aborting
> > > > > > [ 5.289929] netconsole: cleaning up
> > > > > > ...
> > > > > > [ 9.392291] Bridge firewalling registered
> > > > > > [ 9.396805] device eth1 entered promiscuous mode
> > > > > > [ 9.418350] eth1:  setting full-duplex.
> > > > > > [ 9.421268] br0: port 1(eth1) entered forwarding state
> > > > > > [ 9.423354] br0: port 1(eth1) entered forwarding state
> > > > > > 
> > > > > > 
> > > > > > Is there a way to control or associate network device names prior to
> > > > > > udev renaming?
> > > > > > 
> > > > > That looks like a systemd problem (or more specifically a boot dependency
> > > > > problem).  You need to modify your netconsole unit/service file to start after
> > > > > all your networking is up.  NetworkManager provides a dummy service file for
> > > > > this purpose, called networkmanager-wait-online.service
> > > > 
> > > > Ok. So with a single physical network interface that will be bridged,
> > > > netconsole cannot be used for kernel boot messages.
> > > > 
> > > > With a machine with multiple nics, is there a way to control device
> > > > naming so that the interface name to be used by netconsole specified on
> > > > the boot command line will actually correspond to the intended
> > > > device? For example,
> > > > 
> > > > [ 0.000000] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-3.7.0-rc8-xeon ...... netconsole=@192.168.1.123/eth0,30000@192.168.1.139/xx:xx:xx:xx:xx:xx
> > > > ....
> > > > [ 4.092184] 3c59x: Donald Becker and others.
> > > > [ 4.092204] 0000:07:05.0: 3Com PCI 3c905C Tornado at ffffc9000186cf80.
> > > > [ 4.094035] tg3.c:v3.125 (September 26, 2012)
> > > > ....
> > > > [ 4.125038] tg3 0000:08:00.0 eth1: Tigon3 [partno(BCM95754) rev b002] (PCI Express) MAC address xx:xx:xx:xx:xx:xx
> > > > [ 4.125055] tg3 0000:08:00.0 eth1: attached PHY is 5787 (10/100/1000Base-T Ethernet) (WireSpeed[1], EEE[0])
> > > > [ 4.125062] tg3 0000:08:00.0 eth1: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] TSOcap[1]
> > > > [ 4.125068] tg3 0000:08:00.0 eth1: dma_rwctrl[76180000] dma_mask[64-bit]
> > > > 
> > > > This is attaching netconsole to the wrong device because bus
> > > > enumeration, and therefore load order, is not consistent from boot to
> > > > boot.
> > > > 
> > > No, there's no way to do that.  As you note, device enumeration isn't consistent
> > > across boots; that's why udev creates rules to rename devices based on immutable
> > > (or semi-immutable) data, like MAC addresses or PCI bus locations.  Once that
> > > happens, you'll have consistent names for your interfaces, and that work will be
> > > guaranteed to be done after networkmanager has finished opening all the
> > > interfaces that it needs (hence my suggestion to make netconsole service
> > > dependent on networkmanager service startup completing).
> > 
> > Just wondering if you think something like the patch below is
> > suitable/acceptable for insulating netconsole from inconsistent device
> > name scenarios without changing the existing semantics. The basic idea
> > is to allow an ethernet MAC address in the <dev> field of the
> > netconsole= options, and if a MAC address was specified rather than a
> > device name, to do the dev lookup from the MAC address instead.
> > 
> > This doesn't extend to, but also doesn't interfere with, the dynamic
> > config of netconsole via configfs.
> > 
> > Would you mind reviewing it?
> > 
> > Regards,
> > Peter
> > 
> This looks like a pretty good idea to me.  That said, something occurred to me
> when you wrote your summary above.  Have you looked at the netconsole service
> scripts that most distros provide in their packaging?  I'm almost positive Red
> Hat/Fedora (and also likely SUSE and Ubuntu) already implement this functionality
> from user space.  Basically, instead of people just modprobing netconsole, they
> create a service script that parses a config file that contains all the
> options needed to load the netconsole module, and it has the intelligence to see
> if you specified a MAC address rather than a device.  If you did, it finds
> the device with the corresponding MAC address and uses that as the device.  I'm sorry, I
> don't know why I didn't think of that before.  Check that out though, that will
> likely give you exactly what you need.

Even with a udev rule to load netconsole that runs immediately after
device renaming (so before scripting), most of the dynamic module
loading has already happened so netconsole misses it. At least with the
patch, netconsole will load and attach to the proper interface much
earlier in the boot so that module-load-time messages will be caught.
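
To illustrate the idea of accepting a MAC address in the <dev> field, here is a minimal user-space sketch. It is not the patch discussed in this thread: parse_mac(), resolve_dev() and struct iface are hypothetical names, and the interface table stands in for whatever device lookup the kernel would actually do.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAC_LEN 6

/* Hypothetical stand-in for a registered network device. */
struct iface {
	const char *name;
	unsigned char mac[MAC_LEN];
};

/* Return the value of one hex digit, or -1 if c is not a hex digit. */
static int hex_val(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

/* Return 1 and fill mac[] if s is exactly "xx:xx:xx:xx:xx:xx", else 0. */
static int parse_mac(const char *s, unsigned char mac[MAC_LEN])
{
	int i;

	assert(s != NULL && mac != NULL);
	if (strlen(s) != 3 * MAC_LEN - 1)
		return 0;
	for (i = 0; i < MAC_LEN; i++) {
		int hi = hex_val(s[3 * i]);
		int lo = hex_val(s[3 * i + 1]);

		if (hi < 0 || lo < 0)
			return 0;
		if (i < MAC_LEN - 1 && s[3 * i + 2] != ':')
			return 0;
		mac[i] = (unsigned char)(hi * 16 + lo);
	}
	return 1;
}

/*
 * If field looks like a MAC address, return the name of the matching
 * interface (NULL if none matches); otherwise treat field as a device
 * name and return it unchanged, which keeps the existing semantics.
 */
static const char *resolve_dev(const char *field,
			       const struct iface *tbl, size_t n)
{
	unsigned char mac[MAC_LEN];
	size_t i;

	if (!parse_mac(field, mac))
		return field;
	for (i = 0; i < n; i++)
		if (memcmp(tbl[i].mac, mac, MAC_LEN) == 0)
			return tbl[i].name;
	return NULL;
}
```

A plain name such as "eth0" passes through untouched, so only a field in strict MAC form triggers the address lookup.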

There is an unforeseen consequence of the patch: it breaks device
renaming because the device will already be in use by netconsole. Which
is the whole problem with userspace device renaming to begin with...

I guess that leaves only the option of building in netconsole and the
driver that supplies the interface.

Oh well.

Regards,
Peter

^ permalink raw reply

* Re: GPF in skb_flow_dissect
From: Dave Jones @ 2012-12-13 15:05 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Jason Wang, David Miller, netdev
In-Reply-To: <1355376177.12271.244.camel@edumazet-glaptop>

On Wed, Dec 12, 2012 at 09:22:57PM -0800, Eric Dumazet wrote:

 > Yes, commit 7694a3acc55a7 added this bug
 > 
 > It's illegal to use skb after the call to netif_rx_ni(skb);
 > 
 > I would try following patch.

Looks like it does the right thing.  Thanks Eric.
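
The rule quoted above (do not touch a buffer after handing it to a consuming call) can be sketched in plain C. This is a toy model, not the patch from this thread: struct buf and hand_off() are hypothetical stand-ins for an skb and a consuming call such as netif_rx_ni().

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a packet buffer. */
struct buf {
	size_t len;
};

/* Stand-in for a consuming call: it owns b from here on and may free it. */
static void hand_off(struct buf *b)
{
	free(b);
}

/*
 * Copy anything needed for later accounting *before* the hand-off;
 * b must not be dereferenced once hand_off() has been called.
 */
static size_t deliver_and_account(size_t len)
{
	struct buf *b = malloc(sizeof(*b));
	size_t saved_len;

	assert(b != NULL);
	b->len = len;

	saved_len = b->len;	/* read while we still own b */
	hand_off(b);		/* ownership transferred */

	return saved_len;	/* uses the saved copy, not b->len */
}
```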

	Dave

^ permalink raw reply

* [PATCH net-next] net: ethool: Document struct ethtool_flow_ext
From: Yan Burman @ 2012-12-13 15:20 UTC (permalink / raw)
  To: Or Gerlitz, Amir Vadai, netdev, David S. Miller, Ben Hutchings; +Cc: Yan Burman

Add documentation for struct ethtool_flow_ext especially in regard
to what flags are needed for which fields.

Signed-off-by: Yan Burman <yanb@mellanox.com>
---
 include/uapi/linux/ethtool.h | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/ethtool.h b/include/uapi/linux/ethtool.h
index be8c41e..0c9b448 100644
--- a/include/uapi/linux/ethtool.h
+++ b/include/uapi/linux/ethtool.h
@@ -503,9 +503,20 @@ union ethtool_flow_union {
 	__u8					hdata[52];
 };
 
+/**
+ * struct ethtool_flow_ext - additional RX flow fields
+ * @h_dest: destination MAC address
+ * @vlan_etype: VLAN EtherType
+ * @vlan_tci: VLAN tag control information
+ * @data: user defined data
+ *
+ * Note, @vlan_etype, @vlan_tci, and @data are only valid if %FLOW_EXT
+ * is set in &struct ethtool_rx_flow_spec @flow_type.
+ * @h_dest is valid if %FLOW_MAC_EXT is set.
+ */
 struct ethtool_flow_ext {
 	__u8		padding[2];
-	unsigned char	h_dest[ETH_ALEN];	/* destination eth addr	*/
+	unsigned char	h_dest[ETH_ALEN];
 	__be16		vlan_etype;
 	__be16		vlan_tci;
 	__be32		data[2];
@@ -519,7 +530,8 @@ struct ethtool_flow_ext {
  * @m_u: Masks for flow field bits to be matched
  * @m_ext: Masks for additional field bits to be matched
  *	Note, all additional fields must be ignored unless @flow_type
- *	includes the %FLOW_EXT flag.
+ *	includes the %FLOW_EXT or %FLOW_MAC_EXT flag
+ *	(see &struct ethtool_flow_ext description).
  * @ring_cookie: RX ring/queue index to deliver to, or %RX_CLS_FLOW_DISC
  *	if packets should be discarded
  * @location: Location of rule in the table.  Locations must be
-- 
1.7.11.3
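
A minimal sketch of how a consumer might apply the validity rules documented above. The DEMO_* constants and helper names are hypothetical stand-ins defined locally so the example is self-contained; the numeric values are an assumption and should be taken from include/uapi/linux/ethtool.h in real code.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed flag values, for illustration only. */
#define DEMO_FLOW_EXT		0x80000000u
#define DEMO_FLOW_MAC_EXT	0x40000000u

/* vlan_etype, vlan_tci and data are only meaningful with FLOW_EXT. */
static int demo_vlan_and_data_valid(uint32_t flow_type)
{
	return (flow_type & DEMO_FLOW_EXT) != 0;
}

/* h_dest is only meaningful with FLOW_MAC_EXT. */
static int demo_h_dest_valid(uint32_t flow_type)
{
	return (flow_type & DEMO_FLOW_MAC_EXT) != 0;
}

/* The extension fields can be ignored entirely if neither flag is set. */
static int demo_has_ext(uint32_t flow_type)
{
	return (flow_type & (DEMO_FLOW_EXT | DEMO_FLOW_MAC_EXT)) != 0;
}
```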

^ permalink raw reply related

* Re: [RFC] net : add tx timestamp to packet mmap.
From: Paul Chavent @ 2012-12-13 16:13 UTC (permalink / raw)
  To: Richard Cochran; +Cc: davem, edumazet, daniel.borkmann, xemul, ebiederm, netdev
In-Reply-To: <20121213132916.GB10703@netboy.at.omicron.at>

Hello.

On 12/13/2012 02:29 PM, Richard Cochran wrote:
> On Wed, Dec 12, 2012 at 04:29:25PM +0100, Paul Chavent wrote:
>> This patch allows generating tx timestamps for packets sent by the packet mmap interface.
>>
>> Actually, you can't get tx timestamps with the sample code below.
>>
>> I wonder if my current implementation is good. And if not, how should I get the timestamps?
>
> In order for time stamps to appear, somebody has to call
> skb_tx_timestamp() ...
Yes. "Somebody" means "the hardware driver" after completing xmit. 
Is that right?

>
>> diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
>> index e639645..948748b 100644
>> --- a/net/packet/af_packet.c
>> +++ b/net/packet/af_packet.c
>> @@ -1857,6 +1857,10 @@ static int tpacket_fill_skb(struct packet_sock *po, struct sk_buff *skb,
>>   	void *data;
>>   	int err;
>>
>> +	err = sock_tx_timestamp(&po->sk, &skb_shinfo(skb)->tx_flags);
 >
 > and this call is only setting some flags.

Yes, it only sets some flags. I thought that those flags were required by 
skb_tx_timestamp() in order to do the appropriate timestamping 
(hardware, software, etc).

So, for tx timestamps to work, are both calls needed?

Why is sock_tx_timestamp() called in packet_fill_skb() and 
packet_sendmsg_spkt() but not in tpacket_fill_skb()?
And why can I retrieve timestamps when I add this call?
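
For reference, the two-step relationship being asked about can be modelled with a toy example. This is only a sketch of one reading of the exchange above (the socket-level call records the request as flags, the driver-level call acts on them); every demo_* name is hypothetical and none of this is kernel code.

```c
#include <assert.h>

#define DEMO_TX_SW_TSTAMP 0x1u	/* hypothetical "software stamp wanted" flag */

struct demo_sock {
	int want_sw_tstamp;	/* set by the application via a socket option */
};

struct demo_skb {
	unsigned int tx_flags;
	int stamped;
};

/* Step 1 (socket layer): copy the socket's request into the packet flags. */
static void demo_sock_tx_timestamp(const struct demo_sock *sk,
				   unsigned int *tx_flags)
{
	if (sk->want_sw_tstamp)
		*tx_flags |= DEMO_TX_SW_TSTAMP;
}

/* Step 2 (driver): stamp the packet only if the flag was set earlier. */
static void demo_skb_tx_timestamp(struct demo_skb *skb)
{
	if (skb->tx_flags & DEMO_TX_SW_TSTAMP)
		skb->stamped = 1;
}

/* Send one packet; set_flags selects whether step 1 is performed. */
static int demo_send(const struct demo_sock *sk, int set_flags)
{
	struct demo_skb skb = { 0, 0 };

	if (set_flags)
		demo_sock_tx_timestamp(sk, &skb.tx_flags);
	demo_skb_tx_timestamp(&skb);
	return skb.stamped;
}
```

In this model a packet is stamped only when both steps run and the socket asked for it, which matches the observation that adding the flag-setting call makes timestamps appear.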

>
> HTH,
> Richard
>
Thanks for your help.

Paul.

^ permalink raw reply

* Re: [patch net-next 0/4] net: allow to change carrier from userspace
From: Jiri Pirko @ 2012-12-13 16:17 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: netdev, davem, edumazet, bhutchings, mirqus, greearb, fbl
In-Reply-To: <20121212113433.21c05614@nehalam.linuxnetplumber.net>

Wed, Dec 12, 2012 at 08:34:33PM CET, shemminger@vyatta.com wrote:
>On Wed, 12 Dec 2012 20:06:13 +0100
>Jiri Pirko <jiri@resnulli.us> wrote:
>
>> Wed, Dec 12, 2012 at 07:54:48PM CET, shemminger@vyatta.com wrote:
>> >On Wed, 12 Dec 2012 19:49:26 +0100
>> >Jiri Pirko <jiri@resnulli.us> wrote:
>> >
>> >> Wed, Dec 12, 2012 at 07:36:32PM CET, shemminger@vyatta.com wrote:
>> >> >On Wed, 12 Dec 2012 19:25:56 +0100
>> >> >Jiri Pirko <jiri@resnulli.us> wrote:
>> >> >
>> >> >> Wed, Dec 12, 2012 at 07:12:08PM CET, shemminger@vyatta.com wrote:
>> >> >> >On Wed, 12 Dec 2012 19:10:17 +0100
>> >> >> >Jiri Pirko <jiri@resnulli.us> wrote:
>> >> >> >
>> >> >> >> ># ip li show dev dummy0
>> >> >> >> >12: dummy0: <NO-CARRIER,BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state DORMANT mode DORMANT   
>> >> >> >> 
>> >> >> >> if you mean this "NO-CARRIER"
>> >> >> >> it has no direct relation with netif_carrier_ok().
>> >> >> >
>> >> >> >It is the same value (IFF_RUNNING) that is visible from user space.
>> >> >> 
>> >> >> static inline bool netif_carrier_ok(const struct net_device *dev)
>> >> >> {
>> >> >> 	        return !test_bit(__LINK_STATE_NOCARRIER, &dev->state);
>> >> >> }
>> >> >> 
>> >> >> So netif_carrier_[ok/on/off] work on the __LINK_STATE_NOCARRIER
>> >> >> bit, not on the IFF_RUNNING flag.
>> >> >
>> >> >What is the code path that you are worried about netif_carrier_ok being set or clear?
>> >> >The interaction here is complex, and right now LINK_STATE_NOCARRIER is purely
>> >> >controlled by the driver, your patch changes that, but before acking I want
>> >> >to make sure why it is required.
>> >> 
>> >> This patchset would provide a possibility to set or clear the carrier
>> >> from userspace. For dummy device it would serve for direct emulation
>> >> of link fail.
>> >> 
>> >> Also for team driver, that would serve for teamd (userspace part) to
>> >> set the carrier actually on or off (in case of LACP runner for example
>> >> this is required).
>> >> 
>> >
>> >You want to be able to control the dummy device, so that you can test carrier
>> >management in the team device. Another alternative is to use carrier control
>> >on a virtual device. Vmware can do it, there were patches to do this with KVM/QEMU
>> >not sure if they ever got incorporated.
>> >
>> >Since this is a specific feature of the dummy device which is specialized for
>> >testing, maybe it should just be done by adding device specific ioctl rather
>> >than letting it creep in as a general facility.
>> 
>> Ugh, specific ioctl stinks...
>> But this is not only for dummy. As I said, we need this for team driver.
>> Maybe I did not explain that correctly. Given the fact that the whole
>> Team logic is in userspace, teamd (userspace daemon) needs to set the
>> carrier state as if it was done in kernel. Yes, we would be able to do
>> this by specific Team option in team driver, but I thought this would be
>> nicer to do that more generally.
>
>That is what the operstate mechanism was for. Why did we build that mechanism
>if it doesn't work from userspace.
>
>Maybe the fix is to make setting linkstate also set carrier bits.

Hmm. You mean to call netif_carrier_on/off as a reaction to operstate
change? How exactly would you like to do that?

Thanks

Jiri

^ permalink raw reply

* Re: [PATCH net-next] net: ethool: Document struct ethtool_flow_ext
From: Ben Hutchings @ 2012-12-13 16:27 UTC (permalink / raw)
  To: Yan Burman, David S. Miller; +Cc: Or Gerlitz, Amir Vadai, netdev
In-Reply-To: <1355412059-25663-1-git-send-email-yanb@mellanox.com>

On Thu, 2012-12-13 at 17:20 +0200, Yan Burman wrote:
> Add documentation for struct ethtool_flow_ext especially in regard
> to what flags are needed for which fields.
> 
> Signed-off-by: Yan Burman <yanb@mellanox.com>
Reviewed-by: Ben Hutchings <bhutchings@solarflare.com>

David, please apply this to 'net' so that FLOW_MAC_EXT is properly
documented in 3.8.

Ben.

> ---
>  include/uapi/linux/ethtool.h | 16 ++++++++++++++--
>  1 file changed, 14 insertions(+), 2 deletions(-)
> 
> diff --git a/include/uapi/linux/ethtool.h b/include/uapi/linux/ethtool.h
> index be8c41e..0c9b448 100644
> --- a/include/uapi/linux/ethtool.h
> +++ b/include/uapi/linux/ethtool.h
> @@ -503,9 +503,20 @@ union ethtool_flow_union {
>  	__u8					hdata[52];
>  };
>  
> +/**
> + * struct ethtool_flow_ext - additional RX flow fields
> + * @h_dest: destination MAC address
> + * @vlan_etype: VLAN EtherType
> + * @vlan_tci: VLAN tag control information
> + * @data: user defined data
> + *
> + * Note, @vlan_etype, @vlan_tci, and @data are only valid if %FLOW_EXT
> + * is set in &struct ethtool_rx_flow_spec @flow_type.
> + * @h_dest is valid if %FLOW_MAC_EXT is set.
> + */
>  struct ethtool_flow_ext {
>  	__u8		padding[2];
> -	unsigned char	h_dest[ETH_ALEN];	/* destination eth addr	*/
> +	unsigned char	h_dest[ETH_ALEN];
>  	__be16		vlan_etype;
>  	__be16		vlan_tci;
>  	__be32		data[2];
> @@ -519,7 +530,8 @@ struct ethtool_flow_ext {
>   * @m_u: Masks for flow field bits to be matched
>   * @m_ext: Masks for additional field bits to be matched
>   *	Note, all additional fields must be ignored unless @flow_type
> - *	includes the %FLOW_EXT flag.
> + *	includes the %FLOW_EXT or %FLOW_MAC_EXT flag
> + *	(see &struct ethtool_flow_ext description).
>   * @ring_cookie: RX ring/queue index to deliver to, or %RX_CLS_FLOW_DISC
>   *	if packets should be discarded
>   * @location: Location of rule in the table.  Locations must be

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply

* [PATCH] bridge: fix icmpv6 endian bug and other sparse warnings
From: Stephen Hemminger @ 2012-12-13 16:51 UTC (permalink / raw)
  To: kbuild test robot; +Cc: Cong Wang, netdev
In-Reply-To: <50c926ab.AI4jRk2MW3J/QuQj%fengguang.wu@intel.com>

Fix the warnings reported by sparse on recent bridge multicast
changes. These are mostly just RCU annotation issues, but in this case
sparse found a real bug: the ICMPv6 MLDv2 query MRC
value is in network byte order.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>


--- a/net/bridge/br_multicast.c	2012-12-12 10:40:11.939838344 -0800
+++ b/net/bridge/br_multicast.c	2012-12-13 08:47:35.103982170 -0800
@@ -622,7 +622,7 @@ out:
 struct net_bridge_port_group *br_multicast_new_port_group(
 			struct net_bridge_port *port,
 			struct br_ip *group,
-			struct net_bridge_port_group *next)
+			struct net_bridge_port_group __rcu *next)
 {
 	struct net_bridge_port_group *p;
 
@@ -632,7 +632,7 @@ struct net_bridge_port_group *br_multica
 
 	p->addr = *group;
 	p->port = port;
-	p->next = next;
+	rcu_assign_pointer(p->next, next);
 	hlist_add_head(&p->mglist, &port->mglist);
 	setup_timer(&p->timer, br_multicast_port_group_expired,
 		    (unsigned long)p);
@@ -1138,7 +1138,7 @@ static int br_ip6_multicast_query(struct
 				  struct sk_buff *skb)
 {
 	const struct ipv6hdr *ip6h = ipv6_hdr(skb);
-	struct mld_msg *mld = (struct mld_msg *) icmp6_hdr(skb);
+	struct mld_msg *mld;
 	struct net_bridge_mdb_entry *mp;
 	struct mld2_query *mld2q;
 	struct net_bridge_port_group *p;
@@ -1165,6 +1165,7 @@ static int br_ip6_multicast_query(struct
 		if (max_delay)
 			group = &mld->mld_mca;
 	} else if (skb->len >= sizeof(*mld2q)) {
+		u16 mrc;
 		if (!pskb_may_pull(skb, sizeof(*mld2q))) {
 			err = -EINVAL;
 			goto out;
@@ -1172,7 +1173,8 @@ static int br_ip6_multicast_query(struct
 		mld2q = (struct mld2_query *)icmp6_hdr(skb);
 		if (!mld2q->mld2q_nsrcs)
 			group = &mld2q->mld2q_mca;
-		max_delay = mld2q->mld2q_mrc ? MLDV2_MRC(mld2q->mld2q_mrc) : 1;
+		mrc = ntohs(mld2q->mld2q_mrc);
+		max_delay = mrc ? MLDV2_MRC(mrc) : 1;
 	}
 
 	if (!group)
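
The byte-order point can be checked in user space with a small sketch. It assumes the Maximum Response Code encoding from RFC 3810 (values below 0x8000 are literal; otherwise a 3-bit exponent and 12-bit mantissa are expanded); mrc_to_delay() and delay_from_wire() are hypothetical helpers, not the kernel's MLDV2_MRC() macro.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Decode a host-order Maximum Response Code into a delay (RFC 3810). */
static uint32_t mrc_to_delay(uint16_t mrc)
{
	uint32_t exp, mant;

	if (mrc < 0x8000)
		return mrc;
	exp = (mrc >> 12) & 0x7;
	mant = mrc & 0x0fff;
	return (mant | 0x1000) << (exp + 3);
}

/* Read the 16-bit field as it appears on the wire and decode it. */
static uint32_t delay_from_wire(const unsigned char wire[2])
{
	uint16_t be;

	assert(wire != NULL);
	memcpy(&be, wire, sizeof(be));
	/* Convert to host order first; decoding the raw value would be
	 * wrong on little-endian machines. */
	return mrc_to_delay(ntohs(be));
}
```

On a little-endian host, skipping the ntohs() step would turn the wire bytes 03 e8 into 0xe803, which has the top bit set and would be decoded through the exponent path instead of yielding 1000.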

^ permalink raw reply

* Re: Network namespace bugs in L2TP
From: Tom Parkin @ 2012-12-13 16:56 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: netdev
In-Reply-To: <87k3snnjh7.fsf@xmission.com>

[-- Attachment #1: Type: text/plain, Size: 3784 bytes --]

On Wed, Dec 12, 2012 at 11:44:36AM -0800, Eric W. Biederman wrote:
> Tom Parkin <tparkin@katalix.com> writes:
> >  1. Why do we need to change the namespace of the socket created in
> >     l2tp_tunnel_sock_create?  So far as I can tell, sock_create
> >     defaults to the namespace of the calling process.  Is the issue
> >     here that this code may run from a work queue or similar?
> 
> Something similar.  At the very least l2tp_tunnel_create which calls
> l2tp_tunnel_sock_create gets called from netlink.  The network namespace
> of a socket is not necessarily the same as the network namespace of the
> process that uses that socket.
> 
> So since current is not necessarily the right network namespace we need
> push the desired network namespace of the socket down into
> l2tp_tunnel_sock_create and use that when creating the socket.

Ah, I see.  I hadn't appreciated that a process might swap between
namespaces.

I think that raises a question in the case of the L2TP tunnel sockets,
though.  Currently l2tp_tunnel_sock_create uses the namespace of the
current process for the socket.  The alternative is to pass in the
desired namespace from l2tp_tunnel_create -- and this makes sense, I
think.

However, when l2tp_tunnel_create is called from the netlink code, the
namespace passed is that of the netlink socket.  At the risk of sounding
silly, what's the benefit of using the netlink socket namespace over the
process namespace in this case?

> >  2. You mentioned the need to keep track of sockets allocated within a
> >     namespace in order to be able to clean them up when the namespace
> >     is deleted.  Should we be keeping a list of sockets we create and
> >     then destroying them in the namespace pernet_ops exit function?
> 
> I think the issue that I was referring to and certainly the issue I am
> thinking about is the issue where normal sockets hold a reference to a
> network namespace and keep the network namespace alive.  Today l2tp uses
> sock_create when creating a socket, and as such I think it pins it
> current network namespace.  So I believe we can effectively have a
> reference counting loop with l2tp sockets pinning the network namespace
> and the network namespace keeping the l2tp device alive which keeps the
> l2tp socket alive.

OK, so presumably the way this would usually work is that a process
creates sockets, and when the process exits those sockets go away.
When all the processes in the namespace have exited, the namespace
can close because there are no sockets holding it open.  Is that
right?

If that's correct, then I suppose the issue with the L2TP tunnel socket
for an unmanaged tunnel is that it isn't owned by a process, per se.
So there's no obvious way to get rid of it, apart from sending a
netlink message to tell the kernel to tear it down.

But that doesn't seem too unreasonable.  A user would have to take
explicit action to create an L2TP tunnel socket, and it might seem
reasonable for that socket to keep the namespace alive until the user
explicitly tears it down again.

> I don't remember the specifics of l2tp as it creates some sockets, and
> has other sockets passed in, and as such has rules that are not at all
> normal.

Ack.  Sockets are created in the kernel code for "unmanaged" tunnels,
which don't run the control protocol over the top -- they're just for
data encapsulation/de-encapsulation.  "Managed" tunnels have a
userspace process looking after all the L2TP configuration and
control/keepalive protocol, and in this case the daemon handles the
creation of the tunnel socket.

Thanks,
Tom
-- 
Tom Parkin
Katalix Systems Ltd
http://www.katalix.com
Catalysts for your Embedded Linux software development

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 490 bytes --]

^ permalink raw reply

* Re: [patch net-next 0/4] net: allow to change carrier from userspace
From: John Fastabend @ 2012-12-13 17:15 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Stephen Hemminger, netdev, davem, edumazet, bhutchings, mirqus,
	greearb, fbl
In-Reply-To: <20121213161750.GA1914@minipsycho.orion>

[...]

>> That is what the operstate mechanism was for. Why did we build that mechanism
>> if it doesn't work from userspace.
>>
>> Maybe the fix is to make setting linkstate also set carrier bits.
>
> Hmm. You mean to call netif_carrier_on/off as a reaction to operstate
> change? How exactly would you like to do that?
>
> Thanks
>
> Jiri

This would break existing applications and would not really be in the
spirit of the operstate mechanism as I read the documentation:

./Documentation/networking/operstates.txt
  63 IF_OPER_DORMANT (5):
  64  Interface is L1 up, but waiting for an external event, f.e. for a
  65  protocol to establish. (802.1X)

The L1 up is netif_carrier_on here.

We use this in user space when we do not want applications to start
using the link until we have negotiated and configured some link layer
attributes. To do this we set IFLA_LINKMODE and then use the
IF_OPER_DORMANT event to trigger the application eventually setting
IF_OPER_UP when the link layer negotiation is complete. If you take the
carrier down this breaks. In my case the protocol is LLDP but I think
there are other examples. This is basically the example Stephen already
gave.

I guess I am still missing why teamd doesn't just set IFLA_LINKMODE
and then manage the operstate this way. Sure, teamd would have to become a
bit smarter, but it would save adding additional interfaces to the
kernel. In the LinkAgg case you could just pin the operstate down; this
would also allow protocols to run under the linkagg over the LLC, in
IEEE speak, which I think is being discussed in the latest round of
LLDP/LinkAgg spec updates (I'll check on that later today).
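
The dormant-mode flow described above can be summarised as a toy state function. This is a sketch of my reading of operstates.txt, not kernel code: the DEMO_* names are hypothetical, and the numeric values simply mirror the RFC 2863 operstate numbering quoted in the documentation excerpt.

```c
#include <assert.h>

enum demo_operstate {
	DEMO_OPER_DOWN = 2,
	DEMO_OPER_DORMANT = 5,	/* L1 up, waiting for an external event */
	DEMO_OPER_UP = 6
};

enum demo_link_mode {
	DEMO_LINK_MODE_DEFAULT = 0,
	DEMO_LINK_MODE_DORMANT = 1	/* userspace decides when the link is UP */
};

/*
 * carrier_ok:   what the driver reports (L1 state)
 * link_mode:    policy requested by userspace
 * user_set_up:  nonzero once the daemon has finished its negotiation
 */
static enum demo_operstate demo_operstate(int carrier_ok,
					  enum demo_link_mode link_mode,
					  int user_set_up)
{
	if (!carrier_ok)
		return DEMO_OPER_DOWN;
	if (link_mode == DEMO_LINK_MODE_DORMANT && !user_set_up)
		return DEMO_OPER_DORMANT;
	return DEMO_OPER_UP;
}
```

The point of the model is that DORMANT is only reachable while the carrier is on, so taking the carrier down as a side effect of an operstate change would make that state unreachable.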

Thanks,
John

-- 
John Fastabend         Intel Corporation

^ permalink raw reply

* Re: [tcpdump-workers] vlan tagged packets and libpcap breakage
From: Ani Sinha @ 2012-12-13 17:34 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Michael Richardson, netdev, tcpdump-workers, Francesco Ruggeri
In-Reply-To: <50C9936B.2000201@redhat.com>

On Thu, Dec 13, 2012 at 12:35 AM, Daniel Borkmann <dborkman@redhat.com> wrote:
> On 12/12/2012 10:53 PM, Ani Sinha wrote:
>>>
>>> unsigned int netdev_8021q_inskb = 1;
>>>
>>> ...
>>>         {
>>>                 .ctl_name       = NET_CORE_8021q_INSKB,
>>>                 .procname       = "netdev_8021q_inskb",
>>>                 .data           = &netdev_8021q_inskb,
>>>                 .maxlen         = sizeof(int),
>>>                 .mode           = 0444,
>>>                 .proc_handler   = proc_dointvec
>>>         },
>>>
>>> would seem to do it to me.
>>> Then pcap can fopen("/proc/sys/net/core/netdev_8021q_inskb") and if it
>>> finds it, and it is >0, then do the cmsg thing.
>>>
>>
>
> I think it doesn't. Because then you are obviously considering adding one
> procfs file into /proc/sys/net/core/ *for each* feature that is added into
> the ancillary ops which cannot be the right way ...

We had already brought up this topic previously in the same thread. A
suggestion was made to add that proc entry and no one from netdev
responded to say that it did not make sense. Therefore,
before I went ahead and made the fixes in libpcap, I wanted to run
this by you guys again to make sure we are still on the same page.

I do agree that instead of a /proc entry, we should check for a kernel
version >= X, where X is the upstream version that first started
supporting all the features needed by libpcap for vlan filtering. This
is not a compile-time check but a run-time one. Does anyone see any
issues with this? Are there any long-term implications, for example if
you backport patches to an older long-term-supported kernel? Are there
other, better ways to do this, such as returning feature bits from
an ioctl call? This is something we need to deal with on a continuous
basis as we keep supporting newer AUX fields and libpcap and other
userland code needs to make use of them. At the same time, they need to
handle backward compatibility issues with older kernels.
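
A minimal sketch of what such a run-time check could look like, assuming the release string from uname(2) starts with "major.minor". The helper names are hypothetical, and the threshold is left as a parameter because X is not settled in this thread.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/utsname.h>

/* Return 1 if a release string such as "3.7.0-rc8-xeon" is at least
 * want_major.want_minor, 0 otherwise (including unparsable input). */
static int release_at_least(const char *release, int want_major,
			    int want_minor)
{
	int major = 0, minor = 0;

	assert(release != NULL);
	if (sscanf(release, "%d.%d", &major, &minor) != 2)
		return 0;
	return major > want_major ||
	       (major == want_major && minor >= want_minor);
}

/* Apply the same test to the running kernel. */
static int running_kernel_at_least(int major, int minor)
{
	struct utsname u;

	if (uname(&u) != 0)
		return 0;
	return release_at_least(u.release, major, minor);
}
```

As noted above, a version comparison like this cannot see backported features, so a distro kernel with an older version number but the needed patches would be treated as unsupported.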

Thanks

^ permalink raw reply

* Re: [RFC PATCH net-next 0/5] Ease netns management for userland
From: Nicolas Dichtel @ 2012-12-13 17:41 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: netdev, davem, aatteka
In-Reply-To: <87sj7beyc1.fsf@xmission.com>

On 12/12/2012 22:48, Eric W. Biederman wrote:
> ebiederm@xmission.com (Eric W. Biederman) writes:
>
>> It is very wrong to presume that without context you know the reason for
> >> the existence of any network namespace and that you should or even that
>> you can manage it.  Think of running your multi-network namespace
>> managing application in a container.
>
> A good example of a network namespace you don't want to mess with are
> the network namespaces created by vsftp and chrome for security purposes
> to remove any possibility of creating new connections to the network.
>
Ok, I get the point.

A last question: from an administration point of view, is it intentional
that there is no way to monitor which netns are currently in use, as can be
done for sockets, files, ...?

^ permalink raw reply
