Netdev List
 help / color / mirror / Atom feed
* [PATCH 3/8] can: m_can: tag current CAN FD controllers as non-ISO
From: Marc Kleine-Budde @ 2015-01-15 16:11 UTC (permalink / raw)
  To: netdev
  Cc: davem, linux-can, kernel, Oliver Hartkopp, linux-stable,
	Marc Kleine-Budde
In-Reply-To: <1421338283-12417-1-git-send-email-mkl@pengutronix.de>

From: Oliver Hartkopp <socketcan@hartkopp.net>

During the CAN FD standardization process within the ISO it turned out that
the failure detection capability has to be improved.

The CAN in Automation organization (CiA) defined the already implemented CAN
FD controllers as 'non-ISO' and the upcoming improved CAN FD controllers as
'ISO' compliant. See at http://www.can-cia.com/index.php?id=1937

Finally there will be three types of CAN FD controllers in the future:

1. ISO compliant (fixed)
2. non-ISO compliant (fixed, like the M_CAN IP v3.0.1 in m_can.c)
3. ISO/non-ISO CAN FD controllers (switchable, like the PEAK USB FD)

So the current M_CAN driver for the M_CAN IP v3.0.1 has to expose its non-ISO
implementation by setting the CAN_CTRLMODE_FD_NON_ISO ctrlmode at startup.
As this bit cannot be switched at configuration time CAN_CTRLMODE_FD_NON_ISO
must not be set in ctrlmode_supported of the current M_CAN driver.

Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Cc: linux-stable <stable@vger.kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
---
 drivers/net/can/m_can/m_can.c    | 5 +++++
 include/uapi/linux/can/netlink.h | 1 +
 2 files changed, 6 insertions(+)

diff --git a/drivers/net/can/m_can/m_can.c b/drivers/net/can/m_can/m_can.c
index d7bc462aafdc..244529881be9 100644
--- a/drivers/net/can/m_can/m_can.c
+++ b/drivers/net/can/m_can/m_can.c
@@ -955,6 +955,11 @@ static struct net_device *alloc_m_can_dev(void)
 	priv->can.data_bittiming_const = &m_can_data_bittiming_const;
 	priv->can.do_set_mode = m_can_set_mode;
 	priv->can.do_get_berr_counter = m_can_get_berr_counter;
+
+	/* CAN_CTRLMODE_FD_NON_ISO is fixed with M_CAN IP v3.0.1 */
+	priv->can.ctrlmode = CAN_CTRLMODE_FD_NON_ISO;
+
+	/* CAN_CTRLMODE_FD_NON_ISO can not be changed with M_CAN IP v3.0.1 */
 	priv->can.ctrlmode_supported = CAN_CTRLMODE_LOOPBACK |
 					CAN_CTRLMODE_LISTENONLY |
 					CAN_CTRLMODE_BERR_REPORTING |
diff --git a/include/uapi/linux/can/netlink.h b/include/uapi/linux/can/netlink.h
index 3e4323a3918d..94ffe0c83ce7 100644
--- a/include/uapi/linux/can/netlink.h
+++ b/include/uapi/linux/can/netlink.h
@@ -98,6 +98,7 @@ struct can_ctrlmode {
 #define CAN_CTRLMODE_BERR_REPORTING	0x10	/* Bus-error reporting */
 #define CAN_CTRLMODE_FD			0x20	/* CAN FD mode */
 #define CAN_CTRLMODE_PRESUME_ACK	0x40	/* Ignore missing CAN ACKs */
+#define CAN_CTRLMODE_FD_NON_ISO		0x80	/* CAN FD in non-ISO mode */
 
 /*
  * CAN device statistics
-- 
2.1.4


^ permalink raw reply related

* Re: Re: Re: [bisected regression] e1000e: "Detected Hardware Unit Hang"
From: Eric Dumazet @ 2015-01-15 16:00 UTC (permalink / raw)
  To: Thomas Jarosch
  Cc: 'Linux Netdev List', Eric Dumazet, Jeff Kirsher,
	e1000-devel
In-Reply-To: <3089325.gjrPpo2XX1@storm>

On Thu, 2015-01-15 at 16:48 +0100, Thomas Jarosch wrote:
> On Thursday, 15. January 2015 07:25:32 Eric Dumazet wrote:
> > On Thu, 2015-01-15 at 15:58 +0100, Thomas Jarosch wrote:
> > > A colleague mentioned to me he saw the "Hardware Unit Hang" message
> > > every
> > > few days even running on kernel 3.4 (without your patch). Basically I'm
> > > testing now if that's still the case with 3.19-rc4+ or not.
> > > 
> > > I'm all for fixing the root cause. I'm just interested if the e1000e
> > > hang can even be triggered when using a max frag page size of 4096.
> > > So far it transferred 751.6 GiB without a hiccup.
> > 
> > You told it was forwarding setup.
> > 
> > 1) What is the NIC receiving traffic.
> > 2) What happens if you disable GRO on it ?
> 
> The setup is like this:
> 
> Win7 notebook (client)
>     -> "private LAN" eth0 (e1000e)
>         -> "external traffic" eth1 (r8169)
> 
>             -> local HTTP server in the intranet
>                (2x e1000e using bonding)
> 
> 
> Disabling gro on eth1 (r8169) seems to make eth0 (e1000e) stable.
> As it usually hangs within seconds, it already transferred 28 GiB right now.
> 
> When I switch gro back on, it takes around three seconds until the hang.
> 
> Does that point into the right / any direction?

Sure. 

Please apply this patch, and try to lower
/proc/sys/net/core/gro_max_frags and see if this makes a difference
(leaving GRO enabled)

(start with 7 and increase it, limit being 17)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 642d426a668f8ac94daf334c00117f96789f3990..817aee05a1b0623e5752beb0952a6fe6d66e583f 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3400,6 +3400,7 @@ extern int		netdev_max_backlog;
 extern int		netdev_tstamp_prequeue;
 extern int		weight_p;
 extern int		bpf_jit_enable;
+extern int		sysctl_gro_max_frags;
 
 bool netdev_has_upper_dev(struct net_device *dev, struct net_device *upper_dev);
 struct net_device *netdev_upper_get_next_dev_rcu(struct net_device *dev,
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 56db472e9b864e805e0ab36dd73a0404d2fc66d5..c2c2e7e53014617c5da574f2eb8a2889ed743719 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -3197,6 +3197,8 @@ err:
 }
 EXPORT_SYMBOL_GPL(skb_segment);
 
+int sysctl_gro_max_frags = MAX_SKB_FRAGS;
+
 int skb_gro_receive(struct sk_buff **head, struct sk_buff *skb)
 {
 	struct skb_shared_info *pinfo, *skbinfo = skb_shinfo(skb);
@@ -3219,8 +3221,8 @@ int skb_gro_receive(struct sk_buff **head, struct sk_buff *skb)
 		int i = skbinfo->nr_frags;
 		int nr_frags = pinfo->nr_frags + i;
 
-		if (nr_frags > MAX_SKB_FRAGS)
-			goto merge;
+		if (nr_frags > sysctl_gro_max_frags)
+			return -E2BIG;
 
 		offset -= headlen;
 		pinfo->nr_frags = nr_frags;
@@ -3252,8 +3254,8 @@ int skb_gro_receive(struct sk_buff **head, struct sk_buff *skb)
 		unsigned int first_size = headlen - offset;
 		unsigned int first_offset;
 
-		if (nr_frags + 1 + skbinfo->nr_frags > MAX_SKB_FRAGS)
-			goto merge;
+		if (nr_frags + 1 + skbinfo->nr_frags > sysctl_gro_max_frags)
+			return -E2BIG;
 
 		first_offset = skb->data -
 			       (unsigned char *)page_address(page) +
diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c
index 31baba2a71ce15e49450f69dae81e7d3be1ff3f2..de73d51381bf8acd0aedeb859ed961468441014a 100644
--- a/net/core/sysctl_net_core.c
+++ b/net/core/sysctl_net_core.c
@@ -278,6 +278,13 @@ static struct ctl_table net_core_table[] = {
 		.proc_handler	= proc_dointvec
 	},
 	{
+		.procname	= "gro_max_frags",
+		.data		= &sysctl_gro_max_frags,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec
+	},
+	{
 		.procname	= "netdev_rss_key",
 		.data		= &netdev_rss_key,
 		.maxlen		= sizeof(int),

^ permalink raw reply related

* Re: Re: Re: [bisected regression] e1000e: "Detected Hardware Unit Hang"
From: Thomas Jarosch @ 2015-01-15 15:48 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: 'Linux Netdev List', Eric Dumazet, Jeff Kirsher,
	e1000-devel
In-Reply-To: <1421335532.11734.73.camel@edumazet-glaptop2.roam.corp.google.com>

On Thursday, 15. January 2015 07:25:32 Eric Dumazet wrote:
> On Thu, 2015-01-15 at 15:58 +0100, Thomas Jarosch wrote:
> > A colleague mentioned to me he saw the "Hardware Unit Hang" message
> > every
> > few days even running on kernel 3.4 (without your patch). Basically I'm
> > testing now if that's still the case with 3.19-rc4+ or not.
> > 
> > I'm all for fixing the root cause. I'm just interested if the e1000e
> > hang can even be triggered when using a max frag page size of 4096.
> > So far it transferred 751.6 GiB without a hiccup.
> 
> You told it was forwarding setup.
> 
> 1) What is the NIC receiving traffic.
> 2) What happens if you disable GRO on it ?

The setup is like this:

Win7 notebook (client)
    -> "private LAN" eth0 (e1000e)
        -> "external traffic" eth1 (r8169)

            -> local HTTP server in the intranet
               (2x e1000e using bonding)


Disabling gro on eth1 (r8169) seems to make eth0 (e1000e) stable.
As it usually hangs within seconds, it already transferred 28 GiB right now.

When I switch gro back on, it takes around three seconds until the hang.

Does that point into the right / any direction?

Thomas

^ permalink raw reply

* [PATCH net] net: sctp: fix race for one-to-many sockets in sendmsg's auto associate
From: Daniel Borkmann @ 2015-01-15 15:34 UTC (permalink / raw)
  To: davem; +Cc: vyasevich, netdev, linux-sctp

I.e. one-to-many sockets in SCTP are not required to explicitly
call into connect(2) or sctp_connectx(2) prior to data exchange.
Instead, they can directly invoke sendmsg(2) and the SCTP stack
will automatically trigger connection establishment through 4WHS
via sctp_primitive_ASSOCIATE(). However, this in its current
implementation is racy: INIT is being sent out immediately (as
it cannot be bundled anyway) and the rest of the DATA chunks are
queued up for later xmit when connection is established, meaning
sendmsg(2) will return successfully. This behaviour can result
in an undesired side-effect that the kernel made the application
think the data has already been transmitted, although none of it
has actually left the machine, worst case even after close(2)'ing
the socket.

Instead, when the association from client side has been shut down
e.g. first gracefully through SCTP_EOF and then close(2), the
client could afterwards still receive the server's INIT_ACK due
to a connection with higher latency. This INIT_ACK is then considered
out of the blue and hence responded with ABORT as there was no
alive assoc found anymore. This can be easily reproduced f.e.
with sctp_test application from lksctp. One way to fix this race
is to wait for the handshake to actually complete.

The fix defers waiting after sctp_primitive_ASSOCIATE() and
sctp_primitive_SEND() succeeded, so that DATA chunks cooked up
from sctp_sendmsg() have already been placed into the output
queue through the side-effect interpreter, and therefore can then
be bundeled together with COOKIE_ECHO control chunks.

strace from example application (shortened):

socket(PF_INET, SOCK_SEQPACKET, IPPROTO_SCTP) = 3
sendmsg(3, {msg_name(28)={sa_family=AF_INET, sin_port=htons(8888), sin_addr=inet_addr("192.168.1.115")},
           msg_iov(1)=[{"hello", 5}], msg_controllen=0, msg_flags=0}, 0) = 5
sendmsg(3, {msg_name(28)={sa_family=AF_INET, sin_port=htons(8888), sin_addr=inet_addr("192.168.1.115")},
           msg_iov(1)=[{"hello", 5}], msg_controllen=0, msg_flags=0}, 0) = 5
sendmsg(3, {msg_name(28)={sa_family=AF_INET, sin_port=htons(8888), sin_addr=inet_addr("192.168.1.115")},
           msg_iov(1)=[{"hello", 5}], msg_controllen=0, msg_flags=0}, 0) = 5
sendmsg(3, {msg_name(28)={sa_family=AF_INET, sin_port=htons(8888), sin_addr=inet_addr("192.168.1.115")},
           msg_iov(1)=[{"hello", 5}], msg_controllen=0, msg_flags=0}, 0) = 5
sendmsg(3, {msg_name(28)={sa_family=AF_INET, sin_port=htons(8888), sin_addr=inet_addr("192.168.1.115")},
           msg_iov(0)=[], msg_controllen=48, {cmsg_len=48, cmsg_level=0x84 /* SOL_??? */, cmsg_type=, ...},
           msg_flags=0}, 0) = 0 // graceful shutdown for SOCK_SEQPACKET via SCTP_EOF
close(3) = 0

tcpdump before patch (fooling the application):

22:33:36.306142 IP 192.168.1.114.41462 > 192.168.1.115.8888: sctp (1) [INIT] [init tag: 3879023686] [rwnd: 106496] [OS: 10] [MIS: 65535] [init TSN: 3139201684]
22:33:36.316619 IP 192.168.1.115.8888 > 192.168.1.114.41462: sctp (1) [INIT ACK] [init tag: 3345394793] [rwnd: 106496] [OS: 10] [MIS: 10] [init TSN: 3380109591]
22:33:36.317600 IP 192.168.1.114.41462 > 192.168.1.115.8888: sctp (1) [ABORT]

tcpdump after patch:

14:28:58.884116 IP 192.168.1.114.35846 > 192.168.1.115.8888: sctp (1) [INIT] [init tag: 438593213] [rwnd: 106496] [OS: 10] [MIS: 65535] [init TSN: 3092969729]
14:28:58.888414 IP 192.168.1.115.8888 > 192.168.1.114.35846: sctp (1) [INIT ACK] [init tag: 381429855] [rwnd: 106496] [OS: 10] [MIS: 10] [init TSN: 2141904492]
14:28:58.888638 IP 192.168.1.114.35846 > 192.168.1.115.8888: sctp (1) [COOKIE ECHO] , (2) [DATA] (B)(E) [TSN: 3092969729] [...]
14:28:58.893278 IP 192.168.1.115.8888 > 192.168.1.114.35846: sctp (1) [COOKIE ACK] , (2) [SACK] [cum ack 3092969729] [a_rwnd 106491] [#gap acks 0] [#dup tsns 0]
14:28:58.893591 IP 192.168.1.114.35846 > 192.168.1.115.8888: sctp (1) [DATA] (B)(E) [TSN: 3092969730] [...]
14:28:59.096963 IP 192.168.1.115.8888 > 192.168.1.114.35846: sctp (1) [SACK] [cum ack 3092969730] [a_rwnd 106496] [#gap acks 0] [#dup tsns 0]
14:28:59.097086 IP 192.168.1.114.35846 > 192.168.1.115.8888: sctp (1) [DATA] (B)(E) [TSN: 3092969731] [...] , (2) [DATA] (B)(E) [TSN: 3092969732] [...]
14:28:59.103218 IP 192.168.1.115.8888 > 192.168.1.114.35846: sctp (1) [SACK] [cum ack 3092969732] [a_rwnd 106486] [#gap acks 0] [#dup tsns 0]
14:28:59.103330 IP 192.168.1.114.35846 > 192.168.1.115.8888: sctp (1) [SHUTDOWN]
14:28:59.107793 IP 192.168.1.115.8888 > 192.168.1.114.35846: sctp (1) [SHUTDOWN ACK]
14:28:59.107890 IP 192.168.1.114.35846 > 192.168.1.115.8888: sctp (1) [SHUTDOWN COMPLETE]

Looks like this bug is from the pre-git history museum. ;)

Fixes: 08707d5482df ("lksctp-2_5_31-0_5_1.patch")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
---
 net/sctp/socket.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index 2625ecc..aafe94b 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -1603,7 +1603,7 @@ static int sctp_sendmsg(struct kiocb *iocb, struct sock *sk,
 	sctp_assoc_t associd = 0;
 	sctp_cmsgs_t cmsgs = { NULL };
 	sctp_scope_t scope;
-	bool fill_sinfo_ttl = false;
+	bool fill_sinfo_ttl = false, wait_connect = false;
 	struct sctp_datamsg *datamsg;
 	int msg_flags = msg->msg_flags;
 	__u16 sinfo_flags = 0;
@@ -1943,6 +1943,7 @@ static int sctp_sendmsg(struct kiocb *iocb, struct sock *sk,
 		if (err < 0)
 			goto out_free;
 
+		wait_connect = true;
 		pr_debug("%s: we associated primitively\n", __func__);
 	}
 
@@ -1980,6 +1981,11 @@ static int sctp_sendmsg(struct kiocb *iocb, struct sock *sk,
 	sctp_datamsg_put(datamsg);
 	err = msg_len;
 
+	if (unlikely(wait_connect)) {
+		timeo = sock_sndtimeo(sk, msg_flags & MSG_DONTWAIT);
+		sctp_wait_for_connect(asoc, &timeo);
+	}
+
 	/* If we are already past ASSOCIATE, the lower
 	 * layers are responsible for association cleanup.
 	 */
-- 
1.7.11.7

^ permalink raw reply related

* Re: [PATCH v2 2/2] fixup! net/macb: improved ethtool statistics support
From: Xander Huff @ 2015-01-15 15:32 UTC (permalink / raw)
  To: Nicolas Ferre, davem
  Cc: netdev, jaeden.amero, rich.tollerton, ben.shelton, brad.mouring,
	linux-kernel, cyrille.pitchen
In-Reply-To: <54B797E5.6070503@atmel.com>

On 1/15/2015 4:35 AM, Nicolas Ferre wrote:
>>   #define GEM_OTX					0x0100 /* Octets transmitted */
> I see, it's modified hereafter! Why not integrate this part in previous
> patch?
>
>

I split these up the way I did by using the --fixup argument to allow the rebase 
--autosquash to work. I could instead have a series of fixup commits to fix each 
type of style issue and then also the functional issue if you'd prefer, but 
--autosquash will no longer work.

^ permalink raw reply

* Re: [PATCH] e100: Don't enable WoL by default on Toshiba devices
From: Jeff Kirsher @ 2015-01-15 15:31 UTC (permalink / raw)
  To: Ondrej Zary; +Cc: e1000-devel, netdev, David Miller, linux-kernel
In-Reply-To: <201501151618.07434.linux@rainbow-software.org>


[-- Attachment #1.1: Type: text/plain, Size: 2489 bytes --]

On Thu, 2015-01-15 at 16:18 +0100, Ondrej Zary wrote:
> On Thursday 15 January 2015, Jeff Kirsher wrote:
> > On Thu, 2015-01-15 at 14:40 +0100, Ondrej Zary wrote:
> > > On Thursday 13 November 2014, Jeff Kirsher wrote:
> > > > On Wed, 2014-11-12 at 18:18 -0500, David Miller wrote:
> > > > > From: Ondrej Zary <linux@rainbow-software.org>
> > > > > Date: Wed, 12 Nov 2014 23:47:25 +0100
> > > > >
> > > > > > Enabling WoL on some Toshiba laptops (such as Portege R100)
> > >
> > > causes
> > >
> > > > > > battery drain after shutdown (WoL is active even on battery).
> > >
> > > These
> > >
> > > > > > laptops have the WoL bit set in EEPROM ID, causing e100 driver
> > >
> > > to
> > >
> > > > > > enable WoL by default.
> > > > > >
> > > > > > Check subsystem vendor ID and if it's Toshiba, don't enable WoL
> > >
> > > by
> > >
> > > > > > default from EEPROM settings.
> > > > > >
> > > > > > Fixes
> > >
> > > https://bugs.launchpad.net/ubuntu/+source/linux/+bug/110784
> > >
> > > > > > Signed-off-by: Ondrej Zary <linux@rainbow-software.org>
> > > > >
> > > > > Jeff, are you gonna pick this up?
> > > >
> > > > Yes, sorry I did not catch it earlier.
> > >
> > > What happened to this patch? I don't see it in net.git or
> > > net-next.git
> > > (checked both davem's and jkirsher's)
> >
> > Sorry, I thought I had replied with a NAK on this patch after further
> > review of the changes.
> >
> > We don't fix BIOS issues in the driver especially regarding feature
> > enablement like WoL.  We would end up with dozens of these kinds of fixes
> > if we to allow this.
> >
> > You should go back to the OEM and ask for a BIOS update to resolve this or
> > configure udev so that ethtool disables WoL.
> 
> This is not a BIOS bug. When the machine is powered off in BIOS (or GRUB), 
> everything is OK.
> 
> The bug is that e100 driver enables WoL based on some bit in EEPROM that 
> happens to be set on at least some Toshiba laptops and user has no way to 
> change it.

Yes, the EEPROM can be modified/updated through the BIOS update I
suggested earlier.  So again, a BIOS issue.

OR you can configure udev so that ethtool disables WoL if you do not
want to pursue a EEPROM update through a BIOS update.

>  Windows driver does not do this. Other Linux ethernet drivers 
> don't do this. When user wants WoL, (s)he enables it in BIOS and OS. Maybe 
> this (mis)feature should be removed from the driver.
> 



[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

[-- Attachment #2: Type: text/plain, Size: 392 bytes --]

------------------------------------------------------------------------------
New Year. New Location. New Benefits. New Data Center in Ashburn, VA.
GigeNET is offering a free month of service with a new server in Ashburn.
Choose from 2 high performing configs, both with 100TB of bandwidth.
Higher redundancy.Lower latency.Increased capacity.Completely compliant.
http://p.sf.net/sfu/gigenet

[-- Attachment #3: Type: text/plain, Size: 257 bytes --]

_______________________________________________
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel&#174; Ethernet, visit http://communities.intel.com/community/wired

^ permalink raw reply

* [PATCH net-next] bridge: use MDBA_SET_ENTRY_MAX for maxtype in nlmsg_parse()
From: Nicolas Dichtel @ 2015-01-15 15:29 UTC (permalink / raw)
  To: netdev; +Cc: davem, Nicolas Dichtel

This is just a cleanup, because in the current code MDBA_SET_ENTRY_MAX ==
MDBA_SET_ENTRY.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
---
 net/bridge/br_mdb.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/bridge/br_mdb.c b/net/bridge/br_mdb.c
index 5df05269d17a..fed61c971b17 100644
--- a/net/bridge/br_mdb.c
+++ b/net/bridge/br_mdb.c
@@ -276,7 +276,7 @@ static int br_mdb_parse(struct sk_buff *skb, struct nlmsghdr *nlh,
 	struct net_device *dev;
 	int err;
 
-	err = nlmsg_parse(nlh, sizeof(*bpm), tb, MDBA_SET_ENTRY, NULL);
+	err = nlmsg_parse(nlh, sizeof(*bpm), tb, MDBA_SET_ENTRY_MAX, NULL);
 	if (err < 0)
 		return err;
 
-- 
2.2.2

^ permalink raw reply related

* Re: Re: [bisected regression] e1000e: "Detected Hardware Unit Hang"
From: Eric Dumazet @ 2015-01-15 15:25 UTC (permalink / raw)
  To: Thomas Jarosch
  Cc: 'Linux Netdev List', Eric Dumazet, Jeff Kirsher,
	e1000-devel
In-Reply-To: <8088599.PZmG8U31O2@storm>

On Thu, 2015-01-15 at 15:58 +0100, Thomas Jarosch wrote:

> A colleague mentioned to me he saw the "Hardware Unit Hang" message every 
> few days even running on kernel 3.4 (without your patch). Basically I'm 
> testing now if that's still the case with 3.19-rc4+ or not.
> 
> I'm all for fixing the root cause. I'm just interested if the e1000e
> hang can even be triggered when using a max frag page size of 4096.
> So far it transferred 751.6 GiB without a hiccup.

You told it was forwarding setup.

1) What is the NIC receiving traffic.
2) What happens if you disable GRO on it ?

^ permalink raw reply

* Re: [patch-net-next v2 3/3] net: ethernet: cpsw: don't requests IRQs we don't use
From: Felipe Balbi @ 2015-01-15 15:20 UTC (permalink / raw)
  To: Mugunthan V N
  Cc: Felipe Balbi, davem, Tony Lindgren, Linux OMAP Mailing List,
	netdev
In-Reply-To: <54B770FC.6060003@ti.com>

[-- Attachment #1: Type: text/plain, Size: 754 bytes --]

On Thu, Jan 15, 2015 at 01:19:16PM +0530, Mugunthan V N wrote:
> On Wednesday 14 January 2015 10:28 PM, Felipe Balbi wrote:
> > CPSW never uses RX_THRESHOLD or MISC interrupts. In
> > fact, they are always kept masked in their appropriate
> > IRQ Enable register.
> > 
> > Instead of allocating an IRQ that never fires, it's best
> > to remove that code altogether and let future patches
> > implement it if anybody needs those.
> > 
> > Signed-off-by: Felipe Balbi <balbi@ti.com>
> 
> Instead of introducing dummy ISR in previous patch and then removing in
> this patch, both can be squashed into a single patch.

sure they can. I decided to split to ease review and to make sure only
one thing happens in a single patch.

-- 
balbi

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply

* Re: [PATCH] e100: Don't enable WoL by default on Toshiba devices
From: Ondrej Zary @ 2015-01-15 15:18 UTC (permalink / raw)
  To: Jeff Kirsher; +Cc: e1000-devel, netdev, David Miller, linux-kernel
In-Reply-To: <1421333621.2632.19.camel@jtkirshe-mobl>

On Thursday 15 January 2015, Jeff Kirsher wrote:
> On Thu, 2015-01-15 at 14:40 +0100, Ondrej Zary wrote:
> > On Thursday 13 November 2014, Jeff Kirsher wrote:
> > > On Wed, 2014-11-12 at 18:18 -0500, David Miller wrote:
> > > > From: Ondrej Zary <linux@rainbow-software.org>
> > > > Date: Wed, 12 Nov 2014 23:47:25 +0100
> > > >
> > > > > Enabling WoL on some Toshiba laptops (such as Portege R100)
> >
> > causes
> >
> > > > > battery drain after shutdown (WoL is active even on battery).
> >
> > These
> >
> > > > > laptops have the WoL bit set in EEPROM ID, causing e100 driver
> >
> > to
> >
> > > > > enable WoL by default.
> > > > >
> > > > > Check subsystem vendor ID and if it's Toshiba, don't enable WoL
> >
> > by
> >
> > > > > default from EEPROM settings.
> > > > >
> > > > > Fixes
> >
> > https://bugs.launchpad.net/ubuntu/+source/linux/+bug/110784
> >
> > > > > Signed-off-by: Ondrej Zary <linux@rainbow-software.org>
> > > >
> > > > Jeff, are you gonna pick this up?
> > >
> > > Yes, sorry I did not catch it earlier.
> >
> > What happened to this patch? I don't see it in net.git or
> > net-next.git
> > (checked both davem's and jkirsher's)
>
> Sorry, I thought I had replied with a NAK on this patch after further
> review of the changes.
>
> We don't fix BIOS issues in the driver especially regarding feature
> enablement like WoL.  We would end up with dozens of these kinds of fixes
> if we to allow this.
>
> You should go back to the OEM and ask for a BIOS update to resolve this or
> configure udev so that ethtool disables WoL.

This is not a BIOS bug. When the machine is powered off in BIOS (or GRUB), 
everything is OK.

The bug is that e100 driver enables WoL based on some bit in EEPROM that 
happens to be set on at least some Toshiba laptops and user has no way to 
change it. Windows driver does not do this. Other Linux ethernet drivers 
don't do this. When user wants WoL, (s)he enables it in BIOS and OS. Maybe 
this (mis)feature should be removed from the driver.

-- 
Ondrej Zary

------------------------------------------------------------------------------
New Year. New Location. New Benefits. New Data Center in Ashburn, VA.
GigeNET is offering a free month of service with a new server in Ashburn.
Choose from 2 high performing configs, both with 100TB of bandwidth.
Higher redundancy.Lower latency.Increased capacity.Completely compliant.
http://p.sf.net/sfu/gigenet
_______________________________________________
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel&#174; Ethernet, visit http://communities.intel.com/community/wired

^ permalink raw reply

* RE: [net-next 10/17] i40e: clean up PTP log messages
From: Nelson, Shannon @ 2015-01-15 15:01 UTC (permalink / raw)
  To: David Laight, Kirsher, Jeffrey T, davem@davemloft.net
  Cc: netdev@vger.kernel.org, nhorman@redhat.com, sassmann@redhat.com,
	jogreene@redhat.com, Keller, Jacob E
In-Reply-To: <063D6719AE5E284EB5DD2968C1650D6D1CAC9AB0@AcuExch.aculab.com>

> From: David Laight [mailto:David.Laight@ACULAB.COM]
> Sent: Thursday, January 15, 2015 4:35 AM
> 
> From: Jeff Kirsher
> > From: Shannon Nelson <shannon.nelson@intel.com>
> >
> > The netdev name at init time often defaults to eth0 but later gets
> changed
> > by UDEV, so printing it here is misleading.
> 
> Without the interface name you stand zero chance of working out which
> one it is.
> With it, and provided all the interface renames get into dmesg, you
> stand
> at least some chance.

The dev_info() messages have the PCI device and function number in the string, so it's really not too hard to track down the resulting netdev port:
Jan 13 15:46:55 snelson3-cup kernel: [621235.401627] i40e 0000:84:00.1: PHC enabled

Later messages use netdev_info() and have both the device number and the netdev name:
Jan 13 15:46:56 snelson3-cup kernel: [621236.508868] i40e 0000:04:00.1 p261p2: NIC Link is Up 10 Gbps Full Duplex, Flow Control: None

Note that the driver name appears in both as well.

sln

^ permalink raw reply

* Re: [bisected regression] e1000e: "Detected Hardware Unit Hang"
From: Jeff Kirsher @ 2015-01-15 14:59 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Thomas Jarosch, 'Linux Netdev List', Eric Dumazet,
	e1000-devel
In-Reply-To: <1421333009.11734.53.camel@edumazet-glaptop2.roam.corp.google.com>

[-- Attachment #1: Type: text/plain, Size: 2404 bytes --]

On Thu, 2015-01-15 at 06:43 -0800, Eric Dumazet wrote:
> On Thu, 2015-01-15 at 11:11 +0100, Thomas Jarosch wrote:
> > On Wednesday, 14. January 2015 09:20:52 Eric Dumazet wrote:
> > > I would try to use lower data per txd. I am not sure 24KB is really
> > > supported.
> > > 
> > > ( check commit d821a4c4d11ad160925dab2bb009b8444beff484 for details)
> > > 
> > > diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c
> > > b/drivers/net/ethernet/intel/e1000e/netdev.c index
> > > e14fd85f64eb..8d973f7edfbd 100644
> > > --- a/drivers/net/ethernet/intel/e1000e/netdev.c
> > > +++ b/drivers/net/ethernet/intel/e1000e/netdev.c
> > > @@ -3897,7 +3897,7 @@ void e1000e_reset(struct e1000_adapter *adapter)
> > >  	 * limit of 24KB due to receive synchronization limitations.
> > >  	 */
> > >  	adapter->tx_fifo_limit = min_t(u32, ((er32(PBA) >> 16) << 10) - 96,
> > > -				       24 << 10);
> > > +				       8 << 10);
> > > 
> > >  	/* Disable Adaptive Interrupt Moderation if 2 full packets cannot
> > >  	 * fit in receive buffer.
> > 
> > Thanks for checking!
> > 
> > I just tried that change on top of git f800c25 (git HEAD), same problem. 
> > Let's see what the Intel wizards come up with.
> > 
> > What "works" is to decrease the page size in git HEAD, too:
> > 
> > diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> > index 85ab7d7..9f0ef97 100644
> > --- a/include/linux/skbuff.h
> > +++ b/include/linux/skbuff.h
> > @@ -2108,7 +2108,7 @@ static inline void __skb_queue_purge(struct 
> > sk_buff_head *list)
> >                 kfree_skb(skb);
> >  }
> >  
> > -#define NETDEV_FRAG_PAGE_MAX_ORDER get_order(32768)
> > +#define NETDEV_FRAG_PAGE_MAX_ORDER get_order(4096)
> >  #define NETDEV_FRAG_PAGE_MAX_SIZE  (PAGE_SIZE << NETDEV_FRAG_PAGE_MAX_ORDER)
> >  #define NETDEV_PAGECNT_MAX_BIAS           NETDEV_FRAG_PAGE_MAX_SIZE
> > 
> > 
> > 
> > When I try a page size of 8192, it starts failing again. I'll now run
> > a stress test with 4096 to see if the problem is really gone
> > or just happens more rarely.
> 
> Sure, you basically reverted my patch.
> 
> You are not the first to report a problem caused by this patch.
> 
> This patch is known to have uncovered some driver bugs.
> 
> We are not going to revert it. We are going to fix the real bugs.
> 
> Thanks
> 
> 

Agreed, we are looking into issue Thomas.

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply

* Re: Re: [bisected regression] e1000e: "Detected Hardware Unit Hang"
From: Thomas Jarosch @ 2015-01-15 14:58 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: 'Linux Netdev List', Eric Dumazet, Jeff Kirsher,
	e1000-devel
In-Reply-To: <1421333009.11734.53.camel@edumazet-glaptop2.roam.corp.google.com>

On Thursday, 15. January 2015 06:43:29 Eric Dumazet wrote:
> > -#define NETDEV_FRAG_PAGE_MAX_ORDER get_order(32768)
> > +#define NETDEV_FRAG_PAGE_MAX_ORDER get_order(4096)
> > 
> >  #define NETDEV_FRAG_PAGE_MAX_SIZE  (PAGE_SIZE <<
> >  NETDEV_FRAG_PAGE_MAX_ORDER) #define NETDEV_PAGECNT_MAX_BIAS          
> >  NETDEV_FRAG_PAGE_MAX_SIZE> 
> > When I try a page size of 8192, it starts failing again. I'll now run
> > a stress test with 4096 to see if the problem is really gone
> > or just happens more rarely.
> 
> Sure, you basically reverted my patch.
> 
> You are not the first to report a problem caused by this patch.
> 
> This patch is known to have uncovered some driver bugs.
> 
> We are not going to revert it. We are going to fix the real bugs.
> 
> Thanks

A colleague mentioned to me he saw the "Hardware Unit Hang" message every 
few days even running on kernel 3.4 (without your patch). Basically I'm 
testing now if that's still the case with 3.19-rc4+ or not.

I'm all for fixing the root cause. I'm just interested if the e1000e
hang can even be triggered when using a max frag page size of 4096.
So far it transferred 751.6 GiB without a hiccup.

Cheers,
Thomas

^ permalink raw reply

* Re: [PATCH] e100: Don't enable WoL by default on Toshiba devices
From: Jeff Kirsher @ 2015-01-15 14:53 UTC (permalink / raw)
  To: Ondrej Zary; +Cc: David Miller, e1000-devel, netdev, linux-kernel
In-Reply-To: <201501151440.32159.linux@rainbow-software.org>

[-- Attachment #1: Type: text/plain, Size: 1444 bytes --]

On Thu, 2015-01-15 at 14:40 +0100, Ondrej Zary wrote:
> On Thursday 13 November 2014, Jeff Kirsher wrote:
> > On Wed, 2014-11-12 at 18:18 -0500, David Miller wrote:
> > > From: Ondrej Zary <linux@rainbow-software.org>
> > > Date: Wed, 12 Nov 2014 23:47:25 +0100
> > >
> > > > Enabling WoL on some Toshiba laptops (such as Portege R100)
> causes
> > > > battery drain after shutdown (WoL is active even on battery).
> These
> > > > laptops have the WoL bit set in EEPROM ID, causing e100 driver
> to
> > > > enable WoL by default.
> > > >
> > > > Check subsystem vendor ID and if it's Toshiba, don't enable WoL
> by
> > > > default from EEPROM settings.
> > > >
> > > > Fixes
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/110784
> > > >
> > > > Signed-off-by: Ondrej Zary <linux@rainbow-software.org>
> > >
> > > Jeff, are you gonna pick this up?
> >
> > Yes, sorry I did not catch it earlier.
> 
> What happened to this patch? I don't see it in net.git or
> net-next.git 
> (checked both davem's and jkirsher's)

Sorry, I thought I had replied with a NAK on this patch after further
review of the changes.

We don't fix BIOS issues in the driver especially regarding feature
enablement like WoL.  We would end up with dozens of these kinds of fixes
if we to allow this.

You should go back to the OEM and ask for a BIOS update to resolve this or
configure udev so that ethtool disables WoL.  


[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply

* Re: [bisected regression] e1000e: "Detected Hardware Unit Hang"
From: Eric Dumazet @ 2015-01-15 14:43 UTC (permalink / raw)
  To: Thomas Jarosch
  Cc: 'Linux Netdev List', Eric Dumazet, Jeff Kirsher,
	e1000-devel
In-Reply-To: <5229621.KczjbIR22Q@storm>

On Thu, 2015-01-15 at 11:11 +0100, Thomas Jarosch wrote:
> On Wednesday, 14. January 2015 09:20:52 Eric Dumazet wrote:
> > I would try to use lower data per txd. I am not sure 24KB is really
> > supported.
> > 
> > ( check commit d821a4c4d11ad160925dab2bb009b8444beff484 for details)
> > 
> > diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c
> > b/drivers/net/ethernet/intel/e1000e/netdev.c index
> > e14fd85f64eb..8d973f7edfbd 100644
> > --- a/drivers/net/ethernet/intel/e1000e/netdev.c
> > +++ b/drivers/net/ethernet/intel/e1000e/netdev.c
> > @@ -3897,7 +3897,7 @@ void e1000e_reset(struct e1000_adapter *adapter)
> >  	 * limit of 24KB due to receive synchronization limitations.
> >  	 */
> >  	adapter->tx_fifo_limit = min_t(u32, ((er32(PBA) >> 16) << 10) - 96,
> > -				       24 << 10);
> > +				       8 << 10);
> > 
> >  	/* Disable Adaptive Interrupt Moderation if 2 full packets cannot
> >  	 * fit in receive buffer.
> 
> Thanks for checking!
> 
> I just tried that change on top of git f800c25 (git HEAD), same problem. 
> Let's see what the Intel wizards come up with.
> 
> What "works" is to decrease the page size in git HEAD, too:
> 
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index 85ab7d7..9f0ef97 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -2108,7 +2108,7 @@ static inline void __skb_queue_purge(struct 
> sk_buff_head *list)
>                 kfree_skb(skb);
>  }
>  
> -#define NETDEV_FRAG_PAGE_MAX_ORDER get_order(32768)
> +#define NETDEV_FRAG_PAGE_MAX_ORDER get_order(4096)
>  #define NETDEV_FRAG_PAGE_MAX_SIZE  (PAGE_SIZE << NETDEV_FRAG_PAGE_MAX_ORDER)
>  #define NETDEV_PAGECNT_MAX_BIAS           NETDEV_FRAG_PAGE_MAX_SIZE
> 
> 
> 
> When I try a page size of 8192, it starts failing again. I'll now run
> a stress test with 4096 to see if the problem is really gone
> or just happens more rarely.

Sure, you basically reverted my patch.

You are not the first to report a problem caused by this patch.

This patch is known to have uncovered some driver bugs.

We are not going to revert it. We are going to fix the real bugs.

Thanks

^ permalink raw reply

* Re: [PATCH 0/2] Remove T4 FCoE support
From: Hannes Reinecke @ 2015-01-15 14:25 UTC (permalink / raw)
  To: Praveen Madhavan, netdev, linux-scsi
  Cc: davem, JBottomley, hch, hariprasad, varun
In-Reply-To: <cover.1421328605.git.praveenm@chelsio.com>

On 01/15/2015 02:45 PM, Praveen Madhavan wrote:
> These patches removes FCoE support for chelsio T4 adapter.
> Please apply on net-next since depends on previous commits.
> 
> Praveen Madhavan (2):
>   csiostor:Remove T4 FCoE support.
>   csiostor:Removed file csio_hw_t4.c
> 
>  drivers/scsi/csiostor/Makefile       |   2 +-
>  drivers/scsi/csiostor/csio_hw.c      |  61 ++----
>  drivers/scsi/csiostor/csio_hw_chip.h |  43 ----
>  drivers/scsi/csiostor/csio_hw_t4.c   | 404 -----------------------------------
>  drivers/scsi/csiostor/csio_init.c    |   6 +-
>  drivers/scsi/csiostor/csio_wr.c      |  15 +-
>  6 files changed, 25 insertions(+), 506 deletions(-)
>  delete mode 100644 drivers/scsi/csiostor/csio_hw_t4.c
> 
Any particular reason for this?
It worked at one point, didn't it?

So why do you want to remove it?

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		               zSeries & Storage
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

^ permalink raw reply

* [PATCH net-next v5 1/4] netns: add rtnl cmd to add and get peer netns ids
From: Nicolas Dichtel @ 2015-01-15 14:11 UTC (permalink / raw)
  To: netdev, containers, linux-kernel, linux-api
  Cc: davem, ebiederm, stephen, akpm, luto, cwang, Nicolas Dichtel
In-Reply-To: <1421331078-21622-1-git-send-email-nicolas.dichtel@6wind.com>

With this patch, a user can define an id for a peer netns by providing a FD or a
PID. These ids are local to the netns where it is added (ie valid only into this
netns).

The main function (ie the one exported to other module), peernet2id(), allows to
get the id of a peer netns. If no id has been assigned by the user, this
function allocates one.

These ids will be used in netlink messages to point to a peer netns, for example
in case of a x-netns interface.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
---
 MAINTAINERS                        |   1 +
 include/net/net_namespace.h        |   4 +
 include/uapi/linux/Kbuild          |   1 +
 include/uapi/linux/net_namespace.h |  23 ++++
 include/uapi/linux/rtnetlink.h     |   5 +
 net/core/net_namespace.c           | 210 +++++++++++++++++++++++++++++++++++++
 6 files changed, 244 insertions(+)
 create mode 100644 include/uapi/linux/net_namespace.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 9de900572633..9b91d9f0257e 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -6578,6 +6578,7 @@ F:	include/linux/netdevice.h
 F:	include/uapi/linux/in.h
 F:	include/uapi/linux/net.h
 F:	include/uapi/linux/netdevice.h
+F:	include/uapi/linux/net_namespace.h
 F:	tools/net/
 F:	tools/testing/selftests/net/
 F:	lib/random32.c
diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index 2e8756b8c775..36faf4990c4b 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -60,6 +60,7 @@ struct net {
 	struct list_head	exit_list;	/* Use only net_mutex */
 
 	struct user_namespace   *user_ns;	/* Owning user namespace */
+	struct idr		netns_ids;
 
 	struct ns_common	ns;
 
@@ -290,6 +291,9 @@ static inline struct net *read_pnet(struct net * const *pnet)
 #define __net_initconst	__initconst
 #endif
 
+int peernet2id(struct net *net, struct net *peer);
+struct net *get_net_ns_by_id(struct net *net, int id);
+
 struct pernet_operations {
 	struct list_head list;
 	int (*init)(struct net *net);
diff --git a/include/uapi/linux/Kbuild b/include/uapi/linux/Kbuild
index 00b100023c47..14b7b6e44c77 100644
--- a/include/uapi/linux/Kbuild
+++ b/include/uapi/linux/Kbuild
@@ -283,6 +283,7 @@ header-y += net.h
 header-y += netlink_diag.h
 header-y += netlink.h
 header-y += netrom.h
+header-y += net_namespace.h
 header-y += net_tstamp.h
 header-y += nfc.h
 header-y += nfs2.h
diff --git a/include/uapi/linux/net_namespace.h b/include/uapi/linux/net_namespace.h
new file mode 100644
index 000000000000..778cd2c3ebf4
--- /dev/null
+++ b/include/uapi/linux/net_namespace.h
@@ -0,0 +1,23 @@
+/* Copyright (c) 2015 6WIND S.A.
+ * Author: Nicolas Dichtel <nicolas.dichtel@6wind.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ */
+#ifndef _UAPI_LINUX_NET_NAMESPACE_H_
+#define _UAPI_LINUX_NET_NAMESPACE_H_
+
+/* Attributes of RTM_NEWNSID/RTM_GETNSID messages */
+enum {
+	NETNSA_NONE,
+#define NETNSA_NSID_NOT_ASSIGNED -1
+	NETNSA_NSID,
+	NETNSA_PID,
+	NETNSA_FD,
+	__NETNSA_MAX,
+};
+
+#define NETNSA_MAX		(__NETNSA_MAX - 1)
+
+#endif /* _UAPI_LINUX_NET_NAMESPACE_H_ */
diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
index a1d18593f41e..5cc5d66bf519 100644
--- a/include/uapi/linux/rtnetlink.h
+++ b/include/uapi/linux/rtnetlink.h
@@ -132,6 +132,11 @@ enum {
 	RTM_GETMDB = 86,
 #define RTM_GETMDB RTM_GETMDB
 
+	RTM_NEWNSID = 88,
+#define RTM_NEWNSID RTM_NEWNSID
+	RTM_GETNSID = 90,
+#define RTM_GETNSID RTM_GETNSID
+
 	__RTM_MAX,
 #define RTM_MAX		(((__RTM_MAX + 3) & ~3) - 1)
 };
diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index ce780c722e48..edf089dd792a 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -15,6 +15,10 @@
 #include <linux/file.h>
 #include <linux/export.h>
 #include <linux/user_namespace.h>
+#include <linux/net_namespace.h>
+#include <linux/rtnetlink.h>
+#include <net/sock.h>
+#include <net/netlink.h>
 #include <net/net_namespace.h>
 #include <net/netns/generic.h>
 
@@ -144,6 +148,77 @@ static void ops_free_list(const struct pernet_operations *ops,
 	}
 }
 
+static int alloc_netid(struct net *net, struct net *peer, int reqid)
+{
+	int min = 0, max = 0;
+
+	ASSERT_RTNL();
+
+	if (reqid >= 0) {
+		min = reqid;
+		max = reqid + 1;
+	}
+
+	return idr_alloc(&net->netns_ids, peer, min, max, GFP_KERNEL);
+}
+
+/* This function is used by idr_for_each(). If net is equal to peer, the
+ * function returns the id so that idr_for_each() stops. Because we cannot
+ * returns the id 0 (idr_for_each() will not stop), we return the magic value
+ * NET_ID_ZERO (-1) for it.
+ */
+#define NET_ID_ZERO -1
+static int net_eq_idr(int id, void *net, void *peer)
+{
+	if (net_eq(net, peer))
+		return id ? : NET_ID_ZERO;
+	return 0;
+}
+
+static int __peernet2id(struct net *net, struct net *peer, bool alloc)
+{
+	int id = idr_for_each(&net->netns_ids, net_eq_idr, peer);
+
+	ASSERT_RTNL();
+
+	/* Magic value for id 0. */
+	if (id == NET_ID_ZERO)
+		return 0;
+	if (id > 0)
+		return id;
+
+	if (alloc)
+		return alloc_netid(net, peer, -1);
+
+	return -ENOENT;
+}
+
+/* This function returns the id of a peer netns. If no id is assigned, one will
+ * be allocated and returned.
+ */
+int peernet2id(struct net *net, struct net *peer)
+{
+	int id = __peernet2id(net, peer, true);
+
+	return id >= 0 ? id : NETNSA_NSID_NOT_ASSIGNED;
+}
+
+struct net *get_net_ns_by_id(struct net *net, int id)
+{
+	struct net *peer;
+
+	if (id < 0)
+		return NULL;
+
+	rcu_read_lock();
+	peer = idr_find(&net->netns_ids, id);
+	if (peer)
+		get_net(peer);
+	rcu_read_unlock();
+
+	return peer;
+}
+
 /*
  * setup_net runs the initializers for the network namespace object.
  */
@@ -158,6 +233,7 @@ static __net_init int setup_net(struct net *net, struct user_namespace *user_ns)
 	atomic_set(&net->passive, 1);
 	net->dev_base_seq = 1;
 	net->user_ns = user_ns;
+	idr_init(&net->netns_ids);
 
 #ifdef NETNS_REFCNT_DEBUG
 	atomic_set(&net->use_count, 0);
@@ -288,6 +364,14 @@ static void cleanup_net(struct work_struct *work)
 	list_for_each_entry(net, &net_kill_list, cleanup_list) {
 		list_del_rcu(&net->list);
 		list_add_tail(&net->exit_list, &net_exit_list);
+		for_each_net(tmp) {
+			int id = __peernet2id(tmp, net, false);
+
+			if (id >= 0)
+				idr_remove(&tmp->netns_ids, id);
+		}
+		idr_destroy(&net->netns_ids);
+
 	}
 	rtnl_unlock();
 
@@ -402,6 +486,129 @@ static struct pernet_operations __net_initdata net_ns_ops = {
 	.exit = net_ns_net_exit,
 };
 
+static struct nla_policy rtnl_net_policy[NETNSA_MAX + 1] = {
+	[NETNSA_NONE]		= { .type = NLA_UNSPEC },
+	[NETNSA_NSID]		= { .type = NLA_S32 },
+	[NETNSA_PID]		= { .type = NLA_U32 },
+	[NETNSA_FD]		= { .type = NLA_U32 },
+};
+
+static int rtnl_net_newid(struct sk_buff *skb, struct nlmsghdr *nlh)
+{
+	struct net *net = sock_net(skb->sk);
+	struct nlattr *tb[NETNSA_MAX + 1];
+	struct net *peer;
+	int nsid, err;
+
+	err = nlmsg_parse(nlh, sizeof(struct rtgenmsg), tb, NETNSA_MAX,
+			  rtnl_net_policy);
+	if (err < 0)
+		return err;
+	if (!tb[NETNSA_NSID])
+		return -EINVAL;
+	nsid = nla_get_s32(tb[NETNSA_NSID]);
+
+	if (tb[NETNSA_PID])
+		peer = get_net_ns_by_pid(nla_get_u32(tb[NETNSA_PID]));
+	else if (tb[NETNSA_FD])
+		peer = get_net_ns_by_fd(nla_get_u32(tb[NETNSA_FD]));
+	else
+		return -EINVAL;
+	if (IS_ERR(peer))
+		return PTR_ERR(peer);
+
+	if (__peernet2id(net, peer, false) >= 0) {
+		err = -EEXIST;
+		goto out;
+	}
+
+	err = alloc_netid(net, peer, nsid);
+	if (err > 0)
+		err = 0;
+out:
+	put_net(peer);
+	return err;
+}
+
+static int rtnl_net_get_size(void)
+{
+	return NLMSG_ALIGN(sizeof(struct rtgenmsg))
+	       + nla_total_size(sizeof(s32)) /* NETNSA_NSID */
+	       ;
+}
+
+static int rtnl_net_fill(struct sk_buff *skb, u32 portid, u32 seq, int flags,
+			 int cmd, struct net *net, struct net *peer)
+{
+	struct nlmsghdr *nlh;
+	struct rtgenmsg *rth;
+	int id;
+
+	ASSERT_RTNL();
+
+	nlh = nlmsg_put(skb, portid, seq, cmd, sizeof(*rth), flags);
+	if (!nlh)
+		return -EMSGSIZE;
+
+	rth = nlmsg_data(nlh);
+	rth->rtgen_family = AF_UNSPEC;
+
+	id = __peernet2id(net, peer, false);
+	if  (id < 0)
+		id = NETNSA_NSID_NOT_ASSIGNED;
+	if (nla_put_s32(skb, NETNSA_NSID, id))
+		goto nla_put_failure;
+
+	return nlmsg_end(skb, nlh);
+
+nla_put_failure:
+	nlmsg_cancel(skb, nlh);
+	return -EMSGSIZE;
+}
+
+static int rtnl_net_getid(struct sk_buff *skb, struct nlmsghdr *nlh)
+{
+	struct net *net = sock_net(skb->sk);
+	struct nlattr *tb[NETNSA_MAX + 1];
+	struct sk_buff *msg;
+	int err = -ENOBUFS;
+	struct net *peer;
+
+	err = nlmsg_parse(nlh, sizeof(struct rtgenmsg), tb, NETNSA_MAX,
+			  rtnl_net_policy);
+	if (err < 0)
+		return err;
+	if (tb[NETNSA_PID])
+		peer = get_net_ns_by_pid(nla_get_u32(tb[NETNSA_PID]));
+	else if (tb[NETNSA_FD])
+		peer = get_net_ns_by_fd(nla_get_u32(tb[NETNSA_FD]));
+	else
+		return -EINVAL;
+
+	if (IS_ERR(peer))
+		return PTR_ERR(peer);
+
+	msg = nlmsg_new(rtnl_net_get_size(), GFP_KERNEL);
+	if (!msg) {
+		err = -ENOMEM;
+		goto out;
+	}
+
+	err = rtnl_net_fill(msg, NETLINK_CB(skb).portid, nlh->nlmsg_seq, 0,
+			    RTM_GETNSID, net, peer);
+	if (err < 0)
+		goto err_out;
+
+	err = rtnl_unicast(msg, net, NETLINK_CB(skb).portid);
+	goto out;
+
+err_out:
+	nlmsg_free(msg);
+out:
+	put_net(peer);
+	return err;
+}
+
 static int __init net_ns_init(void)
 {
 	struct net_generic *ng;
@@ -435,6 +642,9 @@ static int __init net_ns_init(void)
 
 	register_pernet_subsys(&net_ns_ops);
 
+	rtnl_register(PF_UNSPEC, RTM_NEWNSID, rtnl_net_newid, NULL, NULL);
+	rtnl_register(PF_UNSPEC, RTM_GETNSID, rtnl_net_getid, NULL, NULL);
+
 	return 0;
 }
 
-- 
2.2.2

^ permalink raw reply related

* [PATCH net-next v5 3/4] tunnels: advertise link netns via netlink
From: Nicolas Dichtel @ 2015-01-15 14:11 UTC (permalink / raw)
  To: netdev, containers, linux-kernel, linux-api
  Cc: davem, ebiederm, stephen, akpm, luto, cwang, Nicolas Dichtel
In-Reply-To: <1421331078-21622-1-git-send-email-nicolas.dichtel@6wind.com>

Implement rtnl_link_ops->get_link_net() callback so that IFLA_LINK_NETNSID is
added to rtnetlink messages.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
---
 drivers/net/vxlan.c      | 8 ++++++++
 include/net/ip6_tunnel.h | 1 +
 include/net/ip_tunnels.h | 1 +
 net/ipv4/ip_gre.c        | 2 ++
 net/ipv4/ip_tunnel.c     | 8 ++++++++
 net/ipv4/ip_vti.c        | 1 +
 net/ipv4/ipip.c          | 1 +
 net/ipv6/ip6_gre.c       | 1 +
 net/ipv6/ip6_tunnel.c    | 9 +++++++++
 net/ipv6/ip6_vti.c       | 1 +
 net/ipv6/sit.c           | 1 +
 11 files changed, 34 insertions(+)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 6b6b45622a0a..88dbb1edea6e 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -2922,6 +2922,13 @@ nla_put_failure:
 	return -EMSGSIZE;
 }
 
+static struct net *vxlan_get_link_net(const struct net_device *dev)
+{
+	struct vxlan_dev *vxlan = netdev_priv(dev);
+
+	return vxlan->net;
+}
+
 static struct rtnl_link_ops vxlan_link_ops __read_mostly = {
 	.kind		= "vxlan",
 	.maxtype	= IFLA_VXLAN_MAX,
@@ -2933,6 +2940,7 @@ static struct rtnl_link_ops vxlan_link_ops __read_mostly = {
 	.dellink	= vxlan_dellink,
 	.get_size	= vxlan_get_size,
 	.fill_info	= vxlan_fill_info,
+	.get_link_net	= vxlan_get_link_net,
 };
 
 static void vxlan_handle_lowerdev_unregister(struct vxlan_net *vn,
diff --git a/include/net/ip6_tunnel.h b/include/net/ip6_tunnel.h
index 9326c41c2d7f..76c091b53dae 100644
--- a/include/net/ip6_tunnel.h
+++ b/include/net/ip6_tunnel.h
@@ -70,6 +70,7 @@ int ip6_tnl_xmit_ctl(struct ip6_tnl *t, const struct in6_addr *laddr,
 __u16 ip6_tnl_parse_tlv_enc_lim(struct sk_buff *skb, __u8 *raw);
 __u32 ip6_tnl_get_cap(struct ip6_tnl *t, const struct in6_addr *laddr,
 			     const struct in6_addr *raddr);
+struct net *ip6_tnl_get_link_net(const struct net_device *dev);
 
 static inline void ip6tunnel_xmit(struct sk_buff *skb, struct net_device *dev)
 {
diff --git a/include/net/ip_tunnels.h b/include/net/ip_tunnels.h
index ce4db3cc5647..2c47061a6954 100644
--- a/include/net/ip_tunnels.h
+++ b/include/net/ip_tunnels.h
@@ -141,6 +141,7 @@ int ip_tunnel_encap_del_ops(const struct ip_tunnel_encap_ops *op,
 int ip_tunnel_init(struct net_device *dev);
 void ip_tunnel_uninit(struct net_device *dev);
 void  ip_tunnel_dellink(struct net_device *dev, struct list_head *head);
+struct net *ip_tunnel_get_link_net(const struct net_device *dev);
 int ip_tunnel_init_net(struct net *net, int ip_tnl_net_id,
 		       struct rtnl_link_ops *ops, char *devname);
 
diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
index 942576e27df1..6e7727f27393 100644
--- a/net/ipv4/ip_gre.c
+++ b/net/ipv4/ip_gre.c
@@ -829,6 +829,7 @@ static struct rtnl_link_ops ipgre_link_ops __read_mostly = {
 	.dellink	= ip_tunnel_dellink,
 	.get_size	= ipgre_get_size,
 	.fill_info	= ipgre_fill_info,
+	.get_link_net	= ip_tunnel_get_link_net,
 };
 
 static struct rtnl_link_ops ipgre_tap_ops __read_mostly = {
@@ -843,6 +844,7 @@ static struct rtnl_link_ops ipgre_tap_ops __read_mostly = {
 	.dellink	= ip_tunnel_dellink,
 	.get_size	= ipgre_get_size,
 	.fill_info	= ipgre_fill_info,
+	.get_link_net	= ip_tunnel_get_link_net,
 };
 
 static int __net_init ipgre_tap_init_net(struct net *net)
diff --git a/net/ipv4/ip_tunnel.c b/net/ipv4/ip_tunnel.c
index d3e447936720..2cd08280c77b 100644
--- a/net/ipv4/ip_tunnel.c
+++ b/net/ipv4/ip_tunnel.c
@@ -972,6 +972,14 @@ void ip_tunnel_dellink(struct net_device *dev, struct list_head *head)
 }
 EXPORT_SYMBOL_GPL(ip_tunnel_dellink);
 
+struct net *ip_tunnel_get_link_net(const struct net_device *dev)
+{
+	struct ip_tunnel *tunnel = netdev_priv(dev);
+
+	return tunnel->net;
+}
+EXPORT_SYMBOL(ip_tunnel_get_link_net);
+
 int ip_tunnel_init_net(struct net *net, int ip_tnl_net_id,
 				  struct rtnl_link_ops *ops, char *devname)
 {
diff --git a/net/ipv4/ip_vti.c b/net/ipv4/ip_vti.c
index 1a7e979e80ba..94efe148181c 100644
--- a/net/ipv4/ip_vti.c
+++ b/net/ipv4/ip_vti.c
@@ -531,6 +531,7 @@ static struct rtnl_link_ops vti_link_ops __read_mostly = {
 	.dellink        = ip_tunnel_dellink,
 	.get_size	= vti_get_size,
 	.fill_info	= vti_fill_info,
+	.get_link_net	= ip_tunnel_get_link_net,
 };
 
 static int __init vti_init(void)
diff --git a/net/ipv4/ipip.c b/net/ipv4/ipip.c
index 40403114f00a..b58d6689874c 100644
--- a/net/ipv4/ipip.c
+++ b/net/ipv4/ipip.c
@@ -498,6 +498,7 @@ static struct rtnl_link_ops ipip_link_ops __read_mostly = {
 	.dellink	= ip_tunnel_dellink,
 	.get_size	= ipip_get_size,
 	.fill_info	= ipip_fill_info,
+	.get_link_net	= ip_tunnel_get_link_net,
 };
 
 static struct xfrm_tunnel ipip_handler __read_mostly = {
diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c
index 13cda4c6313b..9306a5ff9149 100644
--- a/net/ipv6/ip6_gre.c
+++ b/net/ipv6/ip6_gre.c
@@ -1662,6 +1662,7 @@ static struct rtnl_link_ops ip6gre_link_ops __read_mostly = {
 	.dellink	= ip6gre_dellink,
 	.get_size	= ip6gre_get_size,
 	.fill_info	= ip6gre_fill_info,
+	.get_link_net	= ip6_tnl_get_link_net,
 };
 
 static struct rtnl_link_ops ip6gre_tap_ops __read_mostly = {
diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
index 92b3da571980..266a264ec212 100644
--- a/net/ipv6/ip6_tunnel.c
+++ b/net/ipv6/ip6_tunnel.c
@@ -1760,6 +1760,14 @@ nla_put_failure:
 	return -EMSGSIZE;
 }
 
+struct net *ip6_tnl_get_link_net(const struct net_device *dev)
+{
+	struct ip6_tnl *tunnel = netdev_priv(dev);
+
+	return tunnel->net;
+}
+EXPORT_SYMBOL(ip6_tnl_get_link_net);
+
 static const struct nla_policy ip6_tnl_policy[IFLA_IPTUN_MAX + 1] = {
 	[IFLA_IPTUN_LINK]		= { .type = NLA_U32 },
 	[IFLA_IPTUN_LOCAL]		= { .len = sizeof(struct in6_addr) },
@@ -1783,6 +1791,7 @@ static struct rtnl_link_ops ip6_link_ops __read_mostly = {
 	.dellink	= ip6_tnl_dellink,
 	.get_size	= ip6_tnl_get_size,
 	.fill_info	= ip6_tnl_fill_info,
+	.get_link_net	= ip6_tnl_get_link_net,
 };
 
 static struct xfrm6_tunnel ip4ip6_handler __read_mostly = {
diff --git a/net/ipv6/ip6_vti.c b/net/ipv6/ip6_vti.c
index ace10d0b3aac..5fb9e212eca8 100644
--- a/net/ipv6/ip6_vti.c
+++ b/net/ipv6/ip6_vti.c
@@ -1016,6 +1016,7 @@ static struct rtnl_link_ops vti6_link_ops __read_mostly = {
 	.changelink	= vti6_changelink,
 	.get_size	= vti6_get_size,
 	.fill_info	= vti6_fill_info,
+	.get_link_net	= ip6_tnl_get_link_net,
 };
 
 static void __net_exit vti6_destroy_tunnels(struct vti6_net *ip6n)
diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c
index 213546bd6d5d..3cc197c72b59 100644
--- a/net/ipv6/sit.c
+++ b/net/ipv6/sit.c
@@ -1763,6 +1763,7 @@ static struct rtnl_link_ops sit_link_ops __read_mostly = {
 	.get_size	= ipip6_get_size,
 	.fill_info	= ipip6_fill_info,
 	.dellink	= ipip6_dellink,
+	.get_link_net	= ip_tunnel_get_link_net,
 };
 
 static struct xfrm_tunnel sit_handler __read_mostly = {
-- 
2.2.2

^ permalink raw reply related

* [PATCH net-next v5 0/4] netns: allow to identify peer netns
From: Nicolas Dichtel @ 2015-01-15 14:11 UTC (permalink / raw)
  To: netdev, containers, linux-kernel, linux-api
  Cc: davem, ebiederm, stephen, akpm, luto, cwang
In-Reply-To: <87wq7g831b.fsf@x220.int.ebiederm.org>


The goal of this serie is to be able to multicast netlink messages with an
attribute that identify a peer netns.
This is needed by the userland to interpret some information contained in
netlink messages (like IFLA_LINK value, but also some other attributes in case
of x-netns netdevice (see also
http://thread.gmane.org/gmane.linux.network/315933/focus=316064 and
http://thread.gmane.org/gmane.linux.kernel.containers/28301/focus=4239)).

Ids of peer netns can be set by userland via a new rtnl cmd RTM_NEWNSID. When
the kernel needs an id for a peer (for example when advertising a new x-netns
interface via netlink), if the user didn't allocate an id, one will be
automatically allocated.
These ids are stored per netns and are local (ie only valid in the netns where
they are set). To avoid allocating an int for each peer netns, I use
idr_for_each() to retrieve the id of a peer netns. Note that it will be possible
to add a table (struct net -> id) later to optimize this lookup if needed.

Patch 1/4 introduces the rtnetlink API mechanism to set and get these ids.
Patch 2/4 and 3/4 implements an example of how to use these ids when advertising
information about a x-netns interface.
And patch 4/4 shows that the netlink messages can be symetric between a GET and
a SET.

iproute2 patches are available, I can send them on demand.

Here is a small screenshot to show how it can be used by userland.

# Initialization:
$ ip netns add foo
$ ip netns del foo
$ ip netns
$ touch /var/run/netns/init_net
$ mount --bind /proc/1/ns/net /var/run/netns/init_net
$ ip netns add foo
$ ip -n foo netns
foo
init_net
$ ip -n foo netns set init_net 0
$ ip -n foo netns set foo 1

# Only netns seen from foo have an id:
$ ip netns
foo
init_net
$ ip -n foo netns
foo (id: 1)
init_net (id: 0)

# Add a 4in4 x-netns interface with a link-netnsid option and check the dump:
$ ip -n foo link add ipip1 link-netnsid 0 type ipip remote 10.16.0.121 local 10.16.0.249
$ ip -n foo link ls ipip1
6: ipip1@NONE: <POINTOPOINT,NOARP> mtu 1480 qdisc noop state DOWN mode DEFAULT group default 
    link/ipip 10.16.0.249 peer 10.16.0.121 link-netnsid 0
# The parameter link-netnsid shows us where the interface sends and receives
# packets (and thus we know where encapsulated addresses are set).

# Add a 4in4 x-netns interface without a link-netnsid option and check that an
# id is allocated in init_net for foo
$ ip netns
foo
init_net
$ ip -n foo link add ipip2 type ipip remote 10.16.0.121 local 10.16.0.249
$ ip -n foo link set ipip2 netns init_net
$ ip link ls ipip2
7: ipip2@NONE: <POINTOPOINT,NOARP> mtu 1480 qdisc noop state DOWN mode DEFAULT group default 
    link/ipip 10.16.0.249 peer 10.16.0.121 link-netnsid 0
$ ip netns
foo (id: 0)
init_net


v4 -> v5:
  use rtnetlink instead of genetlink
  allocate automatically an id if user didn't assign one
  rename include/uapi/linux/netns.h to include/uapi/linux/net_namespace.h
  add vxlan in patch #3

RFCv3 -> v4:
  rebase on net-next
  add copyright text in the new netns.h file

RFCv2 -> RFCv3:
  ids are now defined by userland (via netlink). Ids are stored in each netns
  (and they are local to this netns).
  add get_link_net support for ip6 tunnels
  netnsid is now a s32 instead of a u32

RFCv1 -> RFCv2:
  remove useless ()
  ids are now stored in the user ns. It's possible to get an id for a peer netns
  only if the current netns and the peer netns have the same user ns parent.

 MAINTAINERS                        |   1 +
 drivers/net/vxlan.c                |   8 ++
 include/net/ip6_tunnel.h           |   1 +
 include/net/ip_tunnels.h           |   1 +
 include/net/net_namespace.h        |   4 +
 include/net/rtnetlink.h            |   2 +
 include/uapi/linux/Kbuild          |   1 +
 include/uapi/linux/if_link.h       |   1 +
 include/uapi/linux/net_namespace.h |  23 ++++
 include/uapi/linux/rtnetlink.h     |   5 +
 net/core/net_namespace.c           | 210 +++++++++++++++++++++++++++++++++++++
 net/core/rtnetlink.c               |  38 ++++++-
 net/ipv4/ip_gre.c                  |   2 +
 net/ipv4/ip_tunnel.c               |   8 ++
 net/ipv4/ip_vti.c                  |   1 +
 net/ipv4/ipip.c                    |   1 +
 net/ipv6/ip6_gre.c                 |   1 +
 net/ipv6/ip6_tunnel.c              |   9 ++
 net/ipv6/ip6_vti.c                 |   1 +
 net/ipv6/sit.c                     |   1 +
 20 files changed, 316 insertions(+), 3 deletions(-)

Comments are welcome.

Regards,
Nicolas

^ permalink raw reply

* [PATCH net-next v5 2/4] rtnl: add link netns id to interface messages
From: Nicolas Dichtel @ 2015-01-15 14:11 UTC (permalink / raw)
  To: netdev, containers, linux-kernel, linux-api
  Cc: davem, ebiederm, stephen, akpm, luto, cwang, Nicolas Dichtel
In-Reply-To: <1421331078-21622-1-git-send-email-nicolas.dichtel@6wind.com>

This patch adds a new attribute (IFLA_LINK_NETNSID) which contains the 'link'
netns id when this netns is different from the netns where the interface
stands (for example for x-net interfaces like ip tunnels).
With this attribute, it's possible to interpret correctly all advertised
information (like IFLA_LINK, etc.).

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
---
 include/net/rtnetlink.h      |  2 ++
 include/uapi/linux/if_link.h |  1 +
 net/core/rtnetlink.c         | 13 +++++++++++++
 3 files changed, 16 insertions(+)

diff --git a/include/net/rtnetlink.h b/include/net/rtnetlink.h
index e21b9f9653c0..6c6d5393fc34 100644
--- a/include/net/rtnetlink.h
+++ b/include/net/rtnetlink.h
@@ -46,6 +46,7 @@ static inline int rtnl_msg_family(const struct nlmsghdr *nlh)
  *			    to create when creating a new device.
  *	@get_num_rx_queues: Function to determine number of receive queues
  *			    to create when creating a new device.
+ *	@get_link_net: Function to get the i/o netns of the device
  */
 struct rtnl_link_ops {
 	struct list_head	list;
@@ -93,6 +94,7 @@ struct rtnl_link_ops {
 	int			(*fill_slave_info)(struct sk_buff *skb,
 						   const struct net_device *dev,
 						   const struct net_device *slave_dev);
+	struct net		*(*get_link_net)(const struct net_device *dev);
 };
 
 int __rtnl_link_register(struct rtnl_link_ops *ops);
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index 2a8380edbb7e..0deee3eeddbf 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -146,6 +146,7 @@ enum {
 	IFLA_PHYS_PORT_ID,
 	IFLA_CARRIER_CHANGES,
 	IFLA_PHYS_SWITCH_ID,
+	IFLA_LINK_NETNSID,
 	__IFLA_MAX
 };
 
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 6a6cdade1676..ab78ba9a34e8 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -875,6 +875,7 @@ static noinline size_t if_nlmsg_size(const struct net_device *dev,
 	       + nla_total_size(1) /* IFLA_OPERSTATE */
 	       + nla_total_size(1) /* IFLA_LINKMODE */
 	       + nla_total_size(4) /* IFLA_CARRIER_CHANGES */
+	       + nla_total_size(4) /* IFLA_LINK_NETNSID */
 	       + nla_total_size(ext_filter_mask
 			        & RTEXT_FILTER_VF ? 4 : 0) /* IFLA_NUM_VF */
 	       + rtnl_vfinfo_size(dev, ext_filter_mask) /* IFLA_VFINFO_LIST */
@@ -1169,6 +1170,18 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb, struct net_device *dev,
 			goto nla_put_failure;
 	}
 
+	if (dev->rtnl_link_ops &&
+	    dev->rtnl_link_ops->get_link_net) {
+		struct net *link_net = dev->rtnl_link_ops->get_link_net(dev);
+
+		if (!net_eq(dev_net(dev), link_net)) {
+			int id = peernet2id(dev_net(dev), link_net);
+
+			if (nla_put_s32(skb, IFLA_LINK_NETNSID, id))
+				goto nla_put_failure;
+		}
+	}
+
 	if (!(af_spec = nla_nest_start(skb, IFLA_AF_SPEC)))
 		goto nla_put_failure;
 
-- 
2.2.2

^ permalink raw reply related

* [PATCH net-next v5 4/4] rtnl: allow to create device with IFLA_LINK_NETNSID set
From: Nicolas Dichtel @ 2015-01-15 14:11 UTC (permalink / raw)
  To: netdev, containers, linux-kernel, linux-api
  Cc: davem, ebiederm, stephen, akpm, luto, cwang, Nicolas Dichtel
In-Reply-To: <1421331078-21622-1-git-send-email-nicolas.dichtel@6wind.com>

This patch adds the ability to create a netdevice in a specified netns and
then move it into the final netns. In fact, it allows to have a symetry between
get and set rtnl messages.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
---
 net/core/rtnetlink.c | 25 ++++++++++++++++++++++---
 1 file changed, 22 insertions(+), 3 deletions(-)

diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index ab78ba9a34e8..b2f6d8285a24 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -1247,6 +1247,7 @@ static const struct nla_policy ifla_policy[IFLA_MAX+1] = {
 	[IFLA_PHYS_PORT_ID]	= { .type = NLA_BINARY, .len = MAX_PHYS_ITEM_ID_LEN },
 	[IFLA_CARRIER_CHANGES]	= { .type = NLA_U32 },  /* ignored */
 	[IFLA_PHYS_SWITCH_ID]	= { .type = NLA_BINARY, .len = MAX_PHYS_ITEM_ID_LEN },
+	[IFLA_LINK_NETNSID]	= { .type = NLA_S32 },
 };
 
 static const struct nla_policy ifla_info_policy[IFLA_INFO_MAX+1] = {
@@ -2020,7 +2021,7 @@ replay:
 		struct nlattr *slave_attr[m_ops ? m_ops->slave_maxtype + 1 : 0];
 		struct nlattr **data = NULL;
 		struct nlattr **slave_data = NULL;
-		struct net *dest_net;
+		struct net *dest_net, *link_net = NULL;
 
 		if (ops) {
 			if (ops->maxtype && linkinfo[IFLA_INFO_DATA]) {
@@ -2126,7 +2127,18 @@ replay:
 		if (IS_ERR(dest_net))
 			return PTR_ERR(dest_net);
 
-		dev = rtnl_create_link(dest_net, ifname, name_assign_type, ops, tb);
+		if (tb[IFLA_LINK_NETNSID]) {
+			int id = nla_get_s32(tb[IFLA_LINK_NETNSID]);
+
+			link_net = get_net_ns_by_id(dest_net, id);
+			if (!link_net) {
+				err =  -EINVAL;
+				goto out;
+			}
+		}
+
+		dev = rtnl_create_link(link_net ? : dest_net, ifname,
+				       name_assign_type, ops, tb);
 		if (IS_ERR(dev)) {
 			err = PTR_ERR(dev);
 			goto out;
@@ -2154,9 +2166,16 @@ replay:
 			}
 		}
 		err = rtnl_configure_link(dev, ifm);
-		if (err < 0)
+		if (err < 0) {
 			unregister_netdevice(dev);
+			goto out;
+		}
+
+		if (link_net)
+			err = dev_change_net_namespace(dev, dest_net, ifname);
 out:
+		if (link_net)
+			put_net(link_net);
 		put_net(dest_net);
 		return err;
 	}
-- 
2.2.2

^ permalink raw reply related

* [PATCH 2/2] vxlan: synchronize <linux/if_link.h> header
From: Thomas Graf @ 2015-01-15 13:54 UTC (permalink / raw)
  To: stephen; +Cc: netdev
In-Reply-To: <cover.1421329802.git.tgraf@suug.ch>

Signed-off-by: Thomas Graf <tgraf@suug.ch>
---
 include/linux/if_link.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/include/linux/if_link.h b/include/linux/if_link.h
index 167ec34..0b33d0a 100644
--- a/include/linux/if_link.h
+++ b/include/linux/if_link.h
@@ -368,6 +368,9 @@ enum {
 	IFLA_VXLAN_UDP_CSUM,
 	IFLA_VXLAN_UDP_ZERO_CSUM6_TX,
 	IFLA_VXLAN_UDP_ZERO_CSUM6_RX,
+	IFLA_VXLAN_REMCSUM_TX,
+	IFLA_VXLAN_REMCSUM_RX,
+	IFLA_VXLAN_GBP,
 	__IFLA_VXLAN_MAX
 };
 #define IFLA_VXLAN_MAX	(__IFLA_VXLAN_MAX - 1)
-- 
1.9.3

^ permalink raw reply related

* [PATCH 1/2] vxlan: Group policy extension
From: Thomas Graf @ 2015-01-15 13:54 UTC (permalink / raw)
  To: stephen; +Cc: netdev
In-Reply-To: <cover.1421329802.git.tgraf@suug.ch>

Signed-off-by: Thomas Graf <tgraf@suug.ch>
---
 ip/iplink_vxlan.c     | 11 +++++++++++
 man/man8/ip-link.8.in | 45 +++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 56 insertions(+)

diff --git a/ip/iplink_vxlan.c b/ip/iplink_vxlan.c
index 9cc3ec3..473ff97 100644
--- a/ip/iplink_vxlan.c
+++ b/ip/iplink_vxlan.c
@@ -30,6 +30,7 @@ static void print_explain(FILE *f)
 	fprintf(f, "                 [ [no]l2miss ] [ [no]l3miss ]\n");
 	fprintf(f, "                 [ ageing SECONDS ] [ maxaddress NUMBER ]\n");
 	fprintf(f, "                 [ [no]udpcsum ] [ [no]udp6zerocsumtx ] [ [no]udp6zerocsumrx ]\n");
+	fprintf(f, "                 [ gbp ]\n");
 	fprintf(f, "\n");
 	fprintf(f, "Where: VNI := 0-16777215\n");
 	fprintf(f, "       ADDR := { IP_ADDRESS | any }\n");
@@ -68,6 +69,7 @@ static int vxlan_parse_opt(struct link_util *lu, int argc, char **argv,
 	__u8 udpcsum = 0;
 	__u8 udp6zerocsumtx = 0;
 	__u8 udp6zerocsumrx = 0;
+	__u8 gbp = 0;
 	int dst_port_set = 0;
 	struct ifla_vxlan_port_range range = { 0, 0 };
 
@@ -197,6 +199,8 @@ static int vxlan_parse_opt(struct link_util *lu, int argc, char **argv,
 			udp6zerocsumrx = 1;
 		} else if (!matches(*argv, "noudp6zerocsumrx")) {
 			udp6zerocsumrx = 0;
+		} else if (!matches(*argv, "gbp")) {
+			gbp = 1;
 		} else if (matches(*argv, "help") == 0) {
 			explain();
 			return -1;
@@ -268,6 +272,10 @@ static int vxlan_parse_opt(struct link_util *lu, int argc, char **argv,
 	if (dstport)
 		addattr16(n, 1024, IFLA_VXLAN_PORT, htons(dstport));
 
+	if (gbp)
+		addattr_l(n, 1024, IFLA_VXLAN_GBP, NULL, 0);
+
+
 	return 0;
 }
 
@@ -398,6 +406,9 @@ static void vxlan_print_opt(struct link_util *lu, FILE *f, struct rtattr *tb[])
 	if (tb[IFLA_VXLAN_UDP_ZERO_CSUM6_RX] &&
 	    rta_getattr_u8(tb[IFLA_VXLAN_UDP_ZERO_CSUM6_RX]))
 		fputs("udp6zerocsumrx ", f);
+
+	if (tb[IFLA_VXLAN_GBP])
+		fputs("gbp ", f);
 }
 
 static void vxlan_print_help(struct link_util *lu, int argc, char **argv,
diff --git a/man/man8/ip-link.8.in b/man/man8/ip-link.8.in
index 1209b55..be52ac6 100644
--- a/man/man8/ip-link.8.in
+++ b/man/man8/ip-link.8.in
@@ -276,6 +276,8 @@ the following additional arguments are supported:
 .BI ageing " SECONDS "
 .R " ] [ "
 .BI maxaddress " NUMBER "
+.R " ] [ "
+.B gbp
 .R " ]"
 
 .in +8
@@ -348,6 +350,49 @@ are entered into the VXLAN device forwarding database.
 .BI maxaddress " NUMBER"
 - specifies the maximum number of FDB entries.
 
+.sp
+.B gbp
+- enables the Group Policy extension (VXLAN-GBP).
+
+.in +4
+Allows to transport group policy context across VXLAN network peers.
+If enabled, includes the mark of a packet in the VXLAN header for outgoing
+packets and fills the packet mark based on the information found in the
+VXLAN header for incomming packets.
+
+Format of upper 16 bits of packet mark (flags);
+
+.in +2
++-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+.br
+|-|-|-|-|-|-|-|-|-|D|-|-|A|-|-|-|
+.br
++-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+.B D :=
+Don't Learn bit. When set, this bit indicates that the egress
+VTEP MUST NOT learn the source address of the encapsulated frame.
+
+.B A :=
+Indicates that the group policy has already been applied to
+this packet. Policies MUST NOT be applied by devices when the A bit is set.
+.in -2
+
+Format of lower 16 bits of packet mark (policy ID):
+
+.in +2
++-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+.br
+|        Group Policy ID        |
+.br
++-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+.in -2
+
+Example:
+  iptables -A OUTPUT [...] -j MARK --set-mark 0x800FF
+
+.in -4
+
 .in -8
 
 .TP
-- 
1.9.3

^ permalink raw reply related

* [PATCH 0/2 net-next] iproute2: VXLAN Group Policy extension
From: Thomas Graf @ 2015-01-15 13:54 UTC (permalink / raw)
  To: stephen; +Cc: netdev

Stephen,

Against net-next branch. I assume you only want to apply patch 1/2 after
the next header sync. I've included 2/2 if anyone wants to use it now and
needs the header bits

Thomas Graf (2):
  vxlan: Group policy extension
  vxlan: synchronize <linux/if_link.h> header

 include/linux/if_link.h |  3 +++
 ip/iplink_vxlan.c       | 11 +++++++++++
 man/man8/ip-link.8.in   | 45 +++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 59 insertions(+)

-- 
1.9.3

^ permalink raw reply

* Re: [PATCH 1/3] rtlwifi: btcoexist: Remove some unused functions
From: Kalle Valo @ 2015-01-15 13:49 UTC (permalink / raw)
  To: Larry Finger
  Cc: Rickard Strandqvist, Masanari Iida, Sachin Kamat, linux-wireless,
	netdev, linux-kernel
In-Reply-To: <54B15D57.5060709@lwfinger.net>

Larry Finger <Larry.Finger@lwfinger.net> writes:

> On 01/10/2015 10:24 AM, Rickard Strandqvist wrote:
>> Removes some functions that are not used anywhere:
>> ex_halbtc8821a1ant_periodical() ex_halbtc8821a1ant_pnp_notify()
>> ex_halbtc8821a1ant_halt_notify() ex_halbtc8821a1ant_bt_info_notify()
>> ex_halbtc8821a1ant_special_packet_notify() ex_halbtc8821a1ant_connect_notify()
>> ex_halbtc8821a1ant_scan_notify() ex_halbtc8821a1ant_lps_notify()
>> ex_halbtc8821a1ant_ips_notify() ex_halbtc8821a1ant_display_coex_info()
>> ex_halbtc8821a1ant_init_coex_dm() ex_halbtc8821a1ant_init_hwconfig()
>>
>> This was partially found by using a static code analysis program called cppcheck.
>>
>> Signed-off-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se>
>
> NACK!!!!!!!!!
>
> I told you to stay away from these routines until I finish my update.
> Not only might you remove some functions that I will be needing later,
> but you run the risk of merge complications.
>
> Kalle: Please ignore EVERYTHING from this person. Obviously, he is
> incapable of understanding even the simplest instructions.

Yeah, I have to agree with that. I think he is just too fixated creating
the patches automatically.

I have dropped the patches.

-- 
Kalle Valo

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox