Netdev List
* Re: [RFC iproute2 0/8] RDMA tool
From: Jiri Pirko @ 2017-05-06 10:40 UTC
  To: Bart Van Assche
  Cc: stephen@networkplumber.org, leon@kernel.org, dledford@redhat.com,
	leonro@mellanox.com, jiri@mellanox.com,
	linux-rdma@vger.kernel.org, ram.amrani@cavium.com,
	sagi@grimberg.me, ogerlitz@mellanox.com,
	dennis.dalessandro@intel.com, hch@lst.de, netdev@vger.kernel.org,
	jgunthorpe@obsidianresearch.com, ariela@mellanox.com
In-Reply-To: <1493921453.2692.6.camel@sandisk.com>

Thu, May 04, 2017 at 08:10:54PM CEST, Bart.VanAssche@sandisk.com wrote:
>On Thu, 2017-05-04 at 21:02 +0300, Leon Romanovsky wrote:
>> Following our discussion both in mailing list [1] and at the LPC 2016 [2],
>> we would like to propose this RDMA tool to be part of iproute2 package
>> and finally improve this situation.
>
>Hello Leon,
>
>Although I really appreciate your work: can you clarify why you would like to
>add *RDMA* functionality to an *IP routing* tool? I haven't found any motivation
>for adding RDMA functionality to iproute2 in [1].

Bart, please realize that iproute2 is much more than "*IP routing* tool".
I understand you got confused by the name. Please see sources. Your comment
is totally pointless...


* Re: [RFC iproute2 0/8] RDMA tool
From: Jiri Pirko @ 2017-05-06 10:48 UTC
  To: Leon Romanovsky
  Cc: Jiri Benc, Stephen Hemminger, Doug Ledford, Jiri Pirko,
	Ariel Almog, Dennis Dalessandro, Ram Amrani, Bart Van Assche,
	Sagi Grimberg, Jason Gunthorpe, Christoph Hellwig, Or Gerlitz,
	Linux RDMA, Linux Netdev
In-Reply-To: <20170505131754.GH22833-U/DQcQFIOTAAJjI8aNfphQ@public.gmane.org>

Fri, May 05, 2017 at 03:17:54PM CEST, leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org wrote:
>On Fri, May 05, 2017 at 08:54:57AM +0200, Jiri Benc wrote:
>> On Thu,  4 May 2017 21:02:08 +0300, Leon Romanovsky wrote:
>> > In order to close the object model, ensure reuse of existing code and make this
>> > tool usable from day one, we decided to implement wrappers over the legacy sysfs
>> > interface before implementing netlink functionality. As a nice bonus, this will
>> > allow the tool to be used with old kernels too.
>>
>> This sounds wrong. We don't support legacy ioctl interface for the 'ip'
>> command, either. I think rdma should be converted to netlink first and
>> the new tool should only use netlink.
>
>RDMA is in a slightly different situation than the "ip" tool was. "ip" was implemented
>when tools like ifconfig existed. It allowed old and new systems to be
>configured to some degree. In the RDMA community, there are no tools similar to
>"ifconfig". A netlink-only implementation will leave old systems without any
>common tool at all.
>
>As an upstream-oriented person, I'm personally fine with that, but I would still
>like to get wider agreement/disagreement on that before removing the sysfs
>parsing logic from rdmatool.

I tend to agree with Jiri Benc. I fear that supporting sysfs + netlink
api later on for the same things will make the code unnecessarily complex.
Also, the legacy sysfs will most likely stay there forever, so there will
be no actual motivation to port the existing things to the new netlink
api.

For prototyping purposes, I believe that what you did makes perfect
sense. But for the actual mergeable version, my feeling is that we need
to strictly stick with the new netlink rdma interface and just forget about
the old sysfs one. Distros would have to backport the new kernel
rdma netlink api.

Yes, this will be a little bit more painful at the beginning, but in the
long run, I believe it will save some severe headaches.

* Re: [PATCH v4 net-next 0/2] rtnetlink: Updates to rtnetlink_event()
From: Jiri Pirko @ 2017-05-06 10:51 UTC
  To: Vladislav Yasevich; +Cc: netdev, roopa, dsa, Vladislav Yasevich
In-Reply-To: <1494017569-12869-1-git-send-email-vyasevic@redhat.com>

Fri, May 05, 2017 at 10:52:47PM CEST, vyasevich@gmail.com wrote:
>This version 4 series came out of the conversation that started
>as a result of my first attempt to add netdevice event info to netlink messages.
>
>The first patch adds the IFLA_EVENT attribute to the netlink message.  It
>supports only the currently white-listed events.
>Like before, this is just an attribute that gets added to the rtnetlink
>message only when the message was generated as a result of a netdev event.
>In my case, this is necessary since I want to trap the NETDEV_NOTIFY_PEERS
>event (and possibly the NETDEV_RESEND_IGMP event) and perform certain actions
>in user space.  This is not possible since the messages generated as

What are you trying to do in userspace, if I may ask?


* [PATCH 0/2] KCM: Fine-tuning for three function implementations
From: SF Markus Elfring @ 2017-05-06 12:14 UTC
  To: netdev, Colin Ian King, David S. Miller, Jiri Slaby, Tom Herbert
  Cc: LKML, kernel-janitors

From: Markus Elfring <elfring@users.sourceforge.net>
Date: Sat, 6 May 2017 14:11:22 +0200

Two update suggestions from static source code analysis were
taken into account.

Markus Elfring (2):
  Replace three seq_puts() calls by seq_putc()
  Use seq_puts() in kcm_format_psock()

 net/kcm/kcmproc.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

-- 
2.12.2


* [PATCH 1/2] kcm: Replace three seq_puts() calls by seq_putc()
From: SF Markus Elfring @ 2017-05-06 12:15 UTC
  To: netdev, Colin Ian King, David S. Miller, Jiri Slaby, Tom Herbert
  Cc: LKML, kernel-janitors
In-Reply-To: <d9601df3-94c8-f15c-3a54-770604caf6a6@users.sourceforge.net>

From: Markus Elfring <elfring@users.sourceforge.net>
Date: Sat, 6 May 2017 13:53:41 +0200

Three single characters (line breaks) should be put into a sequence.
Thus use the corresponding function "seq_putc".

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
---
 net/kcm/kcmproc.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/kcm/kcmproc.c b/net/kcm/kcmproc.c
index bf75c9231cca..46b8b5f6c57f 100644
--- a/net/kcm/kcmproc.c
+++ b/net/kcm/kcmproc.c
@@ -116,7 +116,7 @@ static void kcm_format_mux_header(struct seq_file *seq)
 		   "Status");
 
 	/* XXX: pdsts header stuff here */
-	seq_puts(seq, "\n");
+	seq_putc(seq, '\n');
 }
 
 static void kcm_format_sock(struct kcm_sock *kcm, struct seq_file *seq,
@@ -146,7 +146,7 @@ static void kcm_format_sock(struct kcm_sock *kcm, struct seq_file *seq,
 	if (kcm->rx_wait)
 		seq_puts(seq, "RxWait ");
 
-	seq_puts(seq, "\n");
+	seq_putc(seq, '\n');
 }
 
 static void kcm_format_psock(struct kcm_psock *psock, struct seq_file *seq,
@@ -192,7 +192,7 @@ static void kcm_format_psock(struct kcm_psock *psock, struct seq_file *seq,
 			seq_puts(seq, "RdyRx ");
 	}
 
-	seq_puts(seq, "\n");
+	seq_putc(seq, '\n');
 }
 
 static void
-- 
2.12.2

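Why prefer seq_putc() for a bare newline? seq_puts() has to strlen() its argument and copy that many bytes, while seq_putc() appends a single character. Userspace stdio has the same pair of primitives, so the sketch below illustrates the change by analogy only: fputs() plays the role of seq_puts() and fputc() the role of seq_putc(). It is not kernel code; open_memstream() (POSIX.1-2008) stands in for a seq_file buffer, and render_status() is a hypothetical name. Both variants produce byte-identical output, which is the property the patch relies on.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: render a KCM-style status fragment into a heap buffer.
 * use_putc == 0: terminate with fputs("\n")  (analogue of seq_puts(seq, "\n"))
 * use_putc != 0: terminate with fputc('\n')  (analogue of seq_putc(seq, '\n'))
 * Returns a NUL-terminated string the caller must free(), or NULL.
 */
static char *render_status(int use_putc)
{
	char *buf = NULL;
	size_t len = 0;
	FILE *f = open_memstream(&buf, &len);

	if (!f)
		return NULL;
	fputs("RxWait ", f);
	if (use_putc)
		fputc('\n', f);	/* one character: no length scan needed */
	else
		fputs("\n", f);	/* one-character string: same bytes out */
	fclose(f);		/* fclose() finalizes buf and len */
	return buf;
}
```

Calling render_status(0) and render_status(1) should yield the same string, "RxWait " followed by a newline.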

* [PATCH 2/2] kcm: Use seq_puts() in kcm_format_psock()
From: SF Markus Elfring @ 2017-05-06 12:16 UTC
  To: netdev, Colin Ian King, David S. Miller, Jiri Slaby, Tom Herbert
  Cc: LKML, kernel-janitors
In-Reply-To: <d9601df3-94c8-f15c-3a54-770604caf6a6@users.sourceforge.net>

From: Markus Elfring <elfring@users.sourceforge.net>
Date: Sat, 6 May 2017 14:04:02 +0200

A string which does not contain a data format specification should be
put into a sequence. Thus use the corresponding function "seq_puts".

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
---
 net/kcm/kcmproc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/kcm/kcmproc.c b/net/kcm/kcmproc.c
index 46b8b5f6c57f..b59b46822d9e 100644
--- a/net/kcm/kcmproc.c
+++ b/net/kcm/kcmproc.c
@@ -182,7 +182,7 @@ static void kcm_format_psock(struct kcm_psock *psock, struct seq_file *seq,
 				seq_printf(seq, "RxWait=%u ",
 					   psock->strp.rx_need_bytes);
 			else
-				seq_printf(seq, "RxWait ");
+				seq_puts(seq, "RxWait ");
 		}
 	} else  {
 		if (psock->strp.rx_paused)
-- 
2.12.2

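The second patch applies the same idea one level up: seq_printf() runs its argument through the printf formatter, which is unnecessary when the string contains no conversion specification. Continuing the stdio analogy (fprintf() for seq_printf(), fputs() for seq_puts()), a sketch of the two branches in the hunk might look like this. render_rxwait() and its show_count parameter are hypothetical; the condition that selects the branch in kcm_format_psock() is not visible in the hunk, so it is passed in explicitly.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper modelled on the kcm_format_psock() branch:
 * the counted form needs a conversion, the bare form does not.
 * Returns a heap string the caller must free(), or NULL.
 */
static char *render_rxwait(int show_count, unsigned int need_bytes)
{
	char *buf = NULL;
	size_t len = 0;
	FILE *f = open_memstream(&buf, &len);

	if (!f)
		return NULL;
	if (show_count)
		fprintf(f, "RxWait=%u ", need_bytes);	/* genuinely needs formatting */
	else
		fputs("RxWait ", f);	/* constant text: copied verbatim */
	fclose(f);
	return buf;
}
```

A side benefit of the verbatim call is that a constant string containing a literal '%' can never be misread as a format directive.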

* Re: [RFC iproute2 0/8] RDMA tool
From: Bart Van Assche @ 2017-05-06 14:40 UTC
  To: jiri-rHqAuBHg3fBzbRFIqnYvSA@public.gmane.org
  Cc: leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org,
	jiri-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	ram.amrani-YGCgFSpz5w/QT0dZR+AlfA@public.gmane.org,
	sagi-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org,
	ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org,
	dennis.dalessandro-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org,
	hch-jcswGhMUV9g@public.gmane.org,
	netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org,
	stephen-OTpzqLSitTUnbdJkjeBofR2eb7JE58TQ@public.gmane.org,
	dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org,
	jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org,
	ariela-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org
In-Reply-To: <20170506104047.GC2017@nanopsycho>

On Sat, 2017-05-06 at 12:40 +0200, Jiri Pirko wrote:
> Thu, May 04, 2017 at 08:10:54PM CEST, Bart.VanAssche@sandisk.com wrote:
> > On Thu, 2017-05-04 at 21:02 +0300, Leon Romanovsky wrote:
> > > Following our discussion both in mailing list [1] and at the LPC 2016 [2],
> > > we would like to propose this RDMA tool to be part of iproute2 package
> > > and finally improve this situation.
> > 
> > Although I really appreciate your work: can you clarify why you would like to
> > add *RDMA* functionality to an *IP routing* tool? I haven't found any motivation
> > for adding RDMA functionality to iproute2 in [1].
> 
> Bart, please realize that iproute2 is much more than "*IP routing* tool".
> I understand you got confused by the name. Please see sources. Your comment
> is totally pointless...

I asked for a clarification that should have been in the cover letter but
was missing from it. So I think asking was the right thing to do, not
pointless. BTW, can you explain why you are using an e-mail address
that hides the fact that you are a Mellanox employee?

Bart.


* Re: [PATCH] net: dsa: loop: Check for memory allocation failure
From: Andrew Lunn @ 2017-05-06 14:45 UTC
  To: Christophe JAILLET
  Cc: vivien.didelot, f.fainelli, netdev, linux-kernel, kernel-janitors
In-Reply-To: <20170506052945.2639-1-christophe.jaillet@wanadoo.fr>

On Sat, May 06, 2017 at 07:29:45AM +0200, Christophe JAILLET wrote:
> If 'devm_kzalloc' fails, a NULL pointer will be dereferenced.
> Return -ENOMEM instead, as is done for another memory allocation just a
> few lines above.
> 
> Fixes: 98cd1552ea27 ("net: dsa: Mock-up driver")
> 
> Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>

Reviewed-by: Andrew Lunn <andrew@lunn.ch>

    Andrew

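The bug pattern fixed here is an unchecked allocation result. A minimal userspace sketch of the corrected shape follows; mock_probe(), struct mock_priv and the injectable allocator are hypothetical, and calloc() stands in for devm_kzalloc(). Unlike devm_kzalloc(), whose memory is released automatically when the device is unbound, the sketch's memory must be freed by the caller.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical private state for a mock switch driver. */
struct mock_priv {
	int port_count;
};

/* Allocator hook so a test can force the failure path. */
typedef void *(*zalloc_fn)(size_t size);

static void *zalloc_ok(size_t size) { return calloc(1, size); }
static void *zalloc_fail(size_t size) { (void)size; return NULL; }

/* Returns 0 on success or -ENOMEM, instead of dereferencing NULL. */
static int mock_probe(zalloc_fn zalloc, struct mock_priv **out)
{
	struct mock_priv *priv = zalloc(sizeof(*priv));

	if (!priv)
		return -ENOMEM;	/* the check the patch adds */
	priv->port_count = 4;	/* safe: priv is known to be non-NULL */
	*out = priv;
	return 0;
}
```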

* [PATCH] wil6210: use memdup_user
From: Geliang Tang @ 2017-05-06 15:42 UTC
  To: Maya Erez, Kalle Valo
  Cc: Geliang Tang, linux-wireless, wil6210, netdev, linux-kernel
In-Reply-To: <df8091d4c64a0d5a7ab0a5989f34552a5eebd15e.1493778999.git.geliangtang@gmail.com>

Use the memdup_user() helper instead of open-coding it, to simplify the code.

Signed-off-by: Geliang Tang <geliangtang@gmail.com>
---
 drivers/net/wireless/ath/wil6210/debugfs.c | 12 ++++--------
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/drivers/net/wireless/ath/wil6210/debugfs.c b/drivers/net/wireless/ath/wil6210/debugfs.c
index 5648ebb..5b0f9fc 100644
--- a/drivers/net/wireless/ath/wil6210/debugfs.c
+++ b/drivers/net/wireless/ath/wil6210/debugfs.c
@@ -795,15 +795,11 @@ static ssize_t wil_write_file_txmgmt(struct file *file, const char __user *buf,
 	struct wireless_dev *wdev = wil_to_wdev(wil);
 	struct cfg80211_mgmt_tx_params params;
 	int rc;
-	void *frame = kmalloc(len, GFP_KERNEL);
+	void *frame;
 
-	if (!frame)
-		return -ENOMEM;
-
-	if (copy_from_user(frame, buf, len)) {
-		kfree(frame);
-		return -EIO;
-	}
+	frame = memdup_user(buf, len);
+	if (IS_ERR(frame))
+		return PTR_ERR(frame);
 
 	params.buf = frame;
 	params.len = len;
-- 
2.9.3

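memdup_user() folds the allocate, copy, and free-on-failure steps into one call and reports failure through an error-encoded pointer rather than NULL, which is why callers switch from a NULL test to IS_ERR()/PTR_ERR(). The sketch below re-creates that contract in userspace under stated assumptions: ERR_PTR(), IS_ERR() and PTR_ERR() mimic the <linux/err.h> helpers, and mock_copy_from_user() is a hypothetical stand-in that treats a NULL source as a faulting user pointer. One observable effect of this conversion: a failed copy in wil_write_file_txmgmt() now returns -EFAULT (from memdup_user()) where the removed code returned -EIO.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Userspace re-creation of the <linux/err.h> idea: the top MAX_ERRNO
 * addresses are never valid pointers, so they can carry a -errno. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)(intptr_t)error; }
static inline long PTR_ERR(const void *ptr) { return (long)(intptr_t)ptr; }
static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical copy_from_user() stand-in: returns bytes NOT copied. */
static unsigned long mock_copy_from_user(void *to, const void *from, size_t n)
{
	if (!from)
		return n;	/* simulate a faulting user pointer */
	memcpy(to, from, n);
	return 0;
}

/* Same shape as memdup_user(): a fresh copy, or an ERR_PTR, never NULL. */
static void *mock_memdup_user(const void *src, size_t len)
{
	void *p = malloc(len);

	if (!p)
		return ERR_PTR(-ENOMEM);
	if (mock_copy_from_user(p, src, len)) {
		free(p);	/* the helper cleans up, not the caller */
		return ERR_PTR(-EFAULT);
	}
	return p;
}
```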

* [PATCH] net/hippi/rrunner: use memdup_user
From: Geliang Tang @ 2017-05-06 15:42 UTC
  To: Jes Sorensen; +Cc: Geliang Tang, linux-hippi, netdev, linux-kernel
In-Reply-To: <df8091d4c64a0d5a7ab0a5989f34552a5eebd15e.1493778999.git.geliangtang@gmail.com>

Use the memdup_user() helper instead of open-coding it, to simplify the code.

Signed-off-by: Geliang Tang <geliangtang@gmail.com>
---
 drivers/net/hippi/rrunner.c | 17 +++++++----------
 1 file changed, 7 insertions(+), 10 deletions(-)

diff --git a/drivers/net/hippi/rrunner.c b/drivers/net/hippi/rrunner.c
index 9b0d614..1ce6239 100644
--- a/drivers/net/hippi/rrunner.c
+++ b/drivers/net/hippi/rrunner.c
@@ -1616,17 +1616,14 @@ static int rr_ioctl(struct net_device *dev, struct ifreq *rq, int cmd)
 			return -EPERM;
 		}
 
-		image = kmalloc(EEPROM_WORDS * sizeof(u32), GFP_KERNEL);
-		oldimage = kmalloc(EEPROM_WORDS * sizeof(u32), GFP_KERNEL);
-		if (!image || !oldimage) {
-			error = -ENOMEM;
-			goto wf_out;
-		}
+		image = memdup_user(rq->ifr_data, EEPROM_BYTES);
+		if (IS_ERR(image))
+			return PTR_ERR(image);
 
-		error = copy_from_user(image, rq->ifr_data, EEPROM_BYTES);
-		if (error) {
-			error = -EFAULT;
-			goto wf_out;
+		oldimage = kmalloc(EEPROM_BYTES, GFP_KERNEL);
+		if (!oldimage) {
+			kfree(image);
+			return -ENOMEM;
 		}
 
 		if (rrpriv->fw_running){
-- 
2.9.3

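rr_ioctl() needs two buffers, so after the conversion the early-return style has to release the first one if the second allocation fails; the patch does this with kfree(image) before returning -ENOMEM. A userspace sketch of that ordering, with hypothetical names: prepare_images(), IMAGE_BYTES (a stand-in for EEPROM_BYTES), and an allocation hook that lets a test force the second allocation to fail. malloc() plus memcpy() stand in for memdup_user(), so its -EFAULT path is not modelled.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define IMAGE_BYTES 16	/* hypothetical stand-in for EEPROM_BYTES */

/* Test hook: allocations allowed to succeed before one fails; <0 = never. */
static int allocs_until_failure = -1;

static void *hooked_malloc(size_t n)
{
	if (allocs_until_failure == 0)
		return NULL;
	if (allocs_until_failure > 0)
		allocs_until_failure--;
	return malloc(n);
}

/* Duplicate the caller's image, then allocate scratch space for the old
 * image; if the second step fails, undo the first so nothing leaks. */
static int prepare_images(const void *user_image,
			  unsigned char **image, unsigned char **oldimage)
{
	*image = hooked_malloc(IMAGE_BYTES);	/* memdup_user() in the driver */
	if (!*image)
		return -ENOMEM;
	memcpy(*image, user_image, IMAGE_BYTES);

	*oldimage = hooked_malloc(IMAGE_BYTES);
	if (!*oldimage) {
		free(*image);	/* mirrors kfree(image) in the patch */
		*image = NULL;
		return -ENOMEM;
	}
	return 0;
}
```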

* [PATCH] wlcore: use memdup_user
From: Geliang Tang @ 2017-05-06 15:42 UTC
  To: Kalle Valo, Colin Ian King
  Cc: Geliang Tang, linux-wireless, netdev, linux-kernel
In-Reply-To: <df8091d4c64a0d5a7ab0a5989f34552a5eebd15e.1493778999.git.geliangtang@gmail.com>

Use the memdup_user() helper instead of open-coding it, to simplify the code.

Signed-off-by: Geliang Tang <geliangtang@gmail.com>
---
 drivers/net/wireless/ti/wlcore/debugfs.c | 13 +++----------
 1 file changed, 3 insertions(+), 10 deletions(-)

diff --git a/drivers/net/wireless/ti/wlcore/debugfs.c b/drivers/net/wireless/ti/wlcore/debugfs.c
index de7e2a5..a2cb408 100644
--- a/drivers/net/wireless/ti/wlcore/debugfs.c
+++ b/drivers/net/wireless/ti/wlcore/debugfs.c
@@ -1149,15 +1149,9 @@ static ssize_t dev_mem_write(struct file *file, const char __user *user_buf,
 	part.mem.start = *ppos;
 	part.mem.size = bytes;
 
-	buf = kmalloc(bytes, GFP_KERNEL);
-	if (!buf)
-		return -ENOMEM;
-
-	ret = copy_from_user(buf, user_buf, bytes);
-	if (ret) {
-		ret = -EFAULT;
-		goto err_out;
-	}
+	buf = memdup_user(user_buf, bytes);
+	if (IS_ERR(buf))
+		return PTR_ERR(buf);
 
 	mutex_lock(&wl->mutex);
 
@@ -1197,7 +1191,6 @@ static ssize_t dev_mem_write(struct file *file, const char __user *user_buf,
 	if (ret == 0)
 		*ppos += bytes;
 
-err_out:
 	kfree(buf);
 
 	return ((ret == 0) ? bytes : ret);
-- 
2.9.3


* [PATCH] xfrm: use memdup_user
From: Geliang Tang @ 2017-05-06 15:42 UTC
  To: Steffen Klassert, Herbert Xu, David S. Miller
  Cc: Geliang Tang, netdev, linux-kernel
In-Reply-To: <df8091d4c64a0d5a7ab0a5989f34552a5eebd15e.1493778999.git.geliangtang@gmail.com>

Use the memdup_user() helper instead of open-coding it, to simplify the code.

Signed-off-by: Geliang Tang <geliangtang@gmail.com>
---
 net/xfrm/xfrm_state.c | 11 +++--------
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index fc3c5aa..5780cda 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -2023,13 +2023,9 @@ int xfrm_user_policy(struct sock *sk, int optname, u8 __user *optval, int optlen
 	if (optlen <= 0 || optlen > PAGE_SIZE)
 		return -EMSGSIZE;
 
-	data = kmalloc(optlen, GFP_KERNEL);
-	if (!data)
-		return -ENOMEM;
-
-	err = -EFAULT;
-	if (copy_from_user(data, optval, optlen))
-		goto out;
+	data = memdup_user(optval, optlen);
+	if (IS_ERR(data))
+		return PTR_ERR(data);
 
 	err = -EINVAL;
 	rcu_read_lock();
@@ -2047,7 +2043,6 @@ int xfrm_user_policy(struct sock *sk, int optname, u8 __user *optval, int optlen
 		err = 0;
 	}
 
-out:
 	kfree(data);
 	return err;
 }
-- 
2.9.3

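In xfrm_user_policy() the length comes from userspace, and the check just above the hunk (optlen <= 0 || optlen > PAGE_SIZE) bounds it before memdup_user() allocates anything; the patch keeps that order. A hypothetical userspace sketch of the same prologue, assuming a 4 KiB page size and using malloc() plus memcpy() in place of memdup_user():

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096	/* assumption: 4 KiB pages */

/* Hypothetical: reject a caller-supplied length before it reaches the
 * allocator, then duplicate the buffer. Returns 0 and sets *out on
 * success, or a negative errno. */
static int dup_policy_blob(const void *optval, int optlen, void **out)
{
	void *data;

	if (optlen <= 0 || optlen > SKETCH_PAGE_SIZE)
		return -EMSGSIZE;	/* bounded before any allocation */

	data = malloc((size_t)optlen);	/* memdup_user() in the kernel */
	if (!data)
		return -ENOMEM;
	memcpy(data, optval, (size_t)optlen);
	*out = data;
	return 0;
}
```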

* [PATCH] yam: use memdup_user
From: Geliang Tang @ 2017-05-06 15:42 UTC
  To: Jean-Paul Roubelat; +Cc: Geliang Tang, linux-hams, netdev, linux-kernel
In-Reply-To: <df8091d4c64a0d5a7ab0a5989f34552a5eebd15e.1493778999.git.geliangtang@gmail.com>

Use the memdup_user() helper instead of open-coding it, to simplify the code.

Signed-off-by: Geliang Tang <geliangtang@gmail.com>
---
 drivers/net/hamradio/yam.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/drivers/net/hamradio/yam.c b/drivers/net/hamradio/yam.c
index b6891ad..7a7c522 100644
--- a/drivers/net/hamradio/yam.c
+++ b/drivers/net/hamradio/yam.c
@@ -976,12 +976,10 @@ static int yam_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)
 	case SIOCYAMSMCS:
 		if (netif_running(dev))
 			return -EINVAL;		/* Cannot change this parameter when up */
-		if ((ym = kmalloc(sizeof(struct yamdrv_ioctl_mcs), GFP_KERNEL)) == NULL)
-			return -ENOBUFS;
-		if (copy_from_user(ym, ifr->ifr_data, sizeof(struct yamdrv_ioctl_mcs))) {
-			kfree(ym);
-			return -EFAULT;
-		}
+		ym = memdup_user(ifr->ifr_data,
+				 sizeof(struct yamdrv_ioctl_mcs));
+		if (IS_ERR(ym))
+			return PTR_ERR(ym);
 		if (ym->bitrate > YAM_MAXBITRATE) {
 			kfree(ym);
 			return -EINVAL;
-- 
2.9.3

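For SIOCYAMSMCS the copied object has a fixed size and one field is validated after the copy, so the duplicate must still be freed on the -EINVAL path, as the retained kfree(ym) does. Note that an allocation failure now surfaces as -ENOMEM from memdup_user(), where the removed code returned -ENOBUFS. A userspace sketch follows; struct mcs_req, take_mcs_req() and SKETCH_MAXBITRATE are hypothetical stand-ins for struct yamdrv_ioctl_mcs, the ioctl branch, and YAM_MAXBITRATE (whose real value is not shown here).

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_MAXBITRATE 9600	/* hypothetical limit, not the driver's value */

/* Hypothetical stand-in for struct yamdrv_ioctl_mcs; only the
 * validated field is modelled. */
struct mcs_req {
	unsigned int bitrate;
};

/* Copy the request, then validate it; the copy is freed on every error
 * path, so the caller owns *out only on success. */
static int take_mcs_req(const struct mcs_req *user, struct mcs_req **out)
{
	struct mcs_req *ym = malloc(sizeof(*ym));	/* memdup_user() */

	if (!ym)
		return -ENOMEM;
	memcpy(ym, user, sizeof(*ym));
	if (ym->bitrate > SKETCH_MAXBITRATE) {
		free(ym);	/* mirrors the kept kfree(ym) */
		return -EINVAL;
	}
	*out = ym;
	return 0;
}
```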

* [PATCH RFC net-next 0/6] net: reducing memory footprint of network devices
From: David Ahern @ 2017-05-06 16:07 UTC
  To: netdev; +Cc: roopa, f.fainelli, nicolas.dichtel, David Ahern

As I have mentioned many times [1], at ~43+kB per instance the use of
net_devices does not scale for deployments needing 10,000+ devices. At
netconf 1.2 there was a discussion about using a net_device_common for
the minimal set of common attributes, with other structs built on top of
that one for "full" devices. It provided a means for the code to recognize
"non-standard" net_devices. Conceptually, that approach has its merits,
but it is not practical given the sweeping changes required to the code
base. More importantly, though, struct net_device is not the problem; it
weighs in at less than 2kB, so reorganizing the code base around a
refactored net_device is not going to solve the problem. The primary
issue is all of the initialization done *because* it is a struct
net_device -- kobject and sysfs and the protocols (e.g., ipv4, ipv6,
mpls, neighbors).

So, how do you keep the desired attributes of a net device -- network
addresses, xmit function, qdisc, netfilter rules, tcpdump -- while
lowering the overhead of a net_device instance and without sweeping
changes across net/ and drivers/net/?

This patch set introduces the concept of labeling net_devices as
"lightweight", first mentioned at netdev 1.1 [1]. Users have to opt
in to lightweight devices by passing a new attribute, IFLA_LWT_NETDEV,
in the new link request. This lightweight tag is meant for virtual
devices such as vlan, vrf, vti, and dummy, where the user expects to
create a lot of them and does not want the duplication of resources.
Each device type can always opt out of the lightweight label if necessary
by failing device creation.

Labeling a virtual device as "lightweight" reduces the footprint for
device creation from ~43kB to ~6kB. That reduction in memory is obtained
by:
1. no entry in sysfs
   - kobject in net_device.device is not initialized

2. no entry in procfs
   - no sysctl option for these devices

3. deferred ipv4, ipv6, mpls initialization
   - network layer must be enabled before an address can be assigned
     or mpls labels can be processed
   - enables what Florian called L2 only devices [2]
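
For scale, the quoted per-instance figures work out roughly as follows for the 10,000-device deployments mentioned above (decimal kB/MB; the inputs are the approximate figures from the text, so the results are approximate too):

```latex
\begin{aligned}
10\,000 \times 43\ \text{kB} &\approx 430\ \text{MB} \\
10\,000 \times 6\ \text{kB}  &\approx 60\ \text{MB} \\
\frac{43 - 6}{43} &\approx 0.86
\end{aligned}
```

That is, roughly an 86% reduction in per-device memory for lightweight devices.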

Once the core premise of a lightweight device is accepted, follow-on
patches can reduce the overhead of network initialization, e.g.,

1. remove devconf per device (ipv4 and ipv6)
   - lightweight devices use the default settings rather than replicate
     the same data for each device

2. reduce / remove / opt out of snmp mibs
   - snmp6_alloc_dev and icmpv6msg_mib_device specifically are heavy
     hitters

Patches can also be found here:
    https://github.com/dsahern/linux lwt-dev-rfc

And iproute2 here:
    https://github.com/dsahern/iproute2 lwt-dev

Example:
    ip li add foo lwd type vrf table 123

- creates VRF device 'foo' as a lightweight netdevice.


[1] http://www.netdevconf.org/1.1/proceedings/slides/ahern-aleksandrov-prabhu-scaling-network-cumulus.pdf
[2] https://www.spinics.net/lists/netdev/msg340808.html

David Ahern (6):
  net: Add accessor for kobject in a net_device
  net: Add flags argument to alloc_netdev_mqs
  net: Introduce IFF_LWT_NETDEV flag
  net: Do not initialize kobject for lightweight netdevs
  net: Delay initializations for lightweight devices
  net: add uapi for creating lightweight devices

 drivers/net/ethernet/mellanox/mlx5/core/ipoib.c |  2 +-
 drivers/net/ethernet/tile/tilegx.c              |  2 +-
 drivers/net/tun.c                               |  2 +-
 drivers/net/wireless/marvell/mwifiex/cfg80211.c |  2 +-
 include/linux/netdevice.h                       | 27 ++++++++--
 include/uapi/linux/if_link.h                    |  1 +
 net/batman-adv/sysfs.c                          | 13 ++++-
 net/bridge/br_if.c                              | 12 +++--
 net/bridge/br_sysfs_br.c                        | 17 +++---
 net/bridge/br_sysfs_if.c                        |  8 ++-
 net/core/dev.c                                  | 71 ++++++++++++++++++-------
 net/core/neighbour.c                            |  3 ++
 net/core/net-sysfs.c                            | 25 ++++++---
 net/core/rtnetlink.c                            | 10 +++-
 net/ethernet/eth.c                              |  2 +-
 net/ipv4/devinet.c                              | 18 ++++++-
 net/ipv6/addrconf.c                             |  9 ++++
 net/mac80211/iface.c                            |  2 +-
 net/mpls/af_mpls.c                              |  6 +++
 net/wireless/core.c                             | 15 ++++--
 20 files changed, 190 insertions(+), 57 deletions(-)

-- 
2.11.0 (Apple Git-81)

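Patch 1/6 routes every &dev->dev.kobj use through netdev_kobject() and adds NULL checks on the result at the call sites, even though the accessor as posted always returns a valid pointer. Those checks appear to anticipate a later patch in the series (plausibly 4/6, "Do not initialize kobject for lightweight netdevs") that returns NULL for lightweight devices; that is an assumption, since the later patch is not shown. A userspace mock of the intended pattern, with heavily simplified hypothetical types:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the kernel types. */
struct kobject {
	int refcount;
};

struct net_device {
	bool lightweight;	/* stands in for the proposed IFF_LWT_NETDEV */
	struct kobject kobj;	/* stands in for dev->dev.kobj */
};

/* Assumption: a later patch returns NULL when the kobject was never
 * initialized; the accessor in patch 1/6 always returns the pointer. */
static struct kobject *netdev_kobject(struct net_device *dev)
{
	return dev->lightweight ? NULL : &dev->kobj;
}

/* Caller pattern used throughout the series: skip sysfs work and
 * report success when there is no kobject to attach to. */
static int add_sysfs_entries(struct net_device *dev)
{
	struct kobject *kobj = netdev_kobject(dev);

	if (!kobj)
		return 0;	/* nothing to publish for a lightweight device */
	kobj->refcount++;	/* placeholder for sysfs_create_group() etc. */
	return 0;
}
```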

* [PATCH RFC net-next 1/6] net: Add accessor for kobject in a net_device
From: David Ahern @ 2017-05-06 16:07 UTC
  To: netdev; +Cc: roopa, f.fainelli, nicolas.dichtel, David Ahern
In-Reply-To: <20170506160734.47084-1-dsahern@gmail.com>

Signed-off-by: David Ahern <dsahern@gmail.com>
---
 include/linux/netdevice.h |  5 +++++
 net/batman-adv/sysfs.c    | 13 +++++++++--
 net/bridge/br_if.c        | 12 ++++++----
 net/bridge/br_sysfs_br.c  | 17 +++++++++-----
 net/bridge/br_sysfs_if.c  |  8 +++++--
 net/core/dev.c            | 57 ++++++++++++++++++++++++++++++++++-------------
 net/core/net-sysfs.c      | 11 +++++----
 net/wireless/core.c       | 15 +++++++++----
 8 files changed, 100 insertions(+), 38 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 9c23bd2efb56..305d2d42b349 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -4272,6 +4272,11 @@ static inline const char *netdev_reg_state(const struct net_device *dev)
 	return " (unknown)";
 }
 
+static inline struct kobject *netdev_kobject(struct net_device *dev)
+{
+	return &dev->dev.kobj;
+}
+
 __printf(3, 4)
 void netdev_printk(const char *level, const struct net_device *dev,
 		   const char *format, ...);
diff --git a/net/batman-adv/sysfs.c b/net/batman-adv/sysfs.c
index 0ae8b30e4eaa..a8a7294fc054 100644
--- a/net/batman-adv/sysfs.c
+++ b/net/batman-adv/sysfs.c
@@ -735,11 +735,14 @@ static struct batadv_attribute *batadv_vlan_attrs[] = {
 
 int batadv_sysfs_add_meshif(struct net_device *dev)
 {
-	struct kobject *batif_kobject = &dev->dev.kobj;
+	struct kobject *batif_kobject = netdev_kobject(dev);
 	struct batadv_priv *bat_priv = netdev_priv(dev);
 	struct batadv_attribute **bat_attr;
 	int err;
 
+	if (!batif_kobject)
+		return 0;
+
 	bat_priv->mesh_obj = kobject_create_and_add(BATADV_SYSFS_IF_MESH_SUBDIR,
 						    batif_kobject);
 	if (!bat_priv->mesh_obj) {
@@ -778,6 +781,9 @@ void batadv_sysfs_del_meshif(struct net_device *dev)
 	struct batadv_priv *bat_priv = netdev_priv(dev);
 	struct batadv_attribute **bat_attr;
 
+	if (!bat_priv->mesh_obj)
+		return;
+
 	for (bat_attr = batadv_mesh_attrs; *bat_attr; ++bat_attr)
 		sysfs_remove_file(bat_priv->mesh_obj, &((*bat_attr)->attr));
 
@@ -1132,10 +1138,13 @@ static struct batadv_attribute *batadv_batman_attrs[] = {
 
 int batadv_sysfs_add_hardif(struct kobject **hardif_obj, struct net_device *dev)
 {
-	struct kobject *hardif_kobject = &dev->dev.kobj;
+	struct kobject *hardif_kobject = netdev_kobject(dev);
 	struct batadv_attribute **bat_attr;
 	int err;
 
+	if (!hardif_kobject)
+		return 0;
+
 	*hardif_obj = kobject_create_and_add(BATADV_SYSFS_IF_BAT_SUBDIR,
 					     hardif_kobject);
 
diff --git a/net/bridge/br_if.c b/net/bridge/br_if.c
index 7f8d05cf9065..a5354436ada8 100644
--- a/net/bridge/br_if.c
+++ b/net/bridge/br_if.c
@@ -485,6 +485,7 @@ int br_add_if(struct net_bridge *br, struct net_device *dev)
 	struct net_bridge_port *p;
 	int err = 0;
 	unsigned br_hr, dev_hr;
+	struct kobject *kobj;
 	bool changed_addr;
 
 	/* Don't allow bridging non-ethernet like devices, or DSA-enabled
@@ -521,10 +522,13 @@ int br_add_if(struct net_bridge *br, struct net_device *dev)
 	if (err)
 		goto put_back;
 
-	err = kobject_init_and_add(&p->kobj, &brport_ktype, &(dev->dev.kobj),
-				   SYSFS_BRIDGE_PORT_ATTR);
-	if (err)
-		goto err1;
+	kobj = netdev_kobject(dev);
+	if (kobj) {
+		err = kobject_init_and_add(&p->kobj, &brport_ktype, kobj,
+					   SYSFS_BRIDGE_PORT_ATTR);
+		if (err)
+			goto err1;
+	}
 
 	err = br_sysfs_addif(p);
 	if (err)
diff --git a/net/bridge/br_sysfs_br.c b/net/bridge/br_sysfs_br.c
index 0b5dd607444c..f6439664ffea 100644
--- a/net/bridge/br_sysfs_br.c
+++ b/net/bridge/br_sysfs_br.c
@@ -917,10 +917,13 @@ static struct bin_attribute bridge_forward = {
  */
 int br_sysfs_addbr(struct net_device *dev)
 {
-	struct kobject *brobj = &dev->dev.kobj;
+	struct kobject *brobj = netdev_kobject(dev);
 	struct net_bridge *br = netdev_priv(dev);
 	int err;
 
+	if (!brobj)
+		return 0;
+
 	err = sysfs_create_group(brobj, &bridge_group);
 	if (err) {
 		pr_info("%s: can't create group %s/%s\n",
@@ -944,9 +947,9 @@ int br_sysfs_addbr(struct net_device *dev)
 	}
 	return 0;
  out3:
-	sysfs_remove_bin_file(&dev->dev.kobj, &bridge_forward);
+	sysfs_remove_bin_file(brobj, &bridge_forward);
  out2:
-	sysfs_remove_group(&dev->dev.kobj, &bridge_group);
+	sysfs_remove_group(brobj, &bridge_group);
  out1:
 	return err;
 
@@ -954,10 +957,12 @@ int br_sysfs_addbr(struct net_device *dev)
 
 void br_sysfs_delbr(struct net_device *dev)
 {
-	struct kobject *kobj = &dev->dev.kobj;
+	struct kobject *kobj = netdev_kobject(dev);
 	struct net_bridge *br = netdev_priv(dev);
 
 	kobject_put(br->ifobj);
-	sysfs_remove_bin_file(kobj, &bridge_forward);
-	sysfs_remove_group(kobj, &bridge_group);
+	if (kobj) {
+		sysfs_remove_bin_file(kobj, &bridge_forward);
+		sysfs_remove_group(kobj, &bridge_group);
+	}
 }
diff --git a/net/bridge/br_sysfs_if.c b/net/bridge/br_sysfs_if.c
index 5d5d413a6cf8..4256e78f6c9f 100644
--- a/net/bridge/br_sysfs_if.c
+++ b/net/bridge/br_sysfs_if.c
@@ -283,10 +283,14 @@ int br_sysfs_addif(struct net_bridge_port *p)
 {
 	struct net_bridge *br = p->br;
 	const struct brport_attribute **a;
+	struct kobject *br_kobj;
 	int err;
 
-	err = sysfs_create_link(&p->kobj, &br->dev->dev.kobj,
-				SYSFS_BRIDGE_PORT_LINK);
+	br_kobj = netdev_kobject(br->dev);
+	if (!br_kobj)
+		return 0;
+
+	err = sysfs_create_link(&p->kobj, br_kobj, SYSFS_BRIDGE_PORT_LINK);
 	if (err)
 		return err;
 
diff --git a/net/core/dev.c b/net/core/dev.c
index d07aa5ffb511..f166b3bf1895 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5910,22 +5910,33 @@ static int netdev_adjacent_sysfs_add(struct net_device *dev,
 			      struct net_device *adj_dev,
 			      struct list_head *dev_list)
 {
+	struct kobject *dev_kobj, *adj_kobj;
 	char linkname[IFNAMSIZ+7];
+	int rc = 0;
 
-	sprintf(linkname, dev_list == &dev->adj_list.upper ?
-		"upper_%s" : "lower_%s", adj_dev->name);
-	return sysfs_create_link(&(dev->dev.kobj), &(adj_dev->dev.kobj),
-				 linkname);
+	dev_kobj = netdev_kobject(dev);
+	adj_kobj = netdev_kobject(adj_dev);
+
+	if (dev_kobj && adj_kobj) {
+		sprintf(linkname, dev_list == &dev->adj_list.upper ?
+			"upper_%s" : "lower_%s", adj_dev->name);
+		rc = sysfs_create_link(dev_kobj, adj_kobj, linkname);
+	}
+	return rc;
 }
+
 static void netdev_adjacent_sysfs_del(struct net_device *dev,
 			       char *name,
 			       struct list_head *dev_list)
 {
+	struct kobject *kobj = netdev_kobject(dev);
 	char linkname[IFNAMSIZ+7];
 
-	sprintf(linkname, dev_list == &dev->adj_list.upper ?
-		"upper_%s" : "lower_%s", name);
-	sysfs_remove_link(&(dev->dev.kobj), linkname);
+	if (kobj) {
+		sprintf(linkname, dev_list == &dev->adj_list.upper ?
+			"upper_%s" : "lower_%s", name);
+		sysfs_remove_link(kobj, linkname);
+	}
 }
 
 static inline bool netdev_adjacent_is_neigh_list(struct net_device *dev,
@@ -5976,11 +5987,14 @@ static int __netdev_adjacent_dev_insert(struct net_device *dev,
 
 	/* Ensure that master link is always the first item in list. */
 	if (master) {
-		ret = sysfs_create_link(&(dev->dev.kobj),
-					&(adj_dev->dev.kobj), "master");
-		if (ret)
-			goto remove_symlinks;
+		struct kobject *dev_kobj = netdev_kobject(dev);
+		struct kobject *adj_kobj = netdev_kobject(adj_dev);
 
+		if (dev_kobj && adj_kobj) {
+			ret = sysfs_create_link(dev_kobj, adj_kobj, "master");
+			if (ret)
+				goto remove_symlinks;
+		}
 		list_add_rcu(&adj->list, dev_list);
 	} else {
 		list_add_tail_rcu(&adj->list, dev_list);
@@ -6025,8 +6039,12 @@ static void __netdev_adjacent_dev_remove(struct net_device *dev,
 		return;
 	}
 
-	if (adj->master)
-		sysfs_remove_link(&(dev->dev.kobj), "master");
+	if (adj->master) {
+		struct kobject *kobj = netdev_kobject(dev);
+
+		if (kobj)
+			sysfs_remove_link(kobj, "master");
+	}
 
 	if (netdev_adjacent_is_neigh_list(dev, adj_dev, dev_list))
 		netdev_adjacent_sysfs_del(dev, adj_dev->name, dev_list);
@@ -7665,6 +7683,7 @@ void netdev_run_todo(void)
 		rcu_barrier();
 
 	while (!list_empty(&list)) {
+		struct kobject *kobj;
 		struct net_device *dev
 			= list_first_entry(&list, struct net_device, todo_list);
 		list_del(&dev->todo_list);
@@ -7702,7 +7721,9 @@ void netdev_run_todo(void)
 		wake_up(&netdev_unregistering_wq);
 
 		/* Free network device */
-		kobject_put(&dev->dev.kobj);
+		kobj = netdev_kobject(dev);
+		if (kobj)
+			kobject_put(kobj);
 	}
 }
 
@@ -8071,6 +8092,7 @@ EXPORT_SYMBOL(unregister_netdev);
 
 int dev_change_net_namespace(struct net_device *dev, struct net *net, const char *pat)
 {
+	struct kobject *kobj;
 	int err;
 
 	ASSERT_RTNL();
@@ -8136,7 +8158,9 @@ int dev_change_net_namespace(struct net_device *dev, struct net *net, const char
 	dev_mc_flush(dev);
 
 	/* Send a netdev-removed uevent to the old namespace */
-	kobject_uevent(&dev->dev.kobj, KOBJ_REMOVE);
+	kobj = netdev_kobject(dev);
+	if (kobj)
+		kobject_uevent(kobj, KOBJ_REMOVE);
 	netdev_adjacent_del_links(dev);
 
 	/* Actually switch the network namespace */
@@ -8147,7 +8171,8 @@ int dev_change_net_namespace(struct net_device *dev, struct net *net, const char
 		dev->ifindex = dev_new_index(net);
 
 	/* Send a netdev-add uevent to the new namespace */
-	kobject_uevent(&dev->dev.kobj, KOBJ_ADD);
+	if (kobj)
+		kobject_uevent(kobj, KOBJ_ADD);
 	netdev_adjacent_add_links(dev);
 
 	/* Fixup kobjects */
diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index 65ea0ff4017c..9df53b688f5b 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -1390,10 +1390,13 @@ static int register_queue_kobjects(struct net_device *dev)
 	int error = 0, txq = 0, rxq = 0, real_rx = 0, real_tx = 0;
 
 #ifdef CONFIG_SYSFS
-	dev->queues_kset = kset_create_and_add("queues",
-	    NULL, &dev->dev.kobj);
-	if (!dev->queues_kset)
-		return -ENOMEM;
+	struct kobject *kobj = netdev_kobject(dev);
+
+	if (kobj) {
+		dev->queues_kset = kset_create_and_add("queues", NULL, kobj);
+		if (!dev->queues_kset)
+			return -ENOMEM;
+	}
 	real_rx = dev->real_num_rx_queues;
 #endif
 	real_tx = dev->real_num_tx_queues;
diff --git a/net/wireless/core.c b/net/wireless/core.c
index 83ea164f16b3..a73b3efc17b2 100644
--- a/net/wireless/core.c
+++ b/net/wireless/core.c
@@ -1122,6 +1122,7 @@ static int cfg80211_netdev_notifier_call(struct notifier_block *nb,
 	struct wireless_dev *wdev = dev->ieee80211_ptr;
 	struct cfg80211_registered_device *rdev;
 	struct cfg80211_sched_scan_request *pos, *tmp;
+	struct kobject *kobj;
 
 	if (!wdev)
 		return NOTIFY_DONE;
@@ -1160,9 +1161,12 @@ static int cfg80211_netdev_notifier_call(struct notifier_block *nb,
 		/* can only change netns with wiphy */
 		dev->features |= NETIF_F_NETNS_LOCAL;
 
-		if (sysfs_create_link(&dev->dev.kobj, &rdev->wiphy.dev.kobj,
-				      "phy80211")) {
-			pr_err("failed to add phy80211 symlink to netdev!\n");
+		kobj = netdev_kobject(dev);
+		if (kobj) {
+			if (sysfs_create_link(kobj, &rdev->wiphy.dev.kobj,
+					      "phy80211")) {
+				pr_err("failed to add phy80211 symlink to netdev!\n");
+			}
 		}
 		wdev->netdev = dev;
 #ifdef CONFIG_CFG80211_WEXT
@@ -1264,9 +1268,12 @@ static int cfg80211_netdev_notifier_call(struct notifier_block *nb,
 		 * remove and clean it up.
 		 */
 		if (!list_empty(&wdev->list)) {
+			struct kobject *kobj = netdev_kobject(dev);
+
 			nl80211_notify_iface(rdev, wdev,
 					     NL80211_CMD_DEL_INTERFACE);
-			sysfs_remove_link(&dev->dev.kobj, "phy80211");
+			if (kobj)
+				sysfs_remove_link(kobj, "phy80211");
 			list_del_rcu(&wdev->list);
 			rdev->devlist_generation++;
 			cfg80211_mlme_purge_registrations(wdev);
-- 
2.11.0 (Apple Git-81)

^ permalink raw reply related
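
The sketch below shows the accessor-plus-guard pattern this patch applies at each call site. It uses simplified stand-in types; toy_kobject, toy_net_device and toy_netdev_kobject() are hypothetical and are not kernel API. In this patch, netdev_kobject() still always returns &dev->dev.kobj. The guard only starts to matter once patch 4/6 makes it return NULL for lightweight devices, and the has_sysfs field models that case here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct kobject and struct net_device. */
struct toy_kobject {
	int refcount;
};

struct toy_net_device {
	bool has_sysfs;          /* false: device has no kobject to hand out */
	struct toy_kobject kobj;
};

/* Accessor in the spirit of netdev_kobject(): it may return NULL. */
static struct toy_kobject *toy_netdev_kobject(struct toy_net_device *dev)
{
	assert(dev != NULL);
	return dev->has_sysfs ? &dev->kobj : NULL;
}

/* Caller pattern from the patch: fetch once, test, then use. */
static void toy_put_device_kobject(struct toy_net_device *dev)
{
	struct toy_kobject *kobj = toy_netdev_kobject(dev);

	if (kobj)
		kobj->refcount--;
}
```

Callers that fetch the pointer once and test it, rather than dereferencing dev->dev.kobj directly, stay correct whatever the accessor later decides to return.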

* [PATCH RFC net-next 2/6] net: Add flags argument to alloc_netdev_mqs
From: David Ahern @ 2017-05-06 16:07 UTC (permalink / raw)
  To: netdev; +Cc: roopa, f.fainelli, nicolas.dichtel, David Ahern
In-Reply-To: <20170506160734.47084-1-dsahern@gmail.com>

The new flags argument is used in a later patch to pass in flags at create time.

Signed-off-by: David Ahern <dsahern@gmail.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/ipoib.c | 2 +-
 drivers/net/ethernet/tile/tilegx.c              | 2 +-
 drivers/net/tun.c                               | 2 +-
 drivers/net/wireless/marvell/mwifiex/cfg80211.c | 2 +-
 include/linux/netdevice.h                       | 7 ++++---
 net/core/dev.c                                  | 5 ++++-
 net/core/rtnetlink.c                            | 2 +-
 net/ethernet/eth.c                              | 2 +-
 net/mac80211/iface.c                            | 2 +-
 9 files changed, 15 insertions(+), 11 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/ipoib.c b/drivers/net/ethernet/mellanox/mlx5/core/ipoib.c
index 3c84e36af018..f5aaa92726a2 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/ipoib.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/ipoib.c
@@ -446,7 +446,7 @@ static struct net_device *mlx5_rdma_netdev_alloc(struct mlx5_core_dev *mdev,
 				  name, NET_NAME_UNKNOWN,
 				  setup,
 				  nch * MLX5E_MAX_NUM_TC,
-				  nch);
+				  nch, 0);
 	if (!netdev) {
 		mlx5_core_warn(mdev, "alloc_netdev_mqs failed\n");
 		goto free_mdev_resources;
diff --git a/drivers/net/ethernet/tile/tilegx.c b/drivers/net/ethernet/tile/tilegx.c
index 7c634bc75615..f38067e260bd 100644
--- a/drivers/net/ethernet/tile/tilegx.c
+++ b/drivers/net/ethernet/tile/tilegx.c
@@ -2198,7 +2198,7 @@ static void tile_net_dev_init(const char *name, const uint8_t *mac)
 	 * template, instantiated by register_netdev(), but not for us.
 	 */
 	dev = alloc_netdev_mqs(sizeof(*priv), name, NET_NAME_UNKNOWN,
-			       tile_net_setup, NR_CPUS, 1);
+			       tile_net_setup, NR_CPUS, 1, 0);
 	if (!dev) {
 		pr_err("alloc_netdev_mqs(%s) failed\n", name);
 		return;
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index bbd707b9ef7a..030621621ea8 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1804,7 +1804,7 @@ static int tun_set_iff(struct net *net, struct file *file, struct ifreq *ifr)
 
 		dev = alloc_netdev_mqs(sizeof(struct tun_struct), name,
 				       NET_NAME_UNKNOWN, tun_setup, queues,
-				       queues);
+				       queues, 0);
 
 		if (!dev)
 			return -ENOMEM;
diff --git a/drivers/net/wireless/marvell/mwifiex/cfg80211.c b/drivers/net/wireless/marvell/mwifiex/cfg80211.c
index 7ec06bf13413..38b6570ff1cd 100644
--- a/drivers/net/wireless/marvell/mwifiex/cfg80211.c
+++ b/drivers/net/wireless/marvell/mwifiex/cfg80211.c
@@ -2960,7 +2960,7 @@ struct wireless_dev *mwifiex_add_virtual_intf(struct wiphy *wiphy,
 
 	dev = alloc_netdev_mqs(sizeof(struct mwifiex_private *), name,
 			       name_assign_type, ether_setup,
-			       IEEE80211_NUM_ACS, 1);
+			       IEEE80211_NUM_ACS, 1, 0);
 	if (!dev) {
 		mwifiex_dbg(adapter, ERROR,
 			    "no memory available for netdevice\n");
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 305d2d42b349..f47c8712398a 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3699,13 +3699,14 @@ void ether_setup(struct net_device *dev);
 struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name,
 				    unsigned char name_assign_type,
 				    void (*setup)(struct net_device *),
-				    unsigned int txqs, unsigned int rxqs);
+				    unsigned int txqs, unsigned int rxqs,
+				    unsigned int flags);
 #define alloc_netdev(sizeof_priv, name, name_assign_type, setup) \
-	alloc_netdev_mqs(sizeof_priv, name, name_assign_type, setup, 1, 1)
+	alloc_netdev_mqs(sizeof_priv, name, name_assign_type, setup, 1, 1, 0)
 
 #define alloc_netdev_mq(sizeof_priv, name, name_assign_type, setup, count) \
 	alloc_netdev_mqs(sizeof_priv, name, name_assign_type, setup, count, \
-			 count)
+			 count, 0)
 
 int register_netdev(struct net_device *dev);
 void unregister_netdev(struct net_device *dev);
diff --git a/net/core/dev.c b/net/core/dev.c
index f166b3bf1895..48a0252037d5 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -7829,6 +7829,7 @@ void netdev_freemem(struct net_device *dev)
  * @setup: callback to initialize device
  * @txqs: the number of TX subqueues to allocate
  * @rxqs: the number of RX subqueues to allocate
+ * @flags: flags to 'or' with priv_flags
  *
  * Allocates a struct net_device with private data area for driver use
  * and performs basic initialization.  Also allocates subqueue structs
@@ -7837,7 +7838,8 @@ void netdev_freemem(struct net_device *dev)
 struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name,
 		unsigned char name_assign_type,
 		void (*setup)(struct net_device *),
-		unsigned int txqs, unsigned int rxqs)
+		unsigned int txqs, unsigned int rxqs,
+		unsigned int flags)
 {
 	struct net_device *dev;
 	size_t alloc_size;
@@ -7920,6 +7922,7 @@ struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name,
 	if (netif_alloc_rx_queues(dev))
 		goto free_all;
 #endif
+	dev->priv_flags |= flags;
 
 	strcpy(dev->name, name);
 	dev->name_assign_type = name_assign_type;
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index bcb0f610ee42..a4db1cd91c4a 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -2390,7 +2390,7 @@ struct net_device *rtnl_create_link(struct net *net,
 		num_rx_queues = ops->get_num_rx_queues();
 
 	dev = alloc_netdev_mqs(ops->priv_size, ifname, name_assign_type,
-			       ops->setup, num_tx_queues, num_rx_queues);
+			       ops->setup, num_tx_queues, num_rx_queues, 0);
 	if (!dev)
 		return ERR_PTR(-ENOMEM);
 
diff --git a/net/ethernet/eth.c b/net/ethernet/eth.c
index 1446810047f5..d8f489e134f0 100644
--- a/net/ethernet/eth.c
+++ b/net/ethernet/eth.c
@@ -389,7 +389,7 @@ struct net_device *alloc_etherdev_mqs(int sizeof_priv, unsigned int txqs,
 				      unsigned int rxqs)
 {
 	return alloc_netdev_mqs(sizeof_priv, "eth%d", NET_NAME_UNKNOWN,
-				ether_setup, txqs, rxqs);
+				ether_setup, txqs, rxqs, 0);
 }
 EXPORT_SYMBOL(alloc_etherdev_mqs);
 
diff --git a/net/mac80211/iface.c b/net/mac80211/iface.c
index 3bd5b81f5d81..54891601e3d1 100644
--- a/net/mac80211/iface.c
+++ b/net/mac80211/iface.c
@@ -1802,7 +1802,7 @@ int ieee80211_if_add(struct ieee80211_local *local, const char *name,
 
 		ndev = alloc_netdev_mqs(size + txq_size,
 					name, name_assign_type,
-					if_setup, txqs, 1);
+					if_setup, txqs, 1, 0);
 		if (!ndev)
 			return -ENOMEM;
 		dev_net_set(ndev, wiphy_net(local->hw.wiphy));
-- 
2.11.0 (Apple Git-81)

^ permalink raw reply related
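
The model below shows what the new argument does. It assumes a toy device struct in place of struct net_device, and every toy_* name is hypothetical. The flags are OR'd into priv_flags after setup() has run, so existing callers that pass 0 see no change and any bits that setup() sets are preserved.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, heavily simplified stand-in for struct net_device. */
struct toy_net_device {
	unsigned int priv_flags;
	unsigned int num_tx_queues;
	unsigned int num_rx_queues;
};

typedef void (*toy_setup_fn)(struct toy_net_device *);

/* Example setup callback (like ether_setup()) that sets a bit of its own. */
static void toy_setup(struct toy_net_device *dev)
{
	dev->priv_flags = 1u << 0;
}

static struct toy_net_device *toy_alloc_netdev_mqs(toy_setup_fn setup,
						   unsigned int txqs,
						   unsigned int rxqs,
						   unsigned int flags)
{
	struct toy_net_device *dev;

	assert(setup != NULL && txqs >= 1 && rxqs >= 1);
	dev = calloc(1, sizeof(*dev));
	if (!dev)
		return NULL;

	setup(dev);
	dev->num_tx_queues = txqs;
	dev->num_rx_queues = rxqs;
	/* Caller-supplied flags are OR'd in; bits from setup() survive. */
	dev->priv_flags |= flags;
	return dev;
}
```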

* [PATCH RFC net-next 3/6] net: Introduce IFF_LWT_NETDEV flag
From: David Ahern @ 2017-05-06 16:07 UTC (permalink / raw)
  To: netdev; +Cc: roopa, f.fainelli, nicolas.dichtel, David Ahern
In-Reply-To: <20170506160734.47084-1-dsahern@gmail.com>

Add a new flag to denote lightweight netdevices, and a helper to identify
such devices.

Signed-off-by: David Ahern <dsahern@gmail.com>
---
 include/linux/netdevice.h | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index f47c8712398a..08151fd34973 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1401,6 +1401,7 @@ enum netdev_priv_flags {
 	IFF_RXFH_CONFIGURED		= 1<<25,
 	IFF_PHONY_HEADROOM		= 1<<26,
 	IFF_MACSEC			= 1<<27,
+	IFF_LWT_NETDEV			= 1<<28,
 };
 
 #define IFF_802_1Q_VLAN			IFF_802_1Q_VLAN
@@ -1430,6 +1431,7 @@ enum netdev_priv_flags {
 #define IFF_TEAM			IFF_TEAM
 #define IFF_RXFH_CONFIGURED		IFF_RXFH_CONFIGURED
 #define IFF_MACSEC			IFF_MACSEC
+#define IFF_LWT_NETDEV			IFF_LWT_NETDEV
 
 /**
  *	struct net_device - The DEVICE structure.
@@ -4137,6 +4139,11 @@ static inline void skb_gso_error_unwind(struct sk_buff *skb, __be16 protocol,
 	skb->mac_len = mac_len;
 }
 
+static inline bool netif_is_lwd(struct net_device *dev)
+{
+	return !!(dev->priv_flags & IFF_LWT_NETDEV);
+}
+
 static inline bool netif_is_macsec(const struct net_device *dev)
 {
 	return dev->priv_flags & IFF_MACSEC;
-- 
2.11.0 (Apple Git-81)

^ permalink raw reply related
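
This small sketch shows the bit test behind netif_is_lwd(). It reuses the bit positions from the enum excerpt above under local, hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* Bit positions copied from the patch's enum netdev_priv_flags excerpt. */
enum toy_priv_flags {
	TOY_IFF_MACSEC     = 1 << 27,
	TOY_IFF_LWT_NETDEV = 1 << 28,
};

/* Same idiom as netif_is_lwd(). With a bool return type the conversion
 * already yields 0 or 1, so the !! only makes that intent explicit. */
static bool toy_is_lwd(unsigned int priv_flags)
{
	return !!(priv_flags & TOY_IFF_LWT_NETDEV);
}
```

Because the check masks a single bit, other flags such as IFF_MACSEC can be set on the same device without affecting the result.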

* [PATCH RFC net-next 4/6] net: Do not initialize kobject for lightweight netdevs
From: David Ahern @ 2017-05-06 16:07 UTC (permalink / raw)
  To: netdev; +Cc: roopa, f.fainelli, nicolas.dichtel, David Ahern
In-Reply-To: <20170506160734.47084-1-dsahern@gmail.com>

Lightweight netdevices are not added to sysfs; bypass kobject
initialization.

Signed-off-by: David Ahern <dsahern@gmail.com>
---
 include/linux/netdevice.h |  3 +++
 net/core/dev.c            |  9 ++++++---
 net/core/net-sysfs.c      | 14 +++++++++++---
 3 files changed, 20 insertions(+), 6 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 08151fd34973..4ddd0ac7e1cb 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -4282,6 +4282,9 @@ static inline const char *netdev_reg_state(const struct net_device *dev)
 
 static inline struct kobject *netdev_kobject(struct net_device *dev)
 {
+	if (netif_is_lwd(dev))
+		return NULL;
+
 	return &dev->dev.kobj;
 }
 
diff --git a/net/core/dev.c b/net/core/dev.c
index 48a0252037d5..52bb01041d12 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -7993,7 +7993,8 @@ void free_netdev(struct net_device *dev)
 	dev->reg_state = NETREG_RELEASED;
 
 	/* will free via device release */
-	put_device(&dev->dev);
+	if (!netif_is_lwd(dev))
+		put_device(&dev->dev);
 }
 EXPORT_SYMBOL(free_netdev);
 
@@ -8179,8 +8180,10 @@ int dev_change_net_namespace(struct net_device *dev, struct net *net, const char
 	netdev_adjacent_add_links(dev);
 
 	/* Fixup kobjects */
-	err = device_rename(&dev->dev, dev->name);
-	WARN_ON(err);
+	if (!netif_is_lwd(dev)) {
+		err = device_rename(&dev->dev, dev->name);
+		WARN_ON(err);
+	}
 
 	/* Add the device back in the hashes */
 	list_netdevice(dev);
diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index 9df53b688f5b..725348cdeb3b 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -1559,18 +1559,22 @@ EXPORT_SYMBOL(of_find_net_device_by_node);
  */
 void netdev_unregister_kobject(struct net_device *ndev)
 {
+	struct kobject *kobj = netdev_kobject(ndev);
 	struct device *dev = &(ndev->dev);
 
 	if (!atomic_read(&dev_net(ndev)->count))
 		dev_set_uevent_suppress(dev, 1);
 
-	kobject_get(&dev->kobj);
+	if (kobj)
+		kobject_get(kobj);
 
-	remove_queue_kobjects(ndev);
+	if (!netif_is_lwd(ndev))
+		remove_queue_kobjects(ndev);
 
 	pm_runtime_set_memalloc_noio(dev, false);
 
-	device_del(dev);
+	if (!netif_is_lwd(ndev))
+		device_del(dev);
 }
 
 /* Create sysfs entries for network device. */
@@ -1580,6 +1584,9 @@ int netdev_register_kobject(struct net_device *ndev)
 	const struct attribute_group **groups = ndev->sysfs_groups;
 	int error = 0;
 
+	if (netif_is_lwd(ndev))
+		goto pm;
+
 	device_initialize(dev);
 	dev->class = &net_class;
 	dev->platform_data = ndev;
@@ -1614,6 +1621,7 @@ int netdev_register_kobject(struct net_device *ndev)
 		return error;
 	}
 
+pm:
 	pm_runtime_set_memalloc_noio(dev, true);
 
 	return error;
-- 
2.11.0 (Apple Git-81)

^ permalink raw reply related
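
The sketch below traces the control flow this patch gives netdev_register_kobject() and free_netdev(). It assumes a toy device in which booleans stand in for device_initialize()/device_add() and an int stands in for the struct device refcount. All names are hypothetical. Lightweight devices skip straight to the shared pm: tail. Because no reference is ever taken for them, the free path must not drop one.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the parts of struct net_device involved. */
struct toy_net_device {
	bool lightweight;       /* models IFF_LWT_NETDEV being set */
	bool sysfs_registered;  /* models device_add() having succeeded */
	bool memalloc_noio;     /* models pm_runtime_set_memalloc_noio() */
	int dev_refs;           /* models the struct device refcount */
};

/* Same shape as the patched netdev_register_kobject(). */
static int toy_register_kobject(struct toy_net_device *ndev)
{
	int error = 0;

	if (ndev->lightweight)
		goto pm;

	ndev->dev_refs = 1;             /* device_initialize() takes a reference */
	ndev->sysfs_registered = true;  /* device_add() succeeded */

pm:
	ndev->memalloc_noio = true;     /* runs for every device */
	return error;
}

/* Same shape as the patched free_netdev(): only drop a reference that exists. */
static void toy_free_netdev(struct toy_net_device *ndev)
{
	if (!ndev->lightweight)
		ndev->dev_refs--;
}
```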

* [PATCH RFC net-next 5/6] net: Delay initializations for lightweight devices
From: David Ahern @ 2017-05-06 16:07 UTC (permalink / raw)
  To: netdev; +Cc: roopa, f.fainelli, nicolas.dichtel, David Ahern
In-Reply-To: <20170506160734.47084-1-dsahern@gmail.com>

Delay ipv4 and ipv6 initializations on lightweight netdevices until an
address is added to the device.

Skip sysctl initialization for neighbor path as well.

Signed-off-by: David Ahern <dsahern@gmail.com>
---
 include/linux/netdevice.h |  5 +++++
 net/core/neighbour.c      |  3 +++
 net/ipv4/devinet.c        | 18 ++++++++++++++++--
 net/ipv6/addrconf.c       |  9 +++++++++
 net/mpls/af_mpls.c        |  6 ++++++
 5 files changed, 39 insertions(+), 2 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 4ddd0ac7e1cb..32d155be777a 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -4144,6 +4144,11 @@ static inline bool netif_is_lwd(struct net_device *dev)
 	return !!(dev->priv_flags & IFF_LWT_NETDEV);
 }
 
+static inline bool netif_has_sysctl(struct net_device *dev)
+{
+	return !netif_is_lwd(dev);
+}
+
 static inline bool netif_is_macsec(const struct net_device *dev)
 {
 	return dev->priv_flags & IFF_MACSEC;
diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index 58b0bcc125b5..10104a7135e2 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -3123,6 +3123,9 @@ int neigh_sysctl_register(struct net_device *dev, struct neigh_parms *p,
 	char neigh_path[ sizeof("net//neigh/") + IFNAMSIZ + IFNAMSIZ ];
 	char *p_name;
 
+	if (dev && !netif_has_sysctl(dev))
+		return 0;
+
 	t = kmemdup(&neigh_sysctl_template, sizeof(*t), GFP_KERNEL);
 	if (!t)
 		goto err;
diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index df14815a3b8c..c5ffd3ed4b2c 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -771,8 +771,15 @@ static struct in_ifaddr *rtm_to_ifaddr(struct net *net, struct nlmsghdr *nlh,
 
 	in_dev = __in_dev_get_rtnl(dev);
 	err = -ENOBUFS;
-	if (!in_dev)
-		goto errout;
+	if (!in_dev) {
+		if (netif_is_lwd(dev)) {
+			in_dev = inetdev_init(dev);
+			if (IS_ERR(in_dev))
+				in_dev = NULL;
+		}
+		if (!in_dev)
+			goto errout;
+	}
 
 	ifa = inet_alloc_ifa();
 	if (!ifa)
@@ -1417,6 +1424,10 @@ static int inetdev_event(struct notifier_block *this, unsigned long event,
 
 	if (!in_dev) {
 		if (event == NETDEV_REGISTER) {
+			/* inet init is deferred for lightweight devices */
+			if (netif_is_lwd(dev))
+				goto out;
+
 			in_dev = inetdev_init(dev);
 			if (IS_ERR(in_dev))
 				return notifier_from_errno(PTR_ERR(in_dev));
@@ -2303,6 +2314,9 @@ static int devinet_sysctl_register(struct in_device *idev)
 {
 	int err;
 
+	if (!netif_has_sysctl(idev->dev))
+		return 0;
+
 	if (!sysctl_dev_name_is_allowed(idev->dev->name))
 		return -EINVAL;
 
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 8d297a79b568..9814df6b7017 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -3371,6 +3371,10 @@ static int addrconf_notify(struct notifier_block *this, unsigned long event,
 
 	switch (event) {
 	case NETDEV_REGISTER:
+		/* inet6 init is deferred for lightweight devices */
+		if (netif_is_lwd(dev))
+			return NOTIFY_OK;
+
 		if (!idev && dev->mtu >= IPV6_MIN_MTU) {
 			idev = ipv6_add_dev(dev);
 			if (IS_ERR(idev))
@@ -6368,6 +6372,11 @@ static int __addrconf_sysctl_register(struct net *net, char *dev_name,
 	struct ctl_table *table;
 	char path[sizeof("net/ipv6/conf/") + IFNAMSIZ];
 
+	if (idev && idev->dev && !netif_has_sysctl(idev->dev)) {
+		p->sysctl_header = NULL;
+		return 0;
+	}
+
 	table = kmemdup(addrconf_sysctl, sizeof(addrconf_sysctl), GFP_KERNEL);
 	if (!table)
 		goto out;
diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
index 088e2b459d0f..7503d68da2ea 100644
--- a/net/mpls/af_mpls.c
+++ b/net/mpls/af_mpls.c
@@ -1251,6 +1251,9 @@ static int mpls_dev_sysctl_register(struct net_device *dev,
 	struct ctl_table *table;
 	int i;
 
+	if (!netif_has_sysctl(dev))
+		return 0;
+
 	table = kmemdup(&mpls_dev_table, sizeof(mpls_dev_table), GFP_KERNEL);
 	if (!table)
 		goto out;
@@ -1285,6 +1288,9 @@ static void mpls_dev_sysctl_unregister(struct net_device *dev,
 	struct net *net = dev_net(dev);
 	struct ctl_table *table;
 
+	if (!mdev->sysctl)
+		return;
+
 	table = mdev->sysctl->ctl_table_arg;
 	unregister_net_sysctl_table(mdev->sysctl);
 	kfree(table);
-- 
2.11.0 (Apple Git-81)

^ permalink raw reply related
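
The sketch below illustrates the deferred-initialization pattern in the devinet.c hunks. It assumes a toy in_device, uses plain calloc() in place of inetdev_init(), and lets -1 stand in for -ENOBUFS. All names are hypothetical. Normal devices get their per-protocol state at NETDEV_REGISTER, while lightweight ones get it when the first address is added.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for struct in_device and struct net_device. */
struct toy_in_device {
	int naddrs;
};

struct toy_net_device {
	bool lightweight;
	struct toy_in_device *in_dev;
};

/* NETDEV_REGISTER: eager init for normal devices, deferred for lightweight. */
static int toy_on_register(struct toy_net_device *dev)
{
	if (dev->lightweight)
		return 0;
	dev->in_dev = calloc(1, sizeof(*dev->in_dev));
	return dev->in_dev ? 0 : -1;
}

/* Address add: create the state on first use for lightweight devices,
 * mirroring the rtm_to_ifaddr() hunk. */
static int toy_add_address(struct toy_net_device *dev)
{
	if (!dev->in_dev && dev->lightweight)
		dev->in_dev = calloc(1, sizeof(*dev->in_dev));
	if (!dev->in_dev)
		return -1;
	dev->in_dev->naddrs++;
	return 0;
}
```

With this approach, a lightweight device that never gets an address never pays for the IPv4 state or its sysctl entries.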

* [PATCH RFC net-next 6/6] net: add uapi for creating lightweight devices
From: David Ahern @ 2017-05-06 16:07 UTC (permalink / raw)
  To: netdev; +Cc: roopa, f.fainelli, nicolas.dichtel, David Ahern
In-Reply-To: <20170506160734.47084-1-dsahern@gmail.com>

Allow users to make new devices lightweight by setting IFLA_LWT_NETDEV
attribute in the newlink request.

Signed-off-by: David Ahern <dsahern@gmail.com>
---
 include/uapi/linux/if_link.h |  1 +
 net/core/rtnetlink.c         | 10 +++++++++-
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index 8e56ac70e0d1..f57a16e542b7 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -157,6 +157,7 @@ enum {
 	IFLA_GSO_MAX_SIZE,
 	IFLA_PAD,
 	IFLA_XDP,
+	IFLA_LWT_NETDEV,
 	__IFLA_MAX
 };
 
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index a4db1cd91c4a..9c18e6dec379 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -2378,6 +2378,7 @@ struct net_device *rtnl_create_link(struct net *net,
 	struct net_device *dev;
 	unsigned int num_tx_queues = 1;
 	unsigned int num_rx_queues = 1;
+	unsigned int flags = 0;
 
 	if (tb[IFLA_NUM_TX_QUEUES])
 		num_tx_queues = nla_get_u32(tb[IFLA_NUM_TX_QUEUES]);
@@ -2389,8 +2390,15 @@ struct net_device *rtnl_create_link(struct net *net,
 	else if (ops->get_num_rx_queues)
 		num_rx_queues = ops->get_num_rx_queues();
 
+	if (tb[IFLA_LWT_NETDEV]) {
+		u8 lwt_dev = !!nla_get_u8(tb[IFLA_LWT_NETDEV]);
+
+		if (lwt_dev)
+			flags |= IFF_LWT_NETDEV;
+	}
+
 	dev = alloc_netdev_mqs(ops->priv_size, ifname, name_assign_type,
-			       ops->setup, num_tx_queues, num_rx_queues, 0);
+			       ops->setup, num_tx_queues, num_rx_queues, flags);
 	if (!dev)
 		return ERR_PTR(-ENOMEM);
 
-- 
2.11.0 (Apple Git-81)

^ permalink raw reply related
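
The sketch below shows how rtnl_create_link() turns the optional attribute into alloc_netdev_mqs() flags. It assumes that a NULL pointer models an absent tb[IFLA_LWT_NETDEV], and the helper name is hypothetical. As in the patch, any nonzero u8 value enables the flag.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_IFF_LWT_NETDEV (1u << 28)  /* bit value from patch 3/6 */

/* attr == NULL models tb[IFLA_LWT_NETDEV] being absent from the request. */
static unsigned int toy_create_flags(const uint8_t *attr)
{
	unsigned int flags = 0;

	if (attr && *attr)
		flags |= TOY_IFF_LWT_NETDEV;
	return flags;
}
```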

* [PATCH 0/2] batman-adv: Fine-tuning for three function implementations
From: SF Markus Elfring @ 2017-05-06 16:12 UTC (permalink / raw)
  To: b.a.t.m.a.n, netdev, Antonio Quartulli, David S. Miller,
	Marek Lindner, Simon Wunderlich
  Cc: LKML, kernel-janitors

From: Markus Elfring <elfring@users.sourceforge.net>
Date: Sat, 6 May 2017 18:03:45 +0200

Two update suggestions from static source code analysis were taken into
account.

Markus Elfring (2):
  Replace a seq_puts() call by seq_putc() in two functions
  Combine two seq_puts() calls into one call in batadv_nc_nodes_seq_print_text()

 net/batman-adv/bat_iv_ogm.c     | 2 +-
 net/batman-adv/bat_v.c          | 2 +-
 net/batman-adv/network-coding.c | 4 +---
 3 files changed, 3 insertions(+), 5 deletions(-)

-- 
2.12.2


^ permalink raw reply

* [PATCH 1/2] batman-adv: Replace a seq_puts() call by seq_putc() in two functions
From: SF Markus Elfring @ 2017-05-06 16:14 UTC (permalink / raw)
  To: b.a.t.m.a.n-ZwoEplunGu2X36UT3dwllkB+6BGkLq7r,
	netdev-u79uwXL29TY76Z2rM5mHXA, Antonio Quartulli, David S. Miller,
	Marek Lindner, Simon Wunderlich
  Cc: kernel-janitors-u79uwXL29TY76Z2rM5mHXA, LKML
In-Reply-To: <512d2e7c-0967-9303-2b68-6c9ef53d6ed2-Rn4VEauK+AKRv+LV9MX5uipxlwaOVQ5f@public.gmane.org>

From: Markus Elfring <elfring-Rn4VEauK+AKRv+LV9MX5uipxlwaOVQ5f@public.gmane.org>
Date: Sat, 6 May 2017 17:50:13 +0200

In two functions, a single character (a line break) is put into a sequence
with seq_puts(). Use the corresponding function seq_putc() instead.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring-Rn4VEauK+AKRv+LV9MX5uipxlwaOVQ5f@public.gmane.org>
---
 net/batman-adv/bat_iv_ogm.c | 2 +-
 net/batman-adv/bat_v.c      | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/batman-adv/bat_iv_ogm.c b/net/batman-adv/bat_iv_ogm.c
index 495ba7cdcb04..1f80392ab37c 100644
--- a/net/batman-adv/bat_iv_ogm.c
+++ b/net/batman-adv/bat_iv_ogm.c
@@ -1944,7 +1944,7 @@ static void batadv_iv_ogm_orig_print(struct batadv_priv *bat_priv,
 
 			batadv_iv_ogm_orig_print_neigh(orig_node, if_outgoing,
 						       seq);
-			seq_puts(seq, "\n");
+			seq_putc(seq, '\n');
 			batman_count++;
 
 next:
diff --git a/net/batman-adv/bat_v.c b/net/batman-adv/bat_v.c
index a36c8e7291d6..4e2724c5b33d 100644
--- a/net/batman-adv/bat_v.c
+++ b/net/batman-adv/bat_v.c
@@ -400,7 +400,7 @@ static void batadv_v_orig_print(struct batadv_priv *bat_priv,
 				   neigh_node->if_incoming->net_dev->name);
 
 			batadv_v_orig_print_neigh(orig_node, if_outgoing, seq);
-			seq_puts(seq, "\n");
+			seq_putc(seq, '\n');
 			batman_count++;
 
 next:
-- 
2.12.2

^ permalink raw reply related
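
The simplified model below shows why seq_putc() fits a lone '\n' better. It assumes a toy buffer in place of struct seq_file. The names are hypothetical, and the overflow handling is cruder than the kernel's. Both calls produce the same byte, but the string variant needs strlen() and memcpy() to get there.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for the buffer fields of struct seq_file. */
struct toy_seq {
	char buf[64];
	size_t size;   /* capacity of buf */
	size_t count;  /* bytes written so far */
};

/* Modeled on seq_putc(): a bounds check and a single byte store. */
static void toy_seq_putc(struct toy_seq *m, char c)
{
	assert(m->size <= sizeof(m->buf));
	if (m->count >= m->size)
		return;
	m->buf[m->count++] = c;
}

/* Modeled on seq_puts(): strlen() plus memcpy(), even for "\n". */
static void toy_seq_puts(struct toy_seq *m, const char *s)
{
	size_t len = strlen(s);

	if (m->count + len >= m->size) {
		m->count = m->size;  /* crude overflow marker */
		return;
	}
	memcpy(m->buf + m->count, s, len);
	m->count += len;
}
```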

* [PATCH 2/2] batman-adv: Combine two seq_puts() calls into one call in batadv_nc_nodes_seq_print_text()
From: SF Markus Elfring @ 2017-05-06 16:15 UTC (permalink / raw)
  To: b.a.t.m.a.n-ZwoEplunGu2X36UT3dwllkB+6BGkLq7r,
	netdev-u79uwXL29TY76Z2rM5mHXA, Antonio Quartulli, David S. Miller,
	Marek Lindner, Simon Wunderlich
  Cc: kernel-janitors-u79uwXL29TY76Z2rM5mHXA, LKML
In-Reply-To: <512d2e7c-0967-9303-2b68-6c9ef53d6ed2-Rn4VEauK+AKRv+LV9MX5uipxlwaOVQ5f@public.gmane.org>

From: Markus Elfring <elfring-Rn4VEauK+AKRv+LV9MX5uipxlwaOVQ5f@public.gmane.org>
Date: Sat, 6 May 2017 17:57:36 +0200

A bit of text was put into a sequence by two separate function calls.
Print the same data by a single function call instead.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring-Rn4VEauK+AKRv+LV9MX5uipxlwaOVQ5f@public.gmane.org>
---
 net/batman-adv/network-coding.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/net/batman-adv/network-coding.c b/net/batman-adv/network-coding.c
index e1f6fc72fe3e..3604d7899e2c 100644
--- a/net/batman-adv/network-coding.c
+++ b/net/batman-adv/network-coding.c
@@ -1935,9 +1935,7 @@ int batadv_nc_nodes_seq_print_text(struct seq_file *seq, void *offset)
 						list)
 				seq_printf(seq, "%pM ",
 					   nc_node->addr);
-			seq_puts(seq, "\n");
-
-			seq_puts(seq, " Outgoing: ");
+			seq_puts(seq, "\n Outgoing: ");
 			/* For out_nc_node to this orig_node */
 			list_for_each_entry_rcu(nc_node,
 						&orig_node->out_coding_list,
-- 
2.12.2

^ permalink raw reply related
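
This quick check, in plain C with snprintf(), confirms that the two-call and one-call variants emit identical text. The helper name is hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 if "\n" followed by " Outgoing: " equals "\n Outgoing: ". */
static int toy_outputs_match(void)
{
	char two[32], one[32];
	int n;

	/* Old code: two separate writes. */
	n = snprintf(two, sizeof(two), "%s", "\n");
	assert(n > 0 && (size_t)n < sizeof(two));
	snprintf(two + n, sizeof(two) - (size_t)n, "%s", " Outgoing: ");

	/* New code: one write of the combined literal. */
	snprintf(one, sizeof(one), "%s", "\n Outgoing: ");

	return strcmp(two, one) == 0;
}
```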

* Re: arch: arm: bpf: Converting cBPF to eBPF for arm 32 bit
From: Shubham Bansal @ 2017-05-06 16:48 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: David Miller, Kees Cook, Mircea Gherzan, Network Development,
	kernel-hardening, linux-arm-kernel, ast
In-Reply-To: <58E639E0.1010700@iogearbox.net>

Hi Daniel,

Thanks for the last reply about the testing of eBPF JIT.

I have one issue, though: I am not able to find what the BPF_ABS and
BPF_IND instructions do exactly. They are not described at
https://www.kernel.org/doc/Documentation/networking/filter.txt either.
Can you please tell me where I could find a description of these
instructions?
Best,
Shubham Bansal


On Thu, Apr 6, 2017 at 6:21 PM, Daniel Borkmann <daniel@iogearbox.net> wrote:
> On 04/06/2017 01:05 PM, Shubham Bansal wrote:
>>
>> Gentle Reminder.
>
>
> Sorry for late reply.
>
>> Anybody can tell me how to test the JIT compiler ?
>
>
> There's lib/test_bpf.c, see Documentation/networking/filter.txt +1349
> for some more information. It basically contains various test cases that
> have the purpose to test the JIT with corner cases. If you see a useful
> test missing, please send a patch for it, so all other JITs can benefit
> from this as well. For extracting disassembly from a generated test case,
> check out bpf_jit_disasm (Documentation/networking/filter.txt +486).
>
> Thanks,
> Daniel
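
The sketch below shows what the classic BPF packet loads compute. It assumes a flat packet buffer rather than an sk_buff, and the toy_* names are hypothetical. BPF_ABS reads at the constant offset k and BPF_IND reads at X + k, both in network byte order. An out-of-bounds access ends the filter with return value 0. Negative offsets used for ancillary data (SKF_AD_OFF and related) are not modeled. In eBPF, the same two instructions take the sk_buff implicitly from R6 and return the loaded value in R0. The eBPF part of Documentation/networking/filter.txt covers them as the "non-generic" instructions carried over from classic BPF.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Access widths in bytes, standing in for BPF_B, BPF_H and BPF_W. */
enum toy_size { TOY_B = 1, TOY_H = 2, TOY_W = 4 };

/* Shared load: big-endian read of 'size' bytes at 'off'.
 * Returns 0 on success, or -1 when the access falls outside the packet
 * (the real filter would then stop and return 0). */
static int toy_ld_pkt(const uint8_t *pkt, size_t len, uint32_t off,
		      enum toy_size size, uint32_t *a)
{
	uint32_t v = 0;
	size_t i;

	assert(pkt != NULL && a != NULL);
	if (off > len || (size_t)size > len - off)
		return -1;
	for (i = 0; i < (size_t)size; i++)
		v = (v << 8) | pkt[off + i];
	*a = v;
	return 0;
}

/* BPF_LD | size | BPF_ABS: A = packet[k] */
static int toy_ld_abs(const uint8_t *pkt, size_t len, uint32_t k,
		      enum toy_size size, uint32_t *a)
{
	return toy_ld_pkt(pkt, len, k, size, a);
}

/* BPF_LD | size | BPF_IND: A = packet[X + k] */
static int toy_ld_ind(const uint8_t *pkt, size_t len, uint32_t x, uint32_t k,
		      enum toy_size size, uint32_t *a)
{
	return toy_ld_pkt(pkt, len, x + k, size, a);
}
```

For a JIT, this suggests that both forms can share one code path that computes the offset (k, or X + k), performs the bounds check, loads with a byte swap on little-endian hosts, and branches to an epilogue that returns 0 on failure.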

^ permalink raw reply

* Re: Why do we need MSG_SENDPAGE_NOTLAST?
From: Eric Dumazet @ 2017-05-06 17:13 UTC (permalink / raw)
  To: Ilya Lesokhin; +Cc: netdev@vger.kernel.org, tls-fpga-sw-dev, Dave Watson
In-Reply-To: <VI1PR0502MB29571B2B852BF681FC1DB9B0D4E80@VI1PR0502MB2957.eurprd05.prod.outlook.com>

Do not top-post on netdev, please.


On Sat, 2017-05-06 at 05:46 +0000, Ilya Lesokhin wrote:
> I don't follow.
> Why can't splice use MSG_MORE for the individual pages?
> Why does tcp_sendpage need to know if the MORE indicator is coming from the user or from splice?
> 
> I also don't understand your comment about partial writes.
> 

Make sure that sendpage() won't end up with a stall on TCP if the socket
does not have enough room to store the 16 pages provided by splice() or
sendpage().

Just use MSG_SENDPAGE_NOTLAST and be happy.
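
The simplified model below illustrates the distinction discussed in this thread. It assumes placeholder flag values (the kernel's real MSG_MORE and MSG_SENDPAGE_NOTLAST values differ) and hypothetical helper names. A splice-like caller marks every page except the last with NOTLAST, even when the user passed no MSG_MORE. The push decision is reduced to a single boolean here, whereas real TCP also corks on MSG_MORE inside tcp_push().

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder bit values for this sketch only. */
#define TOY_MSG_MORE             0x1u
#define TOY_MSG_SENDPAGE_NOTLAST 0x2u

/* Flags for page i of n: every page except the last gets NOTLAST; the
 * last page carries MORE only if the user explicitly asked for it. */
static unsigned int toy_page_flags(size_t i, size_t n, unsigned int user_flags)
{
	unsigned int flags = user_flags & TOY_MSG_MORE;

	assert(i < n);
	if (i + 1 < n)
		flags |= TOY_MSG_SENDPAGE_NOTLAST;
	return flags;
}

/* Simplified transport decision: push unless more data is known to follow. */
static int toy_should_push(unsigned int flags)
{
	return !(flags & (TOY_MSG_MORE | TOY_MSG_SENDPAGE_NOTLAST));
}
```

Keeping the two flags separate lets the transport tell "more pages from this splice()" apart from "the user promised more data", which is what the TLS coalescing question turns on.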


> Thanks,
> Ilya
> 
> > -----Original Message-----
> > From: Eric Dumazet [mailto:eric.dumazet@gmail.com]
> > Sent: Thursday, May 4, 2017 9:33 PM
> > To: Ilya Lesokhin <ilyal@mellanox.com>
> > Cc: netdev@vger.kernel.org; tls-fpga-sw-dev <tls-fpga-sw-
> > dev@mellanox.com>; Dave Watson <davejwatson@fb.com>
> > Subject: Re: Why do we need MSG_SENDPAGE_NOTLAST?
> > 
> > On Thu, 2017-05-04 at 17:03 +0000, Ilya Lesokhin wrote:
> > > I don't understand the need for MSG_SENDPAGE_NOTLAST and I'm hoping
> > > someone can enlighten me.
> > >
> > > According to commit 35f9c09 ('tcp: tcp_sendpages() should call
> > > tcp_push() once'):
> > > "We need to call tcp_flush() at the end of the last page processed in
> > > tcp_sendpages(), or else transmits can be deferred and future sends
> > > stall."
> > >
> > > I don't understand why we need to differentiate between the user
> > > setting MSG_MORE
> > > and splice indicating that more data is going to be sent.
> > > if the user passed MSG_MORE and didn't push any extra data, isn't it
> > > the user's fault?
> > > Do we need it because poorly written applications were broken when
> > > MSG_MORE was added to tcp_sendpage? Or is there a deeper reason?
> > >
> > 
> > The answer lies in how splice() works.
> > 
> > A user can issue one splice() without MSG_MORE semantics, right?
> > 
> > Still, we want an implicit MORE behavior for all individual pages, but
> > the last one.
> > 
> > 
> > > The reason I'm asking is that we are working on a kernel TLS
> > > implementation
> > > and I would like to know if we can coalesce multiple tls_sendpage
> > > calls with MSG_MORE into a single
> > > tls record or whether we must push out the record as soon as
> > > MSG_SENDPAGE_NOTLAST is cleared?
> > 
> > Make sure you handle partial writes (you want to coalesce 10 pages, but
> > stack will only take 5 of them)
> > 
> > 
> 

^ permalink raw reply

