Netdev List

* [PATCH bpf-next v4 3/5] selftests/bpf: test_sockmap, timing improvements
From: Prashant Bhole @ 2018-05-31  4:42 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, John Fastabend
  Cc: Prashant Bhole, David S . Miller, Shuah Khan, netdev,
	linux-kselftest
In-Reply-To: <20180531044240.796-1-bhole_prashant_q7@lab.ntt.co.jp>

The current 10us delay is too low for many tests to succeed and needs
to be increased. Also, many corked tests are expected to hit the RX
timeout irrespective of the timeout value.

- Set a 1000usec timeout for corked tests, because anything less
causes a broken-pipe error in the tx thread. Set a 1 second timeout
for all other tests, because anything less results in an RX timeout.
- Tests with apply=1 and a higher number of iterations were taking a
lot of time. Reduce the test run time by reducing the iterations.

real    0m12.968s
user    0m0.219s
sys     0m14.337s

Fixes: a18fda1a62c3 ("bpf: reduce runtime of test_sockmap tests")
Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
---
 tools/testing/selftests/bpf/test_sockmap.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/tools/testing/selftests/bpf/test_sockmap.c b/tools/testing/selftests/bpf/test_sockmap.c
index 7f9ca79aadbc..5cd0550af595 100644
--- a/tools/testing/selftests/bpf/test_sockmap.c
+++ b/tools/testing/selftests/bpf/test_sockmap.c
@@ -345,8 +345,13 @@ static int msg_loop(int fd, int iov_count, int iov_length, int cnt,
 		if (err < 0)
 			perror("recv start time: ");
 		while (s->bytes_recvd < total_bytes) {
-			timeout.tv_sec = 0;
-			timeout.tv_usec = 10;
+			if (txmsg_cork) {
+				timeout.tv_sec = 0;
+				timeout.tv_usec = 1000;
+			} else {
+				timeout.tv_sec = 1;
+				timeout.tv_usec = 0;
+			}
 
 			/* FD sets */
 			FD_ZERO(&w);
@@ -1025,14 +1030,14 @@ static int test_send(struct sockmap_options *opt, int cgrp)
 
 	opt->iov_length = 1;
 	opt->iov_count = 1;
-	opt->rate = 1024;
+	opt->rate = 512;
 	err = test_exec(cgrp, opt);
 	if (err)
 		goto out;
 
 	opt->iov_length = 256;
 	opt->iov_count = 1024;
-	opt->rate = 10;
+	opt->rate = 2;
 	err = test_exec(cgrp, opt);
 	if (err)
 		goto out;
-- 
2.17.0

^ permalink raw reply related
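
The timeout selection in the patch above can be sketched as a small
standalone helper. This is only an illustration under stated assumptions:
rx_timeout() and wait_readable() are hypothetical names that do not exist
in test_sockmap.c, and the `corked` argument stands in for the txmsg_cork
global. On Linux, select() may modify the timeval it is given, which is
one reason to rebuild the timeout before every call, as the loop in
msg_loop() does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper: 1000 usec for corked tests, 1 second otherwise,
 * mirroring the values chosen in the patch. */
static struct timeval rx_timeout(bool corked)
{
	struct timeval tv;

	if (corked) {
		tv.tv_sec = 0;
		tv.tv_usec = 1000;
	} else {
		tv.tv_sec = 1;
		tv.tv_usec = 0;
	}
	return tv;
}

/* Hypothetical helper: returns 1 if fd became readable, 0 on timeout,
 * -1 on error. The timeout is rebuilt on every call. */
static int wait_readable(int fd, bool corked)
{
	struct timeval tv = rx_timeout(corked);
	fd_set r;
	int n;

	assert(fd >= 0 && fd < FD_SETSIZE);

	FD_ZERO(&r);
	FD_SET(fd, &r);

	n = select(fd + 1, &r, NULL, NULL, &tv);
	if (n < 0)
		return -1;
	return n > 0 ? 1 : 0;
}
```

A caller that treats a timeout as an expected outcome for corked tests
would check for a return value of 0 rather than reporting an error.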

* [PATCH bpf-next v4 2/5] selftests/bpf: test_sockmap, join cgroup in selftest mode
From: Prashant Bhole @ 2018-05-31  4:42 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, John Fastabend
  Cc: Prashant Bhole, David S . Miller, Shuah Khan, netdev,
	linux-kselftest
In-Reply-To: <20180531044240.796-1-bhole_prashant_q7@lab.ntt.co.jp>

In selftest mode, a temporary cgroup environment is created but the
cgroup is not joined, which causes test failures. Fix this by joining
the cgroup.

Fixes: 16962b2404ac ("bpf: sockmap, add selftests")
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
---
 tools/testing/selftests/bpf/test_sockmap.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/tools/testing/selftests/bpf/test_sockmap.c b/tools/testing/selftests/bpf/test_sockmap.c
index 7b2008a144cb..7f9ca79aadbc 100644
--- a/tools/testing/selftests/bpf/test_sockmap.c
+++ b/tools/testing/selftests/bpf/test_sockmap.c
@@ -1344,6 +1344,11 @@ static int __test_suite(char *bpf_file)
 		return cg_fd;
 	}
 
+	if (join_cgroup(CG_PATH)) {
+		fprintf(stderr, "ERROR: failed to join cgroup\n");
+		return -EINVAL;
+	}
+
 	/* Tests basic commands and APIs with range of iov values */
 	txmsg_start = txmsg_end = 0;
 	err = test_txmsg(cg_fd);
-- 
2.17.0

^ permalink raw reply related

* [PATCH bpf-next v4 0/5] fix test_sockmap
From: Prashant Bhole @ 2018-05-31  4:42 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, John Fastabend
  Cc: Prashant Bhole, David S . Miller, Shuah Khan, netdev,
	linux-kselftest

test_sockmap was originally written only to exercise kernel code
paths, so there was no strict checking of errors. When the code was
modified to run as selftests, due to lack of error handling it was not
able to detect test failures.

To improve this, the series fixes error handling, test run time
and data verification.

It also slightly improves the test output by printing parameter values
(cork, apply, start, end) so that parameters for all tests are displayed.

Changes in v4:
  - patch1: Ignore RX timeout error only for corked tests
  - patch3: Set a different timeout for corked tests and reduce
      run time by reducing the number of iterations in some tests

Changes in v3:
  - Skipped error checking for corked tests

Prashant Bhole (5):
  selftests/bpf: test_sockmap, check test failure
  selftests/bpf: test_sockmap, join cgroup in selftest mode
  selftests/bpf: test_sockmap, timing improvements
  selftests/bpf: test_sockmap, fix data verification
  selftests/bpf: test_sockmap, print additional test options

 tools/testing/selftests/bpf/test_sockmap.c | 87 +++++++++++++++++-----
 1 file changed, 67 insertions(+), 20 deletions(-)

-- 
2.17.0

^ permalink raw reply

* [PATCH bpf-next v4 1/5] selftests/bpf: test_sockmap, check test failure
From: Prashant Bhole @ 2018-05-31  4:42 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, John Fastabend
  Cc: Prashant Bhole, David S . Miller, Shuah Khan, netdev,
	linux-kselftest
In-Reply-To: <20180531044240.796-1-bhole_prashant_q7@lab.ntt.co.jp>

Test failures are not identified because the exit code of the RX/TX
threads is not checked. Also, the threads do not return a correct exit
code.

- Return exit code from threads depending on test execution status
- In main thread, check the exit code of RX/TX threads
- Skip error checking for corked tests as they are expected to timeout

Fixes: 16962b2404ac ("bpf: sockmap, add selftests")
Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
---
 tools/testing/selftests/bpf/test_sockmap.c | 27 +++++++++++++++++-----
 1 file changed, 21 insertions(+), 6 deletions(-)

diff --git a/tools/testing/selftests/bpf/test_sockmap.c b/tools/testing/selftests/bpf/test_sockmap.c
index eb17fae458e6..7b2008a144cb 100644
--- a/tools/testing/selftests/bpf/test_sockmap.c
+++ b/tools/testing/selftests/bpf/test_sockmap.c
@@ -429,8 +429,8 @@ static int sendmsg_test(struct sockmap_options *opt)
 	struct msg_stats s = {0};
 	int iov_count = opt->iov_count;
 	int iov_buf = opt->iov_length;
+	int rx_status, tx_status;
 	int cnt = opt->rate;
-	int status;
 
 	errno = 0;
 
@@ -442,7 +442,7 @@ static int sendmsg_test(struct sockmap_options *opt)
 	rxpid = fork();
 	if (rxpid == 0) {
 		if (opt->drop_expected)
-			exit(1);
+			exit(0);
 
 		if (opt->sendpage)
 			iov_count = 1;
@@ -463,7 +463,9 @@ static int sendmsg_test(struct sockmap_options *opt)
 				"rx_sendmsg: TX: %zuB %fB/s %fGB/s RX: %zuB %fB/s %fGB/s\n",
 				s.bytes_sent, sent_Bps, sent_Bps/giga,
 				s.bytes_recvd, recvd_Bps, recvd_Bps/giga);
-		exit(1);
+		if (err && txmsg_cork)
+			err = 0;
+		exit(err ? 1 : 0);
 	} else if (rxpid == -1) {
 		perror("msg_loop_rx: ");
 		return errno;
@@ -491,14 +493,27 @@ static int sendmsg_test(struct sockmap_options *opt)
 				"tx_sendmsg: TX: %zuB %fB/s %f GB/s RX: %zuB %fB/s %fGB/s\n",
 				s.bytes_sent, sent_Bps, sent_Bps/giga,
 				s.bytes_recvd, recvd_Bps, recvd_Bps/giga);
-		exit(1);
+		exit(err ? 1 : 0);
 	} else if (txpid == -1) {
 		perror("msg_loop_tx: ");
 		return errno;
 	}
 
-	assert(waitpid(rxpid, &status, 0) == rxpid);
-	assert(waitpid(txpid, &status, 0) == txpid);
+	assert(waitpid(rxpid, &rx_status, 0) == rxpid);
+	assert(waitpid(txpid, &tx_status, 0) == txpid);
+	if (WIFEXITED(rx_status)) {
+		err = WEXITSTATUS(rx_status);
+		if (err) {
+			fprintf(stderr, "rx thread exited with err %d. ", err);
+			goto out;
+		}
+	}
+	if (WIFEXITED(tx_status)) {
+		err = WEXITSTATUS(tx_status);
+		if (err)
+			fprintf(stderr, "tx thread exited with err %d. ", err);
+	}
+out:
 	return err;
 }
 
-- 
2.17.0

^ permalink raw reply related
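
The exit-status propagation described in the patch above can be sketched
in isolation as follows. This is a minimal illustration, not the code
from test_sockmap.c: run_child() is a hypothetical name, and unlike the
patch it treats a child that did not exit normally (for example, one
killed by a signal) as a failure.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that exits with 1 if child_err is
 * non-zero and 0 otherwise, then return the exit status seen by the
 * parent. Returns -1 if fork()/waitpid() fail or the child did not
 * exit normally. */
static int run_child(int child_err)
{
	int status;
	pid_t pid = fork();

	if (pid == -1)
		return -1;

	if (pid == 0)
		_exit(child_err ? 1 : 0);	/* child: report pass/fail */

	assert(pid > 0);			/* only the parent gets here */

	if (waitpid(pid, &status, 0) != pid)
		return -1;

	if (WIFEXITED(status))
		return WEXITSTATUS(status);	/* decoded exit code */

	return -1;				/* assumption: signal == failure */
}
```

The raw status word filled in by waitpid() is not the exit code itself,
so it has to be decoded with WIFEXITED()/WEXITSTATUS() before it is
compared against zero.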

* Re: [PATCH bpf v3 3/5] selftests/bpf: test_sockmap, fix test timeout
From: Prashant Bhole @ 2018-05-31  4:13 UTC (permalink / raw)
  To: John Fastabend, Alexei Starovoitov
  Cc: Alexei Starovoitov, Daniel Borkmann, David S . Miller, Shuah Khan,
	netdev, linux-kselftest
In-Reply-To: <df1d9ec2-783a-09f4-29d7-544d20d74465@gmail.com>



On 5/31/2018 4:59 AM, John Fastabend wrote:
> On 05/30/2018 12:29 PM, Alexei Starovoitov wrote:
>> On Wed, May 30, 2018 at 02:56:09PM +0900, Prashant Bhole wrote:
>>> In order to reduce runtime of tests, recently timeout for select() call
>>> was reduced from 1sec to 10usec. This was causing many test failures.
>>> It was caught with failure handling commits in this series.
>>>
>>> Restoring the timeout from 10usec to 1sec
>>>
>>> Fixes: a18fda1a62c3 ("bpf: reduce runtime of test_sockmap tests")
>>> Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
>>> ---
>>>   tools/testing/selftests/bpf/test_sockmap.c | 4 ++--
>>>   1 file changed, 2 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/tools/testing/selftests/bpf/test_sockmap.c b/tools/testing/selftests/bpf/test_sockmap.c
>>> index 64f9e25c451f..9d01f5c2abe2 100644
>>> --- a/tools/testing/selftests/bpf/test_sockmap.c
>>> +++ b/tools/testing/selftests/bpf/test_sockmap.c
>>> @@ -345,8 +345,8 @@ static int msg_loop(int fd, int iov_count, int iov_length, int cnt,
>>>   		if (err < 0)
>>>   			perror("recv start time: ");
>>>   		while (s->bytes_recvd < total_bytes) {
>>> -			timeout.tv_sec = 0;
>>> -			timeout.tv_usec = 10;
>>> +			timeout.tv_sec = 1;
>>> +			timeout.tv_usec = 0;
>>
>> I've applied the set, but had to revert it, since it takes too long.
>>
>> real	1m40.124s
>> user	0m0.375s
>> sys	0m14.521s
>>
> 
> Dang, I thought it would be a bit longer but not minutes.
> 
>> Myself and Daniel run the test semi-manually when we apply patches.
>> Adding 2 extra minutes of wait time is unnecessary.
> 
> Yep.
> 
>> Especially since most of it is idle time.
>> Please find a way to fix tests differently.
>> btw I don't see any failures today. Not sure what is being fixed
>> by increasing a timeout.
>>
> 
> Calling these fixes is a bit much, they are primarily improvements.
> 
> The background is, when I originally wrote the tests my goal was to
> exercise the kernel code paths. Because of this I didn't really care if
> the tests actually sent/recv all bytes in the test. (I have long
> running tests using netperf/wrk/apached/etc. for that) But, the manual
> tests do have an option to verify the data if specified. The 'verify'
> option is a bit fragile in that with the right tests (e.g. drop)
> or the certain options (e.g. cork) it can fail which is expected.
> 
> What Prashant added was support to actually verify the data correctly.
> And also fix a few cgroup handling and some pretty printing as well.
> He noticed the low timeout causing issue in these cases though so
> increased it.
> 
> @Prashant, how about increasing this less dramatically because now
> all cork tests are going to stall for 1s unless perfectly aligned.
> How about 100us? Or even better we can conditionally set it based
> on if tx_cork is set. If tx_cork is set use 1us otherwise use 200us
> or something. (1s is really too high in any case for lo)
> 
> Also capturing some of the above in the cover letter would help
> folks understand the context a bit better.
> 

I did trial and error for timeout values. Currently 1000us for corked 
tests and 1 sec for other tests works fine. I observed broken-pipe error 
at tx side when timeout was < 1000us.

Also tests with apply=1 and higher number of iterations were taking 
time, so reducing iterations reduces the test run time drastically.

real    0m12.968s
user    0m0.219s
sys     0m14.337s

Also I will try to explain background in the cover letter of next series.

-Prashant

^ permalink raw reply

* [PATCH net] net/ncsi: Fix array size in dumpit handler
From: Samuel Mendoza-Jonas @ 2018-05-31  4:10 UTC (permalink / raw)
  To: netdev; +Cc: Samuel Mendoza-Jonas, David S . Miller, linux-kernel, openbmc

With CONFIG_CC_STACKPROTECTOR enabled the kernel panics as below when
parsing a NCSI_CMD_PKG_INFO command:

[  150.149711] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: 805cff08
[  150.149711]
[  150.159919] CPU: 0 PID: 1301 Comm: ncsi-netlink Not tainted 4.13.16-468cbec6d2c91239332cb91b1f0a73aafcb6f0c6 #1
[  150.170004] Hardware name: Generic DT based system
[  150.174852] [<80109930>] (unwind_backtrace) from [<80106bc4>] (show_stack+0x20/0x24)
[  150.182641] [<80106bc4>] (show_stack) from [<805d36e4>] (dump_stack+0x20/0x28)
[  150.189888] [<805d36e4>] (dump_stack) from [<801163ac>] (panic+0xdc/0x278)
[  150.196780] [<801163ac>] (panic) from [<801162cc>] (__stack_chk_fail+0x20/0x24)
[  150.204111] [<801162cc>] (__stack_chk_fail) from [<805cff08>] (ncsi_pkg_info_all_nl+0x244/0x258)
[  150.212912] [<805cff08>] (ncsi_pkg_info_all_nl) from [<804f939c>] (genl_lock_dumpit+0x3c/0x54)
[  150.221535] [<804f939c>] (genl_lock_dumpit) from [<804f873c>] (netlink_dump+0xf8/0x284)
[  150.229550] [<804f873c>] (netlink_dump) from [<804f8d44>] (__netlink_dump_start+0x124/0x17c)
[  150.237992] [<804f8d44>] (__netlink_dump_start) from [<804f9880>] (genl_rcv_msg+0x1c8/0x3d4)
[  150.246440] [<804f9880>] (genl_rcv_msg) from [<804f9174>] (netlink_rcv_skb+0xd8/0x134)
[  150.254361] [<804f9174>] (netlink_rcv_skb) from [<804f96a4>] (genl_rcv+0x30/0x44)
[  150.261850] [<804f96a4>] (genl_rcv) from [<804f7790>] (netlink_unicast+0x198/0x234)
[  150.269511] [<804f7790>] (netlink_unicast) from [<804f7ffc>] (netlink_sendmsg+0x368/0x3b0)
[  150.277783] [<804f7ffc>] (netlink_sendmsg) from [<804abea4>] (sock_sendmsg+0x24/0x34)
[  150.285625] [<804abea4>] (sock_sendmsg) from [<804ac1dc>] (___sys_sendmsg+0x244/0x260)
[  150.293556] [<804ac1dc>] (___sys_sendmsg) from [<804ad98c>] (__sys_sendmsg+0x5c/0x9c)
[  150.301400] [<804ad98c>] (__sys_sendmsg) from [<804ad9e4>] (SyS_sendmsg+0x18/0x1c)
[  150.308984] [<804ad9e4>] (SyS_sendmsg) from [<80102640>] (ret_fast_syscall+0x0/0x3c)
[  150.316743] ---[ end Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: 805cff08

This turns out to be because the attrs array in ncsi_pkg_info_all_nl()
is initialised to a length of NCSI_ATTR_MAX which is the maximum
attribute number, not the number of attributes.

Fixes: 955dc68cb9b2 ("net/ncsi: Add generic netlink family")
Signed-off-by: Samuel Mendoza-Jonas <sam@mendozajonas.com>
---
 net/ncsi/ncsi-netlink.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ncsi/ncsi-netlink.c b/net/ncsi/ncsi-netlink.c
index 8d7e849d4825..41cede4041d3 100644
--- a/net/ncsi/ncsi-netlink.c
+++ b/net/ncsi/ncsi-netlink.c
@@ -215,7 +215,7 @@ static int ncsi_pkg_info_nl(struct sk_buff *msg, struct genl_info *info)
 static int ncsi_pkg_info_all_nl(struct sk_buff *skb,
 				struct netlink_callback *cb)
 {
-	struct nlattr *attrs[NCSI_ATTR_MAX];
+	struct nlattr *attrs[NCSI_ATTR_MAX + 1];
 	struct ncsi_package *np, *package;
 	struct ncsi_dev_priv *ndp;
 	unsigned int package_id;
-- 
2.17.0

^ permalink raw reply related
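
The off-by-one fixed above comes from a common netlink convention: a
table indexed by attribute type needs one slot for every value from 0 up
to and including the maximum attribute number. The sketch below shows
this with hypothetical DEMO_* names and a toy demo_parse() function; it
is not the NCSI code or the kernel's netlink parser, only an
illustration of why the array needs MAX + 1 entries.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical attribute numbering in the usual netlink style:
 * DEMO_ATTR_MAX is the highest valid attribute number, not a count. */
enum demo_attr {
	DEMO_ATTR_UNSPEC,
	DEMO_ATTR_IFINDEX,
	DEMO_ATTR_PACKAGE_ID,
	DEMO_ATTR_AFTER_LAST,
	DEMO_ATTR_MAX = DEMO_ATTR_AFTER_LAST - 1
};

struct demo_attr_val {
	int type;
	const char *payload;
};

/* Toy parser: clears table[0..maxtype] and stores each payload at the
 * index given by its type, so table must hold maxtype + 1 entries. */
static void demo_parse(const char *table[], int maxtype,
		       const struct demo_attr_val *in, size_t n)
{
	assert(maxtype >= 0);

	for (int i = 0; i <= maxtype; i++)
		table[i] = NULL;

	for (size_t i = 0; i < n; i++)
		if (in[i].type > 0 && in[i].type <= maxtype)
			table[in[i].type] = in[i].payload;
}
```

With a table declared as `const char *tb[DEMO_ATTR_MAX]`, the write to
`tb[DEMO_ATTR_MAX]` would land one element past the end of the array,
which is the kind of stack corruption the stack protector reported.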

* [PATCH net-next] net: netcp: ethss: remove unnecessary pointer set to NULL
From: YueHaibing @ 2018-05-31  3:48 UTC (permalink / raw)
  To: davem, w-kwok2, m-karicheri2; +Cc: netdev, linux-kernel, YueHaibing

The if statement has already made sure that 'slave->phy' is NULL.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
---
 drivers/net/ethernet/ti/netcp_ethss.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/net/ethernet/ti/netcp_ethss.c b/drivers/net/ethernet/ti/netcp_ethss.c
index 6a728d3..6e455a2 100644
--- a/drivers/net/ethernet/ti/netcp_ethss.c
+++ b/drivers/net/ethernet/ti/netcp_ethss.c
@@ -3206,7 +3206,6 @@ static void init_secondary_ports(struct gbe_priv *gbe_dev,
 		if (!slave->phy) {
 			dev_err(dev, "phy not found for slave %d\n",
 				slave->slave_num);
-			slave->phy = NULL;
 		} else {
 			dev_dbg(dev, "phy found: id is: 0x%s\n",
 				phydev_name(slave->phy));
-- 
2.7.0

^ permalink raw reply related

* Re: [PATCH net-next 1/3] net: Add support to configure SR-IOV VF minimum and maximum queues.
From: Samudrala, Sridhar @ 2018-05-31  3:35 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: Michael Chan, David Miller, Netdev, Or Gerlitz
In-Reply-To: <20180530155337.691f1ad4@cakuba>

On 5/30/2018 3:53 PM, Jakub Kicinski wrote:
> On Wed, 30 May 2018 14:23:06 -0700, Samudrala, Sridhar wrote:
>> On 5/29/2018 11:33 PM, Jakub Kicinski wrote:
>>> On Tue, 29 May 2018 23:08:11 -0700, Michael Chan wrote:
>>>> On Tue, May 29, 2018 at 10:56 PM, Jakub Kicinski wrote:
>>>>> On Tue, 29 May 2018 20:19:54 -0700, Michael Chan wrote:
>>>>>> On Tue, May 29, 2018 at 1:46 PM, Samudrala, Sridhar wrote:
>>>>>>> Isn't ndo_set_vf_xxx() considered a legacy interface and not planned to be
>>>>>>> extended?
>>>>> +1 it's painful to see this feature being added to the legacy
>>>>> API :(  Another duplicated configuration knob.
>>>>>   
>>>>>> I didn't know about that.
>>>>>>   
>>>>>>> Shouldn't we enable this via ethtool on the port representor netdev?
>>>>>> We discussed about this.  ethtool on the VF representor will only work
>>>>>> in switchdev mode and also will not support min/max values.
>>>>> Ethtool channel API may be overdue a rewrite in devlink anyway, but I
>>>>> feel like implementing switchdev mode and rewriting features in devlink
>>>>> may be too much to ask.
>>>> Totally agreed.  And switchdev mode doesn't seem to be that widely
>>>> used at the moment.  Do you have other suggestions besides NDO?
>>> At some point you (Broadcom) were working on a whole bunch of devlink
>>> configuration options for the PCIe side of the ASIC.  The number of
>>> queues relates to things like number of allocated MSI-X vectors, which
>>> if memory serves me was in your devlink patch set.  In an ideal world
>>> we would try to keep all those in one place :)
>>>
>>> For PCIe config there is always the question of what can be configured
>>> at runtime, and what requires a HW reset.  Therefore that devlink API
>>> which could configure current as well as persistent device settings was
>>> quite nice.  I'm not sure if reallocating queues would ever require
>>> PCIe block reset but maybe...  Certainly it seems the notion of min
>>> queues would make more sense in PCIe configuration devlink API than
>>> ethtool channel API to me as well.
>>>
>>> Queues are in the grey area between netdev and non-netdev constructs.
>>> They make sense both from PCIe resource allocation perspective (i.e.
>>> devlink PCIe settings) and netdev perspective (ethtool) because they
>>> feed into things like qdisc offloads, maybe per-queue stats etc.
>>>
>>> So yes...  IMHO it would be nice to add this to a devlink SR-IOV config
>>> API and/or switchdev representors.  But neither of those are really an
>>> option for you today so IDK :)
>> One reason why 'switchdev' mode is not yet widely used or enabled by default
>> could be due to the requirement to program the flow rules only via slow path.
> Do you mean the fallback traffic requirement?

Yes.

>
>> Would it make sense to relax this requirement and support a mode where port
>> representors are created and let the PF driver implement a default policy that
>> adds flow rules for all the VFs to enable connectivity and let the user
>> add/modify the rules via port representors?
> I definitely share your concerns, stopping a major HW vendor from using
> this new and preferred mode is not helping us make progress.
>
> The problem is that if we allow this diversion, i.e. driver to implement
> some special policy, or pre-populate a bridge in a configuration that
> suits the HW we may condition users to expect that as the standard Linux
> behaviour.  And we will be stuck with it forever even tho your next gen
> HW (ice?) may support correct behaviour.

Yes. ice can support slowpath behavior as required to support OVS offload.
However, i was just wondering if we should have an option to allow switchdev
without slowpath so that the user can use alternate mechanisms to program
the flow rules instead of having to use OVS.


>
> We should perhaps separate switchdev mode from TC flower/OvS offloads.
> Is your objective to implement OvS offload or just switchdev mode?
>
> For OvS without proper fallback behaviour you may struggle.
>
> Switchdev mode could be within your reach even without changing the
> default rules.  What if you spawned all port netdevs (I dislike the
> term representor, sorry, it's confusing people) in down state and then
> refuse to bring them up unless user instantiated a bridge that would
> behave in a way that your HW can support?  If ports are down you won't
> have fallback traffic so no problem to solve.

If we want to use port netdev's admin state to control the link state of the
VFs then this will not work.
We need to only disable TX/RX but admin state and link state need to be
supported on the port netdevs.

^ permalink raw reply

* Re: [RFC V5 PATCH 8/8] vhost: event suppression for packed ring
From: Jason Wang @ 2018-05-31  3:09 UTC (permalink / raw)
  To: Wei Xu; +Cc: mst, kvm, virtualization, netdev, linux-kernel, jfreimann,
	tiwei.bie
In-Reply-To: <20180530114200.GA23792@wei-ubt>



On 2018年05月30日 19:42, Wei Xu wrote:
>>   /* This actually signals the guest, using eventfd. */
>>   void vhost_signal(struct vhost_dev *dev, struct vhost_virtqueue *vq)
>>   {
>> @@ -2802,10 +2930,34 @@ static bool vhost_enable_notify_packed(struct vhost_dev *dev,
>>   				       struct vhost_virtqueue *vq)
>>   {
>>   	struct vring_desc_packed *d = vq->desc_packed + vq->avail_idx;
>> -	__virtio16 flags;
>> +	__virtio16 flags = RING_EVENT_FLAGS_ENABLE;
>>   	int ret;
>>   
>> -	/* FIXME: disable notification through device area */
>> +	if (!(vq->used_flags & VRING_USED_F_NO_NOTIFY))
>> +		return false;
>> +	vq->used_flags &= ~VRING_USED_F_NO_NOTIFY;
> 'used_flags' was originally designed for 1.0, why should we pay attention to it here?
>
> Wei

It was used to record whether or not we've disabled notification. Then 
we can avoid unnecessary userspace writes or memory barriers.

Thanks

^ permalink raw reply

* Re: [PATCH net-next v12 2/5] netvsc: refactor notifier/event handling code to use the failover framework
From: Samudrala, Sridhar @ 2018-05-31  3:03 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: alexander.h.duyck, virtio-dev, jiri, mst, kubakici, netdev,
	virtualization, loseweigh, anjali.singhai, aaron.f.brown, davem
In-Reply-To: <20180530220635.206ee6d7@shemminger-XPS-13-9360>



On 5/30/2018 7:06 PM, Stephen Hemminger wrote:
> On Thu, 24 May 2018 09:55:14 -0700
> Sridhar Samudrala <sridhar.samudrala@intel.com> wrote:
>
>> Use the registration/notification framework supported by the generic
>> failover infrastructure.
>>
>> Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
> Why was this merged? It was never signed off by any of the netvsc maintainers,
> and there were still issues unresolved.
>
> There are also namespaces issues I am fixing and this breaks them.
> Will start my patch set with a revert for this. Sorry

I would appreciate if you can make the fixes on top of this patch series. I tried hard
to make sure that netvsc functionality and behavior doesn't change.

It is possible that there could be some bugs introduced, but they can be fixed.
Looks like Wei already found a bug and submitted a fix for that.

^ permalink raw reply

* Re: [PATCH net-next v12 1/5] net: Introduce generic failover module
From: Samudrala, Sridhar @ 2018-05-31  2:58 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: mst, davem, netdev, virtualization, virtio-dev, jesse.brandeburg,
	alexander.h.duyck, kubakici, jasowang, loseweigh, jiri,
	aaron.f.brown, anjali.singhai
In-Reply-To: <20180530225259.2cf3f7be@shemminger-XPS-13-9360>



On 5/30/2018 7:52 PM, Stephen Hemminger wrote:
> On Fri, 25 May 2018 16:06:58 -0700
> "Samudrala, Sridhar" <sridhar.samudrala@intel.com> wrote:
>
>> On 5/25/2018 3:38 PM, Stephen Hemminger wrote:
>>> On Thu, 24 May 2018 09:55:13 -0700
>>> Sridhar Samudrala <sridhar.samudrala@intel.com> wrote:
>>>   
>>>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>>>> index 03ed492c4e14..0f4ba52b641d 100644
>>>> --- a/include/linux/netdevice.h
>>>> +++ b/include/linux/netdevice.h
>>>> @@ -1421,6 +1421,8 @@ struct net_device_ops {
>>>>     *	entity (i.e. the master device for bridged veth)
>>>>     * @IFF_MACSEC: device is a MACsec device
>>>>     * @IFF_NO_RX_HANDLER: device doesn't support the rx_handler hook
>>>> + * @IFF_FAILOVER: device is a failover master device
>>>> + * @IFF_FAILOVER_SLAVE: device is lower dev of a failover master device
>>>>     */
>>>>    enum netdev_priv_flags {
>>>>    	IFF_802_1Q_VLAN			= 1<<0,
>>>> @@ -1450,6 +1452,8 @@ enum netdev_priv_flags {
>>>>    	IFF_PHONY_HEADROOM		= 1<<24,
>>>>    	IFF_MACSEC			= 1<<25,
>>>>    	IFF_NO_RX_HANDLER		= 1<<26,
>>>> +	IFF_FAILOVER			= 1<<27,
>>>> +	IFF_FAILOVER_SLAVE		= 1<<28,
>>>>    };
>>> Why is FAILOVER any different than other master/slave relationships.
>>> I don't think you need to take up precious netdev flag bits for this.
>> These are netdev priv flags.
>> Jiri says that IFF_MASTER/IFF_SLAVE are bonding specific flags and cannot be used
>> with other failover mechanisms. Team also doesn't use these flags and it has its own
>> priv_flags.
>>
> This change breaks userspace.
> We already have worked with partners to ignore devices marked as IFF_SLAVE,
> and IFF_SLAVE is visible to user space API's.

I specifically made sure not to remove IFF_SLAVE in the netvsc patch.


>
> NAK

^ permalink raw reply

* Re: [PATCH net-next v12 1/5] net: Introduce generic failover module
From: Stephen Hemminger @ 2018-05-31  2:52 UTC (permalink / raw)
  To: Samudrala, Sridhar
  Cc: mst, davem, netdev, virtualization, virtio-dev, jesse.brandeburg,
	alexander.h.duyck, kubakici, jasowang, loseweigh, jiri,
	aaron.f.brown, anjali.singhai
In-Reply-To: <00d34f67-f26f-0b20-af3f-2add24ae8a5c@intel.com>

On Fri, 25 May 2018 16:06:58 -0700
"Samudrala, Sridhar" <sridhar.samudrala@intel.com> wrote:

> On 5/25/2018 3:38 PM, Stephen Hemminger wrote:
> > On Thu, 24 May 2018 09:55:13 -0700
> > Sridhar Samudrala <sridhar.samudrala@intel.com> wrote:
> >  
> >> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> >> index 03ed492c4e14..0f4ba52b641d 100644
> >> --- a/include/linux/netdevice.h
> >> +++ b/include/linux/netdevice.h
> >> @@ -1421,6 +1421,8 @@ struct net_device_ops {
> >>    *	entity (i.e. the master device for bridged veth)
> >>    * @IFF_MACSEC: device is a MACsec device
> >>    * @IFF_NO_RX_HANDLER: device doesn't support the rx_handler hook
> >> + * @IFF_FAILOVER: device is a failover master device
> >> + * @IFF_FAILOVER_SLAVE: device is lower dev of a failover master device
> >>    */
> >>   enum netdev_priv_flags {
> >>   	IFF_802_1Q_VLAN			= 1<<0,
> >> @@ -1450,6 +1452,8 @@ enum netdev_priv_flags {
> >>   	IFF_PHONY_HEADROOM		= 1<<24,
> >>   	IFF_MACSEC			= 1<<25,
> >>   	IFF_NO_RX_HANDLER		= 1<<26,
> >> +	IFF_FAILOVER			= 1<<27,
> >> +	IFF_FAILOVER_SLAVE		= 1<<28,
> >>   };  
> > Why is FAILOVER any different than other master/slave relationships.
> > I don't think you need to take up precious netdev flag bits for this.  
> 
> These are netdev priv flags.
> Jiri says that IFF_MASTER/IFF_SLAVE are bonding specific flags and cannot be used
> with other failover mechanisms. Team also doesn't use these flags and it has its own
> priv_flags.
> 

This change breaks userspace.
We already have worked with partners to ignore devices marked as IFF_SLAVE,
and IFF_SLAVE is visible to user space API's.

NAK

^ permalink raw reply

* [PATCH net-next] net/smc: fix error return code in smc_setsockopt()
From: Wei Yongjun @ 2018-05-31  2:31 UTC (permalink / raw)
  To: Ursula Braun; +Cc: Wei Yongjun, linux-s390, netdev, kernel-janitors

Fix to return error code -EINVAL instead of 0 if optlen is invalid.

Fixes: 01d2f7e2cdd3 ("net/smc: sockopts TCP_NODELAY and TCP_CORK")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
---
 net/smc/af_smc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
index 2c369d4..973b447 100644
--- a/net/smc/af_smc.c
+++ b/net/smc/af_smc.c
@@ -1420,7 +1420,7 @@ static int smc_setsockopt(struct socket *sock, int level, int optname,
 		return rc;
 
 	if (optlen < sizeof(int))
-		return rc;
+		return -EINVAL;
 	get_user(val, (int __user *)optval);
 
 	lock_sock(sk);

^ permalink raw reply related
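
The bug fixed above is a length check that returns a stale success code.
A userspace sketch of the corrected pattern is shown below.
demo_read_int_opt() is a hypothetical name, memcpy() stands in for the
kernel's get_user(), and the NULL check is an addition for the sketch
that is not part of the patch.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy an int-sized option value out of a caller
 * buffer. Returns 0 on success, -EINVAL if the buffer is too short,
 * -EFAULT if a pointer is missing. */
static int demo_read_int_opt(const void *optval, size_t optlen, int *val)
{
	if (optval == NULL || val == NULL)
		return -EFAULT;

	/* Return an explicit error here. Returning a status variable that
	 * still holds 0 from an earlier successful call would report
	 * success for a too-short buffer. */
	if (optlen < sizeof(int))
		return -EINVAL;

	memcpy(val, optval, sizeof(int));
	return 0;
}
```

Callers can then tell a rejected option apart from an accepted one
instead of silently continuing with an unset value.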

* [PATCH net-next] net/mlx5: Make function mlx5_fpga_tls_send_teardown_cmd() static
From: Wei Yongjun @ 2018-05-31  2:31 UTC (permalink / raw)
  To: Boris Pismenny, Saeed Mahameed, Leon Romanovsky, Ilya Lesokhin
  Cc: Wei Yongjun, netdev, linux-rdma, kernel-janitors

Fixes the following sparse warning:

drivers/net/ethernet/mellanox/mlx5/core/fpga/tls.c:199:6: warning:
 symbol 'mlx5_fpga_tls_send_teardown_cmd' was not declared. Should it be static?

Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/fpga/tls.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fpga/tls.c b/drivers/net/ethernet/mellanox/mlx5/core/fpga/tls.c
index 2104801..c973623 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fpga/tls.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fpga/tls.c
@@ -196,8 +196,8 @@ static void mlx5_fpga_tls_flow_to_cmd(void *flow, void *cmd)
 		 MLX5_GET(tls_flow, flow, direction_sx));
 }
 
-void mlx5_fpga_tls_send_teardown_cmd(struct mlx5_core_dev *mdev, void *flow,
-				     u32 swid, gfp_t flags)
+static void mlx5_fpga_tls_send_teardown_cmd(struct mlx5_core_dev *mdev,
+					    void *flow, u32 swid, gfp_t flags)
 {
 	struct mlx5_teardown_stream_context *ctx;
 	struct mlx5_fpga_dma_buf *buf;

^ permalink raw reply related

* Re: [PATCH v2] Revert "alx: remove WoL support"
From: AceLan Kao @ 2018-05-31  2:13 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Jay Cliburn, Chris Snook, David S . Miller, Rakesh Pandit, netdev,
	Emily Chien, Johannes Berg, Johannes Stezenbach,
	Linux-Kernel@Vger. Kernel. Org
In-Reply-To: <20180530135859.GB27537@lunn.ch>

Hi Andrew,

2018-05-30 21:58 GMT+08:00 Andrew Lunn <andrew@lunn.ch>:
> On Wed, May 30, 2018 at 10:10:08AM +0800, AceLan Kao wrote:
>> This reverts commit bc2bebe8de8ed4ba6482c9cc370b0dd72ffe8cd2.
>>
>> The WoL feature is a must to pass Energy Star 6.1 and above,
>> the power consumption will be measured during S3 with WoL enabled.
>>
>> Reverting "alx: remove WoL support", and will try to fix the unintentional
>> wake up issue when WoL is enabled.
>
> Hi AceLan
>
> I find this change log entry rather odd.
>
> If i remember correctly, you first argued that you did not want to
> have to distribute out of tree patches.
Yes, once secure boot is enabled, no dkms driver can be loaded.

>
> It was suggested that you might be able to justify the revert using
> the argument that the cure is worse than the disease. You ignored
I didn't mean to ignore it; maybe I misunderstood what you said. I thought
you did not like the driver parameter, so I only restored the alx wol
feature.

> that, and went with this Energy Star argument. That got shot down by
> DaveM, and told to actually try to find the problem.
Passing Energy Star is my purpose; I'm sorry I did not mention it at the
beginning. We used to use dkms for the measurement, but secure boot is
coming, so we need the wol feature to be built into the kernel.

I've also written to the device owners for help, but they do not care much
about the wol feature and it is not convenient for them to test. So I was
stuck here until I saw the user report.

>
> So you then come back and said you think the problem is fixed, but
> don't know exactly what fixed it. So DaveM said try again.
That's another user's report, not mine; please refer to the link below
https://bugzilla.kernel.org/show_bug.cgi?id=61651#c126

We have no wake-up issue and can't reproduce it on my side.

>
> Now you are back to Energy Star.
>
> I don't get this. It was the fact you said it was probably fixed that
> made DaveM reconsider. That is the argument you should be using in the
> change log. We want to know what testing you have done. See a
> tested-by: from somebody who had the issue which caused the revert,
> and now says the issue is fixed.
Thanks for reminding me, and sorry for my ignorance. I never thought of adding
a Tested-by: tag to the commit message; I'll ask the reporter to provide more
info and add his name there.

I hope my explanation clears up the misunderstanding.
I'll submit a v3 patch once I get the info from the reporter.

>
> Ideally we would like to know which change actually fixed the issue,
> so it can be added to stable. But that requires somebody to do a long
> git bisect.
>
>     Andrew
Thanks,

Best regards,
AceLan Kao.

^ permalink raw reply

* Re: [PATCH net-next v12 2/5] netvsc: refactor notifier/event handling code to use the failover framework
From: Stephen Hemminger @ 2018-05-31  2:06 UTC (permalink / raw)
  To: Sridhar Samudrala
  Cc: mst, davem, netdev, virtualization, virtio-dev, jesse.brandeburg,
	alexander.h.duyck, kubakici, jasowang, loseweigh, jiri,
	aaron.f.brown, anjali.singhai
In-Reply-To: <1527180917-39737-3-git-send-email-sridhar.samudrala@intel.com>

On Thu, 24 May 2018 09:55:14 -0700
Sridhar Samudrala <sridhar.samudrala@intel.com> wrote:

> Use the registration/notification framework supported by the generic
> failover infrastructure.
> 
> Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>

Why was this merged? It was never signed off by any of the netvsc maintainers,
and there were still issues unresolved.

There are also namespace issues I am fixing and this breaks them.
Will start my patch set with a revert for this. Sorry.

^ permalink raw reply

* Re: [PATCH net-next] virtio_net: fix error return code in virtnet_probe()
From: Jason Wang @ 2018-05-31  2:05 UTC (permalink / raw)
  To: Wei Yongjun, Michael S. Tsirkin, Sridhar Samudrala
  Cc: virtualization, netdev, kernel-janitors
In-Reply-To: <1527732307-145609-1-git-send-email-weiyongjun1@huawei.com>



On 2018-05-31 10:05, Wei Yongjun wrote:
> Fix to return a negative error code from the failover create fail error
> handling case instead of 0, as done elsewhere in this function.
>
> Fixes: ba5e4426e80e ("virtio_net: Extend virtio to use VF datapath when available")
> Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
> ---
>   drivers/net/virtio_net.c | 4 +++-
>   1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 8f08a3e..2d55e2a 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -2935,8 +2935,10 @@ static int virtnet_probe(struct virtio_device *vdev)
>   
>   	if (virtio_has_feature(vdev, VIRTIO_NET_F_STANDBY)) {
>   		vi->failover = net_failover_create(vi->dev);
> -		if (IS_ERR(vi->failover))
> +		if (IS_ERR(vi->failover)) {
> +			err = PTR_ERR(vi->failover);
>   			goto free_vqs;
> +		}
>   	}
>   
>   	err = register_netdev(dev);
>

Acked-by: Jason Wang <jasowang@redhat.com>

^ permalink raw reply

* [PATCH net-next] virtio_net: fix error return code in virtnet_probe()
From: Wei Yongjun @ 2018-05-31  2:05 UTC (permalink / raw)
  To: Michael S. Tsirkin, Jason Wang, Sridhar Samudrala
  Cc: Wei Yongjun, virtualization, netdev, kernel-janitors

Fix to return a negative error code from the failover create fail error
handling case instead of 0, as done elsewhere in this function.

Fixes: ba5e4426e80e ("virtio_net: Extend virtio to use VF datapath when available")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
---
 drivers/net/virtio_net.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 8f08a3e..2d55e2a 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -2935,8 +2935,10 @@ static int virtnet_probe(struct virtio_device *vdev)

 	if (virtio_has_feature(vdev, VIRTIO_NET_F_STANDBY)) {
 		vi->failover = net_failover_create(vi->dev);
-		if (IS_ERR(vi->failover))
+		if (IS_ERR(vi->failover)) {
+			err = PTR_ERR(vi->failover);
 			goto free_vqs;
+		}
 	}

 	err = register_netdev(dev);

^ permalink raw reply related
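
The bug class fixed by the patch above can be reproduced outside the kernel.
What follows is a minimal userspace sketch, assuming simplified stand-ins for
the kernel's ERR_PTR(), IS_ERR() and PTR_ERR() helpers. The names
create_failover(), probe_buggy() and probe_fixed() are hypothetical and are
not taken from virtio_net.c; they only mirror the shape of the error path.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified userspace stand-ins for the helpers in <linux/err.h>. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)(intptr_t)error; }
static inline long PTR_ERR(const void *ptr) { return (long)(intptr_t)ptr; }
static inline int IS_ERR(const void *ptr)
{
	/* Error codes live in the top MAX_ERRNO values of the address space. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical constructor that always fails with -ENOMEM. */
static void *create_failover(void)
{
	return ERR_PTR(-ENOMEM);
}

/* Buggy shape: err still holds 0 from an earlier successful step. */
static int probe_buggy(void)
{
	int err = 0;
	void *failover = create_failover();

	if (IS_ERR(failover))
		goto out;	/* err is still 0, so the caller sees success */
out:
	return err;
}

/* Fixed shape: copy the encoded error out of the pointer before bailing. */
static int probe_fixed(void)
{
	int err = 0;
	void *failover = create_failover();

	if (IS_ERR(failover)) {
		err = (int)PTR_ERR(failover);
		goto out;
	}
out:
	return err;
}
```

In this sketch probe_buggy() reports success even though the constructor
failed, while probe_fixed() returns -ENOMEM, which is the behaviour the patch
restores for the failover error path.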

* [PATCH net-next] hv_netvsc: fix error return code in netvsc_probe()
From: Wei Yongjun @ 2018-05-31  2:04 UTC (permalink / raw)
  To: K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger,
	Sridhar Samudrala
  Cc: Wei Yongjun, devel, netdev, kernel-janitors

Fix to return a negative error code from the failover register fail
error handling case instead of 0, as done elsewhere in this function.

Fixes: 1ff78076d8dd ("netvsc: refactor notifier/event handling code to use the failover framework")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
---
 drivers/net/hyperv/netvsc_drv.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index ebe9642..bef4d55 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -2031,8 +2031,10 @@ static int netvsc_probe(struct hv_device *dev,
 	}
 
 	net_device_ctx->failover = failover_register(net, &netvsc_failover_ops);
-	if (IS_ERR(net_device_ctx->failover))
+	if (IS_ERR(net_device_ctx->failover)) {
+		ret = PTR_ERR(net_device_ctx->failover);
 		goto err_failover;
+	}
 
 	return ret;

^ permalink raw reply related

* Re: [PATCH] [net-next, wrong] make BPFILTER_UMH depend on X86
From: Masahiro Yamada @ 2018-05-31  1:42 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Arnd Bergmann, David S. Miller, Alexei Starovoitov,
	Linux Kbuild mailing list, netdev, Linux Kernel Mailing List
In-Reply-To: <20180530151736.nzpde2bgzn4koi7f@ast-mbp>

2018-05-31 0:17 GMT+09:00 Alexei Starovoitov <alexei.starovoitov@gmail.com>:
> On Mon, May 28, 2018 at 05:31:01PM +0200, Arnd Bergmann wrote:
>> When build testing across architectures, I run into a build error on
>> all targets other than X86:
>>
>> gcc-8.1.0-nolibc/arm-linux-gnueabi/bin/arm-linux-gnueabi-objdump: net/bpfilter/bpfilter_umh: File format not recognized
>> gcc-8.1.0-nolibc/arm-linux-gnueabi/bin/arm-linux-gnueabi-objcopy:net/bpfilter/bpfilter_umh.o: Invalid bfd target
>>
>> The problem is that 'hostprogs' get built with 'gcc' rather than
>> '$(CROSS_COMPILE)gcc', and my default gcc (as most people's) targets x86.
>>
>> To work around it, adding an X86 dependency gets randconfigs building
>> again on my box.
>>
>> Clearly, this is not a good solution, since it should actually work fine
>> when building native kernels on other architectures but that is now
>> disabled, while cross building an x86 kernel on another host is still
>> broken after my patch.
>>
>> What we probably want here is to try out if the compiler is able to build
>> executables for the target architecture and not build the helper otherwise,
>> at least when compile-testing. No idea how to do that though.
>>
>> Link: http://www.kernel.org/pub/tools/crosstool/
>> Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
>> Cc: linux-kbuild@vger.kernel.org
>> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
>> ---
>>  net/bpfilter/Kconfig | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/net/bpfilter/Kconfig b/net/bpfilter/Kconfig
>> index 60725c5f79db..61cc4fcbb4d0 100644
>> --- a/net/bpfilter/Kconfig
>> +++ b/net/bpfilter/Kconfig
>> @@ -9,6 +9,7 @@ menuconfig BPFILTER
>>  if BPFILTER
>>  config BPFILTER_UMH
>>       tristate "bpfilter kernel module with user mode helper"
>> +     depends on X86 # actually depends on native builds
>
> depends on X86 will break it on arm.
> I think the better short term fix would be to test that HOSTCC == CC
> It doesn't have to be the same compiler. HOSTCC's arch == kernel ARCH
> Not sure how to hack makefile to do that.
> Long term we need to get rid of HOSTCC dependency.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kbuild" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

Hmm.
For cross-compiling, we set 'ARCH' via the environment variable or the
command line.

If ARCH is not explicitly set, the top-level Makefile sets it to $(SUBARCH):


ARCH ?= $(SUBARCH)


Maybe, we can assume the native build if $(ARCH) and $(SUBARCH) are the same?


-- 
Best Regards
Masahiro Yamada

^ permalink raw reply
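
The check proposed in the message above can be written down as plain logic.
This is only a sketch under stated assumptions: effective_arch() and
is_native_build() are hypothetical helper names, and SUBARCH is assumed to be
the architecture the top-level Makefile detects for the build host. It is not
a kbuild patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper mirroring "ARCH ?= $(SUBARCH)": when ARCH is not
 * defined (NULL here), it falls back to SUBARCH.
 */
static const char *effective_arch(const char *arch, const char *subarch)
{
	assert(subarch != NULL);
	return arch != NULL ? arch : subarch;
}

/* Treat the build as native when the effective ARCH equals SUBARCH. */
static int is_native_build(const char *arch, const char *subarch)
{
	return strcmp(effective_arch(arch, subarch), subarch) == 0;
}
```

With these helpers, is_native_build(NULL, "x86") and
is_native_build("x86", "x86") are both true, while
is_native_build("arm", "x86") is false, so a helper built with the host
compiler would be skipped only in the cross-build case.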

* Re: [PATCH net-next] net: qcom/emac: fix unused variable
From: YueHaibing @ 2018-05-31  1:34 UTC (permalink / raw)
  To: Timur Tabi, davem; +Cc: netdev, linux-kernel
In-Reply-To: <0db547d4-27c1-a9f7-f443-86bebd831cb2@codeaurora.org>


On 2018/5/30 20:10, Timur Tabi wrote:
> On 5/29/18 5:43 AM, YueHaibing wrote:
>> When CONFIG_ACPI isn't set, variable qdf2400_ops/qdf2432_ops isn't used.
>> drivers/net/ethernet/qualcomm/emac/emac-sgmii.c:284:25: warning: ‘qdf2400_ops’ defined but not used [-Wunused-variable]
>>   static struct sgmii_ops qdf2400_ops = {
>>                           ^~~~~~~~~~~
>> drivers/net/ethernet/qualcomm/emac/emac-sgmii.c:276:25: warning: ‘qdf2432_ops’ defined but not used [-Wunused-variable]
>>   static struct sgmii_ops qdf2432_ops = {
>>                           ^~~~~~~~~~~
>>
>> Move the declaration and functions inside the CONFIG_ACPI ifdef
>> to fix the warning.
>> Signed-off-by: YueHaibing<yuehaibing@huawei.com>
> 
> I already fixed this with:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/commit/?id=d377df784178bf5b0a39e75dc8b1ee86e1abb3f6
>

Oh, I should have noticed this, thanks.

^ permalink raw reply

* [PATCH bpf-next] xsk: temporarily disable AF_XDP
From: Björn Töpel @ 2018-05-31  0:17 UTC (permalink / raw)
  To: ast, daniel, netdev
  Cc: Björn Töpel, magnus.karlsson, magnus.karlsson

From: Björn Töpel <bjorn.topel@intel.com>

Temporarily disable AF_XDP sockets, and hide uapi.

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
---
 include/{uapi => }/linux/if_xdp.h | 0
 net/xdp/Kconfig                   | 2 +-
 2 files changed, 1 insertion(+), 1 deletion(-)
 rename include/{uapi => }/linux/if_xdp.h (100%)

diff --git a/include/uapi/linux/if_xdp.h b/include/linux/if_xdp.h
similarity index 100%
rename from include/uapi/linux/if_xdp.h
rename to include/linux/if_xdp.h
diff --git a/net/xdp/Kconfig b/net/xdp/Kconfig
index 90e4a7152854..d845606dae7b 100644
--- a/net/xdp/Kconfig
+++ b/net/xdp/Kconfig
@@ -1,5 +1,5 @@
 config XDP_SOCKETS
-	bool "XDP sockets"
+	bool "XDP sockets" if n
 	depends on BPF_SYSCALL
 	default n
 	help
-- 
2.14.1

^ permalink raw reply related

* Re: [PATCH net] mlx4_core: restore optimal ICM memory allocation
From: Qing Huang @ 2018-05-30 23:03 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David Miller, netdev, Eric Dumazet, John Sperbeck, Tarick Bedeir,
	Daniel Jurgens, Zhu Yanjun, Tariq Toukan, linux-rdma,
	Santosh Shilimkar
In-Reply-To: <CANn89i+K3LS7+idPBOkcaBPBdBwz9tWCLqkzbWZA10W2mmR9hA@mail.gmail.com>

On 5/30/2018 2:30 PM, Eric Dumazet wrote:
> On Wed, May 30, 2018 at 5:08 PM Qing Huang<qing.huang@oracle.com>  wrote:
>>
>> On 5/30/2018 1:50 PM, Eric Dumazet wrote:
>>> On Wed, May 30, 2018 at 4:30 PM Qing Huang<qing.huang@oracle.com>  wrote:
>>>> On 5/29/2018 9:11 PM, Eric Dumazet wrote:
>>>>> Commit 1383cb8103bb ("mlx4_core: allocate ICM memory in page size chunks")
>>>>> brought a regression caught in our regression suite, thanks to KASAN.
>>>> If KASAN reported issue was really caused by smaller chunk sizes,
>>>> changing allocation
>>>> order dynamically will eventually hit the same issue.
>>> Sigh, you have little idea of what your patch really did...
>>>
>>> The KASAN part only shows the tip of the iceberg, but our main concern
>>> is an increase of memory overhead.
>> Well, the commit log only mentioned KASAN, but the change here didn't
>> seem to solve the issue.
> Can you elaborate ?
>
> My patch solves our problems.
>
> Both the memory overhead and KASAN splats are gone.

If the KASAN issue was triggered by using smaller chunks, then under memory
pressure with lots of fragmentation, low-order memory allocation will do
similar things. So perhaps in your test environment memory allocation and
usage are relatively static, which is probably why using larger chunks never
exercised the low-order allocation code path, and hence no KASAN issue was
spotted.

A smaller chunk size in the mlx4 driver is not supposed to cause any memory
corruption. We will probably need to continue investigating this. Can you
provide the test command that triggers this issue when running a KASAN kernel,
so we can try to reproduce it in our lab? It could be that the upstream code
is missing some other fixes.

>>> Alternative is to revert your patch, since we are now very late in 4.17 cycle.
>>>
>>> Memory usage has grown a lot with your patch, since each 4KB page needs a full
>>> struct mlx4_icm_chunk (256 bytes of overhead !)
>> Going to smaller chunks will have some overhead. It depends on the
>> application though.
>> What's the total increased memory consumption in your env?
> As I explained, your patch adds 256 bytes of overhead per 4KB.
>
> Your changelog did not mention that at all, and we discovered this
> the hard way.

If you have concerns regarding memory usage, you should have brought them up
during code review.

Repeated failure and retry for lower-order allocations could be bad for
latency too. This wasn't mentioned in this commit either.

Like I said, how much overhead there is really depends on the application.
256 bytes x chunks may not be significant on a server with lots of memory.
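
For reference, the figure under discussion (256 bytes of struct
mlx4_icm_chunk per 4KB page, as quoted above) can be turned into concrete
numbers. The following is a small sketch of that arithmetic only, assuming
exactly one chunk per 4KB page; the helper names are hypothetical and the
constants are the ones quoted in this thread, not values read from the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Figures quoted in the thread; not read from the driver source. */
#define CHUNK_OVERHEAD_BYTES 256u	/* per struct mlx4_icm_chunk */
#define PAGE_BYTES           4096u	/* one 4KB page per chunk */

/* Hypothetical helper: bookkeeping bytes needed to map icm_bytes of ICM. */
static uint64_t icm_overhead_bytes(uint64_t icm_bytes)
{
	/* Round up: a partially used page still needs a whole chunk. */
	uint64_t chunks = (icm_bytes + PAGE_BYTES - 1) / PAGE_BYTES;

	return chunks * CHUNK_OVERHEAD_BYTES;
}

/* Overhead in basis points (hundredths of a percent), integer math only. */
static unsigned int icm_overhead_bp(uint64_t icm_bytes)
{
	assert(icm_bytes > 0);
	return (unsigned int)(icm_overhead_bytes(icm_bytes) * 10000 / icm_bytes);
}
```

Under these assumptions, 1 GiB of ICM needs 64 MiB of chunk structures, which
is 625 basis points, i.e. 6.25%. Whether that is significant depends on how
much ICM a given workload actually maps.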

> That is pretty intolerable, and is a blocker for us, memory is precious.

> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH net-next 1/3] net: Add support to configure SR-IOV VF minimum and maximum queues.
From: Jakub Kicinski @ 2018-05-30 22:53 UTC (permalink / raw)
  To: Samudrala, Sridhar; +Cc: Michael Chan, David Miller, Netdev, Or Gerlitz
In-Reply-To: <6dd76ffc-4097-0e5e-6f66-78fd178f89c2@intel.com>

On Wed, 30 May 2018 14:23:06 -0700, Samudrala, Sridhar wrote:
> On 5/29/2018 11:33 PM, Jakub Kicinski wrote:
> > On Tue, 29 May 2018 23:08:11 -0700, Michael Chan wrote:  
> >> On Tue, May 29, 2018 at 10:56 PM, Jakub Kicinski wrote:  
> >>> On Tue, 29 May 2018 20:19:54 -0700, Michael Chan wrote:  
> >>>> On Tue, May 29, 2018 at 1:46 PM, Samudrala, Sridhar wrote:  
> >>>>> Isn't ndo_set_vf_xxx() considered a legacy interface and not planned to be
> >>>>> extended?  
> >>> +1 it's painful to see this feature being added to the legacy
> >>> API :(  Another duplicated configuration knob.
> >>>  
> >>>> I didn't know about that.
> >>>>  
> >>>>> Shouldn't we enable this via ethtool on the port representor netdev?  
> >>>> We discussed about this.  ethtool on the VF representor will only work
> >>>> in switchdev mode and also will not support min/max values.  
> >>> Ethtool channel API may be overdue a rewrite in devlink anyway, but I
> >>> feel like implementing switchdev mode and rewriting features in devlink
> >>> may be too much to ask.  
> >> Totally agreed.  And switchdev mode doesn't seem to be that widely
> >> used at the moment.  Do you have other suggestions besides NDO?  
> > At some point you (Broadcom) were working on a whole bunch of devlink
> > configuration options for the PCIe side of the ASIC.  The number of
> > queues relates to things like number of allocated MSI-X vectors, which
> > if memory serves me was in your devlink patch set.  In an ideal world
> > we would try to keep all those in one place :)
> >
> > For PCIe config there is always the question of what can be configured
> > at runtime, and what requires a HW reset.  Therefore that devlink API
> > which could configure current as well as persistent device settings was
> > quite nice.  I'm not sure if reallocating queues would ever require
> > PCIe block reset but maybe...  Certainly it seems the notion of min
> > queues would make more sense in PCIe configuration devlink API than
> > ethtool channel API to me as well.
> >
> > Queues are in the grey area between netdev and non-netdev constructs.
> > They make sense both from PCIe resource allocation perspective (i.e.
> > devlink PCIe settings) and netdev perspective (ethtool) because they
> > feed into things like qdisc offloads, maybe per-queue stats etc.
> >
> > So yes...  IMHO it would be nice to add this to a devlink SR-IOV config
> > API and/or switchdev representors.  But neither of those are really an
> > option for you today so IDK :)  
> 
> One reason why 'switchdev' mode is not yet widely used or enabled by default
> could be due to the requirement to program the flow rules only via slow path.

Do you mean the fallback traffic requirement?

> Would it make sense to relax this requirement and support a mode where port
> representors are created and let the PF driver implement a default policy that
> adds flow rules for all the VFs to enable connectivity and let the user
> add/modify the rules via port representors?

I definitely share your concerns, stopping a major HW vendor from using
this new and preferred mode is not helping us make progress.

The problem is that if we allow this diversion, i.e. the driver implementing
some special policy, or pre-populating a bridge in a configuration that
suits the HW, we may condition users to expect that as the standard Linux
behaviour.  And we will be stuck with it forever even though your next gen
HW (ice?) may support correct behaviour.

We should perhaps separate switchdev mode from TC flower/OvS offloads.
Is your objective to implement OvS offload or just switchdev mode?  

For OvS without proper fallback behaviour you may struggle.

Switchdev mode could be within your reach even without changing the
default rules.  What if you spawned all port netdevs (I dislike the
term representor, sorry, it's confusing people) in down state and then
refused to bring them up unless the user instantiated a bridge that would
behave in a way that your HW can support?  If ports are down you won't
have fallback traffic so no problem to solve.

^ permalink raw reply
