Netdev List
* Re: [PATCH net-next V2 1/2] cls_flower: Fix missing free of rhashtable
From: Paul Blakey @ 2018-06-05  7:44 UTC (permalink / raw)
  To: David Miller
  Cc: paulb, jiri, xiyou.wangcong, jhs, netdev, kliteyn, roid, shahark,
	markb, ogerlitz
In-Reply-To: <20180604.170409.1557542563226321470.davem@davemloft.net>



On 05/06/2018 00:04, David Miller wrote:
> From: Paul Blakey <paulb@mellanox.com>
> Date: Sun,  3 Jun 2018 10:06:13 +0300
> 
>> When destroying the instance, destroy the head rhashtable.
>>
>> Fixes: 05cd271fd61a ("cls_flower: Support multiple masks per priority")
>> Reported-by: Vlad Buslov <vladbu@mellanox.com>
>> Reviewed-by: Roi Dayan <roid@mellanox.com>
>> Reviewed-by: Jiri Pirko <jiri@mellanox.com>
>> Signed-off-by: Paul Blakey <paulb@mellanox.com>
> 
> Applied.
> 

thanks.

* Re: [Patch net-next] netdev-FAQ: clarify DaveM's position for stable backports
From: Greg KH @ 2018-06-05  7:46 UTC (permalink / raw)
  To: Cong Wang; +Cc: netdev, stable
In-Reply-To: <20180604180719.6053-1-xiyou.wangcong@gmail.com>

On Mon, Jun 04, 2018 at 11:07:19AM -0700, Cong Wang wrote:
> Per discussion with David at netconf 2018, let's clarify
> DaveM's position of handling stable backports in netdev-FAQ.
> 
> This is important for people relying on upstream -stable
> releases.
> 
> Cc: stable@vger.kernel.org
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
> ---
>  Documentation/networking/netdev-FAQ.txt | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/Documentation/networking/netdev-FAQ.txt b/Documentation/networking/netdev-FAQ.txt
> index 2a3278d5cf35..6dde6686c870 100644
> --- a/Documentation/networking/netdev-FAQ.txt
> +++ b/Documentation/networking/netdev-FAQ.txt
> @@ -179,6 +179,15 @@ A: No.  See above answer.  In short, if you think it really belongs in
>     dash marker line as described in Documentation/process/submitting-patches.rst to
>     temporarily embed that information into the patch that you send.
>  
> +Q: Are all networking bug fixes backported to all stable releases?
> +
> +A: Due to capacity, Dave could only take care of the backports for the last
> +   3 stable releases. For earlier stable releases, each stable branch maintainer

s/3/2/

There's no need for Dave to do more work than he currently does :)

thanks,

greg k-h

* Re: [PATCH net-next 1/3] devlink: Add extack to reload and port_{un,}split operations
From: Jiri Pirko @ 2018-06-05  7:50 UTC (permalink / raw)
  To: dsahern; +Cc: netdev, idosch, jiri, jakub.kicinski, David Ahern
In-Reply-To: <20180604221503.20329-2-dsahern@kernel.org>

Tue, Jun 05, 2018 at 12:15:01AM CEST, dsahern@kernel.org wrote:
>From: David Ahern <dsahern@gmail.com>
>
>Add extack argument to reload, port_split and port_unsplit operations.
>
>Signed-off-by: David Ahern <dsahern@gmail.com>

Acked-by: Jiri Pirko <jiri@mellanox.com>

* Re: [PATCH net-next 3/3] mlxsw: Add extack messages for port_{un,}split failures
From: Jiri Pirko @ 2018-06-05  7:52 UTC (permalink / raw)
  To: dsahern; +Cc: netdev, idosch, jiri, jakub.kicinski, David Ahern
In-Reply-To: <20180604221503.20329-4-dsahern@kernel.org>

Tue, Jun 05, 2018 at 12:15:03AM CEST, dsahern@kernel.org wrote:
>From: David Ahern <dsahern@gmail.com>
>
>Return messages in extack for port split/unsplit errors. e.g.,
>    $ devlink port split swp1s1 count 4
>    Error: mlxsw_spectrum: Port cannot be split further.
>    devlink answers: Invalid argument
>
>    $ devlink port unsplit swp4
>    Error: mlxsw_spectrum: Port was not split.
>    devlink answers: Invalid argument
>
>Signed-off-by: David Ahern <dsahern@gmail.com>
>---
> drivers/net/ethernet/mellanox/mlxsw/core.c     | 14 ++++++++++----
> drivers/net/ethernet/mellanox/mlxsw/core.h     |  5 +++--
> drivers/net/ethernet/mellanox/mlxsw/spectrum.c | 13 +++++++++++--
> 3 files changed, 24 insertions(+), 8 deletions(-)
>
>diff --git a/drivers/net/ethernet/mellanox/mlxsw/core.c b/drivers/net/ethernet/mellanox/mlxsw/core.c
>index 7ed38d80bc08..f9c724752a32 100644
>--- a/drivers/net/ethernet/mellanox/mlxsw/core.c
>+++ b/drivers/net/ethernet/mellanox/mlxsw/core.c
>@@ -775,11 +775,14 @@ static int mlxsw_devlink_port_split(struct devlink *devlink,
> {
> 	struct mlxsw_core *mlxsw_core = devlink_priv(devlink);
> 
>-	if (port_index >= mlxsw_core->max_ports)
>+	if (port_index >= mlxsw_core->max_ports) {
>+		NL_SET_ERR_MSG_MOD(extack, "Port index exceeds maximum number of ports");
> 		return -EINVAL;
>+	}
> 	if (!mlxsw_core->driver->port_split)
> 		return -EOPNOTSUPP;
>-	return mlxsw_core->driver->port_split(mlxsw_core, port_index, count);
>+	return mlxsw_core->driver->port_split(mlxsw_core, port_index, count,
>+					      extack);
> }
> 
> static int mlxsw_devlink_port_unsplit(struct devlink *devlink,
>@@ -788,11 +791,14 @@ static int mlxsw_devlink_port_unsplit(struct devlink *devlink,
> {
> 	struct mlxsw_core *mlxsw_core = devlink_priv(devlink);
> 
>-	if (port_index >= mlxsw_core->max_ports)
>+	if (port_index >= mlxsw_core->max_ports) {
>+		NL_SET_ERR_MSG_MOD(extack, "Port index exceeds maximum number of ports");
> 		return -EINVAL;
>+	}
> 	if (!mlxsw_core->driver->port_unsplit)
> 		return -EOPNOTSUPP;
>-	return mlxsw_core->driver->port_unsplit(mlxsw_core, port_index);
>+	return mlxsw_core->driver->port_unsplit(mlxsw_core, port_index,
>+						extack);
> }
> 
> static int
>diff --git a/drivers/net/ethernet/mellanox/mlxsw/core.h b/drivers/net/ethernet/mellanox/mlxsw/core.h
>index 4a8d4c7f89d9..552cfa29c2f7 100644
>--- a/drivers/net/ethernet/mellanox/mlxsw/core.h
>+++ b/drivers/net/ethernet/mellanox/mlxsw/core.h
>@@ -274,8 +274,9 @@ struct mlxsw_driver {
> 	int (*port_type_set)(struct mlxsw_core *mlxsw_core, u8 local_port,
> 			     enum devlink_port_type new_type);
> 	int (*port_split)(struct mlxsw_core *mlxsw_core, u8 local_port,
>-			  unsigned int count);
>-	int (*port_unsplit)(struct mlxsw_core *mlxsw_core, u8 local_port);
>+			  unsigned int count, struct netlink_ext_ack *extack);
>+	int (*port_unsplit)(struct mlxsw_core *mlxsw_core, u8 local_port,
>+			    struct netlink_ext_ack *extack);
> 	int (*sb_pool_get)(struct mlxsw_core *mlxsw_core,
> 			   unsigned int sb_index, u16 pool_index,
> 			   struct devlink_sb_pool_info *pool_info);
>diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
>index fc39f22e5c70..1b6d930e452d 100644
>--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
>+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
>@@ -3092,7 +3092,8 @@ static void mlxsw_sp_port_unsplit_create(struct mlxsw_sp *mlxsw_sp,
> }
> 
> static int mlxsw_sp_port_split(struct mlxsw_core *mlxsw_core, u8 local_port,
>-			       unsigned int count)
>+			       unsigned int count,
>+			       struct netlink_ext_ack *extack)
> {
> 	struct mlxsw_sp *mlxsw_sp = mlxsw_core_driver_priv(mlxsw_core);
> 	struct mlxsw_sp_port *mlxsw_sp_port;
>@@ -3104,6 +3105,7 @@ static int mlxsw_sp_port_split(struct mlxsw_core *mlxsw_core, u8 local_port,
> 	if (!mlxsw_sp_port) {
> 		dev_err(mlxsw_sp->bus_info->dev, "Port number \"%d\" does not exist\n",
> 			local_port);
>+		NL_SET_ERR_MSG_MOD(extack, "Port number does not exist");
> 		return -EINVAL;
> 	}
> 
>@@ -3112,11 +3114,13 @@ static int mlxsw_sp_port_split(struct mlxsw_core *mlxsw_core, u8 local_port,
> 
> 	if (count != 2 && count != 4) {
> 		netdev_err(mlxsw_sp_port->dev, "Port can only be split into 2 or 4 ports\n");
>+		NL_SET_ERR_MSG_MOD(extack, "Port can only be split into 2 or 4 ports");
> 		return -EINVAL;
> 	}
> 
> 	if (cur_width != MLXSW_PORT_MODULE_MAX_WIDTH) {
> 		netdev_err(mlxsw_sp_port->dev, "Port cannot be split further\n");
>+		NL_SET_ERR_MSG_MOD(extack, "Port cannot be split further");
> 		return -EINVAL;
> 	}
> 
>@@ -3125,6 +3129,7 @@ static int mlxsw_sp_port_split(struct mlxsw_core *mlxsw_core, u8 local_port,
> 		base_port = local_port;
> 		if (mlxsw_sp->ports[base_port + 1]) {
> 			netdev_err(mlxsw_sp_port->dev, "Invalid split configuration\n");
>+			NL_SET_ERR_MSG_MOD(extack, "Invalid split configuration");
> 			return -EINVAL;
> 		}
> 	} else {
>@@ -3132,6 +3137,7 @@ static int mlxsw_sp_port_split(struct mlxsw_core *mlxsw_core, u8 local_port,
> 		if (mlxsw_sp->ports[base_port + 1] ||
> 		    mlxsw_sp->ports[base_port + 3]) {
> 			netdev_err(mlxsw_sp_port->dev, "Invalid split configuration\n");
>+			NL_SET_ERR_MSG_MOD(extack, "Invalid split configuration");
> 			return -EINVAL;
> 		}
> 	}
>@@ -3153,7 +3159,8 @@ static int mlxsw_sp_port_split(struct mlxsw_core *mlxsw_core, u8 local_port,
> 	return err;
> }
> 
>-static int mlxsw_sp_port_unsplit(struct mlxsw_core *mlxsw_core, u8 local_port)
>+static int mlxsw_sp_port_unsplit(struct mlxsw_core *mlxsw_core, u8 local_port,
>+				 struct netlink_ext_ack *extack)
> {
> 	struct mlxsw_sp *mlxsw_sp = mlxsw_core_driver_priv(mlxsw_core);
> 	struct mlxsw_sp_port *mlxsw_sp_port;
>@@ -3165,11 +3172,13 @@ static int mlxsw_sp_port_unsplit(struct mlxsw_core *mlxsw_core, u8 local_port)
> 	if (!mlxsw_sp_port) {
> 		dev_err(mlxsw_sp->bus_info->dev, "Port number \"%d\" does not exist\n",
> 			local_port);
>+		NL_SET_ERR_MSG_MOD(extack, "Port number does not exist");
> 		return -EINVAL;
> 	}
> 
> 	if (!mlxsw_sp_port->split) {
> 		netdev_err(mlxsw_sp_port->dev, "Port wasn't split\n");
>+		NL_SET_ERR_MSG_MOD(extack, "Port was not split");

I wonder if we need the dmesg for these as well. Plus it is not the same
(wasn't/was not) which is maybe confusing. Any objection against the
original dmesg messages removal?


> 		return -EINVAL;
> 	}
> 
>-- 
>2.11.0
>

* [PATCH net] net: sched: cls: Fix offloading when ingress dev is vxlan
From: Paul Blakey @ 2018-06-05  8:04 UTC (permalink / raw)
  To: Jiri Pirko, Cong Wang, Jamal Hadi Salim, David Miller, netdev
  Cc: Yevgeny Kliteynik, Roi Dayan, Shahar Klein, Mark Bloch,
	Or Gerlitz, Paul Blakey

When using a vxlan device as the ingress dev, we count it as a
"no offload dev", so when such a rule comes and err_stop is true,
we fail early and don't try the egdev route, which can offload it
through the egress device.

Fix that by skipping the block offload call if one of the devices
attached to the block is not offload capable, and in that case make
sure the rule was offloaded through the egress device instead.

Fixes: caa7260156eb ("net: sched: keep track of offloaded filters [..]")
Reviewed-by: Roi Dayan <roid@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Paul Blakey <paulb@mellanox.com>
---
 net/sched/cls_api.c | 26 ++++++++++++++++----------
 1 file changed, 16 insertions(+), 10 deletions(-)

diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index a57e112..2cd579f 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -734,10 +734,6 @@ static int tcf_block_cb_call(struct tcf_block *block, enum tc_setup_type type,
 	int ok_count = 0;
 	int err;
 
-	/* Make sure all netdevs sharing this block are offload-capable. */
-	if (block->nooffloaddevcnt && err_stop)
-		return -EOPNOTSUPP;
-
 	list_for_each_entry(block_cb, &block->cb_list, list) {
 		err = block_cb->cb(type, type_data, block_cb->cb_priv);
 		if (err) {
@@ -1580,21 +1576,31 @@ static int tc_exts_setup_cb_egdev_call(struct tcf_exts *exts,
 int tc_setup_cb_call(struct tcf_block *block, struct tcf_exts *exts,
 		     enum tc_setup_type type, void *type_data, bool err_stop)
 {
-	int ok_count;
+	int ok_count = 0;
 	int ret;
 
-	ret = tcf_block_cb_call(block, type, type_data, err_stop);
-	if (ret < 0)
-		return ret;
-	ok_count = ret;
+	if (!block->nooffloaddevcnt) {
+		ret = tcf_block_cb_call(block, type, type_data, err_stop);
+		if (ret < 0)
+			return ret;
+		ok_count = ret;
+	}
 
 	if (!exts || ok_count)
-		return ok_count;
+		goto skip_egress;
+
 	ret = tc_exts_setup_cb_egdev_call(exts, type, type_data, err_stop);
 	if (ret < 0)
 		return ret;
 	ok_count += ret;
 
+skip_egress:
+	/* if one of the netdevs sharing this block are not offload-capable
+	 * make sure we succeeded in egress instead.
+	 */
+	if (block->nooffloaddevcnt && !ok_count && err_stop)
+		return -EOPNOTSUPP;
+
 	return ok_count;
 }
 EXPORT_SYMBOL(tc_setup_cb_call);
-- 
2.7.4
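
The control flow of tc_setup_cb_call() after the patch above can be
sketched as a small userspace model. This is only an illustration under
stated assumptions: setup_cb_call_model() and its parameters are
hypothetical, and block_ret/egdev_ret stand in for the return values of
tcf_block_cb_call() and tc_exts_setup_cb_egdev_call() (a negative errno
or the number of callbacks that succeeded).

```c
#include <errno.h>
#include <stdbool.h>

/* Toy model of the patched tc_setup_cb_call() (hypothetical helper).
 * nooffloaddevcnt: number of non-offload-capable devs sharing the block
 * has_exts:        whether the filter carries actions (exts != NULL)
 * block_ret:       what the block callbacks would return
 * egdev_ret:       what the egress-device callbacks would return
 */
static int setup_cb_call_model(unsigned int nooffloaddevcnt, bool has_exts,
			       int block_ret, int egdev_ret, bool err_stop)
{
	int ok_count = 0;

	/* Block callbacks are only tried when every dev can offload. */
	if (!nooffloaddevcnt) {
		if (block_ret < 0)
			return block_ret;
		ok_count = block_ret;
	}

	/* Egress route: tried only if nothing was offloaded so far. */
	if (has_exts && !ok_count) {
		if (egdev_ret < 0)
			return egdev_ret;
		ok_count += egdev_ret;
	}

	/* A non-offload dev is tolerated only if egress took the rule. */
	if (nooffloaddevcnt && !ok_count && err_stop)
		return -EOPNOTSUPP;

	return ok_count;
}
```

In this model a vxlan ingress dev (nooffloaddevcnt == 1) with a
successful egress offload yields a positive count, whereas the early
check removed by the patch would have returned -EOPNOTSUPP first.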

* Re: [PATCH net-next 3/3] mlxsw: Add extack messages for port_{un,}split failures
From: Ido Schimmel @ 2018-06-05  8:05 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: dsahern, netdev, idosch, jiri, jakub.kicinski, David Ahern
In-Reply-To: <20180605075230.GC2164@nanopsycho>

On Tue, Jun 05, 2018 at 09:52:30AM +0200, Jiri Pirko wrote:
> Tue, Jun 05, 2018 at 12:15:03AM CEST, dsahern@kernel.org wrote:
> > 	if (!mlxsw_sp_port->split) {
> > 		netdev_err(mlxsw_sp_port->dev, "Port wasn't split\n");
> >+		NL_SET_ERR_MSG_MOD(extack, "Port was not split");
> 
> I wonder if we need the dmesg for these as well. Plus it is not the same
> (wasn't/was not) which is maybe confusing. Any objection against the
> original dmesg messages removal?

We had this discussion about three months ago and decided to keep the
existing messages:
https://marc.info/?l=linux-netdev&m=151982813309466&w=2

* Re: [PATCH net-next 3/3] mlxsw: Add extack messages for port_{un,}split failures
From: Jiri Pirko @ 2018-06-05  8:18 UTC (permalink / raw)
  To: Ido Schimmel; +Cc: dsahern, netdev, idosch, jiri, jakub.kicinski, David Ahern
In-Reply-To: <20180605080528.GA2034@splinter.mtl.com>

Tue, Jun 05, 2018 at 10:05:28AM CEST, idosch@idosch.org wrote:
>On Tue, Jun 05, 2018 at 09:52:30AM +0200, Jiri Pirko wrote:
>> Tue, Jun 05, 2018 at 12:15:03AM CEST, dsahern@kernel.org wrote:
>> > 	if (!mlxsw_sp_port->split) {
>> > 		netdev_err(mlxsw_sp_port->dev, "Port wasn't split\n");
>> >+		NL_SET_ERR_MSG_MOD(extack, "Port was not split");
>> 
>> I wonder if we need the dmesg for these as well. Plus it is not the same
>> (wasn't/was not) which is maybe confusing. Any objection against the
>> original dmesg messages removal?
>
>We had this discussion about three months ago and decided to keep the
>existing messages:
>https://marc.info/?l=linux-netdev&m=151982813309466&w=2

I forgot. Thanks for reminding me. So could we at least make the
messages 100% identical? Thanks.
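
One way to keep the dmesg text and the extack text from drifting apart
is to define the string once and hand the same macro to both reporters.
The following is a userspace sketch, not driver code: struct
fake_extack, FAKE_SET_ERR_MSG_MOD and check_unsplit() are hypothetical
stand-ins for struct netlink_ext_ack, NL_SET_ERR_MSG_MOD and the driver
callback, and the log buffer stands in for dmesg. Only the message text
is taken from the patch.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Single source of truth for the message text (wording from the patch). */
#define PORT_NOT_SPLIT_MSG "Port was not split"

/* Hypothetical stand-in for struct netlink_ext_ack. */
struct fake_extack {
	const char *msg;
};

/* Hypothetical analogue of NL_SET_ERR_MSG_MOD: the module prefix is added
 * by string-literal concatenation, so text must be a literal or a macro
 * that expands to one.
 */
#define FAKE_SET_ERR_MSG_MOD(extack, text) \
	((extack)->msg = "mlxsw_spectrum: " text)

/* Returns 0 if the port is split, -EINVAL otherwise. On failure the same
 * text is written to logbuf (standing in for dmesg) and to extack.
 */
static int check_unsplit(int is_split, char *logbuf, size_t len,
			 struct fake_extack *extack)
{
	if (!is_split) {
		snprintf(logbuf, len, "%s\n", PORT_NOT_SPLIT_MSG);
		FAKE_SET_ERR_MSG_MOD(extack, PORT_NOT_SPLIT_MSG);
		return -EINVAL;
	}
	return 0;
}
```

With a shared macro, a wording change such as wasn't/was not can only
be made in one place.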

* Re: WARNING in bpf_int_jit_compile
From: Dmitry Vyukov @ 2018-06-05  8:23 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: syzbot, Alexei Starovoitov, David Miller, H. Peter Anvin,
	Alexey Kuznetsov, LKML, Ingo Molnar, netdev, syzkaller-bugs,
	Thomas Gleixner, the arch/x86 maintainers, Hideaki YOSHIFUJI
In-Reply-To: <cc2c87d5-0543-c340-9700-23f8676315f0@iogearbox.net>

On Mon, May 28, 2018 at 1:17 AM, Daniel Borkmann <daniel@iogearbox.net> wrote:
> On 05/26/2018 11:29 AM, syzbot wrote:
>> syzbot has found a reproducer for the following crash on:
>>
>> HEAD commit:    62d18ecfa641 Merge tag 'arm64-fixes' of git://git.kernel.o..
>> git tree:       upstream
>> console output: https://syzkaller.appspot.com/x/log.txt?x=14c6bf57800000
>> kernel config:  https://syzkaller.appspot.com/x/.config?x=982e2df1b9e60b02
>> dashboard link: https://syzkaller.appspot.com/bug?extid=9e762b52dd17e616a7a5
>> compiler:       gcc (GCC) 8.0.1 20180413 (experimental)
>> syzkaller repro:https://syzkaller.appspot.com/x/repro.syz?x=130e42b7800000
>>
>> IMPORTANT: if you fix the bug, please add the following tag to the commit:
>> Reported-by: syzbot+9e762b52dd17e616a7a5@syzkaller.appspotmail.com
>>
>> RAX: ffffffffffffffda RBX: 0000000002542914 RCX: 0000000000455a09
>> RDX: 0000000000000048 RSI: 0000000020000240 RDI: 0000000000000005
>> RBP: 000000000072bea0 R08: 0000000000000000 R09: 0000000000000000
>> R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000003
>> R13: 0000000000000046 R14: 00000000006f4730 R15: 0000000000000023
>> WARNING: CPU: 0 PID: 4752 at include/linux/filter.h:667 bpf_jit_binary_lock_ro include/linux/filter.h:667 [inline]
>> WARNING: CPU: 0 PID: 4752 at include/linux/filter.h:667 bpf_int_jit_compile+0xbf7/0xef7 arch/x86/net/bpf_jit_comp.c:1271
>> Kernel panic - not syncing: panic_on_warn set ...
>>
>> CPU: 0 PID: 4752 Comm: syz-executor0 Not tainted 4.17.0-rc6+ #67
>> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
>> Call Trace:
>>  __dump_stack lib/dump_stack.c:77 [inline]
>>  dump_stack+0x1b9/0x294 lib/dump_stack.c:113
>>  panic+0x22f/0x4de kernel/panic.c:184
>>  __warn.cold.8+0x163/0x1b3 kernel/panic.c:536
>>  report_bug+0x252/0x2d0 lib/bug.c:186
>>  fixup_bug arch/x86/kernel/traps.c:178 [inline]
>>  do_error_trap+0x1de/0x490 arch/x86/kernel/traps.c:296
>>  do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:315
>>  invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:992
>> RIP: 0010:bpf_jit_binary_lock_ro include/linux/filter.h:667 [inline]
>
> Been looking into this last Friday already. What seems to happen here is that
> there's fault injection from inside set_memory_ro(), meaning it will eventually
> return an error there, and we throw a WARN_ON_ONCE() to bark that making the
> memory read-only didn't work out. I'd prefer to notify the user of such an
> issue rather than staying completely silent about it, so that there's
> awareness that read-only protections are not in place / guaranteed.


Hi Daniel,

There is no problem with notifying the user, but there is a problem with
making this notification indistinguishable from a kernel bug. Also, the
stack trace is pointless for an end user, who may not even understand
what happened or what they were notified about (not everybody is a
kernel developer!). A pr_err/pr_err_once with a
readable error message would be a better fit here.
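
The shape of such a report-once message can be sketched in userspace C.
This is an analogue only, under loud assumptions: the function name
report_ro_failure_once() and the message wording are hypothetical, and
fprintf(stderr, ...) stands in for the kernel's pr_err_once().

```c
#include <stdbool.h>
#include <stdio.h>

/* Userspace analogue of a pr_err_once()-style report (hypothetical name
 * and wording). Prints a readable one-line message the first time it is
 * called and stays silent afterwards. Returns true if it printed.
 */
static bool report_ro_failure_once(int err)
{
	static bool reported;

	if (reported)
		return false;
	reported = true;

	/* No stack trace: just say what failed and what it means. */
	fprintf(stderr,
		"bpf_jit: could not make JIT image read-only (err %d), "
		"image stays writable\n", err);
	return true;
}
```

Unlike a WARN, a message of this kind carries no stack trace, so it
would not be picked up as a kernel bug by tools that key on WARNING
reports.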


>> RIP: 0010:bpf_int_jit_compile+0xbf7/0xef7 arch/x86/net/bpf_jit_comp.c:1271
>> RSP: 0018:ffff8801d85ff920 EFLAGS: 00010293
>> RAX: ffff8801d78c40c0 RBX: 0000000000000046 RCX: ffffffff81445d89
>> RDX: 0000000000000000 RSI: ffffffff81445d97 RDI: 0000000000000005
>> RBP: ffff8801d85ffa40 R08: ffff8801d78c40c0 R09: 0000000000000000
>> R10: 0000000000000000 R11: 0000000000000000 R12: ffffc9000194e002
>> R13: ffff8801d85ffa18 R14: 00000000fffffff4 R15: 0000000000000003
>>  bpf_prog_select_runtime+0x131/0x640 kernel/bpf/core.c:1541
>>  bpf_prog_load+0x16c2/0x2070 kernel/bpf/syscall.c:1333
>>  __do_sys_bpf kernel/bpf/syscall.c:2073 [inline]
>>  __se_sys_bpf kernel/bpf/syscall.c:2035 [inline]
>>  __x64_sys_bpf+0x389/0x4c0 kernel/bpf/syscall.c:2035
>>  do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287
>>  entry_SYSCALL_64_after_hwframe+0x49/0xbe
>> RIP: 0033:0x455a09
>> RSP: 002b:00007ffec3da2868 EFLAGS: 00000246 ORIG_RAX: 0000000000000141
>> RAX: ffffffffffffffda RBX: 0000000002542914 RCX: 0000000000455a09
>> RDX: 0000000000000048 RSI: 0000000020000240 RDI: 0000000000000005
>> RBP: 000000000072bea0 R08: 0000000000000000 R09: 0000000000000000
>> R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000003
>> R13: 0000000000000046 R14: 00000000006f4730 R15: 0000000000000023
>> Dumping ftrace buffer:
>>    (ftrace buffer empty)
>> Kernel Offset: disabled
>> Rebooting in 86400 seconds..
>>

* Re: [PATCH V4] mlx4_core: allocate ICM memory in page size chunks
From: Michal Hocko @ 2018-06-05  8:32 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David Miller, qing.huang, tariqt, haakon.bugge, yanjun.zhu,
	netdev, linux-rdma, linux-kernel, gi-oh.kim
In-Reply-To: <1908601f-2eda-d739-9c4d-430a002b1a05@gmail.com>

On Mon 04-06-18 06:22:26, Eric Dumazet wrote:
> 
> 
> On 06/04/2018 06:11 AM, Michal Hocko wrote:
> > On Thu 31-05-18 11:10:22, Michal Hocko wrote:
> 
> > Just in case you are interested
> > ---
> > From 5010543ed6f73e4c00367801486dca8d5c63b2ce Mon Sep 17 00:00:00 2001
> > From: Michal Hocko <mhocko@suse.com>
> > Date: Mon, 4 Jun 2018 15:07:37 +0200
> > Subject: [PATCH] net: cleanup gfp mask in alloc_skb_with_frags
> > 
> > alloc_skb_with_frags uses __GFP_NORETRY for non-sleeping allocations
> > which is just a noop and a little bit confusing.
> > 
> > __GFP_NORETRY was added by ed98df3361f0 ("net: use __GFP_NORETRY for
> > high order allocations") to prevent from the OOM killer. Yet this was
> > not enough because fb05e7a89f50 ("net: don't wait for order-3 page
> > allocation") didn't want an excessive reclaim for non-costly orders
> > so it made it completely NOWAIT while it preserved __GFP_NORETRY in
> > place which is now redundant.
> > 
> > Drop the pointless __GFP_NORETRY because this function is used as
> > copy&paste source for other places.
> > 
> > Signed-off-by: Michal Hocko <mhocko@suse.com>
> > ---
> 
> Reviewed-by: Eric Dumazet <edumazet@google.com>

Thanks! What is the proper process now? Should I resend, or can somebody
pick it up from this thread?
-- 
Michal Hocko
SUSE Labs
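
The redundancy described in the quoted patch can be illustrated with a
toy model. Everything here is an assumption made for illustration: the
TOY_* bit values are hypothetical (the real flags live in
include/linux/gfp.h), toy_frag_mask() only mirrors the description
"completely NOWAIT while it preserved __GFP_NORETRY", and the model
treats NORETRY as limiting only the retries that follow direct reclaim.

```c
#include <stdbool.h>

/* Hypothetical bit values, not the kernel's. */
#define TOY_GFP_DIRECT_RECLAIM 0x1u
#define TOY_GFP_NORETRY        0x2u

/* Mask as described in the commit message: direct reclaim is cleared
 * (non-sleeping allocation) while NORETRY is still set.
 */
static unsigned int toy_frag_mask(unsigned int gfp_mask)
{
	return (gfp_mask & ~TOY_GFP_DIRECT_RECLAIM) | TOY_GFP_NORETRY;
}

/* In this model NORETRY can only change behaviour when the allocation
 * is allowed to enter direct reclaim in the first place.
 */
static bool toy_noretry_has_effect(unsigned int gfp_mask)
{
	return (gfp_mask & TOY_GFP_NORETRY) &&
	       (gfp_mask & TOY_GFP_DIRECT_RECLAIM);
}
```

Under these assumptions toy_noretry_has_effect(toy_frag_mask(m)) is
false for every m, which is the sense in which the flag is a no-op.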

* [net-next PATCH v3 0/5] Symmetric queue selection using XPS for Rx queues
From: Amritha Nambiar @ 2018-06-05  8:37 UTC (permalink / raw)
  To: netdev, davem
  Cc: alexander.h.duyck, willemdebruijn.kernel, amritha.nambiar,
	sridhar.samudrala, edumazet, hannes, tom

This patch series implements support for Tx queue selection based on
an Rx queue(s) map. This is done by configuring an Rx queue(s) map per
Tx queue using a sysfs attribute. If the user configuration for Rx queues
does not apply, then the Tx queue selection falls back to XPS using CPUs
and finally to hashing.

XPS is refactored to support Tx queue selection based on either the
CPUs map or the Rx-queues map. The config option CONFIG_XPS needs to be
enabled. By default no receive queues are configured for the Tx queue.

- /sys/class/net/<dev>/queues/tx-*/xps_rxqs

A set of receive queues can be mapped to a set of transmit queues (many:many),
although the common use case is a 1:1 mapping. This enables sending
packets on the same Tx-Rx queue pair, which is useful for busy-polling
multi-threaded workloads where it is not possible to pin the threads to
a CPU. This is a rework of Sridhar's patch for symmetric queueing via
socket option:
https://www.spinics.net/lists/netdev/msg453106.html
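
The fallback order described above (Rx-queue map, then CPU map, then
hashing) can be sketched as a small userspace model. This is not the
code from the series: pick_tx_queue(), map_lookup() and NO_QUEUE are
hypothetical names, each map is reduced to a plain array holding one Tx
queue per index, and the final step uses a simple modulo where the
kernel has its own hash-to-queue scaling.

```c
#include <stddef.h>
#include <stdint.h>

#define NO_QUEUE (-1)	/* hypothetical "no mapping" marker */

/* Look up idx in a map; NO_QUEUE if the map is absent or idx is out of
 * range.
 */
static int map_lookup(const int *map, unsigned int n, int idx)
{
	if (!map || idx < 0 || (unsigned int)idx >= n)
		return NO_QUEUE;
	return map[idx];
}

/* Selection order from the cover letter:
 * 1. Rx-queue map, 2. CPU map, 3. flow hash.
 * num_txqs must be non-zero.
 */
static int pick_tx_queue(const int *rxqs_map, unsigned int n_rxqs, int rxq,
			 const int *cpus_map, unsigned int n_cpus, int cpu,
			 uint32_t flow_hash, unsigned int num_txqs)
{
	int txq;

	txq = map_lookup(rxqs_map, n_rxqs, rxq);
	if (txq != NO_QUEUE)
		return txq;

	txq = map_lookup(cpus_map, n_cpus, cpu);
	if (txq != NO_QUEUE)
		return txq;

	return (int)(flow_hash % num_txqs);
}
```

With a 1:1 rxqs_map, a flow received on Rx queue 2 is sent on Tx queue
2 regardless of which CPU the sending thread happens to run on.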

Testing Hints:
Kernel:  Linux 4.17.0-rc7+
Interface: 
driver: ixgbe
version: 5.1.0-k
firmware-version: 0x00015e0b

Configuration:
ethtool -L $iface combined 16
ethtool -C $iface rx-usecs 1000
sysctl net.core.busy_poll=1000
ATR disabled:
ethtool -K $iface ntuple on

Workload: 
Modified memcached that changes the thread selection policy to be based
on the incoming rx-queue of a connection using SO_INCOMING_NAPI_ID socket
option. The default is round-robin.

Default: No rxqs_map configured
Symmetric queues: Enable rxqs_map for all queues 1:1 mapped to Tx queue

System:
Architecture:          x86_64
CPU(s):                72
Model name:            Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz

16 threads  400K requests/sec
=============================
-------------------------------------------------------------------------------
                                Default                 Symmetric queues
-------------------------------------------------------------------------------
RTT min/avg/max                 4/51/2215               2/30/5163
(usec)


intr/sec                        26655                   18606

contextswitch/sec               5145                    4044

insn per cycle                  0.43                    0.72

cache-misses                    6.919                   4.310
(% of all cache refs)

L1-dcache-load-                 4.49                    3.29
-misses
(% of all L1-dcache hits)

LLC-load-misses                 13.26                   8.96
(% of all LL-cache hits)

-------------------------------------------------------------------------------

32 threads  400K requests/sec
=============================
-------------------------------------------------------------------------------
                                Default                 Symmetric queues
-------------------------------------------------------------------------------
RTT min/avg/max                 10/112/5562             9/46/4637
(usec)


intr/sec                        30456                   27666

contextswitch/sec               7552                    5133

insn per cycle                  0.41                    0.49

cache-misses                    9.357                   2.769
(% of all cache refs)

L1-dcache-load-                 4.09                    3.98
-misses
(% of all L1-dcache hits)

LLC-load-misses                 12.96                   3.96
(% of all LL-cache hits)

-------------------------------------------------------------------------------

16 threads  800K requests/sec
=============================
-------------------------------------------------------------------------------
                                Default                 Symmetric queues
-------------------------------------------------------------------------------
RTT min/avg/max                  5/151/4989             9/69/2611
(usec)


intr/sec                        35686                   22907

contextswitch/sec               25522                   12281

insn per cycle                  0.67                    0.74

cache-misses                    8.652                   6.38
(% of all cache refs)

L1-dcache-load-                 3.19                    2.86
-misses
(% of all L1-dcache hits)

LLC-load-misses                 16.53                   11.99
(% of all LL-cache hits)

-------------------------------------------------------------------------------
32 threads  800K requests/sec
=============================
-------------------------------------------------------------------------------
                                Default                 Symmetric queues
-------------------------------------------------------------------------------
RTT min/avg/max                  6/163/6152             8/88/4209
(usec)


intr/sec                        47079                   26548

contextswitch/sec               42190                   39168

insn per cycle                  0.45                    0.54

cache-misses                    8.798                   4.668
(% of all cache refs)

L1-dcache-load-                 6.55                    6.29
-misses
(% of all L1-dcache hits)

LLC-load-misses                 13.91                   10.44
(% of all LL-cache hits)

-------------------------------------------------------------------------------

v3:
- Removed xps_maps array and used separate pointers for rxqs_map and cpus_map
- Used static_key for XPS maps
- Removed skc_rx_ifindex from sock_common
- Record rx_queue_mapping as part of sk_mark_napi_id, sk_mark_rx_queue is
  retained to be used on request sock which has only struct sock_common and not
  a full sock.
- Added documentation in ABI/testing/sysfs-class-net-queues

---

Amritha Nambiar (5):
      net: Refactor XPS for CPUs and Rx queues
      net: Use static_key for XPS maps
      net: Enable Tx queue selection based on Rx queues
      net-sysfs: Add interface for Rx queue(s) map per Tx queue
      Documentation: Add explanation for XPS using Rx-queue(s) map


 Documentation/ABI/testing/sysfs-class-net-queues |   11 +
 Documentation/networking/scaling.txt             |   58 ++++
 include/linux/cpumask.h                          |   11 +
 include/linux/netdevice.h                        |   73 +++++-
 include/net/busy_poll.h                          |    3 
 include/net/sock.h                               |   14 +
 net/core/dev.c                                   |  292 +++++++++++++++-------
 net/core/net-sysfs.c                             |   85 ++++++
 net/core/sock.c                                  |    4 
 net/ipv4/tcp_input.c                             |    3 
 10 files changed, 445 insertions(+), 109 deletions(-)

* [net-next PATCH v3 1/5] net: Refactor XPS for CPUs and Rx queues
From: Amritha Nambiar @ 2018-06-05  8:37 UTC (permalink / raw)
  To: netdev, davem
  Cc: alexander.h.duyck, willemdebruijn.kernel, amritha.nambiar,
	sridhar.samudrala, edumazet, hannes, tom
In-Reply-To: <152818727065.20862.10108275498797168689.stgit@anamdev.jf.intel.com>

Refactor XPS code to support Tx queue selection based on
CPU(s) map or Rx queue(s) map.

Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
---
 include/linux/cpumask.h   |   11 ++
 include/linux/netdevice.h |   73 ++++++++++++++
 net/core/dev.c            |  228 ++++++++++++++++++++++++++++++---------------
 net/core/net-sysfs.c      |    4 -
 4 files changed, 234 insertions(+), 82 deletions(-)

diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h
index bf53d89..57f20a0 100644
--- a/include/linux/cpumask.h
+++ b/include/linux/cpumask.h
@@ -115,12 +115,17 @@ extern struct cpumask __cpu_active_mask;
 #define cpu_active(cpu)		((cpu) == 0)
 #endif
 
-/* verify cpu argument to cpumask_* operators */
-static inline unsigned int cpumask_check(unsigned int cpu)
+static inline void cpu_max_bits_warn(unsigned int cpu, unsigned int bits)
 {
 #ifdef CONFIG_DEBUG_PER_CPU_MAPS
-	WARN_ON_ONCE(cpu >= nr_cpumask_bits);
+	WARN_ON_ONCE(cpu >= bits);
 #endif /* CONFIG_DEBUG_PER_CPU_MAPS */
+}
+
+/* verify cpu argument to cpumask_* operators */
+static inline unsigned int cpumask_check(unsigned int cpu)
+{
+	cpu_max_bits_warn(cpu, nr_cpumask_bits);
 	return cpu;
 }
 
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 6b863ed..354c866 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -730,10 +730,21 @@ struct xps_map {
  */
 struct xps_dev_maps {
 	struct rcu_head rcu;
-	struct xps_map __rcu *cpu_map[0];
+	struct xps_map __rcu *attr_map[0];
 };
-#define XPS_DEV_MAPS_SIZE(_tcs) (sizeof(struct xps_dev_maps) +		\
+
+#define XPS_CPU_DEV_MAPS_SIZE(_tcs) (sizeof(struct xps_dev_maps) +	\
 	(nr_cpu_ids * (_tcs) * sizeof(struct xps_map *)))
+
+#define XPS_RXQ_DEV_MAPS_SIZE(_tcs, _rxqs) (sizeof(struct xps_dev_maps) +\
+	(_rxqs * (_tcs) * sizeof(struct xps_map *)))
+
+enum xps_map_type {
+	XPS_MAP_RXQS,
+	XPS_MAP_CPUS,
+	__XPS_MAP_MAX
+};
+
 #endif /* CONFIG_XPS */
 
 #define TC_MAX_QUEUE	16
@@ -1902,7 +1913,8 @@ struct net_device {
 	int			watchdog_timeo;
 
 #ifdef CONFIG_XPS
-	struct xps_dev_maps __rcu *xps_maps;
+	struct xps_dev_maps __rcu *xps_cpus_map;
+	struct xps_dev_maps __rcu *xps_rxqs_map;
 #endif
 #ifdef CONFIG_NET_CLS_ACT
 	struct mini_Qdisc __rcu	*miniq_egress;
@@ -3251,6 +3263,61 @@ static inline void netif_wake_subqueue(struct net_device *dev, u16 queue_index)
 #ifdef CONFIG_XPS
 int netif_set_xps_queue(struct net_device *dev, const struct cpumask *mask,
 			u16 index);
+int __netif_set_xps_queue(struct net_device *dev, const unsigned long *mask,
+			  u16 index, enum xps_map_type type);
+
+static inline bool attr_test_mask(unsigned long j, const unsigned long *mask,
+				  unsigned int nr_bits)
+{
+	cpu_max_bits_warn(j, nr_bits);
+	return test_bit(j, mask);
+}
+
+static inline bool attr_test_online(unsigned long j,
+				    const unsigned long *online_mask,
+				    unsigned int nr_bits)
+{
+	cpu_max_bits_warn(j, nr_bits);
+
+	if (online_mask)
+		return test_bit(j, online_mask);
+
+	if (j >= 0 && j < nr_bits)
+		return true;
+
+	return false;
+}
+
+static inline unsigned int attrmask_next(int n, const unsigned long *srcp,
+					 unsigned int nr_bits)
+{
+	/* -1 is a legal arg here. */
+	if (n != -1)
+		cpu_max_bits_warn(n, nr_bits);
+
+	if (srcp)
+		return find_next_bit(srcp, nr_bits, n + 1);
+
+	return n + 1;
+}
+
+static inline int attrmask_next_and(int n, const unsigned long *src1p,
+				    const unsigned long *src2p,
+				    unsigned int nr_bits)
+{
+	/* -1 is a legal arg here. */
+	if (n != -1)
+		cpu_max_bits_warn(n, nr_bits);
+
+	if (src1p && src2p)
+		return find_next_and_bit(src1p, src2p, nr_bits, n + 1);
+	else if (src1p)
+		return find_next_bit(src1p, nr_bits, n + 1);
+	else if (src2p)
+		return find_next_bit(src2p, nr_bits, n + 1);
+
+	return n + 1;
+}
 #else
 static inline int netif_set_xps_queue(struct net_device *dev,
 				      const struct cpumask *mask,
diff --git a/net/core/dev.c b/net/core/dev.c
index 1844d9b..156acbe 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2092,7 +2092,7 @@ static bool remove_xps_queue(struct xps_dev_maps *dev_maps,
 	int pos;
 
 	if (dev_maps)
-		map = xmap_dereference(dev_maps->cpu_map[tci]);
+		map = xmap_dereference(dev_maps->attr_map[tci]);
 	if (!map)
 		return false;
 
@@ -2105,7 +2105,7 @@ static bool remove_xps_queue(struct xps_dev_maps *dev_maps,
 			break;
 		}
 
-		RCU_INIT_POINTER(dev_maps->cpu_map[tci], NULL);
+		RCU_INIT_POINTER(dev_maps->attr_map[tci], NULL);
 		kfree_rcu(map, rcu);
 		return false;
 	}
@@ -2138,30 +2138,57 @@ static bool remove_xps_queue_cpu(struct net_device *dev,
 static void netif_reset_xps_queues(struct net_device *dev, u16 offset,
 				   u16 count)
 {
+	const unsigned long *possible_mask = NULL;
+	enum xps_map_type type = XPS_MAP_RXQS;
 	struct xps_dev_maps *dev_maps;
-	int cpu, i;
 	bool active = false;
+	unsigned int nr_ids;
+	int i, j;
 
 	mutex_lock(&xps_map_mutex);
-	dev_maps = xmap_dereference(dev->xps_maps);
-
-	if (!dev_maps)
-		goto out_no_maps;
-
-	for_each_possible_cpu(cpu)
-		active |= remove_xps_queue_cpu(dev, dev_maps, cpu,
-					       offset, count);
-
-	if (!active) {
-		RCU_INIT_POINTER(dev->xps_maps, NULL);
-		kfree_rcu(dev_maps, rcu);
-	}
+	while (type < __XPS_MAP_MAX) {
+		switch (type) {
+		case XPS_MAP_RXQS:
+			dev_maps = xmap_dereference(dev->xps_rxqs_map);
+			if (!dev_maps)
+				goto out_no_maps;
+			nr_ids = dev->num_rx_queues;
+			break;
+		case XPS_MAP_CPUS:
+			dev_maps = xmap_dereference(dev->xps_cpus_map);
+			if (!dev_maps)
+				goto out_no_maps;
+
+			if (num_possible_cpus() > 1)
+				possible_mask = cpumask_bits(cpu_possible_mask);
+			nr_ids = nr_cpu_ids;
+			break;
+		default:
+			goto out_no_maps;
+		}
 
-	for (i = offset + (count - 1); count--; i--)
-		netdev_queue_numa_node_write(netdev_get_tx_queue(dev, i),
-					     NUMA_NO_NODE);
+		for (j = -1; j = attrmask_next(j, possible_mask, nr_ids),
+		     j < nr_ids;)
+			active |= remove_xps_queue_cpu(dev, dev_maps, j, offset,
+						       count);
+		if (!active) {
+			if (type == XPS_MAP_RXQS)
+				RCU_INIT_POINTER(dev->xps_rxqs_map, NULL);
+
+			if (type == XPS_MAP_CPUS) {
+				RCU_INIT_POINTER(dev->xps_cpus_map, NULL);
+
+				for (i = offset + (count - 1); count--; i--)
+					netdev_queue_numa_node_write(
+						netdev_get_tx_queue(dev, i),
+								NUMA_NO_NODE);
+			}
+			kfree_rcu(dev_maps, rcu);
+		}
 
 out_no_maps:
+		type++;
+	}
 	mutex_unlock(&xps_map_mutex);
 }
 
@@ -2170,8 +2197,8 @@ static void netif_reset_xps_queues_gt(struct net_device *dev, u16 index)
 	netif_reset_xps_queues(dev, index, dev->num_tx_queues - index);
 }
 
-static struct xps_map *expand_xps_map(struct xps_map *map,
-				      int cpu, u16 index)
+static struct xps_map *expand_xps_map(struct xps_map *map, int attr_index,
+				      u16 index, enum xps_map_type type)
 {
 	struct xps_map *new_map;
 	int alloc_len = XPS_MIN_MAP_ALLOC;
@@ -2183,7 +2210,7 @@ static struct xps_map *expand_xps_map(struct xps_map *map,
 		return map;
 	}
 
-	/* Need to add queue to this CPU's existing map */
+	/* Need to add tx-queue to this CPU's/rx-queue's existing map */
 	if (map) {
 		if (pos < map->alloc_len)
 			return map;
@@ -2191,9 +2218,20 @@ static struct xps_map *expand_xps_map(struct xps_map *map,
 		alloc_len = map->alloc_len * 2;
 	}
 
-	/* Need to allocate new map to store queue on this CPU's map */
-	new_map = kzalloc_node(XPS_MAP_SIZE(alloc_len), GFP_KERNEL,
-			       cpu_to_node(cpu));
+	/* Need to allocate new map to store tx-queue on this CPU's/rx-queue's
+	 *  map
+	 */
+	switch (type) {
+	case XPS_MAP_RXQS:
+		new_map = kzalloc(XPS_MAP_SIZE(alloc_len), GFP_KERNEL);
+		break;
+	case XPS_MAP_CPUS:
+		new_map = kzalloc_node(XPS_MAP_SIZE(alloc_len), GFP_KERNEL,
+				       cpu_to_node(attr_index));
+		break;
+	default:
+		return NULL;
+	}
 	if (!new_map)
 		return NULL;
 
@@ -2205,14 +2243,16 @@ static struct xps_map *expand_xps_map(struct xps_map *map,
 	return new_map;
 }
 
-int netif_set_xps_queue(struct net_device *dev, const struct cpumask *mask,
-			u16 index)
+int __netif_set_xps_queue(struct net_device *dev, const unsigned long *mask,
+			  u16 index, enum xps_map_type type)
 {
+	const unsigned long *online_mask = NULL, *possible_mask = NULL;
 	struct xps_dev_maps *dev_maps, *new_dev_maps = NULL;
-	int i, cpu, tci, numa_node_id = -2;
+	int i, j, tci, numa_node_id = -2;
 	int maps_sz, num_tc = 1, tc = 0;
 	struct xps_map *map, *new_map;
 	bool active = false;
+	unsigned int nr_ids;
 
 	if (dev->num_tc) {
 		num_tc = dev->num_tc;
@@ -2221,16 +2261,33 @@ int netif_set_xps_queue(struct net_device *dev, const struct cpumask *mask,
 			return -EINVAL;
 	}
 
-	maps_sz = XPS_DEV_MAPS_SIZE(num_tc);
-	if (maps_sz < L1_CACHE_BYTES)
-		maps_sz = L1_CACHE_BYTES;
-
 	mutex_lock(&xps_map_mutex);
+	switch (type) {
+	case XPS_MAP_RXQS:
+		maps_sz = XPS_RXQ_DEV_MAPS_SIZE(num_tc, dev->num_rx_queues);
+		dev_maps = xmap_dereference(dev->xps_rxqs_map);
+		nr_ids = dev->num_rx_queues;
+		break;
+	case XPS_MAP_CPUS:
+		maps_sz = XPS_CPU_DEV_MAPS_SIZE(num_tc);
+		if (num_possible_cpus() > 1) {
+			online_mask = cpumask_bits(cpu_online_mask);
+			possible_mask = cpumask_bits(cpu_possible_mask);
+		}
+		dev_maps = xmap_dereference(dev->xps_cpus_map);
+		nr_ids = nr_cpu_ids;
+		break;
+	default:
+		mutex_unlock(&xps_map_mutex);
+		return -EINVAL;
+	}
 
-	dev_maps = xmap_dereference(dev->xps_maps);
+	if (maps_sz < L1_CACHE_BYTES)
+		maps_sz = L1_CACHE_BYTES;
 
 	/* allocate memory for queue storage */
-	for_each_cpu_and(cpu, cpu_online_mask, mask) {
+	for (j = -1; j = attrmask_next_and(j, online_mask, mask, nr_ids),
+	     j < nr_ids;) {
 		if (!new_dev_maps)
 			new_dev_maps = kzalloc(maps_sz, GFP_KERNEL);
 		if (!new_dev_maps) {
@@ -2238,73 +2295,81 @@ int netif_set_xps_queue(struct net_device *dev, const struct cpumask *mask,
 			return -ENOMEM;
 		}
 
-		tci = cpu * num_tc + tc;
-		map = dev_maps ? xmap_dereference(dev_maps->cpu_map[tci]) :
+		tci = j * num_tc + tc;
+		map = dev_maps ? xmap_dereference(dev_maps->attr_map[tci]) :
 				 NULL;
 
-		map = expand_xps_map(map, cpu, index);
+		map = expand_xps_map(map, j, index, type);
 		if (!map)
 			goto error;
 
-		RCU_INIT_POINTER(new_dev_maps->cpu_map[tci], map);
+		RCU_INIT_POINTER(new_dev_maps->attr_map[tci], map);
 	}
 
 	if (!new_dev_maps)
 		goto out_no_new_maps;
 
-	for_each_possible_cpu(cpu) {
+	for (j = -1; j = attrmask_next(j, possible_mask, nr_ids),
+	     j < nr_ids;) {
 		/* copy maps belonging to foreign traffic classes */
-		for (i = tc, tci = cpu * num_tc; dev_maps && i--; tci++) {
+		for (i = tc, tci = j * num_tc; dev_maps && i--; tci++) {
 			/* fill in the new device map from the old device map */
-			map = xmap_dereference(dev_maps->cpu_map[tci]);
-			RCU_INIT_POINTER(new_dev_maps->cpu_map[tci], map);
+			map = xmap_dereference(dev_maps->attr_map[tci]);
+			RCU_INIT_POINTER(new_dev_maps->attr_map[tci], map);
 		}
 
 		/* We need to explicitly update tci as prevous loop
 		 * could break out early if dev_maps is NULL.
 		 */
-		tci = cpu * num_tc + tc;
+		tci = j * num_tc + tc;
 
-		if (cpumask_test_cpu(cpu, mask) && cpu_online(cpu)) {
-			/* add queue to CPU maps */
+		if (attr_test_mask(j, mask, nr_ids) &&
+		    attr_test_online(j, online_mask, nr_ids)) {
+			/* add tx-queue to CPU/rx-queue maps */
 			int pos = 0;
 
-			map = xmap_dereference(new_dev_maps->cpu_map[tci]);
+			map = xmap_dereference(new_dev_maps->attr_map[tci]);
 			while ((pos < map->len) && (map->queues[pos] != index))
 				pos++;
 
 			if (pos == map->len)
 				map->queues[map->len++] = index;
 #ifdef CONFIG_NUMA
-			if (numa_node_id == -2)
-				numa_node_id = cpu_to_node(cpu);
-			else if (numa_node_id != cpu_to_node(cpu))
-				numa_node_id = -1;
+			if (type == XPS_MAP_CPUS) {
+				if (numa_node_id == -2)
+					numa_node_id = cpu_to_node(j);
+				else if (numa_node_id != cpu_to_node(j))
+					numa_node_id = -1;
+			}
 #endif
 		} else if (dev_maps) {
 			/* fill in the new device map from the old device map */
-			map = xmap_dereference(dev_maps->cpu_map[tci]);
-			RCU_INIT_POINTER(new_dev_maps->cpu_map[tci], map);
+			map = xmap_dereference(dev_maps->attr_map[tci]);
+			RCU_INIT_POINTER(new_dev_maps->attr_map[tci], map);
 		}
 
 		/* copy maps belonging to foreign traffic classes */
 		for (i = num_tc - tc, tci++; dev_maps && --i; tci++) {
 			/* fill in the new device map from the old device map */
-			map = xmap_dereference(dev_maps->cpu_map[tci]);
-			RCU_INIT_POINTER(new_dev_maps->cpu_map[tci], map);
+			map = xmap_dereference(dev_maps->attr_map[tci]);
+			RCU_INIT_POINTER(new_dev_maps->attr_map[tci], map);
 		}
 	}
 
-	rcu_assign_pointer(dev->xps_maps, new_dev_maps);
+	if (type == XPS_MAP_RXQS)
+		rcu_assign_pointer(dev->xps_rxqs_map, new_dev_maps);
+	else if (type == XPS_MAP_CPUS)
+		rcu_assign_pointer(dev->xps_cpus_map, new_dev_maps);
 
 	/* Cleanup old maps */
 	if (!dev_maps)
 		goto out_no_old_maps;
 
-	for_each_possible_cpu(cpu) {
-		for (i = num_tc, tci = cpu * num_tc; i--; tci++) {
-			new_map = xmap_dereference(new_dev_maps->cpu_map[tci]);
-			map = xmap_dereference(dev_maps->cpu_map[tci]);
+	for (j = -1; j = attrmask_next(j, possible_mask, nr_ids),
+	     j < nr_ids;) {
+		for (i = num_tc, tci = j * num_tc; i--; tci++) {
+			new_map = xmap_dereference(new_dev_maps->attr_map[tci]);
+			map = xmap_dereference(dev_maps->attr_map[tci]);
 			if (map && map != new_map)
 				kfree_rcu(map, rcu);
 		}
@@ -2317,19 +2382,23 @@ int netif_set_xps_queue(struct net_device *dev, const struct cpumask *mask,
 	active = true;
 
 out_no_new_maps:
-	/* update Tx queue numa node */
-	netdev_queue_numa_node_write(netdev_get_tx_queue(dev, index),
-				     (numa_node_id >= 0) ? numa_node_id :
-				     NUMA_NO_NODE);
+	if (type == XPS_MAP_CPUS) {
+		/* update Tx queue numa node */
+		netdev_queue_numa_node_write(netdev_get_tx_queue(dev, index),
+					     (numa_node_id >= 0) ?
+					     numa_node_id : NUMA_NO_NODE);
+	}
 
 	if (!dev_maps)
 		goto out_no_maps;
 
-	/* removes queue from unused CPUs */
-	for_each_possible_cpu(cpu) {
-		for (i = tc, tci = cpu * num_tc; i--; tci++)
+	/* removes tx-queue from unused CPUs/rx-queues */
+	for (j = -1; j = attrmask_next(j, possible_mask, nr_ids),
+	     j < nr_ids;) {
+		for (i = tc, tci = j * num_tc; i--; tci++)
 			active |= remove_xps_queue(dev_maps, tci, index);
-		if (!cpumask_test_cpu(cpu, mask) || !cpu_online(cpu))
+		if (!attr_test_mask(j, mask, nr_ids) ||
+		    !attr_test_online(j, online_mask, nr_ids))
 			active |= remove_xps_queue(dev_maps, tci, index);
 		for (i = num_tc - tc, tci++; --i; tci++)
 			active |= remove_xps_queue(dev_maps, tci, index);
@@ -2337,7 +2406,10 @@ int netif_set_xps_queue(struct net_device *dev, const struct cpumask *mask,
 
 	/* free map if not active */
 	if (!active) {
-		RCU_INIT_POINTER(dev->xps_maps, NULL);
+		if (type == XPS_MAP_RXQS)
+			RCU_INIT_POINTER(dev->xps_rxqs_map, NULL);
+		else if (type == XPS_MAP_CPUS)
+			RCU_INIT_POINTER(dev->xps_cpus_map, NULL);
 		kfree_rcu(dev_maps, rcu);
 	}
 
@@ -2347,11 +2419,12 @@ int netif_set_xps_queue(struct net_device *dev, const struct cpumask *mask,
 	return 0;
 error:
 	/* remove any maps that we added */
-	for_each_possible_cpu(cpu) {
-		for (i = num_tc, tci = cpu * num_tc; i--; tci++) {
-			new_map = xmap_dereference(new_dev_maps->cpu_map[tci]);
+	for (j = -1; j = attrmask_next(j, possible_mask, nr_ids),
+	     j < nr_ids;) {
+		for (i = num_tc, tci = j * num_tc; i--; tci++) {
+			new_map = xmap_dereference(new_dev_maps->attr_map[tci]);
 			map = dev_maps ?
-			      xmap_dereference(dev_maps->cpu_map[tci]) :
+			      xmap_dereference(dev_maps->attr_map[tci]) :
 			      NULL;
 			if (new_map && new_map != map)
 				kfree(new_map);
@@ -2363,6 +2436,13 @@ int netif_set_xps_queue(struct net_device *dev, const struct cpumask *mask,
 	kfree(new_dev_maps);
 	return -ENOMEM;
 }
+
+int netif_set_xps_queue(struct net_device *dev, const struct cpumask *mask,
+			u16 index)
+{
+	return __netif_set_xps_queue(dev, cpumask_bits(mask), index,
+				     XPS_MAP_CPUS);
+}
 EXPORT_SYMBOL(netif_set_xps_queue);
 
 #endif
@@ -3402,7 +3482,7 @@ static inline int get_xps_queue(struct net_device *dev, struct sk_buff *skb)
 	int queue_index = -1;
 
 	rcu_read_lock();
-	dev_maps = rcu_dereference(dev->xps_maps);
+	dev_maps = rcu_dereference(dev->xps_cpus_map);
 	if (dev_maps) {
 		unsigned int tci = skb->sender_cpu - 1;
 
@@ -3411,7 +3491,7 @@ static inline int get_xps_queue(struct net_device *dev, struct sk_buff *skb)
 			tci += netdev_get_prio_tc_map(dev, skb->priority);
 		}
 
-		map = rcu_dereference(dev_maps->cpu_map[tci]);
+		map = rcu_dereference(dev_maps->attr_map[tci]);
 		if (map) {
 			if (map->len == 1)
 				queue_index = map->queues[0];
diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index bb7e80f..b39987c 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -1227,13 +1227,13 @@ static ssize_t xps_cpus_show(struct netdev_queue *queue,
 		return -ENOMEM;
 
 	rcu_read_lock();
-	dev_maps = rcu_dereference(dev->xps_maps);
+	dev_maps = rcu_dereference(dev->xps_cpus_map);
 	if (dev_maps) {
 		for_each_possible_cpu(cpu) {
 			int i, tci = cpu * num_tc + tc;
 			struct xps_map *map;
 
-			map = rcu_dereference(dev_maps->cpu_map[tci]);
+			map = rcu_dereference(dev_maps->attr_map[tci]);
 			if (!map)
 				continue;
 

^ permalink raw reply related

* [net-next PATCH v3 2/5] net: Use static_key for XPS maps
From: Amritha Nambiar @ 2018-06-05  8:37 UTC (permalink / raw)
  To: netdev, davem
  Cc: alexander.h.duyck, willemdebruijn.kernel, amritha.nambiar,
	sridhar.samudrala, edumazet, hannes, tom
In-Reply-To: <152818727065.20862.10108275498797168689.stgit@anamdev.jf.intel.com>

Use static_key for XPS maps to reduce the cost of extra map checks,
similar to how it is used for RPS and RFS.

Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
---
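Reader note: the enable-count behaviour this patch relies on can be modelled
in userspace C. This is only a sketch under stated assumptions: struct
model_key and the model_* helpers are hypothetical names, and a real
struct static_key patches branch sites at runtime instead of reading a
counter. It illustrates that the fast path stays enabled until every
increment has been matched by a decrement.

```c
#include <stdbool.h>

/* Hypothetical stand-in for struct static_key: only the enable count is
 * modelled, not the runtime code patching done by the kernel.
 */
struct model_key {
	int enabled;
};

/* Models static_key_slow_inc(): one more user needs the slow path. */
static void model_key_inc(struct model_key *key)
{
	key->enabled++;
}

/* Models static_key_slow_dec(): one user went away. */
static void model_key_dec(struct model_key *key)
{
	if (key->enabled > 0)
		key->enabled--;
}

static bool model_key_active(const struct model_key *key)
{
	return key->enabled > 0;
}

/* Fast path: return -1 before looking at any map when no XPS map has
 * been installed; otherwise hand back the (already looked up) queue.
 */
static int model_get_xps_queue(const struct model_key *key, int mapped_queue)
{
	if (!model_key_active(key))
		return -1;
	return mapped_queue;
}
```

With a zeroed key, model_get_xps_queue() returns -1; after one
model_key_inc() it returns the mapped queue until a matching
model_key_dec() brings the count back to zero.
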
 net/core/dev.c |    8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/net/core/dev.c b/net/core/dev.c
index 156acbe..bba755f 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2081,6 +2081,8 @@ int netdev_txq_to_tc(struct net_device *dev, unsigned int txq)
 EXPORT_SYMBOL(netdev_txq_to_tc);
 
 #ifdef CONFIG_XPS
+struct static_key xps_needed __read_mostly;
+EXPORT_SYMBOL(xps_needed);
 static DEFINE_MUTEX(xps_map_mutex);
 #define xmap_dereference(P)		\
 	rcu_dereference_protected((P), lockdep_is_held(&xps_map_mutex))
@@ -2189,6 +2191,7 @@ static void netif_reset_xps_queues(struct net_device *dev, u16 offset,
 out_no_maps:
 		type++;
 	}
+	static_key_slow_dec(&xps_needed);
 	mutex_unlock(&xps_map_mutex);
 }
 
@@ -2309,6 +2312,8 @@ int __netif_set_xps_queue(struct net_device *dev, const unsigned long *mask,
 	if (!new_dev_maps)
 		goto out_no_new_maps;
 
+	static_key_slow_inc(&xps_needed);
+
 	for (j = -1; j = attrmask_next(j, possible_mask, nr_ids),
 	     j < nr_ids;) {
 		/* copy maps belonging to foreign traffic classes */
@@ -3481,6 +3486,9 @@ static inline int get_xps_queue(struct net_device *dev, struct sk_buff *skb)
 	struct xps_map *map;
 	int queue_index = -1;
 
+	if (!static_key_false(&xps_needed))
+		return -1;
+
 	rcu_read_lock();
 	dev_maps = rcu_dereference(dev->xps_cpus_map);
 	if (dev_maps) {

^ permalink raw reply related

* [net-next PATCH v3 3/5] net: Enable Tx queue selection based on Rx queues
From: Amritha Nambiar @ 2018-06-05  8:38 UTC (permalink / raw)
  To: netdev, davem
  Cc: alexander.h.duyck, willemdebruijn.kernel, amritha.nambiar,
	sridhar.samudrala, edumazet, hannes, tom
In-Reply-To: <152818727065.20862.10108275498797168689.stgit@anamdev.jf.intel.com>

Add support for picking the Tx queue based on the Rx queue(s) map
configured by the admin through the per-Tx-queue sysfs attribute.
If the Rx queue(s) map does not apply, Tx queue selection falls back
to the CPU(s) map and finally to hashing.

Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
---
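Reader note: the selection order described above (Rx queue map first, then
CPU map, then hashing) can be sketched in userspace C. All names here are
hypothetical, each key maps to at most one Tx queue, traffic classes are
ignored, and sender_cpu is zero-based (the kernel stores skb->sender_cpu
as CPU + 1). The bounds check in model_lookup() is an assumption of this
sketch, not a statement about the patch.

```c
#include <stddef.h>

/* Hypothetical flat map: map[id] is a Tx queue, or -1 for "no entry".
 * A NULL map models a device with no map of that type installed.
 */
static int model_lookup(const int *map, size_t nr_ids, int id)
{
	if (!map || id < 0 || (size_t)id >= nr_ids)
		return -1;
	return map[id];
}

/* Try the Rx queue map first, then fall back to the CPU map. */
static int model_pick_txq(const int *rxq_map, size_t nr_rxqs,
			  const int *cpu_map, size_t nr_cpus,
			  int sk_rx_queue, int sender_cpu)
{
	int txq = model_lookup(rxq_map, nr_rxqs, sk_rx_queue);

	if (txq < 0)
		txq = model_lookup(cpu_map, nr_cpus, sender_cpu);

	/* -1 tells the caller to fall back to hashing */
	return txq;
}
```

A socket with no recorded Rx queue (sk_rx_queue == -1) skips the first
lookup in this model and is steered purely by the CPU map.
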
 include/net/busy_poll.h |    3 ++
 include/net/sock.h      |   14 +++++++++++
 net/core/dev.c          |   60 ++++++++++++++++++++++++++++++++---------------
 net/core/sock.c         |    4 +++
 net/ipv4/tcp_input.c    |    3 ++
 5 files changed, 65 insertions(+), 19 deletions(-)

diff --git a/include/net/busy_poll.h b/include/net/busy_poll.h
index 71c72a9..fc4fb68 100644
--- a/include/net/busy_poll.h
+++ b/include/net/busy_poll.h
@@ -136,6 +136,9 @@ static inline void sk_mark_napi_id(struct sock *sk, const struct sk_buff *skb)
 #ifdef CONFIG_NET_RX_BUSY_POLL
 	sk->sk_napi_id = skb->napi_id;
 #endif
+#ifdef CONFIG_XPS
+	sk->sk_rx_queue_mapping = skb_get_rx_queue(skb);
+#endif
 }
 
 /* variant used for unconnected sockets */
diff --git a/include/net/sock.h b/include/net/sock.h
index 4f7c584..12313653 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -139,6 +139,7 @@ typedef __u64 __bitwise __addrpair;
  *	@skc_node: main hash linkage for various protocol lookup tables
  *	@skc_nulls_node: main hash linkage for TCP/UDP/UDP-Lite protocol
  *	@skc_tx_queue_mapping: tx queue number for this connection
+ *	@skc_rx_queue_mapping: rx queue number for this connection
  *	@skc_flags: place holder for sk_flags
  *		%SO_LINGER (l_onoff), %SO_BROADCAST, %SO_KEEPALIVE,
  *		%SO_OOBINLINE settings, %SO_TIMESTAMPING settings
@@ -215,6 +216,9 @@ struct sock_common {
 		struct hlist_nulls_node skc_nulls_node;
 	};
 	int			skc_tx_queue_mapping;
+#ifdef CONFIG_XPS
+	int			skc_rx_queue_mapping;
+#endif
 	union {
 		int		skc_incoming_cpu;
 		u32		skc_rcv_wnd;
@@ -326,6 +330,9 @@ struct sock {
 #define sk_nulls_node		__sk_common.skc_nulls_node
 #define sk_refcnt		__sk_common.skc_refcnt
 #define sk_tx_queue_mapping	__sk_common.skc_tx_queue_mapping
+#ifdef CONFIG_XPS
+#define sk_rx_queue_mapping	__sk_common.skc_rx_queue_mapping
+#endif
 
 #define sk_dontcopy_begin	__sk_common.skc_dontcopy_begin
 #define sk_dontcopy_end		__sk_common.skc_dontcopy_end
@@ -1696,6 +1703,13 @@ static inline int sk_tx_queue_get(const struct sock *sk)
 	return sk ? sk->sk_tx_queue_mapping : -1;
 }
 
+static inline void sk_mark_rx_queue(struct sock *sk, struct sk_buff *skb)
+{
+#ifdef CONFIG_XPS
+	sk->sk_rx_queue_mapping = skb_get_rx_queue(skb);
+#endif
+}
+
 static inline void sk_set_socket(struct sock *sk, struct socket *sock)
 {
 	sk_tx_queue_clear(sk);
diff --git a/net/core/dev.c b/net/core/dev.c
index bba755f..1880e6c 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3479,36 +3479,58 @@ sch_handle_egress(struct sk_buff *skb, int *ret, struct net_device *dev)
 }
 #endif /* CONFIG_NET_EGRESS */
 
-static inline int get_xps_queue(struct net_device *dev, struct sk_buff *skb)
+#ifdef CONFIG_XPS
+static int __get_xps_queue_idx(struct net_device *dev, struct sk_buff *skb,
+			       struct xps_dev_maps *dev_maps, unsigned int tci)
+{
+	struct xps_map *map;
+	int queue_index = -1;
+
+	if (dev->num_tc) {
+		tci *= dev->num_tc;
+		tci += netdev_get_prio_tc_map(dev, skb->priority);
+	}
+
+	map = rcu_dereference(dev_maps->attr_map[tci]);
+	if (map) {
+		if (map->len == 1)
+			queue_index = map->queues[0];
+		else
+			queue_index = map->queues[reciprocal_scale(
+						skb_get_hash(skb), map->len)];
+		if (unlikely(queue_index >= dev->real_num_tx_queues))
+			queue_index = -1;
+	}
+	return queue_index;
+}
+#endif
+
+static int get_xps_queue(struct net_device *dev, struct sk_buff *skb)
 {
 #ifdef CONFIG_XPS
 	struct xps_dev_maps *dev_maps;
-	struct xps_map *map;
+	struct sock *sk = skb->sk;
 	int queue_index = -1;
+	unsigned int tci = 0;
 
 	if (!static_key_false(&xps_needed))
 		return -1;
 
+	if (sk && sk->sk_rx_queue_mapping < dev->num_rx_queues)
+		tci = sk->sk_rx_queue_mapping;
+
 	rcu_read_lock();
-	dev_maps = rcu_dereference(dev->xps_cpus_map);
-	if (dev_maps) {
-		unsigned int tci = skb->sender_cpu - 1;
+	dev_maps = rcu_dereference(dev->xps_rxqs_map);
+	if (dev_maps)
+		queue_index = __get_xps_queue_idx(dev, skb, dev_maps, tci);
 
-		if (dev->num_tc) {
-			tci *= dev->num_tc;
-			tci += netdev_get_prio_tc_map(dev, skb->priority);
-		}
 
-		map = rcu_dereference(dev_maps->attr_map[tci]);
-		if (map) {
-			if (map->len == 1)
-				queue_index = map->queues[0];
-			else
-				queue_index = map->queues[reciprocal_scale(skb_get_hash(skb),
-									   map->len)];
-			if (unlikely(queue_index >= dev->real_num_tx_queues))
-				queue_index = -1;
-		}
+	if (queue_index < 0) {
+		tci = skb->sender_cpu - 1;
+		dev_maps = rcu_dereference(dev->xps_cpus_map);
+		if (dev_maps)
+			queue_index = __get_xps_queue_idx(dev, skb, dev_maps,
+							  tci);
 	}
 	rcu_read_unlock();
 
diff --git a/net/core/sock.c b/net/core/sock.c
index 435a0ba..3c10d31 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -2824,6 +2824,10 @@ void sock_init_data(struct socket *sock, struct sock *sk)
 	sk->sk_pacing_rate = ~0U;
 	sk->sk_pacing_shift = 10;
 	sk->sk_incoming_cpu = -1;
+
+#ifdef CONFIG_XPS
+	sk->sk_rx_queue_mapping = -1;
+#endif
 	/*
 	 * Before updating sk_refcnt, we must commit prior changes to memory
 	 * (Documentation/RCU/rculist_nulls.txt for details)
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index d5ffb57..cc69f75 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -78,6 +78,7 @@
 #include <linux/errqueue.h>
 #include <trace/events/tcp.h>
 #include <linux/static_key.h>
+#include <net/busy_poll.h>
 
 int sysctl_tcp_max_orphans __read_mostly = NR_FILE;
 
@@ -5574,6 +5575,7 @@ void tcp_finish_connect(struct sock *sk, struct sk_buff *skb)
 	if (skb) {
 		icsk->icsk_af_ops->sk_rx_dst_set(sk, skb);
 		security_inet_conn_established(sk, skb);
+		sk_mark_napi_id(sk, skb);
 	}
 
 	tcp_init_transfer(sk, BPF_SOCK_OPS_ACTIVE_ESTABLISHED_CB);
@@ -6402,6 +6404,7 @@ int tcp_conn_request(struct request_sock_ops *rsk_ops,
 	tcp_rsk(req)->snt_isn = isn;
 	tcp_rsk(req)->txhash = net_tx_rndhash();
 	tcp_openreq_init_rwin(req, sk, dst);
+	sk_mark_rx_queue(req_to_sk(req), skb);
 	if (!want_cookie) {
 		tcp_reqsk_record_syn(sk, req, skb);
 		fastopen_sk = tcp_try_fastopen(sk, skb, req, &foc, dst);

^ permalink raw reply related

* [net-next PATCH v3 4/5] net-sysfs: Add interface for Rx queue(s) map per Tx queue
From: Amritha Nambiar @ 2018-06-05  8:38 UTC (permalink / raw)
  To: netdev, davem
  Cc: alexander.h.duyck, willemdebruijn.kernel, amritha.nambiar,
	sridhar.samudrala, edumazet, hannes, tom
In-Reply-To: <152818727065.20862.10108275498797168689.stgit@anamdev.jf.intel.com>

Extend transmit queue sysfs attribute to configure Rx queue(s) map
per Tx queue. By default no receive queues are configured for the
Tx queue.

- /sys/class/net/eth0/queues/tx-*/xps_rxqs

Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
---
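Reader note: xps_rxqs_store() below hands the written buffer to
bitmap_parse(), which reads hexadecimal digits, so bit n of the value
selects Rx queue n. A minimal userspace sketch of building such a string
follows. model_rxqs_to_mask() is a hypothetical helper, and it is limited
to Rx queues 0..31 to keep the example short.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Build the hex mask string for a list of Rx queue indices.
 * Returns the number of characters written, or -1 if an index does not
 * fit in the single 32-bit group handled by this sketch.
 */
static int model_rxqs_to_mask(const unsigned int *rxqs, size_t n,
			      char *buf, size_t len)
{
	uint32_t mask = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (rxqs[i] >= 32)
			return -1;
		mask |= UINT32_C(1) << rxqs[i];
	}

	return snprintf(buf, len, "%" PRIx32, mask);
}
```

For Rx queues 1 and 3 this yields the string "a", which would then be
written to the xps_rxqs file of the chosen Tx queue.
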
 net/core/net-sysfs.c |   81 ++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 81 insertions(+)

diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index b39987c..2ed4317 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -1283,6 +1283,86 @@ static ssize_t xps_cpus_store(struct netdev_queue *queue,
 
 static struct netdev_queue_attribute xps_cpus_attribute __ro_after_init
 	= __ATTR_RW(xps_cpus);
+
+static ssize_t xps_rxqs_show(struct netdev_queue *queue, char *buf)
+{
+	struct net_device *dev = queue->dev;
+	struct xps_dev_maps *dev_maps;
+	unsigned long *mask, index;
+	int j, len, num_tc = 1, tc = 0;
+
+	index = get_netdev_queue_index(queue);
+
+	if (dev->num_tc) {
+		num_tc = dev->num_tc;
+		tc = netdev_txq_to_tc(dev, index);
+		if (tc < 0)
+			return -EINVAL;
+	}
+
+	mask = kcalloc(BITS_TO_LONGS(dev->num_rx_queues), sizeof(long),
+		       GFP_KERNEL);
+	if (!mask)
+		return -ENOMEM;
+
+	rcu_read_lock();
+	dev_maps = rcu_dereference(dev->xps_rxqs_map);
+	if (dev_maps) {
+		for (j = -1; j = attrmask_next(j, NULL, dev->num_rx_queues),
+		     j < dev->num_rx_queues;) {
+			int i, tci = j * num_tc + tc;
+			struct xps_map *map;
+
+			map = rcu_dereference(dev_maps->attr_map[tci]);
+			if (!map)
+				continue;
+
+			for (i = map->len; i--;) {
+				if (map->queues[i] == index) {
+					set_bit(j, mask);
+					break;
+				}
+			}
+		}
+	}
+
+	len = bitmap_print_to_pagebuf(false, buf, mask, dev->num_rx_queues);
+	rcu_read_unlock();
+	kfree(mask);
+
+	return len < PAGE_SIZE ? len : -EINVAL;
+}
+
+static ssize_t xps_rxqs_store(struct netdev_queue *queue, const char *buf,
+			      size_t len)
+{
+	struct net_device *dev = queue->dev;
+	unsigned long *mask, index;
+	int err;
+
+	if (!capable(CAP_NET_ADMIN))
+		return -EPERM;
+
+	mask = kcalloc(BITS_TO_LONGS(dev->num_rx_queues), sizeof(long),
+		       GFP_KERNEL);
+	if (!mask)
+		return -ENOMEM;
+
+	index = get_netdev_queue_index(queue);
+
+	err = bitmap_parse(buf, len, mask, dev->num_rx_queues);
+	if (err) {
+		kfree(mask);
+		return err;
+	}
+
+	err = __netif_set_xps_queue(dev, mask, index, XPS_MAP_RXQS);
+	kfree(mask);
+	return err ? : len;
+}
+
+static struct netdev_queue_attribute xps_rxqs_attribute __ro_after_init
+	= __ATTR_RW(xps_rxqs);
 #endif /* CONFIG_XPS */
 
 static struct attribute *netdev_queue_default_attrs[] __ro_after_init = {
@@ -1290,6 +1370,7 @@ static struct attribute *netdev_queue_default_attrs[] __ro_after_init = {
 	&queue_traffic_class.attr,
 #ifdef CONFIG_XPS
 	&xps_cpus_attribute.attr,
+	&xps_rxqs_attribute.attr,
 	&queue_tx_maxrate.attr,
 #endif
 	NULL

^ permalink raw reply related

* [net-next PATCH v3 5/5] Documentation: Add explanation for XPS using Rx-queue(s) map
From: Amritha Nambiar @ 2018-06-05  8:38 UTC (permalink / raw)
  To: netdev, davem
  Cc: alexander.h.duyck, willemdebruijn.kernel, amritha.nambiar,
	sridhar.samudrala, edumazet, hannes, tom
In-Reply-To: <152818727065.20862.10108275498797168689.stgit@anamdev.jf.intel.com>

Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
---
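Reader note: the scaling.txt text below says that when several transmit
queues match, one is picked by using the flow hash as an index into the
set. A userspace sketch of that step follows. The model_* names are
hypothetical; the scaling arithmetic mirrors what the kernel's
reciprocal_scale() helper computes, a multiply and shift instead of a
modulo.

```c
#include <stdint.h>

/* Map a 32-bit value onto [0, ep_ro) without a division. */
static uint32_t model_reciprocal_scale(uint32_t val, uint32_t ep_ro)
{
	return (uint32_t)(((uint64_t)val * ep_ro) >> 32);
}

/* Pick one Tx queue from the set mapped to a CPU or Rx queue.
 * Returns -1 when the set is empty.
 */
static int model_select_queue(const uint16_t *queues, uint32_t len,
			      uint32_t flow_hash)
{
	if (len == 0)
		return -1;
	if (len == 1)
		return queues[0];

	return queues[model_reciprocal_scale(flow_hash, len)];
}
```

Because the index depends only on the flow hash and the set size, all
packets of one flow land on the same queue for as long as the map is
unchanged.
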
 Documentation/ABI/testing/sysfs-class-net-queues |   11 ++++
 Documentation/networking/scaling.txt             |   58 ++++++++++++++++++----
 2 files changed, 59 insertions(+), 10 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-class-net-queues b/Documentation/ABI/testing/sysfs-class-net-queues
index 0c0df91..1b4cc21 100644
--- a/Documentation/ABI/testing/sysfs-class-net-queues
+++ b/Documentation/ABI/testing/sysfs-class-net-queues
@@ -42,6 +42,17 @@ Description:
 		network device transmit queue. Possible vaules depend on the
 		number of available CPU(s) in the system.
 
+What:		/sys/class/<iface>/queues/tx-<queue>/xps_rxqs
+Date:		June 2018
+KernelVersion:	4.17.0
+Contact:	netdev@vger.kernel.org
+Description:
+		Mask of the receive queue(s) currently enabled to participate
+		in the Transmit Packet Steering packet processing flow for this
+		network device transmit queue. Possible values depend on the
+		number of available receive queue(s) in the network device.
+		Default is disabled.
+
 What:		/sys/class/<iface>/queues/tx-<queue>/byte_queue_limits/hold_time
 Date:		November 2011
 KernelVersion:	3.3
diff --git a/Documentation/networking/scaling.txt b/Documentation/networking/scaling.txt
index f55639d..834147c 100644
--- a/Documentation/networking/scaling.txt
+++ b/Documentation/networking/scaling.txt
@@ -366,8 +366,13 @@ XPS: Transmit Packet Steering
 
 Transmit Packet Steering is a mechanism for intelligently selecting
 which transmit queue to use when transmitting a packet on a multi-queue
-device. To accomplish this, a mapping from CPU to hardware queue(s) is
-recorded. The goal of this mapping is usually to assign queues
+device. This can be accomplished by recording two kinds of maps, either
+a mapping of CPU to hardware queue(s) or a mapping of receive queue(s)
+to hardware transmit queue(s).
+
+1. XPS using CPUs map
+
+The goal of this mapping is usually to assign queues
 exclusively to a subset of CPUs, where the transmit completions for
 these queues are processed on a CPU within this set. This choice
 provides two benefits. First, contention on the device queue lock is
@@ -377,12 +382,36 @@ transmit queue). Secondly, cache miss rate on transmit completion is
 reduced, in particular for data cache lines that hold the sk_buff
 structures.
 
-XPS is configured per transmit queue by setting a bitmap of CPUs that
-may use that queue to transmit. The reverse mapping, from CPUs to
-transmit queues, is computed and maintained for each network device.
-When transmitting the first packet in a flow, the function
-get_xps_queue() is called to select a queue. This function uses the ID
-of the running CPU as a key into the CPU-to-queue lookup table. If the
+2. XPS using receive queues map
+
+This mapping is used to pick the transmit queue based on the receive
+queue(s) map configuration set by the administrator. A set of receive
+queues can be mapped to a set of transmit queues (many:many), although
+the common use case is a 1:1 mapping. This will enable sending packets
+on the same queue pair for transmit and receive. This is useful for
+busy polling multi-threaded workloads where there are challenges in
+associating a given CPU to a given application thread. The application
+threads are not pinned to CPUs and each thread handles packets
+received on a single queue. The receive queue number is cached in the
+socket for the connection and there is no need for adding flow entries
+as in the case of aRFS or flow director. In this model, sending the
+packets on the same transmit queue corresponding to the queue-pair
+associated with the receive queue has benefits in keeping the CPU overhead
+low. Transmit completion work is locked into the same queue pair that
+a given application is polling on. This avoids the overhead of triggering
+an interrupt on another CPU. When the application cleans up the packets
+during the busy poll, transmit completion may be processed along with it
+in the same thread context and so result in reduced latency.
+
+XPS is configured per transmit queue by setting a bitmap of
+CPUs/receive-queues that may use that queue to transmit. The reverse
+mapping, from CPUs to transmit queues or from receive-queues to transmit
+queues, is computed and maintained for each network device. When
+transmitting the first packet in a flow, the function get_xps_queue() is
+called to select a queue. This function first uses the ID of the receive
+queue cached in the socket as a key into the receive queue-to-transmit queue
+lookup table. If that yields no match, it falls back to the ID of the
+running CPU as a key into the CPU-to-queue lookup table. If the
 ID matches a single queue, that is used for transmission. If multiple
 queues match, one is selected by using the flow hash to compute an index
 into the set.
@@ -404,11 +433,15 @@ acknowledged.
 
 XPS is only available if the kconfig symbol CONFIG_XPS is enabled (on by
 default for SMP). The functionality remains disabled until explicitly
-configured. To enable XPS, the bitmap of CPUs that may use a transmit
-queue is configured using the sysfs file entry:
+configured. To enable XPS, the bitmap of CPUs/receive-queues that may
+use a transmit queue is configured using the matching sysfs file entry:
 
+For selection based on CPUs map:
 /sys/class/net/<dev>/queues/tx-<n>/xps_cpus
 
+For selection based on receive-queues map:
+/sys/class/net/<dev>/queues/tx-<n>/xps_rxqs
+
 == Suggested Configuration
 
 For a network device with a single transmission queue, XPS configuration
@@ -421,6 +454,11 @@ best CPUs to share a given queue are probably those that share the cache
 with the CPU that processes transmit completions for that queue
 (transmit interrupts).
 
+For transmit queue selection based on receive queue(s), XPS has to be
+explicitly configured by mapping receive queue(s) to transmit queue(s). If
+the user configuration for the receive-queue map does not apply, then the
+transmit queue is selected based on the CPUs map.
+
 Per TX Queue rate limitation:
 =============================
 

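The selection order described in the documentation hunks above can be sketched as a small userspace model. This is not the kernel's get_xps_queue(): the names xps_map_sketch, pick_queue and select_tx_queue are hypothetical, and a plain modulo stands in for whatever scaling the kernel applies to the flow hash. It only shows the order of the lookups: receive-queue map first, CPU map as the fallback, and the flow hash used only when several queues match.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one lookup-table entry: the set of transmit
 * queues that a given key (receive-queue ID or CPU ID) may use. */
struct xps_map_sketch {
	size_t len;
	const uint16_t *queues;
};

/* Return the queue chosen from one entry, or -1 if nothing is configured. */
static int pick_queue(const struct xps_map_sketch *map, uint32_t flow_hash)
{
	if (!map || map->len == 0)
		return -1;
	assert(map->queues != NULL);
	if (map->len == 1)
		return map->queues[0];
	/* Several queues match: the flow hash indexes into the set. */
	return map->queues[flow_hash % map->len];
}

/* Try the receive-queue entry first, then fall back to the CPU entry. */
static int select_tx_queue(const struct xps_map_sketch *rxq_entry,
			   const struct xps_map_sketch *cpu_entry,
			   uint32_t flow_hash)
{
	int q = pick_queue(rxq_entry, flow_hash);

	return q >= 0 ? q : pick_queue(cpu_entry, flow_hash);
}
```

A result of -1 in this sketch means neither map applies, in which case the caller would have to use some other selection method.
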
^ permalink raw reply related

* Re: AF_XDP. Was: [net-next 00/12][pull request] Intel Wired LAN Driver Updates 2018-06-04
From: Björn Töpel @ 2018-06-05  8:44 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Alexei Starovoitov, David Miller, Björn Töpel,
	Karlsson, Magnus, ast, Daniel Borkmann, Or Gerlitz, Jeff Kirsher,
	Netdev
In-Reply-To: <CAKgT0UeoB19-KKkXMLfprMHFA0B+_KRiBrCCFL7T=oEHjWKsTA@mail.gmail.com>

Den tis 5 juni 2018 kl 03:46 skrev Alexander Duyck <alexander.duyck@gmail.com>:
>
> On Mon, Jun 4, 2018 at 4:32 PM, Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> > On Mon, Jun 04, 2018 at 03:02:31PM -0700, Alexander Duyck wrote:
> >> On Mon, Jun 4, 2018 at 2:27 PM, David Miller <davem@davemloft.net> wrote:
> >> > From: Or Gerlitz <gerlitz.or@gmail.com>
> >> > Date: Tue, 5 Jun 2018 00:11:35 +0300
> >> >
> >> >> Just to make sure, is the AF_XDP ZC (Zero Copy) UAPI going to be
> >> >> merged for this window -- AFAIU from [1], it's still under
> >> >> examination/development/research for non Intel HWs, am I correct or
> >> >> this is going to get in now?
> >> >
> >> > All of the pending AF_XDP changes will be merged this merge window.
> >> >
> >> > I think Intel folks need to review things as fast as possible because
> >> > I pretty much refuse to revert the series or disable it in Kconfig at
> >> > this point.
> >> >
> >> > Thank you.
> >>
> >> My understanding of things is that the current AF_XDP patches were
> >> going to be updated to have more of a model agnostic API such that
> >> they would work for either the "typewriter" mode or the descriptor
> >> ring based approach. The current plan was to have the zero copy
> >> patches be a follow-on after the vendor agnostic API bits in the
> >> descriptors and such had been sorted out. I believe you guys have the
> >> descriptor fixes already right?
> >>
> >> In my opinion the i40e code isn't mature enough yet to really go into
> >> anything other than maybe net-next in a couple weeks. We are going to
> >> need a while to get adequate testing in order to flush out all the
> >> bugs and performance regressions we are likely to see coming out of
> >> this change.
> >
> > I think the work everyone did in this release cycle increased my confidence
> > that the way descriptors are defined and the rest of uapi are stable enough
> > and i40e zero copy bits can land in the next release without uapi changes.
> > In that sense even if we merge i40e parts now, the other nic vendors
> > will be in the same situation and may find things that they would like
> > to improve in uapi.
> > So I propose we merge the first 7 patches of the last series now and
> > let 3 remaining i40e patches go via intel trees for the next release.
> > In the mean time other NIC vendors should start actively working
> > on AF_XDP support as well.
> > If somehow uapi would need tweaks, we can still do minor adjustments
> > since 4.18 won't be released for ~10 weeks.
> >
>
> That works for me. Actually I think patch 11 can probably be included
> as well since that is just sample code and could probably be used by
> whatever drivers end up implementing this.
>

The approach suggested by Alexei and Alex sounds good to us. Alex's
review items are very much valid, and require more time to address.
Therefore addressing i40e in the next merge window sounds like a
great idea.

As Alex suggests, including patch 11 together with the first seven makes sense.

Thanks for spending time on our work. Very much appreciated!


Björn

> Thanks.
>
> - Alex

^ permalink raw reply

* Freeze when using ipheth+IPsec+IPv6
From: Yves-Alexis Perez @ 2018-06-05  8:54 UTC (permalink / raw)
  To: linux-kernel, David S. Miller, Hans Liljestrand, David Windsor,
	Kees Cook, Reshetova, Elena, Kirill Tkhai, Al Viro, Cong Wang,
	Mateusz Jurczyk, Denys Vlasenko, David Herrmann, netdev,
	Alexander Kappner, Johannes Berg, Gustavo A. R. Silva,
	Arvind Yadav, Steffen Klassert, Herbert Xu


[-- Attachment #1.1: Type: text/plain, Size: 996 bytes --]

Hi,

For several kernel releases (I didn't test thoroughly, but at least 4.16
and 4.17) I have had regular freezes in certain situations on my laptop.

It seems to happen when I:

- tether using my iPhone (involving ipheth)
- mount an IPsec tunnel over IPv4
- run evolution to fetch my mail (IMAP traffic over IPv6 inside the IPv4
  IPsec tunnel)

When I do that, the interface seems to freeze. Last time the mouse was
still moving so the kernel didn't completely crash, but the UI was
completely unresponsive. I managed to get the attached log from
/sys/fs/pstore with refcount_t stuff pointing to an underflow.

Since this only happens when all three conditions are met, I've added
a lot of people to the CC: list, sorry for the noise.

I can try to add debugging here and there if needed, but any pointer to
where would be helpful. I'm especially puzzled about the presence of
net/unix/af_unix.c:491 unix_sock_destructor+0x97/0xc0 in the log.
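
As far as I can tell, the two refcount_t messages in the attached log correspond to the checks sketched below. This is a simplified single-threaded model, not the code in lib/refcount.c: the names are hypothetical, the atomic compare-and-swap loop is replaced by plain arithmetic, and the add helper returns a bool purely so the result can be inspected.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdio.h>

struct refcount_sketch {
	unsigned int refs;
};

/* Subtract i; return true only when the count reaches exactly zero. */
static bool sketch_sub_and_test(struct refcount_sketch *r, unsigned int i)
{
	unsigned int val, new_val;

	assert(r != NULL);
	val = r->refs;
	if (val == UINT_MAX)		/* a saturated counter stays saturated */
		return false;
	new_val = val - i;
	if (new_val > val) {		/* wrapped below zero: more puts than gets */
		fprintf(stderr, "refcount_t: underflow; use-after-free.\n");
		return false;
	}
	r->refs = new_val;
	return new_val == 0;
}

/* Add i unless the object is already dead (count of zero). */
static bool sketch_add(struct refcount_sketch *r, unsigned int i)
{
	unsigned int val;

	assert(r != NULL);
	val = r->refs;
	if (val == 0) {			/* taking a reference on a freed object */
		fprintf(stderr, "refcount_t: addition on 0; use-after-free.\n");
		return false;
	}
	if (val == UINT_MAX)
		return true;
	r->refs = (val + i < val) ? UINT_MAX : val + i;	/* saturate on overflow */
	return true;
}
```

So the first warning means something was subtracted from a counter that was already too small, and the second means a reference was added to a counter that had already dropped to zero.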

Regards,
-- 
Yves-Alexis

[-- Attachment #1.2: dmesg-crash.txt --]
[-- Type: text/plain, Size: 14641 bytes --]

Oops#1 Part8
<4>[ 2189.388649] ------------[ cut here ]------------
<4>[ 2189.388652] refcount_t: underflow; use-after-free.
<4>[ 2189.388691] WARNING: CPU: 3 PID: 30 at /home/corsac/projets/linux/linux/lib/refcount.c:187 refcount_sub_and_test+0x3e/0x50
<4>[ 2189.388692] Modules linked in: esp4 xfrm6_mode_tunnel xfrm4_mode_tunnel ipheth bnep rtsx_pci_sdmmc iwlmvm snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi iwlwifi snd_hda_intel rtsx_pci snd_hda_codec snd_hwdep snd_hda_core snd_pcm thinkpad_acpi efivarfs input_leds
<4>[ 2189.388713] CPU: 3 PID: 30 Comm: ksoftirqd/3 Tainted: G                T 4.17.0 #22
<4>[ 2189.388714] Hardware name: LENOVO 20CMCTO1WW/20CMCTO1WW, BIOS N10ET48W (1.27 ) 09/12/2017
<4>[ 2189.388718] RIP: 0010:refcount_sub_and_test+0x3e/0x50
<4>[ 2189.388720] RSP: 0000:ffff93e640dabdc0 EFLAGS: 00010282
<4>[ 2189.388722] RAX: 0000000000000000 RBX: ffff8d00bac82000 RCX: 0000000000000006
<4>[ 2189.388723] RDX: 0000000000000007 RSI: 0000000000000096 RDI: ffff8d014dd95610
<4>[ 2189.388724] RBP: ffff8d00bac82144 R08: 00000000000003b3 R09: 0000000000000004
<4>[ 2189.388725] R10: 0000000000000000 R11: 0000000000000001 R12: ffff8d014dda1100
<4>[ 2189.388726] R13: 0000000000000000 R14: 909c6c76983bd4d2 R15: 13146644b16dc153
<4>[ 2189.388728] FS:  0000000000000000(0000) GS:ffff8d014dd80000(0000) knlGS:0000000000000000
<4>[ 2189.388729] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[ 2189.388731] CR2: 0000755942c8c000 CR3: 000000006620a004 CR4: 00000000003606e0
<4>[ 2189.388732] Call Trace:
<4>[ 2189.388738]  sock_wfree+0x40/0x60
<4>[ 2189.388743]  unix_destruct_scm+0x7b/0xa0
<4>[ 2189.388747]  skb_release_head_state+0x59/0x90
<4>[ 2189.388751]  skb_release_all+0x9/0x20
<4>[ 2189.388753]  __kfree_skb_defer+0x19/0x50
Oops#1 Part7
<4>[ 2189.388757]  net_tx_action+0xf0/0x2d0
<4>[ 2189.388760]  __do_softirq+0xdb/0x220
<4>[ 2189.388766]  ? sort_range+0x20/0x20
<4>[ 2189.388768]  run_ksoftirqd+0x1f/0x30
<4>[ 2189.388771]  smpboot_thread_fn+0x11f/0x1e0
<4>[ 2189.388775]  kthread+0x109/0x120
<4>[ 2189.388779]  ? kthread_create_worker_on_cpu+0x70/0x70
<4>[ 2189.388783]  ret_from_fork+0x35/0x40
<4>[ 2189.388785] Code: c9 75 0c f0 0f b1 16 75 27 85 d2 0f 94 c0 c3 80 3d ab 82 f4 00 00 75 15 48 c7 c7 b8 a5 fd bd c6 05 9b 82 f4 00 01 e8 22 13 d4 ff <0f> 0b 31 c0 c3 83 f8 ff 75 bf eb f6 66 0f 1f 44 00 00 48 89 fe 
<4>[ 2189.388825] ---[ end trace b06d93f176d25117 ]---
<4>[ 2189.388848] WARNING: CPU: 3 PID: 30 at /home/corsac/projets/linux/linux/net/unix/af_unix.c:491 unix_sock_destructor+0x97/0xc0
<4>[ 2189.388849] Modules linked in: esp4 xfrm6_mode_tunnel xfrm4_mode_tunnel ipheth bnep rtsx_pci_sdmmc iwlmvm snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi iwlwifi snd_hda_intel rtsx_pci snd_hda_codec snd_hwdep snd_hda_core snd_pcm thinkpad_acpi efivarfs input_leds
<4>[ 2189.388866] CPU: 3 PID: 30 Comm: ksoftirqd/3 Tainted: G        W       T 4.17.0 #22
<4>[ 2189.388867] Hardware name: LENOVO 20CMCTO1WW/20CMCTO1WW, BIOS N10ET48W (1.27 ) 09/12/2017
<4>[ 2189.388869] RIP: 0010:unix_sock_destructor+0x97/0xc0
<4>[ 2189.388871] RSP: 0000:ffff93e640dabda8 EFLAGS: 00010286
<4>[ 2189.388872] RAX: 0000000000000000 RBX: ffff8d00bac82000 RCX: 0000000000000000
<4>[ 2189.388873] RDX: 0000000000000001 RSI: 0000000000000282 RDI: 0000000000000282
<4>[ 2189.388875] RBP: ffff8d00bac82000 R08: ffff8d0084796430 R09: ffff93e640dabd50
<4>[ 2189.388876] R10: 0000000000000000 R11: 0000000000000001 R12: ffff8d014dda1100
<4>[ 2189.388877] R13: 0000000000000000 R14: 909c6c76983bd4d2 R15: 13146644b16dc153
Oops#1 Part6
<4>[ 2189.388879] FS:  0000000000000000(0000) GS:ffff8d014dd80000(0000) knlGS:0000000000000000
<4>[ 2189.388880] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[ 2189.388881] CR2: 0000755942c8c000 CR3: 000000006620a004 CR4: 00000000003606e0
<4>[ 2189.388882] Call Trace:
<4>[ 2189.388886]  __sk_destruct+0x1f/0x140
<4>[ 2189.388888]  unix_destruct_scm+0x7b/0xa0
<4>[ 2189.388891]  skb_release_head_state+0x59/0x90
<4>[ 2189.388894]  skb_release_all+0x9/0x20
<4>[ 2189.388894] ------------[ cut here ]------------
<4>[ 2189.388896] refcount_t: addition on 0; use-after-free.
<4>[ 2189.388898]  __kfree_skb_defer+0x19/0x50
<4>[ 2189.388901]  net_tx_action+0xf0/0x2d0
<4>[ 2189.388906] WARNING: CPU: 0 PID: 1094 at /home/corsac/projets/linux/linux/lib/refcount.c:102 refcount_add+0x26/0x30
<4>[ 2189.388907] Modules linked in: esp4
<4>[ 2189.388910]  __do_softirq+0xdb/0x220
<4>[ 2189.388911]  xfrm6_mode_tunnel
<4>[ 2189.388915]  ? sort_range+0x20/0x20
<4>[ 2189.388915]  xfrm4_mode_tunnel ipheth
<4>[ 2189.388918]  run_ksoftirqd+0x1f/0x30
<4>[ 2189.388918]  bnep
<4>[ 2189.388920]  smpboot_thread_fn+0x11f/0x1e0
<4>[ 2189.388921]  rtsx_pci_sdmmc
<4>[ 2189.388924]  kthread+0x109/0x120
<4>[ 2189.388925]  iwlmvm
<4>[ 2189.388927]  ? kthread_create_worker_on_cpu+0x70/0x70
<4>[ 2189.388928]  snd_hda_codec_realtek
<4>[ 2189.388932]  ret_from_fork+0x35/0x40
<4>[ 2189.388932]  snd_hda_codec_generic snd_hda_codec_hdmi
<4>[ 2189.388933] Code: 
<4>[ 2189.388935]  iwlwifi
<4>[ 2189.388936] e8 ff 
<4>[ 2189.388937]  snd_hda_intel rtsx_pci
<4>[ 2189.388938] f0 f2 ff 5b be 
<4>[ 2189.388941]  snd_hda_codec snd_hwdep
<4>[ 2189.388942] 00 02 
<4>[ 2189.388944]  snd_hda_core
<4>[ 2189.388944] 00 00 
<4>[ 2189.388945]  snd_pcm thinkpad_acpi
Oops#1 Part5
<4>[ 2189.388947] 48 
<4>[ 2189.388947]  efivarfs input_leds
<4>[ 2189.388949] c7 
<4>[ 2189.388951] c7 
<4>[ 2189.388952] CPU: 0 PID: 1094 Comm: Xorg Tainted: G        W       T 4.17.0 #22
<4>[ 2189.388953] Hardware name: LENOVO 20CMCTO1WW/20CMCTO1WW, BIOS N10ET48W (1.27 ) 09/12/2017
<4>[ 2189.388954] 5c 
<4>[ 2189.388957] RIP: 0010:refcount_add+0x26/0x30
<4>[ 2189.388958] c3 
<4>[ 2189.388959] RSP: 0018:ffff93e64167bbe0 EFLAGS: 00010286
<4>[ 2189.388960] 7b 
<4>[ 2189.388961] RAX: 0000000000000000 RBX: ffff93e64167bc18 RCX: 0000000000000000
<4>[ 2189.388963] RDX: ffff8d014dc1c540 RSI: ffff8d014dc15618 RDI: ffff8d014dc15618
<4>[ 2189.388963] bd 
<4>[ 2189.388965] RBP: ffffffffbd0d1570 R08: 00000000000003e6 R09: 0000000000000004
<4>[ 2189.388966] R10: ffff8d01449ee000 R11: 0000000000000001 R12: ffff93e64167bc94
<4>[ 2189.388967] 5d e9 
<4>[ 2189.388969] R13: 0000000000000000 R14: 0000000000000000 R15: ffff8d00bac82000
<4>[ 2189.388970] FS:  00007559523e76c0(0000) GS:ffff8d014dc00000(0000) knlGS:0000000000000000
<4>[ 2189.388971] 2c 91 
<4>[ 2189.388973] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[ 2189.388975] CR2: 00007e917c21a000 CR3: 00000002431c8002 CR4: 00000000003606f0
<4>[ 2189.388976] 8d 
<4>[ 2189.388977] Call Trace:
<4>[ 2189.388977] ff 48 89 ef 
<4>[ 2189.388983]  sock_alloc_send_pskb+0x20c/0x230
<4>[ 2189.388984] e8 24 73 
<4>[ 2189.388987]  unix_stream_sendmsg+0x2bf/0x3d0
<4>[ 2189.388988] 9f ff eb be 
<4>[ 2189.388993]  sock_sendmsg+0x31/0x40
<4>[ 2189.388994] 0f 
<4>[ 2189.388996]  sock_write_iter+0x88/0xf0
<4>[ 2189.388997] 0b 48 83 
<4>[ 2189.389002]  do_iter_readv_writev+0x147/0x1a0
<4>[ 2189.389005] 7b 
<4>[ 2189.389007]  do_iter_write+0x81/0x1a0
<4>[ 2189.389008] 70 
<4>[ 2189.389010]  vfs_writev+0xd1/0x160
<4>[ 2189.389011] 00 74 
Oops#1 Part4
<4>[ 2189.389013]  ? __sys_recvmsg+0x71/0xb0
<4>[ 2189.389016]  ? __fget+0x6f/0xb0
<4>[ 2189.389016] 8b <0f> 0b 
<4>[ 2189.389019]  ? do_writev+0x5c/0xf0
<4>[ 2189.389021]  do_writev+0x5c/0xf0
<4>[ 2189.389022] 48 83 bb 
<4>[ 2189.389025]  do_syscall_64+0x72/0x1c0
<4>[ 2189.389026] 60 
<4>[ 2189.389029]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
<4>[ 2189.389029] 02 00 
<4>[ 2189.389031] RIP: 0033:0x75594f801017
<4>[ 2189.389033] RSP: 002b:00007ffdf628f130 EFLAGS: 00000293
<4>[ 2189.389033] 00 
<4>[ 2189.389034]  ORIG_RAX: 0000000000000014
<4>[ 2189.389035] 00 
<4>[ 2189.389037] RAX: ffffffffffffffda RBX: 000000000000004b RCX: 000075594f801017
<4>[ 2189.389037] 74 
<4>[ 2189.389039] RDX: 0000000000000001 RSI: 00007ffdf628f430 RDI: 000000000000004b
<4>[ 2189.389039] 89 
<4>[ 2189.389041] RBP: 00007ffdf628f430 R08: 0000000000000000 R09: 0000616b576c78e0
<4>[ 2189.389041] 0f 
<4>[ 2189.389043] R10: 0000000000000001 R11: 0000000000000293 R12: 0000000000000001
<4>[ 2189.389043] 0b 
<4>[ 2189.389045] R13: 00007ffdf628f430 R14: 0000000000000020 R15: 0000616b57691c80
<4>[ 2189.389046] eb 85 
<4>[ 2189.389047] Code: 
<4>[ 2189.389048] 48 
<4>[ 2189.389049] 00 00 
<4>[ 2189.389050] 89 
<4>[ 2189.389051] 00 
<4>[ 2189.389052] de 
<4>[ 2189.389053] 00 
<4>[ 2189.389054] 5b 
<4>[ 2189.389055] 00 
<4>[ 2189.389055] 48 
<4>[ 2189.389057] e8 8b 
<4>[ 2189.389058] ---[ end trace b06d93f176d25118 ]---
<4>[ 2189.389059] ff ff ff 84 c0 74 01 c3 80 3d 76 83 f4 00 00 75 f6 48 c7 c7 58 a5 fd bd c6 05 66 83 f4 00 01 e8 ea 13 d4 ff <0f> 0b c3 0f 1f 80 00 00 00 00 8b 
<4>[ 2189.389090] WARNING: CPU: 3 PID: 30 at /home/corsac/projets/linux/linux/net/unix/af_unix.c:492 unix_sock_destructor+0xa3/0xc0
<4>[ 2189.389090] Modules linked in:
<4>[ 2189.389091] 07 
<4>[ 2189.389092]  esp4
<4>[ 2189.389093] 8d 50 
Oops#1 Part3
<4>[ 2189.389094]  xfrm6_mode_tunnel
<4>[ 2189.389095] 01 85 
<4>[ 2189.389097]  xfrm4_mode_tunnel
<4>[ 2189.389097] c0 74 
<4>[ 2189.389099]  ipheth
<4>[ 2189.389100] 35 85 
<4>[ 2189.389101]  bnep rtsx_pci_sdmmc
<4>[ 2189.389103] d2 
<4>[ 2189.389104]  iwlmvm
<4>[ 2189.389106] ---[ end trace b06d93f176d25119 ]---
<4>[ 2189.389106]  snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi iwlwifi snd_hda_intel rtsx_pci snd_hda_codec snd_hwdep snd_hda_core snd_pcm thinkpad_acpi efivarfs input_leds
<4>[ 2189.389116] CPU: 3 PID: 30 Comm: ksoftirqd/3 Tainted: G        W       T 4.17.0 #22
<4>[ 2189.389117] Hardware name: LENOVO 20CMCTO1WW/20CMCTO1WW, BIOS N10ET48W (1.27 ) 09/12/2017
<4>[ 2189.389119] RIP: 0010:unix_sock_destructor+0xa3/0xc0
<4>[ 2189.389121] RSP: 0000:ffff93e640dabda8 EFLAGS: 00010286
<4>[ 2189.389123] RAX: 0000000000000000 RBX: ffff8d00bac82000 RCX: 0000000000000000
<4>[ 2189.389124] RDX: 0000000000000001 RSI: 0000000000000282 RDI: 0000000000000282
<4>[ 2189.389126] RBP: ffff8d00bac82000 R08: ffff8d0084796430 R09: ffff93e640dabd50
<4>[ 2189.389127] R10: 0000000000000000 R11: 0000000000000001 R12: ffff8d014dda1100
<4>[ 2189.389128] R13: 0000000000000000 R14: 909c6c76983bd4d2 R15: 13146644b16dc153
<4>[ 2189.389130] FS:  0000000000000000(0000) GS:ffff8d014dd80000(0000) knlGS:0000000000000000
<4>[ 2189.389131] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[ 2189.389133] CR2: 0000755942c8c000 CR3: 000000006620a004 CR4: 00000000003606e0
<4>[ 2189.389134] Call Trace:
<4>[ 2189.389138]  __sk_destruct+0x1f/0x140
<4>[ 2189.389141]  unix_destruct_scm+0x7b/0xa0
<4>[ 2189.389145]  skb_release_head_state+0x59/0x90
<4>[ 2189.389148]  skb_release_all+0x9/0x20
<4>[ 2189.389150]  __kfree_skb_defer+0x19/0x50
<4>[ 2189.389153]  net_tx_action+0xf0/0x2d0
Oops#1 Part2
<4>[ 2189.389155]  __do_softirq+0xdb/0x220
<4>[ 2189.389159]  ? sort_range+0x20/0x20
<4>[ 2189.389163]  run_ksoftirqd+0x1f/0x30
<4>[ 2189.389165]  smpboot_thread_fn+0x11f/0x1e0
<4>[ 2189.389168]  kthread+0x109/0x120
<4>[ 2189.389171]  ? kthread_create_worker_on_cpu+0x70/0x70
<4>[ 2189.389174]  ret_from_fork+0x35/0x40
<4>[ 2189.389176] Code: c7 c7 5c c3 7b bd 5d e9 2c 91 8d ff 48 89 ef e8 24 73 9f ff eb be 0f 0b 48 83 7b 70 00 74 8b 0f 0b 48 83 bb 60 02 00 00 00 74 89 <0f> 0b eb 85 48 89 de 5b 48 c7 c7 60 78 05 be 5d e9 31 74 92 ff 
<4>[ 2189.389222] ---[ end trace b06d93f176d2511a ]---
<6>[ 2189.389224] unix: Attempt to release alive unix socket: 000000003dc67149
<1>[ 2190.401779] BUG: unable to handle kernel paging request at ffffffffbd3e3680
<6>[ 2190.401785] PGD 6620c067 P4D 6620c067 PUD 6620d063 PMD 652000e1 
<4>[ 2190.401790] Oops: 0003 [#1] PREEMPT SMP PTI
<4>[ 2190.401792] Modules linked in: esp4 xfrm6_mode_tunnel xfrm4_mode_tunnel ipheth bnep rtsx_pci_sdmmc iwlmvm snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi iwlwifi snd_hda_intel rtsx_pci snd_hda_codec snd_hwdep snd_hda_core snd_pcm thinkpad_acpi efivarfs input_leds
<4>[ 2190.401806] CPU: 2 PID: 4119 Comm: evolution Tainted: G        W       T 4.17.0 #22
<4>[ 2190.401808] Hardware name: LENOVO 20CMCTO1WW/20CMCTO1WW, BIOS N10ET48W (1.27 ) 09/12/2017
<4>[ 2190.401813] RIP: 0010:queued_spin_lock_slowpath+0xe4/0x1a0
<4>[ 2190.401815] RSP: 0018:ffff93e642373c60 EFLAGS: 00010282
<4>[ 2190.401817] RAX: ffffffffbd3e3680 RBX: 0000000000000008 RCX: ffff8d014dd20880
<4>[ 2190.401818] RDX: 0000000000001084 RSI: 0000000042161000 RDI: ffff8d00bac82340
<4>[ 2190.401820] RBP: ffff8d00bac82340 R08: 00000000000c0000 R09: ffff8d0142385000
<4>[ 2190.401821] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8d0142385000
Oops#1 Part1
<4>[ 2190.401823] R13: 0000000000000000 R14: ffff8d00bac82000 R15: ffff8d0144916800
<4>[ 2190.401825] FS:  0000746978d9bf80(0000) GS:ffff8d014dd00000(0000) knlGS:0000000000000000
<4>[ 2190.401827] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[ 2190.401829] CR2: ffffffffbd3e3680 CR3: 00000001844d6005 CR4: 00000000003606e0
<4>[ 2190.401830] Call Trace:
<4>[ 2190.401835]  unix_stream_sendmsg+0x1df/0x3d0
<4>[ 2190.401839]  sock_sendmsg+0x31/0x40
<4>[ 2190.401842]  sock_write_iter+0x88/0xf0
<4>[ 2190.401846]  do_iter_readv_writev+0x147/0x1a0
<4>[ 2190.401848]  do_iter_write+0x81/0x1a0
<4>[ 2190.401851]  vfs_writev+0xd1/0x160
<4>[ 2190.401854]  ? __fget+0x6f/0xb0
<4>[ 2190.401857]  ? do_writev+0x5c/0xf0
<4>[ 2190.401859]  do_writev+0x5c/0xf0
<4>[ 2190.401863]  do_syscall_64+0x72/0x1c0
<4>[ 2190.401866]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
<4>[ 2190.401868] RIP: 0033:0x746973a7c017
<4>[ 2190.401870] RSP: 002b:00007ffd99341b60 EFLAGS: 00000293 ORIG_RAX: 0000000000000014
<4>[ 2190.401872] RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 0000746973a7c017
<4>[ 2190.401873] RDX: 0000000000000003 RSI: 00007ffd99341d10 RDI: 0000000000000004
<4>[ 2190.401875] RBP: 00007ffd99341d10 R08: 0000000000000000 R09: 0000000000000000
<4>[ 2190.401876] R10: 00000000000000b0 R11: 0000000000000293 R12: 0000000000000003
<4>[ 2190.401878] R13: 00007ffd99341c94 R14: 000060ed17d42a98 R15: 00007ffd99341c98
<4>[ 2190.401880] Code: 89 c2 c1 e2 10 85 d2 0f 84 cc 00 00 00 c1 ea 12 83 e0 03 83 ea 01 48 c1 e0 04 48 63 d2 48 05 80 08 02 00 48 03 04 d5 80 54 06 be <48> 89 08 8b 41 08 85 c0 75 09 f3 90 8b 41 08 85 c0 74 f7 4c 8b 
<1>[ 2190.401909] RIP: queued_spin_lock_slowpath+0xe4/0x1a0 RSP: ffff93e642373c60
<4>[ 2190.401910] CR2: ffffffffbd3e3680
<4>[ 2190.401912] ---[ end trace b06d93f176d2511b ]---

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply

* Re: INFO: task hung in ip6gre_exit_batch_net
From: Kirill Tkhai @ 2018-06-05  9:03 UTC (permalink / raw)
  To: Dmitry Vyukov, syzbot
  Cc: Christian Brauner, David Miller, David Ahern, Florian Westphal,
	Jiri Benc, LKML, Xin Long, mschiffer, netdev, syzkaller-bugs,
	Vladislav Yasevich
In-Reply-To: <CACT4Y+ZmxgKvgD2RWD1=KEZQqY=vwdhrAZfqMT-gGuj8k8eKKg@mail.gmail.com>

Hi, Dmitry!

On 04.06.2018 18:22, Dmitry Vyukov wrote:
> On Mon, Jun 4, 2018 at 5:03 PM, syzbot
> <syzbot+bf78a74f82c1cf19069e@syzkaller.appspotmail.com> wrote:
>> Hello,
>>
>> syzbot found the following crash on:
>>
>> HEAD commit:    bc2dbc5420e8 Merge branch 'akpm' (patches from Andrew)
>> git tree:       upstream
>> console output: https://syzkaller.appspot.com/x/log.txt?x=164e42b7800000
>> kernel config:  https://syzkaller.appspot.com/x/.config?x=982e2df1b9e60b02
>> dashboard link: https://syzkaller.appspot.com/bug?extid=bf78a74f82c1cf19069e
>> compiler:       gcc (GCC) 8.0.1 20180413 (experimental)
>>
>> Unfortunately, I don't have any reproducer for this crash yet.
>>
>> IMPORTANT: if you fix the bug, please add the following tag to the commit:
>> Reported-by: syzbot+bf78a74f82c1cf19069e@syzkaller.appspotmail.com
> 
> Another hang on rtnl lock:
> 
> #syz dup: INFO: task hung in netdev_run_todo
> 
> May be related to "unregister_netdevice: waiting for DEV to become free":
> https://syzkaller.appspot.com/bug?id=1a97a5bd119fd97995f752819fd87840ab9479a9
> 
> Any other explanations for massive hangs on rtnl lock for minutes?

To rule out the situation where a task exits with rtnl_mutex held:

would the pr_warn() from print_held_locks_bug() be included in the console output
if they appear?

Kirill

^ permalink raw reply

* Re: [PATCH RFC ipsec-next 0/3] Virtual xfrm interfaces
From: Steffen Klassert @ 2018-06-05  9:07 UTC (permalink / raw)
  To: David Miller
  Cc: netdev, eyal.birger, antony, benedictwong, lorenzo,
	shannon.nelson
In-Reply-To: <20180604.085822.1374283424905473397.davem@davemloft.net>

On Mon, Jun 04, 2018 at 08:58:22AM -0400, David Miller wrote:
> From: Steffen Klassert <steffen.klassert@secunet.com>
> Date: Mon, 4 Jun 2018 08:09:07 +0200
> 
> > This patchset introduces new virtual xfrm interfaces.
> > The design of virtual xfrm interfaces was
> > discussed at the Linux IPsec workshop 2018. This patchset
> > implements these interfaces as the IPsec userspace and
> > kernel developers agreed. The purpose of these interfaces
> > is to overcome the design limitations that the existing
> > VTI devices have.
> > 
> > We had two presentations about xfrm interfaces at
> > the workshop. Slides with further information
> > can be found at the workshop homepage:
> > 
> > https://workshop.linux-ipsec.org/2018/
> 
> First off, you will have to describe in detail what the VTI
> limitations are and how these new devices overcome them in this commit
> message.

Ok, no problem. Sorry for not doing this in the initial mail.

The main limitations that we see with the current VTI are the following:

- VTI interfaces are L3 tunnels with configurable endpoints.
  For xfrm, the tunnel endpoints are already determined by the SA.
  So the VTI tunnel endpoints must be either the same as on the
  SA or wildcards. In case VTI tunnel endpoints are the same as on
  the SA, we get a one to one correlation between the SA and
  the tunnel. So each SA needs its own tunnel interface.

  On the other hand, we can have only one VTI tunnel with
  wildcard src/dst tunnel endpoints in the system because the
  lookup is based on the tunnel endpoints. The existing tunnel
  lookup won't work with multiple tunnels with wildcard
  tunnel endpoints. Some usecases require more than one
  VTI tunnel of this type, for example if somebody has multiple
  namespaces and every namespace requires such a VTI.

- VTI needs separate interfaces for IPv4 and IPv6 tunnels.
  So when routing to a VTI, we have to know into which address
  family this traffic class is going to be encapsulated.
  This is a limitation because it makes routing more complex
  and it is not always possible to know what happens behind the
  VTI, e.g. when the VTI is moved to some namespace.

- VTI works just with tunnel mode SAs. We need generic interfaces
  that ensure transformation, regardless of the xfrm mode and
  the encapsulated address family.

- VTI is configured with a combination of GRE keys and xfrm marks.
  With this we have to deal with some extra cases in the generic
  tunnel lookup because the GRE keys on the VTI are actually
  not GRE keys, the GRE keys were just reused for something else.
  All extensions to the VTI interfaces would require adding
  even more complexity to the generic tunnel lookup.

So to overcome this, we started with the following design goals:

- It should be possible to tunnel IPv4 and IPv6 through the same
  interface.

- No limitation on xfrm mode (tunnel, transport and beet).

- Should be a generic virtual interface that ensures IPsec
  transformation, no need to know what happens behind the
  interface.

- Interfaces should be configured with a new key that must match a
  new policy/SA lookup key.

- The lookup logic should stay in the xfrm codebase, no need to
  change or extend generic routing and tunnel lookups.

- Should be possible to use IPsec hardware offloads of the underlying
  interface.

> 
> You can't just say "we discussed this over there, go take a look".
> 
> The place people "take a look" is your text here.
> 
> Second, since you didn't explain things, I have to ask.  Why is a new
> special ID even necessary?  It makes the flowi bigger, and adds all of
> this new logic.
> 
> All netdevs have an ifindex and you should be able to find a way to
> use the ifindex of these new devices in the key somehow.

The ifindex is already part of the xfrm selector and used as a key
to enforce the output device after the transformation in the
policy/state lookup. So we need some other identifier as the
ifindex is already in use for xfrm lookups.

We tried to keep the lookup logic in xfrm as much as possible,
the only thing that we currently need is a field in the flowi
for this to pass the information down to the lookups.

^ permalink raw reply

* Re: [PATCH bpf-next v2 2/2] samples/bpf: Add xdp_sample_pkts example
From: Toke Høiland-Jørgensen @ 2018-06-05  9:20 UTC (permalink / raw)
  To: Daniel Borkmann, Jakub Kicinski; +Cc: netdev
In-Reply-To: <1ef5fdfd-3225-89e2-c6b5-d3fe2573d655@iogearbox.net>

Daniel Borkmann <daniel@iogearbox.net> writes:

> On 06/05/2018 12:32 AM, Jakub Kicinski wrote:
>> On Mon, 04 Jun 2018 18:33:56 +0200, Toke Høiland-Jørgensen wrote:
>>> +	if (load_bpf_file(filename)) {
>> 
>> Would you mind using libbpf instead of bpf_load.o?  I converted some
>> samples in be5bca44aa6b ("samples: bpf: convert some XDP samples from
>> bpf_load to libbpf"), it's pretty straight forward.  Maybe we can kill
>> bpf_load.o one day :)
>
> Agreed, we should only be using libbpf going forward.

Right, I'll rework to load using libbpf instead.

^ permalink raw reply

* Re: [PATCH bpf-next v2 1/2] trace_helpers.c: Add helpers to poll multiple perf FDs for events
From: Toke Høiland-Jørgensen @ 2018-06-05  9:22 UTC (permalink / raw)
  To: Daniel Borkmann, Jakub Kicinski; +Cc: netdev, Alexei Starovoitov
In-Reply-To: <2626942d-664b-f569-1926-d5a8ac5314b9@iogearbox.net>

Daniel Borkmann <daniel@iogearbox.net> writes:

> On 06/05/2018 12:26 AM, Jakub Kicinski wrote:
>> On Mon, 04 Jun 2018 18:33:56 +0200, Toke Høiland-Jørgensen wrote:
>>> This adds two new helper functions to trace_helpers that support polling
>>> multiple perf file descriptors for events. These are used by the XDP
>>> perf_event_output example, which needs to work with one perf fd per CPU.
>>>
>>> Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
>> 
>> Did you take a look at tools/bpf/bpftool/map_perf_ring.c ?
>> 
>> I think the ability to poll multiple FDs could be generally useful and
>> therefore better add it to libbpf.c than
>> tools/testing/selftests/bpf/trace_helpers.c?  I'm not 100% sure myself...
>
> I think for it to land in libbpf this code needs to be more generalized
> than it is right now and allow for more flexibility like pinning RB
> processing threads to CPUs, poll handling, etc.

Hmm, so how about we leave it in trace_helpers.c for now? I don't
necessarily mind working on an addition to libbpf at some point, but
don't have time to do so short term...
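
For context, the core of such a helper is little more than poll(2) over one descriptor per CPU. A minimal sketch, assuming the caller has already opened the FDs; the function name, the callback type and the example callback are made up for illustration and are not the ones in the patch:

```c
#include <assert.h>
#include <poll.h>
#include <stdlib.h>
#include <unistd.h>

typedef void (*fd_ready_fn)(int fd, void *ctx);

/* Wait up to timeout_ms for any of the nfds descriptors to become readable
 * and call on_ready() for each one that is. Returns the number of readable
 * descriptors handled, 0 on timeout, or -1 on error. */
static int poll_fds_once(const int *fds, int nfds, int timeout_ms,
			 fd_ready_fn on_ready, void *ctx)
{
	struct pollfd *pfds;
	int i, ready, handled = 0;

	assert(fds != NULL && nfds > 0);
	pfds = calloc(nfds, sizeof(*pfds));
	if (!pfds)
		return -1;
	for (i = 0; i < nfds; i++) {
		pfds[i].fd = fds[i];
		pfds[i].events = POLLIN;
	}
	ready = poll(pfds, nfds, timeout_ms);
	for (i = 0; ready > 0 && i < nfds; i++) {
		if (pfds[i].revents & POLLIN) {
			on_ready(pfds[i].fd, ctx);
			handled++;
		}
	}
	free(pfds);
	return ready < 0 ? -1 : handled;
}

/* Example callback: consume one byte and count the wake-up in *ctx. */
static void count_and_drain(int fd, void *ctx)
{
	char byte;

	if (read(fd, &byte, 1) == 1)
		++*(int *)ctx;
}
```

A real perf reader would walk the mmap'ed ring buffer in the callback instead of calling read(); a more general version would also let the caller keep the pollfd array between calls.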

^ permalink raw reply

* Re: [PATCH bpf-next v2 1/2] trace_helpers.c: Add helpers to poll multiple perf FDs for events
From: Daniel Borkmann @ 2018-06-05  9:27 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen, Jakub Kicinski
  Cc: netdev, Alexei Starovoitov
In-Reply-To: <87k1rdfv0m.fsf@toke.dk>

On 06/05/2018 11:22 AM, Toke Høiland-Jørgensen wrote:
> Daniel Borkmann <daniel@iogearbox.net> writes:
>> On 06/05/2018 12:26 AM, Jakub Kicinski wrote:
>>> On Mon, 04 Jun 2018 18:33:56 +0200, Toke Høiland-Jørgensen wrote:
>>>> This adds two new helper functions to trace_helpers that support polling
>>>> multiple perf file descriptors for events. These are used by the XDP
>>>> perf_event_output example, which needs to work with one perf fd per CPU.
>>>>
>>>> Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
>>>
>>> Did you take a look at tools/bpf/bpftool/map_perf_ring.c ?
>>>
>>> I think the ability to poll multiple FDs could be generally useful and
>>> therefore better add it to libbpf.c than
>>> tools/testing/selftests/bpf/trace_helpers.c?  I'm not 100% sure myself...
>>
>> I think for it to land in libbpf this code needs to be more generalized
>> than it is right now and allow for more flexibility like pinning RB
>> processing threads to CPUs, poll handling, etc.
> 
> Hmm, so how about we leave it in trace_helpers.c for now? I don't
> necessarily mind working on an addition to libbpf at some point, but
> don't have time to do so short term...

That's okay with me as well.

^ permalink raw reply

* Re: INFO: task hung in ip6gre_exit_batch_net
From: Dmitry Vyukov @ 2018-06-05  9:36 UTC (permalink / raw)
  To: Kirill Tkhai
  Cc: syzbot, Christian Brauner, David Miller, David Ahern,
	Florian Westphal, Jiri Benc, LKML, Xin Long, mschiffer, netdev,
	syzkaller-bugs, Vladislav Yasevich
In-Reply-To: <cf729ce5-7948-d475-1b20-f31fb313df88@virtuozzo.com>

On Tue, Jun 5, 2018 at 11:03 AM, Kirill Tkhai <ktkhai@virtuozzo.com> wrote:
> Hi, Dmitry!
>
> On 04.06.2018 18:22, Dmitry Vyukov wrote:
>> On Mon, Jun 4, 2018 at 5:03 PM, syzbot
>> <syzbot+bf78a74f82c1cf19069e@syzkaller.appspotmail.com> wrote:
>>> Hello,
>>>
>>> syzbot found the following crash on:
>>>
>>> HEAD commit:    bc2dbc5420e8 Merge branch 'akpm' (patches from Andrew)
>>> git tree:       upstream
>>> console output: https://syzkaller.appspot.com/x/log.txt?x=164e42b7800000
>>> kernel config:  https://syzkaller.appspot.com/x/.config?x=982e2df1b9e60b02
>>> dashboard link: https://syzkaller.appspot.com/bug?extid=bf78a74f82c1cf19069e
>>> compiler:       gcc (GCC) 8.0.1 20180413 (experimental)
>>>
>>> Unfortunately, I don't have any reproducer for this crash yet.
>>>
>>> IMPORTANT: if you fix the bug, please add the following tag to the commit:
>>> Reported-by: syzbot+bf78a74f82c1cf19069e@syzkaller.appspotmail.com
>>
>> Another hang on rtnl lock:
>>
>> #syz dup: INFO: task hung in netdev_run_todo
>>
>> May be related to "unregister_netdevice: waiting for DEV to become free":
>> https://syzkaller.appspot.com/bug?id=1a97a5bd119fd97995f752819fd87840ab9479a9

netdev_wait_allrefs does not hold rtnl lock during waiting, so it must
be something different.


>> Any other explanations for massive hangs on rtnl lock for minutes?
>
> To rule out the situation where a task exits with rtnl_mutex held:
>
> would the pr_warn() from print_held_locks_bug() be included in the console output
> if they appear?

Yes, everything containing "WARNING:" is detected as bug.

^ permalink raw reply

* Re: Qualcomm rmnet driver and qmi_wwan
From: Daniele Palmas @ 2018-06-05  9:38 UTC (permalink / raw)
  To: Subash Abhinov Kasiviswanathan; +Cc: netdev
In-Reply-To: <2a96c941b6699827ea6ac1ce3603fae5@codeaurora.org>

Hi,

2018-02-21 20:47 GMT+01:00 Subash Abhinov Kasiviswanathan
<subashab@codeaurora.org>:
> On 2018-02-21 04:38, Daniele Palmas wrote:
>>
>> Hello,
>>
>> in rmnet kernel documentation I read:
>>
>> "This driver can be used to register onto any physical network device in
>> IP mode. Physical transports include USB, HSIC, PCIe and IP accelerator."
>>
>> Does this mean that it can be used in association with the qmi_wwan
>> driver?
>>
>> If yes, can someone give me a hint on the steps to follow?
>>
>> If not, does anyone know if it is possible to modify qmi_wwan in order
>> to take advantage of the features provided by the rmnet driver?
>>
>> In this case, hints on the changes needed in qmi_wwan are welcome.
>>
>> Thanks in advance,
>> Daniele
>
>
> Hi
>
> I haven't used qmi_wwan, so the following comment is based on code inspection.
> qmimux_register_device() creates qmimux devices with the usb net device as
> real_dev. The multiplexing and aggregation header (qmimux_hdr) is stripped
> off in qmimux_rx_fixup() and the packet is passed on to the stack.
>
> You could instead create rmnet devices with the usb netdevice as real_dev.
> The packets from the usb net driver can be queued to the network stack
> directly, as the rmnet driver will set up an RX handler. The rmnet driver
> will process the packets further and then queue them to the network stack.
>
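
For illustration, the header stripping described above can be sketched in
userspace C. This is a minimal sketch under stated assumptions, not the
driver code: it assumes a 4-byte header (pad byte, mux_id byte, big-endian
16-bit payload length) in front of every aggregated packet, and the names
struct mux_packet and demux_frame() are hypothetical. The real receive
path handles cases that are ignored here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed header layout: pad (1 byte), mux_id (1 byte), pkt_len (2 bytes, big endian). */
#define MUX_HDR_LEN 4u

/* Hypothetical descriptor for one de-aggregated packet. */
struct mux_packet {
	uint8_t mux_id;
	const uint8_t *payload;
	size_t len;
};

/*
 * Walk an aggregated buffer, strip each header and record the payload.
 * Returns the number of packets found, or -1 if the buffer is truncated
 * or holds more packets than out[] has room for.
 */
static int demux_frame(const uint8_t *buf, size_t buf_len,
		       struct mux_packet *out, size_t max_out)
{
	size_t off = 0;
	int count = 0;

	assert(buf != NULL && out != NULL);

	while (off < buf_len) {
		size_t pkt_len;

		if (buf_len - off < MUX_HDR_LEN)
			return -1;	/* not enough bytes left for a header */

		pkt_len = ((size_t)buf[off + 2] << 8) | buf[off + 3];

		if (pkt_len > buf_len - off - MUX_HDR_LEN)
			return -1;	/* payload runs past the end of the buffer */
		if ((size_t)count == max_out)
			return -1;	/* caller's array is full */

		out[count].mux_id = buf[off + 1];
		out[count].payload = buf + off + MUX_HDR_LEN;
		out[count].len = pkt_len;
		count++;

		off += MUX_HDR_LEN + pkt_len;
	}

	return count;
}
```

Each entry in out[] points into the original buffer, so no payload bytes
are copied; the mux_id tells the caller which logical device the packet
belongs to.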

In the kernel documentation I read that rmnet user space configuration is
done through librmnetctl, available at

https://source.codeaurora.org/quic/la/platform/vendor/qcom-opensource/dataservices/tree/rmnetctl

However, it seems to me that this is a bit outdated (e.g. it does not
build properly, since it looks for the kernel header linux/rmnet_data.h
which, as far as I understand, is no longer present).

Is a more recent version of the tool available?

Thanks,
Daniele

> --
> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
> a Linux Foundation Collaborative Project

^ permalink raw reply

* [PATCH net-next] net: fddi: fix a possible null-ptr-deref
From: YueHaibing @ 2018-06-05 10:06 UTC (permalink / raw)
  To: davem; +Cc: netdev, linux-kernel, YueHaibing

bp->SharedMemAddr is set to NULL when bp->SharedMemSize is not greater
than 0, so the following memset() triggers a null-ptr-deref.

Fix it by using pci_zalloc_consistent() when bp->SharedMemSize is greater
than 0 and dropping the memset().

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
---
 drivers/net/fddi/skfp/skfddi.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/drivers/net/fddi/skfp/skfddi.c b/drivers/net/fddi/skfp/skfddi.c
index 2414f1d..90164e0 100644
--- a/drivers/net/fddi/skfp/skfddi.c
+++ b/drivers/net/fddi/skfp/skfddi.c
@@ -407,23 +407,21 @@ static  int skfp_driver_init(struct net_device *dev)
 	if (bp->SharedMemSize > 0) {
 		bp->SharedMemSize += 16;	// for descriptor alignment
 
-		bp->SharedMemAddr = pci_alloc_consistent(&bp->pdev,
-							 bp->SharedMemSize,
-							 &bp->SharedMemDMA);
+		bp->SharedMemAddr = pci_zalloc_consistent(&bp->pdev,
+							  bp->SharedMemSize,
+							  &bp->SharedMemDMA);
 		if (!bp->SharedMemAddr) {
 			printk("could not allocate mem for ");
 			printk("hardware module: %ld byte\n",
 			       bp->SharedMemSize);
 			goto fail;
 		}
-		bp->SharedMemHeap = 0;	// Nothing used yet.
 
 	} else {
 		bp->SharedMemAddr = NULL;
-		bp->SharedMemHeap = 0;
 	}			// SharedMemSize > 0
 
-	memset(bp->SharedMemAddr, 0, bp->SharedMemSize);
+	bp->SharedMemHeap = 0;
 
 	card_stop(smc);		// Reset adapter.
 
-- 
2.7.0
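
The pattern the patch moves to can be sketched in userspace C, with
calloc() standing in for pci_zalloc_consistent(). This is a minimal
illustration, not the driver code: struct shared_mem, shared_mem_init()
and shared_mem_release() are hypothetical names, and the 16-byte alignment
padding from the driver is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the shared-memory fields of the driver state. */
struct shared_mem {
	void *addr;
	size_t size;
	size_t heap_used;
};

/*
 * Allocate zeroed memory only when size > 0. Because the allocator
 * already returns zeroed memory, there is no later memset() that could
 * be handed a NULL pointer. Returns 0 on success, -1 on failure.
 */
static int shared_mem_init(struct shared_mem *sm, size_t size)
{
	assert(sm != NULL);

	sm->addr = NULL;
	sm->size = size;
	sm->heap_used = 0;	/* nothing used yet, in both branches */

	if (size > 0) {
		sm->addr = calloc(1, size);
		if (sm->addr == NULL)
			return -1;
	}

	return 0;
}

/* Release the region; free(NULL) is a no-op, so the empty case is safe. */
static void shared_mem_release(struct shared_mem *sm)
{
	assert(sm != NULL);

	free(sm->addr);
	sm->addr = NULL;
}
```

With a size of 0 the address stays NULL and is never written to, which is
the case the original memset() did not account for.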

^ permalink raw reply related

