Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH net-next 2/2 v9] net: ethernet: Add a driver for Gemini gigabit ethernet
From: Michał Mirosław @ 2017-12-18 14:48 UTC (permalink / raw)
  To: Linus Walleij
  Cc: netdev, David S . Miller, Janos Laube, Paulius Zaleckas,
	Linux ARM, Hans Ulli Kroll, Florian Fainelli, Tobias Waldvogel
In-Reply-To: <CACRpkdZ3t9ZDB-GOoriq0Jm=-GtYzUX-qc36o_XbWO9NQzMUmA@mail.gmail.com>

On Mon, Dec 18, 2017 at 02:57:37PM +0100, Linus Walleij wrote:
> On Sat, Dec 16, 2017 at 8:39 PM, Linus Walleij <linus.walleij@linaro.org> wrote:
> 
> > The Gemini ethernet has been around for years as an out-of-tree
> > patch used with the NAS boxen and routers built on StorLink
> > SL3512 and SL3516, later Storm Semiconductor, later Cortina
> > Systems. These ASICs are still being deployed and brand new
> > off-the-shelf systems using it can easily be acquired.
[...]
> > ---
> > Changes from v8:
> > - Remove dependency guards in Kconfig to get a wider compile
> >   coverage for the driver to detect broken APIs etc.
> 
> I guess we need to hold this off for a while, the code does
> some weird stuff using the ARM-internal page DMA mapping
> API.
> 
> I *think* what happens is that the driver allocates a global queue
> used for RX and TX on both interfaces, then initializes that with
> page pointers and gives that to the hardware to play with.
> 
> When an RX packet comes in, the RX routine needs to figure
> out from the DMA (physical) address which remapped
> page/address this random physical address pointer
> corresponds to.
> 
> The Linux DMA API assumption is that the driver keeps track
> of this mapping, not the hardware. So we need to figure out
> a way to reverse-map this. Preferably quickly, and without
> using any ARM-internal mapping APIs.

IIRC, the hardware copies descriptors from free queue (FREEQ)
to RX queues. FREEQ is shared among the two ethernet ports.

This platform is CPU bound, so every additional lookup will
hit performance here. In my version I had an #ifdef for
COMPILE_TEST that replaced ARM-specific calls with stubs.
Since the driver is not expected to work on other platforms,
this seemed like the best workaround to make it compile
on other arches.

Best Regards,
Michał Mirosław

^ permalink raw reply

* Re: [PATCH net-next 2/2 v9] net: ethernet: Add a driver for Gemini gigabit ethernet
From: Russell King - ARM Linux @ 2017-12-18 14:54 UTC (permalink / raw)
  To: Michał Mirosław
  Cc: Linus Walleij, Tobias Waldvogel, Florian Fainelli,
	Paulius Zaleckas, netdev, Hans Ulli Kroll, Janos Laube,
	David S . Miller, Linux ARM
In-Reply-To: <20171218144817.GA25352@qmqm.qmqm.pl>

On Mon, Dec 18, 2017 at 03:48:17PM +0100, Michał Mirosław wrote:
> On Mon, Dec 18, 2017 at 02:57:37PM +0100, Linus Walleij wrote:
> > On Sat, Dec 16, 2017 at 8:39 PM, Linus Walleij <linus.walleij@linaro.org> wrote:
> > 
> > > The Gemini ethernet has been around for years as an out-of-tree
> > > patch used with the NAS boxen and routers built on StorLink
> > > SL3512 and SL3516, later Storm Semiconductor, later Cortina
> > > Systems. These ASICs are still being deployed and brand new
> > > off-the-shelf systems using it can easily be acquired.
> [...]
> > > ---
> > > Changes from v8:
> > > - Remove dependency guards in Kconfig to get a wider compile
> > >   coverage for the driver to detect broken APIs etc.
> > 
> > I guess we need to hold this off for a while, the code does
> > some weird stuff using the ARM-internal page DMA mapping
> > API.
> > 
> > I *think* what happens is that the driver allocates a global queue
> > used for RX and TX on both interfaces, then initializes that with
> > page pointers and gives that to the hardware to play with.
> > 
> > When an RX packet comes in, the RX routine needs to figure
> > out from the DMA (physical) address which remapped
> > page/address this random physical address pointer
> > corresponds to.
> > 
> > The Linux DMA API assumption is that the driver keeps track
> > of this mapping, not the hardware. So we need to figure out
> > a way to reverse-map this. Preferably quickly, and without
> > using any ARM-internal mapping APIs.
> 
> IIRC, the hardware copies descriptors from free queue (FREEQ)
> to RX queues. FREEQ is shared among the two ethernet ports.
> 
> This platform is CPU bound, so every additional lookup will
> hit performance here. In my version I had an #ifdef for
> COMPILE_TEST that replaced ARM-specific calls with stubs.
> Since the driver is not expected to work on other platforms,
> this seemed like the best workaround to make it compile
> on other arches.

Really.  No.  Stop going beneath the covers and using ARM private
implementation APIs in drivers.

Take that as a big NAK to that.

(I don't seem have the patch in question here to look at though.)

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up
According to speedtest.net: 8.21Mbps down 510kbps up

^ permalink raw reply

* [PATCH][next] ath10k: wmi: remove redundant integer fc
From: Colin King @ 2017-12-18 15:02 UTC (permalink / raw)
  To: Kalle Valo, ath10k, linux-wireless, netdev; +Cc: kernel-janitors, linux-kernel

From: Colin Ian King <colin.king@canonical.com>

Variable fc is being assigned but never used, so remove it. Cleans
up the clang warning:

warning: Value stored to 'fc' is never read

Signed-off-by: Colin Ian King <colin.king@canonical.com>
---
 drivers/net/wireless/ath/ath10k/wmi-tlv.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/net/wireless/ath/ath10k/wmi-tlv.c b/drivers/net/wireless/ath/ath10k/wmi-tlv.c
index 8d53063bd503..06fde53aa679 100644
--- a/drivers/net/wireless/ath/ath10k/wmi-tlv.c
+++ b/drivers/net/wireless/ath/ath10k/wmi-tlv.c
@@ -2494,7 +2494,6 @@ ath10k_wmi_tlv_op_gen_mgmt_tx(struct ath10k *ar, struct sk_buff *msdu)
 	void *ptr;
 	int len;
 	u32 buf_len = msdu->len;
-	u16 fc;
 	struct ath10k_vif *arvif;
 	dma_addr_t mgmt_frame_dma;
 	u32 vdev_id;
@@ -2503,7 +2502,6 @@ ath10k_wmi_tlv_op_gen_mgmt_tx(struct ath10k *ar, struct sk_buff *msdu)
 		return ERR_PTR(-EINVAL);
 
 	hdr = (struct ieee80211_hdr *)msdu->data;
-	fc = le16_to_cpu(hdr->frame_control);
 	arvif = (void *)cb->vif->drv_priv;
 	vdev_id = arvif->vdev_id;
 
-- 
2.14.1


^ permalink raw reply related

* Re: [PATCH v10 3/5] bpf: add a bpf_override_function helper
From: Daniel Borkmann @ 2017-12-18 15:09 UTC (permalink / raw)
  To: Masami Hiramatsu, Josef Bacik
  Cc: rostedt, mingo, davem, netdev, linux-kernel, ast, kernel-team,
	linux-btrfs, darrick.wong, Josef Bacik
In-Reply-To: <20171218185100.4a8174488dfebf93d28899c3@kernel.org>

On 12/18/2017 10:51 AM, Masami Hiramatsu wrote:
> On Fri, 15 Dec 2017 14:12:54 -0500
> Josef Bacik <josef@toxicpanda.com> wrote:
>> From: Josef Bacik <jbacik@fb.com>
>>
>> Error injection is sloppy and very ad-hoc.  BPF could fill this niche
>> perfectly with it's kprobe functionality.  We could make sure errors are
>> only triggered in specific call chains that we care about with very
>> specific situations.  Accomplish this with the bpf_override_funciton
>> helper.  This will modify the probe'd callers return value to the
>> specified value and set the PC to an override function that simply
>> returns, bypassing the originally probed function.  This gives us a nice
>> clean way to implement systematic error injection for all of our code
>> paths.
> 
> OK, got it. I think the error_injectable function list should be defined
> in kernel/trace/bpf_trace.c because only bpf calls it and needs to care
> the "safeness".
> 
> [...]
>> diff --git a/arch/x86/kernel/kprobes/ftrace.c b/arch/x86/kernel/kprobes/ftrace.c
>> index 8dc0161cec8f..1ea748d682fd 100644
>> --- a/arch/x86/kernel/kprobes/ftrace.c
>> +++ b/arch/x86/kernel/kprobes/ftrace.c
>> @@ -97,3 +97,17 @@ int arch_prepare_kprobe_ftrace(struct kprobe *p)
>>  	p->ainsn.boostable = false;
>>  	return 0;
>>  }
>> +
>> +asmlinkage void override_func(void);
>> +asm(
>> +	".type override_func, @function\n"
>> +	"override_func:\n"
>> +	"	ret\n"
>> +	".size override_func, .-override_func\n"
>> +);
>> +
>> +void arch_ftrace_kprobe_override_function(struct pt_regs *regs)
>> +{
>> +	regs->ip = (unsigned long)&override_func;
>> +}
>> +NOKPROBE_SYMBOL(arch_ftrace_kprobe_override_function);
> 
> Calling this as "override_function" is meaningless. This is a function
> which just return. So I think combination of just_return_func() and
> arch_bpf_override_func_just_return() will be better.
> 
> Moreover, this arch/x86/kernel/kprobes/ftrace.c is an archtecture
> dependent implementation of kprobes, not bpf.

Josef, please work out any necessary cleanups that would still need
to be addressed based on Masami's feedback and send them as follow-up
patches, thanks.

> Hmm, arch/x86/net/bpf_jit_comp.c will be better place?

(No, it's JIT only and I'd really prefer to keep it that way, mixing
 this would result in a huge mess.)

^ permalink raw reply

* Re: [PATCH bpf-next 12/13] bpf: arm64: add JIT support for multi-function programs
From: Arnd Bergmann @ 2017-12-18 15:29 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: David S . Miller, Daniel Borkmann, John Fastabend, Edward Cree,
	Jakub Kicinski, Networking, kernel-team
In-Reply-To: <20171215015517.409513-13-ast@kernel.org>

On Fri, Dec 15, 2017 at 2:55 AM, Alexei Starovoitov <ast@kernel.org> wrote:


> +       if (jit_data->ctx.offset) {
> +               ctx = jit_data->ctx;
> +               image_ptr = jit_data->image;
> +               header = jit_data->header;
> +               extra_pass = true;
> +               goto skip_init_ctx;
> +       }
>         memset(&ctx, 0, sizeof(ctx));
>         ctx.prog = prog;

The 'goto' jumps over the 'image_size' initialization

>         prog->bpf_func = (void *)ctx.image;
>         prog->jited = 1;
>         prog->jited_len = image_size;

so we now get a warning here, starting with linux-next-20171218:

arch/arm64/net/bpf_jit_comp.c: In function 'bpf_int_jit_compile':
arch/arm64/net/bpf_jit_comp.c:982:18: error: 'image_size' may be used
uninitialized in this function [-Werror=maybe-uninitialized]

I could not figure out what the code should be doing instead, or if it is
indeed safe and the warning is a false-positive.

        Arnd

^ permalink raw reply

* Re: [PATCH v2 0/5] Support for generalized use of make C={1,2} via a wrapper program
From: Joe Perches @ 2017-12-18 15:30 UTC (permalink / raw)
  To: Knut Omang, Jason Gunthorpe
  Cc: Stephen Hemminger, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	Mauro Carvalho Chehab, Nicolas Palix, Jonathan Corbet,
	Santosh Shilimkar, Matthew Wilcox, cocci-/FJkirnvOdkvYVN+rsErww,
	rds-devel-N0ozoZBvEnrZJqsBc5GL+g,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-doc-u79uwXL29TY76Z2rM5mHXA, Doug Ledford,
	Mickaël Salaün, Shuah Khan,
	linux-kbuild-u79uwXL29TY76Z2rM5mHXA, Michal Marek, Julia Lawall,
	John Haxby, Åsmund Østvold, Masahiro Yamada
In-Reply-To: <1513602315.22938.49.camel-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>

On Mon, 2017-12-18 at 14:05 +0100, Knut Omang wrote:
> > Here is a list of the checkpatch messages for drivers/infiniband
> > sorted by type.
> > 
> > Many of these might be corrected by using
> > 
> > $ ./scripts/checkpatch.pl -f --fix-inplace --types=<TYPE> \
> >   $(git ls-files drivers/infiniband/)
> 
> Yes, and I already did that work piece by piece for individual types,
> just to test the runchecks tool, and want to post that set once the 
> runchecks script and Makefile changes itself are in,

I think those are independent of any runcheck tool changes and
could be posted now.  In general, don't keep patches in a local
tree waiting on some other unrelated patch.

Just fyi:

There is a script that helps automate checkpatch "by-type" conversions
with compilation, .o difference checking, and git commit editing.

https://lkml.org/lkml/2014/7/11/794

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* [PATCH net v2] net: bridge: fix early call to br_stp_change_bridge_id and plug newlink leaks
From: Nikolay Aleksandrov @ 2017-12-18 15:35 UTC (permalink / raw)
  To: netdev
  Cc: makita.toshiaki, roopa, davem, stephen, avagin, bridge, idosch,
	Nikolay Aleksandrov
In-Reply-To: <e4948ab7-0f3b-a4f5-52ea-853a987f1abb@cumulusnetworks.com>

The early call to br_stp_change_bridge_id in bridge's newlink can cause
a memory leak if an error occurs during the newlink because the fdb
entries are not cleaned up if a different lladdr was specified, also
another minor issue is that it generates fdb notifications with
ifindex = 0. Another unrelated memory leak is the bridge sysfs entries
which get added on NETDEV_REGISTER event, but are not cleaned up in the
newlink error path. To remove this special case the call to
br_stp_change_bridge_id is done after netdev register and we cleanup the
bridge on changelink error via br_dev_delete to plug all leaks.

This patch makes netlink bridge destruction on newlink error the same as
dellink and ioctl del which is necessary since at that point we have a
fully initialized bridge device.

To reproduce the issue:
$ ip l add br0 address 00:11:22:33:44:55 type bridge group_fwd_mask 1
RTNETLINK answers: Invalid argument

$ rmmod bridge
[ 1822.142525] =============================================================================
[ 1822.143640] BUG bridge_fdb_cache (Tainted: G           O    ): Objects remaining in bridge_fdb_cache on __kmem_cache_shutdown()
[ 1822.144821] -----------------------------------------------------------------------------

[ 1822.145990] Disabling lock debugging due to kernel taint
[ 1822.146732] INFO: Slab 0x0000000092a844b2 objects=32 used=2 fp=0x00000000fef011b0 flags=0x1ffff8000000100
[ 1822.147700] CPU: 2 PID: 13584 Comm: rmmod Tainted: G    B      O     4.15.0-rc2+ #87
[ 1822.148578] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
[ 1822.150008] Call Trace:
[ 1822.150510]  dump_stack+0x78/0xa9
[ 1822.151156]  slab_err+0xb1/0xd3
[ 1822.151834]  ? __kmalloc+0x1bb/0x1ce
[ 1822.152546]  __kmem_cache_shutdown+0x151/0x28b
[ 1822.153395]  shutdown_cache+0x13/0x144
[ 1822.154126]  kmem_cache_destroy+0x1c0/0x1fb
[ 1822.154669]  SyS_delete_module+0x194/0x244
[ 1822.155199]  ? trace_hardirqs_on_thunk+0x1a/0x1c
[ 1822.155773]  entry_SYSCALL_64_fastpath+0x23/0x9a
[ 1822.156343] RIP: 0033:0x7f929bd38b17
[ 1822.156859] RSP: 002b:00007ffd160e9a98 EFLAGS: 00000202 ORIG_RAX: 00000000000000b0
[ 1822.157728] RAX: ffffffffffffffda RBX: 00005578316ba090 RCX: 00007f929bd38b17
[ 1822.158422] RDX: 00007f929bd9ec60 RSI: 0000000000000800 RDI: 00005578316ba0f0
[ 1822.159114] RBP: 0000000000000003 R08: 00007f929bff5f20 R09: 00007ffd160e8a11
[ 1822.159808] R10: 00007ffd160e9860 R11: 0000000000000202 R12: 00007ffd160e8a80
[ 1822.160513] R13: 0000000000000000 R14: 0000000000000000 R15: 00005578316ba090
[ 1822.161278] INFO: Object 0x000000007645de29 @offset=0
[ 1822.161666] INFO: Object 0x00000000d5df2ab5 @offset=128

Fixes: 30313a3d5794 ("bridge: Handle IFLA_ADDRESS correctly when creating bridge device")
Fixes: 5b8d5429daa0 ("bridge: netlink: register netdevice before executing changelink")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
---
v2: it's better to use br_dev_delete to clean up because the sysfs entries
were leaked as well, basically we use it for the bridge fdb flush and sysfs
entries delete

Consequently this also would fix the null ptr deref due to the rhashtable
not being initialized in net-next when br_stp_change_bridge_id is called.

 net/bridge/br_netlink.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c
index d0ef0a8e8831..015f465c514b 100644
--- a/net/bridge/br_netlink.c
+++ b/net/bridge/br_netlink.c
@@ -1262,19 +1262,20 @@ static int br_dev_newlink(struct net *src_net, struct net_device *dev,
 	struct net_bridge *br = netdev_priv(dev);
 	int err;
 
+	err = register_netdevice(dev);
+	if (err)
+		return err;
+
 	if (tb[IFLA_ADDRESS]) {
 		spin_lock_bh(&br->lock);
 		br_stp_change_bridge_id(br, nla_data(tb[IFLA_ADDRESS]));
 		spin_unlock_bh(&br->lock);
 	}
 
-	err = register_netdevice(dev);
-	if (err)
-		return err;
-
 	err = br_changelink(dev, tb, data, extack);
 	if (err)
-		unregister_netdevice(dev);
+		br_dev_delete(dev, NULL);
+
 	return err;
 }
 
-- 
2.14.3

^ permalink raw reply related

* [PATCH 4.4 095/115] l2tp: cleanup l2tp_tunnel_delete calls
From: Greg Kroah-Hartman @ 2017-12-18 15:49 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Jiri Slaby, Sabrina Dubroca,
	Guillaume Nault, David S. Miller, netdev, Sasha Levin
In-Reply-To: <20171218152851.886086917@linuxfoundation.org>

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Jiri Slaby <jslaby@suse.cz>


[ Upstream commit 4dc12ffeaeac939097a3f55c881d3dc3523dff0c ]

l2tp_tunnel_delete does not return anything since commit 62b982eeb458
("l2tp: fix race condition in l2tp_tunnel_delete").  But call sites of
l2tp_tunnel_delete still do casts to void to avoid unused return value
warnings.

Kill these now useless casts.

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Sabrina Dubroca <sd@queasysnail.net>
Cc: Guillaume Nault <g.nault@alphalink.fr>
Cc: David S. Miller <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Acked-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/l2tp/l2tp_core.c    |    2 +-
 net/l2tp/l2tp_netlink.c |    2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

--- a/net/l2tp/l2tp_core.c
+++ b/net/l2tp/l2tp_core.c
@@ -1856,7 +1856,7 @@ static __net_exit void l2tp_exit_net(str
 
 	rcu_read_lock_bh();
 	list_for_each_entry_rcu(tunnel, &pn->l2tp_tunnel_list, list) {
-		(void)l2tp_tunnel_delete(tunnel);
+		l2tp_tunnel_delete(tunnel);
 	}
 	rcu_read_unlock_bh();
 }
--- a/net/l2tp/l2tp_netlink.c
+++ b/net/l2tp/l2tp_netlink.c
@@ -285,7 +285,7 @@ static int l2tp_nl_cmd_tunnel_delete(str
 	l2tp_tunnel_notify(&l2tp_nl_family, info,
 			   tunnel, L2TP_CMD_TUNNEL_DELETE);
 
-	(void) l2tp_tunnel_delete(tunnel);
+	l2tp_tunnel_delete(tunnel);
 
 out:
 	return ret;

^ permalink raw reply

* [PATCH 4.9 151/177] l2tp: cleanup l2tp_tunnel_delete calls
From: Greg Kroah-Hartman @ 2017-12-18 15:49 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Jiri Slaby, Sabrina Dubroca,
	Guillaume Nault, David S. Miller, netdev, Sasha Levin
In-Reply-To: <20171218152909.823644066@linuxfoundation.org>

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Jiri Slaby <jslaby@suse.cz>


[ Upstream commit 4dc12ffeaeac939097a3f55c881d3dc3523dff0c ]

l2tp_tunnel_delete does not return anything since commit 62b982eeb458
("l2tp: fix race condition in l2tp_tunnel_delete").  But call sites of
l2tp_tunnel_delete still do casts to void to avoid unused return value
warnings.

Kill these now useless casts.

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Sabrina Dubroca <sd@queasysnail.net>
Cc: Guillaume Nault <g.nault@alphalink.fr>
Cc: David S. Miller <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Acked-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/l2tp/l2tp_core.c    |    2 +-
 net/l2tp/l2tp_netlink.c |    2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

--- a/net/l2tp/l2tp_core.c
+++ b/net/l2tp/l2tp_core.c
@@ -1944,7 +1944,7 @@ static __net_exit void l2tp_exit_net(str
 
 	rcu_read_lock_bh();
 	list_for_each_entry_rcu(tunnel, &pn->l2tp_tunnel_list, list) {
-		(void)l2tp_tunnel_delete(tunnel);
+		l2tp_tunnel_delete(tunnel);
 	}
 	rcu_read_unlock_bh();
 
--- a/net/l2tp/l2tp_netlink.c
+++ b/net/l2tp/l2tp_netlink.c
@@ -287,7 +287,7 @@ static int l2tp_nl_cmd_tunnel_delete(str
 	l2tp_tunnel_notify(&l2tp_nl_family, info,
 			   tunnel, L2TP_CMD_TUNNEL_DELETE);
 
-	(void) l2tp_tunnel_delete(tunnel);
+	l2tp_tunnel_delete(tunnel);
 
 out:
 	return ret;

^ permalink raw reply

* Re: [net-next: PATCH 0/8] Armada 7k/8k PP2 ACPI support
From: Marcin Wojtas @ 2017-12-18 15:49 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Thomas Petazzoni, Andrew Lunn, Florian Fainelli,
	Russell King - ARM Linux, Graeme Gregory,
	<netdev@vger.kernel.org>, Antoine Ténart,
	Rafael J. Wysocki, linux-kernel@vger.kernel.org, Nadav Haklai,
	linux-acpi@vger.kernel.org, Neta Zur Hershkovits, Ezequiel Garcia,
	Grzegorz Jaszczyk, Gregory CLEMENT, Tomasz Nowicki,
	David S. Miller, linux-arm-kernel@lists.infradead.org
In-Reply-To: <CAKv+Gu8_6y3dU81ZS4WWCfpRZkfjGo_+_K--e3go_3_xXwBErQ@mail.gmail.com>

Hi Ard

2017-12-18 10:40 GMT+01:00 Ard Biesheuvel <ard.biesheuvel@linaro.org>:
> On 18 December 2017 at 10:17, Marcin Wojtas <mw@semihalf.com> wrote:
>> Hi,
>>
>> This patchset introduces ACPI support in mvpp2 and mvmdio drivers.
>> First three patches introduce fwnode helpers for obtaining PHY
>> information from nodes and also MDIO fwnode API for registering
>> the bus with its PHY/devices.
>>
>> Following patches update code of the mvmdio and mvpp2 drivers
>> to support ACPI tables handling. The latter is done in 4 steps,
>> as can be seen in the commits. For the details, please
>> refer to the commit messages.
>>
>> Drivers operation was tested on top of the net-next branch
>> with both DT and ACPI. Although for compatibility reasons
>> with older platforms, the mvmdio driver keeps using
>> of_ MDIO registering, new fwnode_ one proved to fully work
>> with DT as well (tested on MacchiatoBin board).
>>
>> mvpp2/mvmdio driver can work with the ACPI representation, as exposed
>> on a public branch:
>> https://github.com/MarvellEmbeddedProcessors/edk2-open-platform/commits/marvell-armada-wip
>> It was compiled together with the most recent Tianocore EDK2 revision.
>> Please refer to the firmware build instruction on MacchiatoBin board:
>> http://wiki.macchiatobin.net/tiki-index.php?page=Build+from+source+-+UEFI+EDK+II
>>
>> Above support configures 1G to use its PHY normally. 10G can work now
>> only with the link interrupt mode. Somehow reading of the
>> string property in fwnode_mdiobus_child_is_phy works only with
>> DT and cannot cope with 10G PHY nodes as in:
>> https://pastebin.com/3JnYpU0A
>>
>> Above root cause will be further checked. In the meantime I will
>> appreciate any comments or remarks for the kernel patches.
>>
>
> Hi Marcin,
>
> I have added linux-acpi and Graeme to cc. I think it makes sense to
> discuss the way you describe the device topology before looking at the
> patches in more detail.
>

Thanks. Tomasz Nowicki immediately pointed this to me off the lists.

> In particular, I would like to request feedback on the use of
> [redundant] 'reg' properties and the use of _DSD + compatible to
> describe PHYs. Usually, we try to avoid this, given that it is
> essentially a ACPI encapsulated DT dialect that is difficult to
> support in drivers unless they are based on DT to begin with. Also,
> non-standard _DSD properties require a vendor prefix, it is not
> freeform.
>

Already a lot of drivers use both DT + ACPI. Some have IMO very
fanciful bindings in both, mostly handled within the drivers with
custom functions. OF_ - only drivers can use of_mdio_ helper routines,
that assume a certain hierarchy:
MDIO device node with PHYs as children, which are referenced in the
ports node. I believe such approach could fit ACPI description too.
I'm aware however that my code is pretty much DT transposed into ACPI
environment and I'm of course open to discussion, how to do it in the
most proper way.

My main goal is to provide an fwnode-based glue code, that can be used
among the NIC/MDIO drivers  (+ phylink) without multiple ifs
differentiating between ACPI/OF. What I sent has single calls in
mvpp2/mvmdio with a common bottom layers, but I don't see a problem,
that, e.g. when iterating over PHY subnodes, in case of OF
'reg'/'compatible' are used, whereas with ACPI some specific _ADR/_CID
objects.

I'd appreaciate any feedback.

Best regards,
Marcin

> For reference, the ACPI description is here (starting at line 175)
> https://github.com/MarvellEmbeddedProcessors/edk2-open-platform/blob/72d5ac23b20dd74d479daa5e40ba443264b31261/Platforms/Marvell/Armada/AcpiTables/Armada80x0McBin/Dsdt.asl

^ permalink raw reply

* Re: pull-request: bpf 2017-12-17
From: David Miller @ 2017-12-18 15:49 UTC (permalink / raw)
  To: daniel; +Cc: ast, netdev
In-Reply-To: <20171217200626.5478-1-daniel@iogearbox.net>

From: Daniel Borkmann <daniel@iogearbox.net>
Date: Sun, 17 Dec 2017 21:06:26 +0100

> The following pull-request contains BPF updates for your *net* tree.
> 
> The main changes are:
> 
> 1) Fix a corner case in generic XDP where we have non-linear skbs
>    but enough tailroom in the skb to not miss to linearizing there,
>    from Song.
> 
> 2) Fix BPF JIT bugs in s390x and ppc64 to not recache skb data when
>    BPF context is not skb, from Daniel.
> 
> 3) Fix a BPF JIT bug in sparc64 where recaching skb data after helper
>    call would use the wrong register for the skb, from Daniel.
> 
> Please consider pulling these changes from:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git

Pulled, thanks Daniel.

^ permalink raw reply

* Re: [Intel-wired-lan] v4.15-rc2 on thinkpad x60: ethernet stopped working
From: Neftin, Sasha @ 2017-12-18 15:50 UTC (permalink / raw)
  To: Pavel Machek, jacob.e.keller
  Cc: David Miller, bpoirier, nix.or.die, netdev, linux-kernel,
	intel-wired-lan, lsorense
In-Reply-To: <20171218115817.GA17054@amd>

On 12/18/2017 13:58, Pavel Machek wrote:
> On Mon 2017-12-18 13:24:40, Neftin, Sasha wrote:
>> On 12/18/2017 12:26, Pavel Machek wrote:
>>> Hi!
>>>
>>>>>>> In v4.15-rc2+, network manager can not see my ethernet card, and
>>>>>>> manual attempts to ifconfig it up did not really help, either.
>>>>>>>
>>>>>>> Card is:
>>>>>>>
>>>>>>> 02:00.0 Ethernet controller: Intel Corporation 82573L Gigabit Ethernet
>>>>>>> Controller
>>>>> ....
>>>>>>> Any ideas ?
>>>>>> Yes , 19110cfbb34d4af0cdfe14cd243f3b09dc95b013 broke it.
>>>>>>
>>>>>> See:
>>>>>> https://bugzilla.kernel.org/show_bug.cgi?id=198047
>>>>>>
>>>>>> Fix there :
>>>>>> https://marc.info/?l=linux-kernel&m=151272209903675&w=2
>>>>> I don't see the patch in latest mainline. Not having ethernet
>>>>> is... somehow annoying. What is going on there?
>>>> Generally speaking, e1000 maintainence has been handled very poorly over
>>>> the past few years, I have to say.
>>>>
>>>> Fixes take forever to propagate even when someone other than the
>>>> maintainer provides a working and tested fix, just like this case.
>>>>
>>>> Jeff, please take e1000 maintainence seriously and get these critical
>>>> bug fixes propagated.
>>> No response AFAICT. I guess I should test reverting
>>> 19110cfbb34d4af0cdfe14cd243f3b09dc95b013, then ask you for revert?
>> Hello Pavel,
>>
>> Before ask for reverting 19110cfbb..., please, check if follow patch of
>> Benjamin work for you http://patchwork.ozlabs.org/patch/846825/
> Jacob said, in another email:
>
> # Digging into this, the problem is complicated. The original bug
> # assumed behavior of the .check_for_link call, which is universally not
> # implemented.
> #
> # I think the correct fix is to revert 19110cfbb34d ("e1000e: Separate
> # signaling for link check/link up", 2017-10-10) and find a more proper solution.
>
> ...which makes me think that revert is preffered?
>
> 									Pavel
>
Pavel, before ask for revert - let's check Benjamin's patch following to 
his previous patch. Previous patch was not competed and latest one come 
to complete changes.

^ permalink raw reply

* Re: [PATCH bpf-next 12/13] bpf: arm64: add JIT support for multi-function programs
From: Daniel Borkmann @ 2017-12-18 15:51 UTC (permalink / raw)
  To: Arnd Bergmann, Alexei Starovoitov
  Cc: David S . Miller, John Fastabend, Edward Cree, Jakub Kicinski,
	Networking, kernel-team
In-Reply-To: <CAK8P3a3XhCRiWoLVMd6VB9FMUho554UZt8AmxCm8zbkrok_cOw@mail.gmail.com>

On 12/18/2017 04:29 PM, Arnd Bergmann wrote:
> On Fri, Dec 15, 2017 at 2:55 AM, Alexei Starovoitov <ast@kernel.org> wrote:
> 
> 
>> +       if (jit_data->ctx.offset) {
>> +               ctx = jit_data->ctx;
>> +               image_ptr = jit_data->image;
>> +               header = jit_data->header;
>> +               extra_pass = true;
>> +               goto skip_init_ctx;
>> +       }
>>         memset(&ctx, 0, sizeof(ctx));
>>         ctx.prog = prog;
> 
> The 'goto' jumps over the 'image_size' initialization
> 
>>         prog->bpf_func = (void *)ctx.image;
>>         prog->jited = 1;
>>         prog->jited_len = image_size;
> 
> so we now get a warning here, starting with linux-next-20171218:
> 
> arch/arm64/net/bpf_jit_comp.c: In function 'bpf_int_jit_compile':
> arch/arm64/net/bpf_jit_comp.c:982:18: error: 'image_size' may be used
> uninitialized in this function [-Werror=maybe-uninitialized]
> 
> I could not figure out what the code should be doing instead, or if it is
> indeed safe and the warning is a false-positive.

Good catch, it's buggy indeed. Fix like below is needed; I can submit
it properly a bit later today:

diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
index 396490c..a6fd585 100644
--- a/arch/arm64/net/bpf_jit_comp.c
+++ b/arch/arm64/net/bpf_jit_comp.c
@@ -855,6 +855,7 @@ static inline void bpf_flush_icache(void *start, void *end)
 struct arm64_jit_data {
 	struct bpf_binary_header *header;
 	u8 *image;
+	int image_size;
 	struct jit_ctx ctx;
 };

@@ -895,6 +896,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
 	if (jit_data->ctx.offset) {
 		ctx = jit_data->ctx;
 		image_ptr = jit_data->image;
+		image_size = jit_data->image_size;
 		header = jit_data->header;
 		extra_pass = true;
 		goto skip_init_ctx;
@@ -975,6 +977,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
 	} else {
 		jit_data->ctx = ctx;
 		jit_data->image = image_ptr;
+		jit_data->image_size = image_size;
 		jit_data->header = header;
 	}
 	prog->bpf_func = (void *)ctx.image;

^ permalink raw reply related

* Re: pull-request: bpf-next 2017-12-18
From: David Miller @ 2017-12-18 15:51 UTC (permalink / raw)
  To: daniel; +Cc: ast, netdev
In-Reply-To: <20171218003307.10014-1-daniel@iogearbox.net>

From: Daniel Borkmann <daniel@iogearbox.net>
Date: Mon, 18 Dec 2017 01:33:07 +0100

> The following pull-request contains BPF updates for your *net-next* tree.
> 
> The main changes are:
> 
> 1) Allow arbitrary function calls from one BPF function to another BPF function.
>    As of today when writing BPF programs, __always_inline had to be used in
>    the BPF C programs for all functions, unnecessarily causing LLVM to inflate
>    code size. Handle this more naturally with support for BPF to BPF calls
>    such that this __always_inline restriction can be overcome. As a result,
>    it allows for better optimized code and finally enables to introduce core
>    BPF libraries in the future that can be reused out of different projects.
>    x86 and arm64 JIT support was added as well, from Alexei.

Exciting... but now there's a lot of JIT work to do.

 ...
> Please consider pulling these changes from:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git

Pulled, thanks!

^ permalink raw reply

* Re: [bpf-next V1-RFC PATCH 01/14] xdp: base API for new XDP rx-queue info concept
From: Jesper Dangaard Brouer @ 2017-12-18 15:52 UTC (permalink / raw)
  To: David Ahern
  Cc: Daniel Borkmann, Alexei Starovoitov, netdev, gospo, bjorn.topel,
	michael.chan, brouer
In-Reply-To: <2effe097-6802-2020-075d-47cc3576f78f@gmail.com>

On Mon, 18 Dec 2017 06:23:40 -0700
David Ahern <dsahern@gmail.com> wrote:

> On 12/18/17 3:55 AM, Jesper Dangaard Brouer wrote:
> > 
> > Handling return-errors in the drivers complicated the driver code, as it
> > involves unraveling and deallocating other RX-rings etc (that were
> > already allocated) if the reg fails. (Also notice next patch will allow
> > dev == NULL, if right ptype is set).
> > 
> > I'm not completely rejecting you idea, as this is a good optimization
> > trick, which is to move validation checks to setup-time, thus allowing
> > less validation checks at runtime.  I sort-of actually already did
> > this, as I allow bpf to deref dev without NULL check.  I would argue
> > this is good enough, as we will crash in a predictable way, as above
> > WARN will point to which driver violated the API.
> > 
> > If people think it is valuable I can change this API to return an err?  
> 
> Saeed's suggested API in a comment on patch 12 also removes most of the
> WARN_ONs as it sets the device and index:
> 
> xdp_rxq_info_reg(netdev, rxq_index)
> {
>     rxqueue = netdev->_rx + rxq_index;
>     xdp_rxq = rxqueue.xdp_rxq;
>         xdp_rxq_info_init(xdp_rxq);
>     xdp_rxq.dev = netdev;
>     xdp_rxq.queue_index = rxq_index;
> }
> 
> xdp_rxq_info_unreg(netdev, rxq_index)
> {
> ...
> }

No, we still need the other WARN_ON's.

I don't understand why you think above API is better.  In case
netdev==NULL the system will simply crash on deref of netdev.  That
case happened for both drivers i40e and mlx5, when I was adding this.
The WARN_ON help me quickly identify the issue, and in both drivers it
was a non-critical error, as these queues are not used by XDP. IHMO a
better experience for the driver developer.

IHMO WARN_ON's are a good thing.  For example the:

 if (xdp_rxq->reg_state == REG_STATE_REGISTERED)
   WARN(1, "Missing unregister, handled but fix driver\n");

Just helped me identify a bug in i40e driver.  It turns out that
changing the RX-ring queue size via ethtool <-G|--set-ring> (_not_ the
number of RX-rings, but frames per RX-ring). Then i40e_set_ringparam()
allocates some temp RX-rings and copy-around struct contents, causing
this strange issue.  It will not crash with our currently simple content,
but later this would cause a hard-to-debug issue.  I'm happy I could
catch this now, instead of later as a strange crash.

The WARN's are there to assist driver developers when using this API
in their drivers (better than crash/BUG_ON as they don't have to dig-up
their serial cable console).  For me it is also part of the
documentation, as it document the API assumptions/assertions together
with a small text field.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply

* Re: [PATCH] net: qcom/emac: Change the order of mac up and sgmii open
From: Timur Tabi @ 2017-12-18 16:05 UTC (permalink / raw)
  To: Hemanth Puranik, netdev, linux-kernel
In-Reply-To: <1513576667-2967-1-git-send-email-hpuranik@codeaurora.org>

On 12/17/2017 11:57 PM, Hemanth Puranik wrote:
> This patch fixes the order of mac_up and sgmii_open for the
> reasons noted below:
> 
> - If open takes more time(if the SGMII block is not responding or
>    if we want to do some delay based task) in this situation we
>    will hit NETDEV watchdog
> - The main reason : We should signal to upper layers that we are
>    ready to receive packets "only" when the entire path is initialized
>    not the other way around, this is followed in the reset path where
>    we do mac_down, sgmii_reset and mac_up. This also makes the driver
>    uniform across the reset and open paths.
> - In the future there may be need for delay based tasks to be done in
>    sgmii open which will result in NETDEV watchdog
> - As per the documentation the order of init should be sgmii, mac, rings
>    and DMA
> 
> Signed-off-by: Hemanth Puranik<hpuranik@codeaurora.org>

Acked-by: Timur Tabi <timur@codeaurora.org>

-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm
Technologies, Inc.  Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply

* [RFC ipsec-next 0/4]: Support multiple VTIs with the same src+dst pair
From: Lorenzo Colitti @ 2017-12-18 16:16 UTC (permalink / raw)
  To: netdev; +Cc: steffen.klassert, subashab, nharold

When using IPsec tunnel mode, VTIs provide many benefits compared
to direct configuration of xfrm policies / states. However, one
limitation is that there can only be one VTI between a given pair
of IP addresses. This does not allow configuring multiple IPsec
tunnels to the same security gateway. This is required by some
deployments, for example I-WLAN [3GPP TS 24.327].

This patchset introduces a new VTI_KEYED flag that allows
configuration of multiple VTIs between the same IP address
pairs. The output path is the same as current VTI behaviour,
where a routing lookup selects a VTI interface, and the VTI's
okey specifies the mark to use in the XFRM lookup. The input and
ICMP error paths instead work by first looking up an SA with a
loose match that ignores the mark. That mark is then used to find
the tunnel by ikey.

This approach is simple and requires few userspace changes, but
it has one limitation in that ICMP errors received in response to
VTI-emitted packets can only be processed if the VTI's ikey and
okey are the same. This limitation could be lifted by introducing
another XFRM mark, similar to XFRMA_OUTPUT_MARK, but used for
input.

^ permalink raw reply

* [RFC ipsec-next 1/4] met: xfrm: Add an xfrm lookup that ignores the mark.
From: Lorenzo Colitti @ 2017-12-18 16:16 UTC (permalink / raw)
  To: netdev; +Cc: steffen.klassert, subashab, nharold, Lorenzo Colitti
In-Reply-To: <20171218161656.40618-1-lorenzo@google.com>

The xfrm inbound and ICMP error paths can match inbound XFRM states
that have a mark, but only if the skb mark is already correctly set
to match the state mark. This typically requires iptables rules
(potentially even per SA iptables rules), which impose configuration
complexity.

In some cases, it may be useful to match such an SA anyway. An example
is when processing an ICMP error to an ESP packet that we previously
sent. In this case, the only information available to match the SA are
the IP addresses and the outbound SPI. Therefore, if the output SA has
a mark, the lookup will fail and the ICMP packet cannot be processed
unless the packet is somehow already marked.

Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
---
 include/net/xfrm.h    |  4 ++++
 net/xfrm/xfrm_state.c | 32 ++++++++++++++++++++++++++++++++
 2 files changed, 36 insertions(+)

diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index 1ec0c4760646..9d3b7c0ac6e2 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -1550,6 +1550,10 @@ struct xfrm_state *xfrm_state_lookup_byaddr(struct net *net, u32 mark,
 					    const xfrm_address_t *saddr,
 					    u8 proto,
 					    unsigned short family);
+struct xfrm_state *xfrm_state_lookup_loose(struct net *net, u32 mark,
+					   const xfrm_address_t *daddr,
+					   __be32 spi, u8 proto,
+					   unsigned short family);
 #ifdef CONFIG_XFRM_SUB_POLICY
 int xfrm_tmpl_sort(struct xfrm_tmpl **dst, struct xfrm_tmpl **src, int n,
 		   unsigned short family, struct net *net);
diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index 1b7856be3eeb..ee678758547f 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -839,6 +839,38 @@ static struct xfrm_state *__xfrm_state_lookup(struct net *net, u32 mark,
 	return NULL;
 }

+struct xfrm_state *xfrm_state_lookup_loose(struct net *net, u32 mark,
+					   const xfrm_address_t *daddr,
+					   __be32 spi, u8 proto,
+					   unsigned short family)
+{
+	unsigned int h = xfrm_spi_hash(net, daddr, spi, proto, family);
+	struct xfrm_state *x, *cand = NULL;
+
+	rcu_read_lock();
+	hlist_for_each_entry_rcu(x, net->xfrm.state_byspi + h, byspi) {
+		if (x->props.family != family ||
+		    x->id.spi       != spi ||
+		    x->id.proto     != proto ||
+		    !xfrm_addr_equal(&x->id.daddr, daddr, family))
+			continue;
+
+		if (((mark & x->mark.m) == x->mark.v) &&
+		    xfrm_state_hold_rcu(x)) {
+			if (cand)
+				xfrm_state_put(cand);
+			rcu_read_unlock();
+			return x;
+		}
+
+		if (!cand && xfrm_state_hold_rcu(x))
+			cand = x;
+	}
+
+	rcu_read_unlock();
+	return cand;
+}
+
 static struct xfrm_state *__xfrm_state_lookup_byaddr(struct net *net, u32 mark,
 						     const xfrm_address_t *daddr,
 						     const xfrm_address_t *saddr,
-- 
2.15.1.504.g5279b80103-goog

^ permalink raw reply related

* [RFC ipsec-next 2/4] net: xfrm: find VTI interfaces from xfrm_input
From: Lorenzo Colitti @ 2017-12-18 16:16 UTC (permalink / raw)
  To: netdev; +Cc: steffen.klassert, subashab, nharold, Lorenzo Colitti
In-Reply-To: <20171218161656.40618-1-lorenzo@google.com>

Currently, the VTI input path works by first looking up the VTI
by its IP addresses, then setting the tunnel pointer in the
XFRM_TUNNEL_SKB_CB, and then having xfrm_input override the mark
with the mark in the tunnel.

This patch changes the order so that the tunnel is found by a
callback from xfrm_input. Each tunnel type (currently only ip_vti
and ip6_vti) implements a lookup function pointer that finds the
tunnel and sets it in the CB, and also does a state lookup.

This has the advantage that much more information is available to
the tunnel lookup function, including the looked-up XFRM state.
This will be used in a future change to allow finding the tunnel
not just from the IP addresses, but also from the xfrm lookup.

The lookup function pointer occupies the same space in the
XFRM_TUNNEL_SKB_CB as the IPv4/IPv6 tunnel pointer. The semantics
of the field are:
- When not running a handler that uses tunnels: always null.
- At the beginning of xfrm_input: lookup function pointer.
- After xfrm_input calls the lookup function: tunnel if found,
  else null.

Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
---
 include/net/xfrm.h     |  2 ++
 net/ipv4/ip_vti.c      | 43 ++++++++++++++++++++++++++++++++++++----
 net/ipv6/ip6_vti.c     | 53 +++++++++++++++++++++++++++++++++++++++++++++-----
 net/ipv6/xfrm6_input.c |  1 -
 net/xfrm/xfrm_input.c  | 34 +++++++++++++++++++-------------
 5 files changed, 109 insertions(+), 24 deletions(-)

diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index 9d3b7c0ac6e2..3d245f2f6f6c 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -653,6 +653,8 @@ struct xfrm_tunnel_skb_cb {
 	} header;
 
 	union {
+		int (*lookup)(struct sk_buff *skb, int nexthdr, __be32 spi,
+			      __be32 seq, struct xfrm_state **x);
 		struct ip_tunnel *ip4;
 		struct ip6_tnl *ip6;
 	} tunnel;
diff --git a/net/ipv4/ip_vti.c b/net/ipv4/ip_vti.c
index 949f432a5f04..850625598187 100644
--- a/net/ipv4/ip_vti.c
+++ b/net/ipv4/ip_vti.c
@@ -49,8 +49,8 @@ static struct rtnl_link_ops vti_link_ops __read_mostly;
 static unsigned int vti_net_id __read_mostly;
 static int vti_tunnel_init(struct net_device *dev);
 
-static int vti_input(struct sk_buff *skb, int nexthdr, __be32 spi,
-		     int encap_type)
+static struct ip_tunnel *
+vti4_find_tunnel(struct sk_buff *skb, __be32 spi, struct xfrm_state **x)
 {
 	struct ip_tunnel *tunnel;
 	const struct iphdr *iph = ip_hdr(skb);
@@ -59,19 +59,52 @@ static int vti_input(struct sk_buff *skb, int nexthdr, __be32 spi,
 
 	tunnel = ip_tunnel_lookup(itn, skb->dev->ifindex, TUNNEL_NO_KEY,
 				  iph->saddr, iph->daddr, 0);
+	if (tunnel) {
+		*x = xfrm_state_lookup(net, be32_to_cpu(tunnel->parms.i_key),
+				       (xfrm_address_t *)&iph->daddr,
+				       spi, iph->protocol, AF_INET);
+	}
+
+	return tunnel;
+}
+
+static int vti_lookup(struct sk_buff *skb, int nexthdr, __be32 spi, __be32 seq,
+		      struct xfrm_state **x)
+{
+	struct net *net = dev_net(skb->dev);
+	struct ip_tunnel *tunnel;
+
+	tunnel = vti4_find_tunnel(skb, spi, x);
 	if (tunnel) {
 		if (!xfrm4_policy_check(NULL, XFRM_POLICY_IN, skb))
 			goto drop;
 
+		if (!*x) {
+			XFRM_INC_STATS(net, LINUX_MIB_XFRMINNOSTATES);
+			xfrm_audit_state_notfound(skb, AF_INET, spi, seq);
+			tunnel->dev->stats.rx_errors++;
+			tunnel->dev->stats.rx_dropped++;
+			goto drop;
+		}
+
 		XFRM_TUNNEL_SKB_CB(skb)->tunnel.ip4 = tunnel;
 
-		return xfrm_input(skb, nexthdr, spi, encap_type);
+		return 0;
 	}
 
 	return -EINVAL;
 drop:
+	if (*x)
+		xfrm_state_put(*x);
 	kfree_skb(skb);
-	return 0;
+	return -ESRCH;
+}
+
+static int vti_input(struct sk_buff *skb, int nexthdr, __be32 spi,
+		     int encap_type)
+{
+	XFRM_TUNNEL_SKB_CB(skb)->tunnel.lookup = vti_lookup;
+	return xfrm_input(skb, nexthdr, spi, encap_type);
 }
 
 static int vti_rcv(struct sk_buff *skb)
@@ -93,6 +126,8 @@ static int vti_rcv_cb(struct sk_buff *skb, int err)
 	u32 orig_mark = skb->mark;
 	int ret;
 
+	XFRM_TUNNEL_SKB_CB(skb)->tunnel.ip4 = NULL;
+
 	if (!tunnel)
 		return 1;
 
diff --git a/net/ipv6/ip6_vti.c b/net/ipv6/ip6_vti.c
index dbb74f3c57a7..d0676f2f99eb 100644
--- a/net/ipv6/ip6_vti.c
+++ b/net/ipv6/ip6_vti.c
@@ -297,13 +297,33 @@ static void vti6_dev_uninit(struct net_device *dev)
 	dev_put(dev);
 }
 
-static int vti6_rcv(struct sk_buff *skb)
+static struct ip6_tnl *
+vti6_find_tunnel(struct sk_buff *skb, __be32 spi, struct xfrm_state **x)
 {
+	const struct ipv6hdr *ipv6h = ipv6_hdr(skb);
+	struct net *net = dev_net(skb->dev);
 	struct ip6_tnl *t;
+
+	t = vti6_tnl_lookup(net, &ipv6h->saddr, &ipv6h->daddr);
+	if (t) {
+		*x = xfrm_state_lookup(net, be32_to_cpu(t->parms.i_key),
+				       (xfrm_address_t *)&ipv6h->daddr,
+				       spi, ipv6h->nexthdr, AF_INET6);
+	}
+
+	return t;
+}
+
+int
+vti6_lookup(struct sk_buff *skb, int nexthdr, __be32 spi, __be32 seq,
+	    struct xfrm_state **x)
+{
 	const struct ipv6hdr *ipv6h = ipv6_hdr(skb);
+	struct net *net = dev_net(skb->dev);
+	struct ip6_tnl *t;
 
 	rcu_read_lock();
-	t = vti6_tnl_lookup(dev_net(skb->dev), &ipv6h->saddr, &ipv6h->daddr);
+	t = vti6_find_tunnel(skb, spi, x);
 	if (t) {
 		if (t->parms.proto != IPPROTO_IPV6 && t->parms.proto != 0) {
 			rcu_read_unlock();
@@ -312,7 +332,7 @@ static int vti6_rcv(struct sk_buff *skb)
 
 		if (!xfrm6_policy_check(NULL, XFRM_POLICY_IN, skb)) {
 			rcu_read_unlock();
-			return 0;
+			goto discard;
 		}
 
 		if (!ip6_tnl_rcv_ctl(t, &ipv6h->daddr, &ipv6h->saddr)) {
@@ -321,15 +341,36 @@ static int vti6_rcv(struct sk_buff *skb)
 			goto discard;
 		}
 
+		if (!*x) {
+			XFRM_INC_STATS(net, LINUX_MIB_XFRMINNOSTATES);
+			xfrm_audit_state_notfound(skb, AF_INET6, spi, seq);
+			t->dev->stats.rx_errors++;
+			t->dev->stats.rx_dropped++;
+			rcu_read_unlock();
+			goto discard;
+		}
+
 		rcu_read_unlock();
 
-		return xfrm6_rcv_tnl(skb, t);
+		XFRM_TUNNEL_SKB_CB(skb)->tunnel.ip6 = t;
+
+		return 0;
 	}
 	rcu_read_unlock();
 	return -EINVAL;
 discard:
+	if (*x)
+		xfrm_state_put(*x);
 	kfree_skb(skb);
-	return 0;
+	return -ESRCH;
+}
+
+static int vti6_rcv(struct sk_buff *skb)
+{
+	int nexthdr = skb_network_header(skb)[IP6CB(skb)->nhoff];
+
+	XFRM_TUNNEL_SKB_CB(skb)->tunnel.lookup = vti6_lookup;
+	return xfrm6_rcv_spi(skb, nexthdr, 0, NULL);
 }
 
 static int vti6_rcv_cb(struct sk_buff *skb, int err)
@@ -343,6 +384,8 @@ static int vti6_rcv_cb(struct sk_buff *skb, int err)
 	u32 orig_mark = skb->mark;
 	int ret;
 
+	XFRM_TUNNEL_SKB_CB(skb)->tunnel.ip6 = NULL;
+
 	if (!t)
 		return 1;
 
diff --git a/net/ipv6/xfrm6_input.c b/net/ipv6/xfrm6_input.c
index fe04e23af986..6d1b734fef8d 100644
--- a/net/ipv6/xfrm6_input.c
+++ b/net/ipv6/xfrm6_input.c
@@ -25,7 +25,6 @@ int xfrm6_extract_input(struct xfrm_state *x, struct sk_buff *skb)
 int xfrm6_rcv_spi(struct sk_buff *skb, int nexthdr, __be32 spi,
 		  struct ip6_tnl *t)
 {
-	XFRM_TUNNEL_SKB_CB(skb)->tunnel.ip6 = t;
 	XFRM_SPI_SKB_CB(skb)->family = AF_INET6;
 	XFRM_SPI_SKB_CB(skb)->daddroff = offsetof(struct ipv6hdr, daddr);
 	return xfrm_input(skb, nexthdr, spi, 0);
diff --git a/net/xfrm/xfrm_input.c b/net/xfrm/xfrm_input.c
index ac277b97e0d7..7b54f58454ee 100644
--- a/net/xfrm/xfrm_input.c
+++ b/net/xfrm/xfrm_input.c
@@ -267,18 +267,6 @@ int xfrm_input(struct sk_buff *skb, int nexthdr, __be32 spi, int encap_type)
 
 	family = XFRM_SPI_SKB_CB(skb)->family;
 
-	/* if tunnel is present override skb->mark value with tunnel i_key */
-	switch (family) {
-	case AF_INET:
-		if (XFRM_TUNNEL_SKB_CB(skb)->tunnel.ip4)
-			mark = be32_to_cpu(XFRM_TUNNEL_SKB_CB(skb)->tunnel.ip4->parms.i_key);
-		break;
-	case AF_INET6:
-		if (XFRM_TUNNEL_SKB_CB(skb)->tunnel.ip6)
-			mark = be32_to_cpu(XFRM_TUNNEL_SKB_CB(skb)->tunnel.ip6->parms.i_key);
-		break;
-	}
-
 	err = secpath_set(skb);
 	if (err) {
 		XFRM_INC_STATS(net, LINUX_MIB_XFRMINERROR);
@@ -293,14 +281,29 @@ int xfrm_input(struct sk_buff *skb, int nexthdr, __be32 spi, int encap_type)
 
 	daddr = (xfrm_address_t *)(skb_network_header(skb) +
 				   XFRM_SPI_SKB_CB(skb)->daddroff);
+
+	if (XFRM_TUNNEL_SKB_CB(skb)->tunnel.lookup) {
+		err = XFRM_TUNNEL_SKB_CB(skb)->tunnel.lookup(skb, nexthdr,
+							     spi, seq, &x);
+		if (err) {
+			XFRM_TUNNEL_SKB_CB(skb)->tunnel.lookup = NULL;
+			return err;
+		}
+	}
+
 	do {
 		if (skb->sp->len == XFRM_MAX_DEPTH) {
 			XFRM_INC_STATS(net, LINUX_MIB_XFRMINBUFFERERROR);
+			if (x)
+				xfrm_state_put(x);
 			goto drop;
 		}
 
-		x = xfrm_state_lookup(net, mark, daddr, spi, nexthdr, family);
-		if (x == NULL) {
+		if (!x)
+			x = xfrm_state_lookup(net, mark, daddr, spi, nexthdr,
+					      family);
+
+		if (!x) {
 			XFRM_INC_STATS(net, LINUX_MIB_XFRMINNOSTATES);
 			xfrm_audit_state_notfound(skb, family, spi, seq);
 			goto drop;
@@ -420,6 +423,9 @@ int xfrm_input(struct sk_buff *skb, int nexthdr, __be32 spi, int encap_type)
 			XFRM_INC_STATS(net, LINUX_MIB_XFRMINHDRERROR);
 			goto drop;
 		}
+
+		if (!err)
+			x = NULL;
 	} while (!err);
 
 	err = xfrm_rcv_cb(skb, family, x->type->proto, 0);
-- 
2.15.1.504.g5279b80103-goog

^ permalink raw reply related

* [RFC ipsec-next 3/4] net: xfrm: support multiple VTI tunnels
From: Lorenzo Colitti @ 2017-12-18 16:16 UTC (permalink / raw)
  To: netdev; +Cc: steffen.klassert, subashab, nharold, Lorenzo Colitti
In-Reply-To: <20171218161656.40618-1-lorenzo@google.com>

This commit allows the creation of multiple VTI tunnels with the
same src+dst pair, via a new VTI_KEYED flag. This makes it
possible to maintain multiple IPsec tunnels to the same security
gateway, with the tunnels distinguished by SPI.

The new semantics are as follows:

- The output path is the same as existing VTIs. A routing lookup
  matches a VTI interface. The VTI uses its o_key to as the mark
  to select an XFRM state. The state transforms the packet.
- Input works as follows:
  1. Attempt to match a regular VTI by IP addresses only. If that
     succeeds, use the i_key as the mark to look up the xfrm
     state.
  2. If the match failed, do an XFRM state lookup that ignores
     the mark. If that finds an state, then use the state match's
     mark to find the tunnel by its i_key.
- ICMP errors are similar to input, except the search is for the
  outbound XFRM state, because the only data that is available is
  the outbound SPI. Thus, ICMP errors are only processed if the
  ikey is the same as the same as the okey. AFAICS this is
  consistent with GRE tunnels, but not with existing VTI
  behaviour.

Tested: https://android-review.googlesource.com/571524
Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
---
 include/uapi/linux/if_tunnel.h |   3 ++
 net/ipv4/ip_vti.c              |  75 +++++++++++++++++++++++--------
 net/ipv6/ip6_vti.c             | 100 +++++++++++++++++++++++++++++++----------
 3 files changed, 136 insertions(+), 42 deletions(-)

diff --git a/include/uapi/linux/if_tunnel.h b/include/uapi/linux/if_tunnel.h
index 1b3d148c4560..c2ec509cbc9e 100644
--- a/include/uapi/linux/if_tunnel.h
+++ b/include/uapi/linux/if_tunnel.h
@@ -147,6 +147,8 @@ enum {
 
 /* VTI-mode i_flags */
 #define VTI_ISVTI ((__force __be16)0x0001)
+#define VTI_KEYED ((__force __be16)0x0002)
+#define VTI_IFLAG_MASK ((__force __be16)0x0003)
 
 enum {
 	IFLA_VTI_UNSPEC,
@@ -156,6 +158,7 @@ enum {
 	IFLA_VTI_LOCAL,
 	IFLA_VTI_REMOTE,
 	IFLA_VTI_FWMARK,
+	IFLA_VTI_IFLAGS,
 	__IFLA_VTI_MAX,
 };
 
diff --git a/net/ipv4/ip_vti.c b/net/ipv4/ip_vti.c
index 850625598187..f5793782c418 100644
--- a/net/ipv4/ip_vti.c
+++ b/net/ipv4/ip_vti.c
@@ -63,6 +63,17 @@ vti4_find_tunnel(struct sk_buff *skb, __be32 spi, struct xfrm_state **x)
 		*x = xfrm_state_lookup(net, be32_to_cpu(tunnel->parms.i_key),
 				       (xfrm_address_t *)&iph->daddr,
 				       spi, iph->protocol, AF_INET);
+	} else {
+		*x = xfrm_state_lookup_loose(net, skb->mark,
+					     (xfrm_address_t *) &iph->daddr,
+					     spi, iph->protocol, AF_INET);
+		if (!*x)
+			return NULL;
+		tunnel = ip_tunnel_lookup(itn, skb->dev->ifindex, TUNNEL_KEY,
+					  iph->saddr, iph->daddr,
+					  cpu_to_be32((*x)->mark.v));
+		if (!tunnel)
+			xfrm_state_put(*x);
 	}
 
 	return tunnel;
@@ -302,7 +313,6 @@ static netdev_tx_t vti_tunnel_xmit(struct sk_buff *skb, struct net_device *dev)
 static int vti4_err(struct sk_buff *skb, u32 info)
 {
 	__be32 spi;
-	__u32 mark;
 	struct xfrm_state *x;
 	struct ip_tunnel *tunnel;
 	struct ip_esp_hdr *esph;
@@ -313,13 +323,6 @@ static int vti4_err(struct sk_buff *skb, u32 info)
 	int protocol = iph->protocol;
 	struct ip_tunnel_net *itn = net_generic(net, vti_net_id);
 
-	tunnel = ip_tunnel_lookup(itn, skb->dev->ifindex, TUNNEL_NO_KEY,
-				  iph->daddr, iph->saddr, 0);
-	if (!tunnel)
-		return -1;
-
-	mark = be32_to_cpu(tunnel->parms.o_key);
-
 	switch (protocol) {
 	case IPPROTO_ESP:
 		esph = (struct ip_esp_hdr *)(skb->data+(iph->ihl<<2));
@@ -347,18 +350,46 @@ static int vti4_err(struct sk_buff *skb, u32 info)
 		return 0;
 	}
 
-	x = xfrm_state_lookup(net, mark, (const xfrm_address_t *)&iph->daddr,
-			      spi, protocol, AF_INET);
-	if (!x)
-		return 0;
+	tunnel = ip_tunnel_lookup(itn, skb->dev->ifindex, TUNNEL_NO_KEY,
+				  iph->daddr, iph->saddr, 0);
+	if (tunnel) {
+		x = xfrm_state_lookup(net, be32_to_cpu(tunnel->parms.o_key),
+				      (xfrm_address_t *)&iph->daddr,
+				      spi, iph->protocol, AF_INET);
+	} else {
+		x = xfrm_state_lookup_loose(net, skb->mark,
+					    (xfrm_address_t *)&iph->daddr,
+					    spi, iph->protocol, AF_INET);
+		if (!x)
+			goto out;
+		tunnel = ip_tunnel_lookup(itn, skb->dev->ifindex, TUNNEL_KEY,
+					  iph->daddr, iph->saddr,
+					  cpu_to_be32(x->mark.v));
+	}
+
+	if (!tunnel || !x)
+		goto out;
 
 	if (icmp_hdr(skb)->type == ICMP_DEST_UNREACH)
 		ipv4_update_pmtu(skb, net, info, 0, 0, protocol, 0);
 	else
 		ipv4_redirect(skb, net, 0, 0, protocol, 0);
-	xfrm_state_put(x);
 
-	return 0;
+out:
+	if (x)
+		xfrm_state_put(x);
+
+	return tunnel ? 0 : -1;
+}
+
+static __be16 vti_flags_to_tnl_flags(__be16 i_flags)
+{
+	return VTI_ISVTI | ((i_flags & VTI_KEYED) ? GRE_KEY : 0);
+}
+
+static __be16 tnl_flags_to_vti_flags(__be16 i_flags)
+{
+	return VTI_ISVTI | ((i_flags & GRE_KEY) ? VTI_KEYED : 0);
 }
 
 static int
@@ -381,7 +412,7 @@ vti_tunnel_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)
 	if (!(p.o_flags & GRE_KEY))
 		p.o_key = 0;
 
-	p.i_flags = VTI_ISVTI;
+	p.i_flags = vti_flags_to_tnl_flags(p.i_flags);
 
 	err = ip_tunnel_ioctl(dev, &p, cmd);
 	if (err)
@@ -508,8 +539,6 @@ static void vti_netlink_parms(struct nlattr *data[],
 	if (!data)
 		return;
 
-	parms->i_flags = VTI_ISVTI;
-
 	if (data[IFLA_VTI_LINK])
 		parms->link = nla_get_u32(data[IFLA_VTI_LINK]);
 
@@ -527,6 +556,11 @@ static void vti_netlink_parms(struct nlattr *data[],
 
 	if (data[IFLA_VTI_FWMARK])
 		*fwmark = nla_get_u32(data[IFLA_VTI_FWMARK]);
+
+	if (data[IFLA_VTI_IFLAGS])
+		parms->i_flags = nla_get_be16(data[IFLA_VTI_IFLAGS]);
+
+	parms->i_flags = vti_flags_to_tnl_flags(parms->i_flags);
 }
 
 static int vti_newlink(struct net *src_net, struct net_device *dev,
@@ -567,6 +601,8 @@ static size_t vti_get_size(const struct net_device *dev)
 		nla_total_size(4) +
 		/* IFLA_VTI_FWMARK */
 		nla_total_size(4) +
+		/* IFLA_VTI_IFLAGS */
+		nla_total_size(2) +
 		0;
 }
 
@@ -580,7 +616,9 @@ static int vti_fill_info(struct sk_buff *skb, const struct net_device *dev)
 	    nla_put_be32(skb, IFLA_VTI_OKEY, p->o_key) ||
 	    nla_put_in_addr(skb, IFLA_VTI_LOCAL, p->iph.saddr) ||
 	    nla_put_in_addr(skb, IFLA_VTI_REMOTE, p->iph.daddr) ||
-	    nla_put_u32(skb, IFLA_VTI_FWMARK, t->fwmark))
+	    nla_put_u32(skb, IFLA_VTI_FWMARK, t->fwmark) ||
+	    nla_put_be16(skb, IFLA_VTI_IFLAGS,
+			 tnl_flags_to_vti_flags(p->i_flags)))
 		return -EMSGSIZE;
 
 	return 0;
@@ -593,6 +631,7 @@ static const struct nla_policy vti_policy[IFLA_VTI_MAX + 1] = {
 	[IFLA_VTI_LOCAL]	= { .len = FIELD_SIZEOF(struct iphdr, saddr) },
 	[IFLA_VTI_REMOTE]	= { .len = FIELD_SIZEOF(struct iphdr, daddr) },
 	[IFLA_VTI_FWMARK]	= { .type = NLA_U32 },
+	[IFLA_VTI_IFLAGS]	= { .type = NLA_U16 },
 };
 
 static struct rtnl_link_ops vti_link_ops __read_mostly = {
diff --git a/net/ipv6/ip6_vti.c b/net/ipv6/ip6_vti.c
index d0676f2f99eb..3797738c828f 100644
--- a/net/ipv6/ip6_vti.c
+++ b/net/ipv6/ip6_vti.c
@@ -54,9 +54,10 @@
 #define IP6_VTI_HASH_SIZE_SHIFT  5
 #define IP6_VTI_HASH_SIZE (1 << IP6_VTI_HASH_SIZE_SHIFT)
 
-static u32 HASH(const struct in6_addr *addr1, const struct in6_addr *addr2)
+static u32 HASH(const struct in6_addr *addr1, const struct in6_addr *addr2,
+		__be32 i_key)
 {
-	u32 hash = ipv6_addr_hash(addr1) ^ ipv6_addr_hash(addr2);
+	u32 hash = ipv6_addr_hash(addr1) ^ ipv6_addr_hash(addr2) ^ i_key;
 
 	return hash_32(hash, IP6_VTI_HASH_SIZE_SHIFT);
 }
@@ -78,11 +79,17 @@ struct vti6_net {
 #define for_each_vti6_tunnel_rcu(start) \
 	for (t = rcu_dereference(start); t; t = rcu_dereference(t->next))
 
+static __be32 vti6_get_hash_key(const struct __ip6_tnl_parm *p)
+{
+	return (p->i_flags & GRE_KEY) ? p->i_key : 0;
+}
+
 /**
- * vti6_tnl_lookup - fetch tunnel matching the end-point addresses
+ * vti6_tnl_lookup - fetch tunnel matching the end-point addresses and i_key
  *   @net: network namespace
  *   @remote: the address of the tunnel exit-point
  *   @local: the address of the tunnel entry-point
+ *   @local: the i_key of the tunnel
  *
  * Return:
  *   tunnel matching given end-points if found,
@@ -91,9 +98,9 @@ struct vti6_net {
  **/
 static struct ip6_tnl *
 vti6_tnl_lookup(struct net *net, const struct in6_addr *remote,
-		const struct in6_addr *local)
+		const struct in6_addr *local, __be32 i_key)
 {
-	unsigned int hash = HASH(remote, local);
+	unsigned int hash = HASH(remote, local, i_key);
 	struct ip6_tnl *t;
 	struct vti6_net *ip6n = net_generic(net, vti6_net_id);
 	struct in6_addr any;
@@ -101,21 +108,24 @@ vti6_tnl_lookup(struct net *net, const struct in6_addr *remote,
 	for_each_vti6_tunnel_rcu(ip6n->tnls_r_l[hash]) {
 		if (ipv6_addr_equal(local, &t->parms.laddr) &&
 		    ipv6_addr_equal(remote, &t->parms.raddr) &&
+		    vti6_get_hash_key(&t->parms) == i_key &&
 		    (t->dev->flags & IFF_UP))
 			return t;
 	}
 
 	memset(&any, 0, sizeof(any));
-	hash = HASH(&any, local);
+	hash = HASH(&any, local, i_key);
 	for_each_vti6_tunnel_rcu(ip6n->tnls_r_l[hash]) {
 		if (ipv6_addr_equal(local, &t->parms.laddr) &&
+		    vti6_get_hash_key(&t->parms) == i_key &&
 		    (t->dev->flags & IFF_UP))
 			return t;
 	}
 
-	hash = HASH(remote, &any);
+	hash = HASH(remote, &any, i_key);
 	for_each_vti6_tunnel_rcu(ip6n->tnls_r_l[hash]) {
 		if (ipv6_addr_equal(remote, &t->parms.raddr) &&
+		    vti6_get_hash_key(&t->parms) == i_key &&
 		    (t->dev->flags & IFF_UP))
 			return t;
 	}
@@ -147,7 +157,7 @@ vti6_tnl_bucket(struct vti6_net *ip6n, const struct __ip6_tnl_parm *p)
 
 	if (!ipv6_addr_any(remote) || !ipv6_addr_any(local)) {
 		prio = 1;
-		h = HASH(remote, local);
+		h = HASH(remote, local, vti6_get_hash_key(p));
 	}
 	return &ip6n->tnls[prio][h];
 }
@@ -266,7 +276,8 @@ static struct ip6_tnl *vti6_locate(struct net *net, struct __ip6_tnl_parm *p,
 	     (t = rtnl_dereference(*tp)) != NULL;
 	     tp = &t->next) {
 		if (ipv6_addr_equal(local, &t->parms.laddr) &&
-		    ipv6_addr_equal(remote, &t->parms.raddr)) {
+		    ipv6_addr_equal(remote, &t->parms.raddr) &&
+		    vti6_get_hash_key(&t->parms) == vti6_get_hash_key(p)) {
 			if (create)
 				return NULL;
 
@@ -304,11 +315,21 @@ vti6_find_tunnel(struct sk_buff *skb, __be32 spi, struct xfrm_state **x)
 	struct net *net = dev_net(skb->dev);
 	struct ip6_tnl *t;
 
-	t = vti6_tnl_lookup(net, &ipv6h->saddr, &ipv6h->daddr);
+	t = vti6_tnl_lookup(net, &ipv6h->saddr, &ipv6h->daddr, 0);
 	if (t) {
 		*x = xfrm_state_lookup(net, be32_to_cpu(t->parms.i_key),
 				       (xfrm_address_t *)&ipv6h->daddr,
 				       spi, ipv6h->nexthdr, AF_INET6);
+	} else {
+		*x = xfrm_state_lookup_loose(net, skb->mark,
+					     (xfrm_address_t *) &ipv6h->daddr,
+					     spi, ipv6h->nexthdr, AF_INET6);
+		if (!*x)
+			return NULL;
+		t =  vti6_tnl_lookup(net, &ipv6h->saddr, &ipv6h->daddr,
+				     cpu_to_be32((*x)->mark.v));
+		if (!t)
+			xfrm_state_put(*x);
 	}
 
 	return t;
@@ -613,7 +634,6 @@ static int vti6_err(struct sk_buff *skb, struct inet6_skb_parm *opt,
 		    u8 type, u8 code, int offset, __be32 info)
 {
 	__be32 spi;
-	__u32 mark;
 	struct xfrm_state *x;
 	struct ip6_tnl *t;
 	struct ip_esp_hdr *esph;
@@ -623,12 +643,6 @@ static int vti6_err(struct sk_buff *skb, struct inet6_skb_parm *opt,
 	const struct ipv6hdr *iph = (const struct ipv6hdr *)skb->data;
 	int protocol = iph->nexthdr;
 
-	t = vti6_tnl_lookup(dev_net(skb->dev), &iph->daddr, &iph->saddr);
-	if (!t)
-		return -1;
-
-	mark = be32_to_cpu(t->parms.o_key);
-
 	switch (protocol) {
 	case IPPROTO_ESP:
 		esph = (struct ip_esp_hdr *)(skb->data + offset);
@@ -650,19 +664,35 @@ static int vti6_err(struct sk_buff *skb, struct inet6_skb_parm *opt,
 	    type != NDISC_REDIRECT)
 		return 0;
 
-	x = xfrm_state_lookup(net, mark, (const xfrm_address_t *)&iph->daddr,
-			      spi, protocol, AF_INET6);
-	if (!x)
-		return 0;
+	t = vti6_tnl_lookup(net, &iph->daddr, &iph->saddr, 0);
+	if (t) {
+		x = xfrm_state_lookup(net, be32_to_cpu(t->parms.o_key),
+				      (xfrm_address_t *)&iph->daddr,
+				      spi, protocol, AF_INET6);
+	} else {
+		x = xfrm_state_lookup_loose(net, skb->mark,
+					    (xfrm_address_t *) &iph->daddr,
+					    spi, protocol, AF_INET6);
+		if (!x)
+			goto out;
+		t = vti6_tnl_lookup(net, &iph->daddr, &iph->saddr,
+				    cpu_to_be32(x->mark.v));
+	}
+
+	if (!t || !x)
+		goto out;
 
 	if (type == NDISC_REDIRECT)
 		ip6_redirect(skb, net, skb->dev->ifindex, 0,
 			     sock_net_uid(net, NULL));
 	else
 		ip6_update_pmtu(skb, net, info, 0, 0, sock_net_uid(net, NULL));
-	xfrm_state_put(x);
 
-	return 0;
+out:
+	if (x)
+		xfrm_state_put(x);
+
+	return t ? 0 : -1;
 }
 
 static void vti6_link_config(struct ip6_tnl *t)
@@ -957,9 +987,21 @@ static int vti6_validate(struct nlattr *tb[], struct nlattr *data[],
 	return 0;
 }
 
+static __be16 vti_flags_to_tnl_flags(__be16 i_flags)
+{
+	return VTI_ISVTI | ((i_flags & VTI_KEYED) ? GRE_KEY : 0);
+}
+
+static __be16 tnl_flags_to_vti_flags(__be16 i_flags)
+{
+	return VTI_ISVTI | ((i_flags & GRE_KEY) ? VTI_KEYED : 0);
+}
+
 static void vti6_netlink_parms(struct nlattr *data[],
 			       struct __ip6_tnl_parm *parms)
 {
+	__be16 i_flags = 0;
+
 	memset(parms, 0, sizeof(*parms));
 
 	if (!data)
@@ -982,6 +1024,11 @@ static void vti6_netlink_parms(struct nlattr *data[],
 
 	if (data[IFLA_VTI_FWMARK])
 		parms->fwmark = nla_get_u32(data[IFLA_VTI_FWMARK]);
+
+	if (data[IFLA_VTI_IFLAGS])
+		i_flags = nla_get_be16(data[IFLA_VTI_IFLAGS]);
+
+	parms->i_flags = vti_flags_to_tnl_flags(i_flags);
 }
 
 static int vti6_newlink(struct net *src_net, struct net_device *dev,
@@ -1051,6 +1098,8 @@ static size_t vti6_get_size(const struct net_device *dev)
 		nla_total_size(4) +
 		/* IFLA_VTI_FWMARK */
 		nla_total_size(4) +
+		/* IFLA_VTI_IFLAGS */
+		nla_total_size(2) +
 		0;
 }
 
@@ -1064,7 +1113,9 @@ static int vti6_fill_info(struct sk_buff *skb, const struct net_device *dev)
 	    nla_put_in6_addr(skb, IFLA_VTI_REMOTE, &parm->raddr) ||
 	    nla_put_be32(skb, IFLA_VTI_IKEY, parm->i_key) ||
 	    nla_put_be32(skb, IFLA_VTI_OKEY, parm->o_key) ||
-	    nla_put_u32(skb, IFLA_VTI_FWMARK, parm->fwmark))
+	    nla_put_u32(skb, IFLA_VTI_FWMARK, parm->fwmark) ||
+	    nla_put_be16(skb, IFLA_VTI_IFLAGS,
+			 tnl_flags_to_vti_flags(parm->i_flags)))
 		goto nla_put_failure;
 	return 0;
 
@@ -1079,6 +1130,7 @@ static const struct nla_policy vti6_policy[IFLA_VTI_MAX + 1] = {
 	[IFLA_VTI_IKEY]		= { .type = NLA_U32 },
 	[IFLA_VTI_OKEY]		= { .type = NLA_U32 },
 	[IFLA_VTI_FWMARK]	= { .type = NLA_U32 },
+	[IFLA_VTI_IFLAGS]	= { .type = NLA_U16 },
 };
 
 static struct rtnl_link_ops vti6_link_ops __read_mostly = {
-- 
2.15.1.504.g5279b80103-goog

^ permalink raw reply related

* [RFC ipsec-next 4/4] net: xfrm: don't pass tunnel objects to xfrm6_rcv_spi.
From: Lorenzo Colitti @ 2017-12-18 16:16 UTC (permalink / raw)
  To: netdev; +Cc: steffen.klassert, subashab, nharold, Lorenzo Colitti
In-Reply-To: <20171218161656.40618-1-lorenzo@google.com>

This change removes the tunnel parameter from xfrm6_rcv_spi and
deletes xfrm6_rcv_tnl. These were only used by the VTI code and
are now unused.

Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
---
 include/net/xfrm.h      |  4 +---
 net/ipv4/ip_vti.c       |  4 ++--
 net/ipv6/ip6_vti.c      |  2 +-
 net/ipv6/xfrm6_input.c  | 13 +++----------
 net/ipv6/xfrm6_tunnel.c |  2 +-
 5 files changed, 8 insertions(+), 17 deletions(-)

diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index 3d245f2f6f6c..fc19dda73c50 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -1638,10 +1638,8 @@ int xfrm4_tunnel_deregister(struct xfrm_tunnel *handler, unsigned short family);
 void xfrm4_local_error(struct sk_buff *skb, u32 mtu);
 int xfrm6_extract_header(struct sk_buff *skb);
 int xfrm6_extract_input(struct xfrm_state *x, struct sk_buff *skb);
-int xfrm6_rcv_spi(struct sk_buff *skb, int nexthdr, __be32 spi,
-		  struct ip6_tnl *t);
+int xfrm6_rcv_spi(struct sk_buff *skb, int nexthdr, __be32 spi);
 int xfrm6_transport_finish(struct sk_buff *skb, int async);
-int xfrm6_rcv_tnl(struct sk_buff *skb, struct ip6_tnl *t);
 int xfrm6_rcv(struct sk_buff *skb);
 int xfrm6_input_addr(struct sk_buff *skb, xfrm_address_t *daddr,
 		     xfrm_address_t *saddr, u8 proto);
diff --git a/net/ipv4/ip_vti.c b/net/ipv4/ip_vti.c
index f5793782c418..144ec34fd975 100644
--- a/net/ipv4/ip_vti.c
+++ b/net/ipv4/ip_vti.c
@@ -384,12 +384,12 @@ static int vti4_err(struct sk_buff *skb, u32 info)
 
 static __be16 vti_flags_to_tnl_flags(__be16 i_flags)
 {
-	return VTI_ISVTI | ((i_flags & VTI_KEYED) ? GRE_KEY : 0);
+	return VTI_ISVTI | ((i_flags & VTI_KEYED) ? TUNNEL_KEY : 0);
 }
 
 static __be16 tnl_flags_to_vti_flags(__be16 i_flags)
 {
-	return VTI_ISVTI | ((i_flags & GRE_KEY) ? VTI_KEYED : 0);
+	return VTI_ISVTI | ((i_flags & TUNNEL_KEY) ? VTI_KEYED : 0);
 }
 
 static int
diff --git a/net/ipv6/ip6_vti.c b/net/ipv6/ip6_vti.c
index 3797738c828f..3a68b7ba1b9c 100644
--- a/net/ipv6/ip6_vti.c
+++ b/net/ipv6/ip6_vti.c
@@ -391,7 +391,7 @@ static int vti6_rcv(struct sk_buff *skb)
 	int nexthdr = skb_network_header(skb)[IP6CB(skb)->nhoff];
 
 	XFRM_TUNNEL_SKB_CB(skb)->tunnel.lookup = vti6_lookup;
-	return xfrm6_rcv_spi(skb, nexthdr, 0, NULL);
+	return xfrm6_rcv_spi(skb, nexthdr, 0);
 }
 
 static int vti6_rcv_cb(struct sk_buff *skb, int err)
diff --git a/net/ipv6/xfrm6_input.c b/net/ipv6/xfrm6_input.c
index 6d1b734fef8d..5f20e309263f 100644
--- a/net/ipv6/xfrm6_input.c
+++ b/net/ipv6/xfrm6_input.c
@@ -22,8 +22,7 @@ int xfrm6_extract_input(struct xfrm_state *x, struct sk_buff *skb)
 	return xfrm6_extract_header(skb);
 }
 
-int xfrm6_rcv_spi(struct sk_buff *skb, int nexthdr, __be32 spi,
-		  struct ip6_tnl *t)
+int xfrm6_rcv_spi(struct sk_buff *skb, int nexthdr, __be32 spi)
 {
 	XFRM_SPI_SKB_CB(skb)->family = AF_INET6;
 	XFRM_SPI_SKB_CB(skb)->daddroff = offsetof(struct ipv6hdr, daddr);
@@ -59,16 +58,10 @@ int xfrm6_transport_finish(struct sk_buff *skb, int async)
 	return -1;
 }
 
-int xfrm6_rcv_tnl(struct sk_buff *skb, struct ip6_tnl *t)
-{
-	return xfrm6_rcv_spi(skb, skb_network_header(skb)[IP6CB(skb)->nhoff],
-			     0, t);
-}
-EXPORT_SYMBOL(xfrm6_rcv_tnl);
-
 int xfrm6_rcv(struct sk_buff *skb)
 {
-	return xfrm6_rcv_tnl(skb, NULL);
+	return xfrm6_rcv_spi(skb, skb_network_header(skb)[IP6CB(skb)->nhoff],
+			     0);
 }
 EXPORT_SYMBOL(xfrm6_rcv);
 int xfrm6_input_addr(struct sk_buff *skb, xfrm_address_t *daddr,
diff --git a/net/ipv6/xfrm6_tunnel.c b/net/ipv6/xfrm6_tunnel.c
index f85f0d7480ac..02161543a932 100644
--- a/net/ipv6/xfrm6_tunnel.c
+++ b/net/ipv6/xfrm6_tunnel.c
@@ -236,7 +236,7 @@ static int xfrm6_tunnel_rcv(struct sk_buff *skb)
 	__be32 spi;
 
 	spi = xfrm6_tunnel_spi_lookup(net, (const xfrm_address_t *)&iph->saddr);
-	return xfrm6_rcv_spi(skb, IPPROTO_IPV6, spi, NULL);
+	return xfrm6_rcv_spi(skb, IPPROTO_IPV6, spi);
 }
 
 static int xfrm6_tunnel_err(struct sk_buff *skb, struct inet6_skb_parm *opt,
-- 
2.15.1.504.g5279b80103-goog

^ permalink raw reply related

* [PATCH 4.14 121/178] l2tp: cleanup l2tp_tunnel_delete calls
From: Greg Kroah-Hartman @ 2017-12-18 15:49 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Jiri Slaby, Sabrina Dubroca,
	Guillaume Nault, David S. Miller, netdev, Sasha Levin
In-Reply-To: <20171218152920.567991776@linuxfoundation.org>

4.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Jiri Slaby <jslaby@suse.cz>


[ Upstream commit 4dc12ffeaeac939097a3f55c881d3dc3523dff0c ]

l2tp_tunnel_delete does not return anything since commit 62b982eeb458
("l2tp: fix race condition in l2tp_tunnel_delete").  But call sites of
l2tp_tunnel_delete still do casts to void to avoid unused return value
warnings.

Kill these now useless casts.

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Sabrina Dubroca <sd@queasysnail.net>
Cc: Guillaume Nault <g.nault@alphalink.fr>
Cc: David S. Miller <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Acked-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/l2tp/l2tp_core.c    |    2 +-
 net/l2tp/l2tp_netlink.c |    2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

--- a/net/l2tp/l2tp_core.c
+++ b/net/l2tp/l2tp_core.c
@@ -1891,7 +1891,7 @@ static __net_exit void l2tp_exit_net(str
 
 	rcu_read_lock_bh();
 	list_for_each_entry_rcu(tunnel, &pn->l2tp_tunnel_list, list) {
-		(void)l2tp_tunnel_delete(tunnel);
+		l2tp_tunnel_delete(tunnel);
 	}
 	rcu_read_unlock_bh();
 
--- a/net/l2tp/l2tp_netlink.c
+++ b/net/l2tp/l2tp_netlink.c
@@ -282,7 +282,7 @@ static int l2tp_nl_cmd_tunnel_delete(str
 	l2tp_tunnel_notify(&l2tp_nl_family, info,
 			   tunnel, L2TP_CMD_TUNNEL_DELETE);
 
-	(void) l2tp_tunnel_delete(tunnel);
+	l2tp_tunnel_delete(tunnel);
 
 	l2tp_tunnel_dec_refcount(tunnel);
 

^ permalink raw reply

* Re: BUG: unable to handle kernel NULL pointer dereference in rds_send_xmit
From: Santosh Shilimkar @ 2017-12-18 16:28 UTC (permalink / raw)
  To: syzbot, davem-fT/PcQaiUtIeIZ0/mPfg9Q,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
	rds-devel-N0ozoZBvEnrZJqsBc5GL+g,
	syzkaller-bugs-/JYPxA39Uh5TLH3MbocFFw
In-Reply-To: <001a1145ac5480242305609956b3-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>

On 12/18/2017 12:43 AM, syzbot wrote:
> Hello,
> 
> syzkaller hit the following crash on 
> 6084b576dca2e898f5c101baef151f7bfdbb606d
> git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/master
> compiler: gcc (GCC) 7.1.1 20170620
> .config is attached
> Raw console output is attached.
> 
> Unfortunately, I don't have any reproducer for this bug yet.
> 
> 
> BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
> program syz-executor6 is using a deprecated SCSI ioctl, please convert 
> it to SG_IO
> IP: rds_send_xmit+0x80/0x930 net/rds/send.c:186

Looks like another one tripping on empty transport. Mostly below should
address it but we will test it if it does.

diff --git a/net/rds/send.c b/net/rds/send.c
index 7244d2e..e2d0eaa 100644
--- a/net/rds/send.c
+++ b/net/rds/send.c
@@ -183,7 +183,7 @@ int rds_send_xmit(struct rds_conn_path *cp)
                 goto out;
         }

-       if (conn->c_trans->xmit_path_prepare)
+       if (conn->c_trans && conn->c_trans->xmit_path_prepare)
                 conn->c_trans->xmit_path_prepare(cp);



--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [net  1/1] tipc: fix lost member events bug
From: Jon Maloy @ 2017-12-18 16:34 UTC (permalink / raw)
  To: davem, netdev
  Cc: tipc-discussion, hoang.h.le, mohan.krishna.ghanta.krishnamurthy

Group messages are not supposed to be returned to sender when the
destination socket disappears. This is done correctly for regular
traffic messages, by setting the 'dest_droppable' bit in the header.
But we forget to do that in group protocol messages. This has the effect
that such messages may sometimes bounce back to the sender, be perceived
as a legitimate peer message, and wreak general havoc for the rest of
the session. In particular, we have seen that a member in state LEAVING
may go back to state RECLAIMED or REMITTED, hence causing suppression
of an otherwise expected 'member down' event to the user.

We fix this by setting the 'dest_droppable' bit even in group protocol
messages.

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
---
 net/tipc/group.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/tipc/group.c b/net/tipc/group.c
index 95fec2c..efb5714 100644
--- a/net/tipc/group.c
+++ b/net/tipc/group.c
@@ -648,6 +648,7 @@ static void tipc_group_proto_xmit(struct tipc_group *grp, struct tipc_member *m,
 	} else if (mtyp == GRP_REMIT_MSG) {
 		msg_set_grp_remitted(hdr, m->window);
 	}
+	msg_set_dest_droppable(hdr, true);
 	__skb_queue_tail(xmitq, skb);
 }

-- 
2.1.4

------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot

^ permalink raw reply related

* Re: [PATCH v2 0/5] Support for generalized use of make C={1,2} via a wrapper program
From: Knut Omang @ 2017-12-18 16:41 UTC (permalink / raw)
  To: Joe Perches, Jason Gunthorpe
  Cc: Stephen Hemminger, linux-kernel, Mauro Carvalho Chehab,
	Nicolas Palix, Jonathan Corbet, Santosh Shilimkar, Matthew Wilcox,
	cocci, rds-devel, linux-rdma, linux-doc, Doug Ledford,
	Mickaël Salaün, Shuah Khan, linux-kbuild, Michal Marek,
	Julia Lawall, John Haxby, Åsmund Østvold,
	Masahiro Yamada
In-Reply-To: <1513611003.31581.71.camel@perches.com>

On Mon, 2017-12-18 at 07:30 -0800, Joe Perches wrote:
> On Mon, 2017-12-18 at 14:05 +0100, Knut Omang wrote:
> > > Here is a list of the checkpatch messages for drivers/infiniband
> > > sorted by type.
> > > 
> > > Many of these might be corrected by using
> > > 
> > > $ ./scripts/checkpatch.pl -f --fix-inplace --types=<TYPE> \
> > >   $(git ls-files drivers/infiniband/)
> > 
> > Yes, and I already did that work piece by piece for individual types,
> > just to test the runchecks tool, and want to post that set once the 
> > runchecks script and Makefile changes itself are in,
> 
> I think those are independent of any runcheck tool changes and
> could be posted now.  In general, don't keep patches in a local
> tree waiting on some other unrelated patch.

It becomes related in that the runchecks.cfg file is updated 
in all the patches to keep 'make C=2' run with 0 errors while 
enabling more checks. I think they serve well as examples of 
how a workflow with runchecks could be.

> Just fyi:
> 
> There is a script that helps automate checkpatch "by-type" conversions
> with compilation, .o difference checking, and git commit editing.
> 
> https://lkml.org/lkml/2014/7/11/794

oh - good to know - seems it would have been a good help
during my little exercise..

Thanks,
Knut

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox