Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH v4 0/3] mm, treewide: Rename kzfree() to kfree_sensitive()
From: Waiman Long @ 2020-06-16 20:01 UTC (permalink / raw)
  To: Joe Perches, Andrew Morton, David Howells, Jarkko Sakkinen,
	James Morris, Serge E. Hallyn, Linus Torvalds, Matthew Wilcox,
	David Rientjes
  Cc: Michal Hocko, Johannes Weiner, Dan Carpenter, David Sterba,
	Jason A . Donenfeld, linux-mm, keyrings, linux-kernel,
	linux-crypto, linux-pm, linux-stm32, linux-amlogic,
	linux-mediatek, linuxppc-dev, virtualization, netdev, linux-ppp,
	wireguard, linux-wireless, devel, linux-scsi, target-devel,
	linux-btrfs, linux-cifs, linux-fscrypt, ecryptfs, kasan-dev,
	linux-bluetooth, linux-wpan, linux-sctp, linux-nfs,
	tipc-discussion, linux-security-module, linux-integrity
In-Reply-To: <fe3b9a437be4aeab3bac68f04193cb6daaa5bee4.camel@perches.com>

On 6/16/20 2:53 PM, Joe Perches wrote:
> On Mon, 2020-06-15 at 21:57 -0400, Waiman Long wrote:
>>   v4:
>>    - Break out the memzero_explicit() change as suggested by Dan Carpenter
>>      so that it can be backported to stable.
>>    - Drop the "crypto: Remove unnecessary memzero_explicit()" patch for
>>      now as there can be a bit more discussion on what is best. It will be
>>      introduced as a separate patch later on after this one is merged.
> To this larger audience and last week without reply:
> https://lore.kernel.org/lkml/573b3fbd5927c643920e1364230c296b23e7584d.camel@perches.com/
>
> Are there _any_ fastpath uses of kfree or vfree?
>
> Many patches have been posted recently to fix mispairings
> of specific types of alloc and free functions.
>
> To eliminate these mispairings at a runtime cost of four
> comparisons, should the kfree/vfree/kvfree/kfree_const
> functions be consolidated into a single kfree?
>
> Something like the below:
>
>     void kfree(const void *addr)
>     {
>     	if (is_kernel_rodata((unsigned long)addr))
>     		return;
>
>     	if (is_vmalloc_addr(addr))
>     		_vfree(addr);
>     	else
>     		_kfree(addr);
>     }
>
>     #define kvfree		kfree
>     #define vfree		kfree
>     #define kfree_const	kfree
>
>
How about adding CONFIG_DEBUG_VM code to check for invalid address 
ranges in kfree() and vfree()? By doing this, we can catch unmatched 
pairing in debug mode, but won't have the overhead when debug mode is off.

Thought?

Cheers,
Longman


^ permalink raw reply

* Re: [PATCH v4 0/3] mm, treewide: Rename kzfree() to kfree_sensitive()
From: Jason A. Donenfeld @ 2020-06-16 19:46 UTC (permalink / raw)
  To: Joe Perches
  Cc: Waiman Long, Andrew Morton, David Howells, Jarkko Sakkinen,
	James Morris, Serge E. Hallyn, Linus Torvalds, Matthew Wilcox,
	David Rientjes, Michal Hocko, Johannes Weiner, Dan Carpenter,
	David Sterba, Linux-MM, keyrings, LKML, Linux Crypto Mailing List,
	linux-pm, linux-stm32, linux-amlogic, linux-mediatek,
	linuxppc-dev, virtualization, Netdev, linux-ppp,
	WireGuard mailing list, linux-wireless, devel, linux-scsi,
	target-devel, linux-btrfs, linux-cifs, linux-fscrypt, ecryptfs,
	kasan-dev, linux-bluetooth, linux-wpan, linux-sctp, linux-nfs,
	tipc-discussion, linux-security-module, linux-integrity,
	David Miller, Steffen Klassert
In-Reply-To: <fe3b9a437be4aeab3bac68f04193cb6daaa5bee4.camel@perches.com>

On Tue, Jun 16, 2020 at 12:54 PM Joe Perches <joe@perches.com> wrote:
>
> On Mon, 2020-06-15 at 21:57 -0400, Waiman Long wrote:
> >  v4:
> >   - Break out the memzero_explicit() change as suggested by Dan Carpenter
> >     so that it can be backported to stable.
> >   - Drop the "crypto: Remove unnecessary memzero_explicit()" patch for
> >     now as there can be a bit more discussion on what is best. It will be
> >     introduced as a separate patch later on after this one is merged.
>
> To this larger audience and last week without reply:
> https://lore.kernel.org/lkml/573b3fbd5927c643920e1364230c296b23e7584d.camel@perches.com/
>
> Are there _any_ fastpath uses of kfree or vfree?

The networking stack has various places where there will be a quick
kmalloc followed by a kfree for an incoming or outgoing packet. One
place that comes to mind would be esp_alloc_tmp, which does a quick
allocation of some temporary kmalloc memory, processes some packet
things inside of that, and then frees it, sometimes in the same
function, and sometimes later in an async callback. I don't know how
"fastpath" you consider this, but usually packet processing is
something people want to do with minimal overhead, considering how
fast NICs are these days.

^ permalink raw reply

* Re: [PATCH v4 0/3] mm, treewide: Rename kzfree() to kfree_sensitive()
From: Waiman Long @ 2020-06-16 19:43 UTC (permalink / raw)
  To: Joe Perches, Andrew Morton, David Howells, Jarkko Sakkinen,
	James Morris, Serge E. Hallyn, Linus Torvalds, Matthew Wilcox,
	David Rientjes
  Cc: Michal Hocko, Johannes Weiner, Dan Carpenter, David Sterba,
	Jason A . Donenfeld, linux-mm, keyrings, linux-kernel,
	linux-crypto, linux-pm, linux-stm32, linux-amlogic,
	linux-mediatek, linuxppc-dev, virtualization, netdev, linux-ppp,
	wireguard, linux-wireless, devel, linux-scsi, target-devel,
	linux-btrfs, linux-cifs, linux-fscrypt, ecryptfs, kasan-dev,
	linux-bluetooth, linux-wpan, linux-sctp, linux-nfs,
	tipc-discussion, linux-security-module, linux-integrity
In-Reply-To: <fe3b9a437be4aeab3bac68f04193cb6daaa5bee4.camel@perches.com>

On 6/16/20 2:53 PM, Joe Perches wrote:
> On Mon, 2020-06-15 at 21:57 -0400, Waiman Long wrote:
>>   v4:
>>    - Break out the memzero_explicit() change as suggested by Dan Carpenter
>>      so that it can be backported to stable.
>>    - Drop the "crypto: Remove unnecessary memzero_explicit()" patch for
>>      now as there can be a bit more discussion on what is best. It will be
>>      introduced as a separate patch later on after this one is merged.
> To this larger audience and last week without reply:
> https://lore.kernel.org/lkml/573b3fbd5927c643920e1364230c296b23e7584d.camel@perches.com/
>
> Are there _any_ fastpath uses of kfree or vfree?

I am not sure about that, but both of them can be slow.


>
> Many patches have been posted recently to fix mispairings
> of specific types of alloc and free functions.
>
> To eliminate these mispairings at a runtime cost of four
> comparisons, should the kfree/vfree/kvfree/kfree_const
> functions be consolidated into a single kfree?
>
> Something like the below:
>
>     void kfree(const void *addr)
>     {
>     	if (is_kernel_rodata((unsigned long)addr))
>     		return;
>
>     	if (is_vmalloc_addr(addr))
>     		_vfree(addr);
>     	else
>     		_kfree(addr);
>     }
>
is_kernel_rodata() is inlined, but is_vmalloc_addr() isn't. So the 
overhead can be a bit bigger.

Cheers,
Longman


^ permalink raw reply

* Re: [PATCH v4 0/3] mm, treewide: Rename kzfree() to kfree_sensitive()
From: Joe Perches @ 2020-06-16 18:53 UTC (permalink / raw)
  To: Waiman Long, Andrew Morton, David Howells, Jarkko Sakkinen,
	James Morris, Serge E. Hallyn, Linus Torvalds, Matthew Wilcox,
	David Rientjes
  Cc: Michal Hocko, Johannes Weiner, Dan Carpenter, David Sterba,
	Jason A . Donenfeld, linux-mm, keyrings, linux-kernel,
	linux-crypto, linux-pm, linux-stm32, linux-amlogic,
	linux-mediatek, linuxppc-dev, virtualization, netdev, linux-ppp,
	wireguard, linux-wireless, devel, linux-scsi, target-devel,
	linux-btrfs, linux-cifs, linux-fscrypt, ecryptfs, kasan-dev,
	linux-bluetooth, linux-wpan, linux-sctp, linux-nfs,
	tipc-discussion, linux-security-module, linux-integrity
In-Reply-To: <20200616015718.7812-1-longman@redhat.com>

On Mon, 2020-06-15 at 21:57 -0400, Waiman Long wrote:
>  v4:
>   - Break out the memzero_explicit() change as suggested by Dan Carpenter
>     so that it can be backported to stable.
>   - Drop the "crypto: Remove unnecessary memzero_explicit()" patch for
>     now as there can be a bit more discussion on what is best. It will be
>     introduced as a separate patch later on after this one is merged.

To this larger audience and last week without reply:
https://lore.kernel.org/lkml/573b3fbd5927c643920e1364230c296b23e7584d.camel@perches.com/

Are there _any_ fastpath uses of kfree or vfree?

Many patches have been posted recently to fix mispairings
of specific types of alloc and free functions.

To eliminate these mispairings at a runtime cost of four
comparisons, should the kfree/vfree/kvfree/kfree_const
functions be consolidated into a single kfree?

Something like the below:

   void kfree(const void *addr)
   {
   	if (is_kernel_rodata((unsigned long)addr))
   		return;

   	if (is_vmalloc_addr(addr))
   		_vfree(addr);
   	else
   		_kfree(addr);
   }

   #define kvfree		kfree
   #define vfree		kfree
   #define kfree_const	kfree



^ permalink raw reply

* Re: [PATCH 1/2] net: qrtr: Migrate nameservice to kernel from userspace
From: Qais Yousef @ 2020-06-16 18:48 UTC (permalink / raw)
  To: Manivannan Sadhasivam; +Cc: davem, kuba, bjorn.andersson, netdev, linux-kernel
In-Reply-To: <20200213091427.13435-2-manivannan.sadhasivam@linaro.org>

Hi Manivannan, David

On 02/13/20 14:44, Manivannan Sadhasivam wrote:

[...]

> +	trace_printk("advertising new server [%d:%x]@[%d:%d]\n",
> +		     srv->service, srv->instance, srv->node, srv->port);

I can't tell exactly from the discussion whether this is the version that got
merged into 5.7 or not, but it does match the commit message.

This patch introduces several trace_printk() which AFAIK is intended for
debugging only and shouldn't make it into mainline kernels.

It causes this big message to be printed to the log too

[    0.000000] **********************************************************
[    0.000000] **   NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE   **
[    0.000000] **                                                      **
[    0.000000] ** trace_printk() being used. Allocating extra memory.  **
[    0.000000] **                                                      **
[    0.000000] ** This means that this is a DEBUG kernel and it is     **
[    0.000000] ** unsafe for production use.                           **
[    0.000000] **                                                      **
[    0.000000] ** If you see this message and you are not debugging    **
[    0.000000] ** the kernel, report this immediately to your vendor!  **
[    0.000000] **                                                      **
[    0.000000] **   NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE   **
[    0.000000] **********************************************************

Shouldn't this be replaced with one of pr_*() variants instead?

Thanks

--
Qais Yousef

^ permalink raw reply

* Re: [PATCH v5 2/2] mm, treewide: Rename kzfree() to kfree_sensitive()
From: Waiman Long @ 2020-06-16 18:36 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Howells, Jarkko Sakkinen, James Morris, Serge E. Hallyn,
	Linus Torvalds, Joe Perches, Matthew Wilcox, David Rientjes,
	Michal Hocko, Johannes Weiner, Dan Carpenter, Jason A . Donenfeld,
	linux-mm, keyrings, linux-kernel, linux-crypto, linux-pm,
	linux-stm32, linux-amlogic, linux-mediatek, linuxppc-dev,
	virtualization, netdev, linux-ppp, wireguard, linux-wireless,
	devel, linux-scsi, target-devel, linux-cifs, linux-fscrypt,
	ecryptfs, kasan-dev, linux-bluetooth, linux-wpan, linux-sctp,
	linux-nfs, tipc-discussion, linux-security-module,
	linux-integrity
In-Reply-To: <20200616110944.c13f221e5c3f54e775190afe@linux-foundation.org>

On 6/16/20 2:09 PM, Andrew Morton wrote:
> On Tue, 16 Jun 2020 11:43:11 -0400 Waiman Long <longman@redhat.com> wrote:
>
>> As said by Linus:
>>
>>    A symmetric naming is only helpful if it implies symmetries in use.
>>    Otherwise it's actively misleading.
>>
>>    In "kzalloc()", the z is meaningful and an important part of what the
>>    caller wants.
>>
>>    In "kzfree()", the z is actively detrimental, because maybe in the
>>    future we really _might_ want to use that "memfill(0xdeadbeef)" or
>>    something. The "zero" part of the interface isn't even _relevant_.
>>
>> The main reason that kzfree() exists is to clear sensitive information
>> that should not be leaked to other future users of the same memory
>> objects.
>>
>> Rename kzfree() to kfree_sensitive() to follow the example of the
>> recently added kvfree_sensitive() and make the intention of the API
>> more explicit. In addition, memzero_explicit() is used to clear the
>> memory to make sure that it won't get optimized away by the compiler.
>>
>> The renaming is done by using the command sequence:
>>
>>    git grep -w --name-only kzfree |\
>>    xargs sed -i 's/\bkzfree\b/kfree_sensitive/'
>>
>> followed by some editing of the kfree_sensitive() kerneldoc and adding
>> a kzfree backward compatibility macro in slab.h.
>>
>> ...
>>
>> --- a/include/linux/slab.h
>> +++ b/include/linux/slab.h
>> @@ -186,10 +186,12 @@ void memcg_deactivate_kmem_caches(struct mem_cgroup *, struct mem_cgroup *);
>>    */
>>   void * __must_check krealloc(const void *, size_t, gfp_t);
>>   void kfree(const void *);
>> -void kzfree(const void *);
>> +void kfree_sensitive(const void *);
>>   size_t __ksize(const void *);
>>   size_t ksize(const void *);
>>   
>> +#define kzfree(x)	kfree_sensitive(x)	/* For backward compatibility */
>> +
> What was the thinking here?  Is this really necessary?
>
> I suppose we could keep this around for a while to ease migration.  But
> not for too long, please.
>
It should be there just for 1 release cycle. I have broken out the btrfs 
patch to the btrfs list and I didn't make the kzfree to kfree_sensitive 
conversion there as that patch was in front in my patch list. So 
depending on which one lands first, there can be a window where the 
compilation may fail without this workaround. I am going to send out 
another patch in the next release cycle to remove it.

Cheers,
Longman


^ permalink raw reply

* Re: [PATCH v5 2/2] mm, treewide: Rename kzfree() to kfree_sensitive()
From: Andrew Morton @ 2020-06-16 18:09 UTC (permalink / raw)
  To: Waiman Long
  Cc: David Howells, Jarkko Sakkinen, James Morris, Serge E. Hallyn,
	Linus Torvalds, Joe Perches, Matthew Wilcox, David Rientjes,
	Michal Hocko, Johannes Weiner, Dan Carpenter, Jason A . Donenfeld,
	linux-mm, keyrings, linux-kernel, linux-crypto, linux-pm,
	linux-stm32, linux-amlogic, linux-mediatek, linuxppc-dev,
	virtualization, netdev, linux-ppp, wireguard, linux-wireless,
	devel, linux-scsi, target-devel, linux-cifs, linux-fscrypt,
	ecryptfs, kasan-dev, linux-bluetooth, linux-wpan, linux-sctp,
	linux-nfs, tipc-discussion, linux-security-module,
	linux-integrity
In-Reply-To: <20200616154311.12314-3-longman@redhat.com>

On Tue, 16 Jun 2020 11:43:11 -0400 Waiman Long <longman@redhat.com> wrote:

> As said by Linus:
> 
>   A symmetric naming is only helpful if it implies symmetries in use.
>   Otherwise it's actively misleading.
> 
>   In "kzalloc()", the z is meaningful and an important part of what the
>   caller wants.
> 
>   In "kzfree()", the z is actively detrimental, because maybe in the
>   future we really _might_ want to use that "memfill(0xdeadbeef)" or
>   something. The "zero" part of the interface isn't even _relevant_.
> 
> The main reason that kzfree() exists is to clear sensitive information
> that should not be leaked to other future users of the same memory
> objects.
> 
> Rename kzfree() to kfree_sensitive() to follow the example of the
> recently added kvfree_sensitive() and make the intention of the API
> more explicit. In addition, memzero_explicit() is used to clear the
> memory to make sure that it won't get optimized away by the compiler.
> 
> The renaming is done by using the command sequence:
> 
>   git grep -w --name-only kzfree |\
>   xargs sed -i 's/\bkzfree\b/kfree_sensitive/'
> 
> followed by some editing of the kfree_sensitive() kerneldoc and adding
> a kzfree backward compatibility macro in slab.h.
> 
> ...
>
> --- a/include/linux/slab.h
> +++ b/include/linux/slab.h
> @@ -186,10 +186,12 @@ void memcg_deactivate_kmem_caches(struct mem_cgroup *, struct mem_cgroup *);
>   */
>  void * __must_check krealloc(const void *, size_t, gfp_t);
>  void kfree(const void *);
> -void kzfree(const void *);
> +void kfree_sensitive(const void *);
>  size_t __ksize(const void *);
>  size_t ksize(const void *);
>  
> +#define kzfree(x)	kfree_sensitive(x)	/* For backward compatibility */
> +

What was the thinking here?  Is this really necessary?

I suppose we could keep this around for a while to ease migration.  But
not for too long, please.

^ permalink raw reply

* [Patch net] cgroup: fix cgroup_sk_alloc() for sk_clone_lock()
From: Cong Wang @ 2020-06-16 18:03 UTC (permalink / raw)
  To: netdev
  Cc: Cong Wang, Cameron Berkenpas, Peter Geis, Lu Fengqi,
	Daniël Sonck, Daniel Borkmann, Zefan Li, Tejun Heo

When we clone a socket in sk_clone_lock(), its sk_cgrp_data is
copied, so the cgroup refcnt must be taken too. And, unlike the
sk_alloc() path, sock_update_netprioidx() is not called here.
Therefore, it is safe and necessary to grab the cgroup refcnt
even when cgroup_sk_alloc is disabled.

sk_clone_lock() is in BH context anyway, the in_interrupt()
would terminate this function if called there. And for sk_alloc()
skcd->val is always zero. So it's safe to factor out the code
to make it more readable.

Fixes: 090e28b229af92dc5b ("netprio_cgroup: Fix unlimited memory leak of v2 cgroups")
Reported-by: Cameron Berkenpas <cam@neo-zeon.de>
Reported-by: Peter Geis <pgwipeout@gmail.com>
Reported-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Reported-by: Daniël Sonck <dsonck92@gmail.com>
Tested-by: Cameron Berkenpas <cam@neo-zeon.de>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Zefan Li <lizefan@huawei.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
---
 include/linux/cgroup.h |  2 ++
 kernel/cgroup/cgroup.c | 26 ++++++++++++++------------
 net/core/sock.c        |  2 +-
 3 files changed, 17 insertions(+), 13 deletions(-)

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 4598e4da6b1b..818dc7b3ed6c 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -822,6 +822,7 @@ extern spinlock_t cgroup_sk_update_lock;
 
 void cgroup_sk_alloc_disable(void);
 void cgroup_sk_alloc(struct sock_cgroup_data *skcd);
+void cgroup_sk_clone(struct sock_cgroup_data *skcd);
 void cgroup_sk_free(struct sock_cgroup_data *skcd);
 
 static inline struct cgroup *sock_cgroup_ptr(struct sock_cgroup_data *skcd)
@@ -847,6 +848,7 @@ static inline struct cgroup *sock_cgroup_ptr(struct sock_cgroup_data *skcd)
 #else	/* CONFIG_CGROUP_DATA */
 
 static inline void cgroup_sk_alloc(struct sock_cgroup_data *skcd) {}
+static inline void cgroup_sk_clone(struct sock_cgroup_data *skcd) {}
 static inline void cgroup_sk_free(struct sock_cgroup_data *skcd) {}
 
 #endif	/* CONFIG_CGROUP_DATA */
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 1ea181a58465..6377045b7096 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -6442,18 +6442,6 @@ void cgroup_sk_alloc(struct sock_cgroup_data *skcd)
 	if (cgroup_sk_alloc_disabled)
 		return;
 
-	/* Socket clone path */
-	if (skcd->val) {
-		/*
-		 * We might be cloning a socket which is left in an empty
-		 * cgroup and the cgroup might have already been rmdir'd.
-		 * Don't use cgroup_get_live().
-		 */
-		cgroup_get(sock_cgroup_ptr(skcd));
-		cgroup_bpf_get(sock_cgroup_ptr(skcd));
-		return;
-	}
-
 	/* Don't associate the sock with unrelated interrupted task's cgroup. */
 	if (in_interrupt())
 		return;
@@ -6475,6 +6463,20 @@ void cgroup_sk_alloc(struct sock_cgroup_data *skcd)
 	rcu_read_unlock();
 }
 
+void cgroup_sk_clone(struct sock_cgroup_data *skcd)
+{
+	/* Socket clone path */
+	if (skcd->val) {
+		/*
+		 * We might be cloning a socket which is left in an empty
+		 * cgroup and the cgroup might have already been rmdir'd.
+		 * Don't use cgroup_get_live().
+		 */
+		cgroup_get(sock_cgroup_ptr(skcd));
+		cgroup_bpf_get(sock_cgroup_ptr(skcd));
+	}
+}
+
 void cgroup_sk_free(struct sock_cgroup_data *skcd)
 {
 	struct cgroup *cgrp = sock_cgroup_ptr(skcd);
diff --git a/net/core/sock.c b/net/core/sock.c
index 6c4acf1f0220..b62f06fa5e37 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1925,7 +1925,7 @@ struct sock *sk_clone_lock(const struct sock *sk, const gfp_t priority)
 		/* sk->sk_memcg will be populated at accept() time */
 		newsk->sk_memcg = NULL;
 
-		cgroup_sk_alloc(&newsk->sk_cgrp_data);
+		cgroup_sk_clone(&newsk->sk_cgrp_data);
 
 		rcu_read_lock();
 		filter = rcu_dereference(sk->sk_filter);
-- 
2.26.2


^ permalink raw reply related

* [PATCH ipsec-next 10/10] xfrm: interface: support IPIP and IPIP6 tunnels processing with .cb_handler
From: Xin Long @ 2020-06-16 17:36 UTC (permalink / raw)
  To: netdev; +Cc: Steffen Klassert, Herbert Xu, David S. Miller, Sabrina Dubroca
In-Reply-To: <cover.1592328814.git.lucien.xin@gmail.com>

Similar to ip_vti, IPIP and IPIP6 tunnels processing can easily
be done with .cb_handler for xfrm interface.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
---
 net/xfrm/xfrm_interface.c | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/net/xfrm/xfrm_interface.c b/net/xfrm/xfrm_interface.c
index 7be4d0d..0ddcd57 100644
--- a/net/xfrm/xfrm_interface.c
+++ b/net/xfrm/xfrm_interface.c
@@ -840,6 +840,18 @@ static struct xfrm4_protocol xfrmi_ipcomp4_protocol __read_mostly = {
 	.priority	=	10,
 };
 
+static int xfrmi4_rcv_tunnel(struct sk_buff *skb)
+{
+	return xfrm4_rcv_spi(skb, IPPROTO_IPIP, ip_hdr(skb)->saddr);
+}
+
+static struct xfrm_tunnel xfrmi_ipip_handler __read_mostly = {
+	.handler	=	xfrmi4_rcv_tunnel,
+	.cb_handler	=	xfrmi_rcv_cb,
+	.err_handler	=	xfrmi4_err,
+	.priority	=	-1,
+};
+
 static int __init xfrmi4_init(void)
 {
 	int err;
@@ -853,9 +865,19 @@ static int __init xfrmi4_init(void)
 	err = xfrm4_protocol_register(&xfrmi_ipcomp4_protocol, IPPROTO_COMP);
 	if (err < 0)
 		goto xfrm_proto_comp_failed;
+	err = xfrm4_tunnel_register(&xfrmi_ipip_handler, AF_INET);
+	if (err < 0)
+		goto xfrm_tunnel_ipip_failed;
+	err = xfrm4_tunnel_register(&xfrmi_ipip_handler, AF_INET6);
+	if (err < 0)
+		goto xfrm_tunnel_ipip6_failed;
 
 	return 0;
 
+xfrm_tunnel_ipip6_failed:
+	xfrm4_tunnel_deregister(&xfrmi_ipip_handler, AF_INET);
+xfrm_tunnel_ipip_failed:
+	xfrm4_protocol_deregister(&xfrmi_ipcomp4_protocol, IPPROTO_COMP);
 xfrm_proto_comp_failed:
 	xfrm4_protocol_deregister(&xfrmi_ah4_protocol, IPPROTO_AH);
 xfrm_proto_ah_failed:
@@ -866,6 +888,8 @@ static int __init xfrmi4_init(void)
 
 static void xfrmi4_fini(void)
 {
+	xfrm4_tunnel_deregister(&xfrmi_ipip_handler, AF_INET6);
+	xfrm4_tunnel_deregister(&xfrmi_ipip_handler, AF_INET);
 	xfrm4_protocol_deregister(&xfrmi_ipcomp4_protocol, IPPROTO_COMP);
 	xfrm4_protocol_deregister(&xfrmi_ah4_protocol, IPPROTO_AH);
 	xfrm4_protocol_deregister(&xfrmi_esp4_protocol, IPPROTO_ESP);
-- 
2.1.0


^ permalink raw reply related

* [PATCH ipsec-next 09/10] xfrm: interface: support IP6IP6 and IP6IP tunnels processing with .cb_handler
From: Xin Long @ 2020-06-16 17:36 UTC (permalink / raw)
  To: netdev; +Cc: Steffen Klassert, Herbert Xu, David S. Miller, Sabrina Dubroca
In-Reply-To: <cover.1592328814.git.lucien.xin@gmail.com>

Similar to ip6_vti, IP6IP6 and IP6IP tunnels processing can easily
be done with .cb_handler for xfrm interface.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
---
 net/xfrm/xfrm_interface.c | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

diff --git a/net/xfrm/xfrm_interface.c b/net/xfrm/xfrm_interface.c
index c407ecb..7be4d0d 100644
--- a/net/xfrm/xfrm_interface.c
+++ b/net/xfrm/xfrm_interface.c
@@ -798,6 +798,24 @@ static struct xfrm6_protocol xfrmi_ipcomp6_protocol __read_mostly = {
 	.priority	=	10,
 };
 
+static int xfrmi6_rcv_tunnel(struct sk_buff *skb)
+{
+	const xfrm_address_t *saddr;
+	__be32 spi;
+
+	saddr = (const xfrm_address_t *)&ipv6_hdr(skb)->saddr;
+	spi = xfrm6_tunnel_spi_lookup(dev_net(skb->dev), saddr);
+
+	return xfrm6_rcv_spi(skb, IPPROTO_IPV6, spi, NULL);
+}
+
+static struct xfrm6_tunnel xfrmi_ipv6_handler __read_mostly = {
+	.handler	=	xfrmi6_rcv_tunnel,
+	.cb_handler	=	xfrmi_rcv_cb,
+	.err_handler	=	xfrmi6_err,
+	.priority	=	-1,
+};
+
 static struct xfrm4_protocol xfrmi_esp4_protocol __read_mostly = {
 	.handler	=	xfrm4_rcv,
 	.input_handler	=	xfrm_input,
@@ -866,9 +884,19 @@ static int __init xfrmi6_init(void)
 	err = xfrm6_protocol_register(&xfrmi_ipcomp6_protocol, IPPROTO_COMP);
 	if (err < 0)
 		goto xfrm_proto_comp_failed;
+	err = xfrm6_tunnel_register(&xfrmi_ipv6_handler, AF_INET6);
+	if (err < 0)
+		goto xfrm_tunnel_ipv6_failed;
+	err = xfrm6_tunnel_register(&xfrmi_ipv6_handler, AF_INET);
+	if (err < 0)
+		goto xfrm_tunnel_ip6ip_failed;
 
 	return 0;
 
+xfrm_tunnel_ip6ip_failed:
+	xfrm6_tunnel_deregister(&xfrmi_ipv6_handler, AF_INET6);
+xfrm_tunnel_ipv6_failed:
+	xfrm6_protocol_deregister(&xfrmi_ipcomp6_protocol, IPPROTO_COMP);
 xfrm_proto_comp_failed:
 	xfrm6_protocol_deregister(&xfrmi_ah6_protocol, IPPROTO_AH);
 xfrm_proto_ah_failed:
@@ -879,6 +907,8 @@ static int __init xfrmi6_init(void)
 
 static void xfrmi6_fini(void)
 {
+	xfrm6_tunnel_deregister(&xfrmi_ipv6_handler, AF_INET);
+	xfrm6_tunnel_deregister(&xfrmi_ipv6_handler, AF_INET6);
 	xfrm6_protocol_deregister(&xfrmi_ipcomp6_protocol, IPPROTO_COMP);
 	xfrm6_protocol_deregister(&xfrmi_ah6_protocol, IPPROTO_AH);
 	xfrm6_protocol_deregister(&xfrmi_esp6_protocol, IPPROTO_ESP);
-- 
2.1.0


^ permalink raw reply related

* [PATCH ipsec-next 08/10] ipcomp: assign if_id to child tunnel from parent tunnel
From: Xin Long @ 2020-06-16 17:36 UTC (permalink / raw)
  To: netdev; +Cc: Steffen Klassert, Herbert Xu, David S. Miller, Sabrina Dubroca
In-Reply-To: <cover.1592328814.git.lucien.xin@gmail.com>

The child tunnel if_id will be used for xfrm interface's lookup
when processing the IP(6)IP(6) packets in the next patches.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
---
 net/ipv4/ipcomp.c  | 1 +
 net/ipv6/ipcomp6.c | 1 +
 2 files changed, 2 insertions(+)

diff --git a/net/ipv4/ipcomp.c b/net/ipv4/ipcomp.c
index 59bfa38..b426832 100644
--- a/net/ipv4/ipcomp.c
+++ b/net/ipv4/ipcomp.c
@@ -72,6 +72,7 @@ static struct xfrm_state *ipcomp_tunnel_create(struct xfrm_state *x)
 	t->props.flags = x->props.flags;
 	t->props.extra_flags = x->props.extra_flags;
 	memcpy(&t->mark, &x->mark, sizeof(t->mark));
+	t->if_id = x->if_id;
 
 	if (xfrm_init_state(t))
 		goto error;
diff --git a/net/ipv6/ipcomp6.c b/net/ipv6/ipcomp6.c
index 99668bf..daef890 100644
--- a/net/ipv6/ipcomp6.c
+++ b/net/ipv6/ipcomp6.c
@@ -91,6 +91,7 @@ static struct xfrm_state *ipcomp6_tunnel_create(struct xfrm_state *x)
 	t->props.mode = x->props.mode;
 	memcpy(t->props.saddr.a6, x->props.saddr.a6, sizeof(struct in6_addr));
 	memcpy(&t->mark, &x->mark, sizeof(t->mark));
+	t->if_id = x->if_id;
 
 	if (xfrm_init_state(t))
 		goto error;
-- 
2.1.0


^ permalink raw reply related

* [PATCH ipsec-next 07/10] ip6_vti: support IP6IP tunnel processing
From: Xin Long @ 2020-06-16 17:36 UTC (permalink / raw)
  To: netdev; +Cc: Steffen Klassert, Herbert Xu, David S. Miller, Sabrina Dubroca
In-Reply-To: <cover.1592328814.git.lucien.xin@gmail.com>

For IP6IP tunnel processing, the functions called will be the
same as that for IP6IP6 tunnel's. So reuse it and register it
with family == AF_INET.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
---
 net/ipv6/ip6_vti.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/net/ipv6/ip6_vti.c b/net/ipv6/ip6_vti.c
index 2161648..d45111d 100644
--- a/net/ipv6/ip6_vti.c
+++ b/net/ipv6/ip6_vti.c
@@ -1264,7 +1264,10 @@ static int __init vti6_tunnel_init(void)
 	msg = "ipv6 tunnel";
 	err = xfrm6_tunnel_register(&vti_ipv6_handler, AF_INET6);
 	if (err < 0)
-		goto vti_tunnel_failed;
+		goto vti_tunnel_ipv6_failed;
+	err = xfrm6_tunnel_register(&vti_ipv6_handler, AF_INET);
+	if (err < 0)
+		goto vti_tunnel_ip6ip_failed;
 
 	msg = "netlink interface";
 	err = rtnl_link_register(&vti6_link_ops);
@@ -1274,8 +1277,10 @@ static int __init vti6_tunnel_init(void)
 	return 0;
 
 rtnl_link_failed:
+	err = xfrm6_tunnel_deregister(&vti_ipv6_handler, AF_INET);
+vti_tunnel_ip6ip_failed:
 	err = xfrm6_tunnel_deregister(&vti_ipv6_handler, AF_INET6);
-vti_tunnel_failed:
+vti_tunnel_ipv6_failed:
 	xfrm6_protocol_deregister(&vti_ipcomp6_protocol, IPPROTO_COMP);
 xfrm_proto_comp_failed:
 	xfrm6_protocol_deregister(&vti_ah6_protocol, IPPROTO_AH);
@@ -1294,6 +1299,7 @@ static int __init vti6_tunnel_init(void)
 static void __exit vti6_tunnel_cleanup(void)
 {
 	rtnl_link_unregister(&vti6_link_ops);
+	xfrm6_tunnel_deregister(&vti_ipv6_handler, AF_INET);
 	xfrm6_tunnel_deregister(&vti_ipv6_handler, AF_INET6);
 	xfrm6_protocol_deregister(&vti_ipcomp6_protocol, IPPROTO_COMP);
 	xfrm6_protocol_deregister(&vti_ah6_protocol, IPPROTO_AH);
-- 
2.1.0


^ permalink raw reply related

* [PATCH ipsec-next 06/10] ip6_vti: support IP6IP6 tunnel processing with .cb_handler
From: Xin Long @ 2020-06-16 17:36 UTC (permalink / raw)
  To: netdev; +Cc: Steffen Klassert, Herbert Xu, David S. Miller, Sabrina Dubroca
In-Reply-To: <cover.1592328814.git.lucien.xin@gmail.com>

Similar to IPIP tunnel's processing, this patch is to support
IP6IP6 tunnel processing with .cb_handler.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
---
 net/ipv6/ip6_vti.c | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/net/ipv6/ip6_vti.c b/net/ipv6/ip6_vti.c
index 1147f64..2161648 100644
--- a/net/ipv6/ip6_vti.c
+++ b/net/ipv6/ip6_vti.c
@@ -343,6 +343,17 @@ static int vti6_rcv(struct sk_buff *skb)
 	return vti6_input_proto(skb, nexthdr, 0, 0);
 }
 
+static int vti6_rcv_tunnel(struct sk_buff *skb)
+{
+	const xfrm_address_t *saddr;
+	__be32 spi;
+
+	saddr = (const xfrm_address_t *)&ipv6_hdr(skb)->saddr;
+	spi = xfrm6_tunnel_spi_lookup(dev_net(skb->dev), saddr);
+
+	return vti6_input_proto(skb, IPPROTO_IPV6, spi, 0);
+}
+
 static int vti6_rcv_cb(struct sk_buff *skb, int err)
 {
 	unsigned short family;
@@ -1218,6 +1229,13 @@ static struct xfrm6_protocol vti_ipcomp6_protocol __read_mostly = {
 	.priority	=	100,
 };
 
+static struct xfrm6_tunnel vti_ipv6_handler __read_mostly = {
+	.handler	=	vti6_rcv_tunnel,
+	.cb_handler	=	vti6_rcv_cb,
+	.err_handler	=	vti6_err,
+	.priority	=	0,
+};
+
 /**
  * vti6_tunnel_init - register protocol and reserve needed resources
  *
@@ -1243,6 +1261,10 @@ static int __init vti6_tunnel_init(void)
 	err = xfrm6_protocol_register(&vti_ipcomp6_protocol, IPPROTO_COMP);
 	if (err < 0)
 		goto xfrm_proto_comp_failed;
+	msg = "ipv6 tunnel";
+	err = xfrm6_tunnel_register(&vti_ipv6_handler, AF_INET6);
+	if (err < 0)
+		goto vti_tunnel_failed;
 
 	msg = "netlink interface";
 	err = rtnl_link_register(&vti6_link_ops);
@@ -1252,6 +1274,8 @@ static int __init vti6_tunnel_init(void)
 	return 0;
 
 rtnl_link_failed:
+	err = xfrm6_tunnel_deregister(&vti_ipv6_handler, AF_INET6);
+vti_tunnel_failed:
 	xfrm6_protocol_deregister(&vti_ipcomp6_protocol, IPPROTO_COMP);
 xfrm_proto_comp_failed:
 	xfrm6_protocol_deregister(&vti_ah6_protocol, IPPROTO_AH);
@@ -1270,6 +1294,7 @@ static int __init vti6_tunnel_init(void)
 static void __exit vti6_tunnel_cleanup(void)
 {
 	rtnl_link_unregister(&vti6_link_ops);
+	xfrm6_tunnel_deregister(&vti_ipv6_handler, AF_INET6);
 	xfrm6_protocol_deregister(&vti_ipcomp6_protocol, IPPROTO_COMP);
 	xfrm6_protocol_deregister(&vti_ah6_protocol, IPPROTO_AH);
 	xfrm6_protocol_deregister(&vti_esp6_protocol, IPPROTO_ESP);
-- 
2.1.0


^ permalink raw reply related

* [PATCH ipsec-next 05/10] ip_vti: support IPIP6 tunnel processing
From: Xin Long @ 2020-06-16 17:36 UTC (permalink / raw)
  To: netdev; +Cc: Steffen Klassert, Herbert Xu, David S. Miller, Sabrina Dubroca
In-Reply-To: <cover.1592328814.git.lucien.xin@gmail.com>

For IPIP6 tunnel processing, the functions called will be the
same as that for IPIP tunnel's. So reuse it and register it
with family == AF_INET6.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
---
 net/ipv4/ip_vti.c | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/ip_vti.c b/net/ipv4/ip_vti.c
index fd762d9..f0bd680 100644
--- a/net/ipv4/ip_vti.c
+++ b/net/ipv4/ip_vti.c
@@ -655,7 +655,12 @@ static int __init vti_init(void)
 	msg = "ipip tunnel";
 	err = xfrm4_tunnel_register(&vti_ipip_handler, AF_INET);
 	if (err < 0)
-		goto xfrm_tunnel_failed;
+		goto xfrm_tunnel_ipip_failed;
+#if IS_ENABLED(CONFIG_IPV6)
+	err = xfrm4_tunnel_register(&vti_ipip_handler, AF_INET6);
+	if (err < 0)
+		goto xfrm_tunnel_ipip6_failed;
+#endif
 
 	msg = "netlink interface";
 	err = rtnl_link_register(&vti_link_ops);
@@ -665,8 +670,12 @@ static int __init vti_init(void)
 	return err;
 
 rtnl_link_failed:
+#if IS_ENABLED(CONFIG_IPV6)
+	xfrm4_tunnel_deregister(&vti_ipip_handler, AF_INET6);
+xfrm_tunnel_ipip6_failed:
+#endif
 	xfrm4_tunnel_deregister(&vti_ipip_handler, AF_INET);
-xfrm_tunnel_failed:
+xfrm_tunnel_ipip_failed:
 	xfrm4_protocol_deregister(&vti_ipcomp4_protocol, IPPROTO_COMP);
 xfrm_proto_comp_failed:
 	xfrm4_protocol_deregister(&vti_ah4_protocol, IPPROTO_AH);
@@ -682,6 +691,9 @@ static int __init vti_init(void)
 static void __exit vti_fini(void)
 {
 	rtnl_link_unregister(&vti_link_ops);
+#if IS_ENABLED(CONFIG_IPV6)
+	xfrm4_tunnel_deregister(&vti_ipip_handler, AF_INET6);
+#endif
 	xfrm4_tunnel_deregister(&vti_ipip_handler, AF_INET);
 	xfrm4_protocol_deregister(&vti_ipcomp4_protocol, IPPROTO_COMP);
 	xfrm4_protocol_deregister(&vti_ah4_protocol, IPPROTO_AH);
-- 
2.1.0


^ permalink raw reply related

* [PATCH ipsec-next 04/10] ip_vti: support IPIP tunnel processing with .cb_handler
From: Xin Long @ 2020-06-16 17:36 UTC (permalink / raw)
  To: netdev; +Cc: Steffen Klassert, Herbert Xu, David S. Miller, Sabrina Dubroca
In-Reply-To: <cover.1592328814.git.lucien.xin@gmail.com>

With tunnel4_input_afinfo added, IPIP tunnel processing in
ip_vti can be easily done with .cb_handler. So replace the
processing by calling ip_tunnel_rcv() with it.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
---
 net/ipv4/ip_vti.c | 33 ++++++++-------------------------
 1 file changed, 8 insertions(+), 25 deletions(-)

diff --git a/net/ipv4/ip_vti.c b/net/ipv4/ip_vti.c
index 1d9c8cf..fd762d9 100644
--- a/net/ipv4/ip_vti.c
+++ b/net/ipv4/ip_vti.c
@@ -93,28 +93,10 @@ static int vti_rcv_proto(struct sk_buff *skb)
 
 static int vti_rcv_tunnel(struct sk_buff *skb)
 {
-	struct ip_tunnel_net *itn = net_generic(dev_net(skb->dev), vti_net_id);
-	const struct iphdr *iph = ip_hdr(skb);
-	struct ip_tunnel *tunnel;
-
-	tunnel = ip_tunnel_lookup(itn, skb->dev->ifindex, TUNNEL_NO_KEY,
-				  iph->saddr, iph->daddr, 0);
-	if (tunnel) {
-		struct tnl_ptk_info tpi = {
-			.proto = htons(ETH_P_IP),
-		};
-
-		if (!xfrm4_policy_check(NULL, XFRM_POLICY_IN, skb))
-			goto drop;
-		if (iptunnel_pull_header(skb, 0, tpi.proto, false))
-			goto drop;
-		return ip_tunnel_rcv(tunnel, skb, &tpi, NULL, false);
-	}
+	XFRM_SPI_SKB_CB(skb)->family = AF_INET;
+	XFRM_SPI_SKB_CB(skb)->daddroff = offsetof(struct iphdr, daddr);
 
-	return -EINVAL;
-drop:
-	kfree_skb(skb);
-	return 0;
+	return vti_input(skb, IPPROTO_IPIP, ip_hdr(skb)->saddr, 0, false);
 }
 
 static int vti_rcv_cb(struct sk_buff *skb, int err)
@@ -495,8 +477,9 @@ static struct xfrm4_protocol vti_ipcomp4_protocol __read_mostly = {
 	.priority	=	100,
 };
 
-static struct xfrm_tunnel ipip_handler __read_mostly = {
+static struct xfrm_tunnel vti_ipip_handler __read_mostly = {
 	.handler	=	vti_rcv_tunnel,
+	.cb_handler	=	vti_rcv_cb,
 	.err_handler	=	vti4_err,
 	.priority	=	0,
 };
@@ -670,7 +653,7 @@ static int __init vti_init(void)
 		goto xfrm_proto_comp_failed;
 
 	msg = "ipip tunnel";
-	err = xfrm4_tunnel_register(&ipip_handler, AF_INET);
+	err = xfrm4_tunnel_register(&vti_ipip_handler, AF_INET);
 	if (err < 0)
 		goto xfrm_tunnel_failed;
 
@@ -682,7 +665,7 @@ static int __init vti_init(void)
 	return err;
 
 rtnl_link_failed:
-	xfrm4_tunnel_deregister(&ipip_handler, AF_INET);
+	xfrm4_tunnel_deregister(&vti_ipip_handler, AF_INET);
 xfrm_tunnel_failed:
 	xfrm4_protocol_deregister(&vti_ipcomp4_protocol, IPPROTO_COMP);
 xfrm_proto_comp_failed:
@@ -699,7 +682,7 @@ static int __init vti_init(void)
 static void __exit vti_fini(void)
 {
 	rtnl_link_unregister(&vti_link_ops);
-	xfrm4_tunnel_deregister(&ipip_handler, AF_INET);
+	xfrm4_tunnel_deregister(&vti_ipip_handler, AF_INET);
 	xfrm4_protocol_deregister(&vti_ipcomp4_protocol, IPPROTO_COMP);
 	xfrm4_protocol_deregister(&vti_ah4_protocol, IPPROTO_AH);
 	xfrm4_protocol_deregister(&vti_esp4_protocol, IPPROTO_ESP);
-- 
2.1.0


^ permalink raw reply related

* [PATCH ipsec-next 03/10] tunnel6: add tunnel6_input_afinfo for ipip and ipv6 tunnels
From: Xin Long @ 2020-06-16 17:36 UTC (permalink / raw)
  To: netdev; +Cc: Steffen Klassert, Herbert Xu, David S. Miller, Sabrina Dubroca
In-Reply-To: <cover.1592328814.git.lucien.xin@gmail.com>

This patch is to register a callback function tunnel6_rcv_cb with
is_ipip set in a xfrm_input_afinfo object for tunnel6 and tunnel46.

It will be called by xfrm_rcv_cb() from xfrm_input() when family
is AF_INET6 and proto is IPPROTO_IPIP or IPPROTO_IPV6.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
---
 include/net/xfrm.h |  1 +
 net/ipv6/tunnel6.c | 26 ++++++++++++++++++++++++++
 2 files changed, 27 insertions(+)

diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index c69410d..ffee126 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -1424,6 +1424,7 @@ struct xfrm_tunnel {
 
 struct xfrm6_tunnel {
 	int (*handler)(struct sk_buff *skb);
+	int (*cb_handler)(struct sk_buff *skb, int err);
 	int (*err_handler)(struct sk_buff *skb, struct inet6_skb_parm *opt,
 			   u8 type, u8 code, int offset, __be32 info);
 	struct xfrm6_tunnel __rcu *next;
diff --git a/net/ipv6/tunnel6.c b/net/ipv6/tunnel6.c
index 06c02eb..7bf4c27 100644
--- a/net/ipv6/tunnel6.c
+++ b/net/ipv6/tunnel6.c
@@ -155,6 +155,30 @@ static int tunnel6_rcv(struct sk_buff *skb)
 	return 0;
 }
 
+static int tunnel6_rcv_cb(struct sk_buff *skb, u8 proto, int err)
+{
+	struct xfrm6_tunnel *head, *handler;
+	int ret;
+
+	head = (proto == IPPROTO_IPV6) ? tunnel6_handlers : tunnel46_handlers;
+
+	for_each_tunnel_rcu(head, handler) {
+		if (handler->cb_handler) {
+			ret = handler->cb_handler(skb, err);
+			if (ret <= 0)
+				return ret;
+		}
+	}
+
+	return 0;
+}
+
+static const struct xfrm_input_afinfo tunnel6_input_afinfo = {
+	.family		=	AF_INET6,
+	.is_ipip	=	true,
+	.callback	=	tunnel6_rcv_cb,
+};
+
 static int tunnel46_rcv(struct sk_buff *skb)
 {
 	struct xfrm6_tunnel *handler;
@@ -245,11 +269,13 @@ static int __init tunnel6_init(void)
 		inet6_del_protocol(&tunnel46_protocol, IPPROTO_IPIP);
 		return -EAGAIN;
 	}
+	xfrm_input_register_afinfo(&tunnel6_input_afinfo);
 	return 0;
 }
 
 static void __exit tunnel6_fini(void)
 {
+	xfrm_input_unregister_afinfo(&tunnel6_input_afinfo);
 	if (inet6_del_protocol(&tunnel46_protocol, IPPROTO_IPIP))
 		pr_err("%s: can't remove protocol\n", __func__);
 	if (inet6_del_protocol(&tunnel6_protocol, IPPROTO_IPV6))
-- 
2.1.0


^ permalink raw reply related

* [PATCH ipsec-next 02/10] tunnel4: add cb_handler to struct xfrm_tunnel
From: Xin Long @ 2020-06-16 17:36 UTC (permalink / raw)
  To: netdev; +Cc: Steffen Klassert, Herbert Xu, David S. Miller, Sabrina Dubroca
In-Reply-To: <cover.1592328814.git.lucien.xin@gmail.com>

This patch is to register a callback function tunnel4_rcv_cb with
is_ipip set in a xfrm_input_afinfo object for tunnel4 and tunnel64.

It will be called by xfrm_rcv_cb() from xfrm_input() when family
is AF_INET and proto is IPPROTO_IPIP or IPPROTO_IPV6.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
---
 include/net/xfrm.h |  1 +
 net/ipv4/tunnel4.c | 26 ++++++++++++++++++++++++++
 2 files changed, 27 insertions(+)

diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index 4aa118d..c69410d 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -1415,6 +1415,7 @@ struct xfrm6_protocol {
 /* XFRM tunnel handlers.  */
 struct xfrm_tunnel {
 	int (*handler)(struct sk_buff *skb);
+	int (*cb_handler)(struct sk_buff *skb, int err);
 	int (*err_handler)(struct sk_buff *skb, u32 info);
 
 	struct xfrm_tunnel __rcu *next;
diff --git a/net/ipv4/tunnel4.c b/net/ipv4/tunnel4.c
index c4b2ccb..5968d78 100644
--- a/net/ipv4/tunnel4.c
+++ b/net/ipv4/tunnel4.c
@@ -110,6 +110,30 @@ static int tunnel4_rcv(struct sk_buff *skb)
 	return 0;
 }
 
+static int tunnel4_rcv_cb(struct sk_buff *skb, u8 proto, int err)
+{
+	struct xfrm_tunnel *head, *handler;
+	int ret;
+
+	head = (proto == IPPROTO_IPIP) ? tunnel4_handlers : tunnel64_handlers;
+
+	for_each_tunnel_rcu(head, handler) {
+		if (handler->cb_handler) {
+			ret = handler->cb_handler(skb, err);
+			if (ret <= 0)
+				return ret;
+		}
+	}
+
+	return 0;
+}
+
+static const struct xfrm_input_afinfo tunnel4_input_afinfo = {
+	.family		=	AF_INET,
+	.is_ipip	=	true,
+	.callback	=	tunnel4_rcv_cb,
+};
+
 #if IS_ENABLED(CONFIG_IPV6)
 static int tunnel64_rcv(struct sk_buff *skb)
 {
@@ -231,6 +255,7 @@ static int __init tunnel4_init(void)
 		goto err;
 	}
 #endif
+	xfrm_input_register_afinfo(&tunnel4_input_afinfo);
 	return 0;
 
 err:
@@ -240,6 +265,7 @@ static int __init tunnel4_init(void)
 
 static void __exit tunnel4_fini(void)
 {
+	xfrm_input_unregister_afinfo(&tunnel4_input_afinfo);
 #if IS_ENABLED(CONFIG_MPLS)
 	if (inet_del_protocol(&tunnelmpls4_protocol, IPPROTO_MPLS))
 		pr_err("tunnelmpls4 close: can't remove protocol\n");
-- 
2.1.0


^ permalink raw reply related

* [PATCH ipsec-next 01/10] xfrm: add is_ipip to struct xfrm_input_afinfo
From: Xin Long @ 2020-06-16 17:36 UTC (permalink / raw)
  To: netdev; +Cc: Steffen Klassert, Herbert Xu, David S. Miller, Sabrina Dubroca
In-Reply-To: <cover.1592328814.git.lucien.xin@gmail.com>

This patch is to add a new member is_ipip to struct xfrm_input_afinfo,
to allow another group family of callback functions to be registered
with is_ipip set.

This will be used for doing a callback for struct xfrm(6)_tunnel of
ipip/ipv6 tunnels in xfrm_input() by calling xfrm_rcv_cb(), which is
needed by ipip/ipv6 tunnels' support in ip(6)_vti and xfrm interface
in the next patches.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
---
 include/net/xfrm.h    |  3 ++-
 net/xfrm/xfrm_input.c | 24 +++++++++++++-----------
 2 files changed, 15 insertions(+), 12 deletions(-)

diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index 094fe68..4aa118d 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -372,7 +372,8 @@ struct xfrm_state_afinfo *xfrm_state_get_afinfo(unsigned int family);
 struct xfrm_state_afinfo *xfrm_state_afinfo_get_rcu(unsigned int family);
 
 struct xfrm_input_afinfo {
-	unsigned int		family;
+	u8			family;
+	bool			is_ipip;
 	int			(*callback)(struct sk_buff *skb, u8 protocol,
 					    int err);
 };
diff --git a/net/xfrm/xfrm_input.c b/net/xfrm/xfrm_input.c
index bd984ff..37456d0 100644
--- a/net/xfrm/xfrm_input.c
+++ b/net/xfrm/xfrm_input.c
@@ -42,7 +42,7 @@ struct xfrm_trans_cb {
 #define XFRM_TRANS_SKB_CB(__skb) ((struct xfrm_trans_cb *)&((__skb)->cb[0]))
 
 static DEFINE_SPINLOCK(xfrm_input_afinfo_lock);
-static struct xfrm_input_afinfo const __rcu *xfrm_input_afinfo[AF_INET6 + 1];
+static struct xfrm_input_afinfo const __rcu *xfrm_input_afinfo[2][AF_INET6 + 1];
 
 static struct gro_cells gro_cells;
 static struct net_device xfrm_napi_dev;
@@ -53,14 +53,14 @@ int xfrm_input_register_afinfo(const struct xfrm_input_afinfo *afinfo)
 {
 	int err = 0;
 
-	if (WARN_ON(afinfo->family >= ARRAY_SIZE(xfrm_input_afinfo)))
+	if (WARN_ON(afinfo->family > AF_INET6))
 		return -EAFNOSUPPORT;
 
 	spin_lock_bh(&xfrm_input_afinfo_lock);
-	if (unlikely(xfrm_input_afinfo[afinfo->family] != NULL))
+	if (unlikely(xfrm_input_afinfo[afinfo->is_ipip][afinfo->family]))
 		err = -EEXIST;
 	else
-		rcu_assign_pointer(xfrm_input_afinfo[afinfo->family], afinfo);
+		rcu_assign_pointer(xfrm_input_afinfo[afinfo->is_ipip][afinfo->family], afinfo);
 	spin_unlock_bh(&xfrm_input_afinfo_lock);
 	return err;
 }
@@ -71,11 +71,11 @@ int xfrm_input_unregister_afinfo(const struct xfrm_input_afinfo *afinfo)
 	int err = 0;
 
 	spin_lock_bh(&xfrm_input_afinfo_lock);
-	if (likely(xfrm_input_afinfo[afinfo->family] != NULL)) {
-		if (unlikely(xfrm_input_afinfo[afinfo->family] != afinfo))
+	if (likely(xfrm_input_afinfo[afinfo->is_ipip][afinfo->family])) {
+		if (unlikely(xfrm_input_afinfo[afinfo->is_ipip][afinfo->family] != afinfo))
 			err = -EINVAL;
 		else
-			RCU_INIT_POINTER(xfrm_input_afinfo[afinfo->family], NULL);
+			RCU_INIT_POINTER(xfrm_input_afinfo[afinfo->is_ipip][afinfo->family], NULL);
 	}
 	spin_unlock_bh(&xfrm_input_afinfo_lock);
 	synchronize_rcu();
@@ -83,15 +83,15 @@ int xfrm_input_unregister_afinfo(const struct xfrm_input_afinfo *afinfo)
 }
 EXPORT_SYMBOL(xfrm_input_unregister_afinfo);
 
-static const struct xfrm_input_afinfo *xfrm_input_get_afinfo(unsigned int family)
+static const struct xfrm_input_afinfo *xfrm_input_get_afinfo(u8 family, bool is_ipip)
 {
 	const struct xfrm_input_afinfo *afinfo;
 
-	if (WARN_ON_ONCE(family >= ARRAY_SIZE(xfrm_input_afinfo)))
+	if (WARN_ON_ONCE(family > AF_INET6))
 		return NULL;
 
 	rcu_read_lock();
-	afinfo = rcu_dereference(xfrm_input_afinfo[family]);
+	afinfo = rcu_dereference(xfrm_input_afinfo[is_ipip][family]);
 	if (unlikely(!afinfo))
 		rcu_read_unlock();
 	return afinfo;
@@ -100,9 +100,11 @@ static const struct xfrm_input_afinfo *xfrm_input_get_afinfo(unsigned int family
 static int xfrm_rcv_cb(struct sk_buff *skb, unsigned int family, u8 protocol,
 		       int err)
 {
+	bool is_ipip = (protocol == IPPROTO_IPIP || protocol == IPPROTO_IPV6);
+	const struct xfrm_input_afinfo *afinfo;
 	int ret;
-	const struct xfrm_input_afinfo *afinfo = xfrm_input_get_afinfo(family);
 
+	afinfo = xfrm_input_get_afinfo(family, is_ipip);
 	if (!afinfo)
 		return -EAFNOSUPPORT;
 
-- 
2.1.0


^ permalink raw reply related

* [PATCH ipsec-next 00/10] xfrm: support ipip and ipv6 tunnels in vti and xfrmi
From: Xin Long @ 2020-06-16 17:36 UTC (permalink / raw)
  To: netdev; +Cc: Steffen Klassert, Herbert Xu, David S. Miller, Sabrina Dubroca

Now ipip and ipv6 tunnels processing is supported by xfrm4/6_tunnel,
but not in vti and xfrmi. This feature is needed by processing those
uncompressed small fragments and packets when using comp protocol.
It means vti and xfrmi won't be able to accept small fragments or
packets when using comp protocol, which is not expected.

xfrm4/6_tunnel eventually calls xfrm_input() to process ipip and ipv6
tunnels with an ipip/ipv6-proto state (a child state of comp-proto
state), and vti and xfrmi should do the same.

The extra things for vti to do is:

  - vti_input() should be called before xfrm_input() to set
    XFRM_TUNNEL_SKB_CB(skb)->tunnel.ip4/6 = tunnel. [A]

  - vti_rcv_cb() should be called after xfrm_input() to update
    the skb->dev. [B]

And the extra things for xfrmi to do is:

   - The ipip/ipv6-proto state should be assigned if_id from its
     parent's state. [C]

   - xfrmi_rcv_cb() should be called after xfrm_input() to update
     the skb->dev. [D]


Patch 4-7 does the things in [A].

To implement [B] and [D], patch 1-3 is to build a callback function
for xfrm4/6_tunnel, which can be called after xfrm_input(), similar
to xfrm4/6_protocol's .cb_handler. vti and xfrmi only needs to give
their own callback function in patch 4-7 and 9-10, which already
exists: vti_rcv_cb() and xfrmi_rcv_cb().

Patch 8 is to do the thing in [C] by assigning child tunnel's if_id
from its parent tunnel.

With the whole patch series, the segments or packets with any size
can work with ipsec comp proto on vti and xfrmi.

Xin Long (10):
  xfrm: add is_ipip to struct xfrm_input_afinfo
  tunnel4: add cb_handler to struct xfrm_tunnel
  tunnel6: add tunnel6_input_afinfo for ipip and ipv6 tunnels
  ip_vti: support IPIP tunnel processing with .cb_handler
  ip_vti: support IPIP6 tunnel processing
  ip6_vti: support IP6IP6 tunnel processing with .cb_handler
  ip6_vti: support IP6IP tunnel processing
  ipcomp: assign if_id to child tunnel from parent tunnel
  xfrm: interface: support IP6IP6 and IP6IP tunnels processing with
    .cb_handler
  xfrm: interface: support IPIP and IPIP6 tunnels processing with
    .cb_handler

 include/net/xfrm.h        |  5 ++++-
 net/ipv4/ip_vti.c         | 49 +++++++++++++++++++-----------------------
 net/ipv4/ipcomp.c         |  1 +
 net/ipv4/tunnel4.c        | 26 +++++++++++++++++++++++
 net/ipv6/ip6_vti.c        | 31 +++++++++++++++++++++++++++
 net/ipv6/ipcomp6.c        |  1 +
 net/ipv6/tunnel6.c        | 26 +++++++++++++++++++++++
 net/xfrm/xfrm_input.c     | 24 +++++++++++----------
 net/xfrm/xfrm_interface.c | 54 +++++++++++++++++++++++++++++++++++++++++++++++
 9 files changed, 178 insertions(+), 39 deletions(-)

-- 
2.1.0


^ permalink raw reply

* [PATCH] bpf: Allow small structs to be type of function argument
From: Jiri Olsa @ 2020-06-16 17:35 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann
  Cc: netdev, bpf, Yonghong Song, Martin KaFai Lau, Jakub Kicinski,
	David Miller, John Fastabend, Jesper Dangaard Brouer,
	Andrii Nakryiko, KP Singh, Masanori Misono

This way we can have trampoline on function
that has arguments with types like:

  kuid_t uid
  kgid_t gid

which unwind into small structs like:

  typedef struct {
        uid_t val;
  } kuid_t;

  typedef struct {
        gid_t val;
  } kgid_t;

And we can use them in bpftrace like:
(assuming d_path changes are in)

  # bpftrace -e 'lsm:path_chown { printf("uid %d, gid %d\n", args->uid, args->gid) }'
  Attaching 1 probe...
  uid 0, gid 0
  uid 1000, gid 1000
  ...

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 kernel/bpf/btf.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index 58c9af1d4808..f8fee5833684 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -362,6 +362,14 @@ static bool btf_type_is_struct(const struct btf_type *t)
 	return kind == BTF_KIND_STRUCT || kind == BTF_KIND_UNION;
 }
 
+/* type is struct and its size is within 8 bytes
+ * and it can be value of function argument
+ */
+static bool btf_type_is_struct_arg(const struct btf_type *t)
+{
+	return btf_type_is_struct(t) && (t->size <= sizeof(u64));
+}
+
 static bool __btf_type_is_struct(const struct btf_type *t)
 {
 	return BTF_INFO_KIND(t->info) == BTF_KIND_STRUCT;
@@ -3768,7 +3776,7 @@ bool btf_ctx_access(int off, int size, enum bpf_access_type type,
 	/* skip modifiers */
 	while (btf_type_is_modifier(t))
 		t = btf_type_by_id(btf, t->type);
-	if (btf_type_is_int(t) || btf_type_is_enum(t))
+	if (btf_type_is_int(t) || btf_type_is_enum(t) || btf_type_is_struct_arg(t))
 		/* accessing a scalar */
 		return true;
 	if (!btf_type_is_ptr(t)) {
@@ -4161,6 +4169,8 @@ static int __get_type_size(struct btf *btf, u32 btf_id,
 		return sizeof(void *);
 	if (btf_type_is_int(t) || btf_type_is_enum(t))
 		return t->size;
+	if (btf_type_is_struct_arg(t))
+		return t->size;
 	*bad_type = t;
 	return -EINVAL;
 }
-- 
2.25.4


^ permalink raw reply related

* [PATCH net v3] ip_tunnel: fix use-after-free in ip_tunnel_lookup()
From: Taehee Yoo @ 2020-06-16 16:51 UTC (permalink / raw)
  To: davem, kuba, netdev; +Cc: ap420073, pshelar, eric.dumazet

In the datapath, the ip_tunnel_lookup() is used and it internally uses
fallback tunnel device pointer, which is fb_tunnel_dev.
This pointer variable should be set to NULL when a fb interface is deleted.
But there is no routine to set fb_tunnel_dev pointer to NULL.
So, this pointer will be still used after interface is deleted and
it eventually results in the use-after-free problem.

Test commands:
    ip netns add A
    ip netns add B
    ip link add eth0 type veth peer name eth1
    ip link set eth0 netns A
    ip link set eth1 netns B

    ip netns exec A ip link set lo up
    ip netns exec A ip link set eth0 up
    ip netns exec A ip link add gre1 type gre local 10.0.0.1 \
	    remote 10.0.0.2
    ip netns exec A ip link set gre1 up
    ip netns exec A ip a a 10.0.100.1/24 dev gre1
    ip netns exec A ip a a 10.0.0.1/24 dev eth0

    ip netns exec B ip link set lo up
    ip netns exec B ip link set eth1 up
    ip netns exec B ip link add gre1 type gre local 10.0.0.2 \
	    remote 10.0.0.1
    ip netns exec B ip link set gre1 up
    ip netns exec B ip a a 10.0.100.2/24 dev gre1
    ip netns exec B ip a a 10.0.0.2/24 dev eth1
    ip netns exec A hping3 10.0.100.2 -2 --flood -d 60000 &
    ip netns del B

Splat looks like:
[   77.793450][    C3] ==================================================================
[   77.794702][    C3] BUG: KASAN: use-after-free in ip_tunnel_lookup+0xcc4/0xf30
[   77.795573][    C3] Read of size 4 at addr ffff888060bd9c84 by task hping3/2905
[   77.796398][    C3]
[   77.796664][    C3] CPU: 3 PID: 2905 Comm: hping3 Not tainted 5.8.0-rc1+ #616
[   77.797474][    C3] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[   77.798453][    C3] Call Trace:
[   77.798815][    C3]  <IRQ>
[   77.799142][    C3]  dump_stack+0x9d/0xdb
[   77.799605][    C3]  print_address_description.constprop.7+0x2cc/0x450
[   77.800365][    C3]  ? ip_tunnel_lookup+0xcc4/0xf30
[   77.800908][    C3]  ? ip_tunnel_lookup+0xcc4/0xf30
[   77.801517][    C3]  ? ip_tunnel_lookup+0xcc4/0xf30
[   77.802145][    C3]  kasan_report+0x154/0x190
[   77.802821][    C3]  ? ip_tunnel_lookup+0xcc4/0xf30
[   77.803503][    C3]  ip_tunnel_lookup+0xcc4/0xf30
[   77.804165][    C3]  __ipgre_rcv+0x1ab/0xaa0 [ip_gre]
[   77.804862][    C3]  ? rcu_read_lock_sched_held+0xc0/0xc0
[   77.805621][    C3]  gre_rcv+0x304/0x1910 [ip_gre]
[   77.806293][    C3]  ? lock_acquire+0x1a9/0x870
[   77.806925][    C3]  ? gre_rcv+0xfe/0x354 [gre]
[   77.807559][    C3]  ? erspan_xmit+0x2e60/0x2e60 [ip_gre]
[   77.808305][    C3]  ? rcu_read_lock_sched_held+0xc0/0xc0
[   77.809032][    C3]  ? rcu_read_lock_held+0x90/0xa0
[   77.809713][    C3]  gre_rcv+0x1b8/0x354 [gre]
[ ... ]

Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Fixes: c54419321455 ("GRE: Refactor GRE tunneling code.")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
---

v2 -> v3:
 - Remove invalid comment.
 - Update commit message.

v1 -> v2:
 - Do not add a new variable.

 net/ipv4/ip_tunnel.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/net/ipv4/ip_tunnel.c b/net/ipv4/ip_tunnel.c
index f4f1d11eab50..0c1f36404471 100644
--- a/net/ipv4/ip_tunnel.c
+++ b/net/ipv4/ip_tunnel.c
@@ -85,9 +85,10 @@ struct ip_tunnel *ip_tunnel_lookup(struct ip_tunnel_net *itn,
 				   __be32 remote, __be32 local,
 				   __be32 key)
 {
-	unsigned int hash;
 	struct ip_tunnel *t, *cand = NULL;
 	struct hlist_head *head;
+	struct net_device *ndev;
+	unsigned int hash;
 
 	hash = ip_tunnel_hash(key, remote);
 	head = &itn->tunnels[hash];
@@ -162,8 +163,9 @@ struct ip_tunnel *ip_tunnel_lookup(struct ip_tunnel_net *itn,
 	if (t && t->dev->flags & IFF_UP)
 		return t;
 
-	if (itn->fb_tunnel_dev && itn->fb_tunnel_dev->flags & IFF_UP)
-		return netdev_priv(itn->fb_tunnel_dev);
+	ndev = READ_ONCE(itn->fb_tunnel_dev);
+	if (ndev && ndev->flags & IFF_UP)
+		return netdev_priv(ndev);
 
 	return NULL;
 }
@@ -1259,9 +1261,9 @@ void ip_tunnel_uninit(struct net_device *dev)
 	struct ip_tunnel_net *itn;
 
 	itn = net_generic(net, tunnel->ip_tnl_net_id);
-	/* fb_tunnel_dev will be unregisted in net-exit call. */
-	if (itn->fb_tunnel_dev != dev)
-		ip_tunnel_del(itn, netdev_priv(dev));
+	ip_tunnel_del(itn, netdev_priv(dev));
+	if (itn->fb_tunnel_dev == dev)
+		WRITE_ONCE(itn->fb_tunnel_dev, NULL);
 
 	dst_cache_reset(&tunnel->dst_cache);
 }
-- 
2.17.1


^ permalink raw reply related

* Re: [PATCH net v5 0/3] esp, ah: improve crypto algorithm selections
From: Eric Biggers @ 2020-06-16 16:51 UTC (permalink / raw)
  To: Steffen Klassert
  Cc: netdev, linux-crypto, Corentin Labbe, Greg Kroah-Hartman,
	Herbert Xu
In-Reply-To: <20200616060257.GT19286@gauss3.secunet.de>

On Tue, Jun 16, 2020 at 08:02:58AM +0200, Steffen Klassert wrote:
> On Mon, Jun 15, 2020 at 03:13:15PM -0700, Eric Biggers wrote:
> > This series consolidates and modernizes the lists of crypto algorithms
> > that are selected by the IPsec kconfig options, and adds CRYPTO_SEQIV
> > since it no longer gets selected automatically by other things.
> > 
> > See previous discussion at
> > https://lkml.kernel.org/netdev/20200604192322.22142-1-ebiggers@kernel.org/T/#u
> > 
> > Changed v4 => v5:
> >   - Rebased onto latest net/master to resolve conflict with
> >     "treewide: replace '---help---' in Kconfig files with 'help'"
> 
> The target trees for IPsec patches is the ipsec and ipsec-next tree.
> I have the v4 patchset already in the testing branch of the ipsec tree
> and plan to merge it to master. This conflict has to be resolved
> when the ipsec tree is merged into the net tree.
> 

Okay, great!  I didn't know about the ipsec tree or that you had already applied
the patches.

- Eric

^ permalink raw reply

* Re: [PATCH net v2] ip_tunnel: fix use-after-free in ip_tunnel_lookup()
From: Taehee Yoo @ 2020-06-16 16:37 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, Jakub Kicinski, Netdev, pshelar
In-Reply-To: <fd5f2683-caac-6ed1-7da7-11e8fce2fe42@gmail.com>

On Wed, 17 Jun 2020 at 01:16, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>

Hi Eric,
Thank you for the review :)

>
> On 6/16/20 9:01 AM, Taehee Yoo wrote:
> > In the datapath, the ip_tunnel_lookup() is used and it internally uses
> > fallback tunnel device pointer, which is fb_tunnel_dev.
> > This pointer variable should be set to NULL when a fb interface is deleted.
> > But there is no routine to set fb_tunnel_dev pointer to NULL.
> > So, this pointer will be still used after interface is deleted and
> > it eventually results in the use-after-free problem.
> >
> > Test commands:
> >     ip netns add A
> >     ip netns add B
> >     ip link add eth0 type veth peer name eth1
> >     ip link set eth0 netns A
> >     ip link set eth1 netns B
> >
> >     ip netns exec A ip link set lo up
> >     ip netns exec A ip link set eth0 up
> >     ip netns exec A ip link add gre1 type gre local 10.0.0.1 \
> >           remote 10.0.0.2
> >     ip netns exec A ip link set gre1 up
> >     ip netns exec A ip a a 10.0.100.1/24 dev gre1
> >     ip netns exec A ip a a 10.0.0.1/24 dev eth0
> >
> >     ip netns exec B ip link set lo up
> >     ip netns exec B ip link set eth1 up
> >     ip netns exec B ip link add gre1 type gre local 10.0.0.2 \
> >           remote 10.0.0.1
> >     ip netns exec B ip link set gre1 up
> >     ip netns exec B ip a a 10.0.100.2/24 dev gre1
> >     ip netns exec B ip a a 10.0.0.2/24 dev eth1
> >     ip netns exec A hping3 10.0.100.2 -2 --flood -d 60000 &
> >     ip netns del B
> >
> > Splat looks like:
> > [  133.319668][    C3] BUG: KASAN: use-after-free in ip_tunnel_lookup+0x9d6/0xde0
> > [  133.343852][    C3] Read of size 4 at addr ffff8880b1701c84 by task hping3/1222
> > [  133.344724][    C3]
> > [  133.345002][    C3] CPU: 3 PID: 1222 Comm: hping3 Not tainted 5.7.0+ #591
> > [  133.345814][    C3] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
> > [  133.373336][    C3] Call Trace:
> > [  133.374792][    C3]  <IRQ>
> > [  133.375205][    C3]  dump_stack+0x96/0xdb
> > [  133.375789][    C3]  print_address_description.constprop.6+0x2cc/0x450
> > [  133.376720][    C3]  ? ip_tunnel_lookup+0x9d6/0xde0
> > [  133.377431][    C3]  ? ip_tunnel_lookup+0x9d6/0xde0
> > [  133.378130][    C3]  ? ip_tunnel_lookup+0x9d6/0xde0
> > [  133.378851][    C3]  kasan_report+0x154/0x190
> > [  133.379494][    C3]  ? ip_tunnel_lookup+0x9d6/0xde0
> > [  133.380200][    C3]  ip_tunnel_lookup+0x9d6/0xde0
> > [  133.380894][    C3]  __ipgre_rcv+0x1ab/0xaa0 [ip_gre]
> > [  133.381630][    C3]  ? rcu_read_lock_sched_held+0xc0/0xc0
> > [  133.382429][    C3]  gre_rcv+0x304/0x1910 [ip_gre]
> > [ ... ]
> >
> > Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
> > Fixes: c54419321455 ("GRE: Refactor GRE tunneling code.")
> > Signed-off-by: Taehee Yoo <ap420073@gmail.com>
> > ---
> >
> > v1 -> v2:
> >  - Do not add a new variable.
> >
> >  net/ipv4/ip_tunnel.c | 13 ++++++++-----
> >  1 file changed, 8 insertions(+), 5 deletions(-)
> >
> > diff --git a/net/ipv4/ip_tunnel.c b/net/ipv4/ip_tunnel.c
> > index f4f1d11eab50..701f150f11e1 100644
> > --- a/net/ipv4/ip_tunnel.c
> > +++ b/net/ipv4/ip_tunnel.c
> > @@ -85,9 +85,10 @@ struct ip_tunnel *ip_tunnel_lookup(struct ip_tunnel_net *itn,
> >                                  __be32 remote, __be32 local,
> >                                  __be32 key)
> >  {
> > -     unsigned int hash;
> >       struct ip_tunnel *t, *cand = NULL;
> >       struct hlist_head *head;
> > +     struct net_device *ndev;
> > +     unsigned int hash;
> >
> >       hash = ip_tunnel_hash(key, remote);
> >       head = &itn->tunnels[hash];
> > @@ -162,8 +163,9 @@ struct ip_tunnel *ip_tunnel_lookup(struct ip_tunnel_net *itn,
> >       if (t && t->dev->flags & IFF_UP)
> >               return t;
> >
> > -     if (itn->fb_tunnel_dev && itn->fb_tunnel_dev->flags & IFF_UP)
> > -             return netdev_priv(itn->fb_tunnel_dev);
> > +     ndev = READ_ONCE(itn->fb_tunnel_dev);
> > +     if (ndev && ndev->flags & IFF_UP)
> > +             return netdev_priv(ndev);
> >
> >       return NULL;
> >  }
> > @@ -1260,8 +1262,9 @@ void ip_tunnel_uninit(struct net_device *dev)
> >
> >       itn = net_generic(net, tunnel->ip_tnl_net_id);
> >       /* fb_tunnel_dev will be unregisted in net-exit call. */
>
> Is this comment still correct ?
>

I think this comment is not valid anymore.
I will remove this comment in a v3 patch.

Thanks a lot!
Taehee Yoo

^ permalink raw reply

* [PATCH v2] Bluetooth: Terminate the link if pairing is cancelled
From: Manish Mandlik @ 2020-06-16 16:28 UTC (permalink / raw)
  To: marcel, luiz.dentz
  Cc: linux-bluetooth, chromeos-bluetooth-upstreaming, Alain Michaud,
	Manish Mandlik, David S. Miller, Johan Hedberg, netdev,
	linux-kernel

If user decides to cancel the ongoing pairing process (e.g. by clicking
the cancel button on pairing/passkey window), abort any ongoing pairing
and then terminate the link if it was created because of the pair
device action.

Signed-off-by: Manish Mandlik <mmandlik@google.com>
---

Changes in v2:
- Added code to track if the connection was triggered because of the pair
  device action and then only terminate the link on pairing cancel.

 include/net/bluetooth/hci_core.h | 14 ++++++++++++--
 net/bluetooth/hci_conn.c         | 11 ++++++++---
 net/bluetooth/l2cap_core.c       |  6 ++++--
 net/bluetooth/mgmt.c             | 22 ++++++++++++++++++----
 4 files changed, 42 insertions(+), 11 deletions(-)

diff --git a/include/net/bluetooth/hci_core.h b/include/net/bluetooth/hci_core.h
index f5b28c7cae9f2..236ffbc36b2c3 100644
--- a/include/net/bluetooth/hci_core.h
+++ b/include/net/bluetooth/hci_core.h
@@ -519,6 +519,12 @@ struct hci_dev {
 
 #define HCI_PHY_HANDLE(handle)	(handle & 0xff)
 
+enum conn_reasons {
+	CONN_REASON_PAIR_DEVICE,
+	CONN_REASON_L2CAP_CHAN,
+	CONN_REASON_SCO_CONNECT,
+};
+
 struct hci_conn {
 	struct list_head list;
 
@@ -567,6 +573,8 @@ struct hci_conn {
 	__s8		max_tx_power;
 	unsigned long	flags;
 
+	enum conn_reasons conn_reason;
+
 	__u32		clock;
 	__u16		clock_accuracy;
 
@@ -991,12 +999,14 @@ struct hci_chan *hci_chan_lookup_handle(struct hci_dev *hdev, __u16 handle);
 
 struct hci_conn *hci_connect_le_scan(struct hci_dev *hdev, bdaddr_t *dst,
 				     u8 dst_type, u8 sec_level,
-				     u16 conn_timeout);
+				     u16 conn_timeout,
+				     enum conn_reasons conn_reason);
 struct hci_conn *hci_connect_le(struct hci_dev *hdev, bdaddr_t *dst,
 				u8 dst_type, u8 sec_level, u16 conn_timeout,
 				u8 role, bdaddr_t *direct_rpa);
 struct hci_conn *hci_connect_acl(struct hci_dev *hdev, bdaddr_t *dst,
-				 u8 sec_level, u8 auth_type);
+				 u8 sec_level, u8 auth_type,
+				 enum conn_reasons conn_reason);
 struct hci_conn *hci_connect_sco(struct hci_dev *hdev, int type, bdaddr_t *dst,
 				 __u16 setting);
 int hci_conn_check_link_mode(struct hci_conn *conn);
diff --git a/net/bluetooth/hci_conn.c b/net/bluetooth/hci_conn.c
index 3ea1bdf5d1e35..1353d7e3f1012 100644
--- a/net/bluetooth/hci_conn.c
+++ b/net/bluetooth/hci_conn.c
@@ -1157,7 +1157,8 @@ static int hci_explicit_conn_params_set(struct hci_dev *hdev,
 /* This function requires the caller holds hdev->lock */
 struct hci_conn *hci_connect_le_scan(struct hci_dev *hdev, bdaddr_t *dst,
 				     u8 dst_type, u8 sec_level,
-				     u16 conn_timeout)
+				     u16 conn_timeout,
+				     enum conn_reasons conn_reason)
 {
 	struct hci_conn *conn;
 
@@ -1202,6 +1203,7 @@ struct hci_conn *hci_connect_le_scan(struct hci_dev *hdev, bdaddr_t *dst,
 	conn->sec_level = BT_SECURITY_LOW;
 	conn->pending_sec_level = sec_level;
 	conn->conn_timeout = conn_timeout;
+	conn->conn_reason = conn_reason;
 
 	hci_update_background_scan(hdev);
 
@@ -1211,7 +1213,8 @@ struct hci_conn *hci_connect_le_scan(struct hci_dev *hdev, bdaddr_t *dst,
 }
 
 struct hci_conn *hci_connect_acl(struct hci_dev *hdev, bdaddr_t *dst,
-				 u8 sec_level, u8 auth_type)
+				 u8 sec_level, u8 auth_type,
+				 enum conn_reasons conn_reason)
 {
 	struct hci_conn *acl;
 
@@ -1231,6 +1234,7 @@ struct hci_conn *hci_connect_acl(struct hci_dev *hdev, bdaddr_t *dst,
 
 	hci_conn_hold(acl);
 
+	acl->conn_reason = conn_reason;
 	if (acl->state == BT_OPEN || acl->state == BT_CLOSED) {
 		acl->sec_level = BT_SECURITY_LOW;
 		acl->pending_sec_level = sec_level;
@@ -1247,7 +1251,8 @@ struct hci_conn *hci_connect_sco(struct hci_dev *hdev, int type, bdaddr_t *dst,
 	struct hci_conn *acl;
 	struct hci_conn *sco;
 
-	acl = hci_connect_acl(hdev, dst, BT_SECURITY_LOW, HCI_AT_NO_BONDING);
+	acl = hci_connect_acl(hdev, dst, BT_SECURITY_LOW, HCI_AT_NO_BONDING,
+			      CONN_REASON_SCO_CONNECT);
 	if (IS_ERR(acl))
 		return acl;
 
diff --git a/net/bluetooth/l2cap_core.c b/net/bluetooth/l2cap_core.c
index bdbf37337bc6c..ee71b68582f48 100644
--- a/net/bluetooth/l2cap_core.c
+++ b/net/bluetooth/l2cap_core.c
@@ -7224,11 +7224,13 @@ int l2cap_chan_connect(struct l2cap_chan *chan, __le16 psm, u16 cid,
 		else
 			hcon = hci_connect_le_scan(hdev, dst, dst_type,
 						   chan->sec_level,
-						   HCI_LE_CONN_TIMEOUT);
+						   HCI_LE_CONN_TIMEOUT,
+						   CONN_REASON_L2CAP_CHAN);
 
 	} else {
 		u8 auth_type = l2cap_get_auth_type(chan);
-		hcon = hci_connect_acl(hdev, dst, chan->sec_level, auth_type);
+		hcon = hci_connect_acl(hdev, dst, chan->sec_level, auth_type,
+				       CONN_REASON_L2CAP_CHAN);
 	}
 
 	if (IS_ERR(hcon)) {
diff --git a/net/bluetooth/mgmt.c b/net/bluetooth/mgmt.c
index db7023dfcd253..06cc8d30f8f00 100644
--- a/net/bluetooth/mgmt.c
+++ b/net/bluetooth/mgmt.c
@@ -2940,7 +2940,7 @@ static int pair_device(struct sock *sk, struct hci_dev *hdev, void *data,
 
 	if (cp->addr.type == BDADDR_BREDR) {
 		conn = hci_connect_acl(hdev, &cp->addr.bdaddr, sec_level,
-				       auth_type);
+				       auth_type, CONN_REASON_PAIR_DEVICE);
 	} else {
 		u8 addr_type = le_addr_type(cp->addr.type);
 		struct hci_conn_params *p;
@@ -2959,9 +2959,9 @@ static int pair_device(struct sock *sk, struct hci_dev *hdev, void *data,
 		if (p->auto_connect == HCI_AUTO_CONN_EXPLICIT)
 			p->auto_connect = HCI_AUTO_CONN_DISABLED;
 
-		conn = hci_connect_le_scan(hdev, &cp->addr.bdaddr,
-					   addr_type, sec_level,
-					   HCI_LE_CONN_TIMEOUT);
+		conn = hci_connect_le_scan(hdev, &cp->addr.bdaddr, addr_type,
+					   sec_level, HCI_LE_CONN_TIMEOUT,
+					   CONN_REASON_PAIR_DEVICE);
 	}
 
 	if (IS_ERR(conn)) {
@@ -3062,6 +3062,20 @@ static int cancel_pair_device(struct sock *sk, struct hci_dev *hdev, void *data,
 
 	err = mgmt_cmd_complete(sk, hdev->id, MGMT_OP_CANCEL_PAIR_DEVICE, 0,
 				addr, sizeof(*addr));
+
+	/* Since user doesn't want to proceed with the connection, abort any
+	 * ongoing pairing and then terminate the link if it was created
+	 * because of the pair device action.
+	 */
+	if (addr->type == BDADDR_BREDR)
+		hci_remove_link_key(hdev, &addr->bdaddr);
+	else
+		smp_cancel_and_remove_pairing(hdev, &addr->bdaddr,
+					      le_addr_type(addr->type));
+
+	if (conn->conn_reason == CONN_REASON_PAIR_DEVICE)
+		hci_abort_conn(conn, HCI_ERROR_REMOTE_USER_TERM);
+
 unlock:
 	hci_dev_unlock(hdev);
 	return err;
-- 
2.27.0.111.gc72c7da667-goog


^ permalink raw reply related

* Re: [PATCH] macsec: Support 32bit PN netlink attribute for XPN links
From: Era Mayflower @ 2020-06-16 16:32 UTC (permalink / raw)
  To: David Miller; +Cc: kuba, netdev, linux-kernel
In-Reply-To: <CAMdQvKvJ7MXihELmPW2LC3PAgXMK2OG6bJjPNJkCgE6eZftDVw@mail.gmail.com>

On Wed, Jun 17, 2020 at 10:02 AM Era Mayflower <mayflowerera@gmail.com> wrote:
>
> On Tue, Jun 16, 2020 at 1:23 AM David Miller <davem@davemloft.net> wrote:
> >
> > From: Era Mayflower <mayflowerera@gmail.com>
> > Date: Tue, 16 Jun 2020 00:41:14 +0900
> >
> > > +     if (tb_sa[MACSEC_SA_ATTR_PN]) {
> >
> > validate_add_rxsa() requires that MACSET_SA_ATTR_PN be non-NULL, so
> > you don't need to add this check here.
> >
>
> validate_add_rxsa() did not originally contain that requirement.
> It does exist in validate_add_txsa(), which means that providing a PN
> is necessary only when creating TXSA.
> When creating an RXSA without providing a PN it will be set to 1
> (init_rx_sa+15).
> This is the original behavior which of course can be changed.
>
> - Era.

Sorry for the time issues, just noticed I sent the previous mail with
future time.
Fixed it permanently on my computer.

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox