Netdev List
 help / color / mirror / Atom feed
* [Patch net] ipv6/dccp: do not inherit ipv6_mc_list from parent
From: Cong Wang @ 2017-05-09 23:59 UTC (permalink / raw)
  To: netdev; +Cc: edumazet, Cong Wang

Like commit 657831ffc38e ("dccp/tcp: do not inherit mc_list from parent")
we should clear ipv6_mc_list etc. for IPv6 sockets too.

Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
---
 net/dccp/ipv6.c     | 6 ++++++
 net/ipv6/tcp_ipv6.c | 2 ++
 2 files changed, 8 insertions(+)

diff --git a/net/dccp/ipv6.c b/net/dccp/ipv6.c
index d9b6a4e..b6bbb71 100644
--- a/net/dccp/ipv6.c
+++ b/net/dccp/ipv6.c
@@ -426,6 +426,9 @@ static struct sock *dccp_v6_request_recv_sock(const struct sock *sk,
 		newsk->sk_backlog_rcv = dccp_v4_do_rcv;
 		newnp->pktoptions  = NULL;
 		newnp->opt	   = NULL;
+		newnp->ipv6_mc_list = NULL;
+		newnp->ipv6_ac_list = NULL;
+		newnp->ipv6_fl_list = NULL;
 		newnp->mcast_oif   = inet6_iif(skb);
 		newnp->mcast_hops  = ipv6_hdr(skb)->hop_limit;
 
@@ -490,6 +493,9 @@ static struct sock *dccp_v6_request_recv_sock(const struct sock *sk,
 	/* Clone RX bits */
 	newnp->rxopt.all = np->rxopt.all;
 
+	newnp->ipv6_mc_list = NULL;
+	newnp->ipv6_ac_list = NULL;
+	newnp->ipv6_fl_list = NULL;
 	newnp->pktoptions = NULL;
 	newnp->opt	  = NULL;
 	newnp->mcast_oif  = inet6_iif(skb);
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index aeb9497..df5a9ff 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -1062,6 +1062,7 @@ static struct sock *tcp_v6_syn_recv_sock(const struct sock *sk, struct sk_buff *
 		newtp->af_specific = &tcp_sock_ipv6_mapped_specific;
 #endif
 
+		newnp->ipv6_mc_list = NULL;
 		newnp->ipv6_ac_list = NULL;
 		newnp->ipv6_fl_list = NULL;
 		newnp->pktoptions  = NULL;
@@ -1131,6 +1132,7 @@ static struct sock *tcp_v6_syn_recv_sock(const struct sock *sk, struct sk_buff *
 	   First: no IPv4 options.
 	 */
 	newinet->inet_opt = NULL;
+	newnp->ipv6_mc_list = NULL;
 	newnp->ipv6_ac_list = NULL;
 	newnp->ipv6_fl_list = NULL;
 
-- 
2.5.5

^ permalink raw reply related

* Re: [Patch net] ipv4: restore rt->fi for reference counting
From: Eric Dumazet @ 2017-05-09 23:51 UTC (permalink / raw)
  To: Cong Wang
  Cc: David Miller, Linux Kernel Network Developers, Andrey Konovalov,
	Eric Dumazet
In-Reply-To: <CAM_iQpU+fXO7eFroAYMv6vqDzCK_ZYXjrTPAMfoDR2BDqaK9rQ@mail.gmail.com>

On Tue, 2017-05-09 at 16:35 -0700, Cong Wang wrote:

> This statement is only used to ensure we pass the "dead == fi->fib_nhs"
> check right below the inner loop, it is fine to keep it without break since
> fi is not changed in the inner loop.
> 

So the dead++ above wont end up with (dead > fi->fib_nhs) ?

^ permalink raw reply

* Re: [Patch net] ipv4: restore rt->fi for reference counting
From: Eric Dumazet @ 2017-05-09 23:50 UTC (permalink / raw)
  To: Cong Wang
  Cc: David Miller, Linux Kernel Network Developers, Andrey Konovalov,
	Eric Dumazet
In-Reply-To: <CAM_iQpU+fXO7eFroAYMv6vqDzCK_ZYXjrTPAMfoDR2BDqaK9rQ@mail.gmail.com>

On Tue, 2017-05-09 at 16:35 -0700, Cong Wang wrote:

> All of them take RCU read lock, so, as I explained in the code comment,
> they all should be fine because of synchronize_net() on unregister path.
> Do you see anything otherwise?

They might take rcu lock, but compiler is still allowed to read
fi->fib_dev multiple times, and crashes might happen.

You will need to audit all code and fix it, using proper
rcu_dereference() or similar code ensuring compiler wont do stupid
things.

Like :

diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
index 1201409ba1dcb18ee028003b065410b87bf4a602..ab69517befbb5f300af785fbb20071a3d1086593 100644
--- a/net/ipv4/fib_trie.c
+++ b/net/ipv4/fib_trie.c
@@ -2666,11 +2666,13 @@ static int fib_route_seq_show(struct seq_file *seq, void *v)
 
 		seq_setwidth(seq, 127);
 
-		if (fi)
+		if (fi) {
+			struct net_device *dev = rcu_dereference(fi->fib_dev);
+
 			seq_printf(seq,
 				   "%s\t%08X\t%08X\t%04X\t%d\t%u\t"
 				   "%d\t%08X\t%d\t%u\t%u",
-				   fi->fib_dev ? fi->fib_dev->name : "*",
+				   dev ? dev->name : "*",
 				   prefix,
 				   fi->fib_nh->nh_gw, flags, 0, 0,
 				   fi->fib_priority,
@@ -2679,13 +2681,13 @@ static int fib_route_seq_show(struct seq_file *seq, void *v)
 				    fi->fib_advmss + 40 : 0),
 				   fi->fib_window,
 				   fi->fib_rtt >> 3);
-		else
+		} else {
 			seq_printf(seq,
 				   "*\t%08X\t%08X\t%04X\t%d\t%u\t"
 				   "%d\t%08X\t%d\t%u\t%u",
 				   prefix, 0, flags, 0, 0, 0,
 				   mask, 0, 0, 0);
-
+		}
 		seq_pad(seq, '\n');
 	}
 

^ permalink raw reply related

* Re: [Patch net] ipv4: restore rt->fi for reference counting
From: Cong Wang @ 2017-05-09 23:35 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David Miller, Linux Kernel Network Developers, Andrey Konovalov,
	Eric Dumazet
In-Reply-To: <1494371348.7796.95.camel@edumazet-glaptop3.roam.corp.google.com>

On Tue, May 9, 2017 at 4:09 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Tue, 2017-05-09 at 15:54 -0700, Eric Dumazet wrote:
>> On Tue, 2017-05-09 at 15:52 -0700, Eric Dumazet wrote:
>> > On Tue, 2017-05-09 at 15:07 -0700, Cong Wang wrote:
>> > > On Tue, May 9, 2017 at 1:56 PM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>> > > > Wait... if we transfer dst->dev to loopback_dev because we don't
>> > > > want to block unregister path, then we might have a similar problem
>> > > > for rt->fi too, fib_info is still referenced by dst, so these nh_dev's still
>> > > > hold the dev references...
>> > > >
>> > >
>> > > I finally come up with the attach patch... Do you mind to give it a try?
>> >
>> > I will, but this might be delayed by a few hours.
>> >
>> > In the mean time, it looks like you could try adding the following to
>> > your .config ;)
>> >
>> > CONFIG_IP_ROUTE_MULTIPATH=y
>> >
>> >
>>
>> +                               /* This should be fine, we are on unregister
>> +                                * path so synchronize_net() already waits for
>> +                                * existing readers. We have to release the
>> +                                * dev here because dst could still hold this
>> +                                * fib_info via rt->fi, we can't wait for GC.
>> +                                */
>> +                               RCU_INIT_POINTER(nexthop_nh->nh_dev, NULL);
>> +                               dev_put(dev);
>>                                 dead = fi->fib_nhs;
>>
>> dead = fi->fib_mhs looks wrong if you remove the break; statement ?
>>
>> -                               break;

This statement is only used to ensure we pass the "dead == fi->fib_nhs"
check right below the inner loop, it is fine to keep it without break since
fi is not changed in the inner loop.


>
> Also setting nexthop_nh->nh_dev to NULL looks quite dangerous
>
> We have plenty of sites doing :
>
> if (fi->fib_dev)
>     x = fi->fib_dev->field
>
> fib_route_seq_show() is one example.
>

All of them take RCU read lock, so, as I explained in the code comment,
they all should be fine because of synchronize_net() on unregister path.
Do you see anything otherwise?

^ permalink raw reply

* [PATCH] libertas: Avoid reading past end of buffer
From: Kees Cook @ 2017-05-09 23:23 UTC (permalink / raw)
  To: netdev; +Cc: Kalle Valo, libertas-dev, linux-wireless, linux-kernel,
	Daniel Micay

Using memcpy() from a string that is shorter than the length copied means
the destination buffer is being filled with arbitrary data from the kernel
rodata segment. Instead, use strncpy() which will fill the trailing bytes
with zeros. Additionally adjust indentation to keep checkpatch.pl happy.

This was found with the future CONFIG_FORTIFY_SOURCE feature.

Cc: Daniel Micay <danielmicay@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
---
 drivers/net/wireless/marvell/libertas/mesh.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/wireless/marvell/libertas/mesh.c b/drivers/net/wireless/marvell/libertas/mesh.c
index d0c881dd5846..d0b1948ca242 100644
--- a/drivers/net/wireless/marvell/libertas/mesh.c
+++ b/drivers/net/wireless/marvell/libertas/mesh.c
@@ -1177,9 +1177,9 @@ void lbs_mesh_ethtool_get_strings(struct net_device *dev,
 	switch (stringset) {
 	case ETH_SS_STATS:
 		for (i = 0; i < MESH_STATS_NUM; i++) {
-			memcpy(s + i * ETH_GSTRING_LEN,
-					mesh_stat_strings[i],
-					ETH_GSTRING_LEN);
+			strncpy(s + i * ETH_GSTRING_LEN,
+				mesh_stat_strings[i],
+				ETH_GSTRING_LEN);
 		}
 		break;
 	}
-- 
2.7.4


-- 
Kees Cook
Pixel Security

^ permalink raw reply related

* Re: [Patch net] ipv4: restore rt->fi for reference counting
From: Eric Dumazet @ 2017-05-09 23:09 UTC (permalink / raw)
  To: Cong Wang
  Cc: David Miller, Linux Kernel Network Developers, Andrey Konovalov,
	Eric Dumazet
In-Reply-To: <1494370451.7796.93.camel@edumazet-glaptop3.roam.corp.google.com>

On Tue, 2017-05-09 at 15:54 -0700, Eric Dumazet wrote:
> On Tue, 2017-05-09 at 15:52 -0700, Eric Dumazet wrote:
> > On Tue, 2017-05-09 at 15:07 -0700, Cong Wang wrote:
> > > On Tue, May 9, 2017 at 1:56 PM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> > > > Wait... if we transfer dst->dev to loopback_dev because we don't
> > > > want to block unregister path, then we might have a similar problem
> > > > for rt->fi too, fib_info is still referenced by dst, so these nh_dev's still
> > > > hold the dev references...
> > > >
> > > 
> > > I finally come up with the attach patch... Do you mind to give it a try?
> > 
> > I will, but this might be delayed by a few hours.
> > 
> > In the mean time, it looks like you could try adding the following to
> > your .config ;)
> > 
> > CONFIG_IP_ROUTE_MULTIPATH=y
> > 
> > 
> 
> +                               /* This should be fine, we are on unregister
> +                                * path so synchronize_net() already waits for
> +                                * existing readers. We have to release the
> +                                * dev here because dst could still hold this
> +                                * fib_info via rt->fi, we can't wait for GC.
> +                                */
> +                               RCU_INIT_POINTER(nexthop_nh->nh_dev, NULL);
> +                               dev_put(dev);
>                                 dead = fi->fib_nhs;
> 
> dead = fi->fib_mhs looks wrong if you remove the break; statement ?
> 
> -                               break;

Also setting nexthop_nh->nh_dev to NULL looks quite dangerous

We have plenty of sites doing :

if (fi->fib_dev)
    x = fi->fib_dev->field

fib_route_seq_show() is one example.

^ permalink raw reply

* Re: [PATCH] wcn36xx: Close SMD channel on device removal
From: Bjorn Andersson @ 2017-05-09 23:03 UTC (permalink / raw)
  To: Kalle Valo
  Cc: Eugene Krasnikov, Eyal Ilsar, wcn36xx, linux-wireless, netdev,
	linux-kernel, linux-arm-msm
In-Reply-To: <87h90u3672.fsf@kamboji.qca.qualcomm.com>

On Mon 08 May 23:17 PDT 2017, Kalle Valo wrote:

> Bjorn Andersson <bjorn.andersson@linaro.org> writes:
> 
> > The SMD channel is not the primary WCNSS channel and must explicitly be
> > closed as the device is removed, or the channel will already by open on
> > a subsequent probe call in e.g. the case of reloading the kernel module.
> >
> > This issue was introduced because I simplified the underlying SMD
> > implementation while the SMD adaptions of the driver sat on the mailing
> > list, but missed to update these patches. The patch does however only
> > apply back to the transition to rpmsg, hence the limited Fixes.
> >
> > Fixes: 5052de8deff5 ("soc: qcom: smd: Transition client drivers from smd to rpmsg")
> > Reported-by: Eyal Ilsar <c_eilsar@qti.qualcomm.com>
> > Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
> 
> As this is a regression I'll queue this to 4.12.
> 

Thanks.

> But if this is an older bug (didn't quite understand your description
> though) should there be a separate patch for stable releases?
> 

AFAICT this never worked, as it seems I did the rework in SMD while we
tried to figure out the dependency issues we had with moving to SMD. So
v4.9 through v4.11 has SMD support - with this bug.

How do I proceed, do you want me to write up a fix for stable@? Do I
send that out as an ordinary patch?

Regards,
Bjorn

^ permalink raw reply

* Re: [PATCH net-next] bnxt: add dma mapping attributes
From: Shannon Nelson @ 2017-05-09 22:54 UTC (permalink / raw)
  To: Michael Chan; +Cc: David Miller, Netdev, sparclinux
In-Reply-To: <CACKFLikw_oi4rHRC9P-23xuvuOJSaf9b2kUSeTMgbNW1WA__=w@mail.gmail.com>

On 5/9/2017 2:05 PM, Michael Chan wrote:
> On Tue, May 9, 2017 at 1:37 PM, Shannon Nelson
> <shannon.nelson@oracle.com> wrote:
>> On the SPARC platform we need to use the DMA_ATTR_WEAK_ORDERING attribute
>> in our Rx path dma mapping in order to get the expected performance out
>> of the receive path.  Adding it to the Tx path has little effect, so
>> that's not a part of this patch.
>>
>> Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
>> Reviewed-by: Tushar Dave <tushar.n.dave@oracle.com>
>> Reviewed-by: Tom Saeger <tom.saeger@oracle.com>
>> ---
>>  drivers/net/ethernet/broadcom/bnxt/bnxt.c |   61 ++++++++++++++++++----------
>>  1 files changed, 39 insertions(+), 22 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
>> index 1f1e54b..771742c 100644
>> --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
>> +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
>> @@ -66,6 +66,12 @@
>>  MODULE_DESCRIPTION("Broadcom BCM573xx network driver");
>>  MODULE_VERSION(DRV_MODULE_VERSION);
>>
>> +#ifdef CONFIG_SPARC
>> +#define BNXT_DMA_ATTRS  DMA_ATTR_WEAK_ORDERING
>> +#else
>> +#define BNXT_DMA_ATTRS 0
>> +#endif
>> +
>
> I think we can use the same attribute for all architectures.
> Architectures that don't implement weak ordering will ignore the
> attribute.
>

In the long run, you are probably correct, and it would be simple enough 
to change this.  However, given the recent threads about the 
applicability of Relaxed Ordering and a couple of PCIe root complexes 
that have been found to have issues with Relaxed Ordering TLPs, I prefer 
to stay on the conservative side and set it up only for the platform I 
know.  As it stands, this patch won't change the currently working 
behavior on other platforms, but will help us out on the one we know can 
use the feature.

sln



^ permalink raw reply

* Re: [Patch net] ipv4: restore rt->fi for reference counting
From: Eric Dumazet @ 2017-05-09 22:54 UTC (permalink / raw)
  To: Cong Wang
  Cc: David Miller, Linux Kernel Network Developers, Andrey Konovalov,
	Eric Dumazet
In-Reply-To: <1494370367.7796.92.camel@edumazet-glaptop3.roam.corp.google.com>

On Tue, 2017-05-09 at 15:52 -0700, Eric Dumazet wrote:
> On Tue, 2017-05-09 at 15:07 -0700, Cong Wang wrote:
> > On Tue, May 9, 2017 at 1:56 PM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> > > Wait... if we transfer dst->dev to loopback_dev because we don't
> > > want to block unregister path, then we might have a similar problem
> > > for rt->fi too, fib_info is still referenced by dst, so these nh_dev's still
> > > hold the dev references...
> > >
> > 
> > I finally come up with the attach patch... Do you mind to give it a try?
> 
> I will, but this might be delayed by a few hours.
> 
> In the mean time, it looks like you could try adding the following to
> your .config ;)
> 
> CONFIG_IP_ROUTE_MULTIPATH=y
> 
> 

+                               /* This should be fine, we are on unregister
+                                * path so synchronize_net() already waits for
+                                * existing readers. We have to release the
+                                * dev here because dst could still hold this
+                                * fib_info via rt->fi, we can't wait for GC.
+                                */
+                               RCU_INIT_POINTER(nexthop_nh->nh_dev, NULL);
+                               dev_put(dev);
                                dead = fi->fib_nhs;

dead = fi->fib_mhs looks wrong if you remove the break; statement ?

-                               break;
                        }

^ permalink raw reply

* Re: [Patch net] ipv4: restore rt->fi for reference counting
From: Eric Dumazet @ 2017-05-09 22:52 UTC (permalink / raw)
  To: Cong Wang
  Cc: David Miller, Linux Kernel Network Developers, Andrey Konovalov,
	Eric Dumazet
In-Reply-To: <CAM_iQpVhdzeen+ZJ3jyK7Qb=yVuoRSZ7BFB3RoVMShY8KPBTZA@mail.gmail.com>

On Tue, 2017-05-09 at 15:07 -0700, Cong Wang wrote:
> On Tue, May 9, 2017 at 1:56 PM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> > Wait... if we transfer dst->dev to loopback_dev because we don't
> > want to block unregister path, then we might have a similar problem
> > for rt->fi too, fib_info is still referenced by dst, so these nh_dev's still
> > hold the dev references...
> >
> 
> I finally come up with the attach patch... Do you mind to give it a try?

I will, but this might be delayed by a few hours.

In the mean time, it looks like you could try adding the following to
your .config ;)

CONFIG_IP_ROUTE_MULTIPATH=y

^ permalink raw reply

* Re: [PATCH v2 net] dccp/tcp: do not inherit mc_list from parent
From: Eric Dumazet @ 2017-05-09 22:51 UTC (permalink / raw)
  To: Cong Wang; +Cc: David Miller, netdev, Pray3r, Andrey Konovalov
In-Reply-To: <CAM_iQpXHXhnCJr6TNCUm06TkSQvdz6H+k=avN1AyBO9uBkWZAA@mail.gmail.com>

On Tue, 2017-05-09 at 15:37 -0700, Cong Wang wrote:
> On Tue, May 9, 2017 at 6:29 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > From: Eric Dumazet <edumazet@google.com>
> >
> > syzkaller found a way to trigger double frees from ip_mc_drop_socket()
> >
> > It turns out that leave a copy of parent mc_list at accept() time,
> > which is very bad.
> >
> > Very similar to commit 8b485ce69876 ("tcp: do not inherit
> > fastopen_req from parent")
> >
> > Initial report from Pray3r, completed by Andrey one.
> > Thanks a lot to them !
> >
> > Signed-off-by: Eric Dumazet <edumazet@google.com>
> > Reported-by: Pray3r <pray3r.z@gmail.com>
> > Reported-by: Andrey Konovalov <andreyknvl@google.com>
> > Tested-by: Andrey Konovalov <andreyknvl@google.com>
> > ---
> > v2: fix moved into inet_csk_clone_lock() to fix both DCCP and TCP
> >
> >  net/ipv4/inet_connection_sock.c |    2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
> > index 5e313c1ac94fc88eca5fe3a0e9e46e551e955ff0..1054d330bf9df3189a21dbb08e27c0e6ad136775 100644
> > --- a/net/ipv4/inet_connection_sock.c
> > +++ b/net/ipv4/inet_connection_sock.c
> > @@ -794,6 +794,8 @@ struct sock *inet_csk_clone_lock(const struct sock *sk,
> >                 /* listeners have SOCK_RCU_FREE, not the children */
> >                 sock_reset_flag(newsk, SOCK_RCU_FREE);
> >
> > +               inet_sk(newsk)->mc_list = NULL;
> > +
> >                 newsk->sk_mark = inet_rsk(req)->ir_mark;
> >                 atomic64_set(&newsk->sk_cookie,
> >                              atomic64_read(&inet_rsk(req)->ir_cookie));
> >
> 
> I think IPv6 needs this too?
> 
> diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
> index aeb9497..b3611d9 100644
> --- a/net/ipv6/tcp_ipv6.c
> +++ b/net/ipv6/tcp_ipv6.c
> @@ -1062,6 +1062,7 @@ static struct sock *tcp_v6_syn_recv_sock(const
> struct sock *sk, struct sk_buff *
>                 newtp->af_specific = &tcp_sock_ipv6_mapped_specific;
>  #endif
> 
> +               newnp->ipv6_mc_list = NULL;
>                 newnp->ipv6_ac_list = NULL;
>                 newnp->ipv6_fl_list = NULL;
>                 newnp->pktoptions  = NULL;

Good point, but it looks like you patched on only IPv4 mapped sockets.

And DCCP would need fixes as well.

^ permalink raw reply

* Re: [PATCH v2 net] dccp/tcp: do not inherit mc_list from parent
From: Cong Wang @ 2017-05-09 22:37 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev, Pray3r, Andrey Konovalov
In-Reply-To: <1494336559.7796.78.camel@edumazet-glaptop3.roam.corp.google.com>

On Tue, May 9, 2017 at 6:29 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> From: Eric Dumazet <edumazet@google.com>
>
> syzkaller found a way to trigger double frees from ip_mc_drop_socket()
>
> It turns out that leave a copy of parent mc_list at accept() time,
> which is very bad.
>
> Very similar to commit 8b485ce69876 ("tcp: do not inherit
> fastopen_req from parent")
>
> Initial report from Pray3r, completed by Andrey one.
> Thanks a lot to them !
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Reported-by: Pray3r <pray3r.z@gmail.com>
> Reported-by: Andrey Konovalov <andreyknvl@google.com>
> Tested-by: Andrey Konovalov <andreyknvl@google.com>
> ---
> v2: fix moved into inet_csk_clone_lock() to fix both DCCP and TCP
>
>  net/ipv4/inet_connection_sock.c |    2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
> index 5e313c1ac94fc88eca5fe3a0e9e46e551e955ff0..1054d330bf9df3189a21dbb08e27c0e6ad136775 100644
> --- a/net/ipv4/inet_connection_sock.c
> +++ b/net/ipv4/inet_connection_sock.c
> @@ -794,6 +794,8 @@ struct sock *inet_csk_clone_lock(const struct sock *sk,
>                 /* listeners have SOCK_RCU_FREE, not the children */
>                 sock_reset_flag(newsk, SOCK_RCU_FREE);
>
> +               inet_sk(newsk)->mc_list = NULL;
> +
>                 newsk->sk_mark = inet_rsk(req)->ir_mark;
>                 atomic64_set(&newsk->sk_cookie,
>                              atomic64_read(&inet_rsk(req)->ir_cookie));
>

I think IPv6 needs this too?

diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index aeb9497..b3611d9 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -1062,6 +1062,7 @@ static struct sock *tcp_v6_syn_recv_sock(const
struct sock *sk, struct sk_buff *
                newtp->af_specific = &tcp_sock_ipv6_mapped_specific;
 #endif

+               newnp->ipv6_mc_list = NULL;
                newnp->ipv6_ac_list = NULL;
                newnp->ipv6_fl_list = NULL;
                newnp->pktoptions  = NULL;

^ permalink raw reply related

* Re: [Patch net] ipv4: restore rt->fi for reference counting
From: Cong Wang @ 2017-05-09 22:07 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David Miller, Linux Kernel Network Developers, Andrey Konovalov,
	Eric Dumazet
In-Reply-To: <CAM_iQpXADa6AyQ3X-EbmRPGCQArgPWygvHMYgWtfUYV082oF3Q@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 397 bytes --]

On Tue, May 9, 2017 at 1:56 PM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> Wait... if we transfer dst->dev to loopback_dev because we don't
> want to block unregister path, then we might have a similar problem
> for rt->fi too, fib_info is still referenced by dst, so these nh_dev's still
> hold the dev references...
>

I finally come up with the attach patch... Do you mind to give it a try?

[-- Attachment #2: ipv4-rt-fi-ref-count.diff --]
[-- Type: text/plain, Size: 2821 bytes --]

commit a431b4d969f617c5ef8711b6bf493199eecec759
Author: Cong Wang <xiyou.wangcong@gmail.com>
Date:   Tue May 9 14:35:10 2017 -0700

    ipv4: restore rt->fi for reference counting, try #2
    
    Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>

diff --git a/include/net/route.h b/include/net/route.h
index 2cc0e14..4335eb7 100644
--- a/include/net/route.h
+++ b/include/net/route.h
@@ -69,6 +69,7 @@ struct rtable {
 
 	struct list_head	rt_uncached;
 	struct uncached_list	*rt_uncached_list;
+	struct fib_info		*fi; /* for refcnt to shared metrics */
 };
 
 static inline bool rt_is_input_route(const struct rtable *rt)
diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
index da449dd..d77c453 100644
--- a/net/ipv4/fib_semantics.c
+++ b/net/ipv4/fib_semantics.c
@@ -1429,8 +1429,15 @@ int fib_sync_down_dev(struct net_device *dev, unsigned long event, bool force)
 #ifdef CONFIG_IP_ROUTE_MULTIPATH
 			if (event == NETDEV_UNREGISTER &&
 			    nexthop_nh->nh_dev == dev) {
+				/* This should be fine, we are on unregister
+				 * path so synchronize_net() already waits for
+				 * existing readers. We have to release the
+				 * dev here because dst could still hold this
+				 * fib_info via rt->fi, we can't wait for GC.
+				 */
+				RCU_INIT_POINTER(nexthop_nh->nh_dev, NULL);
+				dev_put(dev);
 				dead = fi->fib_nhs;
-				break;
 			}
 #endif
 		} endfor_nexthops(fi)
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 655d9ee..f647310 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1387,6 +1387,11 @@ static void ipv4_dst_destroy(struct dst_entry *dst)
 {
 	struct rtable *rt = (struct rtable *) dst;
 
+	if (rt->fi) {
+		fib_info_put(rt->fi);
+		rt->fi = NULL;
+	}
+
 	if (!list_empty(&rt->rt_uncached)) {
 		struct uncached_list *ul = rt->rt_uncached_list;
 
@@ -1424,6 +1429,16 @@ static bool rt_cache_valid(const struct rtable *rt)
 		!rt_is_expired(rt);
 }
 
+static void rt_init_metrics(struct rtable *rt, struct fib_info *fi)
+{
+	if (fi->fib_metrics != (u32 *)dst_default_metrics) {
+		fib_info_hold(fi);
+		rt->fi = fi;
+	}
+
+	dst_init_metrics(&rt->dst, fi->fib_metrics, true);
+}
+
 static void rt_set_nexthop(struct rtable *rt, __be32 daddr,
 			   const struct fib_result *res,
 			   struct fib_nh_exception *fnhe,
@@ -1438,7 +1453,7 @@ static void rt_set_nexthop(struct rtable *rt, __be32 daddr,
 			rt->rt_gateway = nh->nh_gw;
 			rt->rt_uses_gateway = 1;
 		}
-		dst_init_metrics(&rt->dst, fi->fib_metrics, true);
+		rt_init_metrics(rt, fi);
 #ifdef CONFIG_IP_ROUTE_CLASSID
 		rt->dst.tclassid = nh->nh_tclassid;
 #endif
@@ -1490,6 +1505,7 @@ struct rtable *rt_dst_alloc(struct net_device *dev,
 		rt->rt_gateway = 0;
 		rt->rt_uses_gateway = 0;
 		rt->rt_table_id = 0;
+		rt->fi = NULL;
 		INIT_LIST_HEAD(&rt->rt_uncached);
 
 		rt->dst.output = ip_output;

^ permalink raw reply related

* Re: [PATCH net-next] bnxt: add dma mapping attributes
From: Michael Chan @ 2017-05-09 21:05 UTC (permalink / raw)
  To: Shannon Nelson; +Cc: David Miller, Netdev, sparclinux
In-Reply-To: <1494362253-270990-1-git-send-email-shannon.nelson@oracle.com>

On Tue, May 9, 2017 at 1:37 PM, Shannon Nelson
<shannon.nelson@oracle.com> wrote:
> On the SPARC platform we need to use the DMA_ATTR_WEAK_ORDERING attribute
> in our Rx path dma mapping in order to get the expected performance out
> of the receive path.  Adding it to the Tx path has little effect, so
> that's not a part of this patch.
>
> Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
> Reviewed-by: Tushar Dave <tushar.n.dave@oracle.com>
> Reviewed-by: Tom Saeger <tom.saeger@oracle.com>
> ---
>  drivers/net/ethernet/broadcom/bnxt/bnxt.c |   61 ++++++++++++++++++----------
>  1 files changed, 39 insertions(+), 22 deletions(-)
>
> diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> index 1f1e54b..771742c 100644
> --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> @@ -66,6 +66,12 @@
>  MODULE_DESCRIPTION("Broadcom BCM573xx network driver");
>  MODULE_VERSION(DRV_MODULE_VERSION);
>
> +#ifdef CONFIG_SPARC
> +#define BNXT_DMA_ATTRS  DMA_ATTR_WEAK_ORDERING
> +#else
> +#define BNXT_DMA_ATTRS 0
> +#endif
> +

I think we can use the same attribute for all architectures.
Architectures that don't implement weak ordering will ignore the
attribute.

^ permalink raw reply

* Re: [Patch net] ipv4: restore rt->fi for reference counting
From: Cong Wang @ 2017-05-09 20:56 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David Miller, Linux Kernel Network Developers, Andrey Konovalov,
	Eric Dumazet
In-Reply-To: <1494348962.7796.88.camel@edumazet-glaptop3.roam.corp.google.com>

On Tue, May 9, 2017 at 9:56 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Tue, 2017-05-09 at 09:44 -0700, Cong Wang wrote:
>
>>
>> Eric, how did you produce it?
>> I guess it's because of nh_dev which is the only netdevice pointer inside
>> fib_info. Let me take a deeper look.
>>
>
> Nothing particular, I am using kexec to boot new kernels, and all my
> attempts with your patch included demonstrated the issue.


Interesting. So this happens on your eth0 teardown path, but
I don't see any problem in the related NETDEV_UNREGISTER
notifiers yet, we don't call dst_destroy() on this path and will defer it
to GC, but that should not be delayed for so long?

Wait... if we transfer dst->dev to loopback_dev because we don't
want to block unregister path, then we might have a similar problem
for rt->fi too, fib_info is still referenced by dst, so these nh_dev's still
hold the dev references...

Something like this (just for proof of concept):

diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
index da449dd..85cd614 100644
--- a/net/ipv4/fib_semantics.c
+++ b/net/ipv4/fib_semantics.c
@@ -205,7 +205,7 @@ static void free_fib_info_rcu(struct rcu_head *head)
        struct fib_info *fi = container_of(head, struct fib_info, rcu);

        change_nexthops(fi) {
-               if (nexthop_nh->nh_dev)
+               if (nexthop_nh->nh_dev && !(nexthop_nh->nh_flags & RTNH_F_DEAD))
                        dev_put(nexthop_nh->nh_dev);
                lwtstate_put(nexthop_nh->nh_lwtstate);
                free_nh_exceptions(nexthop_nh);
@@ -1414,8 +1414,10 @@ int fib_sync_down_dev(struct net_device *dev,
unsigned long event, bool force)
                        else if (nexthop_nh->nh_dev == dev &&
                                 nexthop_nh->nh_scope != scope) {
                                switch (event) {
-                               case NETDEV_DOWN:
                                case NETDEV_UNREGISTER:
+                                       dev_put(dev);
+                                       /* fall through */
+                               case NETDEV_DOWN:
                                        nexthop_nh->nh_flags |= RTNH_F_DEAD;
                                        /* fall through */
                                case NETDEV_CHANGE:

^ permalink raw reply related

* [PATCH net-next] bnxt: add dma mapping attributes
From: Shannon Nelson @ 2017-05-09 20:37 UTC (permalink / raw)
  To: davem, netdev; +Cc: sparclinux, Shannon Nelson

On the SPARC platform we need to use the DMA_ATTR_WEAK_ORDERING attribute
in our Rx path dma mapping in order to get the expected performance out
of the receive path.  Adding it to the Tx path has little effect, so
that's not a part of this patch.

Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
Reviewed-by: Tushar Dave <tushar.n.dave@oracle.com>
Reviewed-by: Tom Saeger <tom.saeger@oracle.com>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c |   61 ++++++++++++++++++----------
 1 files changed, 39 insertions(+), 22 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 1f1e54b..771742c 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -66,6 +66,12 @@
 MODULE_DESCRIPTION("Broadcom BCM573xx network driver");
 MODULE_VERSION(DRV_MODULE_VERSION);
 
+#ifdef CONFIG_SPARC
+#define BNXT_DMA_ATTRS  DMA_ATTR_WEAK_ORDERING
+#else
+#define BNXT_DMA_ATTRS	0
+#endif
+
 #define BNXT_RX_OFFSET (NET_SKB_PAD + NET_IP_ALIGN)
 #define BNXT_RX_DMA_OFFSET NET_SKB_PAD
 #define BNXT_RX_COPY_THRESH 256
@@ -582,7 +588,8 @@ static void bnxt_tx_int(struct bnxt *bp, struct bnxt_napi *bnapi, int nr_pkts)
 	if (!page)
 		return NULL;
 
-	*mapping = dma_map_page(dev, page, 0, PAGE_SIZE, bp->rx_dir);
+	*mapping = dma_map_page_attrs(dev, page, 0, PAGE_SIZE, bp->rx_dir,
+				      BNXT_DMA_ATTRS);
 	if (dma_mapping_error(dev, *mapping)) {
 		__free_page(page);
 		return NULL;
@@ -601,8 +608,9 @@ static void bnxt_tx_int(struct bnxt *bp, struct bnxt_napi *bnapi, int nr_pkts)
 	if (!data)
 		return NULL;
 
-	*mapping = dma_map_single(&pdev->dev, data + bp->rx_dma_offset,
-				  bp->rx_buf_use_size, bp->rx_dir);
+	*mapping = dma_map_single_attrs(&pdev->dev, data + bp->rx_dma_offset,
+					bp->rx_buf_use_size, bp->rx_dir,
+					BNXT_DMA_ATTRS);
 
 	if (dma_mapping_error(&pdev->dev, *mapping)) {
 		kfree(data);
@@ -705,8 +713,9 @@ static inline int bnxt_alloc_rx_page(struct bnxt *bp,
 			return -ENOMEM;
 	}
 
-	mapping = dma_map_page(&pdev->dev, page, offset, BNXT_RX_PAGE_SIZE,
-			       PCI_DMA_FROMDEVICE);
+	mapping = dma_map_page_attrs(&pdev->dev, page, offset,
+				     BNXT_RX_PAGE_SIZE, PCI_DMA_FROMDEVICE,
+				     BNXT_DMA_ATTRS);
 	if (dma_mapping_error(&pdev->dev, mapping)) {
 		__free_page(page);
 		return -EIO;
@@ -799,7 +808,8 @@ static void bnxt_reuse_rx_agg_bufs(struct bnxt_napi *bnapi, u16 cp_cons,
 		return NULL;
 	}
 	dma_addr -= bp->rx_dma_offset;
-	dma_unmap_page(&bp->pdev->dev, dma_addr, PAGE_SIZE, bp->rx_dir);
+	dma_unmap_page_attrs(&bp->pdev->dev, dma_addr, PAGE_SIZE, bp->rx_dir,
+			     BNXT_DMA_ATTRS);
 
 	if (unlikely(!payload))
 		payload = eth_get_headlen(data_ptr, len);
@@ -841,8 +851,8 @@ static void bnxt_reuse_rx_agg_bufs(struct bnxt_napi *bnapi, u16 cp_cons,
 	}
 
 	skb = build_skb(data, 0);
-	dma_unmap_single(&bp->pdev->dev, dma_addr, bp->rx_buf_use_size,
-			 bp->rx_dir);
+	dma_unmap_single_attrs(&bp->pdev->dev, dma_addr, bp->rx_buf_use_size,
+			       bp->rx_dir, BNXT_DMA_ATTRS);
 	if (!skb) {
 		kfree(data);
 		return NULL;
@@ -909,8 +919,8 @@ static void bnxt_reuse_rx_agg_bufs(struct bnxt_napi *bnapi, u16 cp_cons,
 			return NULL;
 		}
 
-		dma_unmap_page(&pdev->dev, mapping, BNXT_RX_PAGE_SIZE,
-			       PCI_DMA_FROMDEVICE);
+		dma_unmap_page_attrs(&pdev->dev, mapping, BNXT_RX_PAGE_SIZE,
+				     PCI_DMA_FROMDEVICE, BNXT_DMA_ATTRS);
 
 		skb->data_len += frag_len;
 		skb->len += frag_len;
@@ -1329,8 +1339,9 @@ static void bnxt_abort_tpa(struct bnxt *bp, struct bnxt_napi *bnapi,
 		tpa_info->mapping = new_mapping;
 
 		skb = build_skb(data, 0);
-		dma_unmap_single(&bp->pdev->dev, mapping, bp->rx_buf_use_size,
-				 bp->rx_dir);
+		dma_unmap_single_attrs(&bp->pdev->dev, mapping,
+				       bp->rx_buf_use_size, bp->rx_dir,
+				       BNXT_DMA_ATTRS);
 
 		if (!skb) {
 			kfree(data);
@@ -1971,9 +1982,11 @@ static void bnxt_free_rx_skbs(struct bnxt *bp)
 				if (!data)
 					continue;
 
-				dma_unmap_single(&pdev->dev, tpa_info->mapping,
-						 bp->rx_buf_use_size,
-						 bp->rx_dir);
+				dma_unmap_single_attrs(&pdev->dev,
+						       tpa_info->mapping,
+						       bp->rx_buf_use_size,
+						       bp->rx_dir,
+						       BNXT_DMA_ATTRS);
 
 				tpa_info->data = NULL;
 
@@ -1993,13 +2006,15 @@ static void bnxt_free_rx_skbs(struct bnxt *bp)
 
 			if (BNXT_RX_PAGE_MODE(bp)) {
 				mapping -= bp->rx_dma_offset;
-				dma_unmap_page(&pdev->dev, mapping,
-					       PAGE_SIZE, bp->rx_dir);
+				dma_unmap_page_attrs(&pdev->dev, mapping,
+						     PAGE_SIZE, bp->rx_dir,
+						     BNXT_DMA_ATTRS);
 				__free_page(data);
 			} else {
-				dma_unmap_single(&pdev->dev, mapping,
-						 bp->rx_buf_use_size,
-						 bp->rx_dir);
+				dma_unmap_single_attrs(&pdev->dev, mapping,
+						       bp->rx_buf_use_size,
+						       bp->rx_dir,
+						       BNXT_DMA_ATTRS);
 				kfree(data);
 			}
 		}
@@ -2012,8 +2027,10 @@ static void bnxt_free_rx_skbs(struct bnxt *bp)
 			if (!page)
 				continue;
 
-			dma_unmap_page(&pdev->dev, rx_agg_buf->mapping,
-				       BNXT_RX_PAGE_SIZE, PCI_DMA_FROMDEVICE);
+			dma_unmap_page_attrs(&pdev->dev, rx_agg_buf->mapping,
+					     BNXT_RX_PAGE_SIZE,
+					     PCI_DMA_FROMDEVICE,
+					     BNXT_DMA_ATTRS);
 
 			rx_agg_buf->page = NULL;
 			__clear_bit(j, rxr->rx_agg_bmap);
-- 
1.7.1


^ permalink raw reply related

* Re: arch: arm: bpf: Converting cBPF to eBPF for arm 32 bit
From: Daniel Borkmann @ 2017-05-09 20:25 UTC (permalink / raw)
  To: Shubham Bansal
  Cc: David Miller, Kees Cook, Mircea Gherzan, Network Development,
	kernel-hardening, linux-arm-kernel, ast
In-Reply-To: <CAHgaXd+xb5dN90sH___RtxgSC3usnH2jXkA5r3=fQJc3pOY5xw@mail.gmail.com>

On 05/09/2017 10:12 PM, Shubham Bansal wrote:
> Hi Daniel,
>
> I just tried running test_bpf.ko module.
>
> $ echo 2 >>  /proc/sys/net/core/bpf_jit_enable
> $ insmod test_bpf.ko
>
> test_bpf: #0 TAX
> bpf_jit: flen=14 proglen=212 pass=2 image=7f15a83c from=insmod pid=730
> JIT code: 00000000: f0 05 2d e9 40 d2 4d e2 00 40 a0 e3 0c 42 8d e5
> JIT code: 00000010: 08 42 8d e5 00 00 20 e0 01 10 21 e0 20 62 9d e5
> JIT code: 00000020: 20 72 9d e5 06 70 27 e0 20 72 8d e5 24 62 9d e5
> JIT code: 00000030: 24 72 9d e5 06 70 27 e0 24 72 8d e5 00 40 a0 e1
> JIT code: 00000040: 01 50 a0 e1 01 00 a0 e3 00 10 a0 e3 20 02 8d e5
> JIT code: 00000050: 24 12 8d e5 02 00 a0 e3 00 10 a0 e3 20 62 9d e5
> JIT code: 00000060: 06 00 80 e0 00 10 a0 e3 00 00 60 e2 00 10 a0 e3
> JIT code: 00000070: 20 02 8d e5 24 12 8d e5 54 40 90 e5 20 62 9d e5
> JIT code: 00000080: 06 00 80 e0 00 10 a0 e3 20 02 8d e5 24 12 8d e5
> JIT code: 00000090: 04 00 a0 e1 01 10 a0 e3 20 62 9d e5 06 10 81 e0
> JIT code: 000000a0: 01 20 a0 e3 04 32 8d e2 bc 68 0a e3 11 60 48 e3
> JIT code: 000000b0: 36 ff 2f e1 01 10 21 e0 00 00 50 e3 04 00 00 0a
> JIT code: 000000c0: 00 00 d0 e5 01 00 00 ea 40 d2 8d e2 f0 05 bd e8
> JIT code: 000000d0: 1e ff 2f e1
> jited:1
> Unhandled fault: page domain fault (0x01b) at 0x00000051
> pgd = 871d0000
> [00000051] *pgd=671b7831, *pte=00000000, *ppte=00000000
> Internal error: : 1b [#1] SMP ARM
> Modules linked in: test_bpf(+)
> CPU: 0 PID: 730 Comm: insmod Not tainted 4.11.0+ #5
> Hardware name: ARM-Versatile Express
> task: 87023700 task.stack: 8718a000
> PC is at 0x7f15a8b4
> LR is at test_bpf_init+0x5bc/0x1000 [test_bpf]
> pc : [<7f15a8b4>]    lr : [<7f1575bc>]    psr: 80000013
> sp : 8718bd7c  ip : 00000015  fp : 7f005008
> r10: 7f005094  r9 : 893ba020  r8 : 893ba000
> r7 : 00000000  r6 : 00000001  r5 : 00000000  r4 : 00000000
> r3 : 7f15a83c  r2 : 893ba020  r1 : 00000000  r0 : fffffffd
> Flags: Nzcv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
> Control: 10c5387d  Table: 671d0059  DAC: 00000051
> Process insmod (pid: 730, stack limit = 0x8718a210)
> Stack: (0x8718bd7c to 0x8718c000)
> bd60:                                                                00000000
> bd80: 00002710 870db300 c302e7e8 7f004010 893ba000 7f005094 00000000 00000000
> bda0: 00000000 00000000 00000000 00000001 00000001 00000000 014000c0 00150628
> bdc0: 7f0050ac 7f154840 1234aaaa 1234aaab c302e7e8 0000000f 00000000 893ba000
> bde0: 0000000b 7f004010 87fd54a0 ffffe000 7f157000 00000000 871b6fc0 00000001
> be00: 78e4905c 00000024 7f154640 8010179c 80a06544 8718a000 00000001 80a54980
> be20: 80a3066c 00000007 809685c0 80a54700 80a54700 07551000 80a54700 60070013
> be40: 7f154640 801f3fc8 78e4905c 7f154640 00000001 871b6fe4 7f154640 00000001
> be60: 871b6b00 00000001 78e4905c 801eaa94 00000001 871b6fe4 8718bf44 00000001
> be80: 871b6fe4 80196e4c 7f15464c 00007fff 7f154640 80193f10 87127000 7f154640
> bea0: 7f154688 80703800 7f154770 807037e4 8081b184 807bec60 807becc4 807bec6c
> bec0: 7f15481c 8010c1b8 93600000 76ed8028 00000f60 00000000 00000000 00000000
> bee0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> bf00: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00003f80
> bf20: 76f5cf88 00000000 93684f80 8718a000 00160fda 00000051 00000000 801973b0
> bf40: 87671a00 93501000 00183f80 93684760 93684574 936788e0 00155000 00155290
> bf60: 00000000 00000000 00000000 00001f64 00000032 00000033 0000001d 00000000
> bf80: 00000017 00000000 00000000 00183f80 756e694c 00000080 80107684 fffffffd
> bfa0: 00000000 801074c0 00000000 00183f80 76dd9008 00183f80 00160fda 00000000
> bfc0: 00000000 00183f80 756e694c 00000080 00000001 7eabae2c 00172f8c 00000000
> bfe0: 7eabaae0 7eabaad0 0004017f 00013172 60070030 76dd9008 00000000 00000000
> [<7f1575bc>] (test_bpf_init [test_bpf]) from [<7f157000>]
> (test_bpf_init+0x0/0x1000 [test_bpf])
> [<7f157000>] (test_bpf_init [test_bpf]) from [<78e4905c>] (0x78e4905c)
> Code: e2600000 e3a01000 e58d0220 e58d1224 (e5904054)
> ---[ end trace a36398923b914fe2 ]---
> Segmentation fault
>
> Why is trying to execute TAX which is a cBPF instruction?

Kernel translates this to eBPF internally (bpf_prepare_filter() ->
bpf_migrate_filter()), no cBPF will see the JIT directly.

Is your implementation still using bpf_jit_compile() callback as
opposed to bpf_int_jit_compile()?!

Cheers,
Daniel

^ permalink raw reply

* Re: arch: arm: bpf: Converting cBPF to eBPF for arm 32 bit
From: David Miller @ 2017-05-09 20:19 UTC (permalink / raw)
  To: illusionist.neo
  Cc: daniel, keescook, mgherzan, netdev, kernel-hardening,
	linux-arm-kernel, ast
In-Reply-To: <CAHgaXd+xb5dN90sH___RtxgSC3usnH2jXkA5r3=fQJc3pOY5xw@mail.gmail.com>

From: Shubham Bansal <illusionist.neo@gmail.com>
Date: Wed, 10 May 2017 01:42:10 +0530

> Why is trying to execute TAX which is a cBPF instruction?

Because some of the tests are classic BPF programs which
get translated into eBPF ones and sent to the JIT for
compilation.

^ permalink raw reply

* [PATCH nf] xtables: zero padding in data_to_user
From: Willem de Bruijn @ 2017-05-09 20:17 UTC (permalink / raw)
  To: netfilter-devel
  Cc: netdev, rgb, fwestpha, pmoore, pvrabec, pablo, davem,
	Willem de Bruijn

From: Willem de Bruijn <willemb@google.com>

When looking up an iptables rule, the iptables binary compares the
aligned match and target data (XT_ALIGN). In some cases this can
exceed the actual data size to include padding bytes.

Before commit f77bc5b23fb1 ("iptables: use match, target and data
copy_to_user helpers") the malloc()ed bytes were overwritten by the
kernel with kzalloced contents, zeroing the padding and making the
comparison succeed. After this patch, the kernel copies and clears
only data, leaving the padding bytes undefined.

Extend the clear operation from data size to aligned data size to
include the padding bytes, if any.

Padding bytes can be observed in both match and target, and the bug
triggered, by issuing a rule with match icmp and target ACCEPT:

  iptables -t mangle -A INPUT -i lo -p icmp --icmp-type 1 -j ACCEPT
  iptables -t mangle -D INPUT -i lo -p icmp --icmp-type 1 -j ACCEPT

Fixes: f77bc5b23fb1 ("iptables: use match, target and data copy_to_user helpers")
Reported-by: Paul Moore <pmoore@redhat.com>
Reported-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
---
 include/linux/netfilter/x_tables.h | 2 +-
 net/bridge/netfilter/ebtables.c    | 9 ++++++---
 net/netfilter/x_tables.c           | 9 ++++++---
 3 files changed, 13 insertions(+), 7 deletions(-)

diff --git a/include/linux/netfilter/x_tables.h b/include/linux/netfilter/x_tables.h
index be378cf47fcc..b3044c2c62cb 100644
--- a/include/linux/netfilter/x_tables.h
+++ b/include/linux/netfilter/x_tables.h
@@ -294,7 +294,7 @@ int xt_match_to_user(const struct xt_entry_match *m,
 int xt_target_to_user(const struct xt_entry_target *t,
 		      struct xt_entry_target __user *u);
 int xt_data_to_user(void __user *dst, const void *src,
-		    int usersize, int size);
+		    int usersize, int size, int aligned_size);
 
 void *xt_copy_counters_from_user(const void __user *user, unsigned int len,
 				 struct xt_counters_info *info, bool compat);
diff --git a/net/bridge/netfilter/ebtables.c b/net/bridge/netfilter/ebtables.c
index 9ec0c9f908fa..9c6e619f452b 100644
--- a/net/bridge/netfilter/ebtables.c
+++ b/net/bridge/netfilter/ebtables.c
@@ -1373,7 +1373,8 @@ static inline int ebt_obj_to_user(char __user *um, const char *_name,
 	strlcpy(name, _name, sizeof(name));
 	if (copy_to_user(um, name, EBT_FUNCTION_MAXNAMELEN) ||
 	    put_user(datasize, (int __user *)(um + EBT_FUNCTION_MAXNAMELEN)) ||
-	    xt_data_to_user(um + entrysize, data, usersize, datasize))
+	    xt_data_to_user(um + entrysize, data, usersize, datasize,
+			    XT_ALIGN(datasize)))
 		return -EFAULT;
 
 	return 0;
@@ -1658,7 +1659,8 @@ static int compat_match_to_user(struct ebt_entry_match *m, void __user **dstptr,
 		if (match->compat_to_user(cm->data, m->data))
 			return -EFAULT;
 	} else {
-		if (xt_data_to_user(cm->data, m->data, match->usersize, msize))
+		if (xt_data_to_user(cm->data, m->data, match->usersize, msize,
+				    COMPAT_XT_ALIGN(msize)))
 			return -EFAULT;
 	}
 
@@ -1687,7 +1689,8 @@ static int compat_target_to_user(struct ebt_entry_target *t,
 		if (target->compat_to_user(cm->data, t->data))
 			return -EFAULT;
 	} else {
-		if (xt_data_to_user(cm->data, t->data, target->usersize, tsize))
+		if (xt_data_to_user(cm->data, t->data, target->usersize, tsize,
+				    COMPAT_XT_ALIGN(tsize)))
 			return -EFAULT;
 	}
 
diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c
index f134d384852f..c64716a735b0 100644
--- a/net/netfilter/x_tables.c
+++ b/net/netfilter/x_tables.c
@@ -283,12 +283,13 @@ static int xt_obj_to_user(u16 __user *psize, u16 size,
 		       &U->u.user.revision, K->u.kernel.TYPE->revision)
 
 int xt_data_to_user(void __user *dst, const void *src,
-		    int usersize, int size)
+		    int usersize, int size, int aligned_size)
 {
 	usersize = usersize ? : size;
 	if (copy_to_user(dst, src, usersize))
 		return -EFAULT;
-	if (usersize != size && clear_user(dst + usersize, size - usersize))
+	if (usersize != aligned_size &&
+	    clear_user(dst + usersize, aligned_size - usersize))
 		return -EFAULT;
 
 	return 0;
@@ -298,7 +299,9 @@ EXPORT_SYMBOL_GPL(xt_data_to_user);
 #define XT_DATA_TO_USER(U, K, TYPE, C_SIZE)				\
 	xt_data_to_user(U->data, K->data,				\
 			K->u.kernel.TYPE->usersize,			\
-			C_SIZE ? : K->u.kernel.TYPE->TYPE##size)
+			C_SIZE ? : K->u.kernel.TYPE->TYPE##size,	\
+			C_SIZE ? COMPAT_XT_ALIGN(C_SIZE) :		\
+				 XT_ALIGN(K->u.kernel.TYPE->TYPE##size))
 
 int xt_match_to_user(const struct xt_entry_match *m,
 		     struct xt_entry_match __user *u)
-- 
2.13.0.rc2.291.g57267f2277-goog

^ permalink raw reply related

* Re: arch: arm: bpf: Converting cBPF to eBPF for arm 32 bit
From: Shubham Bansal @ 2017-05-09 20:12 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: David Miller, Kees Cook, Mircea Gherzan, Network Development,
	kernel-hardening, linux-arm-kernel, ast
In-Reply-To: <58E639E0.1010700@iogearbox.net>

Hi Daniel,

I just tried running test_bpf.ko module.

$ echo 2 >>  /proc/sys/net/core/bpf_jit_enable
$ insmod test_bpf.ko

test_bpf: #0 TAX
bpf_jit: flen=14 proglen=212 pass=2 image=7f15a83c from=insmod pid=730
JIT code: 00000000: f0 05 2d e9 40 d2 4d e2 00 40 a0 e3 0c 42 8d e5
JIT code: 00000010: 08 42 8d e5 00 00 20 e0 01 10 21 e0 20 62 9d e5
JIT code: 00000020: 20 72 9d e5 06 70 27 e0 20 72 8d e5 24 62 9d e5
JIT code: 00000030: 24 72 9d e5 06 70 27 e0 24 72 8d e5 00 40 a0 e1
JIT code: 00000040: 01 50 a0 e1 01 00 a0 e3 00 10 a0 e3 20 02 8d e5
JIT code: 00000050: 24 12 8d e5 02 00 a0 e3 00 10 a0 e3 20 62 9d e5
JIT code: 00000060: 06 00 80 e0 00 10 a0 e3 00 00 60 e2 00 10 a0 e3
JIT code: 00000070: 20 02 8d e5 24 12 8d e5 54 40 90 e5 20 62 9d e5
JIT code: 00000080: 06 00 80 e0 00 10 a0 e3 20 02 8d e5 24 12 8d e5
JIT code: 00000090: 04 00 a0 e1 01 10 a0 e3 20 62 9d e5 06 10 81 e0
JIT code: 000000a0: 01 20 a0 e3 04 32 8d e2 bc 68 0a e3 11 60 48 e3
JIT code: 000000b0: 36 ff 2f e1 01 10 21 e0 00 00 50 e3 04 00 00 0a
JIT code: 000000c0: 00 00 d0 e5 01 00 00 ea 40 d2 8d e2 f0 05 bd e8
JIT code: 000000d0: 1e ff 2f e1
jited:1
Unhandled fault: page domain fault (0x01b) at 0x00000051
pgd = 871d0000
[00000051] *pgd=671b7831, *pte=00000000, *ppte=00000000
Internal error: : 1b [#1] SMP ARM
Modules linked in: test_bpf(+)
CPU: 0 PID: 730 Comm: insmod Not tainted 4.11.0+ #5
Hardware name: ARM-Versatile Express
task: 87023700 task.stack: 8718a000
PC is at 0x7f15a8b4
LR is at test_bpf_init+0x5bc/0x1000 [test_bpf]
pc : [<7f15a8b4>]    lr : [<7f1575bc>]    psr: 80000013
sp : 8718bd7c  ip : 00000015  fp : 7f005008
r10: 7f005094  r9 : 893ba020  r8 : 893ba000
r7 : 00000000  r6 : 00000001  r5 : 00000000  r4 : 00000000
r3 : 7f15a83c  r2 : 893ba020  r1 : 00000000  r0 : fffffffd
Flags: Nzcv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
Control: 10c5387d  Table: 671d0059  DAC: 00000051
Process insmod (pid: 730, stack limit = 0x8718a210)
Stack: (0x8718bd7c to 0x8718c000)
bd60:                                                                00000000
bd80: 00002710 870db300 c302e7e8 7f004010 893ba000 7f005094 00000000 00000000
bda0: 00000000 00000000 00000000 00000001 00000001 00000000 014000c0 00150628
bdc0: 7f0050ac 7f154840 1234aaaa 1234aaab c302e7e8 0000000f 00000000 893ba000
bde0: 0000000b 7f004010 87fd54a0 ffffe000 7f157000 00000000 871b6fc0 00000001
be00: 78e4905c 00000024 7f154640 8010179c 80a06544 8718a000 00000001 80a54980
be20: 80a3066c 00000007 809685c0 80a54700 80a54700 07551000 80a54700 60070013
be40: 7f154640 801f3fc8 78e4905c 7f154640 00000001 871b6fe4 7f154640 00000001
be60: 871b6b00 00000001 78e4905c 801eaa94 00000001 871b6fe4 8718bf44 00000001
be80: 871b6fe4 80196e4c 7f15464c 00007fff 7f154640 80193f10 87127000 7f154640
bea0: 7f154688 80703800 7f154770 807037e4 8081b184 807bec60 807becc4 807bec6c
bec0: 7f15481c 8010c1b8 93600000 76ed8028 00000f60 00000000 00000000 00000000
bee0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
bf00: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00003f80
bf20: 76f5cf88 00000000 93684f80 8718a000 00160fda 00000051 00000000 801973b0
bf40: 87671a00 93501000 00183f80 93684760 93684574 936788e0 00155000 00155290
bf60: 00000000 00000000 00000000 00001f64 00000032 00000033 0000001d 00000000
bf80: 00000017 00000000 00000000 00183f80 756e694c 00000080 80107684 fffffffd
bfa0: 00000000 801074c0 00000000 00183f80 76dd9008 00183f80 00160fda 00000000
bfc0: 00000000 00183f80 756e694c 00000080 00000001 7eabae2c 00172f8c 00000000
bfe0: 7eabaae0 7eabaad0 0004017f 00013172 60070030 76dd9008 00000000 00000000
[<7f1575bc>] (test_bpf_init [test_bpf]) from [<7f157000>]
(test_bpf_init+0x0/0x1000 [test_bpf])
[<7f157000>] (test_bpf_init [test_bpf]) from [<78e4905c>] (0x78e4905c)
Code: e2600000 e3a01000 e58d0220 e58d1224 (e5904054)
---[ end trace a36398923b914fe2 ]---
Segmentation fault

Why is trying to execute TAX which is a cBPF instruction?

Best,
Shubham Bansal


On Thu, Apr 6, 2017 at 6:21 PM, Daniel Borkmann <daniel@iogearbox.net> wrote:
> On 04/06/2017 01:05 PM, Shubham Bansal wrote:
>>
>> Gentle Reminder.
>
>
> Sorry for late reply.
>
>> Anybody can tell me how to test the JIT compiler ?
>
>
> There's lib/test_bpf.c, see Documentation/networking/filter.txt +1349
> for some more information. It basically contains various test cases that
> have the purpose to test the JIT with corner cases. If you see a useful
> test missing, please send a patch for it, so all other JITs can benefit
> from this as well. For extracting disassembly from a generated test case,
> check out bpf_jit_disasm (Documentation/networking/filter.txt +486).
>
> Thanks,
> Daniel

^ permalink raw reply

* [GIT] Networking
From: David Miller @ 2017-05-09 20:03 UTC (permalink / raw)
  To: torvalds; +Cc: akpm, netdev, linux-kernel


1) Fix multiqueue in stmmac driver on PCI, from Andy Shevchenko.

2) cdc_ncm doesn't actually fully zero out the padding area is
   allocates on TX, from Jim Baxter.

3) Don't leak map addresses in BPF verifier, from Daniel Borkmann.

4) If we randomize TCP timestamps, we have to do it everywhere
   including SYN cookies.  From Eric Dumazet.

5) Fix "ethtool -S" crash in aquantia driver, from Pavel Belous.

6) Fix allocation size for ntp filter bitmap in bnxt_en driver,
   from Dan Carpenter.

7) Add missing memory allocation return value check to DSA loop
   driver, from Christophe Jaillet.

8) Fix XDP leak on driver unload in qed driver, from Suddarsana Reddy
   Kalluru.

9) Don't inherit MC list from parent inet connection sockets,
   another syzkaller spotted gem.  Fix from Eric Dumazet.

Please pull, thanks a lot.

The following changes since commit af82455f7dbd9dc20244d80d033721b30d22c065:

  Merge tag 'char-misc-4.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc (2017-05-04 19:15:35 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git 

for you to fetch changes up to 657831ffc38e30092a2d5f03d385d710eb88b09a:

  dccp/tcp: do not inherit mc_list from parent (2017-05-09 15:17:49 -0400)

----------------------------------------------------------------
Andy Shevchenko (4):
      stmmac: pci: set default number of rx and tx queues
      stmmac: pci: TX and RX queue priority configuration
      stmmac: pci: RX queue routing configuration
      stmmac: pci: split out common_default_data() helper

Christophe Jaillet (1):
      net: dsa: loop: Check for memory allocation failure

Dan Carpenter (1):
      bnxt_en: allocate enough space for ->ntp_fltr_bmap

Daniel Borkmann (1):
      bpf: don't let ldimm64 leak map addresses on unprivileged

David S. Miller (5):
      Merge branch 'stmmac-pci-fix-crash-on-Intel-Galileo-Gen2'
      Merge tag 'mac80211-for-davem-2017-05-08' of git://git.kernel.org/.../jberg/mac80211
      Revert "ipv4: restore rt->fi for reference counting"
      Merge branch 'mlx4-misc-fixes'
      Merge branch 'qed-general-fixes'

Eric Dumazet (2):
      tcp: randomize timestamps on syncookies
      dccp/tcp: do not inherit mc_list from parent

Ganesh Goudar (1):
      cxgb4: avoid disabling FEC by default

Geliang Tang (2):
      net/hippi/rrunner: use memdup_user
      yam: use memdup_user

Grygorii Strashko (1):
      net: ethernet: ti: cpsw: adjust cpsw fifos depth for fullduplex flow control

Hangbin Liu (2):
      bonding: check nla_put_be32 return value
      vti: check nla_put_* return value

Jack Morgenstein (1):
      net/mlx4_core: Reduce harmless SRIOV error message to debug level

Jim Baxter (1):
      net: cdc_ncm: Fix TX zero padding

Johannes Berg (4):
      mac80211: properly remove RX_ENC_FLAG_40MHZ
      nl80211: correctly validate MU-MIMO groups
      mac80211: fix IBSS presp allocation size
      cfg80211: fix multi scheduled scan kernel-doc

Jon Mason (1):
      net: mdio-mux: bcm-iproc: call mdiobus_free() in error path

Kamal Heib (1):
      net/mlx4_en: Change the error print to debug print

Karim Eshapa (1):
      drivers: net: wimax: i2400m: i2400m-usb: Use time_after for time comparison

Kees Cook (4):
      bna: Avoid reading past end of buffer
      bna: ethtool: Avoid reading past end of buffer
      qlge: Avoid reading past end of buffer
      DECnet: Use container_of() for embedded struct

Luca Coelho (1):
      mac80211: bail out from prep_connection() if a reconfig is ongoing

Mintz, Yuval (3):
      qed: Fix VF removal sequence
      qed: Tell QM the number of tasks
      qede: Split PF/VF ndos.

Pavel Belous (1):
      aquantia: Fix "ethtool -S" crash when adapter down.

Rakesh Pandit (1):
      net: alx: handle pci_alloc_irq_vectors return correctly

Ram Amrani (1):
      qed: Correct doorbell configuration for !4Kb pages

Suddarsana Reddy Kalluru (1):
      qede: Fix XDP memory leak on unload

Talat Batheesh (1):
      net/mlx4_en: Avoid adding steering rules with invalid ring

Tobias Klauser (1):
      bridge: netlink: account for IFLA_BRPORT_{B, M}CAST_FLOOD size and policy

Vlad Yasevich (1):
      vlan: Keep NETIF_F_HW_CSUM similar to other software devices

WANG Cong (2):
      ipv4: restore rt->fi for reference counting
      ipv6: reorder ip6_route_dev_notifier after ipv6_dev_notf

Wei Wang (1):
      tcp: make congestion control optionally skip slow start after idle

 drivers/net/bonding/bond_netlink.c                    |  3 ++-
 drivers/net/dsa/dsa_loop.c                            |  3 +++
 drivers/net/ethernet/aquantia/atlantic/aq_nic.c       |  6 ++++--
 drivers/net/ethernet/atheros/alx/main.c               |  4 ++--
 drivers/net/ethernet/broadcom/bnxt/bnxt.c             |  3 ++-
 drivers/net/ethernet/brocade/bna/bfa_ioc.c            |  2 +-
 drivers/net/ethernet/brocade/bna/bnad_ethtool.c       |  4 ++--
 drivers/net/ethernet/chelsio/cxgb4/cxgb4.h            |  9 +++++++++
 drivers/net/ethernet/chelsio/cxgb4/t4_hw.c            | 38 +++++++++++++++++++++++++++++++-------
 drivers/net/ethernet/chelsio/cxgb4/t4fw_api.h         |  6 +++---
 drivers/net/ethernet/mellanox/mlx4/cmd.c              | 14 +++++++++++---
 drivers/net/ethernet/mellanox/mlx4/en_ethtool.c       |  5 +++++
 drivers/net/ethernet/mellanox/mlx4/en_rx.c            |  3 ++-
 drivers/net/ethernet/mellanox/mlx4/resource_tracker.c |  2 +-
 drivers/net/ethernet/qlogic/qed/qed_cxt.c             |  1 +
 drivers/net/ethernet/qlogic/qed/qed_dev.c             |  2 +-
 drivers/net/ethernet/qlogic/qed/qed_main.c            |  6 ++++--
 drivers/net/ethernet/qlogic/qede/qede_filter.c        |  5 -----
 drivers/net/ethernet/qlogic/qede/qede_main.c          | 25 ++++++++++++++++++++++++-
 drivers/net/ethernet/qlogic/qlge/qlge_dbg.c           |  4 ++--
 drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c      | 41 ++++++++++++++++++-----------------------
 drivers/net/ethernet/ti/cpsw.c                        | 16 ++++++++++++++++
 drivers/net/hamradio/yam.c                            | 10 ++++------
 drivers/net/hippi/rrunner.c                           | 17 +++++++----------
 drivers/net/phy/mdio-mux-bcm-iproc.c                  |  5 ++++-
 drivers/net/usb/cdc_ncm.c                             | 11 +++++++----
 drivers/net/wimax/i2400m/i2400m-usb.h                 |  2 +-
 drivers/net/wireless/ath/ath9k/ar9003_mac.c           |  2 +-
 drivers/net/wireless/ath/ath9k/mac.c                  |  4 ++--
 drivers/net/wireless/intel/iwlegacy/4965-mac.c        |  4 +++-
 drivers/net/wireless/intel/iwlwifi/dvm/rx.c           |  4 +++-
 drivers/net/wireless/mac80211_hwsim.c                 |  8 +++++++-
 include/net/addrconf.h                                |  2 ++
 include/net/cfg80211.h                                |  2 +-
 include/net/mac80211.h                                |  2 --
 include/net/secure_seq.h                              | 10 ++++++----
 include/net/tcp.h                                     |  9 ++++++---
 kernel/bpf/verifier.c                                 | 21 ++++++++++++++++-----
 net/8021q/vlan_dev.c                                  | 13 ++++++++++---
 net/bridge/br_netlink.c                               |  4 ++++
 net/core/secure_seq.c                                 | 31 +++++++++++++++++++------------
 net/decnet/dn_neigh.c                                 | 12 ++++++------
 net/ipv4/inet_connection_sock.c                       |  2 ++
 net/ipv4/ip_vti.c                                     | 13 +++++++------
 net/ipv4/syncookies.c                                 | 12 ++++++++++--
 net/ipv4/tcp_input.c                                  |  8 +++-----
 net/ipv4/tcp_ipv4.c                                   | 32 +++++++++++++++++++-------------
 net/ipv4/tcp_output.c                                 |  4 +++-
 net/ipv6/addrconf.c                                   |  1 +
 net/ipv6/route.c                                      | 13 +++++++++++--
 net/ipv6/syncookies.c                                 | 10 +++++++++-
 net/ipv6/tcp_ipv6.c                                   | 32 +++++++++++++++++++-------------
 net/mac80211/ibss.c                                   |  2 ++
 net/mac80211/mlme.c                                   |  4 ++++
 net/wireless/nl80211.c                                |  4 ++--
 55 files changed, 345 insertions(+), 167 deletions(-)

^ permalink raw reply

* Re: [PATCH v2 net] dccp/tcp: do not inherit mc_list from parent
From: David Miller @ 2017-05-09 19:18 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev, pray3r.z, andreyknvl
In-Reply-To: <1494336559.7796.78.camel@edumazet-glaptop3.roam.corp.google.com>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Tue, 09 May 2017 06:29:19 -0700

> From: Eric Dumazet <edumazet@google.com>
> 
> syzkaller found a way to trigger double frees from ip_mc_drop_socket()
> 
> It turns out that leave a copy of parent mc_list at accept() time,
> which is very bad.
> 
> Very similar to commit 8b485ce69876 ("tcp: do not inherit
> fastopen_req from parent")
> 
> Initial report from Pray3r, completed by Andrey one.
> Thanks a lot to them !
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Reported-by: Pray3r <pray3r.z@gmail.com>
> Reported-by: Andrey Konovalov <andreyknvl@google.com>
> Tested-by: Andrey Konovalov <andreyknvl@google.com>
> ---
> v2: fix moved into inet_csk_clone_lock() to fix both DCCP and TCP

Applied and queued up for -stable, thanks Eric.

^ permalink raw reply

* Re: Requirements for a shutdown function?
From: Florian Fainelli @ 2017-05-09 19:06 UTC (permalink / raw)
  To: Timur Tabi, netdev
In-Reply-To: <f995d5b3-2a5c-df08-3915-f77ab9203844@codeaurora.org>

On 05/09/2017 11:51 AM, Timur Tabi wrote:
> On 05/09/2017 01:46 PM, Florian Fainelli wrote:
>> A good test case for exercising a .shutdown() function is kexec'ing a
>> new kernel for instance.
> 
> I tried that.  I run iperf in one window while launching kexec in another.
> Even without a shutdown function, network traffic appear to halt on its own
> and the kexec succeeds.
> 
> Is it possible that the network stack detects a kexec and automatically
> stops all network devices?

No. why would it? However the device driver model does call into your
driver's remove function and that one does a right job already because
it does an network device unregister, and so on.

There is no strict requirement for implementing a .shutdown() function
AFAICT and it does not necessarily make sense to have one depending on
the bus type. For platform/MMIO devices, it hardly has any value, but on
e.g: PCI, it could be added as an additional step to perform a full
device shutdown.

> 
>> You should put your HW in a state where it won't be doing DMA, or have
>> any adverse side effects to the system, putting it in a low power state
>> is also a good approach.
> 
> My in-house driver stops the RX and TX queues.  I'm guessing that's good
> enough, but I don't have a failing test case to prove it.
> 

That's probably good enough, yes.
-- 
Florian

^ permalink raw reply

* Re: Requirements for a shutdown function?
From: Timur Tabi @ 2017-05-09 18:51 UTC (permalink / raw)
  To: Florian Fainelli, netdev
In-Reply-To: <1721db9b-ed60-4556-9aac-81f17e2c1849@gmail.com>

On 05/09/2017 01:46 PM, Florian Fainelli wrote:
> A good test case for exercising a .shutdown() function is kexec'ing a
> new kernel for instance.

I tried that.  I run iperf in one window while launching kexec in another.
Even without a shutdown function, network traffic appear to halt on its own
and the kexec succeeds.

Is it possible that the network stack detects a kexec and automatically
stops all network devices?

> You should put your HW in a state where it won't be doing DMA, or have
> any adverse side effects to the system, putting it in a low power state
> is also a good approach.

My in-house driver stops the RX and TX queues.  I'm guessing that's good
enough, but I don't have a failing test case to prove it.

-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm
Technologies, Inc.  Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply

* Re: Requirements for a shutdown function?
From: Florian Fainelli @ 2017-05-09 18:46 UTC (permalink / raw)
  To: Timur Tabi, netdev
In-Reply-To: <49bee65f-2ea8-1787-9642-659a967df8f0@codeaurora.org>

On 05/09/2017 09:58 AM, Timur Tabi wrote:
> I'm trying to add a platform_driver.shutdown function to my Ethernet driver
> (drivers/net/ethernet/qualcomm/emac/*), but I can't find any definitive
> information as to what a network driver shutdown callback is supposed to do.
>  I also don't know what testcase I should use to verify that my function is
> working.

A good test case for exercising a .shutdown() function is kexec'ing a
new kernel for instance.

> 
> I see only four instances of a platform_driver.shutdown function in
> drivers/net/ethernet:
> 
> $ git grep -A 20 -w platform_driver | grep '\.shutdown'
> apm/xgene-v2/main.c-	.shutdown = xge_shutdown,
> apm/xgene/xgene_enet_main.c-	.shutdown = xgene_enet_shutdown,
> marvell/mv643xx_eth.c-	.shutdown	= mv643xx_eth_shutdown,
> marvell/pxa168_eth.c-	.shutdown = pxa168_eth_shutdown,
> 
> (Other shutdown functions are for pci_driver.shutdown).
> 
> For the xgene drivers, the shutdown function just calls the 'remove'
> function.  Isn't that overkill?  Why bother with a shutdown function if it's
> just the same thing as removing the driver outright?

Yes, that appears unnecessary.

> 
> mv643xx_eth_shutdown() seems more reasonable.  All it does is halt the TX
> and RX queues.
> 
> pxa168_eth_shutdown() is a little more heavyweight: halts the queues, and
> stops the DMA and calls phy_stop().
> 
> Can anyone help me figure out what my driver really should do?

You should put your HW in a state where it won't be doing DMA, or have
any adverse side effects to the system, putting it in a low power state
is also a good approach.
-- 
Florian

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox