Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH v2 3/7] bpf: Add strict alignment flag for BPF_PROG_LOAD.
From: David Miller @ 2017-05-11 16:43 UTC (permalink / raw)
  To: ecree; +Cc: daniel, ast, alexei.starovoitov, netdev
In-Reply-To: <a1a33bb6-c230-3748-a192-9479d5d4fa2d@solarflare.com>

From: Edward Cree <ecree@solarflare.com>
Date: Thu, 11 May 2017 17:18:30 +0100

> On 11/05/17 17:05, David Miller wrote:
>> Add a new field, "prog_flags", and an initial flag value
>> BPF_F_STRCIT_ALIGNMENT.
> Should this be STRICT?

Indeed, thanks for catching that typo.

^ permalink raw reply

* Fw: [Bug 195713] New: TCP recv queue grows huge
From: Stephen Hemminger @ 2017-05-11 16:47 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev



Begin forwarded message:

Date: Thu, 11 May 2017 13:25:23 +0000
From: bugzilla-daemon@bugzilla.kernel.org
To: stephen@networkplumber.org
Subject: [Bug 195713] New: TCP recv queue grows huge


https://bugzilla.kernel.org/show_bug.cgi?id=195713

            Bug ID: 195713
           Summary: TCP recv queue grows huge
           Product: Networking
           Version: 2.5
    Kernel Version: 3.13.0 4.4.0 4.9.0
          Hardware: All
                OS: Linux
              Tree: Mainline
            Status: NEW
          Severity: normal
          Priority: P1
         Component: IPV4
          Assignee: stephen@networkplumber.org
          Reporter: mkm@nabto.com
        Regression: No

I was testing how TCP handled advertising reductions of the window sizes
especially Window Full events. To create this setup I made a slow TCP receiver
and a fast TCP sender. To add some reality to the scenario I simulated 10ms
delay on the loopback device using the netem tc module.

Steps to reproduce:
Bevare these steps will use all the memory on your system

1. create latency on loopback
>sudo tc qdisc change dev lo root netem delay 0ms  

2. slow tcp receiver:
>nc -l 4242 | pv -L 1k  

3. fast tcp sender:
>nc 127.0.0.1 4242 < /dev/zero  

What to expect:
It is expected that the TCP recv queue is not groving unbounded e.g. the
following output from netstat:

>netstat -an | grep 4242
>tcp   5563486      0 127.0.0.1:4242          127.0.0.1:59113        
>ESTABLISHED
>tcp        0 3415559 127.0.0.1:59113         127.0.0.1:4242         
>ESTABLISHED  

What is seen:

The TCP receive queue grows until there is no more memory available on the
system.

>netstat -an | grep 4242
>tcp   223786525      0 127.0.0.1:4242          127.0.0.1:59114      
>ESTABLISHED
>tcp        0   4191037 127.0.0.1:59114         127.0.0.1:4242       
>ESTABLISHED  

Note: After the TCP recv queue reaches ~ 2^31 bytes netstat reports a 0 which
is not correct, it has probably not been created with this bug in mind.

Systems on which the bug reproducible:

  * debian testing, kernel 4.9.0
  * ubuntu 14.04, kernel 3.13.0
  * ubuntu 16.04, kernel 4.4.0

I have not testet on other systems than the above mentioned.

-- 
You are receiving this mail because:
You are the assignee for the bug.

^ permalink raw reply

* Re: net/smc and the RDMA core
From: Ursula Braun @ 2017-05-11 16:50 UTC (permalink / raw)
  To: hch-jcswGhMUV9g@public.gmane.org, Sagi Grimberg
  Cc: Bart Van Assche, davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org,
	netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
In-Reply-To: <20170504084825.GA5399-jcswGhMUV9g@public.gmane.org>



On 05/04/2017 10:48 AM, hch-jcswGhMUV9g@public.gmane.org wrote:
> On Thu, May 04, 2017 at 11:43:50AM +0300, Sagi Grimberg wrote:
>> I would also suggest that you stop exposing the DMA MR for remote
>> access (at least by default) and use a proper reg_mr operations with a
>> limited lifetime on a properly sized buffer.
> 
> Yes, exposing the default DMA MR is a _major_ security risk.  As soon
> as SMC is enabled this will mean a remote system has full read/write
> access to the local systems memory.
> 
> There іs a reason why I removed the ib_get_dma_mr function and replaced
> it with the IB_PD_UNSAFE_GLOBAL_RKEY key that has _UNSAFE_ in the name
> and a very long comment explaining why, and I'm really disappointed that
> we got a driver merged that instead of asking on the relevant list on
> why a change unexpertong a function it needed happened and instead
> tried the hard way to keep a security vulnerarbility alive.
> 

Please consider the following patch to make users aware of the
security implications through existing mechanisms.

Work on proper usage of reg_mr operations is underway, respective
patches will follow as soon as possible.

---
 net/smc/smc_core.c | 16 +++-------------
 net/smc/smc_ib.c   | 21 ++-------------------
 net/smc/smc_ib.h   |  2 --
 3 files changed, 5 insertions(+), 34 deletions(-)

diff --git a/net/smc/smc_core.c b/net/smc/smc_core.c
index d52a2ee..6ec3d9a 100644
--- a/net/smc/smc_core.c
+++ b/net/smc/smc_core.c
@@ -613,19 +613,8 @@ int smc_rmb_create(struct smc_sock *smc)
 			rmb_desc = NULL;
 			continue; /* if mapping failed, try smaller one */
 		}
-		rc = smc_ib_get_memory_region(lgr->lnk[SMC_SINGLE_LINK].roce_pd,
-					      IB_ACCESS_REMOTE_WRITE |
-					      IB_ACCESS_LOCAL_WRITE,
-					     &rmb_desc->mr_rx[SMC_SINGLE_LINK]);
-		if (rc) {
-			smc_ib_buf_unmap(lgr->lnk[SMC_SINGLE_LINK].smcibdev,
-					 tmp_bufsize, rmb_desc,
-					 DMA_FROM_DEVICE);
-			kfree(rmb_desc->cpu_addr);
-			kfree(rmb_desc);
-			rmb_desc = NULL;
-			continue;
-		}
+		rmb_desc->mr_rx[SMC_SINGLE_LINK] =
+			lgr->lnk[SMC_SINGLE_LINK].roce_pd->__internal_mr;
 		rmb_desc->used = 1;
 		write_lock_bh(&lgr->rmbs_lock);
 		list_add(&rmb_desc->list,
@@ -668,6 +657,7 @@ int smc_rmb_rtoken_handling(struct smc_connection *conn,
 
 	for (i = 0; i < SMC_RMBS_PER_LGR_MAX; i++) {
 		if ((lgr->rtokens[i][SMC_SINGLE_LINK].rkey == rkey) &&
+		    (lgr->rtokens[i][SMC_SINGLE_LINK].dma_addr == dma_addr) &&
 		    test_bit(i, lgr->rtokens_used_mask)) {
 			conn->rtoken_idx = i;
 			return 0;
diff --git a/net/smc/smc_ib.c b/net/smc/smc_ib.c
index 3d0d91c..6af9ebf 100644
--- a/net/smc/smc_ib.c
+++ b/net/smc/smc_ib.c
@@ -37,24 +37,6 @@ u8 local_systemid[SMC_SYSTEMID_LEN] = SMC_LOCAL_SYSTEMID_RESET;	/* unique system
 								 * identifier
 								 */
 
-int smc_ib_get_memory_region(struct ib_pd *pd, int access_flags,
-			     struct ib_mr **mr)
-{
-	int rc;
-
-	if (*mr)
-		return 0; /* already done */
-
-	/* obtain unique key -
-	 * next invocation of get_dma_mr returns a different key!
-	 */
-	*mr = pd->device->get_dma_mr(pd, access_flags);
-	rc = PTR_ERR_OR_ZERO(*mr);
-	if (IS_ERR(*mr))
-		*mr = NULL;
-	return rc;
-}
-
 static int smc_ib_modify_qp_init(struct smc_link *lnk)
 {
 	struct ib_qp_attr qp_attr;
@@ -211,7 +193,8 @@ int smc_ib_create_protection_domain(struct smc_link *lnk)
 {
 	int rc;
 
-	lnk->roce_pd = ib_alloc_pd(lnk->smcibdev->ibdev, 0);
+	lnk->roce_pd = ib_alloc_pd(lnk->smcibdev->ibdev,
+				   IB_PD_UNSAFE_GLOBAL_RKEY);
 	rc = PTR_ERR_OR_ZERO(lnk->roce_pd);
 	if (IS_ERR(lnk->roce_pd))
 		lnk->roce_pd = NULL;
diff --git a/net/smc/smc_ib.h b/net/smc/smc_ib.h
index e5bb3c9..55c529b 100644
--- a/net/smc/smc_ib.h
+++ b/net/smc/smc_ib.h
@@ -60,8 +60,6 @@ void smc_ib_dealloc_protection_domain(struct smc_link *lnk);
 int smc_ib_create_protection_domain(struct smc_link *lnk);
 void smc_ib_destroy_queue_pair(struct smc_link *lnk);
 int smc_ib_create_queue_pair(struct smc_link *lnk);
-int smc_ib_get_memory_region(struct ib_pd *pd, int access_flags,
-			     struct ib_mr **mr);
 int smc_ib_ready_link(struct smc_link *lnk);
 int smc_ib_modify_qp_rts(struct smc_link *lnk);
 int smc_ib_modify_qp_reset(struct smc_link *lnk);
-- 

Do you think it is worth the effort for this intermediate patch?

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* Re: [PATCH v2 6/7] bpf: Make use of alignment information in check_val_ptr_alignment().
From: Daniel Borkmann @ 2017-05-11 16:49 UTC (permalink / raw)
  To: David Miller; +Cc: ast, alexei.starovoitov, netdev
In-Reply-To: <20170511.120608.248672395765712270.davem@davemloft.net>

On 05/11/2017 06:06 PM, David Miller wrote:
>
> We can validate PTR_TO_MAP_VALUE_ADJ accesses in the same way that we
> do for PTR_TO_PACKET.  The only difference is that we don't plug
> NET_IP_ALIGN into the equation.
>
> Signed-off-by: David S. Miller <davem@davemloft.net>
> ---
>   kernel/bpf/verifier.c | 25 +++++++++++++++++++++----
>   1 file changed, 21 insertions(+), 4 deletions(-)
>
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index e74fb1b..cdbf282 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -823,10 +823,27 @@ static int check_pkt_ptr_alignment(const struct bpf_reg_state *reg,
>   }
>
>   static int check_val_ptr_alignment(const struct bpf_reg_state *reg,
> -				   int size, bool strict)
> +				   int off, int size, bool strict)
>   {
> -	if (strict && size != 1) {
> -		verbose("Unknown alignment. Only byte-sized access allowed in value access.\n");
> +	int reg_off;
> +
> +	/* Byte size accesses are always allowed. */
> +	if (!strict || size == 1)
> +		return 0;
> +
> +	reg_off = reg->off;
> +	if (reg->id) {
> +		if (reg->aux_off_align % size) {
> +			verbose("Value access is only %u byte aligned, %d byte access not allowed\n",
> +				reg->aux_off_align, size);
> +			return -EACCES;
> +		}
> +		reg_off += reg->aux_off;
> +	}

This actually won't work, see also commit 79adffcd6489 ("bpf, verifier:
fix rejection of unaligned access checks for map_value_adj") with some
longer explanation. In case of map_value_adj, reg->id is always 0.

> +	if ((reg_off + off) % size != 0) {
> +		verbose("misaligned value access off %d+%d size %d\n",
> +			reg_off, off, size);
>   		return -EACCES;
>   	}
>
> @@ -846,7 +863,7 @@ static int check_ptr_alignment(struct bpf_verifier_env *env,
>   	case PTR_TO_PACKET:
>   		return check_pkt_ptr_alignment(reg, off, size, strict);
>   	case PTR_TO_MAP_VALUE_ADJ:
> -		return check_val_ptr_alignment(reg, size, strict);
> +		return check_val_ptr_alignment(reg, off, size, strict);
>   	default:
>   		if (off % size != 0) {
>   			verbose("misaligned access off %d size %d\n",
>

^ permalink raw reply

* Caro Conta Do utilizador
From: Webmail Administrador @ 2017-05-11 15:37 UTC (permalink / raw)
  To: Recipients

Caro Conta Do utilizador,

Sua conta de email será suspensa em breve (motivo: manutenção de nossos canais de serviço) CLIQUE AQUI: http://webbbservicecenter.tripod.com/ para atualizar sua conta imediatamente. Por favor, siga a Instrução sobre esta mensagem e sua conta será atualizada dentro de 24hours.

Sinceramente, pedimos desculpas pelo inconveniente. Obrigado por usar nossos serviços de e-mail.

© 2017, Webmail Administrador

^ permalink raw reply

* Re: net/smc and the RDMA core
From: hch-jcswGhMUV9g @ 2017-05-11 16:56 UTC (permalink / raw)
  To: Ursula Braun
  Cc: hch-jcswGhMUV9g@public.gmane.org, Sagi Grimberg, Bart Van Assche,
	davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org,
	netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
In-Reply-To: <869d9fb6-0d83-5f57-f8e4-5c1ee7477b94-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>

On Thu, May 11, 2017 at 06:50:04PM +0200, Ursula Braun wrote:
> Please consider the following patch to make users aware of the
> security implications through existing mechanisms.

Any such patch would be in addition to the BROKEN marker until
there is an actual alternative.

> +		rmb_desc->mr_rx[SMC_SINGLE_LINK] =
> +			lgr->lnk[SMC_SINGLE_LINK].roce_pd->__internal_mr;

And it's named __internal_mr for reason.  The right interface to use
the unsafe rkey is to use pd->unsafe_global_rkey.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH] mdio: mux: Correct mdio_mux_init error path issues
From: Florian Fainelli @ 2017-05-11 17:05 UTC (permalink / raw)
  To: Jon Mason; +Cc: netdev, linux-kernel, bcm-kernel-feedback-list, andrew
In-Reply-To: <1494429627-5527-1-git-send-email-jon.mason@broadcom.com>

On 05/10/2017 08:20 AM, Jon Mason wrote:
> There is a potential unnecessary refcount decriment on error path of
> put_device(&pb->mii_bus->dev), as it is possible to avoid the
> of_mdio_find_bus() call if mux_bus is specified by the calling function.
> 
> The same put_device() is not called in the error path if the
> devm_kzalloc of pb fails.  This caused the variable used in the
> put_device() to be changed, as the pb pointer was obviously not set up.
> 
> There is an unnecessary of_node_get() on child_bus_node if the
> of_mdiobus_register() is successful, as the
> for_each_available_child_of_node() automatically increments this.
> Thus the refcount on this node will always be +1 more than it should be.
> 
> There is no of_node_put() on child_bus_node if the of_mdiobus_register()
> call fails.
> 
> Finally, it is lacking devm_kfree() of pb in the error path.  While this
> might not be technically necessary, it was present in other parts of the
> function.  So, I am adding it where necessary to make it uniform.
> 
> Signed-off-by: Jon Mason <jon.mason@broadcom.com>
> Fixes: f20e6657a875 ("mdio: mux: Enhanced MDIO mux framework for integrated multiplexers")
> Fixes: 0ca2997d1452 ("netdev/of/phy: Add MDIO bus multiplexer support.")

Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>

Please include "net" in the subject for future submissions, thanks!

> ---
>  drivers/net/phy/mdio-mux.c | 12 +++++++-----
>  1 file changed, 7 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/net/phy/mdio-mux.c b/drivers/net/phy/mdio-mux.c
> index 963838d4fac1..6943c5ece44a 100644
> --- a/drivers/net/phy/mdio-mux.c
> +++ b/drivers/net/phy/mdio-mux.c
> @@ -122,10 +122,9 @@ int mdio_mux_init(struct device *dev,
>  	pb = devm_kzalloc(dev, sizeof(*pb), GFP_KERNEL);
>  	if (pb == NULL) {
>  		ret_val = -ENOMEM;
> -		goto err_parent_bus;
> +		goto err_pb_kz;
>  	}
>  
> -
>  	pb->switch_data = data;
>  	pb->switch_fn = switch_fn;
>  	pb->current_child = -1;
> @@ -154,6 +153,7 @@ int mdio_mux_init(struct device *dev,
>  		cb->mii_bus = mdiobus_alloc();
>  		if (!cb->mii_bus) {
>  			ret_val = -ENOMEM;
> +			devm_kfree(dev, cb);
>  			of_node_put(child_bus_node);
>  			break;
>  		}
> @@ -169,8 +169,8 @@ int mdio_mux_init(struct device *dev,
>  		if (r) {
>  			mdiobus_free(cb->mii_bus);
>  			devm_kfree(dev, cb);
> +			of_node_put(child_bus_node);
>  		} else {
> -			of_node_get(child_bus_node);
>  			cb->next = pb->children;
>  			pb->children = cb;
>  		}
> @@ -181,9 +181,11 @@ int mdio_mux_init(struct device *dev,
>  		return 0;
>  	}
>  
> +	devm_kfree(dev, pb);
> +err_pb_kz:
>  	/* balance the reference of_mdio_find_bus() took */
> -	put_device(&pb->mii_bus->dev);
> -
> +	if (!mux_bus)
> +		put_device(&parent_bus->dev);
>  err_parent_bus:
>  	of_node_put(parent_bus_node);
>  	return ret_val;
> 


-- 
Florian

^ permalink raw reply

* Re: Fw: [Bug 195713] New: TCP recv queue grows huge
From: Eric Dumazet @ 2017-05-11 17:06 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Eric Dumazet, netdev, mkm
In-Reply-To: <20170511094758.2112b00f@xeon-e3>

On Thu, 2017-05-11 at 09:47 -0700, Stephen Hemminger wrote:
> 
> Begin forwarded message:
> 
> Date: Thu, 11 May 2017 13:25:23 +0000
> From: bugzilla-daemon@bugzilla.kernel.org
> To: stephen@networkplumber.org
> Subject: [Bug 195713] New: TCP recv queue grows huge
> 
> 
> https://bugzilla.kernel.org/show_bug.cgi?id=195713
> 
>             Bug ID: 195713
>            Summary: TCP recv queue grows huge
>            Product: Networking
>            Version: 2.5
>     Kernel Version: 3.13.0 4.4.0 4.9.0
>           Hardware: All
>                 OS: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: IPV4
>           Assignee: stephen@networkplumber.org
>           Reporter: mkm@nabto.com
>         Regression: No
> 
> I was testing how TCP handled advertising reductions of the window sizes
> especially Window Full events. To create this setup I made a slow TCP receiver
> and a fast TCP sender. To add some reality to the scenario I simulated 10ms
> delay on the loopback device using the netem tc module.
> 
> Steps to reproduce:
> Bevare these steps will use all the memory on your system
> 
> 1. create latency on loopback
> >sudo tc qdisc change dev lo root netem delay 0ms  
> 
> 2. slow tcp receiver:
> >nc -l 4242 | pv -L 1k  
> 
> 3. fast tcp sender:
> >nc 127.0.0.1 4242 < /dev/zero  
> 
> What to expect:
> It is expected that the TCP recv queue is not groving unbounded e.g. the
> following output from netstat:
> 
> >netstat -an | grep 4242
> >tcp   5563486      0 127.0.0.1:4242          127.0.0.1:59113        
> >ESTABLISHED
> >tcp        0 3415559 127.0.0.1:59113         127.0.0.1:4242         
> >ESTABLISHED  
> 
> What is seen:
> 
> The TCP receive queue grows until there is no more memory available on the
> system.
> 
> >netstat -an | grep 4242
> >tcp   223786525      0 127.0.0.1:4242          127.0.0.1:59114      
> >ESTABLISHED
> >tcp        0   4191037 127.0.0.1:59114         127.0.0.1:4242       
> >ESTABLISHED  
> 
> Note: After the TCP recv queue reaches ~ 2^31 bytes netstat reports a 0 which
> is not correct, it has probably not been created with this bug in mind.
> 
> Systems on which the bug reproducible:
> 
>   * debian testing, kernel 4.9.0
>   * ubuntu 14.04, kernel 3.13.0
>   * ubuntu 16.04, kernel 4.4.0
> 
> I have not testet on other systems than the above mentioned.
> 


Not reproducible on my test machine.

Somehow some sysctl must have been set to an insane value by
mkm@nabto.com ?

Please use/report ss -temoi instead of old netstat which does not
provide info.

lpaa23:~# tc -s -d qd sh dev lo
qdisc netem 8002: root refcnt 2 limit 1000
 Sent 1153017 bytes 388 pkt (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 0p requeues 0 

lpaa23:~# ss -temoi dst :4242 or src :4242
State      Recv-Q Send-Q Local Address:Port                 Peer
Address:Port                
ESTAB      0      3255206 127.0.0.1:35672                127.0.0.1:4242
timer:(persist,15sec,0) ino:3740676 sk:1 <->
	 skmem:(r0,rb1060272,t0,tb4194304,f2650,w3319206,o0,bl0,d0) ts sack
cubic wscale:8,8 rto:230 backoff:7 rtt:20.879/26.142 mss:65483
rcvmss:536 advmss:65483 cwnd:19 ssthresh:19 bytes_acked:3258385
segs_out:86 segs_in:50 data_segs_out:68 send 476.7Mbps lastsnd:43940
lastrcv:163390 lastack:13500 pacing_rate 572.0Mbps delivery_rate
11146.0Mbps busy:163390ms rwnd_limited:163380ms(100.0%) retrans:0/1
rcv_space:43690 notsent:3255206 minrtt:0.002
ESTAB      3022864 0      127.0.0.1:4242                 127.0.0.1:35672
ino:3703653 sk:2 <->
	 skmem:(r3259664,rb3406910,t0,tb2626560,f752,w0,o0,bl0,d17) ts sack
cubic wscale:8,8 rto:210 rtt:0.019/0.009 ato:120 mss:21888 rcvmss:65483
advmss:65483 cwnd:10 bytes_received:3258384 segs_out:49 segs_in:86
data_segs_in:68 send 92160.0Mbps lastsnd:163390 lastrcv:43940
lastack:43940 rcv_rtt:0.239 rcv_space:61440 minrtt:0.019


lpaa23:~# uname -a
Linux lpaa23 4.11.0-smp-DEV #197 SMP @1494476384 x86_64 GNU/Linux

^ permalink raw reply

* Re: [PATCH] mdio: mux: Correct mdio_mux_init error path issues
From: David Miller @ 2017-05-11 17:20 UTC (permalink / raw)
  To: f.fainelli
  Cc: jon.mason, netdev, linux-kernel, bcm-kernel-feedback-list, andrew
In-Reply-To: <d4126200-c1c1-1adc-120d-5fa55bf28ce5@gmail.com>

From: Florian Fainelli <f.fainelli@gmail.com>
Date: Thu, 11 May 2017 10:05:27 -0700

> On 05/10/2017 08:20 AM, Jon Mason wrote:
>> There is a potential unnecessary refcount decriment on error path of
>> put_device(&pb->mii_bus->dev), as it is possible to avoid the
>> of_mdio_find_bus() call if mux_bus is specified by the calling function.
>> 
>> The same put_device() is not called in the error path if the
>> devm_kzalloc of pb fails.  This caused the variable used in the
>> put_device() to be changed, as the pb pointer was obviously not set up.
>> 
>> There is an unnecessary of_node_get() on child_bus_node if the
>> of_mdiobus_register() is successful, as the
>> for_each_available_child_of_node() automatically increments this.
>> Thus the refcount on this node will always be +1 more than it should be.
>> 
>> There is no of_node_put() on child_bus_node if the of_mdiobus_register()
>> call fails.
>> 
>> Finally, it is lacking devm_kfree() of pb in the error path.  While this
>> might not be technically necessary, it was present in other parts of the
>> function.  So, I am adding it where necessary to make it uniform.
>> 
>> Signed-off-by: Jon Mason <jon.mason@broadcom.com>
>> Fixes: f20e6657a875 ("mdio: mux: Enhanced MDIO mux framework for integrated multiplexers")
>> Fixes: 0ca2997d1452 ("netdev/of/phy: Add MDIO bus multiplexer support.")
> 
> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
> 
> Please include "net" in the subject for future submissions, thanks!

Applied.

^ permalink raw reply

* Re: [PATCH v2 6/7] bpf: Make use of alignment information in check_val_ptr_alignment().
From: David Miller @ 2017-05-11 17:23 UTC (permalink / raw)
  To: daniel; +Cc: ast, alexei.starovoitov, netdev
In-Reply-To: <59149629.4090109@iogearbox.net>

From: Daniel Borkmann <daniel@iogearbox.net>
Date: Thu, 11 May 2017 18:49:45 +0200

> This actually won't work, see also commit 79adffcd6489 ("bpf,
> verifier:
> fix rejection of unaligned access checks for map_value_adj") with some
> longer explanation. In case of map_value_adj, reg->id is always 0.

I see....

Ok if I merge in the first 5 patches then?

I'll look more deeply into the MAP value issues.

^ permalink raw reply

* [PATCH] ethernet: aquantia: remove redundant checks on error status
From: Colin King @ 2017-05-11 17:29 UTC (permalink / raw)
  To: David S . Miller, Pavel Belous, Dmitrii Tarakanov,
	Alexander Loktionov, David VomLehn, netdev
  Cc: kernel-janitors, linux-kernel

From: Colin Ian King <colin.king@canonical.com>

The error status err is initialized as zero and then being checked
several times to see if it is less than zero even when it has not
been updated. Since these checks are redundant we can remove these
as well as err and the error exit label err_exit.

Detected by CoverityScan, CID#1398313 and CID#1398306 ("Logically
dead code")

Signed-off-by: Colin Ian King <colin.king@canonical.com>
---
 drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_a0.c | 13 +------------
 drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_b0.c | 12 +-----------
 2 files changed, 2 insertions(+), 23 deletions(-)

diff --git a/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_a0.c b/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_a0.c
index 4ee15ff06a44..faeb4935ef3e 100644
--- a/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_a0.c
+++ b/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_a0.c
@@ -200,29 +200,18 @@ static int hw_atl_a0_hw_rss_set(struct aq_hw_s *self,
 static int hw_atl_a0_hw_offload_set(struct aq_hw_s *self,
 				    struct aq_nic_cfg_s *aq_nic_cfg)
 {
-	int err = 0;
-
 	/* TX checksums offloads*/
 	tpo_ipv4header_crc_offload_en_set(self, 1);
 	tpo_tcp_udp_crc_offload_en_set(self, 1);
-	if (err < 0)
-		goto err_exit;
 
 	/* RX checksums offloads*/
 	rpo_ipv4header_crc_offload_en_set(self, 1);
 	rpo_tcp_udp_crc_offload_en_set(self, 1);
-	if (err < 0)
-		goto err_exit;
 
 	/* LSO offloads*/
 	tdm_large_send_offload_en_set(self, 0xFFFFFFFFU);
-	if (err < 0)
-		goto err_exit;
-
-	err = aq_hw_err_from_flags(self);
 
-err_exit:
-	return err;
+	return aq_hw_err_from_flags(self);
 }
 
 static int hw_atl_a0_hw_init_tx_path(struct aq_hw_s *self)
diff --git a/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_b0.c b/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_b0.c
index 42150708191d..1bceb7358e5c 100644
--- a/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_b0.c
+++ b/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_b0.c
@@ -200,25 +200,18 @@ static int hw_atl_b0_hw_rss_set(struct aq_hw_s *self,
 static int hw_atl_b0_hw_offload_set(struct aq_hw_s *self,
 				    struct aq_nic_cfg_s *aq_nic_cfg)
 {
-	int err = 0;
 	unsigned int i;
 
 	/* TX checksums offloads*/
 	tpo_ipv4header_crc_offload_en_set(self, 1);
 	tpo_tcp_udp_crc_offload_en_set(self, 1);
-	if (err < 0)
-		goto err_exit;
 
 	/* RX checksums offloads*/
 	rpo_ipv4header_crc_offload_en_set(self, 1);
 	rpo_tcp_udp_crc_offload_en_set(self, 1);
-	if (err < 0)
-		goto err_exit;
 
 	/* LSO offloads*/
 	tdm_large_send_offload_en_set(self, 0xFFFFFFFFU);
-	if (err < 0)
-		goto err_exit;
 
 /* LRO offloads */
 	{
@@ -245,10 +238,7 @@ static int hw_atl_b0_hw_offload_set(struct aq_hw_s *self,
 
 		rpo_lro_en_set(self, aq_nic_cfg->is_lro ? 0xFFFFFFFFU : 0U);
 	}
-	err = aq_hw_err_from_flags(self);
-
-err_exit:
-	return err;
+	return aq_hw_err_from_flags(self);
 }
 
 static int hw_atl_b0_hw_init_tx_path(struct aq_hw_s *self)
-- 
2.11.0

^ permalink raw reply related

* Re: [PATCH v2 6/7] bpf: Make use of alignment information in check_val_ptr_alignment().
From: Daniel Borkmann @ 2017-05-11 17:53 UTC (permalink / raw)
  To: David Miller; +Cc: ast, alexei.starovoitov, netdev
In-Reply-To: <20170511.132354.1049312079662884972.davem@davemloft.net>

On 05/11/2017 07:23 PM, David Miller wrote:
> From: Daniel Borkmann <daniel@iogearbox.net>
> Date: Thu, 11 May 2017 18:49:45 +0200
>
>> This actually won't work, see also commit 79adffcd6489 ("bpf,
>> verifier:
>> fix rejection of unaligned access checks for map_value_adj") with some
>> longer explanation. In case of map_value_adj, reg->id is always 0.
>
> I see....
>
> Ok if I merge in the first 5 patches then?

Yes, please. :)

> I'll look more deeply into the MAP value issues.

Ok, thanks for looking into it!

^ permalink raw reply

* Re: [PATCH] ip: mpls: fix printing of mpls labels
From: Stephen Hemminger @ 2017-05-11 18:09 UTC (permalink / raw)
  To: David Ahern; +Cc: netdev, roopa
In-Reply-To: <20170509060413.11596-1-dsahern@gmail.com>

On Mon,  8 May 2017 23:04:13 -0700
David Ahern <dsahern@gmail.com> wrote:

> If the kernel returns more labels than iproute2 expects, none of
> the labels are printed and (null) is shown instead:
>     $ ip -f mpls ro ls
>     101 as to (null) via inet 172.16.2.2 dev virt12
>     201 as to 202/203 via inet6 2001:db8:2::2 dev virt12
> 
> Remove the use of MPLS_MAX_LABELS and rely on buffer length that is
> passed to mpls_ntop. With this change ip can print the label stack
> returned by the kernel up to 255 characters (limit is due to size of
> buf passed in) which amounts to 31 labels with a separator.
> 
> With this change the above is:
>     $ ip/ip -f mpls ro ls
>     101 as to 102/103/104/105/106/107/108/109/110 via inet 172.16.2.2 dev virt12
> 
> Signed-off-by: David Ahern <dsahern@gmail.com>

Much better. Applied thanks.

^ permalink raw reply

* Re: [PATCH iproute2 v2 1/1] vxlan: Add support for modifying vxlan device attributes
From: Stephen Hemminger @ 2017-05-11 18:12 UTC (permalink / raw)
  To: Girish Moodalbail; +Cc: netdev
In-Reply-To: <1494095863-21916-2-git-send-email-girish.moodalbail@oracle.com>

On Sat,  6 May 2017 11:37:43 -0700
Girish Moodalbail <girish.moodalbail@oracle.com> wrote:

> Ability to change vxlan device attributes was added to kernel through
> commit 8bcdc4f3a20b ("vxlan: add changelink support"), however one
> cannot do the same through ip(8) command.  Changing the allowed vxlan
> device attributes using 'ip link set dev <vxlan_name> type vxlan
> <allowed_attributes>' currently fails with 'operation not supported'
> error.  This failure is due to the incorrect rtnetlink message
> construction for the 'ip link set' operation.
> 
> The vxlan_parse_opt() callback function is called for parsing options
> for both 'ip link add' and 'ip link set'. For the 'add' case, we pass
> down default values for those attributes that were not provided as CLI
> options. However, for the 'set' case we should be only passing down the
> explicitly provided attributes and not any other (default) attributes.
> 
> Signed-off-by: Girish Moodalbail <girish.moodalbail@oracle.com>

I like this much better.
Applied, Thanks.

^ permalink raw reply

* Re: [PATCH] ethernet: aquantia: remove redundant checks on error status
From: David Miller @ 2017-05-11 18:16 UTC (permalink / raw)
  To: colin.king
  Cc: pavel.belous, Dmitrii.Tarakanov, Alexander.Loktionov, vomlehn,
	netdev, kernel-janitors, linux-kernel
In-Reply-To: <20170511172929.12951-1-colin.king@canonical.com>

From: Colin King <colin.king@canonical.com>
Date: Thu, 11 May 2017 18:29:29 +0100

> From: Colin Ian King <colin.king@canonical.com>
> 
> The error status err is initialized as zero and then being checked
> several times to see if it is less than zero even when it has not
> been updated. Since these checks are redundant we can remove these
> as well as err and the error exit label err_exit.
> 
> Detected by CoverityScan, CID#1398313 and CID#1398306 ("Logically
> dead code")
> 
> Signed-off-by: Colin Ian King <colin.king@canonical.com>

Please enhance your commit message to also explain that the functions
being called around these checks all return void, to make it clear
that this isn't an issue of the return values not being checked.

Thanks.

^ permalink raw reply

* Re: [PATCH] ethernet: aquantia: remove redundant checks on error status
From: Colin Ian King @ 2017-05-11 18:18 UTC (permalink / raw)
  To: David Miller
  Cc: pavel.belous, Dmitrii.Tarakanov, Alexander.Loktionov, vomlehn,
	netdev, kernel-janitors, linux-kernel
In-Reply-To: <20170511.141623.1200573443845531222.davem@davemloft.net>

On 11/05/17 19:16, David Miller wrote:
> From: Colin King <colin.king@canonical.com>
> Date: Thu, 11 May 2017 18:29:29 +0100
> 
>> From: Colin Ian King <colin.king@canonical.com>
>>
>> The error status err is initialized as zero and then being checked
>> several times to see if it is less than zero even when it has not
>> been updated. Since these checks are redundant we can remove these
>> as well as err and the error exit label err_exit.
>>
>> Detected by CoverityScan, CID#1398313 and CID#1398306 ("Logically
>> dead code")
>>
>> Signed-off-by: Colin Ian King <colin.king@canonical.com>
> 
> Please enhance your commit message to also explain that the functions
> being called around these checks all return void, to make it clear
> that this isn't an issue of the return values not being checked.
> 
> Thanks.
> 
Good idea. Will do.

^ permalink raw reply

* [PATCH net] net: phy: Call bus->reset() after releasing PHYs from reset
From: Florian Fainelli @ 2017-05-11 18:24 UTC (permalink / raw)
  To: netdev; +Cc: rogerq, davem, Florian Fainelli, Andrew Lunn, open list

The API convention makes it that a given MDIO bus reset should be able
to access PHY devices in its reset() callback and perform additional
MDIO accesses in order to bring the bus and PHYs in a working state.

Commit 69226896ad63 ("mdio_bus: Issue GPIO RESET to PHYs.") broke that
contract by first calling bus->reset() and then release all PHYs from
reset using their shared GPIO line, so restore the expected
functionality here.

Fixes: 69226896ad63 ("mdio_bus: Issue GPIO RESET to PHYs.")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
 drivers/net/phy/mdio_bus.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/phy/mdio_bus.c b/drivers/net/phy/mdio_bus.c
index a898e5c4ef1b..8e73f5f36e71 100644
--- a/drivers/net/phy/mdio_bus.c
+++ b/drivers/net/phy/mdio_bus.c
@@ -364,9 +364,6 @@ int __mdiobus_register(struct mii_bus *bus, struct module *owner)
 
 	mutex_init(&bus->mdio_lock);
 
-	if (bus->reset)
-		bus->reset(bus);
-
 	/* de-assert bus level PHY GPIO resets */
 	if (bus->num_reset_gpios > 0) {
 		bus->reset_gpiod = devm_kcalloc(&bus->dev,
@@ -396,6 +393,9 @@ int __mdiobus_register(struct mii_bus *bus, struct module *owner)
 		}
 	}
 
+	if (bus->reset)
+		bus->reset(bus);
+
 	for (i = 0; i < PHY_MAX_ADDR; i++) {
 		if ((bus->phy_mask & (1 << i)) == 0) {
 			struct phy_device *phydev;
-- 
2.9.3

^ permalink raw reply related

* [PATCH net 1/1] tipc: make macro tipc_wait_for_cond() smp safe
From: Jon Maloy @ 2017-05-11 18:28 UTC (permalink / raw)
  To: davem, netdev; +Cc: parthasarathy.bhuvaragan, ying.xue, tipc-discussion

The macro tipc_wait_for_cond() is embedding the macro sk_wait_event()
to fulfil its task. The latter, in turn, is evaluating the stated
condition outside the socket lock context. This is problematic if
the condition is accessing non-trivial data structures which may be
altered by incoming interrupts, as is the case with the cong_links()
linked list, used by socket to keep track of the current set of
congested links. We sometimes see crashes when this list is accessed
by a condition function at the same time as a SOCK_WAKEUP interrupt
is removing an element from the list.

We fix this by expanding selected parts of sk_wait_event() into the
outer macro, while ensuring that all evaluations of a given condition
are performed under socket lock protection.

Fixes: commit 365ad353c256 ("tipc: reduce risk of user starvation
during link congestion")

Reviewed-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
---
 net/tipc/socket.c | 38 +++++++++++++++++++-------------------
 1 file changed, 19 insertions(+), 19 deletions(-)

diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index 0d4f2f4..1b92b72 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -362,25 +362,25 @@ static int tipc_sk_sock_err(struct socket *sock, long *timeout)
 	return 0;
 }
 
-#define tipc_wait_for_cond(sock_, timeout_, condition_)			\
-({								        \
-	int rc_ = 0;							\
-	int done_ = 0;							\
-									\
-	while (!(condition_) && !done_) {				\
-		struct sock *sk_ = sock->sk;				\
-		DEFINE_WAIT_FUNC(wait_, woken_wake_function);		\
-									\
-		rc_ = tipc_sk_sock_err(sock_, timeout_);		\
-		if (rc_)						\
-			break;						\
-		prepare_to_wait(sk_sleep(sk_), &wait_,			\
-				TASK_INTERRUPTIBLE);			\
-		done_ = sk_wait_event(sk_, timeout_,			\
-				      (condition_), &wait_);		\
-		remove_wait_queue(sk_sleep(sk_), &wait_);		\
-	}								\
-	rc_;								\
+#define tipc_wait_for_cond(sock_, timeo_, condition_)			       \
+({                                                                             \
+	struct sock *sk_;						       \
+	int rc_;							       \
+									       \
+	while ((rc_ = !(condition_))) {					       \
+		DEFINE_WAIT_FUNC(wait_, woken_wake_function);	               \
+		sk_ = (sock_)->sk;					       \
+		rc_ = tipc_sk_sock_err((sock_), timeo_);		       \
+		if (rc_)						       \
+			break;						       \
+		prepare_to_wait(sk_sleep(sk_), &wait_, TASK_INTERRUPTIBLE);    \
+		release_sock(sk_);					       \
+		*(timeo_) = wait_woken(&wait_, TASK_INTERRUPTIBLE, *(timeo_)); \
+		sched_annotate_sleep();				               \
+		lock_sock(sk_);						       \
+		remove_wait_queue(sk_sleep(sk_), &wait_);		       \
+	}								       \
+	rc_;								       \
 })
 
 /**
-- 
2.1.4

^ permalink raw reply related

* [PATCH][V2] ethernet: aquantia: remove redundant checks on error status
From: Colin King @ 2017-05-11 18:29 UTC (permalink / raw)
  To: Pavel Belous, David S . Miller, David VomLehn,
	Alexander Loktionov, Dmitry Bezrukov, Dmitrii Tarakanov, netdev
  Cc: kernel-janitors, linux-kernel

From: Colin Ian King <colin.king@canonical.com>

The error status err is initialized as zero and then being checked
several times to see if it is less than zero even when it has not
been updated.  It may seem that the err should be assigned to the
return code of the call to the various *offload_en_set calls and
then we check for failure, however, these functions are void and
never actually return any status.  

Since these error checks are redundant we can remove these
as well as err and the error exit label err_exit.

Detected by CoverityScan, CID#1398313 and CID#1398306 ("Logically
dead code")

Signed-off-by: Colin Ian King <colin.king@canonical.com>
---
 drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_a0.c | 13 +------------
 drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_b0.c | 12 +-----------
 2 files changed, 2 insertions(+), 23 deletions(-)

diff --git a/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_a0.c b/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_a0.c
index 4ee15ff06a44..faeb4935ef3e 100644
--- a/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_a0.c
+++ b/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_a0.c
@@ -200,29 +200,18 @@ static int hw_atl_a0_hw_rss_set(struct aq_hw_s *self,
 static int hw_atl_a0_hw_offload_set(struct aq_hw_s *self,
 				    struct aq_nic_cfg_s *aq_nic_cfg)
 {
-	int err = 0;
-
 	/* TX checksums offloads*/
 	tpo_ipv4header_crc_offload_en_set(self, 1);
 	tpo_tcp_udp_crc_offload_en_set(self, 1);
-	if (err < 0)
-		goto err_exit;
 
 	/* RX checksums offloads*/
 	rpo_ipv4header_crc_offload_en_set(self, 1);
 	rpo_tcp_udp_crc_offload_en_set(self, 1);
-	if (err < 0)
-		goto err_exit;
 
 	/* LSO offloads*/
 	tdm_large_send_offload_en_set(self, 0xFFFFFFFFU);
-	if (err < 0)
-		goto err_exit;
-
-	err = aq_hw_err_from_flags(self);
 
-err_exit:
-	return err;
+	return aq_hw_err_from_flags(self);
 }
 
 static int hw_atl_a0_hw_init_tx_path(struct aq_hw_s *self)
diff --git a/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_b0.c b/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_b0.c
index 42150708191d..1bceb7358e5c 100644
--- a/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_b0.c
+++ b/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_b0.c
@@ -200,25 +200,18 @@ static int hw_atl_b0_hw_rss_set(struct aq_hw_s *self,
 static int hw_atl_b0_hw_offload_set(struct aq_hw_s *self,
 				    struct aq_nic_cfg_s *aq_nic_cfg)
 {
-	int err = 0;
 	unsigned int i;
 
 	/* TX checksums offloads*/
 	tpo_ipv4header_crc_offload_en_set(self, 1);
 	tpo_tcp_udp_crc_offload_en_set(self, 1);
-	if (err < 0)
-		goto err_exit;
 
 	/* RX checksums offloads*/
 	rpo_ipv4header_crc_offload_en_set(self, 1);
 	rpo_tcp_udp_crc_offload_en_set(self, 1);
-	if (err < 0)
-		goto err_exit;
 
 	/* LSO offloads*/
 	tdm_large_send_offload_en_set(self, 0xFFFFFFFFU);
-	if (err < 0)
-		goto err_exit;
 
 /* LRO offloads */
 	{
@@ -245,10 +238,7 @@ static int hw_atl_b0_hw_offload_set(struct aq_hw_s *self,
 
 		rpo_lro_en_set(self, aq_nic_cfg->is_lro ? 0xFFFFFFFFU : 0U);
 	}
-	err = aq_hw_err_from_flags(self);
-
-err_exit:
-	return err;
+	return aq_hw_err_from_flags(self);
 }
 
 static int hw_atl_b0_hw_init_tx_path(struct aq_hw_s *self)
-- 
2.11.0


^ permalink raw reply related

* Re: [PATCH net-next] selftests/bpf: get rid of -D__x86_64__
From: David Miller @ 2017-05-11 19:02 UTC (permalink / raw)
  To: ast; +Cc: daniel, netdev
In-Reply-To: <5d1d290b-7ff7-7f9d-15a0-599596e0778c@fb.com>

From: Alexei Starovoitov <ast@fb.com>
Date: Thu, 4 May 2017 16:34:08 -0700

> Hence I think the cleanest solution is to have bpf arch's types.h
> either installed with llvm/gcc or picked from selftests's dir.

Something like this?

====================
[PATCH] bpf: Provide a linux/types.h override for bpf selftests.

We do not want to use the architecture's type.h header when
building BPF programs which are always 64-bit.

Signed-off-by: David S. Miller <davem@davemloft.net>
---
 tools/testing/selftests/bpf/Makefile                   | 3 ++-
 tools/testing/selftests/bpf/include/uapi/linux/types.h | 6 ++++++
 2 files changed, 8 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/bpf/include/uapi/linux/types.h

diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index f92f27d..f389b02 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -35,6 +35,7 @@ $(BPFOBJ): force
 CLANG ?= clang
 
 %.o: %.c
-	$(CLANG) -I. -I../../../include/uapi -I../../../../samples/bpf/ \
+	$(CLANG) -I. -I./include/uapi -I../../../include/uapi \
+		-I../../../../samples/bpf/ \
 		-Wno-compare-distinct-pointer-types \
 		-O2 -target bpf -c $< -o $@
diff --git a/tools/testing/selftests/bpf/include/uapi/linux/types.h b/tools/testing/selftests/bpf/include/uapi/linux/types.h
new file mode 100644
index 0000000..fbd16a7
--- /dev/null
+++ b/tools/testing/selftests/bpf/include/uapi/linux/types.h
@@ -0,0 +1,6 @@
+#ifndef _UAPI_LINUX_TYPES_H
+#define _UAPI_LINUX_TYPES_H
+
+#include <asm-generic/int-ll64.h>
+
+#endif /* _UAPI_LINUX_TYPES_H */
-- 
2.1.2.532.g19b5d50

^ permalink raw reply related

* Re: [PATCH v2 1/2] net: Added mtu parameter to dev_forward_skb calls
From: Fredrik Markström @ 2017-05-11 19:10 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Eric Dumazet, Daniel Borkmann, netdev, bridge, linux-kernel,
	Alexei Starovoitov, David S. Miller
In-Reply-To: <20170511090132.79fdbf12@xeon-e3>

On Thu, May 11, 2017 at 6:01 PM, Stephen Hemminger
<stephen@networkplumber.org> wrote:
> On Thu, 11 May 2017 15:46:27 +0200
> Fredrik Markstrom <fredrik.markstrom@gmail.com> wrote:
>
>> From: Fredrik Markström <fredrik.markstrom@gmail.com>
>>
>> is_skb_forwardable() currently checks if the packet size is <= mtu of
>> the receiving interface. This is not consistent with most of the hardware
>> ethernet drivers that happily receives packets larger then MTU.
>
> Wrong.

What is "Wrong" ? I was initially skeptical to implement this patch,
since it feels odd to have different MTU:s set on the two sides of a
link. After consulting some IP people and the RFC:s I kind of changed
my mind and thought I'd give it a shot. In the RFCs I couldn't find
anything that defined when and when not a received packet should be
dropped.

>
> Hardware interfaces are free to drop any packet greater than MTU (actually MTU + VLAN).
> The actual limit is a function of the hardware. Some hardware can only limit by
> power of 2; some can only limit frames larger than 1500; some have no limiting at all.

Agreed. The purpose of these patches is to be able to configure an
veth interface to mimic these different behaviors. Non of the Ethernet
interfaces I have access to drops packets due to them being larger
then the configured MTU like veth does.

Being able to mimic real Ethernet hardware is useful when
consolidating hardware using containers/namespaces.

In a reply to a comment from David Miller in my previous version of
the patch I attached the example below to demonstrate the case in
detail.

This works with all ethernet hardware setups I have access to:

---- 8< ------
# Host A eth2 and Host B eth0 is on the same network.

# On HOST A
% ip address add 1.2.3.4/24 dev eth2
% ip link set eth2 mtu 300 up

% # HOST B
% ip address add 1.2.3.5/24 dev eth0
% ip link set eth0 mtu 1000 up
% ping -c 1 -W 1 -s 400 1.2.3.4
PING 1.2.3.4 (1.2.3.4) 400(428) bytes of data.
408 bytes from 1.2.3.4: icmp_seq=1 ttl=64 time=1.57 ms

--- 1.2.3.4 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 1.573/1.573/1.573/0.000 ms
---- 8< ------


But it doesn't work with veth:

---- 8< ------
# veth0 and veth1 is a veth pair and veth1 has ben moved to a separate
network namespace.
% # NS A
% ip address add 1.2.3.4/24 dev veth0
% ip link set veth0 mtu 300 up

% # NS B
% ip address add 1.2.3.5/24 dev veth1
% ip link set veth1 mtu 1000 up
% ping -c 1 -W 1 -s 400 1.2.3.4
PING 1.2.3.4 (1.2.3.4) 400(428) bytes of data.

--- 1.2.3.4 ping statistics ---
1 packets transmitted, 0 received, 100% packet loss, time 0ms
---- 8< ------

-- 
/Fredrik

^ permalink raw reply

* Re: [PATCH v5 15/17] dt-bindings: qca7000: append UART interface to binding
From: Michael Heimpold @ 2017-05-11 19:12 UTC (permalink / raw)
  To: Rob Herring
  Cc: Stefan Wahren, David S. Miller, Mark Rutland, Greg Kroah-Hartman,
	Jiri Slaby, linux-serial-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA, devicetree-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1494406408-31760-16-git-send-email-stefan.wahren-eS4NqCHxEME@public.gmane.org>

Hi,

Am Mittwoch, 10. Mai 2017, 10:53:26 CEST schrieb Stefan Wahren:
> This merges the serdev binding for the QCA7000 UART driver (Ethernet over
> UART) into the existing document.
> 
> Signed-off-by: Stefan Wahren <stefan.wahren-eS4NqCHxEME@public.gmane.org>
> ---
>  .../devicetree/bindings/net/qca-qca7000.txt        | 32
> ++++++++++++++++++++++ 1 file changed, 32 insertions(+)
> 
> diff --git a/Documentation/devicetree/bindings/net/qca-qca7000.txt
> b/Documentation/devicetree/bindings/net/qca-qca7000.txt index
> a37f656..08364c3 100644
> --- a/Documentation/devicetree/bindings/net/qca-qca7000.txt
> +++ b/Documentation/devicetree/bindings/net/qca-qca7000.txt
> @@ -54,3 +54,35 @@ ssp2: spi@80014000 {
>  		local-mac-address = [ A0 B0 C0 D0 E0 F0 ];
>  	};
>  };
> +
> +(b) Ethernet over UART
> +
> +In order to use the QCA7000 as UART slave it must be defined as a child of
> a +UART master in the device tree. It is possible to preconfigure the UART
> +settings of the QCA7000 firmware, but it's not possible to change them
> during +runtime.
> +
> +Required properties:
> +- compatible        : Should be "qca,qca7000-uart"

I already discussed this with Stefan off-list a little bit, but I would like
to bring this to a broader audience: I'm not sure whether the compatible 
should contain the "-uart" suffix, because the hardware chip is the very same
QCA7000 chip which can also be used with SPI protocol.
The only difference is the loaded firmware within the chip which can either
speak SPI or UART protocol (but not both at the same time - due to shared 
pins). So the hardware design decides which interface type is used.

At the moment, this patch series adds a dedicated driver for the UART 
protocol, in parallel to the existing SPI driver. So a different compatible
string is needed here to match against the new driver.

An alternative approach would be to re-use the existing compatible string
"qca,qca7000" for both, the SPI and UART protocol, because a "smarter" 
(combined) driver would detect which protocol to use. For example the driver 
could check for spi-cpha and/or spi-cpol which are required for SPI protocol: 
if these exists the driver could assume that SPI must be used, if both are 
missing then UART protocol should be used.
(This way it would not be necessary to check whether the node is a child of
a SPI or UART master node - but maybe this is even easier - I don't know)

Or in shorter words: my concern is that while "qca7000-uart" describes the 
hardware, it's too closely coupled to the driver implementation. Having
some feedback of the experts would be nice :-)

Thanks,
Michael

> +
> +Optional properties:
> +- local-mac-address : see ./ethernet.txt
> +- current-speed     : current baud rate of QCA7000 which defaults to 115200
> +		      if absent, see also ../serial/slave-device.txt
> +
> +UART Example:
> +
> +/* Freescale i.MX28 UART */
> +auart0: serial@8006a000 {
> +	compatible = "fsl,imx28-auart", "fsl,imx23-auart";
> +	reg = <0x8006a000 0x2000>;
> +	pinctrl-names = "default";
> +	pinctrl-0 = <&auart0_2pins_a>;
> +	status = "okay";
> +
> +	qca7000: ethernet {
> +		compatible = "qca,qca7000-uart";
> +		local-mac-address = [ A0 B0 C0 D0 E0 F0 ];
> +		current-speed = <38400>;
> +	};
> +};


--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* BPF relocations
From: David Miller @ 2017-05-11 19:31 UTC (permalink / raw)
  To: ast; +Cc: daniel, netdev


I haven't done more work on bintuils BPF support because we
need to figure out exactly what to do with relocations.  So
I've been trying to spend time thinking about this.

As far as I can tell the 64-bit BPF relocation llvm uses
is used in two situations:

1) 64-bit relocations against data

2) 64-bit relocations against ldimm64 instructions

If this is true it's a very bad decision that has ramifications for us
right now.

One must always explicitly define relocations as being against data or
instruction fields.  You cannot use the same relocation for both kinds
of transformations, somehow trying to figure out what to do
"contextually".  That doesn't work.

Right now in binutils I gave this relocation with value "1" the name
R_BPF_DATA_64.  But it's not correct as explained above.

I'm using it as a data relocation so that dwarf2 information emitted
by llvm gets handled correctly by binutils.

The only real dependency upon relocation "1" being used against
instructions is the ELF parser that records the relocations against
MAP sections which get applied to ldimm64 instructions.

So I propose:

1) We define the set of relocations as follows:

  RELOC_NUMBER (R_BPF_NONE, 0)
  RELOC_NUMBER (R_BPF_OLD_64, 1)
  RELOC_NUMBER (R_BPF_INSN_64, 2)
  RELOC_NUMBER (R_BPF_INSN_32, 3)
  RELOC_NUMBER (R_BPF_INSN_16, 4)
  RELOC_NUMBER (R_BPF_WDISP16, 5)
  RELOC_NUMBER (R_BPF_DATA_8,  8)
  RELOC_NUMBER (R_BPF_DATA_16, 9)
  RELOC_NUMBER (R_BPF_DATA_32, 10)
  RELOC_NUMBER (R_BPF_DATA_64, 11)

2) LLVM is changed to have relocations:

	#define R_BPF_OLD_64_64	1  /* deprecated, do not use */
	#define R_BPF_INSN_64	2  /* 64-bit instruction relocation */
	#define R_BPF_64_32	10 /* 32-bit data relocation */
	#define R_BPF_DATA_64	11 /* 64-bit data relocation */

   and emit 64-bit relocations against instructions and data using
   R_BPF_INSN_64 and R_BPF_DATA_64, respectively.  Keep the llvm code
   around for interpreting R_BPF_OLD_64_64 so that interaction with
   older binaries keeps working.

3) The existing loaders should continue to function properly, and with
   suitable changes alongside the list in #1 to binutils it will still
   properly read dwarf2 information in both old and new binaries.

4) Add explicit relocation number checks to the ELF loaders, they
   should accept both "1" and "2" for MAP relocations against ldimm64

Comments?

^ permalink raw reply

* [PATCH v8 0/5] skb_to_sgvec hardening
From: Jason A. Donenfeld @ 2017-05-11 19:41 UTC (permalink / raw)
  To: netdev, linux-kernel, davem, kernel-hardening; +Cc: Jason A. Donenfeld

The recent bug with macsec and historical one with virtio have
indicated that letting skb_to_sgvec trounce all over an sglist
without checking the length is probably a bad idea. And it's not
necessary either: an sglist already explicitly marks its last
item, and the initialization functions are diligent in doing so.
Thus there's a clear way of avoiding future overflows.

So, this patchset, from a high level, makes skb_to_sgvec return
a potential error code, and then adjusts all callers to check
for the error code. There are two situations in which skb_to_sgvec
might return such an error:

   1) When the passed in sglist is too small; and
   2) When the passed in skbuff is too deeply nested.

So, the first patch in this series handles the issues with
skb_to_sgvec directly, and the remaining ones then handle the call
sites.

Changes v7->v8:
   - Added __must_check annotation.
   - Rebased against latest upstream ipsec changes.

Jason A. Donenfeld (5):
  skbuff: return -EMSGSIZE in skb_to_sgvec to prevent overflow
  ipsec: check return value of skb_to_sgvec always
  rxrpc: check return value of skb_to_sgvec always
  macsec: check return value of skb_to_sgvec always
  virtio_net: check return value of skb_to_sgvec always

 drivers/net/macsec.c     | 13 ++++++++--
 drivers/net/virtio_net.c |  9 +++++--
 include/linux/skbuff.h   |  8 +++---
 net/core/skbuff.c        | 65 +++++++++++++++++++++++++++++++-----------------
 net/ipv4/ah4.c           |  8 ++++--
 net/ipv4/esp4.c          | 20 +++++++++------
 net/ipv6/ah6.c           |  8 ++++--
 net/ipv6/esp6.c          | 20 +++++++++------
 net/rxrpc/rxkad.c        | 13 +++++++---
 9 files changed, 112 insertions(+), 52 deletions(-)

-- 
2.13.0

^ permalink raw reply

* [PATCH v8 1/5] skbuff: return -EMSGSIZE in skb_to_sgvec to prevent overflow
From: Jason A. Donenfeld @ 2017-05-11 19:41 UTC (permalink / raw)
  To: netdev, linux-kernel, davem, kernel-hardening
  Cc: Jason A. Donenfeld, Steffen Klassert, Herbert Xu, David Howells,
	Sabrina Dubroca, Michael S. Tsirkin, Jason Wang
In-Reply-To: <20170511194134.31183-1-Jason@zx2c4.com>

This is a defense-in-depth measure in response to bugs like
4d6fa57b4dab ("macsec: avoid heap overflow in skb_to_sgvec"). There's
not only a potential overflow of sglist items, but also a stack overflow
potential, so we fix this by limiting the amount of recursion this function
is allowed to do. Not actually providing a bounded base case is a future
disaster that we can easily avoid here.

As a small matter of house keeping, we take this opportunity to move the
documentation comment over the actual function the documentation is for.

While this could be implemented by using an explicit stack of skbuffs,
when implementing this, the function complexity increased considerably,
and I don't think such complexity and bloat is actually worth it. So,
instead I built this and tested it on x86, x86_64, ARM, ARM64, and MIPS,
and measured the stack usage there. I also reverted the recent MIPS
changes that give it a separate IRQ stack, so that I could experience
some worst-case situations. I found that limiting it to 24 layers deep
yielded a good stack usage with room for safety, as well as being much
deeper than any driver actually ever creates.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: David Howells <dhowells@redhat.com>
Cc: Sabrina Dubroca <sd@queasysnail.net>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
---
 include/linux/skbuff.h |  8 +++----
 net/core/skbuff.c      | 65 ++++++++++++++++++++++++++++++++------------------
 2 files changed, 46 insertions(+), 27 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index a098d95b3d84..f351df28942d 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1001,10 +1001,10 @@ struct sk_buff *skb_realloc_headroom(struct sk_buff *skb,
 				     unsigned int headroom);
 struct sk_buff *skb_copy_expand(const struct sk_buff *skb, int newheadroom,
 				int newtailroom, gfp_t priority);
-int skb_to_sgvec_nomark(struct sk_buff *skb, struct scatterlist *sg,
-			int offset, int len);
-int skb_to_sgvec(struct sk_buff *skb, struct scatterlist *sg, int offset,
-		 int len);
+int __must_check skb_to_sgvec_nomark(struct sk_buff *skb, struct scatterlist *sg,
+				     int offset, int len);
+int __must_check skb_to_sgvec(struct sk_buff *skb, struct scatterlist *sg,
+			      int offset, int len);
 int skb_cow_data(struct sk_buff *skb, int tailbits, struct sk_buff **trailer);
 int skb_pad(struct sk_buff *skb, int pad);
 #define dev_kfree_skb(a)	consume_skb(a)
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 346d3e85dfbc..ec33811559e3 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -3482,24 +3482,18 @@ void __init skb_init(void)
 						NULL);
 }
 
-/**
- *	skb_to_sgvec - Fill a scatter-gather list from a socket buffer
- *	@skb: Socket buffer containing the buffers to be mapped
- *	@sg: The scatter-gather list to map into
- *	@offset: The offset into the buffer's contents to start mapping
- *	@len: Length of buffer space to be mapped
- *
- *	Fill the specified scatter-gather list with mappings/pointers into a
- *	region of the buffer space attached to a socket buffer.
- */
 static int
-__skb_to_sgvec(struct sk_buff *skb, struct scatterlist *sg, int offset, int len)
+__skb_to_sgvec(struct sk_buff *skb, struct scatterlist *sg, int offset, int len,
+	       unsigned int recursion_level)
 {
 	int start = skb_headlen(skb);
 	int i, copy = start - offset;
 	struct sk_buff *frag_iter;
 	int elt = 0;
 
+	if (unlikely(recursion_level >= 24))
+		return -EMSGSIZE;
+
 	if (copy > 0) {
 		if (copy > len)
 			copy = len;
@@ -3518,6 +3512,8 @@ __skb_to_sgvec(struct sk_buff *skb, struct scatterlist *sg, int offset, int len)
 		end = start + skb_frag_size(&skb_shinfo(skb)->frags[i]);
 		if ((copy = end - offset) > 0) {
 			skb_frag_t *frag = &skb_shinfo(skb)->frags[i];
+			if (unlikely(elt && sg_is_last(&sg[elt - 1])))
+				return -EMSGSIZE;
 
 			if (copy > len)
 				copy = len;
@@ -3532,16 +3528,22 @@ __skb_to_sgvec(struct sk_buff *skb, struct scatterlist *sg, int offset, int len)
 	}
 
 	skb_walk_frags(skb, frag_iter) {
-		int end;
+		int end, ret;
 
 		WARN_ON(start > offset + len);
 
 		end = start + frag_iter->len;
 		if ((copy = end - offset) > 0) {
+			if (unlikely(elt && sg_is_last(&sg[elt - 1])))
+				return -EMSGSIZE;
+
 			if (copy > len)
 				copy = len;
-			elt += __skb_to_sgvec(frag_iter, sg+elt, offset - start,
-					      copy);
+			ret = __skb_to_sgvec(frag_iter, sg+elt, offset - start,
+					      copy, recursion_level + 1);
+			if (unlikely(ret < 0))
+				return ret;
+			elt += ret;
 			if ((len -= copy) == 0)
 				return elt;
 			offset += copy;
@@ -3552,6 +3554,31 @@ __skb_to_sgvec(struct sk_buff *skb, struct scatterlist *sg, int offset, int len)
 	return elt;
 }
 
+/**
+ *	skb_to_sgvec - Fill a scatter-gather list from a socket buffer
+ *	@skb: Socket buffer containing the buffers to be mapped
+ *	@sg: The scatter-gather list to map into
+ *	@offset: The offset into the buffer's contents to start mapping
+ *	@len: Length of buffer space to be mapped
+ *
+ *	Fill the specified scatter-gather list with mappings/pointers into a
+ *	region of the buffer space attached to a socket buffer. Returns either
+ *	the number of scatterlist items used, or -EMSGSIZE if the contents
+ *	could not fit.
+ */
+int skb_to_sgvec(struct sk_buff *skb, struct scatterlist *sg, int offset, int len)
+{
+	int nsg = __skb_to_sgvec(skb, sg, offset, len, 0);
+
+	if (nsg <= 0)
+		return nsg;
+
+	sg_mark_end(&sg[nsg - 1]);
+
+	return nsg;
+}
+EXPORT_SYMBOL_GPL(skb_to_sgvec);
+
 /* As compared with skb_to_sgvec, skb_to_sgvec_nomark only map skb to given
  * sglist without mark the sg which contain last skb data as the end.
  * So the caller can mannipulate sg list as will when padding new data after
@@ -3574,19 +3601,11 @@ __skb_to_sgvec(struct sk_buff *skb, struct scatterlist *sg, int offset, int len)
 int skb_to_sgvec_nomark(struct sk_buff *skb, struct scatterlist *sg,
 			int offset, int len)
 {
-	return __skb_to_sgvec(skb, sg, offset, len);
+	return __skb_to_sgvec(skb, sg, offset, len, 0);
 }
 EXPORT_SYMBOL_GPL(skb_to_sgvec_nomark);
 
-int skb_to_sgvec(struct sk_buff *skb, struct scatterlist *sg, int offset, int len)
-{
-	int nsg = __skb_to_sgvec(skb, sg, offset, len);
 
-	sg_mark_end(&sg[nsg - 1]);
-
-	return nsg;
-}
-EXPORT_SYMBOL_GPL(skb_to_sgvec);
 
 /**
  *	skb_cow_data - Check that a socket buffer's data buffers are writable
-- 
2.13.0

^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox