Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH v2] irqchip/renesas-irqc: Postpone driver initialization
From: Geert Uytterhoeven @ 2016-11-09 19:43 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: Geert Uytterhoeven, Thomas Gleixner, Jason Cooper, Marc Zyngier,
	Simon Horman, Magnus Damm, Linux-Renesas,
	linux-kernel@vger.kernel.org, netdev@vger.kernel.org
In-Reply-To: <3dface9b-7602-b1f2-0dd6-37b444ac42e3@gmail.com>

Hi Florian,

On Wed, Nov 9, 2016 at 8:17 PM, Florian Fainelli <f.fainelli@gmail.com> wrote:
> On 11/08/2016 11:50 AM, Geert Uytterhoeven wrote:
>> On Tue, Nov 8, 2016 at 8:42 PM, Florian Fainelli <f.fainelli@gmail.com> wrote:
>>> On 11/08/2016 11:35 AM, Geert Uytterhoeven wrote:
>>>> Currently the renesas-irqc driver uses postcore_initcall().
>>>>
>>>> However, the new CPG/MSSR driver uses subsys_initcall(). Hence the
>>>> IRQC's probe will be deferred, which causes the Micrel Ethernet PHY to
>>>> not find its interrupt on R-Car Gen2 and RZ/G, as the of_mdio subsystem
>>>> does not support deferred probe yet.
>>>
>>> Is not that the more correct fix to implement though?
>>
>> Sure it is. But nothing has happened since this was reported ca. 1 year ago.
>> Cfr. "of_mdiobus_register_phy() and deferred probe"
>> https://lkml.org/lkml/2015/10/22/377
>>
>> My MDIO foo is not that strong...
>
> Let me try to cook something here which may require
> of_mdiobus_register_phy(), are you okay testing patches?

Of course I am. Thanks a lot!

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply

* Re: [PATCH 2/3] vhost: better detection of available buffers
From: Michael S. Tsirkin @ 2016-11-09 19:57 UTC (permalink / raw)
  To: Jason Wang; +Cc: netdev, linux-kernel
In-Reply-To: <1478677113-13126-2-git-send-email-jasowang@redhat.com>

On Wed, Nov 09, 2016 at 03:38:32PM +0800, Jason Wang wrote:
> We should use vq->last_avail_idx instead of vq->avail_idx in the
> checking of vhost_vq_avail_empty() since latter is the cached avail
> index from guest but we want to know if there's pending available
> buffers in the virtqueue.
> 
> Signed-off-by: Jason Wang <jasowang@redhat.com>

I'm not sure why is this patch here. Is it related to
batching somehow?


> ---
>  drivers/vhost/vhost.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index c6f2d89..fdf4cdf 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -2230,7 +2230,7 @@ bool vhost_vq_avail_empty(struct vhost_dev *dev, struct vhost_virtqueue *vq)
>  	if (r)
>  		return false;
>  
> -	return vhost16_to_cpu(vq, avail_idx) == vq->avail_idx;
> +	return vhost16_to_cpu(vq, avail_idx) == vq->last_avail_idx;
>  }
>  EXPORT_SYMBOL_GPL(vhost_vq_avail_empty);

That might be OK for TX but it's probably wrong for RX
where the fact that used != avail does not mean
we have enough space to store the packet.

Maybe we should just rename this to vhost_vq_avail_unchanged
to clarify usage.


>  
> -- 
> 2.7.4

^ permalink raw reply

* [PATCH net v2] ipv4: update comment to document GSO fragmentation cases.
From: Lance Richardson @ 2016-11-09 20:04 UTC (permalink / raw)
  To: netdev

This is a follow-up to commit 9ee6c5dc816a ("ipv4: allow local
fragmentation in ip_finish_output_gso()"), updating the comment
documenting cases in which fragmentation is needed for egress
GSO packets.

Suggested-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Reviewed-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Signed-off-by: Lance Richardson <lrichard@redhat.com>
---
v2: corrected commit ID (v1 used local commit ID by mistake)

 net/ipv4/ip_output.c | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 4971401..c2dae40 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -244,12 +244,18 @@ static int ip_finish_output_gso(struct net *net, struct sock *sk,
 	if (skb_gso_validate_mtu(skb, mtu))
 		return ip_finish_output2(net, sk, skb);
 
-	/* Slowpath -  GSO segment length is exceeding the dst MTU.
+	/* Slowpath -  GSO segment length exceeds the egress MTU.
 	 *
-	 * This can happen in two cases:
-	 * 1) TCP GRO packet, DF bit not set
-	 * 2) skb arrived via virtio-net, we thus get TSO/GSO skbs directly
-	 * from host network stack.
+	 * This can happen in several cases:
+	 *  - Forwarding of a TCP GRO skb, when DF flag is not set.
+	 *  - Forwarding of an skb that arrived on a virtualization interface
+	 *    (virtio-net/vhost/tap) with TSO/GSO size set by other network
+	 *    stack.
+	 *  - Local GSO skb transmitted on an NETIF_F_TSO tunnel stacked over an
+	 *    interface with a smaller MTU.
+	 *  - Arriving GRO skb (or GSO skb in a virtualized environment) that is
+	 *    bridged to a NETIF_F_TSO tunnel stacked over an interface with an
+	 *    insufficent MTU.
 	 */
 	features = netif_skb_features(skb);
 	BUILD_BUG_ON(sizeof(*IPCB(skb)) > SKB_SGO_CB_OFFSET);
-- 
2.5.5

^ permalink raw reply related

* Re: [PATCH 3/3] vhost_net: tx support batching
From: Michael S. Tsirkin @ 2016-11-09 20:05 UTC (permalink / raw)
  To: Jason Wang; +Cc: netdev, linux-kernel
In-Reply-To: <1478677113-13126-3-git-send-email-jasowang@redhat.com>

On Wed, Nov 09, 2016 at 03:38:33PM +0800, Jason Wang wrote:
> This patch tries to utilize tuntap rx batching by peeking the tx
> virtqueue during transmission, if there's more available buffers in
> the virtqueue, set MSG_MORE flag for a hint for tuntap to batch the
> packets. The maximum number of batched tx packets were specified
> through a module parameter: tx_bached.
> 
> When use 16 as tx_batched:

When using

> 
> Pktgen test shows 16% on tx pps in guest.
> Netperf test does not show obvious regression.

Why doesn't netperf benefit?

> For safety, 1 were used as the default value for tx_batched.

s/were used/is used/

> Signed-off-by: Jason Wang <jasowang@redhat.com>

These tests unfortunately only run a single flow.
The concern would be whether this increases latency when
NIC is busy with other flows, so I think this is what
you need to test.


> ---
>  drivers/vhost/net.c   | 15 ++++++++++++++-
>  drivers/vhost/vhost.c |  1 +
>  drivers/vhost/vhost.h |  1 +
>  3 files changed, 16 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> index 5dc128a..51c378e 100644
> --- a/drivers/vhost/net.c
> +++ b/drivers/vhost/net.c
> @@ -35,6 +35,10 @@ module_param(experimental_zcopytx, int, 0444);
>  MODULE_PARM_DESC(experimental_zcopytx, "Enable Zero Copy TX;"
>  		                       " 1 -Enable; 0 - Disable");
>  
> +static int tx_batched = 1;
> +module_param(tx_batched, int, 0444);
> +MODULE_PARM_DESC(tx_batched, "Number of patches batched in TX");
> +
>  /* Max number of bytes transferred before requeueing the job.
>   * Using this limit prevents one virtqueue from starving others. */
>  #define VHOST_NET_WEIGHT 0x80000

I think we should do some tests and find a good default.



> @@ -454,6 +458,16 @@ static void handle_tx(struct vhost_net *net)
>  			msg.msg_control = NULL;
>  			ubufs = NULL;
>  		}
> +		total_len += len;
> +		if (vq->delayed < tx_batched &&
> +		    total_len < VHOST_NET_WEIGHT &&
> +		    !vhost_vq_avail_empty(&net->dev, vq)) {
> +			vq->delayed++;
> +			msg.msg_flags |= MSG_MORE;
> +		} else {
> +			vq->delayed = 0;
> +			msg.msg_flags &= ~MSG_MORE;
> +		}
>  		/* TODO: Check specific error and bomb out unless ENOBUFS? */
>  		err = sock->ops->sendmsg(sock, &msg, len);
>  		if (unlikely(err < 0)) {
> @@ -472,7 +486,6 @@ static void handle_tx(struct vhost_net *net)
>  			vhost_add_used_and_signal(&net->dev, vq, head, 0);
>  		else
>  			vhost_zerocopy_signal_used(net, vq);
> -		total_len += len;
>  		vhost_net_tx_packet(net);
>  		if (unlikely(total_len >= VHOST_NET_WEIGHT)) {
>  			vhost_poll_queue(&vq->poll);
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index fdf4cdf..bc362c7 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -311,6 +311,7 @@ static void vhost_vq_reset(struct vhost_dev *dev,
>  	vq->busyloop_timeout = 0;
>  	vq->umem = NULL;
>  	vq->iotlb = NULL;
> +	vq->delayed = 0;
>  }
>  
>  static int vhost_worker(void *data)
> diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
> index 78f3c5f..9f81a94 100644
> --- a/drivers/vhost/vhost.h
> +++ b/drivers/vhost/vhost.h
> @@ -141,6 +141,7 @@ struct vhost_virtqueue {
>  	bool user_be;
>  #endif
>  	u32 busyloop_timeout;
> +	int delayed;
>  };
>  
>  struct vhost_msg_node {
> -- 
> 2.7.4

^ permalink raw reply

* Re: [PATCH 2/2] soc: qcom: smem_state: Fix include for ERR_PTR()
From: Andy Gross @ 2016-11-09 20:11 UTC (permalink / raw)
  To: Bjorn Andersson
  Cc: Eugene Krasnikov, Kalle Valo, wcn36xx, linux-wireless, netdev,
	Linux Kernel list
In-Reply-To: <1478714571-4841-2-git-send-email-bjorn.andersson@linaro.org>

On 9 November 2016 at 12:02, Bjorn Andersson <bjorn.andersson@linaro.org> wrote:
> The correct include file for getting errno constants and ERR_PTR() is
> linux/err.h, rather than linux/errno.h, so fix the include.
>
> Fixes: e8b123e60084 ("soc: qcom: smem_state: Add stubs for disabled smem_state")
> Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
> ---
>
> Andy, I don't see anything else going into v4.10 for smem_state.h. So if you
> are okay I would like Kalle to pick this patch through his tree and be able to
> merge my wcn36xx patches for v4.10.
>
>  include/linux/soc/qcom/smem_state.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/include/linux/soc/qcom/smem_state.h b/include/linux/soc/qcom/smem_state.h
> index 7b88697929e9..b8478ee7a71f 100644
> --- a/include/linux/soc/qcom/smem_state.h
> +++ b/include/linux/soc/qcom/smem_state.h
> @@ -1,7 +1,7 @@
>  #ifndef __QCOM_SMEM_STATE__
>  #define __QCOM_SMEM_STATE__
>
> -#include <linux/errno.h>
> +#include <linux/err.h>
>
>  struct device_node;
>  struct qcom_smem_state;
> --
> 2.5.0
>

This time with less html.

Acked-by: Andy Gross <andy.gross@linaro.org>

^ permalink raw reply

* Re: Why are IPv6 host and anycast routes referencing lo device?
From: David Ahern @ 2016-11-09 20:30 UTC (permalink / raw)
  To: Hannes Frederic Sowa, YOSHIFUJI Hideaki, netdev@vger.kernel.org
In-Reply-To: <a2955697-9cd2-9f27-9d15-4f9ed95561ec@stressinduktion.org>

On 11/8/16 10:08 AM, Hannes Frederic Sowa wrote:
> On 08.11.2016 02:08, David Ahern wrote:
>>
>> Can anyone explain why host routes and anycast routes for IPv6 are added with the device set to loopback versus the device with the address:
>>
>> local ::1 dev lo  proto none  metric 0  pref medium
>> local 2000:1:: dev lo  proto none  metric 0  pref medium
>> local 2000:1::3 dev lo  proto none  metric 0  pref medium
>> local 2100:2:: dev lo  proto none  metric 0  pref medium
>> local 2100:2::3 dev lo  proto none  metric 0  pref medium
> 
> Does it really matter? For global valid unicast addresses we still
> implement the weak model. Thus the interface does not matter at all.

Consistency. Why does IPv6 need to have all of these little differences from IPv4? Raises questions from customers, customer support and often requires special handling in s/w. 

> 
>> This behavior differs from IPv4 where host routes use the device with the address:
>>
>> broadcast 10.1.1.0 dev eth0  proto kernel  scope link  src 10.1.1.3
>> local 10.1.1.3 dev eth0  proto kernel  scope host  src 10.1.1.3
>> broadcast 10.1.1.255 dev eth0  proto kernel  scope link  src 10.1.1.3
>> broadcast 10.100.2.0 dev eth2  proto kernel  scope link  src 10.100.2.3
>> local 10.100.2.3 dev eth2  proto kernel  scope host  src 10.100.2.3
>> broadcast 10.100.2.255 dev eth2  proto kernel  scope link  src 10.100.2.3
>>
>> The use of loopback pre-dates the git history, so wondering if someone recalls the reason why. We would like to change that to make it consistent with IPv4 - with a sysctl to maintain backwards compatibility.
> 
> A sysctl for that sounds like a really bad idea.
> 
> Internally the sysctl will change the reference counting of interfaces
> and routes towards each other, have small but difficult to find
> semantically changes inside the kernel, just for switchting the
> interface in iproute/netlink dumps?
> 
> If there a good reasons (which can very well be) to switch to have the
> interface with the address in the routes, we should switch without
> providing the backwards compatibility sysctl.

Works for me. I'd prefer not to add yet another sysctl but that seems to be standard for cases like this.

^ permalink raw reply

* Re: [PATCH 1/2] wcn36xx: Correct Kconfig dependency
From: Marcel Holtmann @ 2016-11-09 20:32 UTC (permalink / raw)
  To: Bjorn Andersson
  Cc: Andy Gross, Eugene Krasnikov, Kalle Valo, wcn36xx, linux-wireless,
	netdev, linux-kernel
In-Reply-To: <1478714571-4841-1-git-send-email-bjorn.andersson@linaro.org>

Hi Bjorn,

> In the case SMD or WCNSS_CTRL are compiled as modules, wcn36xx must be
> compiled as module as well, so we need to mention this dependency.
> But we still want allow the driver to be compiled in case either of
> those are =n, for compile testing reasons.
> 
> Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
> ---
> drivers/net/wireless/ath/wcn36xx/Kconfig | 2 ++
> 1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/net/wireless/ath/wcn36xx/Kconfig b/drivers/net/wireless/ath/wcn36xx/Kconfig
> index 591ebaea8265..4b83e87f0b94 100644
> --- a/drivers/net/wireless/ath/wcn36xx/Kconfig
> +++ b/drivers/net/wireless/ath/wcn36xx/Kconfig
> @@ -1,6 +1,8 @@
> config WCN36XX
> 	tristate "Qualcomm Atheros WCN3660/3680 support"
> 	depends on MAC80211 && HAS_DMA
> +	depends on QCOM_WCNSS_CTRL || QCOM_WCNSS_CTRL=n
> +	depends on QCOM_SMD || QCOM_SMD=n

I thought we wanted to move towards being able to compile the WiFi and Bluetooth support without requiring QCOM_SMD.

Regards

Marcel

^ permalink raw reply

* [PATCH net-next 17/17] net: hns: add the support to add/remove the ucast entry to/from table
From: Salil Mehta @ 2016-11-09 18:14 UTC (permalink / raw)
  To: davem
  Cc: salil.mehta, yisen.zhuang, mehta.salil.lnk, netdev, linux-kernel,
	linuxarm, Kejian Yan
In-Reply-To: <20161109181401.913728-1-salil.mehta@huawei.com>

From: Kejian Yan <yankejian@huawei.com>

This patch adds the support to add or remove the unicast entries
to the table and remove from the table.

Reported-by: Daode Huang <huangdaode@hisilicon.com>
Signed-off-by: Kejian Yan <yankejian@huawei.com>
Reviewed-by: Yisen Zhuang <yisen.zhuang@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
---
 drivers/net/ethernet/hisilicon/hns/hnae.h          |    8 ++++
 drivers/net/ethernet/hisilicon/hns/hns_ae_adapt.c  |   24 ++++++++++
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.c  |   40 ++++++++++++++++
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.h  |    4 ++
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.c |   49 ++++++++++++++++++++
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.h |    4 ++
 drivers/net/ethernet/hisilicon/hns/hns_enet.c      |   26 +++++++++++
 7 files changed, 155 insertions(+)

diff --git a/drivers/net/ethernet/hisilicon/hns/hnae.h b/drivers/net/ethernet/hisilicon/hns/hnae.h
index 55c0334..09602f1 100644
--- a/drivers/net/ethernet/hisilicon/hns/hnae.h
+++ b/drivers/net/ethernet/hisilicon/hns/hnae.h
@@ -430,6 +430,10 @@ enum hnae_media_type {
  *   clear mcast tcam table
  * set_mc_addr()
  *   set multicast mode
+ * add_uc_addr()
+ *   add ucast address
+ * rm_uc_addr()
+ *   remove ucast address
  * set_mtu()
  *   set mtu
  * update_stats()
@@ -490,6 +494,10 @@ struct hnae_ae_ops {
 	void (*set_promisc_mode)(struct hnae_handle *handle, u32 en);
 	int (*get_mac_addr)(struct hnae_handle *handle, void **p);
 	int (*set_mac_addr)(struct hnae_handle *handle, void *p);
+	int (*add_uc_addr)(struct hnae_handle *handle,
+			   const unsigned char *addr);
+	int (*rm_uc_addr)(struct hnae_handle *handle,
+			  const unsigned char *addr);
 	int (*clr_mc_addr)(struct hnae_handle *handle);
 	int (*set_mc_addr)(struct hnae_handle *handle, void *addr);
 	int (*set_mtu)(struct hnae_handle *handle, int new_mtu);
diff --git a/drivers/net/ethernet/hisilicon/hns/hns_ae_adapt.c b/drivers/net/ethernet/hisilicon/hns/hns_ae_adapt.c
index 366b25b..0a9cdf0 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_ae_adapt.c
+++ b/drivers/net/ethernet/hisilicon/hns/hns_ae_adapt.c
@@ -199,6 +199,28 @@ static int hns_ae_set_mac_address(struct hnae_handle *handle, void *p)
 	return 0;
 }
 
+static int hns_ae_add_uc_address(struct hnae_handle *handle,
+				 const unsigned char *addr)
+{
+	struct hns_mac_cb *mac_cb = hns_get_mac_cb(handle);
+
+	if (mac_cb->mac_type != HNAE_PORT_SERVICE)
+		return -ENOSPC;
+
+	return hns_mac_add_uc_addr(mac_cb, handle->vf_id, addr);
+}
+
+static int hns_ae_rm_uc_address(struct hnae_handle *handle,
+				const unsigned char *addr)
+{
+	struct hns_mac_cb *mac_cb = hns_get_mac_cb(handle);
+
+	if (mac_cb->mac_type != HNAE_PORT_SERVICE)
+		return -ENOSPC;
+
+	return hns_mac_rm_uc_addr(mac_cb, handle->vf_id, addr);
+}
+
 static int hns_ae_set_multicast_one(struct hnae_handle *handle, void *addr)
 {
 	int ret;
@@ -830,6 +852,8 @@ static int hns_ae_set_rss(struct hnae_handle *handle, const u32 *indir,
 	.get_coalesce_range = hns_ae_get_coalesce_range,
 	.set_promisc_mode = hns_ae_set_promisc_mode,
 	.set_mac_addr = hns_ae_set_mac_address,
+	.add_uc_addr = hns_ae_add_uc_address,
+	.rm_uc_addr = hns_ae_rm_uc_address,
 	.set_mc_addr = hns_ae_set_multicast_one,
 	.clr_mc_addr = hns_ae_clr_multicast,
 	.set_mtu = hns_ae_set_mtu,
diff --git a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.c b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.c
index 266417d..3239d27 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.c
+++ b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.c
@@ -263,6 +263,46 @@ int hns_mac_change_vf_addr(struct hns_mac_cb *mac_cb,
 	return 0;
 }
 
+int hns_mac_add_uc_addr(struct hns_mac_cb *mac_cb, u8 vf_id,
+			const unsigned char *addr)
+{
+	struct dsaf_device *dsaf_dev = mac_cb->dsaf_dev;
+	struct dsaf_drv_mac_single_dest_entry mac_entry;
+	int ret;
+
+	if (HNS_DSAF_IS_DEBUG(dsaf_dev))
+		return -ENOSPC;
+
+	memset(&mac_entry, 0, sizeof(mac_entry));
+	memcpy(mac_entry.addr, addr, sizeof(mac_entry.addr));
+	mac_entry.in_port_num = mac_cb->mac_id;
+	ret = hns_mac_get_inner_port_num(mac_cb, vf_id, &mac_entry.port_num);
+	if (ret)
+		return ret;
+
+	return hns_dsaf_set_mac_uc_entry(dsaf_dev, &mac_entry);
+}
+
+int hns_mac_rm_uc_addr(struct hns_mac_cb *mac_cb, u8 vf_id,
+		       const unsigned char *addr)
+{
+	struct dsaf_device *dsaf_dev = mac_cb->dsaf_dev;
+	struct dsaf_drv_mac_single_dest_entry mac_entry;
+	int ret;
+
+	if (HNS_DSAF_IS_DEBUG(dsaf_dev))
+		return -ENOSPC;
+
+	memset(&mac_entry, 0, sizeof(mac_entry));
+	memcpy(mac_entry.addr, addr, sizeof(mac_entry.addr));
+	mac_entry.in_port_num = mac_cb->mac_id;
+	ret = hns_mac_get_inner_port_num(mac_cb, vf_id, &mac_entry.port_num);
+	if (ret)
+		return ret;
+
+	return hns_dsaf_rm_mac_addr(dsaf_dev, &mac_entry);
+}
+
 int hns_mac_set_multi(struct hns_mac_cb *mac_cb,
 		      u32 port_num, char *addr, bool enable)
 {
diff --git a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.h b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.h
index d896452..2bb3d1e 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.h
+++ b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.h
@@ -461,6 +461,10 @@ int hns_cpld_led_set_id(struct hns_mac_cb *mac_cb,
 void hns_mac_set_promisc(struct hns_mac_cb *mac_cb, u8 en);
 int hns_mac_get_inner_port_num(struct hns_mac_cb *mac_cb,
 			       u8 vmid, u8 *port_num);
+int hns_mac_add_uc_addr(struct hns_mac_cb *mac_cb, u8 vf_id,
+			const unsigned char *addr);
+int hns_mac_rm_uc_addr(struct hns_mac_cb *mac_cb, u8 vf_id,
+		       const unsigned char *addr);
 int hns_mac_clr_multicast(struct hns_mac_cb *mac_cb, int vfn);
 
 #endif /* _HNS_DSAF_MAC_H */
diff --git a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.c b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.c
index ebecbed..90dbda7 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.c
+++ b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.c
@@ -1598,6 +1598,55 @@ int hns_dsaf_set_mac_uc_entry(
 	return 0;
 }
 
+int hns_dsaf_rm_mac_addr(
+	struct dsaf_device *dsaf_dev,
+	struct dsaf_drv_mac_single_dest_entry *mac_entry)
+{
+	u16 entry_index = DSAF_INVALID_ENTRY_IDX;
+	struct dsaf_tbl_tcam_ucast_cfg mac_data;
+	struct dsaf_drv_tbl_tcam_key mac_key;
+
+	/* mac addr check */
+	if (!is_valid_ether_addr(mac_entry->addr)) {
+		dev_err(dsaf_dev->dev, "rm_uc_addr %s Mac %pM err!\n",
+			dsaf_dev->ae_dev.name, mac_entry->addr);
+		return -EINVAL;
+	}
+
+	/* config key */
+	hns_dsaf_set_mac_key(dsaf_dev, &mac_key, mac_entry->in_vlan_id,
+			     mac_entry->in_port_num, mac_entry->addr);
+
+	entry_index = hns_dsaf_find_soft_mac_entry(dsaf_dev, &mac_key);
+	if (entry_index == DSAF_INVALID_ENTRY_IDX) {
+		/* can not find the tcam entry, return 0 */
+		dev_info(dsaf_dev->dev,
+			 "rm_uc_addr no tcam, %s Mac key(%#x:%#x)\n",
+			 dsaf_dev->ae_dev.name,
+			 mac_key.high.val, mac_key.low.val);
+		return 0;
+	}
+
+	dev_dbg(dsaf_dev->dev,
+		"rm_uc_addr, %s Mac key(%#x:%#x) entry_index%d\n",
+		dsaf_dev->ae_dev.name, mac_key.high.val,
+		mac_key.low.val, entry_index);
+
+	hns_dsaf_tcam_uc_get(
+			dsaf_dev, entry_index,
+			(struct dsaf_tbl_tcam_data *)&mac_key,
+			&mac_data);
+
+	/* unicast entry not used locally should not clear */
+	if (mac_entry->port_num != mac_data.tbl_ucast_out_port)
+		return -EFAULT;
+
+	return hns_dsaf_del_mac_entry(dsaf_dev,
+				      mac_entry->in_vlan_id,
+				      mac_entry->in_port_num,
+				      mac_entry->addr);
+}
+
 /**
  * hns_dsaf_set_mac_mc_entry - set mac mc-entry
  * @dsaf_dev: dsa fabric device struct pointer
diff --git a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.h b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.h
index 901037c..cef6bf4 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.h
+++ b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.h
@@ -468,6 +468,10 @@ void hns_dsaf_get_rx_mac_pause_en(struct dsaf_device *dsaf_dev, int mac_id,
 				  u32 *en);
 int hns_dsaf_set_rx_mac_pause_en(struct dsaf_device *dsaf_dev, int mac_id,
 				 u32 en);
+int hns_dsaf_rm_mac_addr(
+	struct dsaf_device *dsaf_dev,
+	struct dsaf_drv_mac_single_dest_entry *mac_entry);
+
 int hns_dsaf_clr_mac_mc_port(struct dsaf_device *dsaf_dev,
 			     u8 mac_id, u8 port_num);
 
diff --git a/drivers/net/ethernet/hisilicon/hns/hns_enet.c b/drivers/net/ethernet/hisilicon/hns/hns_enet.c
index d22a39e..776d81e 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_enet.c
+++ b/drivers/net/ethernet/hisilicon/hns/hns_enet.c
@@ -1493,6 +1493,29 @@ static netdev_features_t hns_nic_fix_features(
 	return features;
 }
 
+static int hns_nic_uc_sync(struct net_device *netdev, const unsigned char *addr)
+{
+	struct hns_nic_priv *priv = netdev_priv(netdev);
+	struct hnae_handle *h = priv->ae_handle;
+
+	if (h->dev->ops->add_uc_addr)
+		return h->dev->ops->add_uc_addr(h, addr);
+
+	return 0;
+}
+
+static int hns_nic_uc_unsync(struct net_device *netdev,
+			     const unsigned char *addr)
+{
+	struct hns_nic_priv *priv = netdev_priv(netdev);
+	struct hnae_handle *h = priv->ae_handle;
+
+	if (h->dev->ops->rm_uc_addr)
+		return h->dev->ops->rm_uc_addr(h, addr);
+
+	return 0;
+}
+
 /**
  * nic_set_multicast_list - set mutl mac address
  * @netdev: net device
@@ -1535,6 +1558,9 @@ void hns_nic_set_rx_mode(struct net_device *ndev)
 	}
 
 	hns_set_multicast_list(ndev);
+
+	if (__dev_uc_sync(ndev, hns_nic_uc_sync, hns_nic_uc_unsync))
+		netdev_err(ndev, "sync uc address fail\n");
 }
 
 struct rtnl_link_stats64 *hns_nic_get_stats64(struct net_device *ndev,
-- 
1.7.9.5

^ permalink raw reply related

* nf_nat_tftp broken in 4.8?
From: Florian Fainelli @ 2016-11-09 20:40 UTC (permalink / raw)
  To: netdev, netfilter-devel, fgao; +Cc: pablo

[-- Attachment #1: Type: text/plain, Size: 684 bytes --]

Hi,

After updating my workstation from 4.4 to 4.8, which also serves as a
NAT gateway for local machines, I noticed that TFTP across this NAT box
broke:

- TFTP read request to the server, source port 3534
- server replies with a data packet to dst port 3534
- gateway sends an ICMP destination unreachable with port unreachable
along with the UDP data packet

I am currently bisecting this, but it is taking a lot longer to build
than I thought. Attached is the non-working case PCAP file for you to
look at.

There has not been a lot happening for net/netfilter/*tftp* between v4.4
and 4.8, so can you think about something in particular that could cause
that?

Thanks!
--
Florian

[-- Attachment #2: tftp-4.8pcapng.pcapng.gz --]
[-- Type: application/gzip, Size: 1269 bytes --]

^ permalink raw reply

* Re: [PATCH 1/2] wcn36xx: Correct Kconfig dependency
From: Bjorn Andersson @ 2016-11-09 20:45 UTC (permalink / raw)
  To: Marcel Holtmann
  Cc: Andy Gross, Eugene Krasnikov, Kalle Valo, wcn36xx, linux-wireless,
	netdev, linux-kernel
In-Reply-To: <20975667-9D09-430C-A70B-D0C38A320C86@holtmann.org>

On Wed 09 Nov 12:32 PST 2016, Marcel Holtmann wrote:

> Hi Bjorn,
> 
> > In the case SMD or WCNSS_CTRL are compiled as modules, wcn36xx must be
> > compiled as module as well, so we need to mention this dependency.
> > But we still want allow the driver to be compiled in case either of
> > those are =n, for compile testing reasons.
> > 
> > Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
> > ---
> > drivers/net/wireless/ath/wcn36xx/Kconfig | 2 ++
> > 1 file changed, 2 insertions(+)
> > 
> > diff --git a/drivers/net/wireless/ath/wcn36xx/Kconfig b/drivers/net/wireless/ath/wcn36xx/Kconfig
> > index 591ebaea8265..4b83e87f0b94 100644
> > --- a/drivers/net/wireless/ath/wcn36xx/Kconfig
> > +++ b/drivers/net/wireless/ath/wcn36xx/Kconfig
> > @@ -1,6 +1,8 @@
> > config WCN36XX
> > 	tristate "Qualcomm Atheros WCN3660/3680 support"
> > 	depends on MAC80211 && HAS_DMA
> > +	depends on QCOM_WCNSS_CTRL || QCOM_WCNSS_CTRL=n
> > +	depends on QCOM_SMD || QCOM_SMD=n
> 
> I thought we wanted to move towards being able to compile the WiFi and
> Bluetooth support without requiring QCOM_SMD.
> 

That's right.

The problem I spotted was that without the dependency on QCOM_SMD we're
allowed to compile these modules as builtin, but as
IS_ENABLED(CONFIG_QCOM_SMD) is true we don't get the stubs and as such
the kernel doesn't link.

So we must enforce these modules to be compiled as modules if either
QCOM_SMD or QCOM_WCNSS_CTRL are compiled as modules, and we do that with
the first part of the depends lines.


The second part of the depends lines states that the alternative
dependency is that the modules are not enabled at all, in which case we
will fall back to using the function stubs.

So this is unfortunately how we have to describe it in Kconfig to work
with the function stubs, sorry for not getting this right in the first
place.

Regards,
Bjorn

^ permalink raw reply

* Re: [PATCH net-next] tcp: remove unaligned accesses from tcp_get_info()
From: Yuchung Cheng @ 2016-11-09 20:46 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David Miller, Eric Dumazet, netdev, Neal Cardwell,
	Soheil Hassas Yeganeh
In-Reply-To: <1478719462.16809.15.camel@edumazet-glaptop3.roam.corp.google.com>

On Wed, Nov 9, 2016 at 11:24 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
> From: Eric Dumazet <edumazet@google.com>
>
> After commit 6ed46d1247a5 ("sock_diag: align nlattr properly when
> needed"), tcp_get_info() gets 64bit aligned memory, so we can avoid
> the unaligned helpers.
>
> Suggested-by: David Miller <davem@davemloft.net>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>


> ---
>  net/ipv4/tcp.c |   11 +++++------
>  1 file changed, 5 insertions(+), 6 deletions(-)
>
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index a7d54cbcdabbbc0838c00e05074ef15560665f2d..f8f924ca662d5c84e438a6eacbe8f734ff0674ae 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -279,7 +279,6 @@
>
>  #include <asm/uaccess.h>
>  #include <asm/ioctls.h>
> -#include <asm/unaligned.h>
>  #include <net/busy_poll.h>
>
>  int sysctl_tcp_min_tso_segs __read_mostly = 2;
> @@ -2722,11 +2721,11 @@ void tcp_get_info(struct sock *sk, struct tcp_info *info)
>         /* Report meaningful fields for all TCP states, including listeners */
>         rate = READ_ONCE(sk->sk_pacing_rate);
>         rate64 = rate != ~0U ? rate : ~0ULL;
> -       put_unaligned(rate64, &info->tcpi_pacing_rate);
> +       info->tcpi_pacing_rate = rate64;
>
>         rate = READ_ONCE(sk->sk_max_pacing_rate);
>         rate64 = rate != ~0U ? rate : ~0ULL;
> -       put_unaligned(rate64, &info->tcpi_max_pacing_rate);
> +       info->tcpi_max_pacing_rate = rate64;
>
>         info->tcpi_reordering = tp->reordering;
>         info->tcpi_snd_cwnd = tp->snd_cwnd;
> @@ -2792,8 +2791,8 @@ void tcp_get_info(struct sock *sk, struct tcp_info *info)
>
>         slow = lock_sock_fast(sk);
>
> -       put_unaligned(tp->bytes_acked, &info->tcpi_bytes_acked);
> -       put_unaligned(tp->bytes_received, &info->tcpi_bytes_received);
> +       info->tcpi_bytes_acked = tp->bytes_acked;
> +       info->tcpi_bytes_received = tp->bytes_received;
>         info->tcpi_notsent_bytes = max_t(int, 0, tp->write_seq - tp->snd_nxt);
>
>         unlock_sock_fast(sk, slow);
> @@ -2811,7 +2810,7 @@ void tcp_get_info(struct sock *sk, struct tcp_info *info)
>         if (rate && intv) {
>                 rate64 = (u64)rate * tp->mss_cache * USEC_PER_SEC;
>                 do_div(rate64, intv);
> -               put_unaligned(rate64, &info->tcpi_delivery_rate);
> +               info->tcpi_delivery_rate = rate64;
>         }
>  }
>  EXPORT_SYMBOL_GPL(tcp_get_info);
>
>

^ permalink raw reply

* [PATCH net 0/2] qed: Fix RoCE infrastructure
From: Yuval Mintz @ 2016-11-09 20:48 UTC (permalink / raw)
  To: davem-fT/PcQaiUtIeIZ0/mPfg9Q, netdev-u79uwXL29TY76Z2rM5mHXA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	dledford-H+wXaHxf7aLQT0dZR+AlfA,
	Ram.Amrani-YGCgFSpz5w/QT0dZR+AlfA, Yuval Mintz

This series fixes 2 basic issues with RoCE support,
one handles a missing configuration in the initial infrastructure
support while the other is a regression introduced by one of the
initial fix submissions.

Dave,

I've built this series against 'net'  [specifically on top of
cdb26d3387f0 ("net: bgmac: fix reversed checks for clock control flag")],
but I'm not entirely convinced whether that's the right place for them -
On one-hand, they're RoCE-specific fixes, but they're also infrastructure
fixes that are wholly contained inside qed.
Would like to hear your [and Doug's] preference on whether we should
target such patches to net or linux-rdma.

Thanks,
Yuval

Ram Amrani (2):
  qed: configure ll2 RoCE v1/v2 flavor correctly
  qed: correct rdma params configuration

 drivers/net/ethernet/qlogic/qed/qed_hsi.h  |  3 ---
 drivers/net/ethernet/qlogic/qed/qed_ll2.c  |  1 +
 drivers/net/ethernet/qlogic/qed/qed_main.c | 17 ++++++++---------
 3 files changed, 9 insertions(+), 12 deletions(-)

-- 
1.9.3

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* [PATCH net 1/2] qed: configure ll2 RoCE v1/v2 flavor correctly
From: Yuval Mintz @ 2016-11-09 20:48 UTC (permalink / raw)
  To: davem-fT/PcQaiUtIeIZ0/mPfg9Q, netdev-u79uwXL29TY76Z2rM5mHXA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	dledford-H+wXaHxf7aLQT0dZR+AlfA,
	Ram.Amrani-YGCgFSpz5w/QT0dZR+AlfA, Yuval Mintz
In-Reply-To: <1478724524-16940-1-git-send-email-Yuval.Mintz-YGCgFSpz5w/QT0dZR+AlfA@public.gmane.org>

From: Ram Amrani <Ram.Amrani-YGCgFSpz5w/QT0dZR+AlfA@public.gmane.org>

Currently RoCE v2 won't operate with RDMA CM due to missing setting of
the roce-flavour in the ll2 configuration.
This patch properly sets the flavour, and deletes incorrect HSI
that doesn't [yet] exist.

Fixes: abd49676c707 ("qed: Add RoCE ll2 & GSI support")
Signed-off-by: Ram Amrani <Ram.Amrani-YGCgFSpz5w/QT0dZR+AlfA@public.gmane.org>
Signed-off-by: Yuval Mintz <Yuval.Mintz-YGCgFSpz5w/QT0dZR+AlfA@public.gmane.org>
---
 drivers/net/ethernet/qlogic/qed/qed_hsi.h | 3 ---
 drivers/net/ethernet/qlogic/qed/qed_ll2.c | 1 +
 2 files changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/qlogic/qed/qed_hsi.h b/drivers/net/ethernet/qlogic/qed/qed_hsi.h
index 72eee29..2777d5b 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_hsi.h
+++ b/drivers/net/ethernet/qlogic/qed/qed_hsi.h
@@ -727,9 +727,6 @@ struct core_tx_bd_flags {
 #define CORE_TX_BD_FLAGS_L4_PROTOCOL_SHIFT	6
 #define CORE_TX_BD_FLAGS_L4_PSEUDO_CSUM_MODE_MASK	0x1
 #define CORE_TX_BD_FLAGS_L4_PSEUDO_CSUM_MODE_SHIFT 7
-#define CORE_TX_BD_FLAGS_ROCE_FLAV_MASK		0x1
-#define CORE_TX_BD_FLAGS_ROCE_FLAV_SHIFT	12
-
 };
 
 struct core_tx_bd {
diff --git a/drivers/net/ethernet/qlogic/qed/qed_ll2.c b/drivers/net/ethernet/qlogic/qed/qed_ll2.c
index 63e1a1b..f95385c 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_ll2.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_ll2.c
@@ -1119,6 +1119,7 @@ static void qed_ll2_prepare_tx_packet_set_bd(struct qed_hwfn *p_hwfn,
 	start_bd->bd_flags.as_bitfield |= CORE_TX_BD_FLAGS_START_BD_MASK <<
 	    CORE_TX_BD_FLAGS_START_BD_SHIFT;
 	SET_FIELD(start_bd->bitfield0, CORE_TX_BD_NBDS, num_of_bds);
+	SET_FIELD(start_bd->bitfield0, CORE_TX_BD_ROCE_FLAV, type);
 	DMA_REGPAIR_LE(start_bd->addr, first_frag);
 	start_bd->nbytes = cpu_to_le16(first_frag_len);
 
-- 
1.9.3

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [PATCH net 2/2] qed: Correct rdma params configuration
From: Yuval Mintz @ 2016-11-09 20:48 UTC (permalink / raw)
  To: davem-fT/PcQaiUtIeIZ0/mPfg9Q, netdev-u79uwXL29TY76Z2rM5mHXA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	dledford-H+wXaHxf7aLQT0dZR+AlfA,
	Ram.Amrani-YGCgFSpz5w/QT0dZR+AlfA, Yuval Mintz
In-Reply-To: <1478724524-16940-1-git-send-email-Yuval.Mintz-YGCgFSpz5w/QT0dZR+AlfA@public.gmane.org>

From: Ram Amrani <Ram.Amrani-YGCgFSpz5w/QT0dZR+AlfA@public.gmane.org>

Previous fix has broken RoCE support as the rdma_pf_params are now
being set into the parameters only after the params are alrady assigned
into the hw-function.

Fixes: 0189efb8f4f8 ("qed*: Fix Kconfig dependencies with INFINIBAND_QEDR")
Signed-off-by: Ram Amrani <Ram.Amrani-YGCgFSpz5w/QT0dZR+AlfA@public.gmane.org>
Signed-off-by: Yuval Mintz <Yuval.Mintz-YGCgFSpz5w/QT0dZR+AlfA@public.gmane.org>
---
 drivers/net/ethernet/qlogic/qed/qed_main.c | 17 ++++++++---------
 1 file changed, 8 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/qlogic/qed/qed_main.c b/drivers/net/ethernet/qlogic/qed/qed_main.c
index c418360..333c744 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_main.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_main.c
@@ -839,20 +839,19 @@ static void qed_update_pf_params(struct qed_dev *cdev,
 {
 	int i;
 
+	if (IS_ENABLED(CONFIG_QED_RDMA)) {
+		params->rdma_pf_params.num_qps = QED_ROCE_QPS;
+		params->rdma_pf_params.min_dpis = QED_ROCE_DPIS;
+		/* divide by 3 the MRs to avoid MF ILT overflow */
+		params->rdma_pf_params.num_mrs = RDMA_MAX_TIDS;
+		params->rdma_pf_params.gl_pi = QED_ROCE_PROTOCOL_INDEX;
+	}
+
 	for (i = 0; i < cdev->num_hwfns; i++) {
 		struct qed_hwfn *p_hwfn = &cdev->hwfns[i];
 
 		p_hwfn->pf_params = *params;
 	}
-
-	if (!IS_ENABLED(CONFIG_QED_RDMA))
-		return;
-
-	params->rdma_pf_params.num_qps = QED_ROCE_QPS;
-	params->rdma_pf_params.min_dpis = QED_ROCE_DPIS;
-	/* divide by 3 the MRs to avoid MF ILT overflow */
-	params->rdma_pf_params.num_mrs = RDMA_MAX_TIDS;
-	params->rdma_pf_params.gl_pi = QED_ROCE_PROTOCOL_INDEX;
 }
 
 static int qed_slowpath_start(struct qed_dev *cdev,
-- 
1.9.3

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [PATCH net-next] bpf, mlx4: fix prog refcount in mlx4_en_try_alloc_resources error path
From: Daniel Borkmann @ 2016-11-09 21:02 UTC (permalink / raw)
  To: davem
  Cc: alexei.starovoitov, bblanco, tariqt, zhiyisun, netdev,
	Daniel Borkmann

Commit 67f8b1dcb9ee ("net/mlx4_en: Refactor the XDP forwarding rings
scheme") added a bug in that the prog's reference count is not dropped
in the error path when mlx4_en_try_alloc_resources() is failing from
mlx4_xdp_set().

We previously took bpf_prog_add(prog, priv->rx_ring_num - 1), that we
need to release again. Earlier in the call path, dev_change_xdp_fd()
itself holds a reference to the prog as well (hence the '- 1' in the
bpf_prog_add()), so a simple atomic_sub() is safe to use here. When
an error is propagated, then bpf_prog_put() is called eventually from
dev_change_xdp_fd()

Fixes: 67f8b1dcb9ee ("net/mlx4_en: Refactor the XDP forwarding rings scheme")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
---
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c |  5 ++++-
 include/linux/bpf.h                            |  5 +++++
 kernel/bpf/syscall.c                           | 11 +++++++++++
 3 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
index 0f6225c..9bf7320 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
@@ -2747,8 +2747,11 @@ static int mlx4_xdp_set(struct net_device *dev, struct bpf_prog *prog)
 	}
 
 	err = mlx4_en_try_alloc_resources(priv, tmp, &new_prof);
-	if (err)
+	if (err) {
+		if (prog)
+			bpf_prog_sub(prog, priv->rx_ring_num - 1);
 		goto unlock_out;
+	}
 
 	if (priv->port_up) {
 		port_up = 1;
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index edcd96d..01c1487 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -234,6 +234,7 @@ u64 bpf_event_output(struct bpf_map *map, u64 flags, void *meta, u64 meta_size,
 struct bpf_prog *bpf_prog_get(u32 ufd);
 struct bpf_prog *bpf_prog_get_type(u32 ufd, enum bpf_prog_type type);
 struct bpf_prog *bpf_prog_add(struct bpf_prog *prog, int i);
+void bpf_prog_sub(struct bpf_prog *prog, int i);
 struct bpf_prog *bpf_prog_inc(struct bpf_prog *prog);
 void bpf_prog_put(struct bpf_prog *prog);
 
@@ -303,6 +304,10 @@ static inline struct bpf_prog *bpf_prog_add(struct bpf_prog *prog, int i)
 	return ERR_PTR(-EOPNOTSUPP);
 }
 
+static inline void bpf_prog_sub(struct bpf_prog *prog, int i)
+{
+}
+
 static inline void bpf_prog_put(struct bpf_prog *prog)
 {
 }
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 228f962..23eb205 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -680,6 +680,17 @@ struct bpf_prog *bpf_prog_add(struct bpf_prog *prog, int i)
 }
 EXPORT_SYMBOL_GPL(bpf_prog_add);
 
+void bpf_prog_sub(struct bpf_prog *prog, int i)
+{
+	/* Only to be used for undoing previous bpf_prog_add() in some
+	 * error path. We still know that another entity in our call
+	 * path holds a reference to the program, thus atomic_sub() can
+	 * be safely used in such cases!
+	 */
+	WARN_ON(atomic_sub_return(i, &prog->aux->refcnt) == 0);
+}
+EXPORT_SYMBOL_GPL(bpf_prog_sub);
+
 struct bpf_prog *bpf_prog_inc(struct bpf_prog *prog)
 {
 	return bpf_prog_add(prog, 1);
-- 
1.9.3

^ permalink raw reply related

* Re: [PATCH] net: ethernet: ti: davinci_cpdma: fix fixed prio cpdma ctlr configuration
From: Grygorii Strashko @ 2016-11-09 21:09 UTC (permalink / raw)
  To: Ivan Khoronzhuk, mugunthanvnm, netdev; +Cc: linux-omap, linux-kernel
In-Reply-To: <1478610656-24634-1-git-send-email-ivan.khoronzhuk@linaro.org>



On 11/08/2016 07:10 AM, Ivan Khoronzhuk wrote:
> The dma ctlr is reseted to 0 while cpdma start, thus cpdma ctlr

I assume this is because cpdma_ctlr_start() does soft reset. Is it correct?

> cannot be configured after cpdma is stopped. So, restore content
> of cpdma ctlr while off/on procedure.
> 
> Based on net-next/master

^ remove it

> 
> Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
> ---
>  drivers/net/ethernet/ti/cpsw.c          |   6 +-
>  drivers/net/ethernet/ti/davinci_cpdma.c | 103 +++++++++++++++++---------------
>  drivers/net/ethernet/ti/davinci_cpdma.h |   2 +
>  3 files changed, 58 insertions(+), 53 deletions(-)
> 
> diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
> index b1ddf89..4d04b8e 100644
> --- a/drivers/net/ethernet/ti/cpsw.c
> +++ b/drivers/net/ethernet/ti/cpsw.c
> @@ -1376,10 +1376,6 @@ static int cpsw_ndo_open(struct net_device *ndev)
>  				  ALE_ALL_PORTS, ALE_ALL_PORTS, 0, 0);
>  
>  	if (!cpsw_common_res_usage_state(cpsw)) {
> -		/* setup tx dma to fixed prio and zero offset */
> -		cpdma_control_set(cpsw->dma, CPDMA_TX_PRIO_FIXED, 1);
> -		cpdma_control_set(cpsw->dma, CPDMA_RX_BUFFER_OFFSET, 0);
> -
>  		/* disable priority elevation */
>  		__raw_writel(0, &cpsw->regs->ptype);
>  
> @@ -2710,6 +2706,8 @@ static int cpsw_probe(struct platform_device *pdev)
>  	dma_params.desc_align		= 16;
>  	dma_params.has_ext_regs		= true;
>  	dma_params.desc_hw_addr         = dma_params.desc_mem_phys;
> +	dma_params.rxbuf_offset		= 0;
> +	dma_params.fixed_prio		= 1;

Do we really need this new parameters? Do you have plans to use other values?


-- 
regards,
-grygorii

^ permalink raw reply

* Re: 484611357c19 introduces arbitrary kernel write bug (root-only)
From: Josef Bacik @ 2016-11-09 21:21 UTC (permalink / raw)
  To: Jann Horn, Daniel Borkmann, Alexei Starovoitov, David S. Miller
  Cc: security, netdev
In-Reply-To: <CAG48ez0mP4xwv6vnKRJ+rcdXYyA1wGnCWsbkKUgWGSBbMtgFMw@mail.gmail.com>

On 11/08/2016 07:23 PM, Jann Horn wrote:
> In 484611357c19 (not in any stable kernel yet), functionality is
> introduced that allows root (and afaics nobody else, since nobody else
> is allowed to perform pointer arithmetic) to basically write to (and
> read from) arbitrary kernel memory. There are multiple bugs in the
> validation logic:
>
>  - A bitwise AND of values in the ranges [a,b] and [c,d] is assumed to
> always result in a value
>    >= a&b. However, for the combination of ranges [1,1] and [1,2],
> this calculates a minimum of 1
>    while actually, 1&2 is zero. This is the bug that my crasher
> (below) triggers.
>  - a%b is assumed to always be smaller than b-1. However, for b==0,
> this will calculate an upper
>    limit of -1 while the values will actually always be zero.
>  - I'm not sure about this, but I think that, when only one end of the
> range is bounded, the logic will
>    incorrectly also treat the other end as a bounded, and because of
> the usage of bound
>    placeholders that are smaller than the actual maximum values, this
> could be used to perform
>    out-of-bounds accesses.
>
> The fun part here is that, as soon as the validation is just
> off-by-one, arithmetic transformations can be used to turn that into
> out-of-bounds accesses at arbitrary offsets. The crasher turns the
> off-by-one into a memory write at offset 0x10000000.
>

Can you give this a whirl?  It addresses your testcase and the other issues 
you've brought up.  Thanks

 From e47a1de98af2c1bcebd4224f546e3be1fd340b6a Mon Sep 17 00:00:00 2001
From: Josef Bacik <jbacik@fb.com>
Date: Wed, 9 Nov 2016 16:09:52 -0500
Subject: [PATCH] bpf: fix range arithmetic for bpf map access

I made some invalid assumptions with BPF_AND and BPF_MOD that could result in
invalid accesses to bpf map entries.  Fix this up by doing a few things

1) Kill BPF_MOD support.  This doesn't actually get used by the compiler in real
life and just adds extra complexity.

2) Fix the logic for BPF_AND.  If the min value is negative then that is the new
minimum, otherwise it is unconditionally 0.

3) Don't do operations on the ranges if they are set to the limits, as they are
by definition undefined, and allowing arithmetic operations on those values
could make them appear valid when they really aren't.

This fixes the testcase provided by Jann as well as a few other theoretical
problems.

Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Josef Bacik <jbacik@fb.com>
---
  include/linux/bpf_verifier.h |  3 +-
  kernel/bpf/verifier.c        | 65 ++++++++++++++++++++++++++++----------------
  2 files changed, 44 insertions(+), 24 deletions(-)

diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index ac5b393..15ceb7f 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -22,7 +22,8 @@ struct bpf_reg_state {
  	 * Used to determine if any memory access using this register will
  	 * result in a bad access.
  	 */
-	u64 min_value, max_value;
+	s64 min_value;
+	u64 max_value;
  	u32 id;
  	union {
  		/* valid when type == CONST_IMM | PTR_TO_STACK | UNKNOWN_VALUE */
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 9002575..840533a 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -234,8 +234,8 @@ static void print_verifier_state(struct bpf_verifier_state 
*state)
  				reg->map_ptr->value_size,
  				reg->id);
  		if (reg->min_value != BPF_REGISTER_MIN_RANGE)
-			verbose(",min_value=%llu",
-				(unsigned long long)reg->min_value);
+			verbose(",min_value=%lld",
+				(long long)reg->min_value);
  		if (reg->max_value != BPF_REGISTER_MAX_RANGE)
  			verbose(",max_value=%llu",
  				(unsigned long long)reg->max_value);
@@ -1490,7 +1490,7 @@ static void check_reg_overflow(struct bpf_reg_state *reg)
  {
  	if (reg->max_value > BPF_REGISTER_MAX_RANGE)
  		reg->max_value = BPF_REGISTER_MAX_RANGE;
-	if ((s64)reg->min_value < BPF_REGISTER_MIN_RANGE)
+	if (reg->min_value < BPF_REGISTER_MIN_RANGE)
  		reg->min_value = BPF_REGISTER_MIN_RANGE;
  }

@@ -1498,7 +1498,8 @@ static void adjust_reg_min_max_vals(struct 
bpf_verifier_env *env,
  				    struct bpf_insn *insn)
  {
  	struct bpf_reg_state *regs = env->cur_state.regs, *dst_reg;
-	u64 min_val = BPF_REGISTER_MIN_RANGE, max_val = BPF_REGISTER_MAX_RANGE;
+	s64 min_val = BPF_REGISTER_MIN_RANGE;
+	u64 max_val = BPF_REGISTER_MAX_RANGE;
  	bool min_set = false, max_set = false;
  	u8 opcode = BPF_OP(insn->code);

@@ -1534,22 +1535,45 @@ static void adjust_reg_min_max_vals(struct 
bpf_verifier_env *env,
  		return;
  	}

+	/* If one of our values was at the end of our ranges then we can't just
+	 * do our normal operations to the register, we need to set the values
+	 * to the min/max since they are undefined.
+	 */
+	if (min_val == BPF_REGISTER_MIN_RANGE)
+		dst_reg->min_value = BPF_REGISTER_MIN_RANGE;
+	if (max_val == BPF_REGISTER_MAX_RANGE)
+		dst_reg->max_value = BPF_REGISTER_MAX_RANGE;
+
  	switch (opcode) {
  	case BPF_ADD:
-		dst_reg->min_value += min_val;
-		dst_reg->max_value += max_val;
+		if (dst_reg->min_value != BPF_REGISTER_MIN_RANGE)
+			dst_reg->min_value += min_val;
+		if (dst_reg->max_value != BPF_REGISTER_MAX_RANGE)
+			dst_reg->max_value += max_val;
  		break;
  	case BPF_SUB:
-		dst_reg->min_value -= min_val;
-		dst_reg->max_value -= max_val;
+		if (dst_reg->min_value != BPF_REGISTER_MIN_RANGE)
+			dst_reg->min_value -= min_val;
+		if (dst_reg->max_value != BPF_REGISTER_MAX_RANGE)
+			dst_reg->max_value -= max_val;
  		break;
  	case BPF_MUL:
-		dst_reg->min_value *= min_val;
-		dst_reg->max_value *= max_val;
+		if (dst_reg->min_value != BPF_REGISTER_MIN_RANGE)
+			dst_reg->min_value *= min_val;
+		if (dst_reg->max_value != BPF_REGISTER_MAX_RANGE)
+			dst_reg->max_value *= max_val;
  		break;
  	case BPF_AND:
-		/* & is special since it could end up with 0 bits set. */
-		dst_reg->min_value &= min_val;
+		/* & is special since it's could be any value within our range,
+		 * including 0.  But if the thing we're AND'ing against is
+		 * negative and we're negative then that's the minimum value,
+		 * otherwise the minimum will always be 0.
+		 */
+		if (min_val < 0 && dst_reg->min_value < 0)
+			dst_reg->min_value = min_t(s64, dst_reg->min_value,
+						   min_val);
+		else
+			dst_reg->min_value = 0;
  		dst_reg->max_value = max_val;
  		break;
  	case BPF_LSH:
@@ -1559,24 +1583,19 @@ static void adjust_reg_min_max_vals(struct 
bpf_verifier_env *env,
  		 */
  		if (min_val > ilog2(BPF_REGISTER_MAX_RANGE))
  			dst_reg->min_value = BPF_REGISTER_MIN_RANGE;
-		else
+		else if (dst_reg->min_value != BPF_REGISTER_MIN_RANGE)
  			dst_reg->min_value <<= min_val;

  		if (max_val > ilog2(BPF_REGISTER_MAX_RANGE))
  			dst_reg->max_value = BPF_REGISTER_MAX_RANGE;
-		else
+		else if (dst_reg->max_value != BPF_REGISTER_MAX_RANGE)
  			dst_reg->max_value <<= max_val;
  		break;
  	case BPF_RSH:
-		dst_reg->min_value >>= min_val;
-		dst_reg->max_value >>= max_val;
-		break;
-	case BPF_MOD:
-		/* % is special since it is an unsigned modulus, so the floor
-		 * will always be 0.
-		 */
-		dst_reg->min_value = 0;
-		dst_reg->max_value = max_val - 1;
+		if (dst_reg->min_value != BPF_REGISTER_MIN_RANGE)
+			dst_reg->min_value >>= min_val;
+		if (dst_reg->max_value != BPF_REGISTER_MAX_RANGE)
+			dst_reg->max_value >>= max_val;
  		break;
  	default:
  		reset_reg_range_values(regs, insn->dst_reg);
-- 
2.5.5

^ permalink raw reply related

* Re: [swiotlb PATCH v3 0/3] Add support for DMA writable pages being writable by the network stack.
From: Konrad Rzeszutek Wilk @ 2016-11-09 21:23 UTC (permalink / raw)
  To: Alexander Duyck; +Cc: linux-mm, netdev, linux-kernel
In-Reply-To: <20161109151639.25151.24290.stgit@ahduyck-blue-test.jf.intel.com>

On Wed, Nov 09, 2016 at 10:19:57AM -0500, Alexander Duyck wrote:
> This patch series is a subset of the patches originally submitted with the
> above patch title.  Specifically all of these patches relate to the
> swiotlb.
> 
> I wasn't sure if I needed to resubmit this series or not.  I see that v2 is
> currently sitting in the for-linus-4.9 branch of the swiotlb git repo.  If
> no updates are required for the previous set then this patch set can be
> ignored since most of the changes are just cosmetic.

I already had tested v2 so if you have patches that you want to put on top
of that please do send them.

^ permalink raw reply

* Re: [swiotlb PATCH v3 0/3] Add support for DMA writable pages being writable by the network stack.
From: Alexander Duyck @ 2016-11-09 21:29 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Alexander Duyck, linux-mm, Netdev, linux-kernel@vger.kernel.org
In-Reply-To: <20161109212341.GC12670@char.us.oracle.com>

On Wed, Nov 9, 2016 at 1:23 PM, Konrad Rzeszutek Wilk
<konrad.wilk@oracle.com> wrote:
> On Wed, Nov 09, 2016 at 10:19:57AM -0500, Alexander Duyck wrote:
>> This patch series is a subset of the patches originally submitted with the
>> above patch title.  Specifically all of these patches relate to the
>> swiotlb.
>>
>> I wasn't sure if I needed to resubmit this series or not.  I see that v2 is
>> currently sitting in the for-linus-4.9 branch of the swiotlb git repo.  If
>> no updates are required for the previous set then this patch set can be
>> ignored since most of the changes are just cosmetic.
>
> I already had tested v2 so if you have patches that you want to put on top
> of that please do send them.

I will rebase and if anything looks like it needs to be urgently fixed
I'll resubmit.

Thanks.

- Alex

^ permalink raw reply

* RE: [PATCH net-next 2/3] ptp: igb: Use the high resolution frequency method.
From: Keller, Jacob E @ 2016-11-09 21:40 UTC (permalink / raw)
  To: Richard Cochran
  Cc: netdev@vger.kernel.org, tglx@linutronix.de,
	Manfred.Rudigier@omicron.at, ulrik.debie-os@e2big.org,
	stefan.sorensen@spectralink.com, davem@davemloft.net,
	Kirsher, Jeffrey T, john.stultz@linaro.org,
	intel-wired-lan@lists.osuosl.org
In-Reply-To: <20161109131146.GA6491@localhost.localdomain>

> -----Original Message-----
> From: Richard Cochran [mailto:richardcochran@gmail.com]
> Sent: Wednesday, November 09, 2016 5:12 AM
> To: Keller, Jacob E <jacob.e.keller@intel.com>
> Cc: netdev@vger.kernel.org; tglx@linutronix.de; Manfred.Rudigier@omicron.at;
> ulrik.debie-os@e2big.org; stefan.sorensen@spectralink.com;
> davem@davemloft.net; Kirsher, Jeffrey T <jeffrey.t.kirsher@intel.com>;
> john.stultz@linaro.org; intel-wired-lan@lists.osuosl.org
> Subject: Re: [PATCH net-next 2/3] ptp: igb: Use the high resolution frequency
> method.
> 
> On Tue, Nov 08, 2016 at 10:02:22PM +0000, Keller, Jacob E wrote:
> > On Tue, 2016-11-08 at 22:49 +0100, Richard Cochran wrote:
> > > -	rate = ppb;
> > > -	rate <<= 26;
> > > -	rate = div_u64(rate, 1953125);
> > > +	rate = scaled_ppm;
> > > +	rate <<= 13;
> > > +	rate = div_u64(rate, 15625);
> >
> > I'm curious how you generate the new math here, since this can be
> > tricky, and I could use more examples in order to port to some of the
> > other drivers implementations. I'm not quit sure how to handle the
> > value when the lower 16 bits are fractional.
> 
> TL;DR version:
> 
> In ptp_clock.c we convert scaled_ppm to ppb like this.
> 
> 	ppb = scaled_ppm * 10^3 * 2^-16
> 
> If you already have a working driver that does
> 
> 	regval = ppb * SOMEMATH;
> 
> then just substitute
> 
> 	regval = (scaled_ppm * 10^3 * 2^-16) * SOMEMATH;
> 	       = (scaled_ppm *  5^3 * 2^-13) * SOMEMATH;
> 
> and simplify by combining the 5^3 and 2^-13 constants into SOMEMATH.
> 

Thanks, this makes more sense :)

> Longer explanation:
> 
> You have to consider how the frequency adjustment HW works, case by
> case.  Both the i210 and the phyter have an adjustment register that
> holds units of 2^-32 nanoseconds per 8 nanosecond clock period, and so
> the rate from adjustment value 1 is (2^-32 / 8).
> 
> Then with the old interface, the conversion from "adjustment unit" to
> ppb was (2^-32 / 8 * 10^9) or (2^-26 * 5^9).  The conversion the other
> way needs the inverse, and so the code did (ppb << 26) / 5^9.
> 
> With the new interface, the conversion from "adjustment unit" to
> scaled_ppm is (2^-32 / 8 * 10^6 * 2^16) or (2^-13 * 5^6).  The code
> converts the other direction using the inverse, (s_ppm << 13) / 5^6.
> 

Right. This helps.

Thanks,
Jake

> HTH,
> Richard

^ permalink raw reply

* [PATCH v2] e1000e: free IRQ regardless of __E1000_DOWN
From: Tyler Baicar @ 2016-11-09 21:41 UTC (permalink / raw)
  To: jeffrey.t.kirsher, intel-wired-lan, netdev, linux-kernel, okaya,
	timur
  Cc: alexander.duyck, dima.ruinskiy, Tyler Baicar

Move IRQ free code so that it will happen regardless of the
__E1000_DOWN bit. Currently the e1000e driver only releases its IRQ
if the __E1000_DOWN bit is cleared. This is not sufficient because
it is possible for __E1000_DOWN to be set without releasing the IRQ.
In such a situation, we will hit a kernel bug later in e1000_remove
because the IRQ still has action since it was never freed. A
secondary bus reset can cause this case to happen.

Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org>
---
 drivers/net/ethernet/intel/e1000e/netdev.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
index 7017281..36cfcb0 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -4679,12 +4679,13 @@ int e1000e_close(struct net_device *netdev)
 
 	if (!test_bit(__E1000_DOWN, &adapter->state)) {
 		e1000e_down(adapter, true);
-		e1000_free_irq(adapter);
 
 		/* Link status message must follow this format */
 		pr_info("%s NIC Link is Down\n", adapter->netdev->name);
 	}
 
+	e1000_free_irq(adapter);
+
 	napi_disable(&adapter->napi);
 
 	e1000e_free_tx_resources(adapter->tx_ring);
-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.

^ permalink raw reply related

* Re: [PATCH net-next] bpf, mlx4: fix prog refcount in mlx4_en_try_alloc_resources error path
From: Alexei Starovoitov @ 2016-11-09 21:53 UTC (permalink / raw)
  To: Daniel Borkmann; +Cc: davem, bblanco, tariqt, zhiyisun, netdev
In-Reply-To: <9d5b1ed1b70a8a4db545c308bcb5b28280564798.1478724690.git.daniel@iogearbox.net>

On Wed, Nov 09, 2016 at 10:02:34PM +0100, Daniel Borkmann wrote:
> Commit 67f8b1dcb9ee ("net/mlx4_en: Refactor the XDP forwarding rings
> scheme") added a bug in that the prog's reference count is not dropped
> in the error path when mlx4_en_try_alloc_resources() is failing from
> mlx4_xdp_set().
> 
> We previously took bpf_prog_add(prog, priv->rx_ring_num - 1), that we
> need to release again. Earlier in the call path, dev_change_xdp_fd()
> itself holds a reference to the prog as well (hence the '- 1' in the
> bpf_prog_add()), so a simple atomic_sub() is safe to use here. When
> an error is propagated, then bpf_prog_put() is called eventually from
> dev_change_xdp_fd()
> 
> Fixes: 67f8b1dcb9ee ("net/mlx4_en: Refactor the XDP forwarding rings scheme")
> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

Good catch. Thanks for the quick fix.

> +void bpf_prog_sub(struct bpf_prog *prog, int i)
> +{
> +	/* Only to be used for undoing previous bpf_prog_add() in some
> +	 * error path. We still know that another entity in our call
> +	 * path holds a reference to the program, thus atomic_sub() can
> +	 * be safely used in such cases!
> +	 */
> +	WARN_ON(atomic_sub_return(i, &prog->aux->refcnt) == 0);
> +}
> +EXPORT_SYMBOL_GPL(bpf_prog_sub);

to elaborate on this little bit further...
if we did it as an extension to bpf_prog_put(), like
  if (atomic_sub_return(i, &prog->aux->refcnt) == 0)
    call_rcu(&prog->aux->rcu, __bpf_prog_put_rcu);
it could be seen a 'safe' version and people would be tempted
to use without thinking,
but since right now bpf_prog_add() can be done only after bpf_prog_get(),
this bpf_prog_sub() matches bpf_prog_add() to clean up refcnt
in the error path, so having separate helper like this makes sense,
instead of overloading bpf_prog_sub into 'safe' bpf_prog_put variant.
So all looks correct to me.
Acked-by: Alexei Starovoitov <ast@kernel.org>

^ permalink raw reply

* Re: nf_nat_tftp broken in 4.8?
From: Florian Westphal @ 2016-11-09 21:56 UTC (permalink / raw)
  To: Florian Fainelli; +Cc: netdev, netfilter-devel, fgao, pablo
In-Reply-To: <b3fb3d07-fb0b-e4a5-20fe-bdc2abc8f6fc@gmail.com>

Florian Fainelli <f.fainelli@gmail.com> wrote:
> After updating my workstation from 4.4 to 4.8, which also serves as a
> NAT gateway for local machines, I noticed that TFTP across this NAT box
> broke:

does 'sysctl net.netfilter.nf_conntrack_helper=1' help?

^ permalink raw reply

* Re: 484611357c19 introduces arbitrary kernel write bug (root-only)
From: Jann Horn @ 2016-11-09 22:12 UTC (permalink / raw)
  To: Josef Bacik
  Cc: Daniel Borkmann, Alexei Starovoitov, David S. Miller, security,
	netdev
In-Reply-To: <2e6a3cf8-254a-9cbb-201b-d3059a5b723a@fb.com>

Can you resend with "git send-email" or so? "git am" says that the
patch is corrupt, likely because of line wrapping.

On Wed, Nov 9, 2016 at 10:21 PM, Josef Bacik <jbacik@fb.com> wrote:
> On 11/08/2016 07:23 PM, Jann Horn wrote:
>>
>> In 484611357c19 (not in any stable kernel yet), functionality is
>> introduced that allows root (and afaics nobody else, since nobody else
>> is allowed to perform pointer arithmetic) to basically write to (and
>> read from) arbitrary kernel memory. There are multiple bugs in the
>> validation logic:
>>
>>  - A bitwise AND of values in the ranges [a,b] and [c,d] is assumed to
>> always result in a value
>>    >= a&b. However, for the combination of ranges [1,1] and [1,2],
>> this calculates a minimum of 1
>>    while actually, 1&2 is zero. This is the bug that my crasher
>> (below) triggers.
>>  - a%b is assumed to always be smaller than b-1. However, for b==0,
>> this will calculate an upper
>>    limit of -1 while the values will actually always be zero.
>>  - I'm not sure about this, but I think that, when only one end of the
>> range is bounded, the logic will
>>    incorrectly also treat the other end as a bounded, and because of
>> the usage of bound
>>    placeholders that are smaller than the actual maximum values, this
>> could be used to perform
>>    out-of-bounds accesses.
>>
>> The fun part here is that, as soon as the validation is just
>> off-by-one, arithmetic transformations can be used to turn that into
>> out-of-bounds accesses at arbitrary offsets. The crasher turns the
>> off-by-one into a memory write at offset 0x10000000.
>>
>
> Can you give this a whirl?  It addresses your testcase and the other issues
> you've brought up.  Thanks
>
> From e47a1de98af2c1bcebd4224f546e3be1fd340b6a Mon Sep 17 00:00:00 2001
> From: Josef Bacik <jbacik@fb.com>
> Date: Wed, 9 Nov 2016 16:09:52 -0500
> Subject: [PATCH] bpf: fix range arithmetic for bpf map access
>
> I made some invalid assumptions with BPF_AND and BPF_MOD that could result
> in
> invalid accesses to bpf map entries.  Fix this up by doing a few things
>
> 1) Kill BPF_MOD support.  This doesn't actually get used by the compiler in
> real
> life and just adds extra complexity.
>
> 2) Fix the logic for BPF_AND.  If the min value is negative then that is the
> new
> minimum, otherwise it is unconditionally 0.
>
> 3) Don't do operations on the ranges if they are set to the limits, as they
> are
> by definition undefined, and allowing arithmetic operations on those values
> could make them appear valid when they really aren't.
>
> This fixes the testcase provided by Jann as well as a few other theoretical
> problems.
>
> Reported-by: Jann Horn <jannh@google.com>
> Signed-off-by: Josef Bacik <jbacik@fb.com>
> ---
>  include/linux/bpf_verifier.h |  3 +-
>  kernel/bpf/verifier.c        | 65
> ++++++++++++++++++++++++++++----------------
>  2 files changed, 44 insertions(+), 24 deletions(-)
>
> diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
> index ac5b393..15ceb7f 100644
> --- a/include/linux/bpf_verifier.h
> +++ b/include/linux/bpf_verifier.h
> @@ -22,7 +22,8 @@ struct bpf_reg_state {
>          * Used to determine if any memory access using this register will
>          * result in a bad access.
>          */
> -       u64 min_value, max_value;
> +       s64 min_value;
> +       u64 max_value;
>         u32 id;
>         union {
>                 /* valid when type == CONST_IMM | PTR_TO_STACK |
> UNKNOWN_VALUE */
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 9002575..840533a 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -234,8 +234,8 @@ static void print_verifier_state(struct
> bpf_verifier_state *state)
>                                 reg->map_ptr->value_size,
>                                 reg->id);
>                 if (reg->min_value != BPF_REGISTER_MIN_RANGE)
> -                       verbose(",min_value=%llu",
> -                               (unsigned long long)reg->min_value);
> +                       verbose(",min_value=%lld",
> +                               (long long)reg->min_value);
>                 if (reg->max_value != BPF_REGISTER_MAX_RANGE)
>                         verbose(",max_value=%llu",
>                                 (unsigned long long)reg->max_value);
> @@ -1490,7 +1490,7 @@ static void check_reg_overflow(struct bpf_reg_state
> *reg)
>  {
>         if (reg->max_value > BPF_REGISTER_MAX_RANGE)
>                 reg->max_value = BPF_REGISTER_MAX_RANGE;
> -       if ((s64)reg->min_value < BPF_REGISTER_MIN_RANGE)
> +       if (reg->min_value < BPF_REGISTER_MIN_RANGE)
>                 reg->min_value = BPF_REGISTER_MIN_RANGE;
>  }
>
> @@ -1498,7 +1498,8 @@ static void adjust_reg_min_max_vals(struct
> bpf_verifier_env *env,
>                                     struct bpf_insn *insn)
>  {
>         struct bpf_reg_state *regs = env->cur_state.regs, *dst_reg;
> -       u64 min_val = BPF_REGISTER_MIN_RANGE, max_val =
> BPF_REGISTER_MAX_RANGE;
> +       s64 min_val = BPF_REGISTER_MIN_RANGE;
> +       u64 max_val = BPF_REGISTER_MAX_RANGE;
>         bool min_set = false, max_set = false;
>         u8 opcode = BPF_OP(insn->code);
>
> @@ -1534,22 +1535,45 @@ static void adjust_reg_min_max_vals(struct
> bpf_verifier_env *env,
>                 return;
>         }
>
> +       /* If one of our values was at the end of our ranges then we can't
> just
> +        * do our normal operations to the register, we need to set the
> values
> +        * to the min/max since they are undefined.
> +        */
> +       if (min_val == BPF_REGISTER_MIN_RANGE)
> +               dst_reg->min_value = BPF_REGISTER_MIN_RANGE;
> +       if (max_val == BPF_REGISTER_MAX_RANGE)
> +               dst_reg->max_value = BPF_REGISTER_MAX_RANGE;
> +
>         switch (opcode) {
>         case BPF_ADD:
> -               dst_reg->min_value += min_val;
> -               dst_reg->max_value += max_val;
> +               if (dst_reg->min_value != BPF_REGISTER_MIN_RANGE)
> +                       dst_reg->min_value += min_val;
> +               if (dst_reg->max_value != BPF_REGISTER_MAX_RANGE)
> +                       dst_reg->max_value += max_val;
>                 break;
>         case BPF_SUB:
> -               dst_reg->min_value -= min_val;
> -               dst_reg->max_value -= max_val;
> +               if (dst_reg->min_value != BPF_REGISTER_MIN_RANGE)
> +                       dst_reg->min_value -= min_val;
> +               if (dst_reg->max_value != BPF_REGISTER_MAX_RANGE)
> +                       dst_reg->max_value -= max_val;
>                 break;
>         case BPF_MUL:
> -               dst_reg->min_value *= min_val;
> -               dst_reg->max_value *= max_val;
> +               if (dst_reg->min_value != BPF_REGISTER_MIN_RANGE)
> +                       dst_reg->min_value *= min_val;
> +               if (dst_reg->max_value != BPF_REGISTER_MAX_RANGE)
> +                       dst_reg->max_value *= max_val;
>                 break;
>         case BPF_AND:
> -               /* & is special since it could end up with 0 bits set. */
> -               dst_reg->min_value &= min_val;
> +               /* & is special since it's could be any value within our
> range,
> +                * including 0.  But if the thing we're AND'ing against is
> +                * negative and we're negative then that's the minimum
> value,
> +                * otherwise the minimum will always be 0.
> +                */
> +               if (min_val < 0 && dst_reg->min_value < 0)
> +                       dst_reg->min_value = min_t(s64, dst_reg->min_value,
> +                                                  min_val);
> +               else
> +                       dst_reg->min_value = 0;
>                 dst_reg->max_value = max_val;
>                 break;
>         case BPF_LSH:
> @@ -1559,24 +1583,19 @@ static void adjust_reg_min_max_vals(struct
> bpf_verifier_env *env,
>                  */
>                 if (min_val > ilog2(BPF_REGISTER_MAX_RANGE))
>                         dst_reg->min_value = BPF_REGISTER_MIN_RANGE;
> -               else
> +               else if (dst_reg->min_value != BPF_REGISTER_MIN_RANGE)
>                         dst_reg->min_value <<= min_val;
>
>                 if (max_val > ilog2(BPF_REGISTER_MAX_RANGE))
>                         dst_reg->max_value = BPF_REGISTER_MAX_RANGE;
> -               else
> +               else if (dst_reg->max_value != BPF_REGISTER_MAX_RANGE)
>                         dst_reg->max_value <<= max_val;
>                 break;
>         case BPF_RSH:
> -               dst_reg->min_value >>= min_val;
> -               dst_reg->max_value >>= max_val;
> -               break;
> -       case BPF_MOD:
> -               /* % is special since it is an unsigned modulus, so the
> floor
> -                * will always be 0.
> -                */
> -               dst_reg->min_value = 0;
> -               dst_reg->max_value = max_val - 1;
> +               if (dst_reg->min_value != BPF_REGISTER_MIN_RANGE)
> +                       dst_reg->min_value >>= min_val;
> +               if (dst_reg->max_value != BPF_REGISTER_MAX_RANGE)
> +                       dst_reg->max_value >>= max_val;
>                 break;
>         default:
>                 reset_reg_range_values(regs, insn->dst_reg);
> --
> 2.5.5
>

^ permalink raw reply

* Re: nf_nat_tftp broken in 4.8?
From: Florian Fainelli @ 2016-11-09 22:18 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netdev, netfilter-devel, fgao, pablo
In-Reply-To: <20161109215614.GA21923@breakpoint.cc>

On 11/09/2016 01:56 PM, Florian Westphal wrote:
> Florian Fainelli <f.fainelli@gmail.com> wrote:
>> After updating my workstation from 4.4 to 4.8, which also serves as a
>> NAT gateway for local machines, I noticed that TFTP across this NAT box
>> broke:
> 
> does 'sysctl net.netfilter.nf_conntrack_helper=1' help?
> 

It does yes, thanks, and apologies for not seeing that earlier.
-- 
Florian

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox