Netdev List

Netdev List
 help / color / mirror / Atom feed

* [PATCH] net: 3com: 3c59x: Use dev_get_drvdata
From: Chuhong Yuan @ 2019-07-23 13:18 UTC (permalink / raw)
  Cc: Steffen Klassert, David S . Miller, netdev, linux-kernel,
	Chuhong Yuan

Instead of using to_pci_dev + pci_get_drvdata,
use dev_get_drvdata to make code simpler.

Signed-off-by: Chuhong Yuan <hslester96@gmail.com>
---
 drivers/net/ethernet/3com/3c59x.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/3com/3c59x.c b/drivers/net/ethernet/3com/3c59x.c
index 147051404194..8f897828869f 100644
--- a/drivers/net/ethernet/3com/3c59x.c
+++ b/drivers/net/ethernet/3com/3c59x.c
@@ -847,8 +847,7 @@ static void poll_vortex(struct net_device *dev)
 
 static int vortex_suspend(struct device *dev)
 {
-	struct pci_dev *pdev = to_pci_dev(dev);
-	struct net_device *ndev = pci_get_drvdata(pdev);
+	struct net_device *ndev = dev_get_drvdata(dev);
 
 	if (!ndev || !netif_running(ndev))
 		return 0;
@@ -861,8 +860,7 @@ static int vortex_suspend(struct device *dev)
 
 static int vortex_resume(struct device *dev)
 {
-	struct pci_dev *pdev = to_pci_dev(dev);
-	struct net_device *ndev = pci_get_drvdata(pdev);
+	struct net_device *ndev = dev_get_drvdata(dev);
 	int err;
 
 	if (!ndev || !netif_running(ndev))
-- 
2.20.1


^ permalink raw reply related

* Re: [PATCH 6/6] vhost: don't do synchronize_rcu() in vhost_uninit_vq_maps()
From: Jason Wang @ 2019-07-23 13:16 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: kvm, virtualization, netdev, linux-kernel
In-Reply-To: <20190723041144-mutt-send-email-mst@kernel.org>


On 2019/7/23 下午5:16, Michael S. Tsirkin wrote:
> On Tue, Jul 23, 2019 at 03:57:18AM -0400, Jason Wang wrote:
>> There's no need for RCU synchronization in vhost_uninit_vq_maps()
>> since we've already serialized with readers (memory accessors). This
>> also avoid the possible userspace DOS through ioctl() because of the
>> possible high latency caused by synchronize_rcu().
>>
>> Reported-by: Michael S. Tsirkin <mst@redhat.com>
>> Fixes: 7f466032dc9e ("vhost: access vq metadata through kernel virtual address")
>> Signed-off-by: Jason Wang <jasowang@redhat.com>
> I agree synchronize_rcu in both mmu notifiers and ioctl
> is a problem we must fix.
>
>> ---
>>   drivers/vhost/vhost.c | 4 +++-
>>   1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
>> index 5b8821d00fe4..a17df1f4069a 100644
>> --- a/drivers/vhost/vhost.c
>> +++ b/drivers/vhost/vhost.c
>> @@ -334,7 +334,9 @@ static void vhost_uninit_vq_maps(struct vhost_virtqueue *vq)
>>   	}
>>   	spin_unlock(&vq->mmu_lock);
>>   
>> -	synchronize_rcu();
>> +	/* No need for synchronize_rcu() or kfree_rcu() since we are
>> +	 * serialized with memory accessors (e.g vq mutex held).
>> +	 */
>>   
>>   	for (i = 0; i < VHOST_NUM_ADDRS; i++)
>>   		if (map[i])
>> -- 
>> 2.18.1
> .. however we can not RCU with no synchronization in sight.
> Sometimes there are hacks like using a lock/unlock
> pair instead of sync, but here no one bothers.
>
> specifically notifiers call reset vq maps which calls
> uninit vq maps which is not under any lock.


Notifier did this:

         if (map) {
                 if (uaddr->write) {
                         for (i = 0; i < map->npages; i++)
set_page_dirty(map->pages[i]);
}
                 rcu_assign_pointer(vq->maps[index], NULL);
}
spin_unlock(&vq->mmu_lock);

         if (map) {
synchronize_rcu();
vhost_map_unprefetch(map);
         }

So it indeed have a synchronize_rcu() there.

Thanks


>
> You will get use after free when map is then accessed.
>
> If you always have a lock then just take that lock
> and no need for RCU.
>

^ permalink raw reply

* [PATCH v1] tun: mark small packets as owned by the tap sock
From: Alexis Bauvin @ 2019-07-23 13:01 UTC (permalink / raw)
  To: stephen, davem, jasowang; +Cc: netdev, abauvin

Small packets going out of a tap device go through an optimized code
path that uses build_skb() rather than sock_alloc_send_pskb(). The
latter calls skb_set_owner_w(), but the small packet code path does not.

The net effect is that small packets are not owned by the userland
application's socket (e.g. QEMU), while large packets are.
This can be seen with a TCP session, where packets are not owned when
the window size is small enough (around PAGE_SIZE), while they are once
the window grows (note that this requires the host to support virtio
tso for the guest to offload segmentation).
All this leads to inconsistent behaviour in the kernel, especially on
netfilter modules that uses sk->socket (e.g. xt_owner).

Signed-off-by: Alexis Bauvin <abauvin@scaleway.com>
Fixes: 66ccbc9c87c2 ("tap: use build_skb() for small packet")
---
 drivers/net/tun.c | 71 ++++++++++++++++++++++++-----------------------
 1 file changed, 37 insertions(+), 34 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 3d443597bd04..ac56b6a29eb2 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1656,6 +1656,7 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun,
 {
 	struct page_frag *alloc_frag = &current->task_frag;
 	struct bpf_prog *xdp_prog;
+	struct sk_buff *skb;
 	int buflen = SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
 	char *buf;
 	size_t copied;
@@ -1686,44 +1687,46 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun,
 	 */
 	if (hdr->gso_type || !xdp_prog) {
 		*skb_xdp = 1;
-		return __tun_build_skb(alloc_frag, buf, buflen, len, pad);
-	}
-
-	*skb_xdp = 0;
+	} else {
+		*skb_xdp = 0;
 
-	local_bh_disable();
-	rcu_read_lock();
-	xdp_prog = rcu_dereference(tun->xdp_prog);
-	if (xdp_prog) {
-		struct xdp_buff xdp;
-		u32 act;
-
-		xdp.data_hard_start = buf;
-		xdp.data = buf + pad;
-		xdp_set_data_meta_invalid(&xdp);
-		xdp.data_end = xdp.data + len;
-		xdp.rxq = &tfile->xdp_rxq;
-
-		act = bpf_prog_run_xdp(xdp_prog, &xdp);
-		if (act == XDP_REDIRECT || act == XDP_TX) {
-			get_page(alloc_frag->page);
-			alloc_frag->offset += buflen;
+		local_bh_disable();
+		rcu_read_lock();
+		xdp_prog = rcu_dereference(tun->xdp_prog);
+		if (xdp_prog) {
+			struct xdp_buff xdp;
+			u32 act;
+
+			xdp.data_hard_start = buf;
+			xdp.data = buf + pad;
+			xdp_set_data_meta_invalid(&xdp);
+			xdp.data_end = xdp.data + len;
+			xdp.rxq = &tfile->xdp_rxq;
+
+			act = bpf_prog_run_xdp(xdp_prog, &xdp);
+			if (act == XDP_REDIRECT || act == XDP_TX) {
+				get_page(alloc_frag->page);
+				alloc_frag->offset += buflen;
+			}
+			err = tun_xdp_act(tun, xdp_prog, &xdp, act);
+			if (err < 0)
+				goto err_xdp;
+			if (err == XDP_REDIRECT)
+				xdp_do_flush_map();
+			if (err != XDP_PASS)
+				goto out;
+
+			pad = xdp.data - xdp.data_hard_start;
+			len = xdp.data_end - xdp.data;
 		}
-		err = tun_xdp_act(tun, xdp_prog, &xdp, act);
-		if (err < 0)
-			goto err_xdp;
-		if (err == XDP_REDIRECT)
-			xdp_do_flush_map();
-		if (err != XDP_PASS)
-			goto out;
-
-		pad = xdp.data - xdp.data_hard_start;
-		len = xdp.data_end - xdp.data;
+		rcu_read_unlock();
+		local_bh_enable();
 	}
-	rcu_read_unlock();
-	local_bh_enable();
 
-	return __tun_build_skb(alloc_frag, buf, buflen, len, pad);
+	skb = __tun_build_skb(alloc_frag, buf, buflen, len, pad);
+	if (skb)
+		skb_set_owner_w(skb, tfile->socket.sk);
+	return skb;
 
 err_xdp:
 	put_page(alloc_frag->page);
-- 


^ permalink raw reply related

* Re: [PATCH bpf] tools/bpf: fix bpftool build with OUTPUT set
From: Lorenz Bauer @ 2019-07-23 12:59 UTC (permalink / raw)
  To: Ilya Leoshkevich; +Cc: bpf, Networking, gor, heiko.carstens
In-Reply-To: <20190718142041.83342-1-iii@linux.ibm.com>

Hi Ilya,

Thanks for your patch! I tried it but had problems with
cross-compilation. Not sure if this is related to the patch or not
though, I haven't had the time to follow up.

Best
Lorenz

On Thu, 18 Jul 2019 at 15:20, Ilya Leoshkevich <iii@linux.ibm.com> wrote:
>
> Hi Lorenz,
>
> I've been using the following patch for quite some time now.
> Please let me know if it works for you.
>
> Best regards,
> Ilya
>
> ---
>
> When OUTPUT is set, bpftool and libbpf put their objects into the same
> directory, and since some of them have the same names, the collision
> happens.
>
> Fix by invoking libbpf build in a manner similar to $(call descend) -
> descend itself cannot be used, since libbpf is a sibling, and not a
> child, of bpftool.
>
> Also, don't link bpftool with libbpf.a twice.
>
> Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
> ---
>  tools/bpf/bpftool/Makefile | 17 ++++++-----------
>  1 file changed, 6 insertions(+), 11 deletions(-)
>
> diff --git a/tools/bpf/bpftool/Makefile b/tools/bpf/bpftool/Makefile
> index a7afea4dec47..2cbc3c166f44 100644
> --- a/tools/bpf/bpftool/Makefile
> +++ b/tools/bpf/bpftool/Makefile
> @@ -15,23 +15,18 @@ else
>  endif
>
>  BPF_DIR = $(srctree)/tools/lib/bpf/
> -
> -ifneq ($(OUTPUT),)
> -  BPF_PATH = $(OUTPUT)
> -else
> -  BPF_PATH = $(BPF_DIR)
> -endif
> -
> -LIBBPF = $(BPF_PATH)libbpf.a
> +BPF_PATH = $(objtree)/tools/lib/bpf
> +LIBBPF = $(BPF_PATH)/libbpf.a
>
>  BPFTOOL_VERSION := $(shell make --no-print-directory -sC ../../.. kernelversion)
>
>  $(LIBBPF): FORCE
> -       $(Q)$(MAKE) -C $(BPF_DIR) OUTPUT=$(OUTPUT) $(OUTPUT)libbpf.a
> +       $(Q)mkdir -p $(BPF_PATH)
> +       $(Q)$(MAKE) $(COMMAND_O) subdir=tools/lib/bpf -C $(BPF_DIR) $(LIBBPF)
>
>  $(LIBBPF)-clean:
>         $(call QUIET_CLEAN, libbpf)
> -       $(Q)$(MAKE) -C $(BPF_DIR) OUTPUT=$(OUTPUT) clean >/dev/null
> +       $(Q)$(MAKE) $(COMMAND_O) subdir=tools/lib/bpf -C $(BPF_DIR) clean >/dev/null
>
>  prefix ?= /usr/local
>  bash_compdir ?= /usr/share/bash-completion/completions
> @@ -112,7 +107,7 @@ $(OUTPUT)disasm.o: $(srctree)/kernel/bpf/disasm.c
>         $(QUIET_CC)$(COMPILE.c) -MMD -o $@ $<
>
>  $(OUTPUT)bpftool: $(OBJS) $(LIBBPF)
> -       $(QUIET_LINK)$(CC) $(CFLAGS) $(LDFLAGS) -o $@ $^ $(LIBS)
> +       $(QUIET_LINK)$(CC) $(CFLAGS) $(LDFLAGS) -o $@ $(OBJS) $(LIBS)
>
>  $(OUTPUT)%.o: %.c
>         $(QUIET_CC)$(COMPILE.c) -MMD -o $@ $<
> --
> 2.21.0
>


-- 
Lorenz Bauer  |  Systems Engineer
6th Floor, County Hall/The Riverside Building, SE1 7PB, UK

www.cloudflare.com

^ permalink raw reply

* RE: [PATCH net-next 3/3] net: stmmac: Introducing support for Page Pool
From: Jose Abreu @ 2019-07-23 12:51 UTC (permalink / raw)
  To: Jon Hunter, Jose Abreu, Lars Persson, Ilias Apalodimas
  Cc: Joao Pinto, Alexandre Torgue, Maxime Ripard,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-stm32@st-md-mailman.stormreply.com, Chen-Yu Tsai,
	Maxime Coquelin, linux-tegra, Giuseppe Cavallaro,
	David S . Miller, linux-arm-kernel@lists.infradead.org
In-Reply-To: <8093e352-d992-e17f-7168-5afbd9d3fb3f@nvidia.com>

From: Jon Hunter <jonathanh@nvidia.com>
Date: Jul/23/2019, 12:58:55 (UTC+00:00)

> 
> On 23/07/2019 11:49, Jose Abreu wrote:
> > From: Jon Hunter <jonathanh@nvidia.com>
> > Date: Jul/23/2019, 11:38:33 (UTC+00:00)
> > 
> >>
> >> On 23/07/2019 11:07, Jose Abreu wrote:
> >>> From: Jon Hunter <jonathanh@nvidia.com>
> >>> Date: Jul/23/2019, 11:01:24 (UTC+00:00)
> >>>
> >>>> This appears to be a winner and by disabling the SMMU for the ethernet
> >>>> controller and reverting commit 954a03be033c7cef80ddc232e7cbdb17df735663
> >>>> this worked! So yes appears to be related to the SMMU being enabled. We
> >>>> had to enable the SMMU for ethernet recently due to commit
> >>>> 954a03be033c7cef80ddc232e7cbdb17df735663.
> >>>
> >>> Finally :)
> >>>
> >>> However, from "git show 954a03be033c7cef80ddc232e7cbdb17df735663":
> >>>
> >>> +         There are few reasons to allow unmatched stream bypass, and
> >>> +         even fewer good ones.  If saying YES here breaks your board
> >>> +         you should work on fixing your board.
> >>>
> >>> So, how can we fix this ? Is your ethernet DT node marked as 
> >>> "dma-coherent;" ?
> >>
> >> TBH I have no idea. I can't say I fully understand your change or how it
> >> is breaking things for us.
> >>
> >> Currently, the Tegra DT binding does not have 'dma-coherent' set. I see
> >> this is optional, but I am not sure how you determine whether or not
> >> this should be set.
> > 
> > From my understanding it means that your device / IP DMA accesses are coherent regarding the CPU point of view. I think it will be the case if GMAC is not behind any kind of IOMMU in the HW arch.
> 
> I understand what coherency is, I just don't know how you tell if this
> implementation of the ethernet controller is coherent or not.

Do you have any detailed diagram of your HW ? Such as blocks / IPs 
connection, address space wiring , ...

---
Thanks,
Jose Miguel Abreu

^ permalink raw reply

* Re: [PATCH 3/3] net/xdp: convert put_page() to put_user_page*()
From: Jason Gunthorpe @ 2019-07-23 12:47 UTC (permalink / raw)
  To: John Hubbard
  Cc: Ira Weiny, john.hubbard, Andrew Morton, Alexander Viro,
	Björn Töpel, Boaz Harrosh, Christoph Hellwig,
	Daniel Vetter, Dan Williams, Dave Chinner, David Airlie,
	David S . Miller, Ilya Dryomov, Jan Kara, Jens Axboe,
	Jérôme Glisse, Johannes Thumshirn, Magnus Karlsson,
	Matthew Wilcox, Miklos Szeredi, Ming Lei, Sage Weil,
	Santosh Shilimkar, Yan Zheng, netdev, dri-devel, linux-mm,
	linux-rdma, bpf, LKML
In-Reply-To: <a4e9b293-11f8-6b3c-cf4d-308e3b32df34@nvidia.com>

On Mon, Jul 22, 2019 at 09:41:34PM -0700, John Hubbard wrote:

> * The leading underscores are often used for the more elaborate form of the
> call (as oppposed to decorating the core function name with "_flags", for
> example).

IMHO usually the __ version of a public symbol means something like
'why are you using this? you probably should not'

Often because the __ version has no locking or some other dangerous
configuration like that.

Jason

^ permalink raw reply

* Re: [net-next:master 13/14] drivers/net/ethernet/faraday/ftgmac100.c:777:13: error: 'skb_frag_t {aka struct bio_vec}' has no member named 'size'
From: René van Dorst @ 2019-07-23 12:45 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: kbuild-all, netdev, davem
In-Reply-To: <20190723115238.GJ363@bombadil.infradead.org>

Quoting Matthew Wilcox <willy@infradead.org>:

> On Tue, Jul 23, 2019 at 08:58:44AM +0000, René van Dorst wrote:
>> Hi Matthew,
>>
>> I see the same issue for the mediatek/mtk_eth_soc driver.
>
> Thanks, Rene.  The root problem for both of these drivers is that neither
> are built on x86 with CONFIG_COMPILE_TEST.  Is it possible to fix this?
>
> An untested patch to fix both of these problems (and two more that I
> spotted):

Hi Matthew,

Your patch fixes the build error.

I am not sure if iperf3 also triggers this code on my mt7621 device.
iperf3 results seems normal.

Greats,

René


>
> diff --git a/drivers/atm/he.c b/drivers/atm/he.c
> index 211607986134..70b00ae4ec38 100644
> --- a/drivers/atm/he.c
> +++ b/drivers/atm/he.c
> @@ -2580,10 +2580,9 @@ he_send(struct atm_vcc *vcc, struct sk_buff *skb)
>  			slot = 0;
>  		}
>
> -		tpd->iovec[slot].addr = dma_map_single(&he_dev->pci_dev->dev,
> -			(void *) page_address(frag->page) + frag->page_offset,
> -				frag->size, DMA_TO_DEVICE);
> -		tpd->iovec[slot].len = frag->size;
> +		tpd->iovec[slot].addr = skb_frag_dma_map(&he_dev->pci_dev->dev,
> +				frag, 0, skb_frag_size(frag), DMA_TO_DEVICE);
> +		tpd->iovec[slot].len = skb_frag_size(frag);
>  		++slot;
>
>  	}
> diff --git a/drivers/net/ethernet/faraday/ftgmac100.c  
> b/drivers/net/ethernet/faraday/ftgmac100.c
> index 030fed65393e..dc8d3e726e75 100644
> --- a/drivers/net/ethernet/faraday/ftgmac100.c
> +++ b/drivers/net/ethernet/faraday/ftgmac100.c
> @@ -774,7 +774,7 @@ static netdev_tx_t  
> ftgmac100_hard_start_xmit(struct sk_buff *skb,
>  	for (i = 0; i < nfrags; i++) {
>  		skb_frag_t *frag = &skb_shinfo(skb)->frags[i];
>
> -		len = frag->size;
> +		len = skb_frag_size(frag);
>
>  		/* Map it */
>  		map = skb_frag_dma_map(priv->dev, frag, 0, len,
> diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c  
> b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
> index f38c3fa7d705..9c4d1afa34e5 100644
> --- a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
> +++ b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
> @@ -1958,7 +1958,7 @@ static int skb_to_sg_fd(struct dpaa_priv *priv,
>  	/* populate the rest of SGT entries */
>  	for (i = 0; i < nr_frags; i++) {
>  		frag = &skb_shinfo(skb)->frags[i];
> -		frag_len = frag->size;
> +		frag_len = skb_frag_size(frag);
>  		WARN_ON(!skb_frag_page(frag));
>  		addr = skb_frag_dma_map(dev, frag, 0,
>  					frag_len, dma_dir);
> diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.c  
> b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
> index 00991df44ed6..e529d86468b8 100644
> --- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
> +++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
> @@ -787,7 +787,8 @@ static inline int mtk_cal_txd_req(struct sk_buff *skb)
>  	if (skb_is_gso(skb)) {
>  		for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
>  			frag = &skb_shinfo(skb)->frags[i];
> -			nfrags += DIV_ROUND_UP(frag->size, MTK_TX_DMA_BUF_LEN);
> +			nfrags += DIV_ROUND_UP(skb_frag_size(frag),
> +						MTK_TX_DMA_BUF_LEN);
>  		}
>  	} else {
>  		nfrags += skb_shinfo(skb)->nr_frags;




^ permalink raw reply

* Re: [RFC PATCH net-next 10/12] drop_monitor: Add packet alert mode
From: Neil Horman @ 2019-07-23 12:43 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: netdev, davem, dsahern, roopa, nikolay, jakub.kicinski, toke,
	andy, f.fainelli, andrew, vivien.didelot, mlxsw, Ido Schimmel
In-Reply-To: <20190722183134.14516-11-idosch@idosch.org>

On Mon, Jul 22, 2019 at 09:31:32PM +0300, Ido Schimmel wrote:
> From: Ido Schimmel <idosch@mellanox.com>
> 
> So far drop monitor supported only one alert mode in which a summary of
> locations in which packets were recently dropped was sent to user space.
> 
> This alert mode is sufficient in order to understand that packets were
> dropped, but lacks information to perform a more detailed analysis.
> 
> Add a new alert mode in which the dropped packet itself is passed to
> user space along with metadata: The drop location (as program counter
> and resolved symbol), ingress / egress netdevice and arrival / departure
> timestamp. More metadata can be added in the future.
> 
> To avoid performing expensive operations in the context in which
> kfree_skb() is invoked (can be hard IRQ), the dropped skb is cloned and
> queued on per-CPU skb drop list. Then, in process context the netlink
> message is allocated, prepared and finally sent to user space.
> 
> Signed-off-by: Ido Schimmel <idosch@mellanox.com>
> ---
>  include/uapi/linux/net_dropmon.h |  29 ++++
>  net/core/drop_monitor.c          | 287 ++++++++++++++++++++++++++++++-
>  2 files changed, 308 insertions(+), 8 deletions(-)
> 
> diff --git a/include/uapi/linux/net_dropmon.h b/include/uapi/linux/net_dropmon.h
> index 5edbd0a675fd..7708c8a440a1 100644
> --- a/include/uapi/linux/net_dropmon.h
> +++ b/include/uapi/linux/net_dropmon.h
> @@ -53,6 +53,7 @@ enum {
>  	NET_DM_CMD_CONFIG,
>  	NET_DM_CMD_START,
>  	NET_DM_CMD_STOP,
> +	NET_DM_CMD_PACKET_ALERT,
>  	_NET_DM_CMD_MAX,
>  };
>  
> @@ -62,4 +63,32 @@ enum {
>   * Our group identifiers
>   */
>  #define NET_DM_GRP_ALERT 1
> +
> +enum net_dm_attr {
> +	NET_DM_ATTR_UNSPEC,
> +
> +	NET_DM_ATTR_ALERT_MODE,			/* u8 */
> +	NET_DM_ATTR_PC,				/* u64 */
> +	NET_DM_ATTR_SYMBOL,			/* string */
> +	NET_DM_ATTR_NETDEV_IFINDEX,		/* u32 */
> +	NET_DM_ATTR_NETDEV_NAME,		/* string */
> +	NET_DM_ATTR_TIMESTAMP,			/* struct timespec */
> +	NET_DM_ATTR_PAYLOAD,			/* binary */
> +	NET_DM_ATTR_PAD,
> +
> +	__NET_DM_ATTR_MAX,
> +	NET_DM_ATTR_MAX = __NET_DM_ATTR_MAX - 1
> +};
> +
> +/**
> + * enum net_dm_alert_mode - Alert mode.
> + * @NET_DM_ALERT_MODE_SUMMARY: A summary of recent drops is sent to user space.
> + * @NET_DM_ALERT_MODE_PACKET: Each dropped packet is sent to user space along
> + *                            with metadata.
> + */
> +enum net_dm_alert_mode {
> +	NET_DM_ALERT_MODE_SUMMARY,
> +	NET_DM_ALERT_MODE_PACKET,
> +};
> +
>  #endif
> diff --git a/net/core/drop_monitor.c b/net/core/drop_monitor.c
> index f441fb653567..512935fc235b 100644
> --- a/net/core/drop_monitor.c
> +++ b/net/core/drop_monitor.c
> @@ -54,6 +54,7 @@ static DEFINE_MUTEX(net_dm_mutex);
>  struct per_cpu_dm_data {
>  	spinlock_t		lock;	/* Protects 'skb' and 'send_timer' */
>  	struct sk_buff		*skb;
> +	struct sk_buff_head	drop_queue;
>  	struct work_struct	dm_alert_work;
>  	struct timer_list	send_timer;
>  };
> @@ -75,6 +76,22 @@ static int dm_delay = 1;
>  static unsigned long dm_hw_check_delta = 2*HZ;
>  static LIST_HEAD(hw_stats_list);
>  
> +static enum net_dm_alert_mode net_dm_alert_mode = NET_DM_ALERT_MODE_SUMMARY;
> +
> +struct net_dm_skb_cb {
> +	void *pc;
> +};
> +
> +#define NET_DM_SKB_CB(__skb) ((struct net_dm_skb_cb *)&((__skb)->cb[0]))
> +
> +struct net_dm_alert_ops {
> +	void (*kfree_skb_probe)(void *ignore, struct sk_buff *skb,
> +				void *location);
> +	void (*napi_poll_probe)(void *ignore, struct napi_struct *napi,
> +				int work, int budget);
> +	void (*work_item_func)(struct work_struct *work);
> +};
> +
>  static int net_dm_nl_pre_doit(const struct genl_ops *ops,
>  			      struct sk_buff *skb, struct genl_info *info)
>  {
> @@ -255,10 +272,194 @@ static void trace_napi_poll_hit(void *ignore, struct napi_struct *napi,
>  	rcu_read_unlock();
>  }
>  
> +static const struct net_dm_alert_ops net_dm_alert_summary_ops = {
> +	.kfree_skb_probe	= trace_kfree_skb_hit,
> +	.napi_poll_probe	= trace_napi_poll_hit,
> +	.work_item_func		= send_dm_alert,
> +};
> +
> +static void net_dm_packet_trace_kfree_skb_hit(void *ignore,
> +					      struct sk_buff *skb,
> +					      void *location)
> +{
> +	struct per_cpu_dm_data *data;
> +	struct sk_buff *nskb;
> +	unsigned long flags;
> +
> +	nskb = skb_clone(skb, GFP_ATOMIC);
> +	if (!nskb)
> +		return;
> +
> +	NET_DM_SKB_CB(nskb)->pc = location;
> +	if (nskb->dev)
> +		dev_hold(nskb->dev);
> +
> +	data = this_cpu_ptr(&dm_cpu_data);
> +	spin_lock_irqsave(&data->drop_queue.lock, flags);
> +
> +	__skb_queue_tail(&data->drop_queue, nskb);
> +
> +	if (!timer_pending(&data->send_timer)) {
> +		data->send_timer.expires = jiffies + dm_delay * HZ;
> +		add_timer(&data->send_timer);
> +	}
> +
> +	spin_unlock_irqrestore(&data->drop_queue.lock, flags);
> +}
> +
> +static void net_dm_packet_trace_napi_poll_hit(void *ignore,
> +					      struct napi_struct *napi,
> +					      int work, int budget)
> +{
> +}
> +
> +#define NET_DM_MAX_SYMBOL_LEN 40
> +
> +static size_t net_dm_packet_report_size(size_t payload_len)
> +{
> +	size_t size;
> +
> +	size = nlmsg_msg_size(GENL_HDRLEN + net_drop_monitor_family.hdrsize);
> +
> +	return NLMSG_ALIGN(size) +
> +	       /* NET_DM_ATTR_PC */
> +	       nla_total_size(sizeof(u64)) +
> +	       /* NET_DM_ATTR_SYMBOL */
> +	       nla_total_size(NET_DM_MAX_SYMBOL_LEN + 1) +
> +	       /* NET_DM_ATTR_NETDEV_IFINDEX */
> +	       nla_total_size(sizeof(u32)) +
> +	       /* NET_DM_ATTR_NETDEV_NAME */
> +	       nla_total_size(IFNAMSIZ + 1) +
> +	       /* NET_DM_ATTR_TIMESTAMP */
> +	       nla_total_size(sizeof(struct timespec)) +
> +	       /* NET_DM_ATTR_PAYLOAD */
> +	       nla_total_size(payload_len);
> +}
> +
> +static int net_dm_packet_report_fill(struct sk_buff *msg, struct sk_buff *skb,
> +				     size_t payload_len)
> +{
> +	u64 pc = (u64)(uintptr_t) NET_DM_SKB_CB(skb)->pc;
> +	char buf[NET_DM_MAX_SYMBOL_LEN];
> +	struct nlattr *attr;
> +	struct timespec ts;
> +	void *hdr;
> +
> +	hdr = genlmsg_put(msg, 0, 0, &net_drop_monitor_family, 0,
> +			  NET_DM_CMD_PACKET_ALERT);
> +	if (!hdr)
> +		return -EMSGSIZE;
> +
> +	if (nla_put_u64_64bit(msg, NET_DM_ATTR_PC, pc, NET_DM_ATTR_PAD))
> +		goto nla_put_failure;
> +
> +	snprintf(buf, sizeof(buf), "%pS", NET_DM_SKB_CB(skb)->pc);
> +	if (nla_put_string(msg, NET_DM_ATTR_SYMBOL, buf))
> +		goto nla_put_failure;
> +
> +	if (skb->dev &&
> +	    nla_put_u32(msg, NET_DM_ATTR_NETDEV_IFINDEX, skb->dev->ifindex))
> +		goto nla_put_failure;
> +
> +	if (skb->dev &&
> +	    nla_put_string(msg, NET_DM_ATTR_NETDEV_NAME, skb->dev->name))
> +		goto nla_put_failure;
> +
> +	if (ktime_to_timespec_cond(skb->tstamp, &ts) &&
> +	    nla_put(msg, NET_DM_ATTR_TIMESTAMP, sizeof(ts), &ts))
> +		goto nla_put_failure;
> +
> +	if (!payload_len)
> +		goto out;
> +
> +	attr = skb_put(msg, nla_total_size(payload_len));
> +	attr->nla_type = NET_DM_ATTR_PAYLOAD;
> +	attr->nla_len = nla_attr_size(payload_len);
> +	if (skb_copy_bits(skb, 0, nla_data(attr), payload_len))
> +		goto nla_put_failure;
> +
> +out:
> +	genlmsg_end(msg, hdr);
> +
> +	return 0;
> +
> +nla_put_failure:
> +	genlmsg_cancel(msg, hdr);
> +	return -EMSGSIZE;
> +}
> +
> +#define NET_DM_MAX_PACKET_SIZE (0xffff - NLA_HDRLEN - NLA_ALIGNTO)
> +
> +static void net_dm_packet_report(struct sk_buff *skb)
> +{
> +	struct sk_buff *msg;
> +	size_t payload_len;
> +	int rc;
> +
> +	/* Make sure we start copying the packet from the MAC header */
> +	if (skb->data > skb_mac_header(skb))
> +		skb_push(skb, skb->data - skb_mac_header(skb));
> +	else
> +		skb_pull(skb, skb_mac_header(skb) - skb->data);
> +
> +	/* Ensure packet fits inside a single netlink attribute */
> +	payload_len = min_t(size_t, skb->len, NET_DM_MAX_PACKET_SIZE);
> +
> +	msg = nlmsg_new(net_dm_packet_report_size(payload_len), GFP_KERNEL);
> +	if (!msg)
> +		goto out;
> +
> +	rc = net_dm_packet_report_fill(msg, skb, payload_len);
> +	if (rc) {
> +		nlmsg_free(msg);
> +		goto out;
> +	}
> +
> +	genlmsg_multicast(&net_drop_monitor_family, msg, 0, 0, GFP_KERNEL);
> +
> +out:
> +	if (skb->dev)
> +		dev_put(skb->dev);
> +	consume_skb(skb);
> +}
> +
> +static void net_dm_packet_work(struct work_struct *work)
> +{
> +	struct per_cpu_dm_data *data;
> +	struct sk_buff_head list;
> +	struct sk_buff *skb;
> +	unsigned long flags;
> +
> +	data = container_of(work, struct per_cpu_dm_data, dm_alert_work);
> +
> +	__skb_queue_head_init(&list);
> +
> +	spin_lock_irqsave(&data->drop_queue.lock, flags);
> +	skb_queue_splice_tail_init(&data->drop_queue, &list);
> +	spin_unlock_irqrestore(&data->drop_queue.lock, flags);
> +
These functions are all executed in a per-cpu context.  While theres nothing
wrong with using a spinlock here, I think you can get away with just doing
local_irqsave and local_irq_restore.

Neil

> +	while ((skb = __skb_dequeue(&list)))
> +		net_dm_packet_report(skb);
> +}
> +
> +static const struct net_dm_alert_ops net_dm_alert_packet_ops = {
> +	.kfree_skb_probe	= net_dm_packet_trace_kfree_skb_hit,
> +	.napi_poll_probe	= net_dm_packet_trace_napi_poll_hit,
> +	.work_item_func		= net_dm_packet_work,
> +};
> +
> +static const struct net_dm_alert_ops *net_dm_alert_ops_arr[] = {
> +	[NET_DM_ALERT_MODE_SUMMARY]	= &net_dm_alert_summary_ops,
> +	[NET_DM_ALERT_MODE_PACKET]	= &net_dm_alert_packet_ops,
> +};
> +
>  static int net_dm_trace_on_set(struct netlink_ext_ack *extack)
>  {
> +	const struct net_dm_alert_ops *ops;
>  	int cpu, rc;
>  
> +	ops = net_dm_alert_ops_arr[net_dm_alert_mode];
> +
>  	if (!try_module_get(THIS_MODULE)) {
>  		NL_SET_ERR_MSG_MOD(extack, "Failed to take reference on module");
>  		return -ENODEV;
> @@ -267,17 +468,17 @@ static int net_dm_trace_on_set(struct netlink_ext_ack *extack)
>  	for_each_possible_cpu(cpu) {
>  		struct per_cpu_dm_data *data = &per_cpu(dm_cpu_data, cpu);
>  
> -		INIT_WORK(&data->dm_alert_work, send_dm_alert);
> +		INIT_WORK(&data->dm_alert_work, ops->work_item_func);
>  		timer_setup(&data->send_timer, sched_send_work, 0);
>  	}
>  
> -	rc = register_trace_kfree_skb(trace_kfree_skb_hit, NULL);
> +	rc = register_trace_kfree_skb(ops->kfree_skb_probe, NULL);
>  	if (rc) {
>  		NL_SET_ERR_MSG_MOD(extack, "Failed to connect probe to kfree_skb() tracepoint");
>  		goto err_module_put;
>  	}
>  
> -	rc = register_trace_napi_poll(trace_napi_poll_hit, NULL);
> +	rc = register_trace_napi_poll(ops->napi_poll_probe, NULL);
>  	if (rc) {
>  		NL_SET_ERR_MSG_MOD(extack, "Failed to connect probe to napi_poll() tracepoint");
>  		goto err_unregister_trace;
> @@ -286,7 +487,7 @@ static int net_dm_trace_on_set(struct netlink_ext_ack *extack)
>  	return 0;
>  
>  err_unregister_trace:
> -	unregister_trace_kfree_skb(trace_kfree_skb_hit, NULL);
> +	unregister_trace_kfree_skb(ops->kfree_skb_probe, NULL);
>  err_module_put:
>  	module_put(THIS_MODULE);
>  	return rc;
> @@ -295,10 +496,13 @@ static int net_dm_trace_on_set(struct netlink_ext_ack *extack)
>  static void net_dm_trace_off_set(void)
>  {
>  	struct dm_hw_stat_delta *new_stat, *temp;
> +	const struct net_dm_alert_ops *ops;
>  	int cpu;
>  
> -	unregister_trace_napi_poll(trace_napi_poll_hit, NULL);
> -	unregister_trace_kfree_skb(trace_kfree_skb_hit, NULL);
> +	ops = net_dm_alert_ops_arr[net_dm_alert_mode];
> +
> +	unregister_trace_napi_poll(ops->napi_poll_probe, NULL);
> +	unregister_trace_kfree_skb(ops->kfree_skb_probe, NULL);
>  
>  	tracepoint_synchronize_unregister();
>  
> @@ -307,9 +511,18 @@ static void net_dm_trace_off_set(void)
>  	 */
>  	for_each_possible_cpu(cpu) {
>  		struct per_cpu_dm_data *data = &per_cpu(dm_cpu_data, cpu);
> +		struct sk_buff *skb;
>  
>  		del_timer_sync(&data->send_timer);
>  		cancel_work_sync(&data->dm_alert_work);
> +		/* If we deactivated a pending timer, some packets are still
> +		 * queued and we need to free them.
> +		 */
> +		while ((skb = __skb_dequeue(&data->drop_queue))) {
> +			if (skb->dev)
> +				dev_put(skb->dev);
> +			consume_skb(skb);
> +		}
>  	}
>  
>  	list_for_each_entry_safe(new_stat, temp, &hw_stats_list, list) {
> @@ -351,12 +564,61 @@ static int set_all_monitor_traces(int state, struct netlink_ext_ack *extack)
>  	return rc;
>  }
>  
> +static int net_dm_alert_mode_get_from_info(struct genl_info *info,
> +					   enum net_dm_alert_mode *p_alert_mode)
> +{
> +	u8 val;
> +
> +	val = nla_get_u8(info->attrs[NET_DM_ATTR_ALERT_MODE]);
> +
> +	switch (val) {
> +	case NET_DM_ALERT_MODE_SUMMARY: /* fall-through */
> +	case NET_DM_ALERT_MODE_PACKET:
> +		*p_alert_mode = val;
> +		break;
> +	default:
> +		return -EINVAL;
> +	}
> +
> +	return 0;
> +}
> +
> +static int net_dm_alert_mode_set(struct genl_info *info)
> +{
> +	struct netlink_ext_ack *extack = info->extack;
> +	enum net_dm_alert_mode alert_mode;
> +	int rc;
> +
> +	if (!info->attrs[NET_DM_ATTR_ALERT_MODE])
> +		return 0;
> +
> +	rc = net_dm_alert_mode_get_from_info(info, &alert_mode);
> +	if (rc) {
> +		NL_SET_ERR_MSG_MOD(extack, "Invalid alert mode");
> +		return -EINVAL;
> +	}
> +
> +	net_dm_alert_mode = alert_mode;
> +
> +	return 0;
> +}
> +
>  static int net_dm_cmd_config(struct sk_buff *skb,
>  			struct genl_info *info)
>  {
> -	NL_SET_ERR_MSG_MOD(info->extack, "Command not supported");
> +	struct netlink_ext_ack *extack = info->extack;
> +	int rc;
>  
> -	return -EOPNOTSUPP;
> +	if (trace_state == TRACE_ON) {
> +		NL_SET_ERR_MSG_MOD(extack, "Cannot configure drop monitor while tracing is on");
> +		return -EOPNOTSUPP;
> +	}
> +
> +	rc = net_dm_alert_mode_set(info);
> +	if (rc)
> +		return rc;
> +
> +	return 0;
>  }
>  
>  static int net_dm_cmd_trace(struct sk_buff *skb,
> @@ -411,6 +673,11 @@ static int dropmon_net_event(struct notifier_block *ev_block,
>  	return NOTIFY_DONE;
>  }
>  
> +static const struct nla_policy net_dm_nl_policy[NET_DM_ATTR_MAX + 1] = {
> +	[NET_DM_ATTR_UNSPEC] = { .strict_start_type = NET_DM_ATTR_UNSPEC + 1 },
> +	[NET_DM_ATTR_ALERT_MODE] = { .type = NLA_U8 },
> +};
> +
>  static const struct genl_ops dropmon_ops[] = {
>  	{
>  		.cmd = NET_DM_CMD_CONFIG,
> @@ -434,6 +701,8 @@ static struct genl_family net_drop_monitor_family __ro_after_init = {
>  	.hdrsize        = 0,
>  	.name           = "NET_DM",
>  	.version        = 2,
> +	.maxattr	= NET_DM_ATTR_MAX,
> +	.policy		= net_dm_nl_policy,
>  	.pre_doit	= net_dm_nl_pre_doit,
>  	.post_doit	= net_dm_nl_post_doit,
>  	.module		= THIS_MODULE,
> @@ -479,6 +748,7 @@ static int __init init_net_drop_monitor(void)
>  		INIT_WORK(&data->dm_alert_work, send_dm_alert);
>  		timer_setup(&data->send_timer, sched_send_work, 0);
>  		spin_lock_init(&data->lock);
> +		skb_queue_head_init(&data->drop_queue);
>  		reset_per_cpu_data(data);
>  	}
>  
> @@ -509,6 +779,7 @@ static void exit_net_drop_monitor(void)
>  		 * to this struct and can free the skb inside it
>  		 */
>  		kfree_skb(data->skb);
> +		WARN_ON(!skb_queue_empty(&data->drop_queue));
>  	}
>  
>  	BUG_ON(genl_unregister_family(&net_drop_monitor_family));
> -- 
> 2.21.0
> 
> 

^ permalink raw reply

* Re: [PATCH bpf-next 3/5] samples/bpf: convert xdp_sample_pkts_user to perf_buffer API
From: Toke Høiland-Jørgensen @ 2019-07-23 12:34 UTC (permalink / raw)
  To: Andrii Nakryiko, bpf, netdev, ast, daniel
  Cc: andrii.nakryiko, kernel-team, Andrii Nakryiko
In-Reply-To: <20190723043112.3145810-4-andriin@fb.com>

Andrii Nakryiko <andriin@fb.com> writes:

> Convert xdp_sample_pkts_user to libbpf's perf_buffer API.
>
> Signed-off-by: Andrii Nakryiko <andriin@fb.com>

Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>

^ permalink raw reply

* Re: [bpf-next 3/6] bpf: add bpf_tcp_gen_syncookie helper
From: Toke Høiland-Jørgensen @ 2019-07-23 12:33 UTC (permalink / raw)
  To: Petar Penkov, netdev, bpf
  Cc: davem, ast, daniel, edumazet, lmb, sdf, Petar Penkov
In-Reply-To: <20190723002042.105927-4-ppenkov.kernel@gmail.com>

Petar Penkov <ppenkov.kernel@gmail.com> writes:

> From: Petar Penkov <ppenkov@google.com>
>
> This helper function allows BPF programs to try to generate SYN
> cookies, given a reference to a listener socket. The function works
> from XDP and with an skb context since bpf_skc_lookup_tcp can lookup a
> socket in both cases.
>
> Signed-off-by: Petar Penkov <ppenkov@google.com>
> Suggested-by: Eric Dumazet <edumazet@google.com>
> ---
>  include/uapi/linux/bpf.h | 30 ++++++++++++++++-
>  net/core/filter.c        | 73 ++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 102 insertions(+), 1 deletion(-)
>
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 6f68438aa4ed..20baee7b2219 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -2713,6 +2713,33 @@ union bpf_attr {
>   *		**-EPERM** if no permission to send the *sig*.
>   *
>   *		**-EAGAIN** if bpf program can try again.
> + *
> + * s64 bpf_tcp_gen_syncookie(struct bpf_sock *sk, void *iph, u32 iph_len, struct tcphdr *th, u32 th_len)
> + *	Description
> + *		Try to issue a SYN cookie for the packet with corresponding
> + *		IP/TCP headers, *iph* and *th*, on the listening socket in *sk*.
> + *
> + *		*iph* points to the start of the IPv4 or IPv6 header, while
> + *		*iph_len* contains **sizeof**\ (**struct iphdr**) or
> + *		**sizeof**\ (**struct ip6hdr**).
> + *
> + *		*th* points to the start of the TCP header, while *th_len*
> + *		contains the length of the TCP header.
> + *
> + *	Return
> + *		On success, lower 32 bits hold the generated SYN cookie in
> + *		followed by 16 bits which hold the MSS value for that cookie,
> + *		and the top 16 bits are unused.
> + *
> + *		On failure, the returned value is one of the following:
> + *
> + *		**-EINVAL** SYN cookie cannot be issued due to error
> + *
> + *		**-ENOENT** SYN cookie should not be issued (no SYN flood)
> + *
> + *		**-ENOTSUPP** kernel configuration does not enable SYN
> cookies

nit: This should be EOPNOTSUPP - the other one is for NFS...

> + *
> + *		**-EPROTONOSUPPORT** IP packet version is not 4 or 6
>   */
>  #define __BPF_FUNC_MAPPER(FN)		\
>  	FN(unspec),			\
> @@ -2824,7 +2851,8 @@ union bpf_attr {
>  	FN(strtoul),			\
>  	FN(sk_storage_get),		\
>  	FN(sk_storage_delete),		\
> -	FN(send_signal),
> +	FN(send_signal),		\
> +	FN(tcp_gen_syncookie),
>  
>  /* integer value in 'imm' field of BPF_CALL instruction selects which helper
>   * function eBPF program intends to call
> diff --git a/net/core/filter.c b/net/core/filter.c
> index 47f6386fb17a..92114271eff6 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -5850,6 +5850,75 @@ static const struct bpf_func_proto bpf_tcp_check_syncookie_proto = {
>  	.arg5_type	= ARG_CONST_SIZE,
>  };
>  
> +BPF_CALL_5(bpf_tcp_gen_syncookie, struct sock *, sk, void *, iph, u32, iph_len,
> +	   struct tcphdr *, th, u32, th_len)
> +{
> +#ifdef CONFIG_SYN_COOKIES
> +	u32 cookie;
> +	u16 mss;
> +
> +	if (unlikely(th_len < sizeof(*th) || th_len != th->doff * 4))
> +		return -EINVAL;
> +
> +	if (sk->sk_protocol != IPPROTO_TCP || sk->sk_state != TCP_LISTEN)
> +		return -EINVAL;
> +
> +	if (!sock_net(sk)->ipv4.sysctl_tcp_syncookies)
> +		return -ENOENT;
> +
> +	if (!th->syn || th->ack || th->fin || th->rst)
> +		return -EINVAL;
> +
> +	if (unlikely(iph_len < sizeof(struct iphdr)))
> +		return -EINVAL;
> +
> +	/* Both struct iphdr and struct ipv6hdr have the version field at the
> +	 * same offset so we can cast to the shorter header (struct iphdr).
> +	 */
> +	switch (((struct iphdr *)iph)->version) {
> +	case 4:
> +		if (sk->sk_family == AF_INET6 && sk->sk_ipv6only)
> +			return -EINVAL;
> +
> +		mss = tcp_v4_get_syncookie(sk, iph, th, &cookie);
> +		break;
> +
> +#if IS_BUILTIN(CONFIG_IPV6)
> +	case 6:
> +		if (unlikely(iph_len < sizeof(struct ipv6hdr)))
> +			return -EINVAL;
> +
> +		if (sk->sk_family != AF_INET6)
> +			return -EINVAL;
> +
> +		mss = tcp_v6_get_syncookie(sk, iph, th, &cookie);
> +		break;
> +#endif /* CONFIG_IPV6 */
> +
> +	default:
> +		return -EPROTONOSUPPORT;
> +	}
> +	if (mss <= 0)
> +		return -ENOENT;
> +
> +	return cookie | ((u64)mss << 32);
> +#else
> +	return -ENOTSUPP;

See above

> +#endif /* CONFIG_SYN_COOKIES */
> +}
> +
> +static const struct bpf_func_proto bpf_tcp_gen_syncookie_proto = {
> +	.func		= bpf_tcp_gen_syncookie,
> +	.gpl_only	= true, /* __cookie_v*_init_sequence() is GPL */
> +	.pkt_access	= true,
> +	.ret_type	= RET_INTEGER,
> +	.arg1_type	= ARG_PTR_TO_SOCK_COMMON,
> +	.arg2_type	= ARG_PTR_TO_MEM,
> +	.arg3_type	= ARG_CONST_SIZE,
> +	.arg4_type	= ARG_PTR_TO_MEM,
> +	.arg5_type	= ARG_CONST_SIZE,
> +};
> +
>  #endif /* CONFIG_INET */
>  
>  bool bpf_helper_changes_pkt_data(void *func)
> @@ -6135,6 +6204,8 @@ tc_cls_act_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
>  		return &bpf_tcp_check_syncookie_proto;
>  	case BPF_FUNC_skb_ecn_set_ce:
>  		return &bpf_skb_ecn_set_ce_proto;
> +	case BPF_FUNC_tcp_gen_syncookie:
> +		return &bpf_tcp_gen_syncookie_proto;
>  #endif
>  	default:
>  		return bpf_base_func_proto(func_id);
> @@ -6174,6 +6245,8 @@ xdp_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
>  		return &bpf_xdp_skc_lookup_tcp_proto;
>  	case BPF_FUNC_tcp_check_syncookie:
>  		return &bpf_tcp_check_syncookie_proto;
> +	case BPF_FUNC_tcp_gen_syncookie:
> +		return &bpf_tcp_gen_syncookie_proto;
>  #endif
>  	default:
>  		return bpf_base_func_proto(func_id);
> -- 
> 2.22.0.657.g960e92d24f-goog

^ permalink raw reply

* Re: [RFC PATCH net-next 00/12] drop_monitor: Capture dropped packets and metadata
From: Toke Høiland-Jørgensen @ 2019-07-23 12:17 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: netdev, davem, nhorman, dsahern, roopa, nikolay, jakub.kicinski,
	andy, f.fainelli, andrew, vivien.didelot, mlxsw, Ido Schimmel
In-Reply-To: <20190723064659.GA16069@splinter>

Ido Schimmel <idosch@idosch.org> writes:

> On Mon, Jul 22, 2019 at 09:43:15PM +0200, Toke Høiland-Jørgensen wrote:
>> Is there a mechanism for the user to filter the packets before they are
>> sent to userspace? A bpf filter would be the obvious choice I guess...
>
> Hi Toke,
>
> Yes, it's on my TODO list to write an eBPF program that only lets
> "unique" packets to be enqueued on the netlink socket. Where "unique" is
> defined as {5-tuple, PC}. The rest of the copies will be counted in an
> eBPF map, which is just a hash table keyed by {5-tuple, PC}.

Yeah, that's a good idea. Or even something simpler like tcpdump-style
filters for the packets returned by drop monitor (say if I'm just trying
to figure out what happens to my HTTP requests).

> I think it would be good to have the program as part of the bcc
> repository [1]. What do you think?

Sure. We could also add it to the XDP tutorial[2]; it could go into a
section on introspection and debugging (just added a TODO about that[3]).

>> For integrating with XDP the trick would be to find a way to do it that
>> doesn't incur any overhead when it's not enabled. Are you envisioning
>> that this would be enabled separately for the different "modes" (kernel,
>> hardware, XDP, etc)?
>
> Yes. Drop monitor have commands to enable and disable tracing, but they
> don't carry any attributes at the moment. My plan is to add an attribute
> (e.g., 'NET_DM_ATTR_DROP_TYPE') that will specify the type of drops
> you're interested in - SW/HW/XDP. If the attribute is not specified,
> then current behavior is maintained and all the drop types are traced.
> But if you're only interested in SW drops, then overhead for the rest
> should be zero.

Makes sense (although "should be" is the key here ;)).

I'm also worried about the drop monitor getting overwhelmed; if you turn
it on for XDP and you're running a filtering program there, you'll
suddenly get *a lot* of drops.

As I read your patch, the current code can basically queue up an
unbounded number of packets waiting to go out over netlink, can't it?

> For HW drops I'm going to have devlink call into drop monitor. The
> function call will just be a NOP in case user is not interested in HW
> drops. I'm not sure if for XDP you want to register a probe on a
> tracepoint or call into drop monitor. If you want to use the former,
> then you can just have drop monitor unregister its probe from the
> tracepoint, which is what drop monitor is currently doing with the
> kfree_skb() tracepoint.

I guess we would use whichever has the least overhead. We do already
have tracepoints in the XDP path, but only on the error path. We
certainly don't want the overhead of a function call in the XDP fast
path when this feature is turned off (even if that function does
nothing), but maybe the NOP-patching for a tracepoint is acceptable.
We'll have to benchmark it :)

-Toke

[2] https://github.com/xdp-project/xdp-tutorial

[3] https://github.com/xdp-project/xdp-project/commit/7637702087a6b9d94816092a054d77c479df808a

^ permalink raw reply

* Re: [PATCH net-next 3/3] net: stmmac: Introducing support for Page Pool
From: Jon Hunter @ 2019-07-23 12:09 UTC (permalink / raw)
  To: Robin Murphy, Jose Abreu, Lars Persson, Ilias Apalodimas
  Cc: Joao Pinto, Alexandre Torgue, Maxime Ripard,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-stm32@st-md-mailman.stormreply.com, Chen-Yu Tsai,
	Maxime Coquelin, linux-tegra, Giuseppe Cavallaro,
	David S . Miller, linux-arm-kernel@lists.infradead.org
In-Reply-To: <6c769226-bdd9-6fe0-b96b-5a0d800fed24@arm.com>


On 23/07/2019 11:29, Robin Murphy wrote:
> On 23/07/2019 11:07, Jose Abreu wrote:
>> From: Jon Hunter <jonathanh@nvidia.com>
>> Date: Jul/23/2019, 11:01:24 (UTC+00:00)
>>
>>> This appears to be a winner and by disabling the SMMU for the ethernet
>>> controller and reverting commit 954a03be033c7cef80ddc232e7cbdb17df735663
>>> this worked! So yes appears to be related to the SMMU being enabled. We
>>> had to enable the SMMU for ethernet recently due to commit
>>> 954a03be033c7cef80ddc232e7cbdb17df735663.
>>
>> Finally :)
>>
>> However, from "git show 954a03be033c7cef80ddc232e7cbdb17df735663":
>>
>> +         There are few reasons to allow unmatched stream bypass, and
>> +         even fewer good ones.  If saying YES here breaks your board
>> +         you should work on fixing your board.
>>
>> So, how can we fix this ? Is your ethernet DT node marked as
>> "dma-coherent;" ?
> 
> The first thing to try would be booting the failing setup with
> "iommu.passthrough=1" (or using CONFIG_IOMMU_DEFAULT_PASSTHROUGH) - if
> that makes things seem OK, then the problem is likely related to address
> translation; if not, then it's probably time to start looking at nasties
> like coherency and ordering, although in principle I wouldn't expect the
> SMMU to have too much impact there.

Setting "iommu.passthrough=1" works for me. However, I am not sure where
to go from here, so any ideas you have would be great.

> Do you know if the SMMU interrupts are working correctly? If not, it's
> possible that an incorrect address or mapping direction could lead to
> the DMA transaction just being silently terminated without any fault
> indication, which generally presents as inexplicable weirdness (I've
> certainly seen that on another platform with the mix of an unsupported
> interrupt controller and an 'imperfect' ethernet driver).

If I simply remove the iommu node for the ethernet controller, then I
see lots of ...

[    6.296121] arm-smmu 12000000.iommu: Unexpected global fault, this could be serious
[    6.296125] arm-smmu 12000000.iommu:         GFSR 0x00000002, GFSYNR0 0x00000000, GFSYNR1 0x00000014, GFSYNR2 0x00000000

So I assume that this is triggering the SMMU interrupt correctly. 

> Just to confirm, has the original patch been tested with
> CONFIG_DMA_API_DEBUG to rule out any high-level mishaps?
Yes one of the first things we tried but did not bare any fruit.

Cheers
Jon

-- 
nvpublic

^ permalink raw reply

* [PATCH bpf] libbpf: fix using uninitialized ioctl results
From: Ilya Maximets @ 2019-07-23 12:08 UTC (permalink / raw)
  To: netdev
  Cc: linux-kernel, bpf, xdp-newbies, David S. Miller, Magnus Karlsson,
	Alexei Starovoitov, Daniel Borkmann, Ilya Maximets
In-Reply-To: <CGME20190723120815eucas1p21027b1ab47daba7ebb3a885bf869be8a@eucas1p2.samsung.com>

'channels.max_combined' initialized only on ioctl success and
errno is only valid on ioctl failure.

The code doesn't produce any runtime issues, but makes memory
sanitizers angry:

 Conditional jump or move depends on uninitialised value(s)
    at 0x55C056F: xsk_get_max_queues (xsk.c:336)
    by 0x55C05B2: xsk_create_bpf_maps (xsk.c:354)
    by 0x55C089F: xsk_setup_xdp_prog (xsk.c:447)
    by 0x55C0E57: xsk_socket__create (xsk.c:601)
  Uninitialised value was created by a stack allocation
    at 0x55C04CD: xsk_get_max_queues (xsk.c:318)

Additionally fixed warning on uninitialized bytes in ioctl arguments:

 Syscall param ioctl(SIOCETHTOOL) points to uninitialised byte(s)
    at 0x648D45B: ioctl (in /usr/lib64/libc-2.28.so)
    by 0x55C0546: xsk_get_max_queues (xsk.c:330)
    by 0x55C05B2: xsk_create_bpf_maps (xsk.c:354)
    by 0x55C089F: xsk_setup_xdp_prog (xsk.c:447)
    by 0x55C0E57: xsk_socket__create (xsk.c:601)
  Address 0x1ffefff378 is on thread 1's stack
  in frame #1, created by xsk_get_max_queues (xsk.c:318)
  Uninitialised value was created by a stack allocation
    at 0x55C04CD: xsk_get_max_queues (xsk.c:318)

CC: Magnus Karlsson <magnus.karlsson@intel.com>
Fixes: 1cad07884239 ("libbpf: add support for using AF_XDP sockets")
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
---
 tools/lib/bpf/xsk.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/tools/lib/bpf/xsk.c b/tools/lib/bpf/xsk.c
index 5007b5d4fd2c..c4f912dc30f9 100644
--- a/tools/lib/bpf/xsk.c
+++ b/tools/lib/bpf/xsk.c
@@ -317,7 +317,7 @@ static int xsk_load_xdp_prog(struct xsk_socket *xsk)
 
 static int xsk_get_max_queues(struct xsk_socket *xsk)
 {
-	struct ethtool_channels channels;
+	struct ethtool_channels channels = { .cmd = ETHTOOL_GCHANNELS };
 	struct ifreq ifr;
 	int fd, err, ret;
 
@@ -325,7 +325,7 @@ static int xsk_get_max_queues(struct xsk_socket *xsk)
 	if (fd < 0)
 		return -errno;
 
-	channels.cmd = ETHTOOL_GCHANNELS;
+	memset(&ifr, 0, sizeof(ifr));
 	ifr.ifr_data = (void *)&channels;
 	strncpy(ifr.ifr_name, xsk->ifname, IFNAMSIZ - 1);
 	ifr.ifr_name[IFNAMSIZ - 1] = '\0';
@@ -335,7 +335,7 @@ static int xsk_get_max_queues(struct xsk_socket *xsk)
 		goto out;
 	}
 
-	if (channels.max_combined == 0 || errno == EOPNOTSUPP)
+	if (err || channels.max_combined == 0)
 		/* If the device says it has no channels, then all traffic
 		 * is sent to a single stream, so max queues = 1.
 		 */
-- 
2.17.1


^ permalink raw reply related

* [PATCH net-next v2] net/mlx5e: xsk: dynamically allocate mlx5e_channel_param
From: Maxim Mikityanskiy @ 2019-07-23 12:02 UTC (permalink / raw)
  To: Arnd Bergmann, Alexei Starovoitov, Daniel Borkmann,
	David S. Miller
  Cc: bpf@vger.kernel.org, netdev@vger.kernel.org, Saeed Mahameed,
	Leon Romanovsky, Jakub Kicinski, Jesper Dangaard Brouer,
	John Fastabend, Tariq Toukan, Maxim Mikityanskiy

From: Arnd Bergmann <arnd@arndb.de>

The structure is too large to put on the stack, resulting in a
warning on 32-bit ARM:

drivers/net/ethernet/mellanox/mlx5/core/en/xsk/setup.c:59:5: error: stack frame size of 1344 bytes in function
      'mlx5e_open_xsk' [-Werror,-Wframe-larger-than=]

Use kvzalloc() instead.

Fixes: a038e9794541 ("net/mlx5e: Add XSK zero-copy support")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
---
v2 changes: use kvzalloc/kvfree and fix a memory leak.

 .../mellanox/mlx5/core/en/xsk/setup.c         | 27 ++++++++++++-------
 1 file changed, 18 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/setup.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/setup.c
index aaffa6f68dc0..f701e4f3c076 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/setup.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/setup.c
@@ -60,24 +60,28 @@ int mlx5e_open_xsk(struct mlx5e_priv *priv, struct mlx5e_params *params,
 		   struct mlx5e_xsk_param *xsk, struct xdp_umem *umem,
 		   struct mlx5e_channel *c)
 {
-	struct mlx5e_channel_param cparam = {};
+	struct mlx5e_channel_param *cparam;
 	struct dim_cq_moder icocq_moder = {};
 	int err;
 
 	if (!mlx5e_validate_xsk_param(params, xsk, priv->mdev))
 		return -EINVAL;
 
-	mlx5e_build_xsk_cparam(priv, params, xsk, &cparam);
+	cparam = kvzalloc(sizeof(*cparam), GFP_KERNEL);
+	if (!cparam)
+		return -ENOMEM;
 
-	err = mlx5e_open_cq(c, params->rx_cq_moderation, &cparam.rx_cq, &c->xskrq.cq);
+	mlx5e_build_xsk_cparam(priv, params, xsk, cparam);
+
+	err = mlx5e_open_cq(c, params->rx_cq_moderation, &cparam->rx_cq, &c->xskrq.cq);
 	if (unlikely(err))
-		return err;
+		goto err_free_cparam;
 
-	err = mlx5e_open_rq(c, params, &cparam.rq, xsk, umem, &c->xskrq);
+	err = mlx5e_open_rq(c, params, &cparam->rq, xsk, umem, &c->xskrq);
 	if (unlikely(err))
 		goto err_close_rx_cq;
 
-	err = mlx5e_open_cq(c, params->tx_cq_moderation, &cparam.tx_cq, &c->xsksq.cq);
+	err = mlx5e_open_cq(c, params->tx_cq_moderation, &cparam->tx_cq, &c->xsksq.cq);
 	if (unlikely(err))
 		goto err_close_rq;
 
@@ -87,21 +91,23 @@ int mlx5e_open_xsk(struct mlx5e_priv *priv, struct mlx5e_params *params,
 	 * is disabled and then reenabled, but the SQ continues receiving CQEs
 	 * from the old UMEM.
 	 */
-	err = mlx5e_open_xdpsq(c, params, &cparam.xdp_sq, umem, &c->xsksq, true);
+	err = mlx5e_open_xdpsq(c, params, &cparam->xdp_sq, umem, &c->xsksq, true);
 	if (unlikely(err))
 		goto err_close_tx_cq;
 
-	err = mlx5e_open_cq(c, icocq_moder, &cparam.icosq_cq, &c->xskicosq.cq);
+	err = mlx5e_open_cq(c, icocq_moder, &cparam->icosq_cq, &c->xskicosq.cq);
 	if (unlikely(err))
 		goto err_close_sq;
 
 	/* Create a dedicated SQ for posting NOPs whenever we need an IRQ to be
 	 * triggered and NAPI to be called on the correct CPU.
 	 */
-	err = mlx5e_open_icosq(c, params, &cparam.icosq, &c->xskicosq);
+	err = mlx5e_open_icosq(c, params, &cparam->icosq, &c->xskicosq);
 	if (unlikely(err))
 		goto err_close_icocq;
 
+	kvfree(cparam);
+
 	spin_lock_init(&c->xskicosq_lock);
 
 	set_bit(MLX5E_CHANNEL_STATE_XSK, c->state);
@@ -123,6 +129,9 @@ int mlx5e_open_xsk(struct mlx5e_priv *priv, struct mlx5e_params *params,
 err_close_rx_cq:
 	mlx5e_close_cq(&c->xskrq.cq);
 
+err_free_cparam:
+	kvfree(cparam);
+
 	return err;
 }
 
-- 
2.19.1


^ permalink raw reply related

* Re: [PATCH net-next 3/3] net: stmmac: Introducing support for Page Pool
From: Jon Hunter @ 2019-07-23 11:58 UTC (permalink / raw)
  To: Jose Abreu, Lars Persson, Ilias Apalodimas
  Cc: Joao Pinto, Alexandre Torgue, Maxime Ripard,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-stm32@st-md-mailman.stormreply.com, Chen-Yu Tsai,
	Maxime Coquelin, linux-tegra, Giuseppe Cavallaro,
	David S . Miller, linux-arm-kernel@lists.infradead.org
In-Reply-To: <BYAPR12MB3269D050556BD51030DCDDFCD3C70@BYAPR12MB3269.namprd12.prod.outlook.com>


On 23/07/2019 11:49, Jose Abreu wrote:
> From: Jon Hunter <jonathanh@nvidia.com>
> Date: Jul/23/2019, 11:38:33 (UTC+00:00)
> 
>>
>> On 23/07/2019 11:07, Jose Abreu wrote:
>>> From: Jon Hunter <jonathanh@nvidia.com>
>>> Date: Jul/23/2019, 11:01:24 (UTC+00:00)
>>>
>>>> This appears to be a winner and by disabling the SMMU for the ethernet
>>>> controller and reverting commit 954a03be033c7cef80ddc232e7cbdb17df735663
>>>> this worked! So yes appears to be related to the SMMU being enabled. We
>>>> had to enable the SMMU for ethernet recently due to commit
>>>> 954a03be033c7cef80ddc232e7cbdb17df735663.
>>>
>>> Finally :)
>>>
>>> However, from "git show 954a03be033c7cef80ddc232e7cbdb17df735663":
>>>
>>> +         There are few reasons to allow unmatched stream bypass, and
>>> +         even fewer good ones.  If saying YES here breaks your board
>>> +         you should work on fixing your board.
>>>
>>> So, how can we fix this ? Is your ethernet DT node marked as 
>>> "dma-coherent;" ?
>>
>> TBH I have no idea. I can't say I fully understand your change or how it
>> is breaking things for us.
>>
>> Currently, the Tegra DT binding does not have 'dma-coherent' set. I see
>> this is optional, but I am not sure how you determine whether or not
>> this should be set.
> 
> From my understanding it means that your device / IP DMA accesses are coherent regarding the CPU point of view. I think it will be the case if GMAC is not behind any kind of IOMMU in the HW arch.

I understand what coherency is, I just don't know how you tell if this
implementation of the ethernet controller is coherent or not.

Jon

-- 
nvpublic

^ permalink raw reply

* Re: [net-next:master 13/14] drivers/net/ethernet/faraday/ftgmac100.c:777:13: error: 'skb_frag_t {aka struct bio_vec}' has no member named 'size'
From: Matthew Wilcox @ 2019-07-23 11:52 UTC (permalink / raw)
  To: René van Dorst; +Cc: kbuild-all, netdev, davem
In-Reply-To: <20190723085844.Horde.ehPsGFdWI2BCQdl_UyzJxlS@www.vdorst.com>

On Tue, Jul 23, 2019 at 08:58:44AM +0000, René van Dorst wrote:
> Hi Matthew,
> 
> I see the same issue for the mediatek/mtk_eth_soc driver.

Thanks, Rene.  The root problem for both of these drivers is that neither
are built on x86 with CONFIG_COMPILE_TEST.  Is it possible to fix this?

An untested patch to fix both of these problems (and two more that I
spotted):

diff --git a/drivers/atm/he.c b/drivers/atm/he.c
index 211607986134..70b00ae4ec38 100644
--- a/drivers/atm/he.c
+++ b/drivers/atm/he.c
@@ -2580,10 +2580,9 @@ he_send(struct atm_vcc *vcc, struct sk_buff *skb)
 			slot = 0;
 		}
 
-		tpd->iovec[slot].addr = dma_map_single(&he_dev->pci_dev->dev,
-			(void *) page_address(frag->page) + frag->page_offset,
-				frag->size, DMA_TO_DEVICE);
-		tpd->iovec[slot].len = frag->size;
+		tpd->iovec[slot].addr = skb_frag_dma_map(&he_dev->pci_dev->dev,
+				frag, 0, skb_frag_size(frag), DMA_TO_DEVICE);
+		tpd->iovec[slot].len = skb_frag_size(frag);
 		++slot;
 
 	}
diff --git a/drivers/net/ethernet/faraday/ftgmac100.c b/drivers/net/ethernet/faraday/ftgmac100.c
index 030fed65393e..dc8d3e726e75 100644
--- a/drivers/net/ethernet/faraday/ftgmac100.c
+++ b/drivers/net/ethernet/faraday/ftgmac100.c
@@ -774,7 +774,7 @@ static netdev_tx_t ftgmac100_hard_start_xmit(struct sk_buff *skb,
 	for (i = 0; i < nfrags; i++) {
 		skb_frag_t *frag = &skb_shinfo(skb)->frags[i];
 
-		len = frag->size;
+		len = skb_frag_size(frag);
 
 		/* Map it */
 		map = skb_frag_dma_map(priv->dev, frag, 0, len,
diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
index f38c3fa7d705..9c4d1afa34e5 100644
--- a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
+++ b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
@@ -1958,7 +1958,7 @@ static int skb_to_sg_fd(struct dpaa_priv *priv,
 	/* populate the rest of SGT entries */
 	for (i = 0; i < nr_frags; i++) {
 		frag = &skb_shinfo(skb)->frags[i];
-		frag_len = frag->size;
+		frag_len = skb_frag_size(frag);
 		WARN_ON(!skb_frag_page(frag));
 		addr = skb_frag_dma_map(dev, frag, 0,
 					frag_len, dma_dir);
diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.c b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
index 00991df44ed6..e529d86468b8 100644
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
@@ -787,7 +787,8 @@ static inline int mtk_cal_txd_req(struct sk_buff *skb)
 	if (skb_is_gso(skb)) {
 		for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
 			frag = &skb_shinfo(skb)->frags[i];
-			nfrags += DIV_ROUND_UP(frag->size, MTK_TX_DMA_BUF_LEN);
+			nfrags += DIV_ROUND_UP(skb_frag_size(frag),
+						MTK_TX_DMA_BUF_LEN);
 		}
 	} else {
 		nfrags += skb_shinfo(skb)->nr_frags;

^ permalink raw reply related

* Re: [PATCH] [net-next] net/mlx5e: xsk: dynamically allocate mlx5e_channel_param
From: Arnd Bergmann @ 2019-07-23 11:29 UTC (permalink / raw)
  To: Maxim Mikityanskiy
  Cc: Saeed Mahameed, David S. Miller, Alexei Starovoitov, Tariq Toukan,
	Leon Romanovsky, Daniel Borkmann, Jakub Kicinski,
	Jesper Dangaard Brouer, John Fastabend, netdev@vger.kernel.org,
	linux-rdma@vger.kernel.org, linux-kernel@vger.kernel.org,
	xdp-newbies@vger.kernel.org, bpf@vger.kernel.org
In-Reply-To: <535ebf16-c523-0799-3ffe-6cfbeee3ac57@mellanox.com>

On Tue, Jul 23, 2019 at 1:21 PM Maxim Mikityanskiy <maximmi@mellanox.com> wrote:
> On 2019-07-08 18:16, Maxim Mikityanskiy wrote:
> > On 2019-07-08 15:55, Arnd Bergmann wrote:
> >> -    mlx5e_build_xsk_cparam(priv, params, xsk, &cparam);
> >> +    cparam = kzalloc(sizeof(*cparam), GFP_KERNEL);
> >
> > Similar code in mlx5e_open_channels (en_main.c) uses kvzalloc. Although
> > the struct is currently smaller than a page anyway, and there should be
> > no difference in behavior now, I suggest using the same alloc function
> > to keep code uniform.
> >
> >>      /* Create a dedicated SQ for posting NOPs whenever we need an IRQ to be
> >>       * triggered and NAPI to be called on the correct CPU.
> >>       */
> >> -    err = mlx5e_open_icosq(c, params, &cparam.icosq, &c->xskicosq);
> >> +    err = mlx5e_open_icosq(c, params, &cparam->icosq, &c->xskicosq);
> >>      if (unlikely(err))
> >>              goto err_close_icocq;
> >>
> >
> > Here is kfree missing. It's a memory leak in the good path.
>
> Arnd, I'm going to take over your patch and respin it, addressing my own
> comments, because it's been quite a while, and we want to have this fix.
>
> Thanks for spotting it.

Thanks for taking care of it now. I was planning to do a respin, but
the reply got lost in the depth of my inbox so I forgot about it.

      Arnd

^ permalink raw reply

* [patch iproute2 2/2] tc: batch: fix line/line_next processing in batch
From: Jiri Pirko @ 2019-07-23 11:25 UTC (permalink / raw)
  To: netdev; +Cc: sthemmin, dsahern, alexanderk, mlxsw
In-Reply-To: <20190723112538.10977-1-jiri@resnulli.us>

From: Jiri Pirko <jiri@mellanox.com>

When getcmdline fails, there is no valid string in line_next.
So change the flow and don't process it. Alongside with that,
free the previous line buffer and prevent memory leak.

Fixes: 485d0c6001c4 ("tc: Add batchsize feature for filter and actions")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 tc/tc.c | 42 ++++++++++++++++++++++--------------------
 1 file changed, 22 insertions(+), 20 deletions(-)

diff --git a/tc/tc.c b/tc/tc.c
index 64e342dd85bf..8abc82aedcf8 100644
--- a/tc/tc.c
+++ b/tc/tc.c
@@ -325,11 +325,10 @@ static int batch(const char *name)
 {
 	struct batch_buf *head = NULL, *tail = NULL, *buf_pool = NULL;
 	char *largv[100], *largv_next[100];
-	char *line, *line_next = NULL;
+	char *line = NULL, *line_next = NULL;
 	bool bs_enabled = false;
 	bool lastline = false;
 	int largc, largc_next;
-	bool bs_enabled_saved;
 	bool bs_enabled_next;
 	int batchsize = 0;
 	size_t len = 0;
@@ -360,11 +359,13 @@ static int batch(const char *name)
 	largc = makeargs(line, largv, 100);
 	bs_enabled = batchsize_enabled(largc, largv);
 	do {
-		if (getcmdline(&line_next, &len, stdin) == -1)
+		if (getcmdline(&line_next, &len, stdin) == -1) {
 			lastline = true;
-
-		largc_next = makeargs(line_next, largv_next, 100);
-		bs_enabled_next = batchsize_enabled(largc_next, largv_next);
+			bs_enabled_next = false;
+		} else {
+			largc_next = makeargs(line_next, largv_next, 100);
+			bs_enabled_next = batchsize_enabled(largc_next, largv_next);
+		}
 		if (bs_enabled) {
 			struct batch_buf *buf;
 
@@ -389,17 +390,8 @@ static int batch(const char *name)
 		else
 			send = true;
 
-		line = line_next;
-		line_next = NULL;
-		len = 0;
-		bs_enabled_saved = bs_enabled;
-		bs_enabled = bs_enabled_next;
-
-		if (largc == 0) {
-			largc = largc_next;
-			memcpy(largv, largv_next, largc * sizeof(char *));
-			continue;	/* blank line */
-		}
+		if (largc == 0)
+			goto to_next_line;	/* blank line */
 
 		err = do_cmd(largc, largv, tail == NULL ? NULL : tail->buf,
 			     tail == NULL ? 0 : sizeof(tail->buf));
@@ -411,10 +403,8 @@ static int batch(const char *name)
 			if (!force)
 				break;
 		}
-		largc = largc_next;
-		memcpy(largv, largv_next, largc * sizeof(char *));
 
-		if (send && bs_enabled_saved) {
+		if (send && bs_enabled) {
 			struct iovec *iov, *iovs;
 			struct batch_buf *buf;
 			struct nlmsghdr *n;
@@ -438,6 +428,18 @@ static int batch(const char *name)
 			}
 			batchsize = 0;
 		}
+
+to_next_line:
+		if (lastline)
+			continue;
+		free(line);
+		line = line_next;
+		line_next = NULL;
+		len = 0;
+		bs_enabled = bs_enabled_next;
+		largc = largc_next;
+		memcpy(largv, largv_next, largc * sizeof(char *));
+
 	} while (!lastline);
 
 	free_batch_bufs(&buf_pool);
-- 
2.21.0


^ permalink raw reply related

* [patch iproute2 1/2] tc: action: fix crash caused by incorrect *argv check
From: Jiri Pirko @ 2019-07-23 11:25 UTC (permalink / raw)
  To: netdev; +Cc: sthemmin, dsahern, alexanderk, mlxsw

From: Jiri Pirko <jiri@mellanox.com>

One cannot depend on *argv being null in case of no arg is left on the
command line. For example in batch mode, this is not always true. Check
argc instead to prevent crash.

Reported-by: Alex Kushnarov <alexanderk@mellanox.com>
Fixes: fd8b3d2c1b9b ("actions: Add support for user cookies")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 tc/m_action.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tc/m_action.c b/tc/m_action.c
index ab6bc0ad28ff..0f9c3a27795d 100644
--- a/tc/m_action.c
+++ b/tc/m_action.c
@@ -222,7 +222,7 @@ done0:
 				goto bad_val;
 			}
 
-			if (*argv && strcmp(*argv, "cookie") == 0) {
+			if (argc && strcmp(*argv, "cookie") == 0) {
 				size_t slen;
 
 				NEXT_ARG();
-- 
2.21.0


^ permalink raw reply related

* RE: [PATCH net-next 3/3] net: stmmac: Introducing support for Page Pool
From: Jose Abreu @ 2019-07-23 11:22 UTC (permalink / raw)
  To: Robin Murphy, Jose Abreu, Jon Hunter, Lars Persson,
	Ilias Apalodimas
  Cc: Joao Pinto, Alexandre Torgue, Maxime Ripard,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-stm32@st-md-mailman.stormreply.com, Chen-Yu Tsai,
	Maxime Coquelin, linux-tegra, Giuseppe Cavallaro,
	David S . Miller, linux-arm-kernel@lists.infradead.org
In-Reply-To: <6c769226-bdd9-6fe0-b96b-5a0d800fed24@arm.com>

From: Robin Murphy <robin.murphy@arm.com>
Date: Jul/23/2019, 11:29:28 (UTC+00:00)

> On 23/07/2019 11:07, Jose Abreu wrote:
> > From: Jon Hunter <jonathanh@nvidia.com>
> > Date: Jul/23/2019, 11:01:24 (UTC+00:00)
> > 
> >> This appears to be a winner and by disabling the SMMU for the ethernet
> >> controller and reverting commit 954a03be033c7cef80ddc232e7cbdb17df735663
> >> this worked! So yes appears to be related to the SMMU being enabled. We
> >> had to enable the SMMU for ethernet recently due to commit
> >> 954a03be033c7cef80ddc232e7cbdb17df735663.
> > 
> > Finally :)
> > 
> > However, from "git show 954a03be033c7cef80ddc232e7cbdb17df735663":
> > 
> > +         There are few reasons to allow unmatched stream bypass, and
> > +         even fewer good ones.  If saying YES here breaks your board
> > +         you should work on fixing your board.
> > 
> > So, how can we fix this ? Is your ethernet DT node marked as
> > "dma-coherent;" ?
> 
> The first thing to try would be booting the failing setup with 
> "iommu.passthrough=1" (or using CONFIG_IOMMU_DEFAULT_PASSTHROUGH) - if 
> that makes things seem OK, then the problem is likely related to address 
> translation; if not, then it's probably time to start looking at nasties 
> like coherency and ordering, although in principle I wouldn't expect the 
> SMMU to have too much impact there.
> 
> Do you know if the SMMU interrupts are working correctly? If not, it's 
> possible that an incorrect address or mapping direction could lead to 
> the DMA transaction just being silently terminated without any fault 
> indication, which generally presents as inexplicable weirdness (I've 
> certainly seen that on another platform with the mix of an unsupported 
> interrupt controller and an 'imperfect' ethernet driver).
> 
> Just to confirm, has the original patch been tested with 
> CONFIG_DMA_API_DEBUG to rule out any high-level mishaps?

Yes but both my setups don't have any IOMMU: One is x86 + SWIOTLB and 
another is just coherent regarding CPU.

---
Thanks,
Jose Miguel Abreu

^ permalink raw reply

* Re: [PATCH] [net-next] net/mlx5e: xsk: dynamically allocate mlx5e_channel_param
From: Maxim Mikityanskiy @ 2019-07-23 11:21 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Saeed Mahameed, David S. Miller, Alexei Starovoitov, Tariq Toukan,
	Leon Romanovsky, Daniel Borkmann, Jakub Kicinski,
	Jesper Dangaard Brouer, John Fastabend, netdev@vger.kernel.org,
	linux-rdma@vger.kernel.org, linux-kernel@vger.kernel.org,
	xdp-newbies@vger.kernel.org, bpf@vger.kernel.org
In-Reply-To: <543fa599-8ea1-dbc8-d94a-f90af2069edd@mellanox.com>

On 2019-07-08 18:16, Maxim Mikityanskiy wrote:
> On 2019-07-08 15:55, Arnd Bergmann wrote:
>> -	mlx5e_build_xsk_cparam(priv, params, xsk, &cparam);
>> +	cparam = kzalloc(sizeof(*cparam), GFP_KERNEL);
> 
> Similar code in mlx5e_open_channels (en_main.c) uses kvzalloc. Although
> the struct is currently smaller than a page anyway, and there should be
> no difference in behavior now, I suggest using the same alloc function
> to keep code uniform.
> 
>>    	/* Create a dedicated SQ for posting NOPs whenever we need an IRQ to be
>>    	 * triggered and NAPI to be called on the correct CPU.
>>    	 */
>> -	err = mlx5e_open_icosq(c, params, &cparam.icosq, &c->xskicosq);
>> +	err = mlx5e_open_icosq(c, params, &cparam->icosq, &c->xskicosq);
>>    	if (unlikely(err))
>>    		goto err_close_icocq;
>>    
> 
> Here is kfree missing. It's a memory leak in the good path.

Arnd, I'm going to take over your patch and respin it, addressing my own 
comments, because it's been quite a while, and we want to have this fix.

Thanks for spotting it.

^ permalink raw reply

* Re: [PATCH v3 2/7] net: Use skb accessors in network core
From: Matthew Wilcox @ 2019-07-23 11:17 UTC (permalink / raw)
  To: Yunsheng Lin; +Cc: davem, hch, netdev
In-Reply-To: <aa40f270-9f55-323a-2e94-5bd326a7a142@huawei.com>

On Tue, Jul 23, 2019 at 11:56:41AM +0800, Yunsheng Lin wrote:
> > diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> > index 6f1e31f674a3..e32081709a0d 100644
> > --- a/net/core/skbuff.c
> > +++ b/net/core/skbuff.c
> > @@ -2975,11 +2975,15 @@ skb_zerocopy(struct sk_buff *to, struct sk_buff *from, int len, int hlen)
> >  	skb_zerocopy_clone(to, from, GFP_ATOMIC);
> >  
> >  	for (i = 0; i < skb_shinfo(from)->nr_frags; i++) {
> > +		int size;
> > +
> >  		if (!len)
> >  			break;
> >  		skb_shinfo(to)->frags[j] = skb_shinfo(from)->frags[i];
> > -		skb_shinfo(to)->frags[j].size = min_t(int, skb_shinfo(to)->frags[j].size, len);
> > -		len -= skb_shinfo(to)->frags[j].size;
> > +		size = min_t(int, skb_frag_size(&skb_shinfo(to)->frags[j]),
> > +					len);
> 
> It seems skb_frag_size returns unsigned int here, maybe:
> 
> unsigned int size;
> 
> size = min_t(unsigned int, skb_frag_size(&skb_shinfo(to)->frags[j]),
> 
> The original code also do not seem to using the correct min_t, but
> perhaps it is better to clean that up too?

A signed size also doesn't make sense to me, but I wasn't sufficiently
certain to make that change.  Please feel free to send a followup patch
for people to consider.

^ permalink raw reply

* RE: [PATCH net-next 3/3] net: stmmac: Introducing support for Page Pool
From: Jose Abreu @ 2019-07-23 10:49 UTC (permalink / raw)
  To: Jon Hunter, Jose Abreu, Lars Persson, Ilias Apalodimas
  Cc: Joao Pinto, Alexandre Torgue, Maxime Ripard,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-stm32@st-md-mailman.stormreply.com, Chen-Yu Tsai,
	Maxime Coquelin, linux-tegra, Giuseppe Cavallaro,
	David S . Miller, linux-arm-kernel@lists.infradead.org
In-Reply-To: <2ad7bf21-1f1f-db0f-2358-4901b7988b7d@nvidia.com>

From: Jon Hunter <jonathanh@nvidia.com>
Date: Jul/23/2019, 11:38:33 (UTC+00:00)

> 
> On 23/07/2019 11:07, Jose Abreu wrote:
> > From: Jon Hunter <jonathanh@nvidia.com>
> > Date: Jul/23/2019, 11:01:24 (UTC+00:00)
> > 
> >> This appears to be a winner and by disabling the SMMU for the ethernet
> >> controller and reverting commit 954a03be033c7cef80ddc232e7cbdb17df735663
> >> this worked! So yes appears to be related to the SMMU being enabled. We
> >> had to enable the SMMU for ethernet recently due to commit
> >> 954a03be033c7cef80ddc232e7cbdb17df735663.
> > 
> > Finally :)
> > 
> > However, from "git show 954a03be033c7cef80ddc232e7cbdb17df735663":
> > 
> > +         There are few reasons to allow unmatched stream bypass, and
> > +         even fewer good ones.  If saying YES here breaks your board
> > +         you should work on fixing your board.
> > 
> > So, how can we fix this ? Is your ethernet DT node marked as 
> > "dma-coherent;" ?
> 
> TBH I have no idea. I can't say I fully understand your change or how it
> is breaking things for us.
> 
> Currently, the Tegra DT binding does not have 'dma-coherent' set. I see
> this is optional, but I am not sure how you determine whether or not
> this should be set.

From my understanding it means that your device / IP DMA accesses are coherent regarding the CPU point of view. I think it will be the case if GMAC is not behind any kind of IOMMU in the HW arch.

I don't know about this SMMU but the source does have some special 
conditions when device is dma-coherent.

---
Thanks,
Jose Miguel Abreu

^ permalink raw reply

* Re: [PATCH v2 bpf-next 1/4] bpf: unprivileged BPF access via /dev/bpf
From: Lorenz Bauer @ 2019-07-23 10:45 UTC (permalink / raw)
  To: Song Liu
  Cc: Andy Lutomirski, Kees Cook, linux-security@vger.kernel.org,
	Networking, bpf, Alexei Starovoitov, Daniel Borkmann, Kernel Team,
	Jann Horn, Greg KH, Linux API
In-Reply-To: <4A7A225A-6C23-4C0F-9A95-7C6C56B281ED@fb.com>

On Mon, 22 Jul 2019 at 21:54, Song Liu <songliubraving@fb.com> wrote:
>
> Hi Andy, Lorenz, and all,
>
> With 5.3-rc1 out, I am back on this. :)
>
> How about we modify the set as:
>   1. Introduce sys_bpf_with_cap() that takes fd of /dev/bpf.
>   2. Better handling of capable() calls through bpf code. I guess the
>      biggest problem here is is_priv in verifier.c:bpf_check().
>
> With this approach, we will be able to pass the fd around, so it should
> also solve problem for Go.

Thanks for picking this up again. I need to figure out what the API
for this would
look like on the Go side, but I think it's a nice solution!

Lorenz

-- 
Lorenz Bauer  |  Systems Engineer
6th Floor, County Hall/The Riverside Building, SE1 7PB, UK

www.cloudflare.com

^ permalink raw reply

* [PATCH] net: dsa: sja1105: sja1105_main: Add of_node_put()
From: Nishka Dasgupta @ 2019-07-23 10:44 UTC (permalink / raw)
  To: olteanv, andrew, vivien.didelot, f.fainelli, davem, netdev,
	linux-kernel
  Cc: Nishka Dasgupta

Each iteration of for_each_child_of_node puts the previous node, but in
the case of a return from the middle of the loop, there is no put, thus
causing a memory leak. Hence add an of_node_put before the return.
Issue found with Coccinelle.

Signed-off-by: Nishka Dasgupta <nishkadg.linux@gmail.com>
---
 drivers/net/dsa/sja1105/sja1105_main.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/dsa/sja1105/sja1105_main.c b/drivers/net/dsa/sja1105/sja1105_main.c
index 32bf3a7cc3b6..6ed5f1e35789 100644
--- a/drivers/net/dsa/sja1105/sja1105_main.c
+++ b/drivers/net/dsa/sja1105/sja1105_main.c
@@ -625,6 +625,7 @@ static int sja1105_parse_ports_node(struct sja1105_private *priv,
 		if (of_property_read_u32(child, "reg", &index) < 0) {
 			dev_err(dev, "Port number not defined in device tree "
 				"(property \"reg\")\n");
+			of_node_put(child);
 			return -ENODEV;
 		}
 
@@ -634,6 +635,7 @@ static int sja1105_parse_ports_node(struct sja1105_private *priv,
 			dev_err(dev, "Failed to read phy-mode or "
 				"phy-interface-type property for port %d\n",
 				index);
+			of_node_put(child);
 			return -ENODEV;
 		}
 		ports[index].phy_mode = phy_mode;
@@ -643,6 +645,7 @@ static int sja1105_parse_ports_node(struct sja1105_private *priv,
 			if (!of_phy_is_fixed_link(child)) {
 				dev_err(dev, "phy-handle or fixed-link "
 					"properties missing!\n");
+				of_node_put(child);
 				return -ENODEV;
 			}
 			/* phy-handle is missing, but fixed-link isn't.
-- 
2.19.1


^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox