Netdev List


* Re: [PATCH 7/8] tools lib bpf: fix maps resolution
From: Wangnan (F) @ 2016-11-07 18:23 UTC (permalink / raw)
  To: Eric Leblond, netdev; +Cc: linux-kernel, ast, Daniel Borkmann, Joe Stringer
In-Reply-To: <20161016211834.11732-8-eric@regit.org>

Hi Eric,

Are you still working on this patch set?

I now understand why the maps section is not a simple array,
thanks to a patch set from Joe Stringer:

https://www.mail-archive.com/netdev@vger.kernel.org/msg135088.html

So I think this patch is really useful.

Are you going to resend the whole patch set? If not, let me collect
this patch 7/8 into my local code base and send it to Arnaldo
with my other patches.

Thank you.

On 2016/10/17 5:18, Eric Leblond wrote:
> It is not correct to treat the ELF data of the maps section
> as an array of map definitions. In fact the sizes differ. The
> offset provided in the symbol section has to be used instead.
>
> This patch fixes a bug causing an ELF file with two maps not to load
> correctly.
>
> Signed-off-by: Eric Leblond <eric@regit.org>
> ---
>   tools/lib/bpf/libbpf.c | 50 +++++++++++++++++++++++++++++++++++---------------
>   1 file changed, 35 insertions(+), 15 deletions(-)
>
> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> index 1fe4532..f72628b 100644
> --- a/tools/lib/bpf/libbpf.c
> +++ b/tools/lib/bpf/libbpf.c
> @@ -186,6 +186,7 @@ struct bpf_program {
>   struct bpf_map {
>   	int fd;
>   	char *name;
> +	size_t offset;
>   	struct bpf_map_def def;
>   	void *priv;
>   	bpf_map_clear_priv_t clear_priv;
> @@ -529,13 +530,6 @@ bpf_object__init_maps(struct bpf_object *obj, void *data,
>   
>   	pr_debug("maps in %s: %zd bytes\n", obj->path, size);
>   
> -	obj->maps = calloc(nr_maps, sizeof(obj->maps[0]));
> -	if (!obj->maps) {
> -		pr_warning("alloc maps for object failed\n");
> -		return -ENOMEM;
> -	}
> -	obj->nr_maps = nr_maps;
> -
>   	for (i = 0; i < nr_maps; i++) {
>   		struct bpf_map_def *def = &obj->maps[i].def;
>   
> @@ -547,23 +541,42 @@ bpf_object__init_maps(struct bpf_object *obj, void *data,
>   		obj->maps[i].fd = -1;
>   
>   		/* Save map definition into obj->maps */
> -		*def = ((struct bpf_map_def *)data)[i];
> +		*def = *(struct bpf_map_def *)(data + obj->maps[i].offset);
>   	}
>   	return 0;
>   }
>   
>   static int
> -bpf_object__init_maps_name(struct bpf_object *obj)
> +bpf_object__init_maps_symbol(struct bpf_object *obj)
>   {
>   	int i;
> +	int nr_maps = 0;
>   	Elf_Data *symbols = obj->efile.symbols;
> +	size_t map_idx = 0;
>   
>   	if (!symbols || obj->efile.maps_shndx < 0)
>   		return -EINVAL;
>   
> +	/* get the number of maps */
> +	for (i = 0; i < symbols->d_size / sizeof(GElf_Sym); i++) {
> +		GElf_Sym sym;
> +
> +		if (!gelf_getsym(symbols, i, &sym))
> +			continue;
> +		if (sym.st_shndx != obj->efile.maps_shndx)
> +			continue;
> +		nr_maps++;
> +	}
> +
> +	obj->maps = calloc(nr_maps, sizeof(obj->maps[0]));
> +	if (!obj->maps) {
> +		pr_warning("alloc maps for object failed\n");
> +		return -ENOMEM;
> +	}
> +	obj->nr_maps = nr_maps;
> +
>   	for (i = 0; i < symbols->d_size / sizeof(GElf_Sym); i++) {
>   		GElf_Sym sym;
> -		size_t map_idx;
>   		const char *map_name;
>   
>   		if (!gelf_getsym(symbols, i, &sym))
> @@ -574,12 +587,12 @@ bpf_object__init_maps_name(struct bpf_object *obj)
>   		map_name = elf_strptr(obj->efile.elf,
>   				      obj->efile.strtabidx,
>   				      sym.st_name);
> -		map_idx = sym.st_value / sizeof(struct bpf_map_def);
>   		if (map_idx >= obj->nr_maps) {
>   			pr_warning("index of map \"%s\" is buggy: %zu > %zu\n",
>   				   map_name, map_idx, obj->nr_maps);
>   			continue;
>   		}
> +		obj->maps[map_idx].offset = sym.st_value;
>   		obj->maps[map_idx].name = strdup(map_name);
>   		if (!obj->maps[map_idx].name) {
>   			pr_warning("failed to alloc map name\n");
> @@ -587,6 +600,7 @@ bpf_object__init_maps_name(struct bpf_object *obj)
>   		}
>   		pr_debug("map %zu is \"%s\"\n", map_idx,
>   			 obj->maps[map_idx].name);
> +		map_idx++;
>   	}
>   	return 0;
>   }
> @@ -647,8 +661,6 @@ static int bpf_object__elf_collect(struct bpf_object *obj)
>   							data->d_buf,
>   							data->d_size);
>   		else if (strcmp(name, "maps") == 0) {
> -			err = bpf_object__init_maps(obj, data->d_buf,
> -						    data->d_size);
>   			obj->efile.maps_shndx = idx;
>   		} else if (sh.sh_type == SHT_SYMTAB) {
>   			if (obj->efile.symbols) {
> @@ -698,8 +710,16 @@ static int bpf_object__elf_collect(struct bpf_object *obj)
>   		pr_warning("Corrupted ELF file: index of strtab invalid\n");
>   		return LIBBPF_ERRNO__FORMAT;
>   	}
> -	if (obj->efile.maps_shndx >= 0)
> -		err = bpf_object__init_maps_name(obj);
> +	if (obj->efile.maps_shndx >= 0) {
> +		Elf_Data *data;
> +		err = bpf_object__init_maps_symbol(obj);
> +		if (err)
> +			goto out;
> +
> +		scn = elf_getscn(elf, obj->efile.maps_shndx);
> +		data = elf_getdata(scn, 0);
> +		err = bpf_object__init_maps(obj, data->d_buf, data->d_size);
> +	}
>   out:
>   	return err;
>   }
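
The difference between indexing the section as an array and using the
symbol offset can be shown with a minimal sketch. It is not libbpf code:
struct map_def is only a stand-in for struct bpf_map_def, and
struct file_map_def, its "extra" fields, and both helper names are
hypothetical. It assumes an object file whose map entries are larger than
the loader's definition.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for the loader's view of a map definition. */
struct map_def {
	unsigned int type;
	unsigned int key_size;
	unsigned int value_size;
	unsigned int max_entries;
};

/* Hypothetical on-disk entry: the definition plus extra trailing fields. */
struct file_map_def {
	struct map_def def;
	unsigned int extra[2];
};

/* Old approach: treat the section as an array of struct map_def. */
static struct map_def read_by_index(const void *data, size_t i)
{
	struct map_def def;

	memcpy(&def, (const char *)data + i * sizeof(struct map_def),
	       sizeof(def));
	return def;
}

/* Patched approach: use the byte offset taken from the symbol (st_value). */
static struct map_def read_by_offset(const void *data, size_t offset)
{
	struct map_def def;

	memcpy(&def, (const char *)data + offset, sizeof(def));
	return def;
}
```

With two such entries in the section, read_by_index(data, 1) starts
reading inside the first entry's trailing fields, while
read_by_offset(data, sizeof(struct file_map_def)) lands on the second
definition.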


* Re: [PATCH net] bpf: fix htab map destruction when extra reserve is in use
From: David Miller @ 2016-11-07 18:21 UTC (permalink / raw)
  To: daniel; +Cc: netdev, ast, dvyukov
In-Reply-To: <e8a8cc8011d16169eaf604b78583836ab2246f0e.1478213579.git.daniel@iogearbox.net>

From: Daniel Borkmann <daniel@iogearbox.net>
Date: Fri,  4 Nov 2016 00:01:19 +0100

> Commit a6ed3ea65d98 ("bpf: restore behavior of bpf_map_update_elem")
> added an extra per-cpu reserve to the hash table map to restore old
> behaviour from pre prealloc times. When non-prealloc is in use for a
> map, the problem is that once a hash table extra element has been
> linked into the hash-table, and the hash table is destroyed due to
> refcount dropping to zero, then htab_map_free() -> delete_all_elements()
> will walk the whole hash table and drop all elements via htab_elem_free().
> The problem is that the element from the extra reserve is first fed
> to the wrong backend allocator and eventually freed twice.
> 
> Fixes: a6ed3ea65d98 ("bpf: restore behavior of bpf_map_update_elem")
> Reported-by: Dmitry Vyukov <dvyukov@google.com>
> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
> Acked-by: Alexei Starovoitov <ast@kernel.org>

Applied and queued up for -stable, thanks!


* Re: [PATCH net-next 0/2] sfc: enable 4-tuple UDP RSS hashing
From: David Miller @ 2016-11-07 18:20 UTC (permalink / raw)
  To: ecree; +Cc: netdev, linux-net-drivers
In-Reply-To: <ba5aa0bc-277b-5b47-89f8-fad1010b77f9@solarflare.com>

From: Edward Cree <ecree@solarflare.com>
Date: Thu, 3 Nov 2016 22:10:31 +0000

> EF10 based NICs have configurable RSS hash fields, and can be made to take the
> ports into the hash on UDP (they already do so for TCP).  This patch series
> enables this, in order to improve spreading of UDP traffic.

What does the chip do with fragmented traffic?


* Re: [PATCH net] sctp: assign assoc_id earlier in __sctp_connect
From: David Miller @ 2016-11-07 18:19 UTC (permalink / raw)
  To: marcelo.leitner
  Cc: netdev, linux-sctp, vyasevich, nhorman, syzkaller, kcc, glider,
	edumazet, dvyukov, andreyknvl
In-Reply-To: <9df38dcd0323ad92386eb6851a60dc128dd00b4e.1478199530.git.marcelo.leitner@gmail.com>

From: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Date: Thu,  3 Nov 2016 17:03:41 -0200

> sctp_wait_for_connect() currently already holds the asoc to keep it
> alive during the sleep, in case another thread releases it. But Andrey
> Konovalov and Dmitry Vyukov reported a use-after-free in such a
> situation.
> 
> The problem is that __sctp_connect() doesn't get a ref on the asoc and
> will do a read on the asoc after calling sctp_wait_for_connect(), but by
> then another thread may have closed it and the _put in
> sctp_wait_for_connect() will actually release it, causing the
> use-after-free.
> 
> The fix is, instead of doing the read after waiting for the connect, to
> do it beforehand, which avoids this issue as the socket is still locked
> by then. There should be no issue in returning the asoc id in case of
> failure, as the application shouldn't trust that number in such
> situations anyway.
> 
> This issue doesn't exist in sctp_sendmsg() path.
> 
> Reported-by: Dmitry Vyukov <dvyukov@google.com>
> Reported-by: Andrey Konovalov <andreyknvl@google.com>
> Tested-by: Andrey Konovalov <andreyknvl@google.com>
> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>

Applied and queued up for -stable, thanks.


* Re: [PATCH net v2 0/4] net: fix device reference leaks
From: David Miller @ 2016-11-07 18:18 UTC (permalink / raw)
  To: johan
  Cc: f.fainelli, mugunthanvnm, yisen.zhuang, salil.mehta, netdev,
	linux-kernel
In-Reply-To: <1478194822-29545-1-git-send-email-johan@kernel.org>

From: Johan Hovold <johan@kernel.org>
Date: Thu,  3 Nov 2016 18:40:18 +0100

> This series fixes a number of device reference leaks (and one of_node
> leak) due to failure to drop the references taken by bus_find_device()
> and friends.
> 
> Note that the final two patches have been compile tested only.
 ...
> v2
>  - hold reference to cpsw-phy-sel device while accessing private data as
>    requested by David. Also update the commit message. (patch 1/4)
>  - add linux-omap on CC where appropriate

Series applied, thanks.


* Re: [PATCH] net: icmp_route_lookup should use rt dev to determine L3 domain
From: David Miller @ 2016-11-07 18:16 UTC (permalink / raw)
  To: dsa; +Cc: netdev
In-Reply-To: <1478193219-5288-1-git-send-email-dsa@cumulusnetworks.com>

From: David Ahern <dsa@cumulusnetworks.com>
Date: Thu,  3 Nov 2016 10:13:39 -0700

> icmp_send is called in response to some event. The skb may not have
> the device set (skb->dev is NULL), but it is expected to have an rt.
> Update icmp_route_lookup to use the rt on the skb to determine L3
> domain.
> 
> Fixes: 613d09b30f8b ("net: Use VRF device index for lookups on TX")
> Signed-off-by: David Ahern <dsa@cumulusnetworks.com>

"skb_dst(...)->dev" would be more direct and look nicer.  No need to
use skb_rtable() just to walk backwards to the 'dst'.


* Re: [PATCH net-next] net: Update raw socket bind to consider l3 domain
From: David Miller @ 2016-11-07 18:15 UTC (permalink / raw)
  To: dsa; +Cc: netdev
In-Reply-To: <1478190300-14094-1-git-send-email-dsa@cumulusnetworks.com>

From: David Ahern <dsa@cumulusnetworks.com>
Date: Thu,  3 Nov 2016 09:25:00 -0700

> Binding a raw socket to a local address fails if the socket is bound
> to an L3 domain:
> 
>     $ vrf-test  -s -l 10.100.1.2 -R -I red
>     error binding socket: 99: Cannot assign requested address
> 
> Update raw_bind to consider whether sk_bound_dev_if is bound to an L3
> domain and use inet_addr_type_table to look up the address.
> 
> Signed-off-by: David Ahern <dsa@cumulusnetworks.com>

Applied.


* Re: [net-next PATCH 0/3] qdisc and tx_queue_len cleanups for IFF_NO_QUEUE devices
From: David Miller @ 2016-11-07 18:13 UTC (permalink / raw)
  To: brouer; +Cc: netdev, phil, robert, jhs
In-Reply-To: <20161103135534.28737.37657.stgit@firesoul>

From: Jesper Dangaard Brouer <brouer@redhat.com>
Date: Thu, 03 Nov 2016 14:55:56 +0100

> This patchset is a cleanup for IFF_NO_QUEUE devices.  It will
> hopefully help userspace get a more consistent behavior when attaching
> qdisc to such virtual devices.

I'm still thinking about this.

My reservation about this is basically that, since the one known offender
in userspace acknowledged that what it was doing was wrong, and already
fixed it quickly, I see no reason to explicitly accommodate this.


* Re: [PATCH v6 0/7] add NS2 support to bgmac
From: David Miller @ 2016-11-07 18:11 UTC (permalink / raw)
  To: jon.mason
  Cc: robh+dt, mark.rutland, f.fainelli, rafal,
	bcm-kernel-feedback-list, netdev, devicetree, linux-arm-kernel,
	linux-kernel
In-Reply-To: <1478236262-3351-1-git-send-email-jon.mason@broadcom.com>

From: Jon Mason <jon.mason@broadcom.com>
Date: Fri,  4 Nov 2016 01:10:55 -0400

> Changes in v6:
> * Use a common bgmac_phy_connect_direct (per Rafal Milecki) 
> * Rebased on latest net-next
> * Added Reviewed-by to the relevant patches
> 
> 
> Changes in v5:
> * Change a pr_err to netdev_err (per Scott Branden)
> * Reword the lane swap binding documentation (per Andrew Lunn)
> 
> 
> Changes in v4:
> * Actually send out the lane swap binding doc patch (Per Scott Branden)
> * Remove unused #define (Per Andrew Lunn)
> 
> 
> Changes in v3:
> * Clean-up the bgmac DT binding doc (per Rob Herring)
> * Document the lane swap binding and make it generic (Per Andrew Lunn)
> 
> 
> Changes in v2:
> * Remove the PHY power-on (per Andrew Lunn)
> * Misc PHY clean-ups regarding comments and #defines (per Andrew Lunn)
>   This results in none of the original PHY code from Vikas being
>   present.  So, I'm removing him as an author and giving him
>   "Inspired-by" credit.
> * Move PHY lane swapping to PHY driver (per Andrew Lunn and Florian
>   Fainelli)
> * Remove bgmac sleep (per Florian Fainelli)
> * Re-add bgmac chip reset (per Florian Fainelli and Ray Jui)
> * Rebased on latest net-next
> * Added patch for bcm54xx_auxctl_read, which is used in the BCM54810

Series applied, thanks.


* Re: [PATCH net-next 1/5] net: l2tp: fix L2TP_ATTR_UDP_CSUM attribute type
From: David Miller @ 2016-11-07 18:08 UTC (permalink / raw)
  To: asbjorn; +Cc: jchapman, netdev, linux-kernel, shankerwangmiao
In-Reply-To: <20161104224838.7925-1-asbjorn@asbjorn.st>

From: Asbjoern Sloth Toennesen <asbjorn@asbjorn.st>
Date: Fri,  4 Nov 2016 22:48:34 +0000

> L2TP_ATTR_UDP_CSUM is a flag, and gets read with
> nla_get_flag, but it is defined as NLA_U8 in
> the nla_policy.
> 
> It appears that this is only publicly used in
> iproute2, where it's broken, because it's used as
> a NLA_FLAG, and fails validation as a NLA_U8.
> 
> The only place it's used as a NLA_U8 is in
> l2tp_nl_tunnel_send(), but iproute2 again reads that
> as a flag, it's therefore always set. Fortunately
> it is never used for anything, just read.
> 
> CC: Miao Wang <shankerwangmiao@gmail.com>
> Signed-off-by: Asbjoern Sloth Toennesen <asbjorn@asbjorn.st>

This is definitely the wrong way to go about this.

The kernel is everywhere and updating iproute2 is infinitely
easier for users to do than updating the kernel.

And in any event, once exported we really should never change
the API of anything shown to userspace like this.  Just because
you can't find a user out there doesn't mean it doesn't exist.

Please instead fix iproute2 to use u8 attributes for this.

Thanks.
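
The mismatch described in the quoted patch comes down to payload length:
a flag attribute carries no payload and is "set" merely by being present,
while a u8 attribute needs at least one byte. The sketch below is a toy
model of that length check only, not the kernel's netlink validation code.
The enum and the function name are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for the NLA_FLAG and NLA_U8 policy types. */
enum attr_type { ATTR_FLAG, ATTR_U8 };

/*
 * Simplified length check: a flag must have an empty payload,
 * a u8 must carry at least one byte. Returns 1 if acceptable.
 */
static int attr_len_ok(enum attr_type policy, size_t payload_len)
{
	switch (policy) {
	case ATTR_FLAG:
		return payload_len == 0;
	case ATTR_U8:
		return payload_len >= 1;
	}
	return 0;
}
```

Under this model a sender that emits the attribute as a flag (zero-length
payload) is rejected by a receiver whose policy says u8, which matches the
failure reported for iproute2; sending a one-byte payload satisfies the
existing policy without changing the kernel side.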


* Re: stmmac/RTL8211F/Meson GXBB: TX throughput problems
From: Martin Blumenstingl @ 2016-11-07 17:37 UTC (permalink / raw)
  To: Giuseppe CAVALLARO
  Cc: Jerome Brunet, André Roth, Alexandre Torgue, Johnson Leung,
	linux-amlogic, netdev, afaerber
In-Reply-To: <3c21b0a4-43c9-0257-f8f2-c8e02cf94fbf@st.com>

Hi Peppe,

On Mon, Nov 7, 2016 at 11:59 AM, Giuseppe CAVALLARO
<peppe.cavallaro@st.com> wrote:
> In the meantime, I will read again the thread just to see if
> there is something I am missing.
if you are re-reading this thread: please note that there are two
devices in discussion here!
Both are using the Amlogic S905 (GXBB) SoC and both are experiencing
the same issue (Gbit TX issues, RX with Gbit speeds and RX/TX with
100Mbit speed are NOT affected):
- Odroid-C2 (used by Jerome and André Roth)
- Tronsmart Vega S95 Meta (my device)

The (Gbit TX) problem seems to be gone on the Odroid-C2 with Jerome's
patch which disables EEE in drivers/net/phy/realtek.c (at least in his
tests, I don't have that device so I can't verify).
The same problem still appears on my Tronsmart Vega S95 Meta even with
the patched PHY driver.

Unfortunately I don't have a second device to rule out that my
Tronsmart Vega S95 Meta could be broken (not unlikely, I get DDR
errors from time to time in u-boot). Maybe Andreas Faerber can test
ethernet with and without Jerome's patch on one of his Tronsmart
devices.

Regards,
Martin


* Re: [lkp] [net]  af1fee9821: BUG:spinlock_trylock_failure_on_UP_on_CPU
From: Andrew Lunn @ 2016-11-07 17:34 UTC (permalink / raw)
  To: Allan W. Nielsen
  Cc: kernel test robot, Raju Lakkaraju, David S. Miller, LKML, netdev,
	lkp
In-Reply-To: <20161107132714.GA10669@microsemi.com>

On Mon, Nov 07, 2016 at 02:27:14PM +0100, Allan W. Nielsen wrote:
> Hi,
> 
> I tried to get this "lkp" up and running, but I had some trouble getting
> these scripts to work.
> 
> But it seems like it can be reproduced using the provided config file and qemu.
> 
> Here is what I did:
> 
> # reproduce original bug
> git reset --hard af1fee98219992ba2c12441a447719652ed7e983
> mkdir bug-build
> cp config-4.8.0-14895-gaf1fee9 bug-build/.config
> make O=bug-build oldconfig
> make O=bug-build -j8
> qemu-system-x86_64 -enable-kvm -cpu host -smp 2 -m 4G -kernel \
>     ../net-next/bug-build/arch/x86_64/boot/bzImage -nographic
> <see-output-1-below>
> # bug seemed to be re-produced
> 
> 
> # Try previous version
> git reset --hard 32ab0a38f0bd554cc45203ff4fdb6b0fdea6f025
> make O=bug-build oldconfig
> make O=bug-build -j8
> qemu-system-x86_64 -enable-kvm -cpu host -smp 2 -m 4G -kernel \
>     ../net-next/bug-build/arch/x86_64/boot/bzImage -nographic
> <see-output-2-below>
> # bug seemed to disappear
> 
> 
> # Try the buggy revision again - but without MICROSEMI_PHY
> git reset --hard af1fee98219992ba2c12441a447719652ed7e983
> sed -e "/MICROSEMI_PHY/d" -i bug-build/.config
> make O=bug-build oldconfig
> cat bug-build/.config | grep MICROSEMI_PHY
> qemu-system-x86_64 -enable-kvm -cpu host -smp 2 -m 4G -kernel \
>     ../net-next/bug-build/arch/x86_64/boot/bzImage -nographic
> <see-output-3-below>
> # bug still seems to be there...
> 
> 
> Not sure what this tells me, any hints are more than welcome.

If the bug happens without your code being compiled, it cannot be your
code. It suggests the patch is moving code around in such a way as to
trigger the issue, but is not the source of the issue itself. To me
it seems like memory corruption or uninitialised variables in some
other code, or maybe DMA from the stack, which was never allowed but
mostly worked on some platforms, and which the recent change to
virtually mapped stacks has broken.

Your code is off the hook, thanks for the testing you did.

     Andrew


* RE: [PATCH net-next v6 02/10] dpaa_eth: add support for DPAA Ethernet
From: Madalin-Cristian Bucur @ 2016-11-07 16:59 UTC (permalink / raw)
  To: David Miller
  Cc: pebolle@tiscali.nl, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, ppc@mindchasers.com,
	oss@buserror.net, joe@perches.com, linuxppc-dev@lists.ozlabs.org
In-Reply-To: <20161107.113941.669208733868640796.davem@davemloft.net>

> From: David Miller [mailto:davem@davemloft.net]
> Sent: Monday, November 07, 2016 6:40 PM
> 
> From: Madalin-Cristian Bucur <madalin.bucur@nxp.com>
> Date: Mon, 7 Nov 2016 16:32:16 +0000
> 
> >> From: David Miller [mailto:davem@davemloft.net]
> >> Sent: Monday, November 07, 2016 5:55 PM
> >>
> >> From: Madalin-Cristian Bucur <madalin.bucur@nxp.com>
> >> Date: Mon, 7 Nov 2016 15:43:26 +0000
> >>
> >> >> From: David Miller [mailto:davem@davemloft.net]
> >> >> Sent: Thursday, November 03, 2016 9:58 PM
> >> >>
> >> >> Why?  By clearing this, you disallow an important fundamental way to
> >> >> do performance testing, via pktgen.
> >> >
> >> > The Tx path in DPAA requires one to insert a back-pointer to the skb
> >> > into
> >> > the Tx buffer. On the Tx confirmation path the back-pointer in the
> >> > buffer
> >> > is used to release the skb. If Tx buffer is shared we'd alter the
> >> > back-pointer
> >> > and leak/double free skbs. See also
> >>
> >> Then have your software state store an array of SKB pointers, one for
> each
> >> TX ring entry, just like every other driver does.
> >
> > There is no Tx ring in DPAA. Frames are sent out on QMan HW queues
> > towards the FMan for Tx and then received back on Tx confirmation queues
> > for cleanup.
> > Array traversal would for sure cost more than using the back-pointer.
> > Also, we can now process confirmations on a different core than the one
> > doing Tx,
> > we'd have to keep the arrays percpu and force the Tx conf on the same
> > core. Or add locks.
> 
> Report back an integer index, like every scsi driver out there which
> completes tagged queued block I/O operations asynchronously.  You can
> associate the array with a specific TX confirmation queue.

From HW? It only gives you back the buffer start address (plus length, etc.).
"buff_2_skb()" needs to be solved in SW, either expensively using an array
(or lists, as the number of frames in flight can be large and variable) or
cheaply with the back-pointer. The back-pointer approach has its tradeoffs:
no shared skbs and an imposed non-zero needed_headroom.


* [PATCH 2/2] [v2] net: qcom/emac: enable flow control if requested
From: Timur Tabi @ 2016-11-07 16:51 UTC (permalink / raw)
  To: David Miller, Florian Fainelli, alokc, netdev
In-Reply-To: <1478537501-23454-1-git-send-email-timur@codeaurora.org>

If the PHY has been configured to allow pause frames, then the MAC
should be configured to generate and/or accept those frames.

Signed-off-by: Timur Tabi <timur@codeaurora.org>
---

v2: fix calculation when TXFC should be set

 drivers/net/ethernet/qualcomm/emac/emac-mac.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/qualcomm/emac/emac-mac.c b/drivers/net/ethernet/qualcomm/emac/emac-mac.c
index 70a55dc..0b4deb3 100644
--- a/drivers/net/ethernet/qualcomm/emac/emac-mac.c
+++ b/drivers/net/ethernet/qualcomm/emac/emac-mac.c
@@ -575,10 +575,11 @@ void emac_mac_start(struct emac_adapter *adpt)
 
 	mac |= TXEN | RXEN;     /* enable RX/TX */
 
-	/* We don't have ethtool support yet, so force flow-control mode
-	 * to 'full' always.
-	 */
-	mac |= TXFC | RXFC;
+	/* Configure MAC flow control to match the PHY's settings. */
+	if (phydev->pause)
+		mac |= RXFC;
+	if (phydev->pause != phydev->asym_pause)
+		mac |= TXFC;
 
 	/* setup link speed */
 	mac &= ~SPEED_MASK;
-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm
Technologies, Inc.  Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.
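
The two tests in the hunk above can be tabulated with a small sketch. It
only mirrors the patch's logic and makes no claim about the full pause
resolution rules; the bit values are placeholders (the real TXFC and RXFC
definitions live in the driver) and flow_control_bits() is a hypothetical
helper name.

```c
#include <assert.h>
#include <stdbool.h>

/* Placeholder bit values, not the driver's register layout. */
#define TXFC (1u << 0)
#define RXFC (1u << 1)

/*
 * Same two tests as the patch: RXFC follows pause, and TXFC is set
 * when pause and asym_pause differ.
 */
static unsigned int flow_control_bits(bool pause, bool asym_pause)
{
	unsigned int mac = 0;

	if (pause)
		mac |= RXFC;
	if (pause != asym_pause)
		mac |= TXFC;
	return mac;
}
```

The four input combinations give: neither bit for (0, 0), TXFC only for
(0, 1), both bits for (1, 0), and RXFC only for (1, 1).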


* [PATCH 1/2] net: qcom/emac: configure the external phy to allow pause frames
From: Timur Tabi @ 2016-11-07 16:51 UTC (permalink / raw)
  To: David Miller, Florian Fainelli, alokc, netdev
In-Reply-To: <1478537501-23454-1-git-send-email-timur@codeaurora.org>

Pause frames are used to enable flow control.  A MAC can send and
receive pause frames in order to throttle traffic.  However, the PHY
must be configured to allow those frames to pass through.

Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Timur Tabi <timur@codeaurora.org>
---
 drivers/net/ethernet/qualcomm/emac/emac-mac.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/net/ethernet/qualcomm/emac/emac-mac.c b/drivers/net/ethernet/qualcomm/emac/emac-mac.c
index 6fb3bee..70a55dc 100644
--- a/drivers/net/ethernet/qualcomm/emac/emac-mac.c
+++ b/drivers/net/ethernet/qualcomm/emac/emac-mac.c
@@ -1003,6 +1003,12 @@ int emac_mac_up(struct emac_adapter *adpt)
 	writel((u32)~DIS_INT, adpt->base + EMAC_INT_STATUS);
 	writel(adpt->irq.mask, adpt->base + EMAC_INT_MASK);
 
+	/* Enable pause frames.  Without this feature, the EMAC has been shown
+	 * to receive (and drop) frames with FCS errors at gigabit connections.
+	 */
+	adpt->phydev->supported |= SUPPORTED_Pause | SUPPORTED_Asym_Pause;
+	adpt->phydev->advertising |= SUPPORTED_Pause | SUPPORTED_Asym_Pause;
+
 	adpt->phydev->irq = PHY_IGNORE_INTERRUPT;
 	phy_start(adpt->phydev);
 
-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm
Technologies, Inc.  Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.


* [PATCH 0/2] net: qcom/emac: ensure that pause frames are enabled
From: Timur Tabi @ 2016-11-07 16:51 UTC (permalink / raw)
  To: David Miller, Florian Fainelli, alokc, netdev

The qcom emac driver experiences significant packet loss (through frame
check sequence errors) if flow control is not enabled and the phy is
not configured to allow pause frames to pass through it.  Therefore, we
need to enable flow control and force the phy to pass pause frames.

Timur Tabi (2):
  net: qcom/emac: configure the external phy to allow pause frames
  [v2] net: qcom/emac: enable flow control if requested

 drivers/net/ethernet/qualcomm/emac/emac-mac.c | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm
Technologies, Inc.  Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.


* Re: [PATCH net-next v6 02/10] dpaa_eth: add support for DPAA Ethernet
From: David Miller @ 2016-11-07 16:39 UTC (permalink / raw)
  To: madalin.bucur
  Cc: netdev, linuxppc-dev, linux-kernel, oss, ppc, joe, pebolle,
	joakim.tjernlund
In-Reply-To: <AM4PR04MB1604B2393D5CE6607204800EECA70@AM4PR04MB1604.eurprd04.prod.outlook.com>

From: Madalin-Cristian Bucur <madalin.bucur@nxp.com>
Date: Mon, 7 Nov 2016 16:32:16 +0000

>> -----Original Message-----
>> From: David Miller [mailto:davem@davemloft.net]
>> Sent: Monday, November 07, 2016 5:55 PM
>> 
>> From: Madalin-Cristian Bucur <madalin.bucur@nxp.com>
>> Date: Mon, 7 Nov 2016 15:43:26 +0000
>> 
>> >> From: David Miller [mailto:davem@davemloft.net]
>> >> Sent: Thursday, November 03, 2016 9:58 PM
>> >>
>> >> Why?  By clearing this, you disallow an important fundamental way to do
>> >> performance testing, via pktgen.
>> >
>> > The Tx path in DPAA requires one to insert a back-pointer to the skb
>> into
>> > the Tx buffer. On the Tx confirmation path the back-pointer in the
>> buffer
>> > is used to release the skb. If Tx buffer is shared we'd alter the back-
>> pointer
>> > and leak/double free skbs. See also
>> 
>> Then have your software state store an array of SKB pointers, one for each
>> TX ring entry, just like every other driver does.
> 
> There is no Tx ring in DPAA. Frames are sent out on QMan HW queues towards
> the FMan for Tx and then received back on Tx confirmation queues for cleanup.
> Array traversal would for sure cost more than using the back-pointer. Also,
> we can now process confirmations on a different core than the one doing Tx,
> we'd have to keep the arrays percpu and force the Tx conf on the same core.
> Or add locks.

Report back an integer index, like every scsi driver out there which
completes tagged queued block I/O operations asynchronously.  You can
associate the array with a specific TX confirmation queue.
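
The index-based scheme suggested here can be sketched as a small tag
table. This is an illustration under stated assumptions, not DPAA or SCSI
code: struct tag_table, tag_alloc() and tag_complete() are hypothetical
names, the slots hold plain pointers where a driver would keep skb
pointers, and it assumes the completion path is able to report the index
back.

```c
#include <assert.h>
#include <stddef.h>

#define NR_TAGS 4

/* Hypothetical table, one per Tx confirmation queue. */
struct tag_table {
	void *slot[NR_TAGS];
};

/* Claim a free slot for buf; return its index, or -1 if all are in flight. */
static int tag_alloc(struct tag_table *t, void *buf)
{
	for (int i = 0; i < NR_TAGS; i++) {
		if (!t->slot[i]) {
			t->slot[i] = buf;
			return i;
		}
	}
	return -1;
}

/* Completion: map the reported index back to the buffer and free the slot. */
static void *tag_complete(struct tag_table *t, int tag)
{
	void *buf;

	if (tag < 0 || tag >= NR_TAGS)
		return NULL;
	buf = t->slot[tag];
	t->slot[tag] = NULL;
	return buf;
}
```

The lookup on completion is a single array access; the cost sits in
finding a free slot at transmit time and in whatever locking is needed
when transmit and confirmation run on different cores.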


* RE: [PATCH net-next v6 02/10] dpaa_eth: add support for DPAA Ethernet
From: Madalin-Cristian Bucur @ 2016-11-07 16:32 UTC (permalink / raw)
  To: David Miller
  Cc: pebolle@tiscali.nl, joakim.tjernlund@transmode.se,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	ppc@mindchasers.com, oss@buserror.net, joe@perches.com,
	linuxppc-dev@lists.ozlabs.org
In-Reply-To: <20161107.105500.43380129278294700.davem@davemloft.net>

> -----Original Message-----
> From: David Miller [mailto:davem@davemloft.net]
> Sent: Monday, November 07, 2016 5:55 PM
> 
> From: Madalin-Cristian Bucur <madalin.bucur@nxp.com>
> Date: Mon, 7 Nov 2016 15:43:26 +0000
> 
> >> From: David Miller [mailto:davem@davemloft.net]
> >> Sent: Thursday, November 03, 2016 9:58 PM
> >>
> >> Why?  By clearing this, you disallow an important fundamental way to do
> >> performance testing, via pktgen.
> >
> > The Tx path in DPAA requires one to insert a back-pointer to the skb
> into
> > the Tx buffer. On the Tx confirmation path the back-pointer in the
> buffer
> > is used to release the skb. If Tx buffer is shared we'd alter the back-
> pointer
> > and leak/double free skbs. See also
> 
> Then have your software state store an array of SKB pointers, one for each
> TX ring entry, just like every other driver does.

There is no Tx ring in DPAA. Frames are sent out on QMan HW queues towards
the FMan for Tx and then received back on Tx confirmation queues for cleanup.
Array traversal would for sure cost more than using the back-pointer. Also,
we can now process confirmations on a different core than the one doing Tx,
we'd have to keep the arrays percpu and force the Tx conf on the same core.
Or add locks.

Madalin


* Re: [PATCH] [RFC] net: phy: phy drivers should not set SUPPORTED_Pause or SUPPORTED_Asym_Pause
From: Timur Tabi @ 2016-11-07 16:30 UTC (permalink / raw)
  To: Florian Fainelli, David Miller, netdev
In-Reply-To: <aa0224f1-814a-d281-ad26-6c8086e40658@gmail.com>

On 11/01/2016 01:35 PM, Florian Fainelli wrote:
> So in principle, this is good, and is exactly what I have in mind for the
> series that I am cooking, but if we apply this alone, without a change
> in drivers/net/phy/phy.c which adds SUPPORTED_Pause |
> SUPPORTED_Asym_Pause to phydev->features, we are basically breaking the
> Ethernet MAC drivers that don't explicitly override phydev->features and
> yet rely on that to get flow control to work.

Can you tell me where I should set the SUPPORTED_Pause and
SUPPORTED_Asym_Pause flags?  I have two candidates:

1. The settings[] array.  Add these flags to each entry.

2. In phy_sanitize_settings().  Add

	phydev->supported |= SUPPORTED_Pause | SUPPORTED_Asym_Pause;

at the end of the function.

I still don't fully understand how these flags really work, because I
just can't shake the feeling that they should not be set for every phy.
If these flags are supposed to be turned on universally, then why are
they even an option?

-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm
Technologies, Inc.  Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.


* Re: [PATCH net-next v6 02/10] dpaa_eth: add support for DPAA Ethernet
From: Joakim Tjernlund @ 2016-11-07 16:25 UTC (permalink / raw)
  To: netdev@vger.kernel.org, madalin.bucur@nxp.com
  Cc: joe@perches.com, linux-kernel@vger.kernel.org, oss@buserror.net,
	linuxppc-dev@lists.ozlabs.org, pebolle@tiscali.nl,
	ppc@mindchasers.com, davem@davemloft.net
In-Reply-To: <1478117854-8952-3-git-send-email-madalin.bucur@nxp.com>

On Wed, 2016-11-02 at 22:17 +0200, Madalin Bucur wrote:
> This introduces the Freescale Data Path Acceleration Architecture
> (DPAA) Ethernet driver (dpaa_eth) that builds upon the DPAA QMan,
> BMan, PAMU and FMan drivers to deliver Ethernet connectivity on
> the Freescale DPAA QorIQ platforms.

Nice to see DPAA support soon entering the kernel (not a day too early :)
I would like to see BQL supported from day one though, if possible.

 Regards
          Joakim Tjernlund


* Re: [PATCH net] Fixes: 5943634fc559 ("ipv4: Maintain redirect and PMTU info in struct rtable again.")
From: David Miller @ 2016-11-07 16:20 UTC (permalink / raw)
  To: eric.dumazet; +Cc: stephen.suryaputra.lin, netdev, ssurya
In-Reply-To: <1478534932.17367.2.camel@edumazet-glaptop3.roam.corp.google.com>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Mon, 07 Nov 2016 08:08:52 -0800

> In any case, rt is a shared object at that time, so even temporarily
> clearing/restoring rt_gateway seems wrong to me.
> 
> I would rather call __ipv4_neigh_lookup(dst->dev, new_gw) directly at
> this point.

Agreed.


* Re: [PATCH 00/12] xen: add common function for reading optional value
From: Jarkko Sakkinen @ 2016-11-07 16:20 UTC (permalink / raw)
  To: David Vrabel
  Cc: Juergen Gross, linux-fbdev-u79uwXL29TY76Z2rM5mHXA,
	wei.liu2-Sxgqhf6Nn4DQT0dZR+AlfA,
	konrad.wilk-QHcLZuEGTsvQT0dZR+AlfA,
	linux-pci-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
	tomi.valkeinen-l0cyMroinI0,
	dmitry.torokhov-Re5JQEeQqe8AvxtiuMwx3w,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	xen-devel-GuqFBffKawuEi8DpZVb4nw,
	tpmdd-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f,
	linux-input-u79uwXL29TY76Z2rM5mHXA,
	bhelgaas-hpIqsD4AKlfQT0dZR+AlfA,
	boris.ostrovsky-QHcLZuEGTsvQT0dZR+AlfA,
	paul.durrant-Sxgqhf6Nn4DQT0dZR+AlfA,
	roger.pau-Sxgqhf6Nn4DQT0dZR+AlfA
In-Reply-To: <8d314b86-4a28-5628-2a79-842a2fafc4c1-Sxgqhf6Nn4DQT0dZR+AlfA@public.gmane.org>

On Mon, Nov 07, 2016 at 11:08:09AM +0000, David Vrabel wrote:
> On 31/10/16 16:48, Juergen Gross wrote:
> > There are multiple instances of code reading an optional unsigned
> > parameter from Xenstore via xenbus_scanf(). Instead of repeating the
> > same code over and over add a service function doing the job and
> > replace the call of xenbus_scanf() with the call of the new function
> > where appropriate.
> 
> Acked-by: David Vrabel <david.vrabel-Sxgqhf6Nn4DQT0dZR+AlfA@public.gmane.org>
> 
> Please queue for the next release.

If you want this change to go through tpmdd, please resend it to the
tpmdd mailing list and CC linux-security-module. Thanks.

> David

/Jarkko
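
The "optional unsigned parameter" helper described in the quoted cover letter can be modelled in userspace roughly as follows. This is only a sketch of the pattern, not the function from the series: read_optional_unsigned() and its arguments are hypothetical names, and a plain string stands in for the value that xenbus_scanf() would fetch from Xenstore.

```c
#include <stdio.h>

/*
 * Hypothetical helper: parse an optional unsigned value.
 * "value" stands in for the Xenstore node contents; NULL models a
 * missing node. Returns default_val when the node is absent or does
 * not parse as an unsigned number.
 */
static unsigned int read_optional_unsigned(const char *value,
					   unsigned int default_val)
{
	unsigned int val;

	if (!value || sscanf(value, "%u", &val) != 1)
		return default_val;

	return val;
}
```

Callers then collapse the usual "scan, check the return value, fall back to a default" sequence into one call, e.g. read_optional_unsigned(node, 1).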

^ permalink raw reply

* Re: [PATCH net] Fixes: 5943634fc559 ("ipv4: Maintain redirect and PMTU info in struct rtable again.")
From: Eric Dumazet @ 2016-11-07 16:08 UTC (permalink / raw)
  To: Stephen Suryaputra Lin; +Cc: netdev, Stephen Suryaputra Lin
In-Reply-To: <1478531058-12701-1-git-send-email-ssurya@ieee.org>

On Mon, 2016-11-07 at 10:04 -0500, Stephen Suryaputra Lin wrote:
> ICMP redirect behavior is different after the commit above. An email
> requesting an explanation of why the behavior needs to be different
> was sent earlier to netdev (https://patchwork.ozlabs.org/patch/687728/).
> Since there isn't a reply yet, I decided to prepare this formal patch.
> 
> In v2.6 kernel, it used to be that ip_rt_redirect() calls
> arp_bind_neighbour() which returns 0 and then the state of the neigh for
> the new_gw is checked. If the state isn't valid then the redirected
> route is deleted. This behavior is maintained up to v3.5.7 by
> check_peer_redirect() because rt->rt_gateway is assigned to
> peer->redirect_learned.a4 before calling ipv4_neigh_lookup().
> 
> After the commit, ipv4_neigh_lookup() is performed without the
> rt_gateway assigned to the new_gw. In the case when rt_gateway (old_gw)
> isn't zero, the function uses it as the key. The neigh is most likely valid
> since the old_gw is the one that sends the ICMP redirect message. Then the
> new_gw is assigned to fib_nh_exception. The problem is: the new_gw ARP may
> never get resolved and the traffic is blackholed.
> 
> Signed-off-by: Stephen Suryaputra Lin <ssurya@ieee.org>
> ---
>  net/ipv4/route.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/net/ipv4/route.c b/net/ipv4/route.c
> index 62d4d90c1389..510045cefcab 100644
> --- a/net/ipv4/route.c
> +++ b/net/ipv4/route.c
> @@ -753,7 +753,9 @@ static void __ip_do_redirect(struct rtable *rt, struct sk_buff *skb, struct flow
>  			goto reject_redirect;
>  	}
>  
> +	rt->rt_gateway = 0;
>  	n = ipv4_neigh_lookup(&rt->dst, NULL, &new_gw);
> +	rt->rt_gateway = old_gw;
>  	if (!IS_ERR(n)) {
>  		if (!(n->nud_state & NUD_VALID)) {
>  			neigh_event_send(n, NULL);

In any case, rt is a shared object at that time, so even temporarily
clearing/restoring rt_gateway seems wrong to me.

I would rather call __ipv4_neigh_lookup(dst->dev, new_gw) directly at
this point.
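
A small userspace model of the difference between the two approaches, assuming (as the quoted commit message says) that the route-based lookup uses a non-zero rt_gateway as the key. The structures and function names below are hypothetical stand-ins, not the kernel's struct rtable or neighbour API.

```c
#include <stdint.h>

/* Stand-in for struct rtable: shared, so callers must not modify it. */
struct route {
	uint32_t gateway;	/* current (old) gateway, 0 if none */
	int ifindex;		/* stand-in for dst->dev */
};

/* Hypothetical stand-in for a neighbour-table key. */
struct neigh_key {
	int ifindex;
	uint32_t addr;
};

/* Route-based lookup: a non-zero gateway overrides the caller's address. */
static struct neigh_key lookup_via_route(const struct route *rt,
					 uint32_t daddr)
{
	struct neigh_key k = { rt->ifindex, rt->gateway ? rt->gateway : daddr };

	return k;
}

/* Direct lookup: the key is exactly what the caller passes in. */
static struct neigh_key lookup_direct(int ifindex, uint32_t addr)
{
	struct neigh_key k = { ifindex, addr };

	return k;
}
```

With the direct form, the lookup is keyed on new_gw without ever writing to the shared route, which is what clearing and restoring rt_gateway tried to achieve.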

^ permalink raw reply

* Re: [net PATCH] fib_trie: Correct /proc/net/route off by one error
From: Jason Baron @ 2016-11-07 16:03 UTC (permalink / raw)
  To: Alexander Duyck, netdev, alexander.duyck; +Cc: Andy Whitcroft, davem
In-Reply-To: <20161104191157.13974.70665.stgit@ahduyck-blue-test.jf.intel.com>



On 11/04/2016 03:11 PM, Alexander Duyck wrote:
> The display of /proc/net/route has had a couple issues due to the fact that
> when I originally rewrote most of fib_trie I made it so that the iterator
> was tracking the next value to use instead of the current.
>
> In addition it had an off by 1 error where I was tracking the first piece
> of data as position 0, even though in reality that belonged to the
> SEQ_START_TOKEN.
>
> This patch updates the code so the iterator tracks the last reported
> position and key instead of the next expected position and key.  In
> addition it shifts things so that all of the leaves start at 1 instead of
> trying to report leaves starting with offset 0 as being valid.  With these
> two issues addressed this should resolve any off by one errors that were
> present in the display of /proc/net/route.
>
> Fixes: 25b97c016b26 ("ipv4: off-by-one in continuation handling in /proc/net/route")
> Cc: Andy Whitcroft <apw@canonical.com>
> Reported-by: Jason Baron <jbaron@akamai.com>
> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
> ---
>  net/ipv4/fib_trie.c |   21 +++++++++------------
>  1 file changed, 9 insertions(+), 12 deletions(-)
>

Ok. Works for me.

Feel free to add:
Reviewed-and-Tested-by: Jason Baron <jbaron@akamai.com>

Thanks,

-Jason
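
For readers following along, the position convention described in the quoted changelog can be sketched in userspace like this. It is an illustration under assumptions, not the fib_trie code: the leaf table, START_TOKEN and the route_iter fields are hypothetical. Position 0 belongs to the header token, leaves start at position 1, and the iterator records the last reported position and key rather than the next expected ones.

```c
#include <stddef.h>

#define START_TOKEN ((const int *)1)	/* stand-in for SEQ_START_TOKEN */

static const int leaves[] = { 10, 20, 30 };	/* hypothetical sorted keys */
#define NR_LEAVES (sizeof(leaves) / sizeof(leaves[0]))

struct route_iter {
	size_t pos;	/* last reported position; 0 = header only */
	int key;	/* key of the last reported leaf */
};

/* Position 0 is the header, so starting resets the iterator. */
static const int *route_seq_start(struct route_iter *it)
{
	it->pos = 0;
	it->key = 0;
	return START_TOKEN;
}

/* Report the first leaf after the last reported one, then record it. */
static const int *route_seq_next(struct route_iter *it)
{
	size_t i;

	for (i = 0; i < NR_LEAVES; i++) {
		if (it->pos == 0 || leaves[i] > it->key) {
			it->pos++;
			it->key = leaves[i];
			return &leaves[i];
		}
	}
	return NULL;	/* no more leaves; pos/key stay at the last one */
}
```

Because pos and key always describe an entry that was actually emitted, resuming a partial read continues strictly after that entry instead of skipping or repeating one.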

^ permalink raw reply

* Re: [PATCH net-next v6 02/10] dpaa_eth: add support for DPAA Ethernet
From: David Miller @ 2016-11-07 15:55 UTC (permalink / raw)
  To: madalin.bucur
  Cc: netdev, linuxppc-dev, linux-kernel, oss, ppc, joe, pebolle,
	joakim.tjernlund
In-Reply-To: <AM4PR04MB16049FF6E433DA445828A855ECA70@AM4PR04MB1604.eurprd04.prod.outlook.com>

From: Madalin-Cristian Bucur <madalin.bucur@nxp.com>
Date: Mon, 7 Nov 2016 15:43:26 +0000

>> From: David Miller [mailto:davem@davemloft.net]
>> Sent: Thursday, November 03, 2016 9:58 PM
>> 
>> Why?  By clearing this, you disallow an important fundamental way to do
>> performance testing, via pktgen.
> 
> The Tx path in DPAA requires one to insert a back-pointer to the skb into
> the Tx buffer. On the Tx confirmation path the back-pointer in the buffer
> is used to release the skb. If the Tx buffer is shared we'd alter the back-pointer
> and leak/double free skbs. See also 

Then have your software state store an array of SKB pointers, one for each
TX ring entry, just like every other driver does.
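
A minimal userspace sketch of that layout, under stated assumptions: struct pkt stands in for struct sk_buff, the ring size is arbitrary, and confirmations are assumed to arrive in submission order (which may not hold for DPAA hardware). All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define TX_RING_SIZE 8	/* hypothetical ring size */

/* Stand-in for struct sk_buff; only the reference count matters here. */
struct pkt {
	int refs;
};

/* Hypothetical per-queue software state kept by the driver. */
struct tx_ring {
	struct pkt *pkts[TX_RING_SIZE];	/* one slot per TX ring entry */
	unsigned int head;		/* next slot to fill */
	unsigned int tail;		/* next slot to confirm */
};

/* Queue a packet: record it in driver state, leave the buffer untouched. */
static int tx_enqueue(struct tx_ring *ring, struct pkt *p)
{
	if (ring->head - ring->tail == TX_RING_SIZE)
		return -1;	/* ring full */

	ring->pkts[ring->head % TX_RING_SIZE] = p;
	ring->head++;
	return 0;
}

/* TX confirmation: find the packet by ring index, drop one reference. */
static struct pkt *tx_confirm(struct tx_ring *ring)
{
	struct pkt *p;

	if (ring->head == ring->tail)
		return NULL;	/* nothing outstanding */

	p = ring->pkts[ring->tail % TX_RING_SIZE];
	assert(p != NULL);
	ring->pkts[ring->tail % TX_RING_SIZE] = NULL;
	ring->tail++;
	p->refs--;
	return p;
}
```

Since the pointer lives in the ring slot rather than in the packet data, the same packet can be queued several times (as pktgen does with cloned skbs) without one transmission overwriting another's bookkeeping.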

^ permalink raw reply
