Netdev List
 help / color / mirror / Atom feed
* [PATCH net] net: adjust skb->truesize in ___pskb_trim()
From: Eric Dumazet @ 2017-04-26 16:07 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Andrey Konovalov, Willem de Bruijn

From: Eric Dumazet <edumazet@google.com>

Andrey found a way to trigger the WARN_ON_ONCE(delta < len) in
skb_try_coalesce() using syzkaller and a filter attached to a TCP
socket.

As we did recently in commit 158f323b9868 ("net: adjust skb->truesize in
pskb_expand_head()") we can adjust skb->truesize from ___pskb_trim(),
via a call to skb_condense().

If all frags were freed, then skb->truesize can be recomputed.

This call can be done if skb is not yet owned, or destructor is
sock_edemux().

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Cc: Willem de Bruijn <willemb@google.com>
---
 net/core/skbuff.c |    2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index f86bf69cfb8d8bc17262cdba5d9f57a4726cd476..f1d04592ace02f32efa6e05df89c9a5e0023157f 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -1576,6 +1576,8 @@ int ___pskb_trim(struct sk_buff *skb, unsigned int len)
 		skb_set_tail_pointer(skb, len);
 	}
 
+	if (!skb->sk || skb->destructor == sock_edemux)
+		skb_condense(skb);
 	return 0;
 }
 EXPORT_SYMBOL(___pskb_trim);

^ permalink raw reply related

* [PATCH net-next] bpf: restore skb->sk before pskb_trim() call
From: Eric Dumazet @ 2017-04-26 16:09 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Andrey Konovalov, Willem de Bruijn

From: Eric Dumazet <edumazet@google.com>

While testing a fix [1] in ___pskb_trim(), addressing the WARN_ON_ONCE()
in skb_try_coalesce() reported by Andrey, I found that we had an skb
with skb->sk set but no skb->destructor.

This invalidated heuristic found in commit 158f323b9868 ("net: adjust
skb->truesize in pskb_expand_head()") and in cited patch.

Considering the BUG_ON(skb->sk) we have in skb_orphan(), we should
restrain the temporary setting to a minimal section.

[1] https://patchwork.ozlabs.org/patch/755570/ 
    net: adjust skb->truesize in ___pskb_trim()

Fixes: 8f917bba0042 ("bpf: pass sk to helper functions")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
---
 net/core/filter.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/core/filter.c b/net/core/filter.c
index 9a37860a80fc78378705b681ec3b0468824cbcf4..a253a6197e6b37a7ae2fe451c646b01c861a3e22 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -98,8 +98,8 @@ int sk_filter_trim_cap(struct sock *sk, struct sk_buff *skb, unsigned int cap)
 
 		skb->sk = sk;
 		pkt_len = bpf_prog_run_save_cb(filter->prog, skb);
-		err = pkt_len ? pskb_trim(skb, max(cap, pkt_len)) : -EPERM;
 		skb->sk = save_sk;
+		err = pkt_len ? pskb_trim(skb, max(cap, pkt_len)) : -EPERM;
 	}
 	rcu_read_unlock();
 

^ permalink raw reply related

* Re: [Patch net 3/3] team: use a larger struct for mac address
From: Cong Wang @ 2017-04-26 16:10 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: Linux Kernel Network Developers, Andrey Konovalov
In-Reply-To: <20170426054033.GA1867@nanopsycho.orion>

On Tue, Apr 25, 2017 at 10:40 PM, Jiri Pirko <jiri@resnulli.us> wrote:
> Wed, Apr 26, 2017 at 07:03:23AM CEST, xiyou.wangcong@gmail.com wrote:
>>IPv6 tunnels use sizeof(struct in6_addr) as dev->addr_len,
>>but in many places especially bonding, we use struct sockaddr
>>to copy and set mac addr, this could lead to stack out-of-bounds
>>access.
>>
>>Fix it by using a larger address storage.
>>
>>Reported-by: Andrey Konovalov <andreyknvl@google.com>
>>Cc: Jiri Pirko <jiri@resnulli.us>
>>Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
>>---
>> drivers/net/team/team.c | 9 ++++++---
>> 1 file changed, 6 insertions(+), 3 deletions(-)
>>
>>diff --git a/drivers/net/team/team.c b/drivers/net/team/team.c
>>index 85c0124..88878f1 100644
>>--- a/drivers/net/team/team.c
>>+++ b/drivers/net/team/team.c
>>@@ -60,10 +60,13 @@ static struct team_port *team_port_get_rtnl(const struct net_device *dev)
>> static int __set_port_dev_addr(struct net_device *port_dev,
>>                              const unsigned char *dev_addr)
>> {
>>-      struct sockaddr addr;
>>+      struct {
>>+              unsigned short type;
>>+              unsigned char addr[MAX_ADDR_LEN];
>>+      } addr;
>
> Wouldn't it make sense to define this struct somewhere in the core
> headers?

I _did_ use a struct mac_addr until I found there are multiple places
in the tree already defining it... We are in a similar situation to the union
of struct in_addr and struct in6_addr, unfortunately.

We can always clean up these for net-next.

^ permalink raw reply

* Re: [Patch net 3/3] team: use a larger struct for mac address
From: Cong Wang @ 2017-04-26 16:11 UTC (permalink / raw)
  To: Jarod Wilson
  Cc: Jiri Pirko, Linux Kernel Network Developers, Andrey Konovalov
In-Reply-To: <7103fb6b-2483-37c0-48e5-19aa9fd4f386@redhat.com>

On Wed, Apr 26, 2017 at 8:55 AM, Jarod Wilson <jarod@redhat.com> wrote:
>
> We already have struct sockaddr_storage that could be used throughout this
> set as well. We just converted a few pieces of the bonding driver over to
> using it for better support of ipoib bonds, via commit
> faeeb317a5615076dff1ff44b51e862e6064dbd0. Might be better to just use that
> in both bonding and team, rather than having different per-driver structs,
> or Yet Another Address Storage implementation.

Technically, struct sockaddr_storage is not enough either, given the
max is MAX_ADDR_LEN. This is why I gave up on sockaddr_storage.

^ permalink raw reply

* Re: [PATCH net-next] bpf: restore skb->sk before pskb_trim() call
From: Daniel Borkmann @ 2017-04-26 16:14 UTC (permalink / raw)
  To: Eric Dumazet, David Miller; +Cc: netdev, Andrey Konovalov, Willem de Bruijn
In-Reply-To: <1493222963.6453.77.camel@edumazet-glaptop3.roam.corp.google.com>

On 04/26/2017 06:09 PM, Eric Dumazet wrote:
> From: Eric Dumazet <edumazet@google.com>
>
> While testing a fix [1] in ___pskb_trim(), addressing the WARN_ON_ONCE()
> in skb_try_coalesce() reported by Andrey, I found that we had an skb
> with skb->sk set but no skb->destructor.
>
> This invalidated heuristic found in commit 158f323b9868 ("net: adjust
> skb->truesize in pskb_expand_head()") and in cited patch.
>
> Considering the BUG_ON(skb->sk) we have in skb_orphan(), we should
> restrain the temporary setting to a minimal section.
>
> [1] https://patchwork.ozlabs.org/patch/755570/
>      net: adjust skb->truesize in ___pskb_trim()
>
> Fixes: 8f917bba0042 ("bpf: pass sk to helper functions")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Willem de Bruijn <willemb@google.com>
> Cc: Andrey Konovalov <andreyknvl@google.com>

Good point, thanks!

Acked-by: Daniel Borkmann <daniel@iogearbox.net>

^ permalink raw reply

* [REGRESSION next-20170426] Commit 09515ef5ddad ("of/acpi: Configure dma operations at probe time for platform/amba/pci bus devices") causes oops in mvneta
From: Ralph Sennhauser @ 2017-04-26 16:15 UTC (permalink / raw)
  To: Sricharan R
  Cc: Rafael J. Wysocki, Joerg Roedel, Bjorn Helgaas, linux-acpi,
	linux-kernel, linux-pci, Thomas Petazzoni, netdev

Hi Sricharan R,

Commit 09515ef5ddad ("of/acpi: Configure dma operations at probe time
for platform/amba/pci bus devices") causes a kernel panic as in the log
below on an armada-385. Reverting the commit fixes the issue.

Regards
Ralph

---

[   18.288244] [c06d8480] *pgd=0061941e(bad)
[   18.292271] Internal error: Oops: 80d [#1] SMP ARM
[   18.297080] Modules linked in:
[   18.471175] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G           O    4.11.0-rc8-next-20170426 #3
[   18.479909] Hardware name: Marvell Armada 380/385 (Device Tree)
[   18.485850] task: c0a07000 task.stack: c0a00000
[   18.490401] PC is at __memzero+0x40/0x80
[   18.494336] LR is at 0x0
[   18.496878] pc : [<c0317920>]    lr : [<00000000>]    psr: 00000113
[   18.496878] sp : c0a01d0c  ip : 00000000  fp : c0a01d34
[   18.508402] r10: df43f800  r9 : df43f800  r8 : 00000001
[   18.513645] r7 : c06d7e40  r6 : 000007c0  r5 : c06d8480  r4 : de14aa80
[   18.520196] r3 : 00000000  r2 : 00000000  r1 : ffffffe4  r0 : c06d8480
[   18.526750] Flags: nzcv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
[   18.533912] Control: 10c5387d  Table: 16f0004a  DAC: 00000051
[   18.539679] Process swapper/0 (pid: 0, stack limit = 0xc0a00210)
[   18.545708] Stack: (0xc0a01d0c to 0xc0a02000)
[   18.550082] 1d00:                            c04caef0 c06d7e40 00000700 0cc04000 00000001
[   18.558292] 1d20: df691810 df43f800 c0a01d4c c0a01d38 c04caf20 c04cae7c e1b1d000 df5bd980
[   18.566503] 1d40: c0a01dc4 c0a01d50 c043d3d8 c04caf14 df41e494 00000000 1e652500 c0601460
[   18.574714] 1d60: dfbd8880 c0a04680 c06d7e40 00000000 dfbd8888 00000000 00000000 00000188
[   18.582924] 1d80: 1e652500 dfbd8880 00000040 00000100 df43fc80 00000001 00000000 dfbd8888
[   18.591134] 1da0: c043cfd4 00000040 0000012c ffff91f5 c0a02d00 c0a01de8 c0a01e24 c0a01dc8
[   18.599345] 1dc0: c04db2d4 c043cfe0 c015d81c c06db4d4 c0a04990 c0a04990 c0a302b2 1f267000
[   18.607555] 1de0: c096ab40 dfbd1b40 c0a01de8 c0a01de8 c0a01df0 c0a01df0 c0a01e1c 00000000
[   18.615766] 1e00: c0a0208c c0a00000 00000100 00000003 c0a02080 40000003 c0a01e84 c0a01e28
[   18.623977] 1e20: c0122e9c c04db0cc c0a01e54 00000101 c0a01e4c 00200102 c0a02d00 ffff91f5
[   18.632187] 1e40: 0000000a c06025b4 c0a31740 c09632a8 c0a02080 c0a01e28 c0169788 c0968420
[   18.640397] 1e60: 00000000 00000000 00000001 df408000 e0803100 c0a01f58 c0a01e94 c0a01e88
[   18.648607] 1e80: c01232d0 c0122d84 c0a01ebc c0a01e98 c015d6e4 c012322c c0a169c0 c0a03fac
[   18.656818] 1ea0: e080210c c0a01ee8 e0802100 e0803100 c0a01ee4 c0a01ec0 c01014a4 c015d688
[   18.665029] 1ec0: c01085b0 60000013 ffffffff c0a01f1c 00000000 c0a00000 c0a01f44 c0a01ee8
[   18.673240] 1ee0: c010c86c c0101460 00000001 00000000 00000000 c0118e40 c0a00000 c0a03cf8
[   18.681451] 1f00: c0a03cac c09696f8 00000000 00000000 c0a01f58 c0a01f44 c0a01f48 c0a01f38
[   18.689661] 1f20: c01085ac c01085b0 60000013 ffffffff 00000051 00000000 c0a01f54 c0a01f48
[   18.697871] 1f40: c05e3634 c010857c c0a01f8c c0a01f58 c0153458 c05e3618 c0a03c80 c0a0f30a
[   18.706082] 1f60: c0a30c00 000000bd c0a30c00 c0a03c80 ffffffff c0a30c00 c0833a28 dfffcb40
[   18.714292] 1f80: c0a01f9c c0a01f90 c0153718 c01532c0 c0a01fac c0a01fa0 c05dd758 c0153704
[   18.722503] 1fa0: c0a01ff4 c0a01fb0 c0800d68 c05dd6e8 ffffffff ffffffff 00000000 c08006f8
[   18.730713] 1fc0: 00000000 c0833a28 00000000 c0a30e94 c0a03c9c c0833a24 c0a081e8 0000406a
[   18.738923] 1fe0: 414fc091 00000000 00000000 c0a01ff8 0000807c c08009b4 00000000 00000000
[   18.747132] Backtrace:
[   18.749591] [<c04cae70>] (__build_skb) from [<c04caf20>] (build_skb+0x18/0x6c)
[   18.756843]  r9:df43f800 r8:df691810 r7:00000001 r6:0cc04000 r5:00000700 r4:c06d7e40
[   18.764618] [<c04caf08>] (build_skb) from [<c043d3d8>] (mvneta_poll+0x404/0xc18)
[   18.772042]  r5:df5bd980 r4:e1b1d000
[   18.775632] [<c043cfd4>] (mvneta_poll) from [<c04db2d4>] (net_rx_action+0x214/0x308)
[   18.783406]  r10:c0a01de8 r9:c0a02d00 r8:ffff91f5 r7:0000012c r6:00000040 r5:c043cfd4
[   18.791266]  r4:dfbd8888
[   18.793810] [<c04db0c0>] (net_rx_action) from [<c0122e9c>] (__do_softirq+0x124/0x248)
[   18.801672]  r10:40000003 r9:c0a02080 r8:00000003 r7:00000100 r6:c0a00000 r5:c0a0208c
[   18.809531]  r4:00000000
[   18.812074] [<c0122d78>] (__do_softirq) from [<c01232d0>] (irq_exit+0xb0/0xe4)
[   18.819325]  r10:c0a01f58 r9:e0803100 r8:df408000 r7:00000001 r6:00000000 r5:00000000
[   18.827184]  r4:c0968420
[   18.829729] [<c0123220>] (irq_exit) from [<c015d6e4>] (__handle_domain_irq+0x68/0xbc)
[   18.837591] [<c015d67c>] (__handle_domain_irq) from [<c01014a4>] (gic_handle_irq+0x50/0x94)
[   18.845976]  r9:e0803100 r8:e0802100 r7:c0a01ee8 r6:e080210c r5:c0a03fac r4:c0a169c0
[   18.853749] [<c0101454>] (gic_handle_irq) from [<c010c86c>] (__irq_svc+0x6c/0x90)
[   18.861261] Exception stack(0xc0a01ee8 to 0xc0a01f30)
[   18.866332] 1ee0:                   00000001 00000000 00000000 c0118e40 c0a00000 c0a03cf8
[   18.874543] 1f00: c0a03cac c09696f8 00000000 00000000 c0a01f58 c0a01f44 c0a01f48 c0a01f38
[   18.882753] 1f20: c01085ac c01085b0 60000013 ffffffff
[   18.887823]  r9:c0a00000 r8:00000000 r7:c0a01f1c r6:ffffffff r5:60000013 r4:c01085b0
[   18.895601] [<c0108570>] (arch_cpu_idle) from [<c05e3634>] (default_idle_call+0x28/0x34)
[   18.903727] [<c05e360c>] (default_idle_call) from [<c0153458>] (do_idle+0x1a4/0x1d0)
[   18.911503] [<c01532b4>] (do_idle) from [<c0153718>] (cpu_startup_entry+0x20/0x24)
[   18.919103]  r10:dfffcb40 r9:c0833a28 r8:c0a30c00 r7:ffffffff r6:c0a03c80 r5:c0a30c00
[   18.926962]  r4:000000bd
[   18.929508] [<c01536f8>] (cpu_startup_entry) from [<c05dd758>] (rest_init+0x7c/0x80)
[   18.937284] [<c05dd6dc>] (rest_init) from [<c0800d68>] (start_kernel+0x3c0/0x3cc)
[   18.944796] [<c08009a8>] (start_kernel) from [<0000807c>] (0x807c)
[   18.951001] Code: a8a0500c cafffff9 08bd8000 e3110020 (18a0500c)
[   18.957119] ---[ end trace 4e5c1e66e49610b0 ]---
[   18.961753] Kernel panic - not syncing: Fatal exception in interrupt
[   18.968133] CPU1: stopping
[   18.970852] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G      D    O    4.11.0-rc8-next-20170426 #3
[   18.979585] Hardware name: Marvell Armada 380/385 (Device Tree)
[   18.985527] Backtrace:
[   18.987986] [<c010ba4c>] (dump_backtrace) from [<c010bd20>] (show_stack+0x18/0x1c)
[   18.995586]  r7:df467f28 r6:60000193 r5:c0a165b0 r4:00000000
[   19.001268] [<c010bd08>] (show_stack) from [<c031883c>] (dump_stack+0x94/0xa8)
[   19.008520] [<c03187a8>] (dump_stack) from [<c010ecec>] (handle_IPI+0x178/0x198)
[   19.015945]  r7:df467f28 r6:00000000 r5:00000001 r4:c0a30ef0
[   19.021627] [<c010eb74>] (handle_IPI) from [<c01014e4>] (gic_handle_irq+0x90/0x94)
[   19.029227]  r7:df467f28 r6:e080210c r5:c0a03fac r4:c0a169c0
[   19.034908] [<c0101454>] (gic_handle_irq) from [<c010c86c>] (__irq_svc+0x6c/0x90)
[   19.042419] Exception stack(0xdf467f28 to 0xdf467f70)
[   19.047490] 7f20:                   00000001 00000000 00000000 c0118e40 df466000 c0a03cf8
[   19.055701] 7f40: c0a03cac c09696f8 00000000 00000000 df467f98 df467f84 df467f88 df467f78
[   19.063911] 7f60: c01085ac c01085b0 60000013 ffffffff
[   19.068982]  r9:df466000 r8:00000000 r7:df467f5c r6:ffffffff r5:60000013 r4:c01085b0
[   19.076758] [<c0108570>] (arch_cpu_idle) from [<c05e3634>] (default_idle_call+0x28/0x34)
[   19.084882] [<c05e360c>] (default_idle_call) from [<c0153458>] (do_idle+0x1a4/0x1d0)
[   19.092657] [<c01532b4>] (do_idle) from [<c0153718>] (cpu_startup_entry+0x20/0x24)
[   19.100258]  r10:00000000 r9:414fc091 r8:0000406a r7:c0a30f00 r6:10c0387d r5:00000001
[   19.108118]  r4:00000087
[   19.110661] [<c01536f8>] (cpu_startup_entry) from [<c010e918>] (secondary_start_kernel+0x150/0x15c)
[   19.119743] [<c010e7c8>] (secondary_start_kernel) from [<0010162c>] (0x10162c)
[   19.126993]  r5:00000051 r4:1f45c06a
[   19.130583] Rebooting in 3 seconds..

---

git bisect start
# bad: [e0a8aa40bd2c7d973b6520293f3fd86dcbca847b] Add linux-next specific files for 20170426
git bisect bad e0a8aa40bd2c7d973b6520293f3fd86dcbca847b
# good: [c1ae3cfa0e89fa1a7ecc4c99031f5e9ae99d9201] Linux 4.11-rc1
git bisect good c1ae3cfa0e89fa1a7ecc4c99031f5e9ae99d9201
# good: [221c3c382a529418d3a2acc58f53101103f6ff13] Merge remote-tracking branch 'l2-mtd/master'
git bisect good 221c3c382a529418d3a2acc58f53101103f6ff13
# bad: [ffcefbbe2f8c0af5f22af921afd2756baceddd74] Merge remote-tracking branch 'spi/for-next'
git bisect bad ffcefbbe2f8c0af5f22af921afd2756baceddd74
# good: [c2e7f82d336a451ebb904b8bf9a5a558cf16c39b] drm: mali-dp: Check the mclk rate and allow up/down scaling
git bisect good c2e7f82d336a451ebb904b8bf9a5a558cf16c39b
# good: [03b600f368cfa6f94e9622dda82e60f55b5e6224] Merge remote-tracking branch 'block/for-next'
git bisect good 03b600f368cfa6f94e9622dda82e60f55b5e6224
# good: [8b1e05cbcdfc59496eff5870cb6b6ab964ecc733] Merge remote-tracking branch 'battery/for-next'
git bisect good 8b1e05cbcdfc59496eff5870cb6b6ab964ecc733
# good: [e765d496d3adb3d69bd8c53df6fd3f3b77e5b1d2] Merge remote-tracking branch 'watchdog/master'
git bisect good e765d496d3adb3d69bd8c53df6fd3f3b77e5b1d2
# bad: [858eed97e369df7af0993463f355aa9755227136] Merge remote-tracking branch 'audit/next'
git bisect bad 858eed97e369df7af0993463f355aa9755227136
# bad: [efc2195bcc35eebf06805806eb525893f3b9ab5c] Merge branches 'arm/exynos', 'arm/omap', 'arm/rockchip', 'arm/mediatek', 'arm/smmu', 'arm/core', 'x86/vt-d', 'x86/amd' and 'core' into next
git bisect bad efc2195bcc35eebf06805806eb525893f3b9ab5c
# bad: [316ca8804ea84a782d5ba2163711ebb22116ff5a] ACPI/IORT: Remove linker section for IORT entries probing
git bisect bad 316ca8804ea84a782d5ba2163711ebb22116ff5a
# good: [d7b0558230e444f29488fcee0b0b561015d16f8a] iommu/of: Prepare for deferred IOMMU configuration
git bisect good d7b0558230e444f29488fcee0b0b561015d16f8a
# bad: [09515ef5ddad71c7820e5e428da418b709feeb26] of/acpi: Configure dma operations at probe time for platform/amba/pci bus devices
git bisect bad 09515ef5ddad71c7820e5e428da418b709feeb26
# good: [1d9029d440e40b276c0691caed1de10c42d96bef] ACPI/IORT: Add function to check SMMUs drivers presence
git bisect good 1d9029d440e40b276c0691caed1de10c42d96bef
# good: [efc8551a276faab19d85079da02c5fb602b0dcbe] of: device: Fix overflow of coherent_dma_mask
git bisect good efc8551a276faab19d85079da02c5fb602b0dcbe
# first bad commit: [09515ef5ddad71c7820e5e428da418b709feeb26] of/acpi:
Configure dma operations at probe time for platform/amba/pci bus devices

^ permalink raw reply

* Re: [PATCH net-next 1/4] ixgbe: sparc: rename the ARCH_WANT_RELAX_ORDER to IXGBE_ALLOW_RELAXED_ORDER
From: Alexander Duyck @ 2017-04-26 16:18 UTC (permalink / raw)
  To: Ding Tianhong
  Cc: Amir Ancel, David Laight, Gabriele Paoloni, davem@davemloft.net,
	Catalin Marinas, Will Deacon, Mark Rutland, Robin Murphy,
	jeffrey.t.kirsher@intel.com, linux-arm-kernel@lists.infradead.org,
	netdev@vger.kernel.org, LinuxArm, linux-pci@vger.kernel.org
In-Reply-To: <a2239141-a185-1bb8-0abd-7a05b9fde015@huawei.com>

On Wed, Apr 26, 2017 at 2:26 AM, Ding Tianhong <dingtianhong@huawei.com> wrote:
> Hi Amir:
>
> It is really glad to hear that the mlx5 will support RO mode this year, if so, do you agree that enable it dynamic by ethtool -s xxx,
> we have try it several month ago but there was only one drivers would use it at that time so the maintainer against it, it mlx5 would support RO,
> we could try to restart this solution, what do you think about it. :)
>
> Thanks
> Ding

Hi Ding,

Enabing relaxed ordering really doesn't have any place in ethtool. It
is a PCIe attribute that you are essentially wanting to enable.

It might be worth while to take a look at updating the PCIe code path
to handle this. Really what we should probably do is guarantee that
the architectures that need relaxed ordering are setting it in the
PCIe Device Control register and that the ones that don't are clearing
the bit. It's possible that this is already occurring, but I don't
know the state of handling those bits is in the kernel. Once we can
guarantee that we could use that to have the drivers determine their
behavior in regards to relaxed ordering. For example in the case of
igb/ixgbe we could probably change the behavior so that it will bey
default try to use relaxed ordering but if it is not enabled in PCIe
Device Control register the hardware should not request to use it. It
would simplify things in the drivers and allow for each architecture
to control things as needed in their PCIe code.

- Alex

^ permalink raw reply

* Re: [PATCH net-next] tcp: memset ca_priv data to 0 properly
From: Wei Wang @ 2017-04-26 16:25 UTC (permalink / raw)
  To: Linux Kernel Network Developers, David Miller
  Cc: Eric Dumazet, Yuchung Cheng, Neal Cardwell, Wei Wang
In-Reply-To: <20170426003802.40091-1-tracywwnj@gmail.com>

This fix should target for net tree instead of net-next.
Sorry for the wrong title.

On Tue, Apr 25, 2017 at 5:38 PM, Wei Wang <weiwan@google.com> wrote:
> From: Wei Wang <weiwan@google.com>
>
> Always zero out ca_priv data in tcp_assign_congestion_control() so that
> ca_priv data is cleared out during socket creation.
> Also always zero out ca_priv data in tcp_reinit_congestion_control() so
> that when cc algorithm is changed, ca_priv data is cleared out as well.
> We should still zero out ca_priv data even in TCP_CLOSE state because
> user could call connect() on AF_UNSPEC to disconnect the socket and
> leave it in TCP_CLOSE state and later call setsockopt() to switch cc
> algorithm on this socket.
>
> Fixes: 2b0a8c9ee ("tcp: add CDG congestion control")
> Reported-by: Andrey Konovalov  <andreyknvl@google.com>
> Signed-off-by: Wei Wang <weiwan@google.com>
> Acked-by: Eric Dumazet <edumazet@google.com>
> Acked-by: Yuchung Cheng <ycheng@google.com>
> Acked-by: Neal Cardwell <ncardwell@google.com>
> ---
>  net/ipv4/tcp_cong.c | 11 +++--------
>  1 file changed, 3 insertions(+), 8 deletions(-)
>
> diff --git a/net/ipv4/tcp_cong.c b/net/ipv4/tcp_cong.c
> index 79c4817abc94..6e3c512054a6 100644
> --- a/net/ipv4/tcp_cong.c
> +++ b/net/ipv4/tcp_cong.c
> @@ -168,12 +168,8 @@ void tcp_assign_congestion_control(struct sock *sk)
>         }
>  out:
>         rcu_read_unlock();
> +       memset(icsk->icsk_ca_priv, 0, sizeof(icsk->icsk_ca_priv));
>
> -       /* Clear out private data before diag gets it and
> -        * the ca has not been initialized.
> -        */
> -       if (ca->get_info)
> -               memset(icsk->icsk_ca_priv, 0, sizeof(icsk->icsk_ca_priv));
>         if (ca->flags & TCP_CONG_NEEDS_ECN)
>                 INET_ECN_xmit(sk);
>         else
> @@ -200,11 +196,10 @@ static void tcp_reinit_congestion_control(struct sock *sk,
>         tcp_cleanup_congestion_control(sk);
>         icsk->icsk_ca_ops = ca;
>         icsk->icsk_ca_setsockopt = 1;
> +       memset(icsk->icsk_ca_priv, 0, sizeof(icsk->icsk_ca_priv));
>
> -       if (sk->sk_state != TCP_CLOSE) {
> -               memset(icsk->icsk_ca_priv, 0, sizeof(icsk->icsk_ca_priv));
> +       if (sk->sk_state != TCP_CLOSE)
>                 tcp_init_congestion_control(sk);
> -       }
>  }
>
>  /* Manage refcounts on socket close. */
> --
> 2.13.0.rc0.306.g87b477812d-goog
>

^ permalink raw reply

* Re: xdp_redirect ifindex vs port. Was: best API for returning/setting egress port?
From: John Fastabend @ 2017-04-26 16:35 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: Alexei Starovoitov, Daniel Borkmann, Andy Gospodarek,
	Daniel Borkmann, Alexei Starovoitov, netdev@vger.kernel.org,
	xdp-newbies@vger.kernel.org
In-Reply-To: <20170426111158.578b925e@redhat.com>

[...]

>> Jesper, I was working up the code for the redirect piece for ixgbe and
>> virtio, please use this as a base for your virtual port number table. I'll
>> push an update onto github tomorrow. I think the table should drop in fairly
>> nicely.
> 
> Cool, I will do that. Then, I'll also have a redirect method to shape
> this around, and I would have to benchmark/test your ixgbe redirect.
> 
> (John please let me know, what github tree we are talking about, and
> what branch)
> 
> 
>> One piece that isn't clear to me is how do you plan to instantiate and
>> program this table. Is it a new static bpf map that is created any
>> time we see the redirect command? I think this would be preferred.
> 
> (This is difficult to explain without us misunderstanding each-other)
> 

Yep and I'm not sure I follow :)

> As Alexei also mentioned before, ifindex vs port makes no real
> difference seen from the bpf program side.  It is userspace's
> responsibility to add ifindex/port's to the bpf-maps, according to how
> the bpf program "policy" want to "connect" these ports.  The
> port-table system add one extra step, of also adding this port to the
> port-table (which lives inside the kernel). 
> 

I'm not sure I understand the "lives inside the kernel" bit. I assumed
the 'map' should be a bpf map and behave like any other bpf map.

I wanted a new map to be defined, something like this from the bpf programmer
side.

struct bpf_map_def SEC("maps") port_table =
	.type = BPF_MAP_TYPE_PORT_CONNECTION,
	.key_size = sizeof(u32),
	.value_size = BPF_PORT_CONNECTION_SIZE,
	.max_entries = 256,
};

> When loading the XDP program, we also need to pass along a port table
> "id" this XDP program is associated with (and if it doesn't exists you
> create it).  And your userspace "control-plane" application also need
> to know this port table "id", when adding a new port.

So the user space application that is loading the program also needs
to handle this map. This seems correct to me. But I don't see the
value in making some new port table when we already have well understood
framework for maps.

> 
> The concept of having multiple port tables is key.  As this implies we
> can have several simultaneous "data-planes" that is *isolated* from
> each-other.  Think about how network-namespaces/containers want
> isolation. A subtle thing I'm afraid to mention, is that oppose to the
> ifindex model, a port table with mapping to a net_device pointer, would
> allow (faster) delivery into the container's inner net_device, which
> sort of violates the isolation, but I would argue it is not a problem
> as this net_device pointer could only be added from a process within the
> namespace.  I like this feature, but it could easily be disallowed via
> port insertion-time validation.
> 

I think the above optimization should be allowed. And agree multiple port
tables (maps?) is needed. Again all this points to using standard maps
logic in my mind. For permissions and different domains, which I think
you were starting to touch on, it looks like we could extend the pinning API.
At the moment it does an inode_permission(inode, MAY_WRITE) check but I
presume this could be extended. None of this would be needed in v1 and
could be added subsequently. read-only maps seems doable.

>    
>>>> I'm not worried about the DROP case, I agree that is fine (as you
>>>> also say).  The problem is unintentionally sending a packet to a
>>>> wrong ifindex.  This is clearly an eBPF program error, BUT with
>>>> XDP this becomes a very hard to debug program error.  With
>>>> TC-redirect/cls_bpf we can tcpdump the packets, with XDP there is
>>>> no visibility into this happening (the NSA is going to love this
>>>> "feature").  Maybe we could add yet-another tracepoint to allow
>>>> debugging this.  My proposal that we simply remove the possibility
>>>> for such program errors, by as you say move the validation from
>>>> run-time into static insertion-time, via a port table.  
>>>
>>> I think lack of tcpdump-like debugging in xdp is a separate issue.
>>> As I was saying in the other thread we have trivial 'xdpdump'
>>> kern+user app that emits pcap file, but it's too specific to how we
>>> use tail_calls+prog_array in our xdp setup. I'm working on the
>>> program chaining that will be generic and allow us transparently
>>> add multiple xdp or tc progs to the same attachment point and will
>>> allow us to do 'xdpdump' at any point of this pipeline, so
>>> debugging of what happened to the packet will be easier and done in
>>> the same way for both tc and xdp.
>>> btw in our experience working with both tc and xdp the tc+bpf was
>>> actually harder to use and more bug prone.
>>>   
>>
>> Nice, the tcpdump-like debugging looks interesting.
> 
> Yes, this xdpdump sound like a very useful tool.
> 

^ permalink raw reply

* Re: [Patch net 3/3] team: use a larger struct for mac address
From: Jarod Wilson @ 2017-04-26 16:46 UTC (permalink / raw)
  To: Cong Wang; +Cc: Jiri Pirko, Linux Kernel Network Developers, Andrey Konovalov
In-Reply-To: <CAM_iQpVXBuXNfVM_h894HEF_9hv2rWgDJcMbvEJG25xfm7DGUA@mail.gmail.com>

On 2017-04-26 12:11 PM, Cong Wang wrote:
> On Wed, Apr 26, 2017 at 8:55 AM, Jarod Wilson <jarod@redhat.com> wrote:
>>
>> We already have struct sockaddr_storage that could be used throughout this
>> set as well. We just converted a few pieces of the bonding driver over to
>> using it for better support of ipoib bonds, via commit
>> faeeb317a5615076dff1ff44b51e862e6064dbd0. Might be better to just use that
>> in both bonding and team, rather than having different per-driver structs,
>> or Yet Another Address Storage implementation.
> 
> Technically, struct sockaddr_storage is not enough either, given the
> max is MAX_ADDR_LEN. This is why I gave up on sockaddr_storage.

Wait, what? Am I missing something? MAX_ADDR_LEN is 32, and 
sockaddr_storage is a #define for __kernel_sockaddr_storage, which has 
it's __data member defined as being of size 128 - sizeof(unsigned short).

-- 
Jarod Wilson
jarod@redhat.com

^ permalink raw reply

* Re: bluetooth 6lowpan interfaces are not virtual anymore
From: Jukka Rissanen @ 2017-04-26 16:52 UTC (permalink / raw)
  To: Michael Richardson, Alexander Aring
  Cc: Network Development, Luiz Augusto von Dentz,
	linux-wpan-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Linux Bluetooth
In-Reply-To: <383.1493218546-VaGMqW6d0iXFptTlUKWvmrDks+cytr/Z@public.gmane.org>

Hi Michael,

On Wed, 2017-04-26 at 10:55 -0400, Michael Richardson wrote:
> Alexander Aring <aar-bIcnvbaLZ9MEGnE8C9+IrQ@public.gmane.org> wrote:
>     >> In a classic SVR4 STREAMS works, it would have been just
> another
>     >> module.  (No, I'm not a fan of *STREAMS* or of SVR4 in
> general,
>     >> although I liked some of the ideas).
>     >>
> 
>     > ok, I see you complain about "having a virtual on top of wpan
>     > interface", or?
> 
>     > I wanted to talk at first about the queue handling which is
> introduced
>     > when 6LoWPAN is not a virtual interface anymore. Or do you want
> to have
>     > a queue in front of 6lowpan adaptation (see other mail reply
> with ASCII
>     > graphics).
> 
> I would like to have a single queue, as close to the hardware as
> possible,
> such that BQL can do it's thing easily.  Should we rethink outgoing
> fragment
> handling for 6lowpan?  Clearly the BT people had a need.
> I don't think they've had a chance to respond to your complaints.

Note that the BT fragmentation (or actually it is called segmentation
in BT) is totally different what 802.15.4 is doing. I do not think
there is any need to add fragmentation handling into 6lo.

Actually the 6lowpan kernel module could probably be simplified to be a
library. We did this in Zephyr where we have compress() and
uncompress() functions that do all the magic.  

> 
>     > We can change that you can run multiple interfaces on one
>     > PHY. Currently we just allow one, because address filtering.
> Disable
>     > address filtering
>     > we will loose ACK handling on hardware.
> 
> Yes, that's a limitation of some hardware, and if you enable multiple
> PANIDs,
> that might be the consequence....
> 
>     > I can try to implement all stuff in software "for fun, maybe
> see what
>     > we can do to handle ACK in software, etc" Then you can run
> multiple
> 
> I'm not asking you to do it, I'm asking, now that we've gotten to a
> certain
> point, we have a better idea what the various requirements are, and
> can we
> re-evaluate things and maybe tweak some things.
> 
> --
> ]               Never tell me the odds!                 | ipv6 mesh
> networks [
> ]   Michael Richardson, Sandelman Software Works        | network
> architect  [
> ]     mcr-SWp7JaYWvAQV+D8aMU/kSg@public.gmane.org  http://www.sandelman.ca/        |   ruby on
> rails    [
> 


Cheers,
Jukka

^ permalink raw reply

* Re: [PATCH v1 net-next 0/6] Extend socket timestamping API
From: Richard Cochran @ 2017-04-26 16:54 UTC (permalink / raw)
  To: Miroslav Lichvar
  Cc: netdev, Willem de Bruijn, Soheil Hassas Yeganeh, Keller, Jacob E,
	Denny Page, Jiri Benc
In-Reply-To: <20170426145035.25846-1-mlichvar@redhat.com>

On Wed, Apr 26, 2017 at 04:50:29PM +0200, Miroslav Lichvar wrote:
> This patchset adds new options to the timestamping API that will be
> useful for NTP implementations and possibly other applications.

Are there any userland ntp patches floating around to exercise the new
HW time stamping option?

Thanks,
Richard

^ permalink raw reply

* [PATCH net-next] ip6_tunnel: Fix missing tunnel encapsulation limit option
From: Craig Gallek @ 2017-04-26 17:07 UTC (permalink / raw)
  To: Hideaki YOSHIFUJI, Alexey Kuznetsov, David S . Miller; +Cc: netdev

From: Craig Gallek <kraig@google.com>

The IPv6 tunneling code tries to insert IPV6_TLV_TNL_ENCAP_LIMIT and
IPV6_TLV_PADN options when an encapsulation limit is defined (the
default is a limit of 4).  An MTU adjustment is done to account for
these options as well.  However, the options are never present in the
generated packets.

ipv6_push_nfrag_opts requires that IPV6_RTHDR be present in order to
include any IPV6_DSTOPTS options.  The v6 tunnel code does not
use routing options, so the encap limit options are not included.

A brief reading of RFC 3542 section 9.2 (specifically the 4th paragraph)
makes me believe that this requirement in the kernel is incorrect.

Fixes: 333fad5364d6: ("[IPV6]: Support several new sockopt / ancillary data in Advanced API (RFC3542)")
Signed-off-by: Craig Gallek <kraig@google.com>
---
 net/ipv6/exthdrs.c | 13 +++++--------
 1 file changed, 5 insertions(+), 8 deletions(-)

diff --git a/net/ipv6/exthdrs.c b/net/ipv6/exthdrs.c
index 25192a3b0cd7..224a89e68a42 100644
--- a/net/ipv6/exthdrs.c
+++ b/net/ipv6/exthdrs.c
@@ -932,15 +932,12 @@ void ipv6_push_nfrag_opts(struct sk_buff *skb, struct ipv6_txoptions *opt,
 			  u8 *proto,
 			  struct in6_addr **daddr, struct in6_addr *saddr)
 {
-	if (opt->srcrt) {
+	if (opt->srcrt)
 		ipv6_push_rthdr(skb, proto, opt->srcrt, daddr, saddr);
-		/*
-		 * IPV6_RTHDRDSTOPTS is ignored
-		 * unless IPV6_RTHDR is set (RFC3542).
-		 */
-		if (opt->dst0opt)
-			ipv6_push_exthdr(skb, proto, NEXTHDR_DEST, opt->dst0opt);
-	}
+
+	if (opt->dst0opt)
+		ipv6_push_exthdr(skb, proto, NEXTHDR_DEST, opt->dst0opt);
+
 	if (opt->hopopt)
 		ipv6_push_exthdr(skb, proto, NEXTHDR_HOP, opt->hopopt);
 }
-- 
2.13.0.rc0.306.g87b477812d-goog

^ permalink raw reply related

* [PATCH net] tcp: fix access to sk->sk_state in tcp_poll()
From: Davide Caratti @ 2017-04-26 17:07 UTC (permalink / raw)
  To: David S. Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy, Wei Wang
  Cc: netdev

avoid direct access to sk->sk_state when tcp_poll() is called on a socket
using active TCP fastopen with deferred connect. Use local variable
'state', which stores the result of sk_state_load(), like it was done in
commit 00fd38d938db ("tcp: ensure proper barriers in lockless contexts").

Fixes: 19f6d3f3c842 ("net/tcp-fastopen: Add new API support")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
---
 net/ipv4/tcp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 40ba424..2dc7fcf 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -533,7 +533,7 @@ unsigned int tcp_poll(struct file *file, struct socket *sock, poll_table *wait)
 
 		if (tp->urg_data & TCP_URG_VALID)
 			mask |= POLLPRI;
-	} else if (sk->sk_state == TCP_SYN_SENT && inet_sk(sk)->defer_connect) {
+	} else if (state == TCP_SYN_SENT && inet_sk(sk)->defer_connect) {
 		/* Active TCP fastopen socket with defer_connect
 		 * Return POLLOUT so application can call write()
 		 * in order for kernel to generate SYN+data
-- 
2.7.4

^ permalink raw reply related

* Re: [PATCH net] net: adjust skb->truesize in ___pskb_trim()
From: Andrey Konovalov @ 2017-04-26 17:08 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev, Willem de Bruijn
In-Reply-To: <1493222866.6453.75.camel@edumazet-glaptop3.roam.corp.google.com>

On Wed, Apr 26, 2017 at 6:07 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> From: Eric Dumazet <edumazet@google.com>
>
> Andrey found a way to trigger the WARN_ON_ONCE(delta < len) in
> skb_try_coalesce() using syzkaller and a filter attached to a TCP
> socket.
>
> As we did recently in commit 158f323b9868 ("net: adjust skb->truesize in
> pskb_expand_head()") we can adjust skb->truesize from ___pskb_trim(),
> via a call to skb_condense().
>
> If all frags were freed, then skb->truesize can be recomputed.
>
> This call can be done if skb is not yet owned, or destructor is
> sock_edemux().

Hi Eric,

I still see the warning even with your patch.

Thanks!

>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Reported-by: Andrey Konovalov <andreyknvl@google.com>
> Cc: Willem de Bruijn <willemb@google.com>
> ---
>  net/core/skbuff.c |    2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index f86bf69cfb8d8bc17262cdba5d9f57a4726cd476..f1d04592ace02f32efa6e05df89c9a5e0023157f 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -1576,6 +1576,8 @@ int ___pskb_trim(struct sk_buff *skb, unsigned int len)
>                 skb_set_tail_pointer(skb, len);
>         }
>
> +       if (!skb->sk || skb->destructor == sock_edemux)
> +               skb_condense(skb);
>         return 0;
>  }
>  EXPORT_SYMBOL(___pskb_trim);
>
>

^ permalink raw reply

* Re: blocking ops when !TASK_RUNNING in vsock_stream_sendmsg() (again)
From: Cong Wang @ 2017-04-26 17:18 UTC (permalink / raw)
  To: Michal Kubecek
  Cc: Claudio Imbrenda, Linux Kernel Network Developers, Andy King,
	George Zhang
In-Reply-To: <20170421081458.GI13789@unicorn.suse.cz>

[-- Attachment #1: Type: text/plain, Size: 532 bytes --]

Hi,

On Fri, Apr 21, 2017 at 1:14 AM, Michal Kubecek <mkubecek@suse.cz> wrote:
> I tried to think about a solution but there doesn't seem to be an easy
> way to fix this in vmw_stream_sendmsg() as moving prepare_to_wait()
> inside the loop would result in missed wake-ups (that was the problem
> with the original fix); IMHO the right way to resolve the issue would be
> rewriting the vmci queue pair code to allow performing the has_space()
> check without taking a mutex.


Can you try the attached patch (compile only)?

Thanks.

[-- Attachment #2: vsock-wait.diff --]
[-- Type: text/plain, Size: 2095 bytes --]

diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index 6f7f675..dfc8c51e 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -1540,8 +1540,7 @@ static int vsock_stream_sendmsg(struct socket *sock, struct msghdr *msg,
 	long timeout;
 	int err;
 	struct vsock_transport_send_notify_data send_data;
-
-	DEFINE_WAIT(wait);
+	DEFINE_WAIT_FUNC(wait, woken_wake_function);
 
 	sk = sock->sk;
 	vsk = vsock_sk(sk);
@@ -1584,11 +1583,10 @@ static int vsock_stream_sendmsg(struct socket *sock, struct msghdr *msg,
 	if (err < 0)
 		goto out;
 
-
 	while (total_written < len) {
 		ssize_t written;
 
-		prepare_to_wait(sk_sleep(sk), &wait, TASK_INTERRUPTIBLE);
+		add_wait_queue(sk_sleep(sk), &wait);
 		while (vsock_stream_has_space(vsk) == 0 &&
 		       sk->sk_err == 0 &&
 		       !(sk->sk_shutdown & SEND_SHUTDOWN) &&
@@ -1597,33 +1595,30 @@ static int vsock_stream_sendmsg(struct socket *sock, struct msghdr *msg,
 			/* Don't wait for non-blocking sockets. */
 			if (timeout == 0) {
 				err = -EAGAIN;
-				finish_wait(sk_sleep(sk), &wait);
+				remove_wait_queue(sk_sleep(sk), &wait);
 				goto out_err;
 			}
 
 			err = transport->notify_send_pre_block(vsk, &send_data);
 			if (err < 0) {
-				finish_wait(sk_sleep(sk), &wait);
+				remove_wait_queue(sk_sleep(sk), &wait);
 				goto out_err;
 			}
 
 			release_sock(sk);
-			timeout = schedule_timeout(timeout);
+			timeout = wait_woken(&wait, TASK_INTERRUPTIBLE, timeout);
 			lock_sock(sk);
 			if (signal_pending(current)) {
 				err = sock_intr_errno(timeout);
-				finish_wait(sk_sleep(sk), &wait);
+				remove_wait_queue(sk_sleep(sk), &wait);
 				goto out_err;
 			} else if (timeout == 0) {
 				err = -EAGAIN;
-				finish_wait(sk_sleep(sk), &wait);
+				remove_wait_queue(sk_sleep(sk), &wait);
 				goto out_err;
 			}
-
-			prepare_to_wait(sk_sleep(sk), &wait,
-					TASK_INTERRUPTIBLE);
 		}
-		finish_wait(sk_sleep(sk), &wait);
+		remove_wait_queue(sk_sleep(sk), &wait);
 
 		/* These checks occur both as part of and after the loop
 		 * conditional since we need to check before and after

^ permalink raw reply related

* Re: [PATCH net] tcp: fix access to sk->sk_state in tcp_poll()
From: Wei Wang @ 2017-04-26 17:24 UTC (permalink / raw)
  To: Davide Caratti
  Cc: David S. Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy,
	Linux Kernel Network Developers
In-Reply-To: <0a1710138c3e55c388a52dba817b2d1b1996c899.1493226034.git.dcaratti@redhat.com>

On Wed, Apr 26, 2017 at 10:07 AM, Davide Caratti <dcaratti@redhat.com> wrote:
> avoid direct access to sk->sk_state when tcp_poll() is called on a socket
> using active TCP fastopen with deferred connect. Use local variable
> 'state', which stores the result of sk_state_load(), like it was done in
> commit 00fd38d938db ("tcp: ensure proper barriers in lockless contexts").
>
> Fixes: 19f6d3f3c842 ("net/tcp-fastopen: Add new API support")
> Signed-off-by: Davide Caratti <dcaratti@redhat.com>

Acked-by: Wei Wang <weiwan@google.com>
Thanks for the fix.

^ permalink raw reply

* Re: [PATCH v2 2/2] ARM: dts: Add the ethernet and ethernet PHY to the cygnus core DT.
From: Eric Anholt @ 2017-04-26 17:25 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Florian Fainelli, Vivien Didelot, netdev, Rob Herring,
	Mark Rutland, devicetree, linux-arm-kernel, linux-kernel,
	bcm-kernel-feedback-list, Ray Jui, Scott Branden, Jon Mason
In-Reply-To: <20170426004907.GA9453@lunn.ch>

[-- Attachment #1: Type: text/plain, Size: 469 bytes --]

Andrew Lunn <andrew@lunn.ch> writes:

>> +		eth0: ethernet@18042000 {
>> +			compatible = "brcm,amac";
>> +			reg = <0x18042000 0x1000>,
>> +			      <0x18110000 0x1000>;
>> +			reg-names = "amac_base", "idm_base";
>> +			interrupts = <GIC_SPI 110 IRQ_TYPE_LEVEL_HIGH>;
>> +			max-speed = <1000>;
>
> Hi Eric
>
> Sorry i missed this the first time. Does this Ethernet controller do >
> 1Gbps? Does this max-speed do anything useful?

It doesn't look like it.  Dropped.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply

* Re: hmmm...
From: David Miller @ 2017-04-26 17:25 UTC (permalink / raw)
  To: ast; +Cc: netdev
In-Reply-To: <133d8e46-c5c1-1e0e-86a1-a7b5a2737bff@fb.com>

From: Alexei Starovoitov <ast@fb.com>
Date: Tue, 25 Apr 2017 22:31:06 -0700

> On 4/25/17 8:38 PM, David Miller wrote:
> jgt/jge/jsgt/sge was a stumbling block for me as well,
> since it still takes me longer than necessary to disambiguate
> into > vs >= and signed/unsigned

I had this problem while writing Sparc JIT :)

> Though I think Daniel still prefers old classic bpf asm ;)

I do too.

> Anyway, back to the question...
> since BFD and GCC are so much entrenched into canonical style
> of asm code, I don't mind that gnu toolchain will be using it.

Ok.  All data flows from right to left in the instructions so it will
be familiar for x86 assembler hackers.

> I like that you used 'dw' in 'ldxdw' instead of just 'd'
> though 'x' can probably be dropped.

Ok, dropped.

> 'x' should be added here instead:
>  { "stb", BPF_OPC_ST    | BPF_OPC_MEM  | BPF_OPC_B, "[1+O],i" },
>  { "stb", BPF_OPC_STX   | BPF_OPC_MEM  | BPF_OPC_B, "[1+O],2" },

The 'x' really isn't necessary, I would say.  Assembler can tell from
context whether immediate or register variant is wanted and thus:

	stb	[r1+8], 2
	stb	[r1+8], r4

are both assembled correctly.

^ permalink raw reply

* Re: [PATCH v2 2/2] ARM: dts: Add the ethernet and ethernet PHY to the cygnus core DT.
From: Eric Anholt @ 2017-04-26 17:26 UTC (permalink / raw)
  To: Florian Fainelli, Vivien Didelot, Andrew Lunn,
	netdev-u79uwXL29TY76Z2rM5mHXA, Rob Herring, Mark Rutland,
	devicetree-u79uwXL29TY76Z2rM5mHXA
  Cc: linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	bcm-kernel-feedback-list-dY08KVG/lbpWk0Htik3J/w, Ray Jui,
	Scott Branden, Jon Mason
In-Reply-To: <0cb00eb7-41d0-0390-4687-d966499ed9f4-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>

[-- Attachment #1: Type: text/plain, Size: 1646 bytes --]

Florian Fainelli <f.fainelli-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> writes:

> On 04/25/2017 04:53 PM, Eric Anholt wrote:
>> Cygnus has a single AMAC controller connected to the B53 switch with 2
>> PHYs.  On the BCM911360_EP platform, those two PHYs are connected to
>> the external ethernet jacks.
>> 
>> Signed-off-by: Eric Anholt <eric-WhKQ6XTQaPysTnJN9+BGXg@public.gmane.org>
>> Reviewed-by: Florian Fainelli <f.fainelli-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
>> ---
>> 
>> v2: Call the node "switch", just call the ports "port" (suggestions by
>>     Florian), drop max-speed on the phys (suggestion by Andrew Lunn),
>>     call the other nodes "ethernet" and "ethernet-phy" (suggestions by
>>     Sergei Shtylyov)
>> 
>>  arch/arm/boot/dts/bcm-cygnus.dtsi      | 58 ++++++++++++++++++++++++++++++++++
>>  arch/arm/boot/dts/bcm911360_entphn.dts |  8 +++++
>>  2 files changed, 66 insertions(+)
>> 
>> diff --git a/arch/arm/boot/dts/bcm-cygnus.dtsi b/arch/arm/boot/dts/bcm-cygnus.dtsi
>> index 009f1346b817..9fd89be0f5e0 100644
>> --- a/arch/arm/boot/dts/bcm-cygnus.dtsi
>> +++ b/arch/arm/boot/dts/bcm-cygnus.dtsi
>> @@ -142,6 +142,54 @@
>>  			interrupts = <0>;
>>  		};
>>  
>> +		mdio: mdio@18002000 {
>> +			compatible = "brcm,iproc-mdio";
>> +			reg = <0x18002000 0x8>;
>> +			#size-cells = <1>;
>> +			#address-cells = <0>;
>
> Sorry for not noticing earlier, since you override this correctly in the
> board-level DTS file can you put a:
>
> 			status = "disabled"
>
> property in there by default?

I didn't have the override in the board file either, just switch and
ethernet.  Fixed.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply

* Re: [PATCH v2 2/2] ARM: dts: Add the ethernet and ethernet PHY to the cygnus core DT.
From: Eric Anholt @ 2017-04-26 17:26 UTC (permalink / raw)
  To: Florian Fainelli, Vivien Didelot, Andrew Lunn,
	netdev-u79uwXL29TY76Z2rM5mHXA, Rob Herring, Mark Rutland,
	devicetree-u79uwXL29TY76Z2rM5mHXA
  Cc: linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	bcm-kernel-feedback-list-dY08KVG/lbpWk0Htik3J/w, Ray Jui,
	Scott Branden, Jon Mason
In-Reply-To: <0cb00eb7-41d0-0390-4687-d966499ed9f4-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>

[-- Attachment #1: Type: text/plain, Size: 1564 bytes --]

Florian Fainelli <f.fainelli-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> writes:

> On 04/25/2017 04:53 PM, Eric Anholt wrote:
>> Cygnus has a single AMAC controller connected to the B53 switch with 2
>> PHYs.  On the BCM911360_EP platform, those two PHYs are connected to
>> the external ethernet jacks.
>> 
>> Signed-off-by: Eric Anholt <eric-WhKQ6XTQaPysTnJN9+BGXg@public.gmane.org>
>> Reviewed-by: Florian Fainelli <f.fainelli-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
>> ---
>> 
>> v2: Call the node "switch", just call the ports "port" (suggestions by
>>     Florian), drop max-speed on the phys (suggestion by Andrew Lunn),
>>     call the other nodes "ethernet" and "ethernet-phy" (suggestions by
>>     Sergei Shtylyov)
>> 
>>  arch/arm/boot/dts/bcm-cygnus.dtsi      | 58 ++++++++++++++++++++++++++++++++++
>>  arch/arm/boot/dts/bcm911360_entphn.dts |  8 +++++
>>  2 files changed, 66 insertions(+)
>> 
>> diff --git a/arch/arm/boot/dts/bcm-cygnus.dtsi b/arch/arm/boot/dts/bcm-cygnus.dtsi
>> index 009f1346b817..9fd89be0f5e0 100644
>> --- a/arch/arm/boot/dts/bcm-cygnus.dtsi
>> +++ b/arch/arm/boot/dts/bcm-cygnus.dtsi
>> @@ -142,6 +142,54 @@
>>  			interrupts = <0>;
>>  		};
>>  
>> +		mdio: mdio@18002000 {
>> +			compatible = "brcm,iproc-mdio";
>> +			reg = <0x18002000 0x8>;
>> +			#size-cells = <1>;
>> +			#address-cells = <0>;
>
> Sorry for not noticing earlier, since you override this correctly in the
> board-level DTS file can you put a:
>
> 			status = "disabled"
>
> property in there by default?

Done.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply

* [PATCH] net: hso: register netdev later to avoid a race condition
From: Andreas Kemnade @ 2017-04-26 17:26 UTC (permalink / raw)
  To: davem, joe, gregkh, peter, hns, linux-usb, netdev, linux-kernel
  Cc: Andreas Kemnade

If the netdev is accessed before the urbs are initialized,
there will be NULL pointer dereferences. That is avoided by
registering it when it is fully initialized.

This case occurs e.g. if dhcpcd is running in the background
and the device is probed, either after insmod hso or
when the device appears on the usb bus.

A backtrace is the following:

[ 1357.356048] usb 1-2: new high-speed USB device number 12 using ehci-omap
[ 1357.551177] usb 1-2: New USB device found, idVendor=0af0, idProduct=8800
[ 1357.558654] usb 1-2: New USB device strings: Mfr=3, Product=2, SerialNumber=0
[ 1357.568572] usb 1-2: Product: Globetrotter HSUPA Modem
[ 1357.574096] usb 1-2: Manufacturer: Option N.V.
[ 1357.685882] hso 1-2:1.5: Not our interface
[ 1460.886352] hso: unloaded
[ 1460.889984] usbcore: deregistering interface driver hso
[ 1513.769134] hso: ../drivers/net/usb/hso.c: Option Wireless
[ 1513.846771] Unable to handle kernel NULL pointer dereference at virtual address 00000030
[ 1513.887664] hso 1-2:1.5: Not our interface
[ 1513.906890] usbcore: registered new interface driver hso
[ 1513.937988] pgd = ecdec000
[ 1513.949890] [00000030] *pgd=acd15831, *pte=00000000, *ppte=00000000
[ 1513.956573] Internal error: Oops: 817 [#1] PREEMPT SMP ARM
[ 1513.962371] Modules linked in: hso usb_f_ecm omap2430 bnep bluetooth g_ether usb_f_rndis u_ether libcomposite configfs ipv6 arc4 wl18xx wlcore mac80211 cfg80211 bq27xxx_battery panel_tpo_td028ttec1 omapdrm drm_kms_helper cfbfillrect snd_soc_simple_card syscopyarea cfbimgblt snd_soc_simple_card_utils sysfillrect sysimgblt fb_sys_fops snd_soc_omap_twl4030 cfbcopyarea encoder_opa362 drm twl4030_madc_hwmon wwan_on_off snd_soc_gtm601 pwm_omap_dmtimer generic_adc_battery connector_analog_tv pwm_bl extcon_gpio omap3_isp wlcore_sdio videobuf2_dma_contig videobuf2_memops w1_bq27000 videobuf2_v4l2 videobuf2_core omap_hdq snd_soc_omap_mcbsp ov9650 snd_soc_omap bmp280_i2c bmg160_i2c v4l2_common snd_pcm_dmaengine bmp280 bmg160_core at24 bmc150_magn_i2c nvmem_core videodev phy_twl4030_usb bmc150_acce
 l_i2c tsc2007
[ 1514.037384]  bmc150_magn bmc150_accel_core media leds_tca6507 bno055 industrialio_triggered_buffer kfifo_buf gpio_twl4030 musb_hdrc snd_soc_twl4030 twl4030_vibra twl4030_madc twl4030_pwrbutton twl4030_charger industrialio w2sg0004 ehci_omap omapdss [last unloaded: hso]
[ 1514.062622] CPU: 0 PID: 3433 Comm: dhcpcd Tainted: G        W       4.11.0-rc8-letux+ #1
[ 1514.071136] Hardware name: Generic OMAP36xx (Flattened Device Tree)
[ 1514.077758] task: ee748240 task.stack: ecdd6000
[ 1514.082580] PC is at hso_start_net_device+0x50/0xc0 [hso]
[ 1514.088287] LR is at hso_net_open+0x68/0x84 [hso]
[ 1514.093231] pc : [<bf79c304>]    lr : [<bf79ced8>]    psr: a00f0013
sp : ecdd7e20  ip : 00000000  fp : ffffffff
[ 1514.105316] r10: 00000000  r9 : ed0e080c  r8 : ecd8fe2c
[ 1514.110839] r7 : bf79cef4  r6 : ecd8fe00  r5 : 00000000  r4 : ed0dbd80
[ 1514.117706] r3 : 00000000  r2 : c0020c80  r1 : 00000000  r0 : ecdb7800
[ 1514.124572] Flags: NzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
[ 1514.132110] Control: 10c5387d  Table: acdec019  DAC: 00000051
[ 1514.138153] Process dhcpcd (pid: 3433, stack limit = 0xecdd6218)
[ 1514.144470] Stack: (0xecdd7e20 to 0xecdd8000)
[ 1514.149078] 7e20: ed0dbd80 ecd8fe98 00000001 00000000 ecd8f800 ecd8fe00 ecd8fe60 00000000
[ 1514.157714] 7e40: ed0e080c bf79ced8 bf79ce70 ecd8f800 00000001 bf7a0258 ecd8f830 c068d958
[ 1514.166320] 7e60: c068d8b8 ecd8f800 00000001 00001091 00001090 c068dba4 ecd8f800 00001090
[ 1514.174926] 7e80: ecd8f940 ecd8f800 00000000 c068dc60 00000000 00000001 ed0e0800 ecd8f800
[ 1514.183563] 7ea0: 00000000 c06feaa8 c0ca39c2 beea57dc 00000020 00000000 306f7368 00000000
[ 1514.192169] 7ec0: 00000000 00000000 00001091 00000000 00000000 00000000 00000000 00008914
[ 1514.200805] 7ee0: eaa9ab60 beea57dc c0c9bfc0 eaa9ab40 00000006 00000000 00046858 c066a948
[ 1514.209411] 7f00: beea57dc eaa9ab60 ecc6b0c0 c02837b0 00000006 c0282c90 0000c000 c0283654
[ 1514.218017] 7f20: c09b0c00 c098bc31 00000001 c0c5e513 c0c5e513 00000000 c0151354 c01a20c0
[ 1514.226654] 7f40: c0c5e513 c01a3134 ecdd6000 c01a3160 ee7487f0 600f0013 00000000 ee748240
[ 1514.235260] 7f60: ee748734 00000000 ecc6b0c0 ecc6b0c0 beea57dc 00008914 00000006 00000000
[ 1514.243896] 7f80: 00046858 c02837b0 00001091 0003a1f0 00046608 0003a248 00000036 c01071e4
[ 1514.252502] 7fa0: ecdd6000 c0107040 0003a1f0 00046608 00000006 00008914 beea57dc 00001091
[ 1514.261108] 7fc0: 0003a1f0 00046608 0003a248 00000036 0003ac0c 00046608 00046610 00046858
[ 1514.269744] 7fe0: 0003a0ac beea57d4 000167eb b6f23106 400f0030 00000006 00000000 00000000
[ 1514.278411] [<bf79c304>] (hso_start_net_device [hso]) from [<bf79ced8>] (hso_net_open+0x68/0x84 [hso])
[ 1514.288238] [<bf79ced8>] (hso_net_open [hso]) from [<c068d958>] (__dev_open+0xa0/0xf4)
[ 1514.296600] [<c068d958>] (__dev_open) from [<c068dba4>] (__dev_change_flags+0x8c/0x130)
[ 1514.305023] [<c068dba4>] (__dev_change_flags) from [<c068dc60>] (dev_change_flags+0x18/0x48)
[ 1514.313934] [<c068dc60>] (dev_change_flags) from [<c06feaa8>] (devinet_ioctl+0x348/0x714)
[ 1514.322540] [<c06feaa8>] (devinet_ioctl) from [<c066a948>] (sock_ioctl+0x2b0/0x308)
[ 1514.330627] [<c066a948>] (sock_ioctl) from [<c0282c90>] (vfs_ioctl+0x20/0x34)
[ 1514.338165] [<c0282c90>] (vfs_ioctl) from [<c0283654>] (do_vfs_ioctl+0x82c/0x93c)
[ 1514.346038] [<c0283654>] (do_vfs_ioctl) from [<c02837b0>] (SyS_ioctl+0x4c/0x74)
[ 1514.353759] [<c02837b0>] (SyS_ioctl) from [<c0107040>] (ret_fast_syscall+0x0/0x1c)
[ 1514.361755] Code: e3822103 e3822080 e1822781 e5981014 (e5832030)
[ 1514.510833] ---[ end trace dfb3e53c657f34a0 ]---

Reported-by: H. Nikolaus Schaller <hns@goldelico.com>
Signed-off-by: Andreas Kemnade <andreas@kemnade.info>
---
 drivers/net/usb/hso.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/net/usb/hso.c b/drivers/net/usb/hso.c
index 93411a3..00067a0 100644
--- a/drivers/net/usb/hso.c
+++ b/drivers/net/usb/hso.c
@@ -2534,13 +2534,6 @@ static struct hso_device *hso_create_net_device(struct usb_interface *interface,
 	SET_NETDEV_DEV(net, &interface->dev);
 	SET_NETDEV_DEVTYPE(net, &hso_type);
 
-	/* registering our net device */
-	result = register_netdev(net);
-	if (result) {
-		dev_err(&interface->dev, "Failed to register device\n");
-		goto exit;
-	}
-
 	/* start allocating */
 	for (i = 0; i < MUX_BULK_RX_BUF_COUNT; i++) {
 		hso_net->mux_bulk_rx_urb_pool[i] = usb_alloc_urb(0, GFP_KERNEL);
@@ -2560,6 +2553,13 @@ static struct hso_device *hso_create_net_device(struct usb_interface *interface,
 
 	add_net_device(hso_dev);
 
+	/* registering our net device */
+	result = register_netdev(net);
+	if (result) {
+		dev_err(&interface->dev, "Failed to register device\n");
+		goto exit;
+	}
+
 	hso_log_port(hso_dev);
 
 	hso_create_rfkill(hso_dev, interface);
-- 
2.1.4

^ permalink raw reply related

* Re: [Patch net 3/3] team: use a larger struct for mac address
From: Cong Wang @ 2017-04-26 17:28 UTC (permalink / raw)
  To: Jarod Wilson
  Cc: Jiri Pirko, Linux Kernel Network Developers, Andrey Konovalov
In-Reply-To: <3f48c69c-3ff4-f895-3c0f-2e8c9c5f063a@redhat.com>

On Wed, Apr 26, 2017 at 9:46 AM, Jarod Wilson <jarod@redhat.com> wrote:
> On 2017-04-26 12:11 PM, Cong Wang wrote:
>>
>> On Wed, Apr 26, 2017 at 8:55 AM, Jarod Wilson <jarod@redhat.com> wrote:
>>>
>>>
>>> We already have struct sockaddr_storage that could be used throughout
>>> this
>>> set as well. We just converted a few pieces of the bonding driver over to
>>> using it for better support of ipoib bonds, via commit
>>> faeeb317a5615076dff1ff44b51e862e6064dbd0. Might be better to just use
>>> that
>>> in both bonding and team, rather than having different per-driver
>>> structs,
>>> or Yet Another Address Storage implementation.
>>
>>
>> Technically, struct sockaddr_storage is not enough either, given the
>> max is MAX_ADDR_LEN. This is why I gave up on sockaddr_storage.
>
>
> Wait, what? Am I missing something? MAX_ADDR_LEN is 32, and sockaddr_storage
> is a #define for __kernel_sockaddr_storage, which has it's __data member
> defined as being of size 128 - sizeof(unsigned short).

My bad, I thought it is same with sizeof(in6addr) without looking into it.
The question is, why do we waste 126 - 32 = 94 bytes on stack to just
use struct sockaddr_storage?

I totally understand we want a unified struct, but we already redefine
it in multiple places in tree...

^ permalink raw reply

* Re: Corrupted SKB
From: Cong Wang @ 2017-04-26 17:38 UTC (permalink / raw)
  To: Michael Ma; +Cc: Linux Kernel Network Developers, jin.oyj
In-Reply-To: <CAAmHdhwncVcbeKN+K4EnmWqwBRXPwuMOSFwtOfuwmgc6gw-5=g@mail.gmail.com>

On Tue, Apr 25, 2017 at 9:42 PM, Michael Ma <make0818@gmail.com> wrote:
> 2017-04-18 21:46 GMT-07:00 Michael Ma <make0818@gmail.com>:
>> 2017-04-18 16:12 GMT-07:00 Cong Wang <xiyou.wangcong@gmail.com>:
>>> On Mon, Apr 17, 2017 at 5:39 PM, Michael Ma <make0818@gmail.com> wrote:
>>>> Hi -
>>>>
>>>> We've implemented a "glue" qdisc similar to mqprio which can associate
>>>> one qdisc to multiple txqs as the root qdisc. Reference count of the
>>>> child qdiscs have been adjusted properly in this case so that it
>>>> represents the number of txqs it has been attached to. However when
>>>> sending packets we saw the skb from dequeue_skb() corrupted with the
>>>> following call stack:
>>>>
>>>>     [exception RIP: netif_skb_features+51]
>>>>     RIP: ffffffff815292b3  RSP: ffff8817f6987940  RFLAGS: 00010246
>>>>
>>>>  #9 [ffff8817f6987968] validate_xmit_skb at ffffffff815294aa
>>>> #10 [ffff8817f69879a0] validate_xmit_skb at ffffffff8152a0d9
>>>> #11 [ffff8817f69879b0] __qdisc_run at ffffffff8154a193
>>>> #12 [ffff8817f6987a00] dev_queue_xmit at ffffffff81529e03
>>>>
>>>> It looks like the skb has already been released since its dev pointer
>>>> field is invalid.
>>>>
>>>> Any clue on how this can be investigated further? My current thought
>>>> is to add some instrumentation to the place where skb is released and
>>>> analyze whether there is any race condition happening there. However
>>>
>>> Either dropwatch or perf could do the work to instrument kfree_skb().
>>
>> Thanks - will try it out.
>
> I'm using perf to collect the callstack for kfree_skb and trying to
> correlate that with the corrupted SKB address however when system
> crashes the perf.data file is also corrupted - how can I view this
> file in case the system crashes before perf exits?

Hmm, KASAN is pretty good at detecting use-after-free,
its report can nicely shows where we allocate/free it and the
use after free.

https://01.org/linuxgraphics/gfx-docs/drm/dev-tools/kasan.html

^ permalink raw reply

* RE: qed*: debug infrastructures
From: Elior, Ariel @ 2017-04-26 17:57 UTC (permalink / raw)
  To: Florian Fainelli, David Miller, Jakub Kicinski, Jiri Pirko
  Cc: netdev@vger.kernel.org, Mintz, Yuval, Tayar, Tomer, Dupuis, Chad
In-Reply-To: <c8751a31-f493-45c1-5bf2-596b367ea034@gmail.com>

Jiri, Florian, Jakub,
Thanks all for you suggestions.

Some answers to questions posted: The signal tracing in our device can be used 
for tracing things like load/store/program_counter from our fastpath processors
(which handle every packet) which can then be re-run in a simulative environment
(recreating the recorded scenario). Other interesting uses for this feature can
be partial pci recording or partial network recording (poor man's analyzer)
which can also be very effective where full blown lab equipment is unavailable.

I reviewed the code of the drivers under hw_tracing (thanks Florian) and I think
we might be a good fit.

Jiri indicated dpipe was not intended for this sort of thing and suggested an
additional dev link object, although it seems to me that this will have to be
either a very generic object which would be susceptible to abuse similar to
debugfs, or it would be tailored to our device so much that no one else would
use it, so I am somewhat less inclined to go down this path (the code
abstracting our debug feature is accessed via ~20 api functions accepting ~10
params each, i.e. quite a handful of configuraion to generalize).

The ethtool debug dump presets (thanks Jakub) are far too narrow to encompass
the full flexibility required here.

Dave, I think my next step would be to send an RFC adding to our core module
(qed) the necessary APIs (mostly to provide some details to this rather abstract
discussion). I will plan to connect those to a new hwtracing driver I'll create
for this purpose, unless a different direction is suggested.

Thanks,
Ariel

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox