* Re: [PATCH] Revert "net: stmmac: Do not keep rearming the coalesce timer in stmmac_xmit"
From: Jose Abreu @ 2018-08-28 8:12 UTC (permalink / raw)
To: Jerome Brunet, Giuseppe Cavallaro, Alexandre Torgue, Jose Abreu,
netdev
Cc: linux-kernel, linux-amlogic, Joao Pinto, Vitor Soares,
Corentin Labbe
In-Reply-To: <20180824090440.13411-1-jbrunet@baylibre.com>
Hi Jerome,
On 24-08-2018 10:04, Jerome Brunet wrote:
> This reverts commit 4ae0169fd1b3c792b66be58995b7e6b629919ecf.
>
> This change in the handling of the coalesce timer is causing a regression on
> (at least) Amlogic platforms.
>
> The network will break down very quickly (within a few seconds) after starting
> a download. This can easily be reproduced using iperf3, for example.
>
> The problem has been reported on the S805, S905, S912 and A113 SoCs
> (Realtek and Micrel PHYs), and it is likely impacting all Amlogic
> platforms using Gbit Ethernet.
>
> No problem was seen with the platform using 10/100-only PHYs (GXL internal).
>
> Reverting the change brings things back to normal and allows the network to
> be used again until we better understand the problem with the coalesce timer.
>
>
Apologies for the delayed answer, but I was on FTO.
I'm not sure what can be causing this but I have some questions
for you:
- What do you mean by "network will break down"? Do you see
queue timeout?
- What do you see in ethtool/ifconfig stats? Can you send me
the stats before and after network break?
- Is your setup multi-queue/channel?
- Can you point me to the DT bindings of your setup?
Thanks and Best Regards,
Jose Miguel Abreu
^ permalink raw reply
* [PATCH] specifically mention zero TX queues in error msg
From: Robert P. J. Day @ 2018-08-28 7:31 UTC (permalink / raw)
To: Linux kernel netdev mailing list
To be consistent with the subsequent error message, which specifically
mentions zero RX queues, add a reference to TX queues to this error message.
Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
---
diff --git a/net/core/dev.c b/net/core/dev.c
index 325fc5088370..a5d0c2244fb5 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -8867,7 +8867,7 @@ struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name,
BUG_ON(strlen(name) >= sizeof(dev->name));
if (txqs < 1) {
- pr_err("alloc_netdev: Unable to allocate device with zero queues\n");
+ pr_err("alloc_netdev: Unable to allocate device with zero TX queues\n");
return NULL;
}
--
========================================================================
Robert P. J. Day Ottawa, Ontario, CANADA
http://crashcourse.ca/dokuwiki
Twitter: http://twitter.com/rpjday
LinkedIn: http://ca.linkedin.com/in/rpjday
========================================================================
^ permalink raw reply related
* [PATCH bpf-next] bpf: remove duplicated include from syscall.c
From: YueHaibing @ 2018-08-28 7:42 UTC (permalink / raw)
To: Alexei Starovoitov, Daniel Borkmann; +Cc: YueHaibing, netdev, kernel-janitors
Remove duplicated include.
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
---
kernel/bpf/syscall.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 8339d81..3c9636f 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -30,7 +30,6 @@
#include <linux/cred.h>
#include <linux/timekeeping.h>
#include <linux/ctype.h>
-#include <linux/btf.h>
#include <linux/nospec.h>
#define IS_FD_ARRAY(map) ((map)->map_type == BPF_MAP_TYPE_PROG_ARRAY || \
^ permalink raw reply related
* Re: [PATCH] net: wireless: ath: Convert to using %pOFn instead of device_node.name
From: Kalle Valo @ 2018-08-28 11:19 UTC (permalink / raw)
To: Rob Herring; +Cc: linux-kernel, David S. Miller, linux-wireless, netdev
In-Reply-To: <20180828015252.28511-36-robh@kernel.org>
Rob Herring <robh@kernel.org> writes:
> In preparation to remove the node name pointer from struct device_node,
> convert printf users to use the %pOFn format specifier.
>
> Cc: Kalle Valo <kvalo@codeaurora.org>
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: linux-wireless@vger.kernel.org
> Cc: netdev@vger.kernel.org
> Signed-off-by: Rob Herring <robh@kernel.org>
> ---
> drivers/net/wireless/ath/ath6kl/init.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
The correct prefix is "ath6kl:" but I can fix that.
--
Kalle Valo
^ permalink raw reply
* bpf-next is OPEN
From: Daniel Borkmann @ 2018-08-28 7:06 UTC (permalink / raw)
To: netdev; +Cc: ast
The merge window is over, so a new bpf-next development round begins.
Thanks,
Daniel
^ permalink raw reply
* Re: [bpf PATCH] bpf: sockmap, decrement copied count correctly in redirect error case
From: Daniel Borkmann @ 2018-08-28 7:04 UTC (permalink / raw)
To: John Fastabend, alexei.starovoitov; +Cc: netdev
In-Reply-To: <20180825003659.7508.52198.stgit@john-Precision-Tower-5810>
On 08/25/2018 02:37 AM, John Fastabend wrote:
> Currently, when a redirect occurs in sockmap and an error occurs in
> the redirect call we unwind the scatterlist once in the error path
> of bpf_tcp_sendmsg_do_redirect() and then again in sendmsg(). Then
> in the error path of sendmsg we decrement the copied count by the
> send size.
>
> However, it's possible we partially sent data before the error was
> generated. This can happen if do_tcp_sendpages() partially sends the
> scatterlist before encountering a memory pressure error. If this
> happens we need to decrement the copied value (the value tracking
> how many bytes were actually sent to TCP stack) by the number of
> remaining bytes _not_ the entire send size. Otherwise we risk
> confusing userspace.
>
> Also, we don't need two calls to free the scatterlist; one is
> good enough. So remove the one in bpf_tcp_sendmsg_do_redirect() and
> then properly reduce copied by the number of remaining bytes which
> may in fact be the entire send size if no bytes were sent.
>
> To do this, use a bool to indicate whether free_start_sg() should do memory
> accounting or not.
>
> Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Applied to bpf, thanks John!
^ permalink raw reply
* Re: [PATCH v2 01/29] nvmem: add support for cell lookups
From: Srinivas Kandagatla @ 2018-08-28 10:15 UTC (permalink / raw)
To: Bartosz Golaszewski, Boris Brezillon
Cc: Andrew Lunn, linux-doc, Sekhar Nori, Bartosz Golaszewski,
linux-i2c, Mauro Carvalho Chehab, Rob Herring, Florian Fainelli,
Kevin Hilman, Richard Weinberger, Russell King, Marek Vasut,
Paolo Abeni, Dan Carpenter, Grygorii Strashko, David Lechner,
Arnd Bergmann, Sven Van Asbroeck, open list:MEMORY TECHNOLOGY...
In-Reply-To: <CAMRc=MfpZJUEAhM4OoLbmJcnX3rTJk8fvdiL6=9BjNkVcsf=SA@mail.gmail.com>
On 27/08/18 14:37, Bartosz Golaszewski wrote:
> I didn't notice it before but there's a global list of nvmem cells
Bit of history here.
The global list of nvmem_cell exists to assist non-device-tree-based cell
lookups. These cell entries come as part of the non-DT providers'
nvmem_config.
All device-tree-based cell lookups happen dynamically on
request/demand, and all the cell definitions come from DT.
As of today, NVMEM supports both DT and non-DT use cases; this is much simpler.
Non-DT cases have various consumer use cases.
1> Consumer is aware of the provider name and cell details.
This is probably the simple use case, where it can just use device-based APIs.
2> Consumer is not aware of the provider name; it's just aware of the cell name.
This is the case where the global list of cells is used.
> with each cell referencing its owner nvmem device. I'm wondering if
> this isn't some kind of inversion of ownership. Shouldn't each nvmem
> device have a separate list of nvmem cells owned by it? What happens
This is mainly done for the use case where the consumer has no idea of the
provider name or any other details.
The first thing a non-DT user should do is use the "NVMEM device based consumer APIs".
ex: First get a handle to the nvmem device using its nvmem provider name by
calling nvmem_device_get(), then use the nvmem_device_cell_read/write() APIs.
Also, I am not 100% sure how maintaining a cells list per nvmem
provider would help with the intended purpose of the global list.
> if we have two nvmem providers with the same names for cells? I'm
Yes, it would return the first instance, which is a known issue.
I am not really sure this is a big problem as of today! But I am open to
any better suggestions!
> asking because dev_id based lookup doesn't make sense if internally
> nvmem_cell_get_from_list() doesn't care about any device names (takes
> only the cell_id as argument).
As I said, this is for the non-DT use case where consumers are not aware of
provider details.
>
> This doesn't cause any trouble now since there are no users defining
> cells in nvmem_config - there are only DT users - but this must be
> clarified before I can advance with correctly implementing nvmem
> lookups.
DT users should not be defining this to start with! It's redundant and
does not make sense!
>
> BTW: of_nvmem_cell_get() seems to always allocate an nvmem_cell
> instance even if the cell for this node was already added to the nvmem
> device.
I hope you now see why of_nvmem_cell_get() always allocates a new
instance for every get!!
thanks,
srini
^ permalink raw reply
* Re: [PATCH RFT] net: dsa: Allow configuring CPU port VLANs
From: Ilias Apalodimas @ 2018-08-28 8:32 UTC (permalink / raw)
To: Florian Fainelli
Cc: Petr Machata, netdev, jiri, Andrew Lunn, Vivien Didelot,
David S. Miller, open list
In-Reply-To: <9ce291a4-b40d-81d8-1c1a-c4311e5cc113@gmail.com>
On Fri, Aug 10, 2018 at 04:58:10PM -0700, Florian Fainelli wrote:
> On 06/25/2018 02:17 AM, Ilias Apalodimas wrote:
> > On Mon, Jun 25, 2018 at 12:13:10PM +0300, Petr Machata wrote:
> >> Florian Fainelli <f.fainelli@gmail.com> writes:
> >>
> >>> if (netif_is_bridge_master(vlan->obj.orig_dev))
> >>> - return -EOPNOTSUPP;
> >>> + info.port = dp->cpu_dp->index;
> >>
> >> The condition above will trigger also when a VLAN is added on a member
> >> port, and there's no other port with that VLAN. In that case the VLAN
> >> comes without the BRIDGE_VLAN_INFO_BRENTRY flag. In mlxsw we have this
> >> to get the bridge VLANs:
> >>
> >> if (netif_is_bridge_master(orig_dev)) {
> >> [...]
> >> if ((vlan->flags & BRIDGE_VLAN_INFO_BRENTRY) &&
> >> [...]
> >>
> >> This doesn't appear to be done in DSA unless I'm missing something.
> > Petr's right. This will trigger for VLANs added on 'not cpu ports' if the VLAN
> > is not already a member.
> >
> > This command has BRIDGE_VLAN_INFO_BRENTRY set:
> > bridge vlan add dev br0 vid 100 pvid untagged self
> > I had the same issue on my CPSW RFC and solved it
> > exactly the same way as Petr suggested.
>
> Humm, there must be something obvious I am missing, but the following
> don't exactly result in what I would expect after adding a check for
> vlan->flags & BRIDGE_VLAN_INFO_BRENTRY:
>
> brctl addbr br0
> echo 1 > /sys/class/net/br0/bridge/vlan_filtering
> brctl addif br0 lan1
>
> #1 results in lan1 being programmed with VID 1, PVID, untagged, but not
> the CPU port. I would have sort of expected that the bridge layer would
> also push the configuration to br0/CPU port since this is the default VLAN:
>
> bridge vlan show dev br0
> port vlan ids
> br0 1 PVID Egress Untagged
>
> But it does not.
>
> bridge vlan add vid 2 dev lan1
>
> #2 same thing, results in only lan1 being programmed with VID 2, tagged
> but that is expected because we are creating the VLAN only for the
> user-facing port.
>
> bridge vlan add vid 3 dev br0 self
>
> #3 results in the CPU port being programmed with VID 3, tagged, again,
> this is expected because we are only programming the bridge master/CPU
> port here.
>
> Does #1 also happen for cpsw and mlxsw or do you actually get events
> about the bridge's default VLAN configuration? Or does the switch driver
> actually need to obtain that at the time the port is enslaved somehow?
As long as ports are attached, you get the events (one event per attached port,
IIRC).
If the event is checked against BRIDGE_VLAN_INFO_BRENTRY, the only way to add a
VLAN to the CPU port is via 'bridge vlan add vid 3 dev br0 self'.
>
> Thanks!
> --
> Florian
/Ilias
^ permalink raw reply
* Re: Oops running iptables -F OUTPUT
From: Nicholas Piggin @ 2018-08-28 4:06 UTC (permalink / raw)
To: Andreas Schwab
Cc: netdev, linuxppc-dev, Ard Biesheuvel, Jessica Yu,
Michael Ellerman, Will Deacon, Ingo Molnar, Andrew Morton,
linux-arch
In-Reply-To: <87bm9n68gq.fsf@igel.home>
On Mon, 27 Aug 2018 19:11:01 +0200
Andreas Schwab <schwab@linux-m68k.org> wrote:
> I'm getting this Oops when running iptables -F OUTPUT:
>
> [ 91.139409] Unable to handle kernel paging request for data at address 0xd0000001fff12f34
> [ 91.139414] Faulting instruction address: 0xd0000000016a5718
> [ 91.139419] Oops: Kernel access of bad area, sig: 11 [#1]
> [ 91.139426] BE SMP NR_CPUS=2 PowerMac
> [ 91.139434] Modules linked in: iptable_filter ip_tables x_tables bpfilter nfsd auth_rpcgss lockd grace nfs_acl sunrpc tun af_packet snd_aoa_codec_tas snd_aoa_fabric_layout snd_aoa snd_aoa_i2sbus snd_aoa_soundbus snd_pcm_oss snd_pcm snd_seq snd_timer snd_seq_device snd_mixer_oss snd sungem sr_mod firewire_ohci cdrom sungem_phy soundcore firewire_core pata_macio crc_itu_t sg hid_generic usbhid linear md_mod ohci_pci ohci_hcd ehci_pci ehci_hcd usbcore usb_common dm_snapshot dm_bufio dm_mirror dm_region_hash dm_log dm_mod sata_svw
> [ 91.139522] CPU: 1 PID: 3620 Comm: iptables Not tainted 4.19.0-rc1 #1
> [ 91.139526] NIP: d0000000016a5718 LR: d0000000016a569c CTR: c0000000006f560c
> [ 91.139531] REGS: c0000001fa577670 TRAP: 0300 Not tainted (4.19.0-rc1)
> [ 91.139534] MSR: 900000000200b032 <SF,HV,VEC,EE,FP,ME,IR,DR,RI> CR: 84002484 XER: 20000000
> [ 91.139553] DAR: d0000001fff12f34 DSISR: 40000000 IRQMASK: 0
> GPR00: d0000000016a569c c0000001fa5778f0 d0000000016b0400 0000000000000000
> GPR04: 0000000000000002 0000000000000000 80000001fa46418e c0000001fa0d05c8
> GPR08: d0000000016b0400 d00037fffff13000 00000001ff3e7000 d0000000016a6fb8
> GPR12: c0000000006f560c c00000000ffff780 0000000000000000 0000000000000000
> GPR16: 0000000011635010 00003fffa1b7aa68 0000000000000000 0000000000000000
> GPR20: 0000000000000003 0000000010013918 00000000116350c0 c000000000b88990
> GPR24: c000000000b88ba4 0000000000000000 d0000001fff12f34 0000000000000000
> GPR28: d0000000016b8000 c0000001fa20f400 c0000001fa20f440 0000000000000000
> [ 91.139627] NIP [d0000000016a5718] .alloc_counters.isra.10+0xbc/0x140 [ip_tables]
> [ 91.139634] LR [d0000000016a569c] .alloc_counters.isra.10+0x40/0x140 [ip_tables]
> [ 91.139638] Call Trace:
> [ 91.139645] [c0000001fa5778f0] [d0000000016a569c] .alloc_counters.isra.10+0x40/0x140 [ip_tables] (unreliable)
> [ 91.139655] [c0000001fa5779b0] [d0000000016a5b54] .do_ipt_get_ctl+0x110/0x2ec [ip_tables]
> [ 91.139666] [c0000001fa577aa0] [c0000000006233e0] .nf_getsockopt+0x68/0x88
> [ 91.139674] [c0000001fa577b40] [c000000000631608] .ip_getsockopt+0xbc/0x128
> [ 91.139682] [c0000001fa577bf0] [c00000000065adf4] .raw_getsockopt+0x18/0x5c
> [ 91.139690] [c0000001fa577c60] [c0000000005b5f60] .sock_common_getsockopt+0x2c/0x40
> [ 91.139697] [c0000001fa577cd0] [c0000000005b3394] .__sys_getsockopt+0xa4/0xd0
> [ 91.139704] [c0000001fa577d80] [c0000000005b5ab0] .__se_sys_socketcall+0x238/0x2b4
> [ 91.139712] [c0000001fa577e30] [c00000000000a31c] system_call+0x5c/0x70
> [ 91.139716] Instruction dump:
> [ 91.139721] 39290040 7d3d4a14 7fbe4840 409cff98 81380000 2b890001 419d000c 393e0060
> [ 91.139736] 48000010 7d57c82a e93e0060 7d295214 <815a0000> 794807e1 41e20010 7c210b78
> [ 91.139752] ---[ end trace f5d1d5431651845d ]---
This is due to 7290d58095 ("module: use relative references for
__ksymtab entries"). This part of kernel/module.c -
	/* Divert to percpu allocation if a percpu var. */
	if (sym[i].st_shndx == info->index.pcpu)
		secbase = (unsigned long)mod_percpu(mod);
	else
		secbase = info->sechdrs[sym[i].st_shndx].sh_addr;
	sym[i].st_value += secbase;
This causes the distance to the target to exceed 32 bits on powerpc, so
it doesn't fit in a rel32 relocation. Not sure how other archs cope.
Thanks,
Nick
^ permalink raw reply
* Re: [PATCH] tcp: another fix of uncloning packets before mangling them
From: Eric Dumazet @ 2018-08-28 7:36 UTC (permalink / raw)
To: wen.yang99
Cc: David Miller, Alexey Kuznetsov, Hideaki YOSHIFUJI, netdev, LKML,
jiang.biao2, zhong.weidong, liu.bo9
In-Reply-To: <1535441465-65170-1-git-send-email-wen.yang99@zte.com.cn>
On Tue, Aug 28, 2018 at 12:32 AM Wen Yang <wen.yang99@zte.com.cn> wrote:
>
> The following warning was caught:
>
> [937151.638394] Call Trace:
> [937151.638401] [<ffffffff8163f2f6>] dump_stack+0x19/0x1b
> [937151.638405] [<ffffffff8107dd70>] warn_slowpath_common+0x70/0xb0
> [937151.638407] [<ffffffff8107deba>] warn_slowpath_null+0x1a/0x20
> [937151.638410] [<ffffffff8158bb7b>] tcp_set_skb_tso_segs+0xeb/0x100
> [937151.638412] [<ffffffff8158bbc7>] tcp_init_tso_segs+0x37/0x50
> [937151.638414] [<ffffffff8158d7b9>] tcp_write_xmit+0x1d9/0xce0
> [937151.638417] [<ffffffff8158e53e>] __tcp_push_pending_frames+0x2e/0xc0
> [937151.638419] [<ffffffff8157cf3c>] tcp_push+0xec/0x120
> [937151.638421] [<ffffffff81580728>] tcp_sendmsg+0xc8/0xc20
> [937151.638424] [<ffffffff815aae24>] inet_sendmsg+0x64/0xb0
> [937151.638428] [<ffffffff810b9565>] ? check_preempt_curr+0x75/0xa0
> [937151.638434] [<ffffffff81519917>] sock_aio_write+0x157/0x180
> [937151.638437] [<ffffffff811e267d>] do_sync_write+0x8d/0xd0
> [937151.638440] [<ffffffff811e2f95>] vfs_write+0x1b5/0x1e0
> [937151.638442] [<ffffffff811e393f>] SyS_write+0x7f/0xe0
> [937151.638445] [<ffffffff816513fd>] system_call_fastpath+0x16/0x1b
>
> According to commit c52e2421f736 ("tcp: must unclone packets before
> mangling them"), the TCP stack should make sure it owns skbs before
> mangling them.
> There is another place where skb_unclone() is needed. This patch
> fixes that.
>
> Signed-off-by: Wen Yang <wen.yang99@zte.com.cn>
> Tested-by: Liu Bo <liu.bo9@zte.com.cn>
> Reviewed-by: Jiang Biao <jiang.biao2@zte.com.cn>
> ---
> net/ipv4/tcp_output.c | 10 +++++++++-
> 1 file changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index 597dbd7..fbe8140 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -1793,6 +1793,9 @@ static int tcp_init_tso_segs(struct sk_buff *skb, unsigned int mss_now)
> int tso_segs = tcp_skb_pcount(skb);
>
Certainly not needed.
The TCP stack owns its packets as long as they were never sent (not yet in
the retransmit queue).
You are probably using an old kernel that is missing some backport...
^ permalink raw reply
* [PATCH] tcp: another fix of uncloning packets before mangling them
From: Wen Yang @ 2018-08-28 7:31 UTC (permalink / raw)
To: edumazet, davem, kuznet, yoshfuji
Cc: netdev, linux-kernel, wen.yang99, jiang.biao2, zhong.weidong,
liu.bo9
The following warning was caught:
[937151.638394] Call Trace:
[937151.638401] [<ffffffff8163f2f6>] dump_stack+0x19/0x1b
[937151.638405] [<ffffffff8107dd70>] warn_slowpath_common+0x70/0xb0
[937151.638407] [<ffffffff8107deba>] warn_slowpath_null+0x1a/0x20
[937151.638410] [<ffffffff8158bb7b>] tcp_set_skb_tso_segs+0xeb/0x100
[937151.638412] [<ffffffff8158bbc7>] tcp_init_tso_segs+0x37/0x50
[937151.638414] [<ffffffff8158d7b9>] tcp_write_xmit+0x1d9/0xce0
[937151.638417] [<ffffffff8158e53e>] __tcp_push_pending_frames+0x2e/0xc0
[937151.638419] [<ffffffff8157cf3c>] tcp_push+0xec/0x120
[937151.638421] [<ffffffff81580728>] tcp_sendmsg+0xc8/0xc20
[937151.638424] [<ffffffff815aae24>] inet_sendmsg+0x64/0xb0
[937151.638428] [<ffffffff810b9565>] ? check_preempt_curr+0x75/0xa0
[937151.638434] [<ffffffff81519917>] sock_aio_write+0x157/0x180
[937151.638437] [<ffffffff811e267d>] do_sync_write+0x8d/0xd0
[937151.638440] [<ffffffff811e2f95>] vfs_write+0x1b5/0x1e0
[937151.638442] [<ffffffff811e393f>] SyS_write+0x7f/0xe0
[937151.638445] [<ffffffff816513fd>] system_call_fastpath+0x16/0x1b
According to commit c52e2421f736 ("tcp: must unclone packets before
mangling them"), the TCP stack should make sure it owns skbs before
mangling them.
There is another place where skb_unclone() is needed. This patch
fixes that.
Signed-off-by: Wen Yang <wen.yang99@zte.com.cn>
Tested-by: Liu Bo <liu.bo9@zte.com.cn>
Reviewed-by: Jiang Biao <jiang.biao2@zte.com.cn>
---
net/ipv4/tcp_output.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 597dbd7..fbe8140 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1793,6 +1793,9 @@ static int tcp_init_tso_segs(struct sk_buff *skb, unsigned int mss_now)
int tso_segs = tcp_skb_pcount(skb);
if (!tso_segs || (tso_segs > 1 && tcp_skb_mss(skb) != mss_now)) {
+ if (skb_unclone(skb, GFP_ATOMIC))
+ return -ENOMEM;
+
tcp_set_skb_tso_segs(skb, mss_now);
tso_segs = tcp_skb_pcount(skb);
}
@@ -2045,6 +2048,7 @@ static int tcp_mtu_probe(struct sock *sk)
int copy, len;
int mss_now;
int interval;
+ int err;
/* Not currently probing/verifying,
* not in recovery,
@@ -2151,7 +2155,9 @@ static int tcp_mtu_probe(struct sock *sk)
if (len >= probe_size)
break;
}
- tcp_init_tso_segs(nskb, nskb->len);
+ err = tcp_init_tso_segs(nskb, nskb->len);
+ if (unlikely(err < 0))
+ return err;
/* We're ready to send. If this fails, the probe will
* be resegmented into mss-sized pieces by tcp_write_xmit().
@@ -2309,6 +2315,8 @@ static bool tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle,
tso_segs = tcp_init_tso_segs(skb, mss_now);
BUG_ON(!tso_segs);
+ if (unlikely(tso_segs < 0))
+ break;
if (unlikely(tp->repair) && tp->repair_queue == TCP_SEND_QUEUE) {
/* "skb_mstamp" is used as a start point for the retransmit timer */
--
1.8.3.1
^ permalink raw reply related
* Re: bpfilter causes a leftover kernel process
From: Alexei Starovoitov @ 2018-08-28 3:35 UTC (permalink / raw)
To: Olivier Brunel; +Cc: netdev, daniel
In-Reply-To: <20180827183122.0b4ac65e@jjacky.com>
On Mon, Aug 27, 2018 at 06:31:22PM +0200, Olivier Brunel wrote:
>
> So the process is required, never ends and prevents unmounting the
it's not required. It's not doing anything useful at the moment
and defaults to 'n' in kconfig. Please disable it in your kernel.
> rootfs on shutdown. Unless I'm missing something, there's definitely a
> bug there?
I'm also running Arch Linux in my VM, but I'm not able to reproduce the umount issue.
I'm guessing it's somehow related to a non-static build and libc.so being busy
with an old systemd.
A typical shutdown should look like this:
[ 73.498022] shutdown[1]: Sending SIGTERM to remaining processes...
[ 73.505501] shutdown[1]: Sending SIGKILL to remaining processes...
[ 73.512783] shutdown[1]: Unmounting file systems.
And at the time / is unmounted, no processes other than systemd are alive.
^ permalink raw reply
* Re: [virtio-dev] Re: [PATCH net-next v2 0/5] virtio: support packed ring
From: Jens Freimann @ 2018-08-28 5:51 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Tiwei Bie, jasowang, virtualization, linux-kernel, netdev,
virtio-dev, wexu
In-Reply-To: <20180827170005-mutt-send-email-mst@kernel.org>
On Mon, Aug 27, 2018 at 05:00:40PM +0300, Michael S. Tsirkin wrote:
>Are there still plans to test the performance with vhost PMD?
>vhost doesn't seem to show a performance gain ...
Yes, I'm having trouble getting it to work with the virtio PMD (it works
with Tiwei's guest driver, though), but I'm getting closer. It should only
take 1-2 more days.
regards,
Jens
^ permalink raw reply
* Re: BUG: corrupted list in p9_write_work
From: syzbot @ 2018-08-28 4:42 UTC (permalink / raw)
To: asmadeus, davem, ericvh, linux-kernel, lucho, netdev, rminnich,
syzkaller-bugs, v9fs-developer
In-Reply-To: <0000000000002a2fdf0573107004@google.com>
syzbot has found a reproducer for the following crash on:
HEAD commit: 050cdc6c9501 Merge git://git.kernel.org/pub/scm/linux/kern..
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=1386bce1400000
kernel config: https://syzkaller.appspot.com/x/.config?x=49927b422dcf0b29
dashboard link: https://syzkaller.appspot.com/bug?extid=1788bd5d4e051da6ec08
compiler: gcc (GCC) 8.0.1 20180413 (experimental)
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=1196b7ba400000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=1022391e400000
IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+1788bd5d4e051da6ec08@syzkaller.appspotmail.com
8021q: adding VLAN 0 to HW filter on device team0
8021q: adding VLAN 0 to HW filter on device team0
list_add corruption. prev->next should be next (ffff8801c5b17ab0), but was
ffff8801c5b17ac0. (prev=ffff8801a92d1b58).
------------[ cut here ]------------
kernel BUG at lib/list_debug.c:28!
invalid opcode: 0000 [#1] SMP KASAN
CPU: 0 PID: 13 Comm: kworker/0:1 Not tainted 4.19.0-rc1+ #212
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
FS-Cache: Duplicate cookie detected
Workqueue: events p9_write_work
FS-Cache: O-cookie c=000000008e4eb276 [p=000000002fd7b0b4 fl=222 nc=0 na=1]
RIP: 0010:__list_add_valid.cold.0+0x23/0x25 lib/list_debug.c:26
Code: e8 4f 2b 5a fe eb 97 48 89 d9 48 c7 c7 60 b2 3a 87 e8 62 05 02 fe 0f
0b 48 89 f1 48 c7 c7 20 b3 3a 87 48 89 de e8 4e 05 02 fe <0f> 0b 4c 89 e2
48 89 de 48 c7 c7 60 b4 3a 87 e8 3a 05 02 fe 0f 0b
RSP: 0018:ffff8801d9f17590 EFLAGS: 00010282
FS-Cache: O-cookie d=0000000068a887e4 n=00000000257e8f2f
RAX: 0000000000000075 RBX: ffff8801c5b17ab0 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffffffff8163ac01 RDI: 0000000000000001
RBP: ffff8801d9f175a8 R08: ffff8801d9f06340 R09: ffffed003b605010
FS-Cache: O-key=[10] '34323934373831363139'
R10: ffffed003b605010 R11: ffff8801db028087 R12: ffff8801a92d1b58
R13: ffff8801a92d1b58 R14: ffff8801c5b17b04 R15: ffff8801a92d1b58
FS: 0000000000000000(0000) GS:ffff8801db000000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000006dc138 CR3: 00000001c845c000 CR4: 00000000001406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
__list_add include/linux/list.h:60 [inline]
list_add_tail include/linux/list.h:93 [inline]
list_move_tail include/linux/list.h:183 [inline]
p9_write_work+0x34e/0xd50 net/9p/trans_fd.c:470
process_one_work+0xc73/0x1aa0 kernel/workqueue.c:2153
FS-Cache: N-cookie c=00000000da38e585 [p=000000002fd7b0b4 fl=2 nc=0 na=1]
FS-Cache: N-cookie d=0000000068a887e4 n=000000005afa2e39
FS-Cache: N-key=[10] '34323934373831363139'
worker_thread+0x189/0x13c0 kernel/workqueue.c:2296
kthread+0x35a/0x420 kernel/kthread.c:246
FS-Cache: Duplicate cookie detected
ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:413
Modules linked in:
Dumping ftrace buffer:
(ftrace buffer empty)
---[ end trace c3e56d8d2cc1f8a2 ]---
FS-Cache: O-cookie c=000000008e4eb276 [p=000000002fd7b0b4 fl=222 nc=0 na=1]
RIP: 0010:__list_add_valid.cold.0+0x23/0x25 lib/list_debug.c:26
FS-Cache: O-cookie d=0000000068a887e4 n=00000000257e8f2f
Code: e8 4f 2b 5a fe eb 97 48 89 d9 48 c7 c7 60 b2 3a 87 e8 62 05 02 fe 0f
0b 48 89 f1 48 c7 c7 20 b3 3a 87 48 89 de e8 4e 05 02 fe <0f> 0b 4c 89 e2
48 89 de 48 c7 c7 60 b4 3a 87 e8 3a 05 02 fe 0f 0b
FS-Cache: O-key=[10] '34323934373831363139'
RSP: 0018:ffff8801d9f17590 EFLAGS: 00010282
RAX: 0000000000000075 RBX: ffff8801c5b17ab0 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffffffff8163ac01 RDI: 0000000000000001
RBP: ffff8801d9f175a8 R08: ffff8801d9f06340 R09: ffffed003b605010
R10: ffffed003b605010 R11: ffff8801db028087 R12: ffff8801a92d1b58
R13: ffff8801a92d1b58 R14: ffff8801c5b17b04 R15: ffff8801a92d1b58
FS: 0000000000000000(0000) GS:ffff8801db000000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000006dc138 CR3: 00000001c845c000 CR4: 00000000001406f0
FS-Cache: N-cookie c=00000000c3d88b67 [p=000000002fd7b0b4 fl=2 nc=0 na=1]
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
FS-Cache: N-cookie d=0000000068a887e4 n=00000000c137bf7f
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
FS-Cache: N-key=[10] '
^ permalink raw reply
* Re: WARNING in format_decode (2)
From: Alexei Starovoitov @ 2018-08-28 4:05 UTC (permalink / raw)
To: Steven Rostedt
Cc: syzbot, linux-kernel, mingo, syzkaller-bugs, Daniel Borkmann,
netdev
In-Reply-To: <20180827134626.1b943593@gandalf.local.home>
On Mon, Aug 27, 2018 at 01:46:26PM -0400, Steven Rostedt wrote:
> On Mon, 27 Aug 2018 10:10:04 -0700
> syzbot <syzbot+1ec5c5ec949c4adaa0c4@syzkaller.appspotmail.com> wrote:
>
> > Hello,
> >
> > syzbot found the following crash on:
> >
> > HEAD commit: 2ad0d5269970 Merge git://git.kernel.org/pub/scm/linux/kern..
> > git tree: net-next
> > console output: https://syzkaller.appspot.com/x/log.txt?x=15b8efba400000
> > kernel config: https://syzkaller.appspot.com/x/.config?x=79e695838ce7a210
> > dashboard link: https://syzkaller.appspot.com/bug?extid=1ec5c5ec949c4adaa0c4
> > compiler: gcc (GCC) 8.0.1 20180413 (experimental)
> > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=1626f761400000
> >
> > IMPORTANT: if you fix the bug, please add the following tag to the commit:
> > Reported-by: syzbot+1ec5c5ec949c4adaa0c4@syzkaller.appspotmail.com
> >
> > ** **
> > ** NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE **
> > **********************************************************
> > ------------[ cut here ]------------
> > Please remove unsupported %WARNING: CPU: 0 PID: 6453 at lib/vsprintf.c:2149 format_decode+0x8fc/0xaf0
> > lib/vsprintf.c:2149
> > Kernel panic - not syncing: panic_on_warn set ...
> >
> > CPU: 0 PID: 6453 Comm: syz-executor7 Not tainted 4.18.0+ #190
> > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> > Google 01/01/2011
> > Call Trace:
> > __dump_stack lib/dump_stack.c:77 [inline]
> > dump_stack+0x1c9/0x2b4 lib/dump_stack.c:113
> > panic+0x238/0x4e7 kernel/panic.c:184
> > __warn.cold.8+0x163/0x1ba kernel/panic.c:536
> > report_bug+0x252/0x2d0 lib/bug.c:186
> > fixup_bug arch/x86/kernel/traps.c:178 [inline]
> > do_error_trap+0x1fc/0x4d0 arch/x86/kernel/traps.c:296
> > do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:316
> > invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:993
> > RIP: 0010:format_decode+0x8fc/0xaf0 lib/vsprintf.c:2149
> > Code: e8 59 59 c9 fa 41 c6 04 24 12 e9 94 fd ff ff e8 4a 59 c9 fa 0f be f3
> > 48 c7 c7 60 bc 89 87 c6 05 28 aa d2 01 01 e8 e4 e9 93 fa <0f> 0b 4d 8b 7d
> > c0 e9 56 fe ff ff 48 8b bd 68 ff ff ff e8 cd 4f 08
> > RSP: 0018:ffff8801b6b27688 EFLAGS: 00010282
> > RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
> > RDX: 0000000000000000 RSI: ffffffff816422b1 RDI: ffff8801b6b27378
> > RBP: ffff8801b6b27730 R08: ffff8801b69a0040 R09: 0000000000000006
> > R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801b6b277a8
> > R13: ffff8801b6b27708 R14: 0000000000000000 R15: ffff8801b6b27b04
> > vsnprintf+0x185/0x1b60 lib/vsprintf.c:2245
> > vscnprintf+0x2d/0x80 lib/vsprintf.c:2396
> > __trace_array_vprintk.part.60+0xc7/0x330 kernel/trace/trace.c:2990
> > __trace_array_vprintk kernel/trace/trace.c:3021 [inline]
> > trace_array_vprintk kernel/trace/trace.c:3021 [inline]
> > trace_vprintk+0x5f/0x90 kernel/trace/trace.c:3059
> > __trace_printk+0xce/0x120 kernel/trace/trace_printk.c:237
> > ____bpf_trace_printk kernel/trace/bpf_trace.c:274 [inline]
>
> Looks like a bug in the bpf trace printk code.
Yes. Looks like %p% slipped past the bpf_trace_printk() runtime checks.
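In "%p%", the final '%' has no conversion character after it. That fits the truncated "Please remove unsupported %" line in the quoted log, as if the character after '%' were the string's terminating NUL. The sketch below is a hypothetical, deliberately strict checker and not the kernel's bpf_trace_printk() code. It whitelists a few conversions, rejects pointer extensions, and rejects a '%' with nothing after it.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical strict checker for a printk-style format string; NOT the
 * kernel's bpf_trace_printk() logic. Accepts only %d, %u, %x, %s, plain %p
 * and %%; everything else, including a '%' at the very end, is rejected.
 */
static bool fmt_is_safe(const char *fmt)
{
	size_t i, len;

	assert(fmt != NULL);
	len = strlen(fmt);
	for (i = 0; i < len; i++) {
		if (fmt[i] != '%')
			continue;
		if (++i >= len)
			return false;	/* '%' with no conversion character */
		switch (fmt[i]) {
		case '%':
		case 'd':
		case 'u':
		case 'x':
		case 's':
			break;
		case 'p':
			/*
			 * vsprintf() treats alphanumerics right after %p as a
			 * pointer extension (%pK, %pI4, ...); allow none here.
			 * fmt[i + 1] is at worst the terminating NUL.
			 */
			if (isalnum((unsigned char)fmt[i + 1]))
				return false;
			break;
		default:
			return false;
		}
	}
	return true;
}
```

With this checker, "ptr %p%" fails because the second '%' is the last character. "ptr %p%%" passes because the trailing pair is a literal percent sign.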
* Re: GPL compliance issue with liquidio/lio_23xx_vsw.bin firmware
From: Felix Manlunas @ 2018-08-28 0:04 UTC (permalink / raw)
To: Florian Weimer
Cc: linux-firmware, netdev@vger.kernel.org, Derek Chickles,
Satanand Burla, Felix Manlunas, Raghu Vatsavayi, Manish Awasthi,
Manojkumar.Panicker
In-Reply-To: <13e96e81-8794-4d69-3df1-eb07a18655ac@redhat.com>
On Mon, Aug 27, 2018 at 05:01:10PM +0200, Florian Weimer wrote:
> liquidio/lio_23xx_vsw.bin contains a compiled MIPS Linux kernel:
>
> $ tail --bytes=+1313 liquidio/lio_23xx_vsw.bin > elf
> $ readelf -aW elf
> […]
> [ 6] __ksymtab PROGBITS ffffffff80e495f8 64a5f8 00d130 00 A 0 0 8
> [ 7] __ksymtab_gpl PROGBITS ffffffff80e56728 657728 008400 00 A 0 0 8
> [ 8] __ksymtab_strings PROGBITS ffffffff80e5eb28 65fb28 018868 00 A 0 0 1
> […]
> Symbol table '.symtab' contains 1349 entries:
> Num: Value Size Type Bind Vis Ndx Name
> 0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
> 1: 0000000000000000 0 FILE LOCAL DEFAULT ABS arch/mips/kernel/head.o
> 2: 0000000000000000 0 FILE LOCAL DEFAULT ABS init/main.c
> 3: 0000000000000000 0 FILE LOCAL DEFAULT ABS include/linux/types.h
> […]
>
> Yet there is no corresponding source provided, and LICENCE.cavium lacks
> the required notices.
>
> Thanks,
> Florian
Cavium apologizes for the oversight. Cavium has been advertising the
appropriate license terms, including the presence of GPL code in the
firmware, in our outbox releases. We will update the license terms in
LICENCE.cavium in our upstream contribution, in collaboration with our
legal team.
Felix
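Florian's `tail --bytes=+1313` depends on knowing where the embedded image starts. `tail -c +N` begins output at the N-th byte, counting from 1, so the command implies an ELF header at file offset 1312. Below is a minimal sketch of locating such a header by scanning for the four-byte ELF magic. The helper names are hypothetical. A match is only a candidate until a tool like readelf accepts what follows it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Return the offset of the first ELF magic ("\x7f" "ELF") at or after
 * `from` in buf, or (size_t)-1 if there is none. Hypothetical helper.
 */
static size_t find_elf_magic(const unsigned char *buf, size_t len, size_t from)
{
	static const unsigned char magic[4] = { 0x7f, 'E', 'L', 'F' };
	size_t i;

	assert(buf != NULL || len == 0);
	for (i = from; i + sizeof(magic) <= len; i++)
		if (memcmp(buf + i, magic, sizeof(magic)) == 0)
			return i;
	return (size_t)-1;
}

/* `tail --bytes=+N` starts at the N-th byte (1-based), i.e. offset N - 1. */
static size_t tail_bytes_arg(size_t offset)
{
	return offset + 1;
}
```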
* Re: [PATCH] net: sched: Fix memory exposure from short TCA_U32_SEL
From: Al Viro @ 2018-08-28 0:03 UTC (permalink / raw)
To: Cong Wang
Cc: Jamal Hadi Salim, Kees Cook, LKML, Jiri Pirko, David Miller,
Linux Kernel Network Developers
In-Reply-To: <CAM_iQpVEyq9hR3bbOtLFKoLo6nHCtiL6A__uEz3JdDO79GF_8A@mail.gmail.com>
On Mon, Aug 27, 2018 at 02:31:41PM -0700, Cong Wang wrote:
> > I cant think of any challenges. Cong/Jiri? Would it require development
> > time classifiers/actions/qdiscs to sit in that directory (I suspect you
> > dont want them in include/net).
> > BTW, the idea of improving grep-ability of the code by prefixing the
> > ops appropriately makes sense. i.e we should have ops->cls_init,
> > ops->act_init etc.
>
> Hmm? Isn't struct tcf_proto_ops used and must be provided
> by each tc filter module? How does it work if you move it into
> net/sched/* for out-of-tree modules? Are they supposed to
> include "..../net/sched/tcf_proto.h"?? Or something else?
If you care about out-of-tree modules, that could easily live in
include/net/tcf_proto.h, provided that it's not pulled by indirect
includes into hell knows how many places. Try
	make allmodconfig
	make >/dev/null 2>&1
	find -name '.*.cmd'|xargs grep sch_generic.h
That finds 2977 files here, most of them having nothing to do with
net/sched.
> BTW, we need some grep tool that really understands C syntax,
> not making each variable friendly to plain grep.
This isn't the matter of C syntax; it needs to handle C typization,
and you really can't do that anywhere near reliably without looking
at preprocessor output. Which very much depends upon .config...
BTW, something odd in cls_u32.c: what happens if we have the following
graph:
	tcf_proto <tp>, its ->data being <c0> and ->root - <ht0>
	tc_u_common <c0>, in its ->hlist
		<ht1>, in its ->ht[0]
			<knode>
		<ht0>
and set ->ht_down in <knode> to <ht0>? AFAICS,
there's nothing to prevent that - TCA_U32_LINK being
0x80000000 will do just that. What happens upon u32_destroy()
in that case? Unless I'm misreading that code, refcounts will be
	<c0>: 1
	<ht0>: 2
	<ht1>: 1
and in u32_destroy() we'll get this:
root_ht = <ht0>
tp_c = <c0>
	if (root_ht && --root_ht->refcnt == 0)
		u32_destroy_hnode(tp, root_ht, extack);
decrements refcnt to 1 and does nothing else.
	if (--tp_c->refcnt == 0) {
is satisfied
		hlist_del(&tp_c->hnode);
<c0> unhashed
		while ((ht = rtnl_dereference(tp_c->hlist)) != NULL) {
we take ht = <ht1>
			u32_clear_hnode(tp, ht, extack);
which does
	for (h = 0; h <= ht->divisor; h++) {
		while ((n = rtnl_dereference(ht->ht[h])) != NULL) {
n = <knode>
			RCU_INIT_POINTER(ht->ht[h],
					 rtnl_dereference(n->next));
remove <knode> from <ht1>->ht[0]
			tcf_unbind_filter(tp, &n->res);
			u32_remove_hw_knode(tp, n, extack);
			idr_remove(&ht->handle_idr, n->handle);
			if (tcf_exts_get_net(&n->exts))
				tcf_queue_work(&n->rwork, u32_delete_key_freepf_work);
			else
				u32_destroy_key(n->tp, n, true);
... and we hit u32_destroy_key(<tp>, <knode>, true), which does
	struct tc_u_hnode *ht = rtnl_dereference(n->ht_down);
ht = <ht0>
	tcf_exts_destroy(&n->exts);
	tcf_exts_put_net(&n->exts);
	if (ht && --ht->refcnt == 0)
		kfree(ht);
*NOW* <ht0>->refcnt is 0, and we free the damn thing.
	....
	kfree(n);
<knode> is freed and we return to u32_clear_hnode(), where we
see that there's nothing else left in <ht1>->ht[...], and return
to u32_destroy(), where
			RCU_INIT_POINTER(tp_c->hlist, ht->next);
sets <c0>->hlist to <ht1>->next, aka <ht0>, which is already freed.
			/* u32_destroy_key() will later free ht for us, if it's
			 * still referenced by some knode
			 */
			if (--ht->refcnt == 0)
				kfree_rcu(ht, rcu);
<ht1>->refcnt reaches 0 and we free it (RCU-delayed)
		}
... and we go for the next iteration, this time with ht = <ht0>,
doing all kinds of unsanitary things to the memory it used to occupy...
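The toy user-space model below replays the refcount steps of that walk. It uses hypothetical structs far simpler than cls_u32's tc_u_hnode/tc_u_knode, with no RCU and no deferred work, and sets a `freed` flag wherever the real code calls kfree(). It models only the synchronous u32_destroy_key() branch, not the tcf_queue_work() one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Toy model of the graph above; hypothetical types, not cls_u32's.
 * Objects are marked "freed" instead of being released, so any later
 * use can be counted rather than corrupting memory.
 */
struct knode;

struct hnode {
	int refcnt;
	bool freed;
	struct hnode *next;	/* next entry on tp_c->hlist */
	struct knode *key;	/* models ->ht[0]; at most one key */
};

struct knode {
	struct hnode *ht_down;
};

/* Models u32_destroy_key(): drop the reference held via ->ht_down. */
static void destroy_key(struct knode *n)
{
	struct hnode *ht;

	assert(n != NULL);
	ht = n->ht_down;
	if (ht && --ht->refcnt == 0)
		ht->freed = true;
}

/* Models u32_clear_hnode(): unlink and destroy the hnode's key. */
static void clear_hnode(struct hnode *ht)
{
	struct knode *n = ht->key;

	if (n) {
		ht->key = NULL;
		destroy_key(n);
	}
}

/*
 * Models the while loop in u32_destroy(). Returns how many iterations
 * started on an hnode that had already been "freed".
 */
static int destroy_list(struct hnode **hlist)
{
	struct hnode *ht;
	int uses_after_free = 0;

	while ((ht = *hlist) != NULL) {
		if (ht->freed)
			uses_after_free++;
		clear_hnode(ht);
		*hlist = ht->next;
		if (--ht->refcnt == 0)
			ht->freed = true;
	}
	return uses_after_free;
}

/* The scenario above: <knode> in <ht1> has ->ht_down == root <ht0>. */
static int simulate_ht_down_to_root(void)
{
	struct hnode ht0 = { .refcnt = 2 };	/* ->root plus ->ht_down */
	struct knode k = { .ht_down = &ht0 };
	struct hnode ht1 = { .refcnt = 1, .next = &ht0, .key = &k };
	struct hnode *hlist = &ht1;		/* <ht1>, then <ht0> */

	/* u32_destroy() drops the root reference first: 2 -> 1, no free. */
	if (--ht0.refcnt == 0)
		ht0.freed = true;
	return destroy_list(&hlist);
}
```

In the control case, ->ht_down points at a non-root hnode that is also on the list. Both hnodes end up freed and no freed hnode is visited. The difference is the root reference, which u32_destroy() drops before the walk while leaving <ht0> on ->hlist.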
Incidentally, if we hit
	tcf_queue_work(&n->rwork, u32_delete_key_freepf_work);
instead of u32_destroy_key(), things don't seem to be any better - we
won't do anything to <knode> until rtnl is dropped, so u32_destroy() won't
break on the second pass through the loop - it'll free <ht0> there and
return. Setting us up for trouble, since when u32_delete_key_freepf_work()
finally gets to u32_destroy_key() we'll have <knode>->ht_down pointing
to freed memory and decrementing its contents...
What am I missing in there? Is it just "we should never have ->ht_down
pointing to anyone's ->root"? If so, I'm not sure how to detect that;
if not... what should happen to the orphaned root_ht? Should it
remain on the list? We might have two tcf_proto sharing tp->data,
so tp_c and its list might very well survive the u32_destroy()...
Note, BTW, that if we do leave the orphan on the list and later
change the tc_u_knode so that ->ht_down doesn't point to that
thing anymore, we'll get its refcount incremented to 2 in
u32_init_knode(), then decremented to 1 by u32_set_parms() and
then arrange for u32_delete_key_work() to be run. Which will
drive the refcount to 0 and free the damn thing. While it's
still in the middle of ->hlist...
* Re: [PATCH] bpf: fix build error with clang
From: Alexei Starovoitov @ 2018-08-28 3:44 UTC (permalink / raw)
To: Stefan Agner; +Cc: daniel, kafai, ast, mka, netdev, linux-kernel
In-Reply-To: <20180827193042.3573-1-stefan@agner.ch>
On Mon, Aug 27, 2018 at 09:30:42PM +0200, Stefan Agner wrote:
> Building the newly introduced BPF_PROG_TYPE_SK_REUSEPORT leads to
> a compile time error when building with clang:
> net/core/filter.o: In function `sk_reuseport_convert_ctx_access':
> ../net/core/filter.c:7284: undefined reference to `__compiletime_assert_7284'
>
> It seems that clang has issues resolving hweight_long at compile
> time. Since SK_FL_PROTO_MASK is a constant, we can use the interface
> for known constant arguments which works fine with clang.
>
> Fixes: 2dbb9b9e6df6 ("bpf: Introduce BPF_PROG_TYPE_SK_REUSEPORT")
> Signed-off-by: Stefan Agner <stefan@agner.ch>
Applied, Thanks
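The quoted link error is the usual symptom of a compile-time assertion the compiler could not fold: the call to the never-defined `__compiletime_assert_7284` survived into the object file. Stefan's fix uses a popcount variant that reduces to an integer constant expression. The self-contained sketch below follows the shape of the `__const_hweight*()` helpers in include/asm-generic/bitops/const_hweight.h. The macro names are hypothetical copies, and the mask is an assumed stand-in for SK_FL_PROTO_MASK, not its real value.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Popcount written as a plain integer constant expression (hypothetical
 * names). With constant input it always folds, so static_assert(), which
 * rejects anything that is not a constant, accepts it.
 */
#define CONST_HWEIGHT8(w)                                      \
	((unsigned int)(!!((w) & 0x01) + !!((w) & 0x02) +      \
			!!((w) & 0x04) + !!((w) & 0x08) +      \
			!!((w) & 0x10) + !!((w) & 0x20) +      \
			!!((w) & 0x40) + !!((w) & 0x80)))
#define CONST_HWEIGHT16(w) (CONST_HWEIGHT8(w) + CONST_HWEIGHT8((w) >> 8))
#define CONST_HWEIGHT32(w) (CONST_HWEIGHT16(w) + CONST_HWEIGHT16((w) >> 16))

/* Assumed example mask: an 8-bit field at bits 8..15. */
#define EXAMPLE_PROTO_MASK 0x0000ff00u

/* Checked while compiling; the build fails if the field is not 8 bits. */
static_assert(CONST_HWEIGHT32(EXAMPLE_PROTO_MASK) == 8,
	      "example protocol field must be 8 bits wide");

/* Runtime popcount for comparison; not a constant expression. */
static unsigned int runtime_hweight32(uint32_t w)
{
	unsigned int n = 0;

	while (w) {
		w &= w - 1;	/* clear the lowest set bit */
		n++;
	}
	return n;
}
```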
* Re: [PATCH RFC net-next] net/fib: Poptrie based FIB lookup
From: Md. Islam @ 2018-08-27 23:03 UTC (permalink / raw)
To: David Ahern
Cc: Stephen Hemminger, Netdev, David Miller, Eric Dumazet,
Alexey Kuznetsov, makita.toshiaki, panda, yasuhiro.ohara,
john fastabend, alexei.starovoitov
In-Reply-To: <e091f84e-f6c6-24a7-9e45-bd941cceae8c@gmail.com>
On Mon, Aug 27, 2018 at 12:56 PM, David Ahern <dsahern@gmail.com> wrote:
> On 8/27/18 10:24 AM, Stephen Hemminger wrote:
>>
>> Also, as Dave mentioned any implementation needs to handle multiple namespaces
>> and routing tables.
>>
>> Could this alternative lookup be enabled via sysctl at runtime rather than kernel config?
>>
>
> I spent time a couple of years ago refactoring IPv4 fib lookups with the
> intent of allowing different algorithms - for use cases like this:
>
> https://github.com/dsahern/linux/commits/net/ipv4-fib-ops
>
> (it is also another way to solve the API nightmare that ipv6 has become).
>
> But the poptrie patches that have been sent so far have much bigger
> problems that need to be addressed before anyone worries about how to
> select poptrie vs lc-trie.
>
> The patch does not handle errors (e.g., if attributes such as tos,
> metric/priority and multipath are not allowed you need to fail the route
> insert;
Poptrie is not intended to replace LC-trie for processing incoming
packets. Rather, it tries to provide an alternative way to do FIB
lookups in XDP forwarding. I know it's confusing that, in the patch,
fib_lookup() calls poptrie_lookup(). This is just to show how
poptrie_lookup() should be called; we shouldn't actually use
poptrie_lookup() in fib_lookup().
TOS, metric/priority and multipath can easily be incorporated by
storing a fib_alias rather than a net_device. But the main objective
here is not to worry about TOS, metric/priority and so on. Let's assume
that we want Linux to work as a TCAM/ASIC-based router. The only job
of Linux here is to forward incoming packets to a destination port ASAP,
without worrying about TOS, metric/priority and so on.
> further, what happens if someone creates > 255 netdevices?),
Most commercial ASIC/TCAM routers have no more than 64 ports
these days, so I think 255 netdevices are sufficient in that case. If we
need more than 255 NICs, we can accommodate that by using a u16 rather
than a u8.
> last patch has both fib tables populated (a no-go), does not handle
> delete or dumps. In the current form, the poptrie algorithm can not be
> taken for a test drive. My suggestion to make it a compile time
> selection is just so people can actually try it out using current admin
> tools.
Yes, we will need to implement delete/update and dumps. Those will
not be the hardest part, I think; insertion and lookup are the main
challenges. Once everyone agrees on insertion and lookup, the rest can
be implemented incrementally.
* [PATCH] mac80211: fix to follow standard
From: Yuan-Chi Pang @ 2018-08-28 2:24 UTC (permalink / raw)
To: johannes; +Cc: davem, linux-wireless, netdev, linux-kernel, fu3mo6goo
IEEE 802.11-2016 14.10.8.3 HWMP sequence numbering says:
If it is a target mesh STA, it shall update its own HWMP SN to
maximum (current HWMP SN, target HWMP SN in the PREQ element) + 1
immediately before it generates a PREP element in response to a
PREQ element.
Signed-off-by: Yuan-Chi Pang <fu3mo6goo@gmail.com>
---
net/mac80211/mesh_hwmp.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/net/mac80211/mesh_hwmp.c b/net/mac80211/mesh_hwmp.c
index 35ad398..6c21a26 100644
--- a/net/mac80211/mesh_hwmp.c
+++ b/net/mac80211/mesh_hwmp.c
@@ -572,6 +572,11 @@ static void hwmp_preq_frame_process(struct ieee80211_sub_if_data *sdata,
forward = false;
reply = true;
target_metric = 0;
+
+ if (SN_GT(target_sn, ifmsh->sn)) {
+ ifmsh->sn = target_sn;
+ }
+
if (time_after(jiffies, ifmsh->last_sn_update +
net_traversal_jiffies(sdata)) ||
time_before(jiffies, ifmsh->last_sn_update)) {
--
2.7.4
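The quoted standard text reduces to own_sn := max(own_sn, target_sn) + 1, where "max" must respect 32-bit wraparound. The hypothetical sketch below writes sn_gt() in the same serial-number-arithmetic style as mac80211's SN_GT(). The patch itself performs only the max step. Any increment is left to the code that follows in hwmp_preq_frame_process(), which this hunk does not show.

```c
#include <assert.h>
#include <stdint.h>

/*
 * "x is newer than y" for wrapping 32-bit sequence numbers. The
 * uint32_t -> int32_t conversion is implementation-defined in ISO C;
 * GCC and Clang define it as two's-complement wraparound.
 */
static int sn_gt(uint32_t x, uint32_t y)
{
	return (int32_t)(y - x) < 0;
}

/*
 * Hypothetical helper for the quoted 802.11-2016 rule for a target mesh
 * STA: own SN := max(own SN, target HWMP SN in the PREQ) + 1.
 */
static uint32_t hwmp_target_sn_update(uint32_t own_sn, uint32_t preq_target_sn)
{
	uint32_t newest = sn_gt(preq_target_sn, own_sn) ? preq_target_sn : own_sn;

	return newest + 1;	/* wraps from 0xffffffff to 0 */
}
```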
* Re: [PATCH RFC net-next] net/fib: Poptrie based FIB lookup
From: Md. Islam @ 2018-08-27 22:29 UTC (permalink / raw)
To: Stephen Hemminger
Cc: Netdev, David Miller, David Ahern, Eric Dumazet, Alexey Kuznetsov,
makita.toshiaki, panda, yasuhiro.ohara, john fastabend,
alexei.starovoitov
In-Reply-To: <20180827092420.791bb1ad@shemminger-XPS-13-9360>
On Mon, Aug 27, 2018 at 12:24 PM, Stephen Hemminger
<stephen@networkplumber.org> wrote:
> On Sun, 26 Aug 2018 22:28:48 -0400
> "Md. Islam" <mislam4@kent.edu> wrote:
>
>> This patch implements Poptrie [1] based FIB lookup. It exhibits pretty
>> impressive lookup performance compared to LC-trie. This poptrie
>> implementation however somewhat deviates from the original
>> implementation [2]. I tested this patch very rigorously with several
>> FIB tables containing half a million routes. I got same result as
>> LC-trie based fib_lookup().
>>
>> Poptrie is intended to work in conjunction with LC-trie (not replace
>> it). It is primarily designed to overcome many issues of TCAM based
>> router [1]. It [1] shows that the Poptrie can achieve very impressive
>> lookup performance on CPU. This patch will mainly be used by XDP
>> forwarding.
>>
>> 1. Asai, Hirochika, and Yasuhiro Ohara. "Poptrie: A compressed trie
>> with population count for fast and scalable software IP routing table
>> lookup." ACM SIGCOMM Computer Communication Review. 2015.
>>
>> 2. https://github.com/pixos/poptrie
>
>
> I am glad to see more research in to lookup speed. Here are some non-technical
> feedback. Looking deeper takes longer.
>
> The license in the github version is not compatible with GPL. If you based your
> code off that, you need to get approval from original copyright holder.
No, I developed it from scratch; it is not based on the original
code. To stay consistent with the paper, I named a few variables as in
the paper, but nothing has been taken from the copyrighted code.
Poptrie formation and lookup are also very different from the paper's.
>
> The code is not formatted according to current kernel coding style.
> Please use checkpatch to see what the issues are.
>
> It is preferred that a function return a value, rather than being void
> and returning the result by reference. Example:
I will fix those in the next version of the patch. I will also add a
CONFIG option so that it can be disabled/enabled.
>
>> +
>> +/*We assume that pt->root is not NULL*/
>> +void poptrie_lookup(struct poptrie *pt, __be32 dest, struct net_device **dev)
>> +{
> ...
>
>> + *dev = get_fib(&pt->nhs, fib_index);
>> + return;
>> + }
>
> Why not?
pt->root will not be NULL when we call it from XDP forwarding, so checking
it for every packet in a high-speed router is redundant, I think.
Currently the function is called during system startup, when pt->root
is still NULL; that's why I check it before calling the function.
> static struct net_device *poptrie_lookup(struct poptrie *pt, __be32 dest)
>
> Also, as Dave mentioned any implementation needs to handle multiple namespaces
> and routing tables.
>
Currently it supports multiple routing tables: each fib_table has its
own poptrie instance. Supporting multiple namespaces wouldn't be
difficult. Once the core functionality is accepted, those can be
implemented incrementally.
> Could this alternative lookup be enabled via sysctl at runtime rather than kernel config?
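The quoted description and the cited paper credit Poptrie's speed to population-count indexing. Below is a simplified, hypothetical sketch of that idea, not the patch's layout and not the paper's exact structure; the paper's leaf compression is omitted. Each node covers a 6-bit chunk of the key and holds a 64-bit `vector` marking which slots lead to internal children. A child's position in a contiguous array is its rank among the set (or clear) bits. The sketch also follows Stephen's suggestion to return the result rather than write through an out-parameter. It returns an 8-bit next-hop index, matching the u8 discussed earlier in the thread.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: each node covers a 6-bit chunk (64 slots). */
#define PT_STRIDE 6

struct pt_node {
	uint64_t vector;	/* bit v set: slot v leads to an internal node */
	uint32_t base1;		/* index in nodes[] of this node's first child */
	uint32_t base0;		/* index in leaves[] of this node's first leaf */
};

struct pt_trie {
	const struct pt_node *nodes;	/* root is nodes[0] */
	const uint8_t *leaves;		/* 8-bit next-hop index per leaf */
};

/* Number of set bits of x at positions 0..v inclusive. */
static unsigned int popcnt_upto(uint64_t x, unsigned int v)
{
	uint64_t mask;

	assert(v < 64);
	mask = (v == 63) ? ~0ULL : ((2ULL << v) - 1);
	return (unsigned int)__builtin_popcountll(x & mask); /* GCC/Clang builtin */
}

/* Assumes a well-formed trie; returns the next-hop index for key. */
static uint8_t pt_lookup(const struct pt_trie *t, uint32_t key)
{
	const struct pt_node *n = &t->nodes[0];
	unsigned int shift = 32;	/* key bits not consumed yet */

	for (;;) {
		unsigned int v;

		if (shift >= PT_STRIDE) {
			shift -= PT_STRIDE;
			v = (key >> shift) & 0x3f;
		} else {
			/* Last partial chunk: left-align the leftover bits. */
			v = (key << (PT_STRIDE - shift)) & 0x3f;
			shift = 0;
		}
		if (n->vector & (1ULL << v))
			/* Internal: rank among set bits picks the child. */
			n = &t->nodes[n->base1 + popcnt_upto(n->vector, v) - 1];
		else
			/* Leaf: rank among clear bits picks the leaf. */
			return t->leaves[n->base0 + popcnt_upto(~n->vector, v) - 1];
	}
}
```

Each level costs one bit test and one popcount, and no per-slot child pointers are stored, which is the compression the paper's title refers to.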
* Re: [Patch iproute2] ss: add UNIX_DIAG_VFS and UNIX_DIAG_ICONS for unix sockets
From: Stephen Hemminger @ 2018-08-27 22:27 UTC (permalink / raw)
To: Cong Wang; +Cc: netdev
In-Reply-To: <20180827214652.29318-1-xiyou.wangcong@gmail.com>
On Mon, 27 Aug 2018 14:46:52 -0700
Cong Wang <xiyou.wangcong@gmail.com> wrote:
> UNIX_DIAG_VFS and UNIX_DIAG_ICONS are never used by ss,
> make them available in ss -e output.
>
> Cc: Stephen Hemminger <stephen@networkplumber.org>
> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
> ---
> misc/ss.c | 25 +++++++++++++++++++++++++
> 1 file changed, 25 insertions(+)
>
> diff --git a/misc/ss.c b/misc/ss.c
> index 41e7762b..d28bc1ec 100644
> --- a/misc/ss.c
> +++ b/misc/ss.c
> @@ -16,6 +16,7 @@
> #include <sys/ioctl.h>
> #include <sys/socket.h>
> #include <sys/uio.h>
> +#include <sys/sysmacros.h>
Why is this included? It isn't on my system.
> #include <netinet/in.h>
> #include <string.h>
> #include <errno.h>
> @@ -3604,6 +3605,28 @@ static int unix_show_sock(const struct sockaddr_nl *addr, struct nlmsghdr *nlh,
> out(" %c-%c",
> mask & 1 ? '-' : '<', mask & 2 ? '-' : '>');
> }
> + if (tb[UNIX_DIAG_VFS]) {
> + struct unix_diag_vfs uv;
> +
> + memcpy(&uv, RTA_DATA(tb[UNIX_DIAG_VFS]), sizeof(uv));
Copy here is unnecessary, you can just do:
const struct unix_diag_vfs *uv
= RTA_DATA(tb[UNIX_DIAG_VFS]);
> + out(" ino:%u dev:%u/%u", uv.udiag_vfs_ino, major(uv.udiag_vfs_dev),
> + minor(uv.udiag_vfs_dev));
> + }
> + if (tb[UNIX_DIAG_ICONS]) {
> + int len = RTA_PAYLOAD(tb[UNIX_DIAG_ICONS]);
> + __u32 *peers = malloc(len);
> + int i;
Ditto, allocation and copy are not necessary, just reference the data.
> + if (!peers) {
> + fprintf(stderr, "ss: failed to malloc buffer\n");
> + abort();
> + }
> + memcpy(peers, RTA_DATA(tb[UNIX_DIAG_ICONS]), len);
> + out(" peers:");
> + for (i = 0; i < len / sizeof(__u32); i++)
> + out(" %u", peers[i]);
> + free(peers);
> + }
> }
>
> return 0;
> @@ -3641,6 +3664,8 @@ static int unix_show_netlink(struct filter *f)
> req.r.udiag_show = UDIAG_SHOW_NAME | UDIAG_SHOW_PEER | UDIAG_SHOW_RQLEN;
> if (show_mem)
> req.r.udiag_show |= UDIAG_SHOW_MEMINFO;
> + if (show_details)
> + req.r.udiag_show |= UDIAG_SHOW_VFS | UDIAG_SHOW_ICONS;
>
> return handle_netlink_request(f, &req.nlh, sizeof(req), unix_show_sock);
> }
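Stephen's point is that the attribute payload can be read where it sits in the netlink message. Below is a minimal sketch of that approach. The format_vfs()/format_icons() names and the snprintf()-into-a-buffer interface are hypothetical stand-ins for ss's out(). RTA_DATA(), RTA_PAYLOAD() and struct unix_diag_vfs come from the Linux UAPI headers. On the include question: since glibc 2.28, major() and minor() are declared only in <sys/sysmacros.h>, no longer via <sys/types.h>. That may be why the patch adds the include.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/sysmacros.h>	/* major(), minor() */
#include <linux/rtnetlink.h>	/* struct rtattr, RTA_DATA(), RTA_PAYLOAD() */
#include <linux/unix_diag.h>	/* struct unix_diag_vfs */

/*
 * Hypothetical helpers that format into buf instead of calling out().
 * Both read the payload in place: netlink attributes are 4-byte aligned,
 * so the __u32 fields can be read without copying.
 */
static int format_vfs(char *buf, size_t len, const struct rtattr *attr)
{
	const struct unix_diag_vfs *uv;

	assert(attr != NULL);
	uv = RTA_DATA(attr);
	return snprintf(buf, len, " ino:%u dev:%u/%u", uv->udiag_vfs_ino,
			major(uv->udiag_vfs_dev), minor(uv->udiag_vfs_dev));
}

static int format_icons(char *buf, size_t len, const struct rtattr *attr)
{
	const __u32 *peers;
	size_t i, n;
	int used;

	assert(attr != NULL);
	peers = RTA_DATA(attr);
	n = RTA_PAYLOAD(attr) / sizeof(*peers);
	used = snprintf(buf, len, " peers:");
	/* Stop once the buffer is full; snprintf() reports the needed size. */
	for (i = 0; i < n && used >= 0 && (size_t)used < len; i++)
		used += snprintf(buf + used, len - used, " %u", peers[i]);
	return used;
}
```

A size_t loop index also avoids the signed/unsigned comparison between `int i` and `len / sizeof(__u32)` in the original loop.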
* Re: [PATCH 1/1] net/rds: Use rdma_read_gids to get connection SGID/DGID in IPv6
From: David Miller @ 2018-08-27 22:26 UTC (permalink / raw)
To: yanjun.zhu; +Cc: santosh.shilimkar, netdev, linux-rdma, rds-devel
In-Reply-To: <20180825071905.2749-1-yanjun.zhu@oracle.com>
From: Zhu Yanjun <yanjun.zhu@oracle.com>
Date: Sat, 25 Aug 2018 15:19:05 +0800
> In IPv4, the newly introduced rdma_read_gids is used to read the SGID/DGID
> for the connection which returns GID correctly for RoCE transport as well.
>
> In IPv6, rdma_read_gids is also used. rdma_read_gids was introduced
> for the following reasons:
>
> rdma_addr_get_dgid() for RoCE for client side connections returns MAC
> address, instead of DGID.
> rdma_addr_get_sgid() for RoCE doesn't return correct SGID for IPv6 and
> when more than one IP address is assigned to the netdevice.
>
> So the transport agnostic rdma_read_gids() API is provided by rdma_cm
> module.
>
> Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Applied.
* Re: [PATCH] r8169: set RxConfig after tx/rx is enabled for RTL8169sb/8110sb devices
From: David Miller @ 2018-08-27 22:25 UTC (permalink / raw)
To: a3at.mail; +Cc: netdev, hkallweit1, nic_swsd
In-Reply-To: <20180826140309.32310-1-a3at.mail@gmail.com>
From: Azat Khuzhin <a3at.mail@gmail.com>
Date: Sun, 26 Aug 2018 17:03:09 +0300
> I have two Ethernet adapters:
> r8169 0000:03:01.0 eth0: RTL8169sb/8110sb, 00:14:d1:14:2d:49, XID 10000000, IRQ 18
> r8169 0000:01:00.0 eth0: RTL8168e/8111e, 64:66:b3:11:14:5d, XID 2c200000, IRQ 30
> After upgrading from Linux 4.15 [1] to Linux 4.18+ [2], the RTL8169sb failed to
> receive any packets; tcpdump showed a lot of checksum mismatches.
>
> [1]: a0f79386a4968b4925da6db2d1daffd0605a4402
> [2]: 0519359784328bfa92bf0931bf0cff3b58c16932 (4.19 merge window opened)
>
> I started bisecting and found that [3] broke it. According to [4]:
> "For 8110S, 8110SB, and 8110SC series, the initial value of RxConfig
> needs to be set after the tx/rx is enabled."
> So I moved rtl_init_rxcfg() after enabling tx/rx, and now my adapter works
> (RTL8168e works too).
>
> [3]: 3559d81e76bfe3803e89f2e04cf6ef7ab4f3aace
> [4]: e542a2269f232d61270ceddd42b73a4348dee2bb ("r8169: adjust the RxConfig
> settings.")
>
> Also drop "rx" from rtl_set_rx_tx_config_registers(), since it already
> does nothing with RX.
>
> Fixes: 3559d81e76bfe3803e89f2e04cf6ef7ab4f3aace ("r8169: simplify
> rtl_hw_start_8169")
>
> Cc: Heiner Kallweit <hkallweit1@gmail.com>
> Cc: David S. Miller <davem@davemloft.net>
> Cc: netdev@vger.kernel.org
> Cc: Realtek linux nic maintainers <nic_swsd@realtek.com>
> Signed-off-by: Azat Khuzhin <a3at.mail@gmail.com>
> ---
> It looks like calling rtl_init_rxcfg() a second time is fine, but I
> can move it into rtl_hw_start_8169().
Heiner, please review.