Netdev List


* Re: KASAN: use-after-free Read in rxrpc_queue_local
From: syzbot @ 2019-08-12 18:06 UTC
  To: arvid.brodin, davem, dhowells, linux-afs, linux-kernel, netdev,
	syzkaller-bugs, xiyou.wangcong
In-Reply-To: <0000000000007593f4058fea60d8@google.com>

syzbot has bisected this bug to:

commit b9a1e627405d68d475a3c1f35e685ccfb5bbe668
Author: Cong Wang <xiyou.wangcong@gmail.com>
Date:   Thu Jul 4 00:21:13 2019 +0000

     hsr: implement dellink to clean up resources

bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=10b4ebce600000
start commit:   125b7e09 net: tc35815: Explicitly check NET_IP_ALIGN is no..
git tree:       net
final crash:    https://syzkaller.appspot.com/x/report.txt?x=12b4ebce600000
console output: https://syzkaller.appspot.com/x/log.txt?x=14b4ebce600000
kernel config:  https://syzkaller.appspot.com/x/.config?x=a4c9e9f08e9e8960
dashboard link: https://syzkaller.appspot.com/bug?extid=78e71c5bab4f76a6a719
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=165ec172600000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=119d4eba600000

Reported-by: syzbot+78e71c5bab4f76a6a719@syzkaller.appspotmail.com
Fixes: b9a1e627405d ("hsr: implement dellink to clean up resources")

For information about bisection process see: https://goo.gl/tpsmEJ#bisection


* Re: [PATCH 1/2] ip nexthop: Add space to display properly when showing a group
From: David Ahern @ 2019-08-12 18:14 UTC
  To: Donald Sharp, netdev
In-Reply-To: <20190810001843.32068-2-sharpd@cumulusnetworks.com>

On 8/9/19 6:18 PM, Donald Sharp wrote:
> When displaying a nexthop group made up of other nexthops, the display
> line shows this when you have additional data at the end:
> 
> id 42 group 43/44/45/46/47/48/49/50/51/52/53/54/55/56/57/58/59/60/61/62/63/64/65/66/67/68/69/70/71/72/73/74proto zebra
> 
> Modify code so that it shows:
> 
> id 42 group 43/44/45/46/47/48/49/50/51/52/53/54/55/56/57/58/59/60/61/62/63/64/65/66/67/68/69/70/71/72/73/74 proto zebra
> 
> Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
> ---
>  ip/ipnexthop.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/ip/ipnexthop.c b/ip/ipnexthop.c
> index 97f09e74..f35aab52 100644
> --- a/ip/ipnexthop.c
> +++ b/ip/ipnexthop.c
> @@ -186,6 +186,7 @@ static void print_nh_group(FILE *fp, const struct rtattr *grps_attr)
>  
>  		close_json_object();
>  	}
> +	print_string(PRINT_FP, NULL, "%s", " ");
>  	close_json_array(PRINT_JSON, NULL);
>  }
>  
> 

Looks right to me:
Reviewed-by: David Ahern <dsahern@gmail.com>

Stephen: this should go through your tree.
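
The effect of the one-line change can be shown outside iproute2. The following is a minimal stand-alone sketch, not the iproute2 code: `format_nh_group()` and its parameters are hypothetical names chosen for illustration. It builds the "group a/b/c " text with the trailing space, so that whatever field is printed next does not run into the last id.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper (not part of iproute2): writes "group a/b/c " into
 * buf, including the trailing space that separates the list from the next
 * field. Returns the number of characters written, or -1 if buf is too
 * small. */
static int format_nh_group(char *buf, size_t len,
			   const unsigned int *ids, size_t n)
{
	size_t used;
	int ret;

	assert(buf != NULL);

	ret = snprintf(buf, len, "group ");
	if (ret < 0 || (size_t)ret >= len)
		return -1;
	used = (size_t)ret;

	for (size_t i = 0; i < n; i++) {
		/* ids are joined with '/', with no separator before the first */
		ret = snprintf(buf + used, len - used, "%s%u",
			       i ? "/" : "", ids[i]);
		if (ret < 0 || (size_t)ret >= len - used)
			return -1;
		used += (size_t)ret;
	}

	/* The separator that the patch adds after the group list. */
	ret = snprintf(buf + used, len - used, " ");
	if (ret < 0 || (size_t)ret >= len - used)
		return -1;
	return (int)(used + (size_t)ret);
}
```

Appending "proto zebra" to the resulting buffer then yields the second form quoted in the commit message rather than the first.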


* Re: [PATCH v3 net-next 0/3] net: batched receive in GRO path
From: Eric Dumazet @ 2019-08-12 18:20 UTC
  To: Ioana Ciocoi Radulescu, Edward Cree
  Cc: David Miller, netdev, Eric Dumazet,
	linux-net-drivers@solarflare.com
In-Reply-To: <AM0PR04MB4994A035C6121DC13C0EFBB194D30@AM0PR04MB4994.eurprd04.prod.outlook.com>



On 8/12/19 7:51 PM, Ioana Ciocoi Radulescu wrote:
>> -----Original Message-----
>> From: Edward Cree <ecree@solarflare.com>
>> Sent: Friday, August 9, 2019 8:32 PM
>> To: Ioana Ciocoi Radulescu <ruxandra.radulescu@nxp.com>
>> Cc: David Miller <davem@davemloft.net>; netdev <netdev@vger.kernel.org>;
>> Eric Dumazet <eric.dumazet@gmail.com>; linux-net-drivers@solarflare.com
>> Subject: Re: [PATCH v3 net-next 0/3] net: batched receive in GRO path
>>
>> On 09/08/2019 18:14, Ioana Ciocoi Radulescu wrote:
>>> Hi Edward,
>>>
>>> I'm probably missing a lot of context here, but is there a reason
>>> this change targets only the napi_gro_frags() path and not the
>>> napi_gro_receive() one?
>>> I'm trying to understand what drivers that don't call napi_gro_frags()
>>> should do in order to benefit from this batching feature.
>> The sfc driver (which is what I have lots of hardware for, so I can
>>  test it) uses napi_gro_frags().
>> It should be possible to do a similar patch to napi_gro_receive(),
>>  if someone wants to put in the effort of writing and testing it.
> 
> Rather tricky, since I'm not really familiar with GRO internals and
> probably don't understand all the implications of such a change :-/
> Any pointers to what I should pay attention to/sensitive areas that
> need extra care?
> 
>> However, there are many more callers, so more effort required to
>>  make sure none of them care whether the return value is GRO_DROP
>>  or GRO_NORMAL (since the listified version cannot give that
>>  indication).
> 
> At a quick glance, there's only one driver that looks at the return
> value of napi_gro_receive (drivers/net/ethernet/socionext/netsec.c),
> and it only updates interface stats based on it.
> 
>> Also, the guidance from Eric is that drivers seeking high performance
>>  should use napi_gro_frags(), as this allows GRO to recycle the SKB.
> 
> But this guidance is for GRO-able frames only, right? If I try to use
> napi_gro_frags() indiscriminately on the Rx path, I get a big
> performance penalty in some cases - e.g. forwarding of non-TCP
> single buffer frames.

How big is big?

You cannot win all the time.

Some designs (or optimizations) target the most common case;
they might hurt some other use cases.

> 
> On the other hand, Eric shot down my attempt to differentiate between
> TCP and non-TCP frames inside the driver (see 
> https://patchwork.ozlabs.org/patch/1135817/#2222236), so I'm not
> really sure what's the recommended approach here?

If GRO is not good enough for non-TCP buffer frames, please make the change in GRO,
or document that disabling GRO might help some setups.

We do not want each driver to implement its own logic; that would be a
maintenance nightmare.

GRO can aggregate non-TCP frames (say if you add any encapsulation over TCP),
with a very significant gain, so detecting if an incoming frame is a 'TCP packet'
in the driver would be a serious problem if the traffic is 100% SIT for example.



* Re: [PATCHv2 net 0/2] Add netdev_level_ratelimited to avoid netdev msg flush
From: David Miller @ 2019-08-12 18:27 UTC
  To: tlfalcon; +Cc: liuhangbin, netdev, joe
In-Reply-To: <9bb8e9af-4d9b-7c16-f58d-e299b1f30007@linux.ibm.com>

From: Thomas Falcon <tlfalcon@linux.ibm.com>
Date: Mon, 12 Aug 2019 10:56:39 -0500

> Hi, thanks for reporting this. I was able to recreate this on my own
> system. The virtual ethernet's multicast filter list size is limited,
> and the driver will check that there is available space before adding
> entries.  The problem is that the size is encoded as big endian, but
> the driver does not convert it for little endian systems after
> retrieving it from the device tree.  As a result the driver is
> requesting more than the hypervisor can allow and getting this error
> in reply. I will submit a patch to correct this soon.

This is 1,000 times better than just trying to make the warning message
go away, thanks Thomas!


* Re: Error when loading BPF_CGROUP_INET_EGRESS program with bpftool
From: Andrii Nakryiko @ 2019-08-12 18:27 UTC
  To: Fejes Ferenc; +Cc: netdev@vger.kernel.org
In-Reply-To: <CAAej5NbkQDpDXEtsROmLmNidSP8qN3VRE56s3z91zHw9XjtNZA@mail.gmail.com>

On Mon, Aug 12, 2019 at 1:59 AM Fejes Ferenc <fejes@inf.elte.hu> wrote:
>
> Greetings!
>
> I found a strange error when I tried to load a BPF_CGROUP_INET_EGRESS
> prog with bpftool. Loading the same program from C code with
> bpf_prog_load_xattr works without problem.
>
> The error message I got:
> bpftool prog loadall hbm_kern.o /sys/fs/bpf/hbm type cgroup/skb

You need "cgroup_skb/egress" instead of "cgroup/skb" (or just try
dropping it; bpftool will then try to guess the type from the program's
section name, which would be correct in this case).

> libbpf: load bpf program failed: Invalid argument
> libbpf: -- BEGIN DUMP LOG ---
> libbpf:
> ; return ALLOW_PKT | REDUCE_CW;
> 0: (b7) r0 = 3
> 1: (95) exit
> At program exit the register R0 has value (0x3; 0x0) should have been
> in (0x0; 0x1)
> processed 2 insns (limit 1000000) max_states_per_insn 0 total_states 0
> peak_states 0 mark_read 0
>
> libbpf: -- END LOG --
> libbpf: failed to load program 'cgroup_skb/egress'
> libbpf: failed to load object 'hbm_kern.o'
> Error: failed to load object file
>
>
> My environment: 5.3-rc3 / net-next master (both producing the error).
> Libbpf and bpftool installed from the kernel source (cleaned and
> reinstalled when I tried a new kernel). I compiled the program with
> Clang 8, on Ubuntu 19.10 server image, the source:
>
> #include <linux/bpf.h>
> #include "bpf_helpers.h"
>
> #define DROP_PKT        0
> #define ALLOW_PKT       1
> #define REDUCE_CW       2
>
> SEC("cgroup_skb/egress")
> int hbm(struct __sk_buff *skb)
> {
>         return ALLOW_PKT | REDUCE_CW;
> }
> char _license[] SEC("license") = "GPL";
>
>
> I also tried to trace down the bug with gdb. It seems like the
> section_names array in libbpf.c filled with garbage, especially the

I did the same; section_names appears to be correct, so I'm not sure what
was going on in your case. The problem is that "cgroup/skb", which you
provided on the command line, overrides this section name and forces
bpftool to guess the program type as BPF_CGROUP_INET_INGRESS, which
restricts return codes to just 0 or 1, while for
BPF_CGROUP_INET_EGRESS it is [0, 3].

> expected_attach_type fields (in my case, this contains
> BPF_CGROUP_INET_INGRESS instead of BPF_CGROUP_INET_EGRESS).
>
> Thanks!
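
The verifier message quoted above prints register state as (value; mask) pairs, where bits set in the mask are unknown. The following user-space sketch is modelled on the kernel's tristate-number containment check and is meant only to illustrate why R0 = 3 is rejected for ingress and accepted for egress. Writing the egress range as (0x0; 0x3) is an assumption derived from the [0, 3] interval mentioned in the reply.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Tristate number: bits set in mask are unknown, all others equal value. */
struct tnum {
	uint64_t value;
	uint64_t mask;
};

/* True if every concrete value that b can take is also allowed by a. */
static bool tnum_in(struct tnum a, struct tnum b)
{
	/* In a well-formed tnum, value bits under the mask are zero. */
	assert((a.value & a.mask) == 0 && (b.value & b.mask) == 0);

	if (b.mask & ~a.mask)
		return false;	/* b has unknown bits where a is fixed */
	b.value &= ~a.mask;	/* ignore bits that a leaves open */
	return a.value == b.value;
}
```

With the allowed range (0x0; 0x1) and R0 = (0x3; 0x0), the check fails because bit 1 is set in R0 but fixed to zero in the range. With (0x0; 0x3) both low bits are open and the same R0 passes.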


* Re: [PATCH net-next] r8169: make use of xmit_more
From: Heiner Kallweit @ 2019-08-12 18:38 UTC
  To: Holger Hoffstätte, Eric Dumazet
  Cc: Realtek linux nic maintainers, David Miller,
	netdev@vger.kernel.org, Sander Eikelenboom
In-Reply-To: <06438520-1902-bc7c-7bb2-015dfcdf5457@applied-asynchrony.com>

On 12.08.2019 11:59, Holger Hoffstätte wrote:
> On 8/9/19 10:52 AM, Holger Hoffstätte wrote:
>> On 8/9/19 10:25 AM, Eric Dumazet wrote:
>> (snip)
>>>>
>>>> So that didn't take long - got another timeout this morning during some
>>>> random light usage, despite sg/tso being disabled this time.
>>>> Again the only common element is the xmit_more patch. :(
>>>> Not sure whether you want to revert this right away or wait for 5.4-rc1
>>>> feedback. Maybe this too is chipset-specific?
>>>>
>>>>> Thanks a lot for the analysis and testing. Then I'll submit the disabling
>>>>> of SG on RTL8168evl (on your behalf), independent of whether it fixes
>>>>> the timeout issue.
>>>>
>>>> Got it, thanks!
>>>>
>>>> Holger
>>>
>>> I would try this fix maybe ?
>>>
>>> diff --git a/drivers/net/ethernet/realtek/r8169_main.c
>>> b/drivers/net/ethernet/realtek/r8169_main.c
>>> index b2a275d8504cf099cff738f2f7554efa9658fe32..e77628813daba493ad50dab9ac1e3703e38b560c
>>> 100644
>>> --- a/drivers/net/ethernet/realtek/r8169_main.c
>>> +++ b/drivers/net/ethernet/realtek/r8169_main.c
>>> @@ -5691,6 +5691,7 @@ static netdev_tx_t rtl8169_start_xmit(struct sk_buff *skb,
>>>                   */
>>>                  smp_wmb();
>>>                  netif_stop_queue(dev);
>>> +               door_bell = true;
>>>          }
>>>
>>>          if (door_bell)
>>>
>>
>> Thanks Eric, I'll give that a try and see how it fares over the next few days.
>> It suspiciously looks like it could help..
> 
> Good news everyone!
> 
> After three days non-stop action between two machines and hundreds of GBs
> pushed back and forth: not a single timeout or hiccup. Nice! \o/
> Eric, please send this as a proper patch for -next. Feel free to add my
> Tested-by.
> 
Thanks for the feedback! I can submit the fix with Eric's "Suggested-by".

> cheers
> Holger
> 
Heiner
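
The reasoning behind Eric's one-liner can be modelled without the driver. This is a simplified sketch under stated assumptions, not the r8169 code: `struct tx_decision` and `tx_decide()` are hypothetical names, and byte queue limits are ignored. It shows why a deferred doorbell has to be rung as soon as the queue is stopped.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the decision taken at the end of a start_xmit
 * routine. */
struct tx_decision {
	bool stop_queue;	/* ring too full to accept another packet */
	bool door_bell;		/* tell the NIC to fetch descriptors now */
};

static struct tx_decision tx_decide(bool xmit_more, unsigned int free_slots,
				    unsigned int slots_needed_next)
{
	struct tx_decision d;

	assert(slots_needed_next > 0);

	/* Defer the doorbell while the stack promises more packets. */
	d.door_bell = !xmit_more;

	d.stop_queue = free_slots < slots_needed_next;
	if (d.stop_queue) {
		/* The promised follow-up packet cannot be queued once the
		 * queue is stopped, so nothing would ring the doorbell
		 * later. Ring it now. */
		d.door_bell = true;
	}
	return d;
}
```

Without the assignment inside the `if`, the combination of xmit_more set and a full ring leaves descriptors queued but never announced to the hardware, which would be consistent with the sporadic timeouts reported in this thread.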


* [RESEND][PATCH v3 bpf-next] btf: expose BTF info through sysfs
From: Andrii Nakryiko @ 2019-08-12 18:39 UTC
  To: bpf, netdev, ast, daniel; +Cc: andrii.nakryiko, kernel-team, Andrii Nakryiko

Make .BTF section allocated and expose its contents through sysfs.

/sys/kernel/btf directory is created to contain all the BTFs present
inside kernel. Currently there is only kernel's main BTF, represented as
/sys/kernel/btf/kernel file. Once kernel modules' BTFs are supported,
each module will expose its BTF as /sys/kernel/btf/<module-name> file.

Current approach relies on a few pieces coming together:
1. pahole is used to take almost final vmlinux image (modulo .BTF and
   kallsyms) and generate .BTF section by converting DWARF info into
   BTF. This section is not allocated and not mapped to any segment,
   though, so is not yet accessible from inside kernel at runtime.
2. objcopy dumps .BTF contents into a binary file and subsequently
   converts the binary file into a linkable object file with automatically
   generated symbols _binary__btf_kernel_bin_start and
   _binary__btf_kernel_bin_end, pointing to start and end, respectively,
   of BTF raw data.
3. final vmlinux image is generated by linking this object file (and
   kallsyms, if necessary). sysfs_btf.c then creates
   /sys/kernel/btf/kernel file and exposes embedded BTF contents through
   it. This allows, e.g., libbpf and bpftool to access BTF info at a
   well-known location, without resorting to searching for vmlinux image
   on disk (location of which is not standardized and vmlinux image
   might not be even available in some scenarios, e.g., inside qemu
   during testing).

Alternative approach using .incbin assembler directive to embed BTF
contents directly was attempted but didn't work, because sysfs_btf.o is
not re-compiled during link-vmlinux.sh stage. This is required, though,
to update embedded BTF data (initially empty data is embedded, then
pahole generates BTF info and we need to regenerate sysfs_btf.o with
updated contents, but it's too late at that point).

If BTF couldn't be generated due to missing or too old pahole,
sysfs_btf.c handles that gracefully by detecting that
_binary__btf_kernel_bin_start (weak symbol) is 0 and not creating
/sys/kernel/btf at all.

v2->v3:
- added Documentation/ABI/testing/sysfs-kernel-btf (Greg K-H);
- created proper kobject (btf_kobj) for btf directory (Greg K-H);
- undo v2 change of reusing vmlinux, as it causes extra kallsyms pass
  due to initially missing _binary__btf_kernel_bin_{start/end} symbols;

v1->v2:
- allow kallsyms stage to re-use vmlinux generated by gen_btf();

Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
---

Resending with shorter CC list as it seems vger blocked my patch.
Added Greg's Reviewed-by, though.

 Documentation/ABI/testing/sysfs-kernel-btf | 17 +++++++
 kernel/bpf/Makefile                        |  3 ++
 kernel/bpf/sysfs_btf.c                     | 51 +++++++++++++++++++++
 scripts/link-vmlinux.sh                    | 52 ++++++++++++++--------
 4 files changed, 104 insertions(+), 19 deletions(-)
 create mode 100644 Documentation/ABI/testing/sysfs-kernel-btf
 create mode 100644 kernel/bpf/sysfs_btf.c

diff --git a/Documentation/ABI/testing/sysfs-kernel-btf b/Documentation/ABI/testing/sysfs-kernel-btf
new file mode 100644
index 000000000000..5390f8001f96
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-kernel-btf
@@ -0,0 +1,17 @@
+What:		/sys/kernel/btf
+Date:		Aug 2019
+KernelVersion:	5.5
+Contact:	bpf@vger.kernel.org
+Description:
+		Contains BTF type information and related data for kernel and
+		kernel modules.
+
+What:		/sys/kernel/btf/kernel
+Date:		Aug 2019
+KernelVersion:	5.5
+Contact:	bpf@vger.kernel.org
+Description:
+		Read-only binary attribute exposing kernel's own BTF type
+		information with description of all internal kernel types. See
+		Documentation/bpf/btf.rst for detailed description of format
+		itself.
diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile
index 29d781061cd5..e1d9adb212f9 100644
--- a/kernel/bpf/Makefile
+++ b/kernel/bpf/Makefile
@@ -22,3 +22,6 @@ obj-$(CONFIG_CGROUP_BPF) += cgroup.o
 ifeq ($(CONFIG_INET),y)
 obj-$(CONFIG_BPF_SYSCALL) += reuseport_array.o
 endif
+ifeq ($(CONFIG_SYSFS),y)
+obj-$(CONFIG_DEBUG_INFO_BTF) += sysfs_btf.o
+endif
diff --git a/kernel/bpf/sysfs_btf.c b/kernel/bpf/sysfs_btf.c
new file mode 100644
index 000000000000..092e63b9758b
--- /dev/null
+++ b/kernel/bpf/sysfs_btf.c
@@ -0,0 +1,51 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Provide kernel BTF information for introspection and use by eBPF tools.
+ */
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/kobject.h>
+#include <linux/init.h>
+#include <linux/sysfs.h>
+
+/* See scripts/link-vmlinux.sh, gen_btf() func for details */
+extern char __weak _binary__btf_kernel_bin_start[];
+extern char __weak _binary__btf_kernel_bin_end[];
+
+static ssize_t
+btf_kernel_read(struct file *file, struct kobject *kobj,
+		struct bin_attribute *bin_attr,
+		char *buf, loff_t off, size_t len)
+{
+	memcpy(buf, _binary__btf_kernel_bin_start + off, len);
+	return len;
+}
+
+static struct bin_attribute bin_attr_btf_kernel __ro_after_init = {
+	.attr = { .name = "kernel", .mode = 0444, },
+	.read = btf_kernel_read,
+};
+
+static struct kobject *btf_kobj;
+
+static int __init btf_kernel_init(void)
+{
+	int err;
+
+	if (!_binary__btf_kernel_bin_start)
+		return 0;
+
+	btf_kobj = kobject_create_and_add("btf", kernel_kobj);
+	if (IS_ERR(btf_kobj)) {
+		err = PTR_ERR(btf_kobj);
+		btf_kobj = NULL;
+		return err;
+	}
+
+	bin_attr_btf_kernel.size = _binary__btf_kernel_bin_end -
+				   _binary__btf_kernel_bin_start;
+
+	return sysfs_create_bin_file(btf_kobj, &bin_attr_btf_kernel);
+}
+
+subsys_initcall(btf_kernel_init);
diff --git a/scripts/link-vmlinux.sh b/scripts/link-vmlinux.sh
index a7124f895b24..cb93832c6ad7 100755
--- a/scripts/link-vmlinux.sh
+++ b/scripts/link-vmlinux.sh
@@ -56,8 +56,8 @@ modpost_link()
 }
 
 # Link of vmlinux
-# ${1} - optional extra .o files
-# ${2} - output file
+# ${1} - output file
+# ${@:2} - optional extra .o files
 vmlinux_link()
 {
 	local lds="${objtree}/${KBUILD_LDS}"
@@ -70,9 +70,9 @@ vmlinux_link()
 			--start-group				\
 			${KBUILD_VMLINUX_LIBS}			\
 			--end-group				\
-			${1}"
+			${@:2}"
 
-		${LD} ${KBUILD_LDFLAGS} ${LDFLAGS_vmlinux} -o ${2}	\
+		${LD} ${KBUILD_LDFLAGS} ${LDFLAGS_vmlinux} -o ${1}	\
 			-T ${lds} ${objects}
 	else
 		objects="-Wl,--whole-archive			\
@@ -81,9 +81,9 @@ vmlinux_link()
 			-Wl,--start-group			\
 			${KBUILD_VMLINUX_LIBS}			\
 			-Wl,--end-group				\
-			${1}"
+			${@:2}"
 
-		${CC} ${CFLAGS_vmlinux} -o ${2}			\
+		${CC} ${CFLAGS_vmlinux} -o ${1}			\
 			-Wl,-T,${lds}				\
 			${objects}				\
 			-lutil -lrt -lpthread
@@ -92,23 +92,34 @@ vmlinux_link()
 }
 
 # generate .BTF typeinfo from DWARF debuginfo
+# ${1} - vmlinux image
+# ${2} - file to dump raw BTF data into
 gen_btf()
 {
-	local pahole_ver;
+	local pahole_ver
+	local bin_arch
 
 	if ! [ -x "$(command -v ${PAHOLE})" ]; then
 		info "BTF" "${1}: pahole (${PAHOLE}) is not available"
-		return 0
+		return 1
 	fi
 
 	pahole_ver=$(${PAHOLE} --version | sed -E 's/v([0-9]+)\.([0-9]+)/\1\2/')
 	if [ "${pahole_ver}" -lt "113" ]; then
 		info "BTF" "${1}: pahole version $(${PAHOLE} --version) is too old, need at least v1.13"
-		return 0
+		return 1
 	fi
 
-	info "BTF" ${1}
+	info "BTF" ${2}
+	vmlinux_link ${1}
 	LLVM_OBJCOPY=${OBJCOPY} ${PAHOLE} -J ${1}
+
+	# dump .BTF section into raw binary file to link with final vmlinux
+	bin_arch=$(${OBJDUMP} -f ${1} | grep architecture | \
+		cut -d, -f1 | cut -d' ' -f2)
+	${OBJCOPY} --dump-section .BTF=.btf.kernel.bin ${1} 2>/dev/null
+	${OBJCOPY} -I binary -O ${CONFIG_OUTPUT_FORMAT} -B ${bin_arch} \
+		--rename-section .data=.BTF .btf.kernel.bin ${2}
 }
 
 # Create ${2} .o file with all symbols from the ${1} object file
@@ -153,6 +164,7 @@ sortextable()
 # Delete output files in case of error
 cleanup()
 {
+	rm -f .btf.*
 	rm -f .tmp_System.map
 	rm -f .tmp_kallsyms*
 	rm -f .tmp_vmlinux*
@@ -215,6 +227,13 @@ ${MAKE} -f "${srctree}/scripts/Makefile.modpost" vmlinux.o
 info MODINFO modules.builtin.modinfo
 ${OBJCOPY} -j .modinfo -O binary vmlinux.o modules.builtin.modinfo
 
+btf_kernel_bin_o=""
+if [ -n "${CONFIG_DEBUG_INFO_BTF}" ]; then
+	if gen_btf .tmp_vmlinux.btf .btf.kernel.bin.o ; then
+		btf_kernel_bin_o=.btf.kernel.bin.o
+	fi
+fi
+
 kallsymso=""
 kallsyms_vmlinux=""
 if [ -n "${CONFIG_KALLSYMS}" ]; then
@@ -246,11 +265,11 @@ if [ -n "${CONFIG_KALLSYMS}" ]; then
 	kallsyms_vmlinux=.tmp_vmlinux2
 
 	# step 1
-	vmlinux_link "" .tmp_vmlinux1
+	vmlinux_link .tmp_vmlinux1 ${btf_kernel_bin_o}
 	kallsyms .tmp_vmlinux1 .tmp_kallsyms1.o
 
 	# step 2
-	vmlinux_link .tmp_kallsyms1.o .tmp_vmlinux2
+	vmlinux_link .tmp_vmlinux2 .tmp_kallsyms1.o ${btf_kernel_bin_o}
 	kallsyms .tmp_vmlinux2 .tmp_kallsyms2.o
 
 	# step 3
@@ -261,18 +280,13 @@ if [ -n "${CONFIG_KALLSYMS}" ]; then
 		kallsymso=.tmp_kallsyms3.o
 		kallsyms_vmlinux=.tmp_vmlinux3
 
-		vmlinux_link .tmp_kallsyms2.o .tmp_vmlinux3
-
+		vmlinux_link .tmp_vmlinux3 .tmp_kallsyms2.o ${btf_kernel_bin_o}
 		kallsyms .tmp_vmlinux3 .tmp_kallsyms3.o
 	fi
 fi
 
 info LD vmlinux
-vmlinux_link "${kallsymso}" vmlinux
-
-if [ -n "${CONFIG_DEBUG_INFO_BTF}" ]; then
-	gen_btf vmlinux
-fi
+vmlinux_link vmlinux "${kallsymso}" "${btf_kernel_bin_o}"
 
 if [ -n "${CONFIG_BUILDTIME_EXTABLE_SORT}" ]; then
 	info SORTEX vmlinux
-- 
2.17.1
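
A tool that reads the blob exposed by this patch may want to sanity-check it before parsing. The sketch below is an assumption-laden illustration, not libbpf code: `struct btf_header_prefix` and `btf_blob_looks_valid()` are hypothetical names, only the leading header fields are modelled, and the data is assumed to be in the byte order of the host. Reading /sys/kernel/btf/kernel into the buffer is left out so the example runs anywhere.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BTF_MAGIC 0xeB9F	/* magic value from the BTF format description */

/* Leading fields of the BTF header; the full header has further fields. */
struct btf_header_prefix {
	uint16_t magic;
	uint8_t  version;
	uint8_t  flags;
	uint32_t hdr_len;
};

/* Returns true if buf starts with a plausible native-endian BTF header. */
static bool btf_blob_looks_valid(const void *buf, size_t len)
{
	struct btf_header_prefix hdr;

	assert(buf != NULL);
	if (len < sizeof(hdr))
		return false;
	memcpy(&hdr, buf, sizeof(hdr));	/* avoids unaligned access */
	return hdr.magic == BTF_MAGIC &&
	       hdr.hdr_len >= sizeof(hdr) &&
	       hdr.hdr_len <= len;
}
```

A real consumer would go on to validate the version and the type and string section offsets; this check only rejects buffers that are obviously not BTF.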



* [PATCH net] ibmveth: Convert multicast list size for little-endian systems
From: Thomas Falcon @ 2019-08-12 17:43 UTC
  To: netdev; +Cc: liuhangbin, davem, joe, Thomas Falcon

The ibm,mac-address-filters property defines the maximum number of
addresses the hypervisor's multicast filter list can support. It is
encoded as a big-endian integer in the OF device tree, but the virtual
ethernet driver does not convert it for use by little-endian systems.
As a result, the driver is not behaving as it should on affected systems
when a large number of multicast addresses are assigned to the device.

Reported-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com>
---
 drivers/net/ethernet/ibm/ibmveth.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/ibm/ibmveth.c b/drivers/net/ethernet/ibm/ibmveth.c
index d654c23..b50a6cf 100644
--- a/drivers/net/ethernet/ibm/ibmveth.c
+++ b/drivers/net/ethernet/ibm/ibmveth.c
@@ -1645,7 +1645,7 @@ static int ibmveth_probe(struct vio_dev *dev, const struct vio_device_id *id)
 
 	adapter->vdev = dev;
 	adapter->netdev = netdev;
-	adapter->mcastFilterSize = *mcastFilterSize_p;
+	adapter->mcastFilterSize = be32_to_cpu(*mcastFilterSize_p);
 	adapter->pool_config = 0;
 
 	netif_napi_add(netdev, &adapter->napi, ibmveth_poll, 16);
-- 
1.8.3.1



* [PATCH net-next] r8169: fix sporadic transmit timeout issue
From: Heiner Kallweit @ 2019-08-12 18:47 UTC
  To: Realtek linux nic maintainers, David Miller
  Cc: netdev@vger.kernel.org, Eric Dumazet, Holger Hoffstätte

Holger reported sporadic transmit timeouts and it turned out that one
path misses ringing the doorbell. Fix was suggested by Eric.

Fixes: ef14358546b1 ("r8169: make use of xmit_more")
Suggested-by: Eric Dumazet <edumazet@google.com>
Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
---
 drivers/net/ethernet/realtek/r8169_main.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
index 641a34942..448047a32 100644
--- a/drivers/net/ethernet/realtek/r8169_main.c
+++ b/drivers/net/ethernet/realtek/r8169_main.c
@@ -5681,6 +5681,7 @@ static netdev_tx_t rtl8169_start_xmit(struct sk_buff *skb,
 		 */
 		smp_wmb();
 		netif_stop_queue(dev);
+		door_bell = true;
 	}
 
 	if (door_bell)
-- 
2.22.0



* Re: [PATCH v3 13/17] mvpp2: no need to check return value of debugfs_create functions
From: Greg Kroah-Hartman @ 2019-08-12 19:01 UTC
  To: Nick Desaulniers
  Cc: netdev, David S. Miller, Maxime Chevallier, Nathan Huckleberry
In-Reply-To: <CAKwvOdnP4OU9g_ebjnT=r1WcGRvsFsgv3NbguhFKOtt8RWNHwA@mail.gmail.com>

On Mon, Aug 12, 2019 at 10:55:51AM -0700, Nick Desaulniers wrote:
> On Sat, Aug 10, 2019 at 3:17 AM Greg Kroah-Hartman
> <gregkh@linuxfoundation.org> wrote:
> >
> > When calling debugfs functions, there is no need to ever check the
> > return value.  The function can work or not, but the code logic should
> > never do something different based on this.
> 
> Maybe adding this recommendation to the comment block above the
> definition of debugfs_create_dir() in fs/debugfs/inode.c would help
> prevent this issue in the future?  What failure means, and how to
> proceed can be tricky; more documentation can only help in this
> regard.

If it was there, would you have read it?  :)

I'll add it to the list for when I revamp the debugfs documentation that
is already in the kernel, that very few people actually read...

thanks,

greg k-h


* Re: [PATCH v4 14/14] dt-bindings: net: add bindings for ADIN PHY driver
From: Rob Herring @ 2019-08-12 19:02 UTC
  To: Alexandru Ardelean
  Cc: netdev, devicetree, linux-kernel@vger.kernel.org, David Miller,
	Mark Rutland, Florian Fainelli, Heiner Kallweit, Andrew Lunn
In-Reply-To: <20190812112350.15242-15-alexandru.ardelean@analog.com>

On Mon, Aug 12, 2019 at 5:24 AM Alexandru Ardelean
<alexandru.ardelean@analog.com> wrote:
>
> This change adds bindings for the Analog Devices ADIN PHY driver, detailing
> all the properties implemented by the driver.
>
> Signed-off-by: Alexandru Ardelean <alexandru.ardelean@analog.com>
> ---
>  .../devicetree/bindings/net/adi,adin.yaml     | 73 +++++++++++++++++++
>  MAINTAINERS                                   |  1 +
>  2 files changed, 74 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/net/adi,adin.yaml

Reviewed-by: Rob Herring <robh@kernel.org>


* libbpf distro packaging
From: Julia Kartseva @ 2019-08-12 19:04 UTC
  To: labbott@redhat.com, acme@kernel.org,
	debian-kernel@lists.debian.org, netdev@vger.kernel.org
  Cc: Andrii Nakryiko, Andrey Ignatov, Alexei Starovoitov,
	Yonghong Song, jolsa@kernel.org

I would like to bring up the libbpf publishing discussion started at [1].
At present, libbpf is built from the kernel tree, e.g. [2] for Debian and
[3] for Fedora, whereas the better way would be to build the package from
the github mirror. The advantages of the latter:
- Consistent, ABI matching versioning across distros
- The mirror has integration tests
- No need for a kernel tree to build a package
- Changes can be merged directly to github without waiting for them to be
merged through bpf-next -> net-next -> main
There is a PR introducing a libbpf.spec which can be used as a starting point: [4]
Any comments regarding the spec itself can be posted there.
In the future it may be used as a source of truth.
Please consider switching libbpf packaging to the github mirror instead
of the kernel tree.
Thanks

[1] https://lists.iovisor.org/g/iovisor-dev/message/1521
[2] https://packages.debian.org/sid/libbpf4.19
[3] http://rpmfind.net/linux/RPM/fedora/devel/rawhide/x86_64/l/libbpf-5.3.0-0.rc2.git0.1.fc31.x86_64.html
[4] https://github.com/libbpf/libbpf/pull/64




* [PATCH v2] net/mlx4_en: fix a memory leak bug
From: Wenwen Wang @ 2019-08-12 19:11 UTC
  To: Wenwen Wang
  Cc: Tariq Toukan, David S. Miller,
	open list:MELLANOX ETHERNET DRIVER (mlx4_en),
	open list:MELLANOX MLX4 core VPI driver, open list

In mlx4_en_config_rss_steer(), 'rss_map->indir_qp' is allocated through
kzalloc(). After that, mlx4_qp_alloc() is invoked to configure RSS
indirection. However, if mlx4_qp_alloc() fails, the allocated
'rss_map->indir_qp' is not deallocated, leading to a memory leak bug.

To fix the above issue, add the 'qp_alloc_err' label to free
'rss_map->indir_qp'.

Fixes: 4931c6ef04b4 ("net/mlx4_en: Optimized single ring steering")

Signed-off-by: Wenwen Wang <wenwen@cs.uga.edu>
---
 drivers/net/ethernet/mellanox/mlx4/en_rx.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
index 6c01314..db3552f 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -1187,7 +1187,7 @@ int mlx4_en_config_rss_steer(struct mlx4_en_priv *priv)
 	err = mlx4_qp_alloc(mdev->dev, priv->base_qpn, rss_map->indir_qp);
 	if (err) {
 		en_err(priv, "Failed to allocate RSS indirection QP\n");
-		goto rss_err;
+		goto qp_alloc_err;
 	}
 
 	rss_map->indir_qp->event = mlx4_en_sqp_event;
@@ -1241,6 +1241,7 @@ int mlx4_en_config_rss_steer(struct mlx4_en_priv *priv)
 		       MLX4_QP_STATE_RST, NULL, 0, 0, rss_map->indir_qp);
 	mlx4_qp_remove(mdev->dev, rss_map->indir_qp);
 	mlx4_qp_free(mdev->dev, rss_map->indir_qp);
+qp_alloc_err:
 	kfree(rss_map->indir_qp);
 	rss_map->indir_qp = NULL;
 rss_err:
-- 
2.7.4
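
The label placement in this patch follows the usual goto-based unwind pattern: each label undoes exactly the steps that completed before the failure. The sketch below is a generic user-space illustration, not the mlx4 code. `setup()`, `tracked_alloc()`, `tracked_free()` and the `fail_step` parameter are hypothetical and exist only to make the leak observable.

```c
#include <assert.h>
#include <stdlib.h>

static int live_allocs;	/* outstanding allocations, for illustration only */

static void *tracked_alloc(size_t size)
{
	void *p = calloc(1, size);

	if (p)
		live_allocs++;
	return p;
}

static void tracked_free(void *p)
{
	if (p)
		live_allocs--;
	free(p);
}

/* Hypothetical three-step setup; fail_step selects the step that fails
 * (0 means none). */
static int setup(void **out, int fail_step)
{
	void *qp;
	int err;

	assert(out != NULL);

	qp = tracked_alloc(64);			/* step 1: allocate */
	if (!qp)
		return -1;

	err = (fail_step == 2) ? -1 : 0;	/* step 2: register */
	if (err)
		goto alloc_err;		/* nothing registered, only free */

	err = (fail_step == 3) ? -1 : 0;	/* step 3: configure */
	if (err)
		goto unregister;

	*out = qp;
	return 0;

unregister:
	/* undoing step 2 would go here (a no-op in this sketch) */
alloc_err:
	tracked_free(qp);
	return err;
}
```

Jumping past `alloc_err` on a step 2 failure, as the pre-patch code did with `rss_err`, would skip the free and leave `live_allocs` at 1.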



* [PATCH net] net: phy: consider AN_RESTART status when reading link status
From: Heiner Kallweit @ 2019-08-12 19:20 UTC
  To: Andrew Lunn, Florian Fainelli, David Miller
  Cc: netdev@vger.kernel.org, Yonglong Liu

After configuring and restarting aneg we immediately try to read the
link status. On some systems the PHY may not yet have cleared the
"aneg complete" and "link up" bits, resulting in a false link-up
signal. See [0] for a report.
Clause 22 and 45 both require the PHY to keep the AN_RESTART
bit set until the PHY actually starts auto-negotiation.
Let's consider this in the generic functions for reading link status.
The commit marked as fixed is the first one where the patch applies
cleanly.

[0] https://marc.info/?t=156518400300003&r=1&w=2

Fixes: c1164bb1a631 ("net: phy: check PMAPMD link status only in genphy_c45_read_link")
Tested-by: Yonglong Liu <liuyonglong@huawei.com>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
---
 drivers/net/phy/phy-c45.c    | 14 ++++++++++++++
 drivers/net/phy/phy_device.c | 12 +++++++++++-
 2 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/drivers/net/phy/phy-c45.c b/drivers/net/phy/phy-c45.c
index b9d414578..58bb25e4a 100644
--- a/drivers/net/phy/phy-c45.c
+++ b/drivers/net/phy/phy-c45.c
@@ -219,6 +219,20 @@ int genphy_c45_read_link(struct phy_device *phydev)
 	int val, devad;
 	bool link = true;
 
+	if (phydev->c45_ids.devices_in_package & MDIO_DEVS_AN) {
+		val = phy_read_mmd(phydev, MDIO_MMD_AN, MDIO_CTRL1);
+		if (val < 0)
+			return val;
+
+		/* Autoneg is being started, therefore disregard current
+		 * link status and report link as down.
+		 */
+		if (val & MDIO_AN_CTRL1_RESTART) {
+			phydev->link = 0;
+			return 0;
+		}
+	}
+
 	while (mmd_mask && link) {
 		devad = __ffs(mmd_mask);
 		mmd_mask &= ~BIT(devad);
diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index b039632de..163295dbc 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -1741,7 +1741,17 @@ EXPORT_SYMBOL(genphy_aneg_done);
  */
 int genphy_update_link(struct phy_device *phydev)
 {
-	int status;
+	int status = 0, bmcr;
+
+	bmcr = phy_read(phydev, MII_BMCR);
+	if (bmcr < 0)
+		return bmcr;
+
+	/* Autoneg is being started, therefore disregard BMSR value and
+	 * report link as down.
+	 */
+	if (bmcr & BMCR_ANRESTART)
+		goto done;
 
 	/* The link state is latched low so that momentary link
 	 * drops can be detected. Do not double-read the status
-- 
2.22.0



* Re: [PATCH v3 13/17] mvpp2: no need to check return value of debugfs_create functions
From: Nick Desaulniers @ 2019-08-12 19:44 UTC
  To: Greg Kroah-Hartman
  Cc: netdev, David S. Miller, Maxime Chevallier, Nathan Huckleberry
In-Reply-To: <20190812190128.GB14905@kroah.com>

On Mon, Aug 12, 2019 at 12:01 PM Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
>
> On Mon, Aug 12, 2019 at 10:55:51AM -0700, Nick Desaulniers wrote:
> > On Sat, Aug 10, 2019 at 3:17 AM Greg Kroah-Hartman
> > <gregkh@linuxfoundation.org> wrote:
> > >
> > > When calling debugfs functions, there is no need to ever check the
> > > return value.  The function can work or not, but the code logic should
> > > never do something different based on this.
> >
> > Maybe adding this recommendation to the comment block above the
> > definition of debugfs_create_dir() in fs/debugfs/inode.c would help
> > prevent this issue in the future?  What failure means, and how to
> > proceed can be tricky; more documentation can only help in this
> > regard.
>
> If it was there, would you have read it?  :)

Absolutely; I went looking for it, which is why I haven't added my
reviewed by tag, because it's not clear from the existing comment
block how callers should handle the return value, particularly as you
describe in this commit's commit message.

>
> I'll add it to the list for when I revamp the debugfs documentation that
> is already in the kernel, that very few people actually read...
>
> thanks,
>
> greg k-h

-- 
Thanks,
~Nick Desaulniers

^ permalink raw reply

* Re: WARNING: ODEBUG bug in netdev_freemem (2)
From: Thomas Gleixner @ 2019-08-12 19:45 UTC (permalink / raw)
  To: syzbot
  Cc: alexander.h.duyck, amritha.nambiar, andriy.shevchenko, avagin,
	davem, dmitry.torokhov, dvyukov, eric.dumazet, f.fainelli, gregkh,
	idosch, jiri, kimbrownkd, linux-kernel, netdev, syzkaller-bugs,
	tyhicks, wanghai26, yuehaibing
In-Reply-To: <000000000000ea2c30058f901624@google.com>

On Wed, 7 Aug 2019, syzbot wrote:

> syzbot has found a reproducer for the following crash on:
> 
> HEAD commit:    13dfb3fa Merge git://git.kernel.org/pub/scm/linux/kernel/g..
> git tree:       net-next
> console output: https://syzkaller.appspot.com/x/log.txt?x=1671e69a600000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=d4cf1ffb87d590d7
> dashboard link: https://syzkaller.appspot.com/bug?extid=c4521ac872a4ccc3afec
> compiler:       gcc (GCC) 9.0.0 20181231 (experimental)
> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=170542c2600000

I can't reproduce that here. Can you please apply the patch from:

  https://lore.kernel.org/lkml/alpine.DEB.2.21.1906241920540.32342@nanos.tec.linutronix.de

and try to reproduce with that applied? That should give us more
information about the actual delayed work.

Thanks,

	tglx

^ permalink raw reply

* Re: [PATCH v3 13/17] mvpp2: no need to check return value of debugfs_create functions
From: Greg Kroah-Hartman @ 2019-08-12 19:51 UTC (permalink / raw)
  To: Nick Desaulniers
  Cc: netdev, David S. Miller, Maxime Chevallier, Nathan Huckleberry
In-Reply-To: <CAKwvOdkWzr5fu3v0KR2XXj0dqCZki=JOoMft9SMjs+XmZ8HpUg@mail.gmail.com>

On Mon, Aug 12, 2019 at 12:44:36PM -0700, Nick Desaulniers wrote:
> On Mon, Aug 12, 2019 at 12:01 PM Greg Kroah-Hartman
> <gregkh@linuxfoundation.org> wrote:
> >
> > On Mon, Aug 12, 2019 at 10:55:51AM -0700, Nick Desaulniers wrote:
> > > On Sat, Aug 10, 2019 at 3:17 AM Greg Kroah-Hartman
> > > <gregkh@linuxfoundation.org> wrote:
> > > >
> > > > When calling debugfs functions, there is no need to ever check the
> > > > return value.  The function can work or not, but the code logic should
> > > > never do something different based on this.
> > >
> > > Maybe adding this recommendation to the comment block above the
> > > definition of debugfs_create_dir() in fs/debugfs/inode.c would help
> > > prevent this issue in the future?  What failure means, and how to
> > > proceed can be tricky; more documentation can only help in this
> > > regard.
> >
> > If it was there, would you have read it?  :)
> 
> Absolutely; I went looking for it, which is why I haven't added my
> reviewed by tag, because it's not clear from the existing comment
> block how callers should handle the return value, particularly as you
> describe in this commit's commit message.

Ok, fair enough, I'll update the documentation soon, thanks.

greg k-h

^ permalink raw reply

* Re: [PATCH v4 7/9] mfd: ioc3: Add driver for SGI IOC3 chip
From: Jakub Kicinski @ 2019-08-12 19:52 UTC (permalink / raw)
  To: Thomas Bogendoerfer
  Cc: Ralf Baechle, Paul Burton, James Hogan, Dmitry Torokhov,
	Lee Jones, David S. Miller, Srinivas Kandagatla, Alessandro Zummo,
	Alexandre Belloni, Greg Kroah-Hartman, Jiri Slaby,
	Evgeniy Polyakov, linux-mips, linux-kernel, linux-input, netdev,
	linux-rtc, linux-serial
In-Reply-To: <20190811093212.88635fb1a6c796a073ec71ff@suse.de>

On Sun, 11 Aug 2019 09:32:12 +0200, Thomas Bogendoerfer wrote:
> > Also please don't use stdint types in the kernel, please try checkpatch
> > to catch coding style issues.  
> 
> my patch already reduces them and checkpatch only warns about usage of printk
> for the network part. Changing that to dev_warn/dev_err in the mfd patch didn't
> seem the right thing to do. As I'm splitting the conversion patch into a few
> steps I could also replace the printks.

Thanks for looking into it. I was referring to the use of uint32_t
instead of u32. Perhaps checkpatch has to be motivated with the --strict
option to point those out?

^ permalink raw reply

* Re: [PATCH 0/7] Add definition for the number of standard PCI BARs
From: Bjorn Helgaas @ 2019-08-12 20:01 UTC (permalink / raw)
  To: Denis Efremov
  Cc: Sebastian Ott, Gerald Schaefer, H. Peter Anvin,
	Giuseppe Cavallaro, Alexandre Torgue, Matt Porter,
	Alexandre Bounine, Peter Jones, Bartlomiej Zolnierkiewicz,
	Cornelia Huck, Alex Williamson, kvm, linux-fbdev, netdev, x86,
	linux-s390, linux-pci, linux-kernel
In-Reply-To: <20190811150802.2418-1-efremov@linux.com>

On Sun, Aug 11, 2019 at 06:07:55PM +0300, Denis Efremov wrote:
> Code that iterates over all standard PCI BARs typically uses
> PCI_STD_RESOURCE_END, but this is error-prone because it requires
> "i <= PCI_STD_RESOURCE_END" rather than something like
> "i < PCI_STD_NUM_BARS". We could add such a definition and use it the same
> way PCI_SRIOV_NUM_BARS is used. There is already the definition
> PCI_BAR_COUNT for s390 only. Thus, this patchset introduces it globally.
> 
> The patch is split into 7 parts for different drivers/subsystems for
> easy readability.

This looks good.  I can take all these together, since they all depend
on the first patch.  I have a few comments on the individual patches.

> Denis Efremov (7):
>   PCI: Add define for the number of standard PCI BARs
>   s390/pci: Replace PCI_BAR_COUNT with PCI_STD_NUM_BARS
>   x86/PCI: Use PCI_STD_NUM_BARS in loops instead of PCI_STD_RESOURCE_END
>   PCI/net: Use PCI_STD_NUM_BARS in loops instead of PCI_STD_RESOURCE_END
>   rapidio/tsi721: use PCI_STD_NUM_BARS in loops instead of
>     PCI_STD_RESOURCE_END
>   efifb: Use PCI_STD_NUM_BARS in loops instead of PCI_STD_RESOURCE_END
>   vfio_pci: Use PCI_STD_NUM_BARS in loops instead of
>     PCI_STD_RESOURCE_END
> 
>  arch/s390/include/asm/pci.h                      |  5 +----
>  arch/s390/include/asm/pci_clp.h                  |  6 +++---
>  arch/s390/pci/pci.c                              | 16 ++++++++--------
>  arch/s390/pci/pci_clp.c                          |  6 +++---
>  arch/x86/pci/common.c                            |  2 +-
>  drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c |  4 ++--
>  drivers/net/ethernet/synopsys/dwc-xlgmac-pci.c   |  2 +-
>  drivers/pci/quirks.c                             |  2 +-
>  drivers/rapidio/devices/tsi721.c                 |  2 +-
>  drivers/vfio/pci/vfio_pci.c                      |  4 ++--
>  drivers/vfio/pci/vfio_pci_config.c               |  2 +-
>  drivers/vfio/pci/vfio_pci_private.h              |  4 ++--
>  drivers/video/fbdev/efifb.c                      |  2 +-
>  include/linux/pci.h                              |  2 +-
>  include/uapi/linux/pci_regs.h                    |  1 +
>  15 files changed, 29 insertions(+), 31 deletions(-)
> 
> -- 
> 2.21.0
> 
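
The off-by-one hazard described in the cover letter can be sketched in
user-space C. The constant values below are assumptions based on six
standard BARs with indices 0..5, and the two count_bars_* helpers are
hypothetical names used only for illustration; this is not code from
the series.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed values: six standard BARs, indices 0..5. */
#define PCI_STD_NUM_BARS	6
#define PCI_STD_RESOURCE_END	(PCI_STD_NUM_BARS - 1)

/* Hypothetical helper: inclusive bound, easy to mistype as '<'. */
static int count_bars_inclusive(const unsigned long *len)
{
	int i, n = 0;

	assert(len != NULL);
	for (i = 0; i <= PCI_STD_RESOURCE_END; i++)
		if (len[i] != 0)
			n++;
	return n;
}

/* Hypothetical helper: exclusive bound, the usual C loop idiom. */
static int count_bars_exclusive(const unsigned long *len)
{
	int i, n = 0;

	assert(len != NULL);
	for (i = 0; i < PCI_STD_NUM_BARS; i++)
		if (len[i] != 0)
			n++;
	return n;
}
```

Both helpers visit the same six slots; the second form simply matches
the "i < N" pattern most readers expect, which is the point of the
proposed definition.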

^ permalink raw reply

* Re: [PATCH 4/7] PCI/net: Use PCI_STD_NUM_BARS in loops instead of PCI_STD_RESOURCE_END
From: Bjorn Helgaas @ 2019-08-12 20:02 UTC (permalink / raw)
  To: Denis Efremov
  Cc: Giuseppe Cavallaro, Alexandre Torgue, netdev, linux-pci,
	linux-kernel
In-Reply-To: <20190811150802.2418-5-efremov@linux.com>

The subject can be simply:

  <prefix>: Loop using PCI_STD_NUM_BARS

to keep them a little shorter so "git log --oneline" doesn't wrap.

On Sun, Aug 11, 2019 at 06:08:00PM +0300, Denis Efremov wrote:
> This patch refactors the loop condition scheme from
> 'i <= PCI_STD_RESOURCE_END' to 'i < PCI_STD_NUM_BARS'.

  Refactor loops to use 'i < PCI_STD_NUM_BARS' instead of 'i <=
  PCI_STD_RESOURCE_END'.

See https://chris.beams.io/posts/git-commit/

> Signed-off-by: Denis Efremov <efremov@linux.com>
> ---
>  drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c | 4 ++--
>  drivers/net/ethernet/synopsys/dwc-xlgmac-pci.c   | 2 +-
>  2 files changed, 3 insertions(+), 3 deletions(-)

This patch touches two unrelated drivers and should be split up.
When you do that, pay attention to the convention for commit log
prefixes, e.g.,

  $ git log --oneline drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c
  37e9c087c814 stmmac: pci: Fix typo in IOT2000 comment
  d4a62ea411f9 stmmac: pci: Use pci_dev_id() helper
  e0c1d14a1a32 stmmac: pci: Adjust IOT2000 matching

  $ git log --oneline drivers/net/ethernet/synopsys/dwc-xlgmac-pci.c
  ea8c1c642ea5 net: dwc-xlgmac: declaration of dual license in headers
  65e0ace2c5cd net: dwc-xlgmac: Initial driver for DesignWare Enterprise Ethernet

> diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c
> index 86f9c07a38cf..cfe496cdd78b 100644
> --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c
> +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c
> @@ -258,7 +258,7 @@ static int stmmac_pci_probe(struct pci_dev *pdev,
>  	}
>  
>  	/* Get the base address of device */
> -	for (i = 0; i <= PCI_STD_RESOURCE_END; i++) {
> +	for (i = 0; i < PCI_STD_NUM_BARS; i++) {
>  		if (pci_resource_len(pdev, i) == 0)
>  			continue;
>  		ret = pcim_iomap_regions(pdev, BIT(i), pci_name(pdev));
> @@ -296,7 +296,7 @@ static void stmmac_pci_remove(struct pci_dev *pdev)
>  
>  	stmmac_dvr_remove(&pdev->dev);
>  
> -	for (i = 0; i <= PCI_STD_RESOURCE_END; i++) {
> +	for (i = 0; i < PCI_STD_NUM_BARS; i++) {
>  		if (pci_resource_len(pdev, i) == 0)
>  			continue;
>  		pcim_iounmap_regions(pdev, BIT(i));
> diff --git a/drivers/net/ethernet/synopsys/dwc-xlgmac-pci.c b/drivers/net/ethernet/synopsys/dwc-xlgmac-pci.c
> index 386bafe74c3f..fa8604d7b797 100644
> --- a/drivers/net/ethernet/synopsys/dwc-xlgmac-pci.c
> +++ b/drivers/net/ethernet/synopsys/dwc-xlgmac-pci.c
> @@ -34,7 +34,7 @@ static int xlgmac_probe(struct pci_dev *pcidev, const struct pci_device_id *id)
>  		return ret;
>  	}
>  
> -	for (i = 0; i <= PCI_STD_RESOURCE_END; i++) {
> +	for (i = 0; i < PCI_STD_NUM_BARS; i++) {
>  		if (pci_resource_len(pcidev, i) == 0)
>  			continue;
>  		ret = pcim_iomap_regions(pcidev, BIT(i), XLGMAC_DRV_NAME);
> -- 
> 2.21.0
> 

^ permalink raw reply

* [PATCH net] netlink: Fix nlmsg_parse as a wrapper for strict message parsing
From: David Ahern @ 2019-08-12 20:07 UTC (permalink / raw)
  To: davem; +Cc: netdev, johannes.berg, edumazet, David Ahern

From: David Ahern <dsahern@gmail.com>

Eric reported a syzbot warning:

BUG: KMSAN: uninit-value in nh_valid_get_del_req+0x6f1/0x8c0 net/ipv4/nexthop.c:1510
CPU: 0 PID: 11812 Comm: syz-executor444 Not tainted 5.3.0-rc3+ #17
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x191/0x1f0 lib/dump_stack.c:113
 kmsan_report+0x162/0x2d0 mm/kmsan/kmsan_report.c:109
 __msan_warning+0x75/0xe0 mm/kmsan/kmsan_instr.c:294
 nh_valid_get_del_req+0x6f1/0x8c0 net/ipv4/nexthop.c:1510
 rtm_del_nexthop+0x1b1/0x610 net/ipv4/nexthop.c:1543
 rtnetlink_rcv_msg+0x115a/0x1580 net/core/rtnetlink.c:5223
 netlink_rcv_skb+0x431/0x620 net/netlink/af_netlink.c:2477
 rtnetlink_rcv+0x50/0x60 net/core/rtnetlink.c:5241
 netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline]
 netlink_unicast+0xf6c/0x1050 net/netlink/af_netlink.c:1328
 netlink_sendmsg+0x110f/0x1330 net/netlink/af_netlink.c:1917
 sock_sendmsg_nosec net/socket.c:637 [inline]
 sock_sendmsg net/socket.c:657 [inline]
 ___sys_sendmsg+0x14ff/0x1590 net/socket.c:2311
 __sys_sendmmsg+0x53a/0xae0 net/socket.c:2413
 __do_sys_sendmmsg net/socket.c:2442 [inline]
 __se_sys_sendmmsg+0xbd/0xe0 net/socket.c:2439
 __x64_sys_sendmmsg+0x56/0x70 net/socket.c:2439
 do_syscall_64+0xbc/0xf0 arch/x86/entry/common.c:297
 entry_SYSCALL_64_after_hwframe+0x63/0xe7

The root cause is nlmsg_parse calling __nla_parse which means the
header struct size is not checked.

nlmsg_parse should be a wrapper around __nlmsg_parse with
NL_VALIDATE_STRICT for the validate argument very much like
nlmsg_parse_deprecated is for NL_VALIDATE_LIBERAL.

Fixes: 3de6440354465 ("netlink: re-add parse/validate functions in strict mode")
Reported-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
---
 include/net/netlink.h | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/include/net/netlink.h b/include/net/netlink.h
index e4650e5b64a1..b140c8f1be22 100644
--- a/include/net/netlink.h
+++ b/include/net/netlink.h
@@ -684,9 +684,8 @@ static inline int nlmsg_parse(const struct nlmsghdr *nlh, int hdrlen,
 			      const struct nla_policy *policy,
 			      struct netlink_ext_ack *extack)
 {
-	return __nla_parse(tb, maxtype, nlmsg_attrdata(nlh, hdrlen),
-			   nlmsg_attrlen(nlh, hdrlen), policy,
-			   NL_VALIDATE_STRICT, extack);
+	return __nlmsg_parse(nlh, hdrlen, tb, maxtype, policy,
+			     NL_VALIDATE_STRICT, extack);
 }
 
 /**
-- 
2.11.0
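
The commit message above says the header struct size was not being
checked. A minimal user-space sketch of that kind of check follows,
assuming the wrapper rejects a message whose total length is shorter
than the netlink header plus the family header. The mock_* names are
hypothetical and the struct only mirrors the usual 16-byte netlink
header layout; this is not the kernel implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Mock of the netlink message header layout (16 bytes). */
struct mock_nlmsghdr {
	uint32_t nlmsg_len;	/* total message length, header included */
	uint16_t nlmsg_type;
	uint16_t nlmsg_flags;
	uint32_t nlmsg_seq;
	uint32_t nlmsg_pid;
};

/*
 * Hypothetical helper: refuse to look at attributes unless the message
 * is long enough to contain a family header of 'hdrlen' bytes.
 */
static int mock_check_header(const struct mock_nlmsghdr *nlh, uint32_t hdrlen)
{
	uint32_t need;

	assert(nlh != NULL);
	need = (uint32_t)sizeof(*nlh) + hdrlen;
	if (nlh->nlmsg_len < need)
		return -EINVAL;	/* truncated header: do not parse further */
	return 0;
}
```

Without such a check, code that reads fields of the family header after
parsing could touch bytes the sender never supplied, which is consistent
with the uninit-value report quoted above.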


^ permalink raw reply related

* Re: [PATCH 0/7] Add definition for the number of standard PCI BARs
From: Thomas Gleixner @ 2019-08-12 20:11 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Denis Efremov, Sebastian Ott, Gerald Schaefer, H. Peter Anvin,
	Giuseppe Cavallaro, Alexandre Torgue, Matt Porter,
	Alexandre Bounine, Peter Jones, Bartlomiej Zolnierkiewicz,
	Cornelia Huck, Alex Williamson, kvm, linux-fbdev, netdev, x86,
	linux-s390, linux-pci, linux-kernel
In-Reply-To: <20190812200134.GB11785@google.com>

On Mon, 12 Aug 2019, Bjorn Helgaas wrote:

> On Sun, Aug 11, 2019 at 06:07:55PM +0300, Denis Efremov wrote:
> > Code that iterates over all standard PCI BARs typically uses
> > PCI_STD_RESOURCE_END, but this is error-prone because it requires
> > "i <= PCI_STD_RESOURCE_END" rather than something like
> > "i < PCI_STD_NUM_BARS". We could add such a definition and use it the same
> > way PCI_SRIOV_NUM_BARS is used. There is already the definition
> > PCI_BAR_COUNT for s390 only. Thus, this patchset introduces it globally.
> > 
> > The patch is split into 7 parts for different drivers/subsystems for
> > easy readability.
> 
> This looks good.  I can take all these together, since they all depend
> on the first patch.  I have a few comments on the individual patches.
> 
> > Denis Efremov (7):
> >   PCI: Add define for the number of standard PCI BARs
> >   s390/pci: Replace PCI_BAR_COUNT with PCI_STD_NUM_BARS
> >   x86/PCI: Use PCI_STD_NUM_BARS in loops instead of PCI_STD_RESOURCE_END

Fine with me for the x86 part. That's your turf anyway :)

Thanks,

	tglx

^ permalink raw reply

* Re: 5.3-rc3-ish VM crash: RIP: 0010:tcp_trim_head+0x20/0xe0
From: Sander Eikelenboom @ 2019-08-12 20:17 UTC (permalink / raw)
  To: Eric Dumazet, netdev, linux-kernel
In-Reply-To: <4d803565-b716-42ab-1db8-3dcade91e939@gmail.com>

On 12/08/2019 19:56, Eric Dumazet wrote:
> 
> 
> On 8/12/19 2:50 PM, Sander Eikelenboom wrote:
>> L.S.,
>>
>> While testing a somewhere-after-5.3-rc3 kernel (which included the latest net merge (33920f1ec5bf47c5c0a1d2113989bdd9dfb3fae9),
>> one of my Xen VM's (which gets quite some network load) crashed.
>> See below for the stacktrace.
>>
>> Unfortunately I haven't got a clear trigger, so bisection doesn't seem to be an option at the moment. 
>> I haven't encountered this on 5.2, so it seems to be a regression against 5.2.
>>
>> Any ideas ?
>>
>> --
>> Sander
>>
>>
>> [16930.653595] general protection fault: 0000 [#1] SMP NOPTI
>> [16930.653624] CPU: 0 PID: 3275 Comm: rsync Not tainted 5.3.0-rc3-20190809-doflr+ #1
>> [16930.653657] RIP: 0010:tcp_trim_head+0x20/0xe0
>> [16930.653677] Code: 2e 0f 1f 84 00 00 00 00 00 90 41 54 41 89 d4 55 48 89 fd 53 48 89 f3 f6 46 7e 01 74 2f 8b 86 bc 00 00 00 48 03 86 c0 00 00 00 <8b> 40 20 66 83 f8 01 74 19 31 d2 31 f6 b9 20 0a 00 00 48 89 df e8
>> [16930.653741] RSP: 0000:ffffc90000003ad8 EFLAGS: 00010286
>> [16930.653762] RAX: fffe888005bf62c0 RBX: ffff8880115fb800 RCX: 000000008010000b
> 
> crash in " mov    0x20(%rax),%eax"   and RAX=fffe888005bf62c0 (not a valid kernel address)
> 
> Looks like a one-bit corruption, maybe.
> 
> Nothing really comes to mind between 5.2 and 5.3 that could explain this.

Hi Eric,

Hmm, it could be a rare coincidence, so that it just never occurred on pre-5.3 by chance.
Let's wait and see if it reoccurs, will report back if it does.

Thanks for your explanation.

--
Sander


>> [16930.653791] RDX: 00000000000005a0 RSI: ffff8880115fb800 RDI: ffff888016b00880
>> [16930.653819] RBP: ffff888016b00880 R08: 0000000000000001 R09: 0000000000000000
>> [16930.653848] R10: ffff88800ae00800 R11: 00000000bfe632e6 R12: 00000000000005a0
>> [16930.653875] R13: 0000000000000001 R14: 00000000bfe62d46 R15: 0000000000000004
>> [16930.653913] FS:  00007fe71fe2cb80(0000) GS:ffff88801f200000(0000) knlGS:0000000000000000
>> [16930.653943] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [16930.653965] CR2: 000055de0f3e7000 CR3: 0000000011f32000 CR4: 00000000000006f0
>> [16930.653993] Call Trace:
>> [16930.654005]  <IRQ>
>> [16930.654018]  tcp_ack+0xbb0/0x1230
>> [16930.654033]  tcp_rcv_established+0x2e8/0x630
>> [16930.654053]  tcp_v4_do_rcv+0x129/0x1d0
>> [16930.654070]  tcp_v4_rcv+0xac9/0xcb0
>> [16930.654088]  ip_protocol_deliver_rcu+0x27/0x1b0
>> [16930.654109]  ip_local_deliver_finish+0x3f/0x50
>> [16930.654128]  ip_local_deliver+0x4d/0xe0
>> [16930.654145]  ? ip_protocol_deliver_rcu+0x1b0/0x1b0
>> [16930.654163]  ip_rcv+0x4c/0xd0
>> [16930.654179]  __netif_receive_skb_one_core+0x79/0x90
>> [16930.654200]  netif_receive_skb_internal+0x2a/0xa0
>> [16930.654219]  napi_gro_receive+0xe7/0x140
>> [16930.654237]  xennet_poll+0x9be/0xae0
>> [16930.654254]  net_rx_action+0x136/0x340
>> [16930.654271]  __do_softirq+0xdd/0x2cf
>> [16930.654287]  irq_exit+0x7a/0xa0
>> [16930.654304]  xen_evtchn_do_upcall+0x27/0x40
>> [16930.654320]  xen_hvm_callback_vector+0xf/0x20
>> [16930.654339]  </IRQ>
>> [16930.654349] RIP: 0033:0x55de0d87db99
>> [16930.654364] Code: 00 00 48 89 7c 24 f8 45 39 fe 45 0f 42 fe 44 89 7c 24 f4 eb 09 0f 1f 40 00 83 e9 01 74 3e 89 f2 48 63 f8 4c 01 d2 44 38 1c 3a <75> 25 44 38 6c 3a ff 75 1e 41 0f b6 3c 24 40 38 3a 75 14 41 0f b6
>> [16930.654432] RSP: 002b:00007ffd5531eec8 EFLAGS: 00000a87 ORIG_RAX: ffffffffffffff0c
>> [16930.655004] RAX: 0000000000000002 RBX: 000055de0f3e8e50 RCX: 000000000000007f
>> [16930.655034] RDX: 000055de0f3dc2d2 RSI: 0000000000003492 RDI: 0000000000000002
>> [16930.655062] RBP: 0000000000007fff R08: 00000000000080ea R09: 00000000000001f0
>> [16930.655089] R10: 000055de0f3d8e40 R11: 0000000000000094 R12: 000055de0f3e0f2a
>> [16930.655116] R13: 0000000000000010 R14: 0000000000007f16 R15: 0000000000000080
>> [16930.655144] Modules linked in:
>> [16930.655200] ---[ end trace 533367c95501b645 ]---
>> [16930.655223] RIP: 0010:tcp_trim_head+0x20/0xe0
>> [16930.655243] Code: 2e 0f 1f 84 00 00 00 00 00 90 41 54 41 89 d4 55 48 89 fd 53 48 89 f3 f6 46 7e 01 74 2f 8b 86 bc 00 00 00 48 03 86 c0 00 00 00 <8b> 40 20 66 83 f8 01 74 19 31 d2 31 f6 b9 20 0a 00 00 48 89 df e8
>> [16930.655312] RSP: 0000:ffffc90000003ad8 EFLAGS: 00010286
>> [16930.655331] RAX: fffe888005bf62c0 RBX: ffff8880115fb800 RCX: 000000008010000b
>> [16930.655360] RDX: 00000000000005a0 RSI: ffff8880115fb800 RDI: ffff888016b00880
>> [16930.655387] RBP: ffff888016b00880 R08: 0000000000000001 R09: 0000000000000000
>> [16930.655414] R10: ffff88800ae00800 R11: 00000000bfe632e6 R12: 00000000000005a0
>> [16930.655441] R13: 0000000000000001 R14: 00000000bfe62d46 R15: 0000000000000004
>> [16930.655475] FS:  00007fe71fe2cb80(0000) GS:ffff88801f200000(0000) knlGS:0000000000000000
>> [16930.655502] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [16930.655525] CR2: 000055de0f3e7000 CR3: 0000000011f32000 CR4: 00000000000006f0
>> [16930.655553] Kernel panic - not syncing: Fatal exception in interrupt
>> [16930.655789] Kernel Offset: disabled
>>
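
The one-bit-corruption reading of RAX earlier in this message can be
checked with a few lines of C. The "intended" pointer value used below,
0xffff888005bf62c0, is an assumption: it is RAX with the top 16 bits set
to ffff, matching the ffff8880... prefix of the other pointers in the
register dump. The helper name is hypothetical.

```c
#include <stdint.h>

/* Count how many bit positions differ between two 64-bit values. */
static int bits_differing(uint64_t a, uint64_t b)
{
	uint64_t x = a ^ b;
	int n = 0;

	while (x) {
		x &= x - 1;	/* clear the lowest set bit */
		n++;
	}
	return n;
}
```

Called with the faulting RAX (0xfffe888005bf62c0) and the assumed
intended value, this returns 1: the two differ only in bit 48.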


^ permalink raw reply

* Re: [PATCH net] net: phy: consider AN_RESTART status when reading link status
From: Andrew Lunn @ 2019-08-12 20:24 UTC (permalink / raw)
  To: Heiner Kallweit
  Cc: Florian Fainelli, David Miller, netdev@vger.kernel.org,
	Yonglong Liu
In-Reply-To: <46efcf9f-0938-e017-706c-fb5a400f6fbb@gmail.com>

On Mon, Aug 12, 2019 at 09:20:02PM +0200, Heiner Kallweit wrote:
> After configuring and restarting aneg we immediately try to read the
> link status. On some systems the PHY may not yet have cleared the
> "aneg complete" and "link up" bits, resulting in a false link-up
> signal. See [0] for a report.
> Clause 22 and 45 both require the PHY to keep the AN_RESTART
> bit set until the PHY actually starts auto-negotiation.
> Let's consider this in the generic functions for reading link status.
> The commit marked as fixed is the first one where the patch applies
> cleanly.
> 
> [0] https://marc.info/?t=156518400300003&r=1&w=2
> 
> Fixes: c1164bb1a631 ("net: phy: check PMAPMD link status only in genphy_c45_read_link")
> Tested-by: Yonglong Liu <liuyonglong@huawei.com>
> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>

Reviewed-by: Andrew Lunn <andrew@lunn.ch>

    Andrew
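
For reference, the decision described in the quoted commit message can
be sketched as a pure function. This is a user-space mock, not the
patched kernel code: the bit values are what I believe linux/mii.h
defines for BMCR_ANRESTART and BMSR_LSTATUS, and mock_link_state is a
hypothetical name.

```c
#include <stdint.h>

/* Assumed to mirror linux/mii.h. */
#define BMCR_ANRESTART	0x0200	/* auto-negotiation restart requested */
#define BMSR_LSTATUS	0x0004	/* link status bit */

/*
 * Hypothetical helper: while the restart bit is still set, ignore the
 * status register and report link down; otherwise trust BMSR.
 */
static int mock_link_state(uint16_t bmcr, uint16_t bmsr)
{
	if (bmcr & BMCR_ANRESTART)
		return 0;
	return (bmsr & BMSR_LSTATUS) ? 1 : 0;
}
```

The point of the ordering is that a stale "link up" bit in the status
register cannot be reported while a restart is still pending.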

^ permalink raw reply

* Re: Error when loading BPF_CGROUP_INET_EGRESS program with bpftool
From: Fejes Ferenc @ 2019-08-12 20:48 UTC (permalink / raw)
  To: Andrii Nakryiko; +Cc: netdev@vger.kernel.org
In-Reply-To: <CAEf4BzZ27SnYkQ=psqxeWadLhnspojiJGQrGB0JRuPkP+GTiNQ@mail.gmail.com>

Thanks for the answer, I really appreciate it. I tried omitting
"cgroup/skb" to let libbpf guess the attach type, but I got the same
error. Really interesting, because I got the error
> libbpf: failed to load program 'cgroup_skb/egress'
which is weird because it shows that libbpf guesses the program type
correctly. So something is definitely wrong on my side - thank you for
verifying that - I will try to investigate it!

Ferenc
Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote (on Mon, Aug 12,
2019, 20:27):
>
> On Mon, Aug 12, 2019 at 1:59 AM Fejes Ferenc <fejes@inf.elte.hu> wrote:
> >
> > Greetings!
> >
> > I found a strange error when I tried to load a BPF_CGROUP_INET_EGRESS
> > prog with bpftool. Loading the same program from C code with
> > bpf_prog_load_xattr works without problem.
> >
> > The error message I got:
> > bpftool prog loadall hbm_kern.o /sys/fs/bpf/hbm type cgroup/skb
>
> You need "cgroup_skb/egress" instead of "cgroup/skb" (or try just
> dropping it, bpftool will try to guess the type from program's section
> name, which would be correct in this case).
>
> > libbpf: load bpf program failed: Invalid argument
> > libbpf: -- BEGIN DUMP LOG ---
> > libbpf:
> > ; return ALLOW_PKT | REDUCE_CW;
> > 0: (b7) r0 = 3
> > 1: (95) exit
> > At program exit the register R0 has value (0x3; 0x0) should have been
> > in (0x0; 0x1)
> > processed 2 insns (limit 1000000) max_states_per_insn 0 total_states 0
> > peak_states 0 mark_read 0
> >
> > libbpf: -- END LOG --
> > libbpf: failed to load program 'cgroup_skb/egress'
> > libbpf: failed to load object 'hbm_kern.o'
> > Error: failed to load object file
> >
> >
> > My environment: 5.3-rc3 / net-next master (both producing the error).
> > Libbpf and bpftool installed from the kernel source (cleaned and
> > reinstalled when I tried a new kernel). I compiled the program with
> > Clang 8, on Ubuntu 19.10 server image, the source:
> >
> > #include <linux/bpf.h>
> > #include "bpf_helpers.h"
> >
> > #define DROP_PKT        0
> > #define ALLOW_PKT       1
> > #define REDUCE_CW       2
> >
> > SEC("cgroup_skb/egress")
> > int hbm(struct __sk_buff *skb)
> > {
> >         return ALLOW_PKT | REDUCE_CW;
> > }
> > char _license[] SEC("license") = "GPL";
> >
> >
> > I also tried to trace down the bug with gdb. It seems like the
> > section_names array in libbpf.c filled with garbage, especially the
>
> I did the same, section_names appears to be correct, not sure what was
> going on in your case. The problem is that "cgroup/skb", which you
> provided on command line, overrides this section name and forces
> bpftool to guess program type as BPF_CGROUP_INET_INGRESS, which
> restricts return codes to just 0 or 1, while for
> BPF_CGROUP_INET_EGRESS it is [0, 3].
>
> > expected_attach_type fields (in my case, this contains
> > BPF_CGROUP_INET_INGRESS instead of BPF_CGROUP_INET_EGRESS).
> >
> > Thanks!
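
The return-code ranges mentioned in the quoted reply (0 or 1 for
ingress, [0, 3] for egress) explain the verifier message. A small
sketch of that range check follows; the mock_* names are hypothetical
and this is not the verifier's actual code, only the ranges are taken
from the thread.

```c
#define DROP_PKT	0
#define ALLOW_PKT	1
#define REDUCE_CW	2

/* Hypothetical stand-ins for the two cgroup skb attach types. */
enum mock_attach_type {
	MOCK_CGROUP_INET_INGRESS,
	MOCK_CGROUP_INET_EGRESS,
};

/* Returns 1 if 'ret' is within the range stated in the thread. */
static int mock_ret_allowed(enum mock_attach_type type, int ret)
{
	int max = (type == MOCK_CGROUP_INET_EGRESS) ? 3 : 1;

	return ret >= 0 && ret <= max;
}
```

With the program from the report, ALLOW_PKT | REDUCE_CW is 3, which is
accepted under the egress range and rejected under the ingress range.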

^ permalink raw reply
