Netdev List
 help / color / mirror / Atom feed
* [PATCH net] net: aquantia: fix unsigned numvecs comparison with less than zero
From: Igor Russkikh @ 2018-06-07 21:54 UTC (permalink / raw)
  To: David S . Miller
  Cc: netdev, David Arcari, Pavel Belous, Colin Ian King, Igor Russkikh

From: Colin Ian King <colin.king@canonical.com>

From: Colin Ian King <colin.king@canonical.com>

This was originally mistakenly submitted to net-next. Resubmitting to net.

The comparison of numvecs < 0 is always false because numvecs is a u32
and hence the error return from a failed call to pci_alloc_irq_vectores
is never detected.  Fix this by using the signed int ret to handle the
error return and assign numvecs to err.

Detected by CoverityScan, CID#1468650 ("Unsigned compared against 0")

Fixes: a09bd81b5413 ("net: aquantia: Limit number of vectors to actually allocated irqs")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
---
 drivers/net/ethernet/aquantia/atlantic/aq_pci_func.c | 11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/aquantia/atlantic/aq_pci_func.c b/drivers/net/ethernet/aquantia/atlantic/aq_pci_func.c
index a50e08bb4748..750007513f9d 100644
--- a/drivers/net/ethernet/aquantia/atlantic/aq_pci_func.c
+++ b/drivers/net/ethernet/aquantia/atlantic/aq_pci_func.c
@@ -267,14 +267,13 @@ static int aq_pci_probe(struct pci_dev *pdev,
 	numvecs = min(numvecs, num_online_cpus());
 	/*enable interrupts */
 #if !AQ_CFG_FORCE_LEGACY_INT
-	numvecs = pci_alloc_irq_vectors(self->pdev, 1, numvecs,
-					PCI_IRQ_MSIX | PCI_IRQ_MSI |
-					PCI_IRQ_LEGACY);
+	err = pci_alloc_irq_vectors(self->pdev, 1, numvecs,
+				    PCI_IRQ_MSIX | PCI_IRQ_MSI |
+				    PCI_IRQ_LEGACY);
 
-	if (numvecs < 0) {
-		err = numvecs;
+	if (err < 0)
 		goto err_hwinit;
-	}
+	numvecs = err;
 #endif
 	self->irqvecs = numvecs;
 
-- 
2.17.1

^ permalink raw reply related

* Re: [PATCH] net-fq: Add WARN_ON check for null flow.
From: Cong Wang @ 2018-06-07 21:52 UTC (permalink / raw)
  To: Ben Greear; +Cc: Linux Kernel Network Developers
In-Reply-To: <6dd70f92-f026-0b2a-9aed-91f303ee4335@candelatech.com>

On Thu, Jun 7, 2018 at 2:41 PM, Ben Greear <greearb@candelatech.com> wrote:
> On 06/07/2018 02:29 PM, Cong Wang wrote:
>>
>> On Thu, Jun 7, 2018 at 9:06 AM,  <greearb@candelatech.com> wrote:
>>>
>>> --- a/include/net/fq_impl.h
>>> +++ b/include/net/fq_impl.h
>>> @@ -80,6 +80,9 @@ static struct sk_buff *fq_tin_dequeue(struct fq *fq,
>>>
>>>         flow = list_first_entry(head, struct fq_flow, flowchain);
>>>
>>> +       if (WARN_ON_ONCE(!flow))
>>> +               return NULL;
>>> +
>>
>>
>> How could even possibly list_first_entry() returns NULL?
>> You need list_first_entry_or_null().
>>
>
> I don't know for certain flow as null, but something was NULL in this method
> near that line and it looked like a likely culprit.
>
> I guess possibly tin or fq was passed in as NULL?

A NULL pointer is not always 0. You can trigger a NULL-ptr-def with 0x3c
too, but you are checking against 0 in your patch, that is the problem and
that is why list_first_entry_or_null() exists.

^ permalink raw reply

* Re: linux-next: Please add mlx5-next tree
From: Stephen Rothwell @ 2018-06-07 21:47 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Linux-Next Mailing List, Linux Kernel Mailing List,
	Saeed Mahameed, Jason Gunthorpe, Doug Ledford, David S. Miller,
	linux-netdev, RDMA mailing list
In-Reply-To: <20180606192500.GZ2958@mtr-leonro.mtl.com>

[-- Attachment #1: Type: text/plain, Size: 611 bytes --]

Hi Leon,

On Wed, 6 Jun 2018 22:25:00 +0300 Leon Romanovsky <leon@kernel.org> wrote:
>
> Can you please add the branch "mlx5-next" from the following tree to the
> linux-next integration tree?
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux.git/
> 
> The mlx5-next branch is used to send shared pull requests to both netdev
> and RDMA subsystems and it is worth to have linux-next coverage on it
> before.

I try hard not to add new trees during the merge window, sorry.  If I
forget to add it after -rc1 has been released, please remind me.

-- 
Cheers,
Stephen Rothwell

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply

* Re: [Patch net] socket: close race condition between sock_close() and sockfs_setattr()
From: Cong Wang @ 2018-06-07 21:45 UTC (permalink / raw)
  To: Al Viro
  Cc: Linux Kernel Network Developers, shankarapailoor, Tetsuo Handa,
	Lorenzo Colitti
In-Reply-To: <20180607212631.GW30522@ZenIV.linux.org.uk>

On Thu, Jun 7, 2018 at 2:26 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> On Thu, Jun 07, 2018 at 01:39:49PM -0700, Cong Wang wrote:
>> fchownat() doesn't even hold refcnt of fd until it figures out
>> fd is really needed (otherwise is ignored) and releases it after
>> it resolves the path. This means sock_close() could race with
>> sockfs_setattr(), which leads to a NULL pointer dereference
>> since typically we set sock->sk to NULL in ->release().
>>
>> As pointed out by Al, this is unique to sockfs. So we can fix this
>> in socket layer by acquiring inode_lock in sock_close() and
>> checking against NULL in sockfs_setattr().
>
> That looks like a massive overkill - it's way heavier than it should be.

I don't see any other quick way to fix this. My initial thought is
to keep that refcnt until path_put(), apparently you don't like it
either.


> And it's very likely to trigger shitloads of deadlock warnings, some
> possibly even true.

I do audit the inode_lock usage in networking, I don't see any
deadlock, of course, there could be some non-networking code
uses socket API that I missed. But _generally_, socket doesn't
have a pointer to retrieve this inode.

^ permalink raw reply

* Re: [PATCH] selftests: bpf: fix urandom_read build issue
From: Anders Roxell @ 2018-06-07 21:43 UTC (permalink / raw)
  To: Y Song
  Cc: Alexei Starovoitov, Daniel Borkmann, Shuah Khan, netdev, LKML,
	linux-kselftest
In-Reply-To: <CAH3MdRXMHRtK9FJ4UP8Gy-bahCS2oipOY7yLTEqu-x6uOMKLvw@mail.gmail.com>

On 7 June 2018 at 23:17, Y Song <ys114321@gmail.com> wrote:
> On Thu, Jun 7, 2018 at 12:07 PM, Anders Roxell <anders.roxell@linaro.org> wrote:
>> On 7 June 2018 at 19:52, Y Song <ys114321@gmail.com> wrote:
>>> On Thu, Jun 7, 2018 at 3:57 AM, Anders Roxell <anders.roxell@linaro.org> wrote:
>>>> gcc complains that urandom_read gets built twice.
>>>>
>>>> gcc -o tools/testing/selftests/bpf/urandom_read
>>>> -static urandom_read.c -Wl,--build-id
>>>> gcc -Wall -O2 -I../../../include/uapi -I../../../lib -I../../../lib/bpf
>>>> -I../../../../include/generated  -I../../../include    urandom_read.c
>>>> urandom_read -lcap -lelf -lrt -lpthread -o
>>>> tools/testing/selftests/bpf/urandom_read
>>>> gcc: fatal error: input file
>>>> ‘tools/testing/selftests/bpf/urandom_read’ is the
>>>> same as output file
>>>> compilation terminated.
>>>> ../lib.mk:110: recipe for target
>>>> 'tools/testing/selftests/bpf/urandom_read' failed
>>>
>>> What is the build/make command to reproduce the above failure?
>>
>> make -C tools/testing/selftests
>
> Thanks. The patch will break
>    make -C tools/testing/selftests/bpf
>
> [yhs@localhost bpf-next]$ make -C tools/testing/selftests/bpf
> make: Entering directory '/home/yhs/work/bpf-next/tools/testing/selftests/bpf'
> gcc -o /urandom_read -static urandom_read.c -Wl,--build-id
> /usr/bin/ld: cannot open output file /urandom_read: Permission denied
> collect2: error: ld returned 1 exit status
> make: *** [Makefile:20: /urandom_read] Error 1
> make: Leaving directory '/home/yhs/work/bpf-next/tools/testing/selftests/bpf'
> [yhs@localhost bpf-next]$

urgh, I'm sorry, missed that.

>
> Could you still make the above command work?

$(TEST_CUSTOM_PROGS): $(OUTPUT)/%: %.c
        $(CC) -o $(TEST_CUSTOM_PROGS) -static $< -Wl,--build-id

That worked both with:
make -C tools/testing/selftests
and
make -C tools/testing/selftests/bpf

for me.

what do you think?

>
>>
>> Cheers,
>> Anders
>>
>>>
>>>> To fix this issue remove the urandom_read target and so target
>>>> TEST_CUSTOM_PROGS gets used.
>>>>
>>>> Fixes: 81f77fd0deeb ("bpf: add selftest for stackmap with BPF_F_STACK_BUILD_ID")
>>>> Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
>>>> ---
>>>>  tools/testing/selftests/bpf/Makefile | 6 ++----
>>>>  1 file changed, 2 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
>>>> index 607ed8729c06..67285591ffd7 100644
>>>> --- a/tools/testing/selftests/bpf/Makefile
>>>> +++ b/tools/testing/selftests/bpf/Makefile
>>>> @@ -16,10 +16,8 @@ LDLIBS += -lcap -lelf -lrt -lpthread
>>>>  TEST_CUSTOM_PROGS = $(OUTPUT)/urandom_read
>>>>  all: $(TEST_CUSTOM_PROGS)
>>>>
>>>> -$(TEST_CUSTOM_PROGS): urandom_read
>>>> -
>>>> -urandom_read: urandom_read.c
>>>> -       $(CC) -o $(TEST_CUSTOM_PROGS) -static $< -Wl,--build-id
>>>> +$(TEST_CUSTOM_PROGS): $(OUTPUT)/%: %.c
>>>> +       $(CC) -o $@ -static $< -Wl,--build-id
>>>>
>>>>  # Order correspond to 'make run_tests' order
>>>>  TEST_GEN_PROGS = test_verifier test_tag test_maps test_lru_map test_lpm_map test_progs \
>>>> --
>>>> 2.17.1
>>>>

^ permalink raw reply

* Re: [PATCH] net-fq: Add WARN_ON check for null flow.
From: Ben Greear @ 2018-06-07 21:41 UTC (permalink / raw)
  To: Cong Wang; +Cc: Linux Kernel Network Developers
In-Reply-To: <CAM_iQpXHDVkQ0sS1sOnhnp8vcLkhsZvYzycwF1Do=mydMZhvzA@mail.gmail.com>

On 06/07/2018 02:29 PM, Cong Wang wrote:
> On Thu, Jun 7, 2018 at 9:06 AM,  <greearb@candelatech.com> wrote:
>> --- a/include/net/fq_impl.h
>> +++ b/include/net/fq_impl.h
>> @@ -80,6 +80,9 @@ static struct sk_buff *fq_tin_dequeue(struct fq *fq,
>>
>>         flow = list_first_entry(head, struct fq_flow, flowchain);
>>
>> +       if (WARN_ON_ONCE(!flow))
>> +               return NULL;
>> +
>
> How could even possibly list_first_entry() returns NULL?
> You need list_first_entry_or_null().
>

I don't know for certain flow as null, but something was NULL in this method
near that line and it looked like a likely culprit.

I guess possibly tin or fq was passed in as NULL?

Anyway, if the patch seems worthless just ignore it.  I'll leave it in my tree
since it should be harmless and will let you know if I ever hit it.

If someone else hits a similar crash, hopefully they can report it.

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com

^ permalink raw reply

* Re: KMSAN: uninit-value in ebt_stp_mt_check (2)
From: syzbot @ 2018-06-07 21:40 UTC (permalink / raw)
  To: bridge, coreteam, davem, dvyukov, fw, kadlec, linux-kernel,
	netdev, netfilter-devel, pablo, stephen, syzkaller-bugs
In-Reply-To: <0000000000009a3b5b056e109e42@google.com>

syzbot has found a reproducer for the following crash on:

HEAD commit:    c6a6aed994b6 kmsan: remove dead code to trigger syzbot build
git tree:       https://github.com/google/kmsan.git/master
console output: https://syzkaller.appspot.com/x/log.txt?x=122606bf800000
kernel config:  https://syzkaller.appspot.com/x/.config?x=848e40757852af3e
dashboard link: https://syzkaller.appspot.com/bug?extid=da4494182233c23a5fcf
compiler:       clang version 7.0.0 (trunk 334104)
syzkaller repro:https://syzkaller.appspot.com/x/repro.syz?x=121f481f800000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=121fbbd7800000

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+da4494182233c23a5fcf@syzkaller.appspotmail.com

random: sshd: uninitialized urandom read (32 bytes read)
random: sshd: uninitialized urandom read (32 bytes read)
random: sshd: uninitialized urandom read (32 bytes read)
random: sshd: uninitialized urandom read (32 bytes read)
==================================================================
BUG: KMSAN: uninit-value in ebt_stp_mt_check+0x24b/0x450  
net/bridge/netfilter/ebt_stp.c:162
CPU: 1 PID: 4523 Comm: syz-executor710 Not tainted 4.17.0+ #3
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011
Call Trace:
  __dump_stack lib/dump_stack.c:77 [inline]
  dump_stack+0x185/0x1d0 lib/dump_stack.c:113
  kmsan_report+0x149/0x260 mm/kmsan/kmsan.c:1084
  __msan_warning_32+0x6e/0xc0 mm/kmsan/kmsan_instr.c:620
  ebt_stp_mt_check+0x24b/0x450 net/bridge/netfilter/ebt_stp.c:162
  xt_check_match+0x1438/0x1650 net/netfilter/x_tables.c:506
  ebt_check_match net/bridge/netfilter/ebtables.c:372 [inline]
  ebt_check_entry net/bridge/netfilter/ebtables.c:702 [inline]
  translate_table+0x4e88/0x6120 net/bridge/netfilter/ebtables.c:943
  do_replace_finish+0x1258/0x2ea0 net/bridge/netfilter/ebtables.c:999
  do_replace+0x719/0x780 net/bridge/netfilter/ebtables.c:1138
  do_ebt_set_ctl+0x2ab/0x3c0 net/bridge/netfilter/ebtables.c:1517
  nf_sockopt net/netfilter/nf_sockopt.c:106 [inline]
  nf_setsockopt+0x47c/0x4e0 net/netfilter/nf_sockopt.c:115
  ip_setsockopt+0x24b/0x2b0 net/ipv4/ip_sockglue.c:1251
  udp_setsockopt+0x108/0x1b0 net/ipv4/udp.c:2416
  sock_common_setsockopt+0x13b/0x170 net/core/sock.c:3039
  __sys_setsockopt+0x496/0x540 net/socket.c:1903
  __do_sys_setsockopt net/socket.c:1914 [inline]
  __se_sys_setsockopt net/socket.c:1911 [inline]
  __x64_sys_setsockopt+0x15c/0x1c0 net/socket.c:1911
  do_syscall_64+0x15b/0x230 arch/x86/entry/common.c:287
  entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x43fd89
RSP: 002b:00007ffc9d7edb28 EFLAGS: 00000213 ORIG_RAX: 0000000000000036
RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 000000000043fd89
RDX: 0000000000000080 RSI: 0000000000000000 RDI: 0000000000000003
RBP: 00000000006ca018 R08: 0000000000000300 R09: 00000000004002c8
R10: 0000000020000480 R11: 0000000000000213 R12: 00000000004016b0
R13: 0000000000401740 R14: 0000000000000000 R15: 0000000000000000

Local variable description: ----mtpar.i@translate_table
Variable was created at:
  translate_table+0xbb/0x6120 net/bridge/netfilter/ebtables.c:831
  do_replace_finish+0x1258/0x2ea0 net/bridge/netfilter/ebtables.c:999
==================================================================

^ permalink raw reply

* Re: [PATCH] net-fq: Add WARN_ON check for null flow.
From: Cong Wang @ 2018-06-07 21:29 UTC (permalink / raw)
  To: Ben Greear; +Cc: Linux Kernel Network Developers
In-Reply-To: <1528387585-5806-1-git-send-email-greearb@candelatech.com>

On Thu, Jun 7, 2018 at 9:06 AM,  <greearb@candelatech.com> wrote:
> --- a/include/net/fq_impl.h
> +++ b/include/net/fq_impl.h
> @@ -80,6 +80,9 @@ static struct sk_buff *fq_tin_dequeue(struct fq *fq,
>
>         flow = list_first_entry(head, struct fq_flow, flowchain);
>
> +       if (WARN_ON_ONCE(!flow))
> +               return NULL;
> +

How could even possibly list_first_entry() returns NULL?
You need list_first_entry_or_null().

^ permalink raw reply

* Re: [Patch net] socket: close race condition between sock_close() and sockfs_setattr()
From: Al Viro @ 2018-06-07 21:26 UTC (permalink / raw)
  To: Cong Wang; +Cc: netdev, shankarapailoor, Tetsuo Handa, Lorenzo Colitti
In-Reply-To: <20180607203949.16945-1-xiyou.wangcong@gmail.com>

On Thu, Jun 07, 2018 at 01:39:49PM -0700, Cong Wang wrote:
> fchownat() doesn't even hold refcnt of fd until it figures out
> fd is really needed (otherwise is ignored) and releases it after
> it resolves the path. This means sock_close() could race with
> sockfs_setattr(), which leads to a NULL pointer dereference
> since typically we set sock->sk to NULL in ->release().
> 
> As pointed out by Al, this is unique to sockfs. So we can fix this
> in socket layer by acquiring inode_lock in sock_close() and
> checking against NULL in sockfs_setattr().

That looks like a massive overkill - it's way heavier than it should be.
And it's very likely to trigger shitloads of deadlock warnings, some
possibly even true.

^ permalink raw reply

* Re: [PATCH] selftests: bpf: fix urandom_read build issue
From: Y Song @ 2018-06-07 21:17 UTC (permalink / raw)
  To: Anders Roxell
  Cc: Alexei Starovoitov, Daniel Borkmann, Shuah Khan, netdev, LKML,
	linux-kselftest
In-Reply-To: <CADYN=9J3wY1f6q4NvH1mHJXnu6GnErOqz75CfzfG1q5-WzcyCg@mail.gmail.com>

On Thu, Jun 7, 2018 at 12:07 PM, Anders Roxell <anders.roxell@linaro.org> wrote:
> On 7 June 2018 at 19:52, Y Song <ys114321@gmail.com> wrote:
>> On Thu, Jun 7, 2018 at 3:57 AM, Anders Roxell <anders.roxell@linaro.org> wrote:
>>> gcc complains that urandom_read gets built twice.
>>>
>>> gcc -o tools/testing/selftests/bpf/urandom_read
>>> -static urandom_read.c -Wl,--build-id
>>> gcc -Wall -O2 -I../../../include/uapi -I../../../lib -I../../../lib/bpf
>>> -I../../../../include/generated  -I../../../include    urandom_read.c
>>> urandom_read -lcap -lelf -lrt -lpthread -o
>>> tools/testing/selftests/bpf/urandom_read
>>> gcc: fatal error: input file
>>> ‘tools/testing/selftests/bpf/urandom_read’ is the
>>> same as output file
>>> compilation terminated.
>>> ../lib.mk:110: recipe for target
>>> 'tools/testing/selftests/bpf/urandom_read' failed
>>
>> What is the build/make command to reproduce the above failure?
>
> make -C tools/testing/selftests

Thanks. The patch will break
   make -C tools/testing/selftests/bpf

[yhs@localhost bpf-next]$ make -C tools/testing/selftests/bpf
make: Entering directory '/home/yhs/work/bpf-next/tools/testing/selftests/bpf'
gcc -o /urandom_read -static urandom_read.c -Wl,--build-id
/usr/bin/ld: cannot open output file /urandom_read: Permission denied
collect2: error: ld returned 1 exit status
make: *** [Makefile:20: /urandom_read] Error 1
make: Leaving directory '/home/yhs/work/bpf-next/tools/testing/selftests/bpf'
[yhs@localhost bpf-next]$

Could you still make the above command work?

>
> Cheers,
> Anders
>
>>
>>> To fix this issue remove the urandom_read target and so target
>>> TEST_CUSTOM_PROGS gets used.
>>>
>>> Fixes: 81f77fd0deeb ("bpf: add selftest for stackmap with BPF_F_STACK_BUILD_ID")
>>> Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
>>> ---
>>>  tools/testing/selftests/bpf/Makefile | 6 ++----
>>>  1 file changed, 2 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
>>> index 607ed8729c06..67285591ffd7 100644
>>> --- a/tools/testing/selftests/bpf/Makefile
>>> +++ b/tools/testing/selftests/bpf/Makefile
>>> @@ -16,10 +16,8 @@ LDLIBS += -lcap -lelf -lrt -lpthread
>>>  TEST_CUSTOM_PROGS = $(OUTPUT)/urandom_read
>>>  all: $(TEST_CUSTOM_PROGS)
>>>
>>> -$(TEST_CUSTOM_PROGS): urandom_read
>>> -
>>> -urandom_read: urandom_read.c
>>> -       $(CC) -o $(TEST_CUSTOM_PROGS) -static $< -Wl,--build-id
>>> +$(TEST_CUSTOM_PROGS): $(OUTPUT)/%: %.c
>>> +       $(CC) -o $@ -static $< -Wl,--build-id
>>>
>>>  # Order correspond to 'make run_tests' order
>>>  TEST_GEN_PROGS = test_verifier test_tag test_maps test_lru_map test_lpm_map test_progs \
>>> --
>>> 2.17.1
>>>

^ permalink raw reply

* Re: [PATCH net-next] bpfilter: fix OUTPUT_FORMAT
From: David Miller @ 2018-06-07 20:57 UTC (permalink / raw)
  To: ast; +Cc: daniel, netdev, kernel-team
In-Reply-To: <20180607172929.3154049-1-ast@kernel.org>

From: Alexei Starovoitov <ast@kernel.org>
Date: Thu, 7 Jun 2018 10:29:29 -0700

> From: Alexei Starovoitov <ast@fb.com>
> 
> CONFIG_OUTPUT_FORMAT is x86 only macro.
> Used objdump to extract elf file format.
> 
> Fixes: d2ba09c17a06 ("net: add skeleton of bpfilter kernel module")
> Reported-by: David S. Miller <davem@davemloft.net>
> Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Applied.

^ permalink raw reply

* Re: [PATCH net-next] umh: fix race condition
From: David Miller @ 2018-06-07 20:57 UTC (permalink / raw)
  To: ast; +Cc: daniel, mcgrof, netdev, dvyukov, linux-kernel, kernel-team
In-Reply-To: <20180607172310.3121039-1-ast@kernel.org>

From: Alexei Starovoitov <ast@kernel.org>
Date: Thu, 7 Jun 2018 10:23:10 -0700

> kasan reported use-after-free:
> BUG: KASAN: use-after-free in call_usermodehelper_exec_work+0x2d3/0x310 kernel/umh.c:195
> Write of size 4 at addr ffff8801d9202370 by task kworker/u4:2/50
> Workqueue: events_unbound call_usermodehelper_exec_work
> Call Trace:
>  __dump_stack lib/dump_stack.c:77 [inline]
>  dump_stack+0x1b9/0x294 lib/dump_stack.c:113
>  print_address_description+0x6c/0x20b mm/kasan/report.c:256
>  kasan_report_error mm/kasan/report.c:354 [inline]
>  kasan_report.cold.7+0x242/0x2fe mm/kasan/report.c:412
>  __asan_report_store4_noabort+0x17/0x20 mm/kasan/report.c:437
>  call_usermodehelper_exec_work+0x2d3/0x310 kernel/umh.c:195
>  process_one_work+0xc1e/0x1b50 kernel/workqueue.c:2145
>  worker_thread+0x1cc/0x1440 kernel/workqueue.c:2279
>  kthread+0x345/0x410 kernel/kthread.c:240
>  ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:412
> 
> The reason is that 'sub_info' cannot be accessed out of parent task
> context, since it will be freed by the child.
> Instead remember the pid in the child task.
> 
> Fixes: 449325b52b7a ("umh: introduce fork_usermode_blob() helper")
> Reported-by: syzbot+2c73319c406f1987d156@syzkaller.appspotmail.com
> Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Applied.

^ permalink raw reply

* Re: [PATCH] net: mscc: ocelot: Fix uninitialized error in ocelot_netdevice_event()
From: David Miller @ 2018-06-07 20:55 UTC (permalink / raw)
  To: geert; +Cc: alexandre.belloni, andrew, arnd, netdev, linux-kernel
In-Reply-To: <1528377030-28319-1-git-send-email-geert@linux-m68k.org>

From: Geert Uytterhoeven <geert@linux-m68k.org>
Date: Thu,  7 Jun 2018 15:10:30 +0200

> With gcc-4.1.2:
> 
>     drivers/net/ethernet/mscc/ocelot.c: In function ‘ocelot_netdevice_event’:
>     drivers/net/ethernet/mscc/ocelot.c:1129: warning: ‘ret’ may be used uninitialized in this function
> 
> If the list iterated over by netdev_for_each_lower_dev() is empty, ret
> is never initialized, and converted into a notifier return value.
> 
> Fix this by preinitializing ret to zero.
> 
> Fixes: a556c76adc052c97 ("net: mscc: Add initial Ocelot switch support")
> Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
> ---
> This may be unlikely to happen, but given the notifier is called for any
> (also non-ocelot) network device, better be safe than sorry.

Applied.

^ permalink raw reply

* Re: [PATCH net] bonding: re-evaluate force_primary when the primary slave name changes
From: David Miller @ 2018-06-07 20:54 UTC (permalink / raw)
  To: yuxiangning; +Cc: netdev
In-Reply-To: <A977208F-4266-411B-8DA5-0C6901F201E7@gmail.com>

From: Xiangning Yu <yuxiangning@gmail.com>
Date: Thu, 7 Jun 2018 13:39:59 +0800

> From: Xiangning Yu <yuxiangning@gmail.com>
> 
> There is a timing issue under active-standy mode, when bond_enslave() is
> called, bond->params.primary might not be initialized yet.
> 
> Any time the primary slave string changes, bond->force_primary should be
> set to true to make sure the primary becomes the active slave.
> 
> Signed-off-by: Xiangning Yu <yuxiangning@gmail.com>

Applied and queued up for -stable.

^ permalink raw reply

* [Patch net] socket: close race condition between sock_close() and sockfs_setattr()
From: Cong Wang @ 2018-06-07 20:39 UTC (permalink / raw)
  To: netdev; +Cc: shankarapailoor, Cong Wang, Tetsuo Handa, Lorenzo Colitti,
	Al Viro

fchownat() doesn't even hold refcnt of fd until it figures out
fd is really needed (otherwise is ignored) and releases it after
it resolves the path. This means sock_close() could race with
sockfs_setattr(), which leads to a NULL pointer dereference
since typically we set sock->sk to NULL in ->release().

As pointed out by Al, this is unique to sockfs. So we can fix this
in socket layer by acquiring inode_lock in sock_close() and
checking against NULL in sockfs_setattr().

sock_release() is called in many places, only the sock_close()
path matters here. And fortunately, this should not affect normal
sock_close() as it is only called when the last fd refcnt is gone.
It only affects sock_close() with a parallel sockfs_setattr() in
progress, which is not common.

Fixes: 86741ec25462 ("net: core: Add a UID field to struct sock.")
Reported-by: shankarapailoor <shankarapailoor@gmail.com>
Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Cc: Lorenzo Colitti <lorenzo@google.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
---
 net/socket.c | 18 +++++++++++++++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/net/socket.c b/net/socket.c
index af57d85bcb48..8a109012608a 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -541,7 +541,10 @@ static int sockfs_setattr(struct dentry *dentry, struct iattr *iattr)
 	if (!err && (iattr->ia_valid & ATTR_UID)) {
 		struct socket *sock = SOCKET_I(d_inode(dentry));
 
-		sock->sk->sk_uid = iattr->ia_uid;
+		if (sock->sk)
+			sock->sk->sk_uid = iattr->ia_uid;
+		else
+			err = -ENOENT;
 	}
 
 	return err;
@@ -590,12 +593,16 @@ EXPORT_SYMBOL(sock_alloc);
  *	an inode not a file.
  */
 
-void sock_release(struct socket *sock)
+static void __sock_release(struct socket *sock, struct inode *inode)
 {
 	if (sock->ops) {
 		struct module *owner = sock->ops->owner;
 
+		if (inode)
+			inode_lock(inode);
 		sock->ops->release(sock);
+		if (inode)
+			inode_unlock(inode);
 		sock->ops = NULL;
 		module_put(owner);
 	}
@@ -609,6 +616,11 @@ void sock_release(struct socket *sock)
 	}
 	sock->file = NULL;
 }
+
+void sock_release(struct socket *sock)
+{
+	__sock_release(sock, NULL);
+}
 EXPORT_SYMBOL(sock_release);
 
 void __sock_tx_timestamp(__u16 tsflags, __u8 *tx_flags)
@@ -1171,7 +1183,7 @@ static int sock_mmap(struct file *file, struct vm_area_struct *vma)
 
 static int sock_close(struct inode *inode, struct file *filp)
 {
-	sock_release(SOCKET_I(inode));
+	__sock_release(SOCKET_I(inode), inode);
 	return 0;
 }
 
-- 
2.13.0

^ permalink raw reply related

* Re: [PATCH] ip_tunnel: Fix name string concatenate in __ip_tunnel_create()
From: David Miller @ 2018-06-07 20:32 UTC (permalink / raw)
  To: sultanxda; +Cc: netdev, kuznet, yoshfuji
In-Reply-To: <20180606225654.7077-1-sultanxda@gmail.com>

From: Sultan Alsawaf <sultanxda@gmail.com>
Date: Wed,  6 Jun 2018 15:56:54 -0700

> By passing a limit of 2 bytes to strncat, strncat is limited to writing
> fewer bytes than what it's supposed to append to the name here.
> 
> Since the bounds are checked on the line above this, just remove the string
> bounds checks entirely since they're unneeded.
> 
> Signed-off-by: Sultan Alsawaf <sultanxda@gmail.com>

Applied.

^ permalink raw reply

* Re: [PATCH] hv_netvsc: Add per-cpu ethtool stats for netvsc
From: David Miller @ 2018-06-07 20:26 UTC (permalink / raw)
  To: yidren, yidren; +Cc: sthemmin, netdev, haiyangz, linux-kernel, devel
In-Reply-To: <20180606222700.24732-1-yidren@linuxonhyperv.com>

From: Yidong Ren <yidren@linuxonhyperv.com>
Date: Wed,  6 Jun 2018 15:27:00 -0700

> From: Yidong Ren <yidren@microsoft.com>
> 
> This patch implements following ethtool stats fields for netvsc:
> cpu<n>_tx/rx_packets/bytes
> cpu<n>_vf_tx/rx_packets/bytes
> 
> Corresponding per-cpu counters exist in current code. Exposing these
> counters will help troubleshooting performance issues.
> 
> Signed-off-by: Yidong Ren <yidren@microsoft.com>

net-next is closed, please resubmit this new feature when it opens again,
also:

> +	// fetch percpu stats of vf

Please do not use c++ comments.

^ permalink raw reply

* Re: [PATCH bpf-next v5 00/10] BTF: BPF Type Format
From: Arnaldo Carvalho de Melo @ 2018-06-07 20:25 UTC (permalink / raw)
  To: Martin KaFai Lau
  Cc: netdev, Alexei Starovoitov, Daniel Borkmann, kernel-team,
	Wang Nan, Jiri Olsa, Namhyung Kim, Ingo Molnar
In-Reply-To: <20180607200701.bvsjzoq56rnggwfd@kafai-mbp>

Em Thu, Jun 07, 2018 at 01:07:01PM -0700, Martin KaFai Lau escreveu:
> On Thu, Jun 07, 2018 at 04:30:29PM -0300, Arnaldo Carvalho de Melo wrote:
> > So this must be available in a newer llvm version? Which one?

> I should have put in the details in my last email or
> in the commit message, my bad.
 
> 1. The tools/testing/selftests/bpf/Makefile has the CLANG_FLAGS and
>    LLC_FLAGS needed to compile the bpf prog.  It requires a new
>    "-mattr=dwarf" llc option which was added to the future
>    llvm 7.0.
 
>    Hence, I have been using the llvm's master in github which
>    also has the llvm-objcopy.
 
> 2. The kernel's btf part only focus on the BPF map.
>    Hence, the testing bpf program should have the map's key
>    and map's value.  e.g. tools/testing/selftests/bpf/test_btf_haskv.c

Thanks for the version required to test this, but where is this
test_btf_haskv.c file? Which tree? net-next?

Ok, just pulled torvalds/master and there it is. Gotcha.

struct bpf_map_def SEC("maps") __bpf_stdout__ = {
       .type = BPF_MAP_TYPE_PERF_EVENT_ARRAY,
       .key_size = sizeof(int),
       .value_size = sizeof(u32),
       .max_entries = __NR_CPUS__,
};

This map is in the above hello.c example, but I guess its way too simple
:-)

Ok, I'll test this at home in another machine where I have the llvm's
git repo.

- Arnaldo

^ permalink raw reply

* Re: [PATCH v2] hv_netvsc: Fix a network regression after ifdown/ifup
From: David Miller @ 2018-06-07 20:24 UTC (permalink / raw)
  To: decui
  Cc: jopoulso, olaf, sthemmin, netdev, haiyangz, linux-kernel,
	marcelo.cerri, apw, devel, vkuznets, jasowang, Stephen.Zarkos
In-Reply-To: <KL1P15301MB00062D55F01921B958359A55BF650@KL1P15301MB0006.APCP153.PROD.OUTLOOK.COM>

From: Dexuan Cui <decui@microsoft.com>
Date: Wed, 6 Jun 2018 21:32:51 +0000

> 
> Recently people reported the NIC stops working after
> "ifdown eth0; ifup eth0". It turns out in this case the TX queues are not
> enabled, after the refactoring of the common detach logic: when the NIC
> has sub-channels, usually we enable all the TX queues after all
> sub-channels are set up: see rndis_set_subchannel() ->
> netif_device_attach(), but in the case of "ifdown eth0; ifup eth0" where
> the number of channels doesn't change, we also must make sure the TX queues
> are enabled. The patch fixes the regression.
> 
> Fixes: 7b2ee50c0cd5 ("hv_netvsc: common detach logic")
> Signed-off-by: Dexuan Cui <decui@microsoft.com>

Applied and queued up for -stable.

^ permalink raw reply

* Re: [PATCH net] net: in virtio_net_hdr only add VLAN_HLEN to csum_start if payload holds vlan
From: David Miller @ 2018-06-07 20:22 UTC (permalink / raw)
  To: willemdebruijn.kernel; +Cc: netdev, rppt, herbert, anton.ivanov, willemb
In-Reply-To: <20180606152301.33240-1-willemdebruijn.kernel@gmail.com>

From: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Date: Wed,  6 Jun 2018 11:23:01 -0400

> From: Willem de Bruijn <willemb@google.com>
> 
> Tun, tap, virtio, packet and uml vector all use struct virtio_net_hdr
> to communicate packet metadata to userspace.
> 
> For skbuffs with vlan, the first two return the packet as it may have
> existed on the wire, inserting the VLAN tag in the user buffer.  Then
> virtio_net_hdr.csum_start needs to be adjusted by VLAN_HLEN bytes.
> 
> Commit f09e2249c4f5 ("macvtap: restore vlan header on user read")
> added this feature to macvtap. Commit 3ce9b20f1971 ("macvtap: Fix
> csum_start when VLAN tags are present") then fixed up csum_start.
> 
> Virtio, packet and uml do not insert the vlan header in the user
> buffer.
> 
> When introducing virtio_net_hdr_from_skb to deduplicate filling in
> the virtio_net_hdr, the variant from macvtap which adds VLAN_HLEN was
> applied uniformly, breaking csum offset for packets with vlan on
> virtio and packet.
> 
> Make insertion of VLAN_HLEN optional. Convert the callers to pass it
> when needed.
> 
> Fixes: e858fae2b0b8f4 ("virtio_net: use common code for virtio_net_hdr and skb GSO conversion")
> Fixes: 1276f24eeef2 ("packet: use common code for virtio_net_hdr and skb GSO conversion")
> Signed-off-by: Willem de Bruijn <willemb@google.com>

Applied and queued up for -stable, thanks Willem.

^ permalink raw reply

* next-20180605 - BUG in ipv6_add_addr
From: valdis.kletnieks @ 2018-06-07 20:17 UTC (permalink / raw)
  To: netdev; +Cc: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 2447 bytes --]

Hit the following, after which my network connection was pretty much hosed

This ring any bells? I don't have a reproducer for this, so bisecting is problematic....

[ 1820.832682] BUG: unable to handle kernel NULL pointer dereference at 0000000000000209
[ 1820.832728] RIP: 0010:ipv6_add_addr+0x280/0xd10
[ 1820.832732] Code: 49 8b 1f 0f 84 6a 0a 00 00 48 85 db 0f 84 4e 0a 00 00 48 8b 03 48 8b 53 08 49 89 45 00 49 8b 47 10
49 89 55 08 48 85 c0 74 15 <48> 8b 50 08 48 8b 00 49 89 95 b8 01 00 00 49 89 85 b0 01 00 00 4c
[ 1820.832847] RSP: 0018:ffffaa07c2fd7880 EFLAGS: 00010202
[ 1820.832853] RAX: 0000000000000201 RBX: ffffaa07c2fd79b0 RCX: 0000000000000000
[ 1820.832858] RDX: a4cfbfba2cbfa64c RSI: 0000000000000000 RDI: ffffffff8a8e9fa0
[ 1820.832862] RBP: ffffaa07c2fd7920 R08: 000000000000017a R09: ffffffff8a555300
[ 1820.832866] R10: 0000000000000000 R11: 0000000000000000 R12: ffff888d18e71c00
[ 1820.832871] R13: ffff888d0a9b1200 R14: 0000000000000000 R15: ffffaa07c2fd7980
[ 1820.832876] FS:  00007faa51bdb800(0000) GS:ffff888d1d400000(0000) knlGS:0000000000000000
[ 1820.832880] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1820.832885] CR2: 0000000000000209 CR3: 000000021e8f8001 CR4: 00000000001606e0
[ 1820.832888] Call Trace:
[ 1820.832898]  ? __local_bh_enable_ip+0x119/0x260
[ 1820.832904]  ? ipv6_create_tempaddr+0x259/0x5a0
[ 1820.832912]  ? __local_bh_enable_ip+0x139/0x260
[ 1820.832921]  ipv6_create_tempaddr+0x2da/0x5a0
[ 1820.832926]  ? ipv6_create_tempaddr+0x2da/0x5a0
[ 1820.832941]  manage_tempaddrs+0x1a5/0x240
[ 1820.832951]  inet6_addr_del+0x20b/0x3b0
[ 1820.832959]  ? nla_parse+0xce/0x1e0
[ 1820.832968]  inet6_rtm_deladdr+0xd9/0x210
[ 1820.832981]  rtnetlink_rcv_msg+0x1d4/0x5f0
[ 1820.832989]  ? netlink_deliver_tap+0xe2/0x7a0
[ 1820.832996]  ? rtnetlink_put_metrics+0x290/0x290
[ 1820.833003]  netlink_rcv_skb+0x4d/0x140
[ 1820.833012]  rtnetlink_rcv+0x15/0x20
[ 1820.833018]  netlink_unicast+0x215/0x340
[ 1820.833027]  netlink_sendmsg+0x2e3/0x6a0
[ 1820.833040]  sock_sendmsg+0x6a/0xf0
[ 1820.833047]  __sys_sendto+0x15d/0x220
[ 1820.833059]  ? __sys_recvmsg+0x51/0x90
[ 1820.833068]  ? do_syscall_64+0x30/0xee7
[ 1820.833076]  __x64_sys_sendto+0x2f/0x50
[ 1820.833082]  do_syscall_64+0x98/0xee7
[ 1820.833089]  ? trace_hardirqs_off_caller+0x1f/0xd0
[ 1820.833096]  ? trace_hardirqs_off_thunk+0x1a/0x1c
[ 1820.833107]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1820.833113] RIP: 0033:0x7faa50ed831d


[-- Attachment #2: Type: application/pgp-signature, Size: 486 bytes --]

^ permalink raw reply

* Re: [PATCH] netfilter: provide udp*_lib_lookup for nf_tproxy
From: David Miller @ 2018-06-07 20:14 UTC (permalink / raw)
  To: pablo
  Cc: arnd, kuznet, yoshfuji, ecklm94, pabeni, willemb, edumazet,
	dsahern, kafai, netdev, linux-kernel
In-Reply-To: <20180606133324.537naj6trahiq26f@salvia>

From: Pablo Neira Ayuso <pablo@netfilter.org>
Date: Wed, 6 Jun 2018 15:33:24 +0200

> BTW, could you also pick this fix for net-next?
> 
> http://patchwork.ozlabs.org/patch/924706/

Sorry, I just noticed this now.

Applied.

^ permalink raw reply

* Re: [PATCH bpf-next v5 00/10] BTF: BPF Type Format
From: Martin KaFai Lau @ 2018-06-07 20:07 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: netdev, Alexei Starovoitov, Daniel Borkmann, kernel-team,
	Wang Nan, Jiri Olsa, Namhyung Kim, Ingo Molnar
In-Reply-To: <20180607193029.GG17292@kernel.org>

On Thu, Jun 07, 2018 at 04:30:29PM -0300, Arnaldo Carvalho de Melo wrote:
> Em Thu, Jun 07, 2018 at 12:05:10PM -0700, Martin KaFai Lau escreveu:
> > On Thu, Jun 07, 2018 at 11:03:37AM -0300, Arnaldo Carvalho de Melo wrote:
> > > Em Thu, Jun 07, 2018 at 10:54:01AM -0300, Arnaldo Carvalho de Melo escreveu:
> > > > Em Tue, Jun 05, 2018 at 02:25:48PM -0700, Martin KaFai Lau escreveu:
> > > > > [ btw, the latest commit (1 commit) should be 94a11b59e592 ].
> 
> > > So, the commit log message for the pahole patch is non-existent:
> 
> > > https://github.com/iamkafai/pahole/commit/94a11b59e5920908085bfc8d24c92f95c8ffceaf
> 
> > > we should do better in describing what is done and how, I'm staring
> > > with a message you sent to the kernel part:
> > > --
> > > This patch introduces BPF Type Format (BTF).
> 
> > > BTF (BPF Type Format) is the meta data format which describes
> > > the data types of BPF program/map.  Hence, it basically focus
> > > on the C programming language which the modern BPF is primary
> > > using.  The first use case is to provide a generic pretty print
> > > capability for a BPF map.
> 
> > I will add details in the next github respin/push.
> 
> Ok, but I can do that if there is nothing else to do in the code at this
> stage :-)
>  
> > > Now I'm going to do the step-by-step guide on testing the feature just
> > > introduced, and will try to convert from dwarf to BTF and back, compare
> > > the pahole output for types encoded in DWARF and BTF, etc.
> 
> > > If you have something ressembling this already, please share.
> 
> > The pahole only has the encoder part.  I tested with the verbose output
> > from the "pahole -V -J".  Loading the btf to the kernel is also tested.
> 
> Ok, so here it goes my first stab at testing, using perf's BPF
> integration:
> 
> [root@jouet bpf]# cat hello.c
> #include "stdio.h"
> 
> int syscall_enter(openat)(void *ctx)
> {
> 	puts("Hello, world\n");
> 	return 0;
> }
> [root@jouet bpf]# cat ~/.perfconfig
> [llvm]
> dump-obj = true
> [root@jouet bpf]# perf trace -e open*,hello.c touch /tmp/hello.BTF
> LLVM: dumping hello.o
>      0.017 (         ): __bpf_stdout__:Hello, world
>      0.019 ( 0.011 ms): touch/28147 openat(dfd: CWD, filename: /etc/ld.so.cache, flags: CLOEXEC           ) = 3
>      0.053 (         ): __bpf_stdout__:Hello, world
>      0.055 ( 0.011 ms): touch/28147 openat(dfd: CWD, filename: /lib64/libc.so.6, flags: CLOEXEC           ) = 3
>      0.354 ( 0.012 ms): touch/28147 open(filename: /usr/lib/locale/locale-archive, flags: CLOEXEC         ) = 3
>      0.411 (         ): __bpf_stdout__:Hello, world
>      0.412 ( 0.198 ms): touch/28147 openat(dfd: CWD, filename: /tmp/hello.BTF, flags: CREAT|NOCTTY|NONBLOCK|WRONLY, mode: IRUGO|IWUGO) = 3
> [root@jouet bpf]# 
> [root@jouet bpf]# file hello.o
> hello.o: ELF 64-bit LSB relocatable, *unknown arch 0xf7* version 1 (SYSV), not stripped
> [root@jouet bpf]# pahole --btf_encode hello.o
> pahole: hello.o: No debugging information found
> [root@jouet bpf]#
> 
> [root@jouet bpf]# readelf -s hello.o
> 
> Symbol table '.symtab' contains 5 entries:
>    Num:    Value          Size Type    Bind   Vis      Ndx Name
>      0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND 
>      1: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT    7 __bpf_stdout__
>      2: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT    5 _license
>      3: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT    6 _version
>      4: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT    3 syscall_enter_openat
> [root@jouet bpf]#
> [root@jouet bpf]# readelf -SW hello.o
> There are 10 section headers, starting at offset 0x1f8:
> 
> Section Headers:
>   [Nr] Name              Type            Address          Off    Size   ES Flg Lk Inf Al
>   [ 0]                   NULL            0000000000000000 000000 000000 00      0   0  0
>   [ 1] .strtab           STRTAB          0000000000000000 000178 00007f 00      0   0  1
>   [ 2] .text             PROGBITS        0000000000000000 000040 000000 00  AX  0   0  4
>   [ 3] syscalls:sys_enter_openat PROGBITS        0000000000000000 000040 000088 00  AX  0   0  8
>   [ 4] .relsyscalls:sys_enter_openat REL             0000000000000000 000168 000010 10      9   3  8
>   [ 5] license           PROGBITS        0000000000000000 0000c8 000004 00  WA  0   0  1
>   [ 6] version           PROGBITS        0000000000000000 0000cc 000004 00  WA  0   0  4
>   [ 7] maps              PROGBITS        0000000000000000 0000d0 000010 00  WA  0   0  4
>   [ 8] .rodata.str1.1    PROGBITS        0000000000000000 0000e0 00000e 01 AMS  0   0  1
>   [ 9] .symtab           SYMTAB          0000000000000000 0000f0 000078 18      1   1  8
> Key to Flags:
>   W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
>   L (link order), O (extra OS processing required), G (group), T (TLS),
>   C (compressed), x (unknown), o (OS specific), E (exclude),
>   p (processor specific)
> [root@jouet bpf]#
> 
> Humm, lemme try something, add -g to clang-opt:
> 
> [root@jouet bpf]# cat ~/.perfconfig
> [llvm]
> dump-obj = true
> clang-opt = -g
> [root@jouet bpf]# perf trace -e open*,hello.c touch /tmp/hello.BTF
> LLVM: dumping hello.o
>      0.185 (         ): __bpf_stdout__:Hello, world
>      0.188 ( 0.009 ms): touch/19670 openat(dfd: CWD, filename: /etc/ld.so.cache, flags: CLOEXEC           ) = 3
>      0.219 (         ): __bpf_stdout__:Hello, world
>      0.220 ( 0.011 ms): touch/19670 openat(dfd: CWD, filename: /lib64/libc.so.6, flags: CLOEXEC           ) = 3
>      0.480 ( 0.095 ms): touch/19670 open(filename: /usr/lib/locale/locale-archive, flags: CLOEXEC         ) = 3
>      0.624 (         ): __bpf_stdout__:Hello, world
>      0.626 ( 0.011 ms): touch/19670 openat(dfd: CWD, filename: /tmp/hello.BTF, flags: CREAT|NOCTTY|NONBLOCK|WRONLY, mode: IRUGO|IWUGO) = 3
> [root@jouet bpf]# file hello.o
> hello.o: ELF 64-bit LSB relocatable, *unknown arch 0xf7* version 1 (SYSV), with debug_info, not stripped
> [root@jouet bpf]#
> 
> Much better:
> 
> [root@jouet bpf]# readelf -SW hello.o
> There are 25 section headers, starting at offset 0xc70:
> 
> Section Headers:
>   [Nr] Name              Type            Address          Off    Size   ES Flg Lk Inf Al
>   [ 0]                   NULL            0000000000000000 000000 000000 00      0   0  0
>   [ 1] .strtab           STRTAB          0000000000000000 000b50 000119 00      0   0  1
>   [ 2] .text             PROGBITS        0000000000000000 000040 000000 00  AX  0   0  4
>   [ 3] syscalls:sys_enter_openat PROGBITS        0000000000000000 000040 000088 00  AX  0   0  8
>   [ 4] .relsyscalls:sys_enter_openat REL             0000000000000000 000910 000010 10     24   3  8
>   [ 5] license           PROGBITS        0000000000000000 0000c8 000004 00  WA  0   0  1
>   [ 6] version           PROGBITS        0000000000000000 0000cc 000004 00  WA  0   0  4
>   [ 7] maps              PROGBITS        0000000000000000 0000d0 000010 00  WA  0   0  4
>   [ 8] .rodata.str1.1    PROGBITS        0000000000000000 0000e0 00000e 01 AMS  0   0  1
>   [ 9] .debug_str        PROGBITS        0000000000000000 0000ee 00010e 01  MS  0   0  1
>   [10] .debug_loc        PROGBITS        0000000000000000 0001fc 000023 00      0   0  1
>   [11] .debug_abbrev     PROGBITS        0000000000000000 00021f 0000e3 00      0   0  1
>   [12] .debug_info       PROGBITS        0000000000000000 000302 00015e 00      0   0  1
>   [13] .rel.debug_info   REL             0000000000000000 000920 0001e0 10     24  12  8
>   [14] .debug_ranges     PROGBITS        0000000000000000 000460 000030 00      0   0  1
>   [15] .debug_macinfo    PROGBITS        0000000000000000 000490 000001 00      0   0  1
>   [16] .debug_pubnames   PROGBITS        0000000000000000 000491 00006e 00      0   0  1
>   [17] .rel.debug_pubnames REL             0000000000000000 000b00 000010 10     24  16  8
>   [18] .debug_pubtypes   PROGBITS        0000000000000000 0004ff 00005a 00      0   0  1
>   [19] .rel.debug_pubtypes REL             0000000000000000 000b10 000010 10     24  18  8
>   [20] .debug_frame      PROGBITS        0000000000000000 000560 000028 00      0   0  8
>   [21] .rel.debug_frame  REL             0000000000000000 000b20 000020 10     24  20  8
>   [22] .debug_line       PROGBITS        0000000000000000 000588 00006e 00      0   0  1
>   [23] .rel.debug_line   REL             0000000000000000 000b40 000010 10     24  22  8
>   [24] .symtab           SYMTAB          0000000000000000 0005f8 000318 18      1  29  8
> Key to Flags:
>   W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
>   L (link order), O (extra OS processing required), G (group), T (TLS),
>   C (compressed), x (unknown), o (OS specific), E (exclude),
>   p (processor specific)
> [root@jouet bpf]# 
> 
> [root@jouet bpf]# readelf -s hello.o
> 
> Symbol table '.symtab' contains 33 entries:
>    Num:    Value          Size Type    Bind   Vis      Ndx Name
>      0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND 
>      1: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT    9 
>      2: 000000000000002d     0 NOTYPE  LOCAL  DEFAULT    9 
>      3: 0000000000000044     0 NOTYPE  LOCAL  DEFAULT    9 
>      4: 0000000000000053     0 NOTYPE  LOCAL  DEFAULT    9 
>      5: 000000000000005c     0 NOTYPE  LOCAL  DEFAULT    9 
>      6: 0000000000000061     0 NOTYPE  LOCAL  DEFAULT    9 
>      7: 000000000000006a     0 NOTYPE  LOCAL  DEFAULT    9 
>      8: 0000000000000073     0 NOTYPE  LOCAL  DEFAULT    9 
>      9: 0000000000000077     0 NOTYPE  LOCAL  DEFAULT    9 
>     10: 0000000000000086     0 NOTYPE  LOCAL  DEFAULT    9 
>     11: 000000000000008b     0 NOTYPE  LOCAL  DEFAULT    9 
>     12: 0000000000000098     0 NOTYPE  LOCAL  DEFAULT    9 
>     13: 00000000000000a1     0 NOTYPE  LOCAL  DEFAULT    9 
>     14: 00000000000000ac     0 NOTYPE  LOCAL  DEFAULT    9 
>     15: 00000000000000b8     0 NOTYPE  LOCAL  DEFAULT    9 
>     16: 00000000000000c4     0 NOTYPE  LOCAL  DEFAULT    9 
>     17: 00000000000000d6     0 NOTYPE  LOCAL  DEFAULT    9 
>     18: 00000000000000e8     0 NOTYPE  LOCAL  DEFAULT    9 
>     19: 00000000000000fd     0 NOTYPE  LOCAL  DEFAULT    9 
>     20: 0000000000000101     0 NOTYPE  LOCAL  DEFAULT    9 
>     21: 0000000000000107     0 NOTYPE  LOCAL  DEFAULT    9 
>     22: 0000000000000000     0 SECTION LOCAL  DEFAULT    3 
>     23: 0000000000000000     0 SECTION LOCAL  DEFAULT   10 
>     24: 0000000000000000     0 SECTION LOCAL  DEFAULT   11 
>     25: 0000000000000000     0 SECTION LOCAL  DEFAULT   12 
>     26: 0000000000000000     0 SECTION LOCAL  DEFAULT   14 
>     27: 0000000000000000     0 SECTION LOCAL  DEFAULT   20 
>     28: 0000000000000000     0 SECTION LOCAL  DEFAULT   22 
>     29: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT    7 __bpf_stdout__
>     30: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT    5 _license
>     31: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT    6 _version
>     32: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT    3 syscall_enter_openat
> [root@jouet bpf]# 
> 
> Progress:
> 
> [root@jouet bpf]# pahole --btf_encode hello.o
> sh: llvm-objcopy: command not found
> Failed to encode BTF
> [root@jouet bpf]# 
> 
> [acme@jouet perf]$ llvm-
> llvm-ar          llvm-cov         llvm-dis         llvm-lib         llvm-modextract  llvm-profdata    llvm-split
> llvm-as          llvm-c-test      llvm-dlltool     llvm-link        llvm-mt          llvm-ranlib      llvm-stress
> llvm-bcanalyzer  llvm-cvtres      llvm-dsymutil    llvm-lto         llvm-nm          llvm-readelf     llvm-strings
> llvm-cat         llvm-cxxdump     llvm-dwarfdump   llvm-lto2        llvm-objdump     llvm-readobj     llvm-symbolizer
> llvm-config      llvm-cxxfilt     llvm-dwp         llvm-mc          llvm-opt-report  llvm-rtdyld      llvm-tblgen
> llvm-config-64   llvm-diff        llvm-extract     llvm-mcmarkup    llvm-pdbutil     llvm-size        llvm-xray
> [acme@jouet perf]$ llvm-
> 
> So this must be available in a newer llvm version? Which one?
I should have put in the details in my last email or
in the commit message, my bad.

1. The tools/testing/selftests/bpf/Makefile has the CLANG_FLAGS and
   LLC_FLAGS needed to compile the bpf prog.  It requires a new
   "-mattr=dwarf" llc option which was added to the future
   llvm 7.0.

   Hence, I have been using the llvm's master in github which
   also has the llvm-objcopy.

2. The kernel's btf part only focus on the BPF map.
   Hence, the testing bpf program should have the map's key
   and map's value.  e.g. tools/testing/selftests/bpf/test_btf_haskv.c

> 
> [root@jouet bpf]# clang -v
> clang version 5.0.1 (tags/RELEASE_501/final)
> Target: x86_64-unknown-linux-gnu
> Thread model: posix
> InstalledDir: /usr/bin
> Found candidate GCC installation: /usr/bin/../lib/gcc/x86_64-redhat-linux/7
> Found candidate GCC installation: /usr/lib/gcc/x86_64-redhat-linux/7
> Selected GCC installation: /usr/bin/../lib/gcc/x86_64-redhat-linux/7
> Candidate multilib: .;@m64
> Candidate multilib: 32;@m32
> Selected multilib: .;@m64
> [root@jouet bpf]#
> 
> [acme@jouet perf]$ rpm -q clang llvm
> clang-5.0.1-5.fc27.x86_64
> llvm-5.0.1-6.fc27.x86_64
> [acme@jouet perf]$ 
> [root@jouet bpf]# clang -v
> clang version 5.0.1 (tags/RELEASE_501/final)
> Target: x86_64-unknown-linux-gnu
> Thread model: posix
> InstalledDir: /usr/bin
> Found candidate GCC installation: /usr/bin/../lib/gcc/x86_64-redhat-linux/7
> Found candidate GCC installation: /usr/lib/gcc/x86_64-redhat-linux/7
> Selected GCC installation: /usr/bin/../lib/gcc/x86_64-redhat-linux/7
> Candidate multilib: .;@m64
> Candidate multilib: 32;@m32
> Selected multilib: .;@m64
> [root@jouet bpf]#
> 
> [acme@jouet perf]$ rpm -q clang llvm
> clang-5.0.1-5.fc27.x86_64
> llvm-5.0.1-6.fc27.x86_64
> [acme@jouet perf]$ 
> 
> - Arnaldo

^ permalink raw reply

* Re: INFO: task hung in ip6gre_exit_batch_net
From: Kirill Tkhai @ 2018-06-07 19:59 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: syzbot, Christian Brauner, David Miller, David Ahern,
	Florian Westphal, Jiri Benc, LKML, Xin Long, mschiffer, netdev,
	syzkaller-bugs, Vladislav Yasevich
In-Reply-To: <CACT4Y+b8FkwFe1PCEVDZcTufB4HKVy+ekk=1_YcFH6-RoBGY-w@mail.gmail.com>

On 07.06.2018 22:03, Dmitry Vyukov wrote:
> On Thu, Jun 7, 2018 at 8:54 PM, Kirill Tkhai <ktkhai@virtuozzo.com> wrote:
>>>>>> Hi, Dmirty!
>>>>>>
>>>>>> On 04.06.2018 18:22, Dmitry Vyukov wrote:
>>>>>>> On Mon, Jun 4, 2018 at 5:03 PM, syzbot
>>>>>>> <syzbot+bf78a74f82c1cf19069e@syzkaller.appspotmail.com> wrote:
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> syzbot found the following crash on:
>>>>>>>>
>>>>>>>> HEAD commit:    bc2dbc5420e8 Merge branch 'akpm' (patches from Andrew)
>>>>>>>> git tree:       upstream
>>>>>>>> console output: https://syzkaller.appspot.com/x/log.txt?x=164e42b7800000
>>>>>>>> kernel config:  https://syzkaller.appspot.com/x/.config?x=982e2df1b9e60b02
>>>>>>>> dashboard link: https://syzkaller.appspot.com/bug?extid=bf78a74f82c1cf19069e
>>>>>>>> compiler:       gcc (GCC) 8.0.1 20180413 (experimental)
>>>>>>>>
>>>>>>>> Unfortunately, I don't have any reproducer for this crash yet.
>>>>>>>>
>>>>>>>> IMPORTANT: if you fix the bug, please add the following tag to the commit:
>>>>>>>> Reported-by: syzbot+bf78a74f82c1cf19069e@syzkaller.appspotmail.com
>>>>>>>
>>>>>>> Another hang on rtnl lock:
>>>>>>>
>>>>>>> #syz dup: INFO: task hung in netdev_run_todo
>>>>>>>
>>>>>>> May be related to "unregister_netdevice: waiting for DEV to become free":
>>>>>>> https://syzkaller.appspot.com/bug?id=1a97a5bd119fd97995f752819fd87840ab9479a9
>>>>>
>>>>> netdev_wait_allrefs does not hold rtnl lock during waiting, so it must
>>>>> be something different.
>>>>>
>>>>>
>>>>>>> Any other explanations for massive hangs on rtnl lock for minutes?
>>>>>>
>>>>>> To exclude the situation, when a task exists with rtnl_mutex held:
>>>>>>
>>>>>> would the pr_warn() from print_held_locks_bug() be included in the console output
>>>>>> if they appear?
>>>>>
>>>>> Yes, everything containing "WARNING:" is detected as bug.
>>>>
>>>> OK, then dead task not releasing the lock is excluded.
>>>>
>>>> One more assumption: someone corrupted memory around rtnl_mutex and it looks like locked.
>>>> (I track lockdep "(rtnl_mutex){+.+.}" prints in initial message as "nobody owns rtnl_mutex").
>>>> There may help a crash dump of the VM.
>>>
>>> I can't find any legend for these +'s and .'s, but {+.+.} is present
>>> in large amounts in just any task hung report for different mutexes,
>>> so I would not expect that it means corruption.
>>>
>>> Are dozens of known corruptions that syzkaller can trigger. But
>>> usually they are reliably caught by KASAN. If any of them would lead
>>> to silent memory corruption, we would got dozens of assorted crashes
>>> throughout the kernel. We've seen that at some points, but not
>>> recently. So I would assume that memory is not corrupted in all these
>>> cases:
>>> https://syzkaller.appspot.com/bug?id=2503c576cabb08d41812e732b390141f01a59545
>>
>> This BUG clarifies the {+.+.}:
>>
>> 4 locks held by kworker/0:145/381:
>>  #0:  ((wq_completion)"hwsim_wq"){+.+.}, at: [<000000003f9487f0>] work_static include/linux/workqueue.h:198 [inline]
>>  #0:  ((wq_completion)"hwsim_wq"){+.+.}, at: [<000000003f9487f0>] set_work_data kernel/workqueue.c:619 [inline]
>>  #0:  ((wq_completion)"hwsim_wq"){+.+.}, at: [<000000003f9487f0>] set_work_pool_and_clear_pending kernel/workqueue.c:646 [inline]
>>  #0:  ((wq_completion)"hwsim_wq"){+.+.}, at: [<000000003f9487f0>] process_one_work+0xb12/0x1bb0 kernel/workqueue.c:2084
>>  #1:  ((work_completion)(&data->destroy_work)){+.+.}, at: [<00000000bbdd2115>] process_one_work+0xb89/0x1bb0 kernel/workqueue.c:2088
>>  #2:  (rtnl_mutex){+.+.}, at: [<000000009c9d14f8>] rtnl_lock+0x17/0x20 net/core/rtnetlink.c:74
>>  #3:  (rcu_sched_state.exp_mutex){+.+.}, at: [<000000001ba1a807>] exp_funnel_lock kernel/rcu/tree_exp.h:272 [inline]
>>  #3:  (rcu_sched_state.exp_mutex){+.+.}, at: [<000000001ba1a807>] _synchronize_rcu_expedited.constprop.72+0x9fa/0xac0 kernel/rcu/tree_exp.h:596
>>
>> There we have rtnl_mutex locked and the {..} is like above. It's definitely locked
>> since there is one more lock after it.
>>
>> This BUG happen because of there are many rtnl_mutex waiters while owner
>> is synchronizing RCU. Rather clear for me in comparison to the topic's hung.
> 
> 
> You mean that it's not hanged, but rather needs more than 2 minutes to
> complete, right?

Yeah, I think, this is possible. I've seen the situations like that.
Let synchronize_rcu_expedited() is executed for X seconds. Then,
it's need just 120/x calls of "this function under rtnl_mutex" to make
a soft lockup of someone else who wants the mutex too.

Also, despite the CFS is fair scheduler, in case of the calls are
made from workqueue, every work will cause sleep. So, every work
will be executed in separate worker task. Every worker will haved its
own time slice. This increases the probability these tasks will
take cpu time before the task in the header of the hang.

Kirill

^ permalink raw reply

* Re: [PATCH bpf] bpf: reject passing modified ctx to helper functions
From: Alexei Starovoitov @ 2018-06-07 19:43 UTC (permalink / raw)
  To: Daniel Borkmann; +Cc: ast, netdev, ecree
In-Reply-To: <20180607154003.5368-1-daniel@iogearbox.net>

On Thu, Jun 07, 2018 at 05:40:03PM +0200, Daniel Borkmann wrote:
> As commit 28e33f9d78ee ("bpf: disallow arithmetic operations on
> context pointer") already describes, f1174f77b50c ("bpf/verifier:
> rework value tracking") removed the specific white-listed cases
> we had previously where we would allow for pointer arithmetic in
> order to further generalize it, and allow e.g. context access via
> modified registers. While the dereferencing of modified context
> pointers had been forbidden through 28e33f9d78ee, syzkaller did
> recently manage to trigger several KASAN splats for slab out of
> bounds access and use after frees by simply passing a modified
> context pointer to a helper function which would then do the bad
> access since verifier allowed it in adjust_ptr_min_max_vals().
> 
> Rejecting arithmetic on ctx pointer in adjust_ptr_min_max_vals()
> generally could break existing programs as there's a valid use
> case in tracing in combination with passing the ctx to helpers as
> bpf_probe_read(), where the register then becomes unknown at
> verification time due to adding a non-constant offset to it. An
> access sequence may look like the following:
> 
>   offset = args->filename;  /* field __data_loc filename */
>   bpf_probe_read(&dst, len, (char *)args + offset); // args is ctx
> 
> There are two options: i) we could special case the ctx and as
> soon as we add a constant or bounded offset to it (hence ctx type
> wouldn't change) we could turn the ctx into an unknown scalar, or
> ii) we generalize the sanity test for ctx member access into a
> small helper and assert it on the ctx register that was passed
> as a function argument. Fwiw, latter is more obvious and less
> complex at the same time, and one case that may potentially be
> legitimate in future for ctx member access at least would be for
> ctx to carry a const offset. Therefore, fix follows approach
> from ii) and adds test cases to BPF kselftests.
> 
> Fixes: f1174f77b50c ("bpf/verifier: rework value tracking")
> Reported-by: syzbot+3d0b2441dbb71751615e@syzkaller.appspotmail.com
> Reported-by: syzbot+c8504affd4fdd0c1b626@syzkaller.appspotmail.com
> Reported-by: syzbot+e5190cb881d8660fb1a3@syzkaller.appspotmail.com
> Reported-by: syzbot+efae31b384d5badbd620@syzkaller.appspotmail.com
> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
> Acked-by: Alexei Starovoitov <ast@kernel.org>

Applied, Thanks

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox