Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH 5/9] perf, bpf: save bpf_prog_info in a rbtree in perf_env
From: Jiri Olsa @ 2019-02-15  7:41 UTC (permalink / raw)
  To: Song Liu
  Cc: Netdev, linux-kernel, ast@kernel.org, daniel@iogearbox.net,
	Kernel Team, peterz@infradead.org, acme@redhat.com
In-Reply-To: <2446DEFA-3CD3-4D3A-BE73-C29037F00C5C@fb.com>

On Thu, Feb 14, 2019 at 05:03:03PM +0000, Song Liu wrote:
> 
> 
> > On Feb 14, 2019, at 4:33 AM, Jiri Olsa <jolsa@redhat.com> wrote:
> > 
> > On Fri, Feb 08, 2019 at 05:17:01PM -0800, Song Liu wrote:
> >> bpf_prog_info contains information necessary to annotate bpf programs.
> >> This patch saves bpf_prog_info for bpf programs loaded in the system.
> >> 
> >> perf-record saves bpf_prog_info information as headers to perf.data.
> >> A new header type HEADER_BPF_PROG_INFO is introduced for this data.
> > 
> > please move those 2 changes into separate patches then
> 
> Do you mean one patch to save data in rbtree, then a separate patch 
> to save data in perf.data file?

yes

jirka

^ permalink raw reply

* Re: [PATCH bpf-next 2/4] libbpf: Support 32-bit static data loads
From: Joe Stringer @ 2019-02-15  7:16 UTC (permalink / raw)
  To: Y Song; +Cc: Joe Stringer, bpf, netdev, Daniel Borkmann, Alexei Starovoitov
In-Reply-To: <CAH3MdRVU5ayEm6eb9Fz53Q5gjA60vadyJ+0_meCkWBo+t+XXYA@mail.gmail.com>

On Thu, 14 Feb 2019 at 21:39, Y Song <ys114321@gmail.com> wrote:
>
> On Mon, Feb 11, 2019 at 4:48 PM Joe Stringer <joe@wand.net.nz> wrote:
> >
> > Support loads of static 32-bit data when BPF writers make use of
> > convenience macros for accessing static global data variables. A later
> > patch in this series will demonstrate its usage in a selftest.
> >
> > As of LLVM-7, this technique only works with 32-bit data, as LLVM will
> > complain if this technique is attempted with data of other sizes:
> >
> >     LLVM ERROR: Unsupported relocation: try to compile with -O2 or above,
> >     or check your static variable usage
>
> A little bit clarification from compiler side.
> The above compiler error is to prevent people use static variables since current
> kernel/libbpf does not handle this. The compiler only warns if .bss or
> .data section
> has more than one definitions. The first definition always has section offset 0
> and the compiler did not warn.

Ah, interesting. I observed that warning when I attempted to define
global variables of multiple sizes, and I thought also with sizes
other than 32-bit. This clarifies things a bit, thanks.

For the .bss my observation was that if you had a definition like:

static int a = 0;

Then this will be placed into .bss, hence why I looked into the
approach from this patch for patch 3 as well.

> The restriction is a little strange. To only work with 32-bit data is
> not a right
> statement. The following are some examples.
>
> The following static variable definitions will succeed:
> static int a; /* one in .bss */
> static long b = 2;  /* one in .data */
>
> The following definitions will fail as both in .bss.
> static int a;
> static int b;
>
> The following definitions will fail as both in .data:
> static char a = 2;
> static int b = 3;

Are there type restrictions or something? I've been defining multiple
static uint32_t and using them per the approach in this patch series
without hitting this compiler assertion.

> Using global variables can prevent compiler errors.
> maps are defined as globals and the compiler does not
> check whether a particular global variable is defining a map or not.
>
> If you just use static variable like below
> static int a = 2;
> without potential assignment to a, the compiler will replace variable
> a with 2 at compile time.
> An alternative is to define like below
> static volatile int a = 2;
> You can get a "load" for variable "a" in the bpf load even if there is
> no assignment to a.

I'll take a closer look at this too.

> Maybe now is the time to remove the compiler assertions as
> libbpf/kernel starts to
> handle static variables?

I don't understand why those assertions exists in this form. It
already allows code which will not load with libbpf (ie generate any
.data/.bss), does it help prevent unexpected situations for
developers?

^ permalink raw reply

* Re: general protection fault in prepare_to_wait
From: syzbot @ 2019-02-15  5:43 UTC (permalink / raw)
  To: davem, linux-hams, linux-kernel, netdev, ralf, syzkaller-bugs,
	xiyou.wangcong
In-Reply-To: <000000000000fa6a2c057e8b7064@google.com>

syzbot has found a reproducer for the following crash on:

HEAD commit:    23e93c9b2cde Revert "gfs2: read journal in large chunks to..
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=14e94014c00000
kernel config:  https://syzkaller.appspot.com/x/.config?x=ee434566c893c7b1
dashboard link: https://syzkaller.appspot.com/bug?extid=55f9d3e51d49e20b2ce5
compiler:       gcc (GCC) 9.0.0 20181231 (experimental)
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=109c886cc00000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=11ed20d0c00000

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+55f9d3e51d49e20b2ce5@syzkaller.appspotmail.com

IPv6: ADDRCONF(NETDEV_CHANGE): veth0_to_hsr: link becomes ready
IPv6: ADDRCONF(NETDEV_CHANGE): hsr_slave_0: link becomes ready
kasan: CONFIG_KASAN_INLINE enabled
8021q: adding VLAN 0 to HW filter on device batadv0
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 7785 Comm: syz-executor295 Not tainted 5.0.0-rc6+ #72
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011
RIP: 0010:__lock_acquire+0x8df/0x4700 kernel/locking/lockdep.c:3215
Code: 28 00 00 00 0f 85 35 27 00 00 48 8d 65 d8 5b 41 5c 41 5d 41 5e 41 5f  
5d c3 48 b8 00 00 00 00 00 fc ff df 4c 89 e2 48 c1 ea 03 <80> 3c 02 00 0f  
85 dc 27 00 00 49 81 3c 24 20 25 9a 89 0f 84 03 f8
RSP: 0018:ffff888094c6f970 EFLAGS: 00010006
kobject: 'vlan0' (00000000d30c60ff): kobject_add_internal: parent: 'mesh',  
set: '<NULL>'
RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000003 RSI: 0000000000000000 RDI: 0000000000000018
RBP: ffff888094c6fb40 R08: 0000000000000001 R09: 0000000000000001
R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000018
R13: 0000000000000001 R14: 0000000000000000 R15: ffff88808c0f00c0
FS:  00007f9abeeec700(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000
8021q: adding VLAN 0 to HW filter on device batadv0
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kobject: 'vlan0' (00000000a54e32cc): kobject_add_internal: parent: 'mesh',  
set: '<NULL>'
CR2: 00000000004d0910 CR3: 00000000906d2000 CR4: 00000000001406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
8021q: adding VLAN 0 to HW filter on device batadv0
Call Trace:
kobject: 'vlan0' (0000000016be0e34): kobject_add_internal: parent: 'mesh',  
set: '<NULL>'
  lock_acquire+0x16f/0x3f0 kernel/locking/lockdep.c:3841
  __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
  _raw_spin_lock_irqsave+0x95/0xcd kernel/locking/spinlock.c:152
  prepare_to_wait+0x7c/0x300 kernel/sched/wait.c:230
  nr_accept+0x239/0x790 net/netrom/af_netrom.c:796
  __sys_accept4+0x350/0x6a0 net/socket.c:1588
  __do_sys_accept net/socket.c:1629 [inline]
  __se_sys_accept net/socket.c:1626 [inline]
  __x64_sys_accept+0x75/0xb0 net/socket.c:1626
  do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
  entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x449489
Code: e8 7c e6 ff ff 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7  
48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff  
ff 0f 83 1b 05 fc ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f9abeeebcc8 EFLAGS: 00000246 ORIG_RAX: 000000000000002b
RAX: ffffffffffffffda RBX: 00000000006dfc68 RCX: 0000000000449489
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000004
RBP: 00000000006dfc60 R08: 00007f9abeeec700 R09: 0000000000000000
R10: 00007f9abeeec700 R11: 0000000000000246 R12: 00000000006dfc6c
R13: 00007ffe8979730f R14: 00007f9abeeec9c0 R15: 0000000000000002
Modules linked in:
---[ end trace 91ccd60fc619e2e6 ]---
RIP: 0010:__lock_acquire+0x8df/0x4700 kernel/locking/lockdep.c:3215
Code: 28 00 00 00 0f 85 35 27 00 00 48 8d 65 d8 5b 41 5c 41 5d 41 5e 41 5f  
5d c3 48 b8 00 00 00 00 00 fc ff df 4c 89 e2 48 c1 ea 03 <80> 3c 02 00 0f  
85 dc 27 00 00 49 81 3c 24 20 25 9a 89 0f 84 03 f8
RSP: 0018:ffff888094c6f970 EFLAGS: 00010006
RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000003 RSI: 0000000000000000 RDI: 0000000000000018
RBP: ffff888094c6fb40 R08: 0000000000000001 R09: 0000000000000001
R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000018
R13: 0000000000000001 R14: 0000000000000000 R15: ffff88808c0f00c0
FS:  00007f9abeeec700(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000004d0910 CR3: 00000000906d2000 CR4: 00000000001406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400


^ permalink raw reply

* Re: [PATCH bpf-next 2/4] libbpf: Support 32-bit static data loads
From: Y Song @ 2019-02-15  5:38 UTC (permalink / raw)
  To: Joe Stringer; +Cc: bpf, netdev, Daniel Borkmann, Alexei Starovoitov
In-Reply-To: <20190212004729.535-3-joe@wand.net.nz>

On Mon, Feb 11, 2019 at 4:48 PM Joe Stringer <joe@wand.net.nz> wrote:
>
> Support loads of static 32-bit data when BPF writers make use of
> convenience macros for accessing static global data variables. A later
> patch in this series will demonstrate its usage in a selftest.
>
> As of LLVM-7, this technique only works with 32-bit data, as LLVM will
> complain if this technique is attempted with data of other sizes:
>
>     LLVM ERROR: Unsupported relocation: try to compile with -O2 or above,
>     or check your static variable usage

A little bit clarification from compiler side.
The above compiler error is to prevent people use static variables since current
kernel/libbpf does not handle this. The compiler only warns if .bss or
.data section
has more than one definitions. The first definition always has section offset 0
and the compiler did not warn.

The restriction is a little strange. To only work with 32-bit data is
not a right
statement. The following are some examples.

The following static variable definitions will succeed:
static int a; /* one in .bss */
static long b = 2;  /* one in .data */

The following definitions will fail as both in .bss.
static int a;
static int b;

The following definitions will fail as both in .data:
static char a = 2;
static int b = 3;

Using global variables can prevent compiler errors.
maps are defined as globals and the compiler does not
check whether a particular global variable is defining a map or not.

If you just use static variable like below
static int a = 2;
without potential assignment to a, the compiler will replace variable
a with 2 at compile time.
An alternative is to define like below
static volatile int a = 2;
You can get a "load" for variable "a" in the bpf load even if there is
no assignment to a.

Maybe now is the time to remove the compiler assertions as
libbpf/kernel starts to
handle static variables?

>
> Based on the proof of concept by Daniel Borkmann (presented at LPC 2018).
>
> Signed-off-by: Joe Stringer <joe@wand.net.nz>
> ---
>  tools/lib/bpf/libbpf.c | 34 ++++++++++++++++++++++++++++++++--
>  1 file changed, 32 insertions(+), 2 deletions(-)
>
> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> index 1ec28d5154dc..da35d5559b22 100644
> --- a/tools/lib/bpf/libbpf.c
> +++ b/tools/lib/bpf/libbpf.c
> @@ -140,11 +140,13 @@ struct bpf_program {
>                 enum {
>                         RELO_LD64,
>                         RELO_CALL,
> +                       RELO_DATA,
>                 } type;
>                 int insn_idx;
>                 union {
>                         int map_idx;
>                         int text_off;
> +                       uint32_t data;
>                 };
>         } *reloc_desc;
>         int nr_reloc;
> @@ -210,6 +212,7 @@ struct bpf_object {
>                 Elf *elf;
>                 GElf_Ehdr ehdr;
>                 Elf_Data *symbols;
> +               Elf_Data *global_data;
>                 size_t strtabidx;
>                 struct {
>                         GElf_Shdr shdr;
> @@ -218,6 +221,7 @@ struct bpf_object {
>                 int nr_reloc;
>                 int maps_shndx;
>                 int text_shndx;
> +               int data_shndx;
>         } efile;
>         /*
>          * All loaded bpf_object is linked in a list, which is
> @@ -476,6 +480,7 @@ static void bpf_object__elf_finish(struct bpf_object *obj)
>                 obj->efile.elf = NULL;
>         }
>         obj->efile.symbols = NULL;
> +       obj->efile.global_data = NULL;
>
>         zfree(&obj->efile.reloc);
>         obj->efile.nr_reloc = 0;
> @@ -866,6 +871,9 @@ static int bpf_object__elf_collect(struct bpf_object *obj, int flags)
>                                         pr_warning("failed to alloc program %s (%s): %s",
>                                                    name, obj->path, cp);
>                                 }
> +                       } else if (strcmp(name, ".data") == 0) {
> +                               obj->efile.global_data = data;
> +                               obj->efile.data_shndx = idx;
>                         }
>                 } else if (sh.sh_type == SHT_REL) {
>                         void *reloc = obj->efile.reloc;
> @@ -962,6 +970,7 @@ bpf_program__collect_reloc(struct bpf_program *prog, GElf_Shdr *shdr,
>         Elf_Data *symbols = obj->efile.symbols;
>         int text_shndx = obj->efile.text_shndx;
>         int maps_shndx = obj->efile.maps_shndx;
> +       int data_shndx = obj->efile.data_shndx;
>         struct bpf_map *maps = obj->maps;
>         size_t nr_maps = obj->nr_maps;
>         int i, nrels;
> @@ -1000,8 +1009,9 @@ bpf_program__collect_reloc(struct bpf_program *prog, GElf_Shdr *shdr,
>                          (long long) (rel.r_info >> 32),
>                          (long long) sym.st_value, sym.st_name);
>
> -               if (sym.st_shndx != maps_shndx && sym.st_shndx != text_shndx) {
> -                       pr_warning("Program '%s' contains non-map related relo data pointing to section %u\n",
> +               if (sym.st_shndx != maps_shndx && sym.st_shndx != text_shndx &&
> +                   sym.st_shndx != data_shndx) {
> +                       pr_warning("Program '%s' contains unrecognized relo data pointing to section %u\n",
>                                    prog->section_name, sym.st_shndx);
>                         return -LIBBPF_ERRNO__RELOC;
>                 }
> @@ -1046,6 +1056,20 @@ bpf_program__collect_reloc(struct bpf_program *prog, GElf_Shdr *shdr,
>                         prog->reloc_desc[i].type = RELO_LD64;
>                         prog->reloc_desc[i].insn_idx = insn_idx;
>                         prog->reloc_desc[i].map_idx = map_idx;
> +               } else if (sym.st_shndx == data_shndx) {
> +                       Elf_Data *global_data = obj->efile.global_data;
> +                       uint32_t *static_data;
> +
> +                       if (sym.st_value + sizeof(uint32_t) > (int)global_data->d_size) {
> +                               pr_warning("bpf relocation: static data load beyond data size %lu\n",
> +                                          global_data->d_size);
> +                               return -LIBBPF_ERRNO__RELOC;
> +                       }
> +
> +                       static_data = global_data->d_buf + sym.st_value;
> +                       prog->reloc_desc[i].type = RELO_DATA;
> +                       prog->reloc_desc[i].insn_idx = insn_idx;
> +                       prog->reloc_desc[i].data = *static_data;
>                 }
>         }
>         return 0;
> @@ -1399,6 +1423,12 @@ bpf_program__relocate(struct bpf_program *prog, struct bpf_object *obj)
>                                                       &prog->reloc_desc[i]);
>                         if (err)
>                                 return err;
> +               } else if (prog->reloc_desc[i].type == RELO_DATA) {
> +                       struct bpf_insn *insns = prog->insns;
> +                       int insn_idx;
> +
> +                       insn_idx = prog->reloc_desc[i].insn_idx;
> +                       insns[insn_idx].imm = prog->reloc_desc[i].data;
>                 }
>         }
>
> --
> 2.19.1
>

^ permalink raw reply

* Re: KASAN: use-after-free Read in lock_sock_nested
From: syzbot @ 2019-02-15  5:25 UTC (permalink / raw)
  To: davem, linux-hams, linux-kernel, netdev, ralf, syzkaller-bugs
In-Reply-To: <0000000000007a5aad057e7748c9@google.com>

syzbot has found a reproducer for the following crash on:

HEAD commit:    b3418f8bddf4 Add linux-next specific files for 20190214
git tree:       linux-next
console output: https://syzkaller.appspot.com/x/log.txt?x=14f63047400000
kernel config:  https://syzkaller.appspot.com/x/.config?x=8a3a37525a677c71
dashboard link: https://syzkaller.appspot.com/bug?extid=500c69d1e21d970e461b
compiler:       gcc (GCC) 9.0.0 20181231 (experimental)
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=14b08da7400000

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+500c69d1e21d970e461b@syzkaller.appspotmail.com

==================================================================
BUG: KASAN: use-after-free in __lock_acquire+0x3150/0x4710  
kernel/locking/lockdep.c:3200
Read of size 8 at addr ffff8880195faa60 by task syz-executor.4/7495

CPU: 1 PID: 7495 Comm: syz-executor.4 Not tainted 5.0.0-rc6-next-20190214  
#35
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011
Call Trace:
  __dump_stack lib/dump_stack.c:77 [inline]
  dump_stack+0x172/0x1f0 lib/dump_stack.c:113
  print_address_description.cold+0x7c/0x20d mm/kasan/report.c:187
  kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
  __asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
  __lock_acquire+0x3150/0x4710 kernel/locking/lockdep.c:3200
  lock_acquire+0x16f/0x3f0 kernel/locking/lockdep.c:3833
  __raw_spin_lock_bh include/linux/spinlock_api_smp.h:135 [inline]
  _raw_spin_lock_bh+0x33/0x50 kernel/locking/spinlock.c:168
  spin_lock_bh include/linux/spinlock.h:334 [inline]
  lock_sock_nested+0x41/0x120 net/core/sock.c:2878
  lock_sock include/net/sock.h:1507 [inline]
  nr_accept+0x200/0x790 net/netrom/af_netrom.c:808
  __sys_accept4+0x350/0x6a0 net/socket.c:1610
  __do_sys_accept net/socket.c:1651 [inline]
  __se_sys_accept net/socket.c:1648 [inline]
  __x64_sys_accept+0x75/0xb0 net/socket.c:1648
  do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
  entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x457e29
Code: ad b8 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7  
48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff  
ff 0f 83 7b b8 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f16cb51ec78 EFLAGS: 00000246 ORIG_RAX: 000000000000002b
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000000457e29
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000004
RBP: 000000000073bfa0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f16cb51f6d4
R13: 00000000004bdbf0 R14: 00000000004cde80 R15: 00000000ffffffff

Allocated by task 7492:
  save_stack+0x45/0xd0 mm/kasan/common.c:75
  set_track mm/kasan/common.c:87 [inline]
  __kasan_kmalloc mm/kasan/common.c:497 [inline]
  __kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:470
  kasan_kmalloc+0x9/0x10 mm/kasan/common.c:511
  __do_kmalloc mm/slab.c:3721 [inline]
  __kmalloc+0x15c/0x740 mm/slab.c:3730
  kmalloc include/linux/slab.h:553 [inline]
  sk_prot_alloc+0x19c/0x2e0 net/core/sock.c:1573
  sk_alloc+0x39/0xf70 net/core/sock.c:1627
  nr_create+0xb9/0x5e0 net/netrom/af_netrom.c:436
  __sock_create+0x3e6/0x750 net/socket.c:1297
  sock_create net/socket.c:1337 [inline]
  __sys_socket+0x103/0x220 net/socket.c:1367
  __do_sys_socket net/socket.c:1376 [inline]
  __se_sys_socket net/socket.c:1374 [inline]
  __x64_sys_socket+0x73/0xb0 net/socket.c:1374
  do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
  entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 7491:
  save_stack+0x45/0xd0 mm/kasan/common.c:75
  set_track mm/kasan/common.c:87 [inline]
  __kasan_slab_free+0x102/0x150 mm/kasan/common.c:459
  kasan_slab_free+0xe/0x10 mm/kasan/common.c:467
  __cache_free mm/slab.c:3491 [inline]
  kfree+0xcf/0x230 mm/slab.c:3816
  sk_prot_free net/core/sock.c:1610 [inline]
  __sk_destruct+0x4f1/0x6d0 net/core/sock.c:1692
  sk_destruct+0x7b/0x90 net/core/sock.c:1700
  __sk_free+0xce/0x300 net/core/sock.c:1711
  sk_free+0x42/0x50 net/core/sock.c:1722
  sock_put include/net/sock.h:1708 [inline]
  nr_release+0x337/0x3c0 net/netrom/af_netrom.c:557
  __sock_release+0xd3/0x250 net/socket.c:579
  sock_close+0x1b/0x30 net/socket.c:1161
  __fput+0x2e5/0x8d0 fs/file_table.c:278
  ____fput+0x16/0x20 fs/file_table.c:309
  task_work_run+0x14a/0x1c0 kernel/task_work.c:113
  tracehook_notify_resume include/linux/tracehook.h:188 [inline]
  exit_to_usermode_loop+0x273/0x2c0 arch/x86/entry/common.c:166
  prepare_exit_to_usermode arch/x86/entry/common.c:197 [inline]
  syscall_return_slowpath arch/x86/entry/common.c:268 [inline]
  do_syscall_64+0x52d/0x610 arch/x86/entry/common.c:293
  entry_SYSCALL_64_after_hwframe+0x49/0xbe

The buggy address belongs to the object at ffff8880195fa9c0
  which belongs to the cache kmalloc-2k of size 2048
The buggy address is located 160 bytes inside of
  2048-byte region [ffff8880195fa9c0, ffff8880195fb1c0)
The buggy address belongs to the page:
page:ffffea0000657e80 count:1 mapcount:0 mapping:ffff88812c3f0c40 index:0x0  
compound_mapcount: 0
flags: 0x1fffc0000010200(slab|head)
raw: 01fffc0000010200 ffffea00026f8a88 ffffea000220c888 ffff88812c3f0c40
raw: 0000000000000000 ffff8880195fa140 0000000100000003 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
  ffff8880195fa900: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
  ffff8880195fa980: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
> ffff8880195faa00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                                        ^
  ffff8880195faa80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
  ffff8880195fab00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================


^ permalink raw reply

* Re: [PATCH v2] kcm: remove any offset before parsing messages
From: Dominique Martinet @ 2019-02-15  4:52 UTC (permalink / raw)
  To: Tom Herbert
  Cc: Tom Herbert, David Miller, Doron Roberts-Kedes, Dave Watson,
	Linux Kernel Network Developers, LKML
In-Reply-To: <CAPDqMeoJ7CCo1eGNBp_-crkxfVt_4f=XQqhEo7kmyCN-hf_EWQ@mail.gmail.com>

Tom Herbert wrote on Thu, Feb 14, 2019:
> On Thu, Feb 14, 2019 at 7:31 PM Dominique Martinet
> <asmadeus@codewreck.org> wrote:
> > Yes, the parser fails with -ENOMEM ; that is not handled gracefully at
> > all: from a user point of view, the connection just hangs (recvmsg never
> > returns), without so much as a message in dmesg either.
>
> That's by design. Here is the description in kcm.txt:
> 
> "When a TCP socket is attached to a KCM multiplexor data ready
> (POLLIN) and write space available (POLLOUT) events are handled by the
> multiplexor. If there is a state change (disconnection) or other error
> on a TCP socket, an error is posted on the TCP socket so that a
> POLLERR event happens and KCM discontinues using the socket. When the
> application gets the error notification for a TCP socket, it should
> unattach the socket from KCM and then handle the error condition (the
> typical response is to close the socket and create a new connection if
> necessary)."

Sigh, that's what I get for relying on pieces of code found on the
internet.

It does make "trivial" kcm sockets difficult to use though, the old
ganesha code I adapted to kcm was the horrible (naive?) kind spawning
one thread per connection and just waiting on read(), so making it just
create a kcm socket from the tcp one and wait on recvmsg() until an
error pops up does not seem an option.

That being said I'm a bit confused, I thought part of the point of kcm
was the multiplexing so a simple server could just keep attaching tcp
sockets to a single kcm socket and only have a single trivial loop
reading from the kcm socket ; but I guess there's no harm in also
looking for POLLERR on the tcp sockets... It would still need to close
them once clients disconnect I guess, this really only affects my naive
server.

> > strparser might be able to retry somehow.
> 
> We could retry on -ENOMEM since it is potentially a transient error,

Yes, I think we should aim to retry on -ENOMEM; I agree all errors are
not meant to be retried on but there already are different cases based
on what the parse cb returned; but that can be a different patch.

> but generally for errors (like connection is simply broken) it seems
> like it's working as intended. I suppose we could add a socket option
> to fail the KCM socket when all attached lower sockets have failed,
> but I not sure that would be a significant benefit for additional
> complexity.

Right, I agree it's probably not worth doing, now I'm aware of this I
can probably motivate myself to change this old code to use epoll
properly.

I think it could make sense to have this option for simpler programs,
but as things stand I guess we can require kcm users to do this much,
thanks for the explanation, and sorry for having failed to notice it.

With all that said I guess my patch should work correctly then, I'll try
to find some time to check the error does come back up the tcp socket in
my reproducer but I have no reason to believe it doesn't.

I'd like to see some retry on ENOMEM before this is merged though, so
while I'm there I'll resend this with a second patch doing that
retry,.. I think just not setting strp->interrupted and not reporting
the error up might be enough? Will have to try either way.

Thanks for the feedback,
-- 
Dominique

^ permalink raw reply

* Re: [PATCH v2] kcm: remove any offset before parsing messages
From: Tom Herbert @ 2019-02-15  4:01 UTC (permalink / raw)
  To: Dominique Martinet
  Cc: Tom Herbert, David Miller, Doron Roberts-Kedes, Dave Watson,
	Linux Kernel Network Developers, LKML
In-Reply-To: <20190215033102.GA3099@nautica>

On Thu, Feb 14, 2019 at 7:31 PM Dominique Martinet
<asmadeus@codewreck.org> wrote:
>
> Tom Herbert wrote on Thu, Feb 14, 2019:
> > > This second patch[2] (the current thread) now does an extra clone if
> > > there is an offset, but the problem really isn't in the clone but the
> > > pull itself that can fail and return NULL when there is memory pressure.
> > > For some reason I hadn't been able to reproduce that behaviour with
> > > strparser doing the pull, but I assume it would also be possible to hit
> > > in extreme situations, I'm not sure...
> >
> > This option looks the best to me, at least as a way to fix the issue
> > without requiring a change to the API. If the pull fails, doesn't that
> > just mean that the parser fails? Is there some behavior with this
> > patch that is not being handled gracefully?
>
> Yes, the parser fails with -ENOMEM ; that is not handled gracefully at
> all: from a user point of view, the connection just hangs (recvmsg never
> returns), without so much as a message in dmesg either.
>
Dominique,

That's by design. Here is the description in kcm.txt:

"When a TCP socket is attached to a KCM multiplexor data ready
(POLLIN) and write space available (POLLOUT) events are handled by the
multiplexor. If there is a state change (disconnection) or other error
on a TCP socket, an error is posted on the TCP socket so that a
POLLERR event happens and KCM discontinues using the socket. When the
application gets the error notification for a TCP socket, it should
unattach the socket from KCM and then handle the error condition (the
typical response is to close the socket and create a new connection if
necessary)."

> It took me a while to figure out what failed exactly as I did indeed
> expect strparser/kcm to handle that better, but ultimately as things
> stand if the parser fails it calls strp_parser_err() with the error
> which ends up in strp_abort_strp that should call
> sk->sk_error_report(sk) but in kcm case sk is the csk and I guess
> failing csk does not make a pending recv on the kcm sock to fail...
>
> I'm not sure whether propagating the error to the upper socket is the
> right thing to do, kcm is meant to be able to work with multiple
> underlying sockets so I feel we must be cautious about that, but

Right, that's the motivation for the design.

> strparser might be able to retry somehow.

We could retry on -ENOMEM since it is potentially a transient error,
but generally for errors (like connection is simply broken) it seems
like it's working as intended. I suppose we could add a socket option
to fail the KCM socket when all attached lower sockets have failed,
but I not sure that would be a significant benefit for additional
complexity.

> This is what I said below:
> > > [,,,]
> > >  - the current patch, that I could only get to fail with KASAN, but does
> > > complexify kcm a bit; this also does not fix bpf sockmap at all.
> > > Would still require to fix the hang, so make strparser retry a few times
> > > if strp->cb.parse_msg failed maybe? Or at least get the error back to
> > > userspace somehow.

The error should be getting to userspace via the TCP socket.

Tom

>
> Thanks,
> --
> Dominique

^ permalink raw reply

* Re: dead code in vhost.c
From: Jason Wang @ 2019-02-15  3:52 UTC (permalink / raw)
  To: Stephen Hemminger, mst; +Cc: netdev
In-Reply-To: <20190214080346.352c6941@shemminger-XPS-13-9360>


On 2019/2/15 上午12:03, Stephen Hemminger wrote:
> Coverity found this obvious bug
>
> ________________________________________________________________________________________________________
> *** CID 1442593:  Control flow issues  (DEADCODE)
> /drivers/vhost/vhost.c: 1795 in log_used()
> 1789     	ret = translate_desc(vq, (uintptr_t)vq->used + used_offset,
> 1790     			     len, iov, 64, VHOST_ACCESS_WO);
> 1791     	if (ret)
> 1792     		return ret;
> 1793
> 1794     	for (i = 0; i < ret; i++) {
>>>>      CID 1442593:  Control flow issues  (DEADCODE)
>>>>      Execution cannot reach this statement: "ret = log_write_hva(vq, (ui...".
> 1795     		ret = log_write_hva(vq,	(uintptr_t)iov[i].iov_base,
> 1796     				    iov[i].iov_len);
> 1797     		if (ret)
> 1798     			return ret;
> 1799     	}
> 1800


My bad, need check ret < 0 instead.

Will post a fix.

Thanks


^ permalink raw reply

* Re: [PATCH v2 1/2] Provide in-kernel headers for making it easy to extend the kernel
From: Joel Fernandes @ 2019-02-15  3:47 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: netdev, linux-kernel, Andrew Morton, ast, atishp04, dancol,
	Dan Williams, gregkh, Jonathan Corbet, karim.yaghmour, Kees Cook,
	kernel-team, linux-doc, linux-kselftest, Manoj Rao,
	Masahiro Yamada, paulmck, Peter Zijlstra (Intel), rdunlap,
	rostedt, Shuah Khan, Thomas Gleixner, yhs
In-Reply-To: <20190215031926.ljzluy2cfxp64u6o@ast-mbp>

On Thu, Feb 14, 2019 at 07:19:29PM -0800, Alexei Starovoitov wrote:
> On Mon, Feb 11, 2019 at 09:35:59AM -0500, Joel Fernandes (Google) wrote:
> > Introduce in-kernel headers and other artifacts which are made available
> > as an archive through proc (/proc/kheaders.txz file). This archive makes
> > it possible to build kernel modules, run eBPF programs, and other
> > tracing programs that need to extend the kernel for tracing purposes
> > without any dependency on the file system having headers and build
> > artifacts.
> > 
> > On Android and embedded systems, it is common to switch kernels but not
> > have kernel headers available on the file system. Raw kernel headers
> > also cannot be copied into the filesystem like they can be on other
> > distros, due to licensing and other issues. There's no linux-headers
> > package on Android. Further once a different kernel is booted, any
> > headers stored on the file system will no longer be useful. By storing
> > the headers as a compressed archive within the kernel, we can avoid these
> > issues that have been a hindrance for a long time.
> 
> The set looks good to me and since the main use case is building bpf progs
> I can route it via bpf-next tree if there are no objections.
> Masahiro, could you please ack it?
> 

Yes, eBPF is one of the usecases. After this, I am also planning to send
patches to BCC so that it can use this feature when compiling C to eBPF.

Thanks!

 - Joel


^ permalink raw reply

* Re: [PATCH v2] kcm: remove any offset before parsing messages
From: Dominique Martinet @ 2019-02-15  3:31 UTC (permalink / raw)
  To: Tom Herbert
  Cc: David Miller, doronrk, Tom Herbert, Dave Watson,
	Linux Kernel Network Developers, LKML
In-Reply-To: <CALx6S37MZadJ=PaAd+SSv9hxSX9kFTmTUtijPGA39JCx3PYq1Q@mail.gmail.com>

Tom Herbert wrote on Thu, Feb 14, 2019:
> > This second patch[2] (the current thread) now does an extra clone if
> > there is an offset, but the problem really isn't in the clone but the
> > pull itself that can fail and return NULL when there is memory pressure.
> > For some reason I hadn't been able to reproduce that behaviour with
> > strparser doing the pull, but I assume it would also be possible to hit
> > in extreme situations, I'm not sure...
>
> This option looks the best to me, at least as a way to fix the issue
> without requiring a change to the API. If the pull fails, doesn't that
> just mean that the parser fails? Is there some behavior with this
> patch that is not being handled gracefully?

Yes, the parser fails with -ENOMEM ; that is not handled gracefully at
all: from a user point of view, the connection just hangs (recvmsg never
returns), without so much as a message in dmesg either.

It took me a while to figure out what failed exactly as I did indeed
expect strparser/kcm to handle that better, but ultimately as things
stand if the parser fails it calls strp_parser_err() with the error
which ends up in strp_abort_strp that should call
sk->sk_error_report(sk) but in kcm case sk is the csk and I guess
failing csk does not make a pending recv on the kcm sock to fail...

I'm not sure whether propagating the error to the upper socket is the
right thing to do, kcm is meant to be able to work with multiple
underlying sockets so I feel we must be cautious about that, but
strparser might be able to retry somehow.
This is what I said below:
> > [,,,]
> >  - the current patch, that I could only get to fail with KASAN, but does
> > complexify kcm a bit; this also does not fix bpf sockmap at all.
> > Would still require to fix the hang, so make strparser retry a few times
> > if strp->cb.parse_msg failed maybe? Or at least get the error back to
> > userspace somehow.

Thanks,
-- 
Dominique

^ permalink raw reply

* Re: [PATCH v2 1/2] Provide in-kernel headers for making it easy to extend the kernel
From: Alexei Starovoitov @ 2019-02-15  3:19 UTC (permalink / raw)
  To: Joel Fernandes (Google)
  Cc: netdev, linux-kernel, Andrew Morton, ast, atishp04, dancol,
	Dan Williams, gregkh, Jonathan Corbet, karim.yaghmour, Kees Cook,
	kernel-team, linux-doc, linux-kselftest, Manoj Rao,
	Masahiro Yamada, paulmck, Peter Zijlstra (Intel), rdunlap,
	rostedt, Shuah Khan, Thomas Gleixner, yhs
In-Reply-To: <20190211143600.15021-1-joel@joelfernandes.org>

On Mon, Feb 11, 2019 at 09:35:59AM -0500, Joel Fernandes (Google) wrote:
> Introduce in-kernel headers and other artifacts which are made available
> as an archive through proc (/proc/kheaders.txz file). This archive makes
> it possible to build kernel modules, run eBPF programs, and other
> tracing programs that need to extend the kernel for tracing purposes
> without any dependency on the file system having headers and build
> artifacts.
> 
> On Android and embedded systems, it is common to switch kernels but not
> have kernel headers available on the file system. Raw kernel headers
> also cannot be copied into the filesystem like they can be on other
> distros, due to licensing and other issues. There's no linux-headers
> package on Android. Further once a different kernel is booted, any
> headers stored on the file system will no longer be useful. By storing
> the headers as a compressed archive within the kernel, we can avoid these
> issues that have been a hindrance for a long time.

The set looks good to me and since the main use case is building bpf progs
I can route it via bpf-next tree if there are no objections.
Masahiro, could you please ack it?


^ permalink raw reply

* Re: bpf: test_tunnel.sh: BUG: unable to handle kernel NULL pointer dereference
From: Alexei Starovoitov @ 2019-02-15  3:08 UTC (permalink / raw)
  To: Alan Maguire
  Cc: Naresh Kamboju, Netdev, David S. Miller, Valdis Kletnieks,
	Song Liu, Rafael Tinoco, ast, Daniel Borkmann
In-Reply-To: <alpine.LRH.2.20.1902111627170.7885@dhcp-10-175-202-251.vpn.oracle.com>

On Mon, Feb 11, 2019 at 04:39:50PM +0000, Alan Maguire wrote:
> On Fri, 1 Feb 2019, Naresh Kamboju wrote:
> 
> > Kernel panic while running bpf: test_tunnel.sh on linux -next
> > 5.0.0-rc4-next-20190129.
> > 
> > selftests: bpf: test_tunnel.sh
> > [  273.997647] IPv6: ADDRCONF(NETDEV_CHANGE): veth1: link becomes ready
> > [  274.054068] ip (11458) used greatest stack depth: 11448 bytes left
> > [  274.120445] BUG: unable to handle kernel NULL pointer dereference
> > at 0000000000000000
> > [  274.128285] #PF error: [INSTR]
> > [  274.131351] PGD 8000000414a0e067 P4D 8000000414a0e067 PUD 3b6334067 PMD 0
> > [  274.138241] Oops: 0010 [#1] SMP PTI
> > [  274.141734] CPU: 1 PID: 11464 Comm: ping Not tainted
> > 5.0.0-rc4-next-20190129 #1
> > [  274.149046] Hardware name: Supermicro SYS-5019S-ML/X11SSH-F, BIOS
> > 2.0b 07/27/2017
> > [  274.156526] RIP: 0010:          (null)
> > [  274.160280] Code: Bad RIP value.
> > [  274.163509] RSP: 0018:ffffbc9681f83540 EFLAGS: 00010286
> > [  274.168726] RAX: 0000000000000000 RBX: ffffdc967fa80a18 RCX: 0000000000000000
> > [  274.175851] RDX: ffff9db2ee08b540 RSI: 000000000000000e RDI: ffffdc967fa809a0
> > [  274.182974] RBP: ffffbc9681f83580 R08: ffff9db2c4d62690 R09: 000000000000000c
> > [  274.190098] R10: 0000000000000000 R11: ffff9db2ee08b540 R12: ffff9db31ce7c000
> > [  274.197222] R13: 0000000000000001 R14: 000000000000000c R15: ffff9db3179cf400
> > [  274.204346] FS:  00007ff4ae7c5740(0000) GS:ffff9db31fa80000(0000)
> > knlGS:0000000000000000
> > [  274.212424] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [  274.218162] CR2: ffffffffffffffd6 CR3: 00000004574da004 CR4: 00000000003606e0
> > [  274.225292] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > [  274.232416] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > [  274.239541] Call Trace:
> > [  274.241988]  ? tnl_update_pmtu+0x296/0x3b0
> > [  274.246085]  ip_md_tunnel_xmit+0x1bc/0x520
> > [  274.250176]  gre_fb_xmit+0x330/0x390
> > [  274.253754]  gre_tap_xmit+0x128/0x180
> > [  274.257414]  dev_hard_start_xmit+0xb7/0x300
> > [  274.261598]  sch_direct_xmit+0xf6/0x290
> > [  274.265430]  __qdisc_run+0x15d/0x5e0
> > [  274.269007]  __dev_queue_xmit+0x2c5/0xc00
> > [  274.273011]  ? dev_queue_xmit+0x10/0x20
> > [  274.276842]  ? eth_header+0x2b/0xc0
> > [  274.280326]  dev_queue_xmit+0x10/0x20
> > [  274.283984]  ? dev_queue_xmit+0x10/0x20
> > [  274.287813]  arp_xmit+0x1a/0xf0
> > [  274.290952]  arp_send_dst.part.19+0x46/0x60
> > [  274.295138]  arp_solicit+0x177/0x6b0
> > [  274.298708]  ? mod_timer+0x18e/0x440
> > [  274.302281]  neigh_probe+0x57/0x70
> > [  274.305684]  __neigh_event_send+0x197/0x2d0
> > [  274.309862]  neigh_resolve_output+0x18c/0x210
> > [  274.314212]  ip_finish_output2+0x257/0x690
> > [  274.318304]  ip_finish_output+0x219/0x340
> > [  274.322314]  ? ip_finish_output+0x219/0x340
> > [  274.326493]  ip_output+0x76/0x240
> > [  274.329805]  ? ip_fragment.constprop.53+0x80/0x80
> > [  274.334510]  ip_local_out+0x3f/0x70
> > [  274.337992]  ip_send_skb+0x19/0x40
> > [  274.341391]  ip_push_pending_frames+0x33/0x40
> > [  274.345740]  raw_sendmsg+0xc15/0x11d0
> > [  274.349403]  ? __might_fault+0x85/0x90
> > [  274.353151]  ? _copy_from_user+0x6b/0xa0
> > [  274.357070]  ? rw_copy_check_uvector+0x54/0x130
> > [  274.361604]  inet_sendmsg+0x42/0x1c0
> > [  274.365179]  ? inet_sendmsg+0x42/0x1c0
> > [  274.368937]  sock_sendmsg+0x3e/0x50
> > [  274.372460]  ___sys_sendmsg+0x26f/0x2d0
> > [  274.376293]  ? lock_acquire+0x95/0x190
> > [  274.380043]  ? __handle_mm_fault+0x7ce/0xb70
> > [  274.384307]  ? lock_acquire+0x95/0x190
> > [  274.388053]  ? __audit_syscall_entry+0xdd/0x130
> > [  274.392586]  ? ktime_get_coarse_real_ts64+0x64/0xc0
> > [  274.397461]  ? __audit_syscall_entry+0xdd/0x130
> > [  274.401989]  ? trace_hardirqs_on+0x4c/0x100
> > [  274.406173]  __sys_sendmsg+0x63/0xa0
> > [  274.409744]  ? __sys_sendmsg+0x63/0xa0
> > [  274.413488]  __x64_sys_sendmsg+0x1f/0x30
> > [  274.417405]  do_syscall_64+0x55/0x190
> > [  274.421064]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> > [  274.426113] RIP: 0033:0x7ff4ae0e6e87
> > [  274.429686] Code: 64 89 02 48 c7 c0 ff ff ff ff eb b9 0f 1f 80 00
> > 00 00 00 8b 05 ca d9 2b 00 48 63 d2 48 63 ff 85 c0 75 10 b8 2e 00 00
> > 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 53 48 89 f3 48 83 ec 10 48 89 7c
> > 24 08
> > [  274.448422] RSP: 002b:00007ffcd9b76db8 EFLAGS: 00000246 ORIG_RAX:
> > 000000000000002e
> > [  274.455978] RAX: ffffffffffffffda RBX: 0000000000000040 RCX: 00007ff4ae0e6e87
> > [  274.463104] RDX: 0000000000000000 RSI: 00000000006092e0 RDI: 0000000000000003
> > [  274.470228] RBP: 0000000000000000 R08: 00007ffcd9bc40a0 R09: 00007ffcd9bc4080
> > [  274.477349] R10: 000000000000060a R11: 0000000000000246 R12: 0000000000000003
> > [  274.484475] R13: 0000000000000016 R14: 00007ffcd9b77fa0 R15: 00007ffcd9b78da4
> > [  274.491602] Modules linked in: cls_bpf sch_ingress iptable_filter
> > ip_tables algif_hash af_alg x86_pkg_temp_thermal fuse [last unloaded:
> > test_bpf]
> > [  274.504634] CR2: 0000000000000000
> > [  274.507976] ---[ end trace 196d18386545eae1 ]---
> > [  274.512588] RIP: 0010:          (null)
> > [  274.516334] Code: Bad RIP value.
> > [  274.519557] RSP: 0018:ffffbc9681f83540 EFLAGS: 00010286
> > [  274.524775] RAX: 0000000000000000 RBX: ffffdc967fa80a18 RCX: 0000000000000000
> > [  274.531921] RDX: ffff9db2ee08b540 RSI: 000000000000000e RDI: ffffdc967fa809a0
> > [  274.539082] RBP: ffffbc9681f83580 R08: ffff9db2c4d62690 R09: 000000000000000c
> > [  274.546205] R10: 0000000000000000 R11: ffff9db2ee08b540 R12: ffff9db31ce7c000
> > [  274.553329] R13: 0000000000000001 R14: 000000000000000c R15: ffff9db3179cf400
> > [  274.560456] FS:  00007ff4ae7c5740(0000) GS:ffff9db31fa80000(0000)
> > knlGS:0000000000000000
> > [  274.568541] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [  274.574277] CR2: ffffffffffffffd6 CR3: 00000004574da004 CR4: 00000000003606e0
> > [  274.581403] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > [  274.588535] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > [  274.595658] Kernel panic - not syncing: Fatal exception in interrupt
> > [  274.602046] Kernel Offset: 0x14400000 from 0xffffffff81000000
> > (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> > [  274.612827] ---[ end Kernel panic - not syncing: Fatal exception in
> > interrupt ]---
> > [  274.620387] ------------[ cut here ]------------
> > 
> > The above log is from x86_64 device.
> > 
> 
>  
> I'm also seeing the same panic during test_tunnel.sh execution
> on x86_64.
> 
> From poking around it looks like the skb's dst entry is being used
> to calculate the mtu in:
> 
> mtu = skb_dst(skb) ? dst_mtu(skb_dst(skb)) : dev->mtu;
> 
> ...but because that dst_entry  has an "ops" value set to md_dst_ops,
> the various ops (including mtu) are not set:
> 
> crash> struct sk_buff._skb_refdst ffff928f87447700 -x
>       _skb_refdst = 0xffffcd6fbf5ea590
> crash> struct dst_entry.ops 0xffffcd6fbf5ea590
>   ops = 0xffffffffa0193800
> crash> struct dst_ops.mtu 0xffffffffa0193800
>   mtu = 0x0
> crash>
> 
> I confirmed that the dst entry also has dst->input set to
> dst_md_discard, so it looks like it's an entry that's been
> initialized via __metadata_dst_init alright.
> 
> I think the fix here is to use skb_valid_dst(skb) - it checks
> for  DST_METADATA also, and with that fix in place, the
> problem - which was previously 100% reproducible - disappears.

The fix looks good to me.
Could you please resend the patch officially?

> However what we should do in terms of path MTU setting for
> such cases (use ip*_update_pmtu() perhaps?), I'm not sure.
> 
> The below patch resolves the panic and all tunnel tests pass
> without incident.
> 
> Fixes: c8b34e680a09 ("ip_tunnel: Add tnl_update_pmtu in ip_md_tunnel_xmit")
> 
> Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
> ---
>  net/ipv4/ip_tunnel.c | 9 ++++++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
> 
> diff --git a/net/ipv4/ip_tunnel.c b/net/ipv4/ip_tunnel.c
> index 893f013..5dcf50c 100644
> --- a/net/ipv4/ip_tunnel.c
> +++ b/net/ipv4/ip_tunnel.c
> @@ -515,9 +515,10 @@ static int tnl_update_pmtu(struct net_device *dev, struct sk_buff *skb,
>  		mtu = dst_mtu(&rt->dst) - dev->hard_header_len
>  					- sizeof(struct iphdr) - tunnel_hlen;
>  	else
> -		mtu = skb_dst(skb) ? dst_mtu(skb_dst(skb)) : dev->mtu;
> +		mtu = skb_valid_dst(skb) ? dst_mtu(skb_dst(skb)) : dev->mtu;
>  
> -	skb_dst_update_pmtu(skb, mtu);
> +	if (skb_valid_dst(skb))
> +		skb_dst_update_pmtu(skb, mtu);
>  
>  	if (skb->protocol == htons(ETH_P_IP)) {
>  		if (!skb_is_gso(skb) &&
> @@ -530,9 +531,11 @@ static int tnl_update_pmtu(struct net_device *dev, struct sk_buff *skb,
>  	}
>  #if IS_ENABLED(CONFIG_IPV6)
>  	else if (skb->protocol == htons(ETH_P_IPV6)) {
> -		struct rt6_info *rt6 = (struct rt6_info *)skb_dst(skb);
> +		struct rt6_info *rt6;
>  		__be32 daddr;
>  
> +		rt6 = skb_valid_dst(skb) ? (struct rt6_info *)skb_dst(skb) :
> +					   NULL;
>  		daddr = md ? dst : tunnel->parms.iph.daddr;
>  
>  		if (rt6 && mtu < dst_mtu(skb_dst(skb)) &&
> -- 
> 1.8.3.1
> 

^ permalink raw reply

* Re: Question on ptr_ring linux header
From: fei phung @ 2019-02-15  3:03 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: feiphung, netdev
In-Reply-To: <20190201102519-mutt-send-email-mst@kernel.org>

Hi Michael,

> If I had to guess I'd say the way you play with indices is probably racy
> so you are producing an invalid index.

You are probably right.

I am suspecting item_recv_push_index and item_send_push_index in
https://gist.github.com/promach/7716ee8addcaa33fda140d74d1ad94d6#file-riffa_driver_ptr_ring-c-L254-L404
causing data race (illegal index) for consume() function

note that this interrupt handler could be interrupted again by another
hardware interrupt

I do not think we can use lock within interrupt handler, right ?

Code:
~~~

struct item item_recv_push[CIRC_BUFF_SIZE];
struct item item_send_push[CIRC_BUFF_SIZE];

unsigned int item_send_push_index;
unsigned int item_recv_push_index;

///////////////////////////////////////////////////////
// INTERRUPT HANDLER
///////////////////////////////////////////////////////

/**
* Reads the interrupt vector and processes it. If processing VECT0, off will
* be 0. If processing VECT1, off will be 6.
*/
static inline void process_intr_vector(struct fpga_state * sc, int off,
unsigned int vect)
{
// VECT_0/VECT_1 are organized from right to left (LSB to MSB) as:
// [ 0] TX_TXN for channel 0 in VECT_0, channel 6 in VECT_1
// [ 1] TX_SG_BUF_RECVD for channel 0 in VECT_0, channel 6 in VECT_1
// [ 2] TX_TXN_DONE for channel 0 in VECT_0, channel 6 in VECT_1
// [ 3] RX_SG_BUF_RECVD for channel 0 in VECT_0, channel 6 in VECT_1
// [ 4] RX_TXN_DONE for channel 0 in VECT_0, channel 6 in VECT_1
// ...
// [25] TX_TXN for channel 5 in VECT_0, channel 11 in VECT_1
// [26] TX_SG_BUF_RECVD for channel 5 in VECT_0, channel 11 in VECT_1
// [27] TX_TXN_DONE for channel 5 in VECT_0, channel 11 in VECT_1
// [28] RX_SG_BUF_RECVD for channel 5 in VECT_0, channel 11 in VECT_1
// [29] RX_TXN_DONE for channel 5 in VECT_0, channel 11 in VECT_1
// Positions 30 - 31 in both VECT_0 and VECT_1 are zero.

unsigned int offlast;
unsigned int len;
int recv;
int send;
int chnl;
int i;

offlast = 0;
len = 0;

//printk(KERN_INFO "riffa: intrpt_handler received:%08x\n", vect);
if (vect & 0xC0000000) {
printk(KERN_ERR "riffa: fpga:%d, received bad interrupt
vector:%08x\n", sc->id, vect);
return;
}

for (i = 0; i < 6 && (i+off) < sc->num_chnls; ++i) {
chnl = i + off;
recv = 0;
send = 0;

// TX (PC receive) scatter gather buffer is read.
if (vect & (1<<((5*i)+1))) {
recv = 1;

item_recv_push[item_recv_push_index].val1 = EVENT_SG_BUF_READ;
item_recv_push[item_recv_push_index].val2 = 0;

// Keep track so the thread can handle this.
if (ptr_ring_produce_any(sc->recv[chnl]->msgs,
&item_recv_push[item_recv_push_index])) {
printk(KERN_ERR "riffa: fpga:%d chnl:%d, recv sg buf read msg queue
full\n", sc->id, chnl);
}
DEBUG_MSG(KERN_INFO "riffa: fpga:%d chnl:%d, recv sg buf read\n", sc->id, chnl);

item_recv_push_index++;
}

// TX (PC receive) transaction done.
if (vect & (1<<((5*i)+2))) {
recv = 1;

item_recv_push[item_recv_push_index].val1 = EVENT_TXN_DONE;
item_recv_push[item_recv_push_index].val2 = len;

// Read the transferred amount.
len = read_reg(sc, CHNL_REG(chnl, TX_TNFR_LEN_REG_OFF));
// Notify the thread.
if (ptr_ring_produce_any(sc->recv[chnl]->msgs,
&item_recv_push[item_recv_push_index])) {
printk(KERN_ERR "riffa: fpga:%d chnl:%d, recv txn done msg queue
full\n", sc->id, chnl);
}
DEBUG_MSG(KERN_INFO "riffa: fpga:%d chnl:%d, recv txn done\n", sc->id, chnl);

item_recv_push_index++;
}

// New TX (PC receive) transaction.
if (vect & (1<<((5*i)+0))) {
recv = 1;
recv_sg_buf_populated = 0; // resets for new transaction

// Read the offset/last and length
offlast = read_reg(sc, CHNL_REG(chnl, TX_OFFLAST_REG_OFF));
tx_len = read_reg(sc, CHNL_REG(chnl, TX_LEN_REG_OFF));

item_recv_push[item_recv_push_index].val1 = EVENT_TXN_OFFLAST;
item_recv_push[item_recv_push_index].val2 = offlast;

// Keep track of this transaction
if (ptr_ring_produce_any(sc->recv[chnl]->msgs,
&item_recv_push[item_recv_push_index])) {
printk(KERN_ERR "riffa: fpga:%d chnl:%d, recv txn offlast msg queue
full\n", sc->id, chnl);
}
/*if (push_circ_queue(sc->recv[chnl]->msgs, EVENT_TXN_LEN, len)) {
printk(KERN_ERR "riffa: fpga:%d chnl:%d, recv txn len msg queue
full\n", sc->id, chnl);
}*/
DEBUG_MSG(KERN_INFO "riffa: fpga:%d chnl:%d, recv txn (len:%d off:%d
last:%d)\n", sc->id, chnl, tx_len, (offlast>>1), (offlast & 0x1));

item_recv_push_index++;
}

// RX (PC send) scatter gather buffer is read.
if (vect & (1<<((5*i)+3))) {
send = 1;

item_send_push[item_send_push_index].val1 = EVENT_SG_BUF_READ;
item_send_push[item_send_push_index].val2 = 0;

// Keep track so the thread can handle this.
if (ptr_ring_produce_any(sc->send[chnl]->msgs,
&item_send_push[item_send_push_index])) {
printk(KERN_ERR "riffa: fpga:%d chnl:%d, send sg buf read msg queue
full\n", sc->id, chnl);
}
DEBUG_MSG(KERN_INFO "riffa: fpga:%d chnl:%d, send sg buf read\n", sc->id, chnl);

item_send_push_index++;
}

// RX (PC send) transaction done.
if (vect & (1<<((5*i)+4))) {
send = 1;

item_send_push[item_send_push_index].val1 = EVENT_TXN_DONE;
item_send_push[item_send_push_index].val2 = len;

// Read the transferred amount.
len = read_reg(sc, CHNL_REG(chnl, RX_TNFR_LEN_REG_OFF));
// Notify the thread.
if (ptr_ring_produce_any(sc->send[chnl]->msgs,
&item_send_push[item_send_push_index])) {
printk(KERN_ERR "riffa: fpga:%d chnl:%d, send txn done msg queue
full\n", sc->id, chnl);
}
DEBUG_MSG(KERN_INFO "riffa: fpga:%d chnl:%d, send txn done\n", sc->id, chnl);

item_send_push_index++;
}

// Wake up the thread?
if (recv)
wake_up(&sc->recv[chnl]->waitq);
if (send)
wake_up(&sc->send[chnl]->waitq);
}
}

~~~

Regards,
Cheng Fei

^ permalink raw reply

* Re: r8169 Driver - Poor Network Performance Since Kernel 4.19
From: David Chang @ 2019-02-15  2:51 UTC (permalink / raw)
  To: Heiner Kallweit; +Cc: Realtek linux nic maintainers, netdev, Martti Laaksonen
In-Reply-To: <47a0819f-5ec3-6d73-210e-235d6bbcaab1@gmail.com>

Hi Heiner,

On Feb 14, 2019 at 07:17:44 +0100, Heiner Kallweit wrote:
> Hi David,
> 
> On 14.02.2019 03:45, David Chang wrote:
> > Hi Heiner,
> > 
> > On Feb 05, 2019 at 19:50:30 +0100, Heiner Kallweit wrote:
> >> Hi David,
> >>
> >> meanwhile there's the following bug report matching what reported.
> >> It's even the same chip version (RTL8168h).
> >> https://bugzilla.redhat.com/show_bug.cgi?id=1671958
> >>
> >> Symptom there is also a significant number of rx_missed packets.
> >> Could you try what I mentioned there last:
> >> Try building a kernel with the call to rtl_hw_aspm_clkreq_enable(tp, true) at the
> >> end of rtl_hw_start_8168h_1() being disabled.
> > 
> > After disabled the aspm function that you mentioned, we finally got the
> > positive testing result. And the rx_missed error was gone. If without
> > the patch, the receive side get back to bad performance.
> > 
> Good to know, thanks. I also checked with Realtek, they confirmed that their Windows
> driver uses some heuristics to disable ASPM under high load. So it seems like there
> is some hw issue. Open so far is whether this affects certain chip versions only.
> Let's see whether they can provide more information.

Ok!

> Disabling ASPM in general would hurt notebook users because based on some past
> measurements we know ASPM can significantly save energy.

I understand, thanks!

regards,
David
> 
> > kernel: r8169: loading out-of-tree module taints kernel.
> > kernel: r8169: module verification failed: signature and/or required key missing - tainting kernel
> > kernel: libphy: r8169: probed
> > kernel: r8169 0000:01:00.0 eth0: RTL8168h/8111h, ec:8e:b5:5a:2c:f5, XID 54100880, IRQ 128
> > kernel: r8169 0000:01:00.0 eth0: jumbo features [frames: 9200 bytes, tx checksumming: ko]
> > kernel: r8169 0000:01:00.0 enp1s0: renamed from eth0
> > kernel: Generic PHY r8169-100:00: attached PHY driver [Generic PHY] (mii_bus:phy_addr=r8169-100:00, irq=IGNORE)
> > kernel: r8169 0000:01:00.0 enp1s0: Link is Up - 1Gbps/Full - flow control off
> > 
> > NIC statistics:
> >      tx_packets: 1653804
> >      rx_packets: 1555966
> >      tx_errors: 0
> >      rx_errors: 0
> >      rx_missed: 0
> >      align_errors: 0
> >      tx_single_collisions: 0
> >      tx_multi_collisions: 0
> >      unicast: 1555884
> >      broadcast: 78
> >      multicast: 4
> >      tx_aborted: 0
> >      tx_underrun: 0
> > 
> > iperf receive:
> > -----------------------------------------------------------
> > Server listening on 5201
> > -----------------------------------------------------------
> > Accepted connection from 10.x.x.x, port 55516
> > [  5] local 10.x.x.x port 5201 connected to 10.x.x.x port 58172
> > [ ID] Interval           Transfer     Bitrate
> > [  5]   0.00-1.00   sec   108 MBytes   906 Mbits/sec
> > [  5]   1.00-2.00   sec   112 MBytes   941 Mbits/sec
> > [  5]   2.00-3.00   sec   112 MBytes   940 Mbits/sec
> > [  5]   3.00-4.00   sec   112 MBytes   941 Mbits/sec
> > [  5]   4.00-5.00   sec   112 MBytes   941 Mbits/sec
> > [  5]   5.00-6.00   sec   112 MBytes   942 Mbits/sec
> > [  5]   6.00-7.00   sec   112 MBytes   939 Mbits/sec
> > [  5]   7.00-8.00   sec   112 MBytes   941 Mbits/sec
> > [  5]   8.00-9.00   sec   112 MBytes   938 Mbits/sec
> > [  5]   9.00-10.00  sec   112 MBytes   941 Mbits/sec
> > [  5]  10.00-11.00  sec   112 MBytes   941 Mbits/sec
> > [...]
> > [  5]  50.00-51.00  sec   112 MBytes   941 Mbits/sec
> > [  5]  51.00-52.00  sec   112 MBytes   941 Mbits/sec
> > [  5]  52.00-53.00  sec   112 MBytes   942 Mbits/sec
> > [  5]  53.00-54.00  sec   112 MBytes   941 Mbits/sec
> > [  5]  54.00-55.00  sec   111 MBytes   934 Mbits/sec
> > [  5]  55.00-56.00  sec   112 MBytes   942 Mbits/sec
> > [  5]  56.00-57.00  sec   112 MBytes   937 Mbits/sec
> > [  5]  57.00-58.00  sec   112 MBytes   941 Mbits/sec
> > [  5]  58.00-59.00  sec   111 MBytes   932 Mbits/sec
> > [  5]  59.00-60.00  sec   112 MBytes   942 Mbits/sec
> > [  5]  60.00-60.04  sec  4.06 MBytes   939 Mbits/sec
> > - - - - - - - - - - - - - - - - - - - - - - - - -
> > [ ID] Interval           Transfer     Bitrate
> > [  5]   0.00-60.04  sec  6.57 GBytes   940 Mbits/sec                  receiver
> > 
> > regards,
> > David
> > 
> Heiner
> 

^ permalink raw reply

* Re: [PATCH v2] kcm: remove any offset before parsing messages
From: Tom Herbert @ 2019-02-15  2:48 UTC (permalink / raw)
  To: Dominique Martinet
  Cc: David Miller, doronrk, Tom Herbert, Dave Watson,
	Linux Kernel Network Developers, LKML
In-Reply-To: <20190215015705.GA17974@nautica>

On Thu, Feb 14, 2019 at 5:57 PM Dominique Martinet
<asmadeus@codewreck.org> wrote:
>
> Tom Herbert wrote on Thu, Feb 14, 2019:
> > > The best alternative I see is adding a proper helper to get
> > > "kcm_rx_msg(skb)->offset" from bpf and document it so users aren't as
> > > lost as I have been; I'm not quite sure how/where to add such a helper
> > > though as I've barely looked at the bpf code until now, but should we go
> > > for that?
> >
> > I'd rather not complicate the bpf code for this. Can we just always do
> > an pskb_pull after skb_clone?
>
> Which skb_clone are you thinking of?
> If you're referring to the one in strparser, that was my very first
> approach here[1], but Dordon shot it down saying that this is not an
> strparser bug but a kcm bug since there are ways for users to properly
> get the offset and use it -- and the ktls code does it right.
>
> Frankly, my opinion still is that it'd be better in strparser because
> there also is some bad use in bpf sockmap (they never look at the offset
> either, while kcm uses it for rcv but not for parse), but looking at it
> from an optimal performance point of view I agree the user can make
> better decision than strparser so I understand where he comes from.
>
>
> This second patch[2] (the current thread) now does an extra clone if
> there is an offset, but the problem really isn't in the clone but the
> pull itself that can fail and return NULL when there is memory pressure.
> For some reason I hadn't been able to reproduce that behaviour with
> strparser doing the pull, but I assume it would also be possible to hit
> in extreme situations, I'm not sure...
>
This option looks the best to me, at least as a way to fix the issue
without requiring a change to the API. If the pull fails, doesn't that
just mean that the parser fails? Is there some behavior with this
patch that is not being handled gracefully?

Thanks,
Tom

> So anyway, we basically have three choices that I can see:
>  - push harder on strparser and go back to my first patch ; it's simple
> and makes using strparser easier/safer but has a small overhead for
> ktls, which uses the current strparser implementation correctly.
> I hadn't been able to get this to fail with KASAN last time I tried but
> I assume it should still be possible somehow.
>
>  - the current patch, that I could only get to fail with KASAN, but does
> complexify kcm a bit; this also does not fix bpf sockmap at all.
> Would still require to fix the hang, so make strparser retry a few times
> if strp->cb.parse_msg failed maybe? Or at least get the error back to
> userspace somehow.
>
>  - my last suggestion to document the kcm behaviour, and if possible add
> a bpf helper to make proper usage easier ; this will make kcm harder to
> use on end users but it's better than not being documented and seeing
> random unappropriate lengths being parsed.
>
>
>
> [1] first strparser patch
> http://lkml.kernel.org/r/1534855906-22870-1-git-send-email-asmadeus@codewreck.org
> [2] current thread's patch
> http://lkml.kernel.org/r/1536657703-27577-1-git-send-email-asmadeus@codewreck.org
>
>
> Thanks,
> --
> Dominique

^ permalink raw reply

* [PATCH AUTOSEL 4.20 41/77] libceph: avoid KEEPALIVE_PENDING races in ceph_con_keepalive()
From: Sasha Levin @ 2019-02-15  2:08 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Ilya Dryomov, Sasha Levin, ceph-devel, netdev
In-Reply-To: <20190215020855.176727-1-sashal@kernel.org>

From: Ilya Dryomov <idryomov@gmail.com>

[ Upstream commit 4aac9228d16458cedcfd90c7fb37211cf3653ac3 ]

con_fault() can transition the connection into STANDBY right after
ceph_con_keepalive() clears STANDBY in clear_standby():

    libceph user thread               ceph-msgr worker

ceph_con_keepalive()
  mutex_lock(&con->mutex)
  clear_standby(con)
  mutex_unlock(&con->mutex)
                                mutex_lock(&con->mutex)
                                con_fault()
                                  ...
                                  if KEEPALIVE_PENDING isn't set
                                    set state to STANDBY
                                  ...
                                mutex_unlock(&con->mutex)
  set KEEPALIVE_PENDING
  set WRITE_PENDING

This triggers warnings in clear_standby() when either ceph_con_send()
or ceph_con_keepalive() get to clearing STANDBY next time.

I don't see a reason to condition queue_con() call on the previous
value of KEEPALIVE_PENDING, so move the setting of KEEPALIVE_PENDING
into the critical section -- unlike WRITE_PENDING, KEEPALIVE_PENDING
could have been a non-atomic flag.

Reported-by: syzbot+acdeb633f6211ccdf886@syzkaller.appspotmail.com
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Tested-by: Myungho Jung <mhjungk@gmail.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 net/ceph/messenger.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/net/ceph/messenger.c b/net/ceph/messenger.c
index 2f126eff275d..664f886f464d 100644
--- a/net/ceph/messenger.c
+++ b/net/ceph/messenger.c
@@ -3219,9 +3219,10 @@ void ceph_con_keepalive(struct ceph_connection *con)
 	dout("con_keepalive %p\n", con);
 	mutex_lock(&con->mutex);
 	clear_standby(con);
+	con_flag_set(con, CON_FLAG_KEEPALIVE_PENDING);
 	mutex_unlock(&con->mutex);
-	if (con_flag_test_and_set(con, CON_FLAG_KEEPALIVE_PENDING) == 0 &&
-	    con_flag_test_and_set(con, CON_FLAG_WRITE_PENDING) == 0)
+
+	if (con_flag_test_and_set(con, CON_FLAG_WRITE_PENDING) == 0)
 		queue_con(con);
 }
 EXPORT_SYMBOL(ceph_con_keepalive);
-- 
2.19.1


^ permalink raw reply related

* [PATCH AUTOSEL 4.20 54/77] net: altera_tse: fix connect_local_phy error path
From: Sasha Levin @ 2019-02-15  2:08 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Atsushi Nemoto, David S . Miller, Sasha Levin, netdev
In-Reply-To: <20190215020855.176727-1-sashal@kernel.org>

From: Atsushi Nemoto <atsushi.nemoto@sord.co.jp>

[ Upstream commit 17b42a20d7ca59377788c6a2409e77569570cc10 ]

The connect_local_phy should return NULL (not negative errno) on
error, since its caller expects it.

Signed-off-by: Atsushi Nemoto <atsushi.nemoto@sord.co.jp>
Acked-by: Thor Thayer <thor.thayer@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/net/ethernet/altera/altera_tse_main.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/altera/altera_tse_main.c b/drivers/net/ethernet/altera/altera_tse_main.c
index 02921d877c08..aa1d1f5339d2 100644
--- a/drivers/net/ethernet/altera/altera_tse_main.c
+++ b/drivers/net/ethernet/altera/altera_tse_main.c
@@ -714,8 +714,10 @@ static struct phy_device *connect_local_phy(struct net_device *dev)
 
 		phydev = phy_connect(dev, phy_id_fmt, &altera_tse_adjust_link,
 				     priv->phy_iface);
-		if (IS_ERR(phydev))
+		if (IS_ERR(phydev)) {
 			netdev_err(dev, "Could not attach to PHY\n");
+			phydev = NULL;
+		}
 
 	} else {
 		int ret;
-- 
2.19.1


^ permalink raw reply related

* [PATCH AUTOSEL 4.20 56/77] hv_netvsc: Refactor assignments of struct netvsc_device_info
From: Sasha Levin @ 2019-02-15  2:08 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Haiyang Zhang, Sasha Levin, devel, netdev
In-Reply-To: <20190215020855.176727-1-sashal@kernel.org>

From: Haiyang Zhang <haiyangz@microsoft.com>

[ Upstream commit 7c9f335a3ff20557a92584199f3d35c7e992bbe5 ]

These assignments occur in multiple places. The patch refactor them
to a function for simplicity. It also puts the struct to heap area
for future expension.

Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
[sl: fix up subject line]
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/net/hyperv/netvsc_drv.c | 134 ++++++++++++++++++++------------
 1 file changed, 85 insertions(+), 49 deletions(-)

diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index cf36e7ff3191..4055e12fce9a 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -857,6 +857,36 @@ static void netvsc_get_channels(struct net_device *net,
 	}
 }
 
+/* Alloc struct netvsc_device_info, and initialize it from either existing
+ * struct netvsc_device, or from default values.
+ */
+static struct netvsc_device_info *netvsc_devinfo_get
+			(struct netvsc_device *nvdev)
+{
+	struct netvsc_device_info *dev_info;
+
+	dev_info = kzalloc(sizeof(*dev_info), GFP_ATOMIC);
+
+	if (!dev_info)
+		return NULL;
+
+	if (nvdev) {
+		dev_info->num_chn = nvdev->num_chn;
+		dev_info->send_sections = nvdev->send_section_cnt;
+		dev_info->send_section_size = nvdev->send_section_size;
+		dev_info->recv_sections = nvdev->recv_section_cnt;
+		dev_info->recv_section_size = nvdev->recv_section_size;
+	} else {
+		dev_info->num_chn = VRSS_CHANNEL_DEFAULT;
+		dev_info->send_sections = NETVSC_DEFAULT_TX;
+		dev_info->send_section_size = NETVSC_SEND_SECTION_SIZE;
+		dev_info->recv_sections = NETVSC_DEFAULT_RX;
+		dev_info->recv_section_size = NETVSC_RECV_SECTION_SIZE;
+	}
+
+	return dev_info;
+}
+
 static int netvsc_detach(struct net_device *ndev,
 			 struct netvsc_device *nvdev)
 {
@@ -942,7 +972,7 @@ static int netvsc_set_channels(struct net_device *net,
 	struct net_device_context *net_device_ctx = netdev_priv(net);
 	struct netvsc_device *nvdev = rtnl_dereference(net_device_ctx->nvdev);
 	unsigned int orig, count = channels->combined_count;
-	struct netvsc_device_info device_info;
+	struct netvsc_device_info *device_info;
 	int ret;
 
 	/* We do not support separate count for rx, tx, or other */
@@ -961,24 +991,26 @@ static int netvsc_set_channels(struct net_device *net,
 
 	orig = nvdev->num_chn;
 
-	memset(&device_info, 0, sizeof(device_info));
-	device_info.num_chn = count;
-	device_info.send_sections = nvdev->send_section_cnt;
-	device_info.send_section_size = nvdev->send_section_size;
-	device_info.recv_sections = nvdev->recv_section_cnt;
-	device_info.recv_section_size = nvdev->recv_section_size;
+	device_info = netvsc_devinfo_get(nvdev);
+
+	if (!device_info)
+		return -ENOMEM;
+
+	device_info->num_chn = count;
 
 	ret = netvsc_detach(net, nvdev);
 	if (ret)
-		return ret;
+		goto out;
 
-	ret = netvsc_attach(net, &device_info);
+	ret = netvsc_attach(net, device_info);
 	if (ret) {
-		device_info.num_chn = orig;
-		if (netvsc_attach(net, &device_info))
+		device_info->num_chn = orig;
+		if (netvsc_attach(net, device_info))
 			netdev_err(net, "restoring channel setting failed\n");
 	}
 
+out:
+	kfree(device_info);
 	return ret;
 }
 
@@ -1047,48 +1079,45 @@ static int netvsc_change_mtu(struct net_device *ndev, int mtu)
 	struct net_device *vf_netdev = rtnl_dereference(ndevctx->vf_netdev);
 	struct netvsc_device *nvdev = rtnl_dereference(ndevctx->nvdev);
 	int orig_mtu = ndev->mtu;
-	struct netvsc_device_info device_info;
+	struct netvsc_device_info *device_info;
 	int ret = 0;
 
 	if (!nvdev || nvdev->destroy)
 		return -ENODEV;
 
+	device_info = netvsc_devinfo_get(nvdev);
+
+	if (!device_info)
+		return -ENOMEM;
+
 	/* Change MTU of underlying VF netdev first. */
 	if (vf_netdev) {
 		ret = dev_set_mtu(vf_netdev, mtu);
 		if (ret)
-			return ret;
+			goto out;
 	}
 
-	memset(&device_info, 0, sizeof(device_info));
-	device_info.num_chn = nvdev->num_chn;
-	device_info.send_sections = nvdev->send_section_cnt;
-	device_info.send_section_size = nvdev->send_section_size;
-	device_info.recv_sections = nvdev->recv_section_cnt;
-	device_info.recv_section_size = nvdev->recv_section_size;
-
 	ret = netvsc_detach(ndev, nvdev);
 	if (ret)
 		goto rollback_vf;
 
 	ndev->mtu = mtu;
 
-	ret = netvsc_attach(ndev, &device_info);
-	if (ret)
-		goto rollback;
-
-	return 0;
+	ret = netvsc_attach(ndev, device_info);
+	if (!ret)
+		goto out;
 
-rollback:
 	/* Attempt rollback to original MTU */
 	ndev->mtu = orig_mtu;
 
-	if (netvsc_attach(ndev, &device_info))
+	if (netvsc_attach(ndev, device_info))
 		netdev_err(ndev, "restoring mtu failed\n");
 rollback_vf:
 	if (vf_netdev)
 		dev_set_mtu(vf_netdev, orig_mtu);
 
+out:
+	kfree(device_info);
 	return ret;
 }
 
@@ -1673,7 +1702,7 @@ static int netvsc_set_ringparam(struct net_device *ndev,
 {
 	struct net_device_context *ndevctx = netdev_priv(ndev);
 	struct netvsc_device *nvdev = rtnl_dereference(ndevctx->nvdev);
-	struct netvsc_device_info device_info;
+	struct netvsc_device_info *device_info;
 	struct ethtool_ringparam orig;
 	u32 new_tx, new_rx;
 	int ret = 0;
@@ -1693,26 +1722,29 @@ static int netvsc_set_ringparam(struct net_device *ndev,
 	    new_rx == orig.rx_pending)
 		return 0;	 /* no change */
 
-	memset(&device_info, 0, sizeof(device_info));
-	device_info.num_chn = nvdev->num_chn;
-	device_info.send_sections = new_tx;
-	device_info.send_section_size = nvdev->send_section_size;
-	device_info.recv_sections = new_rx;
-	device_info.recv_section_size = nvdev->recv_section_size;
+	device_info = netvsc_devinfo_get(nvdev);
+
+	if (!device_info)
+		return -ENOMEM;
+
+	device_info->send_sections = new_tx;
+	device_info->recv_sections = new_rx;
 
 	ret = netvsc_detach(ndev, nvdev);
 	if (ret)
-		return ret;
+		goto out;
 
-	ret = netvsc_attach(ndev, &device_info);
+	ret = netvsc_attach(ndev, device_info);
 	if (ret) {
-		device_info.send_sections = orig.tx_pending;
-		device_info.recv_sections = orig.rx_pending;
+		device_info->send_sections = orig.tx_pending;
+		device_info->recv_sections = orig.rx_pending;
 
-		if (netvsc_attach(ndev, &device_info))
+		if (netvsc_attach(ndev, device_info))
 			netdev_err(ndev, "restoring ringparam failed");
 	}
 
+out:
+	kfree(device_info);
 	return ret;
 }
 
@@ -2166,7 +2198,7 @@ static int netvsc_probe(struct hv_device *dev,
 {
 	struct net_device *net = NULL;
 	struct net_device_context *net_device_ctx;
-	struct netvsc_device_info device_info;
+	struct netvsc_device_info *device_info = NULL;
 	struct netvsc_device *nvdev;
 	int ret = -ENOMEM;
 
@@ -2213,21 +2245,21 @@ static int netvsc_probe(struct hv_device *dev,
 	netif_set_real_num_rx_queues(net, 1);
 
 	/* Notify the netvsc driver of the new device */
-	memset(&device_info, 0, sizeof(device_info));
-	device_info.num_chn = VRSS_CHANNEL_DEFAULT;
-	device_info.send_sections = NETVSC_DEFAULT_TX;
-	device_info.send_section_size = NETVSC_SEND_SECTION_SIZE;
-	device_info.recv_sections = NETVSC_DEFAULT_RX;
-	device_info.recv_section_size = NETVSC_RECV_SECTION_SIZE;
-
-	nvdev = rndis_filter_device_add(dev, &device_info);
+	device_info = netvsc_devinfo_get(NULL);
+
+	if (!device_info) {
+		ret = -ENOMEM;
+		goto devinfo_failed;
+	}
+
+	nvdev = rndis_filter_device_add(dev, device_info);
 	if (IS_ERR(nvdev)) {
 		ret = PTR_ERR(nvdev);
 		netdev_err(net, "unable to add netvsc device (ret %d)\n", ret);
 		goto rndis_failed;
 	}
 
-	memcpy(net->dev_addr, device_info.mac_adr, ETH_ALEN);
+	memcpy(net->dev_addr, device_info->mac_adr, ETH_ALEN);
 
 	/* We must get rtnl lock before scheduling nvdev->subchan_work,
 	 * otherwise netvsc_subchan_work() can get rtnl lock first and wait
@@ -2265,12 +2297,16 @@ static int netvsc_probe(struct hv_device *dev,
 
 	list_add(&net_device_ctx->list, &netvsc_dev_list);
 	rtnl_unlock();
+
+	kfree(device_info);
 	return 0;
 
 register_failed:
 	rtnl_unlock();
 	rndis_filter_device_remove(dev, nvdev);
 rndis_failed:
+	kfree(device_info);
+devinfo_failed:
 	free_percpu(net_device_ctx->vf_stats);
 no_stats:
 	hv_set_drvdata(dev, NULL);
-- 
2.19.1


^ permalink raw reply related

* [PATCH AUTOSEL 4.20 55/77] hv_netvsc: Fix ethtool change hash key error
From: Sasha Levin @ 2019-02-15  2:08 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Haiyang Zhang, Sasha Levin, devel, netdev
In-Reply-To: <20190215020855.176727-1-sashal@kernel.org>

From: Haiyang Zhang <haiyangz@microsoft.com>

[ Upstream commit b4a10c750424e01b5e37372fef0a574ebf7b56c3 ]

Hyper-V hosts require us to disable RSS before changing RSS key,
otherwise the changing request will fail. This patch fixes the
coding error.

Fixes: ff4a44199012 ("netvsc: allow get/set of RSS indirection table")
Reported-by: Wei Hu <weh@microsoft.com>
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
[sl: fix up subject line]
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/net/hyperv/rndis_filter.c | 25 +++++++++++++++++++------
 1 file changed, 19 insertions(+), 6 deletions(-)

diff --git a/drivers/net/hyperv/rndis_filter.c b/drivers/net/hyperv/rndis_filter.c
index 8b537a049c1e..a4661d396e3c 100644
--- a/drivers/net/hyperv/rndis_filter.c
+++ b/drivers/net/hyperv/rndis_filter.c
@@ -774,8 +774,8 @@ rndis_filter_set_offload_params(struct net_device *ndev,
 	return ret;
 }
 
-int rndis_filter_set_rss_param(struct rndis_device *rdev,
-			       const u8 *rss_key)
+static int rndis_set_rss_param_msg(struct rndis_device *rdev,
+				   const u8 *rss_key, u16 flag)
 {
 	struct net_device *ndev = rdev->ndev;
 	struct rndis_request *request;
@@ -804,7 +804,7 @@ int rndis_filter_set_rss_param(struct rndis_device *rdev,
 	rssp->hdr.type = NDIS_OBJECT_TYPE_RSS_PARAMETERS;
 	rssp->hdr.rev = NDIS_RECEIVE_SCALE_PARAMETERS_REVISION_2;
 	rssp->hdr.size = sizeof(struct ndis_recv_scale_param);
-	rssp->flag = 0;
+	rssp->flag = flag;
 	rssp->hashinfo = NDIS_HASH_FUNC_TOEPLITZ | NDIS_HASH_IPV4 |
 			 NDIS_HASH_TCP_IPV4 | NDIS_HASH_IPV6 |
 			 NDIS_HASH_TCP_IPV6;
@@ -829,9 +829,12 @@ int rndis_filter_set_rss_param(struct rndis_device *rdev,
 
 	wait_for_completion(&request->wait_event);
 	set_complete = &request->response_msg.msg.set_complete;
-	if (set_complete->status == RNDIS_STATUS_SUCCESS)
-		memcpy(rdev->rss_key, rss_key, NETVSC_HASH_KEYLEN);
-	else {
+	if (set_complete->status == RNDIS_STATUS_SUCCESS) {
+		if (!(flag & NDIS_RSS_PARAM_FLAG_DISABLE_RSS) &&
+		    !(flag & NDIS_RSS_PARAM_FLAG_HASH_KEY_UNCHANGED))
+			memcpy(rdev->rss_key, rss_key, NETVSC_HASH_KEYLEN);
+
+	} else {
 		netdev_err(ndev, "Fail to set RSS parameters:0x%x\n",
 			   set_complete->status);
 		ret = -EINVAL;
@@ -842,6 +845,16 @@ int rndis_filter_set_rss_param(struct rndis_device *rdev,
 	return ret;
 }
 
+int rndis_filter_set_rss_param(struct rndis_device *rdev,
+			       const u8 *rss_key)
+{
+	/* Disable RSS before change */
+	rndis_set_rss_param_msg(rdev, rss_key,
+				NDIS_RSS_PARAM_FLAG_DISABLE_RSS);
+
+	return rndis_set_rss_param_msg(rdev, rss_key, 0);
+}
+
 static int rndis_filter_query_device_link_status(struct rndis_device *dev,
 						 struct netvsc_device *net_device)
 {
-- 
2.19.1


^ permalink raw reply related

* [PATCH AUTOSEL 4.20 57/77] hv_netvsc: Fix hash key value reset after other ops
From: Sasha Levin @ 2019-02-15  2:08 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Haiyang Zhang, Sasha Levin, devel, netdev
In-Reply-To: <20190215020855.176727-1-sashal@kernel.org>

From: Haiyang Zhang <haiyangz@microsoft.com>

[ Upstream commit 17d91256898402daf4425cc541ac9cbf64574d9a ]

Changing mtu, channels, or buffer sizes ops call to netvsc_attach(),
rndis_set_subchannel(), which always reset the hash key to default
value. That will override hash key changed previously. This patch
fixes the problem by save the hash key, then restore it when we re-
add the netvsc device.

Fixes: ff4a44199012 ("netvsc: allow get/set of RSS indirection table")
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
[sl: fix up subject line]
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/net/hyperv/hyperv_net.h   | 10 +++++++---
 drivers/net/hyperv/netvsc.c       |  2 +-
 drivers/net/hyperv/netvsc_drv.c   |  5 ++++-
 drivers/net/hyperv/rndis_filter.c |  9 +++++++--
 4 files changed, 19 insertions(+), 7 deletions(-)

diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
index ef6f766f6389..e598a684700b 100644
--- a/drivers/net/hyperv/hyperv_net.h
+++ b/drivers/net/hyperv/hyperv_net.h
@@ -144,6 +144,8 @@ struct hv_netvsc_packet {
 	u32 total_data_buflen;
 };
 
+#define NETVSC_HASH_KEYLEN 40
+
 struct netvsc_device_info {
 	unsigned char mac_adr[ETH_ALEN];
 	u32  num_chn;
@@ -151,6 +153,8 @@ struct netvsc_device_info {
 	u32  recv_sections;
 	u32  send_section_size;
 	u32  recv_section_size;
+
+	u8 rss_key[NETVSC_HASH_KEYLEN];
 };
 
 enum rndis_device_state {
@@ -160,8 +164,6 @@ enum rndis_device_state {
 	RNDIS_DEV_DATAINITIALIZED,
 };
 
-#define NETVSC_HASH_KEYLEN 40
-
 struct rndis_device {
 	struct net_device *ndev;
 
@@ -209,7 +211,9 @@ int netvsc_recv_callback(struct net_device *net,
 void netvsc_channel_cb(void *context);
 int netvsc_poll(struct napi_struct *napi, int budget);
 
-int rndis_set_subchannel(struct net_device *ndev, struct netvsc_device *nvdev);
+int rndis_set_subchannel(struct net_device *ndev,
+			 struct netvsc_device *nvdev,
+			 struct netvsc_device_info *dev_info);
 int rndis_filter_open(struct netvsc_device *nvdev);
 int rndis_filter_close(struct netvsc_device *nvdev);
 struct netvsc_device *rndis_filter_device_add(struct hv_device *dev,
diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index 922054c1d544..1910810e55bd 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -84,7 +84,7 @@ static void netvsc_subchan_work(struct work_struct *w)
 
 	rdev = nvdev->extension;
 	if (rdev) {
-		ret = rndis_set_subchannel(rdev->ndev, nvdev);
+		ret = rndis_set_subchannel(rdev->ndev, nvdev, NULL);
 		if (ret == 0) {
 			netif_device_attach(rdev->ndev);
 		} else {
diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index 4055e12fce9a..80d9297ad9d9 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -876,6 +876,9 @@ static struct netvsc_device_info *netvsc_devinfo_get
 		dev_info->send_section_size = nvdev->send_section_size;
 		dev_info->recv_sections = nvdev->recv_section_cnt;
 		dev_info->recv_section_size = nvdev->recv_section_size;
+
+		memcpy(dev_info->rss_key, nvdev->extension->rss_key,
+		       NETVSC_HASH_KEYLEN);
 	} else {
 		dev_info->num_chn = VRSS_CHANNEL_DEFAULT;
 		dev_info->send_sections = NETVSC_DEFAULT_TX;
@@ -938,7 +941,7 @@ static int netvsc_attach(struct net_device *ndev,
 		return PTR_ERR(nvdev);
 
 	if (nvdev->num_chn > 1) {
-		ret = rndis_set_subchannel(ndev, nvdev);
+		ret = rndis_set_subchannel(ndev, nvdev, dev_info);
 
 		/* if unavailable, just proceed with one queue */
 		if (ret) {
diff --git a/drivers/net/hyperv/rndis_filter.c b/drivers/net/hyperv/rndis_filter.c
index a4661d396e3c..db81378e6624 100644
--- a/drivers/net/hyperv/rndis_filter.c
+++ b/drivers/net/hyperv/rndis_filter.c
@@ -1134,7 +1134,9 @@ static void netvsc_sc_open(struct vmbus_channel *new_sc)
  * This breaks overlap of processing the host message for the
  * new primary channel with the initialization of sub-channels.
  */
-int rndis_set_subchannel(struct net_device *ndev, struct netvsc_device *nvdev)
+int rndis_set_subchannel(struct net_device *ndev,
+			 struct netvsc_device *nvdev,
+			 struct netvsc_device_info *dev_info)
 {
 	struct nvsp_message *init_packet = &nvdev->channel_init_pkt;
 	struct net_device_context *ndev_ctx = netdev_priv(ndev);
@@ -1175,7 +1177,10 @@ int rndis_set_subchannel(struct net_device *ndev, struct netvsc_device *nvdev)
 		   atomic_read(&nvdev->open_chn) == nvdev->num_chn);
 
 	/* ignore failues from setting rss parameters, still have channels */
-	rndis_filter_set_rss_param(rdev, netvsc_hash_key);
+	if (dev_info)
+		rndis_filter_set_rss_param(rdev, dev_info->rss_key);
+	else
+		rndis_filter_set_rss_param(rdev, netvsc_hash_key);
 
 	netif_set_real_num_tx_queues(ndev, nvdev->num_chn);
 	netif_set_real_num_rx_queues(ndev, nvdev->num_chn);
-- 
2.19.1


^ permalink raw reply related

* [PATCH AUTOSEL 4.20 58/77] sfc: suppress duplicate nvmem partition types in efx_ef10_mtd_probe
From: Sasha Levin @ 2019-02-15  2:08 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Edward Cree, David S . Miller, Sasha Levin, netdev
In-Reply-To: <20190215020855.176727-1-sashal@kernel.org>

From: Edward Cree <ecree@solarflare.com>

[ Upstream commit 3366463513f544c12c6b88c13da4462ee9e7a1a1 ]

Use a bitmap to keep track of which partition types we've already seen;
 for duplicates, return -EEXIST from efx_ef10_mtd_probe_partition() and
 thus skip adding that partition.
Duplicate partitions occur because of the A/B backup scheme used by newer
 sfc NICs.  Prior to this patch they cause sysfs_warn_dup errors because
 they have the same name, causing us not to expose any MTDs at all.

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/net/ethernet/sfc/ef10.c | 29 +++++++++++++++++++++--------
 1 file changed, 21 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/sfc/ef10.c b/drivers/net/ethernet/sfc/ef10.c
index 7eeac3d6cfe8..a497aace7e4f 100644
--- a/drivers/net/ethernet/sfc/ef10.c
+++ b/drivers/net/ethernet/sfc/ef10.c
@@ -6042,22 +6042,25 @@ static const struct efx_ef10_nvram_type_info efx_ef10_nvram_types[] = {
 	{ NVRAM_PARTITION_TYPE_LICENSE,		   0,    0, "sfc_license" },
 	{ NVRAM_PARTITION_TYPE_PHY_MIN,		   0xff, 0, "sfc_phy_fw" },
 };
+#define EF10_NVRAM_PARTITION_COUNT	ARRAY_SIZE(efx_ef10_nvram_types)
 
 static int efx_ef10_mtd_probe_partition(struct efx_nic *efx,
 					struct efx_mcdi_mtd_partition *part,
-					unsigned int type)
+					unsigned int type,
+					unsigned long *found)
 {
 	MCDI_DECLARE_BUF(inbuf, MC_CMD_NVRAM_METADATA_IN_LEN);
 	MCDI_DECLARE_BUF(outbuf, MC_CMD_NVRAM_METADATA_OUT_LENMAX);
 	const struct efx_ef10_nvram_type_info *info;
 	size_t size, erase_size, outlen;
+	int type_idx = 0;
 	bool protected;
 	int rc;
 
-	for (info = efx_ef10_nvram_types; ; info++) {
-		if (info ==
-		    efx_ef10_nvram_types + ARRAY_SIZE(efx_ef10_nvram_types))
+	for (type_idx = 0; ; type_idx++) {
+		if (type_idx == EF10_NVRAM_PARTITION_COUNT)
 			return -ENODEV;
+		info = efx_ef10_nvram_types + type_idx;
 		if ((type & ~info->type_mask) == info->type)
 			break;
 	}
@@ -6070,6 +6073,13 @@ static int efx_ef10_mtd_probe_partition(struct efx_nic *efx,
 	if (protected)
 		return -ENODEV; /* hide it */
 
+	/* If we've already exposed a partition of this type, hide this
+	 * duplicate.  All operations on MTDs are keyed by the type anyway,
+	 * so we can't act on the duplicate.
+	 */
+	if (__test_and_set_bit(type_idx, found))
+		return -EEXIST;
+
 	part->nvram_type = type;
 
 	MCDI_SET_DWORD(inbuf, NVRAM_METADATA_IN_TYPE, type);
@@ -6098,6 +6108,7 @@ static int efx_ef10_mtd_probe_partition(struct efx_nic *efx,
 static int efx_ef10_mtd_probe(struct efx_nic *efx)
 {
 	MCDI_DECLARE_BUF(outbuf, MC_CMD_NVRAM_PARTITIONS_OUT_LENMAX);
+	DECLARE_BITMAP(found, EF10_NVRAM_PARTITION_COUNT);
 	struct efx_mcdi_mtd_partition *parts;
 	size_t outlen, n_parts_total, i, n_parts;
 	unsigned int type;
@@ -6126,11 +6137,13 @@ static int efx_ef10_mtd_probe(struct efx_nic *efx)
 	for (i = 0; i < n_parts_total; i++) {
 		type = MCDI_ARRAY_DWORD(outbuf, NVRAM_PARTITIONS_OUT_TYPE_ID,
 					i);
-		rc = efx_ef10_mtd_probe_partition(efx, &parts[n_parts], type);
-		if (rc == 0)
-			n_parts++;
-		else if (rc != -ENODEV)
+		rc = efx_ef10_mtd_probe_partition(efx, &parts[n_parts], type,
+						  found);
+		if (rc == -EEXIST || rc == -ENODEV)
+			continue;
+		if (rc)
 			goto fail;
+		n_parts++;
 	}
 
 	rc = efx_mtd_add(efx, &parts[0].common, n_parts, sizeof(*parts));
-- 
2.19.1


^ permalink raw reply related

* [PATCH AUTOSEL 4.20 59/77] ax25: fix possible use-after-free
From: Sasha Levin @ 2019-02-15  2:08 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Eric Dumazet, Ralf Baechle, David S . Miller, Sasha Levin,
	linux-hams, netdev
In-Reply-To: <20190215020855.176727-1-sashal@kernel.org>

From: Eric Dumazet <edumazet@google.com>

[ Upstream commit 63530aba7826a0f8e129874df9c4d264f9db3f9e ]

syzbot found that ax25 routes where not properly protected
against concurrent use [1].

In this particular report the bug happened while
copying ax25->digipeat.

Fix this problem by making sure we call ax25_get_route()
while ax25_route_lock is held, so that no modification
could happen while using the route.

The current two ax25_get_route() callers do not sleep,
so this change should be fine.

Once we do that, ax25_get_route() no longer needs to
grab a reference on the found route.

[1]
ax25_connect(): syz-executor0 uses autobind, please contact jreuter@yaina.de
BUG: KASAN: use-after-free in memcpy include/linux/string.h:352 [inline]
BUG: KASAN: use-after-free in kmemdup+0x42/0x60 mm/util.c:113
Read of size 66 at addr ffff888066641a80 by task syz-executor2/531

ax25_connect(): syz-executor0 uses autobind, please contact jreuter@yaina.de
CPU: 1 PID: 531 Comm: syz-executor2 Not tainted 5.0.0-rc2+ #10
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x1db/0x2d0 lib/dump_stack.c:113
 print_address_description.cold+0x7c/0x20d mm/kasan/report.c:187
 kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
 check_memory_region_inline mm/kasan/generic.c:185 [inline]
 check_memory_region+0x123/0x190 mm/kasan/generic.c:191
 memcpy+0x24/0x50 mm/kasan/common.c:130
 memcpy include/linux/string.h:352 [inline]
 kmemdup+0x42/0x60 mm/util.c:113
 kmemdup include/linux/string.h:425 [inline]
 ax25_rt_autobind+0x25d/0x750 net/ax25/ax25_route.c:424
 ax25_connect.cold+0x30/0xa4 net/ax25/af_ax25.c:1224
 __sys_connect+0x357/0x490 net/socket.c:1664
 __do_sys_connect net/socket.c:1675 [inline]
 __se_sys_connect net/socket.c:1672 [inline]
 __x64_sys_connect+0x73/0xb0 net/socket.c:1672
 do_syscall_64+0x1a3/0x800 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x458099
Code: 6d b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 3b b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f870ee22c78 EFLAGS: 00000246 ORIG_RAX: 000000000000002a
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000000458099
RDX: 0000000000000048 RSI: 0000000020000080 RDI: 0000000000000005
RBP: 000000000073bf00 R08: 0000000000000000 R09: 0000000000000000
ax25_connect(): syz-executor4 uses autobind, please contact jreuter@yaina.de
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f870ee236d4
R13: 00000000004be48e R14: 00000000004ce9a8 R15: 00000000ffffffff

Allocated by task 526:
 save_stack+0x45/0xd0 mm/kasan/common.c:73
 set_track mm/kasan/common.c:85 [inline]
 __kasan_kmalloc mm/kasan/common.c:496 [inline]
 __kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:469
 kasan_kmalloc+0x9/0x10 mm/kasan/common.c:504
ax25_connect(): syz-executor5 uses autobind, please contact jreuter@yaina.de
 kmem_cache_alloc_trace+0x151/0x760 mm/slab.c:3609
 kmalloc include/linux/slab.h:545 [inline]
 ax25_rt_add net/ax25/ax25_route.c:95 [inline]
 ax25_rt_ioctl+0x3b9/0x1270 net/ax25/ax25_route.c:233
 ax25_ioctl+0x322/0x10b0 net/ax25/af_ax25.c:1763
 sock_do_ioctl+0xe2/0x400 net/socket.c:950
 sock_ioctl+0x32f/0x6c0 net/socket.c:1074
 vfs_ioctl fs/ioctl.c:46 [inline]
 file_ioctl fs/ioctl.c:509 [inline]
 do_vfs_ioctl+0x107b/0x17d0 fs/ioctl.c:696
 ksys_ioctl+0xab/0xd0 fs/ioctl.c:713
 __do_sys_ioctl fs/ioctl.c:720 [inline]
 __se_sys_ioctl fs/ioctl.c:718 [inline]
 __x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
 do_syscall_64+0x1a3/0x800 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

ax25_connect(): syz-executor5 uses autobind, please contact jreuter@yaina.de
Freed by task 550:
 save_stack+0x45/0xd0 mm/kasan/common.c:73
 set_track mm/kasan/common.c:85 [inline]
 __kasan_slab_free+0x102/0x150 mm/kasan/common.c:458
 kasan_slab_free+0xe/0x10 mm/kasan/common.c:466
 __cache_free mm/slab.c:3487 [inline]
 kfree+0xcf/0x230 mm/slab.c:3806
 ax25_rt_add net/ax25/ax25_route.c:92 [inline]
 ax25_rt_ioctl+0x304/0x1270 net/ax25/ax25_route.c:233
 ax25_ioctl+0x322/0x10b0 net/ax25/af_ax25.c:1763
 sock_do_ioctl+0xe2/0x400 net/socket.c:950
 sock_ioctl+0x32f/0x6c0 net/socket.c:1074
 vfs_ioctl fs/ioctl.c:46 [inline]
 file_ioctl fs/ioctl.c:509 [inline]
 do_vfs_ioctl+0x107b/0x17d0 fs/ioctl.c:696
 ksys_ioctl+0xab/0xd0 fs/ioctl.c:713
 __do_sys_ioctl fs/ioctl.c:720 [inline]
 __se_sys_ioctl fs/ioctl.c:718 [inline]
 __x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
 do_syscall_64+0x1a3/0x800 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

The buggy address belongs to the object at ffff888066641a80
 which belongs to the cache kmalloc-96 of size 96
The buggy address is located 0 bytes inside of
 96-byte region [ffff888066641a80, ffff888066641ae0)
The buggy address belongs to the page:
page:ffffea0001999040 count:1 mapcount:0 mapping:ffff88812c3f04c0 index:0x0
flags: 0x1fffc0000000200(slab)
ax25_connect(): syz-executor4 uses autobind, please contact jreuter@yaina.de
raw: 01fffc0000000200 ffffea0001817948 ffffea0002341dc8 ffff88812c3f04c0
raw: 0000000000000000 ffff888066641000 0000000100000020 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff888066641980: fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
 ffff888066641a00: 00 00 00 00 00 00 00 00 02 fc fc fc fc fc fc fc
>ffff888066641a80: fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
                   ^
 ffff888066641b00: fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
 ffff888066641b80: 00 00 00 00 00 00 00 00 00 00 00 00 fc fc fc fc

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 include/net/ax25.h    | 12 ++++++++++++
 net/ax25/ax25_ip.c    |  4 ++--
 net/ax25/ax25_route.c | 19 ++++++++-----------
 3 files changed, 22 insertions(+), 13 deletions(-)

diff --git a/include/net/ax25.h b/include/net/ax25.h
index 3f9aea8087e3..8b7eb46ad72d 100644
--- a/include/net/ax25.h
+++ b/include/net/ax25.h
@@ -201,6 +201,18 @@ static inline void ax25_hold_route(ax25_route *ax25_rt)
 
 void __ax25_put_route(ax25_route *ax25_rt);
 
+extern rwlock_t ax25_route_lock;
+
+static inline void ax25_route_lock_use(void)
+{
+	read_lock(&ax25_route_lock);
+}
+
+static inline void ax25_route_lock_unuse(void)
+{
+	read_unlock(&ax25_route_lock);
+}
+
 static inline void ax25_put_route(ax25_route *ax25_rt)
 {
 	if (refcount_dec_and_test(&ax25_rt->refcount))
diff --git a/net/ax25/ax25_ip.c b/net/ax25/ax25_ip.c
index 70417e9b932d..314bbc8010fb 100644
--- a/net/ax25/ax25_ip.c
+++ b/net/ax25/ax25_ip.c
@@ -114,6 +114,7 @@ netdev_tx_t ax25_ip_xmit(struct sk_buff *skb)
 	dst = (ax25_address *)(bp + 1);
 	src = (ax25_address *)(bp + 8);
 
+	ax25_route_lock_use();
 	route = ax25_get_route(dst, NULL);
 	if (route) {
 		digipeat = route->digipeat;
@@ -206,9 +207,8 @@ netdev_tx_t ax25_ip_xmit(struct sk_buff *skb)
 	ax25_queue_xmit(skb, dev);
 
 put:
-	if (route)
-		ax25_put_route(route);
 
+	ax25_route_lock_unuse();
 	return NETDEV_TX_OK;
 }
 
diff --git a/net/ax25/ax25_route.c b/net/ax25/ax25_route.c
index a0eff323af12..66f74c85cf6b 100644
--- a/net/ax25/ax25_route.c
+++ b/net/ax25/ax25_route.c
@@ -40,7 +40,7 @@
 #include <linux/export.h>
 
 static ax25_route *ax25_route_list;
-static DEFINE_RWLOCK(ax25_route_lock);
+DEFINE_RWLOCK(ax25_route_lock);
 
 void ax25_rt_device_down(struct net_device *dev)
 {
@@ -335,6 +335,7 @@ const struct seq_operations ax25_rt_seqops = {
  *	Find AX.25 route
  *
  *	Only routes with a reference count of zero can be destroyed.
+ *	Must be called with ax25_route_lock read locked.
  */
 ax25_route *ax25_get_route(ax25_address *addr, struct net_device *dev)
 {
@@ -342,7 +343,6 @@ ax25_route *ax25_get_route(ax25_address *addr, struct net_device *dev)
 	ax25_route *ax25_def_rt = NULL;
 	ax25_route *ax25_rt;
 
-	read_lock(&ax25_route_lock);
 	/*
 	 *	Bind to the physical interface we heard them on, or the default
 	 *	route if none is found;
@@ -365,11 +365,6 @@ ax25_route *ax25_get_route(ax25_address *addr, struct net_device *dev)
 	if (ax25_spe_rt != NULL)
 		ax25_rt = ax25_spe_rt;
 
-	if (ax25_rt != NULL)
-		ax25_hold_route(ax25_rt);
-
-	read_unlock(&ax25_route_lock);
-
 	return ax25_rt;
 }
 
@@ -400,9 +395,12 @@ int ax25_rt_autobind(ax25_cb *ax25, ax25_address *addr)
 	ax25_route *ax25_rt;
 	int err = 0;
 
-	if ((ax25_rt = ax25_get_route(addr, NULL)) == NULL)
+	ax25_route_lock_use();
+	ax25_rt = ax25_get_route(addr, NULL);
+	if (!ax25_rt) {
+		ax25_route_lock_unuse();
 		return -EHOSTUNREACH;
-
+	}
 	if ((ax25->ax25_dev = ax25_dev_ax25dev(ax25_rt->dev)) == NULL) {
 		err = -EHOSTUNREACH;
 		goto put;
@@ -437,8 +435,7 @@ int ax25_rt_autobind(ax25_cb *ax25, ax25_address *addr)
 	}
 
 put:
-	ax25_put_route(ax25_rt);
-
+	ax25_route_lock_unuse();
 	return err;
 }
 
-- 
2.19.1


^ permalink raw reply related

* [PATCH AUTOSEL 4.20 67/77] net: usb: asix: ax88772_bind return error when hw_reset fail
From: Sasha Levin @ 2019-02-15  2:08 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Zhang Run, David S . Miller, Sasha Levin, linux-usb, netdev
In-Reply-To: <20190215020855.176727-1-sashal@kernel.org>

From: Zhang Run <zhang.run@zte.com.cn>

[ Upstream commit 6eea3527e68acc22483f4763c8682f223eb90029 ]

The ax88772_bind() should return error code immediately when the PHY
was not reset properly through ax88772a_hw_reset().
Otherwise, The asix_get_phyid() will block when get the PHY
Identifier from the PHYSID1 MII registers through asix_mdio_read()
due to the PHY isn't ready. Furthermore, it will produce a lot of
error message cause system crash.As follows:
asix 1-1:1.0 (unnamed net_device) (uninitialized): Failed to write
 reg index 0x0000: -71
asix 1-1:1.0 (unnamed net_device) (uninitialized): Failed to send
 software reset: ffffffb9
asix 1-1:1.0 (unnamed net_device) (uninitialized): Failed to write
 reg index 0x0000: -71
asix 1-1:1.0 (unnamed net_device) (uninitialized): Failed to enable
 software MII access
asix 1-1:1.0 (unnamed net_device) (uninitialized): Failed to read
 reg index 0x0000: -71
asix 1-1:1.0 (unnamed net_device) (uninitialized): Failed to write
 reg index 0x0000: -71
asix 1-1:1.0 (unnamed net_device) (uninitialized): Failed to enable
 software MII access
asix 1-1:1.0 (unnamed net_device) (uninitialized): Failed to read
 reg index 0x0000: -71
...

Signed-off-by: Zhang Run <zhang.run@zte.com.cn>
Reviewed-by: Yang Wei <yang.wei9@zte.com.cn>
Tested-by: Marcel Ziswiler <marcel.ziswiler@toradex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/net/usb/asix_devices.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/net/usb/asix_devices.c b/drivers/net/usb/asix_devices.c
index b654f05b2ccd..3d93993e74da 100644
--- a/drivers/net/usb/asix_devices.c
+++ b/drivers/net/usb/asix_devices.c
@@ -739,8 +739,13 @@ static int ax88772_bind(struct usbnet *dev, struct usb_interface *intf)
 	asix_read_cmd(dev, AX_CMD_STATMNGSTS_REG, 0, 0, 1, &chipcode, 0);
 	chipcode &= AX_CHIPCODE_MASK;

-	(chipcode == AX_AX88772_CHIPCODE) ? ax88772_hw_reset(dev, 0) :
-					    ax88772a_hw_reset(dev, 0);
+	ret = (chipcode == AX_AX88772_CHIPCODE) ? ax88772_hw_reset(dev, 0) :
+						  ax88772a_hw_reset(dev, 0);
+
+	if (ret < 0) {
+		netdev_dbg(dev->net, "Failed to reset AX88772: %d\n", ret);
+		return ret;
+	}

 	/* Read PHYID register *AFTER* the PHY was reset properly */
 	phyid = asix_get_phyid(dev);
-- 
2.19.1

^ permalink raw reply related

* [PATCH AUTOSEL 4.20 69/77] ibmveth: Do not process frames after calling napi_reschedule
From: Sasha Levin @ 2019-02-15  2:08 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Thomas Falcon, David S . Miller, Sasha Levin, linuxppc-dev,
	netdev
In-Reply-To: <20190215020855.176727-1-sashal@kernel.org>

From: Thomas Falcon <tlfalcon@linux.ibm.com>

[ Upstream commit e95d22c69b2c130ccce257b84daf283fd82d611e ]

The IBM virtual ethernet driver's polling function continues
to process frames after rescheduling NAPI, resulting in a warning
if it exhausted its budget. Do not restart polling after calling
napi_reschedule. Instead let frames be processed in the following
instance.

Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/net/ethernet/ibm/ibmveth.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/net/ethernet/ibm/ibmveth.c b/drivers/net/ethernet/ibm/ibmveth.c
index 098d8764c0ea..dd71d5db7274 100644
--- a/drivers/net/ethernet/ibm/ibmveth.c
+++ b/drivers/net/ethernet/ibm/ibmveth.c
@@ -1313,7 +1313,6 @@ static int ibmveth_poll(struct napi_struct *napi, int budget)
 	unsigned long lpar_rc;
 	u16 mss = 0;
 
-restart_poll:
 	while (frames_processed < budget) {
 		if (!ibmveth_rxq_pending_buffer(adapter))
 			break;
@@ -1401,7 +1400,6 @@ static int ibmveth_poll(struct napi_struct *napi, int budget)
 		    napi_reschedule(napi)) {
 			lpar_rc = h_vio_signal(adapter->vdev->unit_address,
 					       VIO_IRQ_DISABLE);
-			goto restart_poll;
 		}
 	}
 
-- 
2.19.1


^ permalink raw reply related

* [PATCH AUTOSEL 4.20 70/77] mac80211: don't initiate TDLS connection if station is not associated to AP
From: Sasha Levin @ 2019-02-15  2:08 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Balaji Pothunoori, Johannes Berg, Sasha Levin, linux-wireless,
	netdev
In-Reply-To: <20190215020855.176727-1-sashal@kernel.org>

From: Balaji Pothunoori <bpothuno@codeaurora.org>

[ Upstream commit 7ed5285396c257fd4070b1e29e7b2341aae2a1ce ]

Following call trace is observed while adding TDLS peer entry in driver
during TDLS setup.

Call Trace:
[<c1301476>] dump_stack+0x47/0x61
[<c10537d2>] __warn+0xe2/0x100
[<fa22415f>] ? sta_apply_parameters+0x49f/0x550 [mac80211]
[<c1053895>] warn_slowpath_null+0x25/0x30
[<fa22415f>] sta_apply_parameters+0x49f/0x550 [mac80211]
[<fa20ad42>] ? sta_info_alloc+0x1c2/0x450 [mac80211]
[<fa224623>] ieee80211_add_station+0xe3/0x160 [mac80211]
[<c1876fe3>] nl80211_new_station+0x273/0x420
[<c170f6d9>] genl_rcv_msg+0x219/0x3c0
[<c170f4c0>] ? genl_rcv+0x30/0x30
[<c170ee7e>] netlink_rcv_skb+0x8e/0xb0
[<c170f4ac>] genl_rcv+0x1c/0x30
[<c170e8aa>] netlink_unicast+0x13a/0x1d0
[<c170ec18>] netlink_sendmsg+0x2d8/0x390
[<c16c5acd>] sock_sendmsg+0x2d/0x40
[<c16c6369>] ___sys_sendmsg+0x1d9/0x1e0

Fixing this by allowing TDLS setup request only when we have completed
association.

Signed-off-by: Balaji Pothunoori <bpothuno@codeaurora.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 net/mac80211/cfg.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/net/mac80211/cfg.c b/net/mac80211/cfg.c
index 818aa0060349..04337e6be7f0 100644
--- a/net/mac80211/cfg.c
+++ b/net/mac80211/cfg.c
@@ -1490,6 +1490,10 @@ static int ieee80211_add_station(struct wiphy *wiphy, struct net_device *dev,
 	if (params->sta_flags_set & BIT(NL80211_STA_FLAG_TDLS_PEER))
 		sta->sta.tdls = true;
 
+	if (sta->sta.tdls && sdata->vif.type == NL80211_IFTYPE_STATION &&
+	    !sdata->u.mgd.associated)
+		return -EINVAL;
+
 	err = sta_apply_parameters(local, sta, params);
 	if (err) {
 		sta_info_free(local, sta);
-- 
2.19.1


^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox