Netdev List


* Re: [RFC 03/14] net: hstats: add basic/core functionality
From: David Ahern @ 2019-01-30  4:18 UTC (permalink / raw)
  To: Jakub Kicinski, davem
  Cc: oss-drivers, netdev, jiri, f.fainelli, andrew, mkubecek,
	simon.horman, jesse.brandeburg, maciejromanfijalkowski,
	vasundhara-v.volam, michael.chan, shalomt, idosch
In-Reply-To: <20190128234507.32028-4-jakub.kicinski@netronome.com>

On 1/28/19 4:44 PM, Jakub Kicinski wrote:
> @@ -4946,6 +4964,9 @@ static size_t if_nlmsg_stats_size(const struct net_device *dev,
>  		rcu_read_unlock();
>  	}
>  
> +	if (stats_attr_valid(filter_mask, IFLA_STATS_LINK_HSTATS, 0))

filter_mask is populated by the RTEXT_FILTER_* flags from
include/uapi/linux/rtnetlink.h

> +		size += rtnl_get_link_hstats_size(dev);

rtnl_get_link_hstats_size() == __rtnl_get_link_hstats(), which can return < 0.

> +
>  	return size;
>  }
>  
> 
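
A minimal userspace sketch of why a negative return matters here; it is
not the kernel code. hstats_size_stub(), add_unchecked() and add_checked()
are hypothetical names, and the stub only stands in for a size helper that
returns either a byte count or a negative errno. Folding a negative int
into a size_t silently subtracts from the total (or wraps past zero), so
the sign has to be checked before the addition.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a size helper that may fail with -errno. */
int hstats_size_stub(int fail)
{
	return fail ? -EINVAL : 64;
}

/* Unchecked: the negative value is converted to size_t, so the sum is
 * computed modulo SIZE_MAX + 1 and the error is lost.
 */
size_t add_unchecked(size_t size, int ret)
{
	return size + (size_t)ret;
}

/* Checked: propagate the error instead of folding it into the size. */
int add_checked(size_t *size, int ret)
{
	assert(size != NULL);
	if (ret < 0)
		return ret;
	*size += (size_t)ret;
	return 0;
}
```

With the checked variant the caller sees the error and the accumulated
size is left untouched.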


^ permalink raw reply

* Re: [PATCH bpf-next 0/4] bpf: fixes for lockdep and deadlock
From: Alexei Starovoitov @ 2019-01-30  4:07 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: David S. Miller, Daniel Borkmann, Peter Zijlstra, Eric Dumazet,
	Jann Horn, Network Development, Kernel Team
In-Reply-To: <20190130040458.2544340-1-ast@kernel.org>

On Tue, Jan 29, 2019 at 8:06 PM Alexei Starovoitov <ast@kernel.org> wrote:
>
> In addition to preempt_disable patch for socket filters
> https://patchwork.ozlabs.org/patch/1032437/
> the first three patches fix various lockdep false positives.
> Last patch fixes potential deadlock in stackmap access from
> tracing bpf prog and from syscall.

Typo in subject.
All patches are for 'bpf' tree.

^ permalink raw reply

* [PATCH bpf-next 3/4] bpf: fix lockdep false positive in bpf_prog_register
From: Alexei Starovoitov @ 2019-01-30  4:04 UTC (permalink / raw)
  To: davem; +Cc: daniel, peterz, edumazet, jannh, netdev, kernel-team
In-Reply-To: <20190130040458.2544340-1-ast@kernel.org>

Lockdep reports a false positive:
[   13.007000] WARNING: possible circular locking dependency detected
[   13.007587] 5.0.0-rc3-00018-g2fa53f892422-dirty #477 Not tainted
[   13.008124] ------------------------------------------------------
[   13.008624] test_progs/246 is trying to acquire lock:
[   13.009030] 0000000094160d1d (tracepoints_mutex){+.+.}, at: tracepoint_probe_register_prio+0x2d/0x300
[   13.009770]
[   13.009770] but task is already holding lock:
[   13.010239] 00000000d663ef86 (bpf_event_mutex){+.+.}, at: bpf_probe_register+0x1d/0x60
[   13.010877]
[   13.010877] which lock already depends on the new lock.
[   13.010877]
[   13.011532]
[   13.011532] the existing dependency chain (in reverse order) is:
[   13.012129]
[   13.012129] -> #4 (bpf_event_mutex){+.+.}:
[   13.012582]        perf_event_query_prog_array+0x9b/0x130
[   13.013016]        _perf_ioctl+0x3aa/0x830
[   13.013354]        perf_ioctl+0x2e/0x50
[   13.013668]        do_vfs_ioctl+0x8f/0x6a0
[   13.014003]        ksys_ioctl+0x70/0x80
[   13.014320]        __x64_sys_ioctl+0x16/0x20
[   13.014668]        do_syscall_64+0x4a/0x180
[   13.015007]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
[   13.015469]
[   13.015469] -> #3 (&cpuctx_mutex){+.+.}:
[   13.015910]        perf_event_init_cpu+0x5a/0x90
[   13.016291]        perf_event_init+0x1b2/0x1de
[   13.016654]        start_kernel+0x2b8/0x42a
[   13.016995]        secondary_startup_64+0xa4/0xb0
[   13.017382]
[   13.017382] -> #2 (pmus_lock){+.+.}:
[   13.017794]        perf_event_init_cpu+0x21/0x90
[   13.018172]        cpuhp_invoke_callback+0xb3/0x960
[   13.018573]        _cpu_up+0xa7/0x140
[   13.018871]        do_cpu_up+0xa4/0xc0
[   13.019178]        smp_init+0xcd/0xd2
[   13.019483]        kernel_init_freeable+0x123/0x24f
[   13.019878]        kernel_init+0xa/0x110
[   13.020201]        ret_from_fork+0x24/0x30
[   13.020541]
[   13.020541] -> #1 (cpu_hotplug_lock.rw_sem){++++}:
[   13.021051]        static_key_slow_inc+0xe/0x20
[   13.021424]        tracepoint_probe_register_prio+0x28c/0x300
[   13.021891]        perf_trace_event_init+0x11f/0x250
[   13.022297]        perf_trace_init+0x6b/0xa0
[   13.022644]        perf_tp_event_init+0x25/0x40
[   13.023011]        perf_try_init_event+0x6b/0x90
[   13.023386]        perf_event_alloc+0x9a8/0xc40
[   13.023754]        __do_sys_perf_event_open+0x1dd/0xd30
[   13.024173]        do_syscall_64+0x4a/0x180
[   13.024519]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
[   13.024968]
[   13.024968] -> #0 (tracepoints_mutex){+.+.}:
[   13.025434]        __mutex_lock+0x86/0x970
[   13.025764]        tracepoint_probe_register_prio+0x2d/0x300
[   13.026215]        bpf_probe_register+0x40/0x60
[   13.026584]        bpf_raw_tracepoint_open.isra.34+0xa4/0x130
[   13.027042]        __do_sys_bpf+0x94f/0x1a90
[   13.027389]        do_syscall_64+0x4a/0x180
[   13.027727]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
[   13.028171]
[   13.028171] other info that might help us debug this:
[   13.028171]
[   13.028807] Chain exists of:
[   13.028807]   tracepoints_mutex --> &cpuctx_mutex --> bpf_event_mutex
[   13.028807]
[   13.029666]  Possible unsafe locking scenario:
[   13.029666]
[   13.030140]        CPU0                    CPU1
[   13.030510]        ----                    ----
[   13.030875]   lock(bpf_event_mutex);
[   13.031166]                                lock(&cpuctx_mutex);
[   13.031645]                                lock(bpf_event_mutex);
[   13.032135]   lock(tracepoints_mutex);
[   13.032441]
[   13.032441]  *** DEADLOCK ***
[   13.032441]
[   13.032911] 1 lock held by test_progs/246:
[   13.033239]  #0: 00000000d663ef86 (bpf_event_mutex){+.+.}, at: bpf_probe_register+0x1d/0x60
[   13.033909]
[   13.033909] stack backtrace:
[   13.034258] CPU: 1 PID: 246 Comm: test_progs Not tainted 5.0.0-rc3-00018-g2fa53f892422-dirty #477
[   13.034964] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014
[   13.035657] Call Trace:
[   13.035859]  dump_stack+0x5f/0x8b
[   13.036130]  print_circular_bug.isra.37+0x1ce/0x1db
[   13.036526]  __lock_acquire+0x1158/0x1350
[   13.036852]  ? lock_acquire+0x98/0x190
[   13.037154]  lock_acquire+0x98/0x190
[   13.037447]  ? tracepoint_probe_register_prio+0x2d/0x300
[   13.037876]  __mutex_lock+0x86/0x970
[   13.038167]  ? tracepoint_probe_register_prio+0x2d/0x300
[   13.038600]  ? tracepoint_probe_register_prio+0x2d/0x300
[   13.039028]  ? __mutex_lock+0x86/0x970
[   13.039337]  ? __mutex_lock+0x24a/0x970
[   13.039649]  ? bpf_probe_register+0x1d/0x60
[   13.039992]  ? __bpf_trace_sched_wake_idle_without_ipi+0x10/0x10
[   13.040478]  ? tracepoint_probe_register_prio+0x2d/0x300
[   13.040906]  tracepoint_probe_register_prio+0x2d/0x300
[   13.041325]  bpf_probe_register+0x40/0x60
[   13.041649]  bpf_raw_tracepoint_open.isra.34+0xa4/0x130
[   13.042068]  ? __might_fault+0x3e/0x90
[   13.042374]  __do_sys_bpf+0x94f/0x1a90
[   13.042678]  do_syscall_64+0x4a/0x180
[   13.042975]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[   13.043382] RIP: 0033:0x7f23b10a07f9
[   13.045155] RSP: 002b:00007ffdef42fdd8 EFLAGS: 00000202 ORIG_RAX: 0000000000000141
[   13.045759] RAX: ffffffffffffffda RBX: 00007ffdef42ff70 RCX: 00007f23b10a07f9
[   13.046326] RDX: 0000000000000070 RSI: 00007ffdef42fe10 RDI: 0000000000000011
[   13.046893] RBP: 00007ffdef42fdf0 R08: 0000000000000038 R09: 00007ffdef42fe10
[   13.047462] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000000
[   13.048029] R13: 0000000000000016 R14: 00007f23b1db4690 R15: 0000000000000000

Lockdep seems to be confusing different mutexes.
Such a deadlock is not possible.
Since tracepoints_mutex is taken in tracepoint_probe_register/unregister(),
there is no need to take bpf_event_mutex as well.
bpf_event_mutex protects modifications to the prog array used by kprobe/perf bpf progs.
bpf_raw_tracepoints don't need to take this mutex.

Fixes: c4f6699dfcb8 ("bpf: introduce BPF_RAW_TRACEPOINT")
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
 kernel/trace/bpf_trace.c | 14 ++------------
 1 file changed, 2 insertions(+), 12 deletions(-)

diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 8b068adb9da1..f1a86a0d881d 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -1204,22 +1204,12 @@ static int __bpf_probe_register(struct bpf_raw_event_map *btp, struct bpf_prog *
 
 int bpf_probe_register(struct bpf_raw_event_map *btp, struct bpf_prog *prog)
 {
-	int err;
-
-	mutex_lock(&bpf_event_mutex);
-	err = __bpf_probe_register(btp, prog);
-	mutex_unlock(&bpf_event_mutex);
-	return err;
+	return __bpf_probe_register(btp, prog);
 }
 
 int bpf_probe_unregister(struct bpf_raw_event_map *btp, struct bpf_prog *prog)
 {
-	int err;
-
-	mutex_lock(&bpf_event_mutex);
-	err = tracepoint_probe_unregister(btp->tp, (void *)btp->bpf_func, prog);
-	mutex_unlock(&bpf_event_mutex);
-	return err;
+	return tracepoint_probe_unregister(btp->tp, (void *)btp->bpf_func, prog);
 }
 
 int bpf_get_perf_event_info(const struct perf_event *event, u32 *prog_id,
-- 
2.20.0


^ permalink raw reply related

* [PATCH bpf-next 4/4] bpf: Fix syscall's stackmap lookup potential deadlock
From: Alexei Starovoitov @ 2019-01-30  4:04 UTC (permalink / raw)
  To: davem; +Cc: daniel, peterz, edumazet, jannh, netdev, kernel-team
In-Reply-To: <20190130040458.2544340-1-ast@kernel.org>

From: Martin KaFai Lau <kafai@fb.com>

map_lookup_elem used to not acquire a spinlock,
in order to optimize the reader.

That was true until commit 557c0c6e7df8 ("bpf: convert stackmap to pre-allocation").
The syscall's map_lookup_elem(stackmap) calls bpf_stackmap_copy().
bpf_stackmap_copy() may find the elem no longer needed after the copy is done.
If that is the case, pcpu_freelist_push() saves this elem for reuse later.
This push requires a spinlock.

If a tracing bpf_prog runs in the middle of the syscall's
map_lookup_elem(stackmap) and calls bpf_get_stackid(stackmap),
which also requires the same pcpu_freelist spinlock,
it may end up in a deadlock, as reported by
Eric Dumazet in https://patchwork.ozlabs.org/patch/1030266/

The situation is the same as the syscall's map_update_elem() which
needs to acquire the pcpu_freelist's spinlock and could race
with tracing bpf_prog.  Hence, this patch fixes it by protecting
bpf_stackmap_copy() with this_cpu_inc(bpf_prog_active)
to prevent tracing bpf_prog from running.

A later commit, f1a2e44a3aec ("bpf: add queue and stack maps"), makes the
syscall's map_lookup_elem also acquire a spinlock and race with a tracing
bpf_prog in the same way.
Hence, this patch is forward looking and protects the majority
of the map lookups.  bpf_map_offload_lookup_elem() is the exception
since it is for network bpf_prog only (i.e. never called by tracing
bpf_prog).

Fixes: 557c0c6e7df8 ("bpf: convert stackmap to pre-allocation")
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
 kernel/bpf/syscall.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index b155cd17c1bd..8577bb7f8be6 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -713,8 +713,13 @@ static int map_lookup_elem(union bpf_attr *attr)

 	if (bpf_map_is_dev_bound(map)) {
 		err = bpf_map_offload_lookup_elem(map, key, value);
-	} else if (map->map_type == BPF_MAP_TYPE_PERCPU_HASH ||
-		   map->map_type == BPF_MAP_TYPE_LRU_PERCPU_HASH) {
+		goto done;
+	}
+
+	preempt_disable();
+	this_cpu_inc(bpf_prog_active);
+	if (map->map_type == BPF_MAP_TYPE_PERCPU_HASH ||
+	    map->map_type == BPF_MAP_TYPE_LRU_PERCPU_HASH) {
 		err = bpf_percpu_hash_copy(map, key, value);
 	} else if (map->map_type == BPF_MAP_TYPE_PERCPU_ARRAY) {
 		err = bpf_percpu_array_copy(map, key, value);
@@ -744,7 +749,10 @@ static int map_lookup_elem(union bpf_attr *attr)
 		}
 		rcu_read_unlock();
 	}
+	this_cpu_dec(bpf_prog_active);
+	preempt_enable();

+done:
 	if (err)
 		goto free_value;

-- 
2.20.0
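
A minimal single-CPU userspace model of the guard described in this
patch; it is a sketch, not the kernel implementation. All names below
(prog_active, freelist_lock(), tracing_prog_run(), syscall_lookup()) are
hypothetical. The counter plays the role of the per-CPU bpf_prog_active:
the syscall path bumps it before touching the freelist lock, and the
tracing path only runs when its own increment brings the counter to 1.

```c
#include <assert.h>
#include <stdbool.h>

static int prog_active;		/* models the per-CPU bpf_prog_active */
static bool freelist_locked;	/* models the pcpu_freelist spinlock */
static int deadlocks;		/* nested lock attempts seen so far */

static void freelist_lock(void)
{
	if (freelist_locked)
		deadlocks++;	/* a real spinlock would spin forever here */
	freelist_locked = true;
}

static void freelist_unlock(void)
{
	freelist_locked = false;
}

/* Models a tracing prog that needs the freelist lock. It is skipped
 * when another bpf user is already active. Returns whether it ran.
 */
bool tracing_prog_run(void)
{
	bool ran = false;

	if (++prog_active == 1) {
		freelist_lock();
		freelist_unlock();
		ran = true;
	}
	prog_active--;
	assert(prog_active >= 0);
	return ran;
}

/* Models the syscall lookup with the tracing prog firing while the
 * lock is held. Returns the number of nested lock attempts.
 */
int syscall_lookup(bool guarded)
{
	deadlocks = 0;
	if (guarded)
		prog_active++;
	freelist_lock();
	tracing_prog_run();	/* tracepoint hit in the middle of the copy */
	freelist_unlock();
	if (guarded)
		prog_active--;
	return deadlocks;
}
```

Without the guard the model records one nested lock attempt; with it the
tracing program is skipped and none is recorded.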

^ permalink raw reply related

* [PATCH bpf-next 2/4] bpf: fix lockdep false positive in stackmap
From: Alexei Starovoitov @ 2019-01-30  4:04 UTC (permalink / raw)
  To: davem; +Cc: daniel, peterz, edumazet, jannh, netdev, kernel-team
In-Reply-To: <20190130040458.2544340-1-ast@kernel.org>

Lockdep reports a false positive:
[   11.211460] ------------[ cut here ]------------
[   11.211936] DEBUG_LOCKS_WARN_ON(depth <= 0)
[   11.211985] WARNING: CPU: 0 PID: 141 at ../kernel/locking/lockdep.c:3592 lock_release+0x1ad/0x280
[   11.213134] Modules linked in:
[   11.213413] CPU: 0 PID: 141 Comm: systemd-journal Not tainted 5.0.0-rc3-00018-g2fa53f892422-dirty #476
[   11.214191] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014
[   11.214954] RIP: 0010:lock_release+0x1ad/0x280
[   11.217036] RSP: 0018:ffff88813ba03f50 EFLAGS: 00010086
[   11.217516] RAX: 000000000000001f RBX: ffff8881378d8000 RCX: 0000000000000000
[   11.218179] RDX: ffffffff810d3e9e RSI: 0000000000000001 RDI: ffffffff810d3eb3
[   11.218851] RBP: ffff8881393e2b08 R08: 0000000000000002 R09: 0000000000000000
[   11.219504] R10: 0000000000000000 R11: ffff88813ba03d9d R12: ffffffff8118dfa2
[   11.220162] R13: 0000000000000086 R14: 0000000000000000 R15: 0000000000000000
[   11.220717] FS:  00007f3c8cf35780(0000) GS:ffff88813ba00000(0000) knlGS:0000000000000000
[   11.221348] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   11.221822] CR2: 00007f5825d92080 CR3: 00000001378c8005 CR4: 00000000003606f0
[   11.222381] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   11.222951] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   11.223508] Call Trace:
[   11.223705]  <IRQ>
[   11.223874]  ? __local_bh_enable+0x7a/0x80
[   11.224199]  up_read+0x1c/0xa0
[   11.224446]  do_up_read+0x12/0x20
[   11.224713]  irq_work_run_list+0x43/0x70
[   11.225030]  irq_work_run+0x26/0x50
[   11.225310]  smp_irq_work_interrupt+0x57/0x1f0
[   11.225662]  irq_work_interrupt+0xf/0x20

since the rw_semaphore is released in a different task than the one that locked it.
This is expected behavior.
Silence the warning by using up_read_non_owner().

Fixes: bae77c5eb5b2 ("bpf: enable stackmap with build_id in nmi context")
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
 kernel/bpf/stackmap.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index d43b14535827..4b79e7c251e5 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -44,7 +44,7 @@ static void do_up_read(struct irq_work *entry)
 	struct stack_map_irq_work *work;
 
 	work = container_of(entry, struct stack_map_irq_work, irq_work);
-	up_read(work->sem);
+	up_read_non_owner(work->sem);
 	work->sem = NULL;
 }
 
-- 
2.20.0
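
A toy userspace model of the warning this patch silences; it is only a
sketch under stated assumptions and does not reflect lockdep internals.
struct sem_model and the *_model() functions are hypothetical names. The
model keeps a per-task held count, warns when a task releases a semaphore
it never recorded as held, and offers a non-owner release that skips
that bookkeeping.

```c
#include <assert.h>

#define NTASKS 2

struct sem_model {
	int readers;		/* current number of read holders */
	int held[NTASKS];	/* per-task count of recorded acquisitions */
	int warnings;		/* releases by a task with nothing recorded */
};

void down_read_model(struct sem_model *sem, int task)
{
	assert(task >= 0 && task < NTASKS);
	sem->readers++;
	sem->held[task]++;
}

/* Owner-checked release: warns if this task holds nothing. */
void up_read_model(struct sem_model *sem, int task)
{
	assert(task >= 0 && task < NTASKS);
	if (sem->held[task] <= 0)
		sem->warnings++;
	else
		sem->held[task]--;
	sem->readers--;
}

/* Non-owner release: drops the reader without consulting or updating
 * the per-task records, so no warning can be raised.
 */
void up_read_non_owner_model(struct sem_model *sem)
{
	assert(sem->readers > 0);
	sem->readers--;
}
```

Releasing from a different task through the checked path raises a warning
in this model, while the non-owner path does not.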


^ permalink raw reply related

* [PATCH bpf-next 1/4] bpf: fix lockdep false positive in percpu_freelist
From: Alexei Starovoitov @ 2019-01-30  4:04 UTC (permalink / raw)
  To: davem; +Cc: daniel, peterz, edumazet, jannh, netdev, kernel-team
In-Reply-To: <20190130040458.2544340-1-ast@kernel.org>

Lockdep reports a false positive:
[   12.492084] 00000000e6b28347 (&head->lock){+...}, at: pcpu_freelist_push+0x2a/0x40
[   12.492696] but this lock was taken by another, HARDIRQ-safe lock in the past:
[   12.493275]  (&rq->lock){-.-.}
[   12.493276]
[   12.493276]
[   12.493276] and interrupts could create inverse lock ordering between them.
[   12.493276]
[   12.494435]
[   12.494435] other info that might help us debug this:
[   12.494979]  Possible interrupt unsafe locking scenario:
[   12.494979]
[   12.495518]        CPU0                    CPU1
[   12.495879]        ----                    ----
[   12.496243]   lock(&head->lock);
[   12.496502]                                local_irq_disable();
[   12.496969]                                lock(&rq->lock);
[   12.497431]                                lock(&head->lock);
[   12.497890]   <Interrupt>
[   12.498104]     lock(&rq->lock);
[   12.498368]
[   12.498368]  *** DEADLOCK ***
[   12.498368]
[   12.498837] 1 lock held by dd/276:
[   12.499110]  #0: 00000000c58cb2ee (rcu_read_lock){....}, at: trace_call_bpf+0x5e/0x240
[   12.499747]
[   12.499747] the shortest dependencies between 2nd lock and 1st lock:
[   12.500389]  -> (&rq->lock){-.-.} {
[   12.500669]     IN-HARDIRQ-W at:
[   12.500934]                       _raw_spin_lock+0x2f/0x40
[   12.501373]                       scheduler_tick+0x4c/0xf0
[   12.501812]                       update_process_times+0x40/0x50
[   12.502294]                       tick_periodic+0x27/0xb0
[   12.502723]                       tick_handle_periodic+0x1f/0x60
[   12.503203]                       timer_interrupt+0x11/0x20
[   12.503651]                       __handle_irq_event_percpu+0x43/0x2c0
[   12.504167]                       handle_irq_event_percpu+0x20/0x50
[   12.504674]                       handle_irq_event+0x37/0x60
[   12.505139]                       handle_level_irq+0xa7/0x120
[   12.505601]                       handle_irq+0xa1/0x150
[   12.506018]                       do_IRQ+0x77/0x140
[   12.506411]                       ret_from_intr+0x0/0x1d
[   12.506834]                       _raw_spin_unlock_irqrestore+0x53/0x60
[   12.507362]                       __setup_irq+0x481/0x730
[   12.507789]                       setup_irq+0x49/0x80
[   12.508195]                       hpet_time_init+0x21/0x32
[   12.508644]                       x86_late_time_init+0xb/0x16
[   12.509106]                       start_kernel+0x390/0x42a
[   12.509554]                       secondary_startup_64+0xa4/0xb0
[   12.510034]     IN-SOFTIRQ-W at:
[   12.510305]                       _raw_spin_lock+0x2f/0x40
[   12.510772]                       try_to_wake_up+0x1c7/0x4e0
[   12.511220]                       swake_up_locked+0x20/0x40
[   12.511657]                       swake_up_one+0x1a/0x30
[   12.512070]                       rcu_process_callbacks+0xc5/0x650
[   12.512553]                       __do_softirq+0xe6/0x47b
[   12.512978]                       irq_exit+0xc3/0xd0
[   12.513372]                       smp_apic_timer_interrupt+0xa9/0x250
[   12.513876]                       apic_timer_interrupt+0xf/0x20
[   12.514343]                       default_idle+0x1c/0x170
[   12.514765]                       do_idle+0x199/0x240
[   12.515159]                       cpu_startup_entry+0x19/0x20
[   12.515614]                       start_kernel+0x422/0x42a
[   12.516045]                       secondary_startup_64+0xa4/0xb0
[   12.516521]     INITIAL USE at:
[   12.516774]                      _raw_spin_lock_irqsave+0x38/0x50
[   12.517258]                      rq_attach_root+0x16/0xd0
[   12.517685]                      sched_init+0x2f2/0x3eb
[   12.518096]                      start_kernel+0x1fb/0x42a
[   12.518525]                      secondary_startup_64+0xa4/0xb0
[   12.518986]   }
[   12.519132]   ... key      at: [<ffffffff82b7bc28>] __key.71384+0x0/0x8
[   12.519649]   ... acquired at:
[   12.519892]    pcpu_freelist_pop+0x7b/0xd0
[   12.520221]    bpf_get_stackid+0x1d2/0x4d0
[   12.520563]    ___bpf_prog_run+0x8b4/0x11a0
[   12.520887]
[   12.521008] -> (&head->lock){+...} {
[   12.521292]    HARDIRQ-ON-W at:
[   12.521539]                     _raw_spin_lock+0x2f/0x40
[   12.521950]                     pcpu_freelist_push+0x2a/0x40
[   12.522396]                     bpf_get_stackid+0x494/0x4d0
[   12.522828]                     ___bpf_prog_run+0x8b4/0x11a0
[   12.523296]    INITIAL USE at:
[   12.523537]                    _raw_spin_lock+0x2f/0x40
[   12.523944]                    pcpu_freelist_populate+0xc0/0x120
[   12.524417]                    htab_map_alloc+0x405/0x500
[   12.524835]                    __do_sys_bpf+0x1a3/0x1a90
[   12.525253]                    do_syscall_64+0x4a/0x180
[   12.525659]                    entry_SYSCALL_64_after_hwframe+0x49/0xbe
[   12.526167]  }
[   12.526311]  ... key      at: [<ffffffff838f7668>] __key.13130+0x0/0x8
[   12.526812]  ... acquired at:
[   12.527047]    __lock_acquire+0x521/0x1350
[   12.527371]    lock_acquire+0x98/0x190
[   12.527680]    _raw_spin_lock+0x2f/0x40
[   12.527994]    pcpu_freelist_push+0x2a/0x40
[   12.528325]    bpf_get_stackid+0x494/0x4d0
[   12.528645]    ___bpf_prog_run+0x8b4/0x11a0
[   12.528970]
[   12.529092]
[   12.529092] stack backtrace:
[   12.529444] CPU: 0 PID: 276 Comm: dd Not tainted 5.0.0-rc3-00018-g2fa53f892422 #475
[   12.530043] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014
[   12.530750] Call Trace:
[   12.530948]  dump_stack+0x5f/0x8b
[   12.531248]  check_usage_backwards+0x10c/0x120
[   12.531598]  ? ___bpf_prog_run+0x8b4/0x11a0
[   12.531935]  ? mark_lock+0x382/0x560
[   12.532229]  mark_lock+0x382/0x560
[   12.532496]  ? print_shortest_lock_dependencies+0x180/0x180
[   12.532928]  __lock_acquire+0x521/0x1350
[   12.533271]  ? find_get_entry+0x17f/0x2e0
[   12.533586]  ? find_get_entry+0x19c/0x2e0
[   12.533902]  ? lock_acquire+0x98/0x190
[   12.534196]  lock_acquire+0x98/0x190
[   12.534482]  ? pcpu_freelist_push+0x2a/0x40
[   12.534810]  _raw_spin_lock+0x2f/0x40
[   12.535099]  ? pcpu_freelist_push+0x2a/0x40
[   12.535432]  pcpu_freelist_push+0x2a/0x40
[   12.535750]  bpf_get_stackid+0x494/0x4d0
[   12.536062]  ___bpf_prog_run+0x8b4/0x11a0

It has been explained that this is a false positive here:
https://lkml.org/lkml/2018/7/25/756
Recap:
- stackmap uses pcpu_freelist
- The lock in pcpu_freelist is a percpu lock
- stackmap is only used by tracing bpf_prog
- A tracing bpf_prog cannot be run if another bpf_prog
  has already been running (ensured by the percpu bpf_prog_active counter).

Eric pointed out that this lockdep splat stops other
legitimate lockdep splats in selftests/bpf/test_progs.c.

Fix this by calling local_irq_save/restore for stackmap.

Another false positive had also been worked around by calling
local_irq_save in commit 89ad2fa3f043 ("bpf: fix lockdep splat").
That commit added unnecessary irq_save/restore to fast path of
bpf hash map. irqs are already disabled at that point, since htab
is holding per bucket spin_lock with irqsave.

Let's reduce overhead for htab by introducing __pcpu_freelist_push/pop
functions without irqsave and converting pcpu_freelist_push/pop to irqsave,
to be used elsewhere (right now only in stackmap).
This stops the lockdep false positive in stackmap with a bit of acceptable overhead.

Fixes: 557c0c6e7df8 ("bpf: convert stackmap to pre-allocation")
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
 kernel/bpf/hashtab.c         |  4 ++--
 kernel/bpf/percpu_freelist.c | 41 +++++++++++++++++++++++++-----------
 kernel/bpf/percpu_freelist.h |  4 ++++
 3 files changed, 35 insertions(+), 14 deletions(-)

diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index 4b7c76765d9d..f9274114c88d 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -686,7 +686,7 @@ static void free_htab_elem(struct bpf_htab *htab, struct htab_elem *l)
 	}
 
 	if (htab_is_prealloc(htab)) {
-		pcpu_freelist_push(&htab->freelist, &l->fnode);
+		__pcpu_freelist_push(&htab->freelist, &l->fnode);
 	} else {
 		atomic_dec(&htab->count);
 		l->htab = htab;
@@ -748,7 +748,7 @@ static struct htab_elem *alloc_htab_elem(struct bpf_htab *htab, void *key,
 		} else {
 			struct pcpu_freelist_node *l;
 
-			l = pcpu_freelist_pop(&htab->freelist);
+			l = __pcpu_freelist_pop(&htab->freelist);
 			if (!l)
 				return ERR_PTR(-E2BIG);
 			l_new = container_of(l, struct htab_elem, fnode);
diff --git a/kernel/bpf/percpu_freelist.c b/kernel/bpf/percpu_freelist.c
index 673fa6fe2d73..0c1b4ba9e90e 100644
--- a/kernel/bpf/percpu_freelist.c
+++ b/kernel/bpf/percpu_freelist.c
@@ -28,8 +28,8 @@ void pcpu_freelist_destroy(struct pcpu_freelist *s)
 	free_percpu(s->freelist);
 }
 
-static inline void __pcpu_freelist_push(struct pcpu_freelist_head *head,
-					struct pcpu_freelist_node *node)
+static inline void ___pcpu_freelist_push(struct pcpu_freelist_head *head,
+					 struct pcpu_freelist_node *node)
 {
 	raw_spin_lock(&head->lock);
 	node->next = head->first;
@@ -37,12 +37,22 @@ static inline void __pcpu_freelist_push(struct pcpu_freelist_head *head,
 	raw_spin_unlock(&head->lock);
 }
 
-void pcpu_freelist_push(struct pcpu_freelist *s,
+void __pcpu_freelist_push(struct pcpu_freelist *s,
 			struct pcpu_freelist_node *node)
 {
 	struct pcpu_freelist_head *head = this_cpu_ptr(s->freelist);
 
-	__pcpu_freelist_push(head, node);
+	___pcpu_freelist_push(head, node);
+}
+
+void pcpu_freelist_push(struct pcpu_freelist *s,
+			struct pcpu_freelist_node *node)
+{
+	unsigned long flags;
+
+	local_irq_save(flags);
+	__pcpu_freelist_push(s, node);
+	local_irq_restore(flags);
 }
 
 void pcpu_freelist_populate(struct pcpu_freelist *s, void *buf, u32 elem_size,
@@ -63,7 +73,7 @@ void pcpu_freelist_populate(struct pcpu_freelist *s, void *buf, u32 elem_size,
 	for_each_possible_cpu(cpu) {
 again:
 		head = per_cpu_ptr(s->freelist, cpu);
-		__pcpu_freelist_push(head, buf);
+		___pcpu_freelist_push(head, buf);
 		i++;
 		buf += elem_size;
 		if (i == nr_elems)
@@ -74,14 +84,12 @@ void pcpu_freelist_populate(struct pcpu_freelist *s, void *buf, u32 elem_size,
 	local_irq_restore(flags);
 }
 
-struct pcpu_freelist_node *pcpu_freelist_pop(struct pcpu_freelist *s)
+struct pcpu_freelist_node *__pcpu_freelist_pop(struct pcpu_freelist *s)
 {
 	struct pcpu_freelist_head *head;
 	struct pcpu_freelist_node *node;
-	unsigned long flags;
 	int orig_cpu, cpu;
 
-	local_irq_save(flags);
 	orig_cpu = cpu = raw_smp_processor_id();
 	while (1) {
 		head = per_cpu_ptr(s->freelist, cpu);
@@ -89,16 +97,25 @@ struct pcpu_freelist_node *pcpu_freelist_pop(struct pcpu_freelist *s)
 		node = head->first;
 		if (node) {
 			head->first = node->next;
-			raw_spin_unlock_irqrestore(&head->lock, flags);
+			raw_spin_unlock(&head->lock);
 			return node;
 		}
 		raw_spin_unlock(&head->lock);
 		cpu = cpumask_next(cpu, cpu_possible_mask);
 		if (cpu >= nr_cpu_ids)
 			cpu = 0;
-		if (cpu == orig_cpu) {
-			local_irq_restore(flags);
+		if (cpu == orig_cpu)
 			return NULL;
-		}
 	}
 }
+
+struct pcpu_freelist_node *pcpu_freelist_pop(struct pcpu_freelist *s)
+{
+	struct pcpu_freelist_node *ret;
+	unsigned long flags;
+
+	local_irq_save(flags);
+	ret = __pcpu_freelist_pop(s);
+	local_irq_restore(flags);
+	return ret;
+}
diff --git a/kernel/bpf/percpu_freelist.h b/kernel/bpf/percpu_freelist.h
index 3049aae8ea1e..c3960118e617 100644
--- a/kernel/bpf/percpu_freelist.h
+++ b/kernel/bpf/percpu_freelist.h
@@ -22,8 +22,12 @@ struct pcpu_freelist_node {
 	struct pcpu_freelist_node *next;
 };
 
+/* pcpu_freelist_* do spin_lock_irqsave. */
 void pcpu_freelist_push(struct pcpu_freelist *, struct pcpu_freelist_node *);
 struct pcpu_freelist_node *pcpu_freelist_pop(struct pcpu_freelist *);
+/* __pcpu_freelist_* do spin_lock only. caller must disable irqs. */
+void __pcpu_freelist_push(struct pcpu_freelist *, struct pcpu_freelist_node *);
+struct pcpu_freelist_node *__pcpu_freelist_pop(struct pcpu_freelist *);
 void pcpu_freelist_populate(struct pcpu_freelist *s, void *buf, u32 elem_size,
 			    u32 nr_elems);
 int pcpu_freelist_init(struct pcpu_freelist *);
-- 
2.20.0
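
A minimal userspace sketch of the layering this patch introduces: a fast
variant that assumes interrupts are already off, and a wrapper that saves
and restores the interrupt state around it. It is not the kernel code.
Interrupt state is simulated with a plain flag, there is a single list
instead of per-CPU lists, and every name (irq_save(), freelist_push_noirq(),
freelist_push(), ...) is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct node { struct node *next; };
struct freelist { struct node *first; };

static bool irqs_enabled = true;	/* simulated interrupt state */

static bool irq_save(void)
{
	bool flags = irqs_enabled;

	irqs_enabled = false;
	return flags;
}

static void irq_restore(bool flags)
{
	irqs_enabled = flags;
}

/* Fast-path variant: the caller must already have irqs disabled. */
void freelist_push_noirq(struct freelist *s, struct node *n)
{
	assert(!irqs_enabled);
	n->next = s->first;
	s->first = n;
}

struct node *freelist_pop_noirq(struct freelist *s)
{
	struct node *n = s->first;

	assert(!irqs_enabled);
	if (n)
		s->first = n->next;
	return n;
}

/* General variants: disable irqs around the fast path, then restore. */
void freelist_push(struct freelist *s, struct node *n)
{
	bool flags = irq_save();

	freelist_push_noirq(s, n);
	irq_restore(flags);
}

struct node *freelist_pop(struct freelist *s)
{
	bool flags = irq_save();
	struct node *n = freelist_pop_noirq(s);

	irq_restore(flags);
	return n;
}
```

A caller that already runs with interrupts off would use the _noirq
variants directly and skip the save/restore cost; everyone else goes
through the wrappers.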


^ permalink raw reply related

* [PATCH bpf-next 0/4] bpf: fixes for lockdep and deadlock
From: Alexei Starovoitov @ 2019-01-30  4:04 UTC (permalink / raw)
  To: davem; +Cc: daniel, peterz, edumazet, jannh, netdev, kernel-team

In addition to preempt_disable patch for socket filters
https://patchwork.ozlabs.org/patch/1032437/
the first three patches fix various lockdep false positives.
Last patch fixes potential deadlock in stackmap access from
tracing bpf prog and from syscall.

Alexei Starovoitov (3):
  bpf: fix lockdep false positive in percpu_freelist
  bpf: fix lockdep false positive in stackmap
  bpf: fix lockdep false positive in bpf_prog_register

Martin KaFai Lau (1):
  bpf: Fix syscall's stackmap lookup potential deadlock

 kernel/bpf/hashtab.c         |  4 ++--
 kernel/bpf/percpu_freelist.c | 41 +++++++++++++++++++++++++-----------
 kernel/bpf/percpu_freelist.h |  4 ++++
 kernel/bpf/stackmap.c        |  2 +-
 kernel/bpf/syscall.c         | 12 +++++++++--
 kernel/trace/bpf_trace.c     | 14 ++----------
 6 files changed, 48 insertions(+), 29 deletions(-)

-- 
2.20.0


^ permalink raw reply

* Re: [PATCH net-next] cxgb4: cxgb4_tc_u32: use struct_size() in kvzalloc()
From: Gustavo A. R. Silva @ 2019-01-30  3:57 UTC (permalink / raw)
  To: David Miller; +Cc: arjun, netdev, linux-kernel
In-Reply-To: <20190129.105724.1719800687435588791.davem@davemloft.net>

On 1/29/19 12:57 PM, David Miller wrote:

>>
>> Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
> 
> Applied, thanks.
> 

Thanks, Dave.

--
Gustavo

^ permalink raw reply

* Re: [PATCHv4 2/3] net: dsa: mt7530: support the 7530 switch on the Mediatek MT7621 SoC
From: Sean Wang @ 2019-01-30  3:55 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: gerg, Sean Wang (王志亘), linux-mediatek, bjorn,
	Andrew Lunn, vivien.didelot, netdev, neil, rene, John Crispin
In-Reply-To: <d030fb8f-c707-9aa2-5ebc-e8100188a5c2@gmail.com>

On Tue, Jan 29, 2019 at 7:39 PM Florian Fainelli <f.fainelli@gmail.com> wrote:
>
>
>
> On 1/29/19 5:24 PM, gerg@kernel.org wrote:
> > From: Greg Ungerer <gerg@kernel.org>
> >
> > The MediaTek MT7621 SoC device contains a 7530 switch, and the existing
> > linux kernel 7530 DSA switch driver can be used with it.
> >
> > The bulk of the changes required stem from the 7621 having different
> > regulator and pad setup. The existing setup of these in the 7530
> > driver appears to be very specific to its implementation in the Mediatek
> > 7623 SoC. (Not entirely surprising given the 7623 is a quad core ARM
> > based SoC, and the 7621 is a dual core, dual thread MIPS based SoC).
> >
> > Create a new devicetree type, "mediatek,mt7621", to support the 7530
> > switch in the 7621 SoC. There appears to be no usable ID register to
> > distinguish it from a 7530 in other hardware at runtime. This is used
> > to carry out the appropriate configuration and setup.
> >
> > Signed-off-by: Greg Ungerer <gerg@kernel.org>
> > Reviewed-by: Andrew Lunn <andrew@lunn.ch>
>
> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>

Acked-by: Sean Wang <sean.wang@kernel.org>

> --
> Florian
>
> _______________________________________________
> Linux-mediatek mailing list
> Linux-mediatek@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-mediatek

^ permalink raw reply

* Re: [PATCHv4 3/3] dt-bindings: net: dsa: add new MT7530 binding to support MT7621
From: Sean Wang @ 2019-01-30  3:54 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: gerg, Sean Wang (王志亘), linux-mediatek, bjorn,
	Andrew Lunn, vivien.didelot, netdev, neil, rene, John Crispin
In-Reply-To: <9fd01c61-fb62-52fc-e587-43a03602bae2@gmail.com>

On Tue, Jan 29, 2019 at 7:40 PM Florian Fainelli <f.fainelli@gmail.com> wrote:
>
>
>
> On 1/29/19 5:24 PM, gerg@kernel.org wrote:
> > From: Greg Ungerer <gerg@kernel.org>
> >
> > Add devicetree binding to support the compatible mt7530 switch as used
> > in the MediaTek MT7621 SoC.
> >
> > Signed-off-by: Greg Ungerer <gerg@kernel.org>
> > Reviewed-by: Andrew Lunn <andrew@lunn.ch>
>
> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>

Acked-by: Sean Wang <sean.wang@kernel.org>

> --
> Florian

^ permalink raw reply

* Re: [PATCHv4 1/3] net: ethernet: mediatek: support MT7621 SoC ethernet hardware
From: Sean Wang @ 2019-01-30  3:54 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: gerg, Sean Wang (王志亘), linux-mediatek, bjorn,
	Andrew Lunn, vivien.didelot, netdev, neil, rene, John Crispin
In-Reply-To: <48bb34ce-9bb0-2dc1-0e17-55e000a23114@gmail.com>

On Tue, Jan 29, 2019 at 7:40 PM Florian Fainelli <f.fainelli@gmail.com> wrote:
>
>
>
> On 1/29/19 5:24 PM, gerg@kernel.org wrote:
> > From: Bjørn Mork <bjorn@mork.no>
> >
> > The Mediatek MT7621 SoC contains the same ethernet hardware module as
> > used on a number of other MediaTek SoC parts. There are some minor
> > differences to deal with but we can use the same driver to support
> > them all.
> >
> > This patch is based on work by Bjørn Mork <bjorn@mork.no>, and his
> > original patch is at:
> >
> > https://github.com/bmork/LEDE/commit/3293bc63f5461ca1eb0bbc4ed90145335e7e3404
> >
> > There is an additional compatible devicetree type added, and the primary
> > change to the code required is to support a single interrupt (for both
> > RX and TX interrupts).
> >
> > Signed-off-by: Bjørn Mork <bjorn@mork.no>
> > [gerg@kernel.org: rebase to mainline and irq handler fix]
> > Signed-off-by: Greg Ungerer <gerg@kernel.org>
>
> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
>

Acked-by: Sean Wang <sean.wang@kernel.org>

> > ---
> >  drivers/net/ethernet/mediatek/Kconfig       |  2 +-
> >  drivers/net/ethernet/mediatek/mtk_eth_soc.c | 48 ++++++++++++++++++---
> >  drivers/net/ethernet/mediatek/mtk_eth_soc.h |  4 ++
> >  3 files changed, 46 insertions(+), 8 deletions(-)
> >
> > v2: first in series for this change
> > v3: rebase onto 5.0-rc3
> > v4: rebase onto 5.0-rc4
> >
> > diff --git a/drivers/net/ethernet/mediatek/Kconfig b/drivers/net/ethernet/mediatek/Kconfig
> > index f9149d2a4694..43656f961891 100644
> > --- a/drivers/net/ethernet/mediatek/Kconfig
> > +++ b/drivers/net/ethernet/mediatek/Kconfig
> > @@ -1,6 +1,6 @@
> >  config NET_VENDOR_MEDIATEK
> >       bool "MediaTek ethernet driver"
> > -     depends on ARCH_MEDIATEK
> > +     depends on ARCH_MEDIATEK || SOC_MT7621
>
> Would be nice to add COMPILE_TEST there as well at some point so someone
> can easily build test changes to the driver.

okay, I will do it in another patch

>
> --
> Florian
>

^ permalink raw reply

* [PATCH] tun: move the call to tun_set_real_num_queues
From: George Amanakis @ 2019-01-30  3:53 UTC (permalink / raw)
  To: davem; +Cc: linux-kernel, sdf, netdev, George Amanakis
In-Reply-To: <CAKH8qBvEintAf5Dmz+8kCTW1V_BRUDxSAZy8WoYSyNoLW+JHfQ@mail.gmail.com>

Call tun_set_real_num_queues() after the increment of tun->numqueues
since the former depends on it. Otherwise, the number of queues is not
correctly accounted for, which results in warnings similar to:
"vnet0 selects TX queue 11, but real number of TX queues is 11".

Fixes: 0b7959b62573 ("tun: publish tfile after it's fully initialized")
Reported-and-tested-by: George Amanakis <gamanakis@gmail.com>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 drivers/net/tun.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 18656c4094b3..fed298c0cb39 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -866,8 +866,6 @@ static int tun_attach(struct tun_struct *tun, struct file *file,
 	if (rtnl_dereference(tun->xdp_prog))
 		sock_set_flag(&tfile->sk, SOCK_XDP);
 
-	tun_set_real_num_queues(tun);
-
 	/* device is allowed to go away first, so no need to hold extra
 	 * refcnt.
 	 */
@@ -879,6 +877,7 @@ static int tun_attach(struct tun_struct *tun, struct file *file,
 	rcu_assign_pointer(tfile->tun, tun);
 	rcu_assign_pointer(tun->tfiles[tun->numqueues], tfile);
 	tun->numqueues++;
+	tun_set_real_num_queues(tun);
 out:
 	return err;
 }
-- 
2.20.1


^ permalink raw reply related

* [PATCH] tun: move the call to tun_set_real_num_queues
From: George Amanakis @ 2019-01-30  3:50 UTC (permalink / raw)
  To: davem; +Cc: linux-kernel, sdf, netdev, George Amanakis

Call tun_set_real_num_queues() after the increment of tun->numqueues
since the former depends on it. Otherwise, the number of queues is not
correctly accounted for, which results in warnings similar to:
"vnet0 selects TX queue 11, but real number of TX queues is 11".

Fixes: 0b7959b62573 ("tun: publish tfile after it's fully initialized")
Reported-and-tested-by: George Amanakis <gamanakis@gmail.com>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 drivers/net/tun.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 18656c4094b3..fed298c0cb39 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -866,8 +866,6 @@ static int tun_attach(struct tun_struct *tun, struct file *file,
 	if (rtnl_dereference(tun->xdp_prog))
 		sock_set_flag(&tfile->sk, SOCK_XDP);
 
-	tun_set_real_num_queues(tun);
-
 	/* device is allowed to go away first, so no need to hold extra
 	 * refcnt.
 	 */
@@ -879,6 +877,7 @@ static int tun_attach(struct tun_struct *tun, struct file *file,
 	rcu_assign_pointer(tfile->tun, tun);
 	rcu_assign_pointer(tun->tfiles[tun->numqueues], tfile);
 	tun->numqueues++;
+	tun_set_real_num_queues(tun);
 out:
 	return err;
 }
-- 
2.20.1


^ permalink raw reply related

* Re: [PATCHv4 1/3] net: ethernet: mediatek: support MT7621 SoC ethernet hardware
From: Florian Fainelli @ 2019-01-30  3:40 UTC (permalink / raw)
  To: gerg, sean.wang, linux-mediatek, bjorn, andrew, vivien.didelot,
	netdev
  Cc: rene, john, neil
In-Reply-To: <20190130012406.28271-2-gerg@kernel.org>



On 1/29/19 5:24 PM, gerg@kernel.org wrote:
> From: Bjørn Mork <bjorn@mork.no>
> 
> The Mediatek MT7621 SoC contains the same ethernet hardware module as
> used on a number of other MediaTek SoC parts. There are some minor
> differences to deal with but we can use the same driver to support
> them all.
> 
> This patch is based on work by Bjørn Mork <bjorn@mork.no>, and his
> original patch is at:
> 
> https://github.com/bmork/LEDE/commit/3293bc63f5461ca1eb0bbc4ed90145335e7e3404
> 
> There is an additional compatible devicetree type added, and the primary
> change to the code required is to support a single interrupt (for both
> RX and TX interrupts).
> 
> Signed-off-by: Bjørn Mork <bjorn@mork.no>
> [gerg@kernel.org: rebase to mainline and irq handler fix]
> Signed-off-by: Greg Ungerer <gerg@kernel.org>

Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>

> ---
>  drivers/net/ethernet/mediatek/Kconfig       |  2 +-
>  drivers/net/ethernet/mediatek/mtk_eth_soc.c | 48 ++++++++++++++++++---
>  drivers/net/ethernet/mediatek/mtk_eth_soc.h |  4 ++
>  3 files changed, 46 insertions(+), 8 deletions(-)
> 
> v2: first in series for this change
> v3: rebase onto 5.0-rc3
> v4: rebase onto 5.0-rc4
> 
> diff --git a/drivers/net/ethernet/mediatek/Kconfig b/drivers/net/ethernet/mediatek/Kconfig
> index f9149d2a4694..43656f961891 100644
> --- a/drivers/net/ethernet/mediatek/Kconfig
> +++ b/drivers/net/ethernet/mediatek/Kconfig
> @@ -1,6 +1,6 @@
>  config NET_VENDOR_MEDIATEK
>  	bool "MediaTek ethernet driver"
> -	depends on ARCH_MEDIATEK
> +	depends on ARCH_MEDIATEK || SOC_MT7621

Would be nice to add COMPILE_TEST there as well at some point so someone
can easily build test changes to the driver.

-- 
Florian

^ permalink raw reply

* Re: [PATCHv4 3/3] dt-bindings: net: dsa: add new MT7530 binding to support MT7621
From: Florian Fainelli @ 2019-01-30  3:39 UTC (permalink / raw)
  To: gerg, sean.wang, linux-mediatek, bjorn, andrew, vivien.didelot,
	netdev
  Cc: rene, john, neil
In-Reply-To: <20190130012406.28271-4-gerg@kernel.org>



On 1/29/19 5:24 PM, gerg@kernel.org wrote:
> From: Greg Ungerer <gerg@kernel.org>
> 
> Add devicetree binding to support the compatible mt7530 switch as used
> in the MediaTek MT7621 SoC.
> 
> Signed-off-by: Greg Ungerer <gerg@kernel.org>
> Reviewed-by: Andrew Lunn <andrew@lunn.ch>

Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
-- 
Florian

^ permalink raw reply

* Re: [PATCHv4 2/3] net: dsa: mt7530: support the 7530 switch on the Mediatek MT7621 SoC
From: Florian Fainelli @ 2019-01-30  3:39 UTC (permalink / raw)
  To: gerg, sean.wang, linux-mediatek, bjorn, andrew, vivien.didelot,
	netdev
  Cc: rene, john, neil
In-Reply-To: <20190130012406.28271-3-gerg@kernel.org>



On 1/29/19 5:24 PM, gerg@kernel.org wrote:
> From: Greg Ungerer <gerg@kernel.org>
> 
> The MediaTek MT7621 SoC device contains a 7530 switch, and the existing
> linux kernel 7530 DSA switch driver can be used with it.
> 
> The bulk of the changes required stem from the 7621 having different
> regulator and pad setup. The existing setup of these in the 7530
> driver appears to be very specific to its implementation in the Mediatek
> 7623 SoC. (Not entirely surprising given the 7623 is a quad core ARM
> based SoC, and the 7621 is a dual core, dual thread MIPS based SoC).
> 
> Create a new devicetree type, "mediatek,mt7621", to support the 7530
> switch in the 7621 SoC. There appears to be no usable ID register to
> distinguish it from a 7530 in other hardware at runtime. This is used
> to carry out the appropriate configuration and setup.
> 
> Signed-off-by: Greg Ungerer <gerg@kernel.org>
> Reviewed-by: Andrew Lunn <andrew@lunn.ch>

Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
-- 
Florian

^ permalink raw reply

* Re: [PATCH bpf-next v3 1/4] bpf: add plumbing for BPF_LWT_ENCAP_IP in bpf_lwt_push_encap
From: David Ahern @ 2019-01-30  3:29 UTC (permalink / raw)
  To: Peter Oskolkov, Alexei Starovoitov, Daniel Borkmann, netdev
  Cc: Peter Oskolkov, Willem de Bruijn
In-Reply-To: <20190129011217.192510-2-posk@google.com>

On 1/28/19 6:12 PM, Peter Oskolkov wrote:
> @@ -2583,7 +2594,15 @@ enum bpf_ret_code {
>  	BPF_DROP = 2,
>  	/* 3-6 reserved */
>  	BPF_REDIRECT = 7,
> -	/* >127 are reserved for prog type specific return codes */
> +	/* >127 are reserved for prog type specific return codes.
> +	 *
> +	 * BPF_LWT_REROUTE: used by BPF_PROG_TYPE_LWT_IN and
> +	 *    BPF_PROG_TYPE_LWT_XMIT to indicate that skb's dst
> +	 *    has changed and appropriate dst_input() or dst_output()
> +	 *    action has to be taken (this is an L3 redirect, as
> +	 *    opposed to L2 redirect represented by BPF_REDIRECT above).
> +	 */
> +	BPF_LWT_REROUTE = 128,
>  };

What happens if a program pushes a new header onto the skb and does not
return BPF_LWT_REROUTE?

Might be better to move the route lookup and dst swap to run_lwt_bpf and
only do it if the program returns BPF_LWT_REROUTE. That allows calling
bpf_push_ip_encap without requiring a route lookup. That might be fine
as long as there is not a protocol mismatch (ipv4 packet gets an ipv6
header or vice versa). But then, I think you have the mismatch problem
now if the program does not return BPF_LWT_REROUTE.
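
To make the suggestion concrete, here is a rough userspace sketch of a
caller that does the route lookup and dst swap only when the program
returns BPF_LWT_REROUTE. Everything prefixed toy_ is hypothetical and
only models the control flow; it is not the kernel code, and the real
handling of the other return codes is richer than shown.

```c
#include <stdbool.h>

/* Return codes as quoted in the patch; BPF_OK = 0 comes from the same enum. */
enum toy_ret_code {
	TOY_BPF_OK = 0,
	TOY_BPF_DROP = 2,
	TOY_BPF_REDIRECT = 7,
	TOY_BPF_LWT_REROUTE = 128,
};

/* Hypothetical stand-in for an skb: tracks only what the sketch needs. */
struct toy_skb {
	bool encap_pushed;	/* program pushed a new outer header */
	bool dst_swapped;	/* route lookup done and dst replaced */
};

/* Hypothetical caller, loosely modelled on the role of run_lwt_bpf().
 * Returns 0 to continue with the current dst, 1 to re-dispatch through
 * the new dst, -1 to drop.
 */
static int toy_run_lwt(struct toy_skb *skb, enum toy_ret_code ret)
{
	switch (ret) {
	case TOY_BPF_OK:
		return 0;	/* dst untouched, even if a header was pushed */
	case TOY_BPF_LWT_REROUTE:
		skb->dst_swapped = true;	/* route lookup would go here */
		return 1;
	default:
		return -1;	/* this toy drops on every other code */
	}
}
```

With encap_pushed set and TOY_BPF_OK returned, the toy continues on the
old dst, which is the case the question above is asking about.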

^ permalink raw reply

* Re: [PATCH bpf 0/2] bpf: btf: allow typedef func_proto
From: Alexei Starovoitov @ 2019-01-30  3:18 UTC (permalink / raw)
  To: Yonghong Song; +Cc: netdev, Alexei Starovoitov, Daniel Borkmann, kernel-team
In-Reply-To: <20190130003816.1043826-1-yhs@fb.com>

On Tue, Jan 29, 2019 at 04:38:16PM -0800, Yonghong Song wrote:
> The current btf implementation disallows the typedef of
> a func_proto type. This actually is allowed per C standard.
> This patch fixed btf verification to permit such types.
> Patch #1 fixed the kernel side and Patch #2 fixed
> the tools test_btf test.

Applied, Thanks


^ permalink raw reply

* [PATCH][RESEND] rtlwifi: remove set but not used variable 'cmd_seq'
From: YueHaibing @ 2019-01-30  3:15 UTC (permalink / raw)
  To: kvalo, davem, pkshih; +Cc: linux-kernel, netdev, linux-wireless, YueHaibing

Fixes gcc '-Wunused-but-set-variable' warning:

drivers/net/wireless/realtek/rtlwifi/base.c: In function 'rtl_c2h_content_parsing':
drivers/net/wireless/realtek/rtlwifi/base.c:2313:13: warning:
 variable 'cmd_seq' set but not used [-Wunused-but-set-variable]

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: Ping-Ke Shih <pkshih@realtek.com>
---
 drivers/net/wireless/realtek/rtlwifi/base.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/net/wireless/realtek/rtlwifi/base.c b/drivers/net/wireless/realtek/rtlwifi/base.c
index ef9b502..ee726f4 100644
--- a/drivers/net/wireless/realtek/rtlwifi/base.c
+++ b/drivers/net/wireless/realtek/rtlwifi/base.c
@@ -2311,11 +2311,10 @@ static void rtl_c2h_content_parsing(struct ieee80211_hw *hw,
 	struct rtl_priv *rtlpriv = rtl_priv(hw);
 	struct rtl_hal_ops *hal_ops = rtlpriv->cfg->ops;
 	const struct rtl_btc_ops *btc_ops = rtlpriv->btcoexist.btc_ops;
-	u8 cmd_id, cmd_seq, cmd_len;
+	u8 cmd_id, cmd_len;
 	u8 *cmd_buf = NULL;
 
 	cmd_id = GET_C2H_CMD_ID(skb->data);
-	cmd_seq = GET_C2H_SEQ(skb->data);
 	cmd_len = skb->len - C2H_DATA_OFFSET;
 	cmd_buf = GET_C2H_DATA_PTR(skb->data);
 
-- 
2.7.4



^ permalink raw reply related

* [PATCH][RESEND] ath10k: snoc: remove set but not used variable 'ar_snoc'
From: YueHaibing @ 2019-01-30  3:09 UTC (permalink / raw)
  To: kvalo, davem, ath10k; +Cc: linux-kernel, netdev, linux-wireless, YueHaibing

Fixes gcc '-Wunused-but-set-variable' warning:

drivers/net/wireless/ath/ath10k/snoc.c: In function 'ath10k_snoc_tx_pipe_cleanup':
drivers/net/wireless/ath/ath10k/snoc.c:681:22: warning:
 variable 'ar_snoc' set but not used [-Wunused-but-set-variable]

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Brian Norris <briannorris@chromium.org>
---
 drivers/net/wireless/ath/ath10k/snoc.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/net/wireless/ath/ath10k/snoc.c b/drivers/net/wireless/ath/ath10k/snoc.c
index 54efe6b..b7493df 100644
--- a/drivers/net/wireless/ath/ath10k/snoc.c
+++ b/drivers/net/wireless/ath/ath10k/snoc.c
@@ -875,13 +875,11 @@ static void ath10k_snoc_tx_pipe_cleanup(struct ath10k_snoc_pipe *snoc_pipe)
 {
 	struct ath10k_ce_pipe *ce_pipe;
 	struct ath10k_ce_ring *ce_ring;
-	struct ath10k_snoc *ar_snoc;
 	struct sk_buff *skb;
 	struct ath10k *ar;
 	int i;
 
 	ar = snoc_pipe->hif_ce_state;
-	ar_snoc = ath10k_snoc_priv(ar);
 	ce_pipe = snoc_pipe->ce_hdl;
 	ce_ring = ce_pipe->src_ring;
 
-- 
2.7.4



^ permalink raw reply related

* Re: BUG: vnet0 selects TX queue 11, but real number of TX queues is 11
From: Stanislav Fomichev @ 2019-01-30  2:52 UTC (permalink / raw)
  To: George Amanakis; +Cc: linux-kernel, Netdev
In-Reply-To: <20190130021621.17250-1-gamanakis@gmail.com>

On Tue, Jan 29, 2019 at 6:16 PM George Amanakis <gamanakis@gmail.com> wrote:
>
> Since 4.20.4 when running a KVM with vhost_net I am seeing in dmesg:
> vnet0 selects TX queue 11, but real number of TX queues is 11
>
> The corresponding part in the xml definition of the virtual machine is:
> -------8<-------
> <interface type='bridge'>
>   <mac address='xx:xx:xx:xx:xx:xx'/>
>   <source bridge='br0'/>
>   <model type='virtio'/>
>   <driver name='vhost' queues='12'>
>   </driver>
>   <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
> </interface>
> -------8<-------
>
> Doing a git-bisect with 4.20.3 last known good, and 4.20.4 as bad, this
> commit turned up:
> -------8<-------
> commit 9ff0436e2c3575ffe64d359fb3b67aee237dc519
> Author: Stanislav Fomichev <sdf@google.com>
> Date:   Mon Jan 7 13:38:38 2019 -0800
>
>     tun: publish tfile after it's fully initialized
>
>     [ Upstream commit 0b7959b6257322f7693b08a459c505d4938646f2 ]
>
>     BUG: unable to handle kernel NULL pointer dereference at
> 00000000000000d1
> -------8<-------
>
>
> Applying the following patch corrects it in 4.20.5. Would this be the
> correct thing to do?
Ouch, tun_set_real_num_queues uses tun->numqueues internally :-(
Your patch looks good to me, care to do a proper submission (with a Fixes tag)?
I wonder whether we should use it as an opportunity to also do
something like the following (to make it more explicit):

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 18656c4094b3..ea9928b3b930 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -632,10 +632,10 @@ static inline bool tun_not_capable(struct tun_struct *tun)
  !ns_capable(net->user_ns, CAP_NET_ADMIN);
 }

-static void tun_set_real_num_queues(struct tun_struct *tun)
+static void tun_set_real_num_queues(struct tun_struct *tun, unsigned int nr)
 {
- netif_set_real_num_tx_queues(tun->dev, tun->numqueues);
- netif_set_real_num_rx_queues(tun->dev, tun->numqueues);
+ netif_set_real_num_tx_queues(tun->dev, nr);
+ netif_set_real_num_rx_queues(tun->dev, nr);
 }

 static void tun_disable_queue(struct tun_struct *tun, struct tun_file *tfile)
@@ -712,7 +712,7 @@ static void __tun_detach(struct tun_file *tfile, bool clean)
  tun_flow_delete_by_queue(tun, tun->numqueues + 1);
  /* Drop read queue */
  tun_queue_purge(tfile);
- tun_set_real_num_queues(tun);
+ tun_set_real_num_queues(tun, tun->numqueues);
  } else if (tfile->detached && clean) {
  tun = tun_enable_queue(tfile);
  sock_put(&tfile->sk);
@@ -866,8 +866,6 @@ static int tun_attach(struct tun_struct *tun, struct file *file,
  if (rtnl_dereference(tun->xdp_prog))
  sock_set_flag(&tfile->sk, SOCK_XDP);

- tun_set_real_num_queues(tun);
-
  /* device is allowed to go away first, so no need to hold extra
  * refcnt.
  */
@@ -879,6 +877,7 @@ static int tun_attach(struct tun_struct *tun, struct file *file,
  rcu_assign_pointer(tfile->tun, tun);
  rcu_assign_pointer(tun->tfiles[tun->numqueues], tfile);
  tun->numqueues++;
+ tun_set_real_num_queues(tun, tun->numqueues);
 out:
  return err;
 }
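
For anyone following along, the off-by-one can be reproduced with a toy
userspace model. This is only a sketch: the toy_ names are made up and
stand in for tun_struct and tun_set_real_num_queues(), assuming the
helper simply copies numqueues at the time it is called.

```c
/* Toy stand-in for struct tun_struct; not the real driver structure. */
struct toy_tun {
	unsigned int numqueues;		/* queues attached so far */
	unsigned int real_num_tx;	/* count published to the stack */
};

/* Models tun_set_real_num_queues(): reads numqueues when called. */
static void toy_set_real_num_queues(struct toy_tun *tun)
{
	tun->real_num_tx = tun->numqueues;
}

/* Ordering that triggers the warning: publish first, then increment. */
static void toy_attach_before_increment(struct toy_tun *tun)
{
	toy_set_real_num_queues(tun);
	tun->numqueues++;
}

/* Ordering from the proposed fix: increment first, then publish. */
static void toy_attach_after_increment(struct toy_tun *tun)
{
	tun->numqueues++;
	toy_set_real_num_queues(tun);
}
```

Attaching 12 queues with the first ordering leaves real_num_tx at 11
while queue index 11 exists, which matches the "selects TX queue 11, but
real number of TX queues is 11" message; the second ordering ends at 12.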


> ---
>  drivers/net/tun.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index 6658658246d2..e0dc004c6483 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -862,8 +862,6 @@ static int tun_attach(struct tun_struct *tun, struct file *file,
>         if (rtnl_dereference(tun->xdp_prog))
>                 sock_set_flag(&tfile->sk, SOCK_XDP);
>
> -       tun_set_real_num_queues(tun);
> -
>         /* device is allowed to go away first, so no need to hold extra
>          * refcnt.
>          */
> @@ -875,6 +873,9 @@ static int tun_attach(struct tun_struct *tun, struct file *file,
>         rcu_assign_pointer(tfile->tun, tun);
>         rcu_assign_pointer(tun->tfiles[tun->numqueues], tfile);
>         tun->numqueues++;
> +
> +       tun_set_real_num_queues(tun);
> +
>  out:
>         return err;
>  }
> --
> 2.20.1
>

^ permalink raw reply related

* [PATCH] Revert "ethtool: change to new sane powerpc64 kernel headers"
From: Maciej Żenczykowski @ 2019-01-30  2:48 UTC (permalink / raw)
  To: Maciej Żenczykowski, John W . Linville; +Cc: netdev
In-Reply-To: <CANP3RGdkA4LNNiC3UPcz3b4oGMBPFD2uEyKBMovXJz79eqiO-g@mail.gmail.com>

From: Maciej Żenczykowski <maze@google.com>

This reverts commit 4df55c81996dfb1dbe98c93ee62d8067ed5073a9.

It turns out this is not needed due to:
    commit c0a2c04b3cbf6d399a2551654401957ddb529a50
    internal.h: change to new sane kernel headers on 64-bit archs
which I apparently entirely forgot about while trying
to synchronize internal and upstream git repositories.

Change-Id: I56d90a3c1e9b66c30526824fb7bc41aab01d85d1
Signed-off-by: Maciej Żenczykowski <maze@google.com>
---
 ethtool-copy.h | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/ethtool-copy.h b/ethtool-copy.h
index 7772a4970987..6bfbb85f9402 100644
--- a/ethtool-copy.h
+++ b/ethtool-copy.h
@@ -14,12 +14,6 @@
 #ifndef _LINUX_ETHTOOL_H
 #define _LINUX_ETHTOOL_H
 
-#ifdef __powerpc64__
-/* Powerpc needs __SANE_USERSPACE_TYPES__ before <linux/types.h> to select
- * 'int-ll64.h' and avoid compile warnings when printing __u64 with %llu.
- */
-#define __SANE_USERSPACE_TYPES__
-#endif
 #include <linux/kernel.h>
 #include <linux/types.h>
 #include <linux/if_ether.h>
-- 
2.20.1.495.gaa96b0ce6b-goog


^ permalink raw reply related

* Re: ethtool - manual changes in ethtool-copy.h
From: Maciej Żenczykowski @ 2019-01-30  2:42 UTC (permalink / raw)
  To: John W. Linville; +Cc: Michal Kubecek, Linux NetDev
In-Reply-To: <CANP3RGf-MmOYgBV4fQwoRbg_+OAvKTFKGM2EorR3ig9WfevHZg@mail.gmail.com>

Eh, I think this patch can simply be reverted.
This is a dupe.  ethtool-copy.h is only included from internal.h which
already includes:

...
#ifndef ETHTOOL_INTERNAL_H__
#define ETHTOOL_INTERNAL_H__

/* Some platforms (eg. ppc64) need __SANE_USERSPACE_TYPES__ before
 * <linux/types.h> to select 'int-ll64.h' and avoid compile warnings
 * when printing __u64 with %llu.
 */
#define __SANE_USERSPACE_TYPES__
...

which came in:

commit c0a2c04b3cbf6d399a2551654401957ddb529a50
Author: Maciej Żenczykowski <maze@google.com>
Date:   Fri Mar 11 09:58:14 2016 -0800

    internal.h: change to new sane kernel headers on 64-bit archs

    On ppc64, this fixes:
      ...
    Signed-off-by: Maciej Żenczykowski <maze@google.com>
    Signed-off-by: David Decotigny <decot@googlers.com>
    Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

I think I missed this while trying to synchronize internal with upstream repos.
(the local copy of the patch was obviously in the wrong spot)
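
As a small illustration of why the define has to come first (a sketch
only; toy_format_u64 is a made-up helper, not something in ethtool):

```c
/* Must precede anything that pulls in <linux/types.h>, so that __u64 is
 * taken from int-ll64.h and is 'unsigned long long' on every arch.
 */
#define __SANE_USERSPACE_TYPES__
#include <linux/types.h>
#include <stdio.h>

/* Hypothetical helper: formats a __u64 with %llu, which is only
 * warning-free when __u64 really is unsigned long long.
 */
static int toy_format_u64(char *buf, size_t len, __u64 value)
{
	return snprintf(buf, len, "%llu", value);
}
```

Since internal.h already does this before including ethtool-copy.h, the
second define inside ethtool-copy.h adds nothing.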

^ permalink raw reply

* Re: [PATCH iproute2-next 2/2] ss: add AF_XDP support
From: David Ahern @ 2019-01-30  2:39 UTC (permalink / raw)
  To: bjorn.topel, netdev, stephen
  Cc: Björn Töpel, magnus.karlsson, magnus.karlsson
In-Reply-To: <20190125071848.25959-3-bjorn.topel@gmail.com>

On 1/25/19 12:18 AM, bjorn.topel@gmail.com wrote:
> From: Björn Töpel <bjorn.topel@intel.com>
> 
> AF_XDP is an address family that is optimized for high performance
> packet processing.
> 
> This patch adds AF_XDP support to ss(8) so that sockets can be queried
> and monitored.
> 
> Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
> ---
>  man/man8/ss.8 |   9 ++-
>  misc/ss.c     | 168 +++++++++++++++++++++++++++++++++++++++++++++++++-
>  2 files changed, 172 insertions(+), 5 deletions(-)
> 

AF_XDP is not currently defined for a number of distributions. Add a
definition to include/utils.h similar to what is done for MPLS.

Also, please add example output to the commit log.
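
Something along these lines would do as the fallback. This is a sketch,
assuming the value 44 that the kernel uses for AF_XDP in
include/linux/socket.h; toy_is_xdp_family is a made-up helper just to
show the constant in use.

```c
#include <sys/socket.h>

/* Fallback for libc headers that predate AF_XDP. */
#ifndef AF_XDP
# define AF_XDP 44
#endif

/* Hypothetical helper: true if the given family is AF_XDP. */
static int toy_is_xdp_family(int family)
{
	return family == AF_XDP;
}
```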

^ permalink raw reply

* Re: [PATCH iproute2-next v2] netns: add subcommand to attach an existing network namespace
From: David Ahern @ 2019-01-30  2:32 UTC (permalink / raw)
  To: Matteo Croce, netdev; +Cc: Stephen Hemminger, Andrea Claudi
In-Reply-To: <20190129150115.19965-1-mcroce@redhat.com>

On 1/29/19 8:01 AM, Matteo Croce wrote:
> ip tracks namespaces with dummy files in /var/run/netns/, but can't see
> namespaces created with other tools.
> Creating the dummy file and bind mounting the correct procfs entry will
> make ip aware of that namespace.
> Add an ip netns subcommand to automate this task.
> 
> Signed-off-by: Matteo Croce <mcroce@redhat.com>
> ---
>  ip/ipnetns.c        | 62 ++++++++++++++++++++++++++++++++++-----------
>  man/man8/ip-netns.8 | 10 ++++++++
>  2 files changed, 57 insertions(+), 15 deletions(-)
> 

applied to iproute2-next. Thanks


^ permalink raw reply
