* Re: [RFC 03/14] net: hstats: add basic/core functionality
From: David Ahern @ 2019-01-30 4:18 UTC (permalink / raw)
To: Jakub Kicinski, davem
Cc: oss-drivers, netdev, jiri, f.fainelli, andrew, mkubecek,
simon.horman, jesse.brandeburg, maciejromanfijalkowski,
vasundhara-v.volam, michael.chan, shalomt, idosch
In-Reply-To: <20190128234507.32028-4-jakub.kicinski@netronome.com>
On 1/28/19 4:44 PM, Jakub Kicinski wrote:
> @@ -4946,6 +4964,9 @@ static size_t if_nlmsg_stats_size(const struct net_device *dev,
> rcu_read_unlock();
> }
>
> + if (stats_attr_valid(filter_mask, IFLA_STATS_LINK_HSTATS, 0))
filter_mask is populated by RTEXT_FILTER_ from
include/uapi/linux/rtnetlink.h
> + size += rtnl_get_link_hstats_size(dev);
rtnl_get_link_hstats_size == __rtnl_get_link_hstats can return < 0.
> +
> return size;
> }
>
>
^ permalink raw reply
* Re: [PATCH bpf-next 0/4] bpf: fixes for lockdep and deadlock
From: Alexei Starovoitov @ 2019-01-30 4:07 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: David S. Miller, Daniel Borkmann, Peter Zijlstra, Eric Dumazet,
Jann Horn, Network Development, Kernel Team
In-Reply-To: <20190130040458.2544340-1-ast@kernel.org>
On Tue, Jan 29, 2019 at 8:06 PM Alexei Starovoitov <ast@kernel.org> wrote:
>
> In addition to preempt_disable patch for socket filters
> https://patchwork.ozlabs.org/patch/1032437/
> the first three patches fix various lockdep false positives.
> Last patch fixes potential deadlock in stackmap access from
> tracing bpf prog and from syscall.
Typo in subject.
All patches are for 'bpf' tree.
^ permalink raw reply
* [PATCH bpf-next 3/4] bpf: fix lockdep false positive in bpf_prog_register
From: Alexei Starovoitov @ 2019-01-30 4:04 UTC (permalink / raw)
To: davem; +Cc: daniel, peterz, edumazet, jannh, netdev, kernel-team
In-Reply-To: <20190130040458.2544340-1-ast@kernel.org>
Lockdep warns about false positive:
[ 13.007000] WARNING: possible circular locking dependency detected
[ 13.007587] 5.0.0-rc3-00018-g2fa53f892422-dirty #477 Not tainted
[ 13.008124] ------------------------------------------------------
[ 13.008624] test_progs/246 is trying to acquire lock:
[ 13.009030] 0000000094160d1d (tracepoints_mutex){+.+.}, at: tracepoint_probe_register_prio+0x2d/0x300
[ 13.009770]
[ 13.009770] but task is already holding lock:
[ 13.010239] 00000000d663ef86 (bpf_event_mutex){+.+.}, at: bpf_probe_register+0x1d/0x60
[ 13.010877]
[ 13.010877] which lock already depends on the new lock.
[ 13.010877]
[ 13.011532]
[ 13.011532] the existing dependency chain (in reverse order) is:
[ 13.012129]
[ 13.012129] -> #4 (bpf_event_mutex){+.+.}:
[ 13.012582] perf_event_query_prog_array+0x9b/0x130
[ 13.013016] _perf_ioctl+0x3aa/0x830
[ 13.013354] perf_ioctl+0x2e/0x50
[ 13.013668] do_vfs_ioctl+0x8f/0x6a0
[ 13.014003] ksys_ioctl+0x70/0x80
[ 13.014320] __x64_sys_ioctl+0x16/0x20
[ 13.014668] do_syscall_64+0x4a/0x180
[ 13.015007] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 13.015469]
[ 13.015469] -> #3 (&cpuctx_mutex){+.+.}:
[ 13.015910] perf_event_init_cpu+0x5a/0x90
[ 13.016291] perf_event_init+0x1b2/0x1de
[ 13.016654] start_kernel+0x2b8/0x42a
[ 13.016995] secondary_startup_64+0xa4/0xb0
[ 13.017382]
[ 13.017382] -> #2 (pmus_lock){+.+.}:
[ 13.017794] perf_event_init_cpu+0x21/0x90
[ 13.018172] cpuhp_invoke_callback+0xb3/0x960
[ 13.018573] _cpu_up+0xa7/0x140
[ 13.018871] do_cpu_up+0xa4/0xc0
[ 13.019178] smp_init+0xcd/0xd2
[ 13.019483] kernel_init_freeable+0x123/0x24f
[ 13.019878] kernel_init+0xa/0x110
[ 13.020201] ret_from_fork+0x24/0x30
[ 13.020541]
[ 13.020541] -> #1 (cpu_hotplug_lock.rw_sem){++++}:
[ 13.021051] static_key_slow_inc+0xe/0x20
[ 13.021424] tracepoint_probe_register_prio+0x28c/0x300
[ 13.021891] perf_trace_event_init+0x11f/0x250
[ 13.022297] perf_trace_init+0x6b/0xa0
[ 13.022644] perf_tp_event_init+0x25/0x40
[ 13.023011] perf_try_init_event+0x6b/0x90
[ 13.023386] perf_event_alloc+0x9a8/0xc40
[ 13.023754] __do_sys_perf_event_open+0x1dd/0xd30
[ 13.024173] do_syscall_64+0x4a/0x180
[ 13.024519] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 13.024968]
[ 13.024968] -> #0 (tracepoints_mutex){+.+.}:
[ 13.025434] __mutex_lock+0x86/0x970
[ 13.025764] tracepoint_probe_register_prio+0x2d/0x300
[ 13.026215] bpf_probe_register+0x40/0x60
[ 13.026584] bpf_raw_tracepoint_open.isra.34+0xa4/0x130
[ 13.027042] __do_sys_bpf+0x94f/0x1a90
[ 13.027389] do_syscall_64+0x4a/0x180
[ 13.027727] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 13.028171]
[ 13.028171] other info that might help us debug this:
[ 13.028171]
[ 13.028807] Chain exists of:
[ 13.028807] tracepoints_mutex --> &cpuctx_mutex --> bpf_event_mutex
[ 13.028807]
[ 13.029666] Possible unsafe locking scenario:
[ 13.029666]
[ 13.030140] CPU0 CPU1
[ 13.030510] ---- ----
[ 13.030875] lock(bpf_event_mutex);
[ 13.031166] lock(&cpuctx_mutex);
[ 13.031645] lock(bpf_event_mutex);
[ 13.032135] lock(tracepoints_mutex);
[ 13.032441]
[ 13.032441] *** DEADLOCK ***
[ 13.032441]
[ 13.032911] 1 lock held by test_progs/246:
[ 13.033239] #0: 00000000d663ef86 (bpf_event_mutex){+.+.}, at: bpf_probe_register+0x1d/0x60
[ 13.033909]
[ 13.033909] stack backtrace:
[ 13.034258] CPU: 1 PID: 246 Comm: test_progs Not tainted 5.0.0-rc3-00018-g2fa53f892422-dirty #477
[ 13.034964] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014
[ 13.035657] Call Trace:
[ 13.035859] dump_stack+0x5f/0x8b
[ 13.036130] print_circular_bug.isra.37+0x1ce/0x1db
[ 13.036526] __lock_acquire+0x1158/0x1350
[ 13.036852] ? lock_acquire+0x98/0x190
[ 13.037154] lock_acquire+0x98/0x190
[ 13.037447] ? tracepoint_probe_register_prio+0x2d/0x300
[ 13.037876] __mutex_lock+0x86/0x970
[ 13.038167] ? tracepoint_probe_register_prio+0x2d/0x300
[ 13.038600] ? tracepoint_probe_register_prio+0x2d/0x300
[ 13.039028] ? __mutex_lock+0x86/0x970
[ 13.039337] ? __mutex_lock+0x24a/0x970
[ 13.039649] ? bpf_probe_register+0x1d/0x60
[ 13.039992] ? __bpf_trace_sched_wake_idle_without_ipi+0x10/0x10
[ 13.040478] ? tracepoint_probe_register_prio+0x2d/0x300
[ 13.040906] tracepoint_probe_register_prio+0x2d/0x300
[ 13.041325] bpf_probe_register+0x40/0x60
[ 13.041649] bpf_raw_tracepoint_open.isra.34+0xa4/0x130
[ 13.042068] ? __might_fault+0x3e/0x90
[ 13.042374] __do_sys_bpf+0x94f/0x1a90
[ 13.042678] do_syscall_64+0x4a/0x180
[ 13.042975] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 13.043382] RIP: 0033:0x7f23b10a07f9
[ 13.045155] RSP: 002b:00007ffdef42fdd8 EFLAGS: 00000202 ORIG_RAX: 0000000000000141
[ 13.045759] RAX: ffffffffffffffda RBX: 00007ffdef42ff70 RCX: 00007f23b10a07f9
[ 13.046326] RDX: 0000000000000070 RSI: 00007ffdef42fe10 RDI: 0000000000000011
[ 13.046893] RBP: 00007ffdef42fdf0 R08: 0000000000000038 R09: 00007ffdef42fe10
[ 13.047462] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000000
[ 13.048029] R13: 0000000000000016 R14: 00007f23b1db4690 R15: 0000000000000000
Lockdep seems to be confusing different mutexes.
Such deadlock is not possible.
Since tracepoints_mutex will be taken in tracepoint_probe_register/unregister()
there is no need to take bpf_event_mutex too.
bpf_event_mutex is protecting modifications to prog array used in kprobe/perf bpf progs.
bpf_raw_tracepoints don't need to take this mutex.
Fixes: c4f6699dfcb8 ("bpf: introduce BPF_RAW_TRACEPOINT")
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
kernel/trace/bpf_trace.c | 14 ++------------
1 file changed, 2 insertions(+), 12 deletions(-)
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 8b068adb9da1..f1a86a0d881d 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -1204,22 +1204,12 @@ static int __bpf_probe_register(struct bpf_raw_event_map *btp, struct bpf_prog *
int bpf_probe_register(struct bpf_raw_event_map *btp, struct bpf_prog *prog)
{
- int err;
-
- mutex_lock(&bpf_event_mutex);
- err = __bpf_probe_register(btp, prog);
- mutex_unlock(&bpf_event_mutex);
- return err;
+ return __bpf_probe_register(btp, prog);
}
int bpf_probe_unregister(struct bpf_raw_event_map *btp, struct bpf_prog *prog)
{
- int err;
-
- mutex_lock(&bpf_event_mutex);
- err = tracepoint_probe_unregister(btp->tp, (void *)btp->bpf_func, prog);
- mutex_unlock(&bpf_event_mutex);
- return err;
+ return tracepoint_probe_unregister(btp->tp, (void *)btp->bpf_func, prog);
}
int bpf_get_perf_event_info(const struct perf_event *event, u32 *prog_id,
--
2.20.0
^ permalink raw reply related
* [PATCH bpf-next 4/4] bpf: Fix syscall's stackmap lookup potential deadlock
From: Alexei Starovoitov @ 2019-01-30 4:04 UTC (permalink / raw)
To: davem; +Cc: daniel, peterz, edumazet, jannh, netdev, kernel-team
In-Reply-To: <20190130040458.2544340-1-ast@kernel.org>
From: Martin KaFai Lau <kafai@fb.com>
The map_lookup_elem used to not acquiring spinlock
in order to optimize the reader.
It was true until commit 557c0c6e7df8 ("bpf: convert stackmap to pre-allocation")
The syscall's map_lookup_elem(stackmap) calls bpf_stackmap_copy().
bpf_stackmap_copy() may find the elem no longer needed after the copy is done.
If that is the case, pcpu_freelist_push() saves this elem for reuse later.
This push requires a spinlock.
If a tracing bpf_prog got run in the middle of the syscall's
map_lookup_elem(stackmap) and this tracing bpf_prog is calling
bpf_get_stackid(stackmap) which also requires the same pcpu_freelist's
spinlock, it may end up with a dead lock situation as reported by
Eric Dumazet in https://patchwork.ozlabs.org/patch/1030266/
The situation is the same as the syscall's map_update_elem() which
needs to acquire the pcpu_freelist's spinlock and could race
with tracing bpf_prog. Hence, this patch fixes it by protecting
bpf_stackmap_copy() with this_cpu_inc(bpf_prog_active)
to prevent tracing bpf_prog from running.
A later syscall's map_lookup_elem commit f1a2e44a3aec ("bpf: add queue and stack maps")
also acquires a spinlock and races with tracing bpf_prog similarly.
Hence, this patch is forward looking and protects the majority
of the map lookups. bpf_map_offload_lookup_elem() is the exception
since it is for network bpf_prog only (i.e. never called by tracing
bpf_prog).
Fixes: 557c0c6e7df8 ("bpf: convert stackmap to pre-allocation")
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
kernel/bpf/syscall.c | 12 ++++++++++--
1 file changed, 10 insertions(+), 2 deletions(-)
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index b155cd17c1bd..8577bb7f8be6 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -713,8 +713,13 @@ static int map_lookup_elem(union bpf_attr *attr)
if (bpf_map_is_dev_bound(map)) {
err = bpf_map_offload_lookup_elem(map, key, value);
- } else if (map->map_type == BPF_MAP_TYPE_PERCPU_HASH ||
- map->map_type == BPF_MAP_TYPE_LRU_PERCPU_HASH) {
+ goto done;
+ }
+
+ preempt_disable();
+ this_cpu_inc(bpf_prog_active);
+ if (map->map_type == BPF_MAP_TYPE_PERCPU_HASH ||
+ map->map_type == BPF_MAP_TYPE_LRU_PERCPU_HASH) {
err = bpf_percpu_hash_copy(map, key, value);
} else if (map->map_type == BPF_MAP_TYPE_PERCPU_ARRAY) {
err = bpf_percpu_array_copy(map, key, value);
@@ -744,7 +749,10 @@ static int map_lookup_elem(union bpf_attr *attr)
}
rcu_read_unlock();
}
+ this_cpu_dec(bpf_prog_active);
+ preempt_enable();
+done:
if (err)
goto free_value;
--
2.20.0
^ permalink raw reply related
* [PATCH bpf-next 2/4] bpf: fix lockdep false positive in stackmap
From: Alexei Starovoitov @ 2019-01-30 4:04 UTC (permalink / raw)
To: davem; +Cc: daniel, peterz, edumazet, jannh, netdev, kernel-team
In-Reply-To: <20190130040458.2544340-1-ast@kernel.org>
Lockdep warns about false positive:
[ 11.211460] ------------[ cut here ]------------
[ 11.211936] DEBUG_LOCKS_WARN_ON(depth <= 0)
[ 11.211985] WARNING: CPU: 0 PID: 141 at ../kernel/locking/lockdep.c:3592 lock_release+0x1ad/0x280
[ 11.213134] Modules linked in:
[ 11.213413] CPU: 0 PID: 141 Comm: systemd-journal Not tainted 5.0.0-rc3-00018-g2fa53f892422-dirty #476
[ 11.214191] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014
[ 11.214954] RIP: 0010:lock_release+0x1ad/0x280
[ 11.217036] RSP: 0018:ffff88813ba03f50 EFLAGS: 00010086
[ 11.217516] RAX: 000000000000001f RBX: ffff8881378d8000 RCX: 0000000000000000
[ 11.218179] RDX: ffffffff810d3e9e RSI: 0000000000000001 RDI: ffffffff810d3eb3
[ 11.218851] RBP: ffff8881393e2b08 R08: 0000000000000002 R09: 0000000000000000
[ 11.219504] R10: 0000000000000000 R11: ffff88813ba03d9d R12: ffffffff8118dfa2
[ 11.220162] R13: 0000000000000086 R14: 0000000000000000 R15: 0000000000000000
[ 11.220717] FS: 00007f3c8cf35780(0000) GS:ffff88813ba00000(0000) knlGS:0000000000000000
[ 11.221348] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 11.221822] CR2: 00007f5825d92080 CR3: 00000001378c8005 CR4: 00000000003606f0
[ 11.222381] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 11.222951] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 11.223508] Call Trace:
[ 11.223705] <IRQ>
[ 11.223874] ? __local_bh_enable+0x7a/0x80
[ 11.224199] up_read+0x1c/0xa0
[ 11.224446] do_up_read+0x12/0x20
[ 11.224713] irq_work_run_list+0x43/0x70
[ 11.225030] irq_work_run+0x26/0x50
[ 11.225310] smp_irq_work_interrupt+0x57/0x1f0
[ 11.225662] irq_work_interrupt+0xf/0x20
since rw_semaphore is released in a different task vs task that locked the sema.
It is expected behavior.
Silence the warning by using up_read_non_owner().
Fixes: bae77c5eb5b2 ("bpf: enable stackmap with build_id in nmi context")
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
kernel/bpf/stackmap.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index d43b14535827..4b79e7c251e5 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -44,7 +44,7 @@ static void do_up_read(struct irq_work *entry)
struct stack_map_irq_work *work;
work = container_of(entry, struct stack_map_irq_work, irq_work);
- up_read(work->sem);
+ up_read_non_owner(work->sem);
work->sem = NULL;
}
--
2.20.0
^ permalink raw reply related
* [PATCH bpf-next 1/4] bpf: fix lockdep false positive in percpu_freelist
From: Alexei Starovoitov @ 2019-01-30 4:04 UTC (permalink / raw)
To: davem; +Cc: daniel, peterz, edumazet, jannh, netdev, kernel-team
In-Reply-To: <20190130040458.2544340-1-ast@kernel.org>
Lockdep warns about false positive:
[ 12.492084] 00000000e6b28347 (&head->lock){+...}, at: pcpu_freelist_push+0x2a/0x40
[ 12.492696] but this lock was taken by another, HARDIRQ-safe lock in the past:
[ 12.493275] (&rq->lock){-.-.}
[ 12.493276]
[ 12.493276]
[ 12.493276] and interrupts could create inverse lock ordering between them.
[ 12.493276]
[ 12.494435]
[ 12.494435] other info that might help us debug this:
[ 12.494979] Possible interrupt unsafe locking scenario:
[ 12.494979]
[ 12.495518] CPU0 CPU1
[ 12.495879] ---- ----
[ 12.496243] lock(&head->lock);
[ 12.496502] local_irq_disable();
[ 12.496969] lock(&rq->lock);
[ 12.497431] lock(&head->lock);
[ 12.497890] <Interrupt>
[ 12.498104] lock(&rq->lock);
[ 12.498368]
[ 12.498368] *** DEADLOCK ***
[ 12.498368]
[ 12.498837] 1 lock held by dd/276:
[ 12.499110] #0: 00000000c58cb2ee (rcu_read_lock){....}, at: trace_call_bpf+0x5e/0x240
[ 12.499747]
[ 12.499747] the shortest dependencies between 2nd lock and 1st lock:
[ 12.500389] -> (&rq->lock){-.-.} {
[ 12.500669] IN-HARDIRQ-W at:
[ 12.500934] _raw_spin_lock+0x2f/0x40
[ 12.501373] scheduler_tick+0x4c/0xf0
[ 12.501812] update_process_times+0x40/0x50
[ 12.502294] tick_periodic+0x27/0xb0
[ 12.502723] tick_handle_periodic+0x1f/0x60
[ 12.503203] timer_interrupt+0x11/0x20
[ 12.503651] __handle_irq_event_percpu+0x43/0x2c0
[ 12.504167] handle_irq_event_percpu+0x20/0x50
[ 12.504674] handle_irq_event+0x37/0x60
[ 12.505139] handle_level_irq+0xa7/0x120
[ 12.505601] handle_irq+0xa1/0x150
[ 12.506018] do_IRQ+0x77/0x140
[ 12.506411] ret_from_intr+0x0/0x1d
[ 12.506834] _raw_spin_unlock_irqrestore+0x53/0x60
[ 12.507362] __setup_irq+0x481/0x730
[ 12.507789] setup_irq+0x49/0x80
[ 12.508195] hpet_time_init+0x21/0x32
[ 12.508644] x86_late_time_init+0xb/0x16
[ 12.509106] start_kernel+0x390/0x42a
[ 12.509554] secondary_startup_64+0xa4/0xb0
[ 12.510034] IN-SOFTIRQ-W at:
[ 12.510305] _raw_spin_lock+0x2f/0x40
[ 12.510772] try_to_wake_up+0x1c7/0x4e0
[ 12.511220] swake_up_locked+0x20/0x40
[ 12.511657] swake_up_one+0x1a/0x30
[ 12.512070] rcu_process_callbacks+0xc5/0x650
[ 12.512553] __do_softirq+0xe6/0x47b
[ 12.512978] irq_exit+0xc3/0xd0
[ 12.513372] smp_apic_timer_interrupt+0xa9/0x250
[ 12.513876] apic_timer_interrupt+0xf/0x20
[ 12.514343] default_idle+0x1c/0x170
[ 12.514765] do_idle+0x199/0x240
[ 12.515159] cpu_startup_entry+0x19/0x20
[ 12.515614] start_kernel+0x422/0x42a
[ 12.516045] secondary_startup_64+0xa4/0xb0
[ 12.516521] INITIAL USE at:
[ 12.516774] _raw_spin_lock_irqsave+0x38/0x50
[ 12.517258] rq_attach_root+0x16/0xd0
[ 12.517685] sched_init+0x2f2/0x3eb
[ 12.518096] start_kernel+0x1fb/0x42a
[ 12.518525] secondary_startup_64+0xa4/0xb0
[ 12.518986] }
[ 12.519132] ... key at: [<ffffffff82b7bc28>] __key.71384+0x0/0x8
[ 12.519649] ... acquired at:
[ 12.519892] pcpu_freelist_pop+0x7b/0xd0
[ 12.520221] bpf_get_stackid+0x1d2/0x4d0
[ 12.520563] ___bpf_prog_run+0x8b4/0x11a0
[ 12.520887]
[ 12.521008] -> (&head->lock){+...} {
[ 12.521292] HARDIRQ-ON-W at:
[ 12.521539] _raw_spin_lock+0x2f/0x40
[ 12.521950] pcpu_freelist_push+0x2a/0x40
[ 12.522396] bpf_get_stackid+0x494/0x4d0
[ 12.522828] ___bpf_prog_run+0x8b4/0x11a0
[ 12.523296] INITIAL USE at:
[ 12.523537] _raw_spin_lock+0x2f/0x40
[ 12.523944] pcpu_freelist_populate+0xc0/0x120
[ 12.524417] htab_map_alloc+0x405/0x500
[ 12.524835] __do_sys_bpf+0x1a3/0x1a90
[ 12.525253] do_syscall_64+0x4a/0x180
[ 12.525659] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 12.526167] }
[ 12.526311] ... key at: [<ffffffff838f7668>] __key.13130+0x0/0x8
[ 12.526812] ... acquired at:
[ 12.527047] __lock_acquire+0x521/0x1350
[ 12.527371] lock_acquire+0x98/0x190
[ 12.527680] _raw_spin_lock+0x2f/0x40
[ 12.527994] pcpu_freelist_push+0x2a/0x40
[ 12.528325] bpf_get_stackid+0x494/0x4d0
[ 12.528645] ___bpf_prog_run+0x8b4/0x11a0
[ 12.528970]
[ 12.529092]
[ 12.529092] stack backtrace:
[ 12.529444] CPU: 0 PID: 276 Comm: dd Not tainted 5.0.0-rc3-00018-g2fa53f892422 #475
[ 12.530043] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014
[ 12.530750] Call Trace:
[ 12.530948] dump_stack+0x5f/0x8b
[ 12.531248] check_usage_backwards+0x10c/0x120
[ 12.531598] ? ___bpf_prog_run+0x8b4/0x11a0
[ 12.531935] ? mark_lock+0x382/0x560
[ 12.532229] mark_lock+0x382/0x560
[ 12.532496] ? print_shortest_lock_dependencies+0x180/0x180
[ 12.532928] __lock_acquire+0x521/0x1350
[ 12.533271] ? find_get_entry+0x17f/0x2e0
[ 12.533586] ? find_get_entry+0x19c/0x2e0
[ 12.533902] ? lock_acquire+0x98/0x190
[ 12.534196] lock_acquire+0x98/0x190
[ 12.534482] ? pcpu_freelist_push+0x2a/0x40
[ 12.534810] _raw_spin_lock+0x2f/0x40
[ 12.535099] ? pcpu_freelist_push+0x2a/0x40
[ 12.535432] pcpu_freelist_push+0x2a/0x40
[ 12.535750] bpf_get_stackid+0x494/0x4d0
[ 12.536062] ___bpf_prog_run+0x8b4/0x11a0
It has been explained that is a false positive here:
https://lkml.org/lkml/2018/7/25/756
Recap:
- stackmap uses pcpu_freelist
- The lock in pcpu_freelist is a percpu lock
- stackmap is only used by tracing bpf_prog
- A tracing bpf_prog cannot be run if another bpf_prog
has already been running (ensured by the percpu bpf_prog_active counter).
Eric pointed out that this lockdep splats stops other
legit lockdep splats in selftests/bpf/test_progs.c.
Fix this by calling local_irq_save/restore for stackmap.
Another false positive had also been worked around by calling
local_irq_save in commit 89ad2fa3f043 ("bpf: fix lockdep splat").
That commit added unnecessary irq_save/restore to fast path of
bpf hash map. irqs are already disabled at that point, since htab
is holding per bucket spin_lock with irqsave.
Let's reduce overhead for htab by introducing __pcpu_freelist_push/pop
function w/o irqsave and convert pcpu_freelist_push/pop to irqsave
to be used elsewhere (right now only in stackmap).
It stops lockdep false positive in stackmap with a bit of acceptable overhead.
Fixes: 557c0c6e7df8 ("bpf: convert stackmap to pre-allocation")
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
kernel/bpf/hashtab.c | 4 ++--
kernel/bpf/percpu_freelist.c | 41 +++++++++++++++++++++++++-----------
kernel/bpf/percpu_freelist.h | 4 ++++
3 files changed, 35 insertions(+), 14 deletions(-)
diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index 4b7c76765d9d..f9274114c88d 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -686,7 +686,7 @@ static void free_htab_elem(struct bpf_htab *htab, struct htab_elem *l)
}
if (htab_is_prealloc(htab)) {
- pcpu_freelist_push(&htab->freelist, &l->fnode);
+ __pcpu_freelist_push(&htab->freelist, &l->fnode);
} else {
atomic_dec(&htab->count);
l->htab = htab;
@@ -748,7 +748,7 @@ static struct htab_elem *alloc_htab_elem(struct bpf_htab *htab, void *key,
} else {
struct pcpu_freelist_node *l;
- l = pcpu_freelist_pop(&htab->freelist);
+ l = __pcpu_freelist_pop(&htab->freelist);
if (!l)
return ERR_PTR(-E2BIG);
l_new = container_of(l, struct htab_elem, fnode);
diff --git a/kernel/bpf/percpu_freelist.c b/kernel/bpf/percpu_freelist.c
index 673fa6fe2d73..0c1b4ba9e90e 100644
--- a/kernel/bpf/percpu_freelist.c
+++ b/kernel/bpf/percpu_freelist.c
@@ -28,8 +28,8 @@ void pcpu_freelist_destroy(struct pcpu_freelist *s)
free_percpu(s->freelist);
}
-static inline void __pcpu_freelist_push(struct pcpu_freelist_head *head,
- struct pcpu_freelist_node *node)
+static inline void ___pcpu_freelist_push(struct pcpu_freelist_head *head,
+ struct pcpu_freelist_node *node)
{
raw_spin_lock(&head->lock);
node->next = head->first;
@@ -37,12 +37,22 @@ static inline void __pcpu_freelist_push(struct pcpu_freelist_head *head,
raw_spin_unlock(&head->lock);
}
-void pcpu_freelist_push(struct pcpu_freelist *s,
+void __pcpu_freelist_push(struct pcpu_freelist *s,
struct pcpu_freelist_node *node)
{
struct pcpu_freelist_head *head = this_cpu_ptr(s->freelist);
- __pcpu_freelist_push(head, node);
+ ___pcpu_freelist_push(head, node);
+}
+
+void pcpu_freelist_push(struct pcpu_freelist *s,
+ struct pcpu_freelist_node *node)
+{
+ unsigned long flags;
+
+ local_irq_save(flags);
+ __pcpu_freelist_push(s, node);
+ local_irq_restore(flags);
}
void pcpu_freelist_populate(struct pcpu_freelist *s, void *buf, u32 elem_size,
@@ -63,7 +73,7 @@ void pcpu_freelist_populate(struct pcpu_freelist *s, void *buf, u32 elem_size,
for_each_possible_cpu(cpu) {
again:
head = per_cpu_ptr(s->freelist, cpu);
- __pcpu_freelist_push(head, buf);
+ ___pcpu_freelist_push(head, buf);
i++;
buf += elem_size;
if (i == nr_elems)
@@ -74,14 +84,12 @@ void pcpu_freelist_populate(struct pcpu_freelist *s, void *buf, u32 elem_size,
local_irq_restore(flags);
}
-struct pcpu_freelist_node *pcpu_freelist_pop(struct pcpu_freelist *s)
+struct pcpu_freelist_node *__pcpu_freelist_pop(struct pcpu_freelist *s)
{
struct pcpu_freelist_head *head;
struct pcpu_freelist_node *node;
- unsigned long flags;
int orig_cpu, cpu;
- local_irq_save(flags);
orig_cpu = cpu = raw_smp_processor_id();
while (1) {
head = per_cpu_ptr(s->freelist, cpu);
@@ -89,16 +97,25 @@ struct pcpu_freelist_node *pcpu_freelist_pop(struct pcpu_freelist *s)
node = head->first;
if (node) {
head->first = node->next;
- raw_spin_unlock_irqrestore(&head->lock, flags);
+ raw_spin_unlock(&head->lock);
return node;
}
raw_spin_unlock(&head->lock);
cpu = cpumask_next(cpu, cpu_possible_mask);
if (cpu >= nr_cpu_ids)
cpu = 0;
- if (cpu == orig_cpu) {
- local_irq_restore(flags);
+ if (cpu == orig_cpu)
return NULL;
- }
}
}
+
+struct pcpu_freelist_node *pcpu_freelist_pop(struct pcpu_freelist *s)
+{
+ struct pcpu_freelist_node *ret;
+ unsigned long flags;
+
+ local_irq_save(flags);
+ ret = __pcpu_freelist_pop(s);
+ local_irq_restore(flags);
+ return ret;
+}
diff --git a/kernel/bpf/percpu_freelist.h b/kernel/bpf/percpu_freelist.h
index 3049aae8ea1e..c3960118e617 100644
--- a/kernel/bpf/percpu_freelist.h
+++ b/kernel/bpf/percpu_freelist.h
@@ -22,8 +22,12 @@ struct pcpu_freelist_node {
struct pcpu_freelist_node *next;
};
+/* pcpu_freelist_* do spin_lock_irqsave. */
void pcpu_freelist_push(struct pcpu_freelist *, struct pcpu_freelist_node *);
struct pcpu_freelist_node *pcpu_freelist_pop(struct pcpu_freelist *);
+/* __pcpu_freelist_* do spin_lock only. caller must disable irqs. */
+void __pcpu_freelist_push(struct pcpu_freelist *, struct pcpu_freelist_node *);
+struct pcpu_freelist_node *__pcpu_freelist_pop(struct pcpu_freelist *);
void pcpu_freelist_populate(struct pcpu_freelist *s, void *buf, u32 elem_size,
u32 nr_elems);
int pcpu_freelist_init(struct pcpu_freelist *);
--
2.20.0
^ permalink raw reply related
* [PATCH bpf-next 0/4] bpf: fixes for lockdep and deadlock
From: Alexei Starovoitov @ 2019-01-30 4:04 UTC (permalink / raw)
To: davem; +Cc: daniel, peterz, edumazet, jannh, netdev, kernel-team
In addition to preempt_disable patch for socket filters
https://patchwork.ozlabs.org/patch/1032437/
the first three patches fix various lockdep false positives.
Last patch fixes potential deadlock in stackmap access from
tracing bpf prog and from syscall.
Alexei Starovoitov (3):
bpf: fix lockdep false positive in percpu_freelist
bpf: fix lockdep false positive in stackmap
bpf: fix lockdep false positive in bpf_prog_register
Martin KaFai Lau (1):
bpf: Fix syscall's stackmap lookup potential deadlock
kernel/bpf/hashtab.c | 4 ++--
kernel/bpf/percpu_freelist.c | 41 +++++++++++++++++++++++++-----------
kernel/bpf/percpu_freelist.h | 4 ++++
kernel/bpf/stackmap.c | 2 +-
kernel/bpf/syscall.c | 12 +++++++++--
kernel/trace/bpf_trace.c | 14 ++----------
6 files changed, 48 insertions(+), 29 deletions(-)
--
2.20.0
^ permalink raw reply
* Re: [PATCH net-next] cxgb4: cxgb4_tc_u32: use struct_size() in kvzalloc()
From: Gustavo A. R. Silva @ 2019-01-30 3:57 UTC (permalink / raw)
To: David Miller; +Cc: arjun, netdev, linux-kernel
In-Reply-To: <20190129.105724.1719800687435588791.davem@davemloft.net>
On 1/29/19 12:57 PM, David Miller wrote:
>>
>> Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
>
> Applied, thanks.
>
Thanks, Dave.
--
Gustavo
^ permalink raw reply
* Re: [PATCHv4 2/3] net: dsa: mt7530: support the 7530 switch on the Mediatek MT7621 SoC
From: Sean Wang @ 2019-01-30 3:55 UTC (permalink / raw)
To: Florian Fainelli
Cc: gerg, Sean Wang (王志亘), linux-mediatek, bjorn,
Andrew Lunn, vivien.didelot, netdev, neil, rene, John Crispin
In-Reply-To: <d030fb8f-c707-9aa2-5ebc-e8100188a5c2@gmail.com>
On Tue, Jan 29, 2019 at 7:39 PM Florian Fainelli <f.fainelli@gmail.com> wrote:
>
>
>
> On 1/29/19 5:24 PM, gerg@kernel.org wrote:
> > From: Greg Ungerer <gerg@kernel.org>
> >
> > The MediaTek MT7621 SoC device contains a 7530 switch, and the existing
> > linux kernel 7530 DSA switch driver can be used with it.
> >
> > The bulk of the changes required stem from the 7621 having different
> > regulator and pad setup. The existing setup of these in the 7530
> > driver appears to be very specific to its implemtation in the Mediatek
> > 7623 SoC. (Not entirely surprising given the 7623 is a quad core ARM
> > based SoC, and the 7621 is a dual core, dual thread MIPS based SoC).
> >
> > Create a new devicetree type, "mediatek,mt7621", to support the 7530
> > switch in the 7621 SoC. There appears to be no usable ID register to
> > distinguish it from a 7530 in other hardware at runtime. This is used
> > to carry out the appropriate configuration and setup.
> >
> > Signed-off-by: Greg Ungerer <gerg@kernel.org>
> > Reviewed-by: Andrew Lunn <andrew@lunn.ch>
>
> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Sean Wang <sean.wang@kernel.org>
> --
> Florian
>
> _______________________________________________
> Linux-mediatek mailing list
> Linux-mediatek@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-mediatek
^ permalink raw reply
* Re: [PATCHv4 3/3] dt-bindings: net: dsa: add new MT7530 binding to support MT7621
From: Sean Wang @ 2019-01-30 3:54 UTC (permalink / raw)
To: Florian Fainelli
Cc: gerg, Sean Wang (王志亘), linux-mediatek, bjorn,
Andrew Lunn, vivien.didelot, netdev, neil, rene, John Crispin
In-Reply-To: <9fd01c61-fb62-52fc-e587-43a03602bae2@gmail.com>
On Tue, Jan 29, 2019 at 7:40 PM Florian Fainelli <f.fainelli@gmail.com> wrote:
>
>
>
> On 1/29/19 5:24 PM, gerg@kernel.org wrote:
> > From: Greg Ungerer <gerg@kernel.org>
> >
> > Add devicetree binding to support the compatible mt7530 switch as used
> > in the MediaTek MT7621 SoC.
> >
> > Signed-off-by: Greg Ungerer <gerg@kernel.org>
> > Reviewed-by: Andrew Lunn <andrew@lunn.ch>
>
> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Sean Wang <sean.wang@kernel.org>
> --
> Florian
>
> _______________________________________________
> Linux-mediatek mailing list
> Linux-mediatek@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-mediatek
^ permalink raw reply
* Re: [PATCHv4 1/3] net: ethernet: mediatek: support MT7621 SoC ethernet hardware
From: Sean Wang @ 2019-01-30 3:54 UTC (permalink / raw)
To: Florian Fainelli
Cc: gerg, Sean Wang (王志亘), linux-mediatek, bjorn,
Andrew Lunn, vivien.didelot, netdev, neil, rene, John Crispin
In-Reply-To: <48bb34ce-9bb0-2dc1-0e17-55e000a23114@gmail.com>
On Tue, Jan 29, 2019 at 7:40 PM Florian Fainelli <f.fainelli@gmail.com> wrote:
>
>
>
> On 1/29/19 5:24 PM, gerg@kernel.org wrote:
> > From: Bjørn Mork <bjorn@mork.no>
> >
> > The Mediatek MT7621 SoC contains the same ethernet hardware module as
> > used on a number of other MediaTek SoC parts. There are some minor
> > differences to deal with but we can use the same driver to support
> > them all.
> >
> > This patch is based on work by Bjørn Mork <bjorn@mork.no>, and his
> > original patch is at:
> >
> > https://github.com/bmork/LEDE/commit/3293bc63f5461ca1eb0bbc4ed90145335e7e3404
> >
> > There is an additional compatible devicetree type added, and the primary
> > change to the code required is to support a single interrupt (for both
> > RX and TX interrupts).
> >
> > Signed-off-by: Bjørn Mork <bjorn@mork.no>
> > [gerg@kernel.org: rebase to mainline and irq handler fix]
> > Signed-off-by: Greg Ungerer <gerg@kernel.org>
>
> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
>
Acked-by: Sean Wang <sean.wang@kernel.org>
> > ---
> > drivers/net/ethernet/mediatek/Kconfig | 2 +-
> > drivers/net/ethernet/mediatek/mtk_eth_soc.c | 48 ++++++++++++++++++---
> > drivers/net/ethernet/mediatek/mtk_eth_soc.h | 4 ++
> > 3 files changed, 46 insertions(+), 8 deletions(-)
> >
> > v2: first in series for this change
> > v3: rebase onto 5.0-rc3
> > v4: rebase onto 5.0-rc4
> >
> > diff --git a/drivers/net/ethernet/mediatek/Kconfig b/drivers/net/ethernet/mediatek/Kconfig
> > index f9149d2a4694..43656f961891 100644
> > --- a/drivers/net/ethernet/mediatek/Kconfig
> > +++ b/drivers/net/ethernet/mediatek/Kconfig
> > @@ -1,6 +1,6 @@
> > config NET_VENDOR_MEDIATEK
> > bool "MediaTek ethernet driver"
> > - depends on ARCH_MEDIATEK
> > + depends on ARCH_MEDIATEK || SOC_MT7621
>
> Would be nice to add COMPILE_TEST there as well at some point so someone
> can easily build test changes to the driver.
okay, I will do it in another patch
>
> --
> Florian
>
> _______________________________________________
> Linux-mediatek mailing list
> Linux-mediatek@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-mediatek
^ permalink raw reply
* [PATCH] tun: move the call to tun_set_real_num_queues
From: George Amanakis @ 2019-01-30 3:53 UTC (permalink / raw)
To: davem; +Cc: linux-kernel, sdf, netdev, George Amanakis
In-Reply-To: <CAKH8qBvEintAf5Dmz+8kCTW1V_BRUDxSAZy8WoYSyNoLW+JHfQ@mail.gmail.com>
Call tun_set_real_num_queues() after the increment of tun->numqueues
since the former depends on it. Otherwise, the number of queues is not
correctly accounted for, which results to warnings similar to:
"vnet0 selects TX queue 11, but real number of TX queues is 11".
Fixes: 0b7959b62573 ("tun: publish tfile after it's fully initialized")
Reported-and-tested-by: George Amanakis <gamanakis@gmail.com>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
drivers/net/tun.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 18656c4094b3..fed298c0cb39 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -866,8 +866,6 @@ static int tun_attach(struct tun_struct *tun, struct file *file,
if (rtnl_dereference(tun->xdp_prog))
sock_set_flag(&tfile->sk, SOCK_XDP);
- tun_set_real_num_queues(tun);
-
/* device is allowed to go away first, so no need to hold extra
* refcnt.
*/
@@ -879,6 +877,7 @@ static int tun_attach(struct tun_struct *tun, struct file *file,
rcu_assign_pointer(tfile->tun, tun);
rcu_assign_pointer(tun->tfiles[tun->numqueues], tfile);
tun->numqueues++;
+ tun_set_real_num_queues(tun);
out:
return err;
}
--
2.20.1
^ permalink raw reply related
* [PATCH] tun: move the call to tun_set_real_num_queues
From: George Amanakis @ 2019-01-30 3:50 UTC (permalink / raw)
To: davem; +Cc: linux-kernel, sdf, netdev, George Amanakis
Call tun_set_real_num_queues() after the increment of tun->numqueues
since the former depends on it. Otherwise, the number of queues is not
correctly accounted for, which results to warnings similar to:
"vnet0 selects TX queue 11, but real number of TX queues is 11".
Fixes: 0b7959b62573 ("tun: publish tfile after it's fully initialized")
Reported-and-tested-by: George Amanakis <gamanakis@gmail.com>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
drivers/net/tun.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 18656c4094b3..fed298c0cb39 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -866,8 +866,6 @@ static int tun_attach(struct tun_struct *tun, struct file *file,
if (rtnl_dereference(tun->xdp_prog))
sock_set_flag(&tfile->sk, SOCK_XDP);
- tun_set_real_num_queues(tun);
-
/* device is allowed to go away first, so no need to hold extra
* refcnt.
*/
@@ -879,6 +877,7 @@ static int tun_attach(struct tun_struct *tun, struct file *file,
rcu_assign_pointer(tfile->tun, tun);
rcu_assign_pointer(tun->tfiles[tun->numqueues], tfile);
tun->numqueues++;
+ tun_set_real_num_queues(tun);
out:
return err;
}
--
2.20.1
^ permalink raw reply related
* Re: [PATCHv4 1/3] net: ethernet: mediatek: support MT7621 SoC ethernet hardware
From: Florian Fainelli @ 2019-01-30 3:40 UTC (permalink / raw)
To: gerg, sean.wang, linux-mediatek, bjorn, andrew, vivien.didelot,
netdev
Cc: rene, john, neil
In-Reply-To: <20190130012406.28271-2-gerg@kernel.org>
On 1/29/19 5:24 PM, gerg@kernel.org wrote:
> From: Bjørn Mork <bjorn@mork.no>
>
> The Mediatek MT7621 SoC contains the same ethernet hardware module as
> used on a number of other MediaTek SoC parts. There are some minor
> differences to deal with but we can use the same driver to support
> them all.
>
> This patch is based on work by Bjørn Mork <bjorn@mork.no>, and his
> original patch is at:
>
> https://github.com/bmork/LEDE/commit/3293bc63f5461ca1eb0bbc4ed90145335e7e3404
>
> There is an additional compatible devicetree type added, and the primary
> change to the code required is to support a single interrupt (for both
> RX and TX interrupts).
>
> Signed-off-by: Bjørn Mork <bjorn@mork.no>
> [gerg@kernel.org: rebase to mainline and irq handler fix]
> Signed-off-by: Greg Ungerer <gerg@kernel.org>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
> ---
> drivers/net/ethernet/mediatek/Kconfig | 2 +-
> drivers/net/ethernet/mediatek/mtk_eth_soc.c | 48 ++++++++++++++++++---
> drivers/net/ethernet/mediatek/mtk_eth_soc.h | 4 ++
> 3 files changed, 46 insertions(+), 8 deletions(-)
>
> v2: first in series for this change
> v3: rebase onto 5.0-rc3
> v4: rebase onto 5.0-rc4
>
> diff --git a/drivers/net/ethernet/mediatek/Kconfig b/drivers/net/ethernet/mediatek/Kconfig
> index f9149d2a4694..43656f961891 100644
> --- a/drivers/net/ethernet/mediatek/Kconfig
> +++ b/drivers/net/ethernet/mediatek/Kconfig
> @@ -1,6 +1,6 @@
> config NET_VENDOR_MEDIATEK
> bool "MediaTek ethernet driver"
> - depends on ARCH_MEDIATEK
> + depends on ARCH_MEDIATEK || SOC_MT7621
Would be nice to add COMPILE_TEST there as well at some point so someone
can easily build test changes to the driver.
--
Florian
^ permalink raw reply
* Re: [PATCHv4 3/3] dt-bindings: net: dsa: add new MT7530 binding to support MT7621
From: Florian Fainelli @ 2019-01-30 3:39 UTC (permalink / raw)
To: gerg, sean.wang, linux-mediatek, bjorn, andrew, vivien.didelot,
netdev
Cc: rene, john, neil
In-Reply-To: <20190130012406.28271-4-gerg@kernel.org>
On 1/29/19 5:24 PM, gerg@kernel.org wrote:
> From: Greg Ungerer <gerg@kernel.org>
>
> Add devicetree binding to support the compatible mt7530 switch as used
> in the MediaTek MT7621 SoC.
>
> Signed-off-by: Greg Ungerer <gerg@kernel.org>
> Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
--
Florian
^ permalink raw reply
* Re: [PATCHv4 2/3] net: dsa: mt7530: support the 7530 switch on the Mediatek MT7621 SoC
From: Florian Fainelli @ 2019-01-30 3:39 UTC (permalink / raw)
To: gerg, sean.wang, linux-mediatek, bjorn, andrew, vivien.didelot,
netdev
Cc: rene, john, neil
In-Reply-To: <20190130012406.28271-3-gerg@kernel.org>
On 1/29/19 5:24 PM, gerg@kernel.org wrote:
> From: Greg Ungerer <gerg@kernel.org>
>
> The MediaTek MT7621 SoC device contains a 7530 switch, and the existing
> linux kernel 7530 DSA switch driver can be used with it.
>
> The bulk of the changes required stem from the 7621 having different
> regulator and pad setup. The existing setup of these in the 7530
> driver appears to be very specific to its implemtation in the Mediatek
> 7623 SoC. (Not entirely surprising given the 7623 is a quad core ARM
> based SoC, and the 7621 is a dual core, dual thread MIPS based SoC).
>
> Create a new devicetree type, "mediatek,mt7621", to support the 7530
> switch in the 7621 SoC. There appears to be no usable ID register to
> distinguish it from a 7530 in other hardware at runtime. This is used
> to carry out the appropriate configuration and setup.
>
> Signed-off-by: Greg Ungerer <gerg@kernel.org>
> Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
--
Florian
^ permalink raw reply
* Re: [PATCH bpf-next v3 1/4] bpf: add plumbing for BPF_LWT_ENCAP_IP in bpf_lwt_push_encap
From: David Ahern @ 2019-01-30 3:29 UTC (permalink / raw)
To: Peter Oskolkov, Alexei Starovoitov, Daniel Borkmann, netdev
Cc: Peter Oskolkov, Willem de Bruijn
In-Reply-To: <20190129011217.192510-2-posk@google.com>
On 1/28/19 6:12 PM, Peter Oskolkov wrote
> @@ -2583,7 +2594,15 @@ enum bpf_ret_code {
> BPF_DROP = 2,
> /* 3-6 reserved */
> BPF_REDIRECT = 7,
> - /* >127 are reserved for prog type specific return codes */
> + /* >127 are reserved for prog type specific return codes.
> + *
> + * BPF_LWT_REROUTE: used by BPF_PROG_TYPE_LWT_IN and
> + * BPF_PROG_TYPE_LWT_XMIT to indicate that skb's dst
> + * has changed and appropriate dst_input() or dst_output()
> + * action has to be taken (this is an L3 redirect, as
> + * opposed to L2 redirect represented by BPF_REDIRECT above).
> + */
> + BPF_LWT_REROUTE = 128,
> };
What happens if a program pushes a new header onto the skb and does not
return BPF_LWT_REROUTE?
Might be better to move the route lookup and dst swap to run_lwt_bpf and
only do it if the program returns BPF_LWT_REROUTE. That allows calling
bpf_push_ip_encap without requiring a route lookup. That might be fine
as long as their is not a protocol mismatch (ipv4 packet gets an ipv6
header or vice versa). But then, I think you have the mismatch problem
now if the program does not return BPF_LWT_REROUTE.
^ permalink raw reply
* Re: [PATCH bpf 0/2] bpf: btf: allow typedef func_proto
From: Alexei Starovoitov @ 2019-01-30 3:18 UTC (permalink / raw)
To: Yonghong Song; +Cc: netdev, Alexei Starovoitov, Daniel Borkmann, kernel-team
In-Reply-To: <20190130003816.1043826-1-yhs@fb.com>
On Tue, Jan 29, 2019 at 04:38:16PM -0800, Yonghong Song wrote:
> The current btf implementation disallows the typedef of
> a func_proto type. This actually is allowed per C standard.
> This patch fixed btf verification to permit such types.
> Patch #1 fixed the kernel side and Patch #2 fixed
> the tools test_btf test.
Applied, Thanks
^ permalink raw reply
* [PATCH][RESEND] rtlwifi: remove set but not used variable 'cmd_seq'
From: YueHaibing @ 2019-01-30 3:15 UTC (permalink / raw)
To: kvalo, davem, pkshih; +Cc: linux-kernel, netdev, linux-wireless, YueHaibing
Fixes gcc '-Wunused-but-set-variable' warning:
drivers/net/wireless/realtek/rtlwifi/base.c: In function 'rtl_c2h_content_parsing':
drivers/net/wireless/realtek/rtlwifi/base.c:2313:13: warning:
variable 'cmd_seq' set but not used [-Wunused-but-set-variable]
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: Ping-Ke Shih <pkshih@realtek.com>
---
drivers/net/wireless/realtek/rtlwifi/base.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/net/wireless/realtek/rtlwifi/base.c b/drivers/net/wireless/realtek/rtlwifi/base.c
index ef9b502..ee726f4 100644
--- a/drivers/net/wireless/realtek/rtlwifi/base.c
+++ b/drivers/net/wireless/realtek/rtlwifi/base.c
@@ -2311,11 +2311,10 @@ static void rtl_c2h_content_parsing(struct ieee80211_hw *hw,
struct rtl_priv *rtlpriv = rtl_priv(hw);
struct rtl_hal_ops *hal_ops = rtlpriv->cfg->ops;
const struct rtl_btc_ops *btc_ops = rtlpriv->btcoexist.btc_ops;
- u8 cmd_id, cmd_seq, cmd_len;
+ u8 cmd_id, cmd_len;
u8 *cmd_buf = NULL;
cmd_id = GET_C2H_CMD_ID(skb->data);
- cmd_seq = GET_C2H_SEQ(skb->data);
cmd_len = skb->len - C2H_DATA_OFFSET;
cmd_buf = GET_C2H_DATA_PTR(skb->data);
--
2.7.4
^ permalink raw reply related
* [PATCH][RESEND] ath10k: snoc: remove set but not used variable 'ar_snoc'
From: YueHaibing @ 2019-01-30 3:09 UTC (permalink / raw)
To: kvalo, davem, ath10k; +Cc: linux-kernel, netdev, linux-wireless, YueHaibing
Fixes gcc '-Wunused-but-set-variable' warning:
drivers/net/wireless/ath/ath10k/snoc.c: In function 'ath10k_snoc_tx_pipe_cleanup':
drivers/net/wireless/ath/ath10k/snoc.c:681:22: warning:
variable 'ar_snoc' set but not used [-Wunused-but-set-variable]
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Brian Norris <briannorris@chromium.org>
---
drivers/net/wireless/ath/ath10k/snoc.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/drivers/net/wireless/ath/ath10k/snoc.c b/drivers/net/wireless/ath/ath10k/snoc.c
index 54efe6b..b7493df 100644
--- a/drivers/net/wireless/ath/ath10k/snoc.c
+++ b/drivers/net/wireless/ath/ath10k/snoc.c
@@ -875,13 +875,11 @@ static void ath10k_snoc_tx_pipe_cleanup(struct ath10k_snoc_pipe *snoc_pipe)
{
struct ath10k_ce_pipe *ce_pipe;
struct ath10k_ce_ring *ce_ring;
- struct ath10k_snoc *ar_snoc;
struct sk_buff *skb;
struct ath10k *ar;
int i;
ar = snoc_pipe->hif_ce_state;
- ar_snoc = ath10k_snoc_priv(ar);
ce_pipe = snoc_pipe->ce_hdl;
ce_ring = ce_pipe->src_ring;
--
2.7.4
^ permalink raw reply related
* Re: BUG: vnet0 selects TX queue 11, but real number of TX queues is 11
From: Stanislav Fomichev @ 2019-01-30 2:52 UTC (permalink / raw)
To: George Amanakis; +Cc: linux-kernel, Netdev
In-Reply-To: <20190130021621.17250-1-gamanakis@gmail.com>
On Tue, Jan 29, 2019 at 6:16 PM George Amanakis <gamanakis@gmail.com> wrote:
>
> Since 4.20.4 when running a KVM with vhost_net I am seeing in dmesg:
> vnet0 selects TX queue 11, but real number of TX queues is 11
>
> The corresponding part in the xml definition of the virtual machine is:
> -------8<-------
> <interface type='bridge'>
> <mac address='xx:xx:xx:xx:xx:xx'/>
> <source bridge='br0'/>
> <model type='virtio'/>
> <driver name='vhost' queues='12'>
> </driver>
> <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
> </interface>
> -------8<-------
>
> Doing a git-bisect with 4.20.3 last known good, and 4.20.4 as bad, this
> commit turned up:
> -------8<-------
> commit 9ff0436e2c3575ffe64d359fb3b67aee237dc519
> Author: Stanislav Fomichev <sdf@google.com>
> Date: Mon Jan 7 13:38:38 2019 -0800
>
> tun: publish tfile after it's fully initialized
>
> [ Upstream commit 0b7959b6257322f7693b08a459c505d4938646f2 ]
>
> BUG: unable to handle kernel NULL pointer dereference at
> 00000000000000d1
> -------8<-------
>
>
> Applying the following patch corrects it in 4.20.5. Would this be the
> correct thing to do?
Ouch, tun_set_real_num_queues uses tun->numqueues internally :-(
Your patch looks good to me, care to do a proper submission (with a Fixes tag)?
I wonder whether we should use it as an opportunity to also do
something like the following (to make it more explicit):
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 18656c4094b3..ea9928b3b930 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -632,10 +632,10 @@ static inline bool tun_not_capable(struct tun_struct *tun)
!ns_capable(net->user_ns, CAP_NET_ADMIN);
}
-static void tun_set_real_num_queues(struct tun_struct *tun)
+static void tun_set_real_num_queues(struct tun_struct *tun, unsigned int nr)
{
- netif_set_real_num_tx_queues(tun->dev, tun->numqueues);
- netif_set_real_num_rx_queues(tun->dev, tun->numqueues);
+ netif_set_real_num_tx_queues(tun->dev, nr);
+ netif_set_real_num_rx_queues(tun->dev, nr);
}
static void tun_disable_queue(struct tun_struct *tun, struct tun_file *tfile)
@@ -712,7 +712,7 @@ static void __tun_detach(struct tun_file *tfile, bool clean)
tun_flow_delete_by_queue(tun, tun->numqueues + 1);
/* Drop read queue */
tun_queue_purge(tfile);
- tun_set_real_num_queues(tun);
+ tun_set_real_num_queues(tun, tun->numqueues);
} else if (tfile->detached && clean) {
tun = tun_enable_queue(tfile);
sock_put(&tfile->sk);
@@ -866,8 +866,6 @@ static int tun_attach(struct tun_struct *tun,
struct file *file,
if (rtnl_dereference(tun->xdp_prog))
sock_set_flag(&tfile->sk, SOCK_XDP);
- tun_set_real_num_queues(tun);
-
/* device is allowed to go away first, so no need to hold extra
* refcnt.
*/
@@ -879,6 +877,7 @@ static int tun_attach(struct tun_struct *tun,
struct file *file,
rcu_assign_pointer(tfile->tun, tun);
rcu_assign_pointer(tun->tfiles[tun->numqueues], tfile);
tun->numqueues++;
+ tun_set_real_num_queues(tun, tun->numqueues);
out:
return err;
}
> ---
> drivers/net/tun.c | 5 +++--
> 1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index 6658658246d2..e0dc004c6483 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -862,8 +862,6 @@ static int tun_attach(struct tun_struct *tun, struct file *file,
> if (rtnl_dereference(tun->xdp_prog))
> sock_set_flag(&tfile->sk, SOCK_XDP);
>
> - tun_set_real_num_queues(tun);
> -
> /* device is allowed to go away first, so no need to hold extra
> * refcnt.
> */
> @@ -875,6 +873,9 @@ static int tun_attach(struct tun_struct *tun, struct file *file,
> rcu_assign_pointer(tfile->tun, tun);
> rcu_assign_pointer(tun->tfiles[tun->numqueues], tfile);
> tun->numqueues++;
> +
> + tun_set_real_num_queues(tun);
> +
> out:
> return err;
> }
> --
> 2.20.1
>
^ permalink raw reply related
* [PATCH] Revert "ethtool: change to new sane powerpc64 kernel headers"
From: Maciej Żenczykowski @ 2019-01-30 2:48 UTC (permalink / raw)
To: Maciej Żenczykowski, John W . Linville; +Cc: netdev
In-Reply-To: <CANP3RGdkA4LNNiC3UPcz3b4oGMBPFD2uEyKBMovXJz79eqiO-g@mail.gmail.com>
From: Maciej Żenczykowski <maze@google.com>
This reverts commit 4df55c81996dfb1dbe98c93ee62d8067ed5073a9.
It turns out this is not needed due to:
commit c0a2c04b3cbf6d399a2551654401957ddb529a50
internal.h: change to new sane kernel headers on 64-bit archs
which I apparently entirely forgot about while trying
to synchronize internal and upstream git repositories.
Change-Id: I56d90a3c1e9b66c30526824fb7bc41aab01d85d1
Signed-off-by: Maciej Żenczykowski <maze@google.com>
---
ethtool-copy.h | 6 ------
1 file changed, 6 deletions(-)
diff --git a/ethtool-copy.h b/ethtool-copy.h
index 7772a4970987..6bfbb85f9402 100644
--- a/ethtool-copy.h
+++ b/ethtool-copy.h
@@ -14,12 +14,6 @@
#ifndef _LINUX_ETHTOOL_H
#define _LINUX_ETHTOOL_H
-#ifdef __powerpc64__
-/* Powerpc needs __SANE_USERSPACE_TYPES__ before <linux/types.h> to select
- * 'int-ll64.h' and avoid compile warnings when printing __u64 with %llu.
- */
-#define __SANE_USERSPACE_TYPES__
-#endif
#include <linux/kernel.h>
#include <linux/types.h>
#include <linux/if_ether.h>
--
2.20.1.495.gaa96b0ce6b-goog
^ permalink raw reply related
* Re: ethtool - manual changes in ethtool-copy.h
From: Maciej Żenczykowski @ 2019-01-30 2:42 UTC (permalink / raw)
To: John W. Linville; +Cc: Michal Kubecek, Linux NetDev
In-Reply-To: <CANP3RGf-MmOYgBV4fQwoRbg_+OAvKTFKGM2EorR3ig9WfevHZg@mail.gmail.com>
Eh, I think this patch can simply be reverted.
This is a dupe. ethtool-copy.h is only included from internal.h which
already includes:
...
#ifndef ETHTOOL_INTERNAL_H__
#define ETHTOOL_INTERNAL_H__
/* Some platforms (eg. ppc64) need __SANE_USERSPACE_TYPES__ before
* <linux/types.h> to select 'int-ll64.h' and avoid compile warnings
* when printing __u64 with %llu.
*/
#define __SANE_USERSPACE_TYPES__
...
which came in:
commit c0a2c04b3cbf6d399a2551654401957ddb529a50
Author: Maciej Żenczykowski <maze@google.com>
Date: Fri Mar 11 09:58:14 2016 -0800
internal.h: change to new sane kernel headers on 64-bit archs
On ppc64, this fixes:
...
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: David Decotigny <decot@googlers.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
I think I missed this while trying to synchronize internal with upstream repos.
(the local copy of the patch was obviously in the wrong spot)
^ permalink raw reply
* Re: [PATCH iproute2-next 2/2] ss: add AF_XDP support
From: David Ahern @ 2019-01-30 2:39 UTC (permalink / raw)
To: bjorn.topel, netdev, stephen
Cc: Björn Töpel, magnus.karlsson, magnus.karlsson
In-Reply-To: <20190125071848.25959-3-bjorn.topel@gmail.com>
On 1/25/19 12:18 AM, bjorn.topel@gmail.com wrote:
> From: Björn Töpel <bjorn.topel@intel.com>
>
> AF_XDP is an address family that is optimized for high performance
> packet processing.
>
> This patch adds AF_XDP support to ss(8) so that sockets can be queried
> and monitored.
>
> Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
> ---
> man/man8/ss.8 | 9 ++-
> misc/ss.c | 168 +++++++++++++++++++++++++++++++++++++++++++++++++-
> 2 files changed, 172 insertions(+), 5 deletions(-)
>
AF_XDP is not currently defined for a number of distributions. Add a
definition to include/utils.h similar to what is done for MPLS.
Also, please add example output to the commit log.
^ permalink raw reply
* Re: [PATCH iproute2-next v2] netns: add subcommand to attach an existing network namespace
From: David Ahern @ 2019-01-30 2:32 UTC (permalink / raw)
To: Matteo Croce, netdev; +Cc: Stephen Hemminger, Andrea Claudi
In-Reply-To: <20190129150115.19965-1-mcroce@redhat.com>
On 1/29/19 8:01 AM, Matteo Croce wrote:
> ip tracks namespaces with dummy files in /var/run/netns/, but can't see
> namespaces created with other tools.
> Creating the dummy file and bind mounting the correct procfs entry will
> make ip aware of that namespace.
> Add an ip netns subcommand to automate this task.
>
> Signed-off-by: Matteo Croce <mcroce@redhat.com>
> ---
> ip/ipnetns.c | 62 ++++++++++++++++++++++++++++++++++-----------
> man/man8/ip-netns.8 | 10 ++++++++
> 2 files changed, 57 insertions(+), 15 deletions(-)
>
applied to iproute2-next. Thanks
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox