Netdev List

* [PATCH net-next v2] bpf/tracing: fix kernel/events/core.c compilation error
From: Yonghong Song @ 2017-12-13 18:35 UTC
  To: ast, daniel, sfr, netdev; +Cc: kernel-team

Commit f371b304f12e ("bpf/tracing: allow user space to
query prog array on the same tp") introduced a perf
ioctl command to query prog array attached to the
same perf tracepoint. The commit introduced a
compilation error under certain config conditions, e.g.,
  (1). CONFIG_BPF_SYSCALL is not defined, or
  (2). CONFIG_TRACING is defined but neither CONFIG_UPROBE_EVENTS
       nor CONFIG_KPROBE_EVENTS is defined.

Error message:
  kernel/events/core.o: In function `perf_ioctl':
  core.c:(.text+0x98c4): undefined reference to `bpf_event_query_prog_array'

This patch fixes the error by declaring the real function only under
CONFIG_BPF_EVENTS and providing a static inline dummy function
when CONFIG_BPF_EVENTS is not defined.
It also renames the function from bpf_event_query_prog_array to
perf_event_query_prog_array and moves the declaration from linux/bpf.h
to linux/trace_events.h, so that it sits next to the other
prog_array-related functions.
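The guard-plus-stub pattern described above can be sketched as a small userspace C program. This is a model under stated assumptions, not kernel code: CONFIG_BPF_EVENTS is treated as an ordinary preprocessor macro, the kernel's `__user` annotation is defined away, and `struct perf_event` is left opaque. The stub's -EOPNOTSUPP return value is the one used in the patch.

```c
#include <errno.h>
#include <stddef.h>

/* Userspace stand-ins (assumptions): the sparse __user annotation and an
 * opaque struct perf_event. */
#define __user
struct perf_event;

#ifdef CONFIG_BPF_EVENTS
/* Real declaration: the definition lives in code that is only built when
 * CONFIG_BPF_EVENTS is set (bpf_trace.c in the patch). */
int perf_event_query_prog_array(struct perf_event *event, void __user *info);
#else
/* Stub: callers such as the PERF_EVENT_IOC_QUERY_BPF ioctl still compile and
 * link, and simply report that the operation is unsupported. */
static inline int
perf_event_query_prog_array(struct perf_event *event, void __user *info)
{
	(void)event;
	(void)info;
	return -EOPNOTSUPP;
}
#endif
```

Built without -DCONFIG_BPF_EVENTS, every call resolves to the inline stub, so no undefined reference to the real symbol can reach the linker, which is exactly the failure mode in the error message above.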

Fixes: f371b304f12e ("bpf/tracing: allow user space to query prog array on the same tp")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Yonghong Song <yhs@fb.com>
---
 include/linux/bpf.h          | 1 -
 include/linux/trace_events.h | 6 ++++++
 kernel/events/core.c         | 2 +-
 kernel/trace/bpf_trace.c     | 2 +-
 4 files changed, 8 insertions(+), 3 deletions(-)

Changelog:
 v1 -> v2:
   . Updated the commit message
   . Used CONFIG_BPF_EVENTS to guard the real declaration
   . Renamed the function and moved the declaration from linux/bpf.h
     to linux/trace_events.h

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 93e15b9..54dc7ca 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -254,7 +254,6 @@ typedef unsigned long (*bpf_ctx_copy_t)(void *dst, const void *src,
 
 u64 bpf_event_output(struct bpf_map *map, u64 flags, void *meta, u64 meta_size,
 		     void *ctx, u64 ctx_size, bpf_ctx_copy_t ctx_copy);
-int bpf_event_query_prog_array(struct perf_event *event, void __user *info);
 
 int bpf_prog_test_run_xdp(struct bpf_prog *prog, const union bpf_attr *kattr,
 			  union bpf_attr __user *uattr);
diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
index 5fea451..8a1442c 100644
--- a/include/linux/trace_events.h
+++ b/include/linux/trace_events.h
@@ -467,6 +467,7 @@ trace_trigger_soft_disabled(struct trace_event_file *file)
 unsigned int trace_call_bpf(struct trace_event_call *call, void *ctx);
 int perf_event_attach_bpf_prog(struct perf_event *event, struct bpf_prog *prog);
 void perf_event_detach_bpf_prog(struct perf_event *event);
+int perf_event_query_prog_array(struct perf_event *event, void __user *info);
 #else
 static inline unsigned int trace_call_bpf(struct trace_event_call *call, void *ctx)
 {
@@ -481,6 +482,11 @@ perf_event_attach_bpf_prog(struct perf_event *event, struct bpf_prog *prog)
 
 static inline void perf_event_detach_bpf_prog(struct perf_event *event) { }
 
+static inline int
+perf_event_query_prog_array(struct perf_event *event, void __user *info)
+{
+	return -EOPNOTSUPP;
+}
 #endif
 
 enum {
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 5857c500..34fda6a 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -4725,7 +4725,7 @@ static long _perf_ioctl(struct perf_event *event, unsigned int cmd, unsigned lon
 	}
 
 	case PERF_EVENT_IOC_QUERY_BPF:
-		return bpf_event_query_prog_array(event, (void __user *)arg);
+		return perf_event_query_prog_array(event, (void __user *)arg);
 	default:
 		return -ENOTTY;
 	}
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index e009b7e..c5dd60c 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -856,7 +856,7 @@ void perf_event_detach_bpf_prog(struct perf_event *event)
 	mutex_unlock(&bpf_event_mutex);
 }
 
-int bpf_event_query_prog_array(struct perf_event *event, void __user *info)
+int perf_event_query_prog_array(struct perf_event *event, void __user *info)
 {
 	struct perf_event_query_bpf __user *uquery = info;
 	struct perf_event_query_bpf query = {};
-- 
2.9.5

* Re: BUG REPORT: iproute2 seems to have bug with dsfield/tos in ip-rule and ip-route
From: Daniel Lakeland @ 2017-12-13 18:33 UTC
  To: Stephen Hemminger; +Cc: netdev
In-Reply-To: <20171213101259.65652da6@xeon-e3>

On December 13, 2017 10:12:59 AM PST, Stephen Hemminger <stephen@networkplumber.org> wrote:
>On Wed, 13 Dec 2017 09:40:08 -0800
>Daniel Lakeland <dlakelan@street-artists.org> wrote:
>
>> This same problem as detailed here
>> 
>> http://lists.openwall.net/netdev/2010/03/26/36
>
>This mail reports an issue from 7 years ago; much has
>changed since then.
>

I figure it's still biting me because no one reported it. This is on a modern Debian testing system.

>
>The kernel is complaining that the ip rule is not valid (i.e. this is not an iproute2
>issue).

Note that, like some of those other people, I was able to get ip rule to accept tos values with only low-order bits set... I'm on my phone now, so I can't test an example, but it was a tos like 0x0c or something.

I'm really not familiar with the internals or who's in charge of what; I just wanted to be sure this issue got onto some kernel netdev people's radar instead of being dropped on the floor!!

>Not sure exactly why or where in fib_rules.c this is happening.

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.

* Re: [patch net-next v3 00/10] net: sched: allow qdiscs to share filter block instances
From: David Ahern @ 2017-12-13 18:28 UTC
  To: Jiri Pirko
  Cc: netdev, davem, jhs, xiyou.wangcong, mlxsw, andrew, vivien.didelot,
	f.fainelli, michael.chan, ganeshgr, saeedm, matanb, leonro,
	idosch, jakub.kicinski, simon.horman, pieter.jansenvanvuuren,
	john.hurley, alexander.h.duyck, ogerlitz, john.fastabend, daniel
In-Reply-To: <20171213173948.GK2031@nanopsycho>

On 12/13/17 10:39 AM, Jiri Pirko wrote:
> Wed, Dec 13, 2017 at 06:18:04PM CET, dsahern@gmail.com wrote:
>> On 12/13/17 10:07 AM, Jiri Pirko wrote:
>>> Wed, Dec 13, 2017 at 05:54:35PM CET, dsahern@gmail.com wrote:
>>>> On 12/13/17 8:10 AM, Jiri Pirko wrote:
>>>>> So back to the example. First, we create 2 qdiscs. Both will share
>>>>> block number 22. "22" is just an identification. If we don't pass any
>>>>> block number, a new one will be generated by kernel:
>>>>>
>>>>> $ tc qdisc add dev ens7 ingress block 22
>>>>>                                 ^^^^^^^^
>>>>> $ tc qdisc add dev ens8 ingress block 22
>>>>>                                 ^^^^^^^^
>>>>>
>>>>> Now if we list the qdiscs, we will see the block index in the output:
>>>>>
>>>>> $ tc qdisc
>>>>> qdisc ingress ffff: dev ens7 parent ffff:fff1 block 22
>>>>> qdisc ingress ffff: dev ens8 parent ffff:fff1 block 22
>>>>>
>>>>> To make it more visual, the situation looks like this:
>>>>>
>>>>>    ens7 ingress qdisc                 ens8 ingress qdisc
>>>>>           |                                  |
>>>>>           |                                  |
>>>>>           +---------->  block 22  <----------+
>>>>>
>>>>> Unlimited number of qdiscs may share the same block.
>>>>>
>>>>> Now we can add a filter to any of the qdiscs sharing the same block:
>>>>>
>>>>> $ tc filter add dev ens7 ingress protocol ip pref 25 flower dst_ip 192.168.0.0/16 action drop
>>>>
>>>> I still say this is a very odd user semantic: making changes to device M
>>>> and having them magically affect device N. Operating on the shared block
>>>> as a separate object makes it much more direct and clear.
>>>
>>> I plan to do it as a follow-up patch. But this is how things are done
>>> now and have to continue to work.
>>
>> Why is that? You are introducing the notion of a shared block with this
>> patch set. What is the legacy "how things are done now" you are
>> referring to?
> 
> Well, the filter add/del should just work no matter if the block behind is
> shared or not.

My argument is that modifying a shared block instance via a dev should
not be allowed. Those changes should only be allowed via the shared
block. So if a user adds a shared block to the device and then
attempts to add a filter via the device, it should not be allowed.
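To make the semantics under debate concrete, here is a minimal userspace model. All names are hypothetical and this is not the kernel's tcf_block API: two ingress qdiscs hold a pointer to the same block object, so a filter added "via" ens7 is visible through ens8. That is the behavior Jiri says must keep working, and the one David argues should only be reachable through the block itself.

```c
#include <stddef.h>

#define MODEL_MAX_FILTERS 8

/* Hypothetical shared filter block, identified by a user-visible index. */
struct model_block {
	unsigned int index;                 /* e.g. 22 in the tc example */
	unsigned int refcnt;                /* number of qdiscs sharing it */
	int prefs[MODEL_MAX_FILTERS];       /* filter priorities only */
	size_t nfilters;
};

/* Hypothetical ingress qdisc attached to one device. */
struct model_qdisc {
	const char *dev;
	struct model_block *block;
};

/* "tc qdisc add dev X ingress block N": bind the qdisc to a shared block. */
static void model_qdisc_bind(struct model_qdisc *q, const char *dev,
			     struct model_block *b)
{
	q->dev = dev;
	q->block = b;
	b->refcnt++;
}

/* "tc filter add dev X ...": the filter lands in the shared block, not in
 * any per-device list. Returns 0 on success, -1 if the block is full. */
static int model_filter_add(struct model_qdisc *q, int pref)
{
	struct model_block *b = q->block;

	if (b->nfilters == MODEL_MAX_FILTERS)
		return -1;
	b->prefs[b->nfilters++] = pref;
	return 0;
}
```

In this model there is simply no per-device filter state, so "adding via a device" can only mean "adding to whatever block that device points at"; restricting that path, as David proposes, would be a policy check in something like model_filter_add() when refcnt > 1.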

* Re: [PATCH net-next 0/2] hv_netvsc: Fix default and limit of recv buffe
From: David Miller @ 2017-12-13 18:25 UTC
  To: stephen; +Cc: kys, haiyangz, sthemmin, devel, netdev
In-Reply-To: <20171211165658.14768-1-sthemmin@microsoft.com>

From: Stephen Hemminger <stephen@networkplumber.org>
Date: Mon, 11 Dec 2017 08:56:56 -0800

> The default for receive buffer descriptors is not correct; it should
> match the default receive buffer size. The upper limit of the receive
> buffer size is also too low. In addition, hosts running older versions
> of Windows Server need a different lower-limit check; otherwise the host
> rejects the buffer request and the vNIC does not come up.
> 
> This patch set corrects these problems.

Series applied.

* Re: [PATCH 27/45] net: remove duplicate includes
From: David Miller @ 2017-12-13 18:19 UTC
  To: pravin.shedge4linux
  Cc: netdev, netfilter-devel, coreteam, andrew, vivien.didelot,
	f.fainelli, pablo, kadlec, fw, jhs, xiyou.wangcong, mingo,
	rmk+kernel, linux-kernel
In-Reply-To: <1513010386-3394-1-git-send-email-pravin.shedge4linux@gmail.com>

From: Pravin Shedge <pravin.shedge4linux@gmail.com>
Date: Mon, 11 Dec 2017 22:09:46 +0530

> These duplicate includes have been found with scripts/checkincludes.pl but
> they have been removed manually to avoid removing false positives.
> 
> Signed-off-by: Pravin Shedge <pravin.shedge4linux@gmail.com>

Applied, thanks.

* Re: [PATCH net] ipv4: igmp: guard against silly MTU values
From: David Miller @ 2017-12-13 18:17 UTC
  To: eric.dumazet; +Cc: netdev
In-Reply-To: <1513005459.25033.45.camel@gmail.com>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Mon, 11 Dec 2017 07:17:39 -0800

> From: Eric Dumazet <edumazet@google.com>
> 
> The IPv4 stack reacts to the MTU being changed to a too-small value by
> disabling itself under RTNL.
> 
> But there is a window where threads not using RTNL can see a wrong
> device MTU. This can lead to surprises in the IGMP code, where it is
> assumed the MTU is suitable.
> 
> Fix this by reading the device MTU once and checking it against the
> IPv4 minimal MTU.
> 
> This patch also adds the missing IPV4_MIN_MTU define, so that
> ETH_MIN_MTU is no longer abused.
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Applied and queued up for -stable.
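The "read the device MTU once, then validate that value" fix can be sketched in userspace C. This is an illustrative model, not the patch itself: a C11 relaxed atomic load stands in for a single kernel read such as READ_ONCE(), struct model_dev is hypothetical, and IPV4_MIN_MTU is the 68-byte minimum from RFC 791.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define IPV4_MIN_MTU 68 /* RFC 791 minimum IPv4 MTU */

/* Hypothetical device whose MTU may be changed concurrently. */
struct model_dev {
	atomic_uint mtu;
};

/* Take exactly one snapshot of the MTU and validate that snapshot.
 * Reading dev->mtu twice (once to check, once to size a packet) would let a
 * concurrent writer slip a tiny value in between the two reads. */
static bool model_igmp_mtu_usable(struct model_dev *dev, unsigned int *out)
{
	unsigned int mtu = atomic_load_explicit(&dev->mtu, memory_order_relaxed);

	if (mtu < IPV4_MIN_MTU)
		return false;
	*out = mtu; /* the caller sizes its report from this same value */
	return true;
}
```

The design point is that the check and every later use share one local copy, so no RTNL-less reader can validate one value and then compute with another.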

* Re: [PATCH net] ipv6: mcast: better catch silly mtu values
From: David Miller @ 2017-12-13 18:17 UTC
  To: eric.dumazet; +Cc: lucien.xin, netdev
In-Reply-To: <1513004618.25033.43.camel@gmail.com>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Mon, 11 Dec 2017 07:03:38 -0800

> From: Eric Dumazet <edumazet@google.com>
> 
> syzkaller reported crashes in the IPv6 stack [1]
> 
> Xin Long found that the lo MTU was set to silly values.
> 
> The IPv6 stack reacts to the MTU being changed to a too-small value by
> disabling itself under RTNL.
> 
> But there is a window where threads not using RTNL can see a wrong
> device MTU. This can lead to surprises in the MLD code, where it is
> assumed the MTU is suitable.
> 
> Fix this by reading the device MTU once and checking it against the
> IPv6 minimal MTU.
 ...
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Reported-by: syzbot <syzkaller@googlegroups.com>

Applied and queued up for -stable.
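Why a silly MTU is dangerous rather than merely wrong: if the room for an MLD report is computed with unsigned arithmetic as "MTU minus headers", an MTU smaller than the headers wraps around to a huge value. The sketch below uses hypothetical helper names; the 40-byte fixed IPv6 header and the 1280-byte IPv6 minimum link MTU are standard values, not taken from the patch.

```c
#include <stddef.h>

#define IPV6_MIN_MTU 1280 /* minimum link MTU required by IPv6 */
#define IPV6_HDR_LEN 40   /* fixed IPv6 header size */

/* Unchecked: wraps around to a huge value when mtu < IPV6_HDR_LEN. */
static size_t model_payload_room_unchecked(unsigned int mtu)
{
	return (size_t)(mtu - IPV6_HDR_LEN);
}

/* Checked: refuse to build anything on a link below the IPv6 minimum. */
static size_t model_payload_room_checked(unsigned int mtu)
{
	if (mtu < IPV6_MIN_MTU)
		return 0;
	return (size_t)mtu - IPV6_HDR_LEN;
}
```

With an MTU of 20 the unchecked helper yields roughly 4 GiB of "room", which is the kind of value that turns into an oversized allocation or an out-of-bounds write further down the stack.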

* Re: BUG REPORT: iproute2 seems to have bug with dsfield/tos in ip-rule and ip-route
From: Stephen Hemminger @ 2017-12-13 18:12 UTC
  To: Daniel Lakeland; +Cc: netdev
In-Reply-To: <9e606c3b-915c-2608-c8aa-aa3167f51f8d@street-artists.org>

On Wed, 13 Dec 2017 09:40:08 -0800
Daniel Lakeland <dlakelan@street-artists.org> wrote:

> This same problem as detailed here
> 
> http://lists.openwall.net/netdev/2010/03/26/36

This mail reports an issue from 7 years ago; much has
changed since then.

> 
> or here:
> 
> https://www.spinics.net/lists/lartc/msg22541.html
> 
> bit me today
> 
> I tried either
> 
> ip rule add dsfield CS6 table 100
> 
> or
> 
> ip rule add dsfield 0xc0 table 100
> 
> or the same commands with tos instead of dsfield; all of them return:
> 
> RTNETLINK answers: Invalid argument
> 
> on the other hand, for ip route it will accept the ds/tos values
> 
> ip route add default dsfield CS6 dev dummy0
> 
> or
> 
> ip route add default dsfield 0xc0 dev dummy0
> 
> but packets tagged with CS6 don't go to dummy0; they go to the regular
> default route.
> 
> 

The kernel is complaining that the ip rule is not valid (i.e. this is not an iproute2 issue).
I am not sure exactly why, or where in fib_rules.c, this is happening.
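One hypothesis consistent with both reports (0xc0, i.e. CS6, rejected; a value like 0x0c accepted) is that the IPv4 rule configuration code in net/ipv4/fib_rules.c rejects any TOS bits outside IPTOS_TOS_MASK (0x1E), the classic four-bit TOS field, which excludes the DSCP class-selector bits. The sketch below only models that assumed check in userspace; it should be verified against the actual kernel source rather than taken as a quote of it.

```c
#include <errno.h>
#include <stdint.h>

/* Same value as IPTOS_TOS_MASK in <netinet/ip.h> and <linux/ip.h>:
 * the four classic TOS bits, binary 0001 1110. */
#define MODEL_TOS_MASK 0x1E

/* Hypothetical model of the rule validation: 0 if accepted, -EINVAL (what
 * "RTNETLINK answers: Invalid argument" reports) if any bit outside the
 * TOS mask is set. */
static int model_rule_tos_check(uint8_t tos)
{
	if (tos & ~MODEL_TOS_MASK)
		return -EINVAL;
	return 0;
}
```

Under this model CS6 (0xc0) fails because both of its set bits lie in the DSCP class-selector range, while 0x0c passes because it only touches bits inside 0x1E.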

* Re: [PATCH v9 0/5] Add the ability to do BPF directed error injection
From: Darrick J. Wong @ 2017-12-13 18:07 UTC
  To: Josef Bacik
  Cc: Masami Hiramatsu, rostedt, mingo, davem, netdev, linux-kernel,
	ast, kernel-team, daniel, linux-btrfs
In-Reply-To: <20171213180356.hsuhzoa7s4ngro2r@destiny>

On Wed, Dec 13, 2017 at 01:03:57PM -0500, Josef Bacik wrote:
> On Tue, Dec 12, 2017 at 03:11:50PM -0800, Darrick J. Wong wrote:
> > On Mon, Dec 11, 2017 at 11:36:45AM -0500, Josef Bacik wrote:
> > ...
> 
> Looks like this is new. Masami, this is happening because of your change here:
> 
> 5bb4fc2d8641 ("kprobes/x86: Disable preemption in ftrace-based jprobes")
> 
> which makes it not do the preempt_enable() if the handler returns 1.  Why is
> that?  Should I be doing preempt_enable_no_resched() from the handler before
> returning 1?  Or is this just an oversight on your part?  Thanks,

FWIW I shut up the preemption imbalance warnings with the attached
coarse bandaid.  No idea if that's the correct fix...

--D

diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 5db8498..fd948e3 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -1215,8 +1215,10 @@ kprobe_perf_func(struct trace_kprobe *tk, struct pt_regs *regs)
 		if (__this_cpu_read(bpf_kprobe_override)) {
 			__this_cpu_write(bpf_kprobe_override, 0);
 			reset_current_kprobe();
+			preempt_enable();
 			return 1;
 		}
+		preempt_enable();
 		if (!ret)
 			return 0;
 	}
> 
> Josef
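The preemption imbalance can be modeled with a plain counter. In this hypothetical sketch (none of the names are the kernel's), the ftrace-based kprobe path disables preemption before calling the handler and, per Josef's reading of commit 5bb4fc2d8641, skips the matching enable when the handler returns 1. Darrick's bandaid is modeled, roughly, as re-enabling preemption before returning 1; whether that is the correct fix is the open question in this thread.

```c
#include <stdbool.h>

/* Hypothetical per-task preemption counter (kernel: preempt_count()). */
static int model_preempt_count;

static void model_preempt_disable(void) { model_preempt_count++; }
static void model_preempt_enable(void)  { model_preempt_count--; }
static bool model_in_atomic(void)       { return model_preempt_count != 0; }

/* Handler that overrides the return path (returns 1). With fix set, it
 * re-enables preemption itself first, like the bandaid above. */
static int model_override_handler(bool fix)
{
	if (fix)
		model_preempt_enable();
	return 1;
}

/* Caller modeled on the described behavior: preemption is disabled around
 * the handler but re-enabled only when the handler returns 0. */
static void model_ftrace_kprobe(bool fix)
{
	model_preempt_disable();
	if (model_override_handler(fix))
		return; /* no preempt_enable() on this path */
	model_preempt_enable();
}
```

Without the fix the counter is left at 1 after an override, which is consistent with the `in_atomic(): 1` and "scheduling while atomic" lines in the splat.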

* Re: [PATCH v9 0/5] Add the ability to do BPF directed error injection
From: Josef Bacik @ 2017-12-13 18:03 UTC
  To: Masami Hiramatsu
  Cc: Josef Bacik, rostedt, mingo, davem, netdev, linux-kernel, ast,
	kernel-team, daniel, linux-btrfs, darrick.wong
In-Reply-To: <20171212231150.GA4894@magnolia>

On Tue, Dec 12, 2017 at 03:11:50PM -0800, Darrick J. Wong wrote:
> On Mon, Dec 11, 2017 at 11:36:45AM -0500, Josef Bacik wrote:
> > This is the same as v8, just rebased onto the bpf tree.
> > 
> > v8->v9:
> > - rebased onto the bpf tree.
> > 
> > v7->v8:
> > - removed the _ASM_KPROBE_ERROR_INJECT since it was not needed.
> > 
> > v6->v7:
> > - moved the opt-in macro to bpf.h out of kprobes.h.
> > 
> > v5->v6:
> > - add BPF_ALLOW_ERROR_INJECTION() tagging for functions that will support this
> >   feature.  This way only functions that opt-in will be allowed to be
> >   overridden.
> > - added a btrfs patch to allow error injection for open_ctree() so that the bpf
> >   sample actually works.
> > 
> > v4->v5:
> > - disallow kprobe_override programs from being put in the prog map array so we
> >   don't tail call into something we didn't check.  This allows us to make the
> >   normal path still fast without a bunch of percpu operations.
> > 
> > v3->v4:
> > - fix a build error found by kbuild test bot (I didn't wait long enough
> >   apparently.)
> > - Added a warning message as per Daniel's suggestion.
> > 
> > v2->v3:
> > - added a ->kprobe_override flag to bpf_prog.
> > - added some sanity checks to disallow attaching bpf progs that have
> >   ->kprobe_override set that aren't for ftrace kprobes.
> > - added the trace_kprobe_ftrace helper to check if the trace_event_call is a
> >   ftrace kprobe.
> > - renamed bpf_kprobe_state to bpf_kprobe_override, fixed it so we only read this
> >   value in the kprobe path, and thus only write to it if we're overriding or
> >   clearing the override.
> > 
> > v1->v2:
> > - moved things around to make sure that bpf_override_return could really only be
> >   used for an ftrace kprobe.
> > - killed the special return values from trace_call_bpf.
> > - renamed pc_modified to bpf_kprobe_state so bpf_override_return could tell if
> >   it was being called from an ftrace kprobe context.
> > - reworked the logic in kprobe_perf_func to take advantage of bpf_kprobe_state.
> > - updated the test as per Alexei's review.
> > 
> > - Original message -
> > 
> > A lot of our error paths are not well tested because we have no good way of
> > injecting errors generically.  Some subsystems (block, memory) have ways to
> > inject errors, but they are random so it's hard to get reproducible results.
> > 
> > With BPF we can add determinism to our error injection.  We can use kprobes and
> > other things to verify we are injecting errors at the exact case we are trying
> > to test.  This patch gives us the tool to actually do the error injection part.
> > It is very simple, we just set the return value of the pt_regs we're given to
> > whatever we provide, and then override the PC with a dummy function that simply
> > returns.
> 
> Heh, this looks cool.  I decided to try it to see what happens, and saw
> a bunch of dmesg pasted in below.  Is that supposed to happen?  Or am I
> the only fs developer still running with lockdep enabled? :)
> 
> It looks like bpf_override_return has some sort of side effect such that
> we get the splat, since commenting it out makes the symptom go away.
> 
> <shrug>
> 
> --D
> 
> [ 1847.769183] BTRFS error (device (null)): open_ctree failed
> [ 1847.770130] BUG: sleeping function called from invalid context at /storage/home/djwong/cdev/work/linux-xfs/kernel/locking/rwsem.c:69
> [ 1847.771976] in_atomic(): 1, irqs_disabled(): 0, pid: 1524, name: mount
> [ 1847.773016] 1 lock held by mount/1524:
> [ 1847.773530]  #0:  (&type->s_umount_key#34/1){+.+.}, at: [<00000000653a9bb4>] sget_userns+0x302/0x4f0
> [ 1847.774731] Preemption disabled at:
> [ 1847.774735] [<          (null)>]           (null)
> [ 1847.777009] CPU: 2 PID: 1524 Comm: mount Tainted: G        W        4.15.0-rc3-xfsx #3
> [ 1847.778800] Call Trace:
> [ 1847.779047]  dump_stack+0x7c/0xbe
> [ 1847.779361]  ___might_sleep+0x1f7/0x260
> [ 1847.779720]  down_write+0x29/0xb0
> [ 1847.780046]  unregister_shrinker+0x15/0x70
> [ 1847.780427]  deactivate_locked_super+0x2e/0x60
> [ 1847.780935]  btrfs_mount+0xbb6/0x1000 [btrfs]
> [ 1847.781353]  ? __lockdep_init_map+0x5c/0x1d0
> [ 1847.781750]  ? mount_fs+0xf/0x80
> [ 1847.782065]  ? alloc_vfsmnt+0x1a1/0x230
> [ 1847.782429]  mount_fs+0xf/0x80
> [ 1847.782733]  vfs_kern_mount+0x62/0x160
> [ 1847.783128]  btrfs_mount+0x3d3/0x1000 [btrfs]
> [ 1847.783493]  ? __lockdep_init_map+0x5c/0x1d0
> [ 1847.783849]  ? __lockdep_init_map+0x5c/0x1d0
> [ 1847.784207]  ? mount_fs+0xf/0x80
> [ 1847.784502]  mount_fs+0xf/0x80
> [ 1847.784835]  vfs_kern_mount+0x62/0x160
> [ 1847.785235]  do_mount+0x1b1/0xd50
> [ 1847.785594]  ? _copy_from_user+0x5b/0x90
> [ 1847.786028]  ? memdup_user+0x4b/0x70
> [ 1847.786501]  SyS_mount+0x85/0xd0
> [ 1847.786835]  entry_SYSCALL_64_fastpath+0x1f/0x96
> [ 1847.787311] RIP: 0033:0x7f6ebecc1b5a
> [ 1847.787691] RSP: 002b:00007ffc7bd1c958 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5
> [ 1847.788383] RAX: ffffffffffffffda RBX: 00007f6ebefba63a RCX: 00007f6ebecc1b5a
> [ 1847.789106] RDX: 0000000000bfd010 RSI: 0000000000bfa230 RDI: 0000000000bfa210
> [ 1847.789807] RBP: 0000000000bfa0f0 R08: 0000000000000000 R09: 0000000000000014
> [ 1847.790511] R10: 00000000c0ed0000 R11: 0000000000000202 R12: 00007f6ebf1ca83c
> [ 1847.791211] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000001
> [ 1847.792029] BUG: scheduling while atomic: mount/1524/0x00000002
> [ 1847.792680] 1 lock held by mount/1524:
> [ 1847.793087]  #0:  (rcu_preempt_state.exp_mutex){+.+.}, at: [<00000000a6c536a9>] _synchronize_rcu_expedited+0x1ce/0x400
> [ 1847.794129] Modules linked in: xfs libcrc32c btrfs xor zstd_decompress zstd_compress xxhash lzo_compress lzo_decompress zlib_deflate raid6_pq dax_pmem device_dax nd_pmem sch_fq_codel af_packet [last unloaded: xfs]
> [ 1847.795949] Preemption disabled at:
> [ 1847.795951] [<          (null)>]           (null)
> [ 1847.796844] CPU: 2 PID: 1524 Comm: mount Tainted: G        W        4.15.0-rc3-xfsx #3
> [ 1847.797621] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-1ubuntu1djwong0 04/01/2014
> [ 1847.798510] Call Trace:
> [ 1847.798786]  dump_stack+0x7c/0xbe
> [ 1847.799134]  __schedule_bug+0x88/0xe0
> [ 1847.799517]  __schedule+0x78c/0xb20
> [ 1847.799890]  ? trace_hardirqs_on_caller+0x119/0x180
> [ 1847.800391]  schedule+0x40/0x90
> [ 1847.800729]  _synchronize_rcu_expedited+0x36b/0x400
> [ 1847.801218]  ? rcu_preempt_qs+0xa0/0xa0
> [ 1847.801616]  ? remove_wait_queue+0x60/0x60
> [ 1847.802040]  ? rcu_preempt_qs+0xa0/0xa0
> [ 1847.802433]  ? rcu_exp_wait_wake+0x630/0x630
> [ 1847.802872]  ? __lock_acquire+0xfb9/0x1120
> [ 1847.803302]  ? __lock_acquire+0x534/0x1120
> [ 1847.803725]  ? bdi_unregister+0x57/0x1a0
> [ 1847.804135]  bdi_unregister+0x5c/0x1a0
> [ 1847.804519]  bdi_put+0xcb/0xe0
> [ 1847.804746]  generic_shutdown_super+0xe2/0x110
> [ 1847.805066]  kill_anon_super+0xe/0x20
> [ 1847.805344]  btrfs_kill_super+0x12/0xa0 [btrfs]
> [ 1847.805664]  deactivate_locked_super+0x34/0x60
> [ 1847.806111]  btrfs_mount+0xbb6/0x1000 [btrfs]
> [ 1847.806476]  ? __lockdep_init_map+0x5c/0x1d0
> [ 1847.806824]  ? mount_fs+0xf/0x80
> [ 1847.807104]  ? alloc_vfsmnt+0x1a1/0x230
> [ 1847.807416]  mount_fs+0xf/0x80
> [ 1847.807712]  vfs_kern_mount+0x62/0x160
> [ 1847.808112]  btrfs_mount+0x3d3/0x1000 [btrfs]
> [ 1847.808565]  ? __lockdep_init_map+0x5c/0x1d0
> [ 1847.809005]  ? __lockdep_init_map+0x5c/0x1d0
> [ 1847.809425]  ? mount_fs+0xf/0x80
> [ 1847.809731]  mount_fs+0xf/0x80
> [ 1847.810070]  vfs_kern_mount+0x62/0x160
> [ 1847.810469]  do_mount+0x1b1/0xd50
> [ 1847.810821]  ? _copy_from_user+0x5b/0x90
> [ 1847.811237]  ? memdup_user+0x4b/0x70
> [ 1847.811622]  SyS_mount+0x85/0xd0
> [ 1847.811996]  entry_SYSCALL_64_fastpath+0x1f/0x96
> [ 1847.812465] RIP: 0033:0x7f6ebecc1b5a
> [ 1847.812840] RSP: 002b:00007ffc7bd1c958 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5
> [ 1847.813615] RAX: ffffffffffffffda RBX: 00007f6ebefba63a RCX: 00007f6ebecc1b5a
> [ 1847.814302] RDX: 0000000000bfd010 RSI: 0000000000bfa230 RDI: 0000000000bfa210
> [ 1847.814770] RBP: 0000000000bfa0f0 R08: 0000000000000000 R09: 0000000000000014
> [ 1847.815246] R10: 00000000c0ed0000 R11: 0000000000000202 R12: 00007f6ebf1ca83c
> [ 1847.815720] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000001
>

Looks like this is new. Masami, this is happening because of your change here:

5bb4fc2d8641 ("kprobes/x86: Disable preemption in ftrace-based jprobes")

which makes it not do the preempt_enable() if the handler returns 1.  Why is
that?  Should I be doing preempt_enable_no_resched() from the handler before
returning 1?  Or is this just an oversight on your part?  Thanks,

Josef


* [PATCH net 4/4] s390/qeth: update takeover IPs after configuration change
From: Julian Wiedmann @ 2017-12-13 17:56 UTC (permalink / raw)
  To: David Miller
  Cc: netdev, linux-s390, Martin Schwidefsky, Heiko Carstens,
	Stefan Raspl, Ursula Braun, Julian Wiedmann
In-Reply-To: <20171213175632.100561-1-jwi@linux.vnet.ibm.com>

Any modification to the takeover IP-ranges requires that we re-evaluate
which IP addresses are takeover-eligible. Otherwise we might do takeover
for some addresses when we no longer should, or vice-versa.

Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
---
 drivers/s390/net/qeth_core.h      |  4 +--
 drivers/s390/net/qeth_core_main.c |  4 +--
 drivers/s390/net/qeth_l3.h        |  2 +-
 drivers/s390/net/qeth_l3_main.c   | 31 +++++++++++++++++--
 drivers/s390/net/qeth_l3_sys.c    | 63 +++++++++++++++++++++------------------
 5 files changed, 67 insertions(+), 37 deletions(-)

diff --git a/drivers/s390/net/qeth_core.h b/drivers/s390/net/qeth_core.h
index 51c618d9fefe..badf42acbf95 100644
--- a/drivers/s390/net/qeth_core.h
+++ b/drivers/s390/net/qeth_core.h
@@ -566,8 +566,8 @@ enum qeth_cq {
 
 struct qeth_ipato {
 	bool enabled;
-	int invert4;
-	int invert6;
+	bool invert4;
+	bool invert6;
 	struct list_head entries;
 };
 
diff --git a/drivers/s390/net/qeth_core_main.c b/drivers/s390/net/qeth_core_main.c
index 8d18675e60e2..6c815207f4f5 100644
--- a/drivers/s390/net/qeth_core_main.c
+++ b/drivers/s390/net/qeth_core_main.c
@@ -1481,8 +1481,8 @@ static int qeth_setup_card(struct qeth_card *card)
 	/* IP address takeover */
 	INIT_LIST_HEAD(&card->ipato.entries);
 	card->ipato.enabled = false;
-	card->ipato.invert4 = 0;
-	card->ipato.invert6 = 0;
+	card->ipato.invert4 = false;
+	card->ipato.invert6 = false;
 	/* init QDIO stuff */
 	qeth_init_qdio_info(card);
 	INIT_DELAYED_WORK(&card->buffer_reclaim_work, qeth_buffer_reclaim_work);
diff --git a/drivers/s390/net/qeth_l3.h b/drivers/s390/net/qeth_l3.h
index 194ae9b577cc..e5833837b799 100644
--- a/drivers/s390/net/qeth_l3.h
+++ b/drivers/s390/net/qeth_l3.h
@@ -82,7 +82,7 @@ void qeth_l3_del_vipa(struct qeth_card *, enum qeth_prot_versions, const u8 *);
 int qeth_l3_add_rxip(struct qeth_card *, enum qeth_prot_versions, const u8 *);
 void qeth_l3_del_rxip(struct qeth_card *card, enum qeth_prot_versions,
 			const u8 *);
-int qeth_l3_is_addr_covered_by_ipato(struct qeth_card *, struct qeth_ipaddr *);
+void qeth_l3_update_ipato(struct qeth_card *card);
 struct qeth_ipaddr *qeth_l3_get_addr_buffer(enum qeth_prot_versions);
 int qeth_l3_add_ip(struct qeth_card *, struct qeth_ipaddr *);
 int qeth_l3_delete_ip(struct qeth_card *, struct qeth_ipaddr *);
diff --git a/drivers/s390/net/qeth_l3_main.c b/drivers/s390/net/qeth_l3_main.c
index 4a4be81800eb..ef0961e18686 100644
--- a/drivers/s390/net/qeth_l3_main.c
+++ b/drivers/s390/net/qeth_l3_main.c
@@ -164,8 +164,8 @@ static void qeth_l3_convert_addr_to_bits(u8 *addr, u8 *bits, int len)
 	}
 }
 
-int qeth_l3_is_addr_covered_by_ipato(struct qeth_card *card,
-						struct qeth_ipaddr *addr)
+static bool qeth_l3_is_addr_covered_by_ipato(struct qeth_card *card,
+					     struct qeth_ipaddr *addr)
 {
 	struct qeth_ipato_entry *ipatoe;
 	u8 addr_bits[128] = {0, };
@@ -606,6 +606,27 @@ int qeth_l3_setrouting_v6(struct qeth_card *card)
 /*
  * IP address takeover related functions
  */
+
+/**
+ * qeth_l3_update_ipato() - Update 'takeover' property, for all NORMAL IPs.
+ *
+ * Caller must hold ip_lock.
+ */
+void qeth_l3_update_ipato(struct qeth_card *card)
+{
+	struct qeth_ipaddr *addr;
+	unsigned int i;
+
+	hash_for_each(card->ip_htable, i, addr, hnode) {
+		if (addr->type != QETH_IP_TYPE_NORMAL)
+			continue;
+		if (qeth_l3_is_addr_covered_by_ipato(card, addr))
+			addr->set_flags |= QETH_IPA_SETIP_TAKEOVER_FLAG;
+		else
+			addr->set_flags &= ~QETH_IPA_SETIP_TAKEOVER_FLAG;
+	}
+}
+
 static void qeth_l3_clear_ipato_list(struct qeth_card *card)
 {
 	struct qeth_ipato_entry *ipatoe, *tmp;
@@ -617,6 +638,7 @@ static void qeth_l3_clear_ipato_list(struct qeth_card *card)
 		kfree(ipatoe);
 	}
 
+	qeth_l3_update_ipato(card);
 	spin_unlock_bh(&card->ip_lock);
 }
 
@@ -641,8 +663,10 @@ int qeth_l3_add_ipato_entry(struct qeth_card *card,
 		}
 	}
 
-	if (!rc)
+	if (!rc) {
 		list_add_tail(&new->entry, &card->ipato.entries);
+		qeth_l3_update_ipato(card);
+	}
 
 	spin_unlock_bh(&card->ip_lock);
 
@@ -665,6 +689,7 @@ void qeth_l3_del_ipato_entry(struct qeth_card *card,
 			    (proto == QETH_PROT_IPV4)? 4:16) &&
 		    (ipatoe->mask_bits == mask_bits)) {
 			list_del(&ipatoe->entry);
+			qeth_l3_update_ipato(card);
 			kfree(ipatoe);
 		}
 	}
diff --git a/drivers/s390/net/qeth_l3_sys.c b/drivers/s390/net/qeth_l3_sys.c
index aa676b4090da..6ea2b528a64e 100644
--- a/drivers/s390/net/qeth_l3_sys.c
+++ b/drivers/s390/net/qeth_l3_sys.c
@@ -370,9 +370,8 @@ static ssize_t qeth_l3_dev_ipato_enable_store(struct device *dev,
 		struct device_attribute *attr, const char *buf, size_t count)
 {
 	struct qeth_card *card = dev_get_drvdata(dev);
-	struct qeth_ipaddr *addr;
-	int i, rc = 0;
 	bool enable;
+	int rc = 0;
 
 	if (!card)
 		return -EINVAL;
@@ -391,20 +390,12 @@ static ssize_t qeth_l3_dev_ipato_enable_store(struct device *dev,
 		goto out;
 	}
 
-	if (card->ipato.enabled == enable)
-		goto out;
-	card->ipato.enabled = enable;
-
-	spin_lock_bh(&card->ip_lock);
-	hash_for_each(card->ip_htable, i, addr, hnode) {
-		if (addr->type != QETH_IP_TYPE_NORMAL)
-			continue;
-		if (!enable)
-			addr->set_flags &= ~QETH_IPA_SETIP_TAKEOVER_FLAG;
-		else if (qeth_l3_is_addr_covered_by_ipato(card, addr))
-			addr->set_flags |= QETH_IPA_SETIP_TAKEOVER_FLAG;
+	if (card->ipato.enabled != enable) {
+		card->ipato.enabled = enable;
+		spin_lock_bh(&card->ip_lock);
+		qeth_l3_update_ipato(card);
+		spin_unlock_bh(&card->ip_lock);
 	}
-	spin_unlock_bh(&card->ip_lock);
 out:
 	mutex_unlock(&card->conf_mutex);
 	return rc ? rc : count;
@@ -430,20 +421,27 @@ static ssize_t qeth_l3_dev_ipato_invert4_store(struct device *dev,
 				const char *buf, size_t count)
 {
 	struct qeth_card *card = dev_get_drvdata(dev);
+	bool invert;
 	int rc = 0;
 
 	if (!card)
 		return -EINVAL;
 
 	mutex_lock(&card->conf_mutex);
-	if (sysfs_streq(buf, "toggle"))
-		card->ipato.invert4 = (card->ipato.invert4)? 0 : 1;
-	else if (sysfs_streq(buf, "1"))
-		card->ipato.invert4 = 1;
-	else if (sysfs_streq(buf, "0"))
-		card->ipato.invert4 = 0;
-	else
+	if (sysfs_streq(buf, "toggle")) {
+		invert = !card->ipato.invert4;
+	} else if (kstrtobool(buf, &invert)) {
 		rc = -EINVAL;
+		goto out;
+	}
+
+	if (card->ipato.invert4 != invert) {
+		card->ipato.invert4 = invert;
+		spin_lock_bh(&card->ip_lock);
+		qeth_l3_update_ipato(card);
+		spin_unlock_bh(&card->ip_lock);
+	}
+out:
 	mutex_unlock(&card->conf_mutex);
 	return rc ? rc : count;
 }
@@ -609,20 +607,27 @@ static ssize_t qeth_l3_dev_ipato_invert6_store(struct device *dev,
 		struct device_attribute *attr, const char *buf, size_t count)
 {
 	struct qeth_card *card = dev_get_drvdata(dev);
+	bool invert;
 	int rc = 0;
 
 	if (!card)
 		return -EINVAL;
 
 	mutex_lock(&card->conf_mutex);
-	if (sysfs_streq(buf, "toggle"))
-		card->ipato.invert6 = (card->ipato.invert6)? 0 : 1;
-	else if (sysfs_streq(buf, "1"))
-		card->ipato.invert6 = 1;
-	else if (sysfs_streq(buf, "0"))
-		card->ipato.invert6 = 0;
-	else
+	if (sysfs_streq(buf, "toggle")) {
+		invert = !card->ipato.invert6;
+	} else if (kstrtobool(buf, &invert)) {
 		rc = -EINVAL;
+		goto out;
+	}
+
+	if (card->ipato.invert6 != invert) {
+		card->ipato.invert6 = invert;
+		spin_lock_bh(&card->ip_lock);
+		qeth_l3_update_ipato(card);
+		spin_unlock_bh(&card->ip_lock);
+	}
+out:
 	mutex_unlock(&card->conf_mutex);
 	return rc ? rc : count;
 }
-- 
2.13.5

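The re-evaluation that qeth_l3_update_ipato() performs after every takeover configuration change can be sketched in userspace. This is a simplified model under stated assumptions, not the driver code. The structs, TAKEOVER_FLAG, prefix_match(), covered_by_ipato() and update_ipato() are hypothetical stand-ins. Only IPv4 (invert4) is modelled, and the ip_lock that the driver requires around the update is left out. As in the series, only NORMAL addresses are re-evaluated, so an RXIP entry keeps its flag (patch 2/4).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the qeth structures (IPv4 only). */
enum ip_type { IP_TYPE_NORMAL, IP_TYPE_VIPA, IP_TYPE_RXIP };

#define TAKEOVER_FLAG 0x1u /* stands in for QETH_IPA_SETIP_TAKEOVER_FLAG */

struct ipato_range {
	uint32_t addr;      /* IPv4 prefix, host byte order */
	unsigned mask_bits; /* prefix length, 0..32 */
};

struct ipato_cfg {
	bool enabled;
	bool invert4;
	const struct ipato_range *ranges;
	size_t nranges;
};

struct ipaddr {
	uint32_t addr;      /* host byte order */
	enum ip_type type;
	unsigned set_flags;
};

/* True if a and b share the first 'bits' bits. */
static bool prefix_match(uint32_t a, uint32_t b, unsigned bits)
{
	uint32_t mask;

	assert(bits <= 32);
	mask = bits ? UINT32_MAX << (32 - bits) : 0;
	return (a & mask) == (b & mask);
}

/* Same shape as qeth_l3_is_addr_covered_by_ipato() after patch 2/4. */
static bool covered_by_ipato(const struct ipato_cfg *cfg,
			     const struct ipaddr *ip)
{
	bool rc = false;

	if (!cfg->enabled || ip->type != IP_TYPE_NORMAL)
		return false;
	for (size_t i = 0; i < cfg->nranges && !rc; i++)
		rc = prefix_match(ip->addr, cfg->ranges[i].addr,
				  cfg->ranges[i].mask_bits);
	return cfg->invert4 ? !rc : rc;
}

/* Recompute the takeover flag of every NORMAL address from scratch. */
static void update_ipato(const struct ipato_cfg *cfg, struct ipaddr *ips,
			 size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (ips[i].type != IP_TYPE_NORMAL)
			continue; /* RXIP/VIPA flags are left untouched */
		if (covered_by_ipato(cfg, &ips[i]))
			ips[i].set_flags |= TAKEOVER_FLAG;
		else
			ips[i].set_flags &= ~TAKEOVER_FLAG;
	}
}
```

The flag is recomputed from the whole current configuration rather than patched incrementally. That lets one helper serve the enable, invert, and range add/remove paths alike, which appears to be the idea behind patch 4/4.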

* [PATCH net 3/4] s390/qeth: lock IP table while applying takeover changes
From: Julian Wiedmann @ 2017-12-13 17:56 UTC (permalink / raw)
  To: David Miller
  Cc: netdev, linux-s390, Martin Schwidefsky, Heiko Carstens,
	Stefan Raspl, Ursula Braun, Julian Wiedmann
In-Reply-To: <20171213175632.100561-1-jwi@linux.vnet.ibm.com>

Modifying the flags of an IP address object must be protected against,
e.g., concurrent removal of the same object from the IP table.

Fixes: 5f78e29ceebf ("qeth: optimize IP handling in rx_mode callback")
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
---
 drivers/s390/net/qeth_l3_sys.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/s390/net/qeth_l3_sys.c b/drivers/s390/net/qeth_l3_sys.c
index e256928092e5..aa676b4090da 100644
--- a/drivers/s390/net/qeth_l3_sys.c
+++ b/drivers/s390/net/qeth_l3_sys.c
@@ -395,6 +395,7 @@ static ssize_t qeth_l3_dev_ipato_enable_store(struct device *dev,
 		goto out;
 	card->ipato.enabled = enable;
 
+	spin_lock_bh(&card->ip_lock);
 	hash_for_each(card->ip_htable, i, addr, hnode) {
 		if (addr->type != QETH_IP_TYPE_NORMAL)
 			continue;
@@ -403,6 +404,7 @@ static ssize_t qeth_l3_dev_ipato_enable_store(struct device *dev,
 		else if (qeth_l3_is_addr_covered_by_ipato(card, addr))
 			addr->set_flags |= QETH_IPA_SETIP_TAKEOVER_FLAG;
 	}
+	spin_unlock_bh(&card->ip_lock);
 out:
 	mutex_unlock(&card->conf_mutex);
 	return rc ? rc : count;
-- 
2.13.5


* [PATCH net 2/4] s390/qeth: don't apply takeover changes to RXIP
From: Julian Wiedmann @ 2017-12-13 17:56 UTC (permalink / raw)
  To: David Miller
  Cc: netdev, linux-s390, Martin Schwidefsky, Heiko Carstens,
	Stefan Raspl, Ursula Braun, Julian Wiedmann
In-Reply-To: <20171213175632.100561-1-jwi@linux.vnet.ibm.com>

When takeover is switched off, the current code clears the 'TAKEOVER' flag
on all IPs. But the flag is also used for RXIP addresses, and those should
not be affected by the takeover mode.
Fix the behaviour by consistently applying takeover logic to NORMAL
addresses only.

Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
---
 drivers/s390/net/qeth_l3_main.c | 5 +++--
 drivers/s390/net/qeth_l3_sys.c  | 5 +++--
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/s390/net/qeth_l3_main.c b/drivers/s390/net/qeth_l3_main.c
index 6a73894b0cb5..4a4be81800eb 100644
--- a/drivers/s390/net/qeth_l3_main.c
+++ b/drivers/s390/net/qeth_l3_main.c
@@ -174,6 +174,8 @@ int qeth_l3_is_addr_covered_by_ipato(struct qeth_card *card,
 
 	if (!card->ipato.enabled)
 		return 0;
+	if (addr->type != QETH_IP_TYPE_NORMAL)
+		return 0;
 
 	qeth_l3_convert_addr_to_bits((u8 *) &addr->u, addr_bits,
 				  (addr->proto == QETH_PROT_IPV4)? 4:16);
@@ -290,8 +292,7 @@ int qeth_l3_add_ip(struct qeth_card *card, struct qeth_ipaddr *tmp_addr)
 		memcpy(addr, tmp_addr, sizeof(struct qeth_ipaddr));
 		addr->ref_counter = 1;
 
-		if (addr->type == QETH_IP_TYPE_NORMAL  &&
-				qeth_l3_is_addr_covered_by_ipato(card, addr)) {
+		if (qeth_l3_is_addr_covered_by_ipato(card, addr)) {
 			QETH_CARD_TEXT(card, 2, "tkovaddr");
 			addr->set_flags |= QETH_IPA_SETIP_TAKEOVER_FLAG;
 		}
diff --git a/drivers/s390/net/qeth_l3_sys.c b/drivers/s390/net/qeth_l3_sys.c
index 198717f71b3d..e256928092e5 100644
--- a/drivers/s390/net/qeth_l3_sys.c
+++ b/drivers/s390/net/qeth_l3_sys.c
@@ -396,10 +396,11 @@ static ssize_t qeth_l3_dev_ipato_enable_store(struct device *dev,
 	card->ipato.enabled = enable;
 
 	hash_for_each(card->ip_htable, i, addr, hnode) {
+		if (addr->type != QETH_IP_TYPE_NORMAL)
+			continue;
 		if (!enable)
 			addr->set_flags &= ~QETH_IPA_SETIP_TAKEOVER_FLAG;
-		else if (addr->type == QETH_IP_TYPE_NORMAL &&
-			 qeth_l3_is_addr_covered_by_ipato(card, addr))
+		else if (qeth_l3_is_addr_covered_by_ipato(card, addr))
 			addr->set_flags |= QETH_IPA_SETIP_TAKEOVER_FLAG;
 	}
 out:
-- 
2.13.5


* [PATCH net 1/4] s390/qeth: apply takeover changes when mode is toggled
From: Julian Wiedmann @ 2017-12-13 17:56 UTC (permalink / raw)
  To: David Miller
  Cc: netdev, linux-s390, Martin Schwidefsky, Heiko Carstens,
	Stefan Raspl, Ursula Braun, Julian Wiedmann
In-Reply-To: <20171213175632.100561-1-jwi@linux.vnet.ibm.com>

Just as with an explicit enable/disable, toggling the takeover mode also
requires that the IP addresses get updated. Otherwise all IPs that were
added to the table before the mode toggle get registered with the old
settings.

Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
---
 drivers/s390/net/qeth_core.h      |  2 +-
 drivers/s390/net/qeth_core_main.c |  2 +-
 drivers/s390/net/qeth_l3_sys.c    | 35 +++++++++++++++++------------------
 3 files changed, 19 insertions(+), 20 deletions(-)

diff --git a/drivers/s390/net/qeth_core.h b/drivers/s390/net/qeth_core.h
index 15015a24f8ad..51c618d9fefe 100644
--- a/drivers/s390/net/qeth_core.h
+++ b/drivers/s390/net/qeth_core.h
@@ -565,7 +565,7 @@ enum qeth_cq {
 };
 
 struct qeth_ipato {
-	int enabled;
+	bool enabled;
 	int invert4;
 	int invert6;
 	struct list_head entries;
diff --git a/drivers/s390/net/qeth_core_main.c b/drivers/s390/net/qeth_core_main.c
index 430e3214f7e2..8d18675e60e2 100644
--- a/drivers/s390/net/qeth_core_main.c
+++ b/drivers/s390/net/qeth_core_main.c
@@ -1480,7 +1480,7 @@ static int qeth_setup_card(struct qeth_card *card)
 	qeth_set_intial_options(card);
 	/* IP address takeover */
 	INIT_LIST_HEAD(&card->ipato.entries);
-	card->ipato.enabled = 0;
+	card->ipato.enabled = false;
 	card->ipato.invert4 = 0;
 	card->ipato.invert6 = 0;
 	/* init QDIO stuff */
diff --git a/drivers/s390/net/qeth_l3_sys.c b/drivers/s390/net/qeth_l3_sys.c
index bd12fdf678be..198717f71b3d 100644
--- a/drivers/s390/net/qeth_l3_sys.c
+++ b/drivers/s390/net/qeth_l3_sys.c
@@ -372,6 +372,7 @@ static ssize_t qeth_l3_dev_ipato_enable_store(struct device *dev,
 	struct qeth_card *card = dev_get_drvdata(dev);
 	struct qeth_ipaddr *addr;
 	int i, rc = 0;
+	bool enable;
 
 	if (!card)
 		return -EINVAL;
@@ -384,25 +385,23 @@ static ssize_t qeth_l3_dev_ipato_enable_store(struct device *dev,
 	}
 
 	if (sysfs_streq(buf, "toggle")) {
-		card->ipato.enabled = (card->ipato.enabled)? 0 : 1;
-	} else if (sysfs_streq(buf, "1")) {
-		card->ipato.enabled = 1;
-		hash_for_each(card->ip_htable, i, addr, hnode) {
-				if ((addr->type == QETH_IP_TYPE_NORMAL) &&
-				qeth_l3_is_addr_covered_by_ipato(card, addr))
-					addr->set_flags |=
-					QETH_IPA_SETIP_TAKEOVER_FLAG;
-			}
-	} else if (sysfs_streq(buf, "0")) {
-		card->ipato.enabled = 0;
-		hash_for_each(card->ip_htable, i, addr, hnode) {
-			if (addr->set_flags &
-			QETH_IPA_SETIP_TAKEOVER_FLAG)
-				addr->set_flags &=
-				~QETH_IPA_SETIP_TAKEOVER_FLAG;
-			}
-	} else
+		enable = !card->ipato.enabled;
+	} else if (kstrtobool(buf, &enable)) {
 		rc = -EINVAL;
+		goto out;
+	}
+
+	if (card->ipato.enabled == enable)
+		goto out;
+	card->ipato.enabled = enable;
+
+	hash_for_each(card->ip_htable, i, addr, hnode) {
+		if (!enable)
+			addr->set_flags &= ~QETH_IPA_SETIP_TAKEOVER_FLAG;
+		else if (addr->type == QETH_IP_TYPE_NORMAL &&
+			 qeth_l3_is_addr_covered_by_ipato(card, addr))
+			addr->set_flags |= QETH_IPA_SETIP_TAKEOVER_FLAG;
+	}
 out:
 	mutex_unlock(&card->conf_mutex);
 	return rc ? rc : count;
-- 
2.13.5

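Patch 1/4 replaces the hand-rolled "1"/"0" parsing with kstrtobool() plus a special case for "toggle", and it only walks the IP table when the value actually changes. The sketch below models that decision logic in userspace. sysfs_match() and parse_bool() are local re-implementations written for illustration, modelled on the kernel's sysfs_streq() and kstrtobool(). ipato_enable_store() is a hypothetical helper, not the driver function. One side effect: kstrtobool() also accepts forms such as "y", "n", "on" and "off", so the attribute becomes more permissive than before.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Modelled on sysfs_streq(): equal, ignoring one trailing newline. */
static bool sysfs_match(const char *s1, const char *s2)
{
	while (*s1 && *s1 == *s2) {
		s1++;
		s2++;
	}
	if (*s1 == *s2)
		return true;
	if (!*s1 && *s2 == '\n' && !s2[1])
		return true;
	if (*s1 == '\n' && !s1[1] && !*s2)
		return true;
	return false;
}

/* Modelled on kstrtobool(): only the first one or two characters matter. */
static int parse_bool(const char *s, bool *res)
{
	switch (s[0]) {
	case 'y': case 'Y': case '1':
		*res = true;
		return 0;
	case 'n': case 'N': case '0':
		*res = false;
		return 0;
	case 'o': case 'O':
		if (s[1] == 'n' || s[1] == 'N') {
			*res = true;
			return 0;
		}
		if (s[1] == 'f' || s[1] == 'F') {
			*res = false;
			return 0;
		}
		break;
	}
	return -EINVAL;
}

/* Hypothetical store logic: returns 0 or -EINVAL and reports whether
 * the IP table would have to be re-evaluated (under ip_lock). */
static int ipato_enable_store(bool *enabled, const char *buf,
			      bool *need_update)
{
	bool enable;

	*need_update = false;
	if (sysfs_match(buf, "toggle"))
		enable = !*enabled;
	else if (parse_bool(buf, &enable))
		return -EINVAL;

	if (*enabled != enable) {
		*enabled = enable;
		*need_update = true;
	}
	return 0;
}
```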

* [PATCH net 0/4] s390/qeth: fixes 2017-12-13
From: Julian Wiedmann @ 2017-12-13 17:56 UTC (permalink / raw)
  To: David Miller
  Cc: netdev, linux-s390, Martin Schwidefsky, Heiko Carstens,
	Stefan Raspl, Ursula Braun, Julian Wiedmann

Hi Dave,

some more patches for 4.15 that fix multiple issues with IP Takeover
configuration in qeth.
Please queue them up for stable kernels as well (4.9 and newer).

Thanks,
Julian


Julian Wiedmann (4):
  s390/qeth: apply takeover changes when mode is toggled
  s390/qeth: don't apply takeover changes to RXIP
  s390/qeth: lock IP table while applying takeover changes
  s390/qeth: update takeover IPs after configuration change

 drivers/s390/net/qeth_core.h      |  6 ++--
 drivers/s390/net/qeth_core_main.c |  6 ++--
 drivers/s390/net/qeth_l3.h        |  2 +-
 drivers/s390/net/qeth_l3_main.c   | 36 ++++++++++++++++---
 drivers/s390/net/qeth_l3_sys.c    | 75 +++++++++++++++++++++------------------
 5 files changed, 79 insertions(+), 46 deletions(-)

-- 
2.13.5


* Re: [PATCH] ipv6: ip6mr: Recalc UDP checksum before forwarding
From: Eric Dumazet @ 2017-12-13 17:52 UTC (permalink / raw)
  To: Brendan McGrath, David S . Miller, netdev, linux-kernel
In-Reply-To: <1513164048-21368-1-git-send-email-redmcg@redmandi.dyndns.org>

On Wed, 2017-12-13 at 22:20 +1100, Brendan McGrath wrote:
> Currently, when forwarding from a Virtual Interface to a Physical
> Interface, ip_summed is set to a value of CHECKSUM_UNNECESSARY and
> the UDP checksum has not been calculated.
> 

This seems like a bug then?
CHECKSUM_UNNECESSARY means the checksum has been validated,
not that we want it to be computed later in the stack.

If we force a checksum here, what guarantee do we have that the packet
was not corrupted before we do this?


* Re: [RFC][PATCH] new byteorder primitives - ..._{replace,get}_bits()
From: Al Viro @ 2017-12-13 17:45 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: Linus Torvalds, netdev, linux-kernel
In-Reply-To: <20171213142212.GD21978@ZenIV.linux.org.uk>

On Wed, Dec 13, 2017 at 02:22:12PM +0000, Al Viro wrote:

> Next question: where do we put that bunch?  I've put it into
> linux/byteorder/generic.h, so that anything picking fixed-endian primitives
> would pick those as well; I hadn't thought of linux/bitfield.h at the time.
> We certainly could put it there instead - it's never pulled by other headers,
> so adding #include <asm/byteorder.h> into linux/bitfield.h is not going to
> cause header order problems.  Not sure...
> 
> Linus, do you have any preferences in that area?

After looking at some of the callers of bitfield.h stuff: it might be useful
to add

static inline void le64p_replace_bits(__le64 *p, u64 v, u64 mask)
{
	__le64 m = cpu_to_le64(mask);
	*p = (*p & ~m) | (cpu_to_le64(v * mask_to_multiplier(mask)) & m);
}

and similar for other types.  Not sure what would be a good name for
host-endian variants - u64p_replace_bits() sounds a bit clumsy.  Suggestions?

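The trick in the proposed helper is that multiplying a value by the lowest set bit of the mask shifts it into the field's position, so no explicit shift count is needed. Below is a host-endian sketch of that arithmetic. lowest_bit(), u64p_replace_bits() and u64_get_bits() are illustrative names; the first is a stand-in for the RFC's mask_to_multiplier(), which is assumed here to return the mask's lowest set bit. A fixed-endian variant would apply the same expression after converting with cpu_to_le64(), as in the snippet above.

```c
#include <assert.h>
#include <stdint.h>

/* Lowest set bit of a non-zero mask, e.g. 0x0ff0 -> 0x0010. */
static inline uint64_t lowest_bit(uint64_t mask)
{
	assert(mask != 0);
	return mask & (~mask + 1);
}

/* Replace the field selected by mask in *p with v (host byte order);
 * bits of v that do not fit in the field are discarded by the mask. */
static inline void u64p_replace_bits(uint64_t *p, uint64_t v, uint64_t mask)
{
	*p = (*p & ~mask) | ((v * lowest_bit(mask)) & mask);
}

/* Extract the field selected by mask, shifted down to bit 0. */
static inline uint64_t u64_get_bits(uint64_t x, uint64_t mask)
{
	return (x & mask) / lowest_bit(mask);
}
```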

* BUG REPORT: iproute2 seems to have bug with dsfield/tos in ip-rule and ip-route
From: Daniel Lakeland @ 2017-12-13 17:40 UTC (permalink / raw)
  To: netdev

The same problem as detailed here

http://lists.openwall.net/netdev/2010/03/26/36

or here:

https://www.spinics.net/lists/lartc/msg22541.html

bit me today.

I tried either

ip rule add dsfield CS6 table 100

or

ip rule add dsfield 0xc0 table 100

Replacing dsfield with tos gives the same result; all of these return:

RTNETLINK answers: Invalid argument

On the other hand, ip route accepts the dsfield/tos values:

ip route add default dsfield CS6 dev dummy0

or

ip route add default dsfield 0xc0 dev dummy0

but packets tagged with CS6 don't go to dummy0; they go to the regular
default route.

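For reference, CS6 and 0xc0 are the same value written two ways. Class Selector n has DSCP value n << 3 (RFC 2474), and the DSCP occupies the upper six bits of the old TOS byte. The sketch below shows that arithmetic. The legacy_tos_ok() check is an assumption not confirmed in this report: it supposes the IPv4 rule code validates the TOS byte against the legacy RFC 1349 TOS mask (0x1e, as in IPTOS_TOS_MASK). If that is so, a DSCP-only value such as 0xc0 would be rejected with EINVAL. All function names here are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Class Selector codepoints (RFC 2474): CSn has DSCP value n << 3. */
static uint8_t cs_to_dscp(unsigned cls)
{
	assert(cls <= 7);
	return (uint8_t)(cls << 3);
}

/* The DSCP occupies the upper six bits of the TOS/traffic-class byte. */
static uint8_t dscp_to_tos_byte(uint8_t dscp)
{
	return (uint8_t)(dscp << 2);
}

/* Assumption: a validation shaped like the legacy IPv4 TOS check,
 * which only admits the four RFC 1349 TOS bits. */
#define LEGACY_TOS_MASK 0x1e
static bool legacy_tos_ok(uint8_t tos)
{
	return (tos & ~LEGACY_TOS_MASK) == 0;
}
```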

* Re: [patch net-next v3 00/10] net: sched: allow qdiscs to share filter block instances
From: Jiri Pirko @ 2017-12-13 17:39 UTC (permalink / raw)
  To: David Ahern
  Cc: netdev, davem, jhs, xiyou.wangcong, mlxsw, andrew, vivien.didelot,
	f.fainelli, michael.chan, ganeshgr, saeedm, matanb, leonro,
	idosch, jakub.kicinski, simon.horman, pieter.jansenvanvuuren,
	john.hurley, alexander.h.duyck, ogerlitz, john.fastabend, daniel
In-Reply-To: <90bf2450-a21c-9f70-2dc3-b147d0c40740@gmail.com>

Wed, Dec 13, 2017 at 06:18:04PM CET, dsahern@gmail.com wrote:
>On 12/13/17 10:07 AM, Jiri Pirko wrote:
>> Wed, Dec 13, 2017 at 05:54:35PM CET, dsahern@gmail.com wrote:
>>> On 12/13/17 8:10 AM, Jiri Pirko wrote:
>>>> So back to the example. First, we create 2 qdiscs. Both will share
>>>> block number 22. "22" is just an identification. If we don't pass any
>>>> block number, a new one will be generated by kernel:
>>>>
>>>> $ tc qdisc add dev ens7 ingress block 22
>>>>                                 ^^^^^^^^
>>>> $ tc qdisc add dev ens8 ingress block 22
>>>>                                 ^^^^^^^^
>>>>
>>>> Now if we list the qdiscs, we will see the block index in the output:
>>>>
>>>> $ tc qdisc
>>>> qdisc ingress ffff: dev ens7 parent ffff:fff1 block 22
>>>> qdisc ingress ffff: dev ens8 parent ffff:fff1 block 22
>>>>
>>>> To make it more visual, the situation looks like this:
>>>>
>>>>    ens7 ingress qdisc                 ens8 ingress qdisc
>>>>           |                                  |
>>>>           |                                  |
>>>>           +---------->  block 22  <----------+
>>>>
>>>> Any number of qdiscs may share the same block.
>>>>
>>>> Now we can add a filter to any of the qdiscs sharing the same block:
>>>>
>>>> $ tc filter add dev ens7 ingress protocol ip pref 25 flower dst_ip 192.168.0.0/16 action drop
>>>
>>> I still say these are very odd user semantics: making changes to device M
>>> magically affects device N. Operating on the shared block as a separate
>>> object makes it much more direct and clear.
>> 
>> I plan to do it as a follow-up patch. But this is how things are done
>> now and have to continue to work.
>
>Why is that? You are introducing the notion of a shared block with this
>patch set. What is the legacy "how things are done now" you are
>referring to?

Well, filter add/del should just work regardless of whether the block
behind it is shared or not.


>
>> Also, changes made to block X via dev A have to appear in block X
>> for dev B. Block X is shared between A and B.
>> 
>
>Certainly, that's the definition of a shared block, and you are
>referring to display and datapath. For admin, it is more direct and
>apparent what is happening if changes (filter adds and deletes) are
>required to specify the shared block as the primary
>object.

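The sharing semantics under discussion can be sketched as a small userspace model. Each ingress qdisc holds a pointer to a block looked up by its user-visible index, so a filter added "via" one device lands in the block and is visible from every qdisc bound to it. All names here (struct block, block_get(), qdisc_add(), filter_add()) are hypothetical. Index 0 is reserved to mean "unused slot", and the kernel's automatic index allocation is not modelled.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_BLOCKS  8
#define MAX_FILTERS 8

struct block {
	unsigned index;    /* user-visible id, e.g. 22; 0 = slot unused */
	unsigned refcnt;   /* number of qdiscs bound to this block */
	unsigned nfilters;
	unsigned prefs[MAX_FILTERS];
};

struct qdisc {
	const char *dev;
	struct block *block;
};

static struct block blocks[MAX_BLOCKS];

/* Find a block by index or create it; each binding takes a reference. */
static struct block *block_get(unsigned index)
{
	struct block *free_slot = NULL;

	assert(index != 0);
	for (size_t i = 0; i < MAX_BLOCKS; i++) {
		if (blocks[i].index == index) {
			blocks[i].refcnt++;
			return &blocks[i];
		}
		if (!blocks[i].index && !free_slot)
			free_slot = &blocks[i];
	}
	if (!free_slot)
		return NULL;
	free_slot->index = index;
	free_slot->refcnt = 1;
	free_slot->nfilters = 0;
	return free_slot;
}

/* Models "tc qdisc add dev DEV ingress block N". */
static int qdisc_add(struct qdisc *q, const char *dev, unsigned block_index)
{
	q->dev = dev;
	q->block = block_get(block_index);
	return q->block ? 0 : -1;
}

/* Models "tc filter add dev DEV ...": the filter goes into the block. */
static int filter_add(struct qdisc *q, unsigned pref)
{
	struct block *b = q->block;

	if (b->nfilters == MAX_FILTERS)
		return -1;
	b->prefs[b->nfilters++] = pref;
	return 0;
}
```

In this model filter_add() takes a qdisc, which mirrors the current "dev X" syntax. The alternative David argues for would instead take the block index directly.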

* [PATCH 2/3] net: phy: select sensible mode for non-autoneg PHYs on startup
From: Lucas Stach @ 2017-12-13 17:37 UTC (permalink / raw)
  To: Andrew Lunn, Florian Fainelli; +Cc: netdev, kernel, patchwork-lst
In-Reply-To: <20171213173751.12722-1-l.stach@pengutronix.de>

Initialize speed and duplex to unknown, so that phy_lookup_setting() knows
it should select the mode based only on the link modes the PHY allows.

Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
---
 drivers/net/phy/phy-core.c   | 5 +++++
 drivers/net/phy/phy_device.c | 2 ++
 2 files changed, 7 insertions(+)

diff --git a/drivers/net/phy/phy-core.c b/drivers/net/phy/phy-core.c
index 21f75ae244b3..149a4bab1e6f 100644
--- a/drivers/net/phy/phy-core.c
+++ b/drivers/net/phy/phy-core.c
@@ -149,6 +149,11 @@ phy_lookup_setting(int speed, int duplex, const unsigned long *mask,
 	const struct phy_setting *p, *match = NULL, *last = NULL;
 	int i;
 
+	if (!exact && speed == SPEED_UNKNOWN)
+		speed = INT_MAX;
+	if (!exact && duplex == DUPLEX_UNKNOWN)
+		duplex = DUPLEX_FULL;
+
 	for (i = 0, p = settings; i < ARRAY_SIZE(settings); i++, p++) {
 		if (p->bit < maxbit && test_bit(p->bit, mask)) {
 			last = p;
diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index 8ef48b38d97b..35278282259a 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -1785,6 +1785,8 @@ static int phy_probe(struct device *dev)
 	phydev->supported = phydrv->features;
 	of_set_phy_supported(phydev);
 	phydev->advertising = phydev->supported;
+	phydev->speed = SPEED_UNKNOWN;
+	phydev->duplex = DUPLEX_UNKNOWN;
 
 	/* Get the EEE modes we want to prohibit. We will ask
 	 * the PHY stop advertising these mode later on
-- 
2.11.0

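To see why initialising speed and duplex to unknown matters, here is a simplified userspace model of the non-exact path of phy_lookup_setting(). It is written from phylib's approach of that era rather than copied from it. The settings table, the bit numbering and the apply_fix switch are assumptions for illustration. Without the patch, an unknown speed of -1 never satisfies p->speed <= speed, so the lookup falls back to the last, i.e. slowest, supported mode. With the patch, the fastest supported mode is chosen.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define SPEED_UNKNOWN  (-1)
#define DUPLEX_HALF    0
#define DUPLEX_FULL    1
#define DUPLEX_UNKNOWN 0xff

struct setting { int speed; int duplex; unsigned bit; };

/* Sorted fastest first, full duplex before half. */
static const struct setting settings[] = {
	{ 1000, DUPLEX_FULL, 0 }, { 1000, DUPLEX_HALF, 1 },
	{ 100,  DUPLEX_FULL, 2 }, { 100,  DUPLEX_HALF, 3 },
	{ 10,   DUPLEX_FULL, 4 }, { 10,   DUPLEX_HALF, 5 },
};

static const struct setting *lookup(int speed, int duplex, unsigned mask,
				    bool exact, bool apply_fix)
{
	const struct setting *p, *match = NULL, *last = NULL;

	if (apply_fix && !exact && speed == SPEED_UNKNOWN)
		speed = INT_MAX;
	if (apply_fix && !exact && duplex == DUPLEX_UNKNOWN)
		duplex = DUPLEX_FULL;

	for (size_t i = 0; i < sizeof(settings) / sizeof(settings[0]); i++) {
		p = &settings[i];
		if (!(mask & (1u << p->bit)))
			continue;        /* mode not supported by this PHY */
		last = p;
		if (p->speed == speed && p->duplex == duplex) {
			match = p;       /* exact match */
			break;
		} else if (!exact) {
			if (!match && p->speed <= speed)
				match = p; /* fastest mode not above request */
			if (p->speed < speed)
				break;
		}
	}
	if (!match && !exact)
		match = last;            /* fall back to the slowest mode */
	return match;
}
```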

* [PATCH 1/3] net: phy: add support to detect 100BASE-T1 capability
From: Lucas Stach @ 2017-12-13 17:37 UTC (permalink / raw)
  To: Andrew Lunn, Florian Fainelli; +Cc: netdev, kernel, patchwork-lst

100BASE-T1 is the automotive Ethernet standard 802.3bw-2015. Currently
we don't detect any valid modes for PHYs that support only this
standard. Add support to detect the common 100Mbit full-duplex mode.

Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
---
 drivers/net/phy/phy_device.c | 2 ++
 include/uapi/linux/mii.h     | 1 +
 2 files changed, 3 insertions(+)

diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index 67f25ac29025..8ef48b38d97b 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -1607,6 +1607,8 @@ int genphy_config_init(struct phy_device *phydev)
 		if (val < 0)
 			return val;
 
+		if (val & ESTATUS_100T1_FULL)
+			features |= SUPPORTED_100baseT_Full;
 		if (val & ESTATUS_1000_TFULL)
 			features |= SUPPORTED_1000baseT_Full;
 		if (val & ESTATUS_1000_THALF)
diff --git a/include/uapi/linux/mii.h b/include/uapi/linux/mii.h
index b5c2fdcf23fd..eb5cc45d23fb 100644
--- a/include/uapi/linux/mii.h
+++ b/include/uapi/linux/mii.h
@@ -121,6 +121,7 @@
 #define EXPANSION_MFAULTS	0x0010	/* Multiple faults detected    */
 #define EXPANSION_RESV		0xffe0	/* Unused...                   */
 
+#define ESTATUS_100T1_FULL	0x0080	/* Can do 100BASE-T1 Full      */
 #define ESTATUS_1000_TFULL	0x2000	/* Can do 1000BT Full          */
 #define ESTATUS_1000_THALF	0x1000	/* Can do 1000BT Half          */
 
-- 
2.11.0

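The genphy_config_init() hunk is a straight mapping from extended-status bits to legacy feature bits. The patch reuses SUPPORTED_100baseT_Full for 100BASE-T1, since the legacy bitmap used here has no dedicated T1 bit. A standalone sketch of that mapping follows. The ESTATUS values are the ones in the patch, and the SUPPORTED_* values follow the ethtool legacy bit layout. estatus_to_features() is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Extended status register bits, values as in the patch / uapi mii.h. */
#define ESTATUS_100T1_FULL 0x0080
#define ESTATUS_1000_THALF 0x1000
#define ESTATUS_1000_TFULL 0x2000

/* Legacy ethtool SUPPORTED_* bits for the modes involved. */
#define SUPPORTED_100baseT_Full  (1u << 3)
#define SUPPORTED_1000baseT_Half (1u << 4)
#define SUPPORTED_1000baseT_Full (1u << 5)

/* Map ESTATUS bits to feature bits, mirroring the hunk above. */
static uint32_t estatus_to_features(uint16_t val)
{
	uint32_t features = 0;

	if (val & ESTATUS_100T1_FULL)
		features |= SUPPORTED_100baseT_Full;
	if (val & ESTATUS_1000_TFULL)
		features |= SUPPORTED_1000baseT_Full;
	if (val & ESTATUS_1000_THALF)
		features |= SUPPORTED_1000baseT_Half;
	return features;
}
```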

* [PATCH 3/3] net: phy: sanitize autoneg in phy_start_aneg_priv
From: Lucas Stach @ 2017-12-13 17:37 UTC (permalink / raw)
  To: Andrew Lunn, Florian Fainelli; +Cc: netdev, kernel, patchwork-lst
In-Reply-To: <20171213173751.12722-1-l.stach@pengutronix.de>

phy_sanitize_settings() is only called when autonegotiation has been
explicitly disabled. This breaks PHYs without any autonegotiation
capability, as the check for the capability happens inside this function.

Move the check out to the caller, so it is properly applied for those
PHYs.

Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
---
 drivers/net/phy/phy.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
index 2b1e67bc1e73..433d859b6955 100644
--- a/drivers/net/phy/phy.c
+++ b/drivers/net/phy/phy.c
@@ -226,10 +226,6 @@ static void phy_sanitize_settings(struct phy_device *phydev)
 	const struct phy_setting *setting;
 	u32 features = phydev->supported;
 
-	/* Sanitize settings based on PHY capabilities */
-	if ((features & SUPPORTED_Autoneg) == 0)
-		phydev->autoneg = AUTONEG_DISABLE;
-
 	setting = phy_find_valid(phydev->speed, phydev->duplex, features);
 	if (setting) {
 		phydev->speed = setting->speed;
@@ -487,6 +483,10 @@ static int phy_start_aneg_priv(struct phy_device *phydev, bool sync)
 
 	mutex_lock(&phydev->lock);
 
+	/* Sanitize settings based on PHY capabilities */
+	if ((phydev->supported & SUPPORTED_Autoneg) == 0)
+		phydev->autoneg = AUTONEG_DISABLE;
+
 	if (AUTONEG_DISABLE == phydev->autoneg)
 		phy_sanitize_settings(phydev);
 
-- 
2.11.0

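The ordering bug can be reduced to a toy model. Before the patch, the capability check sat inside the sanitizer, which only runs once autoneg is already disabled. A PHY that cannot autonegotiate but has autoneg enabled therefore never gets demoted. All names below are hypothetical. SUPPORTED_Autoneg uses the ethtool legacy bit value, and the sanitizer only records that it ran.

```c
#include <assert.h>
#include <stdbool.h>

#define AUTONEG_DISABLE   0
#define AUTONEG_ENABLE    1
#define SUPPORTED_Autoneg (1u << 6)

struct phy_model {
	unsigned supported;
	int autoneg;
	bool sanitized; /* set when the forced-mode sanitizer ran */
};

static void sanitize_settings(struct phy_model *phy)
{
	phy->sanitized = true; /* would pick a valid forced speed/duplex */
}

/* Before: the capability check only runs when autoneg is already off,
 * so it can never change anything. */
static void start_aneg_old(struct phy_model *phy)
{
	if (phy->autoneg == AUTONEG_DISABLE) {
		if (!(phy->supported & SUPPORTED_Autoneg))
			phy->autoneg = AUTONEG_DISABLE;
		sanitize_settings(phy);
	}
}

/* After: demote autoneg first, then sanitize. */
static void start_aneg_new(struct phy_model *phy)
{
	if (!(phy->supported & SUPPORTED_Autoneg))
		phy->autoneg = AUTONEG_DISABLE;
	if (phy->autoneg == AUTONEG_DISABLE)
		sanitize_settings(phy);
}
```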

* Re: [PATCH net-next] bpf/tracing: fix kernel/events/core.c compilation error
From: Yonghong Song @ 2017-12-13 17:26 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, sfr, netdev; +Cc: kernel-team
In-Reply-To: <475c2645-01fc-7f25-da78-7ff180760b97@fb.com>



On 12/13/17 7:50 AM, Alexei Starovoitov wrote:
> On 12/13/17 7:44 AM, Daniel Borkmann wrote:
>> On 12/13/2017 08:42 AM, Yonghong Song wrote:
>>> Commit f371b304f12e ("bpf/tracing: allow user space to
>>> query prog array on the same tp") introduced a perf
>>> ioctl command to query prog array attached to the
>>> same perf tracepoint. The commit introduced a
>>> compilation error when either CONFIG_BPF_SYSCALL or
>>> CONFIG_EVENT_TRACING is not defined:
>>>   kernel/events/core.o: In function `perf_ioctl':
>>>   core.c:(.text+0x98c4): undefined reference to 
>>> `bpf_event_query_prog_array'
>>>
>>> This patch fixed this error.
>>>
>>> Fixes: f371b304f12e ("bpf/tracing: allow user space to query prog 
>>> array on the same tp")
>>> Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
>>> Signed-off-by: Yonghong Song <yhs@fb.com>
>>
>> Looking at _perf_ioctl(), we also have perf_event_set_bpf_prog()
>> there. It's basically under CONFIG_EVENT_TRACING, which later calls
>> perf_event_attach_bpf_prog(), which is under CONFIG_BPF_EVENTS and
>> has a dummy handler returning -EOPNOTSUPP when BPF events are not
>> enabled. bpf_trace.c is only built when CONFIG_BPF_EVENTS
>> is set and that by itself depends on BPF_SYSCALL already. So it would
>> be more correct to do the same thing here ...
>>
>> #if defined(CONFIG_EVENT_TRACING) && defined(CONFIG_BPF_EVENTS)
>> [...]
> 
> +1
> #ifdef CONFIG_BPF_EVENTS
> works, whereas CONFIG_EVENT_TRACING probably does not, since kprobe
> can be disabled independently which will turn off BPF_EVENTS
> and body of bpf_event_query_prog_array() will be gone.

I tested enabling/disabling uprobe, kprobe, and both, and my patch works.
But I did not test enabling a non-uprobe/kprobe tracing event
(e.g., CONFIG_FUNCTION_TRACER), where CONFIG_TRACING and
CONFIG_EVENT_TRACING are on but CONFIG_UPROBE_EVENTS/CONFIG_KPROBE_EVENTS
are off; in that case my patch breaks.

Looks like
#ifdef CONFIG_BPF_EVENTS
is sufficient.
This config causes bpf_trace.c, with the real definition, to be built.
It depends on KPROBE_EVENTS or UPROBE_EVENTS, and either
of them enables CONFIG_TRACING and then CONFIG_EVENT_TRACING.

Will resubmit the patch after testing.

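The conclusion here follows a common kernel pattern. The real definition is compiled only when its Kconfig symbol is set, and otherwise a static inline stub keeps callers such as perf_ioctl() compiling and linking. The sketch below shows that pattern in userspace. query_prog_array() and its parameters are hypothetical simplifications, not the real signature of the renamed perf_event_query_prog_array(). Defining CONFIG_BPF_EVENTS by hand stands in for the kernel build configuration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Uncomment to model a kernel built with CONFIG_BPF_EVENTS=y. */
/* #define CONFIG_BPF_EVENTS 1 */

#ifdef CONFIG_BPF_EVENTS
/* Real definition: only exists when bpf_trace.c is built.
 * Placeholder body: report that no programs are attached. */
static int query_prog_array(unsigned *ids, unsigned cnt)
{
	for (unsigned i = 0; i < cnt; i++)
		ids[i] = 0;
	return 0;
}
#else
/* Stub: callers still compile and link, and get -EOPNOTSUPP at runtime. */
static inline int query_prog_array(unsigned *ids, unsigned cnt)
{
	(void)ids;
	(void)cnt;
	return -EOPNOTSUPP;
}
#endif
```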

* Re: [PATCH] ipv6: ip6mr: Recalc UDP checksum before forwarding
From: Marcelo Ricardo Leitner @ 2017-12-13 17:21 UTC (permalink / raw)
  To: Brendan McGrath
  Cc: David S . Miller, Alexey Kuznetsov, Hideaki YOSHIFUJI, netdev,
	linux-kernel
In-Reply-To: <1513164048-21368-1-git-send-email-redmcg@redmandi.dyndns.org>

Hi,

On Wed, Dec 13, 2017 at 10:20:48PM +1100, Brendan McGrath wrote:
> Currently, when forwarding from a Virtual Interface to a Physical
> Interface, ip_summed is set to a value of CHECKSUM_UNNECESSARY and
> the UDP checksum has not been calculated.
> 
> When the packet is then forwarded by a Multicast Router, the checksum
> value is left as is and therefore rejected by the receiving
> machine(s).
> 
> This patch ensures the checksum is recalculated before forwarding.
> 
> Signed-off-by: Brendan McGrath <redmcg@redmandi.dyndns.org>
> ---
> 
> It's a bit ugly putting UDP-specific code in this spot, but I'm not
> aware of any other protocols that are:
> a) multicast;
> b) forwarded; and
> c) checksummed
> 
>  net/ipv6/ip6mr.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/net/ipv6/ip6mr.c b/net/ipv6/ip6mr.c
> index 890f9bda..ee4370a 100644
> --- a/net/ipv6/ip6mr.c
> +++ b/net/ipv6/ip6mr.c
> @@ -2077,6 +2077,13 @@ static int ip6mr_forward2(struct net *net, struct mr6_table *mrt,
>  	ipv6h = ipv6_hdr(skb);
>  	ipv6h->hop_limit--;
>  
> +	if (ipv6h->nexthdr == NEXTHDR_UDP &&
> +					skb->ip_summed != CHECKSUM_PARTIAL) {
	    ^

This indentation is wrong. The second line should start in the column
right after the '(' on the first line, like:
+	if (ipv6h->nexthdr == NEXTHDR_UDP &&
+	    skb->ip_summed != CHECKSUM_PARTIAL) {
Adjust with spaces as needed.

Running the patch through scripts/checkpatch.pl before posting will
catch these.

> +		struct udphdr *uh = udp_hdr(skb);
> +		udp6_set_csum(false, skb, &ipv6_hdr(skb)->saddr,
> +					&ipv6_hdr(skb)->daddr, ntohs(uh->len));
                              ^  same deal here.

> +	}
> +
>  	IP6CB(skb)->flags |= IP6SKB_FORWARDED;
>  
>  	return NF_HOOK(NFPROTO_IPV6, NF_INET_FORWARD,
> -- 
> 2.7.4
> 

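Some context on what udp6_set_csum() has to produce helps here. The sketch below is a standalone UDP-over-IPv6 checksum: the RFC 1071 one's-complement sum over the IPv6 pseudo-header (RFC 8200, section 8.1) plus the UDP header and payload. It is an illustration, not the kernel helper. csum_add(), csum_fold(), udp6_checksum() and udp6_checksum_ok() are hypothetical names, and all buffers are assumed to be in network byte order. It does not resolve Eric's concern: recomputing the checksum at a forwarder would equally "bless" a packet that was corrupted earlier.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Add big-endian 16-bit words to a running sum (RFC 1071). */
static uint32_t csum_add(uint32_t sum, const uint8_t *buf, size_t len)
{
	while (len > 1) {
		sum += (uint32_t)buf[0] << 8 | buf[1];
		buf += 2;
		len -= 2;
	}
	if (len)
		sum += (uint32_t)buf[0] << 8; /* odd byte, zero padded */
	return sum;
}

/* Fold carries back in and return the one's complement. */
static uint16_t csum_fold(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/* IPv6 pseudo-header: src, dst, 32-bit length, 3 zero bytes, next hdr. */
static uint32_t pseudo_sum(const uint8_t saddr[16], const uint8_t daddr[16],
			   uint16_t udp_len)
{
	uint8_t ph[40] = { 0 };

	memcpy(ph, saddr, 16);
	memcpy(ph + 16, daddr, 16);
	ph[34] = (uint8_t)(udp_len >> 8);
	ph[35] = (uint8_t)udp_len;
	ph[39] = 17; /* IPPROTO_UDP */
	return csum_add(0, ph, sizeof(ph));
}

/* Compute the checksum and store it in bytes 6-7 of the UDP header. */
static uint16_t udp6_checksum(const uint8_t saddr[16], const uint8_t daddr[16],
			      uint8_t *udp, uint16_t udp_len)
{
	uint16_t csum;

	assert(udp_len >= 8); /* at least a UDP header */
	udp[6] = udp[7] = 0;
	csum = csum_fold(csum_add(pseudo_sum(saddr, daddr, udp_len),
				  udp, udp_len));
	if (csum == 0)
		csum = 0xffff; /* zero is not allowed for UDP over IPv6 */
	udp[6] = (uint8_t)(csum >> 8);
	udp[7] = (uint8_t)csum;
	return csum;
}

/* A receiver's check: the sum including the checksum must fold to 0. */
static bool udp6_checksum_ok(const uint8_t saddr[16], const uint8_t daddr[16],
			     const uint8_t *udp, uint16_t udp_len)
{
	return csum_fold(csum_add(pseudo_sum(saddr, daddr, udp_len),
				  udp, udp_len)) == 0;
}
```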

* Re: [PATCH net] net: phy: fix resume handling
From: Andrew Lunn @ 2017-12-13 17:19 UTC (permalink / raw)
  To: Russell King; +Cc: Florian Fainelli, netdev
In-Reply-To: <E1eOi48-0002N2-QC@rmk-PC.armlinux.org.uk>

On Tue, Dec 12, 2017 at 10:45:36AM +0000, Russell King wrote:
> When a PHY has the BMCR_PDOWN bit set, it may decide to ignore writes
> to other registers, or reset the registers to power-on defaults.
> Micrel PHYs do this for their interrupt registers.
> 
> The current structure of phylib tries to enable interrupts before
> resuming (and releasing) the BMCR_PDOWN bit.  This fails, causing
> Micrel PHYs to stop working after a suspend/resume sequence if they
> are using interrupts.
> 
> Fix this by ensuring that the PHY driver resume methods do not take
> the phydev->lock mutex themselves, but the callers of phy_resume()
> take that lock.  This then allows us to move the call to phy_resume()
> before we enable interrupts in phy_start().
> 
> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>

Reviewed-by: Andrew Lunn <andrew@lunn.ch>

    Andrew

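The ordering problem in the quoted commit message can be reduced to a toy model. It assumes a Micrel-like PHY that silently drops interrupt-register writes while BMCR_PDOWN is set; as the message notes, other PHYs may instead reset those registers. All names are hypothetical. lock_held stands in for phydev->lock, and the assertion encodes the new rule that callers of resume hold the lock.

```c
#include <assert.h>
#include <stdbool.h>

struct phy_model {
	bool lock_held;    /* stands in for phydev->lock */
	bool powered_down; /* BMCR_PDOWN set */
	bool irq_enabled;
};

static void drv_resume(struct phy_model *phy)
{
	assert(phy->lock_held);    /* caller must hold the lock */
	phy->powered_down = false; /* clear BMCR_PDOWN */
}

static void enable_interrupts(struct phy_model *phy)
{
	if (!phy->powered_down)
		phy->irq_enabled = true; /* write is dropped while powered down */
}

/* Before: interrupts are enabled first, then the PHY is resumed. */
static void start_old(struct phy_model *phy)
{
	enable_interrupts(phy);
	phy->lock_held = true; /* old drivers took the lock inside resume */
	drv_resume(phy);
	phy->lock_held = false;
}

/* After: the caller takes the lock, resumes, then enables interrupts. */
static void start_new(struct phy_model *phy)
{
	phy->lock_held = true;
	drv_resume(phy);
	enable_interrupts(phy);
	phy->lock_held = false;
}
```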
