Netdev List
* Re: [PATCH net-next 00/12] code optimizations & bugfixes for HNS3 driver
From: David Miller @ 2019-01-30 22:50 UTC (permalink / raw)
  To: tanhuazhong
  Cc: netdev, linux-kernel, huangdaode, yisen.zhuang, salil.mehta,
	linuxarm
In-Reply-To: <20190130205552.8512-1-tanhuazhong@huawei.com>

From: Huazhong Tan <tanhuazhong@huawei.com>
Date: Thu, 31 Jan 2019 04:55:40 +0800

> This patchset includes bugfixes and code optimizations for the HNS3
> ethernet controller driver

Series applied, thanks.


* Re: BUG: KASAN: double-free or invalid-free in ip_defrag after upgrade from 4.19.13
From: Michal Kubecek @ 2019-01-30 23:00 UTC (permalink / raw)
  To: Ivan Babrou
  Cc: Linux Kernel Network Developers, David S. Miller, Eric Dumazet,
	Ignat Korchagin, Shawn Bohrer, Jakub Sitnicki
In-Reply-To: <CABWYdi1=CwMH1McYkVy+HOQcVHWqZerhjqyn8irQq10wee08Zg@mail.gmail.com>

On Wed, Jan 30, 2019 at 02:26:32PM -0800, Ivan Babrou wrote:
> Hey,
> 
> Continuing from this thread earlier today:
> 
> * https://marc.info/?t=154886729100001&r=1&w=2
> 
> We fired up a KASAN-enabled kernel on one of those machines and this is
> what we saw:
...
> This commit from 4.19.14 seems relevant:
> 
> * https://github.com/torvalds/linux/commit/d5f9565c8d5ad3cf94982223cfcef1169b0bb60f
> 
> As a reminder, we upgraded from 4.19.13 and started seeing crashes.

Unfortunately I'm on vacation this week, so my ability to look deeper
into this is limited, but there seems to be one obvious problem with the
4.19.y backport. In mainline, there is

        err = -EINVAL;

right above the "Find out where to put this fragment." comment; it was
added by commit 0ff89efb5246 ("ip: fail fast on IP defrag errors"). In
the 4.19.y backport of the commit, this assignment is missing, so the
value of err at this point comes from the earlier pskb_trim_rcsum() call
and must therefore be zero. If we then take any of the "goto err" paths
added by commit d5f9565c8d5a, we drop the packet by calling kfree_skb()
but return zero, so the caller doesn't know about it.

Michal Kubecek
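
The failure mode described above can be modelled in a few lines of
userspace C. This is only a sketch under stated assumptions, not kernel
code: fake_skb, fake_trim(), fake_frag_queue() and fake_deliver() are
hypothetical stand-ins, and the model assumes that a zero return tells
the caller the skb is still valid and must be processed (and eventually
freed) further up the stack.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an skb: only counts how often it was freed. */
struct fake_skb {
	int free_count;
};

static void fake_kfree_skb(struct fake_skb *skb)
{
	skb->free_count++;
}

/* Stand-in for pskb_trim_rcsum(); returns 0 on success. */
static int fake_trim(struct fake_skb *skb)
{
	(void)skb;
	return 0;
}

/*
 * Models the queueing step. With set_einval == false, err still holds
 * the 0 returned by the trim call when the overlap check jumps to the
 * error label, as described for the 4.19.y backport.
 */
static int fake_frag_queue(struct fake_skb *skb, bool overlap, bool set_einval)
{
	int err;

	err = fake_trim(skb);
	if (err)
		goto err;

	if (set_einval)
		err = -EINVAL;	/* the assignment present in mainline */

	/* Find out where to put this fragment. */
	if (overlap)
		goto err;	/* models one of the added "goto err" paths */

	return -EINPROGRESS;	/* queued; the queue owns the skb now */

err:
	fake_kfree_skb(skb);
	return err;
}

/* Hypothetical caller: returns how many times the skb ended up freed. */
static int fake_deliver(struct fake_skb *skb, bool set_einval)
{
	assert(skb != NULL);
	if (fake_frag_queue(skb, true, set_einval) != 0)
		return skb->free_count;	/* callee consumed the skb */

	fake_kfree_skb(skb);	/* caller believes it still owns the skb */
	return skb->free_count;
}
```

In this model, fake_deliver() with set_einval == false frees the skb
twice, while the variant with the assignment frees it exactly once.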



* Re: [PATCH net-next v2 00/12] net: dsa: management mode for bcm_sf2
From: David Miller @ 2019-01-30 22:23 UTC (permalink / raw)
  To: f.fainelli
  Cc: netdev, andrew, vivien.didelot, idosch, jiri, ilias.apalodimas,
	ivan.khoronzhuk, roopa, nikolay
In-Reply-To: <20190130005548.2212-1-f.fainelli@gmail.com>


Florian, Ido has a question about how the driver return value is
propagated wrt. patch #1 since the operation is deferred.

Please address this.

Thanks!


* Re: [PATCH net-next v2 6/7] nfp: devlink: report the running and flashed versions
From: Jiri Pirko @ 2019-01-30 22:19 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: davem, netdev, oss-drivers, andrew, f.fainelli, mkubecek, eugenem,
	jonathan.lemon
In-Reply-To: <20190130142158.433af0ce@cakuba.hsd1.ca.comcast.net>

Wed, Jan 30, 2019 at 11:21:58PM CET, jakub.kicinski@netronome.com wrote:
>On Wed, 30 Jan 2019 22:57:52 +0100, Jiri Pirko wrote:
>> >+/* Control processor FW version, FW is responsible for house keeping tasks,
>> >+ * PHY control etc.
>> >+ */
>> >+#define DEVLINK_VERSION_GENERIC_FW_MGMT		"fw.mgmt"
>> >+/* Data path microcode controlling high-speed packet processing */
>> >+#define DEVLINK_VERSION_GENERIC_FW_APP		"fw.app"
>> >+/* UNDI software version */
>> >+#define DEVLINK_VERSION_GENERIC_FW_UNDI		"fw.undi"
>> >+/* NCSI support/handler version */
>> >+#define DEVLINK_VERSION_GENERIC_FW_NCSI		"fw.ncsi"  
>> 
>> Same here. Also, please put "INFO" in the names to respect the namespacing
>
>Ack on all, and thanks for the reviews!  Do you also think I should add
>a doc with them?  I was going back and forth on that..

Adding docs would do no harm, I believe.

Thanks!
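
For illustration only, the namespacing request above might end up
looking like the sketch below. The macro names with "INFO" in the prefix
are an assumption about how the rename could be done, not something
taken from a posted patch; the string values are the ones from the
quoted hunk. generic_fw_versions[] and is_generic_fw_version() are
hypothetical helpers added only to make the snippet self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names: same strings as the quoted patch, with "INFO"
 * added to the macro prefix as requested in the review. */
#define DEVLINK_INFO_VERSION_GENERIC_FW_MGMT	"fw.mgmt"
#define DEVLINK_INFO_VERSION_GENERIC_FW_APP	"fw.app"
#define DEVLINK_INFO_VERSION_GENERIC_FW_UNDI	"fw.undi"
#define DEVLINK_INFO_VERSION_GENERIC_FW_NCSI	"fw.ncsi"

static const char *const generic_fw_versions[] = {
	DEVLINK_INFO_VERSION_GENERIC_FW_MGMT,
	DEVLINK_INFO_VERSION_GENERIC_FW_APP,
	DEVLINK_INFO_VERSION_GENERIC_FW_UNDI,
	DEVLINK_INFO_VERSION_GENERIC_FW_NCSI,
};

/* Returns 1 if name is one of the generic version identifiers above. */
static int is_generic_fw_version(const char *name)
{
	size_t i;
	size_t n = sizeof(generic_fw_versions) / sizeof(generic_fw_versions[0]);

	assert(name != NULL);
	for (i = 0; i < n; i++)
		if (strcmp(name, generic_fw_versions[i]) == 0)
			return 1;
	return 0;
}
```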


* [PATCH net-next v2 0/5] net: tls: TLS 1.3 support
From: Dave Watson @ 2019-01-30 21:57 UTC (permalink / raw)
  To: netdev@vger.kernel.org, Dave Miller
  Cc: Vakul Garg, Boris Pismenny, Aviad Yehezkel, John Fastabend,
	Daniel Borkmann

This patchset adds 256-bit keys and TLS 1.3 support to the kernel TLS
socket.

TLS 1.3 is requested by passing TLS_1_3_VERSION in the setsockopt
call, which changes the framing as required for TLS 1.3.

256-bit keys are requested by passing TLS_CIPHER_AES_GCM_256 in the
sockopt.  This is a fairly straightforward passthrough to the crypto
framework.

256-bit keys work with both TLS 1.2 and TLS 1.3.
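
A rough userspace sketch of how such a request could look is given
below. It assumes uapi headers that already carry this series
(TLS_1_3_VERSION and struct tls12_crypto_info_aes_gcm_256 in
<linux/tls.h>). fill_tls13_aes256() and enable_ktls_tx() are
hypothetical helper names, the key material is expected to come from a
userspace handshake that is not shown, and the fallback values for
TCP_ULP and SOL_TLS are assumptions for older libc headers.

```c
#include <assert.h>
#include <linux/tls.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <string.h>
#include <sys/socket.h>

/* Fallbacks for older libc headers; values assumed to match the kernel. */
#ifndef TCP_ULP
#define TCP_ULP 31
#endif
#ifndef SOL_TLS
#define SOL_TLS 282
#endif

/* Fill the crypto info for TLS 1.3 with a 256-bit AES-GCM key.
 * key, iv, salt and seq come from the userspace handshake (not shown). */
static void fill_tls13_aes256(struct tls12_crypto_info_aes_gcm_256 *ci,
			      const unsigned char *key,
			      const unsigned char *iv,
			      const unsigned char *salt,
			      const unsigned char *seq)
{
	assert(ci != NULL);
	memset(ci, 0, sizeof(*ci));
	ci->info.version = TLS_1_3_VERSION;
	ci->info.cipher_type = TLS_CIPHER_AES_GCM_256;
	memcpy(ci->key, key, TLS_CIPHER_AES_GCM_256_KEY_SIZE);
	memcpy(ci->iv, iv, TLS_CIPHER_AES_GCM_256_IV_SIZE);
	memcpy(ci->salt, salt, TLS_CIPHER_AES_GCM_256_SALT_SIZE);
	memcpy(ci->rec_seq, seq, TLS_CIPHER_AES_GCM_256_REC_SEQ_SIZE);
}

/* Attach the "tls" ULP to a connected TCP socket and install the TX
 * key.  Returns 0 on success, -1 with errno set otherwise. */
static int enable_ktls_tx(int fd,
			  const struct tls12_crypto_info_aes_gcm_256 *ci)
{
	if (setsockopt(fd, IPPROTO_TCP, TCP_ULP, "tls", sizeof("tls")) < 0)
		return -1;
	return setsockopt(fd, SOL_TLS, TLS_TX, ci, sizeof(*ci));
}
```

The receive direction would be set up the same way with TLS_RX.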

TLS 1.3 requires a different AAD layout, necessitating some minor
refactoring.  It also moves the message type byte to the encrypted
portion of the message, instead of the cleartext header as it was in
TLS 1.2.  This requires moving the control message handling to after
decryption, but is otherwise similar.
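
As a rough illustration of the framing difference (not code from the
patches), the sketch below builds the AEAD additional data for both
versions following the record layer definitions in RFC 5246 and
RFC 8446. The helper names are hypothetical, and record padding is
assumed to be absent.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define REC_TYPE_APPLICATION_DATA	23
#define GCM_TAG_SIZE			16

/* TLS 1.2 AAD: seq_num(8) | type(1) | version(2) | plaintext length(2). */
static size_t make_aad_tls12(uint8_t *out, uint64_t seq, uint8_t type,
			     uint16_t plaintext_len)
{
	int i;

	assert(out != NULL);
	for (i = 0; i < 8; i++)
		out[i] = (uint8_t)(seq >> (56 - 8 * i));
	out[8] = type;			/* real record type, in the clear */
	out[9] = 0x03;
	out[10] = 0x03;			/* TLS 1.2 on the wire */
	out[11] = (uint8_t)(plaintext_len >> 8);
	out[12] = (uint8_t)(plaintext_len & 0xff);
	return 13;
}

/* TLS 1.3 AAD: just the 5-byte record header.  The outer type is always
 * application_data and the length covers payload + type byte + tag. */
static size_t make_aad_tls13(uint8_t *out, uint16_t payload_len)
{
	uint16_t len = (uint16_t)(payload_len + 1 + GCM_TAG_SIZE);

	assert(out != NULL);
	out[0] = REC_TYPE_APPLICATION_DATA;
	out[1] = 0x03;
	out[2] = 0x03;			/* legacy record version */
	out[3] = (uint8_t)(len >> 8);
	out[4] = (uint8_t)(len & 0xff);
	return 5;
}

/* TLS 1.3 inner plaintext: payload followed by the real record type,
 * so the type is only known after decryption. */
static size_t make_inner_plaintext_tls13(uint8_t *out, const uint8_t *payload,
					 size_t payload_len, uint8_t type)
{
	memcpy(out, payload, payload_len);
	out[payload_len] = type;
	return payload_len + 1;
}
```

Because the real type only appears as the last byte of the decrypted
inner plaintext, a receiver cannot dispatch on the record type before
decrypting.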

V1 -> V2

The first two patches were dropped, and sent separately, one as a
bugfix to the net tree.

Dave Watson (5):
  net: tls: Support 256 bit keys
  net: tls: Refactor tls aad space size calculation
  net: tls: Refactor control message handling on recv
  net: tls: Add tls 1.3 support
  net: tls: Add tests for TLS 1.3

 include/net/tls.h                 |  72 ++++++---
 include/uapi/linux/tls.h          |  19 +++
 net/tls/tls_device.c              |   5 +-
 net/tls/tls_device_fallback.c     |   3 +-
 net/tls/tls_main.c                |  36 ++++-
 net/tls/tls_sw.c                  | 244 +++++++++++++++++++++---------
 tools/testing/selftests/net/tls.c | 138 ++++++++++++++++-
 7 files changed, 417 insertions(+), 100 deletions(-)

-- 
2.17.1



* Re: [PATCH iproute2-next] Introduce ip-brctl shell script
From: Roopa Prabhu @ 2019-01-30 22:30 UTC (permalink / raw)
  To: Stefano Brivio
  Cc: Nikolay Aleksandrov, David Ahern, Phil Sutter, Eric Garver,
	Tomas Dolezal, Stephen Hemminger, Lennert Buytenhek, netdev
In-Reply-To: <20190128085748.57cbeff5@redhat.com>

On Sun, Jan 27, 2019 at 11:57 PM Stefano Brivio <sbrivio@redhat.com> wrote:
>
> On Sun, 27 Jan 2019 21:08:13 -0800
> Roopa Prabhu <roopa@cumulusnetworks.com> wrote:
>
> > On Fri, Jan 25, 2019 at 2:05 AM Stefano Brivio <sbrivio@redhat.com> wrote:
> > >
> > > Hi Roopa,
> > >
> > > On Wed, 23 Jan 2019 08:33:27 -0800
> > > Roopa Prabhu <roopa@cumulusnetworks.com> wrote:
> > >
> > > > On Wed, Jan 23, 2019 at 7:09 AM Nikolay Aleksandrov
> > > > <nikolay@cumulusnetworks.com> wrote:
> > > >
> > > > > Hi,
> > > > > IMO the effort should be towards improving iproute2 to be
> > > > > easier to use and more intuitive. We should be pushing people to
> > > > > use the new tools instead of trying to find workarounds to keep the
> > > > > old tools alive. I do like the idea of deprecating bridge-utils, but
> > > > > I think it should be done via improving ip/bridge enough to be
> > > > > pleasant to use. We will have to maintain this compatibility layer
> > > > > forever if it gets accepted and we'll never get rid of brctl this
> > > > > way.
> > > >
> > > > +1, we should move people away from brctl. There is enough confusion
> > > > among users looking at bridge attributes:
> > > >
> > > > ip -d link show
> > > > bridge -d link show
> > > > brctl
> > >
> > > Why is this confusing? One can simply pick the most appropriate tool.
> > >
> > > > Adding a 4th one  to the list is not going to ease the confusion.
> > >
> > > Why do you say I'm adding a fourth (I guess) tool? I'm replacing the
> > > third one.
> >
> > I know. But the first two commands were supposed to replace the third
> > one already.
> > and they should be.
>
> They can't replace brctl not because they are badly designed or
> unusable, but simply because they are different tools with different
> purposes (see also my comments to Nikolay).

I don't think I understand how they are different tools. The new netlink tools
are supposed to deprecate the old tools that use ioctls. This is the same reason
we don't have an ip-ifconfig today.


>
> > So, I think it's better to fix the first two instead of introducing
> > another one.
>
> This is really not the same thing: I'm not introducing a new tool, I'm
> effectively replacing a 1794-LoC, non-trivial, ioctl-based
> implementation with a trivial, 572-lines shell script, with half
> the binary size.

You are moving an ioctl-based tool from another package into
iproute2... and maybe thereby deprecating the netlink-based tool? :)
We are in opposite camps: I strongly want people to move to bridge and
ip link, which have all the latest support.

>
> I'm not doing this on bridge-utils directly because that would imply
> the need to still maintain a different tool. For all practical
> maintenance purposes, I'm actually getting rid of a separate tool,
> which is my only goal here.
>
> I'd rather say we go from 3 tools to slightly more than 2.
>
> > > > We should try to make the 'ip -d link show and bridge -d link show'
> > > > outputs better. Any suggestions there from people will be useful.
> > >
> > > To be honest, I don't see any problem with them -- they just do
> > > different things.
> >
> > Can we extend 'bridge' tool with extra options to provide a summary
> > view of all bridges like brctl ?
>
> We could, and I initially thought of that approach instead, but that
> has a number of fundamental downsides:
>
> - we can't provide a brctl-compatible syntax, unless we want to
>   substantially rewrite the 'bridge' interface, and I think it's a
>   bad idea to break 'bridge' syntax for users, while we won't be able to
>   replace brctl if we don't provide a similar syntax, history showed

I am certainly not suggesting we break existing bridge users. I am
talking about new options.

I understand some people are finding it hard to move away from brctl
output, but in my experience these are also the people who want new
things in brctl, like json output, which are already available in the
bridge command.



>
> - the fdb implementation has a long-dated comment by Stephen in its
>   header,
>         * TODO: merge/replace this with ip neighbour
>   and this is actually the only part of 'bridge' I'm using in ip-brctl.
>   Code is conceptually duplicated there, and I think we should actually
>   get rid of that -- but then 'bridge' wouldn't even give information
>   about the FDB, one would need to use ip neighbour instead.

This could be a comment from the early days. Today bridge has support for
fdb, vlans and vlan tunnels, which you cannot get from brctl or any
brctl compat tool.


>
> - 'bridge' doesn't implement settings for basic bridge features (say,
>   STP), which are convenient for users, especially if they are used to
>   brctl. To get that, even at an interface/syntax level, we would need
>   to duplicate some parts of ip-link, which looks like a bad idea per
>   se.

That's fine IMO. Today the extended bridge attribute support in ip link
set is only for convenience.
You can set most attributes from both the ip link set and bridge link
commands. We can see if they can share code.

You can set a vlan on a bridge today via the bridge command. I don't
see why we should hesitate about STP here.
And you will get the json output for free.


>
> > It's supposed to be the netlink-based tool for all bridging and hence
> > could be a good replacement for all brctl users.
>
> I still think the best replacement for users is the one that changes
> absolutely nothing, and if that's easily achievable, I'd rather go for
> it.

That would also mean we add ip-ifconfig and ip-ethtool (if we
deprecate ethtool tomorrow; I am not saying it's going away,
just giving you an example of a move from ioctl-based to netlink-based tools).


* Re: bpf memory model. Was: [PATCH v4 bpf-next 1/9] bpf: introduce bpf_spin_lock
From: Alexei Starovoitov @ 2019-01-30 22:57 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Will Deacon, Peter Zijlstra, Alexei Starovoitov, davem, daniel,
	jakub.kicinski, netdev, kernel-team, mingo, jannh
In-Reply-To: <20190130210536.GY4240@linux.ibm.com>

On Wed, Jan 30, 2019 at 01:05:36PM -0800, Paul E. McKenney wrote:
> On Wed, Jan 30, 2019 at 11:51:14AM -0800, Alexei Starovoitov wrote:
> > On Wed, Jan 30, 2019 at 10:36:18AM -0800, Paul E. McKenney wrote:
> > > On Wed, Jan 30, 2019 at 06:11:00PM +0000, Will Deacon wrote:
> > > > Hi Alexei,
> > > > 
> > > > On Mon, Jan 28, 2019 at 01:56:24PM -0800, Alexei Starovoitov wrote:
> > > > > On Mon, Jan 28, 2019 at 10:24:08AM +0100, Peter Zijlstra wrote:
> > > > > > On Fri, Jan 25, 2019 at 04:17:26PM -0800, Alexei Starovoitov wrote:
> > > > > > > What I want to avoid is to define the whole execution ordering model upfront.
> > > > > > > We cannot say that BPF ISA is weakly ordered like alpha.
> > > > > > > Most of the bpf progs are written and running on x86. We shouldn't
> > > > > > > twist bpf developer's arm by artificially relaxing memory model.
> > > > > > > BPF memory model is equal to memory model of underlying architecture.
> > > > > > > What we can do is make bpf progs a bit more portable with
> > > > > > > smp_rmb instructions, but we must not force weak execution on the developer.
> > > > > > 
> > > > > > Well, I agree with only introducing bits you actually need, and my
> > > > > > smp_rmb() example might have been poorly chosen, smp_load_acquire() /
> > > > > > smp_store_release() might have been a far more useful example.
> > > > > > 
> > > > > > But I disagree with the last part; we have to pick a model now;
> > > > > > otherwise you'll paint yourself into a corner.
> > > > > > 
> > > > > > Also; Alpha isn't very relevant these days; however ARM64 does seem to
> > > > > > be gaining a lot of attention and that is very much a weak architecture.
> > > > > > Adding strongly ordered assumptions to BPF now, will penalize them in
> > > > > > the long run.
> > > > > 
> > > > > arm64 is gaining attention just like riscV is gaining it too.
> > > > > BPF jit for arm64 is very solid, while BPF jit for riscV is being worked on.
> > > > > BPF is not picking sides in CPU HW and ISA battles.
> > > > 
> > > > It's not about picking a side, it's about providing an abstraction of the
> > > > various CPU architectures out there so that the programmer doesn't need to
> > > > worry about where their program may run. Hell, even if you just said "eBPF
> > > > follows x86 semantics" that would be better than saying nothing (and then we
> > > > could have a discussion about whether x86 semantics are really what you
> > > > want).
> > > 
> > > To reinforce this point, the Linux-kernel memory model (tools/memory-model)
> > > is that abstraction for the Linux kernel.  Why not just use that for BPF?
> > 
> > I already answered this earlier in the thread.
> > tldr: not going to sacrifice performance.
> 
> Understood.
> 
> But can we at least say that where there are no performance consequences,
> BPF should follow LKMM?  You already mentioned smp_load_acquire()
> and smp_store_release(), but the void atomics (e.g., atomic_inc())
> should also work because they don't provide any ordering guarantees.
> The _relaxed(), _release(), and _acquire() variants of the value-returning
> atomics should be just fine as well.
> 
> The other value-returning atomics have strong ordering, which is fine
> on many systems, but potentially suboptimal for the weakly ordered ones.
> Though you have to have pretty good locality of reference to be able to
> see the difference, because otherwise cache-miss overhead dominates.
> 
> Things like cmpxchg() don't seem to fit BPF because they are normally
> used in spin loops, though there are some non-spinning use cases.
> 
> You correctly pointed out that READ_ONCE() and WRITE_ONCE() are suboptimal
> on systems that don't support all sizes of loads, but I bet that there
> are some sizes for which they are just fine across systems, for example,
> pointer size and int size.
> 
> Does that help?  Or am I missing additional cases where performance
> could be degraded?

bpf doesn't have smp_load_acquire, atomic_fetch_add, xchg, or fence instructions.
They can be added step by step. That's easy.
I believe folks have already started working on adding atomic_fetch_add.
What I have a problem with is making a statement today that bpf's end
goal is LKMM. Even after adding all sorts of instructions it may
not be practical.
We add a new instruction only when a real use case requires it.
Do you have a bpf program that needs smp_load_acquire?
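
For readers less familiar with the primitives being discussed, the
pairing of smp_store_release() and smp_load_acquire() can be
approximated in userspace with C11 atomics. The sketch below is neither
BPF nor kernel code: struct mailbox and both helpers are hypothetical,
and the C11 memory orders are only an analogue of the LKMM primitives.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct mailbox {
	int payload;		/* plain data, written before the flag */
	atomic_bool ready;	/* flag published with release semantics */
};

/* Producer side, roughly analogous to smp_store_release(&ready, 1):
 * the payload write cannot be reordered after the flag store. */
static void mailbox_publish(struct mailbox *m, int value)
{
	assert(m != NULL);
	m->payload = value;
	atomic_store_explicit(&m->ready, true, memory_order_release);
}

/* Consumer side, roughly analogous to smp_load_acquire(&ready):
 * if the flag is seen as set, the payload read sees the published
 * value.  Returns false when nothing has been published yet. */
static bool mailbox_try_consume(struct mailbox *m, int *out)
{
	if (!atomic_load_explicit(&m->ready, memory_order_acquire))
		return false;
	*out = m->payload;
	return true;
}
```

On a strongly ordered machine such as x86 both helpers compile to plain
loads and stores; on weakly ordered machines the acquire and release
orders are what keep the payload access on the correct side of the flag
access.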



* Re: BUG: KASAN: double-free or invalid-free in ip_defrag after upgrade from 4.19.13
From: Eric Dumazet @ 2019-01-30 22:50 UTC (permalink / raw)
  To: Ivan Babrou
  Cc: Linux Kernel Network Developers, mkubecek, David S. Miller,
	Ignat Korchagin, Shawn Bohrer, Jakub Sitnicki
In-Reply-To: <CABWYdi1=CwMH1McYkVy+HOQcVHWqZerhjqyn8irQq10wee08Zg@mail.gmail.com>

On Wed, Jan 30, 2019 at 2:26 PM Ivan Babrou <ivan@cloudflare.com> wrote:
>
> Hey,
>
> Continuing from this thread earlier today:
>
> * https://marc.info/?t=154886729100001&r=1&w=2
>
> We fired up a KASAN-enabled kernel on one of those machines and this is
> what we saw:
>
> $ /tmp/decode_stacktrace.sh
> /usr/lib/debug/lib/modules/4.19.18-cloudflare-2019.1.8-1-gcabf55c/vmlinux
> linux-4.19.18 < kasan.txt
> [ 2300.250278] ==================================================================
> [ 2300.266575] BUG: KASAN: double-free or invalid-free in ip_defrag
> (net/ipv4/ip_fragment.c:507 net/ipv4/ip_fragment.c:699)
> [ 2300.282860]
> [ 2300.293415] CPU: 28 PID: 0 Comm: swapper/28 Tainted: G    B      O
>     4.19.18-cloudflare-2019.1.8-1-gcabf55c #gcabf55c
> [ 2300.313767] Hardware name: Quanta Computer Inc. QuantaPlex
> T41S-2U/S2S-MB, BIOS S2S_3B10.03 06/21/2018
> [ 2300.332707] Call Trace:
> [ 2300.344701]  <IRQ>
> [ 2300.356188] dump_stack (lib/dump_stack.c:115)
> [ 2300.368967] print_address_description (mm/kasan/report.c:257)
> [ 2300.383192] ? ip_defrag (net/ipv4/ip_fragment.c:507
> net/ipv4/ip_fragment.c:699)
> [ 2300.396330] kasan_report_invalid_free (mm/kasan/report.c:337)
> [ 2300.410448] ? ip_defrag (net/ipv4/ip_fragment.c:507
> net/ipv4/ip_fragment.c:699)
> [ 2300.423599] __kasan_slab_free (mm/kasan/kasan.c:502)
> [ 2300.437165] ? ip_defrag (net/ipv4/ip_fragment.c:507
> net/ipv4/ip_fragment.c:699)
> [ 2300.450251] kmem_cache_free (mm/slub.c:1398 mm/slub.c:2953 mm/slub.c:2969)
> [ 2300.463497] ip_defrag (net/ipv4/ip_fragment.c:507 net/ipv4/ip_fragment.c:699)
> [ 2300.476352] ? ip4_obj_hashfn (net/ipv4/ip_fragment.c:684)
> [ 2300.489711] ? ip_route_input_rcu (net/ipv4/route.c:2122)
> [ 2300.503416] ip_local_deliver (net/ipv4/ip_input.c:252)
> [ 2300.516739] ? ip_call_ra_chain (net/ipv4/ip_input.c:245)
> [ 2300.530174] ? ip_rcv_finish_core.isra.19 (net/ipv4/ip_input.c:366)
> [ 2300.544535] ? ip_local_deliver (net/ipv4/ip_input.c:518)
> [ 2300.557862] ip_rcv (net/ipv4/ip_input.c:518)
> [ 2300.569972] ? ip_local_deliver (net/ipv4/ip_input.c:518)
> [ 2300.583216] ? ip_rcv_core.isra.20 (net/ipv4/ip_input.c:403)
> [ 2300.596683] __netif_receive_skb_one_core (net/core/dev.c:4911)
> [ 2300.610732] ? __netif_receive_skb_core (net/core/dev.c:4911)
> [ 2300.624666] ? eth_gro_receive (net/ethernet/eth.c:157)
> [ 2300.637374] ? recalibrate_cpu_khz (arch/x86/kernel/tsc.c:1066
> arch/x86/kernel/tsc.c:1066)
> [ 2300.650015] ? ktime_get_with_offset (kernel/time/timekeeping.c:267
> kernel/time/timekeeping.c:371 kernel/time/timekeeping.c:799)
> [ 2300.662708] ? __build_skb (include/linux/compiler.h:214
> arch/x86/include/asm/atomic.h:43
> include/asm-generic/atomic-instrumented.h:34 net/core/skbuff.c:300)
> [ 2300.674529] netif_receive_skb_internal (net/core/dev.c:5097)
> [ 2300.687430] ? dev_cpu_dead (net/core/dev.c:5097)
> [ 2300.699351] ? efx_rx_mk_skb+0x5d0/0x1210 sfc]
> [ 2300.711999] ? efx_time_sync_event+0x1b0/0x1b0 sfc]
> [ 2300.725126] efx_rx_deliver+0x447/0x640 sfc]
> [ 2300.737697] ? efx_free_rx_buffers+0x180/0x180 sfc]
> [ 2300.750803] ? __efx_rx_packet+0x76e/0x23b0 sfc]
> [ 2300.763572] ? efx_ssr+0x19c0/0x19c0 sfc]
> [ 2300.775502] ? efx_ef10_ptp_set_ts_config+0x120/0x120 sfc]
> [ 2300.788713] ? reweight_entity (kernel/sched/fair.c:2762
> kernel/sched/fair.c:2830)
> [ 2300.800224] ? efx_poll+0x991/0x12b0 sfc]
> [ 2300.811467] ? net_rx_action (arch/x86/include/asm/jump_label.h:36
> include/linux/jump_label.h:142 include/trace/events/napi.h:14
> net/core/dev.c:6263 net/core/dev.c:6328)
> [ 2300.822343] ? napi_complete_done (net/core/dev.c:6306)
> [ 2300.833468] ? hrtimer_init (kernel/time/hrtimer.c:1430)
> [ 2300.843830] ? recalibrate_cpu_khz (arch/x86/kernel/tsc.c:1066
> arch/x86/kernel/tsc.c:1066)
> [ 2300.854377] ? _raw_spin_lock (arch/x86/include/asm/atomic.h:194
> include/asm-generic/atomic-instrumented.h:58
> include/asm-generic/qspinlock.h:85 include/linux/spinlock.h:180
> include/linux/spinlock_api_smp.h:143 kernel/locking/spinlock.c:144)
> [ 2300.864214] ? handle_irq_event (kernel/irq/handle.c:209)
> [ 2300.874106] ? __do_softirq (arch/x86/include/asm/jump_label.h:36
> include/linux/jump_label.h:142 include/trace/events/irq.h:142
> kernel/softirq.c:293)
> [ 2300.883609] ? handle_irq (arch/x86/kernel/irq_64.c:79)
> [ 2300.892849] ? irq_exit (kernel/softirq.c:372 kernel/softirq.c:412)
> [ 2300.901709] ? do_IRQ (arch/x86/include/asm/irq_regs.h:19
> arch/x86/include/asm/irq_regs.h:26 arch/x86/kernel/irq.c:260)
> [ 2300.910059] ? common_interrupt (arch/x86/entry/entry_64.S:646)
> [ 2300.918862]  </IRQ>
> [ 2300.925956] ? cpuidle_enter_state (drivers/cpuidle/cpuidle.c:251)
> [ 2300.935470] ? do_idle (kernel/sched/idle.c:204 kernel/sched/idle.c:262)
> [ 2300.943904] ? arch_cpu_idle_exit (??:?)
> [ 2300.953108] ? cpu_startup_entry (kernel/sched/idle.c:368 (discriminator 1))
> [ 2300.962229] ? cpu_in_idle (kernel/sched/idle.c:349)
> [ 2300.970788] ? clockevents_config.part.12 (kernel/time/clockevents.c:503)
> [ 2300.980788] ? start_secondary (arch/x86/kernel/smpboot.c:213)
> [ 2300.989915] ? set_cpu_sibling_map (arch/x86/kernel/smpboot.c:213)
> [ 2300.999569] ? secondary_startup_64 (arch/x86/kernel/head_64.S:243)
> [ 2301.008969]
> [ 2301.015480] Allocated by task 0:
> [ 2301.023718] kasan_kmalloc (mm/kasan/kasan.c:460 mm/kasan/kasan.c:553)
> [ 2301.032340] kmem_cache_alloc (arch/x86/include/asm/jump_label.h:36
> include/linux/memcontrol.h:1292 mm/slab.h:447 mm/slub.c:2706
> mm/slub.c:2714 mm/slub.c:2719)
> [ 2301.041269] __build_skb (net/core/skbuff.c:282 (discriminator 4))
> [ 2301.049724] __netdev_alloc_skb (net/core/skbuff.c:423)
> [ 2301.058898] efx_rx_mk_skb+0x10e/0x1210 sfc]
> [ 2301.068239]
> [ 2301.074615] Freed by task 0:
> [ 2301.082411] __kasan_slab_free (mm/kasan/kasan.c:522)
> [ 2301.091429] kmem_cache_free (mm/slub.c:1398 mm/slub.c:2953 mm/slub.c:2969)
> [ 2301.100160] ip_defrag (net/ipv4/ip_fragment.c:507 net/ipv4/ip_fragment.c:699)
> [ 2301.108518] ipv4_conntrack_defrag+0x323/0x490 nf_defrag_ipv4]
> [ 2301.119408] nf_hook_slow (net/netfilter/core.c:512)
> [ 2301.127942] ip_rcv (include/linux/netfilter.h:288 net/ipv4/ip_input.c:524)
> [ 2301.135977] __netif_receive_skb_one_core (net/core/dev.c:4911)
> [ 2301.145905] netif_receive_skb_internal (net/core/dev.c:5097)
> [ 2301.155687] efx_rx_deliver+0x447/0x640 sfc]
> [ 2301.164986]
> [ 2301.171326] The buggy address belongs to the object at ffff888bd8f543c0
> [ 2301.171326]  which belongs to the cache skbuff_head_cache of size 232
> [ 2301.194483] The buggy address is located 0 bytes inside of
> [ 2301.194483]  232-byte region [ffff888bd8f543c0, ffff888bd8f544a8)
> [ 2301.216346] The buggy address belongs to the page:
> [ 2301.226355] page:ffffea002f63d500 count:1 mapcount:0
> mapping:ffff88a03c294540 index:0xffff888bd8f561c0 compound_mapcount: 0
> [ 2301.243024] flags: 0x2ffff800008100(slab|head)
> [ 2301.253041] raw: 002ffff800008100 ffffea002341d300 0000002d00000002
> ffff88a03c294540
> [ 2301.266600] raw: ffff888bd8f561c0 0000000080330030 00000001ffffffff
> 0000000000000000
> [ 2301.280190] page dumped because: kasan: bad access detected
> [ 2301.291627]
> [ 2301.298900] Memory state around the buggy address:
> [ 2301.309617]  ffff888bd8f54280: fb fb fb fb fb fb fb fb fb fb fb fb
> fb fb fb fb
> [ 2301.322930]  ffff888bd8f54300: fb fb fb fb fb fb fb fb fb fb fb fb
> fb fc fc fc
> [ 2301.336183] >ffff888bd8f54380: fc fc fc fc fc fc fc fc fb fb fb fb
> fb fb fb fb
> [ 2301.349449]                                            ^
> [ 2301.360817]  ffff888bd8f54400: fb fb fb fb fb fb fb fb fb fb fb fb
> fb fb fb fb
> [ 2301.374248]  ffff888bd8f54480: fb fb fb fb fb fc fc fc fc fc fc fc
> fc fc fc fc
> [ 2301.387663] ==================================================================
> [ 2301.401334] ==================================================================
> [ 2301.414780] BUG: KASAN: double-free or invalid-free in tcp_v4_rcv
> (net/ipv4/tcp_ipv4.c:1693)
> [ 2301.428222]
> [ 2301.435965] CPU: 28 PID: 0 Comm: swapper/28 Tainted: G    B      O
>     4.19.18-cloudflare-2019.1.8-1-gcabf55c #gcabf55c
> [ 2301.453552] Hardware name: Quanta Computer Inc. QuantaPlex
> T41S-2U/S2S-MB, BIOS S2S_3B10.03 06/21/2018
> [ 2301.469737] Call Trace:
> [ 2301.478962]  <IRQ>
> [ 2301.487699] dump_stack (lib/dump_stack.c:115)
> [ 2301.497768] print_address_description (mm/kasan/report.c:257)
> [ 2301.509256] ? tcp_v4_rcv (net/ipv4/tcp_ipv4.c:1693)
> [ 2301.519681] kasan_report_invalid_free (mm/kasan/report.c:337)
> [ 2301.531138] ? tcp_v4_rcv (net/ipv4/tcp_ipv4.c:1693)
> [ 2301.541628] __kasan_slab_free (mm/kasan/kasan.c:502)
> [ 2301.552571] ? tcp_v4_rcv (net/ipv4/tcp_ipv4.c:1693)
> [ 2301.563087] kmem_cache_free (mm/slub.c:1398 mm/slub.c:2953 mm/slub.c:2969)
> [ 2301.573831] tcp_v4_rcv (net/ipv4/tcp_ipv4.c:1693)
> [ 2301.584110] ? icmp_checkentry+0x70/0x70 ip_tables]
> [ 2301.595966] ? tcp_v4_early_demux (net/ipv4/tcp_ipv4.c:1693)
> [ 2301.607224] ip_local_deliver_finish (net/ipv4/ip_input.c:216)
> [ 2301.618764] ip_local_deliver (net/ipv4/ip_input.c:245)
> [ 2301.629636] ? ip_call_ra_chain (net/ipv4/ip_input.c:245)
> [ 2301.640683] ? ip_sublist_rcv (net/ipv4/ip_input.c:192)
> [ 2301.651493] ? ip_local_deliver (net/ipv4/ip_input.c:518)
> [ 2301.662419] ip_rcv (net/ipv4/ip_input.c:518)
> [ 2301.672198] ? ip_local_deliver (net/ipv4/ip_input.c:518)
> [ 2301.683164] ? ip_rcv_core.isra.20 (net/ipv4/ip_input.c:403)
> [ 2301.694340] __netif_receive_skb_one_core (net/core/dev.c:4911)
> [ 2301.694344] ? __netif_receive_skb_core (net/core/dev.c:4911)
> [ 2301.694361] ? eth_gro_receive (net/ethernet/eth.c:157)
> [ 2301.694369] ? recalibrate_cpu_khz (arch/x86/kernel/tsc.c:1066
> arch/x86/kernel/tsc.c:1066)
> [ 2301.694375] ? ktime_get_with_offset (kernel/time/timekeeping.c:267
> kernel/time/timekeeping.c:371 kernel/time/timekeeping.c:799)
> [ 2301.694385] ? __build_skb (include/linux/compiler.h:214
> arch/x86/include/asm/atomic.h:43
> include/asm-generic/atomic-instrumented.h:34 net/core/skbuff.c:300)
> [ 2301.760745] netif_receive_skb_internal (net/core/dev.c:5097)
> [ 2301.760750] ? dev_cpu_dead (net/core/dev.c:5097)
> [ 2301.760786] ? efx_rx_mk_skb+0x5d0/0x1210 sfc]
> [ 2301.760808] ? efx_time_sync_event+0x1b0/0x1b0 sfc]
> [ 2301.760831] efx_rx_deliver+0x447/0x640 sfc]
> [ 2301.760851] ? efx_free_rx_buffers+0x180/0x180 sfc]
> [ 2301.760872] ? __efx_rx_packet+0x76e/0x23b0 sfc]
> [ 2301.835110] ? efx_ssr+0x19c0/0x19c0 sfc]
> [ 2301.835142] ? efx_ef10_ptp_set_ts_config+0x120/0x120 sfc]
> [ 2301.835152] ? reweight_entity (kernel/sched/fair.c:2762
> kernel/sched/fair.c:2830)
> [ 2301.835186] ? efx_poll+0x991/0x12b0 sfc]
> [ 2301.876013] ? net_rx_action (arch/x86/include/asm/jump_label.h:36
> include/linux/jump_label.h:142 include/trace/events/napi.h:14
> net/core/dev.c:6263 net/core/dev.c:6328)
> [ 2301.876019] ? napi_complete_done (net/core/dev.c:6306)
> [ 2301.895619] ? hrtimer_init (kernel/time/hrtimer.c:1430)
> [ 2301.895630] ? recalibrate_cpu_khz (arch/x86/kernel/tsc.c:1066
> arch/x86/kernel/tsc.c:1066)
> [ 2301.914880] ? _raw_spin_lock (arch/x86/include/asm/atomic.h:194
> include/asm-generic/atomic-instrumented.h:58
> include/asm-generic/qspinlock.h:85 include/linux/spinlock.h:180
> include/linux/spinlock_api_smp.h:143 kernel/locking/spinlock.c:144)
> [ 2301.914887] ? handle_irq_event (kernel/irq/handle.c:209)
> [ 2301.914895] ? __do_softirq (arch/x86/include/asm/jump_label.h:36
> include/linux/jump_label.h:142 include/trace/events/irq.h:142
> kernel/softirq.c:293)
> [ 2301.943072] ? handle_irq (arch/x86/kernel/irq_64.c:79)
> [ 2301.943085] ? irq_exit (kernel/softirq.c:372 kernel/softirq.c:412)
> [ 2301.960340] ? do_IRQ (arch/x86/include/asm/irq_regs.h:19
> arch/x86/include/asm/irq_regs.h:26 arch/x86/kernel/irq.c:260)
> [ 2301.960346] ? common_interrupt (arch/x86/entry/entry_64.S:646)
> [ 2301.960348]  </IRQ>
> [ 2301.960359] ? cpuidle_enter_state (drivers/cpuidle/cpuidle.c:251)
> [ 2301.960380] ? do_idle (kernel/sched/idle.c:204 kernel/sched/idle.c:262)
> [ 2301.960383] ? arch_cpu_idle_exit (??:?)
> [ 2301.960389] ? cpu_startup_entry (kernel/sched/idle.c:368 (discriminator 1))
> [ 2301.960392] ? cpu_in_idle (kernel/sched/idle.c:349)
> [ 2301.960413] ? clockevents_config.part.12 (kernel/time/clockevents.c:503)
> [ 2301.960420] ? start_secondary (arch/x86/kernel/smpboot.c:213)
> [ 2301.960423] ? set_cpu_sibling_map (arch/x86/kernel/smpboot.c:213)
> [ 2301.960430] ? secondary_startup_64 (arch/x86/kernel/head_64.S:243)
> [ 2301.960435]
> [ 2302.070728] Allocated by task 0:
> [ 2302.070739] kasan_kmalloc (mm/kasan/kasan.c:460 mm/kasan/kasan.c:553)
> [ 2302.070764] kmem_cache_alloc (arch/x86/include/asm/jump_label.h:36
> include/linux/memcontrol.h:1292 mm/slab.h:447 mm/slub.c:2706
> mm/slub.c:2714 mm/slub.c:2719)
> [ 2302.095562] __build_skb (net/core/skbuff.c:282 (discriminator 4))
> [ 2302.095565] __netdev_alloc_skb (net/core/skbuff.c:423)
> [ 2302.095604] efx_rx_mk_skb+0x10e/0x1210 sfc]
> [ 2302.095611]
> [ 2302.127968] Freed by task 0:
> [ 2302.127983] __kasan_slab_free (mm/kasan/kasan.c:522)
> [ 2302.127993] kmem_cache_free (mm/slub.c:1398 mm/slub.c:2953 mm/slub.c:2969)
> [ 2302.152762] ip_defrag (net/ipv4/ip_fragment.c:507 net/ipv4/ip_fragment.c:699)
> [ 2302.152768] ipv4_conntrack_defrag+0x323/0x490 nf_defrag_ipv4]
> [ 2302.152771] nf_hook_slow (net/netfilter/core.c:512)
> [ 2302.152775] ip_rcv (include/linux/netfilter.h:288 net/ipv4/ip_input.c:524)
> [ 2302.152779] __netif_receive_skb_one_core (net/core/dev.c:4911)
> [ 2302.152782] netif_receive_skb_internal (net/core/dev.c:5097)
> [ 2302.152808] efx_rx_deliver+0x447/0x640 sfc]
> [ 2302.152810]
> [ 2302.152813] The buggy address belongs to the object at ffff888bd8f543c0
> [ 2302.152813]  which belongs to the cache skbuff_head_cache of size 232
> [ 2302.152815] The buggy address is located 0 bytes inside of
> [ 2302.152815]  232-byte region [ffff888bd8f543c0, ffff888bd8f544a8)
> [ 2302.152816] The buggy address belongs to the page:
> [ 2302.152819] page:ffffea002f63d500 count:1 mapcount:0
> mapping:ffff88a03c294540 index:0xffff888bd8f561c0 compound_mapcount: 0
> [ 2302.152822] flags: 0x2ffff800008100(slab|head)
> [ 2302.152827] raw: 002ffff800008100 ffffea002341d300 0000002d00000002
> ffff88a03c294540
> [ 2302.152829] raw: ffff888bd8f561c0 0000000080330030 00000001ffffffff
> 0000000000000000
> [ 2302.152830] page dumped because: kasan: bad access detected
> [ 2302.152830]
> [ 2302.152831] Memory state around the buggy address:
> [ 2302.152833]  ffff888bd8f54280: fb fb fb fb fb fb fb fb fb fb fb fb
> fb fb fb fb
> [ 2302.152835]  ffff888bd8f54300: fb fb fb fb fb fb fb fb fb fb fb fb
> fb fc fc fc
> [ 2302.152836] >ffff888bd8f54380: fc fc fc fc fc fc fc fc fb fb fb fb
> fb fb fb fb
> [ 2302.152837]                                            ^
> [ 2302.152839]  ffff888bd8f54400: fb fb fb fb fb fb fb fb fb fb fb fb
> fb fb fb fb
> [ 2302.152840]  ffff888bd8f54480: fb fb fb fb fb fc fc fc fc fc fc fc
> fc fc fc fc
> [ 2302.152841] ==================================================================
> [ 2302.187379] BUG: Bad page state in process nginx-origin  pfn:28b7f8
> [ 2302.462537] page:ffffea000a2dfe00 count:-1 mapcount:0
> mapping:0000000000000000 index:0x0
> [ 2302.462542] flags: 0x2ffff800000000()
> [ 2302.462549] raw: 002ffff800000000 dead000000000100 dead000000000200
> 0000000000000000
> [ 2302.462553] raw: 0000000000000000 0000000000000000 ffffffffffffffff
> 0000000000000000
> [ 2302.462554] page dumped because: nonzero _count
> [ 2302.462555] Modules linked in: tun xt_connlimit nf_conncount xt_bpf
> xt_hashlimit iptable_security cls_flow cls_u32 sch_htb sch_fq md_mod
> dm_crypt algif_skcipher af_alg dm_mod dax ip6table_nat nf_nat_ipv6
> ip6table_mangle ip6table_security ip6table_raw ip6table_filter
> ip6_tables xt_nat iptable_nat nf_nat_ipv4 nf_nat xt_TPROXY
> nf_tproxy_ipv6 nf_tproxy_ipv4 xt_connmark iptable_mangle xt_owner
> xt_CT xt_socket nf_socket_ipv4 nf_socket_ipv6 iptable_raw
> nfnetlink_log xt_NFLOG xt_tcpudp xt_comment xt_conntrack nf_conntrack
> nf_defrag_ipv6 nf_defrag_ipv4 xt_mark xt_multiport xt_set
> iptable_filter bpfilter ip_set_hash_netport ip_set_hash_net
> ip_set_hash_ip ip_set nfnetlink 8021q garp mrp stp llc sb_edac
> x86_pkg_temp_thermal kvm_intel kvm ipmi_ssif irqbypass crc32_pclmul
> crc32c_intel sfc(O) pcbc aesni_intel aes_x86_64
> [ 2302.650012]  crypto_simd igb cryptd i2c_algo_bit glue_helper mdio
> dca ipmi_si ipmi_devintf ipmi_msghandler efivarfs ip_tables x_tables
> [ 2302.650031] CPU: 1 PID: 74997 Comm: nginx-origin Tainted: G    B
>   O      4.19.18-cloudflare-2019.1.8-1-gcabf55c #gcabf55c
> [ 2302.650033] Hardware name: Quanta Computer Inc. QuantaPlex
> T41S-2U/S2S-MB, BIOS S2S_3B10.03 06/21/2018
> [ 2302.650035] Call Trace:
> [ 2302.650049] dump_stack (lib/dump_stack.c:115)
> [ 2302.650062] bad_page.cold.116 (mm/page_alloc.c:542)
> [ 2302.755115] ? si_mem_available (mm/page_alloc.c:507)
> [ 2302.755119] ? ksys_write (fs/read_write.c:599)
> [ 2302.755126] ? entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:247)
> [ 2302.755130] ? __switch_to_asm (arch/x86/entry/entry_64.S:373)
> [ 2302.755135] get_page_from_freelist (mm/page_alloc.c:2997
> mm/page_alloc.c:3342)
> [ 2302.755140] ? __switch_to_asm (arch/x86/entry/entry_64.S:373)
> [ 2302.755144] ? __switch_to_asm (arch/x86/entry/entry_64.S:373)
> [ 2302.755153] ? kasan_unpoison_shadow (mm/kasan/kasan.c:71)
> [ 2302.861765] ? __isolate_free_page (mm/page_alloc.c:3252)
> [ 2302.861769] ? __kmalloc_node_track_caller (mm/slab.h:448
> mm/slub.c:2706 mm/slub.c:4320)
> [ 2302.861775] ? __alloc_skb (net/core/skbuff.c:206)
> [ 2302.861783] __alloc_pages_nodemask (mm/page_alloc.c:4369)
> [ 2302.915129] ? __alloc_pages_slowpath (mm/page_alloc.c:4345)
> [ 2302.915135] skb_page_frag_refill (net/core/sock.c:2213)
> [ 2302.915139] sk_page_frag_refill (net/core/sock.c:2234)
> [ 2302.915144] tcp_sendmsg_locked (net/ipv4/tcp.c:1321)
> [ 2302.915149] ? interrupt_entry (arch/x86/entry/entry_64.S:607)
> [ 2302.915153] ? kasan_unpoison_shadow (mm/kasan/kasan.c:68)
> [ 2302.915160] ? tcp_sendpage (net/ipv4/tcp.c:1175)
> [ 2303.003254] ? selinux_secmark_relabel_packet (security/selinux/hooks.c:4532)
> [ 2303.003260] ? release_pages (mm/swap.c:716)
> [ 2303.028592] ? inet_sk_set_state (net/ipv4/af_inet.c:794)
> [ 2303.028596] tcp_sendmsg (net/ipv4/tcp.c:1444)
> [ 2303.028603] sock_sendmsg (net/socket.c:622 net/socket.c:631)
> [ 2303.028609] sock_write_iter (net/socket.c:901)
> [ 2303.075968] ? sock_sendmsg (net/socket.c:884)
> [ 2303.075978] __vfs_write (fs/read_write.c:475 fs/read_write.c:487)
> [ 2303.075986] ? __handle_mm_fault (mm/memory.c:3211 mm/memory.c:4030
> mm/memory.c:4156)
> [ 2303.111370] ? kernel_read (fs/read_write.c:483)
> [ 2303.111375] ? file_has_perm (security/selinux/hooks.c:1919)
> [ 2303.111379] ? bpf_fd_pass (security/selinux/hooks.c:1890)
> [ 2303.111386] vfs_write (fs/read_write.c:550)
> [ 2303.111389] ksys_write (fs/read_write.c:599)
> [ 2303.111394] ? __ia32_sys_read (fs/read_write.c:592)
> [ 2303.111401] do_syscall_64 (arch/x86/entry/common.c:290)
> [ 2303.188508] ? page_fault (arch/x86/entry/entry_64.S:1161)
> [ 2303.188513] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:247)
> [ 2303.188517] RIP: 0033:0x7f53e469f190
> [ 2303.188521] Code: 2e 0f 1f 84 00 00 00 00 00 90 48 8b 05 39 7e 20
> 00 c3 0f 1f 84 00 00 00 00 00 83 3d 39 c2 20 00 00 75 10 b8 01 00 00
> 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 ae fc ff ff 48 89
> 04 24
> All code
> ========
>    0: 2e 0f 1f 84 00 00 00 nopl   %cs:0x0(%rax,%rax,1)
>    7: 00 00
>    9: 90                    nop
>    a: 48 8b 05 39 7e 20 00 mov    0x207e39(%rip),%rax        # 0x207e4a
>   11: c3                    retq
>   12: 0f 1f 84 00 00 00 00 nopl   0x0(%rax,%rax,1)
>   19: 00
>   1a: 83 3d 39 c2 20 00 00 cmpl   $0x0,0x20c239(%rip)        # 0x20c25a
>   21: 75 10                jne    0x33
>   23: b8 01 00 00 00        mov    $0x1,%eax
>   28: 0f 05                syscall
>   2a:* 48 3d 01 f0 ff ff    cmp    $0xfffffffffffff001,%rax <-- trapping instruction
>   30: 73 31                jae    0x63
>   32: c3                    retq
>   33: 48 83 ec 08          sub    $0x8,%rsp
>   37: e8 ae fc ff ff        callq  0xfffffffffffffcea
>   3c: 48 89 04 24          mov    %rax,(%rsp)
>
> Code starting with the faulting instruction
> ===========================================
>    0: 48 3d 01 f0 ff ff    cmp    $0xfffffffffffff001,%rax
>    6: 73 31                jae    0x39
>    8: c3                    retq
>    9: 48 83 ec 08          sub    $0x8,%rsp
>    d: e8 ae fc ff ff        callq  0xfffffffffffffcc0
>   12: 48 89 04 24          mov    %rax,(%rsp)
> [ 2303.188523] RSP: 002b:00007ffcc6a0c118 EFLAGS: 00000246 ORIG_RAX:
> 0000000000000001
> [ 2303.188528] RAX: ffffffffffffffda RBX: 00005562df6160b3 RCX: 00007f53e469f190
> [ 2303.188531] RDX: 000000000000401d RSI: 00005562df6160b3 RDI: 0000000000000d4f
> [ 2303.188533] RBP: 00007ffcc6a0c150 R08: 0000000000000005 R09: 0000000060640d3e
> [ 2303.188535] R10: 00005562d20f7b10 R11: 0000000000000246 R12: 000000000000401d
> [ 2303.188541] R13: 000000000000401d R14: 00007ffcc6a0c3a8 R15: 00005562dc0e6ec8
> [ 2303.407074] WARNING: CPU: 21 PID: 74997 at lib/iov_iter.c:825
> copy_page_to_iter (lib/iov_iter.c:825 lib/iov_iter.c:832)
> [ 2303.420983] Modules linked in: tun xt_connlimit nf_conncount xt_bpf
> xt_hashlimit iptable_security cls_flow cls_u32 sch_htb sch_fq md_mod
> dm_crypt algif_skcipher af_alg dm_mod dax ip6table_nat nf_nat_ipv6
> ip6table_mangle ip6table_security ip6table_raw ip6table_filter
> ip6_tables xt_nat iptable_nat nf_nat_ipv4 nf_nat xt_TPROXY
> nf_tproxy_ipv6 nf_tproxy_ipv4 xt_connmark iptable_mangle xt_owner
> xt_CT xt_socket nf_socket_ipv4 nf_socket_ipv6 iptable_raw
> nfnetlink_log xt_NFLOG xt_tcpudp xt_comment xt_conntrack nf_conntrack
> nf_defrag_ipv6 nf_defrag_ipv4 xt_mark xt_multiport xt_set
> iptable_filter bpfilter ip_set_hash_netport ip_set_hash_net
> ip_set_hash_ip ip_set nfnetlink 8021q garp mrp stp llc sb_edac
> x86_pkg_temp_thermal kvm_intel kvm ipmi_ssif irqbypass crc32_pclmul
> crc32c_intel sfc(O) pcbc aesni_intel aes_x86_64
> [ 2303.538009]  crypto_simd igb cryptd i2c_algo_bit glue_helper mdio
> dca ipmi_si ipmi_devintf ipmi_msghandler efivarfs ip_tables x_tables
> [ 2303.538034] CPU: 21 PID: 74997 Comm: nginx-origin Tainted: G    B
>    O      4.19.18-cloudflare-2019.1.8-1-gcabf55c #gcabf55c
> [ 2303.538037] Hardware name: Quanta Computer Inc. QuantaPlex
> T41S-2U/S2S-MB, BIOS S2S_3B10.03 06/21/2018
> [ 2303.538050] RIP: 0010:copy_page_to_iter (??:?)
> [ 2303.538055] Code: 07 00 00 4d 85 f6 4c 89 54 24 10 4d 8b 6f 18 4c
> 89 44 24 08 74 0c 4c 89 ff e8 65 43 ff ff 84 c0 75 12 45 31 f6 e9 d9
> fe ff ff <0f> 0b 45 31 f6 e9 cf fe ff ff 49 8d 6f 08 4c 8b 44 24 08 48
> b8 00
> All code
> ========
>    0: 07                    (bad)
>    1: 00 00                add    %al,(%rax)
>    3: 4d 85 f6              test   %r14,%r14
>    6: 4c 89 54 24 10        mov    %r10,0x10(%rsp)
>    b: 4d 8b 6f 18          mov    0x18(%r15),%r13
>    f: 4c 89 44 24 08        mov    %r8,0x8(%rsp)
>   14: 74 0c                je     0x22
>   16: 4c 89 ff              mov    %r15,%rdi
>   19: e8 65 43 ff ff        callq  0xffffffffffff4383
>   1e: 84 c0                test   %al,%al
>   20: 75 12                jne    0x34
>   22: 45 31 f6              xor    %r14d,%r14d
>   25: e9 d9 fe ff ff        jmpq   0xffffffffffffff03
>   2a:* 0f 0b                ud2    <-- trapping instruction
>   2c: 45 31 f6              xor    %r14d,%r14d
>   2f: e9 cf fe ff ff        jmpq   0xffffffffffffff03
>   34: 49 8d 6f 08          lea    0x8(%r15),%rbp
>   38: 4c 8b 44 24 08        mov    0x8(%rsp),%r8
>   3d: 48                    rex.W
>   3e: b8                    .byte 0xb8
> ...
>
> Code starting with the faulting instruction
> ===========================================
>    0: 0f 0b                ud2
>    2: 45 31 f6              xor    %r14d,%r14d
>    5: e9 cf fe ff ff        jmpq   0xfffffffffffffed9
>    a: 49 8d 6f 08          lea    0x8(%r15),%rbp
>    e: 4c 8b 44 24 08        mov    0x8(%rsp),%r8
>   13: 48                    rex.W
>   14: b8                    .byte 0xb8
> ...
> [ 2303.538057] RSP: 0018:ffff88a005e0f7c0 EFLAGS: 00010293
> [ 2303.538061] RAX: 0000000000001000 RBX: 000000000000168d RCX: 002ffff800000000
> [ 2303.538064] RDX: ffffffffa66bdcb0 RSI: ffffffffa66bdca0 RDI: ffffea000a2dfe00
> [ 2303.538066] RBP: 0000000000000005 R08: ffffea000a2dfe00 R09: dffffc0000000000
> [ 2303.538069] R10: 0000000000001688 R11: 0000000000000004 R12: ffffea000a2dfe08
> [ 2303.538071] R13: ffffea000a2dfe00 R14: ffffea0000000000 R15: ffff88a005e0fc40
> [ 2303.538075] FS:  00007f53e4ac0740(0000) GS:ffff888c3f4c0000(0000)
> knlGS:0000000000000000
> [ 2303.538077] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 2303.538079] CR2: 00005562d36cc000 CR3: 0000002015486001 CR4: 00000000003606e0
> [ 2303.538081] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 2303.538083] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 2303.538085] Call Trace:
> [ 2303.538099] skb_copy_datagram_iter (net/core/datagram.c:453)
> [ 2303.538108] tcp_recvmsg (net/ipv4/tcp.c:2104)
> [ 2303.538115] ? tcp_get_md5sig_pool (net/ipv4/tcp.c:1917)
> [ 2303.538119] ? tcp_poll (include/net/sock.h:1204
> include/net/sock.h:1210 net/ipv4/tcp.c:569)
> [ 2303.538123] ? tcp_splice_read (net/ipv4/tcp.c:504)
> [ 2303.538131] ? bad_area_access_error (arch/x86/mm/fault.c:1213)
> [ 2303.538134] ? tcp_splice_read (net/ipv4/tcp.c:504)
> [ 2303.538144] ? ep_item_poll.isra.20 (fs/eventpoll.c:892)
> [ 2303.538151] ? selinux_secmark_relabel_packet (security/selinux/hooks.c:4532)
> [ 2303.538159] inet_recvmsg (net/ipv4/af_inet.c:838)
> [ 2303.538164] ? inet_sendpage (net/ipv4/af_inet.c:828)
> [ 2303.538172] sock_read_iter (net/socket.c:879)
> [ 2303.538177] ? sock_recvmsg (net/socket.c:862)
> [ 2303.538187] __vfs_read (fs/read_write.c:407 fs/read_write.c:418)
> [ 2303.538193] ? __switch_to_asm (arch/x86/entry/entry_64.S:373)
> [ 2303.538197] ? __switch_to_asm (arch/x86/entry/entry_64.S:373)
> [ 2303.538202] ? __x64_sys_copy_file_range (fs/read_write.c:414)
> [ 2303.538208] ? file_has_perm (security/selinux/hooks.c:1919)
> [ 2303.538216] vfs_read (fs/read_write.c:453)
> [ 2303.538221] ksys_read (fs/read_write.c:579)
> [ 2303.538225] ? kernel_write (fs/read_write.c:572)
> [ 2303.538232] do_syscall_64 (arch/x86/entry/common.c:290)
> [ 2303.538236] ? prepare_exit_to_usermode (arch/x86/entry/common.c:197)
> [ 2303.538240] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:247)
> [ 2303.538245] RIP: 0033:0x7f53e469f1f0
> [ 2303.538249] Code: 73 01 c3 48 8b 0d b8 7d 20 00 f7 d8 64 89 01 48
> 83 c8 ff c3 66 0f 1f 44 00 00 83 3d d9 c1 20 00 00 75 10 b8 00 00 00
> 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 4e fc ff ff 48 89
> 04 24
> All code
> ========
>    0: 73 01                jae    0x3
>    2: c3                    retq
>    3: 48 8b 0d b8 7d 20 00 mov    0x207db8(%rip),%rcx        # 0x207dc2
>    a: f7 d8                neg    %eax
>    c: 64 89 01              mov    %eax,%fs:(%rcx)
>    f: 48 83 c8 ff          or     $0xffffffffffffffff,%rax
>   13: c3                    retq
>   14: 66 0f 1f 44 00 00    nopw   0x0(%rax,%rax,1)
>   1a: 83 3d d9 c1 20 00 00 cmpl   $0x0,0x20c1d9(%rip)        # 0x20c1fa
>   21: 75 10                jne    0x33
>   23: b8 00 00 00 00        mov    $0x0,%eax
>   28: 0f 05                syscall
>   2a:* 48 3d 01 f0 ff ff    cmp    $0xfffffffffffff001,%rax <-- trapping instruction
>   30: 73 31                jae    0x63
>   32: c3                    retq
>   33: 48 83 ec 08          sub    $0x8,%rsp
>   37: e8 4e fc ff ff        callq  0xfffffffffffffc8a
>   3c: 48 89 04 24          mov    %rax,(%rsp)
>
> Code starting with the faulting instruction
> ===========================================
>    0: 48 3d 01 f0 ff ff    cmp    $0xfffffffffffff001,%rax
>    6: 73 31                jae    0x39
>    8: c3                    retq
>    9: 48 83 ec 08          sub    $0x8,%rsp
>    d: e8 4e fc ff ff        callq  0xfffffffffffffc60
>   12: 48 89 04 24          mov    %rax,(%rsp)
> [ 2303.538251] RSP: 002b:00007ffcc6a0c188 EFLAGS: 00000246 ORIG_RAX:
> 0000000000000000
> [ 2303.538254] RAX: ffffffffffffffda RBX: 00005562d5f89883 RCX: 00007f53e469f1f0
> [ 2303.538256] RDX: 0000000000000005 RSI: 00005562d5f89883 RDI: 0000000000000dfb
> [ 2303.538258] RBP: 00007ffcc6a0c1c0 R08: 0000000000000032 R09: 0000000000000020
> [ 2303.538260] R10: 00005562d20944de R11: 0000000000000246 R12: 0000000000000005
> [ 2303.538262] R13: 00005562dbb17f60 R14: 00005562d2570e80 R15: 00007f53c5866d98
> [ 2303.538268] ---[ end trace d791391e77eef582 ]---
> [ 2330.200708] kasan: CONFIG_KASAN_INLINE enabled
> [ 2330.211020] kasan: GPF could be caused by NULL-ptr deref or user
> memory access
> [ 2330.224169] general protection fault: 0000 [#1] SMP KASAN PTI
> [ 2330.235791] CPU: 28 PID: 69371 Comm: nginx-fl Tainted: G    B   W
> O      4.19.18-cloudflare-2019.1.8-1-gcabf55c #gcabf55c
> [ 2330.253036] Hardware name: Quanta Computer Inc. QuantaPlex
> T41S-2U/S2S-MB, BIOS S2S_3B10.03 06/21/2018
> [ 2330.268679] RIP: 0010:rb_replace_node (??:?)
> [ 2330.279645] Code: 55 48 89 f5 53 48 89 fb 48 83 ec 08 80 3c 01 00
> 0f 85 64 02 00 00 48 b9 00 00 00 00 00 fc ff df 48 89 e8 4c 8b 23 48
> c1 e8 03 <0f> b6 34 08 48 8d 45 17 48 89 c7 83 e0 07 48 c1 ef 03 49 83
> e4 fc
> All code
> ========
>    0: 55                    push   %rbp
>    1: 48 89 f5              mov    %rsi,%rbp
>    4: 53                    push   %rbx
>    5: 48 89 fb              mov    %rdi,%rbx
>    8: 48 83 ec 08          sub    $0x8,%rsp
>    c: 80 3c 01 00          cmpb   $0x0,(%rcx,%rax,1)
>   10: 0f 85 64 02 00 00    jne    0x27a
>   16: 48 b9 00 00 00 00 00 movabs $0xdffffc0000000000,%rcx
>   1d: fc ff df
>   20: 48 89 e8              mov    %rbp,%rax
>   23: 4c 8b 23              mov    (%rbx),%r12
>   26: 48 c1 e8 03          shr    $0x3,%rax
>   2a:* 0f b6 34 08          movzbl (%rax,%rcx,1),%esi <-- trapping instruction
>   2e: 48 8d 45 17          lea    0x17(%rbp),%rax
>   32: 48 89 c7              mov    %rax,%rdi
>   35: 83 e0 07              and    $0x7,%eax
>   38: 48 c1 ef 03          shr    $0x3,%rdi
>   3c: 49 83 e4 fc          and    $0xfffffffffffffffc,%r12
>
> Code starting with the faulting instruction
> ===========================================
>    0: 0f b6 34 08          movzbl (%rax,%rcx,1),%esi
>    4: 48 8d 45 17          lea    0x17(%rbp),%rax
>    8: 48 89 c7              mov    %rax,%rdi
>    b: 83 e0 07              and    $0x7,%eax
>    e: 48 c1 ef 03          shr    $0x3,%rdi
>   12: 49 83 e4 fc          and    $0xfffffffffffffffc,%r12
> [ 2330.311757] RSP: 0018:ffff888c3f687d88 EFLAGS: 00010206
> [ 2330.323631] RAX: 0000000000000003 RBX: ffff888c081fc000 RCX: dffffc0000000000
> [ 2330.323634] RDX: ffff888c0a5c38e0 RSI: 000000000000001a RDI: ffff888c081fc000
> [ 2330.323636] RBP: 000000000000001a R08: fffffbfff4d88d09 R09: fffffbfff4d88d08
> [ 2330.323639] R10: fffffbfff4d88d08 R11: ffffffffa6c46847 R12: 0000000030747865
> [ 2330.323641] R13: ffff888c0a5c3910 R14: ffff888c0a5c3870 R15: ffff888c0a5c38e0
> [ 2330.323644] FS:  00007f3375a30780(0000) GS:ffff888c3f680000(0000)
> knlGS:0000000000000000
> [ 2330.323647] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 2330.323649] CR2: 00007f19d3da5000 CR3: 0000000bee77a001 CR4: 00000000003606e0
> [ 2330.323651] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 2330.323653] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 2330.323655] Call Trace:
> [ 2330.323658]  <IRQ>
> [ 2330.323673] ip_expire (net/ipv4/ip_fragment.c:223)
> [ 2330.323680] ? ip_check_defrag (net/ipv4/ip_fragment.c:187)
> [ 2330.323686] call_timer_fn (arch/x86/include/asm/jump_label.h:36
> include/linux/jump_label.h:142 include/trace/events/timer.h:121
> kernel/time/timer.c:1327)
> [ 2330.323691] run_timer_softirq (kernel/time/timer.c:1364
> kernel/time/timer.c:1682 kernel/time/timer.c:1695)
> [ 2330.323695] ? add_timer (kernel/time/timer.c:1692)
> [ 2330.323699] ? hrtimer_init (kernel/time/hrtimer.c:1430)
> [ 2330.323705] ? recalibrate_cpu_khz (arch/x86/kernel/tsc.c:1066
> arch/x86/kernel/tsc.c:1066)
> [ 2330.323709] ? recalibrate_cpu_khz (arch/x86/kernel/tsc.c:1066
> arch/x86/kernel/tsc.c:1066)
> [ 2330.323713] ? ktime_get (kernel/time/timekeeping.c:267
> kernel/time/timekeeping.c:371 kernel/time/timekeeping.c:756)
> [ 2330.323720] ? lapic_timer_set_oneshot (arch/x86/kernel/apic/apic.c:467)
> [ 2330.323727] ? clockevents_program_event (kernel/time/clockevents.c:346)
> [ 2330.323733] __do_softirq (arch/x86/include/asm/jump_label.h:36
> include/linux/jump_label.h:142 include/trace/events/irq.h:142
> kernel/softirq.c:293)
> [ 2330.323741] irq_exit (kernel/softirq.c:372 kernel/softirq.c:412)
> [ 2330.323744] smp_apic_timer_interrupt
> (arch/x86/include/asm/irq_regs.h:19 arch/x86/include/asm/irq_regs.h:26
> arch/x86/kernel/apic/apic.c:1058)
> [ 2330.323751] apic_timer_interrupt (arch/x86/entry/entry_64.S:864)
> [ 2330.323753]  </IRQ>
> [ 2330.323760] RIP: 0010:check_memory_region (??:?)
> [ 2330.323765] Code: ff 41 54 49 b9 00 00 00 00 00 fc ff df 4d 89 da
> 55 49 c1 ea 03 53 48 89 fb 4d 01 ca 48 c1 eb 03 49 8d 6a 01 49 01 d9
> 49 89 e8 <4c> 89 c8 4d 29 c8 49 83 f8 10 0f 8e 98 00 00 00 44 89 cb 83
> e3 07
> All code
> ========
>    0: ff 41 54              incl   0x54(%rcx)
>    3: 49 b9 00 00 00 00 00 movabs $0xdffffc0000000000,%r9
>    a: fc ff df
>    d: 4d 89 da              mov    %r11,%r10
>   10: 55                    push   %rbp
>   11: 49 c1 ea 03          shr    $0x3,%r10
>   15: 53                    push   %rbx
>   16: 48 89 fb              mov    %rdi,%rbx
>   19: 4d 01 ca              add    %r9,%r10
>   1c: 48 c1 eb 03          shr    $0x3,%rbx
>   20: 49 8d 6a 01          lea    0x1(%r10),%rbp
>   24: 49 01 d9              add    %rbx,%r9
>   27: 49 89 e8              mov    %rbp,%r8
>   2a:* 4c 89 c8              mov    %r9,%rax <-- trapping instruction
>   2d: 4d 29 c8              sub    %r9,%r8
>   30: 49 83 f8 10          cmp    $0x10,%r8
>   34: 0f 8e 98 00 00 00    jle    0xd2
>   3a: 44 89 cb              mov    %r9d,%ebx
>   3d: 83 e3 07              and    $0x7,%ebx
>
> Code starting with the faulting instruction
> ===========================================
>    0: 4c 89 c8              mov    %r9,%rax
>    3: 4d 29 c8              sub    %r9,%r8
>    6: 49 83 f8 10          cmp    $0x10,%r8
>    a: 0f 8e 98 00 00 00    jle    0xa8
>   10: 44 89 cb              mov    %r9d,%ebx
>   13: 83 e3 07              and    $0x7,%ebx
> [ 2330.323767] RSP: 0018:ffff888bcb66f830 EFLAGS: 00000286 ORIG_RAX:
> ffffffffffffff13
> [ 2330.323771] RAX: ffff7fffffffffff RBX: 1ffffd400601a58e RCX: ffffffffa5591192
> [ 2330.323772] RDX: 0000000000000001 RSI: 0000000000000004 RDI: ffffea00300d2c74
> [ 2330.323775] RBP: fffff9400601a58f R08: fffff9400601a58f R09: fffff9400601a58e
> [ 2330.323777] R10: fffff9400601a58e R11: ffffea00300d2c77 R12: dffffc0000000000
> [ 2330.323779] R13: ffff888bf01d0500 R14: ffff88826902a7c0 R15: ffffea00300d2c40
> [ 2330.323787] ? skb_release_data (arch/x86/include/asm/atomic.h:125
> (discriminator 3) include/asm-generic/atomic-instrumented.h:260
> (discriminator 3) include/linux/page_ref.h:139 (discriminator 3)
> include/linux/mm.h:520 (discriminator 3) include/linux/mm.h:942
> (discriminator 3) include/linux/skbuff.h:2795 (discriminator 3)
> net/core/skbuff.c:564 (discriminator 3))
> [ 2330.323793] skb_release_data (arch/x86/include/asm/atomic.h:125
> (discriminator 3) include/asm-generic/atomic-instrumented.h:260
> (discriminator 3) include/linux/page_ref.h:139 (discriminator 3)
> include/linux/mm.h:520 (discriminator 3) include/linux/mm.h:942
> (discriminator 3) include/linux/skbuff.h:2795 (discriminator 3)
> net/core/skbuff.c:564 (discriminator 3))
> [ 2330.323798] __kfree_skb (net/core/skbuff.c:642)
> [ 2330.323804] tcp_recvmsg (include/net/sock.h:2405 net/ipv4/tcp.c:2134)
> [ 2330.323808] ? sock_def_readable (arch/x86/include/asm/bitops.h:328
> include/net/sock.h:828 include/net/sock.h:2181 net/core/sock.c:2698)
> [ 2330.323814] ? tcp_get_md5sig_pool (net/ipv4/tcp.c:1917)
> [ 2330.323817] ? tcp_poll (include/net/sock.h:1204
> include/net/sock.h:1210 net/ipv4/tcp.c:569)
> [ 2330.323825] ? unix_stream_sendpage (net/unix/af_unix.c:1829)
> [ 2330.323831] ? sock_sendmsg (net/socket.c:622 net/socket.c:631)
> [ 2330.323834] ? sock_write_iter (net/socket.c:901)
> [ 2330.323838] ? sock_sendmsg (net/socket.c:884)
> [ 2330.323846] inet_recvmsg (net/ipv4/af_inet.c:838)
> [ 2330.323851] ? inet_sendpage (net/ipv4/af_inet.c:828)
> [ 2330.323856] sock_read_iter (net/socket.c:879)
> [ 2330.323860] ? sock_recvmsg (net/socket.c:862)
> [ 2330.323870] __vfs_read (fs/read_write.c:407 fs/read_write.c:418)
> [ 2330.323874] ? __switch_to_asm (arch/x86/entry/entry_64.S:373)
> [ 2330.323878] ? __switch_to_asm (arch/x86/entry/entry_64.S:373)
> [ 2330.323883] ? __x64_sys_copy_file_range (fs/read_write.c:414)
> [ 2330.323890] ? file_has_perm (security/selinux/hooks.c:1919)
> [ 2330.323898] vfs_read (fs/read_write.c:453)
> [ 2330.323903] ksys_read (fs/read_write.c:579)
> [ 2330.323908] ? kernel_write (fs/read_write.c:572)
> [ 2330.323911] ? fput (arch/x86/include/asm/atomic64_64.h:118
> include/asm-generic/atomic-instrumented.h:269
> include/asm-generic/atomic-long.h:218 fs/file_table.c:331)
> [ 2330.323918] do_syscall_64 (arch/x86/entry/common.c:290)
> [ 2330.323921] ? prepare_exit_to_usermode (arch/x86/entry/common.c:197)
> [ 2330.323926] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:247)
> [ 2330.323930] RIP: 0033:0x7f337540b20d
> [ 2330.323934] Code: c1 20 00 00 75 10 b8 00 00 00 00 0f 05 48 3d 01
> f0 ff ff 73 31 c3 48 83 ec 08 e8 4e fc ff ff 48 89 04 24 b8 00 00 00
> 00 0f 05 <48> 8b 3c 24 48 89 c2 e8 97 fc ff ff 48 89 d0 48 83 c4 08 48
> 3d 01
> All code
> ========
>    0: c1 20 00              shll   $0x0,(%rax)
>    3: 00 75 10              add    %dh,0x10(%rbp)
>    6: b8 00 00 00 00        mov    $0x0,%eax
>    b: 0f 05                syscall
>    d: 48 3d 01 f0 ff ff    cmp    $0xfffffffffffff001,%rax
>   13: 73 31                jae    0x46
>   15: c3                    retq
>   16: 48 83 ec 08          sub    $0x8,%rsp
>   1a: e8 4e fc ff ff        callq  0xfffffffffffffc6d
>   1f: 48 89 04 24          mov    %rax,(%rsp)
>   23: b8 00 00 00 00        mov    $0x0,%eax
>   28: 0f 05                syscall
>   2a:* 48 8b 3c 24          mov    (%rsp),%rdi <-- trapping instruction
>   2e: 48 89 c2              mov    %rax,%rdx
>   31: e8 97 fc ff ff        callq  0xfffffffffffffccd
>   36: 48 89 d0              mov    %rdx,%rax
>   39: 48 83 c4 08          add    $0x8,%rsp
>   3d: 48                    rex.W
>   3e: 3d                    .byte 0x3d
>   3f: 01                    .byte 0x1
>
> Code starting with the faulting instruction
> ===========================================
>    0: 48 8b 3c 24          mov    (%rsp),%rdi
>    4: 48 89 c2              mov    %rax,%rdx
>    7: e8 97 fc ff ff        callq  0xfffffffffffffca3
>    c: 48 89 d0              mov    %rdx,%rax
>    f: 48 83 c4 08          add    $0x8,%rsp
>   13: 48                    rex.W
>   14: 3d                    .byte 0x3d
>   15: 01                    .byte 0x1
> [ 2330.323936] RSP: 002b:00007ffe077a9510 EFLAGS: 00000293 ORIG_RAX:
> 0000000000000000
> [ 2330.323940] RAX: ffffffffffffffda RBX: 00005640dee9dcb8 RCX: 00007f337540b20d
> [ 2330.323942] RDX: 0000000000004018 RSI: 00005640dee9dcb8 RDI: 0000000000000185
> [ 2330.323945] RBP: 00007ffe077a9550 R08: 00005640dd627720 R09: 0000000000004000
> [ 2330.323947] R10: 0000000000000300 R11: 0000000000000293 R12: 0000000000004018
> [ 2330.323949] R13: 00005640dddcb4c0 R14: 0000000000004000 R15: 00007f32435090e0
> [ 2330.323954] Modules linked in: tun xt_connlimit nf_conncount xt_bpf
> xt_hashlimit iptable_security cls_flow cls_u32 sch_htb sch_fq md_mod
> dm_crypt algif_skcipher af_alg dm_mod dax ip6table_nat nf_nat_ipv6
> ip6table_mangle ip6table_security ip6table_raw ip6table_filter
> ip6_tables xt_nat iptable_nat nf_nat_ipv4 nf_nat xt_TPROXY
> nf_tproxy_ipv6 nf_tproxy_ipv4 xt_connmark iptable_mangle xt_owner
> xt_CT xt_socket nf_socket_ipv4 nf_socket_ipv6 iptable_raw
> nfnetlink_log xt_NFLOG xt_tcpudp xt_comment xt_conntrack nf_conntrack
> nf_defrag_ipv6 nf_defrag_ipv4 xt_mark xt_multiport xt_set
> iptable_filter bpfilter ip_set_hash_netport ip_set_hash_net
> ip_set_hash_ip ip_set nfnetlink 8021q garp mrp stp llc sb_edac
> x86_pkg_temp_thermal kvm_intel kvm ipmi_ssif irqbypass crc32_pclmul
> crc32c_intel sfc(O) pcbc aesni_intel aes_x86_64
> [ 2330.324038]  crypto_simd igb cryptd i2c_algo_bit glue_helper mdio
> dca ipmi_si ipmi_devintf ipmi_msghandler efivarfs ip_tables x_tables
> [ 2330.324111] ---[ end trace d791391e77eef583 ]---
> [ 2330.324118] RIP: 0010:rb_replace_node (??:?)
> [ 2330.324122] Code: 55 48 89 f5 53 48 89 fb 48 83 ec 08 80 3c 01 00
> 0f 85 64 02 00 00 48 b9 00 00 00 00 00 fc ff df 48 89 e8 4c 8b 23 48
> c1 e8 03 <0f> b6 34 08 48 8d 45 17 48 89 c7 83 e0 07 48 c1 ef 03 49 83
> e4 fc
> All code
> ========
>    0: 55                    push   %rbp
>    1: 48 89 f5              mov    %rsi,%rbp
>    4: 53                    push   %rbx
>    5: 48 89 fb              mov    %rdi,%rbx
>    8: 48 83 ec 08          sub    $0x8,%rsp
>    c: 80 3c 01 00          cmpb   $0x0,(%rcx,%rax,1)
>   10: 0f 85 64 02 00 00    jne    0x27a
>   16: 48 b9 00 00 00 00 00 movabs $0xdffffc0000000000,%rcx
>   1d: fc ff df
>   20: 48 89 e8              mov    %rbp,%rax
>   23: 4c 8b 23              mov    (%rbx),%r12
>   26: 48 c1 e8 03          shr    $0x3,%rax
>   2a:* 0f b6 34 08          movzbl (%rax,%rcx,1),%esi <-- trapping instruction
>   2e: 48 8d 45 17          lea    0x17(%rbp),%rax
>   32: 48 89 c7              mov    %rax,%rdi
>   35: 83 e0 07              and    $0x7,%eax
>   38: 48 c1 ef 03          shr    $0x3,%rdi
>   3c: 49 83 e4 fc          and    $0xfffffffffffffffc,%r12
>
> Code starting with the faulting instruction
> ===========================================
>    0: 0f b6 34 08          movzbl (%rax,%rcx,1),%esi
>    4: 48 8d 45 17          lea    0x17(%rbp),%rax
>    8: 48 89 c7              mov    %rax,%rdi
>    b: 83 e0 07              and    $0x7,%eax
>    e: 48 c1 ef 03          shr    $0x3,%rdi
>   12: 49 83 e4 fc          and    $0xfffffffffffffffc,%r12
> [ 2330.324129] RSP: 0018:ffff888c3f687d88 EFLAGS: 00010206
> [ 2330.324133] RAX: 0000000000000003 RBX: ffff888c081fc000 RCX: dffffc0000000000
> [ 2330.324135] RDX: ffff888c0a5c38e0 RSI: 000000000000001a RDI: ffff888c081fc000
> [ 2330.324137] RBP: 000000000000001a R08: fffffbfff4d88d09 R09: fffffbfff4d88d08
> [ 2330.324140] R10: fffffbfff4d88d08 R11: ffffffffa6c46847 R12: 0000000030747865
> [ 2330.324142] R13: ffff888c0a5c3910 R14: ffff888c0a5c3870 R15: ffff888c0a5c38e0
> [ 2330.324151] FS:  00007f3375a30780(0000) GS:ffff888c3f680000(0000)
> knlGS:0000000000000000
> [ 2330.324154] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 2330.324156] CR2: 00007f19d3da5000 CR3: 0000000bee77a001 CR4: 00000000003606e0
> [ 2330.324158] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 2330.324161] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 2330.324163] Kernel panic - not syncing: Fatal exception in interrupt
> [ 2330.324214] Kernel Offset: 0x23000000 from 0xffffffff81000000
> (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
>
> This commit from 4.19.14 seems relevant:
>
> * https://github.com/torvalds/linux/commit/d5f9565c8d5ad3cf94982223cfcef1169b0bb60f
>
> As a reminder, we upgraded from 4.19.13 and started seeing crashes.


Right, @err needs to be set properly.

Probably something like:

diff --git a/net/ipv4/ip_fragment.c b/net/ipv4/ip_fragment.c
index f8bbd693c19c247e41839c2d0b5318ca51b23ee8..dbd14530510a934230096b293c4042dd65c672c5 100644
--- a/net/ipv4/ip_fragment.c
+++ b/net/ipv4/ip_fragment.c
@@ -443,6 +443,7 @@ static int ip_frag_queue(struct ipq *qp, struct sk_buff *skb)
                 * but not the last (covered above).
                 */
                rbn = &qp->q.rb_fragments.rb_node;
+               err = -EINVAL;
                do {
                        parent = *rbn;
                        skb1 = rb_to_skb(parent);
@@ -501,7 +502,6 @@ static int ip_frag_queue(struct ipq *qp, struct sk_buff *skb)

 discard_qp:
        inet_frag_kill(&qp->q);
-       err = -EINVAL;
        __IP_INC_STATS(net, IPSTATS_MIB_REASM_OVERLAPS);
 err:
        kfree_skb(skb);


* Re: BUG: KASAN: double-free or invalid-free in ip_defrag after upgrade from 4.19.13
From: Eric Dumazet @ 2019-01-30 22:57 UTC (permalink / raw)
  To: Ivan Babrou
  Cc: Linux Kernel Network Developers, mkubecek, David S. Miller,
	Ignat Korchagin, Shawn Bohrer, Jakub Sitnicki
In-Reply-To: <CANn89i+4OQihFo8+ONU2tqEexpvq+mLAYPD9kfxQ9U2zzuRuJQ@mail.gmail.com>

On Wed, Jan 30, 2019 at 2:50 PM Eric Dumazet <edumazet@google.com> wrote:
>
> Right, @err needs to be set properly.
>
> Probably something like :
>
> diff --git a/net/ipv4/ip_fragment.c b/net/ipv4/ip_fragment.c
> index f8bbd693c19c247e41839c2d0b5318ca51b23ee8..dbd14530510a934230096b293c4042dd65c672c5 100644
> --- a/net/ipv4/ip_fragment.c
> +++ b/net/ipv4/ip_fragment.c
> @@ -443,6 +443,7 @@ static int ip_frag_queue(struct ipq *qp, struct sk_buff *skb)
>                  * but not the last (covered above).
>                  */
>                 rbn = &qp->q.rb_fragments.rb_node;
> +               err = -EINVAL;
>                 do {
>                         parent = *rbn;
>                         skb1 = rb_to_skb(parent);
> @@ -501,7 +502,6 @@ static int ip_frag_queue(struct ipq *qp, struct sk_buff *skb)
>
>  discard_qp:
>         inet_frag_kill(&qp->q);
> -       err = -EINVAL;
>         __IP_INC_STATS(net, IPSTATS_MIB_REASM_OVERLAPS);
>  err:
>         kfree_skb(skb);


Or even better :/

diff --git a/net/ipv4/ip_fragment.c b/net/ipv4/ip_fragment.c
index f8bbd693c19c247e41839c2d0b5318ca51b23ee8..d95b32af4a0e3f552405c9e61cc372729834160c 100644
--- a/net/ipv4/ip_fragment.c
+++ b/net/ipv4/ip_fragment.c
@@ -425,6 +425,7 @@ static int ip_frag_queue(struct ipq *qp, struct sk_buff *skb)
         * fragment.
         */

+       err = -EINVAL;
        /* Find out where to put this fragment.  */
        prev_tail = qp->q.fragments_tail;
        if (!prev_tail)
@@ -501,7 +502,6 @@ static int ip_frag_queue(struct ipq *qp, struct sk_buff *skb)

 discard_qp:
        inet_frag_kill(&qp->q);
-       err = -EINVAL;
        __IP_INC_STATS(net, IPSTATS_MIB_REASM_OVERLAPS);
 err:
        kfree_skb(skb);
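
For readers following the thread, the failure mode can be sketched as a toy
userspace model. This is not the kernel code: every toy_* name is
hypothetical, and the ownership contract (a return value of 0 means the
caller still owns the buffer) is an assumption made for illustration only.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for an sk_buff; it only counts how often it was freed. */
struct toy_skb {
	int free_count;
};

/* Hypothetical stand-in for kfree_skb(). */
static void toy_free(struct toy_skb *skb)
{
	skb->free_count++;
}

/*
 * Toy model of a queueing function with a shared error label.
 * "fixed" selects whether err is set before the paths that jump to err.
 */
static int toy_frag_queue(struct toy_skb *skb, bool duplicate, bool fixed)
{
	int err;

	/* Models an earlier helper call that succeeded and left err == 0. */
	err = 0;

	if (fixed)
		err = -EINVAL;	/* the assignment the proposed patch adds */

	if (duplicate)
		goto err;	/* relies on err already holding an error code */

	return -EINPROGRESS;	/* fragment queued; the queue owns it now */

err:
	toy_free(skb);
	return err;
}

/* Toy caller: treats 0 as "buffer is still mine" and releases it itself. */
static int toy_defrag(struct toy_skb *skb, bool duplicate, bool fixed)
{
	int ret = toy_frag_queue(skb, duplicate, fixed);

	if (ret == 0)
		toy_free(skb);
	return ret;
}
```

With fixed == false, a duplicate fragment is freed at the err label yet 0 is
returned, so the toy caller frees it a second time. With fixed == true the
caller sees -EINVAL and leaves the buffer alone.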


* Re: [PATCH net-next v2 10/12] net: dsa: Wire up multicast IGMP snooping attribute notification
From: Florian Fainelli @ 2019-01-30 22:32 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: netdev, vivien.didelot, davem, idosch, jiri, ilias.apalodimas,
	ivan.khoronzhuk, roopa, nikolay
In-Reply-To: <20190130160639.GE15050@lunn.ch>

On 1/30/19 8:06 AM, Andrew Lunn wrote:
> On Tue, Jan 29, 2019 at 04:55:46PM -0800, Florian Fainelli wrote:
>> The bridge can at runtime be configured with or without IGMP snooping
>> enabled but we were not processing the switchdev attribute that notifies
>> about that toggle, do this now.
>>
>> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
>> ---
>>  include/net/dsa.h  |  2 ++
>>  net/dsa/dsa_priv.h | 11 +++++++++++
>>  net/dsa/port.c     | 13 +++++++++++++
>>  net/dsa/slave.c    |  4 ++++
>>  net/dsa/switch.c   | 28 ++++++++++++++++++++++++++++
>>  5 files changed, 58 insertions(+)
>>
>> diff --git a/include/net/dsa.h b/include/net/dsa.h
>> index 7f2a668ef2cc..2ee1ede7df5c 100644
>> --- a/include/net/dsa.h
>> +++ b/include/net/dsa.h
>> @@ -425,6 +425,8 @@ struct dsa_switch_ops {
>>  	/*
>>  	 * Multicast database
>>  	 */
>> +	int	(*port_multicast_toggle)(struct dsa_switch *ds, int port,
>> +					 bool mc_disabled);
> 
> 
> Hi Florin
> 
> Looks like there is an extra tab in there?

In the first or second line?
-- 
Florian

* Re: [PATCH net-next v2 07/12] net: dsa: Add ability to program multicast filter for CPU port
From: Florian Fainelli @ 2019-01-30 22:55 UTC (permalink / raw)
  To: Vivien Didelot
  Cc: netdev, andrew, davem, idosch, jiri, ilias.apalodimas,
	ivan.khoronzhuk, roopa, nikolay
In-Reply-To: <20190130172859.GB3207@t480s.localdomain>

On 1/30/19 2:28 PM, Vivien Didelot wrote:
> Hi Florian,
> 
> On Tue, 29 Jan 2019 16:55:43 -0800, Florian Fainelli <f.fainelli@gmail.com> wrote:
> 
>> +static int dsa_slave_sync_unsync_mdb_addr(struct net_device *dev,
>> +					  const unsigned char *addr, bool add)
>> +{
>> +	struct switchdev_obj_port_mdb mdb = {
>> +		.obj = {
>> +			.id = SWITCHDEV_OBJ_ID_HOST_MDB,
>> +			.flags = SWITCHDEV_F_DEFER,
>> +		},
>> +		.vid = 0,
>> +	};
>> +	int ret = -EOPNOTSUPP;
> 
> Assignment unneeded here.
> 
>> +
>> +	ether_addr_copy(mdb.addr, addr);
>> +	if (add)
>> +		ret = switchdev_port_obj_add(dev, &mdb.obj, NULL);
>> +	else
>> +		ret = switchdev_port_obj_del(dev, &mdb.obj);
>> +
>> +	return ret;
>> +}
>> +
>> +static int dsa_slave_sync_mdb_addr(struct net_device *dev,
>> +				   const unsigned char *addr)
>> +{
>> +	return dsa_slave_sync_unsync_mdb_addr(dev, addr, true);
>> +}
>> +
>> +static int dsa_slave_unsync_mdb_addr(struct net_device *dev,
>> +				     const unsigned char *addr)
>> +{
>> +	return dsa_slave_sync_unsync_mdb_addr(dev, addr, false);
>> +}
> 
> This wrapper isn't necessary IMO. I'd go with something like:
> 
> static int dsa_slave_sync(struct net_device *dev, const unsigned char *addr)
> {
> 	struct switchdev_obj_port_mdb mdb = {
> 		.obj.id = SWITCHDEV_OBJ_ID_HOST_MDB,
> 		.obj.flags = SWITCHDEV_F_DEFER,
> 	};
> 
> 	ether_addr_copy(mdb.addr, addr);
> 
> 	return switchdev_port_obj_add(dev, &mdb.obj, NULL);
> }
> 
> static int dsa_slave_unsync(struct net_device *dev, const unsigned char *addr)
> {
> 	struct switchdev_obj_port_mdb mdb = {
> 		.obj.id = SWITCHDEV_OBJ_ID_HOST_MDB,
> 		.obj.flags = SWITCHDEV_F_DEFER,
> 	};
> 
> 	ether_addr_copy(mdb.addr, addr);
> 
> 	return switchdev_port_obj_del(dev, &mdb.obj);
> }
> 
> We may eventually wrap this cryptic netdevery in:
> 
> static int dsa_slave_mc_sync(struct net_device *dev)
> {
> 	return __hw_addr_sync_dev(&dev->mc, dev, dsa_slave_sync, dsa_slave_unsync);
> }
> 
> static void dsa_slave_mc_unsync(struct net_device *dev)
> {
> 	__hw_addr_unsync_dev(&dev->mc, dev, dsa_slave_sync);
> }
> 
>> +
>>  static int dsa_slave_open(struct net_device *dev)
>>  {
>>  	struct net_device *master = dsa_slave_to_master(dev);
>> @@ -126,6 +159,8 @@ static int dsa_slave_close(struct net_device *dev)
>>  
>>  	dev_mc_unsync(master, dev);
>>  	dev_uc_unsync(master, dev);
>> +	__hw_addr_unsync_dev(&dev->mc, dev, dsa_slave_unsync_mdb_addr);
>> +
>>  	if (dev->flags & IFF_ALLMULTI)
>>  		dev_set_allmulti(master, -1);
>>  	if (dev->flags & IFF_PROMISC)
>> @@ -150,7 +185,17 @@ static void dsa_slave_change_rx_flags(struct net_device *dev, int change)
>>  static void dsa_slave_set_rx_mode(struct net_device *dev)
>>  {
>>  	struct net_device *master = dsa_slave_to_master(dev);
>> +	struct dsa_port *dp = dsa_slave_to_port(dev);
>>  
>> +	/* If the port is bridged, the bridge takes care of sending
>> +	 * SWITCHDEV_OBJ_ID_HOST_MDB to program the host's MC filter
>> +	 */
>> +	if (netdev_mc_empty(dev) || dp->bridge_dev)
>> +		goto out;
>> +
>> +	__hw_addr_sync_dev(&dev->mc, dev, dsa_slave_sync_mdb_addr,
>> +			   dsa_slave_unsync_mdb_addr);
> 
> And check the returned error code.

All good points, I have now incorporated your suggestions, thanks!
-- 
Florian
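
For readers following the "check the returned error code" remark, here is a
much simplified userspace model of a sync pass that propagates the first
callback failure. It is only a sketch: sync_addrs(), struct addr_entry and
demo_sync() are hypothetical names, and this is not how
__hw_addr_sync_dev() is implemented in the kernel.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct addr_entry {
	unsigned char addr[6];
	bool synced;
};

typedef int (*sync_fn)(const unsigned char *addr);

/*
 * Call sync() for every entry that has not been programmed yet and
 * stop at the first failure, handing the error back to the caller.
 */
static int sync_addrs(struct addr_entry *list, size_t n, sync_fn sync)
{
	size_t i;
	int err;

	for (i = 0; i < n; i++) {
		if (list[i].synced)
			continue;
		err = sync(list[i].addr);
		if (err)
			return err;	/* propagate instead of ignoring */
		list[i].synced = true;
	}
	return 0;
}

/* Example callback: pretend the hardware rejects addresses starting with 0xff. */
static int demo_sync(const unsigned char *addr)
{
	return addr[0] == 0xff ? -EOPNOTSUPP : 0;
}
```

A caller in the position of set_rx_mode() would then at least be able to
log or react to the non-zero return value.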

* [PATCH net-next v2 2/5] net: tls: Refactor tls aad space size calculation
From: Dave Watson @ 2019-01-30 21:58 UTC (permalink / raw)
  To: netdev@vger.kernel.org, Dave Miller
  Cc: Vakul Garg, Boris Pismenny, Aviad Yehezkel, John Fastabend,
	Daniel Borkmann

TLS 1.3 has a different AAD size; use a variable in the code to
make TLS 1.3 support easy.

Signed-off-by: Dave Watson <davejwatson@fb.com>
---
 include/net/tls.h |  1 +
 net/tls/tls_sw.c  | 17 +++++++++--------
 2 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/include/net/tls.h b/include/net/tls.h
index da616db48413..754b130672f0 100644
--- a/include/net/tls.h
+++ b/include/net/tls.h
@@ -202,6 +202,7 @@ struct cipher_context {
 	char *iv;
 	u16 rec_seq_size;
 	char *rec_seq;
+	u16 aad_size;
 };
 
 union tls_crypto_context {
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 9326c06c2ffe..7b6386f4c685 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -185,7 +185,7 @@ static int tls_do_decryption(struct sock *sk,
 	int ret;
 
 	aead_request_set_tfm(aead_req, ctx->aead_recv);
-	aead_request_set_ad(aead_req, TLS_AAD_SPACE_SIZE);
+	aead_request_set_ad(aead_req, tls_ctx->rx.aad_size);
 	aead_request_set_crypt(aead_req, sgin, sgout,
 			       data_len + tls_ctx->rx.tag_size,
 			       (u8 *)iv_recv);
@@ -289,12 +289,12 @@ static struct tls_rec *tls_get_rec(struct sock *sk)
 
 	sg_init_table(rec->sg_aead_in, 2);
 	sg_set_buf(&rec->sg_aead_in[0], rec->aad_space,
-		   sizeof(rec->aad_space));
+		   tls_ctx->tx.aad_size);
 	sg_unmark_end(&rec->sg_aead_in[1]);
 
 	sg_init_table(rec->sg_aead_out, 2);
 	sg_set_buf(&rec->sg_aead_out[0], rec->aad_space,
-		   sizeof(rec->aad_space));
+		   tls_ctx->tx.aad_size);
 	sg_unmark_end(&rec->sg_aead_out[1]);
 
 	return rec;
@@ -455,7 +455,7 @@ static int tls_do_encryption(struct sock *sk,
 	msg_en->sg.curr = start;
 
 	aead_request_set_tfm(aead_req, ctx->aead_send);
-	aead_request_set_ad(aead_req, TLS_AAD_SPACE_SIZE);
+	aead_request_set_ad(aead_req, tls_ctx->tx.aad_size);
 	aead_request_set_crypt(aead_req, rec->sg_aead_in,
 			       rec->sg_aead_out,
 			       data_len, rec->iv_data);
@@ -1317,7 +1317,7 @@ static int decrypt_internal(struct sock *sk, struct sk_buff *skb,
 
 	aead_size = sizeof(*aead_req) + crypto_aead_reqsize(ctx->aead_recv);
 	mem_size = aead_size + (nsg * sizeof(struct scatterlist));
-	mem_size = mem_size + TLS_AAD_SPACE_SIZE;
+	mem_size = mem_size + tls_ctx->rx.aad_size;
 	mem_size = mem_size + crypto_aead_ivsize(ctx->aead_recv);
 
 	/* Allocate a single block of memory which contains
@@ -1333,7 +1333,7 @@ static int decrypt_internal(struct sock *sk, struct sk_buff *skb,
 	sgin = (struct scatterlist *)(mem + aead_size);
 	sgout = sgin + n_sgin;
 	aad = (u8 *)(sgout + n_sgout);
-	iv = aad + TLS_AAD_SPACE_SIZE;
+	iv = aad + tls_ctx->rx.aad_size;
 
 	/* Prepare IV */
 	err = skb_copy_bits(skb, rxm->offset + TLS_HEADER_SIZE,
@@ -1352,7 +1352,7 @@ static int decrypt_internal(struct sock *sk, struct sk_buff *skb,
 
 	/* Prepare sgin */
 	sg_init_table(sgin, n_sgin);
-	sg_set_buf(&sgin[0], aad, TLS_AAD_SPACE_SIZE);
+	sg_set_buf(&sgin[0], aad, tls_ctx->rx.aad_size);
 	err = skb_to_sgvec(skb, &sgin[1],
 			   rxm->offset + tls_ctx->rx.prepend_size,
 			   rxm->full_len - tls_ctx->rx.prepend_size);
@@ -1364,7 +1364,7 @@ static int decrypt_internal(struct sock *sk, struct sk_buff *skb,
 	if (n_sgout) {
 		if (out_iov) {
 			sg_init_table(sgout, n_sgout);
-			sg_set_buf(&sgout[0], aad, TLS_AAD_SPACE_SIZE);
+			sg_set_buf(&sgout[0], aad, tls_ctx->rx.aad_size);
 
 			*chunk = 0;
 			err = tls_setup_from_iter(sk, out_iov, data_len,
@@ -2100,6 +2100,7 @@ int tls_set_sw_offload(struct sock *sk, struct tls_context *ctx, int tx)
 		goto free_priv;
 	}
 
+	cctx->aad_size = TLS_AAD_SPACE_SIZE;
 	cctx->prepend_size = TLS_HEADER_SIZE + nonce_size;
 	cctx->tag_size = tag_size;
 	cctx->overhead_size = cctx->prepend_size + cctx->tag_size;
-- 
2.17.1
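
The effect of the refactor can be sketched in userspace as follows. The
struct and helper names (cipher_ctx, ctx_init, iv_offset) are hypothetical;
the TLS 1.3 value of 5 bytes is taken from patch 4/5 of this series, which
sets aad_size to TLS_HEADER_SIZE, while this patch only ever stores
TLS_AAD_SPACE_SIZE.

```c
#include <stddef.h>
#include <stdint.h>

#define TLS_HEADER_SIZE		5	/* type + version + length */
#define TLS_AAD_SPACE_SIZE	13	/* 8-byte sequence number + header */

/* Hypothetical cut-down context holding only the new field. */
struct cipher_ctx {
	uint16_t aad_size;
};

/* Pick the AAD size once, at setup time. */
static void ctx_init(struct cipher_ctx *c, int is_tls13)
{
	c->aad_size = is_tls13 ? TLS_HEADER_SIZE : TLS_AAD_SPACE_SIZE;
}

/* Offset of the IV inside a scratch buffer laid out as [AAD | IV]. */
static size_t iv_offset(const struct cipher_ctx *c)
{
	return c->aad_size;
}
```

Every place that used the compile-time constant for buffer layout now reads
the per-context value, so the layout follows the negotiated version.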


* Re: net: phylink: dsa: mv88e6xxx: flaky link detection on switch ports with internal PHYs
From: John David Anglin @ 2019-01-30 22:24 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: Russell King, Vivien Didelot, Florian Fainelli, netdev
In-Reply-To: <20190130172818.GJ21904@lunn.ch>

On 2019-01-30 12:28 p.m., Andrew Lunn wrote:
> You need active low interrupts. Without it, i think you are always
> going to have race conditions which will cause interrupts to get
> stuck/lost.
I don't know if this is a hardware limitation or not, but currently the
armada 37xx doesn't support level interrupts:

[    4.013280] genirq: Setting trigger mode 8 for irq 44 failed (armada_37xx_irq_set_type+0x0/0x158)
[    4.014075] mv88e6085: probe of d0032004.mdio-mii:01 failed with error -22

The function armada_37xx_irq_set_type() only supports edge interrupts.
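
To decode the log: trigger mode 8 is IRQ_TYPE_LEVEL_LOW and -22 is -EINVAL.
A hypothetical set_type handler for an edge-only controller would behave
roughly like the sketch below. edge_only_set_type() is an illustration, not
the armada-37xx driver code, and the exact set of edge types the real
driver accepts is not asserted here.

```c
#include <errno.h>

/* Trigger type values as defined in the kernel's include/linux/irq.h. */
#define IRQ_TYPE_EDGE_RISING	0x1
#define IRQ_TYPE_EDGE_FALLING	0x2
#define IRQ_TYPE_LEVEL_HIGH	0x4
#define IRQ_TYPE_LEVEL_LOW	0x8

/* Hypothetical handler: accept edge triggers, reject everything else. */
static int edge_only_set_type(unsigned int type)
{
	switch (type) {
	case IRQ_TYPE_EDGE_RISING:
	case IRQ_TYPE_EDGE_FALLING:
		return 0;
	default:
		return -EINVAL;	/* what the probe failure above reports */
	}
}
```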

On the plus side, DTC no longer objects to level interrupts on southbridge.

Dave

-- 
John David Anglin  dave.anglin@bell.net



* [PATCH net-next v2 4/5] net: tls: Add tls 1.3 support
From: Dave Watson @ 2019-01-30 21:58 UTC (permalink / raw)
  To: netdev@vger.kernel.org, Dave Miller
  Cc: Vakul Garg, Boris Pismenny, Aviad Yehezkel, John Fastabend,
	Daniel Borkmann

TLS 1.3 has minor changes from TLS 1.2 at the record layer.

* The header now hardcodes the same version and application content
  type.
* The real content type is appended after the data, before encryption (or
  after decryption).
* The IV is xored with the sequence number, instead of concatenating four
  bytes of IV with the explicit IV.
* Zero-padding:  No explicit length is given, we search backwards from the
  end of the decrypted data for the first non-zero byte, which is the
  content type.  Currently recv supports reading zero-padding, but there
  is no way for send to add zero padding.
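
Two of the points above, the per-record nonce and the zero-padding scan,
can be sketched in userspace as follows. This is a simplified model and not
the patch code: tls13_nonce() and tls13_strip_padding() are hypothetical
names, and the padding helper assumes it is handed the decrypted plaintext
with the authentication tag already removed.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SALT_SIZE	4
#define IV_SIZE		8
#define NONCE_SIZE	(SALT_SIZE + IV_SIZE)

/*
 * Per-record nonce: copy the 12-byte static IV, then XOR the 8-byte
 * record sequence number into its last 8 bytes.
 */
static void tls13_nonce(uint8_t nonce[NONCE_SIZE],
			const uint8_t static_iv[NONCE_SIZE],
			const uint8_t seq[IV_SIZE])
{
	size_t i;

	memcpy(nonce, static_iv, NONCE_SIZE);
	for (i = 0; i < IV_SIZE; i++)
		nonce[SALT_SIZE + i] ^= seq[i];
}

/*
 * Walk backwards over trailing zero bytes. The first non-zero byte is
 * the real content type; everything before it is payload.
 * Returns the payload length, or -1 if the buffer is all zeros.
 */
static long tls13_strip_padding(const uint8_t *buf, size_t len,
				uint8_t *content_type)
{
	while (len > 0 && buf[len - 1] == 0)
		len--;
	if (len == 0)
		return -1;
	*content_type = buf[len - 1];
	return (long)(len - 1);
}
```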

Signed-off-by: Dave Watson <davejwatson@fb.com>
---
 include/net/tls.h             |  66 ++++++++++++++-----
 include/uapi/linux/tls.h      |   4 ++
 net/tls/tls_device.c          |   5 +-
 net/tls/tls_device_fallback.c |   3 +-
 net/tls/tls_main.c            |   3 +-
 net/tls/tls_sw.c              | 116 +++++++++++++++++++++++++++-------
 6 files changed, 154 insertions(+), 43 deletions(-)

diff --git a/include/net/tls.h b/include/net/tls.h
index 754b130672f0..004bf01ce868 100644
--- a/include/net/tls.h
+++ b/include/net/tls.h
@@ -119,6 +119,9 @@ struct tls_rec {
 	/* AAD | msg_encrypted.sg.data (data contains overhead for hdr & iv & tag) */
 	struct scatterlist sg_aead_out[2];
 
+	char content_type;
+	struct scatterlist sg_content_type;
+
 	char aad_space[TLS_AAD_SPACE_SIZE];
 	u8 iv_data[TLS_CIPHER_AES_GCM_128_IV_SIZE +
 		   TLS_CIPHER_AES_GCM_128_SALT_SIZE];
@@ -203,6 +206,7 @@ struct cipher_context {
 	u16 rec_seq_size;
 	char *rec_seq;
 	u16 aad_size;
+	u16 tail_size;
 };
 
 union tls_crypto_context {
@@ -397,49 +401,77 @@ static inline bool tls_bigint_increment(unsigned char *seq, int len)
 }
 
 static inline void tls_advance_record_sn(struct sock *sk,
-					 struct cipher_context *ctx)
+					 struct cipher_context *ctx,
+					 int version)
 {
 	if (tls_bigint_increment(ctx->rec_seq, ctx->rec_seq_size))
 		tls_err_abort(sk, EBADMSG);
-	tls_bigint_increment(ctx->iv + TLS_CIPHER_AES_GCM_128_SALT_SIZE,
-			     ctx->iv_size);
+
+	if (version != TLS_1_3_VERSION) {
+		tls_bigint_increment(ctx->iv + TLS_CIPHER_AES_GCM_128_SALT_SIZE,
+				     ctx->iv_size);
+	}
 }
 
 static inline void tls_fill_prepend(struct tls_context *ctx,
 			     char *buf,
 			     size_t plaintext_len,
-			     unsigned char record_type)
+			     unsigned char record_type,
+			     int version)
 {
 	size_t pkt_len, iv_size = ctx->tx.iv_size;
 
-	pkt_len = plaintext_len + iv_size + ctx->tx.tag_size;
+	pkt_len = plaintext_len + ctx->tx.tag_size;
+	if (version != TLS_1_3_VERSION) {
+		pkt_len += iv_size;
+
+		memcpy(buf + TLS_NONCE_OFFSET,
+		       ctx->tx.iv + TLS_CIPHER_AES_GCM_128_SALT_SIZE, iv_size);
+	}
 
 	/* we cover nonce explicit here as well, so buf should be of
 	 * size KTLS_DTLS_HEADER_SIZE + KTLS_DTLS_NONCE_EXPLICIT_SIZE
 	 */
-	buf[0] = record_type;
-	buf[1] = TLS_VERSION_MINOR(ctx->crypto_send.info.version);
-	buf[2] = TLS_VERSION_MAJOR(ctx->crypto_send.info.version);
+	buf[0] = version == TLS_1_3_VERSION ?
+		   TLS_RECORD_TYPE_DATA : record_type;
+	/* Note that VERSION must be TLS_1_2 for both TLS1.2 and TLS1.3 */
+	buf[1] = TLS_1_2_VERSION_MINOR;
+	buf[2] = TLS_1_2_VERSION_MAJOR;
 	/* we can use IV for nonce explicit according to spec */
 	buf[3] = pkt_len >> 8;
 	buf[4] = pkt_len & 0xFF;
-	memcpy(buf + TLS_NONCE_OFFSET,
-	       ctx->tx.iv + TLS_CIPHER_AES_GCM_128_SALT_SIZE, iv_size);
 }
 
 static inline void tls_make_aad(char *buf,
 				size_t size,
 				char *record_sequence,
 				int record_sequence_size,
-				unsigned char record_type)
+				unsigned char record_type,
+				int version)
+{
+	if (version != TLS_1_3_VERSION) {
+		memcpy(buf, record_sequence, record_sequence_size);
+		buf += 8;
+	} else {
+		size += TLS_CIPHER_AES_GCM_128_TAG_SIZE;
+	}
+
+	buf[0] = version == TLS_1_3_VERSION ?
+		  TLS_RECORD_TYPE_DATA : record_type;
+	buf[1] = TLS_1_2_VERSION_MAJOR;
+	buf[2] = TLS_1_2_VERSION_MINOR;
+	buf[3] = size >> 8;
+	buf[4] = size & 0xFF;
+}
+
+static inline void xor_iv_with_seq(int version, char *iv, char *seq)
 {
-	memcpy(buf, record_sequence, record_sequence_size);
+	int i;
 
-	buf[8] = record_type;
-	buf[9] = TLS_1_2_VERSION_MAJOR;
-	buf[10] = TLS_1_2_VERSION_MINOR;
-	buf[11] = size >> 8;
-	buf[12] = size & 0xFF;
+	if (version == TLS_1_3_VERSION) {
+		for (i = 0; i < 8; i++)
+			iv[i + 4] ^= seq[i];
+	}
 }
 
 static inline struct tls_context *tls_get_ctx(const struct sock *sk)
diff --git a/include/uapi/linux/tls.h b/include/uapi/linux/tls.h
index 9affceaa3db4..401d6f01de6a 100644
--- a/include/uapi/linux/tls.h
+++ b/include/uapi/linux/tls.h
@@ -51,6 +51,10 @@
 #define TLS_1_2_VERSION_MINOR	0x3
 #define TLS_1_2_VERSION		TLS_VERSION_NUMBER(TLS_1_2)
 
+#define TLS_1_3_VERSION_MAJOR	0x3
+#define TLS_1_3_VERSION_MINOR	0x4
+#define TLS_1_3_VERSION		TLS_VERSION_NUMBER(TLS_1_3)
+
 /* Supported ciphers */
 #define TLS_CIPHER_AES_GCM_128				51
 #define TLS_CIPHER_AES_GCM_128_IV_SIZE			8
diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c
index d753e362d2d9..7ee9008b2187 100644
--- a/net/tls/tls_device.c
+++ b/net/tls/tls_device.c
@@ -257,7 +257,8 @@ static int tls_push_record(struct sock *sk,
 	tls_fill_prepend(ctx,
 			 skb_frag_address(frag),
 			 record->len - ctx->tx.prepend_size,
-			 record_type);
+			 record_type,
+			 ctx->crypto_send.info.version);
 
 	/* HW doesn't care about the data in the tag, because it fills it. */
 	dummy_tag_frag.page = skb_frag_page(frag);
@@ -270,7 +271,7 @@ static int tls_push_record(struct sock *sk,
 	spin_unlock_irq(&offload_ctx->lock);
 	offload_ctx->open_record = NULL;
 	set_bit(TLS_PENDING_CLOSED_RECORD, &ctx->flags);
-	tls_advance_record_sn(sk, &ctx->tx);
+	tls_advance_record_sn(sk, &ctx->tx, ctx->crypto_send.info.version);
 
 	for (i = 0; i < record->num_frags; i++) {
 		frag = &record->frags[i];
diff --git a/net/tls/tls_device_fallback.c b/net/tls/tls_device_fallback.c
index 450a6dbc5a88..54c3a758f2a7 100644
--- a/net/tls/tls_device_fallback.c
+++ b/net/tls/tls_device_fallback.c
@@ -73,7 +73,8 @@ static int tls_enc_record(struct aead_request *aead_req,
 	len -= TLS_CIPHER_AES_GCM_128_IV_SIZE;
 
 	tls_make_aad(aad, len - TLS_CIPHER_AES_GCM_128_TAG_SIZE,
-		     (char *)&rcd_sn, sizeof(rcd_sn), buf[0]);
+		(char *)&rcd_sn, sizeof(rcd_sn), buf[0],
+		TLS_1_2_VERSION);
 
 	memcpy(iv + TLS_CIPHER_AES_GCM_128_SALT_SIZE, buf + TLS_HEADER_SIZE,
 	       TLS_CIPHER_AES_GCM_128_IV_SIZE);
diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c
index 0f028cfdf835..d1c2fd9a3f63 100644
--- a/net/tls/tls_main.c
+++ b/net/tls/tls_main.c
@@ -463,7 +463,8 @@ static int do_tls_setsockopt_conf(struct sock *sk, char __user *optval,
 	}
 
 	/* check version */
-	if (crypto_info->version != TLS_1_2_VERSION) {
+	if (crypto_info->version != TLS_1_2_VERSION &&
+	    crypto_info->version != TLS_1_3_VERSION) {
 		rc = -ENOTSUPP;
 		goto err_crypto_info;
 	}
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 34f3523f668e..06d7ae97b929 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -120,6 +120,34 @@ static int skb_nsg(struct sk_buff *skb, int offset, int len)
         return __skb_nsg(skb, offset, len, 0);
 }
 
+static int padding_length(struct tls_sw_context_rx *ctx,
+			  struct tls_context *tls_ctx, struct sk_buff *skb)
+{
+	struct strp_msg *rxm = strp_msg(skb);
+	int sub = 0;
+
+	/* Determine zero-padding length */
+	if (tls_ctx->crypto_recv.info.version == TLS_1_3_VERSION) {
+		char content_type = 0;
+		int err;
+		int back = 17;
+
+		while (content_type == 0) {
+			if (back > rxm->full_len)
+				return -EBADMSG;
+			err = skb_copy_bits(skb,
+					    rxm->offset + rxm->full_len - back,
+					    &content_type, 1);
+			if (content_type)
+				break;
+			sub++;
+			back++;
+		}
+		ctx->control = content_type;
+	}
+	return sub;
+}
+
 static void tls_decrypt_done(struct crypto_async_request *req, int err)
 {
 	struct aead_request *aead_req = (struct aead_request *)req;
@@ -142,7 +170,7 @@ static void tls_decrypt_done(struct crypto_async_request *req, int err)
 		tls_err_abort(skb->sk, err);
 	} else {
 		struct strp_msg *rxm = strp_msg(skb);
-
+		rxm->full_len -= padding_length(ctx, tls_ctx, skb);
 		rxm->offset += tls_ctx->rx.prepend_size;
 		rxm->full_len -= tls_ctx->rx.overhead_size;
 	}
@@ -448,6 +476,8 @@ static int tls_do_encryption(struct sock *sk,
 	int rc;
 
 	memcpy(rec->iv_data, tls_ctx->tx.iv, sizeof(rec->iv_data));
+	xor_iv_with_seq(tls_ctx->crypto_send.info.version, rec->iv_data,
+			tls_ctx->tx.rec_seq);
 
 	sge->offset += tls_ctx->tx.prepend_size;
 	sge->length -= tls_ctx->tx.prepend_size;
@@ -483,7 +513,8 @@ static int tls_do_encryption(struct sock *sk,
 
 	/* Unhook the record from context if encryption is not failure */
 	ctx->open_rec = NULL;
-	tls_advance_record_sn(sk, &tls_ctx->tx);
+	tls_advance_record_sn(sk, &tls_ctx->tx,
+			      tls_ctx->crypto_send.info.version);
 	return rc;
 }
 
@@ -640,7 +671,17 @@ static int tls_push_record(struct sock *sk, int flags,
 
 	i = msg_pl->sg.end;
 	sk_msg_iter_var_prev(i);
-	sg_mark_end(sk_msg_elem(msg_pl, i));
+
+	rec->content_type = record_type;
+	if (tls_ctx->crypto_send.info.version == TLS_1_3_VERSION) {
+		/* Add content type to end of message.  No padding added */
+		sg_set_buf(&rec->sg_content_type, &rec->content_type, 1);
+		sg_mark_end(&rec->sg_content_type);
+		sg_chain(msg_pl->sg.data, msg_pl->sg.end + 1,
+			 &rec->sg_content_type);
+	} else {
+		sg_mark_end(sk_msg_elem(msg_pl, i));
+	}
 
 	i = msg_pl->sg.start;
 	sg_chain(rec->sg_aead_in, 2, rec->inplace_crypto ?
@@ -653,18 +694,22 @@ static int tls_push_record(struct sock *sk, int flags,
 	i = msg_en->sg.start;
 	sg_chain(rec->sg_aead_out, 2, &msg_en->sg.data[i]);
 
-	tls_make_aad(rec->aad_space, msg_pl->sg.size,
+	tls_make_aad(rec->aad_space, msg_pl->sg.size + tls_ctx->tx.tail_size,
 		     tls_ctx->tx.rec_seq, tls_ctx->tx.rec_seq_size,
-		     record_type);
+		     record_type,
+		     tls_ctx->crypto_send.info.version);
 
 	tls_fill_prepend(tls_ctx,
 			 page_address(sg_page(&msg_en->sg.data[i])) +
-			 msg_en->sg.data[i].offset, msg_pl->sg.size,
-			 record_type);
+			 msg_en->sg.data[i].offset,
+			 msg_pl->sg.size + tls_ctx->tx.tail_size,
+			 record_type,
+			 tls_ctx->crypto_send.info.version);
 
 	tls_ctx->pending_open_record_frags = false;
 
-	rc = tls_do_encryption(sk, tls_ctx, ctx, req, msg_pl->sg.size, i);
+	rc = tls_do_encryption(sk, tls_ctx, ctx, req,
+			       msg_pl->sg.size + tls_ctx->tx.tail_size, i);
 	if (rc < 0) {
 		if (rc != -EINPROGRESS) {
 			tls_err_abort(sk, EBADMSG);
@@ -1292,7 +1337,8 @@ static int decrypt_internal(struct sock *sk, struct sk_buff *skb,
 	u8 *aad, *iv, *mem = NULL;
 	struct scatterlist *sgin = NULL;
 	struct scatterlist *sgout = NULL;
-	const int data_len = rxm->full_len - tls_ctx->rx.overhead_size;
+	const int data_len = rxm->full_len - tls_ctx->rx.overhead_size +
+		tls_ctx->rx.tail_size;
 
 	if (*zc && (out_iov || out_sg)) {
 		if (out_iov)
@@ -1343,12 +1389,20 @@ static int decrypt_internal(struct sock *sk, struct sk_buff *skb,
 		kfree(mem);
 		return err;
 	}
-	memcpy(iv, tls_ctx->rx.iv, TLS_CIPHER_AES_GCM_128_SALT_SIZE);
+	if (tls_ctx->crypto_recv.info.version == TLS_1_3_VERSION)
+		memcpy(iv, tls_ctx->rx.iv, crypto_aead_ivsize(ctx->aead_recv));
+	else
+		memcpy(iv, tls_ctx->rx.iv, TLS_CIPHER_AES_GCM_128_SALT_SIZE);
+
+	xor_iv_with_seq(tls_ctx->crypto_recv.info.version, iv,
+			tls_ctx->rx.rec_seq);
 
 	/* Prepare AAD */
-	tls_make_aad(aad, rxm->full_len - tls_ctx->rx.overhead_size,
+	tls_make_aad(aad, rxm->full_len - tls_ctx->rx.overhead_size +
+		     tls_ctx->rx.tail_size,
 		     tls_ctx->rx.rec_seq, tls_ctx->rx.rec_seq_size,
-		     ctx->control);
+		     ctx->control,
+		     tls_ctx->crypto_recv.info.version);
 
 	/* Prepare sgin */
 	sg_init_table(sgin, n_sgin);
@@ -1405,6 +1459,7 @@ static int decrypt_skb_update(struct sock *sk, struct sk_buff *skb,
 {
 	struct tls_context *tls_ctx = tls_get_ctx(sk);
 	struct tls_sw_context_rx *ctx = tls_sw_ctx_rx(tls_ctx);
+	int version = tls_ctx->crypto_recv.info.version;
 	struct strp_msg *rxm = strp_msg(skb);
 	int err = 0;
 
@@ -1417,13 +1472,17 @@ static int decrypt_skb_update(struct sock *sk, struct sk_buff *skb,
 		err = decrypt_internal(sk, skb, dest, NULL, chunk, zc, async);
 		if (err < 0) {
 			if (err == -EINPROGRESS)
-				tls_advance_record_sn(sk, &tls_ctx->rx);
+				tls_advance_record_sn(sk, &tls_ctx->rx,
+						      version);
 
 			return err;
 		}
+
+		rxm->full_len -= padding_length(ctx, tls_ctx, skb);
+
 		rxm->offset += tls_ctx->rx.prepend_size;
 		rxm->full_len -= tls_ctx->rx.overhead_size;
-		tls_advance_record_sn(sk, &tls_ctx->rx);
+		tls_advance_record_sn(sk, &tls_ctx->rx, version);
 		ctx->decrypted = true;
 		ctx->saved_data_ready(sk);
 	} else {
@@ -1611,7 +1670,8 @@ int tls_sw_recvmsg(struct sock *sk,
 		to_decrypt = rxm->full_len - tls_ctx->rx.overhead_size;
 
 		if (to_decrypt <= len && !is_kvec && !is_peek &&
-		    ctx->control == TLS_RECORD_TYPE_DATA)
+		    ctx->control == TLS_RECORD_TYPE_DATA &&
+		    tls_ctx->crypto_recv.info.version != TLS_1_3_VERSION)
 			zc = true;
 
 		err = decrypt_skb_update(sk, skb, &msg->msg_iter,
@@ -1835,9 +1895,12 @@ static int tls_read_size(struct strparser *strp, struct sk_buff *skb)
 
 	data_len = ((header[4] & 0xFF) | (header[3] << 8));
 
-	cipher_overhead = tls_ctx->rx.tag_size + tls_ctx->rx.iv_size;
+	cipher_overhead = tls_ctx->rx.tag_size;
+	if (tls_ctx->crypto_recv.info.version != TLS_1_3_VERSION)
+		cipher_overhead += tls_ctx->rx.iv_size;
 
-	if (data_len > TLS_MAX_PAYLOAD_SIZE + cipher_overhead) {
+	if (data_len > TLS_MAX_PAYLOAD_SIZE + cipher_overhead +
+	    tls_ctx->rx.tail_size) {
 		ret = -EMSGSIZE;
 		goto read_failure;
 	}
@@ -1846,12 +1909,12 @@ static int tls_read_size(struct strparser *strp, struct sk_buff *skb)
 		goto read_failure;
 	}
 
-	if (header[1] != TLS_VERSION_MINOR(tls_ctx->crypto_recv.info.version) ||
-	    header[2] != TLS_VERSION_MAJOR(tls_ctx->crypto_recv.info.version)) {
+	/* Note that both TLS1.3 and TLS1.2 use TLS_1_2 version here */
+	if (header[1] != TLS_1_2_VERSION_MINOR ||
+	    header[2] != TLS_1_2_VERSION_MAJOR) {
 		ret = -EINVAL;
 		goto read_failure;
 	}
-
 #ifdef CONFIG_TLS_DEVICE
 	handle_device_resync(strp->sk, TCP_SKB_CB(skb)->seq + rxm->offset,
 			     *(u64*)tls_ctx->rx.rec_seq);
@@ -2100,10 +2163,19 @@ int tls_set_sw_offload(struct sock *sk, struct tls_context *ctx, int tx)
 		goto free_priv;
 	}
 
-	cctx->aad_size = TLS_AAD_SPACE_SIZE;
+	if (crypto_info->version == TLS_1_3_VERSION) {
+		nonce_size = 0;
+		cctx->aad_size = TLS_HEADER_SIZE;
+		cctx->tail_size = 1;
+	} else {
+		cctx->aad_size = TLS_AAD_SPACE_SIZE;
+		cctx->tail_size = 0;
+	}
+
 	cctx->prepend_size = TLS_HEADER_SIZE + nonce_size;
 	cctx->tag_size = tag_size;
-	cctx->overhead_size = cctx->prepend_size + cctx->tag_size;
+	cctx->overhead_size = cctx->prepend_size + cctx->tag_size +
+		cctx->tail_size;
 	cctx->iv_size = iv_size;
 	cctx->iv = kmalloc(iv_size + TLS_CIPHER_AES_GCM_128_SALT_SIZE,
 			   GFP_KERNEL);
-- 
2.17.1
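
As a reading aid, the AAD layout that tls_make_aad() produces after this
patch can be modelled in userspace roughly as below. make_aad() and its
is_tls13 flag are hypothetical simplifications; the constants mirror the
values used in the diff (record type 0x17 for application data, a 16-byte
AES-GCM-128 tag).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TLS_RECORD_TYPE_DATA	0x17
#define TLS_1_2_VERSION_MAJOR	0x3
#define TLS_1_2_VERSION_MINOR	0x3
#define TAG_SIZE		16	/* AES-GCM-128 tag */
#define SEQ_SIZE		8

/*
 * AAD layout:
 *   TLS 1.2: seq(8) | type | major | minor | len(2)         -> 13 bytes
 *   TLS 1.3:          0x17 | major | minor | len + tag (2)  ->  5 bytes
 * Returns the number of AAD bytes written.
 */
static size_t make_aad(uint8_t *buf, size_t size, const uint8_t *seq,
		       uint8_t record_type, int is_tls13)
{
	uint8_t *p = buf;

	if (!is_tls13) {
		memcpy(p, seq, SEQ_SIZE);
		p += SEQ_SIZE;
	} else {
		size += TAG_SIZE;	/* length field also covers the tag */
		record_type = TLS_RECORD_TYPE_DATA;
	}

	p[0] = record_type;
	p[1] = TLS_1_2_VERSION_MAJOR;
	p[2] = TLS_1_2_VERSION_MINOR;
	p[3] = (uint8_t)(size >> 8);
	p[4] = (uint8_t)(size & 0xFF);

	return (size_t)(p - buf) + 5;
}
```

The two return values, 13 and 5, correspond to the two aad_size settings
chosen in tls_set_sw_offload().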


* [PATCH] net: esp4: Fix double free on esp4 functions
From: Ramin Farajpour Cami @ 2019-01-30 21:35 UTC (permalink / raw)
  To: davem; +Cc: herbert, steffen.klassert, netdev, Ramin Farajpour Cami

key/tmp is being kfree'd twice: once when the check
"aalg_desc->uinfo.auth.icv_fullbits / 8 != crypto_aead_authsize(aead)"
jumps to "free_key", and again when
"crypto_aead_setauthsize(aead, x->aalg->alg_trunc_len / 8)" fails and
jumps to "free_key".

Signed-off-by: Ramin Farajpour Cami <ramin.blackhat@gmail.com>
---
 net/ipv4/esp4.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/esp4.c b/net/ipv4/esp4.c
index 5459f41fc26f..5a66e47641b0 100644
--- a/net/ipv4/esp4.c
+++ b/net/ipv4/esp4.c
@@ -467,6 +467,7 @@ int esp_output_tail(struct xfrm_state *x, struct sk_buff *skb, struct esp_info *
 
 error_free:
 	kfree(tmp);
+	tmp = NULL;
 error:
 	return err;
 }
@@ -959,7 +960,7 @@ static int esp_init_authenc(struct xfrm_state *x)
 
 free_key:
 	kfree(key);
-
+	key = NULL;
 error:
 	return err;
 }
-- 
2.11.0
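
The idiom used in this patch, clearing a pointer right after freeing it,
protects against a second free only when the same pointer variable reaches
a free call again. A minimal userspace sketch, with free() standing in for
kfree() and a hypothetical release() helper plus a counter to make the
effect observable:

```c
#include <stdlib.h>

/* Counts real releases so the behaviour can be checked; illustration only. */
static int frees;

/* Free *p once and clear it, so releasing the same variable again is a no-op. */
static void release(void **p)
{
	if (*p) {
		free(*p);
		frees++;
	}
	*p = NULL;
}
```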


* Re: [PATCH v5 bpf-next 1/9] bpf: introduce bpf_spin_lock
From: Alexei Starovoitov @ 2019-01-30 21:34 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Alexei Starovoitov, davem, daniel, jannh, paulmck, will.deacon,
	mingo, netdev, kernel-team
In-Reply-To: <20190130210529.GI2278@hirez.programming.kicks-ass.net>

On Wed, Jan 30, 2019 at 10:05:29PM +0100, Peter Zijlstra wrote:
> 
> Would something like the below work for you instead?
> 
> I find it easier to read, and the additional CONFIG symbol would give
> architectures (say ARM) an easy way to force the issue.
> 
> 
> --- a/kernel/bpf/helpers.c
> +++ b/kernel/bpf/helpers.c
> @@ -221,6 +221,72 @@ const struct bpf_func_proto bpf_get_curr
>  	.arg2_type	= ARG_CONST_SIZE,
>  };
>  
> +#if defined(CONFIG_QUEUED_SPINLOCKS) || defined(CONFIG_BPF_ARCH_SPINLOCK)
> +
> +static inline void __bpf_spin_lock(struct bpf_spin_lock *lock)
> +{
> +	arch_spinlock_t *l = (void *)lock;
> +	BUILD_BUG_ON(sizeof(*l) != sizeof(__u32));
> +	if (1) {
> +		union {
> +			__u32 val;
> +			arch_spinlock_t lock;
> +		} u = { .lock = __ARCH_SPIN_LOCK_UNLOCKED };
> +		compiletime_assert(u.val == 0, "__ARCH_SPIN_LOCK_UNLOCKED not 0");
> +	}
> +	arch_spin_lock(l);

And archs can select CONFIG_BPF_ARCH_SPINLOCK when they don't
use qspinlock and their arch_spinlock_t is compatible ?
Nice. I like the idea!
Probably needs a kconfig change somewhere too...
I'll play with it...

> +}
> +
> +static inline void __bpf_spin_unlock(struct bpf_spin_lock *lock)
> +{
> +	arch_spinlock_t *l = (void *)lock;
> +	arch_spin_unlock(l);
> +}
> +
> +#else
> +
> +static inline void __bpf_spin_lock(struct bpf_spin_lock *lock)
> +{
> +	atomic_t *l = (void *)lock;
> +	do {
> +		atomic_cond_read_relaxed(l, !VAL);

wow. that's quite a macro magic.
Should it be
atomic_cond_read_relaxed(l, (!VAL));
like qspinlock.c does ?


* Re: [PATCH bpf-next 2/4] bpf: fix lockdep false positive in stackmap
From: Waiman Long @ 2019-01-30 21:32 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Peter Zijlstra, Alexei Starovoitov, davem, daniel, edumazet,
	jannh, netdev, kernel-team
In-Reply-To: <baafd74c-cd6d-9bd0-30f8-559dbc60c3d1@redhat.com>

On 01/30/2019 04:11 PM, Waiman Long wrote:
> On 01/30/2019 03:10 PM, Alexei Starovoitov wrote:
>> On Wed, Jan 30, 2019 at 02:42:23PM -0500, Waiman Long wrote:
>>> On 01/30/2019 02:30 PM, Alexei Starovoitov wrote:
>>>> On Wed, Jan 30, 2019 at 11:15:30AM +0100, Peter Zijlstra wrote:
>>>>> On Tue, Jan 29, 2019 at 08:04:56PM -0800, Alexei Starovoitov wrote:
>>>>>> Lockdep warns about false positive:
>>>>> This is not a false positive, and you probably also need to use
>>>>> down_read_non_owner() to match this up_read_non_owner().
>>>>>
>>>>> {up,down}_read() and {up,down}_read_non_owner() are not only different
>>>>> in the lockdep annotation; there is also optimistic spin stuff that
>>>>> relies on 'owner' tracking.
>>>> Can you point out in the code the spin bit?
>>>> As far as I can see sem->owner is debug only feature.
>>>> All owner checks are done under CONFIG_DEBUG_RWSEMS.
>>> No, sem->owner is mainly for performing optimistic spinning which is a
>>> performance feature to make rwsem writer-lock performs similar to mutex.
>>> The debugging part is just an add-on. It is not the reason for the
>>> presence of sem->owner.
>> I see. Got it.
>>
>>>> Also there is no down_read_trylock_non_owner() at the moment.
>>>> We can argue about it for -next, but I'd rather silence lockdep
>>>> with this patch today.
>>>>
>>> We can add down_read_trylock_non_owner() if there is a need for it. It
>>> should be easy to do.
>> Yes, but looking through the code it's not clear to me that it's safe
>> to mix non_owner() versions with regular.
>> bpf/stackmap.c does down_read_trylock + up_read.
>> If we add new down_read_trylock_non_owner that set the owner to
>> NULL | RWSEM_* bits is this safe with conccurent read/write
>> that do regular versions?
>> rwsem_can_spin_on_owner() does:
>>         if (owner) {
>>                 ret = is_rwsem_owner_spinnable(owner) &&
>>                       owner_on_cpu(owner);
>> that looks correct.
>> For a second I thought there could be fault here due to non_owner.
>> But there could be other places where it's assumed that owner
>> is never null?
> The content of owner is not the cause of the lockdep warning. The
> lockdep code assumes that the task that acquires the lock will release
> it some time later. That is not the case when you need to acquire the
> lock by one task and released by another. In this case, you have to use
> the non_owner version of down/up_read which disable the lockdep
> acquire/release tracking. That will be the only difference between the
> two set of APIs.
>
> Cheers,
> Longman

BTW, you may want to do something like this to make sure that the lock
ownership is properly tracked.

-Longman

diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index d43b145..79eef9d 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -338,6 +338,13 @@ static void stack_map_get_build_id_offset(struct
bpf_stack_
        } else {
                work->sem = &current->mm->mmap_sem;
                irq_work_queue(&work->irq_work);
+
+               /*
+                * The irq_work will release the mmap_sem with
+                * up_read_non_owner(). The rwsem_release() is called
+                * here to release the lock from lockdep's perspective.
+                */
+               rwsem_release(&current->mm->mmap_sem.dep_map, 1, _RET_IP_);
        }
 }


* Re: [PATCH net-next v2 1/7] devlink: add device information API
From: Jiri Pirko @ 2019-01-30 21:16 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: davem, netdev, oss-drivers, andrew, f.fainelli, mkubecek, eugenem,
	jonathan.lemon
In-Reply-To: <20190130190513.25718-2-jakub.kicinski@netronome.com>

Wed, Jan 30, 2019 at 08:05:07PM CET, jakub.kicinski@netronome.com wrote:
>ethtool -i has served us well for a long time, but it's showing
>its limitations more and more.  The device information should

Double space here -------------^^


>also be reported per device not per-netdev.
>
>Lay foundation for a simple devlink-based way of reading device
>info.  Add driver name and device serial number as initial pieces
      ^^---------------+
Double space here -----+


>of information exposed via this new API.
>
>RFC v2:
> - wrap the skb into an opaque structure (Jiri);
> - allow the serial number to be any length (Jiri & Andrew);
> - add driver name (Jonathan).
>
>Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
>---
> include/net/devlink.h        |  18 ++++++
> include/uapi/linux/devlink.h |   5 ++
> net/core/devlink.c           | 114 +++++++++++++++++++++++++++++++++++
> 3 files changed, 137 insertions(+)
>
>diff --git a/include/net/devlink.h b/include/net/devlink.h
>index 85c9eabaf056..5ef3570a3859 100644
>--- a/include/net/devlink.h
>+++ b/include/net/devlink.h
>@@ -429,6 +429,7 @@ enum devlink_param_wol_types {
> }
> 
> struct devlink_region;
>+struct devlink_info_req;
> 
> typedef void devlink_snapshot_data_dest_t(const void *data);
> 
>@@ -484,6 +485,8 @@ struct devlink_ops {
> 	int (*eswitch_encap_mode_get)(struct devlink *devlink, u8 *p_encap_mode);
> 	int (*eswitch_encap_mode_set)(struct devlink *devlink, u8 encap_mode,
> 				      struct netlink_ext_ack *extack);
>+	int (*info_get)(struct devlink *devlink, struct devlink_info_req *req,
>+			struct netlink_ext_ack *extack);
> };
> 
> static inline void *devlink_priv(struct devlink *devlink)
>@@ -607,6 +610,10 @@ u32 devlink_region_shapshot_id_get(struct devlink *devlink);
> int devlink_region_snapshot_create(struct devlink_region *region, u64 data_len,
> 				   u8 *data, u32 snapshot_id,
> 				   devlink_snapshot_data_dest_t *data_destructor);
>+int devlink_info_report_serial_number(struct devlink_info_req *req,
>+				      const char *sn);

I don't like the "report" part. The rest of the code uses "put".

Also, I think the verb should be at the
end of the function name, as is common in the rest of the code.

So please rename to:
devlink_info_serial_number_put()
Same for the rest.


>+int devlink_info_report_driver_name(struct devlink_info_req *req,
>+				    const char *name);
> 
> #else
> 
>@@ -905,6 +912,17 @@ devlink_region_snapshot_create(struct devlink_region *region, u64 data_len,
> 	return 0;
> }
> 
>+static inline int
>+devlink_info_report_driver_name(struct devlink_info_req *req, const char *name)
>+{
>+	return 0;
>+}
>+
>+static inline int
>+devlink_info_report_serial_number(struct devlink_info_req *req, const char *sn)
>+{
>+	return 0;
>+}
> #endif
> 
> #endif /* _NET_DEVLINK_H_ */
>diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
>index 61b4447a6c5b..fd089baa7c50 100644
>--- a/include/uapi/linux/devlink.h
>+++ b/include/uapi/linux/devlink.h
>@@ -94,6 +94,8 @@ enum devlink_command {
> 	DEVLINK_CMD_PORT_PARAM_NEW,
> 	DEVLINK_CMD_PORT_PARAM_DEL,
> 
>+	DEVLINK_CMD_INFO_GET,		/* can dump */
>+
> 	/* add new commands above here */
> 	__DEVLINK_CMD_MAX,
> 	DEVLINK_CMD_MAX = __DEVLINK_CMD_MAX - 1
>@@ -290,6 +292,9 @@ enum devlink_attr {
> 	DEVLINK_ATTR_REGION_CHUNK_ADDR,         /* u64 */
> 	DEVLINK_ATTR_REGION_CHUNK_LEN,          /* u64 */
> 
>+	DEVLINK_ATTR_INFO_DRV_NAME,		/* string */

Please be consistent across the names of functions, attrs, etc. So:
	DEVLINK_ATTR_INFO_DRIVER_NAME,


Otherwise, this looks good.

Thanks!

[...]

^ permalink raw reply

* Re: [ethtool 1/6] ethtool: move option parsing related code into function
From: John W. Linville @ 2019-01-30 21:08 UTC (permalink / raw)
  To: Nicholas Nunley; +Cc: Jeff Kirsher, netdev, nhorman, sassmann
In-Reply-To: <20190117230313.20248-1-nicholas.d.nunley@intel.com>

[-- Attachment #1: Type: text/plain, Size: 2779 bytes --]

On Thu, Jan 17, 2019 at 03:03:08PM -0800, Nicholas Nunley wrote:
> Move option parsing code into find_option function.
> 
> No behavior changes.
> 
> Based on patch by Kan Liang <kan.liang@intel.com>
> 
> Signed-off-by: Nicholas Nunley <nicholas.d.nunley@intel.com>

Well, after looking at this series for a while I had decided to apply it. But when I applied it and did a 'make distcheck', I got this:

...

gcc -DTEST_ETHTOOL -g -O2   -o test-features test_features-test-features.o test_features-test-common.o test_features-ethtool.o test_features-rxclass.o test_features-amd8111e.o test_features-de2104x.o test_features-dsa.o test_features-e100.o test_features-e1000.o test_features-et131x.o test_features-igb.o test_features-fec_8xx.o test_features-ibm_emac.o test_features-ixgb.o test_features-ixgbe.o test_features-natsemi.o test_features-pcnet32.o test_features-realtek.o test_features-tg3.o test_features-marvell.o test_features-vioc.o test_features-smsc911x.o test_features-at76c50x-usb.o test_features-sfc.o test_features-stmmac.o test_features-sff-common.o test_features-sfpid.o test_features-sfpdiag.o test_features-ixgbevf.o test_features-tse.o test_features-vmxnet3.o test_features-qsfp.o test_features-fjes.o test_features-lan78xx.o -lm 
make[2]: Leaving directory '/home/linville/git/ethtool/ethtool-4.19/_build/sub'
make  check-TESTS
make[2]: Entering directory '/home/linville/git/ethtool/ethtool-4.19/_build/sub'
make[3]: Entering directory '/home/linville/git/ethtool/ethtool-4.19/_build/sub'
FAIL: test-cmdline
PASS: test-features
============================================================================
Testsuite summary for ethtool 4.19
============================================================================
# TOTAL: 2
# PASS:  1
# SKIP:  0
# XFAIL: 0
# FAIL:  1
# XPASS: 0
# ERROR: 0
============================================================================
See ./test-suite.log
Please report to netdev@vger.kernel.org
============================================================================
make[3]: *** [Makefile:1942: test-suite.log] Error 1
make[3]: Leaving directory '/home/linville/git/ethtool/ethtool-4.19/_build/sub'
make[2]: *** [Makefile:2050: check-TESTS] Error 2
make[2]: Leaving directory '/home/linville/git/ethtool/ethtool-4.19/_build/sub'
make[1]: *** [Makefile:2264: check-am] Error 2
make[1]: Leaving directory '/home/linville/git/ethtool/ethtool-4.19/_build/sub'
make: *** [Makefile:2186: distcheck] Error 1

...

I'll attach ./ethtool-4.19/_build/sub/test-suite.log to this
message. Obviously we need whatever additional changes are needed to
get 'make check-TESTS' to pass legitimately.

John
-- 
John W. Linville		Someday the world will need a hero, and you
linville@tuxdriver.com			might be all we have.  Be ready.

[-- Attachment #2: test-suite.log --]
[-- Type: text/plain, Size: 1654 bytes --]

====================================
   ethtool 4.19: ./test-suite.log
====================================

# TOTAL: 2
# PASS:  1
# SKIP:  0
# XFAIL: 0
# FAIL:  1
# XPASS: 0
# ERROR: 0

.. contents:: :depth: 2

FAIL: test-cmdline
==================

E: ethtool -s devname returns 1
E: ethtool --change devname speed 100 duplex half mdix auto returns 1
E: ethtool -s devname duplex half returns 1
E: ethtool --change devname port tp returns 1
E: ethtool -s devname autoneg on returns 1
E: ethtool --change devname advertise 0x1 returns 1
E: ethtool --change devname advertise 0xf returns 1
E: ethtool --change devname advertise 0Xf returns 1
E: ethtool --change devname advertise 1 returns 1
E: ethtool --change devname advertise f returns 1
E: ethtool --change devname advertise 01 returns 1
E: ethtool --change devname advertise 0f returns 1
E: ethtool --change devname advertise 0xfffffffffffffffffffffffffffffffff returns 1
E: ethtool --change devname advertise fffffffffffffffffffffffffffffffff returns 1
E: ethtool --change devname advertise 0x0000fffffffffffffffffffffffffffff returns 1
E: ethtool --change devname advertise 0000fffffffffffffffffffffffffffff returns 1
E: ethtool -s devname phyad 1 returns 1
E: ethtool --change devname xcvr external returns 1
E: ethtool -s devname wol p returns 1
E: ethtool -s devname sopass 01:23:45:67:89:ab returns 1
E: ethtool -s devname msglvl 1 returns 1
E: ethtool -s devname msglvl hw on rx_status off returns 1
E: ethtool --change devname speed 100 duplex half port tp autoneg on advertise 0x1 phyad 1 xcvr external wol p sopass 01:23:45:67:89:ab msglvl 1 returns 1
FAIL test-cmdline (exit status: 1)


^ permalink raw reply

* Re: [PATCH bpf-next 2/4] bpf: fix lockdep false positive in stackmap
From: Waiman Long @ 2019-01-30 21:11 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Peter Zijlstra, Alexei Starovoitov, davem, daniel, edumazet,
	jannh, netdev, kernel-team
In-Reply-To: <20190130201047.hscaxd7434hrznxx@ast-mbp.dhcp.thefacebook.com>

On 01/30/2019 03:10 PM, Alexei Starovoitov wrote:
> On Wed, Jan 30, 2019 at 02:42:23PM -0500, Waiman Long wrote:
>> On 01/30/2019 02:30 PM, Alexei Starovoitov wrote:
>>> On Wed, Jan 30, 2019 at 11:15:30AM +0100, Peter Zijlstra wrote:
>>>> On Tue, Jan 29, 2019 at 08:04:56PM -0800, Alexei Starovoitov wrote:
>>>>> Lockdep warns about false positive:
>>>> This is not a false positive, and you probably also need to use
>>>> down_read_non_owner() to match this up_read_non_owner().
>>>>
>>>> {up,down}_read() and {up,down}_read_non_owner() are not only different
>>>> in the lockdep annotation; there is also optimistic spin stuff that
>>>> relies on 'owner' tracking.
>>> Can you point out in the code the spin bit?
>>> As far as I can see sem->owner is debug only feature.
>>> All owner checks are done under CONFIG_DEBUG_RWSEMS.
>> No, sem->owner is mainly for performing optimistic spinning, which is a
>> performance feature to make rwsem writer-lock perform similarly to a mutex.
>> The debugging part is just an add-on. It is not the reason for the
>> presence of sem->owner.
> I see. Got it.
>
>>> Also there is no down_read_trylock_non_owner() at the moment.
>>> We can argue about it for -next, but I'd rather silence lockdep
>>> with this patch today.
>>>
>> We can add down_read_trylock_non_owner() if there is a need for it. It
>> should be easy to do.
> Yes, but looking through the code it's not clear to me that it's safe
> to mix non_owner() versions with regular.
> bpf/stackmap.c does down_read_trylock + up_read.
> If we add a new down_read_trylock_non_owner that sets the owner to
> NULL | RWSEM_* bits, is this safe with concurrent read/write paths
> that use the regular versions?
> rwsem_can_spin_on_owner() does:
>         if (owner) {
>                 ret = is_rwsem_owner_spinnable(owner) &&
>                       owner_on_cpu(owner);
> that looks correct.
> For a second I thought there could be fault here due to non_owner.
> But there could be other places where it's assumed that owner
> is never null?

The content of owner is not the cause of the lockdep warning. The
lockdep code assumes that the task that acquires the lock will release
it some time later. That is not the case when the lock is acquired by
one task and released by another. In this case, you have to use the
non_owner versions of down/up_read, which disable the lockdep
acquire/release tracking. That will be the only difference between the
two sets of APIs.

Cheers,
Longman

>
> Maybe we should live with this lockdep warning in the bpf tree
> and fix it only in bpf-next?
>



^ permalink raw reply

* Re: bpf memory model. Was: [PATCH v4 bpf-next 1/9] bpf: introduce bpf_spin_lock
From: Paul E. McKenney @ 2019-01-30 21:05 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Will Deacon, Peter Zijlstra, Alexei Starovoitov, davem, daniel,
	jakub.kicinski, netdev, kernel-team, mingo, jannh
In-Reply-To: <20190130195113.xyqre4sxasit6vpu@ast-mbp.dhcp.thefacebook.com>

On Wed, Jan 30, 2019 at 11:51:14AM -0800, Alexei Starovoitov wrote:
> On Wed, Jan 30, 2019 at 10:36:18AM -0800, Paul E. McKenney wrote:
> > On Wed, Jan 30, 2019 at 06:11:00PM +0000, Will Deacon wrote:
> > > Hi Alexei,
> > > 
> > > On Mon, Jan 28, 2019 at 01:56:24PM -0800, Alexei Starovoitov wrote:
> > > > On Mon, Jan 28, 2019 at 10:24:08AM +0100, Peter Zijlstra wrote:
> > > > > On Fri, Jan 25, 2019 at 04:17:26PM -0800, Alexei Starovoitov wrote:
> > > > > > What I want to avoid is to define the whole execution ordering model upfront.
> > > > > > We cannot say that BPF ISA is weakly ordered like alpha.
> > > > > > Most of the bpf progs are written and running on x86. We shouldn't
> > > > > > twist bpf developer's arm by artificially relaxing memory model.
> > > > > > BPF memory model is equal to memory model of underlying architecture.
> > > > > > What we can do is to make bpf progs a bit more portable with
> > > > > > smp_rmb instructions, but we must not force weak execution on the developer.
> > > > > 
> > > > > Well, I agree with only introducing bits you actually need, and my
> > > > > smp_rmb() example might have been poorly chosen, smp_load_acquire() /
> > > > > smp_store_release() might have been a far more useful example.
> > > > > 
> > > > > But I disagree with the last part; we have to pick a model now;
> > > > > otherwise you'll paint yourself into a corner.
> > > > > 
> > > > > Also; Alpha isn't very relevant these days; however ARM64 does seem to
> > > > > be gaining a lot of attention and that is very much a weak architecture.
> > > > > Adding strongly ordered assumptions to BPF now, will penalize them in
> > > > > the long run.
> > > > 
> > > > arm64 is gaining attention just like riscV is gaining it too.
> > > > BPF jit for arm64 is very solid, while BPF jit for riscV is being worked on.
> > > > BPF is not picking sides in CPU HW and ISA battles.
> > > 
> > > It's not about picking a side, it's about providing an abstraction of the
> > > various CPU architectures out there so that the programmer doesn't need to
> > > worry about where their program may run. Hell, even if you just said "eBPF
> > > follows x86 semantics" that would be better than saying nothing (and then we
> > > could have a discussion about whether x86 semantics are really what you
> > > want).
> > 
> > To reinforce this point, the Linux-kernel memory model (tools/memory-model)
> > is that abstraction for the Linux kernel.  Why not just use that for BPF?
> 
> I already answered this earlier in the thread.
> tldr: not going to sacrifice performance.

Understood.

But can we at least say that where there are no performance consequences,
BPF should follow LKMM?  You already mentioned smp_load_acquire()
and smp_store_release(), but the void atomics (e.g., atomic_inc())
should also work because they don't provide any ordering guarantees.
The _relaxed(), _release(), and _acquire() variants of the value-returning
atomics should be just fine as well.

The other value-returning atomics have strong ordering, which is fine
on many systems, but potentially suboptimal for the weakly ordered ones.
Though you have to have pretty good locality of reference to be able to
see the difference, because otherwise cache-miss overhead dominates.

Things like cmpxchg() don't seem to fit BPF because they are normally
used in spin loops, though there are some non-spinning use cases.

You correctly pointed out that READ_ONCE() and WRITE_ONCE() are suboptimal
on systems that don't support all sizes of loads, but I bet that there
are some sizes for which they are just fine across systems, for example,
pointer size and int size.

Does that help?  Or am I missing additional cases where performance
could be degraded?

							Thanx, Paul
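
For readers less familiar with the primitives named above, here is a
minimal userspace sketch using C11 atomics as rough analogues. The
mapping is an assumption made for illustration only: memory_order_release
and memory_order_acquire stand in for smp_store_release() and
smp_load_acquire(), and a relaxed fetch-add stands in for a void
atomic_inc(). The sketch_* names are hypothetical, and this is neither
LKMM nor BPF code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical one-slot mailbox shared between a producer and a consumer. */
struct sketch_mailbox {
	int payload;		/* plain data, published via 'ready' */
	atomic_int ready;	/* 0 = empty, 1 = payload is valid */
	atomic_ulong hits;	/* statistics counter, no ordering needed */
};

static void sketch_publish(struct sketch_mailbox *m, int value)
{
	m->payload = value;
	/* Release store: the payload write is ordered before the flag write. */
	atomic_store_explicit(&m->ready, 1, memory_order_release);
}

static bool sketch_consume(struct sketch_mailbox *m, int *out)
{
	/* Relaxed increment: atomic, but with no ordering guarantee. */
	atomic_fetch_add_explicit(&m->hits, 1, memory_order_relaxed);

	/* Acquire load: the payload read is ordered after the flag read. */
	if (!atomic_load_explicit(&m->ready, memory_order_acquire))
		return false;

	*out = m->payload;
	return true;
}
```

A consumer that sees ready == 1 through the acquire load is guaranteed
to see the payload written before the matching release store, which is
the message-passing pattern the acquire/release pair is meant for.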


^ permalink raw reply

* Re: [PATCH v5 bpf-next 1/9] bpf: introduce bpf_spin_lock
From: Peter Zijlstra @ 2019-01-30 21:05 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: davem, daniel, jannh, paulmck, will.deacon, mingo, netdev,
	kernel-team
In-Reply-To: <20190128025010.342241-2-ast@kernel.org>

On Sun, Jan 27, 2019 at 06:50:02PM -0800, Alexei Starovoitov wrote:
> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> index a74972b07e74..e1d6aefbab50 100644
> --- a/kernel/bpf/helpers.c
> +++ b/kernel/bpf/helpers.c
> @@ -221,6 +221,63 @@ const struct bpf_func_proto bpf_get_current_comm_proto = {
>  	.arg2_type	= ARG_CONST_SIZE,
>  };
>  
> +#ifndef CONFIG_QUEUED_SPINLOCKS
> +struct dumb_spin_lock {
> +	atomic_t val;
> +};
> +#endif
> +
> +notrace BPF_CALL_1(bpf_spin_lock, struct bpf_spin_lock *, lock)
> +{
> +#if defined(CONFIG_SMP)
> +#ifdef CONFIG_QUEUED_SPINLOCKS
> +	struct qspinlock *qlock = (void *)lock;
> +
> +	BUILD_BUG_ON(sizeof(*qlock) != sizeof(*lock));
> +	queued_spin_lock(qlock);
> +#else
> +	struct dumb_spin_lock *qlock = (void *)lock;
> +
> +	BUILD_BUG_ON(sizeof(*qlock) != sizeof(*lock));
> +	do {
> +		while (atomic_read(&qlock->val) != 0)
> +			cpu_relax();
> +	} while (atomic_cmpxchg(&qlock->val, 0, 1) != 0);
> +#endif
> +#endif
> +	return 0;
> +}
> +
> +const struct bpf_func_proto bpf_spin_lock_proto = {
> +	.func		= bpf_spin_lock,
> +	.gpl_only	= false,
> +	.ret_type	= RET_VOID,
> +	.arg1_type	= ARG_PTR_TO_SPIN_LOCK,
> +};
> +
> +notrace BPF_CALL_1(bpf_spin_unlock, struct bpf_spin_lock *, lock)
> +{
> +#if defined(CONFIG_SMP)
> +#ifdef CONFIG_QUEUED_SPINLOCKS
> +	struct qspinlock *qlock = (void *)lock;
> +
> +	queued_spin_unlock(qlock);
> +#else
> +	struct dumb_spin_lock *qlock = (void *)lock;
> +
> +	atomic_set_release(&qlock->val, 0);
> +#endif
> +#endif
> +	return 0;
> +}
> +
> +const struct bpf_func_proto bpf_spin_unlock_proto = {
> +	.func		= bpf_spin_unlock,
> +	.gpl_only	= false,
> +	.ret_type	= RET_VOID,
> +	.arg1_type	= ARG_PTR_TO_SPIN_LOCK,
> +};
> +
>  #ifdef CONFIG_CGROUPS
>  BPF_CALL_0(bpf_get_current_cgroup_id)
>  {

Would something like the below work for you instead?

I find it easier to read, and the additional CONFIG symbol would give
architectures (say ARM) an easy way to force the issue.


--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -221,6 +221,72 @@ const struct bpf_func_proto bpf_get_curr
 	.arg2_type	= ARG_CONST_SIZE,
 };
 
+#if defined(CONFIG_QUEUED_SPINLOCKS) || defined(CONFIG_BPF_ARCH_SPINLOCK)
+
+static inline void __bpf_spin_lock(struct bpf_spin_lock *lock)
+{
+	arch_spinlock_t *l = (void *)lock;
+	BUILD_BUG_ON(sizeof(*l) != sizeof(__u32));
+	if (1) {
+		union {
+			__u32 val;
+			arch_spinlock_t lock;
+		} u = { .lock = __ARCH_SPIN_LOCK_UNLOCKED };
+		compiletime_assert(u.val == 0, "__ARCH_SPIN_LOCK_UNLOCKED not 0");
+	}
+	arch_spin_lock(l);
+}
+
+static inline void __bpf_spin_unlock(struct bpf_spin_lock *lock)
+{
+	arch_spinlock_t *l = (void *)lock;
+	arch_spin_unlock(l);
+}
+
+#else
+
+static inline void __bpf_spin_lock(struct bpf_spin_lock *lock)
+{
+	atomic_t *l = (void *)lock;
+	do {
+		atomic_cond_read_relaxed(l, !VAL);
+	} while (atomic_xchg(l, 1));
+}
+
+static inline void __bpf_spin_unlock(struct bpf_spin_lock *lock)
+{
+	atomic_t *l = (void *)lock;
+	atomic_set_release(l, 0);
+}
+
+#endif
+
+notrace BPF_CALL_1(bpf_spin_lock, struct bpf_spin_lock *, lock)
+{
+	__bpf_spin_lock(lock);
+	return 0;
+}
+
+const struct bpf_func_proto bpf_spin_lock_proto = {
+	.func		= bpf_spin_lock,
+	.gpl_only	= false,
+	.ret_type	= RET_VOID,
+	.arg1_type	= ARG_PTR_TO_SPIN_LOCK,
+};
+
+notrace BPF_CALL_1(bpf_spin_unlock, struct bpf_spin_lock *, lock)
+{
+	__bpf_spin_unlock(lock);
+	return 0;
+}
+
+const struct bpf_func_proto bpf_spin_unlock_proto = {
+	.func		= bpf_spin_unlock,
+	.gpl_only	= false,
+	.ret_type	= RET_VOID,
+	.arg1_type	= ARG_PTR_TO_SPIN_LOCK,
+};
+
 #ifdef CONFIG_CGROUPS
 BPF_CALL_0(bpf_get_current_cgroup_id)
 {
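
The fallback branch of the proposal above is a test-and-test-and-set
lock. A rough userspace sketch with C11 atomics follows, for
illustration only. Assumptions: the sketch_* names are hypothetical, the
relaxed polling loop approximates atomic_cond_read_relaxed(l, !VAL)
without cpu_relax(), and the exchange uses acquire ordering, whereas the
kernel's atomic_xchg() is fully ordered.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace analogue of the fallback lock word. */
struct sketch_lock {
	atomic_uint val;	/* 0 = unlocked, 1 = locked */
};

static void sketch_lock_acquire(struct sketch_lock *l)
{
	do {
		/* Poll with plain loads so waiters do not keep writing the line. */
		while (atomic_load_explicit(&l->val, memory_order_relaxed) != 0)
			;
		/* Try to take the lock; retry if another thread won the race. */
	} while (atomic_exchange_explicit(&l->val, 1, memory_order_acquire) != 0);
}

/* Single attempt: returns true if the lock was free and is now held. */
static bool sketch_lock_tryacquire(struct sketch_lock *l)
{
	return atomic_exchange_explicit(&l->val, 1, memory_order_acquire) == 0;
}

static void sketch_lock_release(struct sketch_lock *l)
{
	/* Release store: critical-section writes are visible before the unlock. */
	atomic_store_explicit(&l->val, 0, memory_order_release);
}
```

Polling with loads before attempting the exchange keeps contended
waiters mostly read-only, which is the same reason the patch waits for
the value to become zero before calling atomic_xchg().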

^ permalink raw reply

* KASAN: use-after-free Read in __wake_up_common_lock
From: syzbot @ 2019-01-30 21:02 UTC (permalink / raw)
  To: isdn, linux-kernel, netdev, syzkaller-bugs

Hello,

syzbot found the following crash on:

HEAD commit:    62967898789d Merge git://git.kernel.org/pub/scm/linux/kern..
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=10f0bf08c00000
kernel config:  https://syzkaller.appspot.com/x/.config?x=4fceea9e2d99ac20
dashboard link: https://syzkaller.appspot.com/bug?extid=fb065bc06d3d4054be6f
compiler:       gcc (GCC) 9.0.0 20181231 (experimental)

Unfortunately, I don't have any reproducer for this crash yet.

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+fb065bc06d3d4054be6f@syzkaller.appspotmail.com

QAT: Invalid ioctl
==================================================================
BUG: KASAN: use-after-free in debug_spin_lock_before kernel/locking/spinlock_debug.c:83 [inline]
BUG: KASAN: use-after-free in do_raw_spin_lock+0x303/0x360 kernel/locking/spinlock_debug.c:112
Read of size 4 at addr ffff88808738e92c by task syz-executor1/8644

CPU: 1 PID: 8644 Comm: syz-executor1 Not tainted 5.0.0-rc4+ #50
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
  <IRQ>
  __dump_stack lib/dump_stack.c:77 [inline]
  dump_stack+0x1db/0x2d0 lib/dump_stack.c:113
  print_address_description.cold+0x7c/0x20d mm/kasan/report.c:187
  kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
  __asan_report_load4_noabort+0x14/0x20 mm/kasan/generic_report.c:134
  debug_spin_lock_before kernel/locking/spinlock_debug.c:83 [inline]
  do_raw_spin_lock+0x303/0x360 kernel/locking/spinlock_debug.c:112
  __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:117 [inline]
  _raw_spin_lock_irqsave+0x9d/0xcd kernel/locking/spinlock.c:152
  __wake_up_common_lock+0x19b/0x390 kernel/sched/wait.c:120
  __wake_up+0xe/0x10 kernel/sched/wait.c:145
  dev_expire_timer+0x14b/0x570 drivers/isdn/mISDN/timerdev.c:174
  call_timer_fn+0x254/0x900 kernel/time/timer.c:1325
  expire_timers kernel/time/timer.c:1362 [inline]
  __run_timers+0x6fc/0xd50 kernel/time/timer.c:1681
IPVS: ftp: loaded support on port[0] = 21
  run_timer_softirq+0x52/0xb0 kernel/time/timer.c:1694
  __do_softirq+0x30b/0xb11 kernel/softirq.c:292
  invoke_softirq kernel/softirq.c:373 [inline]
  irq_exit+0x180/0x1d0 kernel/softirq.c:413
  exiting_irq arch/x86/include/asm/apic.h:536 [inline]
  smp_apic_timer_interrupt+0x1b7/0x760 arch/x86/kernel/apic/apic.c:1062
  apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:807
  </IRQ>
RIP: 0010:arch_local_irq_restore arch/x86/include/asm/paravirt.h:766 [inline]
RIP: 0010:__raw_spin_unlock_irqrestore include/linux/spinlock_api_smp.h:160 [inline]
RIP: 0010:_raw_spin_unlock_irqrestore+0x95/0xe0 kernel/locking/spinlock.c:184
Code: 48 c7 c0 30 82 92 89 48 ba 00 00 00 00 00 fc ff df 48 c1 e8 03 80 3c 10 00 75 39 48 83 3d 12 d0 9d 01 00 74 24 48 89 df 57 9d <0f> 1f 44 00 00 bf 01 00 00 00 e8 dc 75 63 f9 65 8b 05 95 3b 0d 78
RSP: 0018:ffff888050a7f360 EFLAGS: 00000282 ORIG_RAX: ffffffffffffff13
RAX: 1ffffffff1325046 RBX: 0000000000000282 RCX: 0000000000000000
RDX: dffffc0000000000 RSI: 0000000000000001 RDI: 0000000000000282
RBP: ffff888050a7f370 R08: ffff888088b9e000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff8b72b128
R13: ffff888088b9e000 R14: 00000000000d7480 R15: ffffffff8b72b128
  __debug_object_init+0x1c0/0x12d0 lib/debugobjects.c:418
  debug_object_init+0x16/0x20 lib/debugobjects.c:431
  __init_work+0x50/0x60 kernel/workqueue.c:504
  call_usermodehelper_setup+0x133/0x410 kernel/umh.c:389
  call_modprobe kernel/kmod.c:94 [inline]
  __request_module+0x4f5/0xeea kernel/kmod.c:171
  crypto_larval_lookup crypto/api.c:237 [inline]
  crypto_alg_mod_lookup+0x54e/0x6d0 crypto/api.c:280
QAT: Invalid ioctl
  crypto_find_alg crypto/api.c:504 [inline]
  crypto_alloc_tfm+0xd9/0x2f0 crypto/api.c:537
  crypto_alloc_skcipher+0x2d/0x40 crypto/skcipher.c:945
  cryptd_alloc_skcipher+0x121/0x270 crypto/cryptd.c:1226
IPVS: ftp: loaded support on port[0] = 21
  simd_skcipher_init+0x6c/0x1c0 crypto/simd.c:119
  crypto_skcipher_init_tfm+0x299/0x8c0 crypto/skcipher.c:862
  crypto_create_tfm+0xec/0x310 crypto/api.c:471
  crypto_alloc_tfm+0x104/0x2f0 crypto/api.c:543
  crypto_alloc_skcipher+0x2d/0x40 crypto/skcipher.c:945
QAT: Invalid ioctl
  skcipher_bind+0x26/0x30 crypto/algif_skcipher.c:310
  alg_bind+0x25d/0x570 crypto/af_alg.c:183
  __sys_bind+0x30b/0x420 net/socket.c:1483
  __do_sys_bind net/socket.c:1494 [inline]
  __se_sys_bind net/socket.c:1492 [inline]
  __x64_sys_bind+0x73/0xb0 net/socket.c:1492
  do_syscall_64+0x1a3/0x800 arch/x86/entry/common.c:290
  entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x458089
Code: 6d b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 3b b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f7508d5ac78 EFLAGS: 00000246 ORIG_RAX: 0000000000000031
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000000458089
RDX: 0000000000000058 RSI: 0000000020000340 RDI: 000000000000000a
RBP: 000000000073bf00 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f7508d5b6d4
R13: 00000000004be0ca R14: 00000000004ce420 R15: 00000000ffffffff

Allocated by task 8647:
  save_stack+0x45/0xd0 mm/kasan/common.c:73
  set_track mm/kasan/common.c:85 [inline]
  __kasan_kmalloc mm/kasan/common.c:496 [inline]
  __kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:469
  kasan_kmalloc+0x9/0x10 mm/kasan/common.c:504
  kmem_cache_alloc_trace+0x151/0x760 mm/slab.c:3609
  kmalloc include/linux/slab.h:545 [inline]
  mISDN_open+0x104/0x3f0 drivers/isdn/mISDN/timerdev.c:59
  misc_open+0x398/0x4c0 drivers/char/misc.c:141
  chrdev_open+0x270/0x7c0 fs/char_dev.c:417
  do_dentry_open+0x48a/0x1210 fs/open.c:771
  vfs_open+0xa0/0xd0 fs/open.c:880
  do_last fs/namei.c:3418 [inline]
  path_openat+0x144f/0x5650 fs/namei.c:3534
  do_filp_open+0x26f/0x370 fs/namei.c:3564
  do_sys_open+0x59a/0x7c0 fs/open.c:1063
  __do_sys_openat fs/open.c:1090 [inline]
  __se_sys_openat fs/open.c:1084 [inline]
  __x64_sys_openat+0x9d/0x100 fs/open.c:1084
  do_syscall_64+0x1a3/0x800 arch/x86/entry/common.c:290
  entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 8649:
  save_stack+0x45/0xd0 mm/kasan/common.c:73
  set_track mm/kasan/common.c:85 [inline]
  __kasan_slab_free+0x102/0x150 mm/kasan/common.c:458
  kasan_slab_free+0xe/0x10 mm/kasan/common.c:466
  __cache_free mm/slab.c:3487 [inline]
  kfree+0xcf/0x230 mm/slab.c:3806
  mISDN_close+0x39b/0x530 drivers/isdn/mISDN/timerdev.c:97
  __fput+0x3c5/0xb10 fs/file_table.c:278
  ____fput+0x16/0x20 fs/file_table.c:309
  task_work_run+0x1f4/0x2b0 kernel/task_work.c:113
  tracehook_notify_resume include/linux/tracehook.h:188 [inline]
  exit_to_usermode_loop+0x32a/0x3b0 arch/x86/entry/common.c:166
  prepare_exit_to_usermode arch/x86/entry/common.c:197 [inline]
  syscall_return_slowpath arch/x86/entry/common.c:268 [inline]
  do_syscall_64+0x696/0x800 arch/x86/entry/common.c:293
  entry_SYSCALL_64_after_hwframe+0x49/0xbe

The buggy address belongs to the object at ffff88808738e900
  which belongs to the cache kmalloc-192 of size 192
The buggy address is located 44 bytes inside of
  192-byte region [ffff88808738e900, ffff88808738e9c0)
The buggy address belongs to the page:
page:ffffea00021ce380 count:1 mapcount:0 mapping:ffff88812c3f0040 index:0x0
flags: 0x1fffc0000000200(slab)
raw: 01fffc0000000200 ffffea0002927088 ffffea000298ecc8 ffff88812c3f0040
raw: 0000000000000000 ffff88808738e000 0000000100000010 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
  ffff88808738e800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  ffff88808738e880: 00 00 fc fc fc fc fc fc fc fc fc fc fc fc fc fc
> ffff88808738e900: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                   ^
  ffff88808738e980: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
  ffff88808738ea00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
==================================================================


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#bug-status-tracking for how to communicate with syzbot.

^ permalink raw reply

* [PATCH net-next 02/12] net: hns3: fix VF dump register issue
From: Huazhong Tan @ 2019-01-30 20:55 UTC (permalink / raw)
  To: davem
  Cc: netdev, linux-kernel, huangdaode, yisen.zhuang, salil.mehta,
	linuxarm, Jian Shen, Huazhong Tan
In-Reply-To: <20190130205552.8512-1-tanhuazhong@huawei.com>

From: Jian Shen <shenjian15@huawei.com>

In the original code, the .get_regs_len and .get_regs callbacks were
not assigned in hns3vf_ethtool_ops. This patch adds them.

Fixes: 1600c3e5f23e ("net: hns3: Support "ethtool -d" for HNS3 VF driver")
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
---
 drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c b/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
index ed73f7fc9171..76ef06a7c261 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
@@ -1115,6 +1115,8 @@ static const struct ethtool_ops hns3vf_ethtool_ops = {
 	.get_channels = hns3_get_channels,
 	.get_coalesce = hns3_get_coalesce,
 	.set_coalesce = hns3_set_coalesce,
+	.get_regs_len = hns3_get_regs_len,
+	.get_regs = hns3_get_regs,
 	.get_link = hns3_get_link,
 };
 
-- 
2.20.1
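
To illustrate why two missing assignments matter, here is a toy
userspace model of an ops table with optional callbacks. It is a sketch
under stated assumptions: all toy_* names are hypothetical, and the
dispatcher assumes the caller treats a NULL callback as "operation not
supported". It is not the ethtool core's actual code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

struct toy_dev;

/* Hypothetical, reduced stand-in for an ops table with optional callbacks. */
struct toy_ethtool_ops {
	int (*get_regs_len)(struct toy_dev *dev);
	void (*get_regs)(struct toy_dev *dev, void *buf);
};

struct toy_dev {
	const struct toy_ethtool_ops *ops;
};

/* Pretend the device exposes four bytes of register state. */
static int toy_get_regs_len(struct toy_dev *dev)
{
	(void)dev;
	return 4;
}

static void toy_get_regs(struct toy_dev *dev, void *buf)
{
	(void)dev;
	memset(buf, 0xab, 4);
}

/* Table as before the fix: both callbacks left unassigned (NULL). */
static const struct toy_ethtool_ops toy_ops_before = {
	.get_regs_len = NULL,
	.get_regs = NULL,
};

/* Table as after the fix: both callbacks wired up. */
static const struct toy_ethtool_ops toy_ops_after = {
	.get_regs_len = toy_get_regs_len,
	.get_regs = toy_get_regs,
};

/* Dispatcher: reports "not supported" when either callback is missing. */
static int toy_dump_regs(struct toy_dev *dev, void *buf, size_t buflen)
{
	int len;

	if (!dev->ops->get_regs_len || !dev->ops->get_regs)
		return -EOPNOTSUPP;

	len = dev->ops->get_regs_len(dev);
	if (len < 0 || (size_t)len > buflen)
		return -EINVAL;

	dev->ops->get_regs(dev, buf);
	return len;
}
```

With toy_ops_before the dump request fails without touching the buffer.
With toy_ops_after the same request fills it. The helper functions
existing in the driver is not enough; they also have to be reachable
through the table.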



^ permalink raw reply related

* [PATCH net-next 00/12] code optimizations & bugfixes for HNS3 driver
From: Huazhong Tan @ 2019-01-30 20:55 UTC (permalink / raw)
  To: davem
  Cc: netdev, linux-kernel, huangdaode, yisen.zhuang, salil.mehta,
	linuxarm, Huazhong Tan

This patchset includes bugfixes and code optimizations for the HNS3
ethernet controller driver.

Huazhong Tan (4):
  net: hns3: change hnae3_register_ae_dev() to int
  net: hns3: Fix NULL deref when unloading driver
  net: hns3: fix netif_napi_del() not do problem when unloading
  net: hns3: fix improper error handling in the hclge_init_ae_dev()

Jian Shen (4):
  net: hns3: fix VF dump register issue
  net: hns3: fix for rss result nonuniform
  net: hns3: stop sending keep alive msg to PF when VF is resetting
  net: hns3: keep flow director state unchanged when reset

Peng Li (2):
  net: hns3: use the correct interface to stop|open port
  net: hns3: fix an issue for hclgevf_ae_get_hdev

Yunsheng Lin (1):
  net: hns3: only support tc 0 for VF

liyongxin (1):
  net: hns3: reuse the definition of l3 and l4 header info union

 drivers/net/ethernet/hisilicon/hns3/hnae3.c   | 10 +-
 drivers/net/ethernet/hisilicon/hns3/hnae3.h   |  4 +-
 .../net/ethernet/hisilicon/hns3/hns3_enet.c   | 95 +++++++++----------
 .../net/ethernet/hisilicon/hns3/hns3_enet.h   |  1 +
 .../ethernet/hisilicon/hns3/hns3_ethtool.c    |  6 +-
 .../hisilicon/hns3/hns3pf/hclge_dcb.c         | 12 +--
 .../hisilicon/hns3/hns3pf/hclge_main.c        | 50 +++++-----
 .../hisilicon/hns3/hns3pf/hclge_main.h        |  2 +-
 .../hisilicon/hns3/hns3pf/hclge_mbx.c         | 10 +-
 .../hisilicon/hns3/hns3pf/hclge_mdio.c        |  8 +-
 .../hisilicon/hns3/hns3pf/hclge_mdio.h        |  4 +-
 .../ethernet/hisilicon/hns3/hns3pf/hclge_tm.c | 22 +++--
 .../hisilicon/hns3/hns3vf/hclgevf_main.c      | 25 ++++-
 13 files changed, 145 insertions(+), 104 deletions(-)

-- 
2.20.1



^ permalink raw reply

