Netdev List
 help / color / mirror / Atom feed
* [RESEND PATCH] bpfilter: check compiler capability in Kconfig
From: Masahiro Yamada @ 2018-06-26  3:55 UTC (permalink / raw)
  To: David S . Miller, netdev, Alexei Starovoitov
  Cc: Matteo Croce, Arnd Bergmann, Masahiro Yamada, linux-kbuild,
	Alexei Starovoitov, linux-kernel, Michal Marek, Daniel Borkmann

With the brand-new syntax extension of Kconfig, we can directly
check the compiler capability in the configuration phase.

If the cc-can-link.sh fails, the BPFILTER_UMH is automatically
hidden by the dependency.

I also deleted 'default n', which is no-op.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
---

 Makefile               | 5 -----
 net/Makefile           | 4 ----
 net/bpfilter/Kconfig   | 2 +-
 scripts/cc-can-link.sh | 2 +-
 4 files changed, 2 insertions(+), 11 deletions(-)

diff --git a/Makefile b/Makefile
index a6d4872..f71ea52 100644
--- a/Makefile
+++ b/Makefile
@@ -520,11 +520,6 @@ ifeq ($(shell $(CONFIG_SHELL) $(srctree)/scripts/gcc-goto.sh $(CC) $(KBUILD_CFLA
   KBUILD_AFLAGS += -DCC_HAVE_ASM_GOTO
 endif
 
-ifeq ($(shell $(CONFIG_SHELL) $(srctree)/scripts/cc-can-link.sh $(CC)), y)
-  CC_CAN_LINK := y
-  export CC_CAN_LINK
-endif
-
 # The expansion should be delayed until arch/$(SRCARCH)/Makefile is included.
 # Some architectures define CROSS_COMPILE in arch/$(SRCARCH)/Makefile.
 # CC_VERSION_TEXT is referenced from Kconfig (so it needs export),
diff --git a/net/Makefile b/net/Makefile
index 13ec0d5..bdaf539 100644
--- a/net/Makefile
+++ b/net/Makefile
@@ -20,11 +20,7 @@ obj-$(CONFIG_TLS)		+= tls/
 obj-$(CONFIG_XFRM)		+= xfrm/
 obj-$(CONFIG_UNIX)		+= unix/
 obj-$(CONFIG_NET)		+= ipv6/
-ifneq ($(CC_CAN_LINK),y)
-$(warning CC cannot link executables. Skipping bpfilter.)
-else
 obj-$(CONFIG_BPFILTER)		+= bpfilter/
-endif
 obj-$(CONFIG_PACKET)		+= packet/
 obj-$(CONFIG_NET_KEY)		+= key/
 obj-$(CONFIG_BRIDGE)		+= bridge/
diff --git a/net/bpfilter/Kconfig b/net/bpfilter/Kconfig
index a948b07..76deb66 100644
--- a/net/bpfilter/Kconfig
+++ b/net/bpfilter/Kconfig
@@ -1,6 +1,5 @@
 menuconfig BPFILTER
 	bool "BPF based packet filtering framework (BPFILTER)"
-	default n
 	depends on NET && BPF && INET
 	help
 	  This builds experimental bpfilter framework that is aiming to
@@ -9,6 +8,7 @@ menuconfig BPFILTER
 if BPFILTER
 config BPFILTER_UMH
 	tristate "bpfilter kernel module with user mode helper"
+	depends on $(success,$(srctree)/scripts/cc-can-link.sh $(CC))
 	default m
 	help
 	  This builds bpfilter kernel module with embedded user mode helper
diff --git a/scripts/cc-can-link.sh b/scripts/cc-can-link.sh
index 208eb28..6efcead 100755
--- a/scripts/cc-can-link.sh
+++ b/scripts/cc-can-link.sh
@@ -1,7 +1,7 @@
 #!/bin/sh
 # SPDX-License-Identifier: GPL-2.0
 
-cat << "END" | $@ -x c - -o /dev/null >/dev/null 2>&1 && echo "y"
+cat << "END" | $@ -x c - -o /dev/null >/dev/null 2>&1
 #include <stdio.h>
 int main(void)
 {
-- 
2.7.4

^ permalink raw reply related

* Re: [PATCH net-next] net: preserve sock reference when scrubbing the skb.
From: Cong Wang @ 2018-06-26  4:15 UTC (permalink / raw)
  To: Flavio Leitner
  Cc: Linux Kernel Network Developers, Eric Dumazet, Paolo Abeni,
	David Miller, Florian Westphal, NetFilter
In-Reply-To: <20180625155610.30802-1-fbl@redhat.com>

On Mon, Jun 25, 2018 at 8:59 AM Flavio Leitner <fbl@redhat.com> wrote:
>
> The sock reference is lost when scrubbing the packet and that breaks
> TSQ (TCP Small Queues) and XPS (Transmit Packet Steering) causing
> performance impacts of about 50% in a single TCP stream when crossing
> network namespaces.
>
> XPS breaks because the queue mapping stored in the socket is not
> available, so another random queue might be selected when the stack
> needs to transmit something like a TCP ACK, or TCP Retransmissions.
> That causes packet re-ordering and/or performance issues.
>
> TSQ breaks because it orphans the packet while it is still in the
> host, so packets are queued contributing to the buffer bloat problem.

Why should TSQ in one stack care about buffer bloat in another stack?

Actually, I think the current behavior is correct, once the packet leaves
its current stack (or netns), it should relief the backpressure on TCP
socket in this stack, whether it will be queued in another stack is beyond
its concern. This breaks the isolation between networking stacks.

^ permalink raw reply

* Re: [PATCH rdma-next 08/12] overflow.h: Add arithmetic shift helper
From: Leon Romanovsky @ 2018-06-26  4:16 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Rasmus Villemoes, Doug Ledford, Kees Cook, RDMA mailing list,
	Hadar Hen Zion, Matan Barak, Michael J Ruhl, Noa Osherovich,
	Raed Salem, Yishai Hadas, Saeed Mahameed, linux-netdev,
	linux-kernel
In-Reply-To: <20180625171157.GE5356@mellanox.com>

[-- Attachment #1: Type: text/plain, Size: 1413 bytes --]

On Mon, Jun 25, 2018 at 11:11:57AM -0600, Jason Gunthorpe wrote:
> On Mon, Jun 25, 2018 at 11:26:05AM +0200, Rasmus Villemoes wrote:
>
> >    check_shift_overflow(a, s, d) {
> >        unsigned _nbits = 8*sizeof(a);
> >        typeof(a) _a = (a);
> >        typeof(s) _s = (s);
> >        typeof(d) _d = (d);
> >
> >        *_d = ((u64)(_a) << (_s & (_nbits-1)));
> >        _s >= _nbits || (_s > 0 && (_a >> (_nbits - _s -
> >    is_signed_type(a))) != 0);
> >    }
>
> Those types are not quite right.. What about this?
>
>     check_shift_overflow(a, s, d) ({
>         unsigned int _nbits = 8*sizeof(d) - is_signed_type(d);
>         typeof(d) _a = a;  // Shift is always performed on type 'd'
>         typeof(s) _s = s;
>         typeof(d) _d = d;
>
>         *_d = (_a << (_s & (_nbits-1)));
>
> 	(((*_d) >> (_s & (_nbits-1)) != _a);
>     })
>
> And can we use mathamatcial invertability to prove no overlow and
> bound _a ? As above.

Rasmus and Jason,

Thanks for the feedback.
The reason why I introduced function, because wanted to reuse
check_mul_overflow macro, but for any reasons which I don't remember
now, I had hard time to fix compilation errors.

Anyway, I'll resubmit.

Thanks


>
> Jason
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 801 bytes --]

^ permalink raw reply

* Re: [PATCH rdma-next 00/12] RDMA fixes 2018-06-24
From: Leon Romanovsky @ 2018-06-26  4:21 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Doug Ledford, RDMA mailing list, Matan Barak, Michael J Ruhl,
	Noa Osherovich, Raed Salem, Yishai Hadas, Saeed Mahameed,
	linux-netdev
In-Reply-To: <20180625213438.GA19857@ziepe.ca>

[-- Attachment #1: Type: text/plain, Size: 1720 bytes --]

On Mon, Jun 25, 2018 at 03:34:38PM -0600, Jason Gunthorpe wrote:
> On Sun, Jun 24, 2018 at 11:23:41AM +0300, Leon Romanovsky wrote:
> > From: Leon Romanovsky <leonro@mellanox.com>
> >
> > Hi,
> >
> > This is bunch of patches trigged by running syzkaller internally.
> >
> > I'm sending them based on rdma-next mainly for two reasons:
> > 1, Most of the patches fix the old issues and it doesn't matter when
> > they will hit the Linus's tree: now or later in a couple of weeks
> > during merge window.
> > 2. They interleave with code cleanup, mlx5-next patches and Michael's
> > feedback on flow counters series.
> >
> > Thanks
> >
> > Leon Romanovsky (12):
> >   RDMA/uverbs: Protect from attempts to create flows on unsupported QP
> >   RDMA/uverbs: Fix slab-out-of-bounds in ib_uverbs_ex_create_flow
>
> I applied these two to for-rc
>
> >   RDMA/uverbs: Check existence of create_flow callback
> >   RDMA/verbs: Drop kernel variant of create_flow
> >   RDMA/verbs: Drop kernel variant of destroy_flow
> >   net/mlx5: Rate limit errors in command interface
> >   RDMA/uverbs: Don't overwrite NULL pointer with ZERO_SIZE_PTR
> >   RDMA/umem: Don't check for negative return value of dma_map_sg_attrs()
> >   RDMA/uverbs: Remove redundant check
>
> These to for-next

Jason,

We would like to see patch "[PATCH mlx5-next 05/12] net/mlx5:
Rate limit errors in command interface" in out mlx5-next. Is it possible
at this point to drop it from for-next, so I'll be able to take it into
mlx5-next?

Thanks

>
> >   overflow.h: Add arithmetic shift helper
> >   RDMA/mlx5: Fix shift overflow in mlx5_ib_create_wq
> >   RDMA/mlx5: Reuse existed shift_overlow helper
>
> And these will have to be respun.
>
> Thanks,
> Jason

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 801 bytes --]

^ permalink raw reply

* Re: INFO: rcu detected stall in vprintk_emit
From: Sergey Senozhatsky @ 2018-06-26  4:22 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Sergey Senozhatsky, syzbot, linux-kernel, pmladek,
	sergey.senozhatsky, syzkaller-bugs, Samuel Ortiz, David S. Miller,
	linux-wireless, netdev
In-Reply-To: <20180625225937.43aee76c@vmware.local.home>

On (06/25/18 22:59), Steven Rostedt wrote:
> > Or ratelimited error reporting and cond_resched()
> > [that would be option B]:
> 
> I don't think this is a printk() issue per se, so I think Option B is
> the only option. You should not get stuck in an infinite loop if we run
> short on memory. Perhaps we could have an Option C which would exit
> this loop gracefully with some kind of error. But I haven't looked at
> the surrounding code to be sure if that is possible.

Agree. I like Option B - an endless loop is the root cause, at the same
time filling up logbuf with useless data is useless. Can't tell if Option C
is feasible, up to networking people.

	-ss

^ permalink raw reply

* Re: [PATCH rdma-next 08/12] overflow.h: Add arithmetic shift helper
From: Leon Romanovsky @ 2018-06-26  4:24 UTC (permalink / raw)
  To: Rasmus Villemoes
  Cc: Doug Ledford, Jason Gunthorpe, Kees Cook, RDMA mailing list,
	Hadar Hen Zion, Matan Barak, Michael J Ruhl, Noa Osherovich,
	Raed Salem, Yishai Hadas, Saeed Mahameed, linux-netdev,
	linux-kernel
In-Reply-To: <CAKwiHFhgsyWYD+q+JFb2HJEphnjiiOp=o4Airv3MW031q2jx8w@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 2242 bytes --]

On Mon, Jun 25, 2018 at 11:26:05AM +0200, Rasmus Villemoes wrote:
> On 24 June 2018 at 10:23, Leon Romanovsky <leon@kernel.org> wrote:
>
> > From: Leon Romanovsky <leonro@mellanox.com>
> >
> > Add shift_overflow() helper to help driver authors to ensure that
> > shift operand doesn't cause to overflow, which is very common pattern
> > for RDMA drivers.
> >
>
> Not a huge fan. The other _overflow functions have a different behaviour
> (in how they return the result and the overflow status) and are
> type-generic, and I think someone at some point will use such a
> generically-named helper for stuff other than size_t. At least the
> array_size and struct_size helpers have size in their name and are
> specifically about computing the size of something, and are designed to be
> used directly as arguments to allocators, where SIZE_MAX is a suitable
> sentinel. I can't see the other patches in this series, so I don't know how
> you plan on using it, but it should also be usable outside rdma.
>
> Aside: why does b have type size_t?
>
> Does __must_check really make sense for a function without side effects? It
> doesn't tell gcc to warn if the result is not used in a conditional, it
> just warns if the result is not used at all, which wouldn't realistically
> happen for a pure function.
>
> I'd much rather see a type-generic check_shift_overflow (we can agree to
> leave "left" out of the name) with semantics similar to the other
> check_*_overflow functions. Then, if a size_t-eating, SIZE_MAX-returning
> helper is more convenient for rdma, that should be easy to implement on top
> of that. It shouldn't really be that hard to do. Something like
>
> check_shift_overflow(a, s, d) {
>     unsigned _nbits = 8*sizeof(a);
>     typeof(a) _a = (a);
>     typeof(s) _s = (s);
>     typeof(d) _d = (d);
>
>     *_d = ((u64)(_a) << (_s & (_nbits-1)));
>
>     _s >= _nbits || (_s > 0 && (_a >> (_nbits - _s - is_signed_type(a))) !=
> 0);
> }
>
> which should also handle shifts of signed types (though it allows << 0 for
> negative values; that's easy to also disallow). But the exact semantics
> should be documented via a bunch of tests (hint hint) exercising corner
> cases.

I'll respin.

Thanks for the feedback.

>
> Rasmus

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 801 bytes --]

^ permalink raw reply

* Re: [PATCH net-next 3/5] sctp: add spp_ipv6_flowlabel and spp_dscp for sctp_paddrparams
From: Xin Long @ 2018-06-26  4:33 UTC (permalink / raw)
  To: Marcelo Ricardo Leitner
  Cc: 吉藤英明, Neil Horman, David Miller,
	network dev, linux-sctp, yoshfuji
In-Reply-To: <20180625163157.GA542@localhost.localdomain>

On Tue, Jun 26, 2018 at 12:31 AM, Marcelo Ricardo Leitner
<marcelo.leitner@gmail.com> wrote:
> Hi,
>
> On Tue, Jun 26, 2018 at 01:12:00AM +0900, 吉藤英明 wrote:
>> Hi,
>>
>> 2018-06-25 22:03 GMT+09:00 Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>:
>> > On Mon, Jun 25, 2018 at 07:28:47AM -0400, Neil Horman wrote:
>> >> On Mon, Jun 25, 2018 at 04:31:26PM +0900, David Miller wrote:
>> >> > From: Xin Long <lucien.xin@gmail.com>
>> >> > Date: Mon, 25 Jun 2018 10:14:35 +0800
>> >> >
>> >> > >  struct sctp_paddrparams {
>> >> > > @@ -773,6 +775,8 @@ struct sctp_paddrparams {
>> >> > >   __u32                   spp_pathmtu;
>> >> > >   __u32                   spp_sackdelay;
>> >> > >   __u32                   spp_flags;
>> >> > > + __u32                   spp_ipv6_flowlabel;
>> >> > > + __u8                    spp_dscp;
>> >> > >  } __attribute__((packed, aligned(4)));
>> >> >
>> >> > I don't think you can change the size of this structure like this.
>> >> >
>> >> > This check in sctp_setsockopt_peer_addr_params():
>> >> >
>> >> >     if (optlen != sizeof(struct sctp_paddrparams))
>> >> >             return -EINVAL;
>> >> >
>> >> > is going to trigger in old kernels when executing programs
>> >> > built against the new struct definition.
>> >
>> > That will happen, yes, but do we really care about being future-proof
>> > here? I mean: if we also update such check(s) to support dealing with
>> > smaller-than-supported structs, newer kernels will be able to run
>> > programs built against the old struct, and the new one; while building
>> > using newer headers and running on older kernel may fool the
>> > application in other ways too (like enabling support for something
>> > that is available on newer kernel and that is not present in the older
>> > one).
>>
>> We should not break existing apps.
>> We still accept apps of pre-2.4 era without sin6_scope_id
>> (e.g., net/ipv6/af_inet6.c:inet6_bind()).
>
> Yes. That's what I tried to say. That is supporting an old app built
> with old kernel headers and running on a newer kernel, and not the
> other way around (an app built with fresh headers and running on an
> old kernel).
To make it, I will update the check like:

diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index 1df5d07..c949d8c 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -2715,13 +2715,18 @@ static int
sctp_setsockopt_peer_addr_params(struct sock *sk,
        struct sctp_sock        *sp = sctp_sk(sk);
        int error;
        int hb_change, pmtud_change, sackdelay_change;
+       int plen = sizeof(params);
+       int old_plen = plen - sizeof(u32) * 2;

-       if (optlen != sizeof(struct sctp_paddrparams))
+       if (optlen != plen && optlen != old_plen)
                return -EINVAL;

        if (copy_from_user(&params, optval, optlen))
                return -EFAULT;

+       if (optlen == old_plen)
+               params.spp_flags &= ~(SPP_DSCP | SPP_IPV6_FLOWLABEL);
+
        /* Validate flags and value parameters. */
        hb_change        = params.spp_flags & SPP_HB;
        pmtud_change     = params.spp_flags & SPP_PMTUD;
@@ -5591,10 +5596,13 @@ static int
sctp_getsockopt_peer_addr_params(struct sock *sk, int len,
        struct sctp_transport   *trans = NULL;
        struct sctp_association *asoc = NULL;
        struct sctp_sock        *sp = sctp_sk(sk);
+       int plen = sizeof(params);
+       int old_plen = plen - sizeof(u32) * 2;

-       if (len < sizeof(struct sctp_paddrparams))
+       if (len < old_plen)
                return -EINVAL;
-       len = sizeof(struct sctp_paddrparams);
+
+       len = len >= plen ? plen : old_plen;
        if (copy_from_user(&params, optval, len))
                return -EFAULT;

does it look ok to you?

^ permalink raw reply related

* Re: [PATCH net-next 1/5] ipv4: add __ip_queue_xmit() that supports tos param
From: Xin Long @ 2018-06-26  4:38 UTC (permalink / raw)
  To: Neil Horman
  Cc: David Miller, network dev, linux-sctp, Marcelo Ricardo Leitner
In-Reply-To: <20180625111312.GA16772@hmswarspite.think-freely.org>

On Mon, Jun 25, 2018 at 7:13 PM, Neil Horman <nhorman@tuxdriver.com> wrote:
> On Mon, Jun 25, 2018 at 04:26:54PM +0900, David Miller wrote:
>> From: Xin Long <lucien.xin@gmail.com>
>> Date: Mon, 25 Jun 2018 10:14:33 +0800
>>
>> > +EXPORT_SYMBOL(__ip_queue_xmit);
>> > +
>> > +int ip_queue_xmit(struct sock *sk, struct sk_buff *skb, struct flowi *fl)
>> > +{
>> > +   return __ip_queue_xmit(sk, skb, fl, inet_sk(sk)->tos);
>> > +}
>> >  EXPORT_SYMBOL(ip_queue_xmit);
>>
>> Maybe better to only export __ip_queue_xmit() and make ip_queue_xmit() an
>> inline function in net/ip.h?
>>
> I concur.  No need to export both here
>
will do that.

^ permalink raw reply

* [PATCH] NFC: llcp: fix nfc_llcp_send_ui_frame() lockup
From: Sergey Senozhatsky @ 2018-06-26  4:41 UTC (permalink / raw)
  To: Samuel Ortiz, David S. Miller
  Cc: Steven Rostedt, Petr Mladek, syzkaller-bugs, linux-wireless,
	netdev, linux-kernel, syzbot, sergey.senozhatsky

syzbot reported the following nfc_llcp_send_ui_frame() lockup:

The kernel is CONFIG_PREEMPT_VOLUNTARY=y, llcp_sock_sendmsg() stuck
in an infinite error reporting loop, because the system is low memory
and MSG_DONTWAIT nfc_alloc_send_skb() allocations fail:

        do {
        ...
                pdu = nfc_alloc_send_skb(sock->dev, &sock->sk, MSG_DONTWAIT,
                                         frag_len + LLCP_HEADER_SIZE, &err);
                if (pdu == NULL) {
                        pr_err("Could not allocate PDU\n");
                        continue;
                }
        ...
        } while (remaining_len > 0);

nfc_llcp_send_ui_frame() spent enough time (94+ sec) trying to
allocate PDU, which resulted in RCU stall due to PREEMPT_VOLUNTARY:

 llcp: nfc_llcp_send_ui_frame: Could not allocate PDU
 llcp: nfc_llcp_send_ui_frame: Could not allocate PDU
...
 llcp: nfc_llcp_send_ui_frame: Could not allocate PDU
 llcp: nfc_llcp_send_ui_frame: Could not allocate PDU
 INFO: rcu_sched self-detected stall on CPU
         1-....: (20918 ticks this GP) idle=55a/1/4611686018427387906
 softirq=11347/11347 fqs=20240
          (t=125005 jiffies g=5572 c=5571 q=149)
 NMI backtrace for cpu 1
 CPU: 1 PID: 4811 Comm: syz-executor0 Not tainted 4.18.0-rc1+ #115
 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google
 01/01/2011
 Call Trace:
  <IRQ>
  __dump_stack lib/dump_stack.c:77 [inline]
  dump_stack+0x1c9/0x2b4 lib/dump_stack.c:113
  nmi_cpu_backtrace.cold.4+0x19/0xce lib/nmi_backtrace.c:103
  nmi_trigger_cpumask_backtrace+0x151/0x192 lib/nmi_backtrace.c:62
  arch_trigger_cpumask_backtrace+0x14/0x20 arch/x86/kernel/apic/hw_nmi.c:38
  trigger_single_cpu_backtrace include/linux/nmi.h:156 [inline]
  rcu_dump_cpu_stacks+0x175/0x1c2 kernel/rcu/tree.c:1336
  print_cpu_stall kernel/rcu/tree.c:1485 [inline]
  check_cpu_stall.isra.60.cold.78+0x36c/0x5a6 kernel/rcu/tree.c:1553
  __rcu_pending kernel/rcu/tree.c:3244 [inline]
  rcu_pending kernel/rcu/tree.c:3291 [inline]
  rcu_check_callbacks+0x23f/0xcd0 kernel/rcu/tree.c:2646
  update_process_times+0x2d/0x70 kernel/time/timer.c:1636
  tick_sched_handle+0x9f/0x180 kernel/time/tick-sched.c:164
  tick_sched_timer+0x45/0x130 kernel/time/tick-sched.c:1274
  __run_hrtimer kernel/time/hrtimer.c:1398 [inline]
  __hrtimer_run_queues+0x3eb/0x10c0 kernel/time/hrtimer.c:1460
  hrtimer_interrupt+0x2f3/0x750 kernel/time/hrtimer.c:1518
  local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1025 [inline]
  smp_apic_timer_interrupt+0x165/0x730 arch/x86/kernel/apic/apic.c:1050
  apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:863
  </IRQ>
 RIP: 0010:arch_local_irq_restore arch/x86/include/asm/paravirt.h:783 [inline]
 RIP: 0010:console_unlock+0xc84/0x10b0 kernel/printk/printk.c:2397
 Code: c1 e8 03 42 80 3c 38 00 0f 85 bd 03 00 00 48 83 3d 38 f7 8e 07 00 0f 84
 69 02 00 00 e8 45 56 19 00 48 8b bd b0 fe ff ff 57 9d <0f> 1f 44 00 00 e9 96
 f5 ff ff e8 2d 56 19 00 48 8b 7d 08 e8 94 cf
 RSP: 0018:ffff8801aab0f358 EFLAGS: 00000293 ORIG_RAX: ffffffffffffff13
 RAX: ffff8801aa2802c0 RBX: 0000000000000200 RCX: 1ffff10035450163
 RDX: 0000000000000000 RSI: ffffffff8162b8fb RDI: 0000000000000293
 RBP: ffff8801aab0f4c0 R08: ffff8801aa280af8 R09: 0000000000000006
 R10: ffff8801aa2802c0 R11: 0000000000000000 R12: 0000000000000000
 R13: ffffffff84ea9880 R14: 0000000000000001 R15: dffffc0000000000
  vprintk_emit+0x6c6/0xdf0 kernel/printk/printk.c:1907
  vprintk_default+0x28/0x30 kernel/printk/printk.c:1948
  vprintk_func+0x7a/0xe7 kernel/printk/printk_safe.c:382
  printk+0xa7/0xcf kernel/printk/printk.c:1981
  nfc_llcp_send_ui_frame.cold.9+0x18/0x1f net/nfc/llcp_commands.c:758
  llcp_sock_sendmsg+0x278/0x350 net/nfc/llcp_sock.c:786
  sock_sendmsg_nosec net/socket.c:645 [inline]
  sock_sendmsg+0xd5/0x120 net/socket.c:655
  ___sys_sendmsg+0x51d/0x930 net/socket.c:2161
  __sys_sendmmsg+0x240/0x6f0 net/socket.c:2256
  __do_sys_sendmmsg net/socket.c:2285 [inline]
  __se_sys_sendmmsg net/socket.c:2282 [inline]
  __x64_sys_sendmmsg+0x9d/0x100 net/socket.c:2282
  do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
  entry_SYSCALL_64_after_hwframe+0x49/0xbe

Address the issues by rate limiting nfc_alloc_send_skb() allocation
error, to avoid logbuf pollution, and do cond_resched() before llcp
attempts to allocate PDU again.

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reported-by: syzbot+d29d18215e477cfbfbdd@syzkaller.appspotmail.com
---
 net/nfc/llcp_commands.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/nfc/llcp_commands.c b/net/nfc/llcp_commands.c
index 2ceefa183cee..e19fadaa9022 100644
--- a/net/nfc/llcp_commands.c
+++ b/net/nfc/llcp_commands.c
@@ -20,6 +20,7 @@
 #include <linux/init.h>
 #include <linux/kernel.h>
 #include <linux/module.h>
+#include <linux/sched.h>
 #include <linux/nfc.h>
 
 #include <net/nfc/nfc.h>
@@ -755,7 +756,8 @@ int nfc_llcp_send_ui_frame(struct nfc_llcp_sock *sock, u8 ssap, u8 dsap,
 		pdu = nfc_alloc_send_skb(sock->dev, &sock->sk, MSG_DONTWAIT,
 					 frag_len + LLCP_HEADER_SIZE, &err);
 		if (pdu == NULL) {
-			pr_err("Could not allocate PDU\n");
+			pr_err_ratelimited("Could not allocate PDU\n");
+			cond_resched();
 			continue;
 		}
 
-- 
2.18.0

^ permalink raw reply related

* Re: [PATCHv2 net-next] sctp: add support for SCTP_REUSE_PORT sockopt
From: Xin Long @ 2018-06-26  4:41 UTC (permalink / raw)
  To: Marcelo Ricardo Leitner
  Cc: network dev, linux-sctp, Neil Horman, Michael Tuexen, davem
In-Reply-To: <20180625133024.GB820@localhost.localdomain>

On Mon, Jun 25, 2018 at 9:30 PM, Marcelo Ricardo Leitner
<marcelo.leitner@gmail.com> wrote:
> On Mon, Jun 25, 2018 at 10:06:46AM +0800, Xin Long wrote:
>> This feature is actually already supported by sk->sk_reuse which can be
>> set by socket level opt SO_REUSEADDR. But it's not working exactly as
>> RFC6458 demands in section 8.1.27, like:
>>
>>   - This option only supports one-to-one style SCTP sockets
>>   - This socket option must not be used after calling bind()
>>     or sctp_bindx().
>>
>> Besides, SCTP_REUSE_PORT sockopt should be provided for user's programs.
>> Otherwise, the programs with SCTP_REUSE_PORT from other systems will not
>> work in linux.
>>
>> To separate it from the socket level version, this patch adds 'reuse' in
>> sctp_sock and it works pretty much as sk->sk_reuse, but with some extra
>> setup limitations that are needed when it is being enabled.
>>
>> "It should be noted that the behavior of the socket-level socket option
>> to reuse ports and/or addresses for SCTP sockets is unspecified", so it
>> leaves SO_REUSEADDR as is for the compatibility.
>>
>> Note that the name SCTP_REUSE_PORT is kind of confusing, it is identical
>> to SO_REUSEADDR with some extra restriction, so here it uses 'reuse' in
>> sctp_sock instead of 'reuseport'. As for sk->sk_reuseport support for
>> SCTP, it will be added in another patch.
>
> To help changelog readers later, please update to something like:
>
> """\
> Note that the name SCTP_REUSE_PORT is somewhat confusing, as its
> functionality is nearly identical to SO_REUSEADDR, but with some
> extra restrictions. Here it uses 'reuse' in sctp_sock instead of
> 'reuseport'. As for sk->sk_reuseport support for SCTP, it will be
> added in another patch.
> """
>
> Makes sense, can you note the difference?
Sure, will post v3. thanks.

^ permalink raw reply

* Re: INFO: rcu detected stall in vprintk_emit
From: Dmitry Vyukov @ 2018-06-26  4:56 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: syzbot, LKML, Petr Mladek, Steven Rostedt, Sergey Senozhatsky,
	syzkaller-bugs, Samuel Ortiz, David S. Miller, linux-wireless,
	netdev
In-Reply-To: <20180626014924.GB11229@jagdpanzerIV>

On Tue, Jun 26, 2018 at 3:49 AM, Sergey Senozhatsky
<sergey.senozhatsky.work@gmail.com> wrote:
> On (06/25/18 16:19), syzbot wrote:
>> Hello,
>>
>> syzbot found the following crash on:
>>
>> HEAD commit:    77072ca59fdd Merge tag 'for-linus-20180623' of git://git.k..
>> git tree:       upstream
>> console output: https://syzkaller.appspot.com/x/log.txt?x=169c7c04400000
>> kernel config:  https://syzkaller.appspot.com/x/.config?x=befbcd7305e41bb0
>> dashboard link: https://syzkaller.appspot.com/bug?extid=d29d18215e477cfbfbdd
>> compiler:       gcc (GCC) 8.0.1 20180413 (experimental)
>> syzkaller repro:https://syzkaller.appspot.com/x/repro.syz?x=1585147f800000
>>
>> IMPORTANT: if you fix the bug, please add the following tag to the commit:
>> Reported-by: syzbot+d29d18215e477cfbfbdd@syzkaller.appspotmail.com
>>
>> llcp: nfc_llcp_send_ui_frame: Could not allocate PDU
>> llcp: nfc_llcp_send_ui_frame: Could not allocate PDU
>> llcp: nfc_llcp_send_ui_frame: Could not allocate PDU
>> llcp: nfc_llcp_send_ui_frame: Could not allocate PDU
>> llcp: nfc_llcp_send_ui_frame: Could not allocate PDU
>
> Hi,
>
> Thanks for the report.
>
> I'll Cc networking people on this.
>
> I've a strong feeling that we saw it before.

A very similar bug was reported before:

https://groups.google.com/d/msg/syzkaller-bugs/Axw2t6DvU60/TfLUoXsjBAAJ



> The kernel is
> CONFIG_PREEMPT_VOLUNTARY=y, llcp_sock_sendmsg() stuck in a error
> reporting loop:
>
>         do {
>         ...
>                 pdu = nfc_alloc_send_skb(sock->dev, &sock->sk, MSG_DONTWAIT,
>                                          frag_len + LLCP_HEADER_SIZE, &err);
>                 if (pdu == NULL) {
>                         pr_err("Could not allocate PDU\n");
>                         continue;
>                 }
>         ...
>         } while (remaining_len > 0);
>
> [ 1004.674843] llcp: nfc_llcp_send_ui_frame: Could not allocate PDU
> [ 1004.681035] llcp: nfc_llcp_send_ui_frame: Could not allocate PDU
> ...
> [ 1098.508526] llcp: nfc_llcp_send_ui_frame: Could not allocate PDU
> [ 1098.514698] llcp: nfc_llcp_send_ui_frame: Could not allocate PDU
> [ 1098.520844] INFO: rcu_sched self-detected stall on CPU
>
> 94 seconds worth of heavy printing, no preemption:
>
>> INFO: rcu_sched self-detected stall on CPU
>>       1-....: (20918 ticks this GP) idle=55a/1/4611686018427387906
>> softirq=11347/11347 fqs=20240
>>        (t=125005 jiffies g=5572 c=5571 q=149)
>> NMI backtrace for cpu 1
>> CPU: 1 PID: 4811 Comm: syz-executor0 Not tainted 4.18.0-rc1+ #115
>> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
>> Google 01/01/2011
>> Call Trace:
>>  <IRQ>
>>  __dump_stack lib/dump_stack.c:77 [inline]
>>  dump_stack+0x1c9/0x2b4 lib/dump_stack.c:113
>>  nmi_cpu_backtrace.cold.4+0x19/0xce lib/nmi_backtrace.c:103
>>  nmi_trigger_cpumask_backtrace+0x151/0x192 lib/nmi_backtrace.c:62
>>  arch_trigger_cpumask_backtrace+0x14/0x20 arch/x86/kernel/apic/hw_nmi.c:38
>>  trigger_single_cpu_backtrace include/linux/nmi.h:156 [inline]
>>  rcu_dump_cpu_stacks+0x175/0x1c2 kernel/rcu/tree.c:1336
>>  print_cpu_stall kernel/rcu/tree.c:1485 [inline]
>>  check_cpu_stall.isra.60.cold.78+0x36c/0x5a6 kernel/rcu/tree.c:1553
>>  __rcu_pending kernel/rcu/tree.c:3244 [inline]
>>  rcu_pending kernel/rcu/tree.c:3291 [inline]
>>  rcu_check_callbacks+0x23f/0xcd0 kernel/rcu/tree.c:2646
>>  update_process_times+0x2d/0x70 kernel/time/timer.c:1636
>>  tick_sched_handle+0x9f/0x180 kernel/time/tick-sched.c:164
>>  tick_sched_timer+0x45/0x130 kernel/time/tick-sched.c:1274
>>  __run_hrtimer kernel/time/hrtimer.c:1398 [inline]
>>  __hrtimer_run_queues+0x3eb/0x10c0 kernel/time/hrtimer.c:1460
>>  hrtimer_interrupt+0x2f3/0x750 kernel/time/hrtimer.c:1518
>>  local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1025 [inline]
>>  smp_apic_timer_interrupt+0x165/0x730 arch/x86/kernel/apic/apic.c:1050
>>  apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:863
>>  </IRQ>
>> RIP: 0010:arch_local_irq_restore arch/x86/include/asm/paravirt.h:783
>> [inline]
>> RIP: 0010:console_unlock+0xc84/0x10b0 kernel/printk/printk.c:2397
>> Code: c1 e8 03 42 80 3c 38 00 0f 85 bd 03 00 00 48 83 3d 38 f7 8e 07 00 0f
>> 84 69 02 00 00 e8 45 56 19 00 48 8b bd b0 fe ff ff 57 9d <0f> 1f 44 00 00 e9
>> 96 f5 ff ff e8 2d 56 19 00 48 8b 7d 08 e8 94 cf
>> RSP: 0018:ffff8801aab0f358 EFLAGS: 00000293 ORIG_RAX: ffffffffffffff13
>> RAX: ffff8801aa2802c0 RBX: 0000000000000200 RCX: 1ffff10035450163
>> RDX: 0000000000000000 RSI: ffffffff8162b8fb RDI: 0000000000000293
>> RBP: ffff8801aab0f4c0 R08: ffff8801aa280af8 R09: 0000000000000006
>> R10: ffff8801aa2802c0 R11: 0000000000000000 R12: 0000000000000000
>> R13: ffffffff84ea9880 R14: 0000000000000001 R15: dffffc0000000000
>>  vprintk_emit+0x6c6/0xdf0 kernel/printk/printk.c:1907
>>  vprintk_default+0x28/0x30 kernel/printk/printk.c:1948
>>  vprintk_func+0x7a/0xe7 kernel/printk/printk_safe.c:382
>>  printk+0xa7/0xcf kernel/printk/printk.c:1981
>>  nfc_llcp_send_ui_frame.cold.9+0x18/0x1f net/nfc/llcp_commands.c:758
>>  llcp_sock_sendmsg+0x278/0x350 net/nfc/llcp_sock.c:786
>>  sock_sendmsg_nosec net/socket.c:645 [inline]
>>  sock_sendmsg+0xd5/0x120 net/socket.c:655
>>  ___sys_sendmsg+0x51d/0x930 net/socket.c:2161
>>  __sys_sendmmsg+0x240/0x6f0 net/socket.c:2256
>>  __do_sys_sendmmsg net/socket.c:2285 [inline]
>>  __se_sys_sendmmsg net/socket.c:2282 [inline]
>>  __x64_sys_sendmmsg+0x9d/0x100 net/socket.c:2282
>>  do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
>>  entry_SYSCALL_64_after_hwframe+0x49/0xbe
>
>
> So we can try switching to ratelimited error reporting
> [that would be option A]:
>
> ---
>
> diff --git a/net/nfc/llcp_commands.c b/net/nfc/llcp_commands.c
> index 2ceefa183cee..2f3becb709b8 100644
> --- a/net/nfc/llcp_commands.c
> +++ b/net/nfc/llcp_commands.c
> @@ -755,7 +755,7 @@ int nfc_llcp_send_ui_frame(struct nfc_llcp_sock *sock, u8 ssap, u8 dsap,
>                 pdu = nfc_alloc_send_skb(sock->dev, &sock->sk, MSG_DONTWAIT,
>                                          frag_len + LLCP_HEADER_SIZE, &err);
>                 if (pdu == NULL) {
> -                       pr_err("Could not allocate PDU\n");
> +                       pr_err_ratelimited("Could not allocate PDU\n");
>                         continue;
>                 }
>
> ---
>
>
> Or ratelimited error reporting and cond_resched()
> [that would be option B]:
>
> ---
>
> diff --git a/net/nfc/llcp_commands.c b/net/nfc/llcp_commands.c
> index 2ceefa183cee..61741db4c4e6 100644
> --- a/net/nfc/llcp_commands.c
> +++ b/net/nfc/llcp_commands.c
> @@ -755,7 +755,8 @@ int nfc_llcp_send_ui_frame(struct nfc_llcp_sock *sock, u8 ssap, u8 dsap,
>                 pdu = nfc_alloc_send_skb(sock->dev, &sock->sk, MSG_DONTWAIT,
>                                          frag_len + LLCP_HEADER_SIZE, &err);
>                 if (pdu == NULL) {
> -                       pr_err("Could not allocate PDU\n");
> +                       pr_err_ratelimited("Could not allocate PDU\n");
> +                       cond_resched();
>                         continue;
>                 }
>
> ---
>
>         -ss
>
> --
> You received this message because you are subscribed to the Google Groups "syzkaller-bugs" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to syzkaller-bugs+unsubscribe@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/syzkaller-bugs/20180626014924.GB11229%40jagdpanzerIV.
> For more options, visit https://groups.google.com/d/optout.

^ permalink raw reply

* Re: [patch net-next 0/9] net: sched: introduce chain templates support with offloading to mlxsw
From: Jakub Kicinski @ 2018-06-26  4:58 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, davem, jhs, xiyou.wangcong, simon.horman, john.hurley,
	dsahern, mlxsw
In-Reply-To: <20180625210148.9386-1-jiri@resnulli.us>

On Mon, 25 Jun 2018 23:01:39 +0200, Jiri Pirko wrote:
> From: Jiri Pirko <jiri@mellanox.com>
> 
> For the TC clsact offload these days, some of HW drivers need
> to hold a magic ball. The reason is, with the first inserted rule inside
> HW they need to guess what fields will be used for the matching. If
> later on this guess proves to be wrong and user adds a filter with a
> different field to match, there's a problem. Mlxsw resolves it now with
> couple of patterns. Those try to cover as many match fields as possible.
> This aproach is far from optimal, both performance-wise and scale-wise.
> Also, there is a combination of filters that in certain order won't
> succeed.
> 
> Most of the time, when user inserts filters in chain, he knows right away
> how the filters are going to look like - what type and option will they
> have. For example, he knows that he will only insert filters of type
> flower matching destination IP address. He can specify a template that
> would cover all the filters in the chain.

Perhaps it's lack of sleep, but this paragraph threw me a little off
the track.  IIUC the goal of this set is to provide a way to inform the
HW about expected matches before any rule is programmed into the HW.
Not before any rule is added to a particular chain.  One can just use
the first rule in the chain to make a guess about the chain, but thanks
to this set user can configure *all* chains before any rules are added.

And that's needed because once any rule is added the tcam config can no
longer be easily modified?

^ permalink raw reply

* Re: [patch net-next 6/9] net: sched: cls_flower: propagate chain teplate creation and destruction to drivers
From: Jakub Kicinski @ 2018-06-26  5:00 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, davem, jhs, xiyou.wangcong, simon.horman, john.hurley,
	dsahern, mlxsw
In-Reply-To: <20180625210148.9386-7-jiri@resnulli.us>

On Mon, 25 Jun 2018 23:01:45 +0200, Jiri Pirko wrote:
> From: Jiri Pirko <jiri@mellanox.com>
> 
> Introduce a couple of flower offload commands in order to propagate
> template creation/destruction events down to device drivers.
> Drivers may use this information to prepare HW in an optimal way
> for future filter insertions.
> 
> Signed-off-by: Jiri Pirko <jiri@mellanox.com>

> diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
> index d64d43843a3a..276ba25a09c3 100644
> --- a/net/sched/cls_flower.c
> +++ b/net/sched/cls_flower.c
> @@ -1120,6 +1120,43 @@ static void fl_walk(struct tcf_proto *tp, struct tcf_walker *arg)
>  	}
>  }
>  
> +static void fl_hw_create_tmplt(struct tcf_chain *chain,
> +			       struct fl_flow_tmplt *tmplt,
> +			       struct netlink_ext_ack *extack)
> +{
> +	struct tc_cls_flower_offload cls_flower = {};
> +	struct tcf_block *block = chain->block;
> +	struct tcf_exts dummy_exts = { 0, };
> +
> +	cls_flower.common.chain_index = chain->index;

Did you skip extack on purpose?

> +	cls_flower.command = TC_CLSFLOWER_TMPLT_CREATE;
> +	cls_flower.cookie = (unsigned long) tmplt;
> +	cls_flower.dissector = &tmplt->dissector;
> +	cls_flower.mask = &tmplt->mask;
> +	cls_flower.key = &tmplt->dummy_key;
> +	cls_flower.exts = &dummy_exts;
> +
> +	/* We don't care if driver (any of them) fails to handle this
> +	 * call. It serves just as a hint for it.
> +	 */
> +	tc_setup_cb_call(block, NULL, TC_SETUP_CLSFLOWER,
> +			 &cls_flower, false);
> +}

^ permalink raw reply

* Re: INFO: rcu detected stall in vprintk_emit
From: Dmitry Vyukov @ 2018-06-26  5:03 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Sergey Senozhatsky, syzbot, LKML, Petr Mladek, Sergey Senozhatsky,
	syzkaller-bugs, Samuel Ortiz, David S. Miller, linux-wireless,
	netdev
In-Reply-To: <20180625225937.43aee76c@vmware.local.home>

On Tue, Jun 26, 2018 at 4:59 AM, Steven Rostedt <rostedt@goodmis.org> wrote:
> On Tue, 26 Jun 2018 10:49:24 +0900
> Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> wrote:
>
>> So we can try switching to ratelimited error reporting
>> [that would be option A]:
>>
>> ---
>>
>> diff --git a/net/nfc/llcp_commands.c b/net/nfc/llcp_commands.c
>> index 2ceefa183cee..2f3becb709b8 100644
>> --- a/net/nfc/llcp_commands.c
>> +++ b/net/nfc/llcp_commands.c
>> @@ -755,7 +755,7 @@ int nfc_llcp_send_ui_frame(struct nfc_llcp_sock *sock, u8 ssap, u8 dsap,
>>               pdu = nfc_alloc_send_skb(sock->dev, &sock->sk, MSG_DONTWAIT,
>>                                        frag_len + LLCP_HEADER_SIZE, &err);
>>               if (pdu == NULL) {
>> -                     pr_err("Could not allocate PDU\n");
>> +                     pr_err_ratelimited("Could not allocate PDU\n");
>>                       continue;
>>               }
>>
>> ---
>>
>>
>> Or ratelimited error reporting and cond_resched()
>> [that would be option B]:
>
> I don't think this is a printk() issue per se, so I think Option B is
> the only option. You should not get stuck in an infinite loop if we run
> short on memory. Perhaps we could have an Option C which would exit
> this loop gracefully with some kind of error. But I haven't looked at
> the surrounding code to be sure if that is possible.

I suspect this is not even OOM. This is probably some persistent
logical condition that makes the allocation function fail, either we
ask for too much, or socket in some bad state. Potentially this is
even triggerable remotely because there are some remove variables
involved in size calculation.

^ permalink raw reply

* WARNING in sctp_assoc_update_frag_point
From: syzbot @ 2018-06-26  5:06 UTC (permalink / raw)
  To: davem, linux-kernel, linux-sctp, marcelo.leitner, netdev, nhorman,
	syzkaller-bugs, vyasevich

Hello,

syzbot found the following crash on:

HEAD commit:    6f0d349d922b Merge git://git.kernel.org/pub/scm/linux/kern..
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=12a423c0400000
kernel config:  https://syzkaller.appspot.com/x/.config?x=a63be0c83e84d370
dashboard link: https://syzkaller.appspot.com/bug?extid=f0d9d7cba052f9344b03
compiler:       gcc (GCC) 8.0.1 20180413 (experimental)

Unfortunately, I don't have any reproducer for this crash yet.

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+f0d9d7cba052f9344b03@syzkaller.appspotmail.com

WARNING: CPU: 0 PID: 22543 at include/net/sctp/sctp.h:598 sctp_mtu_payload  
include/net/sctp/sctp.h:598 [inline]
WARNING: CPU: 0 PID: 22543 at include/net/sctp/sctp.h:598  
sctp_assoc_update_frag_point+0x252/0x2c0 net/sctp/associola.c:1401
Kernel panic - not syncing: panic_on_warn set ...

CPU: 0 PID: 22543 Comm: syz-executor2 Not tainted 4.18.0-rc2+ #117
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011
Call Trace:
  __dump_stack lib/dump_stack.c:77 [inline]
  dump_stack+0x1c9/0x2b4 lib/dump_stack.c:113
  panic+0x238/0x4e7 kernel/panic.c:184
  __warn.cold.8+0x163/0x1ba kernel/panic.c:536
  report_bug+0x252/0x2d0 lib/bug.c:186
  fixup_bug arch/x86/kernel/traps.c:178 [inline]
  do_error_trap+0x1fc/0x4d0 arch/x86/kernel/traps.c:296
  do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:316
  invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:992
RIP: 0010:sctp_mtu_payload include/net/sctp/sctp.h:598 [inline]
RIP: 0010:sctp_assoc_update_frag_point+0x252/0x2c0 net/sctp/associola.c:1401
Code: 76 fa 45 39 e5 76 1e e8 0c 69 76 fa 45 29 e5 45 89 ec e9 34 ff ff ff  
e8 fc 68 76 fa 45 8d 66 34 e9 09 ff ff ff e8 ee 68 76 fa <0f> 0b 45 31 e4  
e9 17 ff ff ff e8 7f 3c b4 fa e9 31 fe ff ff 4c 89
RSP: 0018:ffff8801d7def378 EFLAGS: 00010216
RAX: 0000000000040000 RBX: ffff8801d8580ac0 RCX: ffffc900133ca000
RDX: 00000000000001b9 RSI: ffffffff8705a382 RDI: 0000000000000004
RBP: ffff8801d7def3a0 R08: ffff8801cfaa6000 R09: ffffed002e0421af
R10: ffffed002e0421af R11: ffff880170210d7f R12: 0000000000000044
R13: 0000000000000044 R14: 0000000000000010 R15: ffff8801d8580ac0
  sctp_assoc_set_pmtu net/sctp/associola.c:1417 [inline]
  sctp_assoc_sync_pmtu+0x251/0x2e0 net/sctp/associola.c:1445
  sctp_sendmsg_to_asoc+0x1851/0x21a0 net/sctp/socket.c:1921
  sctp_sendmsg+0x13c2/0x1d90 net/sctp/socket.c:2124
  inet_sendmsg+0x1a1/0x690 net/ipv4/af_inet.c:798
  sock_sendmsg_nosec net/socket.c:645 [inline]
  sock_sendmsg+0xd5/0x120 net/socket.c:655
  ___sys_sendmsg+0x51d/0x930 net/socket.c:2161
  __sys_sendmmsg+0x240/0x6f0 net/socket.c:2256
  __do_sys_sendmmsg net/socket.c:2285 [inline]
  __se_sys_sendmmsg net/socket.c:2282 [inline]
  __x64_sys_sendmmsg+0x9d/0x100 net/socket.c:2282
  do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
  entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x455a99
Code: 1d ba fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7  
48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff  
ff 0f 83 eb b9 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f75b488ec68 EFLAGS: 00000246 ORIG_RAX: 0000000000000133
RAX: ffffffffffffffda RBX: 00007f75b488f6d4 RCX: 0000000000455a99
RDX: 0000000000000001 RSI: 0000000020005900 RDI: 0000000000000014
RBP: 000000000072bff0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
R13: 00000000004c0b1b R14: 00000000004d0938 R15: 0000000000000002
Dumping ftrace buffer:
    (ftrace buffer empty)
Kernel Offset: disabled
Rebooting in 86400 seconds..


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#bug-status-tracking for how to communicate with  
syzbot.

^ permalink raw reply

* Re: INFO: rcu detected stall in vprintk_emit
From: Sergey Senozhatsky @ 2018-06-26  5:07 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Steven Rostedt, Sergey Senozhatsky, syzbot, LKML, Petr Mladek,
	Sergey Senozhatsky, syzkaller-bugs, Samuel Ortiz, David S. Miller,
	linux-wireless, netdev
In-Reply-To: <CACT4Y+YWFhUk9cdpSR5+yR3+D6mNFUz2WwRcHULCNpj0-4qXyg@mail.gmail.com>

On (06/26/18 07:03), Dmitry Vyukov wrote:
> > I don't think this is a printk() issue per se, so I think Option B is
> > the only option. You should not get stuck in an infinite loop if we run
> > short on memory. Perhaps we could have an Option C which would exit
> > this loop gracefully with some kind of error. But I haven't looked at
> > the surrounding code to be sure if that is possible.
> 
> I suspect this is not even OOM.

Could be. But at the same time it stalls RCU, so OOM at some point
becomes realistic.

	-ss

^ permalink raw reply

* Re: [PATCH] NFC: llcp: fix nfc_llcp_send_ui_frame() lockup
From: Dmitry Vyukov @ 2018-06-26  5:07 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Samuel Ortiz, David S. Miller, Steven Rostedt, Petr Mladek,
	syzkaller-bugs, linux-wireless, netdev, LKML, syzbot,
	Sergey Senozhatsky
In-Reply-To: <20180626044119.30118-1-sergey.senozhatsky@gmail.com>

On Tue, Jun 26, 2018 at 6:41 AM, Sergey Senozhatsky
<sergey.senozhatsky.work@gmail.com> wrote:
> syzbot reported the following nfc_llcp_send_ui_frame() lockup:
>
> The kernel is CONFIG_PREEMPT_VOLUNTARY=y, llcp_sock_sendmsg() stuck
> in an infinite error reporting loop, because the system is low memory
> and MSG_DONTWAIT nfc_alloc_send_skb() allocations fail:
>
>         do {
>         ...
>                 pdu = nfc_alloc_send_skb(sock->dev, &sock->sk, MSG_DONTWAIT,
>                                          frag_len + LLCP_HEADER_SIZE, &err);
>                 if (pdu == NULL) {
>                         pr_err("Could not allocate PDU\n");
>                         continue;
>                 }
>         ...
>         } while (remaining_len > 0);
>
> nfc_llcp_send_ui_frame() spent enough time (94+ sec) trying to
> allocate PDU, which resulted in RCU stall due to PREEMPT_VOLUNTARY:
>
>  llcp: nfc_llcp_send_ui_frame: Could not allocate PDU
>  llcp: nfc_llcp_send_ui_frame: Could not allocate PDU
> ...
>  llcp: nfc_llcp_send_ui_frame: Could not allocate PDU
>  llcp: nfc_llcp_send_ui_frame: Could not allocate PDU
>  INFO: rcu_sched self-detected stall on CPU
>          1-....: (20918 ticks this GP) idle=55a/1/4611686018427387906
>  softirq=11347/11347 fqs=20240
>           (t=125005 jiffies g=5572 c=5571 q=149)
>  NMI backtrace for cpu 1
>  CPU: 1 PID: 4811 Comm: syz-executor0 Not tainted 4.18.0-rc1+ #115
>  Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google
>  01/01/2011
>  Call Trace:
>   <IRQ>
>   __dump_stack lib/dump_stack.c:77 [inline]
>   dump_stack+0x1c9/0x2b4 lib/dump_stack.c:113
>   nmi_cpu_backtrace.cold.4+0x19/0xce lib/nmi_backtrace.c:103
>   nmi_trigger_cpumask_backtrace+0x151/0x192 lib/nmi_backtrace.c:62
>   arch_trigger_cpumask_backtrace+0x14/0x20 arch/x86/kernel/apic/hw_nmi.c:38
>   trigger_single_cpu_backtrace include/linux/nmi.h:156 [inline]
>   rcu_dump_cpu_stacks+0x175/0x1c2 kernel/rcu/tree.c:1336
>   print_cpu_stall kernel/rcu/tree.c:1485 [inline]
>   check_cpu_stall.isra.60.cold.78+0x36c/0x5a6 kernel/rcu/tree.c:1553
>   __rcu_pending kernel/rcu/tree.c:3244 [inline]
>   rcu_pending kernel/rcu/tree.c:3291 [inline]
>   rcu_check_callbacks+0x23f/0xcd0 kernel/rcu/tree.c:2646
>   update_process_times+0x2d/0x70 kernel/time/timer.c:1636
>   tick_sched_handle+0x9f/0x180 kernel/time/tick-sched.c:164
>   tick_sched_timer+0x45/0x130 kernel/time/tick-sched.c:1274
>   __run_hrtimer kernel/time/hrtimer.c:1398 [inline]
>   __hrtimer_run_queues+0x3eb/0x10c0 kernel/time/hrtimer.c:1460
>   hrtimer_interrupt+0x2f3/0x750 kernel/time/hrtimer.c:1518
>   local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1025 [inline]
>   smp_apic_timer_interrupt+0x165/0x730 arch/x86/kernel/apic/apic.c:1050
>   apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:863
>   </IRQ>
>  RIP: 0010:arch_local_irq_restore arch/x86/include/asm/paravirt.h:783 [inline]
>  RIP: 0010:console_unlock+0xc84/0x10b0 kernel/printk/printk.c:2397
>  Code: c1 e8 03 42 80 3c 38 00 0f 85 bd 03 00 00 48 83 3d 38 f7 8e 07 00 0f 84
>  69 02 00 00 e8 45 56 19 00 48 8b bd b0 fe ff ff 57 9d <0f> 1f 44 00 00 e9 96
>  f5 ff ff e8 2d 56 19 00 48 8b 7d 08 e8 94 cf
>  RSP: 0018:ffff8801aab0f358 EFLAGS: 00000293 ORIG_RAX: ffffffffffffff13
>  RAX: ffff8801aa2802c0 RBX: 0000000000000200 RCX: 1ffff10035450163
>  RDX: 0000000000000000 RSI: ffffffff8162b8fb RDI: 0000000000000293
>  RBP: ffff8801aab0f4c0 R08: ffff8801aa280af8 R09: 0000000000000006
>  R10: ffff8801aa2802c0 R11: 0000000000000000 R12: 0000000000000000
>  R13: ffffffff84ea9880 R14: 0000000000000001 R15: dffffc0000000000
>   vprintk_emit+0x6c6/0xdf0 kernel/printk/printk.c:1907
>   vprintk_default+0x28/0x30 kernel/printk/printk.c:1948
>   vprintk_func+0x7a/0xe7 kernel/printk/printk_safe.c:382
>   printk+0xa7/0xcf kernel/printk/printk.c:1981
>   nfc_llcp_send_ui_frame.cold.9+0x18/0x1f net/nfc/llcp_commands.c:758
>   llcp_sock_sendmsg+0x278/0x350 net/nfc/llcp_sock.c:786
>   sock_sendmsg_nosec net/socket.c:645 [inline]
>   sock_sendmsg+0xd5/0x120 net/socket.c:655
>   ___sys_sendmsg+0x51d/0x930 net/socket.c:2161
>   __sys_sendmmsg+0x240/0x6f0 net/socket.c:2256
>   __do_sys_sendmmsg net/socket.c:2285 [inline]
>   __se_sys_sendmmsg net/socket.c:2282 [inline]
>   __x64_sys_sendmmsg+0x9d/0x100 net/socket.c:2282
>   do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
>   entry_SYSCALL_64_after_hwframe+0x49/0xbe
>
> Address the issues by rate limiting nfc_alloc_send_skb() allocation
> error, to avoid logbuf pollution, and do cond_resched() before llcp
> attempts to allocate PDU again.
>
> Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
> Reported-by: syzbot+d29d18215e477cfbfbdd@syzkaller.appspotmail.com
> ---
>  net/nfc/llcp_commands.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/net/nfc/llcp_commands.c b/net/nfc/llcp_commands.c
> index 2ceefa183cee..e19fadaa9022 100644
> --- a/net/nfc/llcp_commands.c
> +++ b/net/nfc/llcp_commands.c
> @@ -20,6 +20,7 @@
>  #include <linux/init.h>
>  #include <linux/kernel.h>
>  #include <linux/module.h>
> +#include <linux/sched.h>
>  #include <linux/nfc.h>
>
>  #include <net/nfc/nfc.h>
> @@ -755,7 +756,8 @@ int nfc_llcp_send_ui_frame(struct nfc_llcp_sock *sock, u8 ssap, u8 dsap,
>                 pdu = nfc_alloc_send_skb(sock->dev, &sock->sk, MSG_DONTWAIT,
>                                          frag_len + LLCP_HEADER_SIZE, &err);
>                 if (pdu == NULL) {
> -                       pr_err("Could not allocate PDU\n");
> +                       pr_err_ratelimited("Could not allocate PDU\n");
> +                       cond_resched();
>                         continue;
>                 }


But this thread is still in an infinite (unkillable?) loop? If yes, we
are waiting for the next syzbot report ;)

I suspect this is not OOM. This is probably some persistent
logical condition that makes the allocation function fail, either we
ask for too much, or socket in some bad state. Potentially this is
even triggerable remotely because there are some remove variables
involved in size calculation.

^ permalink raw reply

* Re: [PATCH] NFC: llcp: fix nfc_llcp_send_ui_frame() lockup
From: Sergey Senozhatsky @ 2018-06-26  5:12 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Sergey Senozhatsky, Samuel Ortiz, David S. Miller, Steven Rostedt,
	Petr Mladek, syzkaller-bugs, linux-wireless, netdev, LKML, syzbot,
	Sergey Senozhatsky
In-Reply-To: <CACT4Y+ZvLqzEwXgdB25MrmAS_j2_HsS8NFaJaMojzMV3prXypg@mail.gmail.com>

On (06/26/18 07:07), Dmitry Vyukov wrote:
[..]
> >  #include <net/nfc/nfc.h>
> > @@ -755,7 +756,8 @@ int nfc_llcp_send_ui_frame(struct nfc_llcp_sock *sock, u8 ssap, u8 dsap,
> >                 pdu = nfc_alloc_send_skb(sock->dev, &sock->sk, MSG_DONTWAIT,
> >                                          frag_len + LLCP_HEADER_SIZE, &err);
> >                 if (pdu == NULL) {
> > -                       pr_err("Could not allocate PDU\n");
> > +                       pr_err_ratelimited("Could not allocate PDU\n");
> > +                       cond_resched();
> >                         continue;
> >                 }
> 
> 
> But this thread is still in an infinite (unkillable?) loop? If yes, we
> are waiting for the next syzbot report ;)

The loop is still infinite, correct, but we have a preemption point now.
Sure, net people can come with a much better solution, I'll be happy to
scratch my patch.

	-ss

^ permalink raw reply

* [PATCH net-next v2] net: vhost: improve performance when enable busyloop
From: xiangxia.m.yue @ 2018-06-26  5:17 UTC (permalink / raw)
  To: jasowang; +Cc: netdev, Tonghao Zhang, virtualization

From: Tonghao Zhang <xiangxia.m.yue@gmail.com>

This patch improves the guest receive performance from
host. On the handle_tx side, we poll the sock receive
queue at the same time. handle_rx do that in the same way.

For avoiding deadlock, change the code to lock the vq one
by one and use the VHOST_NET_VQ_XX as a subclass for
mutex_lock_nested. With the patch, qemu can set differently
the busyloop_timeout for rx or tx queue.

We set the poll-us=100us and use the iperf3 to test
its throughput. The iperf3 command is shown as below.

on the guest:
iperf3  -s -D

on the host:
iperf3  -c 192.168.1.100 -i 1 -P 10 -t 10 -M 1400

* With the patch:     23.1 Gbits/sec
* Without the patch:  12.7 Gbits/sec

Signed-off-by: Tonghao Zhang <zhangtonghao@didichuxing.com>
---
 drivers/vhost/net.c   | 106 +++++++++++++++++++++++++++-----------------------
 drivers/vhost/vhost.c |  24 ++++--------
 2 files changed, 66 insertions(+), 64 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index e7cf7d2..38e9adb 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -429,22 +429,62 @@ static int vhost_net_enable_vq(struct vhost_net *n,
 	return vhost_poll_start(poll, sock->file);
 }
 
+static int sk_has_rx_data(struct sock *sk)
+{
+	struct socket *sock = sk->sk_socket;
+
+	if (sock->ops->peek_len)
+		return sock->ops->peek_len(sock);
+
+	return skb_queue_empty(&sk->sk_receive_queue);
+}
+
+static void vhost_net_busy_poll(struct vhost_net *net,
+				struct vhost_virtqueue *rvq,
+				struct vhost_virtqueue *tvq,
+				bool rx)
+{
+	unsigned long uninitialized_var(endtime);
+	struct socket *sock = rvq->private_data;
+	struct vhost_virtqueue *vq = rx ? tvq : rvq;
+	unsigned long busyloop_timeout = rx ? rvq->busyloop_timeout :
+					      tvq->busyloop_timeout;
+
+	mutex_lock_nested(&vq->mutex, rx ? VHOST_NET_VQ_TX: VHOST_NET_VQ_RX);
+	vhost_disable_notify(&net->dev, vq);
+
+	preempt_disable();
+	endtime = busy_clock() + busyloop_timeout;
+	while (vhost_can_busy_poll(tvq->dev, endtime) &&
+	       !(sock && sk_has_rx_data(sock->sk)) &&
+	       vhost_vq_avail_empty(tvq->dev, tvq))
+		cpu_relax();
+	preempt_enable();
+
+	if ((rx && !vhost_vq_avail_empty(&net->dev, vq)) ||
+	    (!rx && (sock && sk_has_rx_data(sock->sk)))) {
+		vhost_poll_queue(&vq->poll);
+	} else if (unlikely(vhost_enable_notify(&net->dev, vq))) {
+		vhost_disable_notify(&net->dev, vq);
+		vhost_poll_queue(&vq->poll);
+	}
+
+	mutex_unlock(&vq->mutex);
+}
+
 static int vhost_net_tx_get_vq_desc(struct vhost_net *net,
 				    struct vhost_virtqueue *vq,
 				    struct iovec iov[], unsigned int iov_size,
 				    unsigned int *out_num, unsigned int *in_num)
 {
-	unsigned long uninitialized_var(endtime);
+	struct vhost_net_virtqueue *nvq_rx = &net->vqs[VHOST_NET_VQ_RX];
+
 	int r = vhost_get_vq_desc(vq, vq->iov, ARRAY_SIZE(vq->iov),
 				  out_num, in_num, NULL, NULL);
 
 	if (r == vq->num && vq->busyloop_timeout) {
-		preempt_disable();
-		endtime = busy_clock() + vq->busyloop_timeout;
-		while (vhost_can_busy_poll(vq->dev, endtime) &&
-		       vhost_vq_avail_empty(vq->dev, vq))
-			cpu_relax();
-		preempt_enable();
+		vhost_net_busy_poll(net, &nvq_rx->vq, vq, false);
+
 		r = vhost_get_vq_desc(vq, vq->iov, ARRAY_SIZE(vq->iov),
 				      out_num, in_num, NULL, NULL);
 	}
@@ -484,7 +524,7 @@ static void handle_tx(struct vhost_net *net)
 	bool zcopy, zcopy_used;
 	int sent_pkts = 0;
 
-	mutex_lock(&vq->mutex);
+	mutex_lock_nested(&vq->mutex, VHOST_NET_VQ_TX);
 	sock = vq->private_data;
 	if (!sock)
 		goto out;
@@ -621,16 +661,6 @@ static int peek_head_len(struct vhost_net_virtqueue *rvq, struct sock *sk)
 	return len;
 }
 
-static int sk_has_rx_data(struct sock *sk)
-{
-	struct socket *sock = sk->sk_socket;
-
-	if (sock->ops->peek_len)
-		return sock->ops->peek_len(sock);
-
-	return skb_queue_empty(&sk->sk_receive_queue);
-}
-
 static void vhost_rx_signal_used(struct vhost_net_virtqueue *nvq)
 {
 	struct vhost_virtqueue *vq = &nvq->vq;
@@ -645,39 +675,19 @@ static void vhost_rx_signal_used(struct vhost_net_virtqueue *nvq)
 
 static int vhost_net_rx_peek_head_len(struct vhost_net *net, struct sock *sk)
 {
-	struct vhost_net_virtqueue *rvq = &net->vqs[VHOST_NET_VQ_RX];
-	struct vhost_net_virtqueue *nvq = &net->vqs[VHOST_NET_VQ_TX];
-	struct vhost_virtqueue *vq = &nvq->vq;
-	unsigned long uninitialized_var(endtime);
-	int len = peek_head_len(rvq, sk);
-
-	if (!len && vq->busyloop_timeout) {
-		/* Flush batched heads first */
-		vhost_rx_signal_used(rvq);
-		/* Both tx vq and rx socket were polled here */
-		mutex_lock_nested(&vq->mutex, 1);
-		vhost_disable_notify(&net->dev, vq);
-
-		preempt_disable();
-		endtime = busy_clock() + vq->busyloop_timeout;
-
-		while (vhost_can_busy_poll(&net->dev, endtime) &&
-		       !sk_has_rx_data(sk) &&
-		       vhost_vq_avail_empty(&net->dev, vq))
-			cpu_relax();
+	struct vhost_net_virtqueue *nvq_rx = &net->vqs[VHOST_NET_VQ_RX];
+	struct vhost_net_virtqueue *nvq_tx = &net->vqs[VHOST_NET_VQ_TX];
 
-		preempt_enable();
+	int len = peek_head_len(nvq_rx, sk);
 
-		if (!vhost_vq_avail_empty(&net->dev, vq))
-			vhost_poll_queue(&vq->poll);
-		else if (unlikely(vhost_enable_notify(&net->dev, vq))) {
-			vhost_disable_notify(&net->dev, vq);
-			vhost_poll_queue(&vq->poll);
-		}
+	if (!len && nvq_rx->vq.busyloop_timeout) {
+		/* Flush batched heads first */
+		vhost_rx_signal_used(nvq_rx);
 
-		mutex_unlock(&vq->mutex);
+		/* Both tx vq and rx socket were polled here */
+		vhost_net_busy_poll(net, &nvq_rx->vq, &nvq_tx->vq, true);
 
-		len = peek_head_len(rvq, sk);
+		len = peek_head_len(nvq_rx, sk);
 	}
 
 	return len;
@@ -789,7 +799,7 @@ static void handle_rx(struct vhost_net *net)
 	__virtio16 num_buffers;
 	int recv_pkts = 0;
 
-	mutex_lock_nested(&vq->mutex, 0);
+	mutex_lock_nested(&vq->mutex, VHOST_NET_VQ_RX);
 	sock = vq->private_data;
 	if (!sock)
 		goto out;
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 895eaa2..1716b10 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -294,8 +294,11 @@ static void vhost_vq_meta_reset(struct vhost_dev *d)
 {
 	int i;
 
-	for (i = 0; i < d->nvqs; ++i)
+	for (i = 0; i < d->nvqs; ++i) {
+		mutex_lock(&d->vqs[i]->mutex);
 		__vhost_vq_meta_reset(d->vqs[i]);
+		mutex_unlock(&d->vqs[i]->mutex);
+	}
 }
 
 static void vhost_vq_reset(struct vhost_dev *dev,
@@ -887,19 +890,6 @@ static inline void __user *__vhost_get_user(struct vhost_virtqueue *vq,
 #define vhost_get_used(vq, x, ptr) \
 	vhost_get_user(vq, x, ptr, VHOST_ADDR_USED)
 
-static void vhost_dev_lock_vqs(struct vhost_dev *d)
-{
-	int i = 0;
-	for (i = 0; i < d->nvqs; ++i)
-		mutex_lock_nested(&d->vqs[i]->mutex, i);
-}
-
-static void vhost_dev_unlock_vqs(struct vhost_dev *d)
-{
-	int i = 0;
-	for (i = 0; i < d->nvqs; ++i)
-		mutex_unlock(&d->vqs[i]->mutex);
-}
 
 static int vhost_new_umem_range(struct vhost_umem *umem,
 				u64 start, u64 size, u64 end,
@@ -950,7 +940,11 @@ static void vhost_iotlb_notify_vq(struct vhost_dev *d,
 		if (msg->iova <= vq_msg->iova &&
 		    msg->iova + msg->size - 1 > vq_msg->iova &&
 		    vq_msg->type == VHOST_IOTLB_MISS) {
+
+			mutex_lock(&node->vq->mutex);
 			vhost_poll_queue(&node->vq->poll);
+			mutex_unlock(&node->vq->mutex);
+
 			list_del(&node->node);
 			kfree(node);
 		}
@@ -982,7 +976,6 @@ static int vhost_process_iotlb_msg(struct vhost_dev *dev,
 	int ret = 0;
 
 	mutex_lock(&dev->mutex);
-	vhost_dev_lock_vqs(dev);
 	switch (msg->type) {
 	case VHOST_IOTLB_UPDATE:
 		if (!dev->iotlb) {
@@ -1016,7 +1009,6 @@ static int vhost_process_iotlb_msg(struct vhost_dev *dev,
 		break;
 	}
 
-	vhost_dev_unlock_vqs(dev);
 	mutex_unlock(&dev->mutex);
 
 	return ret;
-- 
1.8.3.1

^ permalink raw reply related

* WARNING: ODEBUG bug in sock_hash_free
From: syzbot @ 2018-06-26  5:30 UTC (permalink / raw)
  To: ast, daniel, linux-kernel, netdev, syzkaller-bugs

Hello,

syzbot found the following crash on:

HEAD commit:    f0dc7f9c6dd9 Merge git://git.kernel.org/pub/scm/linux/kern..
git tree:       bpf-next
console output: https://syzkaller.appspot.com/x/log.txt?x=1725589f800000
kernel config:  https://syzkaller.appspot.com/x/.config?x=fa9c20c48788d1c1
dashboard link: https://syzkaller.appspot.com/bug?extid=71aeaaf993d216185076
compiler:       gcc (GCC) 8.0.1 20180413 (experimental)

Unfortunately, I don't have any reproducer for this crash yet.

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+71aeaaf993d216185076@syzkaller.appspotmail.com

------------[ cut here ]------------
ODEBUG: free active (active state 1) object type: rcu_head hint:            
(null)
WARNING: CPU: 1 PID: 4959 at lib/debugobjects.c:329  
debug_print_object+0x16a/0x210 lib/debugobjects.c:326
Kernel panic - not syncing: panic_on_warn set ...

CPU: 1 PID: 4959 Comm: kworker/1:3 Not tainted 4.17.0+ #39
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011
Workqueue: events bpf_map_free_deferred
Call Trace:
  __dump_stack lib/dump_stack.c:77 [inline]
  dump_stack+0x1b9/0x294 lib/dump_stack.c:113
  panic+0x22f/0x4de kernel/panic.c:184
  __warn.cold.8+0x163/0x1b3 kernel/panic.c:536
  report_bug+0x252/0x2d0 lib/bug.c:186
  fixup_bug arch/x86/kernel/traps.c:178 [inline]
  do_error_trap+0x1fc/0x4d0 arch/x86/kernel/traps.c:296
  do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:316
  invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:992
RIP: 0010:debug_print_object+0x16a/0x210 lib/debugobjects.c:326
Code: 1a 88 48 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 92 00 00 00 48 8b 14 dd  
60 75 1a 88 4c 89 f6 48 c7 c7 e0 6a 1a 88 e8 06 62 ec fd <0f> 0b 83 05 39  
5b 44 06 01 48 83 c4 18 5b 41 5c 41 5d 41 5e 41 5f
RSP: 0018:ffff880198e47490 EFLAGS: 00010082
RAX: 0000000000000051 RBX: 0000000000000003 RCX: ffffffff81854ed8
RDX: 0000000000000000 RSI: ffffffff8161f371 RDI: 0000000000000001
RBP: ffff880198e474d0 R08: ffff8801d84b2240 R09: ffffed003b5e3ec2
R10: ffffed003b5e3ec2 R11: ffff8801daf1f617 R12: 0000000000000001
R13: ffffffff88f91d80 R14: ffffffff881a6f80 R15: 0000000000000000
  __debug_check_no_obj_freed lib/debugobjects.c:783 [inline]
  debug_check_no_obj_freed+0x3a6/0x584 lib/debugobjects.c:815
  kfree+0xc7/0x260 mm/slab.c:3812
  sock_hash_free+0x24e/0x6e0 kernel/bpf/sockmap.c:2093
  bpf_map_free_deferred+0xba/0xf0 kernel/bpf/syscall.c:262
  process_one_work+0xc64/0x1b70 kernel/workqueue.c:2153
  worker_thread+0x181/0x13a0 kernel/workqueue.c:2296
  kthread+0x345/0x410 kernel/kthread.c:240
  ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:412

======================================================
WARNING: possible circular locking dependency detected
4.17.0+ #39 Not tainted
------------------------------------------------------
kworker/1:3/4959 is trying to acquire lock:
00000000190110fa ((console_sem).lock){-...}, at: down_trylock+0x13/0x70  
kernel/locking/semaphore.c:136

but task is already holding lock:
00000000af3150e8 (&obj_hash[i].lock){-.-.}, at: __debug_check_no_obj_freed  
lib/debugobjects.c:774 [inline]
00000000af3150e8 (&obj_hash[i].lock){-.-.}, at:  
debug_check_no_obj_freed+0x159/0x584 lib/debugobjects.c:815

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #3 (&obj_hash[i].lock){-.-.}:
        __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
        _raw_spin_lock_irqsave+0x96/0xc0 kernel/locking/spinlock.c:152
        __debug_object_init+0x11f/0x12c0 lib/debugobjects.c:381
        debug_object_init+0x16/0x20 lib/debugobjects.c:429
        debug_hrtimer_init kernel/time/hrtimer.c:410 [inline]
        debug_init kernel/time/hrtimer.c:458 [inline]
        hrtimer_init+0x8f/0x460 kernel/time/hrtimer.c:1308
        init_dl_task_timer+0x1b/0x50 kernel/sched/deadline.c:1056
        __sched_fork+0x2a8/0x570 kernel/sched/core.c:2184
        init_idle+0x75/0x7a0 kernel/sched/core.c:5404
        sched_init+0xbeb/0xd10 kernel/sched/core.c:6102
        start_kernel+0x475/0x92d init/main.c:602
        x86_64_start_reservations+0x29/0x2b arch/x86/kernel/head64.c:452
        x86_64_start_kernel+0x76/0x79 arch/x86/kernel/head64.c:433
        secondary_startup_64+0xa5/0xb0 arch/x86/kernel/head_64.S:242

-> #2 (&rq->lock){-.-.}:
        __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
        _raw_spin_lock+0x2a/0x40 kernel/locking/spinlock.c:144
        rq_lock kernel/sched/sched.h:1805 [inline]
        task_fork_fair+0x8a/0x660 kernel/sched/fair.c:9953
        sched_fork+0x43e/0xb30 kernel/sched/core.c:2380
        copy_process.part.38+0x1bf1/0x7180 kernel/fork.c:1765
        copy_process kernel/fork.c:1608 [inline]
        _do_fork+0x291/0x12a0 kernel/fork.c:2091
        kernel_thread+0x34/0x40 kernel/fork.c:2150
        rest_init+0x22/0xe4 init/main.c:408
        start_kernel+0x906/0x92d init/main.c:738
        x86_64_start_reservations+0x29/0x2b arch/x86/kernel/head64.c:452
        x86_64_start_kernel+0x76/0x79 arch/x86/kernel/head64.c:433
        secondary_startup_64+0xa5/0xb0 arch/x86/kernel/head_64.S:242

-> #1 (&p->pi_lock){-.-.}:
        __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
        _raw_spin_lock_irqsave+0x96/0xc0 kernel/locking/spinlock.c:152
        try_to_wake_up+0xca/0x1280 kernel/sched/core.c:1984
        wake_up_process+0x10/0x20 kernel/sched/core.c:2147
        __up.isra.1+0x1b8/0x290 kernel/locking/semaphore.c:262
        up+0x12f/0x1b0 kernel/locking/semaphore.c:187
        __up_console_sem+0xbe/0x1b0 kernel/printk/printk.c:242
        console_unlock+0x79a/0x10a0 kernel/printk/printk.c:2411
        vprintk_emit+0x6b2/0xde0 kernel/printk/printk.c:1907
        vprintk_default+0x28/0x30 kernel/printk/printk.c:1948
        vprintk_func+0x7a/0xe7 kernel/printk/printk_safe.c:382
        printk+0x9e/0xba kernel/printk/printk.c:1981
        load_umh+0x51/0xbd net/bpfilter/bpfilter_kern.c:99
        do_one_initcall+0x127/0x913 init/main.c:884
        do_initcall_level init/main.c:952 [inline]
        do_initcalls init/main.c:960 [inline]
        do_basic_setup init/main.c:978 [inline]
        kernel_init_freeable+0x49b/0x58e init/main.c:1135
        kernel_init+0x11/0x1b3 init/main.c:1061
        ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:412

-> #0 ((console_sem).lock){-...}:
        lock_acquire+0x1dc/0x520 kernel/locking/lockdep.c:3924
        __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
        _raw_spin_lock_irqsave+0x96/0xc0 kernel/locking/spinlock.c:152
        down_trylock+0x13/0x70 kernel/locking/semaphore.c:136
        __down_trylock_console_sem+0xae/0x200 kernel/printk/printk.c:225
        console_trylock+0x15/0xa0 kernel/printk/printk.c:2230
        console_trylock_spinning kernel/printk/printk.c:1643 [inline]
        vprintk_emit+0x699/0xde0 kernel/printk/printk.c:1906
        vprintk_default+0x28/0x30 kernel/printk/printk.c:1948
        vprintk_func+0x7a/0xe7 kernel/printk/printk_safe.c:382
        printk+0x9e/0xba kernel/printk/printk.c:1981
        __warn_printk+0x83/0xd0 kernel/panic.c:590
        debug_print_object+0x16a/0x210 lib/debugobjects.c:326
        __debug_check_no_obj_freed lib/debugobjects.c:783 [inline]
        debug_check_no_obj_freed+0x3a6/0x584 lib/debugobjects.c:815
        kfree+0xc7/0x260 mm/slab.c:3812
        sock_hash_free+0x24e/0x6e0 kernel/bpf/sockmap.c:2093
        bpf_map_free_deferred+0xba/0xf0 kernel/bpf/syscall.c:262
        process_one_work+0xc64/0x1b70 kernel/workqueue.c:2153
        worker_thread+0x181/0x13a0 kernel/workqueue.c:2296
        kthread+0x345/0x410 kernel/kthread.c:240
        ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:412

other info that might help us debug this:

Chain exists of:
   (console_sem).lock --> &rq->lock --> &obj_hash[i].lock

  Possible unsafe locking scenario:

        CPU0                    CPU1
        ----                    ----
   lock(&obj_hash[i].lock);
                                lock(&rq->lock);
                                lock(&obj_hash[i].lock);
   lock((console_sem).lock);

  *** DEADLOCK ***

4 locks held by kworker/1:3/4959:
  #0: 00000000f67deee4 ((wq_completion)"events"){+.+.}, at:  
__write_once_size include/linux/compiler.h:215 [inline]
  #0: 00000000f67deee4 ((wq_completion)"events"){+.+.}, at:  
arch_atomic64_set arch/x86/include/asm/atomic64_64.h:34 [inline]
  #0: 00000000f67deee4 ((wq_completion)"events"){+.+.}, at: atomic64_set  
include/asm-generic/atomic-instrumented.h:40 [inline]
  #0: 00000000f67deee4 ((wq_completion)"events"){+.+.}, at: atomic_long_set  
include/asm-generic/atomic-long.h:59 [inline]
  #0: 00000000f67deee4 ((wq_completion)"events"){+.+.}, at: set_work_data  
kernel/workqueue.c:617 [inline]
  #0: 00000000f67deee4 ((wq_completion)"events"){+.+.}, at:  
set_work_pool_and_clear_pending kernel/workqueue.c:644 [inline]
  #0: 00000000f67deee4 ((wq_completion)"events"){+.+.}, at:  
process_one_work+0xb35/0x1b70 kernel/workqueue.c:2124
  #1: 00000000776b40d0 ((work_completion)(&map->work)){+.+.}, at:  
process_one_work+0xb8c/0x1b70 kernel/workqueue.c:2128
  #2: 000000002a359661 (rcu_read_lock){....}, at: sock_hash_free+0x0/0x6e0  
include/net/sock.h:2176
  #3: 00000000af3150e8 (&obj_hash[i].lock){-.-.}, at:  
__debug_check_no_obj_freed lib/debugobjects.c:774 [inline]
  #3: 00000000af3150e8 (&obj_hash[i].lock){-.-.}, at:  
debug_check_no_obj_freed+0x159/0x584 lib/debugobjects.c:815

stack backtrace:
CPU: 1 PID: 4959 Comm: kworker/1:3 Not tainted 4.17.0+ #39
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011
Workqueue: events bpf_map_free_deferred
Call Trace:
  __dump_stack lib/dump_stack.c:77 [inline]
  dump_stack+0x1b9/0x294 lib/dump_stack.c:113
  print_circular_bug.isra.36.cold.56+0x1bd/0x27d  
kernel/locking/lockdep.c:1227
  check_prev_add kernel/locking/lockdep.c:1867 [inline]
  check_prevs_add kernel/locking/lockdep.c:1980 [inline]
  validate_chain kernel/locking/lockdep.c:2421 [inline]
  __lock_acquire+0x343e/0x5140 kernel/locking/lockdep.c:3435
  lock_acquire+0x1dc/0x520 kernel/locking/lockdep.c:3924
  __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
  _raw_spin_lock_irqsave+0x96/0xc0 kernel/locking/spinlock.c:152
  down_trylock+0x13/0x70 kernel/locking/semaphore.c:136
  __down_trylock_console_sem+0xae/0x200 kernel/printk/printk.c:225
  console_trylock+0x15/0xa0 kernel/printk/printk.c:2230
  console_trylock_spinning kernel/printk/printk.c:1643 [inline]
  vprintk_emit+0x699/0xde0 kernel/printk/printk.c:1906
  vprintk_default+0x28/0x30 kernel/printk/printk.c:1948
  vprintk_func+0x7a/0xe7 kernel/printk/printk_safe.c:382
  printk+0x9e/0xba kernel/printk/printk.c:1981
  __warn_printk+0x83/0xd0 kernel/panic.c:590
  debug_print_object+0x16a/0x210 lib/debugobjects.c:326
  __debug_check_no_obj_freed lib/debugobjects.c:783 [inline]
  debug_check_no_obj_freed+0x3a6/0x584 lib/debugobjects.c:815
  kfree+0xc7/0x260 mm/slab.c:3812
  sock_hash_free+0x24e/0x6e0 kernel/bpf/sockmap.c:2093
  bpf_map_free_deferred+0xba/0xf0 kernel/bpf/syscall.c:262
  process_one_work+0xc64/0x1b70 kernel/workqueue.c:2153
  worker_thread+0x181/0x13a0 kernel/workqueue.c:2296
  kthread+0x345/0x410 kernel/kthread.c:240
  ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:412
Shutting down cpus with NMI
Dumping ftrace buffer:
    (ftrace buffer empty)
Kernel Offset: disabled
Rebooting in 86400 seconds..


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#bug-status-tracking for how to communicate with  
syzbot.

^ permalink raw reply

* Re: [PATCH net-next 2/3] rds: Enable RDS IPv6 support
From: Ka-Cheong Poon @ 2018-06-26  5:30 UTC (permalink / raw)
  To: Sowmini Varadhan; +Cc: netdev, santosh.shilimkar, davem, rds-devel
In-Reply-To: <20180625175006.GI14823@oracle.com>

On 06/26/2018 01:50 AM, Sowmini Varadhan wrote:

>> If a socket is bound, I guess the scope_id should be used.  So
>> if a socket is not bound to a link local address and the socket
>> is used to sent to a link local peer, it should fail.
> 
> PF_RDS sockets *MUST* alwasy be bound.  See
> Documentation/networking/rds.txt:
> "   Sockets must be bound before you can send or receive data.
>      This is needed because binding also selects a transport and
>      attaches it to the socket. Once bound, the transport assignment
>      does not change."
> 
> Also, rds_sendmsg checks this (from net-next, your version
> has the equivalent ipv6_addr_any etc check):
> 
>          if (daddr == 0 || rs->rs_bound_addr == 0) {
>                  release_sock(sk);
>                  ret = -ENOTCONN; /* XXX not a great errno */
>                  goto out;
>          }


I think you misunderstood what I wrote.  The above is in response
to your original question:

--
 > And this is even more confusing because the fastpath in rds_sendmsg
 > does not take the bound_scope_id into consideration at all:
 > 1213         if (rs->rs_conn && 
ipv6_addr_equal(&rs->rs_conn->c_faddr, &daddr))
 > 1214                 conn = rs->rs_conn;
 > 1215         else {
 > 1216                 conn = rds_conn_create_outgoing( /* .. */, scope_id)
 > so if I erroneously passed a msg_name on a connected rds socket, what
 > would happen? (see also question about rds_connect() itself, below)

^ permalink raw reply

* Re: [PATCH v2] selftests: bpf: notification about privilege required to run test_lwt_seg6local.sh testing script
From: Song Liu @ 2018-06-26  5:33 UTC (permalink / raw)
  To: Jeffrin Jose T
  Cc: Alexei Starovoitov, Daniel Borkmann, shuah, Networking, open list,
	linux-kselftest
In-Reply-To: <20180622214032.6527-1-ahiliation@gmail.com>

On Fri, Jun 22, 2018 at 2:40 PM, Jeffrin Jose T <ahiliation@gmail.com> wrote:
> This test needs root privilege for it's successful execution.
>
> This patch is atleast used to notify the user about the privilege
> the script demands for the  smooth execution of the test.
>
> Signed-off-by: Jeffrin Jose T (Rajagiri SET) <ahiliation@gmail.com>

Acked-by: Song Liu <songliubraving@fb.com>

> ---
>  tools/testing/selftests/bpf/test_lwt_seg6local.sh | 9 +++++++++
>  1 file changed, 9 insertions(+)
>
> diff --git a/tools/testing/selftests/bpf/test_lwt_seg6local.sh b/tools/testing/selftests/bpf/test_lwt_seg6local.sh
> index 1c77994b5e71..30575577a8b2 100755
> --- a/tools/testing/selftests/bpf/test_lwt_seg6local.sh
> +++ b/tools/testing/selftests/bpf/test_lwt_seg6local.sh
> @@ -21,6 +21,15 @@
>  # An UDP datagram is sent from fb00::1 to fb00::6. The test succeeds if this
>  # datagram can be read on NS6 when binding to fb00::6.
>
> +# Kselftest framework requirement - SKIP code is 4.
> +ksft_skip=4
> +
> +msg="skip all tests:"
> +if [ $UID != 0 ]; then
> +    echo $msg please run this as root >&2
> +    exit $ksft_skip
> +fi
> +
>  TMP_FILE="/tmp/selftest_lwt_seg6local.txt"
>
>  cleanup()
> --
> 2.17.0
>

^ permalink raw reply

* Re: [PATCH bpf-next 1/7] nfp: bpf: allow source ptr type be map ptr in memcpy optimization
From: Song Liu @ 2018-06-26  5:50 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Alexei Starovoitov, Daniel Borkmann, oss-drivers, Networking,
	Jiong Wang
In-Reply-To: <20180625035421.2991-2-jakub.kicinski@netronome.com>

On Sun, Jun 24, 2018 at 8:54 PM, Jakub Kicinski
<jakub.kicinski@netronome.com> wrote:
> From: Jiong Wang <jiong.wang@netronome.com>
>
> Map read has been supported on NFP, this patch enables optimization for
> memcpy from map to packet.
>
> This patch also fixed one latent bug which will cause copying from
> unexpected address once memcpy for map pointer enabled.
>
> Reported-by: Mary Pham <mary.pham@netronome.com>
> Reported-by: David Beckett <david.beckett@netronome.com>
> Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> ---
>  drivers/net/ethernet/netronome/nfp/bpf/jit.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/ethernet/netronome/nfp/bpf/jit.c b/drivers/net/ethernet/netronome/nfp/bpf/jit.c
> index 8a92088df0d7..33111739b210 100644
> --- a/drivers/net/ethernet/netronome/nfp/bpf/jit.c
> +++ b/drivers/net/ethernet/netronome/nfp/bpf/jit.c
> @@ -670,7 +670,7 @@ static int nfp_cpp_memcpy(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
>         xfer_num = round_up(len, 4) / 4;
>
>         if (src_40bit_addr)
> -               addr40_offset(nfp_prog, meta->insn.src_reg, off, &src_base,
> +               addr40_offset(nfp_prog, meta->insn.src_reg * 2, off, &src_base,
>                               &off);

Did this break other cases before this patch?

I am sorry if this is a dumb question. I don't think I fully
understand addr40_offset().

Song

>
>         /* Setup PREV_ALU fields to override memory read length. */
> @@ -3299,7 +3299,8 @@ curr_pair_is_memcpy(struct nfp_insn_meta *ld_meta,
>         if (!is_mbpf_load(ld_meta) || !is_mbpf_store(st_meta))
>                 return false;
>
> -       if (ld_meta->ptr.type != PTR_TO_PACKET)
> +       if (ld_meta->ptr.type != PTR_TO_PACKET &&
> +           ld_meta->ptr.type != PTR_TO_MAP_VALUE)
>                 return false;
>
>         if (st_meta->ptr.type != PTR_TO_PACKET)
> --
> 2.17.1
>

^ permalink raw reply

* Re: WARNING in sctp_assoc_update_frag_point
From: Xin Long @ 2018-06-26  6:04 UTC (permalink / raw)
  To: syzbot
  Cc: davem, LKML, linux-sctp, Marcelo Ricardo Leitner, network dev,
	Neil Horman, syzkaller-bugs, Vlad Yasevich
In-Reply-To: <0000000000005d2af6056f8474ac@google.com>

On Tue, Jun 26, 2018 at 1:06 PM, syzbot
<syzbot+f0d9d7cba052f9344b03@syzkaller.appspotmail.com> wrote:
> Hello,
>
> syzbot found the following crash on:
>
> HEAD commit:    6f0d349d922b Merge git://git.kernel.org/pub/scm/linux/kern..
> git tree:       upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=12a423c0400000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=a63be0c83e84d370
> dashboard link: https://syzkaller.appspot.com/bug?extid=f0d9d7cba052f9344b03
> compiler:       gcc (GCC) 8.0.1 20180413 (experimental)
>
> Unfortunately, I don't have any reproducer for this crash yet.
>
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+f0d9d7cba052f9344b03@syzkaller.appspotmail.com
>
> WARNING: CPU: 0 PID: 22543 at include/net/sctp/sctp.h:598 sctp_mtu_payload
> include/net/sctp/sctp.h:598 [inline]
> WARNING: CPU: 0 PID: 22543 at include/net/sctp/sctp.h:598
> sctp_assoc_update_frag_point+0x252/0x2c0 net/sctp/associola.c:1401
> Kernel panic - not syncing: panic_on_warn set ...
>
> CPU: 0 PID: 22543 Comm: syz-executor2 Not tainted 4.18.0-rc2+ #117
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> Google 01/01/2011
> Call Trace:
>  __dump_stack lib/dump_stack.c:77 [inline]
>  dump_stack+0x1c9/0x2b4 lib/dump_stack.c:113
>  panic+0x238/0x4e7 kernel/panic.c:184
>  __warn.cold.8+0x163/0x1ba kernel/panic.c:536
>  report_bug+0x252/0x2d0 lib/bug.c:186
>  fixup_bug arch/x86/kernel/traps.c:178 [inline]
>  do_error_trap+0x1fc/0x4d0 arch/x86/kernel/traps.c:296
>  do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:316
>  invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:992
> RIP: 0010:sctp_mtu_payload include/net/sctp/sctp.h:598 [inline]
> RIP: 0010:sctp_assoc_update_frag_point+0x252/0x2c0 net/sctp/associola.c:1401
> Code: 76 fa 45 39 e5 76 1e e8 0c 69 76 fa 45 29 e5 45 89 ec e9 34 ff ff ff
> e8 fc 68 76 fa 45 8d 66 34 e9 09 ff ff ff e8 ee 68 76 fa <0f> 0b 45 31 e4 e9
> 17 ff ff ff e8 7f 3c b4 fa e9 31 fe ff ff 4c 89
> RSP: 0018:ffff8801d7def378 EFLAGS: 00010216
> RAX: 0000000000040000 RBX: ffff8801d8580ac0 RCX: ffffc900133ca000
> RDX: 00000000000001b9 RSI: ffffffff8705a382 RDI: 0000000000000004
> RBP: ffff8801d7def3a0 R08: ffff8801cfaa6000 R09: ffffed002e0421af
> R10: ffffed002e0421af R11: ffff880170210d7f R12: 0000000000000044
> R13: 0000000000000044 R14: 0000000000000010 R15: ffff8801d8580ac0
>  sctp_assoc_set_pmtu net/sctp/associola.c:1417 [inline]
>  sctp_assoc_sync_pmtu+0x251/0x2e0 net/sctp/associola.c:1445
We may need a fix:
@@ -282,7 +282,10 @@ bool sctp_transport_update_pmtu(struct
sctp_transport *t, u32 pmtu)

        if (dst) {
                /* Re-fetch, as under layers may have a higher minimum size */
-               pmtu = SCTP_TRUNC4(dst_mtu(dst));
+               int dst_mtu = SCTP_TRUNC4(dst_mtu(dst));
+
+               if (pmtu < dst_mtu)
+                       pmtu = dst_mtu;
                change = t->pathmtu != pmtu;

to make sure the t->pathmtu never get a value smaller than MINSEGMENT,
also meets the comments:
"Re-fetch, as under layers may have a higher minimum size".

^ permalink raw reply

* Re: [PATCH bpf-next 2/7] lib: reciprocal_div: implement the improved algorithm on the paper mentioned
From: Song Liu @ 2018-06-26  6:21 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Alexei Starovoitov, Daniel Borkmann, oss-drivers, Networking,
	Jiong Wang
In-Reply-To: <20180625035421.2991-3-jakub.kicinski@netronome.com>

On Sun, Jun 24, 2018 at 8:54 PM, Jakub Kicinski
<jakub.kicinski@netronome.com> wrote:
> From: Jiong Wang <jiong.wang@netronome.com>
>
> The new added "reciprocal_value_adv" implements the advanced version of the
> algorithm described in Figure 4.2 of the paper except when dividend has MSB
> set which would require u128 divide on host and actually could be easily
> handled before calling the new "reciprocal_value_adv".
>
> The advanced version requires more complex calculation to get the
> reciprocal multiplier and other control variables, but then could reduce
> the required emulation operations.
>
> It makes no sense to use this advanced version for host divide emulation,
> those extra complexities for calculating multiplier etc could completely
> waive our saving on emulation operations.
>
> However, it makes sense to use it for JIT divide code generation (for
> example eBPF JIT backends) for which we are willing to trade performance of
> JITed code with that of host. As shown by the following pseudo code, the
> required emulation operations could go down from 6 (the basic version) to 3
> or 4.
>
> To use the result of "reciprocal_value_adv", suppose we want to calculate
> n/d, the C-style pseudo code will be the following, it could be easily
> changed to real code generation for other JIT targets.
>
>   struct reciprocal_value_adv rvalue;
>   u8 pre_shift, exp;
>
>   if (d >= (1u << 31)) {
>     result = n >= d;
>     return;
>   }
>   rvalue = reciprocal_value_adv(d, 32)
>   exp = rvalue.exp;
>   if (rvalue.is_wide_m && !(d & 1)) {
>     pre_shift = fls(d & -d) - 1;
>     rvalue = reciprocal_value_adv(d >> pre_shift, 32 - pre_shift);
>   } else {
>     pre_shift = 0;
>   }
>
>   // code generation starts.
>   if (imm == 1 << exp) {
>     result = n >> exp;
>   } else if (rvalue.is_wide_m) {
>     // pre_shift must be zero when reached here.
>     t = (n * rvalue.m) >> 32;
>     result = n - t;
>     result >>= 1;
>     result += t;
>     result >>= rvalue.sh - 1;
>   } else {
>     if (pre_shift)
>       result = n >> pre_shift;
>     result = ((u64)result * rvalue.m) >> 32;
>     result >>= rvalue.sh;
>   }
>
> Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> ---
>  include/linux/reciprocal_div.h | 65 ++++++++++++++++++++++++++++++++++
>  lib/reciprocal_div.c           | 37 +++++++++++++++++++
>  2 files changed, 102 insertions(+)
>
> diff --git a/include/linux/reciprocal_div.h b/include/linux/reciprocal_div.h
> index e031e9f2f9d8..5a695e4697d3 100644
> --- a/include/linux/reciprocal_div.h
> +++ b/include/linux/reciprocal_div.h
> @@ -25,6 +25,9 @@ struct reciprocal_value {
>         u8 sh1, sh2;
>  };
>
> +/* "reciprocal_value" and "reciprocal_divide" together implement the basic
> + * version of the algorithm described in Figure 4.1 of the paper.
> + */
>  struct reciprocal_value reciprocal_value(u32 d);
>
>  static inline u32 reciprocal_divide(u32 a, struct reciprocal_value R)
> @@ -33,4 +36,66 @@ static inline u32 reciprocal_divide(u32 a, struct reciprocal_value R)
>         return (t + ((a - t) >> R.sh1)) >> R.sh2;
>  }
>
> +struct reciprocal_value_adv {
> +       u32 m;
> +       u8 sh, exp;
> +       bool is_wide_m;
> +};
> +
> +/* "reciprocal_value_adv" implements the advanced version of the algorithm
> + * described in Figure 4.2 of the paper except when dividend has MSB set which
> + * would require u128 divide on host and actually could be easily handled before
> + * calling "reciprocal_value_adv".
> + *
> + * The advanced version requires more complex calculation to get the reciprocal
> + * multiplier and other control variables, but then could reduce the required
> + * emulation operations.
> + *
> + * It makes no sense to use this advanced version for host divide emulation,
> + * those extra complexities for calculating multiplier etc could completely
> + * waive our saving on emulation operations.
> + *
> + * However, it makes sense to use it for JIT divide code generation for which
> + * we are willing to trade performance of JITed code with that of host. As shown
> + * by the following pseudo code, the required emulation operations could go down
> + * from 6 (the basic version) to 3 or 4.
> + *
> + * To use the result of "reciprocal_value_adv", suppose we want to calculate
> + * n/d:
> + *
> + *   struct reciprocal_value_adv rvalue;
> + *   u8 pre_shift, exp;
> + *
> + *   if (d >= (1u << 31)) {
> + *     result = n >= d;
> + *     return;
> + *   }
> + *   rvalue = reciprocal_value_adv(d, 32)
> + *   exp = rvalue.exp;
> + *   if (rvalue.is_wide_m && !(d & 1)) {
> + *     pre_shift = fls(d & -d) - 1;
> + *     rvalue = reciprocal_value_adv(d >> pre_shift, 32 - pre_shift);
> + *   } else {
> + *     pre_shift = 0;
> + *   }
> + *
> + *   // code generation starts.
> + *   if (imm == 1 << exp) {
> + *     result = n >> exp;
> + *   } else if (rvalue.is_wide_m) {
> + *     // pre_shift must be zero when reached here.
> + *     t = (n * rvalue.m) >> 32;
> + *     result = n - t;
> + *     result >>= 1;
> + *     result += t;
> + *     result >>= rvalue.sh - 1;
> + *   } else {
> + *     if (pre_shift)
> + *       result = n >> pre_shift;
> + *     result = ((u64)result * rvalue.m) >> 32;
> + *     result >>= rvalue.sh;
> + *   }
> + */
> +struct reciprocal_value_adv reciprocal_value_adv(u32 d, u8 prec);
> +
>  #endif /* _LINUX_RECIPROCAL_DIV_H */
> diff --git a/lib/reciprocal_div.c b/lib/reciprocal_div.c
> index fcb4ce682c6f..a41501ebad7c 100644
> --- a/lib/reciprocal_div.c
> +++ b/lib/reciprocal_div.c
> @@ -26,3 +26,40 @@ struct reciprocal_value reciprocal_value(u32 d)
>         return R;
>  }
>  EXPORT_SYMBOL(reciprocal_value);
> +
> +struct reciprocal_value_adv reciprocal_value_adv(u32 d, u8 prec)
> +{
> +       struct reciprocal_value_adv R;
> +       u32 l, post_shift;
> +       u64 mhigh, mlow;
> +
> +       l = fls(d - 1);
> +       post_shift = l;
> +       /* NOTE: mlow/mhigh could overflow u64 when l == 32 which means d has
> +        * MSB set. This case needs to be handled before calling
> +        * "reciprocal_value_adv", please see the comment at
> +        * include/linux/reciprocal_div.h.
> +        */

Shall we handle l == 32 case better? I guess the concern here is extra
handling may
slow down the fast path? If that's the case, we should at least add a
WARNING on the
slow path.

Thanks,
Song


> +       mlow = 1ULL << (32 + l);
> +       do_div(mlow, d);
> +       mhigh = (1ULL << (32 + l)) + (1ULL << (32 + l - prec));
> +       do_div(mhigh, d);
> +
> +       for (; post_shift > 0; post_shift--) {
> +               u64 lo = mlow >> 1, hi = mhigh >> 1;
> +
> +               if (lo >= hi)
> +                       break;
> +
> +               mlow = lo;
> +               mhigh = hi;
> +       }
> +
> +       R.m = (u32)mhigh;
> +       R.sh = post_shift;
> +       R.exp = l;
> +       R.is_wide_m = mhigh > U32_MAX;
> +
> +       return R;
> +}
> +EXPORT_SYMBOL(reciprocal_value_adv);
> --
> 2.17.1
>

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox