Netdev List
* RE: [PATCH net-next 07/27] gianfar: remove use of VLAN_TAG_PRESENT
From: Claudiu Manoil @ 2016-12-13 12:09 UTC (permalink / raw)
  To: Michał Mirosław, netdev@vger.kernel.org
In-Reply-To: <244d34e8fb9a120fa79c40f06e9da7e10c1c0536.1481586602.git.mirq-linux@rere.qmqm.pl>

>-----Original Message-----
>From: Michał Mirosław [mailto:mirq-linux@rere.qmqm.pl]
>Sent: Tuesday, December 13, 2016 2:13 AM
>To: netdev@vger.kernel.org
>Cc: Claudiu Manoil <claudiu.manoil@freescale.com>
>Subject: [PATCH net-next 07/27] gianfar: remove use of VLAN_TAG_PRESENT
>
>Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
>---
> drivers/net/ethernet/freescale/gianfar_ethtool.c | 8 +++-----
> 1 file changed, 3 insertions(+), 5 deletions(-)
>
>diff --git a/drivers/net/ethernet/freescale/gianfar_ethtool.c b/drivers/net/ethernet/freescale/gianfar_ethtool.c
>index 56588f2..95fa647 100644
>--- a/drivers/net/ethernet/freescale/gianfar_ethtool.c
>+++ b/drivers/net/ethernet/freescale/gianfar_ethtool.c
>@@ -1155,11 +1155,9 @@ static int gfar_convert_to_filer(struct ethtool_rx_flow_spec *rule,
> 		prio = vlan_tci_prio(rule);
> 		prio_mask = vlan_tci_priom(rule);
>
>-		if (cfi == VLAN_TAG_PRESENT && cfi_mask == VLAN_TAG_PRESENT) {
>-			vlan |= RQFPR_CFI;
>-			vlan_mask |= RQFPR_CFI;
>-		} else if (cfi != VLAN_TAG_PRESENT &&
>-			   cfi_mask == VLAN_TAG_PRESENT) {
>+		if (cfi_mask) {
>+			if (cfi)
>+				vlan |= RQFPR_CFI;
> 			vlan_mask |= RQFPR_CFI;
> 		}
> 	}

Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com>
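
For readers following the hunk above, here is a minimal userspace sketch of why the shorter condition should behave the same as the two-branch version. It assumes that cfi and cfi_mask can only be 0 or the single CFI bit; the constant values and the names cfi_old(), cfi_new() and cfi_variants_agree() are illustrative stand-ins, not taken from the driver.

```c
#include <stdint.h>

/* Assumed values for illustration only; the kernel headers define the real ones. */
#define VLAN_TAG_PRESENT 0x1000u
#define RQFPR_CFI        0x00001000u

struct cfi_result {
	uint32_t vlan;
	uint32_t vlan_mask;
};

/* Logic as written before the patch. */
static struct cfi_result cfi_old(uint32_t cfi, uint32_t cfi_mask)
{
	struct cfi_result r = { 0, 0 };

	if (cfi == VLAN_TAG_PRESENT && cfi_mask == VLAN_TAG_PRESENT) {
		r.vlan |= RQFPR_CFI;
		r.vlan_mask |= RQFPR_CFI;
	} else if (cfi != VLAN_TAG_PRESENT && cfi_mask == VLAN_TAG_PRESENT) {
		r.vlan_mask |= RQFPR_CFI;
	}
	return r;
}

/* Logic as written after the patch. */
static struct cfi_result cfi_new(uint32_t cfi, uint32_t cfi_mask)
{
	struct cfi_result r = { 0, 0 };

	if (cfi_mask) {
		if (cfi)
			r.vlan |= RQFPR_CFI;
		r.vlan_mask |= RQFPR_CFI;
	}
	return r;
}

/* Returns 1 if both variants agree for every 0-or-bit input pair. */
static int cfi_variants_agree(void)
{
	const uint32_t vals[2] = { 0, VLAN_TAG_PRESENT };
	int i, j;

	for (i = 0; i < 2; i++) {
		for (j = 0; j < 2; j++) {
			struct cfi_result a = cfi_old(vals[i], vals[j]);
			struct cfi_result b = cfi_new(vals[i], vals[j]);

			if (a.vlan != b.vlan || a.vlan_mask != b.vlan_mask)
				return 0;
		}
	}
	return 1;
}
```

If the inputs could carry other bits, the two variants would no longer be guaranteed to match, so the equivalence depends on the masking assumption stated above.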

^ permalink raw reply

* Re: [PATCH iproute2 -net-next] lwt: BPF support for LWT
From: Stephen Hemminger @ 2016-12-12 23:41 UTC (permalink / raw)
  To: Daniel Borkmann; +Cc: tgraf, alexei.starovoitov, netdev
In-Reply-To: <43d8d9ddc604f83e9abff9f998b9581210529c30.1481501217.git.daniel@iogearbox.net>

On Mon, 12 Dec 2016 01:14:35 +0100
Daniel Borkmann <daniel@iogearbox.net> wrote:

> +
> +static int lwt_parse_bpf(struct rtattr *rta, size_t len, int *argcp, char ***argvp,
> +			 int attr, const enum bpf_prog_type bpf_type)

Please break long lines like this.


> +
> +	/* argv is currently the first unparsed argument,
> +	 * but the lwt_parse_encap() caller will move to the next,
> +	 * so step back */
> +	*argcp = argc + 1;

iproute2 uses kernel comment style. 

I went ahead and fixed these.

^ permalink raw reply

* Re: "virtio-net: enable multiqueue by default" in linux-next breaks networking on GCE
From: Wei Xu @ 2016-12-13 17:46 UTC (permalink / raw)
  To: Theodore Ts'o, jasowang; +Cc: netdev, mst, nhorman, davem
In-Reply-To: <20161212233343.q5xlv55rc5npqaqp@thunk.org>


On 2016-12-13 07:33, Theodore Ts'o wrote:
> Hi,
>
> I was doing a last minute regression test of the ext4 tree before
> sending a pull request to Linus, which I do using gce-xfstests[1], and
> I found that using networking was broken on GCE on linux-next.  I was
> using next-20161209, and after bisecting things, I narrowed down the
> commit which is causing things to break to commit 449000102901:
> "virtio-net: enable multiqueue by default".  Reverting this commit on
> top of next-20161209 fixed the problem.
>
> [1] http://thunk.org/gce-xfstests
>
> You can reproduce the problem by building the kernel for Google
> Compute Engine --- I use a config such as this [2], and then try to
> boot a kernel on a VM.  The way I do this involves booting a test
> appliance and then kexec'ing into the kernel to be tested[3], using a
> 2cpu configuration.  (GCE machine type: n1-standard-2)
>
> [2] https://git.kernel.org/cgit/fs/ext2/xfstests-bld.git/tree/kernel-configs/ext4-x86_64-config-4.9
> [3] https://github.com/tytso/xfstests-bld/blob/master/Documentation/gce-xfstests.md
>
> You can then take a look at serial console using a command such as
> "gcloud compute instances get-serial-port-output <instance-name>", and
> you will get something like this (see attached).  The important bit is
> that the dhclient command completely fails to get a
> response from the network, from which I deduce that
> either network send or receive, or both, are badly affected by
> the commit in question.
>
> Please let me know if there's anything I can do to help you debug this
> further.

Hi Ted,
Just had a quick try on GCE, sorry for my stupid questions.

Q1:
Which distribution are you using for the GCE instance?

Q2:
Are you running xfs test as an embedded VM case, which means XFS test
appliance is also a VM inside the GCE instance? Or the kernel is built
for the instance itself?

Q3:
Can this bug be reproduced for kvm-xfstests case? I'm trying to set up
a local test bed if it makes sense.

>
> Cheers,
>
> 						- Ted
>
> Dec 11 23:53:20 xfstests-201612120451 kernel: [    0.000000] Linux version 4.9.0-rc8-ext4-06387-g03e5cbd (tytso@tytso-ssd) (gcc version 4.9.2 (Debian 4.9.2-10) ) #9 SMP Mon Dec 12 04:50:16 UTC 2016
> Dec 11 23:53:20 xfstests-201612120451 kernel: [    0.000000] Command line: root=/dev/sda1 ro console=ttyS0,38400n8 elevator=noop console=ttyS0  fstestcfg=4k fstestset=-g,quick fstestexc= fstestopt=aex fstesttyp=ext4 fstestapi=1.3
> Dec 11 23:53:20 xfstests-201612120451 kernel: [    0.000000] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
> Dec 11 23:53:20 xfstests-201612120451 kernel: [    0.000000] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
> Dec 11 23:53:20 xfstests-201612120451 kernel: [    0.000000] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
> Dec 11 23:53:20 xfstests-201612120451 kernel: [    0.000000] x86/fpu: xstate_offset[2]:  576, xstate_sizes[2]:  256
> Dec 11 23:53:20 xfstests-201612120451 kernel: [    0.000000] x86/fpu: Enabled xstate features 0x7, context size is 832 bytes, using 'standard' format.
> Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Started Load Kernel Modules.
> Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Starting Apply Kernel Variables...
> Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Mounting Configuration File System...
> Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Mounting FUSE Control File System...
> Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Mounted FUSE Control File System.
> Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Mounted Configuration File System.
> Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Started Apply Kernel Variables.
> Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Started Create Static Device Nodes in /dev.
> Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Starting udev Kernel Device Manager...
> Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Started udev Kernel Device Manager.
> Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Started udev Coldplug all Devices.
> Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Starting udev Wait for Complete Device Initialization...
> Dec 11 23:53:20 xfstests-201612120451 systemd-fsck[1659]: xfstests-root: clean, 56268/655360 files, 357439/2620928 blocks
> Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Started File System Check on Root Device.
> Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Starting Remount Root and Kernel File Systems...
> Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Started Remount Root and Kernel File Systems.
> Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Started Various fixups to make systemd work better on Debian.
> Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Starting Load/Save Random Seed...
> Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Starting Local File Systems (Pre).
> Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Reached target Local File Systems (Pre).
> Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Started Load/Save Random Seed.
> Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Started udev Wait for Complete Device Initialization.
> Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Starting Activation of LVM2 logical volumes...
> Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Starting Copy rules generated while the root was ro...
> Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Found device /dev/ttyS0.
> Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Found device /dev/ttyS1.
> Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Started Copy rules generated while the root was ro.
> Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Found device /dev/ttyS2.
> Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Found device /dev/ttyS3.
> Dec 11 23:53:20 xfstests-201612120451 systemd-udevd[2568]: could not open moddep file '/lib/modules/4.9.0-rc8-ext4-06387-g03e5cbd/modules.dep.bin'
> Dec 11 23:53:20 xfstests-201612120451 lvm[2579]: No volume groups found
> Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Started Activation of LVM2 logical volumes.
> Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Starting Encrypted Volumes.
> Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Reached target Encrypted Volumes.
> Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Starting Activation of LVM2 logical volumes...
> Dec 11 23:53:20 xfstests-201612120451 lvm[2625]: No volume groups found
> Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Started Activation of LVM2 logical volumes.
> Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Starting Monitoring of LVM2 mirrors, snapshots etc. using dmeventd or progress polling...
> Dec 11 23:53:20 xfstests-201612120451 lvm[2627]: No volume groups found
> Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Started Monitoring of LVM2 mirrors, snapshots etc. using dmeventd or progress polling.
> Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Starting Local File Systems.
> Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Reached target Local File Systems.
> Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Starting Remote File Systems.
> Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Reached target Remote File Systems.
> Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Starting Trigger Flushing of Journal to Persistent Storage...
> Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Starting Create Volatile Files and Directories...
> Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Starting LSB: Generate ssh host keys if they do not exist...
> Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Starting LSB: Raise network interfaces....
> Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Started Trigger Flushing of Journal to Persistent Storage.
> Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Started Create Volatile Files and Directories.
> Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Started LSB: Generate ssh host keys if they do not exist.
> Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Starting Update UTMP about System Boot/Shutdown...
> Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Started Update UTMP about System Boot/Shutdown.
> Dec 11 23:53:20 xfstests-201612120451 dhclient: Internet Systems Consortium DHCP Client 4.3.1
> Dec 11 23:53:20 xfstests-201612120451 dhclient: Copyright 2004-2014 Internet Systems Consortium.
> Dec 11 23:53:20 xfstests-201612120451 dhclient: All rights reserved.
> Dec 11 23:53:20 xfstests-201612120451 dhclient: For info, please visit https://www.isc.org/software/dhcp/
> Dec 11 23:53:20 xfstests-201612120451 dhclient:
> Dec 11 23:53:20 xfstests-201612120451 networking[2633]: Configuring network interfaces...Internet Systems Consortium DHCP Client 4.3.1
> Dec 11 23:53:20 xfstests-201612120451 networking[2633]: Copyright 2004-2014 Internet Systems Consortium.
> Dec 11 23:53:20 xfstests-201612120451 networking[2633]: All rights reserved.
> Dec 11 23:53:20 xfstests-201612120451 networking[2633]: For info, please visit https://www.isc.org/software/dhcp/
> Dec 11 23:53:20 xfstests-201612120451 dhclient: Listening on LPF/eth0/42:01:0a:f0:00:03
> Dec 11 23:53:20 xfstests-201612120451 dhclient: Sending on   LPF/eth0/42:01:0a:f0:00:03
> Dec 11 23:53:20 xfstests-201612120451 dhclient: Sending on   Socket/fallback
> Dec 11 23:53:20 xfstests-201612120451 dhclient: DHCPREQUEST on eth0 to 255.255.255.255 port 67
> Dec 11 23:53:20 xfstests-201612120451 networking[2633]: Listening on LPF/eth0/42:01:0a:f0:00:03
> Dec 11 23:53:20 xfstests-201612120451 networking[2633]: Sending on   LPF/eth0/42:01:0a:f0:00:03
> Dec 11 23:53:20 xfstests-201612120451 networking[2633]: Sending on   Socket/fallback
> Dec 11 23:53:20 xfstests-201612120451 networking[2633]: DHCPREQUEST on eth0 to 255.255.255.255 port 67
> Dec 11 23:53:20 xfstests-201612120451 dhclient: DHCPREQUEST on eth0 to 255.255.255.255 port 67
> Dec 11 23:53:20 xfstests-201612120451 networking[2633]: DHCPREQUEST on eth0 to 255.255.255.255 port 67
> Dec 11 23:53:20 xfstests-201612120451 dhclient: DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 8
> Dec 11 23:53:20 xfstests-201612120451 networking[2633]: DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 8
> Dec 11 23:53:20 xfstests-201612120451 dhclient: DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 8
> Dec 11 23:53:20 xfstests-201612120451 networking[2633]: DHCP[^[[32m  OK  ^[[0m] DISCOVER on eth0 to 255.255.255.255 port 67 interval 8
> Dec 11 23:53:20 xfstests-201612120451 dhclient: DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 13
> Dec 11 23:53:20 xfstests-201612120451 networking[2633]: DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 13
> Dec 11 23:53:20 xfstests-201612120451 dhclient: DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 17
> Dec 11 23:53:20 xfstests-201612120451 networking[2633]: DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 17
> Dec 11 23:53:20 xfstests-201612120451 dhclient: DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 15
> Dec 11 23:53:20 xfstests-201612120451 networking[2633]: DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 15
> Dec 11 23:53:20 xfstests-201612120451 dhclient: No DHCPOFFERS received.
> Dec 11 23:53:20 xfstests-201612120451 dhclient: Trying recorded lease 10.240.0.3
> Dec 11 23:53:20 xfstests-201612120451 networking[2633]: No DHCPOFFERS received.
> Dec 11 23:53:20 xfstests-201612120451 networking[2633]: Trying recorded lease 10.240.0.3
> Dec 11 23:53:20 xfstests-201612120451 networking[2633]: connect: Network is unreachable
> Dec 11 23:53:20 xfstests-201612120451 logger: /etc/dhcp/dhclient-exit-hooks returned non-zero exit status 2
> Dec 11 23:53:20 xfstests-201612120451 dhclient: bound: renewal in 38598 seconds.
> Dec 11 23:53:20 xfstests-201612120451 networking[2633]: bound: renewal in 38598 seconds.
> Dec 11 23:53:20 xfstests-201612120451 networking[2633]: done.
>

^ permalink raw reply

* sctp: suspicious rcu_dereference_check() usage in sctp_epaddr_lookup_transport
From: Dmitry Vyukov @ 2016-12-13 18:07 UTC (permalink / raw)
  To: Vladislav Yasevich, Neil Horman, David Miller, linux-sctp, netdev,
	LKML, Eric Dumazet, Marcelo Ricardo Leitner
  Cc: syzkaller

Hello,

I am getting the following reports while running syzkaller fuzzer:

[ INFO: suspicious RCU usage. ]
4.9.0+ #85 Not tainted
-------------------------------
./include/linux/rhashtable.h:572 suspicious rcu_dereference_check() usage!

other info that might help us debug this:

rcu_scheduler_active = 1, debug_locks = 0
1 lock held by syz-executor1/18023:
 #0:  (sk_lock-AF_INET){+.+.+.}, at: [<     inline     >] lock_sock include/net/sock.h:1454
 #0:  (sk_lock-AF_INET){+.+.+.}, at: [<ffffffff87bb3ccf>] sctp_getsockopt+0x45f/0x6800 net/sctp/socket.c:6432

stack backtrace:
CPU: 2 PID: 18023 Comm: syz-executor1 Not tainted 4.9.0+ #85
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
Call Trace:
[<     inline     >] __dump_stack lib/dump_stack.c:15
[<        none        >] dump_stack+0x2ee/0x3ef lib/dump_stack.c:51
[<        none        >] lockdep_rcu_suspicious+0x139/0x180 kernel/locking/lockdep.c:4448
[<     inline     >] __rhashtable_lookup ./include/linux/rhashtable.h:572
[<     inline     >] rhltable_lookup ./include/linux/rhashtable.h:660
[<        none        >] sctp_epaddr_lookup_transport+0x641/0x930 net/sctp/input.c:946
[<        none        >] sctp_endpoint_lookup_assoc+0x83/0x120 net/sctp/endpointola.c:335
[<        none        >] sctp_addr_id2transport+0xaf/0x1e0 net/sctp/socket.c:241
[<        none        >] sctp_getsockopt_peer_addr_info+0x216/0x630 net/sctp/socket.c:4625
[<        none        >] sctp_getsockopt+0x2860/0x6800 net/sctp/socket.c:6500
[<        none        >] sock_common_getsockopt+0x9a/0xe0 net/core/sock.c:2685
[<     inline     >] SYSC_getsockopt net/socket.c:1819
[<        none        >] SyS_getsockopt+0x245/0x380 net/socket.c:1801
[<        none        >] entry_SYSCALL_64_fastpath+0x23/0xc6 arch/x86/entry/entry_64.S:203

On commit e7aa8c2eb11ba69b1b69099c3c7bd6be3087b0ba (Dec 12).

^ permalink raw reply

* Re: [PATCH iproute2 1/2] tc/cls_flower: Add dest UDP port to tunnel params
From: Stephen Hemminger @ 2016-12-13 18:17 UTC (permalink / raw)
  To: Hadar Hen Zion; +Cc: netdev, Or Gerlitz, Roi Dayan, Amir Vadai
In-Reply-To: <1481616467-769-2-git-send-email-hadarh@mellanox.com>

On Tue, 13 Dec 2016 10:07:46 +0200
Hadar Hen Zion <hadarh@mellanox.com> wrote:

> Enhance IP tunnel parameters by adding destination UDP port.
> 
> Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
> Reviewed-by: Roi Dayan <roid@mellanox.com>

Both applied, thanks.

^ permalink raw reply

* Re: [PATCH iproute2 V2 1/2] tc: flower: Fix typo and style in flower man page
From: Stephen Hemminger @ 2016-12-13 18:17 UTC (permalink / raw)
  To: Roi Dayan; +Cc: netdev, Amir Vadai, Hadar Hen Zion
In-Reply-To: <1481632742-18020-2-git-send-email-roid@mellanox.com>

On Tue, 13 Dec 2016 14:39:01 +0200
Roi Dayan <roid@mellanox.com> wrote:

> Replace vlan_eth_type with vlan_ethtype.
> 
> Fixes: 745d91726006 ("tc: flower: Introduce vlan support")
> Signed-off-by: Roi Dayan <roid@mellanox.com>
> Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com>

Both applied, thanks.

^ permalink raw reply

* Re: Designing a safe RX-zero-copy Memory Model for Networking
From: Hannes Frederic Sowa @ 2016-12-13 18:39 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, Christoph Lameter
  Cc: John Fastabend, Mike Rapoport, netdev@vger.kernel.org, linux-mm,
	Willem de Bruijn, Björn Töpel, Karlsson, Magnus,
	Alexander Duyck, Mel Gorman, Tom Herbert, Brenden Blanco,
	Tariq Toukan, Saeed Mahameed, Jesse Brandeburg, Kalman Meth,
	Vladislav Yasevich
In-Reply-To: <20161213171028.24dbf519@redhat.com>

On 13.12.2016 17:10, Jesper Dangaard Brouer wrote:
>> What is bad about RDMA is that it is a separate kernel subsystem.
>> What I would like to see is a deeper integration with the network
>> stack so that memory regions can be registered with a network socket
>> and work requests then can be submitted and processed that directly
>> read and write in these regions. The network stack should provide the
>> services that the hardware of the NIC does not support as usual.
> 
> Interesting.  So you even imagine sockets registering memory regions
> with the NIC.  If we had a proper NIC HW filter API across the drivers,
> to register the steering rule (like ibv_create_flow), this would be
> doable, but we don't (DPDK actually has an interesting proposal[1])

On a side note, this is what Windows does with RIO ("registered I/O").
Maybe you want to look at the API to get some ideas: allocating and
pinning down memory in user space and registering that with sockets to
get zero-copy IO.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

* Re: [iproute2 v3 net-next 0/8] Add support for vrf helper
From: Stephen Hemminger @ 2016-12-13 18:44 UTC (permalink / raw)
  To: David Ahern; +Cc: netdev
In-Reply-To: <1481503995-24825-1-git-send-email-dsa@cumulusnetworks.com>

On Sun, 11 Dec 2016 16:53:07 -0800
David Ahern <dsa@cumulusnetworks.com> wrote:

> This series adds support to iproute2 to run a command against a specific
> VRF. The user semantics are similar to 'ip netns'.
> 
> The 'ip vrf' subcommand supports 3 usages:
> 
> 1. Run a command against a given vrf:
>        ip vrf exec NAME CMD
> 
>    Uses the recently committed cgroup/sock BPF option. vrf directory
>    is added to cgroup2 mount. Individual vrfs are created under it. BPF
>    filter is attached to vrf/NAME cgroup2 to set sk_bound_dev_if to the
>    device index of the VRF. From there the current process (ip's pid) is
>    added to the cgroup.procs file and the given command is executed. In
>    doing so all AF_INET/AF_INET6 (ipv4/ipv6) sockets are automatically
>    bound to the VRF domain.
> 
>    The association is inherited parent to child allowing the command to
>    be a shell from which other commands are run relative to the VRF.
> 
> 2. Show the VRF a process is bound to:
>        ip vrf id [PID]
>    This command essentially looks at /proc/pid/cgroup for a "::/vrf/"
>    entry. If pid arg is not given current process id is used.
> 
> 3. Show process ids bound to a VRF
>        ip vrf pids NAME
>    This command dumps the file MNT/vrf/NAME/cgroup.procs since that file
>    shows the process ids in the particular vrf cgroup.
> 
> v3
> - bpf_prog_{at,de}tach changes as requested by Daniel
> - BPF macros added to bpf_util.h versus adding a new file as requested by Daniel
> 
> v2
> - updated subject of patch 3 to avoid spam filters on vger
> 
> David Ahern (8):
>   lib bpf: Add support for BPF_PROG_ATTACH and BPF_PROG_DETACH
>   bpf: export bpf_prog_load
>   bpf: Add BPF_ macros
>   move cmd_exec to lib utils
>   Add filesystem APIs to lib
>   change name_is_vrf to return index
>   libnetlink: Add variant of rtnl_talk that does not display RTNETLINK
>     answers error
>   Introduce ip vrf command
> 
>  include/bpf_util.h   | 186 +++++++++++++++++++++++++++++++++
>  include/libnetlink.h |   3 +
>  include/utils.h      |   4 +
>  ip/Makefile          |   3 +-
>  ip/ip.c              |   4 +-
>  ip/ip_common.h       |   4 +-
>  ip/iplink_vrf.c      |  29 ++++--
>  ip/ipnetns.c         |  34 ------
>  ip/ipvrf.c           | 289 +++++++++++++++++++++++++++++++++++++++++++++++++++
>  lib/Makefile         |   2 +-
>  lib/bpf.c            |  61 +++++++----
>  lib/exec.c           |  41 ++++++++
>  lib/fs.c             | 143 +++++++++++++++++++++++++
>  lib/libnetlink.c     |  20 +++-
>  man/man8/ip-vrf.8    |  88 ++++++++++++++++
>  15 files changed, 841 insertions(+), 70 deletions(-)
>  create mode 100644 ip/ipvrf.c
>  create mode 100644 lib/exec.c
>  create mode 100644 lib/fs.c
>  create mode 100644 man/man8/ip-vrf.8
> 

Thanks, applied. Then I went and cleaned up the long lines and whitespace issues.
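
As a rough illustration of the "ip vrf id" lookup described in the quoted cover letter (scanning /proc/pid/cgroup for a "::/vrf/" entry), the sketch below pulls the VRF name out of a single cgroup line. The helper name vrf_name_from_cgroup_line() and the "0::/vrf/NAME" line shape are assumptions made for this example; this is not the code from ip/ipvrf.c.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy NAME out of a cgroup2 line such as
 * "0::/vrf/NAME\n" into 'name'. Returns 0 on success, or -1 when the
 * line has no "::/vrf/" entry, the name is empty, or it does not fit.
 */
static int vrf_name_from_cgroup_line(const char *line, char *name, size_t len)
{
	static const char marker[] = "::/vrf/";
	const char *p = strstr(line, marker);
	size_t n;

	if (!p)
		return -1;

	p += sizeof(marker) - 1;	/* skip past the marker itself */
	n = strcspn(p, "/\n");		/* name ends at '/', newline, or NUL */
	if (n == 0 || n >= len)
		return -1;

	memcpy(name, p, n);
	name[n] = '\0';
	return 0;
}
```

A caller would read /proc/PID/cgroup line by line and stop at the first line for which the helper returns 0; a process outside any VRF cgroup simply yields no match.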

^ permalink raw reply

* Re: "virtio-net: enable multiqueue by default" in linux-next breaks networking on GCE
From: Theodore Ts'o @ 2016-12-13 19:44 UTC (permalink / raw)
  To: Wei Xu; +Cc: jasowang, netdev, mst, nhorman, davem
In-Reply-To: <bb997932-20d2-42f4-0f42-bd28ae151076@redhat.com>

Jason's patch fixed the issue, so I think we have the proper fix, but
to answer your questions:

On Wed, Dec 14, 2016 at 01:46:44AM +0800, Wei Xu wrote:
> 
> Q1:
> Which distribution are you using for the GCE instance?

The test appliance is based on Debian Jessie.

> Q2:
> Are you running xfs test as an embedded VM case, which means XFS test
> appliance is also a VM inside the GCE instance? Or the kernel is built
> for the instance itself?

No, GCE currently doesn't support running nested VM's (e.g., running
VM's inside GCE).  So the kernel is built for the instance itself.
The way the test appliance works is that it initially boots using the
Debian Jessie default kernel and then we kexec into the kernel under
test.

> Q3:
> Can this bug be reproduced for kvm-xfstests case? I'm trying to set up
> a local test bed if it makes sense.

You definitely can't do it out of the box -- you need to build the
image using "gen-image --networking", and then run "kvm-xfstests -N
shell" as root.  But the bug doesn't reproduce on kvm-xfstests, using
a 4.9 host kernel and linux-next guest kernel.


Cheers,

					- Ted

^ permalink raw reply

* Re: bpf debug info
From: Alexei Starovoitov @ 2016-12-13 19:38 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: netdev@vger.kernel.org, Brenden Blanco, Thomas Graf, Wangnan,
	He Kuang, Kernel Team

On Tue, Nov 29, 2016 at 9:01 AM, Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>> >If I try to run samples/bpf/test_cls_bpf.sh the verifier will complain:
>> >R0=imm0,min_value=0,max_value=0 R1=pkt(id=0,off=0,r=42) R2=pkt_end
>> >112: (0f) r4 += r3
>> >113: (0f) r1 += r4
>> >114: (b7) r0 = 2
>> >115: (69) r2 = *(u16 *)(r1 +2)
>> >invalid access to packet, off=2 size=2, R1(id=3,off=0,r=0)
>> >
>> >Now multiply 115 * 8 and convert to hex. This is address 0x398 in llvm-objdump:
>> >; struct udphdr *udp = data + tp_off;
>> >      388:       r1 += r4
>> >      390:       r0 = 2
>> >; if (udp->dest == htons(DEFAULT_PKTGEN_UDP_PORT) ||
>> >      398:       r2 = *(u16 *)(r1 + 2)
>> >      3a0:       if r2 == 2304 goto 16
>> >
>> >Now it's clear which line of C code is causing the verifier to reject.
>> [...]
>>
>> Could llvm-objdump switch line numbering for bpf same way as verifier
>> output, so mapping step is not really needed?
>
> you mean that llvm-objdump to print 113,114,115 ?
> I guess it's doable. Will give it a try.

Hi Daniel,

your feature request turned out to be pretty straightforward
to implement. Please pull the latest llvm and rebuild llvm-objdump.
It will be printing instruction numbers instead of absolute addresses.
No "multiply 115 * 8 and convert to hex" steps necessary anymore.

Thanks
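
For anyone still on an older llvm-objdump, the manual mapping quoted above can be written down as a small sketch. It assumes every BPF instruction slot is 8 bytes, which is what the "multiply by 8" step relies on; the helper names are made up for this example.

```c
/* Size in bytes of one BPF instruction slot (assumed, per the "* 8" step above). */
#define BPF_INSN_SIZE 8ul

/* Verifier instruction index -> byte offset as shown by older llvm-objdump. */
static unsigned long bpf_insn_index_to_offset(unsigned long idx)
{
	return idx * BPF_INSN_SIZE;
}

/* Byte offset from llvm-objdump -> verifier instruction index. */
static unsigned long bpf_offset_to_insn_index(unsigned long off)
{
	return off / BPF_INSN_SIZE;
}
```

With the numbers from the quoted verifier log, index 115 maps to offset 0x398, and offsets 0x388 and 0x390 map back to instructions 113 and 114.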

^ permalink raw reply

* Re: Designing a safe RX-zero-copy Memory Model for Networking
From: David Miller @ 2016-12-13 19:53 UTC (permalink / raw)
  To: john.fastabend
  Cc: brouer, cl, rppt, netdev, linux-mm, willemdebruijn.kernel,
	bjorn.topel, magnus.karlsson, alexander.duyck, mgorman, tom,
	bblanco, tariqt, saeedm, jesse.brandeburg, METH, vyasevich
In-Reply-To: <5850335F.6090000@gmail.com>

From: John Fastabend <john.fastabend@gmail.com>
Date: Tue, 13 Dec 2016 09:43:59 -0800

> What does "zero-copy send packet-pages to the application/socket that
> requested this" mean? At the moment on x86 page-flipping appears to be
> more expensive than memcpy (I can post some data shortly) and shared
> memory was proposed and rejected for security reasons when we were
> working on bifurcated driver.

The whole idea is that we map all the active RX ring pages into
userspace from the start.

And just how Jesper's page pool work will avoid DMA map/unmap,
it will also avoid changing the userspace mapping of the pages
as well.

Thus avoiding the TLB/VM overhead altogether.


^ permalink raw reply

* Re: [PATCH net-next] netlink: revert broken, broken "2-clause nla_ok()"
From: David Miller @ 2016-12-13 19:55 UTC (permalink / raw)
  To: adobriyan; +Cc: netdev, johannes
In-Reply-To: <20161213193015.GA10610@avx2>

From: Alexey Dobriyan <adobriyan@gmail.com>
Date: Tue, 13 Dec 2016 22:30:15 +0300

> Commit 4f7df337fe79bba1e4c2d525525d63b5ba186bbd
> "netlink: 2-clause nla_ok()" is BROKEN.
> 
> First clause tests if "->nla_len" could even be accessed at all,
> it can not possibly be omitted.
> 
> Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>

Applied, thanks.
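
To make the point about the first clause concrete, here is a userspace sketch of a three-clause check modelled on nla_ok(). The struct nlattr_sketch and the function name nla_ok_sketch() are stand-ins invented for this example, and the kernel's own definition may differ in detail.

```c
#include <stdint.h>

/* Stand-in for the netlink attribute header: length first, then type. */
struct nlattr_sketch {
	uint16_t nla_len;
	uint16_t nla_type;
};

/*
 * Three-clause validity check:
 *  1. enough bytes remain to read the header at all, so that
 *     dereferencing nla->nla_len is safe;
 *  2. the claimed length covers at least the header;
 *  3. the claimed length fits in the remaining bytes.
 * Clause 1 must come first: && short-circuits, so nla_len is never
 * read when fewer than sizeof(*nla) bytes are left.
 */
static int nla_ok_sketch(const struct nlattr_sketch *nla, int remaining)
{
	return remaining >= (int)sizeof(*nla) &&
	       nla->nla_len >= sizeof(*nla) &&
	       nla->nla_len <= remaining;
}
```

Dropping the first clause would let the function read nla_len from a buffer with, say, only one byte left, which is the access the commit message objects to.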

^ permalink raw reply

* Re: Designing a safe RX-zero-copy Memory Model for Networking
From: John Fastabend @ 2016-12-13 20:08 UTC (permalink / raw)
  To: David Miller
  Cc: brouer, cl, rppt, netdev, linux-mm, willemdebruijn.kernel,
	bjorn.topel, magnus.karlsson, alexander.duyck, mgorman, tom,
	bblanco, tariqt, saeedm, jesse.brandeburg, METH, vyasevich
In-Reply-To: <20161213.145333.514056260418695987.davem@davemloft.net>

On 16-12-13 11:53 AM, David Miller wrote:
> From: John Fastabend <john.fastabend@gmail.com>
> Date: Tue, 13 Dec 2016 09:43:59 -0800
> 
>> What does "zero-copy send packet-pages to the application/socket that
>> requested this" mean? At the moment on x86 page-flipping appears to be
>> more expensive than memcpy (I can post some data shortly) and shared
>> memory was proposed and rejected for security reasons when we were
>> working on bifurcated driver.
> 
> The whole idea is that we map all the active RX ring pages into
> userspace from the start.
> 
> And just how Jesper's page pool work will avoid DMA map/unmap,
> it will also avoid changing the userspace mapping of the pages
> as well.
> 
> Thus avoiding the TLB/VM overhead altogether.
> 

I get this but it requires applications to be isolated. The pages from
a queue can not be shared between multiple applications in different
trust domains. And the application has to be cooperative meaning it
can't "look" at data that has not been marked by the stack as OK. In
these schemes we tend to end up with something like virtio/vhost or
af_packet.

Any ACLs/filtering/switching/headers need to be done in hardware or
the application trust boundaries are broken.

If the above can not be met then a copy is needed. What I am trying
to tease out is the above comment along with other statements like
this "can be done with out HW filter features".

.John


^ permalink raw reply

* [PATCH] net: qcom/emac: don't try to claim clocks on ACPI systems
From: Timur Tabi @ 2016-12-13 19:55 UTC (permalink / raw)
  To: David Miller, netdev, Christopher Covington, alokc

On ACPI systems, clocks are not available to drivers directly.  They are
handled exclusively by ACPI and/or firmware, so there is no clock driver.
Calls to clk_get() always fail, so we should not even attempt to claim
any clocks on ACPI systems.

Signed-off-by: Timur Tabi <timur@codeaurora.org>
---
 drivers/net/ethernet/qualcomm/emac/emac.c | 22 ++++++++++++----------
 1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/qualcomm/emac/emac.c b/drivers/net/ethernet/qualcomm/emac/emac.c
index ae32f85..b1c1cdc 100644
--- a/drivers/net/ethernet/qualcomm/emac/emac.c
+++ b/drivers/net/ethernet/qualcomm/emac/emac.c
@@ -627,11 +627,12 @@ static int emac_probe(struct platform_device *pdev)
 	if (ret)
 		goto err_undo_netdev;
 
-	/* initialize clocks */
-	ret = emac_clks_phase1_init(pdev, adpt);
-	if (ret) {
-		dev_err(&pdev->dev, "could not initialize clocks\n");
-		goto err_undo_netdev;
+	if (!has_acpi_companion(&pdev->dev)) {
+		ret = emac_clks_phase1_init(pdev, adpt);
+		if (ret) {
+			dev_err(&pdev->dev, "could not initialize clocks\n");
+			goto err_undo_netdev;
+		}
 	}
 
 	netdev->watchdog_timeo = EMAC_WATCHDOG_TIME;
@@ -655,11 +656,12 @@ static int emac_probe(struct platform_device *pdev)
 	if (ret)
 		goto err_undo_mdiobus;
 
-	/* enable clocks */
-	ret = emac_clks_phase2_init(pdev, adpt);
-	if (ret) {
-		dev_err(&pdev->dev, "could not initialize clocks\n");
-		goto err_undo_mdiobus;
+	if (!has_acpi_companion(&pdev->dev)) {
+		ret = emac_clks_phase2_init(pdev, adpt);
+		if (ret) {
+			dev_err(&pdev->dev, "could not initialize clocks\n");
+			goto err_undo_mdiobus;
+		}
 	}
 
 	emac_mac_reset(adpt);
-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm
Technologies, Inc.  Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply related

* [PATCH iproute2 1/1] tc: pass correct conversion specifier to print 'unsigned int' action index.
From: Roman Mashak @ 2016-12-13 20:31 UTC (permalink / raw)
  To: stephen; +Cc: netdev, jhs, daniel, xiyou.wangcong, Roman Mashak

Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
---
 tc/m_bpf.c      | 2 +-
 tc/m_connmark.c | 2 +-
 tc/m_csum.c     | 3 ++-
 tc/m_gact.c     | 3 ++-
 tc/m_ife.c      | 2 +-
 tc/m_ipt.c      | 2 +-
 tc/m_mirred.c   | 3 ++-
 tc/m_pedit.c    | 2 +-
 tc/m_simple.c   | 2 +-
 tc/m_skbedit.c  | 2 +-
 tc/m_skbmod.c   | 2 +-
 tc/m_vlan.c     | 2 +-
 tc/m_xt.c       | 2 +-
 tc/m_xt_old.c   | 2 +-
 14 files changed, 17 insertions(+), 14 deletions(-)

diff --git a/tc/m_bpf.c b/tc/m_bpf.c
index 9bf2a85..6400724 100644
--- a/tc/m_bpf.c
+++ b/tc/m_bpf.c
@@ -161,7 +161,7 @@ static int bpf_print_opt(struct action_util *au, FILE *f, struct rtattr *arg)
 	}
 
 	fprintf(f, "default-action %s\n", action_n2a(parm->action));
-	fprintf(f, "\tindex %d ref %d bind %d", parm->index, parm->refcnt,
+	fprintf(f, "\tindex %u ref %d bind %d", parm->index, parm->refcnt,
 		parm->bindcnt);
 
 	if (show_stats) {
diff --git a/tc/m_connmark.c b/tc/m_connmark.c
index 20f98e4..295f90d 100644
--- a/tc/m_connmark.c
+++ b/tc/m_connmark.c
@@ -123,7 +123,7 @@ static int print_connmark(struct action_util *au, FILE *f, struct rtattr *arg)
 	ci = RTA_DATA(tb[TCA_CONNMARK_PARMS]);
 
 	fprintf(f, " connmark zone %d\n", ci->zone);
-	fprintf(f, "\t index %d ref %d bind %d", ci->index,
+	fprintf(f, "\t index %u ref %d bind %d", ci->index,
 		ci->refcnt, ci->bindcnt);
 
 	if (show_stats) {
diff --git a/tc/m_csum.c b/tc/m_csum.c
index a6e4c1e..d5b1af6 100644
--- a/tc/m_csum.c
+++ b/tc/m_csum.c
@@ -199,7 +199,8 @@ print_csum(struct action_util *au, FILE *f, struct rtattr *arg)
 		uflag_1, uflag_2, uflag_3,
 		uflag_4, uflag_5, uflag_6,
 		action_n2a(sel->action));
-	fprintf(f, "\tindex %d ref %d bind %d", sel->index, sel->refcnt, sel->bindcnt);
+	fprintf(f, "\tindex %u ref %d bind %d", sel->index, sel->refcnt,
+		sel->bindcnt);
 
 	if (show_stats) {
 		if (tb[TCA_CSUM_TM]) {
diff --git a/tc/m_gact.c b/tc/m_gact.c
index dc04b9f..755a3be 100644
--- a/tc/m_gact.c
+++ b/tc/m_gact.c
@@ -224,7 +224,8 @@ print_gact(struct action_util *au, FILE * f, struct rtattr *arg)
 	fprintf(f, "\n\t random type %s %s val %d",
 		prob_n2a(pp->ptype), action_n2a(pp->paction), pp->pval);
 #endif
-	fprintf(f, "\n\t index %d ref %d bind %d", p->index, p->refcnt, p->bindcnt);
+	fprintf(f, "\n\t index %u ref %d bind %d", p->index, p->refcnt,
+		p->bindcnt);
 	if (show_stats) {
 		if (tb[TCA_GACT_TM]) {
 			struct tcf_t *tm = RTA_DATA(tb[TCA_GACT_TM]);
diff --git a/tc/m_ife.c b/tc/m_ife.c
index e6f6153..f6131b1 100644
--- a/tc/m_ife.c
+++ b/tc/m_ife.c
@@ -312,7 +312,7 @@ static int print_ife(struct action_util *au, FILE *f, struct rtattr *arg)
 				    sizeof(b2)));
 	}
 
-	fprintf(f, "\n\t index %d ref %d bind %d", p->index, p->refcnt,
+	fprintf(f, "\n\t index %u ref %d bind %d", p->index, p->refcnt,
 		p->bindcnt);
 	if (show_stats) {
 		if (tb[TCA_IFE_TM]) {
diff --git a/tc/m_ipt.c b/tc/m_ipt.c
index d6f62bd..1b935ec 100644
--- a/tc/m_ipt.c
+++ b/tc/m_ipt.c
@@ -489,7 +489,7 @@ print_ipt(struct action_util *au, FILE * f, struct rtattr *arg)
 			__u32 index;
 
 			index = rta_getattr_u32(tb[TCA_IPT_INDEX]);
-			fprintf(f, "\n\tindex %d", index);
+			fprintf(f, "\n\tindex %u", index);
 		}
 
 		if (tb[TCA_IPT_CNT]) {
diff --git a/tc/m_mirred.c b/tc/m_mirred.c
index 11f4c9b..35ae21f 100644
--- a/tc/m_mirred.c
+++ b/tc/m_mirred.c
@@ -260,7 +260,8 @@ print_mirred(struct action_util *au, FILE * f, struct rtattr *arg)
 		mirred_n2a(p->eaction), dev, action_n2a(p->action));
 
 	fprintf(f, "\n ");
-	fprintf(f, "\tindex %d ref %d bind %d", p->index, p->refcnt, p->bindcnt);
+	fprintf(f, "\tindex %u ref %d bind %d", p->index, p->refcnt,
+		p->bindcnt);
 
 	if (show_stats) {
 		if (tb[TCA_MIRRED_TM]) {
diff --git a/tc/m_pedit.c b/tc/m_pedit.c
index 891c2ec..8e9bf07 100644
--- a/tc/m_pedit.c
+++ b/tc/m_pedit.c
@@ -527,7 +527,7 @@ int print_pedit(struct action_util *au, FILE *f, struct rtattr *arg)
 
 	fprintf(f, " pedit action %s keys %d\n ",
 		action_n2a(sel->action), sel->nkeys);
-	fprintf(f, "\t index %d ref %d bind %d", sel->index, sel->refcnt,
+	fprintf(f, "\t index %u ref %d bind %d", sel->index, sel->refcnt,
 		sel->bindcnt);
 
 	if (show_stats) {
diff --git a/tc/m_simple.c b/tc/m_simple.c
index 732eaf1..3a8bd91 100644
--- a/tc/m_simple.c
+++ b/tc/m_simple.c
@@ -187,7 +187,7 @@ static int print_simple(struct action_util *au, FILE *f, struct rtattr *arg)
 	simpdata = RTA_DATA(tb[TCA_DEF_DATA]);
 
 	fprintf(f, "Simple <%s>\n", simpdata);
-	fprintf(f, "\t index %d ref %d bind %d", sel->index,
+	fprintf(f, "\t index %u ref %d bind %d", sel->index,
 		sel->refcnt, sel->bindcnt);
 
 	if (show_stats) {
diff --git a/tc/m_skbedit.c b/tc/m_skbedit.c
index 368debc..8660d60 100644
--- a/tc/m_skbedit.c
+++ b/tc/m_skbedit.c
@@ -214,7 +214,7 @@ static int print_skbedit(struct action_util *au, FILE *f, struct rtattr *arg)
 			fprintf(f, " ptype %d", *ptype);
 	}
 
-	fprintf(f, "\n\t index %d ref %d bind %d",
+	fprintf(f, "\n\t index %u ref %d bind %d",
 		p->index, p->refcnt, p->bindcnt);
 
 	if (show_stats) {
diff --git a/tc/m_skbmod.c b/tc/m_skbmod.c
index 0c293fc..acb7771 100644
--- a/tc/m_skbmod.c
+++ b/tc/m_skbmod.c
@@ -237,7 +237,7 @@ static int print_skbmod(struct action_util *au, FILE *f, struct rtattr *arg)
 	if (p->flags & SKBMOD_F_SWAPMAC)
 		fprintf(f, "swap mac ");
 
-	fprintf(f, "\n\t index %d ref %d bind %d", p->index, p->refcnt,
+	fprintf(f, "\n\t index %u ref %d bind %d", p->index, p->refcnt,
 		p->bindcnt);
 	if (show_stats) {
 		if (tb[TCA_SKBMOD_TM]) {
diff --git a/tc/m_vlan.c b/tc/m_vlan.c
index b32f746..44b9375 100644
--- a/tc/m_vlan.c
+++ b/tc/m_vlan.c
@@ -226,7 +226,7 @@ static int print_vlan(struct action_util *au, FILE *f, struct rtattr *arg)
 	}
 	fprintf(f, " %s", action_n2a(parm->action));
 
-	fprintf(f, "\n\t index %d ref %d bind %d", parm->index, parm->refcnt,
+	fprintf(f, "\n\t index %u ref %d bind %d", parm->index, parm->refcnt,
 		parm->bindcnt);
 
 	if (show_stats) {
diff --git a/tc/m_xt.c b/tc/m_xt.c
index 028bad6..dbb5498 100644
--- a/tc/m_xt.c
+++ b/tc/m_xt.c
@@ -372,7 +372,7 @@ print_ipt(struct action_util *au, FILE *f, struct rtattr *arg)
 		__u32 index;
 
 		index = rta_getattr_u32(tb[TCA_IPT_INDEX]);
-		fprintf(f, "\n\tindex %d", index);
+		fprintf(f, "\n\tindex %u", index);
 	}
 
 	if (tb[TCA_IPT_CNT]) {
diff --git a/tc/m_xt_old.c b/tc/m_xt_old.c
index 20a6342..e9cc624 100644
--- a/tc/m_xt_old.c
+++ b/tc/m_xt_old.c
@@ -412,7 +412,7 @@ print_ipt(struct action_util *au, FILE * f, struct rtattr *arg)
 			__u32 index;
 
 			index = rta_getattr_u32(tb[TCA_IPT_INDEX]);
-			fprintf(f, "\n\tindex %d", index);
+			fprintf(f, "\n\tindex %u", index);
 		}
 
 		if (tb[TCA_IPT_CNT]) {
-- 
1.9.1

^ permalink raw reply related

* Re: bpf debug info
From: Daniel Borkmann @ 2016-12-13 19:55 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: netdev@vger.kernel.org, Brenden Blanco, Thomas Graf, Wangnan,
	He Kuang, Kernel Team
In-Reply-To: <CAADnVQKKjYDCqRGrB2UVKf=-KZBpt+5+M4nXXVuVefEGRv5MYQ@mail.gmail.com>

On 12/13/2016 08:38 PM, Alexei Starovoitov wrote:
> On Tue, Nov 29, 2016 at 9:01 AM, Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
>>>> If I try to run samples/bpf/test_cls_bpf.sh the verifier will complain:
>>>> R0=imm0,min_value=0,max_value=0 R1=pkt(id=0,off=0,r=42) R2=pkt_end
>>>> 112: (0f) r4 += r3
>>>> 113: (0f) r1 += r4
>>>> 114: (b7) r0 = 2
>>>> 115: (69) r2 = *(u16 *)(r1 +2)
>>>> invalid access to packet, off=2 size=2, R1(id=3,off=0,r=0)
>>>>
>>>> Now multiply 115 * 8 and convert to hex. This is address 0x398 in llvm-objdump:
>>>> ; struct udphdr *udp = data + tp_off;
>>>>       388:       r1 += r4
>>>>       390:       r0 = 2
>>>> ; if (udp->dest == htons(DEFAULT_PKTGEN_UDP_PORT) ||
>>>>       398:       r2 = *(u16 *)(r1 + 2)
>>>>       3a0:       if r2 == 2304 goto 16
>>>>
>>>> Now it's clear which line of C code is causing the verifier to reject.
>>> [...]
>>>
>>> Could llvm-objdump switch line numbering for bpf same way as verifier
>>> output, so mapping step is not really needed?
>>
>> you mean that llvm-objdump to print 113,114,115 ?
>> I guess it's doable. Will give it a try.
>
> Hi Daniel,
>
> your feature request turned out to be pretty straightforward
> to implement. Please pull the latest llvm and rebuild llvm-objdump.
> It will be printing instruction numbers instead of absolute addresses.
> No "multiply 115 * 8 and convert to hex" steps necessary anymore.

That's great to hear, thanks for following up on this. Sounds about
right to upgrade.
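
For anyone still on an older llvm-objdump, the manual mapping quoted above is just a multiplication by the eBPF instruction size (8 bytes per slot). A minimal sketch, with helper names that are only illustrative:

```c
/* Each eBPF instruction slot is 8 bytes wide. */
#define BPF_INSN_SIZE 8u

/* Verifier instruction number -> byte offset as printed by older
 * llvm-objdump builds. */
unsigned int insn_index_to_offset(unsigned int index)
{
	return index * BPF_INSN_SIZE;
}

/* Byte offset from the disassembly -> verifier instruction number. */
unsigned int offset_to_insn_index(unsigned int offset)
{
	return offset / BPF_INSN_SIZE;
}
```

With the numbers from the example, instruction 115 maps to offset 0x398, and offsets 0x388 and 0x390 map back to instructions 113 and 114.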

Thanks,
Daniel

^ permalink raw reply

* Re: [RFC PATCH v3] audit: use proper refcount locking on audit_sock
From: Paul Moore @ 2016-12-13 20:50 UTC (permalink / raw)
  To: Richard Guy Briggs
  Cc: netdev, linux-kernel, edumazet, linux-audit, xiyou.wangcong,
	dvyukov
In-Reply-To: <61c37ca790bc11bc023aea8f9b70ab3098aa30f5.1481626466.git.rgb@redhat.com>

On Tue, Dec 13, 2016 at 10:03 AM, Richard Guy Briggs <rgb@redhat.com> wrote:
> Resetting audit_sock appears to be racy.
>
> audit_sock was being copied and dereferenced without using a refcount on
> the source sock.
>
> Bump the refcount on the underlying sock when we store a reference in
> audit_sock and release it when we reset audit_sock.  audit_sock
> modification needs the audit_cmd_mutex.
>
> See: https://lkml.org/lkml/2016/11/26/232
>
> Thanks to Eric Dumazet <edumazet@google.com> and Cong Wang
> <xiyou.wangcong@gmail.com> on ideas how to fix it.
>
> Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
> ---
> There has been a lot of change in the audit code that is about to go
> upstream to address audit queue issues.  This patch is based on the
> source tree: git://git.infradead.org/users/pcmoore/audit#next
> ---
>  kernel/audit.c |   28 +++++++++++++++++++++++-----
>  1 files changed, 23 insertions(+), 5 deletions(-)

This looks more reasonable.  I still wonder about synchronization
between threads changing the audit_* connection variables and the
kauditd_thread, but I guess we can treat that as another issue; this
patch fixes a bug and is worth merging now.

I'm building a test kernel right now, assuming nothing blows up I'll
push this patch with the rest of the audit patches tomorrow; if
something bad happens, this is going to miss the first audit pull
request.

> diff --git a/kernel/audit.c b/kernel/audit.c
> index f20eee0..3bb4126 100644
> --- a/kernel/audit.c
> +++ b/kernel/audit.c
> @@ -446,14 +446,19 @@ static void kauditd_retry_skb(struct sk_buff *skb)
>   * Description:
>   * Break the auditd/kauditd connection and move all the records in the retry
>   * queue into the hold queue in case auditd reconnects.
> + * The audit_cmd_mutex must be held when calling this function.
>   */

Don't resend, but in the future please start comments like this on the
previous line.

^ permalink raw reply

* Re: Soft lockup in inet_put_port on 4.6
From: Tom Herbert @ 2016-12-13 20:51 UTC (permalink / raw)
  To: Josef Bacik
  Cc: Hannes Frederic Sowa, Eric Dumazet,
	Linux Kernel Network Developers
In-Reply-To: <1481581466.24490.2@smtp.office365.com>

I think there may be some suspicious code in inet_csk_get_port. At
tb_found there is:

                if (((tb->fastreuse > 0 && reuse) ||
                     (tb->fastreuseport > 0 &&
                      !rcu_access_pointer(sk->sk_reuseport_cb) &&
                      sk->sk_reuseport && uid_eq(tb->fastuid, uid))) &&
                    smallest_size == -1)
                        goto success;
                if (inet_csk(sk)->icsk_af_ops->bind_conflict(sk, tb, true)) {
                        if ((reuse ||
                             (tb->fastreuseport > 0 &&
                              sk->sk_reuseport &&
                              !rcu_access_pointer(sk->sk_reuseport_cb) &&
                              uid_eq(tb->fastuid, uid))) &&
                            smallest_size != -1 && --attempts >= 0) {
                                spin_unlock_bh(&head->lock);
                                goto again;
                        }
                        goto fail_unlock;
                }

AFAICT there is redundancy in these two conditionals.  The same clause
is being checked in both: (tb->fastreuseport > 0 &&
!rcu_access_pointer(sk->sk_reuseport_cb) && sk->sk_reuseport &&
uid_eq(tb->fastuid, uid))) && smallest_size == -1. If this is true the
first conditional should be hit, goto success, and the second will never
evaluate that part to true, unless the sk is changed (do we need
READ_ONCE for sk->sk_reuseport_cb?).
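
To make the relationship between the two tests easier to see, here is a rough userspace model of the snippet. It is only a sketch: the struct and function names are invented, the whole fastreuseport/sk_reuseport/uid clause is collapsed into one flag, bind_conflict() is assumed to have returned true, and the inputs are assumed not to change between the two tests. Note that in the code as quoted the second test requires smallest_size != -1.

```c
#include <stdbool.h>

/* Hypothetical snapshot of the inputs to the two tests. */
struct bind_state {
	bool fastreuse;     /* tb->fastreuse > 0 */
	bool reuse;         /* reuse */
	bool reuseport_ok;  /* fastreuseport/sk_reuseport/uid clause as a whole */
	int smallest_size;
	int attempts;
};

/* Models the first "if": the shortcut to success. */
bool first_check(const struct bind_state *s)
{
	return ((s->fastreuse && s->reuse) || s->reuseport_ok) &&
	       s->smallest_size == -1;
}

/* Models the inner "if" that leads to "goto again"; --attempts >= 0 is
 * written without the side effect. */
bool retry_check(const struct bind_state *s)
{
	return (s->reuse || s->reuseport_ok) &&
	       s->smallest_size != -1 && s->attempts - 1 >= 0;
}

/* Counts the input combinations for which both tests hold at once. */
int count_overlap(void)
{
	static const int sizes[] = { -1, 0, 5 };
	static const int tries[] = { 0, 1, 5 };
	int overlap = 0;

	for (int bits = 0; bits < 8; bits++)
		for (int i = 0; i < 3; i++)
			for (int j = 0; j < 3; j++) {
				struct bind_state s = {
					.fastreuse = bits & 1,
					.reuse = bits & 2,
					.reuseport_ok = bits & 4,
					.smallest_size = sizes[i],
					.attempts = tries[j],
				};
				if (first_check(&s) && retry_check(&s))
					overlap++;
			}
	return overlap;
}
```

Under those assumptions count_overlap() finds no combination where both hold, so the retry path can only be reached with the reuseport clause true if smallest_size is not -1 or the socket state changed in between.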

Another potential issue is that the goto again goes back to doing
the port scan, but if snum had been set originally that doesn't seem
like what we want.

Thanks,
Tom




On Mon, Dec 12, 2016 at 2:24 PM, Josef Bacik <jbacik@fb.com> wrote:
>
> On Mon, Dec 12, 2016 at 1:44 PM, Hannes Frederic Sowa
> <hannes@stressinduktion.org> wrote:
>>
>> On 12.12.2016 19:05, Josef Bacik wrote:
>>>
>>>  On Fri, Dec 9, 2016 at 11:14 PM, Eric Dumazet <eric.dumazet@gmail.com>
>>>  wrote:
>>>>
>>>>  On Fri, 2016-12-09 at 19:47 -0800, Eric Dumazet wrote:
>>>>
>>>>>
>>>>>   Hmm... Does your ephemeral port range include the port your load
>>>>>   balancing app is using ?
>>>>
>>>>
>>>>  I suspect that you might have processes doing bind( port = 0) that are
>>>>  trapped into the bind_conflict() scan ?
>>>>
>>>>  With 100,000 + timewaits there, this possibly hurts.
>>>>
>>>>  Can you try the following loop breaker ?
>>>
>>>
>>>  It doesn't appear that the app is doing bind(port = 0) during normal
>>>  operation.  I tested this patch and it made no difference.  I'm going to
>>>  test simply restarting the app without changing to the SO_REUSEPORT
>>>  option.  Thanks,
>>
>>
>> Would it be possible to trace the time the function uses with trace? If
>> we don't see the number growing considerably over time we probably can
>> rule out that we loop somewhere in there (I would instrument
>> inet_csk_bind_conflict, __inet_hash_connect and inet_csk_get_port).
>>
>> __inet_hash_connect -> __inet_check_established also takes a lock
>> (inet_ehash_lockp) which can be locked from inet_diag code path during
>> socket diag info dumping.
>>
>> Unfortunately we couldn't reproduce it so far. :/
>
>
> So I had a bcc script running to time how long we spent in
> inet_csk_bind_conflict, __inet_hash_connect and inet_csk_get_port, but of
> course I'm an idiot and didn't actually separate out the stats so I could
> tell _which_ one was taking forever.  But anyway here's a normal
> distribution on the box
>
>     Some shit           : count     distribution
>         0 -> 1          : 0        |                                       |
>         2 -> 3          : 0        |                                       |
>         4 -> 7          : 0        |                                       |
>         8 -> 15         : 0        |                                       |
>        16 -> 31         : 0        |                                       |
>        32 -> 63         : 0        |                                       |
>        64 -> 127        : 0        |                                       |
>       128 -> 255        : 0        |                                       |
>       256 -> 511        : 0        |                                       |
>       512 -> 1023       : 0        |                                       |
>      1024 -> 2047       : 74       |                                       |
>      2048 -> 4095       : 10537    |****************************************|
>      4096 -> 8191       : 8497     |********************************       |
>      8192 -> 16383      : 3745     |**************                         |
>     16384 -> 32767      : 300      |*                                      |
>     32768 -> 65535      : 250      |                                       |
>     65536 -> 131071     : 180      |                                       |
>    131072 -> 262143     : 71       |                                       |
>    262144 -> 524287     : 18       |                                       |
>    524288 -> 1048575    : 5        |                                       |
>
> With the times in nanoseconds, and here's the distribution during the
> problem
>
>     Some shit           : count     distribution
>         0 -> 1          : 0        |                                       |
>         2 -> 3          : 0        |                                       |
>         4 -> 7          : 0        |                                       |
>         8 -> 15         : 0        |                                       |
>        16 -> 31         : 0        |                                       |
>        32 -> 63         : 0        |                                       |
>        64 -> 127        : 0        |                                       |
>       128 -> 255        : 0        |                                       |
>       256 -> 511        : 0        |                                       |
>       512 -> 1023       : 0        |                                       |
>      1024 -> 2047       : 21       |                                       |
>      2048 -> 4095       : 21820    |****************************************|
>      4096 -> 8191       : 11598    |*********************                  |
>      8192 -> 16383      : 4337     |*******                                |
>     16384 -> 32767      : 290      |                                       |
>     32768 -> 65535      : 59       |                                       |
>     65536 -> 131071     : 23       |                                       |
>    131072 -> 262143     : 12       |                                       |
>    262144 -> 524287     : 6        |                                       |
>    524288 -> 1048575    : 19       |                                       |
>   1048576 -> 2097151    : 1079     |*                                      |
>   2097152 -> 4194303    : 0        |                                       |
>   4194304 -> 8388607    : 1        |                                       |
>   8388608 -> 16777215   : 0        |                                       |
>  16777216 -> 33554431   : 0        |                                       |
>  33554432 -> 67108863   : 1192     |**                                     |
>               Some shit                     : count     distribution
>                   0 -> 1                    : 0        |                   |
>                   2 -> 3                    : 0        |                   |
>                   4 -> 7                    : 0        |                   |
>                   8 -> 15                   : 0        |                   |
>                  16 -> 31                   : 0        |                   |
>                  32 -> 63                   : 0        |                   |
>                  64 -> 127                  : 0        |                   |
>                 128 -> 255                  : 0        |                   |
>                 256 -> 511                  : 0        |                   |
>                 512 -> 1023                 : 0        |                   |
>                1024 -> 2047                 : 48       |                   |
>                2048 -> 4095                 : 14714    |********************|
>                4096 -> 8191                 : 6769     |*********          |
>                8192 -> 16383                : 2234     |***                |
>               16384 -> 32767                : 422      |                   |
>               32768 -> 65535                : 208      |                   |
>               65536 -> 131071               : 61       |                   |
>              131072 -> 262143               : 10       |                   |
>              262144 -> 524287               : 416      |                   |
>              524288 -> 1048575              : 826      |*                  |
>             1048576 -> 2097151              : 598      |                   |
>             2097152 -> 4194303              : 10       |                   |
>             4194304 -> 8388607              : 0        |                   |
>             8388608 -> 16777215             : 1        |                   |
>            16777216 -> 33554431             : 289      |                   |
>            33554432 -> 67108863             : 921      |*                  |
>            67108864 -> 134217727            : 74       |                   |
>           134217728 -> 268435455            : 75       |                   |
>           268435456 -> 536870911            : 48       |                   |
>           536870912 -> 1073741823           : 25       |                   |
>          1073741824 -> 2147483647           : 3        |                   |
>          2147483648 -> 4294967295           : 2        |                   |
>          4294967296 -> 8589934591           : 1        |                   |
>
> As you can see we start getting tail latencies of up to 4-8 seconds.
> Tomorrow I'll separate out the stats so we can know which function is the
> problem child.  Sorry about not doing that first.  Thanks,
>
> Josef
>
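
For reading the histograms quoted above: each row is a power-of-two bucket of nanosecond values, so the last row of the third table covers roughly 4.3 to 8.6 seconds. A small sketch of that bucketing, with a made-up struct and function name (this is not the bcc implementation):

```c
#include <stdint.h>

/* Inclusive bounds of one histogram row, in nanoseconds. */
struct bucket {
	uint64_t lo;
	uint64_t hi;
};

/* Return the power-of-two bucket that a latency sample falls into,
 * matching rows such as "2048 -> 4095". */
struct bucket log2_bucket(uint64_t ns)
{
	struct bucket b = { 0, 1 };
	uint64_t lo = 1;

	if (ns < 2)
		return b;

	/* Find the largest power of two that is <= ns. */
	while (lo <= ns / 2)
		lo *= 2;

	b.lo = lo;
	b.hi = lo * 2 - 1;
	return b;
}
```

A 5 second sample (5000000000 ns) lands in the 4294967296 -> 8589934591 row, which is where the "4-8 seconds" figure comes from.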

^ permalink raw reply

* RE: [Intel-wired-lan] [PATCH net v2] igb: re-assign hw address pointer on reset after PCI error
From: Brown, Aaron F @ 2016-12-13 20:51 UTC (permalink / raw)
  To: Guilherme G. Piccoli, intel-wired-lan@lists.osuosl.org
  Cc: netdev@vger.kernel.org
In-Reply-To: <1478803603-30306-1-git-send-email-gpiccoli@linux.vnet.ibm.com>

> From: Intel-wired-lan [mailto:intel-wired-lan-bounces@lists.osuosl.org] On
> Behalf Of Guilherme G. Piccoli
> Sent: Thursday, November 10, 2016 10:47 AM
> To: intel-wired-lan@lists.osuosl.org
> Cc: netdev@vger.kernel.org; gpiccoli@linux.vnet.ibm.com
> Subject: [Intel-wired-lan] [PATCH net v2] igb: re-assign hw address pointer on
> reset after PCI error
> 
> Whenever the igb driver detects the result of a read operation returns
> a value composed only by F's (like 0xFFFFFFFF), it will detach the
> net_device, clear the hw_addr pointer and warn to the user that adapter's
> link is lost - those steps happen on igb_rd32().
> 
> In case a PCI error happens on Power architecture, there's a recovery
> mechanism called EEH, that will reset the PCI slot and call driver's
> handlers to reset the adapter and network functionality as well.
> 
> We observed that once hw_addr is NULL after the error is detected on
> igb_rd32(), it's never assigned back, so in the process of resetting
> the network functionality we got a NULL pointer dereference in both
> igb_configure_tx_ring() and igb_configure_rx_ring(). In order to avoid
> such bug, this patch re-assigns the hw_addr value in the slot_reset
> handler.
> 
> Reported-by: Anthony H. Thai <ahthai@us.ibm.com>
> Reported-by: Harsha Thyagaraja <hathyaga@in.ibm.com>
> Signed-off-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com>
> ---
>  drivers/net/ethernet/intel/igb/igb_main.c | 5 +++++
>  1 file changed, 5 insertions(+)

Tested-by: Aaron Brown <aaron.f.brown@intel.com>

^ permalink raw reply

* Re: sctp: suspicious rcu_dereference_check() usage in sctp_epaddr_lookup_transport
From: Marcelo Ricardo Leitner @ 2016-12-13 21:37 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Vladislav Yasevich, Neil Horman, David Miller, linux-sctp, netdev,
	LKML, Eric Dumazet, syzkaller, Xin Long
In-Reply-To: <CACT4Y+bEi6aXTTrk4P37hFnMdyte0voxUVdz8t0XQP95PgH9+w@mail.gmail.com>

On Tue, Dec 13, 2016 at 07:07:01PM +0100, Dmitry Vyukov wrote:
> Hello,
> 
> I am getting the following reports while running syzkaller fuzzer:
> 
> [ INFO: suspicious RCU usage. ]
> 4.9.0+ #85 Not tainted
> -------------------------------
> ./include/linux/rhashtable.h:572 suspicious rcu_dereference_check() usage!
> 
> other info that might help us debug this:
> 
> rcu_scheduler_active = 1, debug_locks = 0
> 1 lock held by syz-executor1/18023:
>  #0:  (sk_lock-AF_INET){+.+.+.}, at: [<     inline     >] lock_sock
> include/net/sock.h:1454
>  #0:  (sk_lock-AF_INET){+.+.+.}, at: [<ffffffff87bb3ccf>]
> sctp_getsockopt+0x45f/0x6800 net/sctp/socket.c:6432
> 
> stack backtrace:
> CPU: 2 PID: 18023 Comm: syz-executor1 Not tainted 4.9.0+ #85
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> Call Trace:
> [<     inline     >] __dump_stack lib/dump_stack.c:15
> [<        none        >] dump_stack+0x2ee/0x3ef lib/dump_stack.c:51
> [<        none        >] lockdep_rcu_suspicious+0x139/0x180
> kernel/locking/lockdep.c:4448
> [<     inline     >] __rhashtable_lookup ./include/linux/rhashtable.h:572
> [<     inline     >] rhltable_lookup ./include/linux/rhashtable.h:660
> [<        none        >] sctp_epaddr_lookup_transport+0x641/0x930
> net/sctp/input.c:946

I think this was introduced in the rhlist conversion. We had removed
some rcu_read_lock() calls on sctp stack because rhashtable was already
calling it, but then we didn't add them back when moving to rhlist.

This code:
+/* return a transport without holding it, as it's only used under sock lock */
 struct sctp_transport *sctp_epaddr_lookup_transport(
                                const struct sctp_endpoint *ep,
                                const union sctp_addr *paddr)
 {
        struct net *net = sock_net(ep->base.sk);
+       struct rhlist_head *tmp, *list;
+       struct sctp_transport *t;
        struct sctp_hash_cmp_arg arg = {
-               .ep    = ep,
                .paddr = paddr,
                .net   = net,
+               .lport = htons(ep->base.bind_addr.port),
        };
 
-       return rhashtable_lookup_fast(&sctp_transport_hashtable, &arg,
-                                     sctp_hash_params);
+       list = rhltable_lookup(&sctp_transport_hashtable, &arg,
+                              sctp_hash_params);

This had an implicit rcu_read_lock() in rhashtable_lookup_fast, but
rhltable_lookup has none, and rhltable_lookup uses _rcu calls, which
assume we have rcu read protection.
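
The difference can be illustrated with a userspace toy model. Nothing below is kernel code: the mock_ functions and counters are invented stand-ins, with a depth counter playing the role of the RCU read-side critical section and a second counter playing the role of the lockdep splat.

```c
static int rcu_depth;          /* mock read-side nesting depth */
static int suspicious_reports; /* simulated "suspicious RCU usage" count */

void mock_rcu_read_lock(void)
{
	rcu_depth++;
}

void mock_rcu_read_unlock(void)
{
	rcu_depth--;
}

/* Stand-in for a lookup that dereferences RCU-protected pointers and
 * expects the caller to already hold the read lock. */
int mock_lookup(int key)
{
	if (rcu_depth == 0)
		suspicious_reports++;
	return key;
}

/* Stand-in for a "_fast" variant that takes the read lock itself. */
int mock_lookup_fast(int key)
{
	int ret;

	mock_rcu_read_lock();
	ret = mock_lookup(key);
	mock_rcu_read_unlock();
	return ret;
}
```

In this model, switching a caller from mock_lookup_fast() to a bare mock_lookup() trips the check unless the caller wraps the lookup in its own lock/unlock pair, which is the situation described for sctp_epaddr_lookup_transport().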

  Marcelo

^ permalink raw reply

* Re: [PATCH] net: qcom/emac: don't try to claim clocks on ACPI systems
From: Florian Fainelli @ 2016-12-13 21:46 UTC (permalink / raw)
  To: Timur Tabi, David Miller, netdev, Christopher Covington, alokc
In-Reply-To: <1481658930-565-1-git-send-email-timur@codeaurora.org>

On 12/13/2016 11:55 AM, Timur Tabi wrote:
> On ACPI systems, clocks are not available to drivers directly.  They are
> handled exclusively by ACPI and/or firmware, so there is no clock driver.
> Calls to clk_get() always fail, so we should not even attempt to claim
> any clocks on ACPI systems.
> 
> Signed-off-by: Timur Tabi <timur@codeaurora.org>
> ---
>  drivers/net/ethernet/qualcomm/emac/emac.c | 22 ++++++++++++----------
>  1 file changed, 12 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/net/ethernet/qualcomm/emac/emac.c b/drivers/net/ethernet/qualcomm/emac/emac.c
> index ae32f85..b1c1cdc 100644
> --- a/drivers/net/ethernet/qualcomm/emac/emac.c
> +++ b/drivers/net/ethernet/qualcomm/emac/emac.c
> @@ -627,11 +627,12 @@ static int emac_probe(struct platform_device *pdev)
>  	if (ret)
>  		goto err_undo_netdev;
>  
> -	/* initialize clocks */
> -	ret = emac_clks_phase1_init(pdev, adpt);
> -	if (ret) {
> -		dev_err(&pdev->dev, "could not initialize clocks\n");
> -		goto err_undo_netdev;
> +	if (!has_acpi_companion(&pdev->dev)) {

Is there a reason why the check is not moved down into
emac_clks_phase{1,2}_init functions? Do you anticipate other
ACPI-related changes in the future that would warrant having this check
moved at a higher level?
-- 
Florian

^ permalink raw reply

* Re: [PATCH] net: qcom/emac: don't try to claim clocks on ACPI systems
From: Timur Tabi @ 2016-12-13 21:54 UTC (permalink / raw)
  To: Florian Fainelli, David Miller, netdev, Christopher Covington,
	alokc
In-Reply-To: <d6683e1c-803d-3859-f125-93fecaa0df02@gmail.com>

On 12/13/2016 03:46 PM, Florian Fainelli wrote:
> Is there a reason why the check is not moved down into
> emac_clks_phase{1,2}_init functions? Do you anticipate other
> ACPI-related changes in the future that would warrant having this check
> moved at a higher level?

No, this is the last ACPI-related change that I expect.  I could move 
the check into those functions, but I don't see how that's any different 
than what I'm doing now.  My way avoids calling a function altogether, 
your way calls into a function only to have it return immediately.

But I don't have any strong feelings either way.  I will change it if 
you want me to.

-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm
Technologies, Inc.  Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply

* Re: [PATCH] net: qcom/emac: don't try to claim clocks on ACPI systems
From: Florian Fainelli @ 2016-12-13 22:02 UTC (permalink / raw)
  To: Timur Tabi, David Miller, netdev, Christopher Covington, alokc
In-Reply-To: <58506E00.9040801@codeaurora.org>

On 12/13/2016 01:54 PM, Timur Tabi wrote:
> On 12/13/2016 03:46 PM, Florian Fainelli wrote:
>> Is there a reason why the check is not moved down into
>> emac_clks_phase{1,2}_init functions? Do you anticipate other
>> ACPI-related changes in the future that would warrant having this check
>> moved at a higher level?
> 
> No, this is the last ACPI-related change that I expect.  I could move
> the check into those functions, but I don't see how that's any different
> than what I'm doing now.  My way avoids calling a function altogether,
> your way calls into a function only to have it return immediately.
> 
> But I don't have any strong feelings either way.  I will change it if
> you want me to.

No strong feelings either, it just seems easier and safer to move the
check down in the function and make it return success rather than
potentially affecting the error path within the caller of
emac_clks_phase{1,2}_init here.
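
Roughly, the shape being suggested looks like the sketch below. It is a userspace mock and not the emac driver: struct mock_dev, mock_clk_get() and the other mock_ names are placeholders, with has_acpi standing in for has_acpi_companion().

```c
#include <stdbool.h>

/* Placeholder device state; has_acpi stands in for has_acpi_companion(). */
struct mock_dev {
	bool has_acpi;
	int clk_calls;   /* how often the mock clock lookup ran */
};

/* Mock clock lookup: fails on ACPI systems, where no clock driver exists. */
int mock_clk_get(struct mock_dev *dev)
{
	dev->clk_calls++;
	return dev->has_acpi ? -1 : 0;
}

/* The ACPI check lives inside the init helper, which reports success
 * without touching any clocks. */
int mock_clks_phase1_init(struct mock_dev *dev)
{
	if (dev->has_acpi)
		return 0;

	return mock_clk_get(dev);
}

/* The caller's error handling stays as it was before the change. */
int mock_probe(struct mock_dev *dev)
{
	int ret = mock_clks_phase1_init(dev);

	if (ret)
		return ret;

	return 0;
}
```

The caller keeps a single call plus error check, and the ACPI special case stays local to the clock helpers.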
-- 
Florian

^ permalink raw reply

* Re: [PATCH] net: qcom/emac: don't try to claim clocks on ACPI systems
From: Timur Tabi @ 2016-12-13 22:05 UTC (permalink / raw)
  To: Florian Fainelli, David Miller, netdev, Christopher Covington,
	alokc
In-Reply-To: <e4666d2c-1690-0513-b2e2-57733f04a69c@gmail.com>

On 12/13/2016 04:02 PM, Florian Fainelli wrote:
> No strong feelings either, it just seems easier and safer to move the
> check down in the function and make it return success rather than
> potentially affecting the error path within the caller of
> emac_clks_phase{1,2}_init here.

I suppose that makes sense.  I'll post a V2.

-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm
Technologies, Inc.  Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply

* Re: Soft lockup in inet_put_port on 4.6
From: Craig Gallek @ 2016-12-13 23:03 UTC (permalink / raw)
  To: Tom Herbert
  Cc: Josef Bacik, Hannes Frederic Sowa, Eric Dumazet,
	Linux Kernel Network Developers
In-Reply-To: <CALx6S34S82kFVqYW1M0ZuHx_Mxut4QaLFVerLX4aEsksrFirXg@mail.gmail.com>

On Tue, Dec 13, 2016 at 3:51 PM, Tom Herbert <tom@herbertland.com> wrote:
> I think there may be some suspicious code in inet_csk_get_port. At
> tb_found there is:
>
>                 if (((tb->fastreuse > 0 && reuse) ||
>                      (tb->fastreuseport > 0 &&
>                       !rcu_access_pointer(sk->sk_reuseport_cb) &&
>                       sk->sk_reuseport && uid_eq(tb->fastuid, uid))) &&
>                     smallest_size == -1)
>                         goto success;
>                 if (inet_csk(sk)->icsk_af_ops->bind_conflict(sk, tb, true)) {
>                         if ((reuse ||
>                              (tb->fastreuseport > 0 &&
>                               sk->sk_reuseport &&
>                               !rcu_access_pointer(sk->sk_reuseport_cb) &&
>                               uid_eq(tb->fastuid, uid))) &&
>                             smallest_size != -1 && --attempts >= 0) {
>                                 spin_unlock_bh(&head->lock);
>                                 goto again;
>                         }
>                         goto fail_unlock;
>                 }
>
> AFAICT there is redundancy in these two conditionals.  The same clause
> is being checked in both: (tb->fastreuseport > 0 &&
> !rcu_access_pointer(sk->sk_reuseport_cb) && sk->sk_reuseport &&
> uid_eq(tb->fastuid, uid))) && smallest_size == -1. If this is true the
> first conditional should be hit, goto success, and the second will never
> evaluate that part to true, unless the sk is changed (do we need
> READ_ONCE for sk->sk_reuseport_cb?).
That's an interesting point... It looks like this function also
changed in 4.6 from using a single local_bh_disable() at the beginning
with several spin_lock(&head->lock) to exclusively
spin_lock_bh(&head->lock) at each locking point.  Perhaps the full bh
disable variant was preventing the timers in your stack trace from
running interleaved with this function before?

^ permalink raw reply

