Netdev List
 help / color / mirror / Atom feed
* [PATCH v3 1/2] dt-bindings: add device tree binding for Allwinner XR819 SDIO Wi-Fi
From: Icenowy Zheng @ 2017-10-03 16:59 UTC (permalink / raw)
  To: Kalle Valo, Rob Herring, Maxime Ripard, Chen-Yu Tsai
  Cc: linux-wireless-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA, devicetree-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-sunxi-/JYPxA39Uh5TLH3MbocFFw, Icenowy Zheng
In-Reply-To: <20171003165944.13056-1-icenowy-h8G6r0blFSE@public.gmane.org>

Allwinner XR819 is a SDIO Wi-Fi chip, which has the functionality to use
an out-of-band interrupt pin instead of SDIO in-band interrupt.

Add the device tree binding of this chip, in order to make it possible
to add this interrupt pin to device trees.

Signed-off-by: Icenowy Zheng <icenowy-h8G6r0blFSE@public.gmane.org>
Acked-by: Rob Herring <robh-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
Changes in v3:
- Renames the node name.
- Adds ACK from Rob.
Changes in v2:
- Removed status property in example.
- Added required property reg.

 .../bindings/net/wireless/allwinner,xr819.txt      | 38 ++++++++++++++++++++++
 1 file changed, 38 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/net/wireless/allwinner,xr819.txt

diff --git a/Documentation/devicetree/bindings/net/wireless/allwinner,xr819.txt b/Documentation/devicetree/bindings/net/wireless/allwinner,xr819.txt
new file mode 100644
index 000000000000..7ae40441e343
--- /dev/null
+++ b/Documentation/devicetree/bindings/net/wireless/allwinner,xr819.txt
@@ -0,0 +1,38 @@
+Allwinner XRadio wireless SDIO devices
+
+This node provides properties for controlling the XRadio wireless device. The
+node is expected to be specified as a child node to the SDIO controller that
+connects the device to the system.
+
+Required properties:
+
+ - reg : The SDIO function number, see "Use of function subnodes" in
+	../../mmc/mmc.txt.
+ - compatible : Should be "allwinner,xr819".
+
+Optional properties:
+ - interrupt-parent : the phandle for the interrupt controller to which the
+	device interrupts are connected.
+ - interrupts : specifies attributes for the out-of-band interrupt (host-wake).
+	When not specified the device will use in-band SDIO interrupts.
+
+Example:
+
+mmc1: mmc@01c10000 {
+	#address-cells = <1>;
+	#size-cells = <0>;
+
+	pinctrl-names = "default";
+	pinctrl-0 = <&mmc1_pins_a>;
+	vmmc-supply = <&reg_vcc_wifi>;
+	mmc-pwrseq = <&wifi_pwrseq>;
+	bus-width = <4>;
+	non-removable;
+
+	xr819: wifi@1 {
+		reg = <1>;
+		compatible = "allwinner,xr819";
+		interrupt-parent = <&pio>;
+		interrupts = <6 10 IRQ_TYPE_EDGE_RISING>;
+	};
+};
-- 
2.13.5

^ permalink raw reply related

* [PATCH v3 0/2] Allwinner XR819 SDIO Wi-Fi DT binding and OPi Zero XR819 IRQ
From: Icenowy Zheng @ 2017-10-03 16:59 UTC (permalink / raw)
  To: Kalle Valo, Rob Herring, Maxime Ripard, Chen-Yu Tsai
  Cc: linux-wireless-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA, devicetree-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-sunxi-/JYPxA39Uh5TLH3MbocFFw, Icenowy Zheng

The Allwinner XR819 SDIO Wi-Fi chip supports an out-of-band interrupt line,
and the in-band interrupt is also supported.

However the current out-of-tree driver uses the out-of-band interrupt by
default.

This patchset adds the device tree binding for the chip as well as the
out-of-band interrupt, then adds the interrupt to the device tree of
Orange Pi Zero.

Icenowy Zheng (1):
  dt-bindings: add device tree binding for Allwinner XR819 SDIO Wi-Fi

Sergey Matyukevich (1):
  ARM: sun8i: h2+: specify wifi interrupts for Orange Pi Zero

 .../bindings/net/wireless/allwinner,xr819.txt      | 38 ++++++++++++++++++++++
 arch/arm/boot/dts/sun8i-h2-plus-orangepi-zero.dts  |  3 ++
 2 files changed, 41 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/net/wireless/allwinner,xr819.txt

-- 
2.13.5

--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH net-next] cxgb4: Update comment for min_mtu
From: David Miller @ 2017-10-03 16:56 UTC (permalink / raw)
  To: ganeshgr; +Cc: netdev, nirranjan, indranil, venkatesh, arjun
In-Reply-To: <1507011185-7447-1-git-send-email-ganeshgr@chelsio.com>

From: Ganesh Goudar <ganeshgr@chelsio.com>
Date: Tue,  3 Oct 2017 11:43:05 +0530

> From: Arjun Vynipadath <arjun@chelsio.com>
> 
> We have lost a comment for minimum mtu value set for netdevice with
> 'commit d894be57ca92 ("ethernet: use net core MTU range checking in
> more drivers"). Updating it accordingly.
> 
> Signed-off-by: Arjun Vynipadath <arjun@chelsio.com>
> Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>

Applied.

^ permalink raw reply

* Re: [PATCH V2] Fix a sleep-in-atomic bug in shash_setkey_unaligned
From: Andy Lutomirski @ 2017-10-03 16:46 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Andy Lutomirski, Jia-Ju Bai, David S. Miller, Neil Horman,
	vyasevich, Kalle Valo, Linux Crypto Mailing List,
	Network Development, linux-sctp, Linux Wireless List
In-Reply-To: <20171003052643.GB22750@gondor.apana.org.au>

On Mon, Oct 2, 2017 at 10:26 PM, Herbert Xu <herbert@gondor.apana.org.au> wrote:
> On Mon, Oct 02, 2017 at 09:18:24PM -0700, Andy Lutomirski wrote:
>> > On Oct 2, 2017, at 7:25 PM, Jia-Ju Bai <baijiaju1990@163.com> wrote:
>> >
>> > The SCTP program may sleep under a spinlock, and the function call path is:
>> > sctp_generate_t3_rtx_event (acquire the spinlock)
>> >  sctp_do_sm
>> >    sctp_side_effects
>> >      sctp_cmd_interpreter
>> >        sctp_make_init_ack
>> >          sctp_pack_cookie
>> >            crypto_shash_setkey
>> >              shash_setkey_unaligned
>> >                kmalloc(GFP_KERNEL)
>>
>> I'm going to go out on a limb here: why on Earth is out crypto API so
>> full of indirection that we allocate memory at all here?
>
> The crypto API operates on a one key per-tfm basis.  So normally
> tfm allocation and key setting is done once only and not done on
> the data path.
>
> I have looked at the SCTP code and it appears to fit this paradigm.
> That is, we should be able to allocate the tfm and set the key when
> the key is actually generated via get_random_bytes, rather than every
> time the key is used which is not only a waste but as you see runs
> into API issues.

It's a waste because it loses a pre-computation advantage.

The fact that it has memory allocation issues is crypto API's fault,
full stop.  There is no legit reason to need to allocate anything.

^ permalink raw reply

* Re: [PATCH net] net: br: Fix igmp snooping offload with CONFIG_BRIDGE_VLAN_FILTERING
From: Ido Schimmel @ 2017-10-03 16:42 UTC (permalink / raw)
  To: Vivien Didelot
  Cc: Andrew Lunn, Toshiaki Makita, Toshiaki Makita, David Miller,
	netdev
In-Reply-To: <87infwtd0b.fsf@weeman.i-did-not-set--mail-host-address--so-tickle-me>

On Tue, Oct 03, 2017 at 12:25:08PM -0400, Vivien Didelot wrote:
> Andrew Lunn <andrew@lunn.ch> writes:
> 
> >> The vlan will be effective only when vlan_filtering is enabled.
> >> When vlan_filtering is disabled, vlan information is still kept in the
> >> bridge and gets effective later when vlan_filtering becomes enable.
> >
> > O.K, so things are starting to get clearer.
> >
> > So when vlan filtering is disabled, the hardware should just ignore
> > the requests to add the vlan to the hardware?
> >
> > When vlan_filtering is enabled, are all the vlans in the software
> > bridge again offloaded? Or do we need to remember all the vlans which
> > we ignored while vlan filtering was disabled? The average switch has
> > nowhere to store these disabled vlans. It can only store active vlans.
> 
> When vlan_filtering is enabled on the bridge, the bridge code does
> propagates the default_pvid again if I recall correctly.
> 
> In my opinion the hardware mustn't ignore the VLAN requests, because we
> seem to agree that vlan_filtering disabled means that the target ports
> should not care yet about 802.1Q. So having some unused hardware VLAN
> entries and some ports with disabled 802.1Q mode must work together.
> 
> That being said we still have the wrong hardware FDB populated when
> CONFIG_BRIDGE_VLAN_FILTERING is enabled but not vlan_filtering...

The driver can make sure it's able to handle the configured
`vlan_filtering` state during port enslavement to the bridge and also
forbid it from being toggled once it's enslaved.

^ permalink raw reply

* Re: BUG in free_netdev() on ppp link deletion
From: Guillaume Nault @ 2017-10-03 16:40 UTC (permalink / raw)
  To: Beniamino Galvani
  Cc: linux-ppp, netdev, Paul Mackerras, David Ahern, Gao Feng
In-Reply-To: <20171003074413.GA26158@tp>

On Tue, Oct 03, 2017 at 09:44:14AM +0200, Beniamino Galvani wrote:
>  Call Trace:
>   ppp_destroy_interface+0xd8/0xe0 [ppp_generic]
>   ppp_disconnect_channel+0xda/0x110 [ppp_generic]
>   ppp_unregister_channel+0x5e/0x110 [ppp_generic]
>   pppox_unbind_sock+0x23/0x30 [pppox]
>   pppoe_connect+0x130/0x440 [pppoe]
>   SYSC_connect+0x98/0x110
>   ? do_fcntl+0x2c0/0x5d0
>   SyS_connect+0xe/0x10
>   entry_SYSCALL_64_fastpath+0x1a/0xa5
>  RIP: 0033:0x7fa71f4af840
>  RSP: 002b:00007ffe4ea40bf8 EFLAGS: 00000246 ORIG_RAX: 000000000000002a
>  RAX: ffffffffffffffda RBX: 0000556d37ae0538 RCX: 00007fa71f4af840
>  RDX: 000000000000001e RSI: 00007ffe4ea40c00 RDI: 0000000000000008
>  RBP: 0000556d37b2a1b0 R08: 0000556d396e95b0 R09: 0000000000000008
>  R10: 00000000aaaaaaab R11: 0000000000000246 R12: 0000556d37adc008
>  R13: 0000556d37adc004 R14: 0000556d37b2a1a4 R15: 0000000000000000
>  Code: 04 00 00 04 e8 cb 52 e3 ff 5b 41 5c 41 5d 5d c3 41 0f b7 84 24 32 02 00 00 4c 89 e7 48 29 c7 e8 80 8b aa ff 5b 41 5c 41 5d 5d c3 <0f> 0b 0f 1f 80 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41 57 41 
>  RIP: free_netdev+0x107/0x110 RSP: ffffc28a40573d88
>  ---[ end trace ed294ff0cc40eeff ]---
> 
> To reproduce this, establish a PPP connection through pppd, then bring
> down and delete the ppp interface:
> 
>  # pppd nodetach lock user client plugin rp-pppoe.so ens11 noauth nodeflate password password &
>  Plugin rp-pppoe.so loaded.
>  RP-PPPoE plugin version 3.8p compiled against pppd 2.4.7
>  PPP session is 16
>  Connected to fe:54:00:5f:04:13 via interface ens11
>  Using interface ppp0
>  Connect: ppp0 <--> ens11
>  CHAP authentication succeeded: Access granted
>  CHAP authentication succeeded
>  peer from calling number FE:54:00:5F:04:13 authorized
>  local  IP address 3.1.1.10
>  remote IP address 3.1.1.1
> 
>  # ip l set ppp0 down
>  # ip l del ppp0
> 
> It does not happen every time but only when ppp_destroy_interface() is
> called with dev->reg_state = UNREGISTERING, set by the concurrent
> rtnl_delete_link().
> 
Indeed, we have a race here: ppp_destroy_interface() can be called before
netdev_run_todo() completes. I'm working on it.

^ permalink raw reply

* Re: v4.14-rc2/arm64 kernel BUG at net/core/skbuff.c:2626
From: Eric Dumazet @ 2017-10-03 16:36 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Mark Rutland, LKML, netdev, linux-arm-kernel, syzkaller,
	David S. Miller, Willem de Bruijn
In-Reply-To: <CACT4Y+azruiH-oEXVFen8WiDHhz0PS3DF7OdrkSCJYVRmkhoiQ@mail.gmail.com>

On Tue, Oct 3, 2017 at 9:06 AM, Dmitry Vyukov <dvyukov@google.com> wrote:
> On Tue, Oct 3, 2017 at 5:38 PM, 'Eric Dumazet' via syzkaller
> <syzkaller@googlegroups.com> wrote:
>> On Tue, Oct 3, 2017 at 8:19 AM, Dmitry Vyukov <dvyukov@google.com> wrote:
>>> On Mon, Oct 2, 2017 at 4:42 PM, 'Eric Dumazet' via syzkaller
>>> <syzkaller@googlegroups.com> wrote:
>>>> On Mon, Oct 2, 2017 at 7:21 AM, Mark Rutland <mark.rutland@arm.com> wrote:
>>>>> Hi Eric,
>>>>>
>>>>> On Mon, Oct 02, 2017 at 06:36:32AM -0700, Eric Dumazet wrote:
>>>>>> On Mon, Oct 2, 2017 at 3:49 AM, Mark Rutland <mark.rutland@arm.com> wrote:
>>>>>> > I hit the below splat at net/core/skbuff.c:2626 while fuzzing v4.14-rc2
>>>>>> > on arm64 with Syzkaller. This is the BUG_ON(len) at the end of
>>>>>> > skb_copy_and_csum_bits().
>>>>>
>>>>>> > kernel BUG at net/core/skbuff.c:2626!
>>>>>
>>>>>> > [<ffff200009e03214>] skb_copy_and_csum_bits+0x8dc/0xae0 net/core/skbuff.c:2626
>>>>>> > [<ffff20000a01d244>] icmp_glue_bits+0xa4/0x2a0 net/ipv4/icmp.c:357
>>>>>> > [<ffff200009f3f0d4>] __ip_append_data+0x10e4/0x20a8 net/ipv4/ip_output.c:1018
>>>>>> > [<ffff200009f41a88>] ip_append_data.part.3+0xe8/0x1a0 net/ipv4/ip_output.c:1170
>>>>>> > [<ffff200009f46e74>] ip_append_data+0xa4/0xb0 net/ipv4/ip_output.c:1173
>>>>>> > [<ffff20000a01ccc8>] icmp_push_reply+0x1b8/0x690 net/ipv4/icmp.c:375
>>>>>> > [<ffff20000a0211b0>] icmp_send+0x1070/0x1890 net/ipv4/icmp.c:741
>>>>>> > [<ffff200009f41d48>] ip_fragment.constprop.4+0x208/0x340 net/ipv4/ip_output.c:552
>>>>>> > [<ffff200009f42228>] ip_finish_output+0x3a8/0xab0 net/ipv4/ip_output.c:315
>>>>>> > [<ffff200009f468c4>] NF_HOOK_COND include/linux/netfilter.h:238 [inline]
>>>>>> > [<ffff200009f468c4>] ip_output+0x284/0x790 net/ipv4/ip_output.c:405
>>>>>> > [<ffff200009f43204>] dst_output include/net/dst.h:458 [inline]
>>>>>> > [<ffff200009f43204>] ip_local_out+0x9c/0x1b8 net/ipv4/ip_output.c:124
>>>>>> > [<ffff200009f445e8>] ip_queue_xmit+0x850/0x18e0 net/ipv4/ip_output.c:504
>>>>>> > [<ffff200009fb091c>] tcp_transmit_skb+0x107c/0x3338 net/ipv4/tcp_output.c:1123
>>>>>> > [<ffff200009fbbcc4>] __tcp_retransmit_skb+0x614/0x1d18 net/ipv4/tcp_output.c:2847
>>>>>> > [<ffff200009fbd840>] tcp_send_loss_probe+0x478/0x7d0 net/ipv4/tcp_output.c:2457
>>>>>> > [<ffff200009fc707c>] tcp_write_timer_handler+0x50c/0x7e8 net/ipv4/tcp_timer.c:557
>>>>>> > [<ffff200009fc73d0>] tcp_write_timer+0x78/0x170 net/ipv4/tcp_timer.c:579
>>>>>> > [<ffff2000082f8980>] call_timer_fn+0x1b8/0x430 kernel/time/timer.c:1281
>>>>>> > [<ffff2000082f8dcc>] expire_timers+0x1d4/0x320 kernel/time/timer.c:1320
>>>>>> > [<ffff2000082f912c>] __run_timers kernel/time/timer.c:1620 [inline]
>>>>>> > [<ffff2000082f912c>] run_timer_softirq+0x214/0x5f0 kernel/time/timer.c:1646
>>>>>> > [<ffff2000080826c0>] __do_softirq+0x350/0xc0c kernel/softirq.c:284
>>>>>> > [<ffff200008170af4>] do_softirq_own_stack include/linux/interrupt.h:498 [inline]
>>>>>> > [<ffff200008170af4>] invoke_softirq kernel/softirq.c:371 [inline]
>>>>>> > [<ffff200008170af4>] irq_exit+0x1dc/0x2f8 kernel/softirq.c:405
>>>>>> > [<ffff2000082a95bc>] __handle_domain_irq+0xdc/0x230 kernel/irq/irqdesc.c:647
>>>>>> > [<ffff2000080820ac>] handle_domain_irq include/linux/irqdesc.h:175 [inline]
>>>>>> > [<ffff2000080820ac>] gic_handle_irq+0x6c/0xe0 drivers/irqchip/irq-gic.c:367
>>>>>
>>>>>> This is most likely a bug caused by syzkaller setting a ridiculous MTU
>>>>>> on loopback device, below minimum size of ipv4 MTU.
>>>>>
>>>>>> I tried to track it in August [1], but it seems hard to find all the
>>>>>> issues with this.
>>>>>>
>>>>>> commit c780a049f9bf442314335372c9abc4548bfe3e44
>>>>>> Author: Eric Dumazet <edumazet@google.com>
>>>>>> Date:   Wed Aug 16 11:09:12 2017 -0700
>>>>>>
>>>>>>     ipv4: better IP_MAX_MTU enforcement
>>>>>>
>>>>>>     While working on yet another syzkaller report, I found
>>>>>>     that our IP_MAX_MTU enforcements were not properly done.
>>>>>>
>>>>>>     gcc seems to reload dev->mtu for min(dev->mtu, IP_MAX_MTU), and
>>>>>>     final result can be bigger than IP_MAX_MTU :/
>>>>>>
>>>>>>     This is a problem because device mtu can be changed on other cpus or
>>>>>>     threads.
>>>>>>
>>>>>>     While this patch does not fix the issue I am working on, it is
>>>>>>     probably worth addressing it.
>>>>>
>>>>> Just to check I've understood correctly, are you suggesting that the
>>>>> IPv4 code should also check the dev->mtu against a IP_MIN_MTU (which
>>>>> doesn't seem to exist today)?
>>>>
>>>> We have plenty of places this is checked.
>>>>
>>>> For example, trying to set MTU < 68 usually removes IPv4 addresses and routes.
>>>>
>>>> Problem is : these checks are not fool proof yet.
>>>>
>>>> ( Only the admin was supposed to play these games )
>>>>
>>>>>
>>>>> Otherwise, I do spot another potential issue. The writer side (e.g. most
>>>>> net_device::ndo_change_mtu implementations and the __dev_set_mtu()
>>>>> fallback) doesn't use WRITE_ONCE().
>>>>
>>>> It does not matter how many strange values can be observed by the reader :
>>>> We must be fool proof anyway from reader point of view, so the
>>>> WRITE_ONCE() is not strictly needed.
>>>
>>>
>>> Note if writer stores some temporal garbage there (which C language
>>> perfectly allows), it does not matter what we do on reader side --
>>> reader won't get correct data anyway. Say mtu changes from 1000 to
>>> 2000, but writer temporary stores 1 there, reader can observe 1 while
>>> it must not. Synchronization is always a game of two.
>>
>> Since we have no sync here, a reader _must_ cope with any MTU value.
>>
>> We need to care of any value, so we do not care how dummy writers can be.
>>
>> Sure, a WRITE_ONCE() will help avoiding some strange values being written,
>>  but since we _allow_ writers to write such strange values,
>> there is really no point pretending to be safe here.
>>
>> Adding a WRITE_ONCE() will not fix the bug.
>
>
> Reader must cope with any value. But there is an additional
> requirement that it must behave correctly.

As soon as we allow admin to change MTU on a device, we _are_ dropping packets.
This is absolutely normal so far.

On most NIC , an MTU change closes and opens the port, typically
blocking the device for several seconds.

Networking is best effort, there is no 'requirement' that networking
stack is supposed to cope with any events,
including silly admins.

The following is still legal

while :
do
 ifconfig lo mu random
done

And nothing asks us to deliver any packet while this loop is running.

We only have to prevent crashes ;)

> If mtu was 1000 and then
> reset to 2000 once (and not other manipulations with mtu), then
> correct behavior is either (1) sending packets with mtu 1000 or (2)
> sending packets with mtu 2000 (after mtu change) and nothing else.
> Sending packets with mtu 500, dropping packets because mtu is observed
> to be 1, or formatting hard drive are all incorrect behaviors and must
> not happen.
>
> What you say is valid for communication with user-space
> (copy_form_user, etc). Because there we don't control write side and
> racy writes are indistinguishable from intentional writes that do the
> same.

This is absolutely not relevant to the bug we are looking at at this
moment, even if true.

Unless you found the root cause, I will kindly ask you to open another
thread on lkml
to not pollute this one.

Thank you !

^ permalink raw reply

* Re: [PATCH iproute2] iproute: build more easily on Android
From: Lorenzo Colitti @ 2017-10-03 16:35 UTC (permalink / raw)
  To: enh; +Cc: Stephen Hemminger, netdev@vger.kernel.org
In-Reply-To: <CAJgzZor7VMuTWgRUvVp_q=DHEm5bdgHNWxBJ3wG+X9GWTEwB7Q@mail.gmail.com>

On Tue, Oct 3, 2017 at 5:23 AM, enh <enh@google.com> wrote:
>> Rather than moving everything, why not make kernel headers directory
>> configurable as part of the configure script setup process.
>
> the problem is that C libraries with their our own uapi headers still
> need your app-specific headers. to build iproute2 we need to put
> iproute2's include/ on our include path, but then the fact that your
> different uapi headers are *under* that directory causes the conflict.

Right - when building iproute2 we must have .../iproute2/include in
the include paths.

So when, say, ip/link_iptnl.c does #include <linux/in.h>, that file is
in two places in the path - the C library includes and the iproute
includes, and those two files conflict with each other.

There's no way to tell the compiler "use external/iproute2/include but
not external/iproute2/include/linux". But if the iproute2 files are in
uapi/ , then a simple #include <linux/in.h> won't find the UAPI copy
and the files won't conflict.

^ permalink raw reply

* [net-next 9/9] fm10k: fix mis-ordered parameters in declaration for .ndo_set_vf_bw
From: Jeff Kirsher @ 2017-10-03 16:31 UTC (permalink / raw)
  To: davem; +Cc: Jacob Keller, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20171003163138.47569-1-jeffrey.t.kirsher@intel.com>

From: Jacob Keller <jacob.e.keller@intel.com>

We've had support for setting both a minimum and maximum bandwidth via
.ndo_set_vf_bw since commit 883a9ccbae56 ("fm10k: Add support for SR-IOV
to driver", 2014-09-20).

Likely because we do not support minimum rates, the declaration
mis-ordered the "unused" parameter, which causes warnings when analyzed
with cppcheck.

Fix this warning by properly declaring the min_rate and max_rate
variables in the declaration and definition (rather than using
"unused"). Also rename "rate" to max_rate so as to clarify that we only
support setting the maximum rate.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/fm10k/fm10k.h     | 4 ++--
 drivers/net/ethernet/intel/fm10k/fm10k_iov.c | 9 +++++----
 2 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k.h b/drivers/net/ethernet/intel/fm10k/fm10k.h
index 40856bc0f3b9..46973fb234c5 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k.h
+++ b/drivers/net/ethernet/intel/fm10k/fm10k.h
@@ -562,8 +562,8 @@ s32 fm10k_iov_update_pvid(struct fm10k_intfc *interface, u16 glort, u16 pvid);
 int fm10k_ndo_set_vf_mac(struct net_device *netdev, int vf_idx, u8 *mac);
 int fm10k_ndo_set_vf_vlan(struct net_device *netdev,
 			  int vf_idx, u16 vid, u8 qos, __be16 vlan_proto);
-int fm10k_ndo_set_vf_bw(struct net_device *netdev, int vf_idx, int rate,
-			int unused);
+int fm10k_ndo_set_vf_bw(struct net_device *netdev, int vf_idx,
+			int __always_unused min_rate, int max_rate);
 int fm10k_ndo_get_vf_config(struct net_device *netdev,
 			    int vf_idx, struct ifla_vf_info *ivi);
 
diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_iov.c b/drivers/net/ethernet/intel/fm10k/fm10k_iov.c
index 4a17cc903eed..ea3ab24265ee 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_iov.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_iov.c
@@ -613,7 +613,7 @@ int fm10k_ndo_set_vf_vlan(struct net_device *netdev, int vf_idx, u16 vid,
 }
 
 int fm10k_ndo_set_vf_bw(struct net_device *netdev, int vf_idx,
-			int __always_unused unused, int rate)
+			int __always_unused min_rate, int max_rate)
 {
 	struct fm10k_intfc *interface = netdev_priv(netdev);
 	struct fm10k_iov_data *iov_data = interface->iov_data;
@@ -624,14 +624,15 @@ int fm10k_ndo_set_vf_bw(struct net_device *netdev, int vf_idx,
 		return -EINVAL;
 
 	/* rate limit cannot be less than 10Mbs or greater than link speed */
-	if (rate && ((rate < FM10K_VF_TC_MIN) || rate > FM10K_VF_TC_MAX))
+	if (max_rate &&
+	    (max_rate < FM10K_VF_TC_MIN || max_rate > FM10K_VF_TC_MAX))
 		return -EINVAL;
 
 	/* store values */
-	iov_data->vf_info[vf_idx].rate = rate;
+	iov_data->vf_info[vf_idx].rate = max_rate;
 
 	/* update hardware configuration */
-	hw->iov.ops.configure_tc(hw, vf_idx, rate);
+	hw->iov.ops.configure_tc(hw, vf_idx, max_rate);
 
 	return 0;
 }
-- 
2.14.2

^ permalink raw reply related

* [net-next 8/9] fm10k: prefer %s and __func__ for diagnostic prints
From: Jeff Kirsher @ 2017-10-03 16:31 UTC (permalink / raw)
  To: davem; +Cc: Jacob Keller, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20171003163138.47569-1-jeffrey.t.kirsher@intel.com>

From: Jacob Keller <jacob.e.keller@intel.com>

Don't hard code the function names in the diagnostic output when these
reset related routines fail. Instead, use %s and __func__ so that future
refactors don't need to change the print outs.

Additionally, while we are here, add missing function header comments
for the new reset_prepare and reset_done function handlers.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/fm10k/fm10k_pci.c | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_pci.c b/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
index 58538ce997e1..1e9ae3197b17 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
@@ -2588,11 +2588,18 @@ static void fm10k_io_resume(struct pci_dev *pdev)
 
 	if (err)
 		dev_warn(&pdev->dev,
-			 "fm10k_io_resume failed: %d\n", err);
+			 "%s failed: %d\n", __func__, err);
 	else
 		netif_device_attach(netdev);
 }
 
+/**
+ * fm10k_io_reset_prepare - called when PCI function is about to be reset
+ * @pdev: Pointer to PCI device
+ *
+ * This callback is called when the PCI function is about to be reset,
+ * allowing the device driver to prepare for it.
+ */
 static void fm10k_io_reset_prepare(struct pci_dev *pdev)
 {
 	/* warn incase we have any active VF devices */
@@ -2602,6 +2609,13 @@ static void fm10k_io_reset_prepare(struct pci_dev *pdev)
 	fm10k_prepare_suspend(pci_get_drvdata(pdev));
 }
 
+/**
+ * fm10k_io_reset_done - called when PCI function has finished resetting
+ * @pdev: Pointer to PCI device
+ *
+ * This callback is called just after the PCI function is reset, such as via
+ * /sys/class/net/<enpX>/device/reset or similar.
+ */
 static void fm10k_io_reset_done(struct pci_dev *pdev)
 {
 	struct fm10k_intfc *interface = pci_get_drvdata(pdev);
@@ -2609,7 +2623,7 @@ static void fm10k_io_reset_done(struct pci_dev *pdev)
 
 	if (err) {
 		dev_warn(&pdev->dev,
-			 "fm10k_io_reset_notify failed: %d\n", err);
+			 "%s failed: %d\n", __func__, err);
 		netif_device_detach(interface->netdev);
 	}
 }
-- 
2.14.2

^ permalink raw reply related

* [net-next 7/9] fm10k: Fix misuse of net_ratelimit()
From: Jeff Kirsher @ 2017-10-03 16:31 UTC (permalink / raw)
  To: davem; +Cc: Joe Perches, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20171003163138.47569-1-jeffrey.t.kirsher@intel.com>

From: Joe Perches <joe@perches.com>

Correct the backward logic using !net_ratelimit()

Miscellanea:

o Add a blank line before the error return label

Signed-off-by: Joe Perches <joe@perches.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/fm10k/fm10k_main.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_main.c b/drivers/net/ethernet/intel/fm10k/fm10k_main.c
index 5d56ed5ad7a6..dbd69310f263 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_main.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_main.c
@@ -806,9 +806,10 @@ static int fm10k_tso(struct fm10k_ring *tx_ring,
 	tx_desc->mss = cpu_to_le16(skb_shinfo(skb)->gso_size);
 
 	return 1;
+
 err_vxlan:
 	tx_ring->netdev->features &= ~NETIF_F_GSO_UDP_TUNNEL;
-	if (!net_ratelimit())
+	if (net_ratelimit())
 		netdev_err(tx_ring->netdev,
 			   "TSO requested for unsupported tunnel, disabling offload\n");
 	return -1;
-- 
2.14.2

^ permalink raw reply related

* [net-next 6/9] fm10k: bump version number
From: Jeff Kirsher @ 2017-10-03 16:31 UTC (permalink / raw)
  To: davem; +Cc: Jacob Keller, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20171003163138.47569-1-jeffrey.t.kirsher@intel.com>

From: Jacob Keller <jacob.e.keller@intel.com>

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/fm10k/fm10k_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_main.c b/drivers/net/ethernet/intel/fm10k/fm10k_main.c
index 189d52a8a605..5d56ed5ad7a6 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_main.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_main.c
@@ -28,7 +28,7 @@
 
 #include "fm10k.h"
 
-#define DRV_VERSION	"0.21.7-k"
+#define DRV_VERSION	"0.22.1-k"
 #define DRV_SUMMARY	"Intel(R) Ethernet Switch Host Interface Driver"
 const char fm10k_driver_version[] = DRV_VERSION;
 char fm10k_driver_name[] = "fm10k";
-- 
2.14.2

^ permalink raw reply related

* [net-next 4/9] fm10k: introduce a message queue for MAC/VLAN messages
From: Jeff Kirsher @ 2017-10-03 16:31 UTC (permalink / raw)
  To: davem; +Cc: Jacob Keller, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20171003163138.47569-1-jeffrey.t.kirsher@intel.com>

From: Jacob Keller <jacob.e.keller@intel.com>

Under some circumstances, when dealing with a large number of MAC
address or VLAN updates at once, the fm10k driver, particularly the VFs
can overload the mailbox with too many messages at once.

This results in a mailbox timeout, which causes the driver to initiate
a reset. During the reset, we re-send all the same messages that
originally caused the timeout. This results in a cycle of resets each
triggering a future reset.

To fix or avoid this, we introduce a workqueue item which monitors
a queue of MAC and VLAN requests. These requests are queued to the end
of the list, and we process as a FIFO periodically.

Initially we only handle requests for the netdev, but we do handle
unicast MAC addresses, multicast MAC addresses, and update VLAN
requests.

A future patch will add support to use this queue for handling MAC
update requests from the VF<->PF mailbox.

The MAC/VLAN work item will keep checking to make sure that each request
does not overflow the mailbox and cause a timeout. If it might, then the
work item will reschedule itself a short time later. This avoids any
reset cycle, since we never send the message if the mailbox is not
ready.

As an alternative, we tried increasing the mailbox message FIFO, but
this just delays the problem and results in needless memory waste on the
system. Our new message queue is dynamically allocated so only uses as
much memory as it needs. Additionally, it need not be contiguous like
the Tx and Rx FIFOs.

Note that this patch chose to only create a queue for MAC and VLAN
messages, since these are the only messages sent in a large enough
volume to cause the reset loop. Other messages are very unlikely to
overflow the mailbox Tx FIFO so easily.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/fm10k/fm10k.h        |  39 +++++
 drivers/net/ethernet/intel/fm10k/fm10k_netdev.c | 199 ++++++++++++++++++-----
 drivers/net/ethernet/intel/fm10k/fm10k_pci.c    | 201 ++++++++++++++++++++++++
 3 files changed, 397 insertions(+), 42 deletions(-)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k.h b/drivers/net/ethernet/intel/fm10k/fm10k.h
index 74542e9d63c7..40856bc0f3b9 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k.h
+++ b/drivers/net/ethernet/intel/fm10k/fm10k.h
@@ -248,6 +248,29 @@ struct fm10k_udp_port {
 	__be16			port;
 };
 
+enum fm10k_macvlan_request_type {
+	FM10K_UC_MAC_REQUEST,
+	FM10K_MC_MAC_REQUEST,
+	FM10K_VLAN_REQUEST
+};
+
+struct fm10k_macvlan_request {
+	enum fm10k_macvlan_request_type type;
+	struct list_head list;
+	union {
+		struct fm10k_mac_request {
+			u8 addr[ETH_ALEN];
+			u16 glort;
+			u16 vid;
+		} mac;
+		struct fm10k_vlan_request {
+			u32 vid;
+			u8 vsi;
+		} vlan;
+	};
+	bool set;
+};
+
 /* one work queue for entire driver */
 extern struct workqueue_struct *fm10k_workqueue;
 
@@ -276,6 +299,9 @@ enum fm10k_state_t {
 	__FM10K_SERVICE_SCHED,
 	__FM10K_SERVICE_REQUEST,
 	__FM10K_SERVICE_DISABLE,
+	__FM10K_MACVLAN_SCHED,
+	__FM10K_MACVLAN_REQUEST,
+	__FM10K_MACVLAN_DISABLE,
 	__FM10K_LINK_DOWN,
 	__FM10K_UPDATING_STATS,
 	/* This value must be last and determines the BITMAP size */
@@ -368,6 +394,12 @@ struct fm10k_intfc {
 	struct list_head vxlan_port;
 	struct list_head geneve_port;
 
+	/* MAC/VLAN update queue */
+	struct list_head macvlan_requests;
+	struct delayed_work macvlan_task;
+	/* MAC/VLAN update queue lock */
+	spinlock_t macvlan_lock;
+
 #ifdef CONFIG_DEBUG_FS
 	struct dentry *dbg_intfc;
 #endif /* CONFIG_DEBUG_FS */
@@ -487,6 +519,7 @@ void fm10k_up(struct fm10k_intfc *interface);
 void fm10k_down(struct fm10k_intfc *interface);
 void fm10k_update_stats(struct fm10k_intfc *interface);
 void fm10k_service_event_schedule(struct fm10k_intfc *interface);
+void fm10k_macvlan_schedule(struct fm10k_intfc *interface);
 void fm10k_update_rx_drop_en(struct fm10k_intfc *interface);
 #ifdef CONFIG_NET_POLL_CONTROLLER
 void fm10k_netpoll(struct net_device *netdev);
@@ -507,6 +540,12 @@ void fm10k_reset_rx_state(struct fm10k_intfc *);
 int fm10k_setup_tc(struct net_device *dev, u8 tc);
 int fm10k_open(struct net_device *netdev);
 int fm10k_close(struct net_device *netdev);
+int fm10k_queue_vlan_request(struct fm10k_intfc *interface, u32 vid,
+			     u8 vsi, bool set);
+int fm10k_queue_mac_request(struct fm10k_intfc *interface, u16 glort,
+			    const unsigned char *addr, u16 vid, bool set);
+void fm10k_clear_macvlan_queue(struct fm10k_intfc *interface,
+			       u16 glort, bool vlans);
 
 /* Ethtool */
 void fm10k_set_ethtool_ops(struct net_device *dev);
diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c b/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c
index 77d495fedced..81e4425f0529 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c
@@ -758,11 +758,132 @@ static bool fm10k_host_mbx_ready(struct fm10k_intfc *interface)
 	return (hw->mac.type == fm10k_mac_vf || interface->host_ready);
 }
 
+/**
+ * fm10k_queue_vlan_request - Queue a VLAN update request
+ * @interface: the fm10k interface structure
+ * @vid: the VLAN vid
+ * @vsi: VSI index number
+ * @set: whether to set or clear
+ *
+ * This function queues up a VLAN update. For VFs, this must be sent to the
+ * managing PF over the mailbox. For PFs, we'll use the same handling so that
+ * it's similar to the VF. This avoids storming the PF<->VF mailbox with too
+ * many VLAN updates during reset.
+ */
+int fm10k_queue_vlan_request(struct fm10k_intfc *interface,
+			     u32 vid, u8 vsi, bool set)
+{
+	struct fm10k_macvlan_request *request;
+	unsigned long flags;
+
+	/* This must be atomic since we may be called while the netdev
+	 * addr_list_lock is held
+	 */
+	request = kzalloc(sizeof(*request), GFP_ATOMIC);
+	if (!request)
+		return -ENOMEM;
+
+	request->type = FM10K_VLAN_REQUEST;
+	request->vlan.vid = vid;
+	request->vlan.vsi = vsi;
+	request->set = set;
+
+	spin_lock_irqsave(&interface->macvlan_lock, flags);
+	list_add_tail(&request->list, &interface->macvlan_requests);
+	spin_unlock_irqrestore(&interface->macvlan_lock, flags);
+
+	fm10k_macvlan_schedule(interface);
+
+	return 0;
+}
+
+/**
+ * fm10k_queue_mac_request - Queue a MAC update request
+ * @interface: the fm10k interface structure
+ * @glort: the target glort for this update
+ * @addr: the address to update
+ * @vid: the vid to update
+ * @sync: whether to add or remove
+ *
+ * This function queues up a MAC request for sending to the switch manager.
+ * A separate thread monitors the queue and sends updates to the switch
+ * manager. Return 0 on success, and negative error code on failure.
+ **/
+int fm10k_queue_mac_request(struct fm10k_intfc *interface, u16 glort,
+			    const unsigned char *addr, u16 vid, bool set)
+{
+	struct fm10k_macvlan_request *request;
+	unsigned long flags;
+
+	/* This must be atomic since we may be called while the netdev
+	 * addr_list_lock is held
+	 */
+	request = kzalloc(sizeof(*request), GFP_ATOMIC);
+	if (!request)
+		return -ENOMEM;
+
+	if (is_multicast_ether_addr(addr))
+		request->type = FM10K_MC_MAC_REQUEST;
+	else
+		request->type = FM10K_UC_MAC_REQUEST;
+
+	ether_addr_copy(request->mac.addr, addr);
+	request->mac.glort = glort;
+	request->mac.vid = vid;
+	request->set = set;
+
+	spin_lock_irqsave(&interface->macvlan_lock, flags);
+	list_add_tail(&request->list, &interface->macvlan_requests);
+	spin_unlock_irqrestore(&interface->macvlan_lock, flags);
+
+	fm10k_macvlan_schedule(interface);
+
+	return 0;
+}
+
+/**
+ * fm10k_clear_macvlan_queue - Cancel pending updates for a given glort
+ * @interface: the fm10k interface structure
+ * @glort: the target glort to clear
+ * @vlans: true to clear VLAN messages, false to ignore them
+ *
+ * Cancel any outstanding MAC/VLAN requests for a given glort. This is
+ * expected to be called when a logical port goes down.
+ **/
+void fm10k_clear_macvlan_queue(struct fm10k_intfc *interface,
+			       u16 glort, bool vlans)
+
+{
+	struct fm10k_macvlan_request *r, *tmp;
+	unsigned long flags;
+
+	spin_lock_irqsave(&interface->macvlan_lock, flags);
+
+	/* Free any outstanding MAC/VLAN requests for this interface */
+	list_for_each_entry_safe(r, tmp, &interface->macvlan_requests, list) {
+		switch (r->type) {
+		case FM10K_MC_MAC_REQUEST:
+		case FM10K_UC_MAC_REQUEST:
+			/* Don't free requests for other interfaces */
+			if (r->mac.glort != glort)
+				break;
+			/* fall through */
+		case FM10K_VLAN_REQUEST:
+			if (vlans) {
+				list_del(&r->list);
+				kfree(r);
+			}
+			break;
+		}
+	}
+
+	spin_unlock_irqrestore(&interface->macvlan_lock, flags);
+}
+
 static int fm10k_uc_vlan_unsync(struct net_device *netdev,
 				const unsigned char *uc_addr)
 {
 	struct fm10k_intfc *interface = netdev_priv(netdev);
-	struct fm10k_hw *hw = &interface->hw;
 	u16 glort = interface->glort;
 	u16 vid = interface->vid;
 	bool set = !!(vid / VLAN_N_VID);
@@ -771,10 +892,7 @@ static int fm10k_uc_vlan_unsync(struct net_device *netdev,
 	/* drop any leading bits on the VLAN ID */
 	vid &= VLAN_N_VID - 1;
 
-	if (fm10k_host_mbx_ready(interface))
-		err = hw->mac.ops.update_uc_addr(hw, glort, uc_addr,
-						 vid, set, 0);
-
+	err = fm10k_queue_mac_request(interface, glort, uc_addr, vid, set);
 	if (err)
 		return err;
 
@@ -786,7 +904,6 @@ static int fm10k_mc_vlan_unsync(struct net_device *netdev,
 				const unsigned char *mc_addr)
 {
 	struct fm10k_intfc *interface = netdev_priv(netdev);
-	struct fm10k_hw *hw = &interface->hw;
 	u16 glort = interface->glort;
 	u16 vid = interface->vid;
 	bool set = !!(vid / VLAN_N_VID);
@@ -795,9 +912,7 @@ static int fm10k_mc_vlan_unsync(struct net_device *netdev,
 	/* drop any leading bits on the VLAN ID */
 	vid &= VLAN_N_VID - 1;
 
-	if (fm10k_host_mbx_ready(interface))
-		err = hw->mac.ops.update_mc_addr(hw, glort, mc_addr, vid, set);
-
+	err = fm10k_queue_mac_request(interface, glort, mc_addr, vid, set);
 	if (err)
 		return err;
 
@@ -855,18 +970,14 @@ static int fm10k_update_vid(struct net_device *netdev, u16 vid, bool set)
 
 	/* only need to update the VLAN if not in promiscuous mode */
 	if (!(netdev->flags & IFF_PROMISC)) {
-		err = hw->mac.ops.update_vlan(hw, vid, 0, set);
+		err = fm10k_queue_vlan_request(interface, vid, 0, set);
 		if (err)
 			goto err_out;
 	}
 
-	/* update our base MAC address if host's mailbox is ready */
-	if (fm10k_host_mbx_ready(interface))
-		err = hw->mac.ops.update_uc_addr(hw, interface->glort,
-						 hw->mac.addr, vid, set, 0);
-	else
-		err = -EHOSTDOWN;
-
+	/* Update our base MAC address */
+	err = fm10k_queue_mac_request(interface, interface->glort,
+				      hw->mac.addr, vid, set);
 	if (err)
 		goto err_out;
 
@@ -910,7 +1021,6 @@ static u16 fm10k_find_next_vlan(struct fm10k_intfc *interface, u16 vid)
 
 static void fm10k_clear_unused_vlans(struct fm10k_intfc *interface)
 {
-	struct fm10k_hw *hw = &interface->hw;
 	u32 vid, prev_vid;
 
 	/* loop through and find any gaps in the table */
@@ -922,7 +1032,7 @@ static void fm10k_clear_unused_vlans(struct fm10k_intfc *interface)
 
 		/* send request to clear multiple bits at a time */
 		prev_vid += (vid - prev_vid - 1) << FM10K_VLAN_LENGTH_SHIFT;
-		hw->mac.ops.update_vlan(hw, prev_vid, 0, false);
+		fm10k_queue_vlan_request(interface, prev_vid, 0, false);
 	}
 }
 
@@ -937,15 +1047,11 @@ static int __fm10k_uc_sync(struct net_device *dev,
 	if (!is_valid_ether_addr(addr))
 		return -EADDRNOTAVAIL;
 
-	/* update table with current entries if host's mailbox is ready */
-	if (!fm10k_host_mbx_ready(interface))
-		return -EHOSTDOWN;
-
 	for (vid = hw->mac.default_vid ? fm10k_find_next_vlan(interface, 0) : 1;
 	     vid < VLAN_N_VID;
 	     vid = fm10k_find_next_vlan(interface, vid)) {
-		err = hw->mac.ops.update_uc_addr(hw, glort, addr,
-						 vid, sync, 0);
+		err = fm10k_queue_mac_request(interface, glort,
+					      addr, vid, sync);
 		if (err)
 			return err;
 	}
@@ -1002,15 +1108,18 @@ static int __fm10k_mc_sync(struct net_device *dev,
 	struct fm10k_intfc *interface = netdev_priv(dev);
 	struct fm10k_hw *hw = &interface->hw;
 	u16 vid, glort = interface->glort;
+	s32 err;
 
-	/* update table with current entries if host's mailbox is ready */
-	if (!fm10k_host_mbx_ready(interface))
-		return 0;
+	if (!is_multicast_ether_addr(addr))
+		return -EADDRNOTAVAIL;
 
 	for (vid = hw->mac.default_vid ? fm10k_find_next_vlan(interface, 0) : 1;
 	     vid < VLAN_N_VID;
 	     vid = fm10k_find_next_vlan(interface, vid)) {
-		hw->mac.ops.update_mc_addr(hw, glort, addr, vid, sync);
+		err = fm10k_queue_mac_request(interface, glort,
+					      addr, vid, sync);
+		if (err)
+			return err;
 	}
 
 	return 0;
@@ -1050,7 +1159,8 @@ static void fm10k_set_rx_mode(struct net_device *dev)
 	if (interface->xcast_mode != xcast_mode) {
 		/* update VLAN table */
 		if (xcast_mode == FM10K_XCAST_MODE_PROMISC)
-			hw->mac.ops.update_vlan(hw, FM10K_VLAN_ALL, 0, true);
+			fm10k_queue_vlan_request(interface, FM10K_VLAN_ALL,
+						 0, true);
 		if (interface->xcast_mode == FM10K_XCAST_MODE_PROMISC)
 			fm10k_clear_unused_vlans(interface);
 
@@ -1098,22 +1208,20 @@ void fm10k_restore_rx_state(struct fm10k_intfc *interface)
 					       interface->glort_count, true);
 
 	/* update VLAN table */
-	hw->mac.ops.update_vlan(hw, FM10K_VLAN_ALL, 0,
-				xcast_mode == FM10K_XCAST_MODE_PROMISC);
+	fm10k_queue_vlan_request(interface, FM10K_VLAN_ALL, 0,
+				 xcast_mode == FM10K_XCAST_MODE_PROMISC);
 
 	/* Add filter for VLAN 0 */
-	hw->mac.ops.update_vlan(hw, 0, 0, true);
+	fm10k_queue_vlan_request(interface, 0, 0, true);
 
 	/* update table with current entries */
 	for (vid = hw->mac.default_vid ? fm10k_find_next_vlan(interface, 0) : 1;
 	     vid < VLAN_N_VID;
 	     vid = fm10k_find_next_vlan(interface, vid)) {
-		hw->mac.ops.update_vlan(hw, vid, 0, true);
+		fm10k_queue_vlan_request(interface, vid, 0, true);
 
-		/* Update unicast entries if host's mailbox is ready */
-		if (fm10k_host_mbx_ready(interface))
-			hw->mac.ops.update_uc_addr(hw, glort, hw->mac.addr,
-						   vid, true, 0);
+		fm10k_queue_mac_request(interface, glort,
+					hw->mac.addr, vid, true);
 	}
 
 	/* update xcast mode before synchronizing addresses if host's mailbox
@@ -1140,6 +1248,13 @@ void fm10k_reset_rx_state(struct fm10k_intfc *interface)
 	struct net_device *netdev = interface->netdev;
 	struct fm10k_hw *hw = &interface->hw;
 
+	/* Wait for MAC/VLAN work to finish */
+	while (test_bit(__FM10K_MACVLAN_SCHED, interface->state))
+		usleep_range(1000, 2000);
+
+	/* Cancel pending MAC/VLAN requests */
+	fm10k_clear_macvlan_queue(interface, interface->glort, true);
+
 	fm10k_mbx_lock(interface);
 
 	/* clear the logical port state on lower device if host's mailbox is
@@ -1374,8 +1489,8 @@ static void *fm10k_dfwd_add_station(struct net_device *dev,
 	if (fm10k_host_mbx_ready(interface)) {
 		hw->mac.ops.update_xcast_mode(hw, glort,
 					      FM10K_XCAST_MODE_MULTI);
-		hw->mac.ops.update_uc_addr(hw, glort, sdev->dev_addr,
-					   0, true, 0);
+		fm10k_queue_mac_request(interface, glort, sdev->dev_addr,
+					0, true);
 	}
 
 	fm10k_mbx_unlock(interface);
@@ -1414,8 +1529,8 @@ static void fm10k_dfwd_del_station(struct net_device *dev, void *priv)
 	if (fm10k_host_mbx_ready(interface)) {
 		hw->mac.ops.update_xcast_mode(hw, glort,
 					      FM10K_XCAST_MODE_NONE);
-		hw->mac.ops.update_uc_addr(hw, glort, sdev->dev_addr,
-					   0, false, 0);
+		fm10k_queue_mac_request(interface, glort, sdev->dev_addr,
+					0, false);
 	}
 
 	fm10k_mbx_unlock(interface);
diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_pci.c b/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
index aef39909e4a2..58538ce997e1 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
@@ -91,6 +91,76 @@ static int fm10k_hw_ready(struct fm10k_intfc *interface)
 	return FM10K_REMOVED(hw->hw_addr) ? -ENODEV : 0;
 }
 
+/**
+ * fm10k_macvlan_schedule - Schedule MAC/VLAN queue task
+ * @interface: fm10k private interface structure
+ *
+ * Schedule the MAC/VLAN queue monitor task. If the MAC/VLAN task cannot be
+ * started immediately, request that it be restarted when possible.
+ */
+void fm10k_macvlan_schedule(struct fm10k_intfc *interface)
+{
+	/* Avoid processing the MAC/VLAN queue when the service task is
+	 * disabled, or when we're resetting the device.
+	 */
+	if (!test_bit(__FM10K_MACVLAN_DISABLE, interface->state) &&
+	    !test_and_set_bit(__FM10K_MACVLAN_SCHED, interface->state)) {
+		clear_bit(__FM10K_MACVLAN_REQUEST, interface->state);
+		/* We delay the actual start of execution in order to allow
+		 * multiple MAC/VLAN updates to accumulate before handling
+		 * them, and to allow some time to let the mailbox drain
+		 * between runs.
+		 */
+		queue_delayed_work(fm10k_workqueue,
+				   &interface->macvlan_task, 10);
+	} else {
+		set_bit(__FM10K_MACVLAN_REQUEST, interface->state);
+	}
+}
+
+/**
+ * fm10k_stop_macvlan_task - Stop the MAC/VLAN queue monitor
+ * @interface: fm10k private interface structure
+ *
+ * Wait until the MAC/VLAN queue task has stopped, and cancel any future
+ * requests.
+ */
+static void fm10k_stop_macvlan_task(struct fm10k_intfc *interface)
+{
+	/* Disable the MAC/VLAN work item */
+	set_bit(__FM10K_MACVLAN_DISABLE, interface->state);
+
+	/* Make sure we waited until any current invocations have stopped */
+	cancel_delayed_work_sync(&interface->macvlan_task);
+
+	/* We set the __FM10K_MACVLAN_SCHED bit when we schedule the task.
+	 * However, it may not be unset of the MAC/VLAN task never actually
+	 * got a chance to run. Since we've canceled the task here, and it
+	 * cannot be rescheuled right now, we need to ensure the scheduled bit
+	 * gets unset.
+	 */
+	clear_bit(__FM10K_MACVLAN_SCHED, interface->state);
+}
+
+/**
+ * fm10k_resume_macvlan_task - Restart the MAC/VLAN queue monitor
+ * @interface: fm10k private interface structure
+ *
+ * Clear the __FM10K_MACVLAN_DISABLE bit and, if a request occurred, schedule
+ * the MAC/VLAN work monitor.
+ */
+static void fm10k_resume_macvlan_task(struct fm10k_intfc *interface)
+{
+	/* Re-enable the MAC/VLAN work item */
+	clear_bit(__FM10K_MACVLAN_DISABLE, interface->state);
+
+	/* We might have received a MAC/VLAN request while disabled. If so,
+	 * kick off the queue now.
+	 */
+	if (test_bit(__FM10K_MACVLAN_REQUEST, interface->state))
+		fm10k_macvlan_schedule(interface);
+}
+
 void fm10k_service_event_schedule(struct fm10k_intfc *interface)
 {
 	if (!test_bit(__FM10K_SERVICE_DISABLE, interface->state) &&
@@ -174,6 +244,12 @@ static bool fm10k_prepare_for_reset(struct fm10k_intfc *interface)
 	if (test_and_set_bit(__FM10K_RESETTING, interface->state))
 		return false;
 
+	/* As the MAC/VLAN task will be accessing registers it must not be
+	 * running while we reset. Although the task will not be scheduled
+	 * once we start resetting it may already be running
+	 */
+	fm10k_stop_macvlan_task(interface);
+
 	rtnl_lock();
 
 	fm10k_iov_suspend(interface->pdev);
@@ -258,6 +334,8 @@ static int fm10k_handle_reset(struct fm10k_intfc *interface)
 
 	rtnl_unlock();
 
+	fm10k_resume_macvlan_task(interface);
+
 	clear_bit(__FM10K_RESETTING, interface->state);
 
 	return err;
@@ -686,6 +764,112 @@ static void fm10k_service_task(struct work_struct *work)
 	fm10k_service_event_complete(interface);
 }
 
+/**
+ * fm10k_macvlan_task - send queued MAC/VLAN requests to switch manager
+ * @work: pointer to work_struct containing our data
+ *
+ * This work item handles sending MAC/VLAN updates to the switch manager. When
+ * the interface is up, it will attempt to queue mailbox messages to the
+ * switch manager requesting updates for MAC/VLAN pairs. If the Tx fifo of the
+ * mailbox is full, it will reschedule itself to try again in a short while.
+ * This ensures that the driver does not overload the switch mailbox with too
+ * many simultaneous requests, causing an unnecessary reset.
+ **/
+static void fm10k_macvlan_task(struct work_struct *work)
+{
+	struct fm10k_macvlan_request *item;
+	struct fm10k_intfc *interface;
+	struct delayed_work *dwork;
+	struct list_head *requests;
+	struct fm10k_hw *hw;
+	unsigned long flags;
+
+	dwork = to_delayed_work(work);
+	interface = container_of(dwork, struct fm10k_intfc, macvlan_task);
+	hw = &interface->hw;
+	requests = &interface->macvlan_requests;
+
+	do {
+		/* Pop the first item off the list */
+		spin_lock_irqsave(&interface->macvlan_lock, flags);
+		item = list_first_entry_or_null(requests,
+						struct fm10k_macvlan_request,
+						list);
+		if (item)
+			list_del_init(&item->list);
+
+		spin_unlock_irqrestore(&interface->macvlan_lock, flags);
+
+		/* We have no more items to process */
+		if (!item)
+			goto done;
+
+		fm10k_mbx_lock(interface);
+
+		/* Check that we have plenty of space to send the message. We
+		 * want to ensure that the mailbox stays low enough to avoid a
+		 * change in the host state, otherwise we may see spurious
+		 * link up / link down notifications.
+		 */
+		if (!hw->mbx.ops.tx_ready(&hw->mbx, FM10K_VFMBX_MSG_MTU + 5)) {
+			hw->mbx.ops.process(hw, &hw->mbx);
+			set_bit(__FM10K_MACVLAN_REQUEST, interface->state);
+			fm10k_mbx_unlock(interface);
+
+			/* Put the request back on the list */
+			spin_lock_irqsave(&interface->macvlan_lock, flags);
+			list_add(&item->list, requests);
+			spin_unlock_irqrestore(&interface->macvlan_lock, flags);
+			break;
+		}
+
+		switch (item->type) {
+		case FM10K_MC_MAC_REQUEST:
+			hw->mac.ops.update_mc_addr(hw,
+						   item->mac.glort,
+						   item->mac.addr,
+						   item->mac.vid,
+						   item->set);
+			break;
+		case FM10K_UC_MAC_REQUEST:
+			hw->mac.ops.update_uc_addr(hw,
+						   item->mac.glort,
+						   item->mac.addr,
+						   item->mac.vid,
+						   item->set,
+						   0);
+			break;
+		case FM10K_VLAN_REQUEST:
+			hw->mac.ops.update_vlan(hw,
+						item->vlan.vid,
+						item->vlan.vsi,
+						item->set);
+			break;
+		default:
+			break;
+		}
+
+		fm10k_mbx_unlock(interface);
+
+		/* Free the item now that we've sent the update */
+		kfree(item);
+	} while (true);
+
+done:
+	WARN_ON(!test_bit(__FM10K_MACVLAN_SCHED, interface->state));
+
+	/* flush memory to make sure state is correct */
+	smp_mb__before_atomic();
+	clear_bit(__FM10K_MACVLAN_SCHED, interface->state);
+
+	/* If a MAC/VLAN request was scheduled since we started, we should
+	 * re-schedule. However, there is no reason to re-schedule if there is
+	 * no work to do.
+	 */
+	if (test_bit(__FM10K_MACVLAN_REQUEST, interface->state))
+		fm10k_macvlan_schedule(interface);
+}
+
 /**
  * fm10k_configure_tx_ring - Configure Tx ring after Reset
  * @interface: board private structure
@@ -1918,11 +2102,15 @@ static int fm10k_sw_init(struct fm10k_intfc *interface,
 	INIT_LIST_HEAD(&interface->vxlan_port);
 	INIT_LIST_HEAD(&interface->geneve_port);
 
+	/* Initialize the MAC/VLAN queue */
+	INIT_LIST_HEAD(&interface->macvlan_requests);
+
 	netdev_rss_key_fill(rss_key, sizeof(rss_key));
 	memcpy(interface->rssrk, rss_key, sizeof(rss_key));
 
 	/* Initialize the mailbox lock */
 	spin_lock_init(&interface->mbx_lock);
+	spin_lock_init(&interface->macvlan_lock);
 
 	/* Start off interface as being down */
 	set_bit(__FM10K_DOWN, interface->state);
@@ -2131,6 +2319,9 @@ static int fm10k_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 		    (unsigned long)interface);
 	INIT_WORK(&interface->service_task, fm10k_service_task);
 
+	/* Setup the MAC/VLAN queue */
+	INIT_DELAYED_WORK(&interface->macvlan_task, fm10k_macvlan_task);
+
 	/* kick off service timer now, even when interface is down */
 	mod_timer(&interface->service_timer, (HZ * 2) + jiffies);
 
@@ -2184,6 +2375,10 @@ static void fm10k_remove(struct pci_dev *pdev)
 	del_timer_sync(&interface->service_timer);
 
 	fm10k_stop_service_event(interface);
+	fm10k_stop_macvlan_task(interface);
+
+	/* Remove all pending MAC/VLAN requests */
+	fm10k_clear_macvlan_queue(interface, interface->glort, true);
 
 	/* free netdev, this may bounce the interrupts due to setup_tc */
 	if (netdev->reg_state == NETREG_REGISTERED)
@@ -2220,6 +2415,9 @@ static void fm10k_prepare_suspend(struct fm10k_intfc *interface)
 	 * a surprise remove if the PCIe device is disabled while we're
 	 * stopped. We stop the watchdog task until after we resume software
 	 * activity.
+	 *
+	 * Note that the MAC/VLAN task will be stopped as part of preparing
+	 * for reset so we don't need to handle it here.
 	 */
 	fm10k_stop_service_event(interface);
 
@@ -2259,6 +2457,9 @@ static int fm10k_handle_resume(struct fm10k_intfc *interface)
 	/* restart the service task */
 	fm10k_start_service_event(interface);
 
+	/* Restart the MAC/VLAN request queue in-case of outstanding events */
+	fm10k_macvlan_schedule(interface);
+
 	return err;
 }
 
-- 
2.14.2

^ permalink raw reply related

* [net-next 5/9] fm10k: use the MAC/VLAN queue for VF<->PF MAC/VLAN requests
From: Jeff Kirsher @ 2017-10-03 16:31 UTC (permalink / raw)
  To: davem; +Cc: Jacob Keller, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20171003163138.47569-1-jeffrey.t.kirsher@intel.com>

From: Jacob Keller <jacob.e.keller@intel.com>

Now that we have a working MAC/VLAN queue for handling MAC/VLAN messages
from the netdev, replace the default handler for the VF<->PF messages.
This new handler is very similar to the default code, but uses the
MAC/VLAN queue instead of sending the message directly. Unfortunately we
can't easily re-use the default code, so we'll just replace the entire
function.

This ensures that a VF requesting a large number of VLANs or MAC
addresses does not start a reset cycle, as explained in the commit which
introduced the message queue.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Ngai-mint Kwan <ngai-mint.kwan@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/fm10k/fm10k_iov.c | 132 ++++++++++++++++++++++++++-
 drivers/net/ethernet/intel/fm10k/fm10k_pf.c  |   2 +-
 drivers/net/ethernet/intel/fm10k/fm10k_pf.h  |   3 +-
 3 files changed, 133 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_iov.c b/drivers/net/ethernet/intel/fm10k/fm10k_iov.c
index 03897720bf0b..4a17cc903eed 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_iov.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_iov.c
@@ -35,10 +35,133 @@ static s32 fm10k_iov_msg_error(struct fm10k_hw *hw, u32 **results,
 	return fm10k_tlv_msg_error(hw, results, mbx);
 }
 
+/**
+ *  fm10k_iov_msg_queue_mac_vlan - Message handler for MAC/VLAN request from VF
+ *  @hw: Pointer to hardware structure
+ *  @results: Pointer array to message, results[0] is pointer to message
+ *  @mbx: Pointer to mailbox information structure
+ *
+ *  This function is a custom handler for MAC/VLAN requests from the VF. The
+ *  assumption is that it is acceptable to directly hand off the message from
+ *  the VF to the PF's switch manager. However, we use a MAC/VLAN message
+ *  queue to avoid overloading the mailbox when a large number of requests
+ *  come in.
+ **/
+static s32 fm10k_iov_msg_queue_mac_vlan(struct fm10k_hw *hw, u32 **results,
+					struct fm10k_mbx_info *mbx)
+{
+	struct fm10k_vf_info *vf_info = (struct fm10k_vf_info *)mbx;
+	struct fm10k_intfc *interface = hw->back;
+	u8 mac[ETH_ALEN];
+	u32 *result;
+	int err = 0;
+	bool set;
+	u16 vlan;
+	u32 vid;
+
+	/* we shouldn't be updating rules on a disabled interface */
+	if (!FM10K_VF_FLAG_ENABLED(vf_info))
+		err = FM10K_ERR_PARAM;
+
+	if (!err && !!results[FM10K_MAC_VLAN_MSG_VLAN]) {
+		result = results[FM10K_MAC_VLAN_MSG_VLAN];
+
+		/* record VLAN id requested */
+		err = fm10k_tlv_attr_get_u32(result, &vid);
+		if (err)
+			return err;
+
+		set = !(vid & FM10K_VLAN_CLEAR);
+		vid &= ~FM10K_VLAN_CLEAR;
+
+		/* if the length field has been set, this is a multi-bit
+		 * update request. For multi-bit requests, simply disallow
+		 * them when the pf_vid has been set. In this case, the PF
+		 * should have already cleared the VLAN_TABLE, and if we
+		 * allowed them, it could allow a rogue VF to receive traffic
+		 * on a VLAN it was not assigned. In the single-bit case, we
+		 * need to modify requests for VLAN 0 to use the default PF or
+		 * SW vid when assigned.
+		 */
+
+		if (vid >> 16) {
+			/* prevent multi-bit requests when PF has
+			 * administratively set the VLAN for this VF
+			 */
+			if (vf_info->pf_vid)
+				return FM10K_ERR_PARAM;
+		} else {
+			err = fm10k_iov_select_vid(vf_info, (u16)vid);
+			if (err < 0)
+				return err;
+
+			vid = err;
+		}
+
+		/* update VSI info for VF in regards to VLAN table */
+		err = hw->mac.ops.update_vlan(hw, vid, vf_info->vsi, set);
+	}
+
+	if (!err && !!results[FM10K_MAC_VLAN_MSG_MAC]) {
+		result = results[FM10K_MAC_VLAN_MSG_MAC];
+
+		/* record unicast MAC address requested */
+		err = fm10k_tlv_attr_get_mac_vlan(result, mac, &vlan);
+		if (err)
+			return err;
+
+		/* block attempts to set MAC for a locked device */
+		if (is_valid_ether_addr(vf_info->mac) &&
+		    !ether_addr_equal(mac, vf_info->mac))
+			return FM10K_ERR_PARAM;
+
+		set = !(vlan & FM10K_VLAN_CLEAR);
+		vlan &= ~FM10K_VLAN_CLEAR;
+
+		err = fm10k_iov_select_vid(vf_info, vlan);
+		if (err < 0)
+			return err;
+
+		vlan = (u16)err;
+
+		/* Add this request to the MAC/VLAN queue */
+		err = fm10k_queue_mac_request(interface, vf_info->glort,
+					      mac, vlan, set);
+	}
+
+	if (!err && !!results[FM10K_MAC_VLAN_MSG_MULTICAST]) {
+		result = results[FM10K_MAC_VLAN_MSG_MULTICAST];
+
+		/* record multicast MAC address requested */
+		err = fm10k_tlv_attr_get_mac_vlan(result, mac, &vlan);
+		if (err)
+			return err;
+
+		/* verify that the VF is allowed to request multicast */
+		if (!(vf_info->vf_flags & FM10K_VF_FLAG_MULTI_ENABLED))
+			return FM10K_ERR_PARAM;
+
+		set = !(vlan & FM10K_VLAN_CLEAR);
+		vlan &= ~FM10K_VLAN_CLEAR;
+
+		err = fm10k_iov_select_vid(vf_info, vlan);
+		if (err < 0)
+			return err;
+
+		vlan = (u16)err;
+
+		/* Add this request to the MAC/VLAN queue */
+		err = fm10k_queue_mac_request(interface, vf_info->glort,
+					      mac, vlan, set);
+	}
+
+	return err;
+}
+
 static const struct fm10k_msg_data iov_mbx_data[] = {
 	FM10K_TLV_MSG_TEST_HANDLER(fm10k_tlv_msg_test),
 	FM10K_VF_MSG_MSIX_HANDLER(fm10k_iov_msg_msix_pf),
-	FM10K_VF_MSG_MAC_VLAN_HANDLER(fm10k_iov_msg_mac_vlan_pf),
+	FM10K_VF_MSG_MAC_VLAN_HANDLER(fm10k_iov_msg_queue_mac_vlan),
 	FM10K_VF_MSG_LPORT_STATE_HANDLER(fm10k_iov_msg_lport_state_pf),
 	FM10K_TLV_MSG_ERROR_HANDLER(fm10k_iov_msg_error),
 };
@@ -126,8 +249,10 @@ s32 fm10k_iov_mbx(struct fm10k_intfc *interface)
 		hw->mbx.ops.process(hw, &hw->mbx);
 
 		/* verify port mapping is valid, if not reset port */
-		if (vf_info->vf_flags && !fm10k_glort_valid_pf(hw, glort))
+		if (vf_info->vf_flags && !fm10k_glort_valid_pf(hw, glort)) {
 			hw->iov.ops.reset_lport(hw, vf_info);
+			fm10k_clear_macvlan_queue(interface, glort, false);
+		}
 
 		/* reset VFs that have mailbox timed out */
 		if (!mbx->timeout) {
@@ -190,6 +315,7 @@ void fm10k_iov_suspend(struct pci_dev *pdev)
 
 		hw->iov.ops.reset_resources(hw, vf_info);
 		hw->iov.ops.reset_lport(hw, vf_info);
+		fm10k_clear_macvlan_queue(interface, vf_info->glort, false);
 	}
 }
 
@@ -414,6 +540,8 @@ static inline void fm10k_reset_vf_info(struct fm10k_intfc *interface,
 	/* disable LPORT for this VF which clears switch rules */
 	hw->iov.ops.reset_lport(hw, vf_info);
 
+	fm10k_clear_macvlan_queue(interface, vf_info->glort, false);
+
 	/* assign new MAC+VLAN for this VF */
 	hw->iov.ops.assign_default_mac_vlan(hw, vf_info);
 
diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_pf.c b/drivers/net/ethernet/intel/fm10k/fm10k_pf.c
index 9e4fb3a44376..425d814aed4d 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_pf.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_pf.c
@@ -1186,7 +1186,7 @@ s32 fm10k_iov_msg_msix_pf(struct fm10k_hw *hw, u32 **results,
  * Will report an error if the VLAN ID is out of range. For VID = 0, it will
  * return either the pf_vid or sw_vid depending on which one is set.
  */
-static s32 fm10k_iov_select_vid(struct fm10k_vf_info *vf_info, u16 vid)
+s32 fm10k_iov_select_vid(struct fm10k_vf_info *vf_info, u16 vid)
 {
 	if (!vid)
 		return vf_info->pf_vid ? vf_info->pf_vid : vf_info->sw_vid;
diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_pf.h b/drivers/net/ethernet/intel/fm10k/fm10k_pf.h
index 3336d3c10760..e04d41f1a532 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_pf.h
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_pf.h
@@ -1,5 +1,5 @@
 /* Intel(R) Ethernet Switch Host Interface Driver
- * Copyright(c) 2013 - 2016 Intel Corporation.
+ * Copyright(c) 2013 - 2017 Intel Corporation.
  *
  * This program is free software; you can redistribute it and/or modify it
  * under the terms and conditions of the GNU General Public License,
@@ -114,6 +114,7 @@ extern const struct fm10k_tlv_attr fm10k_err_msg_attr[];
 #define FM10K_PF_MSG_ERR_HANDLER(msg, func) \
 	FM10K_MSG_HANDLER(FM10K_PF_MSG_ID_##msg, fm10k_err_msg_attr, func)
 
+s32 fm10k_iov_select_vid(struct fm10k_vf_info *vf_info, u16 vid);
 s32 fm10k_iov_msg_msix_pf(struct fm10k_hw *, u32 **, struct fm10k_mbx_info *);
 s32 fm10k_iov_msg_mac_vlan_pf(struct fm10k_hw *, u32 **,
 			      struct fm10k_mbx_info *);
-- 
2.14.2

^ permalink raw reply related

* [net-next 0/9][pull request] 100GbE Intel Wired LAN Driver Updates 2017-10-03
From: Jeff Kirsher @ 2017-10-03 16:31 UTC (permalink / raw)
  To: davem; +Cc: Jeff Kirsher, netdev, nhorman, sassmann, jogreene

This series contains updates to fm10k only.

Jake provides majority of the changes in this series, starting with using
fm10k_prepare_for_reset() if we lose PCIe link.  Before we would detach
the device and close the netdev, which left a lot of items still active,
such as the Tx/Rx resources.  This could cause problems where register
reads would return potentially invalid values and would result in unknown
driver behavior, so call fm10k_prepare_for_reset() much like we do for
suspend/resume cycles.  This will attempt to shutdown as much as possible
to prevent possible issues.  Then replaced the PCI specific legacy power
management hooks with the new generic power management hooks for both
suspend and hibernate.  Introduced a workqueue item which monitors a
queue of MAC and VLAN requests since a large number of MAC address or
VLAN updates at once can overload the mailbox with too many messages at
once.  Fixed a cppcheck warning by properly declaring the min_rate and
max_rate variables in the declaration and definition for .ndo_set_vf_bw,
rather than using "unused" for the minimum rates.

Joe Perches fixes the backward logic when using net_ratelimit().

The following are changes since commit 4efac6ff4dc7d08ff3a63bf72172cb31323b6a8d:
  Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue
and are available in the git repository at:
  git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue 100GbE

Jacob Keller (8):
  fm10k: prepare_for_reset() when we lose PCIe Link
  fm10k: use spinlock to implement mailbox lock
  fm10k: use generic PM hooks instead of legacy PCIe power hooks
  fm10k: introduce a message queue for MAC/VLAN messages
  fm10k: use the MAC/VLAN queue for VF<->PF MAC/VLAN requests
  fm10k: bump version number
  fm10k: prefer %s and __func__ for diagnostic prints
  fm10k: fix mis-ordered parameters in declaration for .ndo_set_vf_bw

Joe Perches (1):
  fm10k: Fix misuse of net_ratelimit()

 drivers/net/ethernet/intel/fm10k/fm10k.h        |  60 +++-
 drivers/net/ethernet/intel/fm10k/fm10k_iov.c    | 141 ++++++++-
 drivers/net/ethernet/intel/fm10k/fm10k_main.c   |   5 +-
 drivers/net/ethernet/intel/fm10k/fm10k_netdev.c | 199 +++++++++---
 drivers/net/ethernet/intel/fm10k/fm10k_pci.c    | 392 +++++++++++++++++++-----
 drivers/net/ethernet/intel/fm10k/fm10k_pf.c     |   2 +-
 drivers/net/ethernet/intel/fm10k/fm10k_pf.h     |   3 +-
 7 files changed, 665 insertions(+), 137 deletions(-)

-- 
2.14.2

^ permalink raw reply

* [net-next 3/9] fm10k: use generic PM hooks instead of legacy PCIe power hooks
From: Jeff Kirsher @ 2017-10-03 16:31 UTC (permalink / raw)
  To: davem; +Cc: Jacob Keller, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20171003163138.47569-1-jeffrey.t.kirsher@intel.com>

From: Jacob Keller <jacob.e.keller@intel.com>

Replace the PCI specific legacy power management hooks with the new
generic power management hooks which work properly for both suspend and
hibernate. The new generic system is better and properly handles the
lower level PCIe power management rather than forcing the driver to
handle it.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/fm10k/fm10k_pci.c | 67 +++++++++-------------------
 1 file changed, 22 insertions(+), 45 deletions(-)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_pci.c b/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
index 240772ad5d69..aef39909e4a2 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
@@ -2264,36 +2264,19 @@ static int fm10k_handle_resume(struct fm10k_intfc *interface)
 
 #ifdef CONFIG_PM
 /**
- * fm10k_resume - Restore device to pre-sleep state
- * @pdev: PCI device information struct
+ * fm10k_resume - Generic PM resume hook
+ * @dev: generic device structure
  *
- * fm10k_resume is called after the system has powered back up from a sleep
- * state and is ready to resume operation.  This function is meant to restore
- * the device back to its pre-sleep state.
+ * Generic PM hook used when waking the device from a low power state after
+ * suspend or hibernation. This function does not need to handle lower PCIe
+ * device state as the stack takes care of that for us.
  **/
-static int fm10k_resume(struct pci_dev *pdev)
+static int fm10k_resume(struct device *dev)
 {
-	struct fm10k_intfc *interface = pci_get_drvdata(pdev);
+	struct fm10k_intfc *interface = pci_get_drvdata(to_pci_dev(dev));
 	struct net_device *netdev = interface->netdev;
 	struct fm10k_hw *hw = &interface->hw;
-	u32 err;
-
-	pci_set_power_state(pdev, PCI_D0);
-	pci_restore_state(pdev);
-
-	/* pci_restore_state clears dev->state_saved so call
-	 * pci_save_state to restore it.
-	 */
-	pci_save_state(pdev);
-
-	err = pci_enable_device_mem(pdev);
-	if (err) {
-		dev_err(&pdev->dev, "Cannot enable PCI device from suspend\n");
-		return err;
-	}
-	pci_set_master(pdev);
-
-	pci_wake_from_d3(pdev, false);
+	int err;
 
 	/* refresh hw_addr in case it was dropped */
 	hw->hw_addr = interface->uc_addr;
@@ -2308,36 +2291,27 @@ static int fm10k_resume(struct pci_dev *pdev)
 }
 
 /**
- * fm10k_suspend - Prepare the device for a system sleep state
- * @pdev: PCI device information struct
+ * fm10k_suspend - Generic PM suspend hook
+ * @dev: generic device structure
  *
- * fm10k_suspend is meant to shutdown the device prior to the system entering
- * a sleep state.  The fm10k hardware does not support wake on lan so the
- * driver simply needs to shut down the device so it is in a low power state.
+ * Generic PM hook used when setting the device into a low power state for
+ * system suspend or hibernation. This function does not need to handle lower
+ * PCIe device state as the stack takes care of that for us.
  **/
-static int fm10k_suspend(struct pci_dev *pdev,
-			 pm_message_t __always_unused state)
+static int fm10k_suspend(struct device *dev)
 {
-	struct fm10k_intfc *interface = pci_get_drvdata(pdev);
+	struct fm10k_intfc *interface = pci_get_drvdata(to_pci_dev(dev));
 	struct net_device *netdev = interface->netdev;
-	int err = 0;
 
 	netif_device_detach(netdev);
 
 	fm10k_prepare_suspend(interface);
 
-	err = pci_save_state(pdev);
-	if (err)
-		return err;
-
-	pci_disable_device(pdev);
-	pci_wake_from_d3(pdev, false);
-	pci_set_power_state(pdev, PCI_D3hot);
-
 	return 0;
 }
 
 #endif /* CONFIG_PM */
+
 /**
  * fm10k_io_error_detected - called when PCI error is detected
  * @pdev: Pointer to PCI device
@@ -2447,15 +2421,18 @@ static const struct pci_error_handlers fm10k_err_handler = {
 	.reset_done = fm10k_io_reset_done,
 };
 
+static SIMPLE_DEV_PM_OPS(fm10k_pm_ops, fm10k_suspend, fm10k_resume);
+
 static struct pci_driver fm10k_driver = {
 	.name			= fm10k_driver_name,
 	.id_table		= fm10k_pci_tbl,
 	.probe			= fm10k_probe,
 	.remove			= fm10k_remove,
 #ifdef CONFIG_PM
-	.suspend		= fm10k_suspend,
-	.resume			= fm10k_resume,
-#endif
+	.driver = {
+		.pm		= &fm10k_pm_ops,
+	},
+#endif /* CONFIG_PM */
 	.sriov_configure	= fm10k_iov_configure,
 	.err_handler		= &fm10k_err_handler
 };
-- 
2.14.2

^ permalink raw reply related

* [net-next 1/9] fm10k: prepare_for_reset() when we lose PCIe Link
From: Jeff Kirsher @ 2017-10-03 16:31 UTC (permalink / raw)
  To: davem; +Cc: Jacob Keller, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20171003163138.47569-1-jeffrey.t.kirsher@intel.com>

From: Jacob Keller <jacob.e.keller@intel.com>

If we lose PCIe link, such as when an unannounced PFLR event occurs, or
when a device is surprise removed, we currently detach the device and
close the netdev. This unfortunately leaves a lot of things still
active, such as the msix_mbx_pf IRQ, and Tx/Rx resources.

This can cause problems because the register reads will return
potentially invalid values which may result in unknown driver behavior.

Begin the process of resetting using fm10k_prepare_for_reset(), much in
the same way as the suspend and resume cycle does. This will attempt to
shutdown as much as possible, in order to prevent possible issues.

A naive implementation for this has issues, because there are now
multiple flows calling the reset logic and setting a reset bit. This
would cause problems, because the "re-attach" routine might call
fm10k_handle_reset() prior to the reset actually finishing. Instead,
we'll add state bits to indicate which flow actually initiated the
reset.

For the general reset flow, we'll assume that if someone else is
resetting that we do not need to handle it at all, so it does not need
its own state bit. For the suspend case, we will simply issue a warning
indicating that we are attempting to recover from this case when
resuming.

For the detached subtask, we'll simply refuse to re-attach until we've
actually initiated a reset as part of that flow.

Finally, we'll stop attempting to manage the mailbox subtask when we're
detached, since there's nothing we can do if we don't have a PCIe
address.

Overall this produces a much cleaner shutdown and recovery cycle for
a PCIe surprise remove event.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/fm10k/fm10k.h     |   2 +
 drivers/net/ethernet/intel/fm10k/fm10k_pci.c | 103 ++++++++++++++++++++-------
 2 files changed, 79 insertions(+), 26 deletions(-)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k.h b/drivers/net/ethernet/intel/fm10k/fm10k.h
index 689c413b7782..ba70c58ca920 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k.h
+++ b/drivers/net/ethernet/intel/fm10k/fm10k.h
@@ -270,6 +270,8 @@ enum fm10k_flags_t {
 
 enum fm10k_state_t {
 	__FM10K_RESETTING,
+	__FM10K_RESET_DETACHED,
+	__FM10K_RESET_SUSPENDED,
 	__FM10K_DOWN,
 	__FM10K_SERVICE_SCHED,
 	__FM10K_SERVICE_REQUEST,
diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_pci.c b/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
index 9575f7c1862d..4e5e3e64beda 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
@@ -153,7 +153,15 @@ static void fm10k_service_timer(unsigned long data)
 	fm10k_service_event_schedule(interface);
 }
 
-static void fm10k_prepare_for_reset(struct fm10k_intfc *interface)
+/**
+ * fm10k_prepare_for_reset - Prepare the driver and device for a pending reset
+ * @interface: fm10k private data structure
+ *
+ * This function prepares for a device reset by shutting as much down as we
+ * can. It does nothing and returns false if __FM10K_RESETTING was already set
+ * prior to calling this function. It returns true if it actually did work.
+ */
+static bool fm10k_prepare_for_reset(struct fm10k_intfc *interface)
 {
 	struct net_device *netdev = interface->netdev;
 
@@ -162,8 +170,9 @@ static void fm10k_prepare_for_reset(struct fm10k_intfc *interface)
 	/* put off any impending NetWatchDogTimeout */
 	netif_trans_update(netdev);
 
-	while (test_and_set_bit(__FM10K_RESETTING, interface->state))
-		usleep_range(1000, 2000);
+	/* Nothing to do if a reset is already in progress */
+	if (test_and_set_bit(__FM10K_RESETTING, interface->state))
+		return false;
 
 	rtnl_lock();
 
@@ -181,6 +190,8 @@ static void fm10k_prepare_for_reset(struct fm10k_intfc *interface)
 	interface->last_reset = jiffies + (10 * HZ);
 
 	rtnl_unlock();
+
+	return true;
 }
 
 static int fm10k_handle_reset(struct fm10k_intfc *interface)
@@ -189,6 +200,8 @@ static int fm10k_handle_reset(struct fm10k_intfc *interface)
 	struct fm10k_hw *hw = &interface->hw;
 	int err;
 
+	WARN_ON(!test_bit(__FM10K_RESETTING, interface->state));
+
 	rtnl_lock();
 
 	pci_set_master(interface->pdev);
@@ -267,51 +280,75 @@ static void fm10k_detach_subtask(struct fm10k_intfc *interface)
 	struct net_device *netdev = interface->netdev;
 	u32 __iomem *hw_addr;
 	u32 value;
+	int err;
 
-	/* do nothing if device is still present or hw_addr is set */
+	/* do nothing if netdev is still present or hw_addr is set */
 	if (netif_device_present(netdev) || interface->hw.hw_addr)
 		return;
 
+	/* We've lost the PCIe register space, and can no longer access the
+	 * device. Shut everything except the detach subtask down and prepare
+	 * to reset the device in case we recover. If we actually prepare for
+	 * reset, indicate that we're detached.
+	 */
+	if (fm10k_prepare_for_reset(interface))
+		set_bit(__FM10K_RESET_DETACHED, interface->state);
+
 	/* check the real address space to see if we've recovered */
 	hw_addr = READ_ONCE(interface->uc_addr);
 	value = readl(hw_addr);
 	if (~value) {
+		/* Make sure the reset was initiated because we detached,
+		 * otherwise we might race with a different reset flow.
+		 */
+		if (!test_and_clear_bit(__FM10K_RESET_DETACHED,
+					interface->state))
+			return;
+
+		/* Restore the hardware address */
 		interface->hw.hw_addr = interface->uc_addr;
+
+		/* PCIe link has been restored, and the device is active
+		 * again. Restore everything and reset the device.
+		 */
+		err = fm10k_handle_reset(interface);
+		if (err) {
+			netdev_err(netdev, "Unable to reset device: %d\n", err);
+			interface->hw.hw_addr = NULL;
+			return;
+		}
+
+		/* Re-attach the netdev */
 		netif_device_attach(netdev);
-		set_bit(FM10K_FLAG_RESET_REQUESTED, interface->flags);
 		netdev_warn(netdev, "PCIe link restored, device now attached\n");
 		return;
 	}
-
-	rtnl_lock();
-
-	if (netif_running(netdev))
-		dev_close(netdev);
-
-	rtnl_unlock();
 }
 
-static void fm10k_reinit(struct fm10k_intfc *interface)
+static void fm10k_reset_subtask(struct fm10k_intfc *interface)
 {
 	int err;
 
-	fm10k_prepare_for_reset(interface);
-
-	err = fm10k_handle_reset(interface);
-	if (err)
-		dev_err(&interface->pdev->dev,
-			"fm10k_handle_reset failed: %d\n", err);
-}
-
-static void fm10k_reset_subtask(struct fm10k_intfc *interface)
-{
 	if (!test_and_clear_bit(FM10K_FLAG_RESET_REQUESTED,
 				interface->flags))
 		return;
 
+	/* If another thread has already prepared to reset the device, we
+	 * should not attempt to handle a reset here, since we'd race with
+	 * that thread. This may happen if we suspend the device or if the
+	 * PCIe link is lost. In this case, we'll just ignore the RESET
+	 * request, as it will (eventually) be taken care of when the thread
+	 * which actually started the reset is finished.
+	 */
+	if (!fm10k_prepare_for_reset(interface))
+		return;
+
 	netdev_err(interface->netdev, "Reset interface\n");
 
-	fm10k_reinit(interface);
+	err = fm10k_handle_reset(interface);
+	if (err)
+		dev_err(&interface->pdev->dev,
+			"fm10k_handle_reset failed: %d\n", err);
 }
 
 /**
@@ -381,6 +418,10 @@ static void fm10k_watchdog_update_host_state(struct fm10k_intfc *interface)
  **/
 static void fm10k_mbx_subtask(struct fm10k_intfc *interface)
 {
+	/* If we're resetting, bail out */
+	if (test_bit(__FM10K_RESETTING, interface->state))
+		return;
+
 	/* process upstream mailbox and update device state */
 	fm10k_watchdog_update_host_state(interface);
 
@@ -630,9 +671,11 @@ static void fm10k_service_task(struct work_struct *work)
 
 	interface = container_of(work, struct fm10k_intfc, service_task);
 
+	/* Check whether we're detached first */
+	fm10k_detach_subtask(interface);
+
 	/* tasks run even when interface is down */
 	fm10k_mbx_subtask(interface);
-	fm10k_detach_subtask(interface);
 	fm10k_reset_subtask(interface);
 
 	/* tasks only run when interface is up */
@@ -2177,7 +2220,8 @@ static void fm10k_prepare_suspend(struct fm10k_intfc *interface)
 	 */
 	fm10k_stop_service_event(interface);
 
-	fm10k_prepare_for_reset(interface);
+	if (fm10k_prepare_for_reset(interface))
+		set_bit(__FM10K_RESET_SUSPENDED, interface->state);
 }
 
 static int fm10k_handle_resume(struct fm10k_intfc *interface)
@@ -2185,6 +2229,13 @@ static int fm10k_handle_resume(struct fm10k_intfc *interface)
 	struct fm10k_hw *hw = &interface->hw;
 	int err;
 
+	/* Even if we didn't properly prepare for reset in
+	 * fm10k_prepare_suspend, we'll attempt to resume anyways.
+	 */
+	if (!test_and_clear_bit(__FM10K_RESET_SUSPENDED, interface->state))
+		dev_warn(&interface->pdev->dev,
+			 "Device was shut down as part of suspend... Attempting to recover\n");
+
 	/* reset statistics starting values */
 	hw->mac.ops.rebind_hw_stats(hw, &interface->stats);
 
-- 
2.14.2

^ permalink raw reply related

* [net-next 2/9] fm10k: use spinlock to implement mailbox lock
From: Jeff Kirsher @ 2017-10-03 16:31 UTC (permalink / raw)
  To: davem; +Cc: Jacob Keller, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20171003163138.47569-1-jeffrey.t.kirsher@intel.com>

From: Jacob Keller <jacob.e.keller@intel.com>

Lets not re-invent the locking wheel. Remove our bitlock and use
a proper spinlock instead.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/fm10k/fm10k.h     | 15 +++++----------
 drivers/net/ethernet/intel/fm10k/fm10k_pci.c |  3 +++
 2 files changed, 8 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k.h b/drivers/net/ethernet/intel/fm10k/fm10k.h
index ba70c58ca920..74542e9d63c7 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k.h
+++ b/drivers/net/ethernet/intel/fm10k/fm10k.h
@@ -276,7 +276,6 @@ enum fm10k_state_t {
 	__FM10K_SERVICE_SCHED,
 	__FM10K_SERVICE_REQUEST,
 	__FM10K_SERVICE_DISABLE,
-	__FM10K_MBX_LOCK,
 	__FM10K_LINK_DOWN,
 	__FM10K_UPDATING_STATS,
 	/* This value must be last and determines the BITMAP size */
@@ -346,6 +345,8 @@ struct fm10k_intfc {
 
 	struct fm10k_hw_stats stats;
 	struct fm10k_hw hw;
+	/* Mailbox lock */
+	spinlock_t mbx_lock;
 	u32 __iomem *uc_addr;
 	u32 __iomem *sw_addr;
 	u16 msg_enable;
@@ -386,23 +387,17 @@ struct fm10k_intfc {
 
 static inline void fm10k_mbx_lock(struct fm10k_intfc *interface)
 {
-	/* busy loop if we cannot obtain the lock as some calls
-	 * such as ndo_set_rx_mode may be made in atomic context
-	 */
-	while (test_and_set_bit(__FM10K_MBX_LOCK, interface->state))
-		udelay(20);
+	spin_lock(&interface->mbx_lock);
 }
 
 static inline void fm10k_mbx_unlock(struct fm10k_intfc *interface)
 {
-	/* flush memory to make sure state is correct */
-	smp_mb__before_atomic();
-	clear_bit(__FM10K_MBX_LOCK, interface->state);
+	spin_unlock(&interface->mbx_lock);
 }
 
 static inline int fm10k_mbx_trylock(struct fm10k_intfc *interface)
 {
-	return !test_and_set_bit(__FM10K_MBX_LOCK, interface->state);
+	return spin_trylock(&interface->mbx_lock);
 }
 
 /* fm10k_test_staterr - test bits in Rx descriptor status and error fields */
diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_pci.c b/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
index 4e5e3e64beda..240772ad5d69 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
@@ -1921,6 +1921,9 @@ static int fm10k_sw_init(struct fm10k_intfc *interface,
 	netdev_rss_key_fill(rss_key, sizeof(rss_key));
 	memcpy(interface->rssrk, rss_key, sizeof(rss_key));
 
+	/* Initialize the mailbox lock */
+	spin_lock_init(&interface->mbx_lock);
+
 	/* Start off interface as being down */
 	set_bit(__FM10K_DOWN, interface->state);
 	set_bit(__FM10K_UPDATING_STATS, interface->state);
-- 
2.14.2

^ permalink raw reply related

* Re: [PATCH net] net: br: Fix igmp snooping offload with CONFIG_BRIDGE_VLAN_FILTERING
From: Vivien Didelot @ 2017-10-03 16:25 UTC (permalink / raw)
  To: Andrew Lunn, Toshiaki Makita; +Cc: Toshiaki Makita, David Miller, netdev
In-Reply-To: <20171003153031.GP17713@lunn.ch>

Andrew Lunn <andrew@lunn.ch> writes:

>> The vlan will be effective only when vlan_filtering is enabled.
>> When vlan_filtering is disabled, vlan information is still kept in the
>> bridge and gets effective later when vlan_filtering becomes enable.
>
> O.K, so things are starting to get clearer.
>
> So when vlan filtering is disabled, the hardware should just ignore
> the requests to add the vlan to the hardware?
>
> When vlan_filtering is enabled, are all the vlans in the software
> bridge again offloaded? Or do we need to remember all the vlans which
> we ignored while vlan filtering was disabled? The average switch has
> nowhere to store these disabled vlans. It can only store active vlans.

When vlan_filtering is enabled on the bridge, the bridge code does
propagates the default_pvid again if I recall correctly.

In my opinion the hardware mustn't ignore the VLAN requests, because we
seem to agree that vlan_filtering disabled means that the target ports
should not care yet about 802.1Q. So having some unused hardware VLAN
entries and some ports with disabled 802.1Q mode must work together.

That being said we still have the wrong hardware FDB populated when
CONFIG_BRIDGE_VLAN_FILTERING is enabled but not vlan_filtering...

^ permalink raw reply

* Re: [PATCH 1/1] xdp: Sample xdp program implementing ip forward
From: David Ahern @ 2017-10-03 16:24 UTC (permalink / raw)
  To: cjacob, netdev; +Cc: linux-kernel, linux-arm-kernel
In-Reply-To: <1507016225-319-2-git-send-email-Christina.Jacob@cavium.com>

On 10/3/17 12:37 AM, cjacob wrote:
> diff --git a/samples/bpf/xdp3_kern.c b/samples/bpf/xdp3_kern.c
> new file mode 100644
> index 0000000..62d905d
> --- /dev/null
> +++ b/samples/bpf/xdp3_kern.c
> @@ -0,0 +1,204 @@
> +/* Copyright (c) 2016 PLUMgrid

2016 PLUMgrid?


> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of version 2 of the GNU General Public
> + * License as published by the Free Software Foundation.
> + */
> +#define KBUILD_MODNAME "

^ permalink raw reply

* Re: [PATCH] net: phy: DP83822 initial driver submission
From: Dan Murphy @ 2017-10-03 16:11 UTC (permalink / raw)
  To: andrew, f.fainelli; +Cc: netdev
In-Reply-To: <20171003155316.12312-1-dmurphy@ti.com>

All

On 10/03/2017 10:53 AM, Dan Murphy wrote:
> Add support for the TI  DP83822 10/100Mbit ethernet phy.
> 
> The DP83822 provides flexibility to connect to a MAC through a
> standard MII, RMII or RGMII interface.
> 

I need to submit an additional patch to remove DP83822 from the DP83848.
The main difference in the driver is that this driver supports WoL and the DP83848
does not.

So please kindly review this code and I can submit v2 with the DP83848 change

Dan

> Datasheet:
> http://www.ti.com/product/DP83822I/datasheet
> 
> Signed-off-by: Dan Murphy <dmurphy@ti.com>
> ---
>  drivers/net/phy/Kconfig   |   5 +
>  drivers/net/phy/Makefile  |   1 +
>  drivers/net/phy/dp83822.c | 313 ++++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 319 insertions(+)
>  create mode 100644 drivers/net/phy/dp83822.c
> 
> diff --git a/drivers/net/phy/Kconfig b/drivers/net/phy/Kconfig
> index cd931cf..8e78a48 100644
> --- a/drivers/net/phy/Kconfig
> +++ b/drivers/net/phy/Kconfig
> @@ -277,6 +277,11 @@ config DAVICOM_PHY
>  	---help---
>  	  Currently supports dm9161e and dm9131
>  
> +config DP83822_PHY
> +	tristate "Texas Instruments DP83822 PHY"
> +	---help---
> +	  Supports the DP83822 PHY.
> +
>  config DP83848_PHY
>  	tristate "Texas Instruments DP83848 PHY"
>  	---help---
> diff --git a/drivers/net/phy/Makefile b/drivers/net/phy/Makefile
> index 416df92..df3b82b 100644
> --- a/drivers/net/phy/Makefile
> +++ b/drivers/net/phy/Makefile
> @@ -55,6 +55,7 @@ obj-$(CONFIG_CICADA_PHY)	+= cicada.o
>  obj-$(CONFIG_CORTINA_PHY)	+= cortina.o
>  obj-$(CONFIG_DAVICOM_PHY)	+= davicom.o
>  obj-$(CONFIG_DP83640_PHY)	+= dp83640.o
> +obj-$(CONFIG_DP83822_PHY)	+= dp83822.o
>  obj-$(CONFIG_DP83848_PHY)	+= dp83848.o
>  obj-$(CONFIG_DP83867_PHY)	+= dp83867.o
>  obj-$(CONFIG_FIXED_PHY)		+= fixed_phy.o
> diff --git a/drivers/net/phy/dp83822.c b/drivers/net/phy/dp83822.c
> new file mode 100644
> index 0000000..1d77515
> --- /dev/null
> +++ b/drivers/net/phy/dp83822.c
> @@ -0,0 +1,313 @@
> +/*
> + * Driver for the Texas Instruments DP83822 PHY
> + *
> + * Copyright (C) 2017 Texas Instruments Inc.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + */
> +
> +#include <linux/ethtool.h>
> +#include <linux/etherdevice.h>
> +#include <linux/kernel.h>
> +#include <linux/mii.h>
> +#include <linux/module.h>
> +#include <linux/of.h>
> +#include <linux/phy.h>
> +#include <linux/netdevice.h>
> +
> +#define DP83822_PHY_ID	        0x2000a240
> +#define DP83822_DEVADDR		0x1f
> +
> +#define MII_DP83822_MISR1	0x12
> +#define MII_DP83822_MISR2	0x13
> +#define MII_DP83822_RESET_CTRL	0x1f
> +
> +#define DP83822_HW_RESET	BIT(15)
> +#define DP83822_SW_RESET	BIT(14)
> +
> +/* MISR1 bits */
> +#define DP83822_RX_ERR_HF_INT_EN	BIT(0)
> +#define DP83822_FALSE_CARRIER_HF_INT_EN	BIT(1)
> +#define DP83822_ANEG_COMPLETE_INT_EN	BIT(2)
> +#define DP83822_DUP_MODE_CHANGE_INT_EN	BIT(3)
> +#define DP83822_SPEED_CHANGED_INT_EN	BIT(4)
> +#define DP83822_LINK_STAT_INT_EN	BIT(5)
> +#define DP83822_ENERGY_DET_INT_EN	BIT(6)
> +#define DP83822_LINK_QUAL_INT_EN	BIT(7)
> +
> +/* MISR2 bits */
> +#define DP83822_JABBER_DET_INT_EN	BIT(0)
> +#define DP83822_WOL_PKT_INT_EN		BIT(1)
> +#define DP83822_SLEEP_MODE_INT_EN	BIT(2)
> +#define DP83822_MDI_XOVER_INT_EN	BIT(3)
> +#define DP83822_LB_FIFO_INT_EN		BIT(4)
> +#define DP83822_PAGE_RX_INT_EN		BIT(5)
> +#define DP83822_ANEG_ERR_INT_EN		BIT(6)
> +#define DP83822_EEE_ERROR_CHANGE_INT_EN	BIT(7)
> +
> +/* INT_STAT1 bits */
> +#define DP83822_WOL_INT_EN	BIT(4)
> +#define DP83822_WOL_INT_STAT	BIT(12)
> +
> +#define MII_DP83822_RXSOP1	0x04A5
> +#define	MII_DP83822_RXSOP2	0x04A6
> +#define	MII_DP83822_RXSOP3	0x04A7
> +
> +/* WoL Registers */
> +#define	MII_DP83822_WOL_CFG	0x04A0
> +#define	MII_DP83822_WOL_STAT	0x04A1
> +#define	MII_DP83822_WOL_DA1	0x04A2
> +#define	MII_DP83822_WOL_DA2	0x04A3
> +#define	MII_DP83822_WOL_DA3	0x04A4
> +
> +/* WoL bits */
> +#define DP83822_WOL_MAGIC_EN	BIT(1)
> +#define DP83822_WOL_SECURE_ON	BIT(5)
> +#define DP83822_WOL_EN		BIT(7)
> +#define DP83822_WOL_INDICATION_SEL BIT(8)
> +#define DP83822_WOL_CLR_INDICATION BIT(11)
> +
> +static int dp83822_ack_interrupt(struct phy_device *phydev)
> +{
> +	int err = phy_read(phydev, MII_DP83822_MISR1);
> +
> +	if (err < 0)
> +		return err;
> +
> +	err = phy_read(phydev, MII_DP83822_MISR2);
> +	if (err < 0)
> +		return err;
> +
> +	return 0;
> +}
> +
> +static int dp83822_set_wol(struct phy_device *phydev,
> +			   struct ethtool_wolinfo *wol)
> +{
> +	struct net_device *ndev = phydev->attached_dev;
> +	u16 value;
> +	const u8 *mac;
> +
> +	if (wol->wolopts & (WAKE_MAGIC | WAKE_MAGICSECURE)) {
> +		mac = (const u8 *)ndev->dev_addr;
> +
> +		if (!is_valid_ether_addr(mac))
> +			return -EFAULT;
> +
> +		/* MAC addresses start with byte 5, but stored in mac[0].
> +		 * 822 PHYs store bytes 4|5, 2|3, 0|1
> +		 */
> +		phy_write_mmd(phydev, DP83822_DEVADDR,
> +			      MII_DP83822_WOL_DA1, (mac[1] << 8) | mac[0]);
> +		phy_write_mmd(phydev, DP83822_DEVADDR,
> +			      MII_DP83822_WOL_DA2, (mac[3] << 8) | mac[2]);
> +		phy_write_mmd(phydev, DP83822_DEVADDR, MII_DP83822_WOL_DA3,
> +			      (mac[5] << 8) | mac[4]);
> +
> +		value = phy_read_mmd(phydev, DP83822_DEVADDR,
> +				     MII_DP83822_WOL_CFG);
> +		if (wol->wolopts & WAKE_MAGIC)
> +			value |= DP83822_WOL_MAGIC_EN;
> +		else
> +			value &= ~DP83822_WOL_MAGIC_EN;
> +
> +		if (wol->wolopts & WAKE_MAGICSECURE) {
> +			value |= DP83822_WOL_SECURE_ON;
> +			phy_write_mmd(phydev, DP83822_DEVADDR,
> +				      MII_DP83822_RXSOP1,
> +				      (wol->sopass[1] << 8) | wol->sopass[0]);
> +			phy_write_mmd(phydev, DP83822_DEVADDR,
> +				      MII_DP83822_RXSOP2,
> +				      (wol->sopass[3] << 8) | wol->sopass[2]);
> +			phy_write_mmd(phydev, DP83822_DEVADDR,
> +				      MII_DP83822_RXSOP3,
> +				      (wol->sopass[5] << 8) | wol->sopass[4]);
> +		} else {
> +			value &= ~DP83822_WOL_SECURE_ON;
> +		}
> +
> +		value |= (DP83822_WOL_EN | DP83822_WOL_CLR_INDICATION |
> +			  DP83822_WOL_CLR_INDICATION);
> +		phy_write_mmd(phydev, DP83822_DEVADDR, MII_DP83822_WOL_CFG,
> +			      value);
> +	} else {
> +		value =
> +		    phy_read_mmd(phydev, DP83822_DEVADDR, MII_DP83822_WOL_CFG);
> +		value &= (~DP83822_WOL_EN);
> +		phy_write_mmd(phydev, DP83822_DEVADDR, MII_DP83822_WOL_CFG,
> +			      value);
> +	}
> +
> +	return 0;
> +}
> +
> +static void dp83822_get_wol(struct phy_device *phydev,
> +			    struct ethtool_wolinfo *wol)
> +{
> +	int value;
> +
> +	wol->supported = (WAKE_MAGIC | WAKE_MAGICSECURE);
> +	wol->wolopts = 0;
> +
> +	value = phy_read_mmd(phydev, DP83822_DEVADDR, MII_DP83822_WOL_CFG);
> +	if (value & DP83822_WOL_MAGIC_EN)
> +		wol->wolopts |= WAKE_MAGIC;
> +
> +	if (value & DP83822_WOL_SECURE_ON)
> +		wol->wolopts |= WAKE_MAGICSECURE;
> +
> +	if (~value & DP83822_WOL_CLR_INDICATION)
> +		wol->wolopts = 0;
> +
> +	wol->sopass[0] = (phy_read_mmd(phydev,
> +				       DP83822_DEVADDR,
> +				       MII_DP83822_RXSOP1) & 0xFF);
> +	wol->sopass[1] =
> +	    (phy_read_mmd(phydev, DP83822_DEVADDR, MII_DP83822_RXSOP1) >> 8);
> +	wol->sopass[2] =
> +	    (phy_read_mmd(phydev, DP83822_DEVADDR, MII_DP83822_RXSOP2) & 0xFF);
> +	wol->sopass[3] =
> +	    (phy_read_mmd(phydev, DP83822_DEVADDR, MII_DP83822_RXSOP2) >> 8);
> +	wol->sopass[4] =
> +	    (phy_read_mmd(phydev, DP83822_DEVADDR, MII_DP83822_RXSOP3) & 0xFF);
> +	wol->sopass[5] =
> +	    (phy_read_mmd(phydev, DP83822_DEVADDR, MII_DP83822_RXSOP3) >> 8);
> +}
> +
> +static int dp83822_config_intr(struct phy_device *phydev)
> +{
> +	int misr_status;
> +	int err;
> +
> +	if (phydev->interrupts == PHY_INTERRUPT_ENABLED) {
> +		misr_status = phy_read(phydev, MII_DP83822_MISR1);
> +		if (misr_status < 0)
> +			return misr_status;
> +
> +		misr_status |= (DP83822_RX_ERR_HF_INT_EN |
> +				DP83822_FALSE_CARRIER_HF_INT_EN |
> +				DP83822_ANEG_COMPLETE_INT_EN |
> +				DP83822_DUP_MODE_CHANGE_INT_EN |
> +				DP83822_SPEED_CHANGED_INT_EN |
> +				DP83822_LINK_STAT_INT_EN |
> +				DP83822_ENERGY_DET_INT_EN |
> +				DP83822_LINK_QUAL_INT_EN);
> +
> +		err = phy_write(phydev, MII_DP83822_MISR1, misr_status);
> +		if (err < 0)
> +			return err;
> +
> +		misr_status = phy_read(phydev, MII_DP83822_MISR2);
> +		if (misr_status < 0)
> +			return misr_status;
> +
> +		misr_status |= (DP83822_JABBER_DET_INT_EN |
> +				DP83822_WOL_PKT_INT_EN |
> +				DP83822_SLEEP_MODE_INT_EN |
> +				DP83822_MDI_XOVER_INT_EN |
> +				DP83822_LB_FIFO_INT_EN |
> +				DP83822_PAGE_RX_INT_EN |
> +				DP83822_ANEG_ERR_INT_EN |
> +				DP83822_EEE_ERROR_CHANGE_INT_EN);
> +
> +		err = phy_write(phydev, MII_DP83822_MISR2, misr_status);
> +	} else {
> +		err = phy_write(phydev, MII_DP83822_MISR1, 0);
> +		if (err < 0)
> +			return err;
> +
> +		err = phy_write(phydev, MII_DP83822_MISR1, 0);
> +	}
> +
> +	return err;
> +}
> +
> +static int dp83822_phy_reset(struct phy_device *phydev)
> +{
> +	int err;
> +
> +	err = phy_write(phydev, MII_DP83822_RESET_CTRL, DP83822_HW_RESET);
> +	if (err < 0)
> +		return err;
> +
> +	return 0;
> +}
> +
> +static int dp83822_suspend(struct phy_device *phydev)
> +{
> +	int value;
> +
> +	mutex_lock(&phydev->lock);
> +
> +	value = phy_read_mmd(phydev, DP83822_DEVADDR, MII_DP83822_WOL_CFG);
> +	if (~value & DP83822_WOL_EN) {
> +		value = phy_read(phydev, MII_BMCR);
> +		phy_write(phydev, MII_BMCR, value | BMCR_PDOWN);
> +	}
> +
> +	mutex_unlock(&phydev->lock);
> +
> +	return 0;
> +}
> +
> +static int dp83822_resume(struct phy_device *phydev)
> +{
> +	int value;
> +
> +	mutex_lock(&phydev->lock);
> +
> +	value = phy_read(phydev, MII_BMCR);
> +	phy_write(phydev, MII_BMCR, value & ~BMCR_PDOWN);
> +
> +	value = phy_read_mmd(phydev, DP83822_DEVADDR, MII_DP83822_WOL_CFG);
> +
> +	phy_write_mmd(phydev, DP83822_DEVADDR, MII_DP83822_WOL_CFG, value |
> +		      DP83822_WOL_CLR_INDICATION);
> +
> +	mutex_unlock(&phydev->lock);
> +
> +	return 0;
> +}
> +
> +static struct phy_driver dp83822_driver[] = {
> +	{
> +	 .phy_id = DP83822_PHY_ID,
> +	 .phy_id_mask = 0xfffffff0,
> +	 .name = "TI DP83822",
> +	 .features = PHY_BASIC_FEATURES,
> +	 .flags = PHY_HAS_INTERRUPT,
> +
> +	 .config_init = genphy_config_init,
> +	 .soft_reset = dp83822_phy_reset,
> +
> +	 .get_wol = dp83822_get_wol,
> +	 .set_wol = dp83822_set_wol,
> +
> +	 /* IRQ related */
> +	 .ack_interrupt = dp83822_ack_interrupt,
> +	 .config_intr = dp83822_config_intr,
> +
> +	 .config_aneg = genphy_config_aneg,
> +	 .read_status = genphy_read_status,
> +	 .suspend = dp83822_suspend,
> +	 .resume = dp83822_resume,
> +	 },
> +};
> +module_phy_driver(dp83822_driver);
> +
> +static struct mdio_device_id __maybe_unused dp83822_tbl[] = {
> +	{ DP83822_PHY_ID, 0xfffffff0 },
> +	{ }
> +};
> +MODULE_DEVICE_TABLE(mdio, dp83822_tbl);
> +
> +MODULE_DESCRIPTION("Texas Instruments DP83822 PHY driver");
> +MODULE_AUTHOR("Dan Murphy <dmurphy@ti.com");
> +MODULE_LICENSE("GPL");
> 


-- 
------------------
Dan Murphy

^ permalink raw reply

* Re: [PATCH net-next v2 3/3] tools: bpftool: add documentation
From: David Ahern @ 2017-10-03 16:09 UTC (permalink / raw)
  To: Daniel Borkmann, Alexei Starovoitov, Jakub Kicinski
  Cc: netdev, oss-drivers, David Beckett
In-Reply-To: <59D3B456.4020209@iogearbox.net>

On 10/3/17 9:01 AM, Daniel Borkmann wrote:
> On 10/03/2017 05:39 PM, David Ahern wrote:
>> On 10/2/17 9:29 PM, Alexei Starovoitov wrote:
>>> On Mon, Oct 02, 2017 at 06:35:09PM -0700, Jakub Kicinski wrote:
>>>>> will pretty print them as verifier output as well?
>>>>
>>>> We tried to use LLVM as a library for this but the interface is
>>>> painfully unstable and it's a heavy dependency.  The current thinking
>>>> is to try to put the instruction printing code in some higher level
>>>> library, but I would rather leave that as a follow up.
>>>
>>> follow up, of course.
>>> Not depending on llvm is must have for this tool.
>>> I think we need tiny and simple tools first.
>>> Since you're using gpl+bsd license for this tool I think
>>> it would be fine to copy-paste verifier's pretty print code into it.
>>
>> I have done that including integrating it into bpf-tool.
> 
> Great, to avoid letting the pretty print code become stale,
> could the printer be ripped out of the verifier into its own
> file or header under kernel/bpf/ such that it can be used from
> kernel but also integrated from bpftool compilation? There's
> likely not much kernel specifics in there anyway, wdyt?

The pretty print code I have is based on the verifier code from
February. At this point I forget all of the changes I made to it in the
past 7 months.

I agree that it would be best to try to pull the verifier code into a
separate file for easier re-use and keeping the tool up to date.

^ permalink raw reply

* Re: v4.14-rc2/arm64 kernel BUG at net/core/skbuff.c:2626
From: Dmitry Vyukov @ 2017-10-03 16:06 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Mark Rutland, LKML, netdev, linux-arm-kernel, syzkaller,
	David S. Miller, Willem de Bruijn
In-Reply-To: <CANn89iJrJzTL+e4FDN76Fc1iptXpM9aJ6Hwa=pT=+v-zbv7+8g@mail.gmail.com>

On Tue, Oct 3, 2017 at 5:38 PM, 'Eric Dumazet' via syzkaller
<syzkaller@googlegroups.com> wrote:
> On Tue, Oct 3, 2017 at 8:19 AM, Dmitry Vyukov <dvyukov@google.com> wrote:
>> On Mon, Oct 2, 2017 at 4:42 PM, 'Eric Dumazet' via syzkaller
>> <syzkaller@googlegroups.com> wrote:
>>> On Mon, Oct 2, 2017 at 7:21 AM, Mark Rutland <mark.rutland@arm.com> wrote:
>>>> Hi Eric,
>>>>
>>>> On Mon, Oct 02, 2017 at 06:36:32AM -0700, Eric Dumazet wrote:
>>>>> On Mon, Oct 2, 2017 at 3:49 AM, Mark Rutland <mark.rutland@arm.com> wrote:
>>>>> > I hit the below splat at net/core/skbuff.c:2626 while fuzzing v4.14-rc2
>>>>> > on arm64 with Syzkaller. This is the BUG_ON(len) at the end of
>>>>> > skb_copy_and_csum_bits().
>>>>
>>>>> > kernel BUG at net/core/skbuff.c:2626!
>>>>
>>>>> > [<ffff200009e03214>] skb_copy_and_csum_bits+0x8dc/0xae0 net/core/skbuff.c:2626
>>>>> > [<ffff20000a01d244>] icmp_glue_bits+0xa4/0x2a0 net/ipv4/icmp.c:357
>>>>> > [<ffff200009f3f0d4>] __ip_append_data+0x10e4/0x20a8 net/ipv4/ip_output.c:1018
>>>>> > [<ffff200009f41a88>] ip_append_data.part.3+0xe8/0x1a0 net/ipv4/ip_output.c:1170
>>>>> > [<ffff200009f46e74>] ip_append_data+0xa4/0xb0 net/ipv4/ip_output.c:1173
>>>>> > [<ffff20000a01ccc8>] icmp_push_reply+0x1b8/0x690 net/ipv4/icmp.c:375
>>>>> > [<ffff20000a0211b0>] icmp_send+0x1070/0x1890 net/ipv4/icmp.c:741
>>>>> > [<ffff200009f41d48>] ip_fragment.constprop.4+0x208/0x340 net/ipv4/ip_output.c:552
>>>>> > [<ffff200009f42228>] ip_finish_output+0x3a8/0xab0 net/ipv4/ip_output.c:315
>>>>> > [<ffff200009f468c4>] NF_HOOK_COND include/linux/netfilter.h:238 [inline]
>>>>> > [<ffff200009f468c4>] ip_output+0x284/0x790 net/ipv4/ip_output.c:405
>>>>> > [<ffff200009f43204>] dst_output include/net/dst.h:458 [inline]
>>>>> > [<ffff200009f43204>] ip_local_out+0x9c/0x1b8 net/ipv4/ip_output.c:124
>>>>> > [<ffff200009f445e8>] ip_queue_xmit+0x850/0x18e0 net/ipv4/ip_output.c:504
>>>>> > [<ffff200009fb091c>] tcp_transmit_skb+0x107c/0x3338 net/ipv4/tcp_output.c:1123
>>>>> > [<ffff200009fbbcc4>] __tcp_retransmit_skb+0x614/0x1d18 net/ipv4/tcp_output.c:2847
>>>>> > [<ffff200009fbd840>] tcp_send_loss_probe+0x478/0x7d0 net/ipv4/tcp_output.c:2457
>>>>> > [<ffff200009fc707c>] tcp_write_timer_handler+0x50c/0x7e8 net/ipv4/tcp_timer.c:557
>>>>> > [<ffff200009fc73d0>] tcp_write_timer+0x78/0x170 net/ipv4/tcp_timer.c:579
>>>>> > [<ffff2000082f8980>] call_timer_fn+0x1b8/0x430 kernel/time/timer.c:1281
>>>>> > [<ffff2000082f8dcc>] expire_timers+0x1d4/0x320 kernel/time/timer.c:1320
>>>>> > [<ffff2000082f912c>] __run_timers kernel/time/timer.c:1620 [inline]
>>>>> > [<ffff2000082f912c>] run_timer_softirq+0x214/0x5f0 kernel/time/timer.c:1646
>>>>> > [<ffff2000080826c0>] __do_softirq+0x350/0xc0c kernel/softirq.c:284
>>>>> > [<ffff200008170af4>] do_softirq_own_stack include/linux/interrupt.h:498 [inline]
>>>>> > [<ffff200008170af4>] invoke_softirq kernel/softirq.c:371 [inline]
>>>>> > [<ffff200008170af4>] irq_exit+0x1dc/0x2f8 kernel/softirq.c:405
>>>>> > [<ffff2000082a95bc>] __handle_domain_irq+0xdc/0x230 kernel/irq/irqdesc.c:647
>>>>> > [<ffff2000080820ac>] handle_domain_irq include/linux/irqdesc.h:175 [inline]
>>>>> > [<ffff2000080820ac>] gic_handle_irq+0x6c/0xe0 drivers/irqchip/irq-gic.c:367
>>>>
>>>>> This is most likely a bug caused by syzkaller setting a ridiculous MTU
>>>>> on loopback device, below minimum size of ipv4 MTU.
>>>>
>>>>> I tried to track it in August [1], but it seems hard to find all the
>>>>> issues with this.
>>>>>
>>>>> commit c780a049f9bf442314335372c9abc4548bfe3e44
>>>>> Author: Eric Dumazet <edumazet@google.com>
>>>>> Date:   Wed Aug 16 11:09:12 2017 -0700
>>>>>
>>>>>     ipv4: better IP_MAX_MTU enforcement
>>>>>
>>>>>     While working on yet another syzkaller report, I found
>>>>>     that our IP_MAX_MTU enforcements were not properly done.
>>>>>
>>>>>     gcc seems to reload dev->mtu for min(dev->mtu, IP_MAX_MTU), and
>>>>>     final result can be bigger than IP_MAX_MTU :/
>>>>>
>>>>>     This is a problem because device mtu can be changed on other cpus or
>>>>>     threads.
>>>>>
>>>>>     While this patch does not fix the issue I am working on, it is
>>>>>     probably worth addressing it.
>>>>
>>>> Just to check I've understood correctly, are you suggesting that the
>>>> IPv4 code should also check the dev->mtu against a IP_MIN_MTU (which
>>>> doesn't seem to exist today)?
>>>
>>> We have plenty of places this is checked.
>>>
>>> For example, trying to set MTU < 68 usually removes IPv4 addresses and routes.
>>>
>>> Problem is : these checks are not fool proof yet.
>>>
>>> ( Only the admin was supposed to play these games )
>>>
>>>>
>>>> Otherwise, I do spot another potential issue. The writer side (e.g. most
>>>> net_device::ndo_change_mtu implementations and the __dev_set_mtu()
>>>> fallback) doesn't use WRITE_ONCE().
>>>
>>> It does not matter how many strange values can be observed by the reader :
>>> We must be fool proof anyway from reader point of view, so the
>>> WRITE_ONCE() is not strictly needed.
>>
>>
>> Note if writer stores some temporal garbage there (which C language
>> perfectly allows), it does not matter what we do on reader side --
>> reader won't get correct data anyway. Say mtu changes from 1000 to
>> 2000, but writer temporary stores 1 there, reader can observe 1 while
>> it must not. Synchronization is always a game of two.
>
> Since we have no sync here, a reader _must_ cope with any MTU value.
>
> We need to care of any value, so we do not care how dummy writers can be.
>
> Sure, a WRITE_ONCE() will help avoiding some strange values being written,
>  but since we _allow_ writers to write such strange values,
> there is really no point pretending to be safe here.
>
> Adding a WRITE_ONCE() will not fix the bug.


Reader must cope with any value. But there is an additional
requirement that it must behave correctly. If mtu was 1000 and then
reset to 2000 once (and not other manipulations with mtu), then
correct behavior is either (1) sending packets with mtu 1000 or (2)
sending packets with mtu 2000 (after mtu change) and nothing else.
Sending packets with mtu 500, dropping packets because mtu is observed
to be 1, or formatting hard drive are all incorrect behaviors and must
not happen.

What you say is valid for communication with user-space
(copy_form_user, etc). Because there we don't control write side and
racy writes are indistinguishable from intentional writes that do the
same.

^ permalink raw reply

* Re: [PATCH net-next v2 3/3] tools: bpftool: add documentation
From: Daniel Borkmann @ 2017-10-03 16:01 UTC (permalink / raw)
  To: David Ahern, Alexei Starovoitov, Jakub Kicinski
  Cc: netdev, oss-drivers, David Beckett
In-Reply-To: <881bcc51-015c-097e-a5a4-1f2312a3d9f1@gmail.com>

On 10/03/2017 05:39 PM, David Ahern wrote:
> On 10/2/17 9:29 PM, Alexei Starovoitov wrote:
>> On Mon, Oct 02, 2017 at 06:35:09PM -0700, Jakub Kicinski wrote:
>>>> will pretty print them as verifier output as well?
>>>
>>> We tried to use LLVM as a library for this but the interface is
>>> painfully unstable and it's a heavy dependency.  The current thinking
>>> is to try to put the instruction printing code in some higher level
>>> library, but I would rather leave that as a follow up.
>>
>> follow up, of course.
>> Not depending on llvm is must have for this tool.
>> I think we need tiny and simple tools first.
>> Since you're using gpl+bsd license for this tool I think
>> it would be fine to copy-paste verifier's pretty print code into it.
>
> I have done that including integrating it into bpf-tool.

Great, to avoid letting the pretty print code become stale,
could the printer be ripped out of the verifier into its own
file or header under kernel/bpf/ such that it can be used from
kernel but also integrated from bpftool compilation? There's
likely not much kernel specifics in there anyway, wdyt?

^ permalink raw reply

* Re: [PATCH 1/1] xdp: Sample xdp program implementing ip forward
From: Daniel Borkmann @ 2017-10-03 15:54 UTC (permalink / raw)
  To: cjacob, netdev; +Cc: linux-kernel, linux-arm-kernel, alexei.starovoitov
In-Reply-To: <1507016225-319-2-git-send-email-Christina.Jacob@cavium.com>

On 10/03/2017 09:37 AM, cjacob wrote:
> Implements port to port forwarding with route table and arp table
> lookup for ipv4 packets using bpf_redirect helper function and
> lpm_trie  map.
>
> Signed-off-by: cjacob <Christina.Jacob@cavium.com>

Thanks for the patch, just few minor comments below!

Note, should be full name, e.g.:

   Signed-off-by: Christina Jacob <Christina.Jacob@cavium.com>

Also you From: only shows 'cjacob' as can be seen from the cover letter
as well, so perhaps check your git settings to make that full name:

   cjacob (1):
     xdp: Sample xdp program implementing ip forward

If there's one single patch, then cover letter is not needed, only
for >1 sets.

[...]
> +#define KBUILD_MODNAME "foo"
> +#include <uapi/linux/bpf.h>
> +#include <linux/in.h>
> +#include <linux/if_ether.h>
> +#include <linux/if_packet.h>
> +#include <linux/if_vlan.h>
> +#include <linux/ip.h>
> +#include <linux/ipv6.h>
> +#include "bpf_helpers.h"
> +#include <linux/slab.h>
> +#include <net/ip_fib.h>
> +
> +struct trie_value {
> +	__u8 prefix[4];
> +	long value;
> +	int gw;
> +	int ifindex;
> +	int metric;
> +};
> +
> +union key_4 {
> +	u32 b32[2];
> +	u8 b8[8];
> +};
> +
> +struct arp_entry {
> +	int dst;
> +	long mac;
> +};
> +
> +struct direct_map {
> +	long mac;
> +	int ifindex;
> +	struct arp_entry arp;
> +};
> +
> +/* Map for trie implementation*/
> +struct bpf_map_def SEC("maps") lpm_map = {
> +	.type = BPF_MAP_TYPE_LPM_TRIE,
> +	.key_size = 8,
> +	.value_size =
> +		sizeof(struct trie_value),

(Nit: there are couple of such breaks throughout the patch, can we
  just use single line for such cases where reasonable?)

> +	.max_entries = 50,
> +	.map_flags = BPF_F_NO_PREALLOC,
> +};
> +
> +/* Map for counter*/
> +struct bpf_map_def SEC("maps") rxcnt = {
> +	.type = BPF_MAP_TYPE_PERCPU_ARRAY,
> +	.key_size = sizeof(u32),
> +	.value_size = sizeof(long),
> +	.max_entries = 256,
> +};
> +
> +/* Map for ARP table*/
> +struct bpf_map_def SEC("maps") arp_table = {
> +	.type = BPF_MAP_TYPE_HASH,
> +	.key_size = sizeof(int),
> +	.value_size = sizeof(long),

Perhaps these should be proper structs here, such that it
becomes easier to read/handle later on lookup.

> +	.max_entries = 50,
> +};
> +
> +/* Map to keep the exact match entries in the route table*/
> +struct bpf_map_def SEC("maps") exact_match = {
> +	.type = BPF_MAP_TYPE_HASH,
> +	.key_size = sizeof(int),
> +	.value_size = sizeof(struct direct_map),
> +	.max_entries = 50,
> +};
> +
> +/**
> + * Function to set source and destination mac of the packet
> + */
> +static inline void set_src_dst_mac(void *data, void *src, void *dst)
> +{
> +	unsigned short *p      = data;
> +	unsigned short *dest   = dst;
> +	unsigned short *source = src;
> +
> +	p[3] = source[0];
> +	p[4] = source[1];
> +	p[5] = source[2];
> +	p[0] = dest[0];
> +	p[1] = dest[1];
> +	p[2] = dest[2];

You could just use __builtin_memcpy() given length is
constant anyway, so LLVM will do the inlining.

> +}
> +
> +/**
> + * Parse IPV4 packet to get SRC, DST IP and protocol
> + */
> +static inline int parse_ipv4(void *data, u64 nh_off, void *data_end,
> +			     unsigned int *src, unsigned int *dest)
> +{
> +	struct iphdr *iph = data + nh_off;
> +
> +	if (iph + 1 > data_end)
> +		return 0;
> +	*src = (unsigned int)iph->saddr;
> +	*dest = (unsigned int)iph->daddr;

Why not stay with __be32 types?

> +	return iph->protocol;
> +}
> +
> +SEC("xdp3")
> +int xdp_prog3(struct xdp_md *ctx)
> +{
> +	void *data_end = (void *)(long)ctx->data_end;
> +	void *data = (void *)(long)ctx->data;
> +	struct ethhdr *eth = data;
> +	int rc = XDP_DROP, forward_to;
> +	long *value;
> +	struct trie_value *prefix_value;
> +	long *dest_mac = NULL, *src_mac = NULL;
> +	u16 h_proto;
> +	u64 nh_off;
> +	u32 ipproto;
> +	union key_4 key4;
> +
> +	nh_off = sizeof(*eth);
> +	if (data + nh_off > data_end)
> +		return rc;
> +
> +	h_proto = eth->h_proto;
> +
> +	if (h_proto == htons(ETH_P_8021Q) || h_proto == htons(ETH_P_8021AD)) {
> +		struct vlan_hdr *vhdr;
> +
> +		vhdr = data + nh_off;
> +		nh_off += sizeof(struct vlan_hdr);
> +		if (data + nh_off > data_end)
> +			return rc;
> +		h_proto = vhdr->h_vlan_encapsulated_proto;
> +	}
> +	if (h_proto == htons(ETH_P_ARP)) {
> +		return XDP_PASS;
> +	} else if (h_proto == htons(ETH_P_IP)) {
> +		int src_ip = 0, dest_ip = 0;
> +		struct direct_map *direct_entry;
> +
> +		ipproto = parse_ipv4(data, nh_off, data_end, &src_ip, &dest_ip);
> +		direct_entry = (struct direct_map *)bpf_map_lookup_elem
> +			(&exact_match, &dest_ip);
> +		/*check for exact match, this would give a faster lookup*/
> +		if (direct_entry && direct_entry->mac &&
> +		    direct_entry->arp.mac) {
> +			src_mac = &direct_entry->mac;
> +			dest_mac = &direct_entry->arp.mac;
> +			forward_to = direct_entry->ifindex;
> +		} else {
> +			/*Look up in the trie for lpm*/
> +			// Key for trie

Nit: please check style throughout the patch.

> +			key4.b32[0] = 32;
> +			key4.b8[4] = dest_ip % 0x100;
> +			key4.b8[5] = (dest_ip >> 8) % 0x100;
> +			key4.b8[6] = (dest_ip >> 16) % 0x100;
> +			key4.b8[7] = (dest_ip >> 24) % 0x100;
> +			prefix_value =
> +				((struct trie_value *)bpf_map_lookup_elem
> +				 (&lpm_map, &key4));

For key, please use proper struct bpf_lpm_trie_key, see also
usage example in tools/testing/selftests/bpf/test_lpm_map.c
for LPM handling.

> +			if (!prefix_value) {
> +				return XDP_DROP;
> +			} else {
> +				src_mac = &prefix_value->value;
> +				if (src_mac) {
> +					dest_mac = (long *)bpf_map_lookup_elem
> +						(&arp_table, &dest_ip);
> +					if (!dest_mac) {
> +						if (prefix_value->gw) {
> +							dest_ip = *(unsigned int *)(&(prefix_value->gw));
> +							dest_mac = (long *)bpf_map_lookup_elem
> +								(&arp_table, &dest_ip);
> +						} else {
> +							return XDP_DROP;
> +						}
> +					}
> +					forward_to = prefix_value->ifindex;
> +				} else {
> +					return XDP_DROP;
> +				}
> +			}
> +		}
> +	} else {
> +		ipproto = 0;
> +	}
> +	if (src_mac && dest_mac) {
> +		set_src_dst_mac(data, src_mac,
> +				dest_mac);
> +		value = bpf_map_lookup_elem
> +			(&rxcnt, &ipproto);
> +		if (value)
> +			*value += 1;
> +		return  bpf_redirect(
> +				     forward_to,
> +				     0);
> +	}
> +	return rc;

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox