* Re: [PATCH] 8139too: Remove unnecessary netif_napi_del()
From: David Miller @ 2018-05-25 20:37 UTC
To: chenbo; +Cc: netdev, linux-kernel
In-Reply-To: <20180524194835.14700-1-chenbo@pdx.edu>
From: Bo Chen <chenbo@pdx.edu>
Date: Thu, 24 May 2018 12:48:35 -0700
> The call to free_netdev() in __rtl8139_cleanup_dev() clears the network device
> napi list, and explicit calls to netif_napi_del() are unnecessary.
>
> Signed-off-by: Bo Chen <chenbo@pdx.edu>
Since this is just unnecessary work and not a bug, applied to net-next.
Thanks.
* Re: [PATCH 00/14] Modify action API for implementing lockless actions
From: Vlad Buslov @ 2018-05-25 20:39 UTC
To: Cong Wang
Cc: Linux Kernel Network Developers, David Miller, Jamal Hadi Salim,
Jiri Pirko, Pablo Neira Ayuso, Jozsef Kadlecsik, Florian Westphal,
Alexei Starovoitov, Daniel Borkmann, Eric Dumazet, Kees Cook,
LKML, NetFilter, coreteam, kliteyn
In-Reply-To: <CAM_iQpXMbtUWsaGBrpJH08dM4p9oVwpMSGrev1PThbP5d23sdA@mail.gmail.com>
On Thu 24 May 2018 at 23:34, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> On Mon, May 14, 2018 at 7:27 AM, Vlad Buslov <vladbu@mellanox.com> wrote:
>> Currently, all netlink protocol handlers for updating rules, actions and
>> qdiscs are protected with single global rtnl lock which removes any
>> possibility for parallelism. This patch set is a first step to remove
>> rtnl lock dependency from TC rules update path. It updates act API to
>> use atomic operations, rcu and spinlocks for fine-grained locking. It
>> also extends the API with functions that are needed to update existing
>> actions for parallel execution.
>
> Can you give a summary here for what and how it is achieved?
Got it, will expand cover letter in V2 with summary.
>
> You said this is the first step, what do you want to achieve in this
> very first step? And how do you achieve it? Do you break the RTNL
But aren't these questions answered in the paragraph you quoted?
What: Change act API to not rely on one-big-global-RTNL-lock and to use
more fine-grained synchronization methods to allow safe concurrent
execution.
How: Refactor act API code to use atomics, rcu and spinlocks, etc. for
protecting shared data structures, add new functions required to update
specific actions implementation for parallel execution. (step 2)
If you feel that this cover letter is too terse, I will add outline of
changes in V2.
> lock down to, for a quick example, a per-device lock? Or perhaps you
> completely remove it because of what reason?
I want to remove RTNL _dependency_ from act API data structures and
code. I probably should be more specific in this case:
Florian recently made a change that allows registering netlink protocol
handlers with flag RTNL_FLAG_DOIT_UNLOCKED. Handlers registered with
this flag are called without RTNL taken. My end goal is to have rule
update handlers (RTM_NEWTFILTER, RTM_DELTFILTER, etc.) to be registered
with UNLOCKED flag to allow parallel execution.
I do not intend to globally remove or break RTNL.
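As a rough illustration of the dispatch behaviour described above, here is a userspace toy model, not kernel code. All names prefixed with demo_ or DEMO_ are hypothetical, the "lock" is just a counter, and the flag value is arbitrary. It only shows the idea that a handler registered with an unlocked flag is called without the global lock taken, while every other handler still runs under it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for RTNL_FLAG_DOIT_UNLOCKED; the value is arbitrary. */
#define DEMO_FLAG_DOIT_UNLOCKED 0x1u

/* Toy stand-in for the global RTNL lock: tracks state instead of blocking. */
struct demo_rtnl {
	int held;           /* 1 while the "lock" is held */
	unsigned int taken; /* number of acquisitions so far */
};

typedef int (*demo_doit_t)(const struct demo_rtnl *rtnl);

struct demo_handler {
	demo_doit_t doit;
	unsigned int flags;
};

/* Call the handler, taking the global lock unless the handler opted out. */
static int demo_dispatch(struct demo_rtnl *rtnl, const struct demo_handler *h)
{
	int ret;

	assert(h != NULL && h->doit != NULL);

	/* Unlocked handlers are expected to do their own fine-grained locking. */
	if (h->flags & DEMO_FLAG_DOIT_UNLOCKED)
		return h->doit(rtnl);

	rtnl->held = 1;
	rtnl->taken++;
	ret = h->doit(rtnl);
	rtnl->held = 0;
	return ret;
}

/* Example handler: reports whether the global lock was held when it ran. */
static int demo_report_locked(const struct demo_rtnl *rtnl)
{
	return rtnl->held;
}
```

Dispatching demo_report_locked with flags set to 0 returns 1 and bumps the acquisition counter; dispatching it with DEMO_FLAG_DOIT_UNLOCKED returns 0 and leaves the counter alone. The global lock itself stays in place for every handler that does not opt out.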
>
> I go through all the descriptions of your 14 patches (but not any code),
> I still have no clue how you successfully avoid RTNL. Please don't
> let me read into your code to understand that, there must be some
> high-level justification on how it works. Without it, I don't even want
> to read into the code.
On internal code review I've been asked not to duplicate info from
commit messages in cover letter, but I guess I can expand it with some
high level outline in V2.
>
> Thanks.
Thank you for your feedback!
* Re: [pull request][net-next V2 0/6] Mellanox, mlx5e updates 2018-05-19
From: David Miller @ 2018-05-25 20:42 UTC
To: saeedm; +Cc: netdev
In-Reply-To: <20180524213820.5910-1-saeedm@mellanox.com>
From: Saeed Mahameed <saeedm@mellanox.com>
Date: Thu, 24 May 2018 14:38:14 -0700
> This is a mlx5e only pull request, for more information please see tag
> log below.
>
> Please pull and let me know if there's any problem.
>
> v1->v2:
> 1) patch #1 commit message: lldptool usage example and explanation on why
> dcbnl is selected over devlink interface as was agreed on mailing list.
>
> 2) patches #1 and #6: Add total_size in dcbnl_buffer to report the total
> available buffer size of the netdev, as suggested by John.
>
> 3) Added Reviewed-by tag to all the patches.
Ok, thanks for the discussion and details in patch #1.
Pulled, thanks.
* Re: [PATCH net-next] ifb: fix packets checksum
From: David Miller @ 2018-05-25 20:43 UTC
To: jmaxwell37
Cc: dsahern, mschiffer, zhangshengju, ktkhai, netdev, linux-kernel,
jmaxwell
In-Reply-To: <20180524213829.15208-1-jmaxwell37@gmail.com>
From: Jon Maxwell <jmaxwell37@gmail.com>
Date: Fri, 25 May 2018 07:38:29 +1000
> Fixup the checksum for CHECKSUM_COMPLETE when pulling skbs on RX path.
> Otherwise we get splats when tc mirred is used to redirect packets to ifb.
>
> Before fix:
>
> nic: hw csum failure
>
> Signed-off-by: Jon Maxwell <jmaxwell37@gmail.com>
This definitely seems correct, but I am really surprised a bug like this has
lasted as long as it has.
So I'll let this sit for another day or two for review.
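For readers wondering why pulling bytes needs a checksum fixup at all: with CHECKSUM_COMPLETE the stored checksum covers the packet data as the device saw it, so once leading bytes are pulled their contribution has to be subtracted again. The sketch below is a self-contained userspace illustration of that arithmetic only. The demo_ helpers are hypothetical, assume even lengths, and are not the kernel's checksum routines or the code from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 16-bit ones' complement sum over len bytes (len is assumed to be even). */
static uint16_t demo_csum(const uint8_t *p, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += (uint32_t)((p[i] << 8) | p[i + 1]);

	/* Fold carries back in until the value fits in 16 bits. */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);

	return (uint16_t)sum;
}

/* Ones' complement subtraction: a - b is a + ~b with end-around carry. */
static uint16_t demo_csum_sub(uint16_t a, uint16_t b)
{
	uint32_t sum = (uint32_t)a + (uint16_t)~b;

	sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}
```

Taking an 8-byte buffer whose first 4 bytes play the role of the pulled header, demo_csum_sub(demo_csum(buf, 8), demo_csum(buf, 4)) gives the same value as demo_csum(buf + 4, 4). Skipping that subtraction leaves a stale checksum, which is the kind of mismatch the "hw csum failure" splat reports.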
* Re: [PATCH v4 2/3] media: rc: introduce BPF_PROG_LIRC_MODE2
From: Alexei Starovoitov @ 2018-05-25 20:45 UTC
To: Sean Young
Cc: linux-media, linux-kernel, Alexei Starovoitov,
Mauro Carvalho Chehab, Daniel Borkmann, netdev, Matthias Reichl,
Devin Heitmueller, Y Song, Quentin Monnet
In-Reply-To: <cd5140387a0f9c5ffc68d1846774f12fed45f34d.1526651592.git.sean@mess.org>
On Fri, May 18, 2018 at 03:07:29PM +0100, Sean Young wrote:
> Add support for BPF_PROG_LIRC_MODE2. This type of BPF program can call
> rc_keydown() to report decoded IR scancodes, or rc_repeat() to report
> that the last key should be repeated.
>
> The bpf program can be attached using the bpf(BPF_PROG_ATTACH) syscall;
> the target_fd must be the /dev/lircN device.
>
> Signed-off-by: Sean Young <sean@mess.org>
...
> enum bpf_attach_type {
> @@ -158,6 +159,7 @@ enum bpf_attach_type {
> BPF_CGROUP_INET6_CONNECT,
> BPF_CGROUP_INET4_POST_BIND,
> BPF_CGROUP_INET6_POST_BIND,
> + BPF_LIRC_MODE2,
> __MAX_BPF_ATTACH_TYPE
> };
>
> @@ -1902,6 +1904,53 @@ union bpf_attr {
> * egress otherwise). This is the only flag supported for now.
> * Return
> * **SK_PASS** on success, or **SK_DROP** on error.
> + *
> + * int bpf_rc_keydown(void *ctx, u32 protocol, u64 scancode, u32 toggle)
> + * Description
> + * This helper is used in programs implementing IR decoding, to
> + * report a successfully decoded key press with *scancode*,
> + * *toggle* value in the given *protocol*. The scancode will be
> + * translated to a keycode using the rc keymap, and reported as
> + * an input key down event. After a period a key up event is
> + * generated. This period can be extended by calling either
> + * **bpf_rc_keydown** () with the same values, or calling
> + * **bpf_rc_repeat** ().
> + *
> + * Some protocols include a toggle bit, in case the button
> + * was released and pressed again between consecutive scancodes
> + *
> + * The *ctx* should point to the lirc sample as passed into
> + * the program.
> + *
> + * The *protocol* is the decoded protocol number (see
> + * **enum rc_proto** for some predefined values).
> + *
> + * This helper is only available if the kernel was compiled with
> + * the **CONFIG_BPF_LIRC_MODE2** configuration option set to
> + * "**y**".
> + *
> + * Return
> + * 0
> + *
> + * int bpf_rc_repeat(void *ctx)
> + * Description
> + * This helper is used in programs implementing IR decoding, to
> + * report a successfully decoded repeat key message. This delays
> + * the generation of a key up event for previously generated
> + * key down event.
> + *
> + * Some IR protocols like NEC have a special IR message for
> + * repeating last button, for when a button is held down.
> + *
> + * The *ctx* should point to the lirc sample as passed into
> + * the program.
> + *
> + * This helper is only available if the kernel was compiled with
> + * the **CONFIG_BPF_LIRC_MODE2** configuration option set to
> + * "**y**".
Hi Sean,
thank you for working on this. The patch set looks good to me.
I'd only ask to change the above two helper names to something more specific,
since BPF_PROG_TYPE_LIRC_MODE2 is the name of the new prog type and kconfig.
Maybe bpf_lirc2_keydown() and bpf_lirc2_repeat()?
> @@ -1576,6 +1577,8 @@ static int bpf_prog_attach(const union bpf_attr *attr)
> case BPF_SK_SKB_STREAM_PARSER:
> case BPF_SK_SKB_STREAM_VERDICT:
> return sockmap_get_from_fd(attr, BPF_PROG_TYPE_SK_SKB, true);
> + case BPF_LIRC_MODE2:
> + return rc_dev_prog_attach(attr);
...
> + case BPF_LIRC_MODE2:
> + return rc_dev_prog_detach(attr);
and similar rename for internal function names that go into bpf core.
Please add accumulated acks when you respin.
Thanks
* Re: [PATCH net-next v5 0/2] openvswitch: Support conntrack zone limit
From: David Miller @ 2018-05-25 20:45 UTC
To: yihung.wei; +Cc: netdev, pshelar
In-Reply-To: <1527209803-48274-1-git-send-email-yihung.wei@gmail.com>
From: Yi-Hung Wei <yihung.wei@gmail.com>
Date: Thu, 24 May 2018 17:56:41 -0700
> Currently, nf_conntrack_max is used to limit the maximum number of
> conntrack entries in the conntrack table for every network namespace.
> For the VMs and containers that reside in the same namespace,
> they share the same conntrack table, and the total # of conntrack entries
> for all the VMs and containers are limited by nf_conntrack_max. In this
> case, if one of the VMs/containers abuses its usage of conntrack entries,
> it blocks the others from committing valid conntrack entries into the
> conntrack table. Even if we can possibly put the VM in different network
> namespace, the current nf_conntrack_max configuration is kind of rigid
> that we cannot limit different VM/container to have different # conntrack
> entries.
>
> To address the aforementioned issue, this patch proposes to have a
> fine-grained mechanism that could further limit the # of conntrack entries
> per-zone. For example, we can assign a different zone to each VM,
> and set a conntrack limit for each zone. By providing this isolation, a
> mis-behaved VM only consumes the conntrack entries in its own zone, and
> it will not influence other well-behaved VMs. Moreover, the users can
> set different conntrack limits for different zones based on their preference.
>
> The proposed implementation utilizes Netfilter's nf_conncount backend
> to count the number of connections in a particular zone. If the number of
> connections is above a configured limit, OVS will return ENOMEM to
> userspace. If userspace does not configure the zone limit, the limit
> defaults to zero, which means no limit and is backward compatible with
> the behavior without this patch.
>
> The first patch defines the conntrack limit netlink definition, and the
> second patch provides the implementation.
...
Series applied, thanks for sticking with it so long and responding to the
feedback you received.
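The per-zone limit semantics from the quoted cover letter can be sketched in a few lines. This is a userspace illustration only: the demo_ names and the fixed-size table are assumptions, and it does not use or model Netfilter's nf_conncount backend. The exact boundary (rejecting the commit that would exceed the limit) is also an interpretation of "above a configured limit".

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define DEMO_MAX_ZONES 4 /* tiny on purpose, for illustration */

struct demo_zone {
	uint32_t limit; /* 0 means "no limit", as in the cover letter */
	uint32_t count; /* connections currently committed in this zone */
};

struct demo_zone_table {
	struct demo_zone zone[DEMO_MAX_ZONES];
};

/* Try to commit one more connection in the given zone. */
static int demo_zone_commit(struct demo_zone_table *t, unsigned int zone_id)
{
	struct demo_zone *z;

	if (zone_id >= DEMO_MAX_ZONES)
		return -EINVAL;

	z = &t->zone[zone_id];

	/* A zero limit keeps the old, unlimited behaviour. */
	if (z->limit != 0 && z->count >= z->limit)
		return -ENOMEM;

	z->count++;
	return 0;
}
```

With a limit of 2 on one zone, the third commit in that zone fails with -ENOMEM while commits in a zone with limit 0 keep succeeding, so a misbehaving VM only exhausts its own zone.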
* Re: [PATCH] PCI: allow drivers to limit the number of VFs to 0
From: Bjorn Helgaas @ 2018-05-25 20:46 UTC
To: Don Dutile
Cc: Jakub Kicinski, Bjorn Helgaas, linux-pci, netdev, Sathya Perla,
Felix Manlunas, alexander.duyck, john.fastabend, Jacob Keller,
oss-drivers, Christoph Hellwig
In-Reply-To: <88390255-55f7-57a6-5324-d443373d1984@redhat.com>
On Fri, May 25, 2018 at 03:27:52PM -0400, Don Dutile wrote:
> On 05/25/2018 10:02 AM, Bjorn Helgaas wrote:
> > On Thu, May 24, 2018 at 06:20:15PM -0700, Jakub Kicinski wrote:
> > > Hi Bjorn!
> > >
> > > On Thu, 24 May 2018 18:57:48 -0500, Bjorn Helgaas wrote:
> > > > On Mon, Apr 02, 2018 at 03:46:52PM -0700, Jakub Kicinski wrote:
> > > > > Some user space depends on enabling sriov_totalvfs number of VFs
> > > > > to not fail, e.g.:
> > > > >
> > > > > $ cat .../sriov_totalvfs > .../sriov_numvfs
> > > > >
> > > > > For devices whose VF support depends on loaded FW we have the
> > > > > pci_sriov_{g,s}et_totalvfs() API. However, this API uses 0 as
> > > > > a special "unset" value, meaning drivers can't limit sriov_totalvfs
> > > > > to 0. Remove the special values completely and simply initialize
> > > > > driver_max_VFs to total_VFs. Then always use driver_max_VFs.
> > > > > Add a helper for drivers to reset the VF limit back to total.
> > > >
> > > > I still can't really make sense out of the changelog.
> > > >
> > > > I think part of the reason it's confusing is because there are two
> > > > things going on:
> > > >
> > > > 1) You want this:
> > > > pci_sriov_set_totalvfs(dev, 0);
> > > > x = pci_sriov_get_totalvfs(dev)
> > > >
> > > > to return 0 instead of total_VFs. That seems to connect with
> > > > your subject line. It means "sriov_totalvfs" in sysfs could be
> > > > 0, but I don't know how that is useful (I'm sure it is; just
> > > > educate me :))
> > >
> > > Let me just quote the bug report that got filed on our internal bug
> > > tracker :)
> > >
> > > When testing Juju Openstack with Ubuntu 18.04, enabling SR-IOV causes
> > > errors because Juju gets the sriov_totalvfs for SR-IOV-capable device
> > > then tries to set that as the sriov_numvfs parameter.
> > >
> > > For SR-IOV incapable FW, the sriov_totalvfs parameter should be 0,
> > > but it's set to max. When FW is switched to flower*, the correct
> > > sriov_totalvfs value is presented.
> > >
> > > * flower is a project name
> >
> > From the point of view of the PCI core (which knows nothing about
> > device firmware and relies on the architected config space described
> > by the PCIe spec), this sounds like an erratum: with some firmware
> > installed, the device is not capable of SR-IOV, but still advertises
> > an SR-IOV capability with "TotalVFs > 0".
> >
> > Regardless of whether that's an erratum, we do allow PF drivers to use
> > pci_sriov_set_totalvfs() to limit the number of VFs that may be
> > enabled by writing to the PF's "sriov_numvfs" sysfs file.
> >
> +1.
>
> > But the current implementation does not allow a PF driver to limit VFs
> > to 0, and that does seem nonsensical.
> >
> Well, not really -- claiming to support VFs, and then wanting it to be 0...
> I could certainly argue is non-sensical.
> From a sw perspective, sure, see if we can set VFs to 0 (and reset to another value later).
>
> /me wishes that implementers would follow the architecture vs torquing it into strange shapes.
>
> > > My understanding is OpenStack uses sriov_totalvfs to determine how many
> > > VFs can be enabled, looks like this is the code:
> > >
> > > http://git.openstack.org/cgit/openstack/charm-neutron-openvswitch/tree/hooks/neutron_ovs_utils.py#n464
> > >
> > > > 2) You're adding the pci_sriov_reset_totalvfs() interface. I'm not
> > > > sure what you intend for this. Is *every* driver supposed to
> > > > call it in .remove()? Could/should this be done in the core
> > > > somehow instead of depending on every driver?
> > >
> > > Good question, I was just thinking yesterday we may want to call it
> > > from the core, but I don't think it's strictly necessary nor always
> > > sufficient (we may reload FW without re-probing).
> > >
> > > We have a device which supports a different number of VFs based on the FW
> > > loaded. Some legacy FWs do not inform the driver how many VFs they can
> > > support, because they support the max. So the flow in our driver is this:
> > >
> > >   load_fw(dev);
> > >   ...
> > >   max_vfs = ask_fw_for_max_vfs(dev);
> > >   if (max_vfs >= 0)
> > >           return pci_sriov_set_totalvfs(dev, max_vfs);
> > >   else /* FW didn't tell us, assume max */
> > >           return pci_sriov_reset_totalvfs(dev);
> > >
> > > We also reset the max on device remove, but that's not strictly
> > > necessary.
> > >
> > > Other users of pci_sriov_set_totalvfs() always know the value to set
> > > the total to (either always get it from FW or it's a constant).
> > >
> > > If you prefer we can work out the correct max for those legacy cases in
> > > the driver as well, although it seemed cleaner to just ask the core,
> > > since it already has total_VFs value handy :)
> > >
> > > > I'm also having a hard time connecting your user-space command example
> > > > with the rest of this. Maybe it will make more sense to me tomorrow
> > > > after some coffee.
> > >
> > > OpenStack assumes it will always be able to set sriov_numvfs to
> > > sriov_totalvfs, see this 'if':
> > >
> > > http://git.openstack.org/cgit/openstack/charm-neutron-openvswitch/tree/hooks/neutron_ovs_utils.py#n512
> >
> > Thanks for educating me. I think there are two issues here that we
> > can separate. I extracted the patch below for the first.
> >
> > The second is the question of resetting driver_max_VFs. I think we
> > currently have a general issue in the core:
> >
> > - load PF driver 1
> > - driver calls pci_sriov_set_totalvfs() to reduce driver_max_VFs
> > - unload PF driver 1
> > - load PF driver 2
> >
> > Now driver_max_VFs is still stuck at the lower value set by driver 1.
> > I don't think that's the way this should work.
> >
> > I guess this is partly a consequence of setting driver_max_VFs in
> > sriov_init(), which is called before driver attach and should only
> um, if it's at sriov_init() how is max changed by a PF driver?
> or am I missing something subtle (a new sysfs param) as to what is being changed?
sriov_init() basically just sets the default driver_max_VFs to Total_VFs.
If the PF driver later calls pci_sriov_set_totalvfs(), it can reduce
driver_max_VFs.
My concern is that there's nothing that resets driver_max_VFs back to
Total_VFs if we unload and reload the PF driver.
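To make the concern concrete, here is a small userspace model of the two fields involved. It is only a sketch: the struct and the demo_ functions are hypothetical stand-ins for the real PCI core code, the range check in the setter is an assumption of the model, and demo_driver_unbind() is one possible shape for a core-side reset, not something that exists in this thread's patch.

```c
#include <assert.h>
#include <errno.h>

/* Toy model of the two SR-IOV fields discussed in this thread. */
struct demo_sriov {
	int total_VFs;      /* TotalVFs advertised by the device */
	int driver_max_VFs; /* limit imposed by the PF driver */
};

/* As described above: default the driver limit to TotalVFs at init time. */
static void demo_sriov_init(struct demo_sriov *iov, int total)
{
	iov->total_VFs = total;
	iov->driver_max_VFs = total;
}

/* A PF driver may lower the limit, including to 0, but not exceed TotalVFs. */
static int demo_set_totalvfs(struct demo_sriov *iov, int numvfs)
{
	if (numvfs < 0 || numvfs > iov->total_VFs)
		return -EINVAL;

	iov->driver_max_VFs = numvfs;
	return 0;
}

/* With the patch, the getter always reports the driver limit, even if 0. */
static int demo_get_totalvfs(const struct demo_sriov *iov)
{
	return iov->driver_max_VFs;
}

/* Hypothetical hook: what a core-side reset on driver unload might do. */
static void demo_driver_unbind(struct demo_sriov *iov)
{
	iov->driver_max_VFs = iov->total_VFs;
}
```

In this model, once demo_set_totalvfs() has lowered the limit, demo_get_totalvfs() keeps returning the lower value across a driver reload unless something like demo_driver_unbind() runs in between, which is exactly the stuck-value case described above.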
> > depend on hardware characteristics, so it is related to the patch
> > below. But I think we should fix it in general, not just for
> > netronome.
> >
> >
> > commit 4a338bc6f94b9ad824ac944f5dfc249d6838719c
> > Author: Jakub Kicinski <jakub.kicinski@netronome.com>
> > Date: Fri May 25 08:18:34 2018 -0500
> >
> > PCI/IOV: Allow PF drivers to limit total_VFs to 0
> > Some SR-IOV PF drivers implement .sriov_configure(), which allows
> > user-space to enable VFs by writing the desired number of VFs to the sysfs
> > "sriov_numvfs" file (see sriov_numvfs_store()).
> > The PCI core limits the number of VFs to the TotalVFs advertised by the
> > device in its SR-IOV capability. The PF driver can limit the number of VFs
> > to even fewer (it may have pre-allocated data structures or knowledge of
> > device limitations) by calling pci_sriov_set_totalvfs(), but previously it
> > could not limit the VFs to 0.
> > Change pci_sriov_get_totalvfs() so it always respects the VF limit imposed
> > by the PF driver, even if the limit is 0.
> > This sequence:
> > pci_sriov_set_totalvfs(dev, 0);
> > x = pci_sriov_get_totalvfs(dev);
> > previously set "x" to TotalVFs from the SR-IOV capability. Now it will set
> > "x" to 0.
> > Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> > Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
> >
> > diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
> > index 192b82898a38..d0d73dbbd5ca 100644
> > --- a/drivers/pci/iov.c
> > +++ b/drivers/pci/iov.c
> > @@ -469,6 +469,7 @@ static int sriov_init(struct pci_dev *dev, int pos)
> > iov->nres = nres;
> > iov->ctrl = ctrl;
> > iov->total_VFs = total;
> > + iov->driver_max_VFs = total;
> > pci_read_config_word(dev, pos + PCI_SRIOV_VF_DID, &iov->vf_device);
> > iov->pgsz = pgsz;
> > iov->self = dev;
> > @@ -827,10 +828,7 @@ int pci_sriov_get_totalvfs(struct pci_dev *dev)
> > if (!dev->is_physfn)
> > return 0;
> > - if (dev->sriov->driver_max_VFs)
> > - return dev->sriov->driver_max_VFs;
> > -
> > - return dev->sriov->total_VFs;
> > + return dev->sriov->driver_max_VFs;
> > }
> > EXPORT_SYMBOL_GPL(pci_sriov_get_totalvfs);
> >
>
* Re: [PATCH net-next] net: dsa: dsa_loop: Make dynamic debugging helpful
From: David Miller @ 2018-05-25 20:52 UTC
To: f.fainelli; +Cc: netdev, andrew, vivien.didelot, linux-kernel
In-Reply-To: <20180525035215.19341-1-f.fainelli@gmail.com>
From: Florian Fainelli <f.fainelli@gmail.com>
Date: Thu, 24 May 2018 20:52:14 -0700
> Remove redundant debug prints from phy_read/write since we can trace those
> calls through trace events. Enhance dynamic debug prints to print arguments
> which helps figuring out what is going on at the driver level with higher level
> configuration interfaces.
>
> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Applied, thanks Florian.
* [GIT] Networking
From: David Miller @ 2018-05-25 20:58 UTC
To: torvalds; +Cc: akpm, netdev, linux-kernel
Let's begin the holiday weekend with some networking fixes:
1) Whoops need to restrict cfg80211 wiphy names even more to 64
bytes. From Eric Biggers.
2) Fix flags being ignored when using kernel_connect() with SCTP, from
Xin Long.
3) Use after free in DCCP, from Alexey Kodanev.
4) Need to check rhltable_init() return value in ipmr code, from
Eric Dumazet.
5) XDP handling fixes in virtio_net from Jason Wang.
6) Missing RTA_TABLE in rtm_ipv4_policy[], from Roopa Prabhu.
7) Need to use IRQ disabling spinlocks in mlx4_qp_lookup(), from Jack
Morgenstein.
8) Prevent out-of-bounds speculation using indexes in BPF, from Daniel
Borkmann.
9) Fix regression added by AF_PACKET link layer cure, from Willem
de Bruijn.
10) Correct ENIC dma mask, from Govindarajulu Varadarajan.
11) Missing config options for PMTU tests, from Stefano Brivio.
Please pull, thanks a lot.
The following changes since commit 6741c4bb389da103c0d79ad1961884628900bfe6:
Merge tag 'mips_fixes_4.17_2' of git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/mips (2018-05-21 08:58:00 -0700)
are available in the Git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git
for you to fetch changes up to eb110410b9f6477726026669f3f0c0567e8241e6:
ibmvnic: Fix partial success login retries (2018-05-25 16:32:48 -0400)
----------------------------------------------------------------
Alexey Kodanev (1):
dccp: don't free ccid2_hc_tx_sock struct in dccp_disconnect()
Anders Roxell (2):
selftests: bpf: config: enable NET_SCH_INGRESS for xdp_meta.sh
selftests: net: reuseport_bpf_numa: don't fail if no numa support
Andrew Zaborowski (1):
mac80211_hwsim: Fix radio dump for radio idx 0
Bo Chen (1):
pcnet32: add an error handling path in pcnet32_probe_pci()
Bob Copeland (1):
mac80211: mesh: fix premature update of rc stats
Colin Ian King (2):
batman-adv: don't pass a NULL hard_iface to batadv_hardif_put
net/mlx4: fix spelling mistake: "Inrerface" -> "Interface" and rephrase message
Daniel Borkmann (1):
bpf: properly enforce index mask to prevent out-of-bounds speculation
David S. Miller (6):
Merge tag 'mac80211-for-davem-2018-05-23' of git://git.kernel.org/.../jberg/mac80211
Merge branch 'virtio_net-mergeable-XDP'
Merge tag 'wireless-drivers-for-davem-2018-05-22' of git://git.kernel.org/.../kvalo/wireless-drivers
Merge tag 'mlx5-fixes-2018-05-24' of git://git.kernel.org/.../saeed/linux
Merge tag 'batadv-net-for-davem-20180524' of git://git.open-mesh.org/linux-merge
Merge git://git.kernel.org/.../bpf/bpf
Dedy Lansky (1):
nl80211: fix nlmsg allocation in cfg80211_ft_event
Eran Ben Elisha (1):
net/mlx5e: When RXFCS is set, add FCS data into checksum calculation
Eric Biggers (2):
cfg80211: further limit wiphy names to 64 bytes
ppp: remove the PPPIOCDETACH ioctl
Eric Dumazet (1):
ipmr: properly check rhltable_init() return value
Fabio Estevam (2):
net: fec: ptp: Switch to SPDX identifier
net: fec: Add a SPDX identifier
Florian Fainelli (2):
net: phy: broadcom: Fix auxiliary control register reads
net: phy: broadcom: Fix bcm_write_exp()
Govindarajulu Varadarajan (1):
enic: set DMA mask to 47 bit
Haim Dreyfuss (1):
cfg80211: fix NULL pointer derference when querying regdb
Jack Morgenstein (1):
net/mlx4: Fix irq-unsafe spinlock usage
Jason Wang (6):
virtio-net: correctly redirect linearized packet
virtio-net: correctly transmit XDP buff after linearizing
virtio-net: correctly check num_buf during err path
virtio-net: fix leaking page for gso packet during mergeable XDP
tuntap: correctly set SOCKWQ_ASYNC_NOSPACE
vhost: synchronize IOTLB message with dev cleanup
Kalle Valo (3):
MAINTAINERS: update Kalle's email address
MAINTAINERS: change Kalle as ath.ko maintainer
MAINTAINERS: change Kalle as wcn36xx maintainer
Linus Lüssing (1):
batman-adv: Fix TT sync flags for intermediate TT responses
Marek Lindner (1):
batman-adv: prevent TT request storms by not sending inconsistent TT TLVLs
Nathan Fontenot (1):
ibmvnic: Only do H_EOI for mobility events
Or Gerlitz (1):
net : sched: cls_api: deal with egdev path only if needed
Qing Huang (1):
mlx4_core: allocate ICM memory in page size chunks
Rafał Miłecki (3):
bcma: fix buffer size caused crash in bcma_core_mips_print_irq()
Revert "ssb: Prevent build of PCI host features in module"
ssb: make SSB_PCICORE_HOSTMODE depend on SSB = y
Roopa Prabhu (1):
net: ipv4: add missing RTA_TABLE to rtm_ipv4_policy
Shahed Shaikh (1):
qed: Fix mask for physical address in ILT entry
Stefano Brivio (1):
selftests/net: Add missing config options for PMTU tests
Sven Eckelmann (1):
batman-adv: Avoid race in TT TVLV allocator helper
Thomas Falcon (1):
ibmvnic: Fix partial success login retries
Wenwen Wang (1):
isdn: eicon: fix a missing-check bug
Willem de Bruijn (2):
packet: fix reserve calculation
ipv4: remove warning in ip_recv_error
Xin Long (1):
sctp: fix the issue that flags are ignored when using kernel_connect
Yossi Kuperman (1):
net/mlx5: IPSec, Fix a race between concurrent sandbox QP commands
Documentation/networking/ppp_generic.txt | 6 ------
MAINTAINERS | 8 ++++----
drivers/bcma/driver_mips.c | 2 +-
drivers/isdn/hardware/eicon/diva.c | 22 +++++++++++++++-------
drivers/isdn/hardware/eicon/diva.h | 5 +++--
drivers/isdn/hardware/eicon/divasmain.c | 18 +++++++++++-------
drivers/net/ethernet/amd/pcnet32.c | 10 +++++++---
drivers/net/ethernet/cisco/enic/enic_main.c | 8 ++++----
drivers/net/ethernet/freescale/fec_main.c | 1 +
drivers/net/ethernet/freescale/fec_ptp.c | 14 +-------------
drivers/net/ethernet/ibm/ibmvnic.c | 22 +++++++++++++++-------
drivers/net/ethernet/mellanox/mlx4/icm.c | 16 +++++++++-------
drivers/net/ethernet/mellanox/mlx4/intf.c | 2 +-
drivers/net/ethernet/mellanox/mlx4/qp.c | 4 ++--
drivers/net/ethernet/mellanox/mlx5/core/en_rx.c | 42 ++++++++++++++++++++++++++++++++++++++++++
drivers/net/ethernet/mellanox/mlx5/core/fpga/ipsec.c | 12 +++++-------
drivers/net/ethernet/qlogic/qed/qed_cxt.c | 2 +-
drivers/net/phy/bcm-cygnus.c | 6 +++---
drivers/net/phy/bcm-phy-lib.c | 2 +-
drivers/net/phy/bcm-phy-lib.h | 7 +++++++
drivers/net/phy/bcm7xxx.c | 4 ++--
drivers/net/ppp/ppp_generic.c | 27 +++++----------------------
drivers/net/tun.c | 19 +++++++++++++++----
drivers/net/virtio_net.c | 21 ++++++++++-----------
drivers/net/wireless/mac80211_hwsim.c | 4 ++--
drivers/ssb/Kconfig | 4 ++--
drivers/vhost/vhost.c | 3 +++
include/linux/bpf_verifier.h | 2 +-
include/net/sctp/sctp.h | 2 ++
include/uapi/linux/nl80211.h | 2 +-
include/uapi/linux/ppp-ioctl.h | 2 +-
kernel/bpf/verifier.c | 86 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++----------------------
net/batman-adv/multicast.c | 2 +-
net/batman-adv/translation-table.c | 84 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-----------------
net/dccp/proto.c | 2 --
net/ipv4/fib_frontend.c | 1 +
net/ipv4/ip_sockglue.c | 2 --
net/ipv4/ipmr_base.c | 5 ++++-
net/mac80211/mesh_plink.c | 8 ++++----
net/packet/af_packet.c | 2 +-
net/sched/cls_api.c | 2 +-
net/sctp/ipv6.c | 2 +-
net/sctp/protocol.c | 2 +-
net/sctp/socket.c | 51 +++++++++++++++++++++++++++++++++++----------------
net/wireless/nl80211.c | 3 ++-
net/wireless/reg.c | 3 +++
tools/testing/selftests/bpf/config | 2 ++
tools/testing/selftests/net/config | 5 +++++
tools/testing/selftests/net/reuseport_bpf_numa.c | 4 +++-
49 files changed, 372 insertions(+), 193 deletions(-)
* Inefficient call to ipv6_chk_acast_addr_src in icmp6_send
From: Salam Noureddine @ 2018-05-25 21:02 UTC
To: Network Development
Hi,
The call to ipv6_chk_acast_addr_src in icmp6_send can be pretty costly on
systems with a lot of net_devices since it can end up looping through all
net_devices in a net namespace searching for an anycast address. A few
thousand icmp6 error packets can end up consuming a whole CPU.
I am thinking of fixing this by adding a hash table along the lines of
inet6_addr_lst, providing a fast lookup for anycast addresses. Is that
the right way to go?
Thanks,
Salam
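A minimal sketch of the kind of lookup proposed above, written as plain userspace C. Everything here is an assumption for illustration: the demo_ types, the bucket count and the FNV-1a hash are arbitrary choices, and none of it reflects how inet6_addr_lst is actually implemented. The point is only that a lookup touches one bucket instead of walking every net_device.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_HASH_BITS 8
#define DEMO_HASH_SIZE (1u << DEMO_HASH_BITS)

struct demo_in6_addr {
	uint8_t b[16];
};

struct demo_acast_entry {
	struct demo_in6_addr addr;
	struct demo_acast_entry *next; /* chain within one bucket */
};

struct demo_acast_table {
	struct demo_acast_entry *bucket[DEMO_HASH_SIZE];
};

/* FNV-1a over the 16 address bytes, masked down to a bucket index. */
static unsigned int demo_hash(const struct demo_in6_addr *a)
{
	uint32_t h = 2166136261u;
	size_t i;

	for (i = 0; i < sizeof(a->b); i++) {
		h ^= a->b[i];
		h *= 16777619u;
	}
	return h & (DEMO_HASH_SIZE - 1);
}

/* Insert at the head of the matching bucket; the caller owns the entry. */
static void demo_acast_add(struct demo_acast_table *t,
			   struct demo_acast_entry *e)
{
	unsigned int i = demo_hash(&e->addr);

	e->next = t->bucket[i];
	t->bucket[i] = e;
}

/* Return 1 if the address is a known anycast address, 0 otherwise. */
static int demo_acast_lookup(const struct demo_acast_table *t,
			     const struct demo_in6_addr *a)
{
	const struct demo_acast_entry *e;

	for (e = t->bucket[demo_hash(a)]; e; e = e->next)
		if (memcmp(e->addr.b, a->b, sizeof(a->b)) == 0)
			return 1;
	return 0;
}
```

A real version would also need removal, per-namespace scoping and some form of locking or RCU protection, all of which are left out here.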
* Re: [PATCH] PCI: allow drivers to limit the number of VFs to 0
From: Jakub Kicinski @ 2018-05-25 21:05 UTC
To: Bjorn Helgaas
Cc: Bjorn Helgaas, linux-pci, netdev, Sathya Perla, Felix Manlunas,
alexander.duyck, john.fastabend, Jacob Keller, Donald Dutile,
oss-drivers, Christoph Hellwig
In-Reply-To: <20180525140223.GA45098@bhelgaas-glaptop.roam.corp.google.com>
On Fri, 25 May 2018 09:02:23 -0500, Bjorn Helgaas wrote:
> On Thu, May 24, 2018 at 06:20:15PM -0700, Jakub Kicinski wrote:
> > On Thu, 24 May 2018 18:57:48 -0500, Bjorn Helgaas wrote:
> > > On Mon, Apr 02, 2018 at 03:46:52PM -0700, Jakub Kicinski wrote:
> > > > Some user space depends on enabling sriov_totalvfs number of VFs
> > > > to not fail, e.g.:
> > > >
> > > > $ cat .../sriov_totalvfs > .../sriov_numvfs
> > > >
> > > > For devices whose VF support depends on loaded FW we have the
> > > > pci_sriov_{g,s}et_totalvfs() API. However, this API uses 0 as
> > > > a special "unset" value, meaning drivers can't limit sriov_totalvfs
> > > > to 0. Remove the special values completely and simply initialize
> > > > driver_max_VFs to total_VFs. Then always use driver_max_VFs.
> > > > Add a helper for drivers to reset the VF limit back to total.
> > >
> > > I still can't really make sense out of the changelog.
> > >
> > > I think part of the reason it's confusing is because there are two
> > > things going on:
> > >
> > > 1) You want this:
> > >
> > > pci_sriov_set_totalvfs(dev, 0);
> > > x = pci_sriov_get_totalvfs(dev)
> > >
> > > to return 0 instead of total_VFs. That seems to connect with
> > > your subject line. It means "sriov_totalvfs" in sysfs could be
> > > 0, but I don't know how that is useful (I'm sure it is; just
> > > educate me :))
> >
> > Let me just quote the bug report that got filed on our internal bug
> > tracker :)
> >
> > When testing Juju Openstack with Ubuntu 18.04, enabling SR-IOV causes
> > errors because Juju gets the sriov_totalvfs for SR-IOV-capable device
> > then tries to set that as the sriov_numvfs parameter.
> >
> > For SR-IOV incapable FW, the sriov_totalvfs parameter should be 0,
> > but it's set to max. When FW is switched to flower*, the correct
> > sriov_totalvfs value is presented.
> >
> > * flower is a project name
>
> From the point of view of the PCI core (which knows nothing about
> device firmware and relies on the architected config space described
> by the PCIe spec), this sounds like an erratum: with some firmware
> installed, the device is not capable of SR-IOV, but still advertises
> an SR-IOV capability with "TotalVFs > 0".
>
> Regardless of whether that's an erratum, we do allow PF drivers to use
> pci_sriov_set_totalvfs() to limit the number of VFs that may be
> enabled by writing to the PF's "sriov_numvfs" sysfs file.
Think of it less as an erratum and more as an FPGA which can be
reprogrammed at runtime to have different capabilities. Some FWs simply
have no use for VFs and save resources (and validation time) by not
supporting them.
> But the current implementation does not allow a PF driver to limit VFs
> to 0, and that does seem nonsensical.
>
> > My understanding is OpenStack uses sriov_totalvfs to determine how many
> > VFs can be enabled, looks like this is the code:
> >
> > http://git.openstack.org/cgit/openstack/charm-neutron-openvswitch/tree/hooks/neutron_ovs_utils.py#n464
> >
> > > 2) You're adding the pci_sriov_reset_totalvfs() interface. I'm not
> > > sure what you intend for this. Is *every* driver supposed to
> > > call it in .remove()? Could/should this be done in the core
> > > somehow instead of depending on every driver?
> >
> > Good question, I was just thinking yesterday we may want to call it
> > from the core, but I don't think it's strictly necessary nor always
> > sufficient (we may reload FW without re-probing).
> >
> > We have a device which supports a different number of VFs based on the FW
> > loaded. Some legacy FWs do not inform the driver how many VFs they can
> > support, because it supports max. So the flow in our driver is this:
> >
> > 	load_fw(dev);
> > 	...
> > 	max_vfs = ask_fw_for_max_vfs(dev);
> > 	if (max_vfs >= 0)
> > 		return pci_sriov_set_totalvfs(dev, max_vfs);
> > 	else /* FW didn't tell us, assume max */
> > 		return pci_sriov_reset_totalvfs(dev);
> >
> > We also reset the max on device remove, but that's not strictly
> > necessary.
> >
> > Other users of pci_sriov_set_totalvfs() always know the value to set
> > the total to (either always get it from FW or it's a constant).
> >
> > If you prefer we can work out the correct max for those legacy cases in
> > the driver as well, although it seemed cleaner to just ask the core,
> > since it already has total_VFs value handy :)
> >
> > > I'm also having a hard time connecting your user-space command example
> > > with the rest of this. Maybe it will make more sense to me tomorrow
> > > after some coffee.
> >
> > OpenStack assumes it will always be able to set sriov_numvfs to
> > sriov_totalvfs, see this 'if':
> >
> > http://git.openstack.org/cgit/openstack/charm-neutron-openvswitch/tree/hooks/neutron_ovs_utils.py#n512
>
> Thanks for educating me. I think there are two issues here that we
> can separate. I extracted the patch below for the first.
>
> The second is the question of resetting driver_max_VFs. I think we
> currently have a general issue in the core:
>
> - load PF driver 1
> - driver calls pci_sriov_set_totalvfs() to reduce driver_max_VFs
> - unload PF driver 1
> - load PF driver 2
>
> Now driver_max_VFs is still stuck at the lower value set by driver 1.
> I don't think that's the way this should work.
>
> I guess this is partly a consequence of setting driver_max_VFs in
> sriov_init(), which is called before driver attach and should only
> depend on hardware characteristics, so it is related to the patch
> below. But I think we should fix it in general, not just for
> netronome.
Okay, perfect. That makes sense. The patch below certainly fixes the
first issue for us. Thank you!
As far as the second issue goes - agreed, having the core reset the
number of VFs to total_VFs definitely makes sense. It doesn't cater to
the case where FW is reloaded without reprobing, but we don't do this
today anyway.
Should I try to come up with a patch to reset total_VFs after detach?
> commit 4a338bc6f94b9ad824ac944f5dfc249d6838719c
> Author: Jakub Kicinski <jakub.kicinski@netronome.com>
> Date: Fri May 25 08:18:34 2018 -0500
>
> PCI/IOV: Allow PF drivers to limit total_VFs to 0
>
> Some SR-IOV PF drivers implement .sriov_configure(), which allows
> user-space to enable VFs by writing the desired number of VFs to the sysfs
> "sriov_numvfs" file (see sriov_numvfs_store()).
>
> The PCI core limits the number of VFs to the TotalVFs advertised by the
> device in its SR-IOV capability. The PF driver can limit the number of VFs
> to even fewer (it may have pre-allocated data structures or knowledge of
> device limitations) by calling pci_sriov_set_totalvfs(), but previously it
> could not limit the VFs to 0.
>
> Change pci_sriov_get_totalvfs() so it always respects the VF limit imposed
> by the PF driver, even if the limit is 0.
>
> This sequence:
>
> pci_sriov_set_totalvfs(dev, 0);
> x = pci_sriov_get_totalvfs(dev);
>
> previously set "x" to TotalVFs from the SR-IOV capability. Now it will set
> "x" to 0.
>
> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
>
> diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
> index 192b82898a38..d0d73dbbd5ca 100644
> --- a/drivers/pci/iov.c
> +++ b/drivers/pci/iov.c
> @@ -469,6 +469,7 @@ static int sriov_init(struct pci_dev *dev, int pos)
> iov->nres = nres;
> iov->ctrl = ctrl;
> iov->total_VFs = total;
> + iov->driver_max_VFs = total;
> pci_read_config_word(dev, pos + PCI_SRIOV_VF_DID, &iov->vf_device);
> iov->pgsz = pgsz;
> iov->self = dev;
> @@ -827,10 +828,7 @@ int pci_sriov_get_totalvfs(struct pci_dev *dev)
> if (!dev->is_physfn)
> return 0;
>
> - if (dev->sriov->driver_max_VFs)
> - return dev->sriov->driver_max_VFs;
> -
> - return dev->sriov->total_VFs;
> + return dev->sriov->driver_max_VFs;
> }
> EXPORT_SYMBOL_GPL(pci_sriov_get_totalvfs);
>
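For readers following the thread, here is a minimal user-space sketch of the semantic change in the patch above. The struct and every model_* name are hypothetical stand-ins, not the kernel definitions, and the setter is simplified to a plain bounds check; only the two-field logic mirrors the diff.

```c
/* Hypothetical stand-in for the two fields the diff touches. */
struct model_sriov {
	int total_VFs;      /* TotalVFs from the SR-IOV capability */
	int driver_max_VFs; /* limit requested by the PF driver */
};

/* Old init: driver_max_VFs starts at 0, which doubles as "no limit set". */
void model_init_old(struct model_sriov *iov, int total)
{
	iov->total_VFs = total;
	iov->driver_max_VFs = 0;
}

/* New init: driver_max_VFs starts at total, so 0 becomes a real limit. */
void model_init_new(struct model_sriov *iov, int total)
{
	iov->total_VFs = total;
	iov->driver_max_VFs = total;
}

/* Simplified setter (assumption): only rejects out-of-range limits. */
int model_set_totalvfs(struct model_sriov *iov, int numvfs)
{
	if (numvfs < 0 || numvfs > iov->total_VFs)
		return -1;
	iov->driver_max_VFs = numvfs;
	return 0;
}

/* Old getter: a limit of 0 is indistinguishable from "unset". */
int model_get_totalvfs_old(const struct model_sriov *iov)
{
	if (iov->driver_max_VFs)
		return iov->driver_max_VFs;
	return iov->total_VFs;
}

/* New getter: always reports the driver limit, even when it is 0. */
int model_get_totalvfs_new(const struct model_sriov *iov)
{
	return iov->driver_max_VFs;
}
```

With a device advertising 8 VFs, setting the limit to 0 and reading it back yields 8 through the old pair of functions and 0 through the new pair, which is the behaviour the commit message describes.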
^ permalink raw reply
* [PATCH] ath9k: mark expected switch fall-throughs
From: Gustavo A. R. Silva @ 2018-05-25 21:22 UTC (permalink / raw)
To: QCA ath9k Development, Kalle Valo, David S. Miller
Cc: linux-wireless, netdev, linux-kernel, Gustavo A. R. Silva
In preparation for enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
---
drivers/net/wireless/ath/ath9k/ar5008_phy.c | 2 ++
drivers/net/wireless/ath/ath9k/ar9002_phy.c | 1 +
drivers/net/wireless/ath/ath9k/main.c | 1 +
3 files changed, 4 insertions(+)
diff --git a/drivers/net/wireless/ath/ath9k/ar5008_phy.c b/drivers/net/wireless/ath/ath9k/ar5008_phy.c
index 7922550..ef2dd68 100644
--- a/drivers/net/wireless/ath/ath9k/ar5008_phy.c
+++ b/drivers/net/wireless/ath/ath9k/ar5008_phy.c
@@ -583,12 +583,14 @@ static void ar5008_hw_init_chain_masks(struct ath_hw *ah)
case 0x5:
REG_SET_BIT(ah, AR_PHY_ANALOG_SWAP,
AR_PHY_SWAP_ALT_CHAIN);
+ /* fall through */
case 0x3:
if (ah->hw_version.macVersion == AR_SREV_REVISION_5416_10) {
REG_WRITE(ah, AR_PHY_RX_CHAINMASK, 0x7);
REG_WRITE(ah, AR_PHY_CAL_CHAINMASK, 0x7);
break;
}
+ /* else: fall through */
case 0x1:
case 0x2:
case 0x7:
diff --git a/drivers/net/wireless/ath/ath9k/ar9002_phy.c b/drivers/net/wireless/ath/ath9k/ar9002_phy.c
index 61a9b85..7132918 100644
--- a/drivers/net/wireless/ath/ath9k/ar9002_phy.c
+++ b/drivers/net/wireless/ath/ath9k/ar9002_phy.c
@@ -119,6 +119,7 @@ static int ar9002_hw_set_channel(struct ath_hw *ah, struct ath9k_channel *chan)
aModeRefSel = 2;
if (aModeRefSel)
break;
+ /* else: fall through */
case 1:
default:
aModeRefSel = 0;
diff --git a/drivers/net/wireless/ath/ath9k/main.c b/drivers/net/wireless/ath/ath9k/main.c
index a3be8ad..11d84f4 100644
--- a/drivers/net/wireless/ath/ath9k/main.c
+++ b/drivers/net/wireless/ath/ath9k/main.c
@@ -1928,6 +1928,7 @@ static int ath9k_ampdu_action(struct ieee80211_hw *hw,
case IEEE80211_AMPDU_TX_STOP_FLUSH:
case IEEE80211_AMPDU_TX_STOP_FLUSH_CONT:
flush = true;
+ /* fall through */
case IEEE80211_AMPDU_TX_STOP_CONT:
ath9k_ps_wakeup(sc);
ath_tx_aggr_stop(sc, sta, tid);
--
2.7.4
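As a standalone illustration of the annotation used in this patch, the sketch below mirrors the shape of the ath9k_ampdu_action() hunk. The enum and function names are hypothetical; the only point is that a comment such as /* fall through */ placed directly before the next case label documents the intent, and GCC's -Wimplicit-fallthrough accepts it at its default level.

```c
/* Hypothetical stop modes; these are not the mac80211 enum values. */
enum stop_mode { STOP_FLUSH, STOP_FLUSH_CONT, STOP_CONT, STOP_OTHER };

/* Returns a bitmask: bit 0 = flush requested, bit 1 = aggregation stopped. */
int handle_stop(enum stop_mode mode)
{
	int result = 0;

	switch (mode) {
	case STOP_FLUSH:
	case STOP_FLUSH_CONT:
		result |= 1;
		/* fall through */
	case STOP_CONT:
		result |= 2;
		break;
	default:
		break;
	}
	return result;
}
```

Adjacent case labels with no statements between them (STOP_FLUSH and STOP_FLUSH_CONT here) need no annotation; only the jump from a non-empty case body into the next label does.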
^ permalink raw reply related
* [PATCH] IB: Revert "remove redundant INFINIBAND kconfig dependencies"
From: Arnd Bergmann @ 2018-05-25 21:29 UTC (permalink / raw)
To: Doug Ledford, Jason Gunthorpe
Cc: Latchesar Ionkov, samba-technical-w/Ol4Ecudpl8XjKLYN78aQ,
linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, Keith Busch,
Christoph Hellwig, devel-gWbeCf7V1WCQmaza687I9mD2FQJk+8+b,
linux-cifs-u79uwXL29TY76Z2rM5mHXA,
rds-devel-N0ozoZBvEnrZJqsBc5GL+g, Sagi Grimberg,
linux-rdma-u79uwXL29TY76Z2rM5mHXA, J. Bruce Fields,
Bart Van Assche, Greg Thelen, Arnd Bergmann, Eric Van Hensbergen,
Santosh Shilimkar, Jens Axboe,
v9fs-developer-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f, Trond Myklebust,
Oleg Drokin, linux-nfs-u79uwXL29TY76Z2rM5mHXA, Greg Kroah-Hartman,
Jeff Layton, linux-kernel-u79uwXL29TY76Z2rM5mHXA, David S. Miller
Several subsystems depend on INFINIBAND_ADDR_TRANS, which in turn depends
on INFINIBAND. However, with CONFIG_INFINIBAND=m, this leads to a
link error when another driver using it is built-in. The
INFINIBAND_ADDR_TRANS dependency is insufficient here as this is
a 'bool' symbol that does not force anything to be a module in turn.
fs/cifs/smbdirect.o: In function `smbd_disconnect_rdma_work':
smbdirect.c:(.text+0x1e4): undefined reference to `rdma_disconnect'
net/9p/trans_rdma.o: In function `rdma_request':
trans_rdma.c:(.text+0x7bc): undefined reference to `rdma_disconnect'
net/9p/trans_rdma.o: In function `rdma_destroy_trans':
trans_rdma.c:(.text+0x830): undefined reference to `ib_destroy_qp'
trans_rdma.c:(.text+0x858): undefined reference to `ib_dealloc_pd'
Fixes: 9533b292a7ac ("IB: remove redundant INFINIBAND kconfig dependencies")
Signed-off-by: Arnd Bergmann <arnd-r2nGTMty4D4@public.gmane.org>
---
The patch that introduced the problem has been queued in the
rdma-fixes/for-rc tree. Please revert the patch before sending
the branch to Linus.
---
drivers/infiniband/ulp/srpt/Kconfig | 2 +-
drivers/nvme/host/Kconfig | 2 +-
drivers/nvme/target/Kconfig | 2 +-
drivers/staging/lustre/lnet/Kconfig | 2 +-
fs/cifs/Kconfig | 2 +-
net/9p/Kconfig | 2 +-
net/rds/Kconfig | 2 +-
net/sunrpc/Kconfig | 2 +-
8 files changed, 8 insertions(+), 8 deletions(-)
diff --git a/drivers/infiniband/ulp/srpt/Kconfig b/drivers/infiniband/ulp/srpt/Kconfig
index 25bf6955b6d0..fb8b7182f05e 100644
--- a/drivers/infiniband/ulp/srpt/Kconfig
+++ b/drivers/infiniband/ulp/srpt/Kconfig
@@ -1,6 +1,6 @@
config INFINIBAND_SRPT
tristate "InfiniBand SCSI RDMA Protocol target support"
- depends on INFINIBAND_ADDR_TRANS && TARGET_CORE
+ depends on INFINIBAND && INFINIBAND_ADDR_TRANS && TARGET_CORE
---help---
Support for the SCSI RDMA Protocol (SRP) Target driver. The
diff --git a/drivers/nvme/host/Kconfig b/drivers/nvme/host/Kconfig
index dbb7464c018c..88a8b5916624 100644
--- a/drivers/nvme/host/Kconfig
+++ b/drivers/nvme/host/Kconfig
@@ -27,7 +27,7 @@ config NVME_FABRICS
config NVME_RDMA
tristate "NVM Express over Fabrics RDMA host driver"
- depends on INFINIBAND_ADDR_TRANS && BLOCK
+ depends on INFINIBAND && INFINIBAND_ADDR_TRANS && BLOCK
select NVME_CORE
select NVME_FABRICS
select SG_POOL
diff --git a/drivers/nvme/target/Kconfig b/drivers/nvme/target/Kconfig
index 7595664ee753..3c7b61ddb0d1 100644
--- a/drivers/nvme/target/Kconfig
+++ b/drivers/nvme/target/Kconfig
@@ -27,7 +27,7 @@ config NVME_TARGET_LOOP
config NVME_TARGET_RDMA
tristate "NVMe over Fabrics RDMA target support"
- depends on INFINIBAND_ADDR_TRANS
+ depends on INFINIBAND && INFINIBAND_ADDR_TRANS
depends on NVME_TARGET
select SGL_ALLOC
help
diff --git a/drivers/staging/lustre/lnet/Kconfig b/drivers/staging/lustre/lnet/Kconfig
index f3b1ad4bd3dc..ad049e6f24e4 100644
--- a/drivers/staging/lustre/lnet/Kconfig
+++ b/drivers/staging/lustre/lnet/Kconfig
@@ -34,7 +34,7 @@ config LNET_SELFTEST
config LNET_XPRT_IB
tristate "LNET infiniband support"
- depends on LNET && PCI && INFINIBAND_ADDR_TRANS
+ depends on LNET && PCI && INFINIBAND && INFINIBAND_ADDR_TRANS
default LNET && INFINIBAND
help
This option allows the LNET users to use infiniband as an
diff --git a/fs/cifs/Kconfig b/fs/cifs/Kconfig
index d61e2de8d0eb..5f132d59dfc2 100644
--- a/fs/cifs/Kconfig
+++ b/fs/cifs/Kconfig
@@ -197,7 +197,7 @@ config CIFS_SMB311
config CIFS_SMB_DIRECT
bool "SMB Direct support (Experimental)"
- depends on CIFS=m && INFINIBAND_ADDR_TRANS || CIFS=y && INFINIBAND_ADDR_TRANS=y
+ depends on CIFS=m && INFINIBAND && INFINIBAND_ADDR_TRANS || CIFS=y && INFINIBAND=y && INFINIBAND_ADDR_TRANS=y
help
Enables SMB Direct experimental support for SMB 3.0, 3.02 and 3.1.1.
SMB Direct allows transferring SMB packets over RDMA. If unsure,
diff --git a/net/9p/Kconfig b/net/9p/Kconfig
index 46c39f7da444..e6014e0e51f7 100644
--- a/net/9p/Kconfig
+++ b/net/9p/Kconfig
@@ -32,7 +32,7 @@ config NET_9P_XEN
config NET_9P_RDMA
- depends on INET && INFINIBAND_ADDR_TRANS
+ depends on INET && INFINIBAND && INFINIBAND_ADDR_TRANS
tristate "9P RDMA Transport (Experimental)"
help
This builds support for an RDMA transport.
diff --git a/net/rds/Kconfig b/net/rds/Kconfig
index 1a31502ee7db..bffde4b46c5d 100644
--- a/net/rds/Kconfig
+++ b/net/rds/Kconfig
@@ -8,7 +8,7 @@ config RDS
config RDS_RDMA
tristate "RDS over Infiniband"
- depends on RDS && INFINIBAND_ADDR_TRANS
+ depends on RDS && INFINIBAND && INFINIBAND_ADDR_TRANS
---help---
Allow RDS to use Infiniband as a transport.
This transport supports RDMA operations.
diff --git a/net/sunrpc/Kconfig b/net/sunrpc/Kconfig
index 6358e5271070..ac09ca803296 100644
--- a/net/sunrpc/Kconfig
+++ b/net/sunrpc/Kconfig
@@ -50,7 +50,7 @@ config SUNRPC_DEBUG
config SUNRPC_XPRT_RDMA
tristate "RPC-over-RDMA transport"
- depends on SUNRPC && INFINIBAND_ADDR_TRANS
+ depends on SUNRPC && INFINIBAND && INFINIBAND_ADDR_TRANS
default SUNRPC && INFINIBAND
select SG_POOL
help
--
2.9.0
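The reasoning in the commit message can be checked with a small model of Kconfig's tristate arithmetic. This is a sketch under stated assumptions: n/m/y are ordered so that "&&" is a minimum, a bool symbol is promoted to y whenever its dependency is m or y, and INFINIBAND_ADDR_TRANS depends only on INFINIBAND. All function names are hypothetical.

```c
/* Tristate values ordered so that min() implements Kconfig's "&&". */
enum tristate { TRI_N = 0, TRI_M = 1, TRI_Y = 2 };

enum tristate tri_and(enum tristate a, enum tristate b)
{
	return a < b ? a : b;
}

/* A bool symbol cannot be 'm': any satisfied dependency lets it be 'y'. */
enum tristate bool_symbol_max(enum tristate dep)
{
	return dep == TRI_N ? TRI_N : TRI_Y;
}

/* Upper limit for a tristate user with "depends on INFINIBAND_ADDR_TRANS". */
enum tristate user_limit_before(enum tristate infiniband)
{
	return bool_symbol_max(infiniband);
}

/* Upper limit with "depends on INFINIBAND && INFINIBAND_ADDR_TRANS". */
enum tristate user_limit_after(enum tristate infiniband)
{
	return tri_and(infiniband, bool_symbol_max(infiniband));
}
```

For INFINIBAND=m the first form allows the user to be built in (limit y), which produces the undefined references quoted above, while the second form caps it at m.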
^ permalink raw reply related
* [PATCH, net-next 1/2] bpf: btf: avoid -Wreturn-type warning
From: Arnd Bergmann @ 2018-05-25 21:33 UTC (permalink / raw)
To: Alexei Starovoitov, Daniel Borkmann
Cc: Arnd Bergmann, Martin KaFai Lau, Song Liu, netdev, linux-kernel
gcc warns about a noreturn function possibly returning in
some configurations:
kernel/bpf/btf.c: In function 'env_type_is_resolve_sink':
kernel/bpf/btf.c:729:1: error: control reaches end of non-void function [-Werror=return-type]
Using BUG() instead of BUG_ON() avoids that warning and otherwise
does the exact same thing.
Fixes: eb3f595dab40 ("bpf: btf: Validate type reference")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
kernel/bpf/btf.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index 9cbeabb5aca3..2822a0cf4f48 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -749,7 +749,7 @@ static bool env_type_is_resolve_sink(const struct btf_verifier_env *env,
!btf_type_is_array(next_type) &&
!btf_type_is_struct(next_type);
default:
- BUG_ON(1);
+ BUG();
}
}
--
2.9.0
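A user-space sketch of why the unconditional form helps. MODEL_BUG() and MODEL_BUG_ON() are assumptions that stand in for the kernel macros, and the enum and function are hypothetical rather than the btf.c logic. Because abort() is declared noreturn, the compiler can see that the default branch never reaches the end of the function; with the conditional form it has to prove the condition is always true first, which may not happen in every configuration.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Stand-ins for the kernel macros (assumptions, not the kernel definitions). */
#define MODEL_BUG()        abort()
#define MODEL_BUG_ON(cond) do { if (cond) abort(); } while (0)

/* Hypothetical resolve modes, loosely shaped like the switch in the hunk. */
enum resolve_mode { MODE_ARRAY, MODE_STRUCT, MODE_PTR };

bool is_sink(enum resolve_mode mode, bool next_is_ptr)
{
	switch (mode) {
	case MODE_ARRAY:
	case MODE_STRUCT:
		return !next_is_ptr;
	case MODE_PTR:
		return next_is_ptr;
	default:
		/* noreturn call: no return value is needed after this point */
		MODEL_BUG();
	}
}
```

Replacing MODEL_BUG() with MODEL_BUG_ON(1) keeps the runtime behaviour the same, but the missing return then depends on the optimizer folding the condition.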
^ permalink raw reply related
* [PATCH, net-next 2/2] bpf: avoid -Wmaybe-uninitialized warning
From: Arnd Bergmann @ 2018-05-25 21:33 UTC (permalink / raw)
To: Alexei Starovoitov, Daniel Borkmann
Cc: Arnd Bergmann, Yonghong Song, David S. Miller, Song Liu,
Martin KaFai Lau, Chenbo Feng, Jakub Kicinski, netdev,
linux-kernel
In-Reply-To: <20180525213331.2115471-1-arnd@arndb.de>
The stack_map_get_build_id_offset() function is too long for gcc to track
whether 'work' may or may not be initialized at the end of it, leading
to a false-positive warning:
kernel/bpf/stackmap.c: In function 'stack_map_get_build_id_offset':
kernel/bpf/stackmap.c:334:13: error: 'work' may be used uninitialized in this function [-Werror=maybe-uninitialized]
This removes the 'in_nmi_ctx' flag and uses the state of that variable
itself to see if it got initialized.
Fixes: bae77c5eb5b2 ("bpf: enable stackmap with build_id in nmi context")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
kernel/bpf/stackmap.c | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)
diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index b59ace0f0f09..b675a3f3d141 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -285,11 +285,10 @@ static void stack_map_get_build_id_offset(struct bpf_stack_build_id *id_offs,
{
int i;
struct vm_area_struct *vma;
- bool in_nmi_ctx = in_nmi();
bool irq_work_busy = false;
- struct stack_map_irq_work *work;
+ struct stack_map_irq_work *work = NULL;
- if (in_nmi_ctx) {
+ if (in_nmi()) {
work = this_cpu_ptr(&up_read_work);
if (work->irq_work.flags & IRQ_WORK_BUSY)
/* cannot queue more up_read, fallback */
@@ -328,7 +327,7 @@ static void stack_map_get_build_id_offset(struct bpf_stack_build_id *id_offs,
id_offs[i].status = BPF_STACK_BUILD_ID_VALID;
}
- if (!in_nmi_ctx) {
+ if (!work) {
up_read(&current->mm->mmap_sem);
} else {
work->sem = &current->mm->mmap_sem;
--
2.9.0
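The pattern in this patch, using the pointer itself as the "was it set up" flag, can be shown in isolation. This is a hedged user-space sketch: deferred_work, cpu_work and release_lock() are hypothetical names, and the irq_work_busy fallback from the real function is left out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-CPU irq_work wrapper. */
struct deferred_work {
	int queued;
};

struct deferred_work cpu_work;

/* Returns 1 if the unlock was deferred, 0 if it was done inline. */
int release_lock(bool in_special_ctx, int *inline_unlocks)
{
	/* NULL means "not in the special context"; no separate flag needed. */
	struct deferred_work *work = NULL;

	if (in_special_ctx)
		work = &cpu_work;

	/* ... a long function body would sit here ... */

	if (!work) {
		(*inline_unlocks)++;
		return 0;
	}
	work->queued++;
	return 1;
}
```

Since every path initializes work and the final branch tests work directly, the compiler no longer has to correlate two variables across a long function to prove the dereference is safe.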
^ permalink raw reply related
* [PATCH, net-next] net/mlx5e: fix TLS dependency
From: Arnd Bergmann @ 2018-05-25 21:36 UTC (permalink / raw)
To: Saeed Mahameed, Leon Romanovsky, David S. Miller
Cc: Arnd Bergmann, Boris Pismenny, Ilan Tayari, Or Gerlitz,
Ilya Lesokhin, Feras Daoud, netdev, linux-rdma, linux-kernel
With CONFIG_TLS=m and MLX5_CORE_EN=y, we get a link failure:
drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls_rxtx.o: In function `mlx5e_tls_handle_ooo':
tls_rxtx.c:(.text+0x24c): undefined reference to `tls_get_record'
drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls_rxtx.o: In function `mlx5e_tls_handle_tx_skb':
tls_rxtx.c:(.text+0x9a8): undefined reference to `tls_device_sk_destruct'
This narrows down the dependency to only allow the configurations
that will actually work. The existing dependency on TLS_DEVICE is
not sufficient here since MLX5_EN_TLS is a 'bool' symbol.
Fixes: c83294b9efa5 ("net/mlx5e: TLS, Add Innova TLS TX support")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
drivers/net/ethernet/mellanox/mlx5/core/Kconfig | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Kconfig b/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
index ee6684779d11..2545296a0c08 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
@@ -91,6 +91,7 @@ config MLX5_EN_TLS
bool "TLS cryptography-offload accelaration"
depends on MLX5_CORE_EN
depends on TLS_DEVICE
+ depends on TLS=y || MLX5_CORE=m
depends on MLX5_ACCEL
default n
---help---
--
2.9.0
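To see which configurations the added "depends on TLS=y || MLX5_CORE=m" line excludes, the expression can be compared against a simple linking rule. This is only a model: the enum and both functions are hypothetical, and the linking rule (a built-in object cannot reference symbols that exist only in a module) is the assumption behind the failure quoted above.

```c
/* Kconfig values for the two tristate symbols involved. */
enum kconfig_val { KC_N, KC_M, KC_Y };

/* Evaluates the added line: depends on TLS=y || MLX5_CORE=m */
int en_tls_allowed(enum kconfig_val tls, enum kconfig_val mlx5_core)
{
	return tls == KC_Y || mlx5_core == KC_M;
}

/* Assumed rule: built-in mlx5 code cannot call into a modular TLS. */
int links_ok(enum kconfig_val tls, enum kconfig_val mlx5_core)
{
	return !(mlx5_core == KC_Y && tls == KC_M);
}
```

With both symbols enabled, the two functions agree on all four combinations: only TLS=m together with MLX5_CORE=y is rejected.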
^ permalink raw reply related
* [PATCH, net-next] qcom-emac: hide ACPI specific functions
From: Arnd Bergmann @ 2018-05-25 21:37 UTC (permalink / raw)
To: Timur Tabi, David S. Miller
Cc: Arnd Bergmann, Hemanth Puranik, netdev, linux-kernel
A couple of functions in this file are only used when building with
ACPI enabled, leading to a build warning on most architectures:
drivers/net/ethernet/qualcomm/emac/emac-sgmii.c:284:25: error: 'qdf2400_ops' defined but not used [-Werror=unused-variable]
static struct sgmii_ops qdf2400_ops = {
^~~~~~~~~~~
drivers/net/ethernet/qualcomm/emac/emac-sgmii.c:276:25: error: 'qdf2432_ops' defined but not used [-Werror=unused-variable]
static struct sgmii_ops qdf2432_ops = {
This hides all the unused functions by putting them into the
corresponding #ifdef.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
drivers/net/ethernet/qualcomm/emac/emac-sgmii.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/net/ethernet/qualcomm/emac/emac-sgmii.c b/drivers/net/ethernet/qualcomm/emac/emac-sgmii.c
index 562420b834df..01b80e0a5367 100644
--- a/drivers/net/ethernet/qualcomm/emac/emac-sgmii.c
+++ b/drivers/net/ethernet/qualcomm/emac/emac-sgmii.c
@@ -108,6 +108,7 @@ static void emac_sgmii_link_init(struct emac_adapter *adpt)
writel(val, phy->base + EMAC_SGMII_PHY_AUTONEG_CFG2);
}
+#ifdef CONFIG_ACPI
static int emac_sgmii_irq_clear(struct emac_adapter *adpt, u8 irq_bits)
{
struct emac_sgmii *phy = &adpt->phy;
@@ -288,6 +289,7 @@ static struct sgmii_ops qdf2400_ops = {
.link_change = emac_sgmii_common_link_change,
.reset = emac_sgmii_common_reset,
};
+#endif
static int emac_sgmii_acpi_match(struct device *dev, void *data)
{
--
2.9.0
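The shape of the fix, reduced to a self-contained sketch. MODEL_HAVE_ACPI is a hypothetical macro standing in for CONFIG_ACPI, and the model_* names are made up for illustration. A static object whose only user is compiled conditionally has to sit under the same condition, otherwise the compiler reports it as defined but not used.

```c
#include <stddef.h>

/* Hypothetical switch standing in for CONFIG_ACPI; define it to build the table. */
/* #define MODEL_HAVE_ACPI */

struct model_ops {
	int (*reset)(void);
};

#ifdef MODEL_HAVE_ACPI
static int model_reset(void)
{
	return 0;
}

/* Only referenced from the conditional branch of model_match() below. */
static struct model_ops model_acpi_ops = {
	.reset = model_reset,
};
#endif

/* The matcher always exists, but its use of the table is conditional. */
const struct model_ops *model_match(void)
{
#ifdef MODEL_HAVE_ACPI
	return &model_acpi_ops;
#else
	return NULL;
#endif
}
```

As written, with the macro left undefined, model_match() returns NULL and neither the table nor its helper is compiled at all.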
^ permalink raw reply related
* Re: [PATCH] IB: Revert "remove redundant INFINIBAND kconfig dependencies"
From: Leon Romanovsky @ 2018-05-25 21:38 UTC (permalink / raw)
To: Arnd Bergmann
Cc: Doug Ledford, Jason Gunthorpe, Keith Busch, Jens Axboe,
Christoph Hellwig, Sagi Grimberg, Oleg Drokin, Andreas Dilger,
James Simmons, Greg Kroah-Hartman, Steve French,
Eric Van Hensbergen, Ron Minnich, Latchesar Ionkov,
David S. Miller, Santosh Shilimkar, Trond Myklebust,
Anna Schumaker, J. Bruce Fields, Jeff Layton <jlayto
In-Reply-To: <20180525213123.2113748-1-arnd@arndb.de>
On Fri, May 25, 2018 at 11:29:59PM +0200, Arnd Bergmann wrote:
> Several subsystems depend on INFINIBAND_ADDR_TRANS, which in turn depends
> on INFINIBAND. However, with CONFIG_INFINIBAND=m, this leads to a
> link error when another driver using it is built-in. The
> INFINIBAND_ADDR_TRANS dependency is insufficient here as this is
> a 'bool' symbol that does not force anything to be a module in turn.
>
> fs/cifs/smbdirect.o: In function `smbd_disconnect_rdma_work':
> smbdirect.c:(.text+0x1e4): undefined reference to `rdma_disconnect'
> net/9p/trans_rdma.o: In function `rdma_request':
> trans_rdma.c:(.text+0x7bc): undefined reference to `rdma_disconnect'
> net/9p/trans_rdma.o: In function `rdma_destroy_trans':
> trans_rdma.c:(.text+0x830): undefined reference to `ib_destroy_qp'
> trans_rdma.c:(.text+0x858): undefined reference to `ib_dealloc_pd'
>
> Fixes: 9533b292a7ac ("IB: remove redundant INFINIBAND kconfig dependencies")
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
> ---
> The patch that introduced the problem has been queued in the
> rdma-fixes/for-rc tree. Please revert the patch before sending
> the branch to Linus.
> ---
It was already sent to Linus.
https://marc.info/?l=linux-rdma&m=152719509803047&w=2
Thanks
^ permalink raw reply
* [PATCH] mwifiex: mark expected switch fall-throughs
From: Gustavo A. R. Silva @ 2018-05-25 21:38 UTC (permalink / raw)
To: Amitkumar Karwar, Nishant Sarmukadam, Ganapathi Bhat, Xinming Hu,
Kalle Valo, David S. Miller
Cc: linux-wireless, netdev, linux-kernel, Gustavo A. R. Silva
In preparation for enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
---
drivers/net/wireless/marvell/mwifiex/cfg80211.c | 4 ++++
drivers/net/wireless/marvell/mwifiex/scan.c | 1 +
2 files changed, 5 insertions(+)
diff --git a/drivers/net/wireless/marvell/mwifiex/cfg80211.c b/drivers/net/wireless/marvell/mwifiex/cfg80211.c
index 54a2297..16a705d 100644
--- a/drivers/net/wireless/marvell/mwifiex/cfg80211.c
+++ b/drivers/net/wireless/marvell/mwifiex/cfg80211.c
@@ -1158,6 +1158,7 @@ mwifiex_cfg80211_change_virtual_intf(struct wiphy *wiphy,
case NL80211_IFTYPE_UNSPECIFIED:
mwifiex_dbg(priv->adapter, INFO,
"%s: kept type as IBSS\n", dev->name);
+ /* fall through */
case NL80211_IFTYPE_ADHOC: /* This shouldn't happen */
return 0;
default:
@@ -1188,6 +1189,7 @@ mwifiex_cfg80211_change_virtual_intf(struct wiphy *wiphy,
case NL80211_IFTYPE_UNSPECIFIED:
mwifiex_dbg(priv->adapter, INFO,
"%s: kept type as STA\n", dev->name);
+ /* fall through */
case NL80211_IFTYPE_STATION: /* This shouldn't happen */
return 0;
default:
@@ -1210,6 +1212,7 @@ mwifiex_cfg80211_change_virtual_intf(struct wiphy *wiphy,
case NL80211_IFTYPE_UNSPECIFIED:
mwifiex_dbg(priv->adapter, INFO,
"%s: kept type as AP\n", dev->name);
+ /* fall through */
case NL80211_IFTYPE_AP: /* This shouldn't happen */
return 0;
default:
@@ -1249,6 +1252,7 @@ mwifiex_cfg80211_change_virtual_intf(struct wiphy *wiphy,
case NL80211_IFTYPE_UNSPECIFIED:
mwifiex_dbg(priv->adapter, INFO,
"%s: kept type as P2P\n", dev->name);
+ /* fall through */
case NL80211_IFTYPE_P2P_CLIENT:
case NL80211_IFTYPE_P2P_GO:
return 0;
diff --git a/drivers/net/wireless/marvell/mwifiex/scan.c b/drivers/net/wireless/marvell/mwifiex/scan.c
index d7ce7f7..19df92b 100644
--- a/drivers/net/wireless/marvell/mwifiex/scan.c
+++ b/drivers/net/wireless/marvell/mwifiex/scan.c
@@ -1308,6 +1308,7 @@ int mwifiex_update_bss_desc_with_ie(struct mwifiex_adapter *adapter,
case WLAN_EID_CHANNEL_SWITCH:
bss_entry->chan_sw_ie_present = true;
+ /* fall through */
case WLAN_EID_PWR_CAPABILITY:
case WLAN_EID_TPC_REPORT:
case WLAN_EID_QUIET:
--
2.7.4
^ permalink raw reply related
* Re: [PATCH 00/14] Modify action API for implementing lockless actions
From: Cong Wang @ 2018-05-25 21:40 UTC (permalink / raw)
To: Vlad Buslov
Cc: Linux Kernel Network Developers, David Miller, Jamal Hadi Salim,
Jiri Pirko, Pablo Neira Ayuso, Jozsef Kadlecsik, Florian Westphal,
Alexei Starovoitov, Daniel Borkmann, Eric Dumazet, Kees Cook,
LKML, NetFilter, coreteam, kliteyn
In-Reply-To: <vbfo9h3zd01.fsf@reg-r-vrt-018-180.mtr.labs.mlnx>
On Fri, May 25, 2018 at 1:39 PM, Vlad Buslov <vladbu@mellanox.com> wrote:
>
> On Thu 24 May 2018 at 23:34, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>> On Mon, May 14, 2018 at 7:27 AM, Vlad Buslov <vladbu@mellanox.com> wrote:
>>> Currently, all netlink protocol handlers for updating rules, actions and
>>> qdiscs are protected with single global rtnl lock which removes any
>>> possibility for parallelism. This patch set is a first step to remove
>>> rtnl lock dependency from TC rules update path. It updates act API to
>>> use atomic operations, rcu and spinlocks for fine-grained locking. It
>>> also extend API with functions that are needed to update existing
>>> actions for parallel execution.
>>
>> Can you give a summary here for what and how it is achieved?
>
> Got it, will expand cover letter in V2 with summary.
>>
>> You said this is the first step, what do you want to achieve in this
>> very first step? And how do you achieve it? Do you break the RTNL
>
> But aren't this questions answered in paragraph you quoted?
Obviously not. You said you want to remove it, but never explained why it
can be removed or how it is removed. This is crucial for review.
"use atomic operations, rcu and spinlocks for fine-grained locking"
says literally nothing. Why do atomics/RCU make RTNL unnecessary?
How is RCU used? What spinlocks are you talking about? What
do these spinlocks protect after removing RTNL? Why are they
safe with other netdevice and netns operations?
You explain _nothing_ here. Really. Please don't force people to
read 14 patches to understand how it works. In fact, no one wants
to read the code unless there is some high-level explanation that
makes basic sense.
> What: Change act API to not rely on one-big-global-RTNL-lock and to use
> more fine-grained synchronization methods to allow safe concurrent
> execution.
Sure, but how fine-grained is it after your patchset? Why can this
fine-grained locking safely replace RTNL?
Could you stop making us guess what you mean? It would save you time
exchanging emails with me, and it would save me time guessing.
It is a win-win.
> How: Refactor act API code to use atomics, rcu and spinlocks, etc. for
> protecting shared data structures, add new functions required to update
What shared data structures? The per-netns idr which is already protected
by a spinlock? The TC hierarchy? The shared standalone actions? Hey,
why do I have to guess? :-/
> specific actions implementation for parallel execution. (step 2)
Claiming is easy, proving is hard. I can easily claim I break RTNL down
to a per-netns lock, but I can't prove it really works. :-D
>
> If you feel that this cover letter is too terse, I will add outline of
> changes in V2.
It is not my rule, it is how you have to help people to review your
14 patches. I think it is a fair game: you help people like me to
review your patches, we help you to get them reviewed and merged
if they all make sense.
>
>> lock down to, for a quick example, a per-device lock? Or perhaps you
>> completely remove it because of what reason?
>
> I want to remove RTNL _dependency_ from act API data structures and
> code. I probably should be more specific in this case:
>
> Florian recently made a change that allows registering netlink protocol
> handlers with flag RTNL_FLAG_DOIT_UNLOCKED. Handlers registered with
> this flag are called without RTNL taken. My end goal is to have rule
> update handlers(RTM_NEWTFILTER, RTM_DELTFILTER, etc.) to be registered
> with UNLOCKED flag to allow parallel execution.
Please add this paragraph to your cover letter, it is very important for review.
>
> I do not intend to globally remove or break RTNL.
>
>>
>> I go through all the descriptions of your 14 patches (but not any code),
>> I still have no clue how you successfully avoid RTNL. Please don't
>> let me read into your code to understand that, there must be some
> high-level justification on how it works. Without it, I don't even want
>> to read into the code.
>
> On internal code review I've been asked not to duplicate info from
> commit messages in cover letter, but I guess I can expand it with some
> high level outline in V2.
In cover letter, you should put a high-level overview of "why" and "how".
If, in the worst case, on high-level it doesn't make sense, why should
we bother to read the code? In short, you have to convince people to
read your code here.
In each patch description, you should explain what a single patch does.
I don't see any duplication.
^ permalink raw reply
* [RFC PATCH 1/2] net: macb: Add CAP to disable hardware TX checksum offloading
From: Jennifer Dahm @ 2018-05-25 21:44 UTC (permalink / raw)
To: netdev, David S . Miller, Nicolas Ferre; +Cc: Nathan Sullivan, Jennifer Dahm
In-Reply-To: <1527284654-24835-1-git-send-email-jennifer.dahm@ni.com>
Certain PHYs have significant bugs in their TX checksum offloading
that cannot be solved in software. In order to accommodate these PHYs,
add a CAP to disable this hardware.
Signed-off-by: Jennifer Dahm <jennifer.dahm@ni.com>
---
drivers/net/ethernet/cadence/macb.h | 1 +
drivers/net/ethernet/cadence/macb_main.c | 8 ++++++--
2 files changed, 7 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/cadence/macb.h b/drivers/net/ethernet/cadence/macb.h
index 8665982..6b85e97 100644
--- a/drivers/net/ethernet/cadence/macb.h
+++ b/drivers/net/ethernet/cadence/macb.h
@@ -635,6 +635,7 @@
#define MACB_CAPS_USRIO_DISABLED 0x00000010
#define MACB_CAPS_JUMBO 0x00000020
#define MACB_CAPS_GEM_HAS_PTP 0x00000040
+#define MACB_CAPS_DISABLE_TX_HW_CSUM 0x00000080
#define MACB_CAPS_FIFO_MODE 0x10000000
#define MACB_CAPS_GIGABIT_MODE_AVAILABLE 0x20000000
#define MACB_CAPS_SG_DISABLED 0x40000000
diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c
index 3e93df5..a5d564b 100644
--- a/drivers/net/ethernet/cadence/macb_main.c
+++ b/drivers/net/ethernet/cadence/macb_main.c
@@ -3360,8 +3360,12 @@ static int macb_init(struct platform_device *pdev)
dev->hw_features |= MACB_NETIF_LSO;
/* Checksum offload is only available on gem with packet buffer */
- if (macb_is_gem(bp) && !(bp->caps & MACB_CAPS_FIFO_MODE))
- dev->hw_features |= NETIF_F_HW_CSUM | NETIF_F_RXCSUM;
+ if (macb_is_gem(bp) && !(bp->caps & MACB_CAPS_FIFO_MODE)) {
+ if (!(bp->caps & MACB_CAPS_DISABLE_TX_HW_CSUM))
+ dev->hw_features |= NETIF_F_HW_CSUM | NETIF_F_RXCSUM;
+ else
+ dev->hw_features |= NETIF_F_RXCSUM;
+ }
if (bp->caps & MACB_CAPS_SG_DISABLED)
dev->hw_features &= ~NETIF_F_SG;
dev->features = dev->hw_features;
--
2.7.4
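The feature selection in the macb_init() hunk can be exercised outside the kernel with a small model. The two MACB_CAPS_* values are copied from the macb.h hunk above; the MODEL_F_* bits are placeholders (the real NETIF_F_* values differ) and model_csum_features() is a hypothetical name.

```c
#include <stdbool.h>
#include <stdint.h>

/* Capability bits as they appear in the macb.h hunk. */
#define MACB_CAPS_DISABLE_TX_HW_CSUM 0x00000080u
#define MACB_CAPS_FIFO_MODE          0x10000000u

/* Placeholder feature bits; not the kernel's NETIF_F_* values. */
#define MODEL_F_HW_CSUM 0x1u
#define MODEL_F_RXCSUM  0x2u

/* Mirrors the patched branch: RX checksum stays, TX checksum is optional. */
uint32_t model_csum_features(bool is_gem, uint32_t caps)
{
	uint32_t features = 0;

	if (is_gem && !(caps & MACB_CAPS_FIFO_MODE)) {
		features |= MODEL_F_RXCSUM;
		if (!(caps & MACB_CAPS_DISABLE_TX_HW_CSUM))
			features |= MODEL_F_HW_CSUM;
	}
	return features;
}
```

A GEM with a packet buffer and no extra caps gets both bits, the same device with MACB_CAPS_DISABLE_TX_HW_CSUM keeps only RX checksumming, and FIFO-mode or non-GEM devices get neither, as before the patch.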
^ permalink raw reply related
* [RFC PATCH 2/2] net: macb: Disable TX checksum offloading on all Zynq
From: Jennifer Dahm @ 2018-05-25 21:44 UTC (permalink / raw)
To: netdev, David S . Miller, Nicolas Ferre; +Cc: Nathan Sullivan, Jennifer Dahm
In-Reply-To: <1527284654-24835-1-git-send-email-jennifer.dahm@ni.com>
The Zynq ethernet hardware has checksum offloading bugs that cause
small UDP packets (<= 2 bytes) to be sent with an incorrect checksum
(0xffff) and forwarded UDP packets to be re-checksummed, which is
illegal behavior. The best solution we have right now is to disable
hardware TX checksum offloading entirely.
Signed-off-by: Jennifer Dahm <jennifer.dahm@ni.com>
---
drivers/net/ethernet/cadence/macb_main.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c
index a5d564b..e8cc68a 100644
--- a/drivers/net/ethernet/cadence/macb_main.c
+++ b/drivers/net/ethernet/cadence/macb_main.c
@@ -3807,7 +3807,8 @@ static const struct macb_config zynqmp_config = {
};
static const struct macb_config zynq_config = {
- .caps = MACB_CAPS_GIGABIT_MODE_AVAILABLE | MACB_CAPS_NO_GIGABIT_HALF,
+ .caps = MACB_CAPS_GIGABIT_MODE_AVAILABLE | MACB_CAPS_NO_GIGABIT_HALF
+ | MACB_CAPS_DISABLE_TX_HW_CSUM,
.dma_burst_length = 16,
.clk_init = macb_clk_init,
.init = macb_init,
--
2.7.4
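For reference when comparing captured packets against what software would have produced, here is a sketch of the ones' complement checksum from RFC 1071 together with the RFC 768 rule that a computed value of zero is sent as 0xffff. It does not reproduce the hardware behaviour described above; the function names are hypothetical, and the caller is assumed to pass the pseudo-header, UDP header (checksum field zeroed) and payload as one buffer.

```c
#include <stddef.h>
#include <stdint.h>

/* Ones' complement sum of big-endian 16-bit words, folded to 16 bits. */
uint16_t ones_complement_sum(const uint8_t *data, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)data[i] << 8) | data[i + 1];
	if (len & 1)
		sum += (uint32_t)data[len - 1] << 8; /* pad odd byte with zero */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Value to place in the UDP checksum field for the given buffer. */
uint16_t udp_checksum_field(const uint8_t *data, size_t len)
{
	uint16_t csum = (uint16_t)~ones_complement_sum(data, len);

	/* In UDP a zero field means "no checksum", so zero is sent as all ones. */
	return csum == 0 ? 0xffff : csum;
}
```

Note that 0xffff on the wire is therefore legitimate whenever the computed checksum happens to be zero, so a capture showing 0xffff needs to be checked against the software result before it is counted as a bad checksum.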
^ permalink raw reply related
* [RFC PATCH 0/2] net: macb: Disable TX checksum offloading on all Zynq
From: Jennifer Dahm @ 2018-05-25 21:44 UTC (permalink / raw)
To: netdev, David S . Miller, Nicolas Ferre; +Cc: Nathan Sullivan, Jennifer Dahm
During testing, I discovered that the Zynq GEM hardware overwrites all
outgoing UDP packet checksums, which is illegal in packet forwarding
cases. This happens both with and without the checksum-zeroing
behavior introduced in 007e4ba3ee137f4700f39aa6dbaf01a71047c5f6
("net: macb: initialize checksum when using checksum offloading"). The
only solution to both the small packet bug and the packet forwarding
bug that I can find is to disable TX checksum offloading entirely.
There's still the possibility that these bugs are actually with the
driver software and not with the hardware. I've found several places
where the checksum is set to 0xFFFF (the incorrect checksum found in
small packets) when something goes wrong, and I can imagine a buggy
driver writing over the checksum blindly when TX checksum offloading
is enabled.
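For reference when judging a captured checksum, here is a minimal sketch of the UDP-over-IPv4 checksum as RFC 768 defines it (one's-complement sum over the pseudo-header, UDP header, and payload). The helper names `csum_add` and `udp4_checksum` are illustrative only and are not taken from the macb driver. Note that RFC 768 transmits a computed checksum of zero as all ones, so 0xFFFF on the wire is only legitimate when the sum really works out that way.

```c
#include <stddef.h>
#include <stdint.h>

/* Add a buffer of big-endian 16-bit words to a running one's-complement sum. */
static uint32_t csum_add(uint32_t sum, const uint8_t *buf, size_t len)
{
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += (uint32_t)buf[i] << 8 | buf[i + 1];
	if (len & 1)
		sum += (uint32_t)buf[len - 1] << 8;	/* pad a trailing odd byte with zero */
	return sum;
}

/*
 * src/dst: IPv4 addresses as 4 bytes in network order.
 * udp:     UDP header plus payload, with the checksum field zeroed.
 * Returns the checksum in host byte order.
 */
uint16_t udp4_checksum(const uint8_t src[4], const uint8_t dst[4],
		       const uint8_t *udp, size_t udp_len)
{
	/* Pseudo-header tail: zero byte, protocol 17 (UDP), UDP length. */
	uint8_t pseudo[4] = { 0, 17, (uint8_t)(udp_len >> 8), (uint8_t)udp_len };
	uint32_t sum = 0;
	uint16_t csum;

	sum = csum_add(sum, src, 4);
	sum = csum_add(sum, dst, 4);
	sum = csum_add(sum, pseudo, 4);
	sum = csum_add(sum, udp, udp_len);
	while (sum >> 16)			/* fold carries back into 16 bits */
		sum = (sum & 0xffff) + (sum >> 16);
	csum = (uint16_t)~sum;
	return csum ? csum : 0xffff;		/* computed zero is sent as all ones */
}
```

Feeding the addresses and UDP bytes from a capture into `udp4_checksum()` gives the value a correct sender would have produced, which can then be compared with what actually arrived.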
I would like feedback on two things:
1. Is it possible that the two bugs described above are caused by the
driver and not by the hardware? If so, where should I look to
implicate the driver?
2. Is this a problem we care enough about to completely disable TX
checksum offloading?
Here is the testing procedure I used to reproduce these bugs on my
machine. Specifically, without this patchset, step 9 fails. Without
007e4ba3ee, step 8 also fails.
1. Set up the test environment:
a. Acquire a Zynq device with two ethernet ports. This is the DUT.
b. Acquire a USB-Ethernet adapter.
c. Acquire two ethernet cables.
d. Connect one Ethernet port on the DUT to your computer's network
switch.
e. Connect the other Ethernet port to the USB-Ethernet adapter and
plug that adapter into your computer.
f. Set up a Linux VM to send packets through the DUT. I recommend
using a VM here so that you can easily detach it from the primary
network to force outgoing traffic through the DUT.
g. Set up a computer with a packet inspecting program to receive and
inspect packets. This doesn't need to be a VM. For the purposes
of this test, I'll be using a Windows instance with WireShark.
2. Load the kernel you want to test onto the DUT, making sure to
include the `bridge` module.
3. Set up a bridge on the DUT. The following commands on the DUT
should work, replacing `eth0` and `eth1` with the two ethernet
interfaces on the DUT:
```
brctl addbr test
brctl addif test eth0 eth1
ifconfig eth0 0.0.0.0
ifconfig eth1 0.0.0.0
dhclient test -v
```
4. Disconnect the Linux VM from your host computer's network and
connect it to the USB-Ethernet adapter in order to force outgoing
network traffic through the DUT. If necessary, run dhclient on the
Linux VM to acquire an IP address.
5. Ensure that you can reach your Windows instance from your Linux VM
through the DUT (e.g. ping).
6. Start WireShark on your Windows instance and start monitoring
traffic on a specific, unused port (e.g. 61557).
7. Using netcat, send a few not-tiny UDP packets from your Linux VM to
your Windows instance to ensure that valid UDP packets are properly
forwarded. Ex:
```
echo "hello world" | netcat -u <WindowsIP> 61557
```
Inspect these packets to ensure that the data arrived intact and
that the checksum looks reasonable (i.e. not 0x0000 or 0xFFFF).
8. Using netcat, send a few tiny UDP packets (2 bytes or fewer) from
your Linux VM to your Windows instance to ensure that the checksum is
reasonable. Ex:
```
echo "h" | netcat -u <WindowsIP> 61557
```
9. Using a custom program, send UDP packets with broken checksums
(e.g. 0xABCD) from your Linux VM to your Windows instance. Inspect
these packets with WireShark and make sure that the packet arrived
with the same checksum you sent it with.
For step 9, I wrote a C program using the Linux socket API that will
send a properly formatted UDP packet with the payload "Hello!" and a
(broken) checksum of 0xABCD to port 61557 on the host provided at the
command line. I can send the full program if you would like, but here
is the important part of it:
```
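#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* UDP header followed by the payload; header fields are big-endian. */
struct custom_udp {
	uint16_t s_port;
	uint16_t d_port;
	uint16_t length;
	uint16_t check;
	char data[];
};

/* Send `message` in a hand-built UDP frame with a deliberately wrong checksum. */
int send_message(int sockfd, in_port_t port, const char *message) {
	struct custom_udp *frame;
	size_t message_len = strlen(message);
	size_t frame_len = sizeof(struct custom_udp) + message_len;
	int ret;

	if (frame_len > UINT16_MAX)	/* the UDP length field is 16 bits */
		return -1;
	frame = malloc(frame_len);
	if (!frame)
		return -1;
	frame->s_port = htons(0);
	frame->d_port = htons(port);
	frame->length = htons((uint16_t)frame_len);
	frame->check = htons(0xABCD);	/* intentionally broken checksum */
	memcpy(frame->data, message, message_len);
	ret = write(sockfd, frame, frame_len);
	free(frame);
	return ret;
}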
```
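The excerpt above does not show how `sockfd` is obtained. One plausible way, sketched here as an assumption rather than as the original program's code, is a raw IPv4 socket with protocol UDP: the kernel then builds the IP header while the UDP header, including the checksum, is sent exactly as written. The helper name `open_raw_udp` is hypothetical, and opening a raw socket needs CAP_NET_RAW (typically root).

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: open a raw IPv4/UDP socket connected to the
 * dotted-quad address `host`. Returns the descriptor, or -1 on error.
 */
int open_raw_udp(const char *host)
{
	struct sockaddr_in dst;
	int fd;

	memset(&dst, 0, sizeof(dst));
	dst.sin_family = AF_INET;
	/* Validate the address first so bad input fails without privileges. */
	if (inet_pton(AF_INET, host, &dst.sin_addr) != 1)
		return -1;

	fd = socket(AF_INET, SOCK_RAW, IPPROTO_UDP);
	if (fd < 0)
		return -1;
	/* connect() fixes the destination so a plain write() can be used. */
	if (connect(fd, (struct sockaddr *)&dst, sizeof(dst)) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}
```

With a descriptor from this helper, `send_message(fd, 61557, "Hello!")` would emit the frame described in step 9.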
Jennifer Dahm (1):
net/macb: Disable TX checksum offloading on all Zynq-7000
drivers/net/ethernet/cadence/macb.h | 1 +
drivers/net/ethernet/cadence/macb_main.c | 11 ++++++++---
2 files changed, 9 insertions(+), 3 deletions(-)
--
2.7.4
^ permalink raw reply
* Re: [PATCH] PCI: allow drivers to limit the number of VFs to 0
From: Bjorn Helgaas @ 2018-05-25 21:45 UTC (permalink / raw)
To: Jakub Kicinski
Cc: Bjorn Helgaas, linux-pci, netdev, Sathya Perla, Felix Manlunas,
alexander.duyck, john.fastabend, Jacob Keller, Donald Dutile,
oss-drivers, Christoph Hellwig
In-Reply-To: <20180525140521.662a9c96@cakuba>
On Fri, May 25, 2018 at 02:05:21PM -0700, Jakub Kicinski wrote:
> On Fri, 25 May 2018 09:02:23 -0500, Bjorn Helgaas wrote:
> > On Thu, May 24, 2018 at 06:20:15PM -0700, Jakub Kicinski wrote:
> > > On Thu, 24 May 2018 18:57:48 -0500, Bjorn Helgaas wrote:
> > > > On Mon, Apr 02, 2018 at 03:46:52PM -0700, Jakub Kicinski wrote:
> > > > > Some user space depends on enabling sriov_totalvfs number of VFs
> > > > > to not fail, e.g.:
> > > > >
> > > > > $ cat .../sriov_totalvfs > .../sriov_numvfs
> > > > >
> > > > > For devices which VF support depends on loaded FW we have the
> > > > > pci_sriov_{g,s}et_totalvfs() API. However, this API uses 0 as
> > > > > a special "unset" value, meaning drivers can't limit sriov_totalvfs
> > > > > to 0. Remove the special values completely and simply initialize
> > > > > driver_max_VFs to total_VFs. Then always use driver_max_VFs.
> > > > > Add a helper for drivers to reset the VF limit back to total.
> > > >
> > > > I still can't really make sense out of the changelog.
> > > >
> > > > I think part of the reason it's confusing is because there are two
> > > > things going on:
> > > >
> > > > 1) You want this:
> > > >
> > > > pci_sriov_set_totalvfs(dev, 0);
> > > > x = pci_sriov_get_totalvfs(dev)
> > > >
> > > > to return 0 instead of total_VFs. That seems to connect with
> > > > your subject line. It means "sriov_totalvfs" in sysfs could be
> > > > 0, but I don't know how that is useful (I'm sure it is; just
> > > > educate me :))
> > >
> > > Let me just quote the bug report that got filed on our internal bug
> > > tracker :)
> > >
> > > When testing Juju Openstack with Ubuntu 18.04, enabling SR-IOV causes
> > > errors because Juju gets the sriov_totalvfs for SR-IOV-capable device
> > > then tries to set that as the sriov_numvfs parameter.
> > >
> > > For SR-IOV incapable FW, the sriov_totalvfs parameter should be 0,
> > > but it's set to max. When FW is switched to flower*, the correct
> > > sriov_totalvfs value is presented.
> > >
> > > * flower is a project name
> >
> > From the point of view of the PCI core (which knows nothing about
> > device firmware and relies on the architected config space described
> > by the PCIe spec), this sounds like an erratum: with some firmware
> > installed, the device is not capable of SR-IOV, but still advertises
> > an SR-IOV capability with "TotalVFs > 0".
> >
> > Regardless of whether that's an erratum, we do allow PF drivers to use
> > pci_sriov_set_totalvfs() to limit the number of VFs that may be
> > enabled by writing to the PF's "sriov_numvfs" sysfs file.
>
> Think more of an FPGA which can be reprogrammed at runtime to have
> different capabilities than an erratum. Some FWs simply have no use
> for VFs and save resources (and validation time) by not supporting it.
This is a bit of a gray area. Reloading firmware or reprogramming an
FPGA has the potential to create a new and different device than we
had before, but the PCI core doesn't know that. The typical sequence
is:
- PCI core enumerates device
- driver binds to device (we call .probe())
- driver loads new firmware to device
- driver resets device with pci_reset_function() or similar
- pci_reset_function() saves config space
- pci_reset_function() resets device
- device uses new firmware when it comes out of reset
- pci_reset_function() restores config space
Loading the new firmware might change what the device looks like in
config space -- it could change the number or size of BARs, the
capabilities advertised, etc. We currently sweep that under the rug
and blindly restore the old config space.
It looks like your driver does the reset differently, so maybe it
keeps the original config space setup.
But all that said, I agree that we should allow a PF driver to prevent
VF enablement, whether because the firmware doesn't support it or the
PF driver just wants to prevent use of VFs for whatever reason (maybe
we don't have enough MMIO resources, we don't need the VFs, etc.)
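To make the "0 means unset" problem from the changelog concrete, here is a toy userspace model, not the kernel's actual code. `struct sriov_model` and the `old_`/`new_` function names are invented for illustration; only the field names `total_VFs` and `driver_max_VFs` come from the changelog quoted above.

```c
#include <errno.h>

/* Toy stand-in for the SR-IOV bookkeeping; not the kernel's struct. */
struct sriov_model {
	int total_VFs;		/* TotalVFs advertised in config space */
	int driver_max_VFs;	/* limit requested by the PF driver */
};

/* Old convention: driver_max_VFs == 0 is treated as "no limit set". */
int old_get_totalvfs(const struct sriov_model *s)
{
	return s->driver_max_VFs ? s->driver_max_VFs : s->total_VFs;
}

/* Proposed convention: start at total_VFs and always use driver_max_VFs. */
void new_init(struct sriov_model *s, int total)
{
	s->total_VFs = total;
	s->driver_max_VFs = total;
}

/* A limit of 0 is now an ordinary value; only out-of-range is rejected. */
int new_set_totalvfs(struct sriov_model *s, int numvfs)
{
	if (numvfs < 0 || numvfs > s->total_VFs)
		return -EINVAL;
	s->driver_max_VFs = numvfs;
	return 0;
}

int new_get_totalvfs(const struct sriov_model *s)
{
	return s->driver_max_VFs;
}
```

In this model a driver that sets the limit to 0 on an 8-VF device still reads back 8 under the old convention, but reads back 0 under the proposed one, which is the behavior the subject line asks for.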
> Okay, perfect. That makes sense. The patch below certainly fixes the
> first issue for us. Thank you!
>
> As far as the second issue goes - agreed, having the core reset the
> number of VFs to total_VFs definitely makes sense. It doesn't cater to
> the case where FW is reloaded without reprobing, but we don't do this
> today anyway.
>
> Should I try to come up with a patch to reset total_VFs after detach?
Yes, please.
Bjorn
^ permalink raw reply
* Re: [PATCH, net-next 1/2] bpf: btf: avoid -Wreturn-type warning
From: Song Liu @ 2018-05-25 21:53 UTC (permalink / raw)
To: Arnd Bergmann
Cc: Alexei Starovoitov, Daniel Borkmann, Martin Lau,
netdev@vger.kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <20180525213331.2115471-1-arnd@arndb.de>
> On May 25, 2018, at 2:33 PM, Arnd Bergmann <arnd@arndb.de> wrote:
>
> gcc warns about a noreturn function possibly returning in
> some configurations:
>
> kernel/bpf/btf.c: In function 'env_type_is_resolve_sink':
> kernel/bpf/btf.c:729:1: error: control reaches end of non-void function [-Werror=return-type]
>
> Using BUG() instead of BUG_ON() avoids that warning and otherwise
> does the exact same thing.
>
> Fixes: eb3f595dab40 ("bpf: btf: Validate type reference")
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
> ---
> kernel/bpf/btf.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
> index 9cbeabb5aca3..2822a0cf4f48 100644
> --- a/kernel/bpf/btf.c
> +++ b/kernel/bpf/btf.c
> @@ -749,7 +749,7 @@ static bool env_type_is_resolve_sink(const struct btf_verifier_env *env,
> !btf_type_is_array(next_type) &&
> !btf_type_is_struct(next_type);
> default:
> - BUG_ON(1);
> + BUG();
> }
> }
>
> --
> 2.9.0
>
Acked-by: Song Liu <songliubraving@fb.com>
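
A rough userspace analogue of the pattern being patched, as a sketch only: `DIE()` and `DIE_ON()` are hypothetical stand-ins for the kernel macros, and `enum kind` and `is_sink()` are invented for the example. An unconditional call to a noreturn function in the `default:` branch lets the compiler see that no path falls off the end of the non-void function. Whether it can prove the same through a conditional wrapper depends on how that wrapper expands in a given configuration.

```c
#include <stdbool.h>
#include <stdlib.h>

enum kind { KIND_INT, KIND_PTR };

/* Userspace stand-ins; hypothetical names, not the kernel macros. */
#define DIE()		abort()
#define DIE_ON(cond)	do { if (cond) abort(); } while (0)

bool is_sink(enum kind k)
{
	switch (k) {
	case KIND_INT:
		return true;
	case KIND_PTR:
		return false;
	default:
		/*
		 * abort() is declared noreturn, so this path visibly ends
		 * here. DIE_ON(1) would rely on the compiler folding the
		 * condition before it can reach the same conclusion.
		 */
		DIE();
	}
}
```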
^ permalink raw reply