* RE: [Intel-wired-lan] [next-queue PATCH v6 08/10] igb: Add MAC address support for ethtool nftuple filters
From: Brown, Aaron F @ 2018-04-05 1:47 UTC (permalink / raw)
To: Gomes, Vinicius, intel-wired-lan@lists.osuosl.org
Cc: netdev@vger.kernel.org, Sanchez-Palencia, Jesus
In-Reply-To: <20180329210751.11531-9-vinicius.gomes@intel.com>
> From: Intel-wired-lan [mailto:intel-wired-lan-bounces@osuosl.org] On
> Behalf Of Vinicius Costa Gomes
> Sent: Thursday, March 29, 2018 2:08 PM
> To: intel-wired-lan@lists.osuosl.org
> Cc: netdev@vger.kernel.org; Sanchez-Palencia, Jesus <jesus.sanchez-
> palencia@intel.com>
> Subject: [Intel-wired-lan] [next-queue PATCH v6 08/10] igb: Add MAC
> address support for ethtool nftuple filters
>
> This adds the capability of configuring the queue steering of arriving
> packets based on their source and destination MAC addresses.
>
> In practical terms this adds support for the following use cases,
> characterized by these examples:
>
> $ ethtool -N eth0 flow-type ether dst aa:aa:aa:aa:aa:aa action 0
> (this will direct packets with destination address "aa:aa:aa:aa:aa:aa"
> to the RX queue 0)
This is now working for me, testing with the dst MAC being the MAC on the i210. I set the filter and all the traffic to the destination MAC address gets routed to the chosen RX queue.
> $ ethtool -N eth0 flow-type ether src 44:44:44:44:44:44 action 3
> (this will direct packets with source address "44:44:44:44:44:44" to
> the RX queue 3)
However, I am still not getting the raw ethernet source filter to work. Even back to back with no other system to "confuse" the stream, I set the filter so the source MAC is the same as the MAC on the link partner, send traffic and the traffic bounces around the queues as if the filter is not set.
>
> Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
> ---
> drivers/net/ethernet/intel/igb/igb_ethtool.c | 35
> ++++++++++++++++++++++++----
> 1 file changed, 31 insertions(+), 4 deletions(-)
^ permalink raw reply
* Re: linux-next: manual merge of the net-next tree with the pci tree
From: Stephen Rothwell @ 2018-04-05 1:40 UTC (permalink / raw)
To: Bjorn Helgaas
Cc: David Miller, Networking, Linux-Next Mailing List,
Linux Kernel Mailing List, Tariq Toukan, Saeed Mahameed,
Tal Gilboa
In-Reply-To: <20180403131454.4f9f032d@canb.auug.org.au>
[-- Attachment #1: Type: text/plain, Size: 3018 bytes --]
Hi all,
On Tue, 3 Apr 2018 13:14:54 +1000 Stephen Rothwell <sfr@canb.auug.org.au> wrote:
>
> Today's linux-next merge of the net-next tree got a conflict in:
>
> drivers/net/ethernet/mellanox/mlx5/core/en_main.c
>
> between commit:
>
> 2907938d2375 ("net/mlx5e: Use pcie_bandwidth_available() to compute bandwidth")
>
> from the pci tree and commit:
>
> 0608d4dbaf4e ("net/mlx5e: Unify slow PCI heuristic")
>
> from the net-next tree.
>
> I fixed it up (see below) and can carry the fix as necessary. This
> is now fixed as far as linux-next is concerned, but any non trivial
> conflicts should be mentioned to your upstream maintainer when your tree
> is submitted for merging. You may also want to consider cooperating
> with the maintainer of the conflicting tree to minimise any particularly
> complex conflicts.
>
> --
> Cheers,
> Stephen Rothwell
>
> diff --cc drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> index 884337f88589,0aab3afc6885..000000000000
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> @@@ -3880,16 -4026,50 +4033,20 @@@ void mlx5e_build_default_indir_rqt(u32
> indirection_rqt[i] = i % num_channels;
> }
>
> - static bool cqe_compress_heuristic(u32 link_speed, u32 pci_bw)
> -static int mlx5e_get_pci_bw(struct mlx5_core_dev *mdev, u32 *pci_bw)
> -{
> - enum pcie_link_width width;
> - enum pci_bus_speed speed;
> - int err = 0;
> -
> - err = pcie_get_minimum_link(mdev->pdev, &speed, &width);
> - if (err)
> - return err;
> -
> - if (speed == PCI_SPEED_UNKNOWN || width == PCIE_LNK_WIDTH_UNKNOWN)
> - return -EINVAL;
> -
> - switch (speed) {
> - case PCIE_SPEED_2_5GT:
> - *pci_bw = 2500 * width;
> - break;
> - case PCIE_SPEED_5_0GT:
> - *pci_bw = 5000 * width;
> - break;
> - case PCIE_SPEED_8_0GT:
> - *pci_bw = 8000 * width;
> - break;
> - default:
> - return -EINVAL;
> - }
> -
> - return 0;
> -}
> -
> + static bool slow_pci_heuristic(struct mlx5_core_dev *mdev)
> {
> - return (link_speed && pci_bw &&
> - (pci_bw < 40000) && (pci_bw < link_speed));
> - }
> + u32 link_speed = 0;
> + u32 pci_bw = 0;
>
> - static bool hw_lro_heuristic(u32 link_speed, u32 pci_bw)
> - {
> - return !(link_speed && pci_bw &&
> - (pci_bw <= 16000) && (pci_bw < link_speed));
> + mlx5e_get_max_linkspeed(mdev, &link_speed);
> - mlx5e_get_pci_bw(mdev, &pci_bw);
> ++ pci_bw = pcie_bandwidth_available(mdev->pdev, NULL, NULL, NULL);
> + mlx5_core_dbg_once(mdev, "Max link speed = %d, PCI BW = %d\n",
> + link_speed, pci_bw);
> +
> + #define MLX5E_SLOW_PCI_RATIO (2)
> +
> + return link_speed && pci_bw &&
> + link_speed > MLX5E_SLOW_PCI_RATIO * pci_bw;
> }
>
> void mlx5e_set_tx_cq_mode_params(struct mlx5e_params *params, u8 cq_period_mode)
This is now a conflict between the pci tree and Linus' tree.
--
Cheers,
Stephen Rothwell
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
^ permalink raw reply
* Re: [PATCH net] netns: filter uevents correctly
From: Christian Brauner @ 2018-04-05 1:35 UTC (permalink / raw)
To: Eric W. Biederman, davem
Cc: Christian Brauner, gregkh, netdev, linux-kernel, avagin, ktkhai,
serge
In-Reply-To: <871sfuha2d.fsf@xmission.com>
On Wed, Apr 04, 2018 at 05:38:02PM -0500, Eric W. Biederman wrote:
> Christian Brauner <christian.brauner@canonical.com> writes:
>
> > On Wed, Apr 04, 2018 at 09:48:57PM +0200, Christian Brauner wrote:
> >> commit 07e98962fa77 ("kobject: Send hotplug events in all network namespaces")
> >>
> >> enabled sending hotplug events into all network namespaces back in 2010.
> >> Over time the set of uevents that get sent into all network namespaces has
> >> shrunk. We have now reached the point where hotplug events for all devices
> >> that carry a namespace tag are filtered according to that namespace.
> >>
> >> Specifically, they are filtered whenever the namespace tag of the kobject
> >> does not match the namespace tag of the netlink socket. One example are
> >> network devices. Uevents for network devices only show up in the network
> >> namespaces these devices are moved to or created in.
> >>
> >> However, any uevent for a kobject that does not have a namespace tag
> >> associated with it will not be filtered and we will *try* to broadcast it
> >> into all network namespaces.
> >>
> >> The original patchset was written in 2010 before user namespaces were a
> >> thing. With the introduction of user namespaces sending out uevents became
> >> partially isolated as they were filtered by user namespaces:
> >>
> >> net/netlink/af_netlink.c:do_one_broadcast()
> >>
> >> if (!net_eq(sock_net(sk), p->net)) {
> >> if (!(nlk->flags & NETLINK_F_LISTEN_ALL_NSID))
> >> return;
> >>
> >> if (!peernet_has_id(sock_net(sk), p->net))
> >> return;
> >>
> >> if (!file_ns_capable(sk->sk_socket->file, p->net->user_ns,
> >> CAP_NET_BROADCAST))
> >> j return;
> >> }
> >>
> >> The file_ns_capable() check will check whether the caller had
> >> CAP_NET_BROADCAST at the time of opening the netlink socket in the user
> >> namespace of interest. This check is fine in general but seems insufficient
> >> to me when paired with uevents. The reason is that devices always belong to
> >> the initial user namespace so uevents for kobjects that do not carry a
> >> namespace tag should never be sent into another user namespace. This has
> >> been the intention all along. But there's one case where this breaks,
> >> namely if a new user namespace is created by root on the host and an
> >> identity mapping is established between root on the host and root in the
> >> new user namespace. Here's a reproducer:
> >>
> >> sudo unshare -U --map-root
> >> udevadm monitor -k
> >> # Now change to initial user namespace and e.g. do
> >> modprobe kvm
> >> # or
> >> rmmod kvm
> >>
> >> will allow the non-initial user namespace to retrieve all uevents from the
> >> host. This seems very anecdotal given that in the general case user
> >> namespaces do not see any uevents and also can't really do anything useful
> >> with them.
> >>
> >> Additionally, it is now possible to send uevents from userspace. As such we
> >> can let a sufficiently privileged (CAP_SYS_ADMIN in the owning user
> >> namespace of the network namespace of the netlink socket) userspace process
> >> make a decision what uevents should be sent.
> >>
> >> This makes me think that we should simply ensure that uevents for kobjects
> >> that do not carry a namespace tag are *always* filtered by user namespace
> >> in kobj_bcast_filter(). Specifically:
> >> - If the owning user namespace of the uevent socket is not init_user_ns the
> >> event will always be filtered.
> >> - If the network namespace the uevent socket belongs to was created in the
> >> initial user namespace but was opened from a non-initial user namespace
> >> the event will be filtered as well.
> >> Put another way, uevents for kobjects not carrying a namespace tag are now
> >> always only sent to the initial user namespace. The regression potential
> >> for this is near to non-existent since user namespaces can't really do
> >> anything with interesting devices.
> >>
> >> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
> >
> > That was supposed to be [PATCH net] not [PATCH net-next] which is
> > obviously closed. Sorry about that.
>
> This does not appear to be a fix.
> This looks like feature work.
> The motivation appears to be that looks wrong let's change it.
Hm, it looked like an oversight an therefore seems like a bug which is
why I thought would be a good candidate for net. Recent patches to the
semantics here just make it more obvious and provide a better argument
to fix it in the current release rather then defer it to the next one.
But I'm happy to leave this for net-next. I don't want to rush things if
this change in semantics is not trivial enough. For the record, I'm
merely fixing/expanding on the current status quo.
David, is it ok to queue this or would you prefer I resend when net-next
reopens?
>
> So let's please leave this for when net-next opens again so we can
> have time to fully consider a change in semantics.
Sure, if we agree that this is the way to go I'm happy too.
Is your issue just with when we merge it and you disagree from a
technical perspective? That wasn't entirely obvious from your previous
mail. :)
Thanks!
Christian
>
> Thank you,
> Eric
^ permalink raw reply
* Re: [PATCH net v5 3/3] ipv6: udp6: set dst cache for a connected sk if current not valid
From: Sasha Levin @ 2018-04-05 1:34 UTC (permalink / raw)
To: Sasha Levin, Alexey Kodanev, netdev@vger.kernel.org
Cc: Eric Dumazet, stable@vger.kernel.org
In-Reply-To: <1522677635-5364-4-git-send-email-alexey.kodanev@oracle.com>
Hi Alexey Kodanev.
[This is an automated email]
This commit has been processed because it contains a "Fixes:" tag.
fixing commit: 33c162a980fe ipv6: datagram: Update dst cache of a connected datagram sk during pmtu update.
The bot has tested the following trees: v4.15.15, v4.14.32, v4.9.92,
v4.15.15: Failed to apply! Possible dependencies:
22b12d2bf404: ("ipv6: allow to cache dst for a connected sk in ip6_sk_dst_lookup_flow()")
v4.14.32: Failed to apply! Possible dependencies:
22b12d2bf404: ("ipv6: allow to cache dst for a connected sk in ip6_sk_dst_lookup_flow()")
v4.9.92: Failed to apply! Possible dependencies:
22b12d2bf404: ("ipv6: allow to cache dst for a connected sk in ip6_sk_dst_lookup_flow()")
^ permalink raw reply
* Re: [PATCH net 2/2] net: bgmac: Fix endian access in bgmac_dma_tx_ring_free()
From: Sasha Levin @ 2018-04-05 1:33 UTC (permalink / raw)
To: Sasha Levin, Florian Fainelli, netdev@vger.kernel.org
Cc: Florian Fainelli, stable@vger.kernel.org
In-Reply-To: <20180401172630.12883-3-f.fainelli@gmail.com>
Hi Florian Fainelli.
[This is an automated email]
This commit has been processed because it contains a "Fixes:" tag.
fixing commit: 9cde94506eac bgmac: implement scatter/gather support.
The bot has also determined it's probably a bug fixing patch. (score: 45.9160)
The bot has tested the following trees: v4.15.15, v4.14.32, v4.9.92, v4.4.126,
v4.15.15: Build OK!
v4.14.32: Build OK!
v4.9.92: Build OK!
v4.4.126: Build OK!
^ permalink raw reply
* Re: [PATCH net 1/2] net: bgmac: Correctly annotate register space
From: Sasha Levin @ 2018-04-05 1:33 UTC (permalink / raw)
To: Sasha Levin, Florian Fainelli, netdev@vger.kernel.org
Cc: Florian Fainelli, stable@vger.kernel.org
In-Reply-To: <20180401172630.12883-2-f.fainelli@gmail.com>
Hi Florian Fainelli.
[This is an automated email]
This commit has been processed because it contains a "Fixes:" tag.
fixing commit: f6a95a24957a net: ethernet: bgmac: Add platform device support.
The bot has also determined it's probably a bug fixing patch. (score: 67.8237)
The bot has tested the following trees: v4.15.15, v4.14.32, v4.9.92,
v4.15.15: Build OK!
v4.14.32: Build OK!
v4.9.92: Failed to apply! Possible dependencies:
dd5c5d037f5e: ("net: ethernet: bgmac: add NS2 support")
1676aba5ef7e: ("net: ethernet: bgmac: device tree phy enablement")
^ permalink raw reply
* Re: [PATCH net] route: check sysctl_fib_multipath_use_neigh earlier than hash
From: Sasha Levin @ 2018-04-05 1:33 UTC (permalink / raw)
To: Sasha Levin, Xin Long, network dev
Cc: davem@davemloft.net, Julian Anastasov, stable@vger.kernel.org
In-Reply-To: <6265ae9495a93efac372d41001c5ebccbb916df7.1522593635.git.lucien.xin@gmail.com>
Hi Xin Long.
[This is an automated email]
This commit has been processed because it contains a "Fixes:" tag.
fixing commit: a6db4494d218 net: ipv4: Consider failed nexthops in multipath routes.
The bot has also determined it's probably a bug fixing patch. (score: 58.0140)
The bot has tested the following trees: v4.15.15, v4.14.32, v4.9.92,
v4.15.15: Build OK!
v4.14.32: Build OK!
v4.9.92: Build OK!
^ permalink raw reply
* Re: [PATCH v2 net-next 01/10] mlxsw: spectrum_acl: Fix flex actions header ifndef define construct
From: Sasha Levin @ 2018-04-05 1:33 UTC (permalink / raw)
To: Sasha Levin, Ido Schimmel, Jiri Pirko, netdev@vger.kernel.org
Cc: davem@davemloft.net, jiri@mellanox.com, petrm@mellanox.com,
stable@vger.kernel.org
In-Reply-To: <20180401143459.30770-2-idosch@mellanox.com>
Hi Ido Schimmel
Jiri Pirko.
[This is an automated email]
This commit has been processed by the -stable helper bot and determined
to be a high probability candidate for -stable trees. (score: 2.6781)
The bot has tested the following trees: v4.15.15, v4.14.32, v4.9.92, v4.4.126,
v4.15.15: Build OK!
v4.14.32: Failed to apply! Possible dependencies:
d3b939b8f9a5: ("mlxsw: spectrum: Move ACL flexible actions instance to spectrum")
e2b2d35a052d: ("mlxsw: spectrum: Change init order")
v4.9.92: Failed to apply! Possible dependencies:
d3b939b8f9a5: ("mlxsw: spectrum: Move ACL flexible actions instance to spectrum")
38ebc0f45474: ("mlxsw: spectrum_router: Add mlxsw_sp_ipip_ops")
ff7b0d27208b: ("mlxsw: spectrum: Add support for counter allocator")
7aa0f5aa9030: ("mlxsw: spectrum: Implement TC flower offload")
22a677661f56: ("mlxsw: spectrum: Introduce ACL core with simple TCAM implementation")
1d20d23c59c9: ("mlxsw: Move PCI id table definitions into driver modules")
62e86f9e8281: ("mlxsw: pci: Rename header with HW definitions")
98d0f7b9acda: ("mlxsw: spectrum: Add packet sample offloading support")
65acb5d0827c: ("mlxsw: spectrum: Make the add_matchall_tc_entry symmetric")
5724b8b56947: ("net/sched: tc_mirred: Rename public predicates 'is_tcf_mirred_redirect' and 'is_tcf_mirred_mirror'")
v4.4.126: Failed to apply! Possible dependencies:
d3b939b8f9a5: ("mlxsw: spectrum: Move ACL flexible actions instance to spectrum")
38ebc0f45474: ("mlxsw: spectrum_router: Add mlxsw_sp_ipip_ops")
ff7b0d27208b: ("mlxsw: spectrum: Add support for counter allocator")
7aa0f5aa9030: ("mlxsw: spectrum: Implement TC flower offload")
b090ef068645: ("mlxsw: Introduce simplistic KVD linear area manager")
464dce188487: ("mlxsw: spectrum_router: Add basic ipv4 router initialization")
f00817df2b42: ("mlxsw: spectrum: Introduce support for Data Center Bridging (DCB)")
90183b980d0a: ("mlxsw: spectrum: Initialize egress scheduling")
7f71eb46a485: ("mlxsw: spectrum: Split vFID range in two")
bd40e9d6d538: ("mlxsw: spectrum: Allocate active VLANs only for port netdevs")
0d65fc13042f: ("mlxsw: spectrum: Implement LAG port join/leave")
c4745500e988: ("mlxsw: Implement devlink interface")
89309da39f55: ("mlxsw: core: Implement temperature hwmon interface")
7f71eb46a485: ("mlxsw: spectrum: Split vFID range in two")
bd40e9d6d538: ("mlxsw: spectrum: Allocate active VLANs only for port netdevs")
0d65fc13042f: ("mlxsw: spectrum: Implement LAG port join/leave")
18f1e70c4137: ("mlxsw: spectrum: Introduce port splitting")
3e9b27b8fc8b: ("mlxsw: spectrum: Unmap local port from module during teardown")
bd40e9d6d538: ("mlxsw: spectrum: Allocate active VLANs only for port netdevs")
c4745500e988: ("mlxsw: Implement devlink interface")
Please let us know if you'd like to have this patch included in a stable tree.
^ permalink raw reply
* Re: [PATCH v2 net-next 06/10] mlxsw: core: Fix arg name of MLXSW_CORE_RES_VALID and MLXSW_CORE_RES_GET
From: Sasha Levin @ 2018-04-05 1:33 UTC (permalink / raw)
To: Sasha Levin, Ido Schimmel, Jiri Pirko, netdev@vger.kernel.org
Cc: davem@davemloft.net, jiri@mellanox.com, petrm@mellanox.com,
stable@vger.kernel.org
In-Reply-To: <20180401143459.30770-7-idosch@mellanox.com>
Hi Ido Schimmel
Jiri Pirko.
[This is an automated email]
This commit has been processed by the -stable helper bot and determined
to be a high probability candidate for -stable trees. (score: 1.5151)
The bot has tested the following trees: v4.15.15, v4.14.32, v4.9.92, v4.4.126,
v4.15.15: Build OK!
v4.14.32: Build OK!
v4.9.92: Failed to apply! Possible dependencies:
c1a3831121f6: ("mlxsw: Convert resources into array")
v4.4.126: Failed to apply! Possible dependencies:
403547d38d0b: ("mlxsw: profile: Add KVD resources to profile config")
489107bda1d1: ("mlxsw: Add KVD sizes configuration into profile")
57d316ba2017: ("mlxsw: pci: Add resources query implementation.")
8060646a0fd1: ("mlxsw: core: Add support for packets received from LAG port")
89309da39f55: ("mlxsw: core: Implement temperature hwmon interface")
89309da39f55: ("mlxsw: core: Implement temperature hwmon interface")
932762b69a28: ("mlxsw: Move devlink port registration into common core code")
8060646a0fd1: ("mlxsw: core: Add support for packets received from LAG port")
89309da39f55: ("mlxsw: core: Implement temperature hwmon interface")
90183b980d0a: ("mlxsw: spectrum: Initialize egress scheduling")
7f71eb46a485: ("mlxsw: spectrum: Split vFID range in two")
bd40e9d6d538: ("mlxsw: spectrum: Allocate active VLANs only for port netdevs")
0d65fc13042f: ("mlxsw: spectrum: Implement LAG port join/leave")
c4745500e988: ("mlxsw: Implement devlink interface")
89309da39f55: ("mlxsw: core: Implement temperature hwmon interface")
7f71eb46a485: ("mlxsw: spectrum: Split vFID range in two")
Please let us know if you'd like to have this patch included in a stable tree.
^ permalink raw reply
* [rtlwifi-btcoex] Suspicious code in halbtc8821a1ant driver
From: Gustavo A. R. Silva @ 2018-04-05 1:25 UTC (permalink / raw)
To: Yan-Hsuan Chuang, Ping-Ke Shih, Kalle Valo
Cc: linux-wireless-u79uwXL29TY76Z2rM5mHXA,
netdev-u79uwXL29TY76Z2rM5mHXA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA, Gustavo A. R. Silva
Hi all,
While doing some static analysis I came across the following piece of code at drivers/net/wireless/realtek/rtlwifi/btcoexist/halbtc8821a1ant.c:1581:
1581 static void btc8821a1ant_act_bt_sco_hid_only_busy(struct btc_coexist *btcoexist,
1582 u8 wifi_status)
1583 {
1584 /* tdma and coex table */
1585 btc8821a1ant_ps_tdma(btcoexist, NORMAL_EXEC, true, 5);
1586
1587 if (BT_8821A_1ANT_WIFI_STATUS_NON_CONNECTED_ASSO_AUTH_SCAN ==
1588 wifi_status)
1589 btc8821a1ant_coex_table_with_type(btcoexist, NORMAL_EXEC, 1);
1590 else
1591 btc8821a1ant_coex_table_with_type(btcoexist, NORMAL_EXEC, 1);
1592 }
The issue here is that the code for both branches of the if-else statement is identical.
The if-else was introduced a year ago in this commit c6821613e653
I wonder if an argument should be changed in any of the calls to btc8821a1ant_coex_table_with_type?
What do you think?
Thanks
--
Gustavo
^ permalink raw reply
* Re: [PATCH net] net: dsa: Discard frames from unused ports
From: Florian Fainelli @ 2018-04-05 0:49 UTC (permalink / raw)
To: Andrew Lunn, David Miller; +Cc: netdev, Vivien Didelot
In-Reply-To: <1522886204-1545-1-git-send-email-andrew@lunn.ch>
On 04/04/2018 04:56 PM, Andrew Lunn wrote:
> The Marvell switches under some conditions will pass a frame to the
> host with the port being the CPU port. Such frames are invalid, and
> should be dropped. Not dropping them can result in a crash when
> incrementing the receive statistics for an invalid port.
>
> Reported-by: Chris Healy <cphealy@gmail.com>
> Fixes: 5f6b4e14cada ("net: dsa: User per-cpu 64-bit statistics")
Are you sure this is the commit that introduced the problem?
> Signed-off-by: Andrew Lunn <andrew@lunn.ch>
> ---
> net/dsa/dsa_priv.h | 8 +++++++-
> 1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/net/dsa/dsa_priv.h b/net/dsa/dsa_priv.h
> index 70de7895e5b8..6c1bf3d9f652 100644
> --- a/net/dsa/dsa_priv.h
> +++ b/net/dsa/dsa_priv.h
> @@ -126,6 +126,7 @@ static inline struct net_device *dsa_master_find_slave(struct net_device *dev,
> struct dsa_port *cpu_dp = dev->dsa_ptr;
> struct dsa_switch_tree *dst = cpu_dp->dst;
> struct dsa_switch *ds;
> + struct dsa_port *slave_port;
>
> if (device < 0 || device >= DSA_MAX_SWITCHES)
> return NULL;
> @@ -137,7 +138,12 @@ static inline struct net_device *dsa_master_find_slave(struct net_device *dev,
> if (port < 0 || port >= ds->num_ports)
> return NULL;
>
> - return ds->ports[port].slave;
> + slave_port = &ds->ports[port];
> +
> + if (slave_port->type != DSA_PORT_TYPE_USER)
Can we optimize this with an unlikely()?
> + return NULL;
> +
> + return slave_port->slave;
> }
>
> /* port.c */
>
--
Florian
^ permalink raw reply
* Re: [PATCH v2 00/21] Allow compile-testing NO_DMA (drivers)
From: Rob Herring @ 2018-04-05 0:32 UTC (permalink / raw)
To: Geert Uytterhoeven
Cc: Ulf Hansson, Wolfram Sang, linux-iio-u79uwXL29TY76Z2rM5mHXA,
linux-fpga-u79uwXL29TY76Z2rM5mHXA,
open list:REMOTE PROCESSOR (REMOTEPROC) SUBSYSTEM, Linux-ALSA,
Bjorn Andersson, Eric Anholt, netdev,
linux-mtd-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, Linux I2C,
linux1394-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f,
Christoph Hellwig, Stefan Wahren, Boris Brezillon,
James E . J . Bottomley, Herbert Xu,
linux-scsi-u79uwXL29TY76Z2rM5mHXA, Richard Weinberger, Jassi Brar,
Marek Vasut, "open list:SERIAL DRIVERS" <linux-serial@
In-Reply-To: <1521208314-4783-1-git-send-email-geert-Td1EMuHUCqxL1ZNQvxDV9g@public.gmane.org>
On Fri, Mar 16, 2018 at 8:51 AM, Geert Uytterhoeven
<geert-Td1EMuHUCqxL1ZNQvxDV9g@public.gmane.org> wrote:
> Hi all,
>
> If NO_DMA=y, get_dma_ops() returns a reference to the non-existing
> symbol bad_dma_ops, thus causing a link failure if it is ever used.
>
> The intention of this is twofold:
> 1. To catch users of the DMA API on systems that do no support the DMA
> mapping API,
> 2. To avoid building drivers that cannot work on such systems anyway.
>
> However, the disadvantage is that we have to keep on adding dependencies
> on HAS_DMA all over the place.
>
> Thanks to the COMPILE_TEST symbol, lots of drivers now depend on one or
> more platform dependencies (that imply HAS_DMA) || COMPILE_TEST, thus
> already covering intention #2. Having to add an explicit dependency on
> HAS_DMA here is cumbersome, and hinders compile-testing.
The same can be said for CONFIG_IOMEM and CONFIG_OF. Any plans to
remove those too? CONFIG_IOMEM is mostly just a !CONFIG_UM option.
Rob
^ permalink raw reply
* [PATCH] brcm80211: brcmsmac: phy_lcn: remove duplicate code
From: Gustavo A. R. Silva @ 2018-04-05 0:09 UTC (permalink / raw)
To: Arend van Spriel, Franky Lin, Hante Meuleman, Chi-Hsien Lin,
Wright Feng, Kalle Valo
Cc: linux-wireless, brcm80211-dev-list.pdl, brcm80211-dev-list,
netdev, linux-kernel, Gustavo A. R. Silva
Remove and refactor some code in order to avoid having identical code
for different branches.
Notice that this piece of code hasn't been modified since 2011.
Addresses-Coverity-ID: 1226756 ("Identical code for different branches")
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
---
drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_lcn.c | 9 ++-------
1 file changed, 2 insertions(+), 7 deletions(-)
diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_lcn.c b/drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_lcn.c
index 93d4cde..9d830d2 100644
--- a/drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_lcn.c
+++ b/drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_lcn.c
@@ -3388,13 +3388,8 @@ void wlc_lcnphy_deaf_mode(struct brcms_phy *pi, bool mode)
u8 phybw40;
phybw40 = CHSPEC_IS40(pi->radio_chanspec);
- if (LCNREV_LT(pi->pubpi.phy_rev, 2)) {
- mod_phy_reg(pi, 0x4b0, (0x1 << 5), (mode) << 5);
- mod_phy_reg(pi, 0x4b1, (0x1 << 9), 0 << 9);
- } else {
- mod_phy_reg(pi, 0x4b0, (0x1 << 5), (mode) << 5);
- mod_phy_reg(pi, 0x4b1, (0x1 << 9), 0 << 9);
- }
+ mod_phy_reg(pi, 0x4b0, (0x1 << 5), (mode) << 5);
+ mod_phy_reg(pi, 0x4b1, (0x1 << 9), 0 << 9);
if (phybw40 == 0) {
mod_phy_reg((pi), 0x410,
--
2.7.4
^ permalink raw reply related
* Re: [RFC] net: bump the default number of RSS queues
From: Jakub Kicinski @ 2018-04-05 0:08 UTC (permalink / raw)
To: Eric Dumazet; +Cc: davem, netdev, brouer, alexander.duyck, oss-drivers
In-Reply-To: <3f6f40f3-cca2-0f43-8940-05003da88524@gmail.com>
On Tue, 3 Apr 2018 17:20:49 -0700, Eric Dumazet wrote:
> On 04/03/2018 05:14 PM, Jakub Kicinski wrote:
> > Some popular NIC vendors are not adhering to
> > netif_get_num_default_rss_queues() which leads to users being
> > surprised and filing bugs :) Bump the number of default RX
> > queues to something more reasonable for modern machines.
> >
> > Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> > ---
> > I'm mostly wondering what's the policy on this default? When
> > should it be applied? Why was 8 chosen as the default? We
> > can abandon using netif_get_num_default_rss_queues() for the
> > nfp but I wonder what's the correct course of action here...
> > Should new drivers use netif_get_num_default_rss_queues() for
> > example?
> >
> > include/linux/netdevice.h | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> > index 2a2d9cf50aa2..26fe145ada2a 100644
> > --- a/include/linux/netdevice.h
> > +++ b/include/linux/netdevice.h
> > @@ -3260,7 +3260,7 @@ static inline unsigned int get_netdev_rx_queue_index(
> > }
> > #endif
> >
> > -#define DEFAULT_MAX_NUM_RSS_QUEUES (8)
> > +#define DEFAULT_MAX_NUM_RSS_QUEUES (64)
> > int netif_get_num_default_rss_queues(void);
>
> There is no evidence having so many queues is beneficial.
>
> Too many queues -> lots of overhead in many cases.
>
> So I would rather not touch this, unless you can present good numbers ;)
Thank you for the comment! I don't have convincing number it was more
of a matter of consistency :)
Now I think I forgot about aRFS, when aRFS support for the nfp is added
we will probably start ignoring the default as well.
^ permalink raw reply
* Re: [PATCH v2 bpf-next 0/3] bpf/verifier: subprog/func_call simplifications
From: Edward Cree @ 2018-04-04 23:58 UTC (permalink / raw)
To: Alexei Starovoitov; +Cc: Daniel Borkmann, netdev
In-Reply-To: <20180403233718.rrzh6ds67hraxhax@ast-mbp>
On 04/04/18 00:37, Alexei Starovoitov wrote:
> hmm. that doesn't fail for me and any other bots didn't complain.
> Are you sure you're running the latest kernel and tests?
Ah, test_progs isn't actually rebuilding because __NR_bpf is undeclared;
something must be going wrong with header files.
Never mind.
> hmm. what's wrong with bsearch? It's trivial and fast.
bsearch is O(log n), and the sort() call on the subprog_starts (which happens
every time add_subprog() is called) is O(n log n).
Whereas reading aux->subprogno is O(1).
As far as I'm concerned, that's a sign that the latter data structure is the
appropriate one.
> Even if we don't see the solution today we have to work towards it.
I guess I'm just not confident "towards" is the direction you think it is.
> Compiler designers could have combined multiple of such passes into
> fewer ones, but it's not done, because it increases complexity and
> causes tough bugs where pass is partially complete.
I'm not trying to combine together multiple 'for bb in prog/for insn in bb'-
type passes. The combining I was doing was more on 'for all possible
execution paths'-type passes, because it's those that explode combinatorially.
Happily I think we can go a long way towards getting rid of them; but while I
think we can get down to only having 1, I don't think we can reach 0.
> The prime example where more than 4k instructions and loops are mandatory
> is user space stack analysis inside the program. Like walking python stack
> requires non-trival pointer chasing. With 'pragma unroll' the stack depth
> limit today is ~18. That's not really usable. Looping through 100 python
> frames would require about 16k bpf assembler instructions.
But this would be solved by having support for bounded loops, and I think I've
successfully shown that this is not inherently incompatible with a do_check()
style walk.
> Hence do_check approach must go. The rough idea is to compute per basic
> block a set of INs (registers and stack) that basic block needs
> to see to be safe and corresponding set of OUTs.
> Then propagate this knowledge across cfg edges.
> Once we have such set per bpf function, it will essentially answer the question
> 'what arguments this function needs to see to be safe and what it returns'
> To make bpf libraries scale we'd need to keep such information
> around after the verification, so dynamic linking and indirect calls
> are fast to verify.
> It's very high level obviously. There are many gotchas to resolve.
I agree that if we can do this it'll be ideal. But that's a big 'if'; my
example code was intended to demonstrate that the "set of INs bb/func needs to
see to be safe" can be an arbitrarily complicated disjunction, and that instead
of a combinatorially exploding number of paths to walk (the do_check() approach)
you now have combinatorially exploding IN-constraints to propagate backwards.
> Please do, since that's my concern with tsort.
> The verifier is the key piece of bpf infra and to be effective maintainers
> we need to thoroughly understand the verifier code.
> We cannot just take the patch based on the cover letter. The author may
> disappear tomorrow and what we're going to do with the code?
I entirely accept this argument. Unfortunately, when writing explanations,
it's difficult to know when one has reached something that will be
understood, so inevitably there will have to be a few iterations to get
there ;-)
^ permalink raw reply
* [PATCH net] net: dsa: Discard frames from unused ports
From: Andrew Lunn @ 2018-04-04 23:56 UTC (permalink / raw)
To: David Miller; +Cc: netdev, Florian Fainelli, Vivien Didelot, Andrew Lunn
The Marvell switches under some conditions will pass a frame to the
host with the port being the CPU port. Such frames are invalid, and
should be dropped. Not dropping them can result in a crash when
incrementing the receive statistics for an invalid port.
Reported-by: Chris Healy <cphealy@gmail.com>
Fixes: 5f6b4e14cada ("net: dsa: User per-cpu 64-bit statistics")
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
---
net/dsa/dsa_priv.h | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/net/dsa/dsa_priv.h b/net/dsa/dsa_priv.h
index 70de7895e5b8..6c1bf3d9f652 100644
--- a/net/dsa/dsa_priv.h
+++ b/net/dsa/dsa_priv.h
@@ -126,6 +126,7 @@ static inline struct net_device *dsa_master_find_slave(struct net_device *dev,
struct dsa_port *cpu_dp = dev->dsa_ptr;
struct dsa_switch_tree *dst = cpu_dp->dst;
struct dsa_switch *ds;
+ struct dsa_port *slave_port;
if (device < 0 || device >= DSA_MAX_SWITCHES)
return NULL;
@@ -137,7 +138,12 @@ static inline struct net_device *dsa_master_find_slave(struct net_device *dev,
if (port < 0 || port >= ds->num_ports)
return NULL;
- return ds->ports[port].slave;
+ slave_port = &ds->ports[port];
+
+ if (slave_port->type != DSA_PORT_TYPE_USER)
+ return NULL;
+
+ return slave_port->slave;
}
/* port.c */
--
2.16.3
^ permalink raw reply related
* Re: [PATCH iproute2 rdma: Ignore unknown netlink attributes
From: Stephen Hemminger @ 2018-04-04 23:43 UTC (permalink / raw)
To: Leon Romanovsky
Cc: Leon Romanovsky, netdev, RDMA mailing list, David Ahern,
Steve Wise
In-Reply-To: <20180403072842.32153-1-leon@kernel.org>
On Tue, 3 Apr 2018 10:28:42 +0300
Leon Romanovsky <leon@kernel.org> wrote:
> From: Leon Romanovsky <leonro@mellanox.com>
>
> The check if netlink attributes supplied more than maximum supported
> is to strict and may lead to backward compatibility issues with old
> application with a newer kernel that supports new attribute.
>
> CC: Steve Wise <swise@opengridcomputing.com>
> Fixes: 74bd75c2b68d ("rdma: Add basic infrastructure for RDMA tool")
> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Applied
^ permalink raw reply
* Re: [PATCH iproute2] ip/l2tp: remove offset and peer-offset options
From: Stephen Hemminger @ 2018-04-04 23:43 UTC (permalink / raw)
To: Guillaume Nault; +Cc: netdev, James Chapman
In-Reply-To: <46d360e852235de84e44c796fa934f9cfce988b1.1522769819.git.g.nault@alphalink.fr>
On Tue, 3 Apr 2018 17:39:54 +0200
Guillaume Nault <g.nault@alphalink.fr> wrote:
> Ignore options "peer-offset" and "offset" when creating sessions. Keep
> them when dumping sessions in order to avoid breaking external scripts.
>
> "peer-offset" has always been a noop in iproute2. "offset" is now
> ignored in Linux 4.16 (and was broken before that).
>
> Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Sure, this makes sense applied.
In theory, you could have just dropped them from the JSON output.
^ permalink raw reply
* Re: [PATCH iproute2-next] tc: Correct json output for actions
From: Stephen Hemminger @ 2018-04-04 23:42 UTC (permalink / raw)
To: Roman Mashak; +Cc: Yuval Mintz, dsahern, mlxsw, netdev
In-Reply-To: <858ta2brwv.fsf@mojatatu.com>
On Wed, 04 Apr 2018 17:09:04 -0400
Roman Mashak <mrv@mojatatu.com> wrote:
> Yuval Mintz <yuvalm@mellanox.com> writes:
>
> > Commit 9fd3f0b255d9 ("tc: enable json output for actions") added JSON
> > support for tc-actions at the expense of breaking other use cases that
> > reach tc_print_action(), as the latter don't expect the 'actions' array
> > to be a new object.
> >
> > Consider the following taken duringrun of tc_chain.sh selftest,
> > and see the latter command output is broken:
> >
> > $ ./tc/tc -j -p actions list action gact | grep -C 3 actions
> > [ {
> > "total acts": 1
> > },{
> > "actions": [ {
> > "order": 0,
> >
> > $ ./tc/tc -p -j -s filter show dev enp3s0np2 ingress | grep -C 3 actions
> > },
> > "skip_hw": true,
> > "not_in_hw": true,{
> > "actions": [ {
> > "order": 1,
> > "kind": "gact",
> > "control_action": {
> >
> > Relocate the open/close of the JSON object to declare the object only
> > for the case that needs it.
> >
> > Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
>
> [...]
>
>
> Good catch, thanks Yuval.
>
> Tested-by: Roman Mashak <mrv@mojatatu.com>
Applied
^ permalink raw reply
* Re: [PATCH net-next 09/11] devlink: convert occ_get op to separate registration
From: David Ahern @ 2018-04-04 23:00 UTC (permalink / raw)
To: Jakub Kicinski, Jiri Pirko
Cc: Ido Schimmel, netdev, davem, jiri, petrm, mlxsw
In-Reply-To: <20180404155910.7baf2ec9@cakuba.netronome.com>
On 4/4/18 4:59 PM, Jakub Kicinski wrote:
> On Wed, 4 Apr 2018 08:25:11 +0200, Jiri Pirko wrote:
>>> Jiri, I am not aware of any other API where a driver registers with it
>>> yet doesn't want the handler to be called so either waits to register
>>
>> Again, the thing is, this is kind of unusual because of the reload
>> thing.
>
> FWIW my knee jerk thought is that it's strange that devlink ops can
> be executed at all while reload is happening (incl. another reload
> request). I'm probably missing the real issue..
>
Just responding with the same question ...
Why are you not unregistering resources on a reload?
^ permalink raw reply
* Re: [PATCH net-next 09/11] devlink: convert occ_get op to separate registration
From: Jakub Kicinski @ 2018-04-04 22:59 UTC (permalink / raw)
To: Jiri Pirko; +Cc: David Ahern, Ido Schimmel, netdev, davem, jiri, petrm, mlxsw
In-Reply-To: <20180404062511.GO3313@nanopsycho>
On Wed, 4 Apr 2018 08:25:11 +0200, Jiri Pirko wrote:
> >Jiri, I am not aware of any other API where a driver registers with it
> >yet doesn't want the handler to be called so either waits to register
>
> Again, the thing is, this is kind of unusual because of the reload
> thing.
FWIW my knee jerk thought is that it's strange that devlink ops can
be executed at all while reload is happening (incl. another reload
request). I'm probably missing the real issue..
^ permalink raw reply
* Re: [PATCH net] netns: filter uevents correctly
From: Eric W. Biederman @ 2018-04-04 22:38 UTC (permalink / raw)
To: Christian Brauner
Cc: davem, gregkh, netdev, linux-kernel, avagin, ktkhai, serge
In-Reply-To: <20180404203048.GA21118@gmail.com>
Christian Brauner <christian.brauner@canonical.com> writes:
> On Wed, Apr 04, 2018 at 09:48:57PM +0200, Christian Brauner wrote:
>> commit 07e98962fa77 ("kobject: Send hotplug events in all network namespaces")
>>
>> enabled sending hotplug events into all network namespaces back in 2010.
>> Over time the set of uevents that get sent into all network namespaces has
>> shrunk. We have now reached the point where hotplug events for all devices
>> that carry a namespace tag are filtered according to that namespace.
>>
>> Specifically, they are filtered whenever the namespace tag of the kobject
>> does not match the namespace tag of the netlink socket. One example are
>> network devices. Uevents for network devices only show up in the network
>> namespaces these devices are moved to or created in.
>>
>> However, any uevent for a kobject that does not have a namespace tag
>> associated with it will not be filtered and we will *try* to broadcast it
>> into all network namespaces.
>>
>> The original patchset was written in 2010 before user namespaces were a
>> thing. With the introduction of user namespaces sending out uevents became
>> partially isolated as they were filtered by user namespaces:
>>
>> net/netlink/af_netlink.c:do_one_broadcast()
>>
>> if (!net_eq(sock_net(sk), p->net)) {
>> if (!(nlk->flags & NETLINK_F_LISTEN_ALL_NSID))
>> return;
>>
>> if (!peernet_has_id(sock_net(sk), p->net))
>> return;
>>
>> if (!file_ns_capable(sk->sk_socket->file, p->net->user_ns,
>> CAP_NET_BROADCAST))
>> j return;
>> }
>>
>> The file_ns_capable() check will check whether the caller had
>> CAP_NET_BROADCAST at the time of opening the netlink socket in the user
>> namespace of interest. This check is fine in general but seems insufficient
>> to me when paired with uevents. The reason is that devices always belong to
>> the initial user namespace so uevents for kobjects that do not carry a
>> namespace tag should never be sent into another user namespace. This has
>> been the intention all along. But there's one case where this breaks,
>> namely if a new user namespace is created by root on the host and an
>> identity mapping is established between root on the host and root in the
>> new user namespace. Here's a reproducer:
>>
>> sudo unshare -U --map-root
>> udevadm monitor -k
>> # Now change to initial user namespace and e.g. do
>> modprobe kvm
>> # or
>> rmmod kvm
>>
>> will allow the non-initial user namespace to retrieve all uevents from the
>> host. This seems very anecdotal given that in the general case user
>> namespaces do not see any uevents and also can't really do anything useful
>> with them.
>>
>> Additionally, it is now possible to send uevents from userspace. As such we
>> can let a sufficiently privileged (CAP_SYS_ADMIN in the owning user
>> namespace of the network namespace of the netlink socket) userspace process
>> make a decision what uevents should be sent.
>>
>> This makes me think that we should simply ensure that uevents for kobjects
>> that do not carry a namespace tag are *always* filtered by user namespace
>> in kobj_bcast_filter(). Specifically:
>> - If the owning user namespace of the uevent socket is not init_user_ns the
>> event will always be filtered.
>> - If the network namespace the uevent socket belongs to was created in the
>> initial user namespace but was opened from a non-initial user namespace
>> the event will be filtered as well.
>> Put another way, uevents for kobjects not carrying a namespace tag are now
>> always only sent to the initial user namespace. The regression potential
>> for this is near to non-existent since user namespaces can't really do
>> anything with interesting devices.
>>
>> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
>
> That was supposed to be [PATCH net] not [PATCH net-next] which is
> obviously closed. Sorry about that.
This does not appear to be a fix.
This looks like feature work.
The motivation appears to be that looks wrong let's change it.
So let's please leave this for when net-next opens again so we can
have time to fully consider a change in semantics.
Thank you,
Eric
^ permalink raw reply
* RE: [RFC PATCH] packet: mark ring entry as in-use inside spin_lock to prevent RX ring overrun
From: Jon Rosen (jrosen) @ 2018-04-04 22:31 UTC (permalink / raw)
To: Willem de Bruijn
Cc: David S. Miller, Willem de Bruijn, Eric Dumazet, Kees Cook,
David Windsor, Rosen, Rami, Reshetova, Elena, Mike Maloney,
Benjamin Poirier, open list:NETWORKING [GENERAL], open list
In-Reply-To: <CAF=yD-LP7RjenkfbFq+dG9XFZx2D7-JwqOeVD+1tLjgXxh8SpA@mail.gmail.com>
> >> > One issue with the above proposed change to use TP_STATUS_IN_PROGRESS
> >> > is that the documentation of the tp_status field is somewhat
> >> > inconsistent. In some places it's described as TP_STATUS_KERNEL(0)
> >> > meaning the entry is owned by the kernel and !TP_STATUS_KERNEL(0)
> >> > meaning the entry is owned by user space. In other places ownership
> >> > by user space is defined by the TP_STATUS_USER(1) bit being set.
> >>
> >> But indeed this example in packet_mmap.txt is problematic
> >>
> >> if (status == TP_STATUS_KERNEL)
> >> retval = poll(&pfd, 1, timeout);
> >>
> >> It does not really matter whether the docs are possibly inconsistent and
> >> which one is authoritative. Examples like the above make it likely that
> >> some user code expects such code to work.
> >
> > Yes, that's exactly my concern. Yet another troubling example seems to be
> > lipbcap which also is looking specifically for status to be anything other than
> > TP_STATUS_KERNEL(0) to indicate a frame is available in user space.
>
> Good catch. If pcap-linux.c relies on this then the status field
> cannot be changed. Other fields can be modified freely while tp_status
> remains 0, perhaps that's an option.
Possibly. Someone else suggested something similar but in at least the
one example we thought through it still seemed like it didn't address the problem.
For example, let's say we used tp_len == -1 to indicate to other kernel threads
that the entry was already in progress. This would require that user space never
set tp_len = -1 before returning the entry back to the kernel. If it did then no
kernel thread would ever claim ownership and the ring would hang.
Now, it seems pretty unlikely that user space would do such a thing so maybe we
could look past that, but then we run into the issue that there is still a window
of opportunity for other kernel threads to come in and wrap the ring.
The reason is we can't set tp_len to the correct length after setting tp_status because
user space could grab the entry and see tp_len == -1 so we have to set tp_len
before we set tp_status. This means that there is still a window where other
kernel threads could come in and see tp_len as something other than -1 and
a tp_status of TP_STATUS_KERNEL and think it's ok to allocate the entry.
This puts us back to where we are today (arguably with a smaller window,
but a window none the less).
Alternatively we could reacquire the spin_lock to then set tp_len followed by
tp_status. This would give the necessary indivisibility in the kernel while
preserving proper order as made visible to user space, but it comes at the cost
of another spin_lock.
Thanks for the suggestion. If you can think of a way around this I'm all ears.
I'll think on this some more but so far I'm stuck on how to get past having to
broaden the scope of the spin_lock, reacquire the spin_lock, or use some sort
of atomic construct along with a parallel shadow ring structure (still thinking
through that one as well).
^ permalink raw reply
* Re: [RFC PATCH] packet: mark ring entry as in-use inside spin_lock to prevent RX ring overrun
From: Willem de Bruijn @ 2018-04-04 21:44 UTC (permalink / raw)
To: Jon Rosen (jrosen)
Cc: David S. Miller, Willem de Bruijn, Eric Dumazet, Kees Cook,
David Windsor, Rosen, Rami, Reshetova, Elena, Mike Maloney,
Benjamin Poirier, open list:NETWORKING [GENERAL], open list
In-Reply-To: <057239cdf1c34bc69636009c9e099214@XCH-RTP-016.cisco.com>
>> > One issue with the above proposed change to use TP_STATUS_IN_PROGRESS
>> > is that the documentation of the tp_status field is somewhat
>> > inconsistent. In some places it's described as TP_STATUS_KERNEL(0)
>> > meaning the entry is owned by the kernel and !TP_STATUS_KERNEL(0)
>> > meaning the entry is owned by user space. In other places ownership
>> > by user space is defined by the TP_STATUS_USER(1) bit being set.
>>
>> But indeed this example in packet_mmap.txt is problematic
>>
>> if (status == TP_STATUS_KERNEL)
>> retval = poll(&pfd, 1, timeout);
>>
>> It does not really matter whether the docs are possibly inconsistent and
>> which one is authoritative. Examples like the above make it likely that
>> some user code expects such code to work.
>
> Yes, that's exactly my concern. Yet another troubling example seems to be
> lipbcap which also is looking specifically for status to be anything other than
> TP_STATUS_KERNEL(0) to indicate a frame is available in user space.
Good catch. If pcap-linux.c relies on this then the status field
cannot be changed. Other fields can be modified freely while tp_status
remains 0, perhaps that's an option.
^ permalink raw reply
* Re: [PATCH iproute2-next] tc: Correct json output for actions
From: Roman Mashak @ 2018-04-04 21:09 UTC (permalink / raw)
To: Yuval Mintz; +Cc: dsahern, mlxsw, netdev
In-Reply-To: <1522844653-37136-1-git-send-email-yuvalm@mellanox.com>
Yuval Mintz <yuvalm@mellanox.com> writes:
> Commit 9fd3f0b255d9 ("tc: enable json output for actions") added JSON
> support for tc-actions at the expense of breaking other use cases that
> reach tc_print_action(), as the latter don't expect the 'actions' array
> to be a new object.
>
> Consider the following taken duringrun of tc_chain.sh selftest,
> and see the latter command output is broken:
>
> $ ./tc/tc -j -p actions list action gact | grep -C 3 actions
> [ {
> "total acts": 1
> },{
> "actions": [ {
> "order": 0,
>
> $ ./tc/tc -p -j -s filter show dev enp3s0np2 ingress | grep -C 3 actions
> },
> "skip_hw": true,
> "not_in_hw": true,{
> "actions": [ {
> "order": 1,
> "kind": "gact",
> "control_action": {
>
> Relocate the open/close of the JSON object to declare the object only
> for the case that needs it.
>
> Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
[...]
Good catch, thanks Yuval.
Tested-by: Roman Mashak <mrv@mojatatu.com>
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox