* Re: [RFC PATCH net-next 03/11] spi: Add a PTP system timestamp to the transfer structure
From: Mark Brown @ 2019-08-20 12:55 UTC (permalink / raw)
To: Vladimir Oltean
Cc: Hubert Feurstein, mlichvar, Richard Cochran, Andrew Lunn,
Florian Fainelli, linux-spi, netdev
In-Reply-To: <CA+h21hrZbun_j+oABJFP+P+V3zHP2x0mAhv-1ocF38miCvZHew@mail.gmail.com>
[-- Attachment #1: Type: text/plain, Size: 1860 bytes --]
On Fri, Aug 16, 2019 at 05:05:53PM +0300, Vladimir Oltean wrote:
> I'm not sure how to respond to this, because I don't know anything
> about the timing of DMA transfers.
> Maybe snapshotting DMA transfers the same way is not possible (if at
> all). Maybe they are not exactly adequate for this sort of application
> anyway. Maybe it depends.
DMA transfers generally proceed without any involvement from the CPU,
this is broadly the point of DMA. You *may* be able to split into
multiple transactions but it's not reliable that you'd be able to do so
on byte boundaries and there will be latency getting notified of
completions.
> In other words, from a purely performance perspective, I am against
> limiting the API to just snapshotting the first and last byte. At this
> level of "zoom", if I change the offset of the byte to anything other
> than 3, the synchronization offset refuses to converge towards zero,
> because the snapshot is incurring a constant offset that the servo
> loop from userspace (phc2sys) can't compensate for.
> Maybe the SPI master driver should just report what sort of
> snapshotting capability it can offer, ranging from none (default
> unless otherwise specified), to transfer-level (DMA style) or
> byte-level.
That does then have the consequence that the majority of controllers
aren't going to be usable with the API which isn't great.
> I'm afraid more actual experimentation is needed with DMA-based
> controllers to understand what can be expected from them, and as a
> result, how the API should map around them.
> MDIO bus controllers are in a similar situation (with Hubert's patch)
> but at least there the frame size is fixed and I haven't heard of an
> MDIO controller to use DMA.
I'm not 100% clear what the problem you're trying to solve is, or if
it's a sensible problem to try to solve for that matter.
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
^ permalink raw reply
* Re: [PATCH net-next v2] net: openvswitch: Set OvS recirc_id from tc chain index
From: Paul Blakey @ 2019-08-20 12:56 UTC (permalink / raw)
To: Pravin B Shelar, netdev@vger.kernel.org, David S. Miller,
Justin Pettit, Simon Horman, Marcelo Ricardo Leitner, Vlad Buslov
Cc: Jiri Pirko, Roi Dayan, Yossi Kuperman, Rony Efraim, Oz Shlomo
In-Reply-To: <1566304834-22836-1-git-send-email-paulb@mellanox.com>
Hey guys, sorry for spam, I used the --in-reply-to this time so it gets
to the original thread ("[PATCH net-next v2] net: openvswitch: Set OvS
recirc_id from tc chain index") ,
Ignore this thread and respond there if needed.
Thanks.
On 8/20/2019 3:40 PM, Paul Blakey wrote:
> Regarding the user_features change, I tested the above patch with this one in
> userspace that I'll send once this is accepted, togother with the rest
> of connection tracking offload patches.
>
> I also have a test for it, if anyone wants it.
>
> Patch is:
> lib/netdev-offloads-tc: Probe recirc tc sharing feature on first recirc_id rule
> [...]
^ permalink raw reply
* Re: [EXT] Re: [PATCH net-next 0/1] net: fec: add C45 MDIO read/write support
From: Andrew Lunn @ 2019-08-20 13:04 UTC (permalink / raw)
To: Andy Duan
Cc: Marco Hartmann, davem@davemloft.net, netdev@vger.kernel.org,
linux-kernel@vger.kernel.org, Christian Herber
In-Reply-To: <VI1PR0402MB360079EAAE7042048B2F5AC8FFAB0@VI1PR0402MB3600.eurprd04.prod.outlook.com>
On Tue, Aug 20, 2019 at 02:32:26AM +0000, Andy Duan wrote:
> From: Andrew Lunn <andrew@lunn.ch>
> > On Mon, Aug 19, 2019 at 05:11:14PM +0000, Marco Hartmann wrote:
> > > As of yet, the Fast Ethernet Controller (FEC) driver only supports
> > > Clause 22 conform MDIO transactions. IEEE 802.3ae Clause 45 defines a
> > > modified MDIO protocol that uses a two staged access model in order to
> > > increase the address space.
> > >
> > > This patch adds support for Clause 45 conform MDIO read and write
> > > operations to the FEC driver.
> >
> > Hi Marco
> >
> > Do all versions of the FEC hardware support C45? Or do we need to make use
> > of the quirk support in this driver to just enable it for some revisions of FEC?
> >
> > Thanks
> > Andrew
>
> i.MX legacy platforms like i.MX6/7 series, they doesn't support Write & Read Increment.
> But for i.MX8MQ/MM series, it support C45 full features like Write & Read Increment.
>
> For the patch itself, it doesn't support Write & Read Increment, so I think the patch doesn't
> need to add quirk support.
Hi Andy
So what happens with something older than a i.MX8MQ/MM when a C45
transfer is attempted? This patch adds a new write. Does that write
immediately trigger a completion interrupt? Does it never trigger an
interrupt, and we have to wait FEC_MII_TIMEOUT?
Ideally, if the hardware does not support C45, we want it to return
EOPNOTSUPP.
Thanks
Andrew
^ permalink raw reply
* Re: [PATCH net-next v3 2/4] net: mdio: add PTP offset compensation to mdiobus_write_sts
From: Andrew Lunn @ 2019-08-20 13:26 UTC (permalink / raw)
To: Miroslav Lichvar
Cc: Hubert Feurstein, netdev, linux-kernel, Richard Cochran,
Florian Fainelli, Heiner Kallweit, Vladimir Oltean,
David S. Miller
In-Reply-To: <20190820094903.GI891@localhost>
On Tue, Aug 20, 2019 at 11:49:03AM +0200, Miroslav Lichvar wrote:
> On Tue, Aug 20, 2019 at 10:48:31AM +0200, Hubert Feurstein wrote:
>
> > + /* PTP offset compensation:
> > + * After the MDIO access is completed (from the chip perspective), the
> > + * switch chip will snapshot the PHC timestamp. To make sure our system
> > + * timestamp corresponds to the PHC timestamp, we have to add the
> > + * duration of this MDIO access to sts->post_ts. Linuxptp's phc2sys
> > + * takes the average of pre_ts and post_ts to calculate the final
> > + * system timestamp. With this in mind, we have to add ptp_sts_offset
> > + * twice to post_ts, in order to not introduce an constant time offset.
> > + */
> > + if (sts)
> > + timespec64_add_ns(&sts->post_ts, 2 * bus->ptp_sts_offset);
>
> This correction looks good to me.
>
> Is the MDIO write delay constant in reality, or does it at least have
> an upper bound? That is, is it always true that the post_ts timestamp
> does not point to a time before the PHC timestamp was actually taken?
The post_ts could be before the target hardware does anything. The
write triggers an MDIO bus transaction, sending about 64 bits of data
down a wire at around 2.5Mbps. So there is a minimum delay of 25uS
just sending the bits down the wire. It is unclear to me exactly when
the post_ts is taken, has the hardware actually sent the bits, or has
it just initiated sending the bits? In this case, there is an
interrupt sometime later indicating the transaction has completed, so
my guess would be, post_ts indicates the transaction has been
initiated.
Also, how long does the device on the end of the bus actually take to
decode the bits on the wire and do what it needs to do?
> This is important to not break the estimation of maximum error in the
> measured offset. Applications using the ioctl may assume that the
> maximum error is (post_ts-pre_ts)/2 (i.e. half of the delay printed by
> phc2sys). That would not work if the delay could be occasionally 50
> microseconds for instance, i.e. the post_ts timestamp would be earlier
> than the PHC timestamp.
Given my understanding of the hardware, post_ts-pre_ts should be
constant. But what it exactly measures is not clearly defined :-(
And different hardware will have different definitions.
But the real point is, by doing these timestamps here, we are as close
as possible to where the action really happens, and we are minimising
the number of undefined things we are measuring, so in general, the
error is minimised.
Andrew
^ permalink raw reply
* Re: [EXT] Re: [PATCH net-next 0/1] Add BASE-T1 PHY support
From: Christian Herber @ 2019-08-20 13:36 UTC (permalink / raw)
To: Heiner Kallweit, davem@davemloft.net
Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <13e65051-fe4f-5964-30b3-75285e6d2eee@gmail.com>
On 19.08.2019 21:07, Heiner Kallweit wrote:
> Caution: EXT Email
>
> On 19.08.2019 08:32, Christian Herber wrote:
>> On 16.08.2019 22:59, Heiner Kallweit wrote:
>>> On 15.08.2019 17:32, Christian Herber wrote:
>>>> This patch adds basic support for BASE-T1 PHYs in the framework.
>>>> BASE-T1 PHYs main area of application are automotive and industrial.
>>>> BASE-T1 is standardized in IEEE 802.3, namely
>>>> - IEEE 802.3bw: 100BASE-T1
>>>> - IEEE 802.3bp 1000BASE-T1
>>>> - IEEE 802.3cg: 10BASE-T1L and 10BASE-T1S
>>>>
>>>> There are no products which contain BASE-T1 and consumer type PHYs like
>>>> 1000BASE-T. However, devices exist which combine 100BASE-T1 and 1000BASE-T1
>>>> PHYs with auto-negotiation.
>>>
>>> Is this meant in a way that *currently* there are no PHY's combining Base-T1
>>> with normal Base-T modes? Or are there reasons why this isn't possible in
>>> general? I'm asking because we have PHY's combining copper and fiber, and e.g.
>>> the mentioned Aquantia PHY that combines NBase-T with 1000Base-T2.
>>>
>>>>
>>>> The intention of this patch is to make use of the existing Clause 45 functions.
>>>> BASE-T1 adds some additional registers e.g. for aneg control, which follow a
>>>> similiar register layout as the existing devices. The bits which are used in
>>>> BASE-T1 specific registers are the same as in basic registers, thus the
>>>> existing functions can be resued, with get_aneg_ctrl() selecting the correct
>>>> register address.
>>>>
>>> If Base-T1 can't be combined with other modes then at a first glance I see no
>>> benefit in defining new registers e.g. for aneg control, and the standard ones
>>> are unused. Why not using the standard registers? Can you shed some light on that?
>>>
>>> Are the new registers internally shadowed to the standard location?
>>> That's something I've seen on other PHY's: one register appears in different
>>> places in different devices.
>>>
>>>> The current version of ethtool has been prepared for 100/1000BASE-T1 and works
>>>> with this patch. 10BASE-T1 needs to be added to ethtool.
>>>>
>>>> Christian Herber (1):
>>>> Added BASE-T1 PHY support to PHY Subsystem
>>>>
>>>> drivers/net/phy/phy-c45.c | 113 +++++++++++++++++++++++++++++++----
>>>> drivers/net/phy/phy-core.c | 4 +-
>>>> include/uapi/linux/ethtool.h | 2 +
>>>> include/uapi/linux/mdio.h | 21 +++++++
>>>> 4 files changed, 129 insertions(+), 11 deletions(-)
>>>>
>>>
>>> Heiner
>>>
>>
>> Hi Heiner,
>>
>> I do not think the Aquantia part you are describing is publicly
>> documented, so i cannot comment on that part.
> Right, datasheet isn't publicly available. All I wanted to say with
> mentioning this PHY: It's not a rare exception that a PHY combines
> standard BaseT modes with "non-consumer" modes for special purposes.
> One practical use case of this proprietary 1000Base-T2 mode is
> re-using existing 2-pair cabling in aircrafts.
>
>> There are multiple reasons why e.g. xBASE-T1 plus 1000BASE-T is
>> unlikely. First, the is no use-case known to me, where this would be
>> required. Second, there is no way that you can do an auto-negotiation
>> between the two, as these both have their own auto-neg defined (Clause
>> 28/73 vs. Clause 98). Thirdly, if you would ever have a product with
>> both, I believe it would just include two full PHYs and a way to select
>> which flavor you want. Of course, this is the theory until proven
>> otherwise, but to me it is sufficient to use a single driver.
>>
> I'm with you if you say it's unlikely. However your statement in the
> commit message leaves the impression that there can't be such a device.
> And that's a difference.
>
> Regarding "including two full PHYs":
> This case we have already, there are PHYs combining different IP blocks,
> each one supporting a specific mode (e.g. copper and fiber). There you
> also have the case of different autoneg methods, clause 28 vs. clause 37.
>
>> The registers are different in the fields they include. It is just that
>> the flags which are used by the Linux driver, like restarting auto-neg,
>> are at the same position.
>>
> Good to know. Your commit description doesn't mention any specific PHY.
> I suppose you have PHYs you'd like to operate with the genphy_c45 driver.
> Could you give an example? And ideally, is a public datasheet available?
>
>> Christian
>>
>>
> Heiner
>
There are no public BASE-T1 devices on the market right now that use
Clause 45 standard registers. The first such products were developed
before the IEEE standard (BroadR-Reach) and used Clause 22 access (see
e.g. the support in the Kernel for TJA110x).
The most convenient way to test with a BASE-T1 device would be to use an
SFP (e.g.
https://technica-engineering.de/produkt/1000base-t1-sfp-module/).
Alternative source could be Goepel.
There are also a number of media-converters around where one could break
out the MDIO and connect to a processor. Of course, in all cases it
should be made sure that this is a Clause-45 device.
As all relevant parts are NDA-restricted, this is pretty much all the
information I can share.
Christian
^ permalink raw reply
* Re: [PATCH bpf-next] bpf: add BTF ids in procfs for file descriptors to BTF objects
From: Daniel Borkmann @ 2019-08-20 13:36 UTC (permalink / raw)
To: Quentin Monnet, Alexei Starovoitov; +Cc: bpf, netdev, oss-drivers
In-Reply-To: <20190820095233.17097-1-quentin.monnet@netronome.com>
On 8/20/19 11:52 AM, Quentin Monnet wrote:
> Implement the show_fdinfo hook for BTF FDs file operations, and make it
> print the id and the size of the BTF object. This allows for a quick
> retrieval of the BTF id from its FD; or it can help understanding what
> type of object (BTF) the file descriptor points to.
>
> Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> ---
> kernel/bpf/btf.c | 16 ++++++++++++++++
> 1 file changed, 16 insertions(+)
>
> diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
> index 5fcc7a17eb5a..39e184f1b27c 100644
> --- a/kernel/bpf/btf.c
> +++ b/kernel/bpf/btf.c
> @@ -3376,6 +3376,19 @@ void btf_type_seq_show(const struct btf *btf, u32 type_id, void *obj,
> btf_type_ops(t)->seq_show(btf, t, type_id, obj, 0, m);
> }
>
> +#ifdef CONFIG_PROC_FS
> +static void bpf_btf_show_fdinfo(struct seq_file *m, struct file *filp)
> +{
> + const struct btf *btf = filp->private_data;
> +
> + seq_printf(m,
> + "btf_id:\t%u\n"
> + "data_size:\t%u\n",
> + btf->id,
> + btf->data_size);
Looks good, exposing btf_id makes sense to me in order to correlate with applications.
Do you have a concrete use case for data_size to expose it this way as opposed to fetch
it via btf_get_info_by_fd()? If not, I'd say lets only add btf_id in there.
> +}
> +#endif
> +
> static int btf_release(struct inode *inode, struct file *filp)
> {
> btf_put(filp->private_data);
> @@ -3383,6 +3396,9 @@ static int btf_release(struct inode *inode, struct file *filp)
> }
>
> const struct file_operations btf_fops = {
> +#ifdef CONFIG_PROC_FS
> + .show_fdinfo = bpf_btf_show_fdinfo,
> +#endif
> .release = btf_release,
> };
>
>
Thanks,
Daniel
^ permalink raw reply
* [PATCH] bonding: force enable lacp port after link state recovery for 802.3ad
From: zhangsha.zhang @ 2019-08-20 13:38 UTC (permalink / raw)
To: j.vosburgh, vfalico, andy, davem
Cc: netdev, linux-kernel, zhangsha.zhang, yuehaibing, hunongda,
alex.chen
From: Sha Zhang <zhangsha.zhang@huawei.com>
After the commit 334031219a84 ("bonding/802.3ad: fix slave link
initialization transition states") merged,
the slave's link status will be changed to BOND_LINK_FAIL
from BOND_LINK_DOWN in the following scenario:
- Driver reports loss of carrier and
bonding driver receives NETDEV_CHANGE notifier
- slave's duplex and speed is zerod and
its port->is_enabled is cleard to 'false';
- Driver reports link recovery and
bonding driver receives NETDEV_UP notifier;
- If speed/duplex getting failed here, the link status
will be changed to BOND_LINK_FAIL;
- The MII monotor later recover the slave's speed/duplex
and set link status to BOND_LINK_UP, but remains
the 'port->is_enabled' to 'false'.
In this scenario, the lacp port will not be enabled even its speed
and duplex are valid. The bond will not send LACPDU's, and its
state is 'AD_STATE_DEFAULTED' forever. The simplest fix I think
is to force enable lacp after port slave speed check in
bond_miimon_commit. As enabled, the lacp port can run its state machine
normally after link recovery.
Signed-off-by: Sha Zhang <zhangsha.zhang@huawei.com>
---
drivers/net/bonding/bond_main.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 931d9d9..379253a 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -2194,6 +2194,7 @@ static void bond_miimon_commit(struct bonding *bond)
{
struct list_head *iter;
struct slave *slave, *primary;
+ struct port *port;
bond_for_each_slave(bond, slave, iter) {
switch (slave->new_link) {
@@ -2205,8 +2206,13 @@ static void bond_miimon_commit(struct bonding *bond)
* link status
*/
if (BOND_MODE(bond) == BOND_MODE_8023AD &&
- slave->link == BOND_LINK_UP)
+ slave->link == BOND_LINK_UP) {
bond_3ad_adapter_speed_duplex_changed(slave);
+ if (slave->duplex == DUPLEX_FULL) {
+ port = &(SLAVE_AD_INFO(slave)->port);
+ port->is_enabled = true;
+ }
+ }
continue;
case BOND_LINK_UP:
--
1.8.3.1
^ permalink raw reply related
* [PATCH] selftests: net: add missing NFT_FWD_NETDEV to config
From: Anders Roxell @ 2019-08-20 13:41 UTC (permalink / raw)
To: davem, shuah; +Cc: netdev, linux-kselftest, linux-kernel, Anders Roxell
When running xfrm_policy.sh we see the following
# sysctl cannot stat /proc/sys/net/ipv4/conf/eth1/forwarding No such file or directory
cannot: stat_/proc/sys/net/ipv4/conf/eth1/forwarding #
# sysctl cannot stat /proc/sys/net/ipv4/conf/veth0/forwarding No such file or directory
cannot: stat_/proc/sys/net/ipv4/conf/veth0/forwarding #
# sysctl cannot stat /proc/sys/net/ipv4/conf/eth1/forwarding No such file or directory
cannot: stat_/proc/sys/net/ipv4/conf/eth1/forwarding #
# sysctl cannot stat /proc/sys/net/ipv4/conf/veth0/forwarding No such file or directory
cannot: stat_/proc/sys/net/ipv4/conf/veth0/forwarding #
# sysctl cannot stat /proc/sys/net/ipv6/conf/eth1/forwarding No such file or directory
cannot: stat_/proc/sys/net/ipv6/conf/eth1/forwarding #
# sysctl cannot stat /proc/sys/net/ipv6/conf/veth0/forwarding No such file or directory
cannot: stat_/proc/sys/net/ipv6/conf/veth0/forwarding #
# sysctl cannot stat /proc/sys/net/ipv6/conf/eth1/forwarding No such file or directory
cannot: stat_/proc/sys/net/ipv6/conf/eth1/forwarding #
# sysctl cannot stat /proc/sys/net/ipv6/conf/veth0/forwarding No such file or directory
cannot: stat_/proc/sys/net/ipv6/conf/veth0/forwarding #
# modprobe FATAL Module ip_tables not found in directory /lib/modules/5.3.0-rc5-next-20190820+
FATAL: Module_ip_tables #
# iptables v1.6.2 can't initialize iptables table `filter' Table does not exist (do you need to insmod?)
v1.6.2: can't_initialize #
Rework to enable CONFIG_NF_TABLES_NETDEV and CONFIG_NFT_FWD_NETDEV.
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
---
tools/testing/selftests/net/config | 2 ++
1 file changed, 2 insertions(+)
diff --git a/tools/testing/selftests/net/config b/tools/testing/selftests/net/config
index b8503a8119b0..e30b0ae5d474 100644
--- a/tools/testing/selftests/net/config
+++ b/tools/testing/selftests/net/config
@@ -29,3 +29,5 @@ CONFIG_NET_SCH_FQ=m
CONFIG_NET_SCH_ETF=m
CONFIG_TEST_BLACKHOLE_DEV=m
CONFIG_KALLSYMS=y
+CONFIG_NF_TABLES_NETDEV=y
+CONFIG_NFT_FWD_NETDEV=m
--
2.20.1
^ permalink raw reply related
* [PATCH] selftests: bpf: install files test_xdp_vlan.sh
From: Anders Roxell @ 2019-08-20 13:41 UTC (permalink / raw)
To: shuah, ast, daniel, davem, jakub.kicinski, hawk, john.fastabend
Cc: linux-kselftest, netdev, bpf, linux-kernel, Anders Roxell
When ./test_xdp_vlan_mode_generic.sh runs it complains that it can't
find file test_xdp_vlan.sh.
# selftests: bpf: test_xdp_vlan_mode_generic.sh
# ./test_xdp_vlan_mode_generic.sh: line 9: ./test_xdp_vlan.sh: No such
file or directory
Rework so that test_xdp_vlan.sh gets installed, added to the variable
TEST_PROGS_EXTENDED.
Fixes: d35661fcf95d ("selftests/bpf: add wrapper scripts for test_xdp_vlan.sh")
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
---
tools/testing/selftests/bpf/Makefile | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index 1faad0c3c3c9..d7968e20463c 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -68,7 +68,8 @@ TEST_PROGS := test_kmod.sh \
TEST_PROGS_EXTENDED := with_addr.sh \
with_tunnels.sh \
tcp_client.py \
- tcp_server.py
+ tcp_server.py \
+ test_xdp_vlan.sh
# Compile but not part of 'make run_tests'
TEST_GEN_PROGS_EXTENDED = test_libbpf_open test_sock_addr test_skb_cgroup_id_user \
--
2.20.1
^ permalink raw reply related
* [PATCH] selftests: bpf: add config fragment BPF_JIT
From: Anders Roxell @ 2019-08-20 13:41 UTC (permalink / raw)
To: shuah, ast, daniel
Cc: linux-kselftest, netdev, bpf, linux-kernel, Anders Roxell
When running test_kmod.sh the following shows up
# sysctl cannot stat /proc/sys/net/core/bpf_jit_enable No such file or directory
cannot: stat_/proc/sys/net/core/bpf_jit_enable #
# sysctl cannot stat /proc/sys/net/core/bpf_jit_harden No such file or directory
cannot: stat_/proc/sys/net/core/bpf_jit_harden #
Rework to enable CONFIG_BPF_JIT to solve "No such file or directory"
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
---
tools/testing/selftests/bpf/config | 1 +
1 file changed, 1 insertion(+)
diff --git a/tools/testing/selftests/bpf/config b/tools/testing/selftests/bpf/config
index f7a0744db31e..5dc109f4c097 100644
--- a/tools/testing/selftests/bpf/config
+++ b/tools/testing/selftests/bpf/config
@@ -34,3 +34,4 @@ CONFIG_NET_MPLS_GSO=m
CONFIG_MPLS_ROUTING=m
CONFIG_MPLS_IPTUNNEL=m
CONFIG_IPV6_SIT=m
+CONFIG_BPF_JIT=y
--
2.20.1
^ permalink raw reply related
* Re: [oss-drivers] Re: [PATCH bpf-next] bpf: add BTF ids in procfs for file descriptors to BTF objects
From: Quentin Monnet @ 2019-08-20 13:48 UTC (permalink / raw)
To: Daniel Borkmann, Alexei Starovoitov; +Cc: bpf, netdev, oss-drivers
In-Reply-To: <fcb2e528-6750-2192-befe-dd68ca36fc62@iogearbox.net>
2019-08-20 15:36 UTC+0200 ~ Daniel Borkmann <daniel@iogearbox.net>
> On 8/20/19 11:52 AM, Quentin Monnet wrote:
>> Implement the show_fdinfo hook for BTF FDs file operations, and make it
>> print the id and the size of the BTF object. This allows for a quick
>> retrieval of the BTF id from its FD; or it can help understanding what
>> type of object (BTF) the file descriptor points to.
>>
>> Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
>> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
>> ---
>> kernel/bpf/btf.c | 16 ++++++++++++++++
>> 1 file changed, 16 insertions(+)
>>
>> diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
>> index 5fcc7a17eb5a..39e184f1b27c 100644
>> --- a/kernel/bpf/btf.c
>> +++ b/kernel/bpf/btf.c
>> @@ -3376,6 +3376,19 @@ void btf_type_seq_show(const struct btf *btf,
>> u32 type_id, void *obj,
>> btf_type_ops(t)->seq_show(btf, t, type_id, obj, 0, m);
>> }
>> +#ifdef CONFIG_PROC_FS
>> +static void bpf_btf_show_fdinfo(struct seq_file *m, struct file *filp)
>> +{
>> + const struct btf *btf = filp->private_data;
>> +
>> + seq_printf(m,
>> + "btf_id:\t%u\n"
>> + "data_size:\t%u\n",
>> + btf->id,
>> + btf->data_size);
>
> Looks good, exposing btf_id makes sense to me in order to correlate with
> applications.
> Do you have a concrete use case for data_size to expose it this way as
> opposed to fetch
> it via btf_get_info_by_fd()? If not, I'd say lets only add btf_id in there.
No, I don't have one for data_size to be honest. I'll respin with btf_id
only then.
Thanks,
Quentin
^ permalink raw reply
* Re: [RFC PATCH net-next 03/11] spi: Add a PTP system timestamp to the transfer structure
From: Vladimir Oltean @ 2019-08-20 13:48 UTC (permalink / raw)
To: Mark Brown
Cc: Hubert Feurstein, mlichvar, Richard Cochran, Andrew Lunn,
Florian Fainelli, linux-spi, netdev
In-Reply-To: <20190820125557.GB4738@sirena.co.uk>
Hi Mark,
On Tue, 20 Aug 2019 at 15:55, Mark Brown <broonie@kernel.org> wrote:
>
> On Fri, Aug 16, 2019 at 05:05:53PM +0300, Vladimir Oltean wrote:
>
> > I'm not sure how to respond to this, because I don't know anything
> > about the timing of DMA transfers.
> > Maybe snapshotting DMA transfers the same way is not possible (if at
> > all). Maybe they are not exactly adequate for this sort of application
> > anyway. Maybe it depends.
>
> DMA transfers generally proceed without any involvement from the CPU,
> this is broadly the point of DMA. You *may* be able to split into
> multiple transactions but it's not reliable that you'd be able to do so
> on byte boundaries and there will be latency getting notified of
> completions.
>
> > In other words, from a purely performance perspective, I am against
> > limiting the API to just snapshotting the first and last byte. At this
> > level of "zoom", if I change the offset of the byte to anything other
> > than 3, the synchronization offset refuses to converge towards zero,
> > because the snapshot is incurring a constant offset that the servo
> > loop from userspace (phc2sys) can't compensate for.
>
> > Maybe the SPI master driver should just report what sort of
> > snapshotting capability it can offer, ranging from none (default
> > unless otherwise specified), to transfer-level (DMA style) or
> > byte-level.
>
> That does then have the consequence that the majority of controllers
> aren't going to be usable with the API which isn't great.
>
Can we continue this discussion on this thread:
https://www.spinics.net/lists/netdev/msg593772.html
The whole point there is that if there's nothing that the driver can
do, the SPI core will take the timestamps and record their (bad)
precision.
> > I'm afraid more actual experimentation is needed with DMA-based
> > controllers to understand what can be expected from them, and as a
> > result, how the API should map around them.
> > MDIO bus controllers are in a similar situation (with Hubert's patch)
> > but at least there the frame size is fixed and I haven't heard of an
> > MDIO controller to use DMA.
>
> I'm not 100% clear what the problem you're trying to solve is, or if
> it's a sensible problem to try to solve for that matter.
The problem can simply be summarized as: you're trying to read a clock
over SPI, but there's so much timing jitter in you doing that, that
you have a high degree of uncertainty in the actual precision of the
readout you took.
The solution has two parts:
- Make the SPI access itself more predictable in terms of latency.
This is always going to have to be dealt with on a driver-by-driver,
hardware-by-hardware basis.
- Provide a way of taking a software timestamp in the time interval
when the latency is predictable, and as close as possible to the
moment when the SPI slave will receive the request. Disabling
interrupts and preemption always helps to snapshot that critical
section. Again, the SPI core can't do that. And finding the correct
"pre" and "post" hooks that surround the hardware transfer in a
deterministic fashion is crucial. If you read the cover letter, I used
a GPIO pin to make sure the timestamps are where they should be, and
that they don't vary in width (post - pre) - there are also some
screenshots on Gdrive. Maybe something similar is not impossible for a
DMA transfer, although the problem formulation so far is too vague to
emit a more clear statement.
If you know when the SPI slave's clock was actually read, you have a
better idea of what time it was.
Regards,
-Vladimir
^ permalink raw reply
* [PATCH bpf-next v2] bpf: add BTF ids in procfs for file descriptors to BTF objects
From: Quentin Monnet @ 2019-08-20 13:53 UTC (permalink / raw)
To: Alexei Starovoitov, Daniel Borkmann
Cc: bpf, netdev, oss-drivers, Quentin Monnet
Implement the show_fdinfo hook for BTF FDs file operations, and make it
print the id of the BTF object. This allows for a quick retrieval of the
BTF id from its FD; or it can help understanding what type of object
(BTF) the file descriptor points to.
v2:
- Do not expose data_size, only btf_id, in FD info.
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
kernel/bpf/btf.c | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index 5fcc7a17eb5a..6b403dc18486 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -3376,6 +3376,15 @@ void btf_type_seq_show(const struct btf *btf, u32 type_id, void *obj,
btf_type_ops(t)->seq_show(btf, t, type_id, obj, 0, m);
}
+#ifdef CONFIG_PROC_FS
+static void bpf_btf_show_fdinfo(struct seq_file *m, struct file *filp)
+{
+ const struct btf *btf = filp->private_data;
+
+ seq_printf(m, "btf_id:\t%u\n", btf->id);
+}
+#endif
+
static int btf_release(struct inode *inode, struct file *filp)
{
btf_put(filp->private_data);
@@ -3383,6 +3392,9 @@ static int btf_release(struct inode *inode, struct file *filp)
}
const struct file_operations btf_fops = {
+#ifdef CONFIG_PROC_FS
+ .show_fdinfo = bpf_btf_show_fdinfo,
+#endif
.release = btf_release,
};
--
2.17.1
^ permalink raw reply related
* Re: [PATCH] selftests: bpf: install files test_xdp_vlan.sh
From: Jesper Dangaard Brouer @ 2019-08-20 14:07 UTC (permalink / raw)
To: Anders Roxell
Cc: shuah, ast, daniel, davem, jakub.kicinski, hawk, john.fastabend,
linux-kselftest, netdev, bpf, linux-kernel
In-Reply-To: <20190820134121.25728-1-anders.roxell@linaro.org>
On Tue, 20 Aug 2019 15:41:21 +0200
Anders Roxell <anders.roxell@linaro.org> wrote:
> When ./test_xdp_vlan_mode_generic.sh runs it complains that it can't
> find file test_xdp_vlan.sh.
>
> # selftests: bpf: test_xdp_vlan_mode_generic.sh
> # ./test_xdp_vlan_mode_generic.sh: line 9: ./test_xdp_vlan.sh: No such
> file or directory
>
> Rework so that test_xdp_vlan.sh gets installed, added to the variable
> TEST_PROGS_EXTENDED.
>
> Fixes: d35661fcf95d ("selftests/bpf: add wrapper scripts for test_xdp_vlan.sh")
> Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
> ---
Thanks for catching this!
Acked-by: Jesper Dangaard Brouer <jbrouer@redhat.com>
> tools/testing/selftests/bpf/Makefile | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
> index 1faad0c3c3c9..d7968e20463c 100644
> --- a/tools/testing/selftests/bpf/Makefile
> +++ b/tools/testing/selftests/bpf/Makefile
> @@ -68,7 +68,8 @@ TEST_PROGS := test_kmod.sh \
> TEST_PROGS_EXTENDED := with_addr.sh \
> with_tunnels.sh \
> tcp_client.py \
> - tcp_server.py
> + tcp_server.py \
> + test_xdp_vlan.sh
>
> # Compile but not part of 'make run_tests'
> TEST_GEN_PROGS_EXTENDED = test_libbpf_open test_sock_addr test_skb_cgroup_id_user \
--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
LinkedIn: http://www.linkedin.com/in/brouer
^ permalink raw reply
* Re: [PATCH net-next] netdevsim: Fix build error without CONFIG_INET
From: Ido Schimmel @ 2019-08-20 14:08 UTC (permalink / raw)
To: Jakub Kicinski
Cc: YueHaibing, davem, idosch, jiri, mcroce, linux-kernel, netdev
In-Reply-To: <20190819145900.5d9cc1f3@cakuba.netronome.com>
On Mon, Aug 19, 2019 at 02:59:00PM -0700, Jakub Kicinski wrote:
> On Mon, 19 Aug 2019 20:08:25 +0800, YueHaibing wrote:
> > If CONFIG_INET is not set, building fails:
> >
> > drivers/net/netdevsim/dev.o: In function `nsim_dev_trap_report_work':
> > dev.c:(.text+0x67b): undefined reference to `ip_send_check'
> >
> > Add CONFIG_INET Kconfig dependency to fix this.
> >
> > Reported-by: Hulk Robot <hulkci@huawei.com>
> > Fixes: da58f90f11f5 ("netdevsim: Add devlink-trap support")
> > Signed-off-by: YueHaibing <yuehaibing@huawei.com>
>
> Hmm.. I'd rather the test module did not have hard dependencies on
> marginally important config options. We have done a pretty good job
> so far limiting the requirements though separating the code out at
> compilation object level. The more tests depend on netdevsim and the
> more bots we have running tests against randconfig - the more important
> this is.
>
> This missing reference here is for calculating a checksum over a
> constant header.. could we perhaps just hard code the checksum?
Sure. I was AFK today, will send a patch later today when I get home.
Thanks
^ permalink raw reply
* Re: linux-next: Tree for Aug 19 (drivers/net/netdevsim/dev.o)
From: Ido Schimmel @ 2019-08-20 14:13 UTC (permalink / raw)
To: Randy Dunlap
Cc: Stephen Rothwell, Linux Next Mailing List,
Linux Kernel Mailing List, netdev@vger.kernel.org
In-Reply-To: <92ef45a5-c933-0493-b2ff-50352fa8bf3f@infradead.org>
On Mon, Aug 19, 2019 at 09:16:13PM -0700, Randy Dunlap wrote:
> On 8/19/19 2:18 AM, Stephen Rothwell wrote:
> > Hi all,
> >
> > Changes since 20190816:
> >
>
> on x86_64:
> # CONFIG_INET is not set
>
> ld: drivers/net/netdevsim/dev.o: in function `nsim_dev_trap_report_work':
> dev.c:(.text+0x52f): undefined reference to `ip_send_check'
Thanks, Randy.
There is a fix here [1], but some changes were requested. I will send a
fix later today and Cc you.
[1] https://patchwork.ozlabs.org/patch/1149229/
^ permalink raw reply
* Re: [PATCH net-next 1/2] net: flow_offload: mangle 128-bit packet field with one action
From: Edward Cree @ 2019-08-20 14:15 UTC (permalink / raw)
To: Pablo Neira Ayuso, netfilter-devel
Cc: davem, netdev, jakub.kicinski, jiri, vladbu
In-Reply-To: <20190820105225.13943-1-pablo@netfilter.org>
On 20/08/2019 11:52, Pablo Neira Ayuso wrote:
> The existing infrastructure needs the front-end to generate up to four
> actions (one for each 32-bit word) to mangle an IPv6 address. This patch
> allows you to mangle fields than are longer than 4-bytes with one single
> action. Drivers have been adapted to this new representation following a
> simple approach, that is, iterate over the array of words and configure
> the hardware IR to make the packet mangling. FLOW_ACTION_MANGLE_MAX_WORDS
> defines the maximum number of words from one given offset (currently 4
> words).
>
> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
What's the point of this?
Why do you need to be able to do this with a single action? It doesn't
look like this extra 70 lines of code is actually buying you anything,
and it makes more work for any other drivers that want to implement the
offload API.
^ permalink raw reply
* [PATCH v2 net-next] netdevsim: Fix build error without CONFIG_INET
From: YueHaibing @ 2019-08-20 14:14 UTC (permalink / raw)
To: davem, idosch, jiri, mcroce, jakub.kicinski
Cc: linux-kernel, netdev, YueHaibing
In-Reply-To: <20190819120825.74460-1-yuehaibing@huawei.com>
If CONFIG_INET is not set, building fails:
drivers/net/netdevsim/dev.o: In function `nsim_dev_trap_report_work':
dev.c:(.text+0x67b): undefined reference to `ip_send_check'
Use ip_fast_csum instead of ip_send_check to avoid
dependencies on CONFIG_INET.
Reported-by: Hulk Robot <hulkci@huawei.com>
Fixes: da58f90f11f5 ("netdevsim: Add devlink-trap support")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
---
v2: use ip_fast_csum instead of ip_send_check
---
drivers/net/netdevsim/dev.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/net/netdevsim/dev.c b/drivers/net/netdevsim/dev.c
index c5b0261..39cdb6c 100644
--- a/drivers/net/netdevsim/dev.c
+++ b/drivers/net/netdevsim/dev.c
@@ -389,7 +389,8 @@ static struct sk_buff *nsim_dev_trap_skb_build(void)
iph->ihl = 0x5;
iph->tot_len = htons(tot_len);
iph->ttl = 100;
- ip_send_check(iph);
+ iph->check = 0;
+ iph->check = ip_fast_csum((unsigned char *)iph, iph->ihl);
udph = skb_put_zero(skb, sizeof(struct udphdr) + data_len);
get_random_bytes(&udph->source, sizeof(u16));
--
2.7.4
^ permalink raw reply related
* Re: [PATCH v6 6/7] nfc: pn533: Add autopoll capability
From: Lars Poeschel @ 2019-08-20 14:32 UTC (permalink / raw)
To: Johan Hovold
Cc: Allison Randal, Steve Winslow, Greg Kroah-Hartman,
Gustavo A. R. Silva, Kate Stewart, Thomas Gleixner,
open list:NFC SUBSYSTEM, open list
In-Reply-To: <20190820122344.GK32300@localhost>
On Tue, Aug 20, 2019 at 02:23:44PM +0200, Johan Hovold wrote:
> On Tue, Aug 20, 2019 at 02:03:43PM +0200, Lars Poeschel wrote:
> > drivers/nfc/pn533/pn533.c | 193 +++++++++++++++++++++++++++++++++++++-
> > drivers/nfc/pn533/pn533.h | 10 +-
> > 2 files changed, 197 insertions(+), 6 deletions(-)
> >
> > diff --git a/drivers/nfc/pn533/pn533.c b/drivers/nfc/pn533/pn533.c
> > index a8c756caa678..7e915239222b 100644
> > --- a/drivers/nfc/pn533/pn533.c
> > +++ b/drivers/nfc/pn533/pn533.c
> > @@ -185,6 +185,32 @@ struct pn533_cmd_jump_dep_response {
> > u8 gt[];
> > } __packed;
> >
> > +struct pn532_autopoll_resp {
> > + u8 type;
> > + u8 ln;
> > + u8 tg;
> > + u8 tgdata[];
> > +} __packed;
>
> No need for __packed.
I did a quick test without the __packed and indeed: It worked. I'd
sworn that it is needed in this place, because this autopoll response is
data that the nfc chip puts on the serial wire without padding inbetween
and this struct is used to access this data and without the __packed the
compiler should be allowed to put padding bytes between the struct
fields. But it choose to not do it. I am still not shure, why this
happens, but ok. I can remove the __packed.
> > +static int pn533_autopoll_complete(struct pn533 *dev, void *arg,
> > + struct sk_buff *resp)
> > +{
> > + u8 nbtg;
> > + int rc;
> > + struct pn532_autopoll_resp *apr;
> > + struct nfc_target nfc_tgt;
> > +
> > + if (IS_ERR(resp)) {
> > + rc = PTR_ERR(resp);
> > +
> > + nfc_err(dev->dev, "%s autopoll complete error %d\n",
> > + __func__, rc);
> > +
> > + if (rc == -ENOENT) {
> > + if (dev->poll_mod_count != 0)
> > + return rc;
> > + goto stop_poll;
> > + } else if (rc < 0) {
> > + nfc_err(dev->dev,
> > + "Error %d when running autopoll\n", rc);
> > + goto stop_poll;
> > + }
> > + }
> > +
> > + nbtg = resp->data[0];
> > + if ((nbtg > 2) || (nbtg <= 0))
> > + return -EAGAIN;
> > +
> > + apr = (struct pn532_autopoll_resp *)&resp->data[1];
> > + while (nbtg--) {
> > + memset(&nfc_tgt, 0, sizeof(struct nfc_target));
> > + switch (apr->type) {
> > + case PN532_AUTOPOLL_TYPE_ISOA:
> > + dev_dbg(dev->dev, "ISOA");
>
> You forgot the '\n' here and elsewhere (some nfc_err as well).
I can add them. I will wait a bit for more review comments before
posting a new patchset.
Thanks so far for your quick review,
Lars
^ permalink raw reply
* Re: [PATCH net-next v3 2/4] net: mdio: add PTP offset compensation to mdiobus_write_sts
From: Miroslav Lichvar @ 2019-08-20 14:25 UTC (permalink / raw)
To: Hubert Feurstein
Cc: netdev, lkml, Andrew Lunn, Richard Cochran, Florian Fainelli,
Heiner Kallweit, Vladimir Oltean, David S. Miller
In-Reply-To: <CAFfN3gW-4avfnrV7t-2nC+cVt3sgMD33L44P4PGU-MCAtuR+XA@mail.gmail.com>
On Tue, Aug 20, 2019 at 02:29:27PM +0200, Hubert Feurstein wrote:
> Am Di., 20. Aug. 2019 um 11:49 Uhr schrieb Miroslav Lichvar
> > This is important to not break the estimation of maximum error in the
> > measured offset. Applications using the ioctl may assume that the
> > maximum error is (post_ts-pre_ts)/2 (i.e. half of the delay printed by
> > phc2sys). That would not work if the delay could be occasionally 50
> > microseconds for instance, i.e. the post_ts timestamp would be earlier
> > than the PHC timestamp.
> >
> If the timestamps are taken in the MDIO driver (imx-fec in my case), then
> everything is quite deterministic (see results in the cover letter). Of course,
> it still can be improved slightly, by splitting up the writel into iowmb and
> write_relaxed and disable the interrupts while capturing the timestamps
> (as I did in my original RFC patch). But phc2sys takes by default 5 measurements
> and uses the one with the smallest delay, so this shouldn't be necessary.
The delay that phc2sys sees is the difference between post_ts and
pre_ts, which doesn't contain the actual delay, right? So, I'm not
sure how well the phc2sys filtering actually works. Even if the
measured delay was related to the write delay, would be 5 measurements
always enough to get a "correct" sample?
> Although, by adding 2 * ptp_sts_offset the system timestamp to post_ts
> the timestamp is aligned with the PHC timestamp, but this also increases
> the delay which is reported by phc2sys (~26us). But the maximum error
> which must be expected is definitely much less (< 1us). So maybe it is better
> to shift both timestamps pre_ts and post_ts by ptp_sts_offset.
If you could guarantee that [pre_ts + ptp_sts_offset, post_ts +
ptp_sts_offset] always contains the PHC timestamps, then that would be
great. From what Andrew is writing, this doesn't seem to be the case.
I'd suggest a different approach:
- specify a minimum delay for the write and a minimum delay for the
completion (if they are not equal)
- take a second "post" system timestamp after the completion
- adjust pre_ts and post_ts so that the middle point is equal to what
you have now, but the interval contains both
pre_ts + min_write_delay and post2_ts - min_completion_delay
This way you get the best stability and also a delay that correctly
estimates the maximum error.
--
Miroslav Lichvar
^ permalink raw reply
* Re: [PATCH bpf-next] xsk: proper socket state check in xsk_poll
From: Daniel Borkmann @ 2019-08-20 14:30 UTC (permalink / raw)
To: Björn Töpel, syzbot+c82697e3043781e08802, ast, netdev
Cc: bjorn.topel, bpf, davem, hawk, jakub.kicinski, john.fastabend,
jonathan.lemon, kafai, linux-kernel, magnus.karlsson,
songliubraving, syzkaller-bugs, xdp-newbies, yhs, hdanton
In-Reply-To: <20190820100405.25564-1-bjorn.topel@gmail.com>
On 8/20/19 12:04 PM, Björn Töpel wrote:
> From: Björn Töpel <bjorn.topel@intel.com>
>
> The poll() implementation for AF_XDP sockets did not perform the
> proper state checks, prior accessing the socket umem. This patch fixes
> that by performing a xsk_is_bound() check.
>
> Suggested-by: Hillf Danton <hdanton@sina.com>
> Reported-by: syzbot+c82697e3043781e08802@syzkaller.appspotmail.com
> Fixes: 77cd0d7b3f25 ("xsk: add support for need_wakeup flag in AF_XDP rings")
> Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
> ---
> net/xdp/xsk.c | 14 ++++++++++++--
> 1 file changed, 12 insertions(+), 2 deletions(-)
>
> diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> index ee4428a892fa..08bed5e92af4 100644
> --- a/net/xdp/xsk.c
> +++ b/net/xdp/xsk.c
> @@ -356,13 +356,20 @@ static int xsk_generic_xmit(struct sock *sk, struct msghdr *m,
> return err;
> }
>
> +static bool xsk_is_bound(struct xdp_sock *xs)
> +{
> + struct net_device *dev = READ_ONCE(xs->dev);
> +
> + return dev && xs->state == XSK_BOUND;
> +}
> +
> static int xsk_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len)
> {
> bool need_wait = !(m->msg_flags & MSG_DONTWAIT);
> struct sock *sk = sock->sk;
> struct xdp_sock *xs = xdp_sk(sk);
>
> - if (unlikely(!xs->dev))
> + if (unlikely(!xsk_is_bound(xs)))
> return -ENXIO;
> if (unlikely(!(xs->dev->flags & IFF_UP)))
> return -ENETDOWN;
> @@ -383,6 +390,9 @@ static unsigned int xsk_poll(struct file *file, struct socket *sock,
> struct net_device *dev = xs->dev;
> struct xdp_umem *umem = xs->umem;
>
> + if (unlikely(!xsk_is_bound(xs)))
> + return mask;
> +
> if (umem->need_wakeup)
> dev->netdev_ops->ndo_xsk_wakeup(dev, xs->queue_id,
> umem->need_wakeup);
> @@ -417,7 +427,7 @@ static void xsk_unbind_dev(struct xdp_sock *xs)
> {
> struct net_device *dev = xs->dev;
>
> - if (!dev || xs->state != XSK_BOUND)
> + if (!xsk_is_bound(xs))
> return;
I think I'm a bit confused by your READ_ONCE() usage. ;-/ I can see why you're
using it in xsk_is_bound() above, but then at the same time all the other callbacks
like xsk_poll() or xsk_unbind_dev() above have a struct net_device *dev = xs->dev
right before the test. Could you elaborate?
Thanks,
Daniel
^ permalink raw reply
* Re: [PATCH net-next v2 6/9] net: macsec: hardware offloading infrastructure
From: Sabrina Dubroca @ 2019-08-20 14:41 UTC (permalink / raw)
To: Antoine Tenart
Cc: Igor Russkikh, Andrew Lunn, davem@davemloft.net,
f.fainelli@gmail.com, hkallweit1@gmail.com,
netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
thomas.petazzoni@bootlin.com, alexandre.belloni@bootlin.com,
allan.nielsen@microchip.com, camelia.groza@nxp.com,
Simon Edelhaus, Pavel Belous
In-Reply-To: <20190820100140.GA3292@kwain>
2019-08-20, 12:01:40 +0200, Antoine Tenart wrote:
> So it seems the ability to enable or disable the offloading on a given
> interface is the main missing feature. I'll add that, however I'll
> probably (at least at first):
>
> - Have the interface to be fully offloaded or fully handled in s/w (with
> errors being thrown if a given configuration isn't supported). Having
> both at the same time on a given interface would be tricky because of
> the MACsec validation parameter.
>
> - Won't allow to enable/disable the offloading of there are rules in
> place, as we're not sure the same rules would be accepted by the other
> implementation.
That's probably quite problematic actually, because to do that you
need to be able to resync the state between software and hardware,
particularly packet numbers. So, yeah, we're better off having it
completely blocked until we have a working implementation (if that
ever happens).
However, that means in case the user wants to set up something that's
not offloadable, they'll have to:
- configure the offloaded version until EOPNOTSUPP
- tear everything down
- restart from scratch without offloading
That's inconvenient.
Talking about packet numbers, can you describe how PN exhaustion is
handled? I couldn't find much about packet numbers at all in the
driver patches (I hope the hw doesn't wrap around from 2^32-1 to 0 on
the same SA). At some point userspace needs to know that we're
getting close to 2^32 and that it's time to re-key. Since the whole
TX path of the software implementation is bypassed, it looks like the
PN (as far as drivers/net/macsec.c is concerned) never increases, so
userspace can't know when to negotiate a new SA.
> I'm not sure if we should allow to mix the implementations on a given
> physical interface (by having two MACsec interfaces attached) as the
> validation would be impossible to do (we would have no idea if a
> packet was correctly handled by the offloading part or just not being
> a MACsec packet in the first place, in Rx).
That's something that really bothers me with this proposal. Quoting
from the commit message:
> the packets seen by the networking stack on both the physical and
> MACsec virtual interface are exactly the same
If the HW/driver is expected to strip the sectag, I don't see how we
could ever have multiple secy's at the same time and demultiplex
properly between them. That's part of the reason why I chose to have
virtual interfaces: without them, picking the right secy on TX gets
weird.
AFAICT, it means that any filters (tc, tcpdump) need to be different
between offloaded and non-offloaded cases.
And it's going to be confusing to the administrator when they look at
tcpdumps expecting to see MACsec frames.
I don't see how this implementation handles non-macsec traffic (on TX,
that would be packets sent directly through the real interface, for
example by wpa_supplicant - on RX, incoming MKA traffic for
wpa_supplicant). Unless I missed something, incoming MKA traffic will
end up on the macsec interface as well as the lower interface (not
entirely critical, as long as wpa_supplicant can grab it on the lower
device, but not consistent with the software implementation). How does
the driver distinguish traffic that should pass through unmodified
from traffic that the HW needs to encapsulate and encrypt?
If you look at IPsec offloading, the networking stack builds up the
ESP header, and passes the unencrypted data down to the driver. I'm
wondering if the same would be possible with MACsec offloading: the
macsec virtual interface adds the header (and maybe a dummy ICV), and
then the HW does the encryption. In case of HW that needs to add the
sectag itself, the driver would first strip the headers that the stack
created. On receive, the driver would recreate a sectag and the macsec
interface would just skip all verification (decrypt, PN).
--
Sabrina
^ permalink raw reply
* Re: [PATCH net-next 1/2] net: flow_offload: mangle 128-bit packet field with one action
From: Pablo Neira Ayuso @ 2019-08-20 14:44 UTC (permalink / raw)
To: Edward Cree; +Cc: netfilter-devel, davem, netdev, jakub.kicinski, jiri, vladbu
In-Reply-To: <f18d8369-f87d-5b9a-6c9d-daf48a3b95f1@solarflare.com>
On Tue, Aug 20, 2019 at 03:15:16PM +0100, Edward Cree wrote:
> On 20/08/2019 11:52, Pablo Neira Ayuso wrote:
> > The existing infrastructure needs the front-end to generate up to four
> > actions (one for each 32-bit word) to mangle an IPv6 address. This patch
> > allows you to mangle fields than are longer than 4-bytes with one single
> > action. Drivers have been adapted to this new representation following a
> > simple approach, that is, iterate over the array of words and configure
> > the hardware IR to make the packet mangling. FLOW_ACTION_MANGLE_MAX_WORDS
> > defines the maximum number of words from one given offset (currently 4
> > words).
> >
> > Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
>
> What's the point of this?
> Why do you need to be able to do this with a single action? It doesn't
> look like this extra 70 lines of code is actually buying you anything,
> and it makes more work for any other drivers that want to implement the
> offload API.
It looks to me this limitation is coming from tc pedit.
Four actions to mangle an IPv6 address consume more memory when making
the translation, and if you expect a lot of rules.
I think drivers can do more than one 32-bit word mangling in one go.
^ permalink raw reply
* [PATCH net-next 1/9] s390/qeth: use node_descriptor struct
From: Julian Wiedmann @ 2019-08-20 14:46 UTC (permalink / raw)
To: David Miller
Cc: netdev, linux-s390, Heiko Carstens, Stefan Raspl, Ursula Braun,
Julian Wiedmann
In-Reply-To: <20190820144643.64041-1-jwi@linux.ibm.com>
Rather than fumbling with hard-coded offsets, use the proper struct to
access the retrieved RCD information.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
---
drivers/s390/net/qeth_core.h | 2 +-
drivers/s390/net/qeth_core_main.c | 31 +++++++++++++++++++++++--------
2 files changed, 24 insertions(+), 9 deletions(-)
diff --git a/drivers/s390/net/qeth_core.h b/drivers/s390/net/qeth_core.h
index 28db887d38ed..47e01cdd1775 100644
--- a/drivers/s390/net/qeth_core.h
+++ b/drivers/s390/net/qeth_core.h
@@ -651,7 +651,7 @@ struct qeth_card_blkt {
struct qeth_card_info {
unsigned short unit_addr2;
unsigned short cula;
- unsigned short chpid;
+ u8 chpid;
__u16 func_level;
char mcl_level[QETH_MCL_LENGTH + 1];
u8 open_when_online:1;
diff --git a/drivers/s390/net/qeth_core_main.c b/drivers/s390/net/qeth_core_main.c
index 0803070246aa..50f2773a1f8c 100644
--- a/drivers/s390/net/qeth_core_main.c
+++ b/drivers/s390/net/qeth_core_main.c
@@ -1775,19 +1775,32 @@ static int qeth_send_control_data(struct qeth_card *card,
return rc;
}
+struct qeth_node_desc {
+ struct node_descriptor nd1;
+ struct node_descriptor nd2;
+ struct node_descriptor nd3;
+};
+
static void qeth_read_conf_data_cb(struct qeth_card *card,
struct qeth_cmd_buffer *iob)
{
- unsigned char *prcd = iob->data;
+ struct qeth_node_desc *nd = (struct qeth_node_desc *) iob->data;
+ u8 *tag;
QETH_CARD_TEXT(card, 2, "cfgunit");
- card->info.chpid = prcd[30];
- card->info.unit_addr2 = prcd[31];
- card->info.cula = prcd[63];
- card->info.is_vm_nic = ((prcd[0x10] == _ascebc['V']) &&
- (prcd[0x11] == _ascebc['M']));
- card->info.use_v1_blkt = prcd[74] == 0xF0 && prcd[75] == 0xF0 &&
- prcd[76] >= 0xF1 && prcd[76] <= 0xF4;
+ card->info.is_vm_nic = nd->nd1.plant[0] == _ascebc['V'] &&
+ nd->nd1.plant[1] == _ascebc['M'];
+ tag = (u8 *)&nd->nd1.tag;
+ card->info.chpid = tag[0];
+ card->info.unit_addr2 = tag[1];
+
+ tag = (u8 *)&nd->nd2.tag;
+ card->info.cula = tag[1];
+
+ card->info.use_v1_blkt = nd->nd3.model[0] == 0xF0 &&
+ nd->nd3.model[1] == 0xF0 &&
+ nd->nd3.model[2] >= 0xF1 &&
+ nd->nd3.model[2] <= 0xF4;
qeth_notify_reply(iob->reply, 0);
qeth_put_cmd(iob);
@@ -1803,6 +1816,8 @@ static int qeth_read_conf_data(struct qeth_card *card)
ciw = ccw_device_get_ciw(channel->ccwdev, CIW_TYPE_RCD);
if (!ciw || ciw->cmd == 0)
return -EOPNOTSUPP;
+ if (ciw->count < sizeof(struct qeth_node_desc))
+ return -EINVAL;
iob = qeth_alloc_cmd(channel, ciw->count, 1, QETH_RCD_TIMEOUT);
if (!iob)
--
2.17.1
^ permalink raw reply related
* [PATCH net-next 0/9] s390/net: updates 2019-08-20
From: Julian Wiedmann @ 2019-08-20 14:46 UTC (permalink / raw)
To: David Miller
Cc: netdev, linux-s390, Heiko Carstens, Stefan Raspl, Ursula Braun,
Julian Wiedmann
Hi Dave,
please apply the following patches to net-next. This series brings a mix
of cleanups and small improvements for various parts of qeth's control
path. Also, a minor cleanup for ctcm and lcs.
Thanks,
Julian
Julian Wiedmann (9):
s390/qeth: use node_descriptor struct
s390/qeth: propagate length of processed cmd IO data to callback
s390/qeth: use correct length field in SNMP cmd callback
s390/qeth: keep cmd alive after IO completion
s390/qeth: merge qeth_reply struct into qeth_cmd_buffer
s390/qeth: get vnicc sub-cmd type from reply data
s390/qeth: streamline control code for promisc mode
s390/ctcm: don't use intparm for channel IO
s390/lcs: don't use intparm for channel IO
drivers/s390/net/ctcm_fsms.c | 42 ++---
drivers/s390/net/ctcm_main.c | 6 +-
drivers/s390/net/ctcm_mpc.c | 6 +-
drivers/s390/net/lcs.c | 6 +-
drivers/s390/net/qeth_core.h | 36 ++---
drivers/s390/net/qeth_core_main.c | 261 ++++++++++++++----------------
drivers/s390/net/qeth_core_mpc.h | 1 -
drivers/s390/net/qeth_l2_main.c | 62 +++----
drivers/s390/net/qeth_l3_main.c | 24 +--
9 files changed, 197 insertions(+), 247 deletions(-)
--
2.17.1
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox