* [PATCH net-next] r8169: fix DMA issue on MIPS platform
From: Heiner Kallweit @ 2019-08-23 18:07 UTC (permalink / raw)
To: Realtek linux nic maintainers, David Miller
Cc: Aaro Koskinen, netdev@vger.kernel.org
As reported by Aaro this patch causes network problems on
MIPS Loongson platform. Therefore revert it.
Fixes: f072218cca5b ("r8169: remove not needed call to dma_sync_single_for_device")
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reported-by: Aaro Koskinen <aaro.koskinen@iki.fi>
---
drivers/net/ethernet/realtek/r8169_main.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
index 910944120..6182e7d33 100644
--- a/drivers/net/ethernet/realtek/r8169_main.c
+++ b/drivers/net/ethernet/realtek/r8169_main.c
@@ -5822,6 +5822,10 @@ static int rtl_rx(struct net_device *dev, struct rtl8169_private *tp, u32 budget
skb->tail += pkt_size;
skb->len = pkt_size;
+ dma_sync_single_for_device(tp_to_dev(tp),
+ le64_to_cpu(desc->addr),
+ pkt_size, DMA_FROM_DEVICE);
+
rtl8169_rx_csum(skb, status);
skb->protocol = eth_type_trans(skb, dev);
--
2.23.0
^ permalink raw reply related
* [PATCH net] Revert "r8169: remove not needed call to dma_sync_single_for_device"
From: Heiner Kallweit @ 2019-08-23 17:57 UTC (permalink / raw)
To: Realtek linux nic maintainers, David Miller
Cc: netdev@vger.kernel.org, Aaro Koskinen
This reverts commit f072218cca5b076dd99f3dfa3aaafedfd0023a51.
As reported by Aaro this patch causes network problems on
MIPS Loongson platform. Therefore revert it.
Fixes: f072218cca5b ("r8169: remove not needed call to dma_sync_single_for_device")
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reported-by: Aaro Koskinen <aaro.koskinen@iki.fi>
---
This fix doesn't apply cleanly on net-next, I'll provide a separate one for net-next.
---
drivers/net/ethernet/realtek/r8169_main.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
index e1dd6ea60..bae0074ab 100644
--- a/drivers/net/ethernet/realtek/r8169_main.c
+++ b/drivers/net/ethernet/realtek/r8169_main.c
@@ -5921,6 +5921,7 @@ static struct sk_buff *rtl8169_try_rx_copy(void *data,
skb = napi_alloc_skb(&tp->napi, pkt_size);
if (skb)
skb_copy_to_linear_data(skb, data, pkt_size);
+ dma_sync_single_for_device(d, addr, pkt_size, DMA_FROM_DEVICE);
return skb;
}
--
2.23.0
^ permalink raw reply related
* RE: [PATCH v2 0/2] Simplify mtty driver and mdev core
From: Parav Pandit @ 2019-08-23 18:00 UTC (permalink / raw)
To: Alex Williamson
Cc: Jiri Pirko, Jiri Pirko, David S . Miller, Kirti Wankhede,
Cornelia Huck, kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
cjia, netdev@vger.kernel.org
In-Reply-To: <20190823111641.7f928917@x1.home>
> -----Original Message-----
> From: Alex Williamson <alex.williamson@redhat.com>
> Sent: Friday, August 23, 2019 10:47 PM
> To: Parav Pandit <parav@mellanox.com>
> Cc: Jiri Pirko <jiri@resnulli.us>; Jiri Pirko <jiri@mellanox.com>; David S . Miller
> <davem@davemloft.net>; Kirti Wankhede <kwankhede@nvidia.com>; Cornelia
> Huck <cohuck@redhat.com>; kvm@vger.kernel.org; linux-
> kernel@vger.kernel.org; cjia <cjia@nvidia.com>; netdev@vger.kernel.org
> Subject: Re: [PATCH v2 0/2] Simplify mtty driver and mdev core
>
> On Fri, 23 Aug 2019 16:14:04 +0000
> Parav Pandit <parav@mellanox.com> wrote:
>
> > > > Idea is to have mdev alias as optional.
> > > > Each mdev_parent says whether it wants mdev_core to generate an
> > > > alias or not. So only networking device drivers would set it to true.
> > > > For rest, alias won't be generated, and won't be compared either
> > > > during creation time. User continue to provide only uuid.
> > >
> > > Ok
> > >
> > > > I am tempted to have alias collision detection only within
> > > > children mdevs of the same parent, but doing so will always
> > > > mandate to prefix in netdev name. And currently we are left with
> > > > only 3 characters to prefix it, so that may not be good either.
> > > > Hence, I think mdev core wide alias is better with 12 characters.
> > >
> > > I suppose it depends on the API, if the vendor driver can ask the
> > > mdev core for an alias as part of the device creation process, then
> > > it could manage the netdev namespace for all its devices, choosing
> > > how many characters to use, and fail the creation if it can't meet a
> > > uniqueness requirement. IOW, mdev-core would always provide a full
> > > sha1 and therefore gets itself out of the uniqueness/collision aspects.
> > >
> > This doesn't work. At mdev core level 20 bytes sha1 are unique, so
> > mdev core allowed to create a mdev.
>
> The mdev vendor driver has the opportunity to fail the device creation in
> mdev_parent_ops.create().
>
That is not helpful for below reasons.
1. vendor driver doesn't have visibility in other vendor's alias.
2. Even for single vendor, it needs to maintain global list of devices to see collision.
3. multiple vendors needs to implement same scheme.
Mdev core should be the owner. Shifting ownership from one layer to a lower layer in vendor driver doesn't solve the problem
(if there is one, which I think doesn't exist).
> > And then devlink core chooses
> > only 6 bytes (12 characters) and there is collision. Things fall
> > apart. Since mdev provides unique uuid based scheme, it's the mdev
> > core's ownership to provide unique aliases.
>
> You're suggesting/contemplating multiple solutions here, 3-char prefix + 12-
> char sha1 vs <parent netdev> + ?-char sha1. Also, the 15-char total limit is
> imposed by an external subsystem, where the vendor driver is the gateway
> between that subsystem and mdev. How would mdev integrate with another
> subsystem that maybe only has 9-chars available? Would the vendor driver API
> specify "I need an alias" or would it specify "I need an X-char length alias"?
Yes, Vendor driver should say how long the alias it wants.
However before we implement that, I suggest let such vendor/user/driver arrive which needs that.
Such variable length alias can be added at that time and even with that alias collision can be detected by single mdev module.
> Does it make sense that mdev-core would fail creation of a device if there's a
> collision in the 12-char address space between different subsystems? For
> example, does enm0123456789ab really collide with xyz0123456789ab?
I think so, because at mdev level its 12-char alias matters.
Choosing the prefix not adding prefix is really a user space choice.
> So if
> mdev were to provided a 40-char sha1, is it possible that the vendor driver
> could consume this in its create callback, truncate it to the number of chars
> required by the vendor driver's subsystem, and determine whether a collision
> exists?
We shouldn't shift the problem from mdev to multiple vendor drivers to detect collision.
I still think that user providing alias is better because it knows the use-case system in use, and eliminates these collision issue.
>
> > > > I do not understand how an extra character reduces collision, if
> > > > that's what you meant.
> > >
> > > If the default were for example 3-chars, we might already have
> > > device 'abc'. A collision would expose one more char of the new
> > > device, so we might add device with alias 'abcd'. I mentioned
> > > previously that this leaves an issue for userspace that we can't
> > > change the alias of device abc, so without additional information,
> > > userspace can only determine via elimination the mapping of alias to
> > > device, but userspace has more information available to it in the
> > > form of sysfs links.
> > > > Module options are almost not encouraged anymore with other
> > > > subsystems/drivers.
> > >
> > > We don't live in a world of absolutes. I agree that the defaults
> > > should work in the vast majority of cases. Requiring a user to
> > > twiddle module options to make things work is undesirable, verging
> > > on a bug. A module option to enable some specific feature, unsafe
> > > condition, or test that is outside of the typical use case is
> > > reasonable, imo.
> > > > For testing collision rate, a sample user space script and sample
> > > > mtty is easy and get us collision count too. We shouldn't put that
> > > > using module option in production kernel. I practically have the
> > > > code ready to play with; Changing 12 to smaller value is easy with
> > > > module reload.
> > > >
> > > > #define MDEV_ALIAS_LEN 12
> > >
> > > If it can't be tested with a shipping binary, it probably won't be
> > > tested. Thanks,
> > It is not the role of mdev core to expose collision
> > efficiency/deficiency of the sha1. It can be tested outside before
> > mdev choose to use it.
>
> The testing I'm considering is the user and kernel response to a collision.
>
> > I am saying we should test with 12 characters with 10,000 or more
> > devices and see how collision occurs. Even if collision occurs, mdev
> > returns EEXIST status indicating user to pick a different UUID for
> > those rare conditions.
>
> The only way we're going to see collision with a 12-char sha1 is if we burn the
> CPU cycles to find uuids that collide in that space. 10,000 devices is not
> remotely enough to generate a collision in that address space. That puts a
> prerequisite in place that in order to test collision, someone needs to know
> certain magic inputs. OTOH, if we could use a shorter abbreviation, collisions
> are trivial to test experimentally. Thanks,
>
Yes, and therefore a sane user who wants to create more mdevs, wouldn't intentionally stress it to see failures.
> Alex
^ permalink raw reply
* Re: [PATCH net] ipv4: mpls: fix mpls_xmit for iptunnel
From: David Ahern @ 2019-08-23 17:59 UTC (permalink / raw)
To: Alexey Kodanev, netdev; +Cc: David Miller
In-Reply-To: <1566582703-26567-1-git-send-email-alexey.kodanev@oracle.com>
On 8/23/19 1:51 PM, Alexey Kodanev wrote:
> When using mpls over gre/gre6 setup, rt->rt_gw4 address is not set, the
> same for rt->rt_gw_family. Therefore, when rt->rt_gw_family is checked
> in mpls_xmit(), neigh_xmit() call is skipped. As a result, such setup
> doesn't work anymore.
>
> This issue was found with LTP mpls03 tests.
>
> Fixes: 1550c171935d ("ipv4: Prepare rtable for IPv6 gateway")
> Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
> ---
> net/mpls/mpls_iptunnel.c | 8 ++++----
> 1 file changed, 4 insertions(+), 4 deletions(-)
>
Thanks for the report and patch. On first glance, it seems odd that
neigh_xmit could be called with rt_gw4 (formerly rt_gateway) set to 0
and if it is non-zero, why isn't family set.
I am traveling today and doubt I will be able to take a deep look at
this until Monday.
^ permalink raw reply
* Aw: [PATCH net-next v3 0/3] net: ethernet: mediatek: convert to PHYLINK
From: Frank Wunderlich @ 2019-08-23 17:56 UTC (permalink / raw)
To: "René van Dorst"
Cc: John Crispin, Sean Wang, Nelson Chang, David S . Miller,
Matthias Brugger, netdev, linux-arm-kernel, linux-mediatek,
linux-mips, Russell King, Stefan Roese,
"René van Dorst"
In-Reply-To: <20190823134516.27559-1-opensource@vdorst.com>
tested on bpi-r2 (mt7623/mt7530) and bpi-r64 (mt7622/rtl8367)
as reported to rene directly rx-path needs some rework because current rx-speed
on bpi-r2 is 865 Mbits/sec instead of ~940 Mbits/sec
Tested-by: Frank Wunderlich <frank-w@public-files.de>
regards Frank
> Gesendet: Freitag, 23. August 2019 um 15:45 Uhr
> Von: "René van Dorst" <opensource@vdorst.com>
> An: "John Crispin" <john@phrozen.org>, "Sean Wang" <sean.wang@mediatek.com>, "Nelson Chang" <nelson.chang@mediatek.com>, "David S . Miller" <davem@davemloft.net>, "Matthias Brugger" <matthias.bgg@gmail.com>
> Cc: netdev@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-mediatek@lists.infradead.org, linux-mips@vger.kernel.org, "Russell King" <linux@armlinux.org.uk>, "Frank Wunderlich" <frank-w@public-files.de>, "Stefan Roese" <sr@denx.de>, "René van Dorst" <opensource@vdorst.com>
> Betreff: [PATCH net-next v3 0/3] net: ethernet: mediatek: convert to PHYLINK
>
> These patches converts mediatek driver to PHYLINK API.
>
> v2->v3:
> * Phylink improvements and clean-ups after review
> v1->v2:
> * Rebase for mt76x8 changes
> * Phylink improvements and clean-ups after review
> * SGMII port doesn't support 2.5Gbit in SGMII mode only in BASE-X mode.
> Refactor the code.
>
> René van Dorst (3):
> net: ethernet: mediatek: Add basic PHYLINK support
> net: ethernet: mediatek: Re-add support SGMII
> dt-bindings: net: ethernet: Update mt7622 docs and dts to reflect the
> new phylink API
>
> .../arm/mediatek/mediatek,sgmiisys.txt | 2 -
> .../dts/mediatek/mt7622-bananapi-bpi-r64.dts | 28 +-
> arch/arm64/boot/dts/mediatek/mt7622.dtsi | 1 -
> drivers/net/ethernet/mediatek/Kconfig | 2 +-
> drivers/net/ethernet/mediatek/mtk_eth_path.c | 75 +--
> drivers/net/ethernet/mediatek/mtk_eth_soc.c | 529 ++++++++++++------
> drivers/net/ethernet/mediatek/mtk_eth_soc.h | 68 ++-
> drivers/net/ethernet/mediatek/mtk_sgmii.c | 65 ++-
> 8 files changed, 477 insertions(+), 293 deletions(-)
>
> --
> 2.20.1
>
>
^ permalink raw reply
* [PATCH net] ipv4: mpls: fix mpls_xmit for iptunnel
From: Alexey Kodanev @ 2019-08-23 17:51 UTC (permalink / raw)
To: netdev; +Cc: David Ahern, David Miller, Alexey Kodanev
When using mpls over gre/gre6 setup, rt->rt_gw4 address is not set, the
same for rt->rt_gw_family. Therefore, when rt->rt_gw_family is checked
in mpls_xmit(), neigh_xmit() call is skipped. As a result, such setup
doesn't work anymore.
This issue was found with LTP mpls03 tests.
Fixes: 1550c171935d ("ipv4: Prepare rtable for IPv6 gateway")
Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
---
net/mpls/mpls_iptunnel.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/net/mpls/mpls_iptunnel.c b/net/mpls/mpls_iptunnel.c
index d25e91d..44b6750 100644
--- a/net/mpls/mpls_iptunnel.c
+++ b/net/mpls/mpls_iptunnel.c
@@ -133,12 +133,12 @@ static int mpls_xmit(struct sk_buff *skb)
mpls_stats_inc_outucastpkts(out_dev, skb);
if (rt) {
- if (rt->rt_gw_family == AF_INET)
- err = neigh_xmit(NEIGH_ARP_TABLE, out_dev, &rt->rt_gw4,
- skb);
- else if (rt->rt_gw_family == AF_INET6)
+ if (rt->rt_gw_family == AF_INET6)
err = neigh_xmit(NEIGH_ND_TABLE, out_dev, &rt->rt_gw6,
skb);
+ else
+ err = neigh_xmit(NEIGH_ARP_TABLE, out_dev, &rt->rt_gw4,
+ skb);
} else if (rt6) {
if (ipv6_addr_v4mapped(&rt6->rt6i_gateway)) {
/* 6PE (RFC 4798) */
--
1.8.3.1
^ permalink raw reply related
* Re: [PATCH v4 00/14] NFC: nxp-nci: clean up and new device support
From: Andy Shevchenko @ 2019-08-23 17:20 UTC (permalink / raw)
To: Sedat Dilek
Cc: Clément Perrochaud, Charles Gorand, netdev, David S. Miller,
Sedat Dilek
In-Reply-To: <892584913.468.1566336479573@ox.credativ.com>
On Tue, Aug 20, 2019 at 11:27:59PM +0200, Sedat Dilek wrote:
> > Andy Shevchenko <andriy.shevchenko@linux.intel.com> hat am 29. Juli 2019 15:35 geschrieben:
> I gave that patchset v4 a try against Linux v5.3-rc5.
>
> And played with neard and neard-tools v0.16-0.1 from Debian/buster AMD64.
>
> # nfctool --list
>
> # nfctool --enable --device=nfc0
>
> # nfctool --list --device=nfc0
> nfc0:
> Tags: [ tag11 ]
> Devices: [ ]
> Protocols: [ Felica MIFARE Jewel ISO-DEP NFC-DEP ]
> Powered: Yes
> RF Mode: Initiator
> lto: 0
> rw: 0
> miux: 0
>
> # nfctool --device=nfc0 --poll=Both --sniff --dump-symm
> Start sniffer on nfc0
>
> Start polling on nfc0 as both initiator and target
>
> Targets found for nfc0
> Tags: [ tag11 ]
> Devices: [ ]
>
> But I see in the logs:
>
> # journalctl -u neard.service -f
> Aug 20 23:01:15 iniza neard[6158]: Error while reading NFC bytes
>
> What does this error mean?
> How can I get more informations?
> Can you aid with debugging help?
Unfortunately I have no idea about user space tools.
But I think neard has to be updated to match kernel somehow.
--
With Best Regards,
Andy Shevchenko
^ permalink raw reply
* Re: [PATCH v2 0/2] Simplify mtty driver and mdev core
From: Alex Williamson @ 2019-08-23 17:16 UTC (permalink / raw)
To: Parav Pandit
Cc: Jiri Pirko, Jiri Pirko, David S . Miller, Kirti Wankhede,
Cornelia Huck, kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
cjia, netdev@vger.kernel.org
In-Reply-To: <AM0PR05MB4866E33AF7203DE47F713FAAD1A40@AM0PR05MB4866.eurprd05.prod.outlook.com>
On Fri, 23 Aug 2019 16:14:04 +0000
Parav Pandit <parav@mellanox.com> wrote:
> > -----Original Message-----
> > From: Alex Williamson <alex.williamson@redhat.com>
> > Sent: Friday, August 23, 2019 9:22 PM
> > To: Parav Pandit <parav@mellanox.com>
> > Cc: Jiri Pirko <jiri@resnulli.us>; Jiri Pirko <jiri@mellanox.com>; David S . Miller
> > <davem@davemloft.net>; Kirti Wankhede <kwankhede@nvidia.com>; Cornelia
> > Huck <cohuck@redhat.com>; kvm@vger.kernel.org; linux-
> > kernel@vger.kernel.org; cjia <cjia@nvidia.com>; netdev@vger.kernel.org
> > Subject: Re: [PATCH v2 0/2] Simplify mtty driver and mdev core
> >
> > On Fri, 23 Aug 2019 14:53:06 +0000
> > Parav Pandit <parav@mellanox.com> wrote:
> >
> > > > -----Original Message-----
> > > > From: Alex Williamson <alex.williamson@redhat.com>
> > > > Sent: Friday, August 23, 2019 7:58 PM
> > > > To: Parav Pandit <parav@mellanox.com>
> > > > Cc: Jiri Pirko <jiri@resnulli.us>; Jiri Pirko <jiri@mellanox.com>;
> > > > David S . Miller <davem@davemloft.net>; Kirti Wankhede
> > > > <kwankhede@nvidia.com>; Cornelia Huck <cohuck@redhat.com>;
> > > > kvm@vger.kernel.org; linux- kernel@vger.kernel.org; cjia
> > > > <cjia@nvidia.com>; netdev@vger.kernel.org
> > > > Subject: Re: [PATCH v2 0/2] Simplify mtty driver and mdev core
> > > >
> > > > On Fri, 23 Aug 2019 08:14:39 +0000
> > > > Parav Pandit <parav@mellanox.com> wrote:
> > > >
> > > > > Hi Alex,
> > > > >
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Jiri Pirko <jiri@resnulli.us>
> > > > > > Sent: Friday, August 23, 2019 1:42 PM
> > > > > > To: Parav Pandit <parav@mellanox.com>
> > > > > > Cc: Alex Williamson <alex.williamson@redhat.com>; Jiri Pirko
> > > > > > <jiri@mellanox.com>; David S . Miller <davem@davemloft.net>;
> > > > > > Kirti Wankhede <kwankhede@nvidia.com>; Cornelia Huck
> > > > <cohuck@redhat.com>;
> > > > > > kvm@vger.kernel.org; linux-kernel@vger.kernel.org; cjia
> > > > > > <cjia@nvidia.com>; netdev@vger.kernel.org
> > > > > > Subject: Re: [PATCH v2 0/2] Simplify mtty driver and mdev core
> > > > > >
> > > > > > Thu, Aug 22, 2019 at 03:33:30PM CEST, parav@mellanox.com wrote:
> > > > > > >
> > > > > > >
> > > > > > >> -----Original Message-----
> > > > > > >> From: Jiri Pirko <jiri@resnulli.us>
> > > > > > >> Sent: Thursday, August 22, 2019 5:50 PM
> > > > > > >> To: Parav Pandit <parav@mellanox.com>
> > > > > > >> Cc: Alex Williamson <alex.williamson@redhat.com>; Jiri Pirko
> > > > > > >> <jiri@mellanox.com>; David S . Miller <davem@davemloft.net>;
> > > > > > >> Kirti Wankhede <kwankhede@nvidia.com>; Cornelia Huck
> > > > > > <cohuck@redhat.com>;
> > > > > > >> kvm@vger.kernel.org; linux-kernel@vger.kernel.org; cjia
> > > > > > >> <cjia@nvidia.com>; netdev@vger.kernel.org
> > > > > > >> Subject: Re: [PATCH v2 0/2] Simplify mtty driver and mdev
> > > > > > >> core
> > > > > > >>
> > > > > > >> Thu, Aug 22, 2019 at 12:04:02PM CEST, parav@mellanox.com wrote:
> > > > > > >> >
> > > > > > >> >
> > > > > > >> >> -----Original Message-----
> > > > > > >> >> From: Jiri Pirko <jiri@resnulli.us>
> > > > > > >> >> Sent: Thursday, August 22, 2019 3:28 PM
> > > > > > >> >> To: Parav Pandit <parav@mellanox.com>
> > > > > > >> >> Cc: Alex Williamson <alex.williamson@redhat.com>; Jiri
> > > > > > >> >> Pirko <jiri@mellanox.com>; David S . Miller
> > > > > > >> >> <davem@davemloft.net>; Kirti Wankhede
> > > > > > >> >> <kwankhede@nvidia.com>; Cornelia Huck
> > > > > > >> <cohuck@redhat.com>;
> > > > > > >> >> kvm@vger.kernel.org; linux-kernel@vger.kernel.org; cjia
> > > > > > >> >> <cjia@nvidia.com>; netdev@vger.kernel.org
> > > > > > >> >> Subject: Re: [PATCH v2 0/2] Simplify mtty driver and mdev
> > > > > > >> >> core
> > > > > > >> >>
> > > > > > >> >> Thu, Aug 22, 2019 at 11:42:13AM CEST, parav@mellanox.com
> > wrote:
> > > > > > >> >> >
> > > > > > >> >> >
> > > > > > >> >> >> -----Original Message-----
> > > > > > >> >> >> From: Jiri Pirko <jiri@resnulli.us>
> > > > > > >> >> >> Sent: Thursday, August 22, 2019 2:59 PM
> > > > > > >> >> >> To: Parav Pandit <parav@mellanox.com>
> > > > > > >> >> >> Cc: Alex Williamson <alex.williamson@redhat.com>; Jiri
> > > > > > >> >> >> Pirko <jiri@mellanox.com>; David S . Miller
> > > > > > >> >> >> <davem@davemloft.net>; Kirti Wankhede
> > > > > > >> >> >> <kwankhede@nvidia.com>; Cornelia Huck
> > > > > > >> >> <cohuck@redhat.com>;
> > > > > > >> >> >> kvm@vger.kernel.org; linux-kernel@vger.kernel.org; cjia
> > > > > > >> >> >> <cjia@nvidia.com>; netdev@vger.kernel.org
> > > > > > >> >> >> Subject: Re: [PATCH v2 0/2] Simplify mtty driver and
> > > > > > >> >> >> mdev core
> > > > > > >> >> >>
> > > > > > >> >> >> Wed, Aug 21, 2019 at 08:23:17AM CEST,
> > > > > > >> >> >> parav@mellanox.com
> > > > wrote:
> > > > > > >> >> >> >
> > > > > > >> >> >> >
> > > > > > >> >> >> >> -----Original Message-----
> > > > > > >> >> >> >> From: Alex Williamson <alex.williamson@redhat.com>
> > > > > > >> >> >> >> Sent: Wednesday, August 21, 2019 10:56 AM
> > > > > > >> >> >> >> To: Parav Pandit <parav@mellanox.com>
> > > > > > >> >> >> >> Cc: Jiri Pirko <jiri@mellanox.com>; David S . Miller
> > > > > > >> >> >> >> <davem@davemloft.net>; Kirti Wankhede
> > > > > > >> >> >> >> <kwankhede@nvidia.com>; Cornelia Huck
> > > > > > >> >> >> >> <cohuck@redhat.com>; kvm@vger.kernel.org;
> > > > > > >> >> >> >> linux-kernel@vger.kernel.org; cjia
> > > > > > >> >> >> >> <cjia@nvidia.com>; netdev@vger.kernel.org
> > > > > > >> >> >> >> Subject: Re: [PATCH v2 0/2] Simplify mtty driver and
> > > > > > >> >> >> >> mdev core
> > > > > > >> >> >> >>
> > > > > > >> >> >> >> > > > > Just an example of the alias, not proposing how it's
> > set.
> > > > > > >> >> >> >> > > > > In fact, proposing that the user does not
> > > > > > >> >> >> >> > > > > set it, mdev-core provides one
> > > > > > >> >> >> >> > > automatically.
> > > > > > >> >> >> >> > > > >
> > > > > > >> >> >> >> > > > > > > Since there seems to be some prefix
> > > > > > >> >> >> >> > > > > > > overhead, as I ask about above in how
> > > > > > >> >> >> >> > > > > > > many characters we actually have to work
> > > > > > >> >> >> >> > > > > > > with in IFNAMESZ, maybe we start with
> > > > > > >> >> >> >> > > > > > > 8 characters (matching your "index"
> > > > > > >> >> >> >> > > > > > > namespace) and expand as necessary for
> > > > > > >> >> >> disambiguation.
> > > > > > >> >> >> >> > > > > > > If we can eliminate overhead in
> > > > > > >> >> >> >> > > > > > > IFNAMESZ, let's start with
> > > > > > >> 12.
> > > > > > >> >> >> >> > > > > > > Thanks,
> > > > > > >> >> >> >> > > > > > >
> > > > > > >> >> >> >> > > > > > If user is going to choose the alias, why
> > > > > > >> >> >> >> > > > > > does it have to be limited to
> > > > > > >> >> >> >> sha1?
> > > > > > >> >> >> >> > > > > > Or you just told it as an example?
> > > > > > >> >> >> >> > > > > >
> > > > > > >> >> >> >> > > > > > It can be an alpha-numeric string.
> > > > > > >> >> >> >> > > > >
> > > > > > >> >> >> >> > > > > No, I'm proposing a different solution where
> > > > > > >> >> >> >> > > > > mdev-core creates an alias based on an
> > > > > > >> >> >> >> > > > > abbreviated sha1. The user does not provide
> > > > > > >> >> >> >> > > > > the
> > > > > > >> >> >> >> alias.
> > > > > > >> >> >> >> > > > >
> > > > > > >> >> >> >> > > > > > Instead of mdev imposing number of
> > > > > > >> >> >> >> > > > > > characters on the alias, it should be best
> > > > > > >> >> >> >> > > > > left to the user.
> > > > > > >> >> >> >> > > > > > Because in future if netdev improves on
> > > > > > >> >> >> >> > > > > > the naming scheme, mdev will be
> > > > > > >> >> >> >> > > > > limiting it, which is not right.
> > > > > > >> >> >> >> > > > > > So not restricting alias size seems right to me.
> > > > > > >> >> >> >> > > > > > User configuring mdev for networking
> > > > > > >> >> >> >> > > > > > devices in a given kernel knows what
> > > > > > >> >> >> >> > > > > user is doing.
> > > > > > >> >> >> >> > > > > > So user can choose alias name size as it finds
> > suitable.
> > > > > > >> >> >> >> > > > >
> > > > > > >> >> >> >> > > > > That's not what I'm proposing, please read again.
> > > > > > >> >> >> >> > > > > Thanks,
> > > > > > >> >> >> >> > > >
> > > > > > >> >> >> >> > > > I understood your point. But mdev doesn't know
> > > > > > >> >> >> >> > > > how user is going to use
> > > > > > >> >> >> >> > > udev/systemd to name the netdev.
> > > > > > >> >> >> >> > > > So even if mdev chose to pick 12 characters,
> > > > > > >> >> >> >> > > > it could result in
> > > > > > >> >> collision.
> > > > > > >> >> >> >> > > > Hence the proposal to provide the alias by the
> > > > > > >> >> >> >> > > > user, as user know the best
> > > > > > >> >> >> >> > > policy for its use case in the environment its using.
> > > > > > >> >> >> >> > > > So 12 character sha1 method will still work by user.
> > > > > > >> >> >> >> > >
> > > > > > >> >> >> >> > > Haven't you already provided examples where
> > > > > > >> >> >> >> > > certain drivers or subsystems have unique netdev
> > prefixes?
> > > > > > >> >> >> >> > > If mdev provides a unique alias within the
> > > > > > >> >> >> >> > > subsystem, couldn't we simply define a netdev
> > > > > > >> >> >> >> > > prefix for the mdev subsystem and avoid all
> > > > > > >> >> >> >> > > other collisions? I'm not in favor of the user
> > > > > > >> >> >> >> > > providing both a uuid and an alias/instance.
> > > > > > >> >> >> >> > > Thanks,
> > > > > > >> >> >> >> > >
> > > > > > >> >> >> >> > For a given prefix, say ens2f0, can two UUID->sha1
> > > > > > >> >> >> >> > first 9 characters have
> > > > > > >> >> >> >> collision?
> > > > > > >> >> >> >>
> > > > > > >> >> >> >> I think it would be a mistake to waste so many chars
> > > > > > >> >> >> >> on a prefix, but
> > > > > > >> >> >> >> 9 characters of sha1 likely wouldn't have a
> > > > > > >> >> >> >> collision before we have 10s of thousands of
> > > > > > >> >> >> >> devices. Thanks,
> > > > > > >> >> >> >>
> > > > > > >> >> >> >> Alex
> > > > > > >> >> >> >
> > > > > > >> >> >> >Jiri, Dave,
> > > > > > >> >> >> >Are you ok with it for devlink/netdev part?
> > > > > > >> >> >> >Mdev core will create an alias from a UUID.
> > > > > > >> >> >> >
> > > > > > >> >> >> >This will be supplied during devlink port attr set
> > > > > > >> >> >> >such as,
> > > > > > >> >> >> >
> > > > > > >> >> >> >devlink_port_attrs_mdev_set(struct devlink_port *port,
> > > > > > >> >> >> >const char *mdev_alias);
> > > > > > >> >> >> >
> > > > > > >> >> >> >This alias is used to generate representor netdev's
> > > > phys_port_name.
> > > > > > >> >> >> >This alias from the mdev device's sysfs will be used
> > > > > > >> >> >> >by the udev/systemd to
> > > > > > >> >> >> generate predicable netdev's name.
> > > > > > >> >> >> >Example: enm<mdev_alias_first_12_chars>
> > > > > > >> >> >>
> > > > > > >> >> >> What happens in unlikely case of 2 UUIDs collide?
> > > > > > >> >> >>
> > > > > > >> >> >Since users sees two devices with same phys_port_name,
> > > > > > >> >> >user should destroy
> > > > > > >> >> recently created mdev and recreate mdev with different UUID?
> > > > > > >> >>
> > > > > > >> >> Driver should make sure phys port name wont collide,
> > > > > > >> >So when mdev creation is initiated, mdev core calculates the
> > > > > > >> >alias and if there
> > > > > > >> is any other mdev with same alias exist, it returns -EEXIST
> > > > > > >> error before progressing further.
> > > > > > >> >This way user will get to know upfront in event of collision
> > > > > > >> >before the mdev
> > > > > > >> device gets created.
> > > > > > >> >How about that?
> > > > > > >>
> > > > > > >> Sounds fine to me. Now the question is how many chars do we
> > > > > > >> want to
> > > > have.
> > > > > > >>
> > > > > > >12 characters from Alex's suggestion similar to git?
> > > > > >
> > > > > > Ok.
> > > > > >
> > > > >
> > > > > Can you please confirm this scheme looks good now? I like to get
> > > > > patches
> > > > started.
> > > >
> > > > My only concern is your comment that in the event of an abbreviated
> > > > sha1 collision (as exceptionally rare as that might be at 12-chars),
> > > > we'd fail the device create, while my original suggestion was that
> > > > vfio-core would add an extra character to the alias. For
> > > > non-networking devices, the sha1 is unnecessary, so the extension
> > > > behavior seems preferred. The user is only responsible to provide a
> > > > unique uuid. Perhaps the failure behavior could be applied based on
> > > > the mdev device_api. A module option on mdev to specify the default
> > > > number of alias chars would also be useful for testing so that we
> > > > can set it low enough to validate the collision behavior. Thanks,
> > > >
> > >
> > > Idea is to have mdev alias as optional.
> > > Each mdev_parent says whether it wants mdev_core to generate an alias
> > > or not. So only networking device drivers would set it to true.
> > > For rest, alias won't be generated, and won't be compared either
> > > during creation time. User continue to provide only uuid.
> >
> > Ok
> >
> > > I am tempted to have alias collision detection only within children
> > > mdevs of the same parent, but doing so will always mandate to prefix
> > > in netdev name. And currently we are left with only 3 characters to
> > > prefix it, so that may not be good either. Hence, I think mdev core
> > > wide alias is better with 12 characters.
> >
> > I suppose it depends on the API, if the vendor driver can ask the mdev core for
> > an alias as part of the device creation process, then it could manage the netdev
> > namespace for all its devices, choosing how many characters to use, and fail
> > the creation if it can't meet a uniqueness requirement. IOW, mdev-core would
> > always provide a full sha1 and therefore gets itself out of the
> > uniqueness/collision aspects.
> >
> This doesn't work. At mdev core level 20 bytes sha1 are unique, so
> mdev core allowed to create a mdev.
The mdev vendor driver has the opportunity to fail the device creation
in mdev_parent_ops.create().
> And then devlink core chooses
> only 6 bytes (12 characters) and there is collision. Things fall
> apart. Since mdev provides unique uuid based scheme, it's the mdev
> core's ownership to provide unique aliases.
You're suggesting/contemplating multiple solutions here, 3-char prefix +
12-char sha1 vs <parent netdev> + ?-char sha1. Also, the 15-char
total limit is imposed by an external subsystem, where the vendor
driver is the gateway between that subsystem and mdev. How would mdev
integrate with another subsystem that maybe only has 9-chars
available? Would the vendor driver API specify "I need an alias" or
would it specify "I need an X-char length alias"? Does it make sense
that mdev-core would fail creation of a device if there's a collision
in the 12-char address space between different subsystems? For
example, does enm0123456789ab really collide with xyz0123456789ab? So
if mdev were to provided a 40-char sha1, is it possible that the vendor
driver could consume this in its create callback, truncate it to the
number of chars required by the vendor driver's subsystem, and
determine whether a collision exists?
> > > I do not understand how an extra character reduces collision, if
> > > that's what you meant.
> >
> > If the default were for example 3-chars, we might already have
> > device 'abc'. A collision would expose one more char of the new
> > device, so we might add device with alias 'abcd'. I mentioned
> > previously that this leaves an issue for userspace that we can't
> > change the alias of device abc, so without additional information,
> > userspace can only determine via elimination the mapping of alias
> > to device, but userspace has more information available to it in
> > the form of sysfs links.
> > > Module options are almost not encouraged anymore with other
> > > subsystems/drivers.
> >
> > We don't live in a world of absolutes. I agree that the defaults
> > should work in the vast majority of cases. Requiring a user to
> > twiddle module options to make things work is undesirable, verging
> > on a bug. A module option to enable some specific feature, unsafe
> > condition, or test that is outside of the typical use case is
> > reasonable, imo.
> > > For testing collision rate, a sample user space script and sample
> > > mtty is easy and get us collision count too. We shouldn't put
> > > that using module option in production kernel. I practically have
> > > the code ready to play with; Changing 12 to smaller value is easy
> > > with module reload.
> > >
> > > #define MDEV_ALIAS_LEN 12
> >
> > If it can't be tested with a shipping binary, it probably won't be
> > tested. Thanks,
> It is not the role of mdev core to expose collision
> efficiency/deficiency of the sha1. It can be tested outside before
> mdev choose to use it.
The testing I'm considering is the user and kernel response to a
collision.
> I am saying we should test with 12 characters with 10,000 or more
> devices and see how collision occurs. Even if collision occurs, mdev
> returns EEXIST status indicating user to pick a different UUID for
> those rare conditions.
The only way we're going to see collision with a 12-char sha1 is if we
burn the CPU cycles to find uuids that collide in that space. 10,000
devices is not remotely enough to generate a collision in that address
space. That puts a prerequisite in place that in order to test
collision, someone needs to know certain magic inputs. OTOH, if we
could use a shorter abbreviation, collisions are trivial to test
experimentally. Thanks,
Alex
^ permalink raw reply
* Re: [PATCH net-next 6/6] net: dsa: clear VLAN flags for CPU port
From: Florian Fainelli @ 2019-08-23 17:00 UTC (permalink / raw)
To: Vladimir Oltean, Vivien Didelot, netdev; +Cc: davem, andrew
In-Reply-To: <aee63928-a99e-3849-c8b4-dee9b660247c@gmail.com>
On 8/22/19 4:51 PM, Vladimir Oltean wrote:
> On 8/22/19 11:13 PM, Vivien Didelot wrote:
>> When the bridge offloads a VLAN on a slave port, we also need to
>> program its dedicated CPU port as a member of the VLAN.
>>
>> Drivers may handle the CPU port's membership as they want. For example,
>> Marvell as a special "Unmodified" mode to pass frames as is through
>> such ports.
>>
>> Even though DSA expects the drivers to handle the CPU port membership,
>> they are unlikely to program such VLANs untagged, and certainly not as
>> PVID. This patch clears the VLAN flags before programming the CPU port.
>>
>> Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
>> Suggested-by: Vladimir Oltean <olteanv@gmail.com>
>> ---
>> net/dsa/slave.c | 6 ++++++
>> 1 file changed, 6 insertions(+)
>>
>> diff --git a/net/dsa/slave.c b/net/dsa/slave.c
>> index 8267c156a51a..48df48f76c67 100644
>> --- a/net/dsa/slave.c
>> +++ b/net/dsa/slave.c
>> @@ -332,6 +332,12 @@ static int dsa_slave_vlan_add(struct net_device
>> *dev,
>> if (err)
>> return err;
>> + /* We need the dedicated CPU port to be a member of the VLAN as
>> well.
>> + * Even though drivers often handle CPU membership in special ways,
>> + * CPU ports are likely to be tagged, so clear the VLAN flags.
>> + */
>> + vlan.flags = 0;
>> +
>
> How does this work exactly?
> If I run 'sudo bridge vlan add vid 1 dev swp4 pvid untagged', then the
> CPU port starts sending VLAN-tagged traffic. I see this in tcpdump on
> the DSA master port, but if I tcpdump on swp4, the VLAN tag is removed.
> Who is doing that?
If vlan.flags = 0, then it does not have either BRIDGE_VLAN_INFO_PVID or
BRIDGE_VLAN_INFO_UNTAGGED which means the VLAN should be programmed
tagged on the CPU.
Since swp4 is part of the same VLAN, but has it configured PVID
untagged, the tag is removed, that sounds about what I would expect to
see...
--
Florian
^ permalink raw reply
* Re: [PATCH net-next 2/6] net: dsa: do not skip -EOPNOTSUPP in dsa_port_vid_add
From: Florian Fainelli @ 2019-08-23 16:49 UTC (permalink / raw)
To: Vladimir Oltean; +Cc: Vivien Didelot, netdev, David S. Miller, Andrew Lunn
In-Reply-To: <CA+h21hpzSNZTK6-wQJkJCC9vs0hao12_tpQRLM5JLXXD_26c_w@mail.gmail.com>
On 8/23/19 9:32 AM, Vladimir Oltean wrote:
> Hi Florian,
>
> On Fri, 23 Aug 2019 at 19:23, Florian Fainelli <f.fainelli@gmail.com> wrote:
>>
>> On 8/22/19 4:44 PM, Vladimir Oltean wrote:
>>> On Fri, 23 Aug 2019 at 02:43, Vivien Didelot <vivien.didelot@gmail.com> wrote:
>>>>
>>>> Hi Vladimir,
>>>>
>>>> On Fri, 23 Aug 2019 01:06:58 +0300, Vladimir Oltean <olteanv@gmail.com> wrote:
>>>>> Hi Vivien,
>>>>>
>>>>> On 8/22/19 11:13 PM, Vivien Didelot wrote:
>>>>>> Currently dsa_port_vid_add returns 0 if the switch returns -EOPNOTSUPP.
>>>>>>
>>>>>> This function is used in the tag_8021q.c code to offload the PVID of
>>>>>> ports, which would simply not work if .port_vlan_add is not supported
>>>>>> by the underlying switch.
>>>>>>
>>>>>> Do not skip -EOPNOTSUPP in dsa_port_vid_add but only when necessary,
>>>>>> that is to say in dsa_slave_vlan_rx_add_vid.
>>>>>>
>>>>>
>>>>> Do you know why Florian suppressed -EOPNOTSUPP in 061f6a505ac3 ("net:
>>>>> dsa: Add ndo_vlan_rx_{add, kill}_vid implementation")?
>>>>> I forced a return value of -EOPNOTSUPP here and when I create a VLAN
>>>>> sub-interface nothing breaks, it just says:
>>>>> RTNETLINK answers: Operation not supported
>>>>> which IMO is expected.
>>>>
>>>> I do not know what you mean. This patch does not change the behavior of
>>>> dsa_slave_vlan_rx_add_vid, which returns 0 if -EOPNOTSUPP is caught.
>>>>
>>>
>>> Yes, but what's wrong with just forwarding -EOPNOTSUPP?
>>
>> It makes us fail adding the VLAN to the slave network device, which
>> sounds silly, if we can't offload it in HW, that's fine, we can still do
>> a SW VLAN instead, see net/8021q/vlan_core.c::vlan_add_rx_filter_info().
>>
>> Maybe a more correct solution is to set the NETIF_F_HW_VLAN_CTAG_FILTER
>> feature bit only if we have the ability to offload, now that I think
>> about it. Would you want me to cook a patch doing that?
>
> sja1105 doesn't support offloading NETIF_F_HW_VLAN_CTAG_FILTER even
> though it does support programming VLANs.
The additional of the ndo_vlan_rx_{add,kill}_vid() is such that
standalone DSA ports continue to work while there is a bridge with
vlan_filtering=1 spanning other ports. In order for that ndo operation
to be called, we need to advertise the NETIF_F_HW_VLAN_CTAG_FILTER feature.
> Adding an offloaded VLAN sub-interface on a standalone switch port
> (vlan_filtering=0, uses dsa_8021q) would make the driver insert a VLAN
> entry whilst the TPID is ETH_P_DSA_8021Q.
> Maybe just let the driver set the netdev features, similar to how it
> does for the number of TX queues?
Why should we bend the framework because sja1105 and dsa_8021q are
special? Let me counter the argument: if the tagging is DSA_8021Q, why
not clear the feature then?
--
Florian
^ permalink raw reply
* Re: [PATCH bpf-next 2/4] xsk: add proper barriers and {READ, WRITE}_ONCE-correctness for state
From: Jonathan Lemon @ 2019-08-23 16:46 UTC (permalink / raw)
To: Björn Töpel
Cc: ast, daniel, netdev, Björn Töpel, magnus.karlsson,
magnus.karlsson, bpf, syzbot+c82697e3043781e08802, hdanton,
i.maximets
In-Reply-To: <20190822091306.20581-3-bjorn.topel@gmail.com>
On 22 Aug 2019, at 2:13, Björn Töpel wrote:
> From: Björn Töpel <bjorn.topel@intel.com>
>
> The state variable was read, and written outside the control mutex
> (struct xdp_sock, mutex), without proper barriers and {READ,
> WRITE}_ONCE correctness.
>
> In this commit this issue is addressed, and the state member is now
> used a point of synchronization whether the socket is setup correctly
> or not.
>
> This also fixes a race, found by syzcaller, in xsk_poll() where umem
> could be accessed when stale.
>
> Suggested-by: Hillf Danton <hdanton@sina.com>
> Reported-by: syzbot+c82697e3043781e08802@syzkaller.appspotmail.com
> Fixes: 77cd0d7b3f25 ("xsk: add support for need_wakeup flag in AF_XDP
> rings")
> Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
> ---
> net/xdp/xsk.c | 57
> ++++++++++++++++++++++++++++++++++++---------------
> 1 file changed, 41 insertions(+), 16 deletions(-)
>
> diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> index f3351013c2a5..31236e61069b 100644
> --- a/net/xdp/xsk.c
> +++ b/net/xdp/xsk.c
> @@ -162,10 +162,23 @@ static int __xsk_rcv_zc(struct xdp_sock *xs,
> struct xdp_buff *xdp, u32 len)
> return err;
> }
>
> +static bool xsk_is_bound(struct xdp_sock *xs)
> +{
> + if (READ_ONCE(xs->state) == XSK_BOUND) {
> + /* Matches smp_wmb() in bind(). */
> + smp_rmb();
> + return true;
> + }
> + return false;
> +}
> +
> int xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp)
> {
> u32 len;
>
> + if (!xsk_is_bound(xs))
> + return -EINVAL;
> +
> if (xs->dev != xdp->rxq->dev || xs->queue_id !=
> xdp->rxq->queue_index)
> return -EINVAL;
>
> @@ -362,6 +375,8 @@ static int xsk_sendmsg(struct socket *sock, struct
> msghdr *m, size_t total_len)
> struct sock *sk = sock->sk;
> struct xdp_sock *xs = xdp_sk(sk);
>
> + if (unlikely(!xsk_is_bound(xs)))
> + return -ENXIO;
> if (unlikely(!xs->dev))
> return -ENXIO;
Can probably remove the xs->dev check now, replaced by checking
xs->state, right?
> if (unlikely(!(xs->dev->flags & IFF_UP)))
> @@ -378,10 +393,15 @@ static unsigned int xsk_poll(struct file *file,
> struct socket *sock,
> struct poll_table_struct *wait)
> {
> unsigned int mask = datagram_poll(file, sock, wait);
> - struct sock *sk = sock->sk;
> - struct xdp_sock *xs = xdp_sk(sk);
> - struct net_device *dev = xs->dev;
> - struct xdp_umem *umem = xs->umem;
> + struct xdp_sock *xs = xdp_sk(sock->sk);
> + struct net_device *dev;
> + struct xdp_umem *umem;
> +
> + if (unlikely(!xsk_is_bound(xs)))
> + return mask;
> +
> + dev = xs->dev;
> + umem = xs->umem;
>
> if (umem->need_wakeup)
> dev->netdev_ops->ndo_xsk_wakeup(dev, xs->queue_id,
> @@ -417,10 +437,9 @@ static void xsk_unbind_dev(struct xdp_sock *xs)
> {
> struct net_device *dev = xs->dev;
>
> - if (!dev || xs->state != XSK_BOUND)
> + if (xs->state != XSK_BOUND)
> return;
> -
> - xs->state = XSK_UNBOUND;
> + WRITE_ONCE(xs->state, XSK_UNBOUND);
>
> /* Wait for driver to stop using the xdp socket. */
> xdp_del_sk_umem(xs->umem, xs);
> @@ -495,7 +514,9 @@ static int xsk_release(struct socket *sock)
> local_bh_enable();
>
> xsk_delete_from_maps(xs);
> + mutex_lock(&xs->mutex);
> xsk_unbind_dev(xs);
> + mutex_unlock(&xs->mutex);
>
> xskq_destroy(xs->rx);
> xskq_destroy(xs->tx);
> @@ -589,19 +610,18 @@ static int xsk_bind(struct socket *sock, struct
> sockaddr *addr, int addr_len)
> }
>
> umem_xs = xdp_sk(sock->sk);
> - if (!umem_xs->umem) {
> - /* No umem to inherit. */
> + if (!xsk_is_bound(umem_xs)) {
> err = -EBADF;
> sockfd_put(sock);
> goto out_unlock;
> - } else if (umem_xs->dev != dev || umem_xs->queue_id != qid) {
> + }
> + if (umem_xs->dev != dev || umem_xs->queue_id != qid) {
> err = -EINVAL;
> sockfd_put(sock);
> goto out_unlock;
> }
> -
> xdp_get_umem(umem_xs->umem);
> - xs->umem = umem_xs->umem;
> + WRITE_ONCE(xs->umem, umem_xs->umem);
> sockfd_put(sock);
> } else if (!xs->umem || !xdp_umem_validate_queues(xs->umem)) {
> err = -EINVAL;
> @@ -626,10 +646,15 @@ static int xsk_bind(struct socket *sock, struct
> sockaddr *addr, int addr_len)
> xdp_add_sk_umem(xs->umem, xs);
>
> out_unlock:
> - if (err)
> + if (err) {
> dev_put(dev);
> - else
> - xs->state = XSK_BOUND;
> + } else {
> + /* Matches smp_rmb() in bind() for shared umem
> + * sockets, and xsk_is_bound().
> + */
> + smp_wmb();
> + WRITE_ONCE(xs->state, XSK_BOUND);
> + }
> out_release:
> mutex_unlock(&xs->mutex);
> rtnl_unlock();
> @@ -869,7 +894,7 @@ static int xsk_mmap(struct file *file, struct
> socket *sock,
> unsigned long pfn;
> struct page *qpg;
>
> - if (xs->state != XSK_READY)
> + if (READ_ONCE(xs->state) != XSK_READY)
> return -EBUSY;
>
> if (offset == XDP_PGOFF_RX_RING) {
> --
> 2.20.1
^ permalink raw reply
* Re: [PATCH bpf-next 4/4] xsk: lock the control mutex in sock_diag interface
From: Jonathan Lemon @ 2019-08-23 16:44 UTC (permalink / raw)
To: Björn Töpel
Cc: ast, daniel, netdev, Björn Töpel, magnus.karlsson,
magnus.karlsson, bpf, syzbot+c82697e3043781e08802, hdanton,
i.maximets
In-Reply-To: <20190822091306.20581-5-bjorn.topel@gmail.com>
On 22 Aug 2019, at 2:13, Björn Töpel wrote:
> From: Björn Töpel <bjorn.topel@intel.com>
>
> When accessing the members of an XDP socket, the control mutex should
> be held. This commit fixes that.
>
> Fixes: a36b38aa2af6 ("xsk: add sock_diag interface for AF_XDP")
> Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
^ permalink raw reply
* Re: [PATCH bpf-next 3/4] xsk: avoid store-tearing when assigning umem
From: Jonathan Lemon @ 2019-08-23 16:44 UTC (permalink / raw)
To: Björn Töpel
Cc: ast, daniel, netdev, Björn Töpel, magnus.karlsson,
magnus.karlsson, bpf, syzbot+c82697e3043781e08802, hdanton,
i.maximets
In-Reply-To: <20190822091306.20581-4-bjorn.topel@gmail.com>
On 22 Aug 2019, at 2:13, Björn Töpel wrote:
> From: Björn Töpel <bjorn.topel@intel.com>
>
> The umem member of struct xdp_sock is read outside of the control
> mutex, in the mmap implementation, and needs a WRITE_ONCE to avoid
> potentional store-tearing.
>
> Fixes: 423f38329d26 ("xsk: add umem fill queue support and mmap")
> Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
^ permalink raw reply
* Re: [PATCH bpf-next 1/4] xsk: avoid store-tearing when assigning queues
From: Jonathan Lemon @ 2019-08-23 16:43 UTC (permalink / raw)
To: Björn Töpel
Cc: ast, daniel, netdev, Björn Töpel, magnus.karlsson,
magnus.karlsson, bpf, syzbot+c82697e3043781e08802, hdanton,
i.maximets
In-Reply-To: <20190822091306.20581-2-bjorn.topel@gmail.com>
On 22 Aug 2019, at 2:13, Björn Töpel wrote:
> From: Björn Töpel <bjorn.topel@intel.com>
>
> Use WRITE_ONCE when doing the store of tx, rx, fq, and cq, to avoid
> potential store-tearing. These members are read outside of the control
> mutex in the mmap implementation.
>
> Fixes: 37b076933a8e ("xsk: add missing write- and data-dependency barrier")
> Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
^ permalink raw reply
* Re: [PATCH net v2] openvswitch: Fix conntrack cache with timeout
From: Yi-Hung Wei @ 2019-08-23 16:40 UTC (permalink / raw)
To: Pravin Shelar; +Cc: Linux Kernel Network Developers
In-Reply-To: <CAOrHB_A6Hn9o=8uzHQTp=cttMQsf=dYpobvq7C7_W398sw8UJA@mail.gmail.com>
On Thu, Aug 22, 2019 at 11:51 PM Pravin Shelar <pshelar@ovn.org> wrote:
>
> On Thu, Aug 22, 2019 at 1:28 PM Yi-Hung Wei <yihung.wei@gmail.com> wrote:
> >
> > This patch addresses a conntrack cache issue with timeout policy.
> > Currently, we do not check if the timeout extension is set properly in the
> > cached conntrack entry. Thus, after packet recirculate from conntrack
> > action, the timeout policy is not applied properly. This patch fixes the
> > aforementioned issue.
> >
> > Fixes: 06bd2bdf19d2 ("openvswitch: Add timeout support to ct action")
> > Reported-by: kbuild test robot <lkp@intel.com>
> > Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
> > ---
> > v1->v2: Fix rcu dereference issue reported by kbuild test robot.
> > ---
> > net/openvswitch/conntrack.c | 13 +++++++++++++
> > 1 file changed, 13 insertions(+)
> >
> > diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
> > index 848c6eb55064..4d7896135e73 100644
> > --- a/net/openvswitch/conntrack.c
> > +++ b/net/openvswitch/conntrack.c
> > @@ -1657,6 +1666,10 @@ int ovs_ct_copy_action(struct net *net, const struct nlattr *attr,
> > ct_info.timeout))
> > pr_info_ratelimited("Failed to associated timeout "
> > "policy `%s'\n", ct_info.timeout);
> > + else
> > + ct_info.nf_ct_timeout = rcu_dereference(
> > + nf_ct_timeout_find(ct_info.ct)->timeout);
> Is this dereference safe from NULL pointer?
Hi Pravin,
Thanks for your review. I am not sure if
nf_ct_timeout_find(ct_info.ct) will return NULL in this case.
We only run into this statement when ct_info.timeout[0] is set, and it
is only set in parse_ct() when CONFIG_NF_CONNTRACK_TIMEOUT is
configured. Also, in this else condition the timeout extension is
supposed to be set properly by nf_ct_set_timeout().
Am I missing something?
Thanks,
-Yi-Hung
^ permalink raw reply
* [PATCH] net: fix skb use after free in netpoll_send_skb_on_dev
From: Feng Sun @ 2019-08-23 16:32 UTC (permalink / raw)
To: davem
Cc: edumazet, dsterba, dbanerje, fw, davej, tglx, matwey,
sakari.ailus, netdev, linux-kernel, Feng Sun, Xiaojun Zhao
After commit baeababb5b85d5c4e6c917efe2a1504179438d3b
("tun: return NET_XMIT_DROP for dropped packets"),
when tun_net_xmit drop packets, it will free skb and return NET_XMIT_DROP,
netpoll_send_skb_on_dev will run into two use after free cases:
1. retry netpoll_start_xmit with freed skb;
2. queue freed skb in npinfo->txq.
hit the first case with following kernel log:
[ 117.864773] kernel BUG at mm/slub.c:306!
[ 117.864773] invalid opcode: 0000 [#1] SMP PTI
[ 117.864774] CPU: 3 PID: 2627 Comm: loop_printmsg Kdump: loaded Tainted: P OE 5.3.0-050300rc5-generic #201908182231
[ 117.864775] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
[ 117.864775] RIP: 0010:kmem_cache_free+0x28d/0x2b0
[ 117.864781] Call Trace:
[ 117.864781] ? tun_net_xmit+0x21c/0x460
[ 117.864781] kfree_skbmem+0x4e/0x60
[ 117.864782] kfree_skb+0x3a/0xa0
[ 117.864782] tun_net_xmit+0x21c/0x460
[ 117.864782] netpoll_start_xmit+0x11d/0x1b0
[ 117.864788] netpoll_send_skb_on_dev+0x1b8/0x200
[ 117.864789] __br_forward+0x1b9/0x1e0 [bridge]
[ 117.864789] ? skb_clone+0x53/0xd0
[ 117.864790] ? __skb_clone+0x2e/0x120
[ 117.864790] deliver_clone+0x37/0x50 [bridge]
[ 117.864790] maybe_deliver+0x89/0xc0 [bridge]
[ 117.864791] br_flood+0x6c/0x130 [bridge]
[ 117.864791] br_dev_xmit+0x315/0x3c0 [bridge]
[ 117.864792] netpoll_start_xmit+0x11d/0x1b0
[ 117.864792] netpoll_send_skb_on_dev+0x1b8/0x200
[ 117.864792] netpoll_send_udp+0x2c6/0x3e8
[ 117.864793] write_msg+0xd9/0xf0 [netconsole]
[ 117.864793] console_unlock+0x386/0x4e0
[ 117.864793] vprintk_emit+0x17e/0x280
[ 117.864794] vprintk_default+0x29/0x50
[ 117.864794] vprintk_func+0x4c/0xbc
[ 117.864794] printk+0x58/0x6f
[ 117.864795] loop_fun+0x24/0x41 [printmsg_loop]
[ 117.864795] kthread+0x104/0x140
[ 117.864795] ? 0xffffffffc05b1000
[ 117.864796] ? kthread_park+0x80/0x80
[ 117.864796] ret_from_fork+0x35/0x40
Signed-off-by: Feng Sun <loyou85@gmail.com>
Signed-off-by: Xiaojun Zhao <xiaojunzhao141@gmail.com>
---
net/core/netpoll.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/core/netpoll.c b/net/core/netpoll.c
index 2cf27da..b4bffe6 100644
--- a/net/core/netpoll.c
+++ b/net/core/netpoll.c
@@ -335,7 +335,7 @@ void netpoll_send_skb_on_dev(struct netpoll *np, struct sk_buff *skb,
HARD_TX_UNLOCK(dev, txq);
- if (status == NETDEV_TX_OK)
+ if (status == NETDEV_TX_OK || status == NET_XMIT_DROP)
break;
}
@@ -352,7 +352,7 @@ void netpoll_send_skb_on_dev(struct netpoll *np, struct sk_buff *skb,
}
- if (status != NETDEV_TX_OK) {
+ if (status != NETDEV_TX_OK && status != NET_XMIT_DROP) {
skb_queue_tail(&npinfo->txq, skb);
schedule_delayed_work(&npinfo->tx_work,0);
}
--
2.7.4
^ permalink raw reply related
* Re: [PATCH net-next 2/6] net: dsa: do not skip -EOPNOTSUPP in dsa_port_vid_add
From: Vladimir Oltean @ 2019-08-23 16:32 UTC (permalink / raw)
To: Florian Fainelli; +Cc: Vivien Didelot, netdev, David S. Miller, Andrew Lunn
In-Reply-To: <2a43ee4c-0e20-1037-d856-3945d516ea7b@gmail.com>
Hi Florian,
On Fri, 23 Aug 2019 at 19:23, Florian Fainelli <f.fainelli@gmail.com> wrote:
>
> On 8/22/19 4:44 PM, Vladimir Oltean wrote:
> > On Fri, 23 Aug 2019 at 02:43, Vivien Didelot <vivien.didelot@gmail.com> wrote:
> >>
> >> Hi Vladimir,
> >>
> >> On Fri, 23 Aug 2019 01:06:58 +0300, Vladimir Oltean <olteanv@gmail.com> wrote:
> >>> Hi Vivien,
> >>>
> >>> On 8/22/19 11:13 PM, Vivien Didelot wrote:
> >>>> Currently dsa_port_vid_add returns 0 if the switch returns -EOPNOTSUPP.
> >>>>
> >>>> This function is used in the tag_8021q.c code to offload the PVID of
> >>>> ports, which would simply not work if .port_vlan_add is not supported
> >>>> by the underlying switch.
> >>>>
> >>>> Do not skip -EOPNOTSUPP in dsa_port_vid_add but only when necessary,
> >>>> that is to say in dsa_slave_vlan_rx_add_vid.
> >>>>
> >>>
> >>> Do you know why Florian suppressed -EOPNOTSUPP in 061f6a505ac3 ("net:
> >>> dsa: Add ndo_vlan_rx_{add, kill}_vid implementation")?
> >>> I forced a return value of -EOPNOTSUPP here and when I create a VLAN
> >>> sub-interface nothing breaks, it just says:
> >>> RTNETLINK answers: Operation not supported
> >>> which IMO is expected.
> >>
> >> I do not know what you mean. This patch does not change the behavior of
> >> dsa_slave_vlan_rx_add_vid, which returns 0 if -EOPNOTSUPP is caught.
> >>
> >
> > Yes, but what's wrong with just forwarding -EOPNOTSUPP?
>
> It makes us fail adding the VLAN to the slave network device, which
> sounds silly, if we can't offload it in HW, that's fine, we can still do
> a SW VLAN instead, see net/8021q/vlan_core.c::vlan_add_rx_filter_info().
>
> Maybe a more correct solution is to set the NETIF_F_HW_VLAN_CTAG_FILTER
> feature bit only if we have the ability to offload, now that I think
> about it. Would you want me to cook a patch doing that?
sja1105 doesn't support offloading NETIF_F_HW_VLAN_CTAG_FILTER even
though it does support programming VLANs.
Adding an offloaded VLAN sub-interface on a standalone switch port
(vlan_filtering=0, uses dsa_8021q) would make the driver insert a VLAN
entry whilst the TPID is ETH_P_DSA_8021Q.
Maybe just let the driver set the netdev features, similar to how it
does for the number of TX queues?
> --
> Florian
Regards,
-Vladimir
^ permalink raw reply
* Re: [PATCH net-next] dpaa2-eth: Add pause frame support
From: Andrew Lunn @ 2019-08-23 16:30 UTC (permalink / raw)
To: Ioana Radulescu; +Cc: netdev, davem, ioana.ciornei
In-Reply-To: <1566573579-9940-1-git-send-email-ruxandra.radulescu@nxp.com>
> --- a/drivers/net/ethernet/freescale/dpaa2/dpaa2-ethtool.c
> +++ b/drivers/net/ethernet/freescale/dpaa2/dpaa2-ethtool.c
> @@ -78,29 +78,20 @@ static int
> dpaa2_eth_get_link_ksettings(struct net_device *net_dev,
> struct ethtool_link_ksettings *link_settings)
> {
> - struct dpni_link_state state = {0};
> - int err = 0;
> struct dpaa2_eth_priv *priv = netdev_priv(net_dev);
>
> - err = dpni_get_link_state(priv->mc_io, 0, priv->mc_token, &state);
> - if (err) {
> - netdev_err(net_dev, "ERROR %d getting link state\n", err);
> - goto out;
> - }
> -
> /* At the moment, we have no way of interrogating the DPMAC
> * from the DPNI side - and for that matter there may exist
> * no DPMAC at all. So for now we just don't report anything
> * beyond the DPNI attributes.
> */
> - if (state.options & DPNI_LINK_OPT_AUTONEG)
> + if (priv->link_state.options & DPNI_LINK_OPT_AUTONEG)
> link_settings->base.autoneg = AUTONEG_ENABLE;
> - if (!(state.options & DPNI_LINK_OPT_HALF_DUPLEX))
> + if (!(priv->link_state.options & DPNI_LINK_OPT_HALF_DUPLEX))
> link_settings->base.duplex = DUPLEX_FULL;
> - link_settings->base.speed = state.rate;
> + link_settings->base.speed = priv->link_state.rate;
>
> -out:
> - return err;
> + return 0;
Hi Ioana
I think this patch can be broken up a bit, to help review.
It looks like this change to report state via priv->link_state should
be a separate patch. I think this change can be made without the pause
changes. That then makes the pause changes themselves simpler.
> +static void dpaa2_eth_get_pauseparam(struct net_device *net_dev,
> + struct ethtool_pauseparam *pause)
> +{
> + struct dpaa2_eth_priv *priv = netdev_priv(net_dev);
> + u64 link_options = priv->link_state.options;
> +
> + pause->rx_pause = !!(link_options & DPNI_LINK_OPT_PAUSE);
> + pause->tx_pause = pause->rx_pause ^
> + !!(link_options & DPNI_LINK_OPT_ASYM_PAUSE);
Since you don't support auto-neg, you should set pause->autoneg to
false. It probably already is set to false, by a memset, but setting
it explicitly is a form of documentation, this hardware only supports
forced pause configuration.
> +}
> +
> +static int dpaa2_eth_set_pauseparam(struct net_device *net_dev,
> + struct ethtool_pauseparam *pause)
> +{
> + struct dpaa2_eth_priv *priv = netdev_priv(net_dev);
> + struct dpni_link_cfg cfg = {0};
> + int err;
> +
> + if (!dpaa2_eth_has_pause_support(priv)) {
> + netdev_info(net_dev, "No pause frame support for DPNI version < %d.%d\n",
> + DPNI_PAUSE_VER_MAJOR, DPNI_PAUSE_VER_MINOR);
> + return -EOPNOTSUPP;
> + }
> +
> + if (pause->autoneg)
> + netdev_err(net_dev, "Pause frame autoneg not supported\n");
And here you should return -EOPNOTSUPP. No need for an netdev_err. It
is not an error, you simply don't support it.
There is also the issue of what is the PHY doing? It would not be good
if the MAC is configured one way, but the PHY is advertising something
else. So it appears you have no control over the PHY. But i assume you
know what the PHY is actually doing? Does it advertise pause/asym
pause?
It might be, to keep things consistent, we only accept pause
configuration when auto-neg in general is disabled.
Andrew
^ permalink raw reply
* Re: [PATCH net-next 2/6] net: dsa: do not skip -EOPNOTSUPP in dsa_port_vid_add
From: Florian Fainelli @ 2019-08-23 16:23 UTC (permalink / raw)
To: Vladimir Oltean, Vivien Didelot; +Cc: netdev, David S. Miller, Andrew Lunn
In-Reply-To: <CA+h21hpgCJ9oKwQxzu62hmvyCOyDZ52R5fYnejprGHWeZR7L6Q@mail.gmail.com>
On 8/22/19 4:44 PM, Vladimir Oltean wrote:
> On Fri, 23 Aug 2019 at 02:43, Vivien Didelot <vivien.didelot@gmail.com> wrote:
>>
>> Hi Vladimir,
>>
>> On Fri, 23 Aug 2019 01:06:58 +0300, Vladimir Oltean <olteanv@gmail.com> wrote:
>>> Hi Vivien,
>>>
>>> On 8/22/19 11:13 PM, Vivien Didelot wrote:
>>>> Currently dsa_port_vid_add returns 0 if the switch returns -EOPNOTSUPP.
>>>>
>>>> This function is used in the tag_8021q.c code to offload the PVID of
>>>> ports, which would simply not work if .port_vlan_add is not supported
>>>> by the underlying switch.
>>>>
>>>> Do not skip -EOPNOTSUPP in dsa_port_vid_add but only when necessary,
>>>> that is to say in dsa_slave_vlan_rx_add_vid.
>>>>
>>>
>>> Do you know why Florian suppressed -EOPNOTSUPP in 061f6a505ac3 ("net:
>>> dsa: Add ndo_vlan_rx_{add, kill}_vid implementation")?
>>> I forced a return value of -EOPNOTSUPP here and when I create a VLAN
>>> sub-interface nothing breaks, it just says:
>>> RTNETLINK answers: Operation not supported
>>> which IMO is expected.
>>
>> I do not know what you mean. This patch does not change the behavior of
>> dsa_slave_vlan_rx_add_vid, which returns 0 if -EOPNOTSUPP is caught.
>>
>
> Yes, but what's wrong with just forwarding -EOPNOTSUPP?
It makes us fail adding the VLAN to the slave network device, which
sounds silly, if we can't offload it in HW, that's fine, we can still do
a SW VLAN instead, see net/8021q/vlan_core.c::vlan_add_rx_filter_info().
Maybe a more correct solution is to set the NETIF_F_HW_VLAN_CTAG_FILTER
feature bit only if we have the ability to offload, now that I think
about it. Would you want me to cook a patch doing that?
--
Florian
^ permalink raw reply
* Re: [PATCH net-next] drop_monitor: Make timestamps y2038 safe
From: Neil Horman @ 2019-08-23 16:17 UTC (permalink / raw)
To: Ido Schimmel; +Cc: netdev, davem, arnd, andrew, ayal, mlxsw, Ido Schimmel
In-Reply-To: <20190823154721.9927-1-idosch@idosch.org>
On Fri, Aug 23, 2019 at 06:47:21PM +0300, Ido Schimmel wrote:
> From: Ido Schimmel <idosch@mellanox.com>
>
> Timestamps are currently communicated to user space as 'struct
> timespec', which is not considered y2038 safe since it uses a 32-bit
> signed value for seconds.
>
> Fix this while the API is still not part of any official kernel release
> by using 64-bit nanoseconds timestamps instead.
>
> Fixes: ca30707dee2b ("drop_monitor: Add packet alert mode")
> Fixes: 5e58109b1ea4 ("drop_monitor: Add support for packet alert mode for hardware drops")
> Signed-off-by: Ido Schimmel <idosch@mellanox.com>
> ---
> Arnd, I have followed your recommendation to use 64-bit nanoseconds
> timestamps. I would appreciate it if you could review this change.
>
> Thanks!
> ---
> include/uapi/linux/net_dropmon.h | 2 +-
> net/core/drop_monitor.c | 14 ++++++--------
> 2 files changed, 7 insertions(+), 9 deletions(-)
>
> diff --git a/include/uapi/linux/net_dropmon.h b/include/uapi/linux/net_dropmon.h
> index 75a35dccb675..8bf79a9eb234 100644
> --- a/include/uapi/linux/net_dropmon.h
> +++ b/include/uapi/linux/net_dropmon.h
> @@ -75,7 +75,7 @@ enum net_dm_attr {
> NET_DM_ATTR_PC, /* u64 */
> NET_DM_ATTR_SYMBOL, /* string */
> NET_DM_ATTR_IN_PORT, /* nested */
> - NET_DM_ATTR_TIMESTAMP, /* struct timespec */
> + NET_DM_ATTR_TIMESTAMP, /* u64 */
> NET_DM_ATTR_PROTO, /* u16 */
> NET_DM_ATTR_PAYLOAD, /* binary */
> NET_DM_ATTR_PAD,
> diff --git a/net/core/drop_monitor.c b/net/core/drop_monitor.c
> index bfc024024aa3..cc60cc22e2db 100644
> --- a/net/core/drop_monitor.c
> +++ b/net/core/drop_monitor.c
> @@ -552,7 +552,7 @@ static size_t net_dm_packet_report_size(size_t payload_len)
> /* NET_DM_ATTR_IN_PORT */
> net_dm_in_port_size() +
> /* NET_DM_ATTR_TIMESTAMP */
> - nla_total_size(sizeof(struct timespec)) +
> + nla_total_size(sizeof(u64)) +
> /* NET_DM_ATTR_ORIG_LEN */
> nla_total_size(sizeof(u32)) +
> /* NET_DM_ATTR_PROTO */
> @@ -592,7 +592,6 @@ static int net_dm_packet_report_fill(struct sk_buff *msg, struct sk_buff *skb,
> u64 pc = (u64)(uintptr_t) NET_DM_SKB_CB(skb)->pc;
> char buf[NET_DM_MAX_SYMBOL_LEN];
> struct nlattr *attr;
> - struct timespec ts;
> void *hdr;
> int rc;
>
> @@ -615,8 +614,8 @@ static int net_dm_packet_report_fill(struct sk_buff *msg, struct sk_buff *skb,
> if (rc)
> goto nla_put_failure;
>
> - if (ktime_to_timespec_cond(skb->tstamp, &ts) &&
> - nla_put(msg, NET_DM_ATTR_TIMESTAMP, sizeof(ts), &ts))
> + if (nla_put_u64_64bit(msg, NET_DM_ATTR_TIMESTAMP,
> + ktime_to_ns(skb->tstamp), NET_DM_ATTR_PAD))
> goto nla_put_failure;
>
> if (nla_put_u32(msg, NET_DM_ATTR_ORIG_LEN, skb->len))
> @@ -716,7 +715,7 @@ net_dm_hw_packet_report_size(size_t payload_len,
> /* NET_DM_ATTR_IN_PORT */
> net_dm_in_port_size() +
> /* NET_DM_ATTR_TIMESTAMP */
> - nla_total_size(sizeof(struct timespec)) +
> + nla_total_size(sizeof(u64)) +
> /* NET_DM_ATTR_ORIG_LEN */
> nla_total_size(sizeof(u32)) +
> /* NET_DM_ATTR_PROTO */
> @@ -730,7 +729,6 @@ static int net_dm_hw_packet_report_fill(struct sk_buff *msg,
> {
> struct net_dm_hw_metadata *hw_metadata;
> struct nlattr *attr;
> - struct timespec ts;
> void *hdr;
>
> hw_metadata = NET_DM_SKB_CB(skb)->hw_metadata;
> @@ -761,8 +759,8 @@ static int net_dm_hw_packet_report_fill(struct sk_buff *msg,
> goto nla_put_failure;
> }
>
> - if (ktime_to_timespec_cond(skb->tstamp, &ts) &&
> - nla_put(msg, NET_DM_ATTR_TIMESTAMP, sizeof(ts), &ts))
> + if (nla_put_u64_64bit(msg, NET_DM_ATTR_TIMESTAMP,
> + ktime_to_ns(skb->tstamp), NET_DM_ATTR_PAD))
> goto nla_put_failure;
>
> if (nla_put_u32(msg, NET_DM_ATTR_ORIG_LEN, skb->len))
> --
> 2.21.0
>
>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
^ permalink raw reply
* RE: [PATCH v2 0/2] Simplify mtty driver and mdev core
From: Parav Pandit @ 2019-08-23 16:14 UTC (permalink / raw)
To: Alex Williamson
Cc: Jiri Pirko, Jiri Pirko, David S . Miller, Kirti Wankhede,
Cornelia Huck, kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
cjia, netdev@vger.kernel.org
In-Reply-To: <20190823095229.210e1e84@x1.home>
> -----Original Message-----
> From: Alex Williamson <alex.williamson@redhat.com>
> Sent: Friday, August 23, 2019 9:22 PM
> To: Parav Pandit <parav@mellanox.com>
> Cc: Jiri Pirko <jiri@resnulli.us>; Jiri Pirko <jiri@mellanox.com>; David S . Miller
> <davem@davemloft.net>; Kirti Wankhede <kwankhede@nvidia.com>; Cornelia
> Huck <cohuck@redhat.com>; kvm@vger.kernel.org; linux-
> kernel@vger.kernel.org; cjia <cjia@nvidia.com>; netdev@vger.kernel.org
> Subject: Re: [PATCH v2 0/2] Simplify mtty driver and mdev core
>
> On Fri, 23 Aug 2019 14:53:06 +0000
> Parav Pandit <parav@mellanox.com> wrote:
>
> > > -----Original Message-----
> > > From: Alex Williamson <alex.williamson@redhat.com>
> > > Sent: Friday, August 23, 2019 7:58 PM
> > > To: Parav Pandit <parav@mellanox.com>
> > > Cc: Jiri Pirko <jiri@resnulli.us>; Jiri Pirko <jiri@mellanox.com>;
> > > David S . Miller <davem@davemloft.net>; Kirti Wankhede
> > > <kwankhede@nvidia.com>; Cornelia Huck <cohuck@redhat.com>;
> > > kvm@vger.kernel.org; linux- kernel@vger.kernel.org; cjia
> > > <cjia@nvidia.com>; netdev@vger.kernel.org
> > > Subject: Re: [PATCH v2 0/2] Simplify mtty driver and mdev core
> > >
> > > On Fri, 23 Aug 2019 08:14:39 +0000
> > > Parav Pandit <parav@mellanox.com> wrote:
> > >
> > > > Hi Alex,
> > > >
> > > >
> > > > > -----Original Message-----
> > > > > From: Jiri Pirko <jiri@resnulli.us>
> > > > > Sent: Friday, August 23, 2019 1:42 PM
> > > > > To: Parav Pandit <parav@mellanox.com>
> > > > > Cc: Alex Williamson <alex.williamson@redhat.com>; Jiri Pirko
> > > > > <jiri@mellanox.com>; David S . Miller <davem@davemloft.net>;
> > > > > Kirti Wankhede <kwankhede@nvidia.com>; Cornelia Huck
> > > <cohuck@redhat.com>;
> > > > > kvm@vger.kernel.org; linux-kernel@vger.kernel.org; cjia
> > > > > <cjia@nvidia.com>; netdev@vger.kernel.org
> > > > > Subject: Re: [PATCH v2 0/2] Simplify mtty driver and mdev core
> > > > >
> > > > > Thu, Aug 22, 2019 at 03:33:30PM CEST, parav@mellanox.com wrote:
> > > > > >
> > > > > >
> > > > > >> -----Original Message-----
> > > > > >> From: Jiri Pirko <jiri@resnulli.us>
> > > > > >> Sent: Thursday, August 22, 2019 5:50 PM
> > > > > >> To: Parav Pandit <parav@mellanox.com>
> > > > > >> Cc: Alex Williamson <alex.williamson@redhat.com>; Jiri Pirko
> > > > > >> <jiri@mellanox.com>; David S . Miller <davem@davemloft.net>;
> > > > > >> Kirti Wankhede <kwankhede@nvidia.com>; Cornelia Huck
> > > > > <cohuck@redhat.com>;
> > > > > >> kvm@vger.kernel.org; linux-kernel@vger.kernel.org; cjia
> > > > > >> <cjia@nvidia.com>; netdev@vger.kernel.org
> > > > > >> Subject: Re: [PATCH v2 0/2] Simplify mtty driver and mdev
> > > > > >> core
> > > > > >>
> > > > > >> Thu, Aug 22, 2019 at 12:04:02PM CEST, parav@mellanox.com wrote:
> > > > > >> >
> > > > > >> >
> > > > > >> >> -----Original Message-----
> > > > > >> >> From: Jiri Pirko <jiri@resnulli.us>
> > > > > >> >> Sent: Thursday, August 22, 2019 3:28 PM
> > > > > >> >> To: Parav Pandit <parav@mellanox.com>
> > > > > >> >> Cc: Alex Williamson <alex.williamson@redhat.com>; Jiri
> > > > > >> >> Pirko <jiri@mellanox.com>; David S . Miller
> > > > > >> >> <davem@davemloft.net>; Kirti Wankhede
> > > > > >> >> <kwankhede@nvidia.com>; Cornelia Huck
> > > > > >> <cohuck@redhat.com>;
> > > > > >> >> kvm@vger.kernel.org; linux-kernel@vger.kernel.org; cjia
> > > > > >> >> <cjia@nvidia.com>; netdev@vger.kernel.org
> > > > > >> >> Subject: Re: [PATCH v2 0/2] Simplify mtty driver and mdev
> > > > > >> >> core
> > > > > >> >>
> > > > > >> >> Thu, Aug 22, 2019 at 11:42:13AM CEST, parav@mellanox.com
> wrote:
> > > > > >> >> >
> > > > > >> >> >
> > > > > >> >> >> -----Original Message-----
> > > > > >> >> >> From: Jiri Pirko <jiri@resnulli.us>
> > > > > >> >> >> Sent: Thursday, August 22, 2019 2:59 PM
> > > > > >> >> >> To: Parav Pandit <parav@mellanox.com>
> > > > > >> >> >> Cc: Alex Williamson <alex.williamson@redhat.com>; Jiri
> > > > > >> >> >> Pirko <jiri@mellanox.com>; David S . Miller
> > > > > >> >> >> <davem@davemloft.net>; Kirti Wankhede
> > > > > >> >> >> <kwankhede@nvidia.com>; Cornelia Huck
> > > > > >> >> <cohuck@redhat.com>;
> > > > > >> >> >> kvm@vger.kernel.org; linux-kernel@vger.kernel.org; cjia
> > > > > >> >> >> <cjia@nvidia.com>; netdev@vger.kernel.org
> > > > > >> >> >> Subject: Re: [PATCH v2 0/2] Simplify mtty driver and
> > > > > >> >> >> mdev core
> > > > > >> >> >>
> > > > > >> >> >> Wed, Aug 21, 2019 at 08:23:17AM CEST,
> > > > > >> >> >> parav@mellanox.com
> > > wrote:
> > > > > >> >> >> >
> > > > > >> >> >> >
> > > > > >> >> >> >> -----Original Message-----
> > > > > >> >> >> >> From: Alex Williamson <alex.williamson@redhat.com>
> > > > > >> >> >> >> Sent: Wednesday, August 21, 2019 10:56 AM
> > > > > >> >> >> >> To: Parav Pandit <parav@mellanox.com>
> > > > > >> >> >> >> Cc: Jiri Pirko <jiri@mellanox.com>; David S . Miller
> > > > > >> >> >> >> <davem@davemloft.net>; Kirti Wankhede
> > > > > >> >> >> >> <kwankhede@nvidia.com>; Cornelia Huck
> > > > > >> >> >> >> <cohuck@redhat.com>; kvm@vger.kernel.org;
> > > > > >> >> >> >> linux-kernel@vger.kernel.org; cjia
> > > > > >> >> >> >> <cjia@nvidia.com>; netdev@vger.kernel.org
> > > > > >> >> >> >> Subject: Re: [PATCH v2 0/2] Simplify mtty driver and
> > > > > >> >> >> >> mdev core
> > > > > >> >> >> >>
> > > > > >> >> >> >> > > > > Just an example of the alias, not proposing how it's
> set.
> > > > > >> >> >> >> > > > > In fact, proposing that the user does not
> > > > > >> >> >> >> > > > > set it, mdev-core provides one
> > > > > >> >> >> >> > > automatically.
> > > > > >> >> >> >> > > > >
> > > > > >> >> >> >> > > > > > > Since there seems to be some prefix
> > > > > >> >> >> >> > > > > > > overhead, as I ask about above in how
> > > > > >> >> >> >> > > > > > > many characters we actually have to work
> > > > > >> >> >> >> > > > > > > with in IFNAMESZ, maybe we start with
> > > > > >> >> >> >> > > > > > > 8 characters (matching your "index"
> > > > > >> >> >> >> > > > > > > namespace) and expand as necessary for
> > > > > >> >> >> disambiguation.
> > > > > >> >> >> >> > > > > > > If we can eliminate overhead in
> > > > > >> >> >> >> > > > > > > IFNAMESZ, let's start with
> > > > > >> 12.
> > > > > >> >> >> >> > > > > > > Thanks,
> > > > > >> >> >> >> > > > > > >
> > > > > >> >> >> >> > > > > > If user is going to choose the alias, why
> > > > > >> >> >> >> > > > > > does it have to be limited to
> > > > > >> >> >> >> sha1?
> > > > > >> >> >> >> > > > > > Or you just told it as an example?
> > > > > >> >> >> >> > > > > >
> > > > > >> >> >> >> > > > > > It can be an alpha-numeric string.
> > > > > >> >> >> >> > > > >
> > > > > >> >> >> >> > > > > No, I'm proposing a different solution where
> > > > > >> >> >> >> > > > > mdev-core creates an alias based on an
> > > > > >> >> >> >> > > > > abbreviated sha1. The user does not provide
> > > > > >> >> >> >> > > > > the
> > > > > >> >> >> >> alias.
> > > > > >> >> >> >> > > > >
> > > > > >> >> >> >> > > > > > Instead of mdev imposing number of
> > > > > >> >> >> >> > > > > > characters on the alias, it should be best
> > > > > >> >> >> >> > > > > left to the user.
> > > > > >> >> >> >> > > > > > Because in future if netdev improves on
> > > > > >> >> >> >> > > > > > the naming scheme, mdev will be
> > > > > >> >> >> >> > > > > limiting it, which is not right.
> > > > > >> >> >> >> > > > > > So not restricting alias size seems right to me.
> > > > > >> >> >> >> > > > > > User configuring mdev for networking
> > > > > >> >> >> >> > > > > > devices in a given kernel knows what
> > > > > >> >> >> >> > > > > user is doing.
> > > > > >> >> >> >> > > > > > So user can choose alias name size as it finds
> suitable.
> > > > > >> >> >> >> > > > >
> > > > > >> >> >> >> > > > > That's not what I'm proposing, please read again.
> > > > > >> >> >> >> > > > > Thanks,
> > > > > >> >> >> >> > > >
> > > > > >> >> >> >> > > > I understood your point. But mdev doesn't know
> > > > > >> >> >> >> > > > how user is going to use
> > > > > >> >> >> >> > > udev/systemd to name the netdev.
> > > > > >> >> >> >> > > > So even if mdev chose to pick 12 characters,
> > > > > >> >> >> >> > > > it could result in
> > > > > >> >> collision.
> > > > > >> >> >> >> > > > Hence the proposal to provide the alias by the
> > > > > >> >> >> >> > > > user, as user know the best
> > > > > >> >> >> >> > > policy for its use case in the environment its using.
> > > > > >> >> >> >> > > > So 12 character sha1 method will still work by user.
> > > > > >> >> >> >> > >
> > > > > >> >> >> >> > > Haven't you already provided examples where
> > > > > >> >> >> >> > > certain drivers or subsystems have unique netdev
> prefixes?
> > > > > >> >> >> >> > > If mdev provides a unique alias within the
> > > > > >> >> >> >> > > subsystem, couldn't we simply define a netdev
> > > > > >> >> >> >> > > prefix for the mdev subsystem and avoid all
> > > > > >> >> >> >> > > other collisions? I'm not in favor of the user
> > > > > >> >> >> >> > > providing both a uuid and an alias/instance.
> > > > > >> >> >> >> > > Thanks,
> > > > > >> >> >> >> > >
> > > > > >> >> >> >> > For a given prefix, say ens2f0, can two UUID->sha1
> > > > > >> >> >> >> > first 9 characters have
> > > > > >> >> >> >> collision?
> > > > > >> >> >> >>
> > > > > >> >> >> >> I think it would be a mistake to waste so many chars
> > > > > >> >> >> >> on a prefix, but
> > > > > >> >> >> >> 9 characters of sha1 likely wouldn't have a
> > > > > >> >> >> >> collision before we have 10s of thousands of
> > > > > >> >> >> >> devices. Thanks,
> > > > > >> >> >> >>
> > > > > >> >> >> >> Alex
> > > > > >> >> >> >
> > > > > >> >> >> >Jiri, Dave,
> > > > > >> >> >> >Are you ok with it for devlink/netdev part?
> > > > > >> >> >> >Mdev core will create an alias from a UUID.
> > > > > >> >> >> >
> > > > > >> >> >> >This will be supplied during devlink port attr set
> > > > > >> >> >> >such as,
> > > > > >> >> >> >
> > > > > >> >> >> >devlink_port_attrs_mdev_set(struct devlink_port *port,
> > > > > >> >> >> >const char *mdev_alias);
> > > > > >> >> >> >
> > > > > >> >> >> >This alias is used to generate representor netdev's
> > > phys_port_name.
> > > > > >> >> >> >This alias from the mdev device's sysfs will be used
> > > > > >> >> >> >by the udev/systemd to
> > > > > >> >> >> generate predicable netdev's name.
> > > > > >> >> >> >Example: enm<mdev_alias_first_12_chars>
> > > > > >> >> >>
> > > > > >> >> >> What happens in unlikely case of 2 UUIDs collide?
> > > > > >> >> >>
> > > > > >> >> >Since users sees two devices with same phys_port_name,
> > > > > >> >> >user should destroy
> > > > > >> >> recently created mdev and recreate mdev with different UUID?
> > > > > >> >>
> > > > > >> >> Driver should make sure phys port name wont collide,
> > > > > >> >So when mdev creation is initiated, mdev core calculates the
> > > > > >> >alias and if there
> > > > > >> is any other mdev with same alias exist, it returns -EEXIST
> > > > > >> error before progressing further.
> > > > > >> >This way user will get to know upfront in event of collision
> > > > > >> >before the mdev
> > > > > >> device gets created.
> > > > > >> >How about that?
> > > > > >>
> > > > > >> Sounds fine to me. Now the question is how many chars do we
> > > > > >> want to
> > > have.
> > > > > >>
> > > > > >12 characters from Alex's suggestion similar to git?
> > > > >
> > > > > Ok.
> > > > >
> > > >
> > > > Can you please confirm this scheme looks good now? I like to get
> > > > patches
> > > started.
> > >
> > > My only concern is your comment that in the event of an abbreviated
> > > sha1 collision (as exceptionally rare as that might be at 12-chars),
> > > we'd fail the device create, while my original suggestion was that
> > > vfio-core would add an extra character to the alias. For
> > > non-networking devices, the sha1 is unnecessary, so the extension
> > > behavior seems preferred. The user is only responsible to provide a
> > > unique uuid. Perhaps the failure behavior could be applied based on
> > > the mdev device_api. A module option on mdev to specify the default
> > > number of alias chars would also be useful for testing so that we
> > > can set it low enough to validate the collision behavior. Thanks,
> > >
> >
> > Idea is to have mdev alias as optional.
> > Each mdev_parent says whether it wants mdev_core to generate an alias
> > or not. So only networking device drivers would set it to true.
> > For rest, alias won't be generated, and won't be compared either
> > during creation time. User continue to provide only uuid.
>
> Ok
>
> > I am tempted to have alias collision detection only within children
> > mdevs of the same parent, but doing so will always mandate to prefix
> > in netdev name. And currently we are left with only 3 characters to
> > prefix it, so that may not be good either. Hence, I think mdev core
> > wide alias is better with 12 characters.
>
> I suppose it depends on the API, if the vendor driver can ask the mdev core for
> an alias as part of the device creation process, then it could manage the netdev
> namespace for all its devices, choosing how many characters to use, and fail
> the creation if it can't meet a uniqueness requirement. IOW, mdev-core would
> always provide a full sha1 and therefore gets itself out of the
> uniqueness/collision aspects.
>
This doesn't work. At mdev core level 20 bytes sha1 are unique, so mdev core allowed to create a mdev.
And then devlink core chooses only 6 bytes (12 characters) and there is collision. Things fall apart.
Since mdev provides unique uuid based scheme, it's the mdev core's ownership to provide unique aliases.
> > I do not understand how an extra character reduces collision, if
> > that's what you meant.
>
> If the default were for example 3-chars, we might already have device 'abc'. A
> collision would expose one more char of the new device, so we might add
> device with alias 'abcd'. I mentioned previously that this leaves an issue for
> userspace that we can't change the alias of device abc, so without additional
> information, userspace can only determine via elimination the mapping of alias
> to device, but userspace has more information available to it in the form of
> sysfs links.
>
> > Module options are almost not encouraged anymore with other
> > subsystems/drivers.
>
> We don't live in a world of absolutes. I agree that the defaults should work in
> the vast majority of cases. Requiring a user to twiddle module options to make
> things work is undesirable, verging on a bug. A module option to enable some
> specific feature, unsafe condition, or test that is outside of the typical use case
> is reasonable, imo.
>
> > For testing collision rate, a sample user space script and sample mtty
> > is easy and get us collision count too. We shouldn't put that using
> > module option in production kernel. I practically have the code ready
> > to play with; Changing 12 to smaller value is easy with module reload.
> >
> > #define MDEV_ALIAS_LEN 12
>
> If it can't be tested with a shipping binary, it probably won't be tested. Thanks,
>
It is not the role of mdev core to expose collision efficiency/deficiency of the sha1.
It can be tested outside before mdev choose to use it.
I am saying we should test with 12 characters with 10,000 or more devices and see how collision occurs.
Even if collision occurs, mdev returns EEXIST status indicating user to pick a different UUID for those rare conditions.
^ permalink raw reply
* Re: [PATCH] ethernet: Delete unnecessary checks before the macro call “dev_kfree_skb”
From: Scott Branden @ 2019-08-23 16:10 UTC (permalink / raw)
To: Christophe JAILLET, Markus Elfring, netdev, linux-arm-kernel,
linux-stm32, intel-wired-lan, bcm-kernel-feedback-list,
UNGLinuxDriver, Alexandre Torgue, Alexios Zavras, Allison Randal,
Bryan Whitehead, Claudiu Manoil, David S. Miller, Doug Berger,
Douglas Miller, Florian Fainelli, Giuseppe Cavallaro,
Greg Kroah-Hartman, Jeff Kirsher, Jilayne Lovejoy, Jonathan Lemon,
Jose Abreu, Kate Stewart
Cc: kernel-janitors, LKML
In-Reply-To: <4ab7f2a5-f472-f462-9d4c-7c8d5237c44e@wanadoo.fr>
On 2019-08-23 7:08 a.m., Christophe JAILLET wrote:
> Hi,
>
> in this patch, there is one piece that looked better before. (see below)
>
> Removing the 'if (skb)' is fine, but concatening everything in one
> statement just to save 2 variables and a few LOC is of no use, IMHO,
> and the code is less readable.
Agreed.
>
> just my 2c.
>
>
> CJ
>
>
> diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.c
> b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
> index d3a0b614dbfa..8b19ddcdafaa 100644
> --- a/drivers/net/ethernet/broadcom/genet/bcmgenet.c
> +++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
> @@ -2515,19 +2515,14 @@ static int bcmgenet_dma_teardown(struct
> bcmgenet_priv *priv)
> static void bcmgenet_fini_dma(struct bcmgenet_priv *priv)
> {
> struct netdev_queue *txq;
> - struct sk_buff *skb;
> - struct enet_cb *cb;
> int i;
>
> bcmgenet_fini_rx_napi(priv);
> bcmgenet_fini_tx_napi(priv);
>
> - for (i = 0; i < priv->num_tx_bds; i++) {
> - cb = priv->tx_cbs + i;
> - skb = bcmgenet_free_tx_cb(&priv->pdev->dev, cb);
> - if (skb)
> - dev_kfree_skb(skb);
> - }
> + for (i = 0; i < priv->num_tx_bds; i++)
> + dev_kfree_skb(bcmgenet_free_tx_cb(&priv->pdev->dev,
> + priv->tx_cbs + i));
>
> for (i = 0; i < priv->hw_params->tx_queues; i++) {
> txq = netdev_get_tx_queue(priv->dev, priv->tx_rings[i].queue);
^ permalink raw reply
* Re: libbpf distro packaging
From: Alexei Starovoitov @ 2019-08-23 16:00 UTC (permalink / raw)
To: Jiri Olsa, Julia Kartseva
Cc: Andrii Nakryiko, labbott@redhat.com, acme@kernel.org,
debian-kernel@lists.debian.org, netdev@vger.kernel.org,
Andrey Ignatov, Yonghong Song, jolsa@kernel.org, Daniel Borkmann
In-Reply-To: <20190823092253.GA20775@krava>
On 8/23/19 2:22 AM, Jiri Olsa wrote:
> btw, the libbpf GH repo tag v0.0.4 has 0.0.3 version set in Makefile:
>
> VERSION = 0
> PATCHLEVEL = 0
> EXTRAVERSION = 3
>
> current code takes version from libbpf.map so it's fine,
> but would be great to start from 0.0.5 so we don't need to
> bother with rpm patches.. is 0.0.5 planned soon?
Technically we can bump it at any time.
The goal was to bump it only when new kernel is released
to capture a collection of new APIs in a given 0.0.X release.
So that libbpf versions are synchronized with kernel versions
in some what loose way.
In this case we can make an exception and bump it now.
^ permalink raw reply
* Re: [PATCH 0/3] Add NETIF_F_HW_BRIDGE feature
From: Horatiu Vultur @ 2019-08-23 15:57 UTC (permalink / raw)
To: Andrew Lunn
Cc: Nikolay Aleksandrov, roopa, davem, UNGLinuxDriver,
alexandre.belloni, allan.nielsen, netdev, linux-kernel, bridge
In-Reply-To: <20190823132538.GO13020@lunn.ch>
Hi Andrew
The 08/23/2019 15:25, Andrew Lunn wrote:
> External E-Mail
>
>
> > > > Why do the devices have to be from the same driver ?
> > After seeing yours and Andrews comments I realize that I try to do two
> > things, but I have only explained one of them.
> >
> > Here is what I was trying to do:
> > A. Prevent ports from going into promisc mode, if it is not needed.
>
> The switch definition is promisc is a bit odd. You really need to
> split it into two use cases.
>
> The Linux interface is not a member of a bridge. In this case, promisc
> mode would mean all frames ingressing the port should be forwarded to
> the CPU. Without promisc, you can program the hardware to just accept
> frames with the interfaces MAC address. So this is just the usual
> behaviour of an interface.
Yes, this is well understood.
>
> When the interface is part of the bridge, then you can turn on all the
> learning and not forward frames to the CPU, unless the CPU asks for
> them. But you need to watch out for various flags. By default, you
> should flood to the CPU, unknown destinations to the CPU etc. But some
> of these can be turned off by flags.
>
> > B. Prevent adding the CPU to the flood-mask (in Ocelot we have a
> > flood-mask controlling who should be included when flooding due to
> > destination unknown).
>
> So destination unknown should be flooded to the CPU. The CPU might
> know where to send the frame.
Exactly the CPU should be in the flood mask by default. But if the
network driver knows that CPU will not forward it anywhere else, then it
is safe to remove the CPU from the flood mask. The network driver will
know this by monitoring which interfaces are added to the bridge.
>
> > To solve item "B", the network driver needs to detect if there is a
> > foreign interfaces added to the bridge. If that is the case then to add
> > the CPU port to the flooding mask otherwise no.
>
> It is not just a foreign interface. What about the MAC address on the
> bridge interface?
I think the network driver will get this event and it can install a
entry in the MAC table to copy the frames to the CPU.
>
> > > > This is too specific targeting some devices.
> > Maybe I was wrong to mention specific HW in the commit message. The
> > purpose of the patch was to add an optimization (not to copy all the
> > frames to the CPU) for HW that is capable of learning and flooding the
> > frames.
>
> To some extent, this is also tied to your hardware not learning MAC
> addresses from frames passed from the CPU. You should also consider
> fixing this. The SW bridge does send out notifications when it
> adds/removes MAC addresses to its tables. You probably want to receive
> this modifications, and use them to program your hardware to forward
> frames to the CPU when needed.
Yes we will fix this. We will listen to the notifications and then update
the HW so it would send those frames to CPU.
Maybe intially I should just resend this patch, with the changes that I
mention previously. And after that to send a new patch series with all
these remarks that you mention here Andrew.
>
> Andrew
>
--
/Horatiu
^ permalink raw reply
* Re: [PATCH v2 0/2] Simplify mtty driver and mdev core
From: Alex Williamson @ 2019-08-23 15:52 UTC (permalink / raw)
To: Parav Pandit
Cc: Jiri Pirko, Jiri Pirko, David S . Miller, Kirti Wankhede,
Cornelia Huck, kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
cjia, netdev@vger.kernel.org
In-Reply-To: <AM0PR05MB4866867150DAABA422F25FF8D1A40@AM0PR05MB4866.eurprd05.prod.outlook.com>
On Fri, 23 Aug 2019 14:53:06 +0000
Parav Pandit <parav@mellanox.com> wrote:
> > -----Original Message-----
> > From: Alex Williamson <alex.williamson@redhat.com>
> > Sent: Friday, August 23, 2019 7:58 PM
> > To: Parav Pandit <parav@mellanox.com>
> > Cc: Jiri Pirko <jiri@resnulli.us>; Jiri Pirko <jiri@mellanox.com>; David S . Miller
> > <davem@davemloft.net>; Kirti Wankhede <kwankhede@nvidia.com>; Cornelia
> > Huck <cohuck@redhat.com>; kvm@vger.kernel.org; linux-
> > kernel@vger.kernel.org; cjia <cjia@nvidia.com>; netdev@vger.kernel.org
> > Subject: Re: [PATCH v2 0/2] Simplify mtty driver and mdev core
> >
> > On Fri, 23 Aug 2019 08:14:39 +0000
> > Parav Pandit <parav@mellanox.com> wrote:
> >
> > > Hi Alex,
> > >
> > >
> > > > -----Original Message-----
> > > > From: Jiri Pirko <jiri@resnulli.us>
> > > > Sent: Friday, August 23, 2019 1:42 PM
> > > > To: Parav Pandit <parav@mellanox.com>
> > > > Cc: Alex Williamson <alex.williamson@redhat.com>; Jiri Pirko
> > > > <jiri@mellanox.com>; David S . Miller <davem@davemloft.net>; Kirti
> > > > Wankhede <kwankhede@nvidia.com>; Cornelia Huck
> > <cohuck@redhat.com>;
> > > > kvm@vger.kernel.org; linux-kernel@vger.kernel.org; cjia
> > > > <cjia@nvidia.com>; netdev@vger.kernel.org
> > > > Subject: Re: [PATCH v2 0/2] Simplify mtty driver and mdev core
> > > >
> > > > Thu, Aug 22, 2019 at 03:33:30PM CEST, parav@mellanox.com wrote:
> > > > >
> > > > >
> > > > >> -----Original Message-----
> > > > >> From: Jiri Pirko <jiri@resnulli.us>
> > > > >> Sent: Thursday, August 22, 2019 5:50 PM
> > > > >> To: Parav Pandit <parav@mellanox.com>
> > > > >> Cc: Alex Williamson <alex.williamson@redhat.com>; Jiri Pirko
> > > > >> <jiri@mellanox.com>; David S . Miller <davem@davemloft.net>;
> > > > >> Kirti Wankhede <kwankhede@nvidia.com>; Cornelia Huck
> > > > <cohuck@redhat.com>;
> > > > >> kvm@vger.kernel.org; linux-kernel@vger.kernel.org; cjia
> > > > >> <cjia@nvidia.com>; netdev@vger.kernel.org
> > > > >> Subject: Re: [PATCH v2 0/2] Simplify mtty driver and mdev core
> > > > >>
> > > > >> Thu, Aug 22, 2019 at 12:04:02PM CEST, parav@mellanox.com wrote:
> > > > >> >
> > > > >> >
> > > > >> >> -----Original Message-----
> > > > >> >> From: Jiri Pirko <jiri@resnulli.us>
> > > > >> >> Sent: Thursday, August 22, 2019 3:28 PM
> > > > >> >> To: Parav Pandit <parav@mellanox.com>
> > > > >> >> Cc: Alex Williamson <alex.williamson@redhat.com>; Jiri Pirko
> > > > >> >> <jiri@mellanox.com>; David S . Miller <davem@davemloft.net>;
> > > > >> >> Kirti Wankhede <kwankhede@nvidia.com>; Cornelia Huck
> > > > >> <cohuck@redhat.com>;
> > > > >> >> kvm@vger.kernel.org; linux-kernel@vger.kernel.org; cjia
> > > > >> >> <cjia@nvidia.com>; netdev@vger.kernel.org
> > > > >> >> Subject: Re: [PATCH v2 0/2] Simplify mtty driver and mdev core
> > > > >> >>
> > > > >> >> Thu, Aug 22, 2019 at 11:42:13AM CEST, parav@mellanox.com wrote:
> > > > >> >> >
> > > > >> >> >
> > > > >> >> >> -----Original Message-----
> > > > >> >> >> From: Jiri Pirko <jiri@resnulli.us>
> > > > >> >> >> Sent: Thursday, August 22, 2019 2:59 PM
> > > > >> >> >> To: Parav Pandit <parav@mellanox.com>
> > > > >> >> >> Cc: Alex Williamson <alex.williamson@redhat.com>; Jiri
> > > > >> >> >> Pirko <jiri@mellanox.com>; David S . Miller
> > > > >> >> >> <davem@davemloft.net>; Kirti Wankhede
> > > > >> >> >> <kwankhede@nvidia.com>; Cornelia Huck
> > > > >> >> <cohuck@redhat.com>;
> > > > >> >> >> kvm@vger.kernel.org; linux-kernel@vger.kernel.org; cjia
> > > > >> >> >> <cjia@nvidia.com>; netdev@vger.kernel.org
> > > > >> >> >> Subject: Re: [PATCH v2 0/2] Simplify mtty driver and mdev
> > > > >> >> >> core
> > > > >> >> >>
> > > > >> >> >> Wed, Aug 21, 2019 at 08:23:17AM CEST, parav@mellanox.com
> > wrote:
> > > > >> >> >> >
> > > > >> >> >> >
> > > > >> >> >> >> -----Original Message-----
> > > > >> >> >> >> From: Alex Williamson <alex.williamson@redhat.com>
> > > > >> >> >> >> Sent: Wednesday, August 21, 2019 10:56 AM
> > > > >> >> >> >> To: Parav Pandit <parav@mellanox.com>
> > > > >> >> >> >> Cc: Jiri Pirko <jiri@mellanox.com>; David S . Miller
> > > > >> >> >> >> <davem@davemloft.net>; Kirti Wankhede
> > > > >> >> >> >> <kwankhede@nvidia.com>; Cornelia Huck
> > > > >> >> >> >> <cohuck@redhat.com>; kvm@vger.kernel.org;
> > > > >> >> >> >> linux-kernel@vger.kernel.org; cjia <cjia@nvidia.com>;
> > > > >> >> >> >> netdev@vger.kernel.org
> > > > >> >> >> >> Subject: Re: [PATCH v2 0/2] Simplify mtty driver and
> > > > >> >> >> >> mdev core
> > > > >> >> >> >>
> > > > >> >> >> >> > > > > Just an example of the alias, not proposing how it's set.
> > > > >> >> >> >> > > > > In fact, proposing that the user does not set
> > > > >> >> >> >> > > > > it, mdev-core provides one
> > > > >> >> >> >> > > automatically.
> > > > >> >> >> >> > > > >
> > > > >> >> >> >> > > > > > > Since there seems to be some prefix
> > > > >> >> >> >> > > > > > > overhead, as I ask about above in how many
> > > > >> >> >> >> > > > > > > characters we actually have to work with in
> > > > >> >> >> >> > > > > > > IFNAMESZ, maybe we start with
> > > > >> >> >> >> > > > > > > 8 characters (matching your "index"
> > > > >> >> >> >> > > > > > > namespace) and expand as necessary for
> > > > >> >> >> disambiguation.
> > > > >> >> >> >> > > > > > > If we can eliminate overhead in IFNAMESZ,
> > > > >> >> >> >> > > > > > > let's start with
> > > > >> 12.
> > > > >> >> >> >> > > > > > > Thanks,
> > > > >> >> >> >> > > > > > >
> > > > >> >> >> >> > > > > > If user is going to choose the alias, why does
> > > > >> >> >> >> > > > > > it have to be limited to
> > > > >> >> >> >> sha1?
> > > > >> >> >> >> > > > > > Or you just told it as an example?
> > > > >> >> >> >> > > > > >
> > > > >> >> >> >> > > > > > It can be an alpha-numeric string.
> > > > >> >> >> >> > > > >
> > > > >> >> >> >> > > > > No, I'm proposing a different solution where
> > > > >> >> >> >> > > > > mdev-core creates an alias based on an
> > > > >> >> >> >> > > > > abbreviated sha1. The user does not provide the
> > > > >> >> >> >> alias.
> > > > >> >> >> >> > > > >
> > > > >> >> >> >> > > > > > Instead of mdev imposing number of characters
> > > > >> >> >> >> > > > > > on the alias, it should be best
> > > > >> >> >> >> > > > > left to the user.
> > > > >> >> >> >> > > > > > Because in future if netdev improves on the
> > > > >> >> >> >> > > > > > naming scheme, mdev will be
> > > > >> >> >> >> > > > > limiting it, which is not right.
> > > > >> >> >> >> > > > > > So not restricting alias size seems right to me.
> > > > >> >> >> >> > > > > > User configuring mdev for networking devices
> > > > >> >> >> >> > > > > > in a given kernel knows what
> > > > >> >> >> >> > > > > user is doing.
> > > > >> >> >> >> > > > > > So user can choose alias name size as it finds suitable.
> > > > >> >> >> >> > > > >
> > > > >> >> >> >> > > > > That's not what I'm proposing, please read again.
> > > > >> >> >> >> > > > > Thanks,
> > > > >> >> >> >> > > >
> > > > >> >> >> >> > > > I understood your point. But mdev doesn't know how
> > > > >> >> >> >> > > > user is going to use
> > > > >> >> >> >> > > udev/systemd to name the netdev.
> > > > >> >> >> >> > > > So even if mdev chose to pick 12 characters, it
> > > > >> >> >> >> > > > could result in
> > > > >> >> collision.
> > > > >> >> >> >> > > > Hence the proposal to provide the alias by the
> > > > >> >> >> >> > > > user, as user know the best
> > > > >> >> >> >> > > policy for its use case in the environment its using.
> > > > >> >> >> >> > > > So 12 character sha1 method will still work by user.
> > > > >> >> >> >> > >
> > > > >> >> >> >> > > Haven't you already provided examples where certain
> > > > >> >> >> >> > > drivers or subsystems have unique netdev prefixes?
> > > > >> >> >> >> > > If mdev provides a unique alias within the
> > > > >> >> >> >> > > subsystem, couldn't we simply define a netdev prefix
> > > > >> >> >> >> > > for the mdev subsystem and avoid all other
> > > > >> >> >> >> > > collisions? I'm not in favor of the user providing
> > > > >> >> >> >> > > both a uuid and an alias/instance. Thanks,
> > > > >> >> >> >> > >
> > > > >> >> >> >> > For a given prefix, say ens2f0, can two UUID->sha1
> > > > >> >> >> >> > first 9 characters have
> > > > >> >> >> >> collision?
> > > > >> >> >> >>
> > > > >> >> >> >> I think it would be a mistake to waste so many chars on
> > > > >> >> >> >> a prefix, but
> > > > >> >> >> >> 9 characters of sha1 likely wouldn't have a collision
> > > > >> >> >> >> before we have 10s of thousands of devices. Thanks,
> > > > >> >> >> >>
> > > > >> >> >> >> Alex
> > > > >> >> >> >
> > > > >> >> >> >Jiri, Dave,
> > > > >> >> >> >Are you ok with it for devlink/netdev part?
> > > > >> >> >> >Mdev core will create an alias from a UUID.
> > > > >> >> >> >
> > > > >> >> >> >This will be supplied during devlink port attr set such
> > > > >> >> >> >as,
> > > > >> >> >> >
> > > > >> >> >> >devlink_port_attrs_mdev_set(struct devlink_port *port,
> > > > >> >> >> >const char *mdev_alias);
> > > > >> >> >> >
> > > > >> >> >> >This alias is used to generate representor netdev's
> > phys_port_name.
> > > > >> >> >> >This alias from the mdev device's sysfs will be used by
> > > > >> >> >> >the udev/systemd to
> > > > >> >> >> generate predicable netdev's name.
> > > > >> >> >> >Example: enm<mdev_alias_first_12_chars>
> > > > >> >> >>
> > > > >> >> >> What happens in unlikely case of 2 UUIDs collide?
> > > > >> >> >>
> > > > >> >> >Since users sees two devices with same phys_port_name, user
> > > > >> >> >should destroy
> > > > >> >> recently created mdev and recreate mdev with different UUID?
> > > > >> >>
> > > > >> >> Driver should make sure phys port name wont collide,
> > > > >> >So when mdev creation is initiated, mdev core calculates the
> > > > >> >alias and if there
> > > > >> is any other mdev with same alias exist, it returns -EEXIST error
> > > > >> before progressing further.
> > > > >> >This way user will get to know upfront in event of collision
> > > > >> >before the mdev
> > > > >> device gets created.
> > > > >> >How about that?
> > > > >>
> > > > >> Sounds fine to me. Now the question is how many chars do we want to
> > have.
> > > > >>
> > > > >12 characters from Alex's suggestion similar to git?
> > > >
> > > > Ok.
> > > >
> > >
> > > Can you please confirm this scheme looks good now? I like to get patches
> > started.
> >
> > My only concern is your comment that in the event of an abbreviated
> > sha1 collision (as exceptionally rare as that might be at 12-chars), we'd fail the
> > device create, while my original suggestion was that vfio-core would add an
> > extra character to the alias. For non-networking devices, the sha1 is
> > unnecessary, so the extension behavior seems preferred. The user is only
> > responsible to provide a unique uuid. Perhaps the failure behavior could be
> > applied based on the mdev device_api. A module option on mdev to specify the
> > default number of alias chars would also be useful for testing so that we can set
> > it low enough to validate the collision behavior. Thanks,
> >
>
> Idea is to have mdev alias as optional.
> Each mdev_parent says whether it wants mdev_core to generate an alias
> or not. So only networking device drivers would set it to true.
> For rest, alias won't be generated, and won't be compared either
> during creation time. User continue to provide only uuid.
Ok
> I am tempted to have alias collision detection only within children
> mdevs of the same parent, but doing so will always mandate to prefix
> in netdev name. And currently we are left with only 3 characters to
> prefix it, so that may not be good either. Hence, I think mdev core
> wide alias is better with 12 characters.
I suppose it depends on the API, if the vendor driver can ask the mdev
core for an alias as part of the device creation process, then it could
manage the netdev namespace for all its devices, choosing how many
characters to use, and fail the creation if it can't meet a uniqueness
requirement. IOW, mdev-core would always provide a full sha1 and
therefore gets itself out of the uniqueness/collision aspects.
> I do not understand how an extra character reduces collision, if
> that's what you meant.
If the default were for example 3-chars, we might already have device
'abc'. A collision would expose one more char of the new device, so we
might add device with alias 'abcd'. I mentioned previously that this
leaves an issue for userspace that we can't change the alias of device
abc, so without additional information, userspace can only determine via
elimination the mapping of alias to device, but userspace has more
information available to it in the form of sysfs links.
> Module options are almost not encouraged
> anymore with other subsystems/drivers.
We don't live in a world of absolutes. I agree that the defaults
should work in the vast majority of cases. Requiring a user to twiddle
module options to make things work is undesirable, verging on a bug. A
module option to enable some specific feature, unsafe condition, or test
that is outside of the typical use case is reasonable, imo.
> For testing collision rate, a sample user space script and sample
> mtty is easy and get us collision count too. We shouldn't put that
> using module option in production kernel. I practically have the code
> ready to play with; Changing 12 to smaller value is easy with module
> reload.
>
> #define MDEV_ALIAS_LEN 12
If it can't be tested with a shipping binary, it probably won't be
tested. Thanks,
Alex
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox