* Re: [PATCH net] net: fec: Coverity issue: Dereference null return value
From: Paolo Abeni @ 2022-12-20 14:46 UTC (permalink / raw)
To: wei.fang, davem, edumazet, kuba, xiaoning.wang, shenwei.wang,
linux-imx
Cc: netdev, linux-kernel
In-Reply-To: <20221215091149.936369-1-wei.fang@nxp.com>
Hello,
On Thu, 2022-12-15 at 17:11 +0800, wei.fang@nxp.com wrote:
> From: Wei Fang <wei.fang@nxp.com>
>
> The build_skb might return a null pointer but there is no check on the
> return value in the fec_enet_rx_queue(). So a null pointer dereference
> might occur. To avoid this, we check the return value of build_skb. If
> the return value is a null pointer, the driver will recycle the page and
> update the statistic of ndev. Then jump to rx_processing_done to clear
> the status flags of the BD so that the hardware can recycle the BD.
>
> Signed-off-by: Wei Fang <wei.fang@nxp.com>
> Reviewed-by: Shenwei Wang <Shenwei.wang@nxp.com>
You need to include a suitable fixes tag here. Please repost adding it
and retaining Alex's reviwed-by tag, thanks!
Paolo
^ permalink raw reply
* Re: [PATCH] net: bridge: mcast: read ngrec once in igmp3/mld2 report
From: Paolo Abeni @ 2022-12-20 14:59 UTC (permalink / raw)
To: Nikolay Aleksandrov, Joy Gu, bridge
Cc: roopa, davem, edumazet, kuba, joern, netdev, linux-kernel
In-Reply-To: <05d630bf-7fa8-4495-6345-207f133ef746@blackwall.org>
On Tue, 2022-12-20 at 12:13 +0200, Nikolay Aleksandrov wrote:
> On 20/12/2022 04:48, Joy Gu wrote:
> > In br_ip4_multicast_igmp3_report() and br_ip6_multicast_mld2_report(),
> > "ih" or "mld2r" is a pointer into the skb header. It's dereferenced to
> > get "num", which is used in the for-loop condition that follows.
> >
> > Compilers are free to not spend a register on "num" and dereference that
> > pointer every time "num" would be used, i.e. every loop iteration. Which
> > would be a bug if pskb_may_pull() (called by ip_mc_may_pull() or
> > ipv6_mc_may_pull() in the loop body) were to change pointers pointing
> > into the skb header, e.g. by freeing "skb->head".
> >
> > We can avoid this by using READ_ONCE().
> >
> > Suggested-by: Joern Engel <joern@purestorage.com>
> > Signed-off-by: Joy Gu <jgu@purestorage.com>
> > ---
> > net/bridge/br_multicast.c | 4 ++--
> > 1 file changed, 2 insertions(+), 2 deletions(-)
> >
>
> I doubt any compiler would do that (partly due to the ntohs()).
I would say that any compiler behaving as described above is buggy, as
'skb' is modified inside the loop, and the header pointer is fetched
from the skb, it can't be considered invariant.
> If you have hit a bug or
> seen this with some compiler please provide more details, disassembly of the resulting
> code would be best.
Exactly. A more detailed description of the issue you see is necessary.
And very likely the proposed solution is not the correct one.
Cheers,
Paolo
^ permalink raw reply
* Re: [PATCH v3 1/3] net: phy: add EXPORT_SYMBOL to phy_disable_interrupts()
From: Enguerrand de Ribaucourt @ 2022-12-20 15:02 UTC (permalink / raw)
To: Heiner Kallweit
Cc: netdev, Paolo Abeni, woojung huh, davem, UNGLinuxDriver,
Andrew Lunn, Russell King - ARM Linux
In-Reply-To: <7ac42bd4-3088-5bd5-dcfc-c1e74466abb5@gmail.com>
> From: "Heiner Kallweit" <hkallweit1@gmail.com>
> To: "Enguerrand de Ribaucourt" <enguerrand.de-ribaucourt@savoirfairelinux.com>,
> "netdev" <netdev@vger.kernel.org>
> Cc: "Paolo Abeni" <pabeni@redhat.com>, "woojung huh"
> <woojung.huh@microchip.com>, "davem" <davem@davemloft.net>, "UNGLinuxDriver"
> <UNGLinuxDriver@microchip.com>, "Andrew Lunn" <andrew@lunn.ch>, "Russell King -
> ARM Linux" <linux@armlinux.org.uk>
> Sent: Tuesday, December 20, 2022 3:40:15 PM
> Subject: Re: [PATCH v3 1/3] net: phy: add EXPORT_SYMBOL to
> phy_disable_interrupts()
> On 20.12.2022 14:19, Enguerrand de Ribaucourt wrote:
> > It seems EXPORT_SYMBOL was forgotten when phy_disable_interrupts() was
> > made non static. For consistency with the other exported functions in
> > this file, EXPORT_SYMBOL should be used.
> No, it wasn't forgotten. It's intentional. The function is supposed to
> be used within phylib only.
> None of the phylib maintainers was on the addressee list of your patch.
> Seems you didn't check with get_maintainers.pl.
> You should explain your use case to the phylib maintainers. Maybe lan78xx
> uses phylib in a wrong way, maybe an extension to phylib is needed.
> Best start with explaining why lan78xx_link_status_change() needs to
> fiddle with the PHY interrupt. It would help be helpful to understand
> what "chip" refers to in the comment. The MAC, or the PHY?
> Does the lan78xx code assume that a specific PHY is used, and the
> functionality would actually belong to the respective PHY driver?
Thank you for your swift reply,
The requirement to toggle the PHY interrupt in lan78xx_link_status_change() (the
LAN7801 MAC driver) comes from a workaround by the original author which resets
the fixed speed in the PHY when the Ethernet cable is swapped. According to his
message, the link could not be correctly setup without this workaround.
Unfortunately, I don't have the cables to test the code without the workaround
and it's description doesn't explain what problem happens more precisely.
The PHY the original author used is a LAN8835. The workaround code directly
modified the interrupt configuration registers of this LAN8835 PHY within
lan78xx_link_status_change(). This caused problems if a different PHY was used
because the register at this address did not correspond to the interrupts
configuration. As suggested by the lan78xx.c maintainer, a generic function
should be used instead to toggle the interrupts of the PHY. However, it seems
that maybe the MAC driver shouldn't meddle with the PHY's interrupts according
to you. Would you consider this use case a valid one?
Enguerrand
> > Fixes: 3dd4ef1bdbac ("net: phy: make phy_disable_interrupts() non-static")
>> Signed-off-by: Enguerrand de Ribaucourt
> > <enguerrand.de-ribaucourt@savoirfairelinux.com>
> > ---
> > drivers/net/phy/phy.c | 1 +
> > 1 file changed, 1 insertion(+)
> > diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
> > index e5b6cb1a77f9..33250da76466 100644
> > --- a/drivers/net/phy/phy.c
> > +++ b/drivers/net/phy/phy.c
> > @@ -992,6 +992,7 @@ int phy_disable_interrupts(struct phy_device *phydev)
> > /* Disable PHY interrupts */
> > return phy_config_interrupt(phydev, PHY_INTERRUPT_DISABLED);
> > }
> > +EXPORT_SYMBOL(phy_disable_interrupts);
> > /**
> > * phy_interrupt - PHY interrupt handler
^ permalink raw reply
* Re: [net-next] ipv6: fix routing cache overflow for raw sockets
From: David Ahern @ 2022-12-20 15:10 UTC (permalink / raw)
To: Paolo Abeni, Jon Maxwell, davem
Cc: edumazet, kuba, yoshfuji, netdev, linux-kernel
In-Reply-To: <9f145202ca6a59b48d4430ed26a7ab0fe4c5dfaf.camel@redhat.com>
On 12/20/22 5:35 AM, Paolo Abeni wrote:
> On Mon, 2022-12-19 at 10:48 +1100, Jon Maxwell wrote:
>> Sending Ipv6 packets in a loop via a raw socket triggers an issue where a
>> route is cloned by ip6_rt_cache_alloc() for each packet sent. This quickly
>> consumes the Ipv6 max_size threshold which defaults to 4096 resulting in
>> these warnings:
>>
>> [1] 99.187805] dst_alloc: 7728 callbacks suppressed
>> [2] Route cache is full: consider increasing sysctl net.ipv6.route.max_size.
>> .
>> .
>> [300] Route cache is full: consider increasing sysctl net.ipv6.route.max_size.
>
> If I read correctly, the maximum number of dst that the raw socket can
> use this way is limited by the number of packets it allows via the
> sndbuf limit, right?
>
> Are other FLOWI_FLAG_KNOWN_NH users affected, too? e.g. nf_dup_ipv6,
> ipvs, seg6?
>
> @DavidA: why do we need to create RTF_CACHE clones for KNOWN_NH flows?
>
> Thanks,
>
> Paolo
>
If I recall the details correctly: that sysctl limit was added back when
ipv6 routes were managed as dst_entries and there was a desire to allow
an admin to limit the memory consumed. At this point in time, IPv6 is
more inline with IPv4 - a separate struct for fib entries from dst
entries. That "Route cache is full" message is now out of date since
this is dst_entries which have a gc mechanism.
IPv4 does not limit the number of dst_entries that can be allocated
(ip_rt_max_size is the sysctl variable behind the ipv4 version of
max_size and it is a no-op). IPv6 can probably do the same here?
I do not believe the suggested flag is the right change.
^ permalink raw reply
* Re: [PATCH] ath10k: snoc: enable threaded napi on WCN3990
From: Dave Taht @ 2022-12-20 15:10 UTC (permalink / raw)
To: Abhishek Kumar
Cc: kvalo, ath10k, linux-wireless, linux-kernel, netdev,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Make-Wifi-fast
In-Reply-To: <20221220075215.1.Ic12e347e0d61a618124b742614e82bbd5d770173@changeid>
I am always interested in flent.org tcp_nup, tcp_ndown, and rrul_be
tests on wifi hardware. In AP mode, especially, against a few clients
in rtt_fair on the "ending the anomaly" test suite at the bottom of
this link: https://www.cs.kau.se/tohojo/airtime-fairness/ . Of these,
it's trying to optimize bandwidth more fairly and keep latencies low
when 4 or more stations are trying to transmit (in a world with 16 or
more stations online), that increasingly bothers me the most. I'm
seeing 5+ seconds on some rtt_fair-like tests nowadays.
I was also seeing huge simultaneous upload vs download disparities on
the latest kernels, on various threads over here:
https://forum.openwrt.org/t/aql-and-the-ath10k-is-lovely/59002 and
more recently here:
https://forum.openwrt.org/t/reducing-multiplexing-latencies-still-further-in-wifi/133605
I don't understand why napi with the default budget (64) is even
needed on the ath10k, as a single txop takes a minimum of ~200us, but
perhaps your patch will help. Still, measuring the TCP statistics
in-band would be nice to see. Some new tools are appearing that can do
this, Apple's goresponsiveness, crusader... that are simpler to use
than flent.
On Tue, Dec 20, 2022 at 12:17 AM Abhishek Kumar <kuabhs@chromium.org> wrote:
>
> NAPI poll can be done in threaded context along with soft irq
> context. Threaded context can be scheduled efficiently, thus
> creating less of bottleneck during Rx processing. This patch is
> to enable threaded NAPI on ath10k driver.
>
> Based on testing, it was observed that on WCN3990, the CPU0 reaches
> 100% utilization when napi runs in softirq context. At the same
> time the other CPUs are at low consumption percentage. This
> does not allow device to reach its maximum throughput potential.
> After enabling threaded napi, CPU load is balanced across all CPUs
> and following improvments were observed:
> - UDP_RX increase by ~22-25%
> - TCP_RX increase by ~15%
>
> Tested-on: WCN3990 hw1.0 SNOC WLAN.HL.3.2.2-00696-QCAHLSWMTPL-1
> Signed-off-by: Abhishek Kumar <kuabhs@chromium.org>
> ---
>
> drivers/net/wireless/ath/ath10k/core.c | 16 ++++++++++++++++
> drivers/net/wireless/ath/ath10k/hw.h | 2 ++
> drivers/net/wireless/ath/ath10k/snoc.c | 3 +++
> 3 files changed, 21 insertions(+)
>
> diff --git a/drivers/net/wireless/ath/ath10k/core.c b/drivers/net/wireless/ath/ath10k/core.c
> index 5eb131ab916fd..ee4b6ba508c81 100644
> --- a/drivers/net/wireless/ath/ath10k/core.c
> +++ b/drivers/net/wireless/ath/ath10k/core.c
> @@ -100,6 +100,7 @@ static const struct ath10k_hw_params ath10k_hw_params_list[] = {
> .hw_restart_disconnect = false,
> .use_fw_tx_credits = true,
> .delay_unmap_buffer = false,
> + .enable_threaded_napi = false,
> },
> {
> .id = QCA988X_HW_2_0_VERSION,
> @@ -140,6 +141,7 @@ static const struct ath10k_hw_params ath10k_hw_params_list[] = {
> .hw_restart_disconnect = false,
> .use_fw_tx_credits = true,
> .delay_unmap_buffer = false,
> + .enable_threaded_napi = false,
> },
> {
> .id = QCA9887_HW_1_0_VERSION,
> @@ -181,6 +183,7 @@ static const struct ath10k_hw_params ath10k_hw_params_list[] = {
> .hw_restart_disconnect = false,
> .use_fw_tx_credits = true,
> .delay_unmap_buffer = false,
> + .enable_threaded_napi = false,
> },
> {
> .id = QCA6174_HW_3_2_VERSION,
> @@ -217,6 +220,7 @@ static const struct ath10k_hw_params ath10k_hw_params_list[] = {
> .hw_restart_disconnect = false,
> .use_fw_tx_credits = true,
> .delay_unmap_buffer = false,
> + .enable_threaded_napi = false,
> },
> {
> .id = QCA6174_HW_2_1_VERSION,
> @@ -257,6 +261,7 @@ static const struct ath10k_hw_params ath10k_hw_params_list[] = {
> .hw_restart_disconnect = false,
> .use_fw_tx_credits = true,
> .delay_unmap_buffer = false,
> + .enable_threaded_napi = false,
> },
> {
> .id = QCA6174_HW_2_1_VERSION,
> @@ -297,6 +302,7 @@ static const struct ath10k_hw_params ath10k_hw_params_list[] = {
> .hw_restart_disconnect = false,
> .use_fw_tx_credits = true,
> .delay_unmap_buffer = false,
> + .enable_threaded_napi = false,
> },
> {
> .id = QCA6174_HW_3_0_VERSION,
> @@ -337,6 +343,7 @@ static const struct ath10k_hw_params ath10k_hw_params_list[] = {
> .hw_restart_disconnect = false,
> .use_fw_tx_credits = true,
> .delay_unmap_buffer = false,
> + .enable_threaded_napi = false,
> },
> {
> .id = QCA6174_HW_3_2_VERSION,
> @@ -381,6 +388,7 @@ static const struct ath10k_hw_params ath10k_hw_params_list[] = {
> .hw_restart_disconnect = false,
> .use_fw_tx_credits = true,
> .delay_unmap_buffer = false,
> + .enable_threaded_napi = false,
> },
> {
> .id = QCA99X0_HW_2_0_DEV_VERSION,
> @@ -427,6 +435,7 @@ static const struct ath10k_hw_params ath10k_hw_params_list[] = {
> .hw_restart_disconnect = false,
> .use_fw_tx_credits = true,
> .delay_unmap_buffer = false,
> + .enable_threaded_napi = false,
> },
> {
> .id = QCA9984_HW_1_0_DEV_VERSION,
> @@ -480,6 +489,7 @@ static const struct ath10k_hw_params ath10k_hw_params_list[] = {
> .hw_restart_disconnect = false,
> .use_fw_tx_credits = true,
> .delay_unmap_buffer = false,
> + .enable_threaded_napi = false,
> },
> {
> .id = QCA9888_HW_2_0_DEV_VERSION,
> @@ -530,6 +540,7 @@ static const struct ath10k_hw_params ath10k_hw_params_list[] = {
> .hw_restart_disconnect = false,
> .use_fw_tx_credits = true,
> .delay_unmap_buffer = false,
> + .enable_threaded_napi = false,
> },
> {
> .id = QCA9377_HW_1_0_DEV_VERSION,
> @@ -570,6 +581,7 @@ static const struct ath10k_hw_params ath10k_hw_params_list[] = {
> .hw_restart_disconnect = false,
> .use_fw_tx_credits = true,
> .delay_unmap_buffer = false,
> + .enable_threaded_napi = false,
> },
> {
> .id = QCA9377_HW_1_1_DEV_VERSION,
> @@ -612,6 +624,7 @@ static const struct ath10k_hw_params ath10k_hw_params_list[] = {
> .hw_restart_disconnect = false,
> .use_fw_tx_credits = true,
> .delay_unmap_buffer = false,
> + .enable_threaded_napi = false,
> },
> {
> .id = QCA9377_HW_1_1_DEV_VERSION,
> @@ -645,6 +658,7 @@ static const struct ath10k_hw_params ath10k_hw_params_list[] = {
> .hw_restart_disconnect = false,
> .use_fw_tx_credits = true,
> .delay_unmap_buffer = false,
> + .enable_threaded_napi = false,
> },
> {
> .id = QCA4019_HW_1_0_DEV_VERSION,
> @@ -692,6 +706,7 @@ static const struct ath10k_hw_params ath10k_hw_params_list[] = {
> .hw_restart_disconnect = false,
> .use_fw_tx_credits = true,
> .delay_unmap_buffer = false,
> + .enable_threaded_napi = false,
> },
> {
> .id = WCN3990_HW_1_0_DEV_VERSION,
> @@ -725,6 +740,7 @@ static const struct ath10k_hw_params ath10k_hw_params_list[] = {
> .hw_restart_disconnect = true,
> .use_fw_tx_credits = false,
> .delay_unmap_buffer = true,
> + .enable_threaded_napi = true,
> },
> };
>
> diff --git a/drivers/net/wireless/ath/ath10k/hw.h b/drivers/net/wireless/ath/ath10k/hw.h
> index 9643031a4427a..adf3076b96503 100644
> --- a/drivers/net/wireless/ath/ath10k/hw.h
> +++ b/drivers/net/wireless/ath/ath10k/hw.h
> @@ -639,6 +639,8 @@ struct ath10k_hw_params {
> bool use_fw_tx_credits;
>
> bool delay_unmap_buffer;
> +
> + bool enable_threaded_napi;
> };
>
> struct htt_resp;
> diff --git a/drivers/net/wireless/ath/ath10k/snoc.c b/drivers/net/wireless/ath/ath10k/snoc.c
> index cfcb759a87dea..b94150fb6ef06 100644
> --- a/drivers/net/wireless/ath/ath10k/snoc.c
> +++ b/drivers/net/wireless/ath/ath10k/snoc.c
> @@ -927,6 +927,9 @@ static int ath10k_snoc_hif_start(struct ath10k *ar)
>
> bitmap_clear(ar_snoc->pending_ce_irqs, 0, CE_COUNT_MAX);
>
> + if (ar->hw_params.enable_threaded_napi)
> + dev_set_threaded(&ar->napi_dev, true);
> +
> ath10k_core_napi_enable(ar);
> ath10k_snoc_irq_enable(ar);
> ath10k_snoc_rx_post(ar);
> --
> 2.39.0.314.g84b9a713c41-goog
>
--
This song goes out to all the folk that thought Stadia would work:
https://www.linkedin.com/posts/dtaht_the-mushroom-song-activity-6981366665607352320-FXtz
Dave Täht CEO, TekLibre, LLC
^ permalink raw reply
* Re: [PATCH v3 1/3] net: phy: add EXPORT_SYMBOL to phy_disable_interrupts()
From: Andrew Lunn @ 2022-12-20 15:17 UTC (permalink / raw)
To: Enguerrand de Ribaucourt
Cc: Heiner Kallweit, netdev, Paolo Abeni, woojung huh, davem,
UNGLinuxDriver, Russell King - ARM Linux
In-Reply-To: <1721908413.470634.1671548576554.JavaMail.zimbra@savoirfairelinux.com>
On Tue, Dec 20, 2022 at 10:02:56AM -0500, Enguerrand de Ribaucourt wrote:
> > From: "Heiner Kallweit" <hkallweit1@gmail.com>
> > To: "Enguerrand de Ribaucourt" <enguerrand.de-ribaucourt@savoirfairelinux.com>,
> > "netdev" <netdev@vger.kernel.org>
> > Cc: "Paolo Abeni" <pabeni@redhat.com>, "woojung huh"
> > <woojung.huh@microchip.com>, "davem" <davem@davemloft.net>, "UNGLinuxDriver"
> > <UNGLinuxDriver@microchip.com>, "Andrew Lunn" <andrew@lunn.ch>, "Russell King -
> > ARM Linux" <linux@armlinux.org.uk>
> > Sent: Tuesday, December 20, 2022 3:40:15 PM
> > Subject: Re: [PATCH v3 1/3] net: phy: add EXPORT_SYMBOL to
> > phy_disable_interrupts()
>
> > On 20.12.2022 14:19, Enguerrand de Ribaucourt wrote:
> > > It seems EXPORT_SYMBOL was forgotten when phy_disable_interrupts() was
> > > made non static. For consistency with the other exported functions in
> > > this file, EXPORT_SYMBOL should be used.
>
> > No, it wasn't forgotten. It's intentional. The function is supposed to
> > be used within phylib only.
>
> > None of the phylib maintainers was on the addressee list of your patch.
> > Seems you didn't check with get_maintainers.pl.
>
> > You should explain your use case to the phylib maintainers. Maybe lan78xx
> > uses phylib in a wrong way, maybe an extension to phylib is needed.
> > Best start with explaining why lan78xx_link_status_change() needs to
> > fiddle with the PHY interrupt. It would help be helpful to understand
> > what "chip" refers to in the comment. The MAC, or the PHY?
> > Does the lan78xx code assume that a specific PHY is used, and the
> > functionality would actually belong to the respective PHY driver?
>
> Thank you for your swift reply,
>
> The requirement to toggle the PHY interrupt in lan78xx_link_status_change() (the
> LAN7801 MAC driver) comes from a workaround by the original author which resets
> the fixed speed in the PHY when the Ethernet cable is swapped. According to his
> message, the link could not be correctly setup without this workaround.
This seems like a PHY bug, so the workaround should be in the PHY
driver, not the MAC driver. It will then apply to all MAC:PHY
combinations, not just the lan78xx paired with this PHY.
The PHY driver has the callback link_change_notify. You might be able
to use that.
Andrew
^ permalink raw reply
* Re: [net-next] ipv6: fix routing cache overflow for raw sockets
From: Julian Anastasov @ 2022-12-20 15:17 UTC (permalink / raw)
To: Paolo Abeni
Cc: Jon Maxwell, davem, edumazet, kuba, yoshfuji, dsahern, netdev,
linux-kernel
In-Reply-To: <9f145202ca6a59b48d4430ed26a7ab0fe4c5dfaf.camel@redhat.com>
Hello,
On Tue, 20 Dec 2022, Paolo Abeni wrote:
> On Mon, 2022-12-19 at 10:48 +1100, Jon Maxwell wrote:
> > Sending Ipv6 packets in a loop via a raw socket triggers an issue where a
> > route is cloned by ip6_rt_cache_alloc() for each packet sent. This quickly
> > consumes the Ipv6 max_size threshold which defaults to 4096 resulting in
> > these warnings:
> >
> > [1] 99.187805] dst_alloc: 7728 callbacks suppressed
> > [2] Route cache is full: consider increasing sysctl net.ipv6.route.max_size.
> > .
> > .
> > [300] Route cache is full: consider increasing sysctl net.ipv6.route.max_size.
>
> If I read correctly, the maximum number of dst that the raw socket can
> use this way is limited by the number of packets it allows via the
> sndbuf limit, right?
>
> Are other FLOWI_FLAG_KNOWN_NH users affected, too? e.g. nf_dup_ipv6,
> ipvs, seg6?
For IPVS there is no sndbuf limit. IPVS uses this flag
when receiving packets from world (destined to some local Virtual
IP) and then diverts/redirects the packets (without changing daddr)
to one of its backend servers on the LAN (no RTF_GATEWAY on such routes).
So, for each packet IPVS requests output route with FLOWI_FLAG_KNOWN_NH
flag and then sends the packet to backend server (nexthop) using
this route attached to the skb. Packet rate is usually high. The goal is
nexthop to be used from the route, not from the IP header. KNOWN_NH
means "nexthop is provided in route, not in daddr". As for the
implementation details in ipv6, I can not comment. But all users that
set the flag wants this, to send packet where daddr can be != nexthop.
> @DavidA: why do we need to create RTF_CACHE clones for KNOWN_NH flows?
Regards
--
Julian Anastasov <ja@ssi.bg>
^ permalink raw reply
* Re: [PATCH v3 1/3] net: phy: add EXPORT_SYMBOL to phy_disable_interrupts()
From: Heiner Kallweit @ 2022-12-20 15:19 UTC (permalink / raw)
To: Enguerrand de Ribaucourt
Cc: netdev, Paolo Abeni, woojung huh, davem, UNGLinuxDriver,
Andrew Lunn, Russell King - ARM Linux
In-Reply-To: <1721908413.470634.1671548576554.JavaMail.zimbra@savoirfairelinux.com>
On 20.12.2022 16:02, Enguerrand de Ribaucourt wrote:
>> From: "Heiner Kallweit" <hkallweit1@gmail.com>
>> To: "Enguerrand de Ribaucourt" <enguerrand.de-ribaucourt@savoirfairelinux.com>,
>> "netdev" <netdev@vger.kernel.org>
>> Cc: "Paolo Abeni" <pabeni@redhat.com>, "woojung huh"
>> <woojung.huh@microchip.com>, "davem" <davem@davemloft.net>, "UNGLinuxDriver"
>> <UNGLinuxDriver@microchip.com>, "Andrew Lunn" <andrew@lunn.ch>, "Russell King -
>> ARM Linux" <linux@armlinux.org.uk>
>> Sent: Tuesday, December 20, 2022 3:40:15 PM
>> Subject: Re: [PATCH v3 1/3] net: phy: add EXPORT_SYMBOL to
>> phy_disable_interrupts()
>
>> On 20.12.2022 14:19, Enguerrand de Ribaucourt wrote:
>>> It seems EXPORT_SYMBOL was forgotten when phy_disable_interrupts() was
>>> made non static. For consistency with the other exported functions in
>>> this file, EXPORT_SYMBOL should be used.
>
>> No, it wasn't forgotten. It's intentional. The function is supposed to
>> be used within phylib only.
>
>> None of the phylib maintainers was on the addressee list of your patch.
>> Seems you didn't check with get_maintainers.pl.
>
>> You should explain your use case to the phylib maintainers. Maybe lan78xx
>> uses phylib in a wrong way, maybe an extension to phylib is needed.
>> Best start with explaining why lan78xx_link_status_change() needs to
>> fiddle with the PHY interrupt. It would help be helpful to understand
>> what "chip" refers to in the comment. The MAC, or the PHY?
>> Does the lan78xx code assume that a specific PHY is used, and the
>> functionality would actually belong to the respective PHY driver?
>
> Thank you for your swift reply,
>
> The requirement to toggle the PHY interrupt in lan78xx_link_status_change() (the
> LAN7801 MAC driver) comes from a workaround by the original author which resets
> the fixed speed in the PHY when the Ethernet cable is swapped. According to his
> message, the link could not be correctly setup without this workaround.
>
> Unfortunately, I don't have the cables to test the code without the workaround
> and it's description doesn't explain what problem happens more precisely.
>
> The PHY the original author used is a LAN8835. The workaround code directly
> modified the interrupt configuration registers of this LAN8835 PHY within
> lan78xx_link_status_change(). This caused problems if a different PHY was used
> because the register at this address did not correspond to the interrupts
> configuration. As suggested by the lan78xx.c maintainer, a generic function
> should be used instead to toggle the interrupts of the PHY. However, it seems
> that maybe the MAC driver shouldn't meddle with the PHY's interrupts according
> to you. Would you consider this use case a valid one?
>
So this workaround works around a silicon bug in LAN8835?
Then the code supposedly should go to the link_change_notify handler of the
Microchip PHY driver for LAN8835.
There's just a generic PHY driver for LAN88xx. Would be helpful to know
which Microchip PHY's are affected.
> Enguerrand
>
>>> Fixes: 3dd4ef1bdbac ("net: phy: make phy_disable_interrupts() non-static")
>>> Signed-off-by: Enguerrand de Ribaucourt
>>> <enguerrand.de-ribaucourt@savoirfairelinux.com>
>>> ---
>>> drivers/net/phy/phy.c | 1 +
>>> 1 file changed, 1 insertion(+)
>
>>> diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
>>> index e5b6cb1a77f9..33250da76466 100644
>>> --- a/drivers/net/phy/phy.c
>>> +++ b/drivers/net/phy/phy.c
>>> @@ -992,6 +992,7 @@ int phy_disable_interrupts(struct phy_device *phydev)
>>> /* Disable PHY interrupts */
>>> return phy_config_interrupt(phydev, PHY_INTERRUPT_DISABLED);
>>> }
>>> +EXPORT_SYMBOL(phy_disable_interrupts);
>
>>> /**
>>> * phy_interrupt - PHY interrupt handler
^ permalink raw reply
* [PATCH net] nfp: fix schedule in atomic context when sync mc address
From: Simon Horman @ 2022-12-20 15:21 UTC (permalink / raw)
To: David Miller, Jakub Kicinski, Paolo Abeni
Cc: netdev, oss-drivers, Yinjun Zhang, Louis Peens, Simon Horman
From: Yinjun Zhang <yinjun.zhang@corigine.com>
The callback `.ndo_set_rx_mode` is called in atomic context, sleep
is not allowed in the implementation. Now use workqueue mechanism
to avoid this issue.
Fixes: de6248644966 ("nfp: add support for multicast filter")
Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com>
Reviewed-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
---
drivers/net/ethernet/netronome/nfp/nfp_net.h | 7 +++
.../ethernet/netronome/nfp/nfp_net_common.c | 61 +++++++++++++++++--
2 files changed, 64 insertions(+), 4 deletions(-)
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net.h b/drivers/net/ethernet/netronome/nfp/nfp_net.h
index da33f09facb9..432d79d691c2 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net.h
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net.h
@@ -617,6 +617,9 @@ struct nfp_net_dp {
* @vnic_no_name: For non-port PF vNIC make ndo_get_phys_port_name return
* -EOPNOTSUPP to keep backwards compatibility (set by app)
* @port: Pointer to nfp_port structure if vNIC is a port
+ * @mc_lock: Protect mc_addrs list
+ * @mc_addrs: List of mc addrs to add/del to HW
+ * @mc_work: Work to update mc addrs
* @app_priv: APP private data for this vNIC
*/
struct nfp_net {
@@ -718,6 +721,10 @@ struct nfp_net {
struct nfp_port *port;
+ spinlock_t mc_lock;
+ struct list_head mc_addrs;
+ struct work_struct mc_work;
+
void *app_priv;
};
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
index 09053373288f..18fc9971f1c8 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
@@ -1334,9 +1334,14 @@ int nfp_ctrl_open(struct nfp_net *nn)
return err;
}
-static int nfp_net_mc_cfg(struct net_device *netdev, const unsigned char *addr, const u32 cmd)
+struct nfp_mc_addr_entry {
+ u8 addr[ETH_ALEN];
+ u32 cmd;
+ struct list_head list;
+};
+
+static int nfp_net_mc_cfg(struct nfp_net *nn, const unsigned char *addr, const u32 cmd)
{
- struct nfp_net *nn = netdev_priv(netdev);
int ret;
ret = nfp_net_mbox_lock(nn, NFP_NET_CFG_MULTICAST_SZ);
@@ -1351,6 +1356,25 @@ static int nfp_net_mc_cfg(struct net_device *netdev, const unsigned char *addr,
return nfp_net_mbox_reconfig_and_unlock(nn, cmd);
}
+static int nfp_net_mc_prep(struct nfp_net *nn, const unsigned char *addr, const u32 cmd)
+{
+ struct nfp_mc_addr_entry *entry;
+
+ entry = kmalloc(sizeof(*entry), GFP_ATOMIC);
+ if (!entry)
+ return -ENOMEM;
+
+ ether_addr_copy(entry->addr, addr);
+ entry->cmd = cmd;
+ spin_lock_bh(&nn->mc_lock);
+ list_add_tail(&entry->list, &nn->mc_addrs);
+ spin_unlock_bh(&nn->mc_lock);
+
+ schedule_work(&nn->mc_work);
+
+ return 0;
+}
+
static int nfp_net_mc_sync(struct net_device *netdev, const unsigned char *addr)
{
struct nfp_net *nn = netdev_priv(netdev);
@@ -1361,12 +1385,35 @@ static int nfp_net_mc_sync(struct net_device *netdev, const unsigned char *addr)
return -EINVAL;
}
- return nfp_net_mc_cfg(netdev, addr, NFP_NET_CFG_MBOX_CMD_MULTICAST_ADD);
+ return nfp_net_mc_prep(nn, addr, NFP_NET_CFG_MBOX_CMD_MULTICAST_ADD);
}
static int nfp_net_mc_unsync(struct net_device *netdev, const unsigned char *addr)
{
- return nfp_net_mc_cfg(netdev, addr, NFP_NET_CFG_MBOX_CMD_MULTICAST_DEL);
+ struct nfp_net *nn = netdev_priv(netdev);
+
+ return nfp_net_mc_prep(nn, addr, NFP_NET_CFG_MBOX_CMD_MULTICAST_DEL);
+}
+
+static void nfp_net_mc_addr_config(struct work_struct *work)
+{
+ struct nfp_net *nn = container_of(work, struct nfp_net, mc_work);
+ struct nfp_mc_addr_entry *entry, *tmp;
+ struct list_head tmp_list;
+
+ INIT_LIST_HEAD(&tmp_list);
+
+ spin_lock_bh(&nn->mc_lock);
+ list_splice_init(&nn->mc_addrs, &tmp_list);
+ spin_unlock_bh(&nn->mc_lock);
+
+ list_for_each_entry_safe(entry, tmp, &tmp_list, list) {
+ if (nfp_net_mc_cfg(nn, entry->addr, entry->cmd))
+ nn_err(nn, "Config mc address to HW failed.\n");
+
+ list_del(&entry->list);
+ kfree(entry);
+ }
}
static void nfp_net_set_rx_mode(struct net_device *netdev)
@@ -2633,6 +2680,11 @@ int nfp_net_init(struct nfp_net *nn)
if (!nn->dp.netdev)
return 0;
+
+ spin_lock_init(&nn->mc_lock);
+ INIT_LIST_HEAD(&nn->mc_addrs);
+ INIT_WORK(&nn->mc_work, nfp_net_mc_addr_config);
+
return register_netdev(nn->dp.netdev);
err_clean_mbox:
@@ -2652,5 +2704,6 @@ void nfp_net_clean(struct nfp_net *nn)
unregister_netdev(nn->dp.netdev);
nfp_net_ipsec_clean(nn);
nfp_ccm_mbox_clean(nn);
+ flush_work(&nn->mc_work);
nfp_net_reconfig_wait_posted(nn);
}
--
2.30.2
^ permalink raw reply related
* Re: Possible race with xsk_flush
From: Shawn Bohrer @ 2022-12-20 15:25 UTC (permalink / raw)
To: Magnus Karlsson; +Cc: netdev, bpf, bjorn, magnus.karlsson, kernel-team
In-Reply-To: <CAJ8uoz1D3WY=joXqMo80a5Vqx+3N=5YX6Lh=KC1=coM5zDb-dA@mail.gmail.com>
On Tue, Dec 20, 2022 at 10:06:48AM +0100, Magnus Karlsson wrote:
> On Tue, Dec 20, 2022 at 2:32 AM Shawn Bohrer <sbohrer@cloudflare.com> wrote:
> >
> > On Fri, Dec 16, 2022 at 11:05:19AM +0100, Magnus Karlsson wrote:
> > > To summarize, we are expecting this ordering:
> > >
> > > CPU 0 __xsk_rcv_zc()
> > > CPU 0 __xsk_map_flush()
> > > CPU 2 __xsk_rcv_zc()
> > > CPU 2 __xsk_map_flush()
> > >
> > > But we are seeing this order:
> > >
> > > CPU 0 __xsk_rcv_zc()
> > > CPU 2 __xsk_rcv_zc()
> > > CPU 0 __xsk_map_flush()
> > > CPU 2 __xsk_map_flush()
> > >
> > > Here is the veth NAPI poll loop:
> > >
> > > static int veth_poll(struct napi_struct *napi, int budget)
> > > {
> > > struct veth_rq *rq =
> > > container_of(napi, struct veth_rq, xdp_napi);
> > > struct veth_stats stats = {};
> > > struct veth_xdp_tx_bq bq;
> > > int done;
> > >
> > > bq.count = 0;
> > >
> > > xdp_set_return_frame_no_direct();
> > > done = veth_xdp_rcv(rq, budget, &bq, &stats);
> > >
> > > if (done < budget && napi_complete_done(napi, done)) {
> > > /* Write rx_notify_masked before reading ptr_ring */
> > > smp_store_mb(rq->rx_notify_masked, false);
> > > if (unlikely(!__ptr_ring_empty(&rq->xdp_ring))) {
> > > if (napi_schedule_prep(&rq->xdp_napi)) {
> > > WRITE_ONCE(rq->rx_notify_masked, true);
> > > __napi_schedule(&rq->xdp_napi);
> > > }
> > > }
> > > }
> > >
> > > if (stats.xdp_tx > 0)
> > > veth_xdp_flush(rq, &bq);
> > > if (stats.xdp_redirect > 0)
> > > xdp_do_flush();
> > > xdp_clear_return_frame_no_direct();
> > >
> > > return done;
> > > }
> > >
> > > Something I have never seen before is that there is
> > > napi_complete_done() and a __napi_schedule() before xdp_do_flush().
> > > Let us check if this has something to do with it. So two suggestions
> > > to be executed separately:
> > >
> > > * Put a probe at the __napi_schedule() above and check if it gets
> > > triggered before this problem
> > > * Move the "if (stats.xdp_redirect > 0) xdp_do_flush();" to just
> > > before "if (done < budget && napi_complete_done(napi, done)) {"
> > >
> > > This might provide us some hints on what is going on.
> >
> > After staring at this code for way too long I finally made a
> > breakthrough! I could not understand how this race could occur when
> > napi_poll() calls netpoll_poll_lock(). Here is netpoll_poll_lock():
> >
> > ```
> > static inline void *netpoll_poll_lock(struct napi_struct *napi)
> > {
> > struct net_device *dev = napi->dev;
> >
> > if (dev && dev->npinfo) {
> > int owner = smp_processor_id();
> >
> > while (cmpxchg(&napi->poll_owner, -1, owner) != -1)
> > cpu_relax();
> >
> > return napi;
> > }
> > return NULL;
> > }
> > ```
> > If dev or dev->npinfo are NULL then it doesn't acquire a lock at all!
> > Adding some more trace points I see:
> >
> > ```
> > iperf2-1325 [002] ..s1. 264246.626880: __napi_poll: (__napi_poll+0x0/0x150) n=0xffff91c885bff000 poll_owner=-1 dev=0xffff91c881d4e000 npinfo=0x0
> > iperf2-1325 [002] d.Z1. 264246.626882: __xsk_rcv_zc_L7: (__xsk_rcv_zc+0x3b/0xc0) addr=0x1503100 len=0x42 xs=0xffff91c8bfe77000 fq=0xffff91c8c1a43f80 dev=0xffff91c881d4e000
> > iperf2-1325 [002] d.Z1. 264246.626883: __xsk_rcv_zc_L7: (__xsk_rcv_zc+0x42/0xc0) addr=0x1503100 len=0x42 xs=0xffff91c8bfe77000 fq=0xffff91c8c1a43f80 dev=0xffff91c881d4e000
> > iperf2-1325 [002] d.Z1. 264246.626884: xsk_flush: (__xsk_map_flush+0x32/0xb0) xs=0xffff91c8bfe77000
> > ```
> >
> > Here you can see that poll_owner=-1 meaning the lock was never
> > acquired because npinfo is NULL. This means that the same veth rx
> > queue can be napi_polled from multiple CPU and nothing stops it from
> > running concurrently. They all look like this, just most of the time
> > there aren't concurrent napi_polls running for the same queue. They
> > do however move around CPUs as I explained earlier.
> >
> > I'll note that I've ran with your suggested change of moving
> > xdp_do_flush() before napi_complete_done() all weekend and I have not
> > reproduced the issue. I don't know if that truly means the issue is
> > fixed by that change or not. I suspect it does fix the issue because
> > it prevents the napi_struct from being scheduled again before the
> > first poll has completed, and nap_schedule_prep() ensures that only
> > one instance is ever running.
>
> Thanks Shawn! Good news that the patch seems to fix the problem. To
> me, napi_schedule_prep() makes sure that only one NAPI instance is
> running. Netpoll is an optional feature and I do not even have it
> compiled into my kernel. At least I do not have it defined in my
> .config and I cannot find any netpoll symbols with a readelf command.
> If netpoll is not used, I would suspect that npinfo == NULL. So to me,
> it is still a mystery why this is happening.
Oh I don't think it is a mystery anymore. The napi_complete_done()
signals that this instance of of the napi_poll is complete. As you
said nap_schedule_prep() checks to ensure that only one instance of
napi_poll is running at a time, but we just indicated it was done with
napi_complete_done(). This allows this CPU or more importantly any
other CPU to reschedule napi polling for this receive queue, but we
haven't called xdp_do_flush() yet so the flush can race. I'll note
that the napi_schedule_prep()/__napi_schedule() in veth_poll really
isn't the problem since it will schedule itself back on the same CPU.
The problem is simply that another CPU is free to call
napi_scheulde_prep()/__napi_schedule() in that window after
napi_complete_done() and before xdp_do_flush(). The veth driver can
schedule a napi_poll from the transmit path which is what starts the
poll on a second CPU.
I was simply blinded by that stupid netpoll_poll_lock() but it is
completely unnecessary.
> To validate or disprove that there are two instances of the same napi
> running, could you please put a probe in veth_poll() right after
> veth_xdp_rcv() and one at xdp_do_flush() and dump the napi id in both
> cases? And run the original code with the bug :-). It would be good to
> know what exactly happens in the code between these points when this
> bug occurs. Maybe we could do this by enabling the trace buffer and
> dumping it when the bug occurs.
The two instances that race are definitely different napi_polls, the
second one is scheduled from the veth transmit path.
> > If we think this is the correct fix I'll let it run for another day or
> > two and prepare a patch.
>
> There is one more advantage to this bug fix and that is performance.
> By calling __xdp_map_flush() immediately after the receive function,
> user space can start to work on the packets quicker so performance
> will improve.
Yes definitely.
Thanks,
Shawn Bohrer
^ permalink raw reply
* Re: [PATCH net-next v3 2/3] net: qualcomm: rmnet: add tx packets aggregation
From: Paolo Abeni @ 2022-12-20 15:28 UTC (permalink / raw)
To: Daniele Palmas
Cc: David Miller, Jakub Kicinski, Eric Dumazet,
Subash Abhinov Kasiviswanathan, Sean Tranchetti, Jonathan Corbet,
Alexander Lobakin, Gal Pressman, Bjørn Mork,
Greg Kroah-Hartman, Dave Taht, netdev
In-Reply-To: <CAGRyCJEzg2gFCf3svgKGSv5+W4QRsVhbYQ+KZoEfvw_=2Rb+Zg@mail.gmail.com>
On Tue, 2022-12-20 at 15:28 +0100, Daniele Palmas wrote:
> Hello Paolo,
>
> Il giorno ven 9 dic 2022 alle ore 08:38 Daniele Palmas
> <dnlplm@gmail.com> ha scritto:
> >
> > Il giorno mer 7 dic 2022 alle ore 13:46 Paolo Abeni
> > <pabeni@redhat.com> ha scritto:
> > > > +static void rmnet_map_flush_tx_packet_work(struct work_struct *work)
> > > > +{
> > > > + struct sk_buff *skb = NULL;
> > > > + struct rmnet_port *port;
> > > > +
> > > > + port = container_of(work, struct rmnet_port, agg_wq);
> > > > +
> > > > + spin_lock_bh(&port->agg_lock);
> > > > + if (likely(port->agg_state == -EINPROGRESS)) {
> > > > + /* Buffer may have already been shipped out */
> > > > + if (likely(port->skbagg_head)) {
> > > > + skb = port->skbagg_head;
> > > > + reset_aggr_params(port);
> > > > + }
> > > > + port->agg_state = 0;
> > > > + }
> > > > +
> > > > + spin_unlock_bh(&port->agg_lock);
> > > > + if (skb)
> > > > + rmnet_send_skb(port, skb);
> > > > +}
> > > > +
> > > > +static enum hrtimer_restart rmnet_map_flush_tx_packet_queue(struct hrtimer *t)
> > > > +{
> > > > + struct rmnet_port *port;
> > > > +
> > > > + port = container_of(t, struct rmnet_port, hrtimer);
> > > > +
> > > > + schedule_work(&port->agg_wq);
> > >
> > > Why you need to schedule a work and you can't instead call the core of
> > > rmnet_map_flush_tx_packet_work() here? it looks like the latter does
> > > not need process context...
> > >
> >
> > Ack.
> >
>
> looks like removing the work is not as straightforward as I thought.
>
> Now the timer cb has become:
>
> static enum hrtimer_restart rmnet_map_flush_tx_packet_cb(struct hrtimer *t)
> {
> struct sk_buff *skb = NULL;
> struct rmnet_port *port;
>
> port = container_of(t, struct rmnet_port, hrtimer);
>
> spin_lock_bh(&port->agg_lock);
> if (likely(port->agg_state == -EINPROGRESS)) {
> /* Buffer may have already been shipped out */
> if (likely(port->skbagg_head)) {
> skb = port->skbagg_head;
> reset_aggr_params(port);
> }
> port->agg_state = 0;
> }
> spin_unlock_bh(&port->agg_lock);
>
> if (skb)
> rmnet_send_skb(port, skb);
>
> return HRTIMER_NORESTART;
> }
>
> but this is causing the following warning:
>
> [ 3106.701296] WARNING: CPU: 15 PID: 0 at kernel/softirq.c:375
> __local_bh_enable_ip+0x54/0x70
> ...
> [ 3106.701537] CPU: 15 PID: 0 Comm: swapper/15 Tainted: G OE
> 6.1.0-rc5-rmnet-v4-warn #1
> [ 3106.701543] Hardware name: LENOVO 30DH00H2IX/1048, BIOS S08KT40A 08/23/2021
> [ 3106.701546] RIP: 0010:__local_bh_enable_ip+0x54/0x70
> [ 3106.701554] Code: a9 00 ff ff 00 74 27 65 ff 0d 08 bb 75 61 65 8b
> 05 01 bb 75 61 85 c0 74 06 5d c3 cc cc cc cc 0f 1f 44 00 00 5d c3 cc
> cc cc cc <0f> 0b eb bf 65 66 8b 05 e0 ca 76 61 66 85 c0 74 cc e8 e6 fd
> ff ff
> [ 3106.701559] RSP: 0018:ffffb8aa80510ec8 EFLAGS: 00010006
> [ 3106.701564] RAX: 0000000080010202 RBX: ffff932d7b687868 RCX: 0000000000000000
> [ 3106.701569] RDX: 0000000000000001 RSI: 0000000000000201 RDI: ffffffffc0bd5f7c
> [ 3106.701573] RBP: ffffb8aa80510ec8 R08: ffff933bdc3e31a0 R09: 000002d355c2f99d
> [ 3106.701576] R10: 0000000000000000 R11: ffffb8aa80510ff8 R12: ffff932d7b687828
> [ 3106.701580] R13: ffff932d7b687000 R14: ffff932cc1a76400 R15: ffff933bdc3e3180
> [ 3106.701584] FS: 0000000000000000(0000) GS:ffff933bdc3c0000(0000)
> knlGS:0000000000000000
> [ 3106.701589] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 3106.701593] CR2: 00007ffc26dae080 CR3: 0000000209f04003 CR4: 00000000007706e0
> [ 3106.701597] PKRU: 55555554
> [ 3106.701599] Call Trace:
> [ 3106.701602] <IRQ>
> [ 3106.701608] _raw_spin_unlock_bh+0x1d/0x30
> [ 3106.701623] rmnet_map_flush_tx_packet_cb+0x4c/0x90 [rmnet]
> [ 3106.701640] ? rmnet_send_skb+0x90/0x90 [rmnet]
> [ 3106.701655] __hrtimer_run_queues+0x106/0x260
> [ 3106.701664] hrtimer_interrupt+0x101/0x220
> [ 3106.701671] __sysvec_apic_timer_interrupt+0x61/0x110
> [ 3106.701677] sysvec_apic_timer_interrupt+0x7b/0x90
> [ 3106.701685] </IRQ>
> [ 3106.701687] <TASK>
> [ 3106.701689] asm_sysvec_apic_timer_interrupt+0x1b/0x20
> [ 3106.701694] RIP: 0010:cpuidle_enter_state+0xde/0x6e0
> [ 3106.701704] Code: eb d6 60 e8 74 01 68 ff 8b 53 04 49 89 c7 0f 1f
> 44 00 00 31 ff e8 b2 23 67 ff 80 7d d0 00 0f 85 da 00 00 00 fb 0f 1f
> 44 00 00 <45> 85 f6 0f 88 01 02 00 00 4d 63 ee 49 83 fd 09 0f 87 b6 04
> 00 00
> [ 3106.701709] RSP: 0018:ffffb8aa801dbe38 EFLAGS: 00000246
> [ 3106.701713] RAX: ffff933bdc3f1380 RBX: ffffd8aa7fbc0700 RCX: 000000000000001f
> [ 3106.701717] RDX: 000000000000000f RSI: 0000000000000002 RDI: 0000000000000000
> [ 3106.701720] RBP: ffffb8aa801dbe88 R08: 000002d355d34146 R09: 000000000006e988
> [ 3106.701723] R10: 0000000000000004 R11: 071c71c71c71c71c R12: ffffffffa04ba5c0
> [ 3106.701727] R13: 0000000000000001 R14: 0000000000000001 R15: 000002d355d34146
> [ 3106.701735] ? cpuidle_enter_state+0xce/0x6e0
> [ 3106.701744] cpuidle_enter+0x2e/0x50
> [ 3106.701751] do_idle+0x204/0x290
> [ 3106.701758] cpu_startup_entry+0x20/0x30
> [ 3106.701763] start_secondary+0x122/0x160
> [ 3106.701773] secondary_startup_64_no_verify+0xe5/0xeb
> [ 3106.701784] </TASK>
>
> The reason is not obvious to me, so I need to dig further...
It happens because __hrtimer_run_queues runs in hard-IRQ context.
To address the above you need to replace all the
spin_lock_bh(&port->agg_lock);
instances with the spin_lock_irqsave() variant. With one exception: in
rmnet_map_flush_tx_packet_cb() you can use simply spin_lock() as such
fuction is already in hard-irq context.
Cheers,
Paolo
^ permalink raw reply
* Re: [syzbot] KASAN: use-after-free Read in ovs_vport_locate
From: Aaron Conole @ 2022-12-20 15:40 UTC (permalink / raw)
To: Paolo Abeni
Cc: syzbot, davem, dev, edumazet, kuba, linux-kernel, netdev, pshelar,
syzkaller-bugs, Eelco Chaudron
In-Reply-To: <d81e425ea627bf4228f19126dcb3dc3b79a412cc.camel@redhat.com>
Paolo Abeni <pabeni@redhat.com> writes:
> On Tue, 2022-12-20 at 00:22 -0800, syzbot wrote:
>> HEAD commit: 041fae9c105a Merge tag 'f2fs-for-6.2-rc1' of git://git.ker..
>> git tree: upstream
>> console output: https://syzkaller.appspot.com/x/log.txt?x=15c5d020480000
>> kernel config: https://syzkaller.appspot.com/x/.config?x=836aafbf33f4fa6c
>> dashboard link: https://syzkaller.appspot.com/bug?extid=8f4e2dcfcb3209ac35f9
>> compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
>>
>> Unfortunately, I don't have any reproducer for this issue yet.
>>
>> Downloadable assets:
>> disk image: https://storage.googleapis.com/syzbot-assets/30e749b24df4/disk-041fae9c.raw.xz
>> vmlinux: https://storage.googleapis.com/syzbot-assets/dd6d972f5b02/vmlinux-041fae9c.xz
>> kernel image: https://storage.googleapis.com/syzbot-assets/405163d7c7cc/bzImage-041fae9c.xz
>>
>> IMPORTANT: if you fix the issue, please add the following tag to the commit:
>> Reported-by: syzbot+8f4e2dcfcb3209ac35f9@syzkaller.appspotmail.com
>>
>> netlink: 208 bytes leftover after parsing attributes in process `syz-executor.4'.
>> ==================================================================
>> BUG: KASAN: use-after-free in read_pnet include/net/net_namespace.h:383 [inline]
>> BUG: KASAN: use-after-free in ovs_dp_get_net net/openvswitch/datapath.h:195 [inline]
>> BUG: KASAN: use-after-free in ovs_vport_locate+0x131/0x150 net/openvswitch/vport.c:103
>> Read of size 8 at addr ffff88802055e360 by task syz-executor.4/5621
>>
>> CPU: 0 PID: 5621 Comm: syz-executor.4 Not tainted 6.1.0-syzkaller-10971-g041fae9c105a #0
>> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/26/2022
>> Call Trace:
>> <TASK>
>> __dump_stack lib/dump_stack.c:88 [inline]
>> dump_stack_lvl+0xd1/0x138 lib/dump_stack.c:106
>> print_address_description mm/kasan/report.c:306 [inline]
>> print_report+0x15e/0x461 mm/kasan/report.c:417
>> kasan_report+0xbf/0x1f0 mm/kasan/report.c:517
>> read_pnet include/net/net_namespace.h:383 [inline]
>> ovs_dp_get_net net/openvswitch/datapath.h:195 [inline]
>> ovs_vport_locate+0x131/0x150 net/openvswitch/vport.c:103
>> lookup_datapath+0x54/0x3a0 net/openvswitch/datapath.c:1628
>> ovs_dp_reset_user_features net/openvswitch/datapath.c:1639 [inline]
>> ovs_dp_cmd_new+0xd5b/0x11c0 net/openvswitch/datapath.c:1848
>> genl_family_rcv_msg_doit.isra.0+0x1e6/0x2d0 net/netlink/genetlink.c:968
>> genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
>> genl_rcv_msg+0x4ff/0x7e0 net/netlink/genetlink.c:1065
>> netlink_rcv_skb+0x165/0x440 net/netlink/af_netlink.c:2564
>> genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
>> netlink_unicast_kernel net/netlink/af_netlink.c:1330 [inline]
>> netlink_unicast+0x547/0x7f0 net/netlink/af_netlink.c:1356
>> netlink_sendmsg+0x91b/0xe10 net/netlink/af_netlink.c:1932
>> sock_sendmsg_nosec net/socket.c:714 [inline]
>> sock_sendmsg+0xd3/0x120 net/socket.c:734
>> ____sys_sendmsg+0x712/0x8c0 net/socket.c:2476
>> ___sys_sendmsg+0x110/0x1b0 net/socket.c:2530
>> __sys_sendmsg+0xf7/0x1c0 net/socket.c:2559
>> do_syscall_x64 arch/x86/entry/common.c:50 [inline]
>> do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80
>> entry_SYSCALL_64_after_hwframe+0x63/0xcd
>> RIP: 0033:0x7f142348c0d9
>> Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 f1 19 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
>> RSP: 002b:00007f14240ff168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
>> RAX: ffffffffffffffda RBX: 00007f14235abf80 RCX: 00007f142348c0d9
>> RDX: 0000000000000800 RSI: 0000000020000100 RDI: 0000000000000003
>> RBP: 00007f14234e7ae9 R08: 0000000000000000 R09: 0000000000000000
>> R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
>> R13: 00007ffdd965a34f R14: 00007f14240ff300 R15: 0000000000022000
>> </TASK>
>>
>> Allocated by task 5564:
>> kasan_save_stack+0x22/0x40 mm/kasan/common.c:45
>> kasan_set_track+0x25/0x30 mm/kasan/common.c:52
>> ____kasan_kmalloc mm/kasan/common.c:371 [inline]
>> ____kasan_kmalloc mm/kasan/common.c:330 [inline]
>> __kasan_kmalloc+0xa3/0xb0 mm/kasan/common.c:380
>> kmalloc include/linux/slab.h:580 [inline]
>> kzalloc include/linux/slab.h:720 [inline]
>> ovs_dp_cmd_new+0x1a3/0x11c0 net/openvswitch/datapath.c:1796
>> genl_family_rcv_msg_doit.isra.0+0x1e6/0x2d0 net/netlink/genetlink.c:968
>> genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
>> genl_rcv_msg+0x4ff/0x7e0 net/netlink/genetlink.c:1065
>> netlink_rcv_skb+0x165/0x440 net/netlink/af_netlink.c:2564
>> genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
>> netlink_unicast_kernel net/netlink/af_netlink.c:1330 [inline]
>> netlink_unicast+0x547/0x7f0 net/netlink/af_netlink.c:1356
>> netlink_sendmsg+0x91b/0xe10 net/netlink/af_netlink.c:1932
>> sock_sendmsg_nosec net/socket.c:714 [inline]
>> sock_sendmsg+0xd3/0x120 net/socket.c:734
>> ____sys_sendmsg+0x712/0x8c0 net/socket.c:2476
>> ___sys_sendmsg+0x110/0x1b0 net/socket.c:2530
>> __sys_sendmsg+0xf7/0x1c0 net/socket.c:2559
>> do_syscall_x64 arch/x86/entry/common.c:50 [inline]
>> do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80
>> entry_SYSCALL_64_after_hwframe+0x63/0xcd
>>
>> Freed by task 5564:
>> kasan_save_stack+0x22/0x40 mm/kasan/common.c:45
>> kasan_set_track+0x25/0x30 mm/kasan/common.c:52
>> kasan_save_free_info+0x2b/0x40 mm/kasan/generic.c:518
>> ____kasan_slab_free mm/kasan/common.c:236 [inline]
>> ____kasan_slab_free+0x13b/0x1a0 mm/kasan/common.c:200
>> kasan_slab_free include/linux/kasan.h:177 [inline]
>> __cache_free mm/slab.c:3394 [inline]
>> __do_kmem_cache_free mm/slab.c:3580 [inline]
>> __kmem_cache_free+0xcd/0x3b0 mm/slab.c:3587
>> ovs_dp_cmd_new+0x25e/0x11c0 net/openvswitch/datapath.c:1884
>> genl_family_rcv_msg_doit.isra.0+0x1e6/0x2d0 net/netlink/genetlink.c:968
>> genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
>> genl_rcv_msg+0x4ff/0x7e0 net/netlink/genetlink.c:1065
>> netlink_rcv_skb+0x165/0x440 net/netlink/af_netlink.c:2564
>> genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
>> netlink_unicast_kernel net/netlink/af_netlink.c:1330 [inline]
>> netlink_unicast+0x547/0x7f0 net/netlink/af_netlink.c:1356
>> netlink_sendmsg+0x91b/0xe10 net/netlink/af_netlink.c:1932
>> sock_sendmsg_nosec net/socket.c:714 [inline]
>> sock_sendmsg+0xd3/0x120 net/socket.c:734
>> ____sys_sendmsg+0x712/0x8c0 net/socket.c:2476
>> ___sys_sendmsg+0x110/0x1b0 net/socket.c:2530
>> __sys_sendmsg+0xf7/0x1c0 net/socket.c:2559
>> do_syscall_x64 arch/x86/entry/common.c:50 [inline]
>> do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80
>> entry_SYSCALL_64_after_hwframe+0x63/0xcd
>>
>> Last potentially related work creation:
>> kasan_save_stack+0x22/0x40 mm/kasan/common.c:45
>> __kasan_record_aux_stack+0x7b/0x90 mm/kasan/generic.c:488
>> insert_work+0x48/0x350 kernel/workqueue.c:1358
>> __queue_work+0x693/0x13b0 kernel/workqueue.c:1517
>> queue_work_on+0xf2/0x110 kernel/workqueue.c:1545
>> queue_work include/linux/workqueue.h:503 [inline]
>> addr_event.part.0+0x33e/0x4f0 drivers/infiniband/core/roce_gid_mgmt.c:853
>> addr_event drivers/infiniband/core/roce_gid_mgmt.c:824 [inline]
>> inet6addr_event+0x142/0x1c0 drivers/infiniband/core/roce_gid_mgmt.c:883
>> notifier_call_chain+0xb5/0x200 kernel/notifier.c:87
>> atomic_notifier_call_chain+0x74/0x180 kernel/notifier.c:225
>> ipv6_add_addr+0x1266/0x1de0 net/ipv6/addrconf.c:1165
>> addrconf_add_linklocal+0x1cc/0x590 net/ipv6/addrconf.c:3215
>> addrconf_addr_gen+0x326/0x370 net/ipv6/addrconf.c:3346
>> addrconf_dev_config+0x255/0x410 net/ipv6/addrconf.c:3391
>> addrconf_notify+0xfb6/0x1c80 net/ipv6/addrconf.c:3635
>> notifier_call_chain+0xb5/0x200 kernel/notifier.c:87
>> call_netdevice_notifiers_info+0xb5/0x130 net/core/dev.c:1944
>> netdev_state_change net/core/dev.c:1319 [inline]
>> netdev_state_change+0x104/0x130 net/core/dev.c:1312
>> linkwatch_do_dev+0x10e/0x150 net/core/link_watch.c:182
>> __linkwatch_run_queue+0x23f/0x6a0 net/core/link_watch.c:235
>> linkwatch_event+0x4e/0x70 net/core/link_watch.c:278
>> process_one_work+0x9bf/0x1710 kernel/workqueue.c:2289
>> worker_thread+0x669/0x1090 kernel/workqueue.c:2436
>> kthread+0x2e8/0x3a0 kernel/kthread.c:376
>> ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308
>>
>> Second to last potentially related work creation:
>> kasan_save_stack+0x22/0x40 mm/kasan/common.c:45
>> __kasan_record_aux_stack+0x7b/0x90 mm/kasan/generic.c:488
>> insert_work+0x48/0x350 kernel/workqueue.c:1358
>> __queue_work+0x693/0x13b0 kernel/workqueue.c:1517
>> queue_work_on+0xf2/0x110 kernel/workqueue.c:1545
>> queue_work include/linux/workqueue.h:503 [inline]
>> netdevice_queue_work drivers/infiniband/core/roce_gid_mgmt.c:659 [inline]
>> netdevice_event+0x5e9/0x8f0 drivers/infiniband/core/roce_gid_mgmt.c:802
>> notifier_call_chain+0xb5/0x200 kernel/notifier.c:87
>> call_netdevice_notifiers_info+0xb5/0x130 net/core/dev.c:1944
>> call_netdevice_notifiers_extack net/core/dev.c:1982 [inline]
>> call_netdevice_notifiers net/core/dev.c:1996 [inline]
>> register_netdevice+0xfb4/0x1640 net/core/dev.c:10078
>> bond_newlink drivers/net/bonding/bond_netlink.c:560 [inline]
>> bond_newlink+0x4b/0xa0 drivers/net/bonding/bond_netlink.c:550
>> rtnl_newlink_create net/core/rtnetlink.c:3407 [inline]
>> __rtnl_newlink+0x10c2/0x1840 net/core/rtnetlink.c:3624
>> rtnl_newlink+0x68/0xa0 net/core/rtnetlink.c:3637
>> rtnetlink_rcv_msg+0x43e/0xca0 net/core/rtnetlink.c:6141
>> netlink_rcv_skb+0x165/0x440 net/netlink/af_netlink.c:2564
>> netlink_unicast_kernel net/netlink/af_netlink.c:1330 [inline]
>> netlink_unicast+0x547/0x7f0 net/netlink/af_netlink.c:1356
>> netlink_sendmsg+0x91b/0xe10 net/netlink/af_netlink.c:1932
>> sock_sendmsg_nosec net/socket.c:714 [inline]
>> sock_sendmsg+0xd3/0x120 net/socket.c:734
>> __sys_sendto+0x23a/0x340 net/socket.c:2117
>> __do_sys_sendto net/socket.c:2129 [inline]
>> __se_sys_sendto net/socket.c:2125 [inline]
>> __x64_sys_sendto+0xe1/0x1b0 net/socket.c:2125
>> do_syscall_x64 arch/x86/entry/common.c:50 [inline]
>> do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80
>> entry_SYSCALL_64_after_hwframe+0x63/0xcd
>>
>> The buggy address belongs to the object at ffff88802055e300
>> which belongs to the cache kmalloc-192 of size 192
>> The buggy address is located 96 bytes inside of
>> 192-byte region [ffff88802055e300, ffff88802055e3c0)
>>
>> The buggy address belongs to the physical page:
>> page:ffffea0000815780 refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff88802055ef00 pfn:0x2055e
>> flags: 0xfff00000000200(slab|node=0|zone=1|lastcpupid=0x7ff)
>> raw: 00fff00000000200 ffff888012040000 ffffea0000873fd0 ffffea000083f490
>> raw: ffff88802055ef00 ffff88802055e000 000000010000000e 0000000000000000
>> page dumped because: kasan: bad access detected
>> page_owner tracks the page as allocated
>> page last allocated via order 0, migratetype Unmovable, gfp_mask 0x2420c0(__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_COMP|__GFP_THISNODE), pid 1, tgid 1 (swapper/0), ts 8303233753, free_ts 8269599359
>> prep_new_page mm/page_alloc.c:2531 [inline]
>> get_page_from_freelist+0x119c/0x2ce0 mm/page_alloc.c:4283
>> __alloc_pages+0x1cb/0x5b0 mm/page_alloc.c:5549
>> __alloc_pages_node include/linux/gfp.h:237 [inline]
>> kmem_getpages mm/slab.c:1363 [inline]
>> cache_grow_begin+0x94/0x390 mm/slab.c:2574
>> cache_alloc_refill+0x27f/0x380 mm/slab.c:2947
>> ____cache_alloc mm/slab.c:3023 [inline]
>> ____cache_alloc mm/slab.c:3006 [inline]
>> __do_cache_alloc mm/slab.c:3206 [inline]
>> slab_alloc_node mm/slab.c:3254 [inline]
>> __kmem_cache_alloc_node+0x44f/0x510 mm/slab.c:3544
>> kmalloc_trace+0x26/0x60 mm/slab_common.c:1062
>> kmalloc include/linux/slab.h:580 [inline]
>> kzalloc include/linux/slab.h:720 [inline]
>> call_usermodehelper_setup+0x9c/0x340 kernel/umh.c:366
>> kobject_uevent_env+0xed3/0x1620 lib/kobject_uevent.c:614
>> device_add+0xb76/0x1e90 drivers/base/core.c:3498
>> rfkill_register+0x1a9/0xb00 net/rfkill/core.c:1070
>> wiphy_register+0x24ae/0x2ae0 net/wireless/core.c:1007
>> virt_wifi_make_wiphy drivers/net/wireless/virt_wifi.c:383 [inline]
>> virt_wifi_init_module+0x352/0x3da drivers/net/wireless/virt_wifi.c:665
>> do_one_initcall+0x141/0x790 init/main.c:1306
>> do_initcall_level init/main.c:1379 [inline]
>> do_initcalls init/main.c:1395 [inline]
>> do_basic_setup init/main.c:1414 [inline]
>> kernel_init_freeable+0x6f9/0x782 init/main.c:1634
>> kernel_init+0x1e/0x1d0 init/main.c:1522
>> ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308
>> page last free stack trace:
>> reset_page_owner include/linux/page_owner.h:24 [inline]
>> free_pages_prepare mm/page_alloc.c:1446 [inline]
>> free_pcp_prepare+0x65c/0xc00 mm/page_alloc.c:1496
>> free_unref_page_prepare mm/page_alloc.c:3369 [inline]
>> free_unref_page+0x1d/0x490 mm/page_alloc.c:3464
>> __vunmap+0x85d/0xd30 mm/vmalloc.c:2727
>> free_work+0x5c/0x80 mm/vmalloc.c:100
>> process_one_work+0x9bf/0x1710 kernel/workqueue.c:2289
>> worker_thread+0x669/0x1090 kernel/workqueue.c:2436
>> kthread+0x2e8/0x3a0 kernel/kthread.c:376
>> ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308
>>
>> Memory state around the buggy address:
>> ffff88802055e200: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> ffff88802055e280: 00 00 00 00 fc fc fc fc fc fc fc fc fc fc fc fc
>> > ffff88802055e300: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>> ^
>> ffff88802055e380: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
>> ffff88802055e400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> ==================================================================
>
> @Eelco, @Aaron: could you please have a look? I think the
> ovs_dp_cmd_new() error path does not clean correctly the to-be-deleted
> datapath, possibly a ovs_dp_detach_port() call for the internal one is
> missing.
Okay - I'll have a look. I guess there could be some missing protection
as well during allocation / reference. Either way, building a KASAN
kernel now. Thanks for the report!
> Thanks,
>
> Paolo
^ permalink raw reply
* Re: [net-next] ipv6: fix routing cache overflow for raw sockets
From: Julian Anastasov @ 2022-12-20 15:41 UTC (permalink / raw)
To: Paolo Abeni
Cc: Jon Maxwell, davem, edumazet, kuba, yoshfuji, dsahern, netdev,
linux-kernel
In-Reply-To: <9f145202ca6a59b48d4430ed26a7ab0fe4c5dfaf.camel@redhat.com>
Hello,
On Tue, 20 Dec 2022, Paolo Abeni wrote:
> Are other FLOWI_FLAG_KNOWN_NH users affected, too? e.g. nf_dup_ipv6,
> ipvs, seg6?
I forgot to mention one thing: IPVS can cache such routes in
its own storage, one per backend server, it still calls dst->ops->check
for them. So, such route can live for long time, that is why they were
created as uncached. So, IPVS requests one route, remembers it and then
can attach it to multiple packets for this backend server with
skb_dst_set_noref. So, IPVS have to use 4096 backend servers to
hit this limit.
It does not look correct in this patch to invalidate the
FLOWI_FLAG_KNOWN_NH flag with a FLOWI_FLAG_SKIP_RAW flag. The
same thing would be to not set FLOWI_FLAG_KNOWN_NH which is
wrong for the hdrincl case.
Regards
--
Julian Anastasov <ja@ssi.bg>
^ permalink raw reply
* Re: [PATCH v3 1/3] net: phy: add EXPORT_SYMBOL to phy_disable_interrupts()
From: Enguerrand de Ribaucourt @ 2022-12-20 15:48 UTC (permalink / raw)
To: Heiner Kallweit, Andrew Lunn
Cc: netdev, Paolo Abeni, woojung huh, davem, UNGLinuxDriver,
Russell King - ARM Linux
In-Reply-To: <cc720a28-9e73-7c88-86af-8814b02ee580@gmail.com>
> From: "Heiner Kallweit" <hkallweit1@gmail.com>
> To: "Enguerrand de Ribaucourt" <enguerrand.de-ribaucourt@savoirfairelinux.com>
> Cc: "netdev" <netdev@vger.kernel.org>, "Paolo Abeni" <pabeni@redhat.com>, "woojung huh" <woojung.huh@microchip.com>,
> "davem" <davem@davemloft.net>, "UNGLinuxDriver" <UNGLinuxDriver@microchip.com>, "Andrew Lunn" <andrew@lunn.ch>,
> "Russell King - ARM Linux" <linux@armlinux.org.uk>
> Sent: Tuesday, December 20, 2022 4:19:40 PM
> Subject: Re: [PATCH v3 1/3] net: phy: add EXPORT_SYMBOL to phy_disable_interrupts()
> On 20.12.2022 16:02, Enguerrand de Ribaucourt wrote:
> >> From: "Heiner Kallweit" <hkallweit1@gmail.com>
> >> To: "Enguerrand de Ribaucourt" <enguerrand.de-ribaucourt@savoirfairelinux.com>,
> >> "netdev" <netdev@vger.kernel.org>
> >> Cc: "Paolo Abeni" <pabeni@redhat.com>, "woojung huh"
> >> <woojung.huh@microchip.com>, "davem" <davem@davemloft.net>, "UNGLinuxDriver"
> >> <UNGLinuxDriver@microchip.com>, "Andrew Lunn" <andrew@lunn.ch>, "Russell King -
> >> ARM Linux" <linux@armlinux.org.uk>
> >> Sent: Tuesday, December 20, 2022 3:40:15 PM
> >> Subject: Re: [PATCH v3 1/3] net: phy: add EXPORT_SYMBOL to
> >> phy_disable_interrupts()
> >> On 20.12.2022 14:19, Enguerrand de Ribaucourt wrote:
> >>> It seems EXPORT_SYMBOL was forgotten when phy_disable_interrupts() was
> >>> made non static. For consistency with the other exported functions in
> >>> this file, EXPORT_SYMBOL should be used.
> >> No, it wasn't forgotten. It's intentional. The function is supposed to
> >> be used within phylib only.
> >> None of the phylib maintainers was on the addressee list of your patch.
> >> Seems you didn't check with get_maintainers.pl.
> >> You should explain your use case to the phylib maintainers. Maybe lan78xx
> >> uses phylib in a wrong way, maybe an extension to phylib is needed.
> >> Best start with explaining why lan78xx_link_status_change() needs to
> >> fiddle with the PHY interrupt. It would help be helpful to understand
> >> what "chip" refers to in the comment. The MAC, or the PHY?
> >> Does the lan78xx code assume that a specific PHY is used, and the
> >> functionality would actually belong to the respective PHY driver?
> > Thank you for your swift reply,
> > The requirement to toggle the PHY interrupt in lan78xx_link_status_change() (the
> > LAN7801 MAC driver) comes from a workaround by the original author which resets
> > the fixed speed in the PHY when the Ethernet cable is swapped. According to his
> > message, the link could not be correctly setup without this workaround.
> > Unfortunately, I don't have the cables to test the code without the workaround
> > and it's description doesn't explain what problem happens more precisely.
> > The PHY the original author used is a LAN8835. The workaround code directly
> > modified the interrupt configuration registers of this LAN8835 PHY within
> > lan78xx_link_status_change(). This caused problems if a different PHY was used
> > because the register at this address did not correspond to the interrupts
> > configuration. As suggested by the lan78xx.c maintainer, a generic function
> > should be used instead to toggle the interrupts of the PHY. However, it seems
> > that maybe the MAC driver shouldn't meddle with the PHY's interrupts according
> > to you. Would you consider this use case a valid one?
> So this workaround works around a silicon bug in LAN8835?
> Then the code supposedly should go to the link_change_notify handler of the
> Microchip PHY driver for LAN8835.
> There's just a generic PHY driver for LAN88xx. Would be helpful to know
> which Microchip PHY's are affected.
link_change_notify() seems very promising indeed!
My proposed approach would be to copy the original workaround actions
within link_change_notify():
1. disable interrupts
2. reset speed
3. enable interrupts
However, I don't have access to the LAN8835 to test if this would work. I also
don't have knowledge about which other Microchip PHYs could be impacted. Maybe
there is an active Microchip developer we could communicate with to find out?
Either way, it now seems clear that the LAN8835 interrupt code should be
removed from lan78xx.c.
> > Enguerrand
> >>> Fixes: 3dd4ef1bdbac ("net: phy: make phy_disable_interrupts() non-static")
> >>> Signed-off-by: Enguerrand de Ribaucourt
> >>> <enguerrand.de-ribaucourt@savoirfairelinux.com>
> >>> ---
> >>> drivers/net/phy/phy.c | 1 +
> >>> 1 file changed, 1 insertion(+)
> >>> diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
> >>> index e5b6cb1a77f9..33250da76466 100644
> >>> --- a/drivers/net/phy/phy.c
> >>> +++ b/drivers/net/phy/phy.c
> >>> @@ -992,6 +992,7 @@ int phy_disable_interrupts(struct phy_device *phydev)
> >>> /* Disable PHY interrupts */
> >>> return phy_config_interrupt(phydev, PHY_INTERRUPT_DISABLED);
> >>> }
> >>> +EXPORT_SYMBOL(phy_disable_interrupts);
> >>> /**
> >>> * phy_interrupt - PHY interrupt handler
^ permalink raw reply
* Re: Possible race with xsk_flush
From: Magnus Karlsson @ 2022-12-20 15:59 UTC (permalink / raw)
To: Shawn Bohrer; +Cc: netdev, bpf, bjorn, magnus.karlsson, kernel-team
In-Reply-To: <Y6HUBWCytyebNnOx@sbohrer-cf-dell>
On Tue, Dec 20, 2022 at 4:26 PM Shawn Bohrer <sbohrer@cloudflare.com> wrote:
>
> On Tue, Dec 20, 2022 at 10:06:48AM +0100, Magnus Karlsson wrote:
> > On Tue, Dec 20, 2022 at 2:32 AM Shawn Bohrer <sbohrer@cloudflare.com> wrote:
> > >
> > > On Fri, Dec 16, 2022 at 11:05:19AM +0100, Magnus Karlsson wrote:
> > > > To summarize, we are expecting this ordering:
> > > >
> > > > CPU 0 __xsk_rcv_zc()
> > > > CPU 0 __xsk_map_flush()
> > > > CPU 2 __xsk_rcv_zc()
> > > > CPU 2 __xsk_map_flush()
> > > >
> > > > But we are seeing this order:
> > > >
> > > > CPU 0 __xsk_rcv_zc()
> > > > CPU 2 __xsk_rcv_zc()
> > > > CPU 0 __xsk_map_flush()
> > > > CPU 2 __xsk_map_flush()
> > > >
> > > > Here is the veth NAPI poll loop:
> > > >
> > > > static int veth_poll(struct napi_struct *napi, int budget)
> > > > {
> > > > struct veth_rq *rq =
> > > > container_of(napi, struct veth_rq, xdp_napi);
> > > > struct veth_stats stats = {};
> > > > struct veth_xdp_tx_bq bq;
> > > > int done;
> > > >
> > > > bq.count = 0;
> > > >
> > > > xdp_set_return_frame_no_direct();
> > > > done = veth_xdp_rcv(rq, budget, &bq, &stats);
> > > >
> > > > if (done < budget && napi_complete_done(napi, done)) {
> > > > /* Write rx_notify_masked before reading ptr_ring */
> > > > smp_store_mb(rq->rx_notify_masked, false);
> > > > if (unlikely(!__ptr_ring_empty(&rq->xdp_ring))) {
> > > > if (napi_schedule_prep(&rq->xdp_napi)) {
> > > > WRITE_ONCE(rq->rx_notify_masked, true);
> > > > __napi_schedule(&rq->xdp_napi);
> > > > }
> > > > }
> > > > }
> > > >
> > > > if (stats.xdp_tx > 0)
> > > > veth_xdp_flush(rq, &bq);
> > > > if (stats.xdp_redirect > 0)
> > > > xdp_do_flush();
> > > > xdp_clear_return_frame_no_direct();
> > > >
> > > > return done;
> > > > }
> > > >
> > > > Something I have never seen before is that there is
> > > > napi_complete_done() and a __napi_schedule() before xdp_do_flush().
> > > > Let us check if this has something to do with it. So two suggestions
> > > > to be executed separately:
> > > >
> > > > * Put a probe at the __napi_schedule() above and check if it gets
> > > > triggered before this problem
> > > > * Move the "if (stats.xdp_redirect > 0) xdp_do_flush();" to just
> > > > before "if (done < budget && napi_complete_done(napi, done)) {"
> > > >
> > > > This might provide us some hints on what is going on.
> > >
> > > After staring at this code for way too long I finally made a
> > > breakthrough! I could not understand how this race could occur when
> > > napi_poll() calls netpoll_poll_lock(). Here is netpoll_poll_lock():
> > >
> > > ```
> > > static inline void *netpoll_poll_lock(struct napi_struct *napi)
> > > {
> > > struct net_device *dev = napi->dev;
> > >
> > > if (dev && dev->npinfo) {
> > > int owner = smp_processor_id();
> > >
> > > while (cmpxchg(&napi->poll_owner, -1, owner) != -1)
> > > cpu_relax();
> > >
> > > return napi;
> > > }
> > > return NULL;
> > > }
> > > ```
> > > If dev or dev->npinfo are NULL then it doesn't acquire a lock at all!
> > > Adding some more trace points I see:
> > >
> > > ```
> > > iperf2-1325 [002] ..s1. 264246.626880: __napi_poll: (__napi_poll+0x0/0x150) n=0xffff91c885bff000 poll_owner=-1 dev=0xffff91c881d4e000 npinfo=0x0
> > > iperf2-1325 [002] d.Z1. 264246.626882: __xsk_rcv_zc_L7: (__xsk_rcv_zc+0x3b/0xc0) addr=0x1503100 len=0x42 xs=0xffff91c8bfe77000 fq=0xffff91c8c1a43f80 dev=0xffff91c881d4e000
> > > iperf2-1325 [002] d.Z1. 264246.626883: __xsk_rcv_zc_L7: (__xsk_rcv_zc+0x42/0xc0) addr=0x1503100 len=0x42 xs=0xffff91c8bfe77000 fq=0xffff91c8c1a43f80 dev=0xffff91c881d4e000
> > > iperf2-1325 [002] d.Z1. 264246.626884: xsk_flush: (__xsk_map_flush+0x32/0xb0) xs=0xffff91c8bfe77000
> > > ```
> > >
> > > Here you can see that poll_owner=-1 meaning the lock was never
> > > acquired because npinfo is NULL. This means that the same veth rx
> > > queue can be napi_polled from multiple CPU and nothing stops it from
> > > running concurrently. They all look like this, just most of the time
> > > there aren't concurrent napi_polls running for the same queue. They
> > > do however move around CPUs as I explained earlier.
> > >
> > > I'll note that I've ran with your suggested change of moving
> > > xdp_do_flush() before napi_complete_done() all weekend and I have not
> > > reproduced the issue. I don't know if that truly means the issue is
> > > fixed by that change or not. I suspect it does fix the issue because
> > > it prevents the napi_struct from being scheduled again before the
> > > first poll has completed, and nap_schedule_prep() ensures that only
> > > one instance is ever running.
> >
> > Thanks Shawn! Good news that the patch seems to fix the problem. To
> > me, napi_schedule_prep() makes sure that only one NAPI instance is
> > running. Netpoll is an optional feature and I do not even have it
> > compiled into my kernel. At least I do not have it defined in my
> > .config and I cannot find any netpoll symbols with a readelf command.
> > If netpoll is not used, I would suspect that npinfo == NULL. So to me,
> > it is still a mystery why this is happening.
>
> Oh I don't think it is a mystery anymore. The napi_complete_done()
> signals that this instance of of the napi_poll is complete. As you
> said nap_schedule_prep() checks to ensure that only one instance of
> napi_poll is running at a time, but we just indicated it was done with
> napi_complete_done(). This allows this CPU or more importantly any
> other CPU to reschedule napi polling for this receive queue, but we
> haven't called xdp_do_flush() yet so the flush can race. I'll note
> that the napi_schedule_prep()/__napi_schedule() in veth_poll really
> isn't the problem since it will schedule itself back on the same CPU.
> The problem is simply that another CPU is free to call
> napi_scheulde_prep()/__napi_schedule() in that window after
> napi_complete_done() and before xdp_do_flush(). The veth driver can
> schedule a napi_poll from the transmit path which is what starts the
> poll on a second CPU.
Bingo! Would you like to prepare a patch or would you like me to do
it? This has been broken since the support was put in veth in 4.19.
> I was simply blinded by that stupid netpoll_poll_lock() but it is
> completely unnecessary.
>
> > To validate or disprove that there are two instances of the same napi
> > running, could you please put a probe in veth_poll() right after
> > veth_xdp_rcv() and one at xdp_do_flush() and dump the napi id in both
> > cases? And run the original code with the bug :-). It would be good to
> > know what exactly happens in the code between these points when this
> > bug occurs. Maybe we could do this by enabling the trace buffer and
> > dumping it when the bug occurs.
>
> The two instances that race are definitely different napi_polls, the
> second one is scheduled from the veth transmit path.
>
> > > If we think this is the correct fix I'll let it run for another day or
> > > two and prepare a patch.
> >
> > There is one more advantage to this bug fix and that is performance.
> > By calling __xdp_map_flush() immediately after the receive function,
> > user space can start to work on the packets quicker so performance
> > will improve.
>
> Yes definitely.
>
> Thanks,
> Shawn Bohrer
^ permalink raw reply
* Re: Possible race with xsk_flush
From: Shawn Bohrer @ 2022-12-20 16:02 UTC (permalink / raw)
To: Magnus Karlsson; +Cc: netdev, bpf, bjorn, magnus.karlsson, kernel-team
In-Reply-To: <CAJ8uoz3r=2jo0EwMz1iAsatR59qFKoLw4mcK7w+TpE387s_Y-g@mail.gmail.com>
On Tue, Dec 20, 2022 at 04:59:30PM +0100, Magnus Karlsson wrote:
> On Tue, Dec 20, 2022 at 4:26 PM Shawn Bohrer <sbohrer@cloudflare.com> wrote:
> >
> > On Tue, Dec 20, 2022 at 10:06:48AM +0100, Magnus Karlsson wrote:
> > > On Tue, Dec 20, 2022 at 2:32 AM Shawn Bohrer <sbohrer@cloudflare.com> wrote:
> > > >
> > > > On Fri, Dec 16, 2022 at 11:05:19AM +0100, Magnus Karlsson wrote:
> > > > > To summarize, we are expecting this ordering:
> > > > >
> > > > > CPU 0 __xsk_rcv_zc()
> > > > > CPU 0 __xsk_map_flush()
> > > > > CPU 2 __xsk_rcv_zc()
> > > > > CPU 2 __xsk_map_flush()
> > > > >
> > > > > But we are seeing this order:
> > > > >
> > > > > CPU 0 __xsk_rcv_zc()
> > > > > CPU 2 __xsk_rcv_zc()
> > > > > CPU 0 __xsk_map_flush()
> > > > > CPU 2 __xsk_map_flush()
> > > > >
> > > > > Here is the veth NAPI poll loop:
> > > > >
> > > > > static int veth_poll(struct napi_struct *napi, int budget)
> > > > > {
> > > > > struct veth_rq *rq =
> > > > > container_of(napi, struct veth_rq, xdp_napi);
> > > > > struct veth_stats stats = {};
> > > > > struct veth_xdp_tx_bq bq;
> > > > > int done;
> > > > >
> > > > > bq.count = 0;
> > > > >
> > > > > xdp_set_return_frame_no_direct();
> > > > > done = veth_xdp_rcv(rq, budget, &bq, &stats);
> > > > >
> > > > > if (done < budget && napi_complete_done(napi, done)) {
> > > > > /* Write rx_notify_masked before reading ptr_ring */
> > > > > smp_store_mb(rq->rx_notify_masked, false);
> > > > > if (unlikely(!__ptr_ring_empty(&rq->xdp_ring))) {
> > > > > if (napi_schedule_prep(&rq->xdp_napi)) {
> > > > > WRITE_ONCE(rq->rx_notify_masked, true);
> > > > > __napi_schedule(&rq->xdp_napi);
> > > > > }
> > > > > }
> > > > > }
> > > > >
> > > > > if (stats.xdp_tx > 0)
> > > > > veth_xdp_flush(rq, &bq);
> > > > > if (stats.xdp_redirect > 0)
> > > > > xdp_do_flush();
> > > > > xdp_clear_return_frame_no_direct();
> > > > >
> > > > > return done;
> > > > > }
> > > > >
> > > > > Something I have never seen before is that there is
> > > > > napi_complete_done() and a __napi_schedule() before xdp_do_flush().
> > > > > Let us check if this has something to do with it. So two suggestions
> > > > > to be executed separately:
> > > > >
> > > > > * Put a probe at the __napi_schedule() above and check if it gets
> > > > > triggered before this problem
> > > > > * Move the "if (stats.xdp_redirect > 0) xdp_do_flush();" to just
> > > > > before "if (done < budget && napi_complete_done(napi, done)) {"
> > > > >
> > > > > This might provide us some hints on what is going on.
> > > >
> > > > After staring at this code for way too long I finally made a
> > > > breakthrough! I could not understand how this race could occur when
> > > > napi_poll() calls netpoll_poll_lock(). Here is netpoll_poll_lock():
> > > >
> > > > ```
> > > > static inline void *netpoll_poll_lock(struct napi_struct *napi)
> > > > {
> > > > struct net_device *dev = napi->dev;
> > > >
> > > > if (dev && dev->npinfo) {
> > > > int owner = smp_processor_id();
> > > >
> > > > while (cmpxchg(&napi->poll_owner, -1, owner) != -1)
> > > > cpu_relax();
> > > >
> > > > return napi;
> > > > }
> > > > return NULL;
> > > > }
> > > > ```
> > > > If dev or dev->npinfo are NULL then it doesn't acquire a lock at all!
> > > > Adding some more trace points I see:
> > > >
> > > > ```
> > > > iperf2-1325 [002] ..s1. 264246.626880: __napi_poll: (__napi_poll+0x0/0x150) n=0xffff91c885bff000 poll_owner=-1 dev=0xffff91c881d4e000 npinfo=0x0
> > > > iperf2-1325 [002] d.Z1. 264246.626882: __xsk_rcv_zc_L7: (__xsk_rcv_zc+0x3b/0xc0) addr=0x1503100 len=0x42 xs=0xffff91c8bfe77000 fq=0xffff91c8c1a43f80 dev=0xffff91c881d4e000
> > > > iperf2-1325 [002] d.Z1. 264246.626883: __xsk_rcv_zc_L7: (__xsk_rcv_zc+0x42/0xc0) addr=0x1503100 len=0x42 xs=0xffff91c8bfe77000 fq=0xffff91c8c1a43f80 dev=0xffff91c881d4e000
> > > > iperf2-1325 [002] d.Z1. 264246.626884: xsk_flush: (__xsk_map_flush+0x32/0xb0) xs=0xffff91c8bfe77000
> > > > ```
> > > >
> > > > Here you can see that poll_owner=-1 meaning the lock was never
> > > > acquired because npinfo is NULL. This means that the same veth rx
> > > > queue can be napi_polled from multiple CPU and nothing stops it from
> > > > running concurrently. They all look like this, just most of the time
> > > > there aren't concurrent napi_polls running for the same queue. They
> > > > do however move around CPUs as I explained earlier.
> > > >
> > > > I'll note that I've ran with your suggested change of moving
> > > > xdp_do_flush() before napi_complete_done() all weekend and I have not
> > > > reproduced the issue. I don't know if that truly means the issue is
> > > > fixed by that change or not. I suspect it does fix the issue because
> > > > it prevents the napi_struct from being scheduled again before the
> > > > first poll has completed, and nap_schedule_prep() ensures that only
> > > > one instance is ever running.
> > >
> > > Thanks Shawn! Good news that the patch seems to fix the problem. To
> > > me, napi_schedule_prep() makes sure that only one NAPI instance is
> > > running. Netpoll is an optional feature and I do not even have it
> > > compiled into my kernel. At least I do not have it defined in my
> > > .config and I cannot find any netpoll symbols with a readelf command.
> > > If netpoll is not used, I would suspect that npinfo == NULL. So to me,
> > > it is still a mystery why this is happening.
> >
> > Oh I don't think it is a mystery anymore. The napi_complete_done()
> > signals that this instance of of the napi_poll is complete. As you
> > said nap_schedule_prep() checks to ensure that only one instance of
> > napi_poll is running at a time, but we just indicated it was done with
> > napi_complete_done(). This allows this CPU or more importantly any
> > other CPU to reschedule napi polling for this receive queue, but we
> > haven't called xdp_do_flush() yet so the flush can race. I'll note
> > that the napi_schedule_prep()/__napi_schedule() in veth_poll really
> > isn't the problem since it will schedule itself back on the same CPU.
> > The problem is simply that another CPU is free to call
> > napi_scheulde_prep()/__napi_schedule() in that window after
> > napi_complete_done() and before xdp_do_flush(). The veth driver can
> > schedule a napi_poll from the transmit path which is what starts the
> > poll on a second CPU.
>
> Bingo! Would you like to prepare a patch or would you like me to do
> it? This has been broken since the support was put in veth in 4.19.
I'll prepare the patch today. I feel like I've spent enough of my
life chasing this down and appreciate your help!
Thanks,
Shawn Bohrer
^ permalink raw reply
* Re: [syzbot] kernel BUG in rxrpc_put_peer
From: David Howells @ 2022-12-20 16:02 UTC (permalink / raw)
To: syzbot
Cc: dhowells, davem, edumazet, kuba, linux-afs, linux-kernel,
marc.dionne, netdev, pabeni, syzkaller-bugs
In-Reply-To: <0000000000002b4a9f05ef2b616f@google.com>
#syz test: https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/ 2bc808999a4484720747bbfcb03f9eb4a36223f0
^ permalink raw reply
* Re: [PATCH v3 1/3] net: phy: add EXPORT_SYMBOL to phy_disable_interrupts()
From: Russell King (Oracle) @ 2022-12-20 16:13 UTC (permalink / raw)
To: Enguerrand de Ribaucourt
Cc: Heiner Kallweit, netdev, Paolo Abeni, woojung huh, davem,
UNGLinuxDriver, Andrew Lunn
In-Reply-To: <1721908413.470634.1671548576554.JavaMail.zimbra@savoirfairelinux.com>
Hi,
On Tue, Dec 20, 2022 at 10:02:56AM -0500, Enguerrand de Ribaucourt wrote:
> > From: "Heiner Kallweit" <hkallweit1@gmail.com>
> > To: "Enguerrand de Ribaucourt" <enguerrand.de-ribaucourt@savoirfairelinux.com>,
> > "netdev" <netdev@vger.kernel.org>
> > Cc: "Paolo Abeni" <pabeni@redhat.com>, "woojung huh"
> > <woojung.huh@microchip.com>, "davem" <davem@davemloft.net>, "UNGLinuxDriver"
> > <UNGLinuxDriver@microchip.com>, "Andrew Lunn" <andrew@lunn.ch>, "Russell King -
> > ARM Linux" <linux@armlinux.org.uk>
> > Sent: Tuesday, December 20, 2022 3:40:15 PM
> > Subject: Re: [PATCH v3 1/3] net: phy: add EXPORT_SYMBOL to
> > phy_disable_interrupts()
>
> > On 20.12.2022 14:19, Enguerrand de Ribaucourt wrote:
> > > It seems EXPORT_SYMBOL was forgotten when phy_disable_interrupts() was
> > > made non static. For consistency with the other exported functions in
> > > this file, EXPORT_SYMBOL should be used.
>
> > No, it wasn't forgotten. It's intentional. The function is supposed to
> > be used within phylib only.
>
> > None of the phylib maintainers was on the addressee list of your patch.
> > Seems you didn't check with get_maintainers.pl.
>
> > You should explain your use case to the phylib maintainers. Maybe lan78xx
> > uses phylib in a wrong way, maybe an extension to phylib is needed.
> > Best start with explaining why lan78xx_link_status_change() needs to
> > fiddle with the PHY interrupt. It would help be helpful to understand
> > what "chip" refers to in the comment. The MAC, or the PHY?
> > Does the lan78xx code assume that a specific PHY is used, and the
> > functionality would actually belong to the respective PHY driver?
>
> Thank you for your swift reply,
>
> The requirement to toggle the PHY interrupt in lan78xx_link_status_change() (the
> LAN7801 MAC driver) comes from a workaround by the original author which resets
> the fixed speed in the PHY when the Ethernet cable is swapped. According to his
> message, the link could not be correctly setup without this workaround.
>
> Unfortunately, I don't have the cables to test the code without the workaround
> and it's description doesn't explain what problem happens more precisely.
>
> The PHY the original author used is a LAN8835. The workaround code directly
> modified the interrupt configuration registers of this LAN8835 PHY within
> lan78xx_link_status_change(). This caused problems if a different PHY was used
> because the register at this address did not correspond to the interrupts
> configuration. As suggested by the lan78xx.c maintainer, a generic function
> should be used instead to toggle the interrupts of the PHY. However, it seems
> that maybe the MAC driver shouldn't meddle with the PHY's interrupts according
> to you. Would you consider this use case a valid one?
This sounds to me like you're just trying to get a workaround merged
upstream using a different approach, but you can't actually test to
see whether it does work or not. Would that be a fair assessment?
Do you know anyone who would be able to test? If not, I would suggest
not trying to upstream code for a workaround that can't be tested as
working.
Thanks.
--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!
^ permalink raw reply
* Re: [PATCH v3 1/3] net: phy: add EXPORT_SYMBOL to phy_disable_interrupts()
From: Enguerrand de Ribaucourt @ 2022-12-20 16:22 UTC (permalink / raw)
To: Russell King (Oracle)
Cc: Heiner Kallweit, netdev, Paolo Abeni, woojung huh, davem,
UNGLinuxDriver, Andrew Lunn
In-Reply-To: <Y6HfK7bnkL2uyp3m@shell.armlinux.org.uk>
----- Original Message -----
> From: "Russell King (Oracle)" <linux@armlinux.org.uk>
> To: "Enguerrand de Ribaucourt" <enguerrand.de-ribaucourt@savoirfairelinux.com>
> Cc: "Heiner Kallweit" <hkallweit1@gmail.com>, "netdev" <netdev@vger.kernel.org>, "Paolo Abeni" <pabeni@redhat.com>,
> "woojung huh" <woojung.huh@microchip.com>, "davem" <davem@davemloft.net>, "UNGLinuxDriver"
> <UNGLinuxDriver@microchip.com>, "Andrew Lunn" <andrew@lunn.ch>
> Sent: Tuesday, December 20, 2022 5:13:31 PM
> Subject: Re: [PATCH v3 1/3] net: phy: add EXPORT_SYMBOL to phy_disable_interrupts()
> Hi,
> On Tue, Dec 20, 2022 at 10:02:56AM -0500, Enguerrand de Ribaucourt wrote:
> > > From: "Heiner Kallweit" <hkallweit1@gmail.com>
> > > To: "Enguerrand de Ribaucourt" <enguerrand.de-ribaucourt@savoirfairelinux.com>,
> > > "netdev" <netdev@vger.kernel.org>
> > > Cc: "Paolo Abeni" <pabeni@redhat.com>, "woojung huh"
> > > <woojung.huh@microchip.com>, "davem" <davem@davemloft.net>, "UNGLinuxDriver"
> > > <UNGLinuxDriver@microchip.com>, "Andrew Lunn" <andrew@lunn.ch>, "Russell King -
> > > ARM Linux" <linux@armlinux.org.uk>
> > > Sent: Tuesday, December 20, 2022 3:40:15 PM
> > > Subject: Re: [PATCH v3 1/3] net: phy: add EXPORT_SYMBOL to
> > > phy_disable_interrupts()
> > > On 20.12.2022 14:19, Enguerrand de Ribaucourt wrote:
> > > > It seems EXPORT_SYMBOL was forgotten when phy_disable_interrupts() was
> > > > made non static. For consistency with the other exported functions in
> > > > this file, EXPORT_SYMBOL should be used.
> > > No, it wasn't forgotten. It's intentional. The function is supposed to
> > > be used within phylib only.
> > > None of the phylib maintainers was on the addressee list of your patch.
> > > Seems you didn't check with get_maintainers.pl.
> > > You should explain your use case to the phylib maintainers. Maybe lan78xx
> > > uses phylib in a wrong way, maybe an extension to phylib is needed.
> > > Best start with explaining why lan78xx_link_status_change() needs to
> > > fiddle with the PHY interrupt. It would help be helpful to understand
> > > what "chip" refers to in the comment. The MAC, or the PHY?
> > > Does the lan78xx code assume that a specific PHY is used, and the
> > > functionality would actually belong to the respective PHY driver?
> > Thank you for your swift reply,
> > The requirement to toggle the PHY interrupt in lan78xx_link_status_change() (the
> > LAN7801 MAC driver) comes from a workaround by the original author which resets
> > the fixed speed in the PHY when the Ethernet cable is swapped. According to his
> > message, the link could not be correctly setup without this workaround.
> > Unfortunately, I don't have the cables to test the code without the workaround
> > and it's description doesn't explain what problem happens more precisely.
> > The PHY the original author used is a LAN8835. The workaround code directly
> > modified the interrupt configuration registers of this LAN8835 PHY within
> > lan78xx_link_status_change(). This caused problems if a different PHY was used
> > because the register at this address did not correspond to the interrupts
> > configuration. As suggested by the lan78xx.c maintainer, a generic function
> > should be used instead to toggle the interrupts of the PHY. However, it seems
> > that maybe the MAC driver shouldn't meddle with the PHY's interrupts according
> > to you. Would you consider this use case a valid one?
> This sounds to me like you're just trying to get a workaround merged
> upstream using a different approach, but you can't actually test to
> see whether it does work or not. Would that be a fair assessment?
> Do you know anyone who would be able to test? If not, I would suggest
> not trying to upstream code for a workaround that can't be tested as
> working.
Hi,
I definitely agree with you that the workaround code can't be moved from
lan78xx.c to microchip.c without testing. Unfortunately, I don't know
who to contact to test on the hardware.
In the meantime, I suggest removing the workaround from lan78xx.c since
it is not compatible with most of the PHYs users of lan78xx.c could use.
Some previous messages on the mailing list attests to it.
My Patch v1 could also do it, since it keeps the workaround but limits
it with the phy_id.
Regards,
Enguerrand
> Thanks.
> --
> RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
> FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!
^ permalink raw reply
* AW: [PATCH v3 1/3] net: phy: add EXPORT_SYMBOL to phy_disable_interrupts()
From: Sven Schuchmann @ 2022-12-20 16:47 UTC (permalink / raw)
To: Enguerrand de Ribaucourt, Russell King (Oracle)
Cc: Heiner Kallweit, netdev, Paolo Abeni, woojung huh, davem,
UNGLinuxDriver, Andrew Lunn
In-Reply-To: <1201721565.475609.1671553321565.JavaMail.zimbra@savoirfairelinux.com>
> -----Ursprüngliche Nachricht-----
> Von: Enguerrand de Ribaucourt <enguerrand.de-ribaucourt@savoirfairelinux.com>
> Gesendet: Dienstag, 20. Dezember 2022 17:22
> An: Russell King (Oracle) <linux@armlinux.org.uk>
> Cc: Heiner Kallweit <hkallweit1@gmail.com>; netdev <netdev@vger.kernel.org>; Paolo Abeni
> <pabeni@redhat.com>; woojung huh <woojung.huh@microchip.com>; davem <davem@davemloft.net>;
> UNGLinuxDriver <UNGLinuxDriver@microchip.com>; Andrew Lunn <andrew@lunn.ch>
> Betreff: Re: [PATCH v3 1/3] net: phy: add EXPORT_SYMBOL to phy_disable_interrupts()
>
> In the meantime, I suggest removing the workaround from lan78xx.c since
> it is not compatible with most of the PHYs users of lan78xx.c could use.
Same over here, we are using the lan7801 together with a dp83tc811 from TI
and also had to remove this code. On the raspberrypi kernel there is also
this patch https://github.com/raspberrypi/linux/commit/2cc41cfc79323d90b7fd8f28b22e77db2a0c3360
which is also modifying LAN88XX registers from the lan78xx driver.
But I also have no idea how this could be fixed.
Overall I would say to remove any PHY specific code from the LAN drivers
maybe by "fixup" function or something else. But at this time everyone
has to modify the lan78xx code, as the above patch indicates even
you *are* using a LAN88XX Phy.
Maybe someone else as some ideas?
Sven
^ permalink raw reply
* Re: [PATCH v3 1/3] net: phy: add EXPORT_SYMBOL to phy_disable_interrupts()
From: Andrew Lunn @ 2022-12-20 16:55 UTC (permalink / raw)
To: Enguerrand de Ribaucourt
Cc: Heiner Kallweit, netdev, Paolo Abeni, woojung huh, davem,
UNGLinuxDriver, Russell King - ARM Linux
In-Reply-To: <1567686748.473254.1671551305632.JavaMail.zimbra@savoirfairelinux.com>
On Tue, Dec 20, 2022 at 10:48:25AM -0500, Enguerrand de Ribaucourt wrote:
> > From: "Heiner Kallweit" <hkallweit1@gmail.com>
> > To: "Enguerrand de Ribaucourt" <enguerrand.de-ribaucourt@savoirfairelinux.com>
> > Cc: "netdev" <netdev@vger.kernel.org>, "Paolo Abeni" <pabeni@redhat.com>, "woojung huh" <woojung.huh@microchip.com>,
> > "davem" <davem@davemloft.net>, "UNGLinuxDriver" <UNGLinuxDriver@microchip.com>, "Andrew Lunn" <andrew@lunn.ch>,
> > "Russell King - ARM Linux" <linux@armlinux.org.uk>
> > Sent: Tuesday, December 20, 2022 4:19:40 PM
> > Subject: Re: [PATCH v3 1/3] net: phy: add EXPORT_SYMBOL to phy_disable_interrupts()
> My proposed approach would be to copy the original workaround actions
> within link_change_notify():
> 1. disable interrupts
> 2. reset speed
> 3. enable interrupts
>
> However, I don't have access to the LAN8835 to test if this would work. I also
> don't have knowledge about which other Microchip PHYs could be impacted. Maybe
> there is an active Microchip developer we could communicate with to find out?
Woojung Huh added this code, and he sometimes contributes here.
Woojung, do you still have access to the hardware?
Andrew
^ permalink raw reply
* Re: [PATCH ethtool-next v3 v3] JSON output support for Netlink implementation of --show-coalesce option
From: patchwork-bot+netdevbpf @ 2022-12-20 17:10 UTC (permalink / raw)
To: Maxim Georgiev; +Cc: mkubecek, netdev, kuba
In-Reply-To: <20221215051347.70022-1-glipus@gmail.com>
Hello:
This patch was applied to ethtool/ethtool.git (master)
by Michal Kubecek <mkubecek@suse.cz>:
On Wed, 14 Dec 2022 22:13:47 -0700 you wrote:
> Add --json support for Netlink implementation of --show-coalesce option
> No changes for non-JSON output for this feature.
>
> Example output without --json:
> [ethtool-git]$ sudo ./ethtool --show-coalesce enp9s0u2u1u2
> Coalesce parameters for enp9s0u2u1u2:
> Adaptive RX: n/a TX: n/a
> stats-block-usecs: n/a
> sample-interval: n/a
> pkt-rate-low: n/a
> pkt-rate-high: n/a
>
> [...]
Here is the summary with links:
- [ethtool-next,v3,v3] JSON output support for Netlink implementation of --show-coalesce option
https://git.kernel.org/pub/scm/network/ethtool/ethtool.git/commit/?id=8cee2091ae2a
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
^ permalink raw reply
* Re: [syzbot] KASAN: use-after-free Read in put_pmu_ctx
From: Stanislav Fomichev @ 2022-12-20 17:10 UTC (permalink / raw)
To: Peter Zijlstra
Cc: syzbot, acme, alexander.shishkin, bpf, jolsa, linux-kernel,
linux-perf-users, mark.rutland, mingo, namhyung, netdev,
syzkaller-bugs
In-Reply-To: <Y6Fw1ymTcFrMR3Hl@hirez.programming.kicks-ass.net>
[-- Attachment #1: Type: text/plain, Size: 1343 bytes --]
On Tue, Dec 20, 2022 at 12:22 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Mon, Dec 19, 2022 at 11:33:29AM -0800, sdf@google.com wrote:
> > On 12/19, Peter Zijlstra wrote:
> > > On Mon, Dec 19, 2022 at 12:04:43AM -0800, syzbot wrote:
>
> > > > HEAD commit: 13e3c7793e2f Merge tag 'for-netdev' of
> > > https://git.kernel...
> > > > git tree: bpf
> > > > console+strace: https://syzkaller.appspot.com/x/log.txt?x=177df7e0480000
> > > > kernel config:
> > > https://syzkaller.appspot.com/x/.config?x=b0e91ad4b5f69c47
> > > > dashboard link:
> > > https://syzkaller.appspot.com/bug?extid=b8e8c01c8ade4fe6e48f
>
> ^ so syzbot knows what tree and config were used to trigger the report,
> then why:
I haven't used it before and wasn't sure whether it would take the
last commit from the branch of the one where it failed. Adding them
shouldn't hurt, right?
> > Let's maybe try it this way:
> >
> > #syz test: https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git
> > 13e3c7793e2f
>
> do you have to repeat that again in order for it to test something?
Yeah, I was trying to understand why it doesn't like my patch. It
seems I can retry with the patch attached which hopefully should fix
the possible formatting issue; let's see.
#syz test: https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git
13e3c7793e2f
[-- Attachment #2: pmu.patch --]
[-- Type: text/x-patch, Size: 433 bytes --]
diff --git a/kernel/events/core.c b/kernel/events/core.c
index e47914ac8732..bbff551783e1 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -12689,7 +12689,8 @@ SYSCALL_DEFINE5(perf_event_open,
return event_fd;
err_context:
- /* event->pmu_ctx freed by free_event() */
+ put_pmu_ctx(event->pmu_ctx);
+ event->pmu_ctx = NULL; /* _free_event() */
err_locked:
mutex_unlock(&ctx->mutex);
perf_unpin_context(ctx);
^ permalink raw reply related
* [PATCH net] net: vrf: determine the dst using the original ifindex for multicast
From: Antoine Tenart @ 2022-12-20 17:18 UTC (permalink / raw)
To: davem, kuba, pabeni, edumazet
Cc: Antoine Tenart, netdev, David Ahern, Jianlin Shi
Multicast packets received on an interface bound to a VRF are marked as
belonging to the VRF and the skb device is updated to point to the VRF
device itself. This was fine even when a route was associated to a
device as when performing a fib table lookup 'oif' in fib6_table_lookup
(coming from 'skb->dev->ifindex' in ip6_route_input) was set to 0 when
FLOWI_FLAG_SKIP_NH_OIF was set.
With commit 40867d74c374 ("net: Add l3mdev index to flow struct and
avoid oif reset for port devices") this is not longer true and multicast
traffic is not received on the original interface.
Instead of adding back a similar check in fib6_table_lookup determine
the dst using the original ifindex for multicast VRF traffic. To make
things consistent across the function do the above for all strict
packets, which was the logic before commit 6f12fa775530 ("vrf: mark skb
for multicast or link-local as enslaved to VRF"). Note that reverting to
this behavior should be fine as the change was about marking packets
belonging to the VRF, not about their dst.
Fixes: 40867d74c374 ("net: Add l3mdev index to flow struct and avoid oif reset for port devices")
Cc: David Ahern <dsahern@kernel.org>
Reported-by: Jianlin Shi <jishi@redhat.com>
Signed-off-by: Antoine Tenart <atenart@kernel.org>
---
drivers/net/vrf.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c
index 6b5a4d036d15..bdb3a76a352e 100644
--- a/drivers/net/vrf.c
+++ b/drivers/net/vrf.c
@@ -1385,8 +1385,8 @@ static struct sk_buff *vrf_ip6_rcv(struct net_device *vrf_dev,
/* loopback, multicast & non-ND link-local traffic; do not push through
* packet taps again. Reset pkt_type for upper layers to process skb.
- * For strict packets with a source LLA, determine the dst using the
- * original ifindex.
+ * For non-loopback strict packets, determine the dst using the original
+ * ifindex.
*/
if (skb->pkt_type == PACKET_LOOPBACK || (need_strict && !is_ndisc)) {
skb->dev = vrf_dev;
@@ -1395,7 +1395,7 @@ static struct sk_buff *vrf_ip6_rcv(struct net_device *vrf_dev,
if (skb->pkt_type == PACKET_LOOPBACK)
skb->pkt_type = PACKET_HOST;
- else if (ipv6_addr_type(&ipv6_hdr(skb)->saddr) & IPV6_ADDR_LINKLOCAL)
+ else
vrf_ip6_input_dst(skb, vrf_dev, orig_iif);
goto out;
--
2.38.1
^ permalink raw reply related
* Re: [PATCH v2 1/3] dsa: marvell: Provide per device information about max frame size
From: Lukasz Majewski @ 2022-12-20 17:33 UTC (permalink / raw)
To: Alexander Duyck
Cc: Andrew Lunn, Vladimir Oltean, Vivien Didelot, Florian Fainelli,
David S. Miller, Jakub Kicinski, Russell King, netdev,
linux-kernel
In-Reply-To: <20221219130005.6e995cb0@wsk>
[-- Attachment #1: Type: text/plain, Size: 4804 bytes --]
Hi Alexander,
> Hi Alexander,
>
> > On Fri, Dec 16, 2022 at 5:05 AM Lukasz Majewski <lukma@denx.de>
> > wrote:
> > >
> > > Hi Alexander,
> > >
> > > > On Thu, 2022-12-15 at 15:45 +0100, Lukasz Majewski wrote:
> > > > > Different Marvell DSA switches support different size of max
> > > > > frame bytes to be sent.
> > > > >
> > > > > For example mv88e6185 supports max 1632 bytes, which is now
> > > > > in-driver standard value. On the other hand - mv88e6250
> > > > > supports 2048 bytes.
> > > > >
> > > > > As this value is internal and may be different for each switch
> > > > > IC, new entry in struct mv88e6xxx_info has been added to store
> > > > > it.
> > > > >
> > > > > Signed-off-by: Lukasz Majewski <lukma@denx.de>
> > > > > ---
> > > > > Changes for v2:
> > > > > - Define max_frame_size with default value of 1632 bytes,
> > > > > - Set proper value for the mv88e6250 switch SoC (linkstreet)
> > > > > family ---
> > > > > drivers/net/dsa/mv88e6xxx/chip.c | 13 ++++++++++++-
> > > > > drivers/net/dsa/mv88e6xxx/chip.h | 1 +
> > > > > 2 files changed, 13 insertions(+), 1 deletion(-)
> > > > >
> > > > > diff --git a/drivers/net/dsa/mv88e6xxx/chip.c
> > > > > b/drivers/net/dsa/mv88e6xxx/chip.c index
> > > > > 2ca3cbba5764..7ae4c389ce50 100644 ---
> > > > > a/drivers/net/dsa/mv88e6xxx/chip.c +++
> > > > > b/drivers/net/dsa/mv88e6xxx/chip.c @@ -3093,7 +3093,9 @@
> > > > > static int mv88e6xxx_get_max_mtu(struct dsa_switch *ds, int
> > > > > port) if (chip->info->ops->port_set_jumbo_size) return 10240 -
> > > > > VLAN_ETH_HLEN - EDSA_HLEN - ETH_FCS_LEN; else if
> > > > > (chip->info->ops->set_max_frame_size)
> > > > > - return 1632 - VLAN_ETH_HLEN - EDSA_HLEN -
> > > > > ETH_FCS_LEN;
> > > > > + return (chip->info->max_frame_size -
> > > > > VLAN_ETH_HLEN
> > > > > + - EDSA_HLEN - ETH_FCS_LEN);
> > > > > +
> > > > > return 1522 - VLAN_ETH_HLEN - EDSA_HLEN - ETH_FCS_LEN;
> > > > > }
> > > > >
> > > > >
> > > >
> > > > Is there any specific reason for triggering this based on the
> > > > existance of the function call?
> > >
> > > This was the original code in this driver.
> > >
> > > This value (1632 or 2048 bytes) is SoC (family) specific.
> > >
> > > By checking which device defines set_max_frame_size callback, I
> > > could fill the chip->info->max_frame_size with 1632 value.
> > >
> > > > Why not just replace:
> > > > else if (chip->info->ops->set_max_frame_size)
> > > > with:
> > > > else if (chip->info->max_frame_size)
> > > >
> > >
> > > I think that the callback check is a bit "defensive" approach ->
> > > 1522B is the default value and 1632 (or 10240 - jumbo) can be set
> > > only when proper callback is defined.
> > >
> > > > Otherwise my concern is one gets defined without the other
> > > > leading to a future issue as 0 - extra headers will likely wrap
> > > > and while the return value may be a signed int, it is usually
> > > > stored in an unsigned int so it would effectively uncap the
> > > > MTU.
> > >
> > > Please correct me if I misunderstood something:
> > >
> > > The problem is with new mv88eXXXX devices, which will not provide
> > > max_frame_size information to their chip->info struct?
> > >
> > > Or is there any other issue?
> >
> > That was mostly my concern. I was adding a bit of my own defensive
> > programming in the event that somebody forgot to fill out the
> > chip->info. If nothing else it might make sense to add a check to
> > verify that the max_frame_size is populated before blindly using it.
> > So perhaps you could do something similar to the max_t approach I
> > had called out earlier but instead of applying it on the last case
> > you could apply it for the "set_max_frame_size" case with 1632
> > being the minimum and being overwritten by 2048 if it is set in
> > max_frame_size.
>
> I think that I shall add:
>
> else if (chip->info->ops->set_max_frame_size)
> return max_t(int, chip->info->max_frame_size, 1632) -
> (headers)
>
> So then the "default" value of 1632 will be overwritten by 2048 bytes.
>
Is this approach acceptable for you?
>
> Best regards,
>
> Lukasz Majewski
>
> --
>
> DENX Software Engineering GmbH, Managing Director: Wolfgang Denk
> HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
> Phone: (+49)-8142-66989-59 Fax: (+49)-8142-66989-80 Email:
> lukma@denx.de
Best regards,
Lukasz Majewski
--
DENX Software Engineering GmbH, Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-59 Fax: (+49)-8142-66989-80 Email: lukma@denx.de
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox