[PATCH AUTOSEL 6.10 01/27] wifi: nl80211: disallow setting special AP channel widths

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH AUTOSEL 6.10 01/27] wifi: nl80211: disallow setting special AP channel widths
@ 2024-07-28  0:52 Sasha Levin
  2024-07-28  0:52 ` [PATCH AUTOSEL 6.10 03/27] r8169: remove detection of chip version 11 (early RTL8168b) Sasha Levin
                   ` (10 more replies)
  0 siblings, 11 replies; 21+ messages in thread
From: Sasha Levin @ 2024-07-28  0:52 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Johannes Berg, syzbot+bc0f5b92cc7091f45fb6, Sasha Levin, johannes,
	davem, edumazet, kuba, pabeni, linux-wireless, netdev

From: Johannes Berg <johannes.berg@intel.com>

[ Upstream commit 23daf1b4c91db9b26f8425cc7039cf96d22ccbfe ]

Setting the AP channel width is meant for use with the normal
20/40/... MHz channel width progression, and switching around
in S1G or narrow channels isn't supported. Disallow that.

Reported-by: syzbot+bc0f5b92cc7091f45fb6@syzkaller.appspotmail.com
Link: https://msgid.link/20240515141600.d4a9590bfe32.I19a32d60097e81b527eafe6b0924f6c5fbb2dc45@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 net/wireless/nl80211.c | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/net/wireless/nl80211.c b/net/wireless/nl80211.c
index 72c7bf5585816..81d5bf186180f 100644
--- a/net/wireless/nl80211.c
+++ b/net/wireless/nl80211.c
@@ -3419,6 +3419,33 @@ static int __nl80211_set_channel(struct cfg80211_registered_device *rdev,
 			if (chandef.chan != cur_chan)
 				return -EBUSY;
 
+			/* only allow this for regular channel widths */
+			switch (wdev->links[link_id].ap.chandef.width) {
+			case NL80211_CHAN_WIDTH_20_NOHT:
+			case NL80211_CHAN_WIDTH_20:
+			case NL80211_CHAN_WIDTH_40:
+			case NL80211_CHAN_WIDTH_80:
+			case NL80211_CHAN_WIDTH_80P80:
+			case NL80211_CHAN_WIDTH_160:
+			case NL80211_CHAN_WIDTH_320:
+				break;
+			default:
+				return -EINVAL;
+			}
+
+			switch (chandef.width) {
+			case NL80211_CHAN_WIDTH_20_NOHT:
+			case NL80211_CHAN_WIDTH_20:
+			case NL80211_CHAN_WIDTH_40:
+			case NL80211_CHAN_WIDTH_80:
+			case NL80211_CHAN_WIDTH_80P80:
+			case NL80211_CHAN_WIDTH_160:
+			case NL80211_CHAN_WIDTH_320:
+				break;
+			default:
+				return -EINVAL;
+			}
+
 			result = rdev_set_ap_chanwidth(rdev, dev, link_id,
 						       &chandef);
 			if (result)
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH AUTOSEL 6.10 03/27] r8169: remove detection of chip version 11 (early RTL8168b)
  2024-07-28  0:52 [PATCH AUTOSEL 6.10 01/27] wifi: nl80211: disallow setting special AP channel widths Sasha Levin
@ 2024-07-28  0:52 ` Sasha Levin
  2024-07-29  8:45   ` Heiner Kallweit
  2024-07-28  0:52 ` [PATCH AUTOSEL 6.10 06/27] net/mlx5e: SHAMPO, Fix invalid WQ linked list unlink Sasha Levin
                   ` (9 subsequent siblings)
  10 siblings, 1 reply; 21+ messages in thread
From: Sasha Levin @ 2024-07-28  0:52 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Heiner Kallweit, Jakub Kicinski, Sasha Levin, nic_swsd, davem,
	edumazet, pabeni, netdev

From: Heiner Kallweit <hkallweit1@gmail.com>

[ Upstream commit 982300c115d229565d7af8e8b38aa1ee7bb1f5bd ]

This early RTL8168b version was the first PCIe chip version, and it's
quite quirky. Last sign of life is from more than 15 yrs ago.
Let's remove detection of this chip version, we'll see whether anybody
complains. If not, support for this chip version can be removed a few
kernel versions later.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://lore.kernel.org/r/875cdcf4-843c-420a-ad5d-417447b68572@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/net/ethernet/realtek/r8169_main.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
index 7b9e04884575e..d2d46fe17631a 100644
--- a/drivers/net/ethernet/realtek/r8169_main.c
+++ b/drivers/net/ethernet/realtek/r8169_main.c
@@ -2274,7 +2274,9 @@ static enum mac_version rtl8169_get_mac_version(u16 xid, bool gmii)
 
 		/* 8168B family. */
 		{ 0x7c8, 0x380,	RTL_GIGA_MAC_VER_17 },
-		{ 0x7c8, 0x300,	RTL_GIGA_MAC_VER_11 },
+		/* This one is very old and rare, let's see if anybody complains.
+		 * { 0x7c8, 0x300,	RTL_GIGA_MAC_VER_11 },
+		 */
 
 		/* 8101 family. */
 		{ 0x7c8, 0x448,	RTL_GIGA_MAC_VER_39 },
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH AUTOSEL 6.10 03/27] r8169: remove detection of chip version 11 (early RTL8168b)
  2024-07-28  0:52 ` [PATCH AUTOSEL 6.10 03/27] r8169: remove detection of chip version 11 (early RTL8168b) Sasha Levin
@ 2024-07-29  8:45   ` Heiner Kallweit
  2024-08-10  9:12     ` Sasha Levin
  0 siblings, 1 reply; 21+ messages in thread
From: Heiner Kallweit @ 2024-07-29  8:45 UTC (permalink / raw)
  To: Sasha Levin, linux-kernel, stable
  Cc: Jakub Kicinski, nic_swsd, davem, edumazet, pabeni, netdev

On 28.07.2024 02:52, Sasha Levin wrote:
> From: Heiner Kallweit <hkallweit1@gmail.com>
> 
> [ Upstream commit 982300c115d229565d7af8e8b38aa1ee7bb1f5bd ]
> 
> This early RTL8168b version was the first PCIe chip version, and it's
> quite quirky. Last sign of life is from more than 15 yrs ago.
> Let's remove detection of this chip version, we'll see whether anybody
> complains. If not, support for this chip version can be removed a few
> kernel versions later.
> 
> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
> Link: https://lore.kernel.org/r/875cdcf4-843c-420a-ad5d-417447b68572@gmail.com
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> Signed-off-by: Sasha Levin <sashal@kernel.org>
> ---
>  drivers/net/ethernet/realtek/r8169_main.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
> index 7b9e04884575e..d2d46fe17631a 100644
> --- a/drivers/net/ethernet/realtek/r8169_main.c
> +++ b/drivers/net/ethernet/realtek/r8169_main.c
> @@ -2274,7 +2274,9 @@ static enum mac_version rtl8169_get_mac_version(u16 xid, bool gmii)
>  
>  		/* 8168B family. */
>  		{ 0x7c8, 0x380,	RTL_GIGA_MAC_VER_17 },
> -		{ 0x7c8, 0x300,	RTL_GIGA_MAC_VER_11 },
> +		/* This one is very old and rare, let's see if anybody complains.
> +		 * { 0x7c8, 0x300,	RTL_GIGA_MAC_VER_11 },
> +		 */
>  
>  		/* 8101 family. */
>  		{ 0x7c8, 0x448,	RTL_GIGA_MAC_VER_39 },

It may be the case that there are still few users out there with this ancient hw.
We will know better once 6.11 is out for a few month. In this case we would have to
revert this change. 
I don't think it's a change which should go to stable.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH AUTOSEL 6.10 03/27] r8169: remove detection of chip version 11 (early RTL8168b)
  2024-07-29  8:45   ` Heiner Kallweit
@ 2024-08-10  9:12     ` Sasha Levin
  2024-08-11 14:32       ` Heiner Kallweit
  0 siblings, 1 reply; 21+ messages in thread
From: Sasha Levin @ 2024-08-10  9:12 UTC (permalink / raw)
  To: Heiner Kallweit
  Cc: linux-kernel, stable, Jakub Kicinski, nic_swsd, davem, edumazet,
	pabeni, netdev

On Mon, Jul 29, 2024 at 10:45:15AM +0200, Heiner Kallweit wrote:
>On 28.07.2024 02:52, Sasha Levin wrote:
>> From: Heiner Kallweit <hkallweit1@gmail.com>
>>
>> [ Upstream commit 982300c115d229565d7af8e8b38aa1ee7bb1f5bd ]
>>
>> This early RTL8168b version was the first PCIe chip version, and it's
>> quite quirky. Last sign of life is from more than 15 yrs ago.
>> Let's remove detection of this chip version, we'll see whether anybody
>> complains. If not, support for this chip version can be removed a few
>> kernel versions later.
>>
>> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
>> Link: https://lore.kernel.org/r/875cdcf4-843c-420a-ad5d-417447b68572@gmail.com
>> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
>> Signed-off-by: Sasha Levin <sashal@kernel.org>
>> ---
>>  drivers/net/ethernet/realtek/r8169_main.c | 4 +++-
>>  1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
>> index 7b9e04884575e..d2d46fe17631a 100644
>> --- a/drivers/net/ethernet/realtek/r8169_main.c
>> +++ b/drivers/net/ethernet/realtek/r8169_main.c
>> @@ -2274,7 +2274,9 @@ static enum mac_version rtl8169_get_mac_version(u16 xid, bool gmii)
>>
>>  		/* 8168B family. */
>>  		{ 0x7c8, 0x380,	RTL_GIGA_MAC_VER_17 },
>> -		{ 0x7c8, 0x300,	RTL_GIGA_MAC_VER_11 },
>> +		/* This one is very old and rare, let's see if anybody complains.
>> +		 * { 0x7c8, 0x300,	RTL_GIGA_MAC_VER_11 },
>> +		 */
>>
>>  		/* 8101 family. */
>>  		{ 0x7c8, 0x448,	RTL_GIGA_MAC_VER_39 },
>
>It may be the case that there are still few users out there with this ancient hw.
>We will know better once 6.11 is out for a few month. In this case we would have to
>revert this change.
>I don't think it's a change which should go to stable.

Sure, I'll drop it.

-- 
Thanks,
Sasha

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH AUTOSEL 6.10 03/27] r8169: remove detection of chip version 11 (early RTL8168b)
  2024-08-10  9:12     ` Sasha Levin
@ 2024-08-11 14:32       ` Heiner Kallweit
  2024-08-11 21:16         ` Sasha Levin
  0 siblings, 1 reply; 21+ messages in thread
From: Heiner Kallweit @ 2024-08-11 14:32 UTC (permalink / raw)
  To: Sasha Levin
  Cc: linux-kernel, stable, Jakub Kicinski, nic_swsd, davem, edumazet,
	pabeni, netdev

On 10.08.2024 11:12, Sasha Levin wrote:
> On Mon, Jul 29, 2024 at 10:45:15AM +0200, Heiner Kallweit wrote:
>> On 28.07.2024 02:52, Sasha Levin wrote:
>>> From: Heiner Kallweit <hkallweit1@gmail.com>
>>>
>>> [ Upstream commit 982300c115d229565d7af8e8b38aa1ee7bb1f5bd ]
>>>
>>> This early RTL8168b version was the first PCIe chip version, and it's
>>> quite quirky. Last sign of life is from more than 15 yrs ago.
>>> Let's remove detection of this chip version, we'll see whether anybody
>>> complains. If not, support for this chip version can be removed a few
>>> kernel versions later.
>>>
>>> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
>>> Link: https://lore.kernel.org/r/875cdcf4-843c-420a-ad5d-417447b68572@gmail.com
>>> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
>>> Signed-off-by: Sasha Levin <sashal@kernel.org>
>>> ---
>>>  drivers/net/ethernet/realtek/r8169_main.c | 4 +++-
>>>  1 file changed, 3 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
>>> index 7b9e04884575e..d2d46fe17631a 100644
>>> --- a/drivers/net/ethernet/realtek/r8169_main.c
>>> +++ b/drivers/net/ethernet/realtek/r8169_main.c
>>> @@ -2274,7 +2274,9 @@ static enum mac_version rtl8169_get_mac_version(u16 xid, bool gmii)
>>>
>>>          /* 8168B family. */
>>>          { 0x7c8, 0x380,    RTL_GIGA_MAC_VER_17 },
>>> -        { 0x7c8, 0x300,    RTL_GIGA_MAC_VER_11 },
>>> +        /* This one is very old and rare, let's see if anybody complains.
>>> +         * { 0x7c8, 0x300,    RTL_GIGA_MAC_VER_11 },
>>> +         */
>>>
>>>          /* 8101 family. */
>>>          { 0x7c8, 0x448,    RTL_GIGA_MAC_VER_39 },
>>
>> It may be the case that there are still few users out there with this ancient hw.
>> We will know better once 6.11 is out for a few month. In this case we would have to
>> revert this change.
>> I don't think it's a change which should go to stable.
> 
> Sure, I'll drop it.
> 
Just saw that this patch has been added to stable again an hour ago.
Technical issue?


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH AUTOSEL 6.10 03/27] r8169: remove detection of chip version 11 (early RTL8168b)
  2024-08-11 14:32       ` Heiner Kallweit
@ 2024-08-11 21:16         ` Sasha Levin
  2025-02-23  0:57           ` Michael Pflüger
  0 siblings, 1 reply; 21+ messages in thread
From: Sasha Levin @ 2024-08-11 21:16 UTC (permalink / raw)
  To: Heiner Kallweit
  Cc: linux-kernel, stable, Jakub Kicinski, nic_swsd, davem, edumazet,
	pabeni, netdev

On Sun, Aug 11, 2024 at 04:32:07PM +0200, Heiner Kallweit wrote:
>On 10.08.2024 11:12, Sasha Levin wrote:
>> On Mon, Jul 29, 2024 at 10:45:15AM +0200, Heiner Kallweit wrote:
>>> On 28.07.2024 02:52, Sasha Levin wrote:
>>>> From: Heiner Kallweit <hkallweit1@gmail.com>
>>>>
>>>> [ Upstream commit 982300c115d229565d7af8e8b38aa1ee7bb1f5bd ]
>>>>
>>>> This early RTL8168b version was the first PCIe chip version, and it's
>>>> quite quirky. Last sign of life is from more than 15 yrs ago.
>>>> Let's remove detection of this chip version, we'll see whether anybody
>>>> complains. If not, support for this chip version can be removed a few
>>>> kernel versions later.
>>>>
>>>> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
>>>> Link: https://lore.kernel.org/r/875cdcf4-843c-420a-ad5d-417447b68572@gmail.com
>>>> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
>>>> Signed-off-by: Sasha Levin <sashal@kernel.org>
>>>> ---
>>>>  drivers/net/ethernet/realtek/r8169_main.c | 4 +++-
>>>>  1 file changed, 3 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
>>>> index 7b9e04884575e..d2d46fe17631a 100644
>>>> --- a/drivers/net/ethernet/realtek/r8169_main.c
>>>> +++ b/drivers/net/ethernet/realtek/r8169_main.c
>>>> @@ -2274,7 +2274,9 @@ static enum mac_version rtl8169_get_mac_version(u16 xid, bool gmii)
>>>>
>>>>          /* 8168B family. */
>>>>          { 0x7c8, 0x380,    RTL_GIGA_MAC_VER_17 },
>>>> -        { 0x7c8, 0x300,    RTL_GIGA_MAC_VER_11 },
>>>> +        /* This one is very old and rare, let's see if anybody complains.
>>>> +         * { 0x7c8, 0x300,    RTL_GIGA_MAC_VER_11 },
>>>> +         */
>>>>
>>>>          /* 8101 family. */
>>>>          { 0x7c8, 0x448,    RTL_GIGA_MAC_VER_39 },
>>>
>>> It may be the case that there are still few users out there with this ancient hw.
>>> We will know better once 6.11 is out for a few month. In this case we would have to
>>> revert this change.
>>> I don't think it's a change which should go to stable.
>>
>> Sure, I'll drop it.
>>
>Just saw that this patch has been added to stable again an hour ago.
>Technical issue?

Indeed, I ended up dropping from the wrong local branch yesterday,
sorry!

-- 
Thanks,
Sasha

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH AUTOSEL 6.10 03/27] r8169: remove detection of chip version 11 (early RTL8168b)
  2024-08-11 21:16         ` Sasha Levin
@ 2025-02-23  0:57           ` Michael Pflüger
  2025-02-23  9:43             ` Heiner Kallweit
  0 siblings, 1 reply; 21+ messages in thread
From: Michael Pflüger @ 2025-02-23  0:57 UTC (permalink / raw)
  To: sashal
  Cc: davem, edumazet, hkallweit1, kuba, linux-kernel, netdev, nic_swsd,
	pabeni, stable

 > Indeed, I ended up dropping from the wrong local branch yesterday

So it looks like the patch was kept in stable because i hit the problem
now after upgrading from kernel 6.9 to 6.13. It's a PCIe card which was
cheaply available in europe so it wouldn't surprise me if some more of
these are around and in use. So, can we keep supporting that chip and
revert the change, or do i have to get a new card?

Regards,

Michael

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH AUTOSEL 6.10 03/27] r8169: remove detection of chip version 11 (early RTL8168b)
  2025-02-23  0:57           ` Michael Pflüger
@ 2025-02-23  9:43             ` Heiner Kallweit
  0 siblings, 0 replies; 21+ messages in thread
From: Heiner Kallweit @ 2025-02-23  9:43 UTC (permalink / raw)
  To: Michael Pflüger, sashal
  Cc: davem, edumazet, kuba, linux-kernel, netdev, nic_swsd, pabeni,
	stable

On 23.02.2025 01:57, Michael Pflüger wrote:
>> Indeed, I ended up dropping from the wrong local branch yesterday
> 
> So it looks like the patch was kept in stable because i hit the problem
> now after upgrading from kernel 6.9 to 6.13. It's a PCIe card which was
> cheaply available in europe so it wouldn't surprise me if some more of
> these are around and in use. So, can we keep supporting that chip and
> revert the change, or do i have to get a new card?
> 
Detection of this chip version was removed in May last year and this is
the first such report. So it doesn't seem there's a significant number
of these old cards out there. It was the first PCIe version, and it's
broken in different ways, requiring version-specific workarounds.
I'd prefer not to re-add these workarounds in mainline, also given that
cards with modern versions like RTL8168h are available new for <10€.
Alternatively you can build a custom kernel, or use r8168 vendor driver.

> Regards,
> 
> Michael
> 
> 


^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH AUTOSEL 6.10 06/27] net/mlx5e: SHAMPO, Fix invalid WQ linked list unlink
  2024-07-28  0:52 [PATCH AUTOSEL 6.10 01/27] wifi: nl80211: disallow setting special AP channel widths Sasha Levin
  2024-07-28  0:52 ` [PATCH AUTOSEL 6.10 03/27] r8169: remove detection of chip version 11 (early RTL8168b) Sasha Levin
@ 2024-07-28  0:52 ` Sasha Levin
  2024-07-28  0:52 ` [PATCH AUTOSEL 6.10 08/27] rtnetlink: move rtnl_lock handling out of af_netlink Sasha Levin
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 21+ messages in thread
From: Sasha Levin @ 2024-07-28  0:52 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Dragos Tatulea, Tariq Toukan, Jakub Kicinski, Sasha Levin, saeedm,
	davem, edumazet, pabeni, netdev, linux-rdma

From: Dragos Tatulea <dtatulea@nvidia.com>

[ Upstream commit fba8334721e266f92079632598e46e5f89082f30 ]

When all the strides in a WQE have been consumed, the WQE is unlinked
from the WQ linked list (mlx5_wq_ll_pop()). For SHAMPO, it is possible
to receive CQEs with 0 consumed strides for the same WQE even after the
WQE is fully consumed and unlinked. This triggers an additional unlink
for the same wqe which corrupts the linked list.

Fix this scenario by accepting 0 sized consumed strides without
unlinking the WQE again.

Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20240603212219.1037656-4-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index b5333da20e8a7..cdc84a27a04ed 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -2374,6 +2374,9 @@ static void mlx5e_handle_rx_cqe_mpwrq_shampo(struct mlx5e_rq *rq, struct mlx5_cq
 	if (likely(wi->consumed_strides < rq->mpwqe.num_strides))
 		return;
 
+	if (unlikely(!cstrides))
+		return;
+
 	wq  = &rq->mpwqe.wq;
 	wqe = mlx5_wq_ll_get_wqe(wq, wqe_id);
 	mlx5_wq_ll_pop(wq, cqe->wqe_id, &wqe->next.next_wqe_index);
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH AUTOSEL 6.10 08/27] rtnetlink: move rtnl_lock handling out of af_netlink
  2024-07-28  0:52 [PATCH AUTOSEL 6.10 01/27] wifi: nl80211: disallow setting special AP channel widths Sasha Levin
  2024-07-28  0:52 ` [PATCH AUTOSEL 6.10 03/27] r8169: remove detection of chip version 11 (early RTL8168b) Sasha Levin
  2024-07-28  0:52 ` [PATCH AUTOSEL 6.10 06/27] net/mlx5e: SHAMPO, Fix invalid WQ linked list unlink Sasha Levin
@ 2024-07-28  0:52 ` Sasha Levin
  2024-07-29 15:01   ` Jakub Kicinski
  2024-07-28  0:52 ` [PATCH AUTOSEL 6.10 10/27] net: Reference bpf_redirect_info via task_struct on PREEMPT_RT Sasha Levin
                   ` (7 subsequent siblings)
  10 siblings, 1 reply; 21+ messages in thread
From: Sasha Levin @ 2024-07-28  0:52 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Jakub Kicinski, Kuniyuki Iwashima, Eric Dumazet, David S . Miller,
	Sasha Levin, pabeni, idosch, jiri, amcohen, horms, lirongqing,
	juntong.deng, netdev

From: Jakub Kicinski <kuba@kernel.org>

[ Upstream commit 5380d64f8d766576ac5c0f627418b2d0e1d2641f ]

Now that we have an intermediate layer of code for handling
rtnl-level netlink dump quirks, we can move the rtnl_lock
taking there.

For dump handlers with RTNL_FLAG_DUMP_SPLIT_NLM_DONE we can
avoid taking rtnl_lock just to generate NLM_DONE, once again.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 net/core/rtnetlink.c     | 9 +++++++--
 net/netlink/af_netlink.c | 2 --
 2 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 4668d67180407..eabfc8290f5e2 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -6486,6 +6486,7 @@ static int rtnl_mdb_del(struct sk_buff *skb, struct nlmsghdr *nlh,
 
 static int rtnl_dumpit(struct sk_buff *skb, struct netlink_callback *cb)
 {
+	const bool needs_lock = !(cb->flags & RTNL_FLAG_DUMP_UNLOCKED);
 	rtnl_dumpit_func dumpit = cb->data;
 	int err;
 
@@ -6495,7 +6496,11 @@ static int rtnl_dumpit(struct sk_buff *skb, struct netlink_callback *cb)
 	if (!dumpit)
 		return 0;
 
+	if (needs_lock)
+		rtnl_lock();
 	err = dumpit(skb, cb);
+	if (needs_lock)
+		rtnl_unlock();
 
 	/* Old dump handlers used to send NLM_DONE as in a separate recvmsg().
 	 * Some applications which parse netlink manually depend on this.
@@ -6515,7 +6520,8 @@ static int rtnetlink_dump_start(struct sock *ssk, struct sk_buff *skb,
 				const struct nlmsghdr *nlh,
 				struct netlink_dump_control *control)
 {
-	if (control->flags & RTNL_FLAG_DUMP_SPLIT_NLM_DONE) {
+	if (control->flags & RTNL_FLAG_DUMP_SPLIT_NLM_DONE ||
+	    !(control->flags & RTNL_FLAG_DUMP_UNLOCKED)) {
 		WARN_ON(control->data);
 		control->data = control->dump;
 		control->dump = rtnl_dumpit;
@@ -6703,7 +6709,6 @@ static int __net_init rtnetlink_net_init(struct net *net)
 	struct netlink_kernel_cfg cfg = {
 		.groups		= RTNLGRP_MAX,
 		.input		= rtnetlink_rcv,
-		.cb_mutex	= &rtnl_mutex,
 		.flags		= NL_CFG_F_NONROOT_RECV,
 		.bind		= rtnetlink_bind,
 	};
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index fa9c090cf629e..8bbbe75e75dbe 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -2330,8 +2330,6 @@ static int netlink_dump(struct sock *sk, bool lock_taken)
 
 		cb->extack = &extack;
 
-		if (cb->flags & RTNL_FLAG_DUMP_UNLOCKED)
-			extra_mutex = NULL;
 		if (extra_mutex)
 			mutex_lock(extra_mutex);
 		nlk->dump_done_errno = cb->dump(skb, cb);
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH AUTOSEL 6.10 08/27] rtnetlink: move rtnl_lock handling out of af_netlink
  2024-07-28  0:52 ` [PATCH AUTOSEL 6.10 08/27] rtnetlink: move rtnl_lock handling out of af_netlink Sasha Levin
@ 2024-07-29 15:01   ` Jakub Kicinski
  0 siblings, 0 replies; 21+ messages in thread
From: Jakub Kicinski @ 2024-07-29 15:01 UTC (permalink / raw)
  To: Sasha Levin
  Cc: linux-kernel, stable, Kuniyuki Iwashima, Eric Dumazet,
	David S . Miller, pabeni, idosch, jiri, amcohen, horms,
	lirongqing, juntong.deng, netdev

On Sat, 27 Jul 2024 20:52:51 -0400 Sasha Levin wrote:
> From: Jakub Kicinski <kuba@kernel.org>
> 
> [ Upstream commit 5380d64f8d766576ac5c0f627418b2d0e1d2641f ]
> 
> Now that we have an intermediate layer of code for handling
> rtnl-level netlink dump quirks, we can move the rtnl_lock
> taking there.
> 
> For dump handlers with RTNL_FLAG_DUMP_SPLIT_NLM_DONE we can
> avoid taking rtnl_lock just to generate NLM_DONE, once again.

Not really a fix, FWIW, but not scary either so your call.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH AUTOSEL 6.10 10/27] net: Reference bpf_redirect_info via task_struct on PREEMPT_RT.
  2024-07-28  0:52 [PATCH AUTOSEL 6.10 01/27] wifi: nl80211: disallow setting special AP channel widths Sasha Levin
                   ` (2 preceding siblings ...)
  2024-07-28  0:52 ` [PATCH AUTOSEL 6.10 08/27] rtnetlink: move rtnl_lock handling out of af_netlink Sasha Levin
@ 2024-07-28  0:52 ` Sasha Levin
  2024-07-29 15:00   ` Jakub Kicinski
  2024-07-28  0:52 ` [PATCH AUTOSEL 6.10 11/27] af_unix: Don't retry after unix_state_lock_nested() in unix_stream_connect() Sasha Levin
                   ` (6 subsequent siblings)
  10 siblings, 1 reply; 21+ messages in thread
From: Sasha Levin @ 2024-07-28  0:52 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Sebastian Andrzej Siewior, Andrii Nakryiko, Eduard Zingerman,
	Hao Luo, Jiri Olsa, John Fastabend, KP Singh, Martin KaFai Lau,
	Song Liu, Stanislav Fomichev, Yonghong Song, Alexei Starovoitov,
	Jesper Dangaard Brouer, Toke Høiland-Jørgensen,
	Jakub Kicinski, Sasha Levin, daniel, mingo, peterz, juri.lelli,
	vincent.guittot, davem, edumazet, pabeni, akpm, brauner, oleg,
	kees, tandersen, mjguzik, willy, kent.overstreet, zhangpeng.00,
	linmiaohe, hca, jiri, lorenzo, yan, bpf, netdev

From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

[ Upstream commit 401cb7dae8130fd34eb84648e02ab4c506df7d5e ]

The XDP redirect process is two staged:
- bpf_prog_run_xdp() is invoked to run a eBPF program which inspects the
  packet and makes decisions. While doing that, the per-CPU variable
  bpf_redirect_info is used.

- Afterwards xdp_do_redirect() is invoked and accesses bpf_redirect_info
  and it may also access other per-CPU variables like xskmap_flush_list.

At the very end of the NAPI callback, xdp_do_flush() is invoked which
does not access bpf_redirect_info but will touch the individual per-CPU
lists.

The per-CPU variables are only used in the NAPI callback hence disabling
bottom halves is the only protection mechanism. Users from preemptible
context (like cpu_map_kthread_run()) explicitly disable bottom halves
for protections reasons.
Without locking in local_bh_disable() on PREEMPT_RT this data structure
requires explicit locking.

PREEMPT_RT has forced-threaded interrupts enabled and every
NAPI-callback runs in a thread. If each thread has its own data
structure then locking can be avoided.

Create a struct bpf_net_context which contains struct bpf_redirect_info.
Define the variable on stack, use bpf_net_ctx_set() to save a pointer to
it, bpf_net_ctx_clear() removes it again.
The bpf_net_ctx_set() may nest. For instance a function can be used from
within NET_RX_SOFTIRQ/ net_rx_action which uses bpf_net_ctx_set() and
NET_TX_SOFTIRQ which does not. Therefore only the first invocations
updates the pointer.
Use bpf_net_ctx_get_ri() as a wrapper to retrieve the current struct
bpf_redirect_info. The returned data structure is zero initialized to
ensure nothing is leaked from stack. This is done on first usage of the
struct. bpf_net_ctx_set() sets bpf_redirect_info::kern_flags to 0 to
note that initialisation is required. First invocation of
bpf_net_ctx_get_ri() will memset() the data structure and update
bpf_redirect_info::kern_flags.
bpf_redirect_info::nh is excluded from memset because it is only used
once BPF_F_NEIGH is set which also sets the nh member. The kern_flags is
moved past nh to exclude it from memset.

The pointer to bpf_net_context is saved task's task_struct. Using
always the bpf_net_context approach has the advantage that there is
almost zero differences between PREEMPT_RT and non-PREEMPT_RT builds.

Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Eduard Zingerman <eddyz87@gmail.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: KP Singh <kpsingh@kernel.org>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: Song Liu <song@kernel.org>
Cc: Stanislav Fomichev <sdf@google.com>
Cc: Yonghong Song <yonghong.song@linux.dev>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: https://patch.msgid.link/20240620132727.660738-15-bigeasy@linutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 include/linux/filter.h | 56 ++++++++++++++++++++++++++++++++++--------
 include/linux/sched.h  |  3 +++
 kernel/bpf/cpumap.c    |  3 +++
 kernel/bpf/devmap.c    |  9 ++++++-
 kernel/fork.c          |  1 +
 net/bpf/test_run.c     | 11 ++++++++-
 net/core/dev.c         | 29 +++++++++++++++++++++-
 net/core/filter.c      | 44 +++++++++------------------------
 net/core/lwt_bpf.c     |  3 +++
 9 files changed, 114 insertions(+), 45 deletions(-)

diff --git a/include/linux/filter.h b/include/linux/filter.h
index 5669da513cd7c..b2dc932fdc35e 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -733,21 +733,59 @@ struct bpf_nh_params {
 	};
 };
 
+/* flags for bpf_redirect_info kern_flags */
+#define BPF_RI_F_RF_NO_DIRECT	BIT(0)	/* no napi_direct on return_frame */
+#define BPF_RI_F_RI_INIT	BIT(1)
+
 struct bpf_redirect_info {
 	u64 tgt_index;
 	void *tgt_value;
 	struct bpf_map *map;
 	u32 flags;
-	u32 kern_flags;
 	u32 map_id;
 	enum bpf_map_type map_type;
 	struct bpf_nh_params nh;
+	u32 kern_flags;
 };
 
-DECLARE_PER_CPU(struct bpf_redirect_info, bpf_redirect_info);
+struct bpf_net_context {
+	struct bpf_redirect_info ri;
+};
 
-/* flags for bpf_redirect_info kern_flags */
-#define BPF_RI_F_RF_NO_DIRECT	BIT(0)	/* no napi_direct on return_frame */
+static inline struct bpf_net_context *bpf_net_ctx_set(struct bpf_net_context *bpf_net_ctx)
+{
+	struct task_struct *tsk = current;
+
+	if (tsk->bpf_net_context != NULL)
+		return NULL;
+	bpf_net_ctx->ri.kern_flags = 0;
+
+	tsk->bpf_net_context = bpf_net_ctx;
+	return bpf_net_ctx;
+}
+
+static inline void bpf_net_ctx_clear(struct bpf_net_context *bpf_net_ctx)
+{
+	if (bpf_net_ctx)
+		current->bpf_net_context = NULL;
+}
+
+static inline struct bpf_net_context *bpf_net_ctx_get(void)
+{
+	return current->bpf_net_context;
+}
+
+static inline struct bpf_redirect_info *bpf_net_ctx_get_ri(void)
+{
+	struct bpf_net_context *bpf_net_ctx = bpf_net_ctx_get();
+
+	if (!(bpf_net_ctx->ri.kern_flags & BPF_RI_F_RI_INIT)) {
+		memset(&bpf_net_ctx->ri, 0, offsetof(struct bpf_net_context, ri.nh));
+		bpf_net_ctx->ri.kern_flags |= BPF_RI_F_RI_INIT;
+	}
+
+	return &bpf_net_ctx->ri;
+}
 
 /* Compute the linear packet data range [data, data_end) which
  * will be accessed by various program types (cls_bpf, act_bpf,
@@ -1018,25 +1056,23 @@ struct bpf_prog *bpf_patch_insn_single(struct bpf_prog *prog, u32 off,
 				       const struct bpf_insn *patch, u32 len);
 int bpf_remove_insns(struct bpf_prog *prog, u32 off, u32 cnt);
 
-void bpf_clear_redirect_map(struct bpf_map *map);
-
 static inline bool xdp_return_frame_no_direct(void)
 {
-	struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info);
+	struct bpf_redirect_info *ri = bpf_net_ctx_get_ri();
 
 	return ri->kern_flags & BPF_RI_F_RF_NO_DIRECT;
 }
 
 static inline void xdp_set_return_frame_no_direct(void)
 {
-	struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info);
+	struct bpf_redirect_info *ri = bpf_net_ctx_get_ri();
 
 	ri->kern_flags |= BPF_RI_F_RF_NO_DIRECT;
 }
 
 static inline void xdp_clear_return_frame_no_direct(void)
 {
-	struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info);
+	struct bpf_redirect_info *ri = bpf_net_ctx_get_ri();
 
 	ri->kern_flags &= ~BPF_RI_F_RF_NO_DIRECT;
 }
@@ -1592,7 +1628,7 @@ static __always_inline long __bpf_xdp_redirect_map(struct bpf_map *map, u64 inde
 						   u64 flags, const u64 flag_mask,
 						   void *lookup_elem(struct bpf_map *map, u32 key))
 {
-	struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info);
+	struct bpf_redirect_info *ri = bpf_net_ctx_get_ri();
 	const u64 action_mask = XDP_ABORTED | XDP_DROP | XDP_PASS | XDP_TX;
 
 	/* Lower bits of the flags are used as return code on lookup failure */
diff --git a/include/linux/sched.h b/include/linux/sched.h
index a5f4b48fca184..6d7170259a538 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -53,6 +53,7 @@ struct bio_list;
 struct blk_plug;
 struct bpf_local_storage;
 struct bpf_run_ctx;
+struct bpf_net_context;
 struct capture_control;
 struct cfs_rq;
 struct fs_struct;
@@ -1506,6 +1507,8 @@ struct task_struct {
 	/* Used for BPF run context */
 	struct bpf_run_ctx		*bpf_ctx;
 #endif
+	/* Used by BPF for per-TASK xdp storage */
+	struct bpf_net_context		*bpf_net_context;
 
 #ifdef CONFIG_GCC_PLUGIN_STACKLEAK
 	unsigned long			lowest_stack;
diff --git a/kernel/bpf/cpumap.c b/kernel/bpf/cpumap.c
index a8e34416e960f..66974bd027109 100644
--- a/kernel/bpf/cpumap.c
+++ b/kernel/bpf/cpumap.c
@@ -240,12 +240,14 @@ static int cpu_map_bpf_prog_run(struct bpf_cpu_map_entry *rcpu, void **frames,
 				int xdp_n, struct xdp_cpumap_stats *stats,
 				struct list_head *list)
 {
+	struct bpf_net_context __bpf_net_ctx, *bpf_net_ctx;
 	int nframes;
 
 	if (!rcpu->prog)
 		return xdp_n;
 
 	rcu_read_lock_bh();
+	bpf_net_ctx = bpf_net_ctx_set(&__bpf_net_ctx);
 
 	nframes = cpu_map_bpf_prog_run_xdp(rcpu, frames, xdp_n, stats);
 
@@ -255,6 +257,7 @@ static int cpu_map_bpf_prog_run(struct bpf_cpu_map_entry *rcpu, void **frames,
 	if (unlikely(!list_empty(list)))
 		cpu_map_bpf_prog_run_skb(rcpu, list, stats);
 
+	bpf_net_ctx_clear(bpf_net_ctx);
 	rcu_read_unlock_bh(); /* resched point, may call do_softirq() */
 
 	return nframes;
diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
index 7f3b34452243c..fbfdfb60db8d7 100644
--- a/kernel/bpf/devmap.c
+++ b/kernel/bpf/devmap.c
@@ -196,7 +196,14 @@ static void dev_map_free(struct bpf_map *map)
 	list_del_rcu(&dtab->list);
 	spin_unlock(&dev_map_lock);
 
-	bpf_clear_redirect_map(map);
+	/* bpf_redirect_info->map is assigned in __bpf_xdp_redirect_map()
+	 * during NAPI callback and cleared after the XDP redirect. There is no
+	 * explicit RCU read section which protects bpf_redirect_info->map but
+	 * local_bh_disable() also marks the beginning an RCU section. This
+	 * makes the complete softirq callback RCU protected. Thus after
+	 * following synchronize_rcu() there no bpf_redirect_info->map == map
+	 * assignment.
+	 */
 	synchronize_rcu();
 
 	/* Make sure prior __dev_map_entry_free() have completed. */
diff --git a/kernel/fork.c b/kernel/fork.c
index 99076dbe27d83..f314bdd7e6108 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2355,6 +2355,7 @@ __latent_entropy struct task_struct *copy_process(
 	RCU_INIT_POINTER(p->bpf_storage, NULL);
 	p->bpf_ctx = NULL;
 #endif
+	p->bpf_net_context =  NULL;
 
 	/* Perform scheduler related setup. Assign this task to a CPU. */
 	retval = sched_fork(clone_flags, p);
diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
index 36ae54f57bf57..a6d7f790cdda8 100644
--- a/net/bpf/test_run.c
+++ b/net/bpf/test_run.c
@@ -283,9 +283,10 @@ static int xdp_recv_frames(struct xdp_frame **frames, int nframes,
 static int xdp_test_run_batch(struct xdp_test_data *xdp, struct bpf_prog *prog,
 			      u32 repeat)
 {
-	struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info);
+	struct bpf_net_context __bpf_net_ctx, *bpf_net_ctx;
 	int err = 0, act, ret, i, nframes = 0, batch_sz;
 	struct xdp_frame **frames = xdp->frames;
+	struct bpf_redirect_info *ri;
 	struct xdp_page_head *head;
 	struct xdp_frame *frm;
 	bool redirect = false;
@@ -295,6 +296,8 @@ static int xdp_test_run_batch(struct xdp_test_data *xdp, struct bpf_prog *prog,
 	batch_sz = min_t(u32, repeat, xdp->batch_size);
 
 	local_bh_disable();
+	bpf_net_ctx = bpf_net_ctx_set(&__bpf_net_ctx);
+	ri = bpf_net_ctx_get_ri();
 	xdp_set_return_frame_no_direct();
 
 	for (i = 0; i < batch_sz; i++) {
@@ -359,6 +362,7 @@ static int xdp_test_run_batch(struct xdp_test_data *xdp, struct bpf_prog *prog,
 	}
 
 	xdp_clear_return_frame_no_direct();
+	bpf_net_ctx_clear(bpf_net_ctx);
 	local_bh_enable();
 	return err;
 }
@@ -394,6 +398,7 @@ static int bpf_test_run_xdp_live(struct bpf_prog *prog, struct xdp_buff *ctx,
 static int bpf_test_run(struct bpf_prog *prog, void *ctx, u32 repeat,
 			u32 *retval, u32 *time, bool xdp)
 {
+	struct bpf_net_context __bpf_net_ctx, *bpf_net_ctx;
 	struct bpf_prog_array_item item = {.prog = prog};
 	struct bpf_run_ctx *old_ctx;
 	struct bpf_cg_run_ctx run_ctx;
@@ -419,10 +424,14 @@ static int bpf_test_run(struct bpf_prog *prog, void *ctx, u32 repeat,
 	do {
 		run_ctx.prog_item = &item;
 		local_bh_disable();
+		bpf_net_ctx = bpf_net_ctx_set(&__bpf_net_ctx);
+
 		if (xdp)
 			*retval = bpf_prog_run_xdp(prog, ctx);
 		else
 			*retval = bpf_prog_run(prog, ctx);
+
+		bpf_net_ctx_clear(bpf_net_ctx);
 		local_bh_enable();
 	} while (bpf_test_timer_continue(&t, 1, repeat, &ret, time));
 	bpf_reset_run_ctx(old_ctx);
diff --git a/net/core/dev.c b/net/core/dev.c
index 2b4819b610b8a..195aa4d488bec 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4029,10 +4029,13 @@ sch_handle_ingress(struct sk_buff *skb, struct packet_type **pt_prev, int *ret,
 {
 	struct bpf_mprog_entry *entry = rcu_dereference_bh(skb->dev->tcx_ingress);
 	enum skb_drop_reason drop_reason = SKB_DROP_REASON_TC_INGRESS;
+	struct bpf_net_context __bpf_net_ctx, *bpf_net_ctx;
 	int sch_ret;
 
 	if (!entry)
 		return skb;
+
+	bpf_net_ctx = bpf_net_ctx_set(&__bpf_net_ctx);
 	if (*pt_prev) {
 		*ret = deliver_skb(skb, *pt_prev, orig_dev);
 		*pt_prev = NULL;
@@ -4061,10 +4064,12 @@ sch_handle_ingress(struct sk_buff *skb, struct packet_type **pt_prev, int *ret,
 			break;
 		}
 		*ret = NET_RX_SUCCESS;
+		bpf_net_ctx_clear(bpf_net_ctx);
 		return NULL;
 	case TC_ACT_SHOT:
 		kfree_skb_reason(skb, drop_reason);
 		*ret = NET_RX_DROP;
+		bpf_net_ctx_clear(bpf_net_ctx);
 		return NULL;
 	/* used by tc_run */
 	case TC_ACT_STOLEN:
@@ -4074,8 +4079,10 @@ sch_handle_ingress(struct sk_buff *skb, struct packet_type **pt_prev, int *ret,
 		fallthrough;
 	case TC_ACT_CONSUMED:
 		*ret = NET_RX_SUCCESS;
+		bpf_net_ctx_clear(bpf_net_ctx);
 		return NULL;
 	}
+	bpf_net_ctx_clear(bpf_net_ctx);
 
 	return skb;
 }
@@ -4085,11 +4092,14 @@ sch_handle_egress(struct sk_buff *skb, int *ret, struct net_device *dev)
 {
 	struct bpf_mprog_entry *entry = rcu_dereference_bh(dev->tcx_egress);
 	enum skb_drop_reason drop_reason = SKB_DROP_REASON_TC_EGRESS;
+	struct bpf_net_context __bpf_net_ctx, *bpf_net_ctx;
 	int sch_ret;
 
 	if (!entry)
 		return skb;
 
+	bpf_net_ctx = bpf_net_ctx_set(&__bpf_net_ctx);
+
 	/* qdisc_skb_cb(skb)->pkt_len & tcx_set_ingress() was
 	 * already set by the caller.
 	 */
@@ -4105,10 +4115,12 @@ sch_handle_egress(struct sk_buff *skb, int *ret, struct net_device *dev)
 		/* No need to push/pop skb's mac_header here on egress! */
 		skb_do_redirect(skb);
 		*ret = NET_XMIT_SUCCESS;
+		bpf_net_ctx_clear(bpf_net_ctx);
 		return NULL;
 	case TC_ACT_SHOT:
 		kfree_skb_reason(skb, drop_reason);
 		*ret = NET_XMIT_DROP;
+		bpf_net_ctx_clear(bpf_net_ctx);
 		return NULL;
 	/* used by tc_run */
 	case TC_ACT_STOLEN:
@@ -4118,8 +4130,10 @@ sch_handle_egress(struct sk_buff *skb, int *ret, struct net_device *dev)
 		fallthrough;
 	case TC_ACT_CONSUMED:
 		*ret = NET_XMIT_SUCCESS;
+		bpf_net_ctx_clear(bpf_net_ctx);
 		return NULL;
 	}
+	bpf_net_ctx_clear(bpf_net_ctx);
 
 	return skb;
 }
@@ -6301,6 +6315,7 @@ enum {
 static void busy_poll_stop(struct napi_struct *napi, void *have_poll_lock,
 			   unsigned flags, u16 budget)
 {
+	struct bpf_net_context __bpf_net_ctx, *bpf_net_ctx;
 	bool skip_schedule = false;
 	unsigned long timeout;
 	int rc;
@@ -6318,6 +6333,7 @@ static void busy_poll_stop(struct napi_struct *napi, void *have_poll_lock,
 	clear_bit(NAPI_STATE_IN_BUSY_POLL, &napi->state);
 
 	local_bh_disable();
+	bpf_net_ctx = bpf_net_ctx_set(&__bpf_net_ctx);
 
 	if (flags & NAPI_F_PREFER_BUSY_POLL) {
 		napi->defer_hard_irqs_count = READ_ONCE(napi->dev->napi_defer_hard_irqs);
@@ -6340,6 +6356,7 @@ static void busy_poll_stop(struct napi_struct *napi, void *have_poll_lock,
 	netpoll_poll_unlock(have_poll_lock);
 	if (rc == budget)
 		__busy_poll_stop(napi, skip_schedule);
+	bpf_net_ctx_clear(bpf_net_ctx);
 	local_bh_enable();
 }
 
@@ -6349,6 +6366,7 @@ static void __napi_busy_loop(unsigned int napi_id,
 {
 	unsigned long start_time = loop_end ? busy_loop_current_time() : 0;
 	int (*napi_poll)(struct napi_struct *napi, int budget);
+	struct bpf_net_context __bpf_net_ctx, *bpf_net_ctx;
 	void *have_poll_lock = NULL;
 	struct napi_struct *napi;
 
@@ -6367,6 +6385,7 @@ static void __napi_busy_loop(unsigned int napi_id,
 		int work = 0;
 
 		local_bh_disable();
+		bpf_net_ctx = bpf_net_ctx_set(&__bpf_net_ctx);
 		if (!napi_poll) {
 			unsigned long val = READ_ONCE(napi->state);
 
@@ -6397,6 +6416,7 @@ static void __napi_busy_loop(unsigned int napi_id,
 			__NET_ADD_STATS(dev_net(napi->dev),
 					LINUX_MIB_BUSYPOLLRXPACKETS, work);
 		skb_defer_free_flush(this_cpu_ptr(&softnet_data));
+		bpf_net_ctx_clear(bpf_net_ctx);
 		local_bh_enable();
 
 		if (!loop_end || loop_end(loop_end_arg, start_time))
@@ -6824,6 +6844,7 @@ static int napi_thread_wait(struct napi_struct *napi)
 
 static void napi_threaded_poll_loop(struct napi_struct *napi)
 {
+	struct bpf_net_context __bpf_net_ctx, *bpf_net_ctx;
 	struct softnet_data *sd;
 	unsigned long last_qs = jiffies;
 
@@ -6832,6 +6853,8 @@ static void napi_threaded_poll_loop(struct napi_struct *napi)
 		void *have;
 
 		local_bh_disable();
+		bpf_net_ctx = bpf_net_ctx_set(&__bpf_net_ctx);
+
 		sd = this_cpu_ptr(&softnet_data);
 		sd->in_napi_threaded_poll = true;
 
@@ -6847,6 +6870,7 @@ static void napi_threaded_poll_loop(struct napi_struct *napi)
 			net_rps_action_and_irq_enable(sd);
 		}
 		skb_defer_free_flush(sd);
+		bpf_net_ctx_clear(bpf_net_ctx);
 		local_bh_enable();
 
 		if (!repoll)
@@ -6872,10 +6896,12 @@ static __latent_entropy void net_rx_action(struct softirq_action *h)
 	struct softnet_data *sd = this_cpu_ptr(&softnet_data);
 	unsigned long time_limit = jiffies +
 		usecs_to_jiffies(READ_ONCE(net_hotdata.netdev_budget_usecs));
+	struct bpf_net_context __bpf_net_ctx, *bpf_net_ctx;
 	int budget = READ_ONCE(net_hotdata.netdev_budget);
 	LIST_HEAD(list);
 	LIST_HEAD(repoll);
 
+	bpf_net_ctx = bpf_net_ctx_set(&__bpf_net_ctx);
 start:
 	sd->in_net_rx_action = true;
 	local_irq_disable();
@@ -6928,7 +6954,8 @@ static __latent_entropy void net_rx_action(struct softirq_action *h)
 		sd->in_net_rx_action = false;
 
 	net_rps_action_and_irq_enable(sd);
-end:;
+end:
+	bpf_net_ctx_clear(bpf_net_ctx);
 }
 
 struct netdev_adjacent {
diff --git a/net/core/filter.c b/net/core/filter.c
index 9933851c685e7..c0f350b963272 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -2476,9 +2476,6 @@ static const struct bpf_func_proto bpf_clone_redirect_proto = {
 	.arg3_type      = ARG_ANYTHING,
 };
 
-DEFINE_PER_CPU(struct bpf_redirect_info, bpf_redirect_info);
-EXPORT_PER_CPU_SYMBOL_GPL(bpf_redirect_info);
-
 static struct net_device *skb_get_peer_dev(struct net_device *dev)
 {
 	const struct net_device_ops *ops = dev->netdev_ops;
@@ -2491,7 +2488,7 @@ static struct net_device *skb_get_peer_dev(struct net_device *dev)
 
 int skb_do_redirect(struct sk_buff *skb)
 {
-	struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info);
+	struct bpf_redirect_info *ri = bpf_net_ctx_get_ri();
 	struct net *net = dev_net(skb->dev);
 	struct net_device *dev;
 	u32 flags = ri->flags;
@@ -2524,7 +2521,7 @@ int skb_do_redirect(struct sk_buff *skb)
 
 BPF_CALL_2(bpf_redirect, u32, ifindex, u64, flags)
 {
-	struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info);
+	struct bpf_redirect_info *ri = bpf_net_ctx_get_ri();
 
 	if (unlikely(flags & (~(BPF_F_INGRESS) | BPF_F_REDIRECT_INTERNAL)))
 		return TC_ACT_SHOT;
@@ -2545,7 +2542,7 @@ static const struct bpf_func_proto bpf_redirect_proto = {
 
 BPF_CALL_2(bpf_redirect_peer, u32, ifindex, u64, flags)
 {
-	struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info);
+	struct bpf_redirect_info *ri = bpf_net_ctx_get_ri();
 
 	if (unlikely(flags))
 		return TC_ACT_SHOT;
@@ -2567,7 +2564,7 @@ static const struct bpf_func_proto bpf_redirect_peer_proto = {
 BPF_CALL_4(bpf_redirect_neigh, u32, ifindex, struct bpf_redir_neigh *, params,
 	   int, plen, u64, flags)
 {
-	struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info);
+	struct bpf_redirect_info *ri = bpf_net_ctx_get_ri();
 
 	if (unlikely((plen && plen < sizeof(*params)) || flags))
 		return TC_ACT_SHOT;
@@ -4293,30 +4290,13 @@ void xdp_do_check_flushed(struct napi_struct *napi)
 }
 #endif
 
-void bpf_clear_redirect_map(struct bpf_map *map)
-{
-	struct bpf_redirect_info *ri;
-	int cpu;
-
-	for_each_possible_cpu(cpu) {
-		ri = per_cpu_ptr(&bpf_redirect_info, cpu);
-		/* Avoid polluting remote cacheline due to writes if
-		 * not needed. Once we pass this test, we need the
-		 * cmpxchg() to make sure it hasn't been changed in
-		 * the meantime by remote CPU.
-		 */
-		if (unlikely(READ_ONCE(ri->map) == map))
-			cmpxchg(&ri->map, map, NULL);
-	}
-}
-
 DEFINE_STATIC_KEY_FALSE(bpf_master_redirect_enabled_key);
 EXPORT_SYMBOL_GPL(bpf_master_redirect_enabled_key);
 
 u32 xdp_master_redirect(struct xdp_buff *xdp)
 {
+	struct bpf_redirect_info *ri = bpf_net_ctx_get_ri();
 	struct net_device *master, *slave;
-	struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info);
 
 	master = netdev_master_upper_dev_get_rcu(xdp->rxq->dev);
 	slave = master->netdev_ops->ndo_xdp_get_xmit_slave(master, xdp);
@@ -4388,7 +4368,7 @@ static __always_inline int __xdp_do_redirect_frame(struct bpf_redirect_info *ri,
 			map = READ_ONCE(ri->map);
 
 			/* The map pointer is cleared when the map is being torn
-			 * down by bpf_clear_redirect_map()
+			 * down by dev_map_free()
 			 */
 			if (unlikely(!map)) {
 				err = -ENOENT;
@@ -4433,7 +4413,7 @@ static __always_inline int __xdp_do_redirect_frame(struct bpf_redirect_info *ri,
 int xdp_do_redirect(struct net_device *dev, struct xdp_buff *xdp,
 		    struct bpf_prog *xdp_prog)
 {
-	struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info);
+	struct bpf_redirect_info *ri = bpf_net_ctx_get_ri();
 	enum bpf_map_type map_type = ri->map_type;
 
 	if (map_type == BPF_MAP_TYPE_XSKMAP)
@@ -4447,7 +4427,7 @@ EXPORT_SYMBOL_GPL(xdp_do_redirect);
 int xdp_do_redirect_frame(struct net_device *dev, struct xdp_buff *xdp,
 			  struct xdp_frame *xdpf, struct bpf_prog *xdp_prog)
 {
-	struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info);
+	struct bpf_redirect_info *ri = bpf_net_ctx_get_ri();
 	enum bpf_map_type map_type = ri->map_type;
 
 	if (map_type == BPF_MAP_TYPE_XSKMAP)
@@ -4464,7 +4444,7 @@ static int xdp_do_generic_redirect_map(struct net_device *dev,
 				       enum bpf_map_type map_type, u32 map_id,
 				       u32 flags)
 {
-	struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info);
+	struct bpf_redirect_info *ri = bpf_net_ctx_get_ri();
 	struct bpf_map *map;
 	int err;
 
@@ -4476,7 +4456,7 @@ static int xdp_do_generic_redirect_map(struct net_device *dev,
 			map = READ_ONCE(ri->map);
 
 			/* The map pointer is cleared when the map is being torn
-			 * down by bpf_clear_redirect_map()
+			 * down by dev_map_free()
 			 */
 			if (unlikely(!map)) {
 				err = -ENOENT;
@@ -4518,7 +4498,7 @@ static int xdp_do_generic_redirect_map(struct net_device *dev,
 int xdp_do_generic_redirect(struct net_device *dev, struct sk_buff *skb,
 			    struct xdp_buff *xdp, struct bpf_prog *xdp_prog)
 {
-	struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info);
+	struct bpf_redirect_info *ri = bpf_net_ctx_get_ri();
 	enum bpf_map_type map_type = ri->map_type;
 	void *fwd = ri->tgt_value;
 	u32 map_id = ri->map_id;
@@ -4554,7 +4534,7 @@ int xdp_do_generic_redirect(struct net_device *dev, struct sk_buff *skb,
 
 BPF_CALL_2(bpf_xdp_redirect, u32, ifindex, u64, flags)
 {
-	struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info);
+	struct bpf_redirect_info *ri = bpf_net_ctx_get_ri();
 
 	if (unlikely(flags))
 		return XDP_ABORTED;
diff --git a/net/core/lwt_bpf.c b/net/core/lwt_bpf.c
index 4a0797f0a154b..5350dce2e52d6 100644
--- a/net/core/lwt_bpf.c
+++ b/net/core/lwt_bpf.c
@@ -38,6 +38,7 @@ static inline struct bpf_lwt *bpf_lwt_lwtunnel(struct lwtunnel_state *lwt)
 static int run_lwt_bpf(struct sk_buff *skb, struct bpf_lwt_prog *lwt,
 		       struct dst_entry *dst, bool can_redirect)
 {
+	struct bpf_net_context __bpf_net_ctx, *bpf_net_ctx;
 	int ret;
 
 	/* Migration disable and BH disable are needed to protect per-cpu
@@ -45,6 +46,7 @@ static int run_lwt_bpf(struct sk_buff *skb, struct bpf_lwt_prog *lwt,
 	 */
 	migrate_disable();
 	local_bh_disable();
+	bpf_net_ctx = bpf_net_ctx_set(&__bpf_net_ctx);
 	bpf_compute_data_pointers(skb);
 	ret = bpf_prog_run_save_cb(lwt->prog, skb);
 
@@ -77,6 +79,7 @@ static int run_lwt_bpf(struct sk_buff *skb, struct bpf_lwt_prog *lwt,
 		break;
 	}
 
+	bpf_net_ctx_clear(bpf_net_ctx);
 	local_bh_enable();
 	migrate_enable();
 
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH AUTOSEL 6.10 10/27] net: Reference bpf_redirect_info via task_struct on PREEMPT_RT.
  2024-07-28  0:52 ` [PATCH AUTOSEL 6.10 10/27] net: Reference bpf_redirect_info via task_struct on PREEMPT_RT Sasha Levin
@ 2024-07-29 15:00   ` Jakub Kicinski
  2024-08-10  9:12     ` Sasha Levin
  0 siblings, 1 reply; 21+ messages in thread
From: Jakub Kicinski @ 2024-07-29 15:00 UTC (permalink / raw)
  To: Sasha Levin
  Cc: linux-kernel, stable, Sebastian Andrzej Siewior, Andrii Nakryiko,
	Eduard Zingerman, Hao Luo, Jiri Olsa, John Fastabend, KP Singh,
	Martin KaFai Lau, Song Liu, Stanislav Fomichev, Yonghong Song,
	Alexei Starovoitov, Jesper Dangaard Brouer,
	Toke Høiland-Jørgensen, daniel, mingo, peterz,
	juri.lelli, vincent.guittot, davem, edumazet, pabeni, akpm,
	brauner, oleg, kees, tandersen, mjguzik, willy, kent.overstreet,
	zhangpeng.00, linmiaohe, hca, jiri, lorenzo, yan, bpf, netdev

On Sat, 27 Jul 2024 20:52:53 -0400 Sasha Levin wrote:
> Subject: [PATCH AUTOSEL 6.10 10/27] net: Reference bpf_redirect_info via task_struct on PREEMPT_RT.

no no no, let's drop this one, it's not a fix, and there's a ton 
of fallout

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH AUTOSEL 6.10 10/27] net: Reference bpf_redirect_info via task_struct on PREEMPT_RT.
  2024-07-29 15:00   ` Jakub Kicinski
@ 2024-08-10  9:12     ` Sasha Levin
  0 siblings, 0 replies; 21+ messages in thread
From: Sasha Levin @ 2024-08-10  9:12 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: linux-kernel, stable, Sebastian Andrzej Siewior, Andrii Nakryiko,
	Eduard Zingerman, Hao Luo, Jiri Olsa, John Fastabend, KP Singh,
	Martin KaFai Lau, Song Liu, Stanislav Fomichev, Yonghong Song,
	Alexei Starovoitov, Jesper Dangaard Brouer,
	Toke Høiland-Jørgensen, daniel, mingo, peterz,
	juri.lelli, vincent.guittot, davem, edumazet, pabeni, akpm,
	brauner, oleg, kees, tandersen, mjguzik, willy, kent.overstreet,
	zhangpeng.00, linmiaohe, hca, jiri, lorenzo, yan, bpf, netdev

On Mon, Jul 29, 2024 at 08:00:14AM -0700, Jakub Kicinski wrote:
>On Sat, 27 Jul 2024 20:52:53 -0400 Sasha Levin wrote:
>> Subject: [PATCH AUTOSEL 6.10 10/27] net: Reference bpf_redirect_info via task_struct on PREEMPT_RT.
>
>no no no, let's drop this one, it's not a fix, and there's a ton
>of fallout

Ack, will do. Thanks!

-- 
Thanks,
Sasha

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH AUTOSEL 6.10 11/27] af_unix: Don't retry after unix_state_lock_nested() in unix_stream_connect().
  2024-07-28  0:52 [PATCH AUTOSEL 6.10 01/27] wifi: nl80211: disallow setting special AP channel widths Sasha Levin
                   ` (3 preceding siblings ...)
  2024-07-28  0:52 ` [PATCH AUTOSEL 6.10 10/27] net: Reference bpf_redirect_info via task_struct on PREEMPT_RT Sasha Levin
@ 2024-07-28  0:52 ` Sasha Levin
  2024-07-28  0:52 ` [PATCH AUTOSEL 6.10 13/27] wifi: mac80211: fix NULL dereference at band check in starting tx ba session Sasha Levin
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 21+ messages in thread
From: Sasha Levin @ 2024-07-28  0:52 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Kuniyuki Iwashima, Paolo Abeni, Sasha Levin, davem, edumazet,
	kuba, mhal, daan.j.demeyer, netdev

From: Kuniyuki Iwashima <kuniyu@amazon.com>

[ Upstream commit 1ca27e0c8c13ac50a4acf9cdf77069e2d94a547d ]

When a SOCK_(STREAM|SEQPACKET) socket connect()s to another one, we need
to lock the two sockets to check their states in unix_stream_connect().

We use unix_state_lock() for the server and unix_state_lock_nested() for
client with tricky sk->sk_state check to avoid deadlock.

The possible deadlock scenario are the following:

  1) Self connect()
  2) Simultaneous connect()

The former is simple, attempt to grab the same lock, and the latter is
AB-BA deadlock.

After the server's unix_state_lock(), we check the server socket's state,
and if it's not TCP_LISTEN, connect() fails with -EINVAL.

Then, we avoid the former deadlock by checking the client's state before
unix_state_lock_nested().  If its state is not TCP_LISTEN, we can make
sure that the client and the server are not identical based on the state.

Also, the latter deadlock can be avoided in the same way.  Due to the
server sk->sk_state requirement, AB-BA deadlock could happen only with
TCP_LISTEN sockets.  So, if the client's state is TCP_LISTEN, we can
give up the second lock to avoid the deadlock.

  CPU 1                 CPU 2                  CPU 3
  connect(A -> B)       connect(B -> A)        listen(A)
  ---                   ---                    ---
  unix_state_lock(B)
  B->sk_state == TCP_LISTEN
  READ_ONCE(A->sk_state) == TCP_CLOSE
                            ^^^^^^^^^
                            ok, will lock A    unix_state_lock(A)
             .--------------'                  WRITE_ONCE(A->sk_state, TCP_LISTEN)
             |                                 unix_state_unlock(A)
             |
             |          unix_state_lock(A)
             |          A->sk_sk_state == TCP_LISTEN
             |          READ_ONCE(B->sk_state) == TCP_LISTEN
             v                                    ^^^^^^^^^^
  unix_state_lock_nested(A)                       Don't lock B !!

Currently, while checking the client's state, we also check if it's
TCP_ESTABLISHED, but this is unlikely and can be checked after we know
the state is not TCP_CLOSE.

Moreover, if it happens after the second lock, we now jump to the restart
label, but it's unlikely that the server is not found during the retry,
so the jump is mostly to revist the client state check.

Let's remove the retry logic and check the state against TCP_CLOSE first.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 net/unix/af_unix.c | 34 +++++++++-------------------------
 1 file changed, 9 insertions(+), 25 deletions(-)

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 142f56770b77f..ff0029e9a7f91 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -1473,6 +1473,7 @@ static int unix_stream_connect(struct socket *sock, struct sockaddr *uaddr,
 	struct unix_sock *u = unix_sk(sk), *newu, *otheru;
 	struct net *net = sock_net(sk);
 	struct sk_buff *skb = NULL;
+	unsigned char state;
 	long timeo;
 	int err;
 
@@ -1523,7 +1524,6 @@ static int unix_stream_connect(struct socket *sock, struct sockaddr *uaddr,
 		goto out;
 	}
 
-	/* Latch state of peer */
 	unix_state_lock(other);
 
 	/* Apparently VFS overslept socket death. Retry. */
@@ -1553,37 +1553,21 @@ static int unix_stream_connect(struct socket *sock, struct sockaddr *uaddr,
 		goto restart;
 	}
 
-	/* Latch our state.
-
-	   It is tricky place. We need to grab our state lock and cannot
-	   drop lock on peer. It is dangerous because deadlock is
-	   possible. Connect to self case and simultaneous
-	   attempt to connect are eliminated by checking socket
-	   state. other is TCP_LISTEN, if sk is TCP_LISTEN we
-	   check this before attempt to grab lock.
-
-	   Well, and we have to recheck the state after socket locked.
+	/* self connect and simultaneous connect are eliminated
+	 * by rejecting TCP_LISTEN socket to avoid deadlock.
 	 */
-	switch (READ_ONCE(sk->sk_state)) {
-	case TCP_CLOSE:
-		/* This is ok... continue with connect */
-		break;
-	case TCP_ESTABLISHED:
-		/* Socket is already connected */
-		err = -EISCONN;
-		goto out_unlock;
-	default:
-		err = -EINVAL;
+	state = READ_ONCE(sk->sk_state);
+	if (unlikely(state != TCP_CLOSE)) {
+		err = state == TCP_ESTABLISHED ? -EISCONN : -EINVAL;
 		goto out_unlock;
 	}
 
 	unix_state_lock_nested(sk, U_LOCK_SECOND);
 
-	if (sk->sk_state != TCP_CLOSE) {
+	if (unlikely(sk->sk_state != TCP_CLOSE)) {
+		err = sk->sk_state == TCP_ESTABLISHED ? -EISCONN : -EINVAL;
 		unix_state_unlock(sk);
-		unix_state_unlock(other);
-		sock_put(other);
-		goto restart;
+		goto out_unlock;
 	}
 
 	err = security_unix_stream_connect(sk, other, newsk);
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH AUTOSEL 6.10 13/27] wifi: mac80211: fix NULL dereference at band check in starting tx ba session
  2024-07-28  0:52 [PATCH AUTOSEL 6.10 01/27] wifi: nl80211: disallow setting special AP channel widths Sasha Levin
                   ` (4 preceding siblings ...)
  2024-07-28  0:52 ` [PATCH AUTOSEL 6.10 11/27] af_unix: Don't retry after unix_state_lock_nested() in unix_stream_connect() Sasha Levin
@ 2024-07-28  0:52 ` Sasha Levin
  2024-07-28  0:52 ` [PATCH AUTOSEL 6.10 16/27] wifi: nl80211: don't give key data to userspace Sasha Levin
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 21+ messages in thread
From: Sasha Levin @ 2024-07-28  0:52 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Zong-Zhe Yang, Johannes Berg, Sasha Levin, johannes, davem,
	edumazet, kuba, pabeni, linux-wireless, netdev

From: Zong-Zhe Yang <kevin_yang@realtek.com>

[ Upstream commit 021d53a3d87eeb9dbba524ac515651242a2a7e3b ]

In MLD connection, link_data/link_conf are dynamically allocated. They
don't point to vif->bss_conf. So, there will be no chanreq assigned to
vif->bss_conf and then the chan will be NULL. Tweak the code to check
ht_supported/vht_supported/has_he/has_eht on sta deflink.

Crash log (with rtw89 version under MLO development):
[ 9890.526087] BUG: kernel NULL pointer dereference, address: 0000000000000000
[ 9890.526102] #PF: supervisor read access in kernel mode
[ 9890.526105] #PF: error_code(0x0000) - not-present page
[ 9890.526109] PGD 0 P4D 0
[ 9890.526114] Oops: 0000 [#1] PREEMPT SMP PTI
[ 9890.526119] CPU: 2 PID: 6367 Comm: kworker/u16:2 Kdump: loaded Tainted: G           OE      6.9.0 #1
[ 9890.526123] Hardware name: LENOVO 2356AD1/2356AD1, BIOS G7ETB3WW (2.73 ) 11/28/2018
[ 9890.526126] Workqueue: phy2 rtw89_core_ba_work [rtw89_core]
[ 9890.526203] RIP: 0010:ieee80211_start_tx_ba_session (net/mac80211/agg-tx.c:618 (discriminator 1)) mac80211
[ 9890.526279] Code: f7 e8 d5 93 3e ea 48 83 c4 28 89 d8 5b 41 5c 41 5d 41 5e 41 5f 5d c3 cc cc cc cc 49 8b 84 24 e0 f1 ff ff 48 8b 80 90 1b 00 00 <83> 38 03 0f 84 37 fe ff ff bb ea ff ff ff eb cc 49 8b 84 24 10 f3
All code
========
   0:	f7 e8                	imul   %eax
   2:	d5                   	(bad)
   3:	93                   	xchg   %eax,%ebx
   4:	3e ea                	ds (bad)
   6:	48 83 c4 28          	add    $0x28,%rsp
   a:	89 d8                	mov    %ebx,%eax
   c:	5b                   	pop    %rbx
   d:	41 5c                	pop    %r12
   f:	41 5d                	pop    %r13
  11:	41 5e                	pop    %r14
  13:	41 5f                	pop    %r15
  15:	5d                   	pop    %rbp
  16:	c3                   	retq
  17:	cc                   	int3
  18:	cc                   	int3
  19:	cc                   	int3
  1a:	cc                   	int3
  1b:	49 8b 84 24 e0 f1 ff 	mov    -0xe20(%r12),%rax
  22:	ff
  23:	48 8b 80 90 1b 00 00 	mov    0x1b90(%rax),%rax
  2a:*	83 38 03             	cmpl   $0x3,(%rax)		<-- trapping instruction
  2d:	0f 84 37 fe ff ff    	je     0xfffffffffffffe6a
  33:	bb ea ff ff ff       	mov    $0xffffffea,%ebx
  38:	eb cc                	jmp    0x6
  3a:	49                   	rex.WB
  3b:	8b                   	.byte 0x8b
  3c:	84 24 10             	test   %ah,(%rax,%rdx,1)
  3f:	f3                   	repz

Code starting with the faulting instruction
===========================================
   0:	83 38 03             	cmpl   $0x3,(%rax)
   3:	0f 84 37 fe ff ff    	je     0xfffffffffffffe40
   9:	bb ea ff ff ff       	mov    $0xffffffea,%ebx
   e:	eb cc                	jmp    0xffffffffffffffdc
  10:	49                   	rex.WB
  11:	8b                   	.byte 0x8b
  12:	84 24 10             	test   %ah,(%rax,%rdx,1)
  15:	f3                   	repz
[ 9890.526285] RSP: 0018:ffffb8db09013d68 EFLAGS: 00010246
[ 9890.526291] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff9308e0d656c8
[ 9890.526295] RDX: 0000000000000000 RSI: ffffffffab99460b RDI: ffffffffab9a7685
[ 9890.526300] RBP: ffffb8db09013db8 R08: 0000000000000000 R09: 0000000000000873
[ 9890.526304] R10: ffff9308e0d64800 R11: 0000000000000002 R12: ffff9308e5ff6e70
[ 9890.526308] R13: ffff930952500e20 R14: ffff9309192a8c00 R15: 0000000000000000
[ 9890.526313] FS:  0000000000000000(0000) GS:ffff930b4e700000(0000) knlGS:0000000000000000
[ 9890.526316] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 9890.526318] CR2: 0000000000000000 CR3: 0000000391c58005 CR4: 00000000001706f0
[ 9890.526321] Call Trace:
[ 9890.526324]  <TASK>
[ 9890.526327] ? show_regs (arch/x86/kernel/dumpstack.c:479)
[ 9890.526335] ? __die (arch/x86/kernel/dumpstack.c:421 arch/x86/kernel/dumpstack.c:434)
[ 9890.526340] ? page_fault_oops (arch/x86/mm/fault.c:713)
[ 9890.526347] ? search_module_extables (kernel/module/main.c:3256 (discriminator 3))
[ 9890.526353] ? ieee80211_start_tx_ba_session (net/mac80211/agg-tx.c:618 (discriminator 1)) mac80211

Signed-off-by: Zong-Zhe Yang <kevin_yang@realtek.com>
Link: https://patch.msgid.link/20240617115217.22344-1-kevin_yang@realtek.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 net/mac80211/agg-tx.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/mac80211/agg-tx.c b/net/mac80211/agg-tx.c
index 21d55dc539f6c..677bbbac9f169 100644
--- a/net/mac80211/agg-tx.c
+++ b/net/mac80211/agg-tx.c
@@ -616,7 +616,9 @@ int ieee80211_start_tx_ba_session(struct ieee80211_sta *pubsta, u16 tid,
 		return -EINVAL;
 
 	if (!pubsta->deflink.ht_cap.ht_supported &&
-	    sta->sdata->vif.bss_conf.chanreq.oper.chan->band != NL80211_BAND_6GHZ)
+	    !pubsta->deflink.vht_cap.vht_supported &&
+	    !pubsta->deflink.he_cap.has_he &&
+	    !pubsta->deflink.eht_cap.has_eht)
 		return -EINVAL;
 
 	if (WARN_ON_ONCE(!local->ops->ampdu_action))
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH AUTOSEL 6.10 16/27] wifi: nl80211: don't give key data to userspace
  2024-07-28  0:52 [PATCH AUTOSEL 6.10 01/27] wifi: nl80211: disallow setting special AP channel widths Sasha Levin
                   ` (5 preceding siblings ...)
  2024-07-28  0:52 ` [PATCH AUTOSEL 6.10 13/27] wifi: mac80211: fix NULL dereference at band check in starting tx ba session Sasha Levin
@ 2024-07-28  0:52 ` Sasha Levin
  2024-07-28  0:53 ` [PATCH AUTOSEL 6.10 17/27] can: mcp251xfd: tef: prepare to workaround broken TEF FIFO tail index erratum Sasha Levin
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 21+ messages in thread
From: Sasha Levin @ 2024-07-28  0:52 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Johannes Berg, Miriam Rachel Korenblit, Sasha Levin, johannes,
	davem, edumazet, kuba, pabeni, linux-wireless, netdev

From: Johannes Berg <johannes.berg@intel.com>

[ Upstream commit a7e5793035792cc46a1a4b0a783655ffa897dfe9 ]

When a key is requested by userspace, there's really no need
to include the key data, the sequence counter is really what
userspace needs in this case. The fact that it's included is
just a historic quirk.

Remove the key data.

Reviewed-by: Miriam Rachel Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20240627104411.b6a4f097e4ea.I7e6cc976cb9e8a80ef25a3351330f313373b4578@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 net/wireless/nl80211.c | 10 ++--------
 1 file changed, 2 insertions(+), 8 deletions(-)

diff --git a/net/wireless/nl80211.c b/net/wireless/nl80211.c
index 81d5bf186180f..715ecbb3689e7 100644
--- a/net/wireless/nl80211.c
+++ b/net/wireless/nl80211.c
@@ -4482,10 +4482,7 @@ static void get_key_callback(void *c, struct key_params *params)
 	struct nlattr *key;
 	struct get_key_cookie *cookie = c;
 
-	if ((params->key &&
-	     nla_put(cookie->msg, NL80211_ATTR_KEY_DATA,
-		     params->key_len, params->key)) ||
-	    (params->seq &&
+	if ((params->seq &&
 	     nla_put(cookie->msg, NL80211_ATTR_KEY_SEQ,
 		     params->seq_len, params->seq)) ||
 	    (params->cipher &&
@@ -4497,10 +4494,7 @@ static void get_key_callback(void *c, struct key_params *params)
 	if (!key)
 		goto nla_put_failure;
 
-	if ((params->key &&
-	     nla_put(cookie->msg, NL80211_KEY_DATA,
-		     params->key_len, params->key)) ||
-	    (params->seq &&
+	if ((params->seq &&
 	     nla_put(cookie->msg, NL80211_KEY_SEQ,
 		     params->seq_len, params->seq)) ||
 	    (params->cipher &&
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH AUTOSEL 6.10 17/27] can: mcp251xfd: tef: prepare to workaround broken TEF FIFO tail index erratum
  2024-07-28  0:52 [PATCH AUTOSEL 6.10 01/27] wifi: nl80211: disallow setting special AP channel widths Sasha Levin
                   ` (6 preceding siblings ...)
  2024-07-28  0:52 ` [PATCH AUTOSEL 6.10 16/27] wifi: nl80211: don't give key data to userspace Sasha Levin
@ 2024-07-28  0:53 ` Sasha Levin
  2024-07-28  0:53 ` [PATCH AUTOSEL 6.10 18/27] can: mcp251xfd: tef: update workaround for erratum DS80000789E 6 of mcp2518fd Sasha Levin
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 21+ messages in thread
From: Sasha Levin @ 2024-07-28  0:53 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Marc Kleine-Budde, Stefan Althöfer, Thomas Kopp, Sasha Levin,
	manivannan.sadhasivam, mailhol.vincent, davem, edumazet, kuba,
	pabeni, linux-can, netdev

From: Marc Kleine-Budde <mkl@pengutronix.de>

[ Upstream commit b8e0ddd36ce9536ad7478dd27df06c9ae92370ba ]

This is a preparatory patch to work around a problem similar to
erratum DS80000789E 6 of the mcp2518fd, the other variants of the chip
family (mcp2517fd and mcp251863) are probably also affected.

Erratum DS80000789E 6 says "reading of the FIFOCI bits in the FIFOSTA
register for an RX FIFO may be corrupted". However observation shows
that this problem is not limited to RX FIFOs but also effects the TEF
FIFO.

When handling the TEF interrupt, the driver reads the FIFO header
index from the TEF FIFO STA register of the chip.

In the bad case, the driver reads a too large head index. In the
original code, the driver always trusted the read value, which caused
old CAN transmit complete events that were already processed to be
re-processed.

Instead of reading and trusting the head index, read the head index
and calculate the number of CAN frames that were supposedly received -
replace mcp251xfd_tef_ring_update() with mcp251xfd_get_tef_len().

The mcp251xfd_handle_tefif() function reads the CAN transmit complete
events from the chip, iterates over them and pushes them into the
network stack. The original driver already contains code to detect old
CAN transmit complete events, that will be updated in the next patch.

Cc: Stefan Althöfer <Stefan.Althoefer@janztec.com>
Cc: Thomas Kopp <thomas.kopp@microchip.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 .../net/can/spi/mcp251xfd/mcp251xfd-ring.c    |  2 +
 drivers/net/can/spi/mcp251xfd/mcp251xfd-tef.c | 54 +++++++++++++------
 drivers/net/can/spi/mcp251xfd/mcp251xfd.h     | 13 ++---
 3 files changed, 43 insertions(+), 26 deletions(-)

diff --git a/drivers/net/can/spi/mcp251xfd/mcp251xfd-ring.c b/drivers/net/can/spi/mcp251xfd/mcp251xfd-ring.c
index bfe4caa0c99d4..4cb79a4f24612 100644
--- a/drivers/net/can/spi/mcp251xfd/mcp251xfd-ring.c
+++ b/drivers/net/can/spi/mcp251xfd/mcp251xfd-ring.c
@@ -485,6 +485,8 @@ int mcp251xfd_ring_alloc(struct mcp251xfd_priv *priv)
 		clear_bit(MCP251XFD_FLAGS_FD_MODE, priv->flags);
 	}
 
+	tx_ring->obj_num_shift_to_u8 = BITS_PER_TYPE(tx_ring->obj_num) -
+		ilog2(tx_ring->obj_num);
 	tx_ring->obj_size = tx_obj_size;
 
 	rem = priv->rx_obj_num;
diff --git a/drivers/net/can/spi/mcp251xfd/mcp251xfd-tef.c b/drivers/net/can/spi/mcp251xfd/mcp251xfd-tef.c
index e5bd57b65aafe..b41fad3b37c06 100644
--- a/drivers/net/can/spi/mcp251xfd/mcp251xfd-tef.c
+++ b/drivers/net/can/spi/mcp251xfd/mcp251xfd-tef.c
@@ -2,7 +2,7 @@
 //
 // mcp251xfd - Microchip MCP251xFD Family CAN controller driver
 //
-// Copyright (c) 2019, 2020, 2021 Pengutronix,
+// Copyright (c) 2019, 2020, 2021, 2023 Pengutronix,
 //               Marc Kleine-Budde <kernel@pengutronix.de>
 //
 // Based on:
@@ -16,6 +16,11 @@
 
 #include "mcp251xfd.h"
 
+static inline bool mcp251xfd_tx_fifo_sta_full(u32 fifo_sta)
+{
+	return !(fifo_sta & MCP251XFD_REG_FIFOSTA_TFNRFNIF);
+}
+
 static inline int
 mcp251xfd_tef_tail_get_from_chip(const struct mcp251xfd_priv *priv,
 				 u8 *tef_tail)
@@ -120,28 +125,44 @@ mcp251xfd_handle_tefif_one(struct mcp251xfd_priv *priv,
 	return 0;
 }
 
-static int mcp251xfd_tef_ring_update(struct mcp251xfd_priv *priv)
+static int
+mcp251xfd_get_tef_len(struct mcp251xfd_priv *priv, u8 *len_p)
 {
 	const struct mcp251xfd_tx_ring *tx_ring = priv->tx;
-	unsigned int new_head;
-	u8 chip_tx_tail;
+	const u8 shift = tx_ring->obj_num_shift_to_u8;
+	u8 chip_tx_tail, tail, len;
+	u32 fifo_sta;
 	int err;
 
-	err = mcp251xfd_tx_tail_get_from_chip(priv, &chip_tx_tail);
+	err = regmap_read(priv->map_reg, MCP251XFD_REG_FIFOSTA(priv->tx->fifo_nr),
+			  &fifo_sta);
 	if (err)
 		return err;
 
-	/* chip_tx_tail, is the next TX-Object send by the HW.
-	 * The new TEF head must be >= the old head, ...
+	if (mcp251xfd_tx_fifo_sta_full(fifo_sta)) {
+		*len_p = tx_ring->obj_num;
+		return 0;
+	}
+
+	chip_tx_tail = FIELD_GET(MCP251XFD_REG_FIFOSTA_FIFOCI_MASK, fifo_sta);
+
+	err =  mcp251xfd_check_tef_tail(priv);
+	if (err)
+		return err;
+	tail = mcp251xfd_get_tef_tail(priv);
+
+	/* First shift to full u8. The subtraction works on signed
+	 * values, that keeps the difference steady around the u8
+	 * overflow. The right shift acts on len, which is an u8.
 	 */
-	new_head = round_down(priv->tef->head, tx_ring->obj_num) + chip_tx_tail;
-	if (new_head <= priv->tef->head)
-		new_head += tx_ring->obj_num;
+	BUILD_BUG_ON(sizeof(tx_ring->obj_num) != sizeof(chip_tx_tail));
+	BUILD_BUG_ON(sizeof(tx_ring->obj_num) != sizeof(tail));
+	BUILD_BUG_ON(sizeof(tx_ring->obj_num) != sizeof(len));
 
-	/* ... but it cannot exceed the TX head. */
-	priv->tef->head = min(new_head, tx_ring->head);
+	len = (chip_tx_tail << shift) - (tail << shift);
+	*len_p = len >> shift;
 
-	return mcp251xfd_check_tef_tail(priv);
+	return 0;
 }
 
 static inline int
@@ -182,13 +203,12 @@ int mcp251xfd_handle_tefif(struct mcp251xfd_priv *priv)
 	u8 tef_tail, len, l;
 	int err, i;
 
-	err = mcp251xfd_tef_ring_update(priv);
+	err = mcp251xfd_get_tef_len(priv, &len);
 	if (err)
 		return err;
 
 	tef_tail = mcp251xfd_get_tef_tail(priv);
-	len = mcp251xfd_get_tef_len(priv);
-	l = mcp251xfd_get_tef_linear_len(priv);
+	l = mcp251xfd_get_tef_linear_len(priv, len);
 	err = mcp251xfd_tef_obj_read(priv, hw_tef_obj, tef_tail, l);
 	if (err)
 		return err;
@@ -223,6 +243,8 @@ int mcp251xfd_handle_tefif(struct mcp251xfd_priv *priv)
 		struct mcp251xfd_tx_ring *tx_ring = priv->tx;
 		int offset;
 
+		ring->head += len;
+
 		/* Increment the TEF FIFO tail pointer 'len' times in
 		 * a single SPI message.
 		 *
diff --git a/drivers/net/can/spi/mcp251xfd/mcp251xfd.h b/drivers/net/can/spi/mcp251xfd/mcp251xfd.h
index b35bfebd23f29..4628bf847bc9b 100644
--- a/drivers/net/can/spi/mcp251xfd/mcp251xfd.h
+++ b/drivers/net/can/spi/mcp251xfd/mcp251xfd.h
@@ -524,6 +524,7 @@ struct mcp251xfd_tef_ring {
 
 	/* u8 obj_num equals tx_ring->obj_num */
 	/* u8 obj_size equals sizeof(struct mcp251xfd_hw_tef_obj) */
+	/* u8 obj_num_shift_to_u8 equals tx_ring->obj_num_shift_to_u8 */
 
 	union mcp251xfd_write_reg_buf irq_enable_buf;
 	struct spi_transfer irq_enable_xfer;
@@ -542,6 +543,7 @@ struct mcp251xfd_tx_ring {
 	u8 nr;
 	u8 fifo_nr;
 	u8 obj_num;
+	u8 obj_num_shift_to_u8;
 	u8 obj_size;
 
 	struct mcp251xfd_tx_obj obj[MCP251XFD_TX_OBJ_NUM_MAX];
@@ -861,17 +863,8 @@ static inline u8 mcp251xfd_get_tef_tail(const struct mcp251xfd_priv *priv)
 	return priv->tef->tail & (priv->tx->obj_num - 1);
 }
 
-static inline u8 mcp251xfd_get_tef_len(const struct mcp251xfd_priv *priv)
+static inline u8 mcp251xfd_get_tef_linear_len(const struct mcp251xfd_priv *priv, u8 len)
 {
-	return priv->tef->head - priv->tef->tail;
-}
-
-static inline u8 mcp251xfd_get_tef_linear_len(const struct mcp251xfd_priv *priv)
-{
-	u8 len;
-
-	len = mcp251xfd_get_tef_len(priv);
-
 	return min_t(u8, len, priv->tx->obj_num - mcp251xfd_get_tef_tail(priv));
 }
 
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH AUTOSEL 6.10 18/27] can: mcp251xfd: tef: update workaround for erratum DS80000789E 6 of mcp2518fd
  2024-07-28  0:52 [PATCH AUTOSEL 6.10 01/27] wifi: nl80211: disallow setting special AP channel widths Sasha Levin
                   ` (7 preceding siblings ...)
  2024-07-28  0:53 ` [PATCH AUTOSEL 6.10 17/27] can: mcp251xfd: tef: prepare to workaround broken TEF FIFO tail index erratum Sasha Levin
@ 2024-07-28  0:53 ` Sasha Levin
  2024-07-28  0:53 ` [PATCH AUTOSEL 6.10 19/27] net: stmmac: qcom-ethqos: enable SGMII loopback during DMA reset on sa8775p-ride-r3 Sasha Levin
  2024-07-28  0:53 ` [PATCH AUTOSEL 6.10 20/27] mlxsw: pci: Lock configuration space of upstream bridge during reset Sasha Levin
  10 siblings, 0 replies; 21+ messages in thread
From: Sasha Levin @ 2024-07-28  0:53 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Marc Kleine-Budde, Stefan Althöfer, Thomas Kopp, Sasha Levin,
	manivannan.sadhasivam, mailhol.vincent, davem, edumazet, kuba,
	pabeni, linux-can, netdev

From: Marc Kleine-Budde <mkl@pengutronix.de>

[ Upstream commit 3a0a88fcbaf9e027ecca3fe8775be9700b4d6460 ]

This patch updates the workaround for a problem similar to erratum
DS80000789E 6 of the mcp2518fd, the other variants of the chip
family (mcp2517fd and mcp251863) are probably also affected.

Erratum DS80000789E 6 says "reading of the FIFOCI bits in the FIFOSTA
register for an RX FIFO may be corrupted". However observation shows
that this problem is not limited to RX FIFOs but also effects the TEF
FIFO.

In the bad case, the driver reads a too large head index. As the FIFO
is implemented as a ring buffer, this results in re-handling old CAN
transmit complete events.

Every transmit complete event contains with a sequence number that
equals to the sequence number of the corresponding TX request. This
way old TX complete events can be detected.

If the original driver detects a non matching sequence number, it
prints an info message and tries again later. As wrong sequence
numbers can be explained by the erratum DS80000789E 6, demote the info
message to debug level, streamline the code and update the comments.

Keep the behavior: If an old CAN TX complete event is detected, abort
the iteration and mark the number of valid CAN TX complete events as
processed in the chip by incrementing the FIFO's tail index.

Cc: Stefan Althöfer <Stefan.Althoefer@janztec.com>
Cc: Thomas Kopp <thomas.kopp@microchip.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/net/can/spi/mcp251xfd/mcp251xfd-tef.c | 71 +++++++------------
 1 file changed, 27 insertions(+), 44 deletions(-)

diff --git a/drivers/net/can/spi/mcp251xfd/mcp251xfd-tef.c b/drivers/net/can/spi/mcp251xfd/mcp251xfd-tef.c
index b41fad3b37c06..5b0c7890d4b44 100644
--- a/drivers/net/can/spi/mcp251xfd/mcp251xfd-tef.c
+++ b/drivers/net/can/spi/mcp251xfd/mcp251xfd-tef.c
@@ -60,56 +60,39 @@ static int mcp251xfd_check_tef_tail(const struct mcp251xfd_priv *priv)
 	return 0;
 }
 
-static int
-mcp251xfd_handle_tefif_recover(const struct mcp251xfd_priv *priv, const u32 seq)
-{
-	const struct mcp251xfd_tx_ring *tx_ring = priv->tx;
-	u32 tef_sta;
-	int err;
-
-	err = regmap_read(priv->map_reg, MCP251XFD_REG_TEFSTA, &tef_sta);
-	if (err)
-		return err;
-
-	if (tef_sta & MCP251XFD_REG_TEFSTA_TEFOVIF) {
-		netdev_err(priv->ndev,
-			   "Transmit Event FIFO buffer overflow.\n");
-		return -ENOBUFS;
-	}
-
-	netdev_info(priv->ndev,
-		    "Transmit Event FIFO buffer %s. (seq=0x%08x, tef_tail=0x%08x, tef_head=0x%08x, tx_head=0x%08x).\n",
-		    tef_sta & MCP251XFD_REG_TEFSTA_TEFFIF ?
-		    "full" : tef_sta & MCP251XFD_REG_TEFSTA_TEFNEIF ?
-		    "not empty" : "empty",
-		    seq, priv->tef->tail, priv->tef->head, tx_ring->head);
-
-	/* The Sequence Number in the TEF doesn't match our tef_tail. */
-	return -EAGAIN;
-}
-
 static int
 mcp251xfd_handle_tefif_one(struct mcp251xfd_priv *priv,
 			   const struct mcp251xfd_hw_tef_obj *hw_tef_obj,
 			   unsigned int *frame_len_ptr)
 {
 	struct net_device_stats *stats = &priv->ndev->stats;
+	u32 seq, tef_tail_masked, tef_tail;
 	struct sk_buff *skb;
-	u32 seq, seq_masked, tef_tail_masked, tef_tail;
 
-	seq = FIELD_GET(MCP251XFD_OBJ_FLAGS_SEQ_MCP2518FD_MASK,
+	 /* Use the MCP2517FD mask on the MCP2518FD, too. We only
+	  * compare 7 bits, this is enough to detect old TEF objects.
+	  */
+	seq = FIELD_GET(MCP251XFD_OBJ_FLAGS_SEQ_MCP2517FD_MASK,
 			hw_tef_obj->flags);
-
-	/* Use the MCP2517FD mask on the MCP2518FD, too. We only
-	 * compare 7 bits, this should be enough to detect
-	 * net-yet-completed, i.e. old TEF objects.
-	 */
-	seq_masked = seq &
-		field_mask(MCP251XFD_OBJ_FLAGS_SEQ_MCP2517FD_MASK);
 	tef_tail_masked = priv->tef->tail &
 		field_mask(MCP251XFD_OBJ_FLAGS_SEQ_MCP2517FD_MASK);
-	if (seq_masked != tef_tail_masked)
-		return mcp251xfd_handle_tefif_recover(priv, seq);
+
+	/* According to mcp2518fd erratum DS80000789E 6. the FIFOCI
+	 * bits of a FIFOSTA register, here the TX FIFO tail index
+	 * might be corrupted and we might process past the TEF FIFO's
+	 * head into old CAN frames.
+	 *
+	 * Compare the sequence number of the currently processed CAN
+	 * frame with the expected sequence number. Abort with
+	 * -EBADMSG if an old CAN frame is detected.
+	 */
+	if (seq != tef_tail_masked) {
+		netdev_dbg(priv->ndev, "%s: chip=0x%02x ring=0x%02x\n", __func__,
+			   seq, tef_tail_masked);
+		stats->tx_fifo_errors++;
+
+		return -EBADMSG;
+	}
 
 	tef_tail = mcp251xfd_get_tef_tail(priv);
 	skb = priv->can.echo_skb[tef_tail];
@@ -223,12 +206,12 @@ int mcp251xfd_handle_tefif(struct mcp251xfd_priv *priv)
 		unsigned int frame_len = 0;
 
 		err = mcp251xfd_handle_tefif_one(priv, &hw_tef_obj[i], &frame_len);
-		/* -EAGAIN means the Sequence Number in the TEF
-		 * doesn't match our tef_tail. This can happen if we
-		 * read the TEF objects too early. Leave loop let the
-		 * interrupt handler call us again.
+		/* -EBADMSG means we're affected by mcp2518fd erratum
+		 * DS80000789E 6., i.e. the Sequence Number in the TEF
+		 * doesn't match our tef_tail. Don't process any
+		 * further and mark processed frames as good.
 		 */
-		if (err == -EAGAIN)
+		if (err == -EBADMSG)
 			goto out_netif_wake_queue;
 		if (err)
 			return err;
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH AUTOSEL 6.10 19/27] net: stmmac: qcom-ethqos: enable SGMII loopback during DMA reset on sa8775p-ride-r3
  2024-07-28  0:52 [PATCH AUTOSEL 6.10 01/27] wifi: nl80211: disallow setting special AP channel widths Sasha Levin
                   ` (8 preceding siblings ...)
  2024-07-28  0:53 ` [PATCH AUTOSEL 6.10 18/27] can: mcp251xfd: tef: update workaround for erratum DS80000789E 6 of mcp2518fd Sasha Levin
@ 2024-07-28  0:53 ` Sasha Levin
  2024-07-28  0:53 ` [PATCH AUTOSEL 6.10 20/27] mlxsw: pci: Lock configuration space of upstream bridge during reset Sasha Levin
  10 siblings, 0 replies; 21+ messages in thread
From: Sasha Levin @ 2024-07-28  0:53 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Bartosz Golaszewski, Andrew Lunn, Jakub Kicinski, Sasha Levin,
	vkoul, alexandre.torgue, joabreu, davem, edumazet, pabeni,
	mcoquelin.stm32, netdev, linux-arm-msm, linux-stm32,
	linux-arm-kernel

From: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>

[ Upstream commit 3c466d6537b99f801b3f68af3d8124d4312437a0 ]

On sa8775p-ride-r3 the RX clocks from the AQR115C PHY are not available at
the time of the DMA reset. We can however extract the RX clock from the
internal SERDES block. Once the link is up, we can revert to the
previous state.

The AQR115C PHY doesn't support in-band signalling so we can count on
getting the link up notification and safely reuse existing callbacks
which are already used by another HW quirk workaround which enables the
functional clock to avoid a DMA reset due to timeout.

Only enable loopback on revision 3 of the board - check the phy_mode to
make sure.

Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20240703181500.28491-3-brgl@bgdev.pl
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 .../stmicro/stmmac/dwmac-qcom-ethqos.c        | 23 +++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-qcom-ethqos.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-qcom-ethqos.c
index 466c4002f00d4..3a7f3a8b06718 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac-qcom-ethqos.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-qcom-ethqos.c
@@ -21,6 +21,7 @@
 #define RGMII_IO_MACRO_CONFIG2		0x1C
 #define RGMII_IO_MACRO_DEBUG1		0x20
 #define EMAC_SYSTEM_LOW_POWER_DEBUG	0x28
+#define EMAC_WRAPPER_SGMII_PHY_CNTRL1	0xf4
 
 /* RGMII_IO_MACRO_CONFIG fields */
 #define RGMII_CONFIG_FUNC_CLK_EN		BIT(30)
@@ -79,6 +80,9 @@
 #define ETHQOS_MAC_CTRL_SPEED_MODE		BIT(14)
 #define ETHQOS_MAC_CTRL_PORT_SEL		BIT(15)
 
+/* EMAC_WRAPPER_SGMII_PHY_CNTRL1 bits */
+#define SGMII_PHY_CNTRL1_SGMII_TX_TO_RX_LOOPBACK_EN	BIT(3)
+
 #define SGMII_10M_RX_CLK_DVDR			0x31
 
 struct ethqos_emac_por {
@@ -95,6 +99,7 @@ struct ethqos_emac_driver_data {
 	bool has_integrated_pcs;
 	u32 dma_addr_width;
 	struct dwmac4_addrs dwmac4_addrs;
+	bool needs_sgmii_loopback;
 };
 
 struct qcom_ethqos {
@@ -114,6 +119,7 @@ struct qcom_ethqos {
 	unsigned int num_por;
 	bool rgmii_config_loopback_en;
 	bool has_emac_ge_3;
+	bool needs_sgmii_loopback;
 };
 
 static int rgmii_readl(struct qcom_ethqos *ethqos, unsigned int offset)
@@ -191,8 +197,22 @@ ethqos_update_link_clk(struct qcom_ethqos *ethqos, unsigned int speed)
 	clk_set_rate(ethqos->link_clk, ethqos->link_clk_rate);
 }
 
+static void
+qcom_ethqos_set_sgmii_loopback(struct qcom_ethqos *ethqos, bool enable)
+{
+	if (!ethqos->needs_sgmii_loopback ||
+	    ethqos->phy_mode != PHY_INTERFACE_MODE_2500BASEX)
+		return;
+
+	rgmii_updatel(ethqos,
+		      SGMII_PHY_CNTRL1_SGMII_TX_TO_RX_LOOPBACK_EN,
+		      enable ? SGMII_PHY_CNTRL1_SGMII_TX_TO_RX_LOOPBACK_EN : 0,
+		      EMAC_WRAPPER_SGMII_PHY_CNTRL1);
+}
+
 static void ethqos_set_func_clk_en(struct qcom_ethqos *ethqos)
 {
+	qcom_ethqos_set_sgmii_loopback(ethqos, true);
 	rgmii_updatel(ethqos, RGMII_CONFIG_FUNC_CLK_EN,
 		      RGMII_CONFIG_FUNC_CLK_EN, RGMII_IO_MACRO_CONFIG);
 }
@@ -277,6 +297,7 @@ static const struct ethqos_emac_driver_data emac_v4_0_0_data = {
 	.has_emac_ge_3 = true,
 	.link_clk_name = "phyaux",
 	.has_integrated_pcs = true,
+	.needs_sgmii_loopback = true,
 	.dma_addr_width = 36,
 	.dwmac4_addrs = {
 		.dma_chan = 0x00008100,
@@ -674,6 +695,7 @@ static void ethqos_fix_mac_speed(void *priv, unsigned int speed, unsigned int mo
 {
 	struct qcom_ethqos *ethqos = priv;
 
+	qcom_ethqos_set_sgmii_loopback(ethqos, false);
 	ethqos->speed = speed;
 	ethqos_update_link_clk(ethqos, speed);
 	ethqos_configure(ethqos);
@@ -809,6 +831,7 @@ static int qcom_ethqos_probe(struct platform_device *pdev)
 	ethqos->num_por = data->num_por;
 	ethqos->rgmii_config_loopback_en = data->rgmii_config_loopback_en;
 	ethqos->has_emac_ge_3 = data->has_emac_ge_3;
+	ethqos->needs_sgmii_loopback = data->needs_sgmii_loopback;
 
 	ethqos->link_clk = devm_clk_get(dev, data->link_clk_name ?: "rgmii");
 	if (IS_ERR(ethqos->link_clk))
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH AUTOSEL 6.10 20/27] mlxsw: pci: Lock configuration space of upstream bridge during reset
  2024-07-28  0:52 [PATCH AUTOSEL 6.10 01/27] wifi: nl80211: disallow setting special AP channel widths Sasha Levin
                   ` (9 preceding siblings ...)
  2024-07-28  0:53 ` [PATCH AUTOSEL 6.10 19/27] net: stmmac: qcom-ethqos: enable SGMII loopback during DMA reset on sa8775p-ride-r3 Sasha Levin
@ 2024-07-28  0:53 ` Sasha Levin
  10 siblings, 0 replies; 21+ messages in thread
From: Sasha Levin @ 2024-07-28  0:53 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Ido Schimmel, Petr Machata, Przemek Kitszel, Jakub Kicinski,
	Sasha Levin, davem, edumazet, pabeni, netdev

From: Ido Schimmel <idosch@nvidia.com>

[ Upstream commit 0970836c348b6bc2ea77ce4348a136d6febfd440 ]

The driver triggers a "Secondary Bus Reset" (SBR) by calling
__pci_reset_function_locked() which asserts the SBR bit in the "Bridge
Control Register" in the configuration space of the upstream bridge for
2ms. This is done without locking the configuration space of the
upstream bridge port, allowing user space to access it concurrently.

Linux 6.11 will start warning about such unlocked resets [1][2]:

pcieport 0000:00:01.0: unlocked secondary bus reset via: pci_reset_bus_function+0x51c/0x6a0

Avoid the warning and the concurrent access by locking the configuration
space of the upstream bridge prior to the reset and unlocking it
afterwards.

[1] https://lore.kernel.org/all/171711746953.1628941.4692125082286867825.stgit@dwillia2-xfh.jf.intel.com/
[2] https://lore.kernel.org/all/20240531213150.GA610983@bhelgaas/

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Link: https://patch.msgid.link/9937b0afdb50f2f2825945393c94c093c04a5897.1720447210.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/net/ethernet/mellanox/mlxsw/pci.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/pci.c b/drivers/net/ethernet/mellanox/mlxsw/pci.c
index c0ced4d315f3d..d92f640bae575 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/pci.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/pci.c
@@ -1599,6 +1599,7 @@ static int mlxsw_pci_reset_at_pci_disable(struct mlxsw_pci *mlxsw_pci,
 {
 	struct pci_dev *pdev = mlxsw_pci->pdev;
 	char mrsr_pl[MLXSW_REG_MRSR_LEN];
+	struct pci_dev *bridge;
 	int err;
 
 	if (!pci_reset_sbr_supported) {
@@ -1615,6 +1616,9 @@ static int mlxsw_pci_reset_at_pci_disable(struct mlxsw_pci *mlxsw_pci,
 sbr:
 	device_lock_assert(&pdev->dev);
 
+	bridge = pci_upstream_bridge(pdev);
+	if (bridge)
+		pci_cfg_access_lock(bridge);
 	pci_cfg_access_lock(pdev);
 	pci_save_state(pdev);
 
@@ -1624,6 +1628,8 @@ static int mlxsw_pci_reset_at_pci_disable(struct mlxsw_pci *mlxsw_pci,
 
 	pci_restore_state(pdev);
 	pci_cfg_access_unlock(pdev);
+	if (bridge)
+		pci_cfg_access_unlock(bridge);
 
 	return err;
 }
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2025-02-23  9:42 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-07-28  0:52 [PATCH AUTOSEL 6.10 01/27] wifi: nl80211: disallow setting special AP channel widths Sasha Levin
2024-07-28  0:52 ` [PATCH AUTOSEL 6.10 03/27] r8169: remove detection of chip version 11 (early RTL8168b) Sasha Levin
2024-07-29  8:45   ` Heiner Kallweit
2024-08-10  9:12     ` Sasha Levin
2024-08-11 14:32       ` Heiner Kallweit
2024-08-11 21:16         ` Sasha Levin
2025-02-23  0:57           ` Michael Pflüger
2025-02-23  9:43             ` Heiner Kallweit
2024-07-28  0:52 ` [PATCH AUTOSEL 6.10 06/27] net/mlx5e: SHAMPO, Fix invalid WQ linked list unlink Sasha Levin
2024-07-28  0:52 ` [PATCH AUTOSEL 6.10 08/27] rtnetlink: move rtnl_lock handling out of af_netlink Sasha Levin
2024-07-29 15:01   ` Jakub Kicinski
2024-07-28  0:52 ` [PATCH AUTOSEL 6.10 10/27] net: Reference bpf_redirect_info via task_struct on PREEMPT_RT Sasha Levin
2024-07-29 15:00   ` Jakub Kicinski
2024-08-10  9:12     ` Sasha Levin
2024-07-28  0:52 ` [PATCH AUTOSEL 6.10 11/27] af_unix: Don't retry after unix_state_lock_nested() in unix_stream_connect() Sasha Levin
2024-07-28  0:52 ` [PATCH AUTOSEL 6.10 13/27] wifi: mac80211: fix NULL dereference at band check in starting tx ba session Sasha Levin
2024-07-28  0:52 ` [PATCH AUTOSEL 6.10 16/27] wifi: nl80211: don't give key data to userspace Sasha Levin
2024-07-28  0:53 ` [PATCH AUTOSEL 6.10 17/27] can: mcp251xfd: tef: prepare to workaround broken TEF FIFO tail index erratum Sasha Levin
2024-07-28  0:53 ` [PATCH AUTOSEL 6.10 18/27] can: mcp251xfd: tef: update workaround for erratum DS80000789E 6 of mcp2518fd Sasha Levin
2024-07-28  0:53 ` [PATCH AUTOSEL 6.10 19/27] net: stmmac: qcom-ethqos: enable SGMII loopback during DMA reset on sa8775p-ride-r3 Sasha Levin
2024-07-28  0:53 ` [PATCH AUTOSEL 6.10 20/27] mlxsw: pci: Lock configuration space of upstream bridge during reset Sasha Levin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).