* [PATCH net] net: ethernet: ti: am65-cpsw: fix freeing IRQ in am65_cpsw_nuss_remove_tx_chns()
@ 2025-01-14 16:44 Roger Quadros
2025-01-14 17:12 ` Simon Horman
2025-01-15 5:18 ` Siddharth Vadapalli
0 siblings, 2 replies; 12+ messages in thread
From: Roger Quadros @ 2025-01-14 16:44 UTC (permalink / raw)
To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Grygorii Strashko, Siddharth Vadapalli
Cc: srk, netdev, linux-kernel, Roger Quadros
When getting the IRQ we use k3_udma_glue_rx_get_irq() which returns
negative error value on error. So not NULL check is not sufficient
to deteremine if IRQ is valid. Check that IRQ is greater then zero
to ensure it is valid.
Fixes: 93a76530316a ("net: ethernet: ti: introduce am65x/j721e gigabit eth subsystem driver")
Signed-off-by: Roger Quadros <rogerq@kernel.org>
---
drivers/net/ethernet/ti/am65-cpsw-nuss.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/ti/am65-cpsw-nuss.c b/drivers/net/ethernet/ti/am65-cpsw-nuss.c
index 5465bf872734..e1de45fb18ae 100644
--- a/drivers/net/ethernet/ti/am65-cpsw-nuss.c
+++ b/drivers/net/ethernet/ti/am65-cpsw-nuss.c
@@ -2248,7 +2248,7 @@ static void am65_cpsw_nuss_remove_tx_chns(struct am65_cpsw_common *common)
for (i = 0; i < common->tx_ch_num; i++) {
struct am65_cpsw_tx_chn *tx_chn = &common->tx_chns[i];
- if (tx_chn->irq)
+ if (tx_chn->irq > 0)
devm_free_irq(dev, tx_chn->irq, tx_chn);
netif_napi_del(&tx_chn->napi_tx);
---
base-commit: 5bc55a333a2f7316b58edc7573e8e893f7acb532
change-id: 20250114-am65-cpsw-fix-tx-irq-free-846ac55ee6e1
Best regards,
--
Roger Quadros <rogerq@kernel.org>
^ permalink raw reply related [flat|nested] 12+ messages in thread* Re: [PATCH net] net: ethernet: ti: am65-cpsw: fix freeing IRQ in am65_cpsw_nuss_remove_tx_chns() 2025-01-14 16:44 [PATCH net] net: ethernet: ti: am65-cpsw: fix freeing IRQ in am65_cpsw_nuss_remove_tx_chns() Roger Quadros @ 2025-01-14 17:12 ` Simon Horman 2025-01-15 5:18 ` Siddharth Vadapalli 1 sibling, 0 replies; 12+ messages in thread From: Simon Horman @ 2025-01-14 17:12 UTC (permalink / raw) To: Roger Quadros Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, Grygorii Strashko, Siddharth Vadapalli, srk, netdev, linux-kernel On Tue, Jan 14, 2025 at 06:44:02PM +0200, Roger Quadros wrote: > When getting the IRQ we use k3_udma_glue_rx_get_irq() which returns > negative error value on error. So not NULL check is not sufficient > to deteremine if IRQ is valid. Check that IRQ is greater then zero > to ensure it is valid. > > Fixes: 93a76530316a ("net: ethernet: ti: introduce am65x/j721e gigabit eth subsystem driver") > Signed-off-by: Roger Quadros <rogerq@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH net] net: ethernet: ti: am65-cpsw: fix freeing IRQ in am65_cpsw_nuss_remove_tx_chns() 2025-01-14 16:44 [PATCH net] net: ethernet: ti: am65-cpsw: fix freeing IRQ in am65_cpsw_nuss_remove_tx_chns() Roger Quadros 2025-01-14 17:12 ` Simon Horman @ 2025-01-15 5:18 ` Siddharth Vadapalli 2025-01-15 10:04 ` Roger Quadros 1 sibling, 1 reply; 12+ messages in thread From: Siddharth Vadapalli @ 2025-01-15 5:18 UTC (permalink / raw) To: Roger Quadros Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, Grygorii Strashko, Siddharth Vadapalli, srk, netdev, linux-kernel On Tue, Jan 14, 2025 at 06:44:02PM +0200, Roger Quadros wrote: Hello Roger, > When getting the IRQ we use k3_udma_glue_rx_get_irq() which returns You probably meant "k3_udma_glue_tx_get_irq()" instead? It is used to assign tx_chn->irq within am65_cpsw_nuss_init_tx_chns() as follows: tx_chn->irq = k3_udma_glue_tx_get_irq(tx_chn->tx_chn); Additionally, following the above section we have: if (tx_chn->irq < 0) { dev_err(dev, "Failed to get tx dma irq %d\n", tx_chn->irq); ret = tx_chn->irq; goto err; } Could you please provide details on the code-path which will lead to a negative "tx_chn->irq" within "am65_cpsw_nuss_remove_tx_chns()"? There seem to be two callers of am65_cpsw_nuss_remove_tx_chns(), namely: 1. am65_cpsw_nuss_update_tx_rx_chns() 2. am65_cpsw_nuss_suspend() Since both of them seem to invoke am65_cpsw_nuss_remove_tx_chns() only in the case where am65_cpsw_nuss_init_tx_chns() *did not* error out, it appears to me that "tx_chn->irq" will never be negative within am65_cpsw_nuss_remove_tx_chns() Please let me know if I have overlooked something. Regards, Siddharth. > negative error value on error. So not NULL check is not sufficient > to deteremine if IRQ is valid. Check that IRQ is greater then zero > to ensure it is valid. > > Fixes: 93a76530316a ("net: ethernet: ti: introduce am65x/j721e gigabit eth subsystem driver") > Signed-off-by: Roger Quadros <rogerq@kernel.org> > --- > drivers/net/ethernet/ti/am65-cpsw-nuss.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/net/ethernet/ti/am65-cpsw-nuss.c b/drivers/net/ethernet/ti/am65-cpsw-nuss.c > index 5465bf872734..e1de45fb18ae 100644 > --- a/drivers/net/ethernet/ti/am65-cpsw-nuss.c > +++ b/drivers/net/ethernet/ti/am65-cpsw-nuss.c > @@ -2248,7 +2248,7 @@ static void am65_cpsw_nuss_remove_tx_chns(struct am65_cpsw_common *common) > for (i = 0; i < common->tx_ch_num; i++) { > struct am65_cpsw_tx_chn *tx_chn = &common->tx_chns[i]; > > - if (tx_chn->irq) > + if (tx_chn->irq > 0) > devm_free_irq(dev, tx_chn->irq, tx_chn); > > netif_napi_del(&tx_chn->napi_tx); > > --- > base-commit: 5bc55a333a2f7316b58edc7573e8e893f7acb532 > change-id: 20250114-am65-cpsw-fix-tx-irq-free-846ac55ee6e1 > > Best regards, > -- > Roger Quadros <rogerq@kernel.org> > ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH net] net: ethernet: ti: am65-cpsw: fix freeing IRQ in am65_cpsw_nuss_remove_tx_chns() 2025-01-15 5:18 ` Siddharth Vadapalli @ 2025-01-15 10:04 ` Roger Quadros 2025-01-15 10:38 ` Siddharth Vadapalli 0 siblings, 1 reply; 12+ messages in thread From: Roger Quadros @ 2025-01-15 10:04 UTC (permalink / raw) To: Siddharth Vadapalli Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, Grygorii Strashko, srk, netdev, linux-kernel Hi Siddharth, On 15/01/2025 07:18, Siddharth Vadapalli wrote: > On Tue, Jan 14, 2025 at 06:44:02PM +0200, Roger Quadros wrote: > > Hello Roger, > >> When getting the IRQ we use k3_udma_glue_rx_get_irq() which returns > > You probably meant "k3_udma_glue_tx_get_irq()" instead? It is used to > assign tx_chn->irq within am65_cpsw_nuss_init_tx_chns() as follows: Yes I meant tx instead of rx. > > tx_chn->irq = k3_udma_glue_tx_get_irq(tx_chn->tx_chn); > > Additionally, following the above section we have: > > if (tx_chn->irq < 0) { > dev_err(dev, "Failed to get tx dma irq %d\n", > tx_chn->irq); > ret = tx_chn->irq; > goto err; > } > > Could you please provide details on the code-path which will lead to a > negative "tx_chn->irq" within "am65_cpsw_nuss_remove_tx_chns()"? > > There seem to be two callers of am65_cpsw_nuss_remove_tx_chns(), namely: > 1. am65_cpsw_nuss_update_tx_rx_chns() > 2. am65_cpsw_nuss_suspend() > Since both of them seem to invoke am65_cpsw_nuss_remove_tx_chns() only > in the case where am65_cpsw_nuss_init_tx_chns() *did not* error out, it > appears to me that "tx_chn->irq" will never be negative within > am65_cpsw_nuss_remove_tx_chns() > > Please let me know if I have overlooked something. The issue is with am65_cpsw_nuss_update_tx_rx_chns(). It can be called repeatedly (by user changing number of TX queues) even if previous call to am65_cpsw_nuss_init_tx_chns() failed. Please try the below patch to simulate the error condition. Then do the following - bring down all network interfaces - change num TX queues to 2. IRQ for 2nd channel fails. - change num TX queues to 3. Now we try to free an invalid IRQ and we see warning. Also I think it is good practice to code for set value than to code for existing code paths as they can change in the future. --test patch starts-- diff --git a/drivers/net/ethernet/ti/am65-cpsw-nuss.c b/drivers/net/ethernet/ti/am65-cpsw-nuss.c index 36c29d3db329..c22cadaaf3d3 100644 --- a/drivers/net/ethernet/ti/am65-cpsw-nuss.c +++ b/drivers/net/ethernet/ti/am65-cpsw-nuss.c @@ -155,7 +155,7 @@ NETIF_MSG_IFUP | NETIF_MSG_PROBE | NETIF_MSG_IFDOWN | \ NETIF_MSG_RX_ERR | NETIF_MSG_TX_ERR) -#define AM65_CPSW_DEFAULT_TX_CHNS 8 +#define AM65_CPSW_DEFAULT_TX_CHNS 1 #define AM65_CPSW_DEFAULT_RX_CHN_FLOWS 1 /* CPPI streaming packet interface */ @@ -2346,7 +2348,10 @@ static int am65_cpsw_nuss_init_tx_chns(struct am65_cpsw_common *common) tx_chn->dsize_log2 = __fls(hdesc_size_out); WARN_ON(hdesc_size_out != (1 << tx_chn->dsize_log2)); - tx_chn->irq = k3_udma_glue_tx_get_irq(tx_chn->tx_chn); + if (i == 1) + tx_chn->irq = -ENODEV; + else + tx_chn->irq = k3_udma_glue_tx_get_irq(tx_chn->tx_chn); if (tx_chn->irq < 0) { dev_err(dev, "Failed to get tx dma irq %d\n", tx_chn->irq); -- cheers, -roger ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH net] net: ethernet: ti: am65-cpsw: fix freeing IRQ in am65_cpsw_nuss_remove_tx_chns() 2025-01-15 10:04 ` Roger Quadros @ 2025-01-15 10:38 ` Siddharth Vadapalli 2025-01-15 15:49 ` Roger Quadros 0 siblings, 1 reply; 12+ messages in thread From: Siddharth Vadapalli @ 2025-01-15 10:38 UTC (permalink / raw) To: Roger Quadros Cc: Siddharth Vadapalli, Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, Grygorii Strashko, srk, netdev, linux-kernel On Wed, Jan 15, 2025 at 12:04:17PM +0200, Roger Quadros wrote: > Hi Siddharth, > > On 15/01/2025 07:18, Siddharth Vadapalli wrote: > > On Tue, Jan 14, 2025 at 06:44:02PM +0200, Roger Quadros wrote: > > > > Hello Roger, > > > >> When getting the IRQ we use k3_udma_glue_rx_get_irq() which returns > > > > You probably meant "k3_udma_glue_tx_get_irq()" instead? It is used to > > assign tx_chn->irq within am65_cpsw_nuss_init_tx_chns() as follows: > > Yes I meant tx instead of rx. > > > > > tx_chn->irq = k3_udma_glue_tx_get_irq(tx_chn->tx_chn); > > > > Additionally, following the above section we have: > > > > if (tx_chn->irq < 0) { > > dev_err(dev, "Failed to get tx dma irq %d\n", > > tx_chn->irq); > > ret = tx_chn->irq; > > goto err; > > } > > > > Could you please provide details on the code-path which will lead to a > > negative "tx_chn->irq" within "am65_cpsw_nuss_remove_tx_chns()"? > > > > There seem to be two callers of am65_cpsw_nuss_remove_tx_chns(), namely: > > 1. am65_cpsw_nuss_update_tx_rx_chns() > > 2. am65_cpsw_nuss_suspend() > > Since both of them seem to invoke am65_cpsw_nuss_remove_tx_chns() only > > in the case where am65_cpsw_nuss_init_tx_chns() *did not* error out, it > > appears to me that "tx_chn->irq" will never be negative within > > am65_cpsw_nuss_remove_tx_chns() > > > > Please let me know if I have overlooked something. > > The issue is with am65_cpsw_nuss_update_tx_rx_chns(). It can be called > repeatedly (by user changing number of TX queues) even if previous call > to am65_cpsw_nuss_init_tx_chns() failed. Thank you for clarifying. So the issue/bug was discovered since the implementation of am65_cpsw_nuss_update_tx_rx_chns(). The "Fixes" tag misled me. Maybe the "Fixes" tag should be updated? Though we should code to future-proof it as done in this patch, the "Fixes" tag pointing to the very first commit of the driver might not be accurate as the code-path associated with the bug cannot be exercised at that commit. Independent of the above change suggested for the "Fixes" tag, Reviewed-by: Siddharth Vadapalli <s-vadapalli@ti.com> There seems to be a different bug in am65_cpsw_nuss_update_tx_rx_chns() which I have described below. > > Please try the below patch to simulate the error condition. > > Then do the following > - bring down all network interfaces > - change num TX queues to 2. IRQ for 2nd channel fails. > - change num TX queues to 3. Now we try to free an invalid IRQ and we see warning. > > Also I think it is good practice to code for set value than to code > for existing code paths as they can change in the future. > > --test patch starts-- > > diff --git a/drivers/net/ethernet/ti/am65-cpsw-nuss.c b/drivers/net/ethernet/ti/am65-cpsw-nuss.c > index 36c29d3db329..c22cadaaf3d3 100644 > --- a/drivers/net/ethernet/ti/am65-cpsw-nuss.c > +++ b/drivers/net/ethernet/ti/am65-cpsw-nuss.c > @@ -155,7 +155,7 @@ > NETIF_MSG_IFUP | NETIF_MSG_PROBE | NETIF_MSG_IFDOWN | \ > NETIF_MSG_RX_ERR | NETIF_MSG_TX_ERR) > > -#define AM65_CPSW_DEFAULT_TX_CHNS 8 > +#define AM65_CPSW_DEFAULT_TX_CHNS 1 > #define AM65_CPSW_DEFAULT_RX_CHN_FLOWS 1 > > /* CPPI streaming packet interface */ > @@ -2346,7 +2348,10 @@ static int am65_cpsw_nuss_init_tx_chns(struct am65_cpsw_common *common) > tx_chn->dsize_log2 = __fls(hdesc_size_out); > WARN_ON(hdesc_size_out != (1 << tx_chn->dsize_log2)); > > - tx_chn->irq = k3_udma_glue_tx_get_irq(tx_chn->tx_chn); > + if (i == 1) > + tx_chn->irq = -ENODEV; > + else > + tx_chn->irq = k3_udma_glue_tx_get_irq(tx_chn->tx_chn); The pair - am65_cpsw_nuss_init_tx_chns()/am65_cpsw_nuss_remove_tx_chns() seem to be written under the assumption that failure will result in the driver's probe failing. With am65_cpsw_nuss_update_tx_rx_chns(), that assumption no longer holds true. Please consider the following sequence: 1. am65_cpsw_nuss_probe() am65_cpsw_nuss_register_ndevs() am65_cpsw_nuss_init_tx_chns() => Succeeds 2. Probe is successful 3. am65_cpsw_nuss_update_tx_rx_chns() => Invoked by user am65_cpsw_nuss_remove_tx_chns() => Succeeds am65_cpsw_nuss_init_tx_chns() => Partially fails devm_add_action(dev, am65_cpsw_nuss_free_tx_chns, common); ^ DEVM Action is added, but since the driver isn't removed, the cleanup via am65_cpsw_nuss_free_tx_chns() will not run. Only when the user re-invokes am65_cpsw_nuss_update_tx_rx_chns(), the cleanup will be performed. This might have to be fixed in the following manner: @@ -3416,10 +3416,17 @@ int am65_cpsw_nuss_update_tx_rx_chns(struct am65_cpsw_common *common, common->tx_ch_num = num_tx; common->rx_ch_num_flows = num_rx; ret = am65_cpsw_nuss_init_tx_chns(common); - if (ret) + if (ret) { + devm_remove_action(dev, am65_cpsw_nuss_free_tx_chns, common); + am65_cpsw_nuss_free_tx_chns(common); return ret; + } ret = am65_cpsw_nuss_init_rx_chns(common); + if (ret) { + devm_remove_action(dev, am65_cpsw_nuss_free_rx_chns, common); + am65_cpsw_nuss_free_rx_chns(common); + } return ret; } Please let me know what you think. Regards, Siddharth. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH net] net: ethernet: ti: am65-cpsw: fix freeing IRQ in am65_cpsw_nuss_remove_tx_chns() 2025-01-15 10:38 ` Siddharth Vadapalli @ 2025-01-15 15:49 ` Roger Quadros 2025-01-15 16:38 ` Roger Quadros 2025-01-16 5:09 ` Siddharth Vadapalli 0 siblings, 2 replies; 12+ messages in thread From: Roger Quadros @ 2025-01-15 15:49 UTC (permalink / raw) To: Siddharth Vadapalli Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, Grygorii Strashko, srk, netdev, linux-kernel Hi Siddharth, On 15/01/2025 12:38, Siddharth Vadapalli wrote: > On Wed, Jan 15, 2025 at 12:04:17PM +0200, Roger Quadros wrote: >> Hi Siddharth, >> >> On 15/01/2025 07:18, Siddharth Vadapalli wrote: >>> On Tue, Jan 14, 2025 at 06:44:02PM +0200, Roger Quadros wrote: >>> >>> Hello Roger, >>> >>>> When getting the IRQ we use k3_udma_glue_rx_get_irq() which returns >>> >>> You probably meant "k3_udma_glue_tx_get_irq()" instead? It is used to >>> assign tx_chn->irq within am65_cpsw_nuss_init_tx_chns() as follows: >> >> Yes I meant tx instead of rx. >> >>> >>> tx_chn->irq = k3_udma_glue_tx_get_irq(tx_chn->tx_chn); >>> >>> Additionally, following the above section we have: >>> >>> if (tx_chn->irq < 0) { >>> dev_err(dev, "Failed to get tx dma irq %d\n", >>> tx_chn->irq); >>> ret = tx_chn->irq; >>> goto err; >>> } >>> >>> Could you please provide details on the code-path which will lead to a >>> negative "tx_chn->irq" within "am65_cpsw_nuss_remove_tx_chns()"? >>> >>> There seem to be two callers of am65_cpsw_nuss_remove_tx_chns(), namely: >>> 1. am65_cpsw_nuss_update_tx_rx_chns() >>> 2. am65_cpsw_nuss_suspend() >>> Since both of them seem to invoke am65_cpsw_nuss_remove_tx_chns() only >>> in the case where am65_cpsw_nuss_init_tx_chns() *did not* error out, it >>> appears to me that "tx_chn->irq" will never be negative within >>> am65_cpsw_nuss_remove_tx_chns() >>> >>> Please let me know if I have overlooked something. >> >> The issue is with am65_cpsw_nuss_update_tx_rx_chns(). It can be called >> repeatedly (by user changing number of TX queues) even if previous call >> to am65_cpsw_nuss_init_tx_chns() failed. > > Thank you for clarifying. So the issue/bug was discovered since the > implementation of am65_cpsw_nuss_update_tx_rx_chns(). The "Fixes" tag > misled me. Maybe the "Fixes" tag should be updated? Though we should > code to future-proof it as done in this patch, the "Fixes" tag pointing > to the very first commit of the driver might not be accurate as the > code-path associated with the bug cannot be exercised at that commit. Fair enough. I'll change the Fixes commit. > > Independent of the above change suggested for the "Fixes" tag, > > Reviewed-by: Siddharth Vadapalli <s-vadapalli@ti.com> > > There seems to be a different bug in am65_cpsw_nuss_update_tx_rx_chns() > which I have described below. > >> >> Please try the below patch to simulate the error condition. >> >> Then do the following >> - bring down all network interfaces >> - change num TX queues to 2. IRQ for 2nd channel fails. >> - change num TX queues to 3. Now we try to free an invalid IRQ and we see warning. >> >> Also I think it is good practice to code for set value than to code >> for existing code paths as they can change in the future. >> >> --test patch starts-- >> >> diff --git a/drivers/net/ethernet/ti/am65-cpsw-nuss.c b/drivers/net/ethernet/ti/am65-cpsw-nuss.c >> index 36c29d3db329..c22cadaaf3d3 100644 >> --- a/drivers/net/ethernet/ti/am65-cpsw-nuss.c >> +++ b/drivers/net/ethernet/ti/am65-cpsw-nuss.c >> @@ -155,7 +155,7 @@ >> NETIF_MSG_IFUP | NETIF_MSG_PROBE | NETIF_MSG_IFDOWN | \ >> NETIF_MSG_RX_ERR | NETIF_MSG_TX_ERR) >> >> -#define AM65_CPSW_DEFAULT_TX_CHNS 8 >> +#define AM65_CPSW_DEFAULT_TX_CHNS 1 >> #define AM65_CPSW_DEFAULT_RX_CHN_FLOWS 1 >> >> /* CPPI streaming packet interface */ >> @@ -2346,7 +2348,10 @@ static int am65_cpsw_nuss_init_tx_chns(struct am65_cpsw_common *common) >> tx_chn->dsize_log2 = __fls(hdesc_size_out); >> WARN_ON(hdesc_size_out != (1 << tx_chn->dsize_log2)); >> >> - tx_chn->irq = k3_udma_glue_tx_get_irq(tx_chn->tx_chn); >> + if (i == 1) >> + tx_chn->irq = -ENODEV; >> + else >> + tx_chn->irq = k3_udma_glue_tx_get_irq(tx_chn->tx_chn); > > The pair - am65_cpsw_nuss_init_tx_chns()/am65_cpsw_nuss_remove_tx_chns() > seem to be written under the assumption that failure will result in the > driver's probe failing. > > With am65_cpsw_nuss_update_tx_rx_chns(), that assumption no longer holds > true. Please consider the following sequence: > > 1. > am65_cpsw_nuss_probe() > am65_cpsw_nuss_register_ndevs() > am65_cpsw_nuss_init_tx_chns() => Succeeds > > 2. > Probe is successful > > 3. > am65_cpsw_nuss_update_tx_rx_chns() => Invoked by user > am65_cpsw_nuss_remove_tx_chns() => Succeeds > am65_cpsw_nuss_init_tx_chns() => Partially fails > devm_add_action(dev, am65_cpsw_nuss_free_tx_chns, common); > ^ DEVM Action is added, but since the driver isn't removed, > the cleanup via am65_cpsw_nuss_free_tx_chns() will not run. > > Only when the user re-invokes am65_cpsw_nuss_update_tx_rx_chns(), > the cleanup will be performed. This might have to be fixed in the > following manner: > > @@ -3416,10 +3416,17 @@ int am65_cpsw_nuss_update_tx_rx_chns(struct am65_cpsw_common *common, > common->tx_ch_num = num_tx; > common->rx_ch_num_flows = num_rx; > ret = am65_cpsw_nuss_init_tx_chns(common); > - if (ret) > + if (ret) { > + devm_remove_action(dev, am65_cpsw_nuss_free_tx_chns, common); > + am65_cpsw_nuss_free_tx_chns(common); > return ret; > + } > > ret = am65_cpsw_nuss_init_rx_chns(common); > + if (ret) { > + devm_remove_action(dev, am65_cpsw_nuss_free_rx_chns, common); > + am65_cpsw_nuss_free_rx_chns(common); > + } > > return ret; > } > > Please let me know what you think. I've already implemented a cleanup series to get rid of devm_add/remove_action, cleanup probe error path and streamline TX and RQ queue init/cleanup. I'll send out the series soon as soon as I finish some tests. -- cheers, -roger ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH net] net: ethernet: ti: am65-cpsw: fix freeing IRQ in am65_cpsw_nuss_remove_tx_chns() 2025-01-15 15:49 ` Roger Quadros @ 2025-01-15 16:38 ` Roger Quadros 2025-01-16 5:15 ` Siddharth Vadapalli 2025-01-16 5:09 ` Siddharth Vadapalli 1 sibling, 1 reply; 12+ messages in thread From: Roger Quadros @ 2025-01-15 16:38 UTC (permalink / raw) To: Siddharth Vadapalli Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, Grygorii Strashko, srk, netdev, linux-kernel Siddharth, On 15/01/2025 17:49, Roger Quadros wrote: > Hi Siddharth, > > On 15/01/2025 12:38, Siddharth Vadapalli wrote: >> On Wed, Jan 15, 2025 at 12:04:17PM +0200, Roger Quadros wrote: >>> Hi Siddharth, >>> >>> On 15/01/2025 07:18, Siddharth Vadapalli wrote: >>>> On Tue, Jan 14, 2025 at 06:44:02PM +0200, Roger Quadros wrote: >>>> >>>> Hello Roger, >>>> >>>>> When getting the IRQ we use k3_udma_glue_rx_get_irq() which returns >>>> >>>> You probably meant "k3_udma_glue_tx_get_irq()" instead? It is used to >>>> assign tx_chn->irq within am65_cpsw_nuss_init_tx_chns() as follows: >>> >>> Yes I meant tx instead of rx. >>> >>>> >>>> tx_chn->irq = k3_udma_glue_tx_get_irq(tx_chn->tx_chn); >>>> >>>> Additionally, following the above section we have: >>>> >>>> if (tx_chn->irq < 0) { >>>> dev_err(dev, "Failed to get tx dma irq %d\n", >>>> tx_chn->irq); >>>> ret = tx_chn->irq; >>>> goto err; >>>> } >>>> >>>> Could you please provide details on the code-path which will lead to a >>>> negative "tx_chn->irq" within "am65_cpsw_nuss_remove_tx_chns()"? >>>> >>>> There seem to be two callers of am65_cpsw_nuss_remove_tx_chns(), namely: >>>> 1. am65_cpsw_nuss_update_tx_rx_chns() >>>> 2. am65_cpsw_nuss_suspend() >>>> Since both of them seem to invoke am65_cpsw_nuss_remove_tx_chns() only >>>> in the case where am65_cpsw_nuss_init_tx_chns() *did not* error out, it >>>> appears to me that "tx_chn->irq" will never be negative within >>>> am65_cpsw_nuss_remove_tx_chns() >>>> >>>> Please let me know if I have overlooked something. >>> >>> The issue is with am65_cpsw_nuss_update_tx_rx_chns(). It can be called >>> repeatedly (by user changing number of TX queues) even if previous call >>> to am65_cpsw_nuss_init_tx_chns() failed. >> >> Thank you for clarifying. So the issue/bug was discovered since the >> implementation of am65_cpsw_nuss_update_tx_rx_chns(). The "Fixes" tag >> misled me. Maybe the "Fixes" tag should be updated? Though we should >> code to future-proof it as done in this patch, the "Fixes" tag pointing >> to the very first commit of the driver might not be accurate as the >> code-path associated with the bug cannot be exercised at that commit. > > Fair enough. I'll change the Fixes commit. Now that I check the code again, am65_cpsw_nuss_remove_tx_chns(), am65_cpsw_nuss_update_tx_chns() and am65_cpsw_nuss_init_tx_chns() were all introduced in the Fixes commit I stated. Could you please share why you thought it is not accurate? > >> >> Independent of the above change suggested for the "Fixes" tag, >> >> Reviewed-by: Siddharth Vadapalli <s-vadapalli@ti.com> >> >> There seems to be a different bug in am65_cpsw_nuss_update_tx_rx_chns() >> which I have described below. >> >>> >>> Please try the below patch to simulate the error condition. >>> >>> Then do the following >>> - bring down all network interfaces >>> - change num TX queues to 2. IRQ for 2nd channel fails. >>> - change num TX queues to 3. Now we try to free an invalid IRQ and we see warning. >>> >>> Also I think it is good practice to code for set value than to code >>> for existing code paths as they can change in the future. >>> >>> --test patch starts-- >>> >>> diff --git a/drivers/net/ethernet/ti/am65-cpsw-nuss.c b/drivers/net/ethernet/ti/am65-cpsw-nuss.c >>> index 36c29d3db329..c22cadaaf3d3 100644 >>> --- a/drivers/net/ethernet/ti/am65-cpsw-nuss.c >>> +++ b/drivers/net/ethernet/ti/am65-cpsw-nuss.c >>> @@ -155,7 +155,7 @@ >>> NETIF_MSG_IFUP | NETIF_MSG_PROBE | NETIF_MSG_IFDOWN | \ >>> NETIF_MSG_RX_ERR | NETIF_MSG_TX_ERR) >>> >>> -#define AM65_CPSW_DEFAULT_TX_CHNS 8 >>> +#define AM65_CPSW_DEFAULT_TX_CHNS 1 >>> #define AM65_CPSW_DEFAULT_RX_CHN_FLOWS 1 >>> >>> /* CPPI streaming packet interface */ >>> @@ -2346,7 +2348,10 @@ static int am65_cpsw_nuss_init_tx_chns(struct am65_cpsw_common *common) >>> tx_chn->dsize_log2 = __fls(hdesc_size_out); >>> WARN_ON(hdesc_size_out != (1 << tx_chn->dsize_log2)); >>> >>> - tx_chn->irq = k3_udma_glue_tx_get_irq(tx_chn->tx_chn); >>> + if (i == 1) >>> + tx_chn->irq = -ENODEV; >>> + else >>> + tx_chn->irq = k3_udma_glue_tx_get_irq(tx_chn->tx_chn); >> >> The pair - am65_cpsw_nuss_init_tx_chns()/am65_cpsw_nuss_remove_tx_chns() >> seem to be written under the assumption that failure will result in the >> driver's probe failing. >> >> With am65_cpsw_nuss_update_tx_rx_chns(), that assumption no longer holds >> true. Please consider the following sequence: >> >> 1. >> am65_cpsw_nuss_probe() >> am65_cpsw_nuss_register_ndevs() >> am65_cpsw_nuss_init_tx_chns() => Succeeds >> >> 2. >> Probe is successful >> >> 3. >> am65_cpsw_nuss_update_tx_rx_chns() => Invoked by user >> am65_cpsw_nuss_remove_tx_chns() => Succeeds >> am65_cpsw_nuss_init_tx_chns() => Partially fails >> devm_add_action(dev, am65_cpsw_nuss_free_tx_chns, common); >> ^ DEVM Action is added, but since the driver isn't removed, >> the cleanup via am65_cpsw_nuss_free_tx_chns() will not run. >> >> Only when the user re-invokes am65_cpsw_nuss_update_tx_rx_chns(), >> the cleanup will be performed. This might have to be fixed in the >> following manner: >> >> @@ -3416,10 +3416,17 @@ int am65_cpsw_nuss_update_tx_rx_chns(struct am65_cpsw_common *common, >> common->tx_ch_num = num_tx; >> common->rx_ch_num_flows = num_rx; >> ret = am65_cpsw_nuss_init_tx_chns(common); >> - if (ret) >> + if (ret) { >> + devm_remove_action(dev, am65_cpsw_nuss_free_tx_chns, common); >> + am65_cpsw_nuss_free_tx_chns(common); >> return ret; >> + } >> >> ret = am65_cpsw_nuss_init_rx_chns(common); >> + if (ret) { >> + devm_remove_action(dev, am65_cpsw_nuss_free_rx_chns, common); >> + am65_cpsw_nuss_free_rx_chns(common); >> + } >> >> return ret; >> } >> >> Please let me know what you think. > > I've already implemented a cleanup series to get rid of devm_add/remove_action, > cleanup probe error path and streamline TX and RQ queue init/cleanup. > I'll send out the series soon as soon as I finish some tests. > -- cheers, -roger ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH net] net: ethernet: ti: am65-cpsw: fix freeing IRQ in am65_cpsw_nuss_remove_tx_chns() 2025-01-15 16:38 ` Roger Quadros @ 2025-01-16 5:15 ` Siddharth Vadapalli 2025-01-16 11:47 ` Roger Quadros 0 siblings, 1 reply; 12+ messages in thread From: Siddharth Vadapalli @ 2025-01-16 5:15 UTC (permalink / raw) To: Roger Quadros Cc: Siddharth Vadapalli, Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, Grygorii Strashko, srk, netdev, linux-kernel On Wed, Jan 15, 2025 at 06:38:57PM +0200, Roger Quadros wrote: > Siddharth, > > On 15/01/2025 17:49, Roger Quadros wrote: > > Hi Siddharth, > > > > On 15/01/2025 12:38, Siddharth Vadapalli wrote: > >> On Wed, Jan 15, 2025 at 12:04:17PM +0200, Roger Quadros wrote: > >>> Hi Siddharth, > >>> > >>> On 15/01/2025 07:18, Siddharth Vadapalli wrote: > >>>> On Tue, Jan 14, 2025 at 06:44:02PM +0200, Roger Quadros wrote: > >>>> > >>>> Hello Roger, > >>>> > >>>>> When getting the IRQ we use k3_udma_glue_rx_get_irq() which returns > >>>> > >>>> You probably meant "k3_udma_glue_tx_get_irq()" instead? It is used to > >>>> assign tx_chn->irq within am65_cpsw_nuss_init_tx_chns() as follows: > >>> > >>> Yes I meant tx instead of rx. > >>> > >>>> > >>>> tx_chn->irq = k3_udma_glue_tx_get_irq(tx_chn->tx_chn); > >>>> > >>>> Additionally, following the above section we have: > >>>> > >>>> if (tx_chn->irq < 0) { > >>>> dev_err(dev, "Failed to get tx dma irq %d\n", > >>>> tx_chn->irq); > >>>> ret = tx_chn->irq; > >>>> goto err; > >>>> } > >>>> > >>>> Could you please provide details on the code-path which will lead to a > >>>> negative "tx_chn->irq" within "am65_cpsw_nuss_remove_tx_chns()"? > >>>> > >>>> There seem to be two callers of am65_cpsw_nuss_remove_tx_chns(), namely: > >>>> 1. am65_cpsw_nuss_update_tx_rx_chns() > >>>> 2. am65_cpsw_nuss_suspend() > >>>> Since both of them seem to invoke am65_cpsw_nuss_remove_tx_chns() only > >>>> in the case where am65_cpsw_nuss_init_tx_chns() *did not* error out, it > >>>> appears to me that "tx_chn->irq" will never be negative within > >>>> am65_cpsw_nuss_remove_tx_chns() > >>>> > >>>> Please let me know if I have overlooked something. > >>> > >>> The issue is with am65_cpsw_nuss_update_tx_rx_chns(). It can be called > >>> repeatedly (by user changing number of TX queues) even if previous call > >>> to am65_cpsw_nuss_init_tx_chns() failed. > >> > >> Thank you for clarifying. So the issue/bug was discovered since the > >> implementation of am65_cpsw_nuss_update_tx_rx_chns(). The "Fixes" tag > >> misled me. Maybe the "Fixes" tag should be updated? Though we should > >> code to future-proof it as done in this patch, the "Fixes" tag pointing > >> to the very first commit of the driver might not be accurate as the > >> code-path associated with the bug cannot be exercised at that commit. > > > > Fair enough. I'll change the Fixes commit. > > Now that I check the code again, am65_cpsw_nuss_remove_tx_chns(), > am65_cpsw_nuss_update_tx_chns() and am65_cpsw_nuss_init_tx_chns() > were all introduced in the Fixes commit I stated. > > Could you please share why you thought it is not accurate? Though the functions were introduced in the Fixes commit that you have mentioned in the commit message, the check for "tx_chn->irq" being strictly positive as implemented in this patch, is not required until the commit which added am65_cpsw_nuss_update_tx_rx_chns(). The reason I say so is that a negative value for "tx_chn->irq" within am65_cpsw_nuss_remove_tx_chns() requires am65_cpsw_nuss_init_tx_chns() to partially fail *followed by* invocation of am65_cpsw_nuss_remove_tx_chns(). That isn't possible in the blamed commit which introduced them, since the driver probe fails when am65_cpsw_nuss_init_tx_chns() fails. The code path: am65_cpsw_nuss_init_tx_chns() => Partially fails / Fails am65_cpsw_nuss_remove_tx_chns() => Invoked later on isn't possible in the blamed commit. Regards, Siddharth. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH net] net: ethernet: ti: am65-cpsw: fix freeing IRQ in am65_cpsw_nuss_remove_tx_chns() 2025-01-16 5:15 ` Siddharth Vadapalli @ 2025-01-16 11:47 ` Roger Quadros 2025-01-16 12:10 ` Siddharth Vadapalli 0 siblings, 1 reply; 12+ messages in thread From: Roger Quadros @ 2025-01-16 11:47 UTC (permalink / raw) To: Siddharth Vadapalli Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, Grygorii Strashko, srk, netdev, linux-kernel On 16/01/2025 07:15, Siddharth Vadapalli wrote: > On Wed, Jan 15, 2025 at 06:38:57PM +0200, Roger Quadros wrote: >> Siddharth, >> >> On 15/01/2025 17:49, Roger Quadros wrote: >>> Hi Siddharth, >>> >>> On 15/01/2025 12:38, Siddharth Vadapalli wrote: >>>> On Wed, Jan 15, 2025 at 12:04:17PM +0200, Roger Quadros wrote: >>>>> Hi Siddharth, >>>>> >>>>> On 15/01/2025 07:18, Siddharth Vadapalli wrote: >>>>>> On Tue, Jan 14, 2025 at 06:44:02PM +0200, Roger Quadros wrote: >>>>>> >>>>>> Hello Roger, >>>>>> >>>>>>> When getting the IRQ we use k3_udma_glue_rx_get_irq() which returns >>>>>> >>>>>> You probably meant "k3_udma_glue_tx_get_irq()" instead? It is used to >>>>>> assign tx_chn->irq within am65_cpsw_nuss_init_tx_chns() as follows: >>>>> >>>>> Yes I meant tx instead of rx. >>>>> >>>>>> >>>>>> tx_chn->irq = k3_udma_glue_tx_get_irq(tx_chn->tx_chn); >>>>>> >>>>>> Additionally, following the above section we have: >>>>>> >>>>>> if (tx_chn->irq < 0) { >>>>>> dev_err(dev, "Failed to get tx dma irq %d\n", >>>>>> tx_chn->irq); >>>>>> ret = tx_chn->irq; >>>>>> goto err; >>>>>> } >>>>>> >>>>>> Could you please provide details on the code-path which will lead to a >>>>>> negative "tx_chn->irq" within "am65_cpsw_nuss_remove_tx_chns()"? >>>>>> >>>>>> There seem to be two callers of am65_cpsw_nuss_remove_tx_chns(), namely: >>>>>> 1. am65_cpsw_nuss_update_tx_rx_chns() >>>>>> 2. am65_cpsw_nuss_suspend() >>>>>> Since both of them seem to invoke am65_cpsw_nuss_remove_tx_chns() only >>>>>> in the case where am65_cpsw_nuss_init_tx_chns() *did not* error out, it >>>>>> appears to me that "tx_chn->irq" will never be negative within >>>>>> am65_cpsw_nuss_remove_tx_chns() >>>>>> >>>>>> Please let me know if I have overlooked something. >>>>> >>>>> The issue is with am65_cpsw_nuss_update_tx_rx_chns(). It can be called >>>>> repeatedly (by user changing number of TX queues) even if previous call >>>>> to am65_cpsw_nuss_init_tx_chns() failed. >>>> >>>> Thank you for clarifying. So the issue/bug was discovered since the >>>> implementation of am65_cpsw_nuss_update_tx_rx_chns(). The "Fixes" tag >>>> misled me. Maybe the "Fixes" tag should be updated? Though we should >>>> code to future-proof it as done in this patch, the "Fixes" tag pointing >>>> to the very first commit of the driver might not be accurate as the >>>> code-path associated with the bug cannot be exercised at that commit. >>> >>> Fair enough. I'll change the Fixes commit. >> >> Now that I check the code again, am65_cpsw_nuss_remove_tx_chns(), >> am65_cpsw_nuss_update_tx_chns() and am65_cpsw_nuss_init_tx_chns() >> were all introduced in the Fixes commit I stated. >> >> Could you please share why you thought it is not accurate? > > Though the functions were introduced in the Fixes commit that you have > mentioned in the commit message, the check for "tx_chn->irq" being > strictly positive as implemented in this patch, is not required until > the commit which added am65_cpsw_nuss_update_tx_rx_chns(). The reason > I say so is that a negative value for "tx_chn->irq" within > am65_cpsw_nuss_remove_tx_chns() requires am65_cpsw_nuss_init_tx_chns() > to partially fail *followed by* invocation of > am65_cpsw_nuss_remove_tx_chns(). That isn't possible in the blamed > commit which introduced them, since the driver probe fails when > am65_cpsw_nuss_init_tx_chns() fails. The code path: > > am65_cpsw_nuss_init_tx_chns() => Partially fails / Fails > am65_cpsw_nuss_remove_tx_chns() => Invoked later on > > isn't possible in the blamed commit. But, am65_cpsw_nuss_update_tx_chns() and am65_cpsw_set_channels() was introduced in the blamed commit and the test case I shared to test .set_channels with different channel counts can still fail with warning if am65_cpsw_nuss_init_tx_chns() partially fails. > > Regards, > Siddharth. -- cheers, -roger ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH net] net: ethernet: ti: am65-cpsw: fix freeing IRQ in am65_cpsw_nuss_remove_tx_chns() 2025-01-16 11:47 ` Roger Quadros @ 2025-01-16 12:10 ` Siddharth Vadapalli 2025-01-16 13:00 ` Roger Quadros 0 siblings, 1 reply; 12+ messages in thread From: Siddharth Vadapalli @ 2025-01-16 12:10 UTC (permalink / raw) To: Roger Quadros Cc: Siddharth Vadapalli, Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, Grygorii Strashko, srk, netdev, linux-kernel On Thu, Jan 16, 2025 at 01:47:59PM +0200, Roger Quadros wrote: > > > On 16/01/2025 07:15, Siddharth Vadapalli wrote: > > On Wed, Jan 15, 2025 at 06:38:57PM +0200, Roger Quadros wrote: > >> Siddharth, > >> > >> On 15/01/2025 17:49, Roger Quadros wrote: > >>> Hi Siddharth, > >>> > >>> On 15/01/2025 12:38, Siddharth Vadapalli wrote: > >>>> On Wed, Jan 15, 2025 at 12:04:17PM +0200, Roger Quadros wrote: > >>>>> Hi Siddharth, > >>>>> > >>>>> On 15/01/2025 07:18, Siddharth Vadapalli wrote: > >>>>>> On Tue, Jan 14, 2025 at 06:44:02PM +0200, Roger Quadros wrote: > >>>>>> > >>>>>> Hello Roger, > >>>>>> > >>>>>>> When getting the IRQ we use k3_udma_glue_rx_get_irq() which returns > >>>>>> > >>>>>> You probably meant "k3_udma_glue_tx_get_irq()" instead? It is used to > >>>>>> assign tx_chn->irq within am65_cpsw_nuss_init_tx_chns() as follows: > >>>>> > >>>>> Yes I meant tx instead of rx. > >>>>> > >>>>>> > >>>>>> tx_chn->irq = k3_udma_glue_tx_get_irq(tx_chn->tx_chn); > >>>>>> > >>>>>> Additionally, following the above section we have: > >>>>>> > >>>>>> if (tx_chn->irq < 0) { > >>>>>> dev_err(dev, "Failed to get tx dma irq %d\n", > >>>>>> tx_chn->irq); > >>>>>> ret = tx_chn->irq; > >>>>>> goto err; > >>>>>> } > >>>>>> > >>>>>> Could you please provide details on the code-path which will lead to a > >>>>>> negative "tx_chn->irq" within "am65_cpsw_nuss_remove_tx_chns()"? > >>>>>> > >>>>>> There seem to be two callers of am65_cpsw_nuss_remove_tx_chns(), namely: > >>>>>> 1. am65_cpsw_nuss_update_tx_rx_chns() > >>>>>> 2. am65_cpsw_nuss_suspend() > >>>>>> Since both of them seem to invoke am65_cpsw_nuss_remove_tx_chns() only > >>>>>> in the case where am65_cpsw_nuss_init_tx_chns() *did not* error out, it > >>>>>> appears to me that "tx_chn->irq" will never be negative within > >>>>>> am65_cpsw_nuss_remove_tx_chns() > >>>>>> > >>>>>> Please let me know if I have overlooked something. > >>>>> > >>>>> The issue is with am65_cpsw_nuss_update_tx_rx_chns(). It can be called > >>>>> repeatedly (by user changing number of TX queues) even if previous call > >>>>> to am65_cpsw_nuss_init_tx_chns() failed. > >>>> > >>>> Thank you for clarifying. So the issue/bug was discovered since the > >>>> implementation of am65_cpsw_nuss_update_tx_rx_chns(). The "Fixes" tag > >>>> misled me. Maybe the "Fixes" tag should be updated? Though we should > >>>> code to future-proof it as done in this patch, the "Fixes" tag pointing > >>>> to the very first commit of the driver might not be accurate as the > >>>> code-path associated with the bug cannot be exercised at that commit. > >>> > >>> Fair enough. I'll change the Fixes commit. > >> > >> Now that I check the code again, am65_cpsw_nuss_remove_tx_chns(), > >> am65_cpsw_nuss_update_tx_chns() and am65_cpsw_nuss_init_tx_chns() > >> were all introduced in the Fixes commit I stated. > >> > >> Could you please share why you thought it is not accurate? > > > > Though the functions were introduced in the Fixes commit that you have > > mentioned in the commit message, the check for "tx_chn->irq" being > > strictly positive as implemented in this patch, is not required until > > the commit which added am65_cpsw_nuss_update_tx_rx_chns(). The reason > > I say so is that a negative value for "tx_chn->irq" within > > am65_cpsw_nuss_remove_tx_chns() requires am65_cpsw_nuss_init_tx_chns() > > to partially fail *followed by* invocation of > > am65_cpsw_nuss_remove_tx_chns(). That isn't possible in the blamed > > commit which introduced them, since the driver probe fails when > > am65_cpsw_nuss_init_tx_chns() fails. The code path: > > > > am65_cpsw_nuss_init_tx_chns() => Partially fails / Fails > > am65_cpsw_nuss_remove_tx_chns() => Invoked later on > > > > isn't possible in the blamed commit. > > But, am65_cpsw_nuss_update_tx_chns() and am65_cpsw_set_channels() was > introduced in the blamed commit and the test case I shared to > test .set_channels with different channel counts can still > fail with warning if am65_cpsw_nuss_init_tx_chns() partially fails. I was looking for "am65_cpsw_nuss_update_tx_rx_chns()" in the blamed commit. I realize now that it was renamed from am65_cpsw_nuss_update_tx_chns() which indeed is present in the blamed commit. I apologize for the confusion caused. Regards, Siddharth. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH net] net: ethernet: ti: am65-cpsw: fix freeing IRQ in am65_cpsw_nuss_remove_tx_chns() 2025-01-16 12:10 ` Siddharth Vadapalli @ 2025-01-16 13:00 ` Roger Quadros 0 siblings, 0 replies; 12+ messages in thread From: Roger Quadros @ 2025-01-16 13:00 UTC (permalink / raw) To: Siddharth Vadapalli Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, Grygorii Strashko, srk, netdev, linux-kernel On 16/01/2025 14:10, Siddharth Vadapalli wrote: > On Thu, Jan 16, 2025 at 01:47:59PM +0200, Roger Quadros wrote: >> >> >> On 16/01/2025 07:15, Siddharth Vadapalli wrote: >>> On Wed, Jan 15, 2025 at 06:38:57PM +0200, Roger Quadros wrote: >>>> Siddharth, >>>> >>>> On 15/01/2025 17:49, Roger Quadros wrote: >>>>> Hi Siddharth, >>>>> >>>>> On 15/01/2025 12:38, Siddharth Vadapalli wrote: >>>>>> On Wed, Jan 15, 2025 at 12:04:17PM +0200, Roger Quadros wrote: >>>>>>> Hi Siddharth, >>>>>>> >>>>>>> On 15/01/2025 07:18, Siddharth Vadapalli wrote: >>>>>>>> On Tue, Jan 14, 2025 at 06:44:02PM +0200, Roger Quadros wrote: >>>>>>>> >>>>>>>> Hello Roger, >>>>>>>> >>>>>>>>> When getting the IRQ we use k3_udma_glue_rx_get_irq() which returns >>>>>>>> >>>>>>>> You probably meant "k3_udma_glue_tx_get_irq()" instead? It is used to >>>>>>>> assign tx_chn->irq within am65_cpsw_nuss_init_tx_chns() as follows: >>>>>>> >>>>>>> Yes I meant tx instead of rx. >>>>>>> >>>>>>>> >>>>>>>> tx_chn->irq = k3_udma_glue_tx_get_irq(tx_chn->tx_chn); >>>>>>>> >>>>>>>> Additionally, following the above section we have: >>>>>>>> >>>>>>>> if (tx_chn->irq < 0) { >>>>>>>> dev_err(dev, "Failed to get tx dma irq %d\n", >>>>>>>> tx_chn->irq); >>>>>>>> ret = tx_chn->irq; >>>>>>>> goto err; >>>>>>>> } >>>>>>>> >>>>>>>> Could you please provide details on the code-path which will lead to a >>>>>>>> negative "tx_chn->irq" within "am65_cpsw_nuss_remove_tx_chns()"? >>>>>>>> >>>>>>>> There seem to be two callers of am65_cpsw_nuss_remove_tx_chns(), namely: >>>>>>>> 1. am65_cpsw_nuss_update_tx_rx_chns() >>>>>>>> 2. am65_cpsw_nuss_suspend() >>>>>>>> Since both of them seem to invoke am65_cpsw_nuss_remove_tx_chns() only >>>>>>>> in the case where am65_cpsw_nuss_init_tx_chns() *did not* error out, it >>>>>>>> appears to me that "tx_chn->irq" will never be negative within >>>>>>>> am65_cpsw_nuss_remove_tx_chns() >>>>>>>> >>>>>>>> Please let me know if I have overlooked something. >>>>>>> >>>>>>> The issue is with am65_cpsw_nuss_update_tx_rx_chns(). It can be called >>>>>>> repeatedly (by user changing number of TX queues) even if previous call >>>>>>> to am65_cpsw_nuss_init_tx_chns() failed. >>>>>> >>>>>> Thank you for clarifying. So the issue/bug was discovered since the >>>>>> implementation of am65_cpsw_nuss_update_tx_rx_chns(). The "Fixes" tag >>>>>> misled me. Maybe the "Fixes" tag should be updated? Though we should >>>>>> code to future-proof it as done in this patch, the "Fixes" tag pointing >>>>>> to the very first commit of the driver might not be accurate as the >>>>>> code-path associated with the bug cannot be exercised at that commit. >>>>> >>>>> Fair enough. I'll change the Fixes commit. >>>> >>>> Now that I check the code again, am65_cpsw_nuss_remove_tx_chns(), >>>> am65_cpsw_nuss_update_tx_chns() and am65_cpsw_nuss_init_tx_chns() >>>> were all introduced in the Fixes commit I stated. >>>> >>>> Could you please share why you thought it is not accurate? >>> >>> Though the functions were introduced in the Fixes commit that you have >>> mentioned in the commit message, the check for "tx_chn->irq" being >>> strictly positive as implemented in this patch, is not required until >>> the commit which added am65_cpsw_nuss_update_tx_rx_chns(). The reason >>> I say so is that a negative value for "tx_chn->irq" within >>> am65_cpsw_nuss_remove_tx_chns() requires am65_cpsw_nuss_init_tx_chns() >>> to partially fail *followed by* invocation of >>> am65_cpsw_nuss_remove_tx_chns(). That isn't possible in the blamed >>> commit which introduced them, since the driver probe fails when >>> am65_cpsw_nuss_init_tx_chns() fails. The code path: >>> >>> am65_cpsw_nuss_init_tx_chns() => Partially fails / Fails >>> am65_cpsw_nuss_remove_tx_chns() => Invoked later on >>> >>> isn't possible in the blamed commit. >> >> But, am65_cpsw_nuss_update_tx_chns() and am65_cpsw_set_channels() was >> introduced in the blamed commit and the test case I shared to >> test .set_channels with different channel counts can still >> fail with warning if am65_cpsw_nuss_init_tx_chns() partially fails. > > I was looking for "am65_cpsw_nuss_update_tx_rx_chns()" in the blamed > commit. I realize now that it was renamed from > am65_cpsw_nuss_update_tx_chns() which indeed is present in the blamed > commit. I apologize for the confusion caused. > No worries at all. I'll respin the patch with the typo fix and add more description in commit log to clarify this. -- cheers, -roger ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH net] net: ethernet: ti: am65-cpsw: fix freeing IRQ in am65_cpsw_nuss_remove_tx_chns() 2025-01-15 15:49 ` Roger Quadros 2025-01-15 16:38 ` Roger Quadros @ 2025-01-16 5:09 ` Siddharth Vadapalli 1 sibling, 0 replies; 12+ messages in thread From: Siddharth Vadapalli @ 2025-01-16 5:09 UTC (permalink / raw) To: Roger Quadros Cc: Siddharth Vadapalli, Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, Grygorii Strashko, srk, netdev, linux-kernel On Wed, Jan 15, 2025 at 05:49:44PM +0200, Roger Quadros wrote: > Hi Siddharth, > > On 15/01/2025 12:38, Siddharth Vadapalli wrote: [...] > > Only when the user re-invokes am65_cpsw_nuss_update_tx_rx_chns(), > > the cleanup will be performed. This might have to be fixed in the > > following manner: > > > > @@ -3416,10 +3416,17 @@ int am65_cpsw_nuss_update_tx_rx_chns(struct am65_cpsw_common *common, > > common->tx_ch_num = num_tx; > > common->rx_ch_num_flows = num_rx; > > ret = am65_cpsw_nuss_init_tx_chns(common); > > - if (ret) > > + if (ret) { > > + devm_remove_action(dev, am65_cpsw_nuss_free_tx_chns, common); > > + am65_cpsw_nuss_free_tx_chns(common); > > return ret; > > + } > > > > ret = am65_cpsw_nuss_init_rx_chns(common); > > + if (ret) { > > + devm_remove_action(dev, am65_cpsw_nuss_free_rx_chns, common); > > + am65_cpsw_nuss_free_rx_chns(common); > > + } > > > > return ret; > > } > > > > Please let me know what you think. > > I've already implemented a cleanup series to get rid of devm_add/remove_action, > cleanup probe error path and streamline TX and RQ queue init/cleanup. > I'll send out the series soon as soon as I finish some tests. Sure, thank you. Regards, Siddharth. ^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2025-01-16 13:00 UTC | newest] Thread overview: 12+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2025-01-14 16:44 [PATCH net] net: ethernet: ti: am65-cpsw: fix freeing IRQ in am65_cpsw_nuss_remove_tx_chns() Roger Quadros 2025-01-14 17:12 ` Simon Horman 2025-01-15 5:18 ` Siddharth Vadapalli 2025-01-15 10:04 ` Roger Quadros 2025-01-15 10:38 ` Siddharth Vadapalli 2025-01-15 15:49 ` Roger Quadros 2025-01-15 16:38 ` Roger Quadros 2025-01-16 5:15 ` Siddharth Vadapalli 2025-01-16 11:47 ` Roger Quadros 2025-01-16 12:10 ` Siddharth Vadapalli 2025-01-16 13:00 ` Roger Quadros 2025-01-16 5:09 ` Siddharth Vadapalli
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox