* [PATCH net-next] net: xilinx: axienet: Configure and report coalesce parameters in DMAengine flow
From: Suraj Gupta @ 2025-05-25 10:22 UTC
To: andrew+netdev, davem, edumazet, kuba, pabeni, vkoul, michal.simek, sean.anderson, radhey.shyam.pandey, horms
Cc: netdev, linux-arm-kernel, linux-kernel, git, harini.katakam

Add support to configure / report interrupt coalesce count and delay via
ethtool in DMAEngine flow.
Netperf numbers are not good when using non-dmaengine default values,
so tuned coalesce count and delay and defined separate default
values in dmaengine flow.

Netperf numbers and CPU utilisation change in DMAengine flow after
introducing coalescing with default parameters:
coalesce parameters:
Transfer type             Before(w/o coalescing)  After(with coalescing)
TCP Tx, CPU utilisation%  925, 27                 941, 22
TCP Rx, CPU utilisation%  607, 32                 741, 36
UDP Tx, CPU utilisation%  857, 31                 960, 28
UDP Rx, CPU utilisation%  762, 26                 783, 18

Above numbers are observed with 4x Cortex-a53.

Signed-off-by: Suraj Gupta <suraj.gupta2@amd.com>
---
This patch depend on following AXI DMA dmengine driver changes sent to
dmaengine mailing list as pre-requisit series:
https://lore.kernel.org/all/20250525101617.1168991-1-suraj.gupta2@amd.com/
---
 drivers/net/ethernet/xilinx/xilinx_axienet.h  |  6 +++
 .../net/ethernet/xilinx/xilinx_axienet_main.c | 53 +++++++++++++++++++
 2 files changed, 59 insertions(+)

diff --git a/drivers/net/ethernet/xilinx/xilinx_axienet.h b/drivers/net/ethernet/xilinx/xilinx_axienet.h
index 5ff742103beb..cdf6cbb6f2fd 100644
--- a/drivers/net/ethernet/xilinx/xilinx_axienet.h
+++ b/drivers/net/ethernet/xilinx/xilinx_axienet.h
@@ -126,6 +126,12 @@
 #define XAXIDMA_DFT_TX_USEC		50
 #define XAXIDMA_DFT_RX_USEC		16
 
+/* Default TX/RX Threshold and delay timer values for SGDMA mode with DMAEngine */
+#define XAXIDMAENGINE_DFT_TX_THRESHOLD	16
+#define XAXIDMAENGINE_DFT_TX_USEC	5
+#define XAXIDMAENGINE_DFT_RX_THRESHOLD	24
+#define XAXIDMAENGINE_DFT_RX_USEC	16
+
 #define XAXIDMA_BD_CTRL_TXSOF_MASK	0x08000000 /* First tx packet */
 #define XAXIDMA_BD_CTRL_TXEOF_MASK	0x04000000 /* Last tx packet */
 #define XAXIDMA_BD_CTRL_ALL_MASK	0x0C000000 /* All control bits */
diff --git a/drivers/net/ethernet/xilinx/xilinx_axienet_main.c b/drivers/net/ethernet/xilinx/xilinx_axienet_main.c
index 1b7a653c1f4e..f9c7d90d4ecb 100644
--- a/drivers/net/ethernet/xilinx/xilinx_axienet_main.c
+++ b/drivers/net/ethernet/xilinx/xilinx_axienet_main.c
@@ -1505,6 +1505,7 @@ static int axienet_init_dmaengine(struct net_device *ndev)
 {
 	struct axienet_local *lp = netdev_priv(ndev);
 	struct skbuf_dma_descriptor *skbuf_dma;
+	struct dma_slave_config tx_config, rx_config;
 	int i, ret;
 
 	lp->tx_chan = dma_request_chan(lp->dev, "tx_chan0");
@@ -1520,6 +1521,22 @@ static int axienet_init_dmaengine(struct net_device *ndev)
 		goto err_dma_release_tx;
 	}
 
+	tx_config.coalesce_cnt = XAXIDMAENGINE_DFT_TX_THRESHOLD;
+	tx_config.coalesce_usecs = XAXIDMAENGINE_DFT_TX_USEC;
+	rx_config.coalesce_cnt = XAXIDMAENGINE_DFT_RX_THRESHOLD;
+	rx_config.coalesce_usecs = XAXIDMAENGINE_DFT_RX_USEC;
+
+	ret = dmaengine_slave_config(lp->tx_chan, &tx_config);
+	if (ret) {
+		dev_err(lp->dev, "Failed to configure Tx coalesce parameters\n");
+		goto err_dma_release_tx;
+	}
+	ret = dmaengine_slave_config(lp->rx_chan, &rx_config);
+	if (ret) {
+		dev_err(lp->dev, "Failed to configure Rx coalesce parameters\n");
+		goto err_dma_release_tx;
+	}
+
 	lp->tx_ring_tail = 0;
 	lp->tx_ring_head = 0;
 	lp->rx_ring_tail = 0;
@@ -2170,6 +2187,19 @@ axienet_ethtools_get_coalesce(struct net_device *ndev,
 	struct axienet_local *lp = netdev_priv(ndev);
 	u32 cr;
 
+	if (lp->use_dmaengine) {
+		struct dma_slave_caps tx_caps, rx_caps;
+
+		dma_get_slave_caps(lp->tx_chan, &tx_caps);
+		dma_get_slave_caps(lp->rx_chan, &rx_caps);
+
+		ecoalesce->tx_max_coalesced_frames = tx_caps.coalesce_cnt;
+		ecoalesce->tx_coalesce_usecs = tx_caps.coalesce_usecs;
+		ecoalesce->rx_max_coalesced_frames = rx_caps.coalesce_cnt;
+		ecoalesce->rx_coalesce_usecs = rx_caps.coalesce_usecs;
+		return 0;
+	}
+
 	ecoalesce->use_adaptive_rx_coalesce = lp->rx_dim_enabled;
 
 	spin_lock_irq(&lp->rx_cr_lock);
@@ -2233,6 +2263,29 @@ axienet_ethtools_set_coalesce(struct net_device *ndev,
 		return -EINVAL;
 	}
 
+	if (lp->use_dmaengine) {
+		struct dma_slave_config tx_cfg, rx_cfg;
+		int ret;
+
+		tx_cfg.coalesce_cnt = ecoalesce->tx_max_coalesced_frames;
+		tx_cfg.coalesce_usecs = ecoalesce->tx_coalesce_usecs;
+		rx_cfg.coalesce_cnt = ecoalesce->rx_max_coalesced_frames;
+		rx_cfg.coalesce_usecs = ecoalesce->rx_coalesce_usecs;
+
+		ret = dmaengine_slave_config(lp->tx_chan, &tx_cfg);
+		if (ret) {
+			NL_SET_ERR_MSG(extack, "failed to set tx coalesce parameters");
+			return ret;
+		}
+
+		ret = dmaengine_slave_config(lp->rx_chan, &rx_cfg);
+		if (ret) {
+			NL_SET_ERR_MSG(extack, "failed to set rx coalesce parameters");
+			return ret;
+		}
+		return 0;
+	}
+
 	if (new_dim && !old_dim) {
 		cr = axienet_calc_cr(lp, axienet_dim_coalesce_count_rx(lp),
 				     ecoalesce->rx_coalesce_usecs);
-- 
2.25.1

* Re: [PATCH net-next] net: xilinx: axienet: Configure and report coalesce parameters in DMAengine flow
From: kernel test robot @ 2025-05-26 0:36 UTC
To: Suraj Gupta, andrew+netdev, davem, edumazet, kuba, pabeni, vkoul, michal.simek, sean.anderson, radhey.shyam.pandey, horms
Cc: oe-kbuild-all, netdev, linux-arm-kernel, linux-kernel, git, harini.katakam

Hi Suraj,

kernel test robot noticed the following build errors:

[auto build test ERROR on net-next/main]

url:    https://github.com/intel-lab-lkp/linux/commits/Suraj-Gupta/net-xilinx-axienet-Configure-and-report-coalesce-parameters-in-DMAengine-flow/20250525-182400
base:   net-next/main
patch link:    https://lore.kernel.org/r/20250525102217.1181104-1-suraj.gupta2%40amd.com
patch subject: [PATCH net-next] net: xilinx: axienet: Configure and report coalesce parameters in DMAengine flow
config: alpha-allyesconfig (https://download.01.org/0day-ci/archive/20250526/202505260804.Mhztve8t-lkp@intel.com/config)
compiler: alpha-linux-gcc (GCC) 14.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250526/202505260804.Mhztve8t-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202505260804.Mhztve8t-lkp@intel.com/

All errors (new ones prefixed by >>):

   drivers/net/ethernet/xilinx/xilinx_axienet_main.c: In function 'axienet_init_dmaengine':
>> drivers/net/ethernet/xilinx/xilinx_axienet_main.c:1524:18: error: 'struct dma_slave_config' has no member named 'coalesce_cnt'
    1524 |         tx_config.coalesce_cnt = XAXIDMAENGINE_DFT_TX_THRESHOLD;
         |                  ^
>> drivers/net/ethernet/xilinx/xilinx_axienet_main.c:1525:18: error: 'struct dma_slave_config' has no member named 'coalesce_usecs'
    1525 |         tx_config.coalesce_usecs = XAXIDMAENGINE_DFT_TX_USEC;
         |                  ^
   drivers/net/ethernet/xilinx/xilinx_axienet_main.c:1526:18: error: 'struct dma_slave_config' has no member named 'coalesce_cnt'
    1526 |         rx_config.coalesce_cnt = XAXIDMAENGINE_DFT_RX_THRESHOLD;
         |                  ^
   drivers/net/ethernet/xilinx/xilinx_axienet_main.c:1527:18: error: 'struct dma_slave_config' has no member named 'coalesce_usecs'
    1527 |         rx_config.coalesce_usecs = XAXIDMAENGINE_DFT_RX_USEC;
         |                  ^
   drivers/net/ethernet/xilinx/xilinx_axienet_main.c: In function 'axienet_ethtools_get_coalesce':
>> drivers/net/ethernet/xilinx/xilinx_axienet_main.c:2196:61: error: 'struct dma_slave_caps' has no member named 'coalesce_cnt'
    2196 |                 ecoalesce->tx_max_coalesced_frames = tx_caps.coalesce_cnt;
         |                                                             ^
>> drivers/net/ethernet/xilinx/xilinx_axienet_main.c:2197:55: error: 'struct dma_slave_caps' has no member named 'coalesce_usecs'
    2197 |                 ecoalesce->tx_coalesce_usecs = tx_caps.coalesce_usecs;
         |                                                       ^
   drivers/net/ethernet/xilinx/xilinx_axienet_main.c:2198:61: error: 'struct dma_slave_caps' has no member named 'coalesce_cnt'
    2198 |                 ecoalesce->rx_max_coalesced_frames = rx_caps.coalesce_cnt;
         |                                                             ^
   drivers/net/ethernet/xilinx/xilinx_axienet_main.c:2199:55: error: 'struct dma_slave_caps' has no member named 'coalesce_usecs'
    2199 |                 ecoalesce->rx_coalesce_usecs = rx_caps.coalesce_usecs;
         |                                                       ^
   drivers/net/ethernet/xilinx/xilinx_axienet_main.c: In function 'axienet_ethtools_set_coalesce':
   drivers/net/ethernet/xilinx/xilinx_axienet_main.c:2270:23: error: 'struct dma_slave_config' has no member named 'coalesce_cnt'
    2270 |                 tx_cfg.coalesce_cnt = ecoalesce->tx_max_coalesced_frames;
         |                       ^
   drivers/net/ethernet/xilinx/xilinx_axienet_main.c:2271:23: error: 'struct dma_slave_config' has no member named 'coalesce_usecs'
    2271 |                 tx_cfg.coalesce_usecs = ecoalesce->tx_coalesce_usecs;
         |                       ^
   drivers/net/ethernet/xilinx/xilinx_axienet_main.c:2272:23: error: 'struct dma_slave_config' has no member named 'coalesce_cnt'
    2272 |                 rx_cfg.coalesce_cnt = ecoalesce->rx_max_coalesced_frames;
         |                       ^
   drivers/net/ethernet/xilinx/xilinx_axienet_main.c:2273:23: error: 'struct dma_slave_config' has no member named 'coalesce_usecs'
    2273 |                 rx_cfg.coalesce_usecs = ecoalesce->rx_coalesce_usecs;
         |                       ^


vim +1524 drivers/net/ethernet/xilinx/xilinx_axienet_main.c

  1494	
  1495	/**
  1496	 * axienet_init_dmaengine - init the dmaengine code.
  1497	 * @ndev:       Pointer to net_device structure
  1498	 *
  1499	 * Return: 0, on success.
  1500	 *          non-zero error value on failure
  1501	 *
  1502	 * This is the dmaengine initialization code.
  1503	 */
  1504	static int axienet_init_dmaengine(struct net_device *ndev)
  1505	{
  1506		struct axienet_local *lp = netdev_priv(ndev);
  1507		struct skbuf_dma_descriptor *skbuf_dma;
  1508		struct dma_slave_config tx_config, rx_config;
  1509		int i, ret;
  1510	
  1511		lp->tx_chan = dma_request_chan(lp->dev, "tx_chan0");
  1512		if (IS_ERR(lp->tx_chan)) {
  1513			dev_err(lp->dev, "No Ethernet DMA (TX) channel found\n");
  1514			return PTR_ERR(lp->tx_chan);
  1515		}
  1516	
  1517		lp->rx_chan = dma_request_chan(lp->dev, "rx_chan0");
  1518		if (IS_ERR(lp->rx_chan)) {
  1519			ret = PTR_ERR(lp->rx_chan);
  1520			dev_err(lp->dev, "No Ethernet DMA (RX) channel found\n");
  1521			goto err_dma_release_tx;
  1522		}
  1523	
> 1524		tx_config.coalesce_cnt = XAXIDMAENGINE_DFT_TX_THRESHOLD;
> 1525		tx_config.coalesce_usecs = XAXIDMAENGINE_DFT_TX_USEC;
> 1526		rx_config.coalesce_cnt = XAXIDMAENGINE_DFT_RX_THRESHOLD;
> 1527		rx_config.coalesce_usecs = XAXIDMAENGINE_DFT_RX_USEC;
  1528	
  1529		ret = dmaengine_slave_config(lp->tx_chan, &tx_config);
  1530		if (ret) {
  1531			dev_err(lp->dev, "Failed to configure Tx coalesce parameters\n");
  1532			goto err_dma_release_tx;
  1533		}
  1534		ret = dmaengine_slave_config(lp->rx_chan, &rx_config);
  1535		if (ret) {
  1536			dev_err(lp->dev, "Failed to configure Rx coalesce parameters\n");
  1537			goto err_dma_release_tx;
  1538		}
  1539	
  1540		lp->tx_ring_tail = 0;
  1541		lp->tx_ring_head = 0;
  1542		lp->rx_ring_tail = 0;
  1543		lp->rx_ring_head = 0;
  1544		lp->tx_skb_ring = kcalloc(TX_BD_NUM_MAX, sizeof(*lp->tx_skb_ring),
  1545					  GFP_KERNEL);
  1546		if (!lp->tx_skb_ring) {
  1547			ret = -ENOMEM;
  1548			goto err_dma_release_rx;
  1549		}
  1550		for (i = 0; i < TX_BD_NUM_MAX; i++) {
  1551			skbuf_dma = kzalloc(sizeof(*skbuf_dma), GFP_KERNEL);
  1552			if (!skbuf_dma) {
  1553				ret = -ENOMEM;
  1554				goto err_free_tx_skb_ring;
  1555			}
  1556			lp->tx_skb_ring[i] = skbuf_dma;
  1557		}
  1558	
  1559		lp->rx_skb_ring = kcalloc(RX_BUF_NUM_DEFAULT, sizeof(*lp->rx_skb_ring),
  1560					  GFP_KERNEL);
  1561		if (!lp->rx_skb_ring) {
  1562			ret = -ENOMEM;
  1563			goto err_free_tx_skb_ring;
  1564		}
  1565		for (i = 0; i < RX_BUF_NUM_DEFAULT; i++) {
  1566			skbuf_dma = kzalloc(sizeof(*skbuf_dma), GFP_KERNEL);
  1567			if (!skbuf_dma) {
  1568				ret = -ENOMEM;
  1569				goto err_free_rx_skb_ring;
  1570			}
  1571			lp->rx_skb_ring[i] = skbuf_dma;
  1572		}
  1573		/* TODO: Instead of BD_NUM_DEFAULT use runtime support */
  1574		for (i = 0; i < RX_BUF_NUM_DEFAULT; i++)
  1575			axienet_rx_submit_desc(ndev);
  1576		dma_async_issue_pending(lp->rx_chan);
  1577	
  1578		return 0;
  1579	
  1580	err_free_rx_skb_ring:
  1581		for (i = 0; i < RX_BUF_NUM_DEFAULT; i++)
  1582			kfree(lp->rx_skb_ring[i]);
  1583		kfree(lp->rx_skb_ring);
  1584	err_free_tx_skb_ring:
  1585		for (i = 0; i < TX_BD_NUM_MAX; i++)
  1586			kfree(lp->tx_skb_ring[i]);
  1587		kfree(lp->tx_skb_ring);
  1588	err_dma_release_rx:
  1589		dma_release_channel(lp->rx_chan);
  1590	err_dma_release_tx:
  1591		dma_release_channel(lp->tx_chan);
  1592		return ret;
  1593	}
  1594	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

* Re: [PATCH net-next] net: xilinx: axienet: Configure and report coalesce parameters in DMAengine flow
From: Sean Anderson @ 2025-05-27 16:16 UTC
To: Suraj Gupta, andrew+netdev, davem, edumazet, kuba, pabeni, vkoul, michal.simek, radhey.shyam.pandey, horms
Cc: netdev, linux-arm-kernel, linux-kernel, git, harini.katakam

On 5/25/25 06:22, Suraj Gupta wrote:
> Add support to configure / report interrupt coalesce count and delay via
> ethtool in DMAEngine flow.
> Netperf numbers are not good when using non-dmaengine default values,
> so tuned coalesce count and delay and defined separate default
> values in dmaengine flow.
>
> Netperf numbers and CPU utilisation change in DMAengine flow after
> introducing coalescing with default parameters:
> coalesce parameters:
> Transfer type             Before(w/o coalescing)  After(with coalescing)
> TCP Tx, CPU utilisation%  925, 27                 941, 22
> TCP Rx, CPU utilisation%  607, 32                 741, 36
> UDP Tx, CPU utilisation%  857, 31                 960, 28
> UDP Rx, CPU utilisation%  762, 26                 783, 18
>
> Above numbers are observed with 4x Cortex-a53.

How does this affect latency? I would expect these RX settings to
increase latency around 5-10x. I only use these settings with DIM since
it will disable coalescing during periods of light load for better
latency.

(of course the way to fix this in general is RSS or some other method
involving multiple queues).

> Signed-off-by: Suraj Gupta <suraj.gupta2@amd.com>
> ---
> This patch depend on following AXI DMA dmengine driver changes sent to
> dmaengine mailing list as pre-requisit series:
> https://lore.kernel.org/all/20250525101617.1168991-1-suraj.gupta2@amd.com/
> ---
>  drivers/net/ethernet/xilinx/xilinx_axienet.h  |  6 +++
>  .../net/ethernet/xilinx/xilinx_axienet_main.c | 53 +++++++++++++++++++
>  2 files changed, 59 insertions(+)
>
> diff --git a/drivers/net/ethernet/xilinx/xilinx_axienet.h b/drivers/net/ethernet/xilinx/xilinx_axienet.h
> index 5ff742103beb..cdf6cbb6f2fd 100644
> --- a/drivers/net/ethernet/xilinx/xilinx_axienet.h
> +++ b/drivers/net/ethernet/xilinx/xilinx_axienet.h
> @@ -126,6 +126,12 @@
>  #define XAXIDMA_DFT_TX_USEC		50
>  #define XAXIDMA_DFT_RX_USEC		16
>
> +/* Default TX/RX Threshold and delay timer values for SGDMA mode with DMAEngine */
> +#define XAXIDMAENGINE_DFT_TX_THRESHOLD	16
> +#define XAXIDMAENGINE_DFT_TX_USEC	5
> +#define XAXIDMAENGINE_DFT_RX_THRESHOLD	24
> +#define XAXIDMAENGINE_DFT_RX_USEC	16
> +
>  #define XAXIDMA_BD_CTRL_TXSOF_MASK	0x08000000 /* First tx packet */
>  #define XAXIDMA_BD_CTRL_TXEOF_MASK	0x04000000 /* Last tx packet */
>  #define XAXIDMA_BD_CTRL_ALL_MASK	0x0C000000 /* All control bits */
> diff --git a/drivers/net/ethernet/xilinx/xilinx_axienet_main.c b/drivers/net/ethernet/xilinx/xilinx_axienet_main.c
> index 1b7a653c1f4e..f9c7d90d4ecb 100644
> --- a/drivers/net/ethernet/xilinx/xilinx_axienet_main.c
> +++ b/drivers/net/ethernet/xilinx/xilinx_axienet_main.c
> @@ -1505,6 +1505,7 @@ static int axienet_init_dmaengine(struct net_device *ndev)
>  {
>  	struct axienet_local *lp = netdev_priv(ndev);
>  	struct skbuf_dma_descriptor *skbuf_dma;
> +	struct dma_slave_config tx_config, rx_config;
>  	int i, ret;
>
>  	lp->tx_chan = dma_request_chan(lp->dev, "tx_chan0");
> @@ -1520,6 +1521,22 @@ static int axienet_init_dmaengine(struct net_device *ndev)
>  		goto err_dma_release_tx;
>  	}
>
> +	tx_config.coalesce_cnt = XAXIDMAENGINE_DFT_TX_THRESHOLD;
> +	tx_config.coalesce_usecs = XAXIDMAENGINE_DFT_TX_USEC;
> +	rx_config.coalesce_cnt = XAXIDMAENGINE_DFT_RX_THRESHOLD;
> +	rx_config.coalesce_usecs = XAXIDMAENGINE_DFT_RX_USEC;

I think it would be clearer to just do something like

	struct dma_slave_config tx_config = {
		.coalesce_cnt = 16,
		.coalesce_usecs = 5,
	};

since these are only used once. And this ensures that you initialize the
whole struct.

But what tree are you using? I don't see these members on net-next or
dmaengine.

> +	ret = dmaengine_slave_config(lp->tx_chan, &tx_config);
> +	if (ret) {
> +		dev_err(lp->dev, "Failed to configure Tx coalesce parameters\n");
> +		goto err_dma_release_tx;
> +	}
> +	ret = dmaengine_slave_config(lp->rx_chan, &rx_config);
> +	if (ret) {
> +		dev_err(lp->dev, "Failed to configure Rx coalesce parameters\n");
> +		goto err_dma_release_tx;
> +	}
> +
>  	lp->tx_ring_tail = 0;
>  	lp->tx_ring_head = 0;
>  	lp->rx_ring_tail = 0;
> @@ -2170,6 +2187,19 @@ axienet_ethtools_get_coalesce(struct net_device *ndev,
>  	struct axienet_local *lp = netdev_priv(ndev);
>  	u32 cr;
>
> +	if (lp->use_dmaengine) {
> +		struct dma_slave_caps tx_caps, rx_caps;
> +
> +		dma_get_slave_caps(lp->tx_chan, &tx_caps);
> +		dma_get_slave_caps(lp->rx_chan, &rx_caps);
> +
> +		ecoalesce->tx_max_coalesced_frames = tx_caps.coalesce_cnt;
> +		ecoalesce->tx_coalesce_usecs = tx_caps.coalesce_usecs;
> +		ecoalesce->rx_max_coalesced_frames = rx_caps.coalesce_cnt;
> +		ecoalesce->rx_coalesce_usecs = rx_caps.coalesce_usecs;
> +		return 0;
> +	}
> +
>  	ecoalesce->use_adaptive_rx_coalesce = lp->rx_dim_enabled;
>
>  	spin_lock_irq(&lp->rx_cr_lock);
> @@ -2233,6 +2263,29 @@ axienet_ethtools_set_coalesce(struct net_device *ndev,
>  		return -EINVAL;
>  	}
>
> +	if (lp->use_dmaengine) {
> +		struct dma_slave_config tx_cfg, rx_cfg;
> +		int ret;
> +
> +		tx_cfg.coalesce_cnt = ecoalesce->tx_max_coalesced_frames;
> +		tx_cfg.coalesce_usecs = ecoalesce->tx_coalesce_usecs;
> +		rx_cfg.coalesce_cnt = ecoalesce->rx_max_coalesced_frames;
> +		rx_cfg.coalesce_usecs = ecoalesce->rx_coalesce_usecs;
> +
> +		ret = dmaengine_slave_config(lp->tx_chan, &tx_cfg);
> +		if (ret) {
> +			NL_SET_ERR_MSG(extack, "failed to set tx coalesce parameters");
> +			return ret;
> +		}
> +
> +		ret = dmaengine_slave_config(lp->rx_chan, &rx_cfg);
> +		if (ret) {
> +			NL_SET_ERR_MSG(extack, "failed to set rx coalesce parameters");
> +			return ret;
> +		}
> +		return 0;
> +	}
> +
>  	if (new_dim && !old_dim) {
>  		cr = axienet_calc_cr(lp, axienet_dim_coalesce_count_rx(lp),
>  				     ecoalesce->rx_coalesce_usecs);

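The point in the review above, that a designated initializer "ensures that you initialize the whole struct", can be checked outside the kernel. The following is only a minimal userspace sketch: struct demo_slave_config and the demo_* helpers are hypothetical stand-ins, not the real struct dma_slave_config (whose coalesce members exist only in the prerequisite series), and the values 16 and 5 are the Tx defaults from the patch.

```c
#include <assert.h>

/*
 * Hypothetical stand-in for a DMA channel configuration. The field names
 * coalesce_cnt/coalesce_usecs mirror the members proposed in the
 * prerequisite series; the other fields stand for everything else a
 * consumer of the struct might read.
 */
struct demo_slave_config {
	int direction;
	unsigned long long src_addr;
	unsigned long long dst_addr;
	unsigned int coalesce_cnt;
	unsigned int coalesce_usecs;
};

/*
 * Build a config with a designated initializer. C guarantees that every
 * member not named in the initializer is zero-initialized, so no field
 * carries stack garbage into the consumer.
 */
static struct demo_slave_config demo_default_tx_config(void)
{
	struct demo_slave_config cfg = {
		.coalesce_cnt = 16,	/* Tx threshold default from the patch */
		.coalesce_usecs = 5,	/* Tx delay default from the patch */
	};

	return cfg;
}

/* Returns 1 when all members other than the coalesce pair are zero. */
static int demo_only_coalesce_set(const struct demo_slave_config *cfg)
{
	return cfg->direction == 0 && cfg->src_addr == 0 && cfg->dst_addr == 0;
}

/* Small self-check that exercises both helpers. */
static void demo_self_check(void)
{
	struct demo_slave_config cfg = demo_default_tx_config();

	assert(cfg.coalesce_cnt == 16);
	assert(cfg.coalesce_usecs == 5);
	assert(demo_only_coalesce_set(&cfg));
}
```

By contrast, declaring the struct without an initializer and assigning only the two coalesce members, as the patch does, leaves the remaining members indeterminate.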
* RE: [PATCH net-next] net: xilinx: axienet: Configure and report coalesce parameters in DMAengine flow
From: Gupta, Suraj @ 2025-05-28 12:00 UTC
To: Sean Anderson, andrew+netdev@lunn.ch, davem@davemloft.net, edumazet@google.com, kuba@kernel.org, pabeni@redhat.com, vkoul@kernel.org, Simek, Michal, Pandey, Radhey Shyam, horms@kernel.org
Cc: netdev@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, git (AMD-Xilinx), Katakam, Harini

[AMD Official Use Only - AMD Internal Distribution Only]

> -----Original Message-----
> From: Sean Anderson <sean.anderson@linux.dev>
> Sent: Tuesday, May 27, 2025 9:47 PM
> To: Gupta, Suraj <Suraj.Gupta2@amd.com>; andrew+netdev@lunn.ch;
> davem@davemloft.net; edumazet@google.com; kuba@kernel.org;
> pabeni@redhat.com; vkoul@kernel.org; Simek, Michal <michal.simek@amd.com>;
> Pandey, Radhey Shyam <radhey.shyam.pandey@amd.com>; horms@kernel.org
> Cc: netdev@vger.kernel.org; linux-arm-kernel@lists.infradead.org;
> linux-kernel@vger.kernel.org; git (AMD-Xilinx) <git@amd.com>; Katakam, Harini
> <harini.katakam@amd.com>
> Subject: Re: [PATCH net-next] net: xilinx: axienet: Configure and report coalesce
> parameters in DMAengine flow
>
> Caution: This message originated from an External Source. Use proper caution
> when opening attachments, clicking links, or responding.
>
>
> On 5/25/25 06:22, Suraj Gupta wrote:
> > Add support to configure / report interrupt coalesce count and delay
> > via ethtool in DMAEngine flow.
> > Netperf numbers are not good when using non-dmaengine default values,
> > so tuned coalesce count and delay and defined separate default values
> > in dmaengine flow.
> >
> > Netperf numbers and CPU utilisation change in DMAengine flow after
> > introducing coalescing with default parameters:
> > coalesce parameters:
> > Transfer type             Before(w/o coalescing)  After(with coalescing)
> > TCP Tx, CPU utilisation%  925, 27                 941, 22
> > TCP Rx, CPU utilisation%  607, 32                 741, 36
> > UDP Tx, CPU utilisation%  857, 31                 960, 28
> > UDP Rx, CPU utilisation%  762, 26                 783, 18
> >
> > Above numbers are observed with 4x Cortex-a53.
>
> How does this affect latency? I would expect these RX settings to increase latency
> around 5-10x. I only use these settings with DIM since it will disable coalescing
> during periods of light load for better latency.
>
> (of course the way to fix this in general is RSS or some other method involving
> multiple queues).
>

I took values before NAPI addition in legacy flow (rx_threshold: 24, rx_usec: 50) as reference. But netperf numbers were low with them, so tried tuning both and selected the pair which gives good numbers.

> > Signed-off-by: Suraj Gupta <suraj.gupta2@amd.com>
> > ---
> > This patch depend on following AXI DMA dmengine driver changes sent to
> > dmaengine mailing list as pre-requisit series:
> > https://lore.kernel.org/all/20250525101617.1168991-1-suraj.gupta2@amd.com/
> > ---
> >  drivers/net/ethernet/xilinx/xilinx_axienet.h  |  6 +++
> >  .../net/ethernet/xilinx/xilinx_axienet_main.c | 53 +++++++++++++++++++
> >  2 files changed, 59 insertions(+)
> >
> > diff --git a/drivers/net/ethernet/xilinx/xilinx_axienet.h
> > b/drivers/net/ethernet/xilinx/xilinx_axienet.h
> > index 5ff742103beb..cdf6cbb6f2fd 100644
> > --- a/drivers/net/ethernet/xilinx/xilinx_axienet.h
> > +++ b/drivers/net/ethernet/xilinx/xilinx_axienet.h
> > @@ -126,6 +126,12 @@
> >  #define XAXIDMA_DFT_TX_USEC		50
> >  #define XAXIDMA_DFT_RX_USEC		16
> >
> > +/* Default TX/RX Threshold and delay timer values for SGDMA mode with DMAEngine */
> > +#define XAXIDMAENGINE_DFT_TX_THRESHOLD	16
> > +#define XAXIDMAENGINE_DFT_TX_USEC	5
> > +#define XAXIDMAENGINE_DFT_RX_THRESHOLD	24
> > +#define XAXIDMAENGINE_DFT_RX_USEC	16
> > +
> >  #define XAXIDMA_BD_CTRL_TXSOF_MASK	0x08000000 /* First tx packet */
> >  #define XAXIDMA_BD_CTRL_TXEOF_MASK	0x04000000 /* Last tx packet */
> >  #define XAXIDMA_BD_CTRL_ALL_MASK	0x0C000000 /* All control bits */
> > diff --git a/drivers/net/ethernet/xilinx/xilinx_axienet_main.c
> > b/drivers/net/ethernet/xilinx/xilinx_axienet_main.c
> > index 1b7a653c1f4e..f9c7d90d4ecb 100644
> > --- a/drivers/net/ethernet/xilinx/xilinx_axienet_main.c
> > +++ b/drivers/net/ethernet/xilinx/xilinx_axienet_main.c
> > @@ -1505,6 +1505,7 @@ static int axienet_init_dmaengine(struct net_device *ndev)
> >  {
> >  	struct axienet_local *lp = netdev_priv(ndev);
> >  	struct skbuf_dma_descriptor *skbuf_dma;
> > +	struct dma_slave_config tx_config, rx_config;
> >  	int i, ret;
> >
> >  	lp->tx_chan = dma_request_chan(lp->dev, "tx_chan0");
> > @@ -1520,6 +1521,22 @@ static int axienet_init_dmaengine(struct net_device *ndev)
> >  		goto err_dma_release_tx;
> >  	}
> >
> > +	tx_config.coalesce_cnt = XAXIDMAENGINE_DFT_TX_THRESHOLD;
> > +	tx_config.coalesce_usecs = XAXIDMAENGINE_DFT_TX_USEC;
> > +	rx_config.coalesce_cnt = XAXIDMAENGINE_DFT_RX_THRESHOLD;
> > +	rx_config.coalesce_usecs = XAXIDMAENGINE_DFT_RX_USEC;
>
> I think it would be clearer to just do something like
>
>	struct dma_slave_config tx_config = {
>		.coalesce_cnt = 16,
>		.coalesce_usecs = 5,
>	};
>
> since these are only used once. And this ensures that you initialize the whole struct.
>
> But what tree are you using? I don't see these members on net-next or dmaengine.

These changes are proposed in separate series in dmaengine https://lore.kernel.org/all/20250525101617.1168991-2-suraj.gupta2@amd.com/ and I described it here below my SOB.

>
> > +	ret = dmaengine_slave_config(lp->tx_chan, &tx_config);
> > +	if (ret) {
> > +		dev_err(lp->dev, "Failed to configure Tx coalesce parameters\n");
> > +		goto err_dma_release_tx;
> > +	}
> > +	ret = dmaengine_slave_config(lp->rx_chan, &rx_config);
> > +	if (ret) {
> > +		dev_err(lp->dev, "Failed to configure Rx coalesce parameters\n");
> > +		goto err_dma_release_tx;
> > +	}
> > +
> >  	lp->tx_ring_tail = 0;
> >  	lp->tx_ring_head = 0;
> >  	lp->rx_ring_tail = 0;
> > @@ -2170,6 +2187,19 @@ axienet_ethtools_get_coalesce(struct net_device *ndev,
> >  	struct axienet_local *lp = netdev_priv(ndev);
> >  	u32 cr;
> >
> > +	if (lp->use_dmaengine) {
> > +		struct dma_slave_caps tx_caps, rx_caps;
> > +
> > +		dma_get_slave_caps(lp->tx_chan, &tx_caps);
> > +		dma_get_slave_caps(lp->rx_chan, &rx_caps);
> > +
> > +		ecoalesce->tx_max_coalesced_frames = tx_caps.coalesce_cnt;
> > +		ecoalesce->tx_coalesce_usecs = tx_caps.coalesce_usecs;
> > +		ecoalesce->rx_max_coalesced_frames = rx_caps.coalesce_cnt;
> > +		ecoalesce->rx_coalesce_usecs = rx_caps.coalesce_usecs;
> > +		return 0;
> > +	}
> > +
> >  	ecoalesce->use_adaptive_rx_coalesce = lp->rx_dim_enabled;
> >
> >  	spin_lock_irq(&lp->rx_cr_lock);
> > @@ -2233,6 +2263,29 @@ axienet_ethtools_set_coalesce(struct net_device *ndev,
> >  		return -EINVAL;
> >  	}
> >
> > +	if (lp->use_dmaengine) {
> > +		struct dma_slave_config tx_cfg, rx_cfg;
> > +		int ret;
> > +
> > +		tx_cfg.coalesce_cnt = ecoalesce->tx_max_coalesced_frames;
> > +		tx_cfg.coalesce_usecs = ecoalesce->tx_coalesce_usecs;
> > +		rx_cfg.coalesce_cnt = ecoalesce->rx_max_coalesced_frames;
> > +		rx_cfg.coalesce_usecs = ecoalesce->rx_coalesce_usecs;
> > +
> > +		ret = dmaengine_slave_config(lp->tx_chan, &tx_cfg);
> > +		if (ret) {
> > +			NL_SET_ERR_MSG(extack, "failed to set tx coalesce parameters");
> > +			return ret;
> > +		}
> > +
> > +		ret = dmaengine_slave_config(lp->rx_chan, &rx_cfg);
> > +		if (ret) {
> > +			NL_SET_ERR_MSG(extack, "failed to set rx coalesce parameters");
> > +			return ret;
> > +		}
> > +		return 0;
> > +	}
> > +
> >  	if (new_dim && !old_dim) {
> >  		cr = axienet_calc_cr(lp, axienet_dim_coalesce_count_rx(lp),
> >  				     ecoalesce->rx_coalesce_usecs);

* Re: [PATCH net-next] net: xilinx: axienet: Configure and report coalesce parameters in DMAengine flow 2025-05-28 12:00 ` Gupta, Suraj @ 2025-05-28 13:09 ` Subbaraya Sundeep 2025-05-29 16:17 ` Sean Anderson 1 sibling, 0 replies; 13+ messages in thread From: Subbaraya Sundeep @ 2025-05-28 13:09 UTC (permalink / raw) To: Gupta, Suraj Cc: Sean Anderson, andrew+netdev@lunn.ch, davem@davemloft.net, edumazet@google.com, kuba@kernel.org, pabeni@redhat.com, vkoul@kernel.org, Simek, Michal, Pandey, Radhey Shyam, horms@kernel.org, netdev@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, git (AMD-Xilinx), Katakam, Harini On 2025-05-28 at 12:00:56, Gupta, Suraj (Suraj.Gupta2@amd.com) wrote: > [AMD Official Use Only - AMD Internal Distribution Only] > Fix your mail settings. Cannot be internal if posting to mailing list :) Thanks, Sundeep > > -----Original Message----- > > From: Sean Anderson <sean.anderson@linux.dev> > > Sent: Tuesday, May 27, 2025 9:47 PM > > To: Gupta, Suraj <Suraj.Gupta2@amd.com>; andrew+netdev@lunn.ch; > > davem@davemloft.net; edumazet@google.com; kuba@kernel.org; > > pabeni@redhat.com; vkoul@kernel.org; Simek, Michal <michal.simek@amd.com>; > > Pandey, Radhey Shyam <radhey.shyam.pandey@amd.com>; horms@kernel.org > > Cc: netdev@vger.kernel.org; linux-arm-kernel@lists.infradead.org; linux- > > kernel@vger.kernel.org; git (AMD-Xilinx) <git@amd.com>; Katakam, Harini > > <harini.katakam@amd.com> > > Subject: Re: [PATCH net-next] net: xilinx: axienet: Configure and report coalesce > > parameters in DMAengine flow > > > > Caution: This message originated from an External Source. Use proper caution > > when opening attachments, clicking links, or responding. > > > > > > On 5/25/25 06:22, Suraj Gupta wrote: > > > Add support to configure / report interrupt coalesce count and delay > > > via ethtool in DMAEngine flow. 
> > > Netperf numbers are not good when using non-dmaengine default values, > > > so tuned coalesce count and delay and defined separate default values > > > in dmaengine flow. > > > > > > Netperf numbers and CPU utilisation change in DMAengine flow after > > > introducing coalescing with default parameters: > > > coalesce parameters: > > > Transfer type Before(w/o coalescing) After(with coalescing) > > > TCP Tx, CPU utilisation% 925, 27 941, 22 > > > TCP Rx, CPU utilisation% 607, 32 741, 36 > > > UDP Tx, CPU utilisation% 857, 31 960, 28 > > > UDP Rx, CPU utilisation% 762, 26 783, 18 > > > > > > Above numbers are observed with 4x Cortex-a53. > > > > How does this affect latency? I would expect these RX settings to increase latency > > around 5-10x. I only use these settings with DIM since it will disable coalescing > > during periods of light load for better latency. > > > > (of course the way to fix this in general is RSS or some other method involving > > multiple queues). > > > > I took values before NAPI addition in legacy flow (rx_threshold: 24, rx_usec: 50) as reference. But netperf numbers were low with them, so tried tuning both and selected the pair which gives good numbers. > > > > Signed-off-by: Suraj Gupta <suraj.gupta2@amd.com> > > > --- > > > This patch depend on following AXI DMA dmengine driver changes sent to > > > dmaengine mailing list as pre-requisit series: > > > https://lore.kernel.org/all/20250525101617.1168991-1-suraj.gupta2@amd. 
> > > com/ > > > --- > > > drivers/net/ethernet/xilinx/xilinx_axienet.h | 6 +++ > > > .../net/ethernet/xilinx/xilinx_axienet_main.c | 53 +++++++++++++++++++ > > > 2 files changed, 59 insertions(+) > > > > > > diff --git a/drivers/net/ethernet/xilinx/xilinx_axienet.h > > > b/drivers/net/ethernet/xilinx/xilinx_axienet.h > > > index 5ff742103beb..cdf6cbb6f2fd 100644 > > > --- a/drivers/net/ethernet/xilinx/xilinx_axienet.h > > > +++ b/drivers/net/ethernet/xilinx/xilinx_axienet.h > > > @@ -126,6 +126,12 @@ > > > #define XAXIDMA_DFT_TX_USEC 50 > > > #define XAXIDMA_DFT_RX_USEC 16 > > > > > > +/* Default TX/RX Threshold and delay timer values for SGDMA mode with > > DMAEngine */ > > > +#define XAXIDMAENGINE_DFT_TX_THRESHOLD 16 > > > +#define XAXIDMAENGINE_DFT_TX_USEC 5 > > > +#define XAXIDMAENGINE_DFT_RX_THRESHOLD 24 > > > +#define XAXIDMAENGINE_DFT_RX_USEC 16 > > > + > > > #define XAXIDMA_BD_CTRL_TXSOF_MASK 0x08000000 /* First tx packet */ > > > #define XAXIDMA_BD_CTRL_TXEOF_MASK 0x04000000 /* Last tx packet */ > > > #define XAXIDMA_BD_CTRL_ALL_MASK 0x0C000000 /* All control bits */ > > > diff --git a/drivers/net/ethernet/xilinx/xilinx_axienet_main.c > > > b/drivers/net/ethernet/xilinx/xilinx_axienet_main.c > > > index 1b7a653c1f4e..f9c7d90d4ecb 100644 > > > --- a/drivers/net/ethernet/xilinx/xilinx_axienet_main.c > > > +++ b/drivers/net/ethernet/xilinx/xilinx_axienet_main.c > > > @@ -1505,6 +1505,7 @@ static int axienet_init_dmaengine(struct > > > net_device *ndev) { > > > struct axienet_local *lp = netdev_priv(ndev); > > > struct skbuf_dma_descriptor *skbuf_dma; > > > + struct dma_slave_config tx_config, rx_config; > > > int i, ret; > > > > > > lp->tx_chan = dma_request_chan(lp->dev, "tx_chan0"); @@ -1520,6 > > > +1521,22 @@ static int axienet_init_dmaengine(struct net_device *ndev) > > > goto err_dma_release_tx; > > > } > > > > > > + tx_config.coalesce_cnt = XAXIDMAENGINE_DFT_TX_THRESHOLD; > > > + tx_config.coalesce_usecs = XAXIDMAENGINE_DFT_TX_USEC; > > > + 
rx_config.coalesce_cnt = XAXIDMAENGINE_DFT_RX_THRESHOLD; > > > + rx_config.coalesce_usecs = XAXIDMAENGINE_DFT_RX_USEC; > > > > I think it would be clearer to just do something like > > > > struct dma_slave_config tx_config = { > > .coalesce_cnt = 16, > > .coalesce_usecs = 5, > > }; > > > > since these are only used once. And this ensures that you initialize the whole struct. > > > > But what tree are you using? I don't see these members on net-next or dmaengine. > > These changes are proposed in separate series in dmaengine https://lore.kernel.org/all/20250525101617.1168991-2-suraj.gupta2@amd.com/ and I described it here below my SOB. > > > > > > + ret = dmaengine_slave_config(lp->tx_chan, &tx_config); > > > + if (ret) { > > > + dev_err(lp->dev, "Failed to configure Tx coalesce parameters\n"); > > > + goto err_dma_release_tx; > > > + } > > > + ret = dmaengine_slave_config(lp->rx_chan, &rx_config); > > > + if (ret) { > > > + dev_err(lp->dev, "Failed to configure Rx coalesce parameters\n"); > > > + goto err_dma_release_tx; > > > + } > > > + > > > lp->tx_ring_tail = 0; > > > lp->tx_ring_head = 0; > > > lp->rx_ring_tail = 0; > > > @@ -2170,6 +2187,19 @@ axienet_ethtools_get_coalesce(struct net_device > > *ndev, > > > struct axienet_local *lp = netdev_priv(ndev); > > > u32 cr; > > > > > > + if (lp->use_dmaengine) { > > > + struct dma_slave_caps tx_caps, rx_caps; > > > + > > > + dma_get_slave_caps(lp->tx_chan, &tx_caps); > > > + dma_get_slave_caps(lp->rx_chan, &rx_caps); > > > + > > > + ecoalesce->tx_max_coalesced_frames = tx_caps.coalesce_cnt; > > > + ecoalesce->tx_coalesce_usecs = tx_caps.coalesce_usecs; > > > + ecoalesce->rx_max_coalesced_frames = rx_caps.coalesce_cnt; > > > + ecoalesce->rx_coalesce_usecs = rx_caps.coalesce_usecs; > > > + return 0; > > > + } > > > + > > > ecoalesce->use_adaptive_rx_coalesce = lp->rx_dim_enabled; > > > > > > spin_lock_irq(&lp->rx_cr_lock); > > > @@ -2233,6 +2263,29 @@ axienet_ethtools_set_coalesce(struct net_device > > *ndev, > > > 
return -EINVAL; > > > } > > > > > > + if (lp->use_dmaengine) { > > > + struct dma_slave_config tx_cfg, rx_cfg; > > > + int ret; > > > + > > > + tx_cfg.coalesce_cnt = ecoalesce->tx_max_coalesced_frames; > > > + tx_cfg.coalesce_usecs = ecoalesce->tx_coalesce_usecs; > > > + rx_cfg.coalesce_cnt = ecoalesce->rx_max_coalesced_frames; > > > + rx_cfg.coalesce_usecs = ecoalesce->rx_coalesce_usecs; > > > + > > > + ret = dmaengine_slave_config(lp->tx_chan, &tx_cfg); > > > + if (ret) { > > > + NL_SET_ERR_MSG(extack, "failed to set tx coalesce parameters"); > > > + return ret; > > > + } > > > + > > > + ret = dmaengine_slave_config(lp->rx_chan, &rx_cfg); > > > + if (ret) { > > > + NL_SET_ERR_MSG(extack, "failed to set rx coalesce > > parameters"); > > > + return ret; > > > + } > > > + return 0; > > > + } > > > + > > > if (new_dim && !old_dim) { > > > cr = axienet_calc_cr(lp, axienet_dim_coalesce_count_rx(lp), > > > ecoalesce->rx_coalesce_usecs); ^ permalink raw reply [flat|nested] 13+ messages in thread
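The designated-initializer suggestion quoted above can be illustrated with a small user-space sketch. It assumes a hypothetical `struct demo_slave_config` standing in for `struct dma_slave_config`; the `coalesce_cnt` and `coalesce_usecs` members mirror the ones proposed in the pending dmaengine series and are not in mainline. The helper name `demo_default_tx_config` is also made up for illustration. The point it shows: members not named in a designated initializer are zero-initialized, whereas assigning two fields of an uninitialized stack struct leaves the rest indeterminate.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for struct dma_slave_config. Only the two
 * coalesce members come from the proposed series; the others are
 * placeholders for "everything else in the struct".
 */
struct demo_slave_config {
	uint32_t direction;
	uint64_t src_addr;
	uint64_t dst_addr;
	uint32_t coalesce_cnt;
	uint32_t coalesce_usecs;
};

/*
 * Build the Tx defaults with a designated initializer. The members
 * that are not named (direction, src_addr, dst_addr) are set to zero
 * by the language, so no stack garbage reaches the consumer.
 */
static struct demo_slave_config demo_default_tx_config(void)
{
	struct demo_slave_config cfg = {
		.coalesce_cnt = 16,
		.coalesce_usecs = 5,
	};

	return cfg;
}
```

In the driver, the same pattern would apply to the on-stack `tx_config`/`rx_config` and `tx_cfg`/`rx_cfg` variables before they are passed to `dmaengine_slave_config()`.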
* Re: [PATCH net-next] net: xilinx: axienet: Configure and report coalesce parameters in DMAengine flow
  2025-05-28 12:00   ` Gupta, Suraj
  2025-05-28 13:09     ` Subbaraya Sundeep
@ 2025-05-29 16:17     ` Sean Anderson
  2025-05-29 16:29       ` Andrew Lunn
  2025-05-30 10:18       ` Gupta, Suraj
  1 sibling, 2 replies; 13+ messages in thread
From: Sean Anderson @ 2025-05-29 16:17 UTC (permalink / raw)
  To: Gupta, Suraj, andrew+netdev@lunn.ch, davem@davemloft.net,
      edumazet@google.com, kuba@kernel.org, pabeni@redhat.com,
      vkoul@kernel.org, Simek, Michal, Pandey, Radhey Shyam,
      horms@kernel.org
  Cc: netdev@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
      linux-kernel@vger.kernel.org, git (AMD-Xilinx), Katakam, Harini

On 5/28/25 08:00, Gupta, Suraj wrote:
> [AMD Official Use Only - AMD Internal Distribution Only]
>
>> -----Original Message-----
>> From: Sean Anderson <sean.anderson@linux.dev>
>> Sent: Tuesday, May 27, 2025 9:47 PM
>> To: Gupta, Suraj <Suraj.Gupta2@amd.com>; andrew+netdev@lunn.ch;
>> davem@davemloft.net; edumazet@google.com; kuba@kernel.org;
>> pabeni@redhat.com; vkoul@kernel.org; Simek, Michal <michal.simek@amd.com>;
>> Pandey, Radhey Shyam <radhey.shyam.pandey@amd.com>; horms@kernel.org
>> Cc: netdev@vger.kernel.org; linux-arm-kernel@lists.infradead.org;
>> linux-kernel@vger.kernel.org; git (AMD-Xilinx) <git@amd.com>; Katakam, Harini
>> <harini.katakam@amd.com>
>> Subject: Re: [PATCH net-next] net: xilinx: axienet: Configure and report coalesce
>> parameters in DMAengine flow
>>
>> Caution: This message originated from an External Source. Use proper caution
>> when opening attachments, clicking links, or responding.
>>
>>
>> On 5/25/25 06:22, Suraj Gupta wrote:
>> > Add support to configure / report interrupt coalesce count and delay
>> > via ethtool in DMAEngine flow.
>> > Netperf numbers are not good when using non-dmaengine default values,
>> > so tuned coalesce count and delay and defined separate default values
>> > in dmaengine flow.
>> >
>> > Netperf numbers and CPU utilisation change in DMAengine flow after
>> > introducing coalescing with default parameters:
>> > coalesce parameters:
>> >    Transfer type            Before(w/o coalescing)  After(with coalescing)
>> > TCP Tx, CPU utilisation%    925, 27                 941, 22
>> > TCP Rx, CPU utilisation%    607, 32                 741, 36
>> > UDP Tx, CPU utilisation%    857, 31                 960, 28
>> > UDP Rx, CPU utilisation%    762, 26                 783, 18
>> >
>> > Above numbers are observed with 4x Cortex-a53.
>>
>> How does this affect latency? I would expect these RX settings to increase latency
>> around 5-10x. I only use these settings with DIM since it will disable coalescing
>> during periods of light load for better latency.
>>
>> (of course the way to fix this in general is RSS or some other method involving
>> multiple queues).
>>
>
> I took values before NAPI addition in legacy flow (rx_threshold: 24, rx_usec: 50) as reference. But netperf numbers were low with them, so tried tuning both and selected the pair which gives good numbers.

Yeah, but the reason is that you are trading latency for throughput.
There is only one queue, so when the interface is saturated you will not
get good latency anyway (since latency-sensitive packets will get
head-of-line blocked). But when activity is sparse you can get good latency
if there is no coalescing. So I think coalescing should only be used
when there is a lot of traffic. Hence why I only adjusted the settings
once I implemented DIM. I think you should be able to implement it by
calling net_dim from axienet_dma_rx_cb, but it will not be as efficient
without NAPI.

Actually, if you are looking into improving performance, I think lack of
NAPI is probably the biggest limitation with the dmaengine backend.

>> > Signed-off-by: Suraj Gupta <suraj.gupta2@amd.com>
>> > ---
>> > This patch depend on following AXI DMA dmengine driver changes sent to
>> > dmaengine mailing list as pre-requisit series:
>> > https://lore.kernel.org/all/20250525101617.1168991-1-suraj.gupta2@amd.
>> > com/ >> > --- >> > drivers/net/ethernet/xilinx/xilinx_axienet.h | 6 +++ >> > .../net/ethernet/xilinx/xilinx_axienet_main.c | 53 +++++++++++++++++++ >> > 2 files changed, 59 insertions(+) >> > >> > diff --git a/drivers/net/ethernet/xilinx/xilinx_axienet.h >> > b/drivers/net/ethernet/xilinx/xilinx_axienet.h >> > index 5ff742103beb..cdf6cbb6f2fd 100644 >> > --- a/drivers/net/ethernet/xilinx/xilinx_axienet.h >> > +++ b/drivers/net/ethernet/xilinx/xilinx_axienet.h >> > @@ -126,6 +126,12 @@ >> > #define XAXIDMA_DFT_TX_USEC 50 >> > #define XAXIDMA_DFT_RX_USEC 16 >> > >> > +/* Default TX/RX Threshold and delay timer values for SGDMA mode with >> DMAEngine */ >> > +#define XAXIDMAENGINE_DFT_TX_THRESHOLD 16 >> > +#define XAXIDMAENGINE_DFT_TX_USEC 5 >> > +#define XAXIDMAENGINE_DFT_RX_THRESHOLD 24 >> > +#define XAXIDMAENGINE_DFT_RX_USEC 16 >> > + >> > #define XAXIDMA_BD_CTRL_TXSOF_MASK 0x08000000 /* First tx packet */ >> > #define XAXIDMA_BD_CTRL_TXEOF_MASK 0x04000000 /* Last tx packet */ >> > #define XAXIDMA_BD_CTRL_ALL_MASK 0x0C000000 /* All control bits */ >> > diff --git a/drivers/net/ethernet/xilinx/xilinx_axienet_main.c >> > b/drivers/net/ethernet/xilinx/xilinx_axienet_main.c >> > index 1b7a653c1f4e..f9c7d90d4ecb 100644 >> > --- a/drivers/net/ethernet/xilinx/xilinx_axienet_main.c >> > +++ b/drivers/net/ethernet/xilinx/xilinx_axienet_main.c >> > @@ -1505,6 +1505,7 @@ static int axienet_init_dmaengine(struct >> > net_device *ndev) { >> > struct axienet_local *lp = netdev_priv(ndev); >> > struct skbuf_dma_descriptor *skbuf_dma; >> > + struct dma_slave_config tx_config, rx_config; >> > int i, ret; >> > >> > lp->tx_chan = dma_request_chan(lp->dev, "tx_chan0"); @@ -1520,6 >> > +1521,22 @@ static int axienet_init_dmaengine(struct net_device *ndev) >> > goto err_dma_release_tx; >> > } >> > >> > + tx_config.coalesce_cnt = XAXIDMAENGINE_DFT_TX_THRESHOLD; >> > + tx_config.coalesce_usecs = XAXIDMAENGINE_DFT_TX_USEC; >> > + rx_config.coalesce_cnt = 
XAXIDMAENGINE_DFT_RX_THRESHOLD; >> > + rx_config.coalesce_usecs = XAXIDMAENGINE_DFT_RX_USEC; >> >> I think it would be clearer to just do something like >> >> struct dma_slave_config tx_config = { >> .coalesce_cnt = 16, >> .coalesce_usecs = 5, >> }; >> >> since these are only used once. And this ensures that you initialize the whole struct. >> >> But what tree are you using? I don't see these members on net-next or dmaengine. > > These changes are proposed in separate series in dmaengine https://lore.kernel.org/all/20250525101617.1168991-2-suraj.gupta2@amd.com/ and I described it here below my SOB. I think you should post those patches with this series to allow them to be reviewed appropriately. --Sean >> >> > + ret = dmaengine_slave_config(lp->tx_chan, &tx_config); >> > + if (ret) { >> > + dev_err(lp->dev, "Failed to configure Tx coalesce parameters\n"); >> > + goto err_dma_release_tx; >> > + } >> > + ret = dmaengine_slave_config(lp->rx_chan, &rx_config); >> > + if (ret) { >> > + dev_err(lp->dev, "Failed to configure Rx coalesce parameters\n"); >> > + goto err_dma_release_tx; >> > + } >> > + >> > lp->tx_ring_tail = 0; >> > lp->tx_ring_head = 0; >> > lp->rx_ring_tail = 0; >> > @@ -2170,6 +2187,19 @@ axienet_ethtools_get_coalesce(struct net_device >> *ndev, >> > struct axienet_local *lp = netdev_priv(ndev); >> > u32 cr; >> > >> > + if (lp->use_dmaengine) { >> > + struct dma_slave_caps tx_caps, rx_caps; >> > + >> > + dma_get_slave_caps(lp->tx_chan, &tx_caps); >> > + dma_get_slave_caps(lp->rx_chan, &rx_caps); >> > + >> > + ecoalesce->tx_max_coalesced_frames = tx_caps.coalesce_cnt; >> > + ecoalesce->tx_coalesce_usecs = tx_caps.coalesce_usecs; >> > + ecoalesce->rx_max_coalesced_frames = rx_caps.coalesce_cnt; >> > + ecoalesce->rx_coalesce_usecs = rx_caps.coalesce_usecs; >> > + return 0; >> > + } >> > + >> > ecoalesce->use_adaptive_rx_coalesce = lp->rx_dim_enabled; >> > >> > spin_lock_irq(&lp->rx_cr_lock); >> > @@ -2233,6 +2263,29 @@ axienet_ethtools_set_coalesce(struct 
net_device >> *ndev, >> > return -EINVAL; >> > } >> > >> > + if (lp->use_dmaengine) { >> > + struct dma_slave_config tx_cfg, rx_cfg; >> > + int ret; >> > + >> > + tx_cfg.coalesce_cnt = ecoalesce->tx_max_coalesced_frames; >> > + tx_cfg.coalesce_usecs = ecoalesce->tx_coalesce_usecs; >> > + rx_cfg.coalesce_cnt = ecoalesce->rx_max_coalesced_frames; >> > + rx_cfg.coalesce_usecs = ecoalesce->rx_coalesce_usecs; >> > + >> > + ret = dmaengine_slave_config(lp->tx_chan, &tx_cfg); >> > + if (ret) { >> > + NL_SET_ERR_MSG(extack, "failed to set tx coalesce parameters"); >> > + return ret; >> > + } >> > + >> > + ret = dmaengine_slave_config(lp->rx_chan, &rx_cfg); >> > + if (ret) { >> > + NL_SET_ERR_MSG(extack, "failed to set rx coalesce >> parameters"); >> > + return ret; >> > + } >> > + return 0; >> > + } >> > + >> > if (new_dim && !old_dim) { >> > cr = axienet_calc_cr(lp, axienet_dim_coalesce_count_rx(lp), >> > ecoalesce->rx_coalesce_usecs); ^ permalink raw reply [flat|nested] 13+ messages in thread
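The latency-for-throughput trade-off discussed in this message can be put into rough numbers with a simplified model. This is only a sketch under stated assumptions, not a description of the AXI DMA hardware: frames are assumed to arrive at a fixed interval, and the interrupt is assumed to fire when either `count` frames are pending or `usecs` microseconds have passed since the first pending frame, whichever comes first. The function name and parameters are hypothetical.

```c
#include <stdint.h>

/*
 * Simplified model (an assumption, not the hardware behaviour):
 * return how long, in microseconds, the first frame of a batch waits
 * before the coalesced interrupt fires.
 *
 * interval_us: fixed time between frame arrivals
 * count:       frame threshold; 0 or 1 means no coalescing
 * usecs:       delay timer; 0 means the timer is disabled
 */
static uint64_t demo_first_frame_delay_us(uint64_t interval_us,
					  uint32_t count, uint32_t usecs)
{
	uint64_t fill_us;

	/* Without coalescing every frame raises its own interrupt. */
	if (count <= 1)
		return 0;

	/* Time until the count threshold is reached by later arrivals. */
	fill_us = (uint64_t)(count - 1) * interval_us;

	/* The delay timer caps the wait if it expires first. */
	if (usecs != 0 && usecs < fill_us)
		return usecs;

	return fill_us;
}
```

Under this model, sparse traffic (for example one frame per millisecond) with a count of 24 and a 16 us timer always waits out the full timer, while the same traffic with coalescing disabled sees no added delay. That is the light-load case where DIM would switch coalescing off.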
* Re: [PATCH net-next] net: xilinx: axienet: Configure and report coalesce parameters in DMAengine flow
  2025-05-29 16:17     ` Sean Anderson
@ 2025-05-29 16:29       ` Andrew Lunn
  2025-05-29 16:35         ` Sean Anderson
  2025-05-30 10:18       ` Gupta, Suraj
  1 sibling, 1 reply; 13+ messages in thread
From: Andrew Lunn @ 2025-05-29 16:29 UTC (permalink / raw)
  To: Sean Anderson
  Cc: Gupta, Suraj, andrew+netdev@lunn.ch, davem@davemloft.net,
      edumazet@google.com, kuba@kernel.org, pabeni@redhat.com,
      vkoul@kernel.org, Simek, Michal, Pandey, Radhey Shyam,
      horms@kernel.org, netdev@vger.kernel.org,
      linux-arm-kernel@lists.infradead.org,
      linux-kernel@vger.kernel.org, git (AMD-Xilinx), Katakam, Harini

> Yeah, but the reason is that you are trading latency for throughput.
> There is only one queue, so when the interface is saturated you will not
> get good latency anyway (since latency-sensitive packets will get
> head-of-line blocked). But when activity is sparse you can get good latency
> if there is no coalescing. So I think coalescing should only be used
> when there is a lot of traffic. Hence why I only adjusted the settings
> once I implemented DIM. I think you should be able to implement it by
> calling net_dim from axienet_dma_rx_cb, but it will not be as efficient
> without NAPI.
>
> Actually, if you are looking into improving performance, I think lack of
> NAPI is probably the biggest limitation with the dmaengine backend.

If latency is the goal, especially for mixing high and low priority
traffic, having BQL implemented is also important. Does this driver
have that?

      Andrew

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: [PATCH net-next] net: xilinx: axienet: Configure and report coalesce parameters in DMAengine flow
  2025-05-29 16:29       ` Andrew Lunn
@ 2025-05-29 16:35         ` Sean Anderson
  0 siblings, 0 replies; 13+ messages in thread
From: Sean Anderson @ 2025-05-29 16:35 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Gupta, Suraj, andrew+netdev@lunn.ch, davem@davemloft.net,
      edumazet@google.com, kuba@kernel.org, pabeni@redhat.com,
      vkoul@kernel.org, Simek, Michal, Pandey, Radhey Shyam,
      horms@kernel.org, netdev@vger.kernel.org,
      linux-arm-kernel@lists.infradead.org,
      linux-kernel@vger.kernel.org, git (AMD-Xilinx), Katakam, Harini

On 5/29/25 12:29, Andrew Lunn wrote:
>> Yeah, but the reason is that you are trading latency for throughput.
>> There is only one queue, so when the interface is saturated you will not
>> get good latency anyway (since latency-sensitive packets will get
>> head-of-line blocked). But when activity is sparse you can get good latency
>> if there is no coalescing. So I think coalescing should only be used
>> when there is a lot of traffic. Hence why I only adjusted the settings
>> once I implemented DIM. I think you should be able to implement it by
>> calling net_dim from axienet_dma_rx_cb, but it will not be as efficient
>> without NAPI.
>>
>> Actually, if you are looking into improving performance, I think lack of
>> NAPI is probably the biggest limitation with the dmaengine backend.
>
> If latency is the goal, especially for mixing high and low priority
> traffic, having BQL implemented is also important. Does this driver
> have that?
>
>       Andrew

Yes, see commit c900e49d58eb ("net: xilinx: axienet: Implement BQL").

--Sean

^ permalink raw reply	[flat|nested] 13+ messages in thread
* RE: [PATCH net-next] net: xilinx: axienet: Configure and report coalesce parameters in DMAengine flow 2025-05-29 16:17 ` Sean Anderson 2025-05-29 16:29 ` Andrew Lunn @ 2025-05-30 10:18 ` Gupta, Suraj 2025-05-30 11:53 ` Gupta, Suraj 2025-05-30 20:44 ` Sean Anderson 1 sibling, 2 replies; 13+ messages in thread From: Gupta, Suraj @ 2025-05-30 10:18 UTC (permalink / raw) To: Sean Anderson, andrew+netdev@lunn.ch, davem@davemloft.net, edumazet@google.com, kuba@kernel.org, pabeni@redhat.com, vkoul@kernel.org, Simek, Michal, Pandey, Radhey Shyam, horms@kernel.org Cc: netdev@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, git (AMD-Xilinx), Katakam, Harini [AMD Official Use Only - AMD Internal Distribution Only] > -----Original Message----- > From: Sean Anderson <sean.anderson@linux.dev> > Sent: Thursday, May 29, 2025 9:48 PM > To: Gupta, Suraj <Suraj.Gupta2@amd.com>; andrew+netdev@lunn.ch; > davem@davemloft.net; edumazet@google.com; kuba@kernel.org; > pabeni@redhat.com; vkoul@kernel.org; Simek, Michal <michal.simek@amd.com>; > Pandey, Radhey Shyam <radhey.shyam.pandey@amd.com>; horms@kernel.org > Cc: netdev@vger.kernel.org; linux-arm-kernel@lists.infradead.org; linux- > kernel@vger.kernel.org; git (AMD-Xilinx) <git@amd.com>; Katakam, Harini > <harini.katakam@amd.com> > Subject: Re: [PATCH net-next] net: xilinx: axienet: Configure and report coalesce > parameters in DMAengine flow > > Caution: This message originated from an External Source. Use proper caution > when opening attachments, clicking links, or responding. 
> > > On 5/28/25 08:00, Gupta, Suraj wrote: > > [AMD Official Use Only - AMD Internal Distribution Only] > > > >> -----Original Message----- > >> From: Sean Anderson <sean.anderson@linux.dev> > >> Sent: Tuesday, May 27, 2025 9:47 PM > >> To: Gupta, Suraj <Suraj.Gupta2@amd.com>; andrew+netdev@lunn.ch; > >> davem@davemloft.net; edumazet@google.com; kuba@kernel.org; > >> pabeni@redhat.com; vkoul@kernel.org; Simek, Michal > >> <michal.simek@amd.com>; Pandey, Radhey Shyam > >> <radhey.shyam.pandey@amd.com>; horms@kernel.org > >> Cc: netdev@vger.kernel.org; linux-arm-kernel@lists.infradead.org; > >> linux- kernel@vger.kernel.org; git (AMD-Xilinx) <git@amd.com>; > >> Katakam, Harini <harini.katakam@amd.com> > >> Subject: Re: [PATCH net-next] net: xilinx: axienet: Configure and > >> report coalesce parameters in DMAengine flow > >> > >> Caution: This message originated from an External Source. Use proper > >> caution when opening attachments, clicking links, or responding. > >> > >> > >> On 5/25/25 06:22, Suraj Gupta wrote: > >> > Add support to configure / report interrupt coalesce count and > >> > delay via ethtool in DMAEngine flow. > >> > Netperf numbers are not good when using non-dmaengine default > >> > values, so tuned coalesce count and delay and defined separate > >> > default values in dmaengine flow. > >> > > >> > Netperf numbers and CPU utilisation change in DMAengine flow after > >> > introducing coalescing with default parameters: > >> > coalesce parameters: > >> > Transfer type Before(w/o coalescing) After(with coalescing) > >> > TCP Tx, CPU utilisation% 925, 27 941, 22 > >> > TCP Rx, CPU utilisation% 607, 32 741, 36 > >> > UDP Tx, CPU utilisation% 857, 31 960, 28 > >> > UDP Rx, CPU utilisation% 762, 26 783, 18 > >> > > >> > Above numbers are observed with 4x Cortex-a53. > >> > >> How does this affect latency? I would expect these RX settings to > >> increase latency around 5-10x. 
I only use these settings with DIM
> >> since it will disable coalescing during periods of light load for better latency.
> >>
> >> (of course the way to fix this in general is RSS or some other method
> >> involving multiple queues).
> >>
> >
> > I took values before NAPI addition in legacy flow (rx_threshold: 24, rx_usec: 50) as
> reference. But netperf numbers were low with them, so tried tuning both and
> selected the pair which gives good numbers.
>
> Yeah, but the reason is that you are trading latency for throughput.
> There is only one queue, so when the interface is saturated you will not get good
> latency anyway (since latency-sensitive packets will get head-of-line blocked). But
> when activity is sparse you can get good latency if there is no coalescing. So I think
> coalescing should only be used when there is a lot of traffic. Hence why I only
> adjusted the settings once I implemented DIM. I think you should be able to
> implement it by calling net_dim from axienet_dma_rx_cb, but it will not be as efficient
> without NAPI.
>

Ok, got it. I'll keep the default values used before NAPI in the legacy flow (coalesce count: 24, delay: 50) for both Tx and Rx and remove the perf comparisons.

> Actually, if you are looking into improving performance, I think lack of NAPI is
> probably the biggest limitation with the dmaengine backend.
>

Yes, I agree. NAPI for the DMAEngine implementation is underway and will be sent to mainline soon.

> >> > Signed-off-by: Suraj Gupta <suraj.gupta2@amd.com>
> >> > ---
> >> > This patch depend on following AXI DMA dmengine driver changes sent
> >> > to dmaengine mailing list as pre-requisit series:
> >> > https://lore.kernel.org/all/20250525101617.1168991-1-suraj.gupta2@amd.
> >> > com/ > >> > --- > >> > drivers/net/ethernet/xilinx/xilinx_axienet.h | 6 +++ > >> > .../net/ethernet/xilinx/xilinx_axienet_main.c | 53 > >> > +++++++++++++++++++ > >> > 2 files changed, 59 insertions(+) > >> > > >> > diff --git a/drivers/net/ethernet/xilinx/xilinx_axienet.h > >> > b/drivers/net/ethernet/xilinx/xilinx_axienet.h > >> > index 5ff742103beb..cdf6cbb6f2fd 100644 > >> > --- a/drivers/net/ethernet/xilinx/xilinx_axienet.h > >> > +++ b/drivers/net/ethernet/xilinx/xilinx_axienet.h > >> > @@ -126,6 +126,12 @@ > >> > #define XAXIDMA_DFT_TX_USEC 50 > >> > #define XAXIDMA_DFT_RX_USEC 16 > >> > > >> > +/* Default TX/RX Threshold and delay timer values for SGDMA mode > >> > +with > >> DMAEngine */ > >> > +#define XAXIDMAENGINE_DFT_TX_THRESHOLD 16 > >> > +#define XAXIDMAENGINE_DFT_TX_USEC 5 > >> > +#define XAXIDMAENGINE_DFT_RX_THRESHOLD 24 > >> > +#define XAXIDMAENGINE_DFT_RX_USEC 16 > >> > + > >> > #define XAXIDMA_BD_CTRL_TXSOF_MASK 0x08000000 /* First tx packet > */ > >> > #define XAXIDMA_BD_CTRL_TXEOF_MASK 0x04000000 /* Last tx packet > */ > >> > #define XAXIDMA_BD_CTRL_ALL_MASK 0x0C000000 /* All control bits */ > >> > diff --git a/drivers/net/ethernet/xilinx/xilinx_axienet_main.c > >> > b/drivers/net/ethernet/xilinx/xilinx_axienet_main.c > >> > index 1b7a653c1f4e..f9c7d90d4ecb 100644 > >> > --- a/drivers/net/ethernet/xilinx/xilinx_axienet_main.c > >> > +++ b/drivers/net/ethernet/xilinx/xilinx_axienet_main.c > >> > @@ -1505,6 +1505,7 @@ static int axienet_init_dmaengine(struct > >> > net_device *ndev) { > >> > struct axienet_local *lp = netdev_priv(ndev); > >> > struct skbuf_dma_descriptor *skbuf_dma; > >> > + struct dma_slave_config tx_config, rx_config; > >> > int i, ret; > >> > > >> > lp->tx_chan = dma_request_chan(lp->dev, "tx_chan0"); @@ > >> > -1520,6 > >> > +1521,22 @@ static int axienet_init_dmaengine(struct net_device > >> > +*ndev) > >> > goto err_dma_release_tx; > >> > } > >> > > >> > + tx_config.coalesce_cnt = XAXIDMAENGINE_DFT_TX_THRESHOLD; > 
>> > +       tx_config.coalesce_usecs = XAXIDMAENGINE_DFT_TX_USEC;
> >> > +       rx_config.coalesce_cnt = XAXIDMAENGINE_DFT_RX_THRESHOLD;
> >> > +       rx_config.coalesce_usecs = XAXIDMAENGINE_DFT_RX_USEC;
> >>
> >> I think it would be clearer to just do something like
> >>
> >>         struct dma_slave_config tx_config = {
> >>                 .coalesce_cnt = 16,
> >>                 .coalesce_usecs = 5,
> >>         };
> >>
> >> since these are only used once. And this ensures that you initialize the whole
> struct.
> >>
> >> But what tree are you using? I don't see these members on net-next or
> dmaengine.
> >
> > These changes are proposed in separate series in dmaengine
> https://lore.kernel.org/all/20250525101617.1168991-2-suraj.gupta2@amd.com/ and I
> described it here below my SOB.
>
> I think you should post those patches with this series to allow them to be reviewed
> appropriately.
>
> --Sean

The DMAengine series depends on a commit (https://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine.git/commit/drivers/dma/xilinx?h=next&id=7e01511443c30a55a5ae78d3debd46d4d872517e) in the dmaengine tree that is not yet in net-next, so I sent that series to dmaengine only. Please let me know if there is a way to send both as a single series.

> > >> > >> > + ret = dmaengine_slave_config(lp->tx_chan, &tx_config); > >> > + if (ret) { > >> > + dev_err(lp->dev, "Failed to configure Tx coalesce parameters\n"); > >> > + goto err_dma_release_tx; > >> > + } > >> > + ret = dmaengine_slave_config(lp->rx_chan, &rx_config); > >> > + if (ret) { > >> > + dev_err(lp->dev, "Failed to configure Rx coalesce parameters\n"); > >> > + goto err_dma_release_tx; > >> > + } > >> > + > >> > lp->tx_ring_tail = 0; > >> > lp->tx_ring_head = 0; > >> > lp->rx_ring_tail = 0; > >> > @@ -2170,6 +2187,19 @@ axienet_ethtools_get_coalesce(struct > >> > net_device > >> *ndev, > >> > struct axienet_local *lp = netdev_priv(ndev); > >> > u32 cr; > >> > > >> > + if (lp->use_dmaengine) { > >> > + struct dma_slave_caps tx_caps, rx_caps; > >> > + > >> > + dma_get_slave_caps(lp->tx_chan, &tx_caps); > >> > + dma_get_slave_caps(lp->rx_chan, &rx_caps); > >> > + > >> > + ecoalesce->tx_max_coalesced_frames = tx_caps.coalesce_cnt; > >> > + ecoalesce->tx_coalesce_usecs = tx_caps.coalesce_usecs; > >> > + ecoalesce->rx_max_coalesced_frames = rx_caps.coalesce_cnt; > >> > + ecoalesce->rx_coalesce_usecs = rx_caps.coalesce_usecs; > >> > + return 0; > >> > + } > >> > + > >> > ecoalesce->use_adaptive_rx_coalesce = lp->rx_dim_enabled; > >> > > >> > spin_lock_irq(&lp->rx_cr_lock); @@ -2233,6 +2263,29 @@ > >> > axienet_ethtools_set_coalesce(struct net_device > >> *ndev, > >> > return -EINVAL; > >> > } > >> > > >> > + if (lp->use_dmaengine) { > >> > + struct dma_slave_config tx_cfg, rx_cfg; > >> > + int ret; > >> > + > >> > + tx_cfg.coalesce_cnt = ecoalesce->tx_max_coalesced_frames; > >> > + tx_cfg.coalesce_usecs = ecoalesce->tx_coalesce_usecs; > >> > + rx_cfg.coalesce_cnt = ecoalesce->rx_max_coalesced_frames; > >> > + rx_cfg.coalesce_usecs = ecoalesce->rx_coalesce_usecs; > >> > + > >> > + ret = dmaengine_slave_config(lp->tx_chan, &tx_cfg); > >> > + if (ret) { > >> > + NL_SET_ERR_MSG(extack, "failed to set tx coalesce > parameters"); > >> > + return ret; > >> > + } > 
>> > + > >> > + ret = dmaengine_slave_config(lp->rx_chan, &rx_cfg); > >> > + if (ret) { > >> > + NL_SET_ERR_MSG(extack, "failed to set rx > >> > + coalesce > >> parameters"); > >> > + return ret; > >> > + } > >> > + return 0; > >> > + } > >> > + > >> > if (new_dim && !old_dim) { > >> > cr = axienet_calc_cr(lp, axienet_dim_coalesce_count_rx(lp), > >> > ecoalesce->rx_coalesce_usecs); ^ permalink raw reply [flat|nested] 13+ messages in thread
* RE: [PATCH net-next] net: xilinx: axienet: Configure and report coalesce parameters in DMAengine flow 2025-05-30 10:18 ` Gupta, Suraj @ 2025-05-30 11:53 ` Gupta, Suraj 2025-05-30 20:44 ` Sean Anderson 1 sibling, 0 replies; 13+ messages in thread From: Gupta, Suraj @ 2025-05-30 11:53 UTC (permalink / raw) To: Sean Anderson, andrew+netdev@lunn.ch, davem@davemloft.net, edumazet@google.com, kuba@kernel.org, pabeni@redhat.com, vkoul@kernel.org, Simek, Michal, Pandey, Radhey Shyam, horms@kernel.org Cc: netdev@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, git (AMD-Xilinx), Katakam, Harini [Public] > -----Original Message----- > From: Gupta, Suraj <Suraj.Gupta2@amd.com> > Sent: Friday, May 30, 2025 3:49 PM > To: Sean Anderson <sean.anderson@linux.dev>; andrew+netdev@lunn.ch; > davem@davemloft.net; edumazet@google.com; kuba@kernel.org; > pabeni@redhat.com; vkoul@kernel.org; Simek, Michal <michal.simek@amd.com>; > Pandey, Radhey Shyam <radhey.shyam.pandey@amd.com>; horms@kernel.org > Cc: netdev@vger.kernel.org; linux-arm-kernel@lists.infradead.org; linux- > kernel@vger.kernel.org; git (AMD-Xilinx) <git@amd.com>; Katakam, Harini > <harini.katakam@amd.com> > Subject: RE: [PATCH net-next] net: xilinx: axienet: Configure and report coalesce > parameters in DMAengine flow > > [AMD Official Use Only - AMD Internal Distribution Only] > *Modified sensitivity level to Public. 
> > -----Original Message----- > > From: Sean Anderson <sean.anderson@linux.dev> > > Sent: Thursday, May 29, 2025 9:48 PM > > To: Gupta, Suraj <Suraj.Gupta2@amd.com>; andrew+netdev@lunn.ch; > > davem@davemloft.net; edumazet@google.com; kuba@kernel.org; > > pabeni@redhat.com; vkoul@kernel.org; Simek, Michal > > <michal.simek@amd.com>; Pandey, Radhey Shyam > > <radhey.shyam.pandey@amd.com>; horms@kernel.org > > Cc: netdev@vger.kernel.org; linux-arm-kernel@lists.infradead.org; > > linux- kernel@vger.kernel.org; git (AMD-Xilinx) <git@amd.com>; > > Katakam, Harini <harini.katakam@amd.com> > > Subject: Re: [PATCH net-next] net: xilinx: axienet: Configure and > > report coalesce parameters in DMAengine flow > > > > Caution: This message originated from an External Source. Use proper > > caution when opening attachments, clicking links, or responding. > > > > > > On 5/28/25 08:00, Gupta, Suraj wrote: > > > [AMD Official Use Only - AMD Internal Distribution Only] > > > > > >> -----Original Message----- > > >> From: Sean Anderson <sean.anderson@linux.dev> > > >> Sent: Tuesday, May 27, 2025 9:47 PM > > >> To: Gupta, Suraj <Suraj.Gupta2@amd.com>; andrew+netdev@lunn.ch; > > >> davem@davemloft.net; edumazet@google.com; kuba@kernel.org; > > >> pabeni@redhat.com; vkoul@kernel.org; Simek, Michal > > >> <michal.simek@amd.com>; Pandey, Radhey Shyam > > >> <radhey.shyam.pandey@amd.com>; horms@kernel.org > > >> Cc: netdev@vger.kernel.org; linux-arm-kernel@lists.infradead.org; > > >> linux- kernel@vger.kernel.org; git (AMD-Xilinx) <git@amd.com>; > > >> Katakam, Harini <harini.katakam@amd.com> > > >> Subject: Re: [PATCH net-next] net: xilinx: axienet: Configure and > > >> report coalesce parameters in DMAengine flow > > >> > > >> Caution: This message originated from an External Source. Use > > >> proper caution when opening attachments, clicking links, or responding. 
> > >> > > >> > > >> On 5/25/25 06:22, Suraj Gupta wrote: > > >> > Add support to configure / report interrupt coalesce count and > > >> > delay via ethtool in DMAEngine flow. > > >> > Netperf numbers are not good when using non-dmaengine default > > >> > values, so tuned coalesce count and delay and defined separate > > >> > default values in dmaengine flow. > > >> > > > >> > Netperf numbers and CPU utilisation change in DMAengine flow > > >> > after introducing coalescing with default parameters: > > >> > coalesce parameters: > > >> > Transfer type Before(w/o coalescing) After(with coalescing) > > >> > TCP Tx, CPU utilisation% 925, 27 941, 22 > > >> > TCP Rx, CPU utilisation% 607, 32 741, 36 > > >> > UDP Tx, CPU utilisation% 857, 31 960, 28 > > >> > UDP Rx, CPU utilisation% 762, 26 783, 18 > > >> > > > >> > Above numbers are observed with 4x Cortex-a53. > > >> > > >> How does this affect latency? I would expect these RX settings to > > >> increase latency around 5-10x. I only use these settings with DIM > > >> since it will disable coalescing during periods of light load for better latency. > > >> > > >> (of course the way to fix this in general is RSS or some other > > >> method involving multiple queues). > > >> > > > > > > I took values before NAPI addition in legacy flow (rx_threshold: 24, > > > rx_usec: 50) as > > reference. But netperf numbers were low with them, so tried tuning > > both and selected the pair which gives good numbers. > > > > Yeah, but the reason is that you are trading latency for throughput. > > There is only one queue, so when the interface is saturated you will > > not get good latency anyway (since latency-sensitive packets will get > > head-of-line blocked). But when activity is sparse you can good > > latency if there is no coalescing. So I think coalescing should only > > be used when there is a lot of traffic. Hence why I only adjusted the > > settings once I implemented DIM. 
I think you should be able to > > implement it by calling net_dim from axienet_dma_rx_cb, but it will not be as > efficient without NAPI. > > > > Ok, got it. I'll keep default values used before NAPI in legacy flow (coalesce count: > 24, delay: 50) for both Tx and Rx and remove perf comparisons. > > > Actually, if you are looking into improving performance, I think lack > > of NAPI is probably the biggest limitation with the dmaengine backend. > > > Yes, I agree. NAPI for DMAEngine implementation is underway and will be sent to > mainline soon. > > > >> > Signed-off-by: Suraj Gupta <suraj.gupta2@amd.com> > > >> > --- > > >> > This patch depend on following AXI DMA dmengine driver changes > > >> > sent to dmaengine mailing list as pre-requisit series: > > >> > https://lore.kernel.org/all/20250525101617.1168991-1-suraj.gupta2@amd. > > >> > com/ > > >> > --- > > >> > drivers/net/ethernet/xilinx/xilinx_axienet.h | 6 +++ > > >> > .../net/ethernet/xilinx/xilinx_axienet_main.c | 53 > > >> > +++++++++++++++++++ > > >> > 2 files changed, 59 insertions(+) > > >> > > > >> > diff --git a/drivers/net/ethernet/xilinx/xilinx_axienet.h > > >> > b/drivers/net/ethernet/xilinx/xilinx_axienet.h > > >> > index 5ff742103beb..cdf6cbb6f2fd 100644 > > >> > --- a/drivers/net/ethernet/xilinx/xilinx_axienet.h > > >> > +++ b/drivers/net/ethernet/xilinx/xilinx_axienet.h > > >> > @@ -126,6 +126,12 @@ > > >> > #define XAXIDMA_DFT_TX_USEC 50 > > >> > #define XAXIDMA_DFT_RX_USEC 16 > > >> > > > >> > +/* Default TX/RX Threshold and delay timer values for SGDMA mode > > >> > +with > > >> DMAEngine */ > > >> > +#define XAXIDMAENGINE_DFT_TX_THRESHOLD 16 > > >> > +#define XAXIDMAENGINE_DFT_TX_USEC 5 > > >> > +#define XAXIDMAENGINE_DFT_RX_THRESHOLD 24 > > >> > +#define XAXIDMAENGINE_DFT_RX_USEC 16 > > >> > + > > >> > #define XAXIDMA_BD_CTRL_TXSOF_MASK 0x08000000 /* First tx > packet > > */ > > >> > #define XAXIDMA_BD_CTRL_TXEOF_MASK 0x04000000 /* Last tx > packet > > */ > > >> > #define 
XAXIDMA_BD_CTRL_ALL_MASK 0x0C000000 /* All control bits > */ > > >> > diff --git a/drivers/net/ethernet/xilinx/xilinx_axienet_main.c > > >> > b/drivers/net/ethernet/xilinx/xilinx_axienet_main.c > > >> > index 1b7a653c1f4e..f9c7d90d4ecb 100644 > > >> > --- a/drivers/net/ethernet/xilinx/xilinx_axienet_main.c > > >> > +++ b/drivers/net/ethernet/xilinx/xilinx_axienet_main.c > > >> > @@ -1505,6 +1505,7 @@ static int axienet_init_dmaengine(struct > > >> > net_device *ndev) { > > >> > struct axienet_local *lp = netdev_priv(ndev); > > >> > struct skbuf_dma_descriptor *skbuf_dma; > > >> > + struct dma_slave_config tx_config, rx_config; > > >> > int i, ret; > > >> > > > >> > lp->tx_chan = dma_request_chan(lp->dev, "tx_chan0"); @@ > > >> > -1520,6 > > >> > +1521,22 @@ static int axienet_init_dmaengine(struct net_device > > >> > +*ndev) > > >> > goto err_dma_release_tx; > > >> > } > > >> > > > >> > + tx_config.coalesce_cnt = XAXIDMAENGINE_DFT_TX_THRESHOLD; > > >> > + tx_config.coalesce_usecs = XAXIDMAENGINE_DFT_TX_USEC; > > >> > + rx_config.coalesce_cnt = XAXIDMAENGINE_DFT_RX_THRESHOLD; > > >> > + rx_config.coalesce_usecs = XAXIDMAENGINE_DFT_RX_USEC; > > >> > > >> I think it would be clearer to just do something like > > >> > > >> struct dma_slave_config tx_config = { > > >> .coalesce_cnt = 16, > > >> .coalesce_usecs = 5, > > >> }; > > >> > > >> since these are only used once. And this ensures that you > > >> initialize the whole > > struct. > > >> > > >> But what tree are you using? I don't see these members on net-next > > >> or > > dmaengine. > > > > > > These changes are proposed in separate series in dmaengine > > https://lore.kernel.org/all/20250525101617.1168991-2-suraj.gupta2@amd. > > com/ and I described it here below my SOB. > > > > I think you should post those patches with this series to allow them > > to be reviewed appropriately. 
> > > > --Sean > > DMAengine series functionality depends on commit > (https://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine.git/commit/drivers/d > ma/xilinx?h=next&id=7e01511443c30a55a5ae78d3debd46d4d872517e) in > dmaengine which is currently not there in net-next. So I sent that to dmaengine only. > Please let me know if any way to send as single series. > > > > >> > > >> > + ret = dmaengine_slave_config(lp->tx_chan, &tx_config); > > >> > + if (ret) { > > >> > + dev_err(lp->dev, "Failed to configure Tx coalesce parameters\n"); > > >> > + goto err_dma_release_tx; > > >> > + } > > >> > + ret = dmaengine_slave_config(lp->rx_chan, &rx_config); > > >> > + if (ret) { > > >> > + dev_err(lp->dev, "Failed to configure Rx coalesce parameters\n"); > > >> > + goto err_dma_release_tx; > > >> > + } > > >> > + > > >> > lp->tx_ring_tail = 0; > > >> > lp->tx_ring_head = 0; > > >> > lp->rx_ring_tail = 0; > > >> > @@ -2170,6 +2187,19 @@ axienet_ethtools_get_coalesce(struct > > >> > net_device > > >> *ndev, > > >> > struct axienet_local *lp = netdev_priv(ndev); > > >> > u32 cr; > > >> > > > >> > + if (lp->use_dmaengine) { > > >> > + struct dma_slave_caps tx_caps, rx_caps; > > >> > + > > >> > + dma_get_slave_caps(lp->tx_chan, &tx_caps); > > >> > + dma_get_slave_caps(lp->rx_chan, &rx_caps); > > >> > + > > >> > + ecoalesce->tx_max_coalesced_frames = tx_caps.coalesce_cnt; > > >> > + ecoalesce->tx_coalesce_usecs = tx_caps.coalesce_usecs; > > >> > + ecoalesce->rx_max_coalesced_frames = rx_caps.coalesce_cnt; > > >> > + ecoalesce->rx_coalesce_usecs = rx_caps.coalesce_usecs; > > >> > + return 0; > > >> > + } > > >> > + > > >> > ecoalesce->use_adaptive_rx_coalesce = lp->rx_dim_enabled; > > >> > > > >> > spin_lock_irq(&lp->rx_cr_lock); @@ -2233,6 +2263,29 @@ > > >> > axienet_ethtools_set_coalesce(struct net_device > > >> *ndev, > > >> > return -EINVAL; > > >> > } > > >> > > > >> > + if (lp->use_dmaengine) { > > >> > + struct dma_slave_config tx_cfg, rx_cfg; > > >> > + int ret; > > >> 
> + > > >> > + tx_cfg.coalesce_cnt = ecoalesce->tx_max_coalesced_frames; > > >> > + tx_cfg.coalesce_usecs = ecoalesce->tx_coalesce_usecs; > > >> > + rx_cfg.coalesce_cnt = ecoalesce->rx_max_coalesced_frames; > > >> > + rx_cfg.coalesce_usecs = > > >> > + ecoalesce->rx_coalesce_usecs; > > >> > + > > >> > + ret = dmaengine_slave_config(lp->tx_chan, &tx_cfg); > > >> > + if (ret) { > > >> > + NL_SET_ERR_MSG(extack, "failed to set tx > > >> > + coalesce > > parameters"); > > >> > + return ret; > > >> > + } > > >> > + > > >> > + ret = dmaengine_slave_config(lp->rx_chan, &rx_cfg); > > >> > + if (ret) { > > >> > + NL_SET_ERR_MSG(extack, "failed to set rx > > >> > + coalesce > > >> parameters"); > > >> > + return ret; > > >> > + } > > >> > + return 0; > > >> > + } > > >> > + > > >> > if (new_dim && !old_dim) { > > >> > cr = axienet_calc_cr(lp, axienet_dim_coalesce_count_rx(lp), > > >> > ecoalesce->rx_coalesce_usecs); ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH net-next] net: xilinx: axienet: Configure and report coalesce parameters in DMAengine flow
2025-05-30 10:18 ` Gupta, Suraj
2025-05-30 11:53 ` Gupta, Suraj
@ 2025-05-30 20:44 ` Sean Anderson
2025-06-03 11:07 ` Gupta, Suraj
1 sibling, 1 reply; 13+ messages in thread
From: Sean Anderson @ 2025-05-30 20:44 UTC
To: Gupta, Suraj, andrew+netdev@lunn.ch, davem@davemloft.net, edumazet@google.com, kuba@kernel.org, pabeni@redhat.com, vkoul@kernel.org, Simek, Michal, Pandey, Radhey Shyam, horms@kernel.org
Cc: netdev@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, git (AMD-Xilinx), Katakam, Harini

On 5/30/25 06:18, Gupta, Suraj wrote:
> [AMD Official Use Only - AMD Internal Distribution Only]
>
>> -----Original Message-----
>> From: Sean Anderson <sean.anderson@linux.dev>
>> Sent: Thursday, May 29, 2025 9:48 PM
>> To: Gupta, Suraj <Suraj.Gupta2@amd.com>; andrew+netdev@lunn.ch;
>> davem@davemloft.net; edumazet@google.com; kuba@kernel.org;
>> pabeni@redhat.com; vkoul@kernel.org; Simek, Michal <michal.simek@amd.com>;
>> Pandey, Radhey Shyam <radhey.shyam.pandey@amd.com>; horms@kernel.org
>> Cc: netdev@vger.kernel.org; linux-arm-kernel@lists.infradead.org;
>> linux-kernel@vger.kernel.org; git (AMD-Xilinx) <git@amd.com>; Katakam, Harini
>> <harini.katakam@amd.com>
>> Subject: Re: [PATCH net-next] net: xilinx: axienet: Configure and report coalesce
>> parameters in DMAengine flow
>>
>> Caution: This message originated from an External Source. Use proper caution
>> when opening attachments, clicking links, or responding.
>> >> >> On 5/28/25 08:00, Gupta, Suraj wrote: >> > [AMD Official Use Only - AMD Internal Distribution Only] >> > >> >> -----Original Message----- >> >> From: Sean Anderson <sean.anderson@linux.dev> >> >> Sent: Tuesday, May 27, 2025 9:47 PM >> >> To: Gupta, Suraj <Suraj.Gupta2@amd.com>; andrew+netdev@lunn.ch; >> >> davem@davemloft.net; edumazet@google.com; kuba@kernel.org; >> >> pabeni@redhat.com; vkoul@kernel.org; Simek, Michal >> >> <michal.simek@amd.com>; Pandey, Radhey Shyam >> >> <radhey.shyam.pandey@amd.com>; horms@kernel.org >> >> Cc: netdev@vger.kernel.org; linux-arm-kernel@lists.infradead.org; >> >> linux- kernel@vger.kernel.org; git (AMD-Xilinx) <git@amd.com>; >> >> Katakam, Harini <harini.katakam@amd.com> >> >> Subject: Re: [PATCH net-next] net: xilinx: axienet: Configure and >> >> report coalesce parameters in DMAengine flow >> >> >> >> Caution: This message originated from an External Source. Use proper >> >> caution when opening attachments, clicking links, or responding. >> >> >> >> >> >> On 5/25/25 06:22, Suraj Gupta wrote: >> >> > Add support to configure / report interrupt coalesce count and >> >> > delay via ethtool in DMAEngine flow. >> >> > Netperf numbers are not good when using non-dmaengine default >> >> > values, so tuned coalesce count and delay and defined separate >> >> > default values in dmaengine flow. >> >> > >> >> > Netperf numbers and CPU utilisation change in DMAengine flow after >> >> > introducing coalescing with default parameters: >> >> > coalesce parameters: >> >> > Transfer type Before(w/o coalescing) After(with coalescing) >> >> > TCP Tx, CPU utilisation% 925, 27 941, 22 >> >> > TCP Rx, CPU utilisation% 607, 32 741, 36 >> >> > UDP Tx, CPU utilisation% 857, 31 960, 28 >> >> > UDP Rx, CPU utilisation% 762, 26 783, 18 >> >> > >> >> > Above numbers are observed with 4x Cortex-a53. >> >> >> >> How does this affect latency? I would expect these RX settings to >> >> increase latency around 5-10x. 
>> >> I only use these settings with DIM
>> >> since it will disable coalescing during periods of light load for better latency.
>> >>
>> >> (of course the way to fix this in general is RSS or some other method
>> >> involving multiple queues).
>> >>
>> >
>> > I took values before NAPI addition in legacy flow (rx_threshold: 24, rx_usec: 50) as
>> > reference. But netperf numbers were low with them, so tried tuning both and
>> > selected the pair which gives good numbers.
>>
>> Yeah, but the reason is that you are trading latency for throughput.
>> There is only one queue, so when the interface is saturated you will not get good
>> latency anyway (since latency-sensitive packets will get head-of-line blocked). But
>> when activity is sparse you can good latency if there is no coalescing. So I think
>> coalescing should only be used when there is a lot of traffic. Hence why I only
>> adjusted the settings once I implemented DIM. I think you should be able to
>> implement it by calling net_dim from axienet_dma_rx_cb, but it will not be as efficient
>> without NAPI.
>>
>
> Ok, got it. I'll keep default values used before NAPI in legacy flow (coalesce count: 24, delay: 50) for both Tx and Rx and remove perf comparisons.

Those settings are actually probably even worse for latency. I'd leave
the settings at 0/0 (coalescing disabled) to match the existing
behavior. I think the perf comparisons are helpful, especially for
people who know they are going to be throughput-limited.

My main point is that I think extending the dmaengine API to allow for
DIM will have practical benefits in reduced latency.

>> Actually, if you are looking into improving performance, I think lack of NAPI is
>> probably the biggest limitation with the dmaengine backend.
>>
> Yes, I agree. NAPI for DMAEngine implementation is underway and will be sent to mainline soon.

Looking forward to it.

>> >> > Signed-off-by: Suraj Gupta <suraj.gupta2@amd.com> >> >> > --- >> >> > This patch depend on following AXI DMA dmengine driver changes sent >> >> > to dmaengine mailing list as pre-requisit series: >> >> > https://lore.kernel.org/all/20250525101617.1168991-1-suraj.gupta2@amd. >> >> > com/ >> >> > --- >> >> > drivers/net/ethernet/xilinx/xilinx_axienet.h | 6 +++ >> >> > .../net/ethernet/xilinx/xilinx_axienet_main.c | 53 >> >> > +++++++++++++++++++ >> >> > 2 files changed, 59 insertions(+) >> >> > >> >> > diff --git a/drivers/net/ethernet/xilinx/xilinx_axienet.h >> >> > b/drivers/net/ethernet/xilinx/xilinx_axienet.h >> >> > index 5ff742103beb..cdf6cbb6f2fd 100644 >> >> > --- a/drivers/net/ethernet/xilinx/xilinx_axienet.h >> >> > +++ b/drivers/net/ethernet/xilinx/xilinx_axienet.h >> >> > @@ -126,6 +126,12 @@ >> >> > #define XAXIDMA_DFT_TX_USEC 50 >> >> > #define XAXIDMA_DFT_RX_USEC 16 >> >> > >> >> > +/* Default TX/RX Threshold and delay timer values for SGDMA mode >> >> > +with >> >> DMAEngine */ >> >> > +#define XAXIDMAENGINE_DFT_TX_THRESHOLD 16 >> >> > +#define XAXIDMAENGINE_DFT_TX_USEC 5 >> >> > +#define XAXIDMAENGINE_DFT_RX_THRESHOLD 24 >> >> > +#define XAXIDMAENGINE_DFT_RX_USEC 16 >> >> > + >> >> > #define XAXIDMA_BD_CTRL_TXSOF_MASK 0x08000000 /* First tx packet >> */ >> >> > #define XAXIDMA_BD_CTRL_TXEOF_MASK 0x04000000 /* Last tx packet >> */ >> >> > #define XAXIDMA_BD_CTRL_ALL_MASK 0x0C000000 /* All control bits */ >> >> > diff --git a/drivers/net/ethernet/xilinx/xilinx_axienet_main.c >> >> > b/drivers/net/ethernet/xilinx/xilinx_axienet_main.c >> >> > index 1b7a653c1f4e..f9c7d90d4ecb 100644 >> >> > --- a/drivers/net/ethernet/xilinx/xilinx_axienet_main.c >> >> > +++ b/drivers/net/ethernet/xilinx/xilinx_axienet_main.c >> >> > @@ -1505,6 +1505,7 @@ static int axienet_init_dmaengine(struct >> >> > net_device *ndev) { >> >> > struct axienet_local *lp = netdev_priv(ndev); >> >> > struct skbuf_dma_descriptor *skbuf_dma; >> >> > + struct dma_slave_config 
tx_config, rx_config;
>> >> >       int i, ret;
>> >> >
>> >> >       lp->tx_chan = dma_request_chan(lp->dev, "tx_chan0");
>> >> > @@ -1520,6 +1521,22 @@ static int axienet_init_dmaengine(struct net_device *ndev)
>> >> >               goto err_dma_release_tx;
>> >> >       }
>> >> >
>> >> > +     tx_config.coalesce_cnt = XAXIDMAENGINE_DFT_TX_THRESHOLD;
>> >> > +     tx_config.coalesce_usecs = XAXIDMAENGINE_DFT_TX_USEC;
>> >> > +     rx_config.coalesce_cnt = XAXIDMAENGINE_DFT_RX_THRESHOLD;
>> >> > +     rx_config.coalesce_usecs = XAXIDMAENGINE_DFT_RX_USEC;
>> >>
>> >> I think it would be clearer to just do something like
>> >>
>> >> struct dma_slave_config tx_config = {
>> >>         .coalesce_cnt = 16,
>> >>         .coalesce_usecs = 5,
>> >> };
>> >>
>> >> since these are only used once. And this ensures that you initialize the whole
>> >> struct.
>> >>
>> >> But what tree are you using? I don't see these members on net-next or
>> >> dmaengine.
>> >
>> > These changes are proposed in separate series in dmaengine
>> > https://lore.kernel.org/all/20250525101617.1168991-2-suraj.gupta2@amd.com/ and I
>> > described it here below my SOB.
>>
>> I think you should post those patches with this series to allow them to be reviewed
>> appropriately.
>>
>> --Sean
>
> DMAengine series functionality depends on commit
> (https://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine.git/commit/drivers/dma/xilinx?h=next&id=7e01511443c30a55a5ae78d3debd46d4d872517e)
> in dmaengine which is currently not there in net-next. So I sent that
> to dmaengine only. Please let me know if any way to send as single
> series.

It looks like this won't cause any conflicts, so I think you can just
send the whole series with a note in the cover letter like

| This series depends on commit 7e01511443c3 ("dmaengine: xilinx_dma:
| Set dma_device directions") currently in dmaengine/next.

--Sean

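The designated-initializer suggestion discussed above can be illustrated with a minimal, self-contained sketch. `struct demo_slave_config` and `demo_tx_defaults()` are hypothetical stand-ins invented for this example; they are not the kernel's `struct dma_slave_config`, and the `coalesce_cnt` / `coalesce_usecs` members only mirror the fields proposed in the separate dmaengine series (per the thread, they are not in net-next or dmaengine). The point being shown is standard C behaviour: members not named in a designated initializer are zero-initialized, whereas a local struct that is declared and then assigned member by member leaves the remaining members indeterminate.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for struct dma_slave_config. The first two
 * members represent "other" fields the caller never touches; the last
 * two mirror the coalescing fields proposed in the dmaengine series.
 */
struct demo_slave_config {
	uint32_t direction;      /* never set explicitly below */
	uint32_t src_maxburst;   /* never set explicitly below */
	uint32_t coalesce_cnt;
	uint32_t coalesce_usecs;
};

/*
 * Build a config using a designated initializer. Every member that is
 * not named here (direction, src_maxburst) is implicitly set to zero,
 * so the whole struct is initialized before it is handed to anyone.
 */
struct demo_slave_config demo_tx_defaults(void)
{
	struct demo_slave_config cfg = {
		.coalesce_cnt = 16,
		.coalesce_usecs = 5,
	};

	return cfg;
}
```

With the patch's original pattern (declare `tx_config`, then assign two members), the other members of the on-stack struct would hold whatever was on the stack, which is the concern behind "this ensures that you initialize the whole struct".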
* RE: [PATCH net-next] net: xilinx: axienet: Configure and report coalesce parameters in DMAengine flow
2025-05-30 20:44 ` Sean Anderson
@ 2025-06-03 11:07 ` Gupta, Suraj
2025-06-09 16:22 ` Sean Anderson
0 siblings, 1 reply; 13+ messages in thread
From: Gupta, Suraj @ 2025-06-03 11:07 UTC
To: Sean Anderson, andrew+netdev@lunn.ch, davem@davemloft.net, edumazet@google.com, kuba@kernel.org, pabeni@redhat.com, vkoul@kernel.org, Simek, Michal, Pandey, Radhey Shyam, horms@kernel.org
Cc: netdev@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, git (AMD-Xilinx), Katakam, Harini

[Public]

> -----Original Message-----
> From: Sean Anderson <sean.anderson@linux.dev>
> Sent: Saturday, May 31, 2025 2:15 AM
> To: Gupta, Suraj <Suraj.Gupta2@amd.com>; andrew+netdev@lunn.ch;
> davem@davemloft.net; edumazet@google.com; kuba@kernel.org;
> pabeni@redhat.com; vkoul@kernel.org; Simek, Michal <michal.simek@amd.com>;
> Pandey, Radhey Shyam <radhey.shyam.pandey@amd.com>; horms@kernel.org
> Cc: netdev@vger.kernel.org; linux-arm-kernel@lists.infradead.org;
> linux-kernel@vger.kernel.org; git (AMD-Xilinx) <git@amd.com>; Katakam, Harini
> <harini.katakam@amd.com>
> Subject: Re: [PATCH net-next] net: xilinx: axienet: Configure and report coalesce
> parameters in DMAengine flow
>
> Caution: This message originated from an External Source. Use proper caution
> when opening attachments, clicking links, or responding.
> > > On 5/30/25 06:18, Gupta, Suraj wrote: > > [AMD Official Use Only - AMD Internal Distribution Only] > > > >> -----Original Message----- > >> From: Sean Anderson <sean.anderson@linux.dev> > >> Sent: Thursday, May 29, 2025 9:48 PM > >> To: Gupta, Suraj <Suraj.Gupta2@amd.com>; andrew+netdev@lunn.ch; > >> davem@davemloft.net; edumazet@google.com; kuba@kernel.org; > >> pabeni@redhat.com; vkoul@kernel.org; Simek, Michal > >> <michal.simek@amd.com>; Pandey, Radhey Shyam > >> <radhey.shyam.pandey@amd.com>; horms@kernel.org > >> Cc: netdev@vger.kernel.org; linux-arm-kernel@lists.infradead.org; > >> linux- kernel@vger.kernel.org; git (AMD-Xilinx) <git@amd.com>; > >> Katakam, Harini <harini.katakam@amd.com> > >> Subject: Re: [PATCH net-next] net: xilinx: axienet: Configure and > >> report coalesce parameters in DMAengine flow > >> > >> Caution: This message originated from an External Source. Use proper > >> caution when opening attachments, clicking links, or responding. > >> > >> > >> On 5/28/25 08:00, Gupta, Suraj wrote: > >> > [AMD Official Use Only - AMD Internal Distribution Only] > >> > > >> >> -----Original Message----- > >> >> From: Sean Anderson <sean.anderson@linux.dev> > >> >> Sent: Tuesday, May 27, 2025 9:47 PM > >> >> To: Gupta, Suraj <Suraj.Gupta2@amd.com>; andrew+netdev@lunn.ch; > >> >> davem@davemloft.net; edumazet@google.com; kuba@kernel.org; > >> >> pabeni@redhat.com; vkoul@kernel.org; Simek, Michal > >> >> <michal.simek@amd.com>; Pandey, Radhey Shyam > >> >> <radhey.shyam.pandey@amd.com>; horms@kernel.org > >> >> Cc: netdev@vger.kernel.org; linux-arm-kernel@lists.infradead.org; > >> >> linux- kernel@vger.kernel.org; git (AMD-Xilinx) <git@amd.com>; > >> >> Katakam, Harini <harini.katakam@amd.com> > >> >> Subject: Re: [PATCH net-next] net: xilinx: axienet: Configure and > >> >> report coalesce parameters in DMAengine flow > >> >> > >> >> Caution: This message originated from an External Source. 
Use > >> >> proper caution when opening attachments, clicking links, or responding. > >> >> > >> >> > >> >> On 5/25/25 06:22, Suraj Gupta wrote: > >> >> > Add support to configure / report interrupt coalesce count and > >> >> > delay via ethtool in DMAEngine flow. > >> >> > Netperf numbers are not good when using non-dmaengine default > >> >> > values, so tuned coalesce count and delay and defined separate > >> >> > default values in dmaengine flow. > >> >> > > >> >> > Netperf numbers and CPU utilisation change in DMAengine flow > >> >> > after introducing coalescing with default parameters: > >> >> > coalesce parameters: > >> >> > Transfer type Before(w/o coalescing) After(with coalescing) > >> >> > TCP Tx, CPU utilisation% 925, 27 941, 22 > >> >> > TCP Rx, CPU utilisation% 607, 32 741, 36 > >> >> > UDP Tx, CPU utilisation% 857, 31 960, 28 > >> >> > UDP Rx, CPU utilisation% 762, 26 783, 18 > >> >> > > >> >> > Above numbers are observed with 4x Cortex-a53. > >> >> > >> >> How does this affect latency? I would expect these RX settings to > >> >> increase latency around 5-10x. I only use these settings with DIM > >> >> since it will disable coalescing during periods of light load for better latency. > >> >> > >> >> (of course the way to fix this in general is RSS or some other > >> >> method involving multiple queues). > >> >> > >> > > >> > I took values before NAPI addition in legacy flow (rx_threshold: > >> > 24, rx_usec: 50) as > >> reference. But netperf numbers were low with them, so tried tuning > >> both and selected the pair which gives good numbers. > >> > >> Yeah, but the reason is that you are trading latency for throughput. > >> There is only one queue, so when the interface is saturated you will > >> not get good latency anyway (since latency-sensitive packets will get > >> head-of-line blocked). But when activity is sparse you can good > >> latency if there is no coalescing. So I think coalescing should only > >> be used when there is a lot of traffic. 
> >> Hence why I only adjusted the
> >> settings once I implemented DIM. I think you should be able to
> >> implement it by calling net_dim from axienet_dma_rx_cb, but it will not be as
> >> efficient without NAPI.
> >>
> >
> > Ok, got it. I'll keep default values used before NAPI in legacy flow (coalesce count:
> > 24, delay: 50) for both Tx and Rx and remove perf comparisons.
>
> Those settings are actually probably even worse for latency. I'd leave the settings at
> 0/0 (coalescing disabled) to match the existing behavior. I think the perf comparisons
> are helpful, especially for people who know they are going to be throughput-limited.
>
> My main point is that I think extending the dmaengine API to allow for DIM will have
> practical benefits in reduced latency.
>

Sure, will implement DIM for both Tx and Rx in next version. However, I noticed it's implemented for Rx only in legacy flow. Is there any specific reason for that?

> >> Actually, if you are looking into improving performance, I think lack
> >> of NAPI is probably the biggest limitation with the dmaengine backend.
> >>
> > Yes, I agree. NAPI for DMAEngine implementation is underway and will be sent to
> > mainline soon.
>
> Looking forward to it.
>
> >> >> > Signed-off-by: Suraj Gupta <suraj.gupta2@amd.com>
> >> >> > ---
> >> >> > This patch depend on following AXI DMA dmengine driver changes
> >> >> > sent to dmaengine mailing list as pre-requisit series:
> >> >> > https://lore.kernel.org/all/20250525101617.1168991-1-suraj.gupta2@amd.
> >> >> > com/ > >> >> > --- > >> >> > drivers/net/ethernet/xilinx/xilinx_axienet.h | 6 +++ > >> >> > .../net/ethernet/xilinx/xilinx_axienet_main.c | 53 > >> >> > +++++++++++++++++++ > >> >> > 2 files changed, 59 insertions(+) > >> >> > > >> >> > diff --git a/drivers/net/ethernet/xilinx/xilinx_axienet.h > >> >> > b/drivers/net/ethernet/xilinx/xilinx_axienet.h > >> >> > index 5ff742103beb..cdf6cbb6f2fd 100644 > >> >> > --- a/drivers/net/ethernet/xilinx/xilinx_axienet.h > >> >> > +++ b/drivers/net/ethernet/xilinx/xilinx_axienet.h > >> >> > @@ -126,6 +126,12 @@ > >> >> > #define XAXIDMA_DFT_TX_USEC 50 > >> >> > #define XAXIDMA_DFT_RX_USEC 16 > >> >> > > >> >> > +/* Default TX/RX Threshold and delay timer values for SGDMA > >> >> > +mode with > >> >> DMAEngine */ > >> >> > +#define XAXIDMAENGINE_DFT_TX_THRESHOLD 16 > >> >> > +#define XAXIDMAENGINE_DFT_TX_USEC 5 > >> >> > +#define XAXIDMAENGINE_DFT_RX_THRESHOLD 24 > >> >> > +#define XAXIDMAENGINE_DFT_RX_USEC 16 > >> >> > + > >> >> > #define XAXIDMA_BD_CTRL_TXSOF_MASK 0x08000000 /* First tx > packet > >> */ > >> >> > #define XAXIDMA_BD_CTRL_TXEOF_MASK 0x04000000 /* Last tx > packet > >> */ > >> >> > #define XAXIDMA_BD_CTRL_ALL_MASK 0x0C000000 /* All control > bits */ > >> >> > diff --git a/drivers/net/ethernet/xilinx/xilinx_axienet_main.c > >> >> > b/drivers/net/ethernet/xilinx/xilinx_axienet_main.c > >> >> > index 1b7a653c1f4e..f9c7d90d4ecb 100644 > >> >> > --- a/drivers/net/ethernet/xilinx/xilinx_axienet_main.c > >> >> > +++ b/drivers/net/ethernet/xilinx/xilinx_axienet_main.c > >> >> > @@ -1505,6 +1505,7 @@ static int axienet_init_dmaengine(struct > >> >> > net_device *ndev) { > >> >> > struct axienet_local *lp = netdev_priv(ndev); > >> >> > struct skbuf_dma_descriptor *skbuf_dma; > >> >> > + struct dma_slave_config tx_config, rx_config; > >> >> > int i, ret; > >> >> > > >> >> > lp->tx_chan = dma_request_chan(lp->dev, "tx_chan0"); @@ > >> >> > -1520,6 > >> >> > +1521,22 @@ static int axienet_init_dmaengine(struct 
net_device *ndev)
> >> >> >               goto err_dma_release_tx;
> >> >> >       }
> >> >> >
> >> >> > +     tx_config.coalesce_cnt = XAXIDMAENGINE_DFT_TX_THRESHOLD;
> >> >> > +     tx_config.coalesce_usecs = XAXIDMAENGINE_DFT_TX_USEC;
> >> >> > +     rx_config.coalesce_cnt = XAXIDMAENGINE_DFT_RX_THRESHOLD;
> >> >> > +     rx_config.coalesce_usecs = XAXIDMAENGINE_DFT_RX_USEC;
> >> >>
> >> >> I think it would be clearer to just do something like
> >> >>
> >> >> struct dma_slave_config tx_config = {
> >> >>         .coalesce_cnt = 16,
> >> >>         .coalesce_usecs = 5,
> >> >> };
> >> >>
> >> >> since these are only used once. And this ensures that you
> >> >> initialize the whole struct.
> >> >>
> >> >> But what tree are you using? I don't see these members on net-next
> >> >> or dmaengine.
> >> >
> >> > These changes are proposed in separate series in dmaengine
> >> > https://lore.kernel.org/all/20250525101617.1168991-2-suraj.gupta2@amd.com/
> >> > and I described it here below my SOB.
> >>
> >> I think you should post those patches with this series to allow them
> >> to be reviewed appropriately.
> >>
> >> --Sean
> >
> > DMAengine series functionality depends on commit
> > (https://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine.git/commit/drivers/dma/xilinx?h=next&id=7e01511443c30a55a5ae78d3debd46d4d872517e)
> > in dmaengine which is currently not there in net-next. So I
> > sent that to dmaengine only. Please let me know if any way to send as
> > single series.
>
> It looks like this won't cause any conflicts, so I think you can just send the whole
> series with a note in the cover letter like
>
> | This series depends on commit 7e01511443c3 ("dmaengine: xilinx_dma:
> | Set dma_device directions") currently in dmaengine/next.
>
> --Sean

Sure

* Re: [PATCH net-next] net: xilinx: axienet: Configure and report coalesce parameters in DMAengine flow
2025-06-03 11:07 ` Gupta, Suraj
@ 2025-06-09 16:22 ` Sean Anderson
0 siblings, 0 replies; 13+ messages in thread
From: Sean Anderson @ 2025-06-09 16:22 UTC
To: Gupta, Suraj, andrew+netdev@lunn.ch, davem@davemloft.net, edumazet@google.com, kuba@kernel.org, pabeni@redhat.com, vkoul@kernel.org, Simek, Michal, Pandey, Radhey Shyam, horms@kernel.org
Cc: netdev@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, git (AMD-Xilinx), Katakam, Harini

On 6/3/25 07:07, Gupta, Suraj wrote:
> [Public]
>
>> -----Original Message-----
>> From: Sean Anderson <sean.anderson@linux.dev>
>> Sent: Saturday, May 31, 2025 2:15 AM
>> To: Gupta, Suraj <Suraj.Gupta2@amd.com>; andrew+netdev@lunn.ch;
>> davem@davemloft.net; edumazet@google.com; kuba@kernel.org;
>> pabeni@redhat.com; vkoul@kernel.org; Simek, Michal <michal.simek@amd.com>;
>> Pandey, Radhey Shyam <radhey.shyam.pandey@amd.com>; horms@kernel.org
>> Cc: netdev@vger.kernel.org; linux-arm-kernel@lists.infradead.org;
>> linux-kernel@vger.kernel.org; git (AMD-Xilinx) <git@amd.com>; Katakam, Harini
>> <harini.katakam@amd.com>
>> Subject: Re: [PATCH net-next] net: xilinx: axienet: Configure and report coalesce
>> parameters in DMAengine flow
>>
>> Caution: This message originated from an External Source. Use proper caution
>> when opening attachments, clicking links, or responding.
>> >> >> On 5/30/25 06:18, Gupta, Suraj wrote: >> > [AMD Official Use Only - AMD Internal Distribution Only] >> > >> >> -----Original Message----- >> >> From: Sean Anderson <sean.anderson@linux.dev> >> >> Sent: Thursday, May 29, 2025 9:48 PM >> >> To: Gupta, Suraj <Suraj.Gupta2@amd.com>; andrew+netdev@lunn.ch; >> >> davem@davemloft.net; edumazet@google.com; kuba@kernel.org; >> >> pabeni@redhat.com; vkoul@kernel.org; Simek, Michal >> >> <michal.simek@amd.com>; Pandey, Radhey Shyam >> >> <radhey.shyam.pandey@amd.com>; horms@kernel.org >> >> Cc: netdev@vger.kernel.org; linux-arm-kernel@lists.infradead.org; >> >> linux- kernel@vger.kernel.org; git (AMD-Xilinx) <git@amd.com>; >> >> Katakam, Harini <harini.katakam@amd.com> >> >> Subject: Re: [PATCH net-next] net: xilinx: axienet: Configure and >> >> report coalesce parameters in DMAengine flow >> >> >> >> Caution: This message originated from an External Source. Use proper >> >> caution when opening attachments, clicking links, or responding. >> >> >> >> >> >> On 5/28/25 08:00, Gupta, Suraj wrote: >> >> > [AMD Official Use Only - AMD Internal Distribution Only] >> >> > >> >> >> -----Original Message----- >> >> >> From: Sean Anderson <sean.anderson@linux.dev> >> >> >> Sent: Tuesday, May 27, 2025 9:47 PM >> >> >> To: Gupta, Suraj <Suraj.Gupta2@amd.com>; andrew+netdev@lunn.ch; >> >> >> davem@davemloft.net; edumazet@google.com; kuba@kernel.org; >> >> >> pabeni@redhat.com; vkoul@kernel.org; Simek, Michal >> >> >> <michal.simek@amd.com>; Pandey, Radhey Shyam >> >> >> <radhey.shyam.pandey@amd.com>; horms@kernel.org >> >> >> Cc: netdev@vger.kernel.org; linux-arm-kernel@lists.infradead.org; >> >> >> linux- kernel@vger.kernel.org; git (AMD-Xilinx) <git@amd.com>; >> >> >> Katakam, Harini <harini.katakam@amd.com> >> >> >> Subject: Re: [PATCH net-next] net: xilinx: axienet: Configure and >> >> >> report coalesce parameters in DMAengine flow >> >> >> >> >> >> Caution: This message originated from an External Source. 
Use >> >> >> proper caution when opening attachments, clicking links, or responding. >> >> >> >> >> >> >> >> >> On 5/25/25 06:22, Suraj Gupta wrote: >> >> >> > Add support to configure / report interrupt coalesce count and >> >> >> > delay via ethtool in DMAEngine flow. >> >> >> > Netperf numbers are not good when using non-dmaengine default >> >> >> > values, so tuned coalesce count and delay and defined separate >> >> >> > default values in dmaengine flow. >> >> >> > >> >> >> > Netperf numbers and CPU utilisation change in DMAengine flow >> >> >> > after introducing coalescing with default parameters: >> >> >> > coalesce parameters: >> >> >> > Transfer type Before(w/o coalescing) After(with coalescing) >> >> >> > TCP Tx, CPU utilisation% 925, 27 941, 22 >> >> >> > TCP Rx, CPU utilisation% 607, 32 741, 36 >> >> >> > UDP Tx, CPU utilisation% 857, 31 960, 28 >> >> >> > UDP Rx, CPU utilisation% 762, 26 783, 18 >> >> >> > >> >> >> > Above numbers are observed with 4x Cortex-a53. >> >> >> >> >> >> How does this affect latency? I would expect these RX settings to >> >> >> increase latency around 5-10x. I only use these settings with DIM >> >> >> since it will disable coalescing during periods of light load for better latency. >> >> >> >> >> >> (of course the way to fix this in general is RSS or some other >> >> >> method involving multiple queues). >> >> >> >> >> > >> >> > I took values before NAPI addition in legacy flow (rx_threshold: >> >> > 24, rx_usec: 50) as >> >> reference. But netperf numbers were low with them, so tried tuning >> >> both and selected the pair which gives good numbers. >> >> >> >> Yeah, but the reason is that you are trading latency for throughput. >> >> There is only one queue, so when the interface is saturated you will >> >> not get good latency anyway (since latency-sensitive packets will get >> >> head-of-line blocked). But when activity is sparse you can good >> >> latency if there is no coalescing. 
>> >> So I think coalescing should only
>> >> be used when there is a lot of traffic. Hence why I only adjusted the
>> >> settings once I implemented DIM. I think you should be able to
>> >> implement it by calling net_dim from axienet_dma_rx_cb, but it will not be as
>> >> efficient without NAPI.
>> >>
>> >
>> > Ok, got it. I'll keep default values used before NAPI in legacy flow (coalesce count:
>> > 24, delay: 50) for both Tx and Rx and remove perf comparisons.
>>
>> Those settings are actually probably even worse for latency. I'd leave the settings at
>> 0/0 (coalescing disabled) to match the existing behavior. I think the perf comparisons
>> are helpful, especially for people who know they are going to be throughput-limited.
>>
>> My main point is that I think extending the dmaengine API to allow for DIM will have
>> practical benefits in reduced latency.
>>
> Sure, will implement DIM for both Tx and Rx in next version. However, I noticed it's implemented for Rx only in legacy flow. Is there any specific reason for that?

There's no latency issue with sending packets. It doesn't matter when we
process Tx completions as long as we refill the ring in time to send
more packets. So we can aggressively set the Tx coalescing for maximum
throughput.

--Sean

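The Tx argument above ("as long as we refill the ring in time") can be put into rough numbers with a back-of-the-envelope sketch. The helper below estimates how long the hardware needs to transmit a full ring of queued frames; a Tx completion delay well below that figure should not starve the link. The function name is hypothetical, and the ring size, frame size and link speed used with it are illustrative assumptions, not values taken from the axienet driver or from this thread. Preamble, inter-frame gap and DMA overhead are ignored.

```c
#include <stdint.h>

/*
 * Approximate time, in microseconds, for a link running at `link_mbps`
 * megabits per second to transmit `ring_entries` frames of `frame_bytes`
 * bytes each. 1 Mbit/s equals 1 bit per microsecond, so
 * total_bits / link_mbps yields microseconds (rounded down).
 * Returns 0 for a zero link speed instead of dividing by zero.
 */
uint64_t demo_tx_ring_drain_usecs(uint32_t ring_entries,
				  uint32_t frame_bytes,
				  uint32_t link_mbps)
{
	if (link_mbps == 0)
		return 0;

	return (uint64_t)ring_entries * frame_bytes * 8 / link_mbps;
}
```

Under these assumptions, `demo_tx_ring_drain_usecs(128, 1500, 1000)` gives 1536 us for 128 full-size frames at 1 Gbit/s, so a Tx coalescing delay of a few tens of microseconds leaves a wide margin. With small frames the margin shrinks considerably (64 frames of 64 bytes drain in about 32 us), which is the case to check before choosing aggressive Tx values.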
end of thread, other threads:[~2025-06-09 16:22 UTC | newest]

Thread overview: 13+ messages
2025-05-25 10:22 [PATCH net-next] net: xilinx: axienet: Configure and report coalesce parameters in DMAengine flow Suraj Gupta
2025-05-26  0:36 ` kernel test robot
2025-05-27 16:16 ` Sean Anderson
2025-05-28 12:00 ` Gupta, Suraj
2025-05-28 13:09 ` Subbaraya Sundeep
2025-05-29 16:17 ` Sean Anderson
2025-05-29 16:29 ` Andrew Lunn
2025-05-29 16:35 ` Sean Anderson
2025-05-30 10:18 ` Gupta, Suraj
2025-05-30 11:53 ` Gupta, Suraj
2025-05-30 20:44 ` Sean Anderson
2025-06-03 11:07 ` Gupta, Suraj
2025-06-09 16:22 ` Sean Anderson