* [PATCH net-next v4 1/8] bnxt_en: add support for rx-copybreak ethtool command
2024-10-22 16:23 [PATCH net-next v4 0/7] bnxt_en: implement device memory TCP for bnxt Taehee Yoo
@ 2024-10-22 16:23 ` Taehee Yoo
2024-10-24 6:40 ` Michael Chan
2024-10-22 16:23 ` [PATCH net-next v4 2/8] bnxt_en: add support for tcp-data-split " Taehee Yoo
` (6 subsequent siblings)
7 siblings, 1 reply; 32+ messages in thread
From: Taehee Yoo @ 2024-10-22 16:23 UTC (permalink / raw)
To: davem, kuba, pabeni, edumazet, almasrymina, donald.hunter, corbet,
michael.chan, andrew+netdev, hawk, ilias.apalodimas, ast, daniel,
john.fastabend, dw, sdf, asml.silence, brett.creeley, linux-doc,
netdev
Cc: kory.maincent, maxime.chevallier, danieller, hengqi, ecree.xilinx,
przemyslaw.kitszel, hkallweit1, ahmed.zaki, rrameshbabu, idosch,
jiri, bigeasy, lorenzo, jdamato, aleksander.lobakin, kaiyuanz,
willemb, daniel.zahka, ap420073
The bnxt_en driver supports rx-copybreak, but it couldn't be set by
userspace. Only the default value (256) has worked.
This patch makes the bnxt_en driver support the following commands:
`ethtool --set-tunable <devname> rx-copybreak <value>` and
`ethtool --get-tunable <devname> rx-copybreak`.
Reviewed-by: Brett Creeley <brett.creeley@amd.com>
Tested-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
---
v4:
- Remove min rx-copybreak value.
- Add Review tag from Brett.
- Add Test tag from Stanislav.
v3:
- Update copybreak value after closing nic and before opening nic when
the device is running.
v2:
- Define max/min rx_copybreak values.
drivers/net/ethernet/broadcom/bnxt/bnxt.c | 23 +++++----
drivers/net/ethernet/broadcom/bnxt/bnxt.h | 5 +-
.../net/ethernet/broadcom/bnxt/bnxt_ethtool.c | 48 ++++++++++++++++++-
3 files changed, 65 insertions(+), 11 deletions(-)
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index bda3742d4e32..0f5fe9ba691d 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -81,7 +81,6 @@ MODULE_DESCRIPTION("Broadcom NetXtreme network driver");
#define BNXT_RX_OFFSET (NET_SKB_PAD + NET_IP_ALIGN)
#define BNXT_RX_DMA_OFFSET NET_SKB_PAD
-#define BNXT_RX_COPY_THRESH 256
#define BNXT_TX_PUSH_THRESH 164
@@ -1330,13 +1329,13 @@ static struct sk_buff *bnxt_copy_data(struct bnxt_napi *bnapi, u8 *data,
if (!skb)
return NULL;
- dma_sync_single_for_cpu(&pdev->dev, mapping, bp->rx_copy_thresh,
+ dma_sync_single_for_cpu(&pdev->dev, mapping, bp->rx_copybreak,
bp->rx_dir);
memcpy(skb->data - NET_IP_ALIGN, data - NET_IP_ALIGN,
len + NET_IP_ALIGN);
- dma_sync_single_for_device(&pdev->dev, mapping, bp->rx_copy_thresh,
+ dma_sync_single_for_device(&pdev->dev, mapping, bp->rx_copybreak,
bp->rx_dir);
skb_put(skb, len);
@@ -1829,7 +1828,7 @@ static inline struct sk_buff *bnxt_tpa_end(struct bnxt *bp,
return NULL;
}
- if (len <= bp->rx_copy_thresh) {
+ if (len <= bp->rx_copybreak) {
skb = bnxt_copy_skb(bnapi, data_ptr, len, mapping);
if (!skb) {
bnxt_abort_tpa(cpr, idx, agg_bufs);
@@ -2162,7 +2161,7 @@ static int bnxt_rx_pkt(struct bnxt *bp, struct bnxt_cp_ring_info *cpr,
}
}
- if (len <= bp->rx_copy_thresh) {
+ if (len <= bp->rx_copybreak) {
if (!xdp_active)
skb = bnxt_copy_skb(bnapi, data_ptr, len, dma_addr);
else
@@ -4451,6 +4450,11 @@ void bnxt_set_tpa_flags(struct bnxt *bp)
bp->flags |= BNXT_FLAG_GRO;
}
+static void bnxt_init_ring_params(struct bnxt *bp)
+{
+ bp->rx_copybreak = BNXT_DEFAULT_RX_COPYBREAK;
+}
+
/* bp->rx_ring_size, bp->tx_ring_size, dev->mtu, BNXT_FLAG_{G|L}RO flags must
* be set on entry.
*/
@@ -4465,7 +4469,6 @@ void bnxt_set_ring_params(struct bnxt *bp)
rx_space = rx_size + ALIGN(max(NET_SKB_PAD, XDP_PACKET_HEADROOM), 8) +
SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
- bp->rx_copy_thresh = BNXT_RX_COPY_THRESH;
ring_size = bp->rx_ring_size;
bp->rx_agg_ring_size = 0;
bp->rx_agg_nr_pages = 0;
@@ -4510,7 +4513,8 @@ void bnxt_set_ring_params(struct bnxt *bp)
ALIGN(max(NET_SKB_PAD, XDP_PACKET_HEADROOM), 8) -
SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
} else {
- rx_size = SKB_DATA_ALIGN(BNXT_RX_COPY_THRESH + NET_IP_ALIGN);
+ rx_size = SKB_DATA_ALIGN(bp->rx_copybreak +
+ NET_IP_ALIGN);
rx_space = rx_size + NET_SKB_PAD +
SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
}
@@ -6424,8 +6428,8 @@ static int bnxt_hwrm_vnic_set_hds(struct bnxt *bp, struct bnxt_vnic_info *vnic)
VNIC_PLCMODES_CFG_REQ_FLAGS_HDS_IPV6);
req->enables |=
cpu_to_le32(VNIC_PLCMODES_CFG_REQ_ENABLES_HDS_THRESHOLD_VALID);
- req->jumbo_thresh = cpu_to_le16(bp->rx_copy_thresh);
- req->hds_threshold = cpu_to_le16(bp->rx_copy_thresh);
+ req->jumbo_thresh = cpu_to_le16(bp->rx_copybreak);
+ req->hds_threshold = cpu_to_le16(bp->rx_copybreak);
}
req->vnic_id = cpu_to_le32(vnic->fw_vnic_id);
return hwrm_req_send(bp, req);
@@ -15865,6 +15869,7 @@ static int bnxt_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
bnxt_init_l2_fltr_tbl(bp);
bnxt_set_rx_skb_mode(bp, false);
bnxt_set_tpa_flags(bp);
+ bnxt_init_ring_params(bp);
bnxt_set_ring_params(bp);
bnxt_rdma_aux_device_init(bp);
rc = bnxt_set_dflt_rings(bp, true);
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
index 69231e85140b..1b83a2c8027b 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
@@ -34,6 +34,9 @@
#include <linux/firmware/broadcom/tee_bnxt_fw.h>
#endif
+#define BNXT_DEFAULT_RX_COPYBREAK 256
+#define BNXT_MAX_RX_COPYBREAK 1024
+
extern struct list_head bnxt_block_cb_list;
struct page_pool;
@@ -2299,7 +2302,7 @@ struct bnxt {
enum dma_data_direction rx_dir;
u32 rx_ring_size;
u32 rx_agg_ring_size;
- u32 rx_copy_thresh;
+ u32 rx_copybreak;
u32 rx_ring_mask;
u32 rx_agg_ring_mask;
int rx_nr_pages;
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
index f71cc8188b4e..9af0a3f34750 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
@@ -4319,6 +4319,50 @@ static int bnxt_get_eee(struct net_device *dev, struct ethtool_keee *edata)
return 0;
}
+static int bnxt_set_tunable(struct net_device *dev,
+ const struct ethtool_tunable *tuna,
+ const void *data)
+{
+ struct bnxt *bp = netdev_priv(dev);
+ u32 rx_copybreak;
+
+ switch (tuna->id) {
+ case ETHTOOL_RX_COPYBREAK:
+ rx_copybreak = *(u32 *)data;
+ if (rx_copybreak > BNXT_MAX_RX_COPYBREAK)
+ return -ERANGE;
+ if (rx_copybreak != bp->rx_copybreak) {
+ if (netif_running(dev)) {
+ bnxt_close_nic(bp, false, false);
+ bp->rx_copybreak = rx_copybreak;
+ bnxt_set_ring_params(bp);
+ bnxt_open_nic(bp, false, false);
+ } else {
+ bp->rx_copybreak = rx_copybreak;
+ }
+ }
+ return 0;
+ default:
+ return -EOPNOTSUPP;
+ }
+}
+
+static int bnxt_get_tunable(struct net_device *dev,
+ const struct ethtool_tunable *tuna, void *data)
+{
+ struct bnxt *bp = netdev_priv(dev);
+
+ switch (tuna->id) {
+ case ETHTOOL_RX_COPYBREAK:
+ *(u32 *)data = bp->rx_copybreak;
+ break;
+ default:
+ return -EOPNOTSUPP;
+ }
+
+ return 0;
+}
+
static int bnxt_read_sfp_module_eeprom_info(struct bnxt *bp, u16 i2c_addr,
u16 page_number, u8 bank,
u16 start_addr, u16 data_length,
@@ -4769,7 +4813,7 @@ static int bnxt_run_loopback(struct bnxt *bp)
cpr = &rxr->bnapi->cp_ring;
if (bp->flags & BNXT_FLAG_CHIP_P5_PLUS)
cpr = rxr->rx_cpr;
- pkt_size = min(bp->dev->mtu + ETH_HLEN, bp->rx_copy_thresh);
+ pkt_size = min(bp->dev->mtu + ETH_HLEN, bp->rx_copybreak);
skb = netdev_alloc_skb(bp->dev, pkt_size);
if (!skb)
return -ENOMEM;
@@ -5342,6 +5386,8 @@ const struct ethtool_ops bnxt_ethtool_ops = {
.get_link_ext_stats = bnxt_get_link_ext_stats,
.get_eee = bnxt_get_eee,
.set_eee = bnxt_set_eee,
+ .get_tunable = bnxt_get_tunable,
+ .set_tunable = bnxt_set_tunable,
.get_module_info = bnxt_get_module_info,
.get_module_eeprom = bnxt_get_module_eeprom,
.get_module_eeprom_by_page = bnxt_get_module_eeprom_by_page,
--
2.34.1
^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH net-next v4 1/8] bnxt_en: add support for rx-copybreak ethtool command
2024-10-22 16:23 ` [PATCH net-next v4 1/8] bnxt_en: add support for rx-copybreak ethtool command Taehee Yoo
@ 2024-10-24 6:40 ` Michael Chan
2024-10-24 16:38 ` Taehee Yoo
0 siblings, 1 reply; 32+ messages in thread
From: Michael Chan @ 2024-10-24 6:40 UTC (permalink / raw)
To: Taehee Yoo
Cc: davem, kuba, pabeni, edumazet, almasrymina, donald.hunter, corbet,
andrew+netdev, hawk, ilias.apalodimas, ast, daniel,
john.fastabend, dw, sdf, asml.silence, brett.creeley, linux-doc,
netdev, kory.maincent, maxime.chevallier, danieller, hengqi,
ecree.xilinx, przemyslaw.kitszel, hkallweit1, ahmed.zaki,
rrameshbabu, idosch, jiri, bigeasy, lorenzo, jdamato,
aleksander.lobakin, kaiyuanz, willemb, daniel.zahka
[-- Attachment #1: Type: text/plain, Size: 3613 bytes --]
On Tue, Oct 22, 2024 at 9:24 AM Taehee Yoo <ap420073@gmail.com> wrote:
>
> The bnxt_en driver supports rx-copybreak, but it couldn't be set by
> userspace. Only the default value (256) has worked.
> This patch makes the bnxt_en driver support the following commands:
> `ethtool --set-tunable <devname> rx-copybreak <value>` and
> `ethtool --get-tunable <devname> rx-copybreak`.
>
> Reviewed-by: Brett Creeley <brett.creeley@amd.com>
> Tested-by: Stanislav Fomichev <sdf@fomichev.me>
> Signed-off-by: Taehee Yoo <ap420073@gmail.com>
> ---
>
> v4:
> - Remove min rx-copybreak value.
> - Add Review tag from Brett.
> - Add Test tag from Stanislav.
>
> v3:
> - Update copybreak value after closing nic and before opening nic when
> the device is running.
>
> v2:
> - Define max/min rx_copybreak values.
>
> drivers/net/ethernet/broadcom/bnxt/bnxt.c | 23 +++++----
> drivers/net/ethernet/broadcom/bnxt/bnxt.h | 5 +-
> .../net/ethernet/broadcom/bnxt/bnxt_ethtool.c | 48 ++++++++++++++++++-
> 3 files changed, 65 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> index bda3742d4e32..0f5fe9ba691d 100644
> --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> @@ -4510,7 +4513,8 @@ void bnxt_set_ring_params(struct bnxt *bp)
> ALIGN(max(NET_SKB_PAD, XDP_PACKET_HEADROOM), 8) -
> SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
> } else {
> - rx_size = SKB_DATA_ALIGN(BNXT_RX_COPY_THRESH + NET_IP_ALIGN);
> + rx_size = SKB_DATA_ALIGN(bp->rx_copybreak +
> + NET_IP_ALIGN);
When rx_copybreak is 0 or very small, rx_size will be very small and
will be a problem. We need rx_size to be big enough to contain the
packet header, so rx_size cannot be below some minimum (256?).
> rx_space = rx_size + NET_SKB_PAD +
> SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
> }
> @@ -6424,8 +6428,8 @@ static int bnxt_hwrm_vnic_set_hds(struct bnxt *bp, struct bnxt_vnic_info *vnic)
> VNIC_PLCMODES_CFG_REQ_FLAGS_HDS_IPV6);
> req->enables |=
> cpu_to_le32(VNIC_PLCMODES_CFG_REQ_ENABLES_HDS_THRESHOLD_VALID);
> - req->jumbo_thresh = cpu_to_le16(bp->rx_copy_thresh);
> - req->hds_threshold = cpu_to_le16(bp->rx_copy_thresh);
> + req->jumbo_thresh = cpu_to_le16(bp->rx_copybreak);
> + req->hds_threshold = cpu_to_le16(bp->rx_copybreak);
Similarly, these thresholds should not go to 0 when rx_copybreak becomes small.
> }
> req->vnic_id = cpu_to_le32(vnic->fw_vnic_id);
> return hwrm_req_send(bp, req);
> @@ -4769,7 +4813,7 @@ static int bnxt_run_loopback(struct bnxt *bp)
> cpr = &rxr->bnapi->cp_ring;
> if (bp->flags & BNXT_FLAG_CHIP_P5_PLUS)
> cpr = rxr->rx_cpr;
> - pkt_size = min(bp->dev->mtu + ETH_HLEN, bp->rx_copy_thresh);
> + pkt_size = min(bp->dev->mtu + ETH_HLEN, bp->rx_copybreak);
The loopback test will also not work if rx_copybreak is very small. I
think we should always use 256 bytes for the loopback test packet
size. Thanks.
> skb = netdev_alloc_skb(bp->dev, pkt_size);
> if (!skb)
> return -ENOMEM;
[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 4209 bytes --]
^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH net-next v4 1/8] bnxt_en: add support for rx-copybreak ethtool command
2024-10-24 6:40 ` Michael Chan
@ 2024-10-24 16:38 ` Taehee Yoo
2024-10-25 4:54 ` Michael Chan
0 siblings, 1 reply; 32+ messages in thread
From: Taehee Yoo @ 2024-10-24 16:38 UTC (permalink / raw)
To: Michael Chan
Cc: davem, kuba, pabeni, edumazet, almasrymina, donald.hunter, corbet,
andrew+netdev, hawk, ilias.apalodimas, ast, daniel,
john.fastabend, dw, sdf, asml.silence, brett.creeley, linux-doc,
netdev, kory.maincent, maxime.chevallier, danieller, hengqi,
ecree.xilinx, przemyslaw.kitszel, hkallweit1, ahmed.zaki,
rrameshbabu, idosch, jiri, bigeasy, lorenzo, jdamato,
aleksander.lobakin, kaiyuanz, willemb, daniel.zahka
On Thu, Oct 24, 2024 at 3:41 PM Michael Chan
<michael.chan@broadcom.com> wrote:

Hi Michael,
Thank you so much for the review!
>
> On Tue, Oct 22, 2024 at 9:24 AM Taehee Yoo <ap420073@gmail.com> wrote:
> >
> > The bnxt_en driver supports rx-copybreak, but it couldn't be set by
> > userspace. Only the default value (256) has worked.
> > This patch makes the bnxt_en driver support the following commands:
> > `ethtool --set-tunable <devname> rx-copybreak <value>` and
> > `ethtool --get-tunable <devname> rx-copybreak`.
> >
> > Reviewed-by: Brett Creeley <brett.creeley@amd.com>
> > Tested-by: Stanislav Fomichev <sdf@fomichev.me>
> > Signed-off-by: Taehee Yoo <ap420073@gmail.com>
> > ---
> >
> > v4:
> > - Remove min rx-copybreak value.
> > - Add Review tag from Brett.
> > - Add Test tag from Stanislav.
> >
> > v3:
> > - Update copybreak value after closing nic and before opening nic when
> > the device is running.
> >
> > v2:
> > - Define max/min rx_copybreak values.
> >
> > drivers/net/ethernet/broadcom/bnxt/bnxt.c | 23 +++++----
> > drivers/net/ethernet/broadcom/bnxt/bnxt.h | 5 +-
> > .../net/ethernet/broadcom/bnxt/bnxt_ethtool.c | 48 ++++++++++++++++++-
> > 3 files changed, 65 insertions(+), 11 deletions(-)
> >
> > diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> > index bda3742d4e32..0f5fe9ba691d 100644
> > --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> > +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
>
> > @@ -4510,7 +4513,8 @@ void bnxt_set_ring_params(struct bnxt *bp)
> > ALIGN(max(NET_SKB_PAD, XDP_PACKET_HEADROOM), 8) -
> > SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
> > } else {
> > - rx_size = SKB_DATA_ALIGN(BNXT_RX_COPY_THRESH + NET_IP_ALIGN);
> > + rx_size = SKB_DATA_ALIGN(bp->rx_copybreak +
> > + NET_IP_ALIGN);
>
> When rx_copybreak is 0 or very small, rx_size will be very small and
> will be a problem. We need rx_size to be big enough to contain the
> packet header, so rx_size cannot be below some minimum (256?).
>
> > rx_space = rx_size + NET_SKB_PAD +
> > SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
> > }
> > @@ -6424,8 +6428,8 @@ static int bnxt_hwrm_vnic_set_hds(struct bnxt *bp, struct bnxt_vnic_info *vnic)
> > VNIC_PLCMODES_CFG_REQ_FLAGS_HDS_IPV6);
> > req->enables |=
> > cpu_to_le32(VNIC_PLCMODES_CFG_REQ_ENABLES_HDS_THRESHOLD_VALID);
> > - req->jumbo_thresh = cpu_to_le16(bp->rx_copy_thresh);
> > - req->hds_threshold = cpu_to_le16(bp->rx_copy_thresh);
> > + req->jumbo_thresh = cpu_to_le16(bp->rx_copybreak);
> > + req->hds_threshold = cpu_to_le16(bp->rx_copybreak);
>
> Similarly, these thresholds should not go to 0 when rx_copybreak becomes small.
>
> > }
> > req->vnic_id = cpu_to_le32(vnic->fw_vnic_id);
> > return hwrm_req_send(bp, req);
>
> > @@ -4769,7 +4813,7 @@ static int bnxt_run_loopback(struct bnxt *bp)
> > cpr = &rxr->bnapi->cp_ring;
> > if (bp->flags & BNXT_FLAG_CHIP_P5_PLUS)
> > cpr = rxr->rx_cpr;
> > - pkt_size = min(bp->dev->mtu + ETH_HLEN, bp->rx_copy_thresh);
> > + pkt_size = min(bp->dev->mtu + ETH_HLEN, bp->rx_copybreak);
>
> The loopback test will also not work if rx_copybreak is very small. I
> think we should always use 256 bytes for the loopback test packet
> size. Thanks.
>
> > skb = netdev_alloc_skb(bp->dev, pkt_size);
> > if (!skb)
> > return -ENOMEM;
I tested `ethtool -t eth0` and confirmed that it fails if rx-copybreak
is too small. Sorry for missing that.
I think we can use max(BNXT_DEFAULT_RX_COPYBREAK,
bp->rx_copybreak) for both cases.
I tested it and it works well.
So I will use that if you are okay!
Thank you so much,
Taehee Yoo
^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH net-next v4 1/8] bnxt_en: add support for rx-copybreak ethtool command
2024-10-24 16:38 ` Taehee Yoo
@ 2024-10-25 4:54 ` Michael Chan
2024-10-25 8:07 ` Taehee Yoo
0 siblings, 1 reply; 32+ messages in thread
From: Michael Chan @ 2024-10-25 4:54 UTC (permalink / raw)
To: Taehee Yoo
Cc: davem, kuba, pabeni, edumazet, almasrymina, donald.hunter, corbet,
andrew+netdev, hawk, ilias.apalodimas, ast, daniel,
john.fastabend, dw, sdf, asml.silence, brett.creeley, linux-doc,
netdev, kory.maincent, maxime.chevallier, danieller, hengqi,
ecree.xilinx, przemyslaw.kitszel, hkallweit1, ahmed.zaki,
rrameshbabu, idosch, jiri, bigeasy, lorenzo, jdamato,
aleksander.lobakin, kaiyuanz, willemb, daniel.zahka
[-- Attachment #1: Type: text/plain, Size: 3256 bytes --]
On Thu, Oct 24, 2024 at 9:38 AM Taehee Yoo <ap420073@gmail.com> wrote:
>
> On Thu, Oct 24, 2024 at 3:41 PM Michael Chan
> > > diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> > > index bda3742d4e32..0f5fe9ba691d 100644
> > > --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> > > +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> >
> > > @@ -4510,7 +4513,8 @@ void bnxt_set_ring_params(struct bnxt *bp)
> > > ALIGN(max(NET_SKB_PAD, XDP_PACKET_HEADROOM), 8) -
> > > SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
> > > } else {
> > > - rx_size = SKB_DATA_ALIGN(BNXT_RX_COPY_THRESH + NET_IP_ALIGN);
> > > + rx_size = SKB_DATA_ALIGN(bp->rx_copybreak +
> > > + NET_IP_ALIGN);
> >
> > When rx_copybreak is 0 or very small, rx_size will be very small and
> > will be a problem. We need rx_size to be big enough to contain the
> > packet header, so rx_size cannot be below some minimum (256?).
> >
> > > rx_space = rx_size + NET_SKB_PAD +
> > > SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
> > > }
> > > @@ -6424,8 +6428,8 @@ static int bnxt_hwrm_vnic_set_hds(struct bnxt *bp, struct bnxt_vnic_info *vnic)
> > > VNIC_PLCMODES_CFG_REQ_FLAGS_HDS_IPV6);
> > > req->enables |=
> > > cpu_to_le32(VNIC_PLCMODES_CFG_REQ_ENABLES_HDS_THRESHOLD_VALID);
> > > - req->jumbo_thresh = cpu_to_le16(bp->rx_copy_thresh);
> > > - req->hds_threshold = cpu_to_le16(bp->rx_copy_thresh);
> > > + req->jumbo_thresh = cpu_to_le16(bp->rx_copybreak);
> > > + req->hds_threshold = cpu_to_le16(bp->rx_copybreak);
> >
> > Similarly, these thresholds should not go to 0 when rx_copybreak becomes small.
> >
> > > }
> > > req->vnic_id = cpu_to_le32(vnic->fw_vnic_id);
> > > return hwrm_req_send(bp, req);
> >
> > > @@ -4769,7 +4813,7 @@ static int bnxt_run_loopback(struct bnxt *bp)
> > > cpr = &rxr->bnapi->cp_ring;
> > > if (bp->flags & BNXT_FLAG_CHIP_P5_PLUS)
> > > cpr = rxr->rx_cpr;
> > > - pkt_size = min(bp->dev->mtu + ETH_HLEN, bp->rx_copy_thresh);
> > > + pkt_size = min(bp->dev->mtu + ETH_HLEN, bp->rx_copybreak);
> >
> > The loopback test will also not work if rx_copybreak is very small. I
> > think we should always use 256 bytes for the loopback test packet
> > size. Thanks.
> >
> > > skb = netdev_alloc_skb(bp->dev, pkt_size);
> > > if (!skb)
> > > return -ENOMEM;
>
> I tested `ethtool -t eth0` and confirmed that it fails if rx-copybreak
> is too small. Sorry for missing that.
> I think we can use max(BNXT_DEFAULT_RX_COPYBREAK,
> bp->rx_copybreak) for both cases.
> I tested it and it works well.
> So I will use that if you are okay!
>
Yes, please go ahead. I think all 3 places I commented above should
use max(BNXT_DEFAULT_RX_COPYBREAK, bp->rx_copybreak). Thanks.
[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 4209 bytes --]
^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH net-next v4 1/8] bnxt_en: add support for rx-copybreak ethtool command
2024-10-25 4:54 ` Michael Chan
@ 2024-10-25 8:07 ` Taehee Yoo
0 siblings, 0 replies; 32+ messages in thread
From: Taehee Yoo @ 2024-10-25 8:07 UTC (permalink / raw)
To: Michael Chan
Cc: davem, kuba, pabeni, edumazet, almasrymina, donald.hunter, corbet,
andrew+netdev, hawk, ilias.apalodimas, ast, daniel,
john.fastabend, dw, sdf, asml.silence, brett.creeley, linux-doc,
netdev, kory.maincent, maxime.chevallier, danieller, hengqi,
ecree.xilinx, przemyslaw.kitszel, hkallweit1, ahmed.zaki,
rrameshbabu, idosch, jiri, bigeasy, lorenzo, jdamato,
aleksander.lobakin, kaiyuanz, willemb, daniel.zahka
On Fri, Oct 25, 2024 at 1:55 PM Michael Chan <michael.chan@broadcom.com> wrote:
>
> On Thu, Oct 24, 2024 at 9:38 AM Taehee Yoo <ap420073@gmail.com> wrote:
> >
> > On Thu, Oct 24, 2024 at 3:41 PM Michael Chan
>
> > > > diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> > > > index bda3742d4e32..0f5fe9ba691d 100644
> > > > --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> > > > +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> > >
> > > > @@ -4510,7 +4513,8 @@ void bnxt_set_ring_params(struct bnxt *bp)
> > > > ALIGN(max(NET_SKB_PAD, XDP_PACKET_HEADROOM), 8) -
> > > > SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
> > > > } else {
> > > > - rx_size = SKB_DATA_ALIGN(BNXT_RX_COPY_THRESH + NET_IP_ALIGN);
> > > > + rx_size = SKB_DATA_ALIGN(bp->rx_copybreak +
> > > > + NET_IP_ALIGN);
> > >
> > > When rx_copybreak is 0 or very small, rx_size will be very small and
> > > will be a problem. We need rx_size to be big enough to contain the
> > > packet header, so rx_size cannot be below some minimum (256?).
> > >
> > > > rx_space = rx_size + NET_SKB_PAD +
> > > > SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
> > > > }
> > > > @@ -6424,8 +6428,8 @@ static int bnxt_hwrm_vnic_set_hds(struct bnxt *bp, struct bnxt_vnic_info *vnic)
> > > > VNIC_PLCMODES_CFG_REQ_FLAGS_HDS_IPV6);
> > > > req->enables |=
> > > > cpu_to_le32(VNIC_PLCMODES_CFG_REQ_ENABLES_HDS_THRESHOLD_VALID);
> > > > - req->jumbo_thresh = cpu_to_le16(bp->rx_copy_thresh);
> > > > - req->hds_threshold = cpu_to_le16(bp->rx_copy_thresh);
> > > > + req->jumbo_thresh = cpu_to_le16(bp->rx_copybreak);
> > > > + req->hds_threshold = cpu_to_le16(bp->rx_copybreak);
> > >
> > > Similarly, these thresholds should not go to 0 when rx_copybreak becomes small.
> > >
> > > > }
> > > > req->vnic_id = cpu_to_le32(vnic->fw_vnic_id);
> > > > return hwrm_req_send(bp, req);
> > >
> > > > @@ -4769,7 +4813,7 @@ static int bnxt_run_loopback(struct bnxt *bp)
> > > > cpr = &rxr->bnapi->cp_ring;
> > > > if (bp->flags & BNXT_FLAG_CHIP_P5_PLUS)
> > > > cpr = rxr->rx_cpr;
> > > > - pkt_size = min(bp->dev->mtu + ETH_HLEN, bp->rx_copy_thresh);
> > > > + pkt_size = min(bp->dev->mtu + ETH_HLEN, bp->rx_copybreak);
> > >
> > > The loopback test will also not work if rx_copybreak is very small. I
> > > think we should always use 256 bytes for the loopback test packet
> > > size. Thanks.
> > >
> > > > skb = netdev_alloc_skb(bp->dev, pkt_size);
> > > > if (!skb)
> > > > return -ENOMEM;
> >
> > I tested `ethtool -t eth0` and confirmed that it fails if rx-copybreak
> > is too small. Sorry for missing that.
> > I think we can use max(BNXT_DEFAULT_RX_COPYBREAK,
> > bp->rx_copybreak) for both cases.
> > I tested it and it works well.
> > So I will use that if you are okay!
> >
>
> Yes, please go ahead. I think all 3 places I commented above should
> use max(BNXT_DEFAULT_RX_COPYBREAK, bp->rx_copybreak). Thanks.
Thanks a lot for your confirmation :)
I asked about jumbo_thresh in another thread, so I would like to handle
jumbo_thresh based on that thread.
And the hds_thresh value will be configured by
`ethtool -G eth0 header-data-split-thresh 0`, so we need to
allow values of 0 ~ 256.
I think you indicated jumbo_thresh only, right?
Thank you so much!
Taehee Yoo
^ permalink raw reply [flat|nested] 32+ messages in thread
* [PATCH net-next v4 2/8] bnxt_en: add support for tcp-data-split ethtool command
2024-10-22 16:23 [PATCH net-next v4 0/7] bnxt_en: implement device memory TCP for bnxt Taehee Yoo
2024-10-22 16:23 ` [PATCH net-next v4 1/8] bnxt_en: add support for rx-copybreak ethtool command Taehee Yoo
@ 2024-10-22 16:23 ` Taehee Yoo
2024-10-25 5:02 ` Michael Chan
2024-10-22 16:23 ` [PATCH net-next v4 3/8] net: ethtool: add support for configuring header-data-split-thresh Taehee Yoo
` (5 subsequent siblings)
7 siblings, 1 reply; 32+ messages in thread
From: Taehee Yoo @ 2024-10-22 16:23 UTC (permalink / raw)
To: davem, kuba, pabeni, edumazet, almasrymina, donald.hunter, corbet,
michael.chan, andrew+netdev, hawk, ilias.apalodimas, ast, daniel,
john.fastabend, dw, sdf, asml.silence, brett.creeley, linux-doc,
netdev
Cc: kory.maincent, maxime.chevallier, danieller, hengqi, ecree.xilinx,
przemyslaw.kitszel, hkallweit1, ahmed.zaki, rrameshbabu, idosch,
jiri, bigeasy, lorenzo, jdamato, aleksander.lobakin, kaiyuanz,
willemb, daniel.zahka, ap420073
NICs that use the bnxt_en driver support the tcp-data-split feature
under the name of HDS (header-data-split).
But there was no way to enable or disable HDS via ethtool;
only getting the current HDS status was implemented, and HDS was
enabled automatically only when either LRO, HW-GRO, or JUMBO was enabled.
The hds_threshold followed the rx-copybreak value and was unchangeable.
This implements the `ethtool -G <interface name> tcp-data-split <value>`
command option.
The value can be <on>, <off>, or <auto>, but <auto> will be
automatically changed to <on>.
The HDS feature relies on the aggregation ring.
So, if HDS is enabled, the bnxt_en driver initializes the aggregation
ring.
This is the reason why BNXT_FLAG_AGG_RINGS contains the HDS condition.
Tested-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
---
v4:
- Do not support disable tcp-data-split.
- Add Test tag from Stanislav.
v3:
- No changes.
v2:
- Do not set hds_threshold to 0.
drivers/net/ethernet/broadcom/bnxt/bnxt.c | 8 +++-----
drivers/net/ethernet/broadcom/bnxt/bnxt.h | 5 +++--
drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c | 13 +++++++++++++
3 files changed, 19 insertions(+), 7 deletions(-)
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 0f5fe9ba691d..91ea42ff9b17 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -4473,7 +4473,7 @@ void bnxt_set_ring_params(struct bnxt *bp)
bp->rx_agg_ring_size = 0;
bp->rx_agg_nr_pages = 0;
- if (bp->flags & BNXT_FLAG_TPA)
+ if (bp->flags & BNXT_FLAG_TPA || bp->flags & BNXT_FLAG_HDS)
agg_factor = min_t(u32, 4, 65536 / BNXT_RX_PAGE_SIZE);
bp->flags &= ~BNXT_FLAG_JUMBO;
@@ -6420,15 +6420,13 @@ static int bnxt_hwrm_vnic_set_hds(struct bnxt *bp, struct bnxt_vnic_info *vnic)
req->flags = cpu_to_le32(VNIC_PLCMODES_CFG_REQ_FLAGS_JUMBO_PLACEMENT);
req->enables = cpu_to_le32(VNIC_PLCMODES_CFG_REQ_ENABLES_JUMBO_THRESH_VALID);
+ req->jumbo_thresh = cpu_to_le16(bp->rx_buf_use_size);
- if (BNXT_RX_PAGE_MODE(bp)) {
- req->jumbo_thresh = cpu_to_le16(bp->rx_buf_use_size);
- } else {
+ if (bp->flags & BNXT_FLAG_AGG_RINGS) {
req->flags |= cpu_to_le32(VNIC_PLCMODES_CFG_REQ_FLAGS_HDS_IPV4 |
VNIC_PLCMODES_CFG_REQ_FLAGS_HDS_IPV6);
req->enables |=
cpu_to_le32(VNIC_PLCMODES_CFG_REQ_ENABLES_HDS_THRESHOLD_VALID);
- req->jumbo_thresh = cpu_to_le16(bp->rx_copybreak);
req->hds_threshold = cpu_to_le16(bp->rx_copybreak);
}
req->vnic_id = cpu_to_le32(vnic->fw_vnic_id);
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
index 1b83a2c8027b..432bc19b35ea 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
@@ -2201,8 +2201,6 @@ struct bnxt {
#define BNXT_FLAG_TPA (BNXT_FLAG_LRO | BNXT_FLAG_GRO)
#define BNXT_FLAG_JUMBO 0x10
#define BNXT_FLAG_STRIP_VLAN 0x20
- #define BNXT_FLAG_AGG_RINGS (BNXT_FLAG_JUMBO | BNXT_FLAG_GRO | \
- BNXT_FLAG_LRO)
#define BNXT_FLAG_RFS 0x100
#define BNXT_FLAG_SHARED_RINGS 0x200
#define BNXT_FLAG_PORT_STATS 0x400
@@ -2223,6 +2221,9 @@ struct bnxt {
#define BNXT_FLAG_ROCE_MIRROR_CAP 0x4000000
#define BNXT_FLAG_TX_COAL_CMPL 0x8000000
#define BNXT_FLAG_PORT_STATS_EXT 0x10000000
+ #define BNXT_FLAG_HDS 0x20000000
+ #define BNXT_FLAG_AGG_RINGS (BNXT_FLAG_JUMBO | BNXT_FLAG_GRO | \
+ BNXT_FLAG_LRO | BNXT_FLAG_HDS)
#define BNXT_FLAG_ALL_CONFIG_FEATS (BNXT_FLAG_TPA | \
BNXT_FLAG_RFS | \
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
index 9af0a3f34750..5172d0547e0c 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
@@ -854,9 +854,21 @@ static int bnxt_set_ringparam(struct net_device *dev,
(ering->tx_pending < BNXT_MIN_TX_DESC_CNT))
return -EINVAL;
+ if (kernel_ering->tcp_data_split == ETHTOOL_TCP_DATA_SPLIT_DISABLED)
+ return -EOPNOTSUPP;
+
if (netif_running(dev))
bnxt_close_nic(bp, false, false);
+ switch (kernel_ering->tcp_data_split) {
+ case ETHTOOL_TCP_DATA_SPLIT_ENABLED:
+ bp->flags |= BNXT_FLAG_HDS;
+ break;
+ case ETHTOOL_TCP_DATA_SPLIT_UNKNOWN:
+ bp->flags &= ~BNXT_FLAG_HDS;
+ break;
+ }
+
bp->rx_ring_size = ering->rx_pending;
bp->tx_ring_size = ering->tx_pending;
bnxt_set_ring_params(bp);
@@ -5345,6 +5357,7 @@ const struct ethtool_ops bnxt_ethtool_ops = {
ETHTOOL_COALESCE_STATS_BLOCK_USECS |
ETHTOOL_COALESCE_USE_ADAPTIVE_RX |
ETHTOOL_COALESCE_USE_CQE,
+ .supported_ring_params = ETHTOOL_RING_USE_TCP_DATA_SPLIT,
.get_link_ksettings = bnxt_get_link_ksettings,
.set_link_ksettings = bnxt_set_link_ksettings,
.get_fec_stats = bnxt_get_fec_stats,
--
2.34.1
^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH net-next v4 2/8] bnxt_en: add support for tcp-data-split ethtool command
2024-10-22 16:23 ` [PATCH net-next v4 2/8] bnxt_en: add support for tcp-data-split " Taehee Yoo
@ 2024-10-25 5:02 ` Michael Chan
2024-10-25 7:59 ` Taehee Yoo
2024-10-25 19:24 ` Andy Gospodarek
0 siblings, 2 replies; 32+ messages in thread
From: Michael Chan @ 2024-10-25 5:02 UTC (permalink / raw)
To: Taehee Yoo
Cc: davem, kuba, pabeni, edumazet, almasrymina, donald.hunter, corbet,
andrew+netdev, hawk, ilias.apalodimas, ast, daniel,
john.fastabend, dw, sdf, asml.silence, brett.creeley, linux-doc,
netdev, kory.maincent, maxime.chevallier, danieller, hengqi,
ecree.xilinx, przemyslaw.kitszel, hkallweit1, ahmed.zaki,
rrameshbabu, idosch, jiri, bigeasy, lorenzo, jdamato,
aleksander.lobakin, kaiyuanz, willemb, daniel.zahka,
Andrew Gospodarek
[-- Attachment #1: Type: text/plain, Size: 2920 bytes --]
On Tue, Oct 22, 2024 at 9:24 AM Taehee Yoo <ap420073@gmail.com> wrote:
>
> NICs that use the bnxt_en driver support the tcp-data-split feature
> under the name of HDS (header-data-split).
> But there was no way to enable or disable HDS via ethtool;
> only getting the current HDS status was implemented, and HDS was
> enabled automatically only when either LRO, HW-GRO, or JUMBO was enabled.
> The hds_threshold followed the rx-copybreak value and was unchangeable.
>
> This implements the `ethtool -G <interface name> tcp-data-split <value>`
> command option.
> The value can be <on>, <off>, or <auto>, but <auto> will be
> automatically changed to <on>.
>
> The HDS feature relies on the aggregation ring.
> So, if HDS is enabled, the bnxt_en driver initializes the aggregation
> ring.
> This is the reason why BNXT_FLAG_AGG_RINGS contains the HDS condition.
>
> Tested-by: Stanislav Fomichev <sdf@fomichev.me>
> Signed-off-by: Taehee Yoo <ap420073@gmail.com>
> ---
>
> v4:
> - Do not support disable tcp-data-split.
> - Add Test tag from Stanislav.
>
> v3:
> - No changes.
>
> v2:
> - Do not set hds_threshold to 0.
>
> drivers/net/ethernet/broadcom/bnxt/bnxt.c | 8 +++-----
> drivers/net/ethernet/broadcom/bnxt/bnxt.h | 5 +++--
> drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c | 13 +++++++++++++
> 3 files changed, 19 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> index 0f5fe9ba691d..91ea42ff9b17 100644
> --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> @@ -6420,15 +6420,13 @@ static int bnxt_hwrm_vnic_set_hds(struct bnxt *bp, struct bnxt_vnic_info *vnic)
>
> req->flags = cpu_to_le32(VNIC_PLCMODES_CFG_REQ_FLAGS_JUMBO_PLACEMENT);
> req->enables = cpu_to_le32(VNIC_PLCMODES_CFG_REQ_ENABLES_JUMBO_THRESH_VALID);
> + req->jumbo_thresh = cpu_to_le16(bp->rx_buf_use_size);
>
> - if (BNXT_RX_PAGE_MODE(bp)) {
> - req->jumbo_thresh = cpu_to_le16(bp->rx_buf_use_size);
Please explain why this "if" condition is removed.
BNXT_RX_PAGE_MODE() means that we are in XDP mode and we currently
don't support HDS in XDP mode. Added Andy Gospo to CC so he can also
comment.
> - } else {
> + if (bp->flags & BNXT_FLAG_AGG_RINGS) {
> req->flags |= cpu_to_le32(VNIC_PLCMODES_CFG_REQ_FLAGS_HDS_IPV4 |
> VNIC_PLCMODES_CFG_REQ_FLAGS_HDS_IPV6);
> req->enables |=
> cpu_to_le32(VNIC_PLCMODES_CFG_REQ_ENABLES_HDS_THRESHOLD_VALID);
> - req->jumbo_thresh = cpu_to_le16(bp->rx_copybreak);
> req->hds_threshold = cpu_to_le16(bp->rx_copybreak);
> }
> req->vnic_id = cpu_to_le32(vnic->fw_vnic_id);
^ permalink raw reply [flat|nested] 32+ messages in thread

* Re: [PATCH net-next v4 2/8] bnxt_en: add support for tcp-data-split ethtool command
2024-10-25 5:02 ` Michael Chan
@ 2024-10-25 7:59 ` Taehee Yoo
2024-10-25 19:24 ` Andy Gospodarek
1 sibling, 0 replies; 32+ messages in thread
From: Taehee Yoo @ 2024-10-25 7:59 UTC (permalink / raw)
To: Michael Chan
Cc: davem, kuba, pabeni, edumazet, almasrymina, donald.hunter, corbet,
andrew+netdev, hawk, ilias.apalodimas, ast, daniel,
john.fastabend, dw, sdf, asml.silence, brett.creeley, linux-doc,
netdev, kory.maincent, maxime.chevallier, danieller, hengqi,
ecree.xilinx, przemyslaw.kitszel, hkallweit1, ahmed.zaki,
rrameshbabu, idosch, jiri, bigeasy, lorenzo, jdamato,
aleksander.lobakin, kaiyuanz, willemb, daniel.zahka,
Andrew Gospodarek
On Fri, Oct 25, 2024 at 2:02 PM Michael Chan
<michael.chan@broadcom.com> wrote:

Hi Michael,
Thank you so much for the review!
>
> On Tue, Oct 22, 2024 at 9:24 AM Taehee Yoo <ap420073@gmail.com> wrote:
> >
> > NICs that uses bnxt_en driver supports tcp-data-split feature by the
> > name of HDS(header-data-split).
> > But there is no implementation for the HDS to enable or disable by
> > ethtool.
> > Only getting the current HDS status is implemented and The HDS is just
> > automatically enabled only when either LRO, HW-GRO, or JUMBO is enabled.
> > The hds_threshold follows rx-copybreak value. and it was unchangeable.
> >
> > This implements `ethtool -G <interface name> tcp-data-split <value>`
> > command option.
> > The value can be <on>, <off>, and <auto> but the <auto> will be
> > automatically changed to <on>.
> >
> > HDS feature relies on the aggregation ring.
> > So, if HDS is enabled, the bnxt_en driver initializes the aggregation
> > ring.
> > This is the reason why BNXT_FLAG_AGG_RINGS contains HDS condition.
> >
> > Tested-by: Stanislav Fomichev <sdf@fomichev.me>
> > Signed-off-by: Taehee Yoo <ap420073@gmail.com>
> > ---
> >
> > v4:
> > - Do not support disable tcp-data-split.
> > - Add Test tag from Stanislav.
> >
> > v3:
> > - No changes.
> >
> > v2:
> > - Do not set hds_threshold to 0.
> >
> > drivers/net/ethernet/broadcom/bnxt/bnxt.c | 8 +++-----
> > drivers/net/ethernet/broadcom/bnxt/bnxt.h | 5 +++--
> > drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c | 13 +++++++++++++
> > 3 files changed, 19 insertions(+), 7 deletions(-)
> >
> > diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> > index 0f5fe9ba691d..91ea42ff9b17 100644
> > --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> > +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
>
> > @@ -6420,15 +6420,13 @@ static int bnxt_hwrm_vnic_set_hds(struct bnxt *bp, struct bnxt_vnic_info *vnic)
> >
> > req->flags = cpu_to_le32(VNIC_PLCMODES_CFG_REQ_FLAGS_JUMBO_PLACEMENT);
> > req->enables = cpu_to_le32(VNIC_PLCMODES_CFG_REQ_ENABLES_JUMBO_THRESH_VALID);
> > + req->jumbo_thresh = cpu_to_le16(bp->rx_buf_use_size);
> >
> > - if (BNXT_RX_PAGE_MODE(bp)) {
> > - req->jumbo_thresh = cpu_to_le16(bp->rx_buf_use_size);
>
> Please explain why this "if" condition is removed.
> BNXT_RX_PAGE_MODE() means that we are in XDP mode and we currently
> don't support HDS in XDP mode. Added Andy Gospo to CC so he can also
> comment.
Yes.
The "if" condition is removed to make rx-copybreak a pure software
feature. The current jumbo_thresh follows the rx-copybreak value;
however, I thought the rx-copybreak value should not affect any
hardware behavior, so following rx_buf_use_size instead of
rx_copybreak seemed fine. With this change, jumbo_thresh always
follows rx_buf_use_size, which is why I removed the "if" condition.
On second thought, though, this changes the default behavior, which
was not my intention.
What value should jumbo_thresh follow instead?
What do you think?
>
> > - } else {
> > + if (bp->flags & BNXT_FLAG_AGG_RINGS) {
> > req->flags |= cpu_to_le32(VNIC_PLCMODES_CFG_REQ_FLAGS_HDS_IPV4 |
> > VNIC_PLCMODES_CFG_REQ_FLAGS_HDS_IPV6);
> > req->enables |=
> > cpu_to_le32(VNIC_PLCMODES_CFG_REQ_ENABLES_HDS_THRESHOLD_VALID);
> > - req->jumbo_thresh = cpu_to_le16(bp->rx_copybreak);
> > req->hds_threshold = cpu_to_le16(bp->rx_copybreak);
> > }
> > req->vnic_id = cpu_to_le32(vnic->fw_vnic_id);
Thank you so much for the review!
Taehee Yoo
^ permalink raw reply [flat|nested] 32+ messages in thread

* Re: [PATCH net-next v4 2/8] bnxt_en: add support for tcp-data-split ethtool command
2024-10-25 5:02 ` Michael Chan
2024-10-25 7:59 ` Taehee Yoo
@ 2024-10-25 19:24 ` Andy Gospodarek
2024-10-25 22:00 ` Michael Chan
1 sibling, 1 reply; 32+ messages in thread
From: Andy Gospodarek @ 2024-10-25 19:24 UTC (permalink / raw)
To: Michael Chan
Cc: Taehee Yoo, davem, kuba, pabeni, edumazet, almasrymina,
donald.hunter, corbet, andrew+netdev, hawk, ilias.apalodimas, ast,
daniel, john.fastabend, dw, sdf, asml.silence, brett.creeley,
linux-doc, netdev, kory.maincent, maxime.chevallier, danieller,
hengqi, ecree.xilinx, przemyslaw.kitszel, hkallweit1, ahmed.zaki,
rrameshbabu, idosch, jiri, bigeasy, lorenzo, jdamato,
aleksander.lobakin, kaiyuanz, willemb, daniel.zahka,
Andrew Gospodarek
On Thu, Oct 24, 2024 at 10:02:30PM -0700, Michael Chan wrote:
> On Tue, Oct 22, 2024 at 9:24 AM Taehee Yoo <ap420073@gmail.com> wrote:
> >
> > NICs that uses bnxt_en driver supports tcp-data-split feature by the
> > name of HDS(header-data-split).
> > But there is no implementation for the HDS to enable or disable by
> > ethtool.
> > Only getting the current HDS status is implemented and The HDS is just
> > automatically enabled only when either LRO, HW-GRO, or JUMBO is enabled.
> > The hds_threshold follows rx-copybreak value. and it was unchangeable.
> >
> > This implements `ethtool -G <interface name> tcp-data-split <value>`
> > command option.
> > The value can be <on>, <off>, and <auto> but the <auto> will be
> > automatically changed to <on>.
> >
> > HDS feature relies on the aggregation ring.
> > So, if HDS is enabled, the bnxt_en driver initializes the aggregation
> > ring.
> > This is the reason why BNXT_FLAG_AGG_RINGS contains HDS condition.
> >
> > Tested-by: Stanislav Fomichev <sdf@fomichev.me>
> > Signed-off-by: Taehee Yoo <ap420073@gmail.com>
> > ---
> >
> > v4:
> > - Do not support disable tcp-data-split.
> > - Add Test tag from Stanislav.
> >
> > v3:
> > - No changes.
> >
> > v2:
> > - Do not set hds_threshold to 0.
> >
> > drivers/net/ethernet/broadcom/bnxt/bnxt.c | 8 +++-----
> > drivers/net/ethernet/broadcom/bnxt/bnxt.h | 5 +++--
> > drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c | 13 +++++++++++++
> > 3 files changed, 19 insertions(+), 7 deletions(-)
> >
> > diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> > index 0f5fe9ba691d..91ea42ff9b17 100644
> > --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> > +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
>
> > @@ -6420,15 +6420,13 @@ static int bnxt_hwrm_vnic_set_hds(struct bnxt *bp, struct bnxt_vnic_info *vnic)
> >
> > req->flags = cpu_to_le32(VNIC_PLCMODES_CFG_REQ_FLAGS_JUMBO_PLACEMENT);
> > req->enables = cpu_to_le32(VNIC_PLCMODES_CFG_REQ_ENABLES_JUMBO_THRESH_VALID);
> > + req->jumbo_thresh = cpu_to_le16(bp->rx_buf_use_size);
> >
> > - if (BNXT_RX_PAGE_MODE(bp)) {
> > - req->jumbo_thresh = cpu_to_le16(bp->rx_buf_use_size);
>
> Please explain why this "if" condition is removed.
> BNXT_RX_PAGE_MODE() means that we are in XDP mode and we currently
> don't support HDS in XDP mode. Added Andy Gospo to CC so he can also
> comment.
>
In bnxt_set_rx_skb_mode() we set BNXT_FLAG_RX_PAGE_MODE and clear
BNXT_FLAG_AGG_RINGS, so this should work. The only issue is that
there are spots in the driver where we check BNXT_RX_PAGE_MODE(bp)
to indicate that XDP single-buffer mode is enabled on the device.
If you need to respin this series, I would prefer a change like the
one below, keying off page mode being disabled and
BNXT_FLAG_AGG_RINGS being enabled before setting up HDS. This will
serve as a reminder that this is for XDP.
@@ -6418,15 +6418,13 @@ static int bnxt_hwrm_vnic_set_hds(struct bnxt *bp, struct bnxt_vnic_info *vnic)
req->flags = cpu_to_le32(VNIC_PLCMODES_CFG_REQ_FLAGS_JUMBO_PLACEMENT);
req->enables = cpu_to_le32(VNIC_PLCMODES_CFG_REQ_ENABLES_JUMBO_THRESH_VALID);
+ req->jumbo_thresh = cpu_to_le16(bp->rx_buf_use_size);
- if (BNXT_RX_PAGE_MODE(bp)) {
- req->jumbo_thresh = cpu_to_le16(bp->rx_buf_use_size);
- } else {
+ if (!BNXT_RX_PAGE_MODE(bp) && (bp->flags & BNXT_FLAG_AGG_RINGS)) {
req->flags |= cpu_to_le32(VNIC_PLCMODES_CFG_REQ_FLAGS_HDS_IPV4 |
VNIC_PLCMODES_CFG_REQ_FLAGS_HDS_IPV6);
req->enables |=
cpu_to_le32(VNIC_PLCMODES_CFG_REQ_ENABLES_HDS_THRESHOLD_VALID);
- req->jumbo_thresh = cpu_to_le16(bp->rx_copy_thresh);
req->hds_threshold = cpu_to_le16(bp->rx_copy_thresh);
}
req->vnic_id = cpu_to_le32(vnic->fw_vnic_id);
> > - } else {
> > + if (bp->flags & BNXT_FLAG_AGG_RINGS) {
> > req->flags |= cpu_to_le32(VNIC_PLCMODES_CFG_REQ_FLAGS_HDS_IPV4 |
> > VNIC_PLCMODES_CFG_REQ_FLAGS_HDS_IPV6);
> > req->enables |=
> > cpu_to_le32(VNIC_PLCMODES_CFG_REQ_ENABLES_HDS_THRESHOLD_VALID);
> > - req->jumbo_thresh = cpu_to_le16(bp->rx_copybreak);
> > req->hds_threshold = cpu_to_le16(bp->rx_copybreak);
> > }
> > req->vnic_id = cpu_to_le32(vnic->fw_vnic_id);
^ permalink raw reply [flat|nested] 32+ messages in thread

* Re: [PATCH net-next v4 2/8] bnxt_en: add support for tcp-data-split ethtool command
2024-10-25 19:24 ` Andy Gospodarek
@ 2024-10-25 22:00 ` Michael Chan
2024-10-26 5:11 ` Taehee Yoo
0 siblings, 1 reply; 32+ messages in thread
From: Michael Chan @ 2024-10-25 22:00 UTC (permalink / raw)
To: Andy Gospodarek
Cc: Taehee Yoo, davem, kuba, pabeni, edumazet, almasrymina,
donald.hunter, corbet, andrew+netdev, hawk, ilias.apalodimas, ast,
daniel, john.fastabend, dw, sdf, asml.silence, brett.creeley,
linux-doc, netdev, kory.maincent, maxime.chevallier, danieller,
hengqi, ecree.xilinx, przemyslaw.kitszel, hkallweit1, ahmed.zaki,
rrameshbabu, idosch, jiri, bigeasy, lorenzo, jdamato,
aleksander.lobakin, kaiyuanz, willemb, daniel.zahka
On Fri, Oct 25, 2024 at 12:24 PM Andy Gospodarek
<andrew.gospodarek@broadcom.com> wrote:
>
> On Thu, Oct 24, 2024 at 10:02:30PM -0700, Michael Chan wrote:
> > On Tue, Oct 22, 2024 at 9:24 AM Taehee Yoo <ap420073@gmail.com> wrote:
> > >
> > > NICs that uses bnxt_en driver supports tcp-data-split feature by the
> > > name of HDS(header-data-split).
> > > But there is no implementation for the HDS to enable or disable by
> > > ethtool.
> > > Only getting the current HDS status is implemented and The HDS is just
> > > automatically enabled only when either LRO, HW-GRO, or JUMBO is enabled.
> > > The hds_threshold follows rx-copybreak value. and it was unchangeable.
> > >
> > > This implements `ethtool -G <interface name> tcp-data-split <value>`
> > > command option.
> > > The value can be <on>, <off>, and <auto> but the <auto> will be
> > > automatically changed to <on>.
> > >
> > > HDS feature relies on the aggregation ring.
> > > So, if HDS is enabled, the bnxt_en driver initializes the aggregation
> > > ring.
> > > This is the reason why BNXT_FLAG_AGG_RINGS contains HDS condition.
> > >
> > > Tested-by: Stanislav Fomichev <sdf@fomichev.me>
> > > Signed-off-by: Taehee Yoo <ap420073@gmail.com>
> > > ---
> > >
> > > v4:
> > > - Do not support disable tcp-data-split.
> > > - Add Test tag from Stanislav.
> > >
> > > v3:
> > > - No changes.
> > >
> > > v2:
> > > - Do not set hds_threshold to 0.
> > >
> > > drivers/net/ethernet/broadcom/bnxt/bnxt.c | 8 +++-----
> > > drivers/net/ethernet/broadcom/bnxt/bnxt.h | 5 +++--
> > > drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c | 13 +++++++++++++
> > > 3 files changed, 19 insertions(+), 7 deletions(-)
> > >
> > > diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> > > index 0f5fe9ba691d..91ea42ff9b17 100644
> > > --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> > > +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> >
> > > @@ -6420,15 +6420,13 @@ static int bnxt_hwrm_vnic_set_hds(struct bnxt *bp, struct bnxt_vnic_info *vnic)
> > >
> > > req->flags = cpu_to_le32(VNIC_PLCMODES_CFG_REQ_FLAGS_JUMBO_PLACEMENT);
> > > req->enables = cpu_to_le32(VNIC_PLCMODES_CFG_REQ_ENABLES_JUMBO_THRESH_VALID);
> > > + req->jumbo_thresh = cpu_to_le16(bp->rx_buf_use_size);
> > >
> > > - if (BNXT_RX_PAGE_MODE(bp)) {
> > > - req->jumbo_thresh = cpu_to_le16(bp->rx_buf_use_size);
> >
> > Please explain why this "if" condition is removed.
> > BNXT_RX_PAGE_MODE() means that we are in XDP mode and we currently
> > don't support HDS in XDP mode. Added Andy Gospo to CC so he can also
> > comment.
> >
>
> In bnxt_set_rx_skb_mode we set BNXT_FLAG_RX_PAGE_MODE and clear
> BNXT_FLAG_AGG_RINGS
The BNXT_FLAG_AGG_RINGS flag is true if the JUMBO, GRO, or LRO flag is
set. So even though it is initially cleared in
bnxt_set_rx_skb_mode(), we'll set the JUMBO flag if we are in
multi-buffer XDP mode. Again, we don't enable HDS in any XDP mode so
I think we need to keep the original logic here to skip setting the
HDS threshold if BNXT_FLAG_RX_PAGE_MODE is set.
> , so this should work. The only issue is that we
> have spots in the driver where we check BNXT_RX_PAGE_MODE(bp) to
> indicate that XDP single-buffer mode is enabled on the device.
>
> If you need to respin this series I would prefer that the change is like
> below to key off the page mode being disabled and BNXT_FLAG_AGG_RINGS
> being enabled to setup HDS. This will serve as a reminder that this is
> for XDP.
>
> @@ -6418,15 +6418,13 @@ static int bnxt_hwrm_vnic_set_hds(struct bnxt *bp, struct bnxt_vnic_info *vnic)
>
> req->flags = cpu_to_le32(VNIC_PLCMODES_CFG_REQ_FLAGS_JUMBO_PLACEMENT);
> req->enables = cpu_to_le32(VNIC_PLCMODES_CFG_REQ_ENABLES_JUMBO_THRESH_VALID);
> + req->jumbo_thresh = cpu_to_le16(bp->rx_buf_use_size);
>
> - if (BNXT_RX_PAGE_MODE(bp)) {
> - req->jumbo_thresh = cpu_to_le16(bp->rx_buf_use_size);
> - } else {
> + if (!BNXT_RX_PAGE_MODE(bp) && (bp->flags & BNXT_FLAG_AGG_RINGS)) {
> req->flags |= cpu_to_le32(VNIC_PLCMODES_CFG_REQ_FLAGS_HDS_IPV4 |
> VNIC_PLCMODES_CFG_REQ_FLAGS_HDS_IPV6);
> req->enables |=
> cpu_to_le32(VNIC_PLCMODES_CFG_REQ_ENABLES_HDS_THRESHOLD_VALID);
> - req->jumbo_thresh = cpu_to_le16(bp->rx_copy_thresh);
> req->hds_threshold = cpu_to_le16(bp->rx_copy_thresh);
> }
> req->vnic_id = cpu_to_le32(vnic->fw_vnic_id);
>
^ permalink raw reply [flat|nested] 32+ messages in thread

* Re: [PATCH net-next v4 2/8] bnxt_en: add support for tcp-data-split ethtool command
2024-10-25 22:00 ` Michael Chan
@ 2024-10-26 5:11 ` Taehee Yoo
2024-10-30 20:39 ` Andy Gospodarek
0 siblings, 1 reply; 32+ messages in thread
From: Taehee Yoo @ 2024-10-26 5:11 UTC (permalink / raw)
To: Michael Chan
Cc: Andy Gospodarek, davem, kuba, pabeni, edumazet, almasrymina,
donald.hunter, corbet, andrew+netdev, hawk, ilias.apalodimas, ast,
daniel, john.fastabend, dw, sdf, asml.silence, brett.creeley,
linux-doc, netdev, kory.maincent, maxime.chevallier, danieller,
hengqi, ecree.xilinx, przemyslaw.kitszel, hkallweit1, ahmed.zaki,
rrameshbabu, idosch, jiri, bigeasy, lorenzo, jdamato,
aleksander.lobakin, kaiyuanz, willemb, daniel.zahka
On Sat, Oct 26, 2024 at 7:00 AM Michael Chan <michael.chan@broadcom.com> wrote:
>
> On Fri, Oct 25, 2024 at 12:24 PM Andy Gospodarek
> <andrew.gospodarek@broadcom.com> wrote:
Hi Andy,
Thank you so much for your review!
> >
> > On Thu, Oct 24, 2024 at 10:02:30PM -0700, Michael Chan wrote:
> > > On Tue, Oct 22, 2024 at 9:24 AM Taehee Yoo <ap420073@gmail.com> wrote:
> > > >
> > > > NICs that uses bnxt_en driver supports tcp-data-split feature by the
> > > > name of HDS(header-data-split).
> > > > But there is no implementation for the HDS to enable or disable by
> > > > ethtool.
> > > > Only getting the current HDS status is implemented and The HDS is just
> > > > automatically enabled only when either LRO, HW-GRO, or JUMBO is enabled.
> > > > The hds_threshold follows rx-copybreak value. and it was unchangeable.
> > > >
> > > > This implements `ethtool -G <interface name> tcp-data-split <value>`
> > > > command option.
> > > > The value can be <on>, <off>, and <auto> but the <auto> will be
> > > > automatically changed to <on>.
> > > >
> > > > HDS feature relies on the aggregation ring.
> > > > So, if HDS is enabled, the bnxt_en driver initializes the aggregation
> > > > ring.
> > > > This is the reason why BNXT_FLAG_AGG_RINGS contains HDS condition.
> > > >
> > > > Tested-by: Stanislav Fomichev <sdf@fomichev.me>
> > > > Signed-off-by: Taehee Yoo <ap420073@gmail.com>
> > > > ---
> > > >
> > > > v4:
> > > > - Do not support disable tcp-data-split.
> > > > - Add Test tag from Stanislav.
> > > >
> > > > v3:
> > > > - No changes.
> > > >
> > > > v2:
> > > > - Do not set hds_threshold to 0.
> > > >
> > > > drivers/net/ethernet/broadcom/bnxt/bnxt.c | 8 +++-----
> > > > drivers/net/ethernet/broadcom/bnxt/bnxt.h | 5 +++--
> > > > drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c | 13 +++++++++++++
> > > > 3 files changed, 19 insertions(+), 7 deletions(-)
> > > >
> > > > diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> > > > index 0f5fe9ba691d..91ea42ff9b17 100644
> > > > --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> > > > +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> > >
> > > > @@ -6420,15 +6420,13 @@ static int bnxt_hwrm_vnic_set_hds(struct bnxt *bp, struct bnxt_vnic_info *vnic)
> > > >
> > > > req->flags = cpu_to_le32(VNIC_PLCMODES_CFG_REQ_FLAGS_JUMBO_PLACEMENT);
> > > > req->enables = cpu_to_le32(VNIC_PLCMODES_CFG_REQ_ENABLES_JUMBO_THRESH_VALID);
> > > > + req->jumbo_thresh = cpu_to_le16(bp->rx_buf_use_size);
> > > >
> > > > - if (BNXT_RX_PAGE_MODE(bp)) {
> > > > - req->jumbo_thresh = cpu_to_le16(bp->rx_buf_use_size);
> > >
> > > Please explain why this "if" condition is removed.
> > > BNXT_RX_PAGE_MODE() means that we are in XDP mode and we currently
> > > don't support HDS in XDP mode. Added Andy Gospo to CC so he can also
> > > comment.
> > >
> >
> > In bnxt_set_rx_skb_mode we set BNXT_FLAG_RX_PAGE_MODE and clear
> > BNXT_FLAG_AGG_RINGS
>
> The BNXT_FLAG_AGG_RINGS flag is true if the JUMBO, GRO, or LRO flag is
> set. So even though it is initially cleared in
> bnxt_set_rx_skb_mode(), we'll set the JUMBO flag if we are in
> multi-buffer XDP mode. Again, we don't enable HDS in any XDP mode so
> I think we need to keep the original logic here to skip setting the
> HDS threshold if BNXT_FLAG_RX_PAGE_MODE is set.
I thought HDS is disallowed only when single-buffer XDP is attached.
With this series, the core API rejects tcp-data-split only when a
single-buffer XDP program is attached, but it allows tcp-data-split
to be set when multi-buffer XDP is attached:
+	if (kernel_ringparam.tcp_data_split == ETHTOOL_TCP_DATA_SPLIT_ENABLED &&
+	    dev_xdp_sb_prog_count(dev)) {
+		NL_SET_ERR_MSG(info->extack,
+			       "tcp-data-split can not be enabled with single buffer XDP");
+		return -EINVAL;
+	}
I think other drivers may allow tcp-data-split with multi-buffer XDP,
so I would prefer not to remove this condition check from the core.
I will not set HDS when XDP is attached in bnxt_hwrm_vnic_set_hds().
In addition, I think we need to add a condition check for XDP in
bnxt_set_ringparam().
>
> > , so this should work. The only issue is that we
> > have spots in the driver where we check BNXT_RX_PAGE_MODE(bp) to
> > indicate that XDP single-buffer mode is enabled on the device.
> >
> > If you need to respin this series I would prefer that the change is like
> > below to key off the page mode being disabled and BNXT_FLAG_AGG_RINGS
> > being enabled to setup HDS. This will serve as a reminder that this is
> > for XDP.
> >
> > @@ -6418,15 +6418,13 @@ static int bnxt_hwrm_vnic_set_hds(struct bnxt *bp, struct bnxt_vnic_info *vnic)
> >
> > req->flags = cpu_to_le32(VNIC_PLCMODES_CFG_REQ_FLAGS_JUMBO_PLACEMENT);
> > req->enables = cpu_to_le32(VNIC_PLCMODES_CFG_REQ_ENABLES_JUMBO_THRESH_VALID);
> > + req->jumbo_thresh = cpu_to_le16(bp->rx_buf_use_size);
> >
> > - if (BNXT_RX_PAGE_MODE(bp)) {
> > - req->jumbo_thresh = cpu_to_le16(bp->rx_buf_use_size);
> > - } else {
> > + if (!BNXT_RX_PAGE_MODE(bp) && (bp->flags & BNXT_FLAG_AGG_RINGS)) {
> > req->flags |= cpu_to_le32(VNIC_PLCMODES_CFG_REQ_FLAGS_HDS_IPV4 |
> > VNIC_PLCMODES_CFG_REQ_FLAGS_HDS_IPV6);
> > req->enables |=
> > cpu_to_le32(VNIC_PLCMODES_CFG_REQ_ENABLES_HDS_THRESHOLD_VALID);
> > - req->jumbo_thresh = cpu_to_le16(bp->rx_copy_thresh);
> > req->hds_threshold = cpu_to_le16(bp->rx_copy_thresh);
> > }
> > req->vnic_id = cpu_to_le32(vnic->fw_vnic_id);
> >
Thanks a lot!
Taehee Yoo
^ permalink raw reply [flat|nested] 32+ messages in thread

* Re: [PATCH net-next v4 2/8] bnxt_en: add support for tcp-data-split ethtool command
2024-10-26 5:11 ` Taehee Yoo
@ 2024-10-30 20:39 ` Andy Gospodarek
2024-10-31 5:20 ` Taehee Yoo
0 siblings, 1 reply; 32+ messages in thread
From: Andy Gospodarek @ 2024-10-30 20:39 UTC (permalink / raw)
To: Taehee Yoo
Cc: Michael Chan, Andy Gospodarek, davem, kuba, pabeni, edumazet,
almasrymina, donald.hunter, corbet, andrew+netdev, hawk,
ilias.apalodimas, ast, daniel, john.fastabend, dw, sdf,
asml.silence, brett.creeley, linux-doc, netdev, kory.maincent,
maxime.chevallier, danieller, hengqi, ecree.xilinx,
przemyslaw.kitszel, hkallweit1, ahmed.zaki, rrameshbabu, idosch,
jiri, bigeasy, lorenzo, jdamato, aleksander.lobakin, kaiyuanz,
willemb, daniel.zahka
On Sat, Oct 26, 2024 at 02:11:15PM +0900, Taehee Yoo wrote:
> On Sat, Oct 26, 2024 at 7:00 AM Michael Chan <michael.chan@broadcom.com> wrote:
> >
> > On Fri, Oct 25, 2024 at 12:24 PM Andy Gospodarek
> > <andrew.gospodarek@broadcom.com> wrote:
>
> Hi Andy,
> Thank you so much for your review!
>
> > >
> > > On Thu, Oct 24, 2024 at 10:02:30PM -0700, Michael Chan wrote:
> > > > On Tue, Oct 22, 2024 at 9:24 AM Taehee Yoo <ap420073@gmail.com> wrote:
> > > > >
> > > > > NICs that uses bnxt_en driver supports tcp-data-split feature by the
> > > > > name of HDS(header-data-split).
> > > > > But there is no implementation for the HDS to enable or disable by
> > > > > ethtool.
> > > > > Only getting the current HDS status is implemented and The HDS is just
> > > > > automatically enabled only when either LRO, HW-GRO, or JUMBO is enabled.
> > > > > The hds_threshold follows rx-copybreak value. and it was unchangeable.
> > > > >
> > > > > This implements `ethtool -G <interface name> tcp-data-split <value>`
> > > > > command option.
> > > > > The value can be <on>, <off>, and <auto> but the <auto> will be
> > > > > automatically changed to <on>.
> > > > >
> > > > > HDS feature relies on the aggregation ring.
> > > > > So, if HDS is enabled, the bnxt_en driver initializes the aggregation
> > > > > ring.
> > > > > This is the reason why BNXT_FLAG_AGG_RINGS contains HDS condition.
> > > > >
> > > > > Tested-by: Stanislav Fomichev <sdf@fomichev.me>
> > > > > Signed-off-by: Taehee Yoo <ap420073@gmail.com>
> > > > > ---
> > > > >
> > > > > v4:
> > > > > - Do not support disable tcp-data-split.
> > > > > - Add Test tag from Stanislav.
> > > > >
> > > > > v3:
> > > > > - No changes.
> > > > >
> > > > > v2:
> > > > > - Do not set hds_threshold to 0.
> > > > >
> > > > > drivers/net/ethernet/broadcom/bnxt/bnxt.c | 8 +++-----
> > > > > drivers/net/ethernet/broadcom/bnxt/bnxt.h | 5 +++--
> > > > > drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c | 13 +++++++++++++
> > > > > 3 files changed, 19 insertions(+), 7 deletions(-)
> > > > >
> > > > > diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> > > > > index 0f5fe9ba691d..91ea42ff9b17 100644
> > > > > --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> > > > > +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> > > >
> > > > > @@ -6420,15 +6420,13 @@ static int bnxt_hwrm_vnic_set_hds(struct bnxt *bp, struct bnxt_vnic_info *vnic)
> > > > >
> > > > > req->flags = cpu_to_le32(VNIC_PLCMODES_CFG_REQ_FLAGS_JUMBO_PLACEMENT);
> > > > > req->enables = cpu_to_le32(VNIC_PLCMODES_CFG_REQ_ENABLES_JUMBO_THRESH_VALID);
> > > > > + req->jumbo_thresh = cpu_to_le16(bp->rx_buf_use_size);
> > > > >
> > > > > - if (BNXT_RX_PAGE_MODE(bp)) {
> > > > > - req->jumbo_thresh = cpu_to_le16(bp->rx_buf_use_size);
> > > >
> > > > Please explain why this "if" condition is removed.
> > > > BNXT_RX_PAGE_MODE() means that we are in XDP mode and we currently
> > > > don't support HDS in XDP mode. Added Andy Gospo to CC so he can also
> > > > comment.
> > > >
> > >
> > > In bnxt_set_rx_skb_mode we set BNXT_FLAG_RX_PAGE_MODE and clear
> > > BNXT_FLAG_AGG_RINGS
> >
> > The BNXT_FLAG_AGG_RINGS flag is true if the JUMBO, GRO, or LRO flag is
> > set. So even though it is initially cleared in
> > bnxt_set_rx_skb_mode(), we'll set the JUMBO flag if we are in
> > multi-buffer XDP mode. Again, we don't enable HDS in any XDP mode so
> > I think we need to keep the original logic here to skip setting the
> > HDS threshold if BNXT_FLAG_RX_PAGE_MODE is set.
>
> I thought the HDS is disallowed only when single-buffer XDP is set.
> By this series, Core API disallows tcp-data-split only when
> single-buffer XDP is set, but it allows tcp-data-split to set when
> multi-buffer XDP is set.
So you are saying that a user could set copybreak with ethtool (included
in patch 1) and when a multibuffer XDP program is attached to an
interface with an MTU of 9k, only the header will be in the first page
and all the TCP data will be in the pages that follow?
> + if (kernel_ringparam.tcp_data_split == ETHTOOL_TCP_DATA_SPLIT_ENABLED &&
> + dev_xdp_sb_prog_count(dev)) {
> + NL_SET_ERR_MSG(info->extack,
> + "tcp-data-split can not be enabled with
> single buffer XDP");
> + return -EINVAL;
> + }
>
> I think other drivers would allow tcp-data-split on multi buffer XDP,
> so I wouldn't like to remove this condition check code.
>
I have no problem keeping that logic in the core kernel. I'm just
asking you to please preserve the existing logic since it is
functionally equivalent and easier to read/compare to other spots where
XDP single-buffer mode is used.
> I will not set HDS if XDP is set in the bnxt_hwrm_vnic_set_hds()
> In addition, I think we need to add a condition to check XDP is set in
> bnxt_set_ringparam().
>
> >
> > > , so this should work. The only issue is that we
> > > have spots in the driver where we check BNXT_RX_PAGE_MODE(bp) to
> > > indicate that XDP single-buffer mode is enabled on the device.
> > >
> > > If you need to respin this series I would prefer that the change is like
> > > below to key off the page mode being disabled and BNXT_FLAG_AGG_RINGS
> > > being enabled to setup HDS. This will serve as a reminder that this is
> > > for XDP.
> > >
> > > @@ -6418,15 +6418,13 @@ static int bnxt_hwrm_vnic_set_hds(struct bnxt *bp, struct bnxt_vnic_info *vnic)
> > >
> > > req->flags = cpu_to_le32(VNIC_PLCMODES_CFG_REQ_FLAGS_JUMBO_PLACEMENT);
> > > req->enables = cpu_to_le32(VNIC_PLCMODES_CFG_REQ_ENABLES_JUMBO_THRESH_VALID);
> > > + req->jumbo_thresh = cpu_to_le16(bp->rx_buf_use_size);
> > >
> > > - if (BNXT_RX_PAGE_MODE(bp)) {
> > > - req->jumbo_thresh = cpu_to_le16(bp->rx_buf_use_size);
> > > - } else {
> > > + if (!BNXT_RX_PAGE_MODE(bp) && (bp->flags & BNXT_FLAG_AGG_RINGS)) {
> > > req->flags |= cpu_to_le32(VNIC_PLCMODES_CFG_REQ_FLAGS_HDS_IPV4 |
> > > VNIC_PLCMODES_CFG_REQ_FLAGS_HDS_IPV6);
> > > req->enables |=
> > > cpu_to_le32(VNIC_PLCMODES_CFG_REQ_ENABLES_HDS_THRESHOLD_VALID);
> > > - req->jumbo_thresh = cpu_to_le16(bp->rx_copy_thresh);
> > > req->hds_threshold = cpu_to_le16(bp->rx_copy_thresh);
> > > }
> > > req->vnic_id = cpu_to_le32(vnic->fw_vnic_id);
> > >
>
> Thanks a lot!
> Taehee Yoo
^ permalink raw reply [flat|nested] 32+ messages in thread

* Re: [PATCH net-next v4 2/8] bnxt_en: add support for tcp-data-split ethtool command
2024-10-30 20:39 ` Andy Gospodarek
@ 2024-10-31 5:20 ` Taehee Yoo
0 siblings, 0 replies; 32+ messages in thread
From: Taehee Yoo @ 2024-10-31 5:20 UTC (permalink / raw)
To: Andy Gospodarek
Cc: Michael Chan, davem, kuba, pabeni, edumazet, almasrymina,
donald.hunter, corbet, andrew+netdev, hawk, ilias.apalodimas, ast,
daniel, john.fastabend, dw, sdf, asml.silence, brett.creeley,
linux-doc, netdev, kory.maincent, maxime.chevallier, danieller,
hengqi, ecree.xilinx, przemyslaw.kitszel, hkallweit1, ahmed.zaki,
rrameshbabu, idosch, jiri, bigeasy, lorenzo, jdamato,
aleksander.lobakin, kaiyuanz, willemb, daniel.zahka
On Thu, Oct 31, 2024 at 5:39 AM Andy Gospodarek
<andrew.gospodarek@broadcom.com> wrote:
>
> On Sat, Oct 26, 2024 at 02:11:15PM +0900, Taehee Yoo wrote:
> > On Sat, Oct 26, 2024 at 7:00 AM Michael Chan <michael.chan@broadcom.com> wrote:
> > >
> > > On Fri, Oct 25, 2024 at 12:24 PM Andy Gospodarek
> > > <andrew.gospodarek@broadcom.com> wrote:
> >
> > Hi Andy,
> > Thank you so much for your review!
> >
> > > >
> > > > On Thu, Oct 24, 2024 at 10:02:30PM -0700, Michael Chan wrote:
> > > > > On Tue, Oct 22, 2024 at 9:24 AM Taehee Yoo <ap420073@gmail.com> wrote:
> > > > > >
> > > > > > NICs that uses bnxt_en driver supports tcp-data-split feature by the
> > > > > > name of HDS(header-data-split).
> > > > > > But there is no implementation for the HDS to enable or disable by
> > > > > > ethtool.
> > > > > > Only getting the current HDS status is implemented and The HDS is just
> > > > > > automatically enabled only when either LRO, HW-GRO, or JUMBO is enabled.
> > > > > > The hds_threshold follows rx-copybreak value. and it was unchangeable.
> > > > > >
> > > > > > This implements `ethtool -G <interface name> tcp-data-split <value>`
> > > > > > command option.
> > > > > > The value can be <on>, <off>, and <auto> but the <auto> will be
> > > > > > automatically changed to <on>.
> > > > > >
> > > > > > HDS feature relies on the aggregation ring.
> > > > > > So, if HDS is enabled, the bnxt_en driver initializes the aggregation
> > > > > > ring.
> > > > > > This is the reason why BNXT_FLAG_AGG_RINGS contains HDS condition.
> > > > > >
> > > > > > Tested-by: Stanislav Fomichev <sdf@fomichev.me>
> > > > > > Signed-off-by: Taehee Yoo <ap420073@gmail.com>
> > > > > > ---
> > > > > >
> > > > > > v4:
> > > > > > - Do not support disable tcp-data-split.
> > > > > > - Add Test tag from Stanislav.
> > > > > >
> > > > > > v3:
> > > > > > - No changes.
> > > > > >
> > > > > > v2:
> > > > > > - Do not set hds_threshold to 0.
> > > > > >
> > > > > > drivers/net/ethernet/broadcom/bnxt/bnxt.c | 8 +++-----
> > > > > > drivers/net/ethernet/broadcom/bnxt/bnxt.h | 5 +++--
> > > > > > drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c | 13 +++++++++++++
> > > > > > 3 files changed, 19 insertions(+), 7 deletions(-)
> > > > > >
> > > > > > diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> > > > > > index 0f5fe9ba691d..91ea42ff9b17 100644
> > > > > > --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> > > > > > +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> > > > >
> > > > > > @@ -6420,15 +6420,13 @@ static int bnxt_hwrm_vnic_set_hds(struct bnxt *bp, struct bnxt_vnic_info *vnic)
> > > > > >
> > > > > > req->flags = cpu_to_le32(VNIC_PLCMODES_CFG_REQ_FLAGS_JUMBO_PLACEMENT);
> > > > > > req->enables = cpu_to_le32(VNIC_PLCMODES_CFG_REQ_ENABLES_JUMBO_THRESH_VALID);
> > > > > > + req->jumbo_thresh = cpu_to_le16(bp->rx_buf_use_size);
> > > > > >
> > > > > > - if (BNXT_RX_PAGE_MODE(bp)) {
> > > > > > - req->jumbo_thresh = cpu_to_le16(bp->rx_buf_use_size);
> > > > >
> > > > > Please explain why this "if" condition is removed.
> > > > > BNXT_RX_PAGE_MODE() means that we are in XDP mode and we currently
> > > > > don't support HDS in XDP mode. Added Andy Gospo to CC so he can also
> > > > > comment.
> > > > >
> > > >
> > > > In bnxt_set_rx_skb_mode we set BNXT_FLAG_RX_PAGE_MODE and clear
> > > > BNXT_FLAG_AGG_RINGS
> > >
> > > The BNXT_FLAG_AGG_RINGS flag is true if the JUMBO, GRO, or LRO flag is
> > > set. So even though it is initially cleared in
> > > bnxt_set_rx_skb_mode(), we'll set the JUMBO flag if we are in
> > > multi-buffer XDP mode. Again, we don't enable HDS in any XDP mode so
> > > I think we need to keep the original logic here to skip setting the
> > > HDS threshold if BNXT_FLAG_RX_PAGE_MODE is set.
> >
> > I thought HDS was disallowed only when single-buffer XDP is set.
> > With this series, the core API disallows tcp-data-split only when
> > single-buffer XDP is set, but allows tcp-data-split to be set when
> > multi-buffer XDP is set.
>
> So you are saying that a user could set copybreak with ethtool (included
> in patch 1) and when a multibuffer XDP program is attached to an
> interface with an MTU of 9k, only the header will be in the first page
> and all the TCP data will be in the pages that follow?
>
I think you are asking about `hds_threshold = bp->rx_copybreak`, right?
With this patchset, rx-copybreak becomes a pure software feature.
The hds_threshold value will be set by `ethtool -G eth0 header-data-split-thresh N`.
This is implemented in patch 4/8.
Sorry, I missed mentioning in this commit message that the hds_threshold
value will no longer follow bp->rx_copybreak.
If HDS were allowed while a multi-buffer XDP program is attached, the XDP
program would see only the header on the first page. But as Michael and
you suggested, HDS is not going to be allowed if XDP is attached.
If I misunderstood your question, please let me know!
> > + if (kernel_ringparam.tcp_data_split == ETHTOOL_TCP_DATA_SPLIT_ENABLED &&
> > + dev_xdp_sb_prog_count(dev)) {
> > + NL_SET_ERR_MSG(info->extack,
> > + "tcp-data-split can not be enabled with
> > single buffer XDP");
> > + return -EINVAL;
> > + }
> >
> > I think other drivers may allow tcp-data-split with multi-buffer XDP,
> > so I would prefer not to remove this condition check code.
> >
>
> I have no problem keeping that logic in the core kernel. I'm just
> asking you to please preserve the existing logic since it is
> functionally equivalent and easier to read/compare to other spots where
> XDP single-buffer mode is used.
Thanks a lot for the explanation and confirmation!
I will preserve the existing logic.
>
> > I will not set HDS if XDP is set in bnxt_hwrm_vnic_set_hds().
> > In addition, I think we need to add a condition to check whether XDP
> > is set in bnxt_set_ringparam().
> >
> > >
> > > > , so this should work. The only issue is that we
> > > > have spots in the driver where we check BNXT_RX_PAGE_MODE(bp) to
> > > > indicate that XDP single-buffer mode is enabled on the device.
> > > >
> > > > If you need to respin this series I would prefer that the change is like
> > > > below to key off the page mode being disabled and BNXT_FLAG_AGG_RINGS
> > > > being enabled to set up HDS. This will serve as a reminder that this is
> > > > for XDP.
> > > >
> > > > @@ -6418,15 +6418,13 @@ static int bnxt_hwrm_vnic_set_hds(struct bnxt *bp, struct bnxt_vnic_info *vnic)
> > > >
> > > > req->flags = cpu_to_le32(VNIC_PLCMODES_CFG_REQ_FLAGS_JUMBO_PLACEMENT);
> > > > req->enables = cpu_to_le32(VNIC_PLCMODES_CFG_REQ_ENABLES_JUMBO_THRESH_VALID);
> > > > + req->jumbo_thresh = cpu_to_le16(bp->rx_buf_use_size);
> > > >
> > > > - if (BNXT_RX_PAGE_MODE(bp)) {
> > > > - req->jumbo_thresh = cpu_to_le16(bp->rx_buf_use_size);
> > > > - } else {
> > > > + if (!BNXT_RX_PAGE_MODE(bp) && (bp->flags & BNXT_FLAG_AGG_RINGS)) {
> > > > req->flags |= cpu_to_le32(VNIC_PLCMODES_CFG_REQ_FLAGS_HDS_IPV4 |
> > > > VNIC_PLCMODES_CFG_REQ_FLAGS_HDS_IPV6);
> > > > req->enables |=
> > > > cpu_to_le32(VNIC_PLCMODES_CFG_REQ_ENABLES_HDS_THRESHOLD_VALID);
> > > > - req->jumbo_thresh = cpu_to_le16(bp->rx_copy_thresh);
> > > > req->hds_threshold = cpu_to_le16(bp->rx_copy_thresh);
> > > > }
> > > > req->vnic_id = cpu_to_le32(vnic->fw_vnic_id);
> > > >
> >
> > Thanks a lot!
> > Taehee Yoo
^ permalink raw reply [flat|nested] 32+ messages in thread
* [PATCH net-next v4 3/8] net: ethtool: add support for configuring header-data-split-thresh
2024-10-22 16:23 [PATCH net-next v4 0/7] bnxt_en: implement device memory TCP for bnxt Taehee Yoo
2024-10-22 16:23 ` [PATCH net-next v4 1/8] bnxt_en: add support for rx-copybreak ethtool command Taehee Yoo
2024-10-22 16:23 ` [PATCH net-next v4 2/8] bnxt_en: add support for tcp-data-split " Taehee Yoo
@ 2024-10-22 16:23 ` Taehee Yoo
2024-11-01 14:55 ` Mina Almasry
2024-10-22 16:23 ` [PATCH net-next v4 4/8] bnxt_en: add support for header-data-split-thresh ethtool command Taehee Yoo
` (4 subsequent siblings)
7 siblings, 1 reply; 32+ messages in thread
From: Taehee Yoo @ 2024-10-22 16:23 UTC (permalink / raw)
To: davem, kuba, pabeni, edumazet, almasrymina, donald.hunter, corbet,
michael.chan, andrew+netdev, hawk, ilias.apalodimas, ast, daniel,
john.fastabend, dw, sdf, asml.silence, brett.creeley, linux-doc,
netdev
Cc: kory.maincent, maxime.chevallier, danieller, hengqi, ecree.xilinx,
przemyslaw.kitszel, hkallweit1, ahmed.zaki, rrameshbabu, idosch,
jiri, bigeasy, lorenzo, jdamato, aleksander.lobakin, kaiyuanz,
willemb, daniel.zahka, ap420073
The header-data-split-thresh option configures the threshold value of
header-data-split.
If a received packet is larger than this threshold value, the packet
will be split into header and payload.
The header usually means the TCP or UDP header, but this depends on the
driver implementation.
The bnxt_en driver supports HDS (Header-Data-Split) configuration at the
FW level, affecting both TCP and UDP.
So, if header-data-split-thresh is set, it affects both UDP and TCP packets.
Example:
# ethtool -G <interface name> header-data-split-thresh <value>
# ethtool -G enp14s0f0np0 tcp-data-split on header-data-split-thresh 256
# ethtool -g enp14s0f0np0
Ring parameters for enp14s0f0np0:
Pre-set maximums:
...
Header data split thresh: 256
Current hardware settings:
...
TCP data split: on
Header data split thresh: 256
The default/min/max values are not defined in ethtool, so drivers
should define them themselves.
A value of 0 means that the header and payload of all TCP/UDP packets
will be split.
In general, HDS can increase host memory and PCIe bus overhead because
it copies data twice.
So users should consider the overhead of HDS.
For example, suppose the HDS threshold is 0, copybreak is 256, and a
packet's payload is 8 bytes.
Two pages are used, one for the header and one for the payload.
Because of copybreak, only the header page is copied and can be reused
immediately, while the payload page remains in use.
If the HDS threshold were larger than 8, both header and payload would
be copied and the page could be recycled immediately.
So, in general, too low an HDS threshold has larger performance
disadvantages than advantages.
Users should consider the overhead of this feature.
Tested-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
---
v4:
- Fix 80 character wrap.
- Rename from tcp-data-split-thresh to header-data-split-thresh
- Add description about overhead of HDS.
- Add ETHTOOL_RING_USE_HDS_THRS flag.
- Add dev_xdp_sb_prog_count() helper.
- Add Test tag from Stanislav.
v3:
- Fix documentation and ynl
- Update error messages
- Validate configuration of tcp-data-split and tcp-data-split-thresh
v2:
- Patch added.
Documentation/netlink/specs/ethtool.yaml | 8 ++
Documentation/networking/ethtool-netlink.rst | 79 ++++++++++++--------
include/linux/ethtool.h | 6 ++
include/linux/netdevice.h | 1 +
include/uapi/linux/ethtool_netlink.h | 2 +
net/core/dev.c | 13 ++++
net/ethtool/netlink.h | 2 +-
net/ethtool/rings.c | 37 ++++++++-
8 files changed, 115 insertions(+), 33 deletions(-)
diff --git a/Documentation/netlink/specs/ethtool.yaml b/Documentation/netlink/specs/ethtool.yaml
index 6a050d755b9c..3e1f54324cbc 100644
--- a/Documentation/netlink/specs/ethtool.yaml
+++ b/Documentation/netlink/specs/ethtool.yaml
@@ -215,6 +215,12 @@ attribute-sets:
-
name: tx-push-buf-len-max
type: u32
+ -
+ name: header-data-split-thresh
+ type: u32
+ -
+ name: header-data-split-thresh-max
+ type: u32
-
name: mm-stat
@@ -1393,6 +1399,8 @@ operations:
- rx-push
- tx-push-buf-len
- tx-push-buf-len-max
+ - header-data-split-thresh
+ - header-data-split-thresh-max
dump: *ring-get-op
-
name: rings-set
diff --git a/Documentation/networking/ethtool-netlink.rst b/Documentation/networking/ethtool-netlink.rst
index 295563e91082..513eb1517f53 100644
--- a/Documentation/networking/ethtool-netlink.rst
+++ b/Documentation/networking/ethtool-netlink.rst
@@ -875,24 +875,35 @@ Request contents:
Kernel response contents:
- ======================================= ====== ===========================
- ``ETHTOOL_A_RINGS_HEADER`` nested reply header
- ``ETHTOOL_A_RINGS_RX_MAX`` u32 max size of RX ring
- ``ETHTOOL_A_RINGS_RX_MINI_MAX`` u32 max size of RX mini ring
- ``ETHTOOL_A_RINGS_RX_JUMBO_MAX`` u32 max size of RX jumbo ring
- ``ETHTOOL_A_RINGS_TX_MAX`` u32 max size of TX ring
- ``ETHTOOL_A_RINGS_RX`` u32 size of RX ring
- ``ETHTOOL_A_RINGS_RX_MINI`` u32 size of RX mini ring
- ``ETHTOOL_A_RINGS_RX_JUMBO`` u32 size of RX jumbo ring
- ``ETHTOOL_A_RINGS_TX`` u32 size of TX ring
- ``ETHTOOL_A_RINGS_RX_BUF_LEN`` u32 size of buffers on the ring
- ``ETHTOOL_A_RINGS_TCP_DATA_SPLIT`` u8 TCP header / data split
- ``ETHTOOL_A_RINGS_CQE_SIZE`` u32 Size of TX/RX CQE
- ``ETHTOOL_A_RINGS_TX_PUSH`` u8 flag of TX Push mode
- ``ETHTOOL_A_RINGS_RX_PUSH`` u8 flag of RX Push mode
- ``ETHTOOL_A_RINGS_TX_PUSH_BUF_LEN`` u32 size of TX push buffer
- ``ETHTOOL_A_RINGS_TX_PUSH_BUF_LEN_MAX`` u32 max size of TX push buffer
- ======================================= ====== ===========================
+ ================================================ ====== ====================
+ ``ETHTOOL_A_RINGS_HEADER`` nested reply header
+ ``ETHTOOL_A_RINGS_RX_MAX`` u32 max size of RX ring
+ ``ETHTOOL_A_RINGS_RX_MINI_MAX`` u32 max size of RX mini
+ ring
+ ``ETHTOOL_A_RINGS_RX_JUMBO_MAX`` u32 max size of RX jumbo
+ ring
+ ``ETHTOOL_A_RINGS_TX_MAX`` u32 max size of TX ring
+ ``ETHTOOL_A_RINGS_RX`` u32 size of RX ring
+ ``ETHTOOL_A_RINGS_RX_MINI`` u32 size of RX mini ring
+ ``ETHTOOL_A_RINGS_RX_JUMBO`` u32 size of RX jumbo
+ ring
+ ``ETHTOOL_A_RINGS_TX`` u32 size of TX ring
+ ``ETHTOOL_A_RINGS_RX_BUF_LEN`` u32 size of buffers on
+ the ring
+ ``ETHTOOL_A_RINGS_TCP_DATA_SPLIT`` u8 TCP header / data
+ split
+ ``ETHTOOL_A_RINGS_CQE_SIZE`` u32 Size of TX/RX CQE
+ ``ETHTOOL_A_RINGS_TX_PUSH`` u8 flag of TX Push mode
+ ``ETHTOOL_A_RINGS_RX_PUSH`` u8 flag of RX Push mode
+ ``ETHTOOL_A_RINGS_TX_PUSH_BUF_LEN`` u32 size of TX push
+ buffer
+ ``ETHTOOL_A_RINGS_TX_PUSH_BUF_LEN_MAX`` u32 max size of TX push
+ buffer
+ ``ETHTOOL_A_RINGS_HEADER_DATA_SPLIT_THRESH`` u32 threshold of
+ header / data split
+ ``ETHTOOL_A_RINGS_HEADER_DATA_SPLIT_THRESH_MAX`` u32 max threshold of
+ header / data split
+ ================================================ ====== ====================
``ETHTOOL_A_RINGS_TCP_DATA_SPLIT`` indicates whether the device is usable with
page-flipping TCP zero-copy receive (``getsockopt(TCP_ZEROCOPY_RECEIVE)``).
@@ -927,18 +938,22 @@ Sets ring sizes like ``ETHTOOL_SRINGPARAM`` ioctl request.
Request contents:
- ==================================== ====== ===========================
- ``ETHTOOL_A_RINGS_HEADER`` nested reply header
- ``ETHTOOL_A_RINGS_RX`` u32 size of RX ring
- ``ETHTOOL_A_RINGS_RX_MINI`` u32 size of RX mini ring
- ``ETHTOOL_A_RINGS_RX_JUMBO`` u32 size of RX jumbo ring
- ``ETHTOOL_A_RINGS_TX`` u32 size of TX ring
- ``ETHTOOL_A_RINGS_RX_BUF_LEN`` u32 size of buffers on the ring
- ``ETHTOOL_A_RINGS_CQE_SIZE`` u32 Size of TX/RX CQE
- ``ETHTOOL_A_RINGS_TX_PUSH`` u8 flag of TX Push mode
- ``ETHTOOL_A_RINGS_RX_PUSH`` u8 flag of RX Push mode
- ``ETHTOOL_A_RINGS_TX_PUSH_BUF_LEN`` u32 size of TX push buffer
- ==================================== ====== ===========================
+ ============================================ ====== =======================
+ ``ETHTOOL_A_RINGS_HEADER`` nested reply header
+ ``ETHTOOL_A_RINGS_RX`` u32 size of RX ring
+ ``ETHTOOL_A_RINGS_RX_MINI`` u32 size of RX mini ring
+ ``ETHTOOL_A_RINGS_RX_JUMBO`` u32 size of RX jumbo ring
+ ``ETHTOOL_A_RINGS_TX`` u32 size of TX ring
+ ``ETHTOOL_A_RINGS_RX_BUF_LEN`` u32 size of buffers on the
+ ring
+ ``ETHTOOL_A_RINGS_TCP_DATA_SPLIT`` u8 TCP header / data split
+ ``ETHTOOL_A_RINGS_CQE_SIZE`` u32 Size of TX/RX CQE
+ ``ETHTOOL_A_RINGS_TX_PUSH`` u8 flag of TX Push mode
+ ``ETHTOOL_A_RINGS_RX_PUSH`` u8 flag of RX Push mode
+ ``ETHTOOL_A_RINGS_TX_PUSH_BUF_LEN`` u32 size of TX push buffer
+ ``ETHTOOL_A_RINGS_HEADER_DATA_SPLIT_THRESH`` u32 threshold of
+ header / data split
+ ============================================ ====== =======================
Kernel checks that requested ring sizes do not exceed limits reported by
driver. Driver may impose additional constraints and may not support all
@@ -954,6 +969,10 @@ A bigger CQE can have more receive buffer pointers, and in turn the NIC can
transfer a bigger frame from wire. Based on the NIC hardware, the overall
completion queue size can be adjusted in the driver if CQE size is modified.
+``ETHTOOL_A_RINGS_HEADER_DATA_SPLIT_THRESH`` specifies the threshold value of
+header / data split feature. If a received packet size is larger than this
+threshold value, header and data will be split.
+
CHANNELS_GET
============
diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h
index 12f6dc567598..021fd21f3914 100644
--- a/include/linux/ethtool.h
+++ b/include/linux/ethtool.h
@@ -78,6 +78,8 @@ enum {
* @cqe_size: Size of TX/RX completion queue event
* @tx_push_buf_len: Size of TX push buffer
* @tx_push_buf_max_len: Maximum allowed size of TX push buffer
+ * @hds_thresh: Threshold value of header-data-split-thresh
+ * @hds_thresh_max: Maximum allowed threshold of header-data-split-thresh
*/
struct kernel_ethtool_ringparam {
u32 rx_buf_len;
@@ -87,6 +89,8 @@ struct kernel_ethtool_ringparam {
u32 cqe_size;
u32 tx_push_buf_len;
u32 tx_push_buf_max_len;
+ u32 hds_thresh;
+ u32 hds_thresh_max;
};
/**
@@ -97,6 +101,7 @@ struct kernel_ethtool_ringparam {
* @ETHTOOL_RING_USE_RX_PUSH: capture for setting rx_push
* @ETHTOOL_RING_USE_TX_PUSH_BUF_LEN: capture for setting tx_push_buf_len
* @ETHTOOL_RING_USE_TCP_DATA_SPLIT: capture for setting tcp_data_split
+ * @ETHTOOL_RING_USE_HDS_THRS: capture for setting header-data-split-thresh
*/
enum ethtool_supported_ring_param {
ETHTOOL_RING_USE_RX_BUF_LEN = BIT(0),
@@ -105,6 +110,7 @@ enum ethtool_supported_ring_param {
ETHTOOL_RING_USE_RX_PUSH = BIT(3),
ETHTOOL_RING_USE_TX_PUSH_BUF_LEN = BIT(4),
ETHTOOL_RING_USE_TCP_DATA_SPLIT = BIT(5),
+ ETHTOOL_RING_USE_HDS_THRS = BIT(6),
};
#define __ETH_RSS_HASH_BIT(bit) ((u32)1 << (bit))
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 8feaca12655e..e155b767a08d 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -4010,6 +4010,7 @@ struct sk_buff *dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
int bpf_xdp_link_attach(const union bpf_attr *attr, struct bpf_prog *prog);
u8 dev_xdp_prog_count(struct net_device *dev);
+u8 dev_xdp_sb_prog_count(struct net_device *dev);
int dev_xdp_propagate(struct net_device *dev, struct netdev_bpf *bpf);
u32 dev_xdp_prog_id(struct net_device *dev, enum bpf_xdp_mode mode);
diff --git a/include/uapi/linux/ethtool_netlink.h b/include/uapi/linux/ethtool_netlink.h
index 283305f6b063..7087c5c51017 100644
--- a/include/uapi/linux/ethtool_netlink.h
+++ b/include/uapi/linux/ethtool_netlink.h
@@ -364,6 +364,8 @@ enum {
ETHTOOL_A_RINGS_RX_PUSH, /* u8 */
ETHTOOL_A_RINGS_TX_PUSH_BUF_LEN, /* u32 */
ETHTOOL_A_RINGS_TX_PUSH_BUF_LEN_MAX, /* u32 */
+ ETHTOOL_A_RINGS_HEADER_DATA_SPLIT_THRESH, /* u32 */
+ ETHTOOL_A_RINGS_HEADER_DATA_SPLIT_THRESH_MAX, /* u32 */
/* add new constants above here */
__ETHTOOL_A_RINGS_CNT,
diff --git a/net/core/dev.c b/net/core/dev.c
index c682173a7642..890cd2bd0864 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -9431,6 +9431,19 @@ u8 dev_xdp_prog_count(struct net_device *dev)
}
EXPORT_SYMBOL_GPL(dev_xdp_prog_count);
+u8 dev_xdp_sb_prog_count(struct net_device *dev)
+{
+ u8 count = 0;
+ int i;
+
+ for (i = 0; i < __MAX_XDP_MODE; i++)
+ if (dev->xdp_state[i].prog &&
+ !dev->xdp_state[i].prog->aux->xdp_has_frags)
+ count++;
+ return count;
+}
+EXPORT_SYMBOL_GPL(dev_xdp_sb_prog_count);
+
int dev_xdp_propagate(struct net_device *dev, struct netdev_bpf *bpf)
{
if (!dev->netdev_ops->ndo_bpf)
diff --git a/net/ethtool/netlink.h b/net/ethtool/netlink.h
index 203b08eb6c6f..9f51a252ebe0 100644
--- a/net/ethtool/netlink.h
+++ b/net/ethtool/netlink.h
@@ -455,7 +455,7 @@ extern const struct nla_policy ethnl_features_set_policy[ETHTOOL_A_FEATURES_WANT
extern const struct nla_policy ethnl_privflags_get_policy[ETHTOOL_A_PRIVFLAGS_HEADER + 1];
extern const struct nla_policy ethnl_privflags_set_policy[ETHTOOL_A_PRIVFLAGS_FLAGS + 1];
extern const struct nla_policy ethnl_rings_get_policy[ETHTOOL_A_RINGS_HEADER + 1];
-extern const struct nla_policy ethnl_rings_set_policy[ETHTOOL_A_RINGS_TX_PUSH_BUF_LEN_MAX + 1];
+extern const struct nla_policy ethnl_rings_set_policy[ETHTOOL_A_RINGS_HEADER_DATA_SPLIT_THRESH_MAX + 1];
extern const struct nla_policy ethnl_channels_get_policy[ETHTOOL_A_CHANNELS_HEADER + 1];
extern const struct nla_policy ethnl_channels_set_policy[ETHTOOL_A_CHANNELS_COMBINED_COUNT + 1];
extern const struct nla_policy ethnl_coalesce_get_policy[ETHTOOL_A_COALESCE_HEADER + 1];
diff --git a/net/ethtool/rings.c b/net/ethtool/rings.c
index b7865a14fdf8..e1fd82a91014 100644
--- a/net/ethtool/rings.c
+++ b/net/ethtool/rings.c
@@ -61,7 +61,11 @@ static int rings_reply_size(const struct ethnl_req_info *req_base,
nla_total_size(sizeof(u8)) + /* _RINGS_TX_PUSH */
nla_total_size(sizeof(u8))) + /* _RINGS_RX_PUSH */
nla_total_size(sizeof(u32)) + /* _RINGS_TX_PUSH_BUF_LEN */
- nla_total_size(sizeof(u32)); /* _RINGS_TX_PUSH_BUF_LEN_MAX */
+ nla_total_size(sizeof(u32)) + /* _RINGS_TX_PUSH_BUF_LEN_MAX */
+ nla_total_size(sizeof(u32)) +
+ /* _RINGS_HEADER_DATA_SPLIT_THRESH */
+ nla_total_size(sizeof(u32));
+ /* _RINGS_HEADER_DATA_SPLIT_THRESH_MAX*/
}
static int rings_fill_reply(struct sk_buff *skb,
@@ -108,7 +112,12 @@ static int rings_fill_reply(struct sk_buff *skb,
(nla_put_u32(skb, ETHTOOL_A_RINGS_TX_PUSH_BUF_LEN_MAX,
kr->tx_push_buf_max_len) ||
nla_put_u32(skb, ETHTOOL_A_RINGS_TX_PUSH_BUF_LEN,
- kr->tx_push_buf_len))))
+ kr->tx_push_buf_len))) ||
+ ((supported_ring_params & ETHTOOL_RING_USE_HDS_THRS) &&
+ (nla_put_u32(skb, ETHTOOL_A_RINGS_HEADER_DATA_SPLIT_THRESH,
+ kr->hds_thresh) ||
+ nla_put_u32(skb, ETHTOOL_A_RINGS_HEADER_DATA_SPLIT_THRESH_MAX,
+ kr->hds_thresh_max))))
return -EMSGSIZE;
return 0;
@@ -130,6 +139,7 @@ const struct nla_policy ethnl_rings_set_policy[] = {
[ETHTOOL_A_RINGS_TX_PUSH] = NLA_POLICY_MAX(NLA_U8, 1),
[ETHTOOL_A_RINGS_RX_PUSH] = NLA_POLICY_MAX(NLA_U8, 1),
[ETHTOOL_A_RINGS_TX_PUSH_BUF_LEN] = { .type = NLA_U32 },
+ [ETHTOOL_A_RINGS_HEADER_DATA_SPLIT_THRESH] = { .type = NLA_U32 },
};
static int
@@ -155,6 +165,14 @@ ethnl_set_rings_validate(struct ethnl_req_info *req_info,
return -EOPNOTSUPP;
}
+ if (tb[ETHTOOL_A_RINGS_HEADER_DATA_SPLIT_THRESH] &&
+ !(ops->supported_ring_params & ETHTOOL_RING_USE_HDS_THRS)) {
+ NL_SET_ERR_MSG_ATTR(info->extack,
+ tb[ETHTOOL_A_RINGS_HEADER_DATA_SPLIT_THRESH],
+ "setting header-data-split-thresh is not supported");
+ return -EOPNOTSUPP;
+ }
+
if (tb[ETHTOOL_A_RINGS_CQE_SIZE] &&
!(ops->supported_ring_params & ETHTOOL_RING_USE_CQE_SIZE)) {
NL_SET_ERR_MSG_ATTR(info->extack,
@@ -222,9 +240,24 @@ ethnl_set_rings(struct ethnl_req_info *req_info, struct genl_info *info)
tb[ETHTOOL_A_RINGS_RX_PUSH], &mod);
ethnl_update_u32(&kernel_ringparam.tx_push_buf_len,
tb[ETHTOOL_A_RINGS_TX_PUSH_BUF_LEN], &mod);
+ ethnl_update_u32(&kernel_ringparam.hds_thresh,
+ tb[ETHTOOL_A_RINGS_HEADER_DATA_SPLIT_THRESH], &mod);
if (!mod)
return 0;
+ if (kernel_ringparam.tcp_data_split == ETHTOOL_TCP_DATA_SPLIT_ENABLED &&
+ dev_xdp_sb_prog_count(dev)) {
+ NL_SET_ERR_MSG(info->extack,
+ "tcp-data-split can not be enabled with single buffer XDP");
+ return -EINVAL;
+ }
+
+ if (kernel_ringparam.hds_thresh > kernel_ringparam.hds_thresh_max) {
+ NL_SET_BAD_ATTR(info->extack,
+ tb[ETHTOOL_A_RINGS_HEADER_DATA_SPLIT_THRESH_MAX]);
+ return -ERANGE;
+ }
+
/* ensure new ring parameters are within limits */
if (ringparam.rx_pending > ringparam.rx_max_pending)
err_attr = tb[ETHTOOL_A_RINGS_RX];
--
2.34.1
* Re: [PATCH net-next v4 3/8] net: ethtool: add support for configuring header-data-split-thresh
2024-10-22 16:23 ` [PATCH net-next v4 3/8] net: ethtool: add support for configuring header-data-split-thresh Taehee Yoo
@ 2024-11-01 14:55 ` Mina Almasry
2024-11-01 18:50 ` Taehee Yoo
0 siblings, 1 reply; 32+ messages in thread
From: Mina Almasry @ 2024-11-01 14:55 UTC (permalink / raw)
To: Taehee Yoo
Cc: davem, kuba, pabeni, edumazet, donald.hunter, corbet,
michael.chan, andrew+netdev, hawk, ilias.apalodimas, ast, daniel,
john.fastabend, dw, sdf, asml.silence, brett.creeley, linux-doc,
netdev, kory.maincent, maxime.chevallier, danieller, hengqi,
ecree.xilinx, przemyslaw.kitszel, hkallweit1, ahmed.zaki,
rrameshbabu, idosch, jiri, bigeasy, lorenzo, jdamato,
aleksander.lobakin, kaiyuanz, willemb, daniel.zahka
On Tue, Oct 22, 2024 at 9:24 AM Taehee Yoo <ap420073@gmail.com> wrote:
>
> The header-data-split-thresh option configures the threshold value of
> the header-data-split.
> If a received packet size is larger than this threshold value, a packet
> will be split into header and payload.
> The header indicates TCP and UDP header, but it depends on driver spec.
> The bnxt_en driver supports HDS(Header-Data-Split) configuration at
> FW level, affecting TCP and UDP too.
> So, If header-data-split-thresh is set, it affects UDP and TCP packets.
>
> Example:
> # ethtool -G <interface name> header-data-split-thresh <value>
>
> # ethtool -G enp14s0f0np0 tcp-data-split on header-data-split-thresh 256
> # ethtool -g enp14s0f0np0
> Ring parameters for enp14s0f0np0:
> Pre-set maximums:
> ...
> Header data split thresh: 256
> Current hardware settings:
> ...
> TCP data split: on
> Header data split thresh: 256
This is a nit, feel free to ignore, but I wonder if we should call it
'Data split thresh' instead of 'Header data split threshold'.
It was a bit weird for me to refer to the feature as tcp-data-split,
but the threshold as hds_threshold, which I guess is short for header
split threshold. Aligning the names to 'TCP data split [threshold]'
would be nice, I think.
--
Thanks,
Mina
* Re: [PATCH net-next v4 3/8] net: ethtool: add support for configuring header-data-split-thresh
2024-11-01 14:55 ` Mina Almasry
@ 2024-11-01 18:50 ` Taehee Yoo
0 siblings, 0 replies; 32+ messages in thread
From: Taehee Yoo @ 2024-11-01 18:50 UTC (permalink / raw)
To: Mina Almasry
Cc: davem, kuba, pabeni, edumazet, donald.hunter, corbet,
michael.chan, andrew+netdev, hawk, ilias.apalodimas, ast, daniel,
john.fastabend, dw, sdf, asml.silence, brett.creeley, linux-doc,
netdev, kory.maincent, maxime.chevallier, danieller, hengqi,
ecree.xilinx, przemyslaw.kitszel, hkallweit1, ahmed.zaki,
rrameshbabu, idosch, jiri, bigeasy, lorenzo, jdamato,
aleksander.lobakin, kaiyuanz, willemb, daniel.zahka
On Fri, Nov 1, 2024 at 11:56 PM Mina Almasry <almasrymina@google.com> wrote:
>
> On Tue, Oct 22, 2024 at 9:24 AM Taehee Yoo <ap420073@gmail.com> wrote:
> >
> > The header-data-split-thresh option configures the threshold value of
> > the header-data-split.
> > If a received packet size is larger than this threshold value, a packet
> > will be split into header and payload.
> > The header indicates TCP and UDP header, but it depends on driver spec.
> > The bnxt_en driver supports HDS(Header-Data-Split) configuration at
> > FW level, affecting TCP and UDP too.
> > So, If header-data-split-thresh is set, it affects UDP and TCP packets.
> >
> > Example:
> > # ethtool -G <interface name> header-data-split-thresh <value>
> >
> > # ethtool -G enp14s0f0np0 tcp-data-split on header-data-split-thresh 256
> > # ethtool -g enp14s0f0np0
> > Ring parameters for enp14s0f0np0:
> > Pre-set maximums:
> > ...
> > Header data split thresh: 256
> > Current hardware settings:
> > ...
> > TCP data split: on
> > Header data split thresh: 256
>
> This is a nit, feel free to ignore, but I wonder if we should call it
> 'Data split thresh' instead of 'Header data split threshold'.
>
> It was a bit weird for me to refer to the feature as tcp-data-split,
> but the threshold as hds_threshold which i guess is short for header
> split threshold. Aligning the names to 'TCP data split [threshold]'
> would be nice I think.
The reason I used it is that I thought the command name and the printed
name should be the same or very similar. Both "Data split thresh" and
"TCP data split threshold" are similar to the command.
But "Header data split thresh" is the same as the command, so it's
very clear. That is why I still prefer "Header data split thresh".
But if you prefer those names, I will change it.
Thanks a lot!
Taehee Yoo
>
> --
> Thanks,
> Mina
* [PATCH net-next v4 4/8] bnxt_en: add support for header-data-split-thresh ethtool command
2024-10-22 16:23 [PATCH net-next v4 0/7] bnxt_en: implement device memory TCP for bnxt Taehee Yoo
` (2 preceding siblings ...)
2024-10-22 16:23 ` [PATCH net-next v4 3/8] net: ethtool: add support for configuring header-data-split-thresh Taehee Yoo
@ 2024-10-22 16:23 ` Taehee Yoo
2024-10-22 16:23 ` [PATCH net-next v4 5/8] net: devmem: add ring parameter filtering Taehee Yoo
` (3 subsequent siblings)
7 siblings, 0 replies; 32+ messages in thread
From: Taehee Yoo @ 2024-10-22 16:23 UTC (permalink / raw)
To: davem, kuba, pabeni, edumazet, almasrymina, donald.hunter, corbet,
michael.chan, andrew+netdev, hawk, ilias.apalodimas, ast, daniel,
john.fastabend, dw, sdf, asml.silence, brett.creeley, linux-doc,
netdev
Cc: kory.maincent, maxime.chevallier, danieller, hengqi, ecree.xilinx,
przemyslaw.kitszel, hkallweit1, ahmed.zaki, rrameshbabu, idosch,
jiri, bigeasy, lorenzo, jdamato, aleksander.lobakin, kaiyuanz,
willemb, daniel.zahka, ap420073
The bnxt_en driver has configured the hds_threshold value automatically,
based on the rx-copybreak default value, when TPA is enabled.
Now that the header-data-split-thresh ethtool command has been added,
this patch implements the header-data-split-thresh option.
Configuring header-data-split-thresh is allowed only when
header-data-split is enabled. The default value of
header-data-split-thresh is 256, which is the default value of
rx-copybreak, which was previously used as the hds_thresh value.
# Example:
# ethtool -G enp14s0f0np0 tcp-data-split on header-data-split-thresh 256
# ethtool -g enp14s0f0np0
Ring parameters for enp14s0f0np0:
Pre-set maximums:
...
Header data split thresh: 256
Current hardware settings:
...
TCP data split: on
Header data split thresh: 256
Tested-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
---
v4:
- Reduce hole in struct bnxt.
- Add ETHTOOL_RING_USE_HDS_THRS to indicate bnxt_en driver support
header-data-split-thresh option.
- Add Test tag from Stanislav.
v3:
- Drop validation logic tcp-data-split and tcp-data-split-thresh.
v2:
- Patch added.
drivers/net/ethernet/broadcom/bnxt/bnxt.c | 3 ++-
drivers/net/ethernet/broadcom/bnxt/bnxt.h | 2 ++
drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c | 7 ++++++-
3 files changed, 10 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 91ea42ff9b17..7d9da483b867 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -4453,6 +4453,7 @@ void bnxt_set_tpa_flags(struct bnxt *bp)
static void bnxt_init_ring_params(struct bnxt *bp)
{
bp->rx_copybreak = BNXT_DEFAULT_RX_COPYBREAK;
+ bp->hds_threshold = BNXT_DEFAULT_RX_COPYBREAK;
}
/* bp->rx_ring_size, bp->tx_ring_size, dev->mtu, BNXT_FLAG_{G|L}RO flags must
@@ -6427,7 +6428,7 @@ static int bnxt_hwrm_vnic_set_hds(struct bnxt *bp, struct bnxt_vnic_info *vnic)
VNIC_PLCMODES_CFG_REQ_FLAGS_HDS_IPV6);
req->enables |=
cpu_to_le32(VNIC_PLCMODES_CFG_REQ_ENABLES_HDS_THRESHOLD_VALID);
- req->hds_threshold = cpu_to_le16(bp->rx_copybreak);
+ req->hds_threshold = cpu_to_le16(bp->hds_threshold);
}
req->vnic_id = cpu_to_le32(vnic->fw_vnic_id);
return hwrm_req_send(bp, req);
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
index 432bc19b35ea..e467341f1e5b 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
@@ -2361,6 +2361,8 @@ struct bnxt {
u8 q_ids[BNXT_MAX_QUEUE];
u8 max_q;
u8 num_tc;
+#define BNXT_HDS_THRESHOLD_MAX 256
+ u16 hds_threshold;
unsigned int current_interval;
#define BNXT_TIMER_INTERVAL HZ
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
index 5172d0547e0c..73e821a23f56 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
@@ -840,6 +840,9 @@ static void bnxt_get_ringparam(struct net_device *dev,
ering->rx_pending = bp->rx_ring_size;
ering->rx_jumbo_pending = bp->rx_agg_ring_size;
ering->tx_pending = bp->tx_ring_size;
+
+ kernel_ering->hds_thresh = bp->hds_threshold;
+ kernel_ering->hds_thresh_max = BNXT_HDS_THRESHOLD_MAX;
}
static int bnxt_set_ringparam(struct net_device *dev,
@@ -869,6 +872,7 @@ static int bnxt_set_ringparam(struct net_device *dev,
break;
}
+ bp->hds_threshold = (u16)kernel_ering->hds_thresh;
bp->rx_ring_size = ering->rx_pending;
bp->tx_ring_size = ering->tx_pending;
bnxt_set_ring_params(bp);
@@ -5357,7 +5361,8 @@ const struct ethtool_ops bnxt_ethtool_ops = {
ETHTOOL_COALESCE_STATS_BLOCK_USECS |
ETHTOOL_COALESCE_USE_ADAPTIVE_RX |
ETHTOOL_COALESCE_USE_CQE,
- .supported_ring_params = ETHTOOL_RING_USE_TCP_DATA_SPLIT,
+ .supported_ring_params = ETHTOOL_RING_USE_TCP_DATA_SPLIT |
+ ETHTOOL_RING_USE_HDS_THRS,
.get_link_ksettings = bnxt_get_link_ksettings,
.set_link_ksettings = bnxt_set_link_ksettings,
.get_fec_stats = bnxt_get_fec_stats,
--
2.34.1
* [PATCH net-next v4 5/8] net: devmem: add ring parameter filtering
2024-10-22 16:23 [PATCH net-next v4 0/7] bnxt_en: implement device memory TCP for bnxt Taehee Yoo
` (3 preceding siblings ...)
2024-10-22 16:23 ` [PATCH net-next v4 4/8] bnxt_en: add support for header-data-split-thresh ethtool command Taehee Yoo
@ 2024-10-22 16:23 ` Taehee Yoo
2024-11-01 14:29 ` Mina Almasry
2024-10-22 16:23 ` [PATCH net-next v4 6/8] net: ethtool: " Taehee Yoo
` (2 subsequent siblings)
7 siblings, 1 reply; 32+ messages in thread
From: Taehee Yoo @ 2024-10-22 16:23 UTC (permalink / raw)
To: davem, kuba, pabeni, edumazet, almasrymina, donald.hunter, corbet,
michael.chan, andrew+netdev, hawk, ilias.apalodimas, ast, daniel,
john.fastabend, dw, sdf, asml.silence, brett.creeley, linux-doc,
netdev
Cc: kory.maincent, maxime.chevallier, danieller, hengqi, ecree.xilinx,
przemyslaw.kitszel, hkallweit1, ahmed.zaki, rrameshbabu, idosch,
jiri, bigeasy, lorenzo, jdamato, aleksander.lobakin, kaiyuanz,
willemb, daniel.zahka, ap420073
If the driver doesn't support the ring parameters, or the tcp-data-split
configuration is insufficient, devmem should not be set up.
Before devmem is set up, tcp-data-split should be ON and the
header-data-split-thresh value should be 0.
Tested-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
---
v4:
- Check condition before __netif_get_rx_queue().
- Separate condition check.
- Add Test tag from Stanislav.
v3:
- Patch added.
net/core/devmem.c | 18 ++++++++++++++++++
1 file changed, 18 insertions(+)
diff --git a/net/core/devmem.c b/net/core/devmem.c
index 11b91c12ee11..3425e872e87a 100644
--- a/net/core/devmem.c
+++ b/net/core/devmem.c
@@ -8,6 +8,8 @@
*/
#include <linux/dma-buf.h>
+#include <linux/ethtool.h>
+#include <linux/ethtool_netlink.h>
#include <linux/genalloc.h>
#include <linux/mm.h>
#include <linux/netdevice.h>
@@ -131,6 +133,8 @@ int net_devmem_bind_dmabuf_to_queue(struct net_device *dev, u32 rxq_idx,
struct net_devmem_dmabuf_binding *binding,
struct netlink_ext_ack *extack)
{
+ struct kernel_ethtool_ringparam kernel_ringparam = {};
+ struct ethtool_ringparam ringparam = {};
struct netdev_rx_queue *rxq;
u32 xa_idx;
int err;
@@ -140,6 +144,20 @@ int net_devmem_bind_dmabuf_to_queue(struct net_device *dev, u32 rxq_idx,
return -ERANGE;
}
+ if (!dev->ethtool_ops->get_ringparam)
+ return -EOPNOTSUPP;
+
+ dev->ethtool_ops->get_ringparam(dev, &ringparam, &kernel_ringparam,
+ extack);
+ if (kernel_ringparam.tcp_data_split != ETHTOOL_TCP_DATA_SPLIT_ENABLED) {
+ NL_SET_ERR_MSG(extack, "tcp-data-split is disabled");
+ return -EINVAL;
+ }
+ if (kernel_ringparam.hds_thresh) {
+ NL_SET_ERR_MSG(extack, "header-data-split-thresh is not zero");
+ return -EINVAL;
+ }
+
rxq = __netif_get_rx_queue(dev, rxq_idx);
if (rxq->mp_params.mp_priv) {
NL_SET_ERR_MSG(extack, "designated queue already memory provider bound");
--
2.34.1
* Re: [PATCH net-next v4 5/8] net: devmem: add ring parameter filtering
2024-10-22 16:23 ` [PATCH net-next v4 5/8] net: devmem: add ring parameter filtering Taehee Yoo
@ 2024-11-01 14:29 ` Mina Almasry
2024-11-01 18:02 ` Taehee Yoo
0 siblings, 1 reply; 32+ messages in thread
From: Mina Almasry @ 2024-11-01 14:29 UTC (permalink / raw)
To: Taehee Yoo, Praveen Kaligineedi
Cc: davem, kuba, pabeni, edumazet, donald.hunter, corbet,
michael.chan, andrew+netdev, hawk, ilias.apalodimas, ast, daniel,
john.fastabend, dw, sdf, asml.silence, brett.creeley, linux-doc,
netdev, kory.maincent, maxime.chevallier, danieller, hengqi,
ecree.xilinx, przemyslaw.kitszel, hkallweit1, ahmed.zaki,
rrameshbabu, idosch, jiri, bigeasy, lorenzo, jdamato,
aleksander.lobakin, kaiyuanz, willemb, daniel.zahka
Hi Taehee, sorry for the late reply. I was out on vacation and needed
to catch up on some stuff when I got back.
On Tue, Oct 22, 2024 at 9:25 AM Taehee Yoo <ap420073@gmail.com> wrote:
>
> If driver doesn't support ring parameter or tcp-data-split configuration
> is not sufficient, the devmem should not be set up.
> Before setup the devmem, tcp-data-split should be ON and
> header-data-split-thresh value should be 0.
>
> Tested-by: Stanislav Fomichev <sdf@fomichev.me>
> Signed-off-by: Taehee Yoo <ap420073@gmail.com>
> ---
>
> v4:
> - Check condition before __netif_get_rx_queue().
> - Separate condition check.
> - Add Test tag from Stanislav.
>
> v3:
> - Patch added.
>
> net/core/devmem.c | 18 ++++++++++++++++++
> 1 file changed, 18 insertions(+)
>
> diff --git a/net/core/devmem.c b/net/core/devmem.c
> index 11b91c12ee11..3425e872e87a 100644
> --- a/net/core/devmem.c
> +++ b/net/core/devmem.c
> @@ -8,6 +8,8 @@
> */
>
> #include <linux/dma-buf.h>
> +#include <linux/ethtool.h>
> +#include <linux/ethtool_netlink.h>
> #include <linux/genalloc.h>
> #include <linux/mm.h>
> #include <linux/netdevice.h>
> @@ -131,6 +133,8 @@ int net_devmem_bind_dmabuf_to_queue(struct net_device *dev, u32 rxq_idx,
> struct net_devmem_dmabuf_binding *binding,
> struct netlink_ext_ack *extack)
> {
> + struct kernel_ethtool_ringparam kernel_ringparam = {};
> + struct ethtool_ringparam ringparam = {};
> struct netdev_rx_queue *rxq;
> u32 xa_idx;
> int err;
> @@ -140,6 +144,20 @@ int net_devmem_bind_dmabuf_to_queue(struct net_device *dev, u32 rxq_idx,
> return -ERANGE;
> }
>
> + if (!dev->ethtool_ops->get_ringparam)
> + return -EOPNOTSUPP;
> +
Maybe an error code other than EOPNOTSUPP? I think that gets returned
when NET_DEVMEM is not compiled in and in other situations like that.
Let's pick another error code? Maybe ENODEV.
Also consider an extack error message. But it's very unlikely to hit
this error, so maybe not necessary.
> + dev->ethtool_ops->get_ringparam(dev, &ringparam, &kernel_ringparam,
> + extack);
> + if (kernel_ringparam.tcp_data_split != ETHTOOL_TCP_DATA_SPLIT_ENABLED) {
> + NL_SET_ERR_MSG(extack, "tcp-data-split is disabled");
> + return -EINVAL;
> + }
> + if (kernel_ringparam.hds_thresh) {
> + NL_SET_ERR_MSG(extack, "header-data-split-thresh is not zero");
> + return -EINVAL;
> + }
> +
Thinking about drivers that support tcp-data-split, but don't support
any kind of hds_thresh. For us (GVE), the hds_thresh is implicitly
always 0.
Does the driver need to explicitly set hds_thresh to 0? If so, that
adds a bit of churn to driver code. Is it possible to detect in this
function that the driver doesn't support hds_thresh and allow the
binding if so?
I see in the previous patch you do something like:
supported_ring_params & ETHTOOL_RING_USE_HDS_THRS
to detect hds_thresh support. I was wondering if something like this
is possible here, so we don't have to update GVE and all future
drivers to explicitly set the thresh to 0.
Other than that, looks fine to me.
--
Thanks,
Mina
* Re: [PATCH net-next v4 5/8] net: devmem: add ring parameter filtering
2024-11-01 14:29 ` Mina Almasry
@ 2024-11-01 18:02 ` Taehee Yoo
2024-11-01 20:40 ` Mina Almasry
0 siblings, 1 reply; 32+ messages in thread
From: Taehee Yoo @ 2024-11-01 18:02 UTC (permalink / raw)
To: Mina Almasry
Cc: Praveen Kaligineedi, davem, kuba, pabeni, edumazet, donald.hunter,
corbet, michael.chan, andrew+netdev, hawk, ilias.apalodimas, ast,
daniel, john.fastabend, dw, sdf, asml.silence, brett.creeley,
linux-doc, netdev, kory.maincent, maxime.chevallier, danieller,
hengqi, ecree.xilinx, przemyslaw.kitszel, hkallweit1, ahmed.zaki,
rrameshbabu, idosch, jiri, bigeasy, lorenzo, jdamato,
aleksander.lobakin, kaiyuanz, willemb, daniel.zahka
On Fri, Nov 1, 2024 at 11:29 PM Mina Almasry <almasrymina@google.com> wrote:
>
> Hi Taehee, sorry for the late reply. I was out on vacation and needed
> to catch up on some stuff when I got back.
Hi Mina,
Thank you so much for your review :)
>
> On Tue, Oct 22, 2024 at 9:25 AM Taehee Yoo <ap420073@gmail.com> wrote:
> >
> > If driver doesn't support ring parameter or tcp-data-split configuration
> > is not sufficient, the devmem should not be set up.
> > Before setup the devmem, tcp-data-split should be ON and
> > header-data-split-thresh value should be 0.
> >
> > Tested-by: Stanislav Fomichev <sdf@fomichev.me>
> > Signed-off-by: Taehee Yoo <ap420073@gmail.com>
> > ---
> >
> > v4:
> > - Check condition before __netif_get_rx_queue().
> > - Separate condition check.
> > - Add Test tag from Stanislav.
> >
> > v3:
> > - Patch added.
> >
> > net/core/devmem.c | 18 ++++++++++++++++++
> > 1 file changed, 18 insertions(+)
> >
> > diff --git a/net/core/devmem.c b/net/core/devmem.c
> > index 11b91c12ee11..3425e872e87a 100644
> > --- a/net/core/devmem.c
> > +++ b/net/core/devmem.c
> > @@ -8,6 +8,8 @@
> > */
> >
> > #include <linux/dma-buf.h>
> > +#include <linux/ethtool.h>
> > +#include <linux/ethtool_netlink.h>
> > #include <linux/genalloc.h>
> > #include <linux/mm.h>
> > #include <linux/netdevice.h>
> > @@ -131,6 +133,8 @@ int net_devmem_bind_dmabuf_to_queue(struct net_device *dev, u32 rxq_idx,
> > struct net_devmem_dmabuf_binding *binding,
> > struct netlink_ext_ack *extack)
> > {
> > + struct kernel_ethtool_ringparam kernel_ringparam = {};
> > + struct ethtool_ringparam ringparam = {};
> > struct netdev_rx_queue *rxq;
> > u32 xa_idx;
> > int err;
> > @@ -140,6 +144,20 @@ int net_devmem_bind_dmabuf_to_queue(struct net_device *dev, u32 rxq_idx,
> > return -ERANGE;
> > }
> >
> > + if (!dev->ethtool_ops->get_ringparam)
> > + return -EOPNOTSUPP;
> > +
>
> Maybe an error code not EOPNOTSUPP. I think that gets returned when
> NET_DEVMEM is not compiled in and other situations like that. Lets
> pick another error code? Maybe ENODEV.
There are several similar checks in the ethtool core, and they return
EOPNOTSUPP consistently. Brett reviewed this in the v3 series, so I
changed it from EINVAL to EOPNOTSUPP, which seemed reasonable to me.
So I prefer EOPNOTSUPP, but I will follow your decision.
What do you think?
>
> Also consider extack error message. But it's very unlikely to hit this
> error, so maybe not necessary.
I removed the extack in v3 because ethtool doesn't use extack for the
same logic, which seemed reasonable to me.
>
> > + dev->ethtool_ops->get_ringparam(dev, &ringparam, &kernel_ringparam,
> > + extack);
> > + if (kernel_ringparam.tcp_data_split != ETHTOOL_TCP_DATA_SPLIT_ENABLED) {
> > + NL_SET_ERR_MSG(extack, "tcp-data-split is disabled");
> > + return -EINVAL;
> > + }
> > + if (kernel_ringparam.hds_thresh) {
> > + NL_SET_ERR_MSG(extack, "header-data-split-thresh is not zero");
> > + return -EINVAL;
> > + }
> > +
>
> Thinking about drivers that support tcp-data-split, but don't support
> any kind of hds_thresh. For us (GVE), the hds_thresh is implicitly
> always 0.
>
> Does the driver need to explicitly set hds_thresh to 0? If so, that
> adds a bit of churn to driver code. Is it possible to detect in this
> function that the driver doesn't support hds_thresh and allow the
> binding if so?
>
> I see in the previous patch you do something like:
>
> supported_ring_params & ETHTOOL_RING_USE_HDS_THRS
>
> To detect there is hds_thresh support. I was wondering if something
> like this is possible so we don't have to update GVE and all future
> drivers to explicitly set thresh to 0.
How about setting the maximum hds_threshold to 0?
The default value of hds_threshold is of course 0.
I think the gve code would not need to change much; just adding
something like the below would be okay.
If a driver doesn't support setting hds_threshold explicitly, that is
effectively the same as supporting only 0, so I don't think there is a
problem.
I haven't analyzed the gve driver code, so I might be thinking about
this too optimistically.
#define GVE_HDS_THRESHOLD_MAX 0
kernel_ering->hds_thresh = GVE_HDS_THRESHOLD_MAX;
kernel_ering->hds_thresh_max = GVE_HDS_THRESHOLD_MAX;
...
.supported_ring_params = ETHTOOL_RING_USE_TCP_DATA_SPLIT |
ETHTOOL_RING_USE_HDS_THRS,
The ethtool command would then show something like this:
ethtool -g enp7s0f3np3
Ring parameters for enp7s0f3np3:
Pre-set maximums:
...
Header data split thresh: 0
Current hardware settings:
...
TCP data split: on
Header data split thresh: 0
If a driver can't set up hds_threshold, the ethtool command would show
something like this:
ethtool -g enp7s0f3np3
Ring parameters for enp7s0f3np3:
Pre-set maximums:
TX push buff len: n/a
Header data split thresh: n/a
Current hardware settings:
...
TCP data split: on
Header data split thresh: n/a
Thanks a lot!
Taehee Yoo
>
> Other than that, looks fine to me.
>
>
> --
> Thanks,
> Mina
* Re: [PATCH net-next v4 5/8] net: devmem: add ring parameter filtering
2024-11-01 18:02 ` Taehee Yoo
@ 2024-11-01 20:40 ` Mina Almasry
0 siblings, 0 replies; 32+ messages in thread
From: Mina Almasry @ 2024-11-01 20:40 UTC (permalink / raw)
To: Taehee Yoo, Harshitha Ramamurthy, Ziwei Xiao
Cc: Praveen Kaligineedi, davem, kuba, pabeni, edumazet, donald.hunter,
corbet, michael.chan, andrew+netdev, hawk, ilias.apalodimas, ast,
daniel, john.fastabend, dw, sdf, asml.silence, brett.creeley,
linux-doc, netdev, kory.maincent, maxime.chevallier, danieller,
hengqi, ecree.xilinx, przemyslaw.kitszel, hkallweit1, ahmed.zaki,
rrameshbabu, idosch, jiri, bigeasy, lorenzo, jdamato,
aleksander.lobakin, kaiyuanz, willemb, daniel.zahka
On Fri, Nov 1, 2024 at 11:03 AM Taehee Yoo <ap420073@gmail.com> wrote:
>
> On Fri, Nov 1, 2024 at 11:29 PM Mina Almasry <almasrymina@google.com> wrote:
> >
> > Hi Taehee, sorry for the late reply. I was out on vacation and needed
> > to catch up on some stuff when I got back.
>
> Hi Mina,
> Thank you so much for your review :)
>
> >
> > On Tue, Oct 22, 2024 at 9:25 AM Taehee Yoo <ap420073@gmail.com> wrote:
> > >
> > > If driver doesn't support ring parameter or tcp-data-split configuration
> > > is not sufficient, the devmem should not be set up.
> > > Before setup the devmem, tcp-data-split should be ON and
> > > header-data-split-thresh value should be 0.
> > >
> > > Tested-by: Stanislav Fomichev <sdf@fomichev.me>
> > > Signed-off-by: Taehee Yoo <ap420073@gmail.com>
> > > ---
> > >
> > > v4:
> > > - Check condition before __netif_get_rx_queue().
> > > - Separate condition check.
> > > - Add Test tag from Stanislav.
> > >
> > > v3:
> > > - Patch added.
> > >
> > > net/core/devmem.c | 18 ++++++++++++++++++
> > > 1 file changed, 18 insertions(+)
> > >
> > > diff --git a/net/core/devmem.c b/net/core/devmem.c
> > > index 11b91c12ee11..3425e872e87a 100644
> > > --- a/net/core/devmem.c
> > > +++ b/net/core/devmem.c
> > > @@ -8,6 +8,8 @@
> > > */
> > >
> > > #include <linux/dma-buf.h>
> > > +#include <linux/ethtool.h>
> > > +#include <linux/ethtool_netlink.h>
> > > #include <linux/genalloc.h>
> > > #include <linux/mm.h>
> > > #include <linux/netdevice.h>
> > > @@ -131,6 +133,8 @@ int net_devmem_bind_dmabuf_to_queue(struct net_device *dev, u32 rxq_idx,
> > > struct net_devmem_dmabuf_binding *binding,
> > > struct netlink_ext_ack *extack)
> > > {
> > > + struct kernel_ethtool_ringparam kernel_ringparam = {};
> > > + struct ethtool_ringparam ringparam = {};
> > > struct netdev_rx_queue *rxq;
> > > u32 xa_idx;
> > > int err;
> > > @@ -140,6 +144,20 @@ int net_devmem_bind_dmabuf_to_queue(struct net_device *dev, u32 rxq_idx,
> > > return -ERANGE;
> > > }
> > >
> > > + if (!dev->ethtool_ops->get_ringparam)
> > > + return -EOPNOTSUPP;
> > > +
> >
> > Maybe an error code not EOPNOTSUPP. I think that gets returned when
> > NET_DEVMEM is not compiled in and other situations like that. Lets
> > pick another error code? Maybe ENODEV.
>
> There are several same code in the ethtool core.
> It returns EOPNOTSUPP consistently.
> In the v3 series, Brett reviewed it.
> So, I changed it from EINVAL to EOPNOTSUPP it was reasonable to me.
> So I prefer EOPNOTSUPP but I will follow your decision.
> What do you think?
>
> >
> > Also consider extack error message. But it's very unlikely to hit this
> > error, so maybe not necessary.
>
> I removed extack from the v3. because ethtool doesn't use extack for
> the same logic. It was reasonable to me.
>
Ah, looks like I accidentally re-opened discussion on a couple of
items that you're already aligned on. Not critical. This is fine by
me.
> >
> > > + dev->ethtool_ops->get_ringparam(dev, &ringparam, &kernel_ringparam,
> > > + extack);
> > > + if (kernel_ringparam.tcp_data_split != ETHTOOL_TCP_DATA_SPLIT_ENABLED) {
> > > + NL_SET_ERR_MSG(extack, "tcp-data-split is disabled");
> > > + return -EINVAL;
> > > + }
> > > + if (kernel_ringparam.hds_thresh) {
> > > + NL_SET_ERR_MSG(extack, "header-data-split-thresh is not zero");
> > > + return -EINVAL;
> > > + }
> > > +
> >
> > Thinking about drivers that support tcp-data-split, but don't support
> > any kind of hds_thresh. For us (GVE), the hds_thresh is implicitly
> > always 0.
> >
> > Does the driver need to explicitly set hds_thresh to 0? If so, that
> > adds a bit of churn to driver code. Is it possible to detect in this
> > function that the driver doesn't support hds_thresh and allow the
> > binding if so?
> >
> > I see in the previous patch you do something like:
> >
> > supported_ring_params & ETHTOOL_RING_USE_HDS_THRS
> >
> > To detect there is hds_thresh support. I was wondering if something
> > like this is possible so we don't have to update GVE and all future
> > drivers to explicitly set thresh to 0.
>
> How about setting maximum hds_threshold to 0?
> The default value of hds_threshold of course 0.
> I think gve code does not need to change much, just adding like below
> will be okay.
>
> I think if drivers don't support setting hds_threshold explicitly, it
> is actually the same as support only 0.
> So, there is no problem I think.
>
> I didn't analyze gve driver code, So I might think it too optimistically.
>
> #define GVE_HDS_THRESHOLD_MAX 0
> kernel_ering->hds_thresh = GVE_HDS_THRESHOLD_MAX;
> kernel_ering->hds_thresh_max = GVE_HDS_THRESHOLD_MAX;
> ...
> .supported_ring_params = ETHTOOL_RING_USE_TCP_DATA_SPLIT |
> ETHTOOL_RING_USE_HDS_THRS,
>
OK, if you can think of a way to do this without any churn to other
drivers, that would be better, but this is fine by me either way.
Reviewed-by: Mina Almasry <almasrymina@google.com>
--
Thanks,
Mina
* [PATCH net-next v4 6/8] net: ethtool: add ring parameter filtering
2024-10-22 16:23 [PATCH net-next v4 0/7] bnxt_en: implement device memory TCP for bnxt Taehee Yoo
` (4 preceding siblings ...)
2024-10-22 16:23 ` [PATCH net-next v4 5/8] net: devmem: add ring parameter filtering Taehee Yoo
@ 2024-10-22 16:23 ` Taehee Yoo
2024-11-01 14:35 ` Mina Almasry
2024-10-22 16:23 ` [PATCH net-next v4 7/8] net: netmem: add netmem_is_pfmemalloc() helper function Taehee Yoo
2024-10-22 16:23 ` [PATCH net-next v4 8/8] bnxt_en: add support for device memory tcp Taehee Yoo
7 siblings, 1 reply; 32+ messages in thread
From: Taehee Yoo @ 2024-10-22 16:23 UTC (permalink / raw)
To: davem, kuba, pabeni, edumazet, almasrymina, donald.hunter, corbet,
michael.chan, andrew+netdev, hawk, ilias.apalodimas, ast, daniel,
john.fastabend, dw, sdf, asml.silence, brett.creeley, linux-doc,
netdev
Cc: kory.maincent, maxime.chevallier, danieller, hengqi, ecree.xilinx,
przemyslaw.kitszel, hkallweit1, ahmed.zaki, rrameshbabu, idosch,
jiri, bigeasy, lorenzo, jdamato, aleksander.lobakin, kaiyuanz,
willemb, daniel.zahka, ap420073
While devmem is running, the tcp-data-split and
header-data-split-thresh configuration should not be changed.
If the user tries to change the tcp-data-split or threshold value while
devmem is running, the operation fails and an extack message is shown.
Tested-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
---
v4:
- Add netdev_devmem_enabled() helper.
- Add Test tag from Stanislav.
v3:
- Patch added
include/net/netdev_rx_queue.h | 14 ++++++++++++++
net/ethtool/common.h | 1 +
net/ethtool/rings.c | 13 +++++++++++++
3 files changed, 28 insertions(+)
diff --git a/include/net/netdev_rx_queue.h b/include/net/netdev_rx_queue.h
index 596836abf7bf..7fbb64ce8d89 100644
--- a/include/net/netdev_rx_queue.h
+++ b/include/net/netdev_rx_queue.h
@@ -55,6 +55,20 @@ get_netdev_rx_queue_index(struct netdev_rx_queue *queue)
return index;
}
+static inline bool netdev_devmem_enabled(struct net_device *dev)
+{
+ struct netdev_rx_queue *queue;
+ int i;
+
+ for (i = 0; i < dev->real_num_rx_queues; i++) {
+ queue = __netif_get_rx_queue(dev, i);
+ if (queue->mp_params.mp_priv)
+ return true;
+ }
+
+ return false;
+}
+
int netdev_rx_queue_restart(struct net_device *dev, unsigned int rxq);
#endif
diff --git a/net/ethtool/common.h b/net/ethtool/common.h
index 4a2de3ce7354..5b8e5847ba3c 100644
--- a/net/ethtool/common.h
+++ b/net/ethtool/common.h
@@ -5,6 +5,7 @@
#include <linux/netdevice.h>
#include <linux/ethtool.h>
+#include <net/netdev_rx_queue.h>
#define ETHTOOL_DEV_FEATURE_WORDS DIV_ROUND_UP(NETDEV_FEATURE_COUNT, 32)
diff --git a/net/ethtool/rings.c b/net/ethtool/rings.c
index e1fd82a91014..ca313c301081 100644
--- a/net/ethtool/rings.c
+++ b/net/ethtool/rings.c
@@ -258,6 +258,19 @@ ethnl_set_rings(struct ethnl_req_info *req_info, struct genl_info *info)
return -ERANGE;
}
+ if (netdev_devmem_enabled(dev)) {
+ if (kernel_ringparam.tcp_data_split !=
+ ETHTOOL_TCP_DATA_SPLIT_ENABLED) {
+ NL_SET_ERR_MSG(info->extack,
+ "tcp-data-split should be enabled while devmem is running");
+ return -EINVAL;
+ } else if (kernel_ringparam.hds_thresh) {
+ NL_SET_ERR_MSG(info->extack,
+ "header-data-split-thresh should be zero while devmem is running");
+ return -EINVAL;
+ }
+ }
+
/* ensure new ring parameters are within limits */
if (ringparam.rx_pending > ringparam.rx_max_pending)
err_attr = tb[ETHTOOL_A_RINGS_RX];
--
2.34.1
* Re: [PATCH net-next v4 6/8] net: ethtool: add ring parameter filtering
2024-10-22 16:23 ` [PATCH net-next v4 6/8] net: ethtool: " Taehee Yoo
@ 2024-11-01 14:35 ` Mina Almasry
2024-11-01 18:08 ` Taehee Yoo
0 siblings, 1 reply; 32+ messages in thread
From: Mina Almasry @ 2024-11-01 14:35 UTC (permalink / raw)
To: Taehee Yoo
Cc: davem, kuba, pabeni, edumazet, donald.hunter, corbet,
michael.chan, andrew+netdev, hawk, ilias.apalodimas, ast, daniel,
john.fastabend, dw, sdf, asml.silence, brett.creeley, linux-doc,
netdev, kory.maincent, maxime.chevallier, danieller, hengqi,
ecree.xilinx, przemyslaw.kitszel, hkallweit1, ahmed.zaki,
rrameshbabu, idosch, jiri, bigeasy, lorenzo, jdamato,
aleksander.lobakin, kaiyuanz, willemb, daniel.zahka
On Tue, Oct 22, 2024 at 9:25 AM Taehee Yoo <ap420073@gmail.com> wrote:
>
> While the devmem is running, the tcp-data-split and
> header-data-split-thresh configuration should not be changed.
> If user tries to change tcp-data-split and threshold value while the
> devmem is running, it fails and shows extack message.
>
> Tested-by: Stanislav Fomichev <sdf@fomichev.me>
> Signed-off-by: Taehee Yoo <ap420073@gmail.com>
> ---
>
> v4:
> - Add netdev_devmem_enabled() helper.
> - Add Test tag from Stanislav.
>
> v3:
> - Patch added
>
> include/net/netdev_rx_queue.h | 14 ++++++++++++++
> net/ethtool/common.h | 1 +
> net/ethtool/rings.c | 13 +++++++++++++
> 3 files changed, 28 insertions(+)
>
> diff --git a/include/net/netdev_rx_queue.h b/include/net/netdev_rx_queue.h
> index 596836abf7bf..7fbb64ce8d89 100644
> --- a/include/net/netdev_rx_queue.h
> +++ b/include/net/netdev_rx_queue.h
> @@ -55,6 +55,20 @@ get_netdev_rx_queue_index(struct netdev_rx_queue *queue)
> return index;
> }
>
> +static inline bool netdev_devmem_enabled(struct net_device *dev)
Mega nit: netdev_memory_provider_enabled().
This is actually not devmem-specific, and there is already an io_uring
provider in the works.
But also, we already have dev_get_min_mp_channel_count() defined in
linux/netdevice.h. Let's reuse that one instead of adding another
helper that does almost the same thing. Sorry, I should have
remembered that we already have this helper in the last iteration.
Other than that, looks fine to me.
> +{
> + struct netdev_rx_queue *queue;
> + int i;
> +
> + for (i = 0; i < dev->real_num_rx_queues; i++) {
> + queue = __netif_get_rx_queue(dev, i);
> + if (queue->mp_params.mp_priv)
> + return true;
> + }
> +
> + return false;
> +}
> +
> int netdev_rx_queue_restart(struct net_device *dev, unsigned int rxq);
>
> #endif
> diff --git a/net/ethtool/common.h b/net/ethtool/common.h
> index 4a2de3ce7354..5b8e5847ba3c 100644
> --- a/net/ethtool/common.h
> +++ b/net/ethtool/common.h
> @@ -5,6 +5,7 @@
>
> #include <linux/netdevice.h>
> #include <linux/ethtool.h>
> +#include <net/netdev_rx_queue.h>
>
> #define ETHTOOL_DEV_FEATURE_WORDS DIV_ROUND_UP(NETDEV_FEATURE_COUNT, 32)
>
> diff --git a/net/ethtool/rings.c b/net/ethtool/rings.c
> index e1fd82a91014..ca313c301081 100644
> --- a/net/ethtool/rings.c
> +++ b/net/ethtool/rings.c
> @@ -258,6 +258,19 @@ ethnl_set_rings(struct ethnl_req_info *req_info, struct genl_info *info)
> return -ERANGE;
> }
>
> + if (netdev_devmem_enabled(dev)) {
> + if (kernel_ringparam.tcp_data_split !=
> + ETHTOOL_TCP_DATA_SPLIT_ENABLED) {
> + NL_SET_ERR_MSG(info->extack,
> + "tcp-data-split should be enabled while devmem is running");
Maybe: "can't disable tcp-data-split while device has memory provider enabled"
> + return -EINVAL;
> + } else if (kernel_ringparam.hds_thresh) {
> + NL_SET_ERR_MSG(info->extack,
> + "header-data-split-thresh should be zero while devmem is running");
Maybe: "can't set non-zero hds_thresh while device has memory provider enabled".
--
Thanks,
Mina
* Re: [PATCH net-next v4 6/8] net: ethtool: add ring parameter filtering
2024-11-01 14:35 ` Mina Almasry
@ 2024-11-01 18:08 ` Taehee Yoo
0 siblings, 0 replies; 32+ messages in thread
From: Taehee Yoo @ 2024-11-01 18:08 UTC (permalink / raw)
To: Mina Almasry
Cc: davem, kuba, pabeni, edumazet, donald.hunter, corbet,
michael.chan, andrew+netdev, hawk, ilias.apalodimas, ast, daniel,
john.fastabend, dw, sdf, asml.silence, brett.creeley, linux-doc,
netdev, kory.maincent, maxime.chevallier, danieller, hengqi,
ecree.xilinx, przemyslaw.kitszel, hkallweit1, ahmed.zaki,
rrameshbabu, idosch, jiri, bigeasy, lorenzo, jdamato,
aleksander.lobakin, kaiyuanz, willemb, daniel.zahka
On Fri, Nov 1, 2024 at 11:35 PM Mina Almasry <almasrymina@google.com> wrote:
>
> On Tue, Oct 22, 2024 at 9:25 AM Taehee Yoo <ap420073@gmail.com> wrote:
> >
> > While the devmem is running, the tcp-data-split and
> > header-data-split-thresh configuration should not be changed.
> > If user tries to change tcp-data-split and threshold value while the
> > devmem is running, it fails and shows extack message.
> >
> > Tested-by: Stanislav Fomichev <sdf@fomichev.me>
> > Signed-off-by: Taehee Yoo <ap420073@gmail.com>
> > ---
> >
> > v4:
> > - Add netdev_devmem_enabled() helper.
> > - Add Test tag from Stanislav.
> >
> > v3:
> > - Patch added
> >
> > include/net/netdev_rx_queue.h | 14 ++++++++++++++
> > net/ethtool/common.h | 1 +
> > net/ethtool/rings.c | 13 +++++++++++++
> > 3 files changed, 28 insertions(+)
> >
> > diff --git a/include/net/netdev_rx_queue.h b/include/net/netdev_rx_queue.h
> > index 596836abf7bf..7fbb64ce8d89 100644
> > --- a/include/net/netdev_rx_queue.h
> > +++ b/include/net/netdev_rx_queue.h
> > @@ -55,6 +55,20 @@ get_netdev_rx_queue_index(struct netdev_rx_queue *queue)
> > return index;
> > }
> >
> > +static inline bool netdev_devmem_enabled(struct net_device *dev)
>
> Mega nit: netdev_memory_provider_enabled().
>
> This is actually not devmem specific, and there is already an io_uring
> provider in the works.
>
> But, also, we already have dev_get_min_mp_channel_count() defined in
> linux/netdevice.h. Lets re-use that one instead of adding another
> helper that does almost the same thing. Sorry, I should have
> remembered we already have this helper in the last iteration.
Ah, I didn't catch it either.
I will use dev_get_min_mp_channel_count() instead.
Thanks a lot!
>
> Other than that, looks fine to me.
>
> > +{
> > + struct netdev_rx_queue *queue;
> > + int i;
> > +
> > + for (i = 0; i < dev->real_num_rx_queues; i++) {
> > + queue = __netif_get_rx_queue(dev, i);
> > + if (queue->mp_params.mp_priv)
> > + return true;
> > + }
> > +
> > + return false;
> > +}
> > +
> > int netdev_rx_queue_restart(struct net_device *dev, unsigned int rxq);
> >
> > #endif
> > diff --git a/net/ethtool/common.h b/net/ethtool/common.h
> > index 4a2de3ce7354..5b8e5847ba3c 100644
> > --- a/net/ethtool/common.h
> > +++ b/net/ethtool/common.h
> > @@ -5,6 +5,7 @@
> >
> > #include <linux/netdevice.h>
> > #include <linux/ethtool.h>
> > +#include <net/netdev_rx_queue.h>
> >
> > #define ETHTOOL_DEV_FEATURE_WORDS DIV_ROUND_UP(NETDEV_FEATURE_COUNT, 32)
> >
> > diff --git a/net/ethtool/rings.c b/net/ethtool/rings.c
> > index e1fd82a91014..ca313c301081 100644
> > --- a/net/ethtool/rings.c
> > +++ b/net/ethtool/rings.c
> > @@ -258,6 +258,19 @@ ethnl_set_rings(struct ethnl_req_info *req_info, struct genl_info *info)
> > return -ERANGE;
> > }
> >
> > + if (netdev_devmem_enabled(dev)) {
> > + if (kernel_ringparam.tcp_data_split !=
> > + ETHTOOL_TCP_DATA_SPLIT_ENABLED) {
> > + NL_SET_ERR_MSG(info->extack,
> > + "tcp-data-split should be enabled while devmem is running");
>
> Maybe: "can't disable tcp-data-split while device has memory provider enabled"
Thanks! I will use it!
>
> > + return -EINVAL;
> > + } else if (kernel_ringparam.hds_thresh) {
> > + NL_SET_ERR_MSG(info->extack,
> > + "header-data-split-thresh should be zero while devmem is running");
>
> Maybe: "can't set non-zero hds_thresh while device is memory provider enabled".
Thanks, I will use it too.
Thanks a lot!
Taehee Yoo
>
>
> --
> Thanks,
> Mina
* [PATCH net-next v4 7/8] net: netmem: add netmem_is_pfmemalloc() helper function
2024-10-22 16:23 [PATCH net-next v4 0/7] bnxt_en: implement device memory TCP for bnxt Taehee Yoo
` (5 preceding siblings ...)
2024-10-22 16:23 ` [PATCH net-next v4 6/8] net: ethtool: " Taehee Yoo
@ 2024-10-22 16:23 ` Taehee Yoo
2024-11-01 14:36 ` Mina Almasry
2024-10-22 16:23 ` [PATCH net-next v4 8/8] bnxt_en: add support for device memory tcp Taehee Yoo
7 siblings, 1 reply; 32+ messages in thread
From: Taehee Yoo @ 2024-10-22 16:23 UTC (permalink / raw)
To: davem, kuba, pabeni, edumazet, almasrymina, donald.hunter, corbet,
michael.chan, andrew+netdev, hawk, ilias.apalodimas, ast, daniel,
john.fastabend, dw, sdf, asml.silence, brett.creeley, linux-doc,
netdev
Cc: kory.maincent, maxime.chevallier, danieller, hengqi, ecree.xilinx,
przemyslaw.kitszel, hkallweit1, ahmed.zaki, rrameshbabu, idosch,
jiri, bigeasy, lorenzo, jdamato, aleksander.lobakin, kaiyuanz,
willemb, daniel.zahka, ap420073
The netmem_is_pfmemalloc() is a netmem version of page_is_pfmemalloc().
Tested-by: Stanislav Fomichev <sdf@fomichev.me>
Suggested-by: Mina Almasry <almasrymina@google.com>
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
---
v4:
- Patch added.
include/net/netmem.h | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/include/net/netmem.h b/include/net/netmem.h
index 8a6e20be4b9d..49ae2bf05362 100644
--- a/include/net/netmem.h
+++ b/include/net/netmem.h
@@ -171,4 +171,12 @@ static inline unsigned long netmem_get_dma_addr(netmem_ref netmem)
return __netmem_clear_lsb(netmem)->dma_addr;
}
+static inline bool netmem_is_pfmemalloc(netmem_ref netmem)
+{
+ if (netmem_is_net_iov(netmem))
+ return false;
+
+ return page_is_pfmemalloc(netmem_to_page(netmem));
+}
+
#endif /* _NET_NETMEM_H */
--
2.34.1
* Re: [PATCH net-next v4 7/8] net: netmem: add netmem_is_pfmemalloc() helper function
2024-10-22 16:23 ` [PATCH net-next v4 7/8] net: netmem: add netmem_is_pfmemalloc() helper function Taehee Yoo
@ 2024-11-01 14:36 ` Mina Almasry
0 siblings, 0 replies; 32+ messages in thread
From: Mina Almasry @ 2024-11-01 14:36 UTC (permalink / raw)
To: Taehee Yoo
Cc: davem, kuba, pabeni, edumazet, donald.hunter, corbet,
michael.chan, andrew+netdev, hawk, ilias.apalodimas, ast, daniel,
john.fastabend, dw, sdf, asml.silence, brett.creeley, linux-doc,
netdev, kory.maincent, maxime.chevallier, danieller, hengqi,
ecree.xilinx, przemyslaw.kitszel, hkallweit1, ahmed.zaki,
rrameshbabu, idosch, jiri, bigeasy, lorenzo, jdamato,
aleksander.lobakin, kaiyuanz, willemb, daniel.zahka
On Tue, Oct 22, 2024 at 9:25 AM Taehee Yoo <ap420073@gmail.com> wrote:
>
> The netmem_is_pfmemalloc() is a netmem version of page_is_pfmemalloc().
>
> Tested-by: Stanislav Fomichev <sdf@fomichev.me>
> Suggested-by: Mina Almasry <almasrymina@google.com>
> Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Reviewed-by: Mina Almasry <almasrymina@google.com>
--
Thanks,
Mina
* [PATCH net-next v4 8/8] bnxt_en: add support for device memory tcp
2024-10-22 16:23 [PATCH net-next v4 0/7] bnxt_en: implement device memory TCP for bnxt Taehee Yoo
` (6 preceding siblings ...)
2024-10-22 16:23 ` [PATCH net-next v4 7/8] net: netmem: add netmem_is_pfmemalloc() helper function Taehee Yoo
@ 2024-10-22 16:23 ` Taehee Yoo
2024-11-01 14:53 ` Mina Almasry
7 siblings, 1 reply; 32+ messages in thread
From: Taehee Yoo @ 2024-10-22 16:23 UTC (permalink / raw)
To: davem, kuba, pabeni, edumazet, almasrymina, donald.hunter, corbet,
michael.chan, andrew+netdev, hawk, ilias.apalodimas, ast, daniel,
john.fastabend, dw, sdf, asml.silence, brett.creeley, linux-doc,
netdev
Cc: kory.maincent, maxime.chevallier, danieller, hengqi, ecree.xilinx,
przemyslaw.kitszel, hkallweit1, ahmed.zaki, rrameshbabu, idosch,
jiri, bigeasy, lorenzo, jdamato, aleksander.lobakin, kaiyuanz,
willemb, daniel.zahka, ap420073
Currently, the bnxt_en driver satisfies the requirement of Device memory
TCP, which is tcp-data-split.
So, this patch implements Device memory TCP for the bnxt_en driver.
From now on, the aggregation ring handles netmem_ref instead of page,
regardless of whether netmem is on or off.
So, for the aggregation ring, memory is handled with the netmem
page_pool API instead of the generic page_pool API.
If devmem is enabled, netmem_ref is used as-is; if devmem is not
enabled, netmem_ref is converted to a page and that is used.
The driver recognizes whether devmem is set or unset based on whether
mp_params.mp_priv is NULL.
Only if devmem is set does it pass PP_FLAG_ALLOW_UNREADABLE_NETMEM.
Tested-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
---
v4:
- Do not select NET_DEVMEM in Kconfig.
- Pass PP_FLAG_ALLOW_UNREADABLE_NETMEM flag unconditionally.
- Add __bnxt_rx_agg_pages_xdp().
- Use gfp flag in __bnxt_alloc_rx_netmem().
- Do not add *offset in the __bnxt_alloc_rx_netmem().
- Do not pass queue_idx to bnxt_alloc_rx_page_pool().
- Add Test tag from Stanislav.
- Add page_pool_recycle_direct_netmem() helper.
v3:
- Patch added.
drivers/net/ethernet/broadcom/bnxt/bnxt.c | 182 ++++++++++++++++------
drivers/net/ethernet/broadcom/bnxt/bnxt.h | 2 +-
include/net/page_pool/helpers.h | 6 +
3 files changed, 142 insertions(+), 48 deletions(-)
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 7d9da483b867..7924b1da0413 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -55,6 +55,7 @@
#include <net/page_pool/helpers.h>
#include <linux/align.h>
#include <net/netdev_queues.h>
+#include <net/netdev_rx_queue.h>
#include "bnxt_hsi.h"
#include "bnxt.h"
@@ -863,6 +864,22 @@ static void bnxt_tx_int(struct bnxt *bp, struct bnxt_napi *bnapi, int budget)
bnapi->events &= ~BNXT_TX_CMP_EVENT;
}
+static netmem_ref __bnxt_alloc_rx_netmem(struct bnxt *bp, dma_addr_t *mapping,
+ struct bnxt_rx_ring_info *rxr,
+ unsigned int *offset,
+ gfp_t gfp)
+{
+ netmem_ref netmem;
+
+ netmem = page_pool_alloc_netmem(rxr->page_pool, gfp);
+ if (!netmem)
+ return 0;
+ *offset = 0;
+
+ *mapping = page_pool_get_dma_addr_netmem(netmem);
+ return netmem;
+}
+
static struct page *__bnxt_alloc_rx_page(struct bnxt *bp, dma_addr_t *mapping,
struct bnxt_rx_ring_info *rxr,
unsigned int *offset,
@@ -972,21 +989,21 @@ static inline u16 bnxt_find_next_agg_idx(struct bnxt_rx_ring_info *rxr, u16 idx)
return next;
}
-static inline int bnxt_alloc_rx_page(struct bnxt *bp,
- struct bnxt_rx_ring_info *rxr,
- u16 prod, gfp_t gfp)
+static inline int bnxt_alloc_rx_netmem(struct bnxt *bp,
+ struct bnxt_rx_ring_info *rxr,
+ u16 prod, gfp_t gfp)
{
struct rx_bd *rxbd =
&rxr->rx_agg_desc_ring[RX_AGG_RING(bp, prod)][RX_IDX(prod)];
struct bnxt_sw_rx_agg_bd *rx_agg_buf;
- struct page *page;
- dma_addr_t mapping;
u16 sw_prod = rxr->rx_sw_agg_prod;
unsigned int offset = 0;
+ dma_addr_t mapping;
+ netmem_ref netmem;
- page = __bnxt_alloc_rx_page(bp, &mapping, rxr, &offset, gfp);
+ netmem = __bnxt_alloc_rx_netmem(bp, &mapping, rxr, &offset, gfp);
- if (!page)
+ if (!netmem)
return -ENOMEM;
if (unlikely(test_bit(sw_prod, rxr->rx_agg_bmap)))
@@ -996,7 +1013,7 @@ static inline int bnxt_alloc_rx_page(struct bnxt *bp,
rx_agg_buf = &rxr->rx_agg_ring[sw_prod];
rxr->rx_sw_agg_prod = RING_RX_AGG(bp, NEXT_RX_AGG(sw_prod));
- rx_agg_buf->page = page;
+ rx_agg_buf->netmem = netmem;
rx_agg_buf->offset = offset;
rx_agg_buf->mapping = mapping;
rxbd->rx_bd_haddr = cpu_to_le64(mapping);
@@ -1044,7 +1061,7 @@ static void bnxt_reuse_rx_agg_bufs(struct bnxt_cp_ring_info *cpr, u16 idx,
struct rx_agg_cmp *agg;
struct bnxt_sw_rx_agg_bd *cons_rx_buf, *prod_rx_buf;
struct rx_bd *prod_bd;
- struct page *page;
+ netmem_ref netmem;
if (p5_tpa)
agg = bnxt_get_tpa_agg_p5(bp, rxr, idx, start + i);
@@ -1061,11 +1078,11 @@ static void bnxt_reuse_rx_agg_bufs(struct bnxt_cp_ring_info *cpr, u16 idx,
cons_rx_buf = &rxr->rx_agg_ring[cons];
/* It is possible for sw_prod to be equal to cons, so
- * set cons_rx_buf->page to NULL first.
+ * set cons_rx_buf->netmem to 0 first.
*/
- page = cons_rx_buf->page;
- cons_rx_buf->page = NULL;
- prod_rx_buf->page = page;
+ netmem = cons_rx_buf->netmem;
+ cons_rx_buf->netmem = 0;
+ prod_rx_buf->netmem = netmem;
prod_rx_buf->offset = cons_rx_buf->offset;
prod_rx_buf->mapping = cons_rx_buf->mapping;
@@ -1190,29 +1207,104 @@ static struct sk_buff *bnxt_rx_skb(struct bnxt *bp,
return skb;
}
-static u32 __bnxt_rx_agg_pages(struct bnxt *bp,
- struct bnxt_cp_ring_info *cpr,
- struct skb_shared_info *shinfo,
- u16 idx, u32 agg_bufs, bool tpa,
- struct xdp_buff *xdp)
+static bool __bnxt_rx_agg_pages_skb(struct bnxt *bp,
+ struct bnxt_cp_ring_info *cpr,
+ struct sk_buff *skb,
+ u16 idx, u32 agg_bufs, bool tpa)
{
struct bnxt_napi *bnapi = cpr->bnapi;
struct pci_dev *pdev = bp->pdev;
- struct bnxt_rx_ring_info *rxr = bnapi->rx_ring;
- u16 prod = rxr->rx_agg_prod;
+ struct bnxt_rx_ring_info *rxr;
u32 i, total_frag_len = 0;
bool p5_tpa = false;
+ u16 prod;
+
+ rxr = bnapi->rx_ring;
+ prod = rxr->rx_agg_prod;
if ((bp->flags & BNXT_FLAG_CHIP_P5_PLUS) && tpa)
p5_tpa = true;
for (i = 0; i < agg_bufs; i++) {
- skb_frag_t *frag = &shinfo->frags[i];
- u16 cons, frag_len;
+ struct bnxt_sw_rx_agg_bd *cons_rx_buf;
struct rx_agg_cmp *agg;
+ u16 cons, frag_len;
+ dma_addr_t mapping;
+ netmem_ref netmem;
+
+ if (p5_tpa)
+ agg = bnxt_get_tpa_agg_p5(bp, rxr, idx, i);
+ else
+ agg = bnxt_get_agg(bp, cpr, idx, i);
+ cons = agg->rx_agg_cmp_opaque;
+ frag_len = (le32_to_cpu(agg->rx_agg_cmp_len_flags_type) &
+ RX_AGG_CMP_LEN) >> RX_AGG_CMP_LEN_SHIFT;
+
+ cons_rx_buf = &rxr->rx_agg_ring[cons];
+ skb_add_rx_frag_netmem(skb, i, cons_rx_buf->netmem,
+ cons_rx_buf->offset, frag_len,
+ BNXT_RX_PAGE_SIZE);
+ __clear_bit(cons, rxr->rx_agg_bmap);
+
+ /* It is possible for bnxt_alloc_rx_netmem() to allocate
+ * a sw_prod index that equals the cons index, so we
+ * need to clear the cons entry now.
+ */
+ mapping = cons_rx_buf->mapping;
+ netmem = cons_rx_buf->netmem;
+ cons_rx_buf->netmem = 0;
+
+ if (bnxt_alloc_rx_netmem(bp, rxr, prod, GFP_ATOMIC) != 0) {
+ skb->len -= frag_len;
+ skb->data_len -= frag_len;
+ skb->truesize -= BNXT_RX_PAGE_SIZE;
+ --skb_shinfo(skb)->nr_frags;
+ cons_rx_buf->netmem = netmem;
+
+ /* Update prod since possibly some pages have been
+ * allocated already.
+ */
+ rxr->rx_agg_prod = prod;
+ bnxt_reuse_rx_agg_bufs(cpr, idx, i, agg_bufs - i, tpa);
+ return 0;
+ }
+
+ dma_sync_single_for_cpu(&pdev->dev, mapping, BNXT_RX_PAGE_SIZE,
+ bp->rx_dir);
+
+ total_frag_len += frag_len;
+ prod = NEXT_RX_AGG(prod);
+ }
+ rxr->rx_agg_prod = prod;
+ return total_frag_len;
+}
+
+static u32 __bnxt_rx_agg_pages_xdp(struct bnxt *bp,
+ struct bnxt_cp_ring_info *cpr,
+ struct skb_shared_info *shinfo,
+ u16 idx, u32 agg_bufs, bool tpa,
+ struct xdp_buff *xdp)
+{
+ struct bnxt_napi *bnapi = cpr->bnapi;
+ struct pci_dev *pdev = bp->pdev;
+ struct bnxt_rx_ring_info *rxr;
+ u32 i, total_frag_len = 0;
+ bool p5_tpa = false;
+ u16 prod;
+
+ rxr = bnapi->rx_ring;
+ prod = rxr->rx_agg_prod;
+
+ if ((bp->flags & BNXT_FLAG_CHIP_P5_PLUS) && tpa)
+ p5_tpa = true;
+
+ for (i = 0; i < agg_bufs; i++) {
struct bnxt_sw_rx_agg_bd *cons_rx_buf;
- struct page *page;
+ skb_frag_t *frag = &shinfo->frags[i];
+ struct rx_agg_cmp *agg;
+ u16 cons, frag_len;
dma_addr_t mapping;
+ netmem_ref netmem;
if (p5_tpa)
agg = bnxt_get_tpa_agg_p5(bp, rxr, idx, i);
@@ -1223,9 +1315,10 @@ static u32 __bnxt_rx_agg_pages(struct bnxt *bp,
RX_AGG_CMP_LEN) >> RX_AGG_CMP_LEN_SHIFT;
cons_rx_buf = &rxr->rx_agg_ring[cons];
- skb_frag_fill_page_desc(frag, cons_rx_buf->page,
- cons_rx_buf->offset, frag_len);
+ skb_frag_fill_netmem_desc(frag, cons_rx_buf->netmem,
+ cons_rx_buf->offset, frag_len);
shinfo->nr_frags = i + 1;
+
__clear_bit(cons, rxr->rx_agg_bmap);
/* It is possible for bnxt_alloc_rx_page() to allocate
@@ -1233,15 +1326,15 @@ static u32 __bnxt_rx_agg_pages(struct bnxt *bp,
* need to clear the cons entry now.
*/
mapping = cons_rx_buf->mapping;
- page = cons_rx_buf->page;
- cons_rx_buf->page = NULL;
+ netmem = cons_rx_buf->netmem;
+ cons_rx_buf->netmem = 0;
- if (xdp && page_is_pfmemalloc(page))
+ if (netmem_is_pfmemalloc(netmem))
xdp_buff_set_frag_pfmemalloc(xdp);
- if (bnxt_alloc_rx_page(bp, rxr, prod, GFP_ATOMIC) != 0) {
+ if (bnxt_alloc_rx_netmem(bp, rxr, prod, GFP_ATOMIC) != 0) {
--shinfo->nr_frags;
- cons_rx_buf->page = page;
+ cons_rx_buf->netmem = netmem;
/* Update prod since possibly some pages have been
* allocated already.
@@ -1266,20 +1359,12 @@ static struct sk_buff *bnxt_rx_agg_pages_skb(struct bnxt *bp,
struct sk_buff *skb, u16 idx,
u32 agg_bufs, bool tpa)
{
- struct skb_shared_info *shinfo = skb_shinfo(skb);
- u32 total_frag_len = 0;
-
- total_frag_len = __bnxt_rx_agg_pages(bp, cpr, shinfo, idx,
- agg_bufs, tpa, NULL);
- if (!total_frag_len) {
+ if (!__bnxt_rx_agg_pages_skb(bp, cpr, skb, idx, agg_bufs, tpa)) {
skb_mark_for_recycle(skb);
dev_kfree_skb(skb);
return NULL;
}
- skb->data_len += total_frag_len;
- skb->len += total_frag_len;
- skb->truesize += BNXT_RX_PAGE_SIZE * agg_bufs;
return skb;
}
@@ -1294,8 +1379,8 @@ static u32 bnxt_rx_agg_pages_xdp(struct bnxt *bp,
if (!xdp_buff_has_frags(xdp))
shinfo->nr_frags = 0;
- total_frag_len = __bnxt_rx_agg_pages(bp, cpr, shinfo,
- idx, agg_bufs, tpa, xdp);
+ total_frag_len = __bnxt_rx_agg_pages_xdp(bp, cpr, shinfo,
+ idx, agg_bufs, tpa, xdp);
if (total_frag_len) {
xdp_buff_set_frags_flag(xdp);
shinfo->nr_frags = agg_bufs;
@@ -3341,15 +3426,15 @@ static void bnxt_free_one_rx_agg_ring(struct bnxt *bp, struct bnxt_rx_ring_info
for (i = 0; i < max_idx; i++) {
struct bnxt_sw_rx_agg_bd *rx_agg_buf = &rxr->rx_agg_ring[i];
- struct page *page = rx_agg_buf->page;
+ netmem_ref netmem = rx_agg_buf->netmem;
- if (!page)
+ if (!netmem)
continue;
- rx_agg_buf->page = NULL;
+ rx_agg_buf->netmem = 0;
__clear_bit(i, rxr->rx_agg_bmap);
- page_pool_recycle_direct(rxr->page_pool, page);
+ page_pool_recycle_direct_netmem(rxr->page_pool, netmem);
}
}
@@ -3620,7 +3705,10 @@ static int bnxt_alloc_rx_page_pool(struct bnxt *bp,
pp.dev = &bp->pdev->dev;
pp.dma_dir = bp->rx_dir;
pp.max_len = PAGE_SIZE;
- pp.flags = PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV;
+ pp.flags = PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV |
+ PP_FLAG_ALLOW_UNREADABLE_NETMEM;
+ pp.queue_idx = rxr->bnapi->index;
+ pp.order = 0;
rxr->page_pool = page_pool_create(&pp);
if (IS_ERR(rxr->page_pool)) {
@@ -4153,7 +4241,7 @@ static void bnxt_alloc_one_rx_ring_page(struct bnxt *bp,
prod = rxr->rx_agg_prod;
for (i = 0; i < bp->rx_agg_ring_size; i++) {
- if (bnxt_alloc_rx_page(bp, rxr, prod, GFP_KERNEL)) {
+ if (bnxt_alloc_rx_netmem(bp, rxr, prod, GFP_KERNEL)) {
netdev_warn(bp->dev, "init'ed rx ring %d with %d/%d pages only\n",
ring_nr, i, bp->rx_ring_size);
break;
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
index e467341f1e5b..c38b0a8836e2 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
@@ -894,7 +894,7 @@ struct bnxt_sw_rx_bd {
};
struct bnxt_sw_rx_agg_bd {
- struct page *page;
+ netmem_ref netmem;
unsigned int offset;
dma_addr_t mapping;
};
diff --git a/include/net/page_pool/helpers.h b/include/net/page_pool/helpers.h
index 793e6fd78bc5..0149f6f6208f 100644
--- a/include/net/page_pool/helpers.h
+++ b/include/net/page_pool/helpers.h
@@ -382,6 +382,12 @@ static inline void page_pool_recycle_direct(struct page_pool *pool,
page_pool_put_full_page(pool, page, true);
}
+static inline void page_pool_recycle_direct_netmem(struct page_pool *pool,
+ netmem_ref netmem)
+{
+ page_pool_put_full_netmem(pool, netmem, true);
+}
+
#define PAGE_POOL_32BIT_ARCH_WITH_64BIT_DMA \
(sizeof(dma_addr_t) > sizeof(unsigned long))
--
2.34.1
* Re: [PATCH net-next v4 8/8] bnxt_en: add support for device memory tcp
2024-10-22 16:23 ` [PATCH net-next v4 8/8] bnxt_en: add support for device memory tcp Taehee Yoo
@ 2024-11-01 14:53 ` Mina Almasry
2024-11-01 18:24 ` Taehee Yoo
0 siblings, 1 reply; 32+ messages in thread
From: Mina Almasry @ 2024-11-01 14:53 UTC (permalink / raw)
To: Taehee Yoo
Cc: davem, kuba, pabeni, edumazet, donald.hunter, corbet,
michael.chan, andrew+netdev, hawk, ilias.apalodimas, ast, daniel,
john.fastabend, dw, sdf, asml.silence, brett.creeley, linux-doc,
netdev, kory.maincent, maxime.chevallier, danieller, hengqi,
ecree.xilinx, przemyslaw.kitszel, hkallweit1, ahmed.zaki,
rrameshbabu, idosch, jiri, bigeasy, lorenzo, jdamato,
aleksander.lobakin, kaiyuanz, willemb, daniel.zahka
On Tue, Oct 22, 2024 at 9:25 AM Taehee Yoo <ap420073@gmail.com> wrote:
>
> Currently, bnxt_en driver satisfies the requirements of Device memory
> TCP, which is tcp-data-split.
> So, it implements Device memory TCP for bnxt_en driver.
>
> From now on, the aggregation ring handles netmem_ref instead of page
> regardless of the on/off of netmem.
> So, for the aggregation ring, memory will be handled with the netmem
> page_pool API instead of generic page_pool API.
>
> If Devmem is enabled, netmem_ref is used as-is and if Devmem is not
> enabled, netmem_ref will be converted to page and that is used.
>
> Driver recognizes whether the devmem is set or unset based on the
> mp_params.mp_priv is not NULL.
> Only if devmem is set, it passes PP_FLAG_ALLOW_UNREADABLE_NETMEM.
Looks like in the latest version, you pass
PP_FLAG_ALLOW_UNREADABLE_NETMEM unconditionally, so this line is
obsolete.
However, I think you should only pass PP_FLAG_ALLOW_UNREADABLE_NETMEM
if hds_thresh==0 and tcp-data-split==1, because otherwise the driver
is not configured well enough to handle unreadable netmem, right? I
know that we added checks in the devmem binding to detect hds_thresh
and tcp-data-split, but we should keep another layer of protection in
the driver. The driver should not set PP_FLAG_ALLOW_UNREADABLE_NETMEM
unless it's configured to be able to handle unreadable netmem.
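The guarded setup Mina describes could look roughly like the following userspace sketch. The `struct rx_cfg` field names and the numeric flag values are placeholders for illustration, not the real bnxt or page_pool API:

```c
#include <stdbool.h>

/* Hypothetical rx configuration; the field names and flag values are
 * placeholders for this sketch, not the real bnxt/page_pool API. */
struct rx_cfg {
	bool tcp_data_split;     /* ethtool tcp-data-split enabled? */
	unsigned int hds_thresh; /* header-data-split threshold */
};

#define PP_FLAG_DMA_MAP			0x1u
#define PP_FLAG_DMA_SYNC_DEV		0x2u
#define PP_FLAG_ALLOW_UNREADABLE_NETMEM	0x4u

/* Only advertise unreadable-netmem support when the device is
 * configured so every payload byte lands in aggregation buffers:
 * tcp-data-split on and hds_thresh == 0. */
static unsigned int rx_pool_flags(const struct rx_cfg *cfg)
{
	unsigned int flags = PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV;

	if (cfg->tcp_data_split && cfg->hds_thresh == 0)
		flags |= PP_FLAG_ALLOW_UNREADABLE_NETMEM;

	return flags;
}
```

Keeping this check in the driver, in addition to the core devmem-binding checks, is the extra layer of protection the review asks for.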
>
> Tested-by: Stanislav Fomichev <sdf@fomichev.me>
> Signed-off-by: Taehee Yoo <ap420073@gmail.com>
> ---
>
> v4:
> - Do not select NET_DEVMEM in Kconfig.
> - Pass PP_FLAG_ALLOW_UNREADABLE_NETMEM flag unconditionally.
> - Add __bnxt_rx_agg_pages_xdp().
> - Use gfp flag in __bnxt_alloc_rx_netmem().
> - Do not add *offset in the __bnxt_alloc_rx_netmem().
> - Do not pass queue_idx to bnxt_alloc_rx_page_pool().
> - Add Test tag from Stanislav.
> - Add page_pool_recycle_direct_netmem() helper.
>
> v3:
> - Patch added.
>
> drivers/net/ethernet/broadcom/bnxt/bnxt.c | 182 ++++++++++++++++------
> drivers/net/ethernet/broadcom/bnxt/bnxt.h | 2 +-
> include/net/page_pool/helpers.h | 6 +
> 3 files changed, 142 insertions(+), 48 deletions(-)
>
> diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> index 7d9da483b867..7924b1da0413 100644
> --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> @@ -55,6 +55,7 @@
> #include <net/page_pool/helpers.h>
> #include <linux/align.h>
> #include <net/netdev_queues.h>
> +#include <net/netdev_rx_queue.h>
>
> #include "bnxt_hsi.h"
> #include "bnxt.h"
> @@ -863,6 +864,22 @@ static void bnxt_tx_int(struct bnxt *bp, struct bnxt_napi *bnapi, int budget)
> bnapi->events &= ~BNXT_TX_CMP_EVENT;
> }
>
> +static netmem_ref __bnxt_alloc_rx_netmem(struct bnxt *bp, dma_addr_t *mapping,
> + struct bnxt_rx_ring_info *rxr,
> + unsigned int *offset,
> + gfp_t gfp)
> +{
> + netmem_ref netmem;
> +
> + netmem = page_pool_alloc_netmem(rxr->page_pool, gfp);
> + if (!netmem)
> + return 0;
> + *offset = 0;
> +
> + *mapping = page_pool_get_dma_addr_netmem(netmem);
> + return netmem;
> +}
> +
> static struct page *__bnxt_alloc_rx_page(struct bnxt *bp, dma_addr_t *mapping,
> struct bnxt_rx_ring_info *rxr,
> unsigned int *offset,
> @@ -972,21 +989,21 @@ static inline u16 bnxt_find_next_agg_idx(struct bnxt_rx_ring_info *rxr, u16 idx)
> return next;
> }
>
> -static inline int bnxt_alloc_rx_page(struct bnxt *bp,
> - struct bnxt_rx_ring_info *rxr,
> - u16 prod, gfp_t gfp)
> +static inline int bnxt_alloc_rx_netmem(struct bnxt *bp,
> + struct bnxt_rx_ring_info *rxr,
> + u16 prod, gfp_t gfp)
> {
> struct rx_bd *rxbd =
> &rxr->rx_agg_desc_ring[RX_AGG_RING(bp, prod)][RX_IDX(prod)];
> struct bnxt_sw_rx_agg_bd *rx_agg_buf;
> - struct page *page;
> - dma_addr_t mapping;
> u16 sw_prod = rxr->rx_sw_agg_prod;
> unsigned int offset = 0;
> + dma_addr_t mapping;
> + netmem_ref netmem;
>
> - page = __bnxt_alloc_rx_page(bp, &mapping, rxr, &offset, gfp);
> + netmem = __bnxt_alloc_rx_netmem(bp, &mapping, rxr, &offset, gfp);
>
> - if (!page)
> + if (!netmem)
> return -ENOMEM;
>
> if (unlikely(test_bit(sw_prod, rxr->rx_agg_bmap)))
> @@ -996,7 +1013,7 @@ static inline int bnxt_alloc_rx_page(struct bnxt *bp,
> rx_agg_buf = &rxr->rx_agg_ring[sw_prod];
> rxr->rx_sw_agg_prod = RING_RX_AGG(bp, NEXT_RX_AGG(sw_prod));
>
> - rx_agg_buf->page = page;
> + rx_agg_buf->netmem = netmem;
> rx_agg_buf->offset = offset;
> rx_agg_buf->mapping = mapping;
> rxbd->rx_bd_haddr = cpu_to_le64(mapping);
> @@ -1044,7 +1061,7 @@ static void bnxt_reuse_rx_agg_bufs(struct bnxt_cp_ring_info *cpr, u16 idx,
> struct rx_agg_cmp *agg;
> struct bnxt_sw_rx_agg_bd *cons_rx_buf, *prod_rx_buf;
> struct rx_bd *prod_bd;
> - struct page *page;
> + netmem_ref netmem;
>
> if (p5_tpa)
> agg = bnxt_get_tpa_agg_p5(bp, rxr, idx, start + i);
> @@ -1061,11 +1078,11 @@ static void bnxt_reuse_rx_agg_bufs(struct bnxt_cp_ring_info *cpr, u16 idx,
> cons_rx_buf = &rxr->rx_agg_ring[cons];
>
> /* It is possible for sw_prod to be equal to cons, so
> - * set cons_rx_buf->page to NULL first.
> + * set cons_rx_buf->netmem to 0 first.
> */
> - page = cons_rx_buf->page;
> - cons_rx_buf->page = NULL;
> - prod_rx_buf->page = page;
> + netmem = cons_rx_buf->netmem;
> + cons_rx_buf->netmem = 0;
> + prod_rx_buf->netmem = netmem;
> prod_rx_buf->offset = cons_rx_buf->offset;
>
> prod_rx_buf->mapping = cons_rx_buf->mapping;
> @@ -1190,29 +1207,104 @@ static struct sk_buff *bnxt_rx_skb(struct bnxt *bp,
> return skb;
> }
>
> -static u32 __bnxt_rx_agg_pages(struct bnxt *bp,
> - struct bnxt_cp_ring_info *cpr,
> - struct skb_shared_info *shinfo,
> - u16 idx, u32 agg_bufs, bool tpa,
> - struct xdp_buff *xdp)
> +static bool __bnxt_rx_agg_pages_skb(struct bnxt *bp,
> + struct bnxt_cp_ring_info *cpr,
> + struct sk_buff *skb,
> + u16 idx, u32 agg_bufs, bool tpa)
> {
To be honest I could not immediately understand why
__bnxt_rx_agg_pages needed to be split into __bnxt_rx_agg_pages_skb
and __bnxt_rx_agg_pages_xdp.
Fundamentally speaking, we wanted the netmem transition to be as smooth
and low-churn as possible for drivers. The only big change in this
patch is the split between skb and xdp. Maybe that points to a problem
in the design of netmem.
For xdp, core makes sure that if xdp is enabled on the device, then
the netmem is always pages (never unreadable). So I think netmem
should be able to handle xdp as well as skb. Can you give more details
on why the split?
> struct bnxt_napi *bnapi = cpr->bnapi;
> struct pci_dev *pdev = bp->pdev;
> - struct bnxt_rx_ring_info *rxr = bnapi->rx_ring;
> - u16 prod = rxr->rx_agg_prod;
> + struct bnxt_rx_ring_info *rxr;
> u32 i, total_frag_len = 0;
> bool p5_tpa = false;
> + u16 prod;
> +
> + rxr = bnapi->rx_ring;
> + prod = rxr->rx_agg_prod;
>
> if ((bp->flags & BNXT_FLAG_CHIP_P5_PLUS) && tpa)
> p5_tpa = true;
>
> for (i = 0; i < agg_bufs; i++) {
> - skb_frag_t *frag = &shinfo->frags[i];
> - u16 cons, frag_len;
> + struct bnxt_sw_rx_agg_bd *cons_rx_buf;
> struct rx_agg_cmp *agg;
> + u16 cons, frag_len;
> + dma_addr_t mapping;
> + netmem_ref netmem;
> +
> + if (p5_tpa)
> + agg = bnxt_get_tpa_agg_p5(bp, rxr, idx, i);
> + else
> + agg = bnxt_get_agg(bp, cpr, idx, i);
> + cons = agg->rx_agg_cmp_opaque;
> + frag_len = (le32_to_cpu(agg->rx_agg_cmp_len_flags_type) &
> + RX_AGG_CMP_LEN) >> RX_AGG_CMP_LEN_SHIFT;
> +
> + cons_rx_buf = &rxr->rx_agg_ring[cons];
> + skb_add_rx_frag_netmem(skb, i, cons_rx_buf->netmem,
> + cons_rx_buf->offset, frag_len,
> + BNXT_RX_PAGE_SIZE);
> + __clear_bit(cons, rxr->rx_agg_bmap);
> +
> + /* It is possible for bnxt_alloc_rx_netmem() to allocate
> + * a sw_prod index that equals the cons index, so we
> + * need to clear the cons entry now.
> + */
> + mapping = cons_rx_buf->mapping;
> + netmem = cons_rx_buf->netmem;
> + cons_rx_buf->netmem = 0;
> +
> + if (bnxt_alloc_rx_netmem(bp, rxr, prod, GFP_ATOMIC) != 0) {
> + skb->len -= frag_len;
> + skb->data_len -= frag_len;
> + skb->truesize -= BNXT_RX_PAGE_SIZE;
> + --skb_shinfo(skb)->nr_frags;
> + cons_rx_buf->netmem = netmem;
> +
> + /* Update prod since possibly some pages have been
> + * allocated already.
> + */
> + rxr->rx_agg_prod = prod;
> + bnxt_reuse_rx_agg_bufs(cpr, idx, i, agg_bufs - i, tpa);
> + return 0;
> + }
> +
> + dma_sync_single_for_cpu(&pdev->dev, mapping, BNXT_RX_PAGE_SIZE,
> + bp->rx_dir);
> +
You should probably use page_pool_dma_sync_for_cpu(). I'm merging a
change to make that function skip dma-syncing for net_iov:
https://lore.kernel.org/netdev/20241029204541.1301203-5-almasrymina@google.com/
This is necessary following Jason Gunthorpe's guidance.
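The rationale — never DMA-sync memory the CPU cannot touch — can be sketched in userspace as follows. The handle encoding, the stub, and the wrapper name are illustrative, not the real page_pool implementation:

```c
#include <stdbool.h>

/* Illustrative handle: a set low bit marks net_iov memory the CPU
 * cannot touch. The encoding is a stand-in, not the kernel's. */
typedef unsigned long netmem_ref;

static bool netmem_is_net_iov(netmem_ref netmem)
{
	return netmem & 1UL;
}

static int sync_count; /* how many times we really synced */

static void dma_sync_stub(netmem_ref netmem)
{
	(void)netmem;
	sync_count++; /* real code would dma_sync the mapping here */
}

/* Sketch of a page_pool_dma_sync_for_cpu()-style wrapper: skip the
 * sync entirely for net_iov, since there is no CPU mapping to sync. */
static void pool_dma_sync_for_cpu(netmem_ref netmem)
{
	if (netmem_is_net_iov(netmem))
		return;

	dma_sync_stub(netmem);
}
```

A driver calling the wrapper instead of dma_sync_single_for_cpu() directly gets the net_iov skip for free.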
> + total_frag_len += frag_len;
> + prod = NEXT_RX_AGG(prod);
> + }
> + rxr->rx_agg_prod = prod;
> + return total_frag_len;
> +}
> +
> +static u32 __bnxt_rx_agg_pages_xdp(struct bnxt *bp,
> + struct bnxt_cp_ring_info *cpr,
> + struct skb_shared_info *shinfo,
> + u16 idx, u32 agg_bufs, bool tpa,
> + struct xdp_buff *xdp)
> +{
> + struct bnxt_napi *bnapi = cpr->bnapi;
> + struct pci_dev *pdev = bp->pdev;
> + struct bnxt_rx_ring_info *rxr;
> + u32 i, total_frag_len = 0;
> + bool p5_tpa = false;
> + u16 prod;
> +
> + rxr = bnapi->rx_ring;
> + prod = rxr->rx_agg_prod;
> +
> + if ((bp->flags & BNXT_FLAG_CHIP_P5_PLUS) && tpa)
> + p5_tpa = true;
> +
> + for (i = 0; i < agg_bufs; i++) {
> struct bnxt_sw_rx_agg_bd *cons_rx_buf;
> - struct page *page;
> + skb_frag_t *frag = &shinfo->frags[i];
> + struct rx_agg_cmp *agg;
> + u16 cons, frag_len;
> dma_addr_t mapping;
> + netmem_ref netmem;
>
> if (p5_tpa)
> agg = bnxt_get_tpa_agg_p5(bp, rxr, idx, i);
> @@ -1223,9 +1315,10 @@ static u32 __bnxt_rx_agg_pages(struct bnxt *bp,
> RX_AGG_CMP_LEN) >> RX_AGG_CMP_LEN_SHIFT;
>
> cons_rx_buf = &rxr->rx_agg_ring[cons];
> - skb_frag_fill_page_desc(frag, cons_rx_buf->page,
> - cons_rx_buf->offset, frag_len);
> + skb_frag_fill_netmem_desc(frag, cons_rx_buf->netmem,
> + cons_rx_buf->offset, frag_len);
> shinfo->nr_frags = i + 1;
> +
> __clear_bit(cons, rxr->rx_agg_bmap);
>
> /* It is possible for bnxt_alloc_rx_page() to allocate
> @@ -1233,15 +1326,15 @@ static u32 __bnxt_rx_agg_pages(struct bnxt *bp,
> * need to clear the cons entry now.
> */
> mapping = cons_rx_buf->mapping;
> - page = cons_rx_buf->page;
> - cons_rx_buf->page = NULL;
> + netmem = cons_rx_buf->netmem;
> + cons_rx_buf->netmem = 0;
>
> - if (xdp && page_is_pfmemalloc(page))
> + if (netmem_is_pfmemalloc(netmem))
> xdp_buff_set_frag_pfmemalloc(xdp);
>
> - if (bnxt_alloc_rx_page(bp, rxr, prod, GFP_ATOMIC) != 0) {
> + if (bnxt_alloc_rx_netmem(bp, rxr, prod, GFP_ATOMIC) != 0) {
> --shinfo->nr_frags;
> - cons_rx_buf->page = page;
> + cons_rx_buf->netmem = netmem;
>
> /* Update prod since possibly some pages have been
> * allocated already.
> @@ -1266,20 +1359,12 @@ static struct sk_buff *bnxt_rx_agg_pages_skb(struct bnxt *bp,
> struct sk_buff *skb, u16 idx,
> u32 agg_bufs, bool tpa)
> {
> - struct skb_shared_info *shinfo = skb_shinfo(skb);
> - u32 total_frag_len = 0;
> -
> - total_frag_len = __bnxt_rx_agg_pages(bp, cpr, shinfo, idx,
> - agg_bufs, tpa, NULL);
> - if (!total_frag_len) {
> + if (!__bnxt_rx_agg_pages_skb(bp, cpr, skb, idx, agg_bufs, tpa)) {
> skb_mark_for_recycle(skb);
> dev_kfree_skb(skb);
> return NULL;
> }
>
> - skb->data_len += total_frag_len;
> - skb->len += total_frag_len;
> - skb->truesize += BNXT_RX_PAGE_SIZE * agg_bufs;
> return skb;
> }
>
> @@ -1294,8 +1379,8 @@ static u32 bnxt_rx_agg_pages_xdp(struct bnxt *bp,
> if (!xdp_buff_has_frags(xdp))
> shinfo->nr_frags = 0;
>
> - total_frag_len = __bnxt_rx_agg_pages(bp, cpr, shinfo,
> - idx, agg_bufs, tpa, xdp);
> + total_frag_len = __bnxt_rx_agg_pages_xdp(bp, cpr, shinfo,
> + idx, agg_bufs, tpa, xdp);
> if (total_frag_len) {
> xdp_buff_set_frags_flag(xdp);
> shinfo->nr_frags = agg_bufs;
> @@ -3341,15 +3426,15 @@ static void bnxt_free_one_rx_agg_ring(struct bnxt *bp, struct bnxt_rx_ring_info
>
> for (i = 0; i < max_idx; i++) {
> struct bnxt_sw_rx_agg_bd *rx_agg_buf = &rxr->rx_agg_ring[i];
> - struct page *page = rx_agg_buf->page;
> + netmem_ref netmem = rx_agg_buf->netmem;
>
> - if (!page)
> + if (!netmem)
> continue;
>
> - rx_agg_buf->page = NULL;
> + rx_agg_buf->netmem = 0;
> __clear_bit(i, rxr->rx_agg_bmap);
>
> - page_pool_recycle_direct(rxr->page_pool, page);
> + page_pool_recycle_direct_netmem(rxr->page_pool, netmem);
> }
> }
>
> @@ -3620,7 +3705,10 @@ static int bnxt_alloc_rx_page_pool(struct bnxt *bp,
> pp.dev = &bp->pdev->dev;
> pp.dma_dir = bp->rx_dir;
> pp.max_len = PAGE_SIZE;
> - pp.flags = PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV;
> + pp.flags = PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV |
> + PP_FLAG_ALLOW_UNREADABLE_NETMEM;
PP_FLAG_ALLOW_UNREADABLE_NETMEM should only be set when the driver can
handle unreadable netmem. I.e. when hds_thresh==0 and
tcp-data-split==1.
--
Thanks,
Mina
* Re: [PATCH net-next v4 8/8] bnxt_en: add support for device memory tcp
2024-11-01 14:53 ` Mina Almasry
@ 2024-11-01 18:24 ` Taehee Yoo
2025-02-19 0:15 ` David Wei
0 siblings, 1 reply; 32+ messages in thread
From: Taehee Yoo @ 2024-11-01 18:24 UTC (permalink / raw)
To: Mina Almasry
Cc: davem, kuba, pabeni, edumazet, donald.hunter, corbet,
michael.chan, andrew+netdev, hawk, ilias.apalodimas, ast, daniel,
john.fastabend, dw, sdf, asml.silence, brett.creeley, linux-doc,
netdev, kory.maincent, maxime.chevallier, danieller, hengqi,
ecree.xilinx, przemyslaw.kitszel, hkallweit1, ahmed.zaki,
rrameshbabu, idosch, jiri, bigeasy, lorenzo, jdamato,
aleksander.lobakin, kaiyuanz, willemb, daniel.zahka
On Fri, Nov 1, 2024 at 11:53 PM Mina Almasry <almasrymina@google.com> wrote:
>
> On Tue, Oct 22, 2024 at 9:25 AM Taehee Yoo <ap420073@gmail.com> wrote:
> >
> > Currently, bnxt_en driver satisfies the requirements of Device memory
> > TCP, which is tcp-data-split.
> > So, it implements Device memory TCP for bnxt_en driver.
> >
> > From now on, the aggregation ring handles netmem_ref instead of page
> > regardless of the on/off of netmem.
> > So, for the aggregation ring, memory will be handled with the netmem
> > page_pool API instead of generic page_pool API.
> >
> > If Devmem is enabled, netmem_ref is used as-is and if Devmem is not
> > enabled, netmem_ref will be converted to page and that is used.
> >
> > Driver recognizes whether the devmem is set or unset based on the
> > mp_params.mp_priv is not NULL.
> > Only if devmem is set, it passes PP_FLAG_ALLOW_UNREADABLE_NETMEM.
>
> Looks like in the latest version, you pass
> PP_FLAG_ALLOW_UNREADABLE_NETMEM unconditionally, so this line is
> obsolete.
Okay, I will remove this line.
>
> However, I think you should only pass PP_FLAG_ALLOW_UNREADABLE_NETMEM
> if hds_thresh==0 and tcp-data-split==1, because otherwise the driver
> is not configured well enough to handle unreadable netmem, right? I
> know that we added checks in the devmem binding to detect hds_thresh
> and tcp-data-split, but we should keep another layer of protection in
> the driver. The driver should not set PP_FLAG_ALLOW_UNREADABLE_NETMEM
> unless it's configured to be able to handle unreadable netmem.
Okay, I agree, I will pass PP_FLAG_ALLOW_UNREADABLE_NETMEM
only when hds_thresh==0 and tcp-data-split==1.
>
> >
> > Tested-by: Stanislav Fomichev <sdf@fomichev.me>
> > Signed-off-by: Taehee Yoo <ap420073@gmail.com>
> > ---
> >
> > v4:
> > - Do not select NET_DEVMEM in Kconfig.
> > - Pass PP_FLAG_ALLOW_UNREADABLE_NETMEM flag unconditionally.
> > - Add __bnxt_rx_agg_pages_xdp().
> > - Use gfp flag in __bnxt_alloc_rx_netmem().
> > - Do not add *offset in the __bnxt_alloc_rx_netmem().
> > - Do not pass queue_idx to bnxt_alloc_rx_page_pool().
> > - Add Test tag from Stanislav.
> > - Add page_pool_recycle_direct_netmem() helper.
> >
> > v3:
> > - Patch added.
> >
> > drivers/net/ethernet/broadcom/bnxt/bnxt.c | 182 ++++++++++++++++------
> > drivers/net/ethernet/broadcom/bnxt/bnxt.h | 2 +-
> > include/net/page_pool/helpers.h | 6 +
> > 3 files changed, 142 insertions(+), 48 deletions(-)
> >
> > diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> > index 7d9da483b867..7924b1da0413 100644
> > --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> > +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> > @@ -55,6 +55,7 @@
> > #include <net/page_pool/helpers.h>
> > #include <linux/align.h>
> > #include <net/netdev_queues.h>
> > +#include <net/netdev_rx_queue.h>
> >
> > #include "bnxt_hsi.h"
> > #include "bnxt.h"
> > @@ -863,6 +864,22 @@ static void bnxt_tx_int(struct bnxt *bp, struct bnxt_napi *bnapi, int budget)
> > bnapi->events &= ~BNXT_TX_CMP_EVENT;
> > }
> >
> > +static netmem_ref __bnxt_alloc_rx_netmem(struct bnxt *bp, dma_addr_t *mapping,
> > + struct bnxt_rx_ring_info *rxr,
> > + unsigned int *offset,
> > + gfp_t gfp)
> > +{
> > + netmem_ref netmem;
> > +
> > + netmem = page_pool_alloc_netmem(rxr->page_pool, gfp);
> > + if (!netmem)
> > + return 0;
> > + *offset = 0;
> > +
> > + *mapping = page_pool_get_dma_addr_netmem(netmem);
> > + return netmem;
> > +}
> > +
> > static struct page *__bnxt_alloc_rx_page(struct bnxt *bp, dma_addr_t *mapping,
> > struct bnxt_rx_ring_info *rxr,
> > unsigned int *offset,
> > @@ -972,21 +989,21 @@ static inline u16 bnxt_find_next_agg_idx(struct bnxt_rx_ring_info *rxr, u16 idx)
> > return next;
> > }
> >
> > -static inline int bnxt_alloc_rx_page(struct bnxt *bp,
> > - struct bnxt_rx_ring_info *rxr,
> > - u16 prod, gfp_t gfp)
> > +static inline int bnxt_alloc_rx_netmem(struct bnxt *bp,
> > + struct bnxt_rx_ring_info *rxr,
> > + u16 prod, gfp_t gfp)
> > {
> > struct rx_bd *rxbd =
> > &rxr->rx_agg_desc_ring[RX_AGG_RING(bp, prod)][RX_IDX(prod)];
> > struct bnxt_sw_rx_agg_bd *rx_agg_buf;
> > - struct page *page;
> > - dma_addr_t mapping;
> > u16 sw_prod = rxr->rx_sw_agg_prod;
> > unsigned int offset = 0;
> > + dma_addr_t mapping;
> > + netmem_ref netmem;
> >
> > - page = __bnxt_alloc_rx_page(bp, &mapping, rxr, &offset, gfp);
> > + netmem = __bnxt_alloc_rx_netmem(bp, &mapping, rxr, &offset, gfp);
> >
> > - if (!page)
> > + if (!netmem)
> > return -ENOMEM;
> >
> > if (unlikely(test_bit(sw_prod, rxr->rx_agg_bmap)))
> > @@ -996,7 +1013,7 @@ static inline int bnxt_alloc_rx_page(struct bnxt *bp,
> > rx_agg_buf = &rxr->rx_agg_ring[sw_prod];
> > rxr->rx_sw_agg_prod = RING_RX_AGG(bp, NEXT_RX_AGG(sw_prod));
> >
> > - rx_agg_buf->page = page;
> > + rx_agg_buf->netmem = netmem;
> > rx_agg_buf->offset = offset;
> > rx_agg_buf->mapping = mapping;
> > rxbd->rx_bd_haddr = cpu_to_le64(mapping);
> > @@ -1044,7 +1061,7 @@ static void bnxt_reuse_rx_agg_bufs(struct bnxt_cp_ring_info *cpr, u16 idx,
> > struct rx_agg_cmp *agg;
> > struct bnxt_sw_rx_agg_bd *cons_rx_buf, *prod_rx_buf;
> > struct rx_bd *prod_bd;
> > - struct page *page;
> > + netmem_ref netmem;
> >
> > if (p5_tpa)
> > agg = bnxt_get_tpa_agg_p5(bp, rxr, idx, start + i);
> > @@ -1061,11 +1078,11 @@ static void bnxt_reuse_rx_agg_bufs(struct bnxt_cp_ring_info *cpr, u16 idx,
> > cons_rx_buf = &rxr->rx_agg_ring[cons];
> >
> > /* It is possible for sw_prod to be equal to cons, so
> > - * set cons_rx_buf->page to NULL first.
> > + * set cons_rx_buf->netmem to 0 first.
> > */
> > - page = cons_rx_buf->page;
> > - cons_rx_buf->page = NULL;
> > - prod_rx_buf->page = page;
> > + netmem = cons_rx_buf->netmem;
> > + cons_rx_buf->netmem = 0;
> > + prod_rx_buf->netmem = netmem;
> > prod_rx_buf->offset = cons_rx_buf->offset;
> >
> > prod_rx_buf->mapping = cons_rx_buf->mapping;
> > @@ -1190,29 +1207,104 @@ static struct sk_buff *bnxt_rx_skb(struct bnxt *bp,
> > return skb;
> > }
> >
> > -static u32 __bnxt_rx_agg_pages(struct bnxt *bp,
> > - struct bnxt_cp_ring_info *cpr,
> > - struct skb_shared_info *shinfo,
> > - u16 idx, u32 agg_bufs, bool tpa,
> > - struct xdp_buff *xdp)
> > +static bool __bnxt_rx_agg_pages_skb(struct bnxt *bp,
> > + struct bnxt_cp_ring_info *cpr,
> > + struct sk_buff *skb,
> > + u16 idx, u32 agg_bufs, bool tpa)
> > {
>
> To be honest I could not immediately understand why
> __bnxt_rx_agg_pages needed to be split into __bnxt_rx_agg_pages_skb
> and __bnxt_rx_agg_pages_xdp.
>
> Fundamentally speaking we wanted the netmem transition to be as smooth
> and low-churn as possible for drivers. The only big changes in this
> patch are the split between skb and xdp. That points to a problem in
> the design of netmem maybe.
>
> For xdp, core makes sure that if xdp is enabled on the device, then
> the netmem is always pages (never unreadable). So I think netmem
> should be able to handle xdp as well as skb. Can you give more details
> on why the split?
In the v3 review, there was a suggestion to refactor by separating the
skb path and the xdp path in the future, so I made that change.
As you note, separating the skb and xdp paths is not directly related
to the purpose of this patch.
I agree with separating them, but it does not need to be part of this
patchset. I will revert it.
>
> > struct bnxt_napi *bnapi = cpr->bnapi;
> > struct pci_dev *pdev = bp->pdev;
> > - struct bnxt_rx_ring_info *rxr = bnapi->rx_ring;
> > - u16 prod = rxr->rx_agg_prod;
> > + struct bnxt_rx_ring_info *rxr;
> > u32 i, total_frag_len = 0;
> > bool p5_tpa = false;
> > + u16 prod;
> > +
> > + rxr = bnapi->rx_ring;
> > + prod = rxr->rx_agg_prod;
> >
> > if ((bp->flags & BNXT_FLAG_CHIP_P5_PLUS) && tpa)
> > p5_tpa = true;
> >
> > for (i = 0; i < agg_bufs; i++) {
> > - skb_frag_t *frag = &shinfo->frags[i];
> > - u16 cons, frag_len;
> > + struct bnxt_sw_rx_agg_bd *cons_rx_buf;
> > struct rx_agg_cmp *agg;
> > + u16 cons, frag_len;
> > + dma_addr_t mapping;
> > + netmem_ref netmem;
> > +
> > + if (p5_tpa)
> > + agg = bnxt_get_tpa_agg_p5(bp, rxr, idx, i);
> > + else
> > + agg = bnxt_get_agg(bp, cpr, idx, i);
> > + cons = agg->rx_agg_cmp_opaque;
> > + frag_len = (le32_to_cpu(agg->rx_agg_cmp_len_flags_type) &
> > + RX_AGG_CMP_LEN) >> RX_AGG_CMP_LEN_SHIFT;
> > +
> > + cons_rx_buf = &rxr->rx_agg_ring[cons];
> > + skb_add_rx_frag_netmem(skb, i, cons_rx_buf->netmem,
> > + cons_rx_buf->offset, frag_len,
> > + BNXT_RX_PAGE_SIZE);
> > + __clear_bit(cons, rxr->rx_agg_bmap);
> > +
> > + /* It is possible for bnxt_alloc_rx_netmem() to allocate
> > + * a sw_prod index that equals the cons index, so we
> > + * need to clear the cons entry now.
> > + */
> > + mapping = cons_rx_buf->mapping;
> > + netmem = cons_rx_buf->netmem;
> > + cons_rx_buf->netmem = 0;
> > +
> > + if (bnxt_alloc_rx_netmem(bp, rxr, prod, GFP_ATOMIC) != 0) {
> > + skb->len -= frag_len;
> > + skb->data_len -= frag_len;
> > + skb->truesize -= BNXT_RX_PAGE_SIZE;
> > + --skb_shinfo(skb)->nr_frags;
> > + cons_rx_buf->netmem = netmem;
> > +
> > + /* Update prod since possibly some pages have been
> > + * allocated already.
> > + */
> > + rxr->rx_agg_prod = prod;
> > + bnxt_reuse_rx_agg_bufs(cpr, idx, i, agg_bufs - i, tpa);
> > + return 0;
> > + }
> > +
> > + dma_sync_single_for_cpu(&pdev->dev, mapping, BNXT_RX_PAGE_SIZE,
> > + bp->rx_dir);
> > +
>
> You should probably use page_pool_dma_sync_for_cpu. I'm merging a
> change to make that function skip dma-syncing for net_iov:
>
> https://lore.kernel.org/netdev/20241029204541.1301203-5-almasrymina@google.com/
>
> Which is necessary following Jason Gunthorpe's guidance.
Okay, no problem.
I will wait for that change to be merged, then use it and send a v5 patch.
>
> > + total_frag_len += frag_len;
> > + prod = NEXT_RX_AGG(prod);
> > + }
> > + rxr->rx_agg_prod = prod;
> > + return total_frag_len;
> > +}
> > +
> > +static u32 __bnxt_rx_agg_pages_xdp(struct bnxt *bp,
> > + struct bnxt_cp_ring_info *cpr,
> > + struct skb_shared_info *shinfo,
> > + u16 idx, u32 agg_bufs, bool tpa,
> > + struct xdp_buff *xdp)
> > +{
> > + struct bnxt_napi *bnapi = cpr->bnapi;
> > + struct pci_dev *pdev = bp->pdev;
> > + struct bnxt_rx_ring_info *rxr;
> > + u32 i, total_frag_len = 0;
> > + bool p5_tpa = false;
> > + u16 prod;
> > +
> > + rxr = bnapi->rx_ring;
> > + prod = rxr->rx_agg_prod;
> > +
> > + if ((bp->flags & BNXT_FLAG_CHIP_P5_PLUS) && tpa)
> > + p5_tpa = true;
> > +
> > + for (i = 0; i < agg_bufs; i++) {
> > struct bnxt_sw_rx_agg_bd *cons_rx_buf;
> > - struct page *page;
> > + skb_frag_t *frag = &shinfo->frags[i];
> > + struct rx_agg_cmp *agg;
> > + u16 cons, frag_len;
> > dma_addr_t mapping;
> > + netmem_ref netmem;
> >
> > if (p5_tpa)
> > agg = bnxt_get_tpa_agg_p5(bp, rxr, idx, i);
> > @@ -1223,9 +1315,10 @@ static u32 __bnxt_rx_agg_pages(struct bnxt *bp,
> > RX_AGG_CMP_LEN) >> RX_AGG_CMP_LEN_SHIFT;
> >
> > cons_rx_buf = &rxr->rx_agg_ring[cons];
> > - skb_frag_fill_page_desc(frag, cons_rx_buf->page,
> > - cons_rx_buf->offset, frag_len);
> > + skb_frag_fill_netmem_desc(frag, cons_rx_buf->netmem,
> > + cons_rx_buf->offset, frag_len);
> > shinfo->nr_frags = i + 1;
> > +
> > __clear_bit(cons, rxr->rx_agg_bmap);
> >
> > /* It is possible for bnxt_alloc_rx_page() to allocate
> > @@ -1233,15 +1326,15 @@ static u32 __bnxt_rx_agg_pages(struct bnxt *bp,
> > * need to clear the cons entry now.
> > */
> > mapping = cons_rx_buf->mapping;
> > - page = cons_rx_buf->page;
> > - cons_rx_buf->page = NULL;
> > + netmem = cons_rx_buf->netmem;
> > + cons_rx_buf->netmem = 0;
> >
> > - if (xdp && page_is_pfmemalloc(page))
> > + if (netmem_is_pfmemalloc(netmem))
> > xdp_buff_set_frag_pfmemalloc(xdp);
> >
> > - if (bnxt_alloc_rx_page(bp, rxr, prod, GFP_ATOMIC) != 0) {
> > + if (bnxt_alloc_rx_netmem(bp, rxr, prod, GFP_ATOMIC) != 0) {
> > --shinfo->nr_frags;
> > - cons_rx_buf->page = page;
> > + cons_rx_buf->netmem = netmem;
> >
> > /* Update prod since possibly some pages have been
> > * allocated already.
> > @@ -1266,20 +1359,12 @@ static struct sk_buff *bnxt_rx_agg_pages_skb(struct bnxt *bp,
> > struct sk_buff *skb, u16 idx,
> > u32 agg_bufs, bool tpa)
> > {
> > - struct skb_shared_info *shinfo = skb_shinfo(skb);
> > - u32 total_frag_len = 0;
> > -
> > - total_frag_len = __bnxt_rx_agg_pages(bp, cpr, shinfo, idx,
> > - agg_bufs, tpa, NULL);
> > - if (!total_frag_len) {
> > + if (!__bnxt_rx_agg_pages_skb(bp, cpr, skb, idx, agg_bufs, tpa)) {
> > skb_mark_for_recycle(skb);
> > dev_kfree_skb(skb);
> > return NULL;
> > }
> >
> > - skb->data_len += total_frag_len;
> > - skb->len += total_frag_len;
> > - skb->truesize += BNXT_RX_PAGE_SIZE * agg_bufs;
> > return skb;
> > }
> >
> > @@ -1294,8 +1379,8 @@ static u32 bnxt_rx_agg_pages_xdp(struct bnxt *bp,
> > if (!xdp_buff_has_frags(xdp))
> > shinfo->nr_frags = 0;
> >
> > - total_frag_len = __bnxt_rx_agg_pages(bp, cpr, shinfo,
> > - idx, agg_bufs, tpa, xdp);
> > + total_frag_len = __bnxt_rx_agg_pages_xdp(bp, cpr, shinfo,
> > + idx, agg_bufs, tpa, xdp);
> > if (total_frag_len) {
> > xdp_buff_set_frags_flag(xdp);
> > shinfo->nr_frags = agg_bufs;
> > @@ -3341,15 +3426,15 @@ static void bnxt_free_one_rx_agg_ring(struct bnxt *bp, struct bnxt_rx_ring_info
> >
> > for (i = 0; i < max_idx; i++) {
> > struct bnxt_sw_rx_agg_bd *rx_agg_buf = &rxr->rx_agg_ring[i];
> > - struct page *page = rx_agg_buf->page;
> > + netmem_ref netmem = rx_agg_buf->netmem;
> >
> > - if (!page)
> > + if (!netmem)
> > continue;
> >
> > - rx_agg_buf->page = NULL;
> > + rx_agg_buf->netmem = 0;
> > __clear_bit(i, rxr->rx_agg_bmap);
> >
> > - page_pool_recycle_direct(rxr->page_pool, page);
> > + page_pool_recycle_direct_netmem(rxr->page_pool, netmem);
> > }
> > }
> >
> > @@ -3620,7 +3705,10 @@ static int bnxt_alloc_rx_page_pool(struct bnxt *bp,
> > pp.dev = &bp->pdev->dev;
> > pp.dma_dir = bp->rx_dir;
> > pp.max_len = PAGE_SIZE;
> > - pp.flags = PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV;
> > + pp.flags = PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV |
> > + PP_FLAG_ALLOW_UNREADABLE_NETMEM;
>
> PP_FLAG_ALLOW_UNREADABLE_NETMEM should only be set when the driver can
> handle unreadable netmem. I.e. when hds_thresh==0 and
> tcp-data-split==1.
Okay, I will add a condition for that.
Thanks a lot!
Taehee Yoo
>
>
> --
> Thanks,
> Mina
* Re: [PATCH net-next v4 8/8] bnxt_en: add support for device memory tcp
2024-11-01 18:24 ` Taehee Yoo
@ 2025-02-19 0:15 ` David Wei
2025-02-19 2:37 ` Taehee Yoo
0 siblings, 1 reply; 32+ messages in thread
From: David Wei @ 2025-02-19 0:15 UTC (permalink / raw)
To: Taehee Yoo, Mina Almasry
Cc: davem, kuba, pabeni, edumazet, donald.hunter, corbet,
michael.chan, andrew+netdev, hawk, ilias.apalodimas, ast, daniel,
john.fastabend, sdf, asml.silence, brett.creeley, linux-doc,
netdev, kory.maincent, maxime.chevallier, danieller, hengqi,
ecree.xilinx, przemyslaw.kitszel, hkallweit1, ahmed.zaki,
rrameshbabu, idosch, jiri, bigeasy, lorenzo, jdamato,
aleksander.lobakin, kaiyuanz, willemb, daniel.zahka
On 2024-11-01 11:24, Taehee Yoo wrote:
> On Fri, Nov 1, 2024 at 11:53 PM Mina Almasry <almasrymina@google.com> wrote:
>>
>> On Tue, Oct 22, 2024 at 9:25 AM Taehee Yoo <ap420073@gmail.com> wrote:
>>>
>>> Currently, bnxt_en driver satisfies the requirements of Device memory
>>> TCP, which is tcp-data-split.
>>> So, it implements Device memory TCP for bnxt_en driver.
>>>
>>> From now on, the aggregation ring handles netmem_ref instead of page
>>> regardless of the on/off of netmem.
>>> So, for the aggregation ring, memory will be handled with the netmem
>>> page_pool API instead of generic page_pool API.
>>>
>>> If Devmem is enabled, netmem_ref is used as-is and if Devmem is not
>>> enabled, netmem_ref will be converted to page and that is used.
>>>
>>> Driver recognizes whether the devmem is set or unset based on the
>>> mp_params.mp_priv is not NULL.
>>> Only if devmem is set, it passes PP_FLAG_ALLOW_UNREADABLE_NETMEM.
>>
>> Looks like in the latest version, you pass
>> PP_FLAG_ALLOW_UNREADABLE_NETMEM unconditionally, so this line is
>> obsolete.
>
> Okay, I will remove this line.
>
>>
>> However, I think you should only pass PP_FLAG_ALLOW_UNREADABLE_NETMEM
>> if hds_thresh==0 and tcp-data-split==1, because otherwise the driver
>> is not configured well enough to handle unreadable netmem, right? I
>> know that we added checks in the devmem binding to detect hds_thresh
>> and tcp-data-split, but we should keep another layer of protection in
>> the driver. The driver should not set PP_FLAG_ALLOW_UNREADABLE_NETMEM
>> unless it's configured to be able to handle unreadable netmem.
>
> Okay, I agree, I will pass PP_FLAG_ALLOW_UNREADABLE_NETMEM
> only when hds_thresh==0 and tcp-data-split==1.
>
>>
>>>
>>> Tested-by: Stanislav Fomichev <sdf@fomichev.me>
>>> Signed-off-by: Taehee Yoo <ap420073@gmail.com>
>>> ---
>>>
>>> v4:
>>> - Do not select NET_DEVMEM in Kconfig.
>>> - Pass PP_FLAG_ALLOW_UNREADABLE_NETMEM flag unconditionally.
>>> - Add __bnxt_rx_agg_pages_xdp().
>>> - Use gfp flag in __bnxt_alloc_rx_netmem().
>>> - Do not add *offset in the __bnxt_alloc_rx_netmem().
>>> - Do not pass queue_idx to bnxt_alloc_rx_page_pool().
>>> - Add Test tag from Stanislav.
>>> - Add page_pool_recycle_direct_netmem() helper.
>>>
>>> v3:
>>> - Patch added.
>>>
>>> drivers/net/ethernet/broadcom/bnxt/bnxt.c | 182 ++++++++++++++++------
>>> drivers/net/ethernet/broadcom/bnxt/bnxt.h | 2 +-
>>> include/net/page_pool/helpers.h | 6 +
>>> 3 files changed, 142 insertions(+), 48 deletions(-)
Hi Taehee, what is your plan with this patch? Are you still working on
it? I noticed that you dropped it in later versions of this series. With
io_uring zero copy Rx now merged I also need bnxt support, but I don't
want to duplicate efforts. Please let me know, thanks!
* Re: [PATCH net-next v4 8/8] bnxt_en: add support for device memory tcp
2025-02-19 0:15 ` David Wei
@ 2025-02-19 2:37 ` Taehee Yoo
0 siblings, 0 replies; 32+ messages in thread
From: Taehee Yoo @ 2025-02-19 2:37 UTC (permalink / raw)
To: David Wei
Cc: Mina Almasry, davem, kuba, pabeni, edumazet, donald.hunter,
corbet, michael.chan, andrew+netdev, hawk, ilias.apalodimas, ast,
daniel, john.fastabend, sdf, asml.silence, brett.creeley,
linux-doc, netdev, kory.maincent, maxime.chevallier, danieller,
hengqi, ecree.xilinx, przemyslaw.kitszel, hkallweit1, ahmed.zaki,
rrameshbabu, idosch, jiri, bigeasy, lorenzo, jdamato,
aleksander.lobakin, kaiyuanz, willemb, daniel.zahka
On Wed, Feb 19, 2025 at 9:16 AM David Wei <dw@davidwei.uk> wrote:
>
> On 2024-11-01 11:24, Taehee Yoo wrote:
> > On Fri, Nov 1, 2024 at 11:53 PM Mina Almasry <almasrymina@google.com> wrote:
> >>
> >> On Tue, Oct 22, 2024 at 9:25 AM Taehee Yoo <ap420073@gmail.com> wrote:
> >>>
> >>> Currently, bnxt_en driver satisfies the requirements of Device memory
> >>> TCP, which is tcp-data-split.
> >>> So, it implements Device memory TCP for bnxt_en driver.
> >>>
> >>> From now on, the aggregation ring handles netmem_ref instead of page
> >>> regardless of the on/off of netmem.
> >>> So, for the aggregation ring, memory will be handled with the netmem
> >>> page_pool API instead of generic page_pool API.
> >>>
> >>> If Devmem is enabled, netmem_ref is used as-is and if Devmem is not
> >>> enabled, netmem_ref will be converted to page and that is used.
> >>>
> >>> Driver recognizes whether the devmem is set or unset based on the
> >>> mp_params.mp_priv is not NULL.
> >>> Only if devmem is set, it passes PP_FLAG_ALLOW_UNREADABLE_NETMEM.
> >>
> >> Looks like in the latest version, you pass
> >> PP_FLAG_ALLOW_UNREADABLE_NETMEM unconditionally, so this line is
> >> obsolete.
> >
> > Okay, I will remove this line.
> >
> >>
> >> However, I think you should only pass PP_FLAG_ALLOW_UNREADABLE_NETMEM
> >> if hds_thresh==0 and tcp-data-split==1, because otherwise the driver
> >> is not configured well enough to handle unreadable netmem, right? I
> >> know that we added checks in the devmem binding to detect hds_thresh
> >> and tcp-data-split, but we should keep another layer of protection in
> >> the driver. The driver should not set PP_FLAG_ALLOW_UNREADABLE_NETMEM
> >> unless it's configured to be able to handle unreadable netmem.
> >
> > Okay, I agree, I will pass PP_FLAG_ALLOW_UNREADABLE_NETMEM
> > only when hds_thresh==0 and tcp-data-split==1.
> >
> >>
> >>>
> >>> Tested-by: Stanislav Fomichev <sdf@fomichev.me>
> >>> Signed-off-by: Taehee Yoo <ap420073@gmail.com>
> >>> ---
> >>>
> >>> v4:
> >>> - Do not select NET_DEVMEM in Kconfig.
> >>> - Pass PP_FLAG_ALLOW_UNREADABLE_NETMEM flag unconditionally.
> >>> - Add __bnxt_rx_agg_pages_xdp().
> >>> - Use gfp flag in __bnxt_alloc_rx_netmem().
> >>> - Do not add *offset in the __bnxt_alloc_rx_netmem().
> >>> - Do not pass queue_idx to bnxt_alloc_rx_page_pool().
> >>> - Add Test tag from Stanislav.
> >>> - Add page_pool_recycle_direct_netmem() helper.
> >>>
> >>> v3:
> >>> - Patch added.
> >>>
> >>> drivers/net/ethernet/broadcom/bnxt/bnxt.c | 182 ++++++++++++++++------
> >>> drivers/net/ethernet/broadcom/bnxt/bnxt.h | 2 +-
> >>> include/net/page_pool/helpers.h | 6 +
> >>> 3 files changed, 142 insertions(+), 48 deletions(-)
>
> Hi Taehee, what is your plan with this patch? Are you still working on
> it? I noticed that you dropped it in later versions of this series. With
> io_uring zero copy Rx now merged I also need bnxt support, but I don't
> want to duplicate efforts. Please let me know, thanks!
Hi David,
Sorry for the late reply! I'm still working on it.
I have implemented a working patch, but there are several bugs,
so I'm thinking about how to deal with them.
I would like to send the patch after fixing those bugs.
Thanks a lot!
Taehee Yoo