* Re: [PATCH V2 net] net: mana: Configure hwc timeout from hardware
From: Greg KH @ 2023-07-07 10:17 UTC (permalink / raw)
To: Souradeep Chakrabarti
Cc: kys, haiyangz, wei.liu, decui, davem, edumazet, kuba, pabeni,
longli, sharmaajay, leon, cai.huoqing, ssengar, vkuznets, tglx,
linux-hyperv, netdev, linux-kernel, linux-rdma, stable,
schakrabarti
In-Reply-To: <1688723128-14878-1-git-send-email-schakrabarti@linux.microsoft.com>
On Fri, Jul 07, 2023 at 02:45:28AM -0700, Souradeep Chakrabarti wrote:
> At present hwc timeout value is a fixed value.
> This patch sets the hwc timeout from the hardware.
>
> Signed-off-by: Souradeep Chakrabarti <schakrabarti@linux.microsoft.com>
> ---
> V1 -> V2:
> * Added return check for mana_gd_query_hwc_timeout
> * Removed dev_err from mana_gd_query_hwc_timeout
> ---
> .../net/ethernet/microsoft/mana/gdma_main.c | 30 ++++++++++++++++++-
> .../net/ethernet/microsoft/mana/hw_channel.c | 25 +++++++++++++++-
> include/net/mana/gdma.h | 20 ++++++++++++-
> include/net/mana/hw_channel.h | 5 ++++
> 4 files changed, 77 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/net/ethernet/microsoft/mana/gdma_main.c b/drivers/net/ethernet/microsoft/mana/gdma_main.c
> index 8f3f78b68592..949c927c3a7e 100644
> --- a/drivers/net/ethernet/microsoft/mana/gdma_main.c
> +++ b/drivers/net/ethernet/microsoft/mana/gdma_main.c
> @@ -106,6 +106,27 @@ static int mana_gd_query_max_resources(struct pci_dev *pdev)
> return 0;
> }
>
> +static int mana_gd_query_hwc_timeout(struct pci_dev *pdev, u32 *timeout_val)
> +{
> + struct gdma_context *gc = pci_get_drvdata(pdev);
> + struct gdma_query_hwc_timeout_req req = {};
> + struct gdma_query_hwc_timeout_resp resp = {};
> + int err;
> +
> + mana_gd_init_req_hdr(&req.hdr, GDMA_QUERY_HWC_TIMEOUT,
> + sizeof(req), sizeof(resp));
> + req.timeout_ms = *timeout_val;
> + err = mana_gd_send_request(gc, sizeof(req), &req, sizeof(resp), &resp);
> + if (err || resp.hdr.status)
> + return err ? err : -EPROTO;
> +
> + *timeout_val = resp.timeout_ms;
> + dev_info(gc->dev, "Successfully changed the timeout value %u\n",
> + *timeout_val);
When the kernel works properly, it is quiet. Please always remove your
debugging code before submitting changes for inclusion.
thanks,
greg k-h
^ permalink raw reply
* Re: [PATCH V2 net] net: mana: Configure hwc timeout from hardware
From: Greg KH @ 2023-07-07 10:16 UTC (permalink / raw)
To: Souradeep Chakrabarti
Cc: kys, haiyangz, wei.liu, decui, davem, edumazet, kuba, pabeni,
longli, sharmaajay, leon, cai.huoqing, ssengar, vkuznets, tglx,
linux-hyperv, netdev, linux-kernel, linux-rdma, stable,
schakrabarti
In-Reply-To: <1688723128-14878-1-git-send-email-schakrabarti@linux.microsoft.com>
On Fri, Jul 07, 2023 at 02:45:28AM -0700, Souradeep Chakrabarti wrote:
> At present hwc timeout value is a fixed value.
> This patch sets the hwc timeout from the hardware.
This really does not describe what is happening here. Please read the
documentation for how to write a good changelog text.
>
> Signed-off-by: Souradeep Chakrabarti <schakrabarti@linux.microsoft.com>
> ---
> V1 -> V2:
> * Added return check for mana_gd_query_hwc_timeout
> * Removed dev_err from mana_gd_query_hwc_timeout
<formletter>
This is not the correct way to submit patches for inclusion in the
stable kernel tree. Please read:
https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html
for how to do this properly.
</formletter>
^ permalink raw reply
* Re: [PATCH net] net: mana: Configure hwc timeout from hardware
From: Souradeep Chakrabarti @ 2023-07-07 9:51 UTC (permalink / raw)
To: Leon Romanovsky
Cc: kys, haiyangz, wei.liu, decui, davem, edumazet, kuba, pabeni,
longli, sharmaajay, cai.huoqing, ssengar, vkuznets, tglx,
linux-hyperv, netdev, linux-kernel, linux-rdma, stable,
schakrabarti
In-Reply-To: <20230705104731.GM6455@unreal>
On Wed, Jul 05, 2023 at 01:47:31PM +0300, Leon Romanovsky wrote:
> On Wed, Jul 05, 2023 at 02:32:58AM -0700, Souradeep Chakrabarti wrote:
> > At present hwc timeout value is a fixed value.
> > This patch sets the hwc timeout from the hardware.
> >
> > Signed-off-by: Souradeep Chakrabarti <schakrabarti@linux.microsoft.com>
> > ---
> > .../net/ethernet/microsoft/mana/gdma_main.c | 27 +++++++++++++++++++
> > .../net/ethernet/microsoft/mana/hw_channel.c | 25 ++++++++++++++++-
> > include/net/mana/gdma.h | 20 +++++++++++++-
> > include/net/mana/hw_channel.h | 5 ++++
> > 4 files changed, 75 insertions(+), 2 deletions(-)
>
> We are in merge window now, it is not net material.
>
> >
> > diff --git a/drivers/net/ethernet/microsoft/mana/gdma_main.c b/drivers/net/ethernet/microsoft/mana/gdma_main.c
> > index 8f3f78b68592..5d30347e0137 100644
> > --- a/drivers/net/ethernet/microsoft/mana/gdma_main.c
> > +++ b/drivers/net/ethernet/microsoft/mana/gdma_main.c
> > @@ -106,6 +106,30 @@ static int mana_gd_query_max_resources(struct pci_dev *pdev)
> > return 0;
> > }
> >
> > +static int mana_gd_query_hwc_timeout(struct pci_dev *pdev, u32 *timeout_val)
> > +{
>
> Callers are not checking return value, so or make this function void or
> check return value.
I have fixed it in V2 patch.
>
> > + struct gdma_context *gc = pci_get_drvdata(pdev);
> > + struct gdma_query_hwc_timeout_req req = {};
> > + struct gdma_query_hwc_timeout_resp resp = {};
> > + int err;
> > +
> > + mana_gd_init_req_hdr(&req.hdr, GDMA_QUERY_HWC_TIMEOUT,
> > + sizeof(req), sizeof(resp));
> > + req.timeout_ms = *timeout_val;
> > + err = mana_gd_send_request(gc, sizeof(req), &req, sizeof(resp), &resp);
> > + if (err || resp.hdr.status) {
>
> I see this check almost in all callers to mana_gd_send_request(). It
> will be nice if mana_gd_send_request() would check status internally
> and return error.
>
In a separate patch in future we can do that.
Thanks for the suggestion.
> > + dev_err(gc->dev, "Failed to query timeout: %d, 0x%x\n", err,
> > + resp.hdr.status);
> > + return err ? err : -EPROTO;
> > + }
> > +
> > + *timeout_val = resp.timeout_ms;
> > + dev_info(gc->dev, "Successfully changed the timeout value %u\n",
> > + *timeout_val);
> > +
> > + return 0;
> > +}
> > +
> > static int mana_gd_detect_devices(struct pci_dev *pdev)
> > {
> > struct gdma_context *gc = pci_get_drvdata(pdev);
> > @@ -879,6 +903,7 @@ int mana_gd_verify_vf_version(struct pci_dev *pdev)
> > struct gdma_context *gc = pci_get_drvdata(pdev);
> > struct gdma_verify_ver_resp resp = {};
> > struct gdma_verify_ver_req req = {};
> > + struct hw_channel_context *hwc = gc->hwc.driver_data;
> > int err;
> >
> > mana_gd_init_req_hdr(&req.hdr, GDMA_VERIFY_VF_DRIVER_VERSION,
> > @@ -907,6 +932,8 @@ int mana_gd_verify_vf_version(struct pci_dev *pdev)
> > err, resp.hdr.status);
> > return err ? err : -EPROTO;
> > }
> > + if (resp.pf_cap_flags1 & GDMA_DRV_CAP_FLAG_1_HWC_TIMEOUT_RECONFIG)
> > + mana_gd_query_hwc_timeout(pdev, &hwc->hwc_timeout);
> >
> > return 0;
> > }
> > diff --git a/drivers/net/ethernet/microsoft/mana/hw_channel.c b/drivers/net/ethernet/microsoft/mana/hw_channel.c
> > index 9d1507eba5b9..f5980c26fd09 100644
> > --- a/drivers/net/ethernet/microsoft/mana/hw_channel.c
> > +++ b/drivers/net/ethernet/microsoft/mana/hw_channel.c
> > @@ -174,7 +174,25 @@ static void mana_hwc_init_event_handler(void *ctx, struct gdma_queue *q_self,
> > complete(&hwc->hwc_init_eqe_comp);
> > break;
> >
> > + case GDMA_EQE_HWC_SOC_RECONFIG_DATA:
> > + type_data.as_uint32 = event->details[0];
> > + type = type_data.type;
> > + val = type_data.value;
> > +
> > + switch (type) {
> > + case HWC_DATA_CFG_HWC_TIMEOUT:
> > + hwc->hwc_timeout = val;
> > + break;
> > +
> > + default:
> > + dev_warn(hwc->dev, "Received unknown reconfig type %u\n", type);
> > + break;
> > + }
> > +
> > + break;
> > +
> > default:
> > + dev_warn(hwc->dev, "Received unknown gdma event %u\n", event->type);
> > /* Ignore unknown events, which should never happen. */
> > break;
> > }
> > @@ -704,6 +722,7 @@ int mana_hwc_create_channel(struct gdma_context *gc)
> > gd->pdid = INVALID_PDID;
> > gd->doorbell = INVALID_DOORBELL;
> >
> > + hwc->hwc_timeout = HW_CHANNEL_WAIT_RESOURCE_TIMEOUT_MS;
> > /* mana_hwc_init_queues() only creates the required data structures,
> > * and doesn't touch the HWC device.
> > */
> > @@ -770,6 +789,8 @@ void mana_hwc_destroy_channel(struct gdma_context *gc)
> > hwc->gdma_dev->doorbell = INVALID_DOORBELL;
> > hwc->gdma_dev->pdid = INVALID_PDID;
> >
> > + hwc->hwc_timeout = 0;
> > +
> > kfree(hwc);
> > gc->hwc.driver_data = NULL;
> > gc->hwc.gdma_context = NULL;
> > @@ -818,6 +839,7 @@ int mana_hwc_send_request(struct hw_channel_context *hwc, u32 req_len,
> > dest_vrq = hwc->pf_dest_vrq_id;
> > dest_vrcq = hwc->pf_dest_vrcq_id;
> > }
> > + dev_err(hwc->dev, "HWC: timeout %u ms\n", hwc->hwc_timeout);
> >
> > err = mana_hwc_post_tx_wqe(txq, tx_wr, dest_vrq, dest_vrcq, false);
> > if (err) {
> > @@ -825,7 +847,8 @@ int mana_hwc_send_request(struct hw_channel_context *hwc, u32 req_len,
> > goto out;
> > }
> >
> > - if (!wait_for_completion_timeout(&ctx->comp_event, 30 * HZ)) {
> > + if (!wait_for_completion_timeout(&ctx->comp_event,
> > + (hwc->hwc_timeout / 1000) * HZ)) {
> > dev_err(hwc->dev, "HWC: Request timed out!\n");
> > err = -ETIMEDOUT;
> > goto out;
> > diff --git a/include/net/mana/gdma.h b/include/net/mana/gdma.h
> > index 96c120160f15..88b6ef7ce1a6 100644
> > --- a/include/net/mana/gdma.h
> > +++ b/include/net/mana/gdma.h
> > @@ -33,6 +33,7 @@ enum gdma_request_type {
> > GDMA_DESTROY_PD = 30,
> > GDMA_CREATE_MR = 31,
> > GDMA_DESTROY_MR = 32,
> > + GDMA_QUERY_HWC_TIMEOUT = 84, /* 0x54 */
> > };
> >
> > #define GDMA_RESOURCE_DOORBELL_PAGE 27
> > @@ -57,6 +58,8 @@ enum gdma_eqe_type {
> > GDMA_EQE_HWC_INIT_EQ_ID_DB = 129,
> > GDMA_EQE_HWC_INIT_DATA = 130,
> > GDMA_EQE_HWC_INIT_DONE = 131,
> > + GDMA_EQE_HWC_SOC_RECONFIG = 132,
> > + GDMA_EQE_HWC_SOC_RECONFIG_DATA = 133,
> > };
> >
> > enum {
> > @@ -531,10 +534,12 @@ enum {
> > * so the driver is able to reliably support features like busy_poll.
> > */
> > #define GDMA_DRV_CAP_FLAG_1_NAPI_WKDONE_FIX BIT(2)
> > +#define GDMA_DRV_CAP_FLAG_1_HWC_TIMEOUT_RECONFIG BIT(3)
> >
> > #define GDMA_DRV_CAP_FLAGS1 \
> > (GDMA_DRV_CAP_FLAG_1_EQ_SHARING_MULTI_VPORT | \
> > - GDMA_DRV_CAP_FLAG_1_NAPI_WKDONE_FIX)
> > + GDMA_DRV_CAP_FLAG_1_NAPI_WKDONE_FIX | \
> > + GDMA_DRV_CAP_FLAG_1_HWC_TIMEOUT_RECONFIG)
> >
> > #define GDMA_DRV_CAP_FLAGS2 0
> >
> > @@ -664,6 +669,19 @@ struct gdma_disable_queue_req {
> > u32 alloc_res_id_on_creation;
> > }; /* HW DATA */
> >
> > +/* GDMA_QUERY_HWC_TIMEOUT */
> > +struct gdma_query_hwc_timeout_req {
> > + struct gdma_req_hdr hdr;
> > + u32 timeout_ms;
> > + u32 reserved;
> > +};
> > +
> > +struct gdma_query_hwc_timeout_resp {
> > + struct gdma_resp_hdr hdr;
> > + u32 timeout_ms;
> > + u32 reserved;
> > +};
> > +
> > enum atb_page_size {
> > ATB_PAGE_SIZE_4K,
> > ATB_PAGE_SIZE_8K,
> > diff --git a/include/net/mana/hw_channel.h b/include/net/mana/hw_channel.h
> > index 6a757a6e2732..3d3b5c881bc1 100644
> > --- a/include/net/mana/hw_channel.h
> > +++ b/include/net/mana/hw_channel.h
> > @@ -23,6 +23,10 @@
> > #define HWC_INIT_DATA_PF_DEST_RQ_ID 10
> > #define HWC_INIT_DATA_PF_DEST_CQ_ID 11
> >
> > +#define HWC_DATA_CFG_HWC_TIMEOUT 1
> > +
> > +#define HW_CHANNEL_WAIT_RESOURCE_TIMEOUT_MS 30000
> > +
> > /* Structures labeled with "HW DATA" are exchanged with the hardware. All of
> > * them are naturally aligned and hence don't need __packed.
> > */
> > @@ -182,6 +186,7 @@ struct hw_channel_context {
> >
> > u32 pf_dest_vrq_id;
> > u32 pf_dest_vrcq_id;
> > + u32 hwc_timeout;
> >
> > struct hwc_caller_ctx *caller_ctx;
> > };
> > --
> > 2.34.1
> >
^ permalink raw reply
* Re: [PATCH net] net: mana: Configure hwc timeout from hardware
From: Souradeep Chakrabarti @ 2023-07-07 9:48 UTC (permalink / raw)
To: Saurabh Singh Sengar
Cc: KY Srinivasan, Haiyang Zhang, wei.liu@kernel.org, Dexuan Cui,
davem@davemloft.net, edumazet@google.com, kuba@kernel.org,
pabeni@redhat.com, Long Li, Ajay Sharma, leon@kernel.org,
cai.huoqing@linux.dev, ssengar@linux.microsoft.com,
vkuznets@redhat.com, tglx@linutronix.de,
linux-hyperv@vger.kernel.org, netdev@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org,
stable@vger.kernel.org, Souradeep Chakrabarti
In-Reply-To: <PUZP153MB074909725BED8C12720CFE3EBE2FA@PUZP153MB0749.APCP153.PROD.OUTLOOK.COM>
On Wed, Jul 05, 2023 at 11:11:50AM +0000, Saurabh Singh Sengar wrote:
>
>
> > -----Original Message-----
> > From: Souradeep Chakrabarti <schakrabarti@linux.microsoft.com>
> > Sent: Wednesday, July 5, 2023 3:03 PM
> > To: KY Srinivasan <kys@microsoft.com>; Haiyang Zhang
> > <haiyangz@microsoft.com>; wei.liu@kernel.org; Dexuan Cui
> > <decui@microsoft.com>; davem@davemloft.net; edumazet@google.com;
> > kuba@kernel.org; pabeni@redhat.com; Long Li <longli@microsoft.com>; Ajay
> > Sharma <sharmaajay@microsoft.com>; leon@kernel.org;
> > cai.huoqing@linux.dev; ssengar@linux.microsoft.com;
> > vkuznets@redhat.com; tglx@linutronix.de; linux-hyperv@vger.kernel.org;
> > netdev@vger.kernel.org; linux-kernel@vger.kernel.org; linux-
> > rdma@vger.kernel.org
> > Cc: stable@vger.kernel.org; Souradeep Chakrabarti
> > <schakrabarti@microsoft.com>; Souradeep Chakrabarti
> > <schakrabarti@linux.microsoft.com>
> > Subject: [PATCH net] net: mana: Configure hwc timeout from hardware
> >
> > At present hwc timeout value is a fixed value.
> > This patch sets the hwc timeout from the hardware.
> >
> > Signed-off-by: Souradeep Chakrabarti <schakrabarti@linux.microsoft.com>
> > ---
> > .../net/ethernet/microsoft/mana/gdma_main.c | 27 +++++++++++++++++++
> > .../net/ethernet/microsoft/mana/hw_channel.c | 25 ++++++++++++++++-
> > include/net/mana/gdma.h | 20 +++++++++++++-
> > include/net/mana/hw_channel.h | 5 ++++
> > 4 files changed, 75 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/net/ethernet/microsoft/mana/gdma_main.c
> > b/drivers/net/ethernet/microsoft/mana/gdma_main.c
> > index 8f3f78b68592..5d30347e0137 100644
> > --- a/drivers/net/ethernet/microsoft/mana/gdma_main.c
> > +++ b/drivers/net/ethernet/microsoft/mana/gdma_main.c
> > @@ -106,6 +106,30 @@ static int mana_gd_query_max_resources(struct
> > pci_dev *pdev)
> > return 0;
> > }
> >
> > +static int mana_gd_query_hwc_timeout(struct pci_dev *pdev, u32
> > +*timeout_val) {
> > + struct gdma_context *gc = pci_get_drvdata(pdev);
> > + struct gdma_query_hwc_timeout_req req = {};
> > + struct gdma_query_hwc_timeout_resp resp = {};
> > + int err;
> > +
> > + mana_gd_init_req_hdr(&req.hdr, GDMA_QUERY_HWC_TIMEOUT,
> > + sizeof(req), sizeof(resp));
> > + req.timeout_ms = *timeout_val;
> > + err = mana_gd_send_request(gc, sizeof(req), &req, sizeof(resp),
> > &resp);
> > + if (err || resp.hdr.status) {
> > + dev_err(gc->dev, "Failed to query timeout: %d, 0x%x\n", err,
> > + resp.hdr.status);
> > + return err ? err : -EPROTO;
> > + }
> > +
> > + *timeout_val = resp.timeout_ms;
> > + dev_info(gc->dev, "Successfully changed the timeout value %u\n",
> > + *timeout_val);
> > +
> > + return 0;
> > +}
> > +
> > static int mana_gd_detect_devices(struct pci_dev *pdev) {
> > struct gdma_context *gc = pci_get_drvdata(pdev); @@ -879,6 +903,7
> > @@ int mana_gd_verify_vf_version(struct pci_dev *pdev)
> > struct gdma_context *gc = pci_get_drvdata(pdev);
> > struct gdma_verify_ver_resp resp = {};
> > struct gdma_verify_ver_req req = {};
> > + struct hw_channel_context *hwc = gc->hwc.driver_data;
> > int err;
> >
> > mana_gd_init_req_hdr(&req.hdr,
> > GDMA_VERIFY_VF_DRIVER_VERSION, @@ -907,6 +932,8 @@ int
> > mana_gd_verify_vf_version(struct pci_dev *pdev)
> > err, resp.hdr.status);
> > return err ? err : -EPROTO;
> > }
> > + if (resp.pf_cap_flags1 &
> > GDMA_DRV_CAP_FLAG_1_HWC_TIMEOUT_RECONFIG)
> > + mana_gd_query_hwc_timeout(pdev, &hwc->hwc_timeout);
> >
> > return 0;
> > }
> > diff --git a/drivers/net/ethernet/microsoft/mana/hw_channel.c
> > b/drivers/net/ethernet/microsoft/mana/hw_channel.c
> > index 9d1507eba5b9..f5980c26fd09 100644
> > --- a/drivers/net/ethernet/microsoft/mana/hw_channel.c
> > +++ b/drivers/net/ethernet/microsoft/mana/hw_channel.c
> > @@ -174,7 +174,25 @@ static void mana_hwc_init_event_handler(void *ctx,
> > struct gdma_queue *q_self,
> > complete(&hwc->hwc_init_eqe_comp);
> > break;
> >
> > + case GDMA_EQE_HWC_SOC_RECONFIG_DATA:
> > + type_data.as_uint32 = event->details[0];
> > + type = type_data.type;
> > + val = type_data.value;
> > +
> > + switch (type) {
> > + case HWC_DATA_CFG_HWC_TIMEOUT:
> > + hwc->hwc_timeout = val;
> > + break;
> > +
> > + default:
> > + dev_warn(hwc->dev, "Received unknown reconfig
> > type %u\n", type);
> > + break;
> > + }
> > +
> > + break;
> > +
> > default:
> > + dev_warn(hwc->dev, "Received unknown gdma event %u\n",
> > event->type);
> > /* Ignore unknown events, which should never happen. */
> > break;
> > }
> > @@ -704,6 +722,7 @@ int mana_hwc_create_channel(struct gdma_context
> > *gc)
> > gd->pdid = INVALID_PDID;
> > gd->doorbell = INVALID_DOORBELL;
> >
> > + hwc->hwc_timeout =
> > HW_CHANNEL_WAIT_RESOURCE_TIMEOUT_MS;
> > /* mana_hwc_init_queues() only creates the required data structures,
> > * and doesn't touch the HWC device.
> > */
> > @@ -770,6 +789,8 @@ void mana_hwc_destroy_channel(struct
> > gdma_context *gc)
> > hwc->gdma_dev->doorbell = INVALID_DOORBELL;
> > hwc->gdma_dev->pdid = INVALID_PDID;
> >
> > + hwc->hwc_timeout = 0;
> > +
> > kfree(hwc);
> > gc->hwc.driver_data = NULL;
> > gc->hwc.gdma_context = NULL;
> > @@ -818,6 +839,7 @@ int mana_hwc_send_request(struct
> > hw_channel_context *hwc, u32 req_len,
> > dest_vrq = hwc->pf_dest_vrq_id;
> > dest_vrcq = hwc->pf_dest_vrcq_id;
> > }
> > + dev_err(hwc->dev, "HWC: timeout %u ms\n", hwc->hwc_timeout);
>
> Can avoid dev_err here
I have fixed it in v2 patch
>
> >
> > err = mana_hwc_post_tx_wqe(txq, tx_wr, dest_vrq, dest_vrcq, false);
> > if (err) {
> > @@ -825,7 +847,8 @@ int mana_hwc_send_request(struct
> > hw_channel_context *hwc, u32 req_len,
> > goto out;
> > }
> >
> > - if (!wait_for_completion_timeout(&ctx->comp_event, 30 * HZ)) {
> > + if (!wait_for_completion_timeout(&ctx->comp_event,
> > + (hwc->hwc_timeout / 1000) * HZ)) {
> > dev_err(hwc->dev, "HWC: Request timed out!\n");
> > err = -ETIMEDOUT;
> > goto out;
> > diff --git a/include/net/mana/gdma.h b/include/net/mana/gdma.h index
> > 96c120160f15..88b6ef7ce1a6 100644
> > --- a/include/net/mana/gdma.h
> > +++ b/include/net/mana/gdma.h
> > @@ -33,6 +33,7 @@ enum gdma_request_type {
> > GDMA_DESTROY_PD = 30,
> > GDMA_CREATE_MR = 31,
> > GDMA_DESTROY_MR = 32,
> > + GDMA_QUERY_HWC_TIMEOUT = 84, /* 0x54 */
> > };
> >
> > #define GDMA_RESOURCE_DOORBELL_PAGE 27
> > @@ -57,6 +58,8 @@ enum gdma_eqe_type {
> > GDMA_EQE_HWC_INIT_EQ_ID_DB = 129,
> > GDMA_EQE_HWC_INIT_DATA = 130,
> > GDMA_EQE_HWC_INIT_DONE = 131,
> > + GDMA_EQE_HWC_SOC_RECONFIG = 132,
> > + GDMA_EQE_HWC_SOC_RECONFIG_DATA = 133,
> > };
> >
> > enum {
> > @@ -531,10 +534,12 @@ enum {
> > * so the driver is able to reliably support features like busy_poll.
> > */
> > #define GDMA_DRV_CAP_FLAG_1_NAPI_WKDONE_FIX BIT(2)
> > +#define GDMA_DRV_CAP_FLAG_1_HWC_TIMEOUT_RECONFIG BIT(3)
> >
> > #define GDMA_DRV_CAP_FLAGS1 \
> > (GDMA_DRV_CAP_FLAG_1_EQ_SHARING_MULTI_VPORT | \
> > - GDMA_DRV_CAP_FLAG_1_NAPI_WKDONE_FIX)
> > + GDMA_DRV_CAP_FLAG_1_NAPI_WKDONE_FIX | \
> > + GDMA_DRV_CAP_FLAG_1_HWC_TIMEOUT_RECONFIG)
> >
> > #define GDMA_DRV_CAP_FLAGS2 0
> >
> > @@ -664,6 +669,19 @@ struct gdma_disable_queue_req {
> > u32 alloc_res_id_on_creation;
> > }; /* HW DATA */
> >
> > +/* GDMA_QUERY_HWC_TIMEOUT */
> > +struct gdma_query_hwc_timeout_req {
> > + struct gdma_req_hdr hdr;
> > + u32 timeout_ms;
> > + u32 reserved;
> > +};
> > +
> > +struct gdma_query_hwc_timeout_resp {
> > + struct gdma_resp_hdr hdr;
> > + u32 timeout_ms;
> > + u32 reserved;
> > +};
> > +
> > enum atb_page_size {
> > ATB_PAGE_SIZE_4K,
> > ATB_PAGE_SIZE_8K,
> > diff --git a/include/net/mana/hw_channel.h
> > b/include/net/mana/hw_channel.h index 6a757a6e2732..3d3b5c881bc1
> > 100644
> > --- a/include/net/mana/hw_channel.h
> > +++ b/include/net/mana/hw_channel.h
> > @@ -23,6 +23,10 @@
> > #define HWC_INIT_DATA_PF_DEST_RQ_ID 10
> > #define HWC_INIT_DATA_PF_DEST_CQ_ID 11
> >
> > +#define HWC_DATA_CFG_HWC_TIMEOUT 1
> > +
> > +#define HW_CHANNEL_WAIT_RESOURCE_TIMEOUT_MS 30000
> > +
> > /* Structures labeled with "HW DATA" are exchanged with the hardware. All
> > of
> > * them are naturally aligned and hence don't need __packed.
> > */
> > @@ -182,6 +186,7 @@ struct hw_channel_context {
> >
> > u32 pf_dest_vrq_id;
> > u32 pf_dest_vrcq_id;
> > + u32 hwc_timeout;
> >
> > struct hwc_caller_ctx *caller_ctx;
> > };
> > --
> > 2.34.1
^ permalink raw reply
* [PATCH V2 net] net: mana: Configure hwc timeout from hardware
From: Souradeep Chakrabarti @ 2023-07-07 9:45 UTC (permalink / raw)
To: kys, haiyangz, wei.liu, decui, davem, edumazet, kuba, pabeni,
longli, sharmaajay, leon, cai.huoqing, ssengar, vkuznets, tglx,
linux-hyperv, netdev, linux-kernel, linux-rdma
Cc: stable, schakrabarti, Souradeep Chakrabarti
At present hwc timeout value is a fixed value.
This patch sets the hwc timeout from the hardware.
Signed-off-by: Souradeep Chakrabarti <schakrabarti@linux.microsoft.com>
---
V1 -> V2:
* Added return check for mana_gd_query_hwc_timeout
* Removed dev_err from mana_gd_query_hwc_timeout
---
.../net/ethernet/microsoft/mana/gdma_main.c | 30 ++++++++++++++++++-
.../net/ethernet/microsoft/mana/hw_channel.c | 25 +++++++++++++++-
include/net/mana/gdma.h | 20 ++++++++++++-
include/net/mana/hw_channel.h | 5 ++++
4 files changed, 77 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/microsoft/mana/gdma_main.c b/drivers/net/ethernet/microsoft/mana/gdma_main.c
index 8f3f78b68592..949c927c3a7e 100644
--- a/drivers/net/ethernet/microsoft/mana/gdma_main.c
+++ b/drivers/net/ethernet/microsoft/mana/gdma_main.c
@@ -106,6 +106,27 @@ static int mana_gd_query_max_resources(struct pci_dev *pdev)
return 0;
}
+static int mana_gd_query_hwc_timeout(struct pci_dev *pdev, u32 *timeout_val)
+{
+ struct gdma_context *gc = pci_get_drvdata(pdev);
+ struct gdma_query_hwc_timeout_req req = {};
+ struct gdma_query_hwc_timeout_resp resp = {};
+ int err;
+
+ mana_gd_init_req_hdr(&req.hdr, GDMA_QUERY_HWC_TIMEOUT,
+ sizeof(req), sizeof(resp));
+ req.timeout_ms = *timeout_val;
+ err = mana_gd_send_request(gc, sizeof(req), &req, sizeof(resp), &resp);
+ if (err || resp.hdr.status)
+ return err ? err : -EPROTO;
+
+ *timeout_val = resp.timeout_ms;
+ dev_info(gc->dev, "Successfully changed the timeout value %u\n",
+ *timeout_val);
+
+ return 0;
+}
+
static int mana_gd_detect_devices(struct pci_dev *pdev)
{
struct gdma_context *gc = pci_get_drvdata(pdev);
@@ -879,6 +900,7 @@ int mana_gd_verify_vf_version(struct pci_dev *pdev)
struct gdma_context *gc = pci_get_drvdata(pdev);
struct gdma_verify_ver_resp resp = {};
struct gdma_verify_ver_req req = {};
+ struct hw_channel_context *hwc = gc->hwc.driver_data;
int err;
mana_gd_init_req_hdr(&req.hdr, GDMA_VERIFY_VF_DRIVER_VERSION,
@@ -907,7 +929,13 @@ int mana_gd_verify_vf_version(struct pci_dev *pdev)
err, resp.hdr.status);
return err ? err : -EPROTO;
}
-
+ if (resp.pf_cap_flags1 & GDMA_DRV_CAP_FLAG_1_HWC_TIMEOUT_RECONFIG) {
+ err = mana_gd_query_hwc_timeout(pdev, &hwc->hwc_timeout);
+ if (err) {
+ dev_err(gc->dev, "Failed to set the hwc timeout %d\n", err);
+ return err;
+ }
+ }
return 0;
}
diff --git a/drivers/net/ethernet/microsoft/mana/hw_channel.c b/drivers/net/ethernet/microsoft/mana/hw_channel.c
index 2bd1d74021f7..db433501e5e6 100644
--- a/drivers/net/ethernet/microsoft/mana/hw_channel.c
+++ b/drivers/net/ethernet/microsoft/mana/hw_channel.c
@@ -174,7 +174,25 @@ static void mana_hwc_init_event_handler(void *ctx, struct gdma_queue *q_self,
complete(&hwc->hwc_init_eqe_comp);
break;
+ case GDMA_EQE_HWC_SOC_RECONFIG_DATA:
+ type_data.as_uint32 = event->details[0];
+ type = type_data.type;
+ val = type_data.value;
+
+ switch (type) {
+ case HWC_DATA_CFG_HWC_TIMEOUT:
+ hwc->hwc_timeout = val;
+ break;
+
+ default:
+ dev_warn(hwc->dev, "Received unknown reconfig type %u\n", type);
+ break;
+ }
+
+ break;
+
default:
+ dev_warn(hwc->dev, "Received unknown gdma event %u\n", event->type);
/* Ignore unknown events, which should never happen. */
break;
}
@@ -704,6 +722,7 @@ int mana_hwc_create_channel(struct gdma_context *gc)
gd->pdid = INVALID_PDID;
gd->doorbell = INVALID_DOORBELL;
+ hwc->hwc_timeout = HW_CHANNEL_WAIT_RESOURCE_TIMEOUT_MS;
/* mana_hwc_init_queues() only creates the required data structures,
* and doesn't touch the HWC device.
*/
@@ -770,6 +789,8 @@ void mana_hwc_destroy_channel(struct gdma_context *gc)
hwc->gdma_dev->doorbell = INVALID_DOORBELL;
hwc->gdma_dev->pdid = INVALID_PDID;
+ hwc->hwc_timeout = 0;
+
kfree(hwc);
gc->hwc.driver_data = NULL;
gc->hwc.gdma_context = NULL;
@@ -818,6 +839,7 @@ int mana_hwc_send_request(struct hw_channel_context *hwc, u32 req_len,
dest_vrq = hwc->pf_dest_vrq_id;
dest_vrcq = hwc->pf_dest_vrcq_id;
}
+ dev_err(hwc->dev, "HWC: timeout %u ms\n", hwc->hwc_timeout);
err = mana_hwc_post_tx_wqe(txq, tx_wr, dest_vrq, dest_vrcq, false);
if (err) {
@@ -825,7 +847,8 @@ int mana_hwc_send_request(struct hw_channel_context *hwc, u32 req_len,
goto out;
}
- if (!wait_for_completion_timeout(&ctx->comp_event, 30 * HZ)) {
+ if (!wait_for_completion_timeout(&ctx->comp_event,
+ (hwc->hwc_timeout / 1000) * HZ)) {
dev_err(hwc->dev, "HWC: Request timed out!\n");
err = -ETIMEDOUT;
goto out;
diff --git a/include/net/mana/gdma.h b/include/net/mana/gdma.h
index 96c120160f15..88b6ef7ce1a6 100644
--- a/include/net/mana/gdma.h
+++ b/include/net/mana/gdma.h
@@ -33,6 +33,7 @@ enum gdma_request_type {
GDMA_DESTROY_PD = 30,
GDMA_CREATE_MR = 31,
GDMA_DESTROY_MR = 32,
+ GDMA_QUERY_HWC_TIMEOUT = 84, /* 0x54 */
};
#define GDMA_RESOURCE_DOORBELL_PAGE 27
@@ -57,6 +58,8 @@ enum gdma_eqe_type {
GDMA_EQE_HWC_INIT_EQ_ID_DB = 129,
GDMA_EQE_HWC_INIT_DATA = 130,
GDMA_EQE_HWC_INIT_DONE = 131,
+ GDMA_EQE_HWC_SOC_RECONFIG = 132,
+ GDMA_EQE_HWC_SOC_RECONFIG_DATA = 133,
};
enum {
@@ -531,10 +534,12 @@ enum {
* so the driver is able to reliably support features like busy_poll.
*/
#define GDMA_DRV_CAP_FLAG_1_NAPI_WKDONE_FIX BIT(2)
+#define GDMA_DRV_CAP_FLAG_1_HWC_TIMEOUT_RECONFIG BIT(3)
#define GDMA_DRV_CAP_FLAGS1 \
(GDMA_DRV_CAP_FLAG_1_EQ_SHARING_MULTI_VPORT | \
- GDMA_DRV_CAP_FLAG_1_NAPI_WKDONE_FIX)
+ GDMA_DRV_CAP_FLAG_1_NAPI_WKDONE_FIX | \
+ GDMA_DRV_CAP_FLAG_1_HWC_TIMEOUT_RECONFIG)
#define GDMA_DRV_CAP_FLAGS2 0
@@ -664,6 +669,19 @@ struct gdma_disable_queue_req {
u32 alloc_res_id_on_creation;
}; /* HW DATA */
+/* GDMA_QUERY_HWC_TIMEOUT */
+struct gdma_query_hwc_timeout_req {
+ struct gdma_req_hdr hdr;
+ u32 timeout_ms;
+ u32 reserved;
+};
+
+struct gdma_query_hwc_timeout_resp {
+ struct gdma_resp_hdr hdr;
+ u32 timeout_ms;
+ u32 reserved;
+};
+
enum atb_page_size {
ATB_PAGE_SIZE_4K,
ATB_PAGE_SIZE_8K,
diff --git a/include/net/mana/hw_channel.h b/include/net/mana/hw_channel.h
index 6a757a6e2732..3d3b5c881bc1 100644
--- a/include/net/mana/hw_channel.h
+++ b/include/net/mana/hw_channel.h
@@ -23,6 +23,10 @@
#define HWC_INIT_DATA_PF_DEST_RQ_ID 10
#define HWC_INIT_DATA_PF_DEST_CQ_ID 11
+#define HWC_DATA_CFG_HWC_TIMEOUT 1
+
+#define HW_CHANNEL_WAIT_RESOURCE_TIMEOUT_MS 30000
+
/* Structures labeled with "HW DATA" are exchanged with the hardware. All of
* them are naturally aligned and hence don't need __packed.
*/
@@ -182,6 +186,7 @@ struct hw_channel_context {
u32 pf_dest_vrq_id;
u32 pf_dest_vrcq_id;
+ u32 hwc_timeout;
struct hwc_caller_ctx *caller_ctx;
};
--
2.34.1
^ permalink raw reply related
* RE: [EXTERNAL] [PATCH V2 2/9] x86/hyperv: Set Virtual Trust Level in VMBus init message
From: Saurabh Singh Sengar @ 2023-07-07 9:07 UTC (permalink / raw)
To: Tianyu Lan, KY Srinivasan, Haiyang Zhang, wei.liu@kernel.org,
Dexuan Cui, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de,
dave.hansen@linux.intel.com, x86@kernel.org, hpa@zytor.com,
daniel.lezcano@linaro.org, arnd@arndb.de, Michael Kelley (LINUX)
Cc: Tianyu Lan, linux-arch@vger.kernel.org,
linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org,
vkuznets@redhat.com
In-Reply-To: <20230627032248.2170007-3-ltykernel@gmail.com>
> -----Original Message-----
> From: Tianyu Lan <ltykernel@gmail.com>
> Sent: Tuesday, June 27, 2023 8:53 AM
> To: KY Srinivasan <kys@microsoft.com>; Haiyang Zhang
> <haiyangz@microsoft.com>; wei.liu@kernel.org; Dexuan Cui
> <decui@microsoft.com>; tglx@linutronix.de; mingo@redhat.com;
> bp@alien8.de; dave.hansen@linux.intel.com; x86@kernel.org;
> hpa@zytor.com; daniel.lezcano@linaro.org; arnd@arndb.de; Michael Kelley
> (LINUX) <mikelley@microsoft.com>
> Cc: Tianyu Lan <Tianyu.Lan@microsoft.com>; linux-arch@vger.kernel.org;
> linux-hyperv@vger.kernel.org; linux-kernel@vger.kernel.org;
> vkuznets@redhat.com
> Subject: [EXTERNAL] [PATCH V2 2/9] x86/hyperv: Set Virtual Trust Level in
> VMBus init message
>
> From: Tianyu Lan <tiala@microsoft.com>
>
> SEV-SNP guest provides vtl(Virtual Trust Level) and get it from Hyper-V hvcall
> via register hvcall HVCALL_ GET_VP_REGISTERS.
>
> During initialization of VMBus, vtl needs to be set in the VMBus init message.
>
> Signed-off-by: Tianyu Lan <tiala@microsoft.com>
> ---
> arch/x86/hyperv/hv_init.c | 36 ++++++++++++++++++++++++++++++
> arch/x86/include/asm/hyperv-tlfs.h | 7 ++++++
> drivers/hv/connection.c | 1 +
> include/asm-generic/mshyperv.h | 1 +
> include/linux/hyperv.h | 4 ++--
> 5 files changed, 47 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c index
> 6c04b52f139b..1ba367a9686e 100644
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -378,6 +378,40 @@ static void __init hv_get_partition_id(void)
> local_irq_restore(flags);
> }
>
> +static u8 __init get_vtl(void)
> +{
> + u64 control = HV_HYPERCALL_REP_COMP_1 |
> HVCALL_GET_VP_REGISTERS;
> + struct hv_get_vp_registers_input *input;
> + struct hv_get_vp_registers_output *output;
> + u64 vtl = 0;
> + u64 ret;
> + unsigned long flags;
> +
> + local_irq_save(flags);
> + input = *this_cpu_ptr(hyperv_pcpu_input_arg);
> + output = (struct hv_get_vp_registers_output *)input;
> + if (!input) {
> + local_irq_restore(flags);
> + goto done;
> + }
> +
> + memset(input, 0, struct_size(input, element, 1));
> + input->header.partitionid = HV_PARTITION_ID_SELF;
> + input->header.vpindex = HV_VP_INDEX_SELF;
> + input->header.inputvtl = 0;
> + input->element[0].name0 = HV_X64_REGISTER_VSM_VP_STATUS;
> +
> + ret = hv_do_hypercall(control, input, output);
> + if (hv_result_success(ret))
> + vtl = output->as64.low & HV_X64_VTL_MASK;
> + else
> + pr_err("Hyper-V: failed to get VTL! %lld", ret);
In case of error this function will return vtl=0, which can be the valid value of vtl.
I suggest we initialize vtl with -1 so and then check for its return.
This could be a good utility function which can be used for any Hyper-V VTL system, so think
of making it global ?
- Saurabh
> + local_irq_restore(flags);
> +
> +done:
> + return vtl;
> +}
> +
> /*
> * This function is to be invoked early in the boot sequence after the
> * hypervisor has been detected.
> @@ -506,6 +540,8 @@ void __init hyperv_init(void)
> /* Query the VMs extended capability once, so that it can be cached.
> */
> hv_query_ext_cap(0);
>
> + /* Find the VTL */
> + ms_hyperv.vtl = get_vtl();
> return;
>
> clean_guest_os_id:
> diff --git a/arch/x86/include/asm/hyperv-tlfs.h
> b/arch/x86/include/asm/hyperv-tlfs.h
> index cea95dcd27c2..4bf0b315b0ce 100644
> --- a/arch/x86/include/asm/hyperv-tlfs.h
> +++ b/arch/x86/include/asm/hyperv-tlfs.h
> @@ -301,6 +301,13 @@ enum hv_isolation_type {
> #define HV_X64_MSR_TIME_REF_COUNT
> HV_REGISTER_TIME_REF_COUNT
> #define HV_X64_MSR_REFERENCE_TSC
> HV_REGISTER_REFERENCE_TSC
>
> +/*
> + * Registers are only accessible via HVCALL_GET_VP_REGISTERS hvcall and
> + * there is not associated MSR address.
> + */
> +#define HV_X64_REGISTER_VSM_VP_STATUS 0x000D0003
> +#define HV_X64_VTL_MASK GENMASK(3, 0)
> +
> /* Hyper-V memory host visibility */
> enum hv_mem_host_visibility {
> VMBUS_PAGE_NOT_VISIBLE = 0,
> diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c index
> 5978e9dbc286..02b54f85dc60 100644
> --- a/drivers/hv/connection.c
> +++ b/drivers/hv/connection.c
> @@ -98,6 +98,7 @@ int vmbus_negotiate_version(struct
> vmbus_channel_msginfo *msginfo, u32 version)
> */
> if (version >= VERSION_WIN10_V5) {
> msg->msg_sint = VMBUS_MESSAGE_SINT;
> + msg->msg_vtl = ms_hyperv.vtl;
> vmbus_connection.msg_conn_id =
> VMBUS_MESSAGE_CONNECTION_ID_4;
> } else {
> msg->interrupt_page =
> virt_to_phys(vmbus_connection.int_page);
> diff --git a/include/asm-generic/mshyperv.h b/include/asm-
> generic/mshyperv.h index 6b5c41f90398..f73a044ecaa7 100644
> --- a/include/asm-generic/mshyperv.h
> +++ b/include/asm-generic/mshyperv.h
> @@ -54,6 +54,7 @@ struct ms_hyperv_info {
> };
> };
> u64 shared_gpa_boundary;
> + u8 vtl;
> };
> extern struct ms_hyperv_info ms_hyperv; extern bool hv_nested; diff --git
> a/include/linux/hyperv.h b/include/linux/hyperv.h index
> bfbc37ce223b..1f2bfec4abde 100644
> --- a/include/linux/hyperv.h
> +++ b/include/linux/hyperv.h
> @@ -665,8 +665,8 @@ struct vmbus_channel_initiate_contact {
> u64 interrupt_page;
> struct {
> u8 msg_sint;
> - u8 padding1[3];
> - u32 padding2;
> + u8 msg_vtl;
> + u8 reserved[6];
> };
> };
> u64 monitor_page1;
> --
> 2.25.1
^ permalink raw reply
* RE: [Patch v3] net: mana: Batch ringing RX queue doorbell on receiving packets
From: Long Li @ 2023-07-06 17:51 UTC (permalink / raw)
To: Paolo Abeni, Jakub Kicinski
Cc: Greg KH, longli@linuxonhyperv.com, Jason Gunthorpe,
Leon Romanovsky, Ajay Sharma, Dexuan Cui, KY Srinivasan,
Haiyang Zhang, Wei Liu, David S. Miller, Eric Dumazet,
linux-rdma@vger.kernel.org, linux-hyperv@vger.kernel.org,
netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
stable@vger.kernel.org
In-Reply-To: <8fb0c81c022d58d3f08082764038d17cfc849ba1.camel@redhat.com>
> Subject: Re: [Patch v3] net: mana: Batch ringing RX queue doorbell on receiving
> packets
>
> On Sun, 2023-07-02 at 20:18 +0000, Long Li wrote:
> > > > > > > > > Subject: Re: [Patch v3] net: mana: Batch ringing RX
> > > > > > > > > queue doorbell on receiving packets
> > > > > > > > >
> > > > > > > > > On Fri, 30 Jun 2023 20:42:28 +0000 Long Li wrote:
> > > > > > > > > > > > > > > > > > > > > 5.15 and kernel 6.1. (those
> > > > > > > > > > > > > > > > > > > > > kernels are longterm) They need
> > > > > > > > > > > > > > > > > > > > > this fix to achieve the
> > > > > > > > > > > > > > > > > > > > > performance target.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Why can't they be upgraded to get that
> > > > > > > > > > > > > > > > > performance target, and all the other
> > > > > > > > > > > > > > > > > goodness that those kernels have? We
> > > > > > > > > > > > > > > > > don't normally backport new features,
> > > > > > > > > > > > > > > > > right?
> > > > > > > > > > > > >
> > > > > > > > > > > > > I think this should be considered as a fix, not
> > > > > > > > > > > > > a new feature.
> > > > > > > > > > > > >
> > > > > > > > > > > > > MANA is designed to be 200GB full duplex at the
> > > > > > > > > > > > > start. Due to lack of hardware testing
> > > > > > > > > > > > > capability at early stage of the project, we
> > > > > > > > > > > > > could only test 100GB for the Linux driver. When
> > > > > > > > > > > > > hardware is fully capable of reaching designed
> > > > > > > > > > > > > spec, this bug in the Linux driver shows up.
> > > > > > > > >
> > > > > > > > > That part we understand.
> > > > > > > > >
> > > > > > > > > If I were you I'd try to convince Greg and Paolo that
> > > > > > > > > the change is small and significant for user experience.
> > > > > > > > > And answer Greg's question why upgrading the kernel past
> > > > > > > > > 6.1 is a challenge in your environment.
> > > > >
> > > > > I was under the impression that this patch was considered to be
> > > > > a feature, not a bug fix. I was trying to justify that the
> > > > > "Fixes:" tag was needed.
> > > > >
> > > > > I apologize for misunderstanding this.
> > > > >
> > > > > Without this fix, it's not possible to run a typical workload
> > > > > designed for 200Gb physical link speed.
> > > > >
> > > > > We see a large number of customers and Linux distributions
> > > > > committed on 5.15 and 6.1 kernels. They planned the product
> > > > > cycles and certification processes around these longterm kernel
> > > > > versions. It's difficult for them to upgrade to newer kernel
> > > > > versions.
>
> I think there are some misunderstanding WRT distros and stable kernels.
> (Commercial) distros will backport the patch as needed, regardless such patch
> landing in the 5.15 upstream tree or not. Individual users running their own
> vanilla 5.15 kernel can't expect performance improvement landing there.
>
> All in all I feel undecided. I would endorse this change going trough net-next
> (without the stable tag). I would feel less torn with this change targeting -net
> without the stable tag. Targeting -net with the stable tag sounds a bit too much to
> me.
>
> Cheers,
> Paolo
I'm sending this patch to net-next without stable tag.
Thanks,
Long
^ permalink raw reply
* [RFC PATCH 1/1] x86/mm: Mark CoCo VM pages invalid while moving between private and shared
From: Michael Kelley @ 2023-07-06 16:41 UTC (permalink / raw)
To: kys, haiyangz, wei.liu, decui, tglx, mingo, bp, dave.hansen, hpa,
luto, peterz, thomas.lendacky, sathyanarayanan.kuppuswamy,
kirill.shutemov, seanjc, rick.p.edgecombe, linux-kernel,
linux-hyperv, x86
Cc: mikelley
In a CoCo VM when a page transitions from private to shared, or vice
versa, attributes in the PTE must be updated *and* the hypervisor must
be notified of the change. Because there are two separate steps, there's
a window where the settings are inconsistent. Normally the code that
initiates the transition (via set_memory_decrypted() or
set_memory_encrypted()) ensures that the memory is not being accessed
during a transition, so the window of inconsistency is not a problem.
However, the load_unaligned_zeropad() function can read arbitrary memory
pages at arbitrary times, which could access a transitioning page during
the window. In such a case, CoCo VM specific exceptions are taken
(depending on the CoCo architecture in use). Current code in those
exception handlers recovers and does "fixup" on the result returned by
load_unaligned_zeropad(). Unfortunately, this exception handling and
fixup code is tricky and somewhat fragile. At the moment, it is
broken for both TDX and SEV-SNP.
There's also a problem with the current code in paravisor scenarios:
TDX Partitioning and SEV-SNP in vTOM mode. The exceptions need
to be forwarded from the paravisor to the Linux guest, but there
are no architectural specs for how to do that.
To avoid these complexities of the CoCo exception handlers, change
the core transition code in __set_memory_enc_pgtable() to do the
following:
1. Remove aliasing mappings
2. Remove the PRESENT bit from the PTEs of all transitioning pages
3. Flush the TLB globally
4. Flush the data cache if needed
5. Set/clear the encryption attribute as appropriate
6. Notify the hypervisor of the page status change
7. Add back the PRESENT bit
With this approach, load_unaligned_zeropad() just takes its normal
page-fault-based fixup path if it touches a page that is transitioning.
As a result, load_unaligned_zeropad() and CoCo VM page transitioning
are completely decoupled. CoCo VM page transitions can proceed
without needing to handle architecture-specific exceptions and fix
things up. This decoupling reduces the complexity due to separate
TDX and SEV-SNP fixup paths, and gives more freedom to revise and
introduce new capabilities in future versions of the TDX and SEV-SNP
architectures. Paravisor scenarios work properly without needing
to forward exceptions.
This approach may make __set_memory_enc_pgtable() slightly slower
because of touching the PTEs three times instead of just once. But
the run time of this function is already dominated by the hypercall
and the need to flush the TLB at least once and maybe twice. In any
case, this function is only used for CoCo VM page transitions, and
is already unsuitable for hot paths.
The architecture specific callback function for notifying the
hypervisor typically must translate guest kernel virtual addresses
into guest physical addresses to pass to the hypervisor. Because
the PTEs are invalid at the time of callback, the code for doing the
translation needs updating. virt_to_phys() or equivalent continues
to work for direct map addresses. But vmalloc addresses cannot use
vmalloc_to_page() because that function requires the leaf PTE to be
valid. Instead, slow_virt_to_phys() must be used. Both functions
manually walk the page table hierarchy, so performance is the same.
Signed-off-by: Michael Kelley <mikelley@microsoft.com>
---
I'm assuming the TDX handling of the data cache flushing is the
same with this new approach, and that it doesn't need to be paired
with a TLB flush as in the current code. If that's not a correct
assumption, let me know.
I've left the two hypervisor callbacks, before and after Step 5
above. If the PTEs are invalid, it seems like the order of Step 5
and Step 6 shouldn't matter, so perhaps one of the callback could
be dropped. Let me know if there are reasons to do otherwise.
It may well be possible to optimize the new implementation of
__set_memory_enc_pgtable(). The existing set_memory_np() and
set_memory_p() functions do all the right things in a very clear
fashion, but perhaps not as optimally as having all three PTE
manipulations directly in the same function. It doesn't appear
that optimizing the performance really matters here, but I'm open
to suggestions.
I've tested this on TDX VMs and SEV-SNP + vTOM VMs. I can also
test on SEV-SNP VMs without vTOM. But I'll probably need help
covering SEV and SEV-ES VMs.
This RFC patch does *not* remove code that would no longer be
needed. If there's agreement to take this new approach, I'll
add patches to remove the unneeded code.
This patch is built against linux-next20230704.
arch/x86/hyperv/ivm.c | 3 ++-
arch/x86/kernel/sev.c | 2 +-
arch/x86/mm/pat/set_memory.c | 32 ++++++++++++--------------------
3 files changed, 15 insertions(+), 22 deletions(-)
diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
index 28be6df..2859ec3 100644
--- a/arch/x86/hyperv/ivm.c
+++ b/arch/x86/hyperv/ivm.c
@@ -308,7 +308,8 @@ static bool hv_vtom_set_host_visibility(unsigned long kbuffer, int pagecount, bo
return false;
for (i = 0, pfn = 0; i < pagecount; i++) {
- pfn_array[pfn] = virt_to_hvpfn((void *)kbuffer + i * HV_HYP_PAGE_SIZE);
+ pfn_array[pfn] = slow_virt_to_phys((void *)kbuffer +
+ i * HV_HYP_PAGE_SIZE) >> HV_HYP_PAGE_SHIFT;
pfn++;
if (pfn == HV_MAX_MODIFY_GPA_REP_COUNT || i == pagecount - 1) {
diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index 1ee7bed..59db55e 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -784,7 +784,7 @@ static unsigned long __set_pages_state(struct snp_psc_desc *data, unsigned long
hdr->end_entry = i;
if (is_vmalloc_addr((void *)vaddr)) {
- pfn = vmalloc_to_pfn((void *)vaddr);
+ pfn = slow_virt_to_phys((void *)vaddr) >> PAGE_SHIFT;
use_large_entry = false;
} else {
pfn = __pa(vaddr) >> PAGE_SHIFT;
diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
index bda9f12..8a194c7 100644
--- a/arch/x86/mm/pat/set_memory.c
+++ b/arch/x86/mm/pat/set_memory.c
@@ -2136,6 +2136,11 @@ static int __set_memory_enc_pgtable(unsigned long addr, int numpages, bool enc)
if (WARN_ONCE(addr & ~PAGE_MASK, "misaligned address: %#lx\n", addr))
addr &= PAGE_MASK;
+ /* set_memory_np() removes aliasing mappings and flushes the TLB */
+ ret = set_memory_np(addr, numpages);
+ if (ret)
+ return ret;
+
memset(&cpa, 0, sizeof(cpa));
cpa.vaddr = &addr;
cpa.numpages = numpages;
@@ -2143,36 +2148,23 @@ static int __set_memory_enc_pgtable(unsigned long addr, int numpages, bool enc)
cpa.mask_clr = enc ? pgprot_decrypted(empty) : pgprot_encrypted(empty);
cpa.pgd = init_mm.pgd;
- /* Must avoid aliasing mappings in the highmem code */
- kmap_flush_unused();
- vm_unmap_aliases();
-
/* Flush the caches as needed before changing the encryption attribute. */
- if (x86_platform.guest.enc_tlb_flush_required(enc))
- cpa_flush(&cpa, x86_platform.guest.enc_cache_flush_required());
+ if (x86_platform.guest.enc_cache_flush_required())
+ cpa_flush(&cpa, 1);
/* Notify hypervisor that we are about to set/clr encryption attribute. */
if (!x86_platform.guest.enc_status_change_prepare(addr, numpages, enc))
return -EIO;
ret = __change_page_attr_set_clr(&cpa, 1);
-
- /*
- * After changing the encryption attribute, we need to flush TLBs again
- * in case any speculative TLB caching occurred (but no need to flush
- * caches again). We could just use cpa_flush_all(), but in case TLB
- * flushing gets optimized in the cpa_flush() path use the same logic
- * as above.
- */
- cpa_flush(&cpa, 0);
+ if (ret)
+ return ret;
/* Notify hypervisor that we have successfully set/clr encryption attribute. */
- if (!ret) {
- if (!x86_platform.guest.enc_status_change_finish(addr, numpages, enc))
- ret = -EIO;
- }
+ if (!x86_platform.guest.enc_status_change_finish(addr, numpages, enc))
+ return -EIO;
- return ret;
+ return set_memory_p(&addr, numpages);
}
static int __set_memory_enc_dec(unsigned long addr, int numpages, bool enc)
--
1.8.3.1
^ permalink raw reply related
* RE: [EXTERNAL] Re: [PATCH V4 net] net: mana: Fix MANA VF unload when host is unresponsive
From: Haiyang Zhang @ 2023-07-06 13:54 UTC (permalink / raw)
To: Alexander Lobakin, Souradeep Chakrabarti, souradeep chakrabarti
Cc: KY Srinivasan, wei.liu@kernel.org, Dexuan Cui,
davem@davemloft.net, edumazet@google.com, kuba@kernel.org,
pabeni@redhat.com, Long Li, Ajay Sharma, leon@kernel.org,
cai.huoqing@linux.dev, ssengar@linux.microsoft.com,
vkuznets@redhat.com, tglx@linutronix.de,
linux-hyperv@vger.kernel.org, netdev@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org,
stable@vger.kernel.org
In-Reply-To: <cd36e39a-ebb9-706a-87c3-2f76de82f7ca@intel.com>
> -----Original Message-----
> From: Alexander Lobakin <aleksander.lobakin@intel.com>
> Sent: Thursday, July 6, 2023 7:49 AM
> To: Souradeep Chakrabarti <schakrabarti@microsoft.com>; souradeep
> chakrabarti <schakrabarti@linux.microsoft.com>
> Cc: KY Srinivasan <kys@microsoft.com>; Haiyang Zhang
> <haiyangz@microsoft.com>; wei.liu@kernel.org; Dexuan Cui
> <decui@microsoft.com>; davem@davemloft.net; edumazet@google.com;
> kuba@kernel.org; pabeni@redhat.com; Long Li <longli@microsoft.com>; Ajay
> Sharma <sharmaajay@microsoft.com>; leon@kernel.org;
> cai.huoqing@linux.dev; ssengar@linux.microsoft.com; vkuznets@redhat.com;
> tglx@linutronix.de; linux-hyperv@vger.kernel.org; netdev@vger.kernel.org;
> linux-kernel@vger.kernel.org; linux-rdma@vger.kernel.org;
> stable@vger.kernel.org
> Subject: Re: [EXTERNAL] Re: [PATCH V4 net] net: mana: Fix MANA VF unload
> when host is unresponsive
>
> From: Souradeep Chakrabarti <schakrabarti@microsoft.com>
> Date: Thu, 6 Jul 2023 11:43:58 +0000
>
> >
> >
> >> -----Original Message-----
> >> From: Alexander Lobakin <aleksander.lobakin@intel.com>
> >> Sent: Thursday, July 6, 2023 5:09 PM
> >> To: Souradeep Chakrabarti <schakrabarti@microsoft.com>; souradeep
> >> chakrabarti <schakrabarti@linux.microsoft.com>
> >> Cc: KY Srinivasan <kys@microsoft.com>; Haiyang Zhang
> >> <haiyangz@microsoft.com>; wei.liu@kernel.org; Dexuan Cui
> >> <decui@microsoft.com>; davem@davemloft.net; edumazet@google.com;
> >> kuba@kernel.org; pabeni@redhat.com; Long Li <longli@microsoft.com>;
> Ajay
> >> Sharma <sharmaajay@microsoft.com>; leon@kernel.org;
> >> cai.huoqing@linux.dev; ssengar@linux.microsoft.com;
> vkuznets@redhat.com;
> >> tglx@linutronix.de; linux-hyperv@vger.kernel.org; netdev@vger.kernel.org;
> >> linux-kernel@vger.kernel.org; linux-rdma@vger.kernel.org;
> >> stable@vger.kernel.org
> >> Subject: Re: [EXTERNAL] Re: [PATCH V4 net] net: mana: Fix MANA VF unload
> >> when host is unresponsive
> >>
> >> From: Souradeep Chakrabarti <schakrabarti@microsoft.com>
> >> Date: Thu, 6 Jul 2023 10:41:03 +0000
> >>
> >>>
> >>>
> >>>> -----Original Message-----
> >>>> From: Alexander Lobakin <aleksander.lobakin@intel.com>
> >>>> Sent: Wednesday, July 5, 2023 8:06 PM
> >>
> >> [...]
> >>
> >>>>>> 120 seconds by 2 msec step is 60000 iterations, by 1 msec is 120000
> >>>>>> iterations. I know usleep_range() often is much less precise, but still.
> >>>>>> Do you really need that much time? Has this been measured during
> >>>>>> the tests that it can take up to 120 seconds or is it just some
> >>>>>> random value that "should be enough"?
> >>>>>> If you really need 120 seconds, I'd suggest using a timer / delayed
> >>>>>> work instead of wasting resources.
> >>>>> Here the intent is not waiting for 120 seconds, rather than avoid
> >>>>> continue checking the pending_sends of each tx queues for an
> >>>>> indefinite time,
> >>>> before freeing sk_buffs.
> >>>>> The pending_sends can only get decreased for a tx queue, if
> >>>>> mana_poll_tx_cq() gets called for a completion notification and
> >>>>> increased by
> >>>> xmit.
> >>>>>
> >>>>> In this particular bug, apc->port_is_up is not set to false, causing
> >>>>> xmit to keep increasing the pending_sends for the queue and
> >>>>> mana_poll_tx_cq() not getting called for the queue.
> >>>>>
> >>>>> If we see the comment in the function mana_dealloc_queues(), it
> mentions
> >> it :
> >>>>>
> >>>>> 2346 /* No packet can be transmitted now since apc->port_is_up is
> false.
> >>>>> 2347 * There is still a tiny chance that mana_poll_tx_cq() can re-
> enable
> >>>>> 2348 * a txq because it may not timely see apc->port_is_up being
> cleared
> >>>>> 2349 * to false, but it doesn't matter since mana_start_xmit() drops
> any
> >>>>> 2350 * new packets due to apc->port_is_up being false.
> >>>>>
> >>>>> The value 120 seconds has been decided here based on maximum
> number
> >>>>> of queues
> >>>>
> >>>> This is quite opposite to what you're saying above. How should I
> >>>> connect these
> >>>> two:
> >>>>
> >>>> Here the intent is not waiting for 120 seconds
> >>>>
> >>>> +
> >>>>
> >>>> The value 120 seconds has been decided here based on maximum number
> >>>> of queues
> >>>>
> >>>> ?
> >>>> Can cleaning the Tx queues really last for 120 seconds?
> >>>> My understanding is that timeouts need to be sensible and not go to
> >>>> the outer space. What is the medium value you got during the tests?
> >>>>
> >>> For each queue each takes few milli second, in a normal condition. So
> >>> based on maximum number of allowed queues for our h/w it won't go
> >>> beyond a sec.
> >>> The 120s only happens rarely during some NIC HW issue -unexpected.
> >>> So this timeout will only trigger in a very rare scenario.
> >>
> >> So set the timeout to 2 seconds if it makes no difference?
> > It can go near 120 seconds in a very rare MANA h/w scenario. That normally
> won't happen.
> > But during that scenario, we may need 120 seconds.
>
> This waiting loop is needed to let the pending Tx packets be sent. If
> they weren't sent in 1 second, it most likely makes no sense already
> whether they will be sent at all or not -- the destination host won't
> wait for them for so long.
> You say that it may happen only in case of HW issue. If so, I assume you
> need to fix it some way, e.g. do a HW reset or so? If so, why bother
> waiting for Tx completions if Tx is hung? You free all skbs later either
> way, so there are no leaks.
At that point, we don't actually care if the pending packets are sent or not.
But if we free the queues too soon, and the HW is slow for unexpected
reasons, a delayed completion notice will DMA into the freed memory and
cause corruption. That's why we have a longer waiting time.
Souradeep, you may double check with hostnet team to see what's the
best waiting time to ensure no more HW activities.
Thanks,
- Haiyang
^ permalink raw reply
* Re: [EXTERNAL] Re: [PATCH V4 net] net: mana: Fix MANA VF unload when host is unresponsive
From: Alexander Lobakin @ 2023-07-06 11:48 UTC (permalink / raw)
To: Souradeep Chakrabarti, souradeep chakrabarti
Cc: KY Srinivasan, Haiyang Zhang, wei.liu@kernel.org, Dexuan Cui,
davem@davemloft.net, edumazet@google.com, kuba@kernel.org,
pabeni@redhat.com, Long Li, Ajay Sharma, leon@kernel.org,
cai.huoqing@linux.dev, ssengar@linux.microsoft.com,
vkuznets@redhat.com, tglx@linutronix.de,
linux-hyperv@vger.kernel.org, netdev@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org,
stable@vger.kernel.org
In-Reply-To: <PUZP153MB0788C7D2376F3271D77CE826CC2CA@PUZP153MB0788.APCP153.PROD.OUTLOOK.COM>
From: Souradeep Chakrabarti <schakrabarti@microsoft.com>
Date: Thu, 6 Jul 2023 11:43:58 +0000
>
>
>> -----Original Message-----
>> From: Alexander Lobakin <aleksander.lobakin@intel.com>
>> Sent: Thursday, July 6, 2023 5:09 PM
>> To: Souradeep Chakrabarti <schakrabarti@microsoft.com>; souradeep
>> chakrabarti <schakrabarti@linux.microsoft.com>
>> Cc: KY Srinivasan <kys@microsoft.com>; Haiyang Zhang
>> <haiyangz@microsoft.com>; wei.liu@kernel.org; Dexuan Cui
>> <decui@microsoft.com>; davem@davemloft.net; edumazet@google.com;
>> kuba@kernel.org; pabeni@redhat.com; Long Li <longli@microsoft.com>; Ajay
>> Sharma <sharmaajay@microsoft.com>; leon@kernel.org;
>> cai.huoqing@linux.dev; ssengar@linux.microsoft.com; vkuznets@redhat.com;
>> tglx@linutronix.de; linux-hyperv@vger.kernel.org; netdev@vger.kernel.org;
>> linux-kernel@vger.kernel.org; linux-rdma@vger.kernel.org;
>> stable@vger.kernel.org
>> Subject: Re: [EXTERNAL] Re: [PATCH V4 net] net: mana: Fix MANA VF unload
>> when host is unresponsive
>>
>> From: Souradeep Chakrabarti <schakrabarti@microsoft.com>
>> Date: Thu, 6 Jul 2023 10:41:03 +0000
>>
>>>
>>>
>>>> -----Original Message-----
>>>> From: Alexander Lobakin <aleksander.lobakin@intel.com>
>>>> Sent: Wednesday, July 5, 2023 8:06 PM
>>
>> [...]
>>
>>>>>> 120 seconds by 2 msec step is 60000 iterations, by 1 msec is 120000
>>>>>> iterations. I know usleep_range() often is much less precise, but still.
>>>>>> Do you really need that much time? Has this been measured during
>>>>>> the tests that it can take up to 120 seconds or is it just some
>>>>>> random value that "should be enough"?
>>>>>> If you really need 120 seconds, I'd suggest using a timer / delayed
>>>>>> work instead of wasting resources.
>>>>> Here the intent is not waiting for 120 seconds, rather than avoid
>>>>> continue checking the pending_sends of each tx queues for an
>>>>> indefinite time,
>>>> before freeing sk_buffs.
>>>>> The pending_sends can only get decreased for a tx queue, if
>>>>> mana_poll_tx_cq() gets called for a completion notification and
>>>>> increased by
>>>> xmit.
>>>>>
>>>>> In this particular bug, apc->port_is_up is not set to false, causing
>>>>> xmit to keep increasing the pending_sends for the queue and
>>>>> mana_poll_tx_cq() not getting called for the queue.
>>>>>
>>>>> If we see the comment in the function mana_dealloc_queues(), it mentions
>> it :
>>>>>
>>>>> 2346 /* No packet can be transmitted now since apc->port_is_up is false.
>>>>> 2347 * There is still a tiny chance that mana_poll_tx_cq() can re-enable
>>>>> 2348 * a txq because it may not timely see apc->port_is_up being cleared
>>>>> 2349 * to false, but it doesn't matter since mana_start_xmit() drops any
>>>>> 2350 * new packets due to apc->port_is_up being false.
>>>>>
>>>>> The value 120 seconds has been decided here based on maximum number
>>>>> of queues
>>>>
>>>> This is quite opposite to what you're saying above. How should I
>>>> connect these
>>>> two:
>>>>
>>>> Here the intent is not waiting for 120 seconds
>>>>
>>>> +
>>>>
>>>> The value 120 seconds has been decided here based on maximum number
>>>> of queues
>>>>
>>>> ?
>>>> Can cleaning the Tx queues really last for 120 seconds?
>>>> My understanding is that timeouts need to be sensible and not go to
>>>> the outer space. What is the medium value you got during the tests?
>>>>
>>> For each queue each takes few milli second, in a normal condition. So
>>> based on maximum number of allowed queues for our h/w it won't go
>>> beyond a sec.
>>> The 120s only happens rarely during some NIC HW issue -unexpected.
>>> So this timeout will only trigger in a very rare scenario.
>>
>> So set the timeout to 2 seconds if it makes no difference?
> It can go near 120 seconds in a very rare MANA h/w scenario. That normally won't happen.
> But during that scenario, we may need 120 seconds.
This waiting loop is needed to let the pending Tx packets be sent. If
they weren't sent in 1 second, it most likely makes no sense already
whether they will be sent at all or not -- the destination host won't
wait for them for so long.
You say that it may happen only in case of HW issue. If so, I assume you
need to fix it some way, e.g. do a HW reset or so? If so, why bother
waiting for Tx completions if Tx is hung? You free all skbs later either
way, so there are no leaks.
>>
>>>>> are allowed in this specific hardware, it is a safe assumption.
>>>>>>
>>>>>>>
>>>>>>> + for (i = 0; i < apc->num_queues; i++) {
>>>>>>> + txq = &apc->tx_qp[i].txq;
>>>>>>> + cq = &apc->tx_qp[i].tx_cq;
>> [...]
>>
>> Thanks,
>> Olek
Thanks,
Olek
^ permalink raw reply
* RE: [EXTERNAL] Re: [PATCH V4 net] net: mana: Fix MANA VF unload when host is unresponsive
From: Souradeep Chakrabarti @ 2023-07-06 11:43 UTC (permalink / raw)
To: Alexander Lobakin, souradeep chakrabarti
Cc: KY Srinivasan, Haiyang Zhang, wei.liu@kernel.org, Dexuan Cui,
davem@davemloft.net, edumazet@google.com, kuba@kernel.org,
pabeni@redhat.com, Long Li, Ajay Sharma, leon@kernel.org,
cai.huoqing@linux.dev, ssengar@linux.microsoft.com,
vkuznets@redhat.com, tglx@linutronix.de,
linux-hyperv@vger.kernel.org, netdev@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org,
stable@vger.kernel.org
In-Reply-To: <c7e4a416-9da4-7ff2-2223-589fd66f557d@intel.com>
>-----Original Message-----
>From: Alexander Lobakin <aleksander.lobakin@intel.com>
>Sent: Thursday, July 6, 2023 5:09 PM
>To: Souradeep Chakrabarti <schakrabarti@microsoft.com>; souradeep
>chakrabarti <schakrabarti@linux.microsoft.com>
>Cc: KY Srinivasan <kys@microsoft.com>; Haiyang Zhang
><haiyangz@microsoft.com>; wei.liu@kernel.org; Dexuan Cui
><decui@microsoft.com>; davem@davemloft.net; edumazet@google.com;
>kuba@kernel.org; pabeni@redhat.com; Long Li <longli@microsoft.com>; Ajay
>Sharma <sharmaajay@microsoft.com>; leon@kernel.org;
>cai.huoqing@linux.dev; ssengar@linux.microsoft.com; vkuznets@redhat.com;
>tglx@linutronix.de; linux-hyperv@vger.kernel.org; netdev@vger.kernel.org;
>linux-kernel@vger.kernel.org; linux-rdma@vger.kernel.org;
>stable@vger.kernel.org
>Subject: Re: [EXTERNAL] Re: [PATCH V4 net] net: mana: Fix MANA VF unload
>when host is unresponsive
>
>From: Souradeep Chakrabarti <schakrabarti@microsoft.com>
>Date: Thu, 6 Jul 2023 10:41:03 +0000
>
>>
>>
>>> -----Original Message-----
>>> From: Alexander Lobakin <aleksander.lobakin@intel.com>
>>> Sent: Wednesday, July 5, 2023 8:06 PM
>
>[...]
>
>>>>> 120 seconds by 2 msec step is 60000 iterations, by 1 msec is 120000
>>>>> iterations. I know usleep_range() often is much less precise, but still.
>>>>> Do you really need that much time? Has this been measured during
>>>>> the tests that it can take up to 120 seconds or is it just some
>>>>> random value that "should be enough"?
>>>>> If you really need 120 seconds, I'd suggest using a timer / delayed
>>>>> work instead of wasting resources.
>>>> Here the intent is not waiting for 120 seconds, rather than avoid
>>>> continue checking the pending_sends of each tx queues for an
>>>> indefinite time,
>>> before freeing sk_buffs.
>>>> The pending_sends can only get decreased for a tx queue, if
>>>> mana_poll_tx_cq() gets called for a completion notification and
>>>> increased by
>>> xmit.
>>>>
>>>> In this particular bug, apc->port_is_up is not set to false, causing
>>>> xmit to keep increasing the pending_sends for the queue and
>>>> mana_poll_tx_cq() not getting called for the queue.
>>>>
>>>> If we see the comment in the function mana_dealloc_queues(), it mentions
>it :
>>>>
>>>> 2346 /* No packet can be transmitted now since apc->port_is_up is false.
>>>> 2347 * There is still a tiny chance that mana_poll_tx_cq() can re-enable
>>>> 2348 * a txq because it may not timely see apc->port_is_up being cleared
>>>> 2349 * to false, but it doesn't matter since mana_start_xmit() drops any
>>>> 2350 * new packets due to apc->port_is_up being false.
>>>>
>>>> The value 120 seconds has been decided here based on maximum number
>>>> of queues
>>>
>>> This is quite opposite to what you're saying above. How should I
>>> connect these
>>> two:
>>>
>>> Here the intent is not waiting for 120 seconds
>>>
>>> +
>>>
>>> The value 120 seconds has been decided here based on maximum number
>>> of queues
>>>
>>> ?
>>> Can cleaning the Tx queues really last for 120 seconds?
>>> My understanding is that timeouts need to be sensible and not go to
>>> the outer space. What is the medium value you got during the tests?
>>>
>> For each queue each takes few milli second, in a normal condition. So
>> based on maximum number of allowed queues for our h/w it won't go
>> beyond a sec.
>> The 120s only happens rarely during some NIC HW issue -unexpected.
>> So this timeout will only trigger in a very rare scenario.
>
>So set the timeout to 2 seconds if it makes no difference?
It can go near 120 seconds in a very rare MANA h/w scenario. That normally won't happen.
But during that scenario, we may need 120 seconds.
>
>>>> are allowed in this specific hardware, it is a safe assumption.
>>>>>
>>>>>>
>>>>>> + for (i = 0; i < apc->num_queues; i++) {
>>>>>> + txq = &apc->tx_qp[i].txq;
>>>>>> + cq = &apc->tx_qp[i].tx_cq;
>[...]
>
>Thanks,
>Olek
^ permalink raw reply
* Re: [EXTERNAL] Re: [PATCH V4 net] net: mana: Fix MANA VF unload when host is unresponsive
From: Alexander Lobakin @ 2023-07-06 11:39 UTC (permalink / raw)
To: Souradeep Chakrabarti, souradeep chakrabarti
Cc: KY Srinivasan, Haiyang Zhang, wei.liu@kernel.org, Dexuan Cui,
davem@davemloft.net, edumazet@google.com, kuba@kernel.org,
pabeni@redhat.com, Long Li, Ajay Sharma, leon@kernel.org,
cai.huoqing@linux.dev, ssengar@linux.microsoft.com,
vkuznets@redhat.com, tglx@linutronix.de,
linux-hyperv@vger.kernel.org, netdev@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org,
stable@vger.kernel.org
In-Reply-To: <PUZP153MB0788A5F92E65AC9A98AF03AFCC2CA@PUZP153MB0788.APCP153.PROD.OUTLOOK.COM>
From: Souradeep Chakrabarti <schakrabarti@microsoft.com>
Date: Thu, 6 Jul 2023 10:41:03 +0000
>
>
>> -----Original Message-----
>> From: Alexander Lobakin <aleksander.lobakin@intel.com>
>> Sent: Wednesday, July 5, 2023 8:06 PM
[...]
>>>> 120 seconds by 2 msec step is 60000 iterations, by 1 msec is 120000
>>>> iterations. I know usleep_range() often is much less precise, but still.
>>>> Do you really need that much time? Has this been measured during the
>>>> tests that it can take up to 120 seconds or is it just some random
>>>> value that "should be enough"?
>>>> If you really need 120 seconds, I'd suggest using a timer / delayed
>>>> work instead of wasting resources.
>>> Here the intent is not waiting for 120 seconds, rather than avoid
>>> continue checking the pending_sends of each tx queues for an indefinite time,
>> before freeing sk_buffs.
>>> The pending_sends can only get decreased for a tx queue, if
>>> mana_poll_tx_cq() gets called for a completion notification and increased by
>> xmit.
>>>
>>> In this particular bug, apc->port_is_up is not set to false, causing
>>> xmit to keep increasing the pending_sends for the queue and
>>> mana_poll_tx_cq() not getting called for the queue.
>>>
>>> If we see the comment in the function mana_dealloc_queues(), it mentions it :
>>>
>>> 2346 /* No packet can be transmitted now since apc->port_is_up is false.
>>> 2347 * There is still a tiny chance that mana_poll_tx_cq() can re-enable
>>> 2348 * a txq because it may not timely see apc->port_is_up being cleared
>>> 2349 * to false, but it doesn't matter since mana_start_xmit() drops any
>>> 2350 * new packets due to apc->port_is_up being false.
>>>
>>> The value 120 seconds has been decided here based on maximum number of
>>> queues
>>
>> This is quite opposite to what you're saying above. How should I connect these
>> two:
>>
>> Here the intent is not waiting for 120 seconds
>>
>> +
>>
>> The value 120 seconds has been decided here based on maximum number of
>> queues
>>
>> ?
>> Can cleaning the Tx queues really last for 120 seconds?
>> My understanding is that timeouts need to be sensible and not go to the outer
>> space. What is the medium value you got during the tests?
>>
> For each queue each takes few milli second, in a normal condition. So
> based on maximum number of allowed queues for our h/w it won't
> go beyond a sec.
> The 120s only happens rarely during some NIC HW issue -unexpected.
> So this timeout will only trigger in a very rare scenario.
So set the timeout to 2 seconds if it makes no difference?
>>> are allowed in this specific hardware, it is a safe assumption.
>>>>
>>>>>
>>>>> + for (i = 0; i < apc->num_queues; i++) {
>>>>> + txq = &apc->tx_qp[i].txq;
>>>>> + cq = &apc->tx_qp[i].tx_cq;
[...]
Thanks,
Olek
^ permalink raw reply
* RE: [EXTERNAL] Re: [PATCH V4 net] net: mana: Fix MANA VF unload when host is unresponsive
From: Souradeep Chakrabarti @ 2023-07-06 10:41 UTC (permalink / raw)
To: Alexander Lobakin, souradeep chakrabarti
Cc: KY Srinivasan, Haiyang Zhang, wei.liu@kernel.org, Dexuan Cui,
davem@davemloft.net, edumazet@google.com, kuba@kernel.org,
pabeni@redhat.com, Long Li, Ajay Sharma, leon@kernel.org,
cai.huoqing@linux.dev, ssengar@linux.microsoft.com,
vkuznets@redhat.com, tglx@linutronix.de,
linux-hyperv@vger.kernel.org, netdev@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org,
stable@vger.kernel.org
In-Reply-To: <7e316b51-be46-96db-84cb-addd28d90b0f@intel.com>
>-----Original Message-----
>From: Alexander Lobakin <aleksander.lobakin@intel.com>
>Sent: Wednesday, July 5, 2023 8:06 PM
>To: Souradeep Chakrabarti <schakrabarti@microsoft.com>; souradeep
>chakrabarti <schakrabarti@linux.microsoft.com>
>Cc: KY Srinivasan <kys@microsoft.com>; Haiyang Zhang
><haiyangz@microsoft.com>; wei.liu@kernel.org; Dexuan Cui
><decui@microsoft.com>; davem@davemloft.net; edumazet@google.com;
>kuba@kernel.org; pabeni@redhat.com; Long Li <longli@microsoft.com>; Ajay
>Sharma <sharmaajay@microsoft.com>; leon@kernel.org;
>cai.huoqing@linux.dev; ssengar@linux.microsoft.com; vkuznets@redhat.com;
>tglx@linutronix.de; linux-hyperv@vger.kernel.org; netdev@vger.kernel.org;
>linux-kernel@vger.kernel.org; linux-rdma@vger.kernel.org;
>stable@vger.kernel.org
>Subject: Re: [EXTERNAL] Re: [PATCH V4 net] net: mana: Fix MANA VF unload
>when host is unresponsive
>
>[You don't often get email from aleksander.lobakin@intel.com. Learn why this is
>important at https://aka.ms/LearnAboutSenderIdentification ]
>
>From: Souradeep Chakrabarti <schakrabarti@microsoft.com>
>Date: Mon, 3 Jul 2023 19:55:06 +0000
>
>>
>>
>>> -----Original Message-----
>>> From: Alexander Lobakin <aleksander.lobakin@intel.com>
>>> Sent: Monday, July 3, 2023 10:18 PM
>
>[...]
>
>>>> for (i = 0; i < apc->num_queues; i++) {
>>>> txq = &apc->tx_qp[i].txq;
>>>> -
>>>> - while (atomic_read(&txq->pending_sends) > 0)
>>>> + while (atomic_read(&txq->pending_sends) > 0 &&
>>>> + time_before(jiffies, timeout)) {
>>>> usleep_range(1000, 2000);> + }
>>>> }
>>>
>>> 120 seconds by 2 msec step is 60000 iterations, by 1 msec is 120000
>>> iterations. I know usleep_range() often is much less precise, but still.
>>> Do you really need that much time? Has this been measured during the
>>> tests that it can take up to 120 seconds or is it just some random
>>> value that "should be enough"?
>>> If you really need 120 seconds, I'd suggest using a timer / delayed
>>> work instead of wasting resources.
>> Here the intent is not waiting for 120 seconds, rather than avoid
>> continue checking the pending_sends of each tx queues for an indefinite time,
>before freeing sk_buffs.
>> The pending_sends can only get decreased for a tx queue, if
>> mana_poll_tx_cq() gets called for a completion notification and increased by
>xmit.
>>
>> In this particular bug, apc->port_is_up is not set to false, causing
>> xmit to keep increasing the pending_sends for the queue and
>> mana_poll_tx_cq() not getting called for the queue.
>>
>> If we see the comment in the function mana_dealloc_queues(), it mentions it :
>>
>> 2346 /* No packet can be transmitted now since apc->port_is_up is false.
>> 2347 * There is still a tiny chance that mana_poll_tx_cq() can re-enable
>> 2348 * a txq because it may not timely see apc->port_is_up being cleared
>> 2349 * to false, but it doesn't matter since mana_start_xmit() drops any
>> 2350 * new packets due to apc->port_is_up being false.
>>
>> The value 120 seconds has been decided here based on maximum number of
>> queues
>
>This is quite opposite to what you're saying above. How should I connect these
>two:
>
>Here the intent is not waiting for 120 seconds
>
>+
>
>The value 120 seconds has been decided here based on maximum number of
>queues
>
>?
>Can cleaning the Tx queues really last for 120 seconds?
>My understanding is that timeouts need to be sensible and not go to the outer
>space. What is the medium value you got during the tests?
>
For each queue each takes few milli second, in a normal condition. So
based on maximum number of allowed queues for our h/w it won't
go beyond a sec.
The 120s only happens rarely during some NIC HW issue -unexpected.
So this timeout will only trigger in a very rare scenario.
>> are allowed in this specific hardware, it is a safe assumption.
>>>
>>>>
>>>> + for (i = 0; i < apc->num_queues; i++) {
>>>> + txq = &apc->tx_qp[i].txq;
>>>> + cq = &apc->tx_qp[i].tx_cq;
>>>
>>> cq can be just &txq->tx_cq.
>> mana_txq structure does not have a pointer to mana_cq.
>
>Sorry, misread, my bad.
>
>>>
>>>> + while (atomic_read(&txq->pending_sends)) {
>>>> + skb = skb_dequeue(&txq->pending_skbs);
>>>> + mana_unmap_skb(skb, apc);
>>>> + napi_consume_skb(skb, cq->budget);
>>>
>>> (you already have comment about this one)
>>>
>>>> + atomic_sub(1, &txq->pending_sends);
>>>> + }
>>>> + }
>>>> /* We're 100% sure the queues can no longer be woken up, because
>>>> * we're sure now mana_poll_tx_cq() can't be running.
>>>> */
>>>
>>> Thanks,
>>> Olek
>Thanks,
>Olek
^ permalink raw reply
* Re: Hyper-V vsock streams do not fill the supplied buffer in full
From: Stefano Garzarella @ 2023-07-06 10:01 UTC (permalink / raw)
To: Gary Guo, Dexuan Cui
Cc: K. Y. Srinivasan, Haiyang Zhang, Wei Liu, linux-hyperv,
virtualization, netdev, linux-kernel
In-Reply-To: <20230704234532.532c8ee7.gary@garyguo.net>
Hi Gary,
On Wed, Jul 5, 2023 at 12:45 AM Gary Guo <gary@garyguo.net> wrote:
>
> When a vsock stream is called with recvmsg with a buffer, it only fills
> the buffer with data from the first single VM packet. Even if there are
> more VM packets at the time and the buffer is still not completely
> filled, it will just leave the buffer partially filled.
>
> This causes some issues when in WSLD which uses the vsock in
> non-blocking mode and uses epoll.
>
> For stream-oriented sockets, the epoll man page [1] says that
>
> > For stream-oriented files (e.g., pipe, FIFO, stream socket),
> > the condition that the read/write I/O space is exhausted can
> > also be detected by checking the amount of data read from /
> > written to the target file descriptor. For example, if you
> > call read(2) by asking to read a certain amount of data and
> > read(2) returns a lower number of bytes, you can be sure of
> > having exhausted the read I/O space for the file descriptor.
>
> This has been used as an optimisation in the wild for reducing number
> of syscalls required for stream sockets (by asserting that the socket
> will not have to polled to EAGAIN in edge-trigger mode, if the buffer
> given to recvmsg is not filled completely). An example is Tokio, which
> starting in v1.21.0 [2].
>
> When this optimisation combines with the behaviour of Hyper-V vsock, it
> causes issue in this scenario:
> * the VM host send data to the guest, and it's splitted into multiple
> VM packets
> * sk_data_ready is called and epoll returns, notifying the userspace
> that the socket is ready
> * userspace call recvmsg with a buffer, and it's partially filled
> * userspace assumes that the stream socket is depleted, and if new data
> arrives epoll will notify it again.
> * kernel always considers the socket to be ready, and since it's in
> edge-trigger mode, the epoll instance will never be notified again.
>
> This different realisation of the readiness causes the userspace to
> block forever.
Thanks for the detailed description of the problem.
I think we should fix the hvs_stream_dequeue() in
net/vmw_vsock/hyperv_transport.c.
We can do something similar to what we do in
virtio_transport_stream_do_dequeue() in
net/vmw_vsock/virtio_transport_common.c
@Dexuan WDYT?
Thanks,
Stefano
^ permalink raw reply
* Re: [PATCH V5 net] net: mana: Fix MANA VF unload when hardware is unresponsive
From: Paolo Abeni @ 2023-07-06 8:18 UTC (permalink / raw)
To: Souradeep Chakrabarti, kys, haiyangz, wei.liu, decui, davem,
edumazet, kuba, longli, sharmaajay, leon, cai.huoqing, ssengar,
vkuznets, tglx, linux-hyperv, netdev, linux-kernel, linux-rdma
Cc: stable, schakrabarti
In-Reply-To: <1688544973-2507-1-git-send-email-schakrabarti@linux.microsoft.com>
On Wed, 2023-07-05 at 01:16 -0700, Souradeep Chakrabarti wrote:
> When unloading the MANA driver, mana_dealloc_queues() waits for the MANA
> hardware to complete any inflight packets and set the pending send count
> to zero. But if the hardware has failed, mana_dealloc_queues()
> could wait forever.
>
> Fix this by adding a timeout to the wait. Set the timeout to 120 seconds,
> which is a somewhat arbitrary value that is more than long enough for
> functional hardware to complete any sends.
>
> Fixes: ca9c54d2d6a5 ("net: mana: Add a driver for Microsoft Azure Network Adapter (MANA)")
> ---
> V4 -> V5:
> * Added fixes tag
> * Changed the usleep_range from static to incremental value.
> * Initialized timeout in the begining.
> ---
> Signed-off-by: Souradeep Chakrabarti <schakrabarti@linux.microsoft.com>
the changelog should come after the SoB tag, and there should be no '--
- ' separator before the SoB.
Please double-check your patch with the checkpatch script before the
next submission, it should catch trivial issues as the above one.
> ---
> drivers/net/ethernet/microsoft/mana/mana_en.c | 30 ++++++++++++++++---
> 1 file changed, 26 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
> index a499e460594b..56b7074db1a2 100644
> --- a/drivers/net/ethernet/microsoft/mana/mana_en.c
> +++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
> @@ -2345,9 +2345,13 @@ int mana_attach(struct net_device *ndev)
> static int mana_dealloc_queues(struct net_device *ndev)
> {
> struct mana_port_context *apc = netdev_priv(ndev);
> + unsigned long timeout = jiffies + 120 * HZ;
> struct gdma_dev *gd = apc->ac->gdma_dev;
> struct mana_txq *txq;
> + struct sk_buff *skb;
> + struct mana_cq *cq;
> int i, err;
> + u32 tsleep;
>
> if (apc->port_is_up)
> return -EINVAL;
> @@ -2363,15 +2367,33 @@ static int mana_dealloc_queues(struct net_device *ndev)
> * to false, but it doesn't matter since mana_start_xmit() drops any
> * new packets due to apc->port_is_up being false.
> *
> - * Drain all the in-flight TX packets
> + * Drain all the in-flight TX packets.
> + * A timeout of 120 seconds for all the queues is used.
> + * This will break the while loop when h/w is not responding.
> + * This value of 120 has been decided here considering max
> + * number of queues.
> */
> +
> for (i = 0; i < apc->num_queues; i++) {
> txq = &apc->tx_qp[i].txq;
> -
> - while (atomic_read(&txq->pending_sends) > 0)
> - usleep_range(1000, 2000);
> + tsleep = 1000;
> + while (atomic_read(&txq->pending_sends) > 0 &&
> + time_before(jiffies, timeout)) {
> + usleep_range(tsleep, tsleep << 1);
> + tsleep <<= 1;
> + }
> }
>
> + for (i = 0; i < apc->num_queues; i++) {
> + txq = &apc->tx_qp[i].txq;
> + cq = &apc->tx_qp[i].tx_cq;
The above variable is unused, and causes a build warning. Please remove
the assignment and the variable declaration.
Thanks,
Paolo
^ permalink raw reply
* Re: [PATCH v2 00/24] use vmalloc_array and vcalloc
From: Martin K. Petersen @ 2023-07-06 1:35 UTC (permalink / raw)
To: Julia Lawall
Cc: linux-hyperv, kernel-janitors, keescook, christophe.jaillet, kuba,
kasan-dev, Andrey Konovalov, Dmitry Vyukov, iommu, linux-tegra,
Robin Murphy, Krishna Reddy, virtualization, Xuan Zhuo,
linux-scsi, linaro-mm-sig, linux-media, John Stultz,
Brian Starkey, Laura Abbott, Liam Mark, Benjamin Gaignard,
dri-devel, linux-kernel, netdev, Shailend Chand, linux-rdma, mhi,
linux-arm-msm, linux-btrfs, intel-gvt-dev, intel-gfx, Dave Hansen,
H. Peter Anvin, linux-sgx
In-Reply-To: <20230627144339.144478-1-Julia.Lawall@inria.fr>
Julia,
> The functions vmalloc_array and vcalloc were introduced in
>
> commit a8749a35c399 ("mm: vmalloc: introduce array allocation functions")
>
> but are not used much yet. This series introduces uses of
> these functions, to protect against multiplication overflows.
Applied #7 and #24 to 6.5/scsi-staging, thanks!
--
Martin K. Petersen Oracle Linux Engineering
^ permalink raw reply
* Re: [PATCH V2 1/9] x86/hyperv: Add sev-snp enlightened guest static key
From: Tianyu Lan @ 2023-07-05 14:43 UTC (permalink / raw)
To: Michael Kelley (LINUX), KY Srinivasan, Haiyang Zhang,
wei.liu@kernel.org, Dexuan Cui, tglx@linutronix.de,
mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com,
x86@kernel.org, hpa@zytor.com, daniel.lezcano@linaro.org,
arnd@arndb.de
Cc: Tianyu Lan, linux-arch@vger.kernel.org,
linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org,
vkuznets@redhat.com
In-Reply-To: <BYAPR21MB1688570E8D0D0C49B68DD548D72EA@BYAPR21MB1688.namprd21.prod.outlook.com>
On 7/4/2023 10:17 PM, Michael Kelley (LINUX) wrote:
> From: Tianyu Lan <ltykernel@gmail.com> Sent: Monday, June 26, 2023 8:23 PM
>>
>> Introduce static key isolation_type_en_snp for enlightened
>> sev-snp guest check.
>>
>> Signed-off-by: Tianyu Lan <tiala@microsoft.com>
>> ---
>> arch/x86/hyperv/ivm.c | 11 +++++++++++
>> arch/x86/include/asm/mshyperv.h | 3 +++
>> arch/x86/kernel/cpu/mshyperv.c | 9 +++++++--
>> drivers/hv/hv_common.c | 6 ++++++
>> include/asm-generic/mshyperv.h | 12 +++++++++---
>> 5 files changed, 36 insertions(+), 5 deletions(-)
>>
>> diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
>> index cc92388b7a99..5d3ee3124e00 100644
>> --- a/arch/x86/hyperv/ivm.c
>> +++ b/arch/x86/hyperv/ivm.c
>> @@ -409,3 +409,14 @@ bool hv_isolation_type_snp(void)
>> {
>> return static_branch_unlikely(&isolation_type_snp);
>> }
>> +
>> +DEFINE_STATIC_KEY_FALSE(isolation_type_en_snp);
>> +/*
>> + * hv_isolation_type_en_snp - Check system runs in the AMD SEV-SNP based
>> + * isolation enlightened VM.
>> + */
>> +bool hv_isolation_type_en_snp(void)
>> +{
>> + return static_branch_unlikely(&isolation_type_en_snp);
>> +}
>> +
>
> Vitaly had suggested some renaming of this and related variables and
> functions, which you had agreed to do. But I don't see that renaming
> reflected in this patch or throughout the full patch set.
>
Hi Michael:
Thanks for your review. I am preparing a separating patchset to rework
hv_isolation_type_en_snp() and hv_isolation_type_snp() according to
Vitaly suggestion because it will touch more files and not affect
SEV-SNP function.
Thanks.
^ permalink raw reply
* Re: [EXTERNAL] Re: [PATCH V4 net] net: mana: Fix MANA VF unload when host is unresponsive
From: Alexander Lobakin @ 2023-07-05 14:35 UTC (permalink / raw)
To: Souradeep Chakrabarti, souradeep chakrabarti
Cc: KY Srinivasan, Haiyang Zhang, wei.liu@kernel.org, Dexuan Cui,
davem@davemloft.net, edumazet@google.com, kuba@kernel.org,
pabeni@redhat.com, Long Li, Ajay Sharma, leon@kernel.org,
cai.huoqing@linux.dev, ssengar@linux.microsoft.com,
vkuznets@redhat.com, tglx@linutronix.de,
linux-hyperv@vger.kernel.org, netdev@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org,
stable@vger.kernel.org
In-Reply-To: <PUZP153MB07880E6D692FD5D13C508694CC29A@PUZP153MB0788.APCP153.PROD.OUTLOOK.COM>
From: Souradeep Chakrabarti <schakrabarti@microsoft.com>
Date: Mon, 3 Jul 2023 19:55:06 +0000
>
>
>> -----Original Message-----
>> From: Alexander Lobakin <aleksander.lobakin@intel.com>
>> Sent: Monday, July 3, 2023 10:18 PM
[...]
>>> for (i = 0; i < apc->num_queues; i++) {
>>> txq = &apc->tx_qp[i].txq;
>>> -
>>> - while (atomic_read(&txq->pending_sends) > 0)
>>> + while (atomic_read(&txq->pending_sends) > 0 &&
>>> + time_before(jiffies, timeout)) {
>>> usleep_range(1000, 2000);> + }
>>> }
>>
>> 120 seconds by 2 msec step is 60000 iterations, by 1 msec is 120000
>> iterations. I know usleep_range() often is much less precise, but still.
>> Do you really need that much time? Has this been measured during the tests
>> that it can take up to 120 seconds or is it just some random value that "should
>> be enough"?
>> If you really need 120 seconds, I'd suggest using a timer / delayed work instead
>> of wasting resources.
> Here the intent is not waiting for 120 seconds, rather than avoid continue checking the
> pending_sends of each tx queues for an indefinite time, before freeing sk_buffs.
> The pending_sends can only get decreased for a tx queue, if mana_poll_tx_cq()
> gets called for a completion notification and increased by xmit.
>
> In this particular bug, apc->port_is_up is not set to false, causing
> xmit to keep increasing the pending_sends for the queue and mana_poll_tx_cq()
> not getting called for the queue.
>
> If we see the comment in the function mana_dealloc_queues(), it mentions it :
>
> 2346 /* No packet can be transmitted now since apc->port_is_up is false.
> 2347 * There is still a tiny chance that mana_poll_tx_cq() can re-enable
> 2348 * a txq because it may not timely see apc->port_is_up being cleared
> 2349 * to false, but it doesn't matter since mana_start_xmit() drops any
> 2350 * new packets due to apc->port_is_up being false.
>
> The value 120 seconds has been decided here based on maximum number of queues
This is quite opposite to what you're saying above. How should I connect
these two:
Here the intent is not waiting for 120 seconds
+
The value 120 seconds has been decided here based on maximum number of
queues
?
Can cleaning the Tx queues really last for 120 seconds?
My understanding is that timeouts need to be sensible and not go to the
outer space. What is the medium value you got during the tests?
> are allowed in this specific hardware, it is a safe assumption.
>>
>>>
>>> + for (i = 0; i < apc->num_queues; i++) {
>>> + txq = &apc->tx_qp[i].txq;
>>> + cq = &apc->tx_qp[i].tx_cq;
>>
>> cq can be just &txq->tx_cq.
> mana_txq structure does not have a pointer to mana_cq.
Sorry, misread, my bad.
>>
>>> + while (atomic_read(&txq->pending_sends)) {
>>> + skb = skb_dequeue(&txq->pending_skbs);
>>> + mana_unmap_skb(skb, apc);
>>> + napi_consume_skb(skb, cq->budget);
>>
>> (you already have comment about this one)
>>
>>> + atomic_sub(1, &txq->pending_sends);
>>> + }
>>> + }
>>> /* We're 100% sure the queues can no longer be woken up, because
>>> * we're sure now mana_poll_tx_cq() can't be running.
>>> */
>>
>> Thanks,
>> Olek
Thanks,
Olek
^ permalink raw reply
* [PATCH] HID: hyperv: avoid struct memcpy overrun warning
From: Arnd Bergmann @ 2023-07-05 14:02 UTC (permalink / raw)
To: Jiri Kosina, Benjamin Tissoires
Cc: Arnd Bergmann, K. Y. Srinivasan, Haiyang Zhang, Wei Liu,
Dexuan Cui, Paulo Miguel Almeida, Michael Kelley, Dawei Li,
Yang Yingliang, Thomas Weißschuh, linux-hyperv, linux-input,
linux-kernel
From: Arnd Bergmann <arnd@arndb.de>
A previous patch addressed the fortified memcpy warning for most
builds, but I still see this one with gcc-9:
In file included from include/linux/string.h:254,
from drivers/hid/hid-hyperv.c:8:
In function 'fortify_memcpy_chk',
inlined from 'mousevsc_on_receive' at drivers/hid/hid-hyperv.c:272:3:
include/linux/fortify-string.h:583:4: error: call to '__write_overflow_field' declared with attribute warning: detected write beyond size of field (1st parameter); maybe use struct_group()? [-Werror=attribute-warning]
583 | __write_overflow_field(p_size_field, size);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
My guess is that the WARN_ON() itself is what confuses gcc, so it no
longer sees that there is a correct range check. Rework the code in a
way that helps readability and avoids the warning.
Fixes: 542f25a944715 ("HID: hyperv: Replace one-element array with flexible-array member")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
drivers/hid/hid-hyperv.c | 10 ++++------
1 file changed, 4 insertions(+), 6 deletions(-)
diff --git a/drivers/hid/hid-hyperv.c b/drivers/hid/hid-hyperv.c
index 49d4a26895e76..f33485d83d24f 100644
--- a/drivers/hid/hid-hyperv.c
+++ b/drivers/hid/hid-hyperv.c
@@ -258,19 +258,17 @@ static void mousevsc_on_receive(struct hv_device *device,
switch (hid_msg_hdr->type) {
case SYNTH_HID_PROTOCOL_RESPONSE:
+ len = struct_size(pipe_msg, data, pipe_msg->size);
+
/*
* While it will be impossible for us to protect against
* malicious/buggy hypervisor/host, add a check here to
* ensure we don't corrupt memory.
*/
- if (struct_size(pipe_msg, data, pipe_msg->size)
- > sizeof(struct mousevsc_prt_msg)) {
- WARN_ON(1);
+ if (WARN_ON(len > sizeof(struct mousevsc_prt_msg)))
break;
- }
- memcpy(&input_dev->protocol_resp, pipe_msg,
- struct_size(pipe_msg, data, pipe_msg->size));
+ memcpy(&input_dev->protocol_resp, pipe_msg, len);
complete(&input_dev->wait_event);
break;
--
2.39.2
^ permalink raw reply related
* [PATCH] vmbus_testing: fix wrong python syntax for integer value comparison
From: Ani Sinha @ 2023-07-05 13:44 UTC (permalink / raw)
To: K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui
Cc: Ani Sinha, linux-hyperv, linux-kernel
It is incorrect in python to compare integer values using the "is" keyword.
The "is" keyword in python is used to compare references to two objects,
not their values. Newer version of python3 (version 3.8) throws a warning
when such incorrect comparison is made. For value comparison, "==" should
be used.
Fix this in the code and suppress the following warning:
/usr/sbin/vmbus_testing:167: SyntaxWarning: "is" with a literal. Did you mean "=="?
Signed-off-by: Ani Sinha <anisinha@redhat.com>
---
tools/hv/vmbus_testing | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/hv/vmbus_testing b/tools/hv/vmbus_testing
index e7212903dd1d..4467979d8f69 100755
--- a/tools/hv/vmbus_testing
+++ b/tools/hv/vmbus_testing
@@ -164,7 +164,7 @@ def recursive_file_lookup(path, file_map):
def get_all_devices_test_status(file_map):
for device in file_map:
- if (get_test_state(locate_state(device, file_map)) is 1):
+ if (get_test_state(locate_state(device, file_map)) == 1):
print("Testing = ON for: {}"
.format(device.split("/")[5]))
else:
@@ -203,7 +203,7 @@ def write_test_files(path, value):
def set_test_state(state_path, state_value, quiet):
write_test_files(state_path, state_value)
- if (get_test_state(state_path) is 1):
+ if (get_test_state(state_path) == 1):
if (not quiet):
print("Testing = ON for device: {}"
.format(state_path.split("/")[5]))
--
2.39.1
^ permalink raw reply related
* RE: [PATCH net] net: mana: Configure hwc timeout from hardware
From: Saurabh Singh Sengar @ 2023-07-05 11:11 UTC (permalink / raw)
To: Souradeep Chakrabarti, KY Srinivasan, Haiyang Zhang,
wei.liu@kernel.org, Dexuan Cui, davem@davemloft.net,
edumazet@google.com, kuba@kernel.org, pabeni@redhat.com, Long Li,
Ajay Sharma, leon@kernel.org, cai.huoqing@linux.dev,
ssengar@linux.microsoft.com, vkuznets@redhat.com,
tglx@linutronix.de, linux-hyperv@vger.kernel.org,
netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-rdma@vger.kernel.org
Cc: stable@vger.kernel.org, Souradeep Chakrabarti
In-Reply-To: <1688549578-12906-1-git-send-email-schakrabarti@linux.microsoft.com>
> -----Original Message-----
> From: Souradeep Chakrabarti <schakrabarti@linux.microsoft.com>
> Sent: Wednesday, July 5, 2023 3:03 PM
> To: KY Srinivasan <kys@microsoft.com>; Haiyang Zhang
> <haiyangz@microsoft.com>; wei.liu@kernel.org; Dexuan Cui
> <decui@microsoft.com>; davem@davemloft.net; edumazet@google.com;
> kuba@kernel.org; pabeni@redhat.com; Long Li <longli@microsoft.com>; Ajay
> Sharma <sharmaajay@microsoft.com>; leon@kernel.org;
> cai.huoqing@linux.dev; ssengar@linux.microsoft.com;
> vkuznets@redhat.com; tglx@linutronix.de; linux-hyperv@vger.kernel.org;
> netdev@vger.kernel.org; linux-kernel@vger.kernel.org; linux-
> rdma@vger.kernel.org
> Cc: stable@vger.kernel.org; Souradeep Chakrabarti
> <schakrabarti@microsoft.com>; Souradeep Chakrabarti
> <schakrabarti@linux.microsoft.com>
> Subject: [PATCH net] net: mana: Configure hwc timeout from hardware
>
> At present hwc timeout value is a fixed value.
> This patch sets the hwc timeout from the hardware.
>
> Signed-off-by: Souradeep Chakrabarti <schakrabarti@linux.microsoft.com>
> ---
> .../net/ethernet/microsoft/mana/gdma_main.c | 27 +++++++++++++++++++
> .../net/ethernet/microsoft/mana/hw_channel.c | 25 ++++++++++++++++-
> include/net/mana/gdma.h | 20 +++++++++++++-
> include/net/mana/hw_channel.h | 5 ++++
> 4 files changed, 75 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/ethernet/microsoft/mana/gdma_main.c
> b/drivers/net/ethernet/microsoft/mana/gdma_main.c
> index 8f3f78b68592..5d30347e0137 100644
> --- a/drivers/net/ethernet/microsoft/mana/gdma_main.c
> +++ b/drivers/net/ethernet/microsoft/mana/gdma_main.c
> @@ -106,6 +106,30 @@ static int mana_gd_query_max_resources(struct
> pci_dev *pdev)
> return 0;
> }
>
> +static int mana_gd_query_hwc_timeout(struct pci_dev *pdev, u32
> +*timeout_val) {
> + struct gdma_context *gc = pci_get_drvdata(pdev);
> + struct gdma_query_hwc_timeout_req req = {};
> + struct gdma_query_hwc_timeout_resp resp = {};
> + int err;
> +
> + mana_gd_init_req_hdr(&req.hdr, GDMA_QUERY_HWC_TIMEOUT,
> + sizeof(req), sizeof(resp));
> + req.timeout_ms = *timeout_val;
> + err = mana_gd_send_request(gc, sizeof(req), &req, sizeof(resp),
> &resp);
> + if (err || resp.hdr.status) {
> + dev_err(gc->dev, "Failed to query timeout: %d, 0x%x\n", err,
> + resp.hdr.status);
> + return err ? err : -EPROTO;
> + }
> +
> + *timeout_val = resp.timeout_ms;
> + dev_info(gc->dev, "Successfully changed the timeout value %u\n",
> + *timeout_val);
> +
> + return 0;
> +}
> +
> static int mana_gd_detect_devices(struct pci_dev *pdev) {
> struct gdma_context *gc = pci_get_drvdata(pdev); @@ -879,6 +903,7
> @@ int mana_gd_verify_vf_version(struct pci_dev *pdev)
> struct gdma_context *gc = pci_get_drvdata(pdev);
> struct gdma_verify_ver_resp resp = {};
> struct gdma_verify_ver_req req = {};
> + struct hw_channel_context *hwc = gc->hwc.driver_data;
> int err;
>
> mana_gd_init_req_hdr(&req.hdr,
> GDMA_VERIFY_VF_DRIVER_VERSION, @@ -907,6 +932,8 @@ int
> mana_gd_verify_vf_version(struct pci_dev *pdev)
> err, resp.hdr.status);
> return err ? err : -EPROTO;
> }
> + if (resp.pf_cap_flags1 &
> GDMA_DRV_CAP_FLAG_1_HWC_TIMEOUT_RECONFIG)
> + mana_gd_query_hwc_timeout(pdev, &hwc->hwc_timeout);
>
> return 0;
> }
> diff --git a/drivers/net/ethernet/microsoft/mana/hw_channel.c
> b/drivers/net/ethernet/microsoft/mana/hw_channel.c
> index 9d1507eba5b9..f5980c26fd09 100644
> --- a/drivers/net/ethernet/microsoft/mana/hw_channel.c
> +++ b/drivers/net/ethernet/microsoft/mana/hw_channel.c
> @@ -174,7 +174,25 @@ static void mana_hwc_init_event_handler(void *ctx,
> struct gdma_queue *q_self,
> complete(&hwc->hwc_init_eqe_comp);
> break;
>
> + case GDMA_EQE_HWC_SOC_RECONFIG_DATA:
> + type_data.as_uint32 = event->details[0];
> + type = type_data.type;
> + val = type_data.value;
> +
> + switch (type) {
> + case HWC_DATA_CFG_HWC_TIMEOUT:
> + hwc->hwc_timeout = val;
> + break;
> +
> + default:
> + dev_warn(hwc->dev, "Received unknown reconfig
> type %u\n", type);
> + break;
> + }
> +
> + break;
> +
> default:
> + dev_warn(hwc->dev, "Received unknown gdma event %u\n",
> event->type);
> /* Ignore unknown events, which should never happen. */
> break;
> }
> @@ -704,6 +722,7 @@ int mana_hwc_create_channel(struct gdma_context
> *gc)
> gd->pdid = INVALID_PDID;
> gd->doorbell = INVALID_DOORBELL;
>
> + hwc->hwc_timeout =
> HW_CHANNEL_WAIT_RESOURCE_TIMEOUT_MS;
> /* mana_hwc_init_queues() only creates the required data structures,
> * and doesn't touch the HWC device.
> */
> @@ -770,6 +789,8 @@ void mana_hwc_destroy_channel(struct
> gdma_context *gc)
> hwc->gdma_dev->doorbell = INVALID_DOORBELL;
> hwc->gdma_dev->pdid = INVALID_PDID;
>
> + hwc->hwc_timeout = 0;
> +
> kfree(hwc);
> gc->hwc.driver_data = NULL;
> gc->hwc.gdma_context = NULL;
> @@ -818,6 +839,7 @@ int mana_hwc_send_request(struct
> hw_channel_context *hwc, u32 req_len,
> dest_vrq = hwc->pf_dest_vrq_id;
> dest_vrcq = hwc->pf_dest_vrcq_id;
> }
> + dev_err(hwc->dev, "HWC: timeout %u ms\n", hwc->hwc_timeout);
Can avoid dev_err here
>
> err = mana_hwc_post_tx_wqe(txq, tx_wr, dest_vrq, dest_vrcq, false);
> if (err) {
> @@ -825,7 +847,8 @@ int mana_hwc_send_request(struct
> hw_channel_context *hwc, u32 req_len,
> goto out;
> }
>
> - if (!wait_for_completion_timeout(&ctx->comp_event, 30 * HZ)) {
> + if (!wait_for_completion_timeout(&ctx->comp_event,
> + (hwc->hwc_timeout / 1000) * HZ)) {
> dev_err(hwc->dev, "HWC: Request timed out!\n");
> err = -ETIMEDOUT;
> goto out;
> diff --git a/include/net/mana/gdma.h b/include/net/mana/gdma.h index
> 96c120160f15..88b6ef7ce1a6 100644
> --- a/include/net/mana/gdma.h
> +++ b/include/net/mana/gdma.h
> @@ -33,6 +33,7 @@ enum gdma_request_type {
> GDMA_DESTROY_PD = 30,
> GDMA_CREATE_MR = 31,
> GDMA_DESTROY_MR = 32,
> + GDMA_QUERY_HWC_TIMEOUT = 84, /* 0x54 */
> };
>
> #define GDMA_RESOURCE_DOORBELL_PAGE 27
> @@ -57,6 +58,8 @@ enum gdma_eqe_type {
> GDMA_EQE_HWC_INIT_EQ_ID_DB = 129,
> GDMA_EQE_HWC_INIT_DATA = 130,
> GDMA_EQE_HWC_INIT_DONE = 131,
> + GDMA_EQE_HWC_SOC_RECONFIG = 132,
> + GDMA_EQE_HWC_SOC_RECONFIG_DATA = 133,
> };
>
> enum {
> @@ -531,10 +534,12 @@ enum {
> * so the driver is able to reliably support features like busy_poll.
> */
> #define GDMA_DRV_CAP_FLAG_1_NAPI_WKDONE_FIX BIT(2)
> +#define GDMA_DRV_CAP_FLAG_1_HWC_TIMEOUT_RECONFIG BIT(3)
>
> #define GDMA_DRV_CAP_FLAGS1 \
> (GDMA_DRV_CAP_FLAG_1_EQ_SHARING_MULTI_VPORT | \
> - GDMA_DRV_CAP_FLAG_1_NAPI_WKDONE_FIX)
> + GDMA_DRV_CAP_FLAG_1_NAPI_WKDONE_FIX | \
> + GDMA_DRV_CAP_FLAG_1_HWC_TIMEOUT_RECONFIG)
>
> #define GDMA_DRV_CAP_FLAGS2 0
>
> @@ -664,6 +669,19 @@ struct gdma_disable_queue_req {
> u32 alloc_res_id_on_creation;
> }; /* HW DATA */
>
> +/* GDMA_QUERY_HWC_TIMEOUT */
> +struct gdma_query_hwc_timeout_req {
> + struct gdma_req_hdr hdr;
> + u32 timeout_ms;
> + u32 reserved;
> +};
> +
> +struct gdma_query_hwc_timeout_resp {
> + struct gdma_resp_hdr hdr;
> + u32 timeout_ms;
> + u32 reserved;
> +};
> +
> enum atb_page_size {
> ATB_PAGE_SIZE_4K,
> ATB_PAGE_SIZE_8K,
> diff --git a/include/net/mana/hw_channel.h
> b/include/net/mana/hw_channel.h index 6a757a6e2732..3d3b5c881bc1
> 100644
> --- a/include/net/mana/hw_channel.h
> +++ b/include/net/mana/hw_channel.h
> @@ -23,6 +23,10 @@
> #define HWC_INIT_DATA_PF_DEST_RQ_ID 10
> #define HWC_INIT_DATA_PF_DEST_CQ_ID 11
>
> +#define HWC_DATA_CFG_HWC_TIMEOUT 1
> +
> +#define HW_CHANNEL_WAIT_RESOURCE_TIMEOUT_MS 30000
> +
> /* Structures labeled with "HW DATA" are exchanged with the hardware. All
> of
> * them are naturally aligned and hence don't need __packed.
> */
> @@ -182,6 +186,7 @@ struct hw_channel_context {
>
> u32 pf_dest_vrq_id;
> u32 pf_dest_vrcq_id;
> + u32 hwc_timeout;
>
> struct hwc_caller_ctx *caller_ctx;
> };
> --
> 2.34.1
^ permalink raw reply
* Re: [PATCH net] net: mana: Configure hwc timeout from hardware
From: Leon Romanovsky @ 2023-07-05 10:47 UTC (permalink / raw)
To: Souradeep Chakrabarti
Cc: kys, haiyangz, wei.liu, decui, davem, edumazet, kuba, pabeni,
longli, sharmaajay, cai.huoqing, ssengar, vkuznets, tglx,
linux-hyperv, netdev, linux-kernel, linux-rdma, stable,
schakrabarti
In-Reply-To: <1688549578-12906-1-git-send-email-schakrabarti@linux.microsoft.com>
On Wed, Jul 05, 2023 at 02:32:58AM -0700, Souradeep Chakrabarti wrote:
> At present hwc timeout value is a fixed value.
> This patch sets the hwc timeout from the hardware.
>
> Signed-off-by: Souradeep Chakrabarti <schakrabarti@linux.microsoft.com>
> ---
> .../net/ethernet/microsoft/mana/gdma_main.c | 27 +++++++++++++++++++
> .../net/ethernet/microsoft/mana/hw_channel.c | 25 ++++++++++++++++-
> include/net/mana/gdma.h | 20 +++++++++++++-
> include/net/mana/hw_channel.h | 5 ++++
> 4 files changed, 75 insertions(+), 2 deletions(-)
We are in merge window now, it is not net material.
>
> diff --git a/drivers/net/ethernet/microsoft/mana/gdma_main.c b/drivers/net/ethernet/microsoft/mana/gdma_main.c
> index 8f3f78b68592..5d30347e0137 100644
> --- a/drivers/net/ethernet/microsoft/mana/gdma_main.c
> +++ b/drivers/net/ethernet/microsoft/mana/gdma_main.c
> @@ -106,6 +106,30 @@ static int mana_gd_query_max_resources(struct pci_dev *pdev)
> return 0;
> }
>
> +static int mana_gd_query_hwc_timeout(struct pci_dev *pdev, u32 *timeout_val)
> +{
Callers are not checking return value, so or make this function void or
check return value.
> + struct gdma_context *gc = pci_get_drvdata(pdev);
> + struct gdma_query_hwc_timeout_req req = {};
> + struct gdma_query_hwc_timeout_resp resp = {};
> + int err;
> +
> + mana_gd_init_req_hdr(&req.hdr, GDMA_QUERY_HWC_TIMEOUT,
> + sizeof(req), sizeof(resp));
> + req.timeout_ms = *timeout_val;
> + err = mana_gd_send_request(gc, sizeof(req), &req, sizeof(resp), &resp);
> + if (err || resp.hdr.status) {
I see this check almost in all callers to mana_gd_send_request(). It
will be nice if mana_gd_send_request() would check status internally
and return error.
> + dev_err(gc->dev, "Failed to query timeout: %d, 0x%x\n", err,
> + resp.hdr.status);
> + return err ? err : -EPROTO;
> + }
> +
> + *timeout_val = resp.timeout_ms;
> + dev_info(gc->dev, "Successfully changed the timeout value %u\n",
> + *timeout_val);
> +
> + return 0;
> +}
> +
> static int mana_gd_detect_devices(struct pci_dev *pdev)
> {
> struct gdma_context *gc = pci_get_drvdata(pdev);
> @@ -879,6 +903,7 @@ int mana_gd_verify_vf_version(struct pci_dev *pdev)
> struct gdma_context *gc = pci_get_drvdata(pdev);
> struct gdma_verify_ver_resp resp = {};
> struct gdma_verify_ver_req req = {};
> + struct hw_channel_context *hwc = gc->hwc.driver_data;
> int err;
>
> mana_gd_init_req_hdr(&req.hdr, GDMA_VERIFY_VF_DRIVER_VERSION,
> @@ -907,6 +932,8 @@ int mana_gd_verify_vf_version(struct pci_dev *pdev)
> err, resp.hdr.status);
> return err ? err : -EPROTO;
> }
> + if (resp.pf_cap_flags1 & GDMA_DRV_CAP_FLAG_1_HWC_TIMEOUT_RECONFIG)
> + mana_gd_query_hwc_timeout(pdev, &hwc->hwc_timeout);
>
> return 0;
> }
> diff --git a/drivers/net/ethernet/microsoft/mana/hw_channel.c b/drivers/net/ethernet/microsoft/mana/hw_channel.c
> index 9d1507eba5b9..f5980c26fd09 100644
> --- a/drivers/net/ethernet/microsoft/mana/hw_channel.c
> +++ b/drivers/net/ethernet/microsoft/mana/hw_channel.c
> @@ -174,7 +174,25 @@ static void mana_hwc_init_event_handler(void *ctx, struct gdma_queue *q_self,
> complete(&hwc->hwc_init_eqe_comp);
> break;
>
> + case GDMA_EQE_HWC_SOC_RECONFIG_DATA:
> + type_data.as_uint32 = event->details[0];
> + type = type_data.type;
> + val = type_data.value;
> +
> + switch (type) {
> + case HWC_DATA_CFG_HWC_TIMEOUT:
> + hwc->hwc_timeout = val;
> + break;
> +
> + default:
> + dev_warn(hwc->dev, "Received unknown reconfig type %u\n", type);
> + break;
> + }
> +
> + break;
> +
> default:
> + dev_warn(hwc->dev, "Received unknown gdma event %u\n", event->type);
> /* Ignore unknown events, which should never happen. */
> break;
> }
> @@ -704,6 +722,7 @@ int mana_hwc_create_channel(struct gdma_context *gc)
> gd->pdid = INVALID_PDID;
> gd->doorbell = INVALID_DOORBELL;
>
> + hwc->hwc_timeout = HW_CHANNEL_WAIT_RESOURCE_TIMEOUT_MS;
> /* mana_hwc_init_queues() only creates the required data structures,
> * and doesn't touch the HWC device.
> */
> @@ -770,6 +789,8 @@ void mana_hwc_destroy_channel(struct gdma_context *gc)
> hwc->gdma_dev->doorbell = INVALID_DOORBELL;
> hwc->gdma_dev->pdid = INVALID_PDID;
>
> + hwc->hwc_timeout = 0;
> +
> kfree(hwc);
> gc->hwc.driver_data = NULL;
> gc->hwc.gdma_context = NULL;
> @@ -818,6 +839,7 @@ int mana_hwc_send_request(struct hw_channel_context *hwc, u32 req_len,
> dest_vrq = hwc->pf_dest_vrq_id;
> dest_vrcq = hwc->pf_dest_vrcq_id;
> }
> + dev_err(hwc->dev, "HWC: timeout %u ms\n", hwc->hwc_timeout);
>
> err = mana_hwc_post_tx_wqe(txq, tx_wr, dest_vrq, dest_vrcq, false);
> if (err) {
> @@ -825,7 +847,8 @@ int mana_hwc_send_request(struct hw_channel_context *hwc, u32 req_len,
> goto out;
> }
>
> - if (!wait_for_completion_timeout(&ctx->comp_event, 30 * HZ)) {
> + if (!wait_for_completion_timeout(&ctx->comp_event,
> + (hwc->hwc_timeout / 1000) * HZ)) {
> dev_err(hwc->dev, "HWC: Request timed out!\n");
> err = -ETIMEDOUT;
> goto out;
> diff --git a/include/net/mana/gdma.h b/include/net/mana/gdma.h
> index 96c120160f15..88b6ef7ce1a6 100644
> --- a/include/net/mana/gdma.h
> +++ b/include/net/mana/gdma.h
> @@ -33,6 +33,7 @@ enum gdma_request_type {
> GDMA_DESTROY_PD = 30,
> GDMA_CREATE_MR = 31,
> GDMA_DESTROY_MR = 32,
> + GDMA_QUERY_HWC_TIMEOUT = 84, /* 0x54 */
> };
>
> #define GDMA_RESOURCE_DOORBELL_PAGE 27
> @@ -57,6 +58,8 @@ enum gdma_eqe_type {
> GDMA_EQE_HWC_INIT_EQ_ID_DB = 129,
> GDMA_EQE_HWC_INIT_DATA = 130,
> GDMA_EQE_HWC_INIT_DONE = 131,
> + GDMA_EQE_HWC_SOC_RECONFIG = 132,
> + GDMA_EQE_HWC_SOC_RECONFIG_DATA = 133,
> };
>
> enum {
> @@ -531,10 +534,12 @@ enum {
> * so the driver is able to reliably support features like busy_poll.
> */
> #define GDMA_DRV_CAP_FLAG_1_NAPI_WKDONE_FIX BIT(2)
> +#define GDMA_DRV_CAP_FLAG_1_HWC_TIMEOUT_RECONFIG BIT(3)
>
> #define GDMA_DRV_CAP_FLAGS1 \
> (GDMA_DRV_CAP_FLAG_1_EQ_SHARING_MULTI_VPORT | \
> - GDMA_DRV_CAP_FLAG_1_NAPI_WKDONE_FIX)
> + GDMA_DRV_CAP_FLAG_1_NAPI_WKDONE_FIX | \
> + GDMA_DRV_CAP_FLAG_1_HWC_TIMEOUT_RECONFIG)
>
> #define GDMA_DRV_CAP_FLAGS2 0
>
> @@ -664,6 +669,19 @@ struct gdma_disable_queue_req {
> u32 alloc_res_id_on_creation;
> }; /* HW DATA */
>
> +/* GDMA_QUERY_HWC_TIMEOUT */
> +struct gdma_query_hwc_timeout_req {
> + struct gdma_req_hdr hdr;
> + u32 timeout_ms;
> + u32 reserved;
> +};
> +
> +struct gdma_query_hwc_timeout_resp {
> + struct gdma_resp_hdr hdr;
> + u32 timeout_ms;
> + u32 reserved;
> +};
> +
> enum atb_page_size {
> ATB_PAGE_SIZE_4K,
> ATB_PAGE_SIZE_8K,
> diff --git a/include/net/mana/hw_channel.h b/include/net/mana/hw_channel.h
> index 6a757a6e2732..3d3b5c881bc1 100644
> --- a/include/net/mana/hw_channel.h
> +++ b/include/net/mana/hw_channel.h
> @@ -23,6 +23,10 @@
> #define HWC_INIT_DATA_PF_DEST_RQ_ID 10
> #define HWC_INIT_DATA_PF_DEST_CQ_ID 11
>
> +#define HWC_DATA_CFG_HWC_TIMEOUT 1
> +
> +#define HW_CHANNEL_WAIT_RESOURCE_TIMEOUT_MS 30000
> +
> /* Structures labeled with "HW DATA" are exchanged with the hardware. All of
> * them are naturally aligned and hence don't need __packed.
> */
> @@ -182,6 +186,7 @@ struct hw_channel_context {
>
> u32 pf_dest_vrq_id;
> u32 pf_dest_vrcq_id;
> + u32 hwc_timeout;
>
> struct hwc_caller_ctx *caller_ctx;
> };
> --
> 2.34.1
>
^ permalink raw reply
* [PATCH net] net: mana: Configure hwc timeout from hardware
From: Souradeep Chakrabarti @ 2023-07-05 9:32 UTC (permalink / raw)
To: kys, haiyangz, wei.liu, decui, davem, edumazet, kuba, pabeni,
longli, sharmaajay, leon, cai.huoqing, ssengar, vkuznets, tglx,
linux-hyperv, netdev, linux-kernel, linux-rdma
Cc: stable, schakrabarti, Souradeep Chakrabarti
At present hwc timeout value is a fixed value.
This patch sets the hwc timeout from the hardware.
Signed-off-by: Souradeep Chakrabarti <schakrabarti@linux.microsoft.com>
---
.../net/ethernet/microsoft/mana/gdma_main.c | 27 +++++++++++++++++++
.../net/ethernet/microsoft/mana/hw_channel.c | 25 ++++++++++++++++-
include/net/mana/gdma.h | 20 +++++++++++++-
include/net/mana/hw_channel.h | 5 ++++
4 files changed, 75 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/microsoft/mana/gdma_main.c b/drivers/net/ethernet/microsoft/mana/gdma_main.c
index 8f3f78b68592..5d30347e0137 100644
--- a/drivers/net/ethernet/microsoft/mana/gdma_main.c
+++ b/drivers/net/ethernet/microsoft/mana/gdma_main.c
@@ -106,6 +106,30 @@ static int mana_gd_query_max_resources(struct pci_dev *pdev)
return 0;
}
+static int mana_gd_query_hwc_timeout(struct pci_dev *pdev, u32 *timeout_val)
+{
+ struct gdma_context *gc = pci_get_drvdata(pdev);
+ struct gdma_query_hwc_timeout_req req = {};
+ struct gdma_query_hwc_timeout_resp resp = {};
+ int err;
+
+ mana_gd_init_req_hdr(&req.hdr, GDMA_QUERY_HWC_TIMEOUT,
+ sizeof(req), sizeof(resp));
+ req.timeout_ms = *timeout_val;
+ err = mana_gd_send_request(gc, sizeof(req), &req, sizeof(resp), &resp);
+ if (err || resp.hdr.status) {
+ dev_err(gc->dev, "Failed to query timeout: %d, 0x%x\n", err,
+ resp.hdr.status);
+ return err ? err : -EPROTO;
+ }
+
+ *timeout_val = resp.timeout_ms;
+ dev_info(gc->dev, "Successfully changed the timeout value %u\n",
+ *timeout_val);
+
+ return 0;
+}
+
static int mana_gd_detect_devices(struct pci_dev *pdev)
{
struct gdma_context *gc = pci_get_drvdata(pdev);
@@ -879,6 +903,7 @@ int mana_gd_verify_vf_version(struct pci_dev *pdev)
struct gdma_context *gc = pci_get_drvdata(pdev);
struct gdma_verify_ver_resp resp = {};
struct gdma_verify_ver_req req = {};
+ struct hw_channel_context *hwc = gc->hwc.driver_data;
int err;
mana_gd_init_req_hdr(&req.hdr, GDMA_VERIFY_VF_DRIVER_VERSION,
@@ -907,6 +932,8 @@ int mana_gd_verify_vf_version(struct pci_dev *pdev)
err, resp.hdr.status);
return err ? err : -EPROTO;
}
+ if (resp.pf_cap_flags1 & GDMA_DRV_CAP_FLAG_1_HWC_TIMEOUT_RECONFIG)
+ mana_gd_query_hwc_timeout(pdev, &hwc->hwc_timeout);
return 0;
}
diff --git a/drivers/net/ethernet/microsoft/mana/hw_channel.c b/drivers/net/ethernet/microsoft/mana/hw_channel.c
index 9d1507eba5b9..f5980c26fd09 100644
--- a/drivers/net/ethernet/microsoft/mana/hw_channel.c
+++ b/drivers/net/ethernet/microsoft/mana/hw_channel.c
@@ -174,7 +174,25 @@ static void mana_hwc_init_event_handler(void *ctx, struct gdma_queue *q_self,
complete(&hwc->hwc_init_eqe_comp);
break;
+ case GDMA_EQE_HWC_SOC_RECONFIG_DATA:
+ type_data.as_uint32 = event->details[0];
+ type = type_data.type;
+ val = type_data.value;
+
+ switch (type) {
+ case HWC_DATA_CFG_HWC_TIMEOUT:
+ hwc->hwc_timeout = val;
+ break;
+
+ default:
+ dev_warn(hwc->dev, "Received unknown reconfig type %u\n", type);
+ break;
+ }
+
+ break;
+
default:
+ dev_warn(hwc->dev, "Received unknown gdma event %u\n", event->type);
/* Ignore unknown events, which should never happen. */
break;
}
@@ -704,6 +722,7 @@ int mana_hwc_create_channel(struct gdma_context *gc)
gd->pdid = INVALID_PDID;
gd->doorbell = INVALID_DOORBELL;
+ hwc->hwc_timeout = HW_CHANNEL_WAIT_RESOURCE_TIMEOUT_MS;
/* mana_hwc_init_queues() only creates the required data structures,
* and doesn't touch the HWC device.
*/
@@ -770,6 +789,8 @@ void mana_hwc_destroy_channel(struct gdma_context *gc)
hwc->gdma_dev->doorbell = INVALID_DOORBELL;
hwc->gdma_dev->pdid = INVALID_PDID;
+ hwc->hwc_timeout = 0;
+
kfree(hwc);
gc->hwc.driver_data = NULL;
gc->hwc.gdma_context = NULL;
@@ -818,6 +839,7 @@ int mana_hwc_send_request(struct hw_channel_context *hwc, u32 req_len,
dest_vrq = hwc->pf_dest_vrq_id;
dest_vrcq = hwc->pf_dest_vrcq_id;
}
+ dev_err(hwc->dev, "HWC: timeout %u ms\n", hwc->hwc_timeout);
err = mana_hwc_post_tx_wqe(txq, tx_wr, dest_vrq, dest_vrcq, false);
if (err) {
@@ -825,7 +847,8 @@ int mana_hwc_send_request(struct hw_channel_context *hwc, u32 req_len,
goto out;
}
- if (!wait_for_completion_timeout(&ctx->comp_event, 30 * HZ)) {
+ if (!wait_for_completion_timeout(&ctx->comp_event,
+ (hwc->hwc_timeout / 1000) * HZ)) {
dev_err(hwc->dev, "HWC: Request timed out!\n");
err = -ETIMEDOUT;
goto out;
diff --git a/include/net/mana/gdma.h b/include/net/mana/gdma.h
index 96c120160f15..88b6ef7ce1a6 100644
--- a/include/net/mana/gdma.h
+++ b/include/net/mana/gdma.h
@@ -33,6 +33,7 @@ enum gdma_request_type {
GDMA_DESTROY_PD = 30,
GDMA_CREATE_MR = 31,
GDMA_DESTROY_MR = 32,
+ GDMA_QUERY_HWC_TIMEOUT = 84, /* 0x54 */
};
#define GDMA_RESOURCE_DOORBELL_PAGE 27
@@ -57,6 +58,8 @@ enum gdma_eqe_type {
GDMA_EQE_HWC_INIT_EQ_ID_DB = 129,
GDMA_EQE_HWC_INIT_DATA = 130,
GDMA_EQE_HWC_INIT_DONE = 131,
+ GDMA_EQE_HWC_SOC_RECONFIG = 132,
+ GDMA_EQE_HWC_SOC_RECONFIG_DATA = 133,
};
enum {
@@ -531,10 +534,12 @@ enum {
* so the driver is able to reliably support features like busy_poll.
*/
#define GDMA_DRV_CAP_FLAG_1_NAPI_WKDONE_FIX BIT(2)
+#define GDMA_DRV_CAP_FLAG_1_HWC_TIMEOUT_RECONFIG BIT(3)
#define GDMA_DRV_CAP_FLAGS1 \
(GDMA_DRV_CAP_FLAG_1_EQ_SHARING_MULTI_VPORT | \
- GDMA_DRV_CAP_FLAG_1_NAPI_WKDONE_FIX)
+ GDMA_DRV_CAP_FLAG_1_NAPI_WKDONE_FIX | \
+ GDMA_DRV_CAP_FLAG_1_HWC_TIMEOUT_RECONFIG)
#define GDMA_DRV_CAP_FLAGS2 0
@@ -664,6 +669,19 @@ struct gdma_disable_queue_req {
u32 alloc_res_id_on_creation;
}; /* HW DATA */
+/* GDMA_QUERY_HWC_TIMEOUT */
+struct gdma_query_hwc_timeout_req {
+ struct gdma_req_hdr hdr;
+ u32 timeout_ms;
+ u32 reserved;
+};
+
+struct gdma_query_hwc_timeout_resp {
+ struct gdma_resp_hdr hdr;
+ u32 timeout_ms;
+ u32 reserved;
+};
+
enum atb_page_size {
ATB_PAGE_SIZE_4K,
ATB_PAGE_SIZE_8K,
diff --git a/include/net/mana/hw_channel.h b/include/net/mana/hw_channel.h
index 6a757a6e2732..3d3b5c881bc1 100644
--- a/include/net/mana/hw_channel.h
+++ b/include/net/mana/hw_channel.h
@@ -23,6 +23,10 @@
#define HWC_INIT_DATA_PF_DEST_RQ_ID 10
#define HWC_INIT_DATA_PF_DEST_CQ_ID 11
+#define HWC_DATA_CFG_HWC_TIMEOUT 1
+
+#define HW_CHANNEL_WAIT_RESOURCE_TIMEOUT_MS 30000
+
/* Structures labeled with "HW DATA" are exchanged with the hardware. All of
* them are naturally aligned and hence don't need __packed.
*/
@@ -182,6 +186,7 @@ struct hw_channel_context {
u32 pf_dest_vrq_id;
u32 pf_dest_vrcq_id;
+ u32 hwc_timeout;
struct hwc_caller_ctx *caller_ctx;
};
--
2.34.1
^ permalink raw reply related
* Re: [PATCH 07/12] arch/x86: Declare edid_info in <asm/screen_info.h>
From: Thomas Zimmermann @ 2023-07-05 8:18 UTC (permalink / raw)
To: Arnd Bergmann, Helge Deller, Daniel Vetter, Dave Airlie
Cc: linux-hyperv, linux-efi, linux-ia64, linux-sh, Peter Zijlstra,
Dave Hansen, linux-fbdev, dri-devel, linux-mips, H. Peter Anvin,
sparclinux, linux-riscv, Ard Biesheuvel, Linux-Arch,
linux-hexagon, linux-staging, linux-csky@vger.kernel.org,
Ingo Molnar, Sami Tolvanen, Kees Cook, Paul E. McKenney,
Frederic Weisbecker, Nicholas Piggin, Borislav Petkov, loongarch,
Thomas Gleixner, linux-arm-kernel, x86, linux-kernel,
Juerg Haefliger, linux-alpha, Andrew Morton, linuxppc-dev
In-Reply-To: <dd5aa01e-afad-48d2-bf4c-4a58b74f1644@app.fastmail.com>
[-- Attachment #1.1: Type: text/plain, Size: 4223 bytes --]
Hi Arnd
Am 30.06.23 um 13:53 schrieb Arnd Bergmann:
> On Fri, Jun 30, 2023, at 09:46, Thomas Zimmermann wrote:
>> Am 29.06.23 um 15:21 schrieb Arnd Bergmann:
>>> On Thu, Jun 29, 2023, at 15:01, Thomas Zimmermann wrote:
>>>> Am 29.06.23 um 14:35 schrieb Arnd Bergmann:
>>>>> On Thu, Jun 29, 2023, at 13:45, Thomas Zimmermann wrote:
>>
>>>>
>>>> FIRMWARE_EDID is a user-selectable feature, while ARCH_HAS_EDID_INFO
>>>> announces an architecture feature. They do different things.
>>>
>>> I still have trouble seeing the difference.
>>
>> The idea here is that ARCH_HAS_ signals the architecture's support for
>> the feature. Drivers set 'depends on' in their Kconfig.
>>
>> Another Kconfig token, VIDEO_SCREEN_INFO or FIRMWARE_EDID, would then
>> actually enable the feature. Drivers select VIDEO_SCREEN_INFO or
>> FIRMWARE_EDID and the architectures contains code like
>
> Fair enough. In that case, I guess FIRMWARE_EDID will just depend on
> ARCH_HAS_EDID_INFO, or possibly "depends on FIRMWARE_EDID || EFI"
> after it starts calling into an EFI specific function, right?
>
>> #ifdef VIDEO_SCREEN_INFO
>> struct screen_info screen_info = {
>> /* set values here */
>> }
>> #endif
>>
>> This allows us to disable code that requires screen_info/edid_info, but
>> also disable screen_info/edid_info unless such code has been enabled in
>> the kernel config.
>>
>> Some architectures currently mimic this by guarding screen_info with
>> ifdef CONFIG_VT or similar. I'd like to make this more flexible. The
>> cost of a few more internal Kconfig tokens seems negligible.
>
> I definitely get it for the screen_info, which needs the complexity.
> For ARCHARCH_HAS_EDID_INFO I would hope that it's never selected by
> anything other than x86, so I would still go with just a dependency
> on x86 for simplicity, but I don't mind having the extra symbol if that
> keeps it more consistent with how the screen_info is handled.
Well, I'd like to add edid_info to platforms with EFI. What would be
arm/arm64 and loongarch, I guess. See below for the future plans.
>
>>> I suppose you could use FIRMWARE_EDID on EFI or OF systems without
>>> the need for a global edid_info structure, but that would not
>>> share any code with the current fb_firmware_edid() function.
>>
>> The current code is build on top of screen_info and edid_info. I'd
>> preferably not replace that, if possible.
>
> One way I could imagine this looking in the end would be
> something like
>
> struct screen_info *fb_screen_info(struct device *dev)
> {
> struct screen_info *si = NULL;
>
> if (IS_ENABLED(CONFIG_EFI))
> si = efi_get_screen_info(dev);
>
> if (IS_ENABLED(CONFIG_ARCH_HAS_SCREEN_INFO) && !si)
> si = screen_info;
>
> return si;
> }
>
> corresponding to fb_firmware_edid(). With this, any driver
> that wants to access screen_info would call this function
> instead of using the global pointer, plus either NULL pointer
> check or a CONFIG_ARCH_HAS_SCREEN_INFO dependency.
>
> This way we could completely eliminate the global screen_info
> on arm64, riscv, and loongarch but still use the efi and
> hyperv framebuffer/drm drivers.
If possible, I'd like to remove global screen_info and edid_info
entirely from fbdev and the various consoles.
We currently use screen_info to set up the generic framebuffer device in
drivers/firmware/sysfb.c. I'd like to use edid_info here as well, so
that the generic graphics drivers can get EDID information.
For the few fbdev drivers and consoles that require the global
screen_info/edid_info, I'd rather provide lookup functions in sysfb
(e.g., sysfb_get_screen_info(), sysfb_get_edid_info()). The global
screen_info/edid_info state would then become an internal artifact of
the sysfb code.
Hopefully that explains some of the decisions made in this patchset.
Best regards
Thomas
>
> Arnd
--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Frankenstrasse 146, 90461 Nuernberg, Germany
GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman
HRB 36809 (AG Nuernberg)
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 840 bytes --]
^ permalink raw reply
* [PATCH V5 net] net: mana: Fix MANA VF unload when hardware is unresponsive
From: Souradeep Chakrabarti @ 2023-07-05 8:16 UTC (permalink / raw)
To: kys, haiyangz, wei.liu, decui, davem, edumazet, kuba, pabeni,
longli, sharmaajay, leon, cai.huoqing, ssengar, vkuznets, tglx,
linux-hyperv, netdev, linux-kernel, linux-rdma
Cc: stable, schakrabarti, Souradeep Chakrabarti
When unloading the MANA driver, mana_dealloc_queues() waits for the MANA
hardware to complete any inflight packets and set the pending send count
to zero. But if the hardware has failed, mana_dealloc_queues()
could wait forever.
Fix this by adding a timeout to the wait. Set the timeout to 120 seconds,
which is a somewhat arbitrary value that is more than long enough for
functional hardware to complete any sends.
Fixes: ca9c54d2d6a5 ("net: mana: Add a driver for Microsoft Azure Network Adapter (MANA)")
---
V4 -> V5:
* Added fixes tag
* Changed the usleep_range from static to incremental value.
* Initialized timeout in the begining.
---
Signed-off-by: Souradeep Chakrabarti <schakrabarti@linux.microsoft.com>
---
drivers/net/ethernet/microsoft/mana/mana_en.c | 30 ++++++++++++++++---
1 file changed, 26 insertions(+), 4 deletions(-)
diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
index a499e460594b..56b7074db1a2 100644
--- a/drivers/net/ethernet/microsoft/mana/mana_en.c
+++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
@@ -2345,9 +2345,13 @@ int mana_attach(struct net_device *ndev)
static int mana_dealloc_queues(struct net_device *ndev)
{
struct mana_port_context *apc = netdev_priv(ndev);
+ unsigned long timeout = jiffies + 120 * HZ;
struct gdma_dev *gd = apc->ac->gdma_dev;
struct mana_txq *txq;
+ struct sk_buff *skb;
+ struct mana_cq *cq;
int i, err;
+ u32 tsleep;
if (apc->port_is_up)
return -EINVAL;
@@ -2363,15 +2367,33 @@ static int mana_dealloc_queues(struct net_device *ndev)
* to false, but it doesn't matter since mana_start_xmit() drops any
* new packets due to apc->port_is_up being false.
*
- * Drain all the in-flight TX packets
+ * Drain all the in-flight TX packets.
+ * A timeout of 120 seconds for all the queues is used.
+ * This will break the while loop when h/w is not responding.
+ * This value of 120 has been decided here considering max
+ * number of queues.
*/
+
for (i = 0; i < apc->num_queues; i++) {
txq = &apc->tx_qp[i].txq;
-
- while (atomic_read(&txq->pending_sends) > 0)
- usleep_range(1000, 2000);
+ tsleep = 1000;
+ while (atomic_read(&txq->pending_sends) > 0 &&
+ time_before(jiffies, timeout)) {
+ usleep_range(tsleep, tsleep << 1);
+ tsleep <<= 1;
+ }
}
+ for (i = 0; i < apc->num_queues; i++) {
+ txq = &apc->tx_qp[i].txq;
+ cq = &apc->tx_qp[i].tx_cq;
+ while (atomic_read(&txq->pending_sends)) {
+ skb = skb_dequeue(&txq->pending_skbs);
+ mana_unmap_skb(skb, apc);
+ dev_consume_skb_any(skb);
+ atomic_sub(1, &txq->pending_sends);
+ }
+ }
/* We're 100% sure the queues can no longer be woken up, because
* we're sure now mana_poll_tx_cq() can't be running.
*/
--
2.34.1
^ permalink raw reply related
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox