All of lore.kernel.org
 help / color / mirror / Atom feed
From: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
To: Sagi Grimberg
	<sagig-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>,
	Or Gerlitz <ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>,
	roland-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	amirv-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org,
	Jack Morgenstein
	<jackm-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
Subject: Re: [PATCH for-next 2/3] mlx4_core: Add helper functions to support MR re-registration
Date: Sun, 17 Aug 2014 16:16:55 +0300	[thread overview]
Message-ID: <53F0AB47.8050804@mellanox.com> (raw)
In-Reply-To: <53E206A2.6040800-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>

On 6/8/2014 1:42 PM, Sagi Grimberg wrote:
> On 7/31/2014 11:01 AM, Or Gerlitz wrote:
>> From: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>>
>> Add few helper functions to support a mechanism of getting an
>> MPT, modifying it and updating the HCA with the modified object.
>>
>> The code takes 2 paths, one for directly changing the MPT (and sometimes
>> its related MTTs) and another one which queries the MPT and updates the
>> HCA via fw command SW2HW_MPT. The first path is used in native mode; the
>> second path is slower and is used only in SRIOV.
>>
>
> Hey Matan,
>
> Couple of comments below...
>
>> Signed-off-by: Jack Morgenstein <jackm-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
>> Signed-off-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>> Signed-off-by: Or Gerlitz <ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>> ---
>>   drivers/net/ethernet/mellanox/mlx4/mlx4.h          |    2 +
>>   drivers/net/ethernet/mellanox/mlx4/mr.c            |  160
>> ++++++++++++++++++++
>>   .../net/ethernet/mellanox/mlx4/resource_tracker.c  |   26 +++-
>>   include/linux/mlx4/device.h                        |   16 ++
>>   4 files changed, 202 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4.h
>> b/drivers/net/ethernet/mellanox/mlx4/mlx4.h
>> index 1d8af73..b40d587 100644
>> --- a/drivers/net/ethernet/mellanox/mlx4/mlx4.h
>> +++ b/drivers/net/ethernet/mellanox/mlx4/mlx4.h
>> @@ -279,6 +279,8 @@ struct mlx4_icm_table {
>>   #define MLX4_MPT_FLAG_PHYSICAL        (1 <<  9)
>>   #define MLX4_MPT_FLAG_REGION        (1 <<  8)
>>
>> +#define MLX4_MPT_PD_MASK        (0x1FFFFUL)
>> +#define MLX4_MPT_PD_VF_MASK        (0xFE0000UL)
>>   #define MLX4_MPT_PD_FLAG_FAST_REG   (1 << 27)
>>   #define MLX4_MPT_PD_FLAG_RAE        (1 << 28)
>>   #define MLX4_MPT_PD_FLAG_EN_INV        (3 << 24)
>> diff --git a/drivers/net/ethernet/mellanox/mlx4/mr.c
>> b/drivers/net/ethernet/mellanox/mlx4/mr.c
>> index 2839abb..7d717ec 100644
>> --- a/drivers/net/ethernet/mellanox/mlx4/mr.c
>> +++ b/drivers/net/ethernet/mellanox/mlx4/mr.c
>> @@ -298,6 +298,131 @@ static int mlx4_HW2SW_MPT(struct mlx4_dev *dev,
>> struct mlx4_cmd_mailbox *mailbox
>>                   MLX4_CMD_TIME_CLASS_B, MLX4_CMD_WRAPPED);
>>   }
>>
>> +int mlx4_mr_hw_get_mpt(struct mlx4_dev *dev, struct mlx4_mr *mmr,
>> +               struct mlx4_mpt_entry ***mpt_entry)
>> +{
>> +    int err;
>> +    int key = key_to_hw_index(mmr->key) & (dev->caps.num_mpts - 1);
>> +    struct mlx4_cmd_mailbox *mailbox = NULL;
>> +
>> +    /* Make sure that at this point we have single-threaded access
>> only */
>
> This is not a specific comment for the below, but a general comment for
> the routine (otherwise I would expect the condition below to be
> atomic_read). Maybe it's better to document it as a requirement of the
> routine above (in kernel-doc style)?
>
>> +
>> +    if (mmr->enabled != MLX4_MPT_EN_HW)
>> +        return -EINVAL;
>> +
>> +    err = mlx4_HW2SW_MPT(dev, NULL, key);
>
> Nit: I'd lose this extra line between err=/if(err) statements.
> I see you did in some sections of this patch...
>
>> +
>> +    if (err) {
>> +        mlx4_warn(dev, "HW2SW_MPT failed (%d).", err);
>> +        mlx4_warn(dev, "Most likely the MR has MWs bound to it.\n");
>> +        return err;
>> +    }
>> +
>> +    mmr->enabled = MLX4_MPT_EN_SW;
>> +
>> +    if (!mlx4_is_mfunc(dev)) {
>> +        **mpt_entry = mlx4_table_find(
>> +                &mlx4_priv(dev)->mr_table.dmpt_table,
>> +                key, NULL);
>> +    } else {
>> +        mailbox = mlx4_alloc_cmd_mailbox(dev);
>> +        if (IS_ERR_OR_NULL(mailbox))
>> +            return PTR_ERR(mailbox);
>> +
>> +        err = mlx4_cmd_box(dev, 0, mailbox->dma, key,
>> +                   0, MLX4_CMD_QUERY_MPT,
>> +                   MLX4_CMD_TIME_CLASS_B,
>> +                   MLX4_CMD_WRAPPED);
>> +
>> +        if (err)
>> +            goto free_mailbox;
>> +
>> +        *mpt_entry = (struct mlx4_mpt_entry **)&mailbox->buf;
>> +    }
>> +
>> +    if (!(*mpt_entry) || !(**mpt_entry)) {
>> +        err = -ENOMEM;
>> +        goto free_mailbox;
>> +    }
>> +
>> +    return 0;
>> +
>> +free_mailbox:
>> +    mlx4_free_cmd_mailbox(dev, mailbox);
>> +    return err;
>> +}
>> +EXPORT_SYMBOL_GPL(mlx4_mr_hw_get_mpt);
>> +
>> +int mlx4_mr_hw_write_mpt(struct mlx4_dev *dev, struct mlx4_mr *mmr,
>> +             struct mlx4_mpt_entry **mpt_entry)
>> +{
>> +    int err;
>> +
>> +    if (!mlx4_is_mfunc(dev)) {
>> +        /* Make sure any changes to this entry are flushed */
>> +        wmb();
>> +
>> +        *(u8 *)(*mpt_entry) = MLX4_MPT_STATUS_HW;
>> +
>> +        /* Make sure the new status is written */
>> +        wmb();
>> +
>> +        err = mlx4_SYNC_TPT(dev);
>> +    } else {
>> +        int key = key_to_hw_index(mmr->key) & (dev->caps.num_mpts - 1);
>> +
>> +        struct mlx4_cmd_mailbox *mailbox =
>> +            container_of((void *)mpt_entry, struct mlx4_cmd_mailbox,
>> +                     buf);
>> +
>> +        err = mlx4_SW2HW_MPT(dev, mailbox, key);
>> +    }
>> +
>> +    mmr->pd = be32_to_cpu((*mpt_entry)->pd_flags) & MLX4_MPT_PD_MASK;
>
> Here you update the PD even if you failed while ib_core changes them
> only if it succeeded, that would causes inconsistency. I think it will
> be better to write this update only after write/put_mpt succeeded
> (meaning it might be better to move it to the caller - mlx4_ib).
>
>> +    if (!err)
>> +        mmr->enabled = MLX4_MPT_EN_HW;
>> +    return err;
>> +}
>> +EXPORT_SYMBOL_GPL(mlx4_mr_hw_write_mpt);
>> +
>> +void mlx4_mr_hw_put_mpt(struct mlx4_dev *dev,
>> +            struct mlx4_mpt_entry **mpt_entry)
>> +{
>> +    if (mlx4_is_mfunc(dev)) {
>> +        struct mlx4_cmd_mailbox *mailbox =
>> +            container_of((void *)mpt_entry, struct mlx4_cmd_mailbox,
>> +                     buf);
>> +        mlx4_free_cmd_mailbox(dev, mailbox);
>> +    }
>> +}
>> +EXPORT_SYMBOL_GPL(mlx4_mr_hw_put_mpt);
>> +
>> +int mlx4_mr_hw_change_pd(struct mlx4_dev *dev, struct mlx4_mpt_entry
>> *mpt_entry,
>> +             u32 pdn)
>> +{
>> +    u32 pd_flags = be32_to_cpu(mpt_entry->pd_flags);
>> +    /* The wrapper function will put the slave's id here */
>> +    if (mlx4_is_mfunc(dev))
>> +        pd_flags &= ~MLX4_MPT_PD_VF_MASK;
>> +    mpt_entry->pd_flags = cpu_to_be32((pd_flags &  ~MLX4_MPT_PD_MASK) |
>
> Here you protect against bad pd_flags (corrupted mpt_entry was given?)
> with & ~MLX4_MPT_PD_MASK, but you might hide the mpt_corruption by
> masking it... isn't it better to fail (internal error) in this case?
>
>> +                      (pdn & MLX4_MPT_PD_MASK)
>> +                      | MLX4_MPT_PD_FLAG_EN_INV);
>> +    return 0;
>> +}
>> +EXPORT_SYMBOL_GPL(mlx4_mr_hw_change_pd);
>> +
>> +int mlx4_mr_hw_change_access(struct mlx4_dev *dev,
>> +                 struct mlx4_mpt_entry *mpt_entry,
>> +                 u32 access)
>> +{
>> +    u32 flags = (be32_to_cpu(mpt_entry->flags) & ~MLX4_PERM_MASK) |
>> +            (access & MLX4_PERM_MASK);
>> +
>> +    mpt_entry->flags = cpu_to_be32(flags);
>
> Note that you don't update the mmr access flags no where. after you
> succeed query_mr will probably return the old permissions.
>
>> +    return 0;
>> +}
>> +EXPORT_SYMBOL_GPL(mlx4_mr_hw_change_access);
>> +
>>   static int mlx4_mr_alloc_reserved(struct mlx4_dev *dev, u32 mridx,
>> u32 pd,
>>                  u64 iova, u64 size, u32 access, int npages,
>>                  int page_shift, struct mlx4_mr *mr)
>> @@ -463,6 +588,41 @@ int mlx4_mr_free(struct mlx4_dev *dev, struct
>> mlx4_mr *mr)
>>   }
>>   EXPORT_SYMBOL_GPL(mlx4_mr_free);
>>
>> +void mlx4_mr_rereg_mem_cleanup(struct mlx4_dev *dev, struct mlx4_mr *mr)
>> +{
>> +    mlx4_mtt_cleanup(dev, &mr->mtt);
>> +}
>> +EXPORT_SYMBOL_GPL(mlx4_mr_rereg_mem_cleanup);
>> +
>> +int mlx4_mr_rereg_mem_write(struct mlx4_dev *dev, struct mlx4_mr *mr,
>> +                u64 iova, u64 size, int npages,
>> +                int page_shift, struct mlx4_mpt_entry *mpt_entry)
>> +{
>> +    int err;
>> +
>> +    mpt_entry->start       = cpu_to_be64(mr->iova);
>> +    mpt_entry->length      = cpu_to_be64(mr->size);
>
> You pass u64 iova, u64 size, but you don't use them...
> probably better to use them (and update the mmr after the change - at
> mlx4_ib)
>
>> +    mpt_entry->entity_size = cpu_to_be32(mr->mtt.page_shift);
>> +
>> +    err = mlx4_mtt_init(dev, npages, page_shift, &mr->mtt);
>> +    if (err)
>> +        return err;
>> +
>> +    if (mr->mtt.order < 0) {
>> +        mpt_entry->flags |= cpu_to_be32(MLX4_MPT_FLAG_PHYSICAL);
>> +        mpt_entry->mtt_addr = 0;
>> +    } else {
>> +        mpt_entry->mtt_addr = cpu_to_be64(mlx4_mtt_addr(dev,
>> +                          &mr->mtt));
>> +        if (mr->mtt.page_shift == 0)
>> +            mpt_entry->mtt_sz    = cpu_to_be32(1 << mr->mtt.order);
>> +    }
>> +    mr->enabled = MLX4_MPT_EN_SW;
>> +
>> +    return 0;
>> +}
>> +EXPORT_SYMBOL_GPL(mlx4_mr_rereg_mem_write);
>> +
>>   int mlx4_mr_enable(struct mlx4_dev *dev, struct mlx4_mr *mr)
>>   {
>>       struct mlx4_cmd_mailbox *mailbox;
>> diff --git a/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c
>> b/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c
>> index 0efc136..1089367 100644
>> --- a/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c
>> +++ b/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c
>> @@ -2613,12 +2613,34 @@ int mlx4_QUERY_MPT_wrapper(struct mlx4_dev
>> *dev, int slave,
>>       if (err)
>>           return err;
>>
>> -    if (mpt->com.from_state != RES_MPT_HW) {
>> +    if (mpt->com.from_state == RES_MPT_MAPPED) {
>> +        /* In order to allow rereg in SRIOV, we need to alter the MPT
>> entry. To do
>> +         * that, the VF must read the MPT. But since the MPT entry
>> memory is not
>> +         * in the VF's virtual memory space, it must use QUERY_MPT to
>> obtain the
>> +         * entry contents. To guarantee that the MPT cannot be
>> changed, the driver
>> +         * must perform HW2SW_MPT before this query and return the
>> MPT entry to HW
>> +         * ownership fofollowing the change. The change here allows
>> the VF to
>> +         * perform QUERY_MPT also when the entry is in SW ownership.
>> +         */
>> +        struct mlx4_mpt_entry *mpt_entry = mlx4_table_find(
>> +                    &mlx4_priv(dev)->mr_table.dmpt_table,
>> +                    mpt->key, NULL);
>> +
>> +        if (NULL == mpt_entry || NULL == outbox->buf) {
>> +            err = -EINVAL;
>> +            goto out;
>> +        }
>> +
>> +        memcpy(outbox->buf, mpt_entry, sizeof(*mpt_entry));
>> +
>> +        err = 0;
>> +    } else if (mpt->com.from_state == RES_MPT_HW) {
>> +        err = mlx4_DMA_wrapper(dev, slave, vhcr, inbox, outbox, cmd);
>> +    } else {
>>           err = -EBUSY;
>>           goto out;
>>       }
>>
>> -    err = mlx4_DMA_wrapper(dev, slave, vhcr, inbox, outbox, cmd);
>>
>>   out:
>>       put_res(dev, slave, id, RES_MPT);
>> diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h
>> index b12f4bb..a64c397 100644
>> --- a/include/linux/mlx4/device.h
>> +++ b/include/linux/mlx4/device.h
>> @@ -262,6 +262,7 @@ enum {
>>       MLX4_PERM_REMOTE_WRITE    = 1 << 13,
>>       MLX4_PERM_ATOMIC    = 1 << 14,
>>       MLX4_PERM_BIND_MW    = 1 << 15,
>> +    MLX4_PERM_MASK        = 0xFC00
>>   };
>>
>>   enum {
>> @@ -1243,4 +1244,19 @@ int mlx4_vf_smi_enabled(struct mlx4_dev *dev,
>> int slave, int port);
>>   int mlx4_vf_get_enable_smi_admin(struct mlx4_dev *dev, int slave,
>> int port);
>>   int mlx4_vf_set_enable_smi_admin(struct mlx4_dev *dev, int slave,
>> int port,
>>                    int enable);
>> +int mlx4_mr_hw_get_mpt(struct mlx4_dev *dev, struct mlx4_mr *mmr,
>> +               struct mlx4_mpt_entry ***mpt_entry);
>> +int mlx4_mr_hw_write_mpt(struct mlx4_dev *dev, struct mlx4_mr *mmr,
>> +             struct mlx4_mpt_entry **mpt_entry);
>> +int mlx4_mr_hw_change_pd(struct mlx4_dev *dev, struct mlx4_mpt_entry
>> *mpt_entry,
>> +             u32 pdn);
>> +int mlx4_mr_hw_change_access(struct mlx4_dev *dev,
>> +                 struct mlx4_mpt_entry *mpt_entry,
>> +                 u32 access);
>> +void mlx4_mr_hw_put_mpt(struct mlx4_dev *dev,
>> +            struct mlx4_mpt_entry **mpt_entry);
>> +void mlx4_mr_rereg_mem_cleanup(struct mlx4_dev *dev, struct mlx4_mr
>> *mr);
>> +int mlx4_mr_rereg_mem_write(struct mlx4_dev *dev, struct mlx4_mr *mr,
>> +                u64 iova, u64 size, int npages,
>> +                int page_shift, struct mlx4_mpt_entry *mpt_entry);
>>   #endif /* MLX4_DEVICE_H */
>>
>

Hi Sagi,

Thanks for the comments.
I'll look into that and fix/reply.

Thanks,
Matan
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

  parent reply	other threads:[~2014-08-17 13:16 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-07-31  8:01 [PATCH for-next 0/3] Add user MR re-registeration support Or Gerlitz
     [not found] ` <1406793690-3816-1-git-send-email-ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2014-07-31  8:01   ` [PATCH for-next 1/3] IB/core: " Or Gerlitz
2014-07-31  8:01   ` [PATCH for-next 2/3] mlx4_core: Add helper functions to support MR re-registration Or Gerlitz
     [not found]     ` <1406793690-3816-3-git-send-email-ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2014-08-06 10:42       ` Sagi Grimberg
     [not found]         ` <53E206A2.6040800-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2014-08-17 13:16           ` Matan Barak [this message]
2014-07-31  8:01   ` [PATCH for-next 3/3] IB/mlx4_ib: Add support for user " Or Gerlitz
     [not found]     ` <1406793690-3816-4-git-send-email-ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2014-08-06 10:46       ` Sagi Grimberg

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=53F0AB47.8050804@mellanox.com \
    --to=matanb-vpraknaxozvwk0htik3j/w@public.gmane.org \
    --cc=amirv-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org \
    --cc=jackm-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org \
    --cc=linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
    --cc=ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org \
    --cc=roland-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org \
    --cc=sagig-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.