From: Jason Gunthorpe <jgg@ziepe.ca>
To: Michal Kalderon <michal.kalderon@marvell.com>,
	Kamal Heib <kamalheib1@gmail.com>
Cc: ariel.elior@marvell.com, dledford@redhat.com,
	galpress@amazon.com, linux-rdma@vger.kernel.org,
	davem@davemloft.net, netdev@vger.kernel.org
Subject: Re: [PATCH v6 rdma-next 1/6] RDMA/core: Create mmap database and cookie helper functions
Date: Thu, 25 Jul 2019 14:55:40 -0300
Message-ID: <20190725175540.GA18757@ziepe.ca>
In-Reply-To: <20190709141735.19193-2-michal.kalderon@marvell.com>

On Tue, Jul 09, 2019 at 05:17:30PM +0300, Michal Kalderon wrote:
> Create some common API's for adding entries to a xa_mmap.
> Searching for an entry and freeing one.
> 
> The code was copied from the efa driver almost as is, just renamed
> function to be generic and not efa specific.
> 
> Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
> Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
>  drivers/infiniband/core/device.c      |   1 +
>  drivers/infiniband/core/rdma_core.c   |   1 +
>  drivers/infiniband/core/uverbs_cmd.c  |   1 +
>  drivers/infiniband/core/uverbs_main.c | 135 ++++++++++++++++++++++++++++++++++
>  include/rdma/ib_verbs.h               |  46 ++++++++++++
>  5 files changed, 184 insertions(+)
> 
> diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
> index 8a6ccb936dfe..a830c2c5d691 100644
> +++ b/drivers/infiniband/core/device.c
> @@ -2521,6 +2521,7 @@ void ib_set_device_ops(struct ib_device *dev, const struct ib_device_ops *ops)
>  	SET_DEVICE_OP(dev_ops, map_mr_sg_pi);
>  	SET_DEVICE_OP(dev_ops, map_phys_fmr);
>  	SET_DEVICE_OP(dev_ops, mmap);
> +	SET_DEVICE_OP(dev_ops, mmap_free);
>  	SET_DEVICE_OP(dev_ops, modify_ah);
>  	SET_DEVICE_OP(dev_ops, modify_cq);
>  	SET_DEVICE_OP(dev_ops, modify_device);
> diff --git a/drivers/infiniband/core/rdma_core.c b/drivers/infiniband/core/rdma_core.c
> index ccf4d069c25c..1ed01b02401f 100644
> +++ b/drivers/infiniband/core/rdma_core.c
> @@ -816,6 +816,7 @@ static void ufile_destroy_ucontext(struct ib_uverbs_file *ufile,
>  
>  	rdma_restrack_del(&ucontext->res);
>  
> +	rdma_user_mmap_entries_remove_free(ucontext);
>  	ib_dev->ops.dealloc_ucontext(ucontext);
>  	kfree(ucontext);
>  
> diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
> index 7ddd0e5bc6b3..44c0600245e4 100644
> +++ b/drivers/infiniband/core/uverbs_cmd.c
> @@ -254,6 +254,7 @@ static int ib_uverbs_get_context(struct uverbs_attr_bundle *attrs)
>  
>  	mutex_init(&ucontext->per_mm_list_lock);
>  	INIT_LIST_HEAD(&ucontext->per_mm_list);
> +	xa_init(&ucontext->mmap_xa);
>  
>  	ret = get_unused_fd_flags(O_CLOEXEC);
>  	if (ret < 0)
> diff --git a/drivers/infiniband/core/uverbs_main.c b/drivers/infiniband/core/uverbs_main.c
> index 11c13c1381cf..4b909d7b97de 100644
> +++ b/drivers/infiniband/core/uverbs_main.c
> @@ -965,6 +965,141 @@ int rdma_user_mmap_io(struct ib_ucontext *ucontext, struct vm_area_struct *vma,
>  }
>  EXPORT_SYMBOL(rdma_user_mmap_io);
>  
> +static inline u64
> +rdma_user_mmap_get_key(const struct rdma_user_mmap_entry *entry)
> +{
> +	return (u64)entry->mmap_page << PAGE_SHIFT;
> +}
> +
> +/**
> + * rdma_user_mmap_entry_get() - Get an entry from the mmap_xa.
> + *
> + * @ucontext: associated user context.
> + * @key: The key received from rdma_user_mmap_entry_insert which
> + *     is provided by user as the address to map.
> + * @len: The length the user wants to map
> + *
> + * This function is called when a user tries to mmap a key it
> + * initially received from the driver. They key was created by
> + * the function rdma_user_mmap_entry_insert.
> + *
> + * Return an entry if exists or NULL if there is no match.
> + */
> +struct rdma_user_mmap_entry *
> +rdma_user_mmap_entry_get(struct ib_ucontext *ucontext, u64 key, u64 len)
> +{
> +	struct rdma_user_mmap_entry *entry;
> +	u64 mmap_page;
> +
> +	mmap_page = key >> PAGE_SHIFT;
> +	if (mmap_page > U32_MAX)
> +		return NULL;
> +
> +	entry = xa_load(&ucontext->mmap_xa, mmap_page);
> +	if (!entry || entry->length != len)
> +		return NULL;
> +
> +	ibdev_dbg(ucontext->device,
> +		  "mmap: obj[0x%p] key[%#llx] addr[%#llx] len[%#llx] removed\n",
> +		  entry->obj, key, entry->address, entry->length);
> +
> +	return entry;
> +}
> +EXPORT_SYMBOL(rdma_user_mmap_entry_get);

It is a mistake we keep making, and maybe the war is hopelessly lost
now, but functions called from a driver should not be part of the
ib_uverbs module - ideally uverbs is an optional module. They should
be in ib_core.

Maybe put this in ib_core_uverbs.c?

Kamal, you've been tackling various cleanups, maybe making ib_uverbs
unloadable again is something you'd be keen on?

> +/**
> + * rdma_user_mmap_entry_insert() - Allocate and insert an entry to the mmap_xa.
> + *
> + * @ucontext: associated user context.
> + * @obj: opaque driver object that will be stored in the entry.
> + * @address: The address that will be mmapped to the user
> + * @length: Length of the address that will be mmapped
> + * @mmap_flag: opaque driver flags related to the address (For
> + *           example could be used for cachability)
> + *
> + * This function should be called by drivers that use the rdma_user_mmap
> + * interface for handling user mmapped addresses. The database is handled in
> + * the core and helper functions are provided to insert entries into the
> + * database and extract entries when the user call mmap with the given key.
> + * The function returns a unique key that should be provided to user, the user
> + * will use the key to map the given address.
> + *
> + * Note this locking scheme cannot support removal of entries,
> + * except during ucontext destruction when the core code
> + * guarentees no concurrency.
> + *
> + * Return: unique key or RDMA_USER_MMAP_INVALID if entry was not added.
> + */
> +u64 rdma_user_mmap_entry_insert(struct ib_ucontext *ucontext, void *obj,
> +				u64 address, u64 length, u8 mmap_flag)
> +{
> +	struct rdma_user_mmap_entry *entry;
> +	u32 next_mmap_page;
> +	int err;
> +
> +	entry = kzalloc(sizeof(*entry), GFP_KERNEL);
> +	if (!entry)
> +		return RDMA_USER_MMAP_INVALID;
> +
> +	entry->obj = obj;
> +	entry->address = address;
> +	entry->length = length;
> +	entry->mmap_flag = mmap_flag;
> +
> +	xa_lock(&ucontext->mmap_xa);
> +	if (check_add_overflow(ucontext->mmap_xa_page,
> +			       (u32)(length >> PAGE_SHIFT),

Should this be a divide that rounds up?
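
To make the question concrete, here is a small userspace sketch, not
kernel code. It assumes 4 KiB pages, and SKETCH_PAGE_SHIFT,
pages_truncated() and pages_rounded_up() are names made up for the
illustration. The first helper mirrors the shift in the quoted hunk, the
second does what DIV_ROUND_UP(length, PAGE_SIZE) would do.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12	/* assumption: 4 KiB pages */
#define SKETCH_PAGE_SIZE  (UINT64_C(1) << SKETCH_PAGE_SHIFT)

/* Truncating page count, as in the quoted (u32)(length >> PAGE_SHIFT). */
static uint32_t pages_truncated(uint64_t length)
{
	return (uint32_t)(length >> SKETCH_PAGE_SHIFT);
}

/* Rounded-up page count: a partial trailing page still needs a full page. */
static uint32_t pages_rounded_up(uint64_t length)
{
	/* Guard the addition below against wrapping a 64-bit length. */
	assert(length <= UINT64_MAX - (SKETCH_PAGE_SIZE - 1));
	return (uint32_t)((length + SKETCH_PAGE_SIZE - 1) >> SKETCH_PAGE_SHIFT);
}
```

Under these assumptions a 6000 byte length counts as 1 page when
truncated and 2 pages when rounded up, and a 100 byte length counts as 0
pages when truncated. If lengths are always page aligned the two agree.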

> +			       &next_mmap_page))
> +		goto err_unlock;

I still don't like that this algorithm latches into a permanent
failure when the xa_page wraps.
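
A userspace sketch of the latch, for illustration only. struct
sketch_ctx and bump_alloc() are hypothetical stand-ins for the ucontext
cursor and the insert path, and __builtin_add_overflow() is the
GCC/Clang builtin used here in place of the kernel's
check_add_overflow().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the ucontext's mmap_xa_page cursor. */
struct sketch_ctx {
	uint32_t next_page;
};

/*
 * Reserve npages by only ever moving the cursor forward, as in the
 * quoted hunk. Returns true and stores the first page on success.
 */
static bool bump_alloc(struct sketch_ctx *ctx, uint32_t npages,
		       uint32_t *first)
{
	uint32_t next;

	/* On overflow the cursor is left untouched, so the failure repeats. */
	if (__builtin_add_overflow(ctx->next_page, npages, &next))
		return false;

	*first = ctx->next_page;
	ctx->next_page = next;
	return true;
}
```

Once next_page sits near U32_MAX, every later request of that size
fails, regardless of how much of the lower range is actually unused.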

It seems worth spending a bit more time here to tidy this. Keep using
the mmap_xa_page scheme, but instead do something like

alloc_cyclic_range():

while (true) {
   // Find first empty element in a cyclic way
   xa_page_first = mmap_xa_page;
   xa_find(xa, &xa_page_first, U32_MAX, XA_FREE_MARK);

   // Is there enough room to have the range?
   if (check_add_overflow(xa_page_first, npages, &xa_page_end)) {
      mmap_xa_page = 0;
      continue;
   }

   // See if the element before intersects
   elm = xa_find(xa, &zero, xa_page_end, 0);
   if (elm && intersects(xa_page_first, xa_page_end, elm->first, elm->last)) {
      mmap_xa_page = elm->last + 1;
      continue;
   }

   // xa_page_first -> xa_page_end should now be free
   xa_insert(xa, xa_page_first, entry);
   mmap_xa_page = xa_page_end + 1;
   return xa_page_first;
}

Approximately, please check it.
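
The same idea as a self-contained userspace sketch. This is not the
xarray API: a small fixed array stands in for mmap_xa, and struct
sketch_db, find_overlap(), release_range() and the limit field are all
invented for the illustration. It also adds a "wrapped" flag so the
search gives up after one full cycle instead of looping forever.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_ENTRIES 16

struct sketch_range { uint32_t first, last; bool used; };

struct sketch_db {
	struct sketch_range e[SKETCH_MAX_ENTRIES];
	uint32_t cursor;	/* plays the role of mmap_xa_page */
	uint32_t limit;		/* highest valid page (U32_MAX in the patch) */
};

/* Return a stored range overlapping [first, last], or NULL if none does. */
static struct sketch_range *find_overlap(struct sketch_db *db,
					 uint32_t first, uint32_t last)
{
	for (int i = 0; i < SKETCH_MAX_ENTRIES; i++)
		if (db->e[i].used && db->e[i].first <= last &&
		    first <= db->e[i].last)
			return &db->e[i];
	return NULL;
}

/* Drop the range that starts at 'first', if there is one. */
static void release_range(struct sketch_db *db, uint32_t first)
{
	for (int i = 0; i < SKETCH_MAX_ENTRIES; i++)
		if (db->e[i].used && db->e[i].first == first)
			db->e[i].used = false;
}

/* Find npages free pages starting at the cursor, wrapping around once. */
static bool alloc_cyclic_range(struct sketch_db *db, uint32_t npages,
			       uint32_t *out)
{
	struct sketch_range *slot = NULL, *hit;
	uint32_t first = db->cursor, last;
	bool wrapped = false;

	assert(npages > 0);
	for (int i = 0; i < SKETCH_MAX_ENTRIES && !slot; i++)
		if (!db->e[i].used)
			slot = &db->e[i];
	if (!slot)
		return false;

	for (;;) {
		/* Does [first, first + npages - 1] run past the limit? */
		if (first > db->limit || npages - 1 > db->limit - first) {
			if (wrapped)
				return false;	/* full cycle, no gap found */
			wrapped = true;
			first = 0;
			continue;
		}
		last = first + (npages - 1);

		hit = find_overlap(db, first, last);
		if (!hit)
			break;		/* [first, last] is free */
		if (hit->last >= db->limit) {
			if (wrapped)
				return false;
			wrapped = true;
			first = 0;
		} else {
			first = hit->last + 1;	/* skip past the collision */
		}
	}

	slot->first = first;
	slot->last = last;
	slot->used = true;
	db->cursor = (last == db->limit) ? 0 : last + 1;
	*out = first;
	return true;
}
```

With limit = 9, two 4 page requests land at pages 0 and 4, a third
fails because only pages 8-9 remain, and after release_range(&db, 0)
the same request succeeds at page 0 again. The linear scan is only
there to keep the sketch short.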

> @@ -2199,6 +2201,17 @@ struct iw_cm_conn_param;
>  
>  #define DECLARE_RDMA_OBJ_SIZE(ib_struct) size_t size_##ib_struct
>  
> +#define RDMA_USER_MMAP_FLAG_SHIFT 56
> +#define RDMA_USER_MMAP_PAGE_MASK GENMASK(EFA_MMAP_FLAG_SHIFT - 1, 0)
> +#define RDMA_USER_MMAP_INVALID U64_MAX
> +struct rdma_user_mmap_entry {
> +	void *obj;
> +	u64 address;
> +	u64 length;
> +	u32 mmap_page;
> +	u8 mmap_flag;
> +};
> +
>  /**
>   * struct ib_device_ops - InfiniBand device operations
>   * This structure defines all the InfiniBand device operations, providers will
> @@ -2311,6 +2324,19 @@ struct ib_device_ops {
>  			      struct ib_udata *udata);
>  	void (*dealloc_ucontext)(struct ib_ucontext *context);
>  	int (*mmap)(struct ib_ucontext *context, struct vm_area_struct *vma);
> +	/**
> +	 * Memory that is mapped to the user can only be freed once the
> +	 * ucontext of the application is destroyed. This is for
> +	 * security reasons where we don't want an application to have a
> +	 * mapping to phyiscal memory that is freed and allocated to
> +	 * another application. For this reason, all the entries are
> +	 * stored in ucontext and once ucontext is freed mmap_free is
> +	 * called on each of the entries. They type of the memory that

They -> the

> +	 * was mapped may differ between entries and is opaque to the
> +	 * rdma_user_mmap interface. Therefore needs to be implemented
> +	 * by the driver in mmap_free.
> +	 */
> +	void (*mmap_free)(struct rdma_user_mmap_entry *entry);
>  	void (*disassociate_ucontext)(struct ib_ucontext *ibcontext);
>  	int (*alloc_pd)(struct ib_pd *pd, struct ib_udata *udata);
>  	void (*dealloc_pd)(struct ib_pd *pd, struct ib_udata *udata);
> @@ -2709,6 +2735,11 @@ void ib_set_device_ops(struct ib_device *device,
>  #if IS_ENABLED(CONFIG_INFINIBAND_USER_ACCESS)
>  int rdma_user_mmap_io(struct ib_ucontext *ucontext, struct vm_area_struct *vma,
>  		      unsigned long pfn, unsigned long size, pgprot_t prot);
> +u64 rdma_user_mmap_entry_insert(struct ib_ucontext *ucontext, void *obj,
> +				u64 address, u64 length, u8 mmap_flag);
> +struct rdma_user_mmap_entry *
> +rdma_user_mmap_entry_get(struct ib_ucontext *ucontext, u64 key, u64 len);
> +void rdma_user_mmap_entries_remove_free(struct ib_ucontext *ucontext);

Should remove_free be in the core-priv header?

Jason
