linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Dan Williams <dan.j.williams@intel.com>
To: Jason Gunthorpe <jgg@nvidia.com>, Christoph Hellwig <hch@lst.de>
Cc: <dri-devel@lists.freedesktop.org>,
	<nouveau@lists.freedesktop.org>, <linux-cxl@vger.kernel.org>,
	Matthew Wilcox <willy@infradead.org>, <nvdimm@lists.linux.dev>,
	<linux-pci@vger.kernel.org>
Subject: RE: [PATCH v2 16/28] resource: Introduce alloc_free_mem_region()
Date: Thu, 21 Jul 2022 09:10:49 -0700	[thread overview]
Message-ID: <62d97a89d66a1_17f3e82949e@dwillia2-xfh.jf.intel.com.notmuch> (raw)
In-Reply-To: <165784333333.1758207.13703329337805274043.stgit@dwillia2-xfh.jf.intel.com>

[ add dri-devel and nouveau ]

Dan Williams wrote:
> The core of devm_request_free_mem_region() is a helper that searches for
> free space in iomem_resource and performs __request_region_locked() on
> the result of that search. The policy choices of the implementation
> conform to what CONFIG_DEVICE_PRIVATE users want which is memory that is
> immediately marked busy, and a preference to search for the first-fit
> free range in descending order from the top of the physical address
> space.
> 
> CXL has a need for a similar allocator, but with the following tweaks:
> 
> 1/ Search for free space in ascending order
> 
> 2/ Search for free space relative to a given CXL window
> 
> 3/ 'insert' rather than 'request' the new resource given downstream
>    drivers from the CXL Region driver (like the pmem or dax drivers) are
>    responsible for request_mem_region() when they activate the memory
>    range.
> 
> Rework __request_free_mem_region() into get_free_mem_region() which
> takes a set of GFR_* (Get Free Region) flags to control the allocation
> policy (ascending vs descending), and "busy" policy (insert_resource()
> vs request_region()).
> 
> As part of the consolidation of the legacy GFR_REQUEST_REGION case with
> the new default of just inserting a new resource into the free space
> some minor cleanups like not checking for NULL before calling
> devres_free() (which does its own check) is included.
> 
> Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
> Link: https://lore.kernel.org/linux-cxl/20220420143406.GY2120790@nvidia.com/
> Cc: Matthew Wilcox <willy@infradead.org>
> Cc: Christoph Hellwig <hch@lst.de>

Jason, Christoph, anyone else that depends on CONFIG_DEVICE_PRIVATE,

Just a heads up that with Jonathan's review I am going to proceed with
pushing this change to linux-next. Please holler if
CONFIG_DEVICE_PRIVATE starts misbehaving, or if you have other feedback,
and I will get it fixed up.

> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
>  include/linux/ioport.h |    2 +
>  kernel/resource.c      |  178 +++++++++++++++++++++++++++++++++++++++---------
>  mm/Kconfig             |    5 +
>  3 files changed, 150 insertions(+), 35 deletions(-)
> 
> diff --git a/include/linux/ioport.h b/include/linux/ioport.h
> index 79d1ad6d6275..616b683563a9 100644
> --- a/include/linux/ioport.h
> +++ b/include/linux/ioport.h
> @@ -330,6 +330,8 @@ struct resource *devm_request_free_mem_region(struct device *dev,
>  		struct resource *base, unsigned long size);
>  struct resource *request_free_mem_region(struct resource *base,
>  		unsigned long size, const char *name);
> +struct resource *alloc_free_mem_region(struct resource *base,
> +		unsigned long size, unsigned long align, const char *name);
>  
>  static inline void irqresource_disabled(struct resource *res, u32 irq)
>  {
> diff --git a/kernel/resource.c b/kernel/resource.c
> index 53a534db350e..4c5e80b92f2f 100644
> --- a/kernel/resource.c
> +++ b/kernel/resource.c
> @@ -489,8 +489,9 @@ int __weak page_is_ram(unsigned long pfn)
>  }
>  EXPORT_SYMBOL_GPL(page_is_ram);
>  
> -static int __region_intersects(resource_size_t start, size_t size,
> -			unsigned long flags, unsigned long desc)
> +static int __region_intersects(struct resource *parent, resource_size_t start,
> +			       size_t size, unsigned long flags,
> +			       unsigned long desc)
>  {
>  	struct resource res;
>  	int type = 0; int other = 0;
> @@ -499,7 +500,7 @@ static int __region_intersects(resource_size_t start, size_t size,
>  	res.start = start;
>  	res.end = start + size - 1;
>  
> -	for (p = iomem_resource.child; p ; p = p->sibling) {
> +	for (p = parent->child; p ; p = p->sibling) {
>  		bool is_type = (((p->flags & flags) == flags) &&
>  				((desc == IORES_DESC_NONE) ||
>  				 (desc == p->desc)));
> @@ -543,7 +544,7 @@ int region_intersects(resource_size_t start, size_t size, unsigned long flags,
>  	int ret;
>  
>  	read_lock(&resource_lock);
> -	ret = __region_intersects(start, size, flags, desc);
> +	ret = __region_intersects(&iomem_resource, start, size, flags, desc);
>  	read_unlock(&resource_lock);
>  
>  	return ret;
> @@ -1780,62 +1781,139 @@ void resource_list_free(struct list_head *head)
>  }
>  EXPORT_SYMBOL(resource_list_free);
>  
> -#ifdef CONFIG_DEVICE_PRIVATE
> -static struct resource *__request_free_mem_region(struct device *dev,
> -		struct resource *base, unsigned long size, const char *name)
> +#ifdef CONFIG_GET_FREE_REGION
> +#define GFR_DESCENDING		(1UL << 0)
> +#define GFR_REQUEST_REGION	(1UL << 1)
> +#define GFR_DEFAULT_ALIGN (1UL << PA_SECTION_SHIFT)
> +
> +static resource_size_t gfr_start(struct resource *base, resource_size_t size,
> +				 resource_size_t align, unsigned long flags)
> +{
> +	if (flags & GFR_DESCENDING) {
> +		resource_size_t end;
> +
> +		end = min_t(resource_size_t, base->end,
> +			    (1ULL << MAX_PHYSMEM_BITS) - 1);
> +		return end - size + 1;
> +	}
> +
> +	return ALIGN(base->start, align);
> +}
> +
> +static bool gfr_continue(struct resource *base, resource_size_t addr,
> +			 resource_size_t size, unsigned long flags)
> +{
> +	if (flags & GFR_DESCENDING)
> +		return addr > size && addr >= base->start;
> +	/*
> +	 * In the ascend case be careful that the last increment by
> +	 * @size did not wrap 0.
> +	 */
> +	return addr > addr - size &&
> +	       addr <= min_t(resource_size_t, base->end,
> +			     (1ULL << MAX_PHYSMEM_BITS) - 1);
> +}
> +
> +static resource_size_t gfr_next(resource_size_t addr, resource_size_t size,
> +				unsigned long flags)
> +{
> +	if (flags & GFR_DESCENDING)
> +		return addr - size;
> +	return addr + size;
> +}
> +
> +static void remove_free_mem_region(void *_res)
> +{
> +	struct resource *res = _res;
> +
> +	if (res->parent)
> +		remove_resource(res);
> +	free_resource(res);
> +}
> +
> +static struct resource *
> +get_free_mem_region(struct device *dev, struct resource *base,
> +		    resource_size_t size, const unsigned long align,
> +		    const char *name, const unsigned long desc,
> +		    const unsigned long flags)
>  {
> -	resource_size_t end, addr;
> +	resource_size_t addr;
>  	struct resource *res;
>  	struct region_devres *dr = NULL;
>  
> -	size = ALIGN(size, 1UL << PA_SECTION_SHIFT);
> -	end = min_t(unsigned long, base->end, (1UL << MAX_PHYSMEM_BITS) - 1);
> -	addr = end - size + 1UL;
> +	size = ALIGN(size, align);
>  
>  	res = alloc_resource(GFP_KERNEL);
>  	if (!res)
>  		return ERR_PTR(-ENOMEM);
>  
> -	if (dev) {
> +	if (dev && (flags & GFR_REQUEST_REGION)) {
>  		dr = devres_alloc(devm_region_release,
>  				sizeof(struct region_devres), GFP_KERNEL);
>  		if (!dr) {
>  			free_resource(res);
>  			return ERR_PTR(-ENOMEM);
>  		}
> +	} else if (dev) {
> +		if (devm_add_action_or_reset(dev, remove_free_mem_region, res))
> +			return ERR_PTR(-ENOMEM);
>  	}
>  
>  	write_lock(&resource_lock);
> -	for (; addr > size && addr >= base->start; addr -= size) {
> -		if (__region_intersects(addr, size, 0, IORES_DESC_NONE) !=
> -				REGION_DISJOINT)
> +	for (addr = gfr_start(base, size, align, flags);
> +	     gfr_continue(base, addr, size, flags);
> +	     addr = gfr_next(addr, size, flags)) {
> +		if (__region_intersects(base, addr, size, 0, IORES_DESC_NONE) !=
> +		    REGION_DISJOINT)
>  			continue;
>  
> -		if (__request_region_locked(res, &iomem_resource, addr, size,
> -						name, 0))
> -			break;
> +		if (flags & GFR_REQUEST_REGION) {
> +			if (__request_region_locked(res, &iomem_resource, addr,
> +						    size, name, 0))
> +				break;
>  
> -		if (dev) {
> -			dr->parent = &iomem_resource;
> -			dr->start = addr;
> -			dr->n = size;
> -			devres_add(dev, dr);
> -		}
> +			if (dev) {
> +				dr->parent = &iomem_resource;
> +				dr->start = addr;
> +				dr->n = size;
> +				devres_add(dev, dr);
> +			}
>  
> -		res->desc = IORES_DESC_DEVICE_PRIVATE_MEMORY;
> -		write_unlock(&resource_lock);
> +			res->desc = desc;
> +			write_unlock(&resource_lock);
> +
> +
> +			/*
> +			 * A driver is claiming this region so revoke any
> +			 * mappings.
> +			 */
> +			revoke_iomem(res);
> +		} else {
> +			res->start = addr;
> +			res->end = addr + size - 1;
> +			res->name = name;
> +			res->desc = desc;
> +			res->flags = IORESOURCE_MEM;
> +
> +			/*
> +			 * Only succeed if the resource hosts an exclusive
> +			 * range after the insert
> +			 */
> +			if (__insert_resource(base, res) || res->child)
> +				break;
> +
> +			write_unlock(&resource_lock);
> +		}
>  
> -		/*
> -		 * A driver is claiming this region so revoke any mappings.
> -		 */
> -		revoke_iomem(res);
>  		return res;
>  	}
>  	write_unlock(&resource_lock);
>  
> -	free_resource(res);
> -	if (dr)
> +	if (flags & GFR_REQUEST_REGION) {
> +		free_resource(res);
>  		devres_free(dr);
> +	} else if (dev)
> +		devm_release_action(dev, remove_free_mem_region, res);
>  
>  	return ERR_PTR(-ERANGE);
>  }
> @@ -1854,18 +1932,48 @@ static struct resource *__request_free_mem_region(struct device *dev,
>  struct resource *devm_request_free_mem_region(struct device *dev,
>  		struct resource *base, unsigned long size)
>  {
> -	return __request_free_mem_region(dev, base, size, dev_name(dev));
> +	unsigned long flags = GFR_DESCENDING | GFR_REQUEST_REGION;
> +
> +	return get_free_mem_region(dev, base, size, GFR_DEFAULT_ALIGN,
> +				   dev_name(dev),
> +				   IORES_DESC_DEVICE_PRIVATE_MEMORY, flags);
>  }
>  EXPORT_SYMBOL_GPL(devm_request_free_mem_region);
>  
>  struct resource *request_free_mem_region(struct resource *base,
>  		unsigned long size, const char *name)
>  {
> -	return __request_free_mem_region(NULL, base, size, name);
> +	unsigned long flags = GFR_DESCENDING | GFR_REQUEST_REGION;
> +
> +	return get_free_mem_region(NULL, base, size, GFR_DEFAULT_ALIGN, name,
> +				   IORES_DESC_DEVICE_PRIVATE_MEMORY, flags);
>  }
>  EXPORT_SYMBOL_GPL(request_free_mem_region);
>  
> -#endif /* CONFIG_DEVICE_PRIVATE */
> +/**
> + * alloc_free_mem_region - find a free region relative to @base
> + * @base: resource that will parent the new resource
> + * @size: size in bytes of memory to allocate from @base
> + * @align: alignment requirements for the allocation
> + * @name: resource name
> + *
> + * Buses like CXL, that can dynamically instantiate new memory regions,
> + * need a method to allocate physical address space for those regions.
> + * Allocate and insert a new resource to cover a free, unclaimed by a
> + * descendant of @base, range in the span of @base.
> + */
> +struct resource *alloc_free_mem_region(struct resource *base,
> +				       unsigned long size, unsigned long align,
> +				       const char *name)
> +{
> +	/* Default of ascending direction and insert resource */
> +	unsigned long flags = 0;
> +
> +	return get_free_mem_region(NULL, base, size, align, name,
> +				   IORES_DESC_NONE, flags);
> +}
> +EXPORT_SYMBOL_NS_GPL(alloc_free_mem_region, CXL);
> +#endif /* CONFIG_GET_FREE_REGION */
>  
>  static int __init strict_iomem(char *str)
>  {
> diff --git a/mm/Kconfig b/mm/Kconfig
> index 169e64192e48..a5b4fee2e3fd 100644
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -994,9 +994,14 @@ config HMM_MIRROR
>  	bool
>  	depends on MMU
>  
> +config GET_FREE_REGION
> +	depends on SPARSEMEM
> +	bool
> +
>  config DEVICE_PRIVATE
>  	bool "Unaddressable device memory (GPU memory, ...)"
>  	depends on ZONE_DEVICE
> +	select GET_FREE_REGION
>  
>  	help
>  	  Allows creation of struct pages to represent unaddressable device
> 



  parent reply	other threads:[~2022-07-21 16:11 UTC|newest]

Thread overview: 61+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-07-15  0:00 [PATCH v2 00/28] CXL PMEM Region Provisioning Dan Williams
2022-07-15  0:00 ` [PATCH v2 01/28] Documentation/cxl: Use a double line break between entries Dan Williams
2022-07-20 13:26   ` Jonathan Cameron
2022-07-15  0:00 ` [PATCH v2 02/28] cxl/core: Define a 'struct cxl_switch_decoder' Dan Williams
2022-07-20 15:39   ` Jonathan Cameron
2022-07-15  0:00 ` [PATCH v2 03/28] cxl/acpi: Track CXL resources in iomem_resource Dan Williams
2022-07-15  5:23   ` Greg Kroah-Hartman
2022-07-20 16:03   ` Jonathan Cameron
2022-07-15  0:01 ` [PATCH v2 04/28] cxl/core: Define a 'struct cxl_root_decoder' Dan Williams
2022-07-20 16:07   ` Jonathan Cameron
2022-07-15  0:01 ` [PATCH v2 05/28] cxl/core: Define a 'struct cxl_endpoint_decoder' Dan Williams
2022-07-20 16:11   ` Jonathan Cameron
2022-07-15  0:01 ` [PATCH v2 06/28] cxl/hdm: Enumerate allocated DPA Dan Williams
2022-07-20 16:40   ` Jonathan Cameron
2022-07-21 15:29     ` Dan Williams
2022-07-15  0:01 ` [PATCH v2 07/28] cxl/hdm: Add 'mode' attribute to decoder objects Dan Williams
2022-07-15  0:01 ` [PATCH v2 08/28] cxl/hdm: Track next decoder to allocate Dan Williams
2022-07-20 16:45   ` Jonathan Cameron
2022-07-15  0:01 ` [PATCH v2 09/28] cxl/hdm: Add support for allocating DPA to an endpoint decoder Dan Williams
2022-07-20 16:51   ` Jonathan Cameron
2022-07-15  0:01 ` [PATCH v2 10/28] cxl/port: Record dport in endpoint references Dan Williams
2022-07-20 16:53   ` Jonathan Cameron
2022-07-15  0:01 ` [PATCH v2 11/28] cxl/port: Record parent dport when adding ports Dan Williams
2022-07-15  0:01 ` [PATCH v2 12/28] cxl/port: Move 'cxl_ep' references to an xarray per port Dan Williams
2022-07-15  0:01 ` [PATCH v2 13/28] cxl/port: Move dport tracking to an xarray Dan Williams
2022-07-20 16:56   ` Jonathan Cameron
2022-07-15  0:02 ` [PATCH v2 14/28] cxl/hdm: Add sysfs attributes for interleave ways + granularity Dan Williams
2022-07-20 16:58   ` Jonathan Cameron
2022-07-15  0:02 ` [PATCH v2 15/28] cxl/mem: Enumerate port targets before adding endpoints Dan Williams
2022-07-15  0:02 ` [PATCH v2 16/28] resource: Introduce alloc_free_mem_region() Dan Williams
2022-07-20 17:00   ` Jonathan Cameron
2022-07-21 16:10   ` Dan Williams [this message]
2022-09-06 13:25   ` Rogerio Alves
2022-07-15  0:02 ` [PATCH v2 17/28] cxl/region: Add region creation support Dan Williams
2022-07-20 17:16   ` Jonathan Cameron
2022-07-15  0:02 ` [PATCH v2 18/28] cxl/region: Add a 'uuid' attribute Dan Williams
2022-07-20 17:18   ` Jonathan Cameron
2022-07-15  0:02 ` [PATCH v2 19/28] cxl/region: Add interleave geometry attributes Dan Williams
2022-07-15  0:02 ` [PATCH v2 20/28] cxl/region: Allocate HPA capacity to regions Dan Williams
2022-07-20 17:20   ` Jonathan Cameron
2022-07-15  0:02 ` [PATCH v2 21/28] cxl/region: Enable the assignment of endpoint decoders " Dan Williams
2022-07-20 17:26   ` Jonathan Cameron
2022-07-20 19:05     ` Dan Williams
2022-07-15  0:02 ` [PATCH v2 22/28] cxl/acpi: Add a host-bridge index lookup mechanism Dan Williams
2022-07-15  0:02 ` [PATCH v2 23/28] cxl/region: Attach endpoint decoders Dan Williams
2022-07-20 17:29   ` Jonathan Cameron
2022-07-15  0:02 ` [PATCH v2 24/28] cxl/region: Program target lists Dan Williams
2022-07-20 17:41   ` Jonathan Cameron
2022-07-21 16:56     ` Dan Williams
2022-07-15  0:03 ` [PATCH v2 25/28] cxl/hdm: Commit decoder state to hardware Dan Williams
2022-07-20 17:44   ` Jonathan Cameron
2022-07-15  0:03 ` [PATCH v2 26/28] cxl/region: Add region driver boiler plate Dan Williams
2022-07-15  0:03 ` [PATCH v2 27/28] cxl/pmem: Fix offline_nvdimm_bus() to offline by bridge Dan Williams
2022-07-20 17:46   ` Jonathan Cameron
2022-07-15  0:03 ` [PATCH v2 28/28] cxl/region: Introduce cxl_pmem_region objects Dan Williams
2022-07-20 18:05   ` Jonathan Cameron
2022-07-20 18:12 ` [PATCH v2 00/28] CXL PMEM Region Provisioning Jonathan Cameron
2022-07-21 18:34   ` Dan Williams
2022-07-21 14:59 ` Jonathan Cameron
2022-07-21 16:29   ` Dan Williams
2022-07-21 17:22     ` Jonathan Cameron

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=62d97a89d66a1_17f3e82949e@dwillia2-xfh.jf.intel.com.notmuch \
    --to=dan.j.williams@intel.com \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=hch@lst.de \
    --cc=jgg@nvidia.com \
    --cc=linux-cxl@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=nouveau@lists.freedesktop.org \
    --cc=nvdimm@lists.linux.dev \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).