Re: [PATCH v3] kernel/resource: Fix locking in request_free_mem_region

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Alistair Popple <apopple@nvidia.com>
To: John Hubbard <jhubbard@nvidia.com>
Cc: <akpm@linux-foundation.org>, <linux-mm@kvack.org>,
	<linux-kernel@vger.kernel.org>, <david@redhat.com>,
	<daniel.vetter@ffwll.ch>, <dan.j.williams@intel.com>,
	<gregkh@linuxfoundation.org>, <jglisse@redhat.com>,
	<bsingharora@gmail.com>,
	kernel test robot <oliver.sang@intel.com>
Subject: Re: [PATCH v3] kernel/resource: Fix locking in request_free_mem_region
Date: Tue, 30 Mar 2021 15:59:17 +1100	[thread overview]
Message-ID: <7635785.yHxt4vaozm@nvdebian> (raw)
In-Reply-To: <801cd23e-9ac0-adb6-88ba-63ad18d4664c@nvidia.com>

On Tuesday, 30 March 2021 2:42:34 PM AEDT John Hubbard wrote:
> On 3/29/21 5:38 PM, Alistair Popple wrote:
> > request_free_mem_region() is used to find an empty range of physical
> > addresses for hotplugging ZONE_DEVICE memory. It does this by iterating
> > over the range of possible addresses using region_intersects() to see if
> > the range is free.
> > 
> > region_intersects() obtains a read lock before walking the resource tree
> > to protect against concurrent changes. However it drops the lock prior
> > to returning. This means by the time request_mem_region() is called in
> > request_free_mem_region() another thread may have already reserved the
> > requested region resulting in unexpected failures and a message in the
> > kernel log from hitting this condition:
> > 
> >          /*
> >           * mm/hmm.c reserves physical addresses which then
> >           * become unavailable to other users.  Conflicts are
> >           * not expected.  Warn to aid debugging if encountered.
> >           */
> >          if (conflict->desc == IORES_DESC_DEVICE_PRIVATE_MEMORY) {
> >                  pr_warn("Unaddressable device %s %pR conflicts with %pR",
> >                          conflict->name, conflict, res);
> > 
> > To fix this create versions of region_intersects() and
> > request_mem_region() that allow the caller to take the appropriate lock
> > such that it may be held over the required calls.
> > 
> > Instead of creating another version of devm_request_mem_region() that
> > doesn't take the lock open-code it to allow the caller to pre-allocate
> > the required memory prior to taking the lock.
> > 
> > Fixes: 0c385190392d8 ("resource: add a not device managed 
request_free_mem_region variant")
> > Fixes: 0092908d16c60 ("mm: factor out a devm_request_free_mem_region 
helper")
> 
> Hi Alistair!
> 
> The above "Fixes:" tag looks wrong to me, because that commit did not create
> the broken locking that this patch fixes. Therefore, I think that particular 
line
> should be removed from the commit description.

Right, the last "Fixes:" tag is the origin of the bug but the refactoring into 
different functions and files made it non-obvious how this patch was related. 
Happy to drop these though.
 
> Another note below:
> 
> > Fixes: 4ef589dc9b10c ("mm/hmm/devmem: device memory hotplug using 
ZONE_DEVICE")
> > Signed-off-by: Alistair Popple <apopple@nvidia.com>
> > Acked-by: Balbir Singh <bsingharora@gmail.com>
> > Reported-by: kernel test robot <oliver.sang@intel.com>
> > 
> > ---
> > 
> > Hi Andrew,
> > 
> > This fixes a boot issue reported by the kernel test robot with the
> > previous version of the patch on x86 with CONFIG_IO_STRICT_DEVMEM=y.
> > This was due to the platform specific implementation of
> > devmem_is_allowed() creating a recursive lock which I missed. I notice
> > you have put v2 in mmtom so apologies for the churn but can you please
> > use this version instead? Thanks.
> > 
> >   - Alistair
> > ---
> >   kernel/resource.c | 142 ++++++++++++++++++++++++++++++----------------
> >   1 file changed, 92 insertions(+), 50 deletions(-)
> > 
> > diff --git a/kernel/resource.c b/kernel/resource.c
> > index 627e61b0c124..7061b9f903ca 100644
> > --- a/kernel/resource.c
> > +++ b/kernel/resource.c
> > @@ -523,6 +523,34 @@ int __weak page_is_ram(unsigned long pfn)
> >   }
> >   EXPORT_SYMBOL_GPL(page_is_ram);
> >   
> > +static int __region_intersects(resource_size_t start, size_t size,
> > +			       unsigned long flags, unsigned long desc)
> > +{
> > +	struct resource res;
> > +	int type = 0; int other = 0;
> > +	struct resource *p;
> > +
> > +	res.start = start;
> > +	res.end = start + size - 1;
> > +
> > +	for (p = iomem_resource.child; p ; p = p->sibling) {
> > +		bool is_type = (((p->flags & flags) == flags) &&
> > +				((desc == IORES_DESC_NONE) ||
> > +				 (desc == p->desc)));
> > +
> > +		if (resource_overlaps(p, &res))
> > +			is_type ? type++ : other++;
> > +	}
> > +
> > +	if (type == 0)
> > +		return REGION_DISJOINT;
> > +
> > +	if (other == 0)
> > +		return REGION_INTERSECTS;
> > +
> > +	return REGION_MIXED;
> > +}
> > +
> >   /**
> >    * region_intersects() - determine intersection of region with known 
resources
> >    * @start: region start address
> > @@ -546,31 +574,12 @@ EXPORT_SYMBOL_GPL(page_is_ram);
> >   int region_intersects(resource_size_t start, size_t size, unsigned long 
flags,
> >   		      unsigned long desc)
> >   {
> > -	struct resource res;
> > -	int type = 0; int other = 0;
> > -	struct resource *p;
> > -
> > -	res.start = start;
> > -	res.end = start + size - 1;
> > +	int rc;
> >   
> >   	read_lock(&resource_lock);
> > -	for (p = iomem_resource.child; p ; p = p->sibling) {
> > -		bool is_type = (((p->flags & flags) == flags) &&
> > -				((desc == IORES_DESC_NONE) ||
> > -				 (desc == p->desc)));
> > -
> > -		if (resource_overlaps(p, &res))
> > -			is_type ? type++ : other++;
> > -	}
> > +	rc = __region_intersects(start, size, flags, desc);
> >   	read_unlock(&resource_lock);
> > -
> > -	if (type == 0)
> > -		return REGION_DISJOINT;
> > -
> > -	if (other == 0)
> > -		return REGION_INTERSECTS;
> > -
> > -	return REGION_MIXED;
> > +	return rc;
> >   }
> >   EXPORT_SYMBOL_GPL(region_intersects);
> >   
> > @@ -1171,31 +1180,16 @@ struct address_space *iomem_get_mapping(void)
> >   	return smp_load_acquire(&iomem_inode)->i_mapping;
> >   }
> >   
> > -/**
> > - * __request_region - create a new busy resource region
> > - * @parent: parent resource descriptor
> > - * @start: resource start address
> > - * @n: resource region size
> > - * @name: reserving caller's ID string
> > - * @flags: IO resource flags
> > - */
> > -struct resource * __request_region(struct resource *parent,
> > -				   resource_size_t start, resource_size_t n,
> > -				   const char *name, int flags)
> > +static bool request_region_locked(struct resource *parent,
> > +				    struct resource *res, resource_size_t start,
> > +				    resource_size_t n, const char *name, int flags)
> >   {
> >   	DECLARE_WAITQUEUE(wait, current);
> > -	struct resource *res = alloc_resource(GFP_KERNEL);
> > -	struct resource *orig_parent = parent;
> > -
> > -	if (!res)
> > -		return NULL;
> >   
> >   	res->name = name;
> >   	res->start = start;
> >   	res->end = start + n - 1;
> >   
> > -	write_lock(&resource_lock);
> > -
> >   	for (;;) {
> >   		struct resource *conflict;
> >   
> > @@ -1230,14 +1224,37 @@ struct resource * __request_region(struct resource 
*parent,
> >   			write_lock(&resource_lock);
> >   			continue;
> >   		}
> > +		return false;
> > +	}
> > +
> > +	return true;
> > +}
> > +
> > +/**
> > + * __request_region - create a new busy resource region
> > + * @parent: parent resource descriptor
> > + * @start: resource start address
> > + * @n: resource region size
> > + * @name: reserving caller's ID string
> > + * @flags: IO resource flags
> > + */
> > +struct resource *__request_region(struct resource *parent,
> > +				  resource_size_t start, resource_size_t n,
> > +				  const char *name, int flags)
> > +{
> > +	struct resource *res = alloc_resource(GFP_KERNEL);
> > +
> > +	if (!res)
> > +		return NULL;
> > +
> > +	write_lock(&resource_lock);
> > +	if (!request_region_locked(parent, res, start, n, name, flags)) {
> >   		/* Uhhuh, that didn't work out.. */
> >   		free_resource(res);
> >   		res = NULL;
> > -		break;
> >   	}
> >   	write_unlock(&resource_lock);
> > -
> > -	if (res && orig_parent == &iomem_resource)
> > +	if (res && parent == &iomem_resource)
> >   		revoke_iomem(res);
> >   
> >   	return res;
> > @@ -1779,26 +1796,51 @@ static struct resource 
*__request_free_mem_region(struct device *dev,
> >   {
> >   	resource_size_t end, addr;
> >   	struct resource *res;
> > +	struct region_devres *dr = NULL;
> > +
> > +	res = alloc_resource(GFP_KERNEL);
> > +	if (!res)
> > +		return ERR_PTR(-ENOMEM);
> > +
> > +	if (dev) {
> > +		dr = devres_alloc(devm_region_release, sizeof(struct region_devres),
> > +				  GFP_KERNEL);
> > +		if (!dr) {
> > +			free_resource(res);
> > +			return ERR_PTR(-ENOMEM);
> > +		}
> > +	}
> >   
> >   	size = ALIGN(size, 1UL << PA_SECTION_SHIFT);
> >   	end = min_t(unsigned long, base->end, (1UL << MAX_PHYSMEM_BITS) - 1);
> >   	addr = end - size + 1UL;
> >   
> > +	write_lock(&resource_lock);
> >   	for (; addr > size && addr >= base->start; addr -= size) {
> > -		if (region_intersects(addr, size, 0, IORES_DESC_NONE) !=
> > +		if (__region_intersects(addr, size, 0, IORES_DESC_NONE) !=
> >   				REGION_DISJOINT)
> >   			continue;
> >   
> > -		if (dev)
> > -			res = devm_request_mem_region(dev, addr, size, name);
> > -		else
> > -			res = request_mem_region(addr, size, name);
> > -		if (!res)
> > -			return ERR_PTR(-ENOMEM);
> > +		if (!request_region_locked(&iomem_resource, res, addr,
> > +						   size, name, 0))
> > +			break;
> > +
> >   		res->desc = IORES_DESC_DEVICE_PRIVATE_MEMORY;
> > +		if (dev) {
> > +			dr->parent = &iomem_resource;
> > +			dr->start = addr;
> > +			dr->n = size;
> > +			devres_add(dev, dr);
> > +		}
> > +
> > +		write_unlock(&resource_lock);
> > +		revoke_iomem(res);
> 
> This is new, and not mentioned in the commit log, and therefore quite
> surprising. It seems like the right thing to do but it also seems like a
> different fix! I'm not saying that it should be a separate patch, but it
> does seem worth loudly mentioning in the commit log, yes?

This isn't a different fix though, it is just about maintaining the original 
behaviour which called revoke_iomem() after dropping the lock. I inadvertently 
switched this around in the initial patch such that revoke_iomem() got called 
with the lock, leading to deadlock on x86 with CONFIG_IO_STRICT_DEVMEM=y.

This does change the order of revoke_iomem() and devres_add() slightly, but as 
far as I can tell that shouldn't be an issue. Can call that out in the commit 
log.

> >   		return res;
> >   	}
> >   
> > +	write_unlock(&resource_lock);
> > +	free_resource(res);
> > +
> >   	return ERR_PTR(-ERANGE);
> >   }
> >   
> > 
> 
> thanks,
>

next prev parent reply	other threads:[~2021-03-30  4:59 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-03-30  0:38 [PATCH v3] kernel/resource: Fix locking in request_free_mem_region Alistair Popple
2021-03-30  3:42 ` John Hubbard
2021-03-30  4:59   ` Alistair Popple [this message]
2021-03-30  5:17     ` John Hubbard

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=7635785.yHxt4vaozm@nvdebian \
    --to=apopple@nvidia.com \
    --cc=akpm@linux-foundation.org \
    --cc=bsingharora@gmail.com \
    --cc=dan.j.williams@intel.com \
    --cc=daniel.vetter@ffwll.ch \
    --cc=david@redhat.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=jglisse@redhat.com \
    --cc=jhubbard@nvidia.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=oliver.sang@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.