All of lore.kernel.org
 help / color / mirror / Atom feed
From: Alistair Popple <apopple@nvidia.com>
To: Muchun Song <smuchun@gmail.com>
Cc: linux-kernel <linux-kernel@vger.kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>, <david@redhat.com>,
	<daniel.vetter@ffwll.ch>, <dan.j.williams@intel.com>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	<jhubbard@nvidia.com>, <jglisse@redhat.com>, <linux-mm@kvack.org>,
	<songmuchun@bytedance.com>
Subject: Re: [PATCH v2] kernel/resource: Fix locking in request_free_mem_region
Date: Thu, 1 Apr 2021 16:03:53 +1100	[thread overview]
Message-ID: <5357782.nCMeKBJ0AR@nvdebian> (raw)
In-Reply-To: <CAPSr9jGNQ+cVMW12CDt3xTT4HJXDkxeqCTv5Gv=y4G-iRKEi=g@mail.gmail.com>

On Thursday, 1 April 2021 3:56:05 PM AEDT Muchun Song wrote:
> External email: Use caution opening links or attachments
> 
> 
> On Fri, Mar 26, 2021 at 9:22 AM Alistair Popple <apopple@nvidia.com> wrote:
> >
> > request_free_mem_region() is used to find an empty range of physical
> > addresses for hotplugging ZONE_DEVICE memory. It does this by iterating
> > over the range of possible addresses using region_intersects() to see if
> > the range is free.
> >
> > region_intersects() obtains a read lock before walking the resource tree
> > to protect against concurrent changes. However it drops the lock prior
> > to returning. This means by the time request_mem_region() is called in
> > request_free_mem_region() another thread may have already reserved the
> > requested region resulting in unexpected failures and a message in the
> > kernel log from hitting this condition:
> >
> >         /*
> >          * mm/hmm.c reserves physical addresses which then
> >          * become unavailable to other users.  Conflicts are
> >          * not expected.  Warn to aid debugging if encountered.
> >          */
> >         if (conflict->desc == IORES_DESC_DEVICE_PRIVATE_MEMORY) {
> >                 pr_warn("Unaddressable device %s %pR conflicts with %pR",
> >                         conflict->name, conflict, res);
> >
> > To fix this create versions of region_intersects() and
> > request_mem_region() that allow the caller to take the appropriate lock
> > such that it may be held over the required calls.
> >
> > Instead of creating another version of devm_request_mem_region() that
> > doesn't take the lock open-code it to allow the caller to pre-allocate
> > the required memory prior to taking the lock.
> >
> > Fixes: 0c385190392d8 ("resource: add a not device managed 
request_free_mem_region variant")
> > Fixes: 0092908d16c60 ("mm: factor out a devm_request_free_mem_region 
helper")
> > Fixes: 4ef589dc9b10c ("mm/hmm/devmem: device memory hotplug using 
ZONE_DEVICE")
> > Signed-off-by: Alistair Popple <apopple@nvidia.com>
> 
> +cc my email (songmuchun@bytedance.com).
> 
> Hi Alistair,
> 
> Thanks for your patch. But there is a deadlock that should be fixed.
> Please see the following scenario.
> 
> __request_region
>     write_lock(&resource_lock)
>         request_region_locked
>             revoke_iomem
>                 devmem_is_allowed (arch/x86/mm/init.c)
>                     region_intersects
>                         read_lock(&resource_lock)   // deadlock

Thanks for the report and apologies for the breakage. The kernel test robot 
caught it pretty quickly - see https://lore.kernel.org/linux-mm/
20210330003842.18948-1-apopple@nvidia.com/ for an updated version that fixes 
this.

 - Alistair

> >
> > ---
> >
> > v2:
> >  - Added Fixes tag
> >
> > ---
> >  kernel/resource.c | 146 +++++++++++++++++++++++++++++-----------------
> >  1 file changed, 94 insertions(+), 52 deletions(-)
> >
> > diff --git a/kernel/resource.c b/kernel/resource.c
> > index 627e61b0c124..2d4652383dd2 100644
> > --- a/kernel/resource.c
> > +++ b/kernel/resource.c
> > @@ -523,6 +523,34 @@ int __weak page_is_ram(unsigned long pfn)
> >  }
> >  EXPORT_SYMBOL_GPL(page_is_ram);
> >
> > +static int __region_intersects(resource_size_t start, size_t size,
> > +                              unsigned long flags, unsigned long desc)
> > +{
> > +       struct resource res;
> > +       int type = 0; int other = 0;
> > +       struct resource *p;
> > +
> > +       res.start = start;
> > +       res.end = start + size - 1;
> > +
> > +       for (p = iomem_resource.child; p ; p = p->sibling) {
> > +               bool is_type = (((p->flags & flags) == flags) &&
> > +                               ((desc == IORES_DESC_NONE) ||
> > +                                (desc == p->desc)));
> > +
> > +               if (resource_overlaps(p, &res))
> > +                       is_type ? type++ : other++;
> > +       }
> > +
> > +       if (type == 0)
> > +               return REGION_DISJOINT;
> > +
> > +       if (other == 0)
> > +               return REGION_INTERSECTS;
> > +
> > +       return REGION_MIXED;
> > +}
> > +
> >  /**
> >   * region_intersects() - determine intersection of region with known 
resources
> >   * @start: region start address
> > @@ -546,31 +574,12 @@ EXPORT_SYMBOL_GPL(page_is_ram);
> >  int region_intersects(resource_size_t start, size_t size, unsigned long 
flags,
> >                       unsigned long desc)
> >  {
> > -       struct resource res;
> > -       int type = 0; int other = 0;
> > -       struct resource *p;
> > -
> > -       res.start = start;
> > -       res.end = start + size - 1;
> > +       int rc;
> >
> >         read_lock(&resource_lock);
> > -       for (p = iomem_resource.child; p ; p = p->sibling) {
> > -               bool is_type = (((p->flags & flags) == flags) &&
> > -                               ((desc == IORES_DESC_NONE) ||
> > -                                (desc == p->desc)));
> > -
> > -               if (resource_overlaps(p, &res))
> > -                       is_type ? type++ : other++;
> > -       }
> > +       rc = __region_intersects(start, size, flags, desc);
> >         read_unlock(&resource_lock);
> > -
> > -       if (type == 0)
> > -               return REGION_DISJOINT;
> > -
> > -       if (other == 0)
> > -               return REGION_INTERSECTS;
> > -
> > -       return REGION_MIXED;
> > +       return rc;
> >  }
> >  EXPORT_SYMBOL_GPL(region_intersects);
> >
> > @@ -1171,31 +1180,17 @@ struct address_space *iomem_get_mapping(void)
> >         return smp_load_acquire(&iomem_inode)->i_mapping;
> >  }
> >
> > -/**
> > - * __request_region - create a new busy resource region
> > - * @parent: parent resource descriptor
> > - * @start: resource start address
> > - * @n: resource region size
> > - * @name: reserving caller's ID string
> > - * @flags: IO resource flags
> > - */
> > -struct resource * __request_region(struct resource *parent,
> > -                                  resource_size_t start, resource_size_t 
n,
> > -                                  const char *name, int flags)
> > +static bool request_region_locked(struct resource *parent,
> > +                                   struct resource *res, resource_size_t 
start,
> > +                                   resource_size_t n, const char *name, 
int flags)
> >  {
> > -       DECLARE_WAITQUEUE(wait, current);
> > -       struct resource *res = alloc_resource(GFP_KERNEL);
> >         struct resource *orig_parent = parent;
> > -
> > -       if (!res)
> > -               return NULL;
> > +       DECLARE_WAITQUEUE(wait, current);
> >
> >         res->name = name;
> >         res->start = start;
> >         res->end = start + n - 1;
> >
> > -       write_lock(&resource_lock);
> > -
> >         for (;;) {
> >                 struct resource *conflict;
> >
> > @@ -1230,16 +1225,39 @@ struct resource * __request_region(struct resource 
*parent,
> >                         write_lock(&resource_lock);
> >                         continue;
> >                 }
> > -               /* Uhhuh, that didn't work out.. */
> > -               free_resource(res);
> > -               res = NULL;
> > -               break;
> > +               return false;
> >         }
> > -       write_unlock(&resource_lock);
> >
> >         if (res && orig_parent == &iomem_resource)
> >                 revoke_iomem(res);
> 
> What do you think of the fix below?
> 
> +       if (res && orig_parent == &iomem_resource) {
> +               write_unlock(&resource_lock);
> +               revoke_iomem(res);
> +               write_lock(&resource_lock);
> +       }
> 
> >
> > +       return true;
> > +}
> > +
> > +/**
> > + * __request_region - create a new busy resource region
> > + * @parent: parent resource descriptor
> > + * @start: resource start address
> > + * @n: resource region size
> > + * @name: reserving caller's ID string
> > + * @flags: IO resource flags
> > + */
> > +struct resource *__request_region(struct resource *parent,
> > +                                 resource_size_t start, resource_size_t 
n,
> > +                                 const char *name, int flags)
> > +{
> > +       struct resource *res = alloc_resource(GFP_KERNEL);
> > +
> > +       if (!res)
> > +               return NULL;
> > +
> > +       write_lock(&resource_lock);
> > +       if (!request_region_locked(parent, res, start, n, name, flags)) {
> > +               /* Uhhuh, that didn't work out.. */
> > +               free_resource(res);
> > +               res = NULL;
> > +       }
> > +       write_unlock(&resource_lock);
> >         return res;
> >  }
> >  EXPORT_SYMBOL(__request_region);
> > @@ -1779,26 +1797,50 @@ static struct resource 
*__request_free_mem_region(struct device *dev,
> >  {
> >         resource_size_t end, addr;
> >         struct resource *res;
> > +       struct region_devres *dr = NULL;
> > +
> > +       res = alloc_resource(GFP_KERNEL);
> > +       if (!res)
> > +               return ERR_PTR(-ENOMEM);
> > +
> > +       if (dev) {
> > +               dr = devres_alloc(devm_region_release, sizeof(struct 
region_devres),
> > +                                 GFP_KERNEL);
> > +               if (!dr) {
> > +                       free_resource(res);
> > +                       return ERR_PTR(-ENOMEM);
> > +               }
> > +       }
> >
> >         size = ALIGN(size, 1UL << PA_SECTION_SHIFT);
> >         end = min_t(unsigned long, base->end, (1UL << MAX_PHYSMEM_BITS) - 
1);
> >         addr = end - size + 1UL;
> >
> > +       write_lock(&resource_lock);
> >         for (; addr > size && addr >= base->start; addr -= size) {
> > -               if (region_intersects(addr, size, 0, IORES_DESC_NONE) !=
> > +               if (__region_intersects(addr, size, 0, IORES_DESC_NONE) !=
> >                                 REGION_DISJOINT)
> >                         continue;
> >
> > -               if (dev)
> > -                       res = devm_request_mem_region(dev, addr, size, 
name);
> > -               else
> > -                       res = request_mem_region(addr, size, name);
> > -               if (!res)
> > -                       return ERR_PTR(-ENOMEM);
> > +               if (!request_region_locked(&iomem_resource, res, addr,
> > +                                                  size, name, 0))
> > +                       break;
> > +
> >                 res->desc = IORES_DESC_DEVICE_PRIVATE_MEMORY;
> > +               if (dev) {
> > +                       dr->parent = &iomem_resource;
> > +                       dr->start = addr;
> > +                       dr->n = size;
> > +                       devres_add(dev, dr);
> > +               }
> > +
> > +               write_unlock(&resource_lock);
> >                 return res;
> >         }
> >
> > +       write_unlock(&resource_lock);
> > +       free_resource(res);
> > +
> >         return ERR_PTR(-ERANGE);
> >  }
> >
> > --
> > 2.20.1
> >






      reply	other threads:[~2021-04-01  5:04 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-03-26  1:20 [PATCH v2] kernel/resource: Fix locking in request_free_mem_region Alistair Popple
2021-03-26  5:15 ` Balbir Singh
2021-03-29  1:55   ` Alistair Popple
2021-03-29  5:39     ` Balbir Singh
2021-03-26  8:57 ` David Hildenbrand
2021-03-29  1:37   ` Alistair Popple
2021-03-29  9:27     ` David Hildenbrand
2021-03-30  9:13     ` David Hildenbrand
2021-03-31  6:19       ` Alistair Popple
2021-03-31  6:41         ` David Hildenbrand
2021-03-29  5:42 ` [kernel/resource] cf1e4e12c9: WARNING:possible_recursive_locking_detected kernel test robot
2021-03-29  5:42   ` kernel test robot
2021-03-29  7:53   ` Alistair Popple
2021-03-29  7:53     ` Alistair Popple
2021-04-01  4:56 ` [PATCH v2] kernel/resource: Fix locking in request_free_mem_region Muchun Song
2021-04-01  5:03   ` Alistair Popple [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5357782.nCMeKBJ0AR@nvdebian \
    --to=apopple@nvidia.com \
    --cc=akpm@linux-foundation.org \
    --cc=dan.j.williams@intel.com \
    --cc=daniel.vetter@ffwll.ch \
    --cc=david@redhat.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=jglisse@redhat.com \
    --cc=jhubbard@nvidia.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=smuchun@gmail.com \
    --cc=songmuchun@bytedance.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.