All of lore.kernel.org
 help / color / mirror / Atom feed
From: Heiko Carstens <heiko.carstens@de.ibm.com>
To: Michal Hocko <mhocko@kernel.org>,
	Vladimir Davydov <vdavydov@parallels.com>
Cc: Sebastian Ott <sebott@linux.vnet.ibm.com>,
	Dan Williams <dan.j.williams@intel.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Andrew Morton <akpm@linux-foundation.org>,
	"Rafael J. Wysocki" <rjw@rjwysocki.net>
Subject: Re: [PATCH] mm, add_memory_resource: hold device_hotplug lock over mem_hotplug_{begin, done}
Date: Tue, 28 Feb 2017 12:57:29 +0100	[thread overview]
Message-ID: <20170228115729.GB13872@osiris> (raw)
In-Reply-To: <20170227162031.GA27937@dhcp22.suse.cz>

On Mon, Feb 27, 2017 at 05:20:31PM +0100, Michal Hocko wrote:
> [CC Rafael]
> 
> I've got lost in the acpi indirection (again). I can see
> acpi_device_hotplug calling lock_device_hotplug() but i cannot find a
> path down to add_memory() which might call add_memory_resource. But the
> patch below sounds suspicious to me. Is it possible that this could lead
> to a deadlock. I would suspect that it is the s390 code which needs to
> do the locking. But I would have to double check - it is really easy to
> get lost there.

To me it rather looks like bfc8c90139eb ("mem-hotplug: implement
get/put_online_mems") introduced quite subtle and probably wrong locking
rules.

The patch introduced mem_hotplug_begin() in order to have something like
cpu_hotplug_begin() for memory. Note that for cpu hotplug all
cpu_hotplug_begin() calls are serialized by cpu_maps_update_begin().

Especially this makes sure that active_writer can only be changed by one
process. (See also Dan's commit which introduced the lock_device_hotplug()
calls: https://marc.info/?l=linux-kernel&m=148693912419972&w=2 )

If you look at the above commit bfc8c90139eb: there is nothing like
cpu_maps_update_begin() for memory. And therefore it's possible to have
concurrent writers to active_writer.

It looks like now lock_device_hotplug() is supposed to be the new
cpu_maps_update_begin() for memory. But.. this looks like a mess, unless I
read the code completely wrong ;)

> On Sun 26-02-17 12:42:44, Sebastian Ott wrote:
> > With 4.10.0-10265-gc4f3f22 the following warning is triggered on s390:
> > 
> > WARNING: CPU: 6 PID: 1 at drivers/base/core.c:643 assert_held_device_hotplug+0x4a/0x58
> > [    5.731214] Call Trace:
> > [    5.731219] ([<000000000067b8b0>] assert_held_device_hotplug+0x40/0x58)
> > [    5.731225]  [<0000000000337914>] mem_hotplug_begin+0x34/0xc8
> > [    5.731231]  [<00000000008b897e>] add_memory_resource+0x7e/0x1f8
> > [    5.731236]  [<00000000008b8bd2>] add_memory+0xda/0x130
> > [    5.731243]  [<0000000000d7f0dc>] add_memory_merged+0x15c/0x178
> > [    5.731247]  [<0000000000d7f3a6>] sclp_detect_standby_memory+0x2ae/0x2f8
> > [    5.731252]  [<00000000001002ba>] do_one_initcall+0xa2/0x150
> > [    5.731258]  [<0000000000d3adc0>] kernel_init_freeable+0x228/0x2d8
> > [    5.731263]  [<00000000008b6572>] kernel_init+0x2a/0x140
> > [    5.731267]  [<00000000008c3972>] kernel_thread_starter+0x6/0xc
> > [    5.731272]  [<00000000008c396c>] kernel_thread_starter+0x0/0xc
> > [    5.731276] no locks held by swapper/0/1.
> > [    5.731280] Last Breaking-Event-Address:
> > [    5.731285]  [<000000000067b8b6>] assert_held_device_hotplug+0x46/0x58
> > [    5.731292] ---[ end trace 46480df21194c96a ]---
> 
> such an informtion belongs to the changelog
> 
> > ----->8
> > mm, add_memory_resource: hold device_hotplug lock over mem_hotplug_{begin, done}
> > 
> > With commit 3fc219241 ("mm: validate device_hotplug is held for memory hotplug")
> > a lock assertion was added to mem_hotplug_begin() which led to a warning
> > when add_memory() is called. Fix this by acquiring device_hotplug_lock in
> > add_memory_resource().
> > 
> > Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
> > ---
> >  mm/memory_hotplug.c | 2 ++
> >  1 file changed, 2 insertions(+)
> > 
> > diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> > index 1d3ed58..c633bbc 100644
> > --- a/mm/memory_hotplug.c
> > +++ b/mm/memory_hotplug.c
> > @@ -1361,6 +1361,7 @@ int __ref add_memory_resource(int nid, struct resource *res, bool online)
> >  		new_pgdat = !p;
> >  	}
> >  
> > +	lock_device_hotplug();
> >  	mem_hotplug_begin();
> >  
> >  	/*
> > @@ -1416,6 +1417,7 @@ int __ref add_memory_resource(int nid, struct resource *res, bool online)
> >  
> >  out:
> >  	mem_hotplug_done();
> > +	unlock_device_hotplug();
> >  	return ret;
> >  }
> >  EXPORT_SYMBOL_GPL(add_memory_resource);
> > -- 
> > 2.3.0
> > 
> > --
> > To unsubscribe, send a message with 'unsubscribe linux-mm' in
> > the body to majordomo@kvack.org.  For more info on Linux MM,
> > see: http://www.linux-mm.org/ .
> > Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
> 
> -- 
> Michal Hocko
> SUSE Labs
> 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

WARNING: multiple messages have this Message-ID (diff)
From: Heiko Carstens <heiko.carstens@de.ibm.com>
To: Michal Hocko <mhocko@kernel.org>,
	Vladimir Davydov <vdavydov@parallels.com>
Cc: Sebastian Ott <sebott@linux.vnet.ibm.com>,
	Dan Williams <dan.j.williams@intel.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Andrew Morton <akpm@linux-foundation.org>,
	"Rafael J. Wysocki" <rjw@rjwysocki.net>
Subject: Re: [PATCH] mm, add_memory_resource: hold device_hotplug lock over mem_hotplug_{begin, done}
Date: Tue, 28 Feb 2017 12:57:29 +0100	[thread overview]
Message-ID: <20170228115729.GB13872@osiris> (raw)
In-Reply-To: <20170227162031.GA27937@dhcp22.suse.cz>

On Mon, Feb 27, 2017 at 05:20:31PM +0100, Michal Hocko wrote:
> [CC Rafael]
> 
> I've got lost in the acpi indirection (again). I can see
> acpi_device_hotplug calling lock_device_hotplug() but i cannot find a
> path down to add_memory() which might call add_memory_resource. But the
> patch below sounds suspicious to me. Is it possible that this could lead
> to a deadlock. I would suspect that it is the s390 code which needs to
> do the locking. But I would have to double check - it is really easy to
> get lost there.

To me it rather looks like bfc8c90139eb ("mem-hotplug: implement
get/put_online_mems") introduced quite subtle and probably wrong locking
rules.

The patch introduced mem_hotplug_begin() in order to have something like
cpu_hotplug_begin() for memory. Note that for cpu hotplug all
cpu_hotplug_begin() calls are serialized by cpu_maps_update_begin().

Especially this makes sure that active_writer can only be changed by one
process. (See also Dan's commit which introduced the lock_device_hotplug()
calls: https://marc.info/?l=linux-kernel&m=148693912419972&w=2 )

If you look at the above commit bfc8c90139eb: there is nothing like
cpu_maps_update_begin() for memory. And therefore it's possible to have
concurrent writers to active_writer.

It looks like now lock_device_hotplug() is supposed to be the new
cpu_maps_update_begin() for memory. But.. this looks like a mess, unless I
read the code completely wrong ;)

> On Sun 26-02-17 12:42:44, Sebastian Ott wrote:
> > With 4.10.0-10265-gc4f3f22 the following warning is triggered on s390:
> > 
> > WARNING: CPU: 6 PID: 1 at drivers/base/core.c:643 assert_held_device_hotplug+0x4a/0x58
> > [    5.731214] Call Trace:
> > [    5.731219] ([<000000000067b8b0>] assert_held_device_hotplug+0x40/0x58)
> > [    5.731225]  [<0000000000337914>] mem_hotplug_begin+0x34/0xc8
> > [    5.731231]  [<00000000008b897e>] add_memory_resource+0x7e/0x1f8
> > [    5.731236]  [<00000000008b8bd2>] add_memory+0xda/0x130
> > [    5.731243]  [<0000000000d7f0dc>] add_memory_merged+0x15c/0x178
> > [    5.731247]  [<0000000000d7f3a6>] sclp_detect_standby_memory+0x2ae/0x2f8
> > [    5.731252]  [<00000000001002ba>] do_one_initcall+0xa2/0x150
> > [    5.731258]  [<0000000000d3adc0>] kernel_init_freeable+0x228/0x2d8
> > [    5.731263]  [<00000000008b6572>] kernel_init+0x2a/0x140
> > [    5.731267]  [<00000000008c3972>] kernel_thread_starter+0x6/0xc
> > [    5.731272]  [<00000000008c396c>] kernel_thread_starter+0x0/0xc
> > [    5.731276] no locks held by swapper/0/1.
> > [    5.731280] Last Breaking-Event-Address:
> > [    5.731285]  [<000000000067b8b6>] assert_held_device_hotplug+0x46/0x58
> > [    5.731292] ---[ end trace 46480df21194c96a ]---
> 
> such an informtion belongs to the changelog
> 
> > ----->8
> > mm, add_memory_resource: hold device_hotplug lock over mem_hotplug_{begin, done}
> > 
> > With commit 3fc219241 ("mm: validate device_hotplug is held for memory hotplug")
> > a lock assertion was added to mem_hotplug_begin() which led to a warning
> > when add_memory() is called. Fix this by acquiring device_hotplug_lock in
> > add_memory_resource().
> > 
> > Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
> > ---
> >  mm/memory_hotplug.c | 2 ++
> >  1 file changed, 2 insertions(+)
> > 
> > diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> > index 1d3ed58..c633bbc 100644
> > --- a/mm/memory_hotplug.c
> > +++ b/mm/memory_hotplug.c
> > @@ -1361,6 +1361,7 @@ int __ref add_memory_resource(int nid, struct resource *res, bool online)
> >  		new_pgdat = !p;
> >  	}
> >  
> > +	lock_device_hotplug();
> >  	mem_hotplug_begin();
> >  
> >  	/*
> > @@ -1416,6 +1417,7 @@ int __ref add_memory_resource(int nid, struct resource *res, bool online)
> >  
> >  out:
> >  	mem_hotplug_done();
> > +	unlock_device_hotplug();
> >  	return ret;
> >  }
> >  EXPORT_SYMBOL_GPL(add_memory_resource);
> > -- 
> > 2.3.0
> > 
> > --
> > To unsubscribe, send a message with 'unsubscribe linux-mm' in
> > the body to majordomo@kvack.org.  For more info on Linux MM,
> > see: http://www.linux-mm.org/ .
> > Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
> 
> -- 
> Michal Hocko
> SUSE Labs
> 

  reply	other threads:[~2017-02-28 11:57 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-02-26 11:42 [PATCH] mm, add_memory_resource: hold device_hotplug lock over mem_hotplug_{begin, done} Sebastian Ott
2017-02-26 11:42 ` Sebastian Ott
2017-02-27 16:20 ` Michal Hocko
2017-02-27 16:20   ` Michal Hocko
2017-02-28 11:57   ` Heiko Carstens [this message]
2017-02-28 11:57     ` Heiko Carstens
2017-03-01 12:51     ` Heiko Carstens
2017-03-01 12:51       ` Heiko Carstens
2017-03-01 15:52       ` Dan Williams
2017-03-01 15:52         ` Dan Williams
2017-03-01 17:04         ` Heiko Carstens
2017-03-01 17:04           ` Heiko Carstens
2017-03-01 22:55           ` Dan Williams
2017-03-01 22:55             ` Dan Williams
2017-03-06  8:22             ` Heiko Carstens
2017-03-06  8:22               ` Heiko Carstens
2017-03-09  6:26               ` Dan Williams
2017-03-09  6:26                 ` Dan Williams

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170228115729.GB13872@osiris \
    --to=heiko.carstens@de.ibm.com \
    --cc=akpm@linux-foundation.org \
    --cc=dan.j.williams@intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@kernel.org \
    --cc=rjw@rjwysocki.net \
    --cc=sebott@linux.vnet.ibm.com \
    --cc=vdavydov@parallels.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.