Linux Documentation

Linux Documentation
 help / color / mirror / Atom feed

* Re: [PATCH 7/9] x86/pci: Pass lockdep condition to pcm_mmcfg_list iterator (v1)
From: Paul E. McKenney @ 2019-07-16 18:42 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Bjorn Helgaas, linux-kernel, Alexey Kuznetsov, Borislav Petkov,
	c0d1n61at3, David S. Miller, edumazet, Greg Kroah-Hartman,
	Hideaki YOSHIFUJI, H. Peter Anvin, Ingo Molnar, Jonathan Corbet,
	Josh Triplett, keescook, kernel-hardening, kernel-team,
	Lai Jiangshan, Len Brown, linux-acpi, linux-doc, linux-pci,
	linux-pm, Mathieu Desnoyers, neilb, netdev, Oleg Nesterov,
	Pavel Machek, peterz, Rafael J. Wysocki, Rasmus Villemoes, rcu,
	Steven Rostedt, Tejun Heo, Thomas Gleixner, will,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)
In-Reply-To: <20190716040303.GA73383@google.com>

On Tue, Jul 16, 2019 at 12:03:03AM -0400, Joel Fernandes wrote:
> On Mon, Jul 15, 2019 at 03:02:35PM -0500, Bjorn Helgaas wrote:
> > On Mon, Jul 15, 2019 at 10:37:03AM -0400, Joel Fernandes (Google) wrote:
> > > The pcm_mmcfg_list is traversed with list_for_each_entry_rcu without a
> > > reader-lock held, because the pci_mmcfg_lock is already held. Make this
> > > known to the list macro so that it fixes new lockdep warnings that
> > > trigger due to lockdep checks added to list_for_each_entry_rcu().
> > > 
> > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > 
> > Ingo takes care of most patches to this file, but FWIW,
> > 
> > Acked-by: Bjorn Helgaas <bhelgaas@google.com>
> 
> Thanks.
> 
> > I would personally prefer if you capitalized the subject to match the
> > "x86/PCI:" convention that's used fairly consistently in
> > arch/x86/pci/.
> > 
> > Also, I didn't apply this to be sure, but it looks like this might
> > make a line or two wider than 80 columns, which I would rewrap if I
> > were applying this.
> 
> Updated below is the patch with the nits corrected:

I am OK with this going either way, but it does depend on an earlier
patch.

							Thanx, Paul

> ---8<-----------------------
> 
> >From 73fab09d7e33ca2110c24215f8ed428c12625dbe Mon Sep 17 00:00:00 2001
> From: "Joel Fernandes (Google)" <joel@joelfernandes.org>
> Date: Sat, 1 Jun 2019 15:05:49 -0400
> Subject: [PATCH] x86/PCI: Pass lockdep condition to pcm_mmcfg_list iterator
>  (v1)
> 
> The pcm_mmcfg_list is traversed with list_for_each_entry_rcu without a
> reader-lock held, because the pci_mmcfg_lock is already held. Make this
> known to the list macro so that it fixes new lockdep warnings that
> trigger due to lockdep checks added to list_for_each_entry_rcu().
> 
> Acked-by: Bjorn Helgaas <bhelgaas@google.com>
> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> ---
>  arch/x86/pci/mmconfig-shared.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/pci/mmconfig-shared.c b/arch/x86/pci/mmconfig-shared.c
> index 7389db538c30..9e3250ec5a37 100644
> --- a/arch/x86/pci/mmconfig-shared.c
> +++ b/arch/x86/pci/mmconfig-shared.c
> @@ -29,6 +29,7 @@
>  static bool pci_mmcfg_running_state;
>  static bool pci_mmcfg_arch_init_failed;
>  static DEFINE_MUTEX(pci_mmcfg_lock);
> +#define pci_mmcfg_lock_held() lock_is_held(&(pci_mmcfg_lock).dep_map)
>  
>  LIST_HEAD(pci_mmcfg_list);
>  
> @@ -54,7 +55,8 @@ static void list_add_sorted(struct pci_mmcfg_region *new)
>  	struct pci_mmcfg_region *cfg;
>  
>  	/* keep list sorted by segment and starting bus number */
> -	list_for_each_entry_rcu(cfg, &pci_mmcfg_list, list) {
> +	list_for_each_entry_rcu(cfg, &pci_mmcfg_list, list,
> +				pci_mmcfg_lock_held()) {
>  		if (cfg->segment > new->segment ||
>  		    (cfg->segment == new->segment &&
>  		     cfg->start_bus >= new->start_bus)) {
> @@ -118,7 +120,8 @@ struct pci_mmcfg_region *pci_mmconfig_lookup(int segment, int bus)
>  {
>  	struct pci_mmcfg_region *cfg;
>  
> -	list_for_each_entry_rcu(cfg, &pci_mmcfg_list, list)
> +	list_for_each_entry_rcu(cfg, &pci_mmcfg_list, list
> +				pci_mmcfg_lock_held())
>  		if (cfg->segment == segment &&
>  		    cfg->start_bus <= bus && bus <= cfg->end_bus)
>  			return cfg;
> -- 
> 2.22.0.510.g264f2c817a-goog
> 


^ permalink raw reply

* Re: [PATCH 02/14] docs: power: add it to to the main documentation index
From: Pavel Machek @ 2019-07-16 18:42 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Jonathan Corbet, Rafael J. Wysocki, Len Brown, linux-doc,
	linux-pm
In-Reply-To: <95abe2e389f5bfb9dd03d55de384c2b9b5bb78da.1563277838.git.mchehab+samsung@kernel.org>

[-- Attachment #1: Type: text/plain, Size: 801 bytes --]

On Tue 2019-07-16 09:10:41, Mauro Carvalho Chehab wrote:
> The power docs are orphaned at the documentation body.
> 
> While it could likely be moved to be inside some guide, I'm opting to just
> adding it to the main index.rst, removing the :orphan: and adding the SPDX
> header.
> 
> The reason is similar to what it was done for other driver-specific
> subsystems: the docs there contain a mix of Kernelspace, uAPI and
> admin-guide. So, better to keep them on its own directory,
> while the docs there are not properly classified.
> 
> Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>

Acked-by: Pavel Machek <pavel@ucw.cz>

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply

* Re: [PATCH 5/9] driver/core: Convert to use built-in RCU list checking (v1)
From: Paul E. McKenney @ 2019-07-16 18:40 UTC (permalink / raw)
  To: Joel Fernandes (Google)
  Cc: linux-kernel, Greg Kroah-Hartman, Alexey Kuznetsov, Bjorn Helgaas,
	Borislav Petkov, c0d1n61at3, David S. Miller, edumazet,
	Hideaki YOSHIFUJI, H. Peter Anvin, Ingo Molnar, Jonathan Corbet,
	Josh Triplett, keescook, kernel-hardening, kernel-team,
	Lai Jiangshan, Len Brown, linux-acpi, linux-doc, linux-pci,
	linux-pm, Mathieu Desnoyers, neilb, netdev, Oleg Nesterov,
	Pavel Machek, peterz, Rafael J. Wysocki, Rasmus Villemoes, rcu,
	Steven Rostedt, Tejun Heo, Thomas Gleixner, will,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)
In-Reply-To: <20190715143705.117908-6-joel@joelfernandes.org>

On Mon, Jul 15, 2019 at 10:37:01AM -0400, Joel Fernandes (Google) wrote:
> list_for_each_entry_rcu has built-in RCU and lock checking. Make use of
> it in driver core.
> 
> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>

This one looks ready.

							Thanx, Paul

> ---
>  drivers/base/base.h          |  1 +
>  drivers/base/core.c          | 10 ++++++++++
>  drivers/base/power/runtime.c | 15 ++++++++++-----
>  3 files changed, 21 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/base/base.h b/drivers/base/base.h
> index b405436ee28e..0d32544b6f91 100644
> --- a/drivers/base/base.h
> +++ b/drivers/base/base.h
> @@ -165,6 +165,7 @@ static inline int devtmpfs_init(void) { return 0; }
>  /* Device links support */
>  extern int device_links_read_lock(void);
>  extern void device_links_read_unlock(int idx);
> +extern int device_links_read_lock_held(void);
>  extern int device_links_check_suppliers(struct device *dev);
>  extern void device_links_driver_bound(struct device *dev);
>  extern void device_links_driver_cleanup(struct device *dev);
> diff --git a/drivers/base/core.c b/drivers/base/core.c
> index da84a73f2ba6..85e82f38717f 100644
> --- a/drivers/base/core.c
> +++ b/drivers/base/core.c
> @@ -68,6 +68,11 @@ void device_links_read_unlock(int idx)
>  {
>  	srcu_read_unlock(&device_links_srcu, idx);
>  }
> +
> +int device_links_read_lock_held(void)
> +{
> +	return srcu_read_lock_held(&device_links_srcu);
> +}
>  #else /* !CONFIG_SRCU */
>  static DECLARE_RWSEM(device_links_lock);
>  
> @@ -91,6 +96,11 @@ void device_links_read_unlock(int not_used)
>  {
>  	up_read(&device_links_lock);
>  }
> +
> +int device_links_read_lock_held(void)
> +{
> +	return lock_is_held(&device_links_lock);
> +}
>  #endif /* !CONFIG_SRCU */
>  
>  /**
> diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c
> index 952a1e7057c7..7a10e8379a70 100644
> --- a/drivers/base/power/runtime.c
> +++ b/drivers/base/power/runtime.c
> @@ -287,7 +287,8 @@ static int rpm_get_suppliers(struct device *dev)
>  {
>  	struct device_link *link;
>  
> -	list_for_each_entry_rcu(link, &dev->links.suppliers, c_node) {
> +	list_for_each_entry_rcu(link, &dev->links.suppliers, c_node,
> +				device_links_read_lock_held()) {
>  		int retval;
>  
>  		if (!(link->flags & DL_FLAG_PM_RUNTIME) ||
> @@ -309,7 +310,8 @@ static void rpm_put_suppliers(struct device *dev)
>  {
>  	struct device_link *link;
>  
> -	list_for_each_entry_rcu(link, &dev->links.suppliers, c_node) {
> +	list_for_each_entry_rcu(link, &dev->links.suppliers, c_node,
> +				device_links_read_lock_held()) {
>  		if (READ_ONCE(link->status) == DL_STATE_SUPPLIER_UNBIND)
>  			continue;
>  
> @@ -1640,7 +1642,8 @@ void pm_runtime_clean_up_links(struct device *dev)
>  
>  	idx = device_links_read_lock();
>  
> -	list_for_each_entry_rcu(link, &dev->links.consumers, s_node) {
> +	list_for_each_entry_rcu(link, &dev->links.consumers, s_node,
> +				device_links_read_lock_held()) {
>  		if (link->flags & DL_FLAG_STATELESS)
>  			continue;
>  
> @@ -1662,7 +1665,8 @@ void pm_runtime_get_suppliers(struct device *dev)
>  
>  	idx = device_links_read_lock();
>  
> -	list_for_each_entry_rcu(link, &dev->links.suppliers, c_node)
> +	list_for_each_entry_rcu(link, &dev->links.suppliers, c_node,
> +				device_links_read_lock_held())
>  		if (link->flags & DL_FLAG_PM_RUNTIME) {
>  			link->supplier_preactivated = true;
>  			refcount_inc(&link->rpm_active);
> @@ -1683,7 +1687,8 @@ void pm_runtime_put_suppliers(struct device *dev)
>  
>  	idx = device_links_read_lock();
>  
> -	list_for_each_entry_rcu(link, &dev->links.suppliers, c_node)
> +	list_for_each_entry_rcu(link, &dev->links.suppliers, c_node,
> +				device_links_read_lock_held())
>  		if (link->supplier_preactivated) {
>  			link->supplier_preactivated = false;
>  			if (refcount_dec_not_one(&link->rpm_active))
> -- 
> 2.22.0.510.g264f2c817a-goog
> 

^ permalink raw reply

* Re: [PATCH 4/9] ipv4: add lockdep condition to fix for_each_entry (v1)
From: Paul E. McKenney @ 2019-07-16 18:39 UTC (permalink / raw)
  To: Joel Fernandes (Google)
  Cc: linux-kernel, Alexey Kuznetsov, Bjorn Helgaas, Borislav Petkov,
	c0d1n61at3, David S. Miller, edumazet, Greg Kroah-Hartman,
	Hideaki YOSHIFUJI, H. Peter Anvin, Ingo Molnar, Jonathan Corbet,
	Josh Triplett, keescook, kernel-hardening, kernel-team,
	Lai Jiangshan, Len Brown, linux-acpi, linux-doc, linux-pci,
	linux-pm, Mathieu Desnoyers, neilb, netdev, Oleg Nesterov,
	Pavel Machek, peterz, Rafael J. Wysocki, Rasmus Villemoes, rcu,
	Steven Rostedt, Tejun Heo, Thomas Gleixner, will,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)
In-Reply-To: <20190715143705.117908-5-joel@joelfernandes.org>

On Mon, Jul 15, 2019 at 10:37:00AM -0400, Joel Fernandes (Google) wrote:
> Using the previous support added, use it for adding lockdep conditions
> to list usage here.
> 
> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>

We need an ack or better from the subsystem maintainer for this one.

						Thanx, Paul

> ---
>  net/ipv4/fib_frontend.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
> index 317339cd7f03..26b0fb24e2c2 100644
> --- a/net/ipv4/fib_frontend.c
> +++ b/net/ipv4/fib_frontend.c
> @@ -124,7 +124,8 @@ struct fib_table *fib_get_table(struct net *net, u32 id)
>  	h = id & (FIB_TABLE_HASHSZ - 1);
>  
>  	head = &net->ipv4.fib_table_hash[h];
> -	hlist_for_each_entry_rcu(tb, head, tb_hlist) {
> +	hlist_for_each_entry_rcu(tb, head, tb_hlist,
> +				 lockdep_rtnl_is_held()) {
>  		if (tb->tb_id == id)
>  			return tb;
>  	}
> -- 
> 2.22.0.510.g264f2c817a-goog
> 

^ permalink raw reply

* Re: [PATCH 2/9] rcu: Add support for consolidated-RCU reader checking (v3)
From: Paul E. McKenney @ 2019-07-16 18:38 UTC (permalink / raw)
  To: Joel Fernandes (Google)
  Cc: linux-kernel, Alexey Kuznetsov, Bjorn Helgaas, Borislav Petkov,
	c0d1n61at3, David S. Miller, edumazet, Greg Kroah-Hartman,
	Hideaki YOSHIFUJI, H. Peter Anvin, Ingo Molnar, Jonathan Corbet,
	Josh Triplett, keescook, kernel-hardening, kernel-team,
	Lai Jiangshan, Len Brown, linux-acpi, linux-doc, linux-pci,
	linux-pm, Mathieu Desnoyers, neilb, netdev, Oleg Nesterov,
	Pavel Machek, peterz, Rafael J. Wysocki, Rasmus Villemoes, rcu,
	Steven Rostedt, Tejun Heo, Thomas Gleixner, will,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)
In-Reply-To: <20190715143705.117908-3-joel@joelfernandes.org>

On Mon, Jul 15, 2019 at 10:36:58AM -0400, Joel Fernandes (Google) wrote:
> This patch adds support for checking RCU reader sections in list
> traversal macros. Optionally, if the list macro is called under SRCU or
> other lock/mutex protection, then appropriate lockdep expressions can be
> passed to make the checks pass.
> 
> Existing list_for_each_entry_rcu() invocations don't need to pass the
> optional fourth argument (cond) unless they are under some non-RCU
> protection and needs to make lockdep check pass.
> 
> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>

Now that I am on the correct version, again please fold in the checks
for the extra argument.  The ability to have an optional argument looks
quite helpful, especially when compared to growing the RCU API!

A few more things below.

> ---
>  include/linux/rculist.h  | 28 ++++++++++++++++++++-----
>  include/linux/rcupdate.h |  7 +++++++
>  kernel/rcu/Kconfig.debug | 11 ++++++++++
>  kernel/rcu/update.c      | 44 ++++++++++++++++++++++++----------------
>  4 files changed, 67 insertions(+), 23 deletions(-)
> 
> diff --git a/include/linux/rculist.h b/include/linux/rculist.h
> index e91ec9ddcd30..1048160625bb 100644
> --- a/include/linux/rculist.h
> +++ b/include/linux/rculist.h
> @@ -40,6 +40,20 @@ static inline void INIT_LIST_HEAD_RCU(struct list_head *list)
>   */
>  #define list_next_rcu(list)	(*((struct list_head __rcu **)(&(list)->next)))
>  
> +/*
> + * Check during list traversal that we are within an RCU reader
> + */
> +
> +#ifdef CONFIG_PROVE_RCU_LIST

This new Kconfig option is OK temporarily, but unless there is reason to
fear malfunction that a few weeks of rcutorture, 0day, and -next won't
find, it would be better to just use CONFIG_PROVE_RCU.  The overall goal
is to reduce the number of RCU knobs rather than grow them, must though
history might lead one to believe otherwise.  :-/

> +#define __list_check_rcu(dummy, cond, ...)				\
> +	({								\
> +	RCU_LOCKDEP_WARN(!cond && !rcu_read_lock_any_held(),		\
> +			 "RCU-list traversed in non-reader section!");	\
> +	 })
> +#else
> +#define __list_check_rcu(dummy, cond, ...) ({})
> +#endif
> +
>  /*
>   * Insert a new entry between two known consecutive entries.
>   *
> @@ -343,14 +357,16 @@ static inline void list_splice_tail_init_rcu(struct list_head *list,
>   * @pos:	the type * to use as a loop cursor.
>   * @head:	the head for your list.
>   * @member:	the name of the list_head within the struct.
> + * @cond:	optional lockdep expression if called from non-RCU protection.
>   *
>   * This list-traversal primitive may safely run concurrently with
>   * the _rcu list-mutation primitives such as list_add_rcu()
>   * as long as the traversal is guarded by rcu_read_lock().
>   */
> -#define list_for_each_entry_rcu(pos, head, member) \
> -	for (pos = list_entry_rcu((head)->next, typeof(*pos), member); \
> -		&pos->member != (head); \
> +#define list_for_each_entry_rcu(pos, head, member, cond...)		\
> +	for (__list_check_rcu(dummy, ## cond, 0),			\
> +	     pos = list_entry_rcu((head)->next, typeof(*pos), member);	\
> +		&pos->member != (head);					\
>  		pos = list_entry_rcu(pos->member.next, typeof(*pos), member))
>  
>  /**
> @@ -616,13 +632,15 @@ static inline void hlist_add_behind_rcu(struct hlist_node *n,
>   * @pos:	the type * to use as a loop cursor.
>   * @head:	the head for your list.
>   * @member:	the name of the hlist_node within the struct.
> + * @cond:	optional lockdep expression if called from non-RCU protection.
>   *
>   * This list-traversal primitive may safely run concurrently with
>   * the _rcu list-mutation primitives such as hlist_add_head_rcu()
>   * as long as the traversal is guarded by rcu_read_lock().
>   */
> -#define hlist_for_each_entry_rcu(pos, head, member)			\
> -	for (pos = hlist_entry_safe (rcu_dereference_raw(hlist_first_rcu(head)),\
> +#define hlist_for_each_entry_rcu(pos, head, member, cond...)		\
> +	for (__list_check_rcu(dummy, ## cond, 0),			\
> +	     pos = hlist_entry_safe (rcu_dereference_raw(hlist_first_rcu(head)),\
>  			typeof(*(pos)), member);			\
>  		pos;							\
>  		pos = hlist_entry_safe(rcu_dereference_raw(hlist_next_rcu(\
> diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> index 8f7167478c1d..f3c29efdf19a 100644
> --- a/include/linux/rcupdate.h
> +++ b/include/linux/rcupdate.h
> @@ -221,6 +221,7 @@ int debug_lockdep_rcu_enabled(void);
>  int rcu_read_lock_held(void);
>  int rcu_read_lock_bh_held(void);
>  int rcu_read_lock_sched_held(void);
> +int rcu_read_lock_any_held(void);
>  
>  #else /* #ifdef CONFIG_DEBUG_LOCK_ALLOC */
>  
> @@ -241,6 +242,12 @@ static inline int rcu_read_lock_sched_held(void)
>  {
>  	return !preemptible();
>  }
> +
> +static inline int rcu_read_lock_any_held(void)
> +{
> +	return !preemptible();
> +}
> +
>  #endif /* #else #ifdef CONFIG_DEBUG_LOCK_ALLOC */
>  
>  #ifdef CONFIG_PROVE_RCU
> diff --git a/kernel/rcu/Kconfig.debug b/kernel/rcu/Kconfig.debug
> index 5ec3ea4028e2..7fbd21dbfcd0 100644
> --- a/kernel/rcu/Kconfig.debug
> +++ b/kernel/rcu/Kconfig.debug
> @@ -8,6 +8,17 @@ menu "RCU Debugging"
>  config PROVE_RCU
>  	def_bool PROVE_LOCKING
>  
> +config PROVE_RCU_LIST
> +	bool "RCU list lockdep debugging"
> +	depends on PROVE_RCU

This must also depend on RCU_EXPERT.  

> +	default n
> +	help
> +	  Enable RCU lockdep checking for list usages. By default it is
> +	  turned off since there are several list RCU users that still
> +	  need to be converted to pass a lockdep expression. To prevent
> +	  false-positive splats, we keep it default disabled but once all
> +	  users are converted, we can remove this config option.
> +
>  config TORTURE_TEST
>  	tristate
>  	default n
> diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c
> index 9dd5aeef6e70..b7a4e3b5fa98 100644
> --- a/kernel/rcu/update.c
> +++ b/kernel/rcu/update.c
> @@ -91,14 +91,18 @@ module_param(rcu_normal_after_boot, int, 0);
>   * Similarly, we avoid claiming an SRCU read lock held if the current
>   * CPU is offline.
>   */
> +#define rcu_read_lock_held_common()		\
> +	if (!debug_lockdep_rcu_enabled())	\
> +		return 1;			\
> +	if (!rcu_is_watching())			\
> +		return 0;			\
> +	if (!rcu_lockdep_current_cpu_online())	\
> +		return 0;

Nice abstraction of common code!

							Thanx, Paul

> +
>  int rcu_read_lock_sched_held(void)
>  {
> -	if (!debug_lockdep_rcu_enabled())
> -		return 1;
> -	if (!rcu_is_watching())
> -		return 0;
> -	if (!rcu_lockdep_current_cpu_online())
> -		return 0;
> +	rcu_read_lock_held_common();
> +
>  	return lock_is_held(&rcu_sched_lock_map) || !preemptible();
>  }
>  EXPORT_SYMBOL(rcu_read_lock_sched_held);
> @@ -257,12 +261,8 @@ NOKPROBE_SYMBOL(debug_lockdep_rcu_enabled);
>   */
>  int rcu_read_lock_held(void)
>  {
> -	if (!debug_lockdep_rcu_enabled())
> -		return 1;
> -	if (!rcu_is_watching())
> -		return 0;
> -	if (!rcu_lockdep_current_cpu_online())
> -		return 0;
> +	rcu_read_lock_held_common();
> +
>  	return lock_is_held(&rcu_lock_map);
>  }
>  EXPORT_SYMBOL_GPL(rcu_read_lock_held);
> @@ -284,16 +284,24 @@ EXPORT_SYMBOL_GPL(rcu_read_lock_held);
>   */
>  int rcu_read_lock_bh_held(void)
>  {
> -	if (!debug_lockdep_rcu_enabled())
> -		return 1;
> -	if (!rcu_is_watching())
> -		return 0;
> -	if (!rcu_lockdep_current_cpu_online())
> -		return 0;
> +	rcu_read_lock_held_common();
> +
>  	return in_softirq() || irqs_disabled();
>  }
>  EXPORT_SYMBOL_GPL(rcu_read_lock_bh_held);
>  
> +int rcu_read_lock_any_held(void)
> +{
> +	rcu_read_lock_held_common();
> +
> +	if (lock_is_held(&rcu_lock_map) ||
> +	    lock_is_held(&rcu_bh_lock_map) ||
> +	    lock_is_held(&rcu_sched_lock_map))
> +		return 1;
> +	return !preemptible();
> +}
> +EXPORT_SYMBOL_GPL(rcu_read_lock_any_held);
> +
>  #endif /* #ifdef CONFIG_DEBUG_LOCK_ALLOC */
>  
>  /**
> -- 
> 2.22.0.510.g264f2c817a-goog
> 

^ permalink raw reply

* Re: [PATCH 3/9] rcu/sync: Remove custom check for reader-section (v2)
From: Paul E. McKenney @ 2019-07-16 18:39 UTC (permalink / raw)
  To: Joel Fernandes (Google)
  Cc: linux-kernel, Oleg Nesterov, Alexey Kuznetsov, Bjorn Helgaas,
	Borislav Petkov, c0d1n61at3, David S. Miller, edumazet,
	Greg Kroah-Hartman, Hideaki YOSHIFUJI, H. Peter Anvin,
	Ingo Molnar, Jonathan Corbet, Josh Triplett, keescook,
	kernel-hardening, kernel-team, Lai Jiangshan, Len Brown,
	linux-acpi, linux-doc, linux-pci, linux-pm, Mathieu Desnoyers,
	neilb, netdev, Pavel Machek, peterz, Rafael J. Wysocki,
	Rasmus Villemoes, rcu, Steven Rostedt, Tejun Heo, Thomas Gleixner,
	will, maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)
In-Reply-To: <20190715143705.117908-4-joel@joelfernandes.org>

On Mon, Jul 15, 2019 at 10:36:59AM -0400, Joel Fernandes (Google) wrote:
> The rcu/sync code was doing its own check whether we are in a reader
> section. With RCU consolidating flavors and the generic helper added in
> this series, this is no longer need. We can just use the generic helper
> and it results in a nice cleanup.
> 
> Cc: Oleg Nesterov <oleg@redhat.com>
> Acked-by: Oleg Nesterov <oleg@redhat.com>
> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>

This one looks good!

							Thanx, Paul

> ---
>  include/linux/rcu_sync.h | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/include/linux/rcu_sync.h b/include/linux/rcu_sync.h
> index 9b83865d24f9..0027d4c8087c 100644
> --- a/include/linux/rcu_sync.h
> +++ b/include/linux/rcu_sync.h
> @@ -31,9 +31,7 @@ struct rcu_sync {
>   */
>  static inline bool rcu_sync_is_idle(struct rcu_sync *rsp)
>  {
> -	RCU_LOCKDEP_WARN(!rcu_read_lock_held() &&
> -			 !rcu_read_lock_bh_held() &&
> -			 !rcu_read_lock_sched_held(),
> +	RCU_LOCKDEP_WARN(!rcu_read_lock_any_held(),
>  			 "suspicious rcu_sync_is_idle() usage");
>  	return !READ_ONCE(rsp->gp_state); /* GP_IDLE */
>  }
> -- 
> 2.22.0.510.g264f2c817a-goog
> 


^ permalink raw reply

* Re: [PATCH v2 2/9] rcu: Add support for consolidated-RCU reader checking
From: Joel Fernandes @ 2019-07-16 18:35 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, Alexey Kuznetsov, Bjorn Helgaas, Borislav Petkov,
	c0d1n61at3, David S. Miller, edumazet, Greg Kroah-Hartman,
	Hideaki YOSHIFUJI, H. Peter Anvin, Ingo Molnar, Jonathan Corbet,
	Josh Triplett, keescook, kernel-hardening, kernel-team,
	Lai Jiangshan, Len Brown, linux-acpi, linux-doc, linux-pci,
	linux-pm, Mathieu Desnoyers, neilb, netdev, Oleg Nesterov,
	Pavel Machek, peterz, Rafael J. Wysocki, Rasmus Villemoes, rcu,
	Steven Rostedt, Tejun Heo, Thomas Gleixner, will,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)
In-Reply-To: <20190716182237.GA22819@linux.ibm.com>

On Tue, Jul 16, 2019 at 11:22:37AM -0700, Paul E. McKenney wrote:
> On Fri, Jul 12, 2019 at 01:00:17PM -0400, Joel Fernandes (Google) wrote:
> > This patch adds support for checking RCU reader sections in list
> > traversal macros. Optionally, if the list macro is called under SRCU or
> > other lock/mutex protection, then appropriate lockdep expressions can be
> > passed to make the checks pass.
> > 
> > Existing list_for_each_entry_rcu() invocations don't need to pass the
> > optional fourth argument (cond) unless they are under some non-RCU
> > protection and needs to make lockdep check pass.
> > 
> > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> 
> If you fold in the checks for extra parameters, I will take this
> one and also 1/9.

I folded the checks in and also threw in the rcu-sync with Oleg's ack:

Could you pull into /dev branch?

git pull https://github.com/joelagnel/linux-kernel.git list-first-three
(Based on your dev branch)


^ permalink raw reply

* Re: [PATCH v2 2/9] rcu: Add support for consolidated-RCU reader checking
From: Paul E. McKenney @ 2019-07-16 18:22 UTC (permalink / raw)
  To: Joel Fernandes (Google)
  Cc: linux-kernel, Alexey Kuznetsov, Bjorn Helgaas, Borislav Petkov,
	c0d1n61at3, David S. Miller, edumazet, Greg Kroah-Hartman,
	Hideaki YOSHIFUJI, H. Peter Anvin, Ingo Molnar, Jonathan Corbet,
	Josh Triplett, keescook, kernel-hardening, kernel-team,
	Lai Jiangshan, Len Brown, linux-acpi, linux-doc, linux-pci,
	linux-pm, Mathieu Desnoyers, neilb, netdev, Oleg Nesterov,
	Pavel Machek, peterz, Rafael J. Wysocki, Rasmus Villemoes, rcu,
	Steven Rostedt, Tejun Heo, Thomas Gleixner, will,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)
In-Reply-To: <20190712170024.111093-3-joel@joelfernandes.org>

On Fri, Jul 12, 2019 at 01:00:17PM -0400, Joel Fernandes (Google) wrote:
> This patch adds support for checking RCU reader sections in list
> traversal macros. Optionally, if the list macro is called under SRCU or
> other lock/mutex protection, then appropriate lockdep expressions can be
> passed to make the checks pass.
> 
> Existing list_for_each_entry_rcu() invocations don't need to pass the
> optional fourth argument (cond) unless they are under some non-RCU
> protection and needs to make lockdep check pass.
> 
> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>

If you fold in the checks for extra parameters, I will take this
one and also 1/9.

							Thanx, Paul

> ---
>  include/linux/rculist.h  | 28 +++++++++++++++++++++++-----
>  include/linux/rcupdate.h |  7 +++++++
>  kernel/rcu/Kconfig.debug | 11 +++++++++++
>  kernel/rcu/update.c      | 14 ++++++++++++++
>  4 files changed, 55 insertions(+), 5 deletions(-)
> 
> diff --git a/include/linux/rculist.h b/include/linux/rculist.h
> index e91ec9ddcd30..1048160625bb 100644
> --- a/include/linux/rculist.h
> +++ b/include/linux/rculist.h
> @@ -40,6 +40,20 @@ static inline void INIT_LIST_HEAD_RCU(struct list_head *list)
>   */
>  #define list_next_rcu(list)	(*((struct list_head __rcu **)(&(list)->next)))
>  
> +/*
> + * Check during list traversal that we are within an RCU reader
> + */
> +
> +#ifdef CONFIG_PROVE_RCU_LIST
> +#define __list_check_rcu(dummy, cond, ...)				\
> +	({								\
> +	RCU_LOCKDEP_WARN(!cond && !rcu_read_lock_any_held(),		\
> +			 "RCU-list traversed in non-reader section!");	\
> +	 })
> +#else
> +#define __list_check_rcu(dummy, cond, ...) ({})
> +#endif
> +
>  /*
>   * Insert a new entry between two known consecutive entries.
>   *
> @@ -343,14 +357,16 @@ static inline void list_splice_tail_init_rcu(struct list_head *list,
>   * @pos:	the type * to use as a loop cursor.
>   * @head:	the head for your list.
>   * @member:	the name of the list_head within the struct.
> + * @cond:	optional lockdep expression if called from non-RCU protection.
>   *
>   * This list-traversal primitive may safely run concurrently with
>   * the _rcu list-mutation primitives such as list_add_rcu()
>   * as long as the traversal is guarded by rcu_read_lock().
>   */
> -#define list_for_each_entry_rcu(pos, head, member) \
> -	for (pos = list_entry_rcu((head)->next, typeof(*pos), member); \
> -		&pos->member != (head); \
> +#define list_for_each_entry_rcu(pos, head, member, cond...)		\
> +	for (__list_check_rcu(dummy, ## cond, 0),			\
> +	     pos = list_entry_rcu((head)->next, typeof(*pos), member);	\
> +		&pos->member != (head);					\
>  		pos = list_entry_rcu(pos->member.next, typeof(*pos), member))
>  
>  /**
> @@ -616,13 +632,15 @@ static inline void hlist_add_behind_rcu(struct hlist_node *n,
>   * @pos:	the type * to use as a loop cursor.
>   * @head:	the head for your list.
>   * @member:	the name of the hlist_node within the struct.
> + * @cond:	optional lockdep expression if called from non-RCU protection.
>   *
>   * This list-traversal primitive may safely run concurrently with
>   * the _rcu list-mutation primitives such as hlist_add_head_rcu()
>   * as long as the traversal is guarded by rcu_read_lock().
>   */
> -#define hlist_for_each_entry_rcu(pos, head, member)			\
> -	for (pos = hlist_entry_safe (rcu_dereference_raw(hlist_first_rcu(head)),\
> +#define hlist_for_each_entry_rcu(pos, head, member, cond...)		\
> +	for (__list_check_rcu(dummy, ## cond, 0),			\
> +	     pos = hlist_entry_safe (rcu_dereference_raw(hlist_first_rcu(head)),\
>  			typeof(*(pos)), member);			\
>  		pos;							\
>  		pos = hlist_entry_safe(rcu_dereference_raw(hlist_next_rcu(\
> diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> index 922bb6848813..712b464ab960 100644
> --- a/include/linux/rcupdate.h
> +++ b/include/linux/rcupdate.h
> @@ -223,6 +223,7 @@ int debug_lockdep_rcu_enabled(void);
>  int rcu_read_lock_held(void);
>  int rcu_read_lock_bh_held(void);
>  int rcu_read_lock_sched_held(void);
> +int rcu_read_lock_any_held(void);
>  
>  #else /* #ifdef CONFIG_DEBUG_LOCK_ALLOC */
>  
> @@ -243,6 +244,12 @@ static inline int rcu_read_lock_sched_held(void)
>  {
>  	return !preemptible();
>  }
> +
> +static inline int rcu_read_lock_any_held(void)
> +{
> +	return !preemptible();
> +}
> +
>  #endif /* #else #ifdef CONFIG_DEBUG_LOCK_ALLOC */
>  
>  #ifdef CONFIG_PROVE_RCU
> diff --git a/kernel/rcu/Kconfig.debug b/kernel/rcu/Kconfig.debug
> index 0ec7d1d33a14..b20d0e2903d1 100644
> --- a/kernel/rcu/Kconfig.debug
> +++ b/kernel/rcu/Kconfig.debug
> @@ -7,6 +7,17 @@ menu "RCU Debugging"
>  config PROVE_RCU
>  	def_bool PROVE_LOCKING
>  
> +config PROVE_RCU_LIST
> +	bool "RCU list lockdep debugging"
> +	depends on PROVE_RCU
> +	default n
> +	help
> +	  Enable RCU lockdep checking for list usages. By default it is
> +	  turned off since there are several list RCU users that still
> +	  need to be converted to pass a lockdep expression. To prevent
> +	  false-positive splats, we keep it default disabled but once all
> +	  users are converted, we can remove this config option.
> +
>  config TORTURE_TEST
>  	tristate
>  	default n
> diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c
> index bb961cd89e76..0cc7be0fb6b5 100644
> --- a/kernel/rcu/update.c
> +++ b/kernel/rcu/update.c
> @@ -294,6 +294,20 @@ int rcu_read_lock_bh_held(void)
>  }
>  EXPORT_SYMBOL_GPL(rcu_read_lock_bh_held);
>  
> +int rcu_read_lock_any_held(void)
> +{
> +	if (!debug_lockdep_rcu_enabled())
> +		return 1;
> +	if (!rcu_is_watching())
> +		return 0;
> +	if (!rcu_lockdep_current_cpu_online())
> +		return 0;
> +	if (lock_is_held(&rcu_lock_map) || lock_is_held(&rcu_sched_lock_map))
> +		return 1;
> +	return !preemptible();
> +}
> +EXPORT_SYMBOL_GPL(rcu_read_lock_any_held);
> +
>  #endif /* #ifdef CONFIG_DEBUG_LOCK_ALLOC */
>  
>  /**
> -- 
> 2.22.0.510.g264f2c817a-goog
> 

^ permalink raw reply

* Re: [PATCH v2 3/9] rcu/sync: Remove custom check for reader-section
From: Paul E. McKenney @ 2019-07-16 18:28 UTC (permalink / raw)
  To: Joel Fernandes (Google)
  Cc: linux-kernel, Oleg Nesterov, Alexey Kuznetsov, Bjorn Helgaas,
	Borislav Petkov, c0d1n61at3, David S. Miller, edumazet,
	Greg Kroah-Hartman, Hideaki YOSHIFUJI, H. Peter Anvin,
	Ingo Molnar, Jonathan Corbet, Josh Triplett, keescook,
	kernel-hardening, kernel-team, Lai Jiangshan, Len Brown,
	linux-acpi, linux-doc, linux-pci, linux-pm, Mathieu Desnoyers,
	neilb, netdev, Pavel Machek, peterz, Rafael J. Wysocki,
	Rasmus Villemoes, rcu, Steven Rostedt, Tejun Heo, Thomas Gleixner,
	will, maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)
In-Reply-To: <20190716182642.GB22819@linux.ibm.com>

On Tue, Jul 16, 2019 at 11:26:42AM -0700, Paul E. McKenney wrote:
> On Fri, Jul 12, 2019 at 01:00:18PM -0400, Joel Fernandes (Google) wrote:
> > The rcu/sync code was doing its own check whether we are in a reader
> > section. With RCU consolidating flavors and the generic helper added in
> > this series, this is no longer need. We can just use the generic helper
> > and it results in a nice cleanup.
> > 
> > Cc: Oleg Nesterov <oleg@redhat.com>
> > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> 
> This needs to be forward-ported to current mainline.  (Or, I believe
> equivalently for this file, to branch "dev" of -rcu.)
> 
> Especially given that you have Oleg's Ack, I would be happy to
> take the forward-ported version.

Never mind, I am one version behind.  Apologies for the noise!

							Thanx, Paul

> > ---
> > Please note: Only build and boot tested this particular patch so far.
> > 
> >  include/linux/rcu_sync.h |  5 ++---
> >  kernel/rcu/sync.c        | 22 ----------------------
> >  2 files changed, 2 insertions(+), 25 deletions(-)
> > 
> > diff --git a/include/linux/rcu_sync.h b/include/linux/rcu_sync.h
> > index 6fc53a1345b3..c954f1efc919 100644
> > --- a/include/linux/rcu_sync.h
> > +++ b/include/linux/rcu_sync.h
> > @@ -39,9 +39,8 @@ extern void rcu_sync_lockdep_assert(struct rcu_sync *);
> >   */
> >  static inline bool rcu_sync_is_idle(struct rcu_sync *rsp)
> >  {
> > -#ifdef CONFIG_PROVE_RCU
> > -	rcu_sync_lockdep_assert(rsp);
> > -#endif
> > +	RCU_LOCKDEP_WARN(!rcu_read_lock_any_held(),
> > +			 "suspicious rcu_sync_is_idle() usage");
> >  	return !rsp->gp_state; /* GP_IDLE */
> >  }
> >  
> > diff --git a/kernel/rcu/sync.c b/kernel/rcu/sync.c
> > index a8304d90573f..535e02601f56 100644
> > --- a/kernel/rcu/sync.c
> > +++ b/kernel/rcu/sync.c
> > @@ -10,37 +10,25 @@
> >  #include <linux/rcu_sync.h>
> >  #include <linux/sched.h>
> >  
> > -#ifdef CONFIG_PROVE_RCU
> > -#define __INIT_HELD(func)	.held = func,
> > -#else
> > -#define __INIT_HELD(func)
> > -#endif
> > -
> >  static const struct {
> >  	void (*sync)(void);
> >  	void (*call)(struct rcu_head *, void (*)(struct rcu_head *));
> >  	void (*wait)(void);
> > -#ifdef CONFIG_PROVE_RCU
> > -	int  (*held)(void);
> > -#endif
> >  } gp_ops[] = {
> >  	[RCU_SYNC] = {
> >  		.sync = synchronize_rcu,
> >  		.call = call_rcu,
> >  		.wait = rcu_barrier,
> > -		__INIT_HELD(rcu_read_lock_held)
> >  	},
> >  	[RCU_SCHED_SYNC] = {
> >  		.sync = synchronize_rcu,
> >  		.call = call_rcu,
> >  		.wait = rcu_barrier,
> > -		__INIT_HELD(rcu_read_lock_sched_held)
> >  	},
> >  	[RCU_BH_SYNC] = {
> >  		.sync = synchronize_rcu,
> >  		.call = call_rcu,
> >  		.wait = rcu_barrier,
> > -		__INIT_HELD(rcu_read_lock_bh_held)
> >  	},
> >  };
> >  
> > @@ -49,16 +37,6 @@ enum { CB_IDLE = 0, CB_PENDING, CB_REPLAY };
> >  
> >  #define	rss_lock	gp_wait.lock
> >  
> > -#ifdef CONFIG_PROVE_RCU
> > -void rcu_sync_lockdep_assert(struct rcu_sync *rsp)
> > -{
> > -	RCU_LOCKDEP_WARN(!gp_ops[rsp->gp_type].held(),
> > -			 "suspicious rcu_sync_is_idle() usage");
> > -}
> > -
> > -EXPORT_SYMBOL_GPL(rcu_sync_lockdep_assert);
> > -#endif
> > -
> >  /**
> >   * rcu_sync_init() - Initialize an rcu_sync structure
> >   * @rsp: Pointer to rcu_sync structure to be initialized
> > -- 
> > 2.22.0.510.g264f2c817a-goog
> > 


^ permalink raw reply

* Re: [PATCH v2 3/9] rcu/sync: Remove custom check for reader-section
From: Paul E. McKenney @ 2019-07-16 18:26 UTC (permalink / raw)
  To: Joel Fernandes (Google)
  Cc: linux-kernel, Oleg Nesterov, Alexey Kuznetsov, Bjorn Helgaas,
	Borislav Petkov, c0d1n61at3, David S. Miller, edumazet,
	Greg Kroah-Hartman, Hideaki YOSHIFUJI, H. Peter Anvin,
	Ingo Molnar, Jonathan Corbet, Josh Triplett, keescook,
	kernel-hardening, kernel-team, Lai Jiangshan, Len Brown,
	linux-acpi, linux-doc, linux-pci, linux-pm, Mathieu Desnoyers,
	neilb, netdev, Pavel Machek, peterz, Rafael J. Wysocki,
	Rasmus Villemoes, rcu, Steven Rostedt, Tejun Heo, Thomas Gleixner,
	will, maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)
In-Reply-To: <20190712170024.111093-4-joel@joelfernandes.org>

On Fri, Jul 12, 2019 at 01:00:18PM -0400, Joel Fernandes (Google) wrote:
> The rcu/sync code was doing its own check whether we are in a reader
> section. With RCU consolidating flavors and the generic helper added in
> this series, this is no longer need. We can just use the generic helper
> and it results in a nice cleanup.
> 
> Cc: Oleg Nesterov <oleg@redhat.com>
> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>

This needs to be forward-ported to current mainline.  (Or, I believe
equivalently for this file, to branch "dev" of -rcu.)

Especially given that you have Oleg's Ack, I would be happy to
take the forward-ported version.

							Thanx, Paul

> ---
> Please note: Only build and boot tested this particular patch so far.
> 
>  include/linux/rcu_sync.h |  5 ++---
>  kernel/rcu/sync.c        | 22 ----------------------
>  2 files changed, 2 insertions(+), 25 deletions(-)
> 
> diff --git a/include/linux/rcu_sync.h b/include/linux/rcu_sync.h
> index 6fc53a1345b3..c954f1efc919 100644
> --- a/include/linux/rcu_sync.h
> +++ b/include/linux/rcu_sync.h
> @@ -39,9 +39,8 @@ extern void rcu_sync_lockdep_assert(struct rcu_sync *);
>   */
>  static inline bool rcu_sync_is_idle(struct rcu_sync *rsp)
>  {
> -#ifdef CONFIG_PROVE_RCU
> -	rcu_sync_lockdep_assert(rsp);
> -#endif
> +	RCU_LOCKDEP_WARN(!rcu_read_lock_any_held(),
> +			 "suspicious rcu_sync_is_idle() usage");
>  	return !rsp->gp_state; /* GP_IDLE */
>  }
>  
> diff --git a/kernel/rcu/sync.c b/kernel/rcu/sync.c
> index a8304d90573f..535e02601f56 100644
> --- a/kernel/rcu/sync.c
> +++ b/kernel/rcu/sync.c
> @@ -10,37 +10,25 @@
>  #include <linux/rcu_sync.h>
>  #include <linux/sched.h>
>  
> -#ifdef CONFIG_PROVE_RCU
> -#define __INIT_HELD(func)	.held = func,
> -#else
> -#define __INIT_HELD(func)
> -#endif
> -
>  static const struct {
>  	void (*sync)(void);
>  	void (*call)(struct rcu_head *, void (*)(struct rcu_head *));
>  	void (*wait)(void);
> -#ifdef CONFIG_PROVE_RCU
> -	int  (*held)(void);
> -#endif
>  } gp_ops[] = {
>  	[RCU_SYNC] = {
>  		.sync = synchronize_rcu,
>  		.call = call_rcu,
>  		.wait = rcu_barrier,
> -		__INIT_HELD(rcu_read_lock_held)
>  	},
>  	[RCU_SCHED_SYNC] = {
>  		.sync = synchronize_rcu,
>  		.call = call_rcu,
>  		.wait = rcu_barrier,
> -		__INIT_HELD(rcu_read_lock_sched_held)
>  	},
>  	[RCU_BH_SYNC] = {
>  		.sync = synchronize_rcu,
>  		.call = call_rcu,
>  		.wait = rcu_barrier,
> -		__INIT_HELD(rcu_read_lock_bh_held)
>  	},
>  };
>  
> @@ -49,16 +37,6 @@ enum { CB_IDLE = 0, CB_PENDING, CB_REPLAY };
>  
>  #define	rss_lock	gp_wait.lock
>  
> -#ifdef CONFIG_PROVE_RCU
> -void rcu_sync_lockdep_assert(struct rcu_sync *rsp)
> -{
> -	RCU_LOCKDEP_WARN(!gp_ops[rsp->gp_type].held(),
> -			 "suspicious rcu_sync_is_idle() usage");
> -}
> -
> -EXPORT_SYMBOL_GPL(rcu_sync_lockdep_assert);
> -#endif
> -
>  /**
>   * rcu_sync_init() - Initialize an rcu_sync structure
>   * @rsp: Pointer to rcu_sync structure to be initialized
> -- 
> 2.22.0.510.g264f2c817a-goog
> 


^ permalink raw reply

* Re: [PATCH v9 04/18] kunit: test: add kunit_stream a std::stream like logger
From: Brendan Higgins @ 2019-07-16 17:51 UTC (permalink / raw)
  To: Stephen Boyd
  Cc: Frank Rowand, Greg KH, Josh Poimboeuf, Kees Cook, Kieran Bingham,
	Luis Chamberlain, Peter Zijlstra, Rob Herring, shuah,
	Theodore Ts'o, Masahiro Yamada, devicetree, dri-devel,
	kunit-dev, open list:DOCUMENTATION, linux-fsdevel, linux-kbuild,
	Linux Kernel Mailing List, open list:KERNEL SELFTEST FRAMEWORK,
	linux-nvdimm, linux-um, Sasha Levin, Bird, Timothy,
	Amir Goldstein, Dan Carpenter, Daniel Vetter, Jeff Dike,
	Joel Stanley, Julia Lawall, Kevin Hilman, Knut Omang,
	Logan Gunthorpe, Michael Ellerman, Petr Mladek, Randy Dunlap,
	Richard Weinberger, David Rientjes, Steven Rostedt, wfg
In-Reply-To: <20190716153002.49E292054F@mail.kernel.org>

On Tue, Jul 16, 2019 at 8:30 AM Stephen Boyd <sboyd@kernel.org> wrote:
>
> Quoting Brendan Higgins (2019-07-16 01:37:34)
> > On Tue, Jul 16, 2019 at 12:57 AM Brendan Higgins
> > <brendanhiggins@google.com> wrote:
> > >
> > > On Mon, Jul 15, 2019 at 3:15 PM Stephen Boyd <sboyd@kernel.org> wrote:
> > > >
> > > > Quoting Brendan Higgins (2019-07-12 01:17:30)
> > > > > diff --git a/include/kunit/kunit-stream.h b/include/kunit/kunit-stream.h
> > > > > new file mode 100644
> > > > > index 0000000000000..a7b53eabf6be4
> > > > > --- /dev/null
> > > > > +++ b/include/kunit/kunit-stream.h
> > > > > +/**
> > > > > + * struct kunit_stream - a std::stream style string builder.
> > > > > + *
> > > > > + * A std::stream style string builder. Allows messages to be built up and
> > > > > + * printed all at once.
> > > > > + */
> > > > > +struct kunit_stream {
> > > > > +       /* private: internal use only. */
> > > > > +       struct kunit *test;
> > > > > +       const char *level;
> > > >
> > > > Is the level changed? See my comment below, but I wonder if this whole
> > > > struct can go away and the wrappers can just operate on 'struct
> > > > string_stream' instead.
> > >
> > > I was inclined to agree with you when I first read your comment, but
> > > then I thought about the case that someone wants to add in a debug
> > > message (of which I currently have none). I think under most
> > > circumstances a user of kunit_stream would likely want to pick a
> > > default verbosity that maybe I should provide, but may still want
> > > different verbosity levels.
> > >
> > > The main reason I want to keep the types separate, string_stream vs.
> > > kunit_stream, is that they are intended to be used differently.
> > > string_stream is just a generic string builder. If you are using that,
> > > you are expecting to see someone building the string at some point and
> > > then doing something interesting with it. kunit_stream really tells
> > > you specifically that KUnit is putting together a message to
> > > communicate something to a user of KUnit. It is really used in a very
> > > specific way, and I wouldn't want to generalize its usage beyond how
> > > it is currently used. I think in order to preserve the author's
> > > intention it adds clarity to keep the types separate regardless of how
> > > similar they might be in reality.
>
> You may want to add some of these reasons to the commit text.

Will do.

> > > > > +
> > > > > +       if (!string_stream_is_empty(stream->internal_stream)) {
> > > > > +               kunit_err(stream->test,
> > > > > +                         "End of test case reached with uncommitted stream entries\n");
> > > > > +               kunit_stream_commit(stream);
> > > > > +       }
> > > > > +}
> > > > > +
> > > >
> > > > Nitpick: Drop this extra newline.
> > >
> > > Oops, nice catch.
> >
> > Not super important, but I don't want you to think that I am ignoring
> > you. I think you must have unintentionally deleted the last function
> > in this file, or maybe you are referring to something that I am just
> > not seeing, but I don't see the extra newline here.
>
> No worries. Sorry for the noise.
>
> > > property of the input, it may or may not be enough information on its
> > > own for the expectation to fail, but we want to share the result of
> > > the property check with the user regardless, BUT only if the
> > > expectation as a whole fails.
> > >
> > > Hence, we can have multiple `struct kunit_stream`s associated with a
> > > `struct kunit` active at any given time.
>
> Makes sense. I wasn't sure if there were more than one stream associated
> with a test. Sounds like there are many to one so it can't just be a
> member of the test. This could be documented somewhere so this question
> doesn't come up again.

Sounds good. Will do.

Thanks!

^ permalink raw reply

* Re: [PATCH v9 04/18] kunit: test: add kunit_stream a std::stream like logger
From: Stephen Boyd @ 2019-07-16 17:50 UTC (permalink / raw)
  To: Brendan Higgins
  Cc: Frank Rowand, Greg KH, Josh Poimboeuf, Kees Cook, Kieran Bingham,
	Luis Chamberlain, Peter Zijlstra, Rob Herring, shuah,
	Theodore Ts'o, Masahiro Yamada, devicetree, dri-devel,
	kunit-dev, open list:DOCUMENTATION, linux-fsdevel, linux-kbuild,
	Linux Kernel Mailing List, open list:KERNEL SELFTEST FRAMEWORK,
	linux-nvdimm, linux-um, Sasha Levin, Bird, Timothy,
	Amir Goldstein, Dan Carpenter, Daniel Vetter, Jeff Dike,
	Joel Stanley, Julia Lawall, Kevin Hilman, Knut Omang,
	Logan Gunthorpe, Michael Ellerman, Petr Mladek, Randy Dunlap,
	Richard Weinberger, David Rientjes, Steven Rostedt, wfg
In-Reply-To: <CAFd5g44_axVHNMBzxSURQB_-R+Rif7cZcg7PyZ_SS+5hcy5jZA@mail.gmail.com>

Quoting Brendan Higgins (2019-07-16 01:37:34)
> On Tue, Jul 16, 2019 at 12:57 AM Brendan Higgins
> <brendanhiggins@google.com> wrote:
> >
> > A `struct kunit_stream` is usually associated with a message that is
> > being built up over time like maybe an expectation; it is meant to
> > capture the idea that we might want to send some information out to
> > the user pertaining to some thing 'X', but we aren't sure that we
> > actually want to send it until 'X' is complete, but do to the nature
> > of 'X' it is easier to start constructing the message before 'X' is
> > complete.
> >
> > Consider a complicated expectation, there might be multiple conditions
> > that satisfy it and multiple conditions which could make it fail. As
> > we start exploring the input to the expectation we gain information
> > that we might want to share back with the user if the expectation were
> > to fail and we might get that information before we are actually sure
> > that the expectation does indeed fail.
> >
> > When we first step into the expectation we immediately know the
> > function name, file name, and line number where we are called and
> > would want to put that information into any message we would send to
> > the user about this expectation. Next, we might want to check a
> > property of the input, it may or may not be enough information on its
> > own for the expectation to fail, but we want to share the result of
> > the property check with the user regardless, BUT only if the
> > expectation as a whole fails.
> >
> > Hence, we can have multiple `struct kunit_stream`s associated with a
> > `struct kunit` active at any given time.

I'm coming back to this now after reading the rest of the patches that
deal with assertions and expectations. It looks like the string stream
is there to hold a few different pieces of information:

 - Line Number
 - File Name
 - Function Name

The above items could be stored in a structure on the stack that then
gets printed and formatted when the expectation or assertion fails. That
would make the whole string stream structure and code unnecessary.

The only hypothetical case where this can't be done is a complicated
assertion or expectation that does more than one check and can't be
written as a function that dumps out what went wrong. Is this a real
problem? Maybe such an assertion should just open code that logic so we
don't have to build up a string for all the other simple cases.

It seems far simpler to get rid of the string stream API and just have a
struct for this.

	struct kunit_fail_msg {
		const char *line;
		const char *file;
		const char *func;
		const char *msg;
	};

Then you can have the assertion macros create this on the stack (with
another macro?).

	#define DEFINE_KUNIT_FAIL_MSG(name, _msg) \
		struct kunit_fail_msg name = { \
			.line =  __LINE__, \
			.file = __FILE__, \
			.func = __func__, \
			.msg = _msg, \
		}

Note: I don't know if the __LINE__ above will use the macro location, so
this probably needs another wrapper to put the right line number there.

I don't want to derail this whole series on this topic, but it seems
like a bunch of code is there to construct this same set of information
over and over again into a buffer a little bit at a time and then throw
it away when nothing fails just because we may want to support the case
where we have some unstructured data to inform the user about.

Why not build in the structured part into the framework (i.e. the struct
above) so that it's always there and then add the string building part
later when we have to?

^ permalink raw reply

* [RFC v1 2/4] arm64, kexec: interface preparation for mmu enabled kexec
From: Pavel Tatashin @ 2019-07-16 16:56 UTC (permalink / raw)
  To: pasha.tatashin, jmorris, sashal, ebiederm, kexec, linux-kernel,
	corbet, catalin.marinas, will, linux-doc, linux-arm-kernel
In-Reply-To: <20190716165641.6990-1-pasha.tatashin@soleen.com>

Currently cpu_install_idmap() is used to install page table during kexec
switch over to purgatory. We soon will be using our own page table, that
maps the whole physical range (and might be even more, i.e if new DTB
describes a bigger physical range or mem= parameter limited physical
range in the current kernel).

Make kimage_arch to be always part of arm64.
Add relocate_kern and kexec_pgtable verctors to this struct, as we won't
be able to rely on a single control page anymore.

Copy relocation function in machine_kexec_prepare(), and setup page
table there as well (for now idmap_pg_dir).

Cleanup call to cpu_soft_restart by removing ugly ifdefs. When
kimage->arch.dtb_mem is not set, it is 0 anyway.

Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>
---
 arch/arm64/include/asm/kexec.h    |  5 +--
 arch/arm64/kernel/cpu-reset.h     |  7 ++++-
 arch/arm64/kernel/machine_kexec.c | 52 ++++++++++++++-----------------
 3 files changed, 32 insertions(+), 32 deletions(-)

diff --git a/arch/arm64/include/asm/kexec.h b/arch/arm64/include/asm/kexec.h
index 12a561a54128..ef2d2442b890 100644
--- a/arch/arm64/include/asm/kexec.h
+++ b/arch/arm64/include/asm/kexec.h
@@ -90,14 +90,15 @@ static inline void crash_prepare_suspend(void) {}
 static inline void crash_post_resume(void) {}
 #endif
 
-#ifdef CONFIG_KEXEC_FILE
 #define ARCH_HAS_KIMAGE_ARCH
-
 struct kimage_arch {
 	void *dtb;
 	unsigned long dtb_mem;
+	void  *relocate_kern;
+	pgd_t *kexec_pgtable;
 };
 
+#ifdef CONFIG_KEXEC_FILE
 extern const struct kexec_file_ops kexec_image_ops;
 
 struct kimage;
diff --git a/arch/arm64/kernel/cpu-reset.h b/arch/arm64/kernel/cpu-reset.h
index ed50e9587ad8..c795811587f0 100644
--- a/arch/arm64/kernel/cpu-reset.h
+++ b/arch/arm64/kernel/cpu-reset.h
@@ -14,6 +14,7 @@ void __cpu_soft_restart(unsigned long el2_switch, unsigned long entry,
 	unsigned long arg0, unsigned long arg1, unsigned long arg2);
 
 static inline void __noreturn cpu_soft_restart(unsigned long entry,
+					       pgd_t *kexec_pgtable,
 					       unsigned long arg0,
 					       unsigned long arg1,
 					       unsigned long arg2)
@@ -24,7 +25,11 @@ static inline void __noreturn cpu_soft_restart(unsigned long entry,
 		is_hyp_mode_available();
 	restart = (void *)__pa_symbol(__cpu_soft_restart);
 
-	cpu_install_idmap();
+	cpu_set_reserved_ttbr0();
+	local_flush_tlb_all();
+	write_sysreg(phys_to_ttbr(virt_to_phys(kexec_pgtable)), ttbr0_el1);
+	isb();
+
 	restart(el2_switch, entry, arg0, arg1, arg2);
 	unreachable();
 }
diff --git a/arch/arm64/kernel/machine_kexec.c b/arch/arm64/kernel/machine_kexec.c
index 0df8493624e0..f4565eb01d09 100644
--- a/arch/arm64/kernel/machine_kexec.c
+++ b/arch/arm64/kernel/machine_kexec.c
@@ -42,6 +42,8 @@ static void _kexec_image_info(const char *func, int line,
 	pr_debug("    start:       %lx\n", kimage->start);
 	pr_debug("    head:        %lx\n", kimage->head);
 	pr_debug("    nr_segments: %lu\n", kimage->nr_segments);
+	pr_debug("    arch.kexec_pgtable: %p\n", kimage->arch.kexec_pgtable);
+	pr_debug("    arch.relocate_kern: %p\n", kimage->arch.relocate_kern);
 
 	for (i = 0; i < kimage->nr_segments; i++) {
 		pr_debug("      segment[%lu]: %016lx - %016lx, 0x%lx bytes, %lu pages\n",
@@ -67,13 +69,24 @@ void machine_kexec_cleanup(struct kimage *kimage)
  */
 int machine_kexec_prepare(struct kimage *kimage)
 {
-	kexec_image_info(kimage);
+	void *reloc_buf = page_address(kimage->control_code_page);
 
 	if (kimage->type != KEXEC_TYPE_CRASH && cpus_are_stuck_in_kernel()) {
 		pr_err("Can't kexec: CPUs are stuck in the kernel.\n");
 		return -EBUSY;
 	}
 
+	/*
+	 * Copy arm64_relocate_new_kernel to the buffer for use after the kernel
+	 * is shut down.
+	 */
+	memcpy(reloc_buf, arm64_relocate_new_kernel,
+	       arm64_relocate_new_kernel_size);
+
+	kimage->arch.relocate_kern = reloc_buf;
+	kimage->arch.kexec_pgtable = lm_alias(idmap_pg_dir);
+	kexec_image_info(kimage);
+
 	return 0;
 }
 
@@ -143,8 +156,6 @@ static void kexec_segment_flush(const struct kimage *kimage)
  */
 void machine_kexec(struct kimage *kimage)
 {
-	phys_addr_t reboot_code_buffer_phys;
-	void *reboot_code_buffer;
 	bool in_kexec_crash = (kimage == kexec_crash_image);
 	bool stuck_cpus = cpus_are_stuck_in_kernel();
 
@@ -155,32 +166,17 @@ void machine_kexec(struct kimage *kimage)
 	WARN(in_kexec_crash && (stuck_cpus || smp_crash_stop_failed()),
 		"Some CPUs may be stale, kdump will be unreliable.\n");
 
-	reboot_code_buffer_phys = page_to_phys(kimage->control_code_page);
-	reboot_code_buffer = phys_to_virt(reboot_code_buffer_phys);
-
 	kexec_image_info(kimage);
-
-	pr_debug("%s:%d: control_code_page:        %p\n", __func__, __LINE__,
-		kimage->control_code_page);
-	pr_debug("%s:%d: reboot_code_buffer_phys:  %pa\n", __func__, __LINE__,
-		&reboot_code_buffer_phys);
-	pr_debug("%s:%d: reboot_code_buffer:       %p\n", __func__, __LINE__,
-		reboot_code_buffer);
 	pr_debug("%s:%d: relocate_new_kernel:      %p\n", __func__, __LINE__,
 		arm64_relocate_new_kernel);
 	pr_debug("%s:%d: relocate_new_kernel_size: 0x%lx(%lu) bytes\n",
 		__func__, __LINE__, arm64_relocate_new_kernel_size,
 		arm64_relocate_new_kernel_size);
 
-	/*
-	 * Copy arm64_relocate_new_kernel to the reboot_code_buffer for use
-	 * after the kernel is shut down.
-	 */
-	memcpy(reboot_code_buffer, arm64_relocate_new_kernel,
-		arm64_relocate_new_kernel_size);
 
-	/* Flush the reboot_code_buffer in preparation for its execution. */
-	__flush_dcache_area(reboot_code_buffer, arm64_relocate_new_kernel_size);
+	/* Flush the relocate_kern in preparation for its execution. */
+	__flush_dcache_area(kimage->arch.relocate_kern,
+			    arm64_relocate_new_kernel_size);
 
 	/*
 	 * Although we've killed off the secondary CPUs, we don't update
@@ -188,7 +184,7 @@ void machine_kexec(struct kimage *kimage)
 	 * need to avoid flush_icache_range(), which will attempt to IPI
 	 * the offline CPUs. Therefore, we must use the __* variant here.
 	 */
-	__flush_icache_range((uintptr_t)reboot_code_buffer,
+	__flush_icache_range((uintptr_t)kimage->arch.relocate_kern,
 			     arm64_relocate_new_kernel_size);
 
 	/* Flush the kimage list and its buffers. */
@@ -204,7 +200,7 @@ void machine_kexec(struct kimage *kimage)
 
 	/*
 	 * cpu_soft_restart will shutdown the MMU, disable data caches, then
-	 * transfer control to the reboot_code_buffer which contains a copy of
+	 * transfer control to the relocate_kern which contains a copy of
 	 * the arm64_relocate_new_kernel routine.  arm64_relocate_new_kernel
 	 * uses physical addressing to relocate the new image to its final
 	 * position and transfers control to the image entry point when the
@@ -214,12 +210,10 @@ void machine_kexec(struct kimage *kimage)
 	 * userspace (kexec-tools).
 	 * In kexec_file case, the kernel starts directly without purgatory.
 	 */
-	cpu_soft_restart(reboot_code_buffer_phys, kimage->head, kimage->start,
-#ifdef CONFIG_KEXEC_FILE
-						kimage->arch.dtb_mem);
-#else
-						0);
-#endif
+	cpu_soft_restart(__pa(kimage->arch.relocate_kern),
+			 kimage->arch.kexec_pgtable,
+			 kimage->head, kimage->start,
+			 kimage->arch.dtb_mem);
 
 	BUG(); /* Should never get here. */
 }
-- 
2.22.0


^ permalink raw reply related

* [RFC v1 3/4] arm64, kexec: add kexec's own identity page table
From: Pavel Tatashin @ 2019-07-16 16:56 UTC (permalink / raw)
  To: pasha.tatashin, jmorris, sashal, ebiederm, kexec, linux-kernel,
	corbet, catalin.marinas, will, linux-doc, linux-arm-kernel
In-Reply-To: <20190716165641.6990-1-pasha.tatashin@soleen.com>

Allocate and configure identity page table to be used for kexec reboot.
Note, for now we still have MMU disabled during kernel relocation phase,
so this table is still used the same as idmap_pg_dir was used.

Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>
---
 arch/arm64/kernel/machine_kexec.c | 78 ++++++++++++++++++++++++++++++-
 1 file changed, 76 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/kernel/machine_kexec.c b/arch/arm64/kernel/machine_kexec.c
index f4565eb01d09..60433c264178 100644
--- a/arch/arm64/kernel/machine_kexec.c
+++ b/arch/arm64/kernel/machine_kexec.c
@@ -12,6 +12,7 @@
 #include <linux/kexec.h>
 #include <linux/page-flags.h>
 #include <linux/smp.h>
+#include <linux/memblock.h>
 
 #include <asm/cacheflush.h>
 #include <asm/cpu_ops.h>
@@ -20,6 +21,7 @@
 #include <asm/mmu.h>
 #include <asm/mmu_context.h>
 #include <asm/page.h>
+#include <asm/ident_map.h>
 
 #include "cpu-reset.h"
 
@@ -55,6 +57,77 @@ static void _kexec_image_info(const char *func, int line,
 	}
 }
 
+/* Allocates pages for kexec page table */
+static void *kexec_pgtable_alloc(void *arg)
+{
+	struct kimage *kimage = (struct kimage *)arg;
+	struct page *page = kimage_alloc_control_pages(kimage, 0);
+
+	if (!page)
+		return NULL;
+
+	return page_address(page);
+}
+
+/*
+ * Create identity mapped page table for kexec purposes. The flags that are used
+ * in this page table are the same as what is set in __create_page_tables. The
+ * page table is needed for performance reasons. Without it, kernel relocation
+ * is rather slow, because when MMU is off, d-caching is disabled as well.
+ */
+static int
+kexec_create_pgtable(struct kimage *kimage)
+{
+	void *pgd_page = kexec_pgtable_alloc(kimage);
+	phys_addr_t kexec_pgtable;
+	int rv, i;
+	struct memblock_region *reg;
+	struct ident_map_info info = {
+		.alloc_pgt_page	= kexec_pgtable_alloc,
+		.alloc_arg	= kimage,
+		.page_flags	= PMD_SECT_VALID | PMD_SECT_AF | PMD_SECT_S |
+				  PMD_ATTRINDX(MT_NORMAL),
+		.offset		= 0,
+		.pud_pages	= false,
+	};
+
+	if (!pgd_page)
+		return -ENOMEM;
+
+	clear_page(pgd_page);
+	kexec_pgtable = __pa(pgd_page);
+
+	for_each_memblock(memory, reg) {
+		phys_addr_t mstart = reg->base;
+		phys_addr_t mend   = reg->base + reg->size;
+
+		rv = ident_map_pgd_populate(&info, kexec_pgtable, mstart, mend);
+		if (rv)
+			return rv;
+	}
+
+	/*
+	 * It is possible new kernel knows of some physical addresses that this
+	 * kernel does not know: for example a different device tree might
+	 * provide information of a memory region, or memory could have been
+	 * reduced via mem= kernel parameter.
+	 * This is why also unconditionally map new kernel segments, even though
+	 * most likely this is redundant.
+	 */
+	for (i = 0; i < kimage->nr_segments; i++) {
+		phys_addr_t mstart = kimage->segment[i].mem;
+		phys_addr_t mend   = mstart + kimage->segment[i].memsz;
+
+		rv = ident_map_pgd_populate(&info, kexec_pgtable, mstart, mend);
+		if (rv)
+			return rv;
+	}
+
+	kimage->arch.kexec_pgtable = pgd_page;
+
+	return 0;
+}
+
 void machine_kexec_cleanup(struct kimage *kimage)
 {
 	/* Empty routine needed to avoid build errors. */
@@ -70,6 +143,7 @@ void machine_kexec_cleanup(struct kimage *kimage)
 int machine_kexec_prepare(struct kimage *kimage)
 {
 	void *reloc_buf = page_address(kimage->control_code_page);
+	int rv;
 
 	if (kimage->type != KEXEC_TYPE_CRASH && cpus_are_stuck_in_kernel()) {
 		pr_err("Can't kexec: CPUs are stuck in the kernel.\n");
@@ -84,10 +158,10 @@ int machine_kexec_prepare(struct kimage *kimage)
 	       arm64_relocate_new_kernel_size);
 
 	kimage->arch.relocate_kern = reloc_buf;
-	kimage->arch.kexec_pgtable = lm_alias(idmap_pg_dir);
+	rv = kexec_create_pgtable(kimage);
 	kexec_image_info(kimage);
 
-	return 0;
+	return rv;
 }
 
 /**
-- 
2.22.0


^ permalink raw reply related

* [RFC v1 1/4] arm64, mm: identity mapped page table
From: Pavel Tatashin @ 2019-07-16 16:56 UTC (permalink / raw)
  To: pasha.tatashin, jmorris, sashal, ebiederm, kexec, linux-kernel,
	corbet, catalin.marinas, will, linux-doc, linux-arm-kernel
In-Reply-To: <20190716165641.6990-1-pasha.tatashin@soleen.com>

Created identiy mapped page table that maps 1 to 1 virtual to physical
addresses.

Similarly to x86, this table can be used in kasan, hibernate, and kexec.

Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>
---
 arch/arm64/include/asm/ident_map.h | 26 ++++++++
 arch/arm64/mm/Makefile             |  1 +
 arch/arm64/mm/ident_map.c          | 99 ++++++++++++++++++++++++++++++
 3 files changed, 126 insertions(+)
 create mode 100644 arch/arm64/include/asm/ident_map.h
 create mode 100644 arch/arm64/mm/ident_map.c

diff --git a/arch/arm64/include/asm/ident_map.h b/arch/arm64/include/asm/ident_map.h
new file mode 100644
index 000000000000..1bb9fcd27368
--- /dev/null
+++ b/arch/arm64/include/asm/ident_map.h
@@ -0,0 +1,26 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (c) 2019, Microsoft Corporation.
+ * Pavel Tatashin <patatash@linux.microsoft.com>
+ */
+
+#ifndef _ASM_IDENT_MAP_H
+#define _ASM_IDENT_MAP_H
+
+#include <linux/types.h>
+#include <asm/pgtable.h>
+
+struct ident_map_info {
+	void * (*alloc_pgt_page)(void *);	/* allocate a page  */
+	void *alloc_arg;			/* arg. for alloc_pgt_page */
+	unsigned long page_flags;		/* PMD or PUD flags */
+	unsigned long offset;			/* ident mapping offset */
+	bool pud_pages;				/* PUD level huge pages */
+};
+
+int ident_map_pgd_populate(struct ident_map_info *info,
+			   phys_addr_t pgd_page,
+			   phys_addr_t addr,
+			   phys_addr_t end);
+
+#endif /* _ASM_ARM64_IDENT_MAP_H */
diff --git a/arch/arm64/mm/Makefile b/arch/arm64/mm/Makefile
index 849c1df3d214..dfa5a074a360 100644
--- a/arch/arm64/mm/Makefile
+++ b/arch/arm64/mm/Makefile
@@ -5,6 +5,7 @@ obj-y				:= dma-mapping.o extable.o fault.o init.o \
 				   context.o proc.o pageattr.o
 obj-$(CONFIG_HUGETLB_PAGE)	+= hugetlbpage.o
 obj-$(CONFIG_ARM64_PTDUMP_CORE)	+= dump.o
+obj-$(CONFIG_KEXEC_CORE)	+= ident_map.o
 obj-$(CONFIG_ARM64_PTDUMP_DEBUGFS)	+= ptdump_debugfs.o
 obj-$(CONFIG_NUMA)		+= numa.o
 obj-$(CONFIG_DEBUG_VIRTUAL)	+= physaddr.o
diff --git a/arch/arm64/mm/ident_map.c b/arch/arm64/mm/ident_map.c
new file mode 100644
index 000000000000..bcfff5e2573b
--- /dev/null
+++ b/arch/arm64/mm/ident_map.c
@@ -0,0 +1,99 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2019, Microsoft Corporation.
+ * Pavel Tatashin <patatash@linux.microsoft.com>
+ */
+
+#include <asm/ident_map.h>
+#include <asm/pgalloc.h>
+
+/* Initialize PMD size huge entries in page table */
+static void ident_map_pmd_init(struct ident_map_info *info,
+			       phys_addr_t pmd_page, phys_addr_t addr,
+			       phys_addr_t end)
+{
+	const unsigned long flags = info->page_flags;
+	const unsigned long offset = info->offset;
+	pmd_t *pmdp = (pmd_t *)__va(pmd_page) + pmd_index(addr);
+
+	addr &= PMD_MASK;
+	for (; addr < end; addr += PMD_SIZE, pmdp++) {
+		set_pmd(pmdp, __pmd(__phys_to_pmd_val(addr - offset) | flags));
+	}
+}
+
+/* Initialize PUD size huge entries in page table */
+static void ident_map_pud_init(struct ident_map_info *info,
+			       phys_addr_t pud_page, phys_addr_t addr,
+			       phys_addr_t end)
+{
+	const unsigned long flags = info->page_flags;
+	const unsigned long offset = info->offset;
+	pud_t *pudp = (pud_t *)__va(pud_page) + pud_index(addr);
+
+	addr &= PUD_MASK;
+	for (; addr < end; addr += PUD_SIZE, pudp++) {
+		set_pud(pudp, __pud(__phys_to_pud_val(addr - offset) | flags));
+	}
+}
+
+/* Populate PUD level with PMD entries */
+static int ident_map_pud_populate(struct ident_map_info *info,
+				  phys_addr_t pud_page, phys_addr_t addr,
+				  phys_addr_t end)
+{
+	pud_t *pudp = (pud_t *)__va(pud_page) + pud_index(addr);
+	phys_addr_t pmd_page, next;
+
+	for (; addr < end; addr = next, pudp++) {
+		next = pud_addr_end(addr, end);
+		if (pud_none(*pudp)) {
+			void *pmd = info->alloc_pgt_page(info->alloc_arg);
+
+			if (!pmd)
+				return -ENOMEM;
+
+			clear_page(pmd);
+			__pud_populate(pudp, __pa(pmd), PUD_TYPE_TABLE);
+		}
+		pmd_page = __pud_to_phys(*pudp);
+		ident_map_pmd_init(info, pmd_page, addr, next);
+	}
+
+	return 0;
+}
+
+/* Populate identify mapped page table with physical range[addr, end) */
+int ident_map_pgd_populate(struct ident_map_info *info,
+			   phys_addr_t pgd_page, phys_addr_t addr,
+			   phys_addr_t end)
+{
+	const bool pud_pages = info->pud_pages;
+	pgd_t *pgdp = (pgd_t *)__va(pgd_page) + pgd_index(addr);
+	phys_addr_t pud_page, next;
+
+	for (; addr < end; addr = next, pgdp++) {
+		next = pgd_addr_end(addr, end);
+		if (pgd_none(*pgdp)) {
+			void *pud = info->alloc_pgt_page(info->alloc_arg);
+
+			if (!pud)
+				return -ENOMEM;
+
+			clear_page(pud);
+			__pgd_populate(pgdp, __pa(pud), PUD_TYPE_TABLE);
+		}
+		pud_page = __pgd_to_phys(*pgdp);
+		if (pud_pages) {
+			ident_map_pud_init(info, pud_page, addr, next);
+		} else {
+			int rv = ident_map_pud_populate(info, pud_page, addr,
+				 next);
+
+			if (rv)
+				return rv;
+		}
+	}
+
+	return 0;
+}
-- 
2.22.0


^ permalink raw reply related

* [RFC v1 4/4] arm64: Keep MMU on while kernel is being relocated
From: Pavel Tatashin @ 2019-07-16 16:56 UTC (permalink / raw)
  To: pasha.tatashin, jmorris, sashal, ebiederm, kexec, linux-kernel,
	corbet, catalin.marinas, will, linux-doc, linux-arm-kernel
In-Reply-To: <20190716165641.6990-1-pasha.tatashin@soleen.com>

It is inefficient to do kernel relocation with MMU disabled. This is
because if MMU is disabled,  dcache must also be disabled.

Now, that we have identity page table we can disable MMU after relocation
is completed.

Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>
---
 arch/arm64/kernel/cpu-reset.S       |  8 -------
 arch/arm64/kernel/relocate_kernel.S | 36 ++++++++++++++++++-----------
 2 files changed, 22 insertions(+), 22 deletions(-)

diff --git a/arch/arm64/kernel/cpu-reset.S b/arch/arm64/kernel/cpu-reset.S
index 6ea337d464c4..d5cfc17b8e1f 100644
--- a/arch/arm64/kernel/cpu-reset.S
+++ b/arch/arm64/kernel/cpu-reset.S
@@ -30,14 +30,6 @@
  * flat identity mapping.
  */
 ENTRY(__cpu_soft_restart)
-	/* Clear sctlr_el1 flags. */
-	mrs	x12, sctlr_el1
-	ldr	x13, =SCTLR_ELx_FLAGS
-	bic	x12, x12, x13
-	pre_disable_mmu_workaround
-	msr	sctlr_el1, x12
-	isb
-
 	cbz	x0, 1f				// el2_switch?
 	mov	x0, #HVC_SOFT_RESTART
 	hvc	#0				// no return
diff --git a/arch/arm64/kernel/relocate_kernel.S b/arch/arm64/kernel/relocate_kernel.S
index c1d7db71a726..e2724fedd082 100644
--- a/arch/arm64/kernel/relocate_kernel.S
+++ b/arch/arm64/kernel/relocate_kernel.S
@@ -36,18 +36,6 @@ ENTRY(arm64_relocate_new_kernel)
 	mov	x14, xzr			/* x14 = entry ptr */
 	mov	x13, xzr			/* x13 = copy dest */
 
-	/* Clear the sctlr_el2 flags. */
-	mrs	x0, CurrentEL
-	cmp	x0, #CurrentEL_EL2
-	b.ne	1f
-	mrs	x0, sctlr_el2
-	ldr	x1, =SCTLR_ELx_FLAGS
-	bic	x0, x0, x1
-	pre_disable_mmu_workaround
-	msr	sctlr_el2, x0
-	isb
-1:
-
 	/* Check if the new image needs relocation. */
 	tbnz	x16, IND_DONE_BIT, .Ldone
 
@@ -63,10 +51,10 @@ ENTRY(arm64_relocate_new_kernel)
 	add     x20, x0, #PAGE_SIZE
 	sub     x1, x15, #1
 	bic     x0, x0, x1
-2:	dc      ivac, x0
+1:	dc      ivac, x0
 	add     x0, x0, x15
 	cmp     x0, x20
-	b.lo    2b
+	b.lo    1b
 	dsb     sy
 
 	mov x20, x13
@@ -104,6 +92,26 @@ ENTRY(arm64_relocate_new_kernel)
 	dsb	nsh
 	isb
 
+	/* Clear sctlr_el1 flags. */
+	mrs	x12, sctlr_el1
+	ldr	x13, =SCTLR_ELx_FLAGS
+	bic	x12, x12, x13
+	pre_disable_mmu_workaround
+	msr	sctlr_el1, x12
+	isb
+
+	/* Clear the sctlr_el2 flags. */
+	mrs	x0, CurrentEL
+	cmp	x0, #CurrentEL_EL2
+	b.ne	2f
+	mrs	x0, sctlr_el2
+	ldr	x1, =SCTLR_ELx_FLAGS
+	bic	x0, x0, x1
+	pre_disable_mmu_workaround
+	msr	sctlr_el2, x0
+	isb
+2:
+
 	/* Start new image. */
 	mov	x0, x18
 	mov	x1, xzr
-- 
2.22.0


^ permalink raw reply related

* [RFC v1 0/4] arm64: MMU enabled kexec kernel relocation
From: Pavel Tatashin @ 2019-07-16 16:56 UTC (permalink / raw)
  To: pasha.tatashin, jmorris, sashal, ebiederm, kexec, linux-kernel,
	corbet, catalin.marinas, will, linux-doc, linux-arm-kernel

Added identity mapped page table, and keep MMU enabled while
kernel is being relocated from sparse pages to the final
destination during kexec.

More description about the problem I am trying to solve here, can be
found here:
https://lore.kernel.org/lkml/20190709182014.16052-1-pasha.tatashin@soleen.com/

This patch series works in terms, that I can kexec-reboot both in QEMU
and on a physical machine. However, I do not see performance improvement
during relocation. The performance is just as slow as before with disabled
caches.

Am I missing something? Perhaps, there is some flag that I should also
enable in page table? Please provide me with any suggestions.

Pavel Tatashin (4):
  arm64, mm: identity mapped page table
  arm64, kexec: interface preparation for mmu enabled kexec
  arm64, kexec: add kexec's own identity page table
  arm64: Keep MMU on while kernel is being relocated

 arch/arm64/include/asm/ident_map.h  |  26 ++++++
 arch/arm64/include/asm/kexec.h      |   5 +-
 arch/arm64/kernel/cpu-reset.S       |   8 --
 arch/arm64/kernel/cpu-reset.h       |   7 +-
 arch/arm64/kernel/machine_kexec.c   | 128 +++++++++++++++++++++-------
 arch/arm64/kernel/relocate_kernel.S |  36 +++++---
 arch/arm64/mm/Makefile              |   1 +
 arch/arm64/mm/ident_map.c           |  99 +++++++++++++++++++++
 8 files changed, 255 insertions(+), 55 deletions(-)
 create mode 100644 arch/arm64/include/asm/ident_map.h
 create mode 100644 arch/arm64/mm/ident_map.c

-- 
2.22.0


^ permalink raw reply

* Re: [PATCH 03/14] docs: fix broken doc references due to renames
From: Jerry Hoemann @ 2019-07-16 16:18 UTC (permalink / raw)
  To: Mauro Carvalho Chehab; +Cc: linux-watchdog, linux-doc
In-Reply-To: <aa415583bf6b812b0249093a601aa31412f3a1cf.1563277838.git.mchehab+samsung@kernel.org>

On Tue, Jul 16, 2019 at 09:10:42AM -0300, Mauro Carvalho Chehab wrote:
> Some files got renamed but probably due to some merge conflicts,
> a few references still point to the old locations.
> 
> Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
> ---
>  Documentation/RCU/rculist_nulls.txt                   | 2 +-
>  Documentation/devicetree/bindings/arm/idle-states.txt | 2 +-
>  Documentation/locking/spinlocks.rst                   | 4 ++--
>  Documentation/memory-barriers.txt                     | 2 +-
>  Documentation/translations/ko_KR/memory-barriers.txt  | 2 +-
>  Documentation/watchdog/hpwdt.rst                      | 2 +-

For: Documentation/watchdog/hpwdt.rst

Reviewed-by: Jerry Hoemann <jerry.hoemann@hpe.com>


>  MAINTAINERS                                           | 8 ++++----
>  drivers/gpu/drm/drm_modes.c                           | 2 +-
>  drivers/i2c/busses/i2c-nvidia-gpu.c                   | 2 +-
>  drivers/scsi/hpsa.c                                   | 4 ++--
>  10 files changed, 15 insertions(+), 15 deletions(-)
> 
> diff --git a/Documentation/RCU/rculist_nulls.txt b/Documentation/RCU/rculist_nulls.txt
> index 8151f0195f76..23f115dc87cf 100644
> --- a/Documentation/RCU/rculist_nulls.txt
> +++ b/Documentation/RCU/rculist_nulls.txt
> @@ -1,7 +1,7 @@
>  Using hlist_nulls to protect read-mostly linked lists and
>  objects using SLAB_TYPESAFE_BY_RCU allocations.
>  
> -Please read the basics in Documentation/RCU/listRCU.txt
> +Please read the basics in Documentation/RCU/listRCU.rst
>  
>  Using special makers (called 'nulls') is a convenient way
>  to solve following problem :
> diff --git a/Documentation/devicetree/bindings/arm/idle-states.txt b/Documentation/devicetree/bindings/arm/idle-states.txt
> index 326f29b270ad..2d325bed37e5 100644
> --- a/Documentation/devicetree/bindings/arm/idle-states.txt
> +++ b/Documentation/devicetree/bindings/arm/idle-states.txt
> @@ -703,4 +703,4 @@ cpus {
>      https://www.devicetree.org/specifications/ 
>  
>  [6] ARM Linux Kernel documentation - Booting AArch64 Linux
> -    Documentation/arm64/booting.txt
> +    Documentation/arm64/booting.rst
> diff --git a/Documentation/locking/spinlocks.rst b/Documentation/locking/spinlocks.rst
> index 098107fb7d86..e93ec6645238 100644
> --- a/Documentation/locking/spinlocks.rst
> +++ b/Documentation/locking/spinlocks.rst
> @@ -82,7 +82,7 @@ itself.  The read lock allows many concurrent readers.  Anything that
>  **changes** the list will have to get the write lock.
>  
>     NOTE! RCU is better for list traversal, but requires careful
> -   attention to design detail (see Documentation/RCU/listRCU.txt).
> +   attention to design detail (see Documentation/RCU/listRCU.rst).
>  
>  Also, you cannot "upgrade" a read-lock to a write-lock, so if you at _any_
>  time need to do any changes (even if you don't do it every time), you have
> @@ -90,7 +90,7 @@ to get the write-lock at the very beginning.
>  
>     NOTE! We are working hard to remove reader-writer spinlocks in most
>     cases, so please don't add a new one without consensus.  (Instead, see
> -   Documentation/RCU/rcu.txt for complete information.)
> +   Documentation/RCU/rcu.rst for complete information.)
>  
>  ----
>  
> diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
> index 045bb8148fe9..1adbb8a371c7 100644
> --- a/Documentation/memory-barriers.txt
> +++ b/Documentation/memory-barriers.txt
> @@ -548,7 +548,7 @@ There are certain things that the Linux kernel memory barriers do not guarantee:
>  
>  	[*] For information on bus mastering DMA and coherency please read:
>  
> -	    Documentation/PCI/pci.rst
> +	    Documentation/driver-api/pci/pci.rst
>  	    Documentation/DMA-API-HOWTO.txt
>  	    Documentation/DMA-API.txt
>  
> diff --git a/Documentation/translations/ko_KR/memory-barriers.txt b/Documentation/translations/ko_KR/memory-barriers.txt
> index a33c2a536542..2774624ee843 100644
> --- a/Documentation/translations/ko_KR/memory-barriers.txt
> +++ b/Documentation/translations/ko_KR/memory-barriers.txt
> @@ -569,7 +569,7 @@ ACQUIRE 는 해당 오퍼레이션의 로드 부분에만 적용되고 RELEASE 
>  
>  	[*] 버스 마스터링 DMA 와 일관성에 대해서는 다음을 참고하시기 바랍니다:
>  
> -	    Documentation/PCI/pci.rst
> +	    Documentation/driver-api/pci/pci.rst
>  	    Documentation/DMA-API-HOWTO.txt
>  	    Documentation/DMA-API.txt
>  
> diff --git a/Documentation/watchdog/hpwdt.rst b/Documentation/watchdog/hpwdt.rst
> index 94a96371113e..49c647dba8aa 100644
> --- a/Documentation/watchdog/hpwdt.rst
> +++ b/Documentation/watchdog/hpwdt.rst
> @@ -59,7 +59,7 @@ Last reviewed: 08/20/2018
>   and loop forever.  This is generally not what a watchdog user wants.
>  
>   For those wishing to learn more please see:
> -	Documentation/kdump/kdump.rst
> +	Documentation/admin-guide/kdump/kdump.rst
>  	Documentation/admin-guide/kernel-parameters.txt (panic=)
>  	Your Linux Distribution specific documentation.
>  
> diff --git a/MAINTAINERS b/MAINTAINERS
> index b3a5c72f3298..b0acc138e9e9 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -899,7 +899,7 @@ L:	linux-iio@vger.kernel.org
>  W:	http://ez.analog.com/community/linux-device-drivers 
>  S:	Supported
>  F:	drivers/iio/adc/ad7124.c
> -F:	Documentation/devicetree/bindings/iio/adc/adi,ad7124.txt
> +F:	Documentation/devicetree/bindings/iio/adc/adi,ad7124.yaml
>  
>  ANALOG DEVICES INC AD7606 DRIVER
>  M:	Stefan Popa <stefan.popa@analog.com>
> @@ -4189,7 +4189,7 @@ M:	Jens Axboe <axboe@kernel.dk>
>  L:	cgroups@vger.kernel.org
>  L:	linux-block@vger.kernel.org
>  T:	git git://git.kernel.dk/linux-block
> -F:	Documentation/cgroup-v1/blkio-controller.rst
> +F:	Documentation/admin-guide/cgroup-v1/blkio-controller.rst
>  F:	block/blk-cgroup.c
>  F:	include/linux/blk-cgroup.h
>  F:	block/blk-throttle.c
> @@ -6848,7 +6848,7 @@ R:	Sagi Shahar <sagis@google.com>
>  R:	Jon Olson <jonolson@google.com>
>  L:	netdev@vger.kernel.org
>  S:	Supported
> -F:	Documentation/networking/device_drivers/google/gve.txt
> +F:	Documentation/networking/device_drivers/google/gve.rst
>  F:	drivers/net/ethernet/google
>  
>  GPD POCKET FAN DRIVER
> @@ -12096,7 +12096,7 @@ M:	Juergen Gross <jgross@suse.com>
>  M:	Alok Kataria <akataria@vmware.com>
>  L:	virtualization@lists.linux-foundation.org
>  S:	Supported
> -F:	Documentation/virtual/paravirt_ops.txt
> +F:	Documentation/virtual/paravirt_ops.rst
>  F:	arch/*/kernel/paravirt*
>  F:	arch/*/include/asm/paravirt*.h
>  F:	include/linux/hypervisor.h
> diff --git a/drivers/gpu/drm/drm_modes.c b/drivers/gpu/drm/drm_modes.c
> index 57e6408288c8..4645af681ef8 100644
> --- a/drivers/gpu/drm/drm_modes.c
> +++ b/drivers/gpu/drm/drm_modes.c
> @@ -1680,7 +1680,7 @@ static int drm_mode_parse_cmdline_options(char *str, size_t len,
>   *
>   * Additionals options can be provided following the mode, using a comma to
>   * separate each option. Valid options can be found in
> - * Documentation/fb/modedb.txt.
> + * Documentation/fb/modedb.rst.
>   *
>   * The intermediate drm_cmdline_mode structure is required to store additional
>   * options from the command line modline like the force-enable/disable flag.
> diff --git a/drivers/i2c/busses/i2c-nvidia-gpu.c b/drivers/i2c/busses/i2c-nvidia-gpu.c
> index cfc76b5de726..5a1235fd86bb 100644
> --- a/drivers/i2c/busses/i2c-nvidia-gpu.c
> +++ b/drivers/i2c/busses/i2c-nvidia-gpu.c
> @@ -364,7 +364,7 @@ static void gpu_i2c_remove(struct pci_dev *pdev)
>  /*
>   * We need gpu_i2c_suspend() even if it is stub, for runtime pm to work
>   * correctly. Without it, lspci shows runtime pm status as "D0" for the card.
> - * Documentation/power/pci.txt also insists for driver to provide this.
> + * Documentation/power/pci.rst also insists for driver to provide this.
>   */
>  static __maybe_unused int gpu_i2c_suspend(struct device *dev)
>  {
> diff --git a/drivers/scsi/hpsa.c b/drivers/scsi/hpsa.c
> index 43a6b5350775..eaf6177ac9ee 100644
> --- a/drivers/scsi/hpsa.c
> +++ b/drivers/scsi/hpsa.c
> @@ -7798,7 +7798,7 @@ static void hpsa_free_pci_init(struct ctlr_info *h)
>  	hpsa_disable_interrupt_mode(h);		/* pci_init 2 */
>  	/*
>  	 * call pci_disable_device before pci_release_regions per
> -	 * Documentation/PCI/pci.rst
> +	 * Documentation/driver-api/pci/pci.rst
>  	 */
>  	pci_disable_device(h->pdev);		/* pci_init 1 */
>  	pci_release_regions(h->pdev);		/* pci_init 2 */
> @@ -7881,7 +7881,7 @@ static int hpsa_pci_init(struct ctlr_info *h)
>  clean1:
>  	/*
>  	 * call pci_disable_device before pci_release_regions per
> -	 * Documentation/PCI/pci.rst
> +	 * Documentation/driver-api/pci/pci.rst
>  	 */
>  	pci_disable_device(h->pdev);
>  	pci_release_regions(h->pdev);
> -- 
> 2.21.0

-- 

-----------------------------------------------------------------------------
Jerry Hoemann                  Software Engineer   Hewlett Packard Enterprise
-----------------------------------------------------------------------------

^ permalink raw reply

* Re: [PATCH v9 03/18] kunit: test: add string_stream a std::stream like string builder
From: Stephen Boyd @ 2019-07-16 15:33 UTC (permalink / raw)
  To: Brendan Higgins
  Cc: Frank Rowand, Greg KH, Josh Poimboeuf, Kees Cook, Kieran Bingham,
	Luis Chamberlain, Peter Zijlstra, Rob Herring, shuah,
	Theodore Ts'o, Masahiro Yamada, devicetree, dri-devel,
	kunit-dev, open list:DOCUMENTATION, linux-fsdevel, linux-kbuild,
	Linux Kernel Mailing List, open list:KERNEL SELFTEST FRAMEWORK,
	linux-nvdimm, linux-um, Sasha Levin, Bird, Timothy,
	Amir Goldstein, Dan Carpenter, Daniel Vetter, Jeff Dike,
	Joel Stanley, Julia Lawall, Kevin Hilman, Knut Omang,
	Logan Gunthorpe, Michael Ellerman, Petr Mladek, Randy Dunlap,
	Richard Weinberger, David Rientjes, Steven Rostedt, wfg
In-Reply-To: <CAFd5g47y4vDB2P=EsGN8305LGeQPCTveNs-Jd5-=6K-XKY==CA@mail.gmail.com>

Quoting Brendan Higgins (2019-07-15 15:43:20)
> On Mon, Jul 15, 2019 at 3:11 PM Brendan Higgins
> <brendanhiggins@google.com> wrote:
> >
> > On Mon, Jul 15, 2019 at 3:04 PM Stephen Boyd <sboyd@kernel.org> wrote:
> > >
> > > Quoting Brendan Higgins (2019-07-15 14:11:50)
> > > > On Mon, Jul 15, 2019 at 1:43 PM Stephen Boyd <sboyd@kernel.org> wrote:
> > > > >
> > > > > I also wonder if it would be better to just have a big slop buffer of a
> > > > > 4K page or something so that we almost never have to allocate anything
> > > > > with a string_stream and we can just rely on a reader consuming data
> > > > > while writers are writing. That might work out better, but I don't quite
> > > > > understand the use case for the string stream.
> > > >
> > > > That makes sense, but might that also waste memory since we will
> > > > almost never need that much memory?
> > >
> > > Why do we care? These are unit tests.
> >
> > Agreed.
> >
> > > Having allocations in here makes
> > > things more complicated, whereas it would be simpler to have a pointer
> > > and a spinlock operating on a chunk of memory that gets flushed out
> > > periodically.
> >
> > I am not so sure. I have to have the logic to allocate memory in some
> > case no matter what (what if I need more memory that my preallocated
> > chuck?). I think it is simpler to always request an allocation than to
> > only sometimes request an allocation.
> 
> Another even simpler alternative might be to just allocate memory
> using kunit_kmalloc as we need it and just let the kunit_resource code
> handle cleaning it all up when the test case finishes.

Sure, sounds like a nice way to avoid duplicating similar logic to
maintain a list of things to free later.

> 
> What do you think?

If you go the allocation route then you'll need to have the flags to
know what context you're in to allocate appropriately. Does that mean
all the string operations will now take GFP flags?


^ permalink raw reply

* Re: [PATCH v9 04/18] kunit: test: add kunit_stream a std::stream like logger
From: Stephen Boyd @ 2019-07-16 15:30 UTC (permalink / raw)
  To: Brendan Higgins
  Cc: Frank Rowand, Greg KH, Josh Poimboeuf, Kees Cook, Kieran Bingham,
	Luis Chamberlain, Peter Zijlstra, Rob Herring, shuah,
	Theodore Ts'o, Masahiro Yamada, devicetree, dri-devel,
	kunit-dev, open list:DOCUMENTATION, linux-fsdevel, linux-kbuild,
	Linux Kernel Mailing List, open list:KERNEL SELFTEST FRAMEWORK,
	linux-nvdimm, linux-um, Sasha Levin, Bird, Timothy,
	Amir Goldstein, Dan Carpenter, Daniel Vetter, Jeff Dike,
	Joel Stanley, Julia Lawall, Kevin Hilman, Knut Omang,
	Logan Gunthorpe, Michael Ellerman, Petr Mladek, Randy Dunlap,
	Richard Weinberger, David Rientjes, Steven Rostedt, wfg
In-Reply-To: <CAFd5g44_axVHNMBzxSURQB_-R+Rif7cZcg7PyZ_SS+5hcy5jZA@mail.gmail.com>

Quoting Brendan Higgins (2019-07-16 01:37:34)
> On Tue, Jul 16, 2019 at 12:57 AM Brendan Higgins
> <brendanhiggins@google.com> wrote:
> >
> > On Mon, Jul 15, 2019 at 3:15 PM Stephen Boyd <sboyd@kernel.org> wrote:
> > >
> > > Quoting Brendan Higgins (2019-07-12 01:17:30)
> > > > diff --git a/include/kunit/kunit-stream.h b/include/kunit/kunit-stream.h
> > > > new file mode 100644
> > > > index 0000000000000..a7b53eabf6be4
> > > > --- /dev/null
> > > > +++ b/include/kunit/kunit-stream.h
> > > > +/**
> > > > + * struct kunit_stream - a std::stream style string builder.
> > > > + *
> > > > + * A std::stream style string builder. Allows messages to be built up and
> > > > + * printed all at once.
> > > > + */
> > > > +struct kunit_stream {
> > > > +       /* private: internal use only. */
> > > > +       struct kunit *test;
> > > > +       const char *level;
> > >
> > > Is the level changed? See my comment below, but I wonder if this whole
> > > struct can go away and the wrappers can just operate on 'struct
> > > string_stream' instead.
> >
> > I was inclined to agree with you when I first read your comment, but
> > then I thought about the case that someone wants to add in a debug
> > message (of which I currently have none). I think under most
> > circumstances a user of kunit_stream would likely want to pick a
> > default verbosity that maybe I should provide, but may still want
> > different verbosity levels.
> >
> > The main reason I want to keep the types separate, string_stream vs.
> > kunit_stream, is that they are intended to be used differently.
> > string_stream is just a generic string builder. If you are using that,
> > you are expecting to see someone building the string at some point and
> > then doing something interesting with it. kunit_stream really tells
> > you specifically that KUnit is putting together a message to
> > communicate something to a user of KUnit. It is really used in a very
> > specific way, and I wouldn't want to generalize its usage beyond how
> > it is currently used. I think in order to preserve the author's
> > intention it adds clarity to keep the types separate regardless of how
> > similar they might be in reality.

You may want to add some of these reasons to the commit text.

> > > > +
> > > > +       if (!string_stream_is_empty(stream->internal_stream)) {
> > > > +               kunit_err(stream->test,
> > > > +                         "End of test case reached with uncommitted stream entries\n");
> > > > +               kunit_stream_commit(stream);
> > > > +       }
> > > > +}
> > > > +
> > >
> > > Nitpick: Drop this extra newline.
> >
> > Oops, nice catch.
> 
> Not super important, but I don't want you to think that I am ignoring
> you. I think you must have unintentionally deleted the last function
> in this file, or maybe you are referring to something that I am just
> not seeing, but I don't see the extra newline here.

No worries. Sorry for the noise.

> > property of the input, it may or may not be enough information on its
> > own for the expectation to fail, but we want to share the result of
> > the property check with the user regardless, BUT only if the
> > expectation as a whole fails.
> >
> > Hence, we can have multiple `struct kunit_stream`s associated with a
> > `struct kunit` active at any given time.

Makes sense. I wasn't sure if there were more than one stream associated
with a test. Sounds like there are many to one so it can't just be a
member of the test. This could be documented somewhere so this question
doesn't come up again.


^ permalink raw reply

* Re: [PATCH] tracing/fgraph: support recording function return values
From: Steven Rostedt @ 2019-07-16 15:24 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Changbin Du, Will Deacon, mingo, corbet, linux, catalin.marinas,
	tglx, bp, hpa, x86, linux-doc, linux-kernel, linux-arm-kernel
In-Reply-To: <20190716142005.GE3402@hirez.programming.kicks-ass.net>

On Tue, 16 Jul 2019 16:20:05 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> On Tue, Jul 16, 2019 at 10:08:18PM +0800, Changbin Du wrote:
> > On Mon, Jul 15, 2019 at 12:12:31PM +0200, Peter Zijlstra wrote:  
> 
> > > Alternatively, we can have recordmcount (or objtool) mark all functions
> > > with a return value when the build has DEBUG_INFO on. The dwarves know
> > > the function signature.
> > >  
> > We can extend the recordmcount tool to search 'subprogram' tag in the DIE tree.
> > In below example, the 'DW_AT_type' is the type of function pidfd_create().
> > 
> > $ readelf -w kernel/pid.o
> >  [...]
> >  <1><1b914>: Abbrev Number: 232 (DW_TAG_subprogram)
> >     <1b916>   DW_AT_name        : (indirect string, offset: 0x415e): pidfd_create
> >     <1b91a>   DW_AT_decl_file   : 1
> >     <1b91b>   DW_AT_decl_line   : 471
> >     <1b91d>   DW_AT_decl_column : 12
> >     <1b91e>   DW_AT_prototyped  : 1
> >     <1b91e>   DW_AT_type        : <0xcc>
> >     <1b922>   DW_AT_low_pc      : 0x450
> >     <1b92a>   DW_AT_high_pc     : 0x50
> >     <1b932>   DW_AT_frame_base  : 1 byte block: 9c 	(DW_OP_call_frame_cfa)
> >     <1b934>   DW_AT_GNU_all_call_sites: 1
> >     <1b934>   DW_AT_sibling     : <0x1b9d9>
> >  [...]
> > 
> > To that end, we need to introduce libdw library for recordmcount. I will have a
> > try this week.  
> 
> Right; but only when this config option is set.

Sure, and we can have fgraph support of return values depend on that
option ;-)

> 
> > And probably, we can also record the parameters?  
> 
> The 'fun' part is where to store all this information in the kernel and
> how fast you can find it while tracing.

This has been on my TODO list for a long time (I'm really happy if
someone else would do it!). My thought is that this information would
need to be able to be a module and loaded (like config.gz can be). And
then you can load the info, do the tracing, and then unload it.

For the speed part, we can add a way to hook the function with the
parameters, which shouldn't be an issue, as we already do that when
filtering for function graph. There's a function hash table that fgraph
users have that is tested to see if it should trace the function or
not. And the functions themselves are recorded in a mostly binary array
that can be looked up via a binary search from the ip address.

-- Steve

^ permalink raw reply

* Re: [PATCH 02/12] Documentation/arm: repointer docs to Documentation/arch/arm
From: Krzysztof Kozlowski @ 2019-07-16 14:51 UTC (permalink / raw)
  To: Alex Shi
  Cc: linux-doc, Jonathan Corbet, linux-kernel, linux-stm32,
	linux-arm-kernel, linuxppc-dev, linux-riscv, linux-omap,
	linux-fbdev, linux-samsung-soc@vger.kernel.org, linux-ia64,
	linux-mips, linux-parisc, linux-scsi, linux-s390, kvm, linux-sh,
	Kukjin Kim, linux-crypto, linux-input, linux-serial
In-Reply-To: <20190712022018.27989-2-alex.shi@linux.alibaba.com>

On Fri, 12 Jul 2019 at 04:20, Alex Shi <alex.shi@linux.alibaba.com> wrote:
>
> Since we move 'arm/arm64' docs to Documentation/arch/{arm,arm64} dir,
> redirect the doc pointer to them.
>
> Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com>
> Cc: Jonathan Corbet <corbet@lwn.net>
> Cc: Kukjin Kim <kgene@kernel.org>
> Cc: Krzysztof Kozlowski <krzk@kernel.org>
> Cc: linux-doc@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: linux-samsung-soc@vger.kernel.org
> Cc: linux-crypto@vger.kernel.org
> Cc: linux-input@vger.kernel.org
> Cc: linux-serial@vger.kernel.org
> ---
>  Documentation/arch/arm/Samsung-S3C24XX/GPIO.txt    |  2 +-
>  .../arch/arm/Samsung-S3C24XX/Overview.txt          |  6 +++---
>  Documentation/arch/arm/Samsung/GPIO.txt            |  2 +-
>  Documentation/arch/arm/Samsung/Overview.txt        |  4 ++--
>  Documentation/devicetree/bindings/arm/xen.txt      |  2 +-
>  Documentation/devicetree/booting-without-of.txt    |  4 ++--
>  Documentation/translations/zh_CN/arm/Booting       |  4 ++--
>  .../translations/zh_CN/arm/kernel_user_helpers.txt |  4 ++--
>  MAINTAINERS                                        |  6 +++---

I assume it will go through doc tree, so for Samsung:
Acked-by: Krzysztof Kozlowski <krzk@kernel.org>

Best regards,
Krzysztof

^ permalink raw reply

* [PATCH] docs/vm: transhuge: fix typo in madvise reference
From: Jeremy Cline @ 2019-07-16 14:49 UTC (permalink / raw)
  To: Jonathan Corbet; +Cc: linux-doc, linux-kernel, Jeremy Cline

Fix an off-by-one typo in the transparent huge pages admin
documentation.

Signed-off-by: Jeremy Cline <jcline@redhat.com>
---
 Documentation/admin-guide/mm/transhuge.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
index 7ab93a8404b9..bd5714547cee 100644
--- a/Documentation/admin-guide/mm/transhuge.rst
+++ b/Documentation/admin-guide/mm/transhuge.rst
@@ -53,7 +53,7 @@ disabled, there is ``khugepaged`` daemon that scans memory and
 collapses sequences of basic pages into huge pages.
 
 The THP behaviour is controlled via :ref:`sysfs <thp_sysfs>`
-interface and using madivse(2) and prctl(2) system calls.
+interface and using madvise(2) and prctl(2) system calls.
 
 Transparent Hugepage Support maximizes the usefulness of free memory
 if compared to the reservation approach of hugetlbfs by allowing all
-- 
2.21.0


^ permalink raw reply related

* Re: [PATCH] tracing/fgraph: support recording function return values
From: Peter Zijlstra @ 2019-07-16 14:20 UTC (permalink / raw)
  To: Changbin Du
  Cc: Will Deacon, rostedt, mingo, corbet, linux, catalin.marinas, tglx,
	bp, hpa, x86, linux-doc, linux-kernel, linux-arm-kernel
In-Reply-To: <20190716140817.za4rad3hx76efqgp@mail.google.com>

On Tue, Jul 16, 2019 at 10:08:18PM +0800, Changbin Du wrote:
> On Mon, Jul 15, 2019 at 12:12:31PM +0200, Peter Zijlstra wrote:

> > Alternatively, we can have recordmcount (or objtool) mark all functions
> > with a return value when the build has DEBUG_INFO on. The dwarves know
> > the function signature.
> >
> We can extend the recordmcount tool to search 'subprogram' tag in the DIE tree.
> In below example, the 'DW_AT_type' is the type of function pidfd_create().
> 
> $ readelf -w kernel/pid.o
>  [...]
>  <1><1b914>: Abbrev Number: 232 (DW_TAG_subprogram)
>     <1b916>   DW_AT_name        : (indirect string, offset: 0x415e): pidfd_create
>     <1b91a>   DW_AT_decl_file   : 1
>     <1b91b>   DW_AT_decl_line   : 471
>     <1b91d>   DW_AT_decl_column : 12
>     <1b91e>   DW_AT_prototyped  : 1
>     <1b91e>   DW_AT_type        : <0xcc>
>     <1b922>   DW_AT_low_pc      : 0x450
>     <1b92a>   DW_AT_high_pc     : 0x50
>     <1b932>   DW_AT_frame_base  : 1 byte block: 9c 	(DW_OP_call_frame_cfa)
>     <1b934>   DW_AT_GNU_all_call_sites: 1
>     <1b934>   DW_AT_sibling     : <0x1b9d9>
>  [...]
> 
> To that end, we need to introduce libdw library for recordmcount. I will have a
> try this week.

Right; but only when this config option is set.

> And probably, we can also record the parameters?

The 'fun' part is where to store all this information in the kernel and
how fast you can find it while tracing.

^ permalink raw reply

* Re: [PATCH] tracing/fgraph: support recording function return values
From: Changbin Du @ 2019-07-16 14:08 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Will Deacon, Changbin Du, rostedt, mingo, corbet, linux,
	catalin.marinas, tglx, bp, hpa, x86, linux-doc, linux-kernel,
	linux-arm-kernel
In-Reply-To: <20190715101231.GB3419@hirez.programming.kicks-ass.net>

On Mon, Jul 15, 2019 at 12:12:31PM +0200, Peter Zijlstra wrote:
> On Mon, Jul 15, 2019 at 09:29:30AM +0100, Will Deacon wrote:
> > On Sat, Jul 13, 2019 at 08:10:26PM +0800, Changbin Du wrote:
> > > This patch adds a new trace option 'funcgraph-retval' and is disabled by
> > > default. When this option is enabled, fgraph tracer will show the return
> > > value of each function. This is useful to find/analyze a original error
> > > source in a call graph.
> > > 
> > > One limitation is that the kernel doesn't know the prototype of functions.
> > > So fgraph assumes all functions have a retvalue of type int. You must ignore
> > > the value of *void* function. And if the retvalue looks like an error code
> > > then both hexadecimal and decimal number are displayed.
> > 
> > This seems like quite a significant drawback and I think it could be pretty
> > confusing if you have to filter out bogus return values from the trace.
> > 
> > For example, in your snippet:
> > 
> > >  3)               |  kvm_vm_ioctl() {
> > >  3)               |    mutex_lock() {
> > >  3)               |      _cond_resched() {
> > >  3)   0.234 us    |        rcu_all_qs(); /* ret=0x80000000 */
> > >  3)   0.704 us    |      } /* ret=0x0 */
> > >  3)   1.226 us    |    } /* ret=0x0 */
> > >  3)   0.247 us    |    mutex_unlock(); /* ret=0xffff8880738ed040 */
> > 
> > mutex_unlock() is wrongly listed as returning something.
> > 
> > How much of this could be achieved from userspace by placing kretprobes on
> > non-void functions instead?
> 
> Alternatively, we can have recordmcount (or objtool) mark all functions
> with a return value when the build has DEBUG_INFO on. The dwarves know
> the function signature.
>
We can extend the recordmcount tool to search 'subprogram' tag in the DIE tree.
In below example, the 'DW_AT_type' is the type of function pidfd_create().

$ readelf -w kernel/pid.o
 [...]
 <1><1b914>: Abbrev Number: 232 (DW_TAG_subprogram)
    <1b916>   DW_AT_name        : (indirect string, offset: 0x415e): pidfd_create
    <1b91a>   DW_AT_decl_file   : 1
    <1b91b>   DW_AT_decl_line   : 471
    <1b91d>   DW_AT_decl_column : 12
    <1b91e>   DW_AT_prototyped  : 1
    <1b91e>   DW_AT_type        : <0xcc>
    <1b922>   DW_AT_low_pc      : 0x450
    <1b92a>   DW_AT_high_pc     : 0x50
    <1b932>   DW_AT_frame_base  : 1 byte block: 9c 	(DW_OP_call_frame_cfa)
    <1b934>   DW_AT_GNU_all_call_sites: 1
    <1b934>   DW_AT_sibling     : <0x1b9d9>
 [...]

To that end, we need to introduce libdw library for recordmcount. I will have a
try this week.

And probably, we can also record the parameters?

-- 
Cheers,
Changbin Du

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox