LinuxPPC-Dev Archive on lore.kernel.org
* [PATCH] powerpc/rtas: Restrict RTAS requests from userspace
From: Andrew Donnellan @ 2020-07-02 16:19 UTC
  To: linuxppc-dev; +Cc: nathanl, leobras.c, stable, Daniel Axtens

A number of userspace utilities depend on making calls to RTAS to retrieve
information and update various things.

The existing API through which we expose RTAS to userspace exposes more
RTAS functionality than we actually need, through the sys_rtas syscall,
which allows root (or anyone with CAP_SYS_ADMIN) to make any RTAS call they
want with arbitrary arguments.

Many RTAS calls take the address of a buffer as an argument, and it's up to
the caller to specify the physical address of the buffer as an argument. We
allocate a buffer (the "RMO buffer") in the Real Memory Area that RTAS can
access, and then expose the physical address and size of this buffer in
/proc/powerpc/rtas/rmo_buffer. Userspace is expected to read this address,
poke at the buffer using /dev/mem, and pass an address in the RMO buffer to
the RTAS call.

However, there's nothing stopping the caller from specifying whatever
address they want in the RTAS call, and it's easy to construct a series of
RTAS calls that can overwrite arbitrary bytes (even without /dev/mem
access).

Additionally, there are some RTAS calls that do potentially dangerous
things and for which there are no legitimate userspace use cases.

In the past, this would not have been a particularly big deal as it was
assumed that root could modify all system state freely, but with Secure
Boot and lockdown we need to care about this.

We can't fundamentally change the ABI at this point; however, we can address
this by implementing a filter that checks RTAS calls against a list
of permitted calls and forces the caller to use addresses within the RMO
buffer.

The list is based off the list of calls that are used by the librtas
userspace library, and has been tested with a number of existing userspace
RTAS utilities. For compatibility with any applications we are not aware of
that require other calls, the filter can be turned off at build time.

Reported-by: Daniel Axtens <dja@axtens.net>
Cc: stable@vger.kernel.org
Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com>
---
 arch/powerpc/Kconfig       |  13 +++
 arch/powerpc/kernel/rtas.c | 198 +++++++++++++++++++++++++++++++++++++
 2 files changed, 211 insertions(+)
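
The allowlist lookup described in the commit message can be sketched in
userspace roughly as follows. This is only an illustration: struct
filter_entry, allowlist and find_filter() are hypothetical names, and the
three entries are borrowed from the patch's table purely as examples.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, trimmed-down version of the filter entry in the patch. */
struct filter_entry {
	const char *name;
	int buf_idx;	/* index of the buffer address argument, -1 if none */
	int size_idx;	/* index of the buffer size argument, -1 if none */
};

/* Three entries borrowed from the patch's table, for illustration only. */
static const struct filter_entry allowlist[] = {
	{ "get-time-of-day", -1, -1 },
	{ "ibm,get-system-parameter", 1, 2 },
	{ "ibm,get-indices", 2, 3 },
};

/* Return the matching entry, or NULL when the call is not permitted. */
static const struct filter_entry *find_filter(const char *name)
{
	size_t i;

	if (!name)
		return NULL;

	for (i = 0; i < sizeof(allowlist) / sizeof(allowlist[0]); i++) {
		if (!strcmp(name, allowlist[i].name))
			return &allowlist[i];
	}
	return NULL;
}
```

A caller would reject the request whenever find_filter() returns NULL, and
otherwise use buf_idx and size_idx to decide which arguments need a bounds
check.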

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 9fa23eb320ff..0e2dfe497357 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -973,6 +973,19 @@ config PPC_SECVAR_SYSFS
 	  read/write operations on these variables. Say Y if you have
 	  secure boot enabled and want to expose variables to userspace.
 
+config PPC_RTAS_FILTER
+	bool "Enable filtering of RTAS syscalls"
+	default y
+	depends on PPC_RTAS
+	help
+	  The RTAS syscall API has security issues that could be used to
+	  compromise system integrity. This option enforces restrictions on the
+	  RTAS calls and arguments passed by userspace programs to mitigate
+	  these issues.
+
+	  Say Y unless you know what you are doing and the filter is causing
+	  problems for you.
+
 endmenu
 
 config ISA_DMA_API
diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
index a09eba03f180..ec1cae52d8bd 100644
--- a/arch/powerpc/kernel/rtas.c
+++ b/arch/powerpc/kernel/rtas.c
@@ -324,6 +324,23 @@ int rtas_token(const char *service)
 }
 EXPORT_SYMBOL(rtas_token);
 
+#ifdef CONFIG_PPC_RTAS_FILTER
+
+static char *rtas_token_name(int token)
+{
+	struct property *prop;
+
+	for_each_property_of_node(rtas.dev, prop) {
+		const __be32 *tokp = prop->value;
+
+		if (tokp && be32_to_cpu(*tokp) == token)
+			return prop->name;
+	}
+	return NULL;
+}
+
+#endif /* CONFIG_PPC_RTAS_FILTER */
+
 int rtas_service_present(const char *service)
 {
 	return rtas_token(service) != RTAS_UNKNOWN_SERVICE;
@@ -1110,6 +1127,184 @@ struct pseries_errorlog *get_pseries_errorlog(struct rtas_error_log *log,
 	return NULL;
 }
 
+#ifdef CONFIG_PPC_RTAS_FILTER
+
+/*
+ * The sys_rtas syscall, as originally designed, allows root to pass
+ * arbitrary physical addresses to RTAS calls. A number of RTAS calls
+ * can be abused to write to arbitrary memory and do other things that
+ * are potentially harmful to system integrity, and thus should only
+ * be used inside the kernel and not exposed to userspace.
+ *
+ * All known legitimate users of the sys_rtas syscall will only ever
+ * pass addresses that fall within the RMO buffer, and use a known
+ * subset of RTAS calls.
+ *
+ * Accordingly, we filter RTAS requests to check that the call is
+ * permitted, and that provided pointers fall within the RMO buffer.
+ * The rtas_filters list contains an entry for each permitted call,
+ * with the indexes of the parameters which are expected to contain
+ * addresses and sizes of buffers allocated inside the RMO buffer.
+ */
+struct rtas_filter {
+	const char name[32];
+
+	/* Indexes into the args buffer, -1 if not used */
+	int rmo_buf_idx1;
+	int rmo_size_idx1;
+	int rmo_buf_idx2;
+	int rmo_size_idx2;
+};
+
+static struct rtas_filter rtas_filters[] = {
+	{ "ibm,activate-firmware", -1, -1, -1, -1 },
+	{ "ibm,configure-connector", 0, -1, 1, -1 },	/* Special cased, size 4096 */
+	{ "display-character", -1, -1, -1, -1 },
+	{ "ibm,display-message", 0, -1, -1, -1 },
+	{ "ibm,errinjct", 2, -1, -1, -1 },		/* Fixed size of 1024 */
+	{ "ibm,close-errinjct", -1, -1, -1, -1 },
+	{ "ibm,open-errinjct", -1, -1, -1, -1 },
+	{ "ibm,get-config-addr-info2", -1, -1, -1, -1 },
+	{ "ibm,get-dynamic-sensor-state", 1, -1, -1, -1 },
+	{ "ibm,get-indices", 2, 3, -1, -1 },
+	{ "get-power-level", -1, -1, -1, -1 },
+	{ "get-sensor-state", -1, -1, -1, -1 },
+	{ "ibm,get-system-parameter", 1, 2, -1, -1 },
+	{ "get-time-of-day", -1, -1, -1, -1 },
+	{ "ibm,get-vpd", 0, -1, 1, 2 },
+	{ "ibm,lpar-perftools", 2, 3, -1, -1 },
+	{ "ibm,platform-dump", 4, 5, -1, -1 },
+	{ "ibm,read-slot-reset-state", -1, -1, -1, -1 },
+	{ "ibm,scan-log-dump", 0, 1, -1, -1 },
+	{ "ibm,set-dynamic-indicator", 2, -1, -1, -1 },
+	{ "ibm,set-eeh-option", -1, -1, -1, -1 },
+	{ "set-indicator", -1, -1, -1, -1 },
+	{ "set-power-level", -1, -1, -1, -1 },
+	{ "set-time-for-power-on", -1, -1, -1, -1 },
+	{ "ibm,set-system-parameter", 1, -1, -1, -1 },
+	{ "set-time-of-day", -1, -1, -1, -1 },
+	{ "ibm,suspend-me", -1, -1, -1, -1 },
+	{ "ibm,update-nodes", 0, -1, -1, -1 },		/* Fixed size of 4096 */
+	{ "ibm,update-properties", 0, -1, -1, -1 },	/* Fixed size of 4096 */
+	{ "ibm,physical-attestation", 0, 1, -1, -1 },
+};
+
+static void dump_rtas_params(int token, int nargs, int nret,
+			     struct rtas_args *args)
+{
+	int i;
+	char *token_name = rtas_token_name(token);
+
+	pr_err_ratelimited("sys_rtas: token=0x%x (%s), nargs=%d, nret=%d (called by %s)\n",
+			   token, token_name ? token_name : "unknown", nargs,
+			   nret, current->comm);
+	pr_err_ratelimited("sys_rtas: args: ");
+
+	for (i = 0; i < nargs; i++) {
+		u32 arg = be32_to_cpu(args->args[i]);
+
+		pr_cont("%08x ", arg);
+		if (arg >= rtas_rmo_buf &&
+		    arg < (rtas_rmo_buf + RTAS_RMOBUF_MAX))
+			pr_cont("(buf+0x%lx) ", arg - rtas_rmo_buf);
+	}
+
+	pr_cont("\n");
+}
+
+static bool in_rmo_buf(u32 base, u32 end)
+{
+	return base >= rtas_rmo_buf &&
+		base < (rtas_rmo_buf + RTAS_RMOBUF_MAX) &&
+		base <= end &&
+		end >= rtas_rmo_buf &&
+		end < (rtas_rmo_buf + RTAS_RMOBUF_MAX);
+}
+
+static bool block_rtas_call(int token, int nargs,
+			    struct rtas_args *args)
+{
+	int i;
+	const char *reason;
+	char *token_name = rtas_token_name(token);
+
+	if (!token_name)
+		goto err_notpermitted;
+
+	for (i = 0; i < ARRAY_SIZE(rtas_filters); i++) {
+		struct rtas_filter *f = &rtas_filters[i];
+		u32 base, size, end;
+
+		if (strcmp(token_name, f->name))
+			continue;
+
+		if (f->rmo_buf_idx1 != -1) {
+			base = be32_to_cpu(args->args[f->rmo_buf_idx1]);
+			if (f->rmo_size_idx1 != -1)
+				size = be32_to_cpu(args->args[f->rmo_size_idx1]);
+			else if (!strcmp(token_name, "ibm,errinjct"))
+				size = 1024;
+			else if (!strcmp(token_name, "ibm,update-nodes") ||
+				 !strcmp(token_name, "ibm,update-properties") ||
+				 !strcmp(token_name, "ibm,configure-connector"))
+				size = 4096;
+			else
+				size = 1;
+
+			end = base + size - 1;
+			if (!in_rmo_buf(base, end)) {
+				reason = "address pair 1 out of range";
+				goto err;
+			}
+		}
+
+		if (f->rmo_buf_idx2 != -1) {
+			base = be32_to_cpu(args->args[f->rmo_buf_idx2]);
+			if (f->rmo_size_idx2 != -1)
+				size = be32_to_cpu(args->args[f->rmo_size_idx2]);
+			else if (!strcmp(token_name, "ibm,configure-connector"))
+				size = 4096;
+			else
+				size = 1;
+			end = base + size - 1;
+
+			/*
+			 * Special case for ibm,configure-connector where the
+			 * address can be 0
+			 */
+			if (!strcmp(token_name, "ibm,configure-connector") &&
+			    base == 0)
+				return false;
+
+			if (!in_rmo_buf(base, end)) {
+				reason = "address pair 2 out of range";
+				goto err;
+			}
+		}
+
+		return false;
+	}
+
+err_notpermitted:
+	reason = "call not permitted";
+
+err:
+	pr_err_ratelimited("sys_rtas: RTAS call blocked - exploit attempt? (%s)\n",
+			   reason);
+	dump_rtas_params(token, nargs, 0, args);
+	return true;
+}
+
+#else
+
+static bool block_rtas_call(int token, int nargs,
+			    struct rtas_args *args)
+{
+	return false;
+}
+
+#endif /* CONFIG_PPC_RTAS_FILTER */
+
 /* We assume to be passed big endian arguments */
 SYSCALL_DEFINE1(rtas, struct rtas_args __user *, uargs)
 {
@@ -1147,6 +1342,9 @@ SYSCALL_DEFINE1(rtas, struct rtas_args __user *, uargs)
 	args.rets = &args.args[nargs];
 	memset(args.rets, 0, nret * sizeof(rtas_arg_t));
 
+	if (block_rtas_call(token, nargs, &args))
+		return -EINVAL;
+
 	/* Need to handle ibm,suspend_me call specially */
 	if (token == ibm_suspend_me_token) {
 
-- 
2.20.1
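
For readers following the range check, here is a minimal userspace sketch of
the bounds test that in_rmo_buf() and block_rtas_call() perform together.
RMO_BASE and RMO_SIZE are made-up values standing in for rtas_rmo_buf and
RTAS_RMOBUF_MAX, and range_ok() is a hypothetical helper, not a function from
the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Made-up placement and size; the kernel determines these at runtime. */
#define RMO_BASE 0x10000000u
#define RMO_SIZE 0x10000u

/* Both ends of the range must lie inside the buffer, with base <= end. */
static bool in_buf(uint32_t base, uint32_t end)
{
	return base >= RMO_BASE && base < RMO_BASE + RMO_SIZE &&
	       base <= end &&
	       end >= RMO_BASE && end < RMO_BASE + RMO_SIZE;
}

/* Hypothetical wrapper: compute the last byte, then apply the check. */
static bool range_ok(uint32_t base, uint32_t size)
{
	/* 32-bit arithmetic wraps modulo 2^32, as with u32 in the patch. */
	uint32_t end = base + size - 1;

	return in_buf(base, end);
}
```

Because the arithmetic wraps, a huge size yields an end below base, and the
base <= end comparison rejects it. With a nonzero buffer base, a size of 0 is
rejected the same way, since end becomes base - 1.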



* Re: [PATCH v5] ocxl: control via sysfs whether the FPGA is reloaded on a link reset
From: Andrew Donnellan @ 2020-07-02 16:55 UTC
  To: Frederic Barrat, linuxppc-dev, clombard, alastair
In-Reply-To: <20200619140439.153962-1-fbarrat@linux.ibm.com>

On 20/6/20 12:04 am, Frederic Barrat wrote:
> From: Philippe Bergheaud <felix@linux.ibm.com>
> 
> Some opencapi FPGA images allow to control if the FPGA should be reloaded
> on the next adapter reset. If it is supported, the image specifies it
> through a Vendor Specific DVSEC in the config space of function 0.
> 
> Signed-off-by: Philippe Bergheaud <felix@linux.ibm.com>
> Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>

-- 
Andrew Donnellan              OzLabs, ADL Canberra
ajd@linux.ibm.com             IBM Australia Limited


* Re: [PATCH v6 1/2] crash_core, vmcoreinfo: Append 'MAX_PHYSMEM_BITS' to vmcoreinfo
From: Catalin Marinas @ 2020-07-02 16:55 UTC
  To: Dave Young
  Cc: Mark Rutland, Kazuhito Hagio, Bhupesh Sharma, x86, kexec,
	linux-kernel, linuxppc-dev, Paul Mackerras, Boris Petkov,
	James Morse, Thomas Gleixner, bhupesh.linux, Will Deacon,
	Ingo Molnar, linux-arm-kernel, Dave Anderson
In-Reply-To: <20200702120855.GD21026@dhcp-128-65.nay.redhat.com>

On Thu, Jul 02, 2020 at 08:08:55PM +0800, Dave Young wrote:
> Hi Catalin,
> On 07/02/20 at 12:00pm, Catalin Marinas wrote:
> > On Thu, May 14, 2020 at 12:22:36AM +0530, Bhupesh Sharma wrote:
> > > diff --git a/kernel/crash_core.c b/kernel/crash_core.c
> > > index 9f1557b98468..18175687133a 100644
> > > --- a/kernel/crash_core.c
> > > +++ b/kernel/crash_core.c
> > > @@ -413,6 +413,7 @@ static int __init crash_save_vmcoreinfo_init(void)
> > >  	VMCOREINFO_LENGTH(mem_section, NR_SECTION_ROOTS);
> > >  	VMCOREINFO_STRUCT_SIZE(mem_section);
> > >  	VMCOREINFO_OFFSET(mem_section, section_mem_map);
> > > +	VMCOREINFO_NUMBER(MAX_PHYSMEM_BITS);
> > >  #endif
> > >  	VMCOREINFO_STRUCT_SIZE(page);
> > >  	VMCOREINFO_STRUCT_SIZE(pglist_data);
> > 
> > I can queue this patch via the arm64 tree (together with the second one)
> > but I'd like an ack from the kernel/crash_core.c maintainers. They don't
> > seem to have been cc'ed either (only the kexec list).
> 
> For the VMCOREINFO part, I'm fine with the changes, but since I do not
> understand the arm64 pieces so I would like to leave to arm64 people to
> review.  If arm64 bits are good enough, feel free to add:
> 
> Acked-by: Dave Young <dyoung@redhat.com>

Thanks.
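
For context, a rough userspace sketch of the kind of line a NUMBER() entry
adds to the vmcoreinfo note is shown below. The "NUMBER(%s)=%ld" format is my
understanding of the kernel macro; format_number(), NOTE_NUMBER() and the
value 48 are hypothetical and chosen only for illustration, since the real
MAX_PHYSMEM_BITS value is architecture and configuration dependent.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical value; the real one depends on the architecture. */
#define MAX_PHYSMEM_BITS 48

/* Format one "NUMBER(name)=value" line into buf, returning its length. */
static int format_number(char *buf, size_t len, const char *name, long value)
{
	return snprintf(buf, len, "NUMBER(%s)=%ld\n", name, value);
}

/* #name stringifies the macro name itself; (name) expands to its value. */
#define NOTE_NUMBER(buf, len, name) \
	format_number(buf, len, #name, (long)(name))
```

Tools that parse vmcoreinfo can then look the value up by name instead of
hard-coding it per kernel version.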

-- 
Catalin


* Re: [PATCH 6/8] powerpc/pseries: implement paravirt qspinlocks for SPLPAR
From: kernel test robot @ 2020-07-02 16:15 UTC
  To: Nicholas Piggin
  Cc: kbuild-all, Peter Zijlstra, linuxppc-dev, Boqun Feng,
	linux-kernel, Nicholas Piggin, virtualization, Ingo Molnar,
	Waiman Long, Will Deacon
In-Reply-To: <20200702074839.1057733-7-npiggin@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 7266 bytes --]

Hi Nicholas,

I love your patch! Yet something to improve:

[auto build test ERROR on powerpc/next]
[also build test ERROR on tip/locking/core v5.8-rc3 next-20200702]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Nicholas-Piggin/powerpc-queued-spinlocks-and-rwlocks/20200702-155158
base:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
config: powerpc-allyesconfig (attached as .config)
compiler: powerpc64-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=powerpc 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   kernel/locking/lock_events.c:61:16: warning: no previous prototype for 'lockevent_read' [-Wmissing-prototypes]
      61 | ssize_t __weak lockevent_read(struct file *file, char __user *user_buf,
         |                ^~~~~~~~~~~~~~
   kernel/locking/lock_events.c: In function 'skip_lockevent':
>> kernel/locking/lock_events.c:126:12: error: implicit declaration of function 'pv_is_native_spin_unlock' [-Werror=implicit-function-declaration]
     126 |   pv_on = !pv_is_native_spin_unlock();
         |            ^~~~~~~~~~~~~~~~~~~~~~~~
   cc1: some warnings being treated as errors
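
The error above comes from skip_lockevent() calling
pv_is_native_spin_unlock() without a visible declaration in this
configuration. The following userspace sketch shows the shape of the logic
with a stand-in for that helper. The stub body is purely hypothetical and is
not a proposed powerpc implementation, and strncmp() is used here in place of
the kernel's memcmp().

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical stand-in for the helper the build could not find.
 * Here it simply pretends we are running on bare metal.
 */
static bool pv_is_native_spin_unlock(void)
{
	return true;
}

static bool skip_lockevent(const char *name)
{
	static int pv_on = -1;	/* -1 means "not probed yet" */

	if (pv_on < 0)
		pv_on = !pv_is_native_spin_unlock();

	/* Skip PV qspinlock events on bare metal. */
	return !pv_on && !strncmp(name, "pv_", 3);
}
```

With the stub reporting bare metal, names starting with "pv_" are skipped and
all other event names are kept.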

vim +/pv_is_native_spin_unlock +126 kernel/locking/lock_events.c

fb346fd9fc081c Waiman Long 2019-04-04   57  
fb346fd9fc081c Waiman Long 2019-04-04   58  /*
fb346fd9fc081c Waiman Long 2019-04-04   59   * The lockevent_read() function can be overridden.
fb346fd9fc081c Waiman Long 2019-04-04   60   */
fb346fd9fc081c Waiman Long 2019-04-04  @61  ssize_t __weak lockevent_read(struct file *file, char __user *user_buf,
fb346fd9fc081c Waiman Long 2019-04-04   62  			      size_t count, loff_t *ppos)
fb346fd9fc081c Waiman Long 2019-04-04   63  {
fb346fd9fc081c Waiman Long 2019-04-04   64  	char buf[64];
fb346fd9fc081c Waiman Long 2019-04-04   65  	int cpu, id, len;
fb346fd9fc081c Waiman Long 2019-04-04   66  	u64 sum = 0;
fb346fd9fc081c Waiman Long 2019-04-04   67  
fb346fd9fc081c Waiman Long 2019-04-04   68  	/*
fb346fd9fc081c Waiman Long 2019-04-04   69  	 * Get the counter ID stored in file->f_inode->i_private
fb346fd9fc081c Waiman Long 2019-04-04   70  	 */
fb346fd9fc081c Waiman Long 2019-04-04   71  	id = (long)file_inode(file)->i_private;
fb346fd9fc081c Waiman Long 2019-04-04   72  
fb346fd9fc081c Waiman Long 2019-04-04   73  	if (id >= lockevent_num)
fb346fd9fc081c Waiman Long 2019-04-04   74  		return -EBADF;
fb346fd9fc081c Waiman Long 2019-04-04   75  
fb346fd9fc081c Waiman Long 2019-04-04   76  	for_each_possible_cpu(cpu)
fb346fd9fc081c Waiman Long 2019-04-04   77  		sum += per_cpu(lockevents[id], cpu);
fb346fd9fc081c Waiman Long 2019-04-04   78  	len = snprintf(buf, sizeof(buf) - 1, "%llu\n", sum);
fb346fd9fc081c Waiman Long 2019-04-04   79  
fb346fd9fc081c Waiman Long 2019-04-04   80  	return simple_read_from_buffer(user_buf, count, ppos, buf, len);
fb346fd9fc081c Waiman Long 2019-04-04   81  }
fb346fd9fc081c Waiman Long 2019-04-04   82  
fb346fd9fc081c Waiman Long 2019-04-04   83  /*
fb346fd9fc081c Waiman Long 2019-04-04   84   * Function to handle write request
fb346fd9fc081c Waiman Long 2019-04-04   85   *
fb346fd9fc081c Waiman Long 2019-04-04   86   * When idx = reset_cnts, reset all the counts.
fb346fd9fc081c Waiman Long 2019-04-04   87   */
fb346fd9fc081c Waiman Long 2019-04-04   88  static ssize_t lockevent_write(struct file *file, const char __user *user_buf,
fb346fd9fc081c Waiman Long 2019-04-04   89  			   size_t count, loff_t *ppos)
fb346fd9fc081c Waiman Long 2019-04-04   90  {
fb346fd9fc081c Waiman Long 2019-04-04   91  	int cpu;
fb346fd9fc081c Waiman Long 2019-04-04   92  
fb346fd9fc081c Waiman Long 2019-04-04   93  	/*
fb346fd9fc081c Waiman Long 2019-04-04   94  	 * Get the counter ID stored in file->f_inode->i_private
fb346fd9fc081c Waiman Long 2019-04-04   95  	 */
fb346fd9fc081c Waiman Long 2019-04-04   96  	if ((long)file_inode(file)->i_private != LOCKEVENT_reset_cnts)
fb346fd9fc081c Waiman Long 2019-04-04   97  		return count;
fb346fd9fc081c Waiman Long 2019-04-04   98  
fb346fd9fc081c Waiman Long 2019-04-04   99  	for_each_possible_cpu(cpu) {
fb346fd9fc081c Waiman Long 2019-04-04  100  		int i;
fb346fd9fc081c Waiman Long 2019-04-04  101  		unsigned long *ptr = per_cpu_ptr(lockevents, cpu);
fb346fd9fc081c Waiman Long 2019-04-04  102  
fb346fd9fc081c Waiman Long 2019-04-04  103  		for (i = 0 ; i < lockevent_num; i++)
fb346fd9fc081c Waiman Long 2019-04-04  104  			WRITE_ONCE(ptr[i], 0);
fb346fd9fc081c Waiman Long 2019-04-04  105  	}
fb346fd9fc081c Waiman Long 2019-04-04  106  	return count;
fb346fd9fc081c Waiman Long 2019-04-04  107  }
fb346fd9fc081c Waiman Long 2019-04-04  108  
fb346fd9fc081c Waiman Long 2019-04-04  109  /*
fb346fd9fc081c Waiman Long 2019-04-04  110   * Debugfs data structures
fb346fd9fc081c Waiman Long 2019-04-04  111   */
fb346fd9fc081c Waiman Long 2019-04-04  112  static const struct file_operations fops_lockevent = {
fb346fd9fc081c Waiman Long 2019-04-04  113  	.read = lockevent_read,
fb346fd9fc081c Waiman Long 2019-04-04  114  	.write = lockevent_write,
fb346fd9fc081c Waiman Long 2019-04-04  115  	.llseek = default_llseek,
fb346fd9fc081c Waiman Long 2019-04-04  116  };
fb346fd9fc081c Waiman Long 2019-04-04  117  
bf20616f46e536 Waiman Long 2019-04-04  118  #ifdef CONFIG_PARAVIRT_SPINLOCKS
bf20616f46e536 Waiman Long 2019-04-04  119  #include <asm/paravirt.h>
bf20616f46e536 Waiman Long 2019-04-04  120  
bf20616f46e536 Waiman Long 2019-04-04  121  static bool __init skip_lockevent(const char *name)
bf20616f46e536 Waiman Long 2019-04-04  122  {
bf20616f46e536 Waiman Long 2019-04-04  123  	static int pv_on __initdata = -1;
bf20616f46e536 Waiman Long 2019-04-04  124  
bf20616f46e536 Waiman Long 2019-04-04  125  	if (pv_on < 0)
bf20616f46e536 Waiman Long 2019-04-04 @126  		pv_on = !pv_is_native_spin_unlock();
bf20616f46e536 Waiman Long 2019-04-04  127  	/*
bf20616f46e536 Waiman Long 2019-04-04  128  	 * Skip PV qspinlock events on bare metal.
bf20616f46e536 Waiman Long 2019-04-04  129  	 */
bf20616f46e536 Waiman Long 2019-04-04  130  	if (!pv_on && !memcmp(name, "pv_", 3))
bf20616f46e536 Waiman Long 2019-04-04  131  		return true;
bf20616f46e536 Waiman Long 2019-04-04  132  	return false;
bf20616f46e536 Waiman Long 2019-04-04  133  }
bf20616f46e536 Waiman Long 2019-04-04  134  #else
bf20616f46e536 Waiman Long 2019-04-04  135  static inline bool skip_lockevent(const char *name)
bf20616f46e536 Waiman Long 2019-04-04  136  {
bf20616f46e536 Waiman Long 2019-04-04  137  	return false;
bf20616f46e536 Waiman Long 2019-04-04  138  }
bf20616f46e536 Waiman Long 2019-04-04  139  #endif
bf20616f46e536 Waiman Long 2019-04-04  140  

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 69747 bytes --]


* Re: [PATCH 18/20] block: refator submit_bio_noacct
From: Naresh Kamboju @ 2020-07-02 15:52 UTC
  To: Christoph Hellwig
  Cc: Jens Axboe, linux-xtensa, linux-nvdimm, linux-s390,
	Alexander Potapenko, linux-m68k, linux-nvme, open list,
	linux-raid, dm-devel, Qian Cai, kasan-dev, Andrey Ryabinin,
	linux-bcache, linuxppc-dev, Dmitry Vyukov, drbd-dev
In-Reply-To: <20200702151453.GA1799@lst.de>

On Thu, 2 Jul 2020 at 20:45, Christoph Hellwig <hch@lst.de> wrote:
>
> On Thu, Jul 02, 2020 at 10:10:10AM -0400, Qian Cai wrote:
> > On Mon, Jun 29, 2020 at 09:39:45PM +0200, Christoph Hellwig wrote:
> > > Split out a __submit_bio_noacct helper for the actual de-recursion
> > > algorithm, and simplify the loop by using a continue when we can't
> > > enter the queue for a bio.
> > >
> > > Signed-off-by: Christoph Hellwig <hch@lst.de>
> >
> > Reverting this commit and its dependencies,
> >
> > 5a6c35f9af41 block: remove direct_make_request
> > ff93ea0ce763 block: shortcut __submit_bio_noacct for blk-mq drivers
> >
> > fixed the stack-out-of-bounds during boot,
> >
> > https://lore.kernel.org/linux-block/000000000000bcdeaa05a97280e4@google.com/
>
> Yikes.  bio_alloc_bioset pokes into bio_list[1] in a totally
> undocumented way.  But even with that the problem should only show
> up with "block: shortcut __submit_bio_noacct for blk-mq drivers".
>
> Can you try this patch?

Applied your patch on top of linux-next 20200702 and tested on
arm64 and x86_64 devices; the reported BUG is fixed.

Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Tested-by: Naresh Kamboju <naresh.kamboju@linaro.org>

>
> diff --git a/block/blk-core.c b/block/blk-core.c
> index bf882b8d84450c..9f1bf8658b611a 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -1155,11 +1155,10 @@ static blk_qc_t __submit_bio_noacct(struct bio *bio)
>  static blk_qc_t __submit_bio_noacct_mq(struct bio *bio)
>  {
>         struct gendisk *disk = bio->bi_disk;
> -       struct bio_list bio_list;
> +       struct bio_list bio_list[2] = { };
>         blk_qc_t ret = BLK_QC_T_NONE;
>
> -       bio_list_init(&bio_list);
> -       current->bio_list = &bio_list;
> +       current->bio_list = bio_list;
>
>         do {
>                 WARN_ON_ONCE(bio->bi_disk != disk);
> @@ -1174,7 +1173,7 @@ static blk_qc_t __submit_bio_noacct_mq(struct bio *bio)
>                 }
>
>                 ret = blk_mq_submit_bio(bio);
> -       } while ((bio = bio_list_pop(&bio_list)));
> +       } while ((bio = bio_list_pop(&bio_list[0])));
>
>         current->bio_list = NULL;
>         return ret;

ref:
https://lkft.validation.linaro.org/scheduler/job/1538359#L288
https://lkft.validation.linaro.org/scheduler/job/1538360#L572
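
To illustrate why the quoted patch switches to a two-element array, here is a
small userspace sketch. any_bio_pending() is a hypothetical stand-in for a
callee that looks at both lists through the per-task pointer; it is not the
real bio_alloc_bioset() code.

```c
#include <stdbool.h>
#include <stddef.h>

struct bio;	/* opaque for this sketch */

/* Two pointers, so 16 bytes on a 64-bit build. */
struct bio_list {
	struct bio *head;
	struct bio *tail;
};

static bool bio_list_empty(const struct bio_list *bl)
{
	return bl->head == NULL;
}

/*
 * Hypothetical callee that inspects lists[0] and lists[1]. It is only
 * safe when the caller really provides two elements.
 */
static bool any_bio_pending(const struct bio_list *lists)
{
	return !bio_list_empty(&lists[0]) || !bio_list_empty(&lists[1]);
}
```

If the caller passes the address of a single struct bio_list, the lists[1]
access lands just past the object. That matches the KASAN reports in this
thread, where 'bio_list' occupies frame offsets [32, 48) and the bad read is
at offset 48.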


- Naresh


* Re: [PATCH 1/2] dt-bindings: sound: fsl-asoc-card: add new compatible for I2S slave
From: Mark Brown @ 2020-07-02 15:42 UTC
  To: Arnaud Ferraris
  Cc: devicetree, alsa-devel, linuxppc-dev, Timur Tabi, Xiubo Li,
	linux-kernel, Takashi Iwai, Liam Girdwood, Jaroslav Kysela,
	Nicolin Chen, Rob Herring, kernel, Fabio Estevam
In-Reply-To: <5de5ea5b-0716-8ed1-28b0-9ad3da7a2d47@collabora.com>

[-- Attachment #1: Type: text/plain, Size: 756 bytes --]

On Thu, Jul 02, 2020 at 05:28:03PM +0200, Arnaud Ferraris wrote:
> Le 02/07/2020 à 16:31, Mark Brown a écrit :

> > Why require that the CODEC be clock master here - why not make this
> > configurable, reusing the properties from the generic and audio graph
> > cards?

> This is partly because I'm not sure how to do it (yet), but mostly
> because I don't have the hardware to test this (the 2 CODECs present on
> my only i.MX6 board are both clock master)

Take a look at what the generic cards are doing, it's a library function 
asoc_simple_parse_daifmt().  It's not the end of the world if you can't
test it properly - if it turns out it's buggy somehow someone can always
fix the code later but an ABI is an ABI so we can't change it.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]


* Re: [PATCH 18/20] block: refator submit_bio_noacct
From: Naresh Kamboju @ 2020-07-02 15:15 UTC
  To: Qian Cai, Christoph Hellwig
  Cc: Song Liu, Alexei Starovoitov, linux-nvme, dm-devel, linux-bcache,
	drbd-dev, linux-s390, Daniel Borkmann, linux-nvdimm,
	john.fastabend, Yonghong Song, Andrii Nakryiko, linux-xtensa,
	linux-raid, linux-m68k, lkft-triage, kpsingh, Jens Axboe,
	linux-block, Netdev, open list, bpf, linuxppc-dev,
	Martin KaFai Lau
In-Reply-To: <20200702141001.GA3834@lca.pw>

On Thu, 2 Jul 2020 at 19:40, Qian Cai <cai@lca.pw> wrote:
>
> On Mon, Jun 29, 2020 at 09:39:45PM +0200, Christoph Hellwig wrote:
> > Split out a __submit_bio_noacct helper for the actual de-recursion
> > algorithm, and simplify the loop by using a continue when we can't
> > enter the queue for a bio.
> >
> > Signed-off-by: Christoph Hellwig <hch@lst.de>

Kernel BUG on arm64 and x86_64 devices running Linux 5.8.0-rc3-next-20200702
with the KASAN config enabled, seen while running mkfs -t ext4.

metadata:
  git branch: master
  git repo: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
  git commit: d37d57041350dff35dd17cbdf9aef4011acada38
  git describe: next-20200702
  make_kernelversion: 5.8.0-rc3
  kernel-config:
https://builds.tuxbuild.com/DnjQHvYrx586eUoFxtYZxQ/kernel.config

steps to reproduce:
 # mkfs -t ext4 /dev/disk/by-id/ata-SanDisk_SDSSDA120G_165193445014


BUG: KASAN: stack-out-of-bounds in bio_alloc_bioset+0x28c/0x2c8
[   59.398307] Read of size 8 at addr ffff0009084277e0 by task mkfs.ext4/417
[   59.405121]
[   59.406644] CPU: 5 PID: 417 Comm: mkfs.ext4 Not tainted 5.8.0-rc3-next-20200702 #1
[   59.414248] Hardware name: ARM Juno development board (r2) (DT)
[   59.420195] Call trace:
[   59.422683]  dump_backtrace+0x0/0x2b8
[   59.426386]  show_stack+0x18/0x28
[   59.429741]  dump_stack+0xec/0x144
[   59.433183]  print_address_description.isra.0+0x6c/0x448
[   59.438531]  kasan_report+0x134/0x200
[   59.442226]  __asan_load8+0x9c/0xd8
[   59.445751]  bio_alloc_bioset+0x28c/0x2c8
[   59.449796]  bio_clone_fast+0x28/0x98
[   59.453492]  bio_split+0x64/0x138
[   59.456842]  __blk_queue_split+0x534/0x698
[   59.460979]  blk_mq_submit_bio+0x10c/0x680
[   59.465118]  submit_bio_noacct+0x57c/0x640
[   59.469253]  submit_bio+0xc0/0x358
[   59.472688]  submit_bio_wait+0xc0/0x110
[   59.476561]  blkdev_issue_discard+0xd0/0x138
[   59.480877]  blk_ioctl_discard+0x1b8/0x238
[   59.485008]  blkdev_common_ioctl+0x594/0xd38
[   59.489312]  blkdev_ioctl+0x130/0x578
[   59.493010]  block_ioctl+0x78/0x98
[   59.496453]  ksys_ioctl+0xb8/0xf8
[   59.499808]  __arm64_sys_ioctl+0x44/0x60
[   59.503781]  el0_svc_common.constprop.0+0xa4/0x1e0
[   59.508615]  do_el0_svc+0x38/0xa0
[   59.511967]  el0_sync_handler+0x98/0x1a8
[   59.515922]  el0_sync+0x158/0x180
[   59.519255]
[   59.520761] The buggy address belongs to the page:
[   59.525590] page:fffffe00240109c0 refcount:0 mapcount:0 mapping:0000000000000000 index:0x0
[   59.533895] flags: 0x2ffff00000000000()
[   59.537779] raw: 2ffff00000000000 0000000000000000 fffffe00240109c8 0000000000000000
[   59.545575] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
[   59.553352] page dumped because: kasan: bad access detected
[   59.558947]
[   59.560463] addr ffff0009084277e0 is located in stack of task mkfs.ext4/417 at offset 48 in frame:
[   59.569475]  submit_bio_noacct+0x0/0x640
[   59.573423]
[   59.574930] this frame has 2 objects:
[   59.578624]  [32, 48) 'bio_list'
[   59.578644]  [64, 96) 'bio_list_on_stack'
[   59.581889]
[   59.587412] Memory state around the buggy address:
[   59.592243]  ffff000908427680: 00 00 00 f2 00 00 00 f2 f2 f2 00 00 00 00 00 f3
[   59.599510]  ffff000908427700: f3 f3 f3 f3 00 00 00 00 00 00 00 00 00 00 00 00
[   59.606777] >ffff000908427780: 00 00 00 00 00 00 f1 f1 f1 f1 00 00 f2 f2 00 00
[   59.614031]                                                        ^
[   59.620427]  ffff000908427800: 00 00 f3 f3 f3 f3 00 00 00 00 00 00 00 00 00 00
[   59.627694]  ffff000908427880: 00 00 00 00 00 00 f1 f1 f1 f1 00 00 00 00 f3 f3
[   59.634946] ==================================================================
[   59.642198] Disabling lock debugging due to kernel taint
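
As a reading aid for the "Memory state" dump above, the sketch below locates
the shadow byte for the faulting address. It assumes generic KASAN, where one
shadow byte covers 8 bytes of memory and 0xf1/0xf2/0xf3 mark stack left, mid
and right redzones. The helper names and the flagged_row array are
hypothetical and exist only for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed granule: one shadow byte per 8 bytes of memory. */
#define SHADOW_GRANULE 8u

/* Index of the shadow byte for addr within a row starting at row_base. */
static size_t shadow_index(uint64_t row_base, uint64_t addr)
{
	return (size_t)((addr - row_base) / SHADOW_GRANULE);
}

/* Rough meaning of the shadow values that appear in this report. */
static const char *shadow_meaning(uint8_t b)
{
	switch (b) {
	case 0x00: return "addressable";
	case 0xf1: return "stack left redzone";
	case 0xf2: return "stack mid redzone";
	case 0xf3: return "stack right redzone";
	default:   return "other";
	}
}

/* Shadow row marked with '>' in the arm64 report. */
static const uint8_t flagged_row[16] = {
	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xf1, 0xf1,
	0xf1, 0xf1, 0x00, 0x00, 0xf2, 0xf2, 0x00, 0x00,
};
```

Calling shadow_index(0xffff000908427780, 0xffff0009084277e0) selects the byte
right after the two addressable bytes that cover 'bio_list', which is the
redzone between the two stack objects.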


Kernel BUG on x86_64:

[   17.809563] ==================================================================
[   17.816786] BUG: KASAN: stack-out-of-bounds in bio_alloc_bioset+0x31f/0x340
[   17.823750] Read of size 8 at addr ffff888225f9f450 by task systemd-udevd/361
[   17.830881]
[   17.832384] CPU: 0 PID: 361 Comm: systemd-udevd Not tainted 5.8.0-rc3-next-20200702 #1
[   17.840294] Hardware name: Supermicro SYS-5019S-ML/X11SSH-F, BIOS 2.2 05/23/2018
[   17.847686] Call Trace:
[   17.850143]  dump_stack+0x84/0xba
[   17.853462]  print_address_description.constprop.0+0x1f/0x210
[   17.859212]  ? _raw_spin_lock_irqsave+0x7c/0xd0
[   17.859214]  ? _raw_write_lock_irqsave+0xd0/0xd0
[   17.859217]  ? bio_alloc_bioset+0x31f/0x340
[   17.859220]  kasan_report.cold+0x37/0x7c
[   17.859222]  ? bio_alloc_bioset+0x31f/0x340
[   17.859224]  __asan_load8+0x86/0xb0
[   17.859226]  bio_alloc_bioset+0x31f/0x340
[   17.859228]  ? bvec_alloc+0x160/0x160
[   17.859230]  ? bio_alloc_bioset+0x253/0x340
[   17.859232]  ? mpage_alloc.isra.0+0x37/0x120
[   17.859234]  ? do_mpage_readpage+0x740/0xd40
[   17.859236]  ? mpage_readahead+0x196/0x280
[   17.859238]  ? blkdev_readahead+0x10/0x20
[   17.859241]  ? read_pages+0x149/0x470
[   17.859243]  ? page_cache_readahead_unbounded+0x2de/0x360
[   17.859246]  ? __do_page_cache_readahead+0x6c/0x80
[   17.859248]  bio_clone_fast+0x14/0x30
[   17.859250]  bio_split+0x64/0x1b0
[   17.859252]  __blk_queue_split+0x417/0x8d0
[   17.859255]  ? __blk_rq_map_sg+0x820/0x820
[   17.859258]  ? kmem_cache_alloc+0xc6/0x4b0
[   17.859260]  ? mempool_alloc_slab+0x12/0x20
[   17.859262]  blk_mq_submit_bio+0x150/0xb90
[   17.859265]  ? blk_mq_try_issue_directly+0xe0/0xe0
[   17.859267]  ? blk_queue_enter+0xea/0x460
[   17.859269]  ? submit_bio_checks+0x4cc/0xa00
[   17.859272]  ? bio_add_page+0x78/0x110
[   17.859274]  submit_bio_noacct+0x5ff/0x6c0
[   17.859276]  ? mpage_alloc.isra.0+0xab/0x120
[   17.859279]  ? blk_queue_enter+0x460/0x460
[   17.859281]  ? do_mpage_readpage+0xc02/0xd40
[   17.859283]  submit_bio+0xb5/0x2e0
[   17.859286]  ? submit_bio_noacct+0x6c0/0x6c0
[   17.859288]  ? __disk_get_part+0x3d/0x50
[   17.859290]  mpage_readahead+0x227/0x280
[   17.859293]  ? do_mpage_readpage+0xd40/0xd40
[   17.859295]  ? bdev_evict_inode+0x130/0x130
[   17.859297]  ? find_get_pages_contig+0x340/0x340
[   17.859299]  blkdev_readahead+0x10/0x20
[   17.859302]  read_pages+0x149/0x470
[   17.859304]  ? lru_cache_add+0xde/0xf0
[   17.859306]  ? read_cache_pages+0x280/0x280
[   17.859309]  ? add_to_page_cache_locked+0x10/0x10
[   17.859310]  ? alloc_pages_current+0x98/0x110
[   17.859313]  page_cache_readahead_unbounded+0x2de/0x360
[   17.859316]  ? read_pages+0x470/0x470
[   17.859319]  ? xas_load+0xee/0x110
[   17.859321]  ? find_get_entry+0xbf/0x250
[   17.859323]  __do_page_cache_readahead+0x6c/0x80
[   17.859326]  force_page_cache_readahead+0xee/0x180
[   17.859329]  page_cache_sync_readahead+0x131/0x140
[   17.859331]  generic_file_buffered_read+0x698/0x1130
[   17.859334]  ? get_page_from_freelist+0x1b13/0x1e60
[   17.859337]  ? pagecache_get_page+0x3a0/0x3a0
[   17.859340]  ? __isolate_free_page+0x210/0x210
[   17.859342]  ? __ia32_sys_mmap_pgoff+0x90/0x90
[   17.859345]  generic_file_read_iter+0x17f/0x1f0
[   17.859347]  ? memory_high_write+0x1c0/0x1c0
[   17.859349]  blkdev_read_iter+0x76/0x90
[   17.859352]  new_sync_read+0x298/0x3c0
[   17.859354]  ? __ia32_sys_llseek+0x230/0x230
[   17.859357]  ? asm_sysvec_apic_timer_interrupt+0x12/0x20
[   17.859359]  ? fsnotify+0x12c/0x5f0
[   17.859361]  ? __vfs_read+0x30/0x90
[   17.859363]  __vfs_read+0x76/0x90
[   17.859365]  vfs_read+0xc8/0x1e0
[   17.859368]  ksys_read+0xc8/0x170
[   17.859370]  ? kernel_write+0xc0/0xc0
[   17.859372]  ? syscall_trace_enter+0x166/0x280
[   17.859375]  __x64_sys_read+0x3e/0x50
[   17.859377]  do_syscall_64+0x43/0x70
[   17.859379]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   17.859381] RIP: 0033:0x7fe23cf4b56e
[   17.859382] Code: Bad RIP value.
[   17.859383] RSP: 002b:00007fff586583c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[   17.859386] RAX: ffffffffffffffda RBX: 00005620318bd8a0 RCX: 00007fe23cf4b56e
[   17.859387] RDX: 0000000000040000 RSI: 00007fe23dd56038 RDI: 000000000000000f
[   17.859388] RBP: 0000000000040000 R08: 00007fe23dd56010 R09: 0000000000000000
[   17.859390] R10: 0000000000000022 R11: 0000000000000246 R12: 0000000000040000
[   17.859391] R13: 00005620318bd8f0 R14: 00007fe23dd56028 R15: 00007fe23dd56010
[   17.859392]
[   17.859393] The buggy address belongs to the page:
[   17.859396] page:ffffea000897e7c0 refcount:0 mapcount:0 mapping:0000000000000000 index:0x0
[   17.859397] flags: 0x200000000000000()
[   17.859400] raw: 0200000000000000 0000000000000000 ffffea000897e7c8 0000000000000000
[   17.859403] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
[   17.859403] page dumped because: kasan: bad access detected
[   17.859404]
[   17.859406] addr ffff888225f9f450 is located in stack of task systemd-udevd/361 at offset 48 in frame:
[   17.859408]  submit_bio_noacct+0x0/0x6c0
[   17.859409]
[   17.859410] this frame has 2 objects:
[   17.859412]  [32, 48) 'bio_list'
[   17.859414]  [64, 96) 'bio_list_on_stack'
[   17.859414]
[   17.859415] Memory state around the buggy address:
[   17.859417]  ffff888225f9f300: f2 00 00 00 f2 00 00 00 f2 f2 f2 00 00 00 00 00
[   17.859418]  ffff888225f9f380: f3 f3 f3 f3 f3 00 00 00 00 00 00 00 00 00 00 00
[   17.859420] >ffff888225f9f400: 00 00 00 00 f1 f1 f1 f1 00 00 f2 f2 00 00 00 00
[   17.859421]                                                  ^
[   17.859422]  ffff888225f9f480: f3 f3 f3 f3 00 00 00 00 00 00 00 00 00 00 00 00
[   17.859424]  ffff888225f9f500: 00 00 00 f1 f1 f1 f1 00 00 00 00 f3 f3 f3 f3 00
[   17.859425] ==================================================================
[   17.859425] Disabling lock debugging due to kernel taint
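
For readers decoding the report above: KASAN gives the faulting address as an
offset into the stack frame and lists the frame's objects as half-open byte
ranges. The sketch below is a small userspace helper, not kernel code; the
names struct frame_object and object_at_offset() are hypothetical and exist
only for this illustration. It shows that offset 48 lies just past 'bio_list'
[32, 48) and before 'bio_list_on_stack' [64, 96), so the access hit a redzone
rather than either object.

```c
#include <stddef.h>

/* One stack object as printed by KASAN: the half-open range [start, end). */
struct frame_object {
	size_t start;
	size_t end;
	const char *name;
};

/* Frame layout copied from the report above. */
static const struct frame_object frame_objects[] = {
	{ 32, 48, "bio_list" },
	{ 64, 96, "bio_list_on_stack" },
};

/* Return the object containing offset, or NULL if no object covers it. */
static const struct frame_object *object_at_offset(size_t offset)
{
	size_t i;

	for (i = 0; i < sizeof(frame_objects) / sizeof(frame_objects[0]); i++) {
		if (offset >= frame_objects[i].start &&
		    offset < frame_objects[i].end)
			return &frame_objects[i];
	}
	return NULL;
}
```

In the shadow dump, 00 marks fully addressable memory, while f1, f2 and f3
are the left, middle and right stack redzones; the caret points at the first
f2 byte after the two 00 bytes that cover 'bio_list'.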


* Re: [PATCH 18/20] block: refator submit_bio_noacct
From: Christoph Hellwig @ 2020-07-02 15:14 UTC (permalink / raw)
  To: Qian Cai
  Cc: Jens Axboe, linux-xtensa, linux-nvdimm, linux-s390, linux-m68k,
	linux-nvme, linux-kernel, linux-raid, dm-devel,
	Alexander Potapenko, kasan-dev, Andrey Ryabinin, linux-bcache,
	linuxppc-dev, Christoph Hellwig, Dmitry Vyukov, drbd-dev
In-Reply-To: <20200702141001.GA3834@lca.pw>

On Thu, Jul 02, 2020 at 10:10:10AM -0400, Qian Cai wrote:
> On Mon, Jun 29, 2020 at 09:39:45PM +0200, Christoph Hellwig wrote:
> > Split out a __submit_bio_noacct helper for the actual de-recursion
> > algorithm, and simplify the loop by using a continue when we can't
> > enter the queue for a bio.
> > 
> > Signed-off-by: Christoph Hellwig <hch@lst.de>
> 
> Reverting this commit and its dependencies,
> 
> 5a6c35f9af41 block: remove direct_make_request
> ff93ea0ce763 block: shortcut __submit_bio_noacct for blk-mq drivers
> 
> fixed the stack-out-of-bounds during boot,
> 
> https://lore.kernel.org/linux-block/000000000000bcdeaa05a97280e4@google.com/

Yikes.  bio_alloc_bioset pokes into bio_list[1] in a totally
undocumented way.  But even with that the problem should only show
up with "block: shortcut __submit_bio_noacct for blk-mq drivers".

Can you try this patch?

diff --git a/block/blk-core.c b/block/blk-core.c
index bf882b8d84450c..9f1bf8658b611a 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1155,11 +1155,10 @@ static blk_qc_t __submit_bio_noacct(struct bio *bio)
 static blk_qc_t __submit_bio_noacct_mq(struct bio *bio)
 {
 	struct gendisk *disk = bio->bi_disk;
-	struct bio_list bio_list;
+	struct bio_list bio_list[2] = { };
 	blk_qc_t ret = BLK_QC_T_NONE;
 
-	bio_list_init(&bio_list);
-	current->bio_list = &bio_list;
+	current->bio_list = bio_list;
 
 	do {
 		WARN_ON_ONCE(bio->bi_disk != disk);
@@ -1174,7 +1173,7 @@ static blk_qc_t __submit_bio_noacct_mq(struct bio *bio)
 		}
 
 		ret = blk_mq_submit_bio(bio);
-	} while ((bio = bio_list_pop(&bio_list)));
+	} while ((bio = bio_list_pop(&bio_list[0])));
 
 	current->bio_list = NULL;
 	return ret;
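
To illustrate why the array needs two elements: the KASAN report places the
bad read at offset 48, immediately after the 16-byte 'bio_list' object at
[32, 48), which is where a second list element would sit. What follows is a
simplified userspace model, not the kernel code; any_list_busy() is a
hypothetical stand-in for a consumer that reads both bio_list[0] and
bio_list[1] through current->bio_list.

```c
#include <stdbool.h>
#include <stddef.h>

struct bio;			/* opaque in this model */

struct bio_list {
	struct bio *head;
	struct bio *tail;
};

static bool bio_list_empty(const struct bio_list *bl)
{
	return bl->head == NULL;
}

/*
 * Hypothetical stand-in for a consumer that treats the pointer as an
 * array of two lists. Handing it the address of a single struct bio_list
 * would make the lists[1] read go past the end of that object.
 */
static bool any_list_busy(const struct bio_list *lists)
{
	return !bio_list_empty(&lists[0]) || !bio_list_empty(&lists[1]);
}

/* Safe pattern: provide both elements, explicitly initialised as empty. */
static bool demo_two_element_array(void)
{
	struct bio_list lists[2] = { { NULL, NULL }, { NULL, NULL } };

	return any_list_busy(lists);
}
```

With two pointers per list, the second element starts 2 * sizeof(void *)
bytes after the first, which on a 64-bit build is the 16-byte step from
offset 32 to offset 48 seen in the report.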


* Re: [PATCH v5 3/3] mm/page_alloc: Keep memoryless cpuless node 0 offline
From: Srikar Dronamraju @ 2020-07-02 14:32 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Gautham R Shenoy, David Hildenbrand, Linus Torvalds, linux-kernel,
	linux-mm, Satheesh Rajendran, Mel Gorman, Kirill A. Shutemov,
	Andrew Morton, linuxppc-dev, Christopher Lameter, Vlastimil Babka
In-Reply-To: <20200702084123.GC18446@dhcp22.suse.cz>

* Michal Hocko <mhocko@kernel.org> [2020-07-02 10:41:23]:

> On Thu 02-07-20 12:14:08, Srikar Dronamraju wrote:
> > * Michal Hocko <mhocko@kernel.org> [2020-07-01 14:21:10]:
> > 
> > > > >>>>> The autonuma problem sounds interesting but again this patch doesn't
> > > > >>>>> really solve the underlying problem because I strongly suspect that the
> > > > >>>>> problem is still there when a numa node gets all its memory offline as
> > > > >>>>> mentioned above.
> > > 
> > > I would really appreciate a feedback to these two as well.
> > 
> > 1. Its not just numactl that's to be fixed but all tools/utilities that
> > depend on /sys/devices/system/node/online. Are we saying to not rely/believe
> > in the output given by the kernel but do further verification?  
> 
> No, what we are saying is that even an online node might have zero
> number of online pages/cpus. So the online status is not really
> something that matters. If people are confused by that output then user
> space tools can make their confusion go away. I really do not understand
> why the kernel should do any logic there.

The user-facing teams say they are getting queries from users who cannot
understand, from the tools/sysfs files, why a node is online but has no
attached resources. It's the amount of time being spent on these issues that
triggered the patch. Initially even I was skeptical that this was a real
issue.

> 
> > Also how would the user space differentiate between the case where the
> > Kernel missed marking a node as offline to the case where the memory was
> > offlined on a cpuless node but node wasn't offline?.
> 
> What I am arguing is that those two shouldn't be any different. Really!
> 
> > 2. Regarding the autonuma, the case of offline memory is user/admin driven,
> > so if there is a performance hit, its something that's driven by his
> > user/admin actions. Also how often do we see users offline complete memory
> > of cpuless node on a 2 node system?
> 
> How often do we see crippled HW configurations like that? Really if
> autonuma should be made more clever for one case it should recognize the
> other as well.
> 

Let's take a 16 socket PowerVM system and assume that 32 lpars are created
on that system, i.e. 2 lpars for each socket. (PowerVM has the final say on
how the lpars are created.) In such a case, we can expect 30 out of the 32
lpars to face this problem, with the only 2 lpars that actually run on
socket 0 having the correct configuration.
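
The arithmetic above can be sketched as follows. This is only an
illustration under the stated assumption that the lpars are spread evenly
across the sockets; lpars_off_socket0() is a hypothetical helper name.

```c
/*
 * Number of lpars that do not run on socket 0, assuming the lpars are
 * distributed evenly over all sockets.
 */
static int lpars_off_socket0(int sockets, int lpars)
{
	int per_socket = lpars / sockets;

	return lpars - per_socket;
}
```

For 16 sockets and 32 lpars this gives 2 lpars per socket and 30 lpars that
are not on socket 0.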

> > > 
> > > This begs a question whether ppc can do the same thing?
> > 
> > Certainly ppc can be made to adapt to this situation but that would be a
> > workaround. Do we have a reason why we think node 0 is unique and special?
> 
> It is not, as replied in another email in this thread. I would hope for
> fewer hacks in the numa initialization. Cleaning up the mess
> would be a lot of work and testing on all NUMA capable architectures.
> This is a heritage from the past I am afraid. All that I am arguing here
> is that your touch to the generic code with a very simple looking patch
> might have side effects which are pretty much impossible to review.
> Moreover it seems that nothing but ppc really needs this treatment.
> So fixing it in ppc specific code sounds much more safe.
> 
> Normally I would really push for a generic solution but after getting
> burned several times in this area I do not dare anymore. The problem is
> not in the code complexity but in how spread it is in places where you
> do not expect side effects.
> 

I do understand and respect your viewpoint.

> -- 
> Michal Hocko
> SUSE Labs

-- 
Thanks and Regards
Srikar Dronamraju


* Re: [PATCH v3 2/2] powerpc/hv-24x7: Add sysfs files inside hv-24x7 device to show cpumask
From: Gautham R Shenoy @ 2020-07-02 14:36 UTC (permalink / raw)
  To: Kajol Jain; +Cc: nathanl, ego, maddy, suka, anju, linuxppc-dev
In-Reply-To: <20200626102824.270923-3-kjain@linux.ibm.com>

On Fri, Jun 26, 2020 at 03:58:24PM +0530, Kajol Jain wrote:
> Patch here adds a cpumask attr to hv_24x7 pmu along with ABI documentation.
> 
> Primary use to expose the cpumask is for the perf tool which has the
> capability to parse the driver sysfs folder and understand the
> cpumask file. Having cpumask file will reduce the number of perf command
> line parameters (will avoid "-C" option in the perf tool
> command line). It can also notify the user which is
> the current cpu used to retrieve the counter data.
> 
> command:# cat /sys/devices/hv_24x7/cpumask
> 0
> 
> Signed-off-by: Kajol Jain <kjain@linux.ibm.com>

This patch looks good to me.

Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>

> ---
>  .../sysfs-bus-event_source-devices-hv_24x7    |  7 ++++
>  arch/powerpc/perf/hv-24x7.c                   | 36 +++++++++++++++++--
>  2 files changed, 41 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/ABI/testing/sysfs-bus-event_source-devices-hv_24x7 b/Documentation/ABI/testing/sysfs-bus-event_source-devices-hv_24x7
> index e8698afcd952..f9dd3755b049 100644
> --- a/Documentation/ABI/testing/sysfs-bus-event_source-devices-hv_24x7
> +++ b/Documentation/ABI/testing/sysfs-bus-event_source-devices-hv_24x7
> @@ -43,6 +43,13 @@ Description:	read only
>  		This sysfs interface exposes the number of cores per chip
>  		present in the system.
> 
> +What:		/sys/devices/hv_24x7/cpumask
> +Date:		June 2020
> +Contact:	Linux on PowerPC Developer List <linuxppc-dev@lists.ozlabs.org>
> +Description:	read only
> +		This sysfs file exposes the cpumask which is designated to make
> +		HCALLs to retrieve hv-24x7 pmu event counter data.
> +
>  What:		/sys/bus/event_source/devices/hv_24x7/event_descs/<event-name>
>  Date:		February 2014
>  Contact:	Linux on PowerPC Developer List <linuxppc-dev@lists.ozlabs.org>
> diff --git a/arch/powerpc/perf/hv-24x7.c b/arch/powerpc/perf/hv-24x7.c
> index ce4739e2b407..3c699612d29f 100644
> --- a/arch/powerpc/perf/hv-24x7.c
> +++ b/arch/powerpc/perf/hv-24x7.c
> @@ -448,6 +448,12 @@ static ssize_t device_show_string(struct device *dev,
>  	return sprintf(buf, "%s\n", (char *)d->var);
>  }
> 
> +static ssize_t cpumask_get_attr(struct device *dev,
> +				struct device_attribute *attr, char *buf)
> +{
> +	return cpumap_print_to_pagebuf(true, buf, &hv_24x7_cpumask);
> +}
> +
>  static ssize_t sockets_show(struct device *dev,
>  			    struct device_attribute *attr, char *buf)
>  {
> @@ -1116,6 +1122,17 @@ static DEVICE_ATTR_RO(sockets);
>  static DEVICE_ATTR_RO(chipspersocket);
>  static DEVICE_ATTR_RO(coresperchip);
> 
> +static DEVICE_ATTR(cpumask, S_IRUGO, cpumask_get_attr, NULL);
> +
> +static struct attribute *cpumask_attrs[] = {
> +	&dev_attr_cpumask.attr,
> +	NULL,
> +};
> +
> +static struct attribute_group cpumask_attr_group = {
> +	.attrs = cpumask_attrs,
> +};
> +
>  static struct bin_attribute *if_bin_attrs[] = {
>  	&bin_attr_catalog,
>  	NULL,
> @@ -1143,6 +1160,11 @@ static const struct attribute_group *attr_groups[] = {
>  	&event_desc_group,
>  	&event_long_desc_group,
>  	&if_group,
> +	/*
> +	 * This NULL is a placeholder for the cpumask attr which will update
> +	 * onlyif cpuhotplug registration is successful
> +	 */
> +	NULL,
>  	NULL,
>  };
> 
> @@ -1683,7 +1705,7 @@ static int hv_24x7_cpu_hotplug_init(void)
> 
>  static int hv_24x7_init(void)
>  {
> -	int r;
> +	int r, i = -1;
>  	unsigned long hret;
>  	struct hv_perf_caps caps;
> 
> @@ -1727,8 +1749,18 @@ static int hv_24x7_init(void)
> 
>  	/* init cpuhotplug */
>  	r = hv_24x7_cpu_hotplug_init();
> -	if (r)
> +	if (r) {
>  		pr_err("hv_24x7: CPU hotplug init failed\n");
> +	} else {
> +		/*
> +		 * Cpu hotplug init is successful, add the
> +		 * cpumask file as part of pmu attr group and
> +		 * assign it to very first NULL location.
> +		 */
> +		while (attr_groups[++i])
> +			/* nothing */;
> +		attr_groups[i] = &cpumask_attr_group;
> +	}
> 
>  	r = perf_pmu_register(&h_24x7_pmu, h_24x7_pmu.name, -1);
>  	if (r)
> -- 
> 2.18.2
> 
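
For reference, the placeholder technique in the quoted hunk (reserve a spare
NULL slot in a NULL-terminated array and fill the first NULL at init time)
can be modelled in userspace like this. It is a sketch under assumptions:
fill_first_null() and the string entries are hypothetical stand-ins for the
attribute_group pointers.

```c
#include <stddef.h>

/*
 * Store extra in the first NULL slot and return that slot's index.
 * The caller must guarantee that at least two trailing NULLs exist,
 * otherwise the terminator itself would be overwritten.
 */
static size_t fill_first_null(const char **groups, const char *extra)
{
	size_t i = 0;

	while (groups[i])
		i++;
	groups[i] = extra;
	return i;
}

/* Three real entries, one placeholder NULL, one terminating NULL. */
static const char *demo_groups[] = {
	"format", "events", "interface", NULL, NULL,
};
```

The design relies on the array being writable and on the spare slot being
used at most once; a second call would consume the terminator.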


* Re: [PATCH 1/2] dt-bindings: sound: fsl-asoc-card: add new compatible for I2S slave
From: Mark Brown @ 2020-07-02 14:31 UTC (permalink / raw)
  To: Arnaud Ferraris
  Cc: devicetree, alsa-devel, linuxppc-dev, Timur Tabi, Xiubo Li,
	linux-kernel, Takashi Iwai, Liam Girdwood, Jaroslav Kysela,
	Nicolin Chen, Rob Herring, kernel, Fabio Estevam
In-Reply-To: <20200702141114.232688-2-arnaud.ferraris@collabora.com>

On Thu, Jul 02, 2020 at 04:11:14PM +0200, Arnaud Ferraris wrote:
> fsl-asoc-card currently doesn't support generic codecs with the SoC
> acting as I2S slave.
> 
> This commit adds a new `fsl,imx-audio-i2s-slave` for this use-case, as
> well as the following mandatory properties:

Why require that the CODEC be clock master here - why not make this
configurable, reusing the properties from the generic and audio graph
cards?


* Re: [PATCH 18/20] block: refator submit_bio_noacct
From: Qian Cai @ 2020-07-02 14:10 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, linux-xtensa, linux-nvdimm, linux-s390, linux-m68k,
	linux-nvme, linux-kernel, linux-raid, dm-devel, linux-bcache,
	linuxppc-dev, drbd-dev
In-Reply-To: <20200629193947.2705954-19-hch@lst.de>

On Mon, Jun 29, 2020 at 09:39:45PM +0200, Christoph Hellwig wrote:
> Split out a __submit_bio_noacct helper for the actual de-recursion
> algorithm, and simplify the loop by using a continue when we can't
> enter the queue for a bio.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Reverting this commit and its dependencies,

5a6c35f9af41 block: remove direct_make_request
ff93ea0ce763 block: shortcut __submit_bio_noacct for blk-mq drivers

fixed the stack-out-of-bounds during boot,

https://lore.kernel.org/linux-block/000000000000bcdeaa05a97280e4@google.com/

[   55.573431][ T1373] BUG: KASAN: stack-out-of-bounds in bio_alloc_bioset+0x493/0x4a0
bio_alloc_bioset+0x493/0x4a0:
bio_list_empty at include/linux/bio.h:561
(inlined by) bio_alloc_bioset at block/bio.c:482
[   55.581140][ T1373] Read of size 8 at addr ffffc9000a7df1e0 by task mount/1373
[   55.588409][ T1373]
[   55.590615][ T1373] CPU: 2 PID: 1373 Comm: mount Not tainted 5.8.0-rc3-next-20200702 #2
[   55.598672][ T1373] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019
[   55.607972][ T1373] Call Trace:
[   55.607980][ T1373]  dump_stack+0x9d/0xe0
[   55.607984][ T1373]  ? bio_alloc_bioset+0x493/0x4a0
[   55.607992][ T1373]  ? bio_alloc_bioset+0x493/0x4a0
[   55.625007][ T1373]  print_address_description.constprop.8.cold.10+0x56/0x44e
[   55.632191][ T1373]  ? bio_alloc_bioset+0x493/0x4a0
[   55.637100][ T1373]  ? bio_alloc_bioset+0x493/0x4a0
[   55.642011][ T1373]  kasan_report.cold.11+0x37/0x7c
[   55.646923][ T1373]  ? bio_alloc_bioset+0x493/0x4a0
[   55.651968][ T1373]  bio_alloc_bioset+0x493/0x4a0
[   55.651971][ T1373]  ? bvec_alloc+0x290/0x290
[   55.651975][ T1373]  ? mark_lock+0x147/0x1800
[   55.651978][ T1373]  ? mark_lock+0x147/0x1800
[   55.651981][ T1373]  bio_clone_fast+0xe/0x30
[   55.651983][ T1373]  bio_split+0x8a/0x4c0
[   55.651986][ T1373]  ? print_irqtrace_events+0x270/0x270
[   55.651990][ T1373]  __blk_queue_split+0xc42/0x13e0
[   55.651998][ T1373]  ? __lock_acquire+0xc57/0x4da0
[   55.693322][ T1373]  ? __blk_rq_map_sg+0x14c0/0x14c0
[   55.699711][ T1373]  ? lockdep_hardirqs_on_prepare+0x550/0x550
[   55.705602][ T1373]  ? mark_held_locks+0xb0/0x110
[   55.705605][ T1373]  ? lockdep_hardirqs_on_prepare+0x550/0x550
[   55.705608][ T1373]  ? lockdep_hardirqs_on_prepare+0x550/0x550
[   55.705611][ T1373]  ? find_held_lock+0x33/0x1c0
[   55.705614][ T1373]  ? find_held_lock+0x33/0x1c0
[   55.705618][ T1373]  blk_mq_submit_bio+0x19e/0x1e20
[   55.705621][ T1373]  ? lock_downgrade+0x720/0x720
[   55.705624][ T1373]  ? blk_mq_try_issue_directly+0x140/0x140
[   55.705628][ T1373]  ? rcu_read_lock_sched_held+0xaa/0xd0
[   55.705631][ T1373]  ? rcu_read_lock_bh_held+0xc0/0xc0
[   55.705635][ T1373]  ? blk_queue_enter+0x83c/0x9a0
[   55.705647][ T1373]  ? submit_bio_checks+0x1cc0/0x1cc0
[   55.767384][ T1373]  submit_bio_noacct+0x9c0/0xeb0
[   55.772212][ T1373]  ? blk_queue_enter+0x9a0/0x9a0
[   55.777038][ T1373]  ? lockdep_hardirqs_on_prepare+0x550/0x550
[   55.782913][ T1373]  ? trace_hardirqs_on+0x20/0x1b5
[   55.787825][ T1373]  ? submit_bio+0xe7/0x480
[   55.792125][ T1373]  submit_bio+0xe7/0x480
[   55.796252][ T1373]  ? bio_associate_blkg_from_css+0x4a3/0xd30
[   55.802124][ T1373]  ? submit_bio_noacct+0xeb0/0xeb0
[   55.807124][ T1373]  ? lock_downgrade+0x720/0x720
[   55.811862][ T1373]  ? rcu_read_unlock+0x50/0x50
[   55.816512][ T1373]  ? lockdep_init_map_waits+0x267/0x7b0
[   55.821948][ T1373]  ? lockdep_init_map_waits+0x267/0x7b0
[   55.827386][ T1373]  ? __raw_spin_lock_init+0x34/0x100
[   55.833957][ T1373]  submit_bio_wait+0xf9/0x200
[   55.838521][ T1373]  ? submit_bio_wait_endio+0x30/0x30
[   55.845091][ T1373]  xfs_rw_bdev+0x3ca/0x4d0
[   55.849396][ T1373]  xlog_do_io+0x149/0x320
[   55.853611][ T1373]  xlog_bread+0x1e/0xb0
[   55.857651][ T1373]  xlog_find_verify_log_record+0xba/0x4c0
[   55.863264][ T1373]  ? xlog_header_check_mount+0xb0/0xb0
[   55.868615][ T1373]  xlog_find_zeroed+0x2bc/0x4c0
[   55.873356][ T1373]  ? print_irqtrace_events+0x270/0x270
[   55.880093][ T1373]  ? xlog_find_verify_log_record+0x4c0/0x4c0
[   55.885966][ T1373]  ? __lock_acquire+0x1920/0x4da0
[   55.890881][ T1373]  xlog_find_head+0xd4/0x790
[   55.895355][ T1373]  ? xlog_find_zeroed+0x4c0/0x4c0
[   55.900269][ T1373]  ? rcu_read_lock_sched_held+0xaa/0xd0
[   55.905708][ T1373]  ? rcu_read_lock_bh_held+0xc0/0xc0
[   55.910885][ T1373]  ? sugov_update_single+0x18d/0x4f0
[   55.916058][ T1373]  xlog_find_tail+0xc2/0x810
[   55.920534][ T1373]  ? mark_lock+0x147/0x1800
[   55.924921][ T1373]  ? xlog_verify_head+0x4c0/0x4c0
[   55.929834][ T1373]  ? debug_show_held_locks+0x30/0x50
[   55.935007][ T1373]  ? print_irqtrace_events+0x270/0x270
[   55.940358][ T1373]  ? try_to_wake_up+0x6d1/0xf40
[   55.945094][ T1373]  ? mark_held_locks+0xb0/0x110
[   55.949835][ T1373]  ? lockdep_hardirqs_on_prepare+0x38c/0x550
[   55.955708][ T1373]  ? _raw_spin_unlock_irqrestore+0x39/0x40
[   55.961410][ T1373]  ? trace_hardirqs_on+0x20/0x1b5
[   55.966324][ T1373]  xlog_recover+0x7c/0x480
[   55.970627][ T1373]  ? xlog_buf_readahead+0x110/0x110
[   55.975715][ T1373]  ? migrate_swap_stop+0xbf0/0xbf0
[   55.980718][ T1373]  ? lockdep_init_map_waits+0x267/0x7b0
[   55.986156][ T1373]  ? __raw_spin_lock_init+0x34/0x100
[   55.991333][ T1373]  xfs_log_mount+0x541/0x660
[   55.995809][ T1373]  xfs_mountfs+0xccd/0x1a00
[   56.000202][ T1373]  ? queue_work_node+0x190/0x190
[   56.005028][ T1373]  ? rcu_read_lock_sched_held+0xaa/0xd0
[   56.010466][ T1373]  ? xfs_default_resblks+0x50/0x50
[   56.015464][ T1373]  ? xfs_filestream_get_parent+0xa0/0xa0
[   56.020989][ T1373]  ? init_timer_key+0x285/0x320
[   56.025727][ T1373]  ? lockdep_init_map_waits+0x267/0x7b0
[   56.031165][ T1373]  ? xfs_filestream_get_parent+0xa0/0xa0
[   56.036689][ T1373]  ? xfs_mru_cache_create+0x358/0x560
[   56.041951][ T1373]  xfs_fc_fill_super+0x6d3/0xd50
[   56.046777][ T1373]  get_tree_bdev+0x40a/0x690
[   56.051257][ T1373]  ? xfs_fs_inode_init_once+0xc0/0xc0
[   56.056523][ T1373]  vfs_get_tree+0x84/0x2c0
[   56.060827][ T1373]  do_mount+0xf93/0x1630
[   56.064953][ T1373]  ? rcu_read_lock_bh_held+0xc0/0xc0
[   56.070129][ T1373]  ? copy_mount_string+0x20/0x20
[   56.074956][ T1373]  ? _copy_from_user+0xbe/0x100
[   56.079696][ T1373]  ? memdup_user+0x4f/0x80
[   56.083999][ T1373]  __x64_sys_mount+0x15d/0x1b0
[   56.088654][ T1373]  do_syscall_64+0x5f/0x310
[   56.094437][ T1373]  ? trace_hardirqs_off+0x12/0x1a0
[   56.099439][ T1373]  ? asm_exc_page_fault+0x8/0x30
[   56.104267][ T1373]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   56.110055][ T1373] RIP: 0033:0x7f3bc2c8a9ee
[   56.114357][ T1373] Code: Bad RIP value.
[   56.118309][ T1373] RSP: 002b:00007fffd4675718 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
[   56.126629][ T1373] RAX: ffffffffffffffda RBX: 000055a59d34c9c0 RCX: 00007f3bc2c8a9ee
[   56.135900][ T1373] RDX: 000055a59d34cba0 RSI: 000055a59d34cc00 RDI: 000055a59d34e900
[   56.143779][ T1373] RBP: 00007f3bc3a36184 R08: 0000000000000000 R09: 0000000000000003
[   56.151661][ T1373] R10: 00000000c0ed0000 R11: 0000000000000246 R12: 0000000000000000
[   56.159541][ T1373] R13: 00000000c0ed0000 R14: 000055a59d34e900 R15: 000055a59d34cba0
[   56.167422][ T1373]
[   56.169626][ T1373]
[   56.171831][ T1373] addr ffffc9000a7df1e0 is located in stack of task mount/1373 at offset 48 in frame:
[   56.181287][ T1373]  submit_bio_noacct+0x0/0xeb0
submit_bio_noacct at block/blk-core.c:1198
[   56.185939][ T1373]
[   56.188144][ T1373] this frame has 2 objects:
[   56.192532][ T1373]  [32, 48) 'bio_list'
[   56.192534][ T1373]  [96, 128) 'bio_list_on_stack'
[   56.197872][ T1373]
[   56.204894][ T1373] Memory state around the buggy address:
[   56.210420][ T1373]  ffffc9000a7df080: f2 f2 f2 f2 f2 00 f2 f2 f2 f2 f2 f2 f2 00 00 00
[   56.218389][ T1373]  ffffc9000a7df100: 00 00 f2 f2 f2 00 00 00 00 00 00 00 00 00 00 00
[   56.226359][ T1373] >ffffc9000a7df180: 00 00 00 00 00 00 f1 f1 f1 f1 00 00 f2 f2 f2 f2
[   56.235718][ T1373]                                                        ^
[   56.242817][ T1373]  ffffc9000a7df200: f2 f2 00 00 00 00 f3 f3 f3 f3 00 00 00 00 00 00
[   56.250790][ T1373]  ffffc9000a7df280: 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1 00 f2
[   56.258757][ T1373] ==================================================================

> ---
>  block/blk-core.c | 131 +++++++++++++++++++++++++----------------------
>  1 file changed, 71 insertions(+), 60 deletions(-)
> 
> diff --git a/block/blk-core.c b/block/blk-core.c
> index 1caeb01e127768..b82f48c86e6f7a 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -1085,6 +1085,74 @@ static blk_qc_t do_make_request(struct bio *bio)
>  	return ret;
>  }
>  
> +/*
> + * The loop in this function may be a bit non-obvious, and so deserves some
> + * explanation:
> + *
> + *  - Before entering the loop, bio->bi_next is NULL (as all callers ensure
> + *    that), so we have a list with a single bio.
> + *  - We pretend that we have just taken it off a longer list, so we assign
> + *    bio_list to a pointer to the bio_list_on_stack, thus initialising the
> + *    bio_list of new bios to be added.  ->submit_bio() may indeed add some more
> + *    bios through a recursive call to submit_bio_noacct.  If it did, we find a
> + *    non-NULL value in bio_list and re-enter the loop from the top.
> + *  - In this case we really did just take the bio of the top of the list (no
> + *    pretending) and so remove it from bio_list, and call into ->submit_bio()
> + *    again.
> + *
> + * bio_list_on_stack[0] contains bios submitted by the current ->submit_bio.
> + * bio_list_on_stack[1] contains bios that were submitted before the current
> + *	->submit_bio_bio, but that haven't been processed yet.
> + */
> +static blk_qc_t __submit_bio_noacct(struct bio *bio)
> +{
> +	struct bio_list bio_list_on_stack[2];
> +	blk_qc_t ret = BLK_QC_T_NONE;
> +
> +	BUG_ON(bio->bi_next);
> +
> +	bio_list_init(&bio_list_on_stack[0]);
> +	current->bio_list = bio_list_on_stack;
> +
> +	do {
> +		struct request_queue *q = bio->bi_disk->queue;
> +		struct bio_list lower, same;
> +
> +		if (unlikely(bio_queue_enter(bio) != 0))
> +			continue;
> +
> +		/*
> +		 * Create a fresh bio_list for all subordinate requests.
> +		 */
> +		bio_list_on_stack[1] = bio_list_on_stack[0];
> +		bio_list_init(&bio_list_on_stack[0]);
> +
> +		ret = do_make_request(bio);
> +
> +		/*
> +		 * Sort new bios into those for a lower level and those for the
> +		 * same level.
> +		 */
> +		bio_list_init(&lower);
> +		bio_list_init(&same);
> +		while ((bio = bio_list_pop(&bio_list_on_stack[0])) != NULL)
> +			if (q == bio->bi_disk->queue)
> +				bio_list_add(&same, bio);
> +			else
> +				bio_list_add(&lower, bio);
> +
> +		/*
> +		 * Now assemble so we handle the lowest level first.
> +		 */
> +		bio_list_merge(&bio_list_on_stack[0], &lower);
> +		bio_list_merge(&bio_list_on_stack[0], &same);
> +		bio_list_merge(&bio_list_on_stack[0], &bio_list_on_stack[1]);
> +	} while ((bio = bio_list_pop(&bio_list_on_stack[0])));
> +
> +	current->bio_list = NULL;
> +	return ret;
> +}
> +
>  /**
>   * submit_bio_noacct - re-submit a bio to the block device layer for I/O
>   * @bio:  The bio describing the location in memory and on the device.
> @@ -1096,17 +1164,8 @@ static blk_qc_t do_make_request(struct bio *bio)
>   */
>  blk_qc_t submit_bio_noacct(struct bio *bio)
>  {
> -	/*
> -	 * bio_list_on_stack[0] contains bios submitted by the current
> -	 * ->submit_bio.
> -	 * bio_list_on_stack[1] contains bios that were submitted before the
> -	 * current ->submit_bio_bio, but that haven't been processed yet.
> -	 */
> -	struct bio_list bio_list_on_stack[2];
> -	blk_qc_t ret = BLK_QC_T_NONE;
> -
>  	if (!submit_bio_checks(bio))
> -		goto out;
> +		return BLK_QC_T_NONE;
>  
>  	/*
>  	 * We only want one ->submit_bio to be active at a time, else
> @@ -1120,58 +1179,10 @@ blk_qc_t submit_bio_noacct(struct bio *bio)
>  	 */
>  	if (current->bio_list) {
>  		bio_list_add(&current->bio_list[0], bio);
> -		goto out;
> +		return BLK_QC_T_NONE;
>  	}
>  
> -	/* following loop may be a bit non-obvious, and so deserves some
> -	 * explanation.
> -	 * Before entering the loop, bio->bi_next is NULL (as all callers
> -	 * ensure that) so we have a list with a single bio.
> -	 * We pretend that we have just taken it off a longer list, so
> -	 * we assign bio_list to a pointer to the bio_list_on_stack,
> -	 * thus initialising the bio_list of new bios to be
> -	 * added.  ->submit_bio() may indeed add some more bios
> -	 * through a recursive call to submit_bio_noacct.  If it
> -	 * did, we find a non-NULL value in bio_list and re-enter the loop
> -	 * from the top.  In this case we really did just take the bio
> -	 * of the top of the list (no pretending) and so remove it from
> -	 * bio_list, and call into ->submit_bio() again.
> -	 */
> -	BUG_ON(bio->bi_next);
> -	bio_list_init(&bio_list_on_stack[0]);
> -	current->bio_list = bio_list_on_stack;
> -	do {
> -		struct request_queue *q = bio->bi_disk->queue;
> -
> -		if (likely(bio_queue_enter(bio) == 0)) {
> -			struct bio_list lower, same;
> -
> -			/* Create a fresh bio_list for all subordinate requests */
> -			bio_list_on_stack[1] = bio_list_on_stack[0];
> -			bio_list_init(&bio_list_on_stack[0]);
> -			ret = do_make_request(bio);
> -
> -			/* sort new bios into those for a lower level
> -			 * and those for the same level
> -			 */
> -			bio_list_init(&lower);
> -			bio_list_init(&same);
> -			while ((bio = bio_list_pop(&bio_list_on_stack[0])) != NULL)
> -				if (q == bio->bi_disk->queue)
> -					bio_list_add(&same, bio);
> -				else
> -					bio_list_add(&lower, bio);
> -			/* now assemble so we handle the lowest level first */
> -			bio_list_merge(&bio_list_on_stack[0], &lower);
> -			bio_list_merge(&bio_list_on_stack[0], &same);
> -			bio_list_merge(&bio_list_on_stack[0], &bio_list_on_stack[1]);
> -		}
> -		bio = bio_list_pop(&bio_list_on_stack[0]);
> -	} while (bio);
> -	current->bio_list = NULL; /* deactivate */
> -
> -out:
> -	return ret;
> +	return __submit_bio_noacct(bio);
>  }
>  EXPORT_SYMBOL(submit_bio_noacct);
>  
> -- 
> 2.26.2
> 
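
The de-recursion idea described in the quoted comment can be modelled in
userspace as below. This is a deliberately reduced sketch, not the kernel
algorithm: it keeps a single pending list and omits the lower/same sorting,
and every name in it (struct node, submit(), handle(), run_chain()) is
hypothetical. A nested submit() only queues its argument, so the handler
never nests deeper than one level.

```c
#include <stddef.h>

struct node {
	struct node *next;
	int depth;		/* how many more children to spawn */
};

struct list {
	struct node *head;
	struct node *tail;
};

#define MAX_NODES 16
static struct node pool[MAX_NODES];
static int pool_used;
static struct list *current_list;	/* models current->bio_list */
static int nesting, max_nesting, handled;

static void list_add(struct list *l, struct node *n)
{
	n->next = NULL;
	if (l->tail)
		l->tail->next = n;
	else
		l->head = n;
	l->tail = n;
}

static struct node *list_pop(struct list *l)
{
	struct node *n = l->head;

	if (n) {
		l->head = n->next;
		if (!l->head)
			l->tail = NULL;
		n->next = NULL;
	}
	return n;
}

static void submit(struct node *n);

/* Models ->submit_bio(): may submit one child from inside the handler. */
static void handle(struct node *n)
{
	if (++nesting > max_nesting)
		max_nesting = nesting;
	handled++;
	if (n->depth > 0 && pool_used < MAX_NODES) {
		struct node *child = &pool[pool_used++];

		child->depth = n->depth - 1;
		submit(child);	/* only queued, see below */
	}
	nesting--;
}

static void submit(struct node *n)
{
	struct list on_stack = { NULL, NULL };

	if (current_list) {	/* already inside the loop: just queue */
		list_add(current_list, n);
		return;
	}
	current_list = &on_stack;
	do {
		handle(n);
	} while ((n = list_pop(&on_stack)));
	current_list = NULL;
}

/* Submit a chain of depth + 1 nodes; return the deepest handler nesting. */
static int run_chain(int depth)
{
	struct node *root;

	pool_used = 0;
	nesting = max_nesting = handled = 0;
	root = &pool[pool_used++];
	root->depth = depth;
	root->next = NULL;
	submit(root);
	return max_nesting;
}

static int handled_total(void)
{
	return handled;
}
```

Calling run_chain(5) handles six nodes while the handler nesting stays at
one, which is the property the on-stack list is there to provide.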


* [PATCH] docs: powerpc: Clarify book3s/32 MMU families
From: Christophe Leroy @ 2020-07-02 14:09 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, corbet
  Cc: linuxppc-dev, linux-kernel, linux-doc

The documentation wrongly states that all book3s/32 CPUs have a hash MMU.

The 603 and e300 cores only have a software loaded TLB.

The 755, the 7450 family and the e600 core have both a hash MMU and a
software loaded TLB. The mode can be selected by setting a bit in HID2 (755)
or HID0 (others). At present, this selection is not supported by the kernel.

Make this explicit in the documentation.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
 Documentation/powerpc/cpu_families.rst | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/Documentation/powerpc/cpu_families.rst b/Documentation/powerpc/cpu_families.rst
index 1e063c5440c3..9b84e045e713 100644
--- a/Documentation/powerpc/cpu_families.rst
+++ b/Documentation/powerpc/cpu_families.rst
@@ -9,7 +9,9 @@ and are supported by arch/powerpc.
 Book3S (aka sPAPR)
 ------------------
 
-- Hash MMU
+- Hash MMU (except 603 and e300)
+- Software loaded TLB (603 and e300)
+- Selectable Software loaded TLB in addition to hash MMU (755, 7450, e600)
 - Mix of 32 & 64 bit::
 
    +--------------+                 +----------------+
@@ -24,9 +26,9 @@ Book3S (aka sPAPR)
           |                                 |
           |                                 |
           v                                 v
-   +--------------+                 +----------------+      +-------+
-   |     604      |                 |    750 (G3)    | ---> | 750CX |
-   +--------------+                 +----------------+      +-------+
+   +--------------+    +-----+      +----------------+      +-------+
+   |     604      |    | 755 | <--- |    750 (G3)    | ---> | 750CX |
+   +--------------+    +-----+      +----------------+      +-------+
           |                                 |                   |
           |                                 |                   |
           v                                 v                   v
-- 
2.25.0



* Re: [PATCH V3 (RESEND) 2/3] mm/sparsemem: Enable vmem_altmap support in vmemmap_alloc_block_buf()
From: Catalin Marinas @ 2020-07-02 14:07 UTC (permalink / raw)
  To: Anshuman Khandual
  Cc: x86, H. Peter Anvin, Peter Zijlstra, Dave Hansen, linuxppc-dev,
	linux-kernel, linux-mm, Ingo Molnar, Paul Mackerras,
	Andy Lutomirski, Borislav Petkov, Thomas Gleixner, Will Deacon,
	Andrew Morton, linux-arm-kernel
In-Reply-To: <1592442930-9380-3-git-send-email-anshuman.khandual@arm.com>

On Thu, Jun 18, 2020 at 06:45:29AM +0530, Anshuman Khandual wrote:
> There are many instances where vmemap allocation is often switched between
> regular memory and device memory just based on whether altmap is available
> or not. vmemmap_alloc_block_buf() is used in various platforms to allocate
> vmemmap mappings. Lets also enable it to handle altmap based device memory
> allocation along with existing regular memory allocations. This will help
> in avoiding the altmap based allocation switch in many places.
> 
> While here also implement a regular memory allocation fallback mechanism
> when the first preferred device memory allocation fails. This will ensure
> preserving the existing semantics on powerpc platform. To summarize there
> are three different methods to call vmemmap_alloc_block_buf().
> 
> (., NULL,   false) /* Allocate from system RAM */
> (., altmap, false) /* Allocate from altmap without any fallback */
> (., altmap, true)  /* Allocate from altmap with fallback (system RAM) */
[...]
> diff --git a/arch/powerpc/mm/init_64.c b/arch/powerpc/mm/init_64.c
> index bc73abf0bc25..01e25b56eccb 100644
> --- a/arch/powerpc/mm/init_64.c
> +++ b/arch/powerpc/mm/init_64.c
> @@ -225,12 +225,12 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
>  		 * fall back to system memory if the altmap allocation fail.
>  		 */
>  		if (altmap && !altmap_cross_boundary(altmap, start, page_size)) {
> -			p = altmap_alloc_block_buf(page_size, altmap);
> -			if (!p)
> -				pr_debug("altmap block allocation failed, falling back to system memory");
> +			p = vmemmap_alloc_block_buf(page_size, node,
> +						    altmap, true);
> +		} else {
> +			p = vmemmap_alloc_block_buf(page_size, node,
> +						    NULL, false);
>  		}
> -		if (!p)
> -			p = vmemmap_alloc_block_buf(page_size, node);
>  		if (!p)
>  			return -ENOMEM;

Is the fallback argument actually necessary? It may be cleaner to just
leave the code as is, with the choice between altmap and NULL. If an arch
needs a fallback (only powerpc does), it already has one in place. I
don't see the powerpc code being any better after this change.

I'm fine with the altmap argument though.
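
As a rough illustration of the alternative described above (the allocator
only chooses between altmap and NULL, and the one caller that wants a
fallback keeps it locally), a userspace mock might look like the sketch
below. The names alloc_block_buf(), populate_one() and struct altmap_pool
are hypothetical stand-ins for this example, not the kernel's
vmemmap_alloc_block_buf() or struct vmem_altmap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct vmem_altmap: a fixed pool of device memory. */
struct altmap_pool {
	size_t free;	/* bytes still available in the pool */
};

/* Set when the last allocation came from system RAM (demonstration only). */
static int last_from_ram;

/*
 * Mock allocator with the two-way choice only: a non-NULL pool means
 * "allocate from the pool, no fallback", NULL means "system RAM".
 */
static void *alloc_block_buf(size_t size, struct altmap_pool *pool)
{
	last_from_ram = (pool == NULL);
	if (pool) {
		if (pool->free < size)
			return NULL;
		pool->free -= size;
	}
	return malloc(size);	/* malloc() stands in for both kinds of memory */
}

/* A caller that wants the fallback keeps it local. */
static void *populate_one(size_t size, struct altmap_pool *pool)
{
	void *p = NULL;

	assert(size > 0);
	if (pool)
		p = alloc_block_buf(size, pool);
	if (!p)
		p = alloc_block_buf(size, NULL);	/* fall back to system RAM */
	return p;
}
```

With this shape the allocator needs no third argument; whether that is
actually cleaner for powerpc is the open question in this thread.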

-- 
Catalin

^ permalink raw reply

* Re: [powerpc][next-20200701] Hung task timeouts during regression test runs
From: Sachin Sant @ 2020-07-02 13:16 UTC (permalink / raw)
  To: Ming Lei; +Cc: linux-block, axboe, Linux Next Mailing List, linuxppc-dev
In-Reply-To: <20200702115216.GF2452799@T590>



> On 02-Jul-2020, at 5:22 PM, Ming Lei <ming.lei@redhat.com> wrote:
> 
> On Thu, Jul 02, 2020 at 04:53:04PM +0530, Sachin Sant wrote:
>> Starting with linux-next 20200701 release I am observing automated regressions
>> tests taking longer time to complete. A test which took 10 minutes with next-20200630
>> took more than 60 minutes against next-20200701. 
>> 
>> Following hung task timeout messages were seen during these runs
>> 
>> [ 1718.848351]       Not tainted 5.8.0-rc3-next-20200701-autotest #1
>> [ 1718.848356] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> [ 1718.848362] NetworkManager  D    0  2626      1 0x00040080
>> [ 1718.848367] Call Trace:
>> [ 1718.848374] [c0000008b0f6b8f0] [c000000000c6d558] schedule+0x78/0x130 (unreliable)
>> [ 1718.848382] [c0000008b0f6bad0] [c00000000001b070] __switch_to+0x2e0/0x480
>> [ 1718.848388] [c0000008b0f6bb30] [c000000000c6ce9c] __schedule+0x2cc/0x910
>> [ 1718.848394] [c0000008b0f6bc10] [c000000000c6d558] schedule+0x78/0x130
>> [ 1718.848401] [c0000008b0f6bc40] [c0000000005d5a64] jbd2_log_wait_commit+0xd4/0x1a0
>> [ 1718.848408] [c0000008b0f6bcc0] [c00000000055fb6c] ext4_sync_file+0x1cc/0x480
>> [ 1718.848415] [c0000008b0f6bd20] [c000000000493530] vfs_fsync_range+0x70/0xf0
>> [ 1718.848421] [c0000008b0f6bd60] [c000000000493638] do_fsync+0x58/0xd0
>> [ 1718.848427] [c0000008b0f6bda0] [c0000000004936d8] sys_fsync+0x28/0x40
>> [ 1718.848433] [c0000008b0f6bdc0] [c000000000035e28] system_call_exception+0xf8/0x1c0
>> [ 1718.848440] [c0000008b0f6be20] [c00000000000ca70] system_call_common+0xf0/0x278
>> 
>> Comparing next-20200630 with next-20200701 one possible candidate seems to
>> be following commit:
>> 
>> commit 37f4a24c2469a10a4c16c641671bd766e276cf9f
>>    blk-mq: centralise related handling into blk_mq_get_driver_tag
>> 
>> Reverting this commit allows the test to complete in 10 minutes.
> 
> Hello,
> 
> Thanks for the report.
> 
> Please try the following fix:
> 
> https://lore.kernel.org/linux-block/20200702062041.GC2452799@T590/raw

The fix works for me.

Tested-by: Sachin Sant <sachinp@linux.vnet.ibm.com>

Thanks
-Sachin

> 
> 
> Thanks,
> Ming


^ permalink raw reply

* Re: [PATCH v6 1/2] crash_core, vmcoreinfo: Append 'MAX_PHYSMEM_BITS' to vmcoreinfo
From: Dave Young @ 2020-07-02 12:08 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Mark Rutland, Kazuhito Hagio, Bhupesh Sharma, x86, kexec,
	linux-kernel, linuxppc-dev, Paul Mackerras, Boris Petkov,
	James Morse, Thomas Gleixner, bhupesh.linux, Will Deacon,
	Ingo Molnar, linux-arm-kernel, Dave Anderson
In-Reply-To: <20200702110003.GC22241@gaia>

Hi Catalin,
On 07/02/20 at 12:00pm, Catalin Marinas wrote:
> On Thu, May 14, 2020 at 12:22:36AM +0530, Bhupesh Sharma wrote:
> > diff --git a/kernel/crash_core.c b/kernel/crash_core.c
> > index 9f1557b98468..18175687133a 100644
> > --- a/kernel/crash_core.c
> > +++ b/kernel/crash_core.c
> > @@ -413,6 +413,7 @@ static int __init crash_save_vmcoreinfo_init(void)
> >  	VMCOREINFO_LENGTH(mem_section, NR_SECTION_ROOTS);
> >  	VMCOREINFO_STRUCT_SIZE(mem_section);
> >  	VMCOREINFO_OFFSET(mem_section, section_mem_map);
> > +	VMCOREINFO_NUMBER(MAX_PHYSMEM_BITS);
> >  #endif
> >  	VMCOREINFO_STRUCT_SIZE(page);
> >  	VMCOREINFO_STRUCT_SIZE(pglist_data);
> 
> I can queue this patch via the arm64 tree (together with the second one)
> but I'd like an ack from the kernel/crash_core.c maintainers. They don't
> seem to have been cc'ed either (only the kexec list).

For the VMCOREINFO part, I'm fine with the changes, but since I do not
understand the arm64 pieces I would like to leave those to the arm64 people
to review.  If the arm64 bits are good enough, feel free to add:

Acked-by: Dave Young <dyoung@redhat.com>

Thanks
Dave


^ permalink raw reply

* [Bug 208181] BUG: KASAN: stack-out-of-bounds in strcmp+0x58/0xd8
From: bugzilla-daemon @ 2020-07-02 12:00 UTC (permalink / raw)
  To: linuxppc-dev
In-Reply-To: <bug-208181-206035@https.bugzilla.kernel.org/>

https://bugzilla.kernel.org/show_bug.cgi?id=208181

--- Comment #11 from Christophe Leroy (christophe.leroy@csgroup.eu) ---
The issue is that that commit moved more code than described into kasan_init():

KASAN shadow page allocation has to be moved into kasan_init(), but page table
allocation must remain before the switch to the final hash table.

The problem only occurs on book3s/32 with a hash MMU.

See proposed fix at
https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=187165 (2
patches).
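
The ordering constraint described in this comment can be sketched as a
small userspace model. This is only an illustration under the assumption
that boot proceeds in the three steps shown; every mock_* function and
flag is hypothetical and none of them is kernel code.

```c
#include <assert.h>
#include <stdbool.h>

static bool shadow_page_tables_ready;
static bool final_hash_table_active;
static bool shadow_pages_allocated;

/* Early step: page tables must exist before the final hash table is used. */
static void mock_kasan_mmu_init(void)
{
	assert(!final_hash_table_active);
	shadow_page_tables_ready = true;
}

/* Stand-in for the switch to the final hash table. */
static void mock_switch_to_final_hash_table(void)
{
	final_hash_table_active = true;
}

/* Late step: shadow pages are allocated once all of memory is reachable. */
static void mock_kasan_init(void)
{
	assert(shadow_page_tables_ready);
	shadow_pages_allocated = true;
}

/* The order the fix aims for: tables early, pages late. */
static void mock_boot(void)
{
	mock_kasan_mmu_init();
	mock_switch_to_final_hash_table();
	mock_kasan_init();
}
```

Calling mock_kasan_mmu_init() after the switch trips the first assert,
which mirrors the ordering the reverted commit got wrong.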

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply

* Re: [PATCH 04/11] ppc64/kexec_file: avoid stomping memory used by special regions
From: Dave Young @ 2020-07-02 11:59 UTC (permalink / raw)
  To: piliu
  Cc: Thiago Jung Bauermann, Kexec-ml, Mahesh J Salgaonkar,
	Petr Tesarik, lkml, linuxppc-dev, Sourabh Jain, Vivek Goyal,
	Andrew Morton, Mimi Zohar, Hari Bathini, Eric Biederman
In-Reply-To: <bc8fe308-5ce9-1ca6-c832-7c3a75a732d2@redhat.com>

> > I'm confused about the "overlap with crashkernel memory", does that mean
> > those normal kernel used memory could be put in crashkernel reserved
> > memory range?  If so why can't just skip those areas while crashkernel
> > doing the reservation?
> I raised the same question in another mail. As Hari's answer, "kexec -p"
> skips these ranges in user space. And the same logic should be done in
> "kexec -s -p"

I see, thanks!  The confusion also applies to the userspace
implementation though.  It seems these have to be special cases because of
a limitation in the powerpc crashkernel reservation implementation in the kernel.

Thanks
Dave


^ permalink raw reply

* Re: [PATCH 04/11] ppc64/kexec_file: avoid stomping memory used by special regions
From: Dave Young @ 2020-07-02 11:54 UTC (permalink / raw)
  To: Hari Bathini
  Cc: Thiago Jung Bauermann, Pingfan Liu, Kexec-ml, Petr Tesarik, lkml,
	Sourabh Jain, Mahesh J Salgaonkar, linuxppc-dev, Mimi Zohar,
	Andrew Morton, Vivek Goyal, Eric Biederman
In-Reply-To: <6e96ae5a-91fd-726e-1eda-314f2317d8b4@linux.ibm.com>

On 07/01/20 at 11:48pm, Hari Bathini wrote:
> 
> 
> On 01/07/20 1:10 pm, Dave Young wrote:
> > Hi Hari,
> > On 06/27/20 at 12:35am, Hari Bathini wrote:
> >> crashkernel region could have an overlap with special memory regions
> >> like  opal, rtas, tce-table & such. These regions are referred to as
> >> exclude memory ranges. Setup this ranges during image probe in order
> >> to avoid them while finding the buffer for different kdump segments.
> >> Implement kexec_locate_mem_hole_ppc64() that locates a memory hole
> >> accounting for these ranges. Also, override arch_kexec_add_buffer()
> >> to locate a memory hole & later call __kexec_add_buffer() function
> >> with kbuf->mem set to skip the generic locate memory hole lookup.
> >>
> >> Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
> >> ---
> >>  arch/powerpc/include/asm/crashdump-ppc64.h |   10 +
> >>  arch/powerpc/include/asm/kexec.h           |    7 -
> >>  arch/powerpc/kexec/elf_64.c                |    7 +
> >>  arch/powerpc/kexec/file_load_64.c          |  292 ++++++++++++++++++++++++++++
> >>  4 files changed, 312 insertions(+), 4 deletions(-)
> >>  create mode 100644 arch/powerpc/include/asm/crashdump-ppc64.h
> >>
> > [snip]
> >>  /**
> >> + * get_exclude_memory_ranges - Get exclude memory ranges. This list includes
> >> + *                             regions like opal/rtas, tce-table, initrd,
> >> + *                             kernel, htab which should be avoided while
> >> + *                             setting up kexec load segments.
> >> + * @mem_ranges:                Range list to add the memory ranges to.
> >> + *
> >> + * Returns 0 on success, negative errno on error.
> >> + */
> >> +static int get_exclude_memory_ranges(struct crash_mem **mem_ranges)
> >> +{
> >> +	int ret;
> >> +
> >> +	ret = add_tce_mem_ranges(mem_ranges);
> >> +	if (ret)
> >> +		goto out;
> >> +
> >> +	ret = add_initrd_mem_range(mem_ranges);
> >> +	if (ret)
> >> +		goto out;
> >> +
> >> +	ret = add_htab_mem_range(mem_ranges);
> >> +	if (ret)
> >> +		goto out;
> >> +
> >> +	ret = add_kernel_mem_range(mem_ranges);
> >> +	if (ret)
> >> +		goto out;
> >> +
> >> +	ret = add_rtas_mem_range(mem_ranges, false);
> >> +	if (ret)
> >> +		goto out;
> >> +
> >> +	ret = add_opal_mem_range(mem_ranges, false);
> >> +	if (ret)
> >> +		goto out;
> >> +
> >> +	ret = add_reserved_ranges(mem_ranges);
> >> +	if (ret)
> >> +		goto out;
> >> +
> >> +	/* exclude memory ranges should be sorted for easy lookup */
> >> +	sort_memory_ranges(*mem_ranges);
> >> +out:
> >> +	if (ret)
> >> +		pr_err("Failed to setup exclude memory ranges\n");
> >> +	return ret;
> >> +}
> > 
> > I'm confused about the "overlap with crashkernel memory", does that mean
> > those normal kernel used memory could be put in crashkernel reserved
> 
> There are regions that could overlap with crashkernel region but they are
> not normal kernel used memory though. These are regions that kernel and/or
> f/w chose to place at a particular address for real mode accessibility
> and/or memory layout between kernel & f/w kind of thing.
> 
> > memory range?  If so why can't just skip those areas while crashkernel
> > doing the reservation?
> 
> crashkernel region has a dependency to be in the first memory block for it
> to be accessible in real mode. Accommodating this requirement while addressing
> other requirements would mean something like what we have now. A list of
> possible special memory regions in crashkernel region to take care of.
> 
> I have plans to split crashkernel region into low & high to have exclusive
> regions for crashkernel, even if that means to have two of them. But that
> is for another day with its own set of complexities to deal with...

Ok, I was not aware that the powerpc crashkernel region is not
dynamically reserved.  But it seems powerpc needs those tricks, at least for
the time being, like you said.
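
To make the "sorted exclude ranges" idea from the quoted patch concrete,
here is a minimal sketch of locating a hole while skipping such ranges.
It is not the code from the series: struct range and locate_hole() are
hypothetical, the search is bottom-up, and alignment is ignored, all of
which are simplifying assumptions for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical inclusive range, loosely modelled on a crash_mem entry. */
struct range {
	uint64_t start;
	uint64_t end;	/* inclusive */
};

/*
 * Find the lowest address in [lo, hi] where `size` bytes fit without
 * touching any exclude range. `excl` must be sorted by start and
 * non-overlapping. Returns 0 and sets *out on success, -1 if nothing fits.
 */
static int locate_hole(uint64_t lo, uint64_t hi, uint64_t size,
		       const struct range *excl, size_t nr, uint64_t *out)
{
	uint64_t cur = lo;
	size_t i;

	assert(size > 0 && lo <= hi);
	for (i = 0; i < nr; i++) {
		if (excl[i].end < cur)
			continue;		/* range lies below the cursor */
		if (excl[i].start > hi)
			break;			/* range lies above the window */
		/* Does the gap [cur, excl[i].start) hold the buffer? */
		if (excl[i].start > cur && excl[i].start - cur >= size) {
			*out = cur;
			return 0;
		}
		if (excl[i].end >= hi)
			return -1;		/* rest of the window is excluded */
		cur = excl[i].end + 1;
	}
	/* Tail gap [cur, hi]; written this way to avoid overflow in hi - cur + 1. */
	if (hi - cur >= size - 1) {
		*out = cur;
		return 0;
	}
	return -1;
}
```

Sorting the exclude list first is what lets a single forward pass work
here, since the cursor never has to move backwards.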

Thanks
Dave


^ permalink raw reply

* Re: [powerpc][next-20200701] Hung task timeouts during regression test runs
From: Ming Lei @ 2020-07-02 11:52 UTC (permalink / raw)
  To: Sachin Sant; +Cc: linux-block, axboe, Linux Next Mailing List, linuxppc-dev
In-Reply-To: <CDAB3931-FAAD-443A-A9CD-362E527043A1@linux.vnet.ibm.com>

On Thu, Jul 02, 2020 at 04:53:04PM +0530, Sachin Sant wrote:
> Starting with linux-next 20200701 release I am observing automated regressions
> tests taking longer time to complete. A test which took 10 minutes with next-20200630
> took more than 60 minutes against next-20200701. 
> 
> Following hung task timeout messages were seen during these runs
> 
> [ 1718.848351]       Not tainted 5.8.0-rc3-next-20200701-autotest #1
> [ 1718.848356] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 1718.848362] NetworkManager  D    0  2626      1 0x00040080
> [ 1718.848367] Call Trace:
> [ 1718.848374] [c0000008b0f6b8f0] [c000000000c6d558] schedule+0x78/0x130 (unreliable)
> [ 1718.848382] [c0000008b0f6bad0] [c00000000001b070] __switch_to+0x2e0/0x480
> [ 1718.848388] [c0000008b0f6bb30] [c000000000c6ce9c] __schedule+0x2cc/0x910
> [ 1718.848394] [c0000008b0f6bc10] [c000000000c6d558] schedule+0x78/0x130
> [ 1718.848401] [c0000008b0f6bc40] [c0000000005d5a64] jbd2_log_wait_commit+0xd4/0x1a0
> [ 1718.848408] [c0000008b0f6bcc0] [c00000000055fb6c] ext4_sync_file+0x1cc/0x480
> [ 1718.848415] [c0000008b0f6bd20] [c000000000493530] vfs_fsync_range+0x70/0xf0
> [ 1718.848421] [c0000008b0f6bd60] [c000000000493638] do_fsync+0x58/0xd0
> [ 1718.848427] [c0000008b0f6bda0] [c0000000004936d8] sys_fsync+0x28/0x40
> [ 1718.848433] [c0000008b0f6bdc0] [c000000000035e28] system_call_exception+0xf8/0x1c0
> [ 1718.848440] [c0000008b0f6be20] [c00000000000ca70] system_call_common+0xf0/0x278
> 
> Comparing next-20200630 with next-20200701 one possible candidate seems to
> be following commit:
> 
> commit 37f4a24c2469a10a4c16c641671bd766e276cf9f
>     blk-mq: centralise related handling into blk_mq_get_driver_tag
> 
> Reverting this commit allows the test to complete in 10 minutes.

Hello,

Thanks for the report.

Please try the following fix:

https://lore.kernel.org/linux-block/20200702062041.GC2452799@T590/raw


Thanks,
Ming


^ permalink raw reply

* [PATCH 1/2] Revert "powerpc/kasan: Fix shadow pages allocation failure"
From: Christophe Leroy @ 2020-07-02 11:52 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	erhard_f
  Cc: linuxppc-dev, linux-kernel

This reverts commit d2a91cef9bbdeb87b7449fdab1a6be6000930210.

This commit moved too much work into kasan_init(). The allocation
of shadow pages has to be moved for the reason explained in that
patch, but the allocation of page tables still needs to be done
before switching to the final hash table.

First revert the incorrect commit; the following patch redoes it
properly.

Reported-by: Erhard F. <erhard_f@mailbox.org>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=208181
Fixes: d2a91cef9bbd ("powerpc/kasan: Fix shadow pages allocation failure")
Cc: stable@vger.kernel.org
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
 arch/powerpc/include/asm/kasan.h      | 2 ++
 arch/powerpc/mm/init_32.c             | 2 ++
 arch/powerpc/mm/kasan/kasan_init_32.c | 4 +---
 3 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/kasan.h b/arch/powerpc/include/asm/kasan.h
index be85c7005fb1..d635b96c7ea6 100644
--- a/arch/powerpc/include/asm/kasan.h
+++ b/arch/powerpc/include/asm/kasan.h
@@ -27,10 +27,12 @@
 
 #ifdef CONFIG_KASAN
 void kasan_early_init(void);
+void kasan_mmu_init(void);
 void kasan_init(void);
 void kasan_late_init(void);
 #else
 static inline void kasan_init(void) { }
+static inline void kasan_mmu_init(void) { }
 static inline void kasan_late_init(void) { }
 #endif
 
diff --git a/arch/powerpc/mm/init_32.c b/arch/powerpc/mm/init_32.c
index 5a5469eb3174..bf1717f8d5f4 100644
--- a/arch/powerpc/mm/init_32.c
+++ b/arch/powerpc/mm/init_32.c
@@ -171,6 +171,8 @@ void __init MMU_init(void)
 	btext_unmap();
 #endif
 
+	kasan_mmu_init();
+
 	setup_kup();
 
 	/* Shortly after that, the entire linear mapping will be available */
diff --git a/arch/powerpc/mm/kasan/kasan_init_32.c b/arch/powerpc/mm/kasan/kasan_init_32.c
index 0760e1e754e4..4813c6d50889 100644
--- a/arch/powerpc/mm/kasan/kasan_init_32.c
+++ b/arch/powerpc/mm/kasan/kasan_init_32.c
@@ -117,7 +117,7 @@ static void __init kasan_unmap_early_shadow_vmalloc(void)
 	kasan_update_early_region(k_start, k_end, __pte(0));
 }
 
-static void __init kasan_mmu_init(void)
+void __init kasan_mmu_init(void)
 {
 	int ret;
 	struct memblock_region *reg;
@@ -146,8 +146,6 @@ static void __init kasan_mmu_init(void)
 
 void __init kasan_init(void)
 {
-	kasan_mmu_init();
-
 	kasan_remap_early_shadow_ro();
 
 	clear_page(kasan_early_shadow_page);
-- 
2.25.0


^ permalink raw reply related

* [PATCH 2/2] powerpc/kasan: Fix shadow pages allocation failure
From: Christophe Leroy @ 2020-07-02 11:52 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	erhard_f
  Cc: linuxppc-dev, linux-kernel
In-Reply-To: <3667deb0911affbf999b99f87c31c77d5e870cd2.1593690707.git.christophe.leroy@csgroup.eu>

Doing KASAN shadow page allocation in MMU_init() is too early: the kernel
doesn't yet have access to the entire memory space, and memblock_alloc()
fails when the kernel is a bit big.

Do it from kasan_init() instead.

Reported-by: Erhard F. <erhard_f@mailbox.org>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=208181
Fixes: 2edb16efc899 ("powerpc/32: Add KASAN support")
Fixes: d2a91cef9bbd ("powerpc/kasan: Fix shadow pages allocation failure")
Cc: stable@vger.kernel.org
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
 arch/powerpc/mm/kasan/kasan_init_32.c | 25 +++++++++++++------------
 1 file changed, 13 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/mm/kasan/kasan_init_32.c b/arch/powerpc/mm/kasan/kasan_init_32.c
index 4813c6d50889..019b0c0bbbf3 100644
--- a/arch/powerpc/mm/kasan/kasan_init_32.c
+++ b/arch/powerpc/mm/kasan/kasan_init_32.c
@@ -120,11 +120,24 @@ static void __init kasan_unmap_early_shadow_vmalloc(void)
 void __init kasan_mmu_init(void)
 {
 	int ret;
+
+	if (early_mmu_has_feature(MMU_FTR_HPTE_TABLE) ||
+	    IS_ENABLED(CONFIG_KASAN_VMALLOC)) {
+		ret = kasan_init_shadow_page_tables(KASAN_SHADOW_START, KASAN_SHADOW_END);
+
+		if (ret)
+			panic("kasan: kasan_init_shadow_page_tables() failed");
+	}
+}
+
+void __init kasan_init(void)
+{
 	struct memblock_region *reg;
 
 	for_each_memblock(memory, reg) {
 		phys_addr_t base = reg->base;
 		phys_addr_t top = min(base + reg->size, total_lowmem);
+		int ret;
 
 		if (base >= top)
 			continue;
@@ -134,18 +147,6 @@ void __init kasan_mmu_init(void)
 			panic("kasan: kasan_init_region() failed");
 	}
 
-	if (early_mmu_has_feature(MMU_FTR_HPTE_TABLE) ||
-	    IS_ENABLED(CONFIG_KASAN_VMALLOC)) {
-		ret = kasan_init_shadow_page_tables(KASAN_SHADOW_START, KASAN_SHADOW_END);
-
-		if (ret)
-			panic("kasan: kasan_init_shadow_page_tables() failed");
-	}
-
-}
-
-void __init kasan_init(void)
-{
 	kasan_remap_early_shadow_ro();
 
 	clear_page(kasan_early_shadow_page);
-- 
2.25.0


^ permalink raw reply related

* Re: [PATCH 01/11] kexec_file: allow archs to handle special regions while locating memory hole
From: Dave Young @ 2020-07-02 11:47 UTC (permalink / raw)
  To: Hari Bathini
  Cc: Pingfan Liu, Petr Tesarik, Kexec-ml, Mahesh J Salgaonkar,
	Mimi Zohar, lkml, linuxppc-dev, Sourabh Jain, Vivek Goyal,
	Andrew Morton, Thiago Jung Bauermann, Eric Biederman
In-Reply-To: <0e145e84-a6cf-4da3-1a1a-331a7e1ac1fa@linux.ibm.com>

On 07/02/20 at 12:01am, Hari Bathini wrote:
> 
> 
> On 01/07/20 1:16 pm, Dave Young wrote:
> > On 06/29/20 at 05:26pm, Hari Bathini wrote:
> >> Hi Petr,
> >>
> >> On 29/06/20 5:09 pm, Petr Tesarik wrote:
> >>> Hi Hari,
> >>>
> >>> is there any good reason to add two more functions with a very similar
> >>> name to an existing function? AFAICS all you need is a way to call a
> >>> PPC64-specific function from within kexec_add_buffer (PATCH 4/11), so
> >>> you could add something like this:
> >>>
> >>> int __weak arch_kexec_locate_mem_hole(struct kexec_buf *kbuf)
> >>> {
> >>> 	return 0;
> >>> }
> >>>
> >>> Call this function from kexec_add_buffer where appropriate and then
> >>> override it for PPC64 (it roughly corresponds to your
> >>> kexec_locate_mem_hole_ppc64() from PATCH 4/11).
> >>>
> >>> FWIW it would make it easier for me to follow the resulting code.
> >>
> >> Right, Petr.
> >>
> >> I was trying out a few things before I ended up with what I sent here.
> >> But yeah.. I did realize arch_kexec_locate_mem_hole() would have been better
> >> after sending out v1. Will take care of that in v2.
> > 
> > Another way is use arch private function to locate mem hole, then set
> > kbuf->mem, and then call kexec_add_buf, it will skip the common locate
> > hole function.
> 
> Dave, I did think about it. But there are a couple of places this can get
> tricky. One is ima_add_kexec_buffer() and the other is kexec_elf_load().
> These call sites could be updated to set kbuf->mem before kexec_add_buffer().
> But the current approach seemed like the better option for it creates a
> single point of control in setting up segment buffers and also, makes adding
> any new segments simpler, arch-specific segments or otherwise.
> 

Ok, thanks for the explanation.
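
For readers following along, the alternative discussed above (an arch
function picks the address, sets kbuf->mem, and the common code then
skips its own search) can be sketched with a userspace mock. Everything
here is hypothetical: struct mock_kbuf, BUF_MEM_UNKNOWN, the mock_*
functions and the addresses are invented for the example and are not the
kernel's kexec API.

```c
#include <assert.h>

#define BUF_MEM_UNKNOWN 0UL	/* hypothetical "not yet placed" marker */

struct mock_kbuf {
	unsigned long mem;	/* load address, or BUF_MEM_UNKNOWN */
	unsigned long memsz;	/* size of the segment */
};

static int generic_lookups;	/* counts calls into the generic search */

/* Stand-in for the common hole search; the address is made up. */
static int mock_generic_locate(struct mock_kbuf *kbuf)
{
	generic_lookups++;
	kbuf->mem = 0x100000UL;
	return 0;
}

/* Common path: only search when the caller has not chosen an address. */
static int mock_add_buffer(struct mock_kbuf *kbuf)
{
	assert(kbuf->memsz > 0);
	if (kbuf->mem == BUF_MEM_UNKNOWN)
		return mock_generic_locate(kbuf);
	return 0;
}

/* Arch path: choose the address first, then reuse the common path. */
static int mock_arch_add_buffer(struct mock_kbuf *kbuf)
{
	kbuf->mem = 0x2000000UL;	/* pretend an arch-aware search picked this */
	return mock_add_buffer(kbuf);
}
```

The drawback raised in the thread is visible in this shape: every call
site that builds a buffer would have to remember to go through the arch
path, whereas a single hook inside the common function keeps one point
of control.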


^ permalink raw reply

* [powerpc][next-20200701] Hung task timeouts during regression test runs
From: Sachin Sant @ 2020-07-02 11:23 UTC (permalink / raw)
  To: linuxppc-dev, linux-block; +Cc: axboe, Linux Next Mailing List, ming.lei

Starting with the linux-next 20200701 release I am observing automated regression
tests taking longer to complete. A test which took 10 minutes with next-20200630
took more than 60 minutes against next-20200701.

The following hung task timeout messages were seen during these runs:

[ 1718.848351]       Not tainted 5.8.0-rc3-next-20200701-autotest #1
[ 1718.848356] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1718.848362] NetworkManager  D    0  2626      1 0x00040080
[ 1718.848367] Call Trace:
[ 1718.848374] [c0000008b0f6b8f0] [c000000000c6d558] schedule+0x78/0x130 (unreliable)
[ 1718.848382] [c0000008b0f6bad0] [c00000000001b070] __switch_to+0x2e0/0x480
[ 1718.848388] [c0000008b0f6bb30] [c000000000c6ce9c] __schedule+0x2cc/0x910
[ 1718.848394] [c0000008b0f6bc10] [c000000000c6d558] schedule+0x78/0x130
[ 1718.848401] [c0000008b0f6bc40] [c0000000005d5a64] jbd2_log_wait_commit+0xd4/0x1a0
[ 1718.848408] [c0000008b0f6bcc0] [c00000000055fb6c] ext4_sync_file+0x1cc/0x480
[ 1718.848415] [c0000008b0f6bd20] [c000000000493530] vfs_fsync_range+0x70/0xf0
[ 1718.848421] [c0000008b0f6bd60] [c000000000493638] do_fsync+0x58/0xd0
[ 1718.848427] [c0000008b0f6bda0] [c0000000004936d8] sys_fsync+0x28/0x40
[ 1718.848433] [c0000008b0f6bdc0] [c000000000035e28] system_call_exception+0xf8/0x1c0
[ 1718.848440] [c0000008b0f6be20] [c00000000000ca70] system_call_common+0xf0/0x278

Comparing next-20200630 with next-20200701, one possible candidate seems to
be the following commit:

commit 37f4a24c2469a10a4c16c641671bd766e276cf9f
    blk-mq: centralise related handling into blk_mq_get_driver_tag

Reverting this commit allows the test to complete in 10 minutes.

Thanks
-Sachin


^ permalink raw reply

* Re: [PATCH v6 1/2] crash_core, vmcoreinfo: Append 'MAX_PHYSMEM_BITS' to vmcoreinfo
From: Catalin Marinas @ 2020-07-02 11:00 UTC (permalink / raw)
  To: Bhupesh Sharma
  Cc: Mark Rutland, Kazuhito Hagio, Will Deacon, x86, kexec,
	linux-kernel, Paul Mackerras, James Morse, Boris Petkov,
	Thomas Gleixner, bhupesh.linux, linuxppc-dev, Ingo Molnar,
	linux-arm-kernel, Dave Anderson
In-Reply-To: <1589395957-24628-2-git-send-email-bhsharma@redhat.com>

On Thu, May 14, 2020 at 12:22:36AM +0530, Bhupesh Sharma wrote:
> diff --git a/kernel/crash_core.c b/kernel/crash_core.c
> index 9f1557b98468..18175687133a 100644
> --- a/kernel/crash_core.c
> +++ b/kernel/crash_core.c
> @@ -413,6 +413,7 @@ static int __init crash_save_vmcoreinfo_init(void)
>  	VMCOREINFO_LENGTH(mem_section, NR_SECTION_ROOTS);
>  	VMCOREINFO_STRUCT_SIZE(mem_section);
>  	VMCOREINFO_OFFSET(mem_section, section_mem_map);
> +	VMCOREINFO_NUMBER(MAX_PHYSMEM_BITS);
>  #endif
>  	VMCOREINFO_STRUCT_SIZE(page);
>  	VMCOREINFO_STRUCT_SIZE(pglist_data);

I can queue this patch via the arm64 tree (together with the second one)
but I'd like an ack from the kernel/crash_core.c maintainers. They don't
seem to have been cc'ed either (only the kexec list).
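
For context, the quoted hunk exports one more NUMBER(...) entry in
vmcoreinfo. Assuming the usual "NUMBER(name)=value" text line format, a
consumer could look the value up roughly as in the sketch below. The
helper vmcoreinfo_number() is hypothetical and not taken from any
existing tool.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Look up "NUMBER(<name>)=<value>" in a newline-separated vmcoreinfo-style
 * text blob. Returns 0 and stores the value on success, -1 if absent.
 */
static int vmcoreinfo_number(const char *blob, const char *name, long *value)
{
	char key[64];
	const char *line = blob;
	size_t klen;
	int n;

	assert(blob && name && value);
	n = snprintf(key, sizeof(key), "NUMBER(%s)=", name);
	if (n < 0 || (size_t)n >= sizeof(key))
		return -1;	/* name too long for the key buffer */
	klen = (size_t)n;

	while (*line) {
		if (strncmp(line, key, klen) == 0)
			return sscanf(line + klen, "%ld", value) == 1 ? 0 : -1;
		line = strchr(line, '\n');
		if (!line)
			break;
		line++;		/* step past the newline */
	}
	return -1;
}
```

A tool that finds no such entry (for example on an older kernel) would
still need its own fallback for the value.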

Thanks.

-- 
Catalin

^ permalink raw reply

