Linux Documentation


* Re: [PATCH v4 11/12] fpga: dfl: fme: add global error reporting support
From: Wu Hao @ 2019-08-07  2:45 UTC
  To: Greg KH
  Cc: mdf, linux-fpga, linux-kernel, linux-api, linux-doc, atull,
	Luwei Kang, Ananda Ravuri, Xu Yilun
In-Reply-To: <20190805155626.GD8107@kroah.com>

On Mon, Aug 05, 2019 at 05:56:26PM +0200, Greg KH wrote:
> On Sun, Aug 04, 2019 at 06:20:21PM +0800, Wu Hao wrote:
> > +static int fme_global_err_init(struct platform_device *pdev,
> > +			       struct dfl_feature *feature)
> > +{
> > +	struct device *dev;
> > +	int ret = 0;
> > +
> > +	dev = kzalloc(sizeof(*dev), GFP_KERNEL);
> > +	if (!dev)
> > +		return -ENOMEM;
> > +
> > +	dev->parent = &pdev->dev;
> > +	dev->release = err_dev_release;
> > +	dev_set_name(dev, "errors");
> > +
> > +	fme_error_enable(feature);
> > +
> > +	ret = device_register(dev);
> > +	if (ret) {
> > +		put_device(dev);
> > +		return ret;
> > +	}
> > +
> > +	ret = device_add_groups(dev, error_groups);
> 
> cute, but no, you do not create a whole struct device for a subdir.  Use
> the attribute group name like you did on earlier patches.

Sure, let me fix it in the next version.

> 
> And again, you raced userspace and lost :(

Same here: could you please give some more hints?

Thanks in advance.
Hao

> 
> thanks,
> 
> greg k-h

^ permalink raw reply

* Re: [PATCH v4 07/12] fpga: dfl: afu: add error reporting support.
From: Wu Hao @ 2019-08-07  2:35 UTC
  To: Greg KH
  Cc: mdf, linux-fpga, linux-kernel, linux-api, linux-doc, atull,
	Xu Yilun
In-Reply-To: <20190805155437.GC8107@kroah.com>

On Mon, Aug 05, 2019 at 05:54:37PM +0200, Greg KH wrote:
> On Sun, Aug 04, 2019 at 06:20:17PM +0800, Wu Hao wrote:
> > Error reporting is an important private feature: it reports errors
> > detected on the port and accelerated function unit (AFU). This patch
> > introduces several sysfs interfaces to allow userspace to check and
> > clear errors detected by hardware.
> > 
> > Signed-off-by: Xu Yilun <yilun.xu@intel.com>
> > Signed-off-by: Wu Hao <hao.wu@intel.com>
> > Acked-by: Alan Tull <atull@kernel.org>
> > Signed-off-by: Moritz Fischer <mdf@kernel.org>
> > ---
> > v2: switch to device_add/remove_group for sysfs.
> > v3: update kernel version and date in sysfs doc
> > v4: remove dev_dbg in init/uinit callback function.
> > ---
> >  Documentation/ABI/testing/sysfs-platform-dfl-port |  39 ++++
> >  drivers/fpga/Makefile                             |   1 +
> >  drivers/fpga/dfl-afu-error.c                      | 221 ++++++++++++++++++++++
> >  drivers/fpga/dfl-afu-main.c                       |   4 +
> >  drivers/fpga/dfl-afu.h                            |   4 +
> >  5 files changed, 269 insertions(+)
> >  create mode 100644 drivers/fpga/dfl-afu-error.c
> > 
> > diff --git a/Documentation/ABI/testing/sysfs-platform-dfl-port b/Documentation/ABI/testing/sysfs-platform-dfl-port
> > index 5663441..3b6580b 100644
> > --- a/Documentation/ABI/testing/sysfs-platform-dfl-port
> > +++ b/Documentation/ABI/testing/sysfs-platform-dfl-port
> > @@ -81,3 +81,42 @@ KernelVersion:	5.4
> >  Contact:	Wu Hao <hao.wu@intel.com>
> >  Description:	Read-only. Read this file to get the status of issued command
> >  		to userclk_freqcntrcmd.
> > +
> > +What:		/sys/bus/platform/devices/dfl-port.0/errors/revision
> > +Date:		August 2019
> > +KernelVersion:	5.4
> > +Contact:	Wu Hao <hao.wu@intel.com>
> > +Description:	Read-only. Read this file to get the revision of this error
> > +		reporting private feature.
> 
> Same revision question here that I had on an earlier patch.
> 
> 
> > +
> > +What:		/sys/bus/platform/devices/dfl-port.0/errors/errors
> > +Date:		August 2019
> > +KernelVersion:	5.4
> > +Contact:	Wu Hao <hao.wu@intel.com>
> > +Description:	Read-only. Read this file to get errors detected on port and
> > +		Accelerated Function Unit (AFU).
> > +
> > +What:		/sys/bus/platform/devices/dfl-port.0/errors/first_error
> > +Date:		August 2019
> > +KernelVersion:	5.4
> > +Contact:	Wu Hao <hao.wu@intel.com>
> > +Description:	Read-only. Read this file to get the first error detected by
> > +		hardware.
> > +
> > +What:		/sys/bus/platform/devices/dfl-port.0/errors/first_malformed_req
> > +Date:		August 2019
> > +KernelVersion:	5.4
> > +Contact:	Wu Hao <hao.wu@intel.com>
> > +Description:	Read-only. Read this file to get the first malformed request
> > +		captured by hardware.
> > +
> > +What:		/sys/bus/platform/devices/dfl-port.0/errors/clear
> > +Date:		August 2019
> > +KernelVersion:	5.4
> > +Contact:	Wu Hao <hao.wu@intel.com>
> > +Description:	Write-only. Write error code to this file to clear errors.
> > +		Write fails with -EINVAL if input parsing fails or input error
> > +		code doesn't match.
> > +		Write fails with -EBUSY or -ETIMEDOUT if error can't be cleared
> > +		as hardware is in low power state (-EBUSY) or not responding
> > +		(-ETIMEDOUT).
> > diff --git a/drivers/fpga/Makefile b/drivers/fpga/Makefile
> > index 312b937..7255891 100644
> > --- a/drivers/fpga/Makefile
> > +++ b/drivers/fpga/Makefile
> > @@ -41,6 +41,7 @@ obj-$(CONFIG_FPGA_DFL_AFU)		+= dfl-afu.o
> >  
> >  dfl-fme-objs := dfl-fme-main.o dfl-fme-pr.o
> >  dfl-afu-objs := dfl-afu-main.o dfl-afu-region.o dfl-afu-dma-region.o
> > +dfl-afu-objs += dfl-afu-error.o
> >  
> >  # Drivers for FPGAs which implement DFL
> >  obj-$(CONFIG_FPGA_DFL_PCI)		+= dfl-pci.o
> > diff --git a/drivers/fpga/dfl-afu-error.c b/drivers/fpga/dfl-afu-error.c
> > new file mode 100644
> > index 0000000..c5e0efa
> > --- /dev/null
> > +++ b/drivers/fpga/dfl-afu-error.c
> > @@ -0,0 +1,221 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * Driver for FPGA Accelerated Function Unit (AFU) Error Reporting
> > + *
> > + * Copyright 2019 Intel Corporation, Inc.
> > + *
> > + * Authors:
> > + *   Wu Hao <hao.wu@linux.intel.com>
> > + *   Xiao Guangrong <guangrong.xiao@linux.intel.com>
> > + *   Joseph Grecco <joe.grecco@intel.com>
> > + *   Enno Luebbers <enno.luebbers@intel.com>
> > + *   Tim Whisonant <tim.whisonant@intel.com>
> > + *   Ananda Ravuri <ananda.ravuri@intel.com>
> > + *   Mitchel Henry <henry.mitchel@intel.com>
> > + */
> > +
> > +#include <linux/uaccess.h>
> > +
> > +#include "dfl-afu.h"
> > +
> > +#define PORT_ERROR_MASK		0x8
> > +#define PORT_ERROR		0x10
> > +#define PORT_FIRST_ERROR	0x18
> > +#define PORT_MALFORMED_REQ0	0x20
> > +#define PORT_MALFORMED_REQ1	0x28
> > +
> > +#define ERROR_MASK		GENMASK_ULL(63, 0)
> > +
> > +/* mask or unmask port errors by the error mask register. */
> > +static void __port_err_mask(struct device *dev, bool mask)
> > +{
> > +	void __iomem *base;
> > +
> > +	base = dfl_get_feature_ioaddr_by_id(dev, PORT_FEATURE_ID_ERROR);
> > +
> > +	writeq(mask ? ERROR_MASK : 0, base + PORT_ERROR_MASK);
> > +}
> > +
> > +/* clear port errors. */
> > +static int __port_err_clear(struct device *dev, u64 err)
> > +{
> > +	struct platform_device *pdev = to_platform_device(dev);
> > +	void __iomem *base_err, *base_hdr;
> > +	int ret;
> > +	u64 v;
> > +
> > +	base_err = dfl_get_feature_ioaddr_by_id(dev, PORT_FEATURE_ID_ERROR);
> > +	base_hdr = dfl_get_feature_ioaddr_by_id(dev, PORT_FEATURE_ID_HEADER);
> > +
> > +	/*
> > +	 * clear Port Errors
> > +	 *
> > +	 * - Check for AP6 State
> > +	 * - Halt Port by keeping Port in reset
> > +	 * - Set PORT Error mask to all 1 to mask errors
> > +	 * - Clear all errors
> > +	 * - Set Port mask to all 0 to enable errors
> > +	 * - All errors start capturing new errors
> > +	 * - Enable Port by pulling the port out of reset
> > +	 */
> > +
> > +	/* if device is still in AP6 power state, can not clear any error. */
> > +	v = readq(base_hdr + PORT_HDR_STS);
> > +	if (FIELD_GET(PORT_STS_PWR_STATE, v) == PORT_STS_PWR_STATE_AP6) {
> > +		dev_err(dev, "Could not clear errors, device in AP6 state.\n");
> > +		return -EBUSY;
> > +	}
> > +
> > +	/* Halt Port by keeping Port in reset */
> > +	ret = __port_disable(pdev);
> > +	if (ret)
> > +		return ret;
> > +
> > +	/* Mask all errors */
> > +	__port_err_mask(dev, true);
> > +
> > +	/* Clear errors if err input matches with current port errors.*/
> > +	v = readq(base_err + PORT_ERROR);
> > +
> > +	if (v == err) {
> > +		writeq(v, base_err + PORT_ERROR);
> > +
> > +		v = readq(base_err + PORT_FIRST_ERROR);
> > +		writeq(v, base_err + PORT_FIRST_ERROR);
> > +	} else {
> > +		ret = -EINVAL;
> > +	}
> > +
> > +	/* Clear mask */
> > +	__port_err_mask(dev, false);
> > +
> > +	/* Enable the Port by clear the reset */
> > +	__port_enable(pdev);
> > +
> > +	return ret;
> > +}
> > +
> > +static ssize_t revision_show(struct device *dev, struct device_attribute *attr,
> > +			     char *buf)
> > +{
> > +	void __iomem *base;
> > +
> > +	base = dfl_get_feature_ioaddr_by_id(dev, PORT_FEATURE_ID_ERROR);
> > +
> > +	return sprintf(buf, "%u\n", dfl_feature_revision(base));
> > +}
> > +static DEVICE_ATTR_RO(revision);
> > +
> > +static ssize_t errors_show(struct device *dev, struct device_attribute *attr,
> > +			   char *buf)
> > +{
> > +	struct dfl_feature_platform_data *pdata = dev_get_platdata(dev);
> > +	void __iomem *base;
> > +	u64 error;
> > +
> > +	base = dfl_get_feature_ioaddr_by_id(dev, PORT_FEATURE_ID_ERROR);
> > +
> > +	mutex_lock(&pdata->lock);
> > +	error = readq(base + PORT_ERROR);
> > +	mutex_unlock(&pdata->lock);
> > +
> > +	return sprintf(buf, "0x%llx\n", (unsigned long long)error);
> > +}
> > +static DEVICE_ATTR_RO(errors);
> > +
> > +static ssize_t first_error_show(struct device *dev,
> > +				struct device_attribute *attr, char *buf)
> > +{
> > +	struct dfl_feature_platform_data *pdata = dev_get_platdata(dev);
> > +	void __iomem *base;
> > +	u64 error;
> > +
> > +	base = dfl_get_feature_ioaddr_by_id(dev, PORT_FEATURE_ID_ERROR);
> > +
> > +	mutex_lock(&pdata->lock);
> > +	error = readq(base + PORT_FIRST_ERROR);
> > +	mutex_unlock(&pdata->lock);
> > +
> > +	return sprintf(buf, "0x%llx\n", (unsigned long long)error);
> > +}
> > +static DEVICE_ATTR_RO(first_error);
> > +
> > +static ssize_t first_malformed_req_show(struct device *dev,
> > +					struct device_attribute *attr,
> > +					char *buf)
> > +{
> > +	struct dfl_feature_platform_data *pdata = dev_get_platdata(dev);
> > +	void __iomem *base;
> > +	u64 req0, req1;
> > +
> > +	base = dfl_get_feature_ioaddr_by_id(dev, PORT_FEATURE_ID_ERROR);
> > +
> > +	mutex_lock(&pdata->lock);
> > +	req0 = readq(base + PORT_MALFORMED_REQ0);
> > +	req1 = readq(base + PORT_MALFORMED_REQ1);
> > +	mutex_unlock(&pdata->lock);
> > +
> > +	return sprintf(buf, "0x%016llx%016llx\n",
> > +		       (unsigned long long)req1, (unsigned long long)req0);
> > +}
> > +static DEVICE_ATTR_RO(first_malformed_req);
> > +
> > +static ssize_t clear_store(struct device *dev, struct device_attribute *attr,
> > +			   const char *buff, size_t count)
> > +{
> > +	struct dfl_feature_platform_data *pdata = dev_get_platdata(dev);
> > +	u64 value;
> > +	int ret;
> > +
> > +	if (kstrtou64(buff, 0, &value))
> > +		return -EINVAL;
> > +
> > +	mutex_lock(&pdata->lock);
> > +	ret = __port_err_clear(dev, value);
> > +	mutex_unlock(&pdata->lock);
> > +
> > +	return ret ? ret : count;
> > +}
> > +static DEVICE_ATTR_WO(clear);
> > +
> > +static struct attribute *port_err_attrs[] = {
> > +	&dev_attr_revision.attr,
> > +	&dev_attr_errors.attr,
> > +	&dev_attr_first_error.attr,
> > +	&dev_attr_first_malformed_req.attr,
> > +	&dev_attr_clear.attr,
> > +	NULL,
> > +};
> > +
> > +static struct attribute_group port_err_attr_group = {
> > +	.attrs = port_err_attrs,
> > +	.name = "errors",
> > +};
> > +
> > +static int port_err_init(struct platform_device *pdev,
> > +			 struct dfl_feature *feature)
> > +{
> > +	struct dfl_feature_platform_data *pdata = dev_get_platdata(&pdev->dev);
> > +
> > +	mutex_lock(&pdata->lock);
> > +	__port_err_mask(&pdev->dev, false);
> > +	mutex_unlock(&pdata->lock);
> 
> Locking one data structure and then modifying another one is up there
> with "things never to do in the kernel unless you want a huge
> headache!".

Actually, we use the same lock for protection here as in other places,
but the code may be misleading. Let me improve this part in the next
version.

> 
> > +
> > +	return device_add_group(&pdev->dev, &port_err_attr_group);
> 
> You raced userspace and lost :(

Do you mind giving some more hints on this one? I guess I didn't fully
understand it. :( Should I add handling for the case where
device_add_group() fails here, or is there something else I should fix?

Thanks
Hao

> 
> thanks,
> 
> greg k-h

^ permalink raw reply
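The clearing sequence described in the __port_err_clear() comment above can be sketched as a small userspace model. This is only an illustration, not driver code: struct port_model, w1c() and port_model_err_clear() are hypothetical names, the registers are plain variables rather than MMIO, and write-1-to-clear behaviour is an assumption based on the way the patch writes back the value it has just read.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory stand-in for the port registers (not the MMIO layout). */
struct port_model {
	uint64_t error_mask;   /* all ones: errors masked, zero: errors enabled */
	uint64_t error;        /* currently latched errors */
	uint64_t first_error;  /* first error captured by hardware */
	bool in_ap6;           /* low power state in which clearing is refused */
	bool in_reset;         /* port is held in reset while errors are cleared */
};

/* Assumed write-1-to-clear semantics: every set bit in val clears that bit. */
static void w1c(uint64_t *reg, uint64_t val)
{
	*reg &= ~val;
}

/* Same order of steps as the comment block in the patch. */
static int port_model_err_clear(struct port_model *p, uint64_t err)
{
	int ret = 0;

	assert(p != NULL);

	/* Nothing can be cleared while the device is in AP6. */
	if (p->in_ap6)
		return -EBUSY;

	p->in_reset = true;            /* halt the port */
	p->error_mask = UINT64_MAX;    /* mask all errors */

	/* Clear only if the caller passed exactly the latched error value. */
	if (p->error == err) {
		w1c(&p->error, p->error);
		w1c(&p->first_error, p->first_error);
	} else {
		ret = -EINVAL;
	}

	p->error_mask = 0;             /* unmask, errors are captured again */
	p->in_reset = false;           /* bring the port out of reset */

	return ret;
}
```

In this model a mismatching value returns -EINVAL and leaves the latched error untouched, while a matching value clears both registers. In either case the mask is dropped and the port leaves reset again, which matches the order of steps in the patch.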

* Re: [PATCH v4 06/12] fpga: dfl: afu: export __port_enable/disable function.
From: Wu Hao @ 2019-08-07  2:21 UTC
  To: Greg KH
  Cc: mdf, linux-fpga, linux-kernel, linux-api, linux-doc, atull,
	Xu Yilun
In-Reply-To: <20190805155240.GB8107@kroah.com>

On Mon, Aug 05, 2019 at 05:52:40PM +0200, Greg KH wrote:
> On Sun, Aug 04, 2019 at 06:20:16PM +0800, Wu Hao wrote:
> > These two functions are used by other private features, e.g. the
> > error reporting private feature needs to check port status and
> > reset the port in order to clear errors.
> > 
> > Signed-off-by: Xu Yilun <yilun.xu@intel.com>
> > Signed-off-by: Wu Hao <hao.wu@intel.com>
> > Acked-by: Moritz Fischer <mdf@kernel.org>
> > Acked-by: Alan Tull <atull@kernel.org>
> > Signed-off-by: Moritz Fischer <mdf@kernel.org>
> > ---
> > v2: rebased
> > ---
> >  drivers/fpga/dfl-afu-main.c | 25 ++++++++++++++-----------
> >  drivers/fpga/dfl-afu.h      |  3 +++
> >  2 files changed, 17 insertions(+), 11 deletions(-)
> > 
> > diff --git a/drivers/fpga/dfl-afu-main.c b/drivers/fpga/dfl-afu-main.c
> > index e013afb..e312179 100644
> > --- a/drivers/fpga/dfl-afu-main.c
> > +++ b/drivers/fpga/dfl-afu-main.c
> > @@ -22,14 +22,16 @@
> >  #include "dfl-afu.h"
> >  
> >  /**
> > - * port_enable - enable a port
> > + * __port_enable - enable a port
> >   * @pdev: port platform device.
> >   *
> >   * Enable Port by clear the port soft reset bit, which is set by default.
> >   * The AFU is unable to respond to any MMIO access while in reset.
> > - * port_enable function should only be used after port_disable function.
> > + * __port_enable function should only be used after __port_disable function.
> > + *
> > + * The caller needs to hold lock for protection.
> >   */
> > -static void port_enable(struct platform_device *pdev)
> > +void __port_enable(struct platform_device *pdev)
> 
> worst global function name ever.
> 
> Don't pollute the global namespace like this for a single driver.  If you
> REALLY need it, then use a prefix that shows it is your individual
> dfl_special_sauce_platform_device_only type thing.

Oh.. Sure.. Let me fix the naming in the next version.

Thanks
Hao

> 
> thanks,
> 
> greg k-h

^ permalink raw reply

* Re: [PATCH v4 04/12] fpga: dfl: afu: add userclock sysfs interfaces.
From: Wu Hao @ 2019-08-07  2:18 UTC
  To: Greg KH
  Cc: mdf, linux-fpga, linux-kernel, linux-api, linux-doc, atull,
	Ananda Ravuri, Russ Weight, Xu Yilun
In-Reply-To: <20190805155113.GA8107@kroah.com>

On Mon, Aug 05, 2019 at 05:51:13PM +0200, Greg KH wrote:
> On Sun, Aug 04, 2019 at 06:20:14PM +0800, Wu Hao wrote:
> > This patch introduces userclock sysfs interfaces for the AFU; userspace
> > can use these interfaces to set the AFU clock.
> > 
> > Please note that this only works for the port header feature with
> > revision 0. For later revisions, the userclock setting is moved to a
> > separate private feature, so a revision sysfs interface is exposed
> > to userspace applications for this purpose too.
> > 
> > Signed-off-by: Ananda Ravuri <ananda.ravuri@intel.com>
> > Signed-off-by: Russ Weight <russell.h.weight@intel.com>
> > Signed-off-by: Xu Yilun <yilun.xu@intel.com>
> > Signed-off-by: Wu Hao <hao.wu@intel.com>
> > Acked-by: Alan Tull <atull@kernel.org>
> > Signed-off-by: Moritz Fischer <mdf@kernel.org>
> > ---
> > v2: rebased, and switched to use device_add/remove_groups for sysfs
> > v3: update kernel version and date in sysfs doc
> > v4: rebased.
> > ---
> >  Documentation/ABI/testing/sysfs-platform-dfl-port |  35 +++++++
> >  drivers/fpga/dfl-afu-main.c                       | 114 +++++++++++++++++++++-
> >  drivers/fpga/dfl.h                                |   9 ++
> >  3 files changed, 157 insertions(+), 1 deletion(-)
> > 
> > diff --git a/Documentation/ABI/testing/sysfs-platform-dfl-port b/Documentation/ABI/testing/sysfs-platform-dfl-port
> > index 1ab3e6f..5663441 100644
> > --- a/Documentation/ABI/testing/sysfs-platform-dfl-port
> > +++ b/Documentation/ABI/testing/sysfs-platform-dfl-port
> > @@ -46,3 +46,38 @@ Contact:	Wu Hao <hao.wu@intel.com>
> >  Description:	Read-write. Read or set AFU latency tolerance reporting value.
> >  		Set ltr to 1 if the AFU can tolerate latency >= 40us or set it
> >  		to 0 if it is latency sensitive.
> > +
> > +What:		/sys/bus/platform/devices/dfl-port.0/revision
> > +Date:		August 2019
> > +KernelVersion:	5.4
> > +Contact:	Wu Hao <hao.wu@intel.com>
> > +Description:	Read-only. Read this file to get the revision of port header
> > +		feature.
> 
> What does "revision" mean?
> 
> It feels like you are creating a different set of sysfs files depending
> on the revision field.  Which is fine, sysfs is one-value-per-file and
> userspace needs to handle if the file is present or not.  So why not
> just rely on that and not have to mess with 'revision' at all?  What is
> userspace going to do with that information?

Hi Greg

Thanks for the review and comments,

Yes, different revisions of a private feature may have different
hardware features, so the driver will expose different sets of sysfs
entries. The revision here is used to help userspace distinguish them.
I think it makes sense to rely only on whether a sysfs entry exists;
managing revisions in userspace code may be quite difficult. I plan to
remove this entry in the next version.

> 
> > +
> > +What:		/sys/bus/platform/devices/dfl-port.0/userclk_freqcmd
> > +Date:		August 2019
> > +KernelVersion:	5.4
> > +Contact:	Wu Hao <hao.wu@intel.com>
> > +Description:	Write-only. User writes command to this interface to set
> > +		userclock to AFU.
> > +
> > +What:		/sys/bus/platform/devices/dfl-port.0/userclk_freqsts
> > +Date:		August 2019
> > +KernelVersion:	5.4
> > +Contact:	Wu Hao <hao.wu@intel.com>
> > +Description:	Read-only. Read this file to get the status of issued command
> > +		to userclk_freqcmd.
> > +
> > +What:		/sys/bus/platform/devices/dfl-port.0/userclk_freqcntrcmd
> > +Date:		August 2019
> > +KernelVersion:	5.4
> > +Contact:	Wu Hao <hao.wu@intel.com>
> > +Description:	Write-only. User writes command to this interface to set
> > +		userclock counter.
> > +
> > +What:		/sys/bus/platform/devices/dfl-port.0/userclk_freqcntrsts
> > +Date:		August 2019
> > +KernelVersion:	5.4
> > +Contact:	Wu Hao <hao.wu@intel.com>
> > +Description:	Read-only. Read this file to get the status of issued command
> > +		to userclk_freqcntrcmd.
> > diff --git a/drivers/fpga/dfl-afu-main.c b/drivers/fpga/dfl-afu-main.c
> > index 12175bb..407c97d 100644
> > --- a/drivers/fpga/dfl-afu-main.c
> > +++ b/drivers/fpga/dfl-afu-main.c
> > @@ -142,6 +142,17 @@ static int port_get_id(struct platform_device *pdev)
> >  static DEVICE_ATTR_RO(id);
> >  
> >  static ssize_t
> > +revision_show(struct device *dev, struct device_attribute *attr, char *buf)
> > +{
> > +	void __iomem *base;
> > +
> > +	base = dfl_get_feature_ioaddr_by_id(dev, PORT_FEATURE_ID_HEADER);
> > +
> > +	return sprintf(buf, "%x\n", dfl_feature_revision(base));
> > +}
> > +static DEVICE_ATTR_RO(revision);
> > +
> > +static ssize_t
> >  ltr_show(struct device *dev, struct device_attribute *attr, char *buf)
> >  {
> >  	struct dfl_feature_platform_data *pdata = dev_get_platdata(dev);
> > @@ -276,6 +287,7 @@ static int port_get_id(struct platform_device *pdev)
> >  
> >  static struct attribute *port_hdr_attrs[] = {
> >  	&dev_attr_id.attr,
> > +	&dev_attr_revision.attr,
> >  	&dev_attr_ltr.attr,
> >  	&dev_attr_ap1_event.attr,
> >  	&dev_attr_ap2_event.attr,
> > @@ -284,14 +296,113 @@ static int port_get_id(struct platform_device *pdev)
> >  };
> >  ATTRIBUTE_GROUPS(port_hdr);
> >  
> > +static ssize_t
> > +userclk_freqcmd_store(struct device *dev, struct device_attribute *attr,
> > +		      const char *buf, size_t count)
> > +{
> > +	struct dfl_feature_platform_data *pdata = dev_get_platdata(dev);
> > +	u64 userclk_freq_cmd;
> > +	void __iomem *base;
> > +
> > +	if (kstrtou64(buf, 0, &userclk_freq_cmd))
> > +		return -EINVAL;
> > +
> > +	base = dfl_get_feature_ioaddr_by_id(dev, PORT_FEATURE_ID_HEADER);
> > +
> > +	mutex_lock(&pdata->lock);
> > +	writeq(userclk_freq_cmd, base + PORT_HDR_USRCLK_CMD0);
> > +	mutex_unlock(&pdata->lock);
> > +
> > +	return count;
> > +}
> > +static DEVICE_ATTR_WO(userclk_freqcmd);
> > +
> > +static ssize_t
> > +userclk_freqcntrcmd_store(struct device *dev, struct device_attribute *attr,
> > +			  const char *buf, size_t count)
> > +{
> > +	struct dfl_feature_platform_data *pdata = dev_get_platdata(dev);
> > +	u64 userclk_freqcntr_cmd;
> > +	void __iomem *base;
> > +
> > +	if (kstrtou64(buf, 0, &userclk_freqcntr_cmd))
> > +		return -EINVAL;
> > +
> > +	base = dfl_get_feature_ioaddr_by_id(dev, PORT_FEATURE_ID_HEADER);
> > +
> > +	mutex_lock(&pdata->lock);
> > +	writeq(userclk_freqcntr_cmd, base + PORT_HDR_USRCLK_CMD1);
> > +	mutex_unlock(&pdata->lock);
> > +
> > +	return count;
> > +}
> > +static DEVICE_ATTR_WO(userclk_freqcntrcmd);
> > +
> > +static ssize_t
> > +userclk_freqsts_show(struct device *dev, struct device_attribute *attr,
> > +		     char *buf)
> > +{
> > +	u64 userclk_freqsts;
> > +	void __iomem *base;
> > +
> > +	base = dfl_get_feature_ioaddr_by_id(dev, PORT_FEATURE_ID_HEADER);
> > +
> > +	userclk_freqsts = readq(base + PORT_HDR_USRCLK_STS0);
> > +
> > +	return sprintf(buf, "0x%llx\n", (unsigned long long)userclk_freqsts);
> > +}
> > +static DEVICE_ATTR_RO(userclk_freqsts);
> > +
> > +static ssize_t
> > +userclk_freqcntrsts_show(struct device *dev, struct device_attribute *attr,
> > +			 char *buf)
> > +{
> > +	u64 userclk_freqcntrsts;
> > +	void __iomem *base;
> > +
> > +	base = dfl_get_feature_ioaddr_by_id(dev, PORT_FEATURE_ID_HEADER);
> > +
> > +	userclk_freqcntrsts = readq(base + PORT_HDR_USRCLK_STS1);
> > +
> > +	return sprintf(buf, "0x%llx\n",
> > +		       (unsigned long long)userclk_freqcntrsts);
> > +}
> > +static DEVICE_ATTR_RO(userclk_freqcntrsts);
> > +
> > +static struct attribute *port_hdr_userclk_attrs[] = {
> > +	&dev_attr_userclk_freqcmd.attr,
> > +	&dev_attr_userclk_freqcntrcmd.attr,
> > +	&dev_attr_userclk_freqsts.attr,
> > +	&dev_attr_userclk_freqcntrsts.attr,
> > +	NULL,
> > +};
> > +ATTRIBUTE_GROUPS(port_hdr_userclk);
> > +
> >  static int port_hdr_init(struct platform_device *pdev,
> >  			 struct dfl_feature *feature)
> >  {
> > +	int ret;
> > +
> >  	dev_dbg(&pdev->dev, "PORT HDR Init.\n");
> >  
> >  	port_reset(pdev);
> >  
> > -	return device_add_groups(&pdev->dev, port_hdr_groups);
> > +	ret = device_add_groups(&pdev->dev, port_hdr_groups);
> 
> This all needs to be reworked based on the ability for devices to
> properly add groups when they are bound on probe (the core does it for
> you, no need for the driver to do it.)  But until then, you should at
> least consider:
> 
> > +	if (ret)
> > +		return ret;
> > +
> > +	/*
> > +	 * if revision > 0, the userclock will be moved from port hdr register
> > +	 * region to a separated private feature.
> > +	 */
> > +	if (dfl_feature_revision(feature->ioaddr) > 0)
> > +		return 0;
> > +
> > +	ret = device_add_groups(&pdev->dev, port_hdr_userclk_groups);
> > +	if (ret)
> > +		device_remove_groups(&pdev->dev, port_hdr_groups);
> 
> struct attribute_group has is_visible() as a callback to have the core
> show or not show, individual attributes when they are created.  So no
> need for a second group of attributes and you needing to add/remove
> them, just add them all and let the callback handle the "is visible"
> logic.  Makes cleanup _so_ much easier (i.e. you don't have to do it.)

Sure, I will use is_visible() here instead in the next version; it does
make things clearer. Thanks a lot for the comments.

Hao

> 
> thanks,
> 
> greg k-h

^ permalink raw reply
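The is_visible() suggestion in the review above can be illustrated with a small userspace model. It is a sketch under assumptions, not the kernel API: struct model_attr, port_hdr_attr_visible() and port_hdr_visible_count() are hypothetical, and the file modes are guesses based on the Read-only/Write-only wording of the ABI entries. In the kernel, the callback is a member of struct attribute_group and returns a umode_t, where 0 means the attribute is not created.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical userspace stand-in for a sysfs attribute: a name and a mode. */
struct model_attr {
	const char *name;
	unsigned int mode;
};

/* One single group that holds every attribute, userclock ones included. */
static const struct model_attr port_hdr_attrs[] = {
	{ "id",                  0444 },
	{ "ltr",                 0644 },
	{ "userclk_freqcmd",     0200 },
	{ "userclk_freqcntrcmd", 0200 },
	{ "userclk_freqsts",     0444 },
	{ "userclk_freqcntrsts", 0444 },
};

/*
 * Plays the role of is_visible(): return the mode to create the file with,
 * or 0 to hide it. Userclock files exist only for header revision 0.
 */
static unsigned int port_hdr_attr_visible(const struct model_attr *attr,
					  unsigned int revision)
{
	assert(attr != NULL);

	if (revision > 0 && strncmp(attr->name, "userclk_", 8) == 0)
		return 0;

	return attr->mode;
}

/* Count the files that one "add the whole group" pass would create. */
static size_t port_hdr_visible_count(unsigned int revision)
{
	size_t i, n = 0;

	for (i = 0; i < sizeof(port_hdr_attrs) / sizeof(port_hdr_attrs[0]); i++)
		if (port_hdr_attr_visible(&port_hdr_attrs[i], revision))
			n++;

	return n;
}
```

The point of the design is that the group is added and removed as a whole, so the init path has no second group to add and no partial cleanup to do if that second step fails.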

* Re: [PATCH v4 1/5] mm/page_idle: Add per-pid idle page tracking using virtual indexing
From: Andrew Morton @ 2019-08-06 22:19 UTC
  To: Joel Fernandes (Google)
  Cc: linux-kernel, Alexey Dobriyan, Borislav Petkov, Brendan Gregg,
	Catalin Marinas, Christian Hansen, dancol, fmayer, H. Peter Anvin,
	Ingo Molnar, joelaf, Jonathan Corbet, Kees Cook, kernel-team,
	linux-api, linux-doc, linux-fsdevel, linux-mm, Michal Hocko,
	Mike Rapoport, minchan, namhyung, paulmck, Robin Murphy,
	Roman Gushchin, Stephen Rothwell, surenb, Thomas Gleixner, tkjos,
	Vladimir Davydov, Vlastimil Babka, Will Deacon, Brendan Gregg
In-Reply-To: <20190805170451.26009-1-joel@joelfernandes.org>

(cc Brendan's other email address, hoping for review input ;))

On Mon,  5 Aug 2019 13:04:47 -0400 "Joel Fernandes (Google)" <joel@joelfernandes.org> wrote:

> The page_idle tracking feature currently requires looking up the pagemap
> for a process followed by interacting with /sys/kernel/mm/page_idle.
> Looking up the PFN from pagemap on Android devices is not supported for
> unprivileged processes: it requires SYS_ADMIN, and otherwise gives 0 for
> the PFN.
> 
> This patch adds support to directly interact with page_idle tracking at
> the PID level by introducing a /proc/<pid>/page_idle file.  It follows
> the exact same semantics as the global /sys/kernel/mm/page_idle, but now
> looking up PFN through pagemap is not needed since the interface uses
> virtual frame numbers, and at the same time also does not require
> SYS_ADMIN.
> 
> In Android, we are using this for the heap profiler (heapprofd), which
> profiles and pinpoints code paths that allocate memory and leave it
> idle for long periods of time. This method solves the security issue
> with userspace learning the PFN, and while at it is also shown to yield
> better results than the pagemap lookup, the theory being that the window
> where the address space can change is reduced by eliminating the
> intermediate pagemap look up stage. In virtual address indexing, the
> process's mmap_sem is held for the duration of the access.

Quite a lot of changes to the page_idle code.  Has this all been
runtime tested on architectures where
CONFIG_HAVE_ARCH_PTE_SWP_PGIDLE=n?  That could be x86 with a little
Kconfig fiddle-for-testing-purposes.

> 8 files changed, 376 insertions(+), 45 deletions(-)

Quite a lot of new code unconditionally added to major architectures. 
Are we confident that everyone will want this feature?

>
> ...
>
> +static int proc_page_idle_open(struct inode *inode, struct file *file)
> +{
> +	struct mm_struct *mm;
> +
> +	mm = proc_mem_open(inode, PTRACE_MODE_READ);
> +	if (IS_ERR(mm))
> +		return PTR_ERR(mm);
> +	file->private_data = mm;
> +	return 0;
> +}
> +
> +static int proc_page_idle_release(struct inode *inode, struct file *file)
> +{
> +	struct mm_struct *mm = file->private_data;
> +
> +	if (mm)

I suspect the test isn't needed?  proc_page_idle_release() won't be
called if proc_page_idle_open() failed?

> +		mmdrop(mm);
> +	return 0;
> +}
>
> ...
>

^ permalink raw reply

* [PATCH AUTOSEL 5.2 10/59] mm/hmm: always return EBUSY for invalid ranges in hmm_range_{fault,snapshot}
From: Sasha Levin @ 2019-08-06 21:32 UTC
  To: linux-kernel, stable
  Cc: Christoph Hellwig, Ralph Campbell, Jason Gunthorpe,
	Felix Kuehling, Sasha Levin, linux-mm, linux-doc
In-Reply-To: <20190806213319.19203-1-sashal@kernel.org>

From: Christoph Hellwig <hch@lst.de>

[ Upstream commit 2bcbeaefde2f0384d6ad351c151b1a9fe7791a0a ]

We should not have two different error codes for the same
condition. EAGAIN must be reserved for the FAULT_FLAG_ALLOW_RETRY retry
case and signals to the caller that the mmap_sem has been unlocked.

Use EBUSY for the !valid case so that callers can get the locking right.

Link: https://lore.kernel.org/r/20190724065258.16603-2-hch@lst.de
Tested-by: Ralph Campbell <rcampbell@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ralph Campbell <rcampbell@nvidia.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
[jgg: elaborated commit message]
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 Documentation/vm/hmm.rst |  2 +-
 mm/hmm.c                 | 10 ++++------
 2 files changed, 5 insertions(+), 7 deletions(-)

diff --git a/Documentation/vm/hmm.rst b/Documentation/vm/hmm.rst
index 7cdf7282e0229..65b6c1109cc81 100644
--- a/Documentation/vm/hmm.rst
+++ b/Documentation/vm/hmm.rst
@@ -231,7 +231,7 @@ respect in order to keep things properly synchronized. The usage pattern is::
       ret = hmm_range_snapshot(&range);
       if (ret) {
           up_read(&mm->mmap_sem);
-          if (ret == -EAGAIN) {
+          if (ret == -EBUSY) {
             /*
              * No need to check hmm_range_wait_until_valid() return value
              * on retry we will get proper error with hmm_range_snapshot()
diff --git a/mm/hmm.c b/mm/hmm.c
index 4c405dfbd2b3d..27dd9a8816272 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -995,7 +995,7 @@ EXPORT_SYMBOL(hmm_range_unregister);
  * @range: range
  * Returns: -EINVAL if invalid argument, -ENOMEM out of memory, -EPERM invalid
  *          permission (for instance asking for write and range is read only),
- *          -EAGAIN if you need to retry, -EFAULT invalid (ie either no valid
+ *          -EBUSY if you need to retry, -EFAULT invalid (ie either no valid
  *          vma or it is illegal to access that range), number of valid pages
  *          in range->pfns[] (from range start address).
  *
@@ -1019,7 +1019,7 @@ long hmm_range_snapshot(struct hmm_range *range)
 	do {
 		/* If range is no longer valid force retry. */
 		if (!range->valid)
-			return -EAGAIN;
+			return -EBUSY;
 
 		vma = find_vma(hmm->mm, start);
 		if (vma == NULL || (vma->vm_flags & device_vma))
@@ -1117,10 +1117,8 @@ long hmm_range_fault(struct hmm_range *range, bool block)
 
 	do {
 		/* If range is no longer valid force retry. */
-		if (!range->valid) {
-			up_read(&hmm->mm->mmap_sem);
-			return -EAGAIN;
-		}
+		if (!range->valid)
+			return -EBUSY;
 
 		vma = find_vma(hmm->mm, start);
 		if (vma == NULL || (vma->vm_flags & device_vma))
-- 
2.20.1


^ permalink raw reply related
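The locking rule behind this change can be sketched with a userspace model. Everything here is hypothetical (struct model_mm, struct model_range, model_range_snapshot() and model_snapshot_retry()); it only mimics the documented usage pattern, in which the caller takes mmap_sem, gets -EBUSY for a range that is no longer valid, drops the lock itself and retries.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for mm->mmap_sem: counts read-side holders. */
struct model_mm {
	int readers;
};

/* Hypothetical stand-in for struct hmm_range. */
struct model_range {
	int pending_invalidations; /* snapshots that will still find the range invalid */
	long npages;               /* pages reported once the range is valid */
};

/* Never touches the lock: when it returns -EBUSY the caller still holds it. */
static long model_range_snapshot(struct model_range *range)
{
	assert(range != NULL);

	if (range->pending_invalidations > 0) {
		range->pending_invalidations--;
		return -EBUSY;
	}

	return range->npages;
}

/* Caller-side loop: lock, snapshot, unlock, and retry only on -EBUSY. */
static long model_snapshot_retry(struct model_mm *mm, struct model_range *range,
				 int max_attempts, int *attempts)
{
	long ret = -EBUSY;

	for (*attempts = 0; *attempts < max_attempts && ret == -EBUSY; ) {
		(*attempts)++;
		mm->readers++;                     /* down_read(&mm->mmap_sem) */
		ret = model_range_snapshot(range);
		mm->readers--;                     /* up_read(): same path for every result */
	}

	return ret;
}
```

Because the invalid-range case always reports -EBUSY with the lock still held, the caller has a single unlock path. As the commit message says, -EAGAIN stays reserved for the case where mmap_sem has already been unlocked.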

* Re: [PATCH 0/6] hwspinlock: allow sharing of hwspinlocks
From: Suman Anna @ 2019-08-06 21:30 UTC
  To: Bjorn Andersson
  Cc: Fabien DESSENNE, Ohad Ben-Cohen, Rob Herring, Mark Rutland,
	Maxime Coquelin, Alexandre TORGUE, Jonathan Corbet,
	linux-remoteproc@vger.kernel.org, devicetree@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	linux-stm32@st-md-mailman.stormreply.com,
	linux-arm-kernel@lists.infradead.org, linux-doc@vger.kernel.org,
	Benjamin GAIGNARD
In-Reply-To: <20190806182128.GD26807@tuxbook-pro>

On 8/6/19 1:21 PM, Bjorn Andersson wrote:
> On Tue 06 Aug 10:38 PDT 2019, Suman Anna wrote:
> 
>> Hi Fabien,
>>
>> On 8/5/19 12:46 PM, Bjorn Andersson wrote:
>>> On Mon 05 Aug 01:48 PDT 2019, Fabien DESSENNE wrote:
>>>
>>>>
>>>> On 01/08/2019 9:14 PM, Bjorn Andersson wrote:
>>>>> On Wed 13 Mar 08:50 PDT 2019, Fabien Dessenne wrote:
> [..]
>>>> B/ This would introduce some inconsistency between the two 'request' API
>>>> which are hwspin_lock_request() and hwspin_lock_request_specific().
>>>> hwspin_lock_request() looks for an unused lock, so requests for an exclusive
>>>> usage. On the other side, request_specific() would request shared locks.
>>>> Worse, the following sequence can transform an exclusive usage into a shared
>>>>
>>>
>>> There is already an inconsistency in between these; as with above any
>>> system that uses both request() and request_specific() will be suffering
>>> from intermittent failures due to probe ordering.
>>>
>>>> one:
>>>>    -hwspin_lock_request() -> returns Id#0 (exclusive)
>>>>    -hwspin_lock_request() -> returns Id#1 (exclusive)
>>>>    -hwspin_lock_request_specific(0) -> returns Id#0 and makes Id#0 shared
>>>> Honestly I am not sure that this is a real issue, but it's better to have it
>>>> in mind before we take any decision
>>
>> Wouldn't it be actually simpler to just introduce a new specific API
>> variant for this, similar to the reset core for example (it uses a
>> separate exclusive API), without having to modify the bindings at all.
>> It is just a case of your driver using the right API, and the core can
>> be modified to use the additional tag semantics based on the API. It
>> should avoid any confusion with say using a different second cell value
>> for the same lock in two different nodes.
>>
> 
> But this implies that there is an actual need to hold these locks
> exclusively. Given that they are (except for the raw case) all wrapped
> by Linux locking primitives there shouldn't be a problem sharing a lock
> (except possibly for the raw case).

Yes, agreed, the HWLOCK_RAW and HWLOCK_IN_ATOMIC cases are unprotected. I
am still trying to better understand the use case, to see whether the same
lock is being multiplexed for different protection contexts or whether all
of them are protecting the same context.

> 
> I agree that we shouldn't specify this property in DT - if anything it
> should be a variant of the API.
> 
>> If you are sharing a hwlock on the Linux side, surely your driver should
>> be aware that it is a shared lock. The tag can be set during the first
>> request API, and you look through both tags when giving out a handle.
>>
> 
> Why would the driver need to know about it?

Just the semantics if we were to support single user vs multiple users
on Linux-side to even get a handle. Your point is that this may be moot
since we have protection anyway other than the raw cases. But we need to
be able to have the same API work across all cases.

So far, it had mostly been that there would be one user on Linux
competing with other equivalent peer entities on different processors.
It is not common to have multiple users since these protection schemes
are usually needed only at the lowest levels of a stack, so the
exclusive handle stuff had been sufficient.

> 
>> Obviously, the hwspin_lock_request() API usage semantics always had the
>> implied additional need for communicating the lock id to the other peer
>> entity, so a realistic usage is most always the specific API variant. I
>> doubt this API would be of much use for the shared driver usage. This
>> also implies that the client user does not care about specifying a lock
>> in DT.
>>
> 
> Afaict if the lock are shared then there shouldn't be a problem with
> some clients using the request API and others request_specific(). As any
> collisions would simply mean that there are more contention on the lock.
> 
> With the current exclusive model that is not possible and the success of
> the request_specific will depend on probe order.
> 
> But perhaps it should be explicitly prohibited to use both APIs on the
> same hwspinlock instance?

Yeah, they are meant to be complementary usages, though I doubt we will
ever have any realistic users for the generic API if we haven't had a
usage so far. I posted a concept of reserved locks long ago [1] to
keep certain locks away from the generic requestor, but dropped it since
we did not have an actual use case needing it.

regards
Suman

[1] https://lwn.net/Articles/611944/

^ permalink raw reply

* Re: [PATCH 5/6] tty: serial: Add linflexuart driver for S32V234
From: gregkh @ 2019-08-06 18:40 UTC (permalink / raw)
  To: Stefan-gabriel Mirea
  Cc: corbet@lwn.net, robh+dt@kernel.org, mark.rutland@arm.com,
	catalin.marinas@arm.com, will@kernel.org, shawnguo@kernel.org,
	Leo Li, jslaby@suse.com, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, devicetree@vger.kernel.org,
	linux-serial@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, Cosmin Stefan Stoica,
	Larisa Ileana Grigore
In-Reply-To: <HE1PR0402MB28579034C09EB49A76A4F8E7DFD50@HE1PR0402MB2857.eurprd04.prod.outlook.com>

On Tue, Aug 06, 2019 at 05:11:17PM +0000, Stefan-gabriel Mirea wrote:
> On 8/5/2019 6:31 PM, gregkh@linuxfoundation.org wrote:
> > On Fri, Aug 02, 2019 at 07:47:23PM +0000, Stefan-gabriel Mirea wrote:
> >>
> >> +/* Freescale Linflex UART */
> >> +#define PORT_LINFLEXUART     121
> > 
> > Do you really need this modified?
> 
> Hello Greg,
> 
> This macro is meant to be assigned to port->type in the config_port
> method from uart_ops, in order for verify_port to know if the received
> serial_struct structure was really targeted for a LINFlex port. It
> needs to be defined outside, to avoid "collisions" with other drivers.

Yes, I know what it goes to, but does anyone in userspace actually use
it?

> As far as I see, uart_set_info() will actually fail at the
> "baud_base < 9600" check[1], right after calling verify_port(), when
> performing an ioctl() on "/dev/console" with TIOCSSERIAL using a
> serial_struct obtained with TIOCGSERIAL. This happens because this
> reduced version of the LINFlex UART driver will not touch the uartclk
> field of the uart_port (as there is currently no clock support).
> Therefore, the linflex_config/verify_port() functions, along with the
> PORT_LINFLEXUART macro, may be indeed unnecessary at this point (and
> should be added later). Is this what you mean?

No, see below.

> Other than that, I do not see anything wrong with the addition of a
> define in serial_core.h for this purpose (which is also what most of the
> serial drivers do, including amba-pl011.c, mentioned in
> Documentation/driver-api/serial/driver.rst as providing the reference
> implementation), so please be more specific.

I am getting tired of dealing with merge issues with that list, and no
one seems to be able to find where they are really needed for userspace,
especially for new devices.  What happens if you do not use it?

thanks,

greg k-h

^ permalink raw reply

* Re: [PATCH 0/6] hwspinlock: allow sharing of hwspinlocks
From: Bjorn Andersson @ 2019-08-06 18:21 UTC (permalink / raw)
  To: Suman Anna
  Cc: Fabien DESSENNE, Ohad Ben-Cohen, Rob Herring, Mark Rutland,
	Maxime Coquelin, Alexandre TORGUE, Jonathan Corbet,
	linux-remoteproc@vger.kernel.org, devicetree@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	linux-stm32@st-md-mailman.stormreply.com,
	linux-arm-kernel@lists.infradead.org, linux-doc@vger.kernel.org,
	Benjamin GAIGNARD
In-Reply-To: <dcd1aeea-cffe-d5fb-af5a-e52efcc2e046@ti.com>

On Tue 06 Aug 10:38 PDT 2019, Suman Anna wrote:

> Hi Fabien,
> 
> On 8/5/19 12:46 PM, Bjorn Andersson wrote:
> > On Mon 05 Aug 01:48 PDT 2019, Fabien DESSENNE wrote:
> > 
> >>
> >> On 01/08/2019 9:14 PM, Bjorn Andersson wrote:
> >>> On Wed 13 Mar 08:50 PDT 2019, Fabien Dessenne wrote:
[..]
> >> B/ This would introduce some inconsistency between the two 'request' API
> >> which are hwspin_lock_request() and hwspin_lock_request_specific().
> >> hwspin_lock_request() looks for an unused lock, so requests for an exclusive
> >> usage. On the other side, request_specific() would request shared locks.
> >> Worst the following sequence can transform an exclusive usage into a shared
> >>
> > 
> > There is already an inconsistency in between these; as with above any
> > system that uses both request() and request_specific() will be suffering
> > from intermittent failures due to probe ordering.
> > 
> >> one:
> >>    -hwspin_lock_request() -> returns Id#0 (exclusive)
> >>    -hwspin_lock_request() -> returns Id#1 (exclusive)
> >>    -hwspin_lock_request_specific(0) -> returns Id#0 and makes Id#0 shared
> >> Honestly I am not sure that this is a real issue, but it's better to have it
> >> in mind before we take any decision
> 
> Wouldn't it be actually simpler to just introduce a new specific API
> variant for this, similar to the reset core for example (it uses a
> separate exclusive API), without having to modify the bindings at all.
> It is just a case of your driver using the right API, and the core can
> be modified to use the additional tag semantics based on the API. It
> should avoid any confusion with say using a different second cell value
> for the same lock in two different nodes.
> 

But this implies that there is an actual need to hold these locks
exclusively. Given that they are (except for the raw case) all wrapped
by Linux locking primitives there shouldn't be a problem sharing a lock
(except possibly for the raw case).


I agree that we shouldn't specify this property in DT - if anything it
should be a variant of the API.

> If you are sharing a hwlock on the Linux side, surely your driver should
> be aware that it is a shared lock. The tag can be set during the first
> request API, and you look through both tags when giving out a handle.
> 

Why would the driver need to know about it?

> Obviously, the hwspin_lock_request() API usage semantics always had the
> implied additional need for communicating the lock id to the other peer
> entity, so a realistic usage is most always the specific API variant. I
> doubt this API would be of much use for the shared driver usage. This
> also implies that the client user does not care about specifying a lock
> in DT.
> 

AFAICT, if the locks are shared then there shouldn't be a problem with
some clients using the request API and others request_specific(), as any
collisions would simply mean that there is more contention on the lock.

With the current exclusive model that is not possible and the success of
the request_specific will depend on probe order.

But perhaps it should be explicitly prohibited to use both APIs on the
same hwspinlock instance?
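
For illustration only, here is a minimal user-space sketch of the two
models being compared. It is not the hwspinlock core: struct model_pool,
model_request() and model_request_specific() are hypothetical names that
only mimic the request/request_specific semantics described above, and
the shared_model flag is an assumption standing in for "all specific
locks are shared".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_LOCKS 4

/* Hypothetical user-space model; none of these names exist in the kernel. */
struct model_pool {
	bool shared_model;	/* true: a specific lock may be requested again */
	int users[NUM_LOCKS];	/* number of handles given out per lock id */
};

/* Models hwspin_lock_request(): hand out the first lock with no user. */
static int model_request(struct model_pool *p)
{
	for (int id = 0; id < NUM_LOCKS; id++) {
		if (p->users[id] == 0) {
			p->users[id] = 1;
			return id;
		}
	}
	return -1;	/* no unused lock left */
}

/* Models hwspin_lock_request_specific(): ask for one particular id. */
static int model_request_specific(struct model_pool *p, int id)
{
	if (id < 0 || id >= NUM_LOCKS)
		return -1;
	if (p->users[id] > 0 && !p->shared_model)
		return -1;	/* exclusive model: the lock is already taken */
	p->users[id]++;
	return id;
}
```

In the exclusive model, calling model_request() before
model_request_specific(&pool, 0) makes the specific request fail, while
the opposite order succeeds; that is the probe-order dependency. In the
shared model both orders succeed and lock 0 simply ends up with two users.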

Regards,
Bjorn

^ permalink raw reply

* Re: [PATCH] Documentation: fs: Convert xfs-delayed-logging-design.txt to ReSt
From: Jonathan Corbet @ 2019-08-06 17:46 UTC (permalink / raw)
  To: Sheriff Esseson
  Cc: skhan, linux-kernel-mentees, Darrick J. Wong,
	supporter:XFS FILESYSTEM, open list:DOCUMENTATION, open list
In-Reply-To: <20190806090323.GA16095@localhost>

On Tue, 6 Aug 2019 10:03:23 +0100
Sheriff Esseson <sheriffesseson@gmail.com> wrote:

> Convert xfs-delayed-logging-design.txt to ReST and fix broken references.
> The enumerations at "Lifecycle Changes" break because of lines beginning with
> "<", treat as diagrams.

[...]

> @@ -27,14 +30,18 @@ written to disk after change D, we would see in the log the following series
>  of transactions, their contents and the log sequence number (LSN) of the
>  transaction:
>  
> +        ============           =========        ==============
>  	Transaction		Contents	LSN
> +        ============           =========        ==============
>  	   A			   A		   X
>  	   B			  A+B		  X+n
>  	   C			 A+B+C		 X+n+m
>  	   D			A+B+C+D		X+n+m+o
>  	    <object written to disk>
> -	   E			   E		   Y (> X+n+m+o)
> +        ------------------------------------------------------
> +	   E			   E		Y (> X+n+m+o)
>  	   F			  E+F		  Y+p
> +        ============           =========        ==============

So this is really more of a diagram than a table; I'd suggest just using a
literal block like you did elsewhere.

[...]

>  Lifecycle Changes
> +=================
>  
> -The existing log item life cycle is as follows:
> +The existing log item life cycle is as follows::
>  
>  	1. Transaction allocate
>  	2. Transaction reserve

This, instead, is a proper outline.  I guess the literal block is OK, but
it feels like we could do better.

Thanks,

jon

^ permalink raw reply

* Re: [PATCH 0/6] hwspinlock: allow sharing of hwspinlocks
From: Suman Anna @ 2019-08-06 17:38 UTC (permalink / raw)
  To: Bjorn Andersson, Fabien DESSENNE
  Cc: Ohad Ben-Cohen, Rob Herring, Mark Rutland, Maxime Coquelin,
	Alexandre TORGUE, Jonathan Corbet,
	linux-remoteproc@vger.kernel.org, devicetree@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	linux-stm32@st-md-mailman.stormreply.com,
	linux-arm-kernel@lists.infradead.org, linux-doc@vger.kernel.org,
	Benjamin GAIGNARD
In-Reply-To: <20190805174659.GA23928@tuxbook-pro>

Hi Fabien,

On 8/5/19 12:46 PM, Bjorn Andersson wrote:
> On Mon 05 Aug 01:48 PDT 2019, Fabien DESSENNE wrote:
> 
>>
>> On 01/08/2019 9:14 PM, Bjorn Andersson wrote:
>>> On Wed 13 Mar 08:50 PDT 2019, Fabien Dessenne wrote:
>>>
>>>> The current implementation does not allow two different devices to use
>>>> a common hwspinlock. This patch set proposes to have, as an option, some
>>>> hwspinlocks shared between several users.
>>>>
>>>> Below is an example that explain the need for this:
>>>> 	exti: interrupt-controller@5000d000 {
>>>> 		compatible = "st,stm32mp1-exti", "syscon";
>>>> 		interrupt-controller;
>>>> 		#interrupt-cells = <2>;
>>>> 		reg = <0x5000d000 0x400>;
>>>> 		hwlocks = <&hsem 1>;
>>>> 	};
>>>> The two drivers (stm32mp1-exti and syscon) refer to the same hwlock.
>>>> With the current hwspinlock implementation, only the first driver succeeds
>>>> in requesting (hwspin_lock_request_specific) the hwlock. The second request
>>>> fails.

Help me understand the problem that you are trying to solve here. Is
this a case of you having two clients on Linux-side needing to use the
same lock but still requiring the arbitration with software running on
some other remote processor? Are they talking to the same entity on the
remote-side or to different peers?

I see the series is all about getting a handle so that they can use the
API. Is the expected usage that the same entity will lock and unlock
before the other driver can lock it?

>>>>
>>>>
>>>> The proposed approach does not modify the API, but extends the DT 'hwlocks'
>>>> property with a second optional parameter (the first one identifies an
>>>> hwlock) that specifies whether an hwlock is requested for exclusive usage
>>>> (current behavior) or can be shared between several users.
>>>> Examples:
>>>> 	hwlocks = <&hsem 8>;	Ref to hwlock #8 for exclusive usage
>>>> 	hwlocks = <&hsem 8 0>;	Ref to hwlock #8 for exclusive (0) usage
>>>> 	hwlocks = <&hsem 8 1>;	Ref to hwlock #8 for shared (1) usage
>>>>
>>>> As a constraint, the #hwlock-cells value must be 1 or 2.
>>>> In the current implementation, this can have theoretically any value but:
>>>> - all of the existing drivers use the same value : 1.
>>>> - the framework supports only one value : 1 (see implementation of
>>>>    of_hwspin_lock_simple_xlate())
>>>> Hence, it shall not be a problem to restrict this value to 1 or 2 since
>>>> it won't break any driver.
>>>>
>>> Hi Fabien,
>>>
>>> Your series looks good, but it makes me wonder why the hardware locks
>>> should be an exclusive resource.
>>>
>>> How about just making all (specific) locks shared?
>>
>> Hi Bjorn,
>>
>> Making all locks shared is a possible implementation (my first 
>> implementation
>> was going this way) but there are some drawbacks we must be aware of:
>>
>> A/ This theoretically break the legacy behavior (the legacy works with
>> exclusive (UNUSED radix tag) usage). As a consequence, an existing driver
>> that is currently failing to request a lock (already claimed by another
>> user) would now work fine. Not sure that there are such drivers, so this
>> point is probably not a real issue.
>>
> 
> Right, it's possible that a previously misconfigured system now
> successfully probes more than one device that uses a particular
> spinlock. But such system would be suffering from issues related to e.g.
> probe ordering.
> 
> So I think we should ignore this issue.
> 
>> B/ This would introduce some inconsistency between the two 'request' API
>> which are hwspin_lock_request() and hwspin_lock_request_specific().
>> hwspin_lock_request() looks for an unused lock, so requests for an exclusive
>> usage. On the other side, request_specific() would request shared locks.
>> Worst the following sequence can transform an exclusive usage into a shared
>>
> 
> There is already an inconsistency in between these; as with above any
> system that uses both request() and request_specific() will be suffering
> from intermittent failures due to probe ordering.
> 
>> one:
>>    -hwspin_lock_request() -> returns Id#0 (exclusive)
>>    -hwspin_lock_request() -> returns Id#1 (exclusive)
>>    -hwspin_lock_request_specific(0) -> returns Id#0 and makes Id#0 shared
>> Honestly I am not sure that this is a real issue, but it's better to have it
>> in mind before we take any decision

Wouldn't it actually be simpler to just introduce a new specific API
variant for this, similar to the reset core for example (it uses a
separate exclusive API), without having to modify the bindings at all?
It is just a case of your driver using the right API, and the core can
be modified to use the additional tag semantics based on the API. It
should avoid any confusion with say using a different second cell value
for the same lock in two different nodes.

If you are sharing a hwlock on the Linux side, surely your driver should
be aware that it is a shared lock. The tag can be set during the first
request API, and you look through both tags when giving out a handle.

Obviously, the hwspin_lock_request() API usage semantics always had the
implied additional need for communicating the lock id to the other peer
entity, so a realistic usage is almost always the specific API variant. I
doubt this API would be of much use for the shared driver usage. This
also implies that the client user does not care about specifying a lock
in DT.
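
A rough sketch of that idea, written as a user-space model rather than
against the real core: the first requester decides the tag, and later
requests are checked against it. Everything here is hypothetical,
including the names request_specific_shared() and
request_specific_exclusive(), the tag enum and the choice of error
codes; the actual hwspinlock core tracks usage differently.

```c
#include <assert.h>
#include <errno.h>

#define NUM_TAGGED_LOCKS 4

/* Hypothetical per-lock tag, set by whichever request comes first. */
enum lock_tag { TAG_UNUSED = 0, TAG_EXCLUSIVE, TAG_SHARED };

struct tagged_lock {
	enum lock_tag tag;
	int users;	/* number of handles currently given out */
};

/* Common helper: check both tags before giving out a handle. */
static int tagged_request(struct tagged_lock *locks, int id, enum lock_tag want)
{
	if (id < 0 || id >= NUM_TAGGED_LOCKS || want == TAG_UNUSED)
		return -EINVAL;

	if (locks[id].tag == TAG_UNUSED)
		locks[id].tag = want;	/* first requester decides the tag */
	else if (locks[id].tag != want || want == TAG_EXCLUSIVE)
		return -EBUSY;		/* mixed usage, or second exclusive user */

	locks[id].users++;
	return id;
}

/* Assumed API variant: existing exclusive behaviour. */
static int request_specific_exclusive(struct tagged_lock *locks, int id)
{
	return tagged_request(locks, id, TAG_EXCLUSIVE);
}

/* Assumed API variant: several Linux-side users may hold a handle. */
static int request_specific_shared(struct tagged_lock *locks, int id)
{
	return tagged_request(locks, id, TAG_SHARED);
}
```

With this split the driver states its intent through the call it makes,
so no second DT cell is needed, and two nodes cannot disagree about
whether the same lock is shared.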

regards
Suman

> 
> The case where I can see a
> problem with this would be if the two clients somehow would nest their
> locking regions.
> 
> But generally I think this could consider this an improvement, because
> the request_specific() would now be able to acquire its hwlock, with
> some additional contention due to the multiple use.
> 
>> I could not find any driver using the hwspin_lock_request() API, we
>> may decide to remove (or to make deprecated) this API, having
>> everything 'shared without any conditions'.
>>
> 
> It would be nice to have an upstream user of this API.
> 
>>
>> I can see three options:
>> 1- Keep my initial proposition
>> 2- Have hwspin_lock_request_specific() using shared locks and
>>     hwspin_lock_request() using unused (so 'initially' exclusive) locks.
>> 3- Have hwspin_lock_request_specific() using shared locks and
>>     remove/make deprecated hwspin_lock_request().
>>
>> Just let me know what is your preference.
>>
> 
> I think we should start with #2 and would like input from e.g. Suman
> regarding #3.
> 
> Regards,
> Bjorn
> 
>> BR
>>
>> Fabien
>>
>>>
>>> Regards,
>>> Bjorn
>>>
>>>> Fabien Dessenne (6):
>>>>    dt-bindings: hwlock: add support of shared locks
>>>>    hwspinlock: allow sharing of hwspinlocks
>>>>    dt-bindings: hwlock: update STM32 #hwlock-cells value
>>>>    ARM: dts: stm32: Add hwspinlock node for stm32mp157 SoC
>>>>    ARM: dts: stm32: Add hwlock for irqchip on stm32mp157
>>>>    ARM: dts: stm32: hwlocks for GPIO for stm32mp157
>>>>
>>>>   .../devicetree/bindings/hwlock/hwlock.txt          | 27 +++++--
>>>>   .../bindings/hwlock/st,stm32-hwspinlock.txt        |  6 +-
>>>>   Documentation/hwspinlock.txt                       | 10 ++-
>>>>   arch/arm/boot/dts/stm32mp157-pinctrl.dtsi          |  2 +
>>>>   arch/arm/boot/dts/stm32mp157c.dtsi                 | 10 +++
>>>>   drivers/hwspinlock/hwspinlock_core.c               | 82 +++++++++++++++++-----
>>>>   drivers/hwspinlock/hwspinlock_internal.h           |  2 +
>>>>   7 files changed, 108 insertions(+), 31 deletions(-)
>>>>
>>>> -- 
>>>> 2.7.4
>>>>


^ permalink raw reply

* Re: [PATCH v3 0/3] Convert some RCU articles to ReST
From: Jonathan Corbet @ 2019-08-06 17:33 UTC (permalink / raw)
  To: Joel Fernandes (Google)
  Cc: linux-kernel, Josh Triplett, Lai Jiangshan, linux-doc,
	Mathieu Desnoyers, Mauro Carvalho Chehab, Paul E. McKenney, rcu,
	Steven Rostedt
In-Reply-To: <20190730231030.27510-1-joel@joelfernandes.org>

On Tue, 30 Jul 2019 19:10:27 -0400
"Joel Fernandes (Google)" <joel@joelfernandes.org> wrote:

> This patch is a respin of the RCU ReST patch from Mauro [1].
> 
> I updated his changelog, and made some fixes.
> 
> [1] https://www.spinics.net/lists/rcu/msg00750.html
> 
> Joel Fernandes (Google) (2):
> docs: rcu: Correct links referring to titles
> docs: rcu: Increase toctree to 3
> 
> Mauro Carvalho Chehab (1):
> docs: rcu: convert some articles from html to ReST

So what is the plan for this series?  Paul, do you want to take it...?

Thanks,

jon

^ permalink raw reply

* Re: [PATCH] docs/zh_CN: update Chinese howto.rst for latexdocs making
From: Jonathan Corbet @ 2019-08-06 17:25 UTC (permalink / raw)
  To: Alex Shi
  Cc: Mauro Carvalho Chehab, Harry Wei, Federico Vaga, SeongJae Park,
	Tom Levy, linux-doc
In-Reply-To: <20190805031758.64156-1-alex.shi@linux.alibaba.com>

On Mon,  5 Aug 2019 11:17:58 +0800
Alex Shi <alex.shi@linux.alibaba.com> wrote:

> Mauro Carvalho Chehab <mchehab+samsung@kernel.org> found a reference
> error in Chinese howto.rst, which needs updated instructions for
> latexdocs/epubdocs format doc making.
> 
> So I update this part according to latest howto.rst.

This one doesn't apply to docs-next, and I hesitate to try to fix it
myself.  Any chance of a respin?

Thanks,

jon

^ permalink raw reply

* Re: [PATCH] Input: docs: fix spelling mistake "potocol" -> "protocol"
From: Jonathan Corbet @ 2019-08-06 17:24 UTC (permalink / raw)
  To: Colin King
  Cc: Henrik Rydberg, Dmitry Torokhov, linux-input, linux-doc,
	kernel-janitors, linux-kernel
In-Reply-To: <20190805104951.26947-1-colin.king@canonical.com>

On Mon,  5 Aug 2019 11:49:51 +0100
Colin King <colin.king@canonical.com> wrote:

> There is a minor spelling mistake in the documentation, fix it.
> 
> Signed-off-by: Colin Ian King <colin.king@canonical.com>
> ---
>  Documentation/input/multi-touch-protocol.rst | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/Documentation/input/multi-touch-protocol.rst b/Documentation/input/multi-touch-protocol.rst
> index 6be70342e709..307fe22d9668 100644
> --- a/Documentation/input/multi-touch-protocol.rst
> +++ b/Documentation/input/multi-touch-protocol.rst
> @@ -23,7 +23,7 @@ devices capable of tracking identifiable contacts (type B), the protocol
>  describes how to send updates for individual contacts via event slots.
>  
>  .. note::
> -   MT potocol type A is obsolete, all kernel drivers have been
> +   MT protocol type A is obsolete, all kernel drivers have been
>     converted to use type B.

Applied, thanks.

jon

^ permalink raw reply

* Re: [PATCH] kernel-doc: ignore __printf attribute
From: Jonathan Corbet @ 2019-08-06 17:23 UTC (permalink / raw)
  To: Randy Dunlap; +Cc: linux-doc@vger.kernel.org, LKML, Brendan Higgins
In-Reply-To: <77cf8297-7de3-4ad1-d497-4ad941012b75@infradead.org>

On Mon, 5 Aug 2019 09:29:50 -0700
Randy Dunlap <rdunlap@infradead.org> wrote:

> From: Randy Dunlap <rdunlap@infradead.org>
> 
> Ignore __printf() function attributes just as other __attribute__
> strings are ignored.
> 
> Fixes this kernel-doc warning message:
> include/kunit/kunit-stream.h:58: warning: Function parameter or member '2' not described in '__printf'
> 
> Reported-by: kbuild test robot <lkp@intel.com>
> Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
> Cc: Brendan Higgins <brendanhiggins@google.com>
> Tested-by: Brendan Higgins <brendanhiggins@google.com>

Applied, thanks.

jon

^ permalink raw reply

* Re: [PATCH 5/6] tty: serial: Add linflexuart driver for S32V234
From: Stefan-gabriel Mirea @ 2019-08-06 17:11 UTC (permalink / raw)
  To: gregkh@linuxfoundation.org
  Cc: corbet@lwn.net, robh+dt@kernel.org, mark.rutland@arm.com,
	catalin.marinas@arm.com, will@kernel.org, shawnguo@kernel.org,
	Leo Li, jslaby@suse.com, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, devicetree@vger.kernel.org,
	linux-serial@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, Cosmin Stefan Stoica,
	Larisa Ileana Grigore
In-Reply-To: <20190805153114.GA16836@kroah.com>

On 8/5/2019 6:31 PM, gregkh@linuxfoundation.org wrote:
> On Fri, Aug 02, 2019 at 07:47:23PM +0000, Stefan-gabriel Mirea wrote:
>>
>> +/* Freescale Linflex UART */
>> +#define PORT_LINFLEXUART     121
> 
> Do you really need this modified?

Hello Greg,

This macro is meant to be assigned to port->type in the config_port
method from uart_ops, in order for verify_port to know if the received
serial_struct structure was really targeted for a LINFlex port. It
needs to be defined outside, to avoid "collisions" with other drivers.
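
To make the above concrete, here is a small stand-alone sketch of the
pattern, using mock structures instead of the real struct uart_port and
struct serial_struct. The function bodies are an assumption based on
what serial drivers commonly do, not the code from this patch; only the
PORT_LINFLEXUART value and the linflex_config/verify_port names come
from the series.

```c
#include <assert.h>
#include <errno.h>

#define PORT_UNKNOWN		0
#define PORT_LINFLEXUART	121	/* value proposed in this patch */
#define UART_CONFIG_TYPE	(1 << 0)

/* Mock stand-ins holding only the fields this sketch touches. */
struct mock_uart_port {
	unsigned int type;
};

struct mock_serial_struct {
	int type;
};

/* config_port: record which kind of port this driver handles. */
static void linflex_config_port(struct mock_uart_port *port, int flags)
{
	if (flags & UART_CONFIG_TYPE)
		port->type = PORT_LINFLEXUART;
}

/* verify_port: reject settings that were meant for another port type. */
static int linflex_verify_port(struct mock_uart_port *port,
			       struct mock_serial_struct *ser)
{
	(void)port;	/* unused in this reduced sketch */

	if (ser->type != PORT_UNKNOWN && ser->type != PORT_LINFLEXUART)
		return -EINVAL;
	return 0;
}
```

A serial_struct carrying some other driver's port type is rejected with
-EINVAL, which is why the value has to be unique across drivers.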

As far as I see, uart_set_info() will actually fail at the
"baud_base < 9600" check[1], right after calling verify_port(), when
performing an ioctl() on "/dev/console" with TIOCSSERIAL using a
serial_struct obtained with TIOCGSERIAL. This happens because this
reduced version of the LINFlex UART driver will not touch the uartclk
field of the uart_port (as there is currently no clock support).
Therefore, the linflex_config/verify_port() functions, along with the
PORT_LINFLEXUART macro, may be indeed unnecessary at this point (and
should be added later). Is this what you mean?

Other than that, I do not see anything wrong with the addition of a
define in serial_core.h for this purpose (which is also what most of the
serial drivers do, including amba-pl011.c, mentioned in
Documentation/driver-api/serial/driver.rst as providing the reference
implementation), so please be more specific.

Regards,
Stefan

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/tty/serial/serial_core.c?h=v5.3-rc1#n872

^ permalink raw reply

* Re: [PATCH] docs: mtd: Update spi nor reference driver
From: Schrempf Frieder @ 2019-08-06 16:40 UTC (permalink / raw)
  To: John Garry, corbet@lwn.net, mchehab+samsung@kernel.org,
	linux-mtd@lists.infradead.org
  Cc: linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
	marek.vasut@gmail.com, tudor.ambarus@microchip.com,
	broonie@kernel.org, miquel.raynal@bootlin.com, richard@nod.at,
	vigneshr@ti.com
In-Reply-To: <6c4bb892-6cf5-af46-3ace-b333fd47ef14@huawei.com>

Cc: +MTD/SPI-NOR/SPI maintainers

Hi John,

On 06.08.19 18:35, John Garry wrote:
> On 06/08/2019 17:06, John Garry wrote:
>> The reference driver no longer exists since commit 50f1242c6742 ("mtd:
>> fsl-quadspi: Remove the driver as it was replaced by spi-fsl-qspi.c").
>>
>> Update reference to spi-fsl-qspi.c driver.
>>
>> Signed-off-by: John Garry <john.garry@huawei.com>
>>
>> diff --git a/Documentation/driver-api/mtd/spi-nor.rst 
>> b/Documentation/driver-api/mtd/spi-nor.rst
>> index f5333e3bf486..1f0437676762 100644
>> --- a/Documentation/driver-api/mtd/spi-nor.rst
>> +++ b/Documentation/driver-api/mtd/spi-nor.rst
> 
> In fact this document has many references to Freescale QuadSPI - could 
> someone kindly review this complete document for up-to-date accuracy?

The new driver spi-fsl-qspi.c is not a SPI NOR controller driver 
anymore. It is now a SPI controller driver that uses the SPI MEM API, so 
referencing it here is obsolete.

Actually it seems like the whole file is obsolete and needs to be 
removed or replaced by proper documentation of the SPI MEM API.

@Maintainers:
Maybe the docs under Documentation/driver-api/mtd should be officially 
maintained by the MTD subsystem (and added to MAINTAINERS). And if there 
will be some driver API docs for SPI MEM it should probably live in 
Documentation/driver-api/spi instead of Documentation/driver-api/mtd, as 
spi-mem.c itself is in drivers/spi.

Regards,
Frieder

> 
> Thanks,
> John
> 
>> @@ -59,7 +59,7 @@ Part III - How can drivers use the framework?
>>
>>  The main API is spi_nor_scan(). Before you call the hook, a driver 
>> should
>>  initialize the necessary fields for spi_nor{}. Please see
>> -drivers/mtd/spi-nor/spi-nor.c for detail. Please also refer to 
>> fsl-quadspi.c
>> +drivers/mtd/spi-nor/spi-nor.c for detail. Please also refer to 
>> spi-fsl-qspi.c
>>  when you want to write a new driver for a SPI NOR controller.
>>  Another API is spi_nor_restore(), this is used to restore the status 
>> of SPI
>>  flash chip such as addressing mode. Call it whenever detach the 
>> driver from
>>
> 
> 
> 

^ permalink raw reply

* Re: [PATCH] docs: mtd: Update spi nor reference driver
From: John Garry @ 2019-08-06 16:35 UTC (permalink / raw)
  To: corbet, mchehab+samsung, linux-mtd
  Cc: linux-doc, linux-kernel, frieder.schrempf
In-Reply-To: <1565107583-68506-1-git-send-email-john.garry@huawei.com>

On 06/08/2019 17:06, John Garry wrote:
> The reference driver no longer exists since commit 50f1242c6742 ("mtd:
> fsl-quadspi: Remove the driver as it was replaced by spi-fsl-qspi.c").
>
> Update reference to spi-fsl-qspi.c driver.
>
> Signed-off-by: John Garry <john.garry@huawei.com>
>
> diff --git a/Documentation/driver-api/mtd/spi-nor.rst b/Documentation/driver-api/mtd/spi-nor.rst
> index f5333e3bf486..1f0437676762 100644
> --- a/Documentation/driver-api/mtd/spi-nor.rst
> +++ b/Documentation/driver-api/mtd/spi-nor.rst

In fact this document has many references to Freescale QuadSPI - could 
someone kindly review this complete document for up-to-date accuracy?

Thanks,
John

> @@ -59,7 +59,7 @@ Part III - How can drivers use the framework?
>
>  The main API is spi_nor_scan(). Before you call the hook, a driver should
>  initialize the necessary fields for spi_nor{}. Please see
> -drivers/mtd/spi-nor/spi-nor.c for detail. Please also refer to fsl-quadspi.c
> +drivers/mtd/spi-nor/spi-nor.c for detail. Please also refer to spi-fsl-qspi.c
>  when you want to write a new driver for a SPI NOR controller.
>  Another API is spi_nor_restore(), this is used to restore the status of SPI
>  flash chip such as addressing mode. Call it whenever detach the driver from
>



^ permalink raw reply

* [PATCH] docs: mtd: Update spi nor reference driver
From: John Garry @ 2019-08-06 16:06 UTC (permalink / raw)
  To: corbet, mchehab+samsung, linux-mtd
  Cc: linux-doc, linux-kernel, frieder.schrempf, John Garry

The reference driver no longer exists since commit 50f1242c6742 ("mtd:
fsl-quadspi: Remove the driver as it was replaced by spi-fsl-qspi.c").

Update reference to spi-fsl-qspi.c driver.

Signed-off-by: John Garry <john.garry@huawei.com>

diff --git a/Documentation/driver-api/mtd/spi-nor.rst b/Documentation/driver-api/mtd/spi-nor.rst
index f5333e3bf486..1f0437676762 100644
--- a/Documentation/driver-api/mtd/spi-nor.rst
+++ b/Documentation/driver-api/mtd/spi-nor.rst
@@ -59,7 +59,7 @@ Part III - How can drivers use the framework?
 
 The main API is spi_nor_scan(). Before you call the hook, a driver should
 initialize the necessary fields for spi_nor{}. Please see
-drivers/mtd/spi-nor/spi-nor.c for detail. Please also refer to fsl-quadspi.c
+drivers/mtd/spi-nor/spi-nor.c for detail. Please also refer to spi-fsl-qspi.c
 when you want to write a new driver for a SPI NOR controller.
 Another API is spi_nor_restore(), this is used to restore the status of SPI
 flash chip such as addressing mode. Call it whenever detach the driver from
-- 
2.17.1


^ permalink raw reply related

* Re: [PATCH v4 3/5] [RFC] arm64: Add support for idle bit in swap PTE
From: Joel Fernandes @ 2019-08-06 15:20 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Michal Hocko, linux-kernel, Robin Murphy, Alexey Dobriyan,
	Andrew Morton, Borislav Petkov, Brendan Gregg, Catalin Marinas,
	Christian Hansen, dancol, fmayer, H. Peter Anvin, Ingo Molnar,
	Jonathan Corbet, Kees Cook, kernel-team, linux-api, linux-doc,
	linux-fsdevel, linux-mm, Mike Rapoport, namhyung, paulmck,
	Roman Gushchin, Stephen Rothwell, surenb, Thomas Gleixner, tkjos,
	Vladimir Davydov, Vlastimil Babka, Will Deacon
In-Reply-To: <20190806144747.GA72938@google.com>

On Tue, Aug 06, 2019 at 11:47:47PM +0900, Minchan Kim wrote:
> On Tue, Aug 06, 2019 at 01:57:03PM +0200, Michal Hocko wrote:
> > On Tue 06-08-19 07:14:46, Joel Fernandes wrote:
> > > On Tue, Aug 06, 2019 at 12:47:55PM +0200, Michal Hocko wrote:
> > > > On Tue 06-08-19 06:36:27, Joel Fernandes wrote:
> > > > > On Tue, Aug 06, 2019 at 10:42:03AM +0200, Michal Hocko wrote:
> > > > > > On Mon 05-08-19 13:04:49, Joel Fernandes (Google) wrote:
> > > > > > > This bit will be used by idle page tracking code to correctly identify
> > > > > > > if a page that was swapped out was idle before it got swapped out.
> > > > > > > Without this PTE bit, we lose information about if a page is idle or not
> > > > > > > since the page frame gets unmapped.
> > > > > > 
> > > > > > And why do we need that? Why cannot we simply assume all swapped out
> > > > > > pages to be idle? They were certainly idle enough to be reclaimed,
> > > > > > right? Or what does idle actualy mean here?
> > > > > 
> > > > > Yes, but other than swapping, in Android a page can be forced to be swapped
> > > > > out as well using the new hints that Minchan is adding?
> > > > 
> > > > Yes and that is effectivelly making them idle, no?
> > > 
> > > That depends on how you think of it.
> > 
> > I would much prefer to have it documented so that I do not have to guess ;)
> > 
> > > If you are thinking of a monitoring
> > > process like a heap profiler, then from the heap profiler's (that only cares
> > > about the process it is monitoring) perspective it will look extremely odd if
> > > pages that are recently accessed by the process appear to be idle which would
> > > falsely look like those processes are leaking memory. The reality being,
> > > Android forced those pages into swap because of other reasons. I would like
> > > for the swapping mechanism, whether forced swapping or memory reclaim, not to
> > > interfere with the idle detection.
> > 
> > Hmm, but how are you going to handle situation when the page is unmapped
> > and refaulted again (e.g. a normal reclaim of a pagecache)? You are
> > losing that information same was as in the swapout case, no? Or am I
> > missing something?
> 
> If page is unmapped, it's not a idle memory any longer because it's
> free memory. We could detect the pte is not present.

I think Michal is not talking about the page being explicitly unmapped, but
about the case where a file-backed mapped page is unmapped due to memory
pressure? This is similar to the swap situation.

Basically, a file page is marked idle, then it is accessed by userspace. Then
memory pressure drops it from the page cache, so the idle information is lost.
The next time we check page_idle, we miss that it was indeed accessed.

It is not an issue for the heap profiler or anonymous memory per se, but it
is similar to the swap situation.

> If page is refaulted, it's not a idle memory any longer because it's
> accessed again. We could detect it because the newly allocated page
> doesn't have a PG_idle page flag.

In the refault case, yes it should not be a problem.

thanks,

 - Joel


^ permalink raw reply

* Re: [PATCH v4 3/5] [RFC] arm64: Add support for idle bit in swap PTE
From: Minchan Kim @ 2019-08-06 14:47 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Joel Fernandes, linux-kernel, Robin Murphy, Alexey Dobriyan,
	Andrew Morton, Borislav Petkov, Brendan Gregg, Catalin Marinas,
	Christian Hansen, dancol, fmayer, H. Peter Anvin, Ingo Molnar,
	Jonathan Corbet, Kees Cook, kernel-team, linux-api, linux-doc,
	linux-fsdevel, linux-mm, Mike Rapoport, namhyung, paulmck,
	Roman Gushchin, Stephen Rothwell, surenb, Thomas Gleixner, tkjos,
	Vladimir Davydov, Vlastimil Babka, Will Deacon
In-Reply-To: <20190806115703.GY11812@dhcp22.suse.cz>

On Tue, Aug 06, 2019 at 01:57:03PM +0200, Michal Hocko wrote:
> On Tue 06-08-19 07:14:46, Joel Fernandes wrote:
> > On Tue, Aug 06, 2019 at 12:47:55PM +0200, Michal Hocko wrote:
> > > On Tue 06-08-19 06:36:27, Joel Fernandes wrote:
> > > > On Tue, Aug 06, 2019 at 10:42:03AM +0200, Michal Hocko wrote:
> > > > > On Mon 05-08-19 13:04:49, Joel Fernandes (Google) wrote:
> > > > > > This bit will be used by idle page tracking code to correctly identify
> > > > > > if a page that was swapped out was idle before it got swapped out.
> > > > > > Without this PTE bit, we lose information about if a page is idle or not
> > > > > > since the page frame gets unmapped.
> > > > > 
> > > > > And why do we need that? Why cannot we simply assume all swapped out
> > > > > pages to be idle? They were certainly idle enough to be reclaimed,
> > > > > right? Or what does idle actualy mean here?
> > > > 
> > > > Yes, but other than swapping, in Android a page can be forced to be swapped
> > > > out as well using the new hints that Minchan is adding?
> > > 
> > > Yes and that is effectivelly making them idle, no?
> > 
> > That depends on how you think of it.
> 
> I would much prefer to have it documented so that I do not have to guess ;)
> 
> > If you are thinking of a monitoring
> > process like a heap profiler, then from the heap profiler's (that only cares
> > about the process it is monitoring) perspective it will look extremely odd if
> > pages that are recently accessed by the process appear to be idle which would
> > falsely look like those processes are leaking memory. The reality being,
> > Android forced those pages into swap because of other reasons. I would like
> > for the swapping mechanism, whether forced swapping or memory reclaim, not to
> > interfere with the idle detection.
> 
> Hmm, but how are you going to handle situation when the page is unmapped
> and refaulted again (e.g. a normal reclaim of a pagecache)? You are
> losing that information same was as in the swapout case, no? Or am I
> missing something?

If the page is unmapped, it's not idle memory any longer because it's
free memory. We could detect that the pte is not present.

If the page is refaulted, it's not idle memory any longer because it's
accessed again. We could detect it because the newly allocated page
doesn't have the PG_idle page flag.

In both cases, idle page tracking couldn't report them as IDLE, so it's okay.
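
A minimal userspace sketch of the two cases above, for illustration only. It
is not kernel code: the toy_* types and helpers are hypothetical names, and
the only assumption modelled is that a page is reported IDLE when its pte is
present and the page carries an idle flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model; none of these names are kernel APIs. */
struct toy_page {
	bool pg_idle;		/* stands in for the PG_idle page flag */
};

struct toy_pte {
	struct toy_page *page;	/* NULL models "pte not present" */
};

/* Report IDLE only for a present pte whose page has the idle flag. */
static bool toy_reports_idle(const struct toy_pte *pte)
{
	if (!pte->page)
		return false;	/* case 1: unmapped, pte not present */
	return pte->page->pg_idle;
}

/* Unmapping drops the pte; the old page is treated as free memory. */
static void toy_unmap(struct toy_pte *pte)
{
	pte->page = NULL;
}

/* Refault maps a newly allocated page, which starts without the flag. */
static void toy_refault(struct toy_pte *pte, struct toy_page *fresh)
{
	assert(fresh != NULL);
	fresh->pg_idle = false;	/* case 2: new page has no idle flag */
	pte->page = fresh;
}
```

In this model neither the unmapped nor the refaulted page can be reported
as IDLE, which is the property argued for above.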

^ permalink raw reply

* Re: [PATCH v4 3/5] [RFC] arm64: Add support for idle bit in swap PTE
From: Michal Hocko @ 2019-08-06 14:09 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: linux-kernel, Robin Murphy, Alexey Dobriyan, Andrew Morton,
	Borislav Petkov, Brendan Gregg, Catalin Marinas, Christian Hansen,
	dancol, fmayer, H. Peter Anvin, Ingo Molnar, Jonathan Corbet,
	Kees Cook, kernel-team, linux-api, linux-doc, linux-fsdevel,
	linux-mm, Mike Rapoport, minchan, namhyung, paulmck,
	Roman Gushchin, Stephen Rothwell, surenb, Thomas Gleixner, tkjos,
	Vladimir Davydov, Vlastimil Babka, Will Deacon
In-Reply-To: <20190806134321.GA15167@google.com>

On Tue 06-08-19 09:43:21, Joel Fernandes wrote:
> On Tue, Aug 06, 2019 at 01:57:03PM +0200, Michal Hocko wrote:
> > On Tue 06-08-19 07:14:46, Joel Fernandes wrote:
> > > On Tue, Aug 06, 2019 at 12:47:55PM +0200, Michal Hocko wrote:
> > > > On Tue 06-08-19 06:36:27, Joel Fernandes wrote:
> > > > > On Tue, Aug 06, 2019 at 10:42:03AM +0200, Michal Hocko wrote:
> > > > > > On Mon 05-08-19 13:04:49, Joel Fernandes (Google) wrote:
> > > > > > > This bit will be used by idle page tracking code to correctly identify
> > > > > > > if a page that was swapped out was idle before it got swapped out.
> > > > > > > Without this PTE bit, we lose information about if a page is idle or not
> > > > > > > since the page frame gets unmapped.
> > > > > > 
> > > > > > And why do we need that? Why cannot we simply assume all swapped out
> > > > > > pages to be idle? They were certainly idle enough to be reclaimed,
> > > > > > right? Or what does idle actualy mean here?
> > > > > 
> > > > > Yes, but other than swapping, in Android a page can be forced to be swapped
> > > > > out as well using the new hints that Minchan is adding?
> > > > 
> > > > Yes and that is effectivelly making them idle, no?
> > > 
> > > That depends on how you think of it.
> > 
> > I would much prefer to have it documented so that I do not have to guess ;)
> 
> Sure :)
> 
> > > If you are thinking of a monitoring
> > > process like a heap profiler, then from the heap profiler's (that only cares
> > > about the process it is monitoring) perspective it will look extremely odd if
> > > pages that are recently accessed by the process appear to be idle which would
> > > falsely look like those processes are leaking memory. The reality being,
> > > Android forced those pages into swap because of other reasons. I would like
> > > for the swapping mechanism, whether forced swapping or memory reclaim, not to
> > > interfere with the idle detection.
> > 
> > Hmm, but how are you going to handle situation when the page is unmapped
> > and refaulted again (e.g. a normal reclaim of a pagecache)? You are
> > losing that information same was as in the swapout case, no? Or am I
> > missing something?
> 
> Yes you are right, it would have the same issue, thanks for bringing it up.
> Should we rename this bit to PTE_IDLE and do the same thing that we are doing
> for swap?

What if we decide to tear the page table down as well? E.g. because we
can reclaim file backed mappings and free some memory used for page
tables. We do not do that right now but I can see that really large
mappings might push us in that direction. Sure, this is mostly a theoretical
concern but I am wondering whether promising to keep the idle bit over
unmapping is not too much.

I am not sure how to deal with this myself, TBH. In any case the current
semantic - via pfn - will lose the idle bit already so can we mimic it
as well? We only have 1 bit for each address which makes it challenging.
The easiest way would be to declare that the idle bit might disappear on
activating or reclaiming the page. How well that suits different
usecases is a different question. I would be interested in hearing from
other people about this of course.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply

* Re: [PATCH v4 4/5] page_idle: Drain all LRU pagevec before idle tracking
From: Joel Fernandes @ 2019-08-06 13:48 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-kernel, Alexey Dobriyan, Andrew Morton, Borislav Petkov,
	Brendan Gregg, Catalin Marinas, Christian Hansen, dancol, fmayer,
	H. Peter Anvin, Ingo Molnar, Jonathan Corbet, Kees Cook,
	kernel-team, linux-api, linux-doc, linux-fsdevel, linux-mm,
	Mike Rapoport, minchan, namhyung, paulmck, Robin Murphy,
	Roman Gushchin, Stephen Rothwell, surenb, Thomas Gleixner, tkjos,
	Vladimir Davydov, Vlastimil Babka, Will Deacon
In-Reply-To: <20190806114402.GX11812@dhcp22.suse.cz>

On Tue, Aug 06, 2019 at 01:44:02PM +0200, Michal Hocko wrote:
[snip]
> > > > This operation even if expensive is only done once during the access of the
> > > > page_idle file. Did you have a better fix in mind?
> > > 
> > > Can we set the idle bit also for non-lru pages as long as they are
> > > reachable via pte?
> > 
> > Not at the moment with the current page idle tracking code. PageLRU(page)
> > flag is checked in page_idle_get_page().
> 
> yes, I am aware of the current code. I strongly suspect that the PageLRU
> check was there to not mark arbitrary page looked up by pfn with the
> idle bit because that would be unexpected. But I might be easily wrong
> here.

Yes, quite possible.

> > Even if we could set it for non-LRU, the idle bit (page flag) would not be
> > cleared if page is not on LRU because page-reclaim code (page_referenced() I
> > believe) would not clear it.
> 
> Yes, it is either reclaim when checking references as you say but also
> mark_page_accessed. I believe the later might still have the page on the
> pcp LRU add cache. Maybe I am missing something something but it seems
> that there is nothing fundamentally requiring the user mapped page to be
> on the LRU list when seting the idle bit.
> 
> That being said, your big hammer approach will work more reliable but if
> you do not feel like changing the underlying PageLRU assumption then
> document that draining should be removed longterm.

Yes, at the moment I prefer keeping the underlying assumption the same. I am
OK with adding a comment on the drain call saying that it is to be removed
long term.

thanks,

 - Joel


^ permalink raw reply

* Re: [PATCH v4 3/5] [RFC] arm64: Add support for idle bit in swap PTE
From: Joel Fernandes @ 2019-08-06 13:43 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-kernel, Robin Murphy, Alexey Dobriyan, Andrew Morton,
	Borislav Petkov, Brendan Gregg, Catalin Marinas, Christian Hansen,
	dancol, fmayer, H. Peter Anvin, Ingo Molnar, Jonathan Corbet,
	Kees Cook, kernel-team, linux-api, linux-doc, linux-fsdevel,
	linux-mm, Mike Rapoport, minchan, namhyung, paulmck,
	Roman Gushchin, Stephen Rothwell, surenb, Thomas Gleixner, tkjos,
	Vladimir Davydov, Vlastimil Babka, Will Deacon
In-Reply-To: <20190806115703.GY11812@dhcp22.suse.cz>

On Tue, Aug 06, 2019 at 01:57:03PM +0200, Michal Hocko wrote:
> On Tue 06-08-19 07:14:46, Joel Fernandes wrote:
> > On Tue, Aug 06, 2019 at 12:47:55PM +0200, Michal Hocko wrote:
> > > On Tue 06-08-19 06:36:27, Joel Fernandes wrote:
> > > > On Tue, Aug 06, 2019 at 10:42:03AM +0200, Michal Hocko wrote:
> > > > > On Mon 05-08-19 13:04:49, Joel Fernandes (Google) wrote:
> > > > > > This bit will be used by idle page tracking code to correctly identify
> > > > > > if a page that was swapped out was idle before it got swapped out.
> > > > > > Without this PTE bit, we lose information about if a page is idle or not
> > > > > > since the page frame gets unmapped.
> > > > > 
> > > > > And why do we need that? Why cannot we simply assume all swapped out
> > > > > pages to be idle? They were certainly idle enough to be reclaimed,
> > > > > right? Or what does idle actualy mean here?
> > > > 
> > > > Yes, but other than swapping, in Android a page can be forced to be swapped
> > > > out as well using the new hints that Minchan is adding?
> > > 
> > > Yes and that is effectivelly making them idle, no?
> > 
> > That depends on how you think of it.
> 
> I would much prefer to have it documented so that I do not have to guess ;)

Sure :)

> > If you are thinking of a monitoring
> > process like a heap profiler, then from the heap profiler's (that only cares
> > about the process it is monitoring) perspective it will look extremely odd if
> > pages that are recently accessed by the process appear to be idle which would
> > falsely look like those processes are leaking memory. The reality being,
> > Android forced those pages into swap because of other reasons. I would like
> > for the swapping mechanism, whether forced swapping or memory reclaim, not to
> > interfere with the idle detection.
> 
> Hmm, but how are you going to handle situation when the page is unmapped
> and refaulted again (e.g. a normal reclaim of a pagecache)? You are
> losing that information same was as in the swapout case, no? Or am I
> missing something?

Yes you are right, it would have the same issue, thanks for bringing it up.
Should we rename this bit to PTE_IDLE and do the same thing that we are doing
for swap?

i.e. if (page_idle(page)) and page is a file page, then we write state
into the PTE of the page. Later on refault, the PTE bit would automatically
get cleared (just like it does on swap-in). But before refault, the idle
tracking code sees the page as still marked idle. Do you see any issue with that?
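
A minimal userspace sketch of the flow proposed above, for illustration only.
It is not kernel code: the sketch_* names are hypothetical, and the model
assumes a single saved bit in a non-present pte, set at unmap time from the
page's idle state and cleared on refault.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a pte; not the arm64 or generic kernel layout. */
struct sketch_pte {
	bool present;		/* page is currently mapped */
	bool idle_saved;	/* idle state stashed while not present */
};

/* On unmap of a file page, remember whether the page was idle. */
static void sketch_unmap_file_page(struct sketch_pte *pte, bool page_was_idle)
{
	assert(pte != NULL);
	pte->present = false;
	pte->idle_saved = page_was_idle;
}

/* On refault the saved bit is cleared, as on swap-in in the proposal. */
static void sketch_refault(struct sketch_pte *pte)
{
	assert(pte != NULL);
	pte->present = true;
	pte->idle_saved = false;
}

/*
 * Idle query: while the pte is not present, use the saved bit;
 * otherwise fall back to the page's own idle flag.
 */
static bool sketch_query_idle(const struct sketch_pte *pte, bool page_flag_idle)
{
	if (!pte->present)
		return pte->idle_saved;
	return page_flag_idle;
}
```

Before refault, a query on the unmapped entry still sees the page as idle;
after refault, the answer depends only on the new page's flag.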


> > This is just an effort to make the idle tracking a little bit better. We
> > would like to not lose the 'accessed' information of the pages.
> > 
> > Initially, I had proposed what you are suggesting as well however the above
> > reasons made me to do it like this. Also Minchan and Konstantin suggested
> > this, so there are more people interested in the swap idle bit. Minchan, can
> > you provide more thoughts here? (He is on 2-week vacation from today so
> > hopefully replies before he vanishes ;-)).
> 
> We can move on with the rest of the series in the mean time but I would
> like to see a proper justification for the swap entries and why they
> should be handled special.

Ok, I will improve the changelog.


> > Also assuming all swap pages as idle has other "semantic" issues. It is quite
> > odd if a swapped page is automatically marked as idle without userspace
> > telling it to. Consider the following set of events: 1. Userspace marks only
> > a certain memory region as idle. 2. Userspace reads back the bits
> > corresponding to a bigger region. Part of this bigger region is swapped.
> > Userspace expects all of the pages it did not mark, to have idle bit set to
> > '0' because it never marked them as idle. However if it is now surprised by
> > what it read back (not all '0' read back). Since a page is swapped, it will
> > be now marked "automatically" as idle as per your proposal, even if userspace
> > never marked it explicity before. This would be quite confusing/ambiguous.
> 
> OK, I see. I guess the primary question I have is how do you distinguish
> Idle page which got unmapped and faulted in again from swapped out page
> and refaulted - including the time the pte is not present.

OK, let's discuss more.

thanks Michal!

 - Joel


^ permalink raw reply

* Re: [PATCH v4 3/5] [RFC] arm64: Add support for idle bit in swap PTE
From: Michal Hocko @ 2019-08-06 11:57 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: linux-kernel, Robin Murphy, Alexey Dobriyan, Andrew Morton,
	Borislav Petkov, Brendan Gregg, Catalin Marinas, Christian Hansen,
	dancol, fmayer, H. Peter Anvin, Ingo Molnar, Jonathan Corbet,
	Kees Cook, kernel-team, linux-api, linux-doc, linux-fsdevel,
	linux-mm, Mike Rapoport, minchan, namhyung, paulmck,
	Roman Gushchin, Stephen Rothwell, surenb, Thomas Gleixner, tkjos,
	Vladimir Davydov, Vlastimil Babka, Will Deacon
In-Reply-To: <20190806111446.GA117316@google.com>

On Tue 06-08-19 07:14:46, Joel Fernandes wrote:
> On Tue, Aug 06, 2019 at 12:47:55PM +0200, Michal Hocko wrote:
> > On Tue 06-08-19 06:36:27, Joel Fernandes wrote:
> > > On Tue, Aug 06, 2019 at 10:42:03AM +0200, Michal Hocko wrote:
> > > > On Mon 05-08-19 13:04:49, Joel Fernandes (Google) wrote:
> > > > > This bit will be used by idle page tracking code to correctly identify
> > > > > if a page that was swapped out was idle before it got swapped out.
> > > > > Without this PTE bit, we lose information about if a page is idle or not
> > > > > since the page frame gets unmapped.
> > > > 
> > > > And why do we need that? Why cannot we simply assume all swapped out
> > > > pages to be idle? They were certainly idle enough to be reclaimed,
> > > > right? Or what does idle actualy mean here?
> > > 
> > > Yes, but other than swapping, in Android a page can be forced to be swapped
> > > out as well using the new hints that Minchan is adding?
> > 
> > Yes and that is effectivelly making them idle, no?
> 
> That depends on how you think of it.

I would much prefer to have it documented so that I do not have to guess ;)

> If you are thinking of a monitoring
> process like a heap profiler, then from the heap profiler's (that only cares
> about the process it is monitoring) perspective it will look extremely odd if
> pages that are recently accessed by the process appear to be idle which would
> falsely look like those processes are leaking memory. The reality being,
> Android forced those pages into swap because of other reasons. I would like
> for the swapping mechanism, whether forced swapping or memory reclaim, not to
> interfere with the idle detection.

Hmm, but how are you going to handle the situation when the page is unmapped
and refaulted again (e.g. a normal reclaim of a pagecache)? You are
losing that information the same way as in the swapout case, no? Or am I
missing something?

> This is just an effort to make the idle tracking a little bit better. We
> would like to not lose the 'accessed' information of the pages.
> 
> Initially, I had proposed what you are suggesting as well however the above
> reasons made me to do it like this. Also Minchan and Konstantin suggested
> this, so there are more people interested in the swap idle bit. Minchan, can
> you provide more thoughts here? (He is on 2-week vacation from today so
> hopefully replies before he vanishes ;-)).

We can move on with the rest of the series in the meantime but I would
like to see a proper justification for the swap entries and why they
should be handled specially.

> Also assuming all swap pages as idle has other "semantic" issues. It is quite
> odd if a swapped page is automatically marked as idle without userspace
> telling it to. Consider the following set of events: 1. Userspace marks only
> a certain memory region as idle. 2. Userspace reads back the bits
> corresponding to a bigger region. Part of this bigger region is swapped.
> Userspace expects all of the pages it did not mark, to have idle bit set to
> '0' because it never marked them as idle. However if it is now surprised by
> what it read back (not all '0' read back). Since a page is swapped, it will
> be now marked "automatically" as idle as per your proposal, even if userspace
> never marked it explicity before. This would be quite confusing/ambiguous.

OK, I see. I guess the primary question I have is how do you distinguish
an idle page which got unmapped and faulted in again from a swapped-out page
that was refaulted, including the time the pte is not present.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply
