Linux Documentation
 help / color / mirror / Atom feed
* Re: [PATCH v15 05/13] clk: ingenic: Add driver for the TCU clocks
From: Stephen Boyd @ 2019-08-08 14:48 UTC (permalink / raw)
  To: Arnd Bergmann, Daniel Lezcano, James Hogan, Jason Cooper,
	Jonathan Corbet, Lee Jones, Marc Zyngier, Mark Rutland,
	Michael Turquette, Paul Burton, Paul Cercueil, Ralf Baechle,
	Rob Herring, Thomas Gleixner
  Cc: devicetree, linux-kernel, linux-doc, linux-mips, linux-clk, od,
	Mathieu Malaterre, Paul Cercueil, Artur Rojek
In-Reply-To: <20190724171615.20774-6-paul@crapouillou.net>

Quoting Paul Cercueil (2019-07-24 10:16:07)
> Add driver to support the clocks provided by the Timer/Counter Unit
> (TCU) of the JZ47xx SoCs from Ingenic.
> 
> Signed-off-by: Paul Cercueil <paul@crapouillou.net>
> Tested-by: Mathieu Malaterre <malat@debian.org>
> Tested-by: Artur Rojek <contact@artur-rojek.eu>
> ---

Acked-by: Stephen Boyd <sboyd@kernel.org>


^ permalink raw reply

* Re: [PATCH] mailmap: add entry for Jaegeuk Kim
From: Chao Yu @ 2019-08-08 14:37 UTC (permalink / raw)
  To: Jonathan Corbet, Chao Yu; +Cc: linux-doc, linux-kernel, jaegeuk
In-Reply-To: <fe9cd2bc-76ed-5371-e0c3-b538e7a805e7@kernel.org>

On 2019-8-2 22:23, Chao Yu wrote:
> On 2019-8-2 21:26, Jonathan Corbet wrote:
>> On Fri, 2 Aug 2019 09:21:35 +0800
>> Chao Yu <yuchao0@huawei.com> wrote:
>>
>>> Add entry to connect all Jaegeuk's email addresses.
>>>
>>> Acked-by: Jaegeuk Kim <jaegeuk@kernel.org>
>>> Signed-off-by: Chao Yu <yuchao0@huawei.com>
>>> ---
>>>  .mailmap | 3 +++
>>>  1 file changed, 3 insertions(+)
>>>
>>> diff --git a/.mailmap b/.mailmap
>>> index 477debe3d960..70d41c86e644 100644
>>> --- a/.mailmap
>>> +++ b/.mailmap
>>> @@ -89,6 +89,9 @@ Henrik Kretzschmar <henne@nachtwindheim.de>
>>>  Henrik Rydberg <rydberg@bitmath.org>
>>>  Herbert Xu <herbert@gondor.apana.org.au>
>>>  Jacob Shin <Jacob.Shin@amd.com>
>>> +Jaegeuk Kim <jaegeuk@kernel.org> <jaegeuk@google.com>
>>> +Jaegeuk Kim <jaegeuk@kernel.org> <jaegeuk@motorola.com>
>>> +Jaegeuk Kim <jaegeuk@kernel.org> <jaegeuk.kim@samsung.com>
>>
>> So as I understand it, the mailmap file is there mostly to ensure that a
>> person's changesets are properly collected in 'git shortlog' and such.  As
>> documented on the man page, it is used when a person's name is spelled
>> differently at different times.
>>
>> That doesn't appear to be the case here, and shortlog output is correct
>> already.  Given that, do we *really* need to maintain a collection of old
>> email addresses in the mailmap file?  What is the benefit of that?
> 
> IMO, when we use git-blame to find out who is response for specified code, w/o
> mailmap we may just found old obsolete email address in the related commit; even
> we can search full name for his/her new email address, how can we make sure they
> are the same person... so anyway, it can help to find last valid/canonical email
> address of someone.

Any thoughts?

Thanks,

> 
> Thanks,
> 
>>
>> Thanks,
>>
>> jon
>>

^ permalink raw reply

* [PATCH v4 02/17] drm/ttm: add gem_ttm_bo_device_init()
From: Gerd Hoffmann @ 2019-08-08 13:44 UTC (permalink / raw)
  To: dri-devel
  Cc: tzimmermann, Gerd Hoffmann, David Airlie, Daniel Vetter,
	Maarten Lankhorst, Maxime Ripard, Sean Paul, Jonathan Corbet,
	open list:DOCUMENTATION, open list
In-Reply-To: <20190808134417.10610-1-kraxel@redhat.com>

Now with ttm_buffer_object being a subclass of drm_gem_object we can
easily lookup ttm_buffer_object for a given drm_gem_object, which in
turm allows to create common helper functions.

This patch starts off with a gem_ttm_bo_device_init() helper function
which initializes ttm with the vma offset manager used by gem, to make
sure gem and ttm have the same view on vma offsets.

With that in place gem+ttm drivers don't need their private
drm_driver.dumb_map_offset implementation any more.

v3:
 - complete rewrite

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
---
 include/drm/drm_gem_ttm_helper.h     | 30 +++++++++++++++++++++++
 drivers/gpu/drm/drm_gem_ttm_helper.c | 36 ++++++++++++++++++++++++++++
 Documentation/gpu/drm-mm.rst         | 12 ++++++++++
 drivers/gpu/drm/Kconfig              |  7 ++++++
 drivers/gpu/drm/Makefile             |  3 +++
 5 files changed, 88 insertions(+)
 create mode 100644 include/drm/drm_gem_ttm_helper.h
 create mode 100644 drivers/gpu/drm/drm_gem_ttm_helper.c

diff --git a/include/drm/drm_gem_ttm_helper.h b/include/drm/drm_gem_ttm_helper.h
new file mode 100644
index 000000000000..43c9db3583cc
--- /dev/null
+++ b/include/drm/drm_gem_ttm_helper.h
@@ -0,0 +1,30 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+
+#ifndef DRM_GEM_TTM_HELPER_H
+#define DRM_GEM_TTM_HELPER_H
+
+#include <linux/kernel.h>
+
+#include <drm/drm_gem.h>
+#include <drm/drm_device.h>
+#include <drm/ttm/ttm_bo_api.h>
+#include <drm/ttm/ttm_bo_driver.h>
+
+/**
+ * Returns the container of type &struct ttm_buffer_object
+ * for field base.
+ * @gem:	the GEM object
+ * Returns:	The containing GEM VRAM object
+ */
+static inline struct ttm_buffer_object *drm_gem_ttm_of_gem(
+	struct drm_gem_object *gem)
+{
+	return container_of(gem, struct ttm_buffer_object, base);
+}
+
+int drm_gem_ttm_bo_device_init(struct drm_device *dev,
+			       struct ttm_bo_device *bdev,
+			       struct ttm_bo_driver *driver,
+			       bool need_dma32);
+
+#endif
diff --git a/drivers/gpu/drm/drm_gem_ttm_helper.c b/drivers/gpu/drm/drm_gem_ttm_helper.c
new file mode 100644
index 000000000000..0c57e9fd50b9
--- /dev/null
+++ b/drivers/gpu/drm/drm_gem_ttm_helper.c
@@ -0,0 +1,36 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+
+#include <drm/drm_gem_ttm_helper.h>
+
+/**
+ * DOC: overview
+ *
+ * This library provides helper functions for gem objects backed by
+ * ttm.
+ */
+
+/**
+ * drm_gem_ttm_bo_device_init - ttm init for devices which use gem+ttm
+ *
+ * @dev: A pointer to a struct drm_device.
+ * @bdev: A pointer to a struct ttm_bo_device to initialize.
+ * @driver: A pointer to a struct ttm_bo_driver set up by the caller.
+ * @need_dma32: Whenever the device is limited to 32bit DMA.
+ *
+ * This initializes ttm with dev->vma_offset_manager, so gem and ttm
+ * fuction are working with the same vma_offset_manager.
+ *
+ * Returns:
+ * !0: Failure.
+ */
+int drm_gem_ttm_bo_device_init(struct drm_device *dev,
+			       struct ttm_bo_device *bdev,
+			       struct ttm_bo_driver *driver,
+			       bool need_dma32)
+{
+	return ttm_bo_device_init_with_vma_manager(bdev, driver,
+						   dev->anon_inode->i_mapping,
+						   dev->vma_offset_manager,
+						   need_dma32);
+}
+EXPORT_SYMBOL(drm_gem_ttm_bo_device_init);
diff --git a/Documentation/gpu/drm-mm.rst b/Documentation/gpu/drm-mm.rst
index b664f054c259..a70a1d9f30ec 100644
--- a/Documentation/gpu/drm-mm.rst
+++ b/Documentation/gpu/drm-mm.rst
@@ -412,6 +412,18 @@ VRAM MM Helper Functions Reference
 .. kernel-doc:: drivers/gpu/drm/drm_vram_mm_helper.c
    :export:
 
+GEM TTM Helper Functions Reference
+-----------------------------------
+
+.. kernel-doc:: drivers/gpu/drm/drm_gem_ttm_helper.c
+   :doc: overview
+
+.. kernel-doc:: include/drm/drm_gem_ttm_helper.h
+   :internal:
+
+.. kernel-doc:: drivers/gpu/drm/drm_gem_ttm_helper.c
+   :export:
+
 VMA Offset Manager
 ==================
 
diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig
index e6f40fb54c9a..f7b25519f95c 100644
--- a/drivers/gpu/drm/Kconfig
+++ b/drivers/gpu/drm/Kconfig
@@ -172,6 +172,13 @@ config DRM_VRAM_HELPER
 	help
 	  Helpers for VRAM memory management
 
+config DRM_TTM_HELPER
+	tristate
+	depends on DRM
+	select DRM_TTM
+	help
+	  Helpers for ttm-based gem objects
+
 config DRM_GEM_CMA_HELPER
 	bool
 	depends on DRM
diff --git a/drivers/gpu/drm/Makefile b/drivers/gpu/drm/Makefile
index 10f8329a8b71..545c61d6528b 100644
--- a/drivers/gpu/drm/Makefile
+++ b/drivers/gpu/drm/Makefile
@@ -37,6 +37,9 @@ drm_vram_helper-y := drm_gem_vram_helper.o \
 		     drm_vram_mm_helper.o
 obj-$(CONFIG_DRM_VRAM_HELPER) += drm_vram_helper.o
 
+drm_ttm_helper-y := drm_gem_ttm_helper.o
+obj-$(CONFIG_DRM_TTM_HELPER) += drm_ttm_helper.o
+
 drm_kms_helper-y := drm_crtc_helper.o drm_dp_helper.o drm_dsc.o drm_probe_helper.o \
 		drm_plane_helper.o drm_dp_mst_topology.o drm_atomic_helper.o \
 		drm_kms_helper_common.o drm_dp_dual_mode_helper.o \
-- 
2.18.1


^ permalink raw reply related

* Re: [PATCH 0/6] hwspinlock: allow sharing of hwspinlocks
From: Fabien DESSENNE @ 2019-08-08 12:52 UTC (permalink / raw)
  To: Suman Anna, Bjorn Andersson
  Cc: Ohad Ben-Cohen, Rob Herring, Mark Rutland, Maxime Coquelin,
	Alexandre TORGUE, Jonathan Corbet,
	linux-remoteproc@vger.kernel.org, devicetree@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	linux-stm32@st-md-mailman.stormreply.com,
	linux-arm-kernel@lists.infradead.org, linux-doc@vger.kernel.org,
	Benjamin GAIGNARD
In-Reply-To: <f0893b3f-0124-007a-3ca2-831f60ad9a80@ti.com>


On 07/08/2019 6:19 PM, Suman Anna wrote:
> Hi Fabien,
>
> On 8/7/19 3:39 AM, Fabien DESSENNE wrote:
>> Hi
>>
>> On 06/08/2019 11:30 PM, Suman Anna wrote:
>>> On 8/6/19 1:21 PM, Bjorn Andersson wrote:
>>>> On Tue 06 Aug 10:38 PDT 2019, Suman Anna wrote:
>>>>
>>>>> Hi Fabien,
>>>>>
>>>>> On 8/5/19 12:46 PM, Bjorn Andersson wrote:
>>>>>> On Mon 05 Aug 01:48 PDT 2019, Fabien DESSENNE wrote:
>>>>>>
>>>>>>> On 01/08/2019 9:14 PM, Bjorn Andersson wrote:
>>>>>>>> On Wed 13 Mar 08:50 PDT 2019, Fabien Dessenne wrote:
>>>> [..]
>>>>>>> B/ This would introduce some inconsistency between the two 'request' API
>>>>>>> which are hwspin_lock_request() and hwspin_lock_request_specific().
>>>>>>> hwspin_lock_request() looks for an unused lock, so requests for an exclusive
>>>>>>> usage. On the other side, request_specific() would request shared locks.
>>>>>>> Worst the following sequence can transform an exclusive usage into a shared
>>>>>>>
>>>>>> There is already an inconsistency in between these; as with above any
>>>>>> system that uses both request() and request_specific() will be suffering
>>>>>> from intermittent failures due to probe ordering.
>>>>>>
>>>>>>> one:
>>>>>>>      -hwspin_lock_request() -> returns Id#0 (exclusive)
>>>>>>>      -hwspin_lock_request() -> returns Id#1 (exclusive)
>>>>>>>      -hwspin_lock_request_specific(0) -> returns Id#0 and makes Id#0 shared
>>>>>>> Honestly I am not sure that this is a real issue, but it's better to have it
>>>>>>> in mind before we take ay decision
>>>>> Wouldn't it be actually simpler to just introduce a new specific API
>>>>> variant for this, similar to the reset core for example (it uses a
>>>>> separate exclusive API), without having to modify the bindings at all.
>>>>> It is just a case of your driver using the right API, and the core can
>>>>> be modified to use the additional tag semantics based on the API. It
>>>>> should avoid any confusion with say using a different second cell value
>>>>> for the same lock in two different nodes.
>>>>>
>>>> But this implies that there is an actual need to hold these locks
>>>> exclusively. Given that they are (except for the raw case) all wrapped
>>>> by Linux locking primitives there shouldn't be a problem sharing a lock
>>>> (except possibly for the raw case).
>>> Yes agreed, the HWLOCK_RAW and HWLOCK_IN_ATOMIC cases are unprotected. I
>>> am still trying to understand better the usecase to see if the same lock
>>> is being multiplexed for different protection contexts, or if all of
>>> them are protecting the same context.
>>
>> Here are two different examples that explain the need for changes.
>> In both cases the Linux clients are talking to a single entity on the
>> remote-side.
>>
>> Example 1:
>>       exti: interrupt-controller@5000d000 {
>>           compatible = "st,stm32mp1-exti", "syscon";
>>           interrupt-controller;
>>           #interrupt-cells = <2>;
>>           reg = <0x5000d000 0x400>;
>>           hwlocks = <&hsem 1>;
>>       };
>> The two drivers (stm32mp1-exti and syscon) refer to the same hwlock.
>> With the current hwspinlock implementation, only the first driver succeeds
>> in requesting (hwspin_lock_request_specific) the hwlock. The second request
>> fails.
>> Here, we really need to share the hwlock between the two drivers.
>> Note: hardware spinlock support for regmap was 'recently' introduced in 4.15
>> see https://lore.kernel.org/patchwork/patch/845941/
>>
>>
>>
>> Example 2:
>> Here it is more a question of optimization : we want to save the number of
>> hwlocks used to protect resources, using an unique hwlock to protect all
>> pinctrl resources:
>>           pinctrl: pin-controller@50002000 {
>>               compatible = "st,stm32mp157-pinctrl";
>>               ranges = <0 0x50002000 0xa400>;
>>               hwlocks = <&hsem 0 1>;
>>
>>           pinctrl_z: pin-controller-z@54004000 {
>>               compatible = "st,stm32mp157-z-pinctrl";
>>               ranges = <0 0x54004000 0x400>;
>>               pins-are-numbered;
>>               hwlocks = <&hsem 0 1>;
> Thanks for the examples.
>
>>>> I agree that we shouldn't specify this property in DT - if anything it
>>>> should be a variant of the API.
>>
>> If we decide to add a 'shared' API, then, what about the generic regmap
>> driver?
>>
>> In the context of above example1, this would require to update the
>> regmap driver.
>>
>> But would this be acceptable for any driver using syscon/regmap?
>>
>>
>> I think it is better to keep the existing API (modifying it so it always
>> allows
>>
>> hwlocks sharing, so no need for bindings update) than adding another API.
> For your usecases, you would definitely need the syscon/regmap behavior
> to be shared right. Whether we introduce a 'shared' API or an
> 'exclusive' API and change the current API behavior to shared, it is
> definitely a case-by-case usage scenario for the existing drivers and
> usage right. The main contention point is what to do with the
> unprotected usecases like Bjorn originally pointed out.

OK, I see : the hwspinlock framework does not offer any lock protection 
with the RAW/IN_ATOMIC modes.
This is an issue if several different 'local' drivers try to get a 
shared lock in the same time.
And this is a personal problem since I need to use shared locks in 
...atomic mode.

I have tried to see how it is possible to put a constraint on the 
callers, just like this is documented for the RAW mode which is:
    "Caution: If the mode is HWLOCK_RAW, that means user must protect 
the routine
     of getting hardware lock with mutex or spinlock.."
I do not think that it is acceptable to ask several drivers to share a 
common mutex/spinlock for shared locks.
But I think about another option: the driver implementing the trylock 
ops may offer such protection. This is the case if the driver returns 
"busy" if the lock is already taken, not only by the remote processor, 
but also by the local host.

So what do you think about adding such a documentation note :
"Caution : the HWLOCK_RAW / HWLOCK_IN_ATOMIC modes shall not be used 
with shared locks unless the hwspinlock driver supports local lock 
protection"

Optionally, we may add a "local_lock_protection" flag in the 
hwspinlock_device struct, set by the driver before it calls 
hwspin_lock_register().
This flag can then be checked by hwspinlock core to allow/deny use of 
shared locks in the raw/atomic modes.

Let me know what you think about it.

BR

Fabien

>
> regards
> Suman
>
>>
>>
>>>>> If you are sharing a hwlock on the Linux side, surely your driver should
>>>>> be aware that it is a shared lock. The tag can be set during the first
>>>>> request API, and you look through both tags when giving out a handle.
>>>>>
>>>> Why would the driver need to know about it?
>>> Just the semantics if we were to support single user vs multiple users
>>> on Linux-side to even get a handle. Your point is that this may be moot
>>> since we have protection anyway other than the raw cases. But we need to
>>> be able to have the same API work across all cases.
>>>
>>> So far, it had mostly been that there would be one user on Linux
>>> competing with other equivalent peer entities on different processors.
>>> It is not common to have multiple users since these protection schemes
>>> are usually needed only at the lowest levels of a stack, so the
>>> exclusive handle stuff had been sufficient.
>>>
>>>>> Obviously, the hwspin_lock_request() API usage semantics always had the
>>>>> implied additional need for communicating the lock id to the other peer
>>>>> entity, so a realistic usage is most always the specific API variant. I
>>>>> doubt this API would be of much use for the shared driver usage. This
>>>>> also implies that the client user does not care about specifying a lock
>>>>> in DT.
>>>>>
>>>> Afaict if the lock are shared then there shouldn't be a problem with
>>>> some clients using the request API and others request_specific(). As any
>>>> collisions would simply mean that there are more contention on the lock.
>>>>
>>>> With the current exclusive model that is not possible and the success of
>>>> the request_specific will depend on probe order.
>>>>
>>>> But perhaps it should be explicitly prohibited to use both APIs on the
>>>> same hwspinlock instance?
>>> Yeah, they are meant to be complimentary usage, though I doubt we will
>>> ever have any realistic users for the generic API if we haven't had a
>>> usage so far. I had posted a concept of reserved locks long back [1] to
>>> keep away certain locks from the generic requestor, but dropped it since
>>> we did not have an actual use-case needing it.
>>>
>>> regards
>>> Suman
>>>
>>> [1] https://lwn.net/Articles/611944/

^ permalink raw reply

* Re: [PATCH 5/6] tty: serial: Add linflexuart driver for S32V234
From: Stefan-gabriel Mirea @ 2019-08-08 12:47 UTC (permalink / raw)
  To: Will Deacon
  Cc: corbet@lwn.net, robh+dt@kernel.org, mark.rutland@arm.com,
	gregkh@linuxfoundation.org, catalin.marinas@arm.com,
	shawnguo@kernel.org, Leo Li, jslaby@suse.com,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	devicetree@vger.kernel.org, linux-serial@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, Cosmin Stefan Stoica,
	Larisa Ileana Grigore
In-Reply-To: <20190808080832.nleult5bknmzr3ze@willie-the-truck>

Hello Will,

On 8/8/2019 11:08 AM, Will Deacon wrote:
> On Fri, Aug 02, 2019 at 07:47:23PM +0000, Stefan-gabriel Mirea wrote:
>> +             linflex,<addr>
>> +                     Use early console provided by Freescale LinFlex UART
>> +                     serial driver for NXP S32V234 SoCs. A valid base
>> +                     address must be provided, and the serial port must
>> +                     already be setup and configured.
> 
> Why isn't earlycon= sufficient for this?

"earlycon=" is not actually supported. I will fix this in the next
version by adding a /chosen/stdout-path to the dts. The compatible
string provided to OF_EARLYCON_DECLARE will also be changed from
"fsl,s32v234-linflexuart" to "fsl,s32-linflexuart" to match the one in
the device tree nodes. I missed this after importing a rename from our
codebase.

Should I remove this addition from kernel-parameters.txt after that?

Regards,
Stefan

^ permalink raw reply

* Re: [PATCH v2] powerpc/fadump: sysfs for fadump memory reservation
From: Hari Bathini @ 2019-08-08 11:17 UTC (permalink / raw)
  To: Sourabh Jain, mpe; +Cc: corbet, mahesh, linux-doc, linux-kernel, linuxppc-dev
In-Reply-To: <20190808100833.30367-1-sourabhjain@linux.ibm.com>



On 08/08/19 3:38 PM, Sourabh Jain wrote:
> Add a sys interface to allow querying the memory reserved by
> fadump for saving the crash dump.
> 
> Add an ABI doc entry for new sysfs interface.
>    - /sys/kernel/fadump_mem_reserved
> 
> Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com>
> ---
> v1 -> v2:
>   - Added ABI doc for new sysfs interface.
> ---
> 
>  Documentation/ABI/testing/sysfs-kernel-fadump    |  6 ++++++
>  Documentation/powerpc/firmware-assisted-dump.rst |  5 +++++
>  arch/powerpc/kernel/fadump.c                     | 14 ++++++++++++++
>  3 files changed, 25 insertions(+)
>  create mode 100644 Documentation/ABI/testing/sysfs-kernel-fadump
> 
> diff --git a/Documentation/ABI/testing/sysfs-kernel-fadump b/Documentation/ABI/testing/sysfs-kernel-fadump
> new file mode 100644
> index 000000000000..003e2f025dcb
> --- /dev/null
> +++ b/Documentation/ABI/testing/sysfs-kernel-fadump
> @@ -0,0 +1,6 @@
> +What:		/sys/kernel/fadump_mem_reserved
> +Date:		August 2019
> +Contact:	linuxppc-dev@lists.ozlabs.org
> +Description:	read only
> +		Provide information about the amount of memory
> +		reserved by fadump to saving the crash dump.

s/to saving/to save/

Rest looks good..

Thanks
Hari


^ permalink raw reply

* [PATCH v2] powerpc/fadump: sysfs for fadump memory reservation
From: Sourabh Jain @ 2019-08-08 10:08 UTC (permalink / raw)
  To: mpe
  Cc: linuxppc-dev, mahesh, hbathini, linux-kernel, linux-doc, corbet,
	Sourabh Jain

Add a sys interface to allow querying the memory reserved by
fadump for saving the crash dump.

Add an ABI doc entry for new sysfs interface.
   - /sys/kernel/fadump_mem_reserved

Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com>
---
v1 -> v2:
  - Added ABI doc for new sysfs interface.
---

 Documentation/ABI/testing/sysfs-kernel-fadump    |  6 ++++++
 Documentation/powerpc/firmware-assisted-dump.rst |  5 +++++
 arch/powerpc/kernel/fadump.c                     | 14 ++++++++++++++
 3 files changed, 25 insertions(+)
 create mode 100644 Documentation/ABI/testing/sysfs-kernel-fadump

diff --git a/Documentation/ABI/testing/sysfs-kernel-fadump b/Documentation/ABI/testing/sysfs-kernel-fadump
new file mode 100644
index 000000000000..003e2f025dcb
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-kernel-fadump
@@ -0,0 +1,6 @@
+What:		/sys/kernel/fadump_mem_reserved
+Date:		August 2019
+Contact:	linuxppc-dev@lists.ozlabs.org
+Description:	read only
+		Provide information about the amount of memory
+		reserved by fadump to saving the crash dump.
diff --git a/Documentation/powerpc/firmware-assisted-dump.rst b/Documentation/powerpc/firmware-assisted-dump.rst
index 9ca12830a48e..a5dfb20d4dc3 100644
--- a/Documentation/powerpc/firmware-assisted-dump.rst
+++ b/Documentation/powerpc/firmware-assisted-dump.rst
@@ -222,6 +222,11 @@ Here is the list of files under kernel sysfs:
     be handled and vmcore will not be captured. This interface can be
     easily integrated with kdump service start/stop.
 
+ /sys/kernel/fadump_mem_reserved
+
+   This is used to display the memory reserved by fadump for saving the
+   crash dump.
+
  /sys/kernel/fadump_release_mem
     This file is available only when fadump is active during
     second kernel. This is used to release the reserved memory
diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index 4eab97292cc2..cd373d1d4b82 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -1514,6 +1514,13 @@ static ssize_t fadump_enabled_show(struct kobject *kobj,
 	return sprintf(buf, "%d\n", fw_dump.fadump_enabled);
 }
 
+static ssize_t fadump_mem_reserved_show(struct kobject *kobj,
+					struct kobj_attribute *attr,
+					char *buf)
+{
+	return sprintf(buf, "%ld\n", fw_dump.reserve_dump_area_size);
+}
+
 static ssize_t fadump_register_show(struct kobject *kobj,
 					struct kobj_attribute *attr,
 					char *buf)
@@ -1632,6 +1639,9 @@ static struct kobj_attribute fadump_attr = __ATTR(fadump_enabled,
 static struct kobj_attribute fadump_register_attr = __ATTR(fadump_registered,
 						0644, fadump_register_show,
 						fadump_register_store);
+static struct kobj_attribute fadump_mem_reserved_attr =
+			__ATTR(fadump_mem_reserved, 0444,
+			       fadump_mem_reserved_show, NULL);
 
 DEFINE_SHOW_ATTRIBUTE(fadump_region);
 
@@ -1663,6 +1673,10 @@ static void fadump_init_files(void)
 			printk(KERN_ERR "fadump: unable to create sysfs file"
 				" fadump_release_mem (%d)\n", rc);
 	}
+	rc = sysfs_create_file(kernel_kobj, &fadump_mem_reserved_attr.attr);
+	if (rc)
+		pr_err("unable to create sysfs file fadump_mem_reserved (%d)\n",
+		       rc);
 	return;
 }
 
-- 
2.17.2


^ permalink raw reply related

* Re: [EXTERNAL][PATCH v15 04/13] mfd/syscon: Add device_node_to_regmap()
From: Arnd Bergmann @ 2019-08-08 10:04 UTC (permalink / raw)
  To: Paul Burton
  Cc: Paul Cercueil, Lee Jones, Ralf Baechle, James Hogan,
	Jonathan Corbet, Daniel Lezcano, Thomas Gleixner,
	Michael Turquette, Stephen Boyd, Jason Cooper, Marc Zyngier,
	Rob Herring, Mark Rutland, devicetree@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
	linux-mips@vger.kernel.org, linux-clk@vger.kernel.org, od@zcrc.me,
	Mathieu Malaterre
In-Reply-To: <20190729165536.dd67ws6nr2msx4pk@pburton-laptop>

On Mon, Jul 29, 2019 at 6:55 PM Paul Burton <paul.burton@mips.com> wrote:
>
> Lee, Arnd,
>
> On Wed, Jul 24, 2019 at 01:16:06PM -0400, Paul Cercueil wrote:
> > device_node_to_regmap() is exactly like syscon_node_to_regmap(), but it
> > does not check that the node is compatible with "syscon", and won't
> > attach the first clock it finds to the regmap.
> >
> > The rationale behind this, is that one device node with a standard
> > compatible string "foo,bar" can be covered by multiple drivers sharing a
> > regmap, or by a single driver doing all the job without a regmap, but
> > these are implementation details which shouldn't reflect on the
> > devicetree.
>
> Does this looks like a good path forwards to you? Its use in this case
> is described by Documentation/devicetree/bindings/timer/ingenic,tcu.txt
> in patch 3 of the series.
>
> If you're OK with it an ack would be appreciated so I can take the
> series through mips-next, otherwise I guess we'd need to go back to the
> v14 approach.

Yes, I guess this is ok, sorry for missing the submission earlier.

Acked-by: Arnd Bergmann <arnd@arndb.de>

^ permalink raw reply

* [PATCH v3 3/8] drm/ttm: add gem_ttm_bo_device_init()
From: Gerd Hoffmann @ 2019-08-08  9:36 UTC (permalink / raw)
  To: dri-devel
  Cc: tzimmermann, Gerd Hoffmann, David Airlie, Daniel Vetter,
	Maarten Lankhorst, Maxime Ripard, Sean Paul, Jonathan Corbet,
	open list:DOCUMENTATION, open list
In-Reply-To: <20190808093702.29512-1-kraxel@redhat.com>

Now with ttm_buffer_object being a subclass of drm_gem_object we can
easily lookup ttm_buffer_object for a given drm_gem_object, which in
turm allows to create common helper functions.

This patch starts off with a gem_ttm_bo_device_init() helper function
which initializes ttm with the vma offset manager used by gem, to make
sure gem and ttm have the same view on vma offsets.

With that in place gem+ttm drivers don't need their private
drm_driver.dumb_map_offset implementation any more.

v3:
 - complete rewrite

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
---
 include/drm/drm_gem_ttm_helper.h     | 30 +++++++++++++++++++++++
 drivers/gpu/drm/drm_gem_ttm_helper.c | 36 ++++++++++++++++++++++++++++
 Documentation/gpu/drm-mm.rst         | 12 ++++++++++
 drivers/gpu/drm/Kconfig              |  7 ++++++
 drivers/gpu/drm/Makefile             |  3 +++
 5 files changed, 88 insertions(+)
 create mode 100644 include/drm/drm_gem_ttm_helper.h
 create mode 100644 drivers/gpu/drm/drm_gem_ttm_helper.c

diff --git a/include/drm/drm_gem_ttm_helper.h b/include/drm/drm_gem_ttm_helper.h
new file mode 100644
index 000000000000..43c9db3583cc
--- /dev/null
+++ b/include/drm/drm_gem_ttm_helper.h
@@ -0,0 +1,30 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+
+#ifndef DRM_GEM_TTM_HELPER_H
+#define DRM_GEM_TTM_HELPER_H
+
+#include <linux/kernel.h>
+
+#include <drm/drm_gem.h>
+#include <drm/drm_device.h>
+#include <drm/ttm/ttm_bo_api.h>
+#include <drm/ttm/ttm_bo_driver.h>
+
+/**
+ * Returns the container of type &struct ttm_buffer_object
+ * for field base.
+ * @gem:	the GEM object
+ * Returns:	The containing GEM VRAM object
+ */
+static inline struct ttm_buffer_object *drm_gem_ttm_of_gem(
+	struct drm_gem_object *gem)
+{
+	return container_of(gem, struct ttm_buffer_object, base);
+}
+
+int drm_gem_ttm_bo_device_init(struct drm_device *dev,
+			       struct ttm_bo_device *bdev,
+			       struct ttm_bo_driver *driver,
+			       bool need_dma32);
+
+#endif
diff --git a/drivers/gpu/drm/drm_gem_ttm_helper.c b/drivers/gpu/drm/drm_gem_ttm_helper.c
new file mode 100644
index 000000000000..0c57e9fd50b9
--- /dev/null
+++ b/drivers/gpu/drm/drm_gem_ttm_helper.c
@@ -0,0 +1,36 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+
+#include <drm/drm_gem_ttm_helper.h>
+
+/**
+ * DOC: overview
+ *
+ * This library provides helper functions for gem objects backed by
+ * ttm.
+ */
+
+/**
+ * drm_gem_ttm_bo_device_init - ttm init for devices which use gem+ttm
+ *
+ * @dev: A pointer to a struct drm_device.
+ * @bdev: A pointer to a struct ttm_bo_device to initialize.
+ * @driver: A pointer to a struct ttm_bo_driver set up by the caller.
+ * @need_dma32: Whenever the device is limited to 32bit DMA.
+ *
+ * This initializes ttm with dev->vma_offset_manager, so gem and ttm
+ * fuction are working with the same vma_offset_manager.
+ *
+ * Returns:
+ * !0: Failure.
+ */
+int drm_gem_ttm_bo_device_init(struct drm_device *dev,
+			       struct ttm_bo_device *bdev,
+			       struct ttm_bo_driver *driver,
+			       bool need_dma32)
+{
+	return ttm_bo_device_init_with_vma_manager(bdev, driver,
+						   dev->anon_inode->i_mapping,
+						   dev->vma_offset_manager,
+						   need_dma32);
+}
+EXPORT_SYMBOL(drm_gem_ttm_bo_device_init);
diff --git a/Documentation/gpu/drm-mm.rst b/Documentation/gpu/drm-mm.rst
index b664f054c259..a70a1d9f30ec 100644
--- a/Documentation/gpu/drm-mm.rst
+++ b/Documentation/gpu/drm-mm.rst
@@ -412,6 +412,18 @@ VRAM MM Helper Functions Reference
 .. kernel-doc:: drivers/gpu/drm/drm_vram_mm_helper.c
    :export:
 
+GEM TTM Helper Functions Reference
+-----------------------------------
+
+.. kernel-doc:: drivers/gpu/drm/drm_gem_ttm_helper.c
+   :doc: overview
+
+.. kernel-doc:: include/drm/drm_gem_ttm_helper.h
+   :internal:
+
+.. kernel-doc:: drivers/gpu/drm/drm_gem_ttm_helper.c
+   :export:
+
 VMA Offset Manager
 ==================
 
diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig
index e6f40fb54c9a..f7b25519f95c 100644
--- a/drivers/gpu/drm/Kconfig
+++ b/drivers/gpu/drm/Kconfig
@@ -172,6 +172,13 @@ config DRM_VRAM_HELPER
 	help
 	  Helpers for VRAM memory management
 
+config DRM_TTM_HELPER
+	tristate
+	depends on DRM
+	select DRM_TTM
+	help
+	  Helpers for ttm-based gem objects
+
 config DRM_GEM_CMA_HELPER
 	bool
 	depends on DRM
diff --git a/drivers/gpu/drm/Makefile b/drivers/gpu/drm/Makefile
index 10f8329a8b71..545c61d6528b 100644
--- a/drivers/gpu/drm/Makefile
+++ b/drivers/gpu/drm/Makefile
@@ -37,6 +37,9 @@ drm_vram_helper-y := drm_gem_vram_helper.o \
 		     drm_vram_mm_helper.o
 obj-$(CONFIG_DRM_VRAM_HELPER) += drm_vram_helper.o
 
+drm_ttm_helper-y := drm_gem_ttm_helper.o
+obj-$(CONFIG_DRM_TTM_HELPER) += drm_ttm_helper.o
+
 drm_kms_helper-y := drm_crtc_helper.o drm_dp_helper.o drm_dsc.o drm_probe_helper.o \
 		drm_plane_helper.o drm_dp_mst_topology.o drm_atomic_helper.o \
 		drm_kms_helper_common.o drm_dp_dual_mode_helper.o \
-- 
2.18.1


^ permalink raw reply related

* Re: [PATCH v7 0/2] arm64 tagged address ABI
From: Szabolcs Nagy @ 2019-08-08  9:32 UTC (permalink / raw)
  To: Catalin Marinas, linux-arm-kernel@lists.infradead.org
  Cc: nd, Vincenzo Frascino, Will Deacon, Andrey Konovalov,
	Kevin Brodsky, linux-doc@vger.kernel.org,
	linux-arch@vger.kernel.org, Dave Hansen
In-Reply-To: <20190807155321.9648-1-catalin.marinas@arm.com>

On 07/08/2019 16:53, Catalin Marinas wrote:
> Hi,
> 
> Thanks for the feedback so far. This is an updated series documenting
> the AArch64 Tagged Address ABI as implemented by these patches:
> 
> http://lkml.kernel.org/r/cover.1563904656.git.andreyknvl@google.com
> 
> Version 6 of the documentation series is available here:
> 
> http://lkml.kernel.org/r/20190725135044.24381-1-vincenzo.frascino@arm.com
> 
> Changes in v7:
> 
> - Dropped the MAP_PRIVATE requirements for tagged pointers for both
>   anonymous and file mappings. One reason is that we can't enforce such
>   restriction anyway. The other reason is that a future series
>   implementing support for the hardware MTE will detect
>   incompatibilities of the new PROT_MTE flag with various mmap()
>   options.

OK.

> - As a consequence of the above, I removed Szabolcs ack as I'm not sure
>   he's ok with the change.
> 
> - Clarified the sysctl and prctl() interaction and reordered the
>   descriptions.
> 
> - Reworded the prctl(PR_SET_MM) restrictions.
> 
> - Removed the description of the tag preservation from the first patch
>   as it didn't really make sense (the syscall ABI has always preserved
>   all registers other than x0 on return to user).

preservation is more interesting when a user pointer
is passed to the kernel and later it is passed back
to user space (e.g. set/get_robust_list, or sigaction
where old handler pointer is returned), then the
kernel may want to drop the tag to do something with
the pointer, but user space may want it to be preserved.

in principle segfault si_addr is a similar case when
memory access via tagged pointer faults: currently
the kernel does not preserve the tag.

so i think it's interesting to know when exactly the
kernel preserves the tags, but it may not be easy to
document in a generic way.

> 
> - s/ARM64/AArch64/ for consistency with the tagged-pointers.rst
>   document.
> 
> - Other minor rewordings.
> 
> Vincenzo Frascino (2):
>   arm64: Define Documentation/arm64/tagged-address-abi.rst
>   arm64: Relax Documentation/arm64/tagged-pointers.rst
> 
>  Documentation/arm64/tagged-address-abi.rst | 151 +++++++++++++++++++++
>  Documentation/arm64/tagged-pointers.rst    |  23 +++-
>  2 files changed, 167 insertions(+), 7 deletions(-)
>  create mode 100644 Documentation/arm64/tagged-address-abi.rst
> 


^ permalink raw reply

* Re: [PATCH v7 1/2] arm64: Define Documentation/arm64/tagged-address-abi.rst
From: Szabolcs Nagy @ 2019-08-08  9:25 UTC (permalink / raw)
  To: Dave Hansen, Catalin Marinas,
	linux-arm-kernel@lists.infradead.org
  Cc: nd, Vincenzo Frascino, Will Deacon, Andrey Konovalov,
	Kevin Brodsky, linux-doc@vger.kernel.org,
	linux-arch@vger.kernel.org
In-Reply-To: <826a9ace-feac-c019-843e-07e23c9fd46c@intel.com>

On 07/08/2019 21:38, Dave Hansen wrote:
> On 8/7/19 8:53 AM, Catalin Marinas wrote:
>> +- The syscall behaviour is undefined for non valid tagged pointers.
> 
> Do you really mean "undefined"?  I mean, a bad pointer is a bad pointer.
>  Why should it matter if it's a tagged bad pointer or an untagged bad
> pointer?

bad pointers are invalid, but some non-bad pointers are
also invalid if they are tagged (e.g. tagged pointer to
device memory?) those may be valid to dereference in
userspace but don't work across the syscall abi (device
driver does not handle the tag?).

>> +- mmap() addr parameter.
>> +
>> +- mremap() new_address parameter.
> 
> Is munmap() missing?  Or was there a reason for leaving it out?

the new address in mmap and mremap may not be currently
mapped, other m* functions operate on existing mappings
(munmap, madvise, mprotect, mlock,...)

although by this logic brk (and related PR_SET_MM_*)
should be excluded here too.

>> +- prctl(PR_SET_MM, ``*``, ...) other than arg2 PR_SET_MM_MAP and
>> +  PR_SET_MM_MAP_SIZE.
>> +
>> +- prctl(PR_SET_MM, PR_SET_MM_MAP{,_SIZE}, ...) struct prctl_mm_map fields.
>> +
>> +Any attempt to use non-zero tagged pointers will lead to undefined
>> +behaviour.
> 
> I wonder if you want to generalize this a bit.  I think you're saying
> that parts of the ABI that modify the *layout* of the address space
> never accept tagged pointers.

something like that, but i think this is hard to specify
in a generic way.

^ permalink raw reply

* Re: [PATCH v5 1/6] mm/page_idle: Add per-pid idle page tracking using virtual index
From: Michal Hocko @ 2019-08-08  8:00 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Andrew Morton, linux-kernel, Alexey Dobriyan, Borislav Petkov,
	Brendan Gregg, Catalin Marinas, Christian Hansen, dancol, fmayer,
	H. Peter Anvin, Ingo Molnar, Jonathan Corbet, Kees Cook,
	kernel-team, linux-api, linux-doc, linux-fsdevel, linux-mm,
	Mike Rapoport, minchan, namhyung, paulmck, Robin Murphy,
	Roman Gushchin, Stephen Rothwell, surenb, Thomas Gleixner, tkjos,
	Vladimir Davydov, Vlastimil Babka, Will Deacon
In-Reply-To: <20190807213105.GA14622@google.com>

On Wed 07-08-19 17:31:05, Joel Fernandes wrote:
> On Wed, Aug 07, 2019 at 01:58:40PM -0700, Andrew Morton wrote:
> > On Wed, 7 Aug 2019 16:45:30 -0400 Joel Fernandes <joel@joelfernandes.org> wrote:
> > 
> > > On Wed, Aug 07, 2019 at 01:04:02PM -0700, Andrew Morton wrote:
> > > > On Wed,  7 Aug 2019 13:15:54 -0400 "Joel Fernandes (Google)" <joel@joelfernandes.org> wrote:
> > > > 
> > > > > In Android, we are using this for the heap profiler (heapprofd) which
> > > > > profiles and pin points code paths which allocates and leaves memory
> > > > > idle for long periods of time. This method solves the security issue
> > > > > with userspace learning the PFN, and while at it is also shown to yield
> > > > > better results than the pagemap lookup, the theory being that the window
> > > > > where the address space can change is reduced by eliminating the
> > > > > intermediate pagemap look up stage. In virtual address indexing, the
> > > > > process's mmap_sem is held for the duration of the access.
> > > > 
> > > > So is heapprofd a developer-only thing?  Is heapprofd included in
> > > > end-user android loads?  If not then, again, wouldn't it be better to
> > > > make the feature Kconfigurable so that Android developers can enable it
> > > > during development then disable it for production kernels?
> > > 
> > > Almost all of this code is already configurable with
> > > CONFIG_IDLE_PAGE_TRACKING. If you disable it, then all of this code gets
> > > disabled.
> > > 
> > > Or are you referring to something else that needs to be made configurable?
> > 
> > Yes - the 300+ lines of code which this patchset adds!
> > 
> > The impacted people will be those who use the existing
> > idle-page-tracking feature but who will not use the new feature.  I
> > guess we can assume this set is small...
> 
> Yes, I think this set should be small. The code size increase of page_idle.o
> is from ~1KB to ~2KB. Most of the extra space is consumed by
> page_idle_proc_generic() function which this patch adds. I don't think adding
> another CONFIG option to disable this while keeping existing
> CONFIG_IDLE_PAGE_TRACKING enabled, is worthwhile but I am open to the
> addition of such an option if anyone feels strongly about it. I believe that
> once this patch is merged, most like this new interface being added is what
> will be used more than the old interface (for some of the usecases) so it
> makes sense to keep it alive with CONFIG_IDLE_PAGE_TRACKING.

I would tend to agree with Joel here. The functionality falls into an
existing IDLE_PAGE_TRACKING config option quite nicely. If there really
are users who want to save some space and this is standing in the way
then they can easily add a new config option with some justification so
the savings are clear. Without that an additional config simply adds to
the already existing configurability complexity and balkanization.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply

* Re: [PATCH 5/6] tty: serial: Add linflexuart driver for S32V234
From: Will Deacon @ 2019-08-08  8:08 UTC (permalink / raw)
  To: Stefan-gabriel Mirea
  Cc: corbet@lwn.net, robh+dt@kernel.org, mark.rutland@arm.com,
	gregkh@linuxfoundation.org, catalin.marinas@arm.com,
	shawnguo@kernel.org, Leo Li, jslaby@suse.com,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	devicetree@vger.kernel.org, linux-serial@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, Cosmin Stefan Stoica,
	Larisa Ileana Grigore
In-Reply-To: <20190802194702.30249-6-stefan-gabriel.mirea@nxp.com>

On Fri, Aug 02, 2019 at 07:47:23PM +0000, Stefan-gabriel Mirea wrote:
> Introduce support for LINFlex driver, based on:
> - the version of Freescale LPUART driver after commit b3e3bf2ef2c7 ("Merge
>   4.0-rc7 into tty-next");
> - commit abf1e0a98083 ("tty: serial: fsl_lpuart: lock port on console
>   write").
> In this basic version, the driver can be tested using initramfs and relies
> on the clocks and pin muxing set up by U-Boot.
> 
> Remarks concerning the earlycon support:
> 
> - LinFlexD does not allow character transmissions in the INIT mode (see
>   section 47.4.2.1 in the reference manual[1]). Therefore, a mutual
>   exclusion between the first linflex_setup_watermark/linflex_set_termios
>   executions and linflex_earlycon_putchar was employed and the characters
>   normally sent to earlycon during initialization are kept in a buffer and
>   sent afterwards.
> 
> - Empirically, character transmission is also forbidden within the last 1-2
>   ms before entering the INIT mode, so we use an explicit timeout
>   (PREINIT_DELAY) between linflex_earlycon_putchar and the first call to
>   linflex_setup_watermark.
> 
> - U-Boot currently uses the UART FIFO mode, while this driver makes the
>   transition to the buffer mode. Therefore, the earlycon putchar function
>   matches the U-Boot behavior before initializations and the Linux behavior
>   after.
> 
> [1] https://www.nxp.com/webapp/Download?colCode=S32V234RM
> 
> Signed-off-by: Stoica Cosmin-Stefan <cosmin.stoica@nxp.com>
> Signed-off-by: Adrian.Nitu <adrian.nitu@freescale.com>
> Signed-off-by: Larisa Grigore <Larisa.Grigore@nxp.com>
> Signed-off-by: Ana Nedelcu <B56683@freescale.com>
> Signed-off-by: Mihaela Martinas <Mihaela.Martinas@freescale.com>
> Signed-off-by: Matthew Nunez <matthew.nunez@nxp.com>
> [stefan-gabriel.mirea@nxp.com: Reduced for upstreaming and implemented
>                                earlycon support]
> Signed-off-by: Stefan-Gabriel Mirea <stefan-gabriel.mirea@nxp.com>
> ---
>  .../admin-guide/kernel-parameters.txt         |   6 +
>  drivers/tty/serial/Kconfig                    |  15 +
>  drivers/tty/serial/Makefile                   |   1 +
>  drivers/tty/serial/fsl_linflexuart.c          | 956 ++++++++++++++++++
>  include/uapi/linux/serial_core.h              |   3 +
>  5 files changed, 981 insertions(+)
>  create mode 100644 drivers/tty/serial/fsl_linflexuart.c
> 
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index 46b826fcb5ad..4d545732aadc 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -1090,6 +1090,12 @@
>  			the framebuffer, pass the 'ram' option so that it is
>  			mapped with the correct attributes.
>  
> +		linflex,<addr>
> +			Use early console provided by Freescale LinFlex UART
> +			serial driver for NXP S32V234 SoCs. A valid base
> +			address must be provided, and the serial port must
> +			already be setup and configured.

Why isn't earlycon= sufficient for this?

Will

^ permalink raw reply

* Re: [PATCH v7 3/7] of/platform: Add functional dependency link from DT bindings
From: Frank Rowand @ 2019-08-08  2:06 UTC (permalink / raw)
  To: Saravana Kannan, Rob Herring, Mark Rutland, Greg Kroah-Hartman,
	Rafael J. Wysocki, Jonathan Corbet
  Cc: devicetree, linux-kernel, David Collins, kernel-team, linux-doc
In-Reply-To: <20190724001100.133423-4-saravanak@google.com>

On 7/23/19 5:10 PM, Saravana Kannan wrote:
> Add device-links after the devices are created (but before they are
> probed) by looking at common DT bindings like clocks and
> interconnects.
> 
> Automatically adding device-links for functional dependencies at the
> framework level provides the following benefits:
> 
> - Optimizes device probe order and avoids the useless work of
>   attempting probes of devices that will not probe successfully
>   (because their suppliers aren't present or haven't probed yet).
> 
>   For example, in a commonly available mobile SoC, registering just
>   one consumer device's driver at an initcall level earlier than the
>   supplier device's driver causes 11 failed probe attempts before the
>   consumer device probes successfully. This was with a kernel with all
>   the drivers statically compiled in. This problem gets a lot worse if
>   all the drivers are loaded as modules without direct symbol
>   dependencies.
> 
> - Supplier devices like clock providers, interconnect providers, etc
>   need to keep the resources they provide active and at a particular
>   state(s) during boot up even if their current set of consumers don't
>   request the resource to be active. This is because the rest of the
>   consumers might not have probed yet and turning off the resource
>   before all the consumers have probed could lead to a hang or
>   undesired user experience.
> 
>   Some frameworks (Eg: regulator) handle this today by turning off
>   "unused" resources at late_initcall_sync and hoping all the devices
>   have probed by then. This is not a valid assumption for systems with
>   loadable modules. Other frameworks (Eg: clock) just don't handle
>   this due to the lack of a clear signal for when they can turn off
>   resources. This leads to downstream hacks to handle cases like this
>   that can easily be solved in the upstream kernel.
> 
>   By linking devices before they are probed, we give suppliers a clear
>   count of the number of dependent consumers. Once all of the
>   consumers are active, the suppliers can turn off the unused
>   resources without making assumptions about the number of consumers.
> 
> By default we just add device-links to track "driver presence" (probe
> succeeded) of the supplier device. If any other functionality provided
> by device-links are needed, it is left to the consumer/supplier
> devices to change the link when they probe.
> 
> Signed-off-by: Saravana Kannan <saravanak@google.com>
> ---
>  .../admin-guide/kernel-parameters.txt         |   5 +
>  drivers/of/platform.c                         | 165 ++++++++++++++++++
>  2 files changed, 170 insertions(+)
> 

Documentation/admin-guide/kernel-paramers.rst:

After line 129, add:

	OF	Devicetree is enabled

> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index 46b826fcb5ad..12937349d79d 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -3170,6 +3170,11 @@
>  			This can be set from sysctl after boot.
>  			See Documentation/admin-guide/sysctl/vm.rst for details.
>  
> +	of_devlink	[KNL] Make device links from common DT bindings. Useful
> +			for optimizing probe order and making sure resources
> +			aren't turned off before the consumer devices have
> +			probed.

        of_supplier_depend instead of of_devlink ????

	of_supplier_depend
			[OF, KNL] Make device links from consumer devicetree
			nodes to supplier devicetree nodes.  The
			consumer / supplier relationships are inferred from
			scanning the devicetree.  The driver for a consumer
			device will not be probed until the drivers for all of
			its supplier devices have been successfully probed.


> +
>  	ohci1394_dma=early	[HW] enable debugging via the ohci1394 driver.
>  			See Documentation/debugging-via-ohci1394.txt for more
>  			info.
> diff --git a/drivers/of/platform.c b/drivers/of/platform.c
> index 7801e25e6895..4344419a26fc 100644
> --- a/drivers/of/platform.c
> +++ b/drivers/of/platform.c
> @@ -508,6 +508,170 @@ int of_platform_default_populate(struct device_node *root,
>  }
>  EXPORT_SYMBOL_GPL(of_platform_default_populate);
>  

> +bool of_link_is_valid(struct device_node *con, struct device_node *sup)

Change to less vague:

   bool of_ancestor_of(struct device_node *test_np, struct device_node *np)


> +{
> +	of_node_get(sup);
> +	/*
> +	 * Don't allow linking a device node as a consumer of one of its
> +	 * descendant nodes. By definition, a child node can't be a functional
> +	 * dependency for the parent node.
> +	 */
> +	while (sup) {
> +		if (sup == con) {
> +			of_node_put(sup);
> +			return false;
> +		}
> +		sup = of_get_next_parent(sup);
> +	}
> +	return true;

Change to more generic:

	of_node_get(test_np);
	while (test_np) {
		if (test_np == np) {
			of_node_put(test_np);
			return true;
		}
		test_np = of_get_next_parent(test_np);
	}
	return false;

> +}
> +


/**
 * of_link_to_phandle - Add device link to supplier
 * @dev: consumer device
 * @sup_np: pointer to the supplier device tree node
 *
 * TODO: ...
 *
 * Return:
 * * 0 if link successfully created for supplier or of_devlink is false
 * * an error if unable to create link
 */

Should have dev_debug() or pr_warn() or something on errors in this
function -- the caller does not report any issue

> +static int of_link_to_phandle(struct device *dev, struct device_node *sup_np)
> +{
> +	struct platform_device *sup_dev;
> +	u32 dl_flags = DL_FLAG_AUTOPROBE_CONSUMER;
> +	int ret = 0;
> +
> +	/*

> +	 * Since we are trying to create device links, we need to find
> +	 * the actual device node that owns this supplier phandle.
> +	 * Often times it's the same node, but sometimes it can be one
> +	 * of the parents. So walk up the parent till you find a
> +	 * device.

Change comment to:

	 * Find the device node that contains the supplier phandle.  It may
	 * be @sup_np or it may be an ancestor of @sup_np.

> +	 */

See comment in caller of of_link_to_phandle() - do not hide the final
of_node_put() of sup_np inside of_link_to_phandle(), so need to do
an of_node_get() here.

	of_node_get(sup_np);

> +	while (sup_np && !of_find_property(sup_np, "compatible", NULL))
> +		sup_np = of_get_next_parent(sup_np);
> +	if (!sup_np)

> +		return 0;

This case should never occur(?), it is an error.

                return -ENODEV;

> +
> +	if (!of_link_is_valid(dev->of_node, sup_np)) {
> +		of_node_put(sup_np);
> +		return 0;

Do not use a name that obscures what the function is doing, also
return an actual issue.

	if (of_ancestor_of(sup_np, dev->of_node)) {
		of_node_put(sup_np);
		return -EINVAL;

> +	}
> +	sup_dev = of_find_device_by_node(sup_np);
> +	of_node_put(sup_np);
> +	if (!sup_dev)
> +		return -ENODEV;
> +	if (!device_link_add(dev, &sup_dev->dev, dl_flags))
> +		ret = -ENODEV;
> +	put_device(&sup_dev->dev);
> +	return ret;
> +}
> +

/**
 * parse_prop_cells - Property parsing functions for suppliers
 *
 * @np:            pointer to a device tree node containing a list
 * @prop_name:     Name of property holding a phandle value
 * @phandle_index: For properties holding a table of phandles, this is the
 *                 index into the table
 * @list_name:     property name that contains a list
 * @cells_name:    property name that specifies phandles' arguments count
 *
 * This function is useful to parse lists of phandles and their arguments.
 *
 * Return:
 * * Node pointer with refcount incremented, use of_node_put() on it when done.
 * * NULL if not found.
 */

> +static struct device_node *parse_prop_cells(struct device_node *np,
> +					    const char *prop, int index,
> +					    const char *binding,
> +					    const char *cell)

Make names consistent with of_parse_phandle_with_args():
  Change prop to prop_name
  Change index to phandle_index
  Change binding to list_name
  Change cell to cells_name

> +{
> +	struct of_phandle_args sup_args;
> +

> +	/* Don't need to check property name for every index. */
> +	if (!index && strcmp(prop, binding))
> +		return NULL;

I read the discussion on whether to check property name only once
in version 6.

This check is fragile, depending upon the calling code to be properly
structured.  Do the check for all values of index.  The reduction of
overhead from not checking does not justify the fragileness and the
extra complexity for the code reader to understand why the check can
be bypassed when
index is not zero.

> +
> +	if (of_parse_phandle_with_args(np, binding, cell, index, &sup_args))
> +		return NULL;
> +
> +	return sup_args.np;
> +}
> +
> +static struct device_node *parse_clocks(struct device_node *np,
> +					const char *prop, int index)

Change prop to prop_name
Change index to phandle_index

> +{
> +	return parse_prop_cells(np, prop, index, "clocks", "#clock-cells");
> +}
> +
> +static struct device_node *parse_interconnects(struct device_node *np,
> +					       const char *prop, int index)

Change prop to prop_name
Change index to phandle_index

> +{
> +	return parse_prop_cells(np, prop, index, "interconnects",
> +				"#interconnect-cells");
> +}
> +
> +static int strcmp_suffix(const char *str, const char *suffix)
> +{
> +	unsigned int len, suffix_len;
> +
> +	len = strlen(str);
> +	suffix_len = strlen(suffix);
> +	if (len <= suffix_len)
> +		return -1;
> +	return strcmp(str + len - suffix_len, suffix);
> +}
> +
> +static struct device_node *parse_regulators(struct device_node *np,
> +					    const char *prop, int index)

Change prop to prop_name
Change index to phandle_index

> +{
> +	if (index || strcmp_suffix(prop, "-supply"))
> +		return NULL;
> +
> +	return of_parse_phandle(np, prop, 0);
> +}
> +
> +/**

> + * struct supplier_bindings - Information for parsing supplier DT binding
> + *
> + * @parse_prop:		If the function cannot parse the property, return NULL.
> + *			Otherwise, return the phandle listed in the property
> + *			that corresponds to the index.

There is no documentation of dynamic function parameters in the docbook
description of a struct.  Use this format for now and I will clean up when
I clean up all of the devicetree docbook info.

Change above comment to:

 * struct supplier_bindings - Property parsing functions for suppliers
 *
 * @parse_prop: function name
 *              parse_prop() finds the node corresponding to a supplier phandle
 * @parse_prop.np: Pointer to device node holding supplier phandle property
 * @parse_prop.prop_name: Name of property holding a phandle value
 * @parse_prop.index: For properties holding a table of phandles, this is the
 *                    index into the table
 *
 * Return:
 * * parse_prop() return values are
 * * Node pointer with refcount incremented, use of_node_put() on it when done.
 * * NULL if not found.

> + */
> +struct supplier_bindings {
> +	struct device_node *(*parse_prop)(struct device_node *np,
> +					  const char *name, int index);

Change name to prop_name
Change index to phandle_index

> +};
> +
> +static const struct supplier_bindings bindings[] = {
> +	{ .parse_prop = parse_clocks, },
> +	{ .parse_prop = parse_interconnects, },
> +	{ .parse_prop = parse_regulators, },

> +	{ },

	{},

> +};
> +

/**
 * of_link_property - TODO:
 * dev:
 * con_np:
 * prop:
 *
 * TODO...
 *
 * Any failed attempt to create a link will NOT result in an immediate return.
 * of_link_property() must create all possible links even when one of more
 * attempts to create a link fail.

Why?  isn't one failure enough to prevent probing this device?
Continuing to scan just results in extra work... which will be
repeated every time device_link_check_waiting_consumers() is called

 *
 * Return:
 * * 0 if TODO:
 * * -ENODEV on error
 */


I left some "TODO:" sections to be filled out above.


> +static bool of_link_property(struct device *dev, struct device_node *con_np,
> +			     const char *prop)

Returns 0 or -ENODEV, so bool is incorrect

(Also fixed on 8/8 in patch: "[PATCH 1/2] of/platform: Fix fn definitons for
of_link_is_valid() and of_link_property()")



> +{
> +	struct device_node *phandle;
> +	struct supplier_bindings *s = bindings;
> +	unsigned int i = 0;

> +	bool done = true, matched = false;

Change to:

   	bool matched = false;
	int ret = 0;

	/* do not stop at first failed link, link all available suppliers */

> +
> +	while (!matched && s->parse_prop) {
> +		while ((phandle = s->parse_prop(con_np, prop, i))) {
> +			matched = true;
> +			i++;
> +			if (of_link_to_phandle(dev, phandle))


Remove comment:

> +				/*
> +				 * Don't stop at the first failure. See
> +				 * Documentation for bus_type.add_links for
> +				 * more details.
> +				 */

> +				done = false;

				ret = -ENODEV;

Do not hide of_node_put() inside of_link_to_phandle(), do it here:

			of_node_put(phandle);

> +		}
> +		s++;
> +	}

> +	return done ? 0 : -ENODEV;

	return ret;

> +}
> +
> +static bool of_devlink;
> +core_param(of_devlink, of_devlink, bool, 0);
> +

/**
 * of_link_to_suppliers - Add device links to suppliers
 * @dev: consumer device
 *
 * Create device links to all available suppliers of @dev.
 * Must NOT stop at the first failed link.
 * If some suppliers are not yet available, this function will be
 * called again when additional suppliers become available.
 *
 * Return:
 * * 0 if links successfully created for all suppliers
 * * an error if one or more suppliers not yet available
 */

> +static int of_link_to_suppliers(struct device *dev)
> +{
> +	struct property *p;

> +	bool done = true;

remove done

        int ret = 0;

> +
> +	if (!of_devlink)
> +		return 0;

> +	if (unlikely(!dev->of_node))
> +		return 0;

Check not needed, for_each_property_of_node() will detect !dev->of_node.

> +
> +	for_each_property_of_node(dev->of_node, p)
> +		if (of_link_property(dev, dev->of_node, p->name))

> +			done = false;

                        ret = -EAGAIN;

> +
> +	return done ? 0 : -ENODEV;

	return ret;

> +}
> +
>  #ifndef CONFIG_PPC
>  static const struct of_device_id reserved_mem_matches[] = {
>  	{ .compatible = "qcom,rmtfs-mem" },
> @@ -523,6 +687,7 @@ static int __init of_platform_default_populate_init(void)
>  	if (!of_have_populated_dt())
>  		return -ENODEV;
>  
> +	platform_bus_type.add_links = of_link_to_suppliers;
>  	/*
>  	 * Handle certain compatibles explicitly, since we don't want to create
>  	 * platform_devices for every node in /reserved-memory with a
> -- 
> 2.22.0.709.g102302147b-goog
> 
>

^ permalink raw reply

* Re: [PATCH v8 0/2] fTPM: firmware TPM running in TEE
From: Sasha Levin @ 2019-08-08  1:08 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: peterhuewe, jgg, corbet, linux-kernel, linux-doc, linux-integrity,
	linux-kernel, thiruan, bryankel, tee-dev, ilias.apalodimas,
	sumit.garg, rdunlap
In-Reply-To: <20190805223324.qvbqa45xnp5fgsib@linux.intel.com>

On Tue, Aug 06, 2019 at 01:51:32AM +0300, Jarkko Sakkinen wrote:
>On Mon, Aug 05, 2019 at 02:05:18PM -0400, Sasha Levin wrote:
>> On Mon, Aug 05, 2019 at 12:44:28AM +0300, Jarkko Sakkinen wrote:
>> > On Thu, Jul 11, 2019 at 11:08:58PM +0300, Jarkko Sakkinen wrote:
>> > > On Fri, Jul 05, 2019 at 04:47:44PM -0400, Sasha Levin wrote:
>> > > > Changes from v7:
>> > > >
>> > > >  - Address Jarkko's comments.
>> > > >
>> > > > Sasha Levin (2):
>> > > >   fTPM: firmware TPM running in TEE
>> > > >   fTPM: add documentation for ftpm driver
>> > > >
>> > > >  Documentation/security/tpm/index.rst        |   1 +
>> > > >  Documentation/security/tpm/tpm_ftpm_tee.rst |  27 ++
>> > > >  drivers/char/tpm/Kconfig                    |   5 +
>> > > >  drivers/char/tpm/Makefile                   |   1 +
>> > > >  drivers/char/tpm/tpm_ftpm_tee.c             | 350 ++++++++++++++++++++
>> > > >  drivers/char/tpm/tpm_ftpm_tee.h             |  40 +++
>> > > >  6 files changed, 424 insertions(+)
>> > > >  create mode 100644 Documentation/security/tpm/tpm_ftpm_tee.rst
>> > > >  create mode 100644 drivers/char/tpm/tpm_ftpm_tee.c
>> > > >  create mode 100644 drivers/char/tpm/tpm_ftpm_tee.h
>> > > >
>> > > > --
>> > > > 2.20.1
>> > > >
>> > >
>> > > I applied the patches now. Appreciate a lot the patience with these.
>> > > Thank you.
>> >
>> > Hi, can you possibly fix these:
>>
>> Any objection to sending you a patch on top of your tree instead?
>
>Go ahead. Added the previous patches to my master.

Thanks! I'm getting back home on Monday and I'll send it out right away.

--
Thanks,
Sasha

^ permalink raw reply

* Re: [PATCH v5 1/6] mm/page_idle: Add per-pid idle page tracking using virtual index
From: Joel Fernandes @ 2019-08-07 21:55 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, Alexey Dobriyan, Borislav Petkov, Brendan Gregg,
	Catalin Marinas, Christian Hansen, dancol, fmayer, H. Peter Anvin,
	Ingo Molnar, Jonathan Corbet, Kees Cook, kernel-team, linux-api,
	linux-doc, linux-fsdevel, linux-mm, Michal Hocko, Mike Rapoport,
	minchan, namhyung, paulmck, Robin Murphy, Roman Gushchin,
	Stephen Rothwell, surenb, Thomas Gleixner, tkjos,
	Vladimir Davydov, Vlastimil Babka, Will Deacon
In-Reply-To: <20190807213105.GA14622@google.com>

On Wed, Aug 07, 2019 at 05:31:05PM -0400, Joel Fernandes wrote:
> On Wed, Aug 07, 2019 at 01:58:40PM -0700, Andrew Morton wrote:
> > On Wed, 7 Aug 2019 16:45:30 -0400 Joel Fernandes <joel@joelfernandes.org> wrote:
> > 
> > > On Wed, Aug 07, 2019 at 01:04:02PM -0700, Andrew Morton wrote:
> > > > On Wed,  7 Aug 2019 13:15:54 -0400 "Joel Fernandes (Google)" <joel@joelfernandes.org> wrote:
> > > > 
> > > > > In Android, we are using this for the heap profiler (heapprofd) which
> > > > > profiles and pin points code paths which allocates and leaves memory
> > > > > idle for long periods of time. This method solves the security issue
> > > > > with userspace learning the PFN, and while at it is also shown to yield
> > > > > better results than the pagemap lookup, the theory being that the window
> > > > > where the address space can change is reduced by eliminating the
> > > > > intermediate pagemap look up stage. In virtual address indexing, the
> > > > > process's mmap_sem is held for the duration of the access.
> > > > 
> > > > So is heapprofd a developer-only thing?  Is heapprofd included in
> > > > end-user android loads?  If not then, again, wouldn't it be better to
> > > > make the feature Kconfigurable so that Android developers can enable it
> > > > during development then disable it for production kernels?
> > > 
> > > Almost all of this code is already configurable with
> > > CONFIG_IDLE_PAGE_TRACKING. If you disable it, then all of this code gets
> > > disabled.
> > > 
> > > Or are you referring to something else that needs to be made configurable?
> > 
> > Yes - the 300+ lines of code which this patchset adds!
> > 
> > The impacted people will be those who use the existing
> > idle-page-tracking feature but who will not use the new feature.  I
> > guess we can assume this set is small...
> 
> Yes, I think this set should be small. The code size increase of page_idle.o
> is from ~1KB to ~2KB. Most of the extra space is consumed by
> page_idle_proc_generic() function which this patch adds. I don't think adding
> another CONFIG option to disable this while keeping existing
> CONFIG_IDLE_PAGE_TRACKING enabled, is worthwhile but I am open to the
> addition of such an option if anyone feels strongly about it. I believe that
> once this patch is merged, most like this new interface being added is what

s/most like/most likely/

> will be used more than the old interface (for some of the usecases) so it
> makes sense to keep it alive with CONFIG_IDLE_PAGE_TRACKING.
> 
> thanks,
> 
>  - Joel
> 

^ permalink raw reply

* Re: [PATCH v5 1/6] mm/page_idle: Add per-pid idle page tracking using virtual index
From: Joel Fernandes @ 2019-08-07 21:31 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, Alexey Dobriyan, Borislav Petkov, Brendan Gregg,
	Catalin Marinas, Christian Hansen, dancol, fmayer, H. Peter Anvin,
	Ingo Molnar, Jonathan Corbet, Kees Cook, kernel-team, linux-api,
	linux-doc, linux-fsdevel, linux-mm, Michal Hocko, Mike Rapoport,
	minchan, namhyung, paulmck, Robin Murphy, Roman Gushchin,
	Stephen Rothwell, surenb, Thomas Gleixner, tkjos,
	Vladimir Davydov, Vlastimil Babka, Will Deacon
In-Reply-To: <20190807135840.92b852e980a9593fe91fbf59@linux-foundation.org>

On Wed, Aug 07, 2019 at 01:58:40PM -0700, Andrew Morton wrote:
> On Wed, 7 Aug 2019 16:45:30 -0400 Joel Fernandes <joel@joelfernandes.org> wrote:
> 
> > On Wed, Aug 07, 2019 at 01:04:02PM -0700, Andrew Morton wrote:
> > > On Wed,  7 Aug 2019 13:15:54 -0400 "Joel Fernandes (Google)" <joel@joelfernandes.org> wrote:
> > > 
> > > > In Android, we are using this for the heap profiler (heapprofd) which
> > > > profiles and pin points code paths which allocates and leaves memory
> > > > idle for long periods of time. This method solves the security issue
> > > > with userspace learning the PFN, and while at it is also shown to yield
> > > > better results than the pagemap lookup, the theory being that the window
> > > > where the address space can change is reduced by eliminating the
> > > > intermediate pagemap look up stage. In virtual address indexing, the
> > > > process's mmap_sem is held for the duration of the access.
> > > 
> > > So is heapprofd a developer-only thing?  Is heapprofd included in
> > > end-user android loads?  If not then, again, wouldn't it be better to
> > > make the feature Kconfigurable so that Android developers can enable it
> > > during development then disable it for production kernels?
> > 
> > Almost all of this code is already configurable with
> > CONFIG_IDLE_PAGE_TRACKING. If you disable it, then all of this code gets
> > disabled.
> > 
> > Or are you referring to something else that needs to be made configurable?
> 
> Yes - the 300+ lines of code which this patchset adds!
> 
> The impacted people will be those who use the existing
> idle-page-tracking feature but who will not use the new feature.  I
> guess we can assume this set is small...

Yes, I think this set should be small. The code size increase of page_idle.o
is from ~1KB to ~2KB. Most of the extra space is consumed by
page_idle_proc_generic() function which this patch adds. I don't think adding
another CONFIG option to disable this while keeping existing
CONFIG_IDLE_PAGE_TRACKING enabled, is worthwhile but I am open to the
addition of such an option if anyone feels strongly about it. I believe that
once this patch is merged, most like this new interface being added is what
will be used more than the old interface (for some of the usecases) so it
makes sense to keep it alive with CONFIG_IDLE_PAGE_TRACKING.

thanks,

 - Joel


^ permalink raw reply

* Re: [PATCH v5 1/6] mm/page_idle: Add per-pid idle page tracking using virtual index
From: Andrew Morton @ 2019-08-07 20:58 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: linux-kernel, Alexey Dobriyan, Borislav Petkov, Brendan Gregg,
	Catalin Marinas, Christian Hansen, dancol, fmayer, H. Peter Anvin,
	Ingo Molnar, Jonathan Corbet, Kees Cook, kernel-team, linux-api,
	linux-doc, linux-fsdevel, linux-mm, Michal Hocko, Mike Rapoport,
	minchan, namhyung, paulmck, Robin Murphy, Roman Gushchin,
	Stephen Rothwell, surenb, Thomas Gleixner, tkjos,
	Vladimir Davydov, Vlastimil Babka, Will Deacon
In-Reply-To: <20190807204530.GB90900@google.com>

On Wed, 7 Aug 2019 16:45:30 -0400 Joel Fernandes <joel@joelfernandes.org> wrote:

> On Wed, Aug 07, 2019 at 01:04:02PM -0700, Andrew Morton wrote:
> > On Wed,  7 Aug 2019 13:15:54 -0400 "Joel Fernandes (Google)" <joel@joelfernandes.org> wrote:
> > 
> > > In Android, we are using this for the heap profiler (heapprofd) which
> > > profiles and pin points code paths which allocates and leaves memory
> > > idle for long periods of time. This method solves the security issue
> > > with userspace learning the PFN, and while at it is also shown to yield
> > > better results than the pagemap lookup, the theory being that the window
> > > where the address space can change is reduced by eliminating the
> > > intermediate pagemap look up stage. In virtual address indexing, the
> > > process's mmap_sem is held for the duration of the access.
> > 
> > So is heapprofd a developer-only thing?  Is heapprofd included in
> > end-user android loads?  If not then, again, wouldn't it be better to
> > make the feature Kconfigurable so that Android developers can enable it
> > during development then disable it for production kernels?
> 
> Almost all of this code is already configurable with
> CONFIG_IDLE_PAGE_TRACKING. If you disable it, then all of this code gets
> disabled.
> 
> Or are you referring to something else that needs to be made configurable?

Yes - the 300+ lines of code which this patchset adds!

The impacted people will be those who use the existing
idle-page-tracking feature but who will not use the new feature.  I
guess we can assume this set is small...



^ permalink raw reply

* Re: [PATCH v5 1/6] mm/page_idle: Add per-pid idle page tracking using virtual index
From: Joel Fernandes @ 2019-08-07 20:45 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, Alexey Dobriyan, Borislav Petkov, Brendan Gregg,
	Catalin Marinas, Christian Hansen, dancol, fmayer, H. Peter Anvin,
	Ingo Molnar, Jonathan Corbet, Kees Cook, kernel-team, linux-api,
	linux-doc, linux-fsdevel, linux-mm, Michal Hocko, Mike Rapoport,
	minchan, namhyung, paulmck, Robin Murphy, Roman Gushchin,
	Stephen Rothwell, surenb, Thomas Gleixner, tkjos,
	Vladimir Davydov, Vlastimil Babka, Will Deacon
In-Reply-To: <20190807130402.49c9ea8bf144d2f83bfeb353@linux-foundation.org>

On Wed, Aug 07, 2019 at 01:04:02PM -0700, Andrew Morton wrote:
> On Wed,  7 Aug 2019 13:15:54 -0400 "Joel Fernandes (Google)" <joel@joelfernandes.org> wrote:
> 
> > In Android, we are using this for the heap profiler (heapprofd) which
> > profiles and pin points code paths which allocates and leaves memory
> > idle for long periods of time. This method solves the security issue
> > with userspace learning the PFN, and while at it is also shown to yield
> > better results than the pagemap lookup, the theory being that the window
> > where the address space can change is reduced by eliminating the
> > intermediate pagemap look up stage. In virtual address indexing, the
> > process's mmap_sem is held for the duration of the access.
> 
> So is heapprofd a developer-only thing?  Is heapprofd included in
> end-user android loads?  If not then, again, wouldn't it be better to
> make the feature Kconfigurable so that Android developers can enable it
> during development then disable it for production kernels?

Almost all of this code is already configurable with
CONFIG_IDLE_PAGE_TRACKING. If you disable it, then all of this code gets
disabled.

Or are you referring to something else that needs to be made configurable?

thanks,

 - Joel


^ permalink raw reply

* Re: [PATCH v4 1/5] mm/page_idle: Add per-pid idle page tracking using virtual indexing
From: Joel Fernandes @ 2019-08-07 20:44 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, Alexey Dobriyan, Borislav Petkov, Brendan Gregg,
	Catalin Marinas, Christian Hansen, dancol, fmayer, H. Peter Anvin,
	Ingo Molnar, Jonathan Corbet, Kees Cook, kernel-team, linux-api,
	linux-doc, linux-fsdevel, linux-mm, Michal Hocko, Mike Rapoport,
	minchan, namhyung, paulmck, Robin Murphy, Roman Gushchin,
	Stephen Rothwell, surenb, Thomas Gleixner, tkjos,
	Vladimir Davydov, Vlastimil Babka, Will Deacon, Brendan Gregg
In-Reply-To: <20190807130122.f148548c05ec07e7b716457e@linux-foundation.org>

On Wed, Aug 07, 2019 at 01:01:22PM -0700, Andrew Morton wrote:
> On Wed, 7 Aug 2019 06:00:13 -0400 Joel Fernandes <joel@joelfernandes.org> wrote:
> 
> > > > 8 files changed, 376 insertions(+), 45 deletions(-)
> > > 
> > > Quite a lot of new code unconditionally added to major architectures. 
> > > Are we confident that everyone will want this feature?
> > 
> > I did not follow, could you clarify more? All of this diff stat is not to
> > architecture code:
> 
> 
> My point is that the patchset adds a lot of new code with no way in
> which users can opt out.  Almost everyone gets a fatter kernel - how
> many of those users will actually benefit from it?
> 
> If "not many" then shouldn't we be making it Kconfigurable?

Almost all of this code is already configurable with
CONFIG_IDLE_PAGE_TRACKING. If you disable it, then all of this code gets
disabled.

Or are you referring to something else that needs to be made configurable?

> Are there userspace tools which present this info to users or which
> provide monitoring of some form?  Do major distros ship those tools? 
> Do people use them?  etcetera.
> 

Android's heapprofd is what I was working on which is already using it (patch
is not yet upstreamed). There is working set tracking which Sandeep (also
from Android) said he wants to use. Minchan plans to use this in combination
with ZRAM-based idle tracking. Mike Rappoport also showed some interest, but
I am not sure where/how he is using it. These are just some of the usecases I
am aware off. I am pretty sure more will come as well.

thanks,

 - Joel


^ permalink raw reply

* Re: [PATCH v7 1/2] arm64: Define Documentation/arm64/tagged-address-abi.rst
From: Dave Hansen @ 2019-08-07 20:38 UTC (permalink / raw)
  To: Catalin Marinas, linux-arm-kernel
  Cc: Vincenzo Frascino, Will Deacon, Andrey Konovalov, Szabolcs Nagy,
	Kevin Brodsky, linux-doc, linux-arch
In-Reply-To: <20190807155321.9648-2-catalin.marinas@arm.com>

On 8/7/19 8:53 AM, Catalin Marinas wrote:
> +- mmap() done by the process itself (or its parent), where either:
> +
> +  - flags have the **MAP_ANONYMOUS** bit set
> +  - the file descriptor refers to a regular file (including those returned
> +    by memfd_create()) or **/dev/zero**

What's a "regular file"? ;)

> +- brk() system call done by the process itself (i.e. the heap area between
> +  the initial location of the program break at process creation and its
> +  current location).
> +
> +- any memory mapped by the kernel in the address space of the process
> +  during creation and with the same restrictions as for mmap() above (e.g.
> +  data, bss, stack).
> +
> +The AArch64 Tagged Address ABI is an opt-in feature and an application can
> +control it via **prctl()** as follows:
> +
> +- **PR_SET_TAGGED_ADDR_CTRL**: enable or disable the AArch64 Tagged Address
> +  ABI for the calling process.
> +
> +  The (unsigned int) arg2 argument is a bit mask describing the control mode
> +  used:
> +
> +  - **PR_TAGGED_ADDR_ENABLE**: enable AArch64 Tagged Address ABI. Default
> +    status is disabled.
> +
> +  The arguments arg3, arg4, and arg5 are ignored.

For previous prctl()'s, we've found that it's best to require that the
unused arguments be 0.  Without that, apps are free to put garbage
there, which makes extending the prctl to use other arguments impossible
in the future.

Also, shouldn't this be converted over to an arch_prctl()?

> +The prctl(PR_SET_TAGGED_ADDR_CTRL, ...) will return -EINVAL if the
> +AArch64 Tagged Address ABI is not available
> +(CONFIG_ARM64_TAGGED_ADDR_ABI disabled or sysctl abi.tagged_addr=0).
> +
> +The ABI properties set by the mechanism described above are inherited by
> +threads of the same application and fork()'ed children but cleared by
> +execve().

What is the scope of these prctl()'s?  Are they thread-scoped or
process-scoped?  Can two threads in the same process run with different
tagging ABI modes?

> +Opting in (the prctl() option described above only) to or out of the
> +AArch64 Tagged Address ABI can be disabled globally at runtime using the
> +sysctl interface:
> +
> +- **abi.tagged_addr**: a new sysctl interface that can be used to prevent
> +  applications from enabling or disabling the relaxed ABI. The sysctl
> +  supports the following configuration options:
> +
> +  - **0**: disable the prctl(PR_SET_TAGGED_ADDR_CTRL) option to
> +    enable/disable the AArch64 Tagged Address ABI globally
> +
> +  - **1** (Default): enable the prctl(PR_SET_TAGGED_ADDR_CTRL) option to
> +    enable/disable the AArch64 Tagged Address ABI globally
> +
> +  Note that this sysctl does not affect the status of the AArch64 Tagged
> +  Address ABI of the running processes.

Shouldn't the name be "abi.tagged_addr_control" or something?  It
actually has *zero* direct effect on tagged addresses in the ABI.

What's the reason for allowing it to be toggled at runtime like this?
Wouldn't it make more sense to just have it be a boot option so you
*know* what the state of individual processes is?

> +When a process has successfully enabled the new ABI by invoking
> +prctl(PR_SET_TAGGED_ADDR_CTRL, PR_TAGGED_ADDR_ENABLE), the following
> +behaviours are guaranteed:
> +
> +- Every currently available syscall, except the cases mentioned in section
> +  3, can accept any valid tagged pointer. The same rule is applicable to
> +  any syscall introduced in the future.
> +
> +- The syscall behaviour is undefined for non valid tagged pointers.

Do you really mean "undefined"?  I mean, a bad pointer is a bad pointer.
 Why should it matter if it's a tagged bad pointer or an untagged bad
pointer?

...
> +A definition of the meaning of tagged pointers on AArch64 can be found in:
> +Documentation/arm64/tagged-pointers.txt.
> +
> +3. AArch64 Tagged Address ABI Exceptions
> +-----------------------------------------
> +
> +The behaviour described in section 2, with particular reference to the
> +acceptance by the syscalls of any valid tagged pointer, is not applicable
> +to the following cases:

This is saying things in a pretty roundabout manner.  Can't it just say:
 "The following cases do not accept tagged pointers:"

> +- mmap() addr parameter.
> +
> +- mremap() new_address parameter.

Is munmap() missing?  Or was there a reason for leaving it out?

> +- prctl(PR_SET_MM, ``*``, ...) other than arg2 PR_SET_MM_MAP and
> +  PR_SET_MM_MAP_SIZE.
> +
> +- prctl(PR_SET_MM, PR_SET_MM_MAP{,_SIZE}, ...) struct prctl_mm_map fields.
> +
> +Any attempt to use non-zero tagged pointers will lead to undefined
> +behaviour.

I wonder if you want to generalize this a bit.  I think you're saying
that parts of the ABI that modify the *layout* of the address space
never accept tagged pointers.

> +4. Example of correct usage
> +---------------------------
> +.. code-block:: c
> +
> +   void main(void)
> +   {
> +           static int tbi_enabled = 0;
> +           unsigned long tag = 0;
> +
> +           char *ptr = mmap(NULL, PAGE_SIZE, PROT_READ | PROT_WRITE,
> +                            MAP_ANONYMOUS, -1, 0);
> +
> +           if (prctl(PR_SET_TAGGED_ADDR_CTRL, PR_TAGGED_ADDR_ENABLE,
> +                     0, 0, 0) == 0)
> +                   tbi_enabled = 1;
> +
> +           if (ptr == (void *)-1) /* MAP_FAILED */
> +                   return -1;
> +
> +           if (tbi_enabled)
> +                   tag = rand() & 0xff;
> +
> +           ptr = (char *)((unsigned long)ptr | (tag << TAG_SHIFT));
> +
> +           *ptr = 'a';
> +
> +           ...
> +   }

It looks like the TAG_SHIFT and tag size are pretty baked into the
aarch64 architecture.  But, are you confident that no future
implementations will want different positions or sizes?  (obviously
controlled by other TCR_EL1 bits)


^ permalink raw reply

* Re: [PATCH v5 1/6] mm/page_idle: Add per-pid idle page tracking using virtual index
From: Andrew Morton @ 2019-08-07 20:04 UTC (permalink / raw)
  To: Joel Fernandes (Google)
  Cc: linux-kernel, Alexey Dobriyan, Borislav Petkov, Brendan Gregg,
	Catalin Marinas, Christian Hansen, dancol, fmayer, H. Peter Anvin,
	Ingo Molnar, joelaf, Jonathan Corbet, Kees Cook, kernel-team,
	linux-api, linux-doc, linux-fsdevel, linux-mm, Michal Hocko,
	Mike Rapoport, minchan, namhyung, paulmck, Robin Murphy,
	Roman Gushchin, Stephen Rothwell, surenb, Thomas Gleixner, tkjos,
	Vladimir Davydov, Vlastimil Babka, Will Deacon
In-Reply-To: <20190807171559.182301-1-joel@joelfernandes.org>

On Wed,  7 Aug 2019 13:15:54 -0400 "Joel Fernandes (Google)" <joel@joelfernandes.org> wrote:

> In Android, we are using this for the heap profiler (heapprofd) which
> profiles and pin points code paths which allocates and leaves memory
> idle for long periods of time. This method solves the security issue
> with userspace learning the PFN, and while at it is also shown to yield
> better results than the pagemap lookup, the theory being that the window
> where the address space can change is reduced by eliminating the
> intermediate pagemap look up stage. In virtual address indexing, the
> process's mmap_sem is held for the duration of the access.

So is heapprofd a developer-only thing?  Is heapprofd included in
end-user android loads?  If not then, again, wouldn't it be better to
make the feature Kconfigurable so that Android developers can enable it
during development then disable it for production kernels?


^ permalink raw reply

* Re: [PATCH v4 1/5] mm/page_idle: Add per-pid idle page tracking using virtual indexing
From: Andrew Morton @ 2019-08-07 20:01 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: linux-kernel, Alexey Dobriyan, Borislav Petkov, Brendan Gregg,
	Catalin Marinas, Christian Hansen, dancol, fmayer, H. Peter Anvin,
	Ingo Molnar, Jonathan Corbet, Kees Cook, kernel-team, linux-api,
	linux-doc, linux-fsdevel, linux-mm, Michal Hocko, Mike Rapoport,
	minchan, namhyung, paulmck, Robin Murphy, Roman Gushchin,
	Stephen Rothwell, surenb, Thomas Gleixner, tkjos,
	Vladimir Davydov, Vlastimil Babka, Will Deacon, Brendan Gregg
In-Reply-To: <20190807100013.GC169551@google.com>

On Wed, 7 Aug 2019 06:00:13 -0400 Joel Fernandes <joel@joelfernandes.org> wrote:

> > > 8 files changed, 376 insertions(+), 45 deletions(-)
> > 
> > Quite a lot of new code unconditionally added to major architectures. 
> > Are we confident that everyone will want this feature?
> 
> I did not follow, could you clarify more? All of this diff stat is not to
> architecture code:


My point is that the patchset adds a lot of new code with no way in
which users can opt out.  Almost everyone gets a fatter kernel - how
many of those users will actually benefit from it?

If "not many" then shouldn't we be making it Kconfigurable?

Are there userspace tools which present this info to users or which
provide monitoring of some form?  Do major distros ship those tools? 
Do people use them?  etcetera.


^ permalink raw reply

* [PATCH v5 3/6] [RFC] x86: Add support for idle bit in swap PTE
From: Joel Fernandes (Google) @ 2019-08-07 17:15 UTC (permalink / raw)
  To: linux-kernel
  Cc: Joel Fernandes (Google), Alexey Dobriyan, Andrew Morton,
	Borislav Petkov, Brendan Gregg, Catalin Marinas, Christian Hansen,
	dancol, fmayer, H. Peter Anvin, Ingo Molnar, joelaf,
	Jonathan Corbet, Kees Cook, kernel-team, linux-api, linux-doc,
	linux-fsdevel, linux-mm, Michal Hocko, Mike Rapoport, minchan,
	namhyung, paulmck, Robin Murphy, Roman Gushchin, Stephen Rothwell,
	surenb, Thomas Gleixner, tkjos, Vladimir Davydov, Vlastimil Babka,
	Will Deacon
In-Reply-To: <20190807171559.182301-1-joel@joelfernandes.org>

This bit will be used by idle page tracking code to correctly identify
if a page that was swapped out was idle before it got swapped out.
Without this PTE bit, we lose information about if a page is idle or not
since the page frame gets unmapped and the page gets freed.

Bits 2-6 are unused in the swap PTE (see the comment in
arch/x86/include/asm/pgtable_64.h). Bit 2 corresponds to _PAGE_USER. Use
it for swap PTE purposes.

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 arch/x86/Kconfig                     |  1 +
 arch/x86/include/asm/pgtable.h       | 15 +++++++++++++++
 arch/x86/include/asm/pgtable_types.h |  6 ++++++
 3 files changed, 22 insertions(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 222855cc0158..728f22370f17 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -139,6 +139,7 @@ config X86
 	select HAVE_ARCH_MMAP_RND_COMPAT_BITS	if MMU && COMPAT
 	select HAVE_ARCH_COMPAT_MMAP_BASES	if MMU && COMPAT
 	select HAVE_ARCH_PREL32_RELOCATIONS
+	select HAVE_ARCH_PTE_SWP_PGIDLE
 	select HAVE_ARCH_SECCOMP_FILTER
 	select HAVE_ARCH_THREAD_STRUCT_WHITELIST
 	select HAVE_ARCH_STACKLEAK
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 0bc530c4eb13..ef3e662cee4a 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -1371,6 +1371,21 @@ static inline pmd_t pmd_swp_clear_soft_dirty(pmd_t pmd)
 #endif
 #endif
 
+static inline pte_t pte_swp_mkpage_idle(pte_t pte)
+{
+	return pte_set_flags(pte, _PAGE_SWP_PGIDLE);
+}
+
+static inline int pte_swp_page_idle(pte_t pte)
+{
+	return pte_flags(pte) & _PAGE_SWP_PGIDLE;
+}
+
+static inline pte_t pte_swp_clear_mkpage_idle(pte_t pte)
+{
+	return pte_clear_flags(pte, _PAGE_SWP_PGIDLE);
+}
+
 #define PKRU_AD_BIT 0x1
 #define PKRU_WD_BIT 0x2
 #define PKRU_BITS_PER_PKEY 2
diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
index b5e49e6bac63..6739cba4c900 100644
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -100,6 +100,12 @@
 #define _PAGE_SWP_SOFT_DIRTY	(_AT(pteval_t, 0))
 #endif
 
+#ifdef CONFIG_IDLE_PAGE_TRACKING
+#define _PAGE_SWP_PGIDLE	_PAGE_USER
+#else
+#define _PAGE_SWP_PGIDLE	(_AT(pteval_t, 0))
+#endif
+
 #if defined(CONFIG_X86_64) || defined(CONFIG_X86_PAE)
 #define _PAGE_NX	(_AT(pteval_t, 1) << _PAGE_BIT_NX)
 #define _PAGE_DEVMAP	(_AT(u64, 1) << _PAGE_BIT_DEVMAP)
-- 
2.22.0.770.g0f2c4a37fd-goog


^ permalink raw reply related

* [PATCH v5 2/6] mm/page_idle: Add support for handling swapped PG_Idle pages
From: Joel Fernandes (Google) @ 2019-08-07 17:15 UTC (permalink / raw)
  To: linux-kernel
  Cc: Joel Fernandes (Google), Minchan Kim, Alexey Dobriyan,
	Andrew Morton, Borislav Petkov, Brendan Gregg, Catalin Marinas,
	Christian Hansen, dancol, fmayer, H. Peter Anvin, Ingo Molnar,
	joelaf, Jonathan Corbet, Kees Cook, kernel-team, linux-api,
	linux-doc, linux-fsdevel, linux-mm, Michal Hocko, Mike Rapoport,
	namhyung, paulmck, Robin Murphy, Roman Gushchin, Stephen Rothwell,
	surenb, Thomas Gleixner, tkjos, Vladimir Davydov, Vlastimil Babka,
	Will Deacon
In-Reply-To: <20190807171559.182301-1-joel@joelfernandes.org>

Idle page tracking currently does not work well in the following
scenario:
 1. mark page-A idle which was present at that time.
 2. run workload
 3. page-A is not touched by workload
 4. *sudden* memory pressure happen so finally page A is finally swapped out
 5. now see the page A - it appears as if it was accessed (pte unmapped
    so idle bit not set in output) - but it's incorrect.

To fix this, we store the idle information into a new idle bit of the
swap PTE during swapping of anonymous pages.

Also in the future, madvise extensions will allow a system process
manager (like Android's ActivityManager) to swap pages out of a process
that it knows will be cold. To an external process like a heap profiler
that is doing idle tracking on another process, this procedure will
interfere with the idle page tracking similar to the above steps.

Suggested-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 arch/Kconfig                  |  3 +++
 include/asm-generic/pgtable.h |  6 ++++++
 mm/page_idle.c                | 26 ++++++++++++++++++++++++--
 mm/rmap.c                     |  2 ++
 4 files changed, 35 insertions(+), 2 deletions(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index a7b57dd42c26..3aa121ce824e 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -575,6 +575,9 @@ config ARCH_WANT_HUGE_PMD_SHARE
 config HAVE_ARCH_SOFT_DIRTY
 	bool
 
+config HAVE_ARCH_PTE_SWP_PGIDLE
+	bool
+
 config HAVE_MOD_ARCH_SPECIFIC
 	bool
 	help
diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
index 75d9d68a6de7..6d51d0a355a7 100644
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -712,6 +712,12 @@ static inline void ptep_modify_prot_commit(struct vm_area_struct *vma,
 #define arch_start_context_switch(prev)	do {} while (0)
 #endif
 
+#ifndef CONFIG_HAVE_ARCH_PTE_SWP_PGIDLE
+static inline pte_t pte_swp_mkpage_idle(pte_t pte) { return pte; }
+static inline int pte_swp_page_idle(pte_t pte) { return 0; }
+static inline pte_t pte_swp_clear_mkpage_idle(pte_t pte) { return pte; }
+#endif
+
 #ifdef CONFIG_HAVE_ARCH_SOFT_DIRTY
 #ifndef CONFIG_ARCH_ENABLE_THP_MIGRATION
 static inline pmd_t pmd_swp_mksoft_dirty(pmd_t pmd)
diff --git a/mm/page_idle.c b/mm/page_idle.c
index 9de4f4c67a8c..2766d4ab348c 100644
--- a/mm/page_idle.c
+++ b/mm/page_idle.c
@@ -276,7 +276,7 @@ struct page_idle_proc_priv {
 };
 
 /*
- * Add page to list to be set as idle later.
+ * Set a page as idle or add it to a list to be set as idle later.
  */
 static void pte_page_idle_proc_add(struct page *page,
 			       unsigned long addr, struct mm_walk *walk)
@@ -303,6 +303,13 @@ static void pte_page_idle_proc_add(struct page *page,
 		page_get = page_idle_get_page(page);
 		if (!page_get)
 			return;
+	} else {
+		/* For swapped pages, set output bit as idle */
+		frames = (addr - priv->start_addr) >> PAGE_SHIFT;
+		bit = frames % BITMAP_CHUNK_BITS;
+		chunk = &chunk[frames / BITMAP_CHUNK_BITS];
+		*chunk |= (1 << bit);
+		return;
 	}
 
 	/*
@@ -323,6 +330,7 @@ static int pte_page_idle_proc_range(pmd_t *pmd, unsigned long addr,
 	spinlock_t *ptl;
 	struct page *page;
 	struct vm_area_struct *vma = walk->vma;
+	struct page_idle_proc_priv *priv = walk->private;
 
 	ptl = pmd_trans_huge_lock(pmd, vma);
 	if (ptl) {
@@ -341,6 +349,19 @@ static int pte_page_idle_proc_range(pmd_t *pmd, unsigned long addr,
 
 	pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
 	for (; addr != end; pte++, addr += PAGE_SIZE) {
+		/* For swap_pte handling, we use an idle bit in the swap pte. */
+		if (is_swap_pte(*pte)) {
+			if (priv->write) {
+				set_pte_at(walk->mm, addr, pte,
+					   pte_swp_mkpage_idle(*pte));
+			} else {
+				/* If swap pte has idle bit set, report it as idle */
+				if (pte_swp_page_idle(*pte))
+					pte_page_idle_proc_add(NULL, addr, walk);
+			}
+			continue;
+		}
+
 		if (!pte_present(*pte))
 			continue;
 
@@ -432,7 +453,8 @@ ssize_t page_idle_proc_generic(struct file *file, char __user *ubuff,
 				set_page_idle(page);
 			}
 		} else {
-			if (page_idle_pte_check(page)) {
+			/* If page is NULL, it was swapped out */
+			if (!page || page_idle_pte_check(page)) {
 				off = ((cur->addr) >> PAGE_SHIFT) - start_frame;
 				bit = off % BITMAP_CHUNK_BITS;
 				index = off / BITMAP_CHUNK_BITS;
diff --git a/mm/rmap.c b/mm/rmap.c
index e5dfe2ae6b0d..4bd618aab402 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1629,6 +1629,8 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
 			swp_pte = swp_entry_to_pte(entry);
 			if (pte_soft_dirty(pteval))
 				swp_pte = pte_swp_mksoft_dirty(swp_pte);
+			if (page_is_idle(page))
+				swp_pte = pte_swp_mkpage_idle(swp_pte);
 			set_pte_at(mm, address, pvmw.pte, swp_pte);
 			/* Invalidate as we cleared the pte */
 			mmu_notifier_invalidate_range(mm, address,
-- 
2.22.0.770.g0f2c4a37fd-goog


^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox