LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed
* RE: [PATCH v3 3/3] soc: fsl: add RCPM driver
From: Ran Wang @ 2019-05-20  7:39 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Mark Rutland, Len Brown, devicetree@vger.kernel.org,
	Greg Kroah-Hartman, linux-pm@vger.kernel.org, Rafael J . Wysocki,
	linux-kernel@vger.kernel.org, Leo Li, Rob Herring,
	linuxppc-dev@lists.ozlabs.org,
	linux-arm-kernel@lists.infradead.org
In-Reply-To: <20190520072630.GA3674@amd>

Hi Pavel,

On Monday, May 20, 2019 15:27: Pavel Machek wrote:
> 
> Hi!
> 
> > The NXP's QorIQ Processors based on ARM Core have RCPM module (Run
> > Control and Power Management), which performs all device-level tasks
> > associated with power management such as wakeup source control.
> >
> > This driver depends on PM wakeup source framework which help to
> > collect wake information.
> >
> > Signed-off-by: Ran Wang <ran.wang_1@nxp.com>
> 
> > +// Copyright 2019 NXP
> > +//
> > +// Author: Ran Wang <ran.wang_1@nxp.com>,
> 
> extra ,

OK, will update.

> > +	rcpm = dev_get_drvdata(dev);
> > +	if (!rcpm)
> > +		return -EINVAL;
> > +
> > +	/* Begin with first registered wakeup source */
> > +	ws = wakeup_source_get_next(NULL);
> > +	while (ws) {
> 
> while (ws = wakeup_source_get_next(NULL))
> 
> ?
I just answered this in v2 mail thread: 
"Actually, we only pass NULL to wakeup_source_get_next()
at very first call to get 1st wakeup source. Then in the while
loop, we will fetch next source but not 1st, that's different.
I am afraid your suggestion is not quite correct."

Regards
Ran

^ permalink raw reply

* Re: [PATCH v3 3/3] soc: fsl: add RCPM driver
From: Pavel Machek @ 2019-05-20  7:26 UTC (permalink / raw)
  To: Ran Wang
  Cc: Mark Rutland, Len Brown, devicetree, Greg Kroah-Hartman, linux-pm,
	Rafael J . Wysocki, linux-kernel, Li Yang, Rob Herring,
	linuxppc-dev, linux-arm-kernel
In-Reply-To: <20190520065816.32360-3-ran.wang_1@nxp.com>

[-- Attachment #1: Type: text/plain, Size: 866 bytes --]

Hi!

> The NXP's QorIQ Processors based on ARM Core have RCPM module
> (Run Control and Power Management), which performs all device-level
> tasks associated with power management such as wakeup source control.
> 
> This driver depends on PM wakeup source framework which help to
> collect wake information.
> 
> Signed-off-by: Ran Wang <ran.wang_1@nxp.com>

> +// Copyright 2019 NXP
> +//
> +// Author: Ran Wang <ran.wang_1@nxp.com>,

extra ,

> +	rcpm = dev_get_drvdata(dev);
> +	if (!rcpm)
> +		return -EINVAL;
> +
> +	/* Begin with first registered wakeup source */
> +	ws = wakeup_source_get_next(NULL);
> +	while (ws) {

while (ws = wakeup_source_get_next(NULL))

?

								Pavel
								
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply

* Re: [PATCH v2] ocxl: Fix potential memory leak on context creation
From: Andrew Donnellan @ 2019-05-20  7:18 UTC (permalink / raw)
  To: Frederic Barrat, linuxppc-dev, andrew.donnellan, alastair; +Cc: clombard
In-Reply-To: <20190520071618.1722-1-fbarrat@linux.ibm.com>

Acked-by: Andrew Donnellan <ajd@linux.ibm.com>

On 20/5/19 5:16 pm, Frederic Barrat wrote:
> If we couldn't fully init a context, we were leaking memory.
> 
> Fixes: b9721d275cc2 ("ocxl: Allow external drivers to use OpenCAPI contexts")
> Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
> ---
> 
> Changelog:
> v2: reset context pointer in case of allocation failure (Andrew)
> 
>   drivers/misc/ocxl/context.c | 2 ++
>   1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/misc/ocxl/context.c b/drivers/misc/ocxl/context.c
> index bab9c9364184..24e4fb010275 100644
> --- a/drivers/misc/ocxl/context.c
> +++ b/drivers/misc/ocxl/context.c
> @@ -22,6 +22,8 @@ int ocxl_context_alloc(struct ocxl_context **context, struct ocxl_afu *afu,
>   			afu->pasid_base + afu->pasid_max, GFP_KERNEL);
>   	if (pasid < 0) {
>   		mutex_unlock(&afu->contexts_lock);
> +		kfree(*context);
> +		*context = NULL;
>   		return pasid;
>   	}
>   	afu->pasid_count++;
> 

-- 
Andrew Donnellan              OzLabs, ADL Canberra
ajd@linux.ibm.com             IBM Australia Limited


^ permalink raw reply

* [PATCH v2] ocxl: Fix potential memory leak on context creation
From: Frederic Barrat @ 2019-05-20  7:16 UTC (permalink / raw)
  To: linuxppc-dev, andrew.donnellan, alastair; +Cc: clombard

If we couldn't fully init a context, we were leaking memory.

Fixes: b9721d275cc2 ("ocxl: Allow external drivers to use OpenCAPI contexts")
Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
---

Changelog:
v2: reset context pointer in case of allocation failure (Andrew)

 drivers/misc/ocxl/context.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/misc/ocxl/context.c b/drivers/misc/ocxl/context.c
index bab9c9364184..24e4fb010275 100644
--- a/drivers/misc/ocxl/context.c
+++ b/drivers/misc/ocxl/context.c
@@ -22,6 +22,8 @@ int ocxl_context_alloc(struct ocxl_context **context, struct ocxl_afu *afu,
 			afu->pasid_base + afu->pasid_max, GFP_KERNEL);
 	if (pasid < 0) {
 		mutex_unlock(&afu->contexts_lock);
+		kfree(*context);
+		*context = NULL;
 		return pasid;
 	}
 	afu->pasid_count++;
-- 
2.21.0


^ permalink raw reply related

* Re: PROBLEM: Power9: kernel oops on memory hotunplug from ppc64le guest
From: Nicholas Piggin @ 2019-05-20  7:00 UTC (permalink / raw)
  To: bharata
  Cc: bharata, linux-kernel, linux-next, aneesh.kumar, srikanth,
	linuxppc-dev
In-Reply-To: <20190520055622.GC22939@in.ibm.com>

Bharata B Rao's on May 20, 2019 3:56 pm:
> On Mon, May 20, 2019 at 02:48:35PM +1000, Nicholas Piggin wrote:
>> >> > git bisect points to
>> >> >
>> >> > commit 4231aba000f5a4583dd9f67057aadb68c3eca99d
>> >> > Author: Nicholas Piggin <npiggin@gmail.com>
>> >> > Date:   Fri Jul 27 21:48:17 2018 +1000
>> >> >
>> >> >     powerpc/64s: Fix page table fragment refcount race vs speculative references
>> >> >
>> >> >     The page table fragment allocator uses the main page refcount racily
>> >> >     with respect to speculative references. A customer observed a BUG due
>> >> >     to page table page refcount underflow in the fragment allocator. This
>> >> >     can be caused by the fragment allocator set_page_count stomping on a
>> >> >     speculative reference, and then the speculative failure handler
>> >> >     decrements the new reference, and the underflow eventually pops when
>> >> >     the page tables are freed.
>> >> >
>> >> >     Fix this by using a dedicated field in the struct page for the page
>> >> >     table fragment allocator.
>> >> >
>> >> >     Fixes: 5c1f6ee9a31c ("powerpc: Reduce PTE table memory wastage")
>> >> >     Cc: stable@vger.kernel.org # v3.10+
>> >> 
>> >> That's the commit that added the BUG_ON(), so prior to that you won't
>> >> see the crash.
>> > 
>> > Right, but the commit says it fixes page table page refcount underflow by
>> > introducing a new field &page->pt_frag_refcount. Now we are hitting the underflow
>> > for this pt_frag_refcount.
>> 
>> The fixed underflow is caused by a bug (race on page count) that got 
>> fixed by that patch. You are hitting a different underflow here. It's
>> not certain my patch caused it, I'm just trying to reproduce now.
> 
> Ok.

Can't reproduce I'm afraid, tried adding and removing 8GB memory from a
4GB guest (via host adding / removing memory device), and it just works.

It's likely to be an edge case like an off by one or rounding error
that just happens to trigger in your config. Might be easiest if you
could test with a debug patch.

Thanks,
Nick


^ permalink raw reply

* [PATCH v3 3/3] soc: fsl: add RCPM driver
From: Ran Wang @ 2019-05-20  6:58 UTC (permalink / raw)
  To: Li Yang, Rob Herring, Mark Rutland, Pavel Machek
  Cc: Len Brown, devicetree, Greg Kroah-Hartman, linux-pm,
	Rafael J . Wysocki, linux-kernel, Ran Wang, linuxppc-dev,
	linux-arm-kernel
In-Reply-To: <20190520065816.32360-1-ran.wang_1@nxp.com>

The NXP's QorIQ Processors based on ARM Core have RCPM module
(Run Control and Power Management), which performs all device-level
tasks associated with power management such as wakeup source control.

This driver depends on PM wakeup source framework which help to
collect wake information.

Signed-off-by: Ran Wang <ran.wang_1@nxp.com>
---
Change in v3:
	- Some whitespace ajdustment.

Change in v2:
	- Rebase Kconfig and Makefile update to latest mainline.

 drivers/soc/fsl/Kconfig  |    8 +++
 drivers/soc/fsl/Makefile |    1 +
 drivers/soc/fsl/rcpm.c   |  124 ++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 133 insertions(+), 0 deletions(-)
 create mode 100644 drivers/soc/fsl/rcpm.c

diff --git a/drivers/soc/fsl/Kconfig b/drivers/soc/fsl/Kconfig
index 61f8e14..8e84e40 100644
--- a/drivers/soc/fsl/Kconfig
+++ b/drivers/soc/fsl/Kconfig
@@ -29,4 +29,12 @@ config FSL_MC_DPIO
 	  other DPAA2 objects. This driver does not expose the DPIO
 	  objects individually, but groups them under a service layer
 	  API.
+
+config FSL_RCPM
+	bool "Freescale RCPM support"
+	depends on PM_SLEEP
+	help
+	  The NXP's QorIQ Processors based on ARM Core have RCPM module
+	  (Run Control and Power Management), which performs all device-level
+	  tasks associated with power management, such as wakeup source control.
 endmenu
diff --git a/drivers/soc/fsl/Makefile b/drivers/soc/fsl/Makefile
index 803ef1b..c1be6ee 100644
--- a/drivers/soc/fsl/Makefile
+++ b/drivers/soc/fsl/Makefile
@@ -7,3 +7,4 @@ obj-$(CONFIG_QUICC_ENGINE)		+= qe/
 obj-$(CONFIG_CPM)			+= qe/
 obj-$(CONFIG_FSL_GUTS)			+= guts.o
 obj-$(CONFIG_FSL_MC_DPIO) 		+= dpio/
+obj-$(CONFIG_FSL_RCPM)		+= rcpm.o
diff --git a/drivers/soc/fsl/rcpm.c b/drivers/soc/fsl/rcpm.c
new file mode 100644
index 0000000..678d1cd
--- /dev/null
+++ b/drivers/soc/fsl/rcpm.c
@@ -0,0 +1,124 @@
+// SPDX-License-Identifier: GPL-2.0
+//
+// rcpm.c - Freescale QorIQ RCPM driver
+//
+// Copyright 2019 NXP
+//
+// Author: Ran Wang <ran.wang_1@nxp.com>,
+
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/platform_device.h>
+#include <linux/of_address.h>
+#include <linux/slab.h>
+#include <linux/suspend.h>
+#include <linux/kernel.h>
+
+#define RCPM_WAKEUP_CELL_MAX_SIZE	7
+
+struct rcpm {
+	unsigned int	wakeup_cells;
+	void __iomem	*ippdexpcr_base;
+	bool		little_endian;
+};
+
+static int rcpm_pm_prepare(struct device *dev)
+{
+	struct device_node	*np = dev->of_node;
+	struct wakeup_source	*ws;
+	struct rcpm		*rcpm;
+	u32 value[RCPM_WAKEUP_CELL_MAX_SIZE + 1], tmp;
+	int i, ret;
+
+	rcpm = dev_get_drvdata(dev);
+	if (!rcpm)
+		return -EINVAL;
+
+	/* Begin with first registered wakeup source */
+	ws = wakeup_source_get_next(NULL);
+	while (ws) {
+		ret = device_property_read_u32_array(ws->attached_dev,
+				"fsl,rcpm-wakeup", value, rcpm->wakeup_cells + 1);
+
+		/*  Wakeup source should refer to current rcpm device */
+		if (ret || (np->phandle != value[0])) {
+			dev_info(dev, "%s doesn't refer to this rcpm\n",
+					ws->name);
+			ws = wakeup_source_get_next(ws);
+			continue;
+		}
+
+		for (i = 0; i < rcpm->wakeup_cells; i++) {
+			/* We can only OR related bits */
+			if (value[i + 1]) {
+				if (rcpm->little_endian) {
+					tmp = ioread32(rcpm->ippdexpcr_base + i * 4);
+					tmp |= value[i + 1];
+					iowrite32(tmp, rcpm->ippdexpcr_base + i * 4);
+				} else {
+					tmp = ioread32be(rcpm->ippdexpcr_base + i * 4);
+					tmp |= value[i + 1];
+					iowrite32be(tmp, rcpm->ippdexpcr_base + i * 4);
+				}
+			}
+		}
+		ws = wakeup_source_get_next(ws);
+	}
+
+	return 0;
+}
+
+static const struct dev_pm_ops rcpm_pm_ops = {
+	.prepare =  rcpm_pm_prepare,
+};
+
+static int rcpm_probe(struct platform_device *pdev)
+{
+	struct device	*dev = &pdev->dev;
+	struct resource *r;
+	struct rcpm	*rcpm;
+	int ret;
+
+	rcpm = devm_kzalloc(dev, sizeof(*rcpm), GFP_KERNEL);
+	if (!rcpm)
+		return -ENOMEM;
+
+	r = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+	if (!r)
+		return -ENODEV;
+
+	rcpm->ippdexpcr_base = devm_ioremap_resource(&pdev->dev, r);
+	if (IS_ERR(rcpm->ippdexpcr_base)) {
+		ret =  PTR_ERR(rcpm->ippdexpcr_base);
+		return ret;
+	}
+
+	rcpm->little_endian = device_property_read_bool(
+			&pdev->dev, "little-endian");
+
+	ret = device_property_read_u32(&pdev->dev,
+			"fsl,#rcpm-wakeup-cells", &rcpm->wakeup_cells);
+	if (ret)
+		return ret;
+
+	dev_set_drvdata(&pdev->dev, rcpm);
+
+	return 0;
+}
+
+static const struct of_device_id rcpm_of_match[] = {
+	{ .compatible = "fsl,qoriq-rcpm-2.1+", },
+	{}
+};
+MODULE_DEVICE_TABLE(of, rcpm_of_match);
+
+static struct platform_driver rcpm_driver = {
+	.driver = {
+		.name = "rcpm",
+		.of_match_table = rcpm_of_match,
+		.pm	= &rcpm_pm_ops,
+	},
+	.probe = rcpm_probe,
+};
+
+module_platform_driver(rcpm_driver);
-- 
1.7.1


^ permalink raw reply related

* [PATCH v3 1/3] PM: wakeup: Add routine to help fetch wakeup source object.
From: Ran Wang @ 2019-05-20  6:58 UTC (permalink / raw)
  To: Li Yang, Rob Herring, Mark Rutland, Pavel Machek
  Cc: Len Brown, devicetree, Greg Kroah-Hartman, linux-pm,
	Rafael J . Wysocki, linux-kernel, Ran Wang, linuxppc-dev,
	linux-arm-kernel

Some user might want to go through all registered wakeup sources
and doing things accordingly. For example, SoC PM driver might need to
do HW programming to prevent powering down specific IP which wakeup
source depending on. And is user's responsibility to identify if this
wakeup source he is interested in.

Signed-off-by: Ran Wang <ran.wang_1@nxp.com>
---
Change in v3:
	- Adjust indentation of *attached_dev;.

Change in v2:
	- None.

 drivers/base/power/wakeup.c |   18 ++++++++++++++++++
 include/linux/pm_wakeup.h   |    3 +++
 2 files changed, 21 insertions(+), 0 deletions(-)

diff --git a/drivers/base/power/wakeup.c b/drivers/base/power/wakeup.c
index 5b2b6a0..6904485 100644
--- a/drivers/base/power/wakeup.c
+++ b/drivers/base/power/wakeup.c
@@ -14,6 +14,7 @@
 #include <linux/suspend.h>
 #include <linux/seq_file.h>
 #include <linux/debugfs.h>
+#include <linux/of_device.h>
 #include <linux/pm_wakeirq.h>
 #include <trace/events/power.h>
 
@@ -226,6 +227,22 @@ void wakeup_source_unregister(struct wakeup_source *ws)
 	}
 }
 EXPORT_SYMBOL_GPL(wakeup_source_unregister);
+/**
+ * wakeup_source_get_next - Get next wakeup source from the list
+ * @ws: Previous wakeup source object, null means caller want first one.
+ */
+struct wakeup_source *wakeup_source_get_next(struct wakeup_source *ws)
+{
+	struct list_head *ws_head = &wakeup_sources;
+
+	if (ws)
+		return list_next_or_null_rcu(ws_head, &ws->entry,
+				struct wakeup_source, entry);
+	else
+		return list_entry_rcu(ws_head->next,
+				struct wakeup_source, entry);
+}
+EXPORT_SYMBOL_GPL(wakeup_source_get_next);
 
 /**
  * device_wakeup_attach - Attach a wakeup source object to a device object.
@@ -242,6 +259,7 @@ static int device_wakeup_attach(struct device *dev, struct wakeup_source *ws)
 		return -EEXIST;
 	}
 	dev->power.wakeup = ws;
+	ws->attached_dev = dev;
 	if (dev->power.wakeirq)
 		device_wakeup_attach_irq(dev, dev->power.wakeirq);
 	spin_unlock_irq(&dev->power.lock);
diff --git a/include/linux/pm_wakeup.h b/include/linux/pm_wakeup.h
index 0ff134d..913b2fb 100644
--- a/include/linux/pm_wakeup.h
+++ b/include/linux/pm_wakeup.h
@@ -50,6 +50,7 @@
  * @wakeup_count: Number of times the wakeup source might abort suspend.
  * @active: Status of the wakeup source.
  * @has_timeout: The wakeup source has been activated with a timeout.
+ * @attached_dev: The device it attached to
  */
 struct wakeup_source {
 	const char 		*name;
@@ -70,6 +71,7 @@ struct wakeup_source {
 	unsigned long		wakeup_count;
 	bool			active:1;
 	bool			autosleep_enabled:1;
+	struct device		*attached_dev;
 };
 
 #ifdef CONFIG_PM_SLEEP
@@ -101,6 +103,7 @@ static inline void device_set_wakeup_path(struct device *dev)
 extern void wakeup_source_remove(struct wakeup_source *ws);
 extern struct wakeup_source *wakeup_source_register(const char *name);
 extern void wakeup_source_unregister(struct wakeup_source *ws);
+extern struct wakeup_source *wakeup_source_get_next(struct wakeup_source *ws);
 extern int device_wakeup_enable(struct device *dev);
 extern int device_wakeup_disable(struct device *dev);
 extern void device_set_wakeup_capable(struct device *dev, bool capable);
-- 
1.7.1


^ permalink raw reply related

* [PATCH v3 2/3] Documentation: dt: binding: fsl: Add 'little-endian' and update Chassis define
From: Ran Wang @ 2019-05-20  6:58 UTC (permalink / raw)
  To: Li Yang, Rob Herring, Mark Rutland, Pavel Machek
  Cc: Len Brown, devicetree, Greg Kroah-Hartman, linux-pm,
	Rafael J . Wysocki, linux-kernel, Ran Wang, linuxppc-dev,
	linux-arm-kernel
In-Reply-To: <20190520065816.32360-1-ran.wang_1@nxp.com>

By default, QorIQ SoC's RCPM register block is Big Endian. But
there are some exceptions, such as LS1088A and LS2088A, are Little
Endian. So add this optional property to help identify them.

Actually LS2021A and other Layerscapes won't totally follow Chassis
2.1, so separate them from powerpc SoC.

Signed-off-by: Ran Wang <ran.wang_1@nxp.com>
---
Change in v3:
	- None.

Change in v2:
	- None.

 Documentation/devicetree/bindings/soc/fsl/rcpm.txt |    8 +++++++-
 1 files changed, 7 insertions(+), 1 deletions(-)

diff --git a/Documentation/devicetree/bindings/soc/fsl/rcpm.txt b/Documentation/devicetree/bindings/soc/fsl/rcpm.txt
index e284e4e..058154c 100644
--- a/Documentation/devicetree/bindings/soc/fsl/rcpm.txt
+++ b/Documentation/devicetree/bindings/soc/fsl/rcpm.txt
@@ -20,6 +20,7 @@ Required properites:
 	* "fsl,qoriq-rcpm-1.0": for chassis 1.0 rcpm
 	* "fsl,qoriq-rcpm-2.0": for chassis 2.0 rcpm
 	* "fsl,qoriq-rcpm-2.1": for chassis 2.1 rcpm
+	* "fsl,qoriq-rcpm-2.1+": for chassis 2.1+ rcpm
 
 All references to "1.0" and "2.0" refer to the QorIQ chassis version to
 which the chip complies.
@@ -27,7 +28,12 @@ Chassis Version		Example Chips
 ---------------		-------------------------------
 1.0				p4080, p5020, p5040, p2041, p3041
 2.0				t4240, b4860, b4420
-2.1				t1040, ls1021
+2.1				t1040,
+2.1+			ls1021a, ls1012a, ls1043a, ls1046a
+
+Optional properties:
+ - little-endian : RCPM register block is Little Endian. Without it RCPM
+   will be Big Endian (default case).
 
 Example:
 The RCPM node for T4240:
-- 
1.7.1


^ permalink raw reply related

* Re: [PATCH] ocxl: Fix potential memory leak on context creation
From: Frederic Barrat @ 2019-05-20  6:53 UTC (permalink / raw)
  To: Andrew Donnellan, linuxppc-dev, andrew.donnellan, alastair; +Cc: clombard
In-Reply-To: <b0d819db-d052-0601-c72b-159017c7e925@linux.ibm.com>



Le 20/05/2019 à 03:45, Andrew Donnellan a écrit :
> On 18/5/19 12:20 am, Frederic Barrat wrote:
>> If we couldn't fully init a context, we were leaking memory.
>>
>> Fixes: b9721d275cc2 ("ocxl: Allow external drivers to use OpenCAPI 
>> contexts")
>> Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
> 
> Acked-by: Andrew Donnellan <ajd@linux.ibm.com>
> 
>> ---
>>   drivers/misc/ocxl/context.c | 1 +
>>   1 file changed, 1 insertion(+)
>>
>> diff --git a/drivers/misc/ocxl/context.c b/drivers/misc/ocxl/context.c
>> index bab9c9364184..ab93156aa83e 100644
>> --- a/drivers/misc/ocxl/context.c
>> +++ b/drivers/misc/ocxl/context.c
>> @@ -22,6 +22,7 @@ int ocxl_context_alloc(struct ocxl_context 
>> **context, struct ocxl_afu *afu,
>>               afu->pasid_base + afu->pasid_max, GFP_KERNEL);
>>       if (pasid < 0) {
>>           mutex_unlock(&afu->contexts_lock);
>> +        kfree(*context);
> 
> (defensive programming: set *context = NULL so that if the caller 
> ignores the return code we get an obvious crash)


Good point. v2 on its way


>>           return pasid;
>>       }
>>       afu->pasid_count++;
>>
> 


^ permalink raw reply

* RE: [PATCH V2 3/3] soc: fsl: add RCPM driver
From: Ran Wang @ 2019-05-20  6:48 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Mark Rutland, Len Brown, devicetree@vger.kernel.org,
	Greg Kroah-Hartman, linux-pm@vger.kernel.org, Rafael J . Wysocki,
	linux-kernel@vger.kernel.org, Leo Li, Rob Herring,
	linuxppc-dev@lists.ozlabs.org,
	linux-arm-kernel@lists.infradead.org
In-Reply-To: <20190519213844.GH31403@amd>

Hi Pavel,

On Monday, May 20, 2019 05:39, Pavel Machek wrote:
> 
> Hi!
> 
> 
> > +
> > +struct rcpm {
> > +	unsigned int wakeup_cells;
> > +	void __iomem *ippdexpcr_base;
> > +	bool	little_endian;
> > +};
> 
> Inconsistent whitespace

OK, will make them aligned.

> 
> > +static int rcpm_pm_prepare(struct device *dev) {
> > +	struct device_node *np = dev->of_node;
> > +	struct wakeup_source *ws;
> > +	struct rcpm *rcpm;
> > +	u32 value[RCPM_WAKEUP_CELL_MAX_SIZE + 1], tmp;
> > +	int i, ret;
> > +
> > +	rcpm = dev_get_drvdata(dev);
> > +	if (!rcpm)
> > +		return -EINVAL;
> > +
> > +	/* Begin with first registered wakeup source */
> > +	ws = wakeup_source_get_next(NULL);
> > +	while (ws) {
> 
> while (ws = wakeup_source_get_next(NULL)) ?

Actually, we only pass NULL to wakeup_source_get_next() at very first
call to get 1st wakeup source. Then in the while loop, we will fetch
next source but not 1st, that's different. I am afraid your suggestion
is not quite correct.

> 
> > +static int rcpm_probe(struct platform_device *pdev) {
> > +	struct device	*dev = &pdev->dev;
> > +	struct resource *r;
> > +	struct rcpm		*rcpm;
> > +	int ret;
> 
> Whitespace.

OK, will update, thanks for your review.

Regards,
Ran

^ permalink raw reply

* Re: [RFC PATCH v2 09/10] KVM: PPC: Book3S HV: Fixed for running secure guests
From: Paul Mackerras @ 2019-05-20  6:40 UTC (permalink / raw)
  To: Claudio Carvalho
  Cc: Madhavan Srinivasan, Michael Anderson, Ram Pai, kvm-ppc,
	Bharata B Rao, linuxppc-dev, Sukadev Bhattiprolu,
	Thiago Jung Bauermann, Anshuman Khandual
In-Reply-To: <20190518142524.28528-10-cclaudio@linux.ibm.com>

On Sat, May 18, 2019 at 11:25:23AM -0300, Claudio Carvalho wrote:
> From: Paul Mackerras <paulus@ozlabs.org>
> 
> - Pass SRR1 in r11 for UV_RETURN because SRR0 and SRR1 get set by
>   the sc 2 instruction. (Note r3 - r10 potentially have hcall return
>   values in them.)
> 
> - Fix kvmppc_msr_interrupt to preserve the MSR_S bit.
> 
> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
> Signed-off-by: Claudio Carvalho <cclaudio@linux.ibm.com>

This should be folded into the previous patch.

Paul.

^ permalink raw reply

* Re: [RFC PATCH v2 08/10] KVM: PPC: Ultravisor: Return to UV for hcalls from SVM
From: Paul Mackerras @ 2019-05-20  6:17 UTC (permalink / raw)
  To: Claudio Carvalho
  Cc: Madhavan Srinivasan, Michael Anderson, Ram Pai, kvm-ppc,
	Bharata B Rao, linuxppc-dev, Sukadev Bhattiprolu,
	Thiago Jung Bauermann, Anshuman Khandual
In-Reply-To: <20190518142524.28528-9-cclaudio@linux.ibm.com>

On Sat, May 18, 2019 at 11:25:22AM -0300, Claudio Carvalho wrote:
> From: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
> 
> All hcalls from a secure VM go to the ultravisor from where they are
> reflected into the HV. When we (HV) complete processing such hcalls,
> we should return to the UV rather than to the guest kernel.

This paragraph in the patch description, and the comment in
book3s_hv_rmhandlers.S, are confusing and possibly misleading in
focussing on returns from hcalls, when the change is needed for any
sort of entry to the guest from the hypervisor, whether it is a return
from an hcall, a return from a hypervisor interrupt, or the first time
that a guest vCPU is run.

This paragraph needs to explain that to enter a secure guest, we have
to go through the ultravisor, therefore we do a ucall when we are
entering a secure guest.

[snip]

> +/*
> + * The hcall we just completed was from Ultravisor. Use UV_RETURN
> + * ultra call to return to the Ultravisor. Results from the hcall
> + * are already in the appropriate registers (r3:12), except for
> + * R6,7 which we used as temporary registers above. Restore them,
> + * and set R0 to the ucall number (UV_RETURN).
> + */

This needs to say something like "We are entering a secure guest, so
we have to invoke the ultravisor to do that.  If we are returning from
a hcall, the results are already ...".

> +ret_to_ultra:
> +	lwz	r6, VCPU_CR(r4)
> +	mtcr	r6
> +	LOAD_REG_IMMEDIATE(r0, UV_RETURN)
> +	ld	r7, VCPU_GPR(R7)(r4)
> +	ld	r6, VCPU_GPR(R6)(r4)
> +	ld	r4, VCPU_GPR(R4)(r4)
> +	sc	2
>  
>  /*
>   * Enter the guest on a P9 or later system where we have exactly
> -- 
> 2.20.1

Paul.

^ permalink raw reply

* Re: [RFC PATCH v2 07/10] KVM: PPC: Ultravisor: Restrict LDBAR access
From: Paul Mackerras @ 2019-05-20  5:43 UTC (permalink / raw)
  To: Claudio Carvalho
  Cc: Madhavan Srinivasan, Michael Anderson, Ram Pai, kvm-ppc,
	Bharata B Rao, linuxppc-dev, Sukadev Bhattiprolu,
	Thiago Jung Bauermann, Anshuman Khandual
In-Reply-To: <20190518142524.28528-8-cclaudio@linux.ibm.com>

On Sat, May 18, 2019 at 11:25:21AM -0300, Claudio Carvalho wrote:
> From: Ram Pai <linuxram@us.ibm.com>
> 
> When the ultravisor firmware is available, it takes control over the
> LDBAR register. In this case, thread-imc updates and save/restore
> operations on the LDBAR register are handled by ultravisor.
> 
> Signed-off-by: Ram Pai <linuxram@us.ibm.com>
> [Restrict LDBAR access in assembly code and some in C, update the commit
>  message]
> Signed-off-by: Claudio Carvalho <cclaudio@linux.ibm.com>

Some of the places that you are patching below are explicitly only
executed on POWER8, which can't have an ultravisor, and therefore
the change isn't needed:

> ---
>  arch/powerpc/kvm/book3s_hv.c                 |  4 +-
>  arch/powerpc/kvm/book3s_hv_rmhandlers.S      |  2 +
>  arch/powerpc/perf/imc-pmu.c                  | 64 ++++++++++++--------
>  arch/powerpc/platforms/powernv/idle.c        |  6 +-
>  arch/powerpc/platforms/powernv/subcore-asm.S |  4 ++
>  5 files changed, 52 insertions(+), 28 deletions(-)
> 
> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> index 0fab0a201027..81f35f955d16 100644
> --- a/arch/powerpc/kvm/book3s_hv.c
> +++ b/arch/powerpc/kvm/book3s_hv.c
> @@ -75,6 +75,7 @@
>  #include <asm/xics.h>
>  #include <asm/xive.h>
>  #include <asm/hw_breakpoint.h>
> +#include <asm/firmware.h>
>  
>  #include "book3s.h"
>  
> @@ -3117,7 +3118,8 @@ static noinline void kvmppc_run_core(struct kvmppc_vcore *vc)
>  			subcore_size = MAX_SMT_THREADS / split;
>  			split_info.rpr = mfspr(SPRN_RPR);
>  			split_info.pmmar = mfspr(SPRN_PMMAR);
> -			split_info.ldbar = mfspr(SPRN_LDBAR);
> +			if (!firmware_has_feature(FW_FEATURE_ULTRAVISOR))
> +				split_info.ldbar = mfspr(SPRN_LDBAR);

This is inside an if (is_power8) statement.

> diff --git a/arch/powerpc/platforms/powernv/subcore-asm.S b/arch/powerpc/platforms/powernv/subcore-asm.S
> index 39bb24aa8f34..e4383fa5e150 100644
> --- a/arch/powerpc/platforms/powernv/subcore-asm.S
> +++ b/arch/powerpc/platforms/powernv/subcore-asm.S
> @@ -44,7 +44,9 @@ _GLOBAL(split_core_secondary_loop)
>  
>  real_mode:
>  	/* Grab values from unsplit SPRs */
> +BEGIN_FW_FTR_SECTION
>  	mfspr	r6,  SPRN_LDBAR
> +END_FW_FTR_SECTION_IFCLR(FW_FEATURE_ULTRAVISOR)
>  	mfspr	r7,  SPRN_PMMAR
>  	mfspr	r8,  SPRN_PMCR
>  	mfspr	r9,  SPRN_RPR
> @@ -77,7 +79,9 @@ real_mode:
>  	mtspr	SPRN_HDEC, r4
>  
>  	/* Restore SPR values now we are split */
> +BEGIN_FW_FTR_SECTION
>  	mtspr	SPRN_LDBAR, r6
> +END_FW_FTR_SECTION_IFCLR(FW_FEATURE_ULTRAVISOR)

Only POWER8 supports split-core mode, so we can only get here on
POWER8.

Paul.

^ permalink raw reply

* [WIP RFC PATCH 6/6] fwvarfs: Add opal_secvar backend
From: Daniel Axtens @ 2019-05-20  6:25 UTC (permalink / raw)
  To: nayna, cclaudio, linux-fsdevel, greg, linuxppc-dev; +Cc: Daniel Axtens
In-Reply-To: <20190520062553.14947-1-dja@axtens.net>

COMPILE TESTED ONLY.

mount -t fwvarfs opal_secvar /fw

Signed-off-by: Daniel Axtens <dja@axtens.net>
---
 Documentation/filesystems/fwvarfs.txt |   3 +
 fs/fwvarfs/Kconfig                    |  11 ++
 fs/fwvarfs/Makefile                   |   1 +
 fs/fwvarfs/fwvarfs.c                  |   3 +
 fs/fwvarfs/fwvarfs.h                  |   4 +
 fs/fwvarfs/opal_secvar.c              | 218 ++++++++++++++++++++++++++
 6 files changed, 240 insertions(+)
 create mode 100644 fs/fwvarfs/opal_secvar.c

diff --git a/Documentation/filesystems/fwvarfs.txt b/Documentation/filesystems/fwvarfs.txt
index 7c1e921e5c50..3ecc4b4428a5 100644
--- a/Documentation/filesystems/fwvarfs.txt
+++ b/Documentation/filesystems/fwvarfs.txt
@@ -36,6 +36,9 @@ Supported backends
    backend. Read-only with no creation or deletion, but sufficient that
    efivar --print --name <whatever> works the same as efivarfs.
 
+ * opal_secvar - COMPILE-TESTED ONLY implementation against PowerNV
+   OPAL Secure Variable storage.
+
 Usage
 -----
 
diff --git a/fs/fwvarfs/Kconfig b/fs/fwvarfs/Kconfig
index e4474da11dbc..cb9cbc6f8fb3 100644
--- a/fs/fwvarfs/Kconfig
+++ b/fs/fwvarfs/Kconfig
@@ -34,3 +34,14 @@ config FWVAR_FS_EFI_BACKEND
 	  in the same way the do with efivarfs.
 
 	  Say N here unless you're exploring fwvarfs.
+
+config FWVAR_FS_OPAL_SECVAR_BACKEND
+	bool "OPAL Secure Variable backend"
+	depends on FWVAR_FS
+	depends on OPAL_SECVAR
+	help
+	  Include a read-only, compile-tested-only, not up-to-date
+	  backend for OPAL Secure Variables. This is really just
+	  designed to show how the code would work, and you should
+	  only select Y here if you are developing OPAL secure
+	  variables.
diff --git a/fs/fwvarfs/Makefile b/fs/fwvarfs/Makefile
index 2ab9dfd650ca..8d258acdfef7 100644
--- a/fs/fwvarfs/Makefile
+++ b/fs/fwvarfs/Makefile
@@ -7,3 +7,4 @@ obj-$(CONFIG_FWVAR_FS)		+= fwvarfs.o
 
 obj-$(CONFIG_FWVAR_FS_MEM_BACKEND)		+= mem.o
 obj-$(CONFIG_FWVAR_FS_EFI_BACKEND)		+= efi.o
+obj-$(CONFIG_FWVAR_FS_OPAL_SECVAR_BACKEND)	+= opal_secvar.o
diff --git a/fs/fwvarfs/fwvarfs.c b/fs/fwvarfs/fwvarfs.c
index 643ec6585b4d..3c93b6442e95 100644
--- a/fs/fwvarfs/fwvarfs.c
+++ b/fs/fwvarfs/fwvarfs.c
@@ -24,6 +24,9 @@ static struct fwvarfs_backend *fwvarfs_backends[] = {
 #endif
 #ifdef CONFIG_FWVAR_FS_EFI_BACKEND
 	&fwvarfs_efi_backend,
+#endif
+#ifdef CONFIG_FWVAR_FS_OPAL_SECVAR_BACKEND
+	&fwvarfs_opal_secvar_backend,
 #endif
 	NULL,
 };
diff --git a/fs/fwvarfs/fwvarfs.h b/fs/fwvarfs/fwvarfs.h
index 49bde268401f..5780046dafae 100644
--- a/fs/fwvarfs/fwvarfs.h
+++ b/fs/fwvarfs/fwvarfs.h
@@ -117,4 +117,8 @@ extern struct fwvarfs_backend fwvarfs_mem_backend;
 extern struct fwvarfs_backend fwvarfs_efi_backend;
 #endif
 
+#if defined(CONFIG_FWVAR_FS_OPAL_SECVAR_BACKEND)
+extern struct fwvarfs_backend fwvarfs_opal_secvar_backend;
+#endif
+
 #endif /* FWVARFS_H */
diff --git a/fs/fwvarfs/opal_secvar.c b/fs/fwvarfs/opal_secvar.c
new file mode 100644
index 000000000000..4a1749317ed9
--- /dev/null
+++ b/fs/fwvarfs/opal_secvar.c
@@ -0,0 +1,218 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2019 IBM Corporation
+ * Author: Daniel Axtens
+ *
+ * Loosely based on efivarfs:
+ * Copyright (C) 2012 Red Hat, Inc.
+ * Copyright (C) 2012 Jeremy Kerr <jeremy.kerr@canonical.com>
+ * and drivers/firmware/efi/vars.c:
+ * Copyright (C) 2001,2003,2004 Dell <Matt_Domsch@dell.com>
+ * Copyright (C) 2004 Intel Corporation <matthew.e.tolentino@intel.com>
+ *
+ * We cheat by not allowing for case-insensitivity.
+ */
+
+#include <linux/fs.h>
+#include <linux/list.h>
+#include <linux/slab.h>
+#include "fwvarfs.h"
+
+#include <linux/uuid.h>
+#include <linux/ucs2_string.h>
+#include <asm/opal-api.h>
+#include <asm/opal-secvar.h>
+
+#define pr_fmt(fmt)	"fwvarfs: opal_secvar: " fmt
+
+static LIST_HEAD(opal_secvar_file_list);
+
+struct fwvarfs_opal_secvar_file {
+	struct list_head list;
+	u16 *name;
+	guid_t vendor;
+};
+
+// stolen from efi.h
+static inline char *
+guid_to_str(guid_t *guid, char *out)
+{
+	sprintf(out, "%pUl", guid->b);
+	return out;
+}
+
+// need a forward decl to pass down to register
+struct fwvarfs_backend fwvarfs_opal_secvar_backend;
+
+
+static ssize_t fwvarfs_opal_secvar_read(void *variable, char *buf,
+		size_t count, loff_t off)
+{
+	struct fwvarfs_opal_secvar_file *file_data = variable;
+	unsigned long datasize = 0;
+	u32 attributes;
+	void *data;
+	ssize_t size = 0;
+	loff_t ppos = off;
+	int rc;
+
+	// get size
+	rc = opal_get_secure_variable(file_data->name, &file_data->vendor,
+				      NULL, &datasize, NULL);
+	if (rc != OPAL_SUCCESS && rc != OPAL_PARTIAL)
+		return -EIO;
+
+	data = kmalloc(datasize + sizeof(attributes), GFP_KERNEL);
+
+	if (!data)
+		return -ENOMEM;
+
+	rc = opal_get_secure_variable(file_data->name, &file_data->vendor,
+		&attributes, &datasize, data + sizeof(attributes));
+	if (rc != OPAL_SUCCESS) {
+		size = -EIO;
+		goto out_free;
+	}
+
+	memcpy(data, &attributes, sizeof(attributes));
+	size = memory_read_from_buffer(buf, count, &ppos, data,
+				       datasize + sizeof(attributes));
+out_free:
+	kfree(data);
+
+	return size;
+}
+
+static int fwvarfs_opal_secvar_callback(u16 *name16, guid_t vendor,
+				unsigned long name_size)
+{
+	struct fwvarfs_opal_secvar_file *file_data;
+	char *name;
+	int len;
+	int err = -ENOMEM;
+
+	file_data = kzalloc(sizeof(*file_data), GFP_KERNEL);
+	if (!file_data)
+		return err;
+
+	file_data->name = kmemdup(name16, name_size, GFP_KERNEL);
+	if (!file_data->name)
+		goto fail;
+
+	file_data->vendor = vendor;
+
+	len = ucs2_utf8size(name16);
+
+	/* name, plus '-', plus GUID, plus NUL */
+	name = kmalloc(len + 1 + UUID_STRING_LEN + 1, GFP_KERNEL);
+	if (!name)
+		goto fail_embedded_name;
+
+	ucs2_as_utf8(name, name16, len);
+
+	name[len] = '-';
+
+	guid_to_str(&vendor, name + len + 1);
+
+	name[len + UUID_STRING_LEN + 1] = '\0';
+
+	// no convenient way to get size without reading the whole thing,
+	// present size as 0 for now.
+
+	err = fwvarfs_register_var(&fwvarfs_opal_secvar_backend, name,
+				   file_data, 0);
+	if (err)
+		goto fail_name;
+
+	INIT_LIST_HEAD(&file_data->list);
+	list_add(&opal_secvar_file_list, &file_data->list);
+
+	/* copied by the above, I think */
+	kfree(name);
+
+	return 0;
+fail_name:
+	kfree(name);
+fail_embedded_name:
+	kfree(file_data->name);
+fail:
+	kfree(file_data);
+	return err;
+}
+
+
+static void fwvarfs_opal_secvar_destroy(void *var)
+{
+	struct fwvarfs_opal_secvar_file *file_data = var;
+
+	kfree(file_data->name);
+	list_del(&file_data->list);
+	kfree(file_data);
+}
+
+
+static int fwvarfs_opal_secvar_enumerate(void)
+{
+	int err;
+	struct fwvarfs_opal_secvar_file *pos, *tmp;
+	unsigned long variable_name_size = 1024;
+	u16 *variable_name;
+	unsigned long status;
+	guid_t vendor_guid;
+
+	variable_name = kzalloc(variable_name_size, GFP_KERNEL);
+	if (!variable_name)
+		return -ENOMEM;
+
+	/*
+	 * Assume that as per EFI spec, the maximum storage allocated for both
+	 * the variable name and variable data is 1024 bytes.
+	 */
+
+	do {
+		variable_name_size = 1024;
+
+		status = opal_get_next_secure_variable(&variable_name_size,
+						variable_name,
+						&vendor_guid);
+		switch (status) {
+		case OPAL_SUCCESS:
+			err = fwvarfs_opal_secvar_callback(variable_name,
+					vendor_guid, variable_name_size);
+			if (err)
+				status = OPAL_EMPTY;
+
+			break;
+		case OPAL_EMPTY:
+			break;
+		case OPAL_UNSUPPORTED:
+			status = OPAL_EMPTY;
+			err = -ENODEV;
+		default:
+			pr_warn("opal_get_next_secure_variable: status=%lx\n",
+				status);
+			status = OPAL_EMPTY;
+			err = -EIO;
+			break;
+		}
+
+	} while (status != OPAL_EMPTY);
+
+	kfree(variable_name);
+
+	if (err) {
+		list_for_each_entry_safe(pos, tmp, &opal_secvar_file_list,
+					 list) {
+			fwvarfs_opal_secvar_destroy(pos);
+		}
+	}
+
+	return err;
+}
+
+struct fwvarfs_backend fwvarfs_opal_secvar_backend = {
+	.name = "opal_secvar",
+	.destroy = fwvarfs_opal_secvar_destroy,
+	.enumerate = fwvarfs_opal_secvar_enumerate,
+	.read = fwvarfs_opal_secvar_read,
+};
-- 
2.19.1


^ permalink raw reply related

* [WIP RFC PATCH 5/6] powerpc/powernv: Remove EFI support for OPAL secure variables
From: Daniel Axtens @ 2019-05-20  6:25 UTC (permalink / raw)
  To: nayna, cclaudio, linux-fsdevel, greg, linuxppc-dev; +Cc: Daniel Axtens
In-Reply-To: <20190520062553.14947-1-dja@axtens.net>

Replace it with a generic API.

Compile tested only.

Signed-off-by: Daniel Axtens <dja@axtens.net>
---
 arch/powerpc/include/asm/opal-secvar.h       |  58 +++++++++++
 arch/powerpc/platforms/Kconfig               |   3 -
 arch/powerpc/platforms/powernv/Kconfig       |   5 +-
 arch/powerpc/platforms/powernv/opal-secvar.c | 104 ++++---------------
 4 files changed, 83 insertions(+), 87 deletions(-)
 create mode 100644 arch/powerpc/include/asm/opal-secvar.h

diff --git a/arch/powerpc/include/asm/opal-secvar.h b/arch/powerpc/include/asm/opal-secvar.h
new file mode 100644
index 000000000000..ba9bd52138d9
--- /dev/null
+++ b/arch/powerpc/include/asm/opal-secvar.h
@@ -0,0 +1,58 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * PowerNV code for secure variables
+ *
+ * Copyright (C) 2019 IBM Corporation
+ * Author: Claudio Carvalho <cclaudio@linux.ibm.com>
+ * Author: Daniel Axtens <dja@axtens.net>
+ *
+ */
+
+#ifndef __OPAL_SECVAR_H
+#define __OPAL_SECVAR_H
+
+#include <linux/types.h>
+#include <linux/uuid.h>
+
+#ifdef CONFIG_OPAL_SECVAR
+int opal_get_secure_variable(u16 *name, guid_t *vendor, u32 *attr,
+			     unsigned long *data_size, void *data);
+
+int opal_get_next_secure_variable(unsigned long *name_size, u16 *name,
+				  guid_t *vendor);
+
+int opal_set_secure_variable(u16 *name, guid_t *vendor, u32 attr,
+			     unsigned long data_size, void *data);
+
+int opal_query_secure_variable_info(u32 attr, u64 *storage_space,
+				    u64 *remaining_space,
+				    u64 *max_variable_size);
+#else
+
+static inline int opal_get_secure_variable(u16 *name, guid_t *vendor,
+			u32 *attr, unsigned long *data_size, void *data)
+{
+	return OPAL_UNSUPPORTED;
+}
+
+static inline int opal_get_next_secure_variable(unsigned long *name_size,
+			u16 *name, guid_t *vendor)
+{
+	return OPAL_UNSUPPORTED;
+}
+
+static inline int opal_set_secure_variable(u16 *name, guid_t *vendor,
+			u32 attr, unsigned long data_size, void *data)
+{
+	return OPAL_UNSUPPORTED;
+}
+
+static inline int opal_query_secure_variable_info(u32 attr,
+			u64 *storage_space, u64 *remaining_space,
+			u64 *max_variable_size)
+{
+	return OPAL_UNSUPPORTED;
+}
+
+#endif
+#endif
diff --git a/arch/powerpc/platforms/Kconfig b/arch/powerpc/platforms/Kconfig
index 8e30510bc0c1..f3fb79fccc72 100644
--- a/arch/powerpc/platforms/Kconfig
+++ b/arch/powerpc/platforms/Kconfig
@@ -326,7 +326,4 @@ config XILINX_PCI
 	bool "Xilinx PCI host bridge support"
 	depends on PCI && XILINX_VIRTEX
 
-config EFI
-	bool
-
 endmenu
diff --git a/arch/powerpc/platforms/powernv/Kconfig b/arch/powerpc/platforms/powernv/Kconfig
index 879f8e766098..a71fc5daa60a 100644
--- a/arch/powerpc/platforms/powernv/Kconfig
+++ b/arch/powerpc/platforms/powernv/Kconfig
@@ -52,7 +52,6 @@ config OPAL_SECVAR
 	bool "OPAL Secure Variables"
 	depends on PPC_POWERNV && !CPU_BIG_ENDIAN
 	select UCS2_STRING
-	select EFI
 	help
-	  This enables the kernel to access OPAL secure variables via EFI
-	  runtime variable services.
+	  This enables the kernel to access OPAL secure variables via
+	  an API.
diff --git a/arch/powerpc/platforms/powernv/opal-secvar.c b/arch/powerpc/platforms/powernv/opal-secvar.c
index e333828bd0bc..af753b94cceb 100644
--- a/arch/powerpc/platforms/powernv/opal-secvar.c
+++ b/arch/powerpc/platforms/powernv/opal-secvar.c
@@ -8,62 +8,20 @@
  */
 #define pr_fmt(fmt) "secvar: "fmt
 
-#include <linux/efi.h>
 #include <asm/machdep.h>
 #include <asm/opal.h>
+#include <asm/opal-secvar.h>
+#include <linux/uuid.h>
 
 static bool opal_secvar_supported;
 
-static efi_status_t opal_to_efi_status_log(int rc, const char *func_name)
-{
-	efi_status_t status;
-
-	switch (rc) {
-	case OPAL_EMPTY:
-		status = EFI_NOT_FOUND;
-		break;
-	case OPAL_HARDWARE:
-		status = EFI_DEVICE_ERROR;
-		break;
-	case OPAL_NO_MEM:
-		pr_err("%s: No space in the volatile storage\n", func_name);
-		status = EFI_OUT_OF_RESOURCES;
-		break;
-	case OPAL_PARAMETER:
-		status = EFI_INVALID_PARAMETER;
-		break;
-	case OPAL_PARTIAL:
-		status = EFI_BUFFER_TOO_SMALL;
-		break;
-	case OPAL_PERMISSION:
-		status = EFI_WRITE_PROTECTED;
-		break;
-	case OPAL_RESOURCE:
-		pr_err("%s: No space in the non-volatile storage\n", func_name);
-		status = EFI_OUT_OF_RESOURCES;
-		break;
-	case OPAL_SUCCESS:
-		status = EFI_SUCCESS;
-		break;
-	default:
-		pr_err("%s: Unknown OPAL error %d\n", func_name, rc);
-		status = EFI_DEVICE_ERROR;
-		break;
-	}
-
-	return status;
-}
-
-#define opal_to_efi_status(rc) opal_to_efi_status_log(rc, __func__)
-
-static efi_status_t
-opal_get_variable(efi_char16_t *name, efi_guid_t *vendor, u32 *attr,
-		  unsigned long *data_size, void *data)
+int opal_get_secure_variable(u16 *name, guid_t *vendor, u32 *attr,
+			     unsigned long *data_size, void *data)
 {
 	int rc;
 
 	if (!opal_secvar_supported)
-		return EFI_UNSUPPORTED;
+		return OPAL_UNSUPPORTED;
 
 	*data_size = cpu_to_be64(*data_size);
 
@@ -77,17 +35,16 @@ opal_get_variable(efi_char16_t *name, efi_guid_t *vendor, u32 *attr,
 		*attr = be32_to_cpup(attr);
 	*data_size = be64_to_cpu(*data_size);
 
-	return opal_to_efi_status(rc);
+	return rc;
 }
 
-static efi_status_t
-opal_get_next_variable(unsigned long *name_size, efi_char16_t *name,
-		       efi_guid_t *vendor)
+int opal_get_next_secure_variable(unsigned long *name_size, u16 *name,
+				  guid_t *vendor)
 {
 	int rc;
 
 	if (!opal_secvar_supported)
-		return EFI_UNSUPPORTED;
+		return OPAL_UNSUPPORTED;
 
 	*name_size = cpu_to_be64(*name_size);
 
@@ -96,17 +53,16 @@ opal_get_next_variable(unsigned long *name_size, efi_char16_t *name,
 
 	*name_size = be64_to_cpu(*name_size);
 
-	return opal_to_efi_status(rc);
+	return rc;
 }
 
-static efi_status_t
-opal_set_variable(efi_char16_t *name, efi_guid_t *vendor, u32 attr,
-		  unsigned long data_size, void *data)
+int opal_set_secure_variable(u16 *name, guid_t *vendor, u32 attr,
+			     unsigned long data_size, void *data)
 {
 	int rc;
 
 	if (!opal_secvar_supported)
-		return EFI_UNSUPPORTED;
+		return OPAL_UNSUPPORTED;
 	/*
 	 * The secure variable update must be enqueued in order to be processed
 	 * in the next boot by firmware. The secure variable storage is write
@@ -114,17 +70,17 @@ opal_set_variable(efi_char16_t *name, efi_guid_t *vendor, u32 attr,
 	 */
 	rc = opal_secvar_enqueue(__pa(name), __pa(vendor), attr,
 				 data_size, __pa(data));
-	return opal_to_efi_status(rc);
+	return rc;
 }
 
-static efi_status_t
-opal_query_variable_info(u32 attr, u64 *storage_space, u64 *remaining_space,
-			 u64 *max_variable_size)
+int opal_query_secure_variable_info(u32 attr, u64 *storage_space,
+				    u64 *remaining_space,
+				    u64 *max_variable_size)
 {
 	int rc;
 
 	if (!opal_secvar_supported)
-		return EFI_UNSUPPORTED;
+		return OPAL_UNSUPPORTED;
 
 	*storage_space = cpu_to_be64p(storage_space);
 	*remaining_space = cpu_to_be64p(remaining_space);
@@ -137,22 +93,18 @@ opal_query_variable_info(u32 attr, u64 *storage_space, u64 *remaining_space,
 	*remaining_space = be64_to_cpup(remaining_space);
 	*max_variable_size = be64_to_cpup(max_variable_size);
 
-	return opal_to_efi_status(rc);
+	return rc;
 }
 
-static void pnv_efi_runtime_setup(void)
+static int __init pnv_opal_secvar_init(void)
 {
 	/*
 	 * The opal wrappers below treat the @name, @vendor, and @data
-	 * parameters as little endian blobs.
+	 * parameters as opaque blobs.
 	 * @name is a ucs2 string
-	 * @vendor is the vendor GUID. It is converted to LE in the kernel
+	 * @vendor is the vendor GUID in EFI format
 	 * @data variable data, which layout may be different for each variable
 	 */
-	efi.get_variable = opal_get_variable;
-	efi.get_next_variable = opal_get_next_variable;
-	efi.set_variable = opal_set_variable;
-	efi.query_variable_info = opal_query_variable_info;
 
 	if (!opal_check_token(OPAL_SECVAR_GET) ||
 	    !opal_check_token(OPAL_SECVAR_GET_NEXT) ||
@@ -163,17 +115,7 @@ static void pnv_efi_runtime_setup(void)
 	} else {
 		opal_secvar_supported = true;
 	}
-}
-
-static int __init pnv_efi_init(void)
-{
-	set_bit(EFI_RUNTIME_SERVICES, &efi.flags);
-	set_bit(EFI_BOOT, &efi.flags);
-
-	if (IS_ENABLED(CONFIG_64BIT))
-		set_bit(EFI_64BIT, &efi.flags);
 
-	pnv_efi_runtime_setup();
 	return 0;
 }
-machine_arch_initcall(powernv, pnv_efi_init);
+machine_arch_initcall(powernv, pnv_opal_secvar_init);
-- 
2.19.1


^ permalink raw reply related

* [WIP RFC PATCH 4/6] powerpc/powernv: Add support for OPAL secure variables
From: Daniel Axtens @ 2019-05-20  6:25 UTC (permalink / raw)
  To: nayna, cclaudio, linux-fsdevel, greg, linuxppc-dev; +Cc: Daniel Axtens
In-Reply-To: <20190520062553.14947-1-dja@axtens.net>

From: Claudio Carvalho <cclaudio@linux.ibm.com>

[dja: this is a WIP version - a new version is coming that changes
the interface. I also had to renumber the opal calls to get this
to apply. Basically, this is an illustration of the concept: more
work would be required to get this to actually function.]

The X.509 certificates trusted by the platform and other information
required to secure boot the host OS kernel are wrapped in secure
variables, which are controlled by OPAL.

The OPAL secure variables can be handled through the following OPAL
calls.

OPAL_SECVAR_GET:
Returns the data for a given secure variable name and vendor GUID.

OPAL_SECVAR_GET_NEXT:
For a given secure variable, it returns the name and vendor GUID
of the next variable.

OPAL_SECVAR_ENQUEUE:
Enqueue the supplied secure variable update so that it can be processed
by OPAL in the next boot. Variable updates cannot be be processed right
away because the variable storage is write locked at runtime.

OPAL_SECVAR_INFO:
Returns size information about the variable.

This patch adds support for OPAL secure variables by setting up the EFI
runtime variable services to make OPAL calls.

This patch also introduces CONFIG_OPAL_SECVAR for enabling the OPAL
secure variables support in the kernel. Since CONFIG_OPAL_SECVAR selects
CONFIG_EFI, it also allow us to manage the OPAL secure variables from
userspace via efivarfs.

Signed-off-by: Claudio Carvalho <cclaudio@linux.ibm.com>
Signed-off-by: Daniel Axtens <dja@axtens.net>
---
 arch/powerpc/include/asm/opal-api.h          |   6 +-
 arch/powerpc/include/asm/opal.h              |  10 ++
 arch/powerpc/platforms/Kconfig               |   3 +
 arch/powerpc/platforms/powernv/Kconfig       |   9 +
 arch/powerpc/platforms/powernv/Makefile      |   1 +
 arch/powerpc/platforms/powernv/opal-call.c   |   4 +
 arch/powerpc/platforms/powernv/opal-secvar.c | 179 +++++++++++++++++++
 7 files changed, 211 insertions(+), 1 deletion(-)
 create mode 100644 arch/powerpc/platforms/powernv/opal-secvar.c

diff --git a/arch/powerpc/include/asm/opal-api.h b/arch/powerpc/include/asm/opal-api.h
index e1577cfa7186..8054e1e983ff 100644
--- a/arch/powerpc/include/asm/opal-api.h
+++ b/arch/powerpc/include/asm/opal-api.h
@@ -212,7 +212,11 @@
 #define OPAL_HANDLE_HMI2			166
 #define	OPAL_NX_COPROC_INIT			167
 #define OPAL_XIVE_GET_VP_STATE			170
-#define OPAL_LAST				170
+#define OPAL_SECVAR_GET				171
+#define OPAL_SECVAR_GET_NEXT			172
+#define OPAL_SECVAR_ENQUEUE			173
+#define OPAL_SECVAR_INFO			174
+#define OPAL_LAST				174
 
 #define QUIESCE_HOLD			1 /* Spin all calls at entry */
 #define QUIESCE_REJECT			2 /* Fail all calls with OPAL_BUSY */
diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index 4cc37e708bc7..4b8046caaf4f 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -394,6 +394,16 @@ void opal_powercap_init(void);
 void opal_psr_init(void);
 void opal_sensor_groups_init(void);
 
+extern int opal_secvar_get(uint64_t name, uint64_t vendor, uint64_t attr,
+			   uint64_t data_size, uint64_t data);
+extern int opal_secvar_get_next(uint64_t name_size, uint64_t name,
+				uint64_t vendor);
+extern int opal_secvar_enqueue(uint64_t name, uint64_t vendor, uint64_t attr,
+			       uint64_t data_size, uint64_t data);
+extern int opal_secvar_info(uint64_t attr, uint64_t storage_space,
+			    uint64_t remaining_space,
+			    uint64_t max_variable_size);
+
 #endif /* __ASSEMBLY__ */
 
 #endif /* _ASM_POWERPC_OPAL_H */
diff --git a/arch/powerpc/platforms/Kconfig b/arch/powerpc/platforms/Kconfig
index f3fb79fccc72..8e30510bc0c1 100644
--- a/arch/powerpc/platforms/Kconfig
+++ b/arch/powerpc/platforms/Kconfig
@@ -326,4 +326,7 @@ config XILINX_PCI
 	bool "Xilinx PCI host bridge support"
 	depends on PCI && XILINX_VIRTEX
 
+config EFI
+	bool
+
 endmenu
diff --git a/arch/powerpc/platforms/powernv/Kconfig b/arch/powerpc/platforms/powernv/Kconfig
index 850eee860cf2..879f8e766098 100644
--- a/arch/powerpc/platforms/powernv/Kconfig
+++ b/arch/powerpc/platforms/powernv/Kconfig
@@ -47,3 +47,12 @@ config PPC_VAS
 	  VAS adapters are found in POWER9 based systems.
 
 	  If unsure, say N.
+
+config OPAL_SECVAR
+	bool "OPAL Secure Variables"
+	depends on PPC_POWERNV && !CPU_BIG_ENDIAN
+	select UCS2_STRING
+	select EFI
+	help
+	  This enables the kernel to access OPAL secure variables via EFI
+	  runtime variable services.
diff --git a/arch/powerpc/platforms/powernv/Makefile b/arch/powerpc/platforms/powernv/Makefile
index da2e99efbd04..1511d836fd19 100644
--- a/arch/powerpc/platforms/powernv/Makefile
+++ b/arch/powerpc/platforms/powernv/Makefile
@@ -16,3 +16,4 @@ obj-$(CONFIG_PERF_EVENTS) += opal-imc.o
 obj-$(CONFIG_PPC_MEMTRACE)	+= memtrace.o
 obj-$(CONFIG_PPC_VAS)	+= vas.o vas-window.o vas-debug.o
 obj-$(CONFIG_OCXL_BASE)	+= ocxl.o
+obj-$(CONFIG_OPAL_SECVAR)	+= opal-secvar.o
diff --git a/arch/powerpc/platforms/powernv/opal-call.c b/arch/powerpc/platforms/powernv/opal-call.c
index 36c8fa3647a2..1a2e080dd027 100644
--- a/arch/powerpc/platforms/powernv/opal-call.c
+++ b/arch/powerpc/platforms/powernv/opal-call.c
@@ -288,3 +288,7 @@ OPAL_CALL(opal_pci_set_pbcq_tunnel_bar,		OPAL_PCI_SET_PBCQ_TUNNEL_BAR);
 OPAL_CALL(opal_sensor_read_u64,			OPAL_SENSOR_READ_U64);
 OPAL_CALL(opal_sensor_group_enable,		OPAL_SENSOR_GROUP_ENABLE);
 OPAL_CALL(opal_nx_coproc_init,			OPAL_NX_COPROC_INIT);
+OPAL_CALL(opal_secvar_get,			OPAL_SECVAR_GET);
+OPAL_CALL(opal_secvar_get_next,			OPAL_SECVAR_GET_NEXT);
+OPAL_CALL(opal_secvar_enqueue,			OPAL_SECVAR_ENQUEUE);
+OPAL_CALL(opal_secvar_info,			OPAL_SECVAR_INFO)
diff --git a/arch/powerpc/platforms/powernv/opal-secvar.c b/arch/powerpc/platforms/powernv/opal-secvar.c
new file mode 100644
index 000000000000..e333828bd0bc
--- /dev/null
+++ b/arch/powerpc/platforms/powernv/opal-secvar.c
@@ -0,0 +1,179 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * PowerNV code for secure variables
+ *
+ * Copyright (C) 2019 IBM Corporation
+ * Author: Claudio Carvalho <cclaudio@linux.ibm.com>
+ *
+ */
+#define pr_fmt(fmt) "secvar: "fmt
+
+#include <linux/efi.h>
+#include <asm/machdep.h>
+#include <asm/opal.h>
+
+static bool opal_secvar_supported;
+
+static efi_status_t opal_to_efi_status_log(int rc, const char *func_name)
+{
+	efi_status_t status;
+
+	switch (rc) {
+	case OPAL_EMPTY:
+		status = EFI_NOT_FOUND;
+		break;
+	case OPAL_HARDWARE:
+		status = EFI_DEVICE_ERROR;
+		break;
+	case OPAL_NO_MEM:
+		pr_err("%s: No space in the volatile storage\n", func_name);
+		status = EFI_OUT_OF_RESOURCES;
+		break;
+	case OPAL_PARAMETER:
+		status = EFI_INVALID_PARAMETER;
+		break;
+	case OPAL_PARTIAL:
+		status = EFI_BUFFER_TOO_SMALL;
+		break;
+	case OPAL_PERMISSION:
+		status = EFI_WRITE_PROTECTED;
+		break;
+	case OPAL_RESOURCE:
+		pr_err("%s: No space in the non-volatile storage\n", func_name);
+		status = EFI_OUT_OF_RESOURCES;
+		break;
+	case OPAL_SUCCESS:
+		status = EFI_SUCCESS;
+		break;
+	default:
+		pr_err("%s: Unknown OPAL error %d\n", func_name, rc);
+		status = EFI_DEVICE_ERROR;
+		break;
+	}
+
+	return status;
+}
+
+#define opal_to_efi_status(rc) opal_to_efi_status_log(rc, __func__)
+
+static efi_status_t
+opal_get_variable(efi_char16_t *name, efi_guid_t *vendor, u32 *attr,
+		  unsigned long *data_size, void *data)
+{
+	int rc;
+
+	if (!opal_secvar_supported)
+		return EFI_UNSUPPORTED;
+
+	*data_size = cpu_to_be64(*data_size);
+
+	rc = opal_secvar_get(__pa(name), __pa(vendor), __pa(attr),
+			     __pa(data_size), __pa(data));
+	/*
+	 * The @attr is an optional output parameter. It is returned in
+	 * big-endian.
+	 */
+	if (attr)
+		*attr = be32_to_cpup(attr);
+	*data_size = be64_to_cpu(*data_size);
+
+	return opal_to_efi_status(rc);
+}
+
+static efi_status_t
+opal_get_next_variable(unsigned long *name_size, efi_char16_t *name,
+		       efi_guid_t *vendor)
+{
+	int rc;
+
+	if (!opal_secvar_supported)
+		return EFI_UNSUPPORTED;
+
+	*name_size = cpu_to_be64(*name_size);
+
+	rc = opal_secvar_get_next(__pa(name_size), __pa(name),
+				  __pa(vendor));
+
+	*name_size = be64_to_cpu(*name_size);
+
+	return opal_to_efi_status(rc);
+}
+
+static efi_status_t
+opal_set_variable(efi_char16_t *name, efi_guid_t *vendor, u32 attr,
+		  unsigned long data_size, void *data)
+{
+	int rc;
+
+	if (!opal_secvar_supported)
+		return EFI_UNSUPPORTED;
+	/*
+	 * The secure variable update must be enqueued in order to be processed
+	 * in the next boot by firmware. The secure variable storage is write
+	 * locked at runtime.
+	 */
+	rc = opal_secvar_enqueue(__pa(name), __pa(vendor), attr,
+				 data_size, __pa(data));
+	return opal_to_efi_status(rc);
+}
+
+static efi_status_t
+opal_query_variable_info(u32 attr, u64 *storage_space, u64 *remaining_space,
+			 u64 *max_variable_size)
+{
+	int rc;
+
+	if (!opal_secvar_supported)
+		return EFI_UNSUPPORTED;
+
+	*storage_space = cpu_to_be64p(storage_space);
+	*remaining_space = cpu_to_be64p(remaining_space);
+	*max_variable_size = cpu_to_be64p(max_variable_size);
+
+	rc = opal_secvar_info(attr, __pa(storage_space), __pa(remaining_space),
+			      __pa(max_variable_size));
+
+	*storage_space = be64_to_cpup(storage_space);
+	*remaining_space = be64_to_cpup(remaining_space);
+	*max_variable_size = be64_to_cpup(max_variable_size);
+
+	return opal_to_efi_status(rc);
+}
+
+static void pnv_efi_runtime_setup(void)
+{
+	/*
+	 * The opal wrappers below treat the @name, @vendor, and @data
+	 * parameters as little endian blobs.
+	 * @name is a ucs2 string
+	 * @vendor is the vendor GUID. It is converted to LE in the kernel
+	 * @data variable data, which layout may be different for each variable
+	 */
+	efi.get_variable = opal_get_variable;
+	efi.get_next_variable = opal_get_next_variable;
+	efi.set_variable = opal_set_variable;
+	efi.query_variable_info = opal_query_variable_info;
+
+	if (!opal_check_token(OPAL_SECVAR_GET) ||
+	    !opal_check_token(OPAL_SECVAR_GET_NEXT) ||
+	    !opal_check_token(OPAL_SECVAR_ENQUEUE) ||
+	    !opal_check_token(OPAL_SECVAR_INFO)) {
+		pr_err("OPAL doesn't support secure variables\n");
+		opal_secvar_supported = false;
+	} else {
+		opal_secvar_supported = true;
+	}
+}
+
+static int __init pnv_efi_init(void)
+{
+	set_bit(EFI_RUNTIME_SERVICES, &efi.flags);
+	set_bit(EFI_BOOT, &efi.flags);
+
+	if (IS_ENABLED(CONFIG_64BIT))
+		set_bit(EFI_64BIT, &efi.flags);
+
+	pnv_efi_runtime_setup();
+	return 0;
+}
+machine_arch_initcall(powernv, pnv_efi_init);
-- 
2.19.1


^ permalink raw reply related

* [WIP RFC PATCH 3/6] fwvarfs: efi backend
From: Daniel Axtens @ 2019-05-20  6:25 UTC (permalink / raw)
  To: nayna, cclaudio, linux-fsdevel, greg, linuxppc-dev; +Cc: Daniel Axtens
In-Reply-To: <20190520062553.14947-1-dja@axtens.net>

Add a read-only EFI backend. This does not rely on efivarfs at all
(although it does borrow heavily from the code).

It only supports reading the variables, but it supports the same
format as efivarfs so tools like efivar continue to work if you
mount this filesystem in the same place.

Two small quirks:
 - efivarfs (at least as configured on Ubuntu) allows users to
   access the files, here only root can.
 - efivarfs makes GUID comparison case-insensitive, this does not.

Signed-off-by: Daniel Axtens <dja@axtens.net>
---
 Documentation/filesystems/fwvarfs.txt |   5 +
 fs/fwvarfs/Kconfig                    |  11 ++
 fs/fwvarfs/Makefile                   |   1 +
 fs/fwvarfs/efi.c                      | 177 ++++++++++++++++++++++++++
 fs/fwvarfs/fwvarfs.c                  |   4 +-
 fs/fwvarfs/fwvarfs.h                  |   4 +
 6 files changed, 201 insertions(+), 1 deletion(-)
 create mode 100644 fs/fwvarfs/efi.c

diff --git a/Documentation/filesystems/fwvarfs.txt b/Documentation/filesystems/fwvarfs.txt
index bf1bccba6ab9..7c1e921e5c50 100644
--- a/Documentation/filesystems/fwvarfs.txt
+++ b/Documentation/filesystems/fwvarfs.txt
@@ -32,6 +32,10 @@ Supported backends
    operations. Files created persist across mount/unmount but as no
    hardware is involved they do not persist across reboots.
 
+ * efi - a partial reimplementation of efivarfs against the fwvarfs
+   backend. Read-only with no creation or deletion, but sufficient that
+   efivar --print --name <whatever> works the same as efivarfs.
+
 Usage
 -----
 
@@ -40,6 +44,7 @@ mount -t fwvarfs <backend> <dir>
 For example:
 
 mount -t fwvarfs mem /fw/mem/
+mount -t fwvarfs efi /sys/firmware/efi/efivars
 
 API
 ---
diff --git a/fs/fwvarfs/Kconfig b/fs/fwvarfs/Kconfig
index 62a47cddd4b5..e4474da11dbc 100644
--- a/fs/fwvarfs/Kconfig
+++ b/fs/fwvarfs/Kconfig
@@ -23,3 +23,14 @@ config FWVAR_FS_MEM_BACKEND
 	  demonstration of fwvarfs.
 
 	  You can safely say N here unless you're exploring fwvarfs.
+
+config FWVAR_FS_EFI_BACKEND
+	bool "EFI backend"
+	depends on FWVAR_FS
+	help
+	  Include a read-only EFI backend, largely cribbed from
+	  efivarfs. This is handy for demonstrating that the same
+	  userspace tools can read from EFI variables over fwvarfs
+	  in the same way the do with efivarfs.
+
+	  Say N here unless you're exploring fwvarfs.
diff --git a/fs/fwvarfs/Makefile b/fs/fwvarfs/Makefile
index f1585baccabe..2ab9dfd650ca 100644
--- a/fs/fwvarfs/Makefile
+++ b/fs/fwvarfs/Makefile
@@ -6,3 +6,4 @@
 obj-$(CONFIG_FWVAR_FS)		+= fwvarfs.o
 
 obj-$(CONFIG_FWVAR_FS_MEM_BACKEND)		+= mem.o
+obj-$(CONFIG_FWVAR_FS_EFI_BACKEND)		+= efi.o
diff --git a/fs/fwvarfs/efi.c b/fs/fwvarfs/efi.c
new file mode 100644
index 000000000000..7aa5c186d0c9
--- /dev/null
+++ b/fs/fwvarfs/efi.c
@@ -0,0 +1,177 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2019 IBM Corporation
+ * Author: Daniel Axtens
+ *
+ * Based on efivarfs:
+ * Copyright (C) 2012 Red Hat, Inc.
+ * Copyright (C) 2012 Jeremy Kerr <jeremy.kerr@canonical.com>
+ *
+ * We cheat by not allowing for case-insensitivity.
+ */
+
+#include <linux/fs.h>
+#include <linux/list.h>
+#include <linux/slab.h>
+#include "fwvarfs.h"
+
+#include <linux/efi.h>
+#include <linux/uuid.h>
+#include <linux/ucs2_string.h>
+
+static LIST_HEAD(efivar_list);
+static LIST_HEAD(efivar_file_list);
+
+struct fwvarfs_efi_file {
+	bool is_removable;
+	struct list_head list;
+	struct efivar_entry *entry;
+};
+
+// need a forward decl to pass down to register
+struct fwvarfs_backend fwvarfs_efi_backend;
+
+
+static ssize_t fwvarfs_efi_read(void *variable, char *buf,
+		size_t count, loff_t off)
+{
+	struct fwvarfs_efi_file *file_data = variable;
+	struct efivar_entry *var = file_data->entry;
+	unsigned long datasize = 0;
+	u32 attributes;
+	void *data;
+	ssize_t size = 0;
+	loff_t ppos = off;
+	int err;
+
+	err = efivar_entry_size(var, &datasize);
+
+	/*
+	 * efivarfs represents uncommitted variables with
+	 * zero-length files. Reading them should return EOF.
+	 */
+	if (err == -ENOENT)
+		return 0;
+	else if (err)
+		return err;
+
+	data = kmalloc(datasize + sizeof(attributes), GFP_KERNEL);
+
+	if (!data)
+		return -ENOMEM;
+
+	size = efivar_entry_get(var, &attributes, &datasize,
+				data + sizeof(attributes));
+	if (size)
+		goto out_free;
+
+	memcpy(data, &attributes, sizeof(attributes));
+	size = memory_read_from_buffer(buf, count, &ppos,
+				       data, datasize + sizeof(attributes));
+out_free:
+	kfree(data);
+
+	return size;
+}
+
+static int fwvarfs_efi_callback(efi_char16_t *name16, efi_guid_t vendor,
+				unsigned long name_size, void *data)
+{
+	struct efivar_entry *entry;
+	struct fwvarfs_efi_file *file_data;
+	unsigned long size = 0;
+	char *name;
+	int len;
+	int err = -ENOMEM;
+
+	file_data = kzalloc(sizeof(*file_data), GFP_KERNEL);
+	if (!file_data)
+		return err;
+
+	entry = kzalloc(sizeof(*entry), GFP_KERNEL);
+	if (!entry)
+		goto fail;
+
+	memcpy(entry->var.VariableName, name16, name_size);
+	memcpy(&(entry->var.VendorGuid), &vendor, sizeof(efi_guid_t));
+
+	len = ucs2_utf8size(entry->var.VariableName);
+
+	/* name, plus '-', plus GUID, plus NUL */
+	name = kmalloc(len + 1 + EFI_VARIABLE_GUID_LEN + 1, GFP_KERNEL);
+	if (!name)
+		goto fail_entry;
+
+	ucs2_as_utf8(name, entry->var.VariableName, len);
+
+	if (efivar_variable_is_removable(entry->var.VendorGuid, name, len))
+		file_data->is_removable = true;
+
+	name[len] = '-';
+
+	efi_guid_to_str(&entry->var.VendorGuid, name + len + 1);
+
+	name[len + EFI_VARIABLE_GUID_LEN+1] = '\0';
+
+	efivar_entry_size(entry, &size);
+	err = efivar_entry_add(entry, &efivar_list);
+	if (err)
+		goto fail_name;
+
+	err = fwvarfs_register_var(&fwvarfs_efi_backend, name, file_data, size);
+	if (err)
+		goto fail_name;
+
+	INIT_LIST_HEAD(&file_data->list);
+	list_add(&efivar_file_list, &file_data->list);
+	file_data->entry = entry;
+
+	/* copied by the above, I think */
+	kfree(name);
+
+	return 0;
+fail_name:
+	kfree(name);
+fail_entry:
+	kfree(entry);
+fail:
+	kfree(file_data);
+	return err;
+}
+
+
+static void fwvarfs_efi_destroy(void *var)
+{
+	struct fwvarfs_efi_file *file_data = var;
+	struct efivar_entry *entry = file_data->entry;
+
+	// ignore error, eek.
+	efivar_entry_remove(entry);
+	kfree(entry);
+
+	list_del(&file_data->list);
+	kfree(file_data);
+}
+
+
+static int fwvarfs_efi_enumerate(void)
+{
+	int err;
+	struct fwvarfs_efi_file *pos, *tmp;
+
+	err = efivar_init(fwvarfs_efi_callback, NULL, true, &efivar_list);
+	if (err) {
+		list_for_each_entry_safe(pos, tmp, &efivar_file_list, list) {
+			fwvarfs_efi_destroy(pos);
+		}
+	}
+
+	return err;
+}
+
+struct fwvarfs_backend fwvarfs_efi_backend = {
+	.name = "efi",
+	.destroy = fwvarfs_efi_destroy,
+	.enumerate = fwvarfs_efi_enumerate,
+	.read = fwvarfs_efi_read,
+};
diff --git a/fs/fwvarfs/fwvarfs.c b/fs/fwvarfs/fwvarfs.c
index 99b7f2fd0f14..643ec6585b4d 100644
--- a/fs/fwvarfs/fwvarfs.c
+++ b/fs/fwvarfs/fwvarfs.c
@@ -22,7 +22,9 @@ static struct fwvarfs_backend *fwvarfs_backends[] = {
 #if CONFIG_FWVAR_FS_MEM_BACKEND
 	&fwvarfs_mem_backend,
 #endif
-
+#ifdef CONFIG_FWVAR_FS_EFI_BACKEND
+	&fwvarfs_efi_backend,
+#endif
 	NULL,
 };
 
diff --git a/fs/fwvarfs/fwvarfs.h b/fs/fwvarfs/fwvarfs.h
index b2944a3baaf7..49bde268401f 100644
--- a/fs/fwvarfs/fwvarfs.h
+++ b/fs/fwvarfs/fwvarfs.h
@@ -113,4 +113,8 @@ int fwvarfs_register_var(struct fwvarfs_backend *backend, const char *name,
 extern struct fwvarfs_backend fwvarfs_mem_backend;
 #endif
 
+#if defined(CONFIG_FWVAR_FS_EFI_BACKEND)
+extern struct fwvarfs_backend fwvarfs_efi_backend;
+#endif
+
 #endif /* FWVARFS_H */
-- 
2.19.1


^ permalink raw reply related

* [WIP RFC PATCH 2/6] fwvarfs: a generic firmware variable filesystem
From: Daniel Axtens @ 2019-05-20  6:25 UTC (permalink / raw)
  To: nayna, cclaudio, linux-fsdevel, greg, linuxppc-dev; +Cc: Daniel Axtens
In-Reply-To: <20190520062553.14947-1-dja@axtens.net>

Sometimes it is helpful to be able to access firmware variables as
file, like efivarfs, but not all firmware is EFI. Create a framework
that allows generic access to firmware variables exposed by a
implementations of a simple backend API.

Add a demonstration memory-based backend.

Signed-off-by: Daniel Axtens <dja@axtens.net>
---
 Documentation/filesystems/fwvarfs.txt | 146 +++++++++++++
 fs/Kconfig                            |   1 +
 fs/Makefile                           |   1 +
 fs/fwvarfs/Kconfig                    |  25 +++
 fs/fwvarfs/Makefile                   |   8 +
 fs/fwvarfs/fwvarfs.c                  | 289 ++++++++++++++++++++++++++
 fs/fwvarfs/fwvarfs.h                  | 116 +++++++++++
 fs/fwvarfs/mem.c                      | 113 ++++++++++
 fs/kernfs/dir.c                       |   1 -
 include/uapi/linux/magic.h            |   1 +
 10 files changed, 700 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/filesystems/fwvarfs.txt
 create mode 100644 fs/fwvarfs/Kconfig
 create mode 100644 fs/fwvarfs/Makefile
 create mode 100644 fs/fwvarfs/fwvarfs.c
 create mode 100644 fs/fwvarfs/fwvarfs.h
 create mode 100644 fs/fwvarfs/mem.c

diff --git a/Documentation/filesystems/fwvarfs.txt b/Documentation/filesystems/fwvarfs.txt
new file mode 100644
index 000000000000..bf1bccba6ab9
--- /dev/null
+++ b/Documentation/filesystems/fwvarfs.txt
@@ -0,0 +1,146 @@
+fwvarfs
+=======
+
+fwvarfs is a generic firmware variable file-system. A platform
+provides a backend implementing a few very simple callbacks, and
+fwvarfs handles all the details required to present the variables as a
+filesystem.
+
+The minimum functionality for a backend is the ability to enumerate
+existing variables. Backends can optionally also allow:
+ - reading of variables
+ - writing of variables
+ - creation of variables
+ - deletion of variables
+
+Key assumptions
+---------------
+
+ * Variables for each backend live in a single, flat directory -
+   there's no concept of subdirectories.
+
+ * Files are created with mode 600 if the backend provides a write
+   hook, and 400 otherwise, and are owned by root:root.
+
+ * The set of variables stored can be determined at boot time, and
+   nothing outside of fwvarfs creates new variables after boot.
+
+Supported backends
+------------------
+
+ * mem - a memory-backed example filesystem supporting all
+   operations. Files created persist across mount/unmount but as no
+   hardware is involved they do not persist across reboots.
+
+Usage
+-----
+
+mount -t fwvarfs <backend> <dir>
+
+For example:
+
+mount -t fwvarfs mem /fw/mem/
+
+API
+---
+
+A backend is installed by creating a fwvarfs_backend struct,
+containing the name of the backend and various callbacks. The backend
+must be registered by adding it to the list at the top of
+fs/fwvarfs/fwvarfs.c
+
+The fwvarfs infrastructure provides the following function to backends:
+
+  int fwvarfs_register_var(struct fwvarfs_backend *backend, const char *name, void *variable, size_t size);
+
+  Register a variable with fwvarfs to allow it to be seen by users.
+
+  backend: the backend to which to add this variable
+
+  name: the name of file representing the variable. Must be a valid
+        filename, so no nulls or slashes.
+
+  variable: data private to the backend representing the variable -
+            will be passed back to every callback
+
+  size: the initial size of the variable
+
+
+The backend must then provide the following functions:
+
+    int (*enumerate)(void);
+
+	Mandatory and called at init time, a backend must call
+	fwvarfs_register_var for all variables it wants to expose to
+	the user.
+
+    void (*destroy)(void *variable);
+
+	Mandatory if you provide a create or unlink hook, and may
+	become mandatory in the future for cleanup.
+
+	Free backend data associated with variable. It will not be
+	referenced after this point by fwvarfs.
+
+
+    ssize_t (*read)(void *variable, char *dest, size_t bytes, loff_t off);
+
+	Read from variable into the a kernel buffer. Similar semantics
+	to a usual read operation, except that off is not a pointer
+	(unlike the usual ppos).
+
+	variable: the variable to read
+	dest: kernel buffer to read into
+	bytes: maximum number of bytes to read
+	off: offset to read from
+
+	Returns the number of bytes read or an error.
+	If this hook is not provided, all reads will fail with -EPERM.
+
+    ssize_t (*write)(void *variable, const char *src, size_t bytes, loff_t off);
+
+	Write into the variable from the given kernel buffer.
+
+	variable: the variable to write
+	src: kernel buffer with contents
+	bytes: write at most this many bytes
+	off: offset into the file to write at.
+
+	Returns the number of bytes written or an error.
+	If this hook is not provided, all writes will fail with -EPERM.
+
+
+    void* (*create)(const char *name);
+
+	Create a variable with the supplied name, and return the
+	associated private data or an error pointer. Do not return
+	NULL on failure.
+
+	If the variable created cannot be registered for any reason,
+	destroy() will be called on the variable returned.
+
+	If the hook is not provided, all attempts to create a file will
+	fail with -EPERM.
+
+    int (*unlink)(void *variable);
+
+	Delete the variable supplied from the backing store. Do not
+	free it yet, if you return success destroy() will be called on
+	the variable.
+
+	If an error is returned, the unlink will be aborted and the file
+	will still be present in the filesystem.
+
+	If the hook is not provided, all attempts to unlink a file will
+	fail with -EPERM.
+
+TODOs
+-----
+
+Perhaps a different registration scheme?
+Currently size is not updated after write
+Should standardise on whether writes must cover the whole file if partial writes are supported.
+Various TODOs in the code
+Convert API documentation to kerndoc
+perhaps better cleanup/removal, although kernfs doesn't seem to provide anything for this so difficult to do with out leaking memory
+check error handling with kernfs create and O_EXCL
diff --git a/fs/Kconfig b/fs/Kconfig
index cbbffc8b9ef5..6fb6e6cbd7b6 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -219,6 +219,7 @@ config ARCH_HAS_GIGANTIC_PAGE
 
 source "fs/configfs/Kconfig"
 source "fs/efivarfs/Kconfig"
+source "fs/fwvarfs/Kconfig"
 
 endmenu
 
diff --git a/fs/Makefile b/fs/Makefile
index c9aea23aba56..2a0c593dfc0f 100644
--- a/fs/Makefile
+++ b/fs/Makefile
@@ -130,3 +130,4 @@ obj-$(CONFIG_F2FS_FS)		+= f2fs/
 obj-$(CONFIG_CEPH_FS)		+= ceph/
 obj-$(CONFIG_PSTORE)		+= pstore/
 obj-$(CONFIG_EFIVAR_FS)		+= efivarfs/
+obj-$(CONFIG_FWVAR_FS)		+= fwvarfs/
diff --git a/fs/fwvarfs/Kconfig b/fs/fwvarfs/Kconfig
new file mode 100644
index 000000000000..62a47cddd4b5
--- /dev/null
+++ b/fs/fwvarfs/Kconfig
@@ -0,0 +1,25 @@
+# SPDX-License-Identifier: GPL-2.0
+
+config FWVAR_FS
+	bool "Generic Firmware Variable Filesystem"
+	help
+	  fwvarfs is a generic file system for access to firmware
+	  variables.
+
+	  It is pluggable: you will need to select a backend below in
+	  order to actually access anything.
+
+	  This cannot currently be built as a module. (TODO: see if
+	  kernfs can be exported or if there are technical obstacles.)
+
+	  If unsure, say N.
+
+config FWVAR_FS_MEM_BACKEND
+	bool "In-memory testing backend"
+	depends on FWVAR_FS
+	help
+	  Include a backend where firmware variables are just
+	  elements of an in-memory list. This is helpful mostly as a
+	  demonstration of fwvarfs.
+
+	  You can safely say N here unless you're exploring fwvarfs.
diff --git a/fs/fwvarfs/Makefile b/fs/fwvarfs/Makefile
new file mode 100644
index 000000000000..f1585baccabe
--- /dev/null
+++ b/fs/fwvarfs/Makefile
@@ -0,0 +1,8 @@
+# SPDX-License-Identifier: GPL-2.0
+#
+# Makefile for the fwvarfs filesytem
+#
+
+obj-$(CONFIG_FWVAR_FS)		+= fwvarfs.o
+
+obj-$(CONFIG_FWVAR_FS_MEM_BACKEND)		+= mem.o
diff --git a/fs/fwvarfs/fwvarfs.c b/fs/fwvarfs/fwvarfs.c
new file mode 100644
index 000000000000..99b7f2fd0f14
--- /dev/null
+++ b/fs/fwvarfs/fwvarfs.c
@@ -0,0 +1,289 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2019 IBM Corporation
+ * Author: Daniel Axtens
+ *
+ * Thanks to efivarfs, rdt, and cgroupfs for the kernfs example.
+ */
+
+#include <linux/fs.h>
+#include <linux/fs_context.h>
+#include <linux/kernfs.h>
+#include <linux/module.h>
+#include <linux/list.h>
+#include <linux/slab.h>
+#include <linux/user_namespace.h>
+#include <uapi/linux/magic.h>
+#include "fwvarfs.h"
+
+#define pr_fmt(fmt)	KBUILD_MODNAME ": " fmt
+
+static struct fwvarfs_backend *fwvarfs_backends[] = {
+#if CONFIG_FWVAR_FS_MEM_BACKEND
+	&fwvarfs_mem_backend,
+#endif
+
+	NULL,
+};
+
+struct fwvarfs_file {
+	struct kernfs_node *kn;
+	struct fwvarfs_backend *backend;
+	void *backend_data;
+};
+
+static ssize_t fwvarfs_file_read(struct kernfs_open_file *of, char *buf,
+				 size_t bytes, loff_t off)
+{
+	struct fwvarfs_file *file_data = of->kn->priv;
+
+	if (file_data->backend->read)
+		return file_data->backend->read(file_data->backend_data, buf,
+						bytes, off);
+	else
+		return -EPERM;
+}
+
+static ssize_t fwvarfs_file_write(struct kernfs_open_file *of, char *buf,
+				  size_t bytes, loff_t off)
+{
+	struct fwvarfs_file *file_data = of->kn->priv;
+
+	if (file_data->backend->write)
+		return file_data->backend->write(file_data->backend_data, buf,
+						 bytes, off);
+	else
+		return -EPERM;
+}
+
+
+static struct kernfs_ops fwvarfs_kf_ops = {
+	.atomic_write_len	= PAGE_SIZE,
+	.read			= fwvarfs_file_read,
+	.write			= fwvarfs_file_write,
+};
+
+struct kernfs_node *fwvarfs_create(struct kernfs_node *parent,
+				   const char *name, umode_t mode)
+{
+	struct kernfs_node *kn;
+	struct fwvarfs_file *parent_file = parent->priv;
+	struct fwvarfs_file *file_data;
+	void *backend_data;
+
+	if (!parent_file->backend->create)
+		return ERR_PTR(-EPERM);
+
+	file_data = kzalloc(sizeof(struct fwvarfs_file), GFP_KERNEL);
+	if (!file_data)
+		return ERR_PTR(-ENOMEM);
+
+	file_data->backend = parent_file->backend;
+
+	backend_data = parent_file->backend->create(name);
+
+	if (IS_ERR(backend_data)) {
+		kfree(file_data);
+		return backend_data;
+	}
+
+	file_data->backend_data = backend_data;
+
+	kn = kernfs_create_file(parent, name,
+		(!!parent_file->backend->write ? 0600 : 0400), 0,
+		&fwvarfs_kf_ops, file_data);
+
+	if (IS_ERR(kn)) {
+		parent_file->backend->destroy(backend_data);
+		kfree(file_data);
+		return kn;
+	}
+
+	file_data->kn = kn;
+
+	return kn;
+}
+
+int fwvarfs_unlink(struct kernfs_node *kn)
+{
+
+	struct fwvarfs_file *file_data = kn->priv;
+	int ret;
+
+	if (!file_data->backend->unlink)
+		return -EPERM;
+
+	ret = file_data->backend->unlink(file_data->backend_data);
+
+	if (ret)
+		return ret;
+
+	kernfs_remove(file_data->kn);
+
+	file_data->backend->destroy(file_data->backend_data);
+
+	kfree(file_data);
+	return 0;
+}
+
+static struct kernfs_syscall_ops fwvarfs_scops = {
+	.create = fwvarfs_create,
+	.unlink = fwvarfs_unlink,
+};
+
+int fwvarfs_register_var(struct fwvarfs_backend *backend, const char *name,
+			 void *variable, size_t size)
+{
+	struct fwvarfs_file *file_data;
+	struct kernfs_node *kn;
+
+	file_data = kzalloc(sizeof(struct fwvarfs_file), GFP_KERNEL);
+	if (!file_data)
+		return -ENOMEM;
+
+	file_data->backend = backend;
+	file_data->backend_data = variable;
+
+	kn = kernfs_create_file(backend->kf_root->kn, name,
+		(!!backend->write ? 0600 : 0400), size,
+		&fwvarfs_kf_ops, file_data);
+
+	if (IS_ERR(kn)) {
+		kfree(file_data);
+		return PTR_ERR(kn);
+	}
+
+	file_data->kn = kn;
+
+	return 0;
+
+}
+
+static int fwvarfs_get_tree(struct fs_context *fc)
+{
+	int ret = -ENODEV;
+	struct fwvarfs_backend *backend;
+	struct kernfs_fs_context *kfc = fc->fs_private;
+	int i;
+
+	for (i = 0; (backend = fwvarfs_backends[i]); i++) {
+		if (!backend->is_active)
+			continue;
+
+		if (strcasecmp(fc->source, backend->name) == 0) {
+			kfc->root = backend->kf_root;
+			ret = 0;
+		}
+	}
+	if (ret)
+		return ret;
+
+	return kernfs_get_tree(fc);
+}
+
+static void fwvarfs_free_fs_context(struct fs_context *fc)
+{
+	kernfs_free_fs_context(fc);
+	kfree(fc->fs_private);
+}
+
+static const struct fs_context_operations fwvarfs_fs_context_ops = {
+	.get_tree	= fwvarfs_get_tree,
+	.free		= fwvarfs_free_fs_context,
+};
+
+static int fwvarfs_init_fs_context(struct fs_context *fc)
+{
+	struct kernfs_fs_context *ctx;
+
+	ctx = kzalloc(sizeof(struct kernfs_fs_context), GFP_KERNEL);
+	if (!ctx)
+		return -ENOMEM;
+
+	ctx->magic = FWVARFS_SUPER_MAGIC;
+	fc->fs_private = ctx;
+
+	fc->ops = &fwvarfs_fs_context_ops;
+	if (fc->user_ns)
+		put_user_ns(fc->user_ns);
+	fc->user_ns = get_user_ns(&init_user_ns);
+	fc->global = true;
+	return 0;
+}
+
+static struct file_system_type fwvarfs_type = {
+	.owner   = THIS_MODULE,
+	.name    = "fwvarfs",
+	.init_fs_context = fwvarfs_init_fs_context,
+	.kill_sb = kernfs_kill_sb,
+};
+
+static __init int fwvarfs_init(void)
+{
+	struct fwvarfs_backend *backend;
+	struct fwvarfs_file *file_data;
+	int ret, i;
+
+	for (i = 0; (backend = fwvarfs_backends[i]); i++) {
+		file_data = kzalloc(sizeof(struct fwvarfs_file), GFP_KERNEL);
+		if (!file_data)
+			return -ENOMEM;
+
+		file_data->backend = backend;
+
+		backend->kf_root = kernfs_create_root(&fwvarfs_scops,
+					KERNFS_ROOT_EXTRA_OPEN_PERM_CHECK,
+					file_data);
+
+		if (IS_ERR(backend->kf_root)) {
+			pr_err("kernfs_create_root failed for %s: %ld",
+				backend->name, PTR_ERR(backend->kf_root));
+			kfree(file_data);
+			continue;
+		}
+
+		file_data->kn = backend->kf_root->kn;
+
+		ret = backend->enumerate();
+		if (ret) {
+			pr_err("enumerate failed for %s: %d",
+			       backend->name, ret);
+
+			/*
+			 * TODO: we make no attempt to clean up partially
+			 * created files at this point
+			 */
+			kernfs_destroy_root(backend->kf_root);
+			kfree(file_data);
+			continue;
+		}
+
+		backend->is_active = true;
+	}
+
+	return register_filesystem(&fwvarfs_type);
+}
+
+
+/*
+ * kernfs doesn't support being called from a module atm
+ * and we also have no obvious way to remove all the created variables, so
+ * atm even if you did this you would leak memory. TODO
+ * static __exit void fwvarfs_exit(void)
+ * {
+ *	unregister_filesystem(&fwvarfs_type);
+ * }
+ */
+
+
+/*
+ * again, kernfs blocks module-ising this atm but it's still a neat way
+ * to handle initialisation
+ */
+MODULE_AUTHOR("Daniel Axtens");
+MODULE_DESCRIPTION("Generic Firmware Variable Filesystem");
+MODULE_LICENSE("GPL");
+MODULE_ALIAS_FS("fwvarfs");
+
+module_init(fwvarfs_init);
+/* module_exit(fwvarfs_exit); */
diff --git a/fs/fwvarfs/fwvarfs.h b/fs/fwvarfs/fwvarfs.h
new file mode 100644
index 000000000000..b2944a3baaf7
--- /dev/null
+++ b/fs/fwvarfs/fwvarfs.h
@@ -0,0 +1,116 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) 2019 IBM Corporation
+ * Author: Daniel Axtens
+ */
+
+
+#ifndef FWVARFS_H
+#define FWVARFS_H
+
+#include <linux/kernfs.h>
+
+struct fwvarfs_backend {
+	/* name of backend */
+	const char *name;
+
+	/*
+	 * Mandatory and called at init time, a backend must call
+	 * fwvarfs_register_var for all variables it wants to expose to
+	 * the user.
+	 */
+	int (*enumerate)(void);
+
+	/*
+	 * Mandatory if you provide a create or unlink hook, and may
+	 * become mandatory in the future for cleanup.
+	 *
+	 * Free backend data associated with variable. It will not be
+	 * referenced after this point by fwvarfs.
+	 */
+	void (*destroy)(void *variable);
+
+	/*
+	 * Read from variable into the a kernel buffer. Similar semantics
+	 * to a usual read operation, except that off is not a pointer
+	 * (unlike the usual ppos).
+	 *
+	 * variable: the variable to read
+	 * dest: kernel buffer to read into
+	 * bytes: maximum number of bytes to read
+	 * off: offset to read from
+	 *
+	 * Returns the number of bytes read or an error.
+	 * If this hook is not provided, all reads will fail with -EPERM.
+	 */
+	ssize_t (*read)(void *variable, char *dest, size_t bytes, loff_t off);
+
+	/*
+	 * Write into the variable from the given kernel buffer.
+	 *
+	 * variable: the variable to write
+	 * src: kernel buffer with contents
+	 * bytes: write at most this many bytes
+	 * off: offset into the file to write at.
+	 *
+	 * Returns the number of bytes written or an error.
+	 * If this hook is not provided, all writes will fail with -EPERM.
+	 */
+	ssize_t (*write)(void *variable, const char *src, size_t bytes,
+			 loff_t off);
+
+	/*
+	 * Create a variable with the supplied name, and return the
+	 * associated private data or an error pointer. Do not return
+	 * NULL on failure.
+	 *
+	 * If the variable created cannot be registered for any reason,
+	 * destroy() will be called on the variable returned.
+	 *
+	 * If the hook is not provided, all attempts to create a file will
+	 * fail with -EPERM.
+	 */
+	void* (*create)(const char *name);
+
+	/*
+	 * Delete the variable supplied from the backing store. Do not
+	 * free it yet, if you return success destroy() will be called on
+	 * the variable.
+	 *
+	 * If an error is returned, the unlink will be aborted and the file
+	 * will still be present in the filesystem.
+	 *
+	 * If the hook is not provided, all attempts to unlink a file will
+	 * fail with -EPERM.
+	 */
+	int (*unlink)(void *variable);
+
+	/* private to fwvarfs generic code */
+	struct kernfs_root *kf_root;
+	/* did enumerate succeed? */
+	bool is_active;
+};
+
+/*
+ * Register a variable with fwvarfs to allow it to be seen by users.
+ *
+ * backend: the backend to which to add this variable
+ *
+ * name: the name of file representing the variable. Must be a valid
+ *       filename, so no nulls or slashes.
+ *
+ * variable: data private to the backend representing the variable -
+ *           will be passed back to every callback
+ *
+ * size: the initial size of the variable
+ */
+int fwvarfs_register_var(struct fwvarfs_backend *backend, const char *name,
+			 void *variable, size_t size);
+
+
+/* Backends go here */
+#if defined(CONFIG_FWVAR_FS_MEM_BACKEND)
+extern struct fwvarfs_backend fwvarfs_mem_backend;
+#endif
+
+#endif /* FWVARFS_H */
diff --git a/fs/fwvarfs/mem.c b/fs/fwvarfs/mem.c
new file mode 100644
index 000000000000..5c90ea856f8e
--- /dev/null
+++ b/fs/fwvarfs/mem.c
@@ -0,0 +1,113 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2019 IBM Corporation
+ * Author: Daniel Axtens
+ *
+ * Thanks to efivarfs, and cgroupfs for the kernfs example.
+ */
+
+#include <linux/fs.h>
+#include <linux/list.h>
+#include <linux/slab.h>
+#include "fwvarfs.h"
+
+static LIST_HEAD(mem_file_list);
+
+struct fwvarfs_mem_file {
+	size_t length;
+	char data[PAGE_SIZE];
+	struct list_head list;
+};
+
+static ssize_t fwvarfs_mem_file_read(void *var, char *buf, size_t bytes,
+				     loff_t off)
+{
+	struct fwvarfs_mem_file *file_data = var;
+	loff_t ppos = off;
+
+	return memory_read_from_buffer(buf, bytes, &ppos, file_data->data,
+				       file_data->length);
+}
+
+static ssize_t simple_write_to_kernel_buffer(void *to, size_t available,
+					     loff_t *ppos, const void *from,
+					     size_t count)
+{
+	loff_t pos = *ppos;
+
+	if (pos < 0)
+		return -EINVAL;
+	if (pos >= available)
+		return -ENOSPC;
+	if (!count)
+		return 0;
+	if (count > available - pos)
+		count = available - pos;
+	memcpy(to, from, count);
+	*ppos = pos + count;
+	return count;
+}
+
+static ssize_t fwvarfs_mem_file_write(void *var, const char *buf,
+				      size_t bytes, loff_t off)
+{
+	struct fwvarfs_mem_file *file_data = var;
+	loff_t ppos = off;
+	int rc;
+
+	// todo - update size of file
+	rc = simple_write_to_kernel_buffer(file_data->data, PAGE_SIZE, &ppos,
+					   buf, bytes);
+	if (rc)
+		file_data->length = ppos;
+	return rc;
+}
+
+
+static void *fwvarfs_mem_create(const char *name)
+{
+	struct fwvarfs_mem_file *file_data;
+
+	file_data = kzalloc(sizeof(struct fwvarfs_mem_file), GFP_KERNEL);
+	if (!file_data)
+		return ERR_PTR(-ENOMEM);
+
+	INIT_LIST_HEAD(&file_data->list);
+	list_add(&mem_file_list, &file_data->list);
+
+	return file_data;
+}
+
+static void fwvarfs_mem_destroy(void *var)
+{
+	struct fwvarfs_mem_file *file_data = var;
+
+	list_del(&file_data->list);
+	kfree(file_data);
+}
+
+static int fwvarfs_mem_unlink(void *var)
+{
+	/*
+	 * This always succeeds and there's nothing we need to do.
+	 * We free the memory in destroy() which is called after
+	 * this by fwvarfs.
+	 */
+	return 0;
+}
+
+static int fwvarfs_mem_enumerate(void)
+{
+	/* Nothing to do, we always start from a blank slate */
+	return 0;
+}
+
+struct fwvarfs_backend fwvarfs_mem_backend = {
+	.name = "mem",
+	.read = fwvarfs_mem_file_read,
+	.write = fwvarfs_mem_file_write,
+	.create = fwvarfs_mem_create,
+	.destroy = fwvarfs_mem_destroy,
+	.unlink = fwvarfs_mem_unlink,
+	.enumerate = fwvarfs_mem_enumerate,
+};
diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
index 74fe51dbd027..211366ecf5a8 100644
--- a/fs/kernfs/dir.c
+++ b/fs/kernfs/dir.c
@@ -1201,7 +1201,6 @@ static int kernfs_iop_create(struct inode *dir, struct dentry *dentry,
 		return PTR_ERR(kn);
 
 	d_instantiate(dentry, kernfs_get_inode(dir->i_sb, kn));
-
 	return 0;
 }
 
diff --git a/include/uapi/linux/magic.h b/include/uapi/linux/magic.h
index f8c00045d537..61f2f5532366 100644
--- a/include/uapi/linux/magic.h
+++ b/include/uapi/linux/magic.h
@@ -34,6 +34,7 @@
 #define EFIVARFS_MAGIC		0xde5e81e4
 #define HOSTFS_SUPER_MAGIC	0x00c0ffee
 #define OVERLAYFS_SUPER_MAGIC	0x794c7630
+#define FWVARFS_SUPER_MAGIC	0x66777672	/* "fwvr" */
 
 #define MINIX_SUPER_MAGIC	0x137F		/* minix v1 fs, 14 char names */
 #define MINIX_SUPER_MAGIC2	0x138F		/* minix v1 fs, 30 char names */
-- 
2.19.1


^ permalink raw reply related

* [WIP RFC PATCH 1/6] kernfs: add create() and unlink() hooks
From: Daniel Axtens @ 2019-05-20  6:25 UTC (permalink / raw)
  To: nayna, cclaudio, linux-fsdevel, greg, linuxppc-dev; +Cc: Daniel Axtens
In-Reply-To: <20190520062553.14947-1-dja@axtens.net>

I'm building a generic firmware variable filesystem on top of kernfs
and I'd like to be able to create and unlink files.

The hooks are fairly straightforward. create() returns a kernfs_node*,
which is safe with regards to cleanup on error paths, because there
is no way that things can fail after that point in the current
implementation. However, currently O_EXCL is not implemented and
that may create failure paths, in which case we may need to revisit
this later.

Signed-off-by: Daniel Axtens <dja@axtens.net>
---
 fs/kernfs/dir.c        | 55 ++++++++++++++++++++++++++++++++++++++++++
 include/linux/kernfs.h |  3 +++
 2 files changed, 58 insertions(+)

diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
index 016ba88f7335..74fe51dbd027 100644
--- a/fs/kernfs/dir.c
+++ b/fs/kernfs/dir.c
@@ -1175,6 +1175,59 @@ static int kernfs_iop_rename(struct inode *old_dir, struct dentry *old_dentry,
 	return ret;
 }
 
+static int kernfs_iop_create(struct inode *dir, struct dentry *dentry,
+			     umode_t mode, bool excl)
+{
+	struct kernfs_node *parent = dir->i_private;
+	struct kernfs_node *kn;
+	struct kernfs_syscall_ops *scops = kernfs_root(parent)->syscall_ops;
+
+	if (!scops || !scops->create)
+		return -EPERM;
+
+	if (!kernfs_get_active(parent))
+		return -ENODEV;
+
+	// TODO: add some locking to ensure that scops->create
+	// is called only once, and possibly to handle the O_EXCL case
+	WARN_ONCE(excl, "excl unimplemented");
+
+	kn = scops->create(parent, dentry->d_name.name, mode);
+
+	if (!kn)
+		return -EPERM;
+
+	if (IS_ERR(kn))
+		return PTR_ERR(kn);
+
+	d_instantiate(dentry, kernfs_get_inode(dir->i_sb, kn));
+
+	return 0;
+}
+
+static int kernfs_iop_unlink(struct inode *dir, struct dentry *dentry)
+{
+	struct kernfs_node *parent = dir->i_private;
+	struct kernfs_node *kn = d_inode(dentry)->i_private;
+	struct kernfs_syscall_ops *scops = kernfs_root(parent)->syscall_ops;
+	int ret;
+
+
+	if (!scops || !scops->unlink)
+		return -EPERM;
+
+	if (!kernfs_get_active(parent))
+		return -ENODEV;
+
+	ret = scops->unlink(kn);
+	if (ret)
+		return ret;
+
+	drop_nlink(d_inode(dentry));
+	dput(dentry);
+	return 0;
+};
+
 const struct inode_operations kernfs_dir_iops = {
 	.lookup		= kernfs_iop_lookup,
 	.permission	= kernfs_iop_permission,
@@ -1185,6 +1238,8 @@ const struct inode_operations kernfs_dir_iops = {
 	.mkdir		= kernfs_iop_mkdir,
 	.rmdir		= kernfs_iop_rmdir,
 	.rename		= kernfs_iop_rename,
+	.create		= kernfs_iop_create,
+	.unlink		= kernfs_iop_unlink,
 };
 
 static struct kernfs_node *kernfs_leftmost_descendant(struct kernfs_node *pos)
diff --git a/include/linux/kernfs.h b/include/linux/kernfs.h
index 2bf477f86eb1..282b96acbd7e 100644
--- a/include/linux/kernfs.h
+++ b/include/linux/kernfs.h
@@ -179,6 +179,9 @@ struct kernfs_syscall_ops {
 		      const char *new_name);
 	int (*show_path)(struct seq_file *sf, struct kernfs_node *kn,
 			 struct kernfs_root *root);
+	struct kernfs_node* (*create)(struct kernfs_node *parent,
+				      const char *name, umode_t mode);
+	int (*unlink)(struct kernfs_node *kn);
 };
 
 struct kernfs_root {
-- 
2.19.1


^ permalink raw reply related

* [WIP RFC PATCH 0/6] Generic Firmware Variable Filesystem
From: Daniel Axtens @ 2019-05-20  6:25 UTC (permalink / raw)
  To: nayna, cclaudio, linux-fsdevel, greg, linuxppc-dev; +Cc: Daniel Axtens

Hi all,

As PowerNV moves towards secure boot, we need a place to put secure
variables. One option that has been canvassed is to make our secure
variables look like EFI variables. This is an early sketch of another
approach where we create a generic firmware variable file system,
fwvarfs, and an OPAL Secure Variable backend for it.

In short, platforms provide a simple backend that can interface with
the hardware, and fwvarfs deals with translating that into a
filesystem that you can use. Almost all of the hard work is done by
kernfs: fwvarfs provides a pretty thin layer on top of that to make
backends a simple as possible.

Behaviour and the API is documented in Documentation/filesystems/fwvarfs.txt

To demonstrate the concept, a fully functional memory-based backend is
provided, and a read-only but userspace-compatible EFI backend.

For OPAL secure variables, I have taken Claudio's commit, tweaked it
to apply to linux-next, replaced all the EFI support with a generic
API, and then written a backend against that. There's a coming version
from Claudio that moves the opal calls towards a simple key/value
interface rather than (name, vendor) pairs - I haven't waited for
that: this is really just to demonstrate that it could be done rather
than an attempt to get mergable code.  It is also compile tested only
as I haven't yet set myself up with a test machine.

The patches are a bit rough, and there are a number of outstanding
TODOs sprinkled in everywhere. The idea is just to do a proof of
concept to inform our discussions:

 - Is this the sort of approach you'd like (generic vs specific)?
 
 - Does the backend API make sense?
 
 - Is the use of kernfs the correct decision, or is it potentially too
   limiting? (e.g. no ability to do case-insensitivity like efivarfs)

 - Is assuming flat fwvars correct or is there a firmware with a
   hierarchical structure?

Regards,
Daniel

Claudio Carvalho (1):
  powerpc/powernv: Add support for OPAL secure variables

Daniel Axtens (5):
  kernfs: add create() and unlink() hooks
  fwvarfs: a generic firmware variable filesystem
  fwvarfs: efi backend
  powerpc/powernv: Remove EFI support for OPAL secure variables
  fwvarfs: Add opal_secvar backend

 Documentation/filesystems/fwvarfs.txt        | 154 ++++++++++
 arch/powerpc/include/asm/opal-api.h          |   6 +-
 arch/powerpc/include/asm/opal-secvar.h       |  58 ++++
 arch/powerpc/include/asm/opal.h              |  10 +
 arch/powerpc/platforms/powernv/Kconfig       |   8 +
 arch/powerpc/platforms/powernv/Makefile      |   1 +
 arch/powerpc/platforms/powernv/opal-call.c   |   4 +
 arch/powerpc/platforms/powernv/opal-secvar.c | 121 ++++++++
 fs/Kconfig                                   |   1 +
 fs/Makefile                                  |   1 +
 fs/fwvarfs/Kconfig                           |  47 +++
 fs/fwvarfs/Makefile                          |  10 +
 fs/fwvarfs/efi.c                             | 177 +++++++++++
 fs/fwvarfs/fwvarfs.c                         | 294 +++++++++++++++++++
 fs/fwvarfs/fwvarfs.h                         | 124 ++++++++
 fs/fwvarfs/mem.c                             | 113 +++++++
 fs/fwvarfs/opal_secvar.c                     | 218 ++++++++++++++
 fs/kernfs/dir.c                              |  54 ++++
 include/linux/kernfs.h                       |   3 +
 include/uapi/linux/magic.h                   |   1 +
 20 files changed, 1404 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/filesystems/fwvarfs.txt
 create mode 100644 arch/powerpc/include/asm/opal-secvar.h
 create mode 100644 arch/powerpc/platforms/powernv/opal-secvar.c
 create mode 100644 fs/fwvarfs/Kconfig
 create mode 100644 fs/fwvarfs/Makefile
 create mode 100644 fs/fwvarfs/efi.c
 create mode 100644 fs/fwvarfs/fwvarfs.c
 create mode 100644 fs/fwvarfs/fwvarfs.h
 create mode 100644 fs/fwvarfs/mem.c
 create mode 100644 fs/fwvarfs/opal_secvar.c

-- 
2.19.1


^ permalink raw reply

* Re: [PATCH] mm: add account_locked_vm utility function
From: Alexey Kardashevskiy @ 2019-05-20  6:19 UTC (permalink / raw)
  To: Daniel Jordan, akpm
  Cc: Mark Rutland, Davidlohr Bueso, kvm, Alan Tull, linux-fpga,
	linux-kernel, kvm-ppc, linux-mm, Alex Williamson, Jason Gunthorpe,
	Moritz Fischer, Steve Sistare, Christoph Lameter, linuxppc-dev,
	Wu Hao
In-Reply-To: <20190503201629.20512-1-daniel.m.jordan@oracle.com>



On 04/05/2019 06:16, Daniel Jordan wrote:
> locked_vm accounting is done roughly the same way in five places, so
> unify them in a helper.  Standardize the debug prints, which vary
> slightly.

And I rather liked that prints were different and tell precisely which
one of three each printk is.

I commented below but in general this seems working.

Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru>




>  Error codes stay the same, so user-visible behavior does too.
> 
> Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com>
> Cc: Alan Tull <atull@kernel.org>
> Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
> Cc: Alex Williamson <alex.williamson@redhat.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: Christoph Lameter <cl@linux.com>
> Cc: Christophe Leroy <christophe.leroy@c-s.fr>
> Cc: Davidlohr Bueso <dave@stgolabs.net>
> Cc: Jason Gunthorpe <jgg@mellanox.com>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Moritz Fischer <mdf@kernel.org>
> Cc: Paul Mackerras <paulus@ozlabs.org>
> Cc: Steve Sistare <steven.sistare@oracle.com>
> Cc: Wu Hao <hao.wu@intel.com>
> Cc: linux-mm@kvack.org
> Cc: kvm@vger.kernel.org
> Cc: kvm-ppc@vger.kernel.org
> Cc: linuxppc-dev@lists.ozlabs.org
> Cc: linux-fpga@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> ---
> 
> Based on v5.1-rc7.  Tested with the vfio type1 driver.  Any feedback
> welcome.
> 
> Andrew, this one patch replaces these six from [1]:
> 
>     mm-change-locked_vms-type-from-unsigned-long-to-atomic64_t.patch
>     vfio-type1-drop-mmap_sem-now-that-locked_vm-is-atomic.patch
>     vfio-spapr_tce-drop-mmap_sem-now-that-locked_vm-is-atomic.patch
>     fpga-dlf-afu-drop-mmap_sem-now-that-locked_vm-is-atomic.patch
>     kvm-book3s-drop-mmap_sem-now-that-locked_vm-is-atomic.patch
>     powerpc-mmu-drop-mmap_sem-now-that-locked_vm-is-atomic.patch
> 
> That series converts locked_vm to an atomic, but on closer inspection causes at
> least one accounting race in mremap, and fixing it just for this type
> conversion came with too much ugly in the core mm to justify, especially when
> the right long-term fix is making these drivers use pinned_vm instead.
> 
> Christophe's suggestion of cmpxchg[2] does prevent the races he
> mentioned for pinned_vm, but others would still remain.  In perf_mmap
> and the hfi1 driver, pinned_vm is checked against the rlimit racily and
> then later increased when the pinned_vm originally read may have gone
> stale.  Any fixes for that, that I could think of, seem about as good as
> what's there now, so I left it.  I have a patch that uses cmpxchg with
> pinned_vm if others feel strongly that the aforementioned races should
> be fixed.
> 
> Daniel
> 
> [1] https://lore.kernel.org/linux-mm/20190402204158.27582-1-daniel.m.jordan@oracle.com/
> [2] https://lore.kernel.org/linux-mm/964bd5b0-f1e5-7bf0-5c58-18e75c550841@c-s.fr/
> 
>  arch/powerpc/kvm/book3s_64_vio.c    | 44 +++---------------------
>  arch/powerpc/mm/mmu_context_iommu.c | 41 +++-------------------
>  drivers/fpga/dfl-afu-dma-region.c   | 53 +++--------------------------
>  drivers/vfio/vfio_iommu_spapr_tce.c | 52 +++++-----------------------
>  drivers/vfio/vfio_iommu_type1.c     | 23 ++++---------
>  include/linux/mm.h                  | 19 +++++++++++
>  mm/util.c                           | 48 ++++++++++++++++++++++++++
>  7 files changed, 94 insertions(+), 186 deletions(-)
> 
> diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
> index f02b04973710..f7d37fa6003a 100644
> --- a/arch/powerpc/kvm/book3s_64_vio.c
> +++ b/arch/powerpc/kvm/book3s_64_vio.c
> @@ -30,6 +30,7 @@
>  #include <linux/anon_inodes.h>
>  #include <linux/iommu.h>
>  #include <linux/file.h>
> +#include <linux/mm.h>
>  
>  #include <asm/kvm_ppc.h>
>  #include <asm/kvm_book3s.h>
> @@ -56,43 +57,6 @@ static unsigned long kvmppc_stt_pages(unsigned long tce_pages)
>  	return tce_pages + ALIGN(stt_bytes, PAGE_SIZE) / PAGE_SIZE;
>  }
>  
> -static long kvmppc_account_memlimit(unsigned long stt_pages, bool inc)
> -{
> -	long ret = 0;
> -
> -	if (!current || !current->mm)
> -		return ret; /* process exited */
> -
> -	down_write(&current->mm->mmap_sem);
> -
> -	if (inc) {
> -		unsigned long locked, lock_limit;
> -
> -		locked = current->mm->locked_vm + stt_pages;
> -		lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
> -		if (locked > lock_limit && !capable(CAP_IPC_LOCK))
> -			ret = -ENOMEM;
> -		else
> -			current->mm->locked_vm += stt_pages;
> -	} else {
> -		if (WARN_ON_ONCE(stt_pages > current->mm->locked_vm))
> -			stt_pages = current->mm->locked_vm;
> -
> -		current->mm->locked_vm -= stt_pages;
> -	}
> -
> -	pr_debug("[%d] RLIMIT_MEMLOCK KVM %c%ld %ld/%ld%s\n", current->pid,
> -			inc ? '+' : '-',
> -			stt_pages << PAGE_SHIFT,
> -			current->mm->locked_vm << PAGE_SHIFT,
> -			rlimit(RLIMIT_MEMLOCK),
> -			ret ? " - exceeded" : "");
> -
> -	up_write(&current->mm->mmap_sem);
> -
> -	return ret;
> -}
> -
>  static void kvm_spapr_tce_iommu_table_free(struct rcu_head *head)
>  {
>  	struct kvmppc_spapr_tce_iommu_table *stit = container_of(head,
> @@ -277,7 +241,7 @@ static int kvm_spapr_tce_release(struct inode *inode, struct file *filp)
>  
>  	kvm_put_kvm(stt->kvm);
>  
> -	kvmppc_account_memlimit(
> +	account_locked_vm(current->mm,
>  		kvmppc_stt_pages(kvmppc_tce_pages(stt->size)), false);
>  	call_rcu(&stt->rcu, release_spapr_tce_table);
>  
> @@ -303,7 +267,7 @@ long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
>  		return -EINVAL;
>  
>  	npages = kvmppc_tce_pages(size);
> -	ret = kvmppc_account_memlimit(kvmppc_stt_pages(npages), true);
> +	ret = account_locked_vm(current->mm, kvmppc_stt_pages(npages), true);
>  	if (ret)
>  		return ret;
>  
> @@ -359,7 +323,7 @@ long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
>  
>  	kfree(stt);
>   fail_acct:
> -	kvmppc_account_memlimit(kvmppc_stt_pages(npages), false);
> +	account_locked_vm(current->mm, kvmppc_stt_pages(npages), false);
>  	return ret;
>  }
>  
> diff --git a/arch/powerpc/mm/mmu_context_iommu.c b/arch/powerpc/mm/mmu_context_iommu.c
> index 8330f135294f..9e7001a70570 100644
> --- a/arch/powerpc/mm/mmu_context_iommu.c
> +++ b/arch/powerpc/mm/mmu_context_iommu.c
> @@ -19,6 +19,7 @@
>  #include <linux/hugetlb.h>
>  #include <linux/swap.h>
>  #include <linux/sizes.h>
> +#include <linux/mm.h>
>  #include <asm/mmu_context.h>
>  #include <asm/pte-walk.h>
>  #include <linux/mm_inline.h>
> @@ -51,40 +52,6 @@ struct mm_iommu_table_group_mem_t {
>  	u64 dev_hpa;		/* Device memory base address */
>  };
>  
> -static long mm_iommu_adjust_locked_vm(struct mm_struct *mm,
> -		unsigned long npages, bool incr)
> -{
> -	long ret = 0, locked, lock_limit;
> -
> -	if (!npages)
> -		return 0;
> -
> -	down_write(&mm->mmap_sem);
> -
> -	if (incr) {
> -		locked = mm->locked_vm + npages;
> -		lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
> -		if (locked > lock_limit && !capable(CAP_IPC_LOCK))
> -			ret = -ENOMEM;
> -		else
> -			mm->locked_vm += npages;
> -	} else {
> -		if (WARN_ON_ONCE(npages > mm->locked_vm))
> -			npages = mm->locked_vm;
> -		mm->locked_vm -= npages;
> -	}
> -
> -	pr_debug("[%d] RLIMIT_MEMLOCK HASH64 %c%ld %ld/%ld\n",
> -			current ? current->pid : 0,
> -			incr ? '+' : '-',
> -			npages << PAGE_SHIFT,
> -			mm->locked_vm << PAGE_SHIFT,
> -			rlimit(RLIMIT_MEMLOCK));
> -	up_write(&mm->mmap_sem);
> -
> -	return ret;
> -}
> -
>  bool mm_iommu_preregistered(struct mm_struct *mm)
>  {
>  	return !list_empty(&mm->context.iommu_group_mem_list);
> @@ -101,7 +68,7 @@ static long mm_iommu_do_alloc(struct mm_struct *mm, unsigned long ua,
>  	unsigned long entry, chunk;
>  
>  	if (dev_hpa == MM_IOMMU_TABLE_INVALID_HPA) {
> -		ret = mm_iommu_adjust_locked_vm(mm, entries, true);
> +		ret = account_locked_vm(mm, entries, true);
>  		if (ret)
>  			return ret;
>  
> @@ -215,7 +182,7 @@ static long mm_iommu_do_alloc(struct mm_struct *mm, unsigned long ua,
>  	kfree(mem);
>  
>  unlock_exit:
> -	mm_iommu_adjust_locked_vm(mm, locked_entries, false);
> +	account_locked_vm(mm, locked_entries, false);
>  
>  	return ret;
>  }
> @@ -315,7 +282,7 @@ long mm_iommu_put(struct mm_struct *mm, struct mm_iommu_table_group_mem_t *mem)
>  unlock_exit:
>  	mutex_unlock(&mem_list_mutex);
>  
> -	mm_iommu_adjust_locked_vm(mm, unlock_entries, false);
> +	account_locked_vm(mm, unlock_entries, false);
>  
>  	return ret;
>  }
> diff --git a/drivers/fpga/dfl-afu-dma-region.c b/drivers/fpga/dfl-afu-dma-region.c
> index e18a786fc943..059438e17193 100644
> --- a/drivers/fpga/dfl-afu-dma-region.c
> +++ b/drivers/fpga/dfl-afu-dma-region.c
> @@ -12,6 +12,7 @@
>  #include <linux/dma-mapping.h>
>  #include <linux/sched/signal.h>
>  #include <linux/uaccess.h>
> +#include <linux/mm.h>
>  
>  #include "dfl-afu.h"
>  
> @@ -31,52 +32,6 @@ void afu_dma_region_init(struct dfl_feature_platform_data *pdata)
>  	afu->dma_regions = RB_ROOT;
>  }
>  
> -/**
> - * afu_dma_adjust_locked_vm - adjust locked memory
> - * @dev: port device
> - * @npages: number of pages
> - * @incr: increase or decrease locked memory
> - *
> - * Increase or decrease the locked memory size with npages input.
> - *
> - * Return 0 on success.
> - * Return -ENOMEM if locked memory size is over the limit and no CAP_IPC_LOCK.
> - */
> -static int afu_dma_adjust_locked_vm(struct device *dev, long npages, bool incr)
> -{
> -	unsigned long locked, lock_limit;
> -	int ret = 0;
> -
> -	/* the task is exiting. */
> -	if (!current->mm)
> -		return 0;
> -
> -	down_write(&current->mm->mmap_sem);
> -
> -	if (incr) {
> -		locked = current->mm->locked_vm + npages;
> -		lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
> -
> -		if (locked > lock_limit && !capable(CAP_IPC_LOCK))
> -			ret = -ENOMEM;
> -		else
> -			current->mm->locked_vm += npages;
> -	} else {
> -		if (WARN_ON_ONCE(npages > current->mm->locked_vm))
> -			npages = current->mm->locked_vm;
> -		current->mm->locked_vm -= npages;
> -	}
> -
> -	dev_dbg(dev, "[%d] RLIMIT_MEMLOCK %c%ld %ld/%ld%s\n", current->pid,
> -		incr ? '+' : '-', npages << PAGE_SHIFT,
> -		current->mm->locked_vm << PAGE_SHIFT, rlimit(RLIMIT_MEMLOCK),
> -		ret ? "- exceeded" : "");
> -
> -	up_write(&current->mm->mmap_sem);
> -
> -	return ret;
> -}
> -
>  /**
>   * afu_dma_pin_pages - pin pages of given dma memory region
>   * @pdata: feature device platform data
> @@ -92,7 +47,7 @@ static int afu_dma_pin_pages(struct dfl_feature_platform_data *pdata,
>  	struct device *dev = &pdata->dev->dev;
>  	int ret, pinned;
>  
> -	ret = afu_dma_adjust_locked_vm(dev, npages, true);
> +	ret = account_locked_vm(current->mm, npages, true);
>  	if (ret)
>  		return ret;
>  
> @@ -121,7 +76,7 @@ static int afu_dma_pin_pages(struct dfl_feature_platform_data *pdata,
>  free_pages:
>  	kfree(region->pages);
>  unlock_vm:
> -	afu_dma_adjust_locked_vm(dev, npages, false);
> +	account_locked_vm(current->mm, npages, false);
>  	return ret;
>  }
>  
> @@ -141,7 +96,7 @@ static void afu_dma_unpin_pages(struct dfl_feature_platform_data *pdata,
>  
>  	put_all_pages(region->pages, npages);
>  	kfree(region->pages);
> -	afu_dma_adjust_locked_vm(dev, npages, false);
> +	account_locked_vm(current->mm, npages, false);
>  
>  	dev_dbg(dev, "%ld pages unpinned\n", npages);
>  }
> diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c
> index 6b64e45a5269..d39a1b830d82 100644
> --- a/drivers/vfio/vfio_iommu_spapr_tce.c
> +++ b/drivers/vfio/vfio_iommu_spapr_tce.c
> @@ -22,6 +22,7 @@
>  #include <linux/vmalloc.h>
>  #include <linux/sched/mm.h>
>  #include <linux/sched/signal.h>
> +#include <linux/mm.h>
>  
>  #include <asm/iommu.h>
>  #include <asm/tce.h>
> @@ -34,49 +35,13 @@
>  static void tce_iommu_detach_group(void *iommu_data,
>  		struct iommu_group *iommu_group);
>  
> -static long try_increment_locked_vm(struct mm_struct *mm, long npages)
> +static int tce_account_locked_vm(struct mm_struct *mm, unsigned long npages,
> +				 bool inc)
>  {
> -	long ret = 0, locked, lock_limit;
> -
>  	if (WARN_ON_ONCE(!mm))
>  		return -EPERM;


If this WARN_ON is the only reason for having tce_account_locked_vm()
instead of calling account_locked_vm() directly, you can then ditch the
check as I have never ever seen this triggered.


>  
> -	if (!npages)
> -		return 0;
> -
> -	down_write(&mm->mmap_sem);
> -	locked = mm->locked_vm + npages;
> -	lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
> -	if (locked > lock_limit && !capable(CAP_IPC_LOCK))
> -		ret = -ENOMEM;
> -	else
> -		mm->locked_vm += npages;
> -
> -	pr_debug("[%d] RLIMIT_MEMLOCK +%ld %ld/%ld%s\n", current->pid,
> -			npages << PAGE_SHIFT,
> -			mm->locked_vm << PAGE_SHIFT,
> -			rlimit(RLIMIT_MEMLOCK),
> -			ret ? " - exceeded" : "");
> -
> -	up_write(&mm->mmap_sem);
> -
> -	return ret;
> -}
> -
> -static void decrement_locked_vm(struct mm_struct *mm, long npages)
> -{
> -	if (!mm || !npages)
> -		return;
> -
> -	down_write(&mm->mmap_sem);
> -	if (WARN_ON_ONCE(npages > mm->locked_vm))
> -		npages = mm->locked_vm;
> -	mm->locked_vm -= npages;
> -	pr_debug("[%d] RLIMIT_MEMLOCK -%ld %ld/%ld\n", current->pid,
> -			npages << PAGE_SHIFT,
> -			mm->locked_vm << PAGE_SHIFT,
> -			rlimit(RLIMIT_MEMLOCK));
> -	up_write(&mm->mmap_sem);
> +	return account_locked_vm(mm, npages, inc);
>  }
>  
>  /*
> @@ -336,7 +301,7 @@ static int tce_iommu_enable(struct tce_container *container)
>  		return ret;
>  
>  	locked = table_group->tce32_size >> PAGE_SHIFT;
> -	ret = try_increment_locked_vm(container->mm, locked);
> +	ret = tce_account_locked_vm(container->mm, locked, true);
>  	if (ret)
>  		return ret;
>  
> @@ -355,7 +320,7 @@ static void tce_iommu_disable(struct tce_container *container)
>  	container->enabled = false;
>  
>  	BUG_ON(!container->mm);
> -	decrement_locked_vm(container->mm, container->locked_pages);
> +	tce_account_locked_vm(container->mm, container->locked_pages, false);
>  }
>  
>  static void *tce_iommu_open(unsigned long arg)
> @@ -658,7 +623,8 @@ static long tce_iommu_create_table(struct tce_container *container,
>  	if (!table_size)
>  		return -EINVAL;
>  
> -	ret = try_increment_locked_vm(container->mm, table_size >> PAGE_SHIFT);
> +	ret = tce_account_locked_vm(container->mm, table_size >> PAGE_SHIFT,
> +				    true);
>  	if (ret)
>  		return ret;
>  
> @@ -677,7 +643,7 @@ static void tce_iommu_free_table(struct tce_container *container,
>  	unsigned long pages = tbl->it_allocated_size >> PAGE_SHIFT;
>  
>  	iommu_tce_table_put(tbl);
> -	decrement_locked_vm(container->mm, pages);
> +	tce_account_locked_vm(container->mm, pages, false);
>  }
>  
>  static long tce_iommu_create_window(struct tce_container *container,
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index d0f731c9920a..15ac76171ccd 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -273,25 +273,14 @@ static int vfio_lock_acct(struct vfio_dma *dma, long npage, bool async)
>  		return -ESRCH; /* process exited */
>  
>  	ret = down_write_killable(&mm->mmap_sem);
> -	if (!ret) {
> -		if (npage > 0) {
> -			if (!dma->lock_cap) {
> -				unsigned long limit;
> -
> -				limit = task_rlimit(dma->task,
> -						RLIMIT_MEMLOCK) >> PAGE_SHIFT;
> -
> -				if (mm->locked_vm + npage > limit)
> -					ret = -ENOMEM;
> -			}
> -		}
> +	if (ret)
> +		goto out;


A single "goto" to jump just 3 lines below seems unnecessary.


>  
> -		if (!ret)
> -			mm->locked_vm += npage;
> -
> -		up_write(&mm->mmap_sem);
> -	}
> +	ret = __account_locked_vm(mm, abs(npage), npage > 0, dma->task,
> +				  dma->lock_cap);
> +	up_write(&mm->mmap_sem);
>  
> +out:
>  	if (async)
>  		mmput(mm);
>  
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 6b10c21630f5..7134e55ca23f 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1521,6 +1521,25 @@ static inline long get_user_pages_longterm(unsigned long start,
>  int get_user_pages_fast(unsigned long start, int nr_pages, int write,
>  			struct page **pages);
>  
> +int __account_locked_vm(struct mm_struct *mm, unsigned long pages, bool inc,
> +			struct task_struct *task, bool bypass_rlim);
> +
> +static inline int account_locked_vm(struct mm_struct *mm, unsigned long pages,
> +				    bool inc)
> +{
> +	int ret;
> +
> +	if (pages == 0 || !mm)
> +		return 0;
> +
> +	down_write(&mm->mmap_sem);
> +	ret = __account_locked_vm(mm, pages, inc, current,
> +				  capable(CAP_IPC_LOCK));
> +	up_write(&mm->mmap_sem);
> +
> +	return ret;
> +}
> +
>  /* Container for pinned pfns / pages */
>  struct frame_vector {
>  	unsigned int nr_allocated;	/* Number of frames we have space for */
> diff --git a/mm/util.c b/mm/util.c
> index 43a2984bccaa..552302665bc2 100644
> --- a/mm/util.c
> +++ b/mm/util.c
> @@ -6,6 +6,7 @@
>  #include <linux/err.h>
>  #include <linux/sched.h>
>  #include <linux/sched/mm.h>
> +#include <linux/sched/signal.h>
>  #include <linux/sched/task_stack.h>
>  #include <linux/security.h>
>  #include <linux/swap.h>
> @@ -346,6 +347,53 @@ int __weak get_user_pages_fast(unsigned long start,
>  }
>  EXPORT_SYMBOL_GPL(get_user_pages_fast);
>  
> +/**
> + * __account_locked_vm - account locked pages to an mm's locked_vm
> + * @mm:          mm to account against, may be NULL
> + * @pages:       number of pages to account
> + * @inc:         %true if @pages should be considered positive, %false if not
> + * @task:        task used to check RLIMIT_MEMLOCK
> + * @bypass_rlim: %true if checking RLIMIT_MEMLOCK should be skipped
> + *
> + * Assumes @task and @mm are valid (i.e. at least one reference on each), and
> + * that mmap_sem is held as writer.
> + *
> + * Return:
> + * * 0       on success
> + * * 0       if @mm is NULL (can happen for example if the task is exiting)
> + * * -ENOMEM if RLIMIT_MEMLOCK would be exceeded.
> + */
> +int __account_locked_vm(struct mm_struct *mm, unsigned long pages, bool inc,
> +			struct task_struct *task, bool bypass_rlim)
> +{
> +	unsigned long locked_vm, limit;
> +	int ret = 0;
> +
> +	locked_vm = mm->locked_vm;
> +	if (inc) {
> +		if (!bypass_rlim) {
> +			limit = task_rlimit(task, RLIMIT_MEMLOCK) >> PAGE_SHIFT;
> +			if (locked_vm + pages > limit) {
> +				ret = -ENOMEM;
> +				goto out;
> +			}
> +		}

Nit:

if (!ret)

and then you don't need "goto out".


> +		mm->locked_vm = locked_vm + pages;
> +	} else {
> +		WARN_ON_ONCE(pages > locked_vm);
> +		mm->locked_vm = locked_vm - pages;


Can go negative here. Not a huge deal but inaccurate imo.


> +	}
> +
> +out:
> +	pr_debug("%s: [%d] %c%lu %lu/%lu%s\n", __func__, task->pid,
> +		 (inc) ? '+' : '-', pages << PAGE_SHIFT,
> +		 locked_vm << PAGE_SHIFT, task_rlimit(task, RLIMIT_MEMLOCK),
> +		 ret ? " - exceeded" : "");
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(__account_locked_vm);
> +
>  unsigned long vm_mmap_pgoff(struct file *file, unsigned long addr,
>  	unsigned long len, unsigned long prot,
>  	unsigned long flag, unsigned long pgoff)
> 
> base-commit: 37624b58542fb9f2d9a70e6ea006ef8a5f66c30b
> 

-- 
Alexey

^ permalink raw reply

* [PATCH 5/5] asm-generic: remove ptrace.h
From: Christoph Hellwig @ 2019-05-20  6:00 UTC (permalink / raw)
  To: Oleg Nesterov, Arnd Bergmann
  Cc: linux-arch, linux-sh, x86, linux-mips, linux-kernel, linuxppc-dev,
	linux-arm-kernel
In-Reply-To: <20190520060018.25569-1-hch@lst.de>

No one is using this header anymore.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Arnd Bergmann <arnd@arndb.de>
---
 MAINTAINERS                    |  1 -
 arch/mips/include/asm/ptrace.h |  5 ---
 include/asm-generic/ptrace.h   | 74 ----------------------------------
 3 files changed, 80 deletions(-)
 delete mode 100644 include/asm-generic/ptrace.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 5cfbea4ce575..f3392488ffaf 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -12769,7 +12769,6 @@ F:	include/linux/regset.h
 F:	include/linux/tracehook.h
 F:	include/uapi/linux/ptrace.h
 F:	include/uapi/linux/ptrace.h
-F:	include/asm-generic/ptrace.h
 F:	kernel/ptrace.c
 F:	arch/*/ptrace*.c
 F:	arch/*/*/ptrace*.c
diff --git a/arch/mips/include/asm/ptrace.h b/arch/mips/include/asm/ptrace.h
index b6578611dddb..1e76774b36dd 100644
--- a/arch/mips/include/asm/ptrace.h
+++ b/arch/mips/include/asm/ptrace.h
@@ -56,11 +56,6 @@ static inline unsigned long kernel_stack_pointer(struct pt_regs *regs)
 	return regs->regs[31];
 }
 
-/*
- * Don't use asm-generic/ptrace.h it defines FP accessors that don't make
- * sense on MIPS.  We rather want an error if they get invoked.
- */
-
 static inline void instruction_pointer_set(struct pt_regs *regs,
                                            unsigned long val)
 {
diff --git a/include/asm-generic/ptrace.h b/include/asm-generic/ptrace.h
deleted file mode 100644
index 82e674f6b337..000000000000
--- a/include/asm-generic/ptrace.h
+++ /dev/null
@@ -1,74 +0,0 @@
-/*
- * Common low level (register) ptrace helpers
- *
- * Copyright 2004-2011 Analog Devices Inc.
- *
- * Licensed under the GPL-2 or later.
- */
-
-#ifndef __ASM_GENERIC_PTRACE_H__
-#define __ASM_GENERIC_PTRACE_H__
-
-#ifndef __ASSEMBLY__
-
-/* Helpers for working with the instruction pointer */
-#ifndef GET_IP
-#define GET_IP(regs) ((regs)->pc)
-#endif
-#ifndef SET_IP
-#define SET_IP(regs, val) (GET_IP(regs) = (val))
-#endif
-
-static inline unsigned long instruction_pointer(struct pt_regs *regs)
-{
-	return GET_IP(regs);
-}
-static inline void instruction_pointer_set(struct pt_regs *regs,
-                                           unsigned long val)
-{
-	SET_IP(regs, val);
-}
-
-#ifndef profile_pc
-#define profile_pc(regs) instruction_pointer(regs)
-#endif
-
-/* Helpers for working with the user stack pointer */
-#ifndef GET_USP
-#define GET_USP(regs) ((regs)->usp)
-#endif
-#ifndef SET_USP
-#define SET_USP(regs, val) (GET_USP(regs) = (val))
-#endif
-
-static inline unsigned long user_stack_pointer(struct pt_regs *regs)
-{
-	return GET_USP(regs);
-}
-static inline void user_stack_pointer_set(struct pt_regs *regs,
-                                          unsigned long val)
-{
-	SET_USP(regs, val);
-}
-
-/* Helpers for working with the frame pointer */
-#ifndef GET_FP
-#define GET_FP(regs) ((regs)->fp)
-#endif
-#ifndef SET_FP
-#define SET_FP(regs, val) (GET_FP(regs) = (val))
-#endif
-
-static inline unsigned long frame_pointer(struct pt_regs *regs)
-{
-	return GET_FP(regs);
-}
-static inline void frame_pointer_set(struct pt_regs *regs,
-                                     unsigned long val)
-{
-	SET_FP(regs, val);
-}
-
-#endif /* __ASSEMBLY__ */
-
-#endif
-- 
2.20.1


^ permalink raw reply related

* [PATCH 4/5] x86: don't use asm-generic/ptrace.h
From: Christoph Hellwig @ 2019-05-20  6:00 UTC (permalink / raw)
  To: Oleg Nesterov, Arnd Bergmann
  Cc: linux-arch, linux-sh, x86, linux-mips, linux-kernel, linuxppc-dev,
	Ingo Molnar, linux-arm-kernel
In-Reply-To: <20190520060018.25569-1-hch@lst.de>

Doing the indirection through macros for the regs accessors just
makes them harder to read, so implement the helpers directly.

Note that only the helpers actually used are implemented now.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/include/asm/ptrace.h | 30 +++++++++++++++++++++++++-----
 1 file changed, 25 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/ptrace.h b/arch/x86/include/asm/ptrace.h
index 8a7fc0cca2d1..e22816e865ca 100644
--- a/arch/x86/include/asm/ptrace.h
+++ b/arch/x86/include/asm/ptrace.h
@@ -98,7 +98,6 @@ struct cpuinfo_x86;
 struct task_struct;
 
 extern unsigned long profile_pc(struct pt_regs *regs);
-#define profile_pc profile_pc
 
 extern unsigned long
 convert_ip_to_linear(struct task_struct *child, struct pt_regs *regs);
@@ -175,11 +174,32 @@ static inline unsigned long kernel_stack_pointer(struct pt_regs *regs)
 }
 #endif
 
-#define GET_IP(regs) ((regs)->ip)
-#define GET_FP(regs) ((regs)->bp)
-#define GET_USP(regs) ((regs)->sp)
+static inline unsigned long instruction_pointer(struct pt_regs *regs)
+{
+	return regs->ip;
+}
+
+static inline void instruction_pointer_set(struct pt_regs *regs,
+		unsigned long val)
+{
+	regs->ip = val;
+}
+
+static inline unsigned long frame_pointer(struct pt_regs *regs)
+{
+	return regs->bp;
+}
 
-#include <asm-generic/ptrace.h>
+static inline unsigned long user_stack_pointer(struct pt_regs *regs)
+{
+	return regs->sp;
+}
+
+static inline void user_stack_pointer_set(struct pt_regs *regs,
+		unsigned long val)
+{
+	regs->sp = val;
+}
 
 /* Query offset/name of register from its name/offset */
 extern int regs_query_register_offset(const char *name);
-- 
2.20.1


^ permalink raw reply related

* [PATCH 3/5] sh: don't use asm-generic/ptrace.h
From: Christoph Hellwig @ 2019-05-20  6:00 UTC (permalink / raw)
  To: Oleg Nesterov, Arnd Bergmann
  Cc: linux-arch, linux-sh, x86, linux-mips, linux-kernel, linuxppc-dev,
	linux-arm-kernel
In-Reply-To: <20190520060018.25569-1-hch@lst.de>

Doing the indirection through macros for the regs accessors just
makes them harder to read, so implement the helpers directly.

Note that only the helpers actually used are implemented now.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 arch/sh/include/asm/ptrace.h | 29 +++++++++++++++++++++++++----
 1 file changed, 25 insertions(+), 4 deletions(-)

diff --git a/arch/sh/include/asm/ptrace.h b/arch/sh/include/asm/ptrace.h
index 9143c7babcbe..6c89e3e04cee 100644
--- a/arch/sh/include/asm/ptrace.h
+++ b/arch/sh/include/asm/ptrace.h
@@ -16,8 +16,31 @@
 #define user_mode(regs)			(((regs)->sr & 0x40000000)==0)
 #define kernel_stack_pointer(_regs)	((unsigned long)(_regs)->regs[15])
 
-#define GET_FP(regs)	((regs)->regs[14])
-#define GET_USP(regs)	((regs)->regs[15])
+static inline unsigned long instruction_pointer(struct pt_regs *regs)
+{
+	return regs->pc;
+}
+static inline void instruction_pointer_set(struct pt_regs *regs,
+		unsigned long val)
+{
+	regs->pc = val;
+}
+
+static inline unsigned long frame_pointer(struct pt_regs *regs)
+{
+	return regs->regs[14];
+}
+
+static inline unsigned long user_stack_pointer(struct pt_regs *regs)
+{
+	return regs->regs[15];
+}
+
+static inline void user_stack_pointer_set(struct pt_regs *regs,
+		unsigned long val)
+{
+	regs->regs[15] = val;
+}
 
 #define arch_has_single_step()	(1)
 
@@ -112,7 +135,5 @@ static inline unsigned long profile_pc(struct pt_regs *regs)
 
 	return pc;
 }
-#define profile_pc profile_pc
 
-#include <asm-generic/ptrace.h>
 #endif /* __ASM_SH_PTRACE_H */
-- 
2.20.1


^ permalink raw reply related

* [PATCH 2/5] powerpc: don't use asm-generic/ptrace.h
From: Christoph Hellwig @ 2019-05-20  6:00 UTC (permalink / raw)
  To: Oleg Nesterov, Arnd Bergmann
  Cc: linux-arch, linux-sh, x86, linux-mips, linux-kernel, linuxppc-dev,
	linux-arm-kernel
In-Reply-To: <20190520060018.25569-1-hch@lst.de>

Doing the indirection through macros for the regs accessors just
makes them harder to read, so implement the helpers directly.

Note that only the helpers actually used are implemented now.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 arch/powerpc/include/asm/ptrace.h | 29 ++++++++++++++++++++++-------
 1 file changed, 22 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/include/asm/ptrace.h b/arch/powerpc/include/asm/ptrace.h
index 6f047730e642..fc007d186a82 100644
--- a/arch/powerpc/include/asm/ptrace.h
+++ b/arch/powerpc/include/asm/ptrace.h
@@ -115,18 +115,33 @@ struct pt_regs
 
 #ifndef __ASSEMBLY__
 
-#define GET_IP(regs)		((regs)->nip)
-#define GET_USP(regs)		((regs)->gpr[1])
-#define GET_FP(regs)		(0)
-#define SET_FP(regs, val)
+static inline unsigned long instruction_pointer(struct pt_regs *regs)
+{
+	return regs->nip;
+}
+
+static inline void instruction_pointer_set(struct pt_regs *regs,
+		unsigned long val)
+{
+	regs->nip = val;
+}
+
+static inline unsigned long user_stack_pointer(struct pt_regs *regs)
+{
+	return regs->gpr[1];
+}
+
+static inline unsigned long frame_pointer(struct pt_regs *regs)
+{
+	return 0;
+}
 
 #ifdef CONFIG_SMP
 extern unsigned long profile_pc(struct pt_regs *regs);
-#define profile_pc profile_pc
+#else
+#define profile_pc(regs) instruction_pointer(regs)
 #endif
 
-#include <asm-generic/ptrace.h>
-
 #define kernel_stack_pointer(regs) ((regs)->gpr[1])
 static inline int is_syscall_success(struct pt_regs *regs)
 {
-- 
2.20.1


^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox