* Re: [RFC PATCH 1/2] libnvdimm: Add prctl control for disabling synchronous fault support.
From: Aneesh Kumar K.V @ 2020-06-08 7:42 UTC (permalink / raw)
To: Jan Kara, Williams, Dan J
Cc: linux-nvdimm, jack@suse.de, Jeff Moyer, Michal Suchánek,
linuxppc-dev
In-Reply-To: <f8113a5b-be3b-3627-7535-ed2c9e0293f9@linux.ibm.com>
"Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> writes:
> On 6/3/20 1:56 PM, Jan Kara wrote:
>> On Tue 02-06-20 17:59:08, Williams, Dan J wrote:
>>> [ forgive formatting, a series of unfortunate events has me using Outlook for the moment ]
>>>
>>>> From: Jan Kara <jack@suse.cz>
>>>>>>> These flags are device properties that affect the kernel and
>>>>>>> userspace's handling of persistence.
>>>>>>>
>>>>>>
>>>>>> That will not handle the scenario with multiple applications using
>>>>>> the same fsdax mount point where one is updated to use the new
>>>>>> instruction and the other is not.
>>>>>
>>>>> Right, it needs to be a global setting / flag day to switch from one
>>>>> regime to another. Per-process control is a recipe for disaster.
>>>>
>>>> First I'd like to mention that hopefully the concern is mostly theoretical since
>>>> as Aneesh wrote above, real persistent memory never shipped for PPC and
>>>> so there are very few apps (if any) using the old way to ensure cache
>>>> flushing.
>>>>
>>>> But I'd like to understand why do you think per-process control is a recipe for
>>>> disaster? Because from my POV the sysfs interface you propose is actually
>>>> difficult to use in practice. As a distributor, you have hard time picking the
>>>> default because you have a choice between picking safe option which is
>>>> going to confuse users because of failing MAP_SYNC and unsafe option
>>>> where everyone will be happy until someone looses data because of some
>>>> ancient application using wrong instructions to persist data. Poor experience
>>>> for users in either way. And when distro defaults to "safe option", then the
>>>> burden is on the sysadmin to toggle the switch but how is he supposed to
>>>> decide when that is safe? First he has to understand what the problem
>>>> actually is, then he has to audit all the applications using pmem whether they
>>>> use the new instruction - which is IMO a lot of effort if you have a couple of
>>>> applications and practically infeasible if you have more of them.
>>>> So IMO the burden should be *on the application* to declare that it is aware
>>>> of the new instructions to flush pmem on the platform and only to such
>>>> application the kernel should give the trust to use MAP_SYNC mappings.
>>>
>>> The "disaster" in my mind is this need to globally change the ABI for
>>> persistence semantics for all of Linux because one CPU wants a do over.
>>> What does a generic "MAP_SYNC_ENABLE" knob even mean to the existing
>>> deployed base of persistent memory applications? Yes, sysfs is awkward,
>>> but it's trying to provide some relief without imposing unexplainable
>>> semantics on everyone else. I think a comprehensive (overengineered)
>>> solution would involve not introducing another "I know what I'm doing"
>>> flag to the interface, but maybe requiring applications to call a pmem
>>> sync API in something like a vsyscall. Or, also overengineered, some
>>> binary translation / interpretation to actively detect and kill
>>> applications that deploy the old instructions. Something horrid like on
>>> first write fault to a MAP_SYNC try to look ahead in the binary for the
>>> correct sync sequence and kill the application otherwise. That would at
>>> least provide some enforcement and safety without requiring other
>>> architectures to consider what MAP_SYNC_ENABLE means to them.
>>
>> Thanks for explanation. So I absolutely agree that other architectures (and
>> even older versions of POWER architecture) must not be influenced by the new
>> tunable. That's why I wrote in my reply to Aneesh that I'd be for checking
>> during mmap(2) with MAP_SYNC, whether we are in a situation where new PPC
>> flush instructions are required and *only in that case* decide based on the
>> prctl value whether MAP_SYNC should be allowed or not.
>>
>
> v2 version of the patch series does that
>
> https://lore.kernel.org/linuxppc-dev/20200602074909.36738-1-aneesh.kumar@linux.ibm.com/
>
>> Whether this solution is overengineering or not depends on how you think
>> it's likely there will be applications trying to use old flush instructions
>> with MAP_SYNC on POWER10 platforms...
>>
>
> Now considering that with ppc64 we never had a real persistent memory
> device available for the end-user to try and the new instructions are
> only needed on newer hardware, can we assume we have enough time to get
> the userspace to use new instructions?
>
> As a safety net, we can keep the dax device-specific sysfs control. But
> in reality, by the time newer hardware gets released, we can get the
> distributions updated to flip the CONFIG_ARCH_MAP_SYNC_DISABLE=n?
>
> With this:
> 1) vPMEM continues to work and since it is a volatile region. That
> doesn't need any flush instructions.
>
> 2) We get pmdk and other user applications updated to use new
> instructions and make sure updated packages are made available to all
> distributions
>
> 3) On newer hardware, the device will appear with a new compat string.
> Hence older distributions won't initialize pmem on newer hardware.
>
> 4) If we have a newer kernel with an older distro, we use the per
> namespace sysfs knob that prevents the usage of MAP_SYNC.
>
> 5) After a year or so we mark the CONFIG_ARCH_MAP_SYNC_DISABLE=n
> on ppc64 when we are confident that everybody is using the new flush
> instruction.
>
Is this approach ok for distributions? If so I can repost the series
dropping the prctl change.
-aneesh
^ permalink raw reply
* [RFC PATCH] ASoC: fsl_asrc_dma: Fix warning "Cannot create DMA dma:tx symlink"
From: Shengjiu Wang @ 2020-06-08 7:07 UTC (permalink / raw)
To: lars, perex, tiwai, lgirdwood, broonie, timur, nicoleotsuka,
Xiubo.Lee, festevam, alsa-devel, linux-kernel, linuxppc-dev
The issue log is:
[ 48.021506] CPU: 0 PID: 664 Comm: aplay Not tainted 5.7.0-rc1-13120-g12b434cbbea0 #343
[ 48.031063] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
[ 48.037638] [<c0110dd8>] (unwind_backtrace) from [<c010b8ec>] (show_stack+0x10/0x14)
[ 48.045413] [<c010b8ec>] (show_stack) from [<c0557fc0>] (dump_stack+0xe4/0x118)
[ 48.052757] [<c0557fc0>] (dump_stack) from [<c032aeb4>] (sysfs_warn_dup+0x50/0x64)
[ 48.060357] [<c032aeb4>] (sysfs_warn_dup) from [<c032b1a8>] (sysfs_do_create_link_sd+0xc8/0xd4)
[ 48.069086] [<c032b1a8>] (sysfs_do_create_link_sd) from [<c05dc668>] (dma_request_chan+0xb0/0x210)
[ 48.078068] [<c05dc668>] (dma_request_chan) from [<c05dc7d0>] (dma_request_slave_channel+0x8/0x14)
[ 48.087060] [<c05dc7d0>] (dma_request_slave_channel) from [<c09d5cd4>] (fsl_asrc_dma_hw_params+0x1dc/0x434)
[ 48.096831] [<c09d5cd4>] (fsl_asrc_dma_hw_params) from [<c09c143c>] (soc_pcm_hw_params+0x4b0/0x650)
[ 48.105903] [<c09c143c>] (soc_pcm_hw_params) from [<c09c36a8>] (dpcm_fe_dai_hw_params+0x70/0xe4)
[ 48.114715] [<c09c36a8>] (dpcm_fe_dai_hw_params) from [<c099b274>] (snd_pcm_hw_params+0x158/0x418)
[ 48.123701] [<c099b274>] (snd_pcm_hw_params) from [<c099c5a0>] (snd_pcm_ioctl+0x734/0x183c)
[ 48.132079] [<c099c5a0>] (snd_pcm_ioctl) from [<c029ff94>] (ksys_ioctl+0x2ac/0xb98)
[ 48.139765] [<c029ff94>] (ksys_ioctl) from [<c0100080>] (ret_fast_syscall+0x0/0x28)
[ 48.147440] Exception stack(0xed3c5fa8 to 0xed3c5ff0)
[ 48.152515] 5fa0: bec28670 00e92870 00000004 c25c4111 bec28670 0002000f
[ 48.160716] 5fc0: bec28670 00e92870 00e92820 00000036 0000bb80 00000000 0002c2f8 00000003
[ 48.168913] 5fe0: b6f4d3fc bec2853c b6ee7655 b6e22cf8
[ 48.174236] fsl-esai-dai 2024000.esai: Cannot create DMA dma:tx symlink
The dma channel is already requested by Back-End cpu dai driver,
if fsl_asrc_dma requests dma chan with same dma:tx symlink, then
this warning comes out.
The warning is added by
commit 71723a96b8b1 ("dmaengine: Create symlinks between DMA channels and slaves")
commit bad83565eafe ("dmaengine: Cleanups for the slave <-> channel symlink support")
As the dma channel is requested by Back-End, we need to reuse the channel
and avoid to request a new one, then the issue can be fixed.
In order to get the dma channel which is already requested in Back-End.
we export two functions (snd_soc_lookup_component_nolocked and
soc_component_to_pcm), if we can get the dma channel, then reuse it.
if can't, then request a new one.
Signed-off-by: Shengjiu Wang <shengjiu.wang@nxp.com>
---
include/sound/dmaengine_pcm.h | 11 ++++++
include/sound/soc.h | 2 ++
sound/soc/fsl/fsl_asrc_common.h | 2 ++
sound/soc/fsl/fsl_asrc_dma.c | 49 +++++++++++++++++++++------
sound/soc/soc-core.c | 3 +-
sound/soc/soc-generic-dmaengine-pcm.c | 12 -------
6 files changed, 55 insertions(+), 24 deletions(-)
diff --git a/include/sound/dmaengine_pcm.h b/include/sound/dmaengine_pcm.h
index b65220685920..8c5e38180fb0 100644
--- a/include/sound/dmaengine_pcm.h
+++ b/include/sound/dmaengine_pcm.h
@@ -161,4 +161,15 @@ int snd_dmaengine_pcm_prepare_slave_config(struct snd_pcm_substream *substream,
#define SND_DMAENGINE_PCM_DRV_NAME "snd_dmaengine_pcm"
+struct dmaengine_pcm {
+ struct dma_chan *chan[SNDRV_PCM_STREAM_LAST + 1];
+ const struct snd_dmaengine_pcm_config *config;
+ struct snd_soc_component component;
+ unsigned int flags;
+};
+
+static inline struct dmaengine_pcm *soc_component_to_pcm(struct snd_soc_component *p)
+{
+ return container_of(p, struct dmaengine_pcm, component);
+}
#endif
diff --git a/include/sound/soc.h b/include/sound/soc.h
index 74868436ac79..565612a8d690 100644
--- a/include/sound/soc.h
+++ b/include/sound/soc.h
@@ -444,6 +444,8 @@ int devm_snd_soc_register_component(struct device *dev,
const struct snd_soc_component_driver *component_driver,
struct snd_soc_dai_driver *dai_drv, int num_dai);
void snd_soc_unregister_component(struct device *dev);
+struct snd_soc_component *snd_soc_lookup_component_nolocked(struct device *dev,
+ const char *driver_name);
struct snd_soc_component *snd_soc_lookup_component(struct device *dev,
const char *driver_name);
diff --git a/sound/soc/fsl/fsl_asrc_common.h b/sound/soc/fsl/fsl_asrc_common.h
index 77665b15c8db..09512bc79b80 100644
--- a/sound/soc/fsl/fsl_asrc_common.h
+++ b/sound/soc/fsl/fsl_asrc_common.h
@@ -32,6 +32,7 @@ enum asrc_pair_index {
* @dma_chan: inputer and output DMA channels
* @dma_data: private dma data
* @pos: hardware pointer position
+ * @req_dma_chan_dev_to_dev: flag for release dev_to_dev chan
* @private: pair private area
*/
struct fsl_asrc_pair {
@@ -45,6 +46,7 @@ struct fsl_asrc_pair {
struct dma_chan *dma_chan[2];
struct imx_dma_data dma_data;
unsigned int pos;
+ bool req_dma_chan_dev_to_dev;
void *private;
};
diff --git a/sound/soc/fsl/fsl_asrc_dma.c b/sound/soc/fsl/fsl_asrc_dma.c
index d6a3fc5f87e5..3ad862225326 100644
--- a/sound/soc/fsl/fsl_asrc_dma.c
+++ b/sound/soc/fsl/fsl_asrc_dma.c
@@ -135,6 +135,7 @@ static int fsl_asrc_dma_hw_params(struct snd_soc_component *component,
struct snd_dmaengine_dai_dma_data *dma_params_be = NULL;
struct snd_pcm_runtime *runtime = substream->runtime;
struct fsl_asrc_pair *pair = runtime->private_data;
+ struct snd_soc_component *component_be = NULL;
struct fsl_asrc *asrc = pair->asrc;
struct dma_slave_config config_fe, config_be;
enum asrc_pair_index index = pair->index;
@@ -142,7 +143,7 @@ static int fsl_asrc_dma_hw_params(struct snd_soc_component *component,
int stream = substream->stream;
struct imx_dma_data *tmp_data;
struct snd_soc_dpcm *dpcm;
- struct dma_chan *tmp_chan;
+ struct dma_chan *tmp_chan = NULL, *tmp_chan_new = NULL;
struct device *dev_be;
u8 dir = tx ? OUT : IN;
dma_cap_mask_t mask;
@@ -160,6 +161,7 @@ static int fsl_asrc_dma_hw_params(struct snd_soc_component *component,
substream_be = snd_soc_dpcm_get_substream(be, stream);
dma_params_be = snd_soc_dai_get_dma_data(dai, substream_be);
dev_be = dai->dev;
+ component_be = snd_soc_lookup_component_nolocked(dev_be, SND_DMAENGINE_PCM_DRV_NAME);
break;
}
@@ -205,10 +207,16 @@ static int fsl_asrc_dma_hw_params(struct snd_soc_component *component,
*/
if (!asrc->use_edma) {
/* Get DMA request of Back-End */
- tmp_chan = dma_request_slave_channel(dev_be, tx ? "tx" : "rx");
+ if (component_be)
+ tmp_chan = soc_component_to_pcm(component_be)->chan[substream->stream];
+ if (!tmp_chan) {
+ tmp_chan_new = dma_request_slave_channel(dev_be, tx ? "tx" : "rx");
+ tmp_chan = tmp_chan_new;
+ }
tmp_data = tmp_chan->private;
pair->dma_data.dma_request = tmp_data->dma_request;
- dma_release_channel(tmp_chan);
+ if (tmp_chan_new)
+ dma_release_channel(tmp_chan_new);
/* Get DMA request of Front-End */
tmp_chan = asrc->get_dma_channel(pair, dir);
@@ -220,9 +228,26 @@ static int fsl_asrc_dma_hw_params(struct snd_soc_component *component,
pair->dma_chan[dir] =
dma_request_channel(mask, filter, &pair->dma_data);
+ pair->req_dma_chan_dev_to_dev = true;
} else {
- pair->dma_chan[dir] =
- asrc->get_dma_channel(pair, dir);
+ /* With EDMA, there is two dma channels can be used for p2p,
+ * one is from ASRC, one is from another peripheral
+ * (ESAI or SAI) previously we select the dma channel of ASRC,
+ * but find an issue for ideal ratio case, there is no control
+ * for data copy speed, the speed is faster than sample
+ * frequency.
+ *
+ * So we switch to dma channel of peripheral (ESAI or SAI),
+ * that copy speed of DMA is controlled by data consumption
+ * speed in the peripheral FIFO.
+ */
+ pair->req_dma_chan_dev_to_dev = false;
+ if (component_be)
+ pair->dma_chan[dir] = soc_component_to_pcm(component_be)->chan[substream->stream];;
+ if (!pair->dma_chan[dir]) {
+ pair->dma_chan[dir] = dma_request_slave_channel(dev_be, tx ? "tx" : "rx");
+ pair->req_dma_chan_dev_to_dev = true;
+ }
}
if (!pair->dma_chan[dir]) {
@@ -273,19 +298,21 @@ static int fsl_asrc_dma_hw_params(struct snd_soc_component *component,
static int fsl_asrc_dma_hw_free(struct snd_soc_component *component,
struct snd_pcm_substream *substream)
{
+ bool tx = substream->stream == SNDRV_PCM_STREAM_PLAYBACK;
struct snd_pcm_runtime *runtime = substream->runtime;
struct fsl_asrc_pair *pair = runtime->private_data;
+ u8 dir = tx ? OUT : IN;
snd_pcm_set_runtime_buffer(substream, NULL);
- if (pair->dma_chan[IN])
- dma_release_channel(pair->dma_chan[IN]);
+ if (pair->dma_chan[!dir])
+ dma_release_channel(pair->dma_chan[!dir]);
- if (pair->dma_chan[OUT])
- dma_release_channel(pair->dma_chan[OUT]);
+ if (pair->dma_chan[dir] && pair->req_dma_chan_dev_to_dev)
+ dma_release_channel(pair->dma_chan[dir]);
- pair->dma_chan[IN] = NULL;
- pair->dma_chan[OUT] = NULL;
+ pair->dma_chan[!dir] = NULL;
+ pair->dma_chan[dir] = NULL;
return 0;
}
diff --git a/sound/soc/soc-core.c b/sound/soc/soc-core.c
index b07eca2c6ccc..d4c73e86d058 100644
--- a/sound/soc/soc-core.c
+++ b/sound/soc/soc-core.c
@@ -310,7 +310,7 @@ struct snd_soc_component *snd_soc_rtdcom_lookup(struct snd_soc_pcm_runtime *rtd,
}
EXPORT_SYMBOL_GPL(snd_soc_rtdcom_lookup);
-static struct snd_soc_component
+struct snd_soc_component
*snd_soc_lookup_component_nolocked(struct device *dev, const char *driver_name)
{
struct snd_soc_component *component;
@@ -329,6 +329,7 @@ static struct snd_soc_component
return found_component;
}
+EXPORT_SYMBOL_GPL(snd_soc_lookup_component_nolocked);
struct snd_soc_component *snd_soc_lookup_component(struct device *dev,
const char *driver_name)
diff --git a/sound/soc/soc-generic-dmaengine-pcm.c b/sound/soc/soc-generic-dmaengine-pcm.c
index f728309a0833..80a4e71f2d95 100644
--- a/sound/soc/soc-generic-dmaengine-pcm.c
+++ b/sound/soc/soc-generic-dmaengine-pcm.c
@@ -21,18 +21,6 @@
*/
#define SND_DMAENGINE_PCM_FLAG_NO_RESIDUE BIT(31)
-struct dmaengine_pcm {
- struct dma_chan *chan[SNDRV_PCM_STREAM_LAST + 1];
- const struct snd_dmaengine_pcm_config *config;
- struct snd_soc_component component;
- unsigned int flags;
-};
-
-static struct dmaengine_pcm *soc_component_to_pcm(struct snd_soc_component *p)
-{
- return container_of(p, struct dmaengine_pcm, component);
-}
-
static struct device *dmaengine_dma_dev(struct dmaengine_pcm *pcm,
struct snd_pcm_substream *substream)
{
--
2.21.0
^ permalink raw reply related
* Re: [PATCH v2 2/4] powerpc/64/mm: implement page mapping percpu first chunk allocator
From: Aneesh Kumar K.V @ 2020-06-08 7:12 UTC (permalink / raw)
To: linuxppc-dev, mpe; +Cc: cam
In-Reply-To: <20200608070904.387440-2-aneesh.kumar@linux.ibm.com>
"Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> writes:
> Implement page mapping percpu first chunk allocator as a fallback to
> the embedding allocator. With 4K hash translation we limit our page
> table range to 64TB and commit: 0034d395f89d ("powerpc/mm/hash64: Map all the
> kernel regions in the same 0xc range") moved all kernel mapping to
> that 64TB range. In-order to support sparse memory layout we need
> to increase our linear mapping space and reduce other mappings.
>
> With such a layout percpu embedded first chunk allocator will fail
> because of small vmalloc range. Add a fallback to page mapping
> percpu first chunk allocator for such failures.
>
> The below dmesg output can be observed in such case.
>
> percpu: max_distance=0x1ffffef00000 too large for vmalloc space 0x10000000000
> PERCPU: auto allocator failed (-22), falling back to page size
> percpu: 40 4K pages/cpu s148816 r0 d15024
>
This patch requires powersave=off kernel command line to boot. We are
working to make sure we don't access per cpu variables in real mode.
Additionally, you can also try the below workaround patch
modified arch/powerpc/kernel/mce.c
@@ -711,7 +711,7 @@ long hmi_exception_realmode(struct pt_regs *regs)
{
int ret;
- __this_cpu_inc(irq_stat.hmi_exceptions);
+// __this_cpu_inc(irq_stat.hmi_exceptions);
ret = hmi_handle_debugtrig(regs);
if (ret >= 0)
-aneesh
^ permalink raw reply
* [PATCH v2 3/4] powerpc/book3s64/hash/4k: Support large linear mapping range with 4K
From: Aneesh Kumar K.V @ 2020-06-08 7:09 UTC (permalink / raw)
To: linuxppc-dev, mpe; +Cc: cam, Aneesh Kumar K.V
In-Reply-To: <20200608070904.387440-1-aneesh.kumar@linux.ibm.com>
With commit: 0034d395f89d ("powerpc/mm/hash64: Map all the kernel
regions in the same 0xc range"), we now split the 64TB address range
into 4 contexts each of 16TB. That implies we can do only 16TB linear
mapping.
On some systems, eg. Power9, memory attached to nodes > 0 will appear
above 16TB in the linear mapping. This resulted in kernel crash when
we boot such systems in hash translation mode with 4K PAGE_SIZE.
This patch updates the kernel mapping such that we now start supporting upto
61TB of memory with 4K. The kernel mapping now looks like below 4K PAGE_SIZE
and hash translation.
vmalloc start = 0xc0003d0000000000
IO start = 0xc0003e0000000000
vmemmap start = 0xc0003f0000000000
Our MAX_PHYSMEM_BITS for 4K is still 64TB even though we can only map 61TB.
We prevent bolt mapping anything outside 61TB range by checking against
H_VMALLOC_START.
Fixes: 0034d395f89d ("powerpc/mm/hash64: Map all the kernel regions in the same 0xc range")
Reported-by: Cameron Berkenpas <cam@neo-zeon.de>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
arch/powerpc/include/asm/book3s/64/hash-4k.h | 13 ++++++-------
1 file changed, 6 insertions(+), 7 deletions(-)
diff --git a/arch/powerpc/include/asm/book3s/64/hash-4k.h b/arch/powerpc/include/asm/book3s/64/hash-4k.h
index 3f9ae3585ab9..80c953414882 100644
--- a/arch/powerpc/include/asm/book3s/64/hash-4k.h
+++ b/arch/powerpc/include/asm/book3s/64/hash-4k.h
@@ -13,20 +13,19 @@
*/
#define MAX_EA_BITS_PER_CONTEXT 46
-#define REGION_SHIFT (MAX_EA_BITS_PER_CONTEXT - 2)
/*
- * Our page table limit us to 64TB. Hence for the kernel mapping,
- * each MAP area is limited to 16 TB.
- * The four map areas are: linear mapping, vmap, IO and vmemmap
+ * Our page table limit us to 64TB. For 64TB physical memory, we only need 64GB
+ * of vmemmap space. To better support sparse memory layout, we use 61TB
+ * linear map range, 1TB of vmalloc, 1TB of I/O and 1TB of vmememmap.
*/
+#define REGION_SHIFT (40)
#define H_KERN_MAP_SIZE (ASM_CONST(1) << REGION_SHIFT)
/*
- * Define the address range of the kernel non-linear virtual area
- * 16TB
+ * Define the address range of the kernel non-linear virtual area (61TB)
*/
-#define H_KERN_VIRT_START ASM_CONST(0xc000100000000000)
+#define H_KERN_VIRT_START ASM_CONST(0xc0003d0000000000)
#ifndef __ASSEMBLY__
#define H_PTE_TABLE_SIZE (sizeof(pte_t) << H_PTE_INDEX_SIZE)
--
2.26.2
^ permalink raw reply related
* [PATCH v2 4/4] powerpc/mm/book3s: Split radix and hash MAX_PHYSMEM limit
From: Aneesh Kumar K.V @ 2020-06-08 7:09 UTC (permalink / raw)
To: linuxppc-dev, mpe; +Cc: cam, Aneesh Kumar K.V
In-Reply-To: <20200608070904.387440-1-aneesh.kumar@linux.ibm.com>
MAX_PHYSMEM #define is used along with sparsemem to determine the SECTION_SHIFT
value. Powerpc also uses the same value to limit the max memory enabled on the
system. With 4K PAGE_SIZE and hash translation mode, we want to limit the max
memory enabled to 64TB due to page table size restrictions. However, with
radix translation, we don't have these restrictions. Hence split the radix
and hash MA_PHYSMEM limit and use different limit for each of them.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
arch/powerpc/include/asm/book3s/64/hash-4k.h | 5 +++++
arch/powerpc/include/asm/book3s/64/hash-64k.h | 13 +++++++++++++
arch/powerpc/include/asm/book3s/64/mmu-hash.h | 4 ++--
arch/powerpc/include/asm/book3s/64/mmu.h | 15 ---------------
arch/powerpc/include/asm/book3s/64/pgtable.h | 7 +++++++
arch/powerpc/include/asm/book3s/64/radix.h | 16 ++++++++++++++++
arch/powerpc/kernel/prom.c | 5 +++++
arch/powerpc/mm/book3s64/slb.c | 4 ++--
8 files changed, 50 insertions(+), 19 deletions(-)
diff --git a/arch/powerpc/include/asm/book3s/64/hash-4k.h b/arch/powerpc/include/asm/book3s/64/hash-4k.h
index 80c953414882..2e86d9b35766 100644
--- a/arch/powerpc/include/asm/book3s/64/hash-4k.h
+++ b/arch/powerpc/include/asm/book3s/64/hash-4k.h
@@ -22,6 +22,11 @@
#define REGION_SHIFT (40)
#define H_KERN_MAP_SIZE (ASM_CONST(1) << REGION_SHIFT)
+/*
+ * Limits the linear mapping range
+ */
+#define H_MAX_PHYSMEM_BITS 46
+
/*
* Define the address range of the kernel non-linear virtual area (61TB)
*/
diff --git a/arch/powerpc/include/asm/book3s/64/hash-64k.h b/arch/powerpc/include/asm/book3s/64/hash-64k.h
index 0729c034e56f..16c040d19a2b 100644
--- a/arch/powerpc/include/asm/book3s/64/hash-64k.h
+++ b/arch/powerpc/include/asm/book3s/64/hash-64k.h
@@ -7,6 +7,19 @@
#define H_PUD_INDEX_SIZE 10 // size: 8B << 10 = 8KB, maps 2^10 x 16GB = 16TB
#define H_PGD_INDEX_SIZE 8 // size: 8B << 8 = 2KB, maps 2^8 x 16TB = 4PB
+/*
+ * If we store section details in page->flags we can't increase the MAX_PHYSMEM_BITS
+ * if we increase SECTIONS_WIDTH we will not store node details in page->flags and
+ * page_to_nid does a page->section->node lookup
+ * Hence only increase for VMEMMAP. Further depending on SPARSEMEM_EXTREME reduce
+ * memory requirements with large number of sections.
+ * 51 bits is the max physical real address on POWER9
+ */
+#if defined(CONFIG_SPARSEMEM_VMEMMAP) && defined(CONFIG_SPARSEMEM_EXTREME)
+#define H_MAX_PHYSMEM_BITS 51
+#else
+#define H_MAX_PHYSMEM_BITS 46
+#endif
/*
* Each context is 512TB size. SLB miss for first context/default context
diff --git a/arch/powerpc/include/asm/book3s/64/mmu-hash.h b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
index 3fa1b962dc27..84817466a3dd 100644
--- a/arch/powerpc/include/asm/book3s/64/mmu-hash.h
+++ b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
@@ -577,8 +577,8 @@ extern void slb_set_size(u16 size);
* For vmalloc and memmap, we use just one context with 512TB. With 64 byte
* struct page size, we need ony 32 TB in memmap for 2PB (51 bits (MAX_PHYSMEM_BITS)).
*/
-#if (MAX_PHYSMEM_BITS > MAX_EA_BITS_PER_CONTEXT)
-#define MAX_KERNEL_CTX_CNT (1UL << (MAX_PHYSMEM_BITS - MAX_EA_BITS_PER_CONTEXT))
+#if (H_MAX_PHYSMEM_BITS > MAX_EA_BITS_PER_CONTEXT)
+#define MAX_KERNEL_CTX_CNT (1UL << (H_MAX_PHYSMEM_BITS - MAX_EA_BITS_PER_CONTEXT))
#else
#define MAX_KERNEL_CTX_CNT 1
#endif
diff --git a/arch/powerpc/include/asm/book3s/64/mmu.h b/arch/powerpc/include/asm/book3s/64/mmu.h
index 5393a535240c..6202649e3e11 100644
--- a/arch/powerpc/include/asm/book3s/64/mmu.h
+++ b/arch/powerpc/include/asm/book3s/64/mmu.h
@@ -27,21 +27,6 @@ struct mmu_psize_def {
extern struct mmu_psize_def mmu_psize_defs[MMU_PAGE_COUNT];
#endif /* __ASSEMBLY__ */
-/*
- * If we store section details in page->flags we can't increase the MAX_PHYSMEM_BITS
- * if we increase SECTIONS_WIDTH we will not store node details in page->flags and
- * page_to_nid does a page->section->node lookup
- * Hence only increase for VMEMMAP. Further depending on SPARSEMEM_EXTREME reduce
- * memory requirements with large number of sections.
- * 51 bits is the max physical real address on POWER9
- */
-#if defined(CONFIG_SPARSEMEM_VMEMMAP) && defined(CONFIG_SPARSEMEM_EXTREME) && \
- defined(CONFIG_PPC_64K_PAGES)
-#define MAX_PHYSMEM_BITS 51
-#else
-#define MAX_PHYSMEM_BITS 46
-#endif
-
/* 64-bit classic hash table MMU */
#include <asm/book3s/64/mmu-hash.h>
diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
index f17442c3a092..4a6a0c51a733 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -295,6 +295,13 @@ extern unsigned long pci_io_base;
#include <asm/book3s/64/hash.h>
#include <asm/book3s/64/radix.h>
+#if H_MAX_PHYSMEM_BITS > R_MAX_PHYSMEM_BITS
+#define MAX_PHYSMEM_BITS H_MAX_PHYSMEM_BITS
+#else
+#define MAX_PHYSMEM_BITS R_MAX_PHYSMEM_BITS
+#endif
+
+
#ifdef CONFIG_PPC_64K_PAGES
#include <asm/book3s/64/pgtable-64k.h>
#else
diff --git a/arch/powerpc/include/asm/book3s/64/radix.h b/arch/powerpc/include/asm/book3s/64/radix.h
index 0cba794c4fb8..c7813dc628fc 100644
--- a/arch/powerpc/include/asm/book3s/64/radix.h
+++ b/arch/powerpc/include/asm/book3s/64/radix.h
@@ -91,6 +91,22 @@
* +------------------------------+ Kernel linear (0xc.....)
*/
+
+/*
+ * If we store section details in page->flags we can't increase the MAX_PHYSMEM_BITS
+ * if we increase SECTIONS_WIDTH we will not store node details in page->flags and
+ * page_to_nid does a page->section->node lookup
+ * Hence only increase for VMEMMAP. Further depending on SPARSEMEM_EXTREME reduce
+ * memory requirements with large number of sections.
+ * 51 bits is the max physical real address on POWER9
+ */
+
+#if defined(CONFIG_SPARSEMEM_VMEMMAP) && defined(CONFIG_SPARSEMEM_EXTREME)
+#define R_MAX_PHYSMEM_BITS 51
+#else
+#define R_MAX_PHYSMEM_BITS 46
+#endif
+
#define RADIX_KERN_VIRT_START ASM_CONST(0xc008000000000000)
/*
* 49 = MAX_EA_BITS_PER_CONTEXT (hash specific). To make sure we pick
diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
index 6a3bac357e24..238349dd101a 100644
--- a/arch/powerpc/kernel/prom.c
+++ b/arch/powerpc/kernel/prom.c
@@ -771,6 +771,11 @@ void __init early_init_devtree(void *params)
limit = ALIGN(memory_limit ?: memblock_phys_mem_size(), PAGE_SIZE);
memblock_enforce_memory_limit(limit);
+#if defined(CONFIG_PPC_BOOK3S_64) && defined(CONFIG_PPC_4K_PAGES)
+ if (!early_radix_enabled())
+ memblock_cap_memory_range(0, 1UL << (H_MAX_PHYSMEM_BITS));
+#endif
+
memblock_allow_resize();
memblock_dump_all();
diff --git a/arch/powerpc/mm/book3s64/slb.c b/arch/powerpc/mm/book3s64/slb.c
index 8141e8b40ee5..11d65de400e1 100644
--- a/arch/powerpc/mm/book3s64/slb.c
+++ b/arch/powerpc/mm/book3s64/slb.c
@@ -765,8 +765,8 @@ static long slb_allocate_kernel(unsigned long ea, unsigned long id)
if (id == LINEAR_MAP_REGION_ID) {
- /* We only support upto MAX_PHYSMEM_BITS */
- if ((ea & EA_MASK) > (1UL << MAX_PHYSMEM_BITS))
+ /* We only support upto H_MAX_PHYSMEM_BITS */
+ if ((ea & EA_MASK) > (1UL << H_MAX_PHYSMEM_BITS))
return -EFAULT;
flags = SLB_VSID_KERNEL | mmu_psize_defs[mmu_linear_psize].sllp;
--
2.26.2
^ permalink raw reply related
* [PATCH v2 1/4] powerpc/percpu: Update percpu bootmem allocator
From: Aneesh Kumar K.V @ 2020-06-08 7:09 UTC (permalink / raw)
To: linuxppc-dev, mpe; +Cc: cam, Aneesh Kumar K.V
This update the ppc64 version to be closer to x86/sparc.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
arch/powerpc/kernel/setup_64.c | 45 ++++++++++++++++++++++++++++------
1 file changed, 37 insertions(+), 8 deletions(-)
diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index bb47555d48a2..eaddd53a0e13 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -753,17 +753,46 @@ void __init emergency_stack_init(void)
}
#ifdef CONFIG_SMP
-#define PCPU_DYN_SIZE ()
-
-static void * __init pcpu_fc_alloc(unsigned int cpu, size_t size, size_t align)
+/**
+ * pcpu_alloc_bootmem - NUMA friendly alloc_bootmem wrapper for percpu
+ * @cpu: cpu to allocate for
+ * @size: size allocation in bytes
+ * @align: alignment
+ *
+ * Allocate @size bytes aligned at @align for cpu @cpu. This wrapper
+ * does the right thing for NUMA regardless of the current
+ * configuration.
+ *
+ * RETURNS:
+ * Pointer to the allocated area on success, NULL on failure.
+ */
+static void * __init pcpu_alloc_bootmem(unsigned int cpu, size_t size,
+ size_t align)
{
- return memblock_alloc_try_nid(size, align, __pa(MAX_DMA_ADDRESS),
- MEMBLOCK_ALLOC_ACCESSIBLE,
- early_cpu_to_node(cpu));
+ const unsigned long goal = __pa(MAX_DMA_ADDRESS);
+#ifdef CONFIG_NEED_MULTIPLE_NODES
+ int node = early_cpu_to_node(cpu);
+ void *ptr;
+ if (!node_online(node) || !NODE_DATA(node)) {
+ ptr = memblock_alloc_from(size, align, goal);
+ pr_info("cpu %d has no node %d or node-local memory\n",
+ cpu, node);
+ pr_debug("per cpu data for cpu%d %lu bytes at %016lx\n",
+ cpu, size, __pa(ptr));
+ } else {
+ ptr = memblock_alloc_try_nid(size, align, goal,
+ MEMBLOCK_ALLOC_ACCESSIBLE, node);
+ pr_debug("per cpu data for cpu%d %lu bytes on node%d at "
+ "%016lx\n", cpu, size, node, __pa(ptr));
+ }
+ return ptr;
+#else
+ return memblock_alloc_from(size, align, goal);
+#endif
}
-static void __init pcpu_fc_free(void *ptr, size_t size)
+static void __init pcpu_free_bootmem(void *ptr, size_t size)
{
memblock_free(__pa(ptr), size);
}
@@ -798,7 +827,7 @@ void __init setup_per_cpu_areas(void)
atom_size = 1 << 20;
rc = pcpu_embed_first_chunk(0, dyn_size, atom_size, pcpu_cpu_distance,
- pcpu_fc_alloc, pcpu_fc_free);
+ pcpu_alloc_bootmem, pcpu_free_bootmem);
if (rc < 0)
panic("cannot initialize percpu area (err=%d)", rc);
--
2.26.2
^ permalink raw reply related
* [PATCH v2 2/4] powerpc/64/mm: implement page mapping percpu first chunk allocator
From: Aneesh Kumar K.V @ 2020-06-08 7:09 UTC (permalink / raw)
To: linuxppc-dev, mpe; +Cc: cam, Aneesh Kumar K.V
In-Reply-To: <20200608070904.387440-1-aneesh.kumar@linux.ibm.com>
Implement page mapping percpu first chunk allocator as a fallback to
the embedding allocator. With 4K hash translation we limit our page
table range to 64TB and commit: 0034d395f89d ("powerpc/mm/hash64: Map all the
kernel regions in the same 0xc range") moved all kernel mapping to
that 64TB range. In-order to support sparse memory layout we need
to increase our linear mapping space and reduce other mappings.
With such a layout percpu embedded first chunk allocator will fail
because of small vmalloc range. Add a fallback to page mapping
percpu first chunk allocator for such failures.
The below dmesg output can be observed in such case.
percpu: max_distance=0x1ffffef00000 too large for vmalloc space 0x10000000000
PERCPU: auto allocator failed (-22), falling back to page size
percpu: 40 4K pages/cpu s148816 r0 d15024
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
arch/powerpc/Kconfig | 5 ++-
arch/powerpc/kernel/setup_64.c | 62 ++++++++++++++++++++++++++++++++--
2 files changed, 63 insertions(+), 4 deletions(-)
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 9fa23eb320ff..820365c42065 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -59,7 +59,10 @@ config HAVE_SETUP_PER_CPU_AREA
def_bool PPC64
config NEED_PER_CPU_EMBED_FIRST_CHUNK
- def_bool PPC64
+ def_bool y if PPC64
+
+config NEED_PER_CPU_PAGE_FIRST_CHUNK
+ def_bool y if PPC64
config NR_IRQS
int "Number of virtual interrupt numbers"
diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index eaddd53a0e13..6090d8290561 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -66,6 +66,7 @@
#include <asm/feature-fixups.h>
#include <asm/kup.h>
#include <asm/early_ioremap.h>
+#include <asm/pgalloc.h>
#include "setup.h"
@@ -808,13 +809,58 @@ static int pcpu_cpu_distance(unsigned int from, unsigned int to)
unsigned long __per_cpu_offset[NR_CPUS] __read_mostly;
EXPORT_SYMBOL(__per_cpu_offset);
+static void __init pcpu_populate_pte(unsigned long addr)
+{
+ pgd_t *pgd = pgd_offset_k(addr);
+ p4d_t *p4d;
+ pud_t *pud;
+ pmd_t *pmd;
+
+ p4d = p4d_offset(pgd, addr);
+ if (p4d_none(*p4d)) {
+ pud_t *new;
+
+ new = memblock_alloc(PUD_TABLE_SIZE, PUD_TABLE_SIZE);
+ if (!new)
+ goto err_alloc;
+ p4d_populate(&init_mm, p4d, new);
+ }
+
+ pud = pud_offset(p4d, addr);
+ if (pud_none(*pud)) {
+ pmd_t *new;
+
+ new = memblock_alloc(PMD_TABLE_SIZE, PMD_TABLE_SIZE);
+ if (!new)
+ goto err_alloc;
+ pud_populate(&init_mm, pud, new);
+ }
+
+ pmd = pmd_offset(pud, addr);
+ if (!pmd_present(*pmd)) {
+ pte_t *new;
+
+ new = memblock_alloc(PTE_TABLE_SIZE, PTE_TABLE_SIZE);
+ if (!new)
+ goto err_alloc;
+ pmd_populate_kernel(&init_mm, pmd, new);
+ }
+
+ return;
+
+err_alloc:
+ panic("%s: Failed to allocate %lu bytes align=%lx from=%lx\n",
+ __func__, PAGE_SIZE, PAGE_SIZE, PAGE_SIZE);
+}
+
+
void __init setup_per_cpu_areas(void)
{
const size_t dyn_size = PERCPU_MODULE_RESERVE + PERCPU_DYNAMIC_RESERVE;
size_t atom_size;
unsigned long delta;
unsigned int cpu;
- int rc;
+ int rc = -EINVAL;
/*
* Linear mapping is one of 4K, 1M and 16M. For 4K, no need
@@ -826,8 +872,18 @@ void __init setup_per_cpu_areas(void)
else
atom_size = 1 << 20;
- rc = pcpu_embed_first_chunk(0, dyn_size, atom_size, pcpu_cpu_distance,
- pcpu_alloc_bootmem, pcpu_free_bootmem);
+ if (pcpu_chosen_fc != PCPU_FC_PAGE) {
+ rc = pcpu_embed_first_chunk(0, dyn_size, atom_size, pcpu_cpu_distance,
+ pcpu_alloc_bootmem, pcpu_free_bootmem);
+ if (rc)
+ pr_warn("PERCPU: %s allocator failed (%d), "
+ "falling back to page size\n",
+ pcpu_fc_names[pcpu_chosen_fc], rc);
+ }
+
+ if (rc < 0)
+ rc = pcpu_page_first_chunk(0, pcpu_alloc_bootmem, pcpu_free_bootmem,
+ pcpu_populate_pte);
if (rc < 0)
panic("cannot initialize percpu area (err=%d)", rc);
--
2.26.2
^ permalink raw reply related
* Re: [PATCH] tty: serial: cpm_uart: Fix behaviour for non existing GPIOs
From: Johan Hovold @ 2020-06-08 6:41 UTC (permalink / raw)
To: Christophe Leroy
Cc: Greg Kroah-Hartman, Linus Walleij, linux-kernel, linux-serial,
Jiri Slaby, linuxppc-dev
In-Reply-To: <bafd8df9e743c433196c727293c5015620fae2b8.1591428452.git.christophe.leroy@csgroup.eu>
On Sat, Jun 06, 2020 at 07:30:21AM +0000, Christophe Leroy wrote:
> devm_gpiod_get_index() doesn't return NULL but -ENOENT when the
> requested GPIO doesn't exist, leading to the following messages:
>
> [ 2.742468] gpiod_direction_input: invalid GPIO (errorpointer)
> [ 2.748147] can't set direction for gpio #2: -2
> [ 2.753081] gpiod_direction_input: invalid GPIO (errorpointer)
> [ 2.758724] can't set direction for gpio #3: -2
> [ 2.763666] gpiod_direction_output: invalid GPIO (errorpointer)
> [ 2.769394] can't set direction for gpio #4: -2
> [ 2.774341] gpiod_direction_input: invalid GPIO (errorpointer)
> [ 2.779981] can't set direction for gpio #5: -2
> [ 2.784545] ff000a20.serial: ttyCPM1 at MMIO 0xfff00a20 (irq = 39, base_baud = 8250000) is a CPM UART
>
> Use IS_ERR_OR_NULL() to properly check gpiod validity.
Why check for NULL at all?
> Fixes: 97cbaf2c829b ("tty: serial: cpm_uart: Convert to use GPIO descriptors")
> Cc: stable@vger.kernel.org
> Cc: Linus Walleij <linus.walleij@linaro.org>
> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
> ---
> drivers/tty/serial/cpm_uart/cpm_uart_core.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/tty/serial/cpm_uart/cpm_uart_core.c b/drivers/tty/serial/cpm_uart/cpm_uart_core.c
> index a04f74d2e854..3cbe24802296 100644
> --- a/drivers/tty/serial/cpm_uart/cpm_uart_core.c
> +++ b/drivers/tty/serial/cpm_uart/cpm_uart_core.c
> @@ -1217,7 +1217,7 @@ static int cpm_uart_init_port(struct device_node *np,
>
> gpiod = devm_gpiod_get_index(dev, NULL, i, GPIOD_ASIS);
>
> - if (gpiod) {
> + if (!IS_ERR_OR_NULL(gpiod)) {
> if (i == GPIO_RTS || i == GPIO_DTR)
> ret = gpiod_direction_output(gpiod, 0);
> else
Johan
^ permalink raw reply
* [PATCH] mm/debug_vm_pgtable: Fix kernel crash with page table validate
From: Aneesh Kumar K.V @ 2020-06-08 6:27 UTC (permalink / raw)
To: linux-mm; +Cc: akpm, anshuman.khandual, linuxppc-dev, Aneesh Kumar K.V
Architectures can have CONFIG_TRANSPARENT_HUGEPAGE enabled but
no THP support enabled based on platforms. For ex: with 4K
PAGE_SIZE ppc64 supports THP only with radix translation.
This results in below crash when running with hash translation and
4K PAGE_SIZE.
kernel BUG at arch/powerpc/include/asm/book3s/64/hash-4k.h:140!
cpu 0x61: Vector: 700 (Program Check) at [c000000ff948f860]
pc: c0000000018810f8: debug_vm_pgtable+0x480/0x8b0
lr: c0000000018810ec: debug_vm_pgtable+0x474/0x8b0
...
[c000000ff948faf0] c000000001880fec debug_vm_pgtable+0x374/0x8b0 (unreliable)
[c000000ff948fbf0] c000000000011648 do_one_initcall+0x98/0x4f0
[c000000ff948fcd0] c000000001843928 kernel_init_freeable+0x330/0x3fc
[c000000ff948fdb0] c0000000000122ac kernel_init+0x24/0x148
[c000000ff948fe20] c00000000000cc44 ret_from_kernel_thread+0x5c/0x78
Check for THP support correctly
Cc: anshuman.khandual@arm.com
Fixes: 399145f9eb6c ("mm/debug: add tests validating architecture page table helpers")
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
mm/debug_vm_pgtable.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
index 188c18908964..e60151c5e997 100644
--- a/mm/debug_vm_pgtable.c
+++ b/mm/debug_vm_pgtable.c
@@ -61,6 +61,9 @@ static void __init pmd_basic_tests(unsigned long pfn, pgprot_t prot)
{
pmd_t pmd = pfn_pmd(pfn, prot);
+ if (!has_transparent_hugepage())
+ return;
+
WARN_ON(!pmd_same(pmd, pmd));
WARN_ON(!pmd_young(pmd_mkyoung(pmd_mkold(pmd))));
WARN_ON(!pmd_dirty(pmd_mkdirty(pmd_mkclean(pmd))));
--
2.26.2
^ permalink raw reply related
* Re: [v1 PATCH 1/2] Refactoring carrying over IMA measuremnet logs over Kexec.
From: kernel test robot @ 2020-06-08 2:29 UTC (permalink / raw)
To: Prakhar Srivastava, linux-arm-kernel, linux-kernel, linuxppc-dev,
devicetree, linux-integrity, linux-security-module
Cc: kstewart, mark.rutland, bhsharma, kbuild-all, catalin.marinas
In-Reply-To: <20200607233323.22375-2-prsriva@linux.microsoft.com>
[-- Attachment #1: Type: text/plain, Size: 5700 bytes --]
Hi Prakhar,
Thank you for the patch! Perhaps something to improve:
[auto build test WARNING on arm64/for-next/core]
[also build test WARNING on powerpc/next soc/for-next v5.7 next-20200605]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]
url: https://github.com/0day-ci/linux/commits/Prakhar-Srivastava/Adding-support-to-carry-IMA-measurement-logs/20200608-073805
base: https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git for-next/core
config: arm64-allyesconfig (attached as .config)
compiler: aarch64-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=arm64
If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
All warnings (new ones prefixed by >>, old ones prefixed by <<):
>> security/integrity/ima/ima_kexec.c:59:5: warning: no previous prototype for 'ima_get_kexec_buffer' [-Wmissing-prototypes]
59 | int ima_get_kexec_buffer(void **addr, size_t *size)
| ^~~~~~~~~~~~~~~~~~~~
>> security/integrity/ima/ima_kexec.c:85:5: warning: no previous prototype for 'delete_fdt_mem_rsv' [-Wmissing-prototypes]
85 | int delete_fdt_mem_rsv(void *fdt, unsigned long start, unsigned long size)
| ^~~~~~~~~~~~~~~~~~
>> security/integrity/ima/ima_kexec.c:115:5: warning: no previous prototype for 'ima_free_kexec_buffer' [-Wmissing-prototypes]
115 | int ima_free_kexec_buffer(void)
| ^~~~~~~~~~~~~~~~~~~~~
>> security/integrity/ima/ima_kexec.c:144:6: warning: no previous prototype for 'remove_ima_buffer' [-Wmissing-prototypes]
144 | void remove_ima_buffer(void *fdt, int chosen_node)
| ^~~~~~~~~~~~~~~~~
security/integrity/ima/ima_kexec.c:231:6: warning: no previous prototype for 'ima_add_kexec_buffer' [-Wmissing-prototypes]
231 | void ima_add_kexec_buffer(struct kimage *image)
| ^~~~~~~~~~~~~~~~~~~~
vim +/ima_get_kexec_buffer +59 security/integrity/ima/ima_kexec.c
51
52 /**
53 * ima_get_kexec_buffer - get IMA buffer from the previous kernel
54 * @addr: On successful return, set to point to the buffer contents.
55 * @size: On successful return, set to the buffer size.
56 *
57 * Return: 0 on success, negative errno on error.
58 */
> 59 int ima_get_kexec_buffer(void **addr, size_t *size)
60 {
61 int ret, len;
62 unsigned long tmp_addr;
63 size_t tmp_size;
64 const void *prop;
65
66 prop = of_get_property(of_chosen, "linux,ima-kexec-buffer", &len);
67 if (!prop)
68 return -ENOENT;
69
70 ret = do_get_kexec_buffer(prop, len, &tmp_addr, &tmp_size);
71 if (ret)
72 return ret;
73
74 *addr = __va(tmp_addr);
75 *size = tmp_size;
76
77 return 0;
78 }
79
80 /**
81 * delete_fdt_mem_rsv - delete memory reservation with given address and size
82 *
83 * Return: 0 on success, or negative errno on error.
84 */
> 85 int delete_fdt_mem_rsv(void *fdt, unsigned long start, unsigned long size)
86 {
87 int i, ret, num_rsvs = fdt_num_mem_rsv(fdt);
88
89 for (i = 0; i < num_rsvs; i++) {
90 uint64_t rsv_start, rsv_size;
91
92 ret = fdt_get_mem_rsv(fdt, i, &rsv_start, &rsv_size);
93 if (ret) {
94 pr_err("Malformed device tree.\n");
95 return -EINVAL;
96 }
97
98 if (rsv_start == start && rsv_size == size) {
99 ret = fdt_del_mem_rsv(fdt, i);
100 if (ret) {
101 pr_err("Error deleting device tree reservation.\n");
102 return -EINVAL;
103 }
104
105 return 0;
106 }
107 }
108
109 return -ENOENT;
110 }
111
112 /**
113 * ima_free_kexec_buffer - free memory used by the IMA buffer
114 */
> 115 int ima_free_kexec_buffer(void)
116 {
117 int ret;
118 unsigned long addr;
119 size_t size;
120 struct property *prop;
121
122 prop = of_find_property(of_chosen, "linux,ima-kexec-buffer", NULL);
123 if (!prop)
124 return -ENOENT;
125
126 ret = do_get_kexec_buffer(prop->value, prop->length, &addr, &size);
127 if (ret)
128 return ret;
129
130 ret = of_remove_property(of_chosen, prop);
131 if (ret)
132 return ret;
133
134 return memblock_free(addr, size);
135
136 }
137
138 /**
139 * remove_ima_buffer - remove the IMA buffer property and reservation from @fdt
140 *
141 * The IMA measurement buffer is of no use to a subsequent kernel, so we always
142 * remove it from the device tree.
143 */
> 144 void remove_ima_buffer(void *fdt, int chosen_node)
145 {
146 int ret, len;
147 unsigned long addr;
148 size_t size;
149 const void *prop;
150
151 prop = fdt_getprop(fdt, chosen_node, "linux,ima-kexec-buffer", &len);
152 if (!prop)
153 return;
154
155 ret = do_get_kexec_buffer(prop, len, &addr, &size);
156 fdt_delprop(fdt, chosen_node, "linux,ima-kexec-buffer");
157 if (ret)
158 return;
159
160 ret = delete_fdt_mem_rsv(fdt, addr, size);
161 if (!ret)
162 pr_debug("Removed old IMA buffer reservation.\n");
163 }
164
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 71849 bytes --]
^ permalink raw reply
* Re: [v1 PATCH 1/2] Refactoring carrying over IMA measuremnet logs over Kexec.
From: kernel test robot @ 2020-06-08 1:47 UTC (permalink / raw)
To: Prakhar Srivastava, linux-arm-kernel, linux-kernel, linuxppc-dev,
devicetree, linux-integrity, linux-security-module
Cc: kstewart, mark.rutland, bhsharma, kbuild-all, catalin.marinas
In-Reply-To: <20200607233323.22375-2-prsriva@linux.microsoft.com>
[-- Attachment #1: Type: text/plain, Size: 4642 bytes --]
Hi Prakhar,
Thank you for the patch! Yet something to improve:
[auto build test ERROR on arm64/for-next/core]
[also build test ERROR on powerpc/next soc/for-next v5.7 next-20200605]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]
url: https://github.com/0day-ci/linux/commits/Prakhar-Srivastava/Adding-support-to-carry-IMA-measurement-logs/20200608-073805
base: https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git for-next/core
config: powerpc-allyesconfig (attached as .config)
compiler: powerpc64-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=powerpc
If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
All errors (new ones prefixed by >>, old ones prefixed by <<):
arch/powerpc/kexec/ima.c:24:5: warning: no previous prototype for 'arch_ima_add_kexec_buffer' [-Wmissing-prototypes]
24 | int arch_ima_add_kexec_buffer(struct kimage *image, unsigned long load_addr,
| ^~~~~~~~~~~~~~~~~~~~~~~~~
arch/powerpc/kexec/ima.c:62:5: warning: no previous prototype for 'setup_ima_buffer' [-Wmissing-prototypes]
62 | int setup_ima_buffer(const struct kimage *image, void *fdt, int chosen_node)
| ^~~~~~~~~~~~~~~~
arch/powerpc/kexec/ima.c: In function 'setup_ima_buffer':
>> arch/powerpc/kexec/ima.c:71:8: error: implicit declaration of function 'get_addr_size_cells'; did you mean 'fdt_address_cells'? [-Werror=implicit-function-declaration]
71 | ret = get_addr_size_cells(&addr_cells, &size_cells);
| ^~~~~~~~~~~~~~~~~~~
| fdt_address_cells
cc1: some warnings being treated as errors
vim +71 arch/powerpc/kexec/ima.c
ab6b1d1fc4aae6 arch/powerpc/kernel/ima_kexec.c Thiago Jung Bauermann 2016-12-19 53
ab6b1d1fc4aae6 arch/powerpc/kernel/ima_kexec.c Thiago Jung Bauermann 2016-12-19 54 /**
ab6b1d1fc4aae6 arch/powerpc/kernel/ima_kexec.c Thiago Jung Bauermann 2016-12-19 55 * setup_ima_buffer - add IMA buffer information to the fdt
ab6b1d1fc4aae6 arch/powerpc/kernel/ima_kexec.c Thiago Jung Bauermann 2016-12-19 56 * @image: kexec image being loaded.
ab6b1d1fc4aae6 arch/powerpc/kernel/ima_kexec.c Thiago Jung Bauermann 2016-12-19 57 * @fdt: Flattened device tree for the next kernel.
ab6b1d1fc4aae6 arch/powerpc/kernel/ima_kexec.c Thiago Jung Bauermann 2016-12-19 58 * @chosen_node: Offset to the chosen node.
ab6b1d1fc4aae6 arch/powerpc/kernel/ima_kexec.c Thiago Jung Bauermann 2016-12-19 59 *
ab6b1d1fc4aae6 arch/powerpc/kernel/ima_kexec.c Thiago Jung Bauermann 2016-12-19 60 * Return: 0 on success, or negative errno on error.
ab6b1d1fc4aae6 arch/powerpc/kernel/ima_kexec.c Thiago Jung Bauermann 2016-12-19 61 */
ab6b1d1fc4aae6 arch/powerpc/kernel/ima_kexec.c Thiago Jung Bauermann 2016-12-19 62 int setup_ima_buffer(const struct kimage *image, void *fdt, int chosen_node)
ab6b1d1fc4aae6 arch/powerpc/kernel/ima_kexec.c Thiago Jung Bauermann 2016-12-19 63 {
ab6b1d1fc4aae6 arch/powerpc/kernel/ima_kexec.c Thiago Jung Bauermann 2016-12-19 64 int ret, addr_cells, size_cells, entry_size;
ab6b1d1fc4aae6 arch/powerpc/kernel/ima_kexec.c Thiago Jung Bauermann 2016-12-19 65 u8 value[16];
ab6b1d1fc4aae6 arch/powerpc/kernel/ima_kexec.c Thiago Jung Bauermann 2016-12-19 66
aea659ce44ba07 arch/powerpc/kexec/ima.c Prakhar Srivastava 2020-06-07 67 // remove_ima_buffer(fdt, chosen_node);
ab6b1d1fc4aae6 arch/powerpc/kernel/ima_kexec.c Thiago Jung Bauermann 2016-12-19 68 if (!image->arch.ima_buffer_size)
ab6b1d1fc4aae6 arch/powerpc/kernel/ima_kexec.c Thiago Jung Bauermann 2016-12-19 69 return 0;
ab6b1d1fc4aae6 arch/powerpc/kernel/ima_kexec.c Thiago Jung Bauermann 2016-12-19 70
ab6b1d1fc4aae6 arch/powerpc/kernel/ima_kexec.c Thiago Jung Bauermann 2016-12-19 @71 ret = get_addr_size_cells(&addr_cells, &size_cells);
:::::: The code at line 71 was first introduced by commit
:::::: ab6b1d1fc4aae6b8bd6fb1422405568094c9b40f powerpc: ima: send the kexec buffer to the next kernel
:::::: TO: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
:::::: CC: Linus Torvalds <torvalds@linux-foundation.org>
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 66079 bytes --]
^ permalink raw reply
* Re: [v1 PATCH 1/2] Refactoring carrying over IMA measuremnet logs over Kexec.
From: kernel test robot @ 2020-06-08 1:35 UTC (permalink / raw)
To: Prakhar Srivastava, linux-arm-kernel, linux-kernel, linuxppc-dev,
devicetree, linux-integrity, linux-security-module
Cc: kstewart, mark.rutland, kbuild-all, catalin.marinas, bhsharma,
clang-built-linux
In-Reply-To: <20200607233323.22375-2-prsriva@linux.microsoft.com>
[-- Attachment #1: Type: text/plain, Size: 2638 bytes --]
Hi Prakhar,
Thank you for the patch! Perhaps something to improve:
[auto build test WARNING on arm64/for-next/core]
[also build test WARNING on powerpc/next soc/for-next v5.7 next-20200605]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]
url: https://github.com/0day-ci/linux/commits/Prakhar-Srivastava/Adding-support-to-carry-IMA-measurement-logs/20200608-073805
base: https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git for-next/core
config: arm64-randconfig-r012-20200607 (attached as .config)
compiler: clang version 11.0.0 (https://github.com/llvm/llvm-project e429cffd4f228f70c1d9df0e5d77c08590dd9766)
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# install arm64 cross compiling tool for clang build
# apt-get install binutils-aarch64-linux-gnu
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=arm64
If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
All warnings (new ones prefixed by >>, old ones prefixed by <<):
>> arch/arm64/kernel/machine_kexec_file.c:50:5: warning: no previous prototype for function 'arch_ima_add_kexec_buffer' [-Wmissing-prototypes]
int arch_ima_add_kexec_buffer(struct kimage *image, unsigned long load_addr,
^
arch/arm64/kernel/machine_kexec_file.c:50:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
int arch_ima_add_kexec_buffer(struct kimage *image, unsigned long load_addr,
^
static
1 warning generated.
vim +/arch_ima_add_kexec_buffer +50 arch/arm64/kernel/machine_kexec_file.c
41
42 /**
43 * arch_ima_add_kexec_buffer - do arch-specific steps to add the IMA buffer
44 *
45 * Architectures should use this function to pass on the IMA buffer
46 * information to the next kernel.
47 *
48 * Return: 0 on success, negative errno on error.
49 */
> 50 int arch_ima_add_kexec_buffer(struct kimage *image, unsigned long load_addr,
51 size_t size)
52 {
53 image->arch.ima_buffer_addr = load_addr;
54 image->arch.ima_buffer_size = size;
55 return 0;
56 }
57
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 31953 bytes --]
^ permalink raw reply
* [v1 PATCH 2/2] Add Documentation regarding the ima-kexec-buffer node in the chosen node documentation
From: Prakhar Srivastava @ 2020-06-07 23:33 UTC (permalink / raw)
To: linux-arm-kernel, linux-kernel, linuxppc-dev, devicetree,
linux-integrity, linux-security-module
Cc: kstewart, mark.rutland, catalin.marinas, bhsharma, tao.li, zohar,
paulus, vincenzo.frascino, frowand.list, nramas, masahiroy,
jmorris, takahiro.akashi, serge, pasha.tatashin, will, prsriva,
robh+dt, hsinyi, tusharsu, tglx, allison, christophe.leroy,
mbrugger, balajib, dmitry.kasatkin, james.morse, gregkh
In-Reply-To: <20200607233323.22375-1-prsriva@linux.microsoft.com>
Add Documentation regarding the ima-kexec-buffer node in
the chosen node documentation
Signed-off-by: Prakhar Srivastava <prsriva@linux.microsoft.com>
---
Documentation/devicetree/bindings/chosen.txt | 17 +++++++++++++++++
1 file changed, 17 insertions(+)
diff --git a/Documentation/devicetree/bindings/chosen.txt b/Documentation/devicetree/bindings/chosen.txt
index 45e79172a646..a15f70c007ef 100644
--- a/Documentation/devicetree/bindings/chosen.txt
+++ b/Documentation/devicetree/bindings/chosen.txt
@@ -135,3 +135,20 @@ e.g.
linux,initrd-end = <0x82800000>;
};
};
+
+linux,ima-kexec-buffer
+----------------------
+
+This property(currently used by powerpc, arm64) holds the memory range,
+the address and the size, of the IMA measurement logs that are being carried
+over to the kexec session.
+
+/ {
+ chosen {
+ linux,ima-kexec-buffer = <0x9 0x82000000 0x0 0x00008000>;
+ };
+};
+
+This porperty does not represent real hardware, but the memory allocated for
+carrying the IMA measurement logs. The address and the suze are expressed in
+#address-cells and #size-cells, respectively of the root node.
--
2.25.1
^ permalink raw reply related
* [v1 PATCH 0/2] Adding support to carry IMA measurement logs
From: Prakhar Srivastava @ 2020-06-07 23:33 UTC (permalink / raw)
To: linux-arm-kernel, linux-kernel, linuxppc-dev, devicetree,
linux-integrity, linux-security-module
Cc: kstewart, mark.rutland, catalin.marinas, bhsharma, tao.li, zohar,
paulus, vincenzo.frascino, frowand.list, nramas, masahiroy,
jmorris, takahiro.akashi, serge, pasha.tatashin, will, prsriva,
robh+dt, hsinyi, tusharsu, tglx, allison, christophe.leroy,
mbrugger, balajib, dmitry.kasatkin, james.morse, gregkh
IMA during kexec(kexec file load) verifies the kernel signature and measures
the signature of the kernel. The signature in the logs can be used to verfiy the
authenticity of the kernel. The logs don not get carried over kexec and thus
remote attesation cannot verify the signature of the running kernel.
Add a new chosen node entry linux,ima-kexec-buffer to hold the address and
the size of the memory reserved to carry the IMA measurement log.
Tested on:
arm64 with Uboot
Changelog:
v1:
Refactoring carrying over IMA measuremnet logs over Kexec. This patch
moves the non-architecture specific code out of powerpc and adds to
security/ima.(Suggested by Thiago)
Add Documentation regarding the ima-kexec-buffer node in the chosen
node documentation
v0:
Add a layer of abstraction to use the memory reserved by device tree
for ima buffer pass.
Add support for ima buffer pass using reserved memory for arm64 kexec.
Update the arch sepcific code path in kexec file load to store the
ima buffer in the reserved memory. The same reserved memory is read
on kexec or cold boot.
Documentation/devicetree/bindings/chosen.txt | 17 +++
arch/arm64/Kconfig | 1 +
arch/arm64/include/asm/ima.h | 24 +++
arch/arm64/include/asm/kexec.h | 3 +
arch/arm64/kernel/machine_kexec_file.c | 47 +++++-
arch/powerpc/include/asm/ima.h | 9 --
arch/powerpc/kexec/ima.c | 117 +-------------
security/integrity/ima/ima_kexec.c | 151 +++++++++++++++++++
8 files changed, 236 insertions(+), 133 deletions(-)
create mode 100644 arch/arm64/include/asm/ima.h
--
2.25.1
^ permalink raw reply
* [v1 PATCH 1/2] Refactoring carrying over IMA measuremnet logs over Kexec.
From: Prakhar Srivastava @ 2020-06-07 23:33 UTC (permalink / raw)
To: linux-arm-kernel, linux-kernel, linuxppc-dev, devicetree,
linux-integrity, linux-security-module
Cc: kstewart, mark.rutland, catalin.marinas, bhsharma, tao.li, zohar,
paulus, vincenzo.frascino, frowand.list, nramas, masahiroy,
jmorris, takahiro.akashi, serge, pasha.tatashin, will, prsriva,
robh+dt, hsinyi, tusharsu, tglx, allison, christophe.leroy,
mbrugger, balajib, dmitry.kasatkin, james.morse, gregkh
In-Reply-To: <20200607233323.22375-1-prsriva@linux.microsoft.com>
This patch moves the non-architecture specific code out of powerpc and
adds to security/ima.
Update the arm64 and powerpc kexec file load paths to carry the IMA measurement
logs.
Signed-off-by: Prakhar Srivastava <prsriva@linux.microsoft.com>
---
arch/arm64/Kconfig | 1 +
arch/arm64/include/asm/ima.h | 24 ++++
arch/arm64/include/asm/kexec.h | 3 +
arch/arm64/kernel/machine_kexec_file.c | 47 ++++++--
arch/powerpc/include/asm/ima.h | 9 --
arch/powerpc/kexec/ima.c | 117 +------------------
security/integrity/ima/ima_kexec.c | 151 +++++++++++++++++++++++++
7 files changed, 219 insertions(+), 133 deletions(-)
create mode 100644 arch/arm64/include/asm/ima.h
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 5d513f461957..3d544e2e25e6 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -1070,6 +1070,7 @@ config KEXEC
config KEXEC_FILE
bool "kexec file based system call"
select KEXEC_CORE
+ select HAVE_IMA_KEXEC
help
This is new version of kexec system call. This system call is
file based and takes file descriptors as system call argument
diff --git a/arch/arm64/include/asm/ima.h b/arch/arm64/include/asm/ima.h
new file mode 100644
index 000000000000..8946bae8baa2
--- /dev/null
+++ b/arch/arm64/include/asm/ima.h
@@ -0,0 +1,24 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_ARCH_IMA_H
+#define _ASM_ARCH_IMA_H
+
+struct kimage;
+
+#ifdef CONFIG_IMA_KEXEC
+int arch_ima_add_kexec_buffer(struct kimage *image, unsigned long load_addr,
+ size_t size);
+
+int setup_ima_buffer(const struct kimage *image, void *fdt, int chosen_node);
+#else
+static inline int arch_ima_add_kexec_buffer(struct kimage *image,
+ unsigned long load_addr, size_t size)
+{
+ return 0;
+}
+static inline int setup_ima_buffer(const struct kimage *image, void *fdt,
+ int chosen_node)
+{
+ return 0;
+}
+#endif /* CONFIG_IMA_KEXEC */
+#endif /* _ASM_ARCH_IMA_H */
diff --git a/arch/arm64/include/asm/kexec.h b/arch/arm64/include/asm/kexec.h
index d24b527e8c00..7bd60c185ad3 100644
--- a/arch/arm64/include/asm/kexec.h
+++ b/arch/arm64/include/asm/kexec.h
@@ -100,6 +100,9 @@ struct kimage_arch {
void *elf_headers;
unsigned long elf_headers_mem;
unsigned long elf_headers_sz;
+
+ phys_addr_t ima_buffer_addr;
+ size_t ima_buffer_size;
};
extern const struct kexec_file_ops kexec_image_ops;
diff --git a/arch/arm64/kernel/machine_kexec_file.c b/arch/arm64/kernel/machine_kexec_file.c
index b40c3b0def92..1e9007c926db 100644
--- a/arch/arm64/kernel/machine_kexec_file.c
+++ b/arch/arm64/kernel/machine_kexec_file.c
@@ -24,20 +24,37 @@
#include <asm/byteorder.h>
/* relevant device tree properties */
-#define FDT_PROP_KEXEC_ELFHDR "linux,elfcorehdr"
-#define FDT_PROP_MEM_RANGE "linux,usable-memory-range"
-#define FDT_PROP_INITRD_START "linux,initrd-start"
-#define FDT_PROP_INITRD_END "linux,initrd-end"
-#define FDT_PROP_BOOTARGS "bootargs"
-#define FDT_PROP_KASLR_SEED "kaslr-seed"
-#define FDT_PROP_RNG_SEED "rng-seed"
-#define RNG_SEED_SIZE 128
+#define FDT_PROP_KEXEC_ELFHDR "linux,elfcorehdr"
+#define FDT_PROP_MEM_RANGE "linux,usable-memory-range"
+#define FDT_PROP_INITRD_START "linux,initrd-start"
+#define FDT_PROP_INITRD_END "linux,initrd-end"
+#define FDT_PROP_BOOTARGS "bootargs"
+#define FDT_PROP_KASLR_SEED "kaslr-seed"
+#define FDT_PROP_RNG_SEED "rng-seed"
+#define FDT_PROP_IMA_KEXEC_BUFFER "linux,ima-kexec-buffer"
+#define RNG_SEED_SIZE 128
const struct kexec_file_ops * const kexec_file_loaders[] = {
&kexec_image_ops,
NULL
};
+/**
+ * arch_ima_add_kexec_buffer - do arch-specific steps to add the IMA buffer
+ *
+ * Architectures should use this function to pass on the IMA buffer
+ * information to the next kernel.
+ *
+ * Return: 0 on success, negative errno on error.
+ */
+int arch_ima_add_kexec_buffer(struct kimage *image, unsigned long load_addr,
+ size_t size)
+{
+ image->arch.ima_buffer_addr = load_addr;
+ image->arch.ima_buffer_size = size;
+ return 0;
+}
+
int arch_kimage_file_post_load_cleanup(struct kimage *image)
{
vfree(image->arch.dtb);
@@ -66,6 +83,9 @@ static int setup_dtb(struct kimage *image,
if (ret && ret != -FDT_ERR_NOTFOUND)
goto out;
ret = fdt_delprop(dtb, off, FDT_PROP_MEM_RANGE);
+ if (ret && ret != -FDT_ERR_NOTFOUND)
+ goto out;
+ ret = fdt_delprop(dtb, off, FDT_PROP_IMA_KEXEC_BUFFER);
if (ret && ret != -FDT_ERR_NOTFOUND)
goto out;
@@ -119,6 +139,17 @@ static int setup_dtb(struct kimage *image,
goto out;
}
+ if (image->arch.ima_buffer_size > 0) {
+
+ ret = fdt_appendprop_addrrange(dtb, 0, off,
+ FDT_PROP_IMA_KEXEC_BUFFER,
+ image->arch.ima_buffer_addr,
+ image->arch.ima_buffer_size);
+ if (ret)
+ return (ret == -FDT_ERR_NOSPACE ? -ENOMEM : -EINVAL);
+
+ }
+
/* add kaslr-seed */
ret = fdt_delprop(dtb, off, FDT_PROP_KASLR_SEED);
if (ret == -FDT_ERR_NOTFOUND)
diff --git a/arch/powerpc/include/asm/ima.h b/arch/powerpc/include/asm/ima.h
index ead488cf3981..80b83881fa03 100644
--- a/arch/powerpc/include/asm/ima.h
+++ b/arch/powerpc/include/asm/ima.h
@@ -4,15 +4,6 @@
struct kimage;
-int ima_get_kexec_buffer(void **addr, size_t *size);
-int ima_free_kexec_buffer(void);
-
-#ifdef CONFIG_IMA
-void remove_ima_buffer(void *fdt, int chosen_node);
-#else
-static inline void remove_ima_buffer(void *fdt, int chosen_node) {}
-#endif
-
#ifdef CONFIG_IMA_KEXEC
int arch_ima_add_kexec_buffer(struct kimage *image, unsigned long load_addr,
size_t size);
diff --git a/arch/powerpc/kexec/ima.c b/arch/powerpc/kexec/ima.c
index 720e50e490b6..537e4f82a050 100644
--- a/arch/powerpc/kexec/ima.c
+++ b/arch/powerpc/kexec/ima.c
@@ -12,121 +12,6 @@
#include <linux/memblock.h>
#include <linux/libfdt.h>
-static int get_addr_size_cells(int *addr_cells, int *size_cells)
-{
- struct device_node *root;
-
- root = of_find_node_by_path("/");
- if (!root)
- return -EINVAL;
-
- *addr_cells = of_n_addr_cells(root);
- *size_cells = of_n_size_cells(root);
-
- of_node_put(root);
-
- return 0;
-}
-
-static int do_get_kexec_buffer(const void *prop, int len, unsigned long *addr,
- size_t *size)
-{
- int ret, addr_cells, size_cells;
-
- ret = get_addr_size_cells(&addr_cells, &size_cells);
- if (ret)
- return ret;
-
- if (len < 4 * (addr_cells + size_cells))
- return -ENOENT;
-
- *addr = of_read_number(prop, addr_cells);
- *size = of_read_number(prop + 4 * addr_cells, size_cells);
-
- return 0;
-}
-
-/**
- * ima_get_kexec_buffer - get IMA buffer from the previous kernel
- * @addr: On successful return, set to point to the buffer contents.
- * @size: On successful return, set to the buffer size.
- *
- * Return: 0 on success, negative errno on error.
- */
-int ima_get_kexec_buffer(void **addr, size_t *size)
-{
- int ret, len;
- unsigned long tmp_addr;
- size_t tmp_size;
- const void *prop;
-
- prop = of_get_property(of_chosen, "linux,ima-kexec-buffer", &len);
- if (!prop)
- return -ENOENT;
-
- ret = do_get_kexec_buffer(prop, len, &tmp_addr, &tmp_size);
- if (ret)
- return ret;
-
- *addr = __va(tmp_addr);
- *size = tmp_size;
-
- return 0;
-}
-
-/**
- * ima_free_kexec_buffer - free memory used by the IMA buffer
- */
-int ima_free_kexec_buffer(void)
-{
- int ret;
- unsigned long addr;
- size_t size;
- struct property *prop;
-
- prop = of_find_property(of_chosen, "linux,ima-kexec-buffer", NULL);
- if (!prop)
- return -ENOENT;
-
- ret = do_get_kexec_buffer(prop->value, prop->length, &addr, &size);
- if (ret)
- return ret;
-
- ret = of_remove_property(of_chosen, prop);
- if (ret)
- return ret;
-
- return memblock_free(addr, size);
-
-}
-
-/**
- * remove_ima_buffer - remove the IMA buffer property and reservation from @fdt
- *
- * The IMA measurement buffer is of no use to a subsequent kernel, so we always
- * remove it from the device tree.
- */
-void remove_ima_buffer(void *fdt, int chosen_node)
-{
- int ret, len;
- unsigned long addr;
- size_t size;
- const void *prop;
-
- prop = fdt_getprop(fdt, chosen_node, "linux,ima-kexec-buffer", &len);
- if (!prop)
- return;
-
- ret = do_get_kexec_buffer(prop, len, &addr, &size);
- fdt_delprop(fdt, chosen_node, "linux,ima-kexec-buffer");
- if (ret)
- return;
-
- ret = delete_fdt_mem_rsv(fdt, addr, size);
- if (!ret)
- pr_debug("Removed old IMA buffer reservation.\n");
-}
-
#ifdef CONFIG_IMA_KEXEC
/**
* arch_ima_add_kexec_buffer - do arch-specific steps to add the IMA buffer
@@ -179,7 +64,7 @@ int setup_ima_buffer(const struct kimage *image, void *fdt, int chosen_node)
int ret, addr_cells, size_cells, entry_size;
u8 value[16];
- remove_ima_buffer(fdt, chosen_node);
+// remove_ima_buffer(fdt, chosen_node);
if (!image->arch.ima_buffer_size)
return 0;
diff --git a/security/integrity/ima/ima_kexec.c b/security/integrity/ima/ima_kexec.c
index 121de3e04af2..36887ed4ff82 100644
--- a/security/integrity/ima/ima_kexec.c
+++ b/security/integrity/ima/ima_kexec.c
@@ -10,8 +10,159 @@
#include <linux/seq_file.h>
#include <linux/vmalloc.h>
#include <linux/kexec.h>
+#include <linux/of.h>
+#include <linux/memblock.h>
+#include <linux/libfdt.h>
#include "ima.h"
+static int get_addr_size_cells(int *addr_cells, int *size_cells)
+{
+ struct device_node *root;
+
+ root = of_find_node_by_path("/");
+ if (!root)
+ return -EINVAL;
+
+ *addr_cells = of_n_addr_cells(root);
+ *size_cells = of_n_size_cells(root);
+
+ of_node_put(root);
+
+ return 0;
+}
+
+static int do_get_kexec_buffer(const void *prop, int len, unsigned long *addr,
+ size_t *size)
+{
+ int ret, addr_cells, size_cells;
+
+ ret = get_addr_size_cells(&addr_cells, &size_cells);
+ if (ret)
+ return ret;
+
+ if (len < 4 * (addr_cells + size_cells))
+ return -ENOENT;
+
+ *addr = of_read_number(prop, addr_cells);
+ *size = of_read_number(prop + 4 * addr_cells, size_cells);
+
+ return 0;
+}
+
+/**
+ * ima_get_kexec_buffer - get IMA buffer from the previous kernel
+ * @addr: On successful return, set to point to the buffer contents.
+ * @size: On successful return, set to the buffer size.
+ *
+ * Return: 0 on success, negative errno on error.
+ */
+int ima_get_kexec_buffer(void **addr, size_t *size)
+{
+ int ret, len;
+ unsigned long tmp_addr;
+ size_t tmp_size;
+ const void *prop;
+
+ prop = of_get_property(of_chosen, "linux,ima-kexec-buffer", &len);
+ if (!prop)
+ return -ENOENT;
+
+ ret = do_get_kexec_buffer(prop, len, &tmp_addr, &tmp_size);
+ if (ret)
+ return ret;
+
+ *addr = __va(tmp_addr);
+ *size = tmp_size;
+
+ return 0;
+}
+
+/**
+ * delete_fdt_mem_rsv - delete memory reservation with given address and size
+ *
+ * Return: 0 on success, or negative errno on error.
+ */
+int delete_fdt_mem_rsv(void *fdt, unsigned long start, unsigned long size)
+{
+ int i, ret, num_rsvs = fdt_num_mem_rsv(fdt);
+
+ for (i = 0; i < num_rsvs; i++) {
+ uint64_t rsv_start, rsv_size;
+
+ ret = fdt_get_mem_rsv(fdt, i, &rsv_start, &rsv_size);
+ if (ret) {
+ pr_err("Malformed device tree.\n");
+ return -EINVAL;
+ }
+
+ if (rsv_start == start && rsv_size == size) {
+ ret = fdt_del_mem_rsv(fdt, i);
+ if (ret) {
+ pr_err("Error deleting device tree reservation.\n");
+ return -EINVAL;
+ }
+
+ return 0;
+ }
+ }
+
+ return -ENOENT;
+}
+
+/**
+ * ima_free_kexec_buffer - free memory used by the IMA buffer
+ */
+int ima_free_kexec_buffer(void)
+{
+ int ret;
+ unsigned long addr;
+ size_t size;
+ struct property *prop;
+
+ prop = of_find_property(of_chosen, "linux,ima-kexec-buffer", NULL);
+ if (!prop)
+ return -ENOENT;
+
+ ret = do_get_kexec_buffer(prop->value, prop->length, &addr, &size);
+ if (ret)
+ return ret;
+
+ ret = of_remove_property(of_chosen, prop);
+ if (ret)
+ return ret;
+
+ return memblock_free(addr, size);
+
+}
+
+/**
+ * remove_ima_buffer - remove the IMA buffer property and reservation from @fdt
+ *
+ * The IMA measurement buffer is of no use to a subsequent kernel, so we always
+ * remove it from the device tree.
+ */
+void remove_ima_buffer(void *fdt, int chosen_node)
+{
+ int ret, len;
+ unsigned long addr;
+ size_t size;
+ const void *prop;
+
+ prop = fdt_getprop(fdt, chosen_node, "linux,ima-kexec-buffer", &len);
+ if (!prop)
+ return;
+
+ ret = do_get_kexec_buffer(prop, len, &addr, &size);
+ fdt_delprop(fdt, chosen_node, "linux,ima-kexec-buffer");
+ if (ret)
+ return;
+
+ ret = delete_fdt_mem_rsv(fdt, addr, size);
+ if (!ret)
+ pr_debug("Removed old IMA buffer reservation.\n");
+}
+
+
#ifdef CONFIG_IMA_KEXEC
static int ima_dump_measurement_list(unsigned long *buffer_size, void **buffer,
unsigned long segment_size)
--
2.25.1
^ permalink raw reply related
* Re: Boot issue with the latest Git kernel
From: Aneesh Kumar K.V @ 2020-06-07 14:07 UTC (permalink / raw)
To: Christian Zigotzky, Christophe Leroy, linuxppc-dev, jroedel
Cc: Darren Stevens, Christoph Hellwig, R.T.Dickinson,
Christian Zigotzky
In-Reply-To: <7bf97562-3c6d-de73-6dbd-ccca275edc7b@xenosoft.de>
Christian Zigotzky <chzigotzky@xenosoft.de> writes:
> On 04 June 2020 at 7:15 pm, Christophe Leroy wrote:
>> Yes today's linux-next boots on my powerpc 8xx board.
>>
>> Christophe
> Hello Christophe,
>
> Thanks for testing.
>
> I was able to perform a 'git bisect' [1] and identified the bad commit.
> [2] I reverted this commit and after that the kernel boots and works
> without any problems.
>
> Could you please check this commit?
>
> Thanks,
> Christian
>
>
> [1] https://forum.hyperion-entertainment.com/viewtopic.php?p=50772#p50772
> [2]
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=2ba3e6947aed9bb9575eb1603c0ac6e39185d32a
This was also reported here.
https://lore.kernel.org/linuxppc-dev/1591181457.9020.13.camel@abdul
-aneesh
^ permalink raw reply
* Re: Boot issue with the latest Git kernel
From: Christian Zigotzky @ 2020-06-07 13:27 UTC (permalink / raw)
To: Christophe Leroy, linuxppc-dev, jroedel
Cc: Darren Stevens, Christoph Hellwig, R.T.Dickinson,
Christian Zigotzky
In-Reply-To: <7bf97562-3c6d-de73-6dbd-ccca275edc7b@xenosoft.de>
Hi All,
It seems, someone has fixed the boot issue. The latest Git kernel boots
on my PowerPC machines.
Thanks,
Christian
On 05 June 2020 at 6:23 pm, Christian Zigotzky wrote:
> On 04 June 2020 at 7:15 pm, Christophe Leroy wrote:
>> Yes today's linux-next boots on my powerpc 8xx board.
>>
>> Christophe
> Hello Christophe,
>
> Thanks for testing.
>
> I was able to perform a 'git bisect' [1] and identified the bad
> commit. [2] I reverted this commit and after that the kernel boots and
> works without any problems.
>
> Could you please check this commit?
>
> Thanks,
> Christian
>
>
> [1] https://forum.hyperion-entertainment.com/viewtopic.php?p=50772#p50772
> [2]
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=2ba3e6947aed9bb9575eb1603c0ac6e39185d32a
^ permalink raw reply
* [PATCH v11 6/6] powerpc/papr_scm: Implement support for PAPR_PDSM_HEALTH
From: Vaibhav Jain @ 2020-06-07 13:13 UTC (permalink / raw)
To: linuxppc-dev, linux-nvdimm, linux-kernel
Cc: Santosh Sivaraj, Steven Rostedt, Oliver O'Halloran,
Aneesh Kumar K . V, Vaibhav Jain, Dan Williams, Ira Weiny
In-Reply-To: <20200607131339.476036-1-vaibhav@linux.ibm.com>
This patch implements support for PDSM request 'PAPR_PDSM_HEALTH'
that returns a newly introduced 'struct nd_papr_pdsm_health' instance
containing dimm health information back to user space in response to
ND_CMD_CALL. This functionality is implemented in newly introduced
papr_pdsm_health() that queries the nvdimm health information and
then copies this information to the package payload whose layout is
defined by 'struct nd_papr_pdsm_health'.
Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.ibm.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
---
Changelog:
v10..v11:
* Changed the definition of 'struct nd_papr_pdsm_health' to a maximal
struct 184 bytes in size [ Dan Williams ]
* Added new field 'extension_flags' to 'struct nd_papr_pdsm_health'
[ Dan Williams ]
* Updated papr_pdsm_health() to set field 'extension_flags' to 0.
* Introduced a define ND_PDSM_PAYLOAD_MAX_SIZE that indicates the
maximum size of a payload.
* Fixed a suspicious conversion from u64 to u8 in papr_pdsm_health
that was preventing correct initialization of 'struct
nd_papr_pdsm_health'. [ Ira ]
v9..v10:
* Removed code in papr_pdsm_health that performed validation on pdsm
payload version and corrosponding struct and defines used for
validation of payload version.
* Dropped usage of struct papr_pdsm_health in 'struct
papr_scm_priv'. Instead papr_psdm_health() now uses
'papr_scm_priv.health_bitmap' to populate the pdsm payload.
* Above change also fixes the problem where this patch was removing
the code that was previously introduced in this patch-series.
[ Ira ]
* Introduced a new def ND_PDSM_ENVELOPE_HDR_SIZE that indicates the
space allocated to 'struct nd_pdsm_cmd_pkg' fields except 'struct
nd_cmd_pkg'. This def is useful in validating payload sizes.
* Reworked papr_pdsm_health() to enforce a specific payload size for
'PAPR_PDSM_HEALTH' pdsm request.
Resend:
* Added ack from Aneesh.
v8..v9:
* s/PAPR_SCM_PDSM_HEALTH/PAPR_PDSM_HEALTH/g [ Dan , Aneesh ]
* s/PAPR_SCM_PSDM_DIMM_*/PAPR_PDSM_DIMM_*/g
* Renamed papr_scm_get_health() to papr_psdm_health()
* Updated patch description to replace papr-scm dimm with nvdimm.
v7..v8:
* None
Resend:
* None
v6..v7:
* Updated flags_show() to use seq_buf_printf(). [Mpe]
* Updated papr_scm_get_health() to use newly introduced
__drc_pmem_query_health() bypassing the cache [Mpe].
v5..v6:
* Added attribute '__packed' to 'struct nd_papr_pdsm_health_v1' to
gaurd against possibility of different compilers adding different
paddings to the struct [ Dan Williams ]
* Updated 'struct nd_papr_pdsm_health_v1' to use __u8 instead of
'bool' and also updated drc_pmem_query_health() to take this into
account. [ Dan Williams ]
v4..v5:
* None
v3..v4:
* Call the DSM_PAPR_SCM_HEALTH service function from
papr_scm_service_dsm() instead of papr_scm_ndctl(). [Aneesh]
v2..v3:
* Updated struct nd_papr_scm_dimm_health_stat_v1 to use '__xx' types
as its exported to the userspace [Aneesh]
* Changed the constants DSM_PAPR_SCM_DIMM_XX indicating dimm health
from enum to #defines [Aneesh]
v1..v2:
* New patch in the series
---
arch/powerpc/include/uapi/asm/papr_pdsm.h | 43 ++++++++++++++
arch/powerpc/platforms/pseries/papr_scm.c | 71 +++++++++++++++++++++++
2 files changed, 114 insertions(+)
diff --git a/arch/powerpc/include/uapi/asm/papr_pdsm.h b/arch/powerpc/include/uapi/asm/papr_pdsm.h
index df2447455cfe..12c7aa5ee8bf 100644
--- a/arch/powerpc/include/uapi/asm/papr_pdsm.h
+++ b/arch/powerpc/include/uapi/asm/papr_pdsm.h
@@ -72,13 +72,56 @@ struct nd_pdsm_cmd_pkg {
__u8 payload[]; /* In/Out: Sub-cmd data buffer */
} __packed;
+/* Calculate size used by the pdsm header fields minus 'struct nd_cmd_pkg' */
+#define ND_PDSM_HDR_SIZE \
+ (sizeof(struct nd_pdsm_cmd_pkg) - sizeof(struct nd_cmd_pkg))
+
+/* Max payload size that we can handle */
+#define ND_PDSM_PAYLOAD_MAX_SIZE 184
+
/*
* Methods to be embedded in ND_CMD_CALL request. These are sent to the kernel
* via 'nd_pdsm_cmd_pkg.hdr.nd_command' member of the ioctl struct
*/
enum papr_pdsm {
PAPR_PDSM_MIN = 0x0,
+ PAPR_PDSM_HEALTH,
PAPR_PDSM_MAX,
};
+/* Various nvdimm health indicators */
+#define PAPR_PDSM_DIMM_HEALTHY 0
+#define PAPR_PDSM_DIMM_UNHEALTHY 1
+#define PAPR_PDSM_DIMM_CRITICAL 2
+#define PAPR_PDSM_DIMM_FATAL 3
+
+/*
+ * Struct exchanged between kernel & ndctl in for PAPR_PDSM_HEALTH
+ * Various flags indicate the health status of the dimm.
+ *
+ * extension_flags : Any extension fields present in the struct.
+ * dimm_unarmed : Dimm not armed. So contents wont persist.
+ * dimm_bad_shutdown : Previous shutdown did not persist contents.
+ * dimm_bad_restore : Contents from previous shutdown werent restored.
+ * dimm_scrubbed : Contents of the dimm have been scrubbed.
+ * dimm_locked : Contents of the dimm cant be modified until CEC reboot
+ * dimm_encrypted : Contents of dimm are encrypted.
+ * dimm_health : Dimm health indicator. One of PAPR_PDSM_DIMM_XXXX
+ */
+struct nd_papr_pdsm_health {
+ union {
+ struct {
+ __u32 extension_flags;
+ __u8 dimm_unarmed;
+ __u8 dimm_bad_shutdown;
+ __u8 dimm_bad_restore;
+ __u8 dimm_scrubbed;
+ __u8 dimm_locked;
+ __u8 dimm_encrypted;
+ __u16 dimm_health;
+ };
+ __u8 buf[ND_PDSM_PAYLOAD_MAX_SIZE];
+ };
+} __packed;
+
#endif /* _UAPI_ASM_POWERPC_PAPR_PDSM_H_ */
diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c
index 167fcf0e249d..047ca6bd26a9 100644
--- a/arch/powerpc/platforms/pseries/papr_scm.c
+++ b/arch/powerpc/platforms/pseries/papr_scm.c
@@ -436,6 +436,73 @@ static int is_cmd_valid(struct nvdimm *nvdimm, unsigned int cmd, void *buf,
return 0;
}
+/* Fetch the DIMM health info and populate it in provided package. */
+static int papr_pdsm_health(struct papr_scm_priv *p,
+ struct nd_pdsm_cmd_pkg *pkg)
+{
+ int rc;
+ struct nd_papr_pdsm_health health = { 0 };
+ u16 copysize = sizeof(struct nd_papr_pdsm_health);
+ u16 payload_size = pkg->hdr.nd_size_out - ND_PDSM_HDR_SIZE;
+
+ /* Ensure correct payload size that can hold struct nd_papr_pdsm_health */
+ if (payload_size != copysize) {
+ dev_dbg(&p->pdev->dev,
+ "Unexpected payload-size (%u). Expected (%u)",
+ pkg->hdr.nd_size_out, copysize);
+ rc = -ENOSPC;
+ goto out;
+ }
+
+ /* Ensure dimm health mutex is taken preventing concurrent access */
+ rc = mutex_lock_interruptible(&p->health_mutex);
+ if (rc)
+ goto out;
+
+ /* Always fetch upto date dimm health data ignoring cached values */
+ rc = __drc_pmem_query_health(p);
+ if (rc) {
+ mutex_unlock(&p->health_mutex);
+ goto out;
+ }
+
+ /* update health struct with various flags derived from health bitmap */
+ health = (struct nd_papr_pdsm_health) {
+ .extension_flags = 0,
+ .dimm_unarmed = !!(p->health_bitmap & PAPR_PMEM_UNARMED_MASK),
+ .dimm_bad_shutdown = !!(p->health_bitmap & PAPR_PMEM_BAD_SHUTDOWN_MASK),
+ .dimm_bad_restore = !!(p->health_bitmap & PAPR_PMEM_BAD_RESTORE_MASK),
+ .dimm_encrypted = !!(p->health_bitmap & PAPR_PMEM_ENCRYPTED),
+ .dimm_locked = !!(p->health_bitmap & PAPR_PMEM_SCRUBBED_AND_LOCKED),
+ .dimm_scrubbed = !!(p->health_bitmap & PAPR_PMEM_SCRUBBED_AND_LOCKED),
+ .dimm_health = PAPR_PDSM_DIMM_HEALTHY,
+ };
+
+ /* Update field dimm_health based on health_bitmap flags */
+ if (p->health_bitmap & PAPR_PMEM_HEALTH_FATAL)
+ health.dimm_health = PAPR_PDSM_DIMM_FATAL;
+ else if (p->health_bitmap & PAPR_PMEM_HEALTH_CRITICAL)
+ health.dimm_health = PAPR_PDSM_DIMM_CRITICAL;
+ else if (p->health_bitmap & PAPR_PMEM_HEALTH_UNHEALTHY)
+ health.dimm_health = PAPR_PDSM_DIMM_UNHEALTHY;
+
+ /* struct populated hence can release the mutex now */
+ mutex_unlock(&p->health_mutex);
+
+ dev_dbg(&p->pdev->dev, "Copying payload size=%u\n", copysize);
+
+ /* Copy the health struct to the payload */
+ memcpy(pdsm_cmd_to_payload(pkg), &health, copysize);
+
+ /* Update fw size including size of struct nd_pdsm_cmd_pkg fields */
+ pkg->hdr.nd_fw_size = copysize + ND_PDSM_HDR_SIZE;
+
+out:
+ dev_dbg(&p->pdev->dev, "completion code = %d\n", rc);
+
+ return rc;
+}
+
/*
* For a given pdsm request call an appropriate service function.
* Returns errors if any while handling the pdsm command package.
@@ -449,6 +516,10 @@ static int papr_scm_service_pdsm(struct papr_scm_priv *p,
/* Call pdsm service function */
switch (pdsm) {
+ case PAPR_PDSM_HEALTH:
+ pkg->cmd_status = papr_pdsm_health(p, pkg);
+ break;
+
default:
dev_dbg(&p->pdev->dev, "PDSM[0x%x]: Unsupported PDSM request\n",
pdsm);
--
2.26.2
^ permalink raw reply related
* [PATCH v11 2/6] seq_buf: Export seq_buf_printf
From: Vaibhav Jain @ 2020-06-07 13:13 UTC (permalink / raw)
To: linuxppc-dev, linux-nvdimm, linux-kernel
Cc: Santosh Sivaraj, Cezary Rojewski, Piotr Maziarz, Steven Rostedt,
Christoph Hellwig, Oliver O'Halloran, Aneesh Kumar K . V,
Borislav Petkov, Vaibhav Jain, Dan Williams, Ira Weiny
In-Reply-To: <20200607131339.476036-1-vaibhav@linux.ibm.com>
'seq_buf' provides a very useful abstraction for writing to a string
buffer without needing to worry about it over-flowing. However even
though the API has been stable for couple of years now its still not
exported to kernel loadable modules limiting its usage.
Hence this patch proposes update to 'seq_buf.c' to mark
seq_buf_printf() which is part of the seq_buf API to be exported to
kernel loadable GPL modules. This symbol will be used in later parts
of this patch-set to simplify content creation for a sysfs attribute.
Cc: Piotr Maziarz <piotrx.maziarz@linux.intel.com>
Cc: Cezary Rojewski <cezary.rojewski@intel.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Borislav Petkov <bp@alien8.de>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
---
Changelog:
v10..v11:
* None
v9..v10:
* None
Resend:
* Added ack from Steven Rostedt
v8..v9:
* None
v7..v8:
* Updated the patch title [ Christoph Hellwig ]
* Updated patch description to replace confusing term 'external kernel
modules' to 'kernel lodable modules'.
Resend:
* Added ack from Steven Rostedt
v6..v7:
* New patch in the series
---
lib/seq_buf.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/lib/seq_buf.c b/lib/seq_buf.c
index 4e865d42ab03..707453f5d58e 100644
--- a/lib/seq_buf.c
+++ b/lib/seq_buf.c
@@ -91,6 +91,7 @@ int seq_buf_printf(struct seq_buf *s, const char *fmt, ...)
return ret;
}
+EXPORT_SYMBOL_GPL(seq_buf_printf);
#ifdef CONFIG_BINARY_PRINTF
/**
--
2.26.2
^ permalink raw reply related
* [PATCH v11 4/6] powerpc/papr_scm: Improve error logging and handling papr_scm_ndctl()
From: Vaibhav Jain @ 2020-06-07 13:13 UTC (permalink / raw)
To: linuxppc-dev, linux-nvdimm, linux-kernel
Cc: Santosh Sivaraj, Steven Rostedt, Oliver O'Halloran,
Aneesh Kumar K . V, Vaibhav Jain, Dan Williams, Ira Weiny
In-Reply-To: <20200607131339.476036-1-vaibhav@linux.ibm.com>
Since papr_scm_ndctl() can be called from outside papr_scm, its
exposed to the possibility of receiving NULL as value of 'cmd_rc'
argument. This patch updates papr_scm_ndctl() to protect against such
possibility by assigning it pointer to a local variable in case cmd_rc
== NULL.
Finally the patch also updates the 'default' add a debug log unknown
'cmd' values.
Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.ibm.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
---
Changelog:
v10..v11:
* Instead of returning *cmd_rd just return '0' in case nd_cmd is
handled. In case of unknown nd-cmd return -EINVAL
[ Ira and Dan Williams ]
* Updated patch description.
v9..v10
* New patch in the series
---
arch/powerpc/platforms/pseries/papr_scm.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c
index 0c091622b15e..692ad3d79826 100644
--- a/arch/powerpc/platforms/pseries/papr_scm.c
+++ b/arch/powerpc/platforms/pseries/papr_scm.c
@@ -355,11 +355,16 @@ static int papr_scm_ndctl(struct nvdimm_bus_descriptor *nd_desc,
{
struct nd_cmd_get_config_size *get_size_hdr;
struct papr_scm_priv *p;
+ int rc;
/* Only dimm-specific calls are supported atm */
if (!nvdimm)
return -EINVAL;
+ /* Use a local variable in case cmd_rc pointer is NULL */
+ if (!cmd_rc)
+ cmd_rc = &rc;
+
p = nvdimm_provider_data(nvdimm);
switch (cmd) {
@@ -381,6 +386,7 @@ static int papr_scm_ndctl(struct nvdimm_bus_descriptor *nd_desc,
break;
default:
+ dev_dbg(&p->pdev->dev, "Unknown command = %d\n", cmd);
return -EINVAL;
}
--
2.26.2
^ permalink raw reply related
* [PATCH v11 5/6] ndctl/papr_scm, uapi: Add support for PAPR nvdimm specific methods
From: Vaibhav Jain @ 2020-06-07 13:13 UTC (permalink / raw)
To: linuxppc-dev, linux-nvdimm, linux-kernel
Cc: Santosh Sivaraj, Steven Rostedt, Oliver O'Halloran,
Aneesh Kumar K . V, Vaibhav Jain, Dan Williams, Ira Weiny
In-Reply-To: <20200607131339.476036-1-vaibhav@linux.ibm.com>
Introduce support for PAPR NVDIMM Specific Methods (PDSM) in papr_scm
module and add the command family NVDIMM_FAMILY_PAPR to the white list
of NVDIMM command sets. Also advertise support for ND_CMD_CALL for the
nvdimm command mask and implement necessary scaffolding in the module
to handle ND_CMD_CALL ioctl and PDSM requests that we receive.
The layout of the PDSM request as we expect from libnvdimm/libndctl is
described in newly introduced uapi header 'papr_pdsm.h' which
defines a new 'struct nd_pdsm_cmd_pkg' header. This header is used
to communicate the PDSM request via member
'nd_cmd_pkg.nd_command' and size of payload that need to be
sent/received for servicing the PDSM.
A new function is_cmd_valid() is implemented that reads the args to
papr_scm_ndctl() and performs sanity tests on them. A new function
papr_scm_service_pdsm() is introduced and is called from
papr_scm_ndctl() in case of a PDSM request is received via ND_CMD_CALL
command from libnvdimm.
Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.ibm.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
---
Changelog:
v10..v11:
* Moved in-lines 'nd_pdsm_cmd_pkg()' and 'pdsm_cmd_to_payload()' from
'papr_pdsm.h' header to 'papr_scm.c'. The avoids a potential license
incompatibility issue with non-GPL-2.0 user-space code trying to
include the header in its code. [ Ira ]
* Verified papr_pdsm.h with UAPI_HEADER_TEST config.
* Moved the is_cmd_valid() check in papr_scm_ndctl() before check for
cmd_rc == NULL. This prevents cmd_rc to be updated in case the
nd-cmd is invalid or unknown.
v9..v10:
* Simplified 'struct nd_pdsm_cmd_pkg' by removing the
'payload_version' field.
* Removed the corrosponding documentation on versioning and backward
compatibility from 'papr_pdsm.h'
* Reduced the size of reserved fields to 4-bytes making 'struct
nd_pdsm_cmd_pkg' 64 + 8 bytes long.
* Updated is_cmd_valid() to enforce validation checks on pdsm
commands. [ Dan Williams ]
* Added check for reserved fields being set to '0' in is_cmd_valid()
[ Ira ]
* Moved changes for checking cmd_rc == NULL and logging improvements
to a separate prelim patch [ Ira ].
* Moved pdsm package validation checks from papr_scm_service_pdsm()
to is_cmd_valid().
* Marked papr_scm_service_pdsm() return type as 'void' since errors
are reported in nd_pdsm_cmd_pkg.cmd_status field.
Resend:
* Added ack from Aneesh.
v8..v9:
* Reduced the usage of term SCM replacing it with appropriate
replacement [ Dan Williams, Aneesh ]
* Renamed 'papr_scm_pdsm.h' to 'papr_pdsm.h'
* s/PAPR_SCM_PDSM_*/PAPR_PDSM_*/g
* s/NVDIMM_FAMILY_PAPR_SCM/NVDIMM_FAMILY_PAPR/g
* Minor updates to 'papr_psdm.h' to replace usage of term 'SCM'.
* Minor update to patch description.
v7..v8:
* Removed the 'payload_offset' field from 'struct
nd_pdsm_cmd_pkg'. Instead command payload is always assumed to start
at 'nd_pdsm_cmd_pkg.payload'. [ Aneesh ]
* To enable introducing new fields to 'struct nd_pdsm_cmd_pkg',
'reserved' field of 10-bytes is introduced. [ Aneesh ]
* Fixed a typo in "Backward Compatibility" section of papr_scm_pdsm.h
[ Ira ]
Resend:
* None
v6..v7 :
* Removed the re-definitions of __packed macro from papr_scm_pdsm.h
[Mpe].
* Removed the usage of __KERNEL__ macros in papr_scm_pdsm.h [Mpe].
* Removed macros that were unused in papr_scm.c from papr_scm_pdsm.h
[Mpe].
* Made functions defined in papr_scm_pdsm.h as static inline. [Mpe]
v5..v6 :
* Changed the usage of the term DSM to PDSM to distinguish it from the
ACPI term [ Dan Williams ]
* Renamed papr_scm_dsm.h to papr_scm_pdsm.h and updated various struct
to reflect the new terminology.
* Updated the patch description and title to reflect the new terminology.
* Squashed patch to introduce new command family in 'ndctl.h' with
this patch [ Dan Williams ]
* Updated the papr_scm_pdsm method starting index from 0x10000 to 0x0
[ Dan Williams ]
* Removed redundant license text from the papr_scm_psdm.h file.
[ Dan Williams ]
* s/envelop/envelope/ at various places [ Dan Williams ]
* Added '__packed' attribute to command package header to gaurd
against different compiler adding paddings between the fields.
[ Dan Williams]
* Converted various pr_debug to dev_debug [ Dan Williams ]
v4..v5 :
* None
v3..v4 :
* None
v2..v3 :
* Updated the patch prefix to 'ndctl/uapi' [Aneesh]
v1..v2 :
* None
---
arch/powerpc/include/uapi/asm/papr_pdsm.h | 84 +++++++++++++++
arch/powerpc/platforms/pseries/papr_scm.c | 126 +++++++++++++++++++++-
include/uapi/linux/ndctl.h | 1 +
3 files changed, 207 insertions(+), 4 deletions(-)
create mode 100644 arch/powerpc/include/uapi/asm/papr_pdsm.h
diff --git a/arch/powerpc/include/uapi/asm/papr_pdsm.h b/arch/powerpc/include/uapi/asm/papr_pdsm.h
new file mode 100644
index 000000000000..df2447455cfe
--- /dev/null
+++ b/arch/powerpc/include/uapi/asm/papr_pdsm.h
@@ -0,0 +1,84 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+/*
+ * PAPR nvDimm Specific Methods (PDSM) and structs for libndctl
+ *
+ * (C) Copyright IBM 2020
+ *
+ * Author: Vaibhav Jain <vaibhav at linux.ibm.com>
+ */
+
+#ifndef _UAPI_ASM_POWERPC_PAPR_PDSM_H_
+#define _UAPI_ASM_POWERPC_PAPR_PDSM_H_
+
+#include <linux/types.h>
+#include <linux/ndctl.h>
+
+/*
+ * PDSM Envelope:
+ *
+ * The ioctl ND_CMD_CALL transfers data between user-space and kernel via
+ * envelope which consists of a header and user-defined payload sections.
+ * The header is described by 'struct nd_pdsm_cmd_pkg' which expects a
+ * payload following it and accessible via 'nd_pdsm_cmd_pkg.payload' field.
+ * There is reserved field that can used to introduce new fields to the
+ * structure in future. It also tries to ensure that 'nd_pdsm_cmd_pkg.payload'
+ * lies at a 8-byte boundary.
+ *
+ * +-------------+---------------------+---------------------------+
+ * | 64-Bytes | 8-Bytes | Max 184-Bytes |
+ * +-------------+---------------------+---------------------------+
+ * | nd_pdsm_cmd_pkg | |
+ * |-------------+ | |
+ * | nd_cmd_pkg | | |
+ * +-------------+---------------------+---------------------------+
+ * | nd_family | | |
+ * | nd_size_out | cmd_status | |
+ * | nd_size_in | reserved | payload |
+ * | nd_command | | |
+ * | nd_fw_size | | |
+ * +-------------+---------------------+---------------------------+
+ *
+ * PDSM Header:
+ *
+ * The header is defined as 'struct nd_pdsm_cmd_pkg' which embeds a
+ * 'struct nd_cmd_pkg' instance. The PDSM command is assigned to member
+ * 'nd_cmd_pkg.nd_command'. Apart from size information of the envelope which is
+ * contained in 'struct nd_cmd_pkg', the header also has members following
+ * members:
+ *
+ * 'cmd_status' : (Out) Errors if any encountered while servicing PDSM.
+ * 'reserved' : Not used, reserved for future and should be set to 0.
+ *
+ * PDSM Payload:
+ *
+ * The layout of the PDSM Payload is defined by various structs shared between
+ * papr_scm and libndctl so that contents of payload can be interpreted. During
+ * servicing of a PDSM the papr_scm module will read input args from the payload
+ * field by casting its contents to an appropriate struct pointer based on the
+ * PDSM command. Similarly the output of servicing the PDSM command will be
+ * copied to the payload field using the same struct.
+ *
+ * 'libnvdimm' enforces a hard limit of 256 bytes on the envelope size, which
+ * leaves around 184 bytes for the envelope payload (ignoring any padding that
+ * the compiler may silently introduce).
+ *
+ */
+
+/* PDSM-header + payload expected with ND_CMD_CALL ioctl from libnvdimm */
+struct nd_pdsm_cmd_pkg {
+ struct nd_cmd_pkg hdr; /* Package header containing sub-cmd */
+ __s32 cmd_status; /* Out: Sub-cmd status returned back */
+ __u16 reserved[2]; /* Ignored and to be used in future */
+ __u8 payload[]; /* In/Out: Sub-cmd data buffer */
+} __packed;
+
+/*
+ * Methods to be embedded in ND_CMD_CALL request. These are sent to the kernel
+ * via 'nd_pdsm_cmd_pkg.hdr.nd_command' member of the ioctl struct
+ */
+enum papr_pdsm {
+ PAPR_PDSM_MIN = 0x0,
+ PAPR_PDSM_MAX,
+};
+
+#endif /* _UAPI_ASM_POWERPC_PAPR_PDSM_H_ */
diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c
index 692ad3d79826..167fcf0e249d 100644
--- a/arch/powerpc/platforms/pseries/papr_scm.c
+++ b/arch/powerpc/platforms/pseries/papr_scm.c
@@ -15,13 +15,15 @@
#include <linux/seq_buf.h>
#include <asm/plpar_wrappers.h>
+#include <asm/papr_pdsm.h>
#define BIND_ANY_ADDR (~0ul)
#define PAPR_SCM_DIMM_CMD_MASK \
((1ul << ND_CMD_GET_CONFIG_SIZE) | \
(1ul << ND_CMD_GET_CONFIG_DATA) | \
- (1ul << ND_CMD_SET_CONFIG_DATA))
+ (1ul << ND_CMD_SET_CONFIG_DATA) | \
+ (1ul << ND_CMD_CALL))
/* DIMM health bitmap bitmap indicators */
/* SCM device is unable to persist memory contents */
@@ -89,6 +91,21 @@ struct papr_scm_priv {
u64 health_bitmap;
};
+/* Convert a libnvdimm nd_cmd_pkg to pdsm specific pkg */
+static inline struct nd_pdsm_cmd_pkg *nd_to_pdsm_cmd_pkg(struct nd_cmd_pkg *cmd)
+{
+ return (struct nd_pdsm_cmd_pkg *)cmd;
+}
+
+/* Return the payload pointer for a given pcmd */
+static inline void *pdsm_cmd_to_payload(struct nd_pdsm_cmd_pkg *pcmd)
+{
+ if (pcmd->hdr.nd_size_in == 0 && pcmd->hdr.nd_size_out == 0)
+ return NULL;
+ else
+ return (void *)(pcmd->payload);
+}
+
static int drc_pmem_bind(struct papr_scm_priv *p)
{
unsigned long ret[PLPAR_HCALL_BUFSIZE];
@@ -349,17 +366,113 @@ static int papr_scm_meta_set(struct papr_scm_priv *p,
return 0;
}
+/*
+ * Do a sanity checks on the inputs args to dimm-control function and return
+ * '0' if valid. This also does validation on ND_CMD_CALL sub-command packages.
+ */
+static int is_cmd_valid(struct nvdimm *nvdimm, unsigned int cmd, void *buf,
+ unsigned int buf_len)
+{
+ unsigned long cmd_mask = PAPR_SCM_DIMM_CMD_MASK;
+ struct nd_pdsm_cmd_pkg *pkg;
+ struct nd_cmd_pkg *nd_cmd;
+ struct papr_scm_priv *p;
+ enum papr_pdsm pdsm;
+
+ /* Only dimm-specific calls are supported atm */
+ if (!nvdimm)
+ return -EINVAL;
+
+ /* get the provider data from struct nvdimm */
+ p = nvdimm_provider_data(nvdimm);
+
+ if (!test_bit(cmd, &cmd_mask)) {
+ dev_dbg(&p->pdev->dev, "Unsupported cmd=%u\n", cmd);
+ return -EINVAL;
+ }
+
+ /* For CMD_CALL verify pdsm request */
+ if (cmd == ND_CMD_CALL) {
+ /* Verify the envelope package */
+ if (!buf || buf_len < sizeof(struct nd_pdsm_cmd_pkg)) {
+ dev_dbg(&p->pdev->dev, "Invalid pkg size=%u\n",
+ buf_len);
+ return -EINVAL;
+ }
+
+ /* Verify that the nd_cmd_pkg.nd_family is correct */
+ nd_cmd = (struct nd_cmd_pkg *)buf;
+ if (nd_cmd->nd_family != NVDIMM_FAMILY_PAPR) {
+ dev_dbg(&p->pdev->dev, "Invalid pkg family=0x%llx\n",
+ nd_cmd->nd_family);
+ return -EINVAL;
+ }
+
+ /* Get the pdsm request package and the command */
+ pkg = nd_to_pdsm_cmd_pkg(nd_cmd);
+ pdsm = pkg->hdr.nd_command;
+
+ /* Verify if the psdm command is valid */
+ if (pdsm <= PAPR_PDSM_MIN || pdsm >= PAPR_PDSM_MAX) {
+ dev_dbg(&p->pdev->dev, "PDSM[0x%x]: Invalid PDSM\n", pdsm);
+ return -EINVAL;
+ }
+
+ /* We except a payload with all PDSM commands */
+ if (!pdsm_cmd_to_payload(pkg)) {
+ dev_dbg(&p->pdev->dev, "PDSM[0x%x]: Empty payload\n", pdsm);
+ return -EINVAL;
+ }
+
+ /* Ensure reserved fields of the pdsm header are set to 0 */
+ if (pkg->reserved[0] || pkg->reserved[1]) {
+ dev_dbg(&p->pdev->dev,
+ "PDSM[0x%x]: Invalid reserved field usage\n", pdsm);
+ return -EINVAL;
+ }
+ }
+
+ /* Let the command be further processed */
+ return 0;
+}
+
+/*
+ * For a given pdsm request call an appropriate service function.
+ * Returns errors if any while handling the pdsm command package.
+ */
+static int papr_scm_service_pdsm(struct papr_scm_priv *p,
+ struct nd_pdsm_cmd_pkg *pkg)
+{
+ const enum papr_pdsm pdsm = pkg->hdr.nd_command;
+
+ dev_dbg(&p->pdev->dev, "PDSM[0x%x]: Servicing..\n", pdsm);
+
+ /* Call pdsm service function */
+ switch (pdsm) {
+ default:
+ dev_dbg(&p->pdev->dev, "PDSM[0x%x]: Unsupported PDSM request\n",
+ pdsm);
+ pkg->cmd_status = -ENOENT;
+ break;
+ }
+
+ return pkg->cmd_status;
+}
+
static int papr_scm_ndctl(struct nvdimm_bus_descriptor *nd_desc,
struct nvdimm *nvdimm, unsigned int cmd, void *buf,
unsigned int buf_len, int *cmd_rc)
{
struct nd_cmd_get_config_size *get_size_hdr;
+ struct nd_pdsm_cmd_pkg *call_pkg = NULL;
struct papr_scm_priv *p;
int rc;
- /* Only dimm-specific calls are supported atm */
- if (!nvdimm)
- return -EINVAL;
+ rc = is_cmd_valid(nvdimm, cmd, buf, buf_len);
+ if (rc) {
+ pr_debug("Invalid cmd=0x%x. Err=%d\n", cmd, rc);
+ return rc;
+ }
/* Use a local variable in case cmd_rc pointer is NULL */
if (!cmd_rc)
@@ -385,6 +498,11 @@ static int papr_scm_ndctl(struct nvdimm_bus_descriptor *nd_desc,
*cmd_rc = papr_scm_meta_set(p, buf);
break;
+ case ND_CMD_CALL:
+ call_pkg = nd_to_pdsm_cmd_pkg(buf);
+ *cmd_rc = papr_scm_service_pdsm(p, call_pkg);
+ break;
+
default:
dev_dbg(&p->pdev->dev, "Unknown command = %d\n", cmd);
return -EINVAL;
diff --git a/include/uapi/linux/ndctl.h b/include/uapi/linux/ndctl.h
index de5d90212409..0e09dc5cec19 100644
--- a/include/uapi/linux/ndctl.h
+++ b/include/uapi/linux/ndctl.h
@@ -244,6 +244,7 @@ struct nd_cmd_pkg {
#define NVDIMM_FAMILY_HPE2 2
#define NVDIMM_FAMILY_MSFT 3
#define NVDIMM_FAMILY_HYPERV 4
+#define NVDIMM_FAMILY_PAPR 5
#define ND_IOCTL_CALL _IOWR(ND_IOCTL, ND_CMD_CALL,\
struct nd_cmd_pkg)
--
2.26.2
^ permalink raw reply related
* [PATCH v11 3/6] powerpc/papr_scm: Fetch nvdimm health information from PHYP
From: Vaibhav Jain @ 2020-06-07 13:13 UTC (permalink / raw)
To: linuxppc-dev, linux-nvdimm, linux-kernel
Cc: Santosh Sivaraj, Steven Rostedt, Oliver O'Halloran,
Aneesh Kumar K . V, Vaibhav Jain, Dan Williams, Ira Weiny
In-Reply-To: <20200607131339.476036-1-vaibhav@linux.ibm.com>
Implement support for fetching nvdimm health information via
H_SCM_HEALTH hcall as documented in Ref[1]. The hcall returns a pair
of 64-bit bitmap, bitwise-and of which is then stored in
'struct papr_scm_priv' and subsequently partially exposed to
user-space via newly introduced dimm specific attribute
'papr/flags'. Since the hcall is costly, the health information is
cached and only re-queried, 60s after the previous successful hcall.
The patch also adds a documentation text describing flags reported by
the the new sysfs attribute 'papr/flags' is also introduced at
Documentation/ABI/testing/sysfs-bus-papr-pmem.
[1] commit 58b278f568f0 ("powerpc: Provide initial documentation for
PAPR hcalls")
Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.ibm.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
---
Changelog:
v10..v11:
* None
v9..v10:
* Removed an avoidable 'goto' in __drc_pmem_query_health. [ Ira ].
Resend:
* Added ack from Aneesh.
v8..v9:
* Rename some variables and defines to reduce usage of term SCM
replacing it with PMEM [Dan Williams, Aneesh]
* s/PAPR_SCM_DIMM/PAPR_PMEM/g
* s/papr_scm_nd_attributes/papr_nd_attributes/g
* s/papr_scm_nd_attribute_group/papr_nd_attribute_group/g
* s/papr_scm_dimm_attr_groups/papr_nd_attribute_groups/g
* Renamed file sysfs-bus-papr-scm to sysfs-bus-papr-pmem
v7..v8:
* Update type of variable 'rc' in __drc_pmem_query_health() and
drc_pmem_query_health() to long and int respectively. [ Ira ]
* Updated the patch description to s/64 bit Big Endian Number/64-bit
bitmap/ [ Ira, Aneesh ].
Resend:
* None
v6..v7 :
* Used the exported buf_seq_printf() function to generate content for
'papr/flags'
* Moved the PAPR_SCM_DIMM_* bit-flags macro definitions to papr_scm.c
and removed the papr_scm.h file [Mpe]
* Some minor consistency issued in sysfs-bus-papr-scm
documentation. [Mpe]
* s/dimm_mutex/health_mutex/g [Mpe]
* Split drc_pmem_query_health() into two function one of which takes
care of caching and locking. [Mpe]
* Fixed a local copy creation of dimm health information using
READ_ONCE(). [Mpe]
v5..v6 :
* Change the flags sysfs attribute from 'papr_flags' to 'papr/flags'
[Dan Williams]
* Include documentation for 'papr/flags' attr [Dan Williams]
* Change flag 'save_fail' to 'flush_fail' [Dan Williams]
* Caching of health bitmap to reduce expensive hcalls [Dan Williams]
* Removed usage of PPC_BIT from 'papr-scm.h' header [Mpe]
* Replaced two __be64 integers from papr_scm_priv to a single u64
integer [Mpe]
* Updated patch description to reflect the changes made in this
version.
* Removed avoidable usage of 'papr_scm_priv.dimm_mutex' from
flags_show() [Dan Williams]
v4..v5 :
* None
v3..v4 :
* None
v2..v3 :
* Removed PAPR_SCM_DIMM_HEALTH_NON_CRITICAL as a condition for
NVDIMM unarmed [Aneesh]
v1..v2 :
* New patch in the series.
---
Documentation/ABI/testing/sysfs-bus-papr-pmem | 27 +++
arch/powerpc/platforms/pseries/papr_scm.c | 168 +++++++++++++++++-
2 files changed, 193 insertions(+), 2 deletions(-)
create mode 100644 Documentation/ABI/testing/sysfs-bus-papr-pmem
diff --git a/Documentation/ABI/testing/sysfs-bus-papr-pmem b/Documentation/ABI/testing/sysfs-bus-papr-pmem
new file mode 100644
index 000000000000..5b10d036a8d4
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-bus-papr-pmem
@@ -0,0 +1,27 @@
+What: /sys/bus/nd/devices/nmemX/papr/flags
+Date: Apr, 2020
+KernelVersion: v5.8
+Contact: linuxppc-dev <linuxppc-dev@lists.ozlabs.org>, linux-nvdimm@lists.01.org,
+Description:
+ (RO) Report flags indicating various states of a
+ papr-pmem NVDIMM device. Each flag maps to a one or
+ more bits set in the dimm-health-bitmap retrieved in
+ response to H_SCM_HEALTH hcall. The details of the bit
+ flags returned in response to this hcall is available
+ at 'Documentation/powerpc/papr_hcalls.rst' . Below are
+ the flags reported in this sysfs file:
+
+ * "not_armed" : Indicates that NVDIMM contents will not
+ survive a power cycle.
+ * "flush_fail" : Indicates that NVDIMM contents
+ couldn't be flushed during last
+ shut-down event.
+ * "restore_fail": Indicates that NVDIMM contents
+ couldn't be restored during NVDIMM
+ initialization.
+ * "encrypted" : NVDIMM contents are encrypted.
+ * "smart_notify": There is health event for the NVDIMM.
+ * "scrubbed" : Indicating that contents of the
+ NVDIMM have been scrubbed.
+ * "locked" : Indicating that NVDIMM contents cant
+ be modified until next power cycle.
diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c
index f35592423380..0c091622b15e 100644
--- a/arch/powerpc/platforms/pseries/papr_scm.c
+++ b/arch/powerpc/platforms/pseries/papr_scm.c
@@ -12,6 +12,7 @@
#include <linux/libnvdimm.h>
#include <linux/platform_device.h>
#include <linux/delay.h>
+#include <linux/seq_buf.h>
#include <asm/plpar_wrappers.h>
@@ -22,6 +23,44 @@
(1ul << ND_CMD_GET_CONFIG_DATA) | \
(1ul << ND_CMD_SET_CONFIG_DATA))
+/* DIMM health bitmap bitmap indicators */
+/* SCM device is unable to persist memory contents */
+#define PAPR_PMEM_UNARMED (1ULL << (63 - 0))
+/* SCM device failed to persist memory contents */
+#define PAPR_PMEM_SHUTDOWN_DIRTY (1ULL << (63 - 1))
+/* SCM device contents are persisted from previous IPL */
+#define PAPR_PMEM_SHUTDOWN_CLEAN (1ULL << (63 - 2))
+/* SCM device contents are not persisted from previous IPL */
+#define PAPR_PMEM_EMPTY (1ULL << (63 - 3))
+/* SCM device memory life remaining is critically low */
+#define PAPR_PMEM_HEALTH_CRITICAL (1ULL << (63 - 4))
+/* SCM device will be garded off next IPL due to failure */
+#define PAPR_PMEM_HEALTH_FATAL (1ULL << (63 - 5))
+/* SCM contents cannot persist due to current platform health status */
+#define PAPR_PMEM_HEALTH_UNHEALTHY (1ULL << (63 - 6))
+/* SCM device is unable to persist memory contents in certain conditions */
+#define PAPR_PMEM_HEALTH_NON_CRITICAL (1ULL << (63 - 7))
+/* SCM device is encrypted */
+#define PAPR_PMEM_ENCRYPTED (1ULL << (63 - 8))
+/* SCM device has been scrubbed and locked */
+#define PAPR_PMEM_SCRUBBED_AND_LOCKED (1ULL << (63 - 9))
+
+/* Bits status indicators for health bitmap indicating unarmed dimm */
+#define PAPR_PMEM_UNARMED_MASK (PAPR_PMEM_UNARMED | \
+ PAPR_PMEM_HEALTH_UNHEALTHY)
+
+/* Bits status indicators for health bitmap indicating unflushed dimm */
+#define PAPR_PMEM_BAD_SHUTDOWN_MASK (PAPR_PMEM_SHUTDOWN_DIRTY)
+
+/* Bits status indicators for health bitmap indicating unrestored dimm */
+#define PAPR_PMEM_BAD_RESTORE_MASK (PAPR_PMEM_EMPTY)
+
+/* Bit status indicators for smart event notification */
+#define PAPR_PMEM_SMART_EVENT_MASK (PAPR_PMEM_HEALTH_CRITICAL | \
+ PAPR_PMEM_HEALTH_FATAL | \
+ PAPR_PMEM_HEALTH_UNHEALTHY)
+
+/* private struct associated with each region */
struct papr_scm_priv {
struct platform_device *pdev;
struct device_node *dn;
@@ -39,6 +78,15 @@ struct papr_scm_priv {
struct resource res;
struct nd_region *region;
struct nd_interleave_set nd_set;
+
+ /* Protect dimm health data from concurrent read/writes */
+ struct mutex health_mutex;
+
+ /* Last time the health information of the dimm was updated */
+ unsigned long lasthealth_jiffies;
+
+ /* Health information for the dimm */
+ u64 health_bitmap;
};
static int drc_pmem_bind(struct papr_scm_priv *p)
@@ -144,6 +192,61 @@ static int drc_pmem_query_n_bind(struct papr_scm_priv *p)
return drc_pmem_bind(p);
}
+/*
+ * Issue hcall to retrieve dimm health info and populate papr_scm_priv with the
+ * health information.
+ */
+static int __drc_pmem_query_health(struct papr_scm_priv *p)
+{
+ unsigned long ret[PLPAR_HCALL_BUFSIZE];
+ long rc;
+
+ /* issue the hcall */
+ rc = plpar_hcall(H_SCM_HEALTH, ret, p->drc_index);
+ if (rc != H_SUCCESS) {
+ dev_err(&p->pdev->dev,
+ "Failed to query health information, Err:%ld\n", rc);
+ return -ENXIO;
+ }
+
+ p->lasthealth_jiffies = jiffies;
+ p->health_bitmap = ret[0] & ret[1];
+
+ dev_dbg(&p->pdev->dev,
+ "Queried dimm health info. Bitmap:0x%016lx Mask:0x%016lx\n",
+ ret[0], ret[1]);
+
+ return 0;
+}
+
+/* Min interval in seconds for assuming stable dimm health */
+#define MIN_HEALTH_QUERY_INTERVAL 60
+
+/* Query cached health info and if needed call drc_pmem_query_health */
+static int drc_pmem_query_health(struct papr_scm_priv *p)
+{
+ unsigned long cache_timeout;
+ int rc;
+
+ /* Protect concurrent modifications to papr_scm_priv */
+ rc = mutex_lock_interruptible(&p->health_mutex);
+ if (rc)
+ return rc;
+
+ /* Jiffies offset for which the health data is assumed to be same */
+ cache_timeout = p->lasthealth_jiffies +
+ msecs_to_jiffies(MIN_HEALTH_QUERY_INTERVAL * 1000);
+
+ /* Fetch new health info is its older than MIN_HEALTH_QUERY_INTERVAL */
+ if (time_after(jiffies, cache_timeout))
+ rc = __drc_pmem_query_health(p);
+ else
+ /* Assume cached health data is valid */
+ rc = 0;
+
+ mutex_unlock(&p->health_mutex);
+ return rc;
+}
static int papr_scm_meta_get(struct papr_scm_priv *p,
struct nd_cmd_get_config_data_hdr *hdr)
@@ -286,6 +389,64 @@ static int papr_scm_ndctl(struct nvdimm_bus_descriptor *nd_desc,
return 0;
}
+static ssize_t flags_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct nvdimm *dimm = to_nvdimm(dev);
+ struct papr_scm_priv *p = nvdimm_provider_data(dimm);
+ struct seq_buf s;
+ u64 health;
+ int rc;
+
+ rc = drc_pmem_query_health(p);
+ if (rc)
+ return rc;
+
+ /* Copy health_bitmap locally, check masks & update out buffer */
+ health = READ_ONCE(p->health_bitmap);
+
+ seq_buf_init(&s, buf, PAGE_SIZE);
+ if (health & PAPR_PMEM_UNARMED_MASK)
+ seq_buf_printf(&s, "not_armed ");
+
+ if (health & PAPR_PMEM_BAD_SHUTDOWN_MASK)
+ seq_buf_printf(&s, "flush_fail ");
+
+ if (health & PAPR_PMEM_BAD_RESTORE_MASK)
+ seq_buf_printf(&s, "restore_fail ");
+
+ if (health & PAPR_PMEM_ENCRYPTED)
+ seq_buf_printf(&s, "encrypted ");
+
+ if (health & PAPR_PMEM_SMART_EVENT_MASK)
+ seq_buf_printf(&s, "smart_notify ");
+
+ if (health & PAPR_PMEM_SCRUBBED_AND_LOCKED)
+ seq_buf_printf(&s, "scrubbed locked ");
+
+ if (seq_buf_used(&s))
+ seq_buf_printf(&s, "\n");
+
+ return seq_buf_used(&s);
+}
+DEVICE_ATTR_RO(flags);
+
+/* papr_scm specific dimm attributes */
+static struct attribute *papr_nd_attributes[] = {
+ &dev_attr_flags.attr,
+ NULL,
+};
+
+static struct attribute_group papr_nd_attribute_group = {
+ .name = "papr",
+ .attrs = papr_nd_attributes,
+};
+
+static const struct attribute_group *papr_nd_attr_groups[] = {
+ &papr_nd_attribute_group,
+ NULL,
+};
+
static int papr_scm_nvdimm_init(struct papr_scm_priv *p)
{
struct device *dev = &p->pdev->dev;
@@ -312,8 +473,8 @@ static int papr_scm_nvdimm_init(struct papr_scm_priv *p)
dimm_flags = 0;
set_bit(NDD_LABELING, &dimm_flags);
- p->nvdimm = nvdimm_create(p->bus, p, NULL, dimm_flags,
- PAPR_SCM_DIMM_CMD_MASK, 0, NULL);
+ p->nvdimm = nvdimm_create(p->bus, p, papr_nd_attr_groups,
+ dimm_flags, PAPR_SCM_DIMM_CMD_MASK, 0, NULL);
if (!p->nvdimm) {
dev_err(dev, "Error creating DIMM object for %pOF\n", p->dn);
goto err;
@@ -399,6 +560,9 @@ static int papr_scm_probe(struct platform_device *pdev)
if (!p)
return -ENOMEM;
+ /* Initialize the dimm mutex */
+ mutex_init(&p->health_mutex);
+
/* optional DT properties */
of_property_read_u32(dn, "ibm,metadata-size", &metadata_size);
--
2.26.2
^ permalink raw reply related
* [PATCH v11 1/6] powerpc: Document details on H_SCM_HEALTH hcall
From: Vaibhav Jain @ 2020-06-07 13:13 UTC (permalink / raw)
To: linuxppc-dev, linux-nvdimm, linux-kernel
Cc: Santosh Sivaraj, Steven Rostedt, Oliver O'Halloran,
Aneesh Kumar K . V, Vaibhav Jain, Dan Williams, Ira Weiny
In-Reply-To: <20200607131339.476036-1-vaibhav@linux.ibm.com>
Add documentation to 'papr_hcalls.rst' describing the bitmap flags
that are returned from H_SCM_HEALTH hcall as per the PAPR-SCM
specification.
Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.ibm.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Ira Weiny <ira.weiny@intel.com>
Acked-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
---
Changelog:
v10..v11:
* None
v9..v10:
* Added ack from Ira.
Resend:
* None
v8..v9:
* s/SCM/PMEM device. [ Dan Williams, Aneesh ]
v7..v8:
* Added a clarification on bit-ordering of Health Bitmap
Resend:
* None
v6..v7:
* None
v5..v6:
* New patch in the series
---
Documentation/powerpc/papr_hcalls.rst | 46 ++++++++++++++++++++++++---
1 file changed, 42 insertions(+), 4 deletions(-)
diff --git a/Documentation/powerpc/papr_hcalls.rst b/Documentation/powerpc/papr_hcalls.rst
index 3493631a60f8..48fcf1255a33 100644
--- a/Documentation/powerpc/papr_hcalls.rst
+++ b/Documentation/powerpc/papr_hcalls.rst
@@ -220,13 +220,51 @@ from the LPAR memory.
**H_SCM_HEALTH**
| Input: drcIndex
-| Out: *health-bitmap, health-bit-valid-bitmap*
+| Out: *health-bitmap (r4), health-bit-valid-bitmap (r5)*
| Return Value: *H_Success, H_Parameter, H_Hardware*
Given a DRC Index return the info on predictive failure and overall health of
-the NVDIMM. The asserted bits in the health-bitmap indicate a single predictive
-failure and health-bit-valid-bitmap indicate which bits in health-bitmap are
-valid.
+the PMEM device. The asserted bits in the health-bitmap indicate one or more states
+(described in table below) of the PMEM device and health-bit-valid-bitmap indicate
+which bits in health-bitmap are valid. The bits are reported in
+reverse bit ordering for example a value of 0xC400000000000000
+indicates bits 0, 1, and 5 are valid.
+
+Health Bitmap Flags:
+
++------+-----------------------------------------------------------------------+
+| Bit | Definition |
++======+=======================================================================+
+| 00 | PMEM device is unable to persist memory contents. |
+| | If the system is powered down, nothing will be saved. |
++------+-----------------------------------------------------------------------+
+| 01 | PMEM device failed to persist memory contents. Either contents were |
+| | not saved successfully on power down or were not restored properly on |
+| | power up. |
++------+-----------------------------------------------------------------------+
+| 02 | PMEM device contents are persisted from previous IPL. The data from |
+| | the last boot were successfully restored. |
++------+-----------------------------------------------------------------------+
+| 03 | PMEM device contents are not persisted from previous IPL. There was no|
+| | data to restore from the last boot. |
++------+-----------------------------------------------------------------------+
+| 04 | PMEM device memory life remaining is critically low |
++------+-----------------------------------------------------------------------+
+| 05 | PMEM device will be garded off next IPL due to failure |
++------+-----------------------------------------------------------------------+
+| 06 | PMEM device contents cannot persist due to current platform health |
+| | status. A hardware failure may prevent data from being saved or |
+| | restored. |
++------+-----------------------------------------------------------------------+
+| 07 | PMEM device is unable to persist memory contents in certain conditions|
++------+-----------------------------------------------------------------------+
+| 08 | PMEM device is encrypted |
++------+-----------------------------------------------------------------------+
+| 09 | PMEM device has successfully completed a requested erase or secure |
+| | erase procedure. |
++------+-----------------------------------------------------------------------+
+|10:63 | Reserved / Unused |
++------+-----------------------------------------------------------------------+
**H_SCM_PERFORMANCE_STATS**
--
2.26.2
^ permalink raw reply related
* [PATCH v11 0/6] powerpc/papr_scm: Add support for reporting nvdimm health
From: Vaibhav Jain @ 2020-06-07 13:13 UTC (permalink / raw)
To: linuxppc-dev, linux-nvdimm, linux-kernel
Cc: Santosh Sivaraj, Steven Rostedt, Oliver O'Halloran,
Aneesh Kumar K . V, Vaibhav Jain, Dan Williams, Ira Weiny
Changes since v10 [1]:
* Changed the definition of 'struct nd_papr_pdsm_health' to a maximal
struct 184 bytes which can be extended in future with newly
introduced 'extension_flags'
* Fixed a suspicious conversion from u64 to u8 in papr_pdsm_health
that was preventing correct initialization of 'struct
nd_papr_pdsm_health'.
* Moved in-lines 'nd_pdsm_cmd_pkg()' and 'pdsm_cmd_to_payload()' from
'papr_pdsm.h' header to 'papr_scm.c'. The avoids a potential license
incompatibility issue with non-GPL-2.0 user-space code trying to
include the header in its code.
* Fixed the control flow in papr_scm_ndctl() to return '0' in case
nd_cmd is handled. In case of unknown nd-cmd return -EINVAL.
[1] https://lore.kernel.org/linux-nvdimm/20200604234136.253703-1-vaibhav@linux.ibm.com/
---
The PAPR standard[2][4] provides mechanisms to query the health and
performance stats of an NVDIMM via various hcalls as described in
Ref[3]. Until now these stats were never available nor exposed to the
user-space tools like 'ndctl'. This is partly due to PAPR platform not
having support for ACPI and NFIT. Hence 'ndctl' is unable to query and
report the dimm health status and a user had no way to determine the
current health status of a NDVIMM.
To overcome this limitation, this patch-set updates papr_scm kernel
module to query and fetch NVDIMM health stats using hcalls described
in Ref[3]. This health and performance stats are then exposed to
userspace via sysfs and PAPR-NVDIMM-Specific-Methods(PDSM) issued by
libndctl.
These changes coupled with proposed ndtcl changes located at Ref[5]
should provide a way for the user to retrieve NVDIMM health status
using ndtcl.
Below is a sample output using proposed kernel + ndctl for PAPR NVDIMM
in a emulation environment:
# ndctl list -DH
[
{
"dev":"nmem0",
"health":{
"health_state":"fatal",
"shutdown_state":"dirty"
}
}
]
Dimm health report output on a pseries guest lpar with vPMEM or HMS
based NVDIMMs that are in perfectly healthy conditions:
# ndctl list -d nmem0 -H
[
{
"dev":"nmem0",
"health":{
"health_state":"ok",
"shutdown_state":"clean"
}
}
]
PAPR NVDIMM-Specific-Methods(PDSM)
==================================
PDSM requests are issued by vendor specific code in libndctl to
execute certain operations or fetch information from NVDIMMS. PDSMs
requests can be sent to papr_scm module via libndctl(userspace) and
libnvdimm (kernel) using the ND_CMD_CALL ioctl command which can be
handled in the dimm control function papr_scm_ndctl(). Current
patchset proposes a single PDSM to retrieve NVDIMM health, defined in
the newly introduced uapi header named 'papr_pdsm.h'. Support for
more PDSMs will be added in future.
Structure of the patch-set
==========================
The patch-set starts with a doc patch documenting details of hcall
H_SCM_HEALTH. Second patch exports kernel symbol seq_buf_printf()
thats used in subsequent patches to generate sysfs attribute content.
Third patch implements support for fetching NVDIMM health information
from PHYP and partially exposing it to user-space via a NVDIMM sysfs
flag.
Fourth patch updates papr_scm_ndctl() to handle a possible error case
and also improve debug logging.
Fifth patch deals with implementing support for servicing PDSM
commands in papr_scm module.
Finally the last patch implements support for servicing PDSM
'PAPR_PDSM_HEALTH' that returns the NVDIMM health information to
libndctl.
References:
[2] "Power Architecture Platform Reference"
https://en.wikipedia.org/wiki/Power_Architecture_Platform_Reference
[3] commit 58b278f568f0
("powerpc: Provide initial documentation for PAPR hcalls")
[4] "Linux on Power Architecture Platform Reference"
https://members.openpowerfoundation.org/document/dl/469
[5] https://github.com/vaibhav92/ndctl/tree/papr_scm_health_v11
---
Vaibhav Jain (6):
powerpc: Document details on H_SCM_HEALTH hcall
seq_buf: Export seq_buf_printf
powerpc/papr_scm: Fetch nvdimm health information from PHYP
powerpc/papr_scm: Improve error logging and handling papr_scm_ndctl()
ndctl/papr_scm,uapi: Add support for PAPR nvdimm specific methods
powerpc/papr_scm: Implement support for PAPR_PDSM_HEALTH
Documentation/ABI/testing/sysfs-bus-papr-pmem | 27 ++
Documentation/powerpc/papr_hcalls.rst | 46 ++-
arch/powerpc/include/uapi/asm/papr_pdsm.h | 127 ++++++
arch/powerpc/platforms/pseries/papr_scm.c | 373 +++++++++++++++++-
include/uapi/linux/ndctl.h | 1 +
lib/seq_buf.c | 1 +
6 files changed, 564 insertions(+), 11 deletions(-)
create mode 100644 Documentation/ABI/testing/sysfs-bus-papr-pmem
create mode 100644 arch/powerpc/include/uapi/asm/papr_pdsm.h
--
2.26.2
^ permalink raw reply
* [PATCH 1/2] powerpc/64s: remove PROT_SAO support
From: Nicholas Piggin @ 2020-06-07 12:02 UTC (permalink / raw)
To: linuxppc-dev; +Cc: Nicholas Piggin
ISA v3.1 does not support the SAO storage control attribute required to
implement PROT_SAO. PROT_SAO was used by specialised system software
(Lx86) that has been discontinued for about 7 years, and is not thought
to be used elsewhere, so removal should not cause problems.
We rather remove it than keep support for older processors, because
live migrating guest partitions to newer processors may not be possible
if SAO is in use.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
arch/powerpc/include/asm/book3s/64/pgtable.h | 8 ++--
arch/powerpc/include/asm/cputable.h | 9 ++--
arch/powerpc/include/asm/kvm_book3s_64.h | 3 +-
arch/powerpc/include/asm/mman.h | 24 +++--------
arch/powerpc/include/asm/nohash/64/pgtable.h | 2 -
arch/powerpc/kernel/dt_cpu_ftrs.c | 2 +-
arch/powerpc/mm/book3s64/hash_utils.c | 2 -
include/linux/mm.h | 2 -
include/trace/events/mmflags.h | 2 -
mm/ksm.c | 4 --
tools/testing/selftests/powerpc/mm/.gitignore | 1 -
tools/testing/selftests/powerpc/mm/Makefile | 4 +-
tools/testing/selftests/powerpc/mm/prot_sao.c | 42 -------------------
13 files changed, 18 insertions(+), 87 deletions(-)
delete mode 100644 tools/testing/selftests/powerpc/mm/prot_sao.c
diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
index f17442c3a092..d9e92586f8dc 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -20,9 +20,13 @@
#define _PAGE_RW (_PAGE_READ | _PAGE_WRITE)
#define _PAGE_RWX (_PAGE_READ | _PAGE_WRITE | _PAGE_EXEC)
#define _PAGE_PRIVILEGED 0x00008 /* kernel access only */
-#define _PAGE_SAO 0x00010 /* Strong access order */
+
+#define _PAGE_CACHE_CTL 0x00030 /* Bits for the folowing cache modes */
+ /* No bits set is normal cacheable memory */
+ /* 0x00010 unused, is SAO bit on radix POWER9 */
#define _PAGE_NON_IDEMPOTENT 0x00020 /* non idempotent memory */
#define _PAGE_TOLERANT 0x00030 /* tolerant memory, cache inhibited */
+
#define _PAGE_DIRTY 0x00080 /* C: page changed */
#define _PAGE_ACCESSED 0x00100 /* R: page referenced */
/*
@@ -825,8 +829,6 @@ static inline void __set_pte_at(struct mm_struct *mm, unsigned long addr,
return hash__set_pte_at(mm, addr, ptep, pte, percpu);
}
-#define _PAGE_CACHE_CTL (_PAGE_SAO | _PAGE_NON_IDEMPOTENT | _PAGE_TOLERANT)
-
#define pgprot_noncached pgprot_noncached
static inline pgprot_t pgprot_noncached(pgprot_t prot)
{
diff --git a/arch/powerpc/include/asm/cputable.h b/arch/powerpc/include/asm/cputable.h
index bac2252c839e..c7e923b0000a 100644
--- a/arch/powerpc/include/asm/cputable.h
+++ b/arch/powerpc/include/asm/cputable.h
@@ -191,7 +191,6 @@ static inline void cpu_feature_keys_init(void) { }
#define CPU_FTR_SPURR LONG_ASM_CONST(0x0000000001000000)
#define CPU_FTR_DSCR LONG_ASM_CONST(0x0000000002000000)
#define CPU_FTR_VSX LONG_ASM_CONST(0x0000000004000000)
-#define CPU_FTR_SAO LONG_ASM_CONST(0x0000000008000000)
#define CPU_FTR_CP_USE_DCBTZ LONG_ASM_CONST(0x0000000010000000)
#define CPU_FTR_UNALIGNED_LD_STD LONG_ASM_CONST(0x0000000020000000)
#define CPU_FTR_ASYM_SMT LONG_ASM_CONST(0x0000000040000000)
@@ -435,7 +434,7 @@ static inline void cpu_feature_keys_init(void) { }
CPU_FTR_MMCRA | CPU_FTR_SMT | \
CPU_FTR_COHERENT_ICACHE | \
CPU_FTR_PURR | CPU_FTR_SPURR | CPU_FTR_REAL_LE | \
- CPU_FTR_DSCR | CPU_FTR_SAO | CPU_FTR_ASYM_SMT | \
+ CPU_FTR_DSCR | CPU_FTR_ASYM_SMT | \
CPU_FTR_STCX_CHECKS_ADDRESS | CPU_FTR_POPCNTB | CPU_FTR_POPCNTD | \
CPU_FTR_CFAR | CPU_FTR_HVMODE | \
CPU_FTR_VMX_COPY | CPU_FTR_HAS_PPR | CPU_FTR_DABRX | CPU_FTR_PKEY)
@@ -444,7 +443,7 @@ static inline void cpu_feature_keys_init(void) { }
CPU_FTR_MMCRA | CPU_FTR_SMT | \
CPU_FTR_COHERENT_ICACHE | \
CPU_FTR_PURR | CPU_FTR_SPURR | CPU_FTR_REAL_LE | \
- CPU_FTR_DSCR | CPU_FTR_SAO | \
+ CPU_FTR_DSCR | \
CPU_FTR_STCX_CHECKS_ADDRESS | CPU_FTR_POPCNTB | CPU_FTR_POPCNTD | \
CPU_FTR_CFAR | CPU_FTR_HVMODE | CPU_FTR_VMX_COPY | \
CPU_FTR_DBELL | CPU_FTR_HAS_PPR | CPU_FTR_DAWR | \
@@ -455,7 +454,7 @@ static inline void cpu_feature_keys_init(void) { }
CPU_FTR_MMCRA | CPU_FTR_SMT | \
CPU_FTR_COHERENT_ICACHE | \
CPU_FTR_PURR | CPU_FTR_SPURR | CPU_FTR_REAL_LE | \
- CPU_FTR_DSCR | CPU_FTR_SAO | \
+ CPU_FTR_DSCR | \
CPU_FTR_STCX_CHECKS_ADDRESS | CPU_FTR_POPCNTB | CPU_FTR_POPCNTD | \
CPU_FTR_CFAR | CPU_FTR_HVMODE | CPU_FTR_VMX_COPY | \
CPU_FTR_DBELL | CPU_FTR_HAS_PPR | CPU_FTR_ARCH_207S | \
@@ -473,7 +472,7 @@ static inline void cpu_feature_keys_init(void) { }
CPU_FTR_MMCRA | CPU_FTR_SMT | \
CPU_FTR_COHERENT_ICACHE | \
CPU_FTR_PURR | CPU_FTR_SPURR | CPU_FTR_REAL_LE | \
- CPU_FTR_DSCR | CPU_FTR_SAO | \
+ CPU_FTR_DSCR | \
CPU_FTR_STCX_CHECKS_ADDRESS | CPU_FTR_POPCNTB | CPU_FTR_POPCNTD | \
CPU_FTR_CFAR | CPU_FTR_HVMODE | CPU_FTR_VMX_COPY | \
CPU_FTR_DBELL | CPU_FTR_HAS_PPR | CPU_FTR_ARCH_207S | \
diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h
index 9bb9bb370b53..579c9229124b 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -400,7 +400,8 @@ static inline bool hpte_cache_flags_ok(unsigned long hptel, bool is_ci)
/* Handle SAO */
if (wimg == (HPTE_R_W | HPTE_R_I | HPTE_R_M) &&
- cpu_has_feature(CPU_FTR_ARCH_206))
+ cpu_has_feature(CPU_FTR_ARCH_206) &&
+ !cpu_has_feature(CPU_FTR_ARCH_31))
wimg = HPTE_R_M;
if (!is_ci)
diff --git a/arch/powerpc/include/asm/mman.h b/arch/powerpc/include/asm/mman.h
index d610c2e07b28..43a62f3e21a0 100644
--- a/arch/powerpc/include/asm/mman.h
+++ b/arch/powerpc/include/asm/mman.h
@@ -13,38 +13,24 @@
#include <linux/pkeys.h>
#include <asm/cpu_has_feature.h>
-/*
- * This file is included by linux/mman.h, so we can't use cacl_vm_prot_bits()
- * here. How important is the optimization?
- */
+#ifdef CONFIG_PPC_MEM_KEYS
static inline unsigned long arch_calc_vm_prot_bits(unsigned long prot,
unsigned long pkey)
{
-#ifdef CONFIG_PPC_MEM_KEYS
- return (((prot & PROT_SAO) ? VM_SAO : 0) | pkey_to_vmflag_bits(pkey));
-#else
- return ((prot & PROT_SAO) ? VM_SAO : 0);
-#endif
+ return pkey_to_vmflag_bits(pkey);
}
#define arch_calc_vm_prot_bits(prot, pkey) arch_calc_vm_prot_bits(prot, pkey)
static inline pgprot_t arch_vm_get_page_prot(unsigned long vm_flags)
{
-#ifdef CONFIG_PPC_MEM_KEYS
- return (vm_flags & VM_SAO) ?
- __pgprot(_PAGE_SAO | vmflag_to_pte_pkey_bits(vm_flags)) :
- __pgprot(0 | vmflag_to_pte_pkey_bits(vm_flags));
-#else
- return (vm_flags & VM_SAO) ? __pgprot(_PAGE_SAO) : __pgprot(0);
-#endif
+ return __pgprot(vmflag_to_pte_pkey_bits(vm_flags));
}
#define arch_vm_get_page_prot(vm_flags) arch_vm_get_page_prot(vm_flags)
+#endif
static inline bool arch_validate_prot(unsigned long prot, unsigned long addr)
{
- if (prot & ~(PROT_READ | PROT_WRITE | PROT_EXEC | PROT_SEM | PROT_SAO))
- return false;
- if ((prot & PROT_SAO) && !cpu_has_feature(CPU_FTR_SAO))
+ if (prot & ~(PROT_READ | PROT_WRITE | PROT_EXEC | PROT_SEM))
return false;
return true;
}
diff --git a/arch/powerpc/include/asm/nohash/64/pgtable.h b/arch/powerpc/include/asm/nohash/64/pgtable.h
index 3424381b81da..2fd528ef48e0 100644
--- a/arch/powerpc/include/asm/nohash/64/pgtable.h
+++ b/arch/powerpc/include/asm/nohash/64/pgtable.h
@@ -82,8 +82,6 @@
*/
#include <asm/nohash/pte-book3e.h>
-#define _PAGE_SAO 0
-
#define PTE_RPN_MASK (~((1UL << PTE_RPN_SHIFT) - 1))
/*
diff --git a/arch/powerpc/kernel/dt_cpu_ftrs.c b/arch/powerpc/kernel/dt_cpu_ftrs.c
index 3a409517c031..8d2e4043702f 100644
--- a/arch/powerpc/kernel/dt_cpu_ftrs.c
+++ b/arch/powerpc/kernel/dt_cpu_ftrs.c
@@ -622,7 +622,7 @@ static struct dt_cpu_feature_match __initdata
{"processor-control-facility-v3", feat_enable_dbell, CPU_FTR_DBELL},
{"processor-utilization-of-resources-register", feat_enable_purr, 0},
{"no-execute", feat_enable, 0},
- {"strong-access-ordering", feat_enable, CPU_FTR_SAO},
+ {"strong-access-ordering", feat_enable, 0},
{"cache-inhibited-large-page", feat_enable_large_ci, 0},
{"coprocessor-icswx", feat_enable, 0},
{"hypervisor-virtualization-interrupt", feat_enable_hvi, 0},
diff --git a/arch/powerpc/mm/book3s64/hash_utils.c b/arch/powerpc/mm/book3s64/hash_utils.c
index 0124003e60d0..14b6abdc3bd8 100644
--- a/arch/powerpc/mm/book3s64/hash_utils.c
+++ b/arch/powerpc/mm/book3s64/hash_utils.c
@@ -232,8 +232,6 @@ unsigned long htab_convert_pte_flags(unsigned long pteflags)
rflags |= HPTE_R_I;
else if ((pteflags & _PAGE_CACHE_CTL) == _PAGE_NON_IDEMPOTENT)
rflags |= (HPTE_R_I | HPTE_R_G);
- else if ((pteflags & _PAGE_CACHE_CTL) == _PAGE_SAO)
- rflags |= (HPTE_R_W | HPTE_R_I | HPTE_R_M);
else
/*
* Add memory coherence if cache inhibited is not set
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 86adc71a972f..bdcaae914120 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -316,8 +316,6 @@ extern unsigned int kobjsize(const void *objp);
#if defined(CONFIG_X86)
# define VM_PAT VM_ARCH_1 /* PAT reserves whole VMA at once (x86) */
-#elif defined(CONFIG_PPC)
-# define VM_SAO VM_ARCH_1 /* Strong Access Ordering (powerpc) */
#elif defined(CONFIG_PARISC)
# define VM_GROWSUP VM_ARCH_1
#elif defined(CONFIG_IA64)
diff --git a/include/trace/events/mmflags.h b/include/trace/events/mmflags.h
index 5fb752034386..939092dbcb8b 100644
--- a/include/trace/events/mmflags.h
+++ b/include/trace/events/mmflags.h
@@ -114,8 +114,6 @@ IF_HAVE_PG_IDLE(PG_idle, "idle" )
#if defined(CONFIG_X86)
#define __VM_ARCH_SPECIFIC_1 {VM_PAT, "pat" }
-#elif defined(CONFIG_PPC)
-#define __VM_ARCH_SPECIFIC_1 {VM_SAO, "sao" }
#elif defined(CONFIG_PARISC) || defined(CONFIG_IA64)
#define __VM_ARCH_SPECIFIC_1 {VM_GROWSUP, "growsup" }
#elif !defined(CONFIG_MMU)
diff --git a/mm/ksm.c b/mm/ksm.c
index 18c5d005bd01..b225b0e16111 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -2452,10 +2452,6 @@ int ksm_madvise(struct vm_area_struct *vma, unsigned long start,
if (vma_is_dax(vma))
return 0;
-#ifdef VM_SAO
- if (*vm_flags & VM_SAO)
- return 0;
-#endif
#ifdef VM_SPARC_ADI
if (*vm_flags & VM_SPARC_ADI)
return 0;
diff --git a/tools/testing/selftests/powerpc/mm/.gitignore b/tools/testing/selftests/powerpc/mm/.gitignore
index 2ca523255b1b..ff296c94f627 100644
--- a/tools/testing/selftests/powerpc/mm/.gitignore
+++ b/tools/testing/selftests/powerpc/mm/.gitignore
@@ -2,7 +2,6 @@
hugetlb_vs_thp_test
subpage_prot
tempfile
-prot_sao
segv_errors
wild_bctr
large_vm_fork_separation
diff --git a/tools/testing/selftests/powerpc/mm/Makefile b/tools/testing/selftests/powerpc/mm/Makefile
index b9103c4bb414..9b8a7b3069c5 100644
--- a/tools/testing/selftests/powerpc/mm/Makefile
+++ b/tools/testing/selftests/powerpc/mm/Makefile
@@ -2,7 +2,7 @@
noarg:
$(MAKE) -C ../
-TEST_GEN_PROGS := hugetlb_vs_thp_test subpage_prot prot_sao segv_errors wild_bctr \
+TEST_GEN_PROGS := hugetlb_vs_thp_test subpage_prot segv_errors wild_bctr \
large_vm_fork_separation bad_accesses
TEST_GEN_PROGS_EXTENDED := tlbie_test
TEST_GEN_FILES := tempfile
@@ -12,8 +12,6 @@ include ../../lib.mk
$(TEST_GEN_PROGS): ../harness.c
-$(OUTPUT)/prot_sao: ../utils.c
-
$(OUTPUT)/wild_bctr: CFLAGS += -m64
$(OUTPUT)/large_vm_fork_separation: CFLAGS += -m64
$(OUTPUT)/bad_accesses: CFLAGS += -m64
diff --git a/tools/testing/selftests/powerpc/mm/prot_sao.c b/tools/testing/selftests/powerpc/mm/prot_sao.c
deleted file mode 100644
index e2eed65b7735..000000000000
--- a/tools/testing/selftests/powerpc/mm/prot_sao.c
+++ /dev/null
@@ -1,42 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0-only
-/*
- * Copyright 2016, Michael Ellerman, IBM Corp.
- */
-
-#include <stdio.h>
-#include <stdlib.h>
-#include <string.h>
-#include <sys/mman.h>
-
-#include <asm/cputable.h>
-
-#include "utils.h"
-
-#define SIZE (64 * 1024)
-
-int test_prot_sao(void)
-{
- char *p;
-
- /* 2.06 or later should support SAO */
- SKIP_IF(!have_hwcap(PPC_FEATURE_ARCH_2_06));
-
- /*
- * Ensure we can ask for PROT_SAO.
- * We can't really verify that it does the right thing, but at least we
- * confirm the kernel will accept it.
- */
- p = mmap(NULL, SIZE, PROT_READ | PROT_WRITE | PROT_SAO,
- MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
- FAIL_IF(p == MAP_FAILED);
-
- /* Write to the mapping, to at least cause a fault */
- memset(p, 0xaa, SIZE);
-
- return 0;
-}
-
-int main(void)
-{
- return test_harness(test_prot_sao, "prot-sao");
-}
--
2.23.0
^ permalink raw reply related
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox