* Re: [WIP RFC PATCH 0/6] Generic Firmware Variable Filesystem
From: Nayna @ 2019-06-04 20:33 UTC (permalink / raw)
To: Greg KH, Daniel Axtens
Cc: nayna, cclaudio, Mimi Zohar, George Wilson, Matthew Garrett,
Elaine Palmer, linux-fsdevel, linuxppc-dev
In-Reply-To: <20190603072916.GA7545@kroah.com>
On 06/03/2019 03:29 AM, Greg KH wrote:
> On Mon, Jun 03, 2019 at 04:04:32PM +1000, Daniel Axtens wrote:
>> Hi Nayna,
>>
>>>> As PowerNV moves towards secure boot, we need a place to put secure
>>>> variables. One option that has been canvassed is to make our secure
>>>> variables look like EFI variables. This is an early sketch of another
>>>> approach where we create a generic firmware variable file system,
>>>> fwvarfs, and an OPAL Secure Variable backend for it.
>>> Is there a need of new filesystem ? I am wondering why can't these be
>>> exposed via sysfs / securityfs ?
>>> Probably, something like... /sys/firmware/secureboot or
>>> /sys/kernel/security/secureboot/ ?
>> I suppose we could put secure variables in sysfs, but I'm not sure
>> that's what sysfs was intended for. I understand sysfs as "a
>> filesystem-based view of kernel objects" (from
>> Documentation/filesystems/configfs/configfs.txt), and I don't think a
>> secure variable is really a kernel object in the same way most other
>> things in sysfs are... but I'm open to being convinced.
> What makes them more "secure" than anything else that is in sysfs today?
> I didn't see anything in this patchset that provided "additional
> security", did I miss it?
>
>> securityfs seems to be reserved for LSMs, I don't think we can put
>> things there.
> Yeah, I wouldn't mess with that.
Thanks Greg for clarifying!! I am curious, the TPM exposes the BIOS event log to userspace via securityfs. Is there a reason for not exposing these security variables to userspace via securityfs as well?
Thanks & Regards,
- Nayna
^ permalink raw reply
* Re: [PATCH v3 07/11] mm/memory_hotplug: Create memory block devices after arch_add_memory()
From: Wei Yang @ 2019-06-04 21:42 UTC (permalink / raw)
To: David Hildenbrand
Cc: Michal Hocko, linux-ia64, linux-sh, Wei Yang, linux-mm, Arun KS,
Ingo Molnar, linux-s390, Rafael J. Wysocki, Pavel Tatashin,
mike.travis@hpe.com, Qian Cai, Dan Williams, linux-arm-kernel,
Oscar Salvador, Andrew Banman, Mathieu Malaterre,
Greg Kroah-Hartman, linux-kernel, Igor Mammedov, akpm,
linuxppc-dev
In-Reply-To: <20190527111152.16324-8-david@redhat.com>
On Mon, May 27, 2019 at 01:11:48PM +0200, David Hildenbrand wrote:
>Only memory to be added to the buddy and to be onlined/offlined by
>user space using /sys/devices/system/memory/... needs (and should have!)
>memory block devices.
>
>Factor out creation of memory block devices. Create all devices after
>arch_add_memory() succeeded. We can later drop the want_memblock parameter,
>because it is now effectively stale.
>
>Only after memory block devices have been added, memory can be onlined
>by user space. This implies, that memory is not visible to user space at
>all before arch_add_memory() succeeded.
>
>While at it
>- use WARN_ON_ONCE instead of BUG_ON in moved unregister_memory()
>- introduce find_memory_block_by_id() to search via block id
>- Use find_memory_block_by_id() in init_memory_block() to catch
> duplicates
Generally looks good to me besides two tiny comments.
>
>Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>Cc: "Rafael J. Wysocki" <rafael@kernel.org>
>Cc: David Hildenbrand <david@redhat.com>
>Cc: "mike.travis@hpe.com" <mike.travis@hpe.com>
>Cc: Andrew Morton <akpm@linux-foundation.org>
>Cc: Ingo Molnar <mingo@kernel.org>
>Cc: Andrew Banman <andrew.banman@hpe.com>
>Cc: Oscar Salvador <osalvador@suse.de>
>Cc: Michal Hocko <mhocko@suse.com>
>Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
>Cc: Qian Cai <cai@lca.pw>
>Cc: Wei Yang <richard.weiyang@gmail.com>
>Cc: Arun KS <arunks@codeaurora.org>
>Cc: Mathieu Malaterre <malat@debian.org>
>Signed-off-by: David Hildenbrand <david@redhat.com>
>---
> drivers/base/memory.c | 82 +++++++++++++++++++++++++++---------------
> include/linux/memory.h | 2 +-
> mm/memory_hotplug.c | 15 ++++----
> 3 files changed, 63 insertions(+), 36 deletions(-)
>
>diff --git a/drivers/base/memory.c b/drivers/base/memory.c
>index ac17c95a5f28..5a0370f0c506 100644
>--- a/drivers/base/memory.c
>+++ b/drivers/base/memory.c
>@@ -39,6 +39,11 @@ static inline int base_memory_block_id(int section_nr)
> return section_nr / sections_per_block;
> }
>
>+static inline int pfn_to_block_id(unsigned long pfn)
>+{
>+ return base_memory_block_id(pfn_to_section_nr(pfn));
>+}
>+
> static int memory_subsys_online(struct device *dev);
> static int memory_subsys_offline(struct device *dev);
>
>@@ -582,10 +587,9 @@ int __weak arch_get_memory_phys_device(unsigned long start_pfn)
> * A reference for the returned object is held and the reference for the
> * hinted object is released.
> */
>-struct memory_block *find_memory_block_hinted(struct mem_section *section,
>- struct memory_block *hint)
>+static struct memory_block *find_memory_block_by_id(int block_id,
>+ struct memory_block *hint)
> {
>- int block_id = base_memory_block_id(__section_nr(section));
> struct device *hintdev = hint ? &hint->dev : NULL;
> struct device *dev;
>
>@@ -597,6 +601,14 @@ struct memory_block *find_memory_block_hinted(struct mem_section *section,
> return to_memory_block(dev);
> }
>
>+struct memory_block *find_memory_block_hinted(struct mem_section *section,
>+ struct memory_block *hint)
>+{
>+ int block_id = base_memory_block_id(__section_nr(section));
>+
>+ return find_memory_block_by_id(block_id, hint);
>+}
>+
> /*
> * For now, we have a linear search to go find the appropriate
> * memory_block corresponding to a particular phys_index. If
>@@ -658,6 +670,11 @@ static int init_memory_block(struct memory_block **memory, int block_id,
> unsigned long start_pfn;
> int ret = 0;
>
>+ mem = find_memory_block_by_id(block_id, NULL);
>+ if (mem) {
>+ put_device(&mem->dev);
>+ return -EEXIST;
>+ }
find_memory_block_by_id() is not that close to the main idea in this patch.
Would it be better to split this part?
> mem = kzalloc(sizeof(*mem), GFP_KERNEL);
> if (!mem)
> return -ENOMEM;
>@@ -699,44 +716,53 @@ static int add_memory_block(int base_section_nr)
> return 0;
> }
>
>+static void unregister_memory(struct memory_block *memory)
>+{
>+ if (WARN_ON_ONCE(memory->dev.bus != &memory_subsys))
>+ return;
>+
>+ /* drop the ref. we got via find_memory_block() */
>+ put_device(&memory->dev);
>+ device_unregister(&memory->dev);
>+}
>+
> /*
>- * need an interface for the VM to add new memory regions,
>- * but without onlining it.
>+ * Create memory block devices for the given memory area. Start and size
>+ * have to be aligned to memory block granularity. Memory block devices
>+ * will be initialized as offline.
> */
>-int hotplug_memory_register(int nid, struct mem_section *section)
>+int create_memory_block_devices(unsigned long start, unsigned long size)
> {
>- int block_id = base_memory_block_id(__section_nr(section));
>- int ret = 0;
>+ const int start_block_id = pfn_to_block_id(PFN_DOWN(start));
>+ int end_block_id = pfn_to_block_id(PFN_DOWN(start + size));
> struct memory_block *mem;
>+ unsigned long block_id;
>+ int ret = 0;
>
>- mutex_lock(&mem_sysfs_mutex);
>+ if (WARN_ON_ONCE(!IS_ALIGNED(start, memory_block_size_bytes()) ||
>+ !IS_ALIGNED(size, memory_block_size_bytes())))
>+ return -EINVAL;
>
>- mem = find_memory_block(section);
>- if (mem) {
>- mem->section_count++;
>- put_device(&mem->dev);
>- } else {
>+ mutex_lock(&mem_sysfs_mutex);
>+ for (block_id = start_block_id; block_id != end_block_id; block_id++) {
> ret = init_memory_block(&mem, block_id, MEM_OFFLINE);
> if (ret)
>- goto out;
>- mem->section_count++;
>+ break;
>+ mem->section_count = sections_per_block;
>+ }
>+ if (ret) {
>+ end_block_id = block_id;
>+ for (block_id = start_block_id; block_id != end_block_id;
>+ block_id++) {
>+ mem = find_memory_block_by_id(block_id, NULL);
>+ mem->section_count = 0;
>+ unregister_memory(mem);
>+ }
> }
Would it be better to do this in reverse order?
And unregister_memory() would free mem, so it is still necessary to set
section_count to 0?
>-
>-out:
> mutex_unlock(&mem_sysfs_mutex);
> return ret;
> }
>
>-static void
>-unregister_memory(struct memory_block *memory)
>-{
>- BUG_ON(memory->dev.bus != &memory_subsys);
>-
>- /* drop the ref. we got via find_memory_block() */
>- put_device(&memory->dev);
>- device_unregister(&memory->dev);
>-}
>-
> void unregister_memory_section(struct mem_section *section)
> {
> struct memory_block *mem;
>diff --git a/include/linux/memory.h b/include/linux/memory.h
>index 474c7c60c8f2..db3e8567f900 100644
>--- a/include/linux/memory.h
>+++ b/include/linux/memory.h
>@@ -111,7 +111,7 @@ extern int register_memory_notifier(struct notifier_block *nb);
> extern void unregister_memory_notifier(struct notifier_block *nb);
> extern int register_memory_isolate_notifier(struct notifier_block *nb);
> extern void unregister_memory_isolate_notifier(struct notifier_block *nb);
>-int hotplug_memory_register(int nid, struct mem_section *section);
>+int create_memory_block_devices(unsigned long start, unsigned long size);
> extern void unregister_memory_section(struct mem_section *);
> extern int memory_dev_init(void);
> extern int memory_notify(unsigned long val, void *v);
>diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>index 4b9d2974f86c..b1fde90bbf19 100644
>--- a/mm/memory_hotplug.c
>+++ b/mm/memory_hotplug.c
>@@ -259,13 +259,7 @@ static int __meminit __add_section(int nid, unsigned long phys_start_pfn,
> return -EEXIST;
>
> ret = sparse_add_one_section(nid, phys_start_pfn, altmap);
>- if (ret < 0)
>- return ret;
>-
>- if (!want_memblock)
>- return 0;
>-
>- return hotplug_memory_register(nid, __pfn_to_section(phys_start_pfn));
>+ return ret < 0 ? ret : 0;
> }
>
> /*
>@@ -1107,6 +1101,13 @@ int __ref add_memory_resource(int nid, struct resource *res)
> if (ret < 0)
> goto error;
>
>+ /* create memory block devices after memory was added */
>+ ret = create_memory_block_devices(start, size);
>+ if (ret) {
>+ arch_remove_memory(nid, start, size, NULL);
>+ goto error;
>+ }
>+
> if (new_node) {
> /* If sysfs file of new node can't be created, cpu on the node
> * can't be hot-added. There is no rollback way now.
>--
>2.20.1
--
Wei Yang
Help you, Help me
^ permalink raw reply
* Re: [PATCH v3 08/11] mm/memory_hotplug: Drop MHP_MEMBLOCK_API
From: Wei Yang @ 2019-06-04 21:47 UTC (permalink / raw)
To: David Hildenbrand
Cc: Oscar Salvador, linux-s390, Michal Hocko, linux-ia64,
Pavel Tatashin, linux-sh, Mathieu Malaterre, Joonsoo Kim,
linux-kernel, Wei Yang, linux-mm, Arun KS, Qian Cai,
Igor Mammedov, akpm, linuxppc-dev, Dan Williams, linux-arm-kernel
In-Reply-To: <20190527111152.16324-9-david@redhat.com>
On Mon, May 27, 2019 at 01:11:49PM +0200, David Hildenbrand wrote:
>No longer needed, the callers of arch_add_memory() can handle this
>manually.
>
>Cc: Andrew Morton <akpm@linux-foundation.org>
>Cc: David Hildenbrand <david@redhat.com>
>Cc: Michal Hocko <mhocko@suse.com>
>Cc: Oscar Salvador <osalvador@suse.com>
>Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
>Cc: Wei Yang <richard.weiyang@gmail.com>
>Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
>Cc: Qian Cai <cai@lca.pw>
>Cc: Arun KS <arunks@codeaurora.org>
>Cc: Mathieu Malaterre <malat@debian.org>
>Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Wei Yang <richardw.yang@linux.intel.com>
>---
> include/linux/memory_hotplug.h | 8 --------
> mm/memory_hotplug.c | 9 +++------
> 2 files changed, 3 insertions(+), 14 deletions(-)
>
>diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
>index 2d4de313926d..2f1f87e13baa 100644
>--- a/include/linux/memory_hotplug.h
>+++ b/include/linux/memory_hotplug.h
>@@ -128,14 +128,6 @@ extern void arch_remove_memory(int nid, u64 start, u64 size,
> extern void __remove_pages(struct zone *zone, unsigned long start_pfn,
> unsigned long nr_pages, struct vmem_altmap *altmap);
>
>-/*
>- * Do we want sysfs memblock files created. This will allow userspace to online
>- * and offline memory explicitly. Lack of this bit means that the caller has to
>- * call move_pfn_range_to_zone to finish the initialization.
>- */
>-
>-#define MHP_MEMBLOCK_API (1<<0)
>-
> /* reasonably generic interface to expand the physical pages */
> extern int __add_pages(int nid, unsigned long start_pfn, unsigned long nr_pages,
> struct mhp_restrictions *restrictions);
>diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>index b1fde90bbf19..9a92549ef23b 100644
>--- a/mm/memory_hotplug.c
>+++ b/mm/memory_hotplug.c
>@@ -251,7 +251,7 @@ void __init register_page_bootmem_info_node(struct pglist_data *pgdat)
> #endif /* CONFIG_HAVE_BOOTMEM_INFO_NODE */
>
> static int __meminit __add_section(int nid, unsigned long phys_start_pfn,
>- struct vmem_altmap *altmap, bool want_memblock)
>+ struct vmem_altmap *altmap)
> {
> int ret;
>
>@@ -294,8 +294,7 @@ int __ref __add_pages(int nid, unsigned long phys_start_pfn,
> }
>
> for (i = start_sec; i <= end_sec; i++) {
>- err = __add_section(nid, section_nr_to_pfn(i), altmap,
>- restrictions->flags & MHP_MEMBLOCK_API);
>+ err = __add_section(nid, section_nr_to_pfn(i), altmap);
>
> /*
> * EEXIST is finally dealt with by ioresource collision
>@@ -1067,9 +1066,7 @@ static int online_memory_block(struct memory_block *mem, void *arg)
> */
> int __ref add_memory_resource(int nid, struct resource *res)
> {
>- struct mhp_restrictions restrictions = {
>- .flags = MHP_MEMBLOCK_API,
>- };
>+ struct mhp_restrictions restrictions = {};
> u64 start, size;
> bool new_node = false;
> int ret;
>--
>2.20.1
--
Wei Yang
Help you, Help me
^ permalink raw reply
* Re: [RFC V2] mm: Generalize notify_page_fault()
From: Matthew Wilcox @ 2019-06-04 21:53 UTC (permalink / raw)
To: Anshuman Khandual
Cc: Mark Rutland, Michal Hocko, linux-ia64, linux-sh, Peter Zijlstra,
Catalin Marinas, Dave Hansen, Will Deacon, linux-mm,
Paul Mackerras, sparclinux, linux-s390, Yoshinori Sato, x86,
Russell King, Ingo Molnar, Fenghua Yu, Stephen Rothwell,
Andrey Konovalov, Andy Lutomirski, Thomas Gleixner,
linux-arm-kernel, Tony Luck, Heiko Carstens, linux-kernel,
Martin Schwidefsky, Andrew Morton, linuxppc-dev, David S. Miller
In-Reply-To: <1559630046-12940-1-git-send-email-anshuman.khandual@arm.com>
On Tue, Jun 04, 2019 at 12:04:06PM +0530, Anshuman Khandual wrote:
> +++ b/arch/x86/mm/fault.c
> @@ -46,23 +46,6 @@ kmmio_fault(struct pt_regs *regs, unsigned long addr)
> return 0;
> }
>
> -static nokprobe_inline int kprobes_fault(struct pt_regs *regs)
> -{
...
> -}
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 0e8834a..c5a8dcf 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1778,6 +1778,7 @@ static inline int pte_devmap(pte_t pte)
> }
> #endif
>
> +int notify_page_fault(struct pt_regs *regs, unsigned int trap);
Why is it now out-of-line?
> +++ b/mm/memory.c
> +int __kprobes notify_page_fault(struct pt_regs *regs, unsigned int trap)
> +{
> + int ret = 0;
> +
> + /*
> + * To be potentially processing a kprobe fault and to be allowed
> + * to call kprobe_running(), we have to be non-preemptible.
> + */
> + if (kprobes_built_in() && !preemptible() && !user_mode(regs)) {
> + if (kprobe_running() && kprobe_fault_handler(regs, trap))
> + ret = 1;
> + }
> + return ret;
> +}
> +
I would argue this should be in kprobes.h as a static nokprobe_inline.
^ permalink raw reply
* Re: [PATCH v3 09/11] mm/memory_hotplug: Remove memory block devices before arch_remove_memory()
From: Wei Yang @ 2019-06-04 22:07 UTC (permalink / raw)
To: David Hildenbrand
Cc: Michal Hocko, linux-ia64, linux-sh, Wei Yang, linux-mm, Arun KS,
Ingo Molnar, linux-s390, Rafael J. Wysocki, Pavel Tatashin,
mike.travis@hpe.com, Mark Brown, Jonathan Cameron, Dan Williams,
Chris Wilson, linux-arm-kernel, Oscar Salvador, Andrew Banman,
Mathieu Malaterre, Greg Kroah-Hartman, linux-kernel, Alex Deucher,
Igor Mammedov, akpm, linuxppc-dev, David S. Miller
In-Reply-To: <20190527111152.16324-10-david@redhat.com>
On Mon, May 27, 2019 at 01:11:50PM +0200, David Hildenbrand wrote:
>Let's factor out removing of memory block devices, which is only
>necessary for memory added via add_memory() and friends that created
>memory block devices. Remove the devices before calling
>arch_remove_memory().
>
>This finishes factoring out memory block device handling from
>arch_add_memory() and arch_remove_memory().
>
>Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>Cc: "Rafael J. Wysocki" <rafael@kernel.org>
>Cc: David Hildenbrand <david@redhat.com>
>Cc: "mike.travis@hpe.com" <mike.travis@hpe.com>
>Cc: Andrew Morton <akpm@linux-foundation.org>
>Cc: Andrew Banman <andrew.banman@hpe.com>
>Cc: Ingo Molnar <mingo@kernel.org>
>Cc: Alex Deucher <alexander.deucher@amd.com>
>Cc: "David S. Miller" <davem@davemloft.net>
>Cc: Mark Brown <broonie@kernel.org>
>Cc: Chris Wilson <chris@chris-wilson.co.uk>
>Cc: Oscar Salvador <osalvador@suse.de>
>Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>Cc: Michal Hocko <mhocko@suse.com>
>Cc: Pavel Tatashin <pavel.tatashin@microsoft.com>
>Cc: Arun KS <arunks@codeaurora.org>
>Cc: Mathieu Malaterre <malat@debian.org>
>Reviewed-by: Dan Williams <dan.j.williams@intel.com>
>Signed-off-by: David Hildenbrand <david@redhat.com>
>---
> drivers/base/memory.c | 37 ++++++++++++++++++-------------------
> drivers/base/node.c | 11 ++++++-----
> include/linux/memory.h | 2 +-
> include/linux/node.h | 6 ++----
> mm/memory_hotplug.c | 5 +++--
> 5 files changed, 30 insertions(+), 31 deletions(-)
>
>diff --git a/drivers/base/memory.c b/drivers/base/memory.c
>index 5a0370f0c506..f28efb0bf5c7 100644
>--- a/drivers/base/memory.c
>+++ b/drivers/base/memory.c
>@@ -763,32 +763,31 @@ int create_memory_block_devices(unsigned long start, unsigned long size)
> return ret;
> }
>
>-void unregister_memory_section(struct mem_section *section)
>+/*
>+ * Remove memory block devices for the given memory area. Start and size
>+ * have to be aligned to memory block granularity. Memory block devices
>+ * have to be offline.
>+ */
>+void remove_memory_block_devices(unsigned long start, unsigned long size)
> {
>+ const int start_block_id = pfn_to_block_id(PFN_DOWN(start));
>+ const int end_block_id = pfn_to_block_id(PFN_DOWN(start + size));
> struct memory_block *mem;
>+ int block_id;
>
>- if (WARN_ON_ONCE(!present_section(section)))
>+ if (WARN_ON_ONCE(!IS_ALIGNED(start, memory_block_size_bytes()) ||
>+ !IS_ALIGNED(size, memory_block_size_bytes())))
> return;
>
> mutex_lock(&mem_sysfs_mutex);
>-
>- /*
>- * Some users of the memory hotplug do not want/need memblock to
>- * track all sections. Skip over those.
>- */
>- mem = find_memory_block(section);
>- if (!mem)
>- goto out_unlock;
>-
>- unregister_mem_sect_under_nodes(mem, __section_nr(section));
>-
>- mem->section_count--;
>- if (mem->section_count == 0)
>+ for (block_id = start_block_id; block_id != end_block_id; block_id++) {
>+ mem = find_memory_block_by_id(block_id, NULL);
>+ if (WARN_ON_ONCE(!mem))
>+ continue;
>+ mem->section_count = 0;
Is this step necessary?
>+ unregister_memory_block_under_nodes(mem);
> unregister_memory(mem);
>- else
>- put_device(&mem->dev);
>-
>-out_unlock:
>+ }
> mutex_unlock(&mem_sysfs_mutex);
> }
>
>diff --git a/drivers/base/node.c b/drivers/base/node.c
>index 8598fcbd2a17..04fdfa99b8bc 100644
>--- a/drivers/base/node.c
>+++ b/drivers/base/node.c
>@@ -801,9 +801,10 @@ int register_mem_sect_under_node(struct memory_block *mem_blk, void *arg)
> return 0;
> }
>
>-/* unregister memory section under all nodes that it spans */
>-int unregister_mem_sect_under_nodes(struct memory_block *mem_blk,
>- unsigned long phys_index)
>+/*
>+ * Unregister memory block device under all nodes that it spans.
>+ */
>+int unregister_memory_block_under_nodes(struct memory_block *mem_blk)
> {
> NODEMASK_ALLOC(nodemask_t, unlinked_nodes, GFP_KERNEL);
> unsigned long pfn, sect_start_pfn, sect_end_pfn;
>@@ -816,8 +817,8 @@ int unregister_mem_sect_under_nodes(struct memory_block *mem_blk,
> return -ENOMEM;
> nodes_clear(*unlinked_nodes);
>
>- sect_start_pfn = section_nr_to_pfn(phys_index);
>- sect_end_pfn = sect_start_pfn + PAGES_PER_SECTION - 1;
>+ sect_start_pfn = section_nr_to_pfn(mem_blk->start_section_nr);
>+ sect_end_pfn = section_nr_to_pfn(mem_blk->end_section_nr);
> for (pfn = sect_start_pfn; pfn <= sect_end_pfn; pfn++) {
> int nid;
>
>diff --git a/include/linux/memory.h b/include/linux/memory.h
>index db3e8567f900..f26a5417ec5d 100644
>--- a/include/linux/memory.h
>+++ b/include/linux/memory.h
>@@ -112,7 +112,7 @@ extern void unregister_memory_notifier(struct notifier_block *nb);
> extern int register_memory_isolate_notifier(struct notifier_block *nb);
> extern void unregister_memory_isolate_notifier(struct notifier_block *nb);
> int create_memory_block_devices(unsigned long start, unsigned long size);
>-extern void unregister_memory_section(struct mem_section *);
>+void remove_memory_block_devices(unsigned long start, unsigned long size);
> extern int memory_dev_init(void);
> extern int memory_notify(unsigned long val, void *v);
> extern int memory_isolate_notify(unsigned long val, void *v);
>diff --git a/include/linux/node.h b/include/linux/node.h
>index 1a557c589ecb..02a29e71b175 100644
>--- a/include/linux/node.h
>+++ b/include/linux/node.h
>@@ -139,8 +139,7 @@ extern int register_cpu_under_node(unsigned int cpu, unsigned int nid);
> extern int unregister_cpu_under_node(unsigned int cpu, unsigned int nid);
> extern int register_mem_sect_under_node(struct memory_block *mem_blk,
> void *arg);
>-extern int unregister_mem_sect_under_nodes(struct memory_block *mem_blk,
>- unsigned long phys_index);
>+extern int unregister_memory_block_under_nodes(struct memory_block *mem_blk);
>
> extern int register_memory_node_under_compute_node(unsigned int mem_nid,
> unsigned int cpu_nid,
>@@ -176,8 +175,7 @@ static inline int register_mem_sect_under_node(struct memory_block *mem_blk,
> {
> return 0;
> }
>-static inline int unregister_mem_sect_under_nodes(struct memory_block *mem_blk,
>- unsigned long phys_index)
>+static inline int unregister_memory_block_under_nodes(struct memory_block *mem_blk)
> {
> return 0;
> }
>diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>index 9a92549ef23b..82136c5b4c5f 100644
>--- a/mm/memory_hotplug.c
>+++ b/mm/memory_hotplug.c
>@@ -520,8 +520,6 @@ static void __remove_section(struct zone *zone, struct mem_section *ms,
> if (WARN_ON_ONCE(!valid_section(ms)))
> return;
>
>- unregister_memory_section(ms);
>-
> scn_nr = __section_nr(ms);
> start_pfn = section_nr_to_pfn((unsigned long)scn_nr);
> __remove_zone(zone, start_pfn);
>@@ -1845,6 +1843,9 @@ void __ref __remove_memory(int nid, u64 start, u64 size)
> memblock_free(start, size);
> memblock_remove(start, size);
>
>+ /* remove memory block devices before removing memory */
>+ remove_memory_block_devices(start, size);
>+
> arch_remove_memory(nid, start, size, NULL);
> __release_memory_resource(start, size);
>
>--
>2.20.1
--
Wei Yang
Help you, Help me
^ permalink raw reply
* Re: [PATCH 00/15] kbuild: refactor headers_install and support compile-test of UAPI headers
From: Masahiro Yamada @ 2019-06-05 2:37 UTC (permalink / raw)
To: Linux Kbuild mailing list
Cc: Song Liu, open list:DOCUMENTATION, Palmer Dabbelt, Heiko Carstens,
Alexei Starovoitov, David Howells, Paul Mackerras, linux-riscv,
Vincent Chen, Sam Ravnborg, linux-s390, Arnd Bergmann,
Daniel Borkmann, Jonathan Corbet, Helge Deller,
Christian Borntraeger, Yonghong Song, arcml, Albert Ou,
Vasily Gorbik, Jani Nikula, Greentime Hu, James E.J. Bottomley,
Michal Marek, linux-parisc, Vineet Gupta, Randy Dunlap,
Linux Kernel Mailing List, Networking, bpf, linuxppc-dev,
Martin KaFai Lau
In-Reply-To: <20190604101409.2078-1-yamada.masahiro@socionext.com>
On Tue, Jun 4, 2019 at 7:15 PM Masahiro Yamada
<yamada.masahiro@socionext.com> wrote:
>
>
> Multiple people have suggested to compile-test UAPI headers.
>
> Currently, Kbuild provides simple sanity checks by headers_check
> but they are not enough to catch bugs.
>
> The most recent patch I know is David Howells' work:
> https://patchwork.kernel.org/patch/10590203/
>
> I agree that we need better tests for UAPI headers,
> but I want to integrate it in a clean way.
>
> The idea that has been in my mind is to compile each header
> to make sure the selfcontainedness.
For convenience, I pushed this series at
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild.git
uapi-header-test-v1
(13/15 was replaced with v2)
If you want to test it quickly, please check-out it, then
$ make -j8 allmodconfig usr/
(As I noted in the commit log, you need to use
a compiler that provides <stdlib.h>, <sys/time.h>, etc.)
--
Best Regards
Masahiro Yamada
^ permalink raw reply
* Re: [PATCH v3] scsi: ibmvscsi: Don't use rc uninitialized in ibmvscsi_do_work
From: Martin K. Petersen @ 2019-06-05 2:39 UTC (permalink / raw)
To: Nathan Chancellor
Cc: Tyrel Datwyler, Martin K. Petersen, linux-scsi,
James E.J. Bottomley, linux-kernel, clang-built-linux,
linuxppc-dev
In-Reply-To: <20190603234405.29600-1-natechancellor@gmail.com>
Nathan,
> clang warns:
>
> drivers/scsi/ibmvscsi/ibmvscsi.c:2126:7: warning: variable 'rc' is used
> uninitialized whenever switch case is taken [-Wsometimes-uninitialized]
> case IBMVSCSI_HOST_ACTION_NONE:
> ^~~~~~~~~~~~~~~~~~~~~~~~~
Applied to 5.3/scsi-queue, thanks!
--
Martin K. Petersen Oracle Linux Engineering
^ permalink raw reply
* [PATCH kernel] powerpc/pci/of: Fix OF flags parsing for 64bit BARs
From: Alexey Kardashevskiy @ 2019-06-05 3:38 UTC (permalink / raw)
To: linuxppc-dev
Cc: Shawn Anastasio, Alexey Kardashevskiy, Michael Roth, Sam Bobroff,
Oliver O'Halloran, David Gibson
When the firmware does PCI BAR resource allocation, it passes the assigned
addresses and flags (prefetch/64bit/...) via the "reg" property of
a PCI device device tree node so the kernel does not need to do
resource allocation.
The flags are stored in resource::flags - the lower byte stores
PCI_BASE_ADDRESS_SPACE/etc bits and the other bytes are IORESOURCE_IO/etc.
Some flags from PCI_BASE_ADDRESS_xxx and IORESOURCE_xxx are duplicated,
such as PCI_BASE_ADDRESS_MEM_PREFETCH/PCI_BASE_ADDRESS_MEM_TYPE_64/etc.
When parsing the "reg" property, we copy the prefetch flag but we skip
on PCI_BASE_ADDRESS_MEM_TYPE_64 which leaves the flags out of sync.
The missing IORESOURCE_MEM_64 flag comes into play under 2 conditions:
1. we remove PCI_PROBE_ONLY for pseries (by hacking pSeries_setup_arch()
or by passing "/chosen/linux,pci-probe-only");
2. we request resource alignment (by passing pci=resource_alignment=
via the kernel cmd line to request PAGE_SIZE alignment or defining
ppc_md.pcibios_default_alignment which returns anything but 0). Note that
the alignment requests are ignored if PCI_PROBE_ONLY is enabled.
With 1) and 2), the generic PCI code in the kernel unconditionally
decides to:
- reassign the BARs in pci_specified_resource_alignment() (works fine)
- write new BARs to the device - this fails for 64bit BARs as the generic
code looks at IORESOURCE_MEM_64 (not set) and writes only lower 32bits
of the BAR and leaves the upper 32bit unmodified which breaks BAR mapping
in the hypervisor.
This fixes the issue by copying the flag. This is useful if we want to
enforce certain BAR alignment per platform as handling subpage sized BARs
is proven to cause problems with hotplug (SLOF already aligns BARs to 64k).
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
This code is there for ages (from 200x) hence no "Fixes:".
Ideally I want to enforce /chosen/linux,pci-probe-only in QEMU as
at the moment:
- pci=resource_alignment= alone does not do anything;
- /chosen/linux,pci-probe-only alone does not cause the kernel to
reassign resources;
- pci=resource_alignment= with /chosen/linux,pci-probe-only is broken
anyway.
---
arch/powerpc/kernel/pci_of_scan.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/arch/powerpc/kernel/pci_of_scan.c b/arch/powerpc/kernel/pci_of_scan.c
index 24191ea2d9a7..64ad92016b63 100644
--- a/arch/powerpc/kernel/pci_of_scan.c
+++ b/arch/powerpc/kernel/pci_of_scan.c
@@ -45,6 +45,8 @@ unsigned int pci_parse_of_flags(u32 addr0, int bridge)
if (addr0 & 0x02000000) {
flags = IORESOURCE_MEM | PCI_BASE_ADDRESS_SPACE_MEMORY;
flags |= (addr0 >> 22) & PCI_BASE_ADDRESS_MEM_TYPE_64;
+ if (flags & PCI_BASE_ADDRESS_MEM_TYPE_64)
+ flags |= IORESOURCE_MEM_64;
flags |= (addr0 >> 28) & PCI_BASE_ADDRESS_MEM_TYPE_1M;
if (addr0 & 0x40000000)
flags |= IORESOURCE_PREFETCH
--
2.17.1
^ permalink raw reply related
* Re: [PATCH kernel] powerpc/pci/of: Fix OF flags parsing for 64bit BARs
From: Sam Bobroff @ 2019-06-05 4:47 UTC (permalink / raw)
To: Alexey Kardashevskiy
Cc: Shawn Anastasio, Michael Roth, Oliver O'Halloran,
linuxppc-dev, David Gibson
In-Reply-To: <20190605033814.127962-1-aik@ozlabs.ru>
[-- Attachment #1: Type: text/plain, Size: 3337 bytes --]
On Wed, Jun 05, 2019 at 01:38:14PM +1000, Alexey Kardashevskiy wrote:
> When the firmware does PCI BAR resource allocation, it passes the assigned
> addresses and flags (prefetch/64bit/...) via the "reg" property of
> a PCI device device tree node so the kernel does not need to do
> resource allocation.
>
> The flags are stored in resource::flags - the lower byte stores
> PCI_BASE_ADDRESS_SPACE/etc bits and the other bytes are IORESOURCE_IO/etc.
> Some flags from PCI_BASE_ADDRESS_xxx and IORESOURCE_xxx are duplicated,
> such as PCI_BASE_ADDRESS_MEM_PREFETCH/PCI_BASE_ADDRESS_MEM_TYPE_64/etc.
> When parsing the "reg" property, we copy the prefetch flag but we skip
> on PCI_BASE_ADDRESS_MEM_TYPE_64 which leaves the flags out of sync.
>
> The missing IORESOURCE_MEM_64 flag comes into play under 2 conditions:
> 1. we remove PCI_PROBE_ONLY for pseries (by hacking pSeries_setup_arch()
> or by passing "/chosen/linux,pci-probe-only");
> 2. we request resource alignment (by passing pci=resource_alignment=
> via the kernel cmd line to request PAGE_SIZE alignment or defining
> ppc_md.pcibios_default_alignment which returns anything but 0). Note that
> the alignment requests are ignored if PCI_PROBE_ONLY is enabled.
>
> With 1) and 2), the generic PCI code in the kernel unconditionally
> decides to:
> - reassign the BARs in pci_specified_resource_alignment() (works fine)
> - write new BARs to the device - this fails for 64bit BARs as the generic
> code looks at IORESOURCE_MEM_64 (not set) and writes only lower 32bits
> of the BAR and leaves the upper 32bit unmodified which breaks BAR mapping
> in the hypervisor.
>
> This fixes the issue by copying the flag. This is useful if we want to
> enforce certain BAR alignment per platform as handling subpage sized BARs
> is proven to cause problems with hotplug (SLOF already aligns BARs to 64k).
>
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---
>
> This code is there for ages (from 200x) hence no "Fixes:".
>
> Ideally I want to enforce /chosen/linux,pci-probe-only in QEMU as
> at the moment:
> - pci=resource_alignment= alone does not do anything;
> - /chosen/linux,pci-probe-only alone does not cause the kernel to
> reassign resources;
> - pci=resource_alignment= with /chosen/linux,pci-probe-only is broken
> anyway.
Looks good to me. I gave it a quick test for regressions, with a host
and QEMU guest (with some passed-through devices) both using the patch
and it seemed fine.
Reviewed-by: Sam Bobroff <sbobroff@linux.ibm.com>
> ---
> arch/powerpc/kernel/pci_of_scan.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/arch/powerpc/kernel/pci_of_scan.c b/arch/powerpc/kernel/pci_of_scan.c
> index 24191ea2d9a7..64ad92016b63 100644
> --- a/arch/powerpc/kernel/pci_of_scan.c
> +++ b/arch/powerpc/kernel/pci_of_scan.c
> @@ -45,6 +45,8 @@ unsigned int pci_parse_of_flags(u32 addr0, int bridge)
> if (addr0 & 0x02000000) {
> flags = IORESOURCE_MEM | PCI_BASE_ADDRESS_SPACE_MEMORY;
> flags |= (addr0 >> 22) & PCI_BASE_ADDRESS_MEM_TYPE_64;
> + if (flags & PCI_BASE_ADDRESS_MEM_TYPE_64)
> + flags |= IORESOURCE_MEM_64;
> flags |= (addr0 >> 28) & PCI_BASE_ADDRESS_MEM_TYPE_1M;
> if (addr0 & 0x40000000)
> flags |= IORESOURCE_PREFETCH
> --
> 2.17.1
>
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
^ permalink raw reply
* Re: crash after NX error
From: Haren Myneni @ 2019-06-05 5:21 UTC (permalink / raw)
To: Stewart Smith; +Cc: linuxppc-dev
In-Reply-To: <87pnnuav9d.fsf@linux.vnet.ibm.com>
On 06/03/2019 08:23 PM, Stewart Smith wrote:
> On my two socket POWER9 system (powernv) with 842 zwap set up, I
> recently got a crash with the Ubuntu kernel (I haven't tried with
> upstream, and this is the first time the system has died like this, so
> I'm not sure how repeatable it is).
>
> [ 2.891463] zswap: loaded using pool 842-nx/zbud
> ...
> [15626.124646] nx_compress_powernv: ERROR: CSB still not valid after 5000000 us, giving up : 00 00 00 00 00000000
> [16868.932913] Unable to handle kernel paging request for data at address 0x6655f67da816cdb8
> [16868.933726] Faulting instruction address: 0xc000000000391600
>
>
> cpu 0x68: Vector: 380 (Data Access Out of Range) at [c000001c9d98b9a0]
> pc: c000000000391600: kmem_cache_alloc+0x2e0/0x340
> lr: c0000000003915ec: kmem_cache_alloc+0x2cc/0x340
> sp: c000001c9d98bc20
> msr: 900000000280b033
> dar: 6655f67da816cdb8
> current = 0xc000001ad43cb400
> paca = 0xc00000000fac7800 softe: 0 irq_happened: 0x01
> pid = 8319, comm = make
> Linux version 4.15.0-50-generic (buildd@bos02-ppc64el-006) (gcc version 7.3.0 (Ubuntu 7.3.0-16ubuntu3)) #54-Ubuntu SMP Mon May 6 18:55:18 UTC 2019 (Ubuntu 4.15.0-50.54-generic 4.15.18)
>
> 68:mon> t
> [c000001c9d98bc20] c0000000003914d4 kmem_cache_alloc+0x1b4/0x340 (unreliable)
> [c000001c9d98bc80] c0000000003b1e14 __khugepaged_enter+0x54/0x220
> [c000001c9d98bcc0] c00000000010f0ec copy_process.isra.5.part.6+0xebc/0x1a10
> [c000001c9d98bda0] c00000000010fe4c _do_fork+0xec/0x510
> [c000001c9d98be30] c00000000000b584 ppc_clone+0x8/0xc
> --- Exception: c00 (System Call) at 00007afe9daf87f4
> SP (7fffca606880) is in userspace
>
> So, it looks like there could be a problem in the error path, plausibly
> fixed by this patch:
>
> commit 656ecc16e8fc2ab44b3d70e3fcc197a7020d0ca5
> Author: Haren Myneni <haren@linux.vnet.ibm.com>
> Date: Wed Jun 13 00:32:40 2018 -0700
>
> crypto/nx: Initialize 842 high and normal RxFIFO control registers
>
> NX increments readOffset by FIFO size in receive FIFO control register
> when CRB is read. But the index in RxFIFO has to match with the
> corresponding entry in FIFO maintained by VAS in kernel. Otherwise NX
> may be processing incorrect CRBs and can cause CRB timeout.
>
> VAS FIFO offset is 0 when the receive window is opened during
> initialization. When the module is reloaded or in kexec boot, readOffset
> in FIFO control register may not match with VAS entry. This patch adds
> nx_coproc_init OPAL call to reset readOffset and queued entries in FIFO
> control register for both high and normal FIFOs.
>
> Signed-off-by: Haren Myneni <haren@us.ibm.com>
> [mpe: Fixup uninitialized variable warning]
> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
>
> $ git describe --contains 656ecc16e8fc2ab44b3d70e3fcc197a7020d0ca5
> v4.19-rc1~24^2~50
>
>
> Which was never backported to any stable release, so probably needs to
> be for v4.14 through v4.18. Notably, Ubuntu is on v4.15 and it doesn't
> seem to have picked up the patch. I'm opening an Ubuntu bug for this.
>
> Haren, is this something you can drive through the stable process
> (assuming my above crash looks like this failure)?
>
Thanks Stewart. Missed this in stable releases and I will work on it. Merged in Ubuntu 18.04.x kernel recently and will be in the next update.
Also need
commit 6e708000ec2c93c2bde6a46aa2d6c3e80d4eaeb9
Author: Haren Myneni <haren@linux.vnet.ibm.com>
Date: Wed Jun 13 00:28:57 2018 -0700
powerpc/powernv: Export opal_check_token symbol
Export opal_check_token symbol for modules to check the availability
of OPAL calls before using them.
Signed-off-by: Haren Myneni <haren@us.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
^ permalink raw reply
* Re: [PATCH kernel] powerpc/pci/of: Fix OF flags parsing for 64bit BARs
From: Oliver @ 2019-06-05 5:24 UTC (permalink / raw)
To: Alexey Kardashevskiy
Cc: Shawn Anastasio, Sam Bobroff, Michael Roth, linuxppc-dev,
David Gibson
In-Reply-To: <20190605033814.127962-1-aik@ozlabs.ru>
On Wed, Jun 5, 2019 at 1:38 PM Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
>
> When the firmware does PCI BAR resource allocation, it passes the assigned
> addresses and flags (prefetch/64bit/...) via the "reg" property of
> a PCI device device tree node so the kernel does not need to do
> resource allocation.
>
> The flags are stored in resource::flags - the lower byte stores
> PCI_BASE_ADDRESS_SPACE/etc bits and the other bytes are IORESOURCE_IO/etc.
> Some flags from PCI_BASE_ADDRESS_xxx and IORESOURCE_xxx are duplicated,
> such as PCI_BASE_ADDRESS_MEM_PREFETCH/PCI_BASE_ADDRESS_MEM_TYPE_64/etc.
> When parsing the "reg" property, we copy the prefetch flag but we skip
> on PCI_BASE_ADDRESS_MEM_TYPE_64 which leaves the flags out of sync.
>
> The missing IORESOURCE_MEM_64 flag comes into play under 2 conditions:
> 1. we remove PCI_PROBE_ONLY for pseries (by hacking pSeries_setup_arch()
> or by passing "/chosen/linux,pci-probe-only");
> 2. we request resource alignment (by passing pci=resource_alignment=
> via the kernel cmd line to request PAGE_SIZE alignment or defining
> ppc_md.pcibios_default_alignment which returns anything but 0). Note that
> the alignment requests are ignored if PCI_PROBE_ONLY is enabled.
>
> With 1) and 2), the generic PCI code in the kernel unconditionally
> decides to:
> - reassign the BARs in pci_specified_resource_alignment() (works fine)
> - write new BARs to the device - this fails for 64bit BARs as the generic
> code looks at IORESOURCE_MEM_64 (not set) and writes only lower 32bits
> of the BAR and leaves the upper 32bit unmodified which breaks BAR mapping
> in the hypervisor.
>
> This fixes the issue by copying the flag. This is useful if we want to
> enforce certain BAR alignment per platform as handling subpage sized BARs
> is proven to cause problems with hotplug (SLOF already aligns BARs to 64k).
>
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---
>
> This code is there for ages (from 200x) hence no "Fixes:".
>
> Ideally I want to enforce /chosen/linux,pci-probe-only in QEMU as
> at the moment:
> - pci=resource_alignment= alone does not do anything;
> - /chosen/linux,pci-probe-only alone does not cause the kernel to
> reassign resources;
> - pci=resource_alignment= with /chosen/linux,pci-probe-only is broken
> anyway.
>
> ---
> arch/powerpc/kernel/pci_of_scan.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/arch/powerpc/kernel/pci_of_scan.c b/arch/powerpc/kernel/pci_of_scan.c
> index 24191ea2d9a7..64ad92016b63 100644
> --- a/arch/powerpc/kernel/pci_of_scan.c
> +++ b/arch/powerpc/kernel/pci_of_scan.c
> @@ -45,6 +45,8 @@ unsigned int pci_parse_of_flags(u32 addr0, int bridge)
> if (addr0 & 0x02000000) {
> flags = IORESOURCE_MEM | PCI_BASE_ADDRESS_SPACE_MEMORY;
> flags |= (addr0 >> 22) & PCI_BASE_ADDRESS_MEM_TYPE_64;
> + if (flags & PCI_BASE_ADDRESS_MEM_TYPE_64)
> + flags |= IORESOURCE_MEM_64;
> flags |= (addr0 >> 28) & PCI_BASE_ADDRESS_MEM_TYPE_1M;
> if (addr0 & 0x40000000)
> flags |= IORESOURCE_PREFETCH
> --
> 2.17.1
Seems like an oversight that PROBE_ONLY has been papering over for years.
Reviewed-by: Oliver O'Halloran <oohall@gmail.com>
^ permalink raw reply
* Re: [PATCH v2 6/6] powerpc/eeh: Refactor around eeh_probe_devices()
From: Oliver @ 2019-06-05 5:49 UTC (permalink / raw)
To: Sam Bobroff; +Cc: Alexey Kardashevskiy, linuxppc-dev, Tyrel Datwyler
In-Reply-To: <a3e33cf84e4d6a957259e04533c85a37db0d2aef.1557203383.git.sbobroff@linux.ibm.com>
On Tue, May 7, 2019 at 2:30 PM Sam Bobroff <sbobroff@linux.ibm.com> wrote:
>
> Now that EEH support for all devices (on PowerNV and pSeries) is
> provided by the pcibios bus add device hooks, eeh_probe_devices() and
> eeh_addr_cache_build() are redundant and can be removed.
>
> Move the EEH enabled message into it's own function so that it can be
> called from multiple places.
>
> Note that previously on pSeries, useless EEH sysfs files were created
> for some devices that did not have EEH support and this change
> prevents them from being created.
>
> Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
> ---
> v2 - As it's so small, merged the enablement message patch into this one (where it's used).
> - Reworked enablement messages.
>
> arch/powerpc/include/asm/eeh.h | 7 ++---
> arch/powerpc/kernel/eeh.c | 27 ++++++-----------
> arch/powerpc/kernel/eeh_cache.c | 32 --------------------
> arch/powerpc/platforms/powernv/eeh-powernv.c | 4 +--
> arch/powerpc/platforms/pseries/pci.c | 3 +-
> 5 files changed, 14 insertions(+), 59 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
> index 12baf1df134c..3994d45ae0d4 100644
> --- a/arch/powerpc/include/asm/eeh.h
> +++ b/arch/powerpc/include/asm/eeh.h
> @@ -283,13 +283,12 @@ struct pci_bus *eeh_pe_bus_get(struct eeh_pe *pe);
>
> struct eeh_dev *eeh_dev_init(struct pci_dn *pdn);
> void eeh_dev_phb_init_dynamic(struct pci_controller *phb);
> -void eeh_probe_devices(void);
> +void eeh_show_enabled(void);
> int __init eeh_ops_register(struct eeh_ops *ops);
> int __exit eeh_ops_unregister(const char *name);
> int eeh_check_failure(const volatile void __iomem *token);
> int eeh_dev_check_failure(struct eeh_dev *edev);
> void eeh_addr_cache_init(void);
> -void eeh_addr_cache_build(void);
> void eeh_add_device_early(struct pci_dn *);
> void eeh_add_device_tree_early(struct pci_dn *);
> void eeh_add_device_late(struct pci_dev *);
> @@ -333,7 +332,7 @@ static inline bool eeh_enabled(void)
> return false;
> }
>
> -static inline void eeh_probe_devices(void) { }
> +static inline void eeh_show_enabled(void) { }
>
> static inline void *eeh_dev_init(struct pci_dn *pdn, void *data)
> {
> @@ -351,8 +350,6 @@ static inline int eeh_check_failure(const volatile void __iomem *token)
>
> static inline void eeh_addr_cache_init(void) { }
>
> -static inline void eeh_addr_cache_build(void) { }
> -
> static inline void eeh_add_device_early(struct pci_dn *pdn) { }
>
> static inline void eeh_add_device_tree_early(struct pci_dn *pdn) { }
> diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
> index 1ed80adb40a1..f905235f0307 100644
> --- a/arch/powerpc/kernel/eeh.c
> +++ b/arch/powerpc/kernel/eeh.c
> @@ -163,6 +163,16 @@ static int __init eeh_setup(char *str)
> }
> __setup("eeh=", eeh_setup);
>
> +void eeh_show_enabled(void)
> +{
> + if (eeh_has_flag(EEH_FORCE_DISABLED))
> + pr_info("EEH: Recovery disabled by kernel parameter.\n");
> + else if (eeh_has_flag(EEH_ENABLED))
> + pr_info("EEH: Capable adapter found: recovery enabled.\n");
> + else
> + pr_info("EEH: No capable adapters found: recovery disabled.\n");
> +}
> +
> /*
> * This routine captures assorted PCI configuration space data
> * for the indicated PCI device, and puts them into a buffer
> @@ -1156,23 +1166,6 @@ static struct notifier_block eeh_reboot_nb = {
> .notifier_call = eeh_reboot_notifier,
> };
>
> -void eeh_probe_devices(void)
> -{
> - struct pci_controller *hose, *tmp;
> - struct pci_dn *pdn;
> -
> - /* Enable EEH for all adapters */
> - list_for_each_entry_safe(hose, tmp, &hose_list, list_node) {
> - pdn = hose->pci_data;
> - traverse_pci_dn(pdn, eeh_ops->probe, NULL);
> - }
> - if (eeh_enabled())
> - pr_info("EEH: PCI Enhanced I/O Error Handling Enabled\n");
> - else
> - pr_info("EEH: No capable adapters found\n");
> -
> -}
The one concern I have about this is that PAPR requires us to enable
EEH for the device before we do any config accesses. From PAPR:
R1–7.3.11.1–5. For the EEH option: If a device driver is going to
enable EEH and the platform has not defaulted
to EEH enabled, then it must do so before it does any operations with
its IOA, including any configuration
cycles or Load or Store operations.
So if we want to be strictly compatible we'd need to ensure the
set-eeh-option RTAS call happens before we read the VDID in
pci_scan_device(). The pseries eeh_probe() function does this
currently, but if we defer it until the pcibios call happens we'll
have done a pile of config accesses before then. Maybe it doesn't
matter, but we'd need to do further testing under phyp or work out
some other way to ensure it's done pre-probe.
> /**
> * eeh_init - EEH initialization
> *
> diff --git a/arch/powerpc/kernel/eeh_cache.c b/arch/powerpc/kernel/eeh_cache.c
> index f93dd5cf6a39..c40078d036af 100644
> --- a/arch/powerpc/kernel/eeh_cache.c
> +++ b/arch/powerpc/kernel/eeh_cache.c
> @@ -278,38 +278,6 @@ void eeh_addr_cache_init(void)
> spin_lock_init(&pci_io_addr_cache_root.piar_lock);
> }
>
> -/**
> - * eeh_addr_cache_build - Build a cache of I/O addresses
> - *
> - * Build a cache of pci i/o addresses. This cache will be used to
> - * find the pci device that corresponds to a given address.
> - * This routine scans all pci busses to build the cache.
> - * Must be run late in boot process, after the pci controllers
> - * have been scanned for devices (after all device resources are known).
> - */
> -void eeh_addr_cache_build(void)
> -{
> - struct pci_dn *pdn;
> - struct eeh_dev *edev;
> - struct pci_dev *dev = NULL;
> -
> - for_each_pci_dev(dev) {
> - pdn = pci_get_pdn_by_devfn(dev->bus, dev->devfn);
> - if (!pdn)
> - continue;
> -
> - edev = pdn_to_eeh_dev(pdn);
> - if (!edev)
> - continue;
> -
> - dev->dev.archdata.edev = edev;
> - edev->pdev = dev;
> -
> - eeh_addr_cache_insert_dev(dev);
> - eeh_sysfs_add_device(dev);
> - }
> -}
> -
> static int eeh_addr_cache_show(struct seq_file *s, void *v)
> {
> struct pci_io_addr_range *piar;
> diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c b/arch/powerpc/platforms/powernv/eeh-powernv.c
> index 90729d908a54..22a94f4b8586 100644
> --- a/arch/powerpc/platforms/powernv/eeh-powernv.c
> +++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
> @@ -259,9 +259,7 @@ int pnv_eeh_post_init(void)
> struct pnv_phb *phb;
> int ret = 0;
>
> - /* Probe devices & build address cache */
> - eeh_probe_devices();
> - eeh_addr_cache_build();
> + eeh_show_enabled();
>
> /* Register OPAL event notifier */
> eeh_event_irq = opal_event_request(ilog2(OPAL_EVENT_PCI_ERROR));
> diff --git a/arch/powerpc/platforms/pseries/pci.c b/arch/powerpc/platforms/pseries/pci.c
> index 37a77e57893e..d6a5f4f27507 100644
> --- a/arch/powerpc/platforms/pseries/pci.c
> +++ b/arch/powerpc/platforms/pseries/pci.c
> @@ -242,8 +242,7 @@ void __init pSeries_final_fixup(void)
>
> pSeries_request_regions();
>
> - eeh_probe_devices();
> - eeh_addr_cache_build();
> + eeh_show_enabled();
>
> #ifdef CONFIG_PCI_IOV
> ppc_md.pcibios_sriov_enable = pseries_pcibios_sriov_enable;
> --
> 2.19.0.2.gcad72f5712
>
^ permalink raw reply
* Re: [WIP RFC PATCH 0/6] Generic Firmware Variable Filesystem
From: Greg KH @ 2019-06-05 6:14 UTC (permalink / raw)
To: Nayna
Cc: nayna, cclaudio, Mimi Zohar, George Wilson, Matthew Garrett,
Elaine Palmer, linux-fsdevel, linuxppc-dev, Daniel Axtens
In-Reply-To: <90d3394f-c2e6-5d47-0ebd-0ddb28f3f883@linux.vnet.ibm.com>
On Tue, Jun 04, 2019 at 04:33:14PM -0400, Nayna wrote:
>
>
> On 06/03/2019 03:29 AM, Greg KH wrote:
> > On Mon, Jun 03, 2019 at 04:04:32PM +1000, Daniel Axtens wrote:
> > > Hi Nayna,
> > >
> > > > > As PowerNV moves towards secure boot, we need a place to put secure
> > > > > variables. One option that has been canvassed is to make our secure
> > > > > variables look like EFI variables. This is an early sketch of another
> > > > > approach where we create a generic firmware variable file system,
> > > > > fwvarfs, and an OPAL Secure Variable backend for it.
> > > > Is there a need of new filesystem ? I am wondering why can't these be
> > > > exposed via sysfs / securityfs ?
> > > > Probably, something like... /sys/firmware/secureboot or
> > > > /sys/kernel/security/secureboot/ ?
> > > I suppose we could put secure variables in sysfs, but I'm not sure
> > > that's what sysfs was intended for. I understand sysfs as "a
> > > filesystem-based view of kernel objects" (from
> > > Documentation/filesystems/configfs/configfs.txt), and I don't think a
> > > secure variable is really a kernel object in the same way most other
> > > things in sysfs are... but I'm open to being convinced.
> > What makes them more "secure" than anything else that is in sysfs today?
> > I didn't see anything in this patchset that provided "additional
> > security", did I miss it?
> >
> > > securityfs seems to be reserved for LSMs, I don't think we can put
> > > things there.
> > Yeah, I wouldn't mess with that.
>
> Thanks Greg for clarifying!! I am curious, the TPM exposes the BIOS
> event log to userspace via securityfs. Is there a reason for not
> exposing these security variables to userspace via securityfs as well?
securityfs is for LSMs to use. If the TPM drivers also use it, well,
that's between those authors and the securityfs developers.
BIOS/firmware variables are a much different thing than a TPM log.
thanks,
greg k-h
^ permalink raw reply
* Re: [PATCH] ocxl: do not use C++ style comments in uapi header
From: Andrew Donnellan @ 2019-06-05 6:16 UTC (permalink / raw)
To: Masahiro Yamada, Frederic Barrat
Cc: Arnd Bergmann, Greg KH, Randy Dunlap, Linux Kernel Mailing List,
Joe Perches, Thomas Gleixner, linuxppc-dev
In-Reply-To: <CAK7LNASV9Chjd+o3+2ZbA0WHu=dVBFf2AC1dT=eLSf3_2pe12Q@mail.gmail.com>
On 4/6/19 10:12 pm, Masahiro Yamada wrote:
> On Tue, Jun 4, 2019 at 8:54 PM Frederic Barrat <fbarrat@linux.ibm.com> wrote:
>>
>>
>>
>> Le 04/06/2019 à 13:16, Masahiro Yamada a écrit :
>>> Linux kernel tolerates C++ style comments these days. Actually, the
>>> SPDX License tags for .c files start with //.
>>>
>>> On the other hand, uapi headers are written in more strict C, where
>>> the C++ comment style is forbidden.
>>>
>>> Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
>>> ---
>>
>> Thanks!
>> Acked-by: Frederic Barrat <fbarrat@linux.ibm.com>
>>
>
> Please hold on this patch until
> we get consensus about the C++ comment style.
>
> Discussion just started here:
> https://lore.kernel.org/patchwork/patch/1083801/
If you choose to proceed with this patch:
Acked-by: Andrew Donnellan <ajd@linux.ibm.com>
--
Andrew Donnellan OzLabs, ADL Canberra
ajd@linux.ibm.com IBM Australia Limited
^ permalink raw reply
* Re: [PATCH v2 06/22] docs: mark orphan documents as such
From: Andrew Donnellan @ 2019-06-05 6:16 UTC (permalink / raw)
To: Mauro Carvalho Chehab, Linux Doc Mailing List
Cc: Maxime Ripard, dri-devel, platform-driver-x86, Paul Mackerras,
linux-stm32, Jonathan Corbet, David Airlie, Alexandre Torgue,
linux-pm, Maarten Lankhorst, Matan Ziv-Av, Mauro Carvalho Chehab,
Daniel Vetter, Sean Paul, linux-arm-kernel, linux-kernel,
Maxime Coquelin, Frederic Barrat, linuxppc-dev, Georgi Djakov
In-Reply-To: <4afa83787acec906c383978dc01f286940e28616.1559656538.git.mchehab+samsung@kernel.org>
On 5/6/19 12:17 am, Mauro Carvalho Chehab wrote:
> Sphinx doesn't like orphan documents:
>
> Documentation/accelerators/ocxl.rst: WARNING: document isn't included in any toctree
> Documentation/arm/stm32/overview.rst: WARNING: document isn't included in any toctree
> Documentation/arm/stm32/stm32f429-overview.rst: WARNING: document isn't included in any toctree
> Documentation/arm/stm32/stm32f746-overview.rst: WARNING: document isn't included in any toctree
> Documentation/arm/stm32/stm32f769-overview.rst: WARNING: document isn't included in any toctree
> Documentation/arm/stm32/stm32h743-overview.rst: WARNING: document isn't included in any toctree
> Documentation/arm/stm32/stm32mp157-overview.rst: WARNING: document isn't included in any toctree
> Documentation/gpu/msm-crash-dump.rst: WARNING: document isn't included in any toctree
> Documentation/interconnect/interconnect.rst: WARNING: document isn't included in any toctree
> Documentation/laptops/lg-laptop.rst: WARNING: document isn't included in any toctree
> Documentation/powerpc/isa-versions.rst: WARNING: document isn't included in any toctree
> Documentation/virtual/kvm/amd-memory-encryption.rst: WARNING: document isn't included in any toctree
> Documentation/virtual/kvm/vcpu-requests.rst: WARNING: document isn't included in any toctree
>
> So, while they aren't on any toctree, add :orphan: to them, in order
> to silent this warning.
>
> Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
ocxl:
Acked-by: Andrew Donnellan <ajd@linux.ibm.com>
We should find somewhere to put it...
> ---
> Documentation/accelerators/ocxl.rst | 2 ++
> Documentation/arm/stm32/overview.rst | 2 ++
> Documentation/arm/stm32/stm32f429-overview.rst | 2 ++
> Documentation/arm/stm32/stm32f746-overview.rst | 2 ++
> Documentation/arm/stm32/stm32f769-overview.rst | 2 ++
> Documentation/arm/stm32/stm32h743-overview.rst | 2 ++
> Documentation/arm/stm32/stm32mp157-overview.rst | 2 ++
> Documentation/gpu/msm-crash-dump.rst | 2 ++
> Documentation/interconnect/interconnect.rst | 2 ++
> Documentation/laptops/lg-laptop.rst | 2 ++
> Documentation/powerpc/isa-versions.rst | 2 ++
> 11 files changed, 22 insertions(+)
>
> diff --git a/Documentation/accelerators/ocxl.rst b/Documentation/accelerators/ocxl.rst
> index 14cefc020e2d..b1cea19a90f5 100644
> --- a/Documentation/accelerators/ocxl.rst
> +++ b/Documentation/accelerators/ocxl.rst
> @@ -1,3 +1,5 @@
> +:orphan:
> +
> ========================================================
> OpenCAPI (Open Coherent Accelerator Processor Interface)
> ========================================================
> diff --git a/Documentation/arm/stm32/overview.rst b/Documentation/arm/stm32/overview.rst
> index 85cfc8410798..f7e734153860 100644
> --- a/Documentation/arm/stm32/overview.rst
> +++ b/Documentation/arm/stm32/overview.rst
> @@ -1,3 +1,5 @@
> +:orphan:
> +
> ========================
> STM32 ARM Linux Overview
> ========================
> diff --git a/Documentation/arm/stm32/stm32f429-overview.rst b/Documentation/arm/stm32/stm32f429-overview.rst
> index 18feda97f483..65bbb1c3b423 100644
> --- a/Documentation/arm/stm32/stm32f429-overview.rst
> +++ b/Documentation/arm/stm32/stm32f429-overview.rst
> @@ -1,3 +1,5 @@
> +:orphan:
> +
> STM32F429 Overview
> ==================
>
> diff --git a/Documentation/arm/stm32/stm32f746-overview.rst b/Documentation/arm/stm32/stm32f746-overview.rst
> index b5f4b6ce7656..42d593085015 100644
> --- a/Documentation/arm/stm32/stm32f746-overview.rst
> +++ b/Documentation/arm/stm32/stm32f746-overview.rst
> @@ -1,3 +1,5 @@
> +:orphan:
> +
> STM32F746 Overview
> ==================
>
> diff --git a/Documentation/arm/stm32/stm32f769-overview.rst b/Documentation/arm/stm32/stm32f769-overview.rst
> index 228656ced2fe..f6adac862b17 100644
> --- a/Documentation/arm/stm32/stm32f769-overview.rst
> +++ b/Documentation/arm/stm32/stm32f769-overview.rst
> @@ -1,3 +1,5 @@
> +:orphan:
> +
> STM32F769 Overview
> ==================
>
> diff --git a/Documentation/arm/stm32/stm32h743-overview.rst b/Documentation/arm/stm32/stm32h743-overview.rst
> index 3458dc00095d..c525835e7473 100644
> --- a/Documentation/arm/stm32/stm32h743-overview.rst
> +++ b/Documentation/arm/stm32/stm32h743-overview.rst
> @@ -1,3 +1,5 @@
> +:orphan:
> +
> STM32H743 Overview
> ==================
>
> diff --git a/Documentation/arm/stm32/stm32mp157-overview.rst b/Documentation/arm/stm32/stm32mp157-overview.rst
> index 62e176d47ca7..2c52cd020601 100644
> --- a/Documentation/arm/stm32/stm32mp157-overview.rst
> +++ b/Documentation/arm/stm32/stm32mp157-overview.rst
> @@ -1,3 +1,5 @@
> +:orphan:
> +
> STM32MP157 Overview
> ===================
>
> diff --git a/Documentation/gpu/msm-crash-dump.rst b/Documentation/gpu/msm-crash-dump.rst
> index 757cd257e0d8..240ef200f76c 100644
> --- a/Documentation/gpu/msm-crash-dump.rst
> +++ b/Documentation/gpu/msm-crash-dump.rst
> @@ -1,3 +1,5 @@
> +:orphan:
> +
> =====================
> MSM Crash Dump Format
> =====================
> diff --git a/Documentation/interconnect/interconnect.rst b/Documentation/interconnect/interconnect.rst
> index c3e004893796..56e331dab70e 100644
> --- a/Documentation/interconnect/interconnect.rst
> +++ b/Documentation/interconnect/interconnect.rst
> @@ -1,5 +1,7 @@
> .. SPDX-License-Identifier: GPL-2.0
>
> +:orphan:
> +
> =====================================
> GENERIC SYSTEM INTERCONNECT SUBSYSTEM
> =====================================
> diff --git a/Documentation/laptops/lg-laptop.rst b/Documentation/laptops/lg-laptop.rst
> index aa503ee9b3bc..f2c2ffe31101 100644
> --- a/Documentation/laptops/lg-laptop.rst
> +++ b/Documentation/laptops/lg-laptop.rst
> @@ -1,5 +1,7 @@
> .. SPDX-License-Identifier: GPL-2.0+
>
> +:orphan:
> +
> LG Gram laptop extra features
> =============================
>
> diff --git a/Documentation/powerpc/isa-versions.rst b/Documentation/powerpc/isa-versions.rst
> index 812e20cc898c..66c24140ebf1 100644
> --- a/Documentation/powerpc/isa-versions.rst
> +++ b/Documentation/powerpc/isa-versions.rst
> @@ -1,3 +1,5 @@
> +:orphan:
> +
> CPU to ISA Version Mapping
> ==========================
>
>
--
Andrew Donnellan OzLabs, ADL Canberra
ajd@linux.ibm.com IBM Australia Limited
^ permalink raw reply
* Re: [PATCH v5] powerpc/64s: support nospectre_v2 cmdline option
From: Andrew Donnellan @ 2019-06-05 7:42 UTC (permalink / raw)
To: Christopher M. Riedl, linuxppc-dev
In-Reply-To: <20190524024647.381-1-cmr@informatik.wtf>
On 24/5/19 12:46 pm, Christopher M. Riedl wrote:
> Add support for disabling the kernel implemented spectre v2 mitigation
> (count cache flush on context switch) via the nospectre_v2 and
> mitigations=off cmdline options.
>
> Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
> Signed-off-by: Christopher M. Riedl <cmr@informatik.wtf>
> Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>
snowpatch is whinging about this breaking the build for some reason...
https://patchwork.ozlabs.org/patch/1104583/
> ---
> v4->v5:
> Fix checkpatch complaint
> arch/powerpc/kernel/security.c | 19 ++++++++++++++++---
> 1 file changed, 16 insertions(+), 3 deletions(-)
>
> diff --git a/arch/powerpc/kernel/security.c b/arch/powerpc/kernel/security.c
> index e1c9cf079503..7cfcb294b11c 100644
> --- a/arch/powerpc/kernel/security.c
> +++ b/arch/powerpc/kernel/security.c
> @@ -28,7 +28,7 @@ static enum count_cache_flush_type count_cache_flush_type = COUNT_CACHE_FLUSH_NO
> bool barrier_nospec_enabled;
> static bool no_nospec;
> static bool btb_flush_enabled;
> -#ifdef CONFIG_PPC_FSL_BOOK3E
> +#if defined(CONFIG_PPC_FSL_BOOK3E) || defined(CONFIG_PPC_BOOK3S_64)
> static bool no_spectrev2;
> #endif
>
> @@ -114,7 +114,7 @@ static __init int security_feature_debugfs_init(void)
> device_initcall(security_feature_debugfs_init);
> #endif /* CONFIG_DEBUG_FS */
>
> -#ifdef CONFIG_PPC_FSL_BOOK3E
> +#if defined(CONFIG_PPC_FSL_BOOK3E) || defined(CONFIG_PPC_BOOK3S_64)
> static int __init handle_nospectre_v2(char *p)
> {
> no_spectrev2 = true;
> @@ -122,6 +122,9 @@ static int __init handle_nospectre_v2(char *p)
> return 0;
> }
> early_param("nospectre_v2", handle_nospectre_v2);
> +#endif /* CONFIG_PPC_FSL_BOOK3E || CONFIG_PPC_BOOK3S_64 */
> +
> +#ifdef CONFIG_PPC_FSL_BOOK3E
> void setup_spectre_v2(void)
> {
> if (no_spectrev2 || cpu_mitigations_off())
> @@ -399,7 +402,17 @@ static void toggle_count_cache_flush(bool enable)
>
> void setup_count_cache_flush(void)
> {
> - toggle_count_cache_flush(true);
> + bool enable = true;
> +
> + if (no_spectrev2 || cpu_mitigations_off()) {
> + if (security_ftr_enabled(SEC_FTR_BCCTRL_SERIALISED) ||
> + security_ftr_enabled(SEC_FTR_COUNT_CACHE_DISABLED))
> + pr_warn("Spectre v2 mitigations not under software control, can't disable\n");
> +
> + enable = false;
> + }
> +
> + toggle_count_cache_flush(enable);
> }
>
> #ifdef CONFIG_DEBUG_FS
>
--
Andrew Donnellan OzLabs, ADL Canberra
ajd@linux.ibm.com IBM Australia Limited
^ permalink raw reply
* Re: [PATCH] powerpc/nvdimm: Add support for multibyte read/write for metadata
From: Alexey Kardashevskiy @ 2019-06-05 8:02 UTC (permalink / raw)
To: Aneesh Kumar K.V, npiggin, paulus, mpe; +Cc: linuxppc-dev, oohall
In-Reply-To: <20190602044350.31660-1-aneesh.kumar@linux.ibm.com>
On 02/06/2019 14:43, Aneesh Kumar K.V wrote:
> SCM_READ/WRITE_MEATADATA hcall supports multibyte read/write. This patch
> updates the metadata read/write to use 1, 2, 4 or 8 byte read/write as
> mentioned in PAPR document.
>
> READ/WRITE_METADATA hcall supports the 1, 2, 4, or 8 bytes read/write.
> For other values hcall results H_P3.
>
> Hypervisor stores the metadata contents in big-endian format and in-order
> to enable read/write in different granularity, we need to switch the contents
> to big-endian before calling HCALL.
>
> Based on an patch from Oliver O'Halloran <oohall@gmail.com>
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> ---
> arch/powerpc/platforms/pseries/papr_scm.c | 104 +++++++++++++++++-----
> 1 file changed, 82 insertions(+), 22 deletions(-)
>
> diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c
> index 0176ce66673f..e33cebb8ee6c 100644
> --- a/arch/powerpc/platforms/pseries/papr_scm.c
> +++ b/arch/powerpc/platforms/pseries/papr_scm.c
> @@ -97,42 +97,102 @@ static int drc_pmem_unbind(struct papr_scm_priv *p)
> }
>
> static int papr_scm_meta_get(struct papr_scm_priv *p,
> - struct nd_cmd_get_config_data_hdr *hdr)
> + struct nd_cmd_get_config_data_hdr *hdr)
> {
> unsigned long data[PLPAR_HCALL_BUFSIZE];
> + unsigned long offset, data_offset;
> + int len, read;
> int64_t ret;
>
> - if (hdr->in_offset >= p->metadata_size || hdr->in_length != 1)
> + if ((hdr->in_offset + hdr->in_length) >= p->metadata_size)
> return -EINVAL;
>
> - ret = plpar_hcall(H_SCM_READ_METADATA, data, p->drc_index,
> - hdr->in_offset, 1);
> -
> - if (ret == H_PARAMETER) /* bad DRC index */
> - return -ENODEV;
> - if (ret)
> - return -EINVAL; /* other invalid parameter */
> -
> - hdr->out_buf[0] = data[0] & 0xff;
> -
> + for (len = hdr->in_length; len; len -= read) {
> +
> + data_offset = hdr->in_length - len;
> + offset = hdr->in_offset + data_offset;
> +
> + if (len >= 8)
> + read = 8;
> + else if (len >= 4)
> + read = 4;
> + else if ( len >= 2)
Do not need a space before "len".
> + read = 2;
> + else
> + read = 1;
> +
> + ret = plpar_hcall(H_SCM_READ_METADATA, data, p->drc_index,
> + offset, read);
> +
> + if (ret == H_PARAMETER) /* bad DRC index */
> + return -ENODEV;
> + if (ret)
> + return -EINVAL; /* other invalid parameter */
> +
> + switch (read) {
> + case 8:
> + *(uint64_t *)(hdr->out_buf + data_offset) = be64_to_cpu(data[0]);
> + break;
> + case 4:
> + *(uint32_t *)(hdr->out_buf + data_offset) = be32_to_cpu(data[0] & 0xffffffff);
> + break;
> +
> + case 2:
> + *(uint16_t *)(hdr->out_buf + data_offset) = be16_to_cpu(data[0] & 0xffff);
> + break;
> +
> + case 1:
> + *(uint32_t *)(hdr->out_buf + data_offset) = (data[0] & 0xff);
Memory corruption, should be uint8_t*.
> + break;
> + }
> + }
> return 0;
> }
>
> static int papr_scm_meta_set(struct papr_scm_priv *p,
> - struct nd_cmd_set_config_hdr *hdr)
> + struct nd_cmd_set_config_hdr *hdr)
> {
> + unsigned long offset, data_offset;
> + int len, wrote;
> + unsigned long data;
> + __be64 data_be;
> int64_t ret;
>
> - if (hdr->in_offset >= p->metadata_size || hdr->in_length != 1)
> + if ((hdr->in_offset + hdr->in_length) >= p->metadata_size)
> return -EINVAL;
>
> - ret = plpar_hcall_norets(H_SCM_WRITE_METADATA,
> - p->drc_index, hdr->in_offset, hdr->in_buf[0], 1);
> -
> - if (ret == H_PARAMETER) /* bad DRC index */
> - return -ENODEV;
> - if (ret)
> - return -EINVAL; /* other invalid parameter */
> + for (len = hdr->in_length; len; len -= wrote) {
> +
> + data_offset = hdr->in_length - len;
> + offset = hdr->in_offset + data_offset;
> +
> + if (len >= 8) {
> + data = *(uint64_t *)(hdr->in_buf + data_offset);
> + data_be = cpu_to_be64(data);
> + wrote = 8;
> + } else if (len >= 4) {
> + data = *(uint32_t *)(hdr->in_buf + data_offset);
> + data &= 0xffffffff;
Why do you need &0xffffffff here and below (&0xffff, &0xff)? uint32_t is
unsigned type so the sign bit won't be extended.
> + data_be = cpu_to_be32(data);
> + wrote = 4;
> + } else if (len >= 2) {
> + data = *(uint16_t *)(hdr->in_buf + data_offset);
> + data &= 0xffff;
> + data_be = cpu_to_be16(data);
> + wrote = 2;
> + } else {
> + data_be = *(uint8_t *)(hdr->in_buf + data_offset);
> + data_be &= 0xff;
> + wrote = 1;
> + }
> +
> + ret = plpar_hcall_norets(H_SCM_WRITE_METADATA, p->drc_index,
> + offset, data_be, wrote);
> + if (ret == H_PARAMETER) /* bad DRC index */
> + return -ENODEV;
> + if (ret)
> + return -EINVAL; /* other invalid parameter */
> + }
>
> return 0;
> }
> @@ -154,7 +214,7 @@ int papr_scm_ndctl(struct nvdimm_bus_descriptor *nd_desc, struct nvdimm *nvdimm,
> get_size_hdr = buf;
>
> get_size_hdr->status = 0;
> - get_size_hdr->max_xfer = 1;
> + get_size_hdr->max_xfer = 8;
> get_size_hdr->config_size = p->metadata_size;
> *cmd_rc = 0;
> break;
>
--
Alexey
^ permalink raw reply
* Re: [WIP RFC PATCH 0/6] Generic Firmware Variable Filesystem
From: Greg KH @ 2019-06-05 8:13 UTC (permalink / raw)
To: Matthew Garrett
Cc: Nayna, Nayna Jain, Claudio Carvalho, Mimi Zohar, George Wilson,
Elaine Palmer, linux-fsdevel, linuxppc-dev, Daniel Axtens
In-Reply-To: <CACdnJutpgxd0Se-UDR4Gw3s+KfTuHprUTqFqxq+qu17vd4xr7Q@mail.gmail.com>
On Tue, Jun 04, 2019 at 01:05:45PM -0700, Matthew Garrett wrote:
> On Tue, Jun 4, 2019 at 1:01 PM Nayna <nayna@linux.vnet.ibm.com> wrote:
> > It seems efivars were first implemented in sysfs and then later
> > separated out as efivarfs.
> > Refer - Documentation/filesystems/efivarfs.txt.
> >
> > So, the reason wasn't that sysfs should not be used for exposing
> > firmware variables,
> > but for the size limitations which seems to come from UEFI Specification.
> >
> > Is this limitation valid for the new requirement of secure variables ?
>
> I don't think the size restriction is an issue now, but there's a lot
> of complex semantics around variable deletion and immutability that
> need to be represented somehow.
Ah, yeah, that's the reason it would not work in sysfs, forgot all about
that, thanks.
greg k-h
^ permalink raw reply
* Re: [PATCH kernel] powerpc/pci/of: Fix OF flags parsing for 64bit BARs
From: Shawn Anastasio @ 2019-06-05 8:12 UTC (permalink / raw)
To: Alexey Kardashevskiy, linuxppc-dev
Cc: Sam Bobroff, Michael Roth, Oliver O'Halloran, David Gibson
In-Reply-To: <20190605033814.127962-1-aik@ozlabs.ru>
On 6/4/19 10:38 PM, Alexey Kardashevskiy wrote:
> When the firmware does PCI BAR resource allocation, it passes the assigned
> addresses and flags (prefetch/64bit/...) via the "reg" property of
> a PCI device device tree node so the kernel does not need to do
> resource allocation.
>
> The flags are stored in resource::flags - the lower byte stores
> PCI_BASE_ADDRESS_SPACE/etc bits and the other bytes are IORESOURCE_IO/etc.
> Some flags from PCI_BASE_ADDRESS_xxx and IORESOURCE_xxx are duplicated,
> such as PCI_BASE_ADDRESS_MEM_PREFETCH/PCI_BASE_ADDRESS_MEM_TYPE_64/etc.
> When parsing the "reg" property, we copy the prefetch flag but we skip
> on PCI_BASE_ADDRESS_MEM_TYPE_64 which leaves the flags out of sync.
>
> The missing IORESOURCE_MEM_64 flag comes into play under 2 conditions:
> 1. we remove PCI_PROBE_ONLY for pseries (by hacking pSeries_setup_arch()
> or by passing "/chosen/linux,pci-probe-only");
> 2. we request resource alignment (by passing pci=resource_alignment=
> via the kernel cmd line to request PAGE_SIZE alignment or defining
> ppc_md.pcibios_default_alignment which returns anything but 0). Note that
> the alignment requests are ignored if PCI_PROBE_ONLY is enabled.
>
> With 1) and 2), the generic PCI code in the kernel unconditionally
> decides to:
> - reassign the BARs in pci_specified_resource_alignment() (works fine)
> - write new BARs to the device - this fails for 64bit BARs as the generic
> code looks at IORESOURCE_MEM_64 (not set) and writes only lower 32bits
> of the BAR and leaves the upper 32bit unmodified which breaks BAR mapping
> in the hypervisor.
>
> This fixes the issue by copying the flag. This is useful if we want to
> enforce certain BAR alignment per platform as handling subpage sized BARs
> is proven to cause problems with hotplug (SLOF already aligns BARs to 64k).
>
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---
>
> This code is there for ages (from 200x) hence no "Fixes:".
>
> Ideally I want to enforce /chosen/linux,pci-probe-only in QEMU as
> at the moment:
> - pci=resource_alignment= alone does not do anything;
> - /chosen/linux,pci-probe-only alone does not cause the kernel to
> reassign resources;
> - pci=resource_alignment= with /chosen/linux,pci-probe-only is broken
> anyway.
>
> ---
> arch/powerpc/kernel/pci_of_scan.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/arch/powerpc/kernel/pci_of_scan.c b/arch/powerpc/kernel/pci_of_scan.c
> index 24191ea2d9a7..64ad92016b63 100644
> --- a/arch/powerpc/kernel/pci_of_scan.c
> +++ b/arch/powerpc/kernel/pci_of_scan.c
> @@ -45,6 +45,8 @@ unsigned int pci_parse_of_flags(u32 addr0, int bridge)
> if (addr0 & 0x02000000) {
> flags = IORESOURCE_MEM | PCI_BASE_ADDRESS_SPACE_MEMORY;
> flags |= (addr0 >> 22) & PCI_BASE_ADDRESS_MEM_TYPE_64;
> + if (flags & PCI_BASE_ADDRESS_MEM_TYPE_64)
> + flags |= IORESOURCE_MEM_64;
> flags |= (addr0 >> 28) & PCI_BASE_ADDRESS_MEM_TYPE_1M;
> if (addr0 & 0x40000000)
> flags |= IORESOURCE_PREFETCH
>
I have confirmed that this fixes the case with PCI_PROBE_ONLY
disabled and a ppc_md.pcibios_default_alignment implementation that
returns PAGE_SIZE.
Reviewed-by: Shawn Anastasio <shawn@anastas.io>
^ permalink raw reply
* Re: [PATCH v3 07/11] mm/memory_hotplug: Create memory block devices after arch_add_memory()
From: David Hildenbrand @ 2019-06-05 8:58 UTC (permalink / raw)
To: Wei Yang
Cc: linux-s390, Michal Hocko, linux-ia64, Pavel Tatashin, linux-sh,
mike.travis@hpe.com, Greg Kroah-Hartman, Rafael J. Wysocki,
Mathieu Malaterre, linux-kernel, Arun KS, Ingo Molnar, linux-mm,
Andrew Banman, Qian Cai, Igor Mammedov, akpm, linuxppc-dev,
Dan Williams, linux-arm-kernel, Oscar Salvador
In-Reply-To: <20190604214234.ltwtkcdoju2gxisx@master>
>> /*
>> * For now, we have a linear search to go find the appropriate
>> * memory_block corresponding to a particular phys_index. If
>> @@ -658,6 +670,11 @@ static int init_memory_block(struct memory_block **memory, int block_id,
>> unsigned long start_pfn;
>> int ret = 0;
>>
>> + mem = find_memory_block_by_id(block_id, NULL);
>> + if (mem) {
>> + put_device(&mem->dev);
>> + return -EEXIST;
>> + }
>
> find_memory_block_by_id() is not that close to the main idea in this patch.
> Would it be better to split this part?
I played with that but didn't like the temporary results (e.g. having to
export find_memory_block_by_id()). I'll stick to this for now.
>
>> mem = kzalloc(sizeof(*mem), GFP_KERNEL);
>> if (!mem)
>> return -ENOMEM;
>> @@ -699,44 +716,53 @@ static int add_memory_block(int base_section_nr)
>> return 0;
>> }
>>
>> +static void unregister_memory(struct memory_block *memory)
>> +{
>> + if (WARN_ON_ONCE(memory->dev.bus != &memory_subsys))
>> + return;
>> +
>> + /* drop the ref. we got via find_memory_block() */
>> + put_device(&memory->dev);
>> + device_unregister(&memory->dev);
>> +}
>> +
>> /*
>> - * need an interface for the VM to add new memory regions,
>> - * but without onlining it.
>> + * Create memory block devices for the given memory area. Start and size
>> + * have to be aligned to memory block granularity. Memory block devices
>> + * will be initialized as offline.
>> */
>> -int hotplug_memory_register(int nid, struct mem_section *section)
>> +int create_memory_block_devices(unsigned long start, unsigned long size)
>> {
>> - int block_id = base_memory_block_id(__section_nr(section));
>> - int ret = 0;
>> + const int start_block_id = pfn_to_block_id(PFN_DOWN(start));
>> + int end_block_id = pfn_to_block_id(PFN_DOWN(start + size));
>> struct memory_block *mem;
>> + unsigned long block_id;
>> + int ret = 0;
>>
>> - mutex_lock(&mem_sysfs_mutex);
>> + if (WARN_ON_ONCE(!IS_ALIGNED(start, memory_block_size_bytes()) ||
>> + !IS_ALIGNED(size, memory_block_size_bytes())))
>> + return -EINVAL;
>>
>> - mem = find_memory_block(section);
>> - if (mem) {
>> - mem->section_count++;
>> - put_device(&mem->dev);
>> - } else {
>> + mutex_lock(&mem_sysfs_mutex);
>> + for (block_id = start_block_id; block_id != end_block_id; block_id++) {
>> ret = init_memory_block(&mem, block_id, MEM_OFFLINE);
>> if (ret)
>> - goto out;
>> - mem->section_count++;
>> + break;
>> + mem->section_count = sections_per_block;
>> + }
>> + if (ret) {
>> + end_block_id = block_id;
>> + for (block_id = start_block_id; block_id != end_block_id;
>> + block_id++) {
>> + mem = find_memory_block_by_id(block_id, NULL);
>> + mem->section_count = 0;
>> + unregister_memory(mem);
>> + }
>> }
>
> Would it be better to do this in reverse order?
>
> And unregister_memory() would free mem, so it is still necessary to set
> section_count to 0?
1. I kept the existing behavior (setting it to 0) for now. I am planning
to eventually remove the section count completely (it could be
beneficial to detect removing of partially populated memory blocks).
2. Reverse order: We would have to start with "block_id - 1", I don't
like that better.
Thanks for having a look!
--
Thanks,
David / dhildenb
^ permalink raw reply
* Re: [PATCH v3 09/11] mm/memory_hotplug: Remove memory block devices before arch_remove_memory()
From: David Hildenbrand @ 2019-06-05 9:00 UTC (permalink / raw)
To: Wei Yang
Cc: Michal Hocko, linux-ia64, linux-sh, Chris Wilson, linux-mm,
Arun KS, Ingo Molnar, linux-s390, Rafael J. Wysocki,
Pavel Tatashin, mike.travis@hpe.com, Mark Brown, Jonathan Cameron,
Dan Williams, linux-arm-kernel, Oscar Salvador, Andrew Banman,
Mathieu Malaterre, Greg Kroah-Hartman, linux-kernel, Alex Deucher,
Igor Mammedov, akpm, linuxppc-dev, David S. Miller
In-Reply-To: <20190604220715.d4d2ctwjk25vd5sq@master>
On 05.06.19 00:07, Wei Yang wrote:
> On Mon, May 27, 2019 at 01:11:50PM +0200, David Hildenbrand wrote:
>> Let's factor out removing of memory block devices, which is only
>> necessary for memory added via add_memory() and friends that created
>> memory block devices. Remove the devices before calling
>> arch_remove_memory().
>>
>> This finishes factoring out memory block device handling from
>> arch_add_memory() and arch_remove_memory().
>>
>> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>> Cc: "Rafael J. Wysocki" <rafael@kernel.org>
>> Cc: David Hildenbrand <david@redhat.com>
>> Cc: "mike.travis@hpe.com" <mike.travis@hpe.com>
>> Cc: Andrew Morton <akpm@linux-foundation.org>
>> Cc: Andrew Banman <andrew.banman@hpe.com>
>> Cc: Ingo Molnar <mingo@kernel.org>
>> Cc: Alex Deucher <alexander.deucher@amd.com>
>> Cc: "David S. Miller" <davem@davemloft.net>
>> Cc: Mark Brown <broonie@kernel.org>
>> Cc: Chris Wilson <chris@chris-wilson.co.uk>
>> Cc: Oscar Salvador <osalvador@suse.de>
>> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>> Cc: Michal Hocko <mhocko@suse.com>
>> Cc: Pavel Tatashin <pavel.tatashin@microsoft.com>
>> Cc: Arun KS <arunks@codeaurora.org>
>> Cc: Mathieu Malaterre <malat@debian.org>
>> Reviewed-by: Dan Williams <dan.j.williams@intel.com>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> ---
>> drivers/base/memory.c | 37 ++++++++++++++++++-------------------
>> drivers/base/node.c | 11 ++++++-----
>> include/linux/memory.h | 2 +-
>> include/linux/node.h | 6 ++----
>> mm/memory_hotplug.c | 5 +++--
>> 5 files changed, 30 insertions(+), 31 deletions(-)
>>
>> diff --git a/drivers/base/memory.c b/drivers/base/memory.c
>> index 5a0370f0c506..f28efb0bf5c7 100644
>> --- a/drivers/base/memory.c
>> +++ b/drivers/base/memory.c
>> @@ -763,32 +763,31 @@ int create_memory_block_devices(unsigned long start, unsigned long size)
>> return ret;
>> }
>>
>> -void unregister_memory_section(struct mem_section *section)
>> +/*
>> + * Remove memory block devices for the given memory area. Start and size
>> + * have to be aligned to memory block granularity. Memory block devices
>> + * have to be offline.
>> + */
>> +void remove_memory_block_devices(unsigned long start, unsigned long size)
>> {
>> + const int start_block_id = pfn_to_block_id(PFN_DOWN(start));
>> + const int end_block_id = pfn_to_block_id(PFN_DOWN(start + size));
>> struct memory_block *mem;
>> + int block_id;
>>
>> - if (WARN_ON_ONCE(!present_section(section)))
>> + if (WARN_ON_ONCE(!IS_ALIGNED(start, memory_block_size_bytes()) ||
>> + !IS_ALIGNED(size, memory_block_size_bytes())))
>> return;
>>
>> mutex_lock(&mem_sysfs_mutex);
>> -
>> - /*
>> - * Some users of the memory hotplug do not want/need memblock to
>> - * track all sections. Skip over those.
>> - */
>> - mem = find_memory_block(section);
>> - if (!mem)
>> - goto out_unlock;
>> -
>> - unregister_mem_sect_under_nodes(mem, __section_nr(section));
>> -
>> - mem->section_count--;
>> - if (mem->section_count == 0)
>> + for (block_id = start_block_id; block_id != end_block_id; block_id++) {
>> + mem = find_memory_block_by_id(block_id, NULL);
>> + if (WARN_ON_ONCE(!mem))
>> + continue;
>> + mem->section_count = 0;
>
> Is this step necessary?
It's what the previous code does, it might not be - I'll leave it like
that for now. As mentioned in another reply, I might remove the
section_count completely, eventually.
--
Thanks,
David / dhildenb
^ permalink raw reply
* Re: [PATCH] Documentation/stackprotector: powerpc supports stack protector
From: Bhupesh Sharma @ 2019-06-05 9:41 UTC (permalink / raw)
To: Michael Ellerman
Cc: Arnd Bergmann, Jonathan Corbet, Linux Doc Mailing List,
Linux Kernel Mailing List, Paul Mackerras, Bhupesh SHARMA,
linuxppc-dev
In-Reply-To: <87ef4eodwf.fsf@concordia.ellerman.id.au>
Hi Jonathan,
On Fri, May 31, 2019 at 8:44 PM Michael Ellerman <mpe@ellerman.id.au> wrote:
>
> Jonathan Corbet <corbet@lwn.net> writes:
> > On Thu, 30 May 2019 18:37:46 +0530
> > Bhupesh Sharma <bhsharma@redhat.com> wrote:
> >
> >> > This should probably go via the documentation tree?
> >> >
> >> > Acked-by: Michael Ellerman <mpe@ellerman.id.au>
> >>
> >> Thanks for the review Michael.
> >> I am ok with this going through the documentation tree as well.
> >
> > Works for me too, but I don't seem to find the actual patch anywhere I
> > look. Can you send me a copy?
>
> You can get it from lore:
>
> https://lore.kernel.org/linuxppc-dev/1559212177-7072-1-git-send-email-bhsharma@redhat.com/raw
>
> Or patchwork (automatically adds my ack):
>
> https://patchwork.ozlabs.org/patch/1107706/mbox/
>
> Or Bhupesh can send it to you :)
Please let me know if I should send out the patch again, this time
Cc'ing you and the doc-list.
Thanks,
Bhupesh
^ permalink raw reply
* Re: [PATCH] ASoC: fsl_esai: fix the channel swap issue after xrun
From: S.j. Wang @ 2019-06-05 10:29 UTC (permalink / raw)
To: Nicolin Chen, broonie@kernel.org
Cc: alsa-devel@alsa-project.org, timur@kernel.org,
Xiubo.Lee@gmail.com, festevam@gmail.com,
linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org
Hi
> > > > > Sounds like a bug to me...should fix it first by marking the
> > > > > data registers as volatile.
> > > > >
> > > > The ETDR is a writable register, it is not volatile. Even we
> > > > change it to Volatile, I don't think we can't avoid this issue.
> > > > for the regcache_sync Just to write this register, it is correct behavior.
> > >
> > > Is that so? Quoting the comments of regcache_sync():
> > > "* regcache_sync - Sync the register cache with the hardware.
> > > *
> > > * @map: map to configure.
> > > *
> > > * Any registers that should not be synced should be marked as
> > > * volatile."
> > >
> > > If regcache_sync() does sync volatile registers too as you said, I
> > > don't mind having this FIFO reset WAR for now, though I think this
> > > mismatch between the comments and the actual behavior then should
> get people's attention.
> > >
> > > Thank you
> >
> > ETDR is not volatile, if we mark it is volatile, is it correct?
>
> Well, you have a point -- it might not be ideally true, but it sounds like a
> correct fix to me according to this comments.
>
> We can wait for Mark's comments or just send a patch to the mail list for
> review.
>
> Thanks you
I test this patch, we don't need to reset the FIFO, and regcache_sync didn't
Write the ETDR even the EDTR is not volatile. This fault maybe caused by
Legacy, in the beginning we add this patch in internal branch, there maybe
Something cause this issue, but now can't reproduced.
So I will remove the reset of FIFO.
Best regards
Wang Shengjiu
^ permalink raw reply
* Re: [PATCH] powerpc/nvdimm: Add support for multibyte read/write for metadata
From: Michael Ellerman @ 2019-06-05 10:51 UTC (permalink / raw)
To: Aneesh Kumar K.V, Oliver; +Cc: Paul Mackerras, linuxppc-dev, Nicholas Piggin
In-Reply-To: <87ef49hg85.fsf@linux.ibm.com>
"Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> writes:
> Oliver <oohall@gmail.com> writes:
>> On Sun, Jun 2, 2019 at 2:44 PM Aneesh Kumar K.V
>> <aneesh.kumar@linux.ibm.com> wrote:
...
>>> diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c
>>> index 0176ce66673f..e33cebb8ee6c 100644
>>> --- a/arch/powerpc/platforms/pseries/papr_scm.c
>>> +++ b/arch/powerpc/platforms/pseries/papr_scm.c
>>> @@ -97,42 +97,102 @@ static int drc_pmem_unbind(struct papr_scm_priv *p)
>>> }
>>>
>>> static int papr_scm_meta_get(struct papr_scm_priv *p,
>>> - struct nd_cmd_get_config_data_hdr *hdr)
>>> + struct nd_cmd_get_config_data_hdr *hdr)
>>> {
>>> unsigned long data[PLPAR_HCALL_BUFSIZE];
>>> + unsigned long offset, data_offset;
>>> + int len, read;
>>> int64_t ret;
>>>
>>> - if (hdr->in_offset >= p->metadata_size || hdr->in_length != 1)
>>> + if ((hdr->in_offset + hdr->in_length) >= p->metadata_size)
>>> return -EINVAL;
>>>
>>> - ret = plpar_hcall(H_SCM_READ_METADATA, data, p->drc_index,
>>> - hdr->in_offset, 1);
>>> -
>>> - if (ret == H_PARAMETER) /* bad DRC index */
>>> - return -ENODEV;
>>> - if (ret)
>>> - return -EINVAL; /* other invalid parameter */
>>> -
>>> - hdr->out_buf[0] = data[0] & 0xff;
>>> -
>>> + for (len = hdr->in_length; len; len -= read) {
>>> +
>>> + data_offset = hdr->in_length - len;
>>> + offset = hdr->in_offset + data_offset;
>>> +
>>> + if (len >= 8)
>>> + read = 8;
>>> + else if (len >= 4)
>>> + read = 4;
>>> + else if ( len >= 2)
>>> + read = 2;
>>> + else
>>> + read = 1;
>>> +
>>> + ret = plpar_hcall(H_SCM_READ_METADATA, data, p->drc_index,
>>> + offset, read);
>>> +
>>> + if (ret == H_PARAMETER) /* bad DRC index */
>>> + return -ENODEV;
>>> + if (ret)
>>> + return -EINVAL; /* other invalid parameter */
>>> +
>>> + switch (read) {
>>> + case 8:
>>> + *(uint64_t *)(hdr->out_buf + data_offset) = be64_to_cpu(data[0]);
>>> + break;
>>> + case 4:
>>> + *(uint32_t *)(hdr->out_buf + data_offset) = be32_to_cpu(data[0] & 0xffffffff);
>>> + break;
...
>>
>> I assume you got the qemu bits sorted out with Shiva? Looks good otherwise.
>
> That is correct. I also tested with different xfer values (1, 2, 4, 8)
> on both Qemu and PowerVM.
With a big endian kernel?
cheers
^ permalink raw reply
* Re: [PATCH v3 07/11] mm/memory_hotplug: Create memory block devices after arch_add_memory()
From: David Hildenbrand @ 2019-06-05 10:58 UTC (permalink / raw)
To: Wei Yang
Cc: linux-s390, Michal Hocko, linux-ia64, Pavel Tatashin, linux-sh,
mike.travis@hpe.com, Greg Kroah-Hartman, Rafael J. Wysocki,
Mathieu Malaterre, linux-kernel, Arun KS, Ingo Molnar, linux-mm,
Andrew Banman, Qian Cai, Igor Mammedov, akpm, linuxppc-dev,
Dan Williams, linux-arm-kernel, Oscar Salvador
In-Reply-To: <f6523d67-cac9-1189-884a-67b6829320ba@redhat.com>
On 05.06.19 10:58, David Hildenbrand wrote:
>>> /*
>>> * For now, we have a linear search to go find the appropriate
>>> * memory_block corresponding to a particular phys_index. If
>>> @@ -658,6 +670,11 @@ static int init_memory_block(struct memory_block **memory, int block_id,
>>> unsigned long start_pfn;
>>> int ret = 0;
>>>
>>> + mem = find_memory_block_by_id(block_id, NULL);
>>> + if (mem) {
>>> + put_device(&mem->dev);
>>> + return -EEXIST;
>>> + }
>>
>> find_memory_block_by_id() is not that close to the main idea in this patch.
>> Would it be better to split this part?
>
> I played with that but didn't like the temporary results (e.g. having to
> export find_memory_block_by_id()). I'll stick to this for now.
>
>>
>>> mem = kzalloc(sizeof(*mem), GFP_KERNEL);
>>> if (!mem)
>>> return -ENOMEM;
>>> @@ -699,44 +716,53 @@ static int add_memory_block(int base_section_nr)
>>> return 0;
>>> }
>>>
>>> +static void unregister_memory(struct memory_block *memory)
>>> +{
>>> + if (WARN_ON_ONCE(memory->dev.bus != &memory_subsys))
>>> + return;
>>> +
>>> + /* drop the ref. we got via find_memory_block() */
>>> + put_device(&memory->dev);
>>> + device_unregister(&memory->dev);
>>> +}
>>> +
>>> /*
>>> - * need an interface for the VM to add new memory regions,
>>> - * but without onlining it.
>>> + * Create memory block devices for the given memory area. Start and size
>>> + * have to be aligned to memory block granularity. Memory block devices
>>> + * will be initialized as offline.
>>> */
>>> -int hotplug_memory_register(int nid, struct mem_section *section)
>>> +int create_memory_block_devices(unsigned long start, unsigned long size)
>>> {
>>> - int block_id = base_memory_block_id(__section_nr(section));
>>> - int ret = 0;
>>> + const int start_block_id = pfn_to_block_id(PFN_DOWN(start));
>>> + int end_block_id = pfn_to_block_id(PFN_DOWN(start + size));
>>> struct memory_block *mem;
>>> + unsigned long block_id;
>>> + int ret = 0;
>>>
>>> - mutex_lock(&mem_sysfs_mutex);
>>> + if (WARN_ON_ONCE(!IS_ALIGNED(start, memory_block_size_bytes()) ||
>>> + !IS_ALIGNED(size, memory_block_size_bytes())))
>>> + return -EINVAL;
>>>
>>> - mem = find_memory_block(section);
>>> - if (mem) {
>>> - mem->section_count++;
>>> - put_device(&mem->dev);
>>> - } else {
>>> + mutex_lock(&mem_sysfs_mutex);
>>> + for (block_id = start_block_id; block_id != end_block_id; block_id++) {
>>> ret = init_memory_block(&mem, block_id, MEM_OFFLINE);
>>> if (ret)
>>> - goto out;
>>> - mem->section_count++;
>>> + break;
>>> + mem->section_count = sections_per_block;
>>> + }
>>> + if (ret) {
>>> + end_block_id = block_id;
>>> + for (block_id = start_block_id; block_id != end_block_id;
>>> + block_id++) {
>>> + mem = find_memory_block_by_id(block_id, NULL);
>>> + mem->section_count = 0;
>>> + unregister_memory(mem);
>>> + }
>>> }
>>
>> Would it be better to do this in reverse order?
>>
>> And unregister_memory() would free mem, so it is still necessary to set
>> section_count to 0?
>
> 1. I kept the existing behavior (setting it to 0) for now. I am planning
> to eventually remove the section count completely (it could be
> beneficial to detect removing of partially populated memory blocks).
Correction: We already use it to block offlining of partially populated
memory blocks \o/
>
> 2. Reverse order: We would have to start with "block_id - 1", I don't
> like that better.
>
> Thanks for having a look!
>
--
Thanks,
David / dhildenb
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox