LinuxPPC-Dev Archive on lore.kernel.org
* Re: [PATCH v2 2/3] mm/memory_hotplug: Introduce MHP_NO_FIRMWARE_MEMMAP
From: David Hildenbrand @ 2020-05-01 21:10 UTC
  To: Dan Williams
  Cc: virtio-dev, linux-hyperv, Michal Hocko, Baoquan He, Linux ACPI,
	Wei Yang, linux-s390, linux-nvdimm, Linux Kernel Mailing List,
	virtualization, Linux MM, Michael S . Tsirkin, Eric W. Biederman,
	Pankaj Gupta, xen-devel, Andrew Morton, Michal Hocko,
	linuxppc-dev
In-Reply-To: <CAPcyv4iXyOUDZgqhWH1KCObvATL=gP55xEr64rsRfUuJg5B+eQ@mail.gmail.com>

On 01.05.20 22:12, Dan Williams wrote:
> On Fri, May 1, 2020 at 12:18 PM David Hildenbrand <david@redhat.com> wrote:
>>
>> On 01.05.20 20:43, Dan Williams wrote:
>>> On Fri, May 1, 2020 at 11:14 AM David Hildenbrand <david@redhat.com> wrote:
>>>>
>>>> On 01.05.20 20:03, Dan Williams wrote:
>>>>> On Fri, May 1, 2020 at 10:51 AM David Hildenbrand <david@redhat.com> wrote:
>>>>>>
>>>>>> On 01.05.20 19:45, David Hildenbrand wrote:
>>>>>>> On 01.05.20 19:39, Dan Williams wrote:
>>>>>>>> On Fri, May 1, 2020 at 10:21 AM David Hildenbrand <david@redhat.com> wrote:
>>>>>>>>>
>>>>>>>>> On 01.05.20 18:56, Dan Williams wrote:
>>>>>>>>>> On Fri, May 1, 2020 at 2:34 AM David Hildenbrand <david@redhat.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> On 01.05.20 00:24, Andrew Morton wrote:
>>>>>>>>>>>> On Thu, 30 Apr 2020 20:43:39 +0200 David Hildenbrand <david@redhat.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Why does the firmware map support hotplug entries?
>>>>>>>>>>>>>
>>>>>>>>>>>>> I assume:
>>>>>>>>>>>>>
>>>>>>>>>>>>> The firmware memmap was added primarily for x86-64 kexec (and still, is
>>>>>>>>>>>>> mostly used on x86-64 only IIRC). There, we had ACPI hotplug. When DIMMs
>>>>>>>>>>>>> get hotplugged on real HW, they get added to e820. Same applies to
>>>>>>>>>>>>> memory added via HyperV balloon (unless memory is unplugged via
>>>>>>>>>>>>> ballooning and you reboot ... the e820 is changed as well). I assume
>>>>>>>>>>>>> we wanted to be able to reflect that, to make kexec look like a real reboot.
>>>>>>>>>>>>>
>>>>>>>>>>>>> This worked for a while. Then came dax/kmem. Now comes virtio-mem.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> But I assume only Andrew can enlighten us.
>>>>>>>>>>>>>
>>>>>>>>>>>>> @Andrew, any guidance here? Should we really add all memory to the
>>>>>>>>>>>>> firmware memmap, even if this contradicts with the existing
>>>>>>>>>>>>> documentation? (especially, if the actual firmware memmap will *not*
>>>>>>>>>>>>> contain that memory after a reboot)
>>>>>>>>>>>>
>>>>>>>>>>>> For some reason that patch is misattributed - it was authored by
>>>>>>>>>>>> Shaohui Zheng <shaohui.zheng@intel.com>, who hasn't been heard from in
>>>>>>>>>>>> a decade.  I looked through the email discussion from that time and I'm
>>>>>>>>>>>> not seeing anything useful.  But I wasn't able to locate Dave Hansen's
>>>>>>>>>>>> review comments.
>>>>>>>>>>>
>>>>>>>>>>> Okay, thanks for checking. I think the documentation from 2008 is pretty
>>>>>>>>>>> clear what has to be done here. I will add some of these details to the
>>>>>>>>>>> patch description.
>>>>>>>>>>>
>>>>>>>>>>> Also, now that I know that esp. kexec-tools already don't consider
>>>>>>>>>>> dax/kmem memory properly (memory will not get dumped via kdump) and
>>>>>>>>>>> won't really suffer from a name change in /proc/iomem, I will go back to
>>>>>>>>>>> the MHP_DRIVER_MANAGED approach and
>>>>>>>>>>> 1. Don't create firmware memmap entries
>>>>>>>>>>> 2. Name the resource "System RAM (driver managed)"
>>>>>>>>>>> 3. Flag the resource via something like IORESOURCE_MEM_DRIVER_MANAGED.
>>>>>>>>>>>
>>>>>>>>>>> This way, kernel users and user space can figure out that this memory
>>>>>>>>>>> has different semantics and handle it accordingly - I think that was
>>>>>>>>>>> what Eric was asking for.
>>>>>>>>>>>
>>>>>>>>>>> Of course, open for suggestions.
>>>>>>>>>>
>>>>>>>>>> I'm still more of a fan of this being communicated by "System RAM"
>>>>>>>>>
>>>>>>>>> I was mentioning somewhere in this thread that "System RAM" inside a
>>>>>>>>> hierarchy (like dax/kmem) will already be basically ignored by
>>>>>>>>> kexec-tools. So, placing it inside a hierarchy already makes it look
>>>>>>>>> special.
>>>>>>>>>
>>>>>>>>> But after all, as we have to change kexec-tools either way, we can
>>>>>>>>> directly go ahead and flag it properly as special (in case there will
>>>>>>>>> ever be other cases where we could no longer distinguish it).
>>>>>>>>>
>>>>>>>>>> being parented especially because that tells you something about how
>>>>>>>>>> the memory is driver-managed and which mechanism might be in play.
>>>>>>>>>
>>>>>>>>> That could be communicated to some degree via the resource hierarchy.
>>>>>>>>>
>>>>>>>>> E.g.,
>>>>>>>>>
>>>>>>>>>             [root@localhost ~]# cat /proc/iomem
>>>>>>>>>             ...
>>>>>>>>>             140000000-33fffffff : Persistent Memory
>>>>>>>>>               140000000-1481fffff : namespace0.0
>>>>>>>>>               150000000-33fffffff : dax0.0
>>>>>>>>>                 150000000-33fffffff : System RAM (driver managed)
>>>>>>>>>
>>>>>>>>> vs.
>>>>>>>>>
>>>>>>>>>            :/# cat /proc/iomem
>>>>>>>>>             [...]
>>>>>>>>>             140000000-333ffffff : virtio-mem (virtio0)
>>>>>>>>>               140000000-147ffffff : System RAM (driver managed)
>>>>>>>>>               148000000-14fffffff : System RAM (driver managed)
>>>>>>>>>               150000000-157ffffff : System RAM (driver managed)
>>>>>>>>>
>>>>>>>>> Good enough for my taste.
>>>>>>>>>
>>>>>>>>>> What about adding an optional /sys/firmware/memmap/X/parent attribute.
>>>>>>>>>
>>>>>>>>> I really don't want any firmware memmap entries for something that is
>>>>>>>>> not part of the firmware provided memmap. In addition,
>>>>>>>>> /sys/firmware/memmap/ is still a fairly x86_64 specific thing. Only mips
>>>>>>>>> and two arm configs enable it at all.
>>>>>>>>>
>>>>>>>>> So, IMHO, /sys/firmware/memmap/ is definitely not the way to go.
>>>>>>>>
>>>>>>>> I think that's a policy decision and policy decisions do not belong in
>>>>>>>> the kernel. Give the tooling the opportunity to decide whether System
>>>>>>>> RAM stays that way over a kexec. The parenthetical reference otherwise
>>>>>>>> looks out of place to me in the /proc/iomem output. What makes it
>>>>>>>> "driver managed" is how the kernel handles it, not how the kernel
>>>>>>>> names it.
>>>>>>>
>>>>>>> At least, virtio-mem is different. It really *has to be handled* by the
>>>>>>> driver. This is not a policy. It's how it works.
>>>>>
>>>>> ...but that's not necessarily how dax/kmem works.
>>>>>
>>>>
>>>> Yes, and user space could still take that memory and add it to the
>>>> firmware memmap if it really wants to. It knows that it is special. It
>>>> can figure out that it belongs to a dax device using /proc/iomem.
>>>>
>>>>>>>
>>>>>>
>>>>>> Oh, and I don't see why "System RAM (driver managed)" would hinder any
>>>>>> policy in user space to still do what it thinks is the right thing to do
>>>>>> (e.g., for dax).
>>>>>>
>>>>>> "System RAM (driver managed)" would mean: Memory is not part of the raw
>>>>>> firmware memmap. It was detected and added by a driver. Handle with
>>>>>> care, this is special.
>>>>>
>>>>> Oh, no, I was more reacting to your, "don't update
>>>>> /sys/firmware/memmap for the (driver managed) range" choice as being a
>>>>> policy decision. It otherwise feels to me "System RAM (driver
>>>>> managed)" adds confusion for casual users of /proc/iomem and for clued
>>>>> in tools they have the parent association to decide policy.
>>>>
>>>> Not sure if I understand correctly, so bear with me :).
>>>>
>>>> Adding or not adding stuff to /sys/firmware/memmap is not a policy
>>>> decision. If it's not part of the raw firmware-provided memmap, it has
>>>> nothing to do in /sys/firmware/memmap. That's what the documentation
>>>> from 2008 tells us.
>>>
>>> It just occurs to me that there are valid cases for both wanting to
>>> start over with driver managed memory with a kexec and keeping it in
>>> the map.
>>
>> Yes, there might be valid cases. My gut feeling is that in the general
>> case, you want to let the kexec kernel implement a policy/ let the user
>> in the new system decide.
>>
>> But as I said, you can implement in kexec-tools whatever policy you
>> want. It has access to all information.
> 
> Right, so why is a new type needed if all the information is there by
> other means?

You mean "System RAM (driver managed)" in /proc/iomem? See below for more.

> 
>>> Consider the case of EFI Special Purpose (SP) Memory that is
>>> marked EFI Conventional Memory with the SP attribute. In that case the
>>> firmware memory map marked it as conventional RAM, but the kernel
>>> optionally marks it as System RAM vs Soft Reserved. The 2008 patch
>>> simply does not consider that case. I'm not sure strict textualism
>>> works for coding decisions.
>>
>> I am no expert on that matter (esp EFI). But looking at the users of
>> firmware_map_add_early(), the single user is in arch/x86/kernel/e820.c
>> . So the single source of /sys/firmware/memmap is (besides hotplug) e820.
>>
>> "'e820_table_firmware': the original firmware version passed to us by
>> the bootloader - not modified by the kernel. ... inform the user about
>> the firmware's notion of memory layout via /sys/firmware/memmap"
>> (arch/x86/kernel/e820.c)
>>
>> How is the EFI Special Purpose (SP) Memory represented in e820?
>> /sys/firmware/memmap is really simple: just dump in e820. No policies IIUC.
> 
> e820 now has a Soft Reserved translation for this which means "try to
> reserve, but treat as System RAM is ok too". It seems generically
> useful to me that the toggle for determining whether Soft Reserved or
> System RAM shows up /sys/firmware/memmap is a determination that
> policy can make. The kernel need not preemptively block it.

So, I think I have to clarify something here. We do have two ways to kexec

1. kexec_load(): User space (kexec-tools) crafts the memmap (e.g., using
/sys/firmware/memmap on x86-64) and selects memory where to place the
kexec images (e.g., using /proc/iomem)

2. kexec_file_load(): The kernel reuses the (basically) raw firmware
memmap and selects memory where to place kexec images.

We are talking about changing 1. to behave like 2. with regard to
dax/kmem. 2. currently does not add any hotplugged memory to the
fixed-up e820, and it should be fixed regarding hotplugged DIMMs that
would appear in e820 after a reboot.

Now, all these policy discussions are nice and fun, but I don't really
see a good reason to (ab)use /sys/firmware/memmap for that (e.g., parent
properties). If you want to be able to make this configurable, then
e.g., add a way to configure this in the kernel (for example along with
kmem) to make 1. and 2. behave the same way. Otherwise, you can really
only change 1.


Now, let's clarify what I want regarding virtio-mem:

1. kexec should not add virtio-mem memory to the initial firmware
   memmap. The driver has to be in charge as discussed.
2. kexec should not place kexec images onto virtio-mem memory. That
   would end badly.
3. kexec should still dump virtio-mem memory via kdump.

This has to work when using kexec_load() or kexec_file_load(). This has
to theoretically work on different architectures (especially, without
/sys/firmware/memmap). kexec-tools has to have access to that
information to figure out what to do.

Regarding 1:
- kexec_file_load(): works out of the box currently.
- kexec_load(): Don't create entries in /sys/firmware/memmap (for
  reasons discussed)
Regarding 2:
- kexec_file_load(): tag the resources as IORESOURCE_MEM_DRIVER_MANAGED
  (inspired by Eric)
- kexec_load(): indicate the memory as "System RAM (driver managed)"
Regarding 3:
- Same as 2. kexec-tools needs to be taught to properly consider the
  memory during kdump.
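
A minimal sketch of how a user-space tool could apply rules 2 and 3 when
reading /proc/iomem, assuming the resource name proposed in this mail. The
struct and function names are hypothetical; this is not code from
kexec-tools, only an illustration of the name-based check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical type for illustration; not part of kexec-tools. */
struct iomem_range {
	unsigned long long start;
	unsigned long long end;
	bool driver_managed;
};

/*
 * Parse one /proc/iomem line such as
 *   "  140000000-147ffffff : System RAM (driver managed)"
 * Returns true if the line names System RAM (plain or driver managed).
 * On false, r->start and r->end may have been overwritten and r must not
 * be used.
 */
static bool iomem_parse_system_ram(const char *line, struct iomem_range *r)
{
	char name[64];

	assert(line != NULL && r != NULL);

	/* Leading blanks only encode the hierarchy depth and are skipped. */
	if (sscanf(line, " %llx-%llx : %63[^\n]", &r->start, &r->end, name) != 3)
		return false;

	if (strcmp(name, "System RAM (driver managed)") == 0) {
		r->driver_managed = true;
		return true;
	}
	if (strcmp(name, "System RAM") == 0) {
		r->driver_managed = false;
		return true;
	}
	return false;
}

/* Rule 2: never place kexec images on driver-managed memory. */
static bool iomem_usable_for_kexec_image(const struct iomem_range *r)
{
	return !r->driver_managed;
}

/* Rule 3: every System RAM range, driver managed or not, is dumped. */
static bool iomem_include_in_kdump(const struct iomem_range *r)
{
	(void)r;
	return true;
}
```

Rule 1 needs no code in such a tool: with no /sys/firmware/memmap entry
for the range, there is nothing to copy into the crafted memmap.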

Now, you are asking, "why System RAM (driver managed)". I don't think
it's strictly needed right now, but it feels cleaner. E.g., for
virtio-mem the current plan is to have /proc/iomem look like

           :/# cat /proc/iomem
            [...]
            140000000-333ffffff : virtio-mem (virtio0)
              140000000-147ffffff : System RAM (driver managed)
              148000000-14fffffff : System RAM (driver managed)
              150000000-157ffffff : System RAM (driver managed)

One could tell from the hierarchy alone that this memory is
special. kexec-tools will skip it currently in either form.

If we all agree here, that we can drop it, then let's drop it,
especially if it would allow dax/kmem to use the same mechanism I am
proposing here for virtio-mem.


Now, it would be fairly simple to add a config option for dax/kmem,
making it configurable in the kernel, whether to add memory via
MHP_DRIVER_MANAGED or just as we do now. It would contradict the
"raw firmware/prov..." description of /sys/firmware/memmap, but hey,
somebody explicitly configured it, so it can't be wrong.

-- 
Thanks,

David / dhildenb



* Re: [PATCH v2 2/3] mm/memory_hotplug: Introduce MHP_NO_FIRMWARE_MEMMAP
From: Dan Williams @ 2020-05-01 21:52 UTC
  To: David Hildenbrand
  Cc: virtio-dev, linux-hyperv, Michal Hocko, Baoquan He, Linux ACPI,
	Wei Yang, linux-s390, linux-nvdimm, Linux Kernel Mailing List,
	virtualization, Linux MM, Michael S . Tsirkin, Eric W. Biederman,
	Pankaj Gupta, xen-devel, Andrew Morton, Michal Hocko,
	linuxppc-dev
In-Reply-To: <8242c0c5-2df2-fc0c-079a-3be62c113a11@redhat.com>

On Fri, May 1, 2020 at 2:11 PM David Hildenbrand <david@redhat.com> wrote:
>
> On 01.05.20 22:12, Dan Williams wrote:
[..]
> >>> Consider the case of EFI Special Purpose (SP) Memory that is
> >>> marked EFI Conventional Memory with the SP attribute. In that case the
> >>> firmware memory map marked it as conventional RAM, but the kernel
> >>> optionally marks it as System RAM vs Soft Reserved. The 2008 patch
> >>> simply does not consider that case. I'm not sure strict textualism
> >>> works for coding decisions.
> >>
> >> I am no expert on that matter (esp EFI). But looking at the users of
> >> firmware_map_add_early(), the single user is in arch/x86/kernel/e820.c
> >> . So the single source of /sys/firmware/memmap is (besides hotplug) e820.
> >>
> >> "'e820_table_firmware': the original firmware version passed to us by
> >> the bootloader - not modified by the kernel. ... inform the user about
> >> the firmware's notion of memory layout via /sys/firmware/memmap"
> >> (arch/x86/kernel/e820.c)
> >>
> >> How is the EFI Special Purpose (SP) Memory represented in e820?
> >> /sys/firmware/memmap is really simple: just dump in e820. No policies IIUC.
> >
> > e820 now has a Soft Reserved translation for this which means "try to
> > reserve, but treat as System RAM is ok too". It seems generically
> > useful to me that the toggle for determining whether Soft Reserved or
> > System RAM shows up /sys/firmware/memmap is a determination that
> > policy can make. The kernel need not preemptively block it.
>
> So, I think I have to clarify something here. We do have two ways to kexec
>
> 1. kexec_load(): User space (kexec-tools) crafts the memmap (e.g., using
> /sys/firmware/memmap on x86-64) and selects memory where to place the
> kexec images (e.g., using /proc/iomem)
>
> 2. kexec_file_load(): The kernel reuses the (basically) raw firmware
> memmap and selects memory where to place kexec images.
>
> We are talking about changing 1. to behave like 2. with regard to
> dax/kmem. 2. currently does not add any hotplugged memory to the
> fixed-up e820, and it should be fixed regarding hotplugged DIMMs that
> would appear in e820 after a reboot.
>
> Now, all these policy discussions are nice and fun, but I don't really
> see a good reason to (ab)use /sys/firmware/memmap for that (e.g., parent
> properties). If you want to be able to make this configurable, then
> e.g., add a way to configure this in the kernel (for example along with
> kmem) to make 1. and 2. behave the same way. Otherwise, you can really
> only change 1.

That's clearer.

>
>
> Now, let's clarify what I want regarding virtio-mem:
>
> 1. kexec should not add virtio-mem memory to the initial firmware
>    memmap. The driver has to be in charge as discussed.
> 2. kexec should not place kexec images onto virtio-mem memory. That
>    would end badly.
> 3. kexec should still dump virtio-mem memory via kdump.

Ok, but that then seems to say to me that dax/kmem is a different type of
(driver managed) than virtio-mem and it's confusing to try to apply
the same meaning. Why not just name your type for the distinct type it
is, "System RAM (virtio-mem)", and let any other driver managed memory
follow the same "System RAM ($driver)" format if it wants?


* Re: [PATCH 21/29] mm: remove the pgprot argument to __vmalloc
From: Andrew Morton @ 2020-05-01 22:09 UTC
  To: John Dorminy
  Cc: linux-hyperv, David Airlie, dri-devel, Michael Kelley, linux-mm,
	K. Y. Srinivasan, Sumit Semwal, linux-arch, linux-s390, Wei Liu,
	Stephen Hemminger, x86, Christoph Hellwig, Peter Zijlstra,
	Gao Xiang, Laura Abbott, Nitin Gupta, Daniel Vetter,
	Haiyang Zhang, linaro-mm-sig, linux-arm-kernel, Robin Murphy,
	Linux Kernel Mailing List, Minchan Kim, iommu, Sakari Ailus, bpf,
	linuxppc-dev
In-Reply-To: <CAMeeMh_9N0ORhPM8EmkGeeuiDoQY3+QoAPX5QBuK7=gsC5ONng@mail.gmail.com>

On Thu, 30 Apr 2020 22:38:10 -0400 John Dorminy <jdorminy@redhat.com> wrote:

> the change
> description refers to PROT_KERNEL, which is a symbol which does not
> appear to exist; perhaps PAGE_KERNEL was meant?

Yes, thanks, fixed.


* Re: [PATCH v8 4/7] perf/tools: Enhance JSON/metric infrastructure to handle "?"
From: Ian Rogers @ 2020-05-01 15:56 UTC
  To: Kajol Jain
  Cc: Mark Rutland, maddy, Peter Zijlstra, Jin Yao, Ingo Molnar,
	Liang, Kan, Andi Kleen, Alexander Shishkin, Anju T Sudhakar,
	mamatha4, sukadev, Ravi Bangoria, Arnaldo Carvalho de Melo,
	jmario, Namhyung Kim, Thomas Gleixner, Michael Petlan,
	Greg Kroah-Hartman, LKML, linux-perf-users, Jiri Olsa,
	linuxppc-dev
In-Reply-To: <20200401203340.31402-5-kjain@linux.ibm.com>

On Wed, Apr 1, 2020 at 1:35 PM Kajol Jain <kjain@linux.ibm.com> wrote:
>
> This patch enhances the current metric infrastructure to handle "?" in the
> metric expression. The "?" can be used for parameters whose value is not
> known when the metric events are created and which can be replaced later,
> at runtime, with the proper value. It also adds the flexibility to create
> multiple events out of a single metric event added in the JSON file.
>
> The patch adds the function 'arch_get_runtimeparam', an arch-specific
> function that returns the count of metric events that need to be created.
> By default it returns 1.

Sorry for the slow response, I was trying to understand this patch in
relation to the PMU aliases to see if there was an overlap - I'm still
not sure. This is now merged so I'm just commenting wrt possible
future cleanup. I defer to the maintainers on how this should be
organized. At the metric level, this problem reminds me of both
#smt_on and LLC_MISSES.PCIE_WRITE on Cascade Lake. #smt_on adds a
degree of CPU specific behavior to an expression.
LLC_MISSES.PCIE_WRITE uses .part0 ... part3 to combine separate but
related counters.
The symbols that the metrics parse are then passed to parse-event. You
don't change parse-event, as metricgroup replaces the '?' with a value
derived from /devices/hv_24x7/interface/sockets; actually, each value
from 0 to that value - 1 is passed.
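
A minimal sketch of that substitution, for illustration only. The helper
name and its signature are hypothetical and this is not the perf
implementation: unlike normalize() in the patch, it writes into a separate,
bounded output buffer rather than rewriting the string in place.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not from tools/perf): copy 'expr' into 'out',
 * replacing every '?' with the decimal value of 'param'.
 * Returns 0 on success, -1 if 'out' is too small.
 */
static int expand_runtime_param(const char *expr, int param,
				char *out, size_t outsz)
{
	size_t used = 0;

	assert(expr != NULL && out != NULL);

	if (outsz == 0)
		return -1;

	for (; *expr; expr++) {
		int n;

		if (*expr == '?')
			n = snprintf(out + used, outsz - used, "%d", param);
		else
			n = snprintf(out + used, outsz - used, "%c", *expr);

		/* snprintf returns the length it wanted; detect truncation. */
		if (n < 0 || (size_t)n >= outsz - used)
			return -1;
		used += (size_t)n;
	}
	out[used] = '\0';
	return 0;
}
```

Calling this once for each value from 0 to count - 1, where count is the
number read from the sockets file, would yield one expanded expression per
chip, e.g. "hv_24x7/pm_pb_cyc,chip=0/" and "hv_24x7/pm_pb_cyc,chip=1/".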

It seems unfortunate to overload the meaning of runtime with a value
read from /devices/hv_24x7/interface/sockets and plumbing this value
around is quite a bit of noise for everything but this use-case. I
kind of wish we could do something like:

for i in 0, read("/devices/hv_24x7/interface/sockets"):
  hv_24x7/pm_pb_cyc,chip=$i

in the metric code. I have some patches to send related to metric
groups and I think this will be an active area of development for a
while as I think there are some open questions on the organization of
the code.

Thanks,
Ian

> This infrastructure is needed for hv_24x7 socket/chip level events.
> "hv_24x7" chip level events need the specific chip-id for which the
> data is requested. The function 'arch_get_runtimeparam' is implemented
> in header.c and extracts the number of sockets from the sysfs file
> "sockets" under "/sys/devices/hv_24x7/interface/".
>
> With this patch we are basically trying to create as many metric events
> as defined by runtime_param.
>
> For that, a loop is added to the function 'metricgroup__add_metric',
> which creates multiple events at run time depending on the return value
> of 'arch_get_runtimeparam' and merges those events into 'group_list'.
>
> To achieve that, we pass this parameter value to the
> `expr__find_other` function and replace the "?" present in the metric
> expression with this value.
>
> In our JSON file there will be a single metric event, out of
> which we create multiple events.
>
> To understand which data count belongs to which parameter value,
> we also print the param value in the generic_metric function.
>
> For example,
> command:# ./perf stat  -M PowerBUS_Frequency -C 0 -I 1000
>      1.000101867          9,356,933      hv_24x7/pm_pb_cyc,chip=0/ #      2.3 GHz  PowerBUS_Frequency_0
>      1.000101867          9,366,134      hv_24x7/pm_pb_cyc,chip=1/ #      2.3 GHz  PowerBUS_Frequency_1
>      2.000314878          9,365,868      hv_24x7/pm_pb_cyc,chip=0/ #      2.3 GHz  PowerBUS_Frequency_0
>      2.000314878          9,366,092      hv_24x7/pm_pb_cyc,chip=1/ #      2.3 GHz  PowerBUS_Frequency_1
>
> So, here _0 and _1 after PowerBUS_Frequency specify the parameter value.
>
> Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
> ---
>  tools/perf/arch/powerpc/util/header.c |  8 ++++++++
>  tools/perf/tests/expr.c               |  8 ++++----
>  tools/perf/util/expr.c                | 11 ++++++-----
>  tools/perf/util/expr.h                |  5 +++--
>  tools/perf/util/expr.l                | 27 +++++++++++++++++++-------
>  tools/perf/util/metricgroup.c         | 28 ++++++++++++++++++++++++---
>  tools/perf/util/metricgroup.h         |  2 ++
>  tools/perf/util/stat-shadow.c         | 17 ++++++++++------
>  8 files changed, 79 insertions(+), 27 deletions(-)
>
> diff --git a/tools/perf/arch/powerpc/util/header.c b/tools/perf/arch/powerpc/util/header.c
> index 3b4cdfc5efd6..d4870074f14c 100644
> --- a/tools/perf/arch/powerpc/util/header.c
> +++ b/tools/perf/arch/powerpc/util/header.c
> @@ -7,6 +7,8 @@
>  #include <string.h>
>  #include <linux/stringify.h>
>  #include "header.h"
> +#include "metricgroup.h"
> +#include <api/fs/fs.h>
>
>  #define mfspr(rn)       ({unsigned long rval; \
>                          asm volatile("mfspr %0," __stringify(rn) \
> @@ -44,3 +46,9 @@ get_cpuid_str(struct perf_pmu *pmu __maybe_unused)
>
>         return bufp;
>  }
> +
> +int arch_get_runtimeparam(void)
> +{
> +       int count;
> +       return sysfs__read_int("/devices/hv_24x7/interface/sockets", &count) < 0 ? 1 : count;
> +}
> diff --git a/tools/perf/tests/expr.c b/tools/perf/tests/expr.c
> index ea10fc4412c4..516504cf0ea5 100644
> --- a/tools/perf/tests/expr.c
> +++ b/tools/perf/tests/expr.c
> @@ -10,7 +10,7 @@ static int test(struct expr_parse_ctx *ctx, const char *e, double val2)
>  {
>         double val;
>
> -       if (expr__parse(&val, ctx, e))
> +       if (expr__parse(&val, ctx, e, 1))
>                 TEST_ASSERT_VAL("parse test failed", 0);
>         TEST_ASSERT_VAL("unexpected value", val == val2);
>         return 0;
> @@ -44,15 +44,15 @@ int test__expr(struct test *t __maybe_unused, int subtest __maybe_unused)
>                 return ret;
>
>         p = "FOO/0";
> -       ret = expr__parse(&val, &ctx, p);
> +       ret = expr__parse(&val, &ctx, p, 1);
>         TEST_ASSERT_VAL("division by zero", ret == -1);
>
>         p = "BAR/";
> -       ret = expr__parse(&val, &ctx, p);
> +       ret = expr__parse(&val, &ctx, p, 1);
>         TEST_ASSERT_VAL("missing operand", ret == -1);
>
>         TEST_ASSERT_VAL("find other",
> -                       expr__find_other("FOO + BAR + BAZ + BOZO", "FOO", &other, &num_other) == 0);
> +                       expr__find_other("FOO + BAR + BAZ + BOZO", "FOO", &other, &num_other, 1) == 0);
>         TEST_ASSERT_VAL("find other", num_other == 3);
>         TEST_ASSERT_VAL("find other", !strcmp(other[0], "BAR"));
>         TEST_ASSERT_VAL("find other", !strcmp(other[1], "BAZ"));
> diff --git a/tools/perf/util/expr.c b/tools/perf/util/expr.c
> index c3382d58cf40..aa631e37ad1e 100644
> --- a/tools/perf/util/expr.c
> +++ b/tools/perf/util/expr.c
> @@ -27,10 +27,11 @@ void expr__ctx_init(struct expr_parse_ctx *ctx)
>
>  static int
>  __expr__parse(double *val, struct expr_parse_ctx *ctx, const char *expr,
> -             int start)
> +             int start, int runtime)
>  {
>         struct expr_scanner_ctx scanner_ctx = {
>                 .start_token = start,
> +               .runtime = runtime,
>         };
>         YY_BUFFER_STATE buffer;
>         void *scanner;
> @@ -54,9 +55,9 @@ __expr__parse(double *val, struct expr_parse_ctx *ctx, const char *expr,
>         return ret;
>  }
>
> -int expr__parse(double *final_val, struct expr_parse_ctx *ctx, const char *expr)
> +int expr__parse(double *final_val, struct expr_parse_ctx *ctx, const char *expr, int runtime)
>  {
> -       return __expr__parse(final_val, ctx, expr, EXPR_PARSE) ? -1 : 0;
> +       return __expr__parse(final_val, ctx, expr, EXPR_PARSE, runtime) ? -1 : 0;
>  }
>
>  static bool
> @@ -74,13 +75,13 @@ already_seen(const char *val, const char *one, const char **other,
>  }
>
>  int expr__find_other(const char *expr, const char *one, const char ***other,
> -                    int *num_other)
> +                    int *num_other, int runtime)
>  {
>         int err, i = 0, j = 0;
>         struct expr_parse_ctx ctx;
>
>         expr__ctx_init(&ctx);
> -       err = __expr__parse(NULL, &ctx, expr, EXPR_OTHER);
> +       err = __expr__parse(NULL, &ctx, expr, EXPR_OTHER, runtime);
>         if (err)
>                 return -1;
>
> diff --git a/tools/perf/util/expr.h b/tools/perf/util/expr.h
> index 0938ad166ece..87d627bb699b 100644
> --- a/tools/perf/util/expr.h
> +++ b/tools/perf/util/expr.h
> @@ -17,12 +17,13 @@ struct expr_parse_ctx {
>
>  struct expr_scanner_ctx {
>         int start_token;
> +       int runtime;
>  };
>
>  void expr__ctx_init(struct expr_parse_ctx *ctx);
>  void expr__add_id(struct expr_parse_ctx *ctx, const char *id, double val);
> -int expr__parse(double *final_val, struct expr_parse_ctx *ctx, const char *expr);
> +int expr__parse(double *final_val, struct expr_parse_ctx *ctx, const char *expr, int runtime);
>  int expr__find_other(const char *expr, const char *one, const char ***other,
> -               int *num_other);
> +               int *num_other, int runtime);
>
>  #endif
> diff --git a/tools/perf/util/expr.l b/tools/perf/util/expr.l
> index 2582c2464938..74b9b59b1aa5 100644
> --- a/tools/perf/util/expr.l
> +++ b/tools/perf/util/expr.l
> @@ -35,7 +35,7 @@ static int value(yyscan_t scanner, int base)
>   * Allow @ instead of / to be able to specify pmu/event/ without
>   * conflicts with normal division.
>   */
> -static char *normalize(char *str)
> +static char *normalize(char *str, int runtime)
>  {
>         char *ret = str;
>         char *dst = str;
> @@ -45,6 +45,19 @@ static char *normalize(char *str)
>                         *dst++ = '/';
>                 else if (*str == '\\')
>                         *dst++ = *++str;
> +                else if (*str == '?') {
> +                       char *paramval;
> +                       int i = 0;
> +                       int size = asprintf(&paramval, "%d", runtime);
> +
> +                       if (size < 0)
> +                               *dst++ = '0';
> +                       else {
> +                               while (i < size)
> +                                       *dst++ = paramval[i++];
> +                               free(paramval);
> +                       }
> +               }
>                 else
>                         *dst++ = *str;
>                 str++;
> @@ -54,16 +67,16 @@ static char *normalize(char *str)
>         return ret;
>  }
>
> -static int str(yyscan_t scanner, int token)
> +static int str(yyscan_t scanner, int token, int runtime)
>  {
>         YYSTYPE *yylval = expr_get_lval(scanner);
>         char *text = expr_get_text(scanner);
>
> -       yylval->str = normalize(strdup(text));
> +       yylval->str = normalize(strdup(text), runtime);
>         if (!yylval->str)
>                 return EXPR_ERROR;
>
> -       yylval->str = normalize(yylval->str);
> +       yylval->str = normalize(yylval->str, runtime);
>         return token;
>  }
>  %}
> @@ -72,8 +85,8 @@ number                [0-9]+
>
>  sch            [-,=]
>  spec           \\{sch}
> -sym            [0-9a-zA-Z_\.:@]+
> -symbol         {spec}*{sym}*{spec}*{sym}*
> +sym            [0-9a-zA-Z_\.:@?]+
> +symbol         {spec}*{sym}*{spec}*{sym}*{spec}*{sym}
>
>  %%
>         struct expr_scanner_ctx *sctx = expr_get_extra(yyscanner);
> @@ -93,7 +106,7 @@ if           { return IF; }
>  else           { return ELSE; }
>  #smt_on                { return SMT_ON; }
>  {number}       { return value(yyscanner, 10); }
> -{symbol}       { return str(yyscanner, ID); }
> +{symbol}       { return str(yyscanner, ID, sctx->runtime); }
>  "|"            { return '|'; }
>  "^"            { return '^'; }
>  "&"            { return '&'; }
> diff --git a/tools/perf/util/metricgroup.c b/tools/perf/util/metricgroup.c
> index 7ad81c8177ea..b071df373f8b 100644
> --- a/tools/perf/util/metricgroup.c
> +++ b/tools/perf/util/metricgroup.c
> @@ -90,6 +90,7 @@ struct egroup {
>         const char *metric_name;
>         const char *metric_expr;
>         const char *metric_unit;
> +       int runtime;
>  };
>
>  static struct evsel *find_evsel_group(struct evlist *perf_evlist,
> @@ -202,6 +203,7 @@ static int metricgroup__setup_events(struct list_head *groups,
>                 expr->metric_name = eg->metric_name;
>                 expr->metric_unit = eg->metric_unit;
>                 expr->metric_events = metric_events;
> +               expr->runtime = eg->runtime;
>                 list_add(&expr->nd, &me->head);
>         }
>
> @@ -485,15 +487,20 @@ static bool metricgroup__has_constraint(struct pmu_event *pe)
>         return false;
>  }
>
> +int __weak arch_get_runtimeparam(void)
> +{
> +       return 1;
> +}
> +
>  static int __metricgroup__add_metric(struct strbuf *events,
> -                       struct list_head *group_list, struct pmu_event *pe)
> +               struct list_head *group_list, struct pmu_event *pe, int runtime)
>  {
>
>         const char **ids;
>         int idnum;
>         struct egroup *eg;
>
> -       if (expr__find_other(pe->metric_expr, NULL, &ids, &idnum) < 0)
> +       if (expr__find_other(pe->metric_expr, NULL, &ids, &idnum, runtime) < 0)
>                 return -EINVAL;
>
>         if (events->len > 0)
> @@ -513,6 +520,7 @@ static int __metricgroup__add_metric(struct strbuf *events,
>         eg->metric_name = pe->metric_name;
>         eg->metric_expr = pe->metric_expr;
>         eg->metric_unit = pe->unit;
> +       eg->runtime = runtime;
>         list_add_tail(&eg->nd, group_list);
>
>         return 0;
> @@ -540,7 +548,21 @@ static int metricgroup__add_metric(const char *metric, struct strbuf *events,
>
>                         pr_debug("metric expr %s for %s\n", pe->metric_expr, pe->metric_name);
>
> -                       ret = __metricgroup__add_metric(events, group_list, pe);
> +                       if (!strstr(pe->metric_expr, "?")) {
> +                               ret = __metricgroup__add_metric(events, group_list, pe, 1);
> +                       } else {
> +                               int j, count;
> +
> +                               count = arch_get_runtimeparam();
> +
> +                               /* This loop creates multiple events,
> +                                * depending on the count value, and adds
> +                                * those events to group_list.
> +                                */
> +
> +                               for (j = 0; j < count; j++)
> +                                       ret = __metricgroup__add_metric(events, group_list, pe, j);
> +                       }
>                         if (ret == -ENOMEM)
>                                 break;
>                 }
> diff --git a/tools/perf/util/metricgroup.h b/tools/perf/util/metricgroup.h
> index 475c7f912864..6b09eb30b4ec 100644
> --- a/tools/perf/util/metricgroup.h
> +++ b/tools/perf/util/metricgroup.h
> @@ -22,6 +22,7 @@ struct metric_expr {
>         const char *metric_name;
>         const char *metric_unit;
>         struct evsel **metric_events;
> +       int runtime;
>  };
>
>  struct metric_event *metricgroup__lookup(struct rblist *metric_events,
> @@ -34,4 +35,5 @@ int metricgroup__parse_groups(const struct option *opt,
>  void metricgroup__print(bool metrics, bool groups, char *filter,
>                         bool raw, bool details);
>  bool metricgroup__has_metric(const char *metric);
> +int arch_get_runtimeparam(void);
>  #endif
> diff --git a/tools/perf/util/stat-shadow.c b/tools/perf/util/stat-shadow.c
> index 402af3e8d287..cf353ca591a5 100644
> --- a/tools/perf/util/stat-shadow.c
> +++ b/tools/perf/util/stat-shadow.c
> @@ -336,7 +336,7 @@ void perf_stat__collect_metric_expr(struct evlist *evsel_list)
>                 metric_events = counter->metric_events;
>                 if (!metric_events) {
>                         if (expr__find_other(counter->metric_expr, counter->name,
> -                                               &metric_names, &num_metric_names) < 0)
> +                                               &metric_names, &num_metric_names, 1) < 0)
>                                 continue;
>
>                         metric_events = calloc(sizeof(struct evsel *),
> @@ -723,6 +723,7 @@ static void generic_metric(struct perf_stat_config *config,
>                            char *name,
>                            const char *metric_name,
>                            const char *metric_unit,
> +                          int runtime,
>                            double avg,
>                            int cpu,
>                            struct perf_stat_output_ctx *out,
> @@ -777,7 +778,7 @@ static void generic_metric(struct perf_stat_config *config,
>         }
>
>         if (!metric_events[i]) {
> -               if (expr__parse(&ratio, &pctx, metric_expr) == 0) {
> +               if (expr__parse(&ratio, &pctx, metric_expr, runtime) == 0) {
>                         char *unit;
>                         char metric_bf[64];
>
> @@ -786,9 +787,13 @@ static void generic_metric(struct perf_stat_config *config,
>                                         &unit, &scale) >= 0) {
>                                         ratio *= scale;
>                                 }
> -
> -                               scnprintf(metric_bf, sizeof(metric_bf),
> +                               if (strstr(metric_expr, "?"))
> +                                       scnprintf(metric_bf, sizeof(metric_bf),
> +                                         "%s  %s_%d", unit, metric_name, runtime);
> +                               else
> +                                       scnprintf(metric_bf, sizeof(metric_bf),
>                                           "%s  %s", unit, metric_name);
> +
>                                 print_metric(config, ctxp, NULL, "%8.1f",
>                                              metric_bf, ratio);
>                         } else {
> @@ -1019,7 +1024,7 @@ void perf_stat__print_shadow_stats(struct perf_stat_config *config,
>                         print_metric(config, ctxp, NULL, NULL, name, 0);
>         } else if (evsel->metric_expr) {
>                 generic_metric(config, evsel->metric_expr, evsel->metric_events, evsel->name,
> -                               evsel->metric_name, NULL, avg, cpu, out, st);
> +                               evsel->metric_name, NULL, 1, avg, cpu, out, st);
>         } else if (runtime_stat_n(st, STAT_NSECS, 0, cpu) != 0) {
>                 char unit = 'M';
>                 char unit_buf[10];
> @@ -1048,7 +1053,7 @@ void perf_stat__print_shadow_stats(struct perf_stat_config *config,
>                                 out->new_line(config, ctxp);
>                         generic_metric(config, mexp->metric_expr, mexp->metric_events,
>                                         evsel->name, mexp->metric_name,
> -                                       mexp->metric_unit, avg, cpu, out, st);
> +                                       mexp->metric_unit, mexp->runtime, avg, cpu, out, st);
>                 }
>         }
>         if (num == 0)
> --
> 2.21.0
>
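
The expansion done by the quoted hunks can be sketched in isolation. This is only an illustration under stated assumptions: metric_label() and metric_instances() are hypothetical helpers, not perf functions, `count` stands in for whatever arch_get_runtimeparam() returns, and the unit prefix printed by generic_metric() is left out.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of perf): build the label for one
 * metric instance. Expressions containing '?' get a "_<runtime>"
 * suffix, mirroring the two scnprintf() calls in the quoted
 * generic_metric() hunk; other expressions keep the plain name.
 */
static int metric_label(char *buf, size_t len, const char *metric_expr,
			const char *metric_name, int runtime)
{
	if (strchr(metric_expr, '?'))
		return snprintf(buf, len, "%s_%d", metric_name, runtime);
	return snprintf(buf, len, "%s", metric_name);
}

/*
 * Hypothetical helper: number of entries one metric expands to.
 * `count` is an assumed stand-in for arch_get_runtimeparam().
 */
static int metric_instances(const char *metric_expr, int count)
{
	return strchr(metric_expr, '?') ? count : 1;
}
```

With a count of 4, an expression containing '?' would yield four instances labelled name_0 through name_3, while an expression without '?' yields a single unsuffixed entry.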

* Re: 5.7-rc interrupt_return Unrecoverable exception 380
From: Nicholas Piggin @ 2020-05-02  2:40 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: Michal Suchanek, linuxppc-dev
In-Reply-To: <alpine.LSU.2.11.2005011253250.3734@eggly.anvils>

Excerpts from Hugh Dickins's message of May 2, 2020 6:38 am:
> Hi Nick,
> 
> I've been getting an "Unrecoverable exception 380" after a few hours
> of load on the G5 (yes, that G5!) with 5.7-rc: when interrupt_return
> checks lazy_irq_pending, it crashes at check_preemption_disabled+0x24
> with CONFIG_DEBUG_PREEMPT=y.
> 
> check_preemption_disabled():
> lib/smp_processor_id.c:13
>    0:	7c 08 02 a6 	mflr    r0
>    4:	fb e1 ff f8 	std     r31,-8(r1)
>    8:	fb 61 ff d8 	std     r27,-40(r1)
>    c:	fb 81 ff e0 	std     r28,-32(r1)
>   10:	fb a1 ff e8 	std     r29,-24(r1)
>   14:	fb c1 ff f0 	std     r30,-16(r1)
> get_current():
> arch/powerpc/include/asm/current.h:20
>   18:	eb ed 01 88 	ld      r31,392(r13)
> check_preemption_disabled():
> lib/smp_processor_id.c:13
>   1c:	f8 01 00 10 	std     r0,16(r1)
>   20:	f8 21 ff 61 	stdu    r1,-160(r1)
> __read_once_size():
> include/linux/compiler.h:199
>   24:	81 3f 00 00 	lwz     r9,0(r31)
> check_preemption_disabled():
> lib/smp_processor_id.c:14
>   28:	a3 cd 00 02 	lhz     r30,2(r13)
> 
> I don't read ppc assembly, and have not jotted down the registers,
> but hope you can make sense of it. I get around it with the patch
> below (just avoiding the debug), but have no idea whether it's a
> necessary fix or a hacky workaround.

Hi Hugh,

Thanks for the report, nice catch. Your fix is actually the correct one 
(well, we probably want a __lazy_irq_pending() variant which is to be 
used in these cases).

Problem is MSR[RI] is cleared here, ready to do the last few things for 
interrupt return where we're not allowed to take any other interrupts.

SLB interrupts can happen just about anywhere aside from kernel text, 
global variables, and stack. When that hits, it appears to be 
unrecoverable due to RI=0.

We could clear just MSR[EE] for asynchronous interrupts, then check 
lazy_irq_pending(), and then clear MSR[RI] ready to return, and the
SLB miss in the debug check would be fine. But that's two mtmsr 
instructions, which is slower. So we'll skip the check.
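
The difference between the checked and unchecked accessors can be modelled in plain userspace C. This is a rough sketch, not kernel code: struct paca_model, get_paca_checked(), lazy_irq_pending_nocheck() and the two flag values are assumptions made for illustration, and the counter merely stands in for the extra memory accesses the CONFIG_DEBUG_PREEMPT check performs.

```c
#include <stdbool.h>
#include <stdint.h>

#define PACA_IRQ_HARD_DIS 0x01	/* assumed value for this model */
#define PACA_IRQ_EE       0x04	/* assumed value for this model */

/* Hypothetical stand-in for the per-CPU paca. */
struct paca_model {
	uint8_t irq_happened;
};

static struct paca_model model_paca;

/* Counts how often the modelled debug check ran. */
static int debug_checks;

/*
 * Models get_paca() with the debug check enabled: the check touches
 * extra memory, which is the access that must not fault with RI=0.
 */
static struct paca_model *get_paca_checked(void)
{
	debug_checks++;
	return &model_paca;
}

/* Regular variant: goes through the checked accessor. */
static bool lazy_irq_pending(void)
{
	return !!(get_paca_checked()->irq_happened & ~PACA_IRQ_HARD_DIS);
}

/* Variant for the interrupt-return path: same test, no debug check. */
static bool lazy_irq_pending_nocheck(void)
{
	return !!(model_paca.irq_happened & ~PACA_IRQ_HARD_DIS);
}
```

Both variants return the same answer; only the checked one pays for (and can fault in) the debug path.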

I tested hash, and preempt, possibly even preempt+hash, but clearly not 
preempt+preempt_debug+hash+slb thrashing!

Thanks,
Nick

> 
> Hugh
> 
> --- 5.7-rc3/arch/powerpc/include/asm/hw_irq.h	2020-04-12 16:24:29.802769727 -0700
> +++ linux/arch/powerpc/include/asm/hw_irq.h	2020-04-27 11:31:10.000000000 -0700
> @@ -252,7 +252,7 @@ static inline bool arch_irqs_disabled(vo
>  
>  static inline bool lazy_irq_pending(void)
>  {
> -	return !!(get_paca()->irq_happened & ~PACA_IRQ_HARD_DIS);
> +	return !!(local_paca->irq_happened & ~PACA_IRQ_HARD_DIS);
>  }
>  
>  /*
> 

* [PATCH] powerpc: Drop CONFIG_MTD_M25P80 in 5xx-hw.config
From: Bin Meng @ 2020-05-02  4:28 UTC (permalink / raw)
  To: linuxppc-dev, linux-kernel; +Cc: Bin Meng

From: Bin Meng <bin.meng@windriver.com>

Drop CONFIG_MTD_M25P80 that was removed in
commit b35b9a10362d ("mtd: spi-nor: Move m25p80 code in spi-nor.c")

Signed-off-by: Bin Meng <bin.meng@windriver.com>
---

 arch/powerpc/configs/85xx-hw.config | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/powerpc/configs/85xx-hw.config b/arch/powerpc/configs/85xx-hw.config
index b507df6..524db76 100644
--- a/arch/powerpc/configs/85xx-hw.config
+++ b/arch/powerpc/configs/85xx-hw.config
@@ -67,7 +67,6 @@ CONFIG_MTD_CFI_AMDSTD=y
 CONFIG_MTD_CFI_INTELEXT=y
 CONFIG_MTD_CFI=y
 CONFIG_MTD_CMDLINE_PARTS=y
-CONFIG_MTD_M25P80=y
 CONFIG_MTD_NAND_FSL_ELBC=y
 CONFIG_MTD_NAND_FSL_IFC=y
 CONFIG_MTD_RAW_NAND=y
-- 
2.7.4


* [PATCH v2] powerpc: Drop CONFIG_MTD_M25P80 in 85xx-hw.config
From: Bin Meng @ 2020-05-02  4:44 UTC (permalink / raw)
  To: linuxppc-dev, linux-kernel; +Cc: Bin Meng

From: Bin Meng <bin.meng@windriver.com>

Drop CONFIG_MTD_M25P80 that was removed in
commit b35b9a10362d ("mtd: spi-nor: Move m25p80 code in spi-nor.c")

Signed-off-by: Bin Meng <bin.meng@windriver.com>

---

Changes in v2:
- correct the typo (5xx => 85xx) in the commit title

 arch/powerpc/configs/85xx-hw.config | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/powerpc/configs/85xx-hw.config b/arch/powerpc/configs/85xx-hw.config
index b507df6..524db76 100644
--- a/arch/powerpc/configs/85xx-hw.config
+++ b/arch/powerpc/configs/85xx-hw.config
@@ -67,7 +67,6 @@ CONFIG_MTD_CFI_AMDSTD=y
 CONFIG_MTD_CFI_INTELEXT=y
 CONFIG_MTD_CFI=y
 CONFIG_MTD_CMDLINE_PARTS=y
-CONFIG_MTD_M25P80=y
 CONFIG_MTD_NAND_FSL_ELBC=y
 CONFIG_MTD_NAND_FSL_IFC=y
 CONFIG_MTD_RAW_NAND=y
-- 
2.7.4


* [powerpc:topic/uaccess-ppc] BUILD SUCCESS 4fe5cda9f89d0aea8e915b7c96ae34bda4e12e51
From: kbuild test robot @ 2020-05-02  8:56 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: linuxppc-dev

tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git  topic/uaccess-ppc
branch HEAD: 4fe5cda9f89d0aea8e915b7c96ae34bda4e12e51  powerpc/uaccess: Implement user_read_access_begin and user_write_access_begin

elapsed time: 533m

configs tested: 216
configs skipped: 0

The following configs have been built successfully.
More configs may be tested in the coming days.

arm64                            allyesconfig
arm                              allyesconfig
arm64                            allmodconfig
arm                              allmodconfig
arm64                             allnoconfig
arm                               allnoconfig
arm                           efm32_defconfig
arm                         at91_dt_defconfig
arm                        shmobile_defconfig
arm64                               defconfig
arm                          exynos_defconfig
arm                        multi_v5_defconfig
arm                           sunxi_defconfig
arm                        multi_v7_defconfig
arc                                 defconfig
mips                            ar7_defconfig
mips                             allmodconfig
nios2                         3c120_defconfig
sparc64                             defconfig
csky                                defconfig
sh                          rsk7269_defconfig
ia64                              allnoconfig
i386                              allnoconfig
i386                             allyesconfig
i386                             alldefconfig
i386                                defconfig
i386                              debian-10.3
ia64                             allmodconfig
ia64                                defconfig
ia64                        generic_defconfig
ia64                          tiger_defconfig
ia64                         bigsur_defconfig
ia64                             allyesconfig
ia64                             alldefconfig
m68k                       m5475evb_defconfig
m68k                             allmodconfig
m68k                       bvme6000_defconfig
m68k                           sun3_defconfig
m68k                          multi_defconfig
nios2                         10m50_defconfig
c6x                        evmc6678_defconfig
c6x                              allyesconfig
openrisc                 simple_smp_defconfig
openrisc                    or1ksim_defconfig
nds32                               defconfig
nds32                             allnoconfig
alpha                               defconfig
h8300                       h8s-sim_defconfig
h8300                     edosk2674_defconfig
xtensa                          iss_defconfig
h8300                    h8300h-sim_defconfig
xtensa                       common_defconfig
arc                              allyesconfig
microblaze                      mmu_defconfig
microblaze                    nommu_defconfig
mips                      fuloong2e_defconfig
mips                      malta_kvm_defconfig
mips                             allyesconfig
mips                         64r6el_defconfig
mips                              allnoconfig
mips                           32r2_defconfig
mips                malta_kvm_guest_defconfig
mips                         tb0287_defconfig
mips                       capcella_defconfig
mips                           ip32_defconfig
mips                  decstation_64_defconfig
mips                      loongson3_defconfig
mips                          ath79_defconfig
mips                        bcm63xx_defconfig
parisc                            allnoconfig
parisc                generic-64bit_defconfig
parisc                generic-32bit_defconfig
parisc                           allyesconfig
parisc                           allmodconfig
powerpc                      chrp32_defconfig
powerpc                             defconfig
powerpc                       holly_defconfig
powerpc                       ppc64_defconfig
powerpc                          rhel-kconfig
powerpc                           allnoconfig
powerpc                  mpc866_ads_defconfig
powerpc                    amigaone_defconfig
powerpc                    adder875_defconfig
powerpc                     ep8248e_defconfig
powerpc                          g5_defconfig
powerpc                     mpc512x_defconfig
m68k                 randconfig-a001-20200502
mips                 randconfig-a001-20200502
nds32                randconfig-a001-20200502
alpha                randconfig-a001-20200502
parisc               randconfig-a001-20200502
riscv                randconfig-a001-20200502
parisc               randconfig-a001-20200430
mips                 randconfig-a001-20200430
m68k                 randconfig-a001-20200430
riscv                randconfig-a001-20200430
alpha                randconfig-a001-20200430
nds32                randconfig-a001-20200430
h8300                randconfig-a001-20200502
nios2                randconfig-a001-20200502
microblaze           randconfig-a001-20200502
c6x                  randconfig-a001-20200502
sparc64              randconfig-a001-20200502
microblaze           randconfig-a001-20200430
nios2                randconfig-a001-20200430
h8300                randconfig-a001-20200430
c6x                  randconfig-a001-20200430
sparc64              randconfig-a001-20200430
s390                 randconfig-a001-20200502
xtensa               randconfig-a001-20200502
sh                   randconfig-a001-20200502
openrisc             randconfig-a001-20200502
csky                 randconfig-a001-20200502
i386                 randconfig-b003-20200501
x86_64               randconfig-b002-20200501
i386                 randconfig-b001-20200501
x86_64               randconfig-b003-20200501
x86_64               randconfig-b001-20200501
i386                 randconfig-b002-20200501
i386                 randconfig-b003-20200502
i386                 randconfig-b001-20200502
x86_64               randconfig-b003-20200502
x86_64               randconfig-b001-20200502
i386                 randconfig-b002-20200502
i386                 randconfig-d003-20200502
i386                 randconfig-d001-20200502
x86_64               randconfig-d002-20200502
i386                 randconfig-d002-20200502
x86_64               randconfig-d002-20200430
x86_64               randconfig-d001-20200430
i386                 randconfig-d001-20200430
i386                 randconfig-d003-20200430
i386                 randconfig-d002-20200430
x86_64               randconfig-d003-20200430
x86_64               randconfig-e003-20200502
i386                 randconfig-e003-20200502
x86_64               randconfig-e001-20200502
i386                 randconfig-e002-20200502
i386                 randconfig-e001-20200502
x86_64               randconfig-e002-20200430
i386                 randconfig-e003-20200430
x86_64               randconfig-e003-20200430
i386                 randconfig-e002-20200430
x86_64               randconfig-e001-20200430
i386                 randconfig-e001-20200430
i386                 randconfig-f003-20200501
x86_64               randconfig-f001-20200501
x86_64               randconfig-f003-20200501
i386                 randconfig-f001-20200501
i386                 randconfig-f002-20200501
i386                 randconfig-f003-20200502
x86_64               randconfig-f001-20200502
x86_64               randconfig-f003-20200502
x86_64               randconfig-f002-20200502
i386                 randconfig-f001-20200502
i386                 randconfig-f002-20200502
x86_64               randconfig-a003-20200502
x86_64               randconfig-a001-20200502
x86_64               randconfig-a002-20200502
i386                 randconfig-a002-20200502
i386                 randconfig-a003-20200502
i386                 randconfig-a001-20200502
i386                 randconfig-h001-20200502
i386                 randconfig-h002-20200502
i386                 randconfig-h003-20200502
x86_64               randconfig-h002-20200502
x86_64               randconfig-h001-20200502
x86_64               randconfig-h003-20200502
i386                 randconfig-h001-20200501
i386                 randconfig-h002-20200501
i386                 randconfig-h003-20200501
x86_64               randconfig-h001-20200501
x86_64               randconfig-h003-20200501
ia64                 randconfig-a001-20200502
arm64                randconfig-a001-20200502
arc                  randconfig-a001-20200502
powerpc              randconfig-a001-20200502
arm                  randconfig-a001-20200502
sparc                randconfig-a001-20200502
ia64                 randconfig-a001-20200501
arc                  randconfig-a001-20200501
powerpc              randconfig-a001-20200501
arm                  randconfig-a001-20200501
sparc                randconfig-a001-20200501
riscv                            allyesconfig
riscv                    nommu_virt_defconfig
riscv                             allnoconfig
riscv                               defconfig
riscv                          rv32_defconfig
riscv                            allmodconfig
s390                       zfcpdump_defconfig
s390                          debug_defconfig
s390                             allyesconfig
s390                              allnoconfig
s390                             allmodconfig
s390                             alldefconfig
s390                                defconfig
sh                               allmodconfig
sh                            titan_defconfig
sh                  sh7785lcr_32bit_defconfig
sh                                allnoconfig
sparc                            allyesconfig
sparc                               defconfig
sparc64                           allnoconfig
sparc64                          allyesconfig
sparc64                          allmodconfig
um                           x86_64_defconfig
um                             i386_defconfig
um                                  defconfig
x86_64                                   rhel
x86_64                               rhel-7.6
x86_64                    rhel-7.6-kselftests
x86_64                         rhel-7.2-clear
x86_64                                    lkp
x86_64                              fedora-25
x86_64                                  kexec

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

* Re: [PATCH v2 2/3] mm/memory_hotplug: Introduce MHP_NO_FIRMWARE_MEMMAP
From: David Hildenbrand @ 2020-05-02  9:26 UTC (permalink / raw)
  To: Dan Williams
  Cc: virtio-dev, linux-hyperv, Michal Hocko, Baoquan He, Linux ACPI,
	Wei Yang, linux-s390, linux-nvdimm, Linux Kernel Mailing List,
	Dave Hansen, virtualization, Linux MM, Michael S . Tsirkin,
	Eric W. Biederman, Pankaj Gupta, xen-devel, Andrew Morton,
	Michal Hocko, linuxppc-dev
In-Reply-To: <CAPcyv4h1nWjszkVJQgeXkUc=-nPv5=Me25BOGFQCpihUyFsD6w@mail.gmail.com>

>> Now, let's clarify what I want regarding virtio-mem:
>>
>> 1. kexec should not add virtio-mem memory to the initial firmware
>>    memmap. The driver has to be in charge as discussed.
>> 2. kexec should not place kexec images onto virtio-mem memory. That
>>    would end badly.
>> 3. kexec should still dump virtio-mem memory via kdump.
> 
> Ok, but that then seems to say to me that dax/kmem is a different type of
> (driver managed) memory than virtio-mem, and it's confusing to try to apply
> the same meaning. Why not just call your type what it is, the distinct
> type "System RAM (virtio-mem)", and let any other driver managed memory
> follow the same "System RAM ($driver)" format if it wants?

I had the same idea but discarded it because it seemed to uglify the
add_memory() interface (passing yet another parameter only relevant for
driver managed memory). Maybe we really want a new one, because I like
that idea:

/*
 * Add special, driver-managed memory to the system as system ram.
 * The resource_name is expected to have the name format "System RAM
 * ($DRIVER)", so user space (esp. kexec-tools) can special-case it.
 *
 * For this memory, no entries in /sys/firmware/memmap are created,
 * as this memory won't be part of the raw firmware-provided memory map
 * e.g., after a reboot. Also, the created memory resource is flagged
 * with IORESOURCE_MEM_DRIVER_MANAGED, so in-kernel users can special-
 * case this memory (e.g., not place kexec images onto it).
 */
int add_memory_driver_managed(int nid, u64 start, u64 size,
			      const char *resource_name);


If we ever have to special-case it even more in the kernel, we could
allow specifying further resource flags. While passing the driver name
instead of the resource_name would be an option, this way we don't have
to hand-craft new resource strings for added memory resources.
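
On the user-space side, recognizing the proposed name format could look roughly like the sketch below. It is only an illustration: system_ram_driver() is a hypothetical helper, not taken from kexec-tools, and it assumes the exact "System RAM ($DRIVER)" layout described above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: if `name` has the form "System RAM ($DRIVER)",
 * copy $DRIVER into `driver` (NUL-terminated) and return true.
 * Plain "System RAM", an empty driver name, or a name that does not
 * fit into `len` bytes all return false.
 */
static bool system_ram_driver(const char *name, char *driver, size_t len)
{
	static const char prefix[] = "System RAM (";
	const size_t plen = sizeof(prefix) - 1;
	const size_t nlen = strlen(name);
	size_t dlen;

	/* Need the prefix, at least one driver character, and ')'. */
	if (nlen < plen + 2)
		return false;
	if (strncmp(name, prefix, plen) != 0 || name[nlen - 1] != ')')
		return false;

	dlen = nlen - plen - 1;
	if (dlen >= len)
		return false;

	memcpy(driver, name + plen, dlen);
	driver[dlen] = '\0';
	return true;
}
```

A caller could then treat any resource for which this returns true as driver-managed and skip it when building the initial memmap or placing images.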

Thoughts?

-- 
Thanks,

David / dhildenb


* [powerpc:fixes-test] BUILD SUCCESS e2abb0f00606ece8b191679bbc3f9246738fb88e
From: kbuild test robot @ 2020-05-02 11:05 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: linuxppc-dev

tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git  fixes-test
branch HEAD: e2abb0f00606ece8b191679bbc3f9246738fb88e  Merge KUAP fix from topic/uaccess-ppc into fixes-test

elapsed time: 689m

configs tested: 204
configs skipped: 0

The following configs have been built successfully.
More configs may be tested in the coming days.

arm64                            allyesconfig
arm                              allyesconfig
arm64                            allmodconfig
arm                              allmodconfig
arm64                             allnoconfig
arm                               allnoconfig
arm                           efm32_defconfig
arm                         at91_dt_defconfig
arm                        shmobile_defconfig
arm64                               defconfig
arm                          exynos_defconfig
arm                        multi_v5_defconfig
arm                           sunxi_defconfig
arm                        multi_v7_defconfig
sparc                            allyesconfig
powerpc                             defconfig
ia64                                defconfig
arc                                 defconfig
mips                            ar7_defconfig
mips                          ath79_defconfig
mips                             allmodconfig
nios2                         3c120_defconfig
sparc64                             defconfig
csky                                defconfig
sh                          rsk7269_defconfig
ia64                              allnoconfig
nds32                             allnoconfig
m68k                           sun3_defconfig
i386                              allnoconfig
i386                             allyesconfig
i386                             alldefconfig
i386                                defconfig
i386                              debian-10.3
ia64                             allmodconfig
ia64                        generic_defconfig
ia64                          tiger_defconfig
ia64                         bigsur_defconfig
ia64                             allyesconfig
ia64                             alldefconfig
m68k                       m5475evb_defconfig
m68k                             allmodconfig
m68k                       bvme6000_defconfig
m68k                          multi_defconfig
nios2                         10m50_defconfig
c6x                        evmc6678_defconfig
c6x                              allyesconfig
openrisc                 simple_smp_defconfig
openrisc                    or1ksim_defconfig
nds32                               defconfig
alpha                               defconfig
h8300                       h8s-sim_defconfig
h8300                     edosk2674_defconfig
xtensa                          iss_defconfig
h8300                    h8300h-sim_defconfig
xtensa                       common_defconfig
arc                              allyesconfig
microblaze                      mmu_defconfig
microblaze                    nommu_defconfig
mips                      fuloong2e_defconfig
mips                      malta_kvm_defconfig
mips                             allyesconfig
mips                         64r6el_defconfig
mips                              allnoconfig
mips                           32r2_defconfig
mips                malta_kvm_guest_defconfig
mips                         tb0287_defconfig
mips                       capcella_defconfig
mips                           ip32_defconfig
mips                  decstation_64_defconfig
mips                      loongson3_defconfig
mips                        bcm63xx_defconfig
parisc                            allnoconfig
parisc                generic-64bit_defconfig
parisc                generic-32bit_defconfig
parisc                           allyesconfig
parisc                           allmodconfig
powerpc                      chrp32_defconfig
powerpc                       holly_defconfig
powerpc                       ppc64_defconfig
powerpc                          rhel-kconfig
powerpc                           allnoconfig
powerpc                  mpc866_ads_defconfig
powerpc                    amigaone_defconfig
powerpc                    adder875_defconfig
powerpc                     ep8248e_defconfig
powerpc                          g5_defconfig
powerpc                     mpc512x_defconfig
m68k                 randconfig-a001-20200502
mips                 randconfig-a001-20200502
nds32                randconfig-a001-20200502
alpha                randconfig-a001-20200502
parisc               randconfig-a001-20200502
riscv                randconfig-a001-20200502
h8300                randconfig-a001-20200502
nios2                randconfig-a001-20200502
microblaze           randconfig-a001-20200502
c6x                  randconfig-a001-20200502
sparc64              randconfig-a001-20200502
s390                 randconfig-a001-20200502
xtensa               randconfig-a001-20200502
sh                   randconfig-a001-20200502
openrisc             randconfig-a001-20200502
csky                 randconfig-a001-20200502
x86_64               randconfig-a003-20200502
x86_64               randconfig-a001-20200502
x86_64               randconfig-a002-20200502
i386                 randconfig-a002-20200502
i386                 randconfig-a003-20200502
i386                 randconfig-a001-20200502
i386                 randconfig-b003-20200502
i386                 randconfig-b001-20200502
x86_64               randconfig-b003-20200502
x86_64               randconfig-b001-20200502
i386                 randconfig-b002-20200502
i386                 randconfig-b003-20200501
x86_64               randconfig-b002-20200501
i386                 randconfig-b001-20200501
x86_64               randconfig-b003-20200501
x86_64               randconfig-b001-20200501
i386                 randconfig-b002-20200501
x86_64               randconfig-c002-20200502
i386                 randconfig-c002-20200502
i386                 randconfig-c001-20200502
i386                 randconfig-c003-20200502
i386                 randconfig-d003-20200502
i386                 randconfig-d001-20200502
x86_64               randconfig-d002-20200502
i386                 randconfig-d002-20200502
x86_64               randconfig-e003-20200502
i386                 randconfig-e003-20200502
x86_64               randconfig-e001-20200502
i386                 randconfig-e002-20200502
i386                 randconfig-e001-20200502
x86_64               randconfig-e002-20200430
i386                 randconfig-e003-20200430
x86_64               randconfig-e003-20200430
i386                 randconfig-e002-20200430
x86_64               randconfig-e001-20200430
i386                 randconfig-e001-20200430
i386                 randconfig-f003-20200502
x86_64               randconfig-f001-20200502
x86_64               randconfig-f003-20200502
x86_64               randconfig-f002-20200502
i386                 randconfig-f001-20200502
i386                 randconfig-f002-20200502
x86_64               randconfig-g003-20200502
i386                 randconfig-g003-20200502
i386                 randconfig-g002-20200502
x86_64               randconfig-g001-20200502
x86_64               randconfig-g002-20200502
i386                 randconfig-g001-20200502
i386                 randconfig-h001-20200502
i386                 randconfig-h002-20200502
i386                 randconfig-h003-20200502
x86_64               randconfig-h002-20200502
x86_64               randconfig-h001-20200502
x86_64               randconfig-h003-20200502
i386                 randconfig-h001-20200501
i386                 randconfig-h002-20200501
i386                 randconfig-h003-20200501
x86_64               randconfig-h001-20200501
x86_64               randconfig-h003-20200501
ia64                 randconfig-a001-20200502
arm64                randconfig-a001-20200502
arc                  randconfig-a001-20200502
powerpc              randconfig-a001-20200502
arm                  randconfig-a001-20200502
sparc                randconfig-a001-20200502
ia64                 randconfig-a001-20200501
arc                  randconfig-a001-20200501
powerpc              randconfig-a001-20200501
arm                  randconfig-a001-20200501
sparc                randconfig-a001-20200501
riscv                            allyesconfig
riscv                    nommu_virt_defconfig
riscv                             allnoconfig
riscv                               defconfig
riscv                          rv32_defconfig
riscv                            allmodconfig
s390                       zfcpdump_defconfig
s390                          debug_defconfig
s390                             allyesconfig
s390                              allnoconfig
s390                             allmodconfig
s390                             alldefconfig
s390                                defconfig
sh                               allmodconfig
sh                            titan_defconfig
sh                  sh7785lcr_32bit_defconfig
sh                                allnoconfig
sparc                               defconfig
sparc64                           allnoconfig
sparc64                          allyesconfig
sparc64                          allmodconfig
um                           x86_64_defconfig
um                             i386_defconfig
um                                  defconfig
x86_64                                   rhel
x86_64                               rhel-7.6
x86_64                    rhel-7.6-kselftests
x86_64                         rhel-7.2-clear
x86_64                                    lkp
x86_64                              fedora-25
x86_64                                  kexec

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

^ permalink raw reply

* [PATCH v2 00/12] powerpc/book3s/64/pkeys: Simplify the code
From: Aneesh Kumar K.V @ 2020-05-02 11:13 UTC (permalink / raw)
  To: linuxppc-dev, mpe; +Cc: Aneesh Kumar K.V, linuxram

This patch series updates the pkey subsystem with more documentation and
renames variables so that the code is easier to follow. The last patch
fixes a problem where we treat keys above max_pkey as available.
Userspace is not impacted, because using such a key in mprotect_pkey
returns an error due to the limit check there. Also, the UAMOR value set
by the platform is such that it denies modification of keys above max pkey.

Changes from V1:
* Rebase to the latest kernel.
* Added two new patches 6 and 12.


Aneesh Kumar K.V (12):
  powerpc/book3s64/pkeys: Fixup bit numbering
  powerpc/book3s64/pkeys: pkeys are supported only on hash on book3s.
  powerpc/book3s64/pkeys: Move pkey related bits in the linux page table
  powerpc/book3s64/pkeys: Explain key 1 reservation details
  powerpc/book3s64/pkeys: Simplify the key initialization
  powerpc/book3s64/pkeys: Prevent key 1 modification from userspace.
  powerpc/book3s64/pkeys: kill cpu feature key CPU_FTR_PKEY
  powerpc/book3s64/pkeys: Convert execute key support to static key
  powerpc/book3s64/pkeys: Simplify pkey disable branch
  powerpc/book3s64/pkeys: Convert pkey_total to max_pkey
  powerpc/book3s64/pkeys: Make initial_allocation_mask static
  powerpc/book3s64/pkeys: Mark all the pkeys above max pkey as reserved

 arch/powerpc/include/asm/book3s/64/hash-4k.h  |  21 +-
 arch/powerpc/include/asm/book3s/64/hash-64k.h |  12 +-
 .../powerpc/include/asm/book3s/64/hash-pkey.h |  32 +++
 arch/powerpc/include/asm/book3s/64/mmu-hash.h |   8 +-
 arch/powerpc/include/asm/book3s/64/pgtable.h  |  17 +-
 arch/powerpc/include/asm/book3s/64/pkeys.h    |  25 +++
 arch/powerpc/include/asm/cputable.h           |  10 +-
 arch/powerpc/include/asm/pkeys.h              |  43 +---
 arch/powerpc/kernel/dt_cpu_ftrs.c             |   6 -
 arch/powerpc/mm/book3s64/pkeys.c              | 210 ++++++++++--------
 10 files changed, 222 insertions(+), 162 deletions(-)
 create mode 100644 arch/powerpc/include/asm/book3s/64/hash-pkey.h
 create mode 100644 arch/powerpc/include/asm/book3s/64/pkeys.h

-- 
2.26.2


^ permalink raw reply

* [PATCH v2 01/12] powerpc/book3s64/pkeys: Fixup bit numbering
From: Aneesh Kumar K.V @ 2020-05-02 11:13 UTC (permalink / raw)
  To: linuxppc-dev, mpe; +Cc: Aneesh Kumar K.V, linuxram
In-Reply-To: <20200502111347.541836-1-aneesh.kumar@linux.ibm.com>

This numbers the pkey bits such that they are easy to follow: PKEY_BIT0
is the lowest-order bit. This makes further changes easier to follow.

No functional change in this patch, other than that the Linux page table
for hash translation now maps pkeys differently.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 arch/powerpc/include/asm/book3s/64/hash-4k.h  |  9 +++----
 arch/powerpc/include/asm/book3s/64/hash-64k.h |  8 +++----
 arch/powerpc/include/asm/book3s/64/mmu-hash.h |  8 +++----
 arch/powerpc/include/asm/pkeys.h              | 24 +++++++++----------
 4 files changed, 25 insertions(+), 24 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/hash-4k.h b/arch/powerpc/include/asm/book3s/64/hash-4k.h
index 3f9ae3585ab9..f889d56bf8cf 100644
--- a/arch/powerpc/include/asm/book3s/64/hash-4k.h
+++ b/arch/powerpc/include/asm/book3s/64/hash-4k.h
@@ -57,11 +57,12 @@
 #define H_PMD_FRAG_NR	(PAGE_SIZE >> H_PMD_FRAG_SIZE_SHIFT)
 
 /* memory key bits, only 8 keys supported */
-#define H_PTE_PKEY_BIT0	0
-#define H_PTE_PKEY_BIT1	0
+#define H_PTE_PKEY_BIT4	0
+#define H_PTE_PKEY_BIT3	0
 #define H_PTE_PKEY_BIT2	_RPAGE_RSV3
-#define H_PTE_PKEY_BIT3	_RPAGE_RSV4
-#define H_PTE_PKEY_BIT4	_RPAGE_RSV5
+#define H_PTE_PKEY_BIT1	_RPAGE_RSV4
+#define H_PTE_PKEY_BIT0	_RPAGE_RSV5
+
 
 /*
  * On all 4K setups, remap_4k_pfn() equates to remap_pfn_range()
diff --git a/arch/powerpc/include/asm/book3s/64/hash-64k.h b/arch/powerpc/include/asm/book3s/64/hash-64k.h
index 0729c034e56f..0a15fd14cf72 100644
--- a/arch/powerpc/include/asm/book3s/64/hash-64k.h
+++ b/arch/powerpc/include/asm/book3s/64/hash-64k.h
@@ -36,11 +36,11 @@
 #define H_PAGE_HASHPTE	_RPAGE_RPN43	/* PTE has associated HPTE */
 
 /* memory key bits. */
-#define H_PTE_PKEY_BIT0	_RPAGE_RSV1
-#define H_PTE_PKEY_BIT1	_RPAGE_RSV2
+#define H_PTE_PKEY_BIT4	_RPAGE_RSV1
+#define H_PTE_PKEY_BIT3	_RPAGE_RSV2
 #define H_PTE_PKEY_BIT2	_RPAGE_RSV3
-#define H_PTE_PKEY_BIT3	_RPAGE_RSV4
-#define H_PTE_PKEY_BIT4	_RPAGE_RSV5
+#define H_PTE_PKEY_BIT1	_RPAGE_RSV4
+#define H_PTE_PKEY_BIT0	_RPAGE_RSV5
 
 /*
  * We need to differentiate between explicit huge page and THP huge
diff --git a/arch/powerpc/include/asm/book3s/64/mmu-hash.h b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
index 3fa1b962dc27..58fcc959f9d5 100644
--- a/arch/powerpc/include/asm/book3s/64/mmu-hash.h
+++ b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
@@ -86,8 +86,8 @@
 #define HPTE_R_PP0		ASM_CONST(0x8000000000000000)
 #define HPTE_R_TS		ASM_CONST(0x4000000000000000)
 #define HPTE_R_KEY_HI		ASM_CONST(0x3000000000000000)
-#define HPTE_R_KEY_BIT0		ASM_CONST(0x2000000000000000)
-#define HPTE_R_KEY_BIT1		ASM_CONST(0x1000000000000000)
+#define HPTE_R_KEY_BIT4		ASM_CONST(0x2000000000000000)
+#define HPTE_R_KEY_BIT3		ASM_CONST(0x1000000000000000)
 #define HPTE_R_RPN_SHIFT	12
 #define HPTE_R_RPN		ASM_CONST(0x0ffffffffffff000)
 #define HPTE_R_RPN_3_0		ASM_CONST(0x01fffffffffff000)
@@ -103,8 +103,8 @@
 #define HPTE_R_R		ASM_CONST(0x0000000000000100)
 #define HPTE_R_KEY_LO		ASM_CONST(0x0000000000000e00)
 #define HPTE_R_KEY_BIT2		ASM_CONST(0x0000000000000800)
-#define HPTE_R_KEY_BIT3		ASM_CONST(0x0000000000000400)
-#define HPTE_R_KEY_BIT4		ASM_CONST(0x0000000000000200)
+#define HPTE_R_KEY_BIT1		ASM_CONST(0x0000000000000400)
+#define HPTE_R_KEY_BIT0		ASM_CONST(0x0000000000000200)
 #define HPTE_R_KEY		(HPTE_R_KEY_LO | HPTE_R_KEY_HI)
 
 #define HPTE_V_1TB_SEG		ASM_CONST(0x4000000000000000)
diff --git a/arch/powerpc/include/asm/pkeys.h b/arch/powerpc/include/asm/pkeys.h
index 20ebf153c871..f8f4d0793789 100644
--- a/arch/powerpc/include/asm/pkeys.h
+++ b/arch/powerpc/include/asm/pkeys.h
@@ -35,11 +35,11 @@ static inline u64 vmflag_to_pte_pkey_bits(u64 vm_flags)
 	if (static_branch_likely(&pkey_disabled))
 		return 0x0UL;
 
-	return (((vm_flags & VM_PKEY_BIT0) ? H_PTE_PKEY_BIT4 : 0x0UL) |
-		((vm_flags & VM_PKEY_BIT1) ? H_PTE_PKEY_BIT3 : 0x0UL) |
+	return (((vm_flags & VM_PKEY_BIT0) ? H_PTE_PKEY_BIT0 : 0x0UL) |
+		((vm_flags & VM_PKEY_BIT1) ? H_PTE_PKEY_BIT1 : 0x0UL) |
 		((vm_flags & VM_PKEY_BIT2) ? H_PTE_PKEY_BIT2 : 0x0UL) |
-		((vm_flags & VM_PKEY_BIT3) ? H_PTE_PKEY_BIT1 : 0x0UL) |
-		((vm_flags & VM_PKEY_BIT4) ? H_PTE_PKEY_BIT0 : 0x0UL));
+		((vm_flags & VM_PKEY_BIT3) ? H_PTE_PKEY_BIT3 : 0x0UL) |
+		((vm_flags & VM_PKEY_BIT4) ? H_PTE_PKEY_BIT4 : 0x0UL));
 }
 
 static inline int vma_pkey(struct vm_area_struct *vma)
@@ -53,20 +53,20 @@ static inline int vma_pkey(struct vm_area_struct *vma)
 
 static inline u64 pte_to_hpte_pkey_bits(u64 pteflags)
 {
-	return (((pteflags & H_PTE_PKEY_BIT0) ? HPTE_R_KEY_BIT0 : 0x0UL) |
-		((pteflags & H_PTE_PKEY_BIT1) ? HPTE_R_KEY_BIT1 : 0x0UL) |
-		((pteflags & H_PTE_PKEY_BIT2) ? HPTE_R_KEY_BIT2 : 0x0UL) |
+	return (((pteflags & H_PTE_PKEY_BIT4) ? HPTE_R_KEY_BIT4 : 0x0UL) |
 		((pteflags & H_PTE_PKEY_BIT3) ? HPTE_R_KEY_BIT3 : 0x0UL) |
-		((pteflags & H_PTE_PKEY_BIT4) ? HPTE_R_KEY_BIT4 : 0x0UL));
+		((pteflags & H_PTE_PKEY_BIT2) ? HPTE_R_KEY_BIT2 : 0x0UL) |
+		((pteflags & H_PTE_PKEY_BIT1) ? HPTE_R_KEY_BIT1 : 0x0UL) |
+		((pteflags & H_PTE_PKEY_BIT0) ? HPTE_R_KEY_BIT0 : 0x0UL));
 }
 
 static inline u16 pte_to_pkey_bits(u64 pteflags)
 {
-	return (((pteflags & H_PTE_PKEY_BIT0) ? 0x10 : 0x0UL) |
-		((pteflags & H_PTE_PKEY_BIT1) ? 0x8 : 0x0UL) |
+	return (((pteflags & H_PTE_PKEY_BIT4) ? 0x10 : 0x0UL) |
+		((pteflags & H_PTE_PKEY_BIT3) ? 0x8 : 0x0UL) |
 		((pteflags & H_PTE_PKEY_BIT2) ? 0x4 : 0x0UL) |
-		((pteflags & H_PTE_PKEY_BIT3) ? 0x2 : 0x0UL) |
-		((pteflags & H_PTE_PKEY_BIT4) ? 0x1 : 0x0UL));
+		((pteflags & H_PTE_PKEY_BIT1) ? 0x2 : 0x0UL) |
+		((pteflags & H_PTE_PKEY_BIT0) ? 0x1 : 0x0UL));
 }
 
 #define pkey_alloc_mask(pkey) (0x1 << pkey)
-- 
2.26.2


^ permalink raw reply related
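
The renumbering in patch 01 can be checked outside the kernel with a small
userspace sketch. It assumes the 64K hash layout and the _RPAGE_RSV* values
quoted in the diffs (as they stand before patch 03). The names RPAGE_*,
PTE_PKEY_BIT*, pte_to_pkey() and pkey_to_pte() are hypothetical stand-ins
for illustration, not kernel symbols.

```c
#include <assert.h>
#include <stdint.h>

/* 64K hash PTE reserved bits, values as quoted from pgtable.h (pre patch 03). */
#define RPAGE_RSV1 0x1000000000000000ULL
#define RPAGE_RSV2 0x0800000000000000ULL
#define RPAGE_RSV3 0x0400000000000000ULL
#define RPAGE_RSV4 0x0200000000000000ULL
#define RPAGE_RSV5 0x0000000000000040ULL

/* Numbering after patch 01: BIT0 is the lowest-order bit of the key. */
#define PTE_PKEY_BIT4 RPAGE_RSV1
#define PTE_PKEY_BIT3 RPAGE_RSV2
#define PTE_PKEY_BIT2 RPAGE_RSV3
#define PTE_PKEY_BIT1 RPAGE_RSV4
#define PTE_PKEY_BIT0 RPAGE_RSV5

/* Same shape as pte_to_pkey_bits(): BITn contributes (1 << n) to the key. */
uint16_t pte_to_pkey(uint64_t pteflags)
{
	return (uint16_t)(((pteflags & PTE_PKEY_BIT4) ? 0x10 : 0x0) |
			  ((pteflags & PTE_PKEY_BIT3) ? 0x8 : 0x0) |
			  ((pteflags & PTE_PKEY_BIT2) ? 0x4 : 0x0) |
			  ((pteflags & PTE_PKEY_BIT1) ? 0x2 : 0x0) |
			  ((pteflags & PTE_PKEY_BIT0) ? 0x1 : 0x0));
}

/* Hypothetical inverse, used only to check the round trip. */
uint64_t pkey_to_pte(uint16_t pkey)
{
	assert(pkey < 32);
	return ((pkey & 0x10) ? PTE_PKEY_BIT4 : 0x0ULL) |
	       ((pkey & 0x8) ? PTE_PKEY_BIT3 : 0x0ULL) |
	       ((pkey & 0x4) ? PTE_PKEY_BIT2 : 0x0ULL) |
	       ((pkey & 0x2) ? PTE_PKEY_BIT1 : 0x0ULL) |
	       ((pkey & 0x1) ? PTE_PKEY_BIT0 : 0x0ULL);
}
```

With this numbering, bit n of the key always maps to a macro named BITn,
which is the property the commit message refers to.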

* [PATCH v2 02/12] powerpc/book3s64/pkeys: pkeys are supported only on hash on book3s.
From: Aneesh Kumar K.V @ 2020-05-02 11:13 UTC (permalink / raw)
  To: linuxppc-dev, mpe; +Cc: Aneesh Kumar K.V, linuxram
In-Reply-To: <20200502111347.541836-1-aneesh.kumar@linux.ibm.com>

Move the pkey helpers to a hash-specific file and add BUG() for the radix path.
---
 .../powerpc/include/asm/book3s/64/hash-pkey.h | 32 ++++++++++++++++
 arch/powerpc/include/asm/book3s/64/pkeys.h    | 25 +++++++++++++
 arch/powerpc/include/asm/pkeys.h              | 37 ++++---------------
 3 files changed, 64 insertions(+), 30 deletions(-)
 create mode 100644 arch/powerpc/include/asm/book3s/64/hash-pkey.h
 create mode 100644 arch/powerpc/include/asm/book3s/64/pkeys.h

diff --git a/arch/powerpc/include/asm/book3s/64/hash-pkey.h b/arch/powerpc/include/asm/book3s/64/hash-pkey.h
new file mode 100644
index 000000000000..795010897e5d
--- /dev/null
+++ b/arch/powerpc/include/asm/book3s/64/hash-pkey.h
@@ -0,0 +1,32 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_POWERPC_BOOK3S_64_HASH_PKEY_H
+#define _ASM_POWERPC_BOOK3S_64_HASH_PKEY_H
+
+static inline u64 hash__vmflag_to_pte_pkey_bits(u64 vm_flags)
+{
+	return (((vm_flags & VM_PKEY_BIT0) ? H_PTE_PKEY_BIT0 : 0x0UL) |
+		((vm_flags & VM_PKEY_BIT1) ? H_PTE_PKEY_BIT1 : 0x0UL) |
+		((vm_flags & VM_PKEY_BIT2) ? H_PTE_PKEY_BIT2 : 0x0UL) |
+		((vm_flags & VM_PKEY_BIT3) ? H_PTE_PKEY_BIT3 : 0x0UL) |
+		((vm_flags & VM_PKEY_BIT4) ? H_PTE_PKEY_BIT4 : 0x0UL));
+}
+
+static inline u64 pte_to_hpte_pkey_bits(u64 pteflags)
+{
+	return (((pteflags & H_PTE_PKEY_BIT4) ? HPTE_R_KEY_BIT4 : 0x0UL) |
+		((pteflags & H_PTE_PKEY_BIT3) ? HPTE_R_KEY_BIT3 : 0x0UL) |
+		((pteflags & H_PTE_PKEY_BIT2) ? HPTE_R_KEY_BIT2 : 0x0UL) |
+		((pteflags & H_PTE_PKEY_BIT1) ? HPTE_R_KEY_BIT1 : 0x0UL) |
+		((pteflags & H_PTE_PKEY_BIT0) ? HPTE_R_KEY_BIT0 : 0x0UL));
+}
+
+static inline u16 hash__pte_to_pkey_bits(u64 pteflags)
+{
+	return (((pteflags & H_PTE_PKEY_BIT4) ? 0x10 : 0x0UL) |
+		((pteflags & H_PTE_PKEY_BIT3) ? 0x8 : 0x0UL) |
+		((pteflags & H_PTE_PKEY_BIT2) ? 0x4 : 0x0UL) |
+		((pteflags & H_PTE_PKEY_BIT1) ? 0x2 : 0x0UL) |
+		((pteflags & H_PTE_PKEY_BIT0) ? 0x1 : 0x0UL));
+}
+
+#endif
diff --git a/arch/powerpc/include/asm/book3s/64/pkeys.h b/arch/powerpc/include/asm/book3s/64/pkeys.h
new file mode 100644
index 000000000000..8174662a9173
--- /dev/null
+++ b/arch/powerpc/include/asm/book3s/64/pkeys.h
@@ -0,0 +1,25 @@
+/* SPDX-License-Identifier: GPL-2.0+ */
+
+#ifndef _ASM_POWERPC_BOOK3S_64_PKEYS_H
+#define _ASM_POWERPC_BOOK3S_64_PKEYS_H
+
+#include <asm/book3s/64/hash-pkey.h>
+
+static inline u64 vmflag_to_pte_pkey_bits(u64 vm_flags)
+{
+	if (static_branch_likely(&pkey_disabled))
+		return 0x0UL;
+
+	if (radix_enabled())
+		BUG();
+	return hash__vmflag_to_pte_pkey_bits(vm_flags);
+}
+
+static inline u16 pte_to_pkey_bits(u64 pteflags)
+{
+	if (radix_enabled())
+		BUG();
+	return hash__pte_to_pkey_bits(pteflags);
+}
+
+#endif /*_ASM_POWERPC_KEYS_H */
diff --git a/arch/powerpc/include/asm/pkeys.h b/arch/powerpc/include/asm/pkeys.h
index f8f4d0793789..5dd0a79d1809 100644
--- a/arch/powerpc/include/asm/pkeys.h
+++ b/arch/powerpc/include/asm/pkeys.h
@@ -25,23 +25,18 @@ extern u32 reserved_allocation_mask; /* bits set for reserved keys */
 				PKEY_DISABLE_WRITE  | \
 				PKEY_DISABLE_EXECUTE)
 
+#ifdef CONFIG_PPC_BOOK3S_64
+#include <asm/book3s/64/pkeys.h>
+#else
+#error "Not supported"
+#endif
+
+
 static inline u64 pkey_to_vmflag_bits(u16 pkey)
 {
 	return (((u64)pkey << VM_PKEY_SHIFT) & ARCH_VM_PKEY_FLAGS);
 }
 
-static inline u64 vmflag_to_pte_pkey_bits(u64 vm_flags)
-{
-	if (static_branch_likely(&pkey_disabled))
-		return 0x0UL;
-
-	return (((vm_flags & VM_PKEY_BIT0) ? H_PTE_PKEY_BIT0 : 0x0UL) |
-		((vm_flags & VM_PKEY_BIT1) ? H_PTE_PKEY_BIT1 : 0x0UL) |
-		((vm_flags & VM_PKEY_BIT2) ? H_PTE_PKEY_BIT2 : 0x0UL) |
-		((vm_flags & VM_PKEY_BIT3) ? H_PTE_PKEY_BIT3 : 0x0UL) |
-		((vm_flags & VM_PKEY_BIT4) ? H_PTE_PKEY_BIT4 : 0x0UL));
-}
-
 static inline int vma_pkey(struct vm_area_struct *vma)
 {
 	if (static_branch_likely(&pkey_disabled))
@@ -51,24 +46,6 @@ static inline int vma_pkey(struct vm_area_struct *vma)
 
 #define arch_max_pkey() pkeys_total
 
-static inline u64 pte_to_hpte_pkey_bits(u64 pteflags)
-{
-	return (((pteflags & H_PTE_PKEY_BIT4) ? HPTE_R_KEY_BIT4 : 0x0UL) |
-		((pteflags & H_PTE_PKEY_BIT3) ? HPTE_R_KEY_BIT3 : 0x0UL) |
-		((pteflags & H_PTE_PKEY_BIT2) ? HPTE_R_KEY_BIT2 : 0x0UL) |
-		((pteflags & H_PTE_PKEY_BIT1) ? HPTE_R_KEY_BIT1 : 0x0UL) |
-		((pteflags & H_PTE_PKEY_BIT0) ? HPTE_R_KEY_BIT0 : 0x0UL));
-}
-
-static inline u16 pte_to_pkey_bits(u64 pteflags)
-{
-	return (((pteflags & H_PTE_PKEY_BIT4) ? 0x10 : 0x0UL) |
-		((pteflags & H_PTE_PKEY_BIT3) ? 0x8 : 0x0UL) |
-		((pteflags & H_PTE_PKEY_BIT2) ? 0x4 : 0x0UL) |
-		((pteflags & H_PTE_PKEY_BIT1) ? 0x2 : 0x0UL) |
-		((pteflags & H_PTE_PKEY_BIT0) ? 0x1 : 0x0UL));
-}
-
 #define pkey_alloc_mask(pkey) (0x1 << pkey)
 
 #define mm_pkey_allocation_map(mm) (mm->context.pkey_allocation_map)
-- 
2.26.2


^ permalink raw reply related

* [PATCH v2 03/12] powerpc/book3s64/pkeys: Move pkey related bits in the linux page table
From: Aneesh Kumar K.V @ 2020-05-02 11:13 UTC (permalink / raw)
  To: linuxppc-dev, mpe; +Cc: Aneesh Kumar K.V, linuxram
In-Reply-To: <20200502111347.541836-1-aneesh.kumar@linux.ibm.com>

To keep things simple, all the pkey related bits are kept together
in the Linux page table for the 64K config with hash translation. With
hash-4k, the kernel requires 4 bits to store slot details. This is done
by overloading some of the RPN bits. Because of this, PKEY_BIT0 on
the 4K config is used for storing hash slot details.

64K before

|....|RSV1| RSV2| RSV3 | RSV4 | RPN44| RPN43   |.... | RSV5|
|....| P4 |  P3 |  P2  |  P1  | Busy | HASHPTE |.... |  P0 |

after

|....|RSV1| RSV2| RSV3 | RSV4 | RPN44 | RPN43   |.... | RSV5 |
|....| P4 |  P3 |  P2  |  P1  | P0    | HASHPTE |.... | Busy |

4k before

|....| RSV1 | RSV2     | RSV3 | RSV4 | RPN44| RPN43.... | RSV5|
|....| Busy |  HASHPTE |  P2  |  P1  | F_SEC| F_GIX.... |  P0 |

after

|....| RSV1    | RSV2| RSV3 | RSV4 | Free | RPN43.... | RSV5 |
|....| HASHPTE |  P2 |  P1  |  P0  | F_SEC| F_GIX.... | BUSY |

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 arch/powerpc/include/asm/book3s/64/hash-4k.h  | 16 ++++++++--------
 arch/powerpc/include/asm/book3s/64/hash-64k.h | 12 ++++++------
 arch/powerpc/include/asm/book3s/64/pgtable.h  | 17 ++++++++---------
 3 files changed, 22 insertions(+), 23 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/hash-4k.h b/arch/powerpc/include/asm/book3s/64/hash-4k.h
index f889d56bf8cf..082b98808701 100644
--- a/arch/powerpc/include/asm/book3s/64/hash-4k.h
+++ b/arch/powerpc/include/asm/book3s/64/hash-4k.h
@@ -34,11 +34,11 @@
 #define H_PUD_TABLE_SIZE	(sizeof(pud_t) << H_PUD_INDEX_SIZE)
 #define H_PGD_TABLE_SIZE	(sizeof(pgd_t) << H_PGD_INDEX_SIZE)
 
-#define H_PAGE_F_GIX_SHIFT	53
-#define H_PAGE_F_SECOND	_RPAGE_RPN44	/* HPTE is in 2ndary HPTEG */
-#define H_PAGE_F_GIX	(_RPAGE_RPN43 | _RPAGE_RPN42 | _RPAGE_RPN41)
-#define H_PAGE_BUSY	_RPAGE_RSV1     /* software: PTE & hash are busy */
-#define H_PAGE_HASHPTE	_RPAGE_RSV2     /* software: PTE & hash are busy */
+#define H_PAGE_F_GIX_SHIFT	_PAGE_PA_MAX
+#define H_PAGE_F_SECOND		_RPAGE_PKEY_BIT0 /* HPTE is in 2ndary HPTEG */
+#define H_PAGE_F_GIX		(_RPAGE_RPN43 | _RPAGE_RPN42 | _RPAGE_RPN41)
+#define H_PAGE_BUSY		_RPAGE_RSV1
+#define H_PAGE_HASHPTE		_RPAGE_PKEY_BIT4
 
 /* PTE flags to conserve for HPTE identification */
 #define _PAGE_HPTEFLAGS (H_PAGE_BUSY | H_PAGE_HASHPTE | \
@@ -59,9 +59,9 @@
 /* memory key bits, only 8 keys supported */
 #define H_PTE_PKEY_BIT4	0
 #define H_PTE_PKEY_BIT3	0
-#define H_PTE_PKEY_BIT2	_RPAGE_RSV3
-#define H_PTE_PKEY_BIT1	_RPAGE_RSV4
-#define H_PTE_PKEY_BIT0	_RPAGE_RSV5
+#define H_PTE_PKEY_BIT2	_RPAGE_PKEY_BIT3
+#define H_PTE_PKEY_BIT1	_RPAGE_PKEY_BIT2
+#define H_PTE_PKEY_BIT0	_RPAGE_PKEY_BIT1
 
 
 /*
diff --git a/arch/powerpc/include/asm/book3s/64/hash-64k.h b/arch/powerpc/include/asm/book3s/64/hash-64k.h
index 0a15fd14cf72..f20de1149ebe 100644
--- a/arch/powerpc/include/asm/book3s/64/hash-64k.h
+++ b/arch/powerpc/include/asm/book3s/64/hash-64k.h
@@ -32,15 +32,15 @@
  */
 #define H_PAGE_COMBO	_RPAGE_RPN0 /* this is a combo 4k page */
 #define H_PAGE_4K_PFN	_RPAGE_RPN1 /* PFN is for a single 4k page */
-#define H_PAGE_BUSY	_RPAGE_RPN44     /* software: PTE & hash are busy */
+#define H_PAGE_BUSY	_RPAGE_RSV1     /* software: PTE & hash are busy */
 #define H_PAGE_HASHPTE	_RPAGE_RPN43	/* PTE has associated HPTE */
 
 /* memory key bits. */
-#define H_PTE_PKEY_BIT4	_RPAGE_RSV1
-#define H_PTE_PKEY_BIT3	_RPAGE_RSV2
-#define H_PTE_PKEY_BIT2	_RPAGE_RSV3
-#define H_PTE_PKEY_BIT1	_RPAGE_RSV4
-#define H_PTE_PKEY_BIT0	_RPAGE_RSV5
+#define H_PTE_PKEY_BIT4		_RPAGE_PKEY_BIT4
+#define H_PTE_PKEY_BIT3		_RPAGE_PKEY_BIT3
+#define H_PTE_PKEY_BIT2		_RPAGE_PKEY_BIT2
+#define H_PTE_PKEY_BIT1		_RPAGE_PKEY_BIT1
+#define H_PTE_PKEY_BIT0		_RPAGE_PKEY_BIT0
 
 /*
  * We need to differentiate between explicit huge page and THP huge
diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
index 368b136517e0..e31369707f9f 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -32,11 +32,13 @@
 #define _RPAGE_SW1		0x00800
 #define _RPAGE_SW2		0x00400
 #define _RPAGE_SW3		0x00200
-#define _RPAGE_RSV1		0x1000000000000000UL
-#define _RPAGE_RSV2		0x0800000000000000UL
-#define _RPAGE_RSV3		0x0400000000000000UL
-#define _RPAGE_RSV4		0x0200000000000000UL
-#define _RPAGE_RSV5		0x00040UL
+#define _RPAGE_RSV1		0x00040UL
+
+#define _RPAGE_PKEY_BIT4	0x1000000000000000UL
+#define _RPAGE_PKEY_BIT3	0x0800000000000000UL
+#define _RPAGE_PKEY_BIT2	0x0400000000000000UL
+#define _RPAGE_PKEY_BIT1	0x0200000000000000UL
+#define _RPAGE_PKEY_BIT0	0x0100000000000000UL
 
 #define _PAGE_PTE		0x4000000000000000UL	/* distinguishes PTEs from pointers */
 #define _PAGE_PRESENT		0x8000000000000000UL	/* pte contains a translation */
@@ -58,13 +60,12 @@
  */
 #define _RPAGE_RPN0		0x01000
 #define _RPAGE_RPN1		0x02000
-#define _RPAGE_RPN44		0x0100000000000000UL
 #define _RPAGE_RPN43		0x0080000000000000UL
 #define _RPAGE_RPN42		0x0040000000000000UL
 #define _RPAGE_RPN41		0x0020000000000000UL
 
 /* Max physical address bit as per radix table */
-#define _RPAGE_PA_MAX		57
+#define _RPAGE_PA_MAX		56
 
 /*
  * Max physical address bit we will use for now.
@@ -125,8 +126,6 @@
 			 _PAGE_ACCESSED | _PAGE_SPECIAL | _PAGE_PTE |	\
 			 _PAGE_SOFT_DIRTY | _PAGE_DEVMAP)
 
-#define H_PTE_PKEY  (H_PTE_PKEY_BIT0 | H_PTE_PKEY_BIT1 | H_PTE_PKEY_BIT2 | \
-		     H_PTE_PKEY_BIT3 | H_PTE_PKEY_BIT4)
 /*
  * We define 2 sets of base prot bits, one for basic pages (ie,
  * cacheable kernel and user pages) and one for non cacheable
-- 
2.26.2


^ permalink raw reply related
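
One consequence of patch 03 on the 64K config is that the five key bits
become contiguous (bits 56 to 60 of the PTE, going by the _RPAGE_PKEY_BIT*
values in the diff). The sketch below is an illustration only, under that
assumption: it compares the bit-by-bit extraction the series keeps with a
plain shift and mask. The patch itself does not make this change, and the
names pkey_bitwise(), pkey_by_shift() and PKEY_SHIFT_64K are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* 64K hash values of _RPAGE_PKEY_BIT0.._RPAGE_PKEY_BIT4 after patch 03. */
#define RPAGE_PKEY_BIT4 0x1000000000000000ULL
#define RPAGE_PKEY_BIT3 0x0800000000000000ULL
#define RPAGE_PKEY_BIT2 0x0400000000000000ULL
#define RPAGE_PKEY_BIT1 0x0200000000000000ULL
#define RPAGE_PKEY_BIT0 0x0100000000000000ULL

/* Position of RPAGE_PKEY_BIT0 (hypothetical helper constant). */
#define PKEY_SHIFT_64K 56

/* Bit-by-bit extraction, same shape as hash__pte_to_pkey_bits(). */
uint16_t pkey_bitwise(uint64_t pteflags)
{
	return (uint16_t)(((pteflags & RPAGE_PKEY_BIT4) ? 0x10 : 0x0) |
			  ((pteflags & RPAGE_PKEY_BIT3) ? 0x8 : 0x0) |
			  ((pteflags & RPAGE_PKEY_BIT2) ? 0x4 : 0x0) |
			  ((pteflags & RPAGE_PKEY_BIT1) ? 0x2 : 0x0) |
			  ((pteflags & RPAGE_PKEY_BIT0) ? 0x1 : 0x0));
}

/* Equivalent extraction that relies on the bits being contiguous. */
uint16_t pkey_by_shift(uint64_t pteflags)
{
	return (uint16_t)((pteflags >> PKEY_SHIFT_64K) & 0x1f);
}

/* Check that both forms agree for every key, with unrelated bits set. */
void check_equivalence(void)
{
	const uint64_t noise = 0x8000000000000040ULL; /* non-pkey bits */
	uint64_t k;

	for (k = 0; k < 32; k++) {
		uint64_t pte = (k << PKEY_SHIFT_64K) | noise;

		assert(pkey_bitwise(pte) == pkey_by_shift(pte));
	}
}
```

This does not carry over to the 4K config, where only three key bits are
stored and they sit at different positions.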

* [PATCH v2 04/12] powerpc/book3s64/pkeys: Explain key 1 reservation details
From: Aneesh Kumar K.V @ 2020-05-02 11:13 UTC (permalink / raw)
  To: linuxppc-dev, mpe; +Cc: Aneesh Kumar K.V, linuxram
In-Reply-To: <20200502111347.541836-1-aneesh.kumar@linux.ibm.com>

Document why key 1 is reserved: the ISA recommends that it not be used.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 arch/powerpc/mm/book3s64/pkeys.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/mm/book3s64/pkeys.c b/arch/powerpc/mm/book3s64/pkeys.c
index 1199fc2bfaec..d60e6bfa3e03 100644
--- a/arch/powerpc/mm/book3s64/pkeys.c
+++ b/arch/powerpc/mm/book3s64/pkeys.c
@@ -124,7 +124,10 @@ static int pkey_initialize(void)
 #else
 	os_reserved = 0;
 #endif
-	/* Bits are in LE format. */
+	/*
+	 * key 1 is recommended not to be used. PowerISA(3.0) page 1015,
+	 * programming note.
+	 */
 	reserved_allocation_mask = (0x1 << 1) | (0x1 << execute_only_key);
 
 	/* register mask is in BE format */
-- 
2.26.2


^ permalink raw reply related

* [PATCH v2 05/12] powerpc/book3s64/pkeys: Simplify the key initialization
From: Aneesh Kumar K.V @ 2020-05-02 11:13 UTC (permalink / raw)
  To: linuxppc-dev, mpe; +Cc: Aneesh Kumar K.V, linuxram
In-Reply-To: <20200502111347.541836-1-aneesh.kumar@linux.ibm.com>

Add documentation explaining the execute_only_key. The reservation and initialization mask
details are also explained in this patch.

No functional change in this patch.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 arch/powerpc/mm/book3s64/pkeys.c | 186 ++++++++++++++++++-------------
 1 file changed, 107 insertions(+), 79 deletions(-)

diff --git a/arch/powerpc/mm/book3s64/pkeys.c b/arch/powerpc/mm/book3s64/pkeys.c
index d60e6bfa3e03..3db0b3cfc322 100644
--- a/arch/powerpc/mm/book3s64/pkeys.c
+++ b/arch/powerpc/mm/book3s64/pkeys.c
@@ -15,48 +15,71 @@
 DEFINE_STATIC_KEY_TRUE(pkey_disabled);
 int  pkeys_total;		/* Total pkeys as per device tree */
 u32  initial_allocation_mask;   /* Bits set for the initially allocated keys */
-u32  reserved_allocation_mask;  /* Bits set for reserved keys */
+/*
+ *  Keys marked in the reservation list cannot be allocated by  userspace
+ */
+u32  reserved_allocation_mask;
 static bool pkey_execute_disable_supported;
-static bool pkeys_devtree_defined;	/* property exported by device tree */
-static u64 pkey_amr_mask;		/* Bits in AMR not to be touched */
-static u64 pkey_iamr_mask;		/* Bits in AMR not to be touched */
-static u64 pkey_uamor_mask;		/* Bits in UMOR not to be touched */
+static u64 default_amr;
+static u64 default_iamr;
+/* Allow all keys to be modified by default */
+static u64 default_uamor = ~0x0UL;
+/*
+ * Key used to implement PROT_EXEC mmap. Denies READ/WRITE
+ * We pick key 2 because 0 is special key and 1 is reserved as per ISA.
+ */
 static int execute_only_key = 2;
 
+
 #define AMR_BITS_PER_PKEY 2
 #define AMR_RD_BIT 0x1UL
 #define AMR_WR_BIT 0x2UL
 #define IAMR_EX_BIT 0x1UL
-#define PKEY_REG_BITS (sizeof(u64)*8)
+#define PKEY_REG_BITS (sizeof(u64) * 8)
 #define pkeyshift(pkey) (PKEY_REG_BITS - ((pkey+1) * AMR_BITS_PER_PKEY))
 
-static void scan_pkey_feature(void)
+static int scan_pkey_feature(void)
 {
 	u32 vals[2];
+	int pkeys_total = 0;
 	struct device_node *cpu;
 
+	/*
+	 * Pkey is not supported with Radix translation.
+	 */
+	if (radix_enabled())
+		return 0;
+
 	cpu = of_find_node_by_type(NULL, "cpu");
 	if (!cpu)
-		return;
+		return 0;
 
 	if (of_property_read_u32_array(cpu,
-			"ibm,processor-storage-keys", vals, 2))
-		return;
+				       "ibm,processor-storage-keys", vals, 2) == 0) {
+		/*
+		 * Since any pkey can be used for data or execute, we will
+		 * just treat all keys as equal and track them as one entity.
+		 */
+		pkeys_total = vals[0];
+		/*  Should we check for IAMR support FIXME!! */
+	} else {
+		/*
+		 * Let's assume 32 pkeys on P8 bare metal, if its not defined by device
+		 * tree. We make this exception since skiboot forgot to expose this
+		 * property on power8.
+		 */
+		if (!firmware_has_feature(FW_FEATURE_LPAR) &&
+		    cpu_has_feature(CPU_FTRS_POWER8))
+			pkeys_total = 32;
+	}
 
 	/*
-	 * Since any pkey can be used for data or execute, we will just treat
-	 * all keys as equal and track them as one entity.
+	 * Adjust the upper limit, based on the number of bits supported by
+	 * arch-neutral code.
 	 */
-	pkeys_total = vals[0];
-	pkeys_devtree_defined = true;
-}
-
-static inline bool pkey_mmu_enabled(void)
-{
-	if (firmware_has_feature(FW_FEATURE_LPAR))
-		return pkeys_total;
-	else
-		return cpu_has_feature(CPU_FTR_PKEY);
+	pkeys_total = min_t(int, pkeys_total,
+			    ((ARCH_VM_PKEY_FLAGS >> VM_PKEY_SHIFT) + 1));
+	return pkeys_total;
 }
 
 static int pkey_initialize(void)
@@ -80,31 +103,13 @@ static int pkey_initialize(void)
 				!= (sizeof(u64) * BITS_PER_BYTE));
 
 	/* scan the device tree for pkey feature */
-	scan_pkey_feature();
-
-	/*
-	 * Let's assume 32 pkeys on P8 bare metal, if its not defined by device
-	 * tree. We make this exception since skiboot forgot to expose this
-	 * property on power8.
-	 */
-	if (!pkeys_devtree_defined && !firmware_has_feature(FW_FEATURE_LPAR) &&
-			cpu_has_feature(CPU_FTRS_POWER8))
-		pkeys_total = 32;
-
-	/*
-	 * Adjust the upper limit, based on the number of bits supported by
-	 * arch-neutral code.
-	 */
-	pkeys_total = min_t(int, pkeys_total,
-			((ARCH_VM_PKEY_FLAGS >> VM_PKEY_SHIFT)+1));
-
-	if (!pkey_mmu_enabled() || radix_enabled() || !pkeys_total)
-		static_branch_enable(&pkey_disabled);
-	else
+	pkeys_total = scan_pkey_feature();
+	if (pkeys_total)
 		static_branch_disable(&pkey_disabled);
-
-	if (static_branch_likely(&pkey_disabled))
+	else {
+		static_branch_enable(&pkey_disabled);
 		return 0;
+	}
 
 	/*
 	 * The device tree cannot be relied to indicate support for
@@ -118,48 +123,71 @@ static int pkey_initialize(void)
 #ifdef CONFIG_PPC_4K_PAGES
 	/*
 	 * The OS can manage only 8 pkeys due to its inability to represent them
-	 * in the Linux 4K PTE.
+	 * in the Linux 4K PTE. Mark all other keys reserved.
 	 */
 	os_reserved = pkeys_total - 8;
 #else
 	os_reserved = 0;
 #endif
-	/*
-	 * key 1 is recommended not to be used. PowerISA(3.0) page 1015,
-	 * programming note.
-	 */
-	reserved_allocation_mask = (0x1 << 1) | (0x1 << execute_only_key);
-
-	/* register mask is in BE format */
-	pkey_amr_mask = ~0x0ul;
-	pkey_amr_mask &= ~(0x3ul << pkeyshift(0));
-
-	pkey_iamr_mask = ~0x0ul;
-	pkey_iamr_mask &= ~(0x3ul << pkeyshift(0));
-	pkey_iamr_mask &= ~(0x3ul << pkeyshift(execute_only_key));
-
-	pkey_uamor_mask = ~0x0ul;
-	pkey_uamor_mask &= ~(0x3ul << pkeyshift(0));
-	pkey_uamor_mask &= ~(0x3ul << pkeyshift(execute_only_key));
-
-	/* mark the rest of the keys as reserved and hence unavailable */
-	for (i = (pkeys_total - os_reserved); i < pkeys_total; i++) {
-		reserved_allocation_mask |= (0x1 << i);
-		pkey_uamor_mask &= ~(0x3ul << pkeyshift(i));
-	}
-	initial_allocation_mask = reserved_allocation_mask | (0x1 << 0);
 
 	if (unlikely((pkeys_total - os_reserved) <= execute_only_key)) {
 		/*
 		 * Insufficient number of keys to support
 		 * execute only key. Mark it unavailable.
-		 * Any AMR, UAMOR, IAMR bit set for
-		 * this key is irrelevant since this key
-		 * can never be allocated.
 		 */
 		execute_only_key = -1;
+	} else {
+		/*
+		 * Mark the execute_only_pkey as not available for
+		 * user allocation via pkey_alloc.
+		 */
+		reserved_allocation_mask |= (0x1 << execute_only_key);
+
+		/*
+		 * Deny READ/WRITE for execute_only_key.
+		 * Allow execute in IAMR.
+		 */
+		default_amr  |= (0x3ul << pkeyshift(execute_only_key));
+		default_iamr &= ~(0x3ul << pkeyshift(execute_only_key));
+
+		/*
+		 * Clear the uamor bits for this key.
+		 */
+		default_uamor &= ~(0x3ul << pkeyshift(execute_only_key));
 	}
 
+	/*
+	 * Allow access for only key 0. And prevent any other modification.
+	 */
+	default_amr   &= ~(0x3ul << pkeyshift(0));
+	default_iamr  &= ~(0x3ul << pkeyshift(0));
+	default_uamor &= ~(0x3ul << pkeyshift(0));
+	/*
+	 * key 0 is special in that we want to consider it an allocated
+	 * key which is preallocated. We don't allow changing AMR bits
+	 * w.r.t key 0. But one can pkey_free(key0)
+	 */
+	initial_allocation_mask |= (0x1 << 0);
+
+	/*
+	 * key 1 is recommended not to be used. PowerISA(3.0) page 1015,
+	 * programming note.
+	 */
+	reserved_allocation_mask |= (0x1 << 1);
+
+	/*
+	 * Prevent the usage of OS reserved the keys. Update UAMOR
+	 * for those keys.
+	 */
+	for (i = (pkeys_total - os_reserved); i < pkeys_total; i++) {
+		reserved_allocation_mask |= (0x1 << i);
+		default_uamor &= ~(0x3ul << pkeyshift(i));
+	}
+	/*
+	 * Prevent the allocation of reserved keys too.
+	 */
+	initial_allocation_mask |= reserved_allocation_mask;
+
 	return 0;
 }
 
@@ -301,13 +329,13 @@ void thread_pkey_regs_init(struct thread_struct *thread)
 	if (static_branch_likely(&pkey_disabled))
 		return;
 
-	thread->amr = pkey_amr_mask;
-	thread->iamr = pkey_iamr_mask;
-	thread->uamor = pkey_uamor_mask;
+	thread->amr   = default_amr;
+	thread->iamr  = default_iamr;
+	thread->uamor = default_uamor;
 
-	write_uamor(pkey_uamor_mask);
-	write_amr(pkey_amr_mask);
-	write_iamr(pkey_iamr_mask);
+	write_amr(default_amr);
+	write_iamr(default_iamr);
+	write_uamor(default_uamor);
 }
 
 int __execute_only_pkey(struct mm_struct *mm)
-- 
2.26.2


^ permalink raw reply related
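
The mask arithmetic in the reworked pkey_initialize() from patch 05 can be
reproduced in userspace to see the resulting register defaults. This is a
sketch under stated assumptions: it mirrors the order of operations in the
diff, takes pkeys_total and os_reserved as parameters instead of reading
the device tree, and uses a hypothetical struct pkey_defaults and function
compute_pkey_defaults() that do not exist in the kernel.

```c
#include <assert.h>
#include <stdint.h>

#define AMR_BITS_PER_PKEY 2
#define PKEY_REG_BITS (sizeof(uint64_t) * 8)
/* Key 0 occupies the two most significant bits of the register. */
#define pkeyshift(pkey) (PKEY_REG_BITS - (((pkey) + 1) * AMR_BITS_PER_PKEY))

struct pkey_defaults {
	uint64_t amr, iamr, uamor;
	uint32_t reserved_mask, initial_mask;
	int execute_only_key;
};

struct pkey_defaults compute_pkey_defaults(int pkeys_total, int os_reserved)
{
	struct pkey_defaults d = {
		.amr = 0, .iamr = 0,
		.uamor = ~0x0ULL,	/* allow all keys to be modified by default */
		.reserved_mask = 0, .initial_mask = 0,
		.execute_only_key = 2,
	};
	int i;

	assert(pkeys_total > 0 && pkeys_total <= 32);
	assert(os_reserved >= 0 && os_reserved <= pkeys_total);

	if (pkeys_total - os_reserved <= d.execute_only_key) {
		/* Not enough keys to support an execute-only key. */
		d.execute_only_key = -1;
	} else {
		/* Reserve the key, deny read/write, allow execute, lock UAMOR. */
		d.reserved_mask |= 0x1u << d.execute_only_key;
		d.amr |= 0x3ULL << pkeyshift(d.execute_only_key);
		d.iamr &= ~(0x3ULL << pkeyshift(d.execute_only_key));
		d.uamor &= ~(0x3ULL << pkeyshift(d.execute_only_key));
	}

	/* Key 0: full access, preallocated, not modifiable from userspace. */
	d.amr &= ~(0x3ULL << pkeyshift(0));
	d.iamr &= ~(0x3ULL << pkeyshift(0));
	d.uamor &= ~(0x3ULL << pkeyshift(0));
	d.initial_mask |= 0x1u << 0;

	/* Key 1 is reserved, following the ISA recommendation. */
	d.reserved_mask |= 0x1u << 1;

	/* Keys the OS cannot represent are reserved and locked in UAMOR. */
	for (i = pkeys_total - os_reserved; i < pkeys_total; i++) {
		d.reserved_mask |= 0x1u << i;
		d.uamor &= ~(0x3ULL << pkeyshift(i));
	}

	/* Reserved keys are treated as already allocated. */
	d.initial_mask |= d.reserved_mask;
	return d;
}
```

Calling compute_pkey_defaults(32, 0) models a 64K-page kernel with 32 keys,
and compute_pkey_defaults(32, 24) models the 4K case where only 8 keys are
usable. Note that at this point in the series key 1 is reserved for
allocation but its UAMOR bits are still set; patch 06 clears them.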

* [PATCH v2 06/12] powerpc/book3s64/pkeys: Prevent key 1 modification from userspace.
From: Aneesh Kumar K.V @ 2020-05-02 11:13 UTC (permalink / raw)
  To: linuxppc-dev, mpe; +Cc: Aneesh Kumar K.V, linuxram
In-Reply-To: <20200502111347.541836-1-aneesh.kumar@linux.ibm.com>

Key 1 is marked reserved by the ISA. Set up UAMOR to prevent userspace
from modifying it.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 arch/powerpc/mm/book3s64/pkeys.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/mm/book3s64/pkeys.c b/arch/powerpc/mm/book3s64/pkeys.c
index 3db0b3cfc322..9e68a08799ee 100644
--- a/arch/powerpc/mm/book3s64/pkeys.c
+++ b/arch/powerpc/mm/book3s64/pkeys.c
@@ -174,6 +174,7 @@ static int pkey_initialize(void)
 	 * programming note.
 	 */
 	reserved_allocation_mask |= (0x1 << 1);
+	default_uamor &= ~(0x3ul << pkeyshift(1));
 
 	/*
 	 * Prevent the usage of OS reserved the keys. Update UAMOR
-- 
2.26.2


^ permalink raw reply related
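
The effect of the one-line change in patch 06 on the default UAMOR value
can be worked out by hand: key 1 occupies bits 60 and 61 of the register,
so clearing them turns a value of 0x33FFFFFFFFFFFFFF into
0x03FFFFFFFFFFFFFF. A minimal sketch of that step follows. The starting
value assumes a 64K-page configuration with execute_only_key = 2, and the
helper name uamor_deny_key() is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define AMR_BITS_PER_PKEY 2
#define PKEY_REG_BITS 64
/* Key 0 occupies the two most significant bits of the register. */
#define pkeyshift(pkey) (PKEY_REG_BITS - (((pkey) + 1) * AMR_BITS_PER_PKEY))

/* Clear both UAMOR bits of one key so userspace cannot change its AMR bits. */
uint64_t uamor_deny_key(uint64_t uamor, int pkey)
{
	assert(pkey >= 0 && pkey < 32);
	return uamor & ~(0x3ULL << pkeyshift(pkey));
}
```

Here uamor_deny_key(default_uamor, 1) corresponds to the added line
default_uamor &= ~(0x3ul << pkeyshift(1)).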

* [PATCH v2 07/12] powerpc/book3s64/pkeys: kill cpu feature key CPU_FTR_PKEY
From: Aneesh Kumar K.V @ 2020-05-02 11:13 UTC (permalink / raw)
  To: linuxppc-dev, mpe; +Cc: Aneesh Kumar K.V, linuxram
In-Reply-To: <20200502111347.541836-1-aneesh.kumar@linux.ibm.com>

We don't use CPU_FTR_PKEY anymore. Remove the feature bit and mark it
free.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 arch/powerpc/include/asm/cputable.h | 10 +++++-----
 arch/powerpc/kernel/dt_cpu_ftrs.c   |  6 ------
 2 files changed, 5 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/include/asm/cputable.h b/arch/powerpc/include/asm/cputable.h
index 40a4d3c6fd99..b77f8258ee8c 100644
--- a/arch/powerpc/include/asm/cputable.h
+++ b/arch/powerpc/include/asm/cputable.h
@@ -198,7 +198,7 @@ static inline void cpu_feature_keys_init(void) { }
 #define CPU_FTR_STCX_CHECKS_ADDRESS	LONG_ASM_CONST(0x0000000080000000)
 #define CPU_FTR_POPCNTB			LONG_ASM_CONST(0x0000000100000000)
 #define CPU_FTR_POPCNTD			LONG_ASM_CONST(0x0000000200000000)
-#define CPU_FTR_PKEY			LONG_ASM_CONST(0x0000000400000000)
+/* LONG_ASM_CONST(0x0000000400000000) Free */
 #define CPU_FTR_VMX_COPY		LONG_ASM_CONST(0x0000000800000000)
 #define CPU_FTR_TM			LONG_ASM_CONST(0x0000001000000000)
 #define CPU_FTR_CFAR			LONG_ASM_CONST(0x0000002000000000)
@@ -437,7 +437,7 @@ static inline void cpu_feature_keys_init(void) { }
 	    CPU_FTR_DSCR | CPU_FTR_SAO  | CPU_FTR_ASYM_SMT | \
 	    CPU_FTR_STCX_CHECKS_ADDRESS | CPU_FTR_POPCNTB | CPU_FTR_POPCNTD | \
 	    CPU_FTR_CFAR | CPU_FTR_HVMODE | \
-	    CPU_FTR_VMX_COPY | CPU_FTR_HAS_PPR | CPU_FTR_DABRX | CPU_FTR_PKEY)
+	    CPU_FTR_VMX_COPY | CPU_FTR_HAS_PPR | CPU_FTR_DABRX )
 #define CPU_FTRS_POWER8 (CPU_FTR_LWSYNC | \
 	    CPU_FTR_PPCAS_ARCH_V2 | CPU_FTR_CTRL | CPU_FTR_ARCH_206 |\
 	    CPU_FTR_MMCRA | CPU_FTR_SMT | \
@@ -447,7 +447,7 @@ static inline void cpu_feature_keys_init(void) { }
 	    CPU_FTR_STCX_CHECKS_ADDRESS | CPU_FTR_POPCNTB | CPU_FTR_POPCNTD | \
 	    CPU_FTR_CFAR | CPU_FTR_HVMODE | CPU_FTR_VMX_COPY | \
 	    CPU_FTR_DBELL | CPU_FTR_HAS_PPR | CPU_FTR_DAWR | \
-	    CPU_FTR_ARCH_207S | CPU_FTR_TM_COMP | CPU_FTR_PKEY)
+	    CPU_FTR_ARCH_207S | CPU_FTR_TM_COMP )
 #define CPU_FTRS_POWER8E (CPU_FTRS_POWER8 | CPU_FTR_PMAO_BUG)
 #define CPU_FTRS_POWER9 (CPU_FTR_LWSYNC | \
 	    CPU_FTR_PPCAS_ARCH_V2 | CPU_FTR_CTRL | CPU_FTR_ARCH_206 |\
@@ -458,8 +458,8 @@ static inline void cpu_feature_keys_init(void) { }
 	    CPU_FTR_STCX_CHECKS_ADDRESS | CPU_FTR_POPCNTB | CPU_FTR_POPCNTD | \
 	    CPU_FTR_CFAR | CPU_FTR_HVMODE | CPU_FTR_VMX_COPY | \
 	    CPU_FTR_DBELL | CPU_FTR_HAS_PPR | CPU_FTR_ARCH_207S | \
-	    CPU_FTR_TM_COMP | CPU_FTR_ARCH_300 | CPU_FTR_PKEY | \
-	    CPU_FTR_P9_TLBIE_STQ_BUG | CPU_FTR_P9_TLBIE_ERAT_BUG | CPU_FTR_P9_TIDR)
+	    CPU_FTR_TM_COMP | CPU_FTR_ARCH_300 | CPU_FTR_P9_TLBIE_STQ_BUG | \
+	    CPU_FTR_P9_TLBIE_ERAT_BUG | CPU_FTR_P9_TIDR)
 #define CPU_FTRS_POWER9_DD2_0 (CPU_FTRS_POWER9 | CPU_FTR_P9_RADIX_PREFETCH_BUG)
 #define CPU_FTRS_POWER9_DD2_1 (CPU_FTRS_POWER9 | \
 			       CPU_FTR_P9_RADIX_PREFETCH_BUG | \
diff --git a/arch/powerpc/kernel/dt_cpu_ftrs.c b/arch/powerpc/kernel/dt_cpu_ftrs.c
index 36bc0d5c4f3a..120ea339ffda 100644
--- a/arch/powerpc/kernel/dt_cpu_ftrs.c
+++ b/arch/powerpc/kernel/dt_cpu_ftrs.c
@@ -747,12 +747,6 @@ static __init void cpufeatures_cpu_quirks(void)
 	}
 
 	update_tlbie_feature_flag(version);
-	/*
-	 * PKEY was not in the initial base or feature node
-	 * specification, but it should become optional in the next
-	 * cpu feature version sequence.
-	 */
-	cur_cpu_spec->cpu_features |= CPU_FTR_PKEY;
 }
 
 static void __init cpufeatures_setup_finished(void)
-- 
2.26.2


^ permalink raw reply related

* [PATCH v2 08/12] powerpc/book3s64/pkeys: Convert execute key support to static key
From: Aneesh Kumar K.V @ 2020-05-02 11:13 UTC (permalink / raw)
  To: linuxppc-dev, mpe; +Cc: Aneesh Kumar K.V, linuxram
In-Reply-To: <20200502111347.541836-1-aneesh.kumar@linux.ibm.com>

Convert the bool to a static key like pkey_disabled.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 arch/powerpc/mm/book3s64/pkeys.c | 12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/mm/book3s64/pkeys.c b/arch/powerpc/mm/book3s64/pkeys.c
index 9e68a08799ee..7d400d5a4076 100644
--- a/arch/powerpc/mm/book3s64/pkeys.c
+++ b/arch/powerpc/mm/book3s64/pkeys.c
@@ -13,13 +13,13 @@
 #include <linux/of_device.h>
 
 DEFINE_STATIC_KEY_TRUE(pkey_disabled);
+DEFINE_STATIC_KEY_FALSE(execute_pkey_disabled);
 int  pkeys_total;		/* Total pkeys as per device tree */
 u32  initial_allocation_mask;   /* Bits set for the initially allocated keys */
 /*
  *  Keys marked in the reservation list cannot be allocated by  userspace
  */
 u32  reserved_allocation_mask;
-static bool pkey_execute_disable_supported;
 static u64 default_amr;
 static u64 default_iamr;
 /* Allow all keys to be modified by default */
@@ -116,9 +116,7 @@ static int pkey_initialize(void)
 	 * execute_disable support. Instead we use a PVR check.
 	 */
 	if (pvr_version_is(PVR_POWER7) || pvr_version_is(PVR_POWER7p))
-		pkey_execute_disable_supported = false;
-	else
-		pkey_execute_disable_supported = true;
+		static_branch_enable(&execute_pkey_disabled);
 
 #ifdef CONFIG_PPC_4K_PAGES
 	/*
@@ -214,7 +212,7 @@ static inline void write_amr(u64 value)
 
 static inline u64 read_iamr(void)
 {
-	if (!likely(pkey_execute_disable_supported))
+	if (static_branch_unlikely(&execute_pkey_disabled))
 		return 0x0UL;
 
 	return mfspr(SPRN_IAMR);
@@ -222,7 +220,7 @@ static inline u64 read_iamr(void)
 
 static inline void write_iamr(u64 value)
 {
-	if (!likely(pkey_execute_disable_supported))
+	if (static_branch_unlikely(&execute_pkey_disabled))
 		return;
 
 	mtspr(SPRN_IAMR, value);
@@ -282,7 +280,7 @@ int __arch_set_user_pkey_access(struct task_struct *tsk, int pkey,
 		return -EINVAL;
 
 	if (init_val & PKEY_DISABLE_EXECUTE) {
-		if (!pkey_execute_disable_supported)
+		if (static_branch_unlikely(&execute_pkey_disabled))
 			return -EINVAL;
 		new_iamr_bits |= IAMR_EX_BIT;
 	}
-- 
2.26.2


^ permalink raw reply related
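
A minimal userspace sketch of the guard pattern in patch 08/12 above,
for readers without a kernel tree at hand. It assumes a plain bool can
stand in for the static key (a real static key patches the branch at
runtime; this does not), and every sketch_* name is hypothetical. The
point it illustrates is the polarity flip: the old bool was true when
execute-disable was supported, while the new key is true when it is
disabled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for DEFINE_STATIC_KEY_FALSE(execute_pkey_disabled). */
static bool sketch_execute_pkey_disabled;

/* Stand-in for the IAMR special-purpose register. */
static uint64_t sketch_iamr;

/* Mirrors the PVR check: only POWER7/POWER7+ flip the key on. */
static void sketch_init(bool is_power7)
{
	if (is_power7)
		sketch_execute_pkey_disabled = true;
}

/* Reads return 0 when execute-disable is unavailable. */
static uint64_t sketch_read_iamr(void)
{
	if (sketch_execute_pkey_disabled)
		return 0;
	return sketch_iamr;
}

/* Writes are silently dropped when execute-disable is unavailable. */
static void sketch_write_iamr(uint64_t value)
{
	if (sketch_execute_pkey_disabled)
		return;
	sketch_iamr = value;
}
```

With the key left at its default, a write followed by a read returns
the written value; after sketch_init(true), reads return 0 and writes
have no effect.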

* [PATCH v2 09/12] powerpc/book3s64/pkeys: Simplify pkey disable branch
From: Aneesh Kumar K.V @ 2020-05-02 11:13 UTC (permalink / raw)
  To: linuxppc-dev, mpe; +Cc: Aneesh Kumar K.V, linuxram
In-Reply-To: <20200502111347.541836-1-aneesh.kumar@linux.ibm.com>

Make the default value FALSE (pkey enabled) and set to TRUE when we
find the total number of keys supported to be zero.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 arch/powerpc/include/asm/pkeys.h | 2 +-
 arch/powerpc/mm/book3s64/pkeys.c | 7 +++----
 2 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/include/asm/pkeys.h b/arch/powerpc/include/asm/pkeys.h
index 5dd0a79d1809..75d2a2c19c04 100644
--- a/arch/powerpc/include/asm/pkeys.h
+++ b/arch/powerpc/include/asm/pkeys.h
@@ -11,7 +11,7 @@
 #include <linux/jump_label.h>
 #include <asm/firmware.h>
 
-DECLARE_STATIC_KEY_TRUE(pkey_disabled);
+DECLARE_STATIC_KEY_FALSE(pkey_disabled);
 extern int pkeys_total; /* total pkeys as per device tree */
 extern u32 initial_allocation_mask; /*  bits set for the initially allocated keys */
 extern u32 reserved_allocation_mask; /* bits set for reserved keys */
diff --git a/arch/powerpc/mm/book3s64/pkeys.c b/arch/powerpc/mm/book3s64/pkeys.c
index 7d400d5a4076..87d882a9aaf2 100644
--- a/arch/powerpc/mm/book3s64/pkeys.c
+++ b/arch/powerpc/mm/book3s64/pkeys.c
@@ -12,7 +12,7 @@
 #include <linux/pkeys.h>
 #include <linux/of_device.h>
 
-DEFINE_STATIC_KEY_TRUE(pkey_disabled);
+DEFINE_STATIC_KEY_FALSE(pkey_disabled);
 DEFINE_STATIC_KEY_FALSE(execute_pkey_disabled);
 int  pkeys_total;		/* Total pkeys as per device tree */
 u32  initial_allocation_mask;   /* Bits set for the initially allocated keys */
@@ -104,9 +104,8 @@ static int pkey_initialize(void)
 
 	/* scan the device tree for pkey feature */
 	pkeys_total = scan_pkey_feature();
-	if (pkeys_total)
-		static_branch_disable(&pkey_disabled);
-	else {
+	if (!pkeys_total) {
+		/* No support for pkey. Mark it disabled */
 		static_branch_enable(&pkey_disabled);
 		return 0;
 	}
-- 
2.26.2


^ permalink raw reply related

* [PATCH v2 10/12] powerpc/book3s64/pkeys: Convert pkey_total to max_pkey
From: Aneesh Kumar K.V @ 2020-05-02 11:13 UTC (permalink / raw)
  To: linuxppc-dev, mpe; +Cc: Aneesh Kumar K.V, linuxram
In-Reply-To: <20200502111347.541836-1-aneesh.kumar@linux.ibm.com>

max_pkey now represents the maximum key value that userspace can allocate.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 arch/powerpc/include/asm/pkeys.h |  7 +++++--
 arch/powerpc/mm/book3s64/pkeys.c | 14 +++++++-------
 2 files changed, 12 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/include/asm/pkeys.h b/arch/powerpc/include/asm/pkeys.h
index 75d2a2c19c04..652bad7334f3 100644
--- a/arch/powerpc/include/asm/pkeys.h
+++ b/arch/powerpc/include/asm/pkeys.h
@@ -12,7 +12,7 @@
 #include <asm/firmware.h>
 
 DECLARE_STATIC_KEY_FALSE(pkey_disabled);
-extern int pkeys_total; /* total pkeys as per device tree */
+extern int max_pkey;
 extern u32 initial_allocation_mask; /*  bits set for the initially allocated keys */
 extern u32 reserved_allocation_mask; /* bits set for reserved keys */
 
@@ -44,7 +44,10 @@ static inline int vma_pkey(struct vm_area_struct *vma)
 	return (vma->vm_flags & ARCH_VM_PKEY_FLAGS) >> VM_PKEY_SHIFT;
 }
 
-#define arch_max_pkey() pkeys_total
+static inline int arch_max_pkey(void)
+{
+	return max_pkey;
+}
 
 #define pkey_alloc_mask(pkey) (0x1 << pkey)
 
diff --git a/arch/powerpc/mm/book3s64/pkeys.c b/arch/powerpc/mm/book3s64/pkeys.c
index 87d882a9aaf2..a4d7287082a8 100644
--- a/arch/powerpc/mm/book3s64/pkeys.c
+++ b/arch/powerpc/mm/book3s64/pkeys.c
@@ -14,7 +14,7 @@
 
 DEFINE_STATIC_KEY_FALSE(pkey_disabled);
 DEFINE_STATIC_KEY_FALSE(execute_pkey_disabled);
-int  pkeys_total;		/* Total pkeys as per device tree */
+int  max_pkey;			/* Maximum key value supported */
 u32  initial_allocation_mask;   /* Bits set for the initially allocated keys */
 /*
  *  Keys marked in the reservation list cannot be allocated by  userspace
@@ -84,7 +84,7 @@ static int scan_pkey_feature(void)
 
 static int pkey_initialize(void)
 {
-	int os_reserved, i;
+	int pkeys_total, i;
 
 	/*
 	 * We define PKEY_DISABLE_EXECUTE in addition to the arch-neutral
@@ -122,12 +122,12 @@ static int pkey_initialize(void)
 	 * The OS can manage only 8 pkeys due to its inability to represent them
 	 * in the Linux 4K PTE. Mark all other keys reserved.
 	 */
-	os_reserved = pkeys_total - 8;
+	max_pkey = min(8, pkeys_total);
 #else
-	os_reserved = 0;
+	max_pkey = pkeys_total;
 #endif
 
-	if (unlikely((pkeys_total - os_reserved) <= execute_only_key)) {
+	if (unlikely(max_pkey <= execute_only_key)) {
 		/*
 		 * Insufficient number of keys to support
 		 * execute only key. Mark it unavailable.
@@ -174,10 +174,10 @@ static int pkey_initialize(void)
 	default_uamor &= ~(0x3ul << pkeyshift(1));
 
 	/*
-	 * Prevent the usage of OS reserved the keys. Update UAMOR
+	 * Prevent the usage of OS reserved keys. Update UAMOR
 	 * for those keys.
 	 */
-	for (i = (pkeys_total - os_reserved); i < pkeys_total; i++) {
+	for (i = max_pkey; i < pkeys_total; i++) {
 		reserved_allocation_mask |= (0x1 << i);
 		default_uamor &= ~(0x3ul << pkeyshift(i));
 	}
-- 
2.26.2


^ permalink raw reply related
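
A small worked sketch of the arithmetic in patch 10/12 above. It is a
userspace illustration, not the kernel code: the sketch_* names and the
four_k_pages parameter (standing in for CONFIG_PPC_4K_PAGES) are
hypothetical. It compares the removed expression
(pkeys_total - os_reserved) with the new max_pkey computation.

```c
/*
 * Old form: os_reserved = pkeys_total - 8 on 4K pages, else 0.
 * The number of usable keys was then pkeys_total - os_reserved.
 */
static int sketch_old_usable(int pkeys_total, int four_k_pages)
{
	int os_reserved = four_k_pages ? pkeys_total - 8 : 0;

	return pkeys_total - os_reserved;
}

/* New form: max_pkey = min(8, pkeys_total) on 4K pages. */
static int sketch_max_pkey(int pkeys_total, int four_k_pages)
{
	if (four_k_pages && pkeys_total > 8)
		return 8;
	return pkeys_total;
}
```

For pkeys_total = 32 both forms give 8 on 4K pages and 32 otherwise.
They differ only when fewer than 8 keys are reported on 4K pages: the
old expression always evaluates to 8 there, while the new one is capped
at pkeys_total.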

* [PATCH v2 11/12] powerpc/book3s64/pkeys: Make initial_allocation_mask static
From: Aneesh Kumar K.V @ 2020-05-02 11:13 UTC (permalink / raw)
  To: linuxppc-dev, mpe; +Cc: Aneesh Kumar K.V, linuxram
In-Reply-To: <20200502111347.541836-1-aneesh.kumar@linux.ibm.com>

initial_allocation_mask is not used outside this file.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 arch/powerpc/include/asm/pkeys.h | 1 -
 arch/powerpc/mm/book3s64/pkeys.c | 2 +-
 2 files changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/pkeys.h b/arch/powerpc/include/asm/pkeys.h
index 652bad7334f3..47c81d41ea9a 100644
--- a/arch/powerpc/include/asm/pkeys.h
+++ b/arch/powerpc/include/asm/pkeys.h
@@ -13,7 +13,6 @@
 
 DECLARE_STATIC_KEY_FALSE(pkey_disabled);
 extern int max_pkey;
-extern u32 initial_allocation_mask; /*  bits set for the initially allocated keys */
 extern u32 reserved_allocation_mask; /* bits set for reserved keys */
 
 #define ARCH_VM_PKEY_FLAGS (VM_PKEY_BIT0 | VM_PKEY_BIT1 | VM_PKEY_BIT2 | \
diff --git a/arch/powerpc/mm/book3s64/pkeys.c b/arch/powerpc/mm/book3s64/pkeys.c
index a4d7287082a8..73b5ef1490c8 100644
--- a/arch/powerpc/mm/book3s64/pkeys.c
+++ b/arch/powerpc/mm/book3s64/pkeys.c
@@ -15,11 +15,11 @@
 DEFINE_STATIC_KEY_FALSE(pkey_disabled);
 DEFINE_STATIC_KEY_FALSE(execute_pkey_disabled);
 int  max_pkey;			/* Maximum key value supported */
-u32  initial_allocation_mask;   /* Bits set for the initially allocated keys */
 /*
  *  Keys marked in the reservation list cannot be allocated by  userspace
  */
 u32  reserved_allocation_mask;
+static u32  initial_allocation_mask;   /* Bits set for the initially allocated keys */
 static u64 default_amr;
 static u64 default_iamr;
 /* Allow all keys to be modified by default */
-- 
2.26.2


^ permalink raw reply related

* [PATCH v2 12/12] powerpc/book3s64/pkeys: Mark all the pkeys above max pkey as reserved
From: Aneesh Kumar K.V @ 2020-05-02 11:13 UTC (permalink / raw)
  To: linuxppc-dev, mpe; +Cc: Aneesh Kumar K.V, linuxram
In-Reply-To: <20200502111347.541836-1-aneesh.kumar@linux.ibm.com>

The hypervisor can return fewer pkeys than the maximum allowed (for
example, 31 instead of 32). Mark all pkeys above the maximum allowed as
reserved so that userspace cannot allocate an invalid pkey (for
example, key 31 in the above case).

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 arch/powerpc/mm/book3s64/pkeys.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/mm/book3s64/pkeys.c b/arch/powerpc/mm/book3s64/pkeys.c
index 73b5ef1490c8..0ff59acdbb84 100644
--- a/arch/powerpc/mm/book3s64/pkeys.c
+++ b/arch/powerpc/mm/book3s64/pkeys.c
@@ -175,9 +175,10 @@ static int pkey_initialize(void)
 
 	/*
 	 * Prevent the usage of OS reserved keys. Update UAMOR
-	 * for those keys.
+	 * for those keys. Also mark the rest of the bits in the
+	 * 32 bit mask as reserved.
 	 */
-	for (i = max_pkey; i < pkeys_total; i++) {
+	for (i = max_pkey; i < 32 ; i++) {
 		reserved_allocation_mask |= (0x1 << i);
 		default_uamor &= ~(0x3ul << pkeyshift(i));
 	}
-- 
2.26.2


^ permalink raw reply related
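
The loop changed in patch 12/12 above can be sketched in userspace as
follows. This is an illustration under assumptions: sketch_pkeyshift()
is assumed to mirror the kernel's pkeyshift() (two bits per key, key 0
in the most significant bits of a 64-bit register), and all sketch_*
names are hypothetical. Only the reservation loop is modelled; the
other updates that pkey_initialize() makes to these masks are left out.

```c
#include <stdint.h>

#define SKETCH_NR_PKEYS		32	/* width of the allocation mask */
#define SKETCH_BITS_PER_PKEY	2	/* assumed AMR/UAMOR bits per key */

/* Assumed equivalent of pkeyshift(): bit offset of a key's field. */
static int sketch_pkeyshift(int pkey)
{
	return 64 - (pkey + 1) * SKETCH_BITS_PER_PKEY;
}

/* Reserve every key from max_pkey up to the end of the 32-bit mask. */
static uint32_t sketch_reserved_mask(int max_pkey)
{
	uint32_t mask = 0;

	for (int i = max_pkey; i < SKETCH_NR_PKEYS; i++)
		mask |= UINT32_C(1) << i;
	return mask;
}

/* Clear the UAMOR field of every reserved key, starting from all-ones. */
static uint64_t sketch_default_uamor(int max_pkey)
{
	uint64_t uamor = ~UINT64_C(0);

	for (int i = max_pkey; i < SKETCH_NR_PKEYS; i++)
		uamor &= ~(UINT64_C(3) << sketch_pkeyshift(i));
	return uamor;
}
```

With max_pkey = 31, as in the commit message's example, only bit 31 of
the mask is set and only the lowest two UAMOR bits are cleared. The
sketch shifts an unsigned constant so that i = 31 stays well defined in
standard C.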

* [RFC PATCH 01/10] kallsyms: architecture specific symbol lookups
From: Nicholas Piggin @ 2020-05-02 11:19 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin
In-Reply-To: <20200502111914.166578-1-npiggin@gmail.com>

Provide CONFIG_ARCH_HAS_SYMBOL_LOOKUP which allows architectures to
do their own symbol/address lookup if kernel and module lookups miss.

powerpc will use this to deal with firmware symbols.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 include/linux/kallsyms.h | 20 ++++++++++++++++++++
 kernel/kallsyms.c        | 13 ++++++++++++-
 lib/Kconfig              |  3 +++
 3 files changed, 35 insertions(+), 1 deletion(-)

diff --git a/include/linux/kallsyms.h b/include/linux/kallsyms.h
index 657a83b943f0..e17c1e7c01c0 100644
--- a/include/linux/kallsyms.h
+++ b/include/linux/kallsyms.h
@@ -83,6 +83,26 @@ extern int kallsyms_lookup_size_offset(unsigned long addr,
 				  unsigned long *symbolsize,
 				  unsigned long *offset);
 
+#ifdef CONFIG_ARCH_HAS_SYMBOL_LOOKUP
+const char *arch_symbol_lookup_address(unsigned long addr,
+			    unsigned long *symbolsize,
+			    unsigned long *offset,
+			    char **modname, char *namebuf);
+unsigned long arch_symbol_lookup_name(const char *name);
+#else
+static inline const char *arch_symbol_lookup_address(unsigned long addr,
+			    unsigned long *symbolsize,
+			    unsigned long *offset,
+			    char **modname, char *namebuf)
+{
+	return NULL;
+}
+static inline unsigned long arch_symbol_lookup_name(const char *name)
+{
+	return 0;
+}
+#endif
+
 /* Lookup an address.  modname is set to NULL if it's in the kernel. */
 const char *kallsyms_lookup(unsigned long addr,
 			    unsigned long *symbolsize,
diff --git a/kernel/kallsyms.c b/kernel/kallsyms.c
index 16c8c605f4b0..1e403e616126 100644
--- a/kernel/kallsyms.c
+++ b/kernel/kallsyms.c
@@ -164,6 +164,7 @@ static unsigned long kallsyms_sym_address(int idx)
 unsigned long kallsyms_lookup_name(const char *name)
 {
 	char namebuf[KSYM_NAME_LEN];
+	unsigned long ret;
 	unsigned long i;
 	unsigned int off;
 
@@ -173,7 +174,12 @@ unsigned long kallsyms_lookup_name(const char *name)
 		if (strcmp(namebuf, name) == 0)
 			return kallsyms_sym_address(i);
 	}
-	return module_kallsyms_lookup_name(name);
+
+	ret = module_kallsyms_lookup_name(name);
+	if (ret)
+		return ret;
+
+	return arch_symbol_lookup_name(name);
 }
 
 int kallsyms_on_each_symbol(int (*fn)(void *, const char *, struct module *,
@@ -309,6 +315,11 @@ const char *kallsyms_lookup(unsigned long addr,
 	if (!ret)
 		ret = ftrace_mod_address_lookup(addr, symbolsize,
 						offset, modname, namebuf);
+
+	if (!ret)
+		ret = arch_symbol_lookup_address(addr, symbolsize,
+						offset, modname, namebuf);
+
 	return ret;
 }
 
diff --git a/lib/Kconfig b/lib/Kconfig
index 5d53f9609c25..9f86f649a712 100644
--- a/lib/Kconfig
+++ b/lib/Kconfig
@@ -80,6 +80,9 @@ config ARCH_USE_CMPXCHG_LOCKREF
 config ARCH_HAS_FAST_MULTIPLIER
 	bool
 
+config ARCH_HAS_SYMBOL_LOOKUP
+	bool
+
 config INDIRECT_PIO
 	bool "Access I/O in non-MMIO mode"
 	depends on ARM64
-- 
2.23.0


^ permalink raw reply related
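
The fallback order that patch 01/10 above gives kallsyms_lookup_name()
(core kernel symbols, then modules, then the architecture hook) can be
sketched in userspace like this. Everything here is hypothetical: the
sketch_* functions, the symbol names and the addresses are made up for
illustration and are not part of the kernel or of OPAL.

```c
#include <stddef.h>
#include <string.h>

typedef unsigned long (*sketch_lookup_fn)(const char *name);

/* Each stage returns 0 on a miss, like the kernel helpers. */
static unsigned long sketch_core_lookup(const char *name)
{
	return strcmp(name, "core_sym") == 0 ? 0x1000UL : 0;
}

static unsigned long sketch_module_lookup(const char *name)
{
	return strcmp(name, "module_sym") == 0 ? 0x2000UL : 0;
}

/* Stand-in for arch_symbol_lookup_name(), e.g. firmware symbols. */
static unsigned long sketch_arch_lookup(const char *name)
{
	return strcmp(name, "firmware_sym") == 0 ? 0x3000UL : 0;
}

/* Try each stage in order; the first non-zero address wins. */
static unsigned long sketch_lookup_name(const char *name)
{
	static const sketch_lookup_fn chain[] = {
		sketch_core_lookup,
		sketch_module_lookup,
		sketch_arch_lookup,
	};

	for (size_t i = 0; i < sizeof(chain) / sizeof(chain[0]); i++) {
		unsigned long addr = chain[i](name);

		if (addr)
			return addr;
	}
	return 0;
}
```

A name known only to the last stage is still resolved, and a name
unknown to all stages yields 0, which matches the stub that the patch
provides when CONFIG_ARCH_HAS_SYMBOL_LOOKUP is not set.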

* [RFC PATCH 00/10] OPAL V4
From: Nicholas Piggin @ 2020-05-02 11:19 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin

"OPAL V4" is a proposed new approach to running and calling PowerNV
OPAL firmware.

OPAL calls use the caller's (kernel) stack, which vastly simplifies
re-entrancy concerns around doing things like idle and machine check
OPAL drivers.

The OS can get at symbol and assert metadata to help with debugging
firmware.

OPAL may be called (and will run in) virtual mode in its own address
space.

The operating system also provides some services to the firmware, such
as message logging.

This is fairly close to the point where we could run OPAL in user-mode
with a few services for privileged instructions (scv could be used to
call back to the OS). We may yet do this, but one thing that's stopped
me is that it would require a slower API. As it is now with LE skiboot
and LE Linux, the OPAL call is basically a shared-library function
call, which is fast enough that it's feasible to implement a
performant CPU idle driver, which is a significant motivation.

Anyway, this is up and running and coming together pretty well; it just
needs a bit of polishing and more documentation. I'll post the skiboot
patches on the skiboot list.

Nicholas Piggin (10):
  kallsyms: architecture specific symbol lookups
  powerpc/powernv: Wire up OPAL address lookups
  powerpc/powernv: Use OPAL_REPORT_TRAP to cope with trap interrupts
    from OPAL
  powerpc/powernv: avoid polling in opal_get_chars
  powerpc/powernv: Don't translate kernel addresses to real addresses
    for OPAL
  powerpc/powernv: opal use new opal call entry point if it exists
  powerpc/powernv: Add OPAL_FIND_VM_AREA API
  powerpc/powernv: Set up an mm context to call OPAL in
  powerpc/powernv: OPAL V4 OS services
  powerpc/powernv: OPAL V4 Implement vm_map/unmap service

 arch/powerpc/Kconfig                          |   1 +
 arch/powerpc/boot/opal.c                      |   5 +
 arch/powerpc/include/asm/opal-api.h           |  29 +-
 arch/powerpc/include/asm/opal.h               |   8 +
 arch/powerpc/kernel/traps.c                   |  39 ++-
 arch/powerpc/perf/imc-pmu.c                   |   4 +-
 arch/powerpc/platforms/powernv/npu-dma.c      |   2 +-
 arch/powerpc/platforms/powernv/opal-call.c    |  58 ++++
 arch/powerpc/platforms/powernv/opal-dump.c    |   2 +-
 arch/powerpc/platforms/powernv/opal-elog.c    |   4 +-
 arch/powerpc/platforms/powernv/opal-flash.c   |   6 +-
 arch/powerpc/platforms/powernv/opal-hmi.c     |   2 +-
 arch/powerpc/platforms/powernv/opal-nvram.c   |   4 +-
 .../powerpc/platforms/powernv/opal-powercap.c |   2 +-
 arch/powerpc/platforms/powernv/opal-psr.c     |   2 +-
 arch/powerpc/platforms/powernv/opal-xscom.c   |   2 +-
 arch/powerpc/platforms/powernv/opal.c         | 289 ++++++++++++++++--
 arch/powerpc/platforms/powernv/pci-ioda.c     |   2 +-
 arch/powerpc/sysdev/xive/native.c             |   2 +-
 drivers/char/powernv-op-panel.c               |   3 +-
 drivers/i2c/busses/i2c-opal.c                 |  12 +-
 drivers/mtd/devices/powernv_flash.c           |   4 +-
 include/linux/kallsyms.h                      |  20 ++
 kernel/kallsyms.c                             |  13 +-
 lib/Kconfig                                   |   3 +
 25 files changed, 461 insertions(+), 57 deletions(-)

-- 
2.23.0


^ permalink raw reply

