LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed
* Re: [PATCH v2 1/2] powerpc/perf: Use stack siar instead of mfspr
From: Michael Ellerman @ 2021-08-14 12:30 UTC (permalink / raw)
  To: Christophe Leroy, kajoljain, linuxppc-dev; +Cc: atrajeev, maddy, rnsastry
In-Reply-To: <9662dfbf-d163-c313-745e-aeda0f638e98@csgroup.eu>

Christophe Leroy <christophe.leroy@csgroup.eu> writes:
> Le 13/08/2021 à 10:29, kajoljain a écrit :
>> 
>> On 8/13/21 1:54 PM, Kajol Jain wrote:
>>> Minor optimization in the 'perf_instruction_pointer' function code by
>>> making use of stack siar instead of mfspr.
>>>
>>> Fixes: 75382aa72f06 ("powerpc/perf: Move code to select SIAR or pt_regs
>>> into perf_read_regs")
>>> Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
>> 
>> Please ignore this patch-set as I mentioned wrong version number. I will resend
>> this patch-set again with correct version. Sorry for the confusion.
>
> I fear you are creating even more confusion by sending a v1 after sending a v2 ...

Yeah in future just reply to the v2 saying "oops I sent v2 instead of
v1" and leave it at that.

cheers

^ permalink raw reply

* Re: [PATCH v2 2/2] powerpc/perf: Return regs->nip as instruction pointer value when SIAR is 0
From: Michael Ellerman @ 2021-08-14 12:25 UTC (permalink / raw)
  To: Christophe Leroy, Kajol Jain, linuxppc-dev; +Cc: atrajeev, maddy, rnsastry
In-Reply-To: <95fc0f78-51d0-82f6-c9d6-de101fec445c@csgroup.eu>

Christophe Leroy <christophe.leroy@csgroup.eu> writes:
> Le 13/08/2021 à 10:24, Kajol Jain a écrit :
>> Incase of random sampling, there can be scenarios where SIAR is not
>> latching sample address and results in 0 value. Since current code
>> directly returning the siar value, we could see multiple instruction
>> pointer values as 0 in perf report.
>> Patch resolves this issue by adding a ternary condition to return
>> regs->nip incase SIAR is 0.
>> 
>> Fixes: 75382aa72f06 ("powerpc/perf: Move code to select SIAR or pt_regs
>> into perf_read_regs")
>> Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
>> ---
>>   arch/powerpc/perf/core-book3s.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>> 
>> diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
>> index 1b464aad29c4..aeecaaf6810f 100644
>> --- a/arch/powerpc/perf/core-book3s.c
>> +++ b/arch/powerpc/perf/core-book3s.c
>> @@ -2260,7 +2260,7 @@ unsigned long perf_instruction_pointer(struct pt_regs *regs)
>>   		else
>>   			return regs->nip;
>>   	} else if (use_siar && siar_valid(regs))
>> -		return siar + perf_ip_adjust(regs);
>> +		return siar ? siar + perf_ip_adjust(regs) : regs->nip;
>
> Why bother about returning SIAR at all if regs->nip is ok ? Why not just return regs->nip all the time ?

Same answer as last time :)

https://lore.kernel.org/linuxppc-dev/87r1prxd9e.fsf@mpe.ellerman.id.au/

ie. SIAR can point into interrupts-off code, whereas regs->nip will
point to where we re-enabled interrupts.

cheers

^ permalink raw reply

* Re: [PATCH] ppc: add "-z notext" flag to disable diagnostic
From: Michael Ellerman @ 2021-08-14 11:59 UTC (permalink / raw)
  To: Bill Wendling, Daniel Axtens, Fangrui Song
  Cc: Nick Desaulniers, LKML, Nathan Chancellor, clang-built-linux,
	Paul Mackerras, linuxppc-dev
In-Reply-To: <CAGG=3QUz2LNgC8Hn6rU68ejjv4=J9Uidef0oH9A7=sKTs+vf7g@mail.gmail.com>

Bill Wendling <morbo@google.com> writes:
> On Fri, Aug 13, 2021 at 7:13 AM Daniel Axtens <dja@axtens.net> wrote:
>> Bill Wendling <morbo@google.com> writes:
...
>> > diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
>> > index 6505d66f1193..17a9fbf9b789 100644
>> > --- a/arch/powerpc/Makefile
>> > +++ b/arch/powerpc/Makefile
>> > @@ -122,6 +122,7 @@ endif
>> >
>> >  LDFLAGS_vmlinux-y := -Bstatic
>> >  LDFLAGS_vmlinux-$(CONFIG_RELOCATABLE) := -pie
>> > +LDFLAGS_vmlinux-$(CONFIG_RELOCATABLE) += -z notext
...
>
> Unrelated question: Should the "-pie" flag be added with "+= -pie"
> (note the plus sign)?

I noticed that too.

It's been like that since the original relocatable support was added in
2008, commit 549e8152de80 ("powerpc: Make the 64-bit kernel as a
position-independent executable"), which did:

-LDFLAGS_vmlinux	:= -Bstatic
+LDFLAGS_vmlinux-yy := -Bstatic
+LDFLAGS_vmlinux-$(CONFIG_PPC64)$(CONFIG_RELOCATABLE) := -pie
+LDFLAGS_vmlinux	:= $(LDFLAGS_vmlinux-yy)


There's no mention of those flags in the change log. But the way it's
written suggests the intention was to not pass -Bstatic for relocatable
builds, otherwise it could have been more simply:

+LDFLAGS_vmlinux-$(CONFIG_PPC64)$(CONFIG_RELOCATABLE) := -pie
+LDFLAGS_vmlinux	:= -Bstatic $(LDFLAGS_vmlinux-yy)


So I think it was deliberate to not use +=, but whether that's actually
correct I can't say. Maybe in the past -Bstatic and -pie were
incompatible?

cheers

^ permalink raw reply

* Re: [PATCH] ppc: add "-z notext" flag to disable diagnostic
From: Segher Boessenkool @ 2021-08-14 11:01 UTC (permalink / raw)
  To: Nick Desaulniers
  Cc: Fangrui Song, LKML, Nathan Chancellor, clang-built-linux,
	Paul Mackerras, Bill Wendling, linuxppc-dev, Daniel Axtens
In-Reply-To: <CAKwvOd=rN9s5YBtt-AMnaqXhYCsAT=6yp29_oomRvTaev6Q6zw@mail.gmail.com>

On Fri, Aug 13, 2021 at 11:59:21AM -0700, Nick Desaulniers wrote:
> Or we can dig through why there are relocations in read only sections,
> fix those, then enable `-z text` for all linkers.  My recommendation
> would be get the thing building, then go digging time permitting.

It is not always a bug.  You can get much more efficient code if you
have text relocations than if you don't.  This "read-only" memory is
perfectly writable until after relocation, a la relro.

But you no doubt will find some non-optimalities (or even straight out
bugs) if you build with -ztext sometimes :-)


Segher

^ permalink raw reply

* [powerpc:merge] BUILD SUCCESS 01dc10da827c1725c0f5491c78d700a4478aae08
From: kernel test robot @ 2021-08-14 10:00 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: linuxppc-dev

tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git merge
branch HEAD: 01dc10da827c1725c0f5491c78d700a4478aae08  Automatic merge of 'fixes' into merge (2021-08-12 23:46)

elapsed time: 2610m

configs tested: 231
configs skipped: 4

The following configs have been built successfully.
More configs may be tested in the coming days.

gcc tested configs:
arm                                 defconfig
arm64                            allyesconfig
arm64                               defconfig
arm                              allyesconfig
arm                              allmodconfig
i386                 randconfig-c001-20210814
i386                 randconfig-c001-20210812
i386                 randconfig-c001-20210813
powerpc                 mpc8540_ads_defconfig
ia64                                defconfig
arm                       imx_v6_v7_defconfig
mips                           rs90_defconfig
parisc                generic-32bit_defconfig
arm                     am200epdkit_defconfig
openrisc                  or1klitex_defconfig
m68k                        m5407c3_defconfig
powerpc                    klondike_defconfig
mips                     loongson1c_defconfig
arm                          ep93xx_defconfig
arm                          iop32x_defconfig
powerpc                      tqm8xx_defconfig
arm                         lpc18xx_defconfig
sh                        edosk7760_defconfig
x86_64                            allnoconfig
sh                             shx3_defconfig
mips                  maltasmvp_eva_defconfig
mips                         bigsur_defconfig
powerpc64                           defconfig
powerpc                        cell_defconfig
arm                     davinci_all_defconfig
mips                        workpad_defconfig
arm                       omap2plus_defconfig
powerpc                     pq2fads_defconfig
h8300                            alldefconfig
i386                             alldefconfig
mips                      fuloong2e_defconfig
mips                           mtx1_defconfig
arm                           u8500_defconfig
nios2                         3c120_defconfig
ia64                        generic_defconfig
powerpc                     tqm8555_defconfig
powerpc                      acadia_defconfig
arm                          simpad_defconfig
mips                  cavium_octeon_defconfig
nds32                               defconfig
parisc                           alldefconfig
arm                      tct_hammer_defconfig
powerpc                      obs600_defconfig
powerpc                      makalu_defconfig
arm                         shannon_defconfig
arm                        clps711x_defconfig
powerpc                      ppc64e_defconfig
sh                              ul2_defconfig
arm                         orion5x_defconfig
xtensa                           allyesconfig
arm                       imx_v4_v5_defconfig
powerpc                      ep88xc_defconfig
powerpc                     rainier_defconfig
sh                            shmin_defconfig
h8300                               defconfig
powerpc               mpc834x_itxgp_defconfig
mips                        maltaup_defconfig
mips                           ip22_defconfig
sh                           se7721_defconfig
m68k                       m5208evb_defconfig
x86_64                           alldefconfig
powerpc                     pseries_defconfig
mips                            gpr_defconfig
mips                      maltaaprp_defconfig
powerpc                      ppc6xx_defconfig
powerpc                 mpc837x_mds_defconfig
ia64                             alldefconfig
sh                        edosk7705_defconfig
sh                           se7750_defconfig
powerpc                    socrates_defconfig
riscv                             allnoconfig
powerpc                     ksi8560_defconfig
powerpc                 mpc837x_rdb_defconfig
sh                          landisk_defconfig
powerpc                 mpc832x_rdb_defconfig
powerpc                       ebony_defconfig
arm                           h5000_defconfig
powerpc                      ppc40x_defconfig
h8300                            allyesconfig
h8300                       h8s-sim_defconfig
arm                       aspeed_g4_defconfig
sh                  sh7785lcr_32bit_defconfig
mips                       lemote2f_defconfig
mips                          rm200_defconfig
arm                           stm32_defconfig
powerpc                       ppc64_defconfig
xtensa                  audio_kc705_defconfig
mips                      loongson3_defconfig
mips                          ath79_defconfig
arc                     haps_hs_smp_defconfig
sh                           se7712_defconfig
powerpc                   microwatt_defconfig
sh                          r7780mp_defconfig
arm                       cns3420vb_defconfig
mips                          ath25_defconfig
m68k                             allyesconfig
powerpc                    amigaone_defconfig
arm                         s5pv210_defconfig
arm                           viper_defconfig
sh                          lboxre2_defconfig
sh                          urquell_defconfig
microblaze                      mmu_defconfig
arm                         nhk8815_defconfig
sh                         ap325rxa_defconfig
arm                          gemini_defconfig
sh                               alldefconfig
sh                         apsh4a3a_defconfig
powerpc                      cm5200_defconfig
arm                            pleb_defconfig
powerpc                        fsp2_defconfig
arm                       multi_v4t_defconfig
ia64                             allmodconfig
ia64                             allyesconfig
m68k                             allmodconfig
m68k                                defconfig
nios2                               defconfig
arc                              allyesconfig
nds32                             allnoconfig
nios2                            allyesconfig
csky                                defconfig
alpha                               defconfig
alpha                            allyesconfig
arc                                 defconfig
sh                               allmodconfig
parisc                              defconfig
s390                             allyesconfig
s390                             allmodconfig
parisc                           allyesconfig
s390                                defconfig
i386                             allyesconfig
sparc                            allyesconfig
sparc                               defconfig
i386                                defconfig
mips                             allyesconfig
mips                             allmodconfig
powerpc                          allyesconfig
powerpc                          allmodconfig
powerpc                           allnoconfig
x86_64               randconfig-a006-20210812
x86_64               randconfig-a004-20210812
x86_64               randconfig-a003-20210812
x86_64               randconfig-a005-20210812
x86_64               randconfig-a002-20210812
x86_64               randconfig-a001-20210812
x86_64               randconfig-a004-20210814
x86_64               randconfig-a006-20210814
x86_64               randconfig-a003-20210814
x86_64               randconfig-a001-20210814
x86_64               randconfig-a005-20210814
x86_64               randconfig-a002-20210814
i386                 randconfig-a004-20210814
i386                 randconfig-a002-20210814
i386                 randconfig-a001-20210814
i386                 randconfig-a003-20210814
i386                 randconfig-a006-20210814
i386                 randconfig-a005-20210814
i386                 randconfig-a004-20210812
i386                 randconfig-a003-20210812
i386                 randconfig-a002-20210812
i386                 randconfig-a001-20210812
i386                 randconfig-a006-20210812
i386                 randconfig-a005-20210812
i386                 randconfig-a004-20210813
i386                 randconfig-a003-20210813
i386                 randconfig-a001-20210813
i386                 randconfig-a002-20210813
i386                 randconfig-a006-20210813
i386                 randconfig-a005-20210813
x86_64               randconfig-a011-20210813
x86_64               randconfig-a013-20210813
x86_64               randconfig-a012-20210813
x86_64               randconfig-a016-20210813
x86_64               randconfig-a015-20210813
x86_64               randconfig-a014-20210813
i386                 randconfig-a011-20210814
i386                 randconfig-a015-20210814
i386                 randconfig-a013-20210814
i386                 randconfig-a014-20210814
i386                 randconfig-a016-20210814
i386                 randconfig-a012-20210814
i386                 randconfig-a011-20210812
i386                 randconfig-a015-20210812
i386                 randconfig-a013-20210812
i386                 randconfig-a014-20210812
i386                 randconfig-a016-20210812
i386                 randconfig-a012-20210812
i386                 randconfig-a011-20210813
i386                 randconfig-a015-20210813
i386                 randconfig-a014-20210813
i386                 randconfig-a013-20210813
i386                 randconfig-a016-20210813
i386                 randconfig-a012-20210813
riscv                    nommu_k210_defconfig
riscv                            allyesconfig
riscv                    nommu_virt_defconfig
riscv                               defconfig
riscv                          rv32_defconfig
riscv                            allmodconfig
x86_64                    rhel-8.3-kselftests
um                           x86_64_defconfig
um                             i386_defconfig
x86_64                           allyesconfig
x86_64                              defconfig
x86_64                               rhel-8.3
x86_64                                  kexec

clang tested configs:
x86_64               randconfig-c001-20210812
x86_64               randconfig-c001-20210813
x86_64               randconfig-c001-20210814
x86_64               randconfig-a006-20210813
x86_64               randconfig-a004-20210813
x86_64               randconfig-a003-20210813
x86_64               randconfig-a002-20210813
x86_64               randconfig-a005-20210813
x86_64               randconfig-a001-20210813
x86_64               randconfig-a011-20210812
x86_64               randconfig-a013-20210812
x86_64               randconfig-a012-20210812
x86_64               randconfig-a016-20210812
x86_64               randconfig-a015-20210812
x86_64               randconfig-a014-20210812
x86_64               randconfig-a013-20210814
x86_64               randconfig-a011-20210814
x86_64               randconfig-a016-20210814
x86_64               randconfig-a012-20210814
x86_64               randconfig-a014-20210814
x86_64               randconfig-a015-20210814

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

^ permalink raw reply

* Re: [PATCH v3 4/8] PCI: replace pci_dev::driver usage that gets the driver name
From: Christoph Hellwig @ 2021-08-14  8:38 UTC (permalink / raw)
  To: Uwe Kleine-K??nig
  Cc: Alexander Duyck, oss-drivers, Paul Mackerras, Herbert Xu,
	Ido Schimmel, Rafa?? Mi??ecki, Jesse Brandeburg,
	Christoph Hellwig, Bjorn Helgaas, linux-pci, Jakub Kicinski,
	Yisen Zhuang, Vadym Kochan, Michael Buesch, Jiri Pirko,
	Salil Mehta, Greg Kroah-Hartman, linux-wireless, linux-kernel,
	Taras Chornyi, Zhou Wang, linux-crypto, kernel, netdev,
	Simon Horman, Oliver O'Halloran, linuxppc-dev,
	David S. Miller
In-Reply-To: <20210812081425.7pjy4a25e2ehkr3x@pengutronix.de>

On Thu, Aug 12, 2021 at 10:14:25AM +0200, Uwe Kleine-K??nig wrote:
> dev_driver_string() might return "" (via dev_bus_name()). If that happens
> *drvstr == '\0' becomes true.
> 
> Would the following be better?:
> 
> 	const char *drvstr;
> 
> 	if (pdev)
> 		return "<null>";
> 
> 	drvstr = dev_driver_string(&pdev->dev);
> 
> 	if (!strcmp(drvstr, ""))
> 		return "<null>";
> 
> 	return drvstr;
> 
> When I thought about this hunk I considered it ugly to have "<null>" in
> it twice.

Well, if you want to avoid that you can do:

	if (pdev) {
		const char *name = dev_driver_string(&pdev->dev);

		if (strcmp(drvstr, ""))
			return name;
	}
	return "<null>";

Which would be a lot more readable.

^ permalink raw reply

* Re: [PATCH v1 17/55] KVM: PPC: Book3S HV P9: Implement PMU save/restore in C
From: Athira Rajeev @ 2021-08-14  7:12 UTC (permalink / raw)
  To: Nicholas Piggin; +Cc: linuxppc-dev, kvm-ppc
In-Reply-To: <1628827731.ai2zz7xxwa.astroid@bobo.none>



> On 13-Aug-2021, at 9:54 AM, Nicholas Piggin <npiggin@gmail.com> wrote:
> 
> Excerpts from Athira Rajeev's message of August 9, 2021 1:03 pm:
>> 
>> 
>>> On 26-Jul-2021, at 9:19 AM, Nicholas Piggin <npiggin@gmail.com> wrote:
> 
> 
>>> +static void freeze_pmu(unsigned long mmcr0, unsigned long mmcra)
>>> +{
>>> +	if (!(mmcr0 & MMCR0_FC))
>>> +		goto do_freeze;
>>> +	if (mmcra & MMCRA_SAMPLE_ENABLE)
>>> +		goto do_freeze;
>>> +	if (cpu_has_feature(CPU_FTR_ARCH_31)) {
>>> +		if (!(mmcr0 & MMCR0_PMCCEXT))
>>> +			goto do_freeze;
>>> +		if (!(mmcra & MMCRA_BHRB_DISABLE))
>>> +			goto do_freeze;
>>> +	}
>>> +	return;
>>> +
>>> +do_freeze:
>>> +	mmcr0 = MMCR0_FC;
>>> +	mmcra = 0;
>>> +	if (cpu_has_feature(CPU_FTR_ARCH_31)) {
>>> +		mmcr0 |= MMCR0_PMCCEXT;
>>> +		mmcra = MMCRA_BHRB_DISABLE;
>>> +	}
>>> +
>>> +	mtspr(SPRN_MMCR0, mmcr0);
>>> +	mtspr(SPRN_MMCRA, mmcra);
>>> +	isync();
>>> +}
>>> +
>> Hi Nick,
>> 
>> After feezing pmu, do we need to clear “pmcregs_in_use” as well?
> 
> Not until we save the values out of the registers. pmcregs_in_use = 0 
> means our hypervisor is free to clear our PMU register contents.
> 
>> Also can’t we unconditionally do the MMCR0/MMCRA/ freeze settings in here ? do we need the if conditions for FC/PMCCEXT/BHRB ?
> 
> I think it's possible, but pretty minimal advantage. I would prefer to 
> set them the way perf does for now.

Sure Nick, 

Other changes looks good to me.

Reviewed-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>

Thanks
Athira
> If we can move this code into perf/
> it should become easier for you to tweak things.
> 
> Thanks,
> Nick


^ permalink raw reply

* Re: [PATCH v15 7/9] powerpc: Set ARCH_HAS_STRICT_MODULE_RWX
From: Jordan Niethe @ 2021-08-14  1:23 UTC (permalink / raw)
  To: Fabiano Rosas
  Cc: Laurent Vivier, ajd, Aneesh Kumar K.V, muriloo, Greg Kurz,
	Nicholas Piggin, cmr, kvm-ppc, naveen.n.rao, David Gibson,
	linuxppc-dev, Daniel Axtens
In-Reply-To: <8735rdt0i4.fsf@linux.ibm.com>

On Sat, Aug 14, 2021 at 8:59 AM Fabiano Rosas <farosas@linux.ibm.com> wrote:
>
> Laurent Vivier <lvivier@redhat.com> writes:
>
> >
> > since this patch is merged my VM is experiencing a crash at boot (20% of the time):
> >
> > [    8.496850] kernel tried to execute exec-protected page (c008000004073278) - exploit
> > attempt? (uid: 0)
> > [    8.496921] BUG: Unable to handle kernel instruction fetch
> > [    8.496954] Faulting instruction address: 0xc008000004073278
> > [    8.496994] Oops: Kernel access of bad area, sig: 11 [#1]
> > [    8.497028] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
> > [    8.497071] Modules linked in: drm virtio_console fuse drm_panel_orientation_quirks xfs
> > libcrc32c virtio_net net_failover virtio_blk vmx_crypto failover dm_mirror dm_region_hash
> > dm_log dm_mod
> > [    8.497186] CPU: 3 PID: 44 Comm: kworker/3:1 Not tainted 5.14.0-rc4+ #12
> > [    8.497228] Workqueue: events control_work_handler [virtio_console]
> > [    8.497272] NIP:  c008000004073278 LR: c008000004073278 CTR: c0000000001e9de0
> > [    8.497320] REGS: c00000002e4ef7e0 TRAP: 0400   Not tainted  (5.14.0-rc4+)
> > [    8.497361] MSR:  800000004280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  CR: 24002822
> > XER: 200400cf
> > [    8.497426] CFAR: c0000000001e9e44 IRQMASK: 1
> > [    8.497426] GPR00: c008000004073278 c00000002e4efa80 c000000002a26b00 c000000042c39520
> > [    8.497426] GPR04: 0000000000000001 0000000000000000 0000000000000000 00000000000000ff
> > [    8.497426] GPR08: 0000000000000001 c000000042c39520 0000000000000001 c008000004076008
> > [    8.497426] GPR12: c0000000001e9de0 c0000001fffccb00 c00000000018ba88 c00000002c91d400
> > [    8.497426] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > [    8.497426] GPR20: 0000000000000000 0000000000000000 0000000000000000 c008000004080340
> > [    8.497426] GPR24: c0080000040a01e8 0000000000000000 0000000000000000 c00000002e0975c0
> > [    8.497426] GPR28: c00000002ce72940 c000000042c39520 0000000000000048 0000000000000038
> > [    8.497891] NIP [c008000004073278] fill_queue+0xf0/0x210 [virtio_console]
> > [    8.497934] LR [c008000004073278] fill_queue+0xf0/0x210 [virtio_console]
> > [    8.497976] Call Trace:
> > [    8.497993] [c00000002e4efa80] [c00800000407323c] fill_queue+0xb4/0x210
> > [virtio_console] (unreliable)
> > [    8.498052] [c00000002e4efae0] [c008000004073a90] add_port+0x1a8/0x470 [virtio_console]
> > [    8.498102] [c00000002e4efbb0] [c0080000040750f4] control_work_handler+0xbc/0x1e8
> > [virtio_console]
> > [    8.498160] [c00000002e4efc60] [c00000000017f4f0] process_one_work+0x290/0x590
> > [    8.498212] [c00000002e4efd00] [c00000000017f878] worker_thread+0x88/0x620
> > [    8.498256] [c00000002e4efda0] [c00000000018bc14] kthread+0x194/0x1a0
> > [    8.498299] [c00000002e4efe10] [c00000000000cf54] ret_from_kernel_thread+0x5c/0x64
> > [    8.498349] Instruction dump:
> > [    8.498374] 7da96b78 a14d0c8a 419c00b0 2f8a0000 419eff88 b32d0c8a 7c0004ac 4bffff7c
> > [    8.498430] 60000000 60000000 7fa3eb78 48002d95 <e8410018> 38600000 480025e1 e8410018
> > [    8.498485] ---[ end trace 16ee10903290b647 ]---
> > [    8.501433]
> > [    9.502601] Kernel panic - not syncing: Fatal exception
> >
> > add_port+0x1a8/0x470 :
> >
> >   1420
> >   1421                /* We can safely ignore ENOSPC because it means
> >   1422                 * the queue already has buffers. Buffers are removed
> >   1423                 * only by virtcons_remove(), not by unplug_port()
> >   1424                 */
> > ->1425                err = fill_queue(port->in_vq, &port->inbuf_lock);
> >   1426                if (err < 0 && err != -ENOSPC) {
> >   1427                        dev_err(port->dev, "Error allocating inbufs\n");
> >   1428                        goto free_device;
> >   1429                }
> >
> > fill_queue+0x90/0x210 :
> >
> >   1326        static int fill_queue(struct virtqueue *vq, spinlock_t *lock)
> >   1327        {
> >   1328                struct port_buffer *buf;
> >   1329                int nr_added_bufs;
> >   1330                int ret;
> >   1331
> >   1332                nr_added_bufs = 0;
> >   1333                do {
> >   1334                        buf = alloc_buf(vq->vdev, PAGE_SIZE, 0);
> >   1335                        if (!buf)
> >   1336                                return -ENOMEM;
> >   1337
> > ->1338                        spin_lock_irq(lock);
> >
> > I'm using an upstream kernel (5.14-rc4, 251a1524293d) in the VM.
> >
> > My host is a RHEL 8.5/POWER9: qemu-kvm-6.0.0-21 and kernel-4.18.0-325
> >
> > My qemu command line is:
> >
> > /usr/libexec/qemu-kvm \
> > -M pseries,accel=kvm \
> > -nographic -nodefaults \
> > -device virtio-serial-pci \
> > -device virtconsole \
> > -device virtio-net-pci,mac=9a:2b:2c:2d:2e:2f,netdev=hostnet0  \
> > -blockdev
> > node-name=disk1,file.driver=file,driver=qcow2,file.driver=file,file.filename=disk.qcow2 \
> > -netdev bridge,id=hostnet0,br=virbr0,helper=/usr/libexec/qemu-bridge-helper \
> > -device virtio-blk-pci,id=image1,drive=disk1 \
> > -m 8192  \
> > -smp 4 \
> > -serial mon:stdio
> >
> >
> > Do we need something in qemu/kvm to support STRICT_MODULE_RWX ?
> >
> > Thanks,
> > Laurent
>
> This patch survived 300 consecutive runs without the issue:
>
> diff --git a/arch/powerpc/mm/pageattr.c b/arch/powerpc/mm/pageattr.c
> index 0876216ceee6..7aeb2b62e5dc 100644
> --- a/arch/powerpc/mm/pageattr.c
> +++ b/arch/powerpc/mm/pageattr.c
> @@ -35,10 +35,7 @@ static int change_page_attr(pte_t *ptep, unsigned long addr, void *data)
>         pte_t pte;
>
>         spin_lock(&init_mm.page_table_lock);
> -
> -       /* invalidate the PTE so it's safe to modify */
> -       pte = ptep_get_and_clear(&init_mm, addr, ptep);
> -       flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
> +       pte = *ptep;
>
>         /* modify the PTE bits as desired, then apply */
>         switch (action) {
> @@ -60,10 +57,7 @@ static int change_page_attr(pte_t *ptep, unsigned long addr, void *data)
>         }
>
>         set_pte_at(&init_mm, addr, ptep, pte);
> -
> -       /* See ptesync comment in radix__set_pte_at() */
> -       if (radix_enabled())
> -               asm volatile("ptesync": : :"memory");
> +       flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
>         spin_unlock(&init_mm.page_table_lock);
>
>         return 0;
> ---
>
> What I think is happening is that the virtio_console code is running
> at the same time we are doing the `module_enable_ro` at the end of
> `do_init_module` due to the async nature of the work handler. There is
> a window after the TLB flush when the PTE has its permission bits
> cleared, so any translations of the module code page attempted during
> that window will fault.

Thanks for looking at this. I agree that this is the problem.
This avoids the crash (not a proper solution):

--- a/kernel/module.c
+++ b/kernel/module.c
@@ -2011,13 +2011,15 @@ static void module_enable_ro(const struct
module *mod, bool after_init)
        if (!rodata_enabled)
                return;

-       set_vm_flush_reset_perms(mod->core_layout.base);
-       set_vm_flush_reset_perms(mod->init_layout.base);
-       frob_text(&mod->core_layout, set_memory_ro);
-
-       frob_rodata(&mod->core_layout, set_memory_ro);
-       frob_text(&mod->init_layout, set_memory_ro);
-       frob_rodata(&mod->init_layout, set_memory_ro);
+       if (!after_init) {
+               set_vm_flush_reset_perms(mod->core_layout.base);
+               set_vm_flush_reset_perms(mod->init_layout.base);
+               frob_text(&mod->core_layout, set_memory_ro);
+
+               frob_rodata(&mod->core_layout, set_memory_ro);
+               frob_text(&mod->init_layout, set_memory_ro);
+               frob_rodata(&mod->init_layout, set_memory_ro);
+       }

>
> I'm ignorant of strict rwx in general so I don't see why we need to
> clear the bits before setting them to their final value, but as I
> understand it, the set_pte_at + flush_tlb_kernel_range satisfy the ISA
> requirement of [ptesync; tlbie; eieio; tlbsync; ptesync;] so it seems
> like the patch should work.
I believe it's done like that because ISA 6.10.1.2 Modifying a
Translation Table Entry -
"The sequence is equivalent to deleting the PTE and then adding a new one".
In that sequence, I think it is [set pte to 0; ptesync; tlbie; eieio;
tlbsync; ptesync;]

But it does seem like we need to do something like your patch for
change_page_attr() to work as expected.
>
> Now, I cannot explain why the crash always happens around the code
> that does the module's symbols relocation (the NIP in Laurent's trace
> is the TOC reload from module_64.c:restore_r2). Maybe because
> instructions are already in icache until the first branch into the
> stub?
Yeah, mpe thought it might be some cache effect of a branch instruction.
From time to time I did see Oopses like this (not "kernel tried to
execute exec-protected page")

[  124.986964][   T52] Oops: Kernel access of bad area, sig: 11 [#1]
[  124.987043][   T52] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
[  124.987095][   T52] Modules linked in: virtio_console binfmt_misc
virtiofs fuse virtio_net virtio_blk virtio_pci virtio_pci_modern_dev
virtio_ring virtio crc32c_vpmsum [last unloaded: virtio_console]
[  124.987209][   T52] CPU: 2 PID: 52 Comm: kworker/2:1 Not tainted
5.14.0-rc4 #4
[  124.987259][   T52] Workqueue: events control_work_handler [virtio_console]
[  124.987307][   T52] NIP:  c008000001a86044 LR: c008000001a82cf8
CTR: c00000000042a6c0
[  124.987358][   T52] REGS: c00000000b273770 TRAP: 0300   Not tainted
 (5.14.0-rc4)
[  124.987406][   T52] MSR:  800000000280b033
<SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  CR: 82002882  XER: 00000004
[  124.987475][   T52] CFAR: c008000001a82cf4 DAR: c008000001a86058
DSISR: 40000000 IRQMASK: 0
[  124.987475][   T52] GPR00: c008000001a82ce8 c00000000b273a10
c008000001ab8000 c000000010731320
[  124.987475][   T52] GPR04: 0000000000000001 0000000000000000
0000000000000000 0000000000000000
[  124.987475][   T52] GPR08: 0000000000000000 0000000000000000
0000000000000000 c008000001a86038
[  124.987475][   T52] GPR12: 0000000022002884 c0000001ffffdf00
c000000000169ae8 c0000000037700c0
[  124.987475][   T52] GPR16: 0000000000000000 0000000000000000
c000000021af0000 c000000003313900
[  124.987475][   T52] GPR20: 0000000000000001 0000000000000000
0000000000000000 c000000003316780
[  124.987475][   T52] GPR24: c008000001a90288 0000000000000000
0000000000000000 c000000010731320
[  124.987475][   T52] GPR28: c000000003316900 0000000000000018
0000000000000068 c00000000379c5a0
[  124.988028][   T52] NIP [c008000001a86044] fini+0x824/0xa7e0 [virtio_console]
[  124.988085][   T52] LR [c008000001a82cf8] fill_queue+0xb0/0x230
[virtio_console]
[  124.988141][   T52] Call Trace:
[  124.988171][   T52] [c00000000b273a10] [c008000001a82ce8]
fill_queue+0xa0/0x230 [virtio_console] (unreliable)
[  124.988246][   T52] [c00000000b273aa0] [c008000001a83208]
add_port.isra.0+0x1a0/0x4b0 [virtio_console]
[  124.988312][   T52] [c00000000b273b70] [c008000001a85474]
control_work_handler+0x46c/0x654 [virtio_console]
[  124.988403][   T52] [c00000000b273c70] [c00000000015ce60]
process_one_work+0x2a0/0x570
[  124.988586][   T52] [c00000000b273d10] [c00000000015d1d8]
worker_thread+0xa8/0x660
[  124.988684][   T52] [c00000000b273da0] [c000000000169c5c] kthread+0x17c/0x190
[  124.988762][   T52] [c00000000b273e10] [c00000000000cf54]
ret_from_kernel_thread+0x5c/0x64
[  124.988916][   T52] Instruction dump:
[  124.988965][   T52] 396be010 f8410018 e98b0020 7d8903a6 4e800420
00000000 73747562 003a0300
[  124.989152][   T52] c0000000 3d62fffd 396be038 f8410018 <e98b0020>
7d8903a6 4e800420 00000000
[  124.989298][   T52] ---[ end trace ab9046c024eb3154 ]---

I guess it depends on the timing of which pte is invalidated.
>
> Anyway, this is what Murilo and I found out over our debugging session
> in the past couple of days. I hope it helps. =)
Yes it does, thanks heaps.
>

^ permalink raw reply

* Re: [PATCH v15 7/9] powerpc: Set ARCH_HAS_STRICT_MODULE_RWX
From: Fabiano Rosas @ 2021-08-13 22:58 UTC (permalink / raw)
  To: Laurent Vivier, Jordan Niethe, linuxppc-dev
  Cc: ajd, aneesh.kumar, muriloo, Greg Kurz, npiggin, cmr, kvm-ppc,
	naveen.n.rao, David Gibson, dja
In-Reply-To: <f7624d58-80e1-6912-1867-7874f1a569f5@redhat.com>

Laurent Vivier <lvivier@redhat.com> writes:

>
> since this patch is merged my VM is experiencing a crash at boot (20% of the time):
>
> [    8.496850] kernel tried to execute exec-protected page (c008000004073278) - exploit
> attempt? (uid: 0)
> [    8.496921] BUG: Unable to handle kernel instruction fetch
> [    8.496954] Faulting instruction address: 0xc008000004073278
> [    8.496994] Oops: Kernel access of bad area, sig: 11 [#1]
> [    8.497028] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
> [    8.497071] Modules linked in: drm virtio_console fuse drm_panel_orientation_quirks xfs
> libcrc32c virtio_net net_failover virtio_blk vmx_crypto failover dm_mirror dm_region_hash
> dm_log dm_mod
> [    8.497186] CPU: 3 PID: 44 Comm: kworker/3:1 Not tainted 5.14.0-rc4+ #12
> [    8.497228] Workqueue: events control_work_handler [virtio_console]
> [    8.497272] NIP:  c008000004073278 LR: c008000004073278 CTR: c0000000001e9de0
> [    8.497320] REGS: c00000002e4ef7e0 TRAP: 0400   Not tainted  (5.14.0-rc4+)
> [    8.497361] MSR:  800000004280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  CR: 24002822
> XER: 200400cf
> [    8.497426] CFAR: c0000000001e9e44 IRQMASK: 1
> [    8.497426] GPR00: c008000004073278 c00000002e4efa80 c000000002a26b00 c000000042c39520
> [    8.497426] GPR04: 0000000000000001 0000000000000000 0000000000000000 00000000000000ff
> [    8.497426] GPR08: 0000000000000001 c000000042c39520 0000000000000001 c008000004076008
> [    8.497426] GPR12: c0000000001e9de0 c0000001fffccb00 c00000000018ba88 c00000002c91d400
> [    8.497426] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> [    8.497426] GPR20: 0000000000000000 0000000000000000 0000000000000000 c008000004080340
> [    8.497426] GPR24: c0080000040a01e8 0000000000000000 0000000000000000 c00000002e0975c0
> [    8.497426] GPR28: c00000002ce72940 c000000042c39520 0000000000000048 0000000000000038
> [    8.497891] NIP [c008000004073278] fill_queue+0xf0/0x210 [virtio_console]
> [    8.497934] LR [c008000004073278] fill_queue+0xf0/0x210 [virtio_console]
> [    8.497976] Call Trace:
> [    8.497993] [c00000002e4efa80] [c00800000407323c] fill_queue+0xb4/0x210
> [virtio_console] (unreliable)
> [    8.498052] [c00000002e4efae0] [c008000004073a90] add_port+0x1a8/0x470 [virtio_console]
> [    8.498102] [c00000002e4efbb0] [c0080000040750f4] control_work_handler+0xbc/0x1e8
> [virtio_console]
> [    8.498160] [c00000002e4efc60] [c00000000017f4f0] process_one_work+0x290/0x590
> [    8.498212] [c00000002e4efd00] [c00000000017f878] worker_thread+0x88/0x620
> [    8.498256] [c00000002e4efda0] [c00000000018bc14] kthread+0x194/0x1a0
> [    8.498299] [c00000002e4efe10] [c00000000000cf54] ret_from_kernel_thread+0x5c/0x64
> [    8.498349] Instruction dump:
> [    8.498374] 7da96b78 a14d0c8a 419c00b0 2f8a0000 419eff88 b32d0c8a 7c0004ac 4bffff7c
> [    8.498430] 60000000 60000000 7fa3eb78 48002d95 <e8410018> 38600000 480025e1 e8410018
> [    8.498485] ---[ end trace 16ee10903290b647 ]---
> [    8.501433]
> [    9.502601] Kernel panic - not syncing: Fatal exception
>
> add_port+0x1a8/0x470 :
>
>   1420	
>   1421		/* We can safely ignore ENOSPC because it means
>   1422		 * the queue already has buffers. Buffers are removed
>   1423		 * only by virtcons_remove(), not by unplug_port()
>   1424		 */
> ->1425		err = fill_queue(port->in_vq, &port->inbuf_lock);
>   1426		if (err < 0 && err != -ENOSPC) {
>   1427			dev_err(port->dev, "Error allocating inbufs\n");
>   1428			goto free_device;
>   1429		}
>
> fill_queue+0x90/0x210 :
>
>   1326	static int fill_queue(struct virtqueue *vq, spinlock_t *lock)
>   1327	{
>   1328		struct port_buffer *buf;
>   1329		int nr_added_bufs;
>   1330		int ret;
>   1331	
>   1332		nr_added_bufs = 0;
>   1333		do {
>   1334			buf = alloc_buf(vq->vdev, PAGE_SIZE, 0);
>   1335			if (!buf)
>   1336				return -ENOMEM;
>   1337	
> ->1338			spin_lock_irq(lock);
>
> I'm using an upstream kernel (5.14-rc4, 251a1524293d) in the VM.
>
> My host is a RHEL 8.5/POWER9: qemu-kvm-6.0.0-21 and kernel-4.18.0-325
>
> My qemu command line is:
>
> /usr/libexec/qemu-kvm \
> -M pseries,accel=kvm \
> -nographic -nodefaults \
> -device virtio-serial-pci \
> -device virtconsole \
> -device virtio-net-pci,mac=9a:2b:2c:2d:2e:2f,netdev=hostnet0  \
> -blockdev
> node-name=disk1,file.driver=file,driver=qcow2,file.driver=file,file.filename=disk.qcow2 \
> -netdev bridge,id=hostnet0,br=virbr0,helper=/usr/libexec/qemu-bridge-helper \
> -device virtio-blk-pci,id=image1,drive=disk1 \
> -m 8192  \
> -smp 4 \
> -serial mon:stdio
>
>
> Do we need something in qemu/kvm to support STRICT_MODULE_RWX ?
>
> Thanks,
> Laurent

This patch survived 300 consecutive runs without the issue:

diff --git a/arch/powerpc/mm/pageattr.c b/arch/powerpc/mm/pageattr.c
index 0876216ceee6..7aeb2b62e5dc 100644
--- a/arch/powerpc/mm/pageattr.c
+++ b/arch/powerpc/mm/pageattr.c
@@ -35,10 +35,7 @@ static int change_page_attr(pte_t *ptep, unsigned long addr, void *data)
        pte_t pte;
 
        spin_lock(&init_mm.page_table_lock);
-
-       /* invalidate the PTE so it's safe to modify */
-       pte = ptep_get_and_clear(&init_mm, addr, ptep);
-       flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
+       pte = *ptep;
 
        /* modify the PTE bits as desired, then apply */
        switch (action) {
@@ -60,10 +57,7 @@ static int change_page_attr(pte_t *ptep, unsigned long addr, void *data)
        }
 
        set_pte_at(&init_mm, addr, ptep, pte);
-
-       /* See ptesync comment in radix__set_pte_at() */
-       if (radix_enabled())
-               asm volatile("ptesync": : :"memory");
+       flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
        spin_unlock(&init_mm.page_table_lock);
 
        return 0;
---

What I think is happening is that the virtio_console code is running
at the same time we are doing the `module_enable_ro` at the end of
`do_init_module` due to the async nature of the work handler. There is
a window after the TLB flush when the PTE has its permission bits
cleared, so any translations of the module code page attempted during
that window will fault.

I'm ignorant of strict rwx in general so I don't see why we need to
clear the bits before setting them to their final value, but as I
understand it, the set_pte_at + flush_tlb_kernel_range satisfy the ISA
requirement of [ptesync; tlbie; eieio; tlbsync; ptesync;] so it seems
like the patch should work.

Now, I cannot explain why the crash always happens around the code
that does the module's symbols relocation (the NIP in Laurent's trace
is the TOC reload from module_64.c:restore_r2). Maybe because
instructions are already in icache until the first branch into the
stub?

Anyway, this is what Murilo and I found out over our debugging session
in the past couple of days. I hope it helps. =)


^ permalink raw reply related

* Re: [PATCH] soc: fsl: qe: fix static checker warning
From: Li Yang @ 2021-08-13 21:39 UTC (permalink / raw)
  To: Maxim Kochetkov
  Cc: Saravana Kannan, lkml,
	moderated list:ARM/FREESCALE IMX / MXC ARM ARCHITECTURE,
	linuxppc-dev, Dan Carpenter, Qiang Zhao
In-Reply-To: <20210811071036.44658-1-fido_max@inbox.ru>

On Wed, Aug 11, 2021 at 2:10 AM Maxim Kochetkov <fido_max@inbox.ru> wrote:
>
> The patch be7ecbd240b2: "soc: fsl: qe: convert QE interrupt
> controller to platform_device" from Aug 3, 2021, leads to the
> following static checker warning:
>
>         drivers/soc/fsl/qe/qe_ic.c:438 qe_ic_init()
>         warn: unsigned 'qe_ic->virq_low' is never less than zero.
>
> In old variant irq_of_parse_and_map() returns zero if failed so
> unsigned int for virq_high/virq_low was ok.
> In new variant platform_get_irq() returns negative error codes
> if failed so we need to use int for virq_high/virq_low.
>
> Also simplify high_handler checking and remove the curly braces
> to make checkpatch happy.
>
> Fixes: be7ecbd240b2 ("soc: fsl: qe: convert QE interrupt controller to platform_device")
> Signed-off-by: Maxim Kochetkov <fido_max@inbox.ru>
> Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
> ---
>  drivers/soc/fsl/qe/qe_ic.c | 9 ++++-----
>  1 file changed, 4 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/soc/fsl/qe/qe_ic.c b/drivers/soc/fsl/qe/qe_ic.c
> index e710d554425d..bff34ee2150a 100644
> --- a/drivers/soc/fsl/qe/qe_ic.c
> +++ b/drivers/soc/fsl/qe/qe_ic.c
> @@ -54,8 +54,8 @@ struct qe_ic {
>         struct irq_chip hc_irq;
>
>         /* VIRQ numbers of QE high/low irqs */
> -       unsigned int virq_high;
> -       unsigned int virq_low;
> +       int virq_high;
> +       int virq_low;
>  };
>
>  /*
> @@ -435,9 +435,8 @@ static int qe_ic_init(struct platform_device *pdev)
>         qe_ic->virq_high = platform_get_irq(pdev, 0);
>         qe_ic->virq_low = platform_get_irq(pdev, 1);
>
> -       if (qe_ic->virq_low < 0) {
> +       if (qe_ic->virq_low < 0)

Probably should be <= 0 here.

>                 return -ENODEV;
> -       }
>
>         if (qe_ic->virq_high != qe_ic->virq_low) {

Probably we should check if qe_ic->virq_high > 0 here if we rely on
this to decide whether to set the handler later.

Applied with the above changes.  Thanks

>                 low_handler = qe_ic_cascade_low;
> @@ -459,7 +458,7 @@ static int qe_ic_init(struct platform_device *pdev)
>         irq_set_handler_data(qe_ic->virq_low, qe_ic);
>         irq_set_chained_handler(qe_ic->virq_low, low_handler);
>
> -       if (qe_ic->virq_high && qe_ic->virq_high != qe_ic->virq_low) {
> +       if (high_handler) {
>                 irq_set_handler_data(qe_ic->virq_high, qe_ic);
>                 irq_set_chained_handler(qe_ic->virq_high, high_handler);
>         }
> --
> 2.31.1
>

^ permalink raw reply

* Re: [PATCH 07/11] treewide: Replace the use of mem_encrypt_active() with prot_guest_has()
From: Tom Lendacky @ 2021-08-13 20:17 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Kuppuswamy, Sathyanarayanan, linux-efi, Brijesh Singh, kvm,
	Peter Zijlstra, Dave Hansen, dri-devel, platform-driver-x86,
	Will Deacon, linux-s390, Andi Kleen, Baoquan He, Joerg Roedel,
	x86, amd-gfx, David Airlie, Ingo Molnar,
	linux-graphics-maintainer, Dave Young, Tianyu Lan,
	Thomas Zimmermann, Maarten Lankhorst, Maxime Ripard,
	Borislav Petkov, Andy Lutomirski, Thomas Gleixner, kexec,
	linux-kernel, iommu, Daniel Vetter, linux-fsdevel, linuxppc-dev
In-Reply-To: <943223d5-5949-6aba-8a49-0b07078d68e1@amd.com>

On 8/13/21 12:08 PM, Tom Lendacky wrote:
> On 8/12/21 5:07 AM, Kirill A. Shutemov wrote:
>> On Wed, Aug 11, 2021 at 10:52:55AM -0500, Tom Lendacky wrote:
>>> On 8/11/21 7:19 AM, Kirill A. Shutemov wrote:
>>>> On Tue, Aug 10, 2021 at 02:48:54PM -0500, Tom Lendacky wrote:
>>>>> On 8/10/21 1:45 PM, Kuppuswamy, Sathyanarayanan wrote:
> ...
>>>> Looking at code agains, now I *think* the reason is accessing a global
>>>> variable from __startup_64() inside TDX version of prot_guest_has().
>>>>
>>>> __startup_64() is special. If you access any global variable you need to
>>>> use fixup_pointer(). See comment before __startup_64().
>>>>
>>>> I'm not sure how you get away with accessing sme_me_mask directly from
>>>> there. Any clues? Maybe just a luck and complier generates code just 
>>>> right
>>>> for your case, I donno.
>>>
>>> Hmm... yeah, could be that the compiler is using rip-relative addressing
>>> for it because it lives in the .data section?
>>
>> I guess. It has to be fixed. It may break with complier upgrade or any
>> random change around the code.
> 
> I'll look at doing that separate from this series.
> 
>>
>> BTW, does it work with clang for you?
> 
> I haven't tried with clang, I'll check on that.

Just as an fyi, clang also uses rip relative addressing for those 
variables. No issues booting SME and SEV guests built with clang.

Thanks,
Tom

> 
> Thanks,
> Tom
> 
>>

^ permalink raw reply

* Re: [PATCH] ppc: add "-z notext" flag to disable diagnostic
From: Fangrui Song @ 2021-08-13 20:05 UTC (permalink / raw)
  To: Daniel Axtens
  Cc: Nick Desaulniers, linux-kernel, Nathan Chancellor,
	clang-built-linux, Paul Mackerras, Bill Wendling, linuxppc-dev
In-Reply-To: <87sfzde8lk.fsf@linkitivity.dja.id.au>

On 2021-08-14, Daniel Axtens wrote:
>Bill Wendling <morbo@google.com> writes:
>
>> The "-z notext" flag disables reporting an error if DT_TEXTREL is set on
>> PPC with CONFIG=kdump:
>>
>>   ld.lld: error: can't create dynamic relocation R_PPC64_ADDR64 against
>>     local symbol in readonly segment; recompile object files with -fPIC
>>     or pass '-Wl,-z,notext' to allow text relocations in the output
>>   >>> defined in built-in.a(arch/powerpc/kernel/misc.o)
>>   >>> referenced by arch/powerpc/kernel/misc.o:(.text+0x20) in archive
>>       built-in.a
>>
>> The BFD linker disables this by default (though it's configurable in
>> current versions). LLD enables this by default. So we add the flag to
>> keep LLD from emitting the error.
>
>You didn't provide a huge amount of context but I was able to reproduce
>a similar set of errors with pseries_le_defconfig and
>
>make ARCH=powerpc CROSS_COMPILE=powerpc64-linux-gnu- CC="ccache clang-11" LD=ld.lld-11 AR=llvm-ar-11 -j4 vmlinux
>
>I also checked the manpage, and indeed the system ld does not issue this
>warning/error by default.
>
>> ---
>>  arch/powerpc/Makefile | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
>> index 6505d66f1193..17a9fbf9b789 100644
>> --- a/arch/powerpc/Makefile
>> +++ b/arch/powerpc/Makefile
>> @@ -122,6 +122,7 @@ endif
>>
>>  LDFLAGS_vmlinux-y := -Bstatic
>>  LDFLAGS_vmlinux-$(CONFIG_RELOCATABLE) := -pie
>> +LDFLAGS_vmlinux-$(CONFIG_RELOCATABLE) += -z notext
>
>Is there any reason this should be gated on CONFIG_RELOCATABLE? (I tried
>without it and got different but possibly related linker errors...)
>
>Also, is this behaviour new?

This is a longstanding behavior.

https://maskray.me/blog/2020-12-19-lld-and-gnu-linker-incompatibilities
See "Text relocations"

.o files used to link .tmp_vmlinux.kallsyms1 have many R_PPC64_ADDR64
relocations in non-SHF_WRITE sections. There are many text relocations (e.g. in
.rela___ksymtab_gpl+* and .rela__mcount_loc sections) in a -pie link and are
disallowed by LLD:

   ld.lld: error: can't create dynamic relocation R_PPC64_ADDR64 against local symbol in readonly segment; recompile object files with -fPIC or pass '-Wl,-z,notext' to allow text relocations in the output
   >>> defined in arch/powerpc/kernel/head_64.o
   >>> referenced by arch/powerpc/kernel/head_64.o:(__restart_table+0x10)

Newer GNU ld configured with --enable-textrel-check=error will report an error
as well:

   ld/ld-new: read-only segment has dynamic relocations



Text relocations are considered very awful by linker developers.
binutils 2.35 added --enable-textrel-check={no,warn,error}
https://sourceware.org/bugzilla/show_bug.cgi?id=20824

I can imagine that in the future some Linux distributions (especially those
focusing on security) will default their binutils to use
--enable-textrel-check={no,warn,error}.  CONFIG_RELOCATABLE build will break
sooner or later.


In -no-pie links, R_PPC64_ADDR64 relocations are link-time constants.
There are no text relocations, therefore no need for -z notext.

>Kind regards,
>Daniel
>
>>  LDFLAGS_vmlinux	:= $(LDFLAGS_vmlinux-y)
>>
>>  ifdef CONFIG_PPC64
>> --
>> 2.33.0.rc1.237.g0d66db33f3-goog
>
>-- 
>You received this message because you are subscribed to the Google Groups "Clang Built Linux" group.
>To unsubscribe from this group and stop receiving emails from it, send an email to clang-built-linux+unsubscribe@googlegroups.com.
>To view this discussion on the web visit https://groups.google.com/d/msgid/clang-built-linux/87sfzde8lk.fsf%40linkitivity.dja.id.au.

^ permalink raw reply

* [PATCH v2] ppc: add "-z notext" flag to disable diagnostic
From: Bill Wendling @ 2021-08-13 20:05 UTC (permalink / raw)
  To: Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	Nathan Chancellor, Nick Desaulniers, linuxppc-dev, linux-kernel,
	clang-built-linux, Daniel Axtens, Fangrui Song
  Cc: Itaru Kitayama, Bill Wendling
In-Reply-To: <20210812204951.1551782-1-morbo@google.com>

From: Fangrui Song <maskray@google.com>

Object files used to link .tmp_vmlinux.kallsyms1 have many R_PPC64_ADDR64
relocations in non-SHF_WRITE sections. There are many text relocations (e.g. in
.rela___ksymtab_gpl+* and .rela__mcount_loc sections) in a -pie link and are
disallowed by LLD:

  ld.lld: error: can't create dynamic relocation R_PPC64_ADDR64 against local symbol in readonly segment; recompile object files with -fPIC or pass '-Wl,-z,notext' to allow text relocations in the output
  >>> defined in arch/powerpc/kernel/head_64.o
  >>> referenced by arch/powerpc/kernel/head_64.o:(__restart_table+0x10)

Newer GNU ld configured with "--enable-textrel-check=error" will report an
error as well:

  $ ld-new -EL -m elf64lppc -pie ... -o .tmp_vmlinux.kallsyms1 ...
  ld-new: read-only segment has dynamic relocations

Add "-z notext" to suppress the errors. Non-CONFIG_RELOCATABLE builds use the
default -no-pie mode and thus R_PPC64_ADDR64 relocations can be resolved at
link-time.

Link: https://github.com/ClangBuiltLinux/linux/issues/811
Signed-off-by: Fangrui Song <maskray@google.com>
Co-developed-by: Bill Wendling <morbo@google.com>
Signed-off-by: Bill Wendling <morbo@google.com>
Reported-by: Itaru Kitayama <itaru.kitayama@riken.jp>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
---
v2:
  - Assign "Fangrui Song" as the proper author.
  - Improve the commit message to add more context.
  - Appending tags from original patch's review.
---
 arch/powerpc/Makefile | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
index 6505d66f1193..17a9fbf9b789 100644
--- a/arch/powerpc/Makefile
+++ b/arch/powerpc/Makefile
@@ -122,6 +122,7 @@ endif
 
 LDFLAGS_vmlinux-y := -Bstatic
 LDFLAGS_vmlinux-$(CONFIG_RELOCATABLE) := -pie
+LDFLAGS_vmlinux-$(CONFIG_RELOCATABLE) += -z notext
 LDFLAGS_vmlinux	:= $(LDFLAGS_vmlinux-y)
 
 ifdef CONFIG_PPC64
-- 
2.33.0.rc1.237.g0d66db33f3-goog


^ permalink raw reply related

* Re: [PATCH] ppc: add "-z notext" flag to disable diagnostic
From: Nick Desaulniers @ 2021-08-13 18:59 UTC (permalink / raw)
  To: Bill Wendling
  Cc: Fangrui Song, LKML, Nathan Chancellor, clang-built-linux,
	Paul Mackerras, linuxppc-dev, Daniel Axtens
In-Reply-To: <CAGG=3QUz2LNgC8Hn6rU68ejjv4=J9Uidef0oH9A7=sKTs+vf7g@mail.gmail.com>

On Fri, Aug 13, 2021 at 11:24 AM Bill Wendling <morbo@google.com> wrote:
>
> BTW, this patch should more properly be attributed to Fangrui Song. I
> can send a follow-up patch that reflects this and adds more context to
> the commit message.

Sounds good to me. The TL;DR is that LLD has a different implicit
default for `-z text` vs `-z notext` than BFD.  We can emulate the
behavior or BFD by simply being explicit about `-z notext` always.

Or we can dig through why there are relocations in read only sections,
fix those, then enable `-z text` for all linkers.  My recommendation
would be get the thing building, then go digging time permitting.
-- 
Thanks,
~Nick Desaulniers

^ permalink raw reply

* Re: [PATCH] ppc: add "-z notext" flag to disable diagnostic
From: Bill Wendling @ 2021-08-13 18:24 UTC (permalink / raw)
  To: Daniel Axtens, Fangrui Song
  Cc: Nick Desaulniers, LKML, Nathan Chancellor, clang-built-linux,
	Paul Mackerras, linuxppc-dev
In-Reply-To: <87sfzde8lk.fsf@linkitivity.dja.id.au>

On Fri, Aug 13, 2021 at 7:13 AM Daniel Axtens <dja@axtens.net> wrote:
>
> Bill Wendling <morbo@google.com> writes:
>
> > The "-z notext" flag disables reporting an error if DT_TEXTREL is set on
> > PPC with CONFIG=kdump:
> >
> >   ld.lld: error: can't create dynamic relocation R_PPC64_ADDR64 against
> >     local symbol in readonly segment; recompile object files with -fPIC
> >     or pass '-Wl,-z,notext' to allow text relocations in the output
> >   >>> defined in built-in.a(arch/powerpc/kernel/misc.o)
> >   >>> referenced by arch/powerpc/kernel/misc.o:(.text+0x20) in archive
> >       built-in.a
> >
> > The BFD linker disables this by default (though it's configurable in
> > current versions). LLD enables this by default. So we add the flag to
> > keep LLD from emitting the error.
>
> You didn't provide a huge amount of context but I was able to reproduce
> a similar set of errors with pseries_le_defconfig and
>
My apologies for the lack of context. We discovered this issue when
moving to use LLD instead of BFD for our linker. This change suggested
by Fangrui Song. Here's his comments from
https://github.com/ClangBuiltLinux/linux/issues/811. (Nick Desaulniers
added a "Link" tag with his review.)

```
The text relocations are real. lld can link .tmp_vmlinux1 smoothly if
you pass -z notext. Though, it will still be insightful to investigate
where these text relocations come from. I believe there are only 2
categories.

(a) For a CONFIG_HAVE_ARCH_PREL32_RELOCATIONS=y build (x86 and arm64
select the option by default), ___ksymtab+* sections (non-SHF_WRITE)
contains entries relocated by PC relative relocations. These entries
do not need dynamic relocations.

out/powerpc64le/.config generated by ppc64le_defconfig does not enable
CONFIG_HAVE_ARCH_PREL32_RELOCATIONS=y.

% readelf -r out/x86_64/entry/entry_64.o
...
Relocation section '.rela___ksymtab+native_load_gs_index' at offset
0x6460 contains 2 entries:
    Offset             Info             Type               Symbol's
Value  Symbol's Name + Addend
0000000000000000  0000007a00000002 R_X86_64_PC32
0000000000000ea0 native_load_gs_index + 0
0000000000000004  0000001d00000002 R_X86_64_PC32
0000000000000000 __ksymtab_strings + 0

include/asm-generic/export.h

ld.lld: error: can't create dynamic relocation R_PPC64_ADDR64 against
local symbol in readonly segment; recompile object files with -fPIC or
pass '-Wl
,-z,notext' to allow text relocations in the output
>>> defined in init/built-in.a(do_mounts.o)
>>> referenced by do_mounts.c
>>>               do_mounts.o:(__ksymtab_name_to_dev_t) in archive init/built-in.a

(b) R_PPC64_ADDR64 in __mcount_loc sections.

ld.lld: error: can't create dynamic relocation R_PPC64_ADDR64 against
local symbol in readonly segment; recompile object files with -fPIC or
pass '-Wl
,-z,notext' to allow text relocations in the output
>>> defined in init/built-in.a(do_mounts_rd.o)
>>> referenced by do_mounts_rd.c
>>>               do_mounts_rd.o:(__mcount_loc+0x0) in archive init/built-in.a

This section is generated by ./scripts/recordmcount
"init/do_mounts_rd.o". The tool hard codes R_PPC64_ADDR64.

% grep PPC64 scripts/recordmcount.c
        case EM_PPC64:  reltype = R_PPC64_ADDR64; break;
```

> make ARCH=powerpc CROSS_COMPILE=powerpc64-linux-gnu- CC="ccache clang-11" LD=ld.lld-11 AR=llvm-ar-11 -j4 vmlinux
>
> I also checked the manpage, and indeed the system ld does not issue this
> warning/error by default.
>
It should be noted that the BFD linker can emit this warning/error if
binutils is configured with
"--enable-textrel-check={warning,error,yes}".

> > ---
> >  arch/powerpc/Makefile | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
> > index 6505d66f1193..17a9fbf9b789 100644
> > --- a/arch/powerpc/Makefile
> > +++ b/arch/powerpc/Makefile
> > @@ -122,6 +122,7 @@ endif
> >
> >  LDFLAGS_vmlinux-y := -Bstatic
> >  LDFLAGS_vmlinux-$(CONFIG_RELOCATABLE) := -pie
> > +LDFLAGS_vmlinux-$(CONFIG_RELOCATABLE) += -z notext
>
> Is there any reason this should be gated on CONFIG_RELOCATABLE? (I tried
> without it and got different but possibly related linker errors...)
>
My understanding is that '-z notext' allows (i.e. doesn't emit
warnings/errors for) relocations against read-only segments. So it's
not really relevant to non-relocatable builds.

Unrelated question: Should the "-pie" flag be added with "+= -pie"
(note the plus sign)?

> Also, is this behaviour new?
>
I don't believe so. It's only when we wanted to use LLD that it was noticed.

BTW, this patch should more properly be attributed to Fangrui Song. I
can send a follow-up patch that reflects this and adds more context to
the commit message.

-bw

> Kind regards,
> Daniel
>
> >  LDFLAGS_vmlinux      := $(LDFLAGS_vmlinux-y)
> >
> >  ifdef CONFIG_PPC64
> > --
> > 2.33.0.rc1.237.g0d66db33f3-goog

^ permalink raw reply

* Re: [PATCH v2 00/12] Implement generic prot_guest_has() helper function
From: Tom Lendacky @ 2021-08-13 17:22 UTC (permalink / raw)
  To: linux-kernel, x86, linuxppc-dev, linux-s390, iommu, kvm,
	linux-efi, platform-driver-x86, linux-graphics-maintainer,
	amd-gfx, dri-devel, kexec, linux-fsdevel
  Cc: Sathyanarayanan Kuppuswamy, Brijesh Singh, David Airlie,
	Dave Hansen, Paul Mackerras, Will Deacon, Ard Biesheuvel,
	Andi Kleen, Baoquan He, Joerg Roedel, Christian Borntraeger,
	Ingo Molnar, Dave Young, Tianyu Lan, Thomas Zimmermann,
	Vasily Gorbik, Heiko Carstens, Maarten Lankhorst, Maxime Ripard,
	Borislav Petkov, Andy Lutomirski, Thomas Gleixner, Peter Zijlstra,
	Daniel Vetter
In-Reply-To: <cover.1628873970.git.thomas.lendacky@amd.com>

On 8/13/21 11:59 AM, Tom Lendacky wrote:
> This patch series provides a generic helper function, prot_guest_has(),
> to replace the sme_active(), sev_active(), sev_es_active() and
> mem_encrypt_active() functions.
> 
> It is expected that as new protected virtualization technologies are
> added to the kernel, they can all be covered by a single function call
> instead of a collection of specific function calls all called from the
> same locations.
> 
> The powerpc and s390 patches have been compile tested only. Can the
> folks copied on this series verify that nothing breaks for them.

There are some patches related to PPC that added new calls to the 
mem_encrypt_active() function that are not yet in the tip tree. After the 
merge window, I'll need to send a v3 with those additional changes before 
this series can be applied.

Thanks,
Tom

^ permalink raw reply

* Re: [PATCH v2 02/12] mm: Introduce a function to check for virtualization protection features
From: Kuppuswamy, Sathyanarayanan @ 2021-08-13 17:19 UTC (permalink / raw)
  To: Tom Lendacky, linux-kernel, x86, linuxppc-dev, linux-s390, iommu,
	kvm, linux-efi, platform-driver-x86, linux-graphics-maintainer,
	amd-gfx, dri-devel, kexec, linux-fsdevel
  Cc: Andi Kleen, Tianyu Lan, Joerg Roedel, Joerg Roedel,
	Borislav Petkov, Brijesh Singh
In-Reply-To: <482fe51f1671c1cd081039801b03db7ec0036332.1628873970.git.thomas.lendacky@amd.com>



On 8/13/21 9:59 AM, Tom Lendacky wrote:
> In prep for other protected virtualization technologies, introduce a
> generic helper function, prot_guest_has(), that can be used to check
> for specific protection attributes, like memory encryption. This is
> intended to eliminate having to add multiple technology-specific checks
> to the code (e.g. if (sev_active() || tdx_active())).
> 
> Reviewed-by: Joerg Roedel<jroedel@suse.de>
> Co-developed-by: Andi Kleen<ak@linux.intel.com>
> Signed-off-by: Andi Kleen<ak@linux.intel.com>
> Co-developed-by: Kuppuswamy Sathyanarayanan<sathyanarayanan.kuppuswamy@linux.intel.com>
> Signed-off-by: Kuppuswamy Sathyanarayanan<sathyanarayanan.kuppuswamy@linux.intel.com>
> Signed-off-by: Tom Lendacky<thomas.lendacky@amd.com>
> ---
>   arch/Kconfig                    |  3 +++
>   include/linux/protected_guest.h | 35 +++++++++++++++++++++++++++++++++
>   2 files changed, 38 insertions(+)
>   create mode 100644 include/linux/protected_guest.h

Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>

-- 
Sathyanarayanan Kuppuswamy
Linux Kernel Developer

^ permalink raw reply

* Re: [PATCH 07/11] treewide: Replace the use of mem_encrypt_active() with prot_guest_has()
From: Tom Lendacky @ 2021-08-13 17:08 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Kuppuswamy, Sathyanarayanan, linux-efi, Brijesh Singh, kvm,
	Peter Zijlstra, Dave Hansen, dri-devel, platform-driver-x86,
	Will Deacon, linux-s390, Andi Kleen, Baoquan He, Joerg Roedel,
	x86, amd-gfx, David Airlie, Ingo Molnar,
	linux-graphics-maintainer, Dave Young, Tianyu Lan,
	Thomas Zimmermann, Maarten Lankhorst, Maxime Ripard,
	Borislav Petkov, Andy Lutomirski, Thomas Gleixner, kexec,
	linux-kernel, iommu, Daniel Vetter, linux-fsdevel, linuxppc-dev
In-Reply-To: <20210812100724.t4cdh7xbkuqgnsc3@box.shutemov.name>

On 8/12/21 5:07 AM, Kirill A. Shutemov wrote:
> On Wed, Aug 11, 2021 at 10:52:55AM -0500, Tom Lendacky wrote:
>> On 8/11/21 7:19 AM, Kirill A. Shutemov wrote:
>>> On Tue, Aug 10, 2021 at 02:48:54PM -0500, Tom Lendacky wrote:
>>>> On 8/10/21 1:45 PM, Kuppuswamy, Sathyanarayanan wrote:
...
>>> Looking at code agains, now I *think* the reason is accessing a global
>>> variable from __startup_64() inside TDX version of prot_guest_has().
>>>
>>> __startup_64() is special. If you access any global variable you need to
>>> use fixup_pointer(). See comment before __startup_64().
>>>
>>> I'm not sure how you get away with accessing sme_me_mask directly from
>>> there. Any clues? Maybe just a luck and complier generates code just right
>>> for your case, I donno.
>>
>> Hmm... yeah, could be that the compiler is using rip-relative addressing
>> for it because it lives in the .data section?
> 
> I guess. It has to be fixed. It may break with complier upgrade or any
> random change around the code.

I'll look at doing that separate from this series.

> 
> BTW, does it work with clang for you?

I haven't tried with clang, I'll check on that.

Thanks,
Tom

> 

^ permalink raw reply

* [PATCH v2 12/12] s390/mm: Remove the now unused mem_encrypt_active() function
From: Tom Lendacky @ 2021-08-13 16:59 UTC (permalink / raw)
  To: linux-kernel, x86, linuxppc-dev, linux-s390, iommu, kvm,
	linux-efi, platform-driver-x86, linux-graphics-maintainer,
	amd-gfx, dri-devel, kexec, linux-fsdevel
  Cc: Sathyanarayanan Kuppuswamy, Andi Kleen, Tianyu Lan, Vasily Gorbik,
	Joerg Roedel, Christian Borntraeger, Borislav Petkov,
	Brijesh Singh, Heiko Carstens
In-Reply-To: <cover.1628873970.git.thomas.lendacky@amd.com>

The mem_encrypt_active() function has been replaced by prot_guest_has(),
so remove the implementation. Since the default implementation of the
prot_guest_has() matches the s390 implementation of mem_encrypt_active(),
prot_guest_has() does not need to be implemented in s390 (the config
option ARCH_HAS_PROTECTED_GUEST is not set).

Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
---
 arch/s390/include/asm/mem_encrypt.h | 2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/s390/include/asm/mem_encrypt.h b/arch/s390/include/asm/mem_encrypt.h
index 2542cbf7e2d1..08a8b96606d7 100644
--- a/arch/s390/include/asm/mem_encrypt.h
+++ b/arch/s390/include/asm/mem_encrypt.h
@@ -4,8 +4,6 @@
 
 #ifndef __ASSEMBLY__
 
-static inline bool mem_encrypt_active(void) { return false; }
-
 int set_memory_encrypted(unsigned long addr, int numpages);
 int set_memory_decrypted(unsigned long addr, int numpages);
 
-- 
2.32.0


^ permalink raw reply related

* [PATCH v2 11/12] powerpc/pseries/svm: Remove the now unused mem_encrypt_active() function
From: Tom Lendacky @ 2021-08-13 16:59 UTC (permalink / raw)
  To: linux-kernel, x86, linuxppc-dev, linux-s390, iommu, kvm,
	linux-efi, platform-driver-x86, linux-graphics-maintainer,
	amd-gfx, dri-devel, kexec, linux-fsdevel
  Cc: Sathyanarayanan Kuppuswamy, Andi Kleen, Tianyu Lan, Joerg Roedel,
	Borislav Petkov, Brijesh Singh, Paul Mackerras
In-Reply-To: <cover.1628873970.git.thomas.lendacky@amd.com>

The mem_encrypt_active() function has been replaced by prot_guest_has(),
so remove the implementation.

Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
---
 arch/powerpc/include/asm/mem_encrypt.h | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/arch/powerpc/include/asm/mem_encrypt.h b/arch/powerpc/include/asm/mem_encrypt.h
index ba9dab07c1be..2f26b8fc8d29 100644
--- a/arch/powerpc/include/asm/mem_encrypt.h
+++ b/arch/powerpc/include/asm/mem_encrypt.h
@@ -10,11 +10,6 @@
 
 #include <asm/svm.h>
 
-static inline bool mem_encrypt_active(void)
-{
-	return is_secure_guest();
-}
-
 static inline bool force_dma_unencrypted(struct device *dev)
 {
 	return is_secure_guest();
-- 
2.32.0


^ permalink raw reply related

* [PATCH v2 09/12] mm: Remove the now unused mem_encrypt_active() function
From: Tom Lendacky @ 2021-08-13 16:59 UTC (permalink / raw)
  To: linux-kernel, x86, linuxppc-dev, linux-s390, iommu, kvm,
	linux-efi, platform-driver-x86, linux-graphics-maintainer,
	amd-gfx, dri-devel, kexec, linux-fsdevel
  Cc: Sathyanarayanan Kuppuswamy, Andi Kleen, Tianyu Lan, Joerg Roedel,
	Joerg Roedel, Borislav Petkov, Brijesh Singh
In-Reply-To: <cover.1628873970.git.thomas.lendacky@amd.com>

The mem_encrypt_active() function has been replaced by prot_guest_has(),
so remove the implementation.

Reviewed-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
---
 include/linux/mem_encrypt.h | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/include/linux/mem_encrypt.h b/include/linux/mem_encrypt.h
index 5c4a18a91f89..ae4526389261 100644
--- a/include/linux/mem_encrypt.h
+++ b/include/linux/mem_encrypt.h
@@ -16,10 +16,6 @@
 
 #include <asm/mem_encrypt.h>
 
-#else	/* !CONFIG_ARCH_HAS_MEM_ENCRYPT */
-
-static inline bool mem_encrypt_active(void) { return false; }
-
 #endif	/* CONFIG_ARCH_HAS_MEM_ENCRYPT */
 
 #ifdef CONFIG_AMD_MEM_ENCRYPT
-- 
2.32.0


^ permalink raw reply related

* [PATCH v2 10/12] x86/sev: Remove the now unused mem_encrypt_active() function
From: Tom Lendacky @ 2021-08-13 16:59 UTC (permalink / raw)
  To: linux-kernel, x86, linuxppc-dev, linux-s390, iommu, kvm,
	linux-efi, platform-driver-x86, linux-graphics-maintainer,
	amd-gfx, dri-devel, kexec, linux-fsdevel
  Cc: Sathyanarayanan Kuppuswamy, Andi Kleen, Tianyu Lan, Joerg Roedel,
	Ingo Molnar, Borislav Petkov, Joerg Roedel, Brijesh Singh,
	Thomas Gleixner
In-Reply-To: <cover.1628873970.git.thomas.lendacky@amd.com>

The mem_encrypt_active() function has been replaced by prot_guest_has(),
so remove the implementation.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Reviewed-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
---
 arch/x86/include/asm/mem_encrypt.h | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h
index 797146e0cd6b..94c089e9ea69 100644
--- a/arch/x86/include/asm/mem_encrypt.h
+++ b/arch/x86/include/asm/mem_encrypt.h
@@ -97,11 +97,6 @@ static inline void mem_encrypt_free_decrypted_mem(void) { }
 
 extern char __start_bss_decrypted[], __end_bss_decrypted[], __start_bss_decrypted_unused[];
 
-static inline bool mem_encrypt_active(void)
-{
-	return sme_me_mask;
-}
-
 static inline u64 sme_get_me_mask(void)
 {
 	return sme_me_mask;
-- 
2.32.0


^ permalink raw reply related

* [PATCH v2 08/12] treewide: Replace the use of mem_encrypt_active() with prot_guest_has()
From: Tom Lendacky @ 2021-08-13 16:59 UTC (permalink / raw)
  To: linux-kernel, x86, linuxppc-dev, linux-s390, iommu, kvm,
	linux-efi, platform-driver-x86, linux-graphics-maintainer,
	amd-gfx, dri-devel, kexec, linux-fsdevel
  Cc: Sathyanarayanan Kuppuswamy, Andi Kleen, Tianyu Lan, Daniel Vetter,
	Baoquan He, Peter Zijlstra, Thomas Zimmermann, Joerg Roedel,
	Dave Hansen, Maarten Lankhorst, Maxime Ripard, David Airlie,
	Dave Young, Ingo Molnar, Borislav Petkov, Andy Lutomirski,
	Brijesh Singh, Thomas Gleixner, Will Deacon
In-Reply-To: <cover.1628873970.git.thomas.lendacky@amd.com>

Replace occurrences of mem_encrypt_active() with calls to prot_guest_has()
with the PATTR_MEM_ENCRYPT attribute.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Maxime Ripard <mripard@kernel.org>
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Cc: VMware Graphics <linux-graphics-maintainer@vmware.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Will Deacon <will@kernel.org>
Cc: Dave Young <dyoung@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
---
 arch/x86/kernel/head64.c                | 4 ++--
 arch/x86/mm/ioremap.c                   | 4 ++--
 arch/x86/mm/mem_encrypt.c               | 5 ++---
 arch/x86/mm/pat/set_memory.c            | 3 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 4 +++-
 drivers/gpu/drm/drm_cache.c             | 4 ++--
 drivers/gpu/drm/vmwgfx/vmwgfx_drv.c     | 4 ++--
 drivers/gpu/drm/vmwgfx/vmwgfx_msg.c     | 6 +++---
 drivers/iommu/amd/iommu.c               | 3 ++-
 drivers/iommu/amd/iommu_v2.c            | 3 ++-
 drivers/iommu/iommu.c                   | 3 ++-
 fs/proc/vmcore.c                        | 6 +++---
 kernel/dma/swiotlb.c                    | 4 ++--
 13 files changed, 29 insertions(+), 24 deletions(-)

diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index de01903c3735..cafed6456d45 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -19,7 +19,7 @@
 #include <linux/start_kernel.h>
 #include <linux/io.h>
 #include <linux/memblock.h>
-#include <linux/mem_encrypt.h>
+#include <linux/protected_guest.h>
 #include <linux/pgtable.h>
 
 #include <asm/processor.h>
@@ -285,7 +285,7 @@ unsigned long __head __startup_64(unsigned long physaddr,
 	 * there is no need to zero it after changing the memory encryption
 	 * attribute.
 	 */
-	if (mem_encrypt_active()) {
+	if (prot_guest_has(PATTR_MEM_ENCRYPT)) {
 		vaddr = (unsigned long)__start_bss_decrypted;
 		vaddr_end = (unsigned long)__end_bss_decrypted;
 		for (; vaddr < vaddr_end; vaddr += PMD_SIZE) {
diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index 3ed0f28f12af..7f012fc1b600 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -694,7 +694,7 @@ static bool __init early_memremap_is_setup_data(resource_size_t phys_addr,
 bool arch_memremap_can_ram_remap(resource_size_t phys_addr, unsigned long size,
 				 unsigned long flags)
 {
-	if (!mem_encrypt_active())
+	if (!prot_guest_has(PATTR_MEM_ENCRYPT))
 		return true;
 
 	if (flags & MEMREMAP_ENC)
@@ -724,7 +724,7 @@ pgprot_t __init early_memremap_pgprot_adjust(resource_size_t phys_addr,
 {
 	bool encrypted_prot;
 
-	if (!mem_encrypt_active())
+	if (!prot_guest_has(PATTR_MEM_ENCRYPT))
 		return prot;
 
 	encrypted_prot = true;
diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index 38dfa84b77a1..69aed9935b5e 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -364,8 +364,7 @@ int __init early_set_memory_encrypted(unsigned long vaddr, unsigned long size)
 /*
  * SME and SEV are very similar but they are not the same, so there are
  * times that the kernel will need to distinguish between SME and SEV. The
- * sme_active() and sev_active() functions are used for this.  When a
- * distinction isn't needed, the mem_encrypt_active() function can be used.
+ * sme_active() and sev_active() functions are used for this.
  *
  * The trampoline code is a good example for this requirement.  Before
  * paging is activated, SME will access all memory as decrypted, but SEV
@@ -451,7 +450,7 @@ void __init mem_encrypt_free_decrypted_mem(void)
 	 * The unused memory range was mapped decrypted, change the encryption
 	 * attribute from decrypted to encrypted before freeing it.
 	 */
-	if (mem_encrypt_active()) {
+	if (amd_prot_guest_has(PATTR_MEM_ENCRYPT)) {
 		r = set_memory_encrypted(vaddr, npages);
 		if (r) {
 			pr_warn("failed to free unused decrypted pages\n");
diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
index ad8a5c586a35..6925f2bb4be1 100644
--- a/arch/x86/mm/pat/set_memory.c
+++ b/arch/x86/mm/pat/set_memory.c
@@ -18,6 +18,7 @@
 #include <linux/libnvdimm.h>
 #include <linux/vmstat.h>
 #include <linux/kernel.h>
+#include <linux/protected_guest.h>
 
 #include <asm/e820/api.h>
 #include <asm/processor.h>
@@ -1986,7 +1987,7 @@ static int __set_memory_enc_dec(unsigned long addr, int numpages, bool enc)
 	int ret;
 
 	/* Nothing to do if memory encryption is not active */
-	if (!mem_encrypt_active())
+	if (!prot_guest_has(PATTR_MEM_ENCRYPT))
 		return 0;
 
 	/* Should not be working on unaligned addresses */
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index 971c5b8e75dc..21c1e3056070 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -38,6 +38,7 @@
 #include <drm/drm_probe_helper.h>
 #include <linux/mmu_notifier.h>
 #include <linux/suspend.h>
+#include <linux/protected_guest.h>
 
 #include "amdgpu.h"
 #include "amdgpu_irq.h"
@@ -1250,7 +1251,8 @@ static int amdgpu_pci_probe(struct pci_dev *pdev,
 	 * however, SME requires an indirect IOMMU mapping because the encryption
 	 * bit is beyond the DMA mask of the chip.
 	 */
-	if (mem_encrypt_active() && ((flags & AMD_ASIC_MASK) == CHIP_RAVEN)) {
+	if (prot_guest_has(PATTR_MEM_ENCRYPT) &&
+	    ((flags & AMD_ASIC_MASK) == CHIP_RAVEN)) {
 		dev_info(&pdev->dev,
 			 "SME is not compatible with RAVEN\n");
 		return -ENOTSUPP;
diff --git a/drivers/gpu/drm/drm_cache.c b/drivers/gpu/drm/drm_cache.c
index 546599f19a93..4d01d44012fd 100644
--- a/drivers/gpu/drm/drm_cache.c
+++ b/drivers/gpu/drm/drm_cache.c
@@ -31,7 +31,7 @@
 #include <linux/dma-buf-map.h>
 #include <linux/export.h>
 #include <linux/highmem.h>
-#include <linux/mem_encrypt.h>
+#include <linux/protected_guest.h>
 #include <xen/xen.h>
 
 #include <drm/drm_cache.h>
@@ -204,7 +204,7 @@ bool drm_need_swiotlb(int dma_bits)
 	 * Enforce dma_alloc_coherent when memory encryption is active as well
 	 * for the same reasons as for Xen paravirtual hosts.
 	 */
-	if (mem_encrypt_active())
+	if (prot_guest_has(PATTR_MEM_ENCRYPT))
 		return true;
 
 	for (tmp = iomem_resource.child; tmp; tmp = tmp->sibling)
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
index 45aeeca9b8f6..498f52ba08ea 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
@@ -29,7 +29,7 @@
 #include <linux/dma-mapping.h>
 #include <linux/module.h>
 #include <linux/pci.h>
-#include <linux/mem_encrypt.h>
+#include <linux/protected_guest.h>
 
 #include <drm/drm_aperture.h>
 #include <drm/drm_drv.h>
@@ -633,7 +633,7 @@ static int vmw_dma_select_mode(struct vmw_private *dev_priv)
 		[vmw_dma_map_bind] = "Giving up DMA mappings early."};
 
 	/* TTM currently doesn't fully support SEV encryption. */
-	if (mem_encrypt_active())
+	if (prot_guest_has(PATTR_MEM_ENCRYPT))
 		return -EINVAL;
 
 	if (vmw_force_coherent)
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_msg.c b/drivers/gpu/drm/vmwgfx/vmwgfx_msg.c
index 3d08f5700bdb..0c70573d3dce 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_msg.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_msg.c
@@ -28,7 +28,7 @@
 #include <linux/kernel.h>
 #include <linux/module.h>
 #include <linux/slab.h>
-#include <linux/mem_encrypt.h>
+#include <linux/protected_guest.h>
 
 #include <asm/hypervisor.h>
 
@@ -153,7 +153,7 @@ static unsigned long vmw_port_hb_out(struct rpc_channel *channel,
 	unsigned long msg_len = strlen(msg);
 
 	/* HB port can't access encrypted memory. */
-	if (hb && !mem_encrypt_active()) {
+	if (hb && !prot_guest_has(PATTR_MEM_ENCRYPT)) {
 		unsigned long bp = channel->cookie_high;
 
 		si = (uintptr_t) msg;
@@ -208,7 +208,7 @@ static unsigned long vmw_port_hb_in(struct rpc_channel *channel, char *reply,
 	unsigned long si, di, eax, ebx, ecx, edx;
 
 	/* HB port can't access encrypted memory */
-	if (hb && !mem_encrypt_active()) {
+	if (hb && !prot_guest_has(PATTR_MEM_ENCRYPT)) {
 		unsigned long bp = channel->cookie_low;
 
 		si = channel->cookie_high;
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 811a49a95d04..def63a8deab4 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -31,6 +31,7 @@
 #include <linux/irqdomain.h>
 #include <linux/percpu.h>
 #include <linux/io-pgtable.h>
+#include <linux/protected_guest.h>
 #include <asm/irq_remapping.h>
 #include <asm/io_apic.h>
 #include <asm/apic.h>
@@ -2178,7 +2179,7 @@ static int amd_iommu_def_domain_type(struct device *dev)
 	 * active, because some of those devices (AMD GPUs) don't have the
 	 * encryption bit in their DMA-mask and require remapping.
 	 */
-	if (!mem_encrypt_active() && dev_data->iommu_v2)
+	if (!prot_guest_has(PATTR_MEM_ENCRYPT) && dev_data->iommu_v2)
 		return IOMMU_DOMAIN_IDENTITY;
 
 	return 0;
diff --git a/drivers/iommu/amd/iommu_v2.c b/drivers/iommu/amd/iommu_v2.c
index f8d4ad421e07..ac359bc98523 100644
--- a/drivers/iommu/amd/iommu_v2.c
+++ b/drivers/iommu/amd/iommu_v2.c
@@ -16,6 +16,7 @@
 #include <linux/wait.h>
 #include <linux/pci.h>
 #include <linux/gfp.h>
+#include <linux/protected_guest.h>
 
 #include "amd_iommu.h"
 
@@ -741,7 +742,7 @@ int amd_iommu_init_device(struct pci_dev *pdev, int pasids)
 	 * When memory encryption is active the device is likely not in a
 	 * direct-mapped domain. Forbid using IOMMUv2 functionality for now.
 	 */
-	if (mem_encrypt_active())
+	if (prot_guest_has(PATTR_MEM_ENCRYPT))
 		return -ENODEV;
 
 	if (!amd_iommu_v2_supported())
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 5419c4b9f27a..ddbedb1b5b6b 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -23,6 +23,7 @@
 #include <linux/property.h>
 #include <linux/fsl/mc.h>
 #include <linux/module.h>
+#include <linux/protected_guest.h>
 #include <trace/events/iommu.h>
 
 static struct kset *iommu_group_kset;
@@ -127,7 +128,7 @@ static int __init iommu_subsys_init(void)
 		else
 			iommu_set_default_translated(false);
 
-		if (iommu_default_passthrough() && mem_encrypt_active()) {
+		if (iommu_default_passthrough() && prot_guest_has(PATTR_MEM_ENCRYPT)) {
 			pr_info("Memory encryption detected - Disabling default IOMMU Passthrough\n");
 			iommu_set_default_translated(false);
 		}
diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
index 9a15334da208..b466f543dc00 100644
--- a/fs/proc/vmcore.c
+++ b/fs/proc/vmcore.c
@@ -26,7 +26,7 @@
 #include <linux/vmalloc.h>
 #include <linux/pagemap.h>
 #include <linux/uaccess.h>
-#include <linux/mem_encrypt.h>
+#include <linux/protected_guest.h>
 #include <asm/io.h>
 #include "internal.h"
 
@@ -177,7 +177,7 @@ ssize_t __weak elfcorehdr_read(char *buf, size_t count, u64 *ppos)
  */
 ssize_t __weak elfcorehdr_read_notes(char *buf, size_t count, u64 *ppos)
 {
-	return read_from_oldmem(buf, count, ppos, 0, mem_encrypt_active());
+	return read_from_oldmem(buf, count, ppos, 0, prot_guest_has(PATTR_MEM_ENCRYPT));
 }
 
 /*
@@ -378,7 +378,7 @@ static ssize_t __read_vmcore(char *buffer, size_t buflen, loff_t *fpos,
 					    buflen);
 			start = m->paddr + *fpos - m->offset;
 			tmp = read_from_oldmem(buffer, tsz, &start,
-					       userbuf, mem_encrypt_active());
+					       userbuf, prot_guest_has(PATTR_MEM_ENCRYPT));
 			if (tmp < 0)
 				return tmp;
 			buflen -= tsz;
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index e50df8d8f87e..2e8dee23a624 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -34,7 +34,7 @@
 #include <linux/highmem.h>
 #include <linux/gfp.h>
 #include <linux/scatterlist.h>
-#include <linux/mem_encrypt.h>
+#include <linux/protected_guest.h>
 #include <linux/set_memory.h>
 #ifdef CONFIG_DEBUG_FS
 #include <linux/debugfs.h>
@@ -515,7 +515,7 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, phys_addr_t orig_addr,
 	if (!mem)
 		panic("Can not allocate SWIOTLB buffer earlier and can't now provide you with the DMA bounce buffer");
 
-	if (mem_encrypt_active())
+	if (prot_guest_has(PATTR_MEM_ENCRYPT))
 		pr_warn_once("Memory encryption is active and system is using DMA bounce buffers\n");
 
 	if (mapping_size > alloc_size) {
-- 
2.32.0


^ permalink raw reply related

* [PATCH v2 07/12] x86/sev: Replace occurrences of sev_es_active() with prot_guest_has()
From: Tom Lendacky @ 2021-08-13 16:59 UTC (permalink / raw)
  To: linux-kernel, x86, linuxppc-dev, linux-s390, iommu, kvm,
	linux-efi, platform-driver-x86, linux-graphics-maintainer,
	amd-gfx, dri-devel, kexec, linux-fsdevel
  Cc: Sathyanarayanan Kuppuswamy, Andi Kleen, Tianyu Lan, Joerg Roedel,
	Ingo Molnar, Borislav Petkov, Brijesh Singh, Thomas Gleixner
In-Reply-To: <cover.1628873970.git.thomas.lendacky@amd.com>

Replace occurrences of sev_es_active() with the more generic
prot_guest_has() using PATTR_GUEST_PROT_STATE, except for in
arch/x86/kernel/sev*.c and arch/x86/mm/mem_encrypt*.c where PATTR_SEV_ES
will be used. If future support is added for other memory encyrption
techonologies, the use of PATTR_GUEST_PROT_STATE can be updated, as
required, to specifically use PATTR_SEV_ES.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
---
 arch/x86/include/asm/mem_encrypt.h | 2 --
 arch/x86/kernel/sev.c              | 6 +++---
 arch/x86/mm/mem_encrypt.c          | 7 +++----
 arch/x86/realmode/init.c           | 3 +--
 4 files changed, 7 insertions(+), 11 deletions(-)

diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h
index 7e25de37c148..797146e0cd6b 100644
--- a/arch/x86/include/asm/mem_encrypt.h
+++ b/arch/x86/include/asm/mem_encrypt.h
@@ -50,7 +50,6 @@ void __init mem_encrypt_free_decrypted_mem(void);
 void __init mem_encrypt_init(void);
 
 void __init sev_es_init_vc_handling(void);
-bool sev_es_active(void);
 bool amd_prot_guest_has(unsigned int attr);
 
 #define __bss_decrypted __section(".bss..decrypted")
@@ -74,7 +73,6 @@ static inline void __init sme_encrypt_kernel(struct boot_params *bp) { }
 static inline void __init sme_enable(struct boot_params *bp) { }
 
 static inline void sev_es_init_vc_handling(void) { }
-static inline bool sev_es_active(void) { return false; }
 static inline bool amd_prot_guest_has(unsigned int attr) { return false; }
 
 static inline int __init
diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index a6895e440bc3..66a4ab9d95d7 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -11,7 +11,7 @@
 
 #include <linux/sched/debug.h>	/* For show_regs() */
 #include <linux/percpu-defs.h>
-#include <linux/mem_encrypt.h>
+#include <linux/protected_guest.h>
 #include <linux/printk.h>
 #include <linux/mm_types.h>
 #include <linux/set_memory.h>
@@ -615,7 +615,7 @@ int __init sev_es_efi_map_ghcbs(pgd_t *pgd)
 	int cpu;
 	u64 pfn;
 
-	if (!sev_es_active())
+	if (!prot_guest_has(PATTR_SEV_ES))
 		return 0;
 
 	pflags = _PAGE_NX | _PAGE_RW;
@@ -774,7 +774,7 @@ void __init sev_es_init_vc_handling(void)
 
 	BUILD_BUG_ON(offsetof(struct sev_es_runtime_data, ghcb_page) % PAGE_SIZE);
 
-	if (!sev_es_active())
+	if (!prot_guest_has(PATTR_SEV_ES))
 		return;
 
 	if (!sev_es_check_cpu_features())
diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index 83bc928f529e..38dfa84b77a1 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -383,8 +383,7 @@ static bool sme_active(void)
 	return sme_me_mask && !sev_active();
 }
 
-/* Needs to be called from non-instrumentable code */
-bool noinstr sev_es_active(void)
+static bool sev_es_active(void)
 {
 	return sev_status & MSR_AMD64_SEV_ES_ENABLED;
 }
@@ -482,7 +481,7 @@ static void print_mem_encrypt_feature_info(void)
 		pr_cont(" SEV");
 
 	/* Encrypted Register State */
-	if (sev_es_active())
+	if (amd_prot_guest_has(PATTR_SEV_ES))
 		pr_cont(" SEV-ES");
 
 	pr_cont("\n");
@@ -501,7 +500,7 @@ void __init mem_encrypt_init(void)
 	 * With SEV, we need to unroll the rep string I/O instructions,
 	 * but SEV-ES supports them through the #VC handler.
 	 */
-	if (amd_prot_guest_has(PATTR_SEV) && !sev_es_active())
+	if (amd_prot_guest_has(PATTR_SEV) && !amd_prot_guest_has(PATTR_SEV_ES))
 		static_branch_enable(&sev_enable_key);
 
 	print_mem_encrypt_feature_info();
diff --git a/arch/x86/realmode/init.c b/arch/x86/realmode/init.c
index 2109ae569c67..7711d0071f41 100644
--- a/arch/x86/realmode/init.c
+++ b/arch/x86/realmode/init.c
@@ -2,7 +2,6 @@
 #include <linux/io.h>
 #include <linux/slab.h>
 #include <linux/memblock.h>
-#include <linux/mem_encrypt.h>
 #include <linux/protected_guest.h>
 #include <linux/pgtable.h>
 
@@ -48,7 +47,7 @@ static void sme_sev_setup_real_mode(struct trampoline_header *th)
 	if (prot_guest_has(PATTR_HOST_MEM_ENCRYPT))
 		th->flags |= TH_FLAGS_SME_ACTIVE;
 
-	if (sev_es_active()) {
+	if (prot_guest_has(PATTR_GUEST_PROT_STATE)) {
 		/*
 		 * Skip the call to verify_cpu() in secondary_startup_64 as it
 		 * will cause #VC exceptions when the AP can't handle them yet.
-- 
2.32.0


^ permalink raw reply related

* [PATCH v2 06/12] x86/sev: Replace occurrences of sev_active() with prot_guest_has()
From: Tom Lendacky @ 2021-08-13 16:59 UTC (permalink / raw)
  To: linux-kernel, x86, linuxppc-dev, linux-s390, iommu, kvm,
	linux-efi, platform-driver-x86, linux-graphics-maintainer,
	amd-gfx, dri-devel, kexec, linux-fsdevel
  Cc: Sathyanarayanan Kuppuswamy, Andi Kleen, Tianyu Lan,
	Peter Zijlstra, Joerg Roedel, Dave Hansen, Ingo Molnar,
	Borislav Petkov, Andy Lutomirski, Brijesh Singh, Thomas Gleixner,
	Joerg Roedel, Ard Biesheuvel
In-Reply-To: <cover.1628873970.git.thomas.lendacky@amd.com>

Replace occurrences of sev_active() with the more generic prot_guest_has()
using PATTR_GUEST_MEM_ENCRYPT, except for in arch/x86/mm/mem_encrypt*.c
where PATTR_SEV will be used. If future support is added for other memory
encryption technologies, the use of PATTR_GUEST_MEM_ENCRYPT can be
updated, as required, to use PATTR_SEV.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
---
 arch/x86/include/asm/mem_encrypt.h |  2 --
 arch/x86/kernel/crash_dump_64.c    |  4 +++-
 arch/x86/kernel/kvm.c              |  3 ++-
 arch/x86/kernel/kvmclock.c         |  4 ++--
 arch/x86/kernel/machine_kexec_64.c | 16 ++++++++--------
 arch/x86/kvm/svm/svm.c             |  3 ++-
 arch/x86/mm/ioremap.c              |  6 +++---
 arch/x86/mm/mem_encrypt.c          | 15 +++++++--------
 arch/x86/platform/efi/efi_64.c     |  9 +++++----
 9 files changed, 32 insertions(+), 30 deletions(-)

diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h
index 956338406cec..7e25de37c148 100644
--- a/arch/x86/include/asm/mem_encrypt.h
+++ b/arch/x86/include/asm/mem_encrypt.h
@@ -50,7 +50,6 @@ void __init mem_encrypt_free_decrypted_mem(void);
 void __init mem_encrypt_init(void);
 
 void __init sev_es_init_vc_handling(void);
-bool sev_active(void);
 bool sev_es_active(void);
 bool amd_prot_guest_has(unsigned int attr);
 
@@ -75,7 +74,6 @@ static inline void __init sme_encrypt_kernel(struct boot_params *bp) { }
 static inline void __init sme_enable(struct boot_params *bp) { }
 
 static inline void sev_es_init_vc_handling(void) { }
-static inline bool sev_active(void) { return false; }
 static inline bool sev_es_active(void) { return false; }
 static inline bool amd_prot_guest_has(unsigned int attr) { return false; }
 
diff --git a/arch/x86/kernel/crash_dump_64.c b/arch/x86/kernel/crash_dump_64.c
index 045e82e8945b..0cfe35f03e67 100644
--- a/arch/x86/kernel/crash_dump_64.c
+++ b/arch/x86/kernel/crash_dump_64.c
@@ -10,6 +10,7 @@
 #include <linux/crash_dump.h>
 #include <linux/uaccess.h>
 #include <linux/io.h>
+#include <linux/protected_guest.h>
 
 static ssize_t __copy_oldmem_page(unsigned long pfn, char *buf, size_t csize,
 				  unsigned long offset, int userbuf,
@@ -73,5 +74,6 @@ ssize_t copy_oldmem_page_encrypted(unsigned long pfn, char *buf, size_t csize,
 
 ssize_t elfcorehdr_read(char *buf, size_t count, u64 *ppos)
 {
-	return read_from_oldmem(buf, count, ppos, 0, sev_active());
+	return read_from_oldmem(buf, count, ppos, 0,
+				prot_guest_has(PATTR_GUEST_MEM_ENCRYPT));
 }
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index a26643dc6bd6..9d08ad2f3faa 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -27,6 +27,7 @@
 #include <linux/nmi.h>
 #include <linux/swait.h>
 #include <linux/syscore_ops.h>
+#include <linux/protected_guest.h>
 #include <asm/timer.h>
 #include <asm/cpu.h>
 #include <asm/traps.h>
@@ -418,7 +419,7 @@ static void __init sev_map_percpu_data(void)
 {
 	int cpu;
 
-	if (!sev_active())
+	if (!prot_guest_has(PATTR_GUEST_MEM_ENCRYPT))
 		return;
 
 	for_each_possible_cpu(cpu) {
diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index ad273e5861c1..f7ba78a23dcd 100644
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -16,9 +16,9 @@
 #include <linux/mm.h>
 #include <linux/slab.h>
 #include <linux/set_memory.h>
+#include <linux/protected_guest.h>
 
 #include <asm/hypervisor.h>
-#include <asm/mem_encrypt.h>
 #include <asm/x86_init.h>
 #include <asm/kvmclock.h>
 
@@ -232,7 +232,7 @@ static void __init kvmclock_init_mem(void)
 	 * hvclock is shared between the guest and the hypervisor, must
 	 * be mapped decrypted.
 	 */
-	if (sev_active()) {
+	if (prot_guest_has(PATTR_GUEST_MEM_ENCRYPT)) {
 		r = set_memory_decrypted((unsigned long) hvclock_mem,
 					 1UL << order);
 		if (r) {
diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
index 8e7b517ad738..66ff788b79c9 100644
--- a/arch/x86/kernel/machine_kexec_64.c
+++ b/arch/x86/kernel/machine_kexec_64.c
@@ -167,7 +167,7 @@ static int init_transition_pgtable(struct kimage *image, pgd_t *pgd)
 	}
 	pte = pte_offset_kernel(pmd, vaddr);
 
-	if (sev_active())
+	if (prot_guest_has(PATTR_GUEST_MEM_ENCRYPT))
 		prot = PAGE_KERNEL_EXEC;
 
 	set_pte(pte, pfn_pte(paddr >> PAGE_SHIFT, prot));
@@ -207,7 +207,7 @@ static int init_pgtable(struct kimage *image, unsigned long start_pgtable)
 	level4p = (pgd_t *)__va(start_pgtable);
 	clear_page(level4p);
 
-	if (sev_active()) {
+	if (prot_guest_has(PATTR_GUEST_MEM_ENCRYPT)) {
 		info.page_flag   |= _PAGE_ENC;
 		info.kernpg_flag |= _PAGE_ENC;
 	}
@@ -570,12 +570,12 @@ void arch_kexec_unprotect_crashkres(void)
  */
 int arch_kexec_post_alloc_pages(void *vaddr, unsigned int pages, gfp_t gfp)
 {
-	if (sev_active())
+	if (!prot_guest_has(PATTR_HOST_MEM_ENCRYPT))
 		return 0;
 
 	/*
-	 * If SME is active we need to be sure that kexec pages are
-	 * not encrypted because when we boot to the new kernel the
+	 * If host memory encryption is active we need to be sure that kexec
+	 * pages are not encrypted because when we boot to the new kernel the
 	 * pages won't be accessed encrypted (initially).
 	 */
 	return set_memory_decrypted((unsigned long)vaddr, pages);
@@ -583,12 +583,12 @@ int arch_kexec_post_alloc_pages(void *vaddr, unsigned int pages, gfp_t gfp)
 
 void arch_kexec_pre_free_pages(void *vaddr, unsigned int pages)
 {
-	if (sev_active())
+	if (!prot_guest_has(PATTR_HOST_MEM_ENCRYPT))
 		return;
 
 	/*
-	 * If SME is active we need to reset the pages back to being
-	 * an encrypted mapping before freeing them.
+	 * If host memory encryption is active we need to reset the pages back
+	 * to being an encrypted mapping before freeing them.
 	 */
 	set_memory_encrypted((unsigned long)vaddr, pages);
 }
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index e8ccab50ebf6..b69f5ac622d5 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -25,6 +25,7 @@
 #include <linux/pagemap.h>
 #include <linux/swap.h>
 #include <linux/rwsem.h>
+#include <linux/protected_guest.h>
 
 #include <asm/apic.h>
 #include <asm/perf_event.h>
@@ -457,7 +458,7 @@ static int has_svm(void)
 		return 0;
 	}
 
-	if (sev_active()) {
+	if (prot_guest_has(PATTR_SEV)) {
 		pr_info("KVM is unsupported when running as an SEV guest\n");
 		return 0;
 	}
diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index 583afd54c7e1..3ed0f28f12af 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -92,7 +92,7 @@ static unsigned int __ioremap_check_ram(struct resource *res)
  */
 static unsigned int __ioremap_check_encrypted(struct resource *res)
 {
-	if (!sev_active())
+	if (!prot_guest_has(PATTR_GUEST_MEM_ENCRYPT))
 		return 0;
 
 	switch (res->desc) {
@@ -112,7 +112,7 @@ static unsigned int __ioremap_check_encrypted(struct resource *res)
  */
 static void __ioremap_check_other(resource_size_t addr, struct ioremap_desc *desc)
 {
-	if (!sev_active())
+	if (!prot_guest_has(PATTR_GUEST_MEM_ENCRYPT))
 		return;
 
 	if (!IS_ENABLED(CONFIG_EFI))
@@ -556,7 +556,7 @@ static bool memremap_should_map_decrypted(resource_size_t phys_addr,
 	case E820_TYPE_NVS:
 	case E820_TYPE_UNUSABLE:
 		/* For SEV, these areas are encrypted */
-		if (sev_active())
+		if (prot_guest_has(PATTR_GUEST_MEM_ENCRYPT))
 			break;
 		fallthrough;
 
diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index 5635ca9a1fbe..83bc928f529e 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -194,7 +194,7 @@ void __init sme_early_init(void)
 	for (i = 0; i < ARRAY_SIZE(protection_map); i++)
 		protection_map[i] = pgprot_encrypted(protection_map[i]);
 
-	if (sev_active())
+	if (amd_prot_guest_has(PATTR_SEV))
 		swiotlb_force = SWIOTLB_FORCE;
 }
 
@@ -203,7 +203,7 @@ void __init sev_setup_arch(void)
 	phys_addr_t total_mem = memblock_phys_mem_size();
 	unsigned long size;
 
-	if (!sev_active())
+	if (!amd_prot_guest_has(PATTR_SEV))
 		return;
 
 	/*
@@ -373,7 +373,7 @@ int __init early_set_memory_encrypted(unsigned long vaddr, unsigned long size)
  * up under SME the trampoline area cannot be encrypted, whereas under SEV
  * the trampoline area must be encrypted.
  */
-bool sev_active(void)
+static bool sev_active(void)
 {
 	return sev_status & MSR_AMD64_SEV_ENABLED;
 }
@@ -382,7 +382,6 @@ static bool sme_active(void)
 {
 	return sme_me_mask && !sev_active();
 }
-EXPORT_SYMBOL_GPL(sev_active);
 
 /* Needs to be called from non-instrumentable code */
 bool noinstr sev_es_active(void)
@@ -420,7 +419,7 @@ bool force_dma_unencrypted(struct device *dev)
 	/*
 	 * For SEV, all DMA must be to unencrypted addresses.
 	 */
-	if (sev_active())
+	if (amd_prot_guest_has(PATTR_SEV))
 		return true;
 
 	/*
@@ -479,7 +478,7 @@ static void print_mem_encrypt_feature_info(void)
 	}
 
 	/* Secure Encrypted Virtualization */
-	if (sev_active())
+	if (amd_prot_guest_has(PATTR_SEV))
 		pr_cont(" SEV");
 
 	/* Encrypted Register State */
@@ -502,7 +501,7 @@ void __init mem_encrypt_init(void)
 	 * With SEV, we need to unroll the rep string I/O instructions,
 	 * but SEV-ES supports them through the #VC handler.
 	 */
-	if (sev_active() && !sev_es_active())
+	if (amd_prot_guest_has(PATTR_SEV) && !sev_es_active())
 		static_branch_enable(&sev_enable_key);
 
 	print_mem_encrypt_feature_info();
@@ -510,6 +509,6 @@ void __init mem_encrypt_init(void)
 
 int arch_has_restricted_virtio_memory_access(void)
 {
-	return sev_active();
+	return amd_prot_guest_has(PATTR_SEV);
 }
 EXPORT_SYMBOL_GPL(arch_has_restricted_virtio_memory_access);
diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c
index 7515e78ef898..94737fcc1e21 100644
--- a/arch/x86/platform/efi/efi_64.c
+++ b/arch/x86/platform/efi/efi_64.c
@@ -33,7 +33,7 @@
 #include <linux/reboot.h>
 #include <linux/slab.h>
 #include <linux/ucs2_string.h>
-#include <linux/mem_encrypt.h>
+#include <linux/protected_guest.h>
 #include <linux/sched/task.h>
 
 #include <asm/setup.h>
@@ -284,7 +284,8 @@ static void __init __map_region(efi_memory_desc_t *md, u64 va)
 	if (!(md->attribute & EFI_MEMORY_WB))
 		flags |= _PAGE_PCD;
 
-	if (sev_active() && md->type != EFI_MEMORY_MAPPED_IO)
+	if (prot_guest_has(PATTR_GUEST_MEM_ENCRYPT) &&
+	    md->type != EFI_MEMORY_MAPPED_IO)
 		flags |= _PAGE_ENC;
 
 	pfn = md->phys_addr >> PAGE_SHIFT;
@@ -390,7 +391,7 @@ static int __init efi_update_mem_attr(struct mm_struct *mm, efi_memory_desc_t *m
 	if (!(md->attribute & EFI_MEMORY_RO))
 		pf |= _PAGE_RW;
 
-	if (sev_active())
+	if (prot_guest_has(PATTR_GUEST_MEM_ENCRYPT))
 		pf |= _PAGE_ENC;
 
 	return efi_update_mappings(md, pf);
@@ -438,7 +439,7 @@ void __init efi_runtime_update_mappings(void)
 			(md->type != EFI_RUNTIME_SERVICES_CODE))
 			pf |= _PAGE_RW;
 
-		if (sev_active())
+		if (prot_guest_has(PATTR_GUEST_MEM_ENCRYPT))
 			pf |= _PAGE_ENC;
 
 		efi_update_mappings(md, pf);
-- 
2.32.0


^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox