* [PATCH] pci: pcie: AER: Fix logging of Correctable errors
From: Matt Jolly @ 2020-06-18 15:55 UTC (permalink / raw)
To: Russell Currey, Sam Bobroff, Oliver O'Halloran, Bjorn Helgaas,
linuxppc-dev, linux-pci, linux-kernel
Cc: Matt Jolly
The AER documentation indicates that correctable (severity=Corrected)
errors should be output as a warning so that users can filter these
errors if they choose to; This functionality does not appear to have been implemented.
This patch modifies the functions aer_print_error and __aer_print_error
to send correctable errors as a warning (pci_warn), rather than as an error (pci_err). It
partially addresses several bugs in relation to kernel message buffer
spam for misbehaving devices - the root cause (possibly device firmware?) isn't
addressed, but the dmesg output is less alarming for end users, and can
be filtered separately from uncorrectable errors. This should hopefully
reduce the need for users to disable AER to suppress corrected errors.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=201517
Link: https://bugzilla.kernel.org/show_bug.cgi?id=196183
Signed-off-by: Matt Jolly <Kangie@footclan.ninja>
---
| 36 ++++++++++++++++++++++++++----------
1 file changed, 26 insertions(+), 10 deletions(-)
--git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index 3acf56683915..131ecc0df2cb 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -662,12 +662,18 @@ static void __aer_print_error(struct pci_dev *dev,
errmsg = i < ARRAY_SIZE(aer_uncorrectable_error_string) ?
aer_uncorrectable_error_string[i] : NULL;
- if (errmsg)
- pci_err(dev, " [%2d] %-22s%s\n", i, errmsg,
- info->first_error == i ? " (First)" : "");
- else
+ if (errmsg) {
+ if (info->severity == AER_CORRECTABLE) {
+ pci_warn(dev, " [%2d] %-22s%s\n", i, errmsg,
+ info->first_error == i ? " (First)" : "");
+ } else {
+ pci_err(dev, " [%2d] %-22s%s\n", i, errmsg,
+ info->first_error == i ? " (First)" : "");
+ }
+ } else {
pci_err(dev, " [%2d] Unknown Error Bit%s\n",
i, info->first_error == i ? " (First)" : "");
+ }
}
pci_dev_aer_stats_incr(dev, info);
}
@@ -686,13 +692,23 @@ void aer_print_error(struct pci_dev *dev, struct aer_err_info *info)
layer = AER_GET_LAYER_ERROR(info->severity, info->status);
agent = AER_GET_AGENT(info->severity, info->status);
- pci_err(dev, "PCIe Bus Error: severity=%s, type=%s, (%s)\n",
- aer_error_severity_string[info->severity],
- aer_error_layer[layer], aer_agent_string[agent]);
+ if (info->severity == AER_CORRECTABLE) {
+ pci_warn(dev, "PCIe Bus Error: severity=%s, type=%s, (%s)\n",
+ aer_error_severity_string[info->severity],
+ aer_error_layer[layer], aer_agent_string[agent]);
- pci_err(dev, " device [%04x:%04x] error status/mask=%08x/%08x\n",
- dev->vendor, dev->device,
- info->status, info->mask);
+ pci_warn(dev, " device [%04x:%04x] error status/mask=%08x/%08x\n",
+ dev->vendor, dev->device,
+ info->status, info->mask);
+ } else {
+ pci_err(dev, "PCIe Bus Error: severity=%s, type=%s, (%s)\n",
+ aer_error_severity_string[info->severity],
+ aer_error_layer[layer], aer_agent_string[agent]);
+
+ pci_err(dev, " device [%04x:%04x] error status/mask=%08x/%08x\n",
+ dev->vendor, dev->device,
+ info->status, info->mask);
+ }
__aer_print_error(dev, info);
--
2.26.2
^ permalink raw reply related
* Re: rename probe_kernel_* and probe_user_*
From: Helge Deller @ 2020-06-18 20:51 UTC (permalink / raw)
To: Linus Torvalds, Christoph Hellwig, Russell King, Tony Luck,
Michael Ellerman
Cc: linux-arch, linux-ia64, linux-parisc, the arch/x86 maintainers,
Linux Kernel Mailing List, Andrew Morton, linuxppc-dev
In-Reply-To: <CAHk-=wjpnu=882iD9ck9Ywt6R1LYX_Hv-oS7dBMsWZwDRGZ5jA@mail.gmail.com>
On 18.06.20 21:48, Linus Torvalds wrote:
> [ Explicitly added architecture lists and developers to the cc to make
> this more visible ]
>
> On Wed, Jun 17, 2020 at 12:38 AM Christoph Hellwig <hch@lst.de> wrote:
>>
>> Andrew and I decided to drop the patches implementing your suggested
>> rename of the probe_kernel_* and probe_user_* helpers from -mm as there
>> were way to many conflicts. After -rc1 might be a good time for this as
>> all the conflicts are resolved now.
>
> So I've merged this renaming now, together with my changes to make
> 'get_kernel_nofault()' look and act a lot more like 'get_user()'.
>
> It just felt wrong (and potentially dangerous) to me to have a
> 'get_kernel_nofault()' naming that implied semantics that we're all
> familiar with from 'get_user()', but acting very differently.
>
> But part of the fixups I made for the type checking are for
> architectures where I didn't even compile-test the end result. I
> looked at every case individually, and the patch looks sane, but I
> could have screwed something up.
>
> Basically, 'get_kernel_nofault()' doesn't do the same automagic type
> munging from the pointer to the target that 'get_user()' does, but at
> least now it checks that the types are superficially compatible.
> There should be build failures if they aren't, but I hopefully fixed
> everything up properly for all architectures.
>
> This email is partly to ask people to double-check, but partly just as
> a heads-up so that _if_ I screwed something up, you'll have the
> background and it won't take you by surprise.
Linus. thanks for the heads-up!
With your change it compiles cleanly on 32- and 64-bit parisc.
Helge
^ permalink raw reply
* Re: [PATCH net] ibmveth: Fix max MTU limit
From: Thomas Falcon @ 2020-06-18 20:51 UTC (permalink / raw)
To: Jakub Kicinski; +Cc: netdev, linuxppc-dev
In-Reply-To: <20200618085722.110f3702@kicinski-fedora-PC1C0HJN>
On 6/18/20 10:57 AM, Jakub Kicinski wrote:
> On Thu, 18 Jun 2020 10:43:46 -0500 Thomas Falcon wrote:
>> The max MTU limit defined for ibmveth is not accounting for
>> virtual ethernet buffer overhead, which is twenty-two additional
>> bytes set aside for the ethernet header and eight additional bytes
>> of an opaque handle reserved for use by the hypervisor. Update the
>> max MTU to reflect this overhead.
>>
>> Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com>
> How about
>
> Fixes: d894be57ca92 ("ethernet: use net core MTU range checking in more drivers")
> Fixes: 110447f8269a ("ethernet: fix min/max MTU typos")
>
> ?
Thanks, do you need me to send a v2 with those tags?
Tom
^ permalink raw reply
* Re: [PATCH v2 0/4] Migrate non-migrated pages of a SVM.
From: Ram Pai @ 2020-06-18 20:37 UTC (permalink / raw)
To: kvm-ppc, linuxppc-dev
Cc: ldufour, cclaudio, bharata, sathnaga, aneesh.kumar, sukadev,
bauerman, david
In-Reply-To: <1592471945-24786-1-git-send-email-linuxram@us.ibm.com>
I should have elaborated on the problem and the need for these patches.
Explaining it here. Will add it to the series in next version.
-------------------------------------------------------------
The time taken to switch a VM to Secure-VM, increases by the size of
the VM. A 100GB VM takes about 7minutes. This is unacceptable. This
linear increase is caused by a suboptimal behavior by the Ultravisor and
the Hypervisor. The Ultravisor unnecessarily migrates all the GFN of
the VM from normal-memory to secure-memory. It has to just migrate the
necessary and sufficient GFNs.
However when the optimization is incorporated in the Ultravisor, the
Hypervisor starts misbehaving. The Hypervisor has a inbuilt assumption
that the Ultravisor will explicitly request to migrate, each and every
GFN of the VM. If only necessary and sufficient GFNs are requested for
migration, the Hypervisor continues to manage the rest of the GFNs are
normal GFNs. This leads of memory corruption, manifested consistently
when the SVM reboots.
The same is true, when a memory slot is hotplugged into a SVM. The
Hypervisor expects the ultravisor to request migration of all GFNs to
secure-GFN. But at the same time the hypervisor is unable to handle any
H_SVM_PAGE_IN requests from the Ultravisor, done in the context of
UV_REGISTER_MEM_SLOT ucall. This problem manifests as random errors in
the SVM, when a memory-slot is hotplugged.
This patch series automatically migrates the non-migrated pages of a SVM,
and thus solves the problem.
------------------------------------------------------------------
On Thu, Jun 18, 2020 at 02:19:01AM -0700, Ram Pai wrote:
> This patch series migrates the non-migrated pages of a SVM.
> This is required when the UV calls H_SVM_INIT_DONE, and
> when a memory-slot is hotplugged to a Secure VM.
>
> Testing: Passed rigorous SVM reboot test using different
> sized SVMs.
>
> Changelog:
> . fixed a bug observed by Bharata. Pages that
> where paged-in and later paged-out must also be
> skipped from migration during H_SVM_INIT_DONE.
>
> Laurent Dufour (1):
> KVM: PPC: Book3S HV: migrate hot plugged memory
>
> Ram Pai (3):
> KVM: PPC: Book3S HV: Fix function definition in book3s_hv_uvmem.c
> KVM: PPC: Book3S HV: track the state GFNs associated with secure VMs
> KVM: PPC: Book3S HV: migrate remaining normal-GFNs to secure-GFNs in
> H_SVM_INIT_DONE
>
> Documentation/powerpc/ultravisor.rst | 2 +
> arch/powerpc/include/asm/kvm_book3s_uvmem.h | 8 +-
> arch/powerpc/kvm/book3s_64_mmu_radix.c | 2 +-
> arch/powerpc/kvm/book3s_hv.c | 12 +-
> arch/powerpc/kvm/book3s_hv_uvmem.c | 449 ++++++++++++++++++++++------
> 5 files changed, 368 insertions(+), 105 deletions(-)
>
> --
> 1.8.3.1
--
Ram Pai
^ permalink raw reply
* Re: rename probe_kernel_* and probe_user_*
From: Linus Torvalds @ 2020-06-18 19:48 UTC (permalink / raw)
To: Christoph Hellwig, Russell King, Tony Luck, Helge Deller,
Michael Ellerman
Cc: linux-arch, linux-ia64, linux-parisc, the arch/x86 maintainers,
Linux Kernel Mailing List, Andrew Morton, linuxppc-dev
In-Reply-To: <20200617073755.8068-1-hch@lst.de>
[ Explicitly added architecture lists and developers to the cc to make
this more visible ]
On Wed, Jun 17, 2020 at 12:38 AM Christoph Hellwig <hch@lst.de> wrote:
>
> Andrew and I decided to drop the patches implementing your suggested
> rename of the probe_kernel_* and probe_user_* helpers from -mm as there
> were way to many conflicts. After -rc1 might be a good time for this as
> all the conflicts are resolved now.
So I've merged this renaming now, together with my changes to make
'get_kernel_nofault()' look and act a lot more like 'get_user()'.
It just felt wrong (and potentially dangerous) to me to have a
'get_kernel_nofault()' naming that implied semantics that we're all
familiar with from 'get_user()', but acting very differently.
But part of the fixups I made for the type checking are for
architectures where I didn't even compile-test the end result. I
looked at every case individually, and the patch looks sane, but I
could have screwed something up.
Basically, 'get_kernel_nofault()' doesn't do the same automagic type
munging from the pointer to the target that 'get_user()' does, but at
least now it checks that the types are superficially compatible.
There should be build failures if they aren't, but I hopefully fixed
everything up properly for all architectures.
This email is partly to ask people to double-check, but partly just as
a heads-up so that _if_ I screwed something up, you'll have the
background and it won't take you by surprise.
Linus
^ permalink raw reply
* Re: [PATCH v2 02/12] ocxl: Change type of pasid to unsigned int
From: Frederic Barrat @ 2020-06-18 16:56 UTC (permalink / raw)
To: Fenghua Yu
Cc: Dave Hansen, H Peter Anvin, Dave Jiang, Ashok Raj, Joerg Roedel,
x86, amd-gfx, Ingo Molnar, Ravi V Shankar, Yu-cheng Yu,
Andrew Donnellan, Borislav Petkov, Sohil Mehta, Thomas Gleixner,
Tony Luck, David Woodhouse, Felix Kuehling, linux-kernel, iommu,
Jacob Jun Pan, linuxppc-dev, Lu Baolu
In-Reply-To: <20200618153747.GE15763@romley-ivt3.sc.intel.com>
Le 18/06/2020 à 17:37, Fenghua Yu a écrit :
> The first 3 patches clean up pasid and flag defitions to prepare for
> following patches.
>
> If you think this patch can be dropped, we will drop it.
Yes, I think that's the case.
Thanks,
Fred
^ permalink raw reply
* [PATCH v1 5/5] KVM: PPC: Book3S HV: Use H_RPT_INVALIDATE in nested KVM
From: Bharata B Rao @ 2020-06-18 16:09 UTC (permalink / raw)
To: linuxppc-dev; +Cc: aneesh.kumar, Bharata B Rao, npiggin
In-Reply-To: <20200618160930.26324-1-bharata@linux.ibm.com>
In the nested KVM case, replace H_TLB_INVALIDATE by the new hcall
H_RPT_INVALIDATE if available. The availability of this hcall
is determined from "hcall-rpt-invalidate" string in ibm,hypertas-functions
DT property.
Signed-off-by: Bharata B Rao <bharata@linux.ibm.com>
---
arch/powerpc/include/asm/firmware.h | 4 +++-
arch/powerpc/kvm/book3s_64_mmu_radix.c | 27 ++++++++++++++++++-----
arch/powerpc/kvm/book3s_hv_nested.c | 13 +++++++++--
arch/powerpc/platforms/pseries/firmware.c | 1 +
4 files changed, 36 insertions(+), 9 deletions(-)
diff --git a/arch/powerpc/include/asm/firmware.h b/arch/powerpc/include/asm/firmware.h
index 6003c2e533a0..aa6a5ef5d483 100644
--- a/arch/powerpc/include/asm/firmware.h
+++ b/arch/powerpc/include/asm/firmware.h
@@ -52,6 +52,7 @@
#define FW_FEATURE_PAPR_SCM ASM_CONST(0x0000002000000000)
#define FW_FEATURE_ULTRAVISOR ASM_CONST(0x0000004000000000)
#define FW_FEATURE_STUFF_TCE ASM_CONST(0x0000008000000000)
+#define FW_FEATURE_RPT_INVALIDATE ASM_CONST(0x0000010000000000)
#ifndef __ASSEMBLY__
@@ -71,7 +72,8 @@ enum {
FW_FEATURE_TYPE1_AFFINITY | FW_FEATURE_PRRN |
FW_FEATURE_HPT_RESIZE | FW_FEATURE_DRMEM_V2 |
FW_FEATURE_DRC_INFO | FW_FEATURE_BLOCK_REMOVE |
- FW_FEATURE_PAPR_SCM | FW_FEATURE_ULTRAVISOR,
+ FW_FEATURE_PAPR_SCM | FW_FEATURE_ULTRAVISOR |
+ FW_FEATURE_RPT_INVALIDATE,
FW_FEATURE_PSERIES_ALWAYS = 0,
FW_FEATURE_POWERNV_POSSIBLE = FW_FEATURE_OPAL | FW_FEATURE_ULTRAVISOR,
FW_FEATURE_POWERNV_ALWAYS = 0,
diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c b/arch/powerpc/kvm/book3s_64_mmu_radix.c
index 84acb4769487..fcf8b031a32e 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
@@ -21,6 +21,7 @@
#include <asm/pte-walk.h>
#include <asm/ultravisor.h>
#include <asm/kvm_book3s_uvmem.h>
+#include <asm/plpar_wrappers.h>
/*
* Supported radix tree geometry.
@@ -313,10 +314,17 @@ void kvmppc_radix_tlbie_page(struct kvm *kvm, unsigned long addr,
return;
}
- psi = shift_to_mmu_psize(pshift);
- rb = addr | (mmu_get_ap(psi) << PPC_BITLSHIFT(58));
- rc = plpar_hcall_norets(H_TLB_INVALIDATE, H_TLBIE_P1_ENC(0, 0, 1),
- lpid, rb);
+ if (!firmware_has_feature(FW_FEATURE_RPT_INVALIDATE)) {
+ psi = shift_to_mmu_psize(pshift);
+ rb = addr | (mmu_get_ap(psi) << PPC_BITLSHIFT(58));
+ rc = plpar_hcall_norets(H_TLB_INVALIDATE,
+ H_TLBIE_P1_ENC(0, 0, 1), lpid, rb);
+ } else {
+ rc = pseries_rpt_invalidate(lpid, H_RPTI_TARGET_CMMU,
+ H_RPTI_TYPE_NESTED |
+ H_RPTI_TYPE_TLB, H_RPTI_PAGE_ALL,
+ addr, addr + psize);
+ }
if (rc)
pr_err("KVM: TLB page invalidation hcall failed, rc=%ld\n", rc);
}
@@ -330,8 +338,15 @@ static void kvmppc_radix_flush_pwc(struct kvm *kvm, unsigned int lpid)
return;
}
- rc = plpar_hcall_norets(H_TLB_INVALIDATE, H_TLBIE_P1_ENC(1, 0, 1),
- lpid, TLBIEL_INVAL_SET_LPID);
+ if (!firmware_has_feature(FW_FEATURE_RPT_INVALIDATE))
+ rc = plpar_hcall_norets(H_TLB_INVALIDATE,
+ H_TLBIE_P1_ENC(1, 0, 1),
+ lpid, TLBIEL_INVAL_SET_LPID);
+ else
+ rc = pseries_rpt_invalidate(lpid, H_RPTI_TARGET_CMMU,
+ H_RPTI_TYPE_NESTED |
+ H_RPTI_TYPE_PWC, H_RPTI_PAGE_ALL,
+ 0, -1UL);
if (rc)
pr_err("KVM: TLB PWC invalidation hcall failed, rc=%ld\n", rc);
}
diff --git a/arch/powerpc/kvm/book3s_hv_nested.c b/arch/powerpc/kvm/book3s_hv_nested.c
index 75993f44519b..81f903284d34 100644
--- a/arch/powerpc/kvm/book3s_hv_nested.c
+++ b/arch/powerpc/kvm/book3s_hv_nested.c
@@ -19,6 +19,7 @@
#include <asm/pgalloc.h>
#include <asm/pte-walk.h>
#include <asm/reg.h>
+#include <asm/plpar_wrappers.h>
static struct patb_entry *pseries_partition_tb;
@@ -402,8 +403,16 @@ static void kvmhv_flush_lpid(unsigned int lpid)
return;
}
- rc = plpar_hcall_norets(H_TLB_INVALIDATE, H_TLBIE_P1_ENC(2, 0, 1),
- lpid, TLBIEL_INVAL_SET_LPID);
+ if (!firmware_has_feature(FW_FEATURE_RPT_INVALIDATE))
+ rc = plpar_hcall_norets(H_TLB_INVALIDATE,
+ H_TLBIE_P1_ENC(2, 0, 1),
+ lpid, TLBIEL_INVAL_SET_LPID);
+ else
+ rc = pseries_rpt_invalidate(lpid, H_RPTI_TARGET_CMMU,
+ H_RPTI_TYPE_NESTED |
+ H_RPTI_TYPE_TLB | H_RPTI_TYPE_PWC |
+ H_RPTI_TYPE_PAT,
+ H_RPTI_PAGE_ALL, 0, -1UL);
if (rc)
pr_err("KVM: TLB LPID invalidation hcall failed, rc=%ld\n", rc);
}
diff --git a/arch/powerpc/platforms/pseries/firmware.c b/arch/powerpc/platforms/pseries/firmware.c
index 3e49cc23a97a..4c7b7f5a2ebc 100644
--- a/arch/powerpc/platforms/pseries/firmware.c
+++ b/arch/powerpc/platforms/pseries/firmware.c
@@ -65,6 +65,7 @@ hypertas_fw_features_table[] = {
{FW_FEATURE_HPT_RESIZE, "hcall-hpt-resize"},
{FW_FEATURE_BLOCK_REMOVE, "hcall-block-remove"},
{FW_FEATURE_PAPR_SCM, "hcall-scm"},
+ {FW_FEATURE_RPT_INVALIDATE, "hcall-rpt-invalidate"},
};
/* Build up the firmware features bitmask using the contents of
--
2.21.3
^ permalink raw reply related
* [PATCH v1 4/5] powerpc/mm/book3s64/radix: Off-load TLB invalidations to host when !GTSE
From: Bharata B Rao @ 2020-06-18 16:09 UTC (permalink / raw)
To: linuxppc-dev; +Cc: aneesh.kumar, Bharata B Rao, npiggin
In-Reply-To: <20200618160930.26324-1-bharata@linux.ibm.com>
From: Nicholas Piggin <npiggin@gmail.com>
When platform doesn't support GTSE, let TLB invalidation requests
for radix guests be off-loaded to the host using H_RPT_INVALIDATE
hcall.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Bharata B Rao <bharata@linux.ibm.com>
[hcall wrapper, error path handling and renames]
---
arch/powerpc/include/asm/hvcall.h | 27 ++++++-
arch/powerpc/include/asm/plpar_wrappers.h | 52 +++++++++++++
arch/powerpc/mm/book3s64/radix_tlb.c | 95 +++++++++++++++++++++--
3 files changed, 166 insertions(+), 8 deletions(-)
diff --git a/arch/powerpc/include/asm/hvcall.h b/arch/powerpc/include/asm/hvcall.h
index e90c073e437e..3f9bc7ad1cdd 100644
--- a/arch/powerpc/include/asm/hvcall.h
+++ b/arch/powerpc/include/asm/hvcall.h
@@ -305,7 +305,8 @@
#define H_SCM_UNBIND_ALL 0x3FC
#define H_SCM_HEALTH 0x400
#define H_SCM_PERFORMANCE_STATS 0x418
-#define MAX_HCALL_OPCODE H_SCM_PERFORMANCE_STATS
+#define H_RPT_INVALIDATE 0x448
+#define MAX_HCALL_OPCODE H_RPT_INVALIDATE
/* Scope args for H_SCM_UNBIND_ALL */
#define H_UNBIND_SCOPE_ALL (0x1)
@@ -389,6 +390,30 @@
#define PROC_TABLE_RADIX 0x04
#define PROC_TABLE_GTSE 0x01
+/*
+ * Defines for
+ * H_RPT_INVALIDATE - Invalidate RPT translation lookaside information.
+ */
+
+/* Type of translation to invalidate (type) */
+#define H_RPTI_TYPE_NESTED 0x0001 /* Invalidate nested guest partition-scope */
+#define H_RPTI_TYPE_TLB 0x0002 /* Invalidate TLB */
+#define H_RPTI_TYPE_PWC 0x0004 /* Invalidate Page Walk Cache */
+#define H_RPTI_TYPE_PRT 0x0008 /* Invalidate Process Table Entries if H_RPTI_TYPE_NESTED is clear */
+#define H_RPTI_TYPE_PAT 0x0008 /* Invalidate Partition Table Entries if H_RPTI_TYPE_NESTED is set */
+
+/* Invalidation targets (target) */
+#define H_RPTI_TARGET_CMMU 0x01 /* All virtual processors in the partition */
+#define H_RPTI_TARGET_CMMU_LOCAL 0x02 /* Current virtual processor */
+#define H_RPTI_TARGET_NMMU 0x04 /* All nest/accelerator agents in use by the partition */
+
+/* Page size mask (page sizes) */
+#define H_RPTI_PAGE_4K 0x01
+#define H_RPTI_PAGE_64K 0x02
+#define H_RPTI_PAGE_2M 0x04
+#define H_RPTI_PAGE_1G 0x08
+#define H_RPTI_PAGE_ALL (-1UL)
+
#ifndef __ASSEMBLY__
#include <linux/types.h>
diff --git a/arch/powerpc/include/asm/plpar_wrappers.h b/arch/powerpc/include/asm/plpar_wrappers.h
index 4497c8afb573..92320bb309c7 100644
--- a/arch/powerpc/include/asm/plpar_wrappers.h
+++ b/arch/powerpc/include/asm/plpar_wrappers.h
@@ -334,6 +334,51 @@ static inline long plpar_get_cpu_characteristics(struct h_cpu_char_result *p)
return rc;
}
+/*
+ * Wrapper to H_RPT_INVALIDATE hcall that handles return values appropriately
+ *
+ * - Returns H_SUCCESS on success
+ * - For H_BUSY return value, we retry the hcall.
+ * - For any other hcall failures, attempt a full flush once before
+ * resorting to BUG().
+ *
+ * Note: This hcall is expected to fail only very rarely. The correct
+ * error recovery of killing the process/guest will be eventually
+ * needed.
+ */
+static inline long pseries_rpt_invalidate(u32 pid, u64 target, u64 type,
+ u64 page_sizes, u64 start, u64 end)
+{
+ long rc;
+ unsigned long all = H_RPTI_TYPE_TLB | H_RPTI_TYPE_PWC;
+
+ while (true) {
+ rc = plpar_hcall_norets(H_RPT_INVALIDATE, pid, target, type,
+ page_sizes, start, end);
+ if (rc == H_BUSY) {
+ cpu_relax();
+ continue;
+ } else if (rc == H_SUCCESS)
+ return rc;
+
+ /* Flush request failed, try with a full flush once */
+ if (type & H_RPTI_TYPE_NESTED)
+ all |= H_RPTI_TYPE_PAT;
+ else
+ all |= H_RPTI_TYPE_PRT;
+retry:
+ rc = plpar_hcall_norets(H_RPT_INVALIDATE, pid, target,
+ all, page_sizes, 0, -1UL);
+ if (rc == H_BUSY) {
+ cpu_relax();
+ goto retry;
+ } else if (rc == H_SUCCESS)
+ return rc;
+
+ BUG();
+ }
+}
+
#else /* !CONFIG_PPC_PSERIES */
static inline long plpar_set_ciabr(unsigned long ciabr)
@@ -346,6 +391,13 @@ static inline long plpar_pte_read_4(unsigned long flags, unsigned long ptex,
{
return 0;
}
+
+static inline long pseries_rpt_invalidate(u32 pid, u64 target, u64 type,
+ u64 page_sizes, u64 start, u64 end)
+{
+ return 0;
+}
+
#endif /* CONFIG_PPC_PSERIES */
#endif /* _ASM_POWERPC_PLPAR_WRAPPERS_H */
diff --git a/arch/powerpc/mm/book3s64/radix_tlb.c b/arch/powerpc/mm/book3s64/radix_tlb.c
index b5cc9b23cf02..733935b68f37 100644
--- a/arch/powerpc/mm/book3s64/radix_tlb.c
+++ b/arch/powerpc/mm/book3s64/radix_tlb.c
@@ -16,11 +16,25 @@
#include <asm/tlbflush.h>
#include <asm/trace.h>
#include <asm/cputhreads.h>
+#include <asm/plpar_wrappers.h>
#define RIC_FLUSH_TLB 0
#define RIC_FLUSH_PWC 1
#define RIC_FLUSH_ALL 2
+static inline u64 psize_to_h_rpti(unsigned long psize)
+{
+ if (psize == MMU_PAGE_4K)
+ return H_RPTI_PAGE_4K;
+ if (psize == MMU_PAGE_64K)
+ return H_RPTI_PAGE_64K;
+ if (psize == MMU_PAGE_2M)
+ return H_RPTI_PAGE_2M;
+ if (psize == MMU_PAGE_1G)
+ return H_RPTI_PAGE_1G;
+ return H_RPTI_PAGE_ALL;
+}
+
/*
* tlbiel instruction for radix, set invalidation
* i.e., r=1 and is=01 or is=10 or is=11
@@ -694,7 +708,14 @@ void radix__flush_tlb_mm(struct mm_struct *mm)
goto local;
}
- if (cputlb_use_tlbie()) {
+ if (!mmu_has_feature(MMU_FTR_GTSE)) {
+ unsigned long tgt = H_RPTI_TARGET_CMMU;
+
+ if (atomic_read(&mm->context.copros) > 0)
+ tgt |= H_RPTI_TARGET_NMMU;
+ pseries_rpt_invalidate(pid, tgt, H_RPTI_TYPE_TLB,
+ H_RPTI_PAGE_ALL, 0, -1UL);
+ } else if (cputlb_use_tlbie()) {
if (mm_needs_flush_escalation(mm))
_tlbie_pid(pid, RIC_FLUSH_ALL);
else
@@ -727,7 +748,16 @@ static void __flush_all_mm(struct mm_struct *mm, bool fullmm)
goto local;
}
}
- if (cputlb_use_tlbie())
+ if (!mmu_has_feature(MMU_FTR_GTSE)) {
+ unsigned long tgt = H_RPTI_TARGET_CMMU;
+ unsigned long type = H_RPTI_TYPE_TLB | H_RPTI_TYPE_PWC |
+ H_RPTI_TYPE_PRT;
+
+ if (atomic_read(&mm->context.copros) > 0)
+ tgt |= H_RPTI_TARGET_NMMU;
+ pseries_rpt_invalidate(pid, tgt, type,
+ H_RPTI_PAGE_ALL, 0, -1UL);
+ } else if (cputlb_use_tlbie())
_tlbie_pid(pid, RIC_FLUSH_ALL);
else
_tlbiel_pid_multicast(mm, pid, RIC_FLUSH_ALL);
@@ -760,7 +790,19 @@ void radix__flush_tlb_page_psize(struct mm_struct *mm, unsigned long vmaddr,
exit_flush_lazy_tlbs(mm);
goto local;
}
- if (cputlb_use_tlbie())
+ if (!mmu_has_feature(MMU_FTR_GTSE)) {
+ unsigned long tgt, page_sizes, size;
+
+ tgt = H_RPTI_TARGET_CMMU;
+ page_sizes = psize_to_h_rpti(psize);
+ size = 1UL << mmu_psize_to_shift(psize);
+
+ if (atomic_read(&mm->context.copros) > 0)
+ tgt |= H_RPTI_TARGET_NMMU;
+ pseries_rpt_invalidate(pid, tgt, H_RPTI_TYPE_TLB,
+ page_sizes, vmaddr,
+ vmaddr + size);
+ } else if (cputlb_use_tlbie())
_tlbie_va(vmaddr, pid, psize, RIC_FLUSH_TLB);
else
_tlbiel_va_multicast(mm, vmaddr, pid, psize, RIC_FLUSH_TLB);
@@ -810,7 +852,14 @@ static inline void _tlbiel_kernel_broadcast(void)
*/
void radix__flush_tlb_kernel_range(unsigned long start, unsigned long end)
{
- if (cputlb_use_tlbie())
+ if (!mmu_has_feature(MMU_FTR_GTSE)) {
+ unsigned long tgt = H_RPTI_TARGET_CMMU | H_RPTI_TARGET_NMMU;
+ unsigned long type = H_RPTI_TYPE_TLB | H_RPTI_TYPE_PWC |
+ H_RPTI_TYPE_PRT;
+
+ pseries_rpt_invalidate(0, tgt, type, H_RPTI_PAGE_ALL,
+ start, end);
+ } else if (cputlb_use_tlbie())
_tlbie_pid(0, RIC_FLUSH_ALL);
else
_tlbiel_kernel_broadcast();
@@ -864,7 +913,17 @@ static inline void __radix__flush_tlb_range(struct mm_struct *mm,
nr_pages > tlb_local_single_page_flush_ceiling);
}
- if (full) {
+ if (!mmu_has_feature(MMU_FTR_GTSE) && !local) {
+ unsigned long tgt = H_RPTI_TARGET_CMMU;
+ unsigned long page_sizes = psize_to_h_rpti(mmu_virtual_psize);
+
+ if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE))
+ page_sizes |= psize_to_h_rpti(MMU_PAGE_2M);
+ if (atomic_read(&mm->context.copros) > 0)
+ tgt |= H_RPTI_TARGET_NMMU;
+ pseries_rpt_invalidate(pid, tgt, H_RPTI_TYPE_TLB, page_sizes,
+ start, end);
+ } else if (full) {
if (local) {
_tlbiel_pid(pid, RIC_FLUSH_TLB);
} else {
@@ -1046,7 +1105,17 @@ static __always_inline void __radix__flush_tlb_range_psize(struct mm_struct *mm,
nr_pages > tlb_local_single_page_flush_ceiling);
}
- if (full) {
+ if (!mmu_has_feature(MMU_FTR_GTSE) && !local) {
+ unsigned long tgt = H_RPTI_TARGET_CMMU;
+ unsigned long type = H_RPTI_TYPE_TLB;
+ unsigned long page_sizes = psize_to_h_rpti(psize);
+
+ if (also_pwc)
+ type |= H_RPTI_TYPE_PWC;
+ if (atomic_read(&mm->context.copros) > 0)
+ tgt |= H_RPTI_TARGET_NMMU;
+ pseries_rpt_invalidate(pid, tgt, type, page_sizes, start, end);
+ } else if (full) {
if (local) {
_tlbiel_pid(pid, also_pwc ? RIC_FLUSH_ALL : RIC_FLUSH_TLB);
} else {
@@ -1111,7 +1180,19 @@ void radix__flush_tlb_collapsed_pmd(struct mm_struct *mm, unsigned long addr)
exit_flush_lazy_tlbs(mm);
goto local;
}
- if (cputlb_use_tlbie())
+ if (!mmu_has_feature(MMU_FTR_GTSE)) {
+ unsigned long tgt, type, page_sizes;
+
+ tgt = H_RPTI_TARGET_CMMU;
+ type = H_RPTI_TYPE_TLB | H_RPTI_TYPE_PWC |
+ H_RPTI_TYPE_PRT;
+ page_sizes = psize_to_h_rpti(mmu_virtual_psize);
+
+ if (atomic_read(&mm->context.copros) > 0)
+ tgt |= H_RPTI_TARGET_NMMU;
+ pseries_rpt_invalidate(pid, tgt, type, page_sizes,
+ addr, end);
+ } else if (cputlb_use_tlbie())
_tlbie_va_range(addr, end, pid, PAGE_SIZE, mmu_virtual_psize, true);
else
_tlbiel_va_range_multicast(mm,
--
2.21.3
^ permalink raw reply related
* [PATCH v1 3/5] powerpc/pseries: H_REGISTER_PROC_TBL should ask for GTSE only if enabled
From: Bharata B Rao @ 2020-06-18 16:09 UTC (permalink / raw)
To: linuxppc-dev; +Cc: aneesh.kumar, Bharata B Rao, npiggin
In-Reply-To: <20200618160930.26324-1-bharata@linux.ibm.com>
H_REGISTER_PROC_TBL asks for GTSE by default. GTSE flag bit should
be set only when GTSE is supported.
Signed-off-by: Bharata B Rao <bharata@linux.ibm.com>
---
arch/powerpc/platforms/pseries/lpar.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/arch/powerpc/platforms/pseries/lpar.c b/arch/powerpc/platforms/pseries/lpar.c
index e4ed5317f117..58ba76bc1964 100644
--- a/arch/powerpc/platforms/pseries/lpar.c
+++ b/arch/powerpc/platforms/pseries/lpar.c
@@ -1680,9 +1680,11 @@ static int pseries_lpar_register_process_table(unsigned long base,
if (table_size)
flags |= PROC_TABLE_NEW;
- if (radix_enabled())
- flags |= PROC_TABLE_RADIX | PROC_TABLE_GTSE;
- else
+ if (radix_enabled()) {
+ flags |= PROC_TABLE_RADIX;
+ if (mmu_has_feature(MMU_FTR_GTSE))
+ flags |= PROC_TABLE_GTSE;
+ } else
flags |= PROC_TABLE_HPT_SLB;
for (;;) {
rc = plpar_hcall_norets(H_REGISTER_PROC_TBL, flags, base,
--
2.21.3
^ permalink raw reply related
* [PATCH v1 2/5] powerpc/prom_init: Ask for Radix GTSE only if supported.
From: Bharata B Rao @ 2020-06-18 16:09 UTC (permalink / raw)
To: linuxppc-dev; +Cc: aneesh.kumar, Bharata B Rao, npiggin
In-Reply-To: <20200618160930.26324-1-bharata@linux.ibm.com>
In the case of radix, don't ask for GTSE by default but ask
only if GTSE is enabled.
Signed-off-by: Bharata B Rao <bharata@linux.ibm.com>
---
arch/powerpc/kernel/prom_init.c | 13 ++++++++-----
1 file changed, 8 insertions(+), 5 deletions(-)
diff --git a/arch/powerpc/kernel/prom_init.c b/arch/powerpc/kernel/prom_init.c
index 5f15b10eb007..16dd14f58ba6 100644
--- a/arch/powerpc/kernel/prom_init.c
+++ b/arch/powerpc/kernel/prom_init.c
@@ -1336,12 +1336,15 @@ static void __init prom_check_platform_support(void)
}
}
- if (supported.radix_mmu && supported.radix_gtse &&
- IS_ENABLED(CONFIG_PPC_RADIX_MMU)) {
- /* Radix preferred - but we require GTSE for now */
- prom_debug("Asking for radix with GTSE\n");
+ if (supported.radix_mmu && IS_ENABLED(CONFIG_PPC_RADIX_MMU)) {
+ /* Radix preferred - Check if GTSE is also supported */
+ prom_debug("Asking for radix\n");
ibm_architecture_vec.vec5.mmu = OV5_FEAT(OV5_MMU_RADIX);
- ibm_architecture_vec.vec5.radix_ext = OV5_FEAT(OV5_RADIX_GTSE);
+ if (supported.radix_gtse)
+ ibm_architecture_vec.vec5.radix_ext =
+ OV5_FEAT(OV5_RADIX_GTSE);
+ else
+ prom_debug("Radix GTSE isn't supported\n");
} else if (supported.hash_mmu) {
/* Default to hash mmu (if we can) */
prom_debug("Asking for hash\n");
--
2.21.3
^ permalink raw reply related
* [PATCH v1 1/5] powerpc/mm: Make GTSE an MMU FTR
From: Bharata B Rao @ 2020-06-18 16:09 UTC (permalink / raw)
To: linuxppc-dev; +Cc: aneesh.kumar, Bharata B Rao, npiggin
In-Reply-To: <20200618160930.26324-1-bharata@linux.ibm.com>
Make GTSE an MMU feature and enable it by default for radix.
However for guest, conditionally enable it if hypervisor supports
it via OV5 vector.
Having GTSE as an MMU feature will make it easy to enable radix
without GTSE. Currently radix assumes GTSE is enabled by default.
Signed-off-by: Bharata B Rao <bharata@linux.ibm.com>
---
arch/powerpc/include/asm/mmu.h | 4 ++++
arch/powerpc/kernel/dt_cpu_ftrs.c | 1 +
arch/powerpc/mm/init_64.c | 5 ++++-
3 files changed, 9 insertions(+), 1 deletion(-)
diff --git a/arch/powerpc/include/asm/mmu.h b/arch/powerpc/include/asm/mmu.h
index f4ac25d4df05..884d51995934 100644
--- a/arch/powerpc/include/asm/mmu.h
+++ b/arch/powerpc/include/asm/mmu.h
@@ -28,6 +28,9 @@
* Individual features below.
*/
+/* Guest Translation Shootdown Enable */
+#define MMU_FTR_GTSE ASM_CONST(0x00001000)
+
/*
* Support for 68 bit VA space. We added that from ISA 2.05
*/
@@ -173,6 +176,7 @@ enum {
#endif
#ifdef CONFIG_PPC_RADIX_MMU
MMU_FTR_TYPE_RADIX |
+ MMU_FTR_GTSE |
#ifdef CONFIG_PPC_KUAP
MMU_FTR_RADIX_KUAP |
#endif /* CONFIG_PPC_KUAP */
diff --git a/arch/powerpc/kernel/dt_cpu_ftrs.c b/arch/powerpc/kernel/dt_cpu_ftrs.c
index 3a409517c031..fcb815b3a84d 100644
--- a/arch/powerpc/kernel/dt_cpu_ftrs.c
+++ b/arch/powerpc/kernel/dt_cpu_ftrs.c
@@ -337,6 +337,7 @@ static int __init feat_enable_mmu_radix(struct dt_cpu_feature *f)
#ifdef CONFIG_PPC_RADIX_MMU
cur_cpu_spec->mmu_features |= MMU_FTR_TYPE_RADIX;
cur_cpu_spec->mmu_features |= MMU_FTRS_HASH_BASE;
+ cur_cpu_spec->mmu_features |= MMU_FTR_GTSE;
cur_cpu_spec->cpu_user_features |= PPC_FEATURE_HAS_MMU;
return 1;
diff --git a/arch/powerpc/mm/init_64.c b/arch/powerpc/mm/init_64.c
index c7ce4ec5060e..a7b571c60e90 100644
--- a/arch/powerpc/mm/init_64.c
+++ b/arch/powerpc/mm/init_64.c
@@ -408,12 +408,15 @@ static void __init early_check_vec5(void)
if (!(vec5[OV5_INDX(OV5_RADIX_GTSE)] &
OV5_FEAT(OV5_RADIX_GTSE))) {
pr_warn("WARNING: Hypervisor doesn't support RADIX with GTSE\n");
- }
+ cur_cpu_spec->mmu_features &= ~MMU_FTR_GTSE;
+ } else
+ cur_cpu_spec->mmu_features |= MMU_FTR_GTSE;
/* Do radix anyway - the hypervisor said we had to */
cur_cpu_spec->mmu_features |= MMU_FTR_TYPE_RADIX;
} else if (mmu_supported == OV5_FEAT(OV5_MMU_HASH)) {
/* Hypervisor only supports hash - disable radix */
cur_cpu_spec->mmu_features &= ~MMU_FTR_TYPE_RADIX;
+ cur_cpu_spec->mmu_features &= ~MMU_FTR_GTSE;
}
}
--
2.21.3
^ permalink raw reply related
* [PATCH v1 0/5] Off-load TLB invalidations to host for !GTSE
From: Bharata B Rao @ 2020-06-18 16:09 UTC (permalink / raw)
To: linuxppc-dev; +Cc: aneesh.kumar, Bharata B Rao, npiggin
Hypervisor may choose not to enable Guest Translation Shootdown Enable
(GTSE) option for the guest. When GTSE isn't ON, the guest OS isn't
permitted to use instructions like tblie and tlbsync directly, but is
expected to make hypervisor calls to get the TLB flushed.
This series enables the TLB flush routines in the radix code to
off-load TLB flushing to hypervisor via the newly proposed hcall
H_RPT_INVALIDATE. The specification of this hcall is still evolving
while the patchset is posted here for any early comments.
To easily check the availability of GTSE, it is made an MMU feature.
The OV5 handling and H_REGISTER_PROC_TBL hcall are changed to
handle GTSE as an optionally available feature and to not assume GTSE
when radix support is available.
The actual hcall implementation for KVM isn't included in this
patchset.
H_RPT_INVALIDATE
================
Syntax:
int64 /* H_Success: Return code on successful completion */
/* H_Busy - repeat the call with the same */
/* H_Parameter, H_P2, H_P3, H_P4, H_P5 : Invalid parameters */
hcall(const uint64 H_RPT_INVALIDATE, /* Invalidate RPT translation lookaside information */
uint64 pid, /* PID/LPID to invalidate */
uint64 target, /* Invalidation target */
uint64 type, /* Type of lookaside information */
uint64 pageSizes, /* Page sizes */
uint64 start, /* Start of Effective Address (EA) range (inclusive) */
uint64 end) /* End of EA range (exclusive) */
Invalidation targets (target)
-----------------------------
Core MMU 0x01 /* All virtual processors in the partition */
Core local MMU 0x02 /* Current virtual processor */
Nest MMU 0x04 /* All nest/accelerator agents in use by the partition */
A combination of the above can be specified, except core and core local.
Type of translation to invalidate (type)
---------------------------------------
NESTED 0x0001 /* invalidate nested guest partition-scope */
TLB 0x0002 /* Invalidate TLB */
PWC 0x0004 /* Invalidate Page Walk Cache */
PRT 0x0008 /* Invalidate Process Table Entries if NESTED is clear*/
PAT 0x0008 /* Invalidate Partition Table Entries if NESTED is set*/
A combination of the above can be specified.
Page size mask (pages)
----------------------
4K 0x01
64K 0x02
2M 0x04
1G 0x08
All sizes (-1UL)
A combination of the above can be specified.
All page sizes can be selected with -1.
Semantics: Invalidate radix tree lookaside information
matching the parameters given.
* Return H_P2, H_P3 or H_P4 if target, type, or pageSizes parameters are
different from the defined values.
* Return H_PARAMETER if NESTED is set and pid is not a valid nested
LPID allocated to this partition
* Return H_P5 if (start, end) doesn't form a valid range. Start and end
should be a valid Quadrant address and end > start.
* Return H_NotSupported if the partition is not in running in radix
translation mode.
* May invalidate more translation information than requested.
* If start = 0 and end = -1, set the range to cover all valid addresses.
Else start and end should be aligned to 4kB (lower 11 bits clear).
* If NESTED is clear, then invalidate process scoped lookaside information.
Else pid specifies a nested LPID, and the invalidation is performed
on nested guest partition table and nested guest partition scope real
addresses.
* If pid = 0 and NESTED is clear, then valid addresses are quadrant 3 and
quadrant 0 spaces, Else valid addresses are quadrant 0.
* Pages which are fully covered by the range are to be invalidated.
Those which are partially covered are considered outside invalidation
range, which allows a caller to optimally invalidate ranges that may
contain mixed page sizes.
* Return H_SUCCESS on success.
Bharata B Rao (4):
powerpc/mm: Make GTSE an MMU FTR
powerpc/prom_init: Ask for Radix GTSE only if supported.
powerpc/pseries: H_REGISTER_PROC_TBL should ask for GTSE only if
enabled
KVM: PPC: Book3S HV: Use H_RPT_INVALIDATE in nested KVM
Nicholas Piggin (1):
powerpc/mm/book3s64/radix: Off-load TLB invalidations to host when
!GTSE
arch/powerpc/include/asm/firmware.h | 4 +-
arch/powerpc/include/asm/hvcall.h | 27 ++++++-
arch/powerpc/include/asm/mmu.h | 4 +
arch/powerpc/include/asm/plpar_wrappers.h | 52 +++++++++++++
arch/powerpc/kernel/dt_cpu_ftrs.c | 1 +
arch/powerpc/kernel/prom_init.c | 13 ++--
arch/powerpc/kvm/book3s_64_mmu_radix.c | 27 +++++--
arch/powerpc/kvm/book3s_hv_nested.c | 13 +++-
arch/powerpc/mm/book3s64/radix_tlb.c | 95 +++++++++++++++++++++--
arch/powerpc/mm/init_64.c | 5 +-
arch/powerpc/platforms/pseries/firmware.c | 1 +
arch/powerpc/platforms/pseries/lpar.c | 8 +-
12 files changed, 224 insertions(+), 26 deletions(-)
--
2.21.3
^ permalink raw reply
* Re: [PATCH net] ibmveth: Fix max MTU limit
From: Jakub Kicinski @ 2020-06-18 15:57 UTC (permalink / raw)
To: Thomas Falcon; +Cc: netdev, linuxppc-dev
In-Reply-To: <1592495026-27202-1-git-send-email-tlfalcon@linux.ibm.com>
On Thu, 18 Jun 2020 10:43:46 -0500 Thomas Falcon wrote:
> The max MTU limit defined for ibmveth is not accounting for
> virtual ethernet buffer overhead, which is twenty-two additional
> bytes set aside for the ethernet header and eight additional bytes
> of an opaque handle reserved for use by the hypervisor. Update the
> max MTU to reflect this overhead.
>
> Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com>
How about
Fixes: d894be57ca92 ("ethernet: use net core MTU range checking in more drivers")
Fixes: 110447f8269a ("ethernet: fix min/max MTU typos")
?
^ permalink raw reply
* [PATCH net] ibmveth: Fix max MTU limit
From: Thomas Falcon @ 2020-06-18 15:43 UTC (permalink / raw)
To: netdev; +Cc: Thomas Falcon, linuxppc-dev
The max MTU limit defined for ibmveth is not accounting for
virtual ethernet buffer overhead, which is twenty-two additional
bytes set aside for the ethernet header and eight additional bytes
of an opaque handle reserved for use by the hypervisor. Update the
max MTU to reflect this overhead.
Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com>
---
drivers/net/ethernet/ibm/ibmveth.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/ibm/ibmveth.c b/drivers/net/ethernet/ibm/ibmveth.c
index 96d36ae5049e..c5c732601e35 100644
--- a/drivers/net/ethernet/ibm/ibmveth.c
+++ b/drivers/net/ethernet/ibm/ibmveth.c
@@ -1715,7 +1715,7 @@ static int ibmveth_probe(struct vio_dev *dev, const struct vio_device_id *id)
}
netdev->min_mtu = IBMVETH_MIN_MTU;
- netdev->max_mtu = ETH_MAX_MTU;
+ netdev->max_mtu = ETH_MAX_MTU - IBMVETH_BUFF_OH;
memcpy(netdev->dev_addr, mac_addr_p, ETH_ALEN);
--
2.26.2
^ permalink raw reply related
* Re: [PATCH v2 02/12] ocxl: Change type of pasid to unsigned int
From: Fenghua Yu @ 2020-06-18 15:37 UTC (permalink / raw)
To: Frederic Barrat
Cc: Dave Hansen, H Peter Anvin, Dave Jiang, Ashok Raj, Joerg Roedel,
x86, amd-gfx, Ingo Molnar, Ravi V Shankar, Yu-cheng Yu,
Andrew Donnellan, Borislav Petkov, Sohil Mehta, Thomas Gleixner,
Tony Luck, David Woodhouse, Felix Kuehling, linux-kernel, iommu,
Jacob Jun Pan, linuxppc-dev, Lu Baolu
In-Reply-To: <972dc2cb-9643-53af-b11d-ebb56d96053d@linux.ibm.com>
Hi, Frederic,
On Thu, Jun 18, 2020 at 10:05:19AM +0200, Frederic Barrat wrote:
>
>
> Le 13/06/2020 à 02:41, Fenghua Yu a écrit :
> >PASID is defined as "int" although it's a 20-bit value and shouldn't be
> >negative int. To be consistent with type defined in iommu, define PASID
> >as "unsigned int".
>
>
> It looks like this patch was considered because of the use of 'pasid' in
> variable or function names. The ocxl driver only makes sense on powerpc and
> shouldn't compile on anything else, so it's probably useless in the context
> of that series.
> The pasid here is defined by the opencapi specification
> (https://opencapi.org), it is borrowed from the PCI world and you could
> argue it could be an unsigned int. But then I think the patch doesn't go far
> enough. But considering it's not used on x86, I think this patch can be
> dropped.
The first 3 patches clean up pasid and flag defitions to prepare for
following patches.
If you think this patch can be dropped, we will drop it.
Thanks.
-Fenghua
^ permalink raw reply
* [PATCH 4/6] exec: split prepare_arg_pages
From: Christoph Hellwig @ 2020-06-18 14:46 UTC (permalink / raw)
To: Al Viro
Cc: linux-arch, linux-s390, linux-parisc, Arnd Bergmann, Brian Gerst,
x86, linux-mips, linux-kernel, linux-fsdevel, Luis Chamberlain,
sparclinux, linuxppc-dev, linux-arm-kernel
In-Reply-To: <20200618144627.114057-1-hch@lst.de>
Move counting the arguments and enviroment variables out of
prepare_arg_pages and rename the rest of the function to check_arg_limit.
This prepares for a version of do_execvat that takes kernel pointers.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
fs/exec.c | 26 ++++++++++++++------------
1 file changed, 14 insertions(+), 12 deletions(-)
diff --git a/fs/exec.c b/fs/exec.c
index a5d91f8b1341d5..34781db6bf6889 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -435,20 +435,10 @@ static int count_strings(const char __user *const __user *argv)
return i;
}
-static int prepare_arg_pages(struct linux_binprm *bprm,
- const char __user *const __user *argv,
- const char __user *const __user *envp)
+static int check_arg_limit(struct linux_binprm *bprm)
{
unsigned long limit, ptr_size;
- bprm->argc = count_strings(argv);
- if (bprm->argc < 0)
- return bprm->argc;
-
- bprm->envc = count_strings(envp);
- if (bprm->envc < 0)
- return bprm->envc;
-
/*
* Limit to 1/4 of the max stack size or 3/4 of _STK_LIM
* (whichever is smaller) for the argv+env strings.
@@ -1886,7 +1876,19 @@ int do_execveat(int fd, struct filename *filename,
if (retval)
goto out_unmark;
- retval = prepare_arg_pages(bprm, argv, envp);
+ bprm->argc = count_strings(argv);
+ if (bprm->argc < 0) {
+ retval = bprm->argc;
+ goto out;
+ }
+
+ bprm->envc = count_strings(envp);
+ if (bprm->envc < 0) {
+ retval = bprm->envc;
+ goto out;
+ }
+
+ retval = check_arg_limit(bprm);
if (retval < 0)
goto out;
--
2.26.2
^ permalink raw reply related
* [PATCH 6/6] kernel: add a kernel_wait helper
From: Christoph Hellwig @ 2020-06-18 14:46 UTC (permalink / raw)
To: Al Viro
Cc: linux-arch, linux-s390, linux-parisc, Arnd Bergmann, Brian Gerst,
x86, linux-mips, linux-kernel, linux-fsdevel, Luis Chamberlain,
sparclinux, linuxppc-dev, linux-arm-kernel
In-Reply-To: <20200618144627.114057-1-hch@lst.de>
Add a helper that waits for a pid and stores the status in the passed
in kernel pointer. Use it to fix the usage of kernel_wait4 in
call_usermodehelper_exec_sync that only happens to work due to the
implicit set_fs(KERNEL_DS) for kernel threads.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
include/linux/sched/task.h | 1 +
kernel/exit.c | 16 ++++++++++++++++
kernel/umh.c | 29 ++++-------------------------
3 files changed, 21 insertions(+), 25 deletions(-)
diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h
index 38359071236ad7..a80007df396e95 100644
--- a/include/linux/sched/task.h
+++ b/include/linux/sched/task.h
@@ -102,6 +102,7 @@ struct task_struct *fork_idle(int);
struct mm_struct *copy_init_mm(void);
extern pid_t kernel_thread(int (*fn)(void *), void *arg, unsigned long flags);
extern long kernel_wait4(pid_t, int __user *, int, struct rusage *);
+int kernel_wait(pid_t pid, int *stat);
extern void free_task(struct task_struct *tsk);
diff --git a/kernel/exit.c b/kernel/exit.c
index 727150f2810338..fd598846df0b17 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -1626,6 +1626,22 @@ long kernel_wait4(pid_t upid, int __user *stat_addr, int options,
return ret;
}
+int kernel_wait(pid_t pid, int *stat)
+{
+ struct wait_opts wo = {
+ .wo_type = PIDTYPE_PID,
+ .wo_pid = find_get_pid(pid),
+ .wo_flags = WEXITED,
+ };
+ int ret;
+
+ ret = do_wait(&wo);
+ if (ret > 0 && wo.wo_stat)
+ *stat = wo.wo_stat;
+ put_pid(wo.wo_pid);
+ return ret;
+}
+
SYSCALL_DEFINE4(wait4, pid_t, upid, int __user *, stat_addr,
int, options, struct rusage __user *, ru)
{
diff --git a/kernel/umh.c b/kernel/umh.c
index 1284823dbad338..6fd948e478bec4 100644
--- a/kernel/umh.c
+++ b/kernel/umh.c
@@ -126,37 +126,16 @@ static void call_usermodehelper_exec_sync(struct subprocess_info *sub_info)
{
pid_t pid;
- /* If SIGCLD is ignored kernel_wait4 won't populate the status. */
+ /* If SIGCLD is ignored do_wait won't populate the status. */
kernel_sigaction(SIGCHLD, SIG_DFL);
pid = kernel_thread(call_usermodehelper_exec_async, sub_info, SIGCHLD);
- if (pid < 0) {
+ if (pid < 0)
sub_info->retval = pid;
- } else {
- int ret = -ECHILD;
- /*
- * Normally it is bogus to call wait4() from in-kernel because
- * wait4() wants to write the exit code to a userspace address.
- * But call_usermodehelper_exec_sync() always runs as kernel
- * thread (workqueue) and put_user() to a kernel address works
- * OK for kernel threads, due to their having an mm_segment_t
- * which spans the entire address space.
- *
- * Thus the __user pointer cast is valid here.
- */
- kernel_wait4(pid, (int __user *)&ret, 0, NULL);
-
- /*
- * If ret is 0, either call_usermodehelper_exec_async failed and
- * the real error code is already in sub_info->retval or
- * sub_info->retval is 0 anyway, so don't mess with it then.
- */
- if (ret)
- sub_info->retval = ret;
- }
+ else
+ kernel_wait(pid, &sub_info->retval);
/* Restore default kernel sig handler */
kernel_sigaction(SIGCHLD, SIG_IGN);
-
umh_complete(sub_info);
}
--
2.26.2
^ permalink raw reply related
* [PATCH 5/6] exec: add a kernel_execveat helper
From: Christoph Hellwig @ 2020-06-18 14:46 UTC (permalink / raw)
To: Al Viro
Cc: linux-arch, linux-s390, linux-parisc, Arnd Bergmann, Brian Gerst,
x86, linux-mips, linux-kernel, linux-fsdevel, Luis Chamberlain,
sparclinux, linuxppc-dev, linux-arm-kernel
In-Reply-To: <20200618144627.114057-1-hch@lst.de>
Add a kernel_execveat helper to execute a binary with kernel space argv
and envp pointers. Switch executing init and user mode helpers to this
new helper instead of relying on the implicit set_fs(KERNEL_DS) for early
init code and kernel threads, and move the getname call into the
do_execve helper.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
fs/exec.c | 109 ++++++++++++++++++++++++++++++++--------
include/linux/binfmts.h | 6 +--
init/main.c | 6 +--
kernel/umh.c | 8 ++-
4 files changed, 95 insertions(+), 34 deletions(-)
diff --git a/fs/exec.c b/fs/exec.c
index 34781db6bf6889..7923b8334ae600 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -435,6 +435,21 @@ static int count_strings(const char __user *const __user *argv)
return i;
}
+static int count_kernel_strings(const char *const *argv)
+{
+ int i;
+
+ if (!argv)
+ return 0;
+
+ for (i = 0; argv[i]; i++) {
+ if (i >= MAX_ARG_STRINGS)
+ return -E2BIG;
+ }
+
+ return i;
+}
+
static int check_arg_limit(struct linux_binprm *bprm)
{
unsigned long limit, ptr_size;
@@ -611,6 +626,19 @@ int copy_string_kernel(const char *arg, struct linux_binprm *bprm)
}
EXPORT_SYMBOL(copy_string_kernel);
+static int copy_strings_kernel(int argc, const char *const *argv,
+ struct linux_binprm *bprm)
+{
+ int ret;
+
+ while (argc-- > 0) {
+ ret = copy_string_kernel(argv[argc], bprm);
+ if (ret)
+ break;
+ }
+ return ret;
+}
+
#ifdef CONFIG_MMU
/*
@@ -1793,9 +1821,11 @@ static int exec_binprm(struct linux_binprm *bprm)
return 0;
}
-int do_execveat(int fd, struct filename *filename,
+static int __do_execveat(int fd, struct filename *filename,
const char __user *const __user *argv,
const char __user *const __user *envp,
+ const char *const *kernel_argv,
+ const char *const *kernel_envp,
int flags, struct file *file)
{
char *pathbuf = NULL;
@@ -1876,16 +1906,30 @@ int do_execveat(int fd, struct filename *filename,
if (retval)
goto out_unmark;
- bprm->argc = count_strings(argv);
- if (bprm->argc < 0) {
- retval = bprm->argc;
- goto out;
- }
+ if (unlikely(kernel_argv)) {
+ bprm->argc = count_kernel_strings(kernel_argv);
+ if (bprm->argc < 0) {
+ retval = bprm->argc;
+ goto out;
+ }
- bprm->envc = count_strings(envp);
- if (bprm->envc < 0) {
- retval = bprm->envc;
- goto out;
+ bprm->envc = count_kernel_strings(kernel_envp);
+ if (bprm->envc < 0) {
+ retval = bprm->envc;
+ goto out;
+ }
+ } else {
+ bprm->argc = count_strings(argv);
+ if (bprm->argc < 0) {
+ retval = bprm->argc;
+ goto out;
+ }
+
+ bprm->envc = count_strings(envp);
+ if (bprm->envc < 0) {
+ retval = bprm->envc;
+ goto out;
+ }
}
retval = check_arg_limit(bprm);
@@ -1902,13 +1946,22 @@ int do_execveat(int fd, struct filename *filename,
goto out;
bprm->exec = bprm->p;
- retval = copy_strings(bprm->envc, envp, bprm);
- if (retval < 0)
- goto out;
- retval = copy_strings(bprm->argc, argv, bprm);
- if (retval < 0)
- goto out;
+ if (unlikely(kernel_argv)) {
+ retval = copy_strings_kernel(bprm->envc, kernel_envp, bprm);
+ if (retval < 0)
+ goto out;
+ retval = copy_strings_kernel(bprm->argc, kernel_argv, bprm);
+ if (retval < 0)
+ goto out;
+ } else {
+ retval = copy_strings(bprm->envc, envp, bprm);
+ if (retval < 0)
+ goto out;
+ retval = copy_strings(bprm->argc, argv, bprm);
+ if (retval < 0)
+ goto out;
+ }
retval = exec_binprm(bprm);
if (retval < 0)
@@ -1959,6 +2012,23 @@ int do_execveat(int fd, struct filename *filename,
return retval;
}
+static int do_execveat(int fd, const char *filename,
+ const char __user *const __user *argv,
+ const char __user *const __user *envp, int flags)
+{
+ int lookup_flags = (flags & AT_EMPTY_PATH) ? LOOKUP_EMPTY : 0;
+ struct filename *name = getname_flags(filename, lookup_flags, NULL);
+
+ return __do_execveat(fd, name, argv, envp, NULL, NULL, flags, NULL);
+}
+
+int kernel_execveat(int fd, const char *filename, const char *const *argv,
+ const char *const *envp, int flags, struct file *file)
+{
+ return __do_execveat(fd, getname_kernel(filename), NULL, NULL, argv,
+ envp, flags, file);
+}
+
void set_binfmt(struct linux_binfmt *new)
{
struct mm_struct *mm = current->mm;
@@ -1988,7 +2058,7 @@ SYSCALL_DEFINE3(execve,
const char __user *const __user *, argv,
const char __user *const __user *, envp)
{
- return do_execveat(AT_FDCWD, getname(filename), argv, envp, 0, NULL);
+ return do_execveat(AT_FDCWD, filename, argv, envp, 0);
}
SYSCALL_DEFINE5(execveat,
@@ -1997,8 +2067,5 @@ SYSCALL_DEFINE5(execveat,
const char __user *const __user *, envp,
int, flags)
{
- int lookup_flags = (flags & AT_EMPTY_PATH) ? LOOKUP_EMPTY : 0;
- struct filename *name = getname_flags(filename, lookup_flags, NULL);
-
- return do_execveat(fd, name, argv, envp, flags, NULL);
+ return do_execveat(fd, filename, argv, envp, flags);
}
diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
index bed702e4b1fbd9..1e61c980c16354 100644
--- a/include/linux/binfmts.h
+++ b/include/linux/binfmts.h
@@ -134,9 +134,7 @@ int copy_string_kernel(const char *arg, struct linux_binprm *bprm);
extern void set_binfmt(struct linux_binfmt *new);
extern ssize_t read_code(struct file *, unsigned long, loff_t, size_t);
-int do_execveat(int fd, struct filename *filename,
- const char __user *const __user *__argv,
- const char __user *const __user *__envp,
- int flags, struct file *file);
+int kernel_execveat(int fd, const char *filename, const char *const *argv,
+ const char *const *envp, int flags, struct file *file);
#endif /* _LINUX_BINFMTS_H */
diff --git a/init/main.c b/init/main.c
index 838950ea7bca22..33de235dc2aa00 100644
--- a/init/main.c
+++ b/init/main.c
@@ -1329,10 +1329,8 @@ static int run_init_process(const char *init_filename)
pr_debug(" with environment:\n");
for (p = envp_init; *p; p++)
pr_debug(" %s\n", *p);
- return do_execveat(AT_FDCWD, getname_kernel(init_filename),
- (const char __user *const __user *)argv_init,
- (const char __user *const __user *)envp_init,
- 0, NULL);
+ return kernel_execveat(AT_FDCWD, init_filename, argv_init, envp_init, 0,
+ NULL);
}
static int try_to_run_init_process(const char *init_filename)
diff --git a/kernel/umh.c b/kernel/umh.c
index 7aa9a5817582ca..1284823dbad338 100644
--- a/kernel/umh.c
+++ b/kernel/umh.c
@@ -103,11 +103,9 @@ static int call_usermodehelper_exec_async(void *data)
commit_creds(new);
sub_info->pid = task_pid_nr(current);
- retval = do_execveat(AT_FDCWD,
- sub_info->path ? getname_kernel(sub_info->path) : NULL,
- (const char __user *const __user *)sub_info->argv,
- (const char __user *const __user *)sub_info->envp,
- 0, sub_info->file);
+ retval = kernel_execveat(AT_FDCWD, sub_info->path,
+ (const char *const *)sub_info->argv,
+ (const char *const *)sub_info->envp, 0, sub_info->file);
if (sub_info->file && !retval)
current->flags |= PF_UMH;
out:
--
2.26.2
^ permalink raw reply related
* [PATCH 3/6] exec: cleanup the count() function
From: Christoph Hellwig @ 2020-06-18 14:46 UTC (permalink / raw)
To: Al Viro
Cc: linux-arch, linux-s390, linux-parisc, Arnd Bergmann, Brian Gerst,
x86, linux-mips, linux-kernel, linux-fsdevel, Luis Chamberlain,
sparclinux, linuxppc-dev, linux-arm-kernel
In-Reply-To: <20200618144627.114057-1-hch@lst.de>
Remove the max argument as it is hard wired to MAX_ARG_STRINGS, and
give the function a slightly less generic name.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
fs/exec.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/fs/exec.c b/fs/exec.c
index 4e5db0e35797a5..a5d91f8b1341d5 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -407,9 +407,9 @@ get_user_arg_ptr(const char __user *const __user *argv, int nr)
}
/*
- * count() counts the number of strings in array ARGV.
+ * count_strings() counts the number of strings in array ARGV.
*/
-static int count(const char __user *const __user *argv, int max)
+static int count_strings(const char __user *const __user *argv)
{
int i = 0;
@@ -423,7 +423,7 @@ static int count(const char __user *const __user *argv, int max)
if (IS_ERR(p))
return -EFAULT;
- if (i >= max)
+ if (i >= MAX_ARG_STRINGS)
return -E2BIG;
++i;
@@ -441,11 +441,11 @@ static int prepare_arg_pages(struct linux_binprm *bprm,
{
unsigned long limit, ptr_size;
- bprm->argc = count(argv, MAX_ARG_STRINGS);
+ bprm->argc = count_strings(argv);
if (bprm->argc < 0)
return bprm->argc;
- bprm->envc = count(envp, MAX_ARG_STRINGS);
+ bprm->envc = count_strings(envp);
if (bprm->envc < 0)
return bprm->envc;
--
2.26.2
^ permalink raw reply related
* [PATCH 2/6] exec: simplify the compat syscall handling
From: Christoph Hellwig @ 2020-06-18 14:46 UTC (permalink / raw)
To: Al Viro
Cc: linux-arch, linux-s390, linux-parisc, Arnd Bergmann, Brian Gerst,
x86, linux-mips, linux-kernel, linux-fsdevel, Luis Chamberlain,
sparclinux, linuxppc-dev, linux-arm-kernel
In-Reply-To: <20200618144627.114057-1-hch@lst.de>
The only differenence betweeen the compat exec* syscalls and their
native versions is that compat_ptr sign extension, and the fact that
the pointer arithmetics for the two dimensional arrays needs to use
the compat pointer size. Instead of the compat wrappers and the
struct user_arg_ptr machinery just use in_compat_syscall() to do the
right thing for the compat case deep inside get_user_arg_ptr().
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
arch/arm64/include/asm/unistd32.h | 4 +-
arch/mips/kernel/syscalls/syscall_n32.tbl | 4 +-
arch/mips/kernel/syscalls/syscall_o32.tbl | 4 +-
arch/parisc/kernel/syscalls/syscall.tbl | 4 +-
arch/powerpc/kernel/syscalls/syscall.tbl | 4 +-
arch/s390/kernel/syscalls/syscall.tbl | 4 +-
arch/sparc/kernel/syscalls.S | 4 +-
arch/x86/entry/syscall_x32.c | 7 ++
arch/x86/entry/syscalls/syscall_32.tbl | 4 +-
arch/x86/entry/syscalls/syscall_64.tbl | 4 +-
fs/exec.c | 103 ++++--------------
include/linux/compat.h | 7 --
include/uapi/asm-generic/unistd.h | 4 +-
tools/include/uapi/asm-generic/unistd.h | 4 +-
.../arch/powerpc/entry/syscalls/syscall.tbl | 4 +-
.../perf/arch/s390/entry/syscalls/syscall.tbl | 4 +-
.../arch/x86/entry/syscalls/syscall_64.tbl | 4 +-
17 files changed, 56 insertions(+), 117 deletions(-)
diff --git a/arch/arm64/include/asm/unistd32.h b/arch/arm64/include/asm/unistd32.h
index 6d95d0c8bf2f47..141f5d2ff1c34f 100644
--- a/arch/arm64/include/asm/unistd32.h
+++ b/arch/arm64/include/asm/unistd32.h
@@ -33,7 +33,7 @@ __SYSCALL(__NR_link, sys_link)
#define __NR_unlink 10
__SYSCALL(__NR_unlink, sys_unlink)
#define __NR_execve 11
-__SYSCALL(__NR_execve, compat_sys_execve)
+__SYSCALL(__NR_execve, sys_execve)
#define __NR_chdir 12
__SYSCALL(__NR_chdir, sys_chdir)
/* 13 was sys_time */
@@ -785,7 +785,7 @@ __SYSCALL(__NR_memfd_create, sys_memfd_create)
#define __NR_bpf 386
__SYSCALL(__NR_bpf, sys_bpf)
#define __NR_execveat 387
-__SYSCALL(__NR_execveat, compat_sys_execveat)
+__SYSCALL(__NR_execveat, sys_execveat)
#define __NR_userfaultfd 388
__SYSCALL(__NR_userfaultfd, sys_userfaultfd)
#define __NR_membarrier 389
diff --git a/arch/mips/kernel/syscalls/syscall_n32.tbl b/arch/mips/kernel/syscalls/syscall_n32.tbl
index f777141f52568f..e861b5ab7179c9 100644
--- a/arch/mips/kernel/syscalls/syscall_n32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_n32.tbl
@@ -64,7 +64,7 @@
54 n32 getsockopt compat_sys_getsockopt
55 n32 clone __sys_clone
56 n32 fork __sys_fork
-57 n32 execve compat_sys_execve
+57 n32 execve sys_execve
58 n32 exit sys_exit
59 n32 wait4 compat_sys_wait4
60 n32 kill sys_kill
@@ -328,7 +328,7 @@
317 n32 getrandom sys_getrandom
318 n32 memfd_create sys_memfd_create
319 n32 bpf sys_bpf
-320 n32 execveat compat_sys_execveat
+320 n32 execveat sys_execveat
321 n32 userfaultfd sys_userfaultfd
322 n32 membarrier sys_membarrier
323 n32 mlock2 sys_mlock2
diff --git a/arch/mips/kernel/syscalls/syscall_o32.tbl b/arch/mips/kernel/syscalls/syscall_o32.tbl
index 13280625d312e9..bba80f74e9968e 100644
--- a/arch/mips/kernel/syscalls/syscall_o32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_o32.tbl
@@ -18,7 +18,7 @@
8 o32 creat sys_creat
9 o32 link sys_link
10 o32 unlink sys_unlink
-11 o32 execve sys_execve compat_sys_execve
+11 o32 execve sys_execve
12 o32 chdir sys_chdir
13 o32 time sys_time32
14 o32 mknod sys_mknod
@@ -367,7 +367,7 @@
353 o32 getrandom sys_getrandom
354 o32 memfd_create sys_memfd_create
355 o32 bpf sys_bpf
-356 o32 execveat sys_execveat compat_sys_execveat
+356 o32 execveat sys_execveat
357 o32 userfaultfd sys_userfaultfd
358 o32 membarrier sys_membarrier
359 o32 mlock2 sys_mlock2
diff --git a/arch/parisc/kernel/syscalls/syscall.tbl b/arch/parisc/kernel/syscalls/syscall.tbl
index 5a758fa6ec5242..23fa0d0edf3384 100644
--- a/arch/parisc/kernel/syscalls/syscall.tbl
+++ b/arch/parisc/kernel/syscalls/syscall.tbl
@@ -18,7 +18,7 @@
8 common creat sys_creat
9 common link sys_link
10 common unlink sys_unlink
-11 common execve sys_execve compat_sys_execve
+11 common execve sys_execve
12 common chdir sys_chdir
13 32 time sys_time32
13 64 time sys_time
@@ -385,7 +385,7 @@
339 common getrandom sys_getrandom
340 common memfd_create sys_memfd_create
341 common bpf sys_bpf
-342 common execveat sys_execveat compat_sys_execveat
+342 common execveat sys_execveat
343 common membarrier sys_membarrier
344 common userfaultfd sys_userfaultfd
345 common mlock2 sys_mlock2
diff --git a/arch/powerpc/kernel/syscalls/syscall.tbl b/arch/powerpc/kernel/syscalls/syscall.tbl
index f833a319082247..c52cdab89dc0ae 100644
--- a/arch/powerpc/kernel/syscalls/syscall.tbl
+++ b/arch/powerpc/kernel/syscalls/syscall.tbl
@@ -20,7 +20,7 @@
8 common creat sys_creat
9 common link sys_link
10 common unlink sys_unlink
-11 nospu execve sys_execve compat_sys_execve
+11 nospu execve sys_execve
12 common chdir sys_chdir
13 32 time sys_time32
13 64 time sys_time
@@ -460,7 +460,7 @@
359 common getrandom sys_getrandom
360 common memfd_create sys_memfd_create
361 common bpf sys_bpf
-362 nospu execveat sys_execveat compat_sys_execveat
+362 nospu execveat sys_execveat
363 32 switch_endian sys_ni_syscall
363 64 switch_endian sys_switch_endian
363 spu switch_endian sys_ni_syscall
diff --git a/arch/s390/kernel/syscalls/syscall.tbl b/arch/s390/kernel/syscalls/syscall.tbl
index bfdcb763395735..bd2275db2026ea 100644
--- a/arch/s390/kernel/syscalls/syscall.tbl
+++ b/arch/s390/kernel/syscalls/syscall.tbl
@@ -18,7 +18,7 @@
8 common creat sys_creat sys_creat
9 common link sys_link sys_link
10 common unlink sys_unlink sys_unlink
-11 common execve sys_execve compat_sys_execve
+11 common execve sys_execve sys_execve
12 common chdir sys_chdir sys_chdir
13 32 time - sys_time32
14 common mknod sys_mknod sys_mknod
@@ -361,7 +361,7 @@
351 common bpf sys_bpf sys_bpf
352 common s390_pci_mmio_write sys_s390_pci_mmio_write sys_s390_pci_mmio_write
353 common s390_pci_mmio_read sys_s390_pci_mmio_read sys_s390_pci_mmio_read
-354 common execveat sys_execveat compat_sys_execveat
+354 common execveat sys_execveat sys_execveat
355 common userfaultfd sys_userfaultfd sys_userfaultfd
356 common membarrier sys_membarrier sys_membarrier
357 common recvmmsg sys_recvmmsg compat_sys_recvmmsg_time32
diff --git a/arch/sparc/kernel/syscalls.S b/arch/sparc/kernel/syscalls.S
index db42b4fb370844..70463972152a92 100644
--- a/arch/sparc/kernel/syscalls.S
+++ b/arch/sparc/kernel/syscalls.S
@@ -16,12 +16,12 @@ sys64_execveat:
sunos_execv:
mov %g0, %o2
sys32_execve:
- set compat_sys_execve, %g1
+ set sys_execve, %g1
jmpl %g1, %g0
flushw
sys32_execveat:
- set compat_sys_execveat, %g1
+ set sys_execveat, %g1
jmpl %g1, %g0
flushw
#endif
diff --git a/arch/x86/entry/syscall_x32.c b/arch/x86/entry/syscall_x32.c
index 3d8d70d3896c87..c9f39900a93b96 100644
--- a/arch/x86/entry/syscall_x32.c
+++ b/arch/x86/entry/syscall_x32.c
@@ -8,6 +8,13 @@
#include <asm/unistd.h>
#include <asm/syscall.h>
+/*
+ * Reuse the 64-bit entry points for the x32 versions that occupy different
+ * slots in the syscall table.
+ */
+#define __x32_sys_execve __x64_sys_execve
+#define __x32_sys_execveat __x64_sys_execveat
+
#define __SYSCALL_64(nr, sym)
#define __SYSCALL_X32(nr, sym) extern long __x32_##sym(const struct pt_regs *);
diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
index d8f8a1a69ed11f..2b1eccd3f8f697 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -22,7 +22,7 @@
8 i386 creat sys_creat
9 i386 link sys_link
10 i386 unlink sys_unlink
-11 i386 execve sys_execve compat_sys_execve
+11 i386 execve sys_execve
12 i386 chdir sys_chdir
13 i386 time sys_time32
14 i386 mknod sys_mknod
@@ -369,7 +369,7 @@
355 i386 getrandom sys_getrandom
356 i386 memfd_create sys_memfd_create
357 i386 bpf sys_bpf
-358 i386 execveat sys_execveat compat_sys_execveat
+358 i386 execveat sys_execveat
359 i386 socket sys_socket
360 i386 socketpair sys_socketpair
361 i386 bind sys_bind
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index 78847b32e1370f..cb3fce6ed63ebf 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -375,7 +375,7 @@
517 x32 recvfrom compat_sys_recvfrom
518 x32 sendmsg compat_sys_sendmsg
519 x32 recvmsg compat_sys_recvmsg
-520 x32 execve compat_sys_execve
+520 x32 execve sys_execve
521 x32 ptrace compat_sys_ptrace
522 x32 rt_sigpending compat_sys_rt_sigpending
523 x32 rt_sigtimedwait compat_sys_rt_sigtimedwait_time64
@@ -400,6 +400,6 @@
542 x32 getsockopt compat_sys_getsockopt
543 x32 io_setup compat_sys_io_setup
544 x32 io_submit compat_sys_io_submit
-545 x32 execveat compat_sys_execveat
+545 x32 execveat sys_execveat
546 x32 preadv2 compat_sys_preadv64v2
547 x32 pwritev2 compat_sys_pwritev64v2
diff --git a/fs/exec.c b/fs/exec.c
index 354fdaa536ae7d..4e5db0e35797a5 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -386,47 +386,34 @@ static int bprm_mm_init(struct linux_binprm *bprm)
return err;
}
-struct user_arg_ptr {
-#ifdef CONFIG_COMPAT
- bool is_compat;
-#endif
- union {
- const char __user *const __user *native;
-#ifdef CONFIG_COMPAT
- const compat_uptr_t __user *compat;
-#endif
- } ptr;
-};
-
-static const char __user *get_user_arg_ptr(struct user_arg_ptr argv, int nr)
+static const char __user *
+get_user_arg_ptr(const char __user *const __user *argv, int nr)
{
- const char __user *native;
-
-#ifdef CONFIG_COMPAT
- if (unlikely(argv.is_compat)) {
+ if (in_compat_syscall()) {
+ const compat_uptr_t __user *compat_argv =
+ compat_ptr((unsigned long)argv);
compat_uptr_t compat;
- if (get_user(compat, argv.ptr.compat + nr))
+ if (get_user(compat, compat_argv + nr))
return ERR_PTR(-EFAULT);
-
return compat_ptr(compat);
- }
-#endif
-
- if (get_user(native, argv.ptr.native + nr))
- return ERR_PTR(-EFAULT);
+ } else {
+ const char __user *native;
- return native;
+ if (get_user(native, argv + nr))
+ return ERR_PTR(-EFAULT);
+ return native;
+ }
}
/*
* count() counts the number of strings in array ARGV.
*/
-static int count(struct user_arg_ptr argv, int max)
+static int count(const char __user *const __user *argv, int max)
{
int i = 0;
- if (argv.ptr.native != NULL) {
+ if (argv) {
for (;;) {
const char __user *p = get_user_arg_ptr(argv, i);
@@ -449,7 +436,8 @@ static int count(struct user_arg_ptr argv, int max)
}
static int prepare_arg_pages(struct linux_binprm *bprm,
- struct user_arg_ptr argv, struct user_arg_ptr envp)
+ const char __user *const __user *argv,
+ const char __user *const __user *envp)
{
unsigned long limit, ptr_size;
@@ -497,7 +485,7 @@ static int prepare_arg_pages(struct linux_binprm *bprm,
* processes's memory to the new process's stack. The call to get_user_pages()
* ensures the destination page is created and not swapped out.
*/
-static int copy_strings(int argc, struct user_arg_ptr argv,
+static int copy_strings(int argc, const char __user *const __user *argv,
struct linux_binprm *bprm)
{
struct page *kmapped_page = NULL;
@@ -1815,10 +1803,10 @@ static int exec_binprm(struct linux_binprm *bprm)
return 0;
}
-static int __do_execveat(int fd, struct filename *filename,
- struct user_arg_ptr argv,
- struct user_arg_ptr envp,
- int flags, struct file *file)
+int do_execveat(int fd, struct filename *filename,
+ const char __user *const __user *argv,
+ const char __user *const __user *envp,
+ int flags, struct file *file)
{
char *pathbuf = NULL;
struct linux_binprm *bprm;
@@ -1969,17 +1957,6 @@ static int __do_execveat(int fd, struct filename *filename,
return retval;
}
-int do_execveat(int fd, struct filename *filename,
- const char __user *const __user *__argv,
- const char __user *const __user *__envp,
- int flags, struct file *file)
-{
- struct user_arg_ptr argv = { .ptr.native = __argv };
- struct user_arg_ptr envp = { .ptr.native = __envp };
-
- return __do_execveat(fd, filename, argv, envp, flags, file);
-}
-
void set_binfmt(struct linux_binfmt *new)
{
struct mm_struct *mm = current->mm;
@@ -2023,41 +2000,3 @@ SYSCALL_DEFINE5(execveat,
return do_execveat(fd, name, argv, envp, flags, NULL);
}
-
-#ifdef CONFIG_COMPAT
-static int do_compat_execve(int fd, struct filename *filename,
- const compat_uptr_t __user *__argv,
- const compat_uptr_t __user *__envp,
- int flags)
-{
- struct user_arg_ptr argv = {
- .is_compat = true,
- .ptr.compat = __argv,
- };
- struct user_arg_ptr envp = {
- .is_compat = true,
- .ptr.compat = __envp,
- };
-
- return __do_execveat(fd, filename, argv, envp, flags, NULL);
-}
-
-COMPAT_SYSCALL_DEFINE3(execve, const char __user *, filename,
- const compat_uptr_t __user *, argv,
- const compat_uptr_t __user *, envp)
-{
- return do_compat_execve(AT_FDCWD, getname(filename), argv, envp, 0);
-}
-
-COMPAT_SYSCALL_DEFINE5(execveat, int, fd,
- const char __user *, filename,
- const compat_uptr_t __user *, argv,
- const compat_uptr_t __user *, envp,
- int, flags)
-{
- int lookup_flags = (flags & AT_EMPTY_PATH) ? LOOKUP_EMPTY : 0;
- struct filename *name = getname_flags(filename, lookup_flags, NULL);
-
- return do_compat_execve(fd, name, argv, envp, flags);
-}
-#endif
diff --git a/include/linux/compat.h b/include/linux/compat.h
index e90100c0de72e4..5e8f6a588e0d43 100644
--- a/include/linux/compat.h
+++ b/include/linux/compat.h
@@ -752,10 +752,6 @@ asmlinkage long compat_sys_recvmsg(int fd, struct compat_msghdr __user *msg,
asmlinkage long compat_sys_keyctl(u32 option,
u32 arg2, u32 arg3, u32 arg4, u32 arg5);
-/* arch/example/kernel/sys_example.c */
-asmlinkage long compat_sys_execve(const char __user *filename, const compat_uptr_t __user *argv,
- const compat_uptr_t __user *envp);
-
/* mm/fadvise.c: No generic prototype for fadvise64_64 */
/* mm/, CONFIG_MMU only */
@@ -806,9 +802,6 @@ asmlinkage ssize_t compat_sys_process_vm_writev(compat_pid_t pid,
const struct compat_iovec __user *lvec,
compat_ulong_t liovcnt, const struct compat_iovec __user *rvec,
compat_ulong_t riovcnt, compat_ulong_t flags);
-asmlinkage long compat_sys_execveat(int dfd, const char __user *filename,
- const compat_uptr_t __user *argv,
- const compat_uptr_t __user *envp, int flags);
asmlinkage ssize_t compat_sys_preadv2(compat_ulong_t fd,
const struct compat_iovec __user *vec,
compat_ulong_t vlen, u32 pos_low, u32 pos_high, rwf_t flags);
diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
index f4a01305d9a65c..c639d04a094b8b 100644
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -640,7 +640,7 @@ __SC_COMP(__NR_keyctl, sys_keyctl, compat_sys_keyctl)
#define __NR_clone 220
__SYSCALL(__NR_clone, sys_clone)
#define __NR_execve 221
-__SC_COMP(__NR_execve, sys_execve, compat_sys_execve)
+__SYSCALL(__NR_execve, sys_execve)
#define __NR3264_mmap 222
__SC_3264(__NR3264_mmap, sys_mmap2, sys_mmap)
@@ -751,7 +751,7 @@ __SYSCALL(__NR_memfd_create, sys_memfd_create)
#define __NR_bpf 280
__SYSCALL(__NR_bpf, sys_bpf)
#define __NR_execveat 281
-__SC_COMP(__NR_execveat, sys_execveat, compat_sys_execveat)
+__SYSCALL(__NR_execveat, sys_execveat)
#define __NR_userfaultfd 282
__SYSCALL(__NR_userfaultfd, sys_userfaultfd)
#define __NR_membarrier 283
diff --git a/tools/include/uapi/asm-generic/unistd.h b/tools/include/uapi/asm-generic/unistd.h
index 3a3201e4618ef8..a6dbc8af8bd577 100644
--- a/tools/include/uapi/asm-generic/unistd.h
+++ b/tools/include/uapi/asm-generic/unistd.h
@@ -640,7 +640,7 @@ __SC_COMP(__NR_keyctl, sys_keyctl, compat_sys_keyctl)
#define __NR_clone 220
__SYSCALL(__NR_clone, sys_clone)
#define __NR_execve 221
-__SC_COMP(__NR_execve, sys_execve, compat_sys_execve)
+__SYSCALL(__NR_execve, sys_execve)
#define __NR3264_mmap 222
__SC_3264(__NR3264_mmap, sys_mmap2, sys_mmap)
@@ -751,7 +751,7 @@ __SYSCALL(__NR_memfd_create, sys_memfd_create)
#define __NR_bpf 280
__SYSCALL(__NR_bpf, sys_bpf)
#define __NR_execveat 281
-__SC_COMP(__NR_execveat, sys_execveat, compat_sys_execveat)
+__SYSCALL(__NR_execveat, sys_execveat)
#define __NR_userfaultfd 282
__SYSCALL(__NR_userfaultfd, sys_userfaultfd)
#define __NR_membarrier 283
diff --git a/tools/perf/arch/powerpc/entry/syscalls/syscall.tbl b/tools/perf/arch/powerpc/entry/syscalls/syscall.tbl
index 35b61bfc1b1ae9..42bf8b461a0ed6 100644
--- a/tools/perf/arch/powerpc/entry/syscalls/syscall.tbl
+++ b/tools/perf/arch/powerpc/entry/syscalls/syscall.tbl
@@ -18,7 +18,7 @@
8 common creat sys_creat
9 common link sys_link
10 common unlink sys_unlink
-11 nospu execve sys_execve compat_sys_execve
+11 nospu execve sys_execve sys_execve
12 common chdir sys_chdir
13 32 time sys_time32
13 64 time sys_time
@@ -454,7 +454,7 @@
359 common getrandom sys_getrandom
360 common memfd_create sys_memfd_create
361 common bpf sys_bpf
-362 nospu execveat sys_execveat compat_sys_execveat
+362 nospu execveat sys_execveat sys_execveat
363 32 switch_endian sys_ni_syscall
363 64 switch_endian ppc_switch_endian
363 spu switch_endian sys_ni_syscall
diff --git a/tools/perf/arch/s390/entry/syscalls/syscall.tbl b/tools/perf/arch/s390/entry/syscalls/syscall.tbl
index b38d48464368dc..f3c16f2d9746ac 100644
--- a/tools/perf/arch/s390/entry/syscalls/syscall.tbl
+++ b/tools/perf/arch/s390/entry/syscalls/syscall.tbl
@@ -18,7 +18,7 @@
8 common creat sys_creat compat_sys_creat
9 common link sys_link compat_sys_link
10 common unlink sys_unlink compat_sys_unlink
-11 common execve sys_execve compat_sys_execve
+11 common execve sys_execve sys_execve
12 common chdir sys_chdir compat_sys_chdir
13 32 time - compat_sys_time
14 common mknod sys_mknod compat_sys_mknod
@@ -361,7 +361,7 @@
351 common bpf sys_bpf compat_sys_bpf
352 common s390_pci_mmio_write sys_s390_pci_mmio_write compat_sys_s390_pci_mmio_write
353 common s390_pci_mmio_read sys_s390_pci_mmio_read compat_sys_s390_pci_mmio_read
-354 common execveat sys_execveat compat_sys_execveat
+354 common execveat sys_execveat sys_execveat
355 common userfaultfd sys_userfaultfd sys_userfaultfd
356 common membarrier sys_membarrier sys_membarrier
357 common recvmmsg sys_recvmmsg compat_sys_recvmmsg
diff --git a/tools/perf/arch/x86/entry/syscalls/syscall_64.tbl b/tools/perf/arch/x86/entry/syscalls/syscall_64.tbl
index 37b844f839bc4f..8b88868e622d32 100644
--- a/tools/perf/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/tools/perf/arch/x86/entry/syscalls/syscall_64.tbl
@@ -374,7 +374,7 @@
517 x32 recvfrom compat_sys_recvfrom
518 x32 sendmsg compat_sys_sendmsg
519 x32 recvmsg compat_sys_recvmsg
-520 x32 execve compat_sys_execve
+520 x32 execve sys_execve
521 x32 ptrace compat_sys_ptrace
522 x32 rt_sigpending compat_sys_rt_sigpending
523 x32 rt_sigtimedwait compat_sys_rt_sigtimedwait_time64
@@ -399,6 +399,6 @@
542 x32 getsockopt compat_sys_getsockopt
543 x32 io_setup compat_sys_io_setup
544 x32 io_submit compat_sys_io_submit
-545 x32 execveat compat_sys_execveat
+545 x32 execveat sys_execveat
546 x32 preadv2 compat_sys_preadv64v2
547 x32 pwritev2 compat_sys_pwritev64v2
--
2.26.2
^ permalink raw reply related
* [PATCH 1/6] exec: cleanup the execve wrappers
From: Christoph Hellwig @ 2020-06-18 14:46 UTC (permalink / raw)
To: Al Viro
Cc: linux-arch, linux-s390, linux-parisc, Arnd Bergmann, Brian Gerst,
x86, linux-mips, linux-kernel, linux-fsdevel, Luis Chamberlain,
sparclinux, linuxppc-dev, linux-arm-kernel
In-Reply-To: <20200618144627.114057-1-hch@lst.de>
Remove a whole bunch of wrappers that eventually all call
__do_execve_file, and consolidate the execvce helpers to:
(1) __do_execveat, which is the lowest level helper implementing the
actual functionality
(2) do_execvat, which is used by all callers that want native
pointers
(3) do_compat_execve, which is used by all compat syscalls
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
fs/exec.c | 98 +++++++++++------------------------------
include/linux/binfmts.h | 12 ++---
init/main.c | 7 +--
kernel/umh.c | 16 +++----
4 files changed, 41 insertions(+), 92 deletions(-)
diff --git a/fs/exec.c b/fs/exec.c
index e6e8a9a7032784..354fdaa536ae7d 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1815,10 +1815,7 @@ static int exec_binprm(struct linux_binprm *bprm)
return 0;
}
-/*
- * sys_execve() executes a new program.
- */
-static int __do_execve_file(int fd, struct filename *filename,
+static int __do_execveat(int fd, struct filename *filename,
struct user_arg_ptr argv,
struct user_arg_ptr envp,
int flags, struct file *file)
@@ -1972,74 +1969,16 @@ static int __do_execve_file(int fd, struct filename *filename,
return retval;
}
-static int do_execveat_common(int fd, struct filename *filename,
- struct user_arg_ptr argv,
- struct user_arg_ptr envp,
- int flags)
-{
- return __do_execve_file(fd, filename, argv, envp, flags, NULL);
-}
-
-int do_execve_file(struct file *file, void *__argv, void *__envp)
-{
- struct user_arg_ptr argv = { .ptr.native = __argv };
- struct user_arg_ptr envp = { .ptr.native = __envp };
-
- return __do_execve_file(AT_FDCWD, NULL, argv, envp, 0, file);
-}
-
-int do_execve(struct filename *filename,
- const char __user *const __user *__argv,
- const char __user *const __user *__envp)
-{
- struct user_arg_ptr argv = { .ptr.native = __argv };
- struct user_arg_ptr envp = { .ptr.native = __envp };
- return do_execveat_common(AT_FDCWD, filename, argv, envp, 0);
-}
-
int do_execveat(int fd, struct filename *filename,
const char __user *const __user *__argv,
const char __user *const __user *__envp,
- int flags)
+ int flags, struct file *file)
{
struct user_arg_ptr argv = { .ptr.native = __argv };
struct user_arg_ptr envp = { .ptr.native = __envp };
- return do_execveat_common(fd, filename, argv, envp, flags);
-}
-
-#ifdef CONFIG_COMPAT
-static int compat_do_execve(struct filename *filename,
- const compat_uptr_t __user *__argv,
- const compat_uptr_t __user *__envp)
-{
- struct user_arg_ptr argv = {
- .is_compat = true,
- .ptr.compat = __argv,
- };
- struct user_arg_ptr envp = {
- .is_compat = true,
- .ptr.compat = __envp,
- };
- return do_execveat_common(AT_FDCWD, filename, argv, envp, 0);
-}
-
-static int compat_do_execveat(int fd, struct filename *filename,
- const compat_uptr_t __user *__argv,
- const compat_uptr_t __user *__envp,
- int flags)
-{
- struct user_arg_ptr argv = {
- .is_compat = true,
- .ptr.compat = __argv,
- };
- struct user_arg_ptr envp = {
- .is_compat = true,
- .ptr.compat = __envp,
- };
- return do_execveat_common(fd, filename, argv, envp, flags);
+ return __do_execveat(fd, filename, argv, envp, flags, file);
}
-#endif
void set_binfmt(struct linux_binfmt *new)
{
@@ -2070,7 +2009,7 @@ SYSCALL_DEFINE3(execve,
const char __user *const __user *, argv,
const char __user *const __user *, envp)
{
- return do_execve(getname(filename), argv, envp);
+ return do_execveat(AT_FDCWD, getname(filename), argv, envp, 0, NULL);
}
SYSCALL_DEFINE5(execveat,
@@ -2080,18 +2019,34 @@ SYSCALL_DEFINE5(execveat,
int, flags)
{
int lookup_flags = (flags & AT_EMPTY_PATH) ? LOOKUP_EMPTY : 0;
+ struct filename *name = getname_flags(filename, lookup_flags, NULL);
- return do_execveat(fd,
- getname_flags(filename, lookup_flags, NULL),
- argv, envp, flags);
+ return do_execveat(fd, name, argv, envp, flags, NULL);
}
#ifdef CONFIG_COMPAT
+static int do_compat_execve(int fd, struct filename *filename,
+ const compat_uptr_t __user *__argv,
+ const compat_uptr_t __user *__envp,
+ int flags)
+{
+ struct user_arg_ptr argv = {
+ .is_compat = true,
+ .ptr.compat = __argv,
+ };
+ struct user_arg_ptr envp = {
+ .is_compat = true,
+ .ptr.compat = __envp,
+ };
+
+ return __do_execveat(fd, filename, argv, envp, flags, NULL);
+}
+
COMPAT_SYSCALL_DEFINE3(execve, const char __user *, filename,
const compat_uptr_t __user *, argv,
const compat_uptr_t __user *, envp)
{
- return compat_do_execve(getname(filename), argv, envp);
+ return do_compat_execve(AT_FDCWD, getname(filename), argv, envp, 0);
}
COMPAT_SYSCALL_DEFINE5(execveat, int, fd,
@@ -2101,9 +2056,8 @@ COMPAT_SYSCALL_DEFINE5(execveat, int, fd,
int, flags)
{
int lookup_flags = (flags & AT_EMPTY_PATH) ? LOOKUP_EMPTY : 0;
+ struct filename *name = getname_flags(filename, lookup_flags, NULL);
- return compat_do_execveat(fd,
- getname_flags(filename, lookup_flags, NULL),
- argv, envp, flags);
+ return do_compat_execve(fd, name, argv, envp, flags);
}
#endif
diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
index 4a20b7517dd036..bed702e4b1fbd9 100644
--- a/include/linux/binfmts.h
+++ b/include/linux/binfmts.h
@@ -134,13 +134,9 @@ int copy_string_kernel(const char *arg, struct linux_binprm *bprm);
extern void set_binfmt(struct linux_binfmt *new);
extern ssize_t read_code(struct file *, unsigned long, loff_t, size_t);
-extern int do_execve(struct filename *,
- const char __user * const __user *,
- const char __user * const __user *);
-extern int do_execveat(int, struct filename *,
- const char __user * const __user *,
- const char __user * const __user *,
- int);
-int do_execve_file(struct file *file, void *__argv, void *__envp);
+int do_execveat(int fd, struct filename *filename,
+ const char __user *const __user *__argv,
+ const char __user *const __user *__envp,
+ int flags, struct file *file);
#endif /* _LINUX_BINFMTS_H */
diff --git a/init/main.c b/init/main.c
index 0ead83e86b5aa2..838950ea7bca22 100644
--- a/init/main.c
+++ b/init/main.c
@@ -1329,9 +1329,10 @@ static int run_init_process(const char *init_filename)
pr_debug(" with environment:\n");
for (p = envp_init; *p; p++)
pr_debug(" %s\n", *p);
- return do_execve(getname_kernel(init_filename),
- (const char __user *const __user *)argv_init,
- (const char __user *const __user *)envp_init);
+ return do_execveat(AT_FDCWD, getname_kernel(init_filename),
+ (const char __user *const __user *)argv_init,
+ (const char __user *const __user *)envp_init,
+ 0, NULL);
}
static int try_to_run_init_process(const char *init_filename)
diff --git a/kernel/umh.c b/kernel/umh.c
index 79f139a7ca03c6..7aa9a5817582ca 100644
--- a/kernel/umh.c
+++ b/kernel/umh.c
@@ -103,15 +103,13 @@ static int call_usermodehelper_exec_async(void *data)
commit_creds(new);
sub_info->pid = task_pid_nr(current);
- if (sub_info->file) {
- retval = do_execve_file(sub_info->file,
- sub_info->argv, sub_info->envp);
- if (!retval)
- current->flags |= PF_UMH;
- } else
- retval = do_execve(getname_kernel(sub_info->path),
- (const char __user *const __user *)sub_info->argv,
- (const char __user *const __user *)sub_info->envp);
+ retval = do_execveat(AT_FDCWD,
+ sub_info->path ? getname_kernel(sub_info->path) : NULL,
+ (const char __user *const __user *)sub_info->argv,
+ (const char __user *const __user *)sub_info->envp,
+ 0, sub_info->file);
+ if (sub_info->file && !retval)
+ current->flags |= PF_UMH;
out:
sub_info->retval = retval;
/*
--
2.26.2
^ permalink raw reply related
* properly support exec and wait with kernel pointers v2
From: Christoph Hellwig @ 2020-06-18 14:46 UTC (permalink / raw)
To: Al Viro
Cc: linux-arch, linux-s390, linux-parisc, Arnd Bergmann, Brian Gerst,
x86, linux-mips, linux-kernel, linux-fsdevel, Luis Chamberlain,
sparclinux, linuxppc-dev, linux-arm-kernel
Hi all,
this series first cleans up the exec code and then adds proper
kernel_execveat and kernel_wait callers instead of relying on the fact
that the early init code and kernel threads implicitly run with
the address limit set to KERNEL_DS.
Note that the cleanup removes the compat execve(at) handlers entirely, as
we can handle the compat difference very nicely in a unified codebase.
x32 needs two hacky #defines for that for now, although those can go
away if the x32 syscall rework from Brian gets merged.
Changes since v1:
- remove a pointless ifdef from get_user_arg_ptr
- remove the need for a compat syscall handler for x32
Diffstat:
arch/arm64/include/asm/unistd32.h | 4
arch/mips/kernel/syscalls/syscall_n32.tbl | 4
arch/mips/kernel/syscalls/syscall_o32.tbl | 4
arch/parisc/kernel/syscalls/syscall.tbl | 4
arch/powerpc/kernel/syscalls/syscall.tbl | 4
arch/s390/kernel/syscalls/syscall.tbl | 4
arch/sparc/kernel/syscalls.S | 4
arch/x86/entry/syscall_x32.c | 7
arch/x86/entry/syscalls/syscall_32.tbl | 4
arch/x86/entry/syscalls/syscall_64.tbl | 4
fs/exec.c | 248 ++++++++-------------
include/linux/binfmts.h | 10
include/linux/compat.h | 7
include/linux/sched/task.h | 1
include/uapi/asm-generic/unistd.h | 4
init/main.c | 5
kernel/exit.c | 16 +
kernel/umh.c | 43 ---
tools/include/uapi/asm-generic/unistd.h | 4
tools/perf/arch/powerpc/entry/syscalls/syscall.tbl | 4
tools/perf/arch/s390/entry/syscalls/syscall.tbl | 4
tools/perf/arch/x86/entry/syscalls/syscall_64.tbl | 4
22 files changed, 170 insertions(+), 223 deletions(-)
^ permalink raw reply
* [PATCH] mm/debug_vm_pgtable: Fix build failure with powerpc 8xx
From: Christophe Leroy @ 2020-06-18 14:31 UTC (permalink / raw)
To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
Will Deacon, Andrew Morton, Peter Zijlstra (Intel),
Anshuman Khandual
Cc: linux-mm, linuxppc-dev, linux-kernel
Since commit 9e343b467c70 ("READ_ONCE: Enforce atomicity for
{READ,WRITE}_ONCE() memory accesses"), READ_ONCE() cannot be used
anymore to read complex page table entries. This leads to:
CC mm/debug_vm_pgtable.o
In file included from ./include/asm-generic/bug.h:5,
from ./arch/powerpc/include/asm/bug.h:109,
from ./include/linux/bug.h:5,
from ./include/linux/mmdebug.h:5,
from ./include/linux/gfp.h:5,
from mm/debug_vm_pgtable.c:13:
In function 'pte_clear_tests',
inlined from 'debug_vm_pgtable' at mm/debug_vm_pgtable.c:363:2:
./include/linux/compiler.h:392:38: error: call to '__compiletime_assert_210' declared with attribute error: Unsupported access size for {READ,WRITE}_ONCE().
392 | _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
| ^
./include/linux/compiler.h:373:4: note: in definition of macro '__compiletime_assert'
373 | prefix ## suffix(); \
| ^~~~~~
./include/linux/compiler.h:392:2: note: in expansion of macro '_compiletime_assert'
392 | _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
| ^~~~~~~~~~~~~~~~~~~
./include/linux/compiler.h:405:2: note: in expansion of macro 'compiletime_assert'
405 | compiletime_assert(__native_word(t) || sizeof(t) == sizeof(long long), \
| ^~~~~~~~~~~~~~~~~~
./include/linux/compiler.h:291:2: note: in expansion of macro 'compiletime_assert_rwonce_type'
291 | compiletime_assert_rwonce_type(x); \
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
mm/debug_vm_pgtable.c:249:14: note: in expansion of macro 'READ_ONCE'
249 | pte_t pte = READ_ONCE(*ptep);
| ^~~~~~~~~
make[2]: *** [mm/debug_vm_pgtable.o] Error 1
Fix it by using the recently added ptep_get() helper.
Fixes: 9e343b467c70 ("READ_ONCE: Enforce atomicity for {READ,WRITE}_ONCE() memory accesses")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
mm/debug_vm_pgtable.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
index e45623016aea..61ab16fb2e36 100644
--- a/mm/debug_vm_pgtable.c
+++ b/mm/debug_vm_pgtable.c
@@ -246,13 +246,13 @@ static void __init pgd_populate_tests(struct mm_struct *mm, pgd_t *pgdp,
static void __init pte_clear_tests(struct mm_struct *mm, pte_t *ptep,
unsigned long vaddr)
{
- pte_t pte = READ_ONCE(*ptep);
+ pte_t pte = ptep_get(ptep);
pte = __pte(pte_val(pte) | RANDOM_ORVALUE);
set_pte_at(mm, vaddr, ptep, pte);
barrier();
pte_clear(mm, vaddr, ptep);
- pte = READ_ONCE(*ptep);
+ pte = ptep_get(ptep);
WARN_ON(!pte_none(pte));
}
--
2.25.0
^ permalink raw reply related
* Re: [PATCH 3/3] powerpc/8xx: Provide ptep_get() with 16k pages
From: Christophe Leroy @ 2020-06-18 14:21 UTC (permalink / raw)
To: Michael Ellerman, Peter Zijlstra
Cc: Will Deacon, linux-kernel, linux-mm, Paul Mackerras,
Andrew Morton, linuxppc-dev
In-Reply-To: <87pn9xchql.fsf@mpe.ellerman.id.au>
Le 18/06/2020 à 02:58, Michael Ellerman a écrit :
> Peter Zijlstra <peterz@infradead.org> writes:
>> On Thu, Jun 18, 2020 at 12:21:22AM +1000, Michael Ellerman wrote:
>>> Peter Zijlstra <peterz@infradead.org> writes:
>>>> On Mon, Jun 15, 2020 at 12:57:59PM +0000, Christophe Leroy wrote:
>>
>>>>> +#if defined(CONFIG_PPC_8xx) && defined(CONFIG_PPC_16K_PAGES)
>>>>> +#define __HAVE_ARCH_PTEP_GET
>>>>> +static inline pte_t ptep_get(pte_t *ptep)
>>>>> +{
>>>>> + pte_t pte = {READ_ONCE(ptep->pte), 0, 0, 0};
>>>>> +
>>>>> + return pte;
>>>>> +}
>>>>> +#endif
>>>>
>>>> Would it make sense to have a comment with this magic? The casual reader
>>>> might wonder WTH just happened when he stumbles on this :-)
>>>
>>> I tried writing a helpful comment but it's too late for my brain to form
>>> sensible sentences.
>>>
>>> Christophe can you send a follow-up with a comment explaining it? In
>>> particular the zero entries stand out, it's kind of subtle that those
>>> entries are only populated with the right value when we write to the
>>> page table.
>>
>> static inline pte_t ptep_get(pte_t *ptep)
>> {
>> unsigned long val = READ_ONCE(ptep->pte);
>> /* 16K pages have 4 identical value 4K entries */
>> pte_t pte = {val, val, val, val);
>> return pte;
>> }
>>
>> Maybe something like that?
>
> I think val wants to be pte_basic_t, but otherwise yeah I like that much
> better.
>
I sent a patch for that.
I'll also send one to fix mm/debug_vm_pgtable.c which also uses
READ_ONCE() to access page table entries.
Christophe
^ permalink raw reply
* Re: [PATCH 3/3] powerpc/8xx: Provide ptep_get() with 16k pages
From: Christophe Leroy @ 2020-06-18 14:19 UTC (permalink / raw)
To: Michael Ellerman, Peter Zijlstra
Cc: Will Deacon, linux-kernel, linux-mm, Paul Mackerras,
Andrew Morton, linuxppc-dev
In-Reply-To: <87o8phchnu.fsf@mpe.ellerman.id.au>
Le 18/06/2020 à 03:00, Michael Ellerman a écrit :
> Christophe Leroy <christophe.leroy@csgroup.eu> writes:
>> Le 17/06/2020 à 16:38, Peter Zijlstra a écrit :
>>> On Thu, Jun 18, 2020 at 12:21:22AM +1000, Michael Ellerman wrote:
>>>> Peter Zijlstra <peterz@infradead.org> writes:
>>>>> On Mon, Jun 15, 2020 at 12:57:59PM +0000, Christophe Leroy wrote:
>>>
>>>>>> +#if defined(CONFIG_PPC_8xx) && defined(CONFIG_PPC_16K_PAGES)
>>>>>> +#define __HAVE_ARCH_PTEP_GET
>>>>>> +static inline pte_t ptep_get(pte_t *ptep)
>>>>>> +{
>>>>>> + pte_t pte = {READ_ONCE(ptep->pte), 0, 0, 0};
>>>>>> +
>>>>>> + return pte;
>>>>>> +}
>>>>>> +#endif
>>>>>
>>>>> Would it make sense to have a comment with this magic? The casual reader
>>>>> might wonder WTH just happened when he stumbles on this :-)
>>>>
>>>> I tried writing a helpful comment but it's too late for my brain to form
>>>> sensible sentences.
>>>>
>>>> Christophe can you send a follow-up with a comment explaining it? In
>>>> particular the zero entries stand out, it's kind of subtle that those
>>>> entries are only populated with the right value when we write to the
>>>> page table.
>>>
>>> static inline pte_t ptep_get(pte_t *ptep)
>>> {
>>> unsigned long val = READ_ONCE(ptep->pte);
>>> /* 16K pages have 4 identical value 4K entries */
>>> pte_t pte = {val, val, val, val);
>>> return pte;
>>> }
>>>
>>> Maybe something like that?
>>
>> This should work as well. Indeed nobody cares about what's in the other
>> three. They are only there to ensure that ptep++ increases the ptep
>> pointer by 16 bytes. Only the HW require 4 identical values, that's
>> taken care of in set_pte_at() and pte_update().
>
> Right, but it seems less error-prone to have the in-memory
> representation match what we have in the page table (well that's
> in-memory too but you know what I mean).
>
>> So we should use the most efficient. Thinking once more, maybe what you
>> propose is the most efficient as there is no need to load another
>> register with value 0 in order to write it in the stack.
>
> On 64-bit I'd say it makes zero difference, the only thing that's going
> to matter is the load from ptep->pte. I don't know whether that's true
> on the 8xx cores though.
On 8xx core, loading a register with value 0 will take one cycle unless
there is some bubble left by another instruction (like a load from
memory or a taken branch). But that's in the noise.
Christophe
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox