All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/2] KVM: arm64: Filtering PMU events
From: Marc Zyngier @ 2020-02-14 18:36 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: James Morse, Julien Thierry, Suzuki K Poulose

It is at times necessary to prevent a guest from being able to sample
certain events if multiple CPUs share resources such as a cache level.
In this case, it would be interesting if the VMM could simply prevent
certain events from being counted instead of simply not exposing a PMU.

Given that most events are not architected, there is no easy way
to designate which events shouldn't be counted other than specifying
the raw event number.

Since I have no idea whether it is better to use an event whitelist
or blacklist, the proposed API takes a cue from the x86 version and
allows either allowing or denying counting of ranges of events.
The event space being pretty large (16bits on ARMv8.1), the default
policy is set by the first filter that gets installed (default deny if
we first allow, default allow if we first deny).

The filter state is global to the guest, despite the PMU being per
CPU. I'm not sure whether it would be worth it making it CPU-private.

Anyway, I'd be interesting in comments on how people would use this.
I'll try to push a patch against kvmtool that implement this shortly
(what I have currently is a harcoded set of hacks).

Marc Zyngier (2):
  KVM: arm64: Add PMU event filtering infrastructure
  KVM: arm64: Document PMU filtering API

 Documentation/virt/kvm/devices/vcpu.txt | 28 +++++++++
 arch/arm64/include/asm/kvm_host.h       |  6 ++
 arch/arm64/include/uapi/asm/kvm.h       | 16 ++++++
 virt/kvm/arm/arm.c                      |  2 +
 virt/kvm/arm/pmu.c                      | 76 +++++++++++++++++++++----
 5 files changed, 116 insertions(+), 12 deletions(-)

-- 
2.20.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply

* [PATCH 0/2] KVM: arm64: Filtering PMU events
From: Marc Zyngier @ 2020-02-14 18:36 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm

It is at times necessary to prevent a guest from being able to sample
certain events if multiple CPUs share resources such as a cache level.
In this case, it would be interesting if the VMM could simply prevent
certain events from being counted instead of simply not exposing a PMU.

Given that most events are not architected, there is no easy way
to designate which events shouldn't be counted other than specifying
the raw event number.

Since I have no idea whether it is better to use an event whitelist
or blacklist, the proposed API takes a cue from the x86 version and
allows either allowing or denying counting of ranges of events.
The event space being pretty large (16bits on ARMv8.1), the default
policy is set by the first filter that gets installed (default deny if
we first allow, default allow if we first deny).

The filter state is global to the guest, despite the PMU being per
CPU. I'm not sure whether it would be worth it making it CPU-private.

Anyway, I'd be interesting in comments on how people would use this.
I'll try to push a patch against kvmtool that implement this shortly
(what I have currently is a harcoded set of hacks).

Marc Zyngier (2):
  KVM: arm64: Add PMU event filtering infrastructure
  KVM: arm64: Document PMU filtering API

 Documentation/virt/kvm/devices/vcpu.txt | 28 +++++++++
 arch/arm64/include/asm/kvm_host.h       |  6 ++
 arch/arm64/include/uapi/asm/kvm.h       | 16 ++++++
 virt/kvm/arm/arm.c                      |  2 +
 virt/kvm/arm/pmu.c                      | 76 +++++++++++++++++++++----
 5 files changed, 116 insertions(+), 12 deletions(-)

-- 
2.20.1

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply

* [PATCH 1/2] KVM: arm64: Add PMU event filtering infrastructure
From: Marc Zyngier @ 2020-02-14 18:36 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
In-Reply-To: <20200214183615.25498-1-maz@kernel.org>

It can be desirable to expose a PMU to a guest, and yet not want the
guest to be able to count some of the implemented events (because this
would give information on shared resources, for example.

For this, let's extend the PMUv3 device API, and offer a way to setup a
bitmap of the allowed events (the default being no bitmap, and thus no
filtering).

Userspace can thus allow/deny ranges of event. The default policy
depends on the "polarity" of the first filter setup (default deny if the
filter allows events, and default allow if the filter denies events).
This allows to setup exactly what is allowed for a given guest.

Note that although the ioctl is per-vcpu, the map of allowed events is
global to the VM (it can be setup from any vcpu until the vcpu PMU is
initialized).

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/kvm_host.h |  6 +++
 arch/arm64/include/uapi/asm/kvm.h | 16 +++++++
 virt/kvm/arm/arm.c                |  2 +
 virt/kvm/arm/pmu.c                | 76 ++++++++++++++++++++++++++-----
 4 files changed, 88 insertions(+), 12 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index d87aa609d2b6..b594c604215f 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -91,6 +91,12 @@ struct kvm_arch {
 	 * supported.
 	 */
 	bool return_nisv_io_abort_to_user;
+
+	/*
+	 * VM-wide PMU filter, implemented as a bitmap and big enough
+	 * for up to 65536 events
+	 */
+	unsigned long *pmu_filter;
 };
 
 #define KVM_NR_MEM_OBJS     40
diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
index ba85bb23f060..7b1511d6ce44 100644
--- a/arch/arm64/include/uapi/asm/kvm.h
+++ b/arch/arm64/include/uapi/asm/kvm.h
@@ -159,6 +159,21 @@ struct kvm_sync_regs {
 struct kvm_arch_memory_slot {
 };
 
+/*
+ * PMU filter structure. Describe a range of events with a particular
+ * action. To be used with KVM_ARM_VCPU_PMU_V3_FILTER.
+ */
+struct kvm_pmu_event_filter {
+	__u16	base_event;
+	__u16	nevents;
+
+#define KVM_PMU_EVENT_ALLOW	0
+#define KVM_PMU_EVENT_DENY	1
+
+	__u8	action;
+	__u8	pad[3];
+};
+
 /* for KVM_GET/SET_VCPU_EVENTS */
 struct kvm_vcpu_events {
 	struct {
@@ -329,6 +344,7 @@ struct kvm_vcpu_events {
 #define KVM_ARM_VCPU_PMU_V3_CTRL	0
 #define   KVM_ARM_VCPU_PMU_V3_IRQ	0
 #define   KVM_ARM_VCPU_PMU_V3_INIT	1
+#define   KVM_ARM_VCPU_PMU_V3_FILTER	2
 #define KVM_ARM_VCPU_TIMER_CTRL		1
 #define   KVM_ARM_VCPU_TIMER_IRQ_VTIMER		0
 #define   KVM_ARM_VCPU_TIMER_IRQ_PTIMER		1
diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
index d65a0faa46d8..9132f7508975 100644
--- a/virt/kvm/arm/arm.c
+++ b/virt/kvm/arm/arm.c
@@ -164,6 +164,8 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
 	free_percpu(kvm->arch.last_vcpu_ran);
 	kvm->arch.last_vcpu_ran = NULL;
 
+	bitmap_free(kvm->arch.pmu_filter);
+
 	for (i = 0; i < KVM_MAX_VCPUS; ++i) {
 		if (kvm->vcpus[i]) {
 			kvm_vcpu_destroy(kvm->vcpus[i]);
diff --git a/virt/kvm/arm/pmu.c b/virt/kvm/arm/pmu.c
index f0d0312c0a55..817f341ed9ad 100644
--- a/virt/kvm/arm/pmu.c
+++ b/virt/kvm/arm/pmu.c
@@ -585,6 +585,14 @@ static void kvm_pmu_create_perf_event(struct kvm_vcpu *vcpu, u64 select_idx)
 	    pmc->idx != ARMV8_PMU_CYCLE_IDX)
 		return;
 
+	/*
+	 * If we have a filter in place and that the event isn't allowed, do
+	 * not install a perf event either.
+	 */
+	if (vcpu->kvm->arch.pmu_filter &&
+	    !test_bit(eventsel, vcpu->kvm->arch.pmu_filter))
+		return;
+
 	memset(&attr, 0, sizeof(struct perf_event_attr));
 	attr.type = PERF_TYPE_RAW;
 	attr.size = sizeof(attr);
@@ -735,15 +743,6 @@ int kvm_arm_pmu_v3_enable(struct kvm_vcpu *vcpu)
 
 static int kvm_arm_pmu_v3_init(struct kvm_vcpu *vcpu)
 {
-	if (!kvm_arm_support_pmu_v3())
-		return -ENODEV;
-
-	if (!test_bit(KVM_ARM_VCPU_PMU_V3, vcpu->arch.features))
-		return -ENXIO;
-
-	if (vcpu->arch.pmu.created)
-		return -EBUSY;
-
 	if (irqchip_in_kernel(vcpu->kvm)) {
 		int ret;
 
@@ -794,8 +793,19 @@ static bool pmu_irq_is_valid(struct kvm *kvm, int irq)
 	return true;
 }
 
+#define NR_EVENTS	(ARMV8_PMU_EVTYPE_EVENT + 1)
+
 int kvm_arm_pmu_v3_set_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
 {
+	if (!kvm_arm_support_pmu_v3())
+		return -ENODEV;
+
+	if (!test_bit(KVM_ARM_VCPU_PMU_V3, vcpu->arch.features))
+		return -ENODEV;
+
+	if (vcpu->arch.pmu.created)
+		return -EBUSY;
+
 	switch (attr->attr) {
 	case KVM_ARM_VCPU_PMU_V3_IRQ: {
 		int __user *uaddr = (int __user *)(long)attr->addr;
@@ -804,9 +814,6 @@ int kvm_arm_pmu_v3_set_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
 		if (!irqchip_in_kernel(vcpu->kvm))
 			return -EINVAL;
 
-		if (!test_bit(KVM_ARM_VCPU_PMU_V3, vcpu->arch.features))
-			return -ENODEV;
-
 		if (get_user(irq, uaddr))
 			return -EFAULT;
 
@@ -824,6 +831,50 @@ int kvm_arm_pmu_v3_set_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
 		vcpu->arch.pmu.irq_num = irq;
 		return 0;
 	}
+	case KVM_ARM_VCPU_PMU_V3_FILTER: {
+		struct kvm_pmu_event_filter __user *uaddr;
+		struct kvm_pmu_event_filter filter;
+
+		uaddr = (struct kvm_pmu_event_filter __user *)(long)attr->addr;
+
+		if (copy_from_user(&filter, uaddr, sizeof(filter)))
+			return -EFAULT;
+
+		if (((u32)filter.base_event + filter.nevents) > NR_EVENTS ||
+		    (filter.action != KVM_PMU_EVENT_ALLOW &&
+		     filter.action != KVM_PMU_EVENT_DENY))
+			return -EINVAL;
+
+		mutex_lock(&vcpu->kvm->lock);
+
+		if (!vcpu->kvm->arch.pmu_filter) {
+			vcpu->kvm->arch.pmu_filter = bitmap_alloc(NR_EVENTS, GFP_KERNEL);
+			if (!vcpu->kvm->arch.pmu_filter) {
+				mutex_unlock(&vcpu->kvm->lock);
+				return -ENOMEM;
+			}
+
+			/*
+			 * The default depends on the first applied filter.
+			 * If it allows events, the default is to deny.
+			 * Conversely, if the first filter denies a set of
+			 * events, the default is to allow.
+			 */
+			if (filter.action == KVM_PMU_EVENT_ALLOW)
+				bitmap_zero(vcpu->kvm->arch.pmu_filter, NR_EVENTS);
+			else
+				bitmap_fill(vcpu->kvm->arch.pmu_filter, NR_EVENTS);
+		}
+
+		if (filter.action == KVM_PMU_EVENT_ALLOW)
+			bitmap_set(vcpu->kvm->arch.pmu_filter, filter.base_event, filter.nevents);
+		else
+			bitmap_clear(vcpu->kvm->arch.pmu_filter, filter.base_event, filter.nevents);
+
+		mutex_unlock(&vcpu->kvm->lock);
+
+		return 0;
+	}
 	case KVM_ARM_VCPU_PMU_V3_INIT:
 		return kvm_arm_pmu_v3_init(vcpu);
 	}
@@ -860,6 +911,7 @@ int kvm_arm_pmu_v3_has_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
 	switch (attr->attr) {
 	case KVM_ARM_VCPU_PMU_V3_IRQ:
 	case KVM_ARM_VCPU_PMU_V3_INIT:
+	case KVM_ARM_VCPU_PMU_V3_FILTER:
 		if (kvm_arm_support_pmu_v3() &&
 		    test_bit(KVM_ARM_VCPU_PMU_V3, vcpu->arch.features))
 			return 0;
-- 
2.20.1

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply related

* [PATCH 2/2] KVM: arm64: Document PMU filtering API
From: Marc Zyngier @ 2020-02-14 18:36 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
In-Reply-To: <20200214183615.25498-1-maz@kernel.org>

Add a small blurb describing how the event filtering API gets used.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 Documentation/virt/kvm/devices/vcpu.txt | 28 +++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/Documentation/virt/kvm/devices/vcpu.txt b/Documentation/virt/kvm/devices/vcpu.txt
index 6f3bd64a05b0..76902e6bbc05 100644
--- a/Documentation/virt/kvm/devices/vcpu.txt
+++ b/Documentation/virt/kvm/devices/vcpu.txt
@@ -36,6 +36,34 @@ Request the initialization of the PMUv3.  If using the PMUv3 with an in-kernel
 virtual GIC implementation, this must be done after initializing the in-kernel
 irqchip.
 
+1.3 ATTRIBUTE: KVM_ARM_VCPU_PMU_V3_FILTER
+Parameters: in kvm_device_attr.addr the address for a PMU event filter is a
+            pointer to a struct kvm_pmu_event_filter
+Returns: -ENODEV: PMUv3 not supported or GIC not initialized
+         -ENXIO: PMUv3 not properly configured or in-kernel irqchip not
+                 configured as required prior to calling this attribute
+         -EBUSY: PMUv3 already initialized
+
+Request the installation of a PMU event filter describe as follows:
+
+struct kvm_pmu_event_filter {
+	__u16	base_event;
+	__u16	nevents;
+
+#define KVM_PMU_EVENT_ALLOW	0
+#define KVM_PMU_EVENT_DENY	1
+
+	__u8	action;
+	__u8	pad[3];
+};
+
+A filter range is defined as the range [base_event, base_event + nevents[,
+together with an action (KVM_PMU_EVENT_ALLOW or KVM_PMU_EVENT_DENY). The
+first registered range defines the global policy (global ALLOW if the first
+action is DENY, global DENY if the first action is ALLOW). Multiple ranges
+can be programmed, and but fit within the 16bit space defined by the ARMv8.1
+PMU architecture.
+
 
 2. GROUP: KVM_ARM_VCPU_TIMER_CTRL
 Architectures: ARM,ARM64
-- 
2.20.1

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply related

* [PATCH AUTOSEL 4.19 040/252] powerpc/iov: Move VF pdev fixup into pcibios_fixup_iov()
From: Sasha Levin @ 2020-02-14 16:08 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Alexey Kardashevskiy, Oliver O'Halloran, linuxppc-dev,
	Sasha Levin
In-Reply-To: <20200214161147.15842-1-sashal@kernel.org>

From: Oliver O'Halloran <oohall@gmail.com>

[ Upstream commit 965c94f309be58fbcc6c8d3e4f123376c5970d79 ]

An ioda_pe for each VF is allocated in pnv_pci_sriov_enable() before
the pci_dev for the VF is created. We need to set the pe->pdev pointer
at some point after the pci_dev is created. Currently we do that in:

pcibios_bus_add_device()
	pnv_pci_dma_dev_setup() (via phb->ops.dma_dev_setup)
		/* fixup is done here */
		pnv_pci_ioda_dma_dev_setup() (via pnv_phb->dma_dev_setup)

The fixup needs to be done before setting up DMA for for the VF's PE,
but there's no real reason to delay it until this point. Move the
fixup into pnv_pci_ioda_fixup_iov() so the ordering is:

	pcibios_add_device()
		pnv_pci_ioda_fixup_iov() (via ppc_md.pcibios_fixup_sriov)

	pcibios_bus_add_device()
		...

This isn't strictly required, but it's a slightly more logical place
to do the fixup and it simplifies pnv_pci_dma_dev_setup().

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200110070207.439-4-oohall@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 29 +++++++++++++++++++----
 arch/powerpc/platforms/powernv/pci.c      | 14 -----------
 2 files changed, 25 insertions(+), 18 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index e47ff05c5996f..19cd6affdd5fb 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -3020,9 +3020,6 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
 	struct pci_dn *pdn;
 	int mul, total_vfs;
 
-	if (!pdev->is_physfn || pci_dev_is_added(pdev))
-		return;
-
 	pdn = pci_get_pdn(pdev);
 	pdn->vfs_expanded = 0;
 	pdn->m64_single_mode = false;
@@ -3097,6 +3094,30 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
 		res->end = res->start - 1;
 	}
 }
+
+static void pnv_pci_ioda_fixup_iov(struct pci_dev *pdev)
+{
+	if (WARN_ON(pci_dev_is_added(pdev)))
+		return;
+
+	if (pdev->is_virtfn) {
+		struct pnv_ioda_pe *pe = pnv_ioda_get_pe(pdev);
+
+		/*
+		 * VF PEs are single-device PEs so their pdev pointer needs to
+		 * be set. The pdev doesn't exist when the PE is allocated (in
+		 * (pcibios_sriov_enable()) so we fix it up here.
+		 */
+		pe->pdev = pdev;
+		WARN_ON(!(pe->flags & PNV_IODA_PE_VF));
+	} else if (pdev->is_physfn) {
+		/*
+		 * For PFs adjust their allocated IOV resources to match what
+		 * the PHB can support using it's M64 BAR table.
+		 */
+		pnv_pci_ioda_fixup_iov_resources(pdev);
+	}
+}
 #endif /* CONFIG_PCI_IOV */
 
 static void pnv_ioda_setup_pe_res(struct pnv_ioda_pe *pe,
@@ -3990,7 +4011,7 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
 	ppc_md.pcibios_default_alignment = pnv_pci_default_alignment;
 
 #ifdef CONFIG_PCI_IOV
-	ppc_md.pcibios_fixup_sriov = pnv_pci_ioda_fixup_iov_resources;
+	ppc_md.pcibios_fixup_sriov = pnv_pci_ioda_fixup_iov;
 	ppc_md.pcibios_iov_resource_alignment = pnv_pci_iov_resource_alignment;
 	ppc_md.pcibios_sriov_enable = pnv_pcibios_sriov_enable;
 	ppc_md.pcibios_sriov_disable = pnv_pcibios_sriov_disable;
diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c
index aa95b8e0f66ad..b6fa900af5da5 100644
--- a/arch/powerpc/platforms/powernv/pci.c
+++ b/arch/powerpc/platforms/powernv/pci.c
@@ -820,20 +820,6 @@ void pnv_pci_dma_dev_setup(struct pci_dev *pdev)
 {
 	struct pci_controller *hose = pci_bus_to_host(pdev->bus);
 	struct pnv_phb *phb = hose->private_data;
-#ifdef CONFIG_PCI_IOV
-	struct pnv_ioda_pe *pe;
-
-	/* Fix the VF pdn PE number */
-	if (pdev->is_virtfn) {
-		list_for_each_entry(pe, &phb->ioda.pe_list, list) {
-			if (pe->rid == ((pdev->bus->number << 8) |
-			    (pdev->devfn & 0xff))) {
-				pe->pdev = pdev;
-				break;
-			}
-		}
-	}
-#endif /* CONFIG_PCI_IOV */
 
 	if (phb && phb->dma_dev_setup)
 		phb->dma_dev_setup(phb, pdev);
-- 
2.20.1


^ permalink raw reply related

* Re: [PATCH net-next v2] net: phy: dp83867: Add speed optimization feature
From: Dan Murphy @ 2020-02-14 18:31 UTC (permalink / raw)
  To: Grygorii Strashko, Florian Fainelli, Heiner Kallweit, andrew
  Cc: linux, davem, netdev, linux-kernel
In-Reply-To: <44499cb2-ec72-75a1-195b-fbadd8463e1c@ti.com>

Grygorii

On 2/14/20 12:32 PM, Grygorii Strashko wrote:
>
>
> On 06/02/2020 00:01, Dan Murphy wrote:
>> Florian
>>
>> On 2/5/20 4:00 PM, Florian Fainelli wrote:
>>> On 2/5/20 1:51 PM, Dan Murphy wrote:
>>>> Heiner
>>>>
>>>> On 2/5/20 3:16 PM, Heiner Kallweit wrote:
>>>>> On 04.02.2020 19:13, Dan Murphy wrote:
>>>>>> Set the speed optimization bit on the DP83867 PHY.
>>>>>> This feature can also be strapped on the 64 pin PHY devices
>>>>>> but the 48 pin devices do not have the strap pin available to enable
>>>>>> this feature in the hardware.  PHY team suggests to have this bit 
>>>>>> set.
>>>>>>
>>>>>> With this bit set the PHY will auto negotiate and report the link
>>>>>> parameters in the PHYSTS register.  This register provides a single
>>>>>> location within the register set for quick access to commonly 
>>>>>> accessed
>>>>>> information.
>>>>>>
>>>>>> In this case when auto negotiation is on the PHY core reads the bits
>>>>>> that have been configured or if auto negotiation is off the PHY core
>>>>>> reads the BMCR register and sets the phydev parameters accordingly.
>>>>>>
>>>>>> This Giga bit PHY can throttle the speed to 100Mbps or 10Mbps to
>>>>>> accomodate a
>>>>>> 4-wire cable.  If this should occur the PHYSTS register contains the
>>>>>> current negotiated speed and duplex mode.
>>>>>>
>>>>>> In overriding the genphy_read_status the dp83867_read_status will 
>>>>>> do a
>>>>>> genphy_read_status to setup the LP and pause bits.  And then the 
>>>>>> PHYSTS
>>>>>> register is read and the phydev speed and duplex mode settings are
>>>>>> updated.
>>>>>>
>>>>>> Signed-off-by: Dan Murphy <dmurphy@ti.com>
>>>>>> ---
>>>>>> v2 - Updated read status to call genphy_read_status first, added
>>>>>> link_change
>>>>>> callback to notify of speed change and use phy_set_bits -
>>>>>> https://lore.kernel.org/patchwork/patch/1188348/
>>>>>>
>>>>> As stated in the first review, it would be appreciated if you 
>>>>> implement
>>>>> also the downshift tunable. This could be a separate patch in this
>>>>> series.
>>>>> Most of the implementation would be boilerplate code.
>>>> I just don't have a requirement from our customer to make it 
>>>> adjustable
>>>> so I did not want to add something extra.
>>>>
>>>> I can add in for v3.
>>>>
>>>>> And I have to admit that I'm not too happy with the term "speed
>>>>> optimization".
>>>>> This sounds like the PHY has some magic to establish a 1.2Gbps link.
>>>>> Even though the vendor may call it this way in the datasheet, the
>>>>> standard
>>>>> term is "downshift". I'm fine with using "speed optimization" in
>>>>> constants
>>>>> to be in line with the datasheet. Just a comment in the code would be
>>>>> helpful
>>>>> that speed optimization is the vendor's term for downshift.
>>>> Ack.  The data sheet actually says "Speed optimization, also known as
>>>> link downshift"
>>>>
>>>> So I probably will just rename everything down shift.
>>>>
>>>>>>    drivers/net/phy/dp83867.c | 55 
>>>>>> +++++++++++++++++++++++++++++++++++++++
>>>>>>    1 file changed, 55 insertions(+)
>>>>>>
>>>>>> diff --git a/drivers/net/phy/dp83867.c b/drivers/net/phy/dp83867.c
>>>>>> index 967f57ed0b65..6f86ca1ebb51 100644
>>>>>> --- a/drivers/net/phy/dp83867.c
>>>>>> +++ b/drivers/net/phy/dp83867.c
>>>>>> @@ -21,6 +21,7 @@
>>>>>>    #define DP83867_DEVADDR        0x1f
>>>>>>      #define MII_DP83867_PHYCTRL    0x10
>>>>>> +#define MII_DP83867_PHYSTS    0x11
>>>>>>    #define MII_DP83867_MICR    0x12
>>>>>>    #define MII_DP83867_ISR        0x13
>>>>>>    #define DP83867_CFG2        0x14
>>>>>> @@ -118,6 +119,15 @@
>>>>>>    #define DP83867_IO_MUX_CFG_CLK_O_SEL_MASK    (0x1f << 8)
>>>>>>    #define DP83867_IO_MUX_CFG_CLK_O_SEL_SHIFT    8
>>>>>>    +/* PHY STS bits */
>>>>>> +#define DP83867_PHYSTS_1000            BIT(15)
>>>>>> +#define DP83867_PHYSTS_100            BIT(14)
>>>>>> +#define DP83867_PHYSTS_DUPLEX            BIT(13)
>>>>>> +#define DP83867_PHYSTS_LINK            BIT(10)
>>>>>> +
>>>>>> +/* CFG2 bits */
>>>>>> +#define DP83867_SPEED_OPTIMIZED_EN        (BIT(8) | BIT(9))
>>>>>> +
>>>>>>    /* CFG3 bits */
>>>>>>    #define DP83867_CFG3_INT_OE            BIT(7)
>>>>>>    #define DP83867_CFG3_ROBUST_AUTO_MDIX        BIT(9)
>>>>>> @@ -287,6 +297,43 @@ static int dp83867_config_intr(struct 
>>>>>> phy_device
>>>>>> *phydev)
>>>>>>        return phy_write(phydev, MII_DP83867_MICR, micr_status);
>>>>>>    }
>>>>>>    +static void dp83867_link_change_notify(struct phy_device 
>>>>>> *phydev)
>>>>>> +{
>>>>>> +    if (phydev->state != PHY_RUNNING)
>>>>>> +        return;
>>>>>> +
>>>>>> +    if (phydev->speed == SPEED_100 || phydev->speed == SPEED_10)
>>>>>> +        phydev_warn(phydev, "Downshift detected connection is
>>>>>> %iMbps\n",
>>>>>> +                phydev->speed);
>>>>> The link partner may simply not advertise 1Gbps. How do you know that
>>>>> a link speed of e.g. 100Mbps is caused by a downshift?
>>>>> Some PHY's I've seen with this feature have a flag somewhere 
>>>>> indicating
>>>>> that downshift occurred. How about the PHY here?
>>>> I don't see a register that gives us that status
>>>>
>>>> I will ask the hardware team if there is one.
>>>>
>>>> This is a 1Gbps PHY by default so if a slower connection is 
>>>> established
>>>> due to faulty cabling or LP advertisement then this would be a down
>>>> shift IMO.
>>> With your current link_change_notify function it would not be possible
>>> to know whether the PHY was connected to a link partner that advertised
>>> only 10/100 and so 100 ended up being the link speed, or the link
>>> partner was capable of 10/100/1000 and downshift reduced the link 
>>> speed.
>>>
>>> If you cannot tell the difference from a register, it might be 
>>> better to
>>> simply omit that function then.
>>
>> Yeah I thought it was a bit redundant and wonky to see in the log 
>> that the link established to xG/Mbps and then see another message 
>> saying the downshift occurred.
>
> I think it's good idea to have this message as just wrong cable might 
> be used.
>
> But this notifier make no sense in it current form - it will produce 
> noise in case of forced 100m/10M.
>
> FYI. PHY sequence to update link:
> phy_state_machine()
> |-phy_check_link_status()
>   |-phy_link_down/up()
>     |- .phy_link_change()->phy_link_change()
>     |-adjust_link() ----> netdev callback
> |-phydev->drv->link_change_notify(phydev);
>
> So, log output has to be done or in .read_status() or
> some info has to be saved in .read_status() and then re-used in
> .link_change_notify().
>
OK I will try to find a way to give some sort of message.

Also we did get confirmation from HW guys and you also confirmed that 
the number of attempts for downshift is configurable.  So I will be 
adding back the tunable code once net-next opens.

Dan


^ permalink raw reply

* Re: [PATCH 01/19] target/arm: Fix field extract from MVFR[0-2]
From: Peter Maydell @ 2020-02-14 18:35 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers
In-Reply-To: <20200214181547.21408-2-richard.henderson@linaro.org>

On Fri, 14 Feb 2020 at 18:15, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> These registers are 32-bits wide.  Cut and paste used FIELD_EX64
> instead of the more proper FIELD_EX32.  In practice all this did
> was use an unnecessary 64-bit operation, producing correct results.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

This is a duplicate with:

https://patchew.org/QEMU/20200214175116.9164-1-peter.maydell@linaro.org/20200214175116.9164-21-peter.maydell@linaro.org/

which I got out the door very slightly before your series
hit my mailbox :-)

-- PMM


^ permalink raw reply

* [PATCH 1/2] KVM: arm64: Add PMU event filtering infrastructure
From: Marc Zyngier @ 2020-02-14 18:36 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: James Morse, Julien Thierry, Suzuki K Poulose
In-Reply-To: <20200214183615.25498-1-maz@kernel.org>

It can be desirable to expose a PMU to a guest, and yet not want the
guest to be able to count some of the implemented events (because this
would give information on shared resources, for example.

For this, let's extend the PMUv3 device API, and offer a way to setup a
bitmap of the allowed events (the default being no bitmap, and thus no
filtering).

Userspace can thus allow/deny ranges of event. The default policy
depends on the "polarity" of the first filter setup (default deny if the
filter allows events, and default allow if the filter denies events).
This allows to setup exactly what is allowed for a given guest.

Note that although the ioctl is per-vcpu, the map of allowed events is
global to the VM (it can be setup from any vcpu until the vcpu PMU is
initialized).

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/kvm_host.h |  6 +++
 arch/arm64/include/uapi/asm/kvm.h | 16 +++++++
 virt/kvm/arm/arm.c                |  2 +
 virt/kvm/arm/pmu.c                | 76 ++++++++++++++++++++++++++-----
 4 files changed, 88 insertions(+), 12 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index d87aa609d2b6..b594c604215f 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -91,6 +91,12 @@ struct kvm_arch {
 	 * supported.
 	 */
 	bool return_nisv_io_abort_to_user;
+
+	/*
+	 * VM-wide PMU filter, implemented as a bitmap and big enough
+	 * for up to 65536 events
+	 */
+	unsigned long *pmu_filter;
 };
 
 #define KVM_NR_MEM_OBJS     40
diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
index ba85bb23f060..7b1511d6ce44 100644
--- a/arch/arm64/include/uapi/asm/kvm.h
+++ b/arch/arm64/include/uapi/asm/kvm.h
@@ -159,6 +159,21 @@ struct kvm_sync_regs {
 struct kvm_arch_memory_slot {
 };
 
+/*
+ * PMU filter structure. Describe a range of events with a particular
+ * action. To be used with KVM_ARM_VCPU_PMU_V3_FILTER.
+ */
+struct kvm_pmu_event_filter {
+	__u16	base_event;
+	__u16	nevents;
+
+#define KVM_PMU_EVENT_ALLOW	0
+#define KVM_PMU_EVENT_DENY	1
+
+	__u8	action;
+	__u8	pad[3];
+};
+
 /* for KVM_GET/SET_VCPU_EVENTS */
 struct kvm_vcpu_events {
 	struct {
@@ -329,6 +344,7 @@ struct kvm_vcpu_events {
 #define KVM_ARM_VCPU_PMU_V3_CTRL	0
 #define   KVM_ARM_VCPU_PMU_V3_IRQ	0
 #define   KVM_ARM_VCPU_PMU_V3_INIT	1
+#define   KVM_ARM_VCPU_PMU_V3_FILTER	2
 #define KVM_ARM_VCPU_TIMER_CTRL		1
 #define   KVM_ARM_VCPU_TIMER_IRQ_VTIMER		0
 #define   KVM_ARM_VCPU_TIMER_IRQ_PTIMER		1
diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
index d65a0faa46d8..9132f7508975 100644
--- a/virt/kvm/arm/arm.c
+++ b/virt/kvm/arm/arm.c
@@ -164,6 +164,8 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
 	free_percpu(kvm->arch.last_vcpu_ran);
 	kvm->arch.last_vcpu_ran = NULL;
 
+	bitmap_free(kvm->arch.pmu_filter);
+
 	for (i = 0; i < KVM_MAX_VCPUS; ++i) {
 		if (kvm->vcpus[i]) {
 			kvm_vcpu_destroy(kvm->vcpus[i]);
diff --git a/virt/kvm/arm/pmu.c b/virt/kvm/arm/pmu.c
index f0d0312c0a55..817f341ed9ad 100644
--- a/virt/kvm/arm/pmu.c
+++ b/virt/kvm/arm/pmu.c
@@ -585,6 +585,14 @@ static void kvm_pmu_create_perf_event(struct kvm_vcpu *vcpu, u64 select_idx)
 	    pmc->idx != ARMV8_PMU_CYCLE_IDX)
 		return;
 
+	/*
+	 * If we have a filter in place and that the event isn't allowed, do
+	 * not install a perf event either.
+	 */
+	if (vcpu->kvm->arch.pmu_filter &&
+	    !test_bit(eventsel, vcpu->kvm->arch.pmu_filter))
+		return;
+
 	memset(&attr, 0, sizeof(struct perf_event_attr));
 	attr.type = PERF_TYPE_RAW;
 	attr.size = sizeof(attr);
@@ -735,15 +743,6 @@ int kvm_arm_pmu_v3_enable(struct kvm_vcpu *vcpu)
 
 static int kvm_arm_pmu_v3_init(struct kvm_vcpu *vcpu)
 {
-	if (!kvm_arm_support_pmu_v3())
-		return -ENODEV;
-
-	if (!test_bit(KVM_ARM_VCPU_PMU_V3, vcpu->arch.features))
-		return -ENXIO;
-
-	if (vcpu->arch.pmu.created)
-		return -EBUSY;
-
 	if (irqchip_in_kernel(vcpu->kvm)) {
 		int ret;
 
@@ -794,8 +793,19 @@ static bool pmu_irq_is_valid(struct kvm *kvm, int irq)
 	return true;
 }
 
+#define NR_EVENTS	(ARMV8_PMU_EVTYPE_EVENT + 1)
+
 int kvm_arm_pmu_v3_set_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
 {
+	if (!kvm_arm_support_pmu_v3())
+		return -ENODEV;
+
+	if (!test_bit(KVM_ARM_VCPU_PMU_V3, vcpu->arch.features))
+		return -ENODEV;
+
+	if (vcpu->arch.pmu.created)
+		return -EBUSY;
+
 	switch (attr->attr) {
 	case KVM_ARM_VCPU_PMU_V3_IRQ: {
 		int __user *uaddr = (int __user *)(long)attr->addr;
@@ -804,9 +814,6 @@ int kvm_arm_pmu_v3_set_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
 		if (!irqchip_in_kernel(vcpu->kvm))
 			return -EINVAL;
 
-		if (!test_bit(KVM_ARM_VCPU_PMU_V3, vcpu->arch.features))
-			return -ENODEV;
-
 		if (get_user(irq, uaddr))
 			return -EFAULT;
 
@@ -824,6 +831,50 @@ int kvm_arm_pmu_v3_set_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
 		vcpu->arch.pmu.irq_num = irq;
 		return 0;
 	}
+	case KVM_ARM_VCPU_PMU_V3_FILTER: {
+		struct kvm_pmu_event_filter __user *uaddr;
+		struct kvm_pmu_event_filter filter;
+
+		uaddr = (struct kvm_pmu_event_filter __user *)(long)attr->addr;
+
+		if (copy_from_user(&filter, uaddr, sizeof(filter)))
+			return -EFAULT;
+
+		if (((u32)filter.base_event + filter.nevents) > NR_EVENTS ||
+		    (filter.action != KVM_PMU_EVENT_ALLOW &&
+		     filter.action != KVM_PMU_EVENT_DENY))
+			return -EINVAL;
+
+		mutex_lock(&vcpu->kvm->lock);
+
+		if (!vcpu->kvm->arch.pmu_filter) {
+			vcpu->kvm->arch.pmu_filter = bitmap_alloc(NR_EVENTS, GFP_KERNEL);
+			if (!vcpu->kvm->arch.pmu_filter) {
+				mutex_unlock(&vcpu->kvm->lock);
+				return -ENOMEM;
+			}
+
+			/*
+			 * The default depends on the first applied filter.
+			 * If it allows events, the default is to deny.
+			 * Conversely, if the first filter denies a set of
+			 * events, the default is to allow.
+			 */
+			if (filter.action == KVM_PMU_EVENT_ALLOW)
+				bitmap_zero(vcpu->kvm->arch.pmu_filter, NR_EVENTS);
+			else
+				bitmap_fill(vcpu->kvm->arch.pmu_filter, NR_EVENTS);
+		}
+
+		if (filter.action == KVM_PMU_EVENT_ALLOW)
+			bitmap_set(vcpu->kvm->arch.pmu_filter, filter.base_event, filter.nevents);
+		else
+			bitmap_clear(vcpu->kvm->arch.pmu_filter, filter.base_event, filter.nevents);
+
+		mutex_unlock(&vcpu->kvm->lock);
+
+		return 0;
+	}
 	case KVM_ARM_VCPU_PMU_V3_INIT:
 		return kvm_arm_pmu_v3_init(vcpu);
 	}
@@ -860,6 +911,7 @@ int kvm_arm_pmu_v3_has_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
 	switch (attr->attr) {
 	case KVM_ARM_VCPU_PMU_V3_IRQ:
 	case KVM_ARM_VCPU_PMU_V3_INIT:
+	case KVM_ARM_VCPU_PMU_V3_FILTER:
 		if (kvm_arm_support_pmu_v3() &&
 		    test_bit(KVM_ARM_VCPU_PMU_V3, vcpu->arch.features))
 			return 0;
-- 
2.20.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related

* [PATCH 2/2] KVM: arm64: Document PMU filtering API
From: Marc Zyngier @ 2020-02-14 18:36 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: James Morse, Julien Thierry, Suzuki K Poulose
In-Reply-To: <20200214183615.25498-1-maz@kernel.org>

Add a small blurb describing how the event filtering API gets used.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 Documentation/virt/kvm/devices/vcpu.txt | 28 +++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/Documentation/virt/kvm/devices/vcpu.txt b/Documentation/virt/kvm/devices/vcpu.txt
index 6f3bd64a05b0..76902e6bbc05 100644
--- a/Documentation/virt/kvm/devices/vcpu.txt
+++ b/Documentation/virt/kvm/devices/vcpu.txt
@@ -36,6 +36,34 @@ Request the initialization of the PMUv3.  If using the PMUv3 with an in-kernel
 virtual GIC implementation, this must be done after initializing the in-kernel
 irqchip.
 
+1.3 ATTRIBUTE: KVM_ARM_VCPU_PMU_V3_FILTER
+Parameters: in kvm_device_attr.addr the address for a PMU event filter is a
+            pointer to a struct kvm_pmu_event_filter
+Returns: -ENODEV: PMUv3 not supported or GIC not initialized
+         -ENXIO: PMUv3 not properly configured or in-kernel irqchip not
+                 configured as required prior to calling this attribute
+         -EBUSY: PMUv3 already initialized
+
+Request the installation of a PMU event filter describe as follows:
+
+struct kvm_pmu_event_filter {
+	__u16	base_event;
+	__u16	nevents;
+
+#define KVM_PMU_EVENT_ALLOW	0
+#define KVM_PMU_EVENT_DENY	1
+
+	__u8	action;
+	__u8	pad[3];
+};
+
+A filter range is defined as the range [base_event, base_event + nevents[,
+together with an action (KVM_PMU_EVENT_ALLOW or KVM_PMU_EVENT_DENY). The
+first registered range defines the global policy (global ALLOW if the first
+action is DENY, global DENY if the first action is ALLOW). Multiple ranges
+can be programmed, and but fit within the 16bit space defined by the ARMv8.1
+PMU architecture.
+
 
 2. GROUP: KVM_ARM_VCPU_TIMER_CTRL
 Architectures: ARM,ARM64
-- 
2.20.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related

* [PATCH] drm/msm/a5xx: Always set an OPP supported hardware value
From: Jordan Crouse @ 2020-02-14 18:36 UTC (permalink / raw)
  To: linux-arm-msm
  Cc: Jeffrey Hugo, David Airlie, freedreno, linux-kernel, dri-devel,
	Ben Dooks, Thomas Gleixner, AngeloGioacchino Del Regno, Sean Paul,
	Wen Yang

If the opp table specifies opp-supported-hw as a property but the driver
has not set a supported hardware value the OPP subsystem will reject
all the table entries.

Set a "default" value that will match the default table entries but not
conflict with any possible real bin values. Also fix a small memory leak
and free the buffer allocated by nvmem_cell_read().

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
---

 drivers/gpu/drm/msm/adreno/a5xx_gpu.c | 27 ++++++++++++++++++++-------
 1 file changed, 20 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
index 7d9e63e..724024a 100644
--- a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
@@ -1446,18 +1446,31 @@ static const struct adreno_gpu_funcs funcs = {
 static void check_speed_bin(struct device *dev)
 {
 	struct nvmem_cell *cell;
-	u32 bin, val;
+	u32 val;
+
+	/*
+	 * If the OPP table specifies a opp-supported-hw property then we have
+	 * to set something with dev_pm_opp_set_supported_hw() or the table
+	 * doesn't get populated so pick an arbitrary value that should
+	 * ensure the default frequencies are selected but not conflict with any
+	 * actual bins
+	 */
+	val = 0x80;
 
 	cell = nvmem_cell_get(dev, "speed_bin");
 
-	/* If a nvmem cell isn't defined, nothing to do */
-	if (IS_ERR(cell))
-		return;
+	if (!IS_ERR(cell)) {
+		void *buf = nvmem_cell_read(cell, NULL);
+
+		if (!IS_ERR(buf)) {
+			u8 bin = *((u8 *) buf);
 
-	bin = *((u32 *) nvmem_cell_read(cell, NULL));
-	nvmem_cell_put(cell);
+			val = (1 << bin);
+			kfree(buf);
+		}
 
-	val = (1 << bin);
+		nvmem_cell_put(cell);
+	}
 
 	dev_pm_opp_set_supported_hw(dev, &val, 1);
 }
-- 
2.7.4
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related

* [PATCH] drm/msm/a5xx: Always set an OPP supported hardware value
From: Jordan Crouse @ 2020-02-14 18:36 UTC (permalink / raw)
  To: linux-arm-msm
  Cc: Eric Anholt, David Airlie, Sean Paul, Wen Yang,
	AngeloGioacchino Del Regno, linux-kernel, dri-devel, Rob Clark,
	Thomas Gleixner, freedreno, Jeffrey Hugo, Ben Dooks,
	Daniel Vetter

If the opp table specifies opp-supported-hw as a property but the driver
has not set a supported hardware value the OPP subsystem will reject
all the table entries.

Set a "default" value that will match the default table entries but not
conflict with any possible real bin values. Also fix a small memory leak
and free the buffer allocated by nvmem_cell_read().

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
---

 drivers/gpu/drm/msm/adreno/a5xx_gpu.c | 27 ++++++++++++++++++++-------
 1 file changed, 20 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
index 7d9e63e..724024a 100644
--- a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
@@ -1446,18 +1446,31 @@ static const struct adreno_gpu_funcs funcs = {
 static void check_speed_bin(struct device *dev)
 {
 	struct nvmem_cell *cell;
-	u32 bin, val;
+	u32 val;
+
+	/*
+	 * If the OPP table specifies a opp-supported-hw property then we have
+	 * to set something with dev_pm_opp_set_supported_hw() or the table
+	 * doesn't get populated so pick an arbitrary value that should
+	 * ensure the default frequencies are selected but not conflict with any
+	 * actual bins
+	 */
+	val = 0x80;
 
 	cell = nvmem_cell_get(dev, "speed_bin");
 
-	/* If a nvmem cell isn't defined, nothing to do */
-	if (IS_ERR(cell))
-		return;
+	if (!IS_ERR(cell)) {
+		void *buf = nvmem_cell_read(cell, NULL);
+
+		if (!IS_ERR(buf)) {
+			u8 bin = *((u8 *) buf);
 
-	bin = *((u32 *) nvmem_cell_read(cell, NULL));
-	nvmem_cell_put(cell);
+			val = (1 << bin);
+			kfree(buf);
+		}
 
-	val = (1 << bin);
+		nvmem_cell_put(cell);
+	}
 
 	dev_pm_opp_set_supported_hw(dev, &val, 1);
 }
-- 
2.7.4

^ permalink raw reply related

* Re: [RFC patch 00/19] bpf: Make BPF and PREEMPT_RT co-exist
From: Thomas Gleixner @ 2020-02-14 18:36 UTC (permalink / raw)
  To: David Miller
  Cc: linux-kernel, bpf, netdev, ast, daniel, bigeasy, peterz, williams,
	rostedt, juri.lelli, mingo
In-Reply-To: <20200214.095303.341559462549043464.davem@davemloft.net>

David Miller <davem@davemloft.net> writes:

> From: Thomas Gleixner <tglx@linutronix.de>
> Date: Fri, 14 Feb 2020 14:39:17 +0100
>
>> This is a follow up to the initial patch series which David posted a
>> while ago:
>> 
>>  https://lore.kernel.org/bpf/20191207.160357.828344895192682546.davem@davemloft.net/
>> 
>> which was (while non-functional on RT) a good starting point for further
>> investigations.
>
> This looks really good after a cursory review, thanks for doing this week.
>
> I was personally unaware of the pre-allocation rules for MAPs used by
> tracing et al.  And that definitely shapes how this should be handled.

Hmm. I just noticed that my analysis only holds for PERF events. But
that's broken on mainline already.

Assume the following simplified callchain:

       kmalloc() from regular non BPF context
         cache empty
           freelist empty
             lock(zone->lock);
                tracepoint or kprobe
                  BPF()
                    update_elem()
                      lock(bucket)
                        kmalloc()
                          cache empty
                            freelist empty
                              lock(zone->lock);  <- DEADLOCK

So really, preallocation _must_ be enforced for all variants of
intrusive instrumentation. There is no if and but, it's simply mandatory
as all intrusive instrumentation has to follow the only sensible
principle: KISS = Keep It Safe and Simple.

The above is a perfectly valid scenario and works with perf and tracing,
so it has to work with BPF in the same safe way.

I might be missing some magic enforcement of that, but I got lost in the
maze.

Thanks,

        tglx


^ permalink raw reply

* Re: [PATCH v2 net 3/3] wireguard: send: account for mtu=0 devices
From: Jason A. Donenfeld @ 2020-02-14 18:37 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, Netdev, Eric Dumazet
In-Reply-To: <e20d0c52-cb83-224d-7507-b53c5c4a5b69@gmail.com>

On 2/14/20, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
>
> On 2/14/20 10:15 AM, Jason A. Donenfeld wrote:
>> On Fri, Feb 14, 2020 at 6:56 PM Eric Dumazet <eric.dumazet@gmail.com>
>> wrote:
>>> Oh dear, can you describe what do you expect of a wireguard device with
>>> mtu == 0 or mtu == 1
>>>
>>> Why simply not allowing silly configurations, instead of convoluted tests
>>> in fast path ?
>>>
>>> We are speaking of tunnels adding quite a lot of headers, so we better
>>> not try to make them
>>> work on networks with tiny mtu. Just say no to syzbot.
>>
>> The idea was that wireguard might still be useful for the persistent
>> keepalive stuff. This branch becomes very cold very fast, so I don't
>> think it makes a difference performance wise, but if you feel strongly
>> about it, I can get rid of it and set a non-zero min_mtu that's the
>> smallest thing wireguard's xmit semantics will accept. It sounds like
>> you'd prefer that?
>>
> Well, if you believe that wireguard in persistent keepalive
> has some value on its own, I guess that we will have to support this mode.

Alright.

>
> Some legacy devices can have arbitrary mtu, and this has caused headaches.
> I was hoping that for brand new devices, we could have saner limits.
>
> About setting max_mtu to ~MAX_INT, does it mean wireguard will attempt
> to send UDP datagrams bigger than 64K ? Where is the segmentation done ?

The before passings off to the udp tunnel api, we indicate that we
support ip segmentation, and then it gets handled and fragmented
deeper down. Check out socket.c. This winds up being sometimes useful
for some odd people when it's faster to encrypt longer packets on
networks with no loss. I can't say I generally recommend people go
that route, but some report benefitting from it.


>


-- 
Jason A. Donenfeld
Deep Space Explorer
fr: +33 6 51 90 82 66
us: +1 513 476 1200
www.jasondonenfeld.com
www.zx2c4.com
zx2c4.com/keys/AB9942E6D4A4CFC3412620A749FC7012A5DE03AE.asc

^ permalink raw reply

* [RFC v2 0/6] vTPM for aarch64
From: Eric Auger @ 2020-02-14 18:36 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, stefanb, qemu-devel, qemu-arm,
	peter.maydell
  Cc: marcandre.lureau, lersek, ardb, philmd

This series adds the capability to instantiate an MMIO TPM TIS
in ARM virt.

The existing TPM TIS code is reshuffled into a generic part,
the ISA device and the sysbus device. The last patch allows
the instantiation of the TPM TIS sysbus device in ARM virt.

The series was tested with the swtpm/libtpms emulator.
Automatic guest LUKS volume unlocking (tpm2) was successful.
EDK2 support is under development [3]. Thanks to Ard
for supporting me when setting up the test environment.

Best Regards

Eric

Testing:

mkdir /tmp/tpm
swtpm socket \
--tpm2 \
-t -d \
--tpmstate dir=/tmp/tpm \
--ctrl type=unixio,path=/tmp/swtpm-sock

qemu command line must be augmented with the following options:

-chardev socket,id=chrtpm,path=/tmp/swtpm-sock \
-tpmdev emulator,id=tpm0,chardev=chrtpm \
-device tpm-tis-device,tpmdev=tpm0 \

References:
[1] libtpms: https://github.com/stefanberger/libtpms/wiki
[2] swtpm: https://github.com/stefanberger/swtpm/wiki
[3] [PATCH 0/4] ArmVirtPkg: implement measured boot for ArmVirtQemu

This series can be found at:
https://github.com/eauger/qemu/tree/v4.2.0-tpm-rfc-v2

History:

RFC v1 -> RFC v2:
- restructure the existing code with common, ISA and sysbus part.
- both ARM and x86 integration were tested.
- acknowledgement: migration has not been tested

Eric Auger (6):
  tpm: rename TPM_TIS into TPM_TIS_ISA
  tpm: Use TPMState as a common struct
  tpm: Separate tpm_tis common functions from isa code
  tpm: Separate TPM_TIS and TPM_TIS_ISA configs
  tpm: Add the SysBus TPM TIS device
  hw/arm/virt: vTPM support

 default-configs/i386-softmmu.mak       |   2 +-
 hw/arm/Kconfig                         |   1 +
 hw/arm/sysbus-fdt.c                    |  36 +++++
 hw/arm/virt.c                          |   7 +
 hw/i386/Kconfig                        |   2 +-
 hw/i386/acpi-build.c                   |   6 +-
 hw/tpm/Kconfig                         |  12 +-
 hw/tpm/Makefile.objs                   |   4 +-
 hw/tpm/tpm_tis.h                       |  91 +++++++++++++
 hw/tpm/{tpm_tis.c => tpm_tis_common.c} | 181 ++-----------------------
 hw/tpm/tpm_tis_isa.c                   | 170 +++++++++++++++++++++++
 hw/tpm/tpm_tis_sysbus.c                | 159 ++++++++++++++++++++++
 include/sysemu/tpm.h                   |   7 +-
 tests/qtest/Makefile.include           |   4 +-
 14 files changed, 502 insertions(+), 180 deletions(-)
 create mode 100644 hw/tpm/tpm_tis.h
 rename hw/tpm/{tpm_tis.c => tpm_tis_common.c} (85%)
 create mode 100644 hw/tpm/tpm_tis_isa.c
 create mode 100644 hw/tpm/tpm_tis_sysbus.c

-- 
2.20.1


^ permalink raw reply

* Re: [PULL 0/6] Pull migration patches
From: Peter Maydell @ 2020-02-14 18:36 UTC (permalink / raw)
  To: Juan Quintela
  Cc: Laurent Vivier, Thomas Huth, Daniel P. Berrangé,
	QEMU Developers, Dr. David Alan Gilbert, Paolo Bonzini
In-Reply-To: <20200213132030.57757-1-quintela@redhat.com>

On Thu, 13 Feb 2020 at 13:21, Juan Quintela <quintela@redhat.com> wrote:
>
> The following changes since commit e18e5501d8ac692d32657a3e1ef545b14e72b730:
>
>   Merge remote-tracking branch 'remotes/dgilbert-gitlab/tags/pull-virtiofs-20200210' into staging (2020-02-10 18:09:14 +0000)
>
> are available in the Git repository at:
>
>   https://github.com/juanquintela/qemu.git tags/pull-migration-pull-request
>
> for you to fetch changes up to 1a920d2b633e13df8961328b3b3e128989a34570:
>
>   git: Make submodule check only needed modules (2020-02-13 11:31:58 +0100)
>
> ----------------------------------------------------------------
> Migration pull request
>
> - don't pause when migration has been cancelled (Zhimin)
> - fix memleaks in tests (pan)(
> - optimize wait-unplug (keqian)
> - improve rdma error handling/messages (dave)
> - add some flexibility in autoconverge test (dave)
> - git-submodule: allow compiliation from same tree with different
>   number of git-submodules (juan)
>


Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/5.0
for any user-visible changes.

-- PMM


^ permalink raw reply

* [RFC v2 2/6] tpm: Use TPMState as a common struct
From: Eric Auger @ 2020-02-14 18:37 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, stefanb, qemu-devel, qemu-arm,
	peter.maydell
  Cc: marcandre.lureau, lersek, ardb, philmd
In-Reply-To: <20200214183704.14389-1-eric.auger@redhat.com>

As we plan to introdce a SysBus TPM TIS device, let's
make the TPMState a common struct usable by both the
ISADevice and the SysBusDevice. TPMStateISA embeds the
struct and inherits from the ISADevice.

The prototype of functions bound to be used by both
the ISA and SysBus devices is changed to take TPMState
handle.

A bunch of structs also are renamed to be specialized
for the ISA device. Besides those transformations, no
functional change is expected.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
---
 hw/tpm/tpm_tis.c | 146 +++++++++++++++++++++++++++++------------------
 1 file changed, 91 insertions(+), 55 deletions(-)

diff --git a/hw/tpm/tpm_tis.c b/hw/tpm/tpm_tis.c
index c609737272..fc6d7ca579 100644
--- a/hw/tpm/tpm_tis.c
+++ b/hw/tpm/tpm_tis.c
@@ -65,7 +65,6 @@ typedef struct TPMLocality {
 } TPMLocality;
 
 typedef struct TPMState {
-    ISADevice busdev;
     MemoryRegion mmio;
 
     unsigned char buffer[TPM_TIS_BUFFER_MAX];
@@ -91,7 +90,15 @@ typedef struct TPMState {
     TPMPPI ppi;
 } TPMState;
 
-#define TPM(obj) OBJECT_CHECK(TPMState, (obj), TYPE_TPM_TIS_ISA)
+typedef struct TPMStateISA {
+    /*< private >*/
+    ISADevice parent_obj;
+
+    /*< public >*/
+    TPMState state; /* not a QOM object */
+} TPMStateISA;
+
+#define TPM_TIS_ISA(obj) OBJECT_CHECK(TPMStateISA, (obj), TYPE_TPM_TIS_ISA)
 
 #define DEBUG_TIS 0
 
@@ -281,9 +288,8 @@ static void tpm_tis_prep_abort(TPMState *s, uint8_t locty, uint8_t newlocty)
 /*
  * Callback from the TPM to indicate that the response was received.
  */
-static void tpm_tis_request_completed(TPMIf *ti, int ret)
+static void tpm_tis_request_completed(TPMState *s, int ret)
 {
-    TPMState *s = TPM(ti);
     uint8_t locty = s->cmd.locty;
     uint8_t l;
 
@@ -338,7 +344,7 @@ static uint32_t tpm_tis_data_read(TPMState *s, uint8_t locty)
 }
 
 #ifdef DEBUG_TIS
-static void tpm_tis_dump_state(void *opaque, hwaddr addr)
+static void tpm_tis_dump_state(TPMState *s, hwaddr addr)
 {
     static const unsigned regs[] = {
         TPM_TIS_REG_ACCESS,
@@ -353,7 +359,6 @@ static void tpm_tis_dump_state(void *opaque, hwaddr addr)
     int idx;
     uint8_t locty = tpm_tis_locality_from_addr(addr);
     hwaddr base = addr & ~0xfff;
-    TPMState *s = opaque;
 
     printf("tpm_tis: active locality      : %d\n"
            "tpm_tis: state of locality %d : %d\n"
@@ -363,7 +368,7 @@ static void tpm_tis_dump_state(void *opaque, hwaddr addr)
 
     for (idx = 0; regs[idx] != 0xfff; idx++) {
         printf("tpm_tis: 0x%04x : 0x%08x\n", regs[idx],
-               (int)tpm_tis_mmio_read(opaque, base + regs[idx], 4));
+               (int)tpm_tis_mmio_read(s, base + regs[idx], 4));
     }
 
     printf("tpm_tis: r/w offset    : %d\n"
@@ -488,7 +493,7 @@ static uint64_t tpm_tis_mmio_read(void *opaque, hwaddr addr,
         break;
 #ifdef DEBUG_TIS
     case TPM_TIS_REG_DEBUG:
-        tpm_tis_dump_state(opaque, addr);
+        tpm_tis_dump_state(s, addr);
         break;
 #endif
     }
@@ -835,10 +840,8 @@ static const MemoryRegionOps tpm_tis_memory_ops = {
 /*
  * Get the TPMVersion of the backend device being used
  */
-static enum TPMVersion tpm_tis_get_tpm_version(TPMIf *ti)
+static enum TPMVersion tpm_tis_get_tpm_version(TPMState *s)
 {
-    TPMState *s = TPM(ti);
-
     if (tpm_backend_had_startup_error(s->be_driver)) {
         return TPM_VERSION_UNSPEC;
     }
@@ -850,9 +853,8 @@ static enum TPMVersion tpm_tis_get_tpm_version(TPMIf *ti)
  * This function is called when the machine starts, resets or due to
  * S3 resume.
  */
-static void tpm_tis_reset(DeviceState *dev)
+static void tpm_tis_reset(TPMState *s)
 {
-    TPMState *s = TPM(dev);
     int c;
 
     s->be_tpm_version = tpm_backend_get_tpm_version(s->be_driver);
@@ -896,15 +898,14 @@ static void tpm_tis_reset(DeviceState *dev)
 
 /* persistent state handling */
 
-static int tpm_tis_pre_save(void *opaque)
+static int tpm_tis_pre_save(TPMState *s)
 {
-    TPMState *s = opaque;
     uint8_t locty = s->active_locty;
 
     trace_tpm_tis_pre_save(locty, s->rw_offset);
 
     if (DEBUG_TIS) {
-        tpm_tis_dump_state(opaque, 0);
+        tpm_tis_dump_state(s, 0);
     }
 
     /*
@@ -929,34 +930,78 @@ static const VMStateDescription vmstate_locty = {
     }
 };
 
-static const VMStateDescription vmstate_tpm_tis = {
+/* ISA */
+
+static int tpm_tis_pre_save_isa(void *opaque)
+{
+    TPMStateISA *isadev = opaque;
+
+    return tpm_tis_pre_save(&isadev->state);
+}
+
+static const VMStateDescription vmstate_tpm_tis_isa = {
     .name = "tpm-tis",
     .version_id = 0,
-    .pre_save  = tpm_tis_pre_save,
+    .pre_save  = tpm_tis_pre_save_isa,
     .fields = (VMStateField[]) {
-        VMSTATE_BUFFER(buffer, TPMState),
-        VMSTATE_UINT16(rw_offset, TPMState),
-        VMSTATE_UINT8(active_locty, TPMState),
-        VMSTATE_UINT8(aborting_locty, TPMState),
-        VMSTATE_UINT8(next_locty, TPMState),
+        VMSTATE_BUFFER(state.buffer, TPMStateISA),
+        VMSTATE_UINT16(state.rw_offset, TPMStateISA),
+        VMSTATE_UINT8(state.active_locty, TPMStateISA),
+        VMSTATE_UINT8(state.aborting_locty, TPMStateISA),
+        VMSTATE_UINT8(state.next_locty, TPMStateISA),
 
-        VMSTATE_STRUCT_ARRAY(loc, TPMState, TPM_TIS_NUM_LOCALITIES, 0,
+        VMSTATE_STRUCT_ARRAY(state.loc, TPMStateISA, TPM_TIS_NUM_LOCALITIES, 0,
                              vmstate_locty, TPMLocality),
 
         VMSTATE_END_OF_LIST()
     }
 };
 
-static Property tpm_tis_properties[] = {
-    DEFINE_PROP_UINT32("irq", TPMState, irq_num, TPM_TIS_IRQ),
-    DEFINE_PROP_TPMBE("tpmdev", TPMState, be_driver),
-    DEFINE_PROP_BOOL("ppi", TPMState, ppi_enabled, true),
+static void tpm_tis_isa_request_completed(TPMIf *ti, int ret)
+{
+    TPMStateISA *isadev = TPM_TIS_ISA(ti);
+    TPMState *s = &isadev->state;
+
+    tpm_tis_request_completed(s, ret);
+}
+
+static enum TPMVersion tpm_tis_isa_get_tpm_version(TPMIf *ti)
+{
+    TPMStateISA *isadev = TPM_TIS_ISA(ti);
+    TPMState *s = &isadev->state;
+
+    return tpm_tis_get_tpm_version(s);
+}
+
+static void tpm_tis_isa_reset(DeviceState *dev)
+{
+    TPMStateISA *isadev = TPM_TIS_ISA(dev);
+    TPMState *s = &isadev->state;
+
+    return tpm_tis_reset(s);
+}
+
+static Property tpm_tis_isa_properties[] = {
+    DEFINE_PROP_UINT32("irq", TPMStateISA, state.irq_num, TPM_TIS_IRQ),
+    DEFINE_PROP_TPMBE("tpmdev", TPMStateISA, state.be_driver),
+    DEFINE_PROP_BOOL("ppi", TPMStateISA, state.ppi_enabled, true),
     DEFINE_PROP_END_OF_LIST(),
 };
 
-static void tpm_tis_realizefn(DeviceState *dev, Error **errp)
+static void tpm_tis_isa_initfn(Object *obj)
 {
-    TPMState *s = TPM(dev);
+    TPMStateISA *isadev = TPM_TIS_ISA(obj);
+    TPMState *s = &isadev->state;
+
+    memory_region_init_io(&s->mmio, obj, &tpm_tis_memory_ops,
+                          s, "tpm-tis-mmio",
+                          TPM_TIS_NUM_LOCALITIES << TPM_TIS_LOCALITY_SHIFT);
+}
+
+static void tpm_tis_isa_realizefn(DeviceState *dev, Error **errp)
+{
+    TPMStateISA *isadev = TPM_TIS_ISA(dev);
+    TPMState *s = &isadev->state;
 
     if (!tpm_find()) {
         error_setg(errp, "at most one TPM device is permitted");
@@ -973,55 +1018,46 @@ static void tpm_tis_realizefn(DeviceState *dev, Error **errp)
         return;
     }
 
-    isa_init_irq(&s->busdev, &s->irq, s->irq_num);
+    isa_init_irq(ISA_DEVICE(dev), &s->irq, s->irq_num);
 
     memory_region_add_subregion(isa_address_space(ISA_DEVICE(dev)),
                                 TPM_TIS_ADDR_BASE, &s->mmio);
 
     if (s->ppi_enabled) {
         tpm_ppi_init(&s->ppi, isa_address_space(ISA_DEVICE(dev)),
-                     TPM_PPI_ADDR_BASE, OBJECT(s));
+                     TPM_PPI_ADDR_BASE, OBJECT(dev));
     }
 }
 
-static void tpm_tis_initfn(Object *obj)
-{
-    TPMState *s = TPM(obj);
-
-    memory_region_init_io(&s->mmio, OBJECT(s), &tpm_tis_memory_ops,
-                          s, "tpm-tis-mmio",
-                          TPM_TIS_NUM_LOCALITIES << TPM_TIS_LOCALITY_SHIFT);
-}
-
-static void tpm_tis_class_init(ObjectClass *klass, void *data)
+static void tpm_tis_isa_class_init(ObjectClass *klass, void *data)
 {
     DeviceClass *dc = DEVICE_CLASS(klass);
     TPMIfClass *tc = TPM_IF_CLASS(klass);
 
-    dc->realize = tpm_tis_realizefn;
-    device_class_set_props(dc, tpm_tis_properties);
-    dc->reset = tpm_tis_reset;
-    dc->vmsd  = &vmstate_tpm_tis;
+    device_class_set_props(dc, tpm_tis_isa_properties);
+    dc->vmsd  = &vmstate_tpm_tis_isa;
     tc->model = TPM_MODEL_TPM_TIS;
-    tc->get_version = tpm_tis_get_tpm_version;
-    tc->request_completed = tpm_tis_request_completed;
+    dc->realize = tpm_tis_isa_realizefn;
+    dc->reset = tpm_tis_isa_reset;
+    tc->request_completed = tpm_tis_isa_request_completed;
+    tc->get_version = tpm_tis_isa_get_tpm_version;
 }
 
-static const TypeInfo tpm_tis_info = {
+static const TypeInfo tpm_tis_isa_info = {
     .name = TYPE_TPM_TIS_ISA,
     .parent = TYPE_ISA_DEVICE,
-    .instance_size = sizeof(TPMState),
-    .instance_init = tpm_tis_initfn,
-    .class_init  = tpm_tis_class_init,
+    .instance_size = sizeof(TPMStateISA),
+    .instance_init = tpm_tis_isa_initfn,
+    .class_init  = tpm_tis_isa_class_init,
     .interfaces = (InterfaceInfo[]) {
         { TYPE_TPM_IF },
         { }
     }
 };
 
-static void tpm_tis_register(void)
+static void tpm_tis_isa_register(void)
 {
-    type_register_static(&tpm_tis_info);
+    type_register_static(&tpm_tis_isa_info);
 }
 
-type_init(tpm_tis_register)
+type_init(tpm_tis_isa_register)
-- 
2.20.1


^ permalink raw reply related

* [PATCH v2 02/28] proc: add /proc/<pid>/fsuid_map
From: Christian Brauner @ 2020-02-14 18:35 UTC (permalink / raw)
  To: Stéphane Graber, Eric W. Biederman, Aleksa Sarai, Jann Horn
  Cc: smbarber, Seth Forshee, Alexander Viro, Alexey Dobriyan,
	Serge Hallyn, James Morris, Kees Cook, Jonathan Corbet,
	Phil Estes, linux-kernel, linux-fsdevel, containers,
	linux-security-module, linux-api, Christian Brauner
In-Reply-To: <20200214183554.1133805-1-christian.brauner@ubuntu.com>

The /proc/<pid>/fsuid_map file can be written to once to setup an fsuid mapping
for a user namespace. Writing to this file has the same restrictions as writing
to /proc/<pid>/fsuid_map:

root@e1-vm:/# cat /proc/13023/fsuid_map
         0     300000     100000

Fsid mappings have always been around. They are currently always identical to
the id mappings for a user namespace. This means, currently whenever an fsid
needs to be looked up the kernel will use the id mapping of the user namespace.
With the introduction of fsid mappings the kernel will now lookup fsids in the
fsid mappings of the user namespace. If no fsid mapping exists the kernel will
continue looking up fsids in the id mappings of the user namespace. Hence, if a
system supports fsid mappings through /proc/<pid>/fs*id_map and a container
runtime is not aware of fsid mappings it or does not use them it will it will
continue to work just as before.

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
---
/* v2 */
unchanged
---
 fs/proc/base.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index c7c64272b0fa..5fb28004663e 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -2970,6 +2970,13 @@ static int proc_projid_map_open(struct inode *inode, struct file *file)
 	return proc_id_map_open(inode, file, &proc_projid_seq_operations);
 }
 
+#ifdef CONFIG_USER_NS_FSID
+static int proc_fsuid_map_open(struct inode *inode, struct file *file)
+{
+	return proc_id_map_open(inode, file, &proc_fsuid_seq_operations);
+}
+#endif
+
 static const struct file_operations proc_uid_map_operations = {
 	.open		= proc_uid_map_open,
 	.write		= proc_uid_map_write,
@@ -2994,6 +3001,16 @@ static const struct file_operations proc_projid_map_operations = {
 	.release	= proc_id_map_release,
 };
 
+#ifdef CONFIG_USER_NS_FSID
+static const struct file_operations proc_fsuid_map_operations = {
+	.open		= proc_fsuid_map_open,
+	.write		= proc_fsuid_map_write,
+	.read		= seq_read,
+	.llseek		= seq_lseek,
+	.release	= proc_id_map_release,
+};
+#endif
+
 static int proc_setgroups_open(struct inode *inode, struct file *file)
 {
 	struct user_namespace *ns = NULL;
@@ -3176,6 +3193,9 @@ static const struct pid_entry tgid_base_stuff[] = {
 	ONE("io",	S_IRUSR, proc_tgid_io_accounting),
 #endif
 #ifdef CONFIG_USER_NS
+#ifdef CONFIG_USER_NS_FSID
+	REG("fsuid_map",  S_IRUGO|S_IWUSR, proc_fsuid_map_operations),
+#endif
 	REG("uid_map",    S_IRUGO|S_IWUSR, proc_uid_map_operations),
 	REG("gid_map",    S_IRUGO|S_IWUSR, proc_gid_map_operations),
 	REG("projid_map", S_IRUGO|S_IWUSR, proc_projid_map_operations),
-- 
2.25.0


^ permalink raw reply related

* [PATCH v2 04/28] fsuidgid: add fsid mapping helpers
From: Christian Brauner @ 2020-02-14 18:35 UTC (permalink / raw)
  To: Stéphane Graber, Eric W. Biederman, Aleksa Sarai, Jann Horn
  Cc: smbarber, Seth Forshee, Alexander Viro, Alexey Dobriyan,
	Serge Hallyn, James Morris, Kees Cook, Jonathan Corbet,
	Phil Estes, linux-kernel, linux-fsdevel, containers,
	linux-security-module, linux-api, Christian Brauner
In-Reply-To: <20200214183554.1133805-1-christian.brauner@ubuntu.com>

This adds a set of helpers to translate between kfsuid/kfsgid and their
userspace fsuid/fsgid counter parts relative to a given user namespace.

- kuid_t make_kfsuid(struct user_namespace *from, uid_t fsuid)
  Maps a user-namespace fsuid pair into a kfsuid.
  If no fsuid mappings have been written it behaves identical to calling
  make_kuid(). This ensures backwards compatibility for workloads unaware
  or not in need of fsid mappings.

- kgid_t make_kfsgid(struct user_namespace *from, gid_t fsgid)
  Maps a user-namespace fsgid pair into a kfsgid.
  If no fsgid mappings have been written it behaves identical to calling
  make_kgid(). This ensures backwards compatibility for workloads unaware
  or not in need of fsid mappings.

- uid_t from_kfsuid(struct user_namespace *to, kuid_t fsuid)
  Creates a fsuid from a kfsuid user-namespace pair if possible.
  If no fsuid mappings have been written it behaves identical to calling
  from_kuid(). This ensures backwards compatibility for workloads unaware
  or not in need of fsid mappings.

- gid_t from_kfsgid(struct user_namespace *to, kgid_t fsgid)
  Creates a fsgid from a kfsgid user-namespace pair if possible.
  If no fsgid mappings have been written it behaves identical to calling
  make_kgid(). This ensures backwards compatibility for workloads unaware
  or not in need of fsid mappings.

- uid_t from_kfsuid_munged(struct user_namespace *to, kuid_t fsuid)
  Always creates a fsuid from a kfsuid user-namespace pair.
  If no fsuid mappings have been written it behaves identical to calling
  from_kuid(). This ensures backwards compatibility for workloads unaware
  or not in need of fsid mappings.

- gid_t from_kfsgid_munged(struct user_namespace *to, kgid_t fsgid)
  Always creates a fsgid from a kfsgid user-namespace pair if possible.
  If no fsgid mappings have been written it behaves identical to calling
  make_kgid(). This ensures backwards compatibility for workloads unaware
  or not in need of fsid mappings.

- bool kfsuid_has_mapping(struct user_namespace *ns, kuid_t uid)
  Check whether this kfsuid has a mapping in the provided user namespace.
  If no fsuid mappings have been written it behaves identical to calling
  from_kuid(). This ensures backwards compatibility for workloads unaware
  or not in need of fsid mappings.

- bool kfsgid_has_mapping(struct user_namespace *ns, kgid_t gid)
  Check whether this kfsgid has a mapping in the provided user namespace.
  If no fsgid mappings have been written it behaves identical to calling
  make_kgid(). This ensures backwards compatibility for workloads unaware
  or not in need of fsid mappings.

- kuid_t kfsuid_to_kuid(struct user_namespace *to, kuid_t kfsuid)
  Translate from a kfsuid into a kuid.

- kgid_t kfsgid_to_kgid(struct user_namespace *to, kgid_t kfsgid)
  Translate from a kfsgid into a kgid.

- kuid_t kuid_to_kfsuid(struct user_namespace *to, kuid_t kuid)
  Translate from a kuid into a kfsuid.

- kgid_t kgid_to_kfsuid(struct user_namespace *to, kgid_t kgid)
  Translate from a kgid into a kfsgid.

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
---
/* v2 */
- Christian Brauner <christian.brauner@ubuntu.com>:
  - add kfsuid_to_kuid(), kfsgid_to_kgid(), kuid_to_kfsuid(), kgid_to_kfsgid()
---
 include/linux/fsuidgid.h | 122 +++++++++++++++++++++++++
 kernel/user_namespace.c  | 189 ++++++++++++++++++++++++++++++++++++---
 2 files changed, 298 insertions(+), 13 deletions(-)
 create mode 100644 include/linux/fsuidgid.h

diff --git a/include/linux/fsuidgid.h b/include/linux/fsuidgid.h
new file mode 100644
index 000000000000..46763591f4e6
--- /dev/null
+++ b/include/linux/fsuidgid.h
@@ -0,0 +1,122 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_FSUIDGID_H
+#define _LINUX_FSUIDGID_H
+
+#include <linux/uidgid.h>
+
+#ifdef CONFIG_USER_NS_FSID
+
+extern kuid_t make_kfsuid(struct user_namespace *from, uid_t fsuid);
+extern kgid_t make_kfsgid(struct user_namespace *from, gid_t fsgid);
+extern uid_t from_kfsuid(struct user_namespace *to, kuid_t kfsuid);
+extern gid_t from_kfsgid(struct user_namespace *to, kgid_t kfsgid);
+extern uid_t from_kfsuid_munged(struct user_namespace *to, kuid_t kfsuid);
+extern gid_t from_kfsgid_munged(struct user_namespace *to, kgid_t kfsgid);
+
+static inline bool kfsuid_has_mapping(struct user_namespace *ns, kuid_t kfsuid)
+{
+	return from_kfsuid(ns, kfsuid) != (uid_t) -1;
+}
+
+static inline bool kfsgid_has_mapping(struct user_namespace *ns, kgid_t kfsgid)
+{
+	return from_kfsgid(ns, kfsgid) != (gid_t) -1;
+}
+
+static inline kuid_t kfsuid_to_kuid(struct user_namespace *to, kuid_t kfsuid)
+{
+	uid_t fsuid = from_kfsuid(to, kfsuid);
+	if (fsuid == (uid_t) -1)
+		return INVALID_UID;
+	return make_kuid(to, fsuid);
+}
+
+static inline kgid_t kfsgid_to_kgid(struct user_namespace *to, kgid_t kfsgid)
+{
+	gid_t fsgid = from_kfsgid(to, kfsgid);
+	if (fsgid == (gid_t) -1)
+		return INVALID_GID;
+	return make_kgid(to, fsgid);
+}
+
+static inline kuid_t kuid_to_kfsuid(struct user_namespace *to, kuid_t kuid)
+{
+	uid_t uid = from_kuid(to, kuid);
+	if (uid == (uid_t) -1)
+		return INVALID_UID;
+	return make_kfsuid(to, uid);
+}
+
+static inline kgid_t kgid_to_kfsgid(struct user_namespace *to, kgid_t kgid)
+{
+	gid_t gid = from_kgid(to, kgid);
+	if (gid == (gid_t) -1)
+		return INVALID_GID;
+	return make_kfsgid(to, gid);
+}
+
+#else
+
+static inline kuid_t make_kfsuid(struct user_namespace *from, uid_t fsuid)
+{
+	return make_kuid(from, fsuid);
+}
+
+static inline kgid_t make_kfsgid(struct user_namespace *from, gid_t fsgid)
+{
+	return make_kgid(from, fsgid);
+}
+
+static inline uid_t from_kfsuid(struct user_namespace *to, kuid_t kfsuid)
+{
+	return from_kuid(to, kfsuid);
+}
+
+static inline gid_t from_kfsgid(struct user_namespace *to, kgid_t kfsgid)
+{
+	return from_kgid(to, kfsgid);
+}
+
+static inline uid_t from_kfsuid_munged(struct user_namespace *to, kuid_t kfsuid)
+{
+	return from_kuid_munged(to, kfsuid);
+}
+
+static inline gid_t from_kfsgid_munged(struct user_namespace *to, kgid_t kfsgid)
+{
+	return from_kgid_munged(to, kfsgid);
+}
+
+static inline bool kfsuid_has_mapping(struct user_namespace *ns, kuid_t kfsuid)
+{
+	return kuid_has_mapping(ns, kfsuid);
+}
+
+static inline bool kfsgid_has_mapping(struct user_namespace *ns, kgid_t kfsgid)
+{
+	return kgid_has_mapping(ns, kfsgid);
+}
+
+static inline kuid_t kfsuid_to_kuid(struct user_namespace *to, kuid_t kfsuid)
+{
+	return kfsuid;
+}
+
+static inline kgid_t kfsgid_to_kgid(struct user_namespace *to, kgid_t kfsgid)
+{
+	return kfsgid;
+}
+
+static inline kuid_t kuid_to_kfsuid(struct user_namespace *to, kuid_t kuid)
+{
+	return kuid;
+}
+
+static inline kgid_t kgid_to_kfsgid(struct user_namespace *to, kgid_t kgid)
+{
+	return kgid;
+}
+
+#endif /* CONFIG_USER_NS_FSID */
+
+#endif /* _LINUX_FSUIDGID_H */
diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
index cbdf456f95f0..398be02de5c3 100644
--- a/kernel/user_namespace.c
+++ b/kernel/user_namespace.c
@@ -20,13 +20,14 @@
 #include <linux/fs_struct.h>
 #include <linux/bsearch.h>
 #include <linux/sort.h>
+#include <linux/fsuidgid.h>
 
 static struct kmem_cache *user_ns_cachep __read_mostly;
 static DEFINE_MUTEX(userns_state_mutex);
 
 static bool new_idmap_permitted(const struct file *file,
 				struct user_namespace *ns, int cap_setid,
-				struct uid_gid_map *map);
+				struct uid_gid_map *map, bool map_fsid);
 static void free_user_ns(struct work_struct *work);
 
 static struct ucounts *inc_user_namespaces(struct user_namespace *ns, kuid_t uid)
@@ -583,6 +584,166 @@ projid_t from_kprojid_munged(struct user_namespace *targ, kprojid_t kprojid)
 }
 EXPORT_SYMBOL(from_kprojid_munged);
 
+#ifdef CONFIG_USER_NS_FSID
+/**
+ *	make_kfsuid - Map a user-namespace fsuid pair into a kuid.
+ *	@ns:  User namespace that the fsuid is in
+ *	@fsuid: User identifier
+ *
+ *	Maps a user-namespace fsuid pair into a kernel internal kfsuid,
+ *	and returns that kfsuid.
+ *
+ *	When there is no mapping defined for the user-namespace kfsuid
+ *	pair INVALID_UID is returned.  Callers are expected to test
+ *	for and handle INVALID_UID being returned.  INVALID_UID
+ *	may be tested for using uid_valid().
+ */
+kuid_t make_kfsuid(struct user_namespace *ns, uid_t fsuid)
+{
+	unsigned extents = ns->fsuid_map.nr_extents;
+	smp_rmb();
+
+	/* Map the fsuid to a global kernel fsuid */
+	if (extents == 0)
+		return KUIDT_INIT(map_id_down(&ns->uid_map, fsuid));
+
+	return KUIDT_INIT(map_id_down(&ns->fsuid_map, fsuid));
+}
+EXPORT_SYMBOL(make_kfsuid);
+
+/**
+ *	from_kfsuid - Create a fsuid from a kfsuid user-namespace pair.
+ *	@targ: The user namespace we want a fsuid in.
+ *	@kfsuid: The kernel internal fsuid to start with.
+ *
+ *	Map @kfsuid into the user-namespace specified by @targ and
+ *	return the resulting fsuid.
+ *
+ *	There is always a mapping into the initial user_namespace.
+ *
+ *	If @kfsuid has no mapping in @targ (uid_t)-1 is returned.
+ */
+uid_t from_kfsuid(struct user_namespace *targ, kuid_t kfsuid)
+{
+	unsigned extents = targ->fsuid_map.nr_extents;
+	smp_rmb();
+
+	/* Map the fsuid from a global kernel fsuid */
+	if (extents == 0)
+		return map_id_up(&targ->uid_map, __kuid_val(kfsuid));
+
+	return map_id_up(&targ->fsuid_map, __kuid_val(kfsuid));
+}
+EXPORT_SYMBOL(from_kfsuid);
+
+/**
+ *	from_kfsuid_munged - Create a fsuid from a kfsuid user-namespace pair.
+ *	@targ: The user namespace we want a fsuid in.
+ *	@kfsuid: The kernel internal fsuid to start with.
+ *
+ *	Map @kfsuid into the user-namespace specified by @targ and
+ *	return the resulting fsuid.
+ *
+ *	There is always a mapping into the initial user_namespace.
+ *
+ *	Unlike from_kfsuid from_kfsuid_munged never fails and always
+ *	returns a valid fsuid.  This makes from_kfsuid_munged appropriate
+ *	for use in syscalls like stat and getuid where failing the
+ *	system call and failing to provide a valid fsuid are not an
+ *	options.
+ *
+ *	If @kfsuid has no mapping in @targ overflowuid is returned.
+ */
+uid_t from_kfsuid_munged(struct user_namespace *targ, kuid_t kfsuid)
+{
+	uid_t fsuid;
+	fsuid = from_kfsuid(targ, kfsuid);
+
+	if (fsuid == (uid_t) -1)
+		fsuid = overflowuid;
+	return fsuid;
+}
+EXPORT_SYMBOL(from_kfsuid_munged);
+
+/**
+ *	make_kfsgid - Map a user-namespace fsgid pair into a kfsgid.
+ *	@ns:  User namespace that the fsgid is in
+ *	@fsgid: User identifier
+ *
+ *	Maps a user-namespace fsgid pair into a kernel internal kfsgid,
+ *	and returns that kfsgid.
+ *
+ *	When there is no mapping defined for the user-namespace fsgid
+ *	pair INVALID_GID is returned.  Callers are expected to test
+ *	for and handle INVALID_GID being returned.  INVALID_GID
+ *	may be tested for using gid_valid().
+ */
+kgid_t make_kfsgid(struct user_namespace *ns, gid_t fsgid)
+{
+	unsigned extents = ns->fsgid_map.nr_extents;
+	smp_rmb();
+
+	/* Map the fsgid to a global kernel fsgid */
+	if (extents == 0)
+		return KGIDT_INIT(map_id_down(&ns->gid_map, fsgid));
+
+	return KGIDT_INIT(map_id_down(&ns->fsgid_map, fsgid));
+}
+EXPORT_SYMBOL(make_kfsgid);
+
+/**
+ *	from_kfsgid - Create a fsgid from a kfsgid user-namespace pair.
+ *	@targ: The user namespace we want a fsgid in.
+ *	@kfsgid: The kernel internal fsgid to start with.
+ *
+ *	Map @kfsgid into the user-namespace specified by @targ and
+ *	return the resulting fsgid.
+ *
+ *	There is always a mapping into the initial user_namespace.
+ *
+ *	If @kfsgid has no mapping in @targ (gid_t)-1 is returned.
+ */
+gid_t from_kfsgid(struct user_namespace *targ, kgid_t kfsgid)
+{
+	unsigned extents = targ->fsgid_map.nr_extents;
+	smp_rmb();
+
+	/* Map the fsgid from a global kernel fsgid */
+	if (extents == 0)
+		return map_id_up(&targ->gid_map, __kgid_val(kfsgid));
+
+	return map_id_up(&targ->fsgid_map, __kgid_val(kfsgid));
+}
+EXPORT_SYMBOL(from_kfsgid);
+
+/**
+ *	from_kfsgid_munged - Create a fsgid from a kfsgid user-namespace pair.
+ *	@targ: The user namespace we want a fsgid in.
+ *	@kfsgid: The kernel internal fsgid to start with.
+ *
+ *	Map @kfsgid into the user-namespace specified by @targ and
+ *	return the resulting fsgid.
+ *
+ *	There is always a mapping into the initial user_namespace.
+ *
+ *	Unlike from_kfsgid from_kfsgid_munged never fails and always
+ *	returns a valid fsgid.  This makes from_kfsgid_munged appropriate
+ *	for use in syscalls like stat and getgid where failing the
+ *	system call and failing to provide a valid fsgid are not options.
+ *
+ *	If @kfsgid has no mapping in @targ overflowgid is returned.
+ */
+gid_t from_kfsgid_munged(struct user_namespace *targ, kgid_t kfsgid)
+{
+	gid_t fsgid;
+	fsgid = from_kfsgid(targ, kfsgid);
+
+	if (fsgid == (gid_t) -1)
+		fsgid = overflowgid;
+	return fsgid;
+}
+EXPORT_SYMBOL(from_kfsgid_munged);
+#endif /* CONFIG_USER_NS_FSID */
 
 static int uid_m_show(struct seq_file *seq, void *v)
 {
@@ -659,7 +820,7 @@ static int fsuid_m_show(struct seq_file *seq, void *v)
 	if ((lower_ns == ns) && lower_ns->parent)
 		lower_ns = lower_ns->parent;
 
-	lower = from_kuid(lower_ns, KUIDT_INIT(extent->lower_first));
+	lower = from_kfsuid(lower_ns, KUIDT_INIT(extent->lower_first));
 
 	seq_printf(seq, "%10u %10u %10u\n",
 		extent->first,
@@ -680,7 +841,7 @@ static int fsgid_m_show(struct seq_file *seq, void *v)
 	if ((lower_ns == ns) && lower_ns->parent)
 		lower_ns = lower_ns->parent;
 
-	lower = from_kgid(lower_ns, KGIDT_INIT(extent->lower_first));
+	lower = from_kfsgid(lower_ns, KGIDT_INIT(extent->lower_first));
 
 	seq_printf(seq, "%10u %10u %10u\n",
 		extent->first,
@@ -931,7 +1092,7 @@ static ssize_t map_write(struct file *file, const char __user *buf,
 			 size_t count, loff_t *ppos,
 			 int cap_setid,
 			 struct uid_gid_map *map,
-			 struct uid_gid_map *parent_map)
+			 struct uid_gid_map *parent_map, bool map_fsid)
 {
 	struct seq_file *seq = file->private_data;
 	struct user_namespace *ns = seq->private;
@@ -1051,7 +1212,7 @@ static ssize_t map_write(struct file *file, const char __user *buf,
 
 	ret = -EPERM;
 	/* Validate the user is allowed to use user id's mapped to. */
-	if (!new_idmap_permitted(file, ns, cap_setid, &new_map))
+	if (!new_idmap_permitted(file, ns, cap_setid, &new_map, map_fsid))
 		goto out;
 
 	ret = -EPERM;
@@ -1129,7 +1290,7 @@ ssize_t proc_uid_map_write(struct file *file, const char __user *buf,
 		return -EPERM;
 
 	return map_write(file, buf, size, ppos, CAP_SETUID,
-			 &ns->uid_map, &ns->parent->uid_map);
+			 &ns->uid_map, &ns->parent->uid_map, false);
 }
 
 ssize_t proc_gid_map_write(struct file *file, const char __user *buf,
@@ -1146,7 +1307,7 @@ ssize_t proc_gid_map_write(struct file *file, const char __user *buf,
 		return -EPERM;
 
 	return map_write(file, buf, size, ppos, CAP_SETGID,
-			 &ns->gid_map, &ns->parent->gid_map);
+			 &ns->gid_map, &ns->parent->gid_map, false);
 }
 
 ssize_t proc_projid_map_write(struct file *file, const char __user *buf,
@@ -1164,7 +1325,7 @@ ssize_t proc_projid_map_write(struct file *file, const char __user *buf,
 
 	/* Anyone can set any valid project id no capability needed */
 	return map_write(file, buf, size, ppos, -1,
-			 &ns->projid_map, &ns->parent->projid_map);
+			 &ns->projid_map, &ns->parent->projid_map, false);
 }
 
 #ifdef CONFIG_USER_NS_FSID
@@ -1182,7 +1343,7 @@ ssize_t proc_fsuid_map_write(struct file *file, const char __user *buf,
 		return -EPERM;
 
 	return map_write(file, buf, size, ppos, CAP_SETUID, &ns->fsuid_map,
-			 &ns->parent->fsuid_map);
+			 &ns->parent->fsuid_map, true);
 }
 
 ssize_t proc_fsgid_map_write(struct file *file, const char __user *buf,
@@ -1199,13 +1360,13 @@ ssize_t proc_fsgid_map_write(struct file *file, const char __user *buf,
 		return -EPERM;
 
 	return map_write(file, buf, size, ppos, CAP_SETGID, &ns->fsgid_map,
-			 &ns->parent->fsgid_map);
+			 &ns->parent->fsgid_map, true);
 }
 #endif
 
 static bool new_idmap_permitted(const struct file *file,
 				struct user_namespace *ns, int cap_setid,
-				struct uid_gid_map *new_map)
+				struct uid_gid_map *new_map, bool map_fsid)
 {
 	const struct cred *cred = file->f_cred;
 	/* Don't allow mappings that would allow anything that wouldn't
@@ -1215,11 +1376,13 @@ static bool new_idmap_permitted(const struct file *file,
 	    uid_eq(ns->owner, cred->euid)) {
 		u32 id = new_map->extent[0].lower_first;
 		if (cap_setid == CAP_SETUID) {
-			kuid_t uid = make_kuid(ns->parent, id);
+			kuid_t uid = map_fsid ? make_kfsuid(ns->parent, id) :
+						make_kuid(ns->parent, id);
 			if (uid_eq(uid, cred->euid))
 				return true;
 		} else if (cap_setid == CAP_SETGID) {
-			kgid_t gid = make_kgid(ns->parent, id);
+			kgid_t gid = map_fsid ? make_kfsgid(ns->parent, id) :
+						make_kgid(ns->parent, id);
 			if (!(ns->flags & USERNS_SETGROUPS_ALLOWED) &&
 			    gid_eq(gid, cred->egid))
 				return true;
-- 
2.25.0


^ permalink raw reply related

* [PATCH v2 03/28] proc: add /proc/<pid>/fsgid_map
From: Christian Brauner @ 2020-02-14 18:35 UTC (permalink / raw)
  To: Stéphane Graber, Eric W. Biederman, Aleksa Sarai, Jann Horn
  Cc: smbarber, Seth Forshee, Alexander Viro, Alexey Dobriyan,
	Serge Hallyn, James Morris, Kees Cook, Jonathan Corbet,
	Phil Estes, linux-kernel, linux-fsdevel, containers,
	linux-security-module, linux-api, Christian Brauner
In-Reply-To: <20200214183554.1133805-1-christian.brauner@ubuntu.com>

The /proc/<pid>/fsgid_map file can be written to once to setup an fsgid mapping
for a user namespace. Writing to this file has the same restrictions as writing
to /proc/<pid>/fsgid_map.

root@e1-vm:/# cat /proc/13023/fsgid_map
         0     300000     100000

Fsid mappings have always been around. They are currently always identical to
the id mappings for a user namespace. This means, currently whenever an fsid
needs to be looked up the kernel will use the id mapping of the user namespace.
With the introduction of fsid mappings the kernel will now lookup fsids in the
fsid mappings of the user namespace. If no fsid mapping exists the kernel will
continue looking up fsids in the id mappings of the user namespace. Hence, if a
system supports fsid mappings through /proc/<pid>/fs*id_map and a container
runtime is not aware of fsid mappings it or does not use them it will it will
continue to work just as before.

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
---
/* v2 */
unchanged
---
 fs/proc/base.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index 5fb28004663e..1303cdd2e617 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -2975,6 +2975,11 @@ static int proc_fsuid_map_open(struct inode *inode, struct file *file)
 {
 	return proc_id_map_open(inode, file, &proc_fsuid_seq_operations);
 }
+
+static int proc_fsgid_map_open(struct inode *inode, struct file *file)
+{
+	return proc_id_map_open(inode, file, &proc_fsgid_seq_operations);
+}
 #endif
 
 static const struct file_operations proc_uid_map_operations = {
@@ -3009,6 +3014,14 @@ static const struct file_operations proc_fsuid_map_operations = {
 	.llseek		= seq_lseek,
 	.release	= proc_id_map_release,
 };
+
+static const struct file_operations proc_fsgid_map_operations = {
+	.open		= proc_fsgid_map_open,
+	.write		= proc_fsgid_map_write,
+	.read		= seq_read,
+	.llseek		= seq_lseek,
+	.release	= proc_id_map_release,
+};
 #endif
 
 static int proc_setgroups_open(struct inode *inode, struct file *file)
@@ -3195,6 +3208,7 @@ static const struct pid_entry tgid_base_stuff[] = {
 #ifdef CONFIG_USER_NS
 #ifdef CONFIG_USER_NS_FSID
 	REG("fsuid_map",  S_IRUGO|S_IWUSR, proc_fsuid_map_operations),
+	REG("fsgid_map",  S_IRUGO|S_IWUSR, proc_fsgid_map_operations),
 #endif
 	REG("uid_map",    S_IRUGO|S_IWUSR, proc_uid_map_operations),
 	REG("gid_map",    S_IRUGO|S_IWUSR, proc_gid_map_operations),
-- 
2.25.0


^ permalink raw reply related

* [PATCH v2 05/28] proc: task_state(): use from_kfs{g,u}id_munged
From: Christian Brauner @ 2020-02-14 18:35 UTC (permalink / raw)
  To: Stéphane Graber, Eric W. Biederman, Aleksa Sarai, Jann Horn
  Cc: smbarber, Seth Forshee, Alexander Viro, Alexey Dobriyan,
	Serge Hallyn, James Morris, Kees Cook, Jonathan Corbet,
	Phil Estes, linux-kernel, linux-fsdevel, containers,
	linux-security-module, linux-api, Christian Brauner
In-Reply-To: <20200214183554.1133805-1-christian.brauner@ubuntu.com>

If fsid mappings have been written, this will cause proc to look at fsid
mappings for the user namespace. If no fsid mappings have been written the
behavior is as before.

Here is part of the output from /proc/<pid>/status from the initial user
namespace for systemd running in an unprivileged container as user namespace
root with id mapping 0 100000 100000 and fsid mapping 0 300000 100000:

Name:	systemd
Umask:	0000
State:	S (sleeping)
Tgid:	13023
Ngid:	0
Pid:	13023
PPid:	13008
TracerPid:	0
Uid:	100000	100000	100000	300000
Gid:	100000	100000	100000	300000
FDSize:	64
Groups:

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
---
/* v2 */
unchanged
---
 fs/proc/array.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/proc/array.c b/fs/proc/array.c
index 5efaf3708ec6..d4a04f85a67e 100644
--- a/fs/proc/array.c
+++ b/fs/proc/array.c
@@ -91,6 +91,7 @@
 #include <linux/string_helpers.h>
 #include <linux/user_namespace.h>
 #include <linux/fs_struct.h>
+#include <linux/fsuidgid.h>
 
 #include <asm/pgtable.h>
 #include <asm/processor.h>
@@ -193,11 +194,11 @@ static inline void task_state(struct seq_file *m, struct pid_namespace *ns,
 	seq_put_decimal_ull(m, "\nUid:\t", from_kuid_munged(user_ns, cred->uid));
 	seq_put_decimal_ull(m, "\t", from_kuid_munged(user_ns, cred->euid));
 	seq_put_decimal_ull(m, "\t", from_kuid_munged(user_ns, cred->suid));
-	seq_put_decimal_ull(m, "\t", from_kuid_munged(user_ns, cred->fsuid));
+	seq_put_decimal_ull(m, "\t", from_kfsuid_munged(user_ns, cred->fsuid));
 	seq_put_decimal_ull(m, "\nGid:\t", from_kgid_munged(user_ns, cred->gid));
 	seq_put_decimal_ull(m, "\t", from_kgid_munged(user_ns, cred->egid));
 	seq_put_decimal_ull(m, "\t", from_kgid_munged(user_ns, cred->sgid));
-	seq_put_decimal_ull(m, "\t", from_kgid_munged(user_ns, cred->fsgid));
+	seq_put_decimal_ull(m, "\t", from_kfsgid_munged(user_ns, cred->fsgid));
 	seq_put_decimal_ull(m, "\nFDSize:\t", max_fds);
 
 	seq_puts(m, "\nGroups:\t");
-- 
2.25.0


^ permalink raw reply related

* Re: [Intel-gfx] [PATCH] drm/i915/selftests: Fix selftest_mocs for DGFX
From: Chris Wilson @ 2020-02-14 18:37 UTC (permalink / raw)
  To: Brian Welty, Daniele Ceraolo Spurio, intel-gfx
In-Reply-To: <158170497173.15393.11944816323451861470@skylake-alporthouse-com>

Quoting Chris Wilson (2020-02-14 18:29:31)
> +static bool has_l3cc(struct drm_i915_private *i915)
> +{
> +       return true;
> +}
> +
> +static bool has_mocs(struct drm_i915_private *i915)
> +{
> +       return !IS_DGFX(i915);
> +}
> +
>  static int live_mocs_init(struct live_mocs *arg, struct intel_gt *gt)
>  {
> +       struct drm_i915_mocs_table table;
>         int err;
> 
> -       if (!get_mocs_settings(gt->i915, &arg->table))
> +       memset(arg, 0, sizeof(*arg));
> +
> +       if (!get_mocs_settings(gt->i915, &table))
>                 return -EINVAL;

On top of that, if we did something like return a mask from
get_mocs_settings() for HAS_L3CC | HAS_MOCS (or put it into table) we
can share the init with the main code and reduce potential drift.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply

* [RFC v2 3/6] tpm: Separate tpm_tis common functions from isa code
From: Eric Auger @ 2020-02-14 18:37 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, stefanb, qemu-devel, qemu-arm,
	peter.maydell
  Cc: marcandre.lureau, lersek, ardb, philmd
In-Reply-To: <20200214183704.14389-1-eric.auger@redhat.com>

Move the device agnostic code into tpm_tis_common.c and
put the ISA device specific code into tpm_tis_isa.c

Signed-off-by: Eric Auger <eric.auger@redhat.com>
---
 hw/tpm/Makefile.objs                   |   2 +-
 hw/tpm/tpm_tis.h                       |  91 +++++++++++
 hw/tpm/{tpm_tis.c => tpm_tis_common.c} | 209 ++-----------------------
 hw/tpm/tpm_tis_isa.c                   | 170 ++++++++++++++++++++
 4 files changed, 271 insertions(+), 201 deletions(-)
 create mode 100644 hw/tpm/tpm_tis.h
 rename hw/tpm/{tpm_tis.c => tpm_tis_common.c} (83%)
 create mode 100644 hw/tpm/tpm_tis_isa.c

diff --git a/hw/tpm/Makefile.objs b/hw/tpm/Makefile.objs
index 85eb99ae05..fcc4c2f27c 100644
--- a/hw/tpm/Makefile.objs
+++ b/hw/tpm/Makefile.objs
@@ -1,6 +1,6 @@
 common-obj-$(CONFIG_TPM) += tpm_util.o
 obj-$(call lor,$(CONFIG_TPM_TIS),$(CONFIG_TPM_CRB)) += tpm_ppi.o
-common-obj-$(CONFIG_TPM_TIS) += tpm_tis.o
+common-obj-$(CONFIG_TPM_TIS) += tpm_tis_isa.o tpm_tis_common.o
 common-obj-$(CONFIG_TPM_CRB) += tpm_crb.o
 common-obj-$(CONFIG_TPM_PASSTHROUGH) += tpm_passthrough.o
 common-obj-$(CONFIG_TPM_EMULATOR) += tpm_emulator.o
diff --git a/hw/tpm/tpm_tis.h b/hw/tpm/tpm_tis.h
new file mode 100644
index 0000000000..5554989395
--- /dev/null
+++ b/hw/tpm/tpm_tis.h
@@ -0,0 +1,91 @@
+/*
+ * tpm_tis.h - QEMU's TPM TIS common header
+ *
+ * Copyright (C) 2006,2010-2013 IBM Corporation
+ *
+ * Authors:
+ *  Stefan Berger <stefanb@us.ibm.com>
+ *  David Safford <safford@us.ibm.com>
+ *
+ * Xen 4 support: Andrease Niederl <andreas.niederl@iaik.tugraz.at>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ * Implementation of the TIS interface according to specs found at
+ * http://www.trustedcomputinggroup.org. This implementation currently
+ * supports version 1.3, 21 March 2013
+ * In the developers menu choose the PC Client section then find the TIS
+ * specification.
+ *
+ * TPM TIS for TPM 2 implementation following TCG PC Client Platform
+ * TPM Profile (PTP) Specification, Familiy 2.0, Revision 00.43
+ */
+#ifndef TPM_TPM_TIS_H
+#define TPM_TPM_TIS_H
+
+#include "qemu/osdep.h"
+#include "sysemu/tpm_backend.h"
+#include "tpm_ppi.h"
+
+#define TPM_TIS_NUM_LOCALITIES      5     /* per spec */
+#define TPM_TIS_LOCALITY_SHIFT      12
+#define TPM_TIS_NO_LOCALITY         0xff
+
+#define TPM_TIS_IS_VALID_LOCTY(x)   ((x) < TPM_TIS_NUM_LOCALITIES)
+
+#define TPM_TIS_BUFFER_MAX          4096
+
+typedef enum {
+    TPM_TIS_STATE_IDLE = 0,
+    TPM_TIS_STATE_READY,
+    TPM_TIS_STATE_COMPLETION,
+    TPM_TIS_STATE_EXECUTION,
+    TPM_TIS_STATE_RECEPTION,
+} TPMTISState;
+
+/* locality data  -- all fields are persisted */
+typedef struct TPMLocality {
+    TPMTISState state;
+    uint8_t access;
+    uint32_t sts;
+    uint32_t iface_id;
+    uint32_t inte;
+    uint32_t ints;
+} TPMLocality;
+
+typedef struct TPMState {
+    MemoryRegion mmio;
+
+    unsigned char buffer[TPM_TIS_BUFFER_MAX];
+    uint16_t rw_offset;
+
+    uint8_t active_locty;
+    uint8_t aborting_locty;
+    uint8_t next_locty;
+
+    TPMLocality loc[TPM_TIS_NUM_LOCALITIES];
+
+    qemu_irq irq;
+    uint32_t irq_num;
+
+    TPMBackendCmd cmd;
+
+    TPMBackend *be_driver;
+    TPMVersion be_tpm_version;
+
+    size_t be_buffer_size;
+
+    bool ppi_enabled;
+    TPMPPI ppi;
+} TPMState;
+
+extern const VMStateDescription vmstate_locty;
+extern const MemoryRegionOps tpm_tis_memory_ops;
+
+int tpm_tis_pre_save(TPMState *s);
+void tpm_tis_reset(TPMState *s);
+enum TPMVersion tpm_tis_get_tpm_version(TPMState *s);
+void tpm_tis_request_completed(TPMState *s, int ret);
+
+#endif /* TPM_TPM_TIS_H */
diff --git a/hw/tpm/tpm_tis.c b/hw/tpm/tpm_tis_common.c
similarity index 83%
rename from hw/tpm/tpm_tis.c
rename to hw/tpm/tpm_tis_common.c
index fc6d7ca579..9ce64d4836 100644
--- a/hw/tpm/tpm_tis.c
+++ b/hw/tpm/tpm_tis_common.c
@@ -1,5 +1,6 @@
 /*
- * tpm_tis.c - QEMU's TPM TIS interface emulator
+ * tpm_tis_common.c - QEMU's TPM TIS interface emulator
+ * device agnostic functions
  *
  * Copyright (C) 2006,2010-2013 IBM Corporation
  *
@@ -21,7 +22,6 @@
  * TPM TIS for TPM 2 implementation following TCG PC Client Platform
  * TPM Profile (PTP) Specification, Familiy 2.0, Revision 00.43
  */
-
 #include "qemu/osdep.h"
 #include "hw/irq.h"
 #include "hw/isa/isa.h"
@@ -38,67 +38,7 @@
 #include "tpm_ppi.h"
 #include "trace.h"
 
-#define TPM_TIS_NUM_LOCALITIES      5     /* per spec */
-#define TPM_TIS_LOCALITY_SHIFT      12
-#define TPM_TIS_NO_LOCALITY         0xff
-
-#define TPM_TIS_IS_VALID_LOCTY(x)   ((x) < TPM_TIS_NUM_LOCALITIES)
-
-#define TPM_TIS_BUFFER_MAX          4096
-
-typedef enum {
-    TPM_TIS_STATE_IDLE = 0,
-    TPM_TIS_STATE_READY,
-    TPM_TIS_STATE_COMPLETION,
-    TPM_TIS_STATE_EXECUTION,
-    TPM_TIS_STATE_RECEPTION,
-} TPMTISState;
-
-/* locality data  -- all fields are persisted */
-typedef struct TPMLocality {
-    TPMTISState state;
-    uint8_t access;
-    uint32_t sts;
-    uint32_t iface_id;
-    uint32_t inte;
-    uint32_t ints;
-} TPMLocality;
-
-typedef struct TPMState {
-    MemoryRegion mmio;
-
-    unsigned char buffer[TPM_TIS_BUFFER_MAX];
-    uint16_t rw_offset;
-
-    uint8_t active_locty;
-    uint8_t aborting_locty;
-    uint8_t next_locty;
-
-    TPMLocality loc[TPM_TIS_NUM_LOCALITIES];
-
-    qemu_irq irq;
-    uint32_t irq_num;
-
-    TPMBackendCmd cmd;
-
-    TPMBackend *be_driver;
-    TPMVersion be_tpm_version;
-
-    size_t be_buffer_size;
-
-    bool ppi_enabled;
-    TPMPPI ppi;
-} TPMState;
-
-typedef struct TPMStateISA {
-    /*< private >*/
-    ISADevice parent_obj;
-
-    /*< public >*/
-    TPMState state; /* not a QOM object */
-} TPMStateISA;
-
-#define TPM_TIS_ISA(obj) OBJECT_CHECK(TPMStateISA, (obj), TYPE_TPM_TIS_ISA)
+#include "tpm_tis.h"
 
 #define DEBUG_TIS 0
 
@@ -288,7 +228,7 @@ static void tpm_tis_prep_abort(TPMState *s, uint8_t locty, uint8_t newlocty)
 /*
  * Callback from the TPM to indicate that the response was received.
  */
-static void tpm_tis_request_completed(TPMState *s, int ret)
+void tpm_tis_request_completed(TPMState *s, int ret)
 {
     uint8_t locty = s->cmd.locty;
     uint8_t l;
@@ -827,7 +767,7 @@ static void tpm_tis_mmio_write(void *opaque, hwaddr addr,
     }
 }
 
-static const MemoryRegionOps tpm_tis_memory_ops = {
+const MemoryRegionOps tpm_tis_memory_ops = {
     .read = tpm_tis_mmio_read,
     .write = tpm_tis_mmio_write,
     .endianness = DEVICE_LITTLE_ENDIAN,
@@ -840,7 +780,7 @@ static const MemoryRegionOps tpm_tis_memory_ops = {
 /*
  * Get the TPMVersion of the backend device being used
  */
-static enum TPMVersion tpm_tis_get_tpm_version(TPMState *s)
+enum TPMVersion tpm_tis_get_tpm_version(TPMState *s)
 {
     if (tpm_backend_had_startup_error(s->be_driver)) {
         return TPM_VERSION_UNSPEC;
@@ -853,7 +793,7 @@ static enum TPMVersion tpm_tis_get_tpm_version(TPMState *s)
  * This function is called when the machine starts, resets or due to
  * S3 resume.
  */
-static void tpm_tis_reset(TPMState *s)
+void tpm_tis_reset(TPMState *s)
 {
     int c;
 
@@ -898,7 +838,7 @@ static void tpm_tis_reset(TPMState *s)
 
 /* persistent state handling */
 
-static int tpm_tis_pre_save(TPMState *s)
+int tpm_tis_pre_save(TPMState *s)
 {
     uint8_t locty = s->active_locty;
 
@@ -916,7 +856,7 @@ static int tpm_tis_pre_save(TPMState *s)
     return 0;
 }
 
-static const VMStateDescription vmstate_locty = {
+const VMStateDescription vmstate_locty = {
     .name = "tpm-tis/locty",
     .version_id = 0,
     .fields      = (VMStateField[]) {
@@ -930,134 +870,3 @@ static const VMStateDescription vmstate_locty = {
     }
 };
 
-/* ISA */
-
-static int tpm_tis_pre_save_isa(void *opaque)
-{
-    TPMStateISA *isadev = opaque;
-
-    return tpm_tis_pre_save(&isadev->state);
-}
-
-static const VMStateDescription vmstate_tpm_tis_isa = {
-    .name = "tpm-tis",
-    .version_id = 0,
-    .pre_save  = tpm_tis_pre_save_isa,
-    .fields = (VMStateField[]) {
-        VMSTATE_BUFFER(state.buffer, TPMStateISA),
-        VMSTATE_UINT16(state.rw_offset, TPMStateISA),
-        VMSTATE_UINT8(state.active_locty, TPMStateISA),
-        VMSTATE_UINT8(state.aborting_locty, TPMStateISA),
-        VMSTATE_UINT8(state.next_locty, TPMStateISA),
-
-        VMSTATE_STRUCT_ARRAY(state.loc, TPMStateISA, TPM_TIS_NUM_LOCALITIES, 0,
-                             vmstate_locty, TPMLocality),
-
-        VMSTATE_END_OF_LIST()
-    }
-};
-
-static void tpm_tis_isa_request_completed(TPMIf *ti, int ret)
-{
-    TPMStateISA *isadev = TPM_TIS_ISA(ti);
-    TPMState *s = &isadev->state;
-
-    tpm_tis_request_completed(s, ret);
-}
-
-static enum TPMVersion tpm_tis_isa_get_tpm_version(TPMIf *ti)
-{
-    TPMStateISA *isadev = TPM_TIS_ISA(ti);
-    TPMState *s = &isadev->state;
-
-    return tpm_tis_get_tpm_version(s);
-}
-
-static void tpm_tis_isa_reset(DeviceState *dev)
-{
-    TPMStateISA *isadev = TPM_TIS_ISA(dev);
-    TPMState *s = &isadev->state;
-
-    return tpm_tis_reset(s);
-}
-
-static Property tpm_tis_isa_properties[] = {
-    DEFINE_PROP_UINT32("irq", TPMStateISA, state.irq_num, TPM_TIS_IRQ),
-    DEFINE_PROP_TPMBE("tpmdev", TPMStateISA, state.be_driver),
-    DEFINE_PROP_BOOL("ppi", TPMStateISA, state.ppi_enabled, true),
-    DEFINE_PROP_END_OF_LIST(),
-};
-
-static void tpm_tis_isa_initfn(Object *obj)
-{
-    TPMStateISA *isadev = TPM_TIS_ISA(obj);
-    TPMState *s = &isadev->state;
-
-    memory_region_init_io(&s->mmio, obj, &tpm_tis_memory_ops,
-                          s, "tpm-tis-mmio",
-                          TPM_TIS_NUM_LOCALITIES << TPM_TIS_LOCALITY_SHIFT);
-}
-
-static void tpm_tis_isa_realizefn(DeviceState *dev, Error **errp)
-{
-    TPMStateISA *isadev = TPM_TIS_ISA(dev);
-    TPMState *s = &isadev->state;
-
-    if (!tpm_find()) {
-        error_setg(errp, "at most one TPM device is permitted");
-        return;
-    }
-
-    if (!s->be_driver) {
-        error_setg(errp, "'tpmdev' property is required");
-        return;
-    }
-    if (s->irq_num > 15) {
-        error_setg(errp, "IRQ %d is outside valid range of 0 to 15",
-                   s->irq_num);
-        return;
-    }
-
-    isa_init_irq(ISA_DEVICE(dev), &s->irq, s->irq_num);
-
-    memory_region_add_subregion(isa_address_space(ISA_DEVICE(dev)),
-                                TPM_TIS_ADDR_BASE, &s->mmio);
-
-    if (s->ppi_enabled) {
-        tpm_ppi_init(&s->ppi, isa_address_space(ISA_DEVICE(dev)),
-                     TPM_PPI_ADDR_BASE, OBJECT(dev));
-    }
-}
-
-static void tpm_tis_isa_class_init(ObjectClass *klass, void *data)
-{
-    DeviceClass *dc = DEVICE_CLASS(klass);
-    TPMIfClass *tc = TPM_IF_CLASS(klass);
-
-    device_class_set_props(dc, tpm_tis_isa_properties);
-    dc->vmsd  = &vmstate_tpm_tis_isa;
-    tc->model = TPM_MODEL_TPM_TIS;
-    dc->realize = tpm_tis_isa_realizefn;
-    dc->reset = tpm_tis_isa_reset;
-    tc->request_completed = tpm_tis_isa_request_completed;
-    tc->get_version = tpm_tis_isa_get_tpm_version;
-}
-
-static const TypeInfo tpm_tis_isa_info = {
-    .name = TYPE_TPM_TIS_ISA,
-    .parent = TYPE_ISA_DEVICE,
-    .instance_size = sizeof(TPMStateISA),
-    .instance_init = tpm_tis_isa_initfn,
-    .class_init  = tpm_tis_isa_class_init,
-    .interfaces = (InterfaceInfo[]) {
-        { TYPE_TPM_IF },
-        { }
-    }
-};
-
-static void tpm_tis_isa_register(void)
-{
-    type_register_static(&tpm_tis_isa_info);
-}
-
-type_init(tpm_tis_isa_register)
diff --git a/hw/tpm/tpm_tis_isa.c b/hw/tpm/tpm_tis_isa.c
new file mode 100644
index 0000000000..30ba37079d
--- /dev/null
+++ b/hw/tpm/tpm_tis_isa.c
@@ -0,0 +1,170 @@
+/*
+ * tpm_tis_isa.c - QEMU's TPM TIS ISA Device
+ *
+ * Copyright (C) 2006,2010-2013 IBM Corporation
+ *
+ * Authors:
+ *  Stefan Berger <stefanb@us.ibm.com>
+ *  David Safford <safford@us.ibm.com>
+ *
+ * Xen 4 support: Andrease Niederl <andreas.niederl@iaik.tugraz.at>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ * Implementation of the TIS interface according to specs found at
+ * http://www.trustedcomputinggroup.org. This implementation currently
+ * supports version 1.3, 21 March 2013
+ * In the developers menu choose the PC Client section then find the TIS
+ * specification.
+ *
+ * TPM TIS for TPM 2 implementation following TCG PC Client Platform
+ * TPM Profile (PTP) Specification, Familiy 2.0, Revision 00.43
+ */
+
+#include "qemu/osdep.h"
+#include "hw/isa/isa.h"
+#include "hw/qdev-properties.h"
+#include "migration/vmstate.h"
+#include "tpm_util.h"
+#include "tpm_tis.h"
+
+typedef struct TPMStateISA {
+    /*< private >*/
+    ISADevice parent_obj;
+
+    /*< public >*/
+    TPMState state; /* not a QOM object */
+} TPMStateISA;
+
+#define TPM_TIS_ISA(obj) OBJECT_CHECK(TPMStateISA, (obj), TYPE_TPM_TIS_ISA)
+
+static int tpm_tis_pre_save_isa(void *opaque)
+{
+    TPMStateISA *isadev = opaque;
+
+    return tpm_tis_pre_save(&isadev->state);
+}
+
+static const VMStateDescription vmstate_tpm_tis_isa = {
+    .name = "tpm-tis",
+    .version_id = 0,
+    .pre_save  = tpm_tis_pre_save_isa,
+    .fields = (VMStateField[]) {
+        VMSTATE_BUFFER(state.buffer, TPMStateISA),
+        VMSTATE_UINT16(state.rw_offset, TPMStateISA),
+        VMSTATE_UINT8(state.active_locty, TPMStateISA),
+        VMSTATE_UINT8(state.aborting_locty, TPMStateISA),
+        VMSTATE_UINT8(state.next_locty, TPMStateISA),
+
+        VMSTATE_STRUCT_ARRAY(state.loc, TPMStateISA, TPM_TIS_NUM_LOCALITIES, 0,
+                             vmstate_locty, TPMLocality),
+
+        VMSTATE_END_OF_LIST()
+    }
+};
+
+static void tpm_tis_isa_request_completed(TPMIf *ti, int ret)
+{
+    TPMStateISA *isadev = TPM_TIS_ISA(ti);
+    TPMState *s = &isadev->state;
+
+    tpm_tis_request_completed(s, ret);
+}
+
+static enum TPMVersion tpm_tis_isa_get_tpm_version(TPMIf *ti)
+{
+    TPMStateISA *isadev = TPM_TIS_ISA(ti);
+    TPMState *s = &isadev->state;
+
+    return tpm_tis_get_tpm_version(s);
+}
+
+static void tpm_tis_isa_reset(DeviceState *dev)
+{
+    TPMStateISA *isadev = TPM_TIS_ISA(dev);
+    TPMState *s = &isadev->state;
+
+    return tpm_tis_reset(s);
+}
+
+static Property tpm_tis_isa_properties[] = {
+    DEFINE_PROP_UINT32("irq", TPMStateISA, state.irq_num, TPM_TIS_IRQ),
+    DEFINE_PROP_TPMBE("tpmdev", TPMStateISA, state.be_driver),
+    DEFINE_PROP_BOOL("ppi", TPMStateISA, state.ppi_enabled, true),
+    DEFINE_PROP_END_OF_LIST(),
+};
+
+static void tpm_tis_isa_initfn(Object *obj)
+{
+    TPMStateISA *isadev = TPM_TIS_ISA(obj);
+    TPMState *s = &isadev->state;
+
+    memory_region_init_io(&s->mmio, obj, &tpm_tis_memory_ops,
+                          s, "tpm-tis-mmio",
+                          TPM_TIS_NUM_LOCALITIES << TPM_TIS_LOCALITY_SHIFT);
+}
+
+static void tpm_tis_isa_realizefn(DeviceState *dev, Error **errp)
+{
+    TPMStateISA *isadev = TPM_TIS_ISA(dev);
+    TPMState *s = &isadev->state;
+
+    if (!tpm_find()) {
+        error_setg(errp, "at most one TPM device is permitted");
+        return;
+    }
+
+    if (!s->be_driver) {
+        error_setg(errp, "'tpmdev' property is required");
+        return;
+    }
+    if (s->irq_num > 15) {
+        error_setg(errp, "IRQ %d is outside valid range of 0 to 15",
+                   s->irq_num);
+        return;
+    }
+
+    isa_init_irq(ISA_DEVICE(dev), &s->irq, s->irq_num);
+
+    memory_region_add_subregion(isa_address_space(ISA_DEVICE(dev)),
+                                TPM_TIS_ADDR_BASE, &s->mmio);
+
+    if (s->ppi_enabled) {
+        tpm_ppi_init(&s->ppi, isa_address_space(ISA_DEVICE(dev)),
+                     TPM_PPI_ADDR_BASE, OBJECT(dev));
+    }
+}
+
+static void tpm_tis_isa_class_init(ObjectClass *klass, void *data)
+{
+    DeviceClass *dc = DEVICE_CLASS(klass);
+    TPMIfClass *tc = TPM_IF_CLASS(klass);
+
+    device_class_set_props(dc, tpm_tis_isa_properties);
+    dc->vmsd  = &vmstate_tpm_tis_isa;
+    tc->model = TPM_MODEL_TPM_TIS;
+    dc->realize = tpm_tis_isa_realizefn;
+    dc->reset = tpm_tis_isa_reset;
+    tc->request_completed = tpm_tis_isa_request_completed;
+    tc->get_version = tpm_tis_isa_get_tpm_version;
+}
+
+static const TypeInfo tpm_tis_isa_info = {
+    .name = TYPE_TPM_TIS_ISA,
+    .parent = TYPE_ISA_DEVICE,
+    .instance_size = sizeof(TPMStateISA),
+    .instance_init = tpm_tis_isa_initfn,
+    .class_init  = tpm_tis_isa_class_init,
+    .interfaces = (InterfaceInfo[]) {
+        { TYPE_TPM_IF },
+        { }
+    }
+};
+
+static void tpm_tis_isa_register(void)
+{
+    type_register_static(&tpm_tis_isa_info);
+}
+
+type_init(tpm_tis_isa_register)
-- 
2.20.1


^ permalink raw reply related

* [PATCH v2 11/28] sys:__sys_setreuid(): handle fsid mappings
From: Christian Brauner @ 2020-02-14 18:35 UTC (permalink / raw)
  To: Stéphane Graber, Eric W. Biederman, Aleksa Sarai, Jann Horn
  Cc: smbarber, Seth Forshee, Alexander Viro, Alexey Dobriyan,
	Serge Hallyn, James Morris, Kees Cook, Jonathan Corbet,
	Phil Estes, linux-kernel, linux-fsdevel, containers,
	linux-security-module, linux-api, Christian Brauner
In-Reply-To: <20200214183554.1133805-1-christian.brauner@ubuntu.com>

Switch setreuid() to lookup fsids in the fsid mappings. If no fsid mappings are
setup the behavior is unchanged, i.e. fsids are looked up in the id mappings.

During setreuid() the kfsuid is set to the keuid corresponding the euid that is
requested by userspace. If the requested euid is -1 the kfsuid is reset to the
current keuid. For the latter case this means we need to lookup the
corresponding userspace euid corresponding to the current keuid in the id
mappings and translate this euid into the corresponding kfsuid in the fsid
mappings.

The kfsid to cleanly handle userns visible filesystem is set as before.

We require that a user must have a valid fsid mapping for the target id. This
is consistent with how the setid calls work today without fsid mappings.

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
---
/* v2 */
- Christian Brauner <christian.brauner@ubuntu.com>:
  - set kfsid which is used when dealing with proc permission checking
---
 kernel/sys.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/kernel/sys.c b/kernel/sys.c
index aa379fb5e93b..4697e010bbd7 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -504,7 +504,7 @@ long __sys_setreuid(uid_t ruid, uid_t euid)
 	const struct cred *old;
 	struct cred *new;
 	int retval;
-	kuid_t kruid, keuid;
+	kuid_t kruid, keuid, kfsuid;
 
 	kruid = make_kuid(ns, ruid);
 	keuid = make_kuid(ns, euid);
@@ -535,6 +535,13 @@ long __sys_setreuid(uid_t ruid, uid_t euid)
 		    !uid_eq(old->suid, keuid) &&
 		    !ns_capable_setid(old->user_ns, CAP_SETUID))
 			goto error;
+		kfsuid = make_kfsuid(new->user_ns, euid);
+	} else {
+		kfsuid = kuid_to_kfsuid(new->user_ns, new->euid);
+	}
+	if (!uid_valid(kfsuid)) {
+		retval = -EINVAL;
+		goto error;
 	}
 
 	if (!uid_eq(new->uid, old->uid)) {
@@ -545,7 +552,8 @@ long __sys_setreuid(uid_t ruid, uid_t euid)
 	if (ruid != (uid_t) -1 ||
 	    (euid != (uid_t) -1 && !uid_eq(keuid, old->uid)))
 		new->suid = new->euid;
-	new->fsuid = new->euid;
+	new->kfsuid = new->euid;
+	new->fsuid = kfsuid;
 
 	retval = security_task_fix_setuid(new, old, LSM_SETID_RE);
 	if (retval < 0)
-- 
2.25.0


^ permalink raw reply related

* [RFC v2 1/6] tpm: rename TPM_TIS into TPM_TIS_ISA
From: Eric Auger @ 2020-02-14 18:36 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, stefanb, qemu-devel, qemu-arm,
	peter.maydell
  Cc: marcandre.lureau, lersek, ardb, philmd
In-Reply-To: <20200214183704.14389-1-eric.auger@redhat.com>

As we plan to introduce a sysbus TPM_TIS, let's rename
TPM_TIS into TPM_TIS_ISA.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
---
 hw/i386/acpi-build.c | 6 +++---
 hw/tpm/tpm_tis.c     | 4 ++--
 include/sysemu/tpm.h | 6 +++---
 3 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 9c4e46fa74..26777f8828 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -2026,7 +2026,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
         }
     }
 
-    if (TPM_IS_TIS(tpm_find())) {
+    if (TPM_IS_TIS_ISA(tpm_find())) {
         aml_append(crs, aml_memory32_fixed(TPM_TIS_ADDR_BASE,
                    TPM_TIS_ADDR_SIZE, AML_READ_WRITE));
     }
@@ -2197,7 +2197,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
             /* Scan all PCI buses. Generate tables to support hotplug. */
             build_append_pci_bus_devices(scope, bus, pm->pcihp_bridge_en);
 
-            if (TPM_IS_TIS(tpm)) {
+            if (TPM_IS_TIS_ISA(tpm)) {
                 if (misc->tpm_version == TPM_VERSION_2_0) {
                     dev = aml_device("TPM");
                     aml_append(dev, aml_name_decl("_HID",
@@ -2304,7 +2304,7 @@ build_tpm2(GArray *table_data, BIOSLinker *linker, GArray *tcpalog)
         (char *)&tpm2_ptr->log_area_start_address - table_data->data;
 
     tpm2_ptr->platform_class = cpu_to_le16(TPM2_ACPI_CLASS_CLIENT);
-    if (TPM_IS_TIS(tpm_find())) {
+    if (TPM_IS_TIS_ISA(tpm_find())) {
         tpm2_ptr->control_area_address = cpu_to_le64(0);
         tpm2_ptr->start_method = cpu_to_le32(TPM2_START_METHOD_MMIO);
     } else if (TPM_IS_CRB(tpm_find())) {
diff --git a/hw/tpm/tpm_tis.c b/hw/tpm/tpm_tis.c
index 31facb896d..c609737272 100644
--- a/hw/tpm/tpm_tis.c
+++ b/hw/tpm/tpm_tis.c
@@ -91,7 +91,7 @@ typedef struct TPMState {
     TPMPPI ppi;
 } TPMState;
 
-#define TPM(obj) OBJECT_CHECK(TPMState, (obj), TYPE_TPM_TIS)
+#define TPM(obj) OBJECT_CHECK(TPMState, (obj), TYPE_TPM_TIS_ISA)
 
 #define DEBUG_TIS 0
 
@@ -1008,7 +1008,7 @@ static void tpm_tis_class_init(ObjectClass *klass, void *data)
 }
 
 static const TypeInfo tpm_tis_info = {
-    .name = TYPE_TPM_TIS,
+    .name = TYPE_TPM_TIS_ISA,
     .parent = TYPE_ISA_DEVICE,
     .instance_size = sizeof(TPMState),
     .instance_init = tpm_tis_initfn,
diff --git a/include/sysemu/tpm.h b/include/sysemu/tpm.h
index 15979a3647..1691b92c28 100644
--- a/include/sysemu/tpm.h
+++ b/include/sysemu/tpm.h
@@ -43,12 +43,12 @@ typedef struct TPMIfClass {
     enum TPMVersion (*get_version)(TPMIf *obj);
 } TPMIfClass;
 
-#define TYPE_TPM_TIS                "tpm-tis"
+#define TYPE_TPM_TIS_ISA            "tpm-tis"
 #define TYPE_TPM_CRB                "tpm-crb"
 #define TYPE_TPM_SPAPR              "tpm-spapr"
 
-#define TPM_IS_TIS(chr)                             \
-    object_dynamic_cast(OBJECT(chr), TYPE_TPM_TIS)
+#define TPM_IS_TIS_ISA(chr)                         \
+    object_dynamic_cast(OBJECT(chr), TYPE_TPM_TIS_ISA)
 #define TPM_IS_CRB(chr)                             \
     object_dynamic_cast(OBJECT(chr), TYPE_TPM_CRB)
 #define TPM_IS_SPAPR(chr)                           \
-- 
2.20.1


^ permalink raw reply related

* [RFC v2 6/6] hw/arm/virt: vTPM support
From: Eric Auger @ 2020-02-14 18:37 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, stefanb, qemu-devel, qemu-arm,
	peter.maydell
  Cc: marcandre.lureau, lersek, ardb, philmd
In-Reply-To: <20200214183704.14389-1-eric.auger@redhat.com>

Let the TPM TIS SYSBUS device be dynamically instantiable
in ARM virt.  A device tree node is dynamically created
(TPM via MMIO).

The TPM Physical Presence interface (PPI) is not supported.

To run with the swtmp TPM emulator, the qemu command line must
be augmented with:

        -chardev socket,id=chrtpm,path=swtpm-sock \
        -tpmdev emulator,id=tpm0,chardev=chrtpm \
        -device tpm-tis-device,tpmdev=tpm0 \

swtpm/libtpms command line example:

swtpm socket --tpm2 -t -d --tpmstate dir=/tmp/tpm \
--ctrl type=unixio,path=swtpm-sock

Signed-off-by: Eric Auger <eric.auger@redhat.com>
---
 hw/arm/Kconfig      |  1 +
 hw/arm/sysbus-fdt.c | 36 ++++++++++++++++++++++++++++++++++++
 hw/arm/virt.c       |  7 +++++++
 3 files changed, 44 insertions(+)

diff --git a/hw/arm/Kconfig b/hw/arm/Kconfig
index 3d86691ae0..b6f03f7f53 100644
--- a/hw/arm/Kconfig
+++ b/hw/arm/Kconfig
@@ -5,6 +5,7 @@ config ARM_VIRT
     imply VFIO_AMD_XGBE
     imply VFIO_PLATFORM
     imply VFIO_XGMAC
+    imply TPM_TIS_SYSBUS
     select A15MPCORE
     select ACPI
     select ARM_SMMUV3
diff --git a/hw/arm/sysbus-fdt.c b/hw/arm/sysbus-fdt.c
index 022fc97ecd..adf50444c2 100644
--- a/hw/arm/sysbus-fdt.c
+++ b/hw/arm/sysbus-fdt.c
@@ -30,6 +30,7 @@
 #include "hw/arm/sysbus-fdt.h"
 #include "qemu/error-report.h"
 #include "sysemu/device_tree.h"
+#include "sysemu/tpm.h"
 #include "hw/platform-bus.h"
 #include "hw/vfio/vfio-platform.h"
 #include "hw/vfio/vfio-calxeda-xgmac.h"
@@ -434,6 +435,40 @@ static bool vfio_platform_match(SysBusDevice *sbdev,
 #define VFIO_PLATFORM_BINDING(compat, add_fn) \
     {TYPE_VFIO_PLATFORM, (compat), (add_fn), vfio_platform_match}
 
+/*
+ * add_tpm_tis_fdt_node: Create a DT node for TPM TIS
+ *
+ * See kernel documentation:
+ * Documentation/devicetree/bindings/security/tpm/tpm_tis_mmio.txt
+ * Optional interrupt for command completion is not exposed
+ */
+static int add_tpm_tis_fdt_node(SysBusDevice *sbdev, void *opaque)
+{
+    PlatformBusFDTData *data = opaque;
+    PlatformBusDevice *pbus = data->pbus;
+    void *fdt = data->fdt;
+    const char *parent_node = data->pbus_node_name;
+    int compat_str_len;
+    char *nodename;
+    uint32_t reg_attr[2];
+    uint64_t mmio_base;
+
+    mmio_base = platform_bus_get_mmio_addr(pbus, sbdev, 0);
+    nodename = g_strdup_printf("%s/tpm_tis@%" PRIx64, parent_node, mmio_base);
+    qemu_fdt_add_subnode(fdt, nodename);
+
+    compat_str_len = strlen("tcg,tpm-tis-mmio") + 1;
+    qemu_fdt_setprop(fdt, nodename, "compatible", "tcg,tpm-tis-mmio",
+                     compat_str_len);
+
+    reg_attr[0] = cpu_to_be32(mmio_base);
+    reg_attr[1] = cpu_to_be32(0x5000);
+    qemu_fdt_setprop(fdt, nodename, "reg", reg_attr, 2 * sizeof(uint32_t));
+
+    g_free(nodename);
+    return 0;
+}
+
 #endif /* CONFIG_LINUX */
 
 static int no_fdt_node(SysBusDevice *sbdev, void *opaque)
@@ -455,6 +490,7 @@ static const BindingEntry bindings[] = {
     TYPE_BINDING(TYPE_VFIO_CALXEDA_XGMAC, add_calxeda_midway_xgmac_fdt_node),
     TYPE_BINDING(TYPE_VFIO_AMD_XGBE, add_amd_xgbe_fdt_node),
     VFIO_PLATFORM_BINDING("amd,xgbe-seattle-v1a", add_amd_xgbe_fdt_node),
+    TYPE_BINDING(TYPE_TPM_TIS_SYSBUS, add_tpm_tis_fdt_node),
 #endif
     TYPE_BINDING(TYPE_RAMFB_DEVICE, no_fdt_node),
     TYPE_BINDING("", NULL), /* last element */
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index f788fe27d6..4b967e39d1 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -47,6 +47,7 @@
 #include "sysemu/numa.h"
 #include "sysemu/runstate.h"
 #include "sysemu/sysemu.h"
+#include "sysemu/tpm.h"
 #include "sysemu/kvm.h"
 #include "hw/loader.h"
 #include "exec/address-spaces.h"
@@ -2041,6 +2042,7 @@ static void virt_machine_class_init(ObjectClass *oc, void *data)
     machine_class_allow_dynamic_sysbus_dev(mc, TYPE_VFIO_AMD_XGBE);
     machine_class_allow_dynamic_sysbus_dev(mc, TYPE_RAMFB_DEVICE);
     machine_class_allow_dynamic_sysbus_dev(mc, TYPE_VFIO_PLATFORM);
+    machine_class_allow_dynamic_sysbus_dev(mc, TYPE_TPM_TIS_SYSBUS);
     mc->block_default_type = IF_VIRTIO;
     mc->no_cdrom = 1;
     mc->pci_allow_0_address = true;
@@ -2153,6 +2155,11 @@ type_init(machvirt_machine_init);
 
 static void virt_machine_5_0_options(MachineClass *mc)
 {
+    static GlobalProperty compat[] = {
+        { TYPE_TPM_TIS_SYSBUS, "ppi", "false" },
+    };
+
+    compat_props_add(mc->compat_props, compat, G_N_ELEMENTS(compat));
 }
 DEFINE_VIRT_MACHINE_AS_LATEST(5, 0)
 
-- 
2.20.1


^ permalink raw reply related


This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.